Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
ok there were a couple of similar issues, such as in-set-operations query 9 and
group-analytics.sql.out queries 21 and 22
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
ok so here is an example of output I'm not sure is correct:
in-order-by
-- !query 17
SELECT Count(DISTINCT( t1a )),
t1b
FROM t1
WHERE t1h NOT IN (SELECT
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
@hvanhovell So I backed out the changes in this PR, implemented your change
to SQLQueryTestSuite.getNormalizedResult, regenerated the golden results files
and the tests all pass on my x86 and big
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
I think that the current "order if not currently ordered" in the test suite
is good for checking the set of results for unordered queries.
If ordered at all then the resu
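The "order if not currently ordered" normalization described above can be sketched generically as follows. This is an illustrative standalone version, not the actual SQLQueryTestSuite code; the class and method names are hypothetical:

```java
import java.util.Arrays;

public class ResultNormalizer {
    // Sort result rows lexicographically unless the query itself imposed
    // an order; two platforms that return the same set of rows in
    // different orders then produce identical golden files.
    static String[] normalize(String[] rows, boolean queryIsOrdered) {
        if (queryIsOrdered) {
            return rows; // an ORDER BY result must match positionally
        }
        String[] sorted = rows.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        String[] bigEndianOrder = {"2\tb", "1\ta"};
        String[] littleEndianOrder = {"1\ta", "2\tb"};
        // After normalization the two platform-specific orders compare equal.
        System.out.println(Arrays.equals(
            normalize(bigEndianOrder, false),
            normalize(littleEndianOrder, false))); // true
    }
}
```

The trade-off noted in the comment: this checks the *set* of rows for unordered queries, but for a query that is ordered at all, positional comparison still applies.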
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
Jenkins retest please
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
@gatorsmile I'm glad it wasn't just me that found it complex ;-)
I've modified the patch to remove an unnecessary change as that query was
not ordered and the test suite code handles
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/17039
@hvanhovell @gatorsmile I agree that would be a better solution; however, I
don't know how to achieve it, being unfamiliar with this code.
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/17039
[SPARK-19710] Fix ordering of rows in query results
## What changes were proposed in this pull request?
Changes to SQLQueryTests to make the order of the results constant.
Where possible
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16841
OK I'll raise a separate Jira, document the differences and submit a PR
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16841
@kevinyu98 Several of the new tests fail on Big Endian platforms. It
appears that rows are returned in a slightly different order but are still a
correct output from the query. For example
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/16795#discussion_r99622258
--- Diff: sql/core/pom.xml ---
@@ -130,6 +130,12 @@
test
+ org.apache.avro
--- End diff
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16751
Sorry, I've been away for the w/end. Yes we use maven for our test runs.
Looks like you have it under control.
Thanks
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16751
Since this commit our test runs are failing with
ParquetAvroCompatibilitySuite:
*** RUN ABORTED ***
java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16375
Test run is failing with an unrelated error
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/16375
Jenkins retest this please
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/16375#discussion_r93587770
--- Diff:
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
---
@@ -591,7 +591,11 @@ public void writeToOutputStreamIntArray
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/16375
[SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray
test
fails on big endian. Only change byte order on little endian
## What changes were proposed in this pull
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/15307
This PR seems to cause intermittent test failures eg:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1736/testReport/junit/org.apache.spark.sql.streaming
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/15464
Tests all pass on big-endian with this PR
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/15464
This PR contains a change to o.a.s.sql.hive.StatisticsSuite which I
believe should fix that issue (awaiting big-endian build to complete)
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/15464
[SPARK-17827][SQL]maxColLength type should be Int for String and Binary
## What changes were proposed in this pull request?
correct the expected type from Length function to be Int
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13652
Also failing here in the UK:
{noformat}
- to UTC timestamp *** FAILED ***
"2016-03-13 [02]:00:00.0" did not equal "2016-03-13 [10]:00:00.0"
(DateTime
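The failure above is the classic symptom of a test asserting a hard-coded local-time string: the same instant renders differently depending on the machine's timezone. A small illustration (generic, not the Spark test code):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TzDemo {
    // Render an epoch instant as a local-time string in a given zone.
    // A test that bakes the Pacific-time rendering into its expected
    // value will fail when run in the UK, even though the underlying
    // instant is identical.
    static String render(long epochSeconds, String zone) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
            .withZone(ZoneId.of(zone))
            .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        long t = 1457834400L; // 2016-03-13 02:00:00 UTC
        System.out.println(render(t, "UTC"));                 // 2016-03-13 02:00:00
        System.out.println(render(t, "America/Los_Angeles")); // a different wall-clock time
    }
}
```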
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13707
So clearly that code doesn't work when the type is a primitive. I'm not
familiar with the code generation. Is there a way to detect the type during
generation rather than generating the dodgy
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13707
@davies @hvanhovell Can you take a look at this. I'm not sure it is the
best fix. Also are there any other types (structs, arrays etc) that are created
by pointing into an UnsafeRow that could
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/13707
[SPARK-15822][SQL] avoid UTF8String references into freed pages
## What changes were proposed in this pull request?
In SMJ codegen we need to save copies of UTF8String values
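The hazard the PR fixes can be sketched with plain byte arrays (hypothetical stand-ins, not the real UTF8String or sort-merge-join codegen): a value that merely points into a shared memory page becomes garbage once that page is freed or reused, so any value that must outlive the current row needs its own copy.

```java
import java.util.Arrays;

public class CopyOnSave {
    // Deep-copy a value out of a page so it owns its own bytes.
    static byte[] saveValue(byte[] page) {
        return page.clone();
    }

    public static void main(String[] args) {
        byte[] page = "leftKey".getBytes();
        byte[] unsafeRef = page;           // merely aliases the page
        byte[] safeCopy = saveValue(page); // owns its bytes
        Arrays.fill(page, (byte) 0);       // simulate the page being freed/reused
        System.out.println(new String(unsafeRef)); // corrupted: all zero bytes
        System.out.println(new String(safeCopy));  // "leftKey": survives the free
    }
}
```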
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13589
As Adam says I still get the segv with OpenJDK on linux amd64 running our
app. This fix does appear to fix the issue reported in
https://issues.apache.org/jira/browse/SPARK-15825
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13355
@zsxwing ok to merge now?
Github user robbinspg commented on the issue:
https://github.com/apache/spark/pull/13355
Test suite removed
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/13355#discussion_r65332472
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala ---
@@ -38,7 +38,8 @@ class BlockManagerMaster(
/** Remove
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13355
reverted original fix and replaced with using non-blocking call in
BlockManagerMaster.removeExecutor.
Also added a new test suite to run Distributed suite forcing the number
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13355
OK, that's what I tried but it threw up some errors in some other tests
which I'm investigating.
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13355
@zsxwing Do you mean change BlockManagerMaster.removeExecutor to send the
message using send (fire and forget) rather than askWithRetry?
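The distinction being discussed: an askWithRetry-style call blocks the caller until a reply arrives, while send is fire-and-forget. If the blocked caller is itself occupying a thread needed to produce the reply, ask can deadlock; send cannot, because it never waits. A hedged sketch with a hypothetical API (not Spark's actual RpcEndpointRef):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RpcStyles {
    // A single-threaded "dispatcher" standing in for the RPC handler.
    static final ExecutorService dispatcher =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "dispatcher");
            t.setDaemon(true);
            return t;
        });

    // ask-style: submit the message and park until the reply arrives.
    static String askBlocking(String msg) {
        Future<String> reply = dispatcher.submit(() -> "reply:" + msg);
        try {
            return reply.get(); // caller is blocked here
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    // send-style: enqueue the message and return immediately.
    static void send(String msg) {
        dispatcher.execute(() -> { /* handle msg; no reply, no waiting */ });
    }

    public static void main(String[] args) {
        send("removeExecutor-1");                  // returns immediately
        System.out.println(askBlocking("status")); // blocks for "reply:status"
    }
}
```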
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13355#issuecomment-09844
agreed. I'll take a look.
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13355#issuecomment-222104521
Although this patch resolves this particular issue I would echo the comment
in https://github.com/apache/spark/pull/11728 by @zsxwing
{quote}
However
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/13355
Use a minimum of 3 dispatcher threads to avoid deadlocks
## What changes were proposed in this pull request?
Set minimum number of dispatcher threads to 3 to avoid deadlocks on
machines
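The floor-of-three proposed above can be sketched as a one-line sizing rule. The function name and the treatment of the configured value are illustrative, not necessarily Spark's exact config handling:

```java
public class DispatcherThreads {
    // Never size the dispatcher pool below 3 threads, even on 1- or
    // 2-core machines, because with fewer threads every worker can end
    // up blocked waiting on a message another worker would handle.
    static int numDispatcherThreads(int configured, int availableCores) {
        int requested = configured > 0 ? configured : availableCores;
        return Math.max(3, requested);
    }

    public static void main(String[] args) {
        System.out.println(numDispatcherThreads(0, 1)); // 3: floor applies
        System.out.println(numDispatcherThreads(0, 8)); // 8: plenty of cores
    }
}
```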
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/13009#issuecomment-218105338
should I add an assert into the LongHashedRelation.apply to validate the
key and a test to cover this?
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/13009
[SPARK-15154][SQL] Change key types to Long in tests
## What changes were proposed in this pull request?
As reported in the Jira the 2 tests changed here are using a key of type
Integer
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-216360442
Many thanks!
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-215690156
Sorry to keep bugging you on this but I'd really like to fix this major
issue and move on. If there are no objections to merging this into master could
a committer
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-214981491
@rxin @hvanhovell is there anything preventing this being merged? IMHO the
jira it is fixing is a blocking defect
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-214629761
@hvanhovell Spark 1.6.1 is fine on BE. The issues have been with new
functionality added for Spark 2.0. This PR fixes the major issue. There are a few
other issues which
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-214429928
@hvanhovell Can we merge this now?
I agree the benchmarks should run after a steady state is achieved. Also
I'll probably create a change to allow
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-213670252
Can we retest this? I think there was a minor change since the last test build
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12501#issuecomment-213670198
closing this in favour of other implementation
Github user robbinspg closed the pull request at:
https://github.com/apache/spark/pull/12501
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-213482568
@rxin Do you think we can merge in this PR?
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/12610
[SPARK-14848][SQL] Compare as Set in DatasetSuite - Java encoder
## What changes were proposed in this pull request?
Change test to compare sets rather than sequence
## How
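The change in this PR, sketched generically (not the actual DatasetSuite assertion): when a query has no defined ordering, assert on the set of rows rather than the sequence, so a platform-dependent row order cannot fail the test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class SetCompare {
    // Compare two row lists as sets, ignoring order (and duplicates,
    // which is acceptable only when the expected rows are distinct).
    static boolean sameRows(List<String> a, List<String> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("a", "b", "c");
        List<String> actualOnBigEndian = Arrays.asList("c", "a", "b");
        System.out.println(expected.equals(actualOnBigEndian));   // false: sequence compare
        System.out.println(sameRows(expected, actualOnBigEndian)); // true: set compare
    }
}
```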
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212930732
@hvanhovell Here are the test results running 10x the size:
[ParquetReadBenchmarks.txt](https://github.com/apache/spark/files/230027/ParquetReadBenchmarks.txt
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212789051
@hvanhovell Any thoughts/interpretations on those benchmark results? I
think the differences are all within the bounds of randomness!
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212404982
[ParquetReadBenchmark-PartitionedTable.txt](https://github.com/apache/spark/files/227908/ParquetReadBenchmark-PartitionedTable.txt)
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212388759
Averaged results for 5 runs for first 3 benchmarks:
[ParquetReadBenchmark.txt](https://github.com/apache/spark/files/227827/ParquetReadBenchmark.txt
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212358250
@hvanhovell Yes I will. I'm trying to get a stable base benchmark first as
running the ParquetReadBenchmark repeatedly against the base code (before
either PR) I
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-212024789
Alternative implementation in https://github.com/apache/spark/pull/12501
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/12501
[SPARK-13745][SQL] Support columnar in memory representation on Big Endian
platforms - implement by subclassing
## What changes were proposed in this pull request?
An alternative
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-211952868
So I have run the ParquetReadBenchmark several times before this PR and
after. I'm not sure how to interpret the results though as there is quite a
variation
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-211565728
will do
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-211563273
@rxin so what do we need to do to get this into 2.0.0? Although the JIRA
is of type "improvement" I could argue that it is a blocking defect as Spark
has
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-211247119
I haven't run any explicit performance tests for this. Do we have any
specific to this area?
Using the static final boolean allows the JIT to eliminate
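Why the static final boolean matters to the JIT, sketched (illustrative names, not the actual patch): the field is a constant after class initialization, so the JIT can prove one side of the branch dead and compile the endianness check away entirely from the hot path.

```java
import java.nio.ByteOrder;

public class BranchElimination {
    // Constant after class init: the JIT folds it and removes the
    // untaken branch, leaving no per-call check.
    static final boolean BIG_ENDIAN =
        ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Convert a value to big-endian byte order regardless of platform;
    // on big-endian hardware both the branch and the swap disappear.
    static int toBigEndianOrder(int v) {
        return BIG_ENDIAN ? v : Integer.reverseBytes(v);
    }

    public static void main(String[] args) {
        // Applying the conversion twice restores the original value on
        // either architecture.
        System.out.println(
            toBigEndianOrder(toBigEndianOrder(0x12345678)) == 0x12345678); // true
    }
}
```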
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-210549840
@nongli
So I've changed the patch to wrap the buffer in initFromPage but only for
Big Endian. I'd like to see this patch get in to 2.0.0 so Big Endian
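The buffer-wrapping idea described above, sketched generically (this is not the actual VectorizedPlainValuesReader code): Parquet plain encoding stores values little-endian, so decoding through a ByteBuffer with an explicit little-endian order is correct on both architectures; on a little-endian JVM the explicit order matches the native one, so the common path pays nothing extra.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferWrap {
    // Decode the first int of a page that is little-endian on disk.
    // The explicit order makes this correct on big-endian JVMs too.
    static int firstInt(byte[] page) {
        return ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] page = {42, 0, 0, 0}; // little-endian encoding of 42
        System.out.println(firstInt(page)); // 42 on BE and LE JVMs alike
    }
}
```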
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/12397#discussion_r59863075
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -31,6 +33,8 @@
private
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/12397#discussion_r59770275
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -31,6 +33,8 @@
private
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/12397#issuecomment-209971909
@nongli please can you review this
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/12397
[SPARK-13745][SQL]Support columnar in memory representation on Big Endian
platforms
## What changes were proposed in this pull request?
parquet datasource and ColumnarBatch tests fail
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10628#issuecomment-209832195
@nongli I'm just about there with a solution for Big Endian platforms and
will be using https://issues.apache.org/jira/browse/SPARK-14151 for the changes.
I
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10628#issuecomment-206732654
We are actually seeing this issue in the OnHeap code as the byte array
passed in to putIntLittleEndian is in little endian and the code is trying to
read
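The comment above describes bytes that are already little-endian being read with native-order loads. Assembling the value with explicit shifts sidesteps the problem, because the byte positions are spelled out rather than taken from the platform's byte order (generic sketch, not the actual ColumnVector code):

```java
public class LittleEndianRead {
    // Assemble an int from 4 little-endian bytes; the result is the
    // same on big- and little-endian hardware.
    static int getIntLittleEndian(byte[] src, int off) {
        return (src[off] & 0xFF)
             | ((src[off + 1] & 0xFF) << 8)
             | ((src[off + 2] & 0xFF) << 16)
             | ((src[off + 3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x78, (byte) 0x56, (byte) 0x34, (byte) 0x12};
        System.out.println(Integer.toHexString(getIntLittleEndian(data, 0))); // 12345678
    }
}
```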
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10628#issuecomment-204684533
So big endian implementations of OffHeapColumnVector and
OnHeapColumnVector are needed. I don't think we'd want to have an inline 'if
(bigEndian)' in the relevent
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-168963270
I have a fix for the test failure. Should I create a new Jira and PR?
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/10599
[SPARK-12647][SQL] Fix
o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers:
aggregate operator
change expected partition sizes
You can merge this pull request
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-168973101
created https://issues.apache.org/jira/browse/SPARK-12647 and associated PR
Github user robbinspg closed the pull request at:
https://github.com/apache/spark/pull/10599
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10599#issuecomment-169148536
I closed this as per request but it states "Closed with unmerged commits"
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-168919276
Merging this into the 1.6 stream has caused a test failure in
org.apache.spark.sql.execution.ExchangeCoordinatorSuite.determining the
number of reducers
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-168172679
re-merged with latest master
Please retest
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-168091202
Fixed scala style check
Please retest
Github user robbinspg commented on a diff in the pull request:
https://github.com/apache/spark/pull/10421#discussion_r48599743
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala
---
@@ -171,7 +171,7 @@ object
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-167738123
@rxin as the original author of this code could you please review the PR?
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/10421#issuecomment-166585412
I believe this is uncovering a test failure in ExchangeCoordinatorSuite, so
please hold this PR until I investigate further.
- determining the number
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/10421
[SPARK-12470] Fix size reduction calculation
also only allocate required buffer size
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/8008#issuecomment-142583322
My 1.5 branch build is failing as described in SPARK-9710 and I notice
that this merge didn't make it into that branch. Any chance this will be
backported
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/8605
[SPARK-10454][Spark Core] wait for empty event queue
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/robbinspg/spark-1 DAGSchedulerSuite-fix
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/8589
[SPARK-9869][Streaming] Wait for all event notifications before asserting
results
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/robbinspg
GitHub user robbinspg opened a pull request:
https://github.com/apache/spark/pull/8582
[SPARK-10431][Spark Core] Fix intermittent test failure. Wait for event
queue to be clear
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user robbinspg commented on the pull request:
https://github.com/apache/spark/pull/8582#issuecomment-137432486
I see the test failure is https://issues.apache.org/jira/browse/SPARK-9869
which I'm sure is not related to this pull request.
Ironically, looking at SPARK