Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-67382177
Hi @erikerlandson, thanks for working on this. It would be great to have a
solution to this long-running problem. Since it looks like there is still some
work to be
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-67382250
Please do reopen, though, once you have something that is passing tests :)
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3079
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-63881800
For reference, this other issue has some overlap:
https://issues.apache.org/jira/browse/SPARK-4514
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/3079#discussion_r20062337
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +117,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675397
[Test build #22880 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull)
for PR 3079 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675448
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675446
[Test build #22880 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull)
for PR 3079 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61704937
[Test build #22892 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull)
for PR 3079 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61719975
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61719969
[Test build #22892 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull)
for PR 3079 at commit
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-61508261
@marmbrus, FWIW, the `correlationoptimizer14` test appears to be working
for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true
./dev/run-tests
GitHub user erikerlandson opened a pull request:
https://github.com/apache/spark/pull/3079
[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
You can merge this pull request into a Git repository by running:
$ git pull
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61555496
Reboot of #1689
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556278
[Test build #22828 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull)
for PR 3079 at commit
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556289
@marmbrus, @scwf, FWIW, the `correlationoptimizer14` test appears to be
working for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556754
@erikerlandson I think you also need -Phive for the tests to run. It is
possible some other things changed (or even that that test case changed with
the upgrade to
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61565401
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61565392
[Test build #22828 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull)
for PR 3079 at commit
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57106886
Since this PR was merged the correlationoptimizer14 test has been hanging.
We might want to consider rolling back. You can reproduce the problem as
follows: `sbt
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57108427
I reverted this commit. @erikerlandson mind taking a look at this problem?
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57110142
@rxin @marmbrus I will check it out
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043705
Actually I looked at it again. I don't think it would block the scheduler
because we compute partitions outside the scheduler thread. This approach looks
good to me!
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122197
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
}
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122212
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122214
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043822
@erikerlandson I'm going to merge this first. Maybe we can do the cleanup
later.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1689
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043862
BTW one thing that would be great to add is a test that makes sure we don't
block the main dag scheduler thread. The reason I think we don't block is that
we call
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043930
Have either of you thought about how to coordinate this with Josh's work on
SPARK-3626? https://github.com/apache/spark/pull/2482
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55797086
Yeah, I don't think we need to fully solve 3 here.
My main concern with this set of changes is 2, since a single badly
behaved RDD can potentially block the
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55805772
So far the best idea I have for (2) is to set some kind of time-out on the
evaluation. The bound computation uses subsampling that will (when all goes
well) cap
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55627362
Hi @rxin,
1) SimpleFutureAction is still referred to in the submitJob method, but that
doesn't appear to be invoked anywhere. I was reluctant to get rid of
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55628401
Or, maybe just look into playing the same game with the cogrouped RDDs that
I did with sortByKey. Don't get into invoking `defaultPartitioner` until
somebody
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55456438
Jenkins, test this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55457077
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull)
for PR 1689 at commit
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55458226
@erikerlandson thanks for looking at this.
A few questions:
1. After this pull request, does anything still use SimpleFutureAction?
2. If I understand
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55464403
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull)
for PR 1689 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-54694535
Can one of the admins verify this patch?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52397817
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull)
for PR 1689 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52400243
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull)
for PR 1689 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52336221
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull)
for PR 1689 at commit
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52336202
The latest push updates the RangePartitioner sampling job to be async, and updates
the async action functions so that they properly enclose the sampling job
induced by
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52339006
Excellent! I'll try to find some time to review this soon.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52342401
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull)
for PR 1689 at commit
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15919599
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var
Github user erikerlandson commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15931609
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15900503
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,13 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-51421389
QA tests have started for PR 1689. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18089/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-51424177
QA results for PR 1689:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15919352
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
}
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50765803
Can one of the admins verify this patch?
GitHub user erikerlandson opened a pull request:
https://github.com/apache/spark/pull/1689
[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
You can merge this pull request into a Git repository by running:
$ git pull
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50824343
Jenkins, this is ok to test.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50824621
QA tests have started for PR 1689. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17611/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50829158
QA results for PR 1689:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
56 matches