[GitHub] spark issue #16574: [SPARK-19189] Optimize CartesianRDD to avoid parent RDD'...

2017-01-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16574 I need to survey better Cartesian implementations, especially shuffle-based ones. I am closing this PR for now and will reopen it when the new solution is done.

2017-01-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16574 @mridulm Yeah, I know you are worried about the shuffling cost here. Currently, when `spark.shuffle.reduceLocality.enabled` is true (the default), each shuffle reducer will be launched on th

2017-01-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16574 A couple of points: a) Can recomputation be expensive? Unfortunately, yes, if not used properly. For better or for worse, this has been the implementation in Spark since the early days - pr
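
The recomputation cost discussed in this thread can be illustrated with a toy model. The sketch below is not Spark code: `ToyRDD` and `cartesian` are hypothetical names invented for illustration, and the nested loop only approximates how a narrow-dependency Cartesian product is assumed to walk its parents (one pass over the left partition, and one pass over the right partition per left element). It counts how often each parent partition is computed with and without persistence.

```python
from itertools import product


class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions are recomputed on every
    access unless persist() has been called first."""

    def __init__(self, partitions):
        self._partitions = partitions
        self._cache = None       # becomes a dict once persist() is called
        self.compute_calls = 0   # number of times any partition was computed

    @property
    def num_partitions(self):
        return len(self._partitions)

    def persist(self):
        self._cache = {}
        return self

    def iterator(self, index):
        # Serve from the cache when the partition was already materialized.
        if self._cache is not None and index in self._cache:
            return iter(self._cache[index])
        self.compute_calls += 1
        data = list(self._partitions[index])
        if self._cache is not None:
            self._cache[index] = data
        return iter(data)


def cartesian(left, right):
    """Toy narrow-dependency Cartesian product: output partition (i, j)
    reads left partition i once and right partition j once per left element."""
    out = []
    for i, j in product(range(left.num_partitions), range(right.num_partitions)):
        for x in left.iterator(i):
            for y in right.iterator(j):
                out.append((x, y))
    return out


def run(persisted):
    left = ToyRDD([[1, 2], [3, 4]])
    right = ToyRDD([["a", "b"], ["c", "d"]])
    if persisted:
        left.persist()
        right.persist()
    pairs = cartesian(left, right)
    return len(pairs), left.compute_calls, right.compute_calls


if __name__ == "__main__":
    print("not persisted:", run(persisted=False))
    print("persisted:    ", run(persisted=True))
```

In this toy model the result is the same either way, but persisted parents are computed once per partition, while unpersisted parents are computed once per access. That is the trade-off the thread weighs against the cost of introducing a shuffle.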

2017-01-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16574 @mridulm Hmm... so keeping `NarrowDependency` still seems better, but I think the recomputation is a serious problem when the parent RDDs are not persisted. I think in this case we should try to print

2017-01-13 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16574 This is a behavior change and will break expectations from existing code depending on cartesian to not go through shuffle (particularly when data is already persisted).

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71328/ Test FAILed. ---

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Merged build finished. Test FAILed.

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71328/testReport)** for PR 16574 at commit [`815063b`](https://github.com/apache/spark/commit/8

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71328/testReport)** for PR 16574 at commit [`815063b`](https://github.com/apache/spark/commit/81

2017-01-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16574 Jenkins, test this please.

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71322/ Test FAILed. ---

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Merged build finished. Test FAILed.

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71322/testReport)** for PR 16574 at commit [`815063b`](https://github.com/apache/spark/commit/8

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71322/testReport)** for PR 16574 at commit [`815063b`](https://github.com/apache/spark/commit/81

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71321/testReport)** for PR 16574 at commit [`e114eed`](https://github.com/apache/spark/commit/e

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71321/ Test FAILed. ---

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Merged build finished. Test FAILed.

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71321/testReport)** for PR 16574 at commit [`e114eed`](https://github.com/apache/spark/commit/e1

2017-01-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16574 **[Test build #71320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71320/testReport)** for PR 16574 at commit [`14ba3b2`](https://github.com/apache/spark/commit/1

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71320/ Test FAILed. ---

2017-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16574 Merged build finished. Test FAILed.