[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-65017779 @uncleGen Could you comment here to provide examples of when it's beneficial to disable map-side aggregation? If there is a legitimate case for disabling it, then we should add this option in Scala / Java as well. Otherwise, do you mind closing this pull request? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-65019432 @JoshRosen We already have this in Scala/Java.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-65019636 davies wrote: "@JoshRosen We already have this in Scala/Java." What about `reduceByKey`? I don't see a variant with a flag for disabling map-side combining: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L262. We definitely have the `mapSideCombine` option for `combineByKey` but not for `reduceByKey`. I guess I pattern-matched on `reduceByKey` in my earlier comment; the `combineByKey` flag makes sense and we should definitely include that for feature parity.
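To make the distinction in the comment above concrete, here is a minimal pure-Python sketch of what a map-side combine flag changes. It is not Spark code and not the patch under review: `combine_by_key`, its `map_side_combine` parameter, and the list-of-lists "partitions" are hypothetical names chosen for illustration, and counting shuffled records is only a rough stand-in for shuffle cost.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners,
                   map_side_combine=True):
    """Toy model of a combineByKey-style aggregation (hypothetical helper).

    partitions: list of lists of (key, value) pairs.
    Returns (result_dict, number_of_records_sent_to_the_reduce_side).
    """
    shuffled = []
    for part in partitions:
        if map_side_combine:
            # Pre-aggregate inside each partition, then ship one combiner per key.
            local = {}
            for k, v in part:
                local[k] = merge_value(local[k], v) if k in local else create_combiner(v)
            shuffled.extend(local.items())
        else:
            # Ship every raw (key, value) record unchanged.
            shuffled.extend(part)

    result = {}
    for k, x in shuffled:
        if map_side_combine:
            # x is already a combiner produced on the map side.
            result[k] = merge_combiners(result[k], x) if k in result else x
        else:
            # x is a raw value; build the combiner on the reduce side.
            result[k] = merge_value(result[k], x) if k in result else create_combiner(x)
    return result, len(shuffled)


add = lambda x, y: x + y

# Many repeated keys: pre-aggregation shrinks what is shuffled.
counts = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("b", 1), ("b", 1)]]
with_combine = combine_by_key(counts, lambda v: v, add, add, map_side_combine=True)
without = combine_by_key(counts, lambda v: v, add, add, map_side_combine=False)
print(with_combine)  # ({'a': 3, 'b': 3}, 4)
print(without)       # ({'a': 3, 'b': 3}, 6)

# Grouping values under mostly unique keys: pre-aggregation saves nothing.
unique = [[(1, "x"), (2, "y")], [(3, "z"), (4, "w")]]
to_list = lambda v: [v]
append = lambda c, v: c + [v]
concat = lambda a, b: a + b
grouped_on = combine_by_key(unique, to_list, append, concat, map_side_combine=True)
grouped_off = combine_by_key(unique, to_list, append, concat, map_side_combine=False)
print(grouped_on[1], grouped_off[1])  # 4 4
```

In this toy model the flag never changes the result, only how many records cross the shuffle boundary. For a sum over repeated keys the count drops from 6 to 4; for a grouping-style combiner over unique keys it stays at 4, so the map-side pass is extra work with no reduction, which is the kind of case where a `combineByKey`-level switch could plausibly be useful.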
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3366
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-65020767 Actually, let's re-open this one since part of it should still go in.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63831692 @davies Could you help review this patch? Thank you!
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63859043 What are the cases in which we should disable map-side aggregation?
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/3365 [SPARK-4488][PySpark] Add control over map-side aggregation You can merge this pull request into a Git repository by running: $ git pull https://github.com/uncleGen/spark master-clean-141119 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3365 commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201 Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:09:11Z add control over map-side aggregation
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3365#issuecomment-63625644 [Test build #23608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23608/consoleFull) for PR 3365 at commit [`a4a5804`](https://github.com/apache/spark/commit/a4a580424b8eea3264ae9c4ae9ae2bec22af6201). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3365#issuecomment-63625774 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23608/ Test FAILed.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/3365
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
GitHub user uncleGen reopened a pull request: https://github.com/apache/spark/pull/3365 [SPARK-4488][PySpark] Add control over map-side aggregation You can merge this pull request into a Git repository by running: $ git pull https://github.com/uncleGen/spark master-clean-141119 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3365 commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201 Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:09:11Z add control over map-side aggregation commit e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6 Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:28:31Z fix
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3365#issuecomment-63627276 [Test build #23610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23610/consoleFull) for PR 3365 at commit [`e3b0bc4`](https://github.com/apache/spark/commit/e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3365#issuecomment-63627393 [Test build #23610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23610/consoleFull) for PR 3365 at commit [`e3b0bc4`](https://github.com/apache/spark/commit/e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3365#issuecomment-63627396 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23610/ Test FAILed.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/3365
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/3366 [SPARK-4488][PySpark] Add control over map-side aggregation You can merge this pull request into a Git repository by running: $ git pull https://github.com/uncleGen/spark master-pyspark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3366.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3366 commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201 Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:09:11Z add control over map-side aggregation commit e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6 Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:28:31Z fix commit 66561d4aed9a02aeaaa84009ac679401ac4f4bfd Author: uncleGen husty...@gmail.com Date: 2014-11-19T11:46:03Z fix
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63629285 [Test build #23611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23611/consoleFull) for PR 3366 at commit [`66561d4`](https://github.com/apache/spark/commit/66561d4aed9a02aeaaa84009ac679401ac4f4bfd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63638355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23611/ Test PASSed.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63638348 [Test build #23611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23611/consoleFull) for PR 3366 at commit [`66561d4`](https://github.com/apache/spark/commit/66561d4aed9a02aeaaa84009ac679401ac4f4bfd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.