[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-65017779
  
@uncleGen Could you comment here to provide examples of when it's 
beneficial to disable map-side aggregation?  If there is a legitimate case for 
disabling it, then we should add this option in Scala / Java as well.  
Otherwise, do you mind closing this pull request? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-30 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-65019432
  
@JoshRosen We already have this in Scala/Java.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-65019636
  
> @JoshRosen We already have this in Scala/Java.

What about `reduceByKey`? I don't see a variant with a flag for disabling
map-side combining:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L262
We definitely have the `mapSideCombine` option for `combineByKey`, but not for
`reduceByKey`.

I guess I pattern-matched on `reduceByKey` in my earlier comment; the
`combineByKey` flag makes sense, and we should definitely include it for
feature parity.
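
For readers following the thread, here is a minimal sketch in plain Python (not Spark code) of what a map-side combine flag on a `combineByKey`-style aggregation controls. The function `combine_by_key`, its `map_side_combine` parameter, and the sample data are hypothetical names made up for illustration; they only loosely mirror the `mapSideCombine` option mentioned above and are not the PySpark API or the change in this pull request.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners,
                   map_side_combine=True):
    """Toy model of a combineByKey-style aggregation over in-memory partitions.

    Returns (result_dict, number_of_records_shuffled).
    """
    shuffled = []  # stands in for the records written to the shuffle
    for part in partitions:
        if map_side_combine:
            # Pre-aggregate within the partition: one combiner per distinct key.
            local = {}
            for k, v in part:
                if k in local:
                    local[k] = merge_value(local[k], v)
                else:
                    local[k] = create_combiner(v)
            shuffled.extend((k, c, True) for k, c in local.items())
        else:
            # Ship every raw value; combining happens only on the reduce side.
            shuffled.extend((k, v, False) for k, v in part)

    # Reduce side: merge whatever arrived, combiner or raw value.
    result = {}
    for k, x, is_combiner in shuffled:
        if k not in result:
            result[k] = x if is_combiner else create_combiner(x)
        elif is_combiner:
            result[k] = merge_combiners(result[k], x)
        else:
            result[k] = merge_value(result[k], x)
    return result, len(shuffled)


# Hypothetical sample data: two partitions of (key, value) pairs.
partitions = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4), ("b", 5), ("b", 6)]]

# Sum per key: each combiner is a single number, so pre-aggregation
# reduces the number of shuffled records.
add = lambda x, y: x + y
summed_on, n_on = combine_by_key(partitions, lambda v: v, add, add, True)
summed_off, n_off = combine_by_key(partitions, lambda v: v, add, add, False)

# Group per key: each combiner is a list holding every value, so the
# total number of values carried through the shuffle does not shrink.
grouped_on, g_on = combine_by_key(
    partitions, lambda v: [v], lambda c, v: c + [v], lambda a, b: a + b, True)
grouped_off, g_off = combine_by_key(
    partitions, lambda v: [v], lambda c, v: c + [v], lambda a, b: a + b, False)
values_carried_on = sum(len(c) for c in grouped_on.values())
```

Under these assumptions both settings produce the same result; only the volume of intermediate data differs. For the sum, pre-aggregation helps. For the grouping case every value still travels through the shuffle, which is the kind of workload where turning map-side combining off is plausibly reasonable.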





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3366





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-65020767
  
Actually, let's re-open this one since part of it should still go in.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-20 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63831692
  
@davies Could you help review this patch? Thank you!





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63859043
  
In which cases should we disable map-side aggregation?





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/3365

[SPARK-4488][PySpark] Add control over map-side aggregation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master-clean-141119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3365


commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:09:11Z

add control over map-side aggregation







[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3365#issuecomment-63625644
  
[Test build #23608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23608/consoleFull) for PR 3365 at commit [`a4a5804`](https://github.com/apache/spark/commit/a4a580424b8eea3264ae9c4ae9ae2bec22af6201).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3365#issuecomment-63625774
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23608/





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/3365





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread uncleGen
GitHub user uncleGen reopened a pull request:

https://github.com/apache/spark/pull/3365

[SPARK-4488][PySpark] Add control over map-side aggregation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master-clean-141119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3365


commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:09:11Z

add control over map-side aggregation

commit e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:28:31Z

fix







[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3365#issuecomment-63627276
  
[Test build #23610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23610/consoleFull) for PR 3365 at commit [`e3b0bc4`](https://github.com/apache/spark/commit/e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3365#issuecomment-63627393
  
[Test build #23610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23610/consoleFull) for PR 3365 at commit [`e3b0bc4`](https://github.com/apache/spark/commit/e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3365#issuecomment-63627396
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23610/





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/3365





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/3366

[SPARK-4488][PySpark] Add control over map-side aggregation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master-pyspark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3366.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3366


commit a4a580424b8eea3264ae9c4ae9ae2bec22af6201
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:09:11Z

add control over map-side aggregation

commit e3b0bc4f3a97e50a9584bf2281ddc6aa8034b3d6
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:28:31Z

fix

commit 66561d4aed9a02aeaaa84009ac679401ac4f4bfd
Author: uncleGen husty...@gmail.com
Date:   2014-11-19T11:46:03Z

fix







[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63629285
  
[Test build #23611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23611/consoleFull) for PR 3366 at commit [`66561d4`](https://github.com/apache/spark/commit/66561d4aed9a02aeaaa84009ac679401ac4f4bfd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63638355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23611/





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63638348
  
[Test build #23611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23611/consoleFull) for PR 3366 at commit [`66561d4`](https://github.com/apache/spark/commit/66561d4aed9a02aeaaa84009ac679401ac4f4bfd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

