[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-09 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91363104 @JoshRosen Good catch! fixed it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91133286 [Test build #29925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29925/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91133293 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-09 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91347431 I spent a bit of time fuzz-testing this code to try to reach 100% coverage of the changes in this patch. While doing so, I think I uncovered a bug:

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r27998013 --- Diff: python/pyspark/rdd.py --- @@ -1755,21 +1753,33 @@ def createZero(): return self.combineByKey(lambda v: func(createZero(), v),

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28001568 --- Diff: python/pyspark/serializers.py --- @@ -220,6 +220,29 @@ def __repr__(self): return "BatchedSerializer(%s, %d)" % (str(self.serializer),

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28003603 --- Diff: python/pyspark/shuffle.py --- @@ -367,32 +372,13 @@ def iteritems(self): def _external_items(self): Return all

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28003896 --- Diff: python/pyspark/shuffle.py --- @@ -367,32 +372,13 @@ def iteritems(self): def _external_items(self): Return all

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91011683 I spent most of the morning looking this over again and the patch looks pretty good to me. I think I understand the lifecycle of values pretty well. I left a couple

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r27994956 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +520,295 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r27995826 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +520,295 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28027864 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28032966 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28027636 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91037582 @JoshRosen Thanks for reviewing this; this may be the most complicated part of PySpark. :( Partitioned files will be cleaned up after merging, partition by

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91121784 [Test build #29925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29925/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28032910 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91051963 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91050249 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28013345 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +520,295 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91052717 [Test build #29895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29895/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91060947 @JoshRosen The last comments have been addressed.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28019740 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,301 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91069245 [Test build #29895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29895/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91069251 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28027737 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91038005 @JoshRosen Also, I have rolled back the changes in ResultIterable (because someone is using ResultIterable.maxindex), and improved the performance of len(ResultIterable) for

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91077067 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28033500 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91108456 @JoshRosen Thanks for the comments, it looks better now.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28019825 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,301 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91115066 [Test build #29921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29921/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91115081 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91036610 [Test build #29891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29891/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91062845 [Test build #29900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29900/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91108728 [Test build #29921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29921/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28028089 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28027829 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +522,322 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91050238 [Test build #29891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29891/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91077061 [Test build #29900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29900/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-91081544 Sorry for my initial confusion regarding the external lists of lists. I think that the `__len__` thing might be an issue if we ever directly expose

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28010925 --- Diff: python/pyspark/shuffle.py --- @@ -367,32 +372,13 @@ def iteritems(self): def _external_items(self): Return all

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r28009944 --- Diff: python/pyspark/shuffle.py --- @@ -529,6 +520,295 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90697704 [Test build #29802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29802/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90697722 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90670585 [Test build #29802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29802/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90717977 [Test build #637 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/637/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90718319 [Test build #638 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/638/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90737032 [Test build #638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/638/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90740407 [Test build #637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/637/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90750781 [Test build #29816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29816/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90766722 [Test build #29816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29816/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90766727 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90293354 **[Test build #29757 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29757/consoleFull)** for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90293365 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90288434 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90288396 **[Test build #29756 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29756/consoleFull)** for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90258222 [Test build #29757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29757/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90255941 [Test build #29756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29756/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90356172 [Test build #29779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29779/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90371839 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-90371821 [Test build #29779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29779/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-26 Thread mkhaitman
Github user mkhaitman commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-86640421 Tried this out with 1.3-rc3 and was getting FetchFailedExceptions while performing a join between two RDDs: org.apache.spark.shuffle.FetchFailedException:

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-86686865 Do you hit this error without the patch? I have no idea why they would be related. On Thursday, March 26, 2015 at 10:43 AM, mkhaitman wrote: Tried this

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-26 Thread mkhaitman
Github user mkhaitman commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-86688264 @davies Sorry, I deleted the comment though you still received the notification. Think it was just a fluke since it didn't happen the second time. Sorry about that! So

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-26 Thread mkhaitman
Github user mkhaitman commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r27223983 --- Diff: python/pyspark/shuffle.py --- @@ -244,72 +258,57 @@ def _next_limit(self): def mergeValues(self, iterator): Combine

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-26 Thread mkhaitman
Github user mkhaitman commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r27223759 --- Diff: python/pyspark/shuffle.py --- @@ -244,72 +258,57 @@ def _next_limit(self): def mergeValues(self, iterator): Combine

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-25 Thread mkhaitman
Github user mkhaitman commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-86236377 This PR looks amazing! I'm going to test this out tomorrow with 1.3-rc3 and report back with some findings. I started taking a stab initially at trying to improve the

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-10 Thread airhorns
Github user airhorns commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-78116054 Hey @davies this would be freakin' fantastic to get merged... any chance that might happen soon? We hit many issues with skewed group sizes causing a whole job to fail

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2015-03-10 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-78118374 Ping @JoshRosen

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-68388416 [Test build #24904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24904/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-68400348 **[Test build #24904 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24904/consoleFull)** for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-68400353 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-68400955 [Test build #557 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/557/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-68407892 [Test build #557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/557/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-14 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-63164145 I agree that this is a good fix; I've been letting the review slip because this PR is pretty complex and it will take me a decent amount of time to be sure that it's

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-62447893 @JoshRosen @mateiz Could we get this into 1.2? It has sat here for 2 months.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-62448001 [Test build #23156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23156/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-62460926 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-62460909 [Test build #23156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23156/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r20132241 --- Diff: python/pyspark/shuffle.py --- @@ -520,6 +505,295 @@ def sorted(self, iterator, key=None, reverse=False): return heapq.merge(chunks,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1977#discussion_r20132424 --- Diff: python/pyspark/rdd.py --- @@ -1579,21 +1577,34 @@ def createZero(): return self.combineByKey(lambda v: func(createZero(), v), func,

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-61682581 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-11-04 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-61682571 [Test build #22878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22878/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-31 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-61338247 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-61338701 [Test build #22651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22651/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-61345056 [Test build #22651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22651/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-28 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60717966 Jenkins, test this please

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60718072 [Test build #484 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/484/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60718263 [Test build #22348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22348/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60722423 [Test build #22348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22348/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60722431 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60473744 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60473742 [Test build #22199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22199/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60474811 **[Test build #431 timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/431/consoleFull)** for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60472577 [Test build #431 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/431/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-60472642 [Test build #22199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22199/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-58248987 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21397/consoleFull) for PR 1977 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-07 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-58250188 I have simplified GroupByKey; it's much more readable now.

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-58259787 Test PASSed. Refer to this link for build results (access rights to CI server needed):
