[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-27 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/1791

2014-08-27 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53647004 Most of the useful parts have been merged separately, so closing this.

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53465719 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19221/consoleFull) for PR 1791 at commit

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53467694 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19224/consoleFull) for PR 1791 at commit

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53473137 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19229/consoleFull) for PR 1791 at commit

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53473278 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19221/consoleFull) for PR 1791 at commit

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53474577 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19224/consoleFull) for PR 1791 at commit

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53479397 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19229/consoleFull) for PR 1791 at commit

2014-08-21 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-53024348 @mateiz @JoshRosen some APIs have been split out as separate PRs: #2091, #2092, #2093, #2094, #2095

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16266558 --- Diff: python/pyspark/rdd.py --- @@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator): yield item

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16266596 --- Diff: python/pyspark/rdd.py --- @@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator): yield item

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16266655 --- Diff: python/pyspark/rdd.py --- @@ -812,23 +842,39 @@ def func(iterator): return self.mapPartitions(func).fold(zeroValue, combOp)

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16266736 --- Diff: python/pyspark/rdd.py --- @@ -858,6 +904,88 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16266909 --- Diff: python/pyspark/rdd.py --- @@ -858,6 +904,88 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16267032 --- Diff: python/pyspark/rdd.py --- @@ -1685,11 +1813,69 @@ def zip(self, other): x.zip(y).collect() [(0, 1000), (1, 1001), (2,

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16267396 --- Diff: python/pyspark/rdd.py --- @@ -1756,6 +1942,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16267443 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16267475 --- Diff: python/pyspark/rdd.py --- @@ -1756,6 +1942,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52246974 @davies I looked over all of this now and made some comments, but you should have Josh check too. Just to be clear though, I don't think this can make it into 1.1, so we

2014-08-14 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52256403 @mateiz thanks for reviewing this; I have addressed all your comments. @JoshRosen could you take a look at this again?

2014-08-14 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52258026 The description has been updated to list all the added APIs.

2014-08-13 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16209228 --- Diff: python/pyspark/context.py --- @@ -260,6 +260,20 @@ def defaultMinPartitions(self): return

2014-08-13 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16209359 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-13 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16209395 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-13 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16209415 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16209516 --- Diff: python/pyspark/context.py --- @@ -260,6 +260,20 @@ def defaultMinPartitions(self): return

2014-08-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16210568 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16210900 --- Diff: python/pyspark/rdd.py --- @@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self): # on the key; we need to compare the hash of the key

2014-08-13 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52121066 If this PR is too huge to be merged, I can split it and then merge the good parts of it.

2014-08-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52121288 QA tests have started for PR 1791. This patch DID NOT merge cleanly! View progress:

2014-08-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52125485 QA results for PR 1791:
- This patch FAILED unit tests.
For more information see test

2014-08-13 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52129398 Jenkins, test this please

2014-08-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52129587 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18498/consoleFull

2014-08-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-52132725 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental): class

2014-08-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16071018 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -741,6 +741,23 @@ private[spark] object PythonRDD extends Logging {

2014-08-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51823944 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18319/consoleFull

2014-08-11 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r16072784 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -741,6 +741,23 @@ private[spark] object PythonRDD extends Logging {

2014-08-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51826195 @JoshRosen @mateiz I have commented out the unimplemented APIs.

2014-08-11 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/1791

2014-08-11 Thread davies
GitHub user davies reopened a pull request: https://github.com/apache/spark/pull/1791 [SPARK-2871] [PySpark] Add missing API Try to bring all Java/Scala APIs to PySpark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark

2014-08-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51826337 closed by accident

2014-08-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51830890 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental): class

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51731037 BTW leaving TODOs in the Python code would also be okay, if you want to see this in the code.

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51731023 I also actually prefer leaving out the non-implemented ones instead of putting them in with NotImplementedError. Especially when working in an IDE or something similar,

2014-08-07 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15955661 --- Diff: python/pyspark/rdd.py --- @@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator): yield item

2014-08-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51516873 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18137/consoleFull

2014-08-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51297756 The histogram() has been implemented in pure Python; it supports integers better, and it also supports RDDs of strings and other comparable objects. This was
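
The comment above does not show the PR's implementation, but the idea it describes — bucketing values that only need to be comparable, rather than coercible to doubles — can be sketched in plain Python. The `histogram` helper below is a hypothetical standalone function for illustration, not PySpark's actual code (PySpark's RDD.histogram additionally distributes the counting across partitions):

```python
from bisect import bisect_right

def histogram(values, buckets):
    # Count how many values fall into each bucket. `buckets` is a sorted
    # list of boundaries [b0, b1, ..., bn] that defines n buckets; the
    # last bucket is closed on the right, matching the usual histogram
    # convention. Because only comparisons are used, this works for ints,
    # floats, strings, or any other mutually comparable values.
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:
            counts[-1] += 1          # right edge belongs to the last bucket
            continue
        i = bisect_right(buckets, v) - 1
        if 0 <= i < len(counts):     # ignore values outside all buckets
            counts[i] += 1
    return counts

print(histogram([1, 2, 3, 10, 11], [0, 5, 12]))   # → [3, 2]
print(histogram(["a", "b", "z"], ["a", "m", "z"]))  # → [2, 1]
```

The string example is the point of the pure-Python approach: a JVM-backed implementation restricted to doubles could not bucket string keys at all.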

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51299595 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

2014-08-06 Thread nrchandan
Github user nrchandan commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15858417 --- Diff: python/pyspark/rdd.py --- @@ -854,6 +884,97 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51372924 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18032/consoleFull

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51379740 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51401265 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18055/consoleFull

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51401363 QA results for PR 1791:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental): class

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51402438 QA results for PR 1791:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental): class

2014-08-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51402751 @JoshRosen @mateiz Could you take a look at this? I hope that this can be in 1.1.

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51402995 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18059/consoleFull

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15906724 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1004,7 +1004,7 @@ abstract class RDD[T: ClassTag]( }, (h1:

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15906969 --- Diff: python/pyspark/context.py --- @@ -727,6 +738,13 @@ def sparkUser(self): return self._jsc.sc().sparkUser() +

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51404681 If we're going to add placeholders for unimplemented methods, what about just commenting out that code instead of throwing NotImplementedError? That might be less
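
The two placeholder styles being debated in this thread can be sketched as follows. This is an illustrative example only; `countApproxDistinct` is used here merely as a hypothetical method name, and neither stub is the PR's actual code:

```python
class RDD:
    # Style 1: ship a visible stub that raises. The method appears in the
    # API docs and in IDE completion, but fails loudly when called.
    def countApproxDistinct(self, relativeSD=0.05):
        """Approximate count of distinct elements (not yet implemented)."""
        raise NotImplementedError(
            "countApproxDistinct is not implemented in PySpark yet")

    # Style 2: keep the signature out of the API entirely by commenting it
    # out, so completion and pydoc never advertise something that does not
    # work. The commented code still serves as a TODO for implementers.
    # def countApproxDistinct(self, relativeSD=0.05):
    #     ...
```

The trade-off discussed above is exactly this: style 1 makes the API surface look complete and documents what is missing, while style 2 avoids surprising users who discover a method through autocomplete only to have it raise at runtime.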

2014-08-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15907204 --- Diff: python/pyspark/context.py --- @@ -727,6 +738,13 @@ def sparkUser(self): return self._jsc.sc().sparkUser() +

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15907322 --- Diff: python/pyspark/context.py --- @@ -727,6 +738,13 @@ def sparkUser(self): return self._jsc.sc().sparkUser() +

2014-08-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51405218 The difference is whether those unimplemented APIs should appear in the API docs. I think we should have a complete set of APIs in Java and Python, so that users can easily know

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51408173 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental): class

2014-08-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916433 --- Diff: python/pyspark/context.py --- @@ -727,6 +738,13 @@ def sparkUser(self): return self._jsc.sc().sparkUser() +

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51425517 @matei @pwendell Do you have any thoughts on placeholders vs. leaving out APIs that aren't implemented in PySpark? Which is better from a usability perspective?

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916708 --- Diff: python/pyspark/context.py --- @@ -260,6 +260,17 @@ def defaultMinPartitions(self): return

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916742 --- Diff: python/pyspark/rdd.py --- @@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator): yield item

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916764 --- Diff: python/pyspark/rdd.py --- @@ -811,23 +841,39 @@ def func(iterator): return self.mapPartitions(func).fold(zeroValue, combOp)

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916792 --- Diff: python/pyspark/rdd.py --- @@ -1684,11 +1812,57 @@ def zip(self, other): x.zip(y).collect() [(0, 1000), (1, 1001), (2,

2014-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15916811 --- Diff: python/pyspark/rdd.py --- @@ -1684,11 +1812,57 @@ def zip(self, other): x.zip(y).collect() [(0, 1000), (1, 1001), (2,

2014-08-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51281172 QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

2014-08-05 Thread zzl0
Github user zzl0 commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15856478 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -730,7 +730,25 @@ private[spark] object PythonRDD extends Logging { }

2014-08-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51295996 QA tests have started for PR 1791. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull