Github user davies closed the pull request at:
https://github.com/apache/spark/pull/1791
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53647004
Most of the useful parts have been merged separately, so I'm closing this.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53465719
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19221/consoleFull) for PR 1791 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53467694
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19224/consoleFull) for PR 1791 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53473137
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19229/consoleFull) for PR 1791 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53473278
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19221/consoleFull) for PR 1791 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53474577
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19224/consoleFull) for PR 1791 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53479397
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19229/consoleFull) for PR 1791 at commit
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-53024348
@mateiz @JoshRosen some APIs have been split out as separate PRs: #2091,
#2092, #2093, #2094, #2095
---
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16266558
--- Diff: python/pyspark/rdd.py ---
@@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator):
yield item
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16266596
--- Diff: python/pyspark/rdd.py ---
@@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator):
yield item
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16266655
--- Diff: python/pyspark/rdd.py ---
@@ -812,23 +842,39 @@ def func(iterator):
return self.mapPartitions(func).fold(zeroValue, combOp)
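The fragment above folds each partition locally and then combines the partial results with `combOp`. As a hedged sketch (not the actual PySpark implementation), the same pattern can be emulated in plain Python with a list of lists standing in for partitions:

```python
# Sketch of the aggregate pattern: seqOp folds within each "partition",
# combOp merges the per-partition results. Assumes zeroValue is immutable
# and combOp is associative, as the RDD contract requires.
from functools import reduce

def aggregate(partitions, zero_value, seq_op, comb_op):
    def fold_partition(part):
        acc = zero_value
        for item in part:
            acc = seq_op(acc, item)
        return acc
    partials = [fold_partition(p) for p in partitions]
    return reduce(comb_op, partials, zero_value)

# Compute sum and count in one pass over two partitions:
result = aggregate(
    [[1, 2, 3], [4, 5]],
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
# result == (15, 5)
```

In Spark the per-partition folds run in parallel on executors; this sequential list comprehension only illustrates the data flow.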
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16266736
--- Diff: python/pyspark/rdd.py ---
@@ -858,6 +904,88 @@ def redFunc(left_counter, right_counter):
return self.mapPartitions(lambda i:
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16266909
--- Diff: python/pyspark/rdd.py ---
@@ -858,6 +904,88 @@ def redFunc(left_counter, right_counter):
return self.mapPartitions(lambda i:
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16267032
--- Diff: python/pyspark/rdd.py ---
@@ -1685,11 +1813,69 @@ def zip(self, other):
x.zip(y).collect()
[(0, 1000), (1, 1001), (2,
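The truncated `zip` doctest above pairs elements of the two RDDs positionally. The assumed semantics (both RDDs must have the same partitioning and element counts) mirror Python's built-in `zip` over aligned sequences:

```python
# Plain-Python analogue of the RDD.zip doctest fragment: pair elements
# by position. Unlike builtin zip, RDD.zip fails if lengths differ,
# so equal-length inputs are assumed here.
x = range(0, 3)
y = range(1000, 1003)
pairs = list(zip(x, y))
# pairs == [(0, 1000), (1, 1001), (2, 1002)]
```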
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16267396
--- Diff: python/pyspark/rdd.py ---
@@ -1756,6 +1942,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16267443
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16267475
--- Diff: python/pyspark/rdd.py ---
@@ -1756,6 +1942,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52246974
@davies I looked over all of this now and made some comments, but you
should have Josh check too. Just to be clear though, I don't think this can
make it into 1.1, so we
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52256403
@mateiz thanks for reviewing this; I have addressed all your comments.
@JoshRosen could you take a look at this again?
---
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52258026
The description has been updated to list all the added APIs.
---
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16209228
--- Diff: python/pyspark/context.py ---
@@ -260,6 +260,20 @@ def defaultMinPartitions(self):
return
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16209359
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16209395
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16209415
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16209516
--- Diff: python/pyspark/context.py ---
@@ -260,6 +260,20 @@ def defaultMinPartitions(self):
return
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16210568
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16210900
--- Diff: python/pyspark/rdd.py ---
@@ -1755,6 +1941,114 @@ def _defaultReducePartitions(self):
# on the key; we need to compare the hash of the key
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52121066
If this PR is too large to merge as a whole, I can split it up and merge the
good parts separately.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52121288
QA tests have started for PR 1791. This patch DID NOT merge cleanly!
View progress:
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52125485
QA results for PR 1791:
- This patch FAILED unit tests.
For more information see test
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52129398
Jenkins, test this please
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52129587
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18498/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-52132725
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16071018
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
---
@@ -741,6 +741,23 @@ private[spark] object PythonRDD extends Logging {
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51823944
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18319/consoleFull
---
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r16072784
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
---
@@ -741,6 +741,23 @@ private[spark] object PythonRDD extends Logging {
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51826195
@JoshRosen @mateiz I have commented out the unimplemented APIs.
---
Github user davies closed the pull request at:
https://github.com/apache/spark/pull/1791
---
GitHub user davies reopened a pull request:
https://github.com/apache/spark/pull/1791
[SPARK-2871] [PySpark] Add missing API
Try to bring all Java/Scala API to PySpark.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51826337
closed by accident
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51830890
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51731037
BTW leaving TODOs in the Python code would also be okay, if you want to see
this in the code.
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51731023
I also actually prefer leaving out the non-implemented ones instead of
putting them in with NotImplementedError. Especially when working in an IDE or
something similar,
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15955661
--- Diff: python/pyspark/rdd.py ---
@@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator):
yield item
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51516873
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18137/consoleFull
---
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51297756
histogram() has been implemented in pure Python; it supports integers
better, and it also works on RDDs of strings and other comparable objects.
This was
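A pure-Python histogram of the kind described can be sketched roughly as follows. This is a hedged illustration over a plain iterable, not the actual PySpark code; the bucket semantics (half-open intervals, with the last bucket closed on the right) are assumed to match the Scala `histogram` API:

```python
import bisect

def histogram(values, buckets):
    """Count values into buckets given by sorted boundaries.

    Works for any mutually comparable values (ints, floats, strings),
    which is the advantage of a pure-Python implementation noted above.
    """
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v < buckets[0] or v > buckets[-1]:
            continue  # value outside the bucket range is ignored
        i = bisect.bisect_right(buckets, v) - 1
        if i == len(counts):
            i -= 1  # last boundary belongs to the final (closed) bucket
        counts[i] += 1
    return counts

histogram([1, 2, 3, 10, 11], [1, 5, 11])   # two buckets: [1,5) and [5,11]
```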
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51299595
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test
Github user nrchandan commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15858417
--- Diff: python/pyspark/rdd.py ---
@@ -854,6 +884,97 @@ def redFunc(left_counter, right_counter):
return self.mapPartitions(lambda i:
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51372924
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18032/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51379740
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51401265
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18055/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51401363
QA results for PR 1791:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51402438
QA results for PR 1791:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51402751
@JoshRosen @mateiz Could you take a look at this? I hope that this can be
in 1.1.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51402995
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18059/consoleFull
---
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15906724
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1004,7 +1004,7 @@ abstract class RDD[T: ClassTag](
},
(h1:
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15906969
--- Diff: python/pyspark/context.py ---
@@ -727,6 +738,13 @@ def sparkUser(self):
return self._jsc.sc().sparkUser()
+
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51404681
If we're going to add placeholders for unimplemented methods, what about
just commenting out that code instead of throwing NotImplementedError? That
might be less
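The two options under debate can be illustrated with a hedged sketch (the class and method names here are hypothetical, chosen only for the example): a raising placeholder stays visible in the API docs and fails loudly at call time, while a commented-out method is invisible to IDE autocompletion.

```python
# Hypothetical example class illustrating the placeholder style debated
# above; not actual PySpark code.
class ExampleRDD:
    def mapPartitionsWithSplit(self, f):
        """Placeholder: keeps the method in the API docs but fails loudly."""
        raise NotImplementedError(
            "not implemented in this sketch; use the Scala/Java API")

    # Alternative style: leave the unimplemented method out entirely
    # (commented), so it never appears in autocompletion or docs:
    # def someUnportedMethod(self, arg):
    #     ...
```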
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15907204
--- Diff: python/pyspark/context.py ---
@@ -727,6 +738,13 @@ def sparkUser(self):
return self._jsc.sc().sparkUser()
+
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15907322
--- Diff: python/pyspark/context.py ---
@@ -727,6 +738,13 @@ def sparkUser(self):
return self._jsc.sc().sparkUser()
+
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51405218
The difference is whether those unimplemented APIs should appear in the API
docs. I think we should have a complete set of APIs in Java and Python, so
users can easily know
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51408173
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916433
--- Diff: python/pyspark/context.py ---
@@ -727,6 +738,13 @@ def sparkUser(self):
return self._jsc.sc().sparkUser()
+
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51425517
@matei @pwendell Do you have any thoughts on placeholders vs. leaving out
APIs that aren't implemented in PySpark? Which is better from a usability
perspective?
---
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916708
--- Diff: python/pyspark/context.py ---
@@ -260,6 +260,17 @@ def defaultMinPartitions(self):
return
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916742
--- Diff: python/pyspark/rdd.py ---
@@ -737,6 +754,19 @@ def _collect_iterator_through_file(self, iterator):
yield item
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916764
--- Diff: python/pyspark/rdd.py ---
@@ -811,23 +841,39 @@ def func(iterator):
return self.mapPartitions(func).fold(zeroValue, combOp)
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916792
--- Diff: python/pyspark/rdd.py ---
@@ -1684,11 +1812,57 @@ def zip(self, other):
x.zip(y).collect()
[(0, 1000), (1, 1001), (2,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15916811
--- Diff: python/pyspark/rdd.py ---
@@ -1684,11 +1812,57 @@ def zip(self, other):
x.zip(y).collect()
[(0, 1000), (1, 1001), (2,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51281172
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test
Github user zzl0 commented on a diff in the pull request:
https://github.com/apache/spark/pull/1791#discussion_r15856478
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
---
@@ -730,7 +730,25 @@ private[spark] object PythonRDD extends Logging {
}
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1791#issuecomment-51295996
QA tests have started for PR 1791. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull
---