[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...

2014-11-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2478 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

2014-11-07 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-6172 Hi @mattf, If you don't have any additional comments, do you mind closing this pull request? Thanks!

2014-10-13 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-58884300 @JoshRosen @pwendell any further comment on this?

2014-10-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-58916892 @mattf I'm not sure that it's worth exposing those `Partition`-accepting methods in Python, since I don't think that they're really intended to be called by users. I

2014-09-29 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-57258363 @JoshRosen a partition itself doesn't have much in the way of a user api. it wouldn't be difficult to wrap the java objects in a python Partition. we should then start
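mattf's point that a partition exposes little user-facing API can be illustrated with a small sketch. It assumes, hypothetically, that each JVM-side handle only needs an index() method (Spark's Scala Partition trait does define an index); FakeJavaPartition, Partition and wrap_partitions are illustrative names, not PySpark classes, and no Py4J gateway is involved.

```python
# Hypothetical sketch: PySpark does not ship this class. It illustrates the
# idea of wrapping each JVM-side Partition handle in a small Python object.


class FakeJavaPartition:
    """Stand-in for a JVM Partition handle (assumption: exposes index())."""

    def __init__(self, idx):
        self._idx = idx

    def index(self):
        return self._idx


class Partition:
    """Thin Python wrapper; the only user-facing data is the partition index."""

    def __init__(self, jpartition):
        self._jpartition = jpartition  # keep the JVM handle for internal use

    @property
    def index(self):
        return self._jpartition.index()

    def __repr__(self):
        return "Partition(index=%d)" % self.index


def wrap_partitions(jpartitions):
    """Wrap every JVM partition handle, preserving order."""
    return [Partition(jp) for jp in jpartitions]


parts = wrap_partitions([FakeJavaPartition(i) for i in range(3)])
print(parts)  # [Partition(index=0), Partition(index=1), Partition(index=2)]
```

The wrapper keeps the JVM handle private, so the Python surface stays as small as the Scala one.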

2014-09-25 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56805152 RDD._jrdd is very heavy for PipelinedRDD, but getNumPartitions() could be optimized for PipelinedRDD to avoid the creation of _jrdd (could be

2014-09-25 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56858958 what's the purpose of exposing an array of partition objects in Scala/Java? In Scala / Java, I think we expose Partition objects for use in custom RDD
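For context on the custom-RDD usage JoshRosen mentions: in Scala an RDD subclass implements getPartitions, returning its Partition objects, and compute(split, context), which produces the records of one split. The toy model below mimics that contract in plain Python. ToyRangeRDD and RangePartition are hypothetical names, the TaskContext argument is omitted, and this is not how PySpark implements RDDs.

```python
# Toy model (not PySpark code) of the Scala/Java custom-RDD contract:
# get_partitions() describes how the data is split, and compute(split)
# produces the records for one split. Partition objects flow between them.


class RangePartition:
    def __init__(self, index, start, end):
        self.index = index  # position of this split, like Scala's Partition.index
        self.start = start
        self.end = end


class ToyRangeRDD:
    """Hypothetical custom RDD over range(n), cut into num_slices splits."""

    def __init__(self, n, num_slices):
        self.n = n
        self.num_slices = num_slices

    def get_partitions(self):
        step = -(-self.n // self.num_slices)  # ceiling division
        return [
            RangePartition(i, i * step, min((i + 1) * step, self.n))
            for i in range(self.num_slices)
        ]

    def compute(self, split):
        # Called by scheduler-side code with one Partition; users never do.
        return iter(range(split.start, split.end))


rdd = ToyRangeRDD(10, 3)
print([list(rdd.compute(p)) for p in rdd.get_partitions()])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Here the Partition objects are plumbing between the two methods, which is why they matter to RDD implementers more than to end users.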

2014-09-24 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56717218 I think `len(rdd)` has the potential to be confused with `rdd.count()`, since calling `len()` on a Python collection usually returns the size of that collection.
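The ambiguity JoshRosen describes shows up with a toy class, assuming len(rdd) were defined to return the partition count. ToyRDD is a hypothetical stand-in, not pyspark.RDD, and its count() simply mirrors what rdd.count() computes.

```python
# Hypothetical toy class: shows why len(rdd) meaning "number of partitions"
# would surprise Python users, who expect len() to count elements.


class ToyRDD:
    def __init__(self, partitions):
        self._partitions = partitions  # list of lists, one per partition

    def __len__(self):
        # The option under discussion: len() -> number of partitions.
        return len(self._partitions)

    def count(self):
        # Mirrors rdd.count(): total number of elements across partitions.
        return sum(len(p) for p in self._partitions)


data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
rdd = ToyRDD(data)
flat = [x for part in data for x in part]

print(len(flat))    # 9  (plain list: element count)
print(rdd.count())  # 9
print(len(rdd))     # 3  (partition count, not element count)
```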

2014-09-23 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56552452 RDD._jrdd is very heavy for PipelinedRDD, but getNumPartitions() could be optimized for PipelinedRDD to avoid the creation of _jrdd (could be
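davies's suggested optimization can be sketched with toy classes. The sketch assumes, as pipelining implies, that a pipelined RDD has exactly one output partition per parent partition, so its partition count can be taken from the parent without building the JVM-side object. SourceRDD, ToyPipelinedRDD, num_partitions_via_jrdd and the jrdd_builds counter are hypothetical; only the names _jrdd, PipelinedRDD and getNumPartitions() come from the discussion.

```python
# Toy model (not PySpark's actual classes): a pipelined RDD chains Python
# functions onto a parent and builds its expensive JVM-side object lazily.


class SourceRDD:
    def __init__(self, num_partitions):
        self._num_partitions = num_partitions

    def getNumPartitions(self):
        return self._num_partitions


class ToyPipelinedRDD:
    def __init__(self, prev, func):
        self.prev = prev
        self.func = func
        self.jrdd_builds = 0      # how often the heavy object was built
        self._jrdd_cache = None

    @property
    def _jrdd(self):
        # Stand-in for the costly step (serializing func, creating the JVM RDD).
        if self._jrdd_cache is None:
            self.jrdd_builds += 1
            self._jrdd_cache = ("jvm-rdd", self.prev.getNumPartitions())
        return self._jrdd_cache

    def num_partitions_via_jrdd(self):
        return self._jrdd[1]  # naive path: forces construction of _jrdd

    def getNumPartitions(self):
        # Optimized path: pipelining preserves partitioning, so ask the parent.
        return self.prev.getNumPartitions()


rdd = ToyPipelinedRDD(SourceRDD(4), lambda it: it)
print(rdd.getNumPartitions(), rdd.jrdd_builds)        # 4 0
print(rdd.num_partitions_via_jrdd(), rdd.jrdd_builds)  # 4 1
```

The shortcut skips the expensive construction entirely, but it is only valid for transformations that keep a one-to-one mapping between parent and child partitions.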

2014-09-21 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2478 [SPARK-3580] add 'partitions' property to PySpark RDD 'rdd.partitions' is available in scala/java, primarily used for its size() method to get the number of partitions. pyspark instead has a
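The proposal pairs a partitions property with the getNumPartitions() accessor mentioned later in the thread, so that len(rdd.partitions) would play the role of rdd.partitions.size in Scala or rdd.partitions().size() in Java. A toy sketch of that API shape follows; ToyRDD is hypothetical, and plain integers stand in for partition objects.

```python
# Hypothetical sketch of the proposed API shape on a toy class, not
# pyspark.RDD: a read-only `partitions` property whose length agrees
# with getNumPartitions().


class ToyRDD:
    def __init__(self, num_partitions):
        self._num_partitions = num_partitions

    def getNumPartitions(self):
        return self._num_partitions

    @property
    def partitions(self):
        # Integers stand in for partition objects; callers would mostly
        # take len() of the result, as Scala/Java callers use size.
        return list(range(self._num_partitions))


rdd = ToyRDD(8)
print(len(rdd.partitions), rdd.getNumPartitions())  # 8 8
```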

2014-09-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56299526 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20621/consoleFull) for PR 2478 at commit

2014-09-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56301525 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20621/consoleFull) for PR 2478 at commit