[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-26 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53416722 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53481077 LGTM. Merged into `master` and `branch-1.1` (since it only adds new methods and doesn't modify any existing code).

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2091

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53305144 Why do we even need `evenBuckets`? Can't we just check whether the buckets are evenly-spaced and automatically perform the optimization if they are? This only
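The automatic check suggested in this comment can be sketched in plain Python. This is a minimal illustration, not the code from the pull request: the helper names `is_evenly_spaced` and `bucket_index_even`, and the relative tolerance used for the comparison, are assumptions made for the example.

```python
import math

# Hypothetical helpers for illustration only; not taken from PR 2091.

def is_evenly_spaced(buckets, rel_tol=1e-9):
    """Return True if consecutive bucket boundaries are (almost) equally spaced."""
    if len(buckets) < 2:
        raise ValueError("need at least two bucket boundaries")
    steps = [b - a for a, b in zip(buckets, buckets[1:])]
    return all(math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps)


def bucket_index_even(value, lo, hi, n):
    """Constant-time bucket lookup for n equal-width buckets covering [lo, hi].

    Returns None for values outside the range. The upper bound hi is
    assigned to the last bucket, so the last bucket is closed.
    """
    if value < lo or value > hi:
        return None
    step = (hi - lo) / n
    return min(int((value - lo) / step), n - 1)
```

With a check like this, the caller would not need to pass a flag: evenly spaced boundaries could use the constant-time lookup, and anything else could fall back to a search over the boundaries.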

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53343906 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19159/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53343907 @JoshRosen I have removed `evenBuckets`, added more tests, and added some test cases for the `str` type.

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53350206 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19159/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53354293 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19165/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53355210 @JoshRosen Sure, doing a linear scan works; `evenBuckets` was there because the caller knows whether it's providing even buckets.

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53359625 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19165/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53375345 When you guys merge this, please close https://github.com/apache/spark/pull/122 as well. You should just edit the pull request description to also say closes #122, and

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53378149 done.

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53180978 @mateiz @JoshRosen I would like to change `evenBuckets` to `even`; the latter is meaningful enough and much shorter. One concern is that we will have

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53144107 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19094/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53145718 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53145884 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19098/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53147036 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19098/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16630390 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16633581 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-23 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53172796 These are _excellent_ unit tests. I ran a coverage report with [coverage.py](http://nedbatchelder.com/code/coverage/) and it reports essentially 100% coverage for the

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-22 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16627566 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-22 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16627673 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-22 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16627753 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-22 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2091#discussion_r16628887 --- Diff: python/pyspark/rdd.py --- @@ -856,6 +856,104 @@ def redFunc(left_counter, right_counter): return self.mapPartitions(lambda i:

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53143433 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19094/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-21 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2091 [SPARK-2871] [PySpark] add histgram() API Compute a histogram using the provided buckets. The buckets are all open to the right except for the last which is closed.
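The bucket semantics described above (every bucket open to the right, except the last, which is closed) can be illustrated with a small pure-Python sketch. It runs on a local list rather than an RDD, and the function name `local_histogram`, its validation rules, and its handling of out-of-range values (skipped) are assumptions made for illustration, not the implementation in the pull request.

```python
from bisect import bisect_right

# Hypothetical local stand-in for illustration; not the code from PR 2091.

def local_histogram(values, buckets):
    """Count values per bucket.

    Buckets are [b0, b1), [b1, b2), ..., [b(n-1), bn]: all open to the
    right except the last, which is closed. Values outside [b0, bn] are
    skipped (an assumption of this sketch).
    """
    if len(buckets) < 2:
        raise ValueError("need at least two bucket boundaries")
    if any(a >= b for a, b in zip(buckets, buckets[1:])):
        raise ValueError("bucket boundaries must be strictly increasing")

    counts = [0] * (len(buckets) - 1)
    lo, hi = buckets[0], buckets[-1]
    for v in values:
        if v < lo or v > hi:
            continue  # outside the bucket range
        i = bisect_right(buckets, v) - 1  # binary search over boundaries
        if i == len(counts):  # v == hi: the last bucket is closed
            i -= 1
        counts[i] += 1
    return counts
```

For example, with boundaries `[1, 10, 20, 50]` the buckets are `[1, 10)`, `[10, 20)` and `[20, 50]`, so `local_histogram([1, 10, 20, 50], [1, 10, 20, 50])` returns `[1, 1, 2]`: the value 50 lands in the last bucket because that bucket is closed.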

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53022070 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19071/consoleFull) for PR 2091 at commit

[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2091#issuecomment-53024894 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19071/consoleFull) for PR 2091 at commit