[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13176 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59694/ Test PASSed.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13176 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this fe

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13176 **[Test build #59694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59694/consoleFull)** for PR 13176 at commit [`4b1a1fa`](https://github.com/apache/spark/

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13176 **[Test build #59694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59694/consoleFull)** for PR 13176 at commit [`4b1a1fa`](https://github.com/apache/spark/c

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 @MLnick +1 for making the change in the example as well. Calling out the difference in results due to parallelism might be a little confusing in this document.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 That would require setting `relativeError` to `0` in the examples, however. Open to other suggestions.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 Ok, at least we know the issue now. I'd say we can leave the example as is, but let's add something like: ``` Given `numBuckets = 3`, and computing exact quantiles (by setting
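
For readers following the exact-quantile suggestion above, here is a minimal sketch in plain Scala (no Spark) of why exact quantiles do not depend on partitioning or input order. It is not Spark's implementation: the `ExactQuantiles` object, the `exactQuantiles` helper, the rank convention (smallest element whose 1-based rank is at least `p * n`), and the sample values are all assumptions made for illustration.

```scala
// Illustrative only: exact quantiles by rank on a small in-memory sample.
// This is NOT how Spark's approxQuantile is implemented.
object ExactQuantiles {

  // For each probability p, return the smallest element whose 1-based rank
  // in the sorted data is at least p * n (one common convention, assumed here).
  def exactQuantiles(values: Seq[Double], probabilities: Seq[Double]): Seq[Double] = {
    require(values.nonEmpty, "values must not be empty")
    val sorted = values.sorted
    val n = sorted.length
    probabilities.map { p =>
      require(p >= 0.0 && p <= 1.0, s"probability out of range: $p")
      val rank = math.ceil(p * n).toInt // 1-based rank; 0 when p == 0
      sorted(math.max(rank, 1) - 1)     // clamp so p == 0 maps to the minimum
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample values, chosen for illustration.
    val hours = Seq(18.0, 19.0, 8.0, 5.0, 2.2)
    // Two split points, as one would use for numBuckets = 3.
    val splits = exactQuantiles(hours, Seq(1.0 / 3, 2.0 / 3))
    println(splits.mkString("Array(", ", ", ")"))
  }
}
```

Because the helper sorts the full data before picking ranks, any reordering of the input yields the same split points. An approximate sketch that merges per-partition summaries carries no such guarantee, which is consistent with the parallelism explanation discussed in this thread.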

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 I just tried with `--master local[8]` and I get the same results as you do. Should I call this out in the example?

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 I just did. It is `local[4]`.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 Can you check with `sysctl -n hw.ncpu`?

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 @MLnick I am using `local`. I haven't explicitly set the thread count.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 @GayathriMurali what master are you using for spark-shell? If using `local[4]` I get the same result as you (default for me is 8 threads), so probably due to difference in parallelism (merging

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/13176 I will try as well this afternoon.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread oliverpierson
Github user oliverpierson commented on the pull request: https://github.com/apache/spark/pull/13176 `Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)` on my machine.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 On Mac. Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73). I checked again and I consistently get the same output on master. @MLnick Please let me know how you wo

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 @GayathriMurali what environment are you using?

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13176 Yeah I get the following ``` scala> df.stat.approxQuantile("hour", Array(1.0/3, 2.0/3), relativeError=0.001) res1: Array[Double] = Array(2.2, 5.0) ``` env: on Mac, `Scal

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 @BryanCutler @oliverpierson Looks like something is wrong on my side. I just checked again on a fresh build and got the same results. Will dig deeper.

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread BryanCutler
Github user BryanCutler commented on the pull request: https://github.com/apache/spark/pull/13176 I'm also getting the same results as @MLnick and @oliverpierson , also getting `Array(2.2, 5.0)` from the stat call. My env is: master (updated this morning) on d67c82e4b647dacd0

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread oliverpierson
Github user oliverpierson commented on the pull request: https://github.com/apache/spark/pull/13176 That's wild. I'm getting `Array[Double] = Array(2.2, 5.0)` and I'm guessing @MLnick is also. `approxQuantile` is deterministic so I'm not really sure why we're getting different resul

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 I get this: `Array[Double] = Array(5.0, 8.0)`

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread oliverpierson
Github user oliverpierson commented on the pull request: https://github.com/apache/spark/pull/13176 @GayathriMurali Looks like it could be an issue with bucketing, but I'm not sure how. What does `df.stat.approxQuantile("hour", Array(1.0/3, 2.0/3), relativeError=0.001)` return?

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176 @MLnick @oliverpierson I checked again with a clean build off master. Here is the hash : 2bfc4f15214a870b3e067f06f37eb506b0070a1f. Here is what I see https://cloud.githubuserco

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread GayathriMurali
Github user GayathriMurali commented on a diff in the pull request: https://github.com/apache/spark/pull/13176#discussion_r65223909 --- Diff: docs/ml-features.md --- @@ -145,9 +148,11 @@ for more details on the API. passed to other algorithms like LDA. During the f

[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and examples for ...

2016-05-31 Thread oliverpierson
Github user oliverpierson commented on a diff in the pull request: https://github.com/apache/spark/pull/13176#discussion_r65173690 --- Diff: docs/ml-features.md --- @@ -145,9 +148,11 @@ for more details on the API. passed to other algorithms like LDA. During the fi