Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59694/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this fe
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176
**[Test build #59694 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59694/consoleFull)**
for PR 13176 at commit
[`4b1a1fa`](https://github.com/apache/spark/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176
**[Test build #59694 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59694/consoleFull)**
for PR 13176 at commit
[`4b1a1fa`](https://github.com/apache/spark/c
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
@MLnick +1 for making the change in the example as well. Calling out the
difference in results due to parallelism might be a little confusing in this
document.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
That would require setting `relativeError` to `0` in the examples however.
Open to other suggestions.
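
For reference, a minimal sketch of what setting `relativeError` to `0` would look like in `spark-shell`, where `spark` is predefined. The DataFrame contents are an assumption made for illustration (the thread does not show them), and no output is listed because it was not verified here.

```scala
// Paste into spark-shell; `spark` is the session the shell creates.
import spark.implicits._

// Illustrative data only: a single "hour" column (assumed, not from the thread).
val df = Seq(18.0, 19.0, 8.0, 5.0, 2.2).toDF("hour")

// relativeError = 0.0 asks approxQuantile for exact quantiles,
// at the cost of a more expensive computation on large data.
val exact = df.stat.approxQuantile("hour", Array(1.0 / 3, 2.0 / 3), 0.0)
println(exact.mkString("Array(", ", ", ")"))
```

With an exact computation the result should not depend on how the data is partitioned, which is the property the documentation example would be relying on.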
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
Ok, at least we know the issue now.
I'd say we can leave the example as is, but let's add something like:
```
Given `numBuckets = 3`, and computing exact quantiles (by setting
```
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
I just tried with `--master local[8]` and I get the same results as you do.
Should I call this out in the example?
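
One way to reproduce the comparison outside the shell is sketched below: the same `approxQuantile` call is run once per master setting. The object name, app name, and data are hypothetical, and the sketch deliberately prints whatever it gets instead of asserting particular values.

```scala
import org.apache.spark.sql.SparkSession

object QuantilesByParallelism {
  // Illustrative data only; the thread does not show the DataFrame contents.
  val hours: Seq[Double] = Seq(18.0, 19.0, 8.0, 5.0, 2.2)

  // Start a local session with the given master, compute the two
  // tercile split points, then stop the session so the next call
  // can start a fresh one with a different thread count.
  def quantilesWith(master: String, relativeError: Double): Array[Double] = {
    val spark = SparkSession.builder()
      .master(master)
      .appName("quantile-check") // hypothetical app name
      .getOrCreate()
    try {
      import spark.implicits._
      val df = hours.toDF("hour")
      df.stat.approxQuantile("hour", Array(1.0 / 3, 2.0 / 3), relativeError)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    for (master <- Seq("local[4]", "local[8]")) {
      val qs = quantilesWith(master, relativeError = 0.001)
      println(s"$master -> ${qs.mkString("Array(", ", ", ")")}")
    }
  }
}
```

If the two lines printed differ, that would point to partitioning (and the merging of per-partition summaries) as the cause, not the environment.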
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
I just did. It is `local[4]`.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
Can you check with `sysctl -n hw.ncpu`?
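
The same information can also be read from inside `spark-shell`. This is a small sketch, assuming the shell was started without an explicit `--master` (in which case it typically runs as `local[*]`, one thread per core visible to the JVM); `spark` is the session the shell predefines.

```scala
// Number of cores the JVM can see; local[*] sizes its thread pool from this.
val cores = Runtime.getRuntime.availableProcessors()
println(s"JVM-visible cores: $cores")

// What the running shell is actually using.
println(s"master: ${spark.sparkContext.master}")
println(s"defaultParallelism: ${spark.sparkContext.defaultParallelism}")
```

Comparing `master` and `defaultParallelism` across machines is a quick way to confirm whether two people are running with the same thread count.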
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
@MLnick I am using `local`. I haven't explicitly set a thread count.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
@GayathriMurali what master are you using for spark-shell? If using
`local[4]` I get the same result as you (default for me is 8 threads), so
probably due to difference in parallelism (merging
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/13176
I will try as well this afternoon.
Github user oliverpierson commented on the pull request:
https://github.com/apache/spark/pull/13176
`Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)`
on my machine.
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
On Mac. Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_73). I checked again and I consistently get the same output on master.
@MLnick Please let me know how you wo
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
@GayathriMurali what environment are you using?
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176
Yeah I get the following
```
scala> df.stat.approxQuantile("hour", Array(1.0/3, 2.0/3),
relativeError=0.001)
res1: Array[Double] = Array(2.2, 5.0)
```
env:
on Mac, `Scal
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
@BryanCutler @oliverpierson Looks like something is wrong on my side. I
just checked again on a fresh build and got the same results. Will dig deeper.
Github user BryanCutler commented on the pull request:
https://github.com/apache/spark/pull/13176
I'm getting the same results as @MLnick and @oliverpierson:
`Array(2.2, 5.0)` from the stat call. My env is:
master (updated this morning) on d67c82e4b647dacd0
Github user oliverpierson commented on the pull request:
https://github.com/apache/spark/pull/13176
That's wild. I'm getting `Array[Double] = Array(2.2, 5.0)` and I'm
guessing @MLnick is also. `approxQuantile` is deterministic so I'm not really
sure why we're getting different resul
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
I get this: `Array[Double] = Array(5.0, 8.0)`
Github user oliverpierson commented on the pull request:
https://github.com/apache/spark/pull/13176
@GayathriMurali Looks like it could be an issue with bucketing, but I'm not
sure how. What does `df.stat.approxQuantile("hour", Array(1.0/3, 2.0/3),
relativeError=0.001)` return?
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176
@MLnick @oliverpierson I checked again with a clean build off master. Here
is the hash: 2bfc4f15214a870b3e067f06f37eb506b0070a1f. Here is what I see:
https://cloud.githubuserco
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r65223909
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the f
Github user oliverpierson commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r65173690
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the fi