Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
Oh OK! Thanks @srowen
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16355
Done, and it synced now. Merged to master/2.1
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16355
It's an apache-github sync issue:
https://github.com/apache/spark/commits/branch-2.1
is missing the latest commit from
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/16355
> but now I can't get the merge script to merge it for branch-2.1
I just had some issues with that too. But manually merging (git cherry-pick
+ git push) seems to still work, so maybe try
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
I was able to check out this commit and test it with branch-2.1, but now I
can't get the merge script to merge it for branch-2.1. @srowen would you mind
trying? Thanks!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
Merging with master. Will try to backport to branch-2.1 as well.
Thanks!
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #3548 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3548/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #3548 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3548/testReport)**
for PR 16355 at commit
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
LGTM
Thanks!
Will merge after fresh tests
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
ping @jkbradley would you be able to take another look at the bisecting
kmeans model? Thanks!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
ping @jkbradley would you be able to take another look at the bisecting
kmeans model? I've updated with the random seed as requested, and the build
succeeded. Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
ping @jkbradley would you be able to take another look at the bisecting
kmeans model? I've updated with the random seed as requested.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #3538 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3538/testReport)**
for PR 16355 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71538/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71538 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71538/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #3538 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3538/testReport)**
for PR 16355 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71533/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71533 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71533/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71538 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71538/testReport)**
for PR 16355 at commit
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley done, added seed. Thanks!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
I was about to say this is ready, but I do think we should add the seed.
Other than that, this should be ready!
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71533 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71533/testReport)**
for PR 16355 at commit
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
test this please
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
LGTM pending a fresher run of the tests
Thanks @imatiach-msft !
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
ping @jkbradley would you be able to take another look at the bisecting
K-Means fix?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71289/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71289 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #71289 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)**
for PR 16355 at commit
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley thanks, I've updated the code based on your latest comments - I
removed k and the verification for the setters.
Github user filousen commented on the issue:
https://github.com/apache/spark/pull/16355
@imatiach-msft I can confirm the fix works after copying the full source to
make sure no mistake was made.
I ran it with around 200 datasets, and they all worked.
Thank you for you
Github user carocat commented on the issue:
https://github.com/apache/spark/pull/16355
You are right, I didn't add all changes you had proposed for buildtree.
Everything works fine with your changes :-) Many thanks
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley @yu-iskw @srowen can you please take another look at the
bisecting k-means algorithm fix? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@carocat @filousen
Please look at these changes that I updated on December 28:
-val height = math.sqrt(Seq(leftIndex, rightIndex).map { childIndex =>
+val indexes
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
It looks like you don't have all of my changes. I also updated the
buildSubTree method. Please take a look at the latest commit.
Github user carocat commented on the issue:
https://github.com/apache/spark/pull/16355
The changes proposed in _updateAssignments_ only partially solved the problem:
the key exception is still there.
There is another key exception in the buildSubTree method, if I do i_f (isInternal
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@filousen please note this fix is still in review and hasn't been checked
into spark yet. Can you send me the error you are seeing? Also, are you sure
you have ported my entire fix to your
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@filousen I must have fixed your issue, because if I undo my changes and
run your code I can reproduce the error. You must be running your code without
this fix:
Job aborted due to
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
How did you verify that this change does not fix it? I ran the following
code and it ran without errors:
test("Verify issue from user") {
val jsonDs =
Github user filousen commented on the issue:
https://github.com/apache/spark/pull/16355
@imatiach-msft thank you for checking this.
I'm using a VectorAssembler to transform the dataset I read from the json I
pasted earlier.
VectorAssembler assembler = new
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@filousen could you please share the code that you used to load and run the
dataset and the full error message with stack trace you are seeing? I'm a bit
confused since the dataset is not a
Github user filousen commented on the issue:
https://github.com/apache/spark/pull/16355
Hi
[Here](http://pastebin.com/WecrbYQ0) is a dataset that makes it fail with
K=100 and maxIter=2.
I know K > distinct features, but I can reproduce the error with bigger
datasets.
The fix
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley @yu-iskw @srowen can you please take another look at the
bisecting k-means algorithm fix? Thank you!
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70994/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70994 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70994/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70994 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70994/testReport)**
for PR 16355 at commit
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley Thank you for taking a look! I've updated the code based on your
comments.
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
the only problem I see is that with this code we generate k-1 clusters
instead of k, but it states in the algorithm documentation that it is not
guaranteed to generate k clusters, it could be
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
@yu-iskw Pinging on this since you wrote bisecting k-means originally. Do
you have time to take a look? Thanks!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley @srowen any comments on the changes? Thank you!
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70688/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70688 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70688/testReport)**
for PR 16355 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70688 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70688/testReport)**
for PR 16355 at commit
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Jenkins, retest this please
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70682 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70682/testReport)**
for PR 16355 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70682/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70682 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70682/testReport)**
for PR 16355 at commit
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
I've updated with a new commit. I was able to reproduce the issue by
generating a synthetic sparse dataset similar to the one Alok sent me, in
accordance with the test-style of spark test
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Jenkins, retest this please
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70679/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70679 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70679/testReport)**
for PR 16355 at commit
Github user alokob commented on the issue:
https://github.com/apache/spark/pull/16355
Nice to know that the code fix I suggested is working. It's really nice to
contribute to Spark.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16355
**[Test build #70679 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70679/testReport)**
for PR 16355 at commit
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16355
ok to test
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
I have very good news :). I was not only able to repro the issue with your
dataset, but I was also able to verify that with the suggested fix the
algorithm does not fail (adding the val
Github user alokob commented on the issue:
https://github.com/apache/spark/pull/16355
That's OK, enjoy Xmas.
Please keep me posted if you find that the issue is not resolved.
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Hi Alok!
Sorry I was away for holiday break. I will try to reproduce the failure.
Thank you, Ilya
Github user alokob commented on the issue:
https://github.com/apache/spark/pull/16355
@imatiach-msft Did you find the dataset suitable? Is anything else needed
from my side?
Github user alokob commented on the issue:
https://github.com/apache/spark/pull/16355
You can get sample vectors at this location
https://github.com/alokob/SparkClusteringDataSet/SampleVectors.txt.
Also, while executing bisecting K-Means, we have set the following
configuration
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Yep, there is still a TODO to verify the fix. I'm waiting for the dataset
from Alok to reproduce the issue:
https://issues.apache.org/jira/browse/SPARK-16473
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16355
That makes more sense as a fix, yes. Sounds like there is still a to-do to
verify the fix. If it's possible to write a simple unit test to cover it, all
the better.
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Good point. It looks like we should be checking if the map contains the
child or not. However, I'm not sure if that is the correct solution either. I
need a repro dataset from the bug
Github user alokob commented on the issue:
https://github.com/apache/spark/pull/16355
@imatiach-msft, thanks for creating the pull request and committing the change
I shared. I will try to share a sample dataset for this issue.
Github user wangmiao1981 commented on the issue:
https://github.com/apache/spark/pull/16355
@imatiach-msft Can you add a test case?
Github user wangmiao1981 commented on the issue:
https://github.com/apache/spark/pull/16355
Jenkins, test this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16355
Can one of the admins verify this patch?