[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-05-01 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-216047890 @srowen my JIRA username is "flysjy", thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-216017374 LGTM thanks all for the patch & reviews!

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215948829 @flyjy if you tell me your JIRA handle I'll assign to you

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-30 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11812

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215948535 Merged to master

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215819301 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215819305 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215819124 **[Test build #57346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57346/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215798743 **[Test build #57346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57346/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread jyshen15
Github user jyshen15 commented on a diff in the pull request: https://github.com/apache/spark/pull/11812#discussion_r61606066 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala --- @@ -108,5 +108,26 @@ class Word2VecSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215664477 I confirmed the test case fails on master without the changes in this PR. LGTM.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11812#discussion_r61549783 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala --- @@ -108,5 +108,26 @@ class Word2VecSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-29 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215661660 @jkbradley are you OK with the test here?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215632760 @srowen The PR with the unit test passed after rebasing onto master.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215632322 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215632323 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215632229 **[Test build #57311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57311/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215628418 **[Test build #57311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57311/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215601512 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215601488 **[Test build #57290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57290/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215601509 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215598306 **[Test build #57290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57290/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-26 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-214970336 Yes, I am working on it. Will finish tomorrow.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-214676646 @flyjy are you updating this? It's almost done, I think.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-22 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213528482 @srowen, I agree with you. It is a good idea to skip the word2vec iteration step and directly initialize the `Word2VecModel` class. I will go with this approach.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213295833 It depends on the license of this corpus. Is it sufficient to test behavior with a very large input vector? Or are we not so sure that's the issue?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213232923 That is a good idea about the unit test. I actually first included the unit test code from @MLnick on March 22 with the Lee corpus from Gensim, but later did not include them

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213203872 +1 for @MLnick 's suggestion of adding a unit test to mllib/tests.py which fails before your fix

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212969679 Yes, it's reproducible, as mentioned in the third comment at https://issues.apache.org/jira/browse/SPARK-13289 I thought this PR would solve the issue. Isn't

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212921574 The result here is a similarity rather than a distance. It should never be more than 1, unless there's a bug, because it's a cosine similarity. I can see here a case

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212915342 My observation (of the current implementation of word2vec) is that the distances between synonyms are getting larger and larger with more iterations and finally to

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212828428 @srowen was about to ping you on this. Yup, that is basically the idea. I would prefer to add a test case here, where it fails without the changes in the PR.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212822658 Is the problem that the input vector may have a very large norm, causing the dot product with other vectors to be Infinity? There's a little, opposite problem: dividing
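
The question above (a query vector with a very large norm pushing the dot product to Infinity) can be illustrated outside Spark. The following is a minimal pure-Python sketch, not the code from this PR: the helper names `norm`, `naive_similarity` and `cosine_similarity` and the sample vectors are assumptions made for illustration. It contrasts dividing the dot product only by the word vector's norm with normalizing both vectors before taking the dot product.

```python
import math

def norm(v):
    """Euclidean norm, scaled by the largest magnitude so squaring cannot overflow."""
    scale = max(abs(x) for x in v)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((x / scale) ** 2 for x in v))

def naive_similarity(query, word):
    """Dot product divided only by the word vector's norm (query left unscaled)."""
    dot = sum(q * w for q, w in zip(query, word))
    return dot / norm(word)

def cosine_similarity(query, word):
    """Normalize both vectors first; the result stays in [-1, 1] up to rounding."""
    qn, wn = norm(query), norm(word)
    if qn == 0.0 or wn == 0.0:
        return 0.0
    return sum((q / qn) * (w / wn) for q, w in zip(query, word))

# Hypothetical vectors with very large components (illustration only).
query = [1e200, 1e200]
word = [1e200, 1e200]

print(naive_similarity(query, word))   # the unscaled dot product overflows
print(cosine_similarity(query, word))  # close to 1 for identical directions
```

Normalizing before the dot product keeps every intermediate value small, which is why the second function stays finite on the same input.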

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212175243 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212175249 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212175003 **[Test build #56290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56290/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212165957 **[Test build #56290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56290/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread jyshen15
Github user jyshen15 commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211756190 I will handle the Python style issue.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750751 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750733 **[Test build #56198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750648 Thanks. Have updated the PR.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750023 **[Test build #56198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211240792 You need to do `model.findSynonyms("a", 2).select("word", fmt("similarity", 5).alias("similarity")).show()` in order to truncate the `similarity` col to 5 significant
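
The idea of trimming the `similarity` column before comparing it in a docstring test can be sketched without Spark. This is a hedged illustration only: `round_sig` is a hypothetical helper, not the `fmt` function referred to above, and the sample values are made up. It shows why two results that differ only in low-order digits compare equal once both are rounded to 5 significant digits.

```python
def round_sig(x, digits=5):
    """Round x to `digits` significant digits via the general ('g') format."""
    return float(f"{x:.{digits}g}")

# Hypothetical similarity values that differ only by a small precision error.
a = 0.015053316019475460
b = 0.015053300000000000

print(a == b)                        # raw floats differ
print(round_sig(a) == round_sig(b))  # rounded values agree
```

Rounding trades a little precision for output that is stable across platforms, which is usually what a doctest on floating-point results needs.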

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-210930706 **[Test build #56028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56028/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-210930845 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-210930838 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-210926153 **[Test build #56028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56028/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-12 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-208894112 Test failure looks like small precision issue. You can do the following perhaps in the doc string test: ``` >>> from pyspark.sql.functions import

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207856560 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207856559 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207856553 **[Test build #55446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55446/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207855658 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207855659 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207855649 **[Test build #55445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55445/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207854058 **[Test build #55446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55446/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-207853515 **[Test build #55445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55445/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-204128446 Also for some reason there are a huge number of files changed in the GitHub view. Perhaps an issue with rebase / merge with current master?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-31 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-204126393 Yeah I did think about that too. There is a `TODO` to adjust the learning rate by iteration. But I think it makes this PR easier to analyze if the only change is

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-31 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-204120379 How about keeping the learning-rate-related code unchanged?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203516285 Looks like some of the PySpark unit tests expect to have ++---+ |word| similarity| ++---+ |

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203512904 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203512900 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203512838 **[Test build #54529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54529/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203499113 **[Test build #54529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54529/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203389060 @flyjy perhaps try rebasing to current master just in case?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202892564 Is this caused by the changes made to word2vec.scala after this PR was initiated? Maybe that change conflicts with this PR. (This is just my naive guess. I

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202831224 **[Test build #54432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54432/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202831264 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202831267 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202824019 **[Test build #54432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54432/consoleFull)** for PR 11812 at commit

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202823312 ok to test

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-27 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202224483 I tested this commit on the "One Billion Words Language Modeling" dataset with 72 partitions and 15 iterations. It works well.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-26 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-201971518 @MLnick This bug has been fixed without changing existing interfaces. Have tested it with your test script with Lee corpus from Gensim. I am not sure whether you

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-23 Thread jyshen15
Github user jyshen15 commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-200416321 @MLnick cool! It actually comes down to the question of what `getVectors` should output. If the equation ` getVectors("Paris") - getVectors("France") +

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-22 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-199698551 Here is my test case - I can replicate the `Infinity` similarities on a small test dataset. It only occurs when the number of partitions and the number of iterations are very high.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-20 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-19474 Yes, please don't change any existing behavior of public methods. Ok - I also managed to create a small test case that replicates the issue. I verified that

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198852800 Thanks. I have checked that the problem still exists with only the adaptive learning rate change. So, I will fix this bug without changing the existing interface.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11812#discussion_r56619432 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -463,12 +465,17 @@ class Word2VecModel private[spark] ( //

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread flyjy
GitHub user flyjy opened a pull request: https://github.com/apache/spark/pull/11812 [SPARK-13289][MLLIB] Fix infinite distances between word vectors in Word2VecModel ## What changes were proposed in this pull request? This PR fixes the bug that generates infinite distances

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11812#discussion_r56619995 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -532,28 +539,14 @@ class Word2VecModel private[spark] (

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198242957 Thanks for this. While I see that normalizing the vectors internally may be useful, it does change behaviour in the `getVectors` and `findSynonyms` methods. See my

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198194974 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-18 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198270501 It would also be ideal to create a test case that can replicate the issue with the old code, and pass with the new code, for regression testing going forward.