[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-12-07 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17673 @ngopal this one can't be merged as-is and looks like it was abandoned. Would you like to take this PR, update per reviews? I'd review that. I think CBOW could be useful in MLlib.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-12-07 Thread ngopal
Github user ngopal commented on the issue: https://github.com/apache/spark/pull/17673 When can we anticipate this branch being merged?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-07-13 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 @shubhamchopra are you still working on this?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-06-28 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 Jenkins OK to test.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-11-06 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 @hhbyyh Thanks for your suggestions. Will try to incorporate these in a day or so.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82569/ Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82569/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82569/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test FAILed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82568/ Test FAILed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82568/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82568/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-05 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for your comments/suggestions @MLnick and @sethah . Working on incorporating these.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82005/ Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82005/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #82005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82005/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81320/ Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #81320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81320/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #81320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81320/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81231/ Test FAILed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test FAILed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #81231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81231/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #81231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81231/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Merged build finished. Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80243/ Test PASSed.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #80243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80243/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17673 **[Test build #80243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80243/testReport)** for PR 17673 at commit

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17673 ok to test

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-18 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 Code-review comments/suggestions so far have been incorporated. Thanks for looking into the code. Happy to incorporate more suggestions and feedback.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-04 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 @MLnick I half expected that. No worries. I have incorporated some of your feedback in the meantime and also added subsampling. Thanks for looking into the code.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-03 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17673 FYI, realistically there won't be bandwidth to really focus on this until after Spark 2.2 QA is done at the earliest.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-30 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for the detailed response @shubhamchopra. I'd like to clarify my point about whether this should be implemented in Spark: Spark MLlib is first and foremost a framework for doing ML on

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-28 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 @Krimit _Can you provide some information about the practical differences between CBOW and skip-grams?_ ![Model

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-27 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 @Krimit @MLnick @hhbyyh I am working on getting your earlier queries answered. @Krimit Thanks for looking into the code, I will try to get the code-review feedback incorporated in a

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-26 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 @shubhamchopra have you run this code in a distributed spark cluster yet?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-25 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 I can maybe help out a bit in a week and a bit (I've also done some poking inside of Word2Vec) but I need to wrap up some travel and Python stuff first.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-25 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 I'm happy to take a look! I'll have some time to dig in deeper tomorrow. Some of my initial impressions: * There's a lot going on here, I agree with @hhbyyh that it would be cleaner to put the

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17673 It would be ideal to have both methods, but I'm worried about reviewer bandwidth vs priority on this. @Krimit you were working on Word2Vec recently - thoughts? Perhaps you have time to help

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-20 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for working on this, I'm traveling right now but maybe @MLNick has some bandwidth to look at this.

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-19 Thread shubhamchopra
Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673 The [original paper](https://arxiv.org/abs/1301.3781) proposed two model architectures for generating word embeddings: the Continuous Skip-gram model and the Continuous Bag-of-Words model. Spark ML
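
For readers unfamiliar with the two architectures named in the comment above, the difference can be sketched by the training examples each one draws from the same text: CBOW predicts a center word from its surrounding context, while skip-gram predicts each context word from the center word. The snippet below is only an illustration in plain Python; the function names (`context`, `cbow_examples`, `skipgram_examples`) and the fixed symmetric window are assumptions made here, and none of it is taken from the PR or from Spark's Word2Vec code.

```python
from typing import List, Tuple


def context(tokens: List[str], i: int, window: int) -> List[str]:
    """Return up to `window` tokens on each side of position i (hypothetical helper)."""
    lo = max(0, i - window)
    return tokens[lo:i] + tokens[i + 1:i + 1 + window]


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one example per position, (context words -> center word)."""
    return [(context(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (center word -> single context word) pair."""
    return [
        (tokens[i], c)
        for i in range(len(tokens))
        for c in context(tokens, i, window)
    ]


sentence = ["the", "quick", "brown", "fox"]
cbow = cbow_examples(sentence, window=1)
skipgram = skipgram_examples(sentence, window=1)
```

With the same sentence and window, CBOW yields one example per word, whereas skip-gram yields one example per word-context pair, so it produces more (and smaller) training examples from the same text.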

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-18 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for sharing the work. To help make the review easier, I would recommend: 1. Provide some background info. Is the new algorithm better than the existing one and in which cases?

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17673 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this