[jira] [Updated] (SPARK-20902) Word2Vec implementations with Negative Sampling

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20902:
-
Labels: ML bulk-closed  (was: ML)

> Word2Vec implementations with Negative Sampling
> ---
>
> Key: SPARK-20902
> URL: https://issues.apache.org/jira/browse/SPARK-20902
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.1
> Reporter: Shubham Chopra
> Priority: Major
> Labels: ML, bulk-closed
>
> Spark MLlib Word2Vec currently implements only Skip-Gram with hierarchical 
> softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown 
> comparable or better performance with negative sampling. This umbrella JIRA 
> tracks the effort to add negative-sampling-based implementations of both 
> the CBOW and Skip-Gram models to Spark MLlib.
> Since word2vec is largely a pre-processing step, its performance often 
> depends on the application it is used for and the corpus it is estimated 
> on. These implementations give users the choice of picking the one that 
> works best for their use case.
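
For readers unfamiliar with the term, the following is a minimal sketch of a single skip-gram update with negative sampling, written in plain Python. It is not Spark's code and not the implementation proposed in this issue. All names (`draw_negatives`, `sgns_step`, the toy vocabulary) are hypothetical. The 0.75 exponent on the unigram counts follows the original word2vec formulation and is an assumption here, as are the learning rate and vector size.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def draw_negatives(counts, k, exclude, rng):
    # Noise distribution: unigram counts raised to the 0.75 power (assumed),
    # with the true context word filtered out. Sampling is with replacement.
    words = [w for w in counts if w != exclude]
    weights = [counts[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)


def sgns_step(center_vec, out_vecs, target, negatives, lr=0.025):
    # One SGD step on a single (center, context) pair.
    # The true context word gets label 1, each sampled noise word label 0.
    # Both center_vec and the touched rows of out_vecs are updated in place.
    # Returns the loss measured before the update.
    grad_center = [0.0] * len(center_vec)
    loss = 0.0
    for word, label in [(target, 1.0)] + [(w, 0.0) for w in negatives]:
        out = out_vecs[word]
        p = sigmoid(dot(center_vec, out))
        prob = p if label == 1.0 else 1.0 - p
        loss -= math.log(max(prob, 1e-12))
        g = p - label  # derivative of the loss w.r.t. the dot product
        for i in range(len(out)):
            grad_center[i] += g * out[i]
            out[i] -= lr * g * center_vec[i]
    for i in range(len(center_vec)):
        center_vec[i] -= lr * grad_center[i]
    return loss


# Toy usage: train one pair repeatedly and watch the loss.
rng = random.Random(0)
counts = {"spark": 50, "mllib": 20, "word2vec": 10, "the": 200, "of": 150}
dim = 8
in_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in counts}
out_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in counts}
negatives = draw_negatives(counts, k=3, exclude="mllib", rng=rng)
losses = [
    sgns_step(in_vecs["spark"], out_vecs, "mllib", negatives, lr=0.1)
    for _ in range(100)
]
print(losses[0], losses[-1])
```

Each step touches only the context word and the `k` sampled noise words, so its cost does not depend on the vocabulary size. Hierarchical softmax, by contrast, updates the nodes along a tree path per context word.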



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20902) Word2Vec implementations with Negative Sampling

2017-05-26 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated SPARK-20902:
---
Description: 
Spark MLlib Word2Vec currently implements only Skip-Gram with hierarchical 
softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown 
comparable or better performance with negative sampling. This umbrella JIRA 
tracks the effort to add negative-sampling-based implementations of both the 
CBOW and Skip-Gram models to Spark MLlib.

Since word2vec is largely a pre-processing step, its performance often depends 
on the application it is used for and the corpus it is estimated on. These 
implementations give users the choice of picking the one that works best for 
their use case.

  was: Spark MLlib Word2Vec currently implements only Skip-Gram with 
hierarchical softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have 
shown comparable or better performance with negative sampling. This umbrella 
JIRA tracks the effort to add negative-sampling-based implementations of both 
the CBOW and Skip-Gram models to Spark MLlib.


> Word2Vec implementations with Negative Sampling
> ---
>
> Key: SPARK-20902
> URL: https://issues.apache.org/jira/browse/SPARK-20902
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.1
> Reporter: Shubham Chopra
> Labels: ML
>
> Spark MLlib Word2Vec currently implements only Skip-Gram with hierarchical 
> softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown 
> comparable or better performance with negative sampling. This umbrella JIRA 
> tracks the effort to add negative-sampling-based implementations of both 
> the CBOW and Skip-Gram models to Spark MLlib.
> Since word2vec is largely a pre-processing step, its performance often 
> depends on the application it is used for and the corpus it is estimated 
> on. These implementations give users the choice of picking the one that 
> works best for their use case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)