GitHub user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
@Krimit
_Can you provide some information about the practical differences between
CBOW and skip-grams?_

As mentioned in [this paper](https://arxiv.org/pdf/1301.3781.pdf), the CBOW
model looks at the words around a target word and tries to predict the target
word. SkipGram does just the opposite: given a target word, it tries to predict
the context words around it. In both cases the prediction is done with a very
simple neural network with a single hidden layer.
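
To make the difference concrete, here is a minimal sketch (not the Spark code, just an illustration) of the training examples each model derives from a short sentence with a context window of 2:

```scala
// Minimal sketch of the training examples the two models derive from
// "the quick brown fox jumps" with window = 2. Not the Spark implementation.
object Word2VecPairsSketch {
  def main(args: Array[String]): Unit = {
    val sentence = Array("the", "quick", "brown", "fox", "jumps")
    val window = 2

    def contextOf(i: Int): Seq[String] =
      ((i - window) to (i + window))
        .filter(j => j != i && j >= 0 && j < sentence.length)
        .map(j => sentence(j))

    // SkipGram: the target word is used to predict every surrounding word,
    // so each (target, context word) pair is its own training example.
    val skipGramExamples = sentence.indices.flatMap { i =>
      contextOf(i).map(c => (sentence(i), c))
    }

    // CBOW: the surrounding words are combined (averaged) into a single input
    // that predicts the target word, so there is one example per position.
    val cbowExamples = sentence.indices.map { i =>
      (contextOf(i), sentence(i))
    }

    skipGramExamples.foreach(println)
    cbowExamples.foreach(println)
  }
}
```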
_Wikipedia quotes the author (I assume they mean Tomas) as saying that CBOW
is faster while skip-gram is slower but does a better job for infrequent words.
Has this been your experience as well? How pronounced is the difference?_

I found the current CBOW + negative sampling to take almost the same time as
the existing SkipGram + hierarchical softmax. The number of negative samples is
tunable, and training gets slower as the number of negative samples increases.
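
To illustrate why the cost scales with the number of negative samples, here is a rough sketch of a single negative-sampling update (again, not the PR's code; all names below are made up for illustration). Each training example does one positive and `numNegatives` negative dot-product/sigmoid updates, so the work per example grows linearly with the number of negatives:

```scala
// Rough illustration of one negative-sampling update: 1 positive word plus
// `numNegatives` sampled negative words, so per-example cost is linear in
// `numNegatives`. All names are hypothetical, for illustration only.
object NegativeSamplingSketch {
  private def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  def update(
      inputVec: Array[Double],          // CBOW: the averaged context vector
      outputVecs: Array[Array[Double]], // one output vector per vocabulary word
      targetId: Int,                    // the word we are trying to predict
      sampleNegative: () => Int,        // draws word ids from a (smoothed) unigram table
      numNegatives: Int,                // the tunable knob discussed above
      lr: Double): Unit = {
    val gradIn = new Array[Double](inputVec.length)
    // label 1.0 for the true target, 0.0 for each sampled negative
    val examples = (targetId, 1.0) +: Array.fill(numNegatives)((sampleNegative(), 0.0))
    for ((wordId, label) <- examples) {
      val out = outputVecs(wordId)
      val score = sigmoid(inputVec.zip(out).map { case (a, b) => a * b }.sum)
      val g = lr * (label - score)
      var i = 0
      while (i < inputVec.length) {
        gradIn(i) += g * out(i)    // accumulate gradient for the input vector
        out(i) += g * inputVec(i)  // update the output vector in place
        i += 1
      }
    }
    var i = 0
    while (i < inputVec.length) { inputVec(i) += gradIn(i); i += 1 }
  }
}
```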
_in what cases would a user choose one over the other? I'm basically
seconding @hhbyyh's comment on a more in-depth comparison experiment._

There is a good amount of research around this, with comparison experiments.
It appears to largely depend on the application the embeddings will be used
for. [Levy et al](http://www.aclweb.org/anthology/Q15-1016) show how the
different methods perform in extensive experiments; they used the embeddings
for word similarity, relatedness and other tests on several open datasets.
[Mikolov et
al](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
found SkipGram with negative sampling to outperform CBOW. [Baroni et
al](http://anthology.aclweb.org/P/P14/P14-1023.pdf) found that CBOW had a
slight advantage. [Levy et al](http://www.aclweb.org/anthology/Q15-1016)
explain that while CBOW did not perform as well in their experiments, others
have shown that capturing joint contexts (which CBOW does) can improve
performance on word similarity tasks, and they also found CBOW to perform well
on analogy tasks. So again, it depends on the task being performed.
[Mikolov et al](https://arxiv.org/pdf/1309.4168.pdf) recommend using
SkipGram when monolingual data is small and CBOW for larger datasets.
_The fact that the original paper has both implementations is not in itself
enough of a reason for Spark to do the same, IMO_

This is an active area of research, and both methods generate embeddings
that perform well on different tasks. Since Spark is a library providing these
implementations, I think the choice is best left to the user and the
application the embeddings are being used for.
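
Concretely, what I'd like users to be able to do is something like the following (purely illustrative: only the existing `ml.feature.Word2Vec` setters shown uncommented are real, and the commented-out ones are hypothetical placeholders for whatever parameter names this PR ends up exposing):

```scala
import org.apache.spark.ml.feature.Word2Vec

val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(100)
  .setWindowSize(5)
// .setSolver("cbow")         // hypothetical: choose CBOW instead of SkipGram
// .setNumNegativeSamples(5)  // hypothetical: negative sampling instead of hierarchical softmax
```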