GitHub user nzw0301 opened a pull request:
https://github.com/apache/spark/pull/19372
[MLLIB] Fix update equation of learning rate in Word2Vec.scala
## What changes were proposed in this pull request?
The current update equation for the learning rate is incorrect when `numIterations` > `1`.
This PR changes it to follow the [original C code](https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L393).
cc: @mengxr
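
For reviewers, here is a minimal sketch of the linear decay schedule used at the referenced line of `word2vec.c`. The object and parameter names (`LearningRateSketch`, `decayedAlpha`, `wordsProcessed`, `trainWordsCount`) are hypothetical and chosen for illustration; this is not the code in `Word2Vec.scala`. The relevant point, as I understand the C code, is that the denominator covers the words of all iterations, not just one pass.

```scala
// Hypothetical helper for illustration only; not Spark's actual implementation.
object LearningRateSketch {
  // Linearly decays the learning rate over the whole training run.
  //   startingAlpha   : initial learning rate
  //   wordsProcessed  : words seen so far, summed over all iterations
  //   trainWordsCount : number of words in one pass over the corpus
  //   numIterations   : number of passes over the corpus
  def decayedAlpha(
      startingAlpha: Double,
      wordsProcessed: Long,
      trainWordsCount: Long,
      numIterations: Int): Double = {
    // Total words over all iterations; the +1 mirrors word2vec.c.
    val totalWords = numIterations.toLong * trainWordsCount + 1
    val alpha = startingAlpha * (1.0 - wordsProcessed.toDouble / totalWords)
    // word2vec.c never lets the rate fall below 0.01% of its starting value.
    math.max(alpha, startingAlpha * 1e-4)
  }
}
```

If the denominator counted only a single pass, the rate would hit the lower bound after the first iteration and stay there, so later iterations would barely update the vectors.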
## How was this patch tested?
Manual tests.
I modified [this example code](https://spark.apache.org/docs/2.1.1/mllib-feature-extraction.html#example) and ran it with two settings of `numIterations`.
### `numIterations=1`
#### Code
```scala
import org.apache.spark.mllib.feature.Word2Vec

// Each line of the sample file becomes one sequence of tokens.
val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)

// Default settings, so numIterations is 1.
val word2vec = new Word2Vec()
val model = word2vec.fit(input)

// Print the 5 nearest neighbours of the token "1" with their cosine similarity.
val synonyms = model.findSynonyms("1", 5)
for ((synonym, cosineSimilarity) <- synonyms) {
  println(s"$synonym $cosineSimilarity")
}
```
#### Result
```
0 0.3267880082130432
2 0.21420614421367645
3 0.19923636317253113
9 0.1063166931271553
4 0.0397246889770031
```
### `numIterations=5`
#### Code
```scala
import org.apache.spark.mllib.feature.Word2Vec

// Each line of the sample file becomes one sequence of tokens.
val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)

// Same setup as above, but train for 5 iterations.
val word2vec = new Word2Vec()
word2vec.setNumIterations(5)
val model = word2vec.fit(input)

// Print the 5 nearest neighbours of the token "1" with their cosine similarity.
val synonyms = model.findSynonyms("1", 5)
for ((synonym, cosineSimilarity) <- synonyms) {
  println(s"$synonym $cosineSimilarity")
}
```
#### Result
```
2 0.9803512096405029
0 0.9774332642555237
3 0.9450059533119202
4 0.9394038319587708
9 -0.7876168489456177
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nzw0301/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19372.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19372
----
commit e2a7d393e141405f658a68f99bc4a1f53816db95
Author: Kento NOZAWA <[email protected]>
Date: 2017-09-27T17:04:03Z
Update equation of lr
----