GitHub user LowikC commented on a diff in the pull request:
https://github.com/apache/spark/pull/19372#discussion_r141607418
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -368,9 +368,9 @@ class Word2Vec extends Serializable with Logging {
var wc = wordCount
if (wordCount - lastWordCount > 10000) {
lwc = wordCount
- alpha =
- learningRate *
- (1 - numPartitions * wordCount.toDouble / (numIterations * trainWordsCount + 1))
+ alpha = learningRate *
+ (1 - numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount /
--- End diff --
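The numerator `numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount` needs to be wrapped in parentheses. As written, `/` binds tighter than `+`, so only `(k - 1) * trainWordsCount` is divided by the denominator:
`alpha = learningRate * (1 - (numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount) / (numIterations * trainWordsCount + 1))`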
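To make the difference concrete, here is a minimal standalone sketch comparing the two expressions. It is not the code from the PR: the object and method names are hypothetical, the parameter types (`Long` counts, `Int` for `k`, `numPartitions` and `numIterations`) and the sample values are assumptions, and the truncated `+` lines of the diff are completed with the denominator from the suggestion above.

```scala
// Sketch only: names mirror the diff, but types and sample values are assumptions.
object AlphaSchedule {

  // Expression as it appears in the diff, with the denominator and closing
  // parentheses supplied here (the quoted diff is cut off after the `/`).
  // Only `(k - 1) * trainWordsCount` is divided by the denominator.
  def alphaAsWritten(learningRate: Double, numPartitions: Int, wordCount: Long,
                     k: Int, trainWordsCount: Long, numIterations: Int): Double =
    learningRate *
      (1 - numPartitions * wordCount.toDouble + (k - 1) *
        trainWordsCount / (numIterations * trainWordsCount + 1))

  // Suggested form: the whole numerator is parenthesized before the division.
  def alphaParenthesized(learningRate: Double, numPartitions: Int, wordCount: Long,
                         k: Int, trainWordsCount: Long, numIterations: Int): Double =
    learningRate *
      (1 - (numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount) /
        (numIterations * trainWordsCount + 1))

  def main(args: Array[String]): Unit = {
    // Hypothetical values: second iteration, 20000 words processed so far.
    val lr = 0.025
    println(s"as written:    ${alphaAsWritten(lr, 1, 20000L, 2, 1000000L, 5)}")
    println(s"parenthesized: ${alphaParenthesized(lr, 1, 20000L, 2, 1000000L, 5)}")
  }
}
```

With these assumed values the unparenthesized form goes negative as soon as `wordCount` grows, while the parenthesized form stays between 0 and `learningRate`. Under the assumed `Long` types, the `(k - 1) * trainWordsCount / (...)` term in the first form is also an integer division.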