srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560173675

Hm, that's a crazy result; something is clearly wrong. I can't imagine why just 5 partitions would make such a difference. I don't know word2vec well, but it looks like it adds the new vectors per word instead of taking one of them arbitrarily (a la Hogwild). I might be misreading it, though. And even if it were adding them, you'd expect 5 partitions to make the result about 5x larger than usual, not 10^17.

Do you see info log output showing what alpha is? I'm curious what happens in this line:

```
alpha = learningRate *
  (1 - (numPartitions * wordCount.toDouble + numWordsProcessedInPreviousIterations) /
    totalWordsCounts)
```

and also how this might become large:

```
val g = ((1 - bcVocab.value(word).code(d) - f) * alpha).toFloat
```

It feels like some multiplier that is supposed to be in [0, 1] is becoming significantly negative, causing the values to grow out of control.
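For what it's worth, here is a minimal standalone sketch of that hypothesis. It only evaluates the quoted alpha expression with made-up counts (the values below are hypothetical, not from an actual run) to show that once `numPartitions * wordCount` plus the previously processed words overshoots `totalWordsCounts`, the expression goes negative, which would flip the sign of `g`. It deliberately ignores any lower bound Spark may apply to alpha elsewhere in the loop.

```
// Hypothetical numbers, just to illustrate the sign of the quoted expression.
object AlphaSignCheck {
  def main(args: Array[String]): Unit = {
    val learningRate = 0.025                            // assumed step size
    val totalWordsCounts = 1000000.0                    // assumed corpus word count
    val numWordsProcessedInPreviousIterations = 900000L // assumed prior progress
    val numPartitions = 5
    val wordCount = 50000L                              // per-partition progress

    // The expression quoted above: the numerator extrapolates progress by
    // numPartitions, so late in an iteration it can exceed totalWordsCounts.
    val alpha = learningRate *
      (1 - (numPartitions * wordCount.toDouble +
        numWordsProcessedInPreviousIterations) / totalWordsCounts)
    println(f"alpha = $alpha%.6f") // prints -0.003750: a negative learning rate

    // With alpha < 0, g picks up the flipped sign, so each update pushes the
    // vectors away from the optimum instead of toward it; across many
    // iterations the magnitudes could compound.
    val f = 0.9    // hypothetical sigmoid output
    val code = 0   // hypothetical Huffman code bit
    val g = ((1 - code - f) * alpha).toFloat
    println(f"g = $g%.6f")
  }
}
```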
