viirya commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560225461

> I'm kind of wondering about this line in the C code: https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L448
> I don't quite see its equivalent here. syn0 is basically used for neu1, but it's missing some normalization by cw, which is I believe 2 * windowSize + 1 - 2 * b here. That's up to a factor of about 9 if windowSize is 4. That feeds, I think, directly into the size of g as it makes the magnitude of the dot product that feeds f a lot larger.

This part is for the CBOW architecture in Word2Vec. Since we only support skip-gram, the update rule is different. Skip-gram uses each context word's syn0 vector on its own rather than summing the vectors over the window, so there is no sum that needs dividing by cw. For skip-gram training, which begins at https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L495, I think we update the weights the same way the C code does. So this doesn't look like an issue to me.
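To illustrate the CBOW vs. skip-gram distinction, here is a minimal Python sketch. It assumes hierarchical softmax and uses plain lists. The function names `cbow_hidden`, `skipgram_hidden`, `hs_gradient` and `dot` are hypothetical and only loosely mirror `neu1`, `syn0` and `syn1` in word2vec.c. The sketch uses an exact sigmoid instead of the C code's clipped `expTable` lookup. It is not Spark's implementation. It shows that dropping the division by cw scales the CBOW dot product `f` by cw, while skip-gram has no such sum to normalize.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cbow_hidden(syn0, context_ids, normalize=True):
    """CBOW hidden layer: sum the syn0 vectors of all context words,
    optionally dividing by the context-word count cw (the normalization
    the quoted comment points to in word2vec.c)."""
    dim = len(syn0[0])
    neu1 = [0.0] * dim
    for w in context_ids:
        for c in range(dim):
            neu1[c] += syn0[w][c]
    if normalize:
        cw = len(context_ids)
        neu1 = [x / cw for x in neu1]
    return neu1


def skipgram_hidden(syn0, context_id):
    """Skip-gram hidden layer: one context word's syn0 vector, used as is.
    Nothing is summed, so there is no cw to divide by."""
    return list(syn0[context_id])


def hs_gradient(hidden, syn1_row, code, alpha):
    """Hierarchical-softmax gradient for one inner node:
    g = (1 - code - sigmoid(f)) * alpha, where f = hidden . syn1_row.
    Exact sigmoid here; word2vec.c clips f and uses a lookup table."""
    f = dot(hidden, syn1_row)
    return (1 - code - 1.0 / (1.0 + math.exp(-f))) * alpha


if __name__ == "__main__":
    # Four identical context vectors, so cw = 4.
    syn0 = [[0.25, 0.5] for _ in range(4)]
    syn1_row = [1.0, 1.0]
    ctx = [0, 1, 2, 3]
    f_raw = dot(cbow_hidden(syn0, ctx, normalize=False), syn1_row)
    f_norm = dot(cbow_hidden(syn0, ctx), syn1_row)
    f_sg = dot(skipgram_hidden(syn0, 0), syn1_row)
    print(f_raw, f_norm, f_sg)  # 3.0 0.75 0.75: the raw sum is cw = 4 times larger
```

With identical context vectors, the normalized CBOW input and the single skip-gram input produce the same `f`, while the unnormalized CBOW sum is cw times larger.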
