viirya commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors 
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560225461
 
 
   > I'm kind of wondering about this line in the C code:
   https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L448
   
   > I don't quite see its equivalent here. syn0 is basically used for neu1, 
but it's missing some normalization by cw, which is I believe 2 * windowSize + 
1 - 2 * b here. That's up to a factor of about 9 if windowSize is 4. That 
feeds, I think, directly into the size of g as it makes the magnitude of the 
dot product that feeds f a lot larger.
   
   This part is for the CBOW architecture in Word2Vec. Since we support only 
Skip-gram, I think the update rule is different.
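To make the distinction concrete, here is a minimal sketch of how the hidden-layer input is formed in each architecture, as I read the reference C code. The function names `cbow_hidden` and `skipgram_input` are made up for illustration and do not appear in word2vec.c or in Spark. The point is that the division by `cw` only exists because CBOW sums several context vectors; Skip-gram feeds a single word vector forward, so there is nothing to average.

```python
def cbow_hidden(context_vectors):
    """CBOW: sum the context word vectors, then divide by their count (cw)."""
    cw = len(context_vectors)
    dim = len(context_vectors[0])
    neu1 = [0.0] * dim
    for vec in context_vectors:
        for c in range(dim):
            neu1[c] += vec[c]
    # This is the normalization step the question refers to.
    return [x / cw for x in neu1]


def skipgram_input(word_vector):
    """Skip-gram: a single word vector is the input, so no averaging applies."""
    return list(word_vector)


context = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(cbow_hidden(context))         # [3.0, 4.0]
print(skipgram_input(context[0]))   # [1.0, 2.0]
```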
   
   For Skip-gram training, which begins at 
https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L495, I think we 
update the weights in the same way. This does not look like an issue to me.
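For reference, the Skip-gram hierarchical softmax step can be sketched roughly as below. This is my reading of the C code under stated assumptions, not a copy of it or of the Spark implementation: the function name `skipgram_hs_step` is hypothetical, and the sketch computes the sigmoid exactly with `math.exp` where the C code uses a precomputed lookup table. The variable names `syn0`, `syn1`, `neu1e`, `f`, `g`, and `MAX_EXP` follow word2vec.c.

```python
import math

MAX_EXP = 6.0  # same saturation bound as in word2vec.c


def skipgram_hs_step(syn0_word, syn1_nodes, codes, alpha):
    """One Skip-gram hierarchical softmax update for a single (input word, target) pair.

    syn0_word  : input word vector, updated in place
    syn1_nodes : vectors of the inner tree nodes on the target's path, updated in place
    codes      : Huffman code bits (0 or 1) along that path
    alpha      : learning rate
    """
    dim = len(syn0_word)
    neu1e = [0.0] * dim  # accumulated gradient for the input vector
    for node, code in zip(syn1_nodes, codes):
        f = sum(a * b for a, b in zip(syn0_word, node))
        if f <= -MAX_EXP or f >= MAX_EXP:
            continue  # saturated: skip this node
        f = 1.0 / (1.0 + math.exp(-f))  # exact sigmoid instead of a lookup table
        g = (1 - code - f) * alpha
        for c in range(dim):
            neu1e[c] += g * node[c]
        for c in range(dim):
            node[c] += g * syn0_word[c]
    for c in range(dim):
        syn0_word[c] += neu1e[c]


syn0 = [0.0, 0.0]
syn1 = [[1.0, 0.0]]
skipgram_hs_step(syn0, syn1, codes=[0], alpha=0.1)
print(syn0)  # first component is approximately 0.05, second stays 0.0
```

Note that no `cw` term appears anywhere in this step: the dot product `f` uses one input vector, not a sum over the window.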
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
