srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560173675
 
 
   Hm, that's a crazy result. Something is wrong, to be sure. I can't imagine why just 5 partitions would make such a difference. I don't know word2vec well, but it looks like it adds the new vectors per word instead of taking one of them arbitrarily (à la Hogwild!). But I might be misreading it. And even if it were adding them, you'd expect 5 partitions to make the result roughly 5x larger than usual, not 10^17.
   
   Do you see info log output showing what alpha is? I'm curious about what 
happens in the line:
   ```scala
   alpha = learningRate *
     (1 - (numPartitions * wordCount.toDouble + numWordsProcessedInPreviousIterations) /
       totalWordsCounts)
   ```
   
   and also how this might become large:
   
   ```scala
   val g = ((1 - bcVocab.value(word).code(d) - f) * alpha).toFloat
   ```
   
   It kind of feels like some multiplier that's supposed to be in [0, 1] is becoming significantly negative, and that makes the vectors grow out of control. The multiplier above goes negative as soon as numPartitions * wordCount + numWordsProcessedInPreviousIterations exceeds totalWordsCounts.
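   
   To make that concrete, here's a minimal, self-contained sketch of the alpha update quoted above. The constants (learningRate, totalWordsCounts, the word counts) are made up for illustration, not values from this PR:
   
   ```scala
   object AlphaSketch extends App {
     val learningRate = 0.025   // mllib Word2Vec's default step size
     val numPartitions = 5
     val totalWordsCounts = 1e6 // hypothetical total across all iterations
   
     // The update from the quoted line: learningRate * (1 - processed / total).
     def alpha(wordCount: Long, prevWords: Long): Double =
       learningRate *
         (1 - (numPartitions * wordCount.toDouble + prevWords) / totalWordsCounts)
   
     println(alpha(100000L, 0L)) //  0.0125  -> multiplier still in [0, 1]
     println(alpha(250000L, 0L)) // -0.00625 -> estimate overshoots the total; alpha < 0
   }
   ```
   
   And since code(d) is 0 or 1 and f is a sigmoid output in (0, 1), (1 - code - f) stays in (-1, 1), so |g| <= |alpha|. If that's right, a negative alpha can't make any single g large by itself, but it flips the sign of every update, which could compound across iterations.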
