srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-562385030
I'm OK with putting it in 2.4, I think. It's a minor behavior change, but,
also appears to be
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-561703811
PS: does this also solve your problem? This change sounds OK to me.
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560431648
Ah right, disregard my previous comment. Am I right that the original
implementation, being
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560191791
Hm, that value isn't negative though, just very small. The next line,
perhaps accidentally,
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560173675
Hm, that's a crazy result. Something is wrong, to be sure. I can't imagine
why just 5
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560125533
BTW, what are the exponents on these figures? Can you print more, or print
their magnitude?
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560018694
I see, so are you saying the weights are effectively N times larger with N
partitions than 1?
srowen commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560012547
Hm, I don't think it's only cosine similarity that matters; these are often
used in general in