GitHub user ezli opened a pull request:

    https://github.com/apache/spark/pull/6245

    [MLLIB][SPARK-7615, SPARK-7617, SPARK-7618]: Avoid division by zero norm,
    cache normalized word vectors

    1. [SPARK-7615] Commit 8c23ddd: Fix division of wordVectors by a zero norm.
       Add a ScalaTest for the divide-by-zero scenario.
    2. [SPARK-7617] Commit a941a3d: Normalize fVector in findSynonyms() to make
       cosine distances comparable across all words.
    3. [SPARK-7618] Commit 8642ff2: Cache the normalized wordVectors to speed up
       repeated findSynonyms() calls; load wordVectors and wordVectorsNormalized
       lazily.
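The three changes above can be illustrated together with a small Scala sketch. It is a hypothetical, self-contained stand-in: `SimpleWordVectors` and its helpers are invented names, not Spark's `Word2VecModel` API and not the code in this PR. Vectors are divided by their norm only when the norm is non-zero, the normalized vectors are built lazily and cached, and the query vector in `findSynonyms` is normalized as well.

```scala
// Hypothetical sketch, NOT Spark's Word2VecModel: a tiny word-vector store
// illustrating the ideas behind SPARK-7615, SPARK-7617 and SPARK-7618.
class SimpleWordVectors(model: Map[String, Array[Float]]) {

  // Fixed vocabulary order: row i of the matrices below belongs to words(i).
  private val words: Array[String] = model.keys.toArray

  // Raw vectors, materialized lazily on first use (SPARK-7618 idea).
  private lazy val wordVectors: Array[Array[Float]] = words.map(model)

  // Euclidean (L2) norm, accumulated in Double.
  private def norm(v: Array[Float]): Double =
    math.sqrt(v.map(x => x.toDouble * x).sum)

  // Divide by the norm only when it is non-zero, so an all-zero vector
  // stays zero instead of turning into NaN (SPARK-7615 idea).
  private def normalize(v: Array[Float]): Array[Double] = {
    val n = norm(v)
    if (n == 0.0) Array.fill(v.length)(0.0) else v.map(_ / n)
  }

  // Normalized copies of every vector, computed once and then reused
  // by every later findSynonyms call (SPARK-7618 idea).
  private lazy val wordVectorsNormalized: Array[Array[Double]] =
    wordVectors.map(normalize)

  private def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Returns the `num` most similar other words with their cosine similarity.
  // The query is normalized too (SPARK-7617 idea), so every score is a true
  // cosine in [-1, 1] regardless of which word was queried.
  def findSynonyms(word: String, num: Int): Array[(String, Double)] = {
    val query = normalize(model.getOrElse(word,
      throw new IllegalArgumentException(s"$word not in vocabulary")))
    words.indices
      .filter(i => words(i) != word)
      .map(i => (words(i), dot(query, wordVectorsNormalized(i))))
      .sortBy(-_._2)
      .take(num)
      .toArray
  }
}
```

Scaling the query by a positive constant does not change the ranking for a single query. Normalizing it matters because the returned scores become true cosine similarities, so scores obtained for different query words can be compared with each other. Using `lazy val` defers building the normalized matrix until the first `findSynonyms` call and reuses it afterward.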

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ezli/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6245
    
----
commit 8c23dddd9ca7d0f767457a79ad3ab8f93655b257
Author: Eric Li <[email protected]>
Date:   2015-05-18T17:33:33Z

    Fix wordVectors divided by norm = 0

commit a941a3d8e1434cbf694754059e6aec9da88a3fd0
Author: Eric Li <[email protected]>
Date:   2015-05-18T17:36:25Z

    SPARK-7617: normalize fVector

commit 8642ff26e69ea341ac319c5540e0964f9324e97a
Author: Eric Li <[email protected]>
Date:   2015-05-18T20:41:45Z

    SPARK-7618: Cache normalized wordVectors; Lazy loading wordVectors and 
wordVectorsNormalized.

commit 95db8b70efae30b2dab138f2c277b9dacc8a87f9
Author: Eric Li <[email protected]>
Date:   2015-05-18T20:53:56Z

    Merge branch 'master' into SPARK-7615/divide-by-zero-norm

----

