William Benton created SPARK-17595:
--------------------------------------

             Summary: Inefficient selection in Word2VecModel.findSynonyms
                 Key: SPARK-17595
                 URL: https://issues.apache.org/jira/browse/SPARK-17595
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 2.0.0
            Reporter: William Benton
            Priority: Minor


To choose the vocabulary elements most similar to the query vector, 
`Word2VecModel.findSynonyms` currently sorts the similarities for every 
vocabulary element.  This makes multiple copies of the collection of 
similarities and performs a (relatively) expensive sort.  It would be more 
efficient to find the best matches by maintaining a bounded priority queue 
and populating it in a single pass over the vocabulary.
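
The proposed selection could look roughly like the sketch below. It is 
written in plain Python for brevity and is not the Spark code. The function 
names `cosine_similarity` and `top_k_synonyms` and the dict-based vocabulary 
are hypothetical stand-ins for illustration. It assumes cosine similarity as 
the score and keeps the `k` best candidates in a min-heap, so the weakest 
retained match is always at the root and can be evicted cheaply.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_synonyms(
    vocab: Dict[str, Sequence[float]],  # hypothetical word -> vector mapping
    query: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k words most similar to `query`, best match first."""
    if k <= 0:
        return []
    # Min-heap of (similarity, word); never grows beyond k entries.
    heap: List[Tuple[float, str]] = []
    for word, vector in vocab.items():  # single pass over the vocabulary
        sim = cosine_similarity(vector, query)
        if len(heap) < k:
            heapq.heappush(heap, (sim, word))
        elif sim > heap[0][0]:
            # Better than the weakest retained match: replace the root.
            heapq.heapreplace(heap, (sim, word))
    # Only the k survivors are sorted, not the whole vocabulary.
    return [(word, sim) for sim, word in sorted(heap, reverse=True)]
```

For a vocabulary of n words this does O(n log k) comparison work and holds 
only k candidates at a time, whereas sorting every similarity costs 
O(n log n) and materializes the full collection.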



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

