[ https://issues.apache.org/jira/browse/SPARK-17595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-17595.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 15150
[https://github.com/apache/spark/pull/15150]

> Inefficient selection in Word2VecModel.findSynonyms
> ---------------------------------------------------
>
>                 Key: SPARK-17595
>                 URL: https://issues.apache.org/jira/browse/SPARK-17595
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: William Benton
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> The code in `Word2VecModel.findSynonyms` to choose the vocabulary elements
> with the highest similarity to the query vector currently sorts the
> similarities for every vocabulary element. This involves making multiple
> copies of the collection of similarities while doing a (relatively) expensive
> sort. It would be more efficient to find the best matches by maintaining a
> bounded priority queue and populating it with a single pass over the
> vocabulary.
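
The single-pass selection proposed in the description can be sketched as follows. This is an illustrative Python sketch only, not the code from pull request 15150: the function names and the (word, similarity) pairs are hypothetical, and Python's heapq stands in for whatever bounded priority queue the actual fix uses. It contrasts sorting every similarity with keeping only the k best seen so far in a min-heap.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_sort(similarities: Iterable[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Baseline: sort every (word, similarity) pair, then keep the first k."""
    return sorted(similarities, key=lambda pair: pair[1], reverse=True)[:k]


def top_k_bounded_heap(similarities: Iterable[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Single pass with a min-heap that never holds more than k entries."""
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # smallest retained similarity sits at heap[0]
    for word, score in similarities:
        if len(heap) < k:
            heapq.heappush(heap, (score, word))
        elif score > heap[0][0]:
            # The new score beats the weakest retained match, so swap it in.
            heapq.heapreplace(heap, (score, word))
    # Only the k survivors are sorted, best match first.
    return [(word, score) for score, word in sorted(heap, reverse=True)]


if __name__ == "__main__":
    # Hypothetical similarities for a five-word vocabulary.
    sims = [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.7), ("e", 0.3)]
    print(top_k_bounded_heap(sims, 2))
```

For a vocabulary of n words, the heap version does O(n log k) work and holds at most k entries at a time, whereas the full sort does O(n log n) work and materializes a sorted copy of all n similarities.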