[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

willb Fri, 16 Sep 2016 05:21:08 -0700

Github user willb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15105#discussion_r79157601
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
    @@ -227,7 +227,7 @@ class Word2VecModel private[ml] (
        */
       @Since("1.5.0")
       def findSynonyms(word: String, num: Int): DataFrame = {
    -    findSynonyms(wordVectors.transform(word), num)
    +    findSynonyms(wordVectors.transform(word), num, Some(word))
    --- End diff --
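    The diff routes the single-word overload through a three-argument `findSynonyms` that also receives the query word. The following is a minimal sketch of why that extra argument helps, assuming `wordOpt` is used only to drop the query word from its own results. `ToyWord2Vec`, its in-memory vector map, and the cosine-similarity ranking are hypothetical stand-ins for illustration, not Spark's `Word2VecModel`.

```scala
// Hypothetical toy model (not Spark's Word2VecModel): illustrates how an
// optional query word can be excluded from its own synonym results.
class ToyWord2Vec(vectors: Map[String, Array[Double]]) {

  // Cosine similarity between two equal-length vectors.
  private def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  def transform(word: String): Array[Double] = vectors(word)

  // Three-argument overload: rank every vocabulary word by similarity to `vec`,
  // dropping only the word named by `wordOpt` (if any).
  def findSynonyms(vec: Array[Double], num: Int, wordOpt: Option[String]): Seq[(String, Double)] =
    vectors.toSeq
      .filterNot { case (w, _) => wordOpt.contains(w) }
      .map { case (w, v) => (w, cosine(vec, v)) }
      .sortBy(-_._2)
      .take(num)

  // Word query: look up the vector, but pass the word along so it is excluded.
  def findSynonyms(word: String, num: Int): Seq[(String, Double)] =
    findSynonyms(transform(word), num, Some(word))

  // Vector query: there is no query word to exclude.
  def findSynonyms(vec: Array[Double], num: Int): Seq[(String, Double)] =
    findSynonyms(vec, num, None)
}
```

    Because the sketch excludes by name rather than by, say, discarding the top-ranked result, a vector query keeps its best match. With the toy vectors `cat = [1, 0]` and `kitten = [0.9, 0.1]`, querying by the vector `[1, 0]` returns `cat` first, while querying by the word `cat` returns `kitten` first.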
    
    In this case (and similarly in [`Word2VecModelWrapper`](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/Word2VecModelWrapper.scala)),
I opted to call the three-argument version because both wrappers explicitly
convert their argument to a vector before calling `findSynonyms` on the
underlying model. If they called the two-argument vector overload instead,
`wordOpt` would not be defined even when the wrapper was invoked with a word.
If we were to make the three-argument `findSynonyms` private, we wouldn't be
able to share a code path in the wrapper classes. We would need to either
duplicate the code that tidies and reformats results in both methods (DataFrame
creation in this case; unzipping and `asJava` in the Python model wrapper) or
factor it out into a separate method. Let me know how you want me to proceed
here.
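    The trade-off described above can be sketched as follows, assuming the model's three-argument overload is visible to the wrapper. `SynonymModel`, `DotProductModel`, and `ToyModelWrapper` are hypothetical names. The dot-product scoring is a simplification, and the `unzip` step stands in for the result tidying the real wrappers do (DataFrame creation, or unzipping plus `asJava`).

```scala
// Hypothetical minimal model interface: the three-argument overload is
// public so that wrappers can call it directly.
trait SynonymModel {
  def transform(word: String): Array[Double]
  def findSynonyms(vec: Array[Double], num: Int, wordOpt: Option[String]): Seq[(String, Double)]
}

// Hypothetical stub model that scores candidates by raw dot product for brevity.
class DotProductModel(vectors: Map[String, Array[Double]]) extends SynonymModel {
  def transform(word: String): Array[Double] = vectors(word)

  def findSynonyms(vec: Array[Double], num: Int, wordOpt: Option[String]): Seq[(String, Double)] =
    vectors.toSeq
      .filterNot { case (w, _) => wordOpt.contains(w) }
      .map { case (w, v) => (w, vec.zip(v).map { case (x, y) => x * y }.sum) }
      .sortBy(-_._2)
      .take(num)
}

// Hypothetical wrapper in the spirit of Word2VecModelWrapper: both public
// entry points work on a vector and share one private code path.
class ToyModelWrapper(model: SynonymModel) {

  // Word entry point: convert to a vector, but keep the word for exclusion.
  def findSynonyms(word: String, num: Int): (Seq[String], Seq[Double]) =
    findSynonyms(model.transform(word), num, Some(word))

  // Vector entry point: nothing to exclude.
  def findSynonyms(vec: Array[Double], num: Int): (Seq[String], Seq[Double]) =
    findSynonyms(vec, num, None)

  // The single shared path: call the model's three-argument overload and
  // tidy the result once (here, unzip into parallel word/similarity lists).
  private def findSynonyms(vec: Array[Double], num: Int, wordOpt: Option[String]): (Seq[String], Seq[Double]) =
    model.findSynonyms(vec, num, wordOpt).unzip
}
```

    Both public entry points funnel into one private method, so the tidy-up logic lives in one place. If the model's three-argument overload were private, the word entry point would instead have to call a two-argument model method and repeat (or separately factor out) the `unzip` step.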
    
    I agree that updating the docs makes sense and will make it clearer to 
future maintainers as well.



