GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/5596

    [ML][SPARK-6529] Add Word2Vec transformer

    See JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-6529).
    
    There are some notes:
    
    1. I added `learningRate` to sharedParams since it is a common parameter for
ML algorithms.
    2. This PR does not support finding synonyms of a `Vector` via transform;
that will be added in a follow-up JIRA issue.
    3. Word2Vec differs from other ML models in that its training set and its
transformed set are different. The training set is an `RDD[Iterable[String]]`
representing documents, whereas the set we want to transform is an `RDD[String]`
of unique words. You therefore have to switch `inputCol` between these two
stages.
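    Note 1 is about declaring `learningRate` once and reusing it across
algorithms. A minimal, Spark-free sketch of that mixin idea follows; it is not
Spark's `Params` API, and `HasLearningRate`, `Word2VecLike`,
`SgdRegressionLike`, and the 0.025 default are hypothetical names and values
chosen for illustration.

```scala
// Hypothetical, Spark-free sketch of a "shared param" (not Spark's Params API):
// learningRate is declared once in a trait and reused by unrelated algorithms.
trait HasLearningRate {
  // Default value chosen for illustration only.
  private var learningRateValue: Double = 0.025

  def getLearningRate: Double = learningRateValue

  // Returning this.type keeps chained setters typed as the concrete class.
  def setLearningRate(value: Double): this.type = {
    require(value > 0.0, s"learningRate must be positive, got $value")
    learningRateValue = value
    this
  }
}

// Two hypothetical estimators sharing the same parameter definition.
class Word2VecLike extends HasLearningRate
class SgdRegressionLike extends HasLearningRate
```

Because the setter returns `this.type`, `new Word2VecLike().setLearningRate(0.05)`
is still typed as a `Word2VecLike`, so further estimator-specific setters can be
chained after it.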
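    To make note 3 concrete, here is a minimal Spark-free sketch of the two
stages, assuming rows are plain `Map[String, Any]` values standing in for
DataFrame rows. Fitting reads documents (`Iterable[String]`) from one column,
while transforming reads single words (`String`) from another. Everything here
is hypothetical: `Word2VecColumnsSketch`, `fit`, `Model`, and the
`doc`/`word`/`vector` column names are not from the PR, and the neighbour
co-occurrence counts are a simple count-based stand-in for Word2Vec's actual
skip-gram training.

```scala
// Spark-free sketch of note 3, NOT the PR's code. Rows are plain Map[String, Any]
// standing in for DataFrame rows; column names "doc", "word", "vector" are assumptions.
// "Training" counts immediate-neighbour co-occurrences, a count-based stand-in
// for Word2Vec's skip-gram optimization.
object Word2VecColumnsSketch {
  type Row = Map[String, Any]

  // Hypothetical fitted model: one vector per vocabulary word.
  final case class Model(
      vocab: IndexedSeq[String],
      vectors: Map[String, Array[Double]],
      inputCol: String) {

    def setInputCol(col: String): Model = copy(inputCol = col)

    // Transform stage: each row holds a single word (String), not a document.
    // Out-of-vocabulary words map to an all-zero vector.
    def transform(rows: Seq[Row]): Seq[Row] =
      rows.map { row =>
        val word = row(inputCol).asInstanceOf[String]
        row + ("vector" -> vectors.getOrElse(word, Array.fill(vocab.size)(0.0)))
      }
  }

  // Training stage: each row holds a document as an Iterable[String].
  def fit(rows: Seq[Row], inputCol: String): Model = {
    val docs = rows.map(r => r(inputCol).asInstanceOf[Iterable[String]].toIndexedSeq)
    val vocab = docs.flatten.distinct.sorted.toIndexedSeq
    val index = vocab.zipWithIndex.toMap
    val vectors = vocab.map(w => w -> Array.fill(vocab.size)(0.0)).toMap
    // For every token, count its left and right neighbours in the same document.
    for {
      doc <- docs
      i <- doc.indices
      j <- Seq(i - 1, i + 1)
      if j >= 0 && j < doc.size
    } {
      val vec = vectors(doc(i))
      vec(index(doc(j))) += 1.0
    }
    Model(vocab, vectors, inputCol)
  }
}
```

Calling `transform` on the word rows without first switching `inputCol` from
`doc` to `word` would fail here, because those rows have no `doc` column; that
is the pitfall note 3 describes.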

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-6529

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5596
    
----
commit 6a514f16fd12f7b7dbf9fe33e442b33958d1cd20
Author: Xusen Yin <[email protected]>
Date:   2015-04-19T09:22:33Z

    add word2vec transformer

commit 02767fb59d2b3583000a15d3a337b6a41c6be71f
Author: Xusen Yin <[email protected]>
Date:   2015-04-20T04:43:48Z

    add shared params

commit fe3afe99214f72517a3a695063dc710110f8dd31
Author: Xusen Yin <[email protected]>
Date:   2015-04-20T06:53:29Z

    add test suite and pass it

commit e29680a091806bcb3ee6c9b8a44e407b4bd040fa
Author: Xusen Yin <[email protected]>
Date:   2015-04-20T15:34:09Z

    fix errors

commit 618abd0cc3727896448c227ccccac351a0e592a6
Author: Xusen Yin <[email protected]>
Date:   2015-04-20T15:57:37Z

    refine comments

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
