[GitHub] spark pull request: [SPARK-7586][ML][doc] Add docs of Word2Vec in ...

jkbradley Fri, 15 May 2015 19:40:54 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6181#discussion_r30457642
  
    --- Diff: docs/ml-features.md ---
    @@ -106,6 +106,84 @@ for features_label in featurized.select("features", "label").take(3):
     </div>
     </div>
     
    +## Word2Vec
    +
    +`Word2Vec` is an `Estimator` which takes sequences of words that represent documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)`, which maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector using the average of the vectors of all words in the document; this vector can then be used for further computations on documents, such as similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
    +
    +Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. We transform each document into a feature vector, which could then be passed to a learning algorithm.
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +import org.apache.spark.ml.feature.Word2Vec
    +
    +val documentDF = sqlContext.createDataFrame(Seq(
    --- End diff --
    
    Add a comment on the line above:
    ```
    Input data: Each row is a bag of words from a sentence or document.
    ```
    (Please add to other examples too.)
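    The averaging step that the diff attributes to `Word2VecModel` can be illustrated without Spark. This is a minimal sketch, assuming a plain `Map[String, Array[Double]]` stands in for the trained model's word-to-vector mapping. `AverageWordVectors`, `documentVector`, and the toy vector values are hypothetical, and skipping unknown words while dividing by the full document length is a choice made for this sketch, not something stated in the diff.

    ```scala
    // Hypothetical sketch: a plain Map stands in for a trained model's
    // word-to-vector mapping; no Spark classes are used.
    object AverageWordVectors {
      // Toy 3-dimensional "word vectors" (made-up values for illustration only).
      val wordVectors: Map[String, Array[Double]] = Map(
        "spark" -> Array(1.0, 0.0, 2.0),
        "is"    -> Array(0.0, 1.0, 0.0),
        "fast"  -> Array(2.0, 2.0, 1.0)
      )

      /** Averages word vectors over a document (a bag of words).
        * Words missing from `wordVectors` add nothing to the sum (assumption of
        * this sketch); the sum is divided by the document length.
        * An empty document yields a zero vector of size `dim`. */
      def documentVector(doc: Seq[String], dim: Int): Array[Double] = {
        val sum = new Array[Double](dim)
        // Accumulate each known word's vector component by component.
        for (word <- doc; v <- wordVectors.get(word); i <- 0 until dim)
          sum(i) += v(i)
        if (doc.isEmpty) sum else sum.map(_ / doc.size)
      }

      def main(args: Array[String]): Unit = {
        // Input data: each row is a bag of words from a sentence or document.
        val doc = Seq("spark", "is", "fast")
        println(documentVector(doc, 3).mkString("[", ", ", "]"))
      }
    }
    ```

    Running `main` prints `[1.0, 1.0, 1.0]`: the three toy vectors sum to `(3.0, 3.0, 3.0)`, which is then divided by the document length of 3. Every document maps to a vector of the same size regardless of how many words it contains, which is what lets the result be fed to a downstream learning algorithm.
    
    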



