[GitHub] spark pull request: [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML g...

feynmanliang Tue, 15 Sep 2015 13:22:37 -0700

Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8752#discussion_r39559560
  
    --- Diff: docs/ml-features.md ---
    @@ -123,12 +123,21 @@ for features_label in rescaledData.select("features", "label").take(3):
     
     ## Word2Vec
     
    -`Word2Vec` is an `Estimator` which takes sequences of words that represents documents and trains a `Word2VecModel`. The model is a `Map(String, Vector)` essentially, which maps each word to an unique fix-sized vector. The `Word2VecModel` transforms each documents into a vector using the average of all words in the document, which aims to other computations of documents such as similarity calculation consequencely. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
    +`Word2Vec` is an `Estimator` which takes sequences of words representing documents and trains a
    +`Word2VecModel`. The model maps each word to a unique fixed-size vector. The `Word2VecModel`
    +transforms each document into a vector using the average of all words in the document; this vector
    +can then be used for as features for prediction, document similarity calculations, etc.
    +Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more
    +details.
     
    -Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of them is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
    +In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
     
     <div class="codetabs">
     <div data-lang="scala" markdown="1">
    +
    +Refer to the [Word2Vec Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec)
    --- End diff --
    
    The class name is `backticked` in ChiSqSelector but not here or in
    Binarizer; we should choose one style and be consistent. I would vote for
    backticking everything, since that's what I've been doing.



