GitHub user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8752#discussion_r39559560
--- Diff: docs/ml-features.md ---
@@ -123,12 +123,21 @@ for features_label in rescaledData.select("features",
"label").take(3):
## Word2Vec
-`Word2Vec` is an `Estimator` which takes sequences of words that
represents documents and trains a `Word2VecModel`. The model is a `Map(String,
Vector)` essentially, which maps each word to an unique fix-sized vector. The
`Word2VecModel` transforms each documents into a vector using the average of
all words in the document, which aims to other computations of documents such
as similarity calculation consequencely. Please refer to the [MLlib user guide
on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on
Word2Vec.
+`Word2Vec` is an `Estimator` which takes sequences of words representing
documents and trains a
+`Word2VecModel`. The model maps each word to a unique fixed-size vector.
The `Word2VecModel`
+transforms each document into a vector using the average of all words in
the document; this vector
+can then be used for as features for prediction, document similarity
calculations, etc.
+Please refer to the [MLlib user guide on
Word2Vec](mllib-feature-extraction.html#Word2Vec) for more
+details.
-Word2Vec is implemented in
[Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the
following code segment, we start with a set of documents, each of them is
represented as a sequence of words. For each document, we transform it into a
feature vector. This feature vector could then be passed to a learning
algorithm.
+In the following code segment, we start with a set of documents, each of
which is represented as a sequence of words. For each document, we transform it
into a feature vector. This feature vector could then be passed to a learning
algorithm.
<div class="codetabs">
<div data-lang="scala" markdown="1">
+
+Refer to the [Word2Vec Scala
docs](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec)
--- End diff --
The class name is \`backticked\` in ChiSqSelector but not here or in
Binarizer. We should choose one style and be consistent. I would vote for
backticking everything, since that's what I've been doing.
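
For context, the averaging step that the revised `Word2Vec` paragraph describes
can be sketched roughly as below. This is a plain-Python illustration, not
Spark's implementation: `document_vector` and `word_vectors` are hypothetical
names, and skipping words that have no vector is an assumption made here for
simplicity.

```python
from typing import Dict, List, Sequence


def document_vector(
    words: Sequence[str],
    word_vectors: Dict[str, List[float]],
    size: int,
) -> List[float]:
    """Average the vectors of the words in one document.

    Assumption: words without an entry in `word_vectors` are skipped,
    and a document with no known words maps to the zero vector.
    """
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return [0.0] * size
    # zip(*known) walks the vectors one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical two-dimensional word vectors, for illustration only.
word_vectors = {"spark": [1.0, 0.0], "ml": [0.0, 1.0]}
vec = document_vector(["spark", "ml", "unknown"], word_vectors, size=2)
print(vec)
```

The resulting fixed-size vector is what would then be passed on as features
for prediction or a document similarity calculation.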