GitHub user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20257#discussion_r162040939
--- Diff: docs/ml-features.md ---
@@ -783,11 +783,11 @@ Because this existing `OneHotEncoder` is a stateless
transformer, it is not usab
## OneHotEncoderEstimator
-[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column of
label indices to a column of binary vectors, and each output binary vector
includes at most a single one-value. This encoding allows algorithms which
expect continuous features, such as Logistic Regression, to use categorical
features. For string type input data, it is common to encode categorical
features using [StringIndexer](ml-features.html#stringindexer) first.
+[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a
categorical feature, represented as a label index, to a binary vector with at
most a single one-value indicating the presence of a specific feature value
from among the set of all feature values.
--- End diff --
@viirya sorry for any confusion, but I didn't intend for you to remove these
sentences:
```This encoding allows algorithms which expect continuous features, such
as Logistic Regression, to use categorical features. For string type input
data, it is common to encode categorical features using
[StringIndexer](ml-features.html#stringindexer) first.```
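
For context on why those two sentences are worth keeping, here is a rough plain-Python sketch of the index-then-encode flow they describe. It is not Spark's API: `string_index` and `one_hot` are hypothetical helpers, the frequency-descending ordering with alphabetical tie-breaking is an assumption about how the indexing step orders labels, and `drop_last` is an assumed option that leaves the last category as the all-zeros vector (hence "at most a single one-value").

```python
from collections import Counter


def string_index(values):
    """Hypothetical helper: map each distinct string to a label index.

    Assumption: the most frequent label gets index 0, and ties are
    broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: idx for idx, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Hypothetical helper: turn a label index into a binary vector.

    With drop_last=True the vector has num_categories - 1 slots, so the
    last category is encoded as all zeros.
    """
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


# String input is indexed first, then each index is one-hot encoded.
colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = string_index(colors)
encoded = [one_hot(mapping[c], len(mapping)) for c in colors]
```

The resulting vectors are numeric, which is what lets an algorithm that expects continuous features, such as Logistic Regression, consume a categorical column.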