GitHub user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20257#discussion_r161722104
--- Diff: docs/ml-features.md ---
@@ -775,35 +775,43 @@ for more details on the API.
</div>
</div>
-## OneHotEncoder
+## OneHotEncoder (Deprecated since 2.3.0)
-[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column of
label indices to a column of binary vectors, with at most a single one-value.
This encoding allows algorithms which expect continuous features, such as
Logistic Regression, to use categorical features.
+Because this existing `OneHotEncoder` is a stateless transformer, it is
not usable on new data where the number of categories may differ from the
training data. In order to fix this, a new `OneHotEncoderEstimator` was created
that produces an `OneHotEncoderModel` when fitting. For more detail, please see
the JIRA ticket (https://issues.apache.org/jira/browse/SPARK-13030).
--- End diff --
Change the bare JIRA URL into a Markdown link, e.g.
"see `[SPARK-13030](...)`"
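For illustration, the last sentence of the added paragraph might then read roughly as below. This is only a sketch: the URL is the one already present in the diff, while the exact wording and line wrapping are assumptions, not the text that was eventually merged.

```markdown
For more detail, please see
[SPARK-13030](https://issues.apache.org/jira/browse/SPARK-13030).
```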
---