Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20257#discussion_r161472191
--- Diff: docs/ml-features.md ---
@@ -775,7 +775,9 @@ for more details on the API.
</div>
</div>
-## OneHotEncoder
+## OneHotEncoder (Deprecated since 2.3.0)
--- End diff --
I think we should add a little more detail about why it's deprecated.
The existing `OneHotEncoder` is a stateless transformer, so it is not usable on
new data where the number of categories may differ from the training data. To
fix this, a new `OneHotEncoderEstimator` was created that produces a
`OneHotEncoderModel` when fit. We should also add a link to the JIRA ticket for
more detail (https://issues.apache.org/jira/browse/SPARK-13030).
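
To make the problem concrete for the docs, here is a toy sketch in plain Python. It is not Spark's implementation: `stateless_one_hot` and `FittedOneHot` are hypothetical names, and the sketch assumes the stateless version infers the vector size from whatever data it is handed. It only illustrates why learning the category count once, at fit time, keeps the output width stable.

```python
# Toy illustration only: plain Python, not Spark's implementation.
# All names here are hypothetical and chosen for this sketch.


def stateless_one_hot(indices):
    """Encode category indices, inferring the vector size from the input itself."""
    size = max(indices) + 1
    return [[1.0 if i == idx else 0.0 for i in range(size)] for idx in indices]


class FittedOneHot:
    """Stand-in for an estimator/model pair: the size is learned once in fit()."""

    def fit(self, indices):
        # Category count comes from the training data only.
        self.size = max(indices) + 1
        return self

    def transform(self, indices):
        for idx in indices:
            if idx >= self.size:
                raise ValueError(f"unseen category index {idx}")
        return [
            [1.0 if i == idx else 0.0 for i in range(self.size)]
            for idx in indices
        ]


train = [0, 1, 2]  # three categories seen during training
new = [0, 1]       # new data happens to contain only two of them

# Stateless encoding: vector width depends on the data being transformed.
train_width = len(stateless_one_hot(train)[0])
new_width = len(stateless_one_hot(new)[0])

# Fitted encoding: vector width is fixed by the training data.
model = FittedOneHot().fit(train)
fitted_width = len(model.transform(new)[0])
```

With the stateless function the training and new vectors end up with different widths, so a downstream model trained on one cannot consume the other; the fitted version keeps the training width.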