Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20257#discussion_r161740612
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java
---
@@ -35,41 +34,37 @@
import org.apache.spark.sql.types.StructType;
// $example off$
-public class JavaOneHotEncoderExample {
+public class JavaOneHotEncoderEstimatorExample {
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
- .appName("JavaOneHotEncoderExample")
+ .appName("JavaOneHotEncoderEstimatorExample")
.getOrCreate();
// $example on$
+ // Notice: this categorical features are usually encoded with
`StringIndexer`.
--- End diff --
Perhaps we can move the note above the `$example on$` marker. I don't think it
needs to appear in the user guide, as we've already mentioned it above.
Also, perhaps reword it as: `Note: categorical features are usually first encoded
with StringIndexer`
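
To illustrate why the placement matters, here is a minimal sketch that assumes the docs build copies into the user guide only the lines between the `$example on$` and `$example off$` markers. The class `ExampleMarkerSketch` and the helper `extractExample` are hypothetical names used for illustration; they are not part of Spark, and the real docs tooling may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExampleMarkerSketch {

    // Hypothetical helper (not a Spark API): keeps only the lines that sit
    // between an "$example on$" marker and the next "$example off$" marker.
    static List<String> extractExample(List<String> sourceLines) {
        List<String> kept = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            if (line.contains("$example on$")) {
                inside = true;   // start copying after this marker
                continue;
            }
            if (line.contains("$example off$")) {
                inside = false;  // stop copying at this marker
                continue;
            }
            if (inside) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The note sits above the marker, so under the stated assumption
        // it stays in the source file but is left out of the extracted snippet.
        List<String> source = Arrays.asList(
            "// Note: categorical features are usually first encoded with StringIndexer",
            "// $example on$",
            "// (encoder code shown in the user guide)",
            "// $example off$",
            "spark.stop();");
        for (String line : extractExample(source)) {
            System.out.println(line);
        }
    }
}
```

With the note above the marker, readers of the source file still see it, while the extracted snippet contains only the encoder code.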
---