GitHub user Lewuathe commented on a diff in the pull request:
https://github.com/apache/spark/pull/8205#discussion_r37291358
--- Diff: docs/ml-features.md ---
@@ -654,7 +654,11 @@ for expanded in polyDF.select("polyFeatures").take(3):
`StringIndexer` encodes a string column of labels to a column of label
indices.
The indices are in `[0, numLabels)`, ordered by label frequencies.
So the most frequent label gets index `0`.
-If the input column is numeric, we cast it to string and index the string
values.
+If the input column is numeric, we cast it to string and index the string
+values. When following pipeline components such as `Estimator` or
--- End diff --
Thank you for the comment, and sorry for the trouble.
This may be nit-picky, but it is a little difficult to infer that the
output of each estimator is passed as input to the next estimator.
This doesn't seem to be described in `ml-guide.md`, though the
[Details](https://spark.apache.org/docs/latest/ml-guide.html#details) section
may touch on it. That is why I wrote this sentence: to make the behavior
explicit.
If there is a better place for this sentence, we can move it there.
Either way, I'll update the PR based on your feedback.
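
To illustrate the chaining I have in mind, here is a minimal sketch in plain Python. It is not Spark's API or implementation: `ToyStringIndexer`, `ToyOneHot`, `run_pipeline`, and the column names are hypothetical, and the alphabetical tie-breaking is my own assumption. It only shows frequency-ordered indexing (after casting values to string) and how the output column of one stage becomes the input column of the next.

```python
from collections import Counter


class ToyStringIndexer:
    """Hypothetical stand-in for StringIndexer; not Spark's implementation."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def fit_transform(self, rows):
        # Cast every value to string first, so numeric input is indexed
        # by its string form.
        labels = [str(row[self.input_col]) for row in rows]
        counts = Counter(labels)
        # The most frequent label gets index 0.
        # Alphabetical tie-breaking is an assumption of this sketch.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        index = {label: i for i, label in enumerate(ordered)}
        return [
            {**row, self.output_col: index[label]}
            for row, label in zip(rows, labels)
        ]


class ToyOneHot:
    """Hypothetical second stage that consumes the indexer's output column."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def fit_transform(self, rows):
        size = max(row[self.input_col] for row in rows) + 1
        return [
            {**row,
             self.output_col: [int(i == row[self.input_col]) for i in range(size)]}
            for row in rows
        ]


def run_pipeline(stages, rows):
    # Each stage receives the rows produced by the previous stage.
    for stage in stages:
        rows = stage.fit_transform(rows)
    return rows


rows = [{"category": c} for c in ["a", "b", "c", "a", "a", "c"]]
stages = [
    ToyStringIndexer("category", "categoryIndex"),
    ToyOneHot("categoryIndex", "categoryVec"),
]
result = run_pipeline(stages, rows)
for row in result:
    print(row)
```

The second stage works only because its input column name matches the first stage's output column name, which is the dependency I think the docs should spell out.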