Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/21792

@srowen I think we need to update the docs:

1. The current doc in `StringIndexer` is somewhat misleading: "The indices are in `[0, numLabels)`, ordered by label frequencies, so the most frequent label gets index `0`." This is true only with the default ordering type.
2. In `RFormula`, `stringOrderType` only affects feature columns, not the label column. This needs to be emphasised, since it is somewhat unexpected.

@MLnick your thoughts?
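
To make point 1 concrete, here is a minimal pure-Python sketch (not Spark code) of how the label-to-index mapping changes with the four `stringOrderType` values. The helper `index_labels` and the sample data are hypothetical, and breaking frequency ties alphabetically is an assumption made only to keep the output deterministic.

```python
from collections import Counter


def index_labels(values, string_order_type="frequencyDesc"):
    """Map each distinct label to an index under the given ordering type.

    Hypothetical helper that only mimics the documented ordering options;
    it is not how Spark implements StringIndexer.
    """
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent first; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "frequencyAsc":
        # Least frequent first; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError(f"unsupported order type: {string_order_type}")
    return {label: index for index, label in enumerate(ordered)}


# Sample column: "b" appears 3 times, "a" twice, "c" once.
values = ["b", "a", "b", "c", "b", "a"]

for order_type in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
    print(order_type, index_labels(values, order_type))
```

In this sketch the most frequent label `b` gets index `0` only under `frequencyDesc`, which is the default; under the other three ordering types the quoted doc sentence no longer holds.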