Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/21792
@srowen I think we need to update the docs:
1. The current doc for `StringIndexer` is somewhat misleading: "The indices are
in `[0, numLabels)`, ordered by label frequencies, so the most frequent label
gets index `0`." This is only true with the default ordering type (`frequencyDesc`).
2. In `RFormula`, `stringOrderType` only affects the feature columns, not the label
column. This needs to be emphasized, since it is somewhat unexpected.
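To make point 1 concrete, here is a rough pure-Python model of how the four `stringOrderType` values (`frequencyDesc`, `frequencyAsc`, `alphabetDesc`, `alphabetAsc`) would assign indices. It is not Spark code: `string_index` is a hypothetical helper, and breaking frequency ties alphabetically is an assumption of this sketch. Only under `frequencyDesc` does the most frequent label end up at index `0`.

```python
from collections import Counter


def string_index(labels, order_type="frequencyDesc"):
    """Map each distinct label to an index in [0, numLabels).

    Hypothetical model of StringIndexer's ordering options, not Spark's
    implementation. Equal frequencies are broken alphabetically (assumed).
    """
    counts = Counter(labels)
    if order_type == "frequencyDesc":
        # Most frequent label first, so it gets index 0.
        ordered = sorted(counts, key=lambda s: (-counts[s], s))
    elif order_type == "frequencyAsc":
        # Least frequent label first; the most frequent gets the last index.
        ordered = sorted(counts, key=lambda s: (counts[s], s))
    elif order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError(f"unsupported order_type: {order_type}")
    return {label: i for i, label in enumerate(ordered)}


# "a" is the most frequent label (3), then "c" (2), then "b" (1).
labels = ["a", "b", "c", "a", "a", "c"]

if __name__ == "__main__":
    for order in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
        print(order, string_index(labels, order))
```

With `frequencyAsc`, "a" gets index `2` rather than `0`, which is exactly the case the current `StringIndexer` doc does not cover.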
@MLnick your thoughts?