GitHub user ericl commented on the pull request:
https://github.com/apache/spark/pull/7987#issuecomment-133127935
If I understand correctly, the concern is that the category-to-index
assignment used at prediction time will differ from the one used when fitting
the model. That should not happen here: `StringIndexer` retains the mapping from
category to index, and that mapping is reused when predict() is later called on the model.
It is true that a more predictable ordering (such as alphabetical) would be
nice for some tasks, like comparing coefficients between different models,
but I think that should be a feature of `StringIndexer` and is not really
related to this PR.
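A minimal Python sketch of the point above. It assumes a hypothetical `SimpleStringIndexer` class, which is not Spark's `StringIndexer` API. The label-to-index mapping is learned once in `fit()` and reused by `transform()`, so the order in which categories show up in new data does not change their indices. The `order` parameter is also hypothetical. It contrasts a frequency-based ordering (loosely modeled on `StringIndexer`, with deterministic tie-breaking that is my own choice) with an alphabetical ordering like the one suggested for comparing coefficients across models.

```python
from collections import Counter


class SimpleStringIndexer:
    """Hypothetical, minimal stand-in for a string indexer (not Spark's API)."""

    def __init__(self, order="frequency"):
        # "frequency": most frequent label gets index 0 (ties broken alphabetically here).
        # "alphabetical": labels sorted lexicographically.
        self.order = order
        self.labels = None

    def fit(self, values):
        counts = Counter(values)
        if self.order == "alphabetical":
            self.labels = sorted(counts)
        else:
            # Descending count, then label, so the ordering is deterministic.
            self.labels = sorted(counts, key=lambda v: (-counts[v], v))
        return self

    def transform(self, values):
        # Reuse the mapping learned in fit(); new data never re-derives indices.
        # Labels unseen during fit() raise KeyError in this sketch.
        index = {label: i for i, label in enumerate(self.labels)}
        return [index[v] for v in values]


train = ["b", "a", "b", "c", "b", "a"]

freq = SimpleStringIndexer().fit(train)
print(freq.labels)                      # ['b', 'a', 'c']
print(freq.transform(["c", "a", "b"]))  # [2, 1, 0]

alpha = SimpleStringIndexer(order="alphabetical").fit(train)
print(alpha.labels)                      # ['a', 'b', 'c']
print(alpha.transform(["c", "a", "b"]))  # [2, 0, 1]
```

Both indexers give every label the same index at fit and predict time. Only the ordering policy differs, which is why a predictable ordering is a separate question from correctness.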