Github user ericl commented on the pull request:

    https://github.com/apache/spark/pull/7987#issuecomment-133127935
  
    If I understand correctly, the concern is that the category-to-index 
assignment when predicting data will be different from that used when fitting 
the model. This should be OK here since `StringIndexer` retains a mapping from 
categories to indices, which is reused when calling predict() on the model later.
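
    To illustrate the point, here is a minimal pure-Python sketch of the idea. It is not Spark's implementation: `SimpleStringIndexer` and `SimpleStringIndexerModel` are hypothetical names, and the alphabetical tie-break is an assumption made only to keep the example deterministic. The mapping is learned once at fit time and reused for any later data, so indices stay consistent even if the new data has a different category distribution.

```python
from collections import Counter


class SimpleStringIndexerModel:
    """Hypothetical fitted model: holds the category -> index mapping."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def transform(self, values):
        # Reuse the mapping learned at fit time; nothing is recomputed here.
        return [self.mapping[v] for v in values]


class SimpleStringIndexer:
    """Hypothetical stand-in for an indexer, not Spark's StringIndexer."""

    def fit(self, values):
        counts = Counter(values)
        # Most frequent category gets index 0.
        # Assumption: ties are broken alphabetically for determinism.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        return SimpleStringIndexerModel(
            {label: i for i, label in enumerate(ordered)}
        )


# Fit on training data: "b" is most frequent, then "a", then "c".
model = SimpleStringIndexer().fit(["b", "a", "b", "c", "b", "a"])

# Prediction data has a different distribution ("c" is now most frequent),
# but the fitted mapping is reused, so "c" keeps the index it had in training.
predict_idx = model.transform(["c", "c", "a"])

# Refitting on the prediction data would assign different indices,
# which is the mismatch the concern describes.
refit_idx = SimpleStringIndexer().fit(["c", "c", "a"]).transform(["c", "c", "a"])
```

    The contrast between `predict_idx` and `refit_idx` is the whole argument: only a refit would change the assignment, and the fitted model never refits.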
    
    It is true that it would be nice to have a more predictable ordering (such 
as alphabetic) for some tasks like comparing coefficients between different 
models, but I think this should be a feature of `StringIndexer` and is not 
really related to this PR.


