GitHub user sethah commented on the pull request:
https://github.com/apache/spark/pull/12663#issuecomment-214790032
Out of curiosity, if or when
[SPARK-7126](https://issues.apache.org/jira/browse/SPARK-7126) is implemented,
do we plan to remove this behavior?
Regarding small datasets and cross validation, it is a bit concerning that
the model could be trained with an incorrect number of classes, and because
this happens silently, it could cause some confusion. However, I think it is
reasonable to expect end users to realize that some splits of their data
could be missing some label values, and unless the number of classes is
specified explicitly, the algorithm has no way to know.
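To make the concern concrete, here is a minimal sketch in plain Python (not
Spark code). It assumes the class count is inferred as max label + 1 from
0-based integer labels; that rule, and the helper names `infer_num_classes`
and `k_fold_splits`, are assumptions for illustration only and are not taken
from the pull request.

```python
def infer_num_classes(labels):
    """Infer the class count as max label + 1 (assumes 0-based integer labels)."""
    return int(max(labels)) + 1


def k_fold_splits(labels, k):
    """Yield (train, validation) label lists for k contiguous folds."""
    n = len(labels)
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        yield labels[:lo] + labels[hi:], labels[lo:hi]


# Small dataset where class 2 appears exactly once.
labels = [0, 0, 1, 1, 0, 1, 0, 2]
true_num_classes = infer_num_classes(labels)

# Class count each fold's training split would infer on its own.
inferred_per_fold = [
    infer_num_classes(train) for train, _ in k_fold_splits(labels, 4)
]

for fold, inferred in enumerate(inferred_per_fold):
    if inferred != true_num_classes:
        print(f"fold {fold}: inferred {inferred} classes, expected {true_num_classes}")
```

In this sketch the fold that holds out the single class-2 example infers two
classes instead of three, and nothing fails unless the caller compares against
an explicitly supplied class count.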