Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3975#issuecomment-70765133
LIBSVM is also used for regression data. I don't think we can transform
label values automatically for users inside `loadLibSVMFile`. There used to be
a `multiclass` flag, but we decided to deprecate its usage and load the raw
data from LIBSVM files.
The label set should be validated inside each algorithm, e.g.,
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/DataValidators.scala
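As a rough illustration of what per-algorithm label validation could look like, here is a minimal sketch in plain Scala. The `LabelChecks` object and its method names are hypothetical and are not taken from `DataValidators.scala`; the sketch also works on a local `Seq[Double]` rather than an RDD, purely to stay self-contained.

```scala
// Hypothetical helper for illustration only; not part of MLlib.
object LabelChecks {
  // Binary classifiers: every label must be exactly 0.0 or 1.0.
  def isBinary(labels: Seq[Double]): Boolean =
    labels.forall(l => l == 0.0 || l == 1.0)

  // k-class classifiers: every label must be a whole number in [0, k).
  def isMulticlass(labels: Seq[Double], k: Int): Boolean =
    labels.forall(l => l >= 0 && l < k && l == math.floor(l))
}
```

Regression algorithms would skip such a check, which is why the loader itself can leave labels untouched.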
@jkbradley It would be easier if the algorithms took 0, 1, 2, ... as labels and made
predictions in the same domain. For test data, users should apply the same
sequence of transformations that were used to produce the training dataset. We
can embed the mapping from raw labels to indexed labels in the metadata.
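The raw-to-indexed label mapping described above might be sketched as follows. This is only an assumption about one possible shape: `LabelIndexer` and its methods are hypothetical names, not an existing MLlib API, and how the mapping would be stored in metadata is left out.

```scala
// Hypothetical sketch, not an MLlib API: maps raw labels to 0, 1, 2, ...
final case class LabelIndexer(rawToIndex: Map[Double, Int]) {
  // Inverse mapping, used to report predictions in the raw label domain.
  val indexToRaw: Map[Int, Double] = rawToIndex.map(_.swap)

  // Fails loudly on labels that were not seen when the mapping was built.
  def toIndex(raw: Double): Int =
    rawToIndex.getOrElse(raw, throw new IllegalArgumentException(s"Unseen label: $raw"))

  def toRaw(index: Int): Double = indexToRaw(index)
}

object LabelIndexer {
  // Build the mapping from training labels only; sorting keeps it deterministic.
  def fit(labels: Seq[Double]): LabelIndexer =
    LabelIndexer(labels.distinct.sorted.zipWithIndex.toMap)
}
```

Fitting once on the training labels and reusing the same `LabelIndexer` for test data keeps both datasets in the same label domain, and `toRaw` converts predictions back to the original labels.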