[GitHub] spark pull request: [SPARK-5119] java.lang.ArrayIndexOutOfBoundsEx...

mengxr Tue, 20 Jan 2015 16:50:20 -0800

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/3975#issuecomment-70765133
  
    LIBSVM is also used for regression data. I don't think we can transform 
label values automatically for users inside `loadLibSVMFile`. There used to be 
a `multiclass` flag, but we decided to deprecate its usage and load the raw 
data from LIBSVM files.
    
    The label set should be validated inside each algorithm, e.g., 
    
    
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/DataValidators.scala
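    
    As a rough illustration of validating labels per algorithm, here is a 
minimal plain-Python sketch. It is not the code in `DataValidators.scala`; the 
function names `binary_labels_valid` and `multiclass_labels_valid` are 
hypothetical, and it assumes labels arrive as plain numbers.

```python
from typing import Iterable


def binary_labels_valid(labels: Iterable[float]) -> bool:
    """Return True only if every label is exactly 0 or 1 (hypothetical helper)."""
    return all(label in (0.0, 1.0) for label in labels)


def multiclass_labels_valid(labels: Iterable[float], num_classes: int) -> bool:
    """Return True only if every label is an integer value in [0, num_classes)."""
    return all(
        float(label).is_integer() and 0 <= label < num_classes
        for label in labels
    )


# A LIBSVM file with -1/+1 labels loads as raw values, so a binary
# classifier's own check would reject it instead of silently remapping it.
print(binary_labels_valid([0.0, 1.0, 1.0]))
print(binary_labels_valid([-1.0, 1.0]))
```

    Keeping the check inside each algorithm lets the same loaded data serve a 
regression model, which accepts any real-valued label, without the loader 
having to guess the task.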
    
    @jkbradley It would be easier if the algorithms took labels 0, 1, 2, ... 
and made predictions in the same domain. For test data, users should apply the 
same sequence of transformations that was used to produce the training dataset. 
We can embed the mapping from raw labels to indexed labels in the metadata.
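
    To make the idea of mapping raw labels to indexed labels concrete, here is 
a small plain-Python sketch. It is not an MLlib API: the helper names 
`fit_label_index`, `to_indexed`, and `to_raw` are hypothetical, and sorting the 
distinct raw labels to assign indices is an assumption of this example, not 
something decided in this thread.

```python
from typing import Dict, List, Sequence


def fit_label_index(raw_labels: Sequence[float]) -> Dict[float, int]:
    """Build a raw-label -> index mapping from the training labels.

    Indices are assigned in sorted order of the distinct raw labels
    (an assumption made for this sketch).
    """
    distinct = sorted(set(raw_labels))
    return {raw: idx for idx, raw in enumerate(distinct)}


def to_indexed(raw_labels: Sequence[float], mapping: Dict[float, int]) -> List[int]:
    """Apply the stored mapping; raises KeyError for a label unseen in training."""
    return [mapping[raw] for raw in raw_labels]


def to_raw(indexed: Sequence[int], mapping: Dict[float, int]) -> List[float]:
    """Map predicted indices back to the original label domain."""
    inverse = {idx: raw for raw, idx in mapping.items()}
    return [inverse[i] for i in indexed]


# Fit the mapping on training labels only, then reuse it unchanged for test data.
train_labels = [-1.0, 1.0, 1.0, -1.0]
mapping = fit_label_index(train_labels)
test_indexed = to_indexed([1.0, -1.0], mapping)
predictions_raw = to_raw(test_indexed, mapping)
```

    The mapping is the piece that would travel with the dataset as metadata, 
so that test data goes through exactly the same transformation as the training 
data and predictions can be translated back to the raw labels.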



