Vincent created SPARK-17055:
-------------------------------

             Summary: add labelKFold to CrossValidator
                 Key: SPARK-17055
                 URL: https://issues.apache.org/jira/browse/SPARK-17055
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 2.0.0
            Reporter: Vincent
            Priority: Minor


The current CrossValidator supports only k-fold, which randomly divides all 
the samples into k groups. But when data is gathered from different subjects 
and we want to avoid over-fitting, we want to hold out all samples with 
certain labels from the training data and put them into the validation fold, 
i.e. we want to ensure that the same label never appears in both the testing 
and training sets.
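
To illustrate the requested behaviour, here is a minimal pure-Python sketch 
of a label-aware k-fold split. It is not Spark code and not the proposed 
CrossValidator API; the function name label_k_fold, the greedy balancing 
strategy and the sample data are all assumptions made for illustration. 
"Label" here means the group a sample belongs to (for example a subject ID), 
not the prediction target.

```python
from collections import defaultdict


def label_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs for k folds.

    Hypothetical helper: every sample sharing a label lands in the same
    fold, so no label appears on both sides of any split.
    """
    # Collect the sample indices that belong to each label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    if k < 2 or k > len(by_label):
        raise ValueError("k must be between 2 and the number of distinct labels")

    # Greedy balancing (an assumption, not a requirement of the issue):
    # place the largest label groups first, each into the smallest fold so far.
    folds = [[] for _ in range(k)]
    for label in sorted(by_label, key=lambda l: (-len(by_label[l]), str(l))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_label[label])

    # Each fold serves once as the validation set; the rest is training data.
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(fold)


# Made-up example: eight samples gathered from four subjects.
subjects = ["a", "a", "b", "b", "b", "c", "d", "d"]
for train, test in label_k_fold(subjects, 3):
    print("train:", train, "test:", test)
```

Because whole label groups are assigned to folds, the fold sizes can be 
uneven when the groups differ in size; that is the trade-off for keeping each 
subject on one side of every split.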

Mainstream packages such as scikit-learn already support this 
cross-validation method. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
