Hello,
I am very new to scikit-learn and am trying to run cross-validation on a data
frame that contains text features and a class label. The task is text
classification with two classes, and the distribution of positive and negative
instances is extremely skewed (we want to keep it that way on purpose). Is
there a cross-validation scheme in scikit-learn that splits the data into K
folds so that each fold has the same proportion of positive and negative
examples? For example, if I have:
100 Positive instances
1000 Negative instances,
would it be possible for me to run 10-fold cross-validation where each fold
has 10 positive and 100 negative examples, randomly chosen from the set and
held out as the validation set?
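
To make the question concrete, here is a minimal sketch of the behaviour I am after. It assumes that StratifiedKFold from sklearn.model_selection (the module name in recent scikit-learn releases) is the right tool, and it uses made-up placeholder data rather than my real text features, so please treat it as an illustration and not as a confirmed solution.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels matching the counts above: 100 positive, 1000 negative.
y = np.array([1] * 100 + [0] * 1000)
# Placeholder feature matrix (assumption); real text features would go here.
X = np.zeros((len(y), 1))

# shuffle=True picks the members of each fold at random within each class;
# random_state only makes the shuffle reproducible.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_counts = []
for train_idx, val_idx in skf.split(X, y):
    # Count positives and negatives in the held-out validation fold.
    n_pos = int(np.sum(y[val_idx] == 1))
    n_neg = int(np.sum(y[val_idx] == 0))
    fold_counts.append((n_pos, n_neg))

print(fold_counts)
```

Because both class counts divide evenly by 10 here, I would expect every validation fold to hold 10 positive and 100 negative examples. With counts that do not divide evenly, the folds should differ by at most one example per class.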
Sample code or a link to an example would be helpful.
Thanks,
Nikhil
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general