Hi, Nikhil,

You could use stratified k-fold cross-validation, which preserves the
"original" class proportions in each fold. An example can be found here:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html
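
A minimal sketch of the idea follows. It assumes a current scikit-learn release, where StratifiedKFold lives in `sklearn.model_selection`; the `sklearn.cross_validation` module linked above is the older API and has since been deprecated and removed. The labels are synthetic and mirror the 100 positive / 1000 negative example from the question. The features are a placeholder, because the fold assignment depends only on `y`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels mirroring the question: 100 positive (1), 1000 negative (0).
y = np.array([1] * 100 + [0] * 1000)
# Placeholder feature matrix; StratifiedKFold builds folds from y alone.
X = np.zeros((len(y), 1))

# shuffle=True picks each fold's members at random within each class;
# random_state makes the split reproducible.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count positives and negatives in the held-out (validation) fold.
    n_pos = int(y[test_idx].sum())
    n_neg = len(test_idx) - n_pos
    fold_counts.append((n_pos, n_neg))
    print(f"held-out fold: {n_pos} positive, {n_neg} negative")
```

Because 100 and 1000 both divide evenly by 10, every held-out fold here contains 10 positive and 100 negative examples. The same `skf` object can be passed as `cv=skf` to `sklearn.model_selection.cross_val_score` together with your classifier. For text, that would typically be a pipeline that vectorizes the documents first. If you pass an integer `cv` with a classifier instead, scikit-learn already uses stratified folds, but without shuffling.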

Best,
Sebastian

> On Apr 28, 2015, at 10:40 PM, nmura...@masonlive.gmu.edu wrote:
> 
> Hello,
> 
> I am very new to scikit-learn and am trying to run cross-validation on a data
> frame consisting of text features and a class label. I am trying to
> perform text data classification. It is a 2-class classification problem
> where the distribution between positive and negative instances is extremely
> skewed (we want to keep it that way on purpose). Is there a specific
> cross-validation type in scikit-learn where I am able to split the data into
> K folds so that each fold has the same proportion of positive and
> negative examples? For example, if I have:
> 
> 100 positive instances
> 1000 negative instances,
> 
> would it be possible for me to run a 10-fold cross-validation where each fold
> has 10 positive and 100 negative examples, randomly chosen from the set and
> held out as the validation set?
> 
> Some sample code, or a link to some, would be helpful.
> 
> Thanks,
> Nikhil
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general