Hi Nikhil,
you could use stratified k-fold cross-validation, which preserves the
original class proportions in every fold. An example can be found here:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html
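For your 100/1000 example, a minimal sketch could look like the following. It
assumes a recent scikit-learn release, where StratifiedKFold lives in
sklearn.model_selection (the page linked above documents the older
sklearn.cross_validation module). X and y here are made-up toy arrays, not
your data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels matching the question: 100 positives, 1000 negatives (assumption).
y = np.array([1] * 100 + [0] * 1000)
# Placeholder feature matrix; only its number of rows matters for splitting.
X = np.zeros((len(y), 1))

# shuffle=True draws the members of each fold at random within each class.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count positives and negatives in the held-out (validation) fold.
    n_pos = int(np.sum(y[test_idx] == 1))
    n_neg = int(np.sum(y[test_idx] == 0))
    fold_counts.append((n_pos, n_neg))
    print(n_pos, n_neg)
```

Each held-out fold should then contain 10 positive and 100 negative examples.
With real text data you would replace X with your feature matrix (or the raw
documents, vectorized inside each training fold) and y with your class labels.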
Best,
Sebastian
> On Apr 28, 2015, at 10:40 PM, nmura...@masonlive.gmu.edu wrote:
>
> Hello,
>
> I am very new to scikit-learn and am trying to run cross-validation on a data
> frame consisting of text features and a class label. I am trying to
> perform text classification. It is a 2-class classification problem
> where the distribution between positive and negative instances is extremely
> skewed (we want to keep it that way on purpose). Is there a specific
> cross-validation type in scikit-learn that lets me split the data so that
> each of the K folds has the same proportion of positive and
> negative examples? Meaning, if I have:
>
> 100 Positive instances
> 1000 Negative instances,
>
> would it be possible for me to run a 10-fold cross-validation where each fold
> has 10 +ve and 100 -ve examples randomly chosen from the set, held out as the
> validation set?
>
> Some sample code or a link with the same would be helpful.
>
> Thanks,
> Nikhil
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general