Re: [Scikit-learn-general] K-Fold-Cross-validation in Scikit-Learn

Sebastian Raschka Wed, 29 Apr 2015 08:57:14 -0700

Maybe instead of penalizing false negatives more than false positives, you 
could optimize your model for recall instead of accuracy, where recall is 
[true positives] / [all actual positives], and then also experiment with the 
class weights a little bit. Also, you could try over- or undersampling 
techniques and see if that helps.
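
A minimal sketch of the recall and class-weight suggestion, under some assumptions that are not from this thread: the data is synthetic (make_classification), the classifier is LogisticRegression, the 1:10 class weights are arbitrary illustration values, and the imports use sklearn.model_selection, which replaced the sklearn.cross_validation module in later scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced two-class data (roughly 10% positives); illustrative only.
X, y = make_classification(
    n_samples=1100, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Unweighted baseline vs. a model whose loss weights positives 10x more
# (the 1:10 ratio is an arbitrary value chosen for this sketch).
plain = LogisticRegression(max_iter=1000)
weighted = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 10})

# scoring="recall" evaluates each fold by recall instead of accuracy.
recall_plain = cross_val_score(plain, X, y, cv=cv, scoring="recall")
recall_weighted = cross_val_score(weighted, X, y, cv=cv, scoring="recall")

print("mean recall, unweighted:", recall_plain.mean())
print("mean recall, weighted:  ", recall_weighted.mean())
```

Many scikit-learn classifiers accept a class_weight argument, but not all of them do, so it is worth checking the documentation of the estimator in question.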


> Has the interface been designed in such a way that once we load the data into 
> a format that works with one learning model ( maybe a naive-bayes classifier

The data is usually represented as a regular NumPy array or a sparse matrix. I 
think the count and tf-idf vectorizers return sparse matrices, which should be 
supported by most models. I think random forests didn't support sparse input, 
though; in that case, you can simply convert the sparse matrix to a regular 
NumPy array via X.toarray().
In a nutshell: you don't have to do anything special to your NumPy array to use 
it with different learning models, except for making sure that categorical 
variables are encoded correctly and that there are no missing values in your 
cells.
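
As a rough illustration of the point above, the sketch below vectorizes a tiny made-up corpus once and reuses the same matrix with several estimators. The documents, labels, and choice of estimators are assumptions for the example only; the dense conversion is shown for cases where an estimator does not accept sparse input.

```python
from scipy.sparse import issparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus; 1 = positive, 0 = negative.
docs = ["good movie", "great film", "bad movie",
        "terrible film", "good plot", "bad plot"]
labels = [1, 1, 0, 0, 1, 0]

# The vectorizer returns a SciPy sparse matrix (documents x vocabulary terms).
X = TfidfVectorizer().fit_transform(docs)
print(issparse(X), X.shape)

# The same sparse matrix can be passed to different estimators unchanged.
for model in (MultinomialNB(), LogisticRegression()):
    model.fit(X, labels)

# If an estimator needs dense input, convert explicitly with toarray().
# This can use a lot of memory for large vocabularies.
X_dense = X.toarray()
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_dense, labels)
```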

> On Apr 29, 2015, at 10:02 AM, nmura...@masonlive.gmu.edu wrote:
> 
> Sure, will do, thank you. I had a question unrelated to the cross-validation 
> issue. I have a training set with a skewed distribution of positive and 
> negative examples, in which negative examples far outnumber positive ones. 
> The problem setting is two-class text classification. 
> 
> 1. How can I tell the learning model to penalize false negatives more than 
> false positives? Could we use the same method with different classifiers to 
> achieve this effect?
> 
> 
> 2. My second question relates to interface consistency across the 
> scikit-learn code base. Has the interface been designed in such a way that 
> once we load the data into a format that works with one learning model (maybe 
> a naive Bayes classifier), the same data can be used for other learning 
> models, dimensionality reduction, clustering, etc.?
> 
> Thanks,
> Nikhil
> 
> On Apr 28, 2015, at 11:04 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> 
>> Hi, Nikhil,
>> 
>> You could use stratified k-fold cross-validation, which preserves the 
>> "original" class proportions. An example can be found here:
>> http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html
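
A small sketch of that suggestion, using the 100 positive / 1000 negative counts from the original question. It assumes a newer scikit-learn release, where StratifiedKFold lives in sklearn.model_selection and takes n_splits (the sklearn.cross_validation version linked above has a different call signature); the placeholder feature matrix and the random_state value are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 100 positive and 1000 negative labels, as in the question.
y = np.array([1] * 100 + [0] * 1000)
X = np.zeros((len(y), 1))  # placeholder features; only the labels drive the split

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Count positives and negatives in each held-out (validation) fold.
fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    n_pos = int(y[test_idx].sum())
    n_neg = len(test_idx) - n_pos
    fold_counts.append((n_pos, n_neg))

print(fold_counts)
```

Because both class counts divide evenly by 10, each held-out fold contains 10 positive and 100 negative examples.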
>> 
>> Best,
>> Sebastian
>> 
>>> On Apr 28, 2015, at 10:40 PM, nmura...@masonlive.gmu.edu wrote:
>>> 
>>> Hello,
>>> 
>>> I am very new to scikit-learn and am trying to run cross-validation on a 
>>> data frame consisting of text features and a class label. I am trying to 
>>> perform text classification. It is a two-class classification problem 
>>> where the distribution of positive and negative instances is extremely 
>>> skewed (we want to keep it that way on purpose). Is there a specific 
>>> cross-validation type in scikit-learn that splits the data so that each of 
>>> the K folds has the same proportion of positive and negative examples? 
>>> Meaning, if I have:
>>> 
>>> 100 Positive instances
>>> 1000 Negative instances,  
>>> 
>>> would it be possible for me to run a 10-fold cross-validation where each 
>>> fold has 10 +ve and 100 -ve examples, randomly chosen from the set, held 
>>> out as the validation set?
>>> 
>>> Some sample code or a link with the same would be helpful.
>>> 
>>> Thanks,
>>> Nikhil
>>> 
>>> ------------------------------------------------------------------------------
>>> One dashboard for servers and applications across Physical-Virtual-Cloud 
>>> Widest out-of-the-box monitoring support with 50+ applications
>>> Performance metrics, stats and reports that give you Actionable Insights
>>> Deep dive visibility with transaction tracing using APM Insight.
>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y_______________________________________________
>>>  
>>> <http://ad.doubleclick.net/ddm/clk/290420510;117567292;y_______________________________________________>
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net 
>>> <mailto:Scikit-learn-general@lists.sourceforge.net>
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general 
>>> <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>

