Depends.
If you give "fit_params" to cross_val_score it will be passed to GridSearchCV in the correct way, I believe.
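Untested sketch of what I mean, written assuming the names this ended up with in the released model_selection API (the splitter as GroupKFold, the per-sample label array as `groups`); treat the exact names and the fit_params routing as my assumptions rather than something I have run:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = rng.randint(0, 2, size=100)
groups = rng.randint(0, 10, size=100)  # e.g. patient or session ids

# Inner model selection: folds must keep samples sharing a group together.
inner_search = GridSearchCV(
    estimator=LogisticRegression(),
    param_grid={'C': np.logspace(-3, 3, 7)},
    cv=GroupKFold(n_splits=3),
)

# cross_val_score should slice any fit_params entry of length n_samples
# along with X and y for each outer fold, then forward it to
# GridSearchCV.fit, which hands it on to the inner GroupKFold's split().
scores = cross_val_score(inner_search, X, y, cv=5,
                         fit_params={'groups': groups})
print(scores.mean())

If that routing works as I expect, the inner splitter only ever sees the labels belonging to the outer training set, which is exactly the part that breaks with the old constructor-time y.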


On 10/30/2015 06:36 AM, Christoph Sawade wrote:
Thanks for the response. I am actually interested in the new DisjointLabelKFold (https://github.com/scikit-learn/scikit-learn/pull/4444), which depends on an additional label array. This use case does not seem to be covered yet in the new sklearn.model_selection, does it?

> Changes to support this case have recently been merged into master, and an
> example is on its way:
> https://github.com/scikit-learn/scikit-learn/issues/5589
>
> I think you should be able to run your code by importing GridSearchCV,
> cross_val_score and StratifiedShuffleSplit from the new
> sklearn.model_selection; then the code is identical except you drop the `y`
> argument from StratifiedShuffleSplit's constructor (it's a different class,
> actually).
>
> Please do try it out!
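
For concreteness, the quoted suggestion would amount to roughly the following untested sketch of the code from the original post (I am assuming the number of splits is now passed via an n_splits keyword; everything else is unchanged from the snippet quoted below):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                     cross_val_score)

n_samples = 1000
X = np.r_[
    np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
    np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
]
y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])

# Same nested setup as in the quoted code below, but the splitter no longer
# takes y in its constructor; it receives the labels of the outer training
# subset when GridSearchCV calls its split() method.
LogRegOptimalC = GridSearchCV(
    estimator=LogisticRegression(),
    cv=StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0),
    param_grid={'C': np.logspace(-3, 3, 7)},
)
print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())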
>
> On 29 October 2015 at 05:00, Christoph Sawade <christoph.saw...@googlemail.com> wrote:
>
>> Hey there!
>>
>> When training a model in machine learning, one generally also wants to
>> estimate its performance. This is often done via cross-validation. In order
>> to also tune hyperparameters, one might want to nest one cross-validation
>> loop inside another. The sklearn framework makes that very easy. However,
>> sometimes it is necessary to stratify the folds to ensure some constraints
>> (e.g., roughly preserving the proportion of the target labels in each fold).
>> These splitters are also provided (e.g., StratifiedShuffleSplit) but do not
>> work when they are nested:
>>
>> import numpy as np
>> from sklearn.grid_search import GridSearchCV
>> from sklearn.cross_validation import StratifiedShuffleSplit
>> from sklearn.linear_model import LogisticRegression
>> from sklearn.cross_validation import cross_val_score
>>
>> # Number of samples per component
>> n_samples = 1000
>>
>> # Generate random sample, two classes
>> X = np.r_[
>>     np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
>>     np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
>> ]
>> y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])
>>
>> # Fit model
>> LogRegOptimalC = GridSearchCV(
>>     estimator=LogisticRegression(),
>>     cv=StratifiedShuffleSplit(y, 3, test_size=0.5, random_state=0),
>>     param_grid={
>>         'C': np.logspace(-3, 3, 7)
>>     }
>> )
>> print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())
>>
>> The problem seems to be that the array reflecting the splitting criterion
>> (here the target y) is not split for the inner folds. Is there some way to
>> tackle that, or are there already initiatives dealing with it?
>>
>> Thx Christoph
>>


------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
