Changes to support this case have recently been merged into master, and an example is on its way: https://github.com/scikit-learn/scikit-learn/issues/5589
I think you should be able to run your code by importing GridSearchCV, cross_val_score and StratifiedShuffleSplit from the new sklearn.model_selection. The code is otherwise identical, except that you drop the `y` argument from StratifiedShuffleSplit's constructor (it is actually a different class, and the labels are instead passed in when the splits are generated). Please do try it out! I have put a sketch of the updated code below the quoted message.

On 29 October 2015 at 05:00, Christoph Sawade <
christoph.saw...@googlemail.com> wrote:

> Hey there!
>
> When training a model in machine learning, one usually also wants to
> estimate its performance, which is commonly done via cross validation. To
> tune hyperparameters as well, one may want to nest one cross-validation
> loop inside another, and the sklearn framework makes that very easy.
> However, it is sometimes necessary to stratify the folds to satisfy some
> constraint (e.g., roughly the same proportion of each target label in
> every fold). Splitters for this are also provided (e.g.,
> StratifiedShuffleSplit), but they do not work when they are nested:
>
> import numpy as np
> from sklearn.grid_search import GridSearchCV
> from sklearn.cross_validation import StratifiedShuffleSplit
> from sklearn.linear_model import LogisticRegression
> from sklearn.cross_validation import cross_val_score
>
> # Number of samples per component
> n_samples = 1000
>
> # Generate random sample, two classes
> X = np.r_[
>     np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
>     np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
> ]
> y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])
>
> # Fit model
> LogRegOptimalC = GridSearchCV(
>     estimator=LogisticRegression(),
>     cv=StratifiedShuffleSplit(y, 3, test_size=0.5, random_state=0),
>     param_grid={
>         'C': np.logspace(-3, 3, 7)
>     }
> )
> print cross_val_score(LogRegOptimalC, X, y, cv=5).mean()
>
> The problem seems to be that the array reflecting the splitting criterion
> (here the target y) is not split for the inner folds. Is there some way to
> tackle that, or are there already initiatives dealing with it?
>
> Thanks, Christoph
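For reference, here is a rough sketch of how your example could look against the new module. This assumes the sklearn.model_selection API as it currently stands (in particular the n_splits constructor argument and splitters whose split(X, y) method receives the labels), so exact parameter names may still shift before release:

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score
from sklearn.linear_model import LogisticRegression

# Same toy data as in your example: two Gaussian components, two classes
n_samples = 1000
X = np.r_[
    np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
    np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
]
y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])

# The splitter is constructed without y; GridSearchCV passes the labels to
# its split(X, y) method internally, so the inner folds are stratified on
# the correct subset of y for each outer training split.
inner_cv = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

LogRegOptimalC = GridSearchCV(
    estimator=LogisticRegression(),
    cv=inner_cv,
    param_grid={'C': np.logspace(-3, 3, 7)}
)

print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())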