Changes to support this case have recently been merged into master, and an
example is on its way:
https://github.com/scikit-learn/scikit-learn/issues/5589

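I think you should be able to run your code by importing GridSearchCV,
cross_val_score and StratifiedShuffleSplit from the new
sklearn.model_selection. The rest of the code is identical, except that you
drop the `y` argument from StratifiedShuffleSplit's constructor (it's a
different class, actually). The new splitters receive the labels when they
are asked to split, so the inner search stratifies on the `y` of each outer
training fold.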
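
Here is a rough sketch of what the adapted script might look like. It reuses
the data and parameter grid from your message. The `n_splits` keyword is the
spelling used in released versions of sklearn.model_selection and may differ
on current master; the names `rng`, `inner_cv`, `search` and `scores` are
only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV, StratifiedShuffleSplit, cross_val_score)

# Number of samples per component
n_samples = 1000

# Generate random sample, two classes (seeded here for reproducibility)
rng = np.random.RandomState(0)
X = np.r_[
    np.dot(rng.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
    np.dot(rng.randn(n_samples, 2),
           np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2]),
]
y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])

# The splitter is built without y; labels are supplied later, at split time.
# Assumption: keyword is n_splits (released API); master may name it differently.
inner_cv = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

# Inner loop: tune C on each outer training fold
search = GridSearchCV(
    estimator=LogisticRegression(),
    cv=inner_cv,
    param_grid={'C': np.logspace(-3, 3, 7)},
)

# Outer loop: estimate performance of the tuned model
scores = cross_val_score(search, X, y, cv=5)
print(scores.mean())
```

With an integer `cv` and a classifier, the outer loop in cross_val_score is
stratified as well.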

Please do try it out!

On 29 October 2015 at 05:00, Christoph Sawade <
christoph.saw...@googlemail.com> wrote:

> Hey there!
>
> A common goal in machine learning when training a model is to also
> estimate its performance. This is often done via cross-validation. In order
> to also tune hyperparameters, one might want to nest one cross-validation
> loop inside another. The sklearn framework makes that very easy. However,
> sometimes it is necessary to stratify the folds to ensure some constraints
> (e.g., roughly the same proportion of the target label in each fold). These
> splitters are also provided (e.g., StratifiedShuffleSplit) but do not work
> when they are nested:
>
> import numpy as np
> from sklearn.grid_search import GridSearchCV
> from sklearn.cross_validation import StratifiedShuffleSplit
> from sklearn.linear_model import LogisticRegression
> from sklearn.cross_validation import cross_val_score
>
> # Number of samples per component
> n_samples = 1000
>
> # Generate random sample, two classes
> X = np.r_[
>     np.dot(np.random.randn(n_samples, 2),
>            np.array([[0., -0.1], [1.7, .4]])),
>     np.dot(np.random.randn(n_samples, 2),
>            np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
> ]
> y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])
>
> # Fit model
> LogRegOptimalC = GridSearchCV(
>     estimator=LogisticRegression(),
>     cv = StratifiedShuffleSplit(y, 3, test_size=0.5, random_state=0),
>     param_grid={
>         'C': np.logspace(-3, 3, 7)
>     }
> )
> print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())
>
> The problem seems to be that the array reflecting the splitting criterion
> (here the target y) is not split for the inner folds. Is there some way
> to tackle that, or are there already initiatives dealing with it?
>
> Thx Christoph
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>