Depends.
If you give "fit_params" to cross_val_score it will be passed to GridSearchCV in the correct way, I believe.
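Untested sketch of what I mean, written assuming the names this ended up with in the released model_selection API (the splitter as GroupKFold, the per-sample label array as `groups`); treat the exact names and the fit_params routing as my assumptions rather than something I have run:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = rng.randint(0, 2, size=100)
groups = rng.randint(0, 10, size=100)  # e.g. patient or session ids

# Inner model selection: folds must keep samples sharing a group together.
inner_search = GridSearchCV(
    estimator=LogisticRegression(),
    param_grid={'C': np.logspace(-3, 3, 7)},
    cv=GroupKFold(n_splits=3),
)

# cross_val_score should slice any fit_params entry of length n_samples
# along with X and y for each outer fold, then forward it to
# GridSearchCV.fit, which hands it on to the inner GroupKFold's split().
scores = cross_val_score(inner_search, X, y, cv=5,
                         fit_params={'groups': groups})
print(scores.mean())

If that routing works as I expect, the inner splitter only ever sees the labels belonging to the outer training set, which is exactly the part that breaks with the old constructor-time y.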


On 10/30/2015 06:36 AM, Christoph Sawade wrote:
Thanks for the response. I am actually interested in the new DisjointLabelKFold (https://github.com/scikit-learn/scikit-learn/pull/4444), which depends on an additional label array. This use case does not seem to be covered yet in the new sklearn.model_selection, does it?

> Changes to support this case have recently been merged into master, and an
> example is on its way:
> https://github.com/scikit-learn/scikit-learn/issues/5589
>
> I think you should be able to run your code by importing GridSearchCV,
> cross_val_score and StratifiedShuffleSplit from the new
> sklearn.model_selection; then the code is identical except you drop the `y`
> argument from StratifiedShuffleSplit's constructor (it's a different class,
> actually).
>
> Please do try it out!
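
For concreteness, the quoted suggestion would amount to roughly the following untested sketch of the code from the original post (I am assuming the number of splits is now passed via an n_splits keyword; everything else is unchanged from the snippet quoted below):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                     cross_val_score)

n_samples = 1000
X = np.r_[
    np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
    np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
]
y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])

# Same nested setup as in the quoted code below, but the splitter no longer
# takes y in its constructor; it receives the labels of the outer training
# subset when GridSearchCV calls its split() method.
LogRegOptimalC = GridSearchCV(
    estimator=LogisticRegression(),
    cv=StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0),
    param_grid={'C': np.logspace(-3, 3, 7)},
)
print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())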
>
> On 29 October 2015 at 05:00, Christoph Sawade <christoph.saw...@googlemail.com> wrote:
>
>> Hey there!
>>
>> When training a model in machine learning, one generally also wants to
>> estimate its performance. This is often done via cross-validation. In order
>> to also tune hyperparameters, one might want to nest one cross-validation
>> loop inside another. The sklearn framework makes that very easy. However,
>> sometimes it is necessary to stratify the folds to ensure some constraints
>> (e.g., roughly preserving the proportion of the target labels in each fold).
>> These splitters are also provided (e.g., StratifiedShuffleSplit) but do not
>> work when they are nested:
>>
>> import numpy as np
>> from sklearn.grid_search import GridSearchCV
>> from sklearn.cross_validation import StratifiedShuffleSplit
>> from sklearn.linear_model import LogisticRegression
>> from sklearn.cross_validation import cross_val_score
>>
>> # Number of samples per component
>> n_samples = 1000
>>
>> # Generate random sample, two classes
>> X = np.r_[
>>     np.dot(np.random.randn(n_samples, 2), np.array([[0., -0.1], [1.7, .4]])),
>>     np.dot(np.random.randn(n_samples, 2), np.array([[1.0, 0.0], [0.0, 1.0]])) + np.array([-2, 2])
>> ]
>> y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])
>>
>> # Fit model
>> LogRegOptimalC = GridSearchCV(
>>     estimator=LogisticRegression(),
>>     cv=StratifiedShuffleSplit(y, 3, test_size=0.5, random_state=0),
>>     param_grid={
>>         'C': np.logspace(-3, 3, 7)
>>     }
>> )
>> print(cross_val_score(LogRegOptimalC, X, y, cv=5).mean())
>>
>> The problem seems to be that the array reflecting the splitting criterion
>> (here the target y) is not split for the inner folds. Is there some way to
>> tackle that, or are there already initiatives dealing with it?
>>
>> Thx Christoph
>>


------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
