Hi, Raga,

I think that when GridSearchCV is used for classification, the stratified 
k-fold splitter it uses doesn’t shuffle by default, so you get the same folds 
on every call. 
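
As a quick check of this, here is a minimal sketch on the toy data from your message. The labels 0/1 standing in for Class 1/Class 2 and the helper name fold_indices are my own assumptions for illustration. Without shuffling, the folds should come out identical on every call, whereas shuffle=True with a different random_state should vary them:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(5).reshape(-1, 1)   # samples 0..4
y = np.array([0, 0, 0, 1, 1])     # labels standing in for Class 1 / Class 2


def fold_indices(cv):
    """Return the test indices of each fold as a tuple of tuples."""
    return tuple(tuple(int(i) for i in test) for _, test in cv.split(X, y))


# Default behaviour (shuffle=False): the same folds on every call.
unshuffled = {fold_indices(StratifiedKFold(n_splits=2)) for _ in range(20)}

# shuffle=True with a new seed per repetition: the folds can differ.
shuffled = {
    fold_indices(StratifiedKFold(n_splits=2, shuffle=True, random_state=seed))
    for seed in range(20)
}

print(len(unshuffled), "distinct split(s) without shuffling")
print(len(shuffled), "distinct split(s) with shuffling")
```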

Say you do 20 grid search repetitions; you could then do something like:


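from sklearn.model_selection import GridSearchCV, StratifiedKFold

n_reps = 20  # number of grid search repetitions

for i in range(n_reps):
    # a different seed per repetition gives differently shuffled folds
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    gs = GridSearchCV(..., cv=k_fold)
    ...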
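
If you want something that runs end to end, the loop above could be filled in roughly as follows. This is only a sketch: the iris data, the linear SVC, and the small grid over C are placeholders I picked for illustration, not your actual setup.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)      # placeholder data set (assumption)
param_grid = {"C": [0.1, 1.0, 10.0]}   # placeholder grid (assumption)
n_reps = 20

best_params, best_scores = [], []
for i in range(n_reps):
    # Reseed the splitter so each repetition sees different folds.
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    gs = GridSearchCV(SVC(kernel="linear"), param_grid, cv=k_fold)
    gs.fit(X, y)
    best_params.append(gs.best_params_)
    best_scores.append(gs.best_score_)

print(len(set(best_scores)), "distinct best_score_ values over", n_reps, "repetitions")
```

Collecting best_params_ and best_score_ per repetition lets you see how much the selected setting depends on the particular split.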

Best,
Sebastian

> On Jan 26, 2017, at 5:39 PM, Raga Markely <[email protected]> wrote:
> 
> Hello,
> 
> I was trying to do repeated Grid Search CV (20 repeats). I thought that each 
> time I call GridSearchCV, the training and test sets separated in different 
> splits would be different. 
> 
> However, I got the same best_params_ and best_score_ for all 20 repeats. It 
> looks like the training and test sets are separated in identical folds in 
> each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 
> = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for 
> instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or 
> other combinations.
> 
> If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv 
> = integer. The StratifiedKFold command has random state; I wonder if there is 
> any way I can make the training and test sets randomly separated each time 
> I call the GridSearchCV? 
> 
> Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, 
> Kernel SVC, Random Forest, and had the same observation regardless of the 
> classifiers.
> 
> Thank you very much!
> Raga
> 
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn

