Cool, glad to hear that it was such an easy fix :)

> On Jan 30, 2017, at 3:49 PM, Raga Markely <raga.mark...@gmail.com> wrote:
>
> Nice catch!! The sklearn version was 0.18, but I used sklearn.grid_search instead of sklearn.model_selection.
>
> The error is gone now.
>
> Thank you, Sebastian!
> Raga
>
> On Mon, Jan 30, 2017 at 3:37 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> Hm, which version of scikit-learn are you using? Are you running this on sklearn 0.18?
>
> Best,
> Sebastian
>
> > On Jan 30, 2017, at 2:48 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> >
> > Hi Sebastian,
> >
> > Following up on the original question on repeated Grid Search CV, I tried to do a repeated nested loop using the following:
> >
> > N_outer = 10
> > N_inner = 10
> > scores = []
> > for i in range(N_outer):
> >     k_fold_outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=i)
> >     for j in range(N_inner):
> >         k_fold_inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=j)
> >         gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, cv=k_fold_inner)
> >         score = cross_val_score(estimator=gs, X=X, y=y, cv=k_fold_outer)
> >         scores.append(score)
> > np.mean(scores)
> > np.std(scores)
> >
> > But I get the following error: TypeError: 'StratifiedKFold' object is not iterable
> >
> > I did some trials, and the error is gone when I remove cv=k_fold_inner from gs = ...
> > Could you give me some tips on what I can do?
> >
> > Thank you!
> > Raga
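A minimal runnable sketch of the repeated nested CV above with the imports that resolve the TypeError: on sklearn 0.18, GridSearchCV and cross_val_score have to come from sklearn.model_selection (the older sklearn.grid_search / sklearn.cross_validation modules do not accept the new CV splitter objects and raise "'StratifiedKFold' object is not iterable"). The iris data, pipeline, and parameter grid are placeholders standing in for the poster's X, y, pipe_svc, and param_grid.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    # These must come from model_selection (sklearn >= 0.18); the deprecated
    # sklearn.grid_search module cannot handle the new splitter objects.
    from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score

    X, y = load_iris(return_X_y=True)                  # stand-in data so the sketch runs
    pipe_svc = make_pipeline(StandardScaler(), SVC())  # placeholder for the poster's pipeline
    param_grid = {'svc__C': [0.1, 1.0, 10.0]}          # placeholder grid

    N_outer = 10   # 10 x 10 repetitions of nested CV; slow, shrink for a quick test
    N_inner = 10
    scores = []
    for i in range(N_outer):
        k_fold_outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=i)
        for j in range(N_inner):
            k_fold_inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=j)
            gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, cv=k_fold_inner)
            score = cross_val_score(estimator=gs, X=X, y=y, cv=k_fold_outer)
            scores.append(score)

    print(np.mean(scores), np.std(scores))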
> > On Fri, Jan 27, 2017 at 1:16 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> > Hi Sebastian,
> >
> > Sorry, I used the wrong terms (I was referring to the algo as the model).. great then, I think what I have is aligned with your workflow..
> >
> > Thank you very much for your help!
> >
> > Have a good weekend,
> > Raga
> >
> > On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > Hi, Raga,
> >
> > Sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization.
> >
> > Not saying that this is the optimal/right approach, but I usually do it like this:
> >
> > 1.) algo selection via nested cv
> > 2.) model selection based on best algo via k-fold on whole training set
> > 3.) fit best algo w. best hyperparams (from 2.) to whole training set
> > 4.) evaluate on test set
> > 5.) fit classifier to whole dataset, done
> >
> > Best,
> > Sebastian
> >
> > >> On Jan 27, 2017, at 10:23 AM, Raga Markely <raga.mark...@gmail.com> wrote:
> > >>
> > >> Sounds good, Sebastian.. thanks for the suggestions..
> > >>
> > >> My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far..
> > >>
> > >> 1. Model selection: use a nested loop with cross_val_score(GridSearchCV(...), ...), same as shown on the scikit-learn page you provided - the results show no statistically significant difference in mean accuracy +/- SD among the classifiers.. this is expected, as the pattern is pretty obvious and simple to separate by eye after dimensionality reduction (I use a pipeline of stdscaler, LDA, and classifier)... so I take all of them and use a voting classifier in step #3..
> > >> 2. Hyperparameter optimization: use GridSearchCV to optimize the hyperparameters of each classifier
> > >> 3. Decision region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use a voting classifier to get the decision region
> > >>
> > >> Does this sound reasonable?
> > >>
> > >> Thank you very much!
> > >> Raga
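A minimal sketch of step #3 of the workflow above: wrap each classifier in the same scaler + LDA pipeline, plug in its tuned hyperparameters, and fit a voting ensemble to the whole dataset for the decision-region plot. The iris data, the choice of classifiers, the hard-voting setting, and the hyperparameter values are illustrative assumptions, not the poster's actual setup; in practice the values would come from each GridSearchCV's best_params_ in step #2.

    from sklearn.datasets import load_iris
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.ensemble import VotingClassifier

    X, y = load_iris(return_X_y=True)          # stand-in for the ~35-sample dataset

    def lda_pipe(clf):
        # same preprocessing as described in the thread: standardize, LDA, classifier
        return make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(), clf)

    # hyperparameter values below are placeholders for GridSearchCV.best_params_
    voting = VotingClassifier(
        estimators=[('lr', lda_pipe(LogisticRegression(C=1.0))),
                    ('knn', lda_pipe(KNeighborsClassifier(n_neighbors=5))),
                    ('svc', lda_pipe(SVC(kernel='rbf', C=1.0, gamma=0.1)))],
        voting='hard')

    voting.fit(X, y)                           # fit on the whole dataset
    print(voting.predict(X[:5]))               # the fitted ensemble is what gets plotted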
> > >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > >> You are welcome! And in addition, if you select among different algorithms, here are some more suggestions:
> > >>
> > >> a) don't do it based on your independent test set if this is going to be your final model performance estimate, or be aware that it would be overly optimistic
> > >> b) also, it's not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g., http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html)
> > >>
> > >> But yeah, it all depends on your dataset and size. If you have a neural net that takes weeks to train, and if you have a large dataset anyway so that you can set aside large sets for testing, I'd train on train/validation splits and evaluate on the test set. And to compare, e.g., two networks against each other on large test sets, you could do a McNemar test.
> > >>
> > >> Best,
> > >> Sebastian
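The McNemar test mentioned above is not part of scikit-learn, so here is a minimal hand-rolled sketch of it (the continuity-corrected chi-squared version), assuming you already have the true labels and the predictions of the two models on the same test set; the variable names in the usage comment are hypothetical.

    import numpy as np
    from scipy.stats import chi2

    def mcnemar_test(y_true, y_pred_a, y_pred_b):
        # Compare two classifiers on the same test set using only the
        # discordant cases (one model right, the other wrong).
        a_correct = (y_pred_a == y_true)
        b_correct = (y_pred_b == y_true)
        b = int(np.sum(a_correct & ~b_correct))    # A right, B wrong
        c = int(np.sum(~a_correct & b_correct))    # A wrong, B right
        if b + c == 0:
            return 0.0, 1.0                        # the models never disagree
        stat = (abs(b - c) - 1.0) ** 2 / (b + c)   # continuity-corrected statistic
        return stat, chi2.sf(stat, df=1)           # p-value from the chi2(1) tail

    # Hypothetical usage with two fitted models and a held-out test set:
    # stat, p = mcnemar_test(y_test, model_a.predict(X_test), model_b.predict(X_test))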
> > >>> On Jan 26, 2017, at 8:09 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> > >>>
> > >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian!
> > >>>
> > >>> Best,
> > >>> Raga
> > >>>
> > >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > >>> Hi, Raga,
> > >>>
> > >>> I think that if GridSearchCV is used for classification, the stratified k-fold doesn't do shuffling by default.
> > >>>
> > >>> Say you do 20 grid search repetitions; you could then do something like:
> > >>>
> > >>> from sklearn.model_selection import StratifiedKFold
> > >>>
> > >>> for i in range(n_reps):
> > >>>     k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
> > >>>     gs = GridSearchCV(..., cv=k_fold)
> > >>>     ...
> > >>>
> > >>> Best,
> > >>> Sebastian
> > >>>
> > >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets would be split differently.
> > >>>>
> > >>>> However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated into identical folds in each run. Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always, for instance, [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations.
> > >>>>
> > >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has a random state; I wonder if there is any way I can make the training and test sets be split randomly each time I call GridSearchCV?
> > >>>>
> > >>>> Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifier.
> > >>>>
> > >>>> Thank you very much!
> > >>>> Raga
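To make the symptom in the original question concrete, a small self-contained demonstration (toy data, made up for illustration) of why cv=<integer> gives identical folds on every repetition for classification, and how shuffle=True with a different random_state per repetition, as in the snippet above, changes that:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.arange(20).reshape(10, 2)     # 10 toy samples, 2 features
    y = np.array([0] * 6 + [1] * 4)      # two classes

    print("no shuffling (what cv=2 uses internally for classification):")
    for rep in range(3):
        cv = StratifiedKFold(n_splits=2, shuffle=False)
        print([list(test) for _, test in cv.split(X, y)])   # identical every repeat

    print("shuffle=True with a different random_state per repeat:")
    for rep in range(3):
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
        print([list(test) for _, test in cv.split(X, y)])   # folds now vary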