Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-26 Thread Sebastian Raschka
You are welcome! In addition, if you select among different algorithms, here are some more suggestions: a) don’t do it based on your independent test set if this is going to be your final model performance estimate, or be aware that it would be overly optimistic; b) also, it’s not the best idea

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-26 Thread Raga Markely
Ahh.. nice.. I will use that.. thanks a lot, Sebastian!

Best,
Raga

On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote:
> Hi, Raga,
>
> I think that if GridSearchCV is used for classification, the stratified
> k-fold doesn’t do shuffling by default.
>
> Say you do 20

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-26 Thread Sebastian Raschka
Hi, Raga,

I think that if GridSearchCV is used for classification, the stratified k-fold doesn’t do shuffling by default.

Say you do 20 grid search repetitions, you could then do something like:

from sklearn.model_selection import StratifiedKFold
for i in range(n_reps):
    k_fold =
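
The message above is cut off in the archive, so the rest of the loop is not available here. The following is only a minimal sketch of how the idea could look: a freshly shuffled StratifiedKFold is built in each repetition and passed to GridSearchCV through its cv argument. The iris data, the linear SVC, the parameter grid, the 5 splits, and the use of the loop index as random_state are all assumptions made for illustration; only n_reps, k_fold, and the figure of 20 repetitions come from the thread.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Assumed toy data and estimator; the thread does not say which were used.
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0]}  # hypothetical grid

n_reps = 20  # number of grid search repetitions mentioned in the thread
best_params = []
best_scores = []

for i in range(n_reps):
    # shuffle=True reshuffles the samples before splitting; a different
    # random_state per repetition gives a different partition each time.
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    gs = GridSearchCV(SVC(kernel="linear"), param_grid, cv=k_fold)
    gs.fit(X, y)
    best_params.append(gs.best_params_)
    best_scores.append(gs.best_score_)
```

Comparing best_params across repetitions gives a rough sense of how stable the selected hyperparameters are with respect to the way the data happen to be split.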

Re: [scikit-learn] top N accuracy classification metric

2017-01-26 Thread Johnson, Jeremiah
Okay, I didn't see anything equivalent in the issue tracker, so submitted a pull request. Jeremiah === Jeremiah W. Johnson, Ph. D Assistant Professor of Data Science Analytics Bachelor of Science Program Coordinator University of New Hampshire

Re: [scikit-learn] Scores in Cross Validation

2017-01-26 Thread Raga Markely
Thank you, Guillaume. 1. I agree with you - that's what I have been learning and makes sense.. I was a bit surprised when I read the paper today.. 2. Ah.. thank you.. I got to change my glasses :P Best, Raga *Guillaume Lemaître* g.lemaitre58 at gmail.com

[scikit-learn] Scores in Cross Validation

2017-01-26 Thread Raga Markely
Hello, I have 2 questions regarding cross_val_score. 1. Do the scores returned by cross_val_score correspond to only the test set or the whole data set (training and test sets)? I tried to look at the source code, and it looks like it returns the score of only the test set (line 145:
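
As a small illustration of the first question: cross_val_score returns one score per held-out (test) fold, and cross_validate with return_train_score=True reports the training-fold scores alongside them. The sketch below assumes a scikit-learn version that provides cross_validate; the iris data, LogisticRegression, and cv=5 are assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_validate

# Assumed toy data and estimator, not taken from the thread.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# One score per fold, each computed on that fold's held-out samples only.
test_scores = cross_val_score(clf, X, y, cv=5)

# Same splits, but the training-fold scores are returned as well.
results = cross_validate(clf, X, y, cv=5, return_train_score=True)
train_scores = results["train_score"]
```

Here results["test_score"] holds the same held-out scores that cross_val_score returns, while train_scores holds the scores on the data each model was fitted on.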

[scikit-learn] (personal) Survey for future scikit-learn development

2017-01-26 Thread Andreas Mueller
Hey all. I created a survey to prioritize and justify (to people that give me money) future scikit-learn development. It would be great if you could answer it, it should be pretty short (it's 10 questions, mostly multiple choice). Feel free to share, more replies are better ;)