Re: [Scikit-learn-general] gridsearchCV - overfitting

2016-05-12 Thread Josh Vredevoogd
Another point of confusion: you shouldn't be using clf.predict() to calculate ROC AUC; you need clf.predict_proba(). ROC measures how well the model sorts (ranks) samples, and predict only gives you the predicted class, not the probability, so the ROC "curve" can only have points at 0 and 1 instead of at any probability in between.
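
A minimal sketch of the difference, using made-up data and an illustrative classifier (both are assumptions, not from the thread; imports follow the current scikit-learn layout):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Correct: rank test samples by the predicted probability of the
    # positive class, which is what the ROC curve is built from.
    auc_proba = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

    # Misleading: hard 0/1 labels collapse the "curve" to a single point.
    auc_labels = roc_auc_score(y_test, clf.predict(X_test))

    print(auc_proba, auc_labels)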

Re: [Scikit-learn-general] gridsearchCV - overfitting

2016-05-12 Thread Andreas Mueller
How did you evaluate on the development set? You should use "best_score_", not grid_search.score.
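
A hypothetical example of the distinction (dataset and parameter grid are made up): best_score_ reports the mean cross-validated score of the winning parameters, while calling score on data the refit estimator has already seen gives an optimistic number.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=0)

    grid_search = GridSearchCV(RandomForestClassifier(random_state=0),
                               {"max_depth": [3, 5, None]},
                               scoring="roc_auc", cv=5)
    grid_search.fit(X, y)

    print(grid_search.best_score_)  # honest cross-validated estimate
    print(grid_search.score(X, y))  # scored on training data: near-perfect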

Re: [Scikit-learn-general] gridsearchCV - overfitting

2016-05-12 Thread A neuman
That's actually what I did, and the difference is way too big. Should I do it without GridSearchCV? I'm just wondering why grid search is giving me overfitted values. I know that these are the best params and so on... but I thought I could skip the manual part where I test the params on my own.

Re: [Scikit-learn-general] gridsearchCV - overfitting

2016-05-12 Thread Olivier Grisel
2016-05-12 13:02 GMT+02:00 A neuman: > Thanks for the answer! > > But how should I check whether it's overfitted or not? Do a development / evaluation split of your dataset, for instance with the train_test_split utility, first. Then train your GridSearchCV model on the development set and score it on the held-out evaluation set.
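
A sketch of that workflow under assumed data and estimator choices (none of the names below come from the thread): split once, run the grid search on the development portion only, then score the refit best model on the untouched evaluation portion.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_dev, X_eval, y_dev, y_eval = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

    grid_search = GridSearchCV(RandomForestClassifier(random_state=0),
                               {"n_estimators": [50, 100]}, cv=5)
    grid_search.fit(X_dev, y_dev)  # model selection touches only the dev set

    print(grid_search.score(X_eval, y_eval))  # estimate on unseen data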

Re: [Scikit-learn-general] gridsearchCV - overfitting

2016-05-12 Thread Joel Nothman
This would be much clearer if you provided some code, but I think I get what you're saying. The final GridSearchCV model is trained on the full training set, so the fact that it perfectly fits that data with random forests is not altogether surprising. What you can say about the parameters is that they achieved the best cross-validated score among the candidates you searched.
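
A short illustration of that refit behaviour, with assumed data (this relies on GridSearchCV's default refit=True, which retrains best_estimator_ on everything passed to fit):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X_train, y_train = make_classification(n_samples=500, random_state=0)

    grid_search = GridSearchCV(RandomForestClassifier(random_state=0),
                               {"max_depth": [3, None]}, cv=5)
    grid_search.fit(X_train, y_train)

    # Near 1.0 for a deep random forest: expected, not evidence of a bug.
    print(grid_search.best_estimator_.score(X_train, y_train))
    # The cross-validated score of the winning parameters is the number
    # to compare against a held-out evaluation set.
    print(grid_search.best_score_)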