Another point of confusion:
You shouldn't be using clf.predict() to calculate ROC AUC; you need
clf.predict_proba(). ROC measures ranking, and predict only gives you
the predicted class, not the probability, so the ROC "curve" can only have
points at 0 and 1 instead of any probability in [0, 1].
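For example (a minimal sketch, assuming clf is a fitted binary classifier
and X_test / y_test are a held-out split; the names are just placeholders):

    from sklearn.metrics import roc_auc_score

    # Wrong: predict() returns hard 0/1 labels, so there is only one
    # operating point and the AUC is misleading.
    roc_auc_score(y_test, clf.predict(X_test))

    # Right: rank the samples by the predicted probability of the
    # positive class.
    roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])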
How did you evaluate on the development set?
You should use "best_score_", not grid_search.score.
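Roughly (a sketch, assuming grid_search is a fitted GridSearchCV and
X_dev / y_dev is the data it was fit on; names are illustrative):

    # Mean cross-validated score of the best parameter setting -- this is
    # the honest development-set number.
    print(grid_search.best_score_)

    # This scores the refit best_estimator_ on the same data it was
    # trained on, so a random forest will look near-perfect here.
    print(grid_search.score(X_dev, y_dev))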
On 05/12/2016 08:07 AM, A neuman wrote:
That's actually what I did, and the difference is way too big.
Should I do it without GridSearchCV? I'm just wondering why grid search
is giving me overfitted values. I know that these are the best params and so
on... but I thought I could skip the manual part where I test the params on
my own.
2016-05-12 13:02 GMT+02:00 A neuman :
> Thanks for the answer!
>
> But how should I check whether it's overfitted or not?
Do a development / evaluation split of your dataset, for instance with
the train_test_split utility first. Then train your GridSearchCV model
on the development set and evaluate the refit model on the held-out
evaluation set.
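Something along these lines (just a sketch; the estimator, parameter grid
and scoring are placeholders you'd replace with your own):

    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    # Hold out an evaluation set that GridSearchCV never sees.
    X_dev, X_eval, y_dev, y_eval = train_test_split(
        X, y, test_size=0.25, random_state=0)

    grid_search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
        scoring="roc_auc", cv=5)
    grid_search.fit(X_dev, y_dev)

    # Cross-validated estimate on the development set:
    print(grid_search.best_score_)

    # Unbiased check on the evaluation set:
    print(roc_auc_score(y_eval, grid_search.predict_proba(X_eval)[:, 1]))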
This would be much clearer if you provided some code, but I think I get
what you're saying.
The final GridSearchCV model is trained on the full training set, so the
fact that it perfectly fits that data with random forests is not altogether
surprising. What you can say about the parameters is that, among the
candidates tried, they gave the best cross-validated score.
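For instance (assuming grid_search is your fitted GridSearchCV; attribute
names follow the current scikit-learn API):

    # The parameter setting that won the cross-validation:
    print(grid_search.best_params_)

    # Mean cross-validated score for every candidate, to see how close
    # the alternatives were:
    for params, score in zip(grid_search.cv_results_["params"],
                             grid_search.cv_results_["mean_test_score"]):
        print(params, score)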