It means that in your script you should print the score on the validation set instead of the test set.
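A minimal sketch of what that could look like, assuming a gradient boosting regressor and synthetic data in place of your real dataset (the split sizes, the estimator choice and the contents of the params dict here are illustrative assumptions, not taken from your script):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the real dataset (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# First set aside the final test set, then carve a validation set
# out of the remaining data.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

# Hypothetical params dict: these are the values to tweak.
params = {"n_estimators": 100, "max_depth": 3,
          "learning_rate": 0.1, "random_state": 0}

model = GradientBoostingRegressor(**params).fit(X_train, y_train)

# Report the validation score while tuning; X_test / y_test stay untouched.
val_score = model.score(X_val, y_val)
print("validation R^2: %.3f" % val_score)
```

Only X_val / y_val are used for scoring at this stage; X_test / y_test are kept aside for the single final evaluation.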
Then you are allowed to tweak the values in your params dict to see if you can find values that improve that score. Once you are confident that you can no longer improve the validation score via parameter tweaking (or feature engineering), you can evaluate your best model on the final test set (only once).

The final test score may be a bit worse than the validation score, because the parameters were selected to do well on the validation set. If that's the case, you should trust the test score as the most realistic estimate of the true generalization performance of your final model.

You might also be interested in implementing early stopping with warm-started models to adjust the value of n_estimators. For instance, see (towards the last third of the notebook):

https://github.com/ogrisel/notebooks/blob/master/sklearn_demos/Gradient%20Boosting.ipynb

-- 
Olivier
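
P.S. A rough sketch of the early stopping idea mentioned above, in case it helps. It is not the code from the notebook: the synthetic data, the step size, the patience value and the upper bound on the number of trees are all assumptions made for illustration. It relies on the warm_start option of GradientBoostingRegressor, which keeps the already fitted trees and only adds new ones when n_estimators is increased.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the real dataset (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(
    warm_start=True, learning_rate=0.1, max_depth=3, random_state=0)

# Hypothetical settings: grow 10 trees at a time and stop after
# 5 consecutive rounds without improvement on the validation set.
step, patience, max_estimators = 10, 5, 500
best_score, best_n = float("-inf"), 0
rounds_without_improvement = 0

for n in range(step, max_estimators + 1, step):
    model.set_params(n_estimators=n)
    # With warm_start=True only the additional trees are fitted.
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_n = score, n
        rounds_without_improvement = 0
    else:
        rounds_without_improvement += 1
        if rounds_without_improvement >= patience:
            break

print("best n_estimators: %d (validation R^2: %.3f)" % (best_n, best_score))
```

Note that when the loop stops, the model holds more trees than best_n, so you would refit a fresh model with n_estimators=best_n before the final evaluation on the test set.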
