Hi Yanir.
I was not aware that GradientBoosting had oob scores.
Is that even possible / sensible? It definitely does not do what it promises :-/

Peter, any thoughts?

Cheers,
Andy

On 03/22/2013 11:39 AM, Yanir Seroussi wrote:
Hi,

I'm new to the mailing list, so I apologise if this has been asked before.

I want to use the oob_score_ in GradientBoostingRegressor to determine the optimal number of iterations without relying on an external validation set, so I set the subsample parameter to 0.5 and trained the model. However, I've noticed that oob_score_ improves in much the same way as the in-bag scores (train_score_): it drops very fast and keeps improving regardless of the number of iterations.
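For comparison, here is a minimal sketch of the external-validation-set approach I'm trying to avoid, using staged_predict to score every iteration (the Friedman #1 synthetic dataset and the manual train/validation split are just for illustration):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data, split by hand into train and validation parts.
X, y = make_friedman1(n_samples=1000, random_state=0)
X_train, X_val = X[:600], X[600:]
y_train, y_val = y[:600], y[600:]

est = GradientBoostingRegressor(n_estimators=200, subsample=0.5, random_state=0)
est.fit(X_train, y_train)

# staged_predict yields predictions after each boosting iteration,
# so we can score every stage on the held-out validation set.
val_errors = [mean_squared_error(y_val, pred)
              for pred in est.staged_predict(X_val)]
best_n = int(np.argmin(val_errors)) + 1
```

This works, but it costs data that could otherwise be used for training, which is why an OOB-based estimate would be preferable.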

Digging through the code in ensemble/gradient_boosting.py, it seems like the cause is that oob_score_[i] includes previous trees that were trained on the OOB instances of the i-th subsample. Isn't the OOB score supposed to be calculated for each OOB instance using only trees for which that instance wasn't used in training (as is done for random forests)?
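To illustrate what a per-iteration OOB estimate would look like: a minimal sketch, assuming a scikit-learn version that exposes the oob_improvement_ attribute (the improvement in loss on the out-of-bag samples at each iteration, available when subsample < 1.0; the Friedman #1 dataset is again just for illustration). Summing the improvements gives a cumulative OOB curve whose maximum suggests a stopping point:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=1000, random_state=0)

# subsample < 1.0 enables stochastic gradient boosting, so each tree
# has out-of-bag instances on which its improvement can be measured.
est = GradientBoostingRegressor(n_estimators=200, subsample=0.5, random_state=0)
est.fit(X, y)

# oob_improvement_[i] is the OOB loss improvement of iteration i;
# the cumulative sum peaks where further trees stop helping OOB loss.
cum_oob = np.cumsum(est.oob_improvement_)
best_n = int(np.argmax(cum_oob)) + 1
```

Whether this matches the random-forest-style OOB computation exactly is a separate question, but it at least measures each tree only on instances it did not see.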

Cheers,
Yanir


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

