I saw the new warm_start parameter on the RandomForestClassifier class and
was curious about its most common use. I can see two uses for it:
(1) instead of fitting a huge forest in one go, fit it incrementally on the
same data set and check the improvement with cross-validation; (2) fit the
existing model on a whole new data set, effectively doing a kind of "online
mini-batch Random Forest", which might be useful for some of my use cases.
The first use is very straightforward, and it seems obvious that a forest
trained twice on the same dataset with warm_start=True should perform
similarly to a forest trained only once with double the
n_estimators.
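For concreteness, here is a minimal sketch of what I mean by the first use, growing the forest in stages on the same data and watching the validation score (the dataset, batch sizes, and split are just illustrative):

```python
# Use case (1): grow a forest incrementally on one dataset with warm_start,
# checking held-out accuracy after each increase in n_estimators.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(warm_start=True, random_state=0)
for n in (25, 50, 100, 200):
    clf.n_estimators = n          # raise the target; existing trees are kept
    clf.fit(X_train, y_train)     # only the new trees are trained
    print(n, clf.score(X_val, y_val))
```

If the score plateaus early, you can stop adding trees instead of committing to a huge n_estimators up front.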
The second case, though, seems to break a few of the statistical
assumptions behind Random Forests. Has anyone tried it successfully?
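To make the second use concrete, here is a sketch of the "mini-batch" idea as I imagine it. Note the caveat this makes visible: with warm_start, only the newly added trees are fit on the data passed to fit(); previously grown trees are never updated, so each tree sees a single batch rather than the whole stream (batch sizes and tree counts are made up for illustration):

```python
# Use case (2): add a fresh batch of trees for each new batch of data.
# Old trees keep their original training data; new trees see only the
# current batch.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=3000, random_state=0)
batches = [(X[i:i + 1000], y[i:i + 1000]) for i in range(0, 3000, 1000)]

clf = RandomForestClassifier(n_estimators=0, warm_start=True, random_state=0)
for X_batch, y_batch in batches:
    clf.n_estimators += 30        # 30 new trees per incoming batch
    clf.fit(X_batch, y_batch)     # trains only the 30 new trees
```

Since no tree ever bootstraps across batches, this is not a true online Random Forest, which is part of what makes me wonder whether it works in practice.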
Thanks,
Rafael Calsaverini
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general