Yes, I’m using it. Here’s an example. I also posted my own code yesterday; you 
can find it in the archives.

http://scikit-learn.org/dev/auto_examples/ensemble/plot_ensemble_oob.html

It works well with n_jobs > 0; ignore the comment in the example. Examining the 
code in forest.py, setting warm_start=True means that only the incremental 
number of additional trees is fit.
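
Here is a minimal sketch of that behavior on synthetic data (the dataset and 
the particular parameter values are illustrative assumptions, not taken from 
the example above):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=50, warm_start=True,
                             n_jobs=-1, random_state=0)
clf.fit(X, y)                      # builds the first 50 trees

clf.set_params(n_estimators=100)   # raise the total tree count
clf.fit(X, y)                      # builds only the 50 additional trees
print(len(clf.estimators_))        # 100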

For large datasets (400,000 cases, 3,500 features), I’m experiencing 
bottlenecks before and after the trees are actually built.

https://github.com/scikit-learn/scikit-learn/issues/4898

Using the OOB error versus the number of trees as an evaluation tool is 
mentioned in the paper

http://projecteuclid.org/euclid.ssu/1257431567

Free for download.
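
For reference, here is a rough sketch of that evaluation in the spirit of the 
example linked at the top, again on an assumed synthetic dataset, tracking the 
OOB error rate as the forest grows via warm_start:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(warm_start=True, oob_score=True,
                             n_jobs=-1, random_state=0)

for n in range(25, 201, 25):
    clf.set_params(n_estimators=n)
    clf.fit(X, y)                  # adds only the new trees
    print(n, 1 - clf.oob_score_)   # OOB error rate at n trees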


Dale Smith, Ph.D.
Data Scientist

d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 
30305


From: Rafael Calsaverini [mailto:rafael.calsaver...@gmail.com]
Sent: Thursday, June 25, 2015 11:59 AM
To: scikit-learn-general
Subject: [Scikit-learn-general] Warm_start on Random Forest Classifiers

I saw the new warm_start parameter on the RandomForestClassifier class and was 
curious about its most common use. I can see two uses for it: (1) instead of 
fitting a huge forest in one go, fit it several times on the same data set and 
check the improvement in cross-validation; (2) fit each increment on a whole 
new data set, giving some kind of "online mini-batch Random Forest", which 
might be useful for some of my use cases.

The first use is very straightforward, and it seems obvious that a forest 
trained twice on the same dataset with warm_start=True will have performance 
similar to a forest trained only "once" with double the n_estimators.

The second case, though, seems to break a few of the statistical assumptions 
behind Random Forests. Has anyone tried it successfully?
Thanks,
Rafael Calsaverini