Hi Dale

Thanks for the code sample! Indeed, warm_start does not disable
parallelization, I can confirm by both running your code and checking the
source. Moreover, that example you mentioned was added on May, 2nd,
and it doesn't
look
<https://github.com/scikit-learn/scikit-learn/commits/master/sklearn/ensemble/forest.py>
like there were any relevant changes to the master branch since.


On Wed, Jun 24, 2015 at 6:06 PM, Dale Smith <dsm...@nexidia.com> wrote:

>  Hello,
>
>
>
> Version 0.16.1 adds warm_start to RandomForestClassifier, but the
> documentation doesn't include a note that warm_start disables
> parallelization. I found a reference to this in a comment in the "OOB
> Errors for Random Forests" example in the development documentation.
>
>
>
> http://scikit-learn.org/dev/auto_examples/ensemble/plot_ensemble_oob.html
>
>
>
> Setting both does not generate a warning or error. My own testing
> indicates that warm_start allows the use of n_jobs for version 0.16.1. I
> can see the processor use in Task Manager.
>
>
>
> I am using a Numpy 64bit experimental build with Mingw-w64 and OpenBlas
> provided by Carl Kleffner (
> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads). I have a VM
> with the out-of-the box numpy and scikit-learn version 0.16.1, and I
> observe the same behavior – use of more than one core as confirmed by Task
> Manager. For the record, I’m using Windows 7 and Anaconda 3.
>
>
>
> Am I missing something? Does warm_start allow the use of more than one
> processor? Has there been a change in the development tree that affects
> parallelization? I’ve searched around for an answer but can’t find anything
> relevant. Here is some reproducible code.
>
>
>
> The RandomForestClassifier constructor documentation doesn’t address these
> concerns. I’m willing to edit the documentation myself once this issue is
> clarified.
>
>
>
> Thanks.
>
>
>
> import sklearn as sk
> from pandas import DataFrame
> from sklearn.ensemble import RandomForestClassifier
> from sklearn.datasets import make_classification
> import seaborn as sns
> import numpy as np
>
> print(sk.__version__)
>
> # Build a classification task using 300 informative features
> # There are 100,000 samples
> X, y = make_classification(n_samples=100000,
>                            n_features=500,
>                            n_informative=30,
>                            n_redundant=0,
>                            n_repeated=0,
>                            n_classes=2,
>                            random_state=0,
>                            shuffle=False)
>
> forest = RandomForestClassifier(n_jobs=10, random_state=100,
> oob_score=True, bootstrap=True, warm_start=True)
>
> n_estimators = 200
> rng = range(50, n_estimators + 1, 25)
> error_rate = DataFrame(index=np.arange(0, len(rng)), columns=('Number of
> Trees', 'OOB Error'))
> for i, n_trees in enumerate(rng):
>    print("Fit training set for {0:d} trees.".format(n_trees))
>    forest.set_params(n_estimators=n_trees)
>    forest.set_params(n_jobs=10)
>    params = forest.get_params()
>    forest.fit(X, y)
>    error_rate.loc[i] = [n_trees, 1 - forest.oob_score_]
>
> sns.lmplot('Number of Trees', 'OOB Error',
> error_rate).savefig("test_warm_start.png")
>
> print("Finished")
>
>
>
>
>
>
> *Dale Smith, Ph.D.*
> Data Scientist
> ​
> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20logo.png]
> <http://nexidia.com/>
>
> * d.* 404.495.7220 x 4008   *f.* 404.795.7221
> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta,
> GA 30305
>
> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20Blog.jpeg]
> <http://blog.nexidia.com/> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20LinkedIn.jpeg]
> <https://www.linkedin.com/company/nexidia> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20Google.jpeg]
> <https://plus.google.com/u/0/107921893643164441840/posts> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20twitter.jpeg]
> <https://twitter.com/Nexidia> [image:
> http://host.msgapp.com/Extranet/96621/Signature%20Images/sig%20Youtube.jpeg]
> <https://www.youtube.com/user/NexidiaTV>
>
>
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors 
network devices and physical & virtual servers, alerts via email & sms 
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to