Andy, BaseForest's fit method uses the same parallel loop, regardless of
warm-start. So, as you state, it's just that the example only adds a single
additional estimator at a time.
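
A minimal sketch of that point, assuming a small synthetic dataset and an
illustrative step of 10 (the names min_estimators, max_estimators, step and
trees_added are made up for this illustration, not taken from the example).
Each warm-started fit call only builds the trees that are still missing, so
the step size bounds how many trees a single call can grow in parallel:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

forest = RandomForestClassifier(warm_start=True, n_jobs=2, random_state=0)

# Hypothetical bounds and step; a step of 1 would add one tree per call.
min_estimators, max_estimators, step = 20, 50, 10

trees_added = []
for n_trees in range(min_estimators, max_estimators + 1, step):
    # Number of trees already in the ensemble before this fit call.
    already_fitted = len(getattr(forest, "estimators_", []))
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)  # only the missing trees are built here
    trees_added.append(len(forest.estimators_) - already_fitted)

print(trees_added)  # [20, 10, 10, 10]
```

With step set to 1, every call after the first would add exactly one tree,
which leaves nothing to spread across workers.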


On Tue, Jun 30, 2015 at 12:48 PM, Andreas Mueller <t3k...@gmail.com> wrote:

>  Unless I am misremembering how warm starts are implemented (tree growers
> around?)
> the comment seems badly phrased. I think what it means to say is that
> warm-starting repeatedly with the number of trees increasing by increments
> of 1
> will make the fitting serial (you only build a single tree at a time).
> If you changed the loop to
>
> range(min_estimators, max_estimators + 1, 10)
>
> it could build up to 10 estimators in parallel.
> If anyone wants to check with the code / confirm, that would be great, and
> we should fix the wording in the example.
>
>
>
> On 06/24/2015 02:01 PM, Artem wrote:
>
>  Hi Dale
>
>  Thanks for the code sample! Indeed, warm_start does not disable
> parallelization; I can confirm this by both running your code and checking
> the source. Moreover, the example you mentioned was added on May 2nd, and it
> doesn't look
> <https://github.com/scikit-learn/scikit-learn/commits/master/sklearn/ensemble/forest.py>
> like there were any relevant changes to the master branch since.
>
>
> On Wed, Jun 24, 2015 at 6:06 PM, Dale Smith <dsm...@nexidia.com> wrote:
>
>>  Hello,
>>
>>
>>
>> Version 0.16.1 adds warm_start to RandomForestClassifier, but the
>> documentation doesn't include a note that warm_start disables
>> parallelization. I found a reference to this in a comment in the "OOB
>> Errors for Random Forests" example in the development documentation.
>>
>>
>>
>> http://scikit-learn.org/dev/auto_examples/ensemble/plot_ensemble_oob.html
>>
>>
>>
>> Setting both does not generate a warning or error. My own testing
>> indicates that warm_start allows the use of n_jobs for version 0.16.1. I
>> can see the processor use in Task Manager.
>>
>>
>>
>> I am using a NumPy 64-bit experimental build with Mingw-w64 and OpenBLAS
>> provided by Carl Kleffner (
>> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads). I also have a
>> VM with the out-of-the-box numpy and scikit-learn version 0.16.1, and I
>> observe the same behavior – use of more than one core as confirmed by Task
>> Manager. For the record, I’m using Windows 7 and Anaconda 3.
>>
>>
>>
>> Am I missing something? Does warm_start allow the use of more than one
>> processor? Has there been a change in the development tree that affects
>> parallelization? I’ve searched around for an answer but can’t find anything
>> relevant. Here is some reproducible code.
>>
>>
>>
>> The RandomForestClassifier constructor documentation doesn’t address
>> these concerns. I’m willing to edit the documentation myself once this
>> issue is clarified.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> import sklearn as sk
>> from pandas import DataFrame
>> from sklearn.ensemble import RandomForestClassifier
>> from sklearn.datasets import make_classification
>> import seaborn as sns
>> import numpy as np
>>
>> print(sk.__version__)
>>
>> # Build a classification task using 30 informative features out of 500
>> # There are 100,000 samples
>> X, y = make_classification(n_samples=100000,
>>                            n_features=500,
>>                            n_informative=30,
>>                            n_redundant=0,
>>                            n_repeated=0,
>>                            n_classes=2,
>>                            random_state=0,
>>                            shuffle=False)
>>
>> forest = RandomForestClassifier(n_jobs=10, random_state=100,
>>                                 oob_score=True, bootstrap=True,
>>                                 warm_start=True)
>>
>> n_estimators = 200
>> rng = range(50, n_estimators + 1, 25)
>> error_rate = DataFrame(index=np.arange(0, len(rng)),
>>                        columns=('Number of Trees', 'OOB Error'))
>> for i, n_trees in enumerate(rng):
>>    print("Fit training set for {0:d} trees.".format(n_trees))
>>    forest.set_params(n_estimators=n_trees)
>>    forest.set_params(n_jobs=10)
>>    params = forest.get_params()
>>    forest.fit(X, y)
>>    error_rate.loc[i] = [n_trees, 1 - forest.oob_score_]
>>
>> sns.lmplot('Number of Trees', 'OOB Error',
>>            error_rate).savefig("test_warm_start.png")
>>
>> print("Finished")
>>
>>
>>
>>
>>
>>
>> *Dale Smith, Ph.D.*
>> Data Scientist
>>
>> * d.* 404.495.7220 x 4008   *f.* 404.795.7221
>> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 |
>> Atlanta, GA 30305
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Monitor 25 network devices or servers for free with OpManager!
>> OpManager is web-based network management software that monitors
>> network devices and physical & virtual servers, alerts via email & sms
>> for fault. Monitor 25 devices for free with no restriction. Download now
>> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
>
>
>
>
>
>