Re: [Scikit-learn-general] Question and comments on RandomForests

Olivier Grisel Tue, 10 Jan 2012 06:49:10 -0800

2012/1/10 Andreas <[email protected]>:
> On 01/10/2012 03:21 PM, Gilles Louppe wrote:
>>> The current code works great for me (thanks for contributing!!!!),
>>> still it would mean a lot if I could make it even faster. At the moment
>>> it takes me about 8 hours to grow a tree with only a subset of the
>>> features that I actually want to use.... I have a 128 core cluster here
>>> but then building a forest with 1000 trees would still take roughly
>>> 6 days....
>>>
>> Did you stick to random forests? They are much slower than extra-trees
>> (because they look for the best splits, while in extra-trees splits
>> are drawn at random). The two are also comparable in terms of
>> accuracy. In addition, from experience, on large datasets,
>> bootstrap doesn't help. You can turn it off (as long as max_features
>> << n_features, with RFs).
>>
> Up to now I used RandomForests.
> Thanks for the tips. I'll give it a try.


Out of curiosity, could you please report comparative timings on your data?

Also, I think Gilles' remark should be added (and made prominent) to
the narrative documentation and also to the "See also" section of the
docstrings of RandomForests.
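
For reference, here is a minimal sketch of how such a timing comparison
could be run. It assumes a recent scikit-learn release (parameter names
and defaults may differ in older versions) and uses a small synthetic
dataset from make_classification as a stand-in for the real data. The
helper time_fit and all sizes and hyperparameters are made up for
illustration only.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier


def time_fit(model, X, y):
    """Return the wall-clock time in seconds needed to fit `model`."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in data (assumption): replace with your own X, y.
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=10, random_state=0
)

# Settings shared by all three ensembles; max_features="sqrt" keeps
# max_features well below n_features, as Gilles suggests for RFs.
common = dict(n_estimators=20, max_features="sqrt", random_state=0)

models = {
    "RandomForest, bootstrap": RandomForestClassifier(bootstrap=True, **common),
    "RandomForest, no bootstrap": RandomForestClassifier(bootstrap=False, **common),
    "ExtraTrees, no bootstrap": ExtraTreesClassifier(bootstrap=False, **common),
}

# Fit each ensemble once and record how long it took.
timings = {name: time_fit(model, X, y) for name, model in models.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} s")
```

Timings on a toy dataset like this only show how to set up the
comparison; the actual gap will depend on n_samples, n_features and
max_features on the real data.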

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
