2012/1/10 Andreas <[email protected]>:
> On 01/10/2012 03:21 PM, Gilles Louppe wrote:
>>> The current code works great for me (thanks for contributing!!!!),
>>> still it would mean a lot if I could make it even faster. At the moment
>>> it takes me about 8 hours to grow a tree with only a subset of the features
>>> that I actually want to use.... I have a 128-core cluster here, but
>>> building a forest with 1000 trees would still take roughly 6 days....
>>>
>> Did you stick to random forests? They are much slower than extra-trees
>> (because they look for the best splits, while in extra-trees splits
>> are drawn at random). They are also comparable to each other in terms of
>> accuracy. In addition, from experience, on large to big datasets,
>> bootstrap doesn't help. You can turn it off (as long as max_features
>> << n_features, with RFs).
>>
> Up to now I used RandomForests.
> Thanks for the tips. I'll give it a try.
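To make the speed argument above concrete, here is a minimal, simplified sketch (not scikit-learn's actual tree code) of the two split strategies on a single feature, assuming Gini impurity as the criterion. The helper names `gini`, `split_score`, `best_split` and `random_split` are hypothetical, chosen for illustration only. The random-forest style search scores every candidate threshold, while the extra-trees style draws one threshold at random and scores only that one.

```python
import random
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_score(x: List[float], y: List[int], threshold: float) -> float:
    """Weighted Gini impurity of the children produced by x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def best_split(x: List[float], y: List[int]) -> Tuple[Optional[float], float]:
    """Random-forest style: score every midpoint between distinct values.

    This naive version rescoring each threshold from scratch is quadratic;
    real tree builders sort once and sweep, but still visit every candidate.
    """
    values = sorted(set(x))
    best_t: Optional[float] = None
    best_s = float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        s = split_score(x, y, t)
        if s < best_s:
            best_t, best_s = t, s
    return best_t, best_s


def random_split(x: List[float], y: List[int],
                 rng: random.Random) -> Tuple[float, float]:
    """Extra-trees style: draw one threshold uniformly in [min(x), max(x)]."""
    t = rng.uniform(min(x), max(x))
    return t, split_score(x, y, t)
```

On a toy feature such as `x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` with labels `[0, 0, 0, 1, 1, 1]`, `best_split` scores five thresholds and returns `(6.5, 0.0)`, while `random_split` scores a single one and may land on a worse split. Repeated over `max_features` features at every node of every tree, that difference in work is roughly where the extra-trees speed-up comes from; the extra randomness is then averaged out by the ensemble.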
Out of curiosity, can you please report comparative timings on your data?

Also, I think Gilles' remark should be added (and made prominent) to the narrative documentation, and also to the "See also" section of the RandomForest docstrings.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
