> The current code works great for me (thanks for contributing!!!!),
> still it would mean a lot if I could make it even faster. At the moment
> it takes me about 8 hours to grow a tree with only a subset of the features
> that I actually want to use.... I have a 128 core cluster here but then
> building a forest with 1000 trees would still take roughly 6 days....
Did you stick to random forests? They are much slower than extra-trees, because they search for the best splits, while in extra-trees the splits are drawn at random. The two are comparable in terms of accuracy.

In addition, in my experience, bootstrapping doesn't help on large datasets. With RFs you can turn it off, as long as max_features << n_features.

Gilles

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
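
To make the suggestion above concrete, here is a minimal sketch, assuming a recent scikit-learn release. The synthetic data, its size, and the parameter values (n_estimators=20, max_features="sqrt") are placeholders chosen for illustration and are not taken from this thread. It fits a random forest with bootstrap turned off and an extra-trees ensemble on the same data, and times each fit.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic stand-in data; the sizes are arbitrary assumptions.
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=10, random_state=0
)

# Shared settings: max_features="sqrt" keeps max_features well below
# n_features, and n_jobs=-1 builds the trees in parallel on all local cores.
common = dict(n_estimators=20, max_features="sqrt", n_jobs=-1, random_state=0)

models = {
    # Random forest with bootstrap resampling turned off.
    "random forest, no bootstrap": RandomForestClassifier(bootstrap=False, **common),
    # Extra-trees: split thresholds are drawn at random.
    "extra-trees": ExtraTreesClassifier(bootstrap=False, **common),
}

timings = {}
for name, model in models.items():
    start = perf_counter()
    model.fit(X, y)
    timings[name] = perf_counter() - start
    print(f"{name}: fit in {timings[name]:.2f}s")
```

The timings depend on the data and the machine, so the numbers are only meaningful relative to each other, and accuracy should be compared separately on held-out data.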
