Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-25 Thread Andreas Mueller
On 03/22/2016 03:27 AM, Gilles Louppe wrote:
> Unfortunately, the most important parameters to adjust to maximize
> accuracy are often those controlling the randomness in the algorithm,
> i.e. max_features for which this strategy is not possible.
>
> That being said, in the case of boosting, I th

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-22 Thread Lam Dang
Interesting. Yes, max_features is probably the most important parameter. However, those other parameters may also contribute substantially to reducing overfitting. I would probably run some tests, but I am not experienced with the low-level API of scikit-learn. Any experienced scikit-learn contributors w
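For readers unfamiliar with the low-level API Lam mentions: a fitted forest exposes its individual trees through estimators_, and each tree's structure through the tree_ attribute. A minimal sketch of the attributes an off-data evaluation would have to inspect (the dataset and sizes are arbitrary placeholders):

# Sketch only: inspecting the low-level tree structure that an
# "off-data" evaluation of max_depth / min_samples_split would work with.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

tree = forest.estimators_[0].tree_      # low-level tree object of the first estimator
print(tree.max_depth)                   # depth actually reached during fitting
print(tree.node_count)                  # total number of nodes
print(tree.children_left[:5])           # -1 marks a leaf node
print(tree.n_node_samples[:5])          # training samples routed through each node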

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-22 Thread Gilles Louppe
Unfortunately, the most important parameters to adjust to maximize accuracy are often those controlling the randomness in the algorithm, i.e. max_features, for which this strategy is not possible. That being said, in the case of boosting, I think this strategy would be worth automating, e.g. to a
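For boosting, scikit-learn already exposes something close to the path Gilles describes: staged_predict evaluates every intermediate number of stages from a single fit. A minimal sketch (the synthetic dataset, split, and metric are placeholders, not part of any proposal in the thread):

# One fit, every value of n_estimators evaluated via staged_predict.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Validation accuracy after 1, 2, ..., 200 boosting stages, without refitting.
scores = [accuracy_score(y_val, y_pred) for y_pred in gbm.staged_predict(X_val)]
best_n = 1 + max(range(len(scores)), key=scores.__getitem__)
print(best_n, scores[best_n - 1])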

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-21 Thread Mathieu Blondel
Related issue: https://github.com/scikit-learn/scikit-learn/issues/3652

On Tue, Mar 22, 2016 at 6:32 AM, Jacob Schreiber wrote:
> It should if you're using those parameters. It's basically similar to
> calculating the regularization path for LASSO, since these are also
> regularization terms. I

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-21 Thread Jacob Schreiber
It should if you're using those parameters. It's basically similar to calculating the regularization path for LASSO, since these are also regularization terms. I think this would probably be a good addition if there was a clean implementation for it.

On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang wrot
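To make Jacob's analogy concrete: for LASSO, one call to lasso_path fits the model for a whole grid of alpha values, sharing work across them instead of refitting from scratch per value. A minimal sketch on an arbitrary synthetic dataset:

# What a "regularization path" looks like for LASSO in scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=50, noise=1.0, random_state=0)

alphas, coefs, _ = lasso_path(X, y, n_alphas=100)
print(alphas.shape)                     # (100,) regularization strengths, strongest first
print(coefs.shape)                      # (50, 100) coefficients along the path
print(np.count_nonzero(coefs[:, -1]))   # sparsity at the weakest regularization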

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-21 Thread Lam Dang
Hi Jacob,

Thanks for your answer. Indeed you are right, some parameters cannot be adjusted off-data. Let's go through the parameter list to see which ones can be adjusted (a sketch for n_estimators follows below):

n_estimators: this is simple - the more the better
criterion: No
max_features: No
max_depth: Yes
min_samples_split: Yes
m
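n_estimators is the one parameter on this list that scikit-learn already lets you tune without refitting from scratch: with warm_start=True, each call to fit keeps the existing trees and only adds new ones. A minimal sketch, with an arbitrary dataset, split, and tree counts:

# Growing the same forest incrementally and scoring it at several sizes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(warm_start=True, random_state=0)
for n in (25, 50, 100, 200, 400):
    forest.set_params(n_estimators=n)   # fit() below only trains the added trees
    forest.fit(X_tr, y_tr)
    print(n, forest.score(X_val, y_val))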

Re: [Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning

2016-03-21 Thread Jacob Schreiber
Hi Lam,

The idea of exploiting redundancies to speed up algorithms is a good intuition. However, I don't think most attributes could be handled in this manner. For example, considering different numbers of max features in the splits would be difficult to calculate without storing all p
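For contrast, the baseline the thread is trying to speed up: a plain grid search over max_features refits the entire forest for every candidate value, because the chosen splits themselves change. A minimal sketch with arbitrary grid values and dataset:

# Exhaustive search over max_features, refitting the forest each time.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": [2, 5, 10, "sqrt", None]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)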