It should if you're using those parameters. It's basically similar to
calculating the regularization path for LASSO, since these are also
regularization terms. I think this would probably be a good addition if
there were a clean implementation for it.
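
For reference, the LASSO analogue in scikit-learn is lasso_path, which fits
once and returns coefficients along an entire grid of regularization
strengths, i.e. the same "fit once, read off every setting" behaviour that
is being proposed here for the tree parameters. A minimal sketch (the
synthetic dataset is only for illustration):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import lasso_path

    X, y = make_regression(n_samples=200, n_features=20, noise=1.0,
                           random_state=0)

    # One call traces the whole path instead of refitting a separate
    # Lasso from scratch for each alpha.
    alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
    print(alphas.shape, coefs.shape)  # (50,) and (20, 50)
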
On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
> Hi Jacob,
>
> Thanks for your answer. Indeed you are right: some parameters cannot be
> adjusted off-data. Let's go through the parameter list to see which ones
> can be adjusted:
> n_estimators : this is simple - the more the better
> criterion : No
> max_features : No
> max_depth : Yes
> min_samples_split : Yes
> min_samples_leaf : Yes
> min_weight_fraction_leaf : Yes
> max_leaf_nodes : Yes
> bootstrap : No
>
> So basically the bootstrap-related parameters cannot be adjusted, while
> tree parameters can. It should still speed up the search, right?
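>
> To make one of these concrete: min_samples_split can be applied after the
> fact by collapsing any node that was split on fewer samples than the
> threshold. Below is a rough sketch; it mutates scikit-learn's private
> tree_ arrays, which is an internal detail rather than a supported API, so
> treat it as an illustration only (prune_min_samples_split is just a name
> I made up):
>
>     from sklearn.tree._tree import TREE_LEAF
>
>     def prune_min_samples_split(dt, min_samples_split):
>         # Collapse every internal node that was split on fewer than
>         # min_samples_split training samples. Orphaned descendants stay
>         # allocated but are never visited at predict time, and tree_.value
>         # already holds the statistics the new leaf needs to predict.
>         t = dt.tree_
>         for node in range(t.node_count):
>             if (t.children_left[node] != TREE_LEAF
>                     and t.n_node_samples[node] < min_samples_split):
>                 t.children_left[node] = TREE_LEAF
>                 t.children_right[node] = TREE_LEAF
>
> Applied to every tree in forest.estimators_, this should behave like a
> forest grown with that min_samples_split in the first place, up to the
> caveats Jacob raised.
>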
> Best,
> Lam
>
> 2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
>
>> Hi Lam
>>
>> The idea of exploiting redundancies to speed up algorithms is a good
>> intuition. However, I don't think most of the parameters could be handled
>> this way. For example, evaluating different values of max_features would
>> be difficult without storing every candidate split at each node and then
>> restricting the set that is considered. And since each split depends on
>> the splits above it, modifying a split in the middle of a tree (such as
>> changing the feature it splits on) is hard to do without simply regrowing
>> the subtree below it.
>>
>> Jacob
>>
>> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>>
>>> Hello scikit-learners,
>>>
>>> Here is an idea to accelerate parameter tuning for Random Forest and
>>> Extra Trees. I would be very interested to know whether the idea is
>>> already exploited somewhere and whether it makes sense.
>>>
>>> Let's say we have a data set split into train and validation sets
>>> (cross-validation also works).
>>>
>>> The usual process for tuning a Random Forest is to try different sets of
>>> parameters, check validation performance, iterate, and in the end keep
>>> the model with the best validation score.
>>>
>>> The idea to improve this process is:
>>> - Fit the model once with all the trees grown to their maximum size, and
>>> save this model as a baseline
>>> - For any set of parameters, produce the new model by pruning the trees
>>> of the baseline model according to those parameters. For example, for
>>> max_depth=5, one can simply remove all nodes deeper than 5. This should
>>> be much faster than regrowing the trees since it doesn't need to refit
>>> the model
>>> - Use validation (or cross-validation) performance to choose the best
>>> model as usual.
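>>>
>>> As a rough sketch of the whole loop (prune_to_depth below is a made-up
>>> helper, and it relies on scikit-learn's private tree_ arrays, which is
>>> not a supported API, so this is an illustration rather than a finished
>>> implementation):
>>>
>>>     from copy import deepcopy
>>>     from sklearn.datasets import make_classification
>>>     from sklearn.ensemble import RandomForestClassifier
>>>     from sklearn.model_selection import train_test_split
>>>     from sklearn.tree._tree import TREE_LEAF
>>>
>>>     def prune_to_depth(dt, max_depth):
>>>         # Turn every node at depth >= max_depth into a leaf, in place.
>>>         t = dt.tree_
>>>         stack = [(0, 0)]  # (node_id, depth)
>>>         while stack:
>>>             node, depth = stack.pop()
>>>             if t.children_left[node] == TREE_LEAF:
>>>                 continue  # already a leaf
>>>             if depth >= max_depth:
>>>                 # Detach the subtree; the orphaned nodes are simply
>>>                 # never visited at predict time.
>>>                 t.children_left[node] = TREE_LEAF
>>>                 t.children_right[node] = TREE_LEAF
>>>             else:
>>>                 stack.append((t.children_left[node], depth + 1))
>>>                 stack.append((t.children_right[node], depth + 1))
>>>
>>>     X, y = make_classification(n_samples=1000, random_state=0)
>>>     X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
>>>
>>>     # Fit the baseline once, trees fully grown (max_depth=None).
>>>     baseline = RandomForestClassifier(n_estimators=100, random_state=0)
>>>     baseline.fit(X_tr, y_tr)
>>>
>>>     # Derive each candidate from a pruned copy of the baseline and
>>>     # score it on the validation set.
>>>     scores = {}
>>>     for depth in (3, 5, 10):
>>>         model = deepcopy(baseline)
>>>         for dt in model.estimators_:
>>>             prune_to_depth(dt, depth)
>>>         scores[depth] = model.score(X_val, y_val)
>>>     print(scores)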
>>>
>>> This works (theoretically) because:
>>> - For any such parameters, the fitted trees are simply subtrees of the
>>> baseline trees grown to their maximum (except for criterion, but that
>>> probably matters less)
>>> - Trees are grown independently of each other (so this idea will not
>>> work for GBM)
>>>
>>> That's it. I am very interested in any feedback: whether it makes sense,
>>> whether it has been done somewhere else already, and whether it will work.
>>>
>>> Best regards,
>>> Lam Dang