Unfortunately, the most important parameters to adjust to maximize accuracy
are often those controlling the randomness of the algorithm, e.g.
max_features, for which this strategy is not possible.

That being said, in the case of boosting, I think this strategy would be
worth automating, e.g. to adjust the number of trees.

Gilles
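A minimal sketch of the kind of automation described above: fit a gradient
boosted model once, then score every prefix of the ensemble with
staged_predict instead of refitting once per candidate n_estimators. The
dataset and variable names are made up for illustration (train_test_split
lives in sklearn.cross_validation in older releases):

    # Pick n_estimators for boosting by scoring every prefix of one fit.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    gbm = GradientBoostingClassifier(n_estimators=500, random_state=0)
    gbm.fit(X_tr, y_tr)

    # staged_predict yields predictions after 1, 2, ..., 500 trees, so a
    # single fit covers the whole grid of n_estimators values.
    scores = [accuracy_score(y_val, p) for p in gbm.staged_predict(X_val)]
    print("best n_estimators:", int(np.argmax(scores)) + 1)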
On 22 March 2016 at 01:44, Mathieu Blondel <math...@mblondel.org> wrote:
> Related issue:
> https://github.com/scikit-learn/scikit-learn/issues/3652
>
> On Tue, Mar 22, 2016 at 6:32 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
>>
>> It should if you're using those parameters. It's basically similar to
>> calculating the regularization path for LASSO, since these are also
>> regularization terms. I think this would probably be a good addition if
>> there was a clean implementation for it.
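The regularization-path analogy is worth spelling out: a LASSO solver can
compute solutions for a whole grid of penalties in one pass, which is
exactly the "fit once, evaluate many settings" pattern discussed here. A
small sketch using lasso_path with made-up data (no intercept is fit in
this path computation, which is fine for this centered synthetic data):

    # One call returns the whole path: coefs[:, i] pairs with alphas[i].
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import lasso_path
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=400, n_features=20, noise=1.0,
                           random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    alphas, coefs, _ = lasso_path(X_tr, y_tr, n_alphas=100)

    # Score every alpha on the validation set in one vectorized step.
    val_mse = ((X_val @ coefs - y_val[:, None]) ** 2).mean(axis=0)
    print("best alpha:", alphas[val_mse.argmin()])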
>>
>> On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>>>
>>> Hi Jacob,
>>>
>>> Thanks for your answer. Indeed you are right: some parameters cannot
>>> be adjusted off-data. Let's go through the parameter list to see which
>>> ones can be adjusted:
>>> n_estimators: this one is simple, the more the better
>>> criterion: no
>>> max_features: no
>>> max_depth: yes
>>> min_samples_split: yes
>>> min_samples_leaf: yes
>>> min_weight_fraction_leaf: yes
>>> max_leaf_nodes: yes
>>> bootstrap: no
>>>
>>> So basically the bootstrap-related parameters cannot be adjusted,
>>> while the tree parameters can. It should still speed up the search,
>>> right?
>>>
>>> Best,
>>> Lam
>>>
>>> 2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
>>>>
>>>> Hi Lam,
>>>>
>>>> The idea of exploiting redundancies to speed up algorithms is a good
>>>> intuition. However, I don't think most parameters could be handled
>>>> this way. For example, considering different values of max_features
>>>> at the splits would be difficult without storing all candidate splits
>>>> at each node and then reducing the set of considered ones. And since
>>>> every split depends on the splits above it, it may be difficult to
>>>> modify splits in the middle of the tree (such as changing the feature
>>>> a node was split on) without simply regrowing the subtree.
>>>>
>>>> Jacob
>>>>
>>>> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>>>>>
>>>>> Hello scikit-learners,
>>>>>
>>>>> Here is an idea to accelerate parameter tuning for Random Forest and
>>>>> Extra Trees. I am very interested to hear whether the idea is
>>>>> already exploited somewhere and whether it makes sense.
>>>>>
>>>>> Let's say we have a data set with a train and a validation split
>>>>> (cross-validation also works).
>>>>>
>>>>> The usual process for tuning a Random Forest is to try different
>>>>> sets of parameters, check the validation performance, reiterate, and
>>>>> keep the model with the best validation score at the end.
>>>>>
>>>>> The idea to improve this process is:
>>>>> - Fit the model once, growing all trees to the maximum, and save
>>>>> this model as a baseline.
>>>>> - For any set of parameters, produce the new model by reducing the
>>>>> trees of the baseline model according to those parameters. For
>>>>> example, for max_depth=5 one can simply remove all nodes deeper than
>>>>> 5 (see the sketch at the end of this thread). This should be much
>>>>> faster than regrowing the trees, since it doesn't refit the model.
>>>>> - Use the validation (or cross-validation) performance to choose the
>>>>> best model as usual.
>>>>>
>>>>> This works (theoretically) because:
>>>>> - For any parameters, the fitted trees are just subtrees of the
>>>>> baseline trees grown to the maximum (except for criterion, but that
>>>>> probably matters less).
>>>>> - Trees are grown independently of each other (so this idea will not
>>>>> work for GBM).
>>>>>
>>>>> That's it. I am very interested in any feedback: whether it makes
>>>>> sense, whether it was already done somewhere else, and whether it
>>>>> will work.
>>>>>
>>>>> Best regards,
>>>>> Lam Dang
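Following up on the max_depth example in the message above, here is a
rough sketch of the pruning step for a single fitted tree. It rewrites
the undocumented tree_ arrays in place, so the layout is version-dependent
and prune_to_depth is a made-up helper, not an existing scikit-learn API:

    # Turn a fully grown tree into its max_depth counterpart in place.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    TREE_LEAF = -1  # sentinel used in children_left/children_right

    def prune_to_depth(estimator, max_depth):
        """Make every node at depth >= max_depth a leaf, in place."""
        tree = estimator.tree_
        left, right = tree.children_left, tree.children_right
        stack = [(0, 0)]  # (node_id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            if left[node] == TREE_LEAF:  # already a leaf
                continue
            if depth >= max_depth:
                # Cut the subtree off; predictions then use the value
                # already stored at this node, as for a real leaf.
                left[node] = TREE_LEAF
                right[node] = TREE_LEAF
            else:
                stack.append((left[node], depth + 1))
                stack.append((right[node], depth + 1))

    # Sanity check: pruning the full tree should agree with a tree grown
    # directly at max_depth=5, barring ties between equally good splits,
    # because splitting is greedy and top-down.
    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    shallow = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
    full = DecisionTreeRegressor(random_state=0).fit(X, y)
    prune_to_depth(full, max_depth=5)
    print(np.allclose(shallow.predict(X), full.predict(X)))  # expect True

For a forest, the same helper would be applied to each element of
estimators_, working on a deep copy so the fully grown baseline stays
intact.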