Hi Jacob,
Thanks for your answer. You are right indeed: some parameters cannot be
adjusted off-data. Let's go through the parameter list to see which ones
can be adjusted:
n_estimators : this is simple - the more the better
criterion : No
max_features : No
max_depth : Yes
min_samples_split : Yes
min_samples_leaf : Yes
min_weight_fraction_leaf : Yes
max_leaf_nodes : Yes
bootstrap : No
So basically the bootstrap-related parameters cannot be adjusted after
fitting, while the tree-structure parameters can. It should still speed up
the search, right?
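To make the idea concrete, here is a rough sketch of what "pruning at
prediction time" could look like for max_depth: fit one forest with
unlimited depth, then emulate shallower trees by stopping the traversal of
each fitted tree early. The helper `tree_proba_at_depth` is hypothetical
(it is not a scikit-learn API), and a real implementation would need to be
much faster, but it shows that the information for any smaller max_depth is
already present in the baseline trees:

```python
# Sketch only: emulate smaller max_depth values on an already-fitted
# forest, instead of refitting for each candidate value.
# tree_proba_at_depth is a hypothetical helper, not part of scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def tree_proba_at_depth(tree, X, max_depth):
    """Class probabilities from a fitted sklearn decision tree,
    traversing only down to ``max_depth`` (emulating a shallower tree)."""
    t = tree.tree_
    proba = np.empty((X.shape[0], t.value.shape[2]))
    for i, x in enumerate(X):
        node, depth = 0, 0
        # Descend until we hit a leaf (children_left == -1) or the depth cap.
        while t.children_left[node] != -1 and depth < max_depth:
            if x[t.feature[node]] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
            depth += 1
        counts = t.value[node, 0]
        proba[i] = counts / counts.sum()
    return proba

X, y = make_classification(n_samples=200, random_state=0)
baseline = RandomForestClassifier(
    n_estimators=20, max_depth=None, random_state=0).fit(X, y)

# Evaluate several max_depth settings without refitting the forest:
# average the depth-capped per-tree probabilities, as the forest does.
for d in (2, 5, 10):
    proba = np.mean([tree_proba_at_depth(e, X, d)
                     for e in baseline.estimators_], axis=0)
    acc = (proba.argmax(axis=1) == y).mean()
```

The same trick would not apply to bootstrap or max_features, since those
change which splits were chosen in the first place.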
Best,
Lam
2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
> Hi Lam
>
> The idea of exploiting redundancies to speed up algorithms is a good
> intuition. However, I don't think most attributes could be handled this
> way. For example, evaluating different values of max_features would be
> difficult without storing all possible splits at each node and then
> reducing the set of considered ones. And since every split depends on the
> splits above it, modifying a split in the middle of a tree (such as
> changing the feature it splits on) may be impossible without simply
> regrowing the subtree below it.
>
> Jacob
>
> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>
>> Hello scikit-learners,
>>
>> Here is an idea to accelerate parameter tuning for Random Forest and
>> Extra Trees. I would be very interested to know whether the idea is
>> already exploited somewhere, and whether it makes sense.
>>
>> Let's say we have a data set with train and validation (cross-validation
>> also works).
>>
>> The usual process for tuning a Random Forest is to try different sets of
>> parameters, check validation performance, iterate, and finally keep the
>> model with the best validation score.
>>
>> The idea to improve this process is:
>> - Fit the model once, growing all the trees to maximum depth, and save
>> this model as a baseline
>> - For any set of parameters, produce the new model by pruning the trees
>> of the baseline model according to those parameters. For example, for
>> max_depth=5, one can simply remove all nodes at depth greater than 5.
>> This should be much faster than regrowing the trees, since it doesn't
>> require refitting the model
>> - Use validation (or cross-validation) performance to choose the best
>> model as usual.
>>
>> This works (in theory) because:
>> - For any parameter setting, the fitted trees are just subtrees of the
>> baseline trees grown to maximum depth (except for criterion, but that
>> probably matters less)
>> - Trees are grown independently of each other (so the idea will not work
>> for GBM, where each tree depends on the previous ones)
>>
>> That's it. I would be very interested in any feedback: whether it makes
>> sense, whether it was done somewhere else already, or whether it will
>> work.
>>
>> Best regards,
>> Lam Dang
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Transform Data into Opportunity.
>> Accelerate data analysis in your applications with
>> Intel Data Analytics Acceleration Library.
>> Click to learn more.
>> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>