Hi Lam

The idea of exploiting redundancies to speed up algorithms is a good
intuition. However, I don't think most parameters could be handled this
way. For example, evaluating different values of max_features for the
splits would be hard without storing every candidate split at each node
and then restricting the considered set after the fact. And since each
split depends on the splits above it, modifying a split in the middle of
a tree (such as changing the feature it splits on) would invalidate the
whole subtree below it, so you would have to regrow it anyway.
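
For what it's worth, depth pruning itself does look workable, since
dropping nodes below a depth cutoff never invalidates the splits above
them. A rough sketch (assuming nothing beyond the public tree_ arrays)
of finding which nodes a given depth cap would remove:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Fit a single tree to full depth once (the "baseline").
tree = DecisionTreeClassifier(random_state=0).fit(X, y).tree_

# Compute the depth of every node with a simple stack-based walk.
depth = np.zeros(tree.node_count, dtype=int)
stack = [(0, 0)]  # (node_id, depth of that node)
while stack:
    node, d = stack.pop()
    depth[node] = d
    left, right = tree.children_left[node], tree.children_right[node]
    if left != -1:  # internal node: push both children
        stack.append((left, d + 1))
        stack.append((right, d + 1))

# Nodes that a max_depth=5 refit would never have created:
print(np.sum(depth > 5), "of", tree.node_count, "nodes would be pruned")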

Jacob

On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:

> Hello scikit-learners,
>
> Here is an idea to accelerate parameter tuning for Random Forest and
> Extra Trees. I am very interested to hear whether the idea is already
> exploited somewhere and whether it makes sense.
>
> Let's say we have a data set split into a training set and a validation
> set (cross-validation also works).
>
> The usual process for tuning a Random Forest today is to try different
> sets of parameters, check validation performance, iterate, and in the end
> keep the model with the best validation score.
>
> The idea to improve this process is:
> - Fit the model once while growing all the trees to maximum depth, and
> save this model as a baseline
> - For any new set of parameters, produce the new model by reducing the
> trees in the baseline model according to those parameters. For example,
> for max_depth=5, one can just remove all the nodes at depth greater than
> 5. This should be much faster than regrowing the trees since it doesn't
> need to refit the model (see the sketch after this list)
> - Use validation (or cross-validation) performance to choose the best
> model as usual.
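>
> A minimal sketch of what I have in mind, assuming direct access to the
> fitted tree_ arrays (the predict_truncated helper below is my own
> hypothetical illustration, not an existing scikit-learn API):
>
> import numpy as np
> from sklearn.datasets import make_classification
> from sklearn.ensemble import RandomForestClassifier
> from sklearn.model_selection import train_test_split
>
> def predict_truncated(forest, X, max_depth):
>     """Predict as if each tree had been grown with `max_depth`, by
>     stopping the descent early and voting with that node's class
>     distribution."""
>     votes = np.zeros((X.shape[0], forest.n_classes_))
>     for est in forest.estimators_:
>         t = est.tree_
>         for i, x in enumerate(X):
>             node, depth = 0, 0
>             # Walk down until we hit a leaf or the depth cap.
>             while t.children_left[node] != -1 and depth < max_depth:
>                 if x[t.feature[node]] <= t.threshold[node]:
>                     node = t.children_left[node]
>                 else:
>                     node = t.children_right[node]
>                 depth += 1
>             counts = t.value[node][0]
>             votes[i] += counts / counts.sum()
>     return forest.classes_[votes.argmax(axis=1)]
>
> X, y = make_classification(n_samples=500, random_state=0)
> X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
>
> # Fit the baseline once, with trees grown to maximum depth.
> rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
>
> # Trying max_depth=5 is now a cheap re-prediction, not a refit.
> acc = (predict_truncated(rf, X_val, max_depth=5) == y_val).mean()
> print("validation accuracy:", acc)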
>
> This works (theoretically) because:
> - For any choice of parameters, the fitted trees will just be a subset of
> the baseline trees grown to maximum depth (except for criterion, but that
> probably matters less)
> - Trees are grown independently of each other (so this idea will not work
> for GBM)
>
> That's it. I am very interested in any feedback: whether it makes sense,
> whether it was already done somewhere else, or whether it will work.
>
> Best regards,
> Lam Dang
>