Interesting,

Yes, max_features is probably the most important parameter. However, the
other parameters may also contribute significantly to reducing overfitting.
I would run some tests myself, but I am not experienced with the low-level
API of scikit-learn.

Would any experienced scikit-learn contributors like to collaborate?


On Tuesday, 22 March 2016, Gilles Louppe <g.lou...@gmail.com> wrote:

> Unfortunately, the most important parameters to adjust to maximize
> accuracy are often those controlling the randomness in the algorithm,
> i.e. max_features, for which this strategy is not possible.
>
> That being said, in the case of boosting, I think this strategy would
> be worth automating, e.g. to adjust the number of trees.
>
> Gilles
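
For the boosting case Gilles mentions, here is a rough, untested sketch of
picking the number of trees on a validation set without refitting, using
staged_predict (toy data; the sizes and parameters are arbitrary):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Fit once with a generous number of trees.
gbm = GradientBoostingClassifier(n_estimators=500, random_state=0)
gbm.fit(X_tr, y_tr)

# staged_predict yields the predictions after 1, 2, ..., 500 trees, so one
# pass over the validation set scores every possible n_estimators at the
# cost of a single fit.
scores = [accuracy_score(y_val, pred) for pred in gbm.staged_predict(X_val)]
best_n = int(np.argmax(scores)) + 1
print("best n_estimators:", best_n, "validation accuracy:", scores[best_n - 1])
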
>
> On 22 March 2016 at 01:44, Mathieu Blondel <math...@mblondel.org> wrote:
> > Related issue:
> > https://github.com/scikit-learn/scikit-learn/issues/3652
> >
> > On Tue, Mar 22, 2016 at 6:32 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> >>
> >> It should, if you're tuning those parameters. It's basically similar to
> >> calculating the regularization path for LASSO, since these are also
> >> regularization terms. I think this would probably be a good addition if
> >> there were a clean implementation for it.
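
For reference, the LASSO analogy: lasso_path computes the coefficients for a
whole grid of regularization strengths from a single call, which is the same
"fit once, evaluate many settings" pattern being discussed here for forests
(toy data, arbitrary sizes):

from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=20, noise=1.0, random_state=0)
# alphas is the grid of regularization strengths; coefs has one column per alpha.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
print(coefs.shape)  # (n_features, n_alphas)
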
> >>
> >>> On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
> >>>
> >>> Hi Jacob,
> >>>
> >>> Thanks for your answer. Indeed you are right, some parameters cannot be
> >>> adjusted off-data. Let's go through the parameter list to see which ones
> >>> can be adjusted:
> >>> n_estimators : this is simple - the more the better
> >>> criterion : No
> >>> max_features : No
> >>> max_depth : Yes
> >>> min_samples_split : Yes
> >>> min_samples_leaf : Yes
> >>> min_weight_fraction_leaf : Yes
> >>> max_leaf_nodes : Yes
> >>> bootstrap : No
> >>>
> >>> So basically the bootstrap-related parameters cannot be adjusted, while
> >>> tree parameters can. It should still speed up the search, right?
> >>> Best,
> >>> Lam
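
On the n_estimators line in the list above: with a single fitted forest one
can already score every sub-forest size by slicing the fitted trees.
Reassigning estimators_ is not a public API, so this is only a rough,
untested sketch (toy data, arbitrary sizes):

from copy import copy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

for k in (10, 50, 100, 300):
    sub = copy(forest)                        # shallow copy of the fitted model
    sub.estimators_ = forest.estimators_[:k]  # keep only the first k trees (hack)
    sub.n_estimators = k
    print(k, accuracy_score(y_val, sub.predict(X_val)))
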
> >>>
> >>> 2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
> >>>>
> >>>> Hi Lam
> >>>>
> >>>> The idea of exploiting redundancies to speed up algorithms is a good
> >>>> intuition. However, I don't think most parameters could be handled this
> >>>> way. For example, evaluating different values of max_features would be
> >>>> difficult without storing all possible splits at each node and then
> >>>> reducing the set of considered ones. And since every split depends on the
> >>>> splits above it, it would be hard to modify a split in the middle of a
> >>>> tree without simply regrowing the subtree below it (for example, when
> >>>> changing the feature a node splits on).
> >>>>
> >>>> Jacob
> >>>>
> >>>> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
> >>>>>
> >>>>> Hello scikit-learners,
> >>>>>
> >>>>> Here is an idea to accelerate parameter tuning for Random Forest and
> >>>>> Extra Trees. I am very interested to know whether this idea is already
> >>>>> used somewhere and whether it makes sense.
> >>>>>
> >>>>> Let's say we have a data set split into a training set and a validation
> >>>>> set (cross-validation also works).
> >>>>>
> >>>>> The usual process for tuning a Random Forest is to try different sets of
> >>>>> parameters, check the validation performance, iterate, and in the end
> >>>>> keep the model with the best validation score.
> >>>>>
> >>>>> The idea to improve this process is:
> >>>>> - Fit the model once with all the trees grown to their maximum size, and
> >>>>> save this model as a baseline.
> >>>>> - For any set of parameters, produce the new model by reducing the trees
> >>>>> of the baseline model according to those parameters. For example, for
> >>>>> max_depth=5, simply remove all nodes deeper than 5 (a rough sketch of
> >>>>> this step follows this list). This should be much faster than regrowing
> >>>>> the trees, since the model does not need to be refit.
> >>>>> - Use the validation (or cross-validation) performance to choose the best
> >>>>> model as usual.
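
Here is a rough, untested sketch of that max_depth step, assuming that
truncating a fitted tree at a given depth behaves like growing it with that
max_depth. The prune_to_depth helper is hypothetical (not something
scikit-learn provides), and it mutates the private tree_ arrays of a copy of
the forest, which is not a public API:

from copy import deepcopy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

TREE_LEAF = -1  # scikit-learn's sentinel for "no child" in the tree arrays

def prune_to_depth(forest, max_depth):
    # Return a copy of a fitted forest whose trees are cut off at max_depth by
    # turning every node at that depth into a leaf. Mutating the tree_ arrays
    # is a hack on private internals, not a supported scikit-learn API.
    pruned = deepcopy(forest)
    for est in pruned.estimators_:
        left = est.tree_.children_left    # views into the tree's node arrays
        right = est.tree_.children_right
        stack = [(0, 0)]                  # (node_id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            if left[node] == TREE_LEAF:   # already a leaf, nothing to cut
                continue
            if depth >= max_depth:
                # Cut the subtree below this node; internal nodes keep their
                # class counts in tree_.value, so prediction stops here.
                left[node] = TREE_LEAF
                right[node] = TREE_LEAF
            else:
                stack.append((left[node], depth + 1))
                stack.append((right[node], depth + 1))
    return pruned

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Fit the baseline once, with fully grown trees.
baseline = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Score several max_depth candidates on the validation set without refitting.
for depth in (3, 5, 10, None):
    model = baseline if depth is None else prune_to_depth(baseline, depth)
    print(depth, accuracy_score(y_val, model.predict(X_val)))
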
> >>>>>
> >>>>> This works (in theory) because:
> >>>>> - For any such parameters, the fitted trees are just sub-trees of the
> >>>>> fully grown baseline trees (except for criterion, but that probably
> >>>>> matters less).
> >>>>> - Trees are grown independently of each other (so this idea will not
> >>>>> work for GBM).
> >>>>>
> >>>>> That's it. I am very interested in any feedback: whether it makes sense,
> >>>>> whether it has been done somewhere else already, and whether it would work.
> >>>>>
> >>>>> Best regards,
> >>>>> Lam Dang