Unfortunately, the most important parameters to adjust to maximize accuracy
are often those controlling the randomness of the algorithm, e.g.
max_features, for which this strategy is not possible.

That being said, in the case of boosting, I think this strategy would be
worth automating, e.g. to adjust the number of trees.

Gilles
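A minimal sketch of the kind of automation described above: fit a gradient
boosted model once, then score every prefix of the ensemble with
staged_predict instead of refitting once per candidate n_estimators. The
dataset and variable names are made up for illustration (train_test_split
lives in sklearn.cross_validation in older releases):

    # Pick n_estimators for boosting by scoring every prefix of one fit.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    gbm = GradientBoostingClassifier(n_estimators=500, random_state=0)
    gbm.fit(X_tr, y_tr)

    # staged_predict yields predictions after 1, 2, ..., 500 trees, so a
    # single fit covers the whole grid of n_estimators values.
    scores = [accuracy_score(y_val, p) for p in gbm.staged_predict(X_val)]
    print("best n_estimators:", int(np.argmax(scores)) + 1)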
On 22 March 2016 at 01:44, Mathieu Blondel <math...@mblondel.org> wrote:
> Related issue:
> https://github.com/scikit-learn/scikit-learn/issues/3652
>
> On Tue, Mar 22, 2016 at 6:32 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
>>
>> It should if you're using those parameters. It's basically similar to
>> calculating the regularization path for LASSO, since these are also
>> regularization terms. I think this would probably be a good addition if
>> there was a clean implementation for it.
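The regularization-path analogy is worth spelling out: a LASSO solver can
compute solutions for a whole grid of penalties in one pass, which is
exactly the "fit once, evaluate many settings" pattern discussed here. A
small sketch using lasso_path with made-up data (no intercept is fit in
this path computation, which is fine for this centered synthetic data):

    # One call returns the whole path: coefs[:, i] pairs with alphas[i].
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import lasso_path
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=400, n_features=20, noise=1.0,
                           random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    alphas, coefs, _ = lasso_path(X_tr, y_tr, n_alphas=100)

    # Score every alpha on the validation set in one vectorized step.
    val_mse = ((X_val @ coefs - y_val[:, None]) ** 2).mean(axis=0)
    print("best alpha:", alphas[val_mse.argmin()])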
>>
>> On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>>>
>>> Hi Jacob,
>>>
>>> Thanks for your answer. Indeed you are right: some parameters cannot
>>> be adjusted off-data. Let's go through the parameter list to see which
>>> ones can be adjusted:
>>> n_estimators: this one is simple, the more the better
>>> criterion: no
>>> max_features: no
>>> max_depth: yes
>>> min_samples_split: yes
>>> min_samples_leaf: yes
>>> min_weight_fraction_leaf: yes
>>> max_leaf_nodes: yes
>>> bootstrap: no
>>>
>>> So basically the bootstrap-related parameters cannot be adjusted,
>>> while the tree parameters can. It should still speed up the search,
>>> right?
>>>
>>> Best,
>>> Lam
>>>
>>> 2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
>>>>
>>>> Hi Lam,
>>>>
>>>> The idea of exploiting redundancies to speed up algorithms is a good
>>>> intuition. However, I don't think most parameters could be handled
>>>> this way. For example, considering different values of max_features
>>>> at the splits would be difficult without storing all candidate splits
>>>> at each node and then reducing the set of considered ones. And since
>>>> every split depends on the splits above it, it may be difficult to
>>>> modify splits in the middle of the tree (such as changing the feature
>>>> a node was split on) without simply regrowing the subtree.
>>>>
>>>> Jacob
>>>>
>>>> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>>>>>
>>>>> Hello scikit-learners,
>>>>>
>>>>> Here is an idea to accelerate parameter tuning for Random Forest and
>>>>> Extra Trees. I am very interested to hear whether the idea is
>>>>> already exploited somewhere and whether it makes sense.
>>>>>
>>>>> Let's say we have a data set with a train and a validation split
>>>>> (cross-validation also works).
>>>>>
>>>>> The usual process for tuning a Random Forest is to try different
>>>>> sets of parameters, check the validation performance, reiterate, and
>>>>> keep the model with the best validation score at the end.
>>>>>
>>>>> The idea to improve this process is:
>>>>> - Fit the model once, growing all trees to the maximum, and save
>>>>> this model as a baseline.
>>>>> - For any set of parameters, produce the new model by reducing the
>>>>> trees of the baseline model according to those parameters. For
>>>>> example, for max_depth=5 one can simply remove all nodes deeper than
>>>>> 5 (see the sketch at the end of this thread). This should be much
>>>>> faster than regrowing the trees, since it doesn't refit the model.
>>>>> - Use the validation (or cross-validation) performance to choose the
>>>>> best model as usual.
>>>>>
>>>>> This works (theoretically) because:
>>>>> - For any parameters, the fitted trees are just subtrees of the
>>>>> baseline trees grown to the maximum (except for criterion, but that
>>>>> probably matters less).
>>>>> - Trees are grown independently of each other (so this idea will not
>>>>> work for GBM).
>>>>>
>>>>> That's it. I am very interested in any feedback: whether it makes
>>>>> sense, whether it was already done somewhere else, and whether it
>>>>> will work.
>>>>>
>>>>> Best regards,
>>>>> Lam Dang
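Following up on the max_depth example in the message above, here is a
rough sketch of the pruning step for a single fitted tree. It rewrites
the undocumented tree_ arrays in place, so the layout is version-dependent
and prune_to_depth is a made-up helper, not an existing scikit-learn API:

    # Turn a fully grown tree into its max_depth counterpart in place.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    TREE_LEAF = -1  # sentinel used in children_left/children_right

    def prune_to_depth(estimator, max_depth):
        """Make every node at depth >= max_depth a leaf, in place."""
        tree = estimator.tree_
        left, right = tree.children_left, tree.children_right
        stack = [(0, 0)]  # (node_id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            if left[node] == TREE_LEAF:  # already a leaf
                continue
            if depth >= max_depth:
                # Cut the subtree off; predictions then use the value
                # already stored at this node, as for a real leaf.
                left[node] = TREE_LEAF
                right[node] = TREE_LEAF
            else:
                stack.append((left[node], depth + 1))
                stack.append((right[node], depth + 1))

    # Sanity check: pruning the full tree should agree with a tree grown
    # directly at max_depth=5, barring ties between equally good splits,
    # because splitting is greedy and top-down.
    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    shallow = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
    full = DecisionTreeRegressor(random_state=0).fit(X, y)
    prune_to_depth(full, max_depth=5)
    print(np.allclose(shallow.predict(X), full.predict(X)))  # expect True

For a forest, the same helper would be applied to each element of
estimators_, working on a deep copy so the fully grown baseline stays
intact.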