Related issue:
https://github.com/scikit-learn/scikit-learn/issues/3652
On Tue, Mar 22, 2016 at 6:32 AM, Jacob Schreiber <jmschreibe...@gmail.com>
wrote:
> It should, if those are the parameters you're tuning. It's basically
> similar to calculating the regularization path for the LASSO, since these
> are also regularization terms. I think this would probably be a good
> addition if there were a clean implementation for it.
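>
> For a concrete flavor of that analogy, here is a minimal sketch (just an
> illustration, not a proposed API) of how lasso_path fits once and returns
> coefficients for a whole grid of alphas instead of refitting per alpha:
>
>     from sklearn.datasets import make_regression
>     from sklearn.linear_model import lasso_path
>
>     X, y = make_regression(n_samples=200, n_features=20, noise=1.0,
>                            random_state=0)
>     # One call computes the model for 50 regularization strengths.
>     alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
>     print(coefs.shape)  # (20, 50): one coefficient column per alpha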
>
> On Mon, Mar 21, 2016 at 2:19 PM, Lam Dang <tunglam.d...@gmail.com> wrote:
>
>> Hi Jacob,
>>
>> Thanks for your answer. Indeed you are right: some parameters cannot be
>> adjusted off-data. Let's go through the parameter list to see which ones
>> can be adjusted:
>> n_estimators : this is simple - the more the better
>> criterion : No
>> max_features : No
>> max_depth : Yes
>> min_samples_split : Yes
>> min_samples_leaf : Yes
>> min_weight_fraction_leaf : Yes
>> max_leaf_nodes : Yes
>> bootstrap : No
>>
>> So basically the bootstrap-related parameters cannot be adjusted after
>> the fact, while the tree-shape parameters can (see the sketch below). It
>> should still speed up the search, right?
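>>
>> As a quick sketch of why the "Yes" parameters are checkable after the
>> fact: a fitted tree already stores the per-node statistics they
>> constrain. (This reads scikit-learn's private tree_ structure, so the
>> details may vary across versions.)
>>
>>     from sklearn.datasets import load_iris
>>     from sklearn.tree import DecisionTreeClassifier
>>
>>     X, y = load_iris(return_X_y=True)
>>     t = DecisionTreeClassifier(random_state=0).fit(X, y).tree_
>>
>>     print(t.max_depth)                    # vs. a candidate max_depth
>>     print(t.n_node_samples[:5])           # vs. min_samples_split/leaf
>>     print(t.weighted_n_node_samples[:5])  # vs. min_weight_fraction_leaf
>>     print(t.node_count)                   # vs. max_leaf_nodes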
>> Best,
>> Lam
>>
>> 2016-03-21 21:42 GMT+01:00 Jacob Schreiber <jmschreibe...@gmail.com>:
>>
>>> Hi Lam
>>>
>>> The idea of exploiting redundancies to speed up algorithms is a good
>>> intuition. However, I don't think most parameters could be handled this
>>> way. For example, evaluating different values of max_features would be
>>> difficult without storing every candidate split at each node and then
>>> narrowing down the considered set. And since each split depends on the
>>> splits above it, it would be hard to modify a split in the middle of a
>>> tree (such as changing the feature it splits on) without simply
>>> regrowing the subtree beneath it.
>>>
>>> Jacob
>>>
>>> On Mon, Mar 21, 2016 at 1:24 PM, Lam Dang <tunglam.d...@gmail.com>
>>> wrote:
>>>
>>>> Hello scikit-learners,
>>>>
>>>> Here is an idea to accelerate parameter tuning for Random Forest and
>>>> Extra Trees. I would be very interested to hear whether the idea is
>>>> already exploited somewhere and whether it makes sense.
>>>>
>>>> Let's say we have a data set split into a train and a validation set
>>>> (cross-validation also works).
>>>>
>>>> The usual process for tuning a Random Forest today is to try different
>>>> sets of parameters, check validation performance, iterate, and in the
>>>> end keep the model with the best validation score.
>>>>
>>>> The idea to improve this process is:
>>>> - Fit the model once while growing all the trees to their maximum, and
>>>> save this model as a baseline
>>>> - For any set of parameters, produce the new model by reducing the
>>>> trees in the baseline model according to the parameters. For example,
>>>> for max_depth=5, one can just remove all the nodes at depth greater
>>>> than 5. This should be much faster than regrowing the trees, since it
>>>> doesn't need to refit the model (see the sketch after this list)
>>>> - Use validation (or cross-validation) performance to choose the best
>>>> model as usual.
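>>>>
>>>> Here is a minimal sketch of the pruning step for max_depth. It edits
>>>> scikit-learn's private tree_ arrays in place, so treat it as a proof
>>>> of concept that may break across versions, not a supported API:
>>>>
>>>>     import copy
>>>>     from sklearn.datasets import make_classification
>>>>     from sklearn.ensemble import RandomForestClassifier
>>>>     from sklearn.model_selection import train_test_split
>>>>     from sklearn.tree._tree import TREE_LEAF
>>>>
>>>>     def prune_to_depth(decision_tree, max_depth):
>>>>         """Turn every node deeper than max_depth into a leaf, in place."""
>>>>         left = decision_tree.tree_.children_left
>>>>         right = decision_tree.tree_.children_right
>>>>         stack = [(0, 0)]  # (node_id, depth), starting at the root
>>>>         while stack:
>>>>             node, depth = stack.pop()
>>>>             if left[node] == TREE_LEAF:  # already a leaf
>>>>                 continue
>>>>             if depth >= max_depth:       # cut here: make node a leaf
>>>>                 left[node] = TREE_LEAF
>>>>                 right[node] = TREE_LEAF
>>>>             else:
>>>>                 stack.append((left[node], depth + 1))
>>>>                 stack.append((right[node], depth + 1))
>>>>
>>>>     X, y = make_classification(n_samples=2000, random_state=0)
>>>>     X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
>>>>
>>>>     # Fit the baseline once, with fully grown trees.
>>>>     baseline = RandomForestClassifier(n_estimators=100,
>>>>                                       random_state=0).fit(X_tr, y_tr)
>>>>
>>>>     # "Refit" for max_depth=5 by pruning copies of the baseline trees,
>>>>     # then score on the validation set as usual.
>>>>     candidate = copy.deepcopy(baseline)
>>>>     for est in candidate.estimators_:
>>>>         prune_to_depth(est, max_depth=5)
>>>>     print(candidate.score(X_val, y_val))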
>>>>
>>>> This works (theoretically) because:
>>>> - For any parameters, the fitted trees will just be a subset of the
>>>> baseline trees grown to the maximum (except for criterion, but that
>>>> probably matters less)
>>>> - Trees are grown independently of each other (so this idea will not
>>>> work for GBM)
>>>>
>>>> That's it. I am very interested in any feedback: whether it makes
>>>> sense, whether it was done somewhere else already, or whether it will
>>>> work.
>>>>
>>>> Best regards,
>>>> Lam Dang