Maybe some of the tree huggers can say something about that ;) Below is my best guess.

I am surprised to see that the docs say no regularization is usually best.
I would not use upper bounds as large as yours, and I would never search the full range. Instead, search in coarse steps to get only a few candidates, and possibly refine later. You can set max_depth=None, see how deep the fully grown trees get, and start from there.
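
A minimal sketch of that idea, assuming a current scikit-learn and synthetic data from make_classification (the dataset, the number of trees, and the quarter/half/full spacing of the candidates are illustrative choices, not something from this thread): grow the trees without a depth limit, look at how deep they get, and derive a few coarse max_depth candidates from that.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Grow unrestricted trees to see how deep they get on this data.
forest = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
forest.fit(X, y)

# Depth of each fitted tree in the forest.
depths = np.array([est.tree_.max_depth for est in forest.estimators_])
full_depth = int(depths.max())

# A few coarse candidates at or below the fully grown depth, plus None
# (no limit). The spacing is an arbitrary choice; refine around the
# best value in a second pass.
depth_candidates = sorted(
    {max(1, full_depth // 4), max(1, full_depth // 2), full_depth}
)
depth_candidates = depth_candidates + [None]

print("depth of fully grown trees:", depths.min(), "to", depths.max())
print("coarse max_depth candidates:", depth_candidates)
```

The resulting list is far shorter than range(1, nsamples), and it is bounded by what the trees actually reach on the data rather than by the sample count.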

I don't think the CV method should affect the parameters much, since the training set size in each split differs from n_samples by less than a factor of 2.
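
To make the factor-of-2 argument concrete, here is a small arithmetic sketch, assuming plain k-fold splitting and a hypothetical n_samples of 1000. With k folds, the largest test fold holds ceil(n_samples / k) samples, so the training set never shrinks below n_samples minus that.

```python
import math

n_samples = 1000  # hypothetical dataset size

train_sizes = {}
for k in (3, 5, 10):
    # Largest test fold under k-fold splitting; the rest is training data.
    n_test = math.ceil(n_samples / k)
    train_sizes[k] = n_samples - n_test
    ratio = n_samples / train_sizes[k]
    print(f"k={k}: smallest training set {train_sizes[k]}, "
          f"n_samples / n_train = {ratio:.2f}")
```

For any k of 3 or more the ratio stays at or below about 1.5, so bounds chosen from the full n_samples are only slightly too generous for the internal training sets. As a side note that goes beyond what was discussed in this thread: recent scikit-learn versions also accept a float for min_samples_leaf, interpreted as a fraction of the training samples, which ties the bound to the fold size automatically.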



On 09/27/2014 02:56 PM, Satrajit Ghosh wrote:
thanks andy.

are there any general heuristics for these parameters - given that their ranges are over the samples?

max_depth = range(1, nsamples)
or
min_samples_leaf = range(1, nsamples)

also related question: given that nsamples would actually depend on the cv method of the GridSearchCV, is there a way to specify possible ranges without trying to calculate what the CV method would do?

i.e., is there a way to couple the parameter specification to the internal CV that the grid search runs, so that each parameter is limited by the size of the internal training set?

cheers,

satra

On Sat, Sep 27, 2014 at 2:25 AM, Andy <t3k...@gmail.com> wrote:

    Hi Satra.
    You should set "n_estimators" as high as you can afford time and
    memory wise, and then cross-validate over (at least) one of the
    regularization parameters,
    for example over max_depth or min_samples_leaf. You can also
    search over max_features.

    Cheers,
    Andy
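
A rough sketch of the setup described in this message, assuming a current scikit-learn (where GridSearchCV lives in sklearn.model_selection) and synthetic data. The specific grid values and n_estimators=50 are placeholders chosen to keep the example quick, not recommendations from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# n_estimators is fixed up front (as high as time and memory allow)
# and is deliberately left out of the grid.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Cross-validate over one regularization parameter and max_features,
# using only a few coarse candidates for each.
param_grid = {
    "max_depth": [3, 6, None],
    "max_features": ["sqrt", 0.5],
}
search = GridSearchCV(forest, param_grid, cv=3)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Swapping "max_depth" for "min_samples_leaf" in param_grid searches over the other regularization parameter mentioned above.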



    On 09/26/2014 10:24 PM, Satrajit Ghosh wrote:
    hi folks,

    what are some useful ranges of parameters to throw into a grid
    search? and are there specific differences between random forests
    and extra trees? i understand one could try different impurity
    measures for classification, but any suggestions on the sensitivity
    of other parameters would be nice.

    cheers,

    satra

    On Thu, Sep 25, 2014 at 8:48 AM, Andy <t3k...@gmail.com> wrote:

        On 09/23/2014 11:50 PM, Pagliari, Roberto wrote:

        I’m a bit confused as to why GridSearchCV is not needed with
        random forests. I understand that with RF, each tree will
        only get to see a partial representation of the data.

        Why do you say GridSearchCV is not needed?
        I think it should always be used, just not for setting
        n_estimators.
        You can use the oob estimates, but actually I don't think we
        have an automated way to use these to adjust parameters.
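
Since there is no automated route, one way to use the OOB estimates by hand is a plain loop over candidate settings. This is only a sketch under assumptions: it uses a current scikit-learn, synthetic data, and an arbitrary min_samples_leaf grid, and it picks the setting with the highest oob_score_ (OOB accuracy for a classifier).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Candidate settings; the values are placeholders.
grid = ParameterGrid({"min_samples_leaf": [1, 5, 20]})

best_score, best_params = -1.0, None
for params in grid:
    # oob_score=True requires bootstrap sampling, which is the default
    # for random forests; it is spelled out here for clarity.
    forest = RandomForestClassifier(
        n_estimators=100, bootstrap=True, oob_score=True,
        random_state=0, **params
    )
    forest.fit(X, y)
    if forest.oob_score_ > best_score:
        best_score, best_params = forest.oob_score_, params

print("best parameters by OOB accuracy:", best_params)
print("OOB accuracy:", round(best_score, 3))
```

Each setting needs only one fit instead of one per CV fold, which is the main appeal of doing it this way.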

        
------------------------------------------------------------------------------
        Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
        Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI
        DSS Reports
        Are you Audit-Ready for PCI DSS 3.0 Compliance? Download
        White paper
        Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog
        Analyzer
        
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
        _______________________________________________
        Scikit-learn-general mailing list
        Scikit-learn-general@lists.sourceforge.net
        <mailto:Scikit-learn-general@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




    





