You can do ensembles with pre-pruning (max_depth etc.) but not post-pruning.
The general consensus is that post-pruning doesn't help in ensembles.
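For example, a minimal sketch of pre-pruning inside an ensemble (the specific
parameter values below are illustrative, not recommendations):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()

    # Pre-pruning: every tree in the forest is grown under these
    # constraints, so no separate post-pruning step is applied.
    forest = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,           # cap the depth of each tree
        min_samples_leaf=5,    # require several samples in every leaf
        random_state=0,
    )
    forest.fit(iris.data, iris.target)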


On 9/2/2015 8:43 PM, Rex X wrote:
Andreas,

Can we do ensembles with *pruning* in scikit-learn?


Rex

On Mon, Aug 31, 2015 at 9:15 AM, Andreas Mueller <t3k...@gmail.com> wrote:

    You will not get results close to an ensemble from a single pruned
    tree (unless your dataset is very specific).
    You can probably do your node filtering on ensembles, too.



    On 08/30/2015 03:44 PM, Rex X wrote:
    Jacob, I agree with both of your points about the ensemble
    methods. They can give quite good prediction results.

    But the question is how to interpret these models. We want to extract
    specific decision rules, for example rules for declining fraudulent
    transactions. The motivation is to port these rules to other systems.

    Currently I am searching each node of one tree to filter the nodes
    that satisfy the conditions I want. I did obtain some interesting
    results. I wish I could obtain results close to those of the
    ensemble method.
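
    A minimal sketch of that kind of node traversal (it only assumes a
    fitted DecisionTreeClassifier named clf; the helper name
    extract_rules and the printed rule format are illustrative):

        import numpy as np

        def extract_rules(tree_, node=0, conditions=()):
            """Walk the fitted tree and print one rule per leaf."""
            if tree_.children_left[node] == -1:      # leaf node
                counts = tree_.value[node][0]
                print(" AND ".join(conditions) or "(root)",
                      "-> class", int(np.argmax(counts)))
                return
            feat, thr = tree_.feature[node], tree_.threshold[node]
            extract_rules(tree_, tree_.children_left[node],
                          conditions + ("X[%d] <= %.3f" % (feat, thr),))
            extract_rules(tree_, tree_.children_right[node],
                          conditions + ("X[%d] > %.3f" % (feat, thr),))

        extract_rules(clf.tree_)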

    Any further tips?


    Best,
    Rex



    On Sun, Aug 30, 2015 at 11:45 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:

        Usually one would use an ensemble of trees to prevent
        overfitting. Two common techniques are Random Forests and
        Gradient Boosted Trees. Gradient Boosting in particular has
        done well in competitions recently.

        While this may give you better generalization, it becomes
        difficult to interpret these models. You can try to constrain
        your model by requiring that a higher number of examples, or a
        higher total weight of examples, be present at each leaf. This
        prevents the tree from splitting just to accommodate a single
        point, which can cause overfitting.
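
        A minimal sketch of that kind of constraint (the thresholds
        below are placeholders to be tuned, e.g. by cross-validation):

            from sklearn.datasets import load_iris
            from sklearn.tree import DecisionTreeClassifier

            iris = load_iris()

            # Require enough samples (or enough sample weight) in every
            # leaf so the tree cannot grow a branch for a single point.
            clf = DecisionTreeClassifier(
                min_samples_leaf=10,            # >= 10 examples per leaf
                min_weight_fraction_leaf=0.02,  # or >= 2% of total weight
            )
            clf.fit(iris.data, iris.target)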

        On Sun, Aug 30, 2015 at 10:37 AM, Rex X <dnsr...@gmail.com> wrote:

            Hi Jacob,

            Is there anything we can do to get better generalized
            decision rules?

            For example, after fitting one tree, select the top (N-1)
            features by feature_importances_, and then do the fitting
            again.
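
            (A rough sketch of what I mean, using iris only for
            illustration:)

                import numpy as np
                from sklearn.datasets import load_iris
                from sklearn.tree import DecisionTreeClassifier

                iris = load_iris()

                # First fit on all features.
                clf = DecisionTreeClassifier(random_state=0)
                clf.fit(iris.data, iris.target)

                # Keep the (N-1) most important features ...
                n_keep = iris.data.shape[1] - 1
                top = np.argsort(clf.feature_importances_)[-n_keep:]

                # ... and refit on that subset only.
                clf2 = DecisionTreeClassifier(random_state=0)
                clf2.fit(iris.data[:, top], iris.target)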

            Can this be helpful?


            Best,
            Rex




            On Sun, Aug 30, 2015 at 8:07 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:

                Tree pruning is currently not supported in sklearn.

                On Sun, Aug 30, 2015 at 6:44 AM, Rex X <dnsr...@gmail.com> wrote:

                    The tree-pruning process is very important for
                    getting a better decision tree.

                    One idea is to recursively remove the leaf node
                    whose removal hurts the decision tree the least.

                    Any idea how to do this for the following sample
                    case?


                        from sklearn.datasets import load_iris
                        from sklearn.tree import DecisionTreeClassifier

                        iris = load_iris()

                        # Weight the three iris classes unequally and use a
                        # random subset of features at each split.
                        clf = DecisionTreeClassifier(
                            class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                            max_features="auto")
                        clf.fit(iris.data, iris.target)
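
                    One rough sketch of that idea, done outside the
                    library (scikit-learn exposes no pruning API, so
                    this only reads clf.tree_ and keeps a Python-side
                    set of nodes that are treated as leaves; the helper
                    names and the greedy held-out-accuracy rule are
                    illustrative, not an established recipe):

                        import numpy as np
                        from sklearn.datasets import load_iris
                        from sklearn.model_selection import train_test_split
                        from sklearn.tree import DecisionTreeClassifier

                        def predict_pruned(tree_, X, pruned):
                            """Predict, treating every node in `pruned` as a leaf."""
                            out = np.empty(X.shape[0], dtype=int)
                            for i, x in enumerate(X):
                                node = 0
                                while (tree_.children_left[node] != -1
                                       and node not in pruned):
                                    if x[tree_.feature[node]] <= tree_.threshold[node]:
                                        node = tree_.children_left[node]
                                    else:
                                        node = tree_.children_right[node]
                                out[i] = np.argmax(tree_.value[node])
                            return out

                        def val_score(tree_, X_val, y_val, pruned):
                            """Held-out accuracy with the `pruned` nodes collapsed."""
                            return np.mean(predict_pruned(tree_, X_val, pruned) == y_val)

                        def prune_tree(clf, X_val, y_val):
                            """Greedily collapse the internal node whose removal hurts
                            held-out accuracy the least; stop once every remaining
                            collapse makes the score worse."""
                            tree_ = clf.tree_
                            internal = [n for n in range(tree_.node_count)
                                        if tree_.children_left[n] != -1]
                            pruned = set()
                            best = val_score(tree_, X_val, y_val, pruned)
                            while True:
                                candidates = [(val_score(tree_, X_val, y_val, pruned | {n}), n)
                                              for n in internal if n not in pruned]
                                if not candidates:
                                    break
                                score, node = max(candidates)
                                if score < best:
                                    break
                                pruned.add(node)
                                best = score
                            return pruned

                        iris = load_iris()
                        X_tr, X_val, y_tr, y_val = train_test_split(
                            iris.data, iris.target, test_size=0.3, random_state=0)
                        clf = DecisionTreeClassifier(
                            class_weight={0: 0.3, 1: 0.3, 2: 0.4}).fit(X_tr, y_tr)
                        pruned_nodes = prune_tree(clf, X_val, y_val)
                        print("nodes now treated as leaves:", sorted(pruned_nodes))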


                    