Jacob, I agree with both of your points about the ensemble methods. They
can give quite good prediction results.

But the question is how to interpret these models. We want to extract
specific decision rules, for example rules for declining fraudulent
transactions. The motivation is to port these rules to other systems.

Currently I am searching each node of one tree, filtering for the nodes
that satisfy the conditions I want. I have obtained some interesting
results, and I hope to get results close to the ensemble method's.
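For what it's worth, here is a minimal sketch of the kind of node search I
mean, walking a fitted tree via sklearn's `tree_` attributes
(`children_left`, `feature`, `threshold`) and printing one rule per leaf.
The helper name `extract_rules` is just illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

def extract_rules(clf, feature_names):
    """Return one '(feat <= thr) and ... -> class c' string per leaf."""
    t = clf.tree_
    rules = []

    def walk(node, conditions):
        if t.children_left[node] == -1:  # no children: this is a leaf
            pred = clf.classes_[t.value[node][0].argmax()]
            rules.append(" and ".join(conditions) + f" -> class {pred}")
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        walk(t.children_left[node], conditions + [f"({name} <= {thr:.2f})"])
        walk(t.children_right[node], conditions + [f"({name} > {thr:.2f})"])

    walk(0, [])  # start from the root node
    return rules

for rule in extract_rules(clf, iris.feature_names):
    print(rule)
```

From here I filter the rule strings (or the accumulated condition lists)
for the conditions I care about, e.g. leaves predicting the fraud class.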

Any further tips?


Best,
Rex



On Sun, Aug 30, 2015 at 11:45 AM, Jacob Schreiber <jmschreibe...@gmail.com>
wrote:

> Usually one would use an ensemble of trees to prevent overfitting. Two
> common techniques are a Random Forest or Gradient Boosting Trees. Gradient
> Boosting in particular has done well in competitions recently.
>
> While this may give you better generalization, it becomes difficult to
> interpret these models. You can try to constrain your model by requiring a
> higher number of examples, or higher weight of examples, be present at each
> leaf. This will prevent the tree from splitting to accommodate a single
> point, which may cause overfitting.
>
> On Sun, Aug 30, 2015 at 10:37 AM, Rex X <dnsr...@gmail.com> wrote:
>
>> Hi Jacob,
>>
>> Is there anything we can do to get better generalized decision rules?
>>
>> For example, after one tree fitting, select top (N-1) features by
>> feature_importance, and then do the fitting again.
>>
>> Can this be helpful?
>>
>>
>> Best,
>> Rex
>>
>>
>>
>>
>> On Sun, Aug 30, 2015 at 8:07 AM, Jacob Schreiber <jmschreibe...@gmail.com
>> > wrote:
>>
>>> Tree pruning is currently not supported in sklearn.
>>>
>>> On Sun, Aug 30, 2015 at 6:44 AM, Rex X <dnsr...@gmail.com> wrote:
>>>
>>>> Tree pruning process is very important to get a better decision tree.
>>>>
>>>> One idea is to recursively remove the leaf node which cause least hurt
>>>> to the decision tree.
>>>>
>>>> Any idea how to do this for the following sample case?
>>>>
>>>>
>>>>> from sklearn import tree
>>>>> from sklearn.datasets import load_iris
>>>>>
>>>>> iris = load_iris()
>>>>> clf = tree.DecisionTreeClassifier(
>>>>>     class_weight={0: 0.3, 1: 0.3, 2: 0.4}, max_features="auto")
>>>>> clf.fit(iris.data, iris.target)
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
