Re: [Scikit-learn-general] Getting decision tree regressor to predict using median not mean, of final subset

Joel Nothman Mon, 23 Jun 2014 05:21:08 -0700

I think that should be Tree.apply, not apply_Tree. I.e. I guess you want to
use something like (unverified):


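import operator
from itertools import groupby
import numpy as np
leaves = regressor.tree_.apply(np.asarray(X_train, dtype=np.float32))  # Tree.apply expects float32 input
for leaf_ind, pairs in groupby(sorted(zip(leaves, y_train)), operator.itemgetter(0)):
    regressor.tree_.value[leaf_ind, 0, 0] = np.median([y for _, y in pairs])  # median of this leaf's targets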

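A self-contained sketch of the same idea on synthetic data. It assumes a newer scikit-learn release where `DecisionTreeRegressor` has a public `apply` method returning leaf node ids, and where the internal `tree_.value` array (indexed `[node_id, output, 0]`) is writable. The helper name `set_leaf_medians` and the data are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def set_leaf_medians(reg, X_train, y_train):
    """Hypothetical helper: replace each leaf's stored mean with its median.

    Assumes `reg` is a fitted single-output DecisionTreeRegressor whose
    internal tree_.value array (indexed [node_id, output, 0]) is writable.
    """
    y_train = np.asarray(y_train, dtype=float)
    # Leaf node id for every training row; these ids index tree_.value directly.
    leaves = reg.apply(X_train)
    for leaf in np.unique(leaves):
        reg.tree_.value[leaf, 0, 0] = np.median(y_train[leaves == leaf])
    return reg


# Illustrative data with right-skewed noise, so leaf means and medians differ.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = X[:, 0] + rng.exponential(scale=3.0, size=200)

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
mean_pred = reg.predict(X)    # default behaviour: leaf means
set_leaf_medians(reg, X, y)
median_pred = reg.predict(X)  # now leaf medians
```

Because the ids returned by `apply` index straight into `tree_.value`, no extra bookkeeping is needed. The trade-off is that this writes into an internal array of a fitted estimator, which is not a documented public interface and may behave differently across releases.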

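Peter's observation that `apply` identifies the training instances in each leaf also permits a variant that leaves the fitted tree untouched: compute each leaf's median once at fit time, store it in a separate array indexed by node id, and look it up at predict time. A minimal sketch, assuming the public `DecisionTreeRegressor.apply` method of newer scikit-learn releases; the wrapper class `MedianLeafRegressor` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class MedianLeafRegressor:
    """Hypothetical wrapper: a regression tree that predicts leaf medians.

    The wrapped tree is fitted as usual; medians live in a separate array
    indexed by node id, so tree_.value is never modified.
    """

    def __init__(self, **tree_params):
        self.tree = DecisionTreeRegressor(**tree_params)

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf node id for each training row
        # One slot per node; only leaf slots are ever read by predict().
        self.leaf_median_ = np.full(self.tree.tree_.node_count, np.nan)
        for leaf in np.unique(leaves):
            self.leaf_median_[leaf] = np.median(y[leaves == leaf])
        return self

    def predict(self, X):
        # Route each row to its leaf, then look up that leaf's median.
        return self.leaf_median_[self.tree.apply(X)]


rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = X[:, 0] + rng.exponential(scale=3.0, size=200)

model = MedianLeafRegressor(max_depth=3, random_state=0).fit(X, y)
pred = model.predict(X)
```

Every leaf holds at least one training row, so every lookup hits a filled slot. Keeping the medians outside the tree avoids relying on `tree_.value` being writable, but this wrapper does not implement the full estimator interface (no `get_params`/`set_params`), so utilities such as `clone` or `GridSearchCV` will not work with it as written.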
On 23 June 2014 07:57, Peter Prettenhofer wrote:

> Hi James,
>
> if you look at the LAD loss function in the gradient_boosting module, you
> can find an example of how to do it. Basically, you need to update the values
> array in the Tree extension type. Tree.apply_Tree(x_train) gives you the
> training instances in each leaf.
>
> HTH,
> Peter
> On 23.06.2014 13:48, "James McMurray" wrote:
>
>> Hi,
>>
>> I want to use the decision tree regressor to predict using the median of
>> the resulting subset from the tree, rather than the mean.
>>
>> Is there a simple way to do this?
>>
>> I looked at the code, but in sklearn/tree/tree.py, the only relevant line
>> is:
>>         proba = self.tree_.predict(X)
>>
>> where the prediction is already done (presumably in the Cython code). I
>> don't have experience with Cython, so I'm not sure how to modify _tree.pyx
>> to do this.
>>
>> Many thanks,
>> James McMurray
>>
>>
>> ------------------------------------------------------------------------------
>> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
>> Find What Matters Most in Your Big Data with HPCC Systems
>> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
>> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
>> http://p.sf.net/sfu/hpccsystems
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>