Re: [scikit-learn] custom loss function in RandomForestRegressor

Sebastian Raschka Thu, 01 Mar 2018 07:05:55 -0800

Unfortunately (or maybe fortunately :)) no, maximizing variance reduction & 
minimizing MSE are just special cases :)


Best,
Sebastian
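
For readers of the archive: the equivalence discussed in the quoted messages below can be checked numerically. This is a minimal sketch, not scikit-learn's implementation. The helper names and the toy targets are made up for illustration, and dividing each squared sum by the child size is my assumption about how the (sum y_i)**2 quantity is used as a split score.

```python
import numpy as np


def node_variance(y):
    """Impurity of a node: variance of its target values."""
    return float(np.mean((y - y.mean()) ** 2))


def mse_of_mean_prediction(y):
    """MSE when every sample in the node is predicted as the node mean."""
    pred = np.full_like(y, y.mean(), dtype=float)
    return float(np.mean((y - pred) ** 2))


def weighted_child_mse(y, k):
    """Sample-weighted MSE of the children y[:k] and y[k:] (smaller is better)."""
    left, right = y[:k], y[k:]
    total = len(left) * node_variance(left) + len(right) * node_variance(right)
    return total / len(y)


def squared_sum_score(y, k):
    """sum(left)^2 / n_left + sum(right)^2 / n_right (larger is better)."""
    left, right = y[:k], y[k:]
    return left.sum() ** 2 / len(left) + right.sum() ** 2 / len(right)


# Hypothetical targets, already ordered by the feature being split on.
y = np.array([1.0, 1.2, 0.9, 3.1, 2.9, 3.3, 3.0])
splits = range(1, len(y))

best_by_mse = min(splits, key=lambda k: weighted_child_mse(y, k))
best_by_score = max(splits, key=lambda k: squared_sum_score(y, k))

print(node_variance(y), mse_of_mean_prediction(y))
print(best_by_mse, best_by_score)
```

Both criteria pick the same split because the children's total squared error equals sum(y**2) minus the squared-sum score, and sum(y**2) is fixed for the node. The predictions never appear separately since each child simply predicts its own mean.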

> On Mar 1, 2018, at 9:59 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> 
> Does this generalize to any loss function? For example I also want to 
> implement Kendall's tau correlation coefficient and a combination of R, tau 
> and RMSE. :) 
> 
> On Mar 1, 2018 15:49, "Sebastian Raschka" <se.rasc...@gmail.com> wrote:
> Hi, Thomas,
> 
> as far as I know, it's all the same and doesn't matter, and you would get the 
> same splits, since R^2 is just a rescaled MSE.
> 
> Best,
> Sebastian
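
On the point that R^2 is just a rescaled MSE, a small numerical check may help. This is only a sketch under stated assumptions: the function names and toy targets are hypothetical, each child predicts its own mean, and all candidate splits belong to one node so the total sum of squares is the same for every candidate.

```python
import numpy as np


def split_predictions(y, k):
    """Predict the child mean for every sample after splitting into y[:k], y[k:]."""
    pred = np.empty_like(y)
    pred[:k] = y[:k].mean()
    pred[k:] = y[k:].mean()
    return pred


def mse(y, pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y - pred) ** 2))


def r2(y, pred):
    """Coefficient of determination: 1 - SSE / SST."""
    sse = np.sum((y - pred) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(1.0 - sse / sst)


def pearson_r(y, pred):
    """Pearson correlation between targets and predictions."""
    return float(np.corrcoef(y, pred)[0, 1])


# Hypothetical targets, already ordered by the feature being split on.
y = np.array([0.5, 0.7, 0.4, 2.0, 2.4, 2.1, 5.0, 5.2])
splits = list(range(1, len(y)))

preds = [split_predictions(y, k) for k in splits]
mse_scores = np.array([mse(y, p) for p in preds])
r2_scores = np.array([r2(y, p) for p in preds])
pearson_sq = np.array([pearson_r(y, p) ** 2 for p in preds])

best_by_mse = splits[int(np.argmin(mse_scores))]
best_by_r2 = splits[int(np.argmax(r2_scores))]
print(best_by_mse, best_by_r2)
```

Within one node, R^2 equals 1 - MSE / Var(y), so the split with the lowest MSE is also the split with the highest R^2. With child-mean predictions the squared Pearson correlation coincides with R^2 as well. This says nothing about other losses such as Kendall's tau.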
> 
> > On Mar 1, 2018, at 9:39 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> >
> > Hi Sebastian,
> >
> > Going back to Pearson's R loss function, does this imply that I must add an 
> > abstract "init2" method to RegressionCriterion (that's where MSE class 
> > inherits from) where I will add the target values X as extra argument? And 
> > then the node impurity will be 1-R (the lower the better)? What about the 
> > impurities of the left and right split? In MSE class they are (sum_i^n 
> > y_i)**2 where n is the number of samples in the respective split. It is not 
> > clear how this is related to variance in order to adapt it for my purpose.
> >
> > Best,
> > Thomas
> >
> >
> > On Mar 1, 2018 14:56, "Sebastian Raschka" <se.rasc...@gmail.com> wrote:
> > Hi, Thomas,
> >
> > in regression trees, minimizing the variance among the target values is 
> > equivalent to minimizing the MSE between targets and predicted values. This 
> > is also called variance reduction: 
> > https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction
> >
> > Best,
> > Sebastian
> >
> > > On Mar 1, 2018, at 8:27 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> > >
> > >
> > > Hi again,
> > >
> > > I am currently revisiting this problem after familiarizing myself with 
> > > Cython and Scikit-Learn's code and I have a very important query:
> > >
> > > Looking at the class MSE(RegressionCriterion), the node impurity is 
> > > defined as the variance of the target values Y on that node. The 
> > > predictions X are nowhere involved in the computations. This contradicts 
> > > my notion of "loss function", which quantifies the discrepancy between 
> > > predicted and target values. Am I looking at the wrong class, or is what 
> > > I want to do just not feasible with Random Forests? For example, I would 
> > > like to modify the RandomForestRegressor code to maximize Pearson's R 
> > > between predicted and target values.
> > >
> > > I thank you in advance for any clarification.
> > > Thomas
> > >
> > >
> > >
> > >
> > > On 02/15/2018 01:28 PM, Guillaume Lemaitre wrote:
> > >> Yes, you are right: the pxd files are the headers and the pyx files the 
> > >> definitions. You need to write a class like MSE. Criterion is an abstract 
> > >> class or base class (I don't have the code in front of me).
> > >>
> > >> @Andy: if I recall the PR, we made the classes public to enable such 
> > >> custom criteria. However, it is not documented since we were not 
> > >> officially supporting it. So this is a hidden feature. We could always 
> > >> discuss making this feature more visible and documenting it.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > ======================================================================
> > > Dr Thomas Evangelidis
> > > Post-doctoral Researcher
> > > CEITEC - Central European Institute of Technology
> > > Masaryk University
> > > Kamenice 5/A35/2S049,
> > > 62500 Brno, Czech Republic
> > >
> > > email: tev...@pharm.uoa.gr
> > >               teva...@gmail.com
> > >
> > > website: https://sites.google.com/site/thomasevangelidishomepage/
> > >
> > >
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn@python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn
> >
