Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Sebastian Raschka
Unfortunately (or maybe fortunately :)) no, maximizing variance reduction & minimizing MSE are just special cases :)

Best,
Sebastian

> On Mar 1, 2018, at 9:59 AM, Thomas Evangelidis wrote:
>
> Does this generalize to any loss function? For example I also want to
>

Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Thomas Evangelidis
Does this generalize to any loss function? For example I also want to implement Kendall's tau correlation coefficient and a combination of R, tau and RMSE. :)

On Mar 1, 2018 15:49, "Sebastian Raschka" wrote:

> Hi, Thomas,
>
> as far as I know, it's all the same and doesn't

Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Sebastian Raschka
Hi, Thomas,

as far as I know, it's all the same and doesn't matter, and you would get the same splits, since R^2 is just a rescaled MSE.

Best,
Sebastian

> On Mar 1, 2018, at 9:39 AM, Thomas Evangelidis wrote:
>
> Hi Sebastian,
>
> Going back to Pearson's R loss
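To illustrate the "rescaled MSE" remark, here is a minimal sketch, assuming that R^2 in the reply means the coefficient of determination, R^2 = 1 - MSE / Var(y). The helper names `mse` and `r2` and the sample arrays are made up for this illustration and are not part of scikit-learn. For a fixed set of targets, Var(y) is a constant, so whichever prediction has the lower MSE also has the higher R^2.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination, written as a rescaled MSE."""
    return 1.0 - mse(y_true, y_pred) / float(np.var(y_true))


# Hypothetical targets and two competing sets of predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
pred_a = np.array([2.5, 0.0, 2.0, 8.0])
pred_b = np.array([3.0, 1.0, 1.0, 5.0])

# The ordering by MSE is the reverse of the ordering by R^2.
print(mse(y_true, pred_a), r2(y_true, pred_a))
print(mse(y_true, pred_b), r2(y_true, pred_b))
```

Because the two scores differ only by a constant offset and scale on the same targets, they rank candidate predictions identically.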

Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Thomas Evangelidis
Hi Sebastian,

Going back to Pearson's R loss function, does this imply that I must add an abstract "init2" method to RegressionCriterion (the class that MSE inherits from), where I will add the target values X as an extra argument? And then the node impurity will be 1-R (the lower the better)?

Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Sebastian Raschka
Hi, Thomas,

in regression trees, minimizing the variance among the target values is equivalent to minimizing the MSE between targets and predicted values. This is also called variance reduction: https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction

Best,
Sebastian

> On Mar
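The equivalence described in this reply can be checked numerically. The following is a minimal sketch, assuming a regression-tree node predicts the mean of its targets; the function names `node_mse` and `variance_reduction` and the sample data are hypothetical and do not come from scikit-learn's Cython code.

```python
import numpy as np


def node_mse(y):
    """MSE between a node's targets and its prediction (the node mean)."""
    return float(np.mean((y - np.mean(y)) ** 2))


def variance_reduction(y, left_mask):
    """Parent variance minus the size-weighted variance of the two children."""
    y_left, y_right = y[left_mask], y[~left_mask]
    weighted_child_var = (
        len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)
    ) / len(y)
    return float(np.var(y) - weighted_child_var)


# Hypothetical node with two clearly separated groups of targets.
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
left_mask = np.array([True, True, True, False, False, False])

# With a mean prediction, the node's MSE is exactly the variance of its targets.
print(node_mse(y), float(np.var(y)))

# A split that separates the two groups removes most of that variance.
print(variance_reduction(y, left_mask))
```

Since the node's MSE and the variance of its targets are the same number, choosing the split with the largest variance reduction is the same as choosing the split with the lowest weighted child MSE.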

Re: [scikit-learn] custom loss function in RandomForestRegressor

2018-03-01 Thread Thomas Evangelidis
Hi again,

I am currently revisiting this problem after familiarizing myself with Cython and scikit-learn's code, and I have a very important question: looking at the class MSE(RegressionCriterion), the node impurity is defined as the variance of the target values Y on that node. The predictions X