Hi Eskil, (CC: the scikit-learn mailing list)
Unfortunately, I do not have time to implement this new criterion myself. In any case, given how recently this paper was published, I don't think we would add it to the scikit-learn codebase. Our policy is to include only time-tested algorithms.

That being said, maybe someone on the mailing list would be interested in helping you implement this criterion in a separate fork.

Best,
Gilles

On 14 March 2016 at 22:48, Eskil Forsell <eskil.fors...@phdstudent.hhs.se> wrote:
> Dear Gilles,
>
> I'm writing to you as the first author of the tree module in scikit-learn to
> gauge your interest in implementing a novel and really useful (at least for
> policy-oriented economists like me) splitting criterion.
>
> I'm a PhD student in economics at Stockholm School of Economics and my
> research and work focus largely on evaluating policy by using randomised
> controlled trials. There has recently been a lot of buzz in the field of
> economics about the potential intersection of machine learning and causal
> inference. Much of this buzz has been inspired by a paper outlining how to
> use a splitting criterion tailored to the idea that the splits will later be
> used as subpopulations for estimating treatment effects on a hold-out sample,
> thus yielding correct standard errors. (I'm attaching the paper.)
>
> The authors are working on implementing the criterion in R but haven't yet
> released anything publicly. I really believe that this method of estimating
> heterogeneous causal effects will be extremely popular among empirical
> economists and potentially be of great use to policy-makers who want to
> figure out how interventions work differently depending on personal
> characteristics.
>
> I had a look at the criterion file but quickly realized that this wouldn't
> be something I could implement myself. If you're interested I'd love to talk
> more about it.
> If you're not interested, perhaps you could point me in the
> direction of someone who might be and who'd have no problem implementing
> the criterion? Based on my understanding of the paper it's actually quite a
> simple extension of the MSE criterion, slightly complicated by the fact that
> instead of raw means, we're using treatment effects (which crucially depend
> on a treatment indicator variable).
>
> All the best and hope to hear from you soon.
>
> Regards,
> Eskil
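
To make the description in the quoted message a little more concrete, here is a minimal sketch in plain Python of a split score that uses treatment effects where an MSE-style criterion would use raw means. It is an illustration only: it is not the criterion from the attached paper (which is not reproduced here) and it is not scikit-learn code. The function names `treatment_effect`, `split_score` and `best_split` are hypothetical, and the score itself (child size times squared estimated effect, by analogy with the way minimising MSE rewards children whose means differ) is an assumption made for this example.

```python
def treatment_effect(ys, ws):
    """Difference in mean outcome between treated (w == 1) and control (w == 0).

    Returns None when either group is empty, since no effect can be estimated.
    """
    treated = [y for y, w in zip(ys, ws) if w == 1]
    control = [y for y, w in zip(ys, ws) if w == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def split_score(xs, ys, ws, threshold):
    """Assumed score for splitting on x <= threshold.

    Sums n_child * tau_child**2 over both children, where tau_child is the
    estimated treatment effect in that child. Returns None if either child
    lacks treated or control observations.
    """
    score = 0.0
    for goes_left in (True, False):
        idx = [i for i, x in enumerate(xs) if (x <= threshold) == goes_left]
        tau = treatment_effect([ys[i] for i in idx], [ws[i] for i in idx])
        if tau is None:
            return None
        score += len(idx) * tau ** 2
    return score


def best_split(xs, ys, ws):
    """Return (threshold, score) with the highest score, or None if no valid split."""
    best = None
    # The largest value is skipped: it would leave the right child empty.
    for threshold in sorted(set(xs))[:-1]:
        score = split_score(xs, ys, ws, threshold)
        if score is not None and (best is None or score > best[1]):
            best = (threshold, score)
    return best


# Toy data: the treatment has no effect when x <= 1 and an effect of 2 when x >= 2.
xs = [0, 0, 1, 1, 2, 2, 3, 3]
ws = [0, 1, 0, 1, 0, 1, 0, 1]  # treatment indicator
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0]

print(best_split(xs, ys, ws))  # prints (1, 16.0)
```

On the toy data the score picks the threshold that separates the subpopulation with no effect from the one with an effect of 2. A real implementation would also need to estimate effects on a hold-out sample, as the message describes, which this sketch does not attempt.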