Hi. I am fairly new to scikit-learn. I first tried it out for a Kaggle 
competition a few months ago. I didn't have enough time to put into that, but 
it gave me an appreciation of the powerful yet simple design of the library. 
Also, I've used Python on and off for about a decade for small things, but I am 
not an expert, and I know little about Cython.

I like using Extremely Randomized Trees, but I'm looking for more flexibility 
in generating them. In particular, I'd like to be able to specify my own 
criterion and split-finding algorithm. I'm curious why these are passed in as 
strings instead of functions/objects. Part of me thinks it has something to do 
with Cython. Alternatively, I could imagine the strings being there to keep the 
interface abstract and leave implementation decisions to the library; for 
example, best_split and random_split could each use a different implementation 
of an MAE criterion to stay efficient.

So I'd like to contribute a simple MAE criterion that would be efficient for 
random splits (i.e., O(n) given a single batch update). Is the way forward for 
something like this to hard-code more criteria in _tree.pyx, or would it be 
better to move toward some modularity and allow a Criterion object to be passed in?
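
To make the O(n) idea concrete, here is a rough sketch in plain NumPy, not 
Cython and not tied to anything in _tree.pyx. The function name 
mae_of_split and its arguments are made up for illustration. It assumes a 
single, already-chosen threshold (as with a random split), so each side needs 
only one median, which np.median finds by selection rather than a full sort.

```python
import numpy as np


def mae_of_split(x, y, threshold):
    """Mean absolute error of y after splitting samples on x <= threshold.

    Hypothetical helper for illustration only. Each side of the split is
    predicted by its own median, which minimizes absolute error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    go_left = x <= threshold

    total_abs_error = 0.0
    for side in (y[go_left], y[~go_left]):
        if side.size:  # skip an empty side of the split
            # One median per side: a single batch computation, with no
            # per-sample incremental updates as in a best-split scan.
            total_abs_error += np.abs(side - np.median(side)).sum()
    return total_abs_error / y.size


# Example: draw one random threshold for a feature and score it.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 5.0, 9.0])
threshold = rng.uniform(x.min(), x.max())
score = mae_of_split(x, y, threshold)
```

A best-split search would instead have to keep a running median as samples 
move from one side to the other, which is why I suspect the two splitters 
would want different implementations of the same criterion.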


Ken Geis


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
