Hi. I am fairly new to scikit-learn. I first tried it out for a Kaggle competition a few months ago. I didn't have much time to put into the competition, but it gave me an appreciation of the powerful yet simple design of the library. I've also used Python on and off for about a decade for small things, but I am not an expert, and I know little about Cython.
I like using Extremely Randomized Trees, but I'm looking for more flexibility in generating them. In particular, I'd like to be able to specify my own criterion and split-finding algorithm. I'm curious why these are passed in as strings instead of functions/objects. Part of me thinks it has something to do with Cython. Otherwise, I could imagine the goal being to stay more abstract and leave decisions to the code; for example, best_split and random_split would use different implementations in order to have an efficient MAE criterion.

So I'd like to contribute a simple MAE criterion that would be efficient for random splits (i.e. O(n) given a single batch update). Is the way forward for something like this to hard-code more criteria in _tree.pyx, or would it be better to move toward some modularity and allow a Criterion object to be passed in?

Ken Geis
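
P.S. To make the O(n) claim concrete, here is a rough sketch in plain NumPy of the kind of thing I have in mind for scoring a single random split with MAE. It is only an illustration under my own assumptions: the names mae_impurity and random_split_mae are hypothetical and are not part of scikit-learn or _tree.pyx, and I am relying on np.median being linear on average (I believe it uses a selection step rather than a full sort).

```python
import numpy as np


def mae_impurity(y):
    """Mean absolute deviation of y around its median (0.0 for an empty node)."""
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        return 0.0
    # One median plus one pass over y: linear work for the whole batch.
    return float(np.mean(np.abs(y - np.median(y))))


def random_split_mae(x, y, rng):
    """Draw one threshold uniformly in [min(x), max(x)) and score that split.

    Hypothetical helper: returns (threshold, impurity_decrease), where the
    decrease is the parent MAE minus the size-weighted MAE of the children.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    threshold = rng.uniform(x.min(), x.max())
    left = x <= threshold  # boolean mask of samples sent to the left child
    n = y.size
    n_left = int(left.sum())
    children = (
        n_left * mae_impurity(y[left]) + (n - n_left) * mae_impurity(y[~left])
    ) / n
    return threshold, mae_impurity(y) - children


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = np.where(x > 0.0, 3.0, -3.0) + rng.normal(size=1000)
    print(random_split_mae(x, y, rng))
```

Because only one threshold is drawn per feature, each side's median is computed once, so the whole evaluation stays linear in the node size. A best-split search would have to repeat this for every candidate threshold, which is why I suspect the two splitters would want different MAE implementations.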
