Hi all,

The following PR on Random Forests by Gilles deserves a final round of reviews:
https://github.com/scikit-learn/scikit-learn/pull/439

On a related topic, I just came across the following paper from the microarray data literature that sounds interesting (open access for the win!):

Random KNN feature selection -- a fast and stable alternative to Random Forests
http://www.biomedcentral.com/1471-2105/12/450

It would be interesting to try to reproduce their claim with the scikit. We also need examples applying feature selection (e.g. with L1-penalized methods) to microarray data in the scikit.

The aforementioned paper links to the following dataset archives:

http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html
http://www.gems-system.org/

It would be great if someone volunteered to come up with a scikit loader for those datasets. Alternatively, a champion could ask the authors for permission to upload those datasets to mldata.org, and then use the standard mldata dataset loader from the scikit to build a microarray example.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
