Hi all,

The following PR on Random Forests by Gilles deserves a final round of reviews:

  https://github.com/scikit-learn/scikit-learn/pull/439

On a related topic, I just came across the following open access
paper from the microarray literature that looks interesting
(open access for the win!):

Random KNN feature selection -- a fast and stable alternative to Random Forests
http://www.biomedcentral.com/1471-2105/12/450

It would be interesting to try to reproduce their claims with the scikit.
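As a possible starting point for such a reproduction, here is a rough sketch. It is a simplified reading of the Random KNN idea and an assumption on my part, not the paper's exact algorithm: each base KNN is trained on a random subset of the features, and a feature's "support" is taken to be the mean cross-validated accuracy of the KNNs it appeared in. The paper's precise support measure and its backward-elimination procedure may well differ. `random_knn_support` and all parameter values are hypothetical; the data is synthetic, and the ranking is compared with `RandomForestClassifier.feature_importances_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def random_knn_support(X, y, n_knns=100, subset_size=None, n_neighbors=3,
                       random_state=0):
    """Hypothetical, simplified Random KNN feature scoring.

    Each base KNN sees a random feature subset; a feature's support is the
    mean cross-validated accuracy of the KNNs in which it was drawn.
    """
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]
    if subset_size is None:
        subset_size = max(1, int(np.sqrt(n_features)))
    acc_sum = np.zeros(n_features)
    n_draws = np.zeros(n_features)
    for _ in range(n_knns):
        # Draw a random feature subset (no repeats) for this base KNN.
        subset = rng.choice(n_features, size=subset_size, replace=False)
        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        acc = cross_val_score(knn, X[:, subset], y, cv=3).mean()
        acc_sum[subset] += acc
        n_draws[subset] += 1
    # Features that were never drawn keep a support of 0.
    return np.divide(acc_sum, n_draws, out=np.zeros(n_features),
                     where=n_draws > 0)


# Small synthetic problem: with shuffle=False the 5 informative features
# are columns 0-4 and the remaining 45 columns are noise.
X, y = make_classification(n_samples=60, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

support = random_knn_support(X, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

print("Random KNN top 5:", np.argsort(support)[::-1][:5])
print("Random forest top 5:", np.argsort(rf.feature_importances_)[::-1][:5])
```

To test their speed and stability claims properly, one would rerun this on the real microarray datasets with several seeds and time both methods.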

Also, we need examples in the scikit that apply feature selection (e.g.
with L1-penalized methods) to microarray data. The aforementioned paper
links to the following dataset archives:
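A minimal sketch of what such an example could look like. Because no real microarray data is loaded here, `make_classification` stands in for an expression matrix with few samples and many "genes"; the sizes and the value of `C` are arbitrary assumptions. An L1-penalized `LogisticRegression` is wrapped in `SelectFromModel` to keep only the genes with non-zero coefficients, and a plain classifier is then fit on those genes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a microarray matrix: 80 samples, 2000 "genes".
X, y = make_classification(n_samples=80, n_features=2000, n_informative=10,
                           n_redundant=0, random_state=0)

# The L1 penalty drives most coefficients to exactly zero; SelectFromModel
# keeps the genes whose coefficients survive.
l1_logreg = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
pipe = make_pipeline(StandardScaler(),
                     SelectFromModel(l1_logreg),
                     LogisticRegression(max_iter=1000))

# Selection happens inside each CV fold, so the score is not biased by
# having picked genes on the full dataset.
scores = cross_val_score(pipe, X, y, cv=5)

pipe.fit(X, y)
selected = pipe.named_steps["selectfrommodel"].get_support()
print("CV accuracy: %.2f" % scores.mean())
print("genes kept:", int(selected.sum()), "of", X.shape[1])
```

Keeping the selector inside the pipeline matters especially with microarray data, where selecting genes on the whole dataset before cross-validation tends to give overly optimistic scores.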

http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html
http://www.gems-system.org/

It would be great if someone volunteered to come up with a scikit
loader for those datasets. Alternatively, a champion could ask the
authors for permission to upload those datasets to mldata.org and
use the standard mldata dataset loader from the scikit to build a
microarray example.
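A rough sketch of what such a loader could return. Note that mldata.org has since gone offline and `fetch_mldata` was later removed from scikit-learn (`fetch_openml` is the usual replacement), so this sketch parses a local file instead. The file layout is purely an assumption, not taken from the archives above: a tab-separated file whose first line holds one class label per sample, followed by one line per gene (gene id, then one expression value per sample). `load_microarray_tsv` is a hypothetical name, and the inline sample data is made up.

```python
import io

import numpy as np
from sklearn.utils import Bunch


def load_microarray_tsv(f):
    """Hypothetical loader for an ASSUMED layout: first line = one class
    label per sample, then one line per gene (id + one value per sample)."""
    lines = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    header, rows = lines[0], lines[1:]
    # Encode string class labels as integers 0..n_classes-1.
    target_names, y = np.unique(header[1:], return_inverse=True)
    feature_names = [row[0] for row in rows]
    # Genes are stored as rows; transpose to (n_samples, n_features).
    X = np.array([row[1:] for row in rows], dtype=np.float64).T
    return Bunch(data=X, target=y, target_names=target_names,
                 feature_names=feature_names)


# Made-up toy file: 3 samples, 2 genes.
sample = io.StringIO(
    "label\tA\tB\tA\n"
    "gene_1\t0.5\t1.2\t0.3\n"
    "gene_2\t2.0\t0.1\t1.9\n"
)
dataset = load_microarray_tsv(sample)
print(dataset.data.shape)  # (3, 2)
print(dataset.target)      # [0 1 0]
```

Returning a `Bunch` with `data`, `target` and `target_names` keeps the result consistent with the other dataset loaders in the scikit, so examples can consume it the same way.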

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
