[Scikit-learn-general] Modifying the subspace used in a Random Forest

David Gerster Sun, 11 Jan 2015 23:07:45 -0800

I found an interesting paper that claims to improve predictive models on
genomes by exploiting the fact that genetic variants located close together
on a chromosome tend to be inherited together ("linkage disequilibrium").

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0093379#s2

In their data, they order the features (technically single-nucleotide
polymorphisms, or "SNPs", rather than genes) according to their physical
location on a chromosome and partition them into disjoint "blocks" of
contiguous SNPs.  They then modify the Random Forest so that each node
draws its candidate split features only from one randomly selected "block",
whereas the standard RF draws them at random from all the features at
each node.
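
To make the idea concrete, here is a minimal pure-Python sketch of the
per-node block selection as I understand it from the description above. It
is not the paper's code and not scikit-learn's implementation. All names
(`best_split_in_block`, `build_tree`, `fit_forest`, the toy data) are
hypothetical, and I am assuming Gini impurity, an exhaustive search over
every SNP in the chosen block, and a leaf whenever the chosen block offers
no usable split. The paper may differ on any of these points.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_in_block(X, y, rows, block):
    """Best (impurity, feature, threshold) using only the features in `block`."""
    best = None
    for f in block:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best  # None if every feature in the block is constant on `rows`

def build_tree(X, y, rows, blocks, rng, max_depth=5):
    """Grow a tree; internal nodes are (feature, threshold, left, right)."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    block = rng.choice(blocks)  # the modification: one random block per node
    split = best_split_in_block(X, y, rows, block)
    if split is None:  # assumption: give up and make a leaf
        return majority
    _, f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, blocks, rng, max_depth - 1),
            build_tree(X, y, right, blocks, rng, max_depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fit_forest(X, y, blocks, n_trees=25, seed=0):
    """Bagged ensemble of block-restricted trees."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(build_tree(X, y, rows, blocks, rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Toy data: 6 SNPs coded 0/1/2, grouped into 3 contiguous blocks of 2.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 2) for _ in range(6)] for _ in range(60)]
y = [1 if row[2] == 2 else 0 for row in X]  # label driven by SNP 2 only
blocks = [[0, 1], [2, 3], [4, 5]]
forest = fit_forest(X, y, blocks)
accuracy = sum(predict_forest(forest, x) == t for x, t in zip(X, y)) / len(X)
print("training accuracy:", accuracy)
```

The only line that differs from an ordinary random-subspace tree is the
`rng.choice(blocks)` call: a standard RF would instead sample candidate
features from the whole feature set at that point.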

Anyway, I was wondering how to adapt the scikit-learn RF implementation to
do something like this. I'm working with some plant biologists who would
find this useful.  (Also, I'm not a biologist, so please feel free to
correct me if I got anything wrong.)

Thanks

DG
