I found an interesting paper that claims to improve predictive models on
genomic data by exploiting the fact that variants located close together on
a chromosome tend to be inherited together, and are therefore correlated
("linkage disequilibrium").

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0093379#s2

In their data, they order the features (single-nucleotide polymorphisms, or
"SNPs", rather than whole genes) according to their physical location on a
chromosome and organize them into disjoint "blocks" of contiguous SNPs.
They then modify the Random Forest so that each node picks its candidate
split features only from the SNPs in one randomly chosen block, whereas the
standard RF draws a random subset from all the features at each node.
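
To make the idea concrete, here is a rough sketch of how I understand the
per-node step. This is my own illustration, not the paper's code and not
anything that exists in scikit-learn: the function names (make_blocks, gini,
best_split_in_random_block), the fixed block size, and the use of Gini
impurity are all assumptions on my part, and the columns of X are assumed to
be already sorted by chromosomal position.

```python
import numpy as np


def make_blocks(n_features, block_size):
    """Split feature indices 0..n_features-1 into disjoint contiguous blocks.

    Hypothetical helper: assumes columns are already ordered by physical
    position and that fixed-size blocks are an acceptable stand-in for
    however the blocks are really defined.
    """
    return [np.arange(start, min(start + block_size, n_features))
            for start in range(0, n_features, block_size)]


def gini(y):
    """Gini impurity of a label vector (0.0 for an empty vector)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split_in_random_block(X, y, blocks, rng):
    """Pick one block at random and search only its features for a split.

    Returns the chosen block and a (feature, threshold, impurity) tuple;
    feature and threshold are None if no feature in the block varies.
    """
    block = blocks[rng.integers(len(blocks))]
    best = (None, None, np.inf)
    for j in block:
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            n_left = left.sum()
            score = (n_left * gini(y[left])
                     + (len(y) - n_left) * gini(y[~left])) / len(y)
            if score < best[2]:
                best = (int(j), float(t), float(score))
    return block, best


# Toy data: 200 samples, 12 SNPs coded 0/1/2, label driven by SNP 5.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 12)).astype(float)
y = (X[:, 5] > 0).astype(int)

blocks = make_blocks(X.shape[1], block_size=4)
block, (feature, threshold, impurity) = best_split_in_random_block(
    X, y, blocks, rng)
```

A full tree would repeat this at every node, drawing a fresh block each
time. The standard RF would instead draw its candidate features from all
columns of X at that point.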

Anyway, I was wondering how to adapt the scikit-learn RF implementation to
do something like this. I'm working with some plant biologists who would find
this useful.  (Also, I'm not a biologist, so please feel free to correct me
if I got anything wrong.)

Thanks

DG