I found an interesting paper that claims to improve predictive models on
genomic data by exploiting the fact that nearby variants on a chromosome
tend to be inherited together ("linkage disequilibrium").
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0093379#s2
In their data, they order the genes (technically single-nucleotide
polymorphisms, or "SNPs") according to their physical location on a
chromosome and organize these features into disjoint "blocks" of contiguous
SNPs. They then modify the Random Forest so that, at each node, it selects
only from the SNPs in one randomly chosen "block", whereas the standard RF
picks randomly from all the features at each node.
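To make the difference concrete, here is how I understand the per-node
step, as a minimal sketch in plain Python. The function names and the toy
block layout are my own (hypothetical), not taken from the paper or from
scikit-learn, and I am assuming the block is chosen uniformly at random,
which the paper may or may not do.

```python
import random


def sample_standard_candidates(n_features, max_features, rng):
    """Standard RF step: draw candidate features from all columns."""
    k = min(max_features, n_features)
    return rng.sample(range(n_features), k)


def sample_block_candidates(blocks, max_features, rng):
    """Block-restricted step: pick one block, then draw only from it.

    `blocks` is a list of lists of feature (column) indices, one list per
    contiguous block of SNPs. Uniform block choice is an assumption.
    """
    block_index = rng.randrange(len(blocks))
    block = blocks[block_index]
    k = min(max_features, len(block))
    return block_index, rng.sample(block, k)


# Hypothetical layout: 10 SNPs ordered by position, split into 3 blocks.
blocks = [[0, 1, 2], [3, 4], [5, 6, 7, 8, 9]]
rng = random.Random(0)

standard = sample_standard_candidates(10, max_features=2, rng=rng)
block_index, candidates = sample_block_candidates(blocks, max_features=2, rng=rng)
```

A tree builder would call something like sample_block_candidates once per
node and evaluate splits only on the returned columns.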
Anyway, I was wondering how to adapt the scikit-learn Random Forest
implementation to do something like this. I'm working with some plant
biologists who would find this useful. (Also, I'm not a biologist, so
please feel free to correct me if I got anything wrong.)
Thanks
DG