David,
one of the paper's primary authors, Gilles Louppe, is also the primary
author of the RF/tree module in scikit-learn. The full code used for the
paper is available on GitHub: https://github.com/0asa/TTree-source
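
For a rough idea of the block-restricted split described in the quoted message, here is a minimal NumPy sketch. It is not the authors' implementation and not scikit-learn code: the names candidate_features, gini, and best_split_in_block are hypothetical, the Gini criterion and midpoint thresholds are assumptions, and the toy data is made up. I'm not aware of a scikit-learn option that restricts candidate features to one block per node, so the sketch stands alone.

```python
import numpy as np


def candidate_features(blocks, rng, max_features=None):
    """Pick one block at random and return feature indices from that block only.

    blocks[j] is the block id of feature j (features are assumed to be
    ordered by physical position, so each block is a contiguous run).
    """
    chosen = rng.choice(np.unique(blocks))
    members = np.flatnonzero(blocks == chosen)
    if max_features is not None and max_features < members.size:
        # Optionally subsample within the chosen block.
        members = rng.choice(members, size=max_features, replace=False)
    return chosen, members


def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split_in_block(X, y, blocks, rng):
    """Search for the best Gini split using features from one random block."""
    _, feats = candidate_features(blocks, rng)
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in feats:
        values = np.unique(X[:, j])
        # Candidate thresholds are midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / y.size
            if score < best[2]:
                best = (j, t, score)
    return best


# Toy data: 12 SNPs coded 0/1/2, grouped into 4 blocks of 3 contiguous SNPs.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 12))
blocks = np.repeat(np.arange(4), 3)
y = (X[:, 4] > 0).astype(int)
feature, threshold, impurity = best_split_in_block(X, y, blocks, rng)
```

A standard RF would draw its candidates from all 12 columns at each node; here only the three columns of the randomly chosen block are examined, which is the difference the quoted description points at.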
Federico
On Mon, Jan 12, 2015 at 7:39 AM, David Gerster <[email protected]> wrote:
> I found an interesting paper that claims to improve predictive models on
> genomes by exploiting the fact that genes tend to clump together on a
> chromosome ("linkage disequilibrium").
>
>
> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0093379#s2
>
> In their data, they order the genes (technically single-nucleotide
> polymorphisms, or "SNPs") according to their physical location on a
> chromosome and organize these features into disjoint "blocks" of contiguous
> SNPs. They then modify the Random Forest to select only from the SNPs in a
> randomly selected "block" at each node, instead of the normal RF which
> picks randomly from all the features at each node.
>
> Anyway, I was wondering how to adapt the scikit-learn RF algorithm to do
> something like this. I'm working with some plant biologists who would find
> this useful. (Also, I'm not a biologist, so please feel free to correct me
> if I got anything wrong.)
>
> Thanks
>
> DG
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general