On 3/16/06, Daniel Franke <[EMAIL PROTECTED]> wrote:
> Each single file contains detailed genotypic information of many
> individuals at a given genomic region. We have to implement _loads_ of
> quality control  measures to ensure the maximum possible data
> correctness. Earlier, we did this manually. We can't do this any
> longer. Handling that many files is a nightmare, especially if a couple
> of different individuals are involved. For example, it is horrible to
> extract subsets from that mess (a few markers from a couple of
> individuals is a major problem with many, many files, but easily
> solved in a relational db).
>

In SQLite you can attach multiple databases and query across them.
Have you considered keeping one database per genomic region and
writing a tool that attaches only what it needs to perform its query?
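As a rough sketch of that approach (table names and schema here are made up for illustration, not taken from your data): each region lives in its own database file, and the tool ATTACHes a second region's database so one SELECT can span both. This uses Python's built-in sqlite3 module with in-memory databases standing in for the per-region files.

```python
import sqlite3

# "main" stands in for one per-region database file, e.g. region1.db.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE genotypes (individual TEXT, marker TEXT, allele TEXT)")
con.execute("INSERT INTO genotypes VALUES ('ind1', 'rs100', 'A')")

# Attach a second region's database under the alias "region2";
# with real files this would be: ATTACH DATABASE 'region2.db' AS region2
con.execute("ATTACH DATABASE ':memory:' AS region2")
con.execute("CREATE TABLE region2.genotypes (individual TEXT, marker TEXT, allele TEXT)")
con.execute("INSERT INTO region2.genotypes VALUES ('ind1', 'rs200', 'G')")

# One query spanning both attached regions, e.g. to pull a subset of
# markers for a given individual across regions.
rows = con.execute("""
    SELECT 'region1' AS region, individual, marker, allele FROM main.genotypes
    WHERE individual = 'ind1'
    UNION ALL
    SELECT 'region2', individual, marker, allele FROM region2.genotypes
    WHERE individual = 'ind1'
""").fetchall()
print(rows)
```

A single process can attach several databases at once (SQLite's default compile-time limit is 10, raisable to 125), so the tool only ever opens the handful of regions a query actually touches.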

I would bet SQLite could handle the 20 GB file, but it will take a big
machine with plenty of RAM and disk space.
