Note: this is a combined reply to answers sent by Fred and Jay.

Fred wrote:
> > > If so, are you trying to use a blender to stir the ocean?
> > > You might reevaluate if you're using the right tool for the job.
> >
> > That's my question: IS sqlite the right tool here? =)
>
> And I believe he is asking, "Is this the right problem, here." :-)
>
> Does sound like an awful lot of data.  I think the question might be
> reworded to ask: are there any manageable, logical groups of data
> that might lend themselves to simple segmentation into separate
> tables/databases?

The table in question:
CREATE TABLE genotypes (markerid integer NOT NULL REFERENCES marker(id),
                        individualid integer NOT NULL REFERENCES individuals(id),
                        genA integer,
                        genB integer);

I don't see how to segment this. I have 1,000+ individuals and up to
500,000 markers. In a full cross of both, that comes to 500,000,000+
rows. It would be possible to use a table per marker or a table per
individual -- but that would strike me as odd. Even if this table were
split up per individual (within the same dbfile), say, I fail to see
what would be gained ...?!
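
Just to make the oddness concrete, a per-individual split (all table
names here are invented by me) would mean something like:

-- one table per individual; markerid is the only remaining key
CREATE TABLE genotypes_ind_0001 (markerid integer NOT NULL REFERENCES marker(id),
                                 genA integer,
                                 genB integer);
-- ... and so on for each of the 1,000+ individuals

i.e. 1,000+ near-identical tables to create, populate and query.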


Jay wrote:
> In sqlite you can attach two databases and query across them.
> Have you considered leaving it as many databases for each genomic
> region and writing a tool that attaches what it needs to perform its query?
I'm not sure whether I understand you correctly. You suggest employing
as many small databases as there are flat files, and attaching and
detaching them as needed?
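
If that is what you mean, I imagine the tool would do roughly this
(the file and schema names are my guesses, not yours):

ATTACH DATABASE 'region_chr01.db' AS chr01;
ATTACH DATABASE 'region_chr02.db' AS chr02;

-- query across the attached region databases
SELECT markerid, individualid, genA, genB FROM chr01.genotypes
UNION ALL
SELECT markerid, individualid, genA, genB FROM chr02.genotypes;

DETACH DATABASE chr01;
DETACH DATABASE chr02;

If I recall correctly, SQLite limits the number of simultaneously
attached databases (10 by default), so the tool would have to attach
only what a given query touches.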


To put together both suggestions as I understand them: it's most
appealing to have a database for each individual and a master database
that knows about all the others ... that's an approach I have to think
about ...
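
A first sketch of what I mean (everything below is hypothetical):

-- master.db: knows where each individual's data lives
CREATE TABLE individuals (id integer PRIMARY KEY,
                          name text,
                          dbfile text);   -- e.g. 'ind_0042.db'

-- per-individual file, e.g. ind_0042.db:
CREATE TABLE genotypes (markerid integer NOT NULL,
                        genA integer,
                        genB integer);

-- at query time: look up dbfile in master.db, then
ATTACH DATABASE 'ind_0042.db' AS ind;
SELECT markerid, genA, genB FROM ind.genotypes WHERE markerid = 12345;
DETACH DATABASE ind;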
