Note: this is a combined reply to answers sent by Fred and Jay.

Fred wrote:
> > > If so, are you trying to use a blender to stir the ocean?
> > > You might reevaluate if you're using the right tool for the job.
> >
> > That's my question: IS sqlite the right tool here? =)
>
> And I believe he is asking, "Is this the right problem, here." :-)
>
> Does sound like an awful lot of data. I think the question might be
> reworded to ask is there any manageable logical groups of data that
> might lend themselves to simple segmentation into separate
> tables/databases?
The table in question:

  CREATE TABLE genotypes (
      markerid     integer NOT NULL REFERENCES marker(id),
      individualid integer NOT NULL REFERENCES individuals(id),
      genA         integer,
      genB         integer
  );

I don't see how to segment this. I have 1,000+ individuals and up to
500,000 markers. In a full cross of both, that makes 500,000,000+ rows.
It would be possible to use a table per marker or a table per
individual, but that strikes me as odd. And even if this table is split
up per individual (within the same db file), say, I fail to see what
would be gained ...?!

Jay wrote:
> In sqlite you can attach two databases and query across them.
> Have you considered leaving it as many databases for each genomic
> region and writing a tool that attaches what it needs to perform
> its query?

I'm not sure whether I understand you correctly. You suggest employing
as many small databases as there are flat files, and attaching and
detaching them as needed?

Putting both suggestions together as I understand them: the most
appealing option would be a database per individual plus a master
database that knows about all the others ... that's an approach I have
to think about ...
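For the sake of argument, a rough sketch of what that could look like
(the file name and the genotype_files catalog table are made up for
illustration; marker and individuals would stay in the master
database):

  -- master.db: keeps marker, individuals, and a catalog of the
  -- per-individual database files (catalog table invented here)
  CREATE TABLE genotype_files (
      individualid integer NOT NULL REFERENCES individuals(id),
      dbfile       text    NOT NULL
  );

  -- ind_0001.db (one file per individual): only that individual's rows,
  -- so individualid is no longer needed in the genotypes table itself
  CREATE TABLE genotypes (
      markerid integer NOT NULL,
      genA     integer,
      genB     integer
  );

  -- the query tool looks up the file in master.db, attaches it, and
  -- joins against the marker table living in the master database
  ATTACH DATABASE 'ind_0001.db' AS ind;
  SELECT g.markerid, g.genA, g.genB
    FROM ind.genotypes AS g
    JOIN marker        AS m ON m.id = g.markerid;
  DETACH DATABASE ind;

One caveat: SQLite limits how many databases can be attached at once
(10 by default), so such a tool would have to attach and detach files
as it walks through the individuals rather than attaching everything
in one go.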