Thanks. Are there any limits to the number of databases in CouchDB? A few thousand probably won't be a problem, I guess?

On 4/03/10 11:49, Simon Metson wrote:
Hi,
Why not use a database per experiment? Do you need to process data
across experiments? Can you store your raw data in individual databases
and then pull summary data into a single database?
Cheers
Simon

On 3 Mar 2010, at 22:21, Tom Sante wrote:

Hi

The data is currently stored in a MySQL table with about a billion
(1,000 million) rows.
These rows hold the data of a genetic test (arrayCGH) and are built up
like this:

Every experiment (a few thousand in total) contains measurements of
about 180,000 genetic probes. This raw data is analyzed and the values
run through different algorithms, so every probe needs to store more
than one value after the analysis is done. The values of the different
analyses are currently stored as columns in that table, which makes it
a pain whenever we have to add an analysis that is not yet covered by
the existing columns. This is why a schema-free, document-based DB is
probably a better fit.
The initial idea was to give each probe a separate document, and when
the original value is transformed into another value, to store it in
the same document:

{
  "probe_id": 1234567890,
  "experiment_id": 1234567890,
  "raw_value": 0.43524,
  "analysis": { "cbs": 0.436, "CBS+GLAD": 0.4356 }
}

Once added to the database almost all changes to the data will be
contained within an experiment.

MongoDB has collections, which would be an appropriate abstraction
(roughly one collection per experiment). In CouchDB, however, I would
have to put all these probe documents into one big database without
collections. So even if I only change probes within a single
experiment, this would affect the views over all the other billion
documents in the DB. Because of the large number of documents, it
would be good to know beforehand what the performance implications of
this are.
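One thing I've been sketching is a single view whose key starts with the experiment id, so one experiment's probes can at least be read back with a startkey/endkey range even if everything lives in one database. A minimal sketch of such a map function follows (the `emit` shim is only there so it can be run outside the server; the key layout is just my assumption, not anything prescribed by CouchDB):

```javascript
// Collected view rows; inside CouchDB, emit() is provided by the
// server at index time. This shim only exists so the map function
// can be exercised standalone.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key every probe document by [experiment_id, probe_id]
// so a single experiment can be queried as a contiguous key range.
function map(doc) {
  if (doc.probe_id && doc.experiment_id) {
    emit([doc.experiment_id, doc.probe_id], doc.raw_value);
  }
}

// Example document shaped like the one above.
map({
  probe_id: 1234567890,
  experiment_id: 42,
  raw_value: 0.43524,
  analysis: { cbs: 0.436, "CBS+GLAD": 0.4356 }
});
```

As I understand it, this helps with querying per experiment but not with the reindexing cost when documents change, which is exactly the part I'd like to understand better.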

Any suggestions are welcome.

Tom
