Re: couchdb for genome data

Simon Metson Thu, 04 Mar 2010 02:49:50 -0800

Hi,

Why not use a database per experiment? Do you need to process data across experiments? Can you store your raw data in individual databases and then pull summary data into a single database?
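A minimal sketch of the per-experiment layout, in Python, assuming a hypothetical naming scheme (arraycgh_exp_<experiment_id>). The helpers experiment_db_name and summarize_experiment are illustrative only, not part of any CouchDB client, and no HTTP calls are made. The regex follows CouchDB's database-name rule (lowercase first letter, then lowercase letters, digits and _ $ ( ) + - /). With about 180,000 probes per experiment, each such database would hold roughly 180,000 probe documents instead of the whole billion.

```python
import re
from statistics import mean

# CouchDB database names must start with a lowercase letter and may contain
# only lowercase letters, digits and the characters _ $ ( ) + - /
DB_NAME_RE = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def experiment_db_name(experiment_id: int, prefix: str = "arraycgh_exp_") -> str:
    """Hypothetical naming scheme: one CouchDB database per experiment."""
    name = f"{prefix}{experiment_id}"
    if not DB_NAME_RE.match(name):
        raise ValueError(f"invalid CouchDB database name: {name!r}")
    return name


def summarize_experiment(experiment_id: int, probe_docs: list[dict]) -> dict:
    """Build one small summary document for a shared summary database.

    Raises statistics.StatisticsError if probe_docs is empty.
    """
    raw = [d["raw_value"] for d in probe_docs]
    return {
        "_id": f"experiment:{experiment_id}",
        "experiment_id": experiment_id,
        "source_db": experiment_db_name(experiment_id),
        "probe_count": len(raw),
        "mean_raw_value": mean(raw),
    }
```

Only the small summary documents would then live in the shared database; cross-experiment questions go there, while the raw probe documents stay in their own experiment database.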

Cheers
Simon


On 3 Mar 2010, at 22:21, Tom Sante wrote:

Hi
The data is now stored in a MySQL table with about a billion (1000 million) rows. These rows hold the results of a genetic test (arrayCGH) and are structured like this:
Every experiment (a few thousand in total) contains measurements of about 180,000 genetic probes. This raw data will be analyzed and the values run through different algorithms, so every probe needs to store more than one value once the analysis is done. The results of the different analyses are currently stored as columns in that table, which makes it a pain to add an analysis that does not yet have a column. This is why a schema-free, document-based DB is probably a better fit.

The initial idea was to give each probe a separate document and, when the original value is transformed into another value, to store the result in the same document:
{
        "probe_id" : 1234567890,
        "experiment_id" : 1234567890,
        "raw_value" : 0.43524,
        "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
}
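A minimal Python sketch of how that document shape absorbs new analyses without a schema change: a new algorithm is just one more key under "analysis". The helper names and the "<experiment_id>:<probe_id>" _id format are hypothetical choices for illustration, not something the thread specifies.

```python
import copy


def probe_doc_id(experiment_id: int, probe_id: int) -> str:
    """Hypothetical deterministic _id: one document per (experiment, probe)."""
    return f"{experiment_id}:{probe_id}"


def make_probe_doc(experiment_id: int, probe_id: int, raw_value: float) -> dict:
    """Create the initial probe document, before any analysis has run."""
    return {
        "_id": probe_doc_id(experiment_id, probe_id),
        "probe_id": probe_id,
        "experiment_id": experiment_id,
        "raw_value": raw_value,
        "analysis": {},
    }


def add_analysis(doc: dict, name: str, value: float) -> dict:
    """Return an updated copy with one more analysis result.

    Adding an algorithm only adds a key under "analysis" (no new column).
    The copy keeps doc's "_rev" if present, which CouchDB requires when
    the updated document is saved back.
    """
    updated = copy.deepcopy(doc)
    updated.setdefault("analysis", {})[name] = value
    return updated
```

Using a deterministic _id means the same probe in the same experiment always maps to the same document, so a rerun of an algorithm overwrites its own key instead of creating duplicates.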
Once added to the database, almost all changes to the data will be contained within an experiment.
MongoDB has something like collections, which would be an appropriate abstraction for an experiment. But in CouchDB I would have to add all these probe documents to one big database without collections. So if I only make changes to probes within one experiment, this would still affect the views over all the other billion or so documents in the database. Because of the large number of documents, it would be good to know beforehand what the performance implications of this are.
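If everything does go into one database, a view keyed by [experiment_id, probe_id] still lets you read one experiment as a key range. A sketch in Python, assuming a hypothetical design document name (_design/probes) and view name (by_experiment); the map function is JavaScript because that is CouchDB's default view language. My understanding is that CouchDB updates a view index incrementally, processing only documents changed since the last update, but the index itself still spans the whole database. The range trick relies on CouchDB collation sorting objects after numbers, strings and arrays.

```python
import json
from urllib.parse import urlencode

# Design document (hypothetical names) with one view keyed by
# [experiment_id, probe_id]; it would be PUT to /<db>/_design/probes.
DESIGN_DOC = {
    "_id": "_design/probes",
    "views": {
        "by_experiment": {
            "map": (
                "function (doc) {"
                "  if (doc.experiment_id !== undefined && doc.probe_id !== undefined) {"
                "    emit([doc.experiment_id, doc.probe_id], doc.raw_value);"
                "  }"
                "}"
            )
        }
    },
}


def experiment_query(experiment_id: int) -> str:
    """Build the view path and query string for every row of one experiment.

    Objects collate after all other JSON types in CouchDB, so the range
    [id] .. [id, {}] covers every [id, probe_id] key.
    """
    compact = (",", ":")  # no spaces, so urlencode produces no '+' characters
    params = {
        "startkey": json.dumps([experiment_id], separators=compact),
        "endkey": json.dumps([experiment_id, {}], separators=compact),
    }
    return "_design/probes/_view/by_experiment?" + urlencode(params)
```

For example, experiment_query(42) yields "_design/probes/_view/by_experiment?startkey=%5B42%5D&endkey=%5B42%2C%7B%7D%5D", to be appended to the database URL.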
Any suggestions are welcome.

Tom
