couchdb for genome data

Tom Sante Wed, 03 Mar 2010 14:21:53 -0800

Hi

The data is now stored in a MySQL table with about a billion (1,000 million) rows.

These rows hold the data of a genetic test (arrayCGH) and are structured like this:

Every experiment (a few thousand of them in total) contains measurements of about 180,000 genetic probes. This raw data will be analyzed and the values run through different algorithms, so every probe needs to store more than one value after the analysis is done. The values of the different analyses are currently stored in separate columns of that table, which makes it a pain whenever we have to add an analysis that is not yet part of the existing columns. This is why a schema-free, document-based DB is probably a better fit.

The initial idea was to give each probe a separate document and, when the original value is transformed into another value, to store that in the same document:


{
        "probe_id" : 1234567890,
        "experiment_id" : 1234567890,
        "raw_value" : 0.43524,
        "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
}
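A minimal Python sketch of why this layout helps, assuming the probe document above is handled as a plain dict: the output of a new algorithm is just one more key under "analysis", so there is no column to add and no other document has to change. The algorithm name `my_new_algo` and its value are hypothetical.

```python
import copy
import json

# The probe document from the example above, as a Python dict.
probe_doc = {
    "probe_id": 1234567890,
    "experiment_id": 1234567890,
    "raw_value": 0.43524,
    "analysis": {"cbs": 0.436, "CBS+GLAD": 0.4356},
}


def add_analysis(doc, name, value):
    """Return a copy of `doc` with one more analysis result.

    A new algorithm only adds a key under "analysis"; documents that
    never receive that key are left exactly as they were.
    """
    updated = copy.deepcopy(doc)
    updated.setdefault("analysis", {})[name] = value
    return updated


# Hypothetical algorithm name and value, for illustration only.
new_doc = add_analysis(probe_doc, "my_new_algo", 0.44)
print(json.dumps(new_doc["analysis"], sort_keys=True))
```

The copy keeps the original dict untouched; in CouchDB the updated document would be saved as a new revision of the same `_id`.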

Once added to the database, almost all changes to the data will be contained within a single experiment.
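Since almost all writes touch one experiment at a time, they could be batched through CouchDB's `_bulk_docs` endpoint (POST /{db}/_bulk_docs with a body of the form {"docs": [...]}). The Python sketch below only builds the request bodies; the `_id` scheme, the batch size of 10,000 and the experiment number 42 are hypothetical choices, and sending the requests is left out. When updating existing documents, each one must carry its current `_rev`.

```python
import json
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 10_000  # hypothetical batch size, not a CouchDB limit


def doc_id(experiment_id: int, probe_id: int) -> str:
    """Hypothetical _id scheme: experiment first, zero-padded, so that
    the IDs of one experiment share a common prefix."""
    return f"exp{experiment_id:010d}:probe{probe_id:010d}"


def bulk_payloads(docs: Iterable[Dict], batch_size: int = BATCH_SIZE) -> Iterator[str]:
    """Yield JSON request bodies for POST /{db}/_bulk_docs.

    Each body has the form {"docs": [...]} with at most batch_size docs.
    For updates, every doc must include its current "_rev", otherwise
    CouchDB reports a conflict for that doc.
    """
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:  # flush the final, partially filled batch
        yield json.dumps({"docs": batch})


if __name__ == "__main__":
    experiment_id = 42  # hypothetical experiment
    docs = (
        {
            "_id": doc_id(experiment_id, probe_id),
            "probe_id": probe_id,
            "experiment_id": experiment_id,
            "raw_value": 0.0,  # placeholder value
            "analysis": {},
        }
        for probe_id in range(180_000)
    )
    bodies = list(bulk_payloads(docs))
    print(len(bodies))  # 180000 / 10000 = 18 request bodies
```

Using a generator keeps only one batch of an experiment's 180,000 probe documents in memory at a time.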

MongoDB has collections, which would be an appropriate abstraction for an experiment. In CouchDB, however, I would have to put all these probe documents into one big database without collections. So if I only make changes to probes within one experiment, this would affect the views over all the other billion documents in the database. Because of the large number of documents, it would be good to know beforehand what the performance implications of this are.
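As far as I understand CouchDB's view engine, indexes are updated incrementally: when a view is queried, only documents written since the index's last update sequence go through the map function, so re-analysing one experiment should mean re-mapping that experiment's documents rather than the whole database. The Python sketch below is a toy model of that idea, not CouchDB code; `ToyDB`, `ToyView` and the key layout [experiment_id, probe_id] are hypothetical. Keying the view that way also makes one experiment a contiguous slice of the index, which CouchDB would query with startkey=[E]&endkey=[E,{}].

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (experiment_id, probe_id), like emit([exp, probe], ...)


class ToyDB:
    """Stand-in for a CouchDB database: every write bumps an update sequence."""

    def __init__(self) -> None:
        self.seq = 0
        self.latest: Dict[str, Tuple[int, Dict]] = {}  # _id -> (seq of last write, doc)

    def put(self, doc_id: str, doc: Dict) -> None:
        self.seq += 1
        self.latest[doc_id] = (self.seq, doc)

    def changes_since(self, since: int) -> List[Tuple[int, Dict]]:
        # CouchDB keeps a by-sequence index for this; the toy just scans.
        return sorted(
            ((s, d) for s, d in self.latest.values() if s > since),
            key=lambda pair: pair[0],
        )


class ToyView:
    """Incrementally maintained view; assumes a doc's key never changes."""

    def __init__(self) -> None:
        self.rows: List[Tuple[Key, float]] = []  # kept sorted by key
        self.indexed_seq = 0
        self.last_mapped = 0  # docs mapped by the most recent refresh

    @staticmethod
    def map_doc(doc: Dict) -> Tuple[Key, float]:
        # Like a JS map function: emit([doc.experiment_id, doc.probe_id], doc.raw_value)
        return (doc["experiment_id"], doc["probe_id"]), doc["raw_value"]

    def refresh(self, db: ToyDB) -> None:
        changed = db.changes_since(self.indexed_seq)
        for _, doc in changed:
            key, value = self.map_doc(doc)
            i = bisect.bisect_left(self.rows, (key,))
            if i < len(self.rows) and self.rows[i][0] == key:
                self.rows[i] = (key, value)  # replace the row of the old revision
            else:
                self.rows.insert(i, (key, value))
        self.last_mapped = len(changed)
        self.indexed_seq = db.seq

    def experiment(self, experiment_id: int) -> List[Tuple[Key, float]]:
        # Like ?startkey=[E]&endkey=[E,{}]: one contiguous slice of the index.
        lo = bisect.bisect_left(self.rows, ((experiment_id,),))
        hi = bisect.bisect_left(self.rows, ((experiment_id + 1,),))
        return self.rows[lo:hi]


if __name__ == "__main__":
    db, view = ToyDB(), ToyView()
    for exp in range(3):  # 3 experiments x 1,000 probes, a tiny stand-in
        for probe in range(1000):
            db.put(f"{exp}:{probe}", {"experiment_id": exp, "probe_id": probe, "raw_value": 0.0})
    view.refresh(db)
    print(view.last_mapped)  # 3000: the first build maps every document

    for probe in range(1000):  # re-analyse experiment 1 only
        db.put(f"1:{probe}", {"experiment_id": 1, "probe_id": probe, "raw_value": 1.0})
    view.refresh(db)
    print(view.last_mapped)  # 1000: only the changed documents are mapped again
    print(len(view.experiment(1)))  # 1000
```

In this model the refresh cost follows the number of changed documents; a real B-tree index would add a cost per update that grows roughly logarithmically with the index size.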


Any suggestions are welcome.

Tom
