Hah! Also, even better tool. I misread "disk" as "couch file" the first time. And by better I mean even awesomer hack. In this case, with the encrypted LVM, I'd be morbidly curious what it turns up.
On Jun 6, 2011, at 10:06 PM, Jason Smith <[email protected]> wrote:
> Thanks! I've used good tools. I would classify grep_couch as a "bad
> tool" but maybe "best of class" :)
>
> Compaction creates a new file on the fs and unlinks the old one. If
> you compact regularly, you'll tend to have *lots* of ejson fragments
> lying around in the un-(re)-allocated free parts of the fs.
>
> That reminds me, the tool does not account for filesystem
> fragmentation or anything that would make the doc not physically
> contiguous on disk. Fortunately most docs are small (even in ejson!)
> and they survive, especially if you've compacted a few times and you
> have duplicate data on-disk but with dissimilar fragment locations.
>
> On Tue, Jun 7, 2011 at 8:38 AM, Paul J. Davis
> <[email protected]> wrote:
>> Jason,
>>
>> Good tool, but unless I'm mistaken, the issue here is that the data just
>> doesn't exist on disk. I think we're fairly sure that this isn't the 1.0.0
>> bug but something else. I'm leaning towards something config specific but
>> all of our theories appear to be incorrect given the reported observations.
>>
>> On Jun 6, 2011, at 9:29 PM, Jason Smith <[email protected]> wrote:
>>
>>> I once made a very simple CouchDB undelete tool. It scans your disk
>>> device for anything that looks like the on-disk CouchDB JSON format.
>>>
>>> https://github.com/jhs/grep_couch
>>>
>>> I've recovered data with it, but notably, the _id and _rev are *not*
>>> stored with the rest of a document, so you tend to get lots of docs
>>> with no _id field. (I'm considering always having an "id" field to
>>> dupe the "_id" in case I ever have to do that again.)
>>>
>>> On Mon, Jun 6, 2011 at 3:35 PM, René Brüntrup <[email protected]> wrote:
>>>> Hello!
>>>>
>>>>> 1) Are you certain that you were in fact writing to the database on this
>>>>> server and not the replica? Can you share some access logs towards that
>>>>> end?
>>>>
>>>> We could not find the missing data in any of the replicated files. Each
>>>> backup has an increasing number of documents with the least amount of
>>>> missing data in the newest backup. Access logs from 2011-03-08 are
>>>> unfortunately missing, because the log rotation already removed them.
>>>>
>>>>> 2) Is it possible that you've inadvertently restored the database file
>>>>> from a backup?
>>>>
>>>> No backup was created at this date and we do not have any mechanisms
>>>> that could automatically restore the backups.
>>>>
>>>>> 3) Is it possible that you were writing "underneath" the encrypted LVM
>>>>> volume for the past two months?
>>>>
>>>> Our system does not work without an initialized database that contains a
>>>> number of user account and definition documents. But even if such an
>>>> initialized database had been available underneath the encrypted
>>>> volume we would have noticed a data loss after changing the database,
>>>> because the system was already in use before 2011-03-08.
>>>>
>>>> We will check again whether there is a database file underneath the
>>>> encrypted volume, but we cannot stop the system right now. When
>>>> replicating the database we just noticed that the timestamp of the
>>>> source database was updated. The replication processes that were
>>>> started after 2011-03-08 and before the reboot of the server did not
>>>> change the timestamp of the source database.
>>>>
>>>> Regards,
>>>> René
>>>>
>>>
>>> --
>>> Iris Couch
>>
>
> --
> Iris Couch
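
For anyone curious what the raw-device scan described in the thread might look like, here is a minimal Python sketch. It is not grep_couch's actual code: the function name `find_markers`, the `marker` byte string, and the chunk size are all hypothetical, chosen only to illustrate the idea. It reads a stream in fixed-size chunks and reports the byte offset of every occurrence of a marker, keeping a small overlap so a marker split across two chunks is still found. As noted in the thread, a scan like this only sees physically contiguous bytes, so it would miss a document that the filesystem has fragmented.

```python
import io


def find_markers(stream, marker, chunk_size=1 << 20):
    """Yield the absolute byte offset of each occurrence of `marker`.

    `stream` is any binary file-like object (a disk image or block
    device opened with mode "rb", or io.BytesIO for testing).
    `marker` is a hypothetical byte signature; the real on-disk
    signature of a CouchDB document is NOT assumed here.
    """
    if not marker:
        raise ValueError("marker must be non-empty")
    overlap = len(marker) - 1  # bytes carried over between chunks
    tail = b""                 # end of the previous buffer
    base = 0                   # absolute offset of tail[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(marker)
        while pos != -1:
            yield base + pos
            pos = buf.find(marker, pos + 1)
        # Keep the last `overlap` bytes so a marker that straddles the
        # chunk boundary is matched on the next pass. The tail is
        # shorter than the marker, so no match is reported twice.
        keep = buf[-overlap:] if overlap else b""
        base += len(buf) - len(keep)
        tail = keep


if __name__ == "__main__":
    # Tiny in-memory demo with a made-up marker and a small chunk size
    # so that the first match straddles a chunk boundary.
    data = io.BytesIO(b"xxABCxxxABC")
    print(list(find_markers(data, b"ABC", chunk_size=4)))  # [2, 8]
```

Each reported offset would only be a candidate: a real recovery tool would still have to try decoding the bytes at that position and discard anything that does not parse.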
