No. If you delete a document properly (using DELETE, not just setting a _deleted property) you won't have this problem. The old revision with the data will be gone after compaction, leaving only an empty "tombstone".
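To make the "proper" delete concrete, here is a minimal sketch of the two HTTP calls involved: `DELETE /{db}/{docid}?rev={rev}` followed by `POST /{db}/_compact`. The server address, database name, and the helper names (`delete_request`, `compact_request`, `send`) are assumptions for illustration only, and compaction normally requires admin credentials, which are left out here.

```python
import json
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address, adjust as needed


def delete_request(db, doc_id, rev):
    """Build DELETE /{db}/{doc_id}?rev={rev}, which leaves only a tombstone."""
    url = "{}/{}/{}?{}".format(
        COUCH,
        urllib.parse.quote(db, safe=""),
        urllib.parse.quote(doc_id, safe=""),
        urllib.parse.urlencode({"rev": rev}),
    )
    return urllib.request.Request(url, method="DELETE")


def compact_request(db):
    """Build POST /{db}/_compact, which discards old revision bodies."""
    url = "{}/{}/_compact".format(COUCH, urllib.parse.quote(db, safe=""))
    return urllib.request.Request(
        url,
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example against a running server (not executed here):
#   send(delete_request("staging", "some-doc-id", "1-abc"))
#   send(compact_request("staging"))
```

Compaction runs in the background, so the old revision's data is only gone from disk once it has finished.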
--Jens [via iPhone]

On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" wrote:

> I understand if this is necessary for eventual consistency, but shouldn't
> this be better-documented? I generally expected that if I delete sensitive
> or unwanted data, or that a user requests that their personal or private
> data be deleted, it'll be deleted in a way that's more solid than basically
> hiding it. Sure, CouchDB won't let you get at that document, but it's
> certainly still there on the disk, and presumably detectable if you
> inspected the data structure that holds individual documents. Not a very
> good situation vis a vis security. I know that normal unix "deletion"
> leaves files technically on disk, but there are ways to allow for that and
> prevent it from being an issue.
>
> Even setting data security aside, I've been using CouchDB as a kind of
> staging environment for large amounts of data which should ultimately be
> elsewhere (different flavours relational databases, databases belonging to
> different organisations, etc.) because it's really easy to implement as an
> interface and let people just throw whatever they want into it with a POST.
> It's really the perfect tool for that, but pretty soon there'll be tens of
> gigabytes a day of data flowing through the system, and most of it just
> needs to be indexed for a while before our scheduled scripts pull it all
> out, shove it elsewhere and delete it. In this use case, if I'm
> understanding this correctly, we'll get crazy storage blowouts unless we
> implement a bunch of hacks to switch to new databases after performing
> deletions (as well as scripts that make our HTTP reverse proxy
> transparently and intelligently route data to the new database - absolutely
> not a trivial task in any complex system with many moving parts).
>
> But you know, this all comes with the territory. If the devs say there's a
> good reason for documents to stick around after deletion, I believe them,
> but I think that's a pretty huge point and I don't know how I've missed it.
>
> What's the way to delete a document if I actually want to really delete the
> data? Changing it to a blank document before deleting, and then compacting?
>
> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke wrote:
>
>>
>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
>>
>>> 1) How exactly could you make this switch without interrupting service?
>>
>> Replicate database to new db, then atomically switch your proxy or
>> whatever to the new db from the old one.
>> Depending on how long the replication takes, there’s a race condition here
>> where changes made to the old db during the replication won’t be propagated
>> to the new one; you could either repeat the process incrementally until
>> this doesn’t happen, or else put the db into read-only mode while you’re
>> doing the copy.
>>
>> This might also be helpful: http://tinyurl.com/89lr3fl
>>
>>> 2) Wouldn't this procedure create the exact same eventual consistency
>>> problems that deleting documents in a db would?
>>
>> No; what’s necessary is the revision tree, and the replication will
>> preserve that. You’re just losing the contents of the deleted revisions
>> that accidentally got left behind because of the weird way the documents
>> were deleted.
>>
>> --Jens
>>
>>
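
The replicate-then-switch procedure quoted above ("repeat the process incrementally until this doesn't happen") could be sketched roughly as follows. This is only an illustration under assumptions: the server address, the database names, and the helpers `replicate_request` and `replicate_until_quiet` are made up for the example, and the loop relies on the `_replicate` reply carrying `"no_changes": true` when there was nothing left to copy. Depending on the CouchDB version, `source` and `target` may need to be full URLs rather than bare database names.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address


def replicate_request(source, target):
    """Build POST /_replicate copying `source` into `target`."""
    body = json.dumps(
        {"source": source, "target": target, "create_target": True}
    ).encode("utf-8")
    return urllib.request.Request(
        COUCH + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate_until_quiet(send, source, target, max_rounds=10):
    """Re-run replication until a round reports nothing left to copy.

    `send` is any callable that takes a Request and returns the decoded
    JSON reply. Returns the number of rounds used, or None if the old
    database was still receiving writes after `max_rounds` rounds.
    """
    for round_number in range(1, max_rounds + 1):
        reply = send(replicate_request(source, target))
        if reply.get("no_changes"):
            return round_number
    return None
```

A quiet round only narrows the race window; the proxy switch still has to happen immediately afterwards, or the old database has to be made read-only first, as the quoted message says.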
