So, can I reuse the _id of a deleted document? My _id is part of the data and has meaning. If I delete the old document, am I not allowed to have that same meaning again by reclaiming the _id? For example, _id="block_1_house_1": a hurricane hits, so we delete it. Then we rebuild the house (maybe), so I need _id="block_1_house_1" again.
George

On Sun, Dec 25, 2011 at 5:20 AM, Robert Newson <[email protected]> wrote:

> Mark,
>
> Using the DELETE method simply updates the document to
>
> {"_id":"foo","_rev":"newrev","_deleted":true}
>
> If you did the same via PUT or POST, you'd get exactly the same effect
> as DELETE.
>
> Daniel,
>
> You have a valid point, that this should be better documented. It is
> unknown how many phantom documents are out there, those that were
> deleted by adding _deleted:true on the assumption that this cleans out
> the document. In fact, when I first noticed this effect I created a
> JIRA ticket and applied a patch to fix it, before Damien pointed out
> that this behavior is intentional (indeed, necessary).
>
> To answer your final question, CouchDB preserves what you ask it to,
> it does not alter the contents of documents itself. So, if you save
> {"_id":"foo","_rev":"newrev","_deleted":true,
> "password to my bank account":"foobar"}, it will do so. Use either the
> DELETE HTTP method or POST/PUT only the document you wish to be stored
> (minimum is, as noted above, _id, _rev and _deleted).
>
> B.
>
>
> On 25 December 2011 00:40, Jens Alfke <[email protected]> wrote:
> > No. If you delete a document properly (using DELETE, not just setting a
> > _deleted property) you won't have this problem. The old revision with the
> > data will be gone after compaction, leaving only an empty "tombstone".
> >
> > --Jens [via iPhone]
> >
> > On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" <[email protected]> wrote:
> >
> >> I understand if this is necessary for eventual consistency, but shouldn't
> >> this be better-documented? I generally expected that if I delete sensitive
> >> or unwanted data, or that a user requests that their personal or private
> >> data be deleted, it'll be deleted in a way that's more solid than basically
> >> hiding it.
> >> Sure, CouchDB won't let you get at that document, but it's
> >> certainly still there on the disk, and presumably detectable if you
> >> inspected the data structure that holds individual documents. Not a very
> >> good situation vis a vis security. I know that normal unix "deletion"
> >> leaves files technically on disk, but there are ways to allow for that and
> >> prevent it from being an issue.
> >>
> >> Even setting data security aside, I've been using CouchDB as a kind of
> >> staging environment for large amounts of data which should ultimately be
> >> elsewhere (different flavours relational databases, databases belonging to
> >> different organisations, etc.) because it's really easy to implement as an
> >> interface and let people just throw whatever they want into it with a POST.
> >> It's really the perfect tool for that, but pretty soon there'll be tens of
> >> gigabytes a day of data flowing through the system, and most of it just
> >> needs to be indexed for a while before our scheduled scripts pull it all
> >> out, shove it elsewhere and delete it. In this use case, if I'm
> >> understanding this correctly, we'll get crazy storage blowouts unless we
> >> implement a bunch of hacks to switch to new databases after performing
> >> deletions (as well as scripts that make our HTTP reverse proxy
> >> transparently and intelligently route data to the new database - absolutely
> >> not a trivial task in any complex system with many moving parts).
> >>
> >> But you know, this all comes with the territory. If the devs say there's a
> >> good reason for documents to stick around after deletion, I believe them,
> >> but I think that's a pretty huge point and I don't know how I've missed it.
> >>
> >> What's the way to delete a document if I actually want to really delete the
> >> data? Changing it to a blank document before deleting, and then compacting?
> >>
> >> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <[email protected]> wrote:
> >>
> >>>
> >>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
> >>>
> >>>> 1) How exactly could you make this switch without interrupting service?
> >>>
> >>> Replicate database to new db, then atomically switch your proxy or
> >>> whatever to the new db from the old one.
> >>> Depending on how long the replication takes, there’s a race condition here
> >>> where changes made to the old db during the replication won’t be propagated
> >>> to the new one; you could either repeat the process incrementally until
> >>> this doesn’t happen, or else put the db into read-only mode while you’re
> >>> doing the copy.
> >>>
> >>> This might also be helpful: http://tinyurl.com/89lr3fl
> >>>
> >>>> 2) Wouldn't this procedure create the exact same eventual consistency
> >>>> problems that deleting documents in a db would?
> >>>
> >>> No; what’s necessary is the revision tree, and the replication will
> >>> preserve that. You’re just losing the contents of the deleted revisions
> >>> that accidentally got left behind because of the weird way the documents
> >>> were deleted.
> >>>
> >>> --Jens
> >>>
> >>>
>

--
George Burt
President
TrueShot Enterprises, LLC.
(386) 208-1309
Fax (213) 477-2195
www.TrueShot.com
12756 92nd Ter
Live Oak, FL 32060
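
For reference, Robert's point about the minimal deleted document can be sketched in Python. This is only an illustration under stated assumptions: `make_tombstone`, the `owner` field and the revision string are hypothetical, and nothing here talks to a CouchDB server. It just builds the body you would PUT or POST instead of the full document with `_deleted` added.

```python
import json


def make_tombstone(doc):
    """Build the minimal deletion body for a document (hypothetical helper).

    Only _id and _rev are copied over; every other field is dropped, so no
    leftover data is saved along with the deletion marker.
    """
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


# Sample document; the revision string and the "owner" field are made up.
doc = {
    "_id": "block_1_house_1",
    "_rev": "1-abc",
    "owner": "someone",
}

tombstone = make_tombstone(doc)
print(json.dumps(tombstone))
```

Sending a body like this, rather than the original document with `_deleted` set, is what keeps the old fields out of the stored deletion record, per Robert's explanation above.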
