"single atomic '_bulk_docs' operation"

FYI: _bulk_docs is not atomic.
B.

On Mon, Jan 10, 2011 at 5:52 PM, Mike Leddy <[email protected]> wrote:
> Thanks for the explanation. The reason I was using 'all_or_nothing' is
> because earlier versions of the script tried to do all the work in a
> single '_bulk_docs' call. Now I am starting to realize why that did
> not work for me.....
>
> I now understand why the deleted revision must be kept for replication.
> I guess what I was trying to do was wrong. I cannot simply delete all
> the revisions and insert potentially the same document again.
>
> I will go back to the original idea of doing everything in a single
> atomic '_bulk_docs' operation but I will have to handle the special
> case of the document not changing by simply leaving it alone and
> just deleting its conflicts.
>
> Thanks,
>
> Mike
>
> On Mon, 2011-01-10 at 11:47 -0500, Paul Davis wrote:
>> On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy <[email protected]> wrote:
>> > Hello,
>> >
>> > I have a situation where the same document can come from several sources
>> > and I have written a script (in ruby) which effectively merges the
>> > information in all conflicting documents (deletes the originals) and
>> > inserts the new merged document.
>> >
>> > Everything seemed fine until I observed that sometimes the newly inserted
>> > document remained deleted.....
>> >
>> > On further investigation I discovered that I was (by design) merging the
>> > documents in a deterministic way and it was possible that I was merging
>> > documents A + B + C giving A, i.e. document A already has all the
>> > information contained in documents A & B & C.
>> >
>> > Since I was deleting A and then subsequently inserting essentially the same
>> > document it remained deleted even though the bulk_docs API was indicating a
>> > successful insertion.
>> >
>> > I am using a recent 1.0.x branch.
>> > Here is the essence of what is happening
>> > using the same API calls:
>> >
>> > # create a database
>> > curl -X PUT 127.0.0.1:5984/bulk_docs
>> > {"ok":true}
>> >
>> > # insert a doc 'mike'
>> > curl -X POST -H 'Content-type: application/json' \
>> > 'localhost:5984/bulk_docs/_bulk_docs' -d \
>> > '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
>> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>> >
>> > # insert another doc 'john' with same id
>> > curl -X POST -H 'Content-type: application/json' \
>> > 'localhost:5984/bulk_docs/_bulk_docs' -d \
>> > '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
>> > [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>> >
>> > # 'john' is the winning conflict
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
>> >
>> > # delete 'john'
>> > curl -X DELETE \
>> > 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
>> > {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>> >
>> > # delete 'mike'
>> > curl -X DELETE \
>> > 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
>> > {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
>> >
>> > # none left
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"error":"not_found","reason":"deleted"}
>> >
>> > # insert 'mike' again
>> > curl -X POST -H 'Content-type: application/json' \
>> > 'localhost:5984/bulk_docs/_bulk_docs' -d \
>> > '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
>> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>> >
>> > # ouch !!!!!
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"error":"not_found","reason":"deleted"}
>> >
>> > Since I have the conflict resolution script working on all nodes I want
>> > the result to be deterministic so as to be sure that all nodes calculate
>> > the same result and produce revisions that are the same.......
>> > always converging on exactly the same result.
>> >
>> > Any insights ?
>> >
>> > Regards,
>> >
>> > Mike
>> >
>> >
>>
>> This has to do with how docs in deleted states can be revived which
>> can lead to unexpected behavior like this.
>>
>> A somewhat simpler curl session:
>>
>> $ curl -X PUT http://127.0.0.1:5984/test
>> {"ok":true}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>> http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true,
>> "docs": [{"_id": "same", "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>>
>> $ curl -X DELETE \
>> http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7
>> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>> http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true,
>> "docs": [{"_id": "same", "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>>
>> $ curl http://127.0.0.1:5984/test/same
>> {"error":"not_found","reason":"deleted"}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>> http://127.0.0.1:5984/test/_bulk_docs -d '{"docs": [{"_id": "same",
>> "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
>>
>> $ curl http://127.0.0.1:5984/test/same
>> {"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}
>>
>>
>> What's happening here is that you're playing with the revision tree
>> weirdly. The progression from your session looks something like this:
>>
>> [note: I'm using 0 (zero) to indicate the null state of document not
>> existing].
>>
>> 0
>> 0 -> A # put mike
>> 0 -> (A | B) # put john, introducing conflict
>> 0 -> (A | B -> C:deleted) # delete john, no more conflict
>> 0 -> (A -> D:deleted | B -> C:deleted) # delete mike, doc is deleted
>> 0 -> (A -> D:deleted | B -> C:deleted) # tried to reput original mike,
>> but it's still deleted.
>>
>> What happens in the last step is that since your 're-put of mike'
>> ended up creating a revision identical to A (because of
>> all_or_nothing: true) the revision trees still have the D:deleted
>> revision which means that the document is still deleted which gives
>> you the behavior you're seeing.
>>
>> If when you re-put the mike version you don't use all_or_nothing:
>> true, then you end up creating a revision tree like this:
>>
>> 0 -> (A -> D:deleted -> E | B -> C:deleted)
>>
>> Which recreates the doc with the new revision.
>>
>> On a side note, the reason that we need to keep deleted revisions is
>> because that's how we determine if a conflict has been resolved during
>> replication. If those revisions disappeared, you'd have to re-resolve
>> conflicts after every replication.
>>
>
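For anyone following along, the revision-tree progression Paul describes can be replayed with a toy in-memory model. This is only a sketch under stated assumptions: the `RevTree` class, its method names, and the MD5-over-`inspect` revision ids are hypothetical stand-ins, not CouchDB's code or its real hashing, and the winner rule is simplified. It is meant to show why re-inserting an identical root revision leaves the document deleted, while a plain insert extends a deleted leaf.

```ruby
require 'digest'

# Toy model of a single document's revision tree (hypothetical, not CouchDB code).
class RevTree
  Rev = Struct.new(:id, :pos, :parent, :deleted, :body)

  def initialize
    @revs = {} # rev id => Rev
  end

  # Deterministic rev id: same parent and same content give the same id.
  # Assumption: MD5 over an inspect string stands in for CouchDB's real hash.
  def rev_id(pos, parent, deleted, body)
    "#{pos}-#{Digest::MD5.hexdigest([parent, deleted, body.sort].inspect)}"
  end

  # Add a revision under `parent` (nil means a new root). A revision that is
  # already in the tree is left alone, so the tree does not change.
  def add(parent, body, deleted: false)
    pos = parent ? @revs.fetch(parent).pos + 1 : 1
    id = rev_id(pos, parent, deleted, body)
    @revs[id] ||= Rev.new(id, pos, parent, deleted, body)
    id
  end

  # Leaves are revisions that no other revision names as its parent.
  def leaves
    parents = @revs.values.map(&:parent).compact
    @revs.values.reject { |r| parents.include?(r.id) }
  end

  # Simplified winner rule: live leaves first, then higher pos, then higher id.
  def winner
    leaves.max_by { |r| [r.deleted ? 0 : 1, r.pos, r.id] }
  end

  # Body of the winning revision, or nil when the document is absent or deleted.
  def get
    w = winner
    w && !w.deleted ? w.body : nil
  end

  def delete(rev)
    add(rev, {}, deleted: true)
  end

  # Models an all_or_nothing insert without _rev: always a root revision.
  def put_all_or_nothing(body)
    add(nil, body)
  end

  # Models a plain insert without _rev: extends the deleted winner if there is one.
  def put(body)
    w = winner
    raise 'conflict' if w && !w.deleted
    add(w && w.id, body)
  end
end

tree = RevTree.new
a = tree.put_all_or_nothing('name' => 'mike') # 0 -> A
b = tree.put_all_or_nothing('name' => 'john') # 0 -> (A | B)
tree.delete(b)                                # B -> C:deleted
tree.delete(a)                                # A -> D:deleted
again = tree.put_all_or_nothing('name' => 'mike')
puts again == a      # the "new" revision is just A again
p tree.get           # still deleted
tree.put('name' => 'mike')                    # deleted leaf -> E
p tree.get           # the doc is visible again
```

In this model the second `put_all_or_nothing` changes nothing because A is already in the tree underneath D:deleted, which matches the behavior in the curl sessions above.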
