On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy wrote:
> Hello,
>
> I have a situation where the same document can come from several sources
> and I have written a script (in ruby) which effectively merges the
> information in all conflicting documents (deletes the originals) and
> inserts the new merged document.
>
> Everything seemed fine until I observed that sometimes the newly inserted
> document remained deleted.....
>
> On further investigation I discovered that I was (by design) merging the
> documents in a deterministic way and it was possible that if I was merging
> documents A + B + C giving A ie: document A already has all the
> information contained in documents A & B & C.
>
> Since i was deleting A and then subsequently inserting essentially the same
> document it remained deleted even though the bulk_docs API was indicating a
> successful insertion.
>
> I am using a recent 1.0.x branch. Here is the essence of what is happening
> using the same API calls:
>
> # create a database
> curl -X PUT 127.0.0.1:5984/bulk_docs
> {"ok":true}
>
> # insert a doc 'mike'
> curl -X POST -H 'Content-type: application/json'
> 'localhost:5984/bulk_docs/_bulk_docs' -d
> '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # insert another doc 'john' with same id
> curl -X POST -H 'Content-type: application/json'
> 'localhost:5984/bulk_docs/_bulk_docs' -d
> '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
> [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>
> # 'john' is the winning conflict
> curl 'localhost:5984/bulk_docs/same'
> {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
>
> # delete 'john'
> curl -X DELETE
> 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>
> # delete 'mike'
> curl -X DELETE
> 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
> {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
>
> # none left
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> # insert 'mike' again
> curl -X POST -H 'Content-type: application/json'
> 'localhost:5984/bulk_docs/_bulk_docs' -d
> '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # ouch !!!!!
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> Since I have the conflict resolution script working on all nodes I want
> the result to be deterministic so as to be sure that all nodes calculate
> the same result and produce revisions that are the same....... always
> converging on exactly the same result.
>
> Any insights ?
>
> Regards,
>
> Mike
This has to do with how documents in a deleted state can be revived, which can lead to unexpected behavior like this. A somewhat simpler curl session:

$ curl -X PUT http://127.0.0.1:5984/test
{"ok":true}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
$ curl -X DELETE http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7
{"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
$ curl http://127.0.0.1:5984/test/same
{"error":"not_found","reason":"deleted"}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
$ curl http://127.0.0.1:5984/test/same
{"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}

What's happening here is that you're using the revision tree in an unusual way. The progression from your session looks something like this (I'm using 0 (zero) to indicate the null state of the document not existing):

0
0 -> A                                    # put mike
0 -> (A | B)                              # put john, introducing a conflict
0 -> (A | B -> C:deleted)                 # delete john, no more conflict
0 -> (A -> D:deleted | B -> C:deleted)    # delete mike, doc is deleted
0 -> (A -> D:deleted | B -> C:deleted)    # re-put the original mike, but it's still deleted
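To make the progression above concrete, here is a toy Python model of a single document's revision tree. It is a sketch, not CouchDB's code. Two things are assumptions: the hypothetical make_rev_id() hashes a JSON encoding, not what CouchDB actually hashes, so its IDs won't match the server's; and an all_or_nothing write without a _rev is modeled as adding a new root revision (position 1, no parent), which is what the session shows. The winner rule (live leaves beat deleted ones, then the higher position, then the higher hash string) follows CouchDB's documented behavior. The point is that revision IDs are deterministic, so re-putting mike reproduces A exactly and changes nothing:

```python
import hashlib
import json


class RevTree:
    """Toy model of one document's revision tree (illustrative only)."""

    def __init__(self):
        # rev_id -> {"parent": rev_id or None, "deleted": bool, "body": dict}
        self.revs = {}

    @staticmethod
    def make_rev_id(parent, body, deleted):
        # Hypothetical hashing scheme: deterministic in (parent, body, deleted).
        # Real CouchDB hashes different bytes, so these IDs differ from the server's.
        pos = 1 if parent is None else int(parent.split("-", 1)[0]) + 1
        payload = json.dumps([parent, body, deleted], sort_keys=True)
        return f"{pos}-{hashlib.md5(payload.encode()).hexdigest()}"

    def add(self, parent, body, deleted=False):
        rev = self.make_rev_id(parent, body, deleted)
        # Adding a revision that is already in the tree is a no-op.
        self.revs.setdefault(rev, {"parent": parent, "deleted": deleted, "body": body})
        return rev

    def leaves(self):
        parents = {r["parent"] for r in self.revs.values()}
        return [rev for rev in self.revs if rev not in parents]

    def winner(self):
        # Live leaves beat deleted ones, then higher position, then higher hash.
        def key(rev):
            pos, _, digest = rev.partition("-")
            return (not self.revs[rev]["deleted"], int(pos), digest)
        return max(self.leaves(), key=key)

    def get(self):
        # Returns the winning body, or None if the winner is a tombstone.
        rev = self.winner()
        return None if self.revs[rev]["deleted"] else self.revs[rev]["body"]


tree = RevTree()
a = tree.add(None, {"name": "mike"})      # put mike (all_or_nothing)
b = tree.add(None, {"name": "john"})      # put john, conflict
c = tree.add(b, {}, deleted=True)         # delete john
d = tree.add(a, {}, deleted=True)         # delete mike
again = tree.add(None, {"name": "mike"})  # re-put mike (all_or_nothing)
print(again == a, tree.get())             # True None
```

Because the re-put produces the same ID as A, the leaves are still the two tombstones C and D, and the document stays deleted.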
What happens in the last step is that, with all_or_nothing: true, your re-put of mike is stored as a new first revision with no parent, and because it has the same content as A it gets exactly A's revision ID (1-d6246810df84e21f7611601d0cceccbf in both of your responses). Revision A is already in the tree, so nothing changes: the D:deleted leaf is still there, which means the document is still deleted, which gives you the behavior you're seeing.

If you re-put the mike version without all_or_nothing: true, you instead end up with a revision tree like this:

0 -> (A -> D:deleted -> E | B -> C:deleted)

which recreates the document with the new revision E.

On a side note, the reason we need to keep deleted revisions is that they are how we determine whether a conflict has been resolved during replication. If those revisions disappeared, you'd have to re-resolve conflicts after every replication.
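The re-put without all_or_nothing can be modeled the same way. This standalone sketch reuses the revision IDs from Mike's session; "3-new" is a made-up placeholder for the ID the server would generate, and put_without_rev() is a hypothetical helper, not a CouchDB API. It assumes, as the 3-... revision in the simpler session suggests, that a write with no _rev against a deleted document is appended to the winning tombstone. With both leaves deleted at position 2, the tie is broken by the higher hash string, so D wins and E is attached under it, which matches the tree 0 -> (A -> D:deleted -> E | B -> C:deleted):

```python
# Revision IDs copied from Mike's session.
A = "1-d6246810df84e21f7611601d0cceccbf"  # mike
B = "1-ec562a018012e70bbf8da7f6f58970d7"  # john
C = "2-1dae8400f3e20ab34b845e855ba6dc85"  # john, deleted
D = "2-db780681eced993484c7f171ab7f599c"  # mike, deleted

# rev_id -> (parent rev_id or None, deleted?)
tree = {A: (None, False), B: (None, False), C: (B, True), D: (A, True)}


def leaves(tree):
    parents = {parent for parent, _ in tree.values()}
    return [rev for rev in tree if rev not in parents]


def winner(tree):
    # Live leaves beat deleted ones, then higher position, then higher hash string.
    def key(rev):
        pos, _, digest = rev.partition("-")
        return (not tree[rev][1], int(pos), digest)
    return max(leaves(tree), key=key)


def put_without_rev(tree, new_digest):
    """Hypothetical model of a plain write with no _rev.

    Assumption: if the current winner is a tombstone, the new revision
    extends that tombstone instead of starting a new branch.
    """
    parent = winner(tree)
    if not tree[parent][1]:
        raise ValueError("document exists; a _rev is required (conflict)")
    pos = int(parent.split("-", 1)[0]) + 1
    rev = f"{pos}-{new_digest}"  # "new" stands in for the server-generated hash
    tree[rev] = (parent, False)
    return rev


print(winner(tree) == D)  # True: both leaves deleted, "db78..." > "1dae..."
E = put_without_rev(tree, "new")
print(E, tree[E][0] == D, tree[winner(tree)][1])  # 3-new True False
```

After the write, E sits at position 3 as a live leaf while C remains a tombstone at position 2, so E wins and the document is readable again; A, C and D all stay in the tree, which is what lets replication recognize that the old conflict was already resolved.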
