On Oct 29, 2009, at 12:30 PM, Brian Candler wrote:
On Thu, Oct 29, 2009 at 07:28:33AM +0100, fana wrote:
I read the book, Wiki and some Blogs about CouchDB,
but there is still a question in my mind.
If a document is in conflict, the application has to resolve it.
But what if this never happens?
All the conflicting versions remain around, even through compaction.
However, if you request a document by ID, by default you will get just
one revision, chosen by a deterministic algorithm. The algorithm is the
same across all nodes, so all nodes will pick the same revision. This
"winning" revision is also the one seen by views.
Can the document in conflict still be read and edited?
Yes. Conflicts branch into a tree. When you've resolved a conflict, you
need to delete the conflicting revisions explicitly.
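As a rough sketch of that clean-up step: once the application has
decided which revision to keep, the losing revisions can be marked as
deleted in a single _bulk_docs request. The helper below only builds the
request body. deletion_body is a hypothetical name, and the document ID
and revision strings are placeholders, not real output.

```shell
# deletion_body: hypothetical helper, not part of CouchDB.
# Prints a _bulk_docs body that marks each given revision of one
# document as deleted.
# usage: deletion_body DOC_ID REV [REV...]
deletion_body() {
    id=$1; shift
    sep=
    printf '{"docs":['
    for rev in "$@"; do
        printf '%s{"_id":"%s","_rev":"%s","_deleted":true}' "$sep" "$id" "$rev"
        sep=,
    done
    printf ']}\n'
}

# Placeholder revision ID standing in for the losing branch.
deletion_body mydoc 2-bbb
```

The output could then be piped to something like
curl -sX POST -H 'Content-Type: application/json' -d @- "$DB/_bulk_docs",
where $DB is assumed to hold the database URL.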
Example:
X0
User 1 fetches X0 and updates it to X1. User 2 fetches X0 and updates it
to X2. Then you get:
   ,-> X1
X0
   `-> X2
If either user reads, they will see one of the versions (say X1). They
won't even know that there's a conflict unless they query with
?conflicts=true, in which case they'll see the rev of X2 as well, but
would need to do a second read to get the contents of X2.
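The two-step read can be sketched as follows. conflict_revs is a
hypothetical helper that pulls the revision IDs out of the "_conflicts"
array of a ?conflicts=true response. It assumes the compact JSON that
CouchDB emits (no whitespace inside the array), and the sample response
below is made up for illustration.

```shell
# conflict_revs: hypothetical helper, not part of CouchDB.
# Reads a ?conflicts=true response on stdin and prints each revision ID
# listed in its "_conflicts" array, one per line.
conflict_revs() {
    sed -n 's/.*"_conflicts":\[\([^]]*\)].*/\1/p' | tr ',' '\n' | tr -d '"'
}

# Made-up response body; real revision IDs will differ.
printf '%s\n' '{"_id":"mydoc","_rev":"2-aaa","_conflicts":["2-bbb","2-ccc"]}' |
    conflict_revs
```

Each printed rev can then be fetched with a second request of the form
curl -s "$DB/mydoc?rev=<rev>", where $DB is assumed to hold the database
URL.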
If the database is compacted then the common ancestor X0 will be lost
forever, but X1 and X2 will still remain. (Hence you can't rely on doing
a diff between X0 and X1, and another diff between X0 and X2, to merge
the changes).
If you want DVCS-like full diffing, one way is to attach a diff and
revision metadata for each edit to the document before PUTting it. When
there is a conflict, the revision history is then completely available
for inspection, and the user can see where the conflicting edits began,
etc.
If a user edits X1 and saves back as X3, you will get:
   ,-> X1 -> X3
X0
   `-> X2
Now X2 and X3 are in conflict. The conflict may be resolved in favour of
X3; actually, I don't know the details of the algorithm, so it might be
possible for it to be resolved in favour of X2, which means that the
changes seen in X1 and X3 would both appear to "vanish" at that point.
The branch with more edits wins, which prevents documents from
arbitrarily disappearing during normal editing.
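As an illustration of the "more edits wins" rule, the sketch below picks
a winner from a list of revision IDs of the form N-hash: the highest N
wins, and ties are broken by taking the highest hash string. This is an
approximation under stated assumptions, not CouchDB's own code.
pick_winner is a hypothetical helper, it does not model deleted
revisions (which CouchDB ranks below live ones), and the revision IDs
are made up.

```shell
# pick_winner: hypothetical helper, not part of CouchDB.
# Reads revision IDs ("N-hash"), one per line, on stdin and prints the
# one with the highest N; ties go to the highest hash in byte order.
pick_winner() {
    LC_ALL=C sort -t- -k1,1nr -k2,2r | head -n 1
}

# Made-up IDs: X2 is "2-bbb", X3 is "3-aaa". X3 has more edits.
printf '%s\n' 2-bbb 3-aaa | pick_winner
```

With the X2/X3 example above, X3 is the third edit on its branch and X2
only the second, so this rule picks X3.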
Note: if you are running on a single node, then by default, conflicting
updates are forbidden with a 409 error. But you can get them in two
ways: by making the changes on two separate nodes and replicating the
nodes to each other; or by using the _bulk_docs API with
{"all_or_nothing":true}. The second case is used in the following shell
script, so this may be a good starting point for experimentation.
---- 8< -------------
HOST=http://127.0.0.1:5984
DB="$HOST/conflict_test"
EP="$DB/_bulk_docs"
curl -s "$HOST"
curl -sX DELETE "$DB"
curl -sX PUT "$DB"
resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"type":"test"
}]}
JSON
rev0=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev0
resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev0",
"type":"test",
"data":"foo"
}]}
JSON
rev1=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev1
resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev0",
"type":"wibble",
"data":"bar"
}]}
JSON
rev2=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev2
# Now we have two conflicting versions.
echo
echo "Getting the auto-selected version:"
curl -s "$DB/mydoc"
echo
echo "Getting the auto-selected version with 'conflicts':"
curl -s "$DB/mydoc?conflicts=true"
echo
echo "Getting the auto-selected version with 'revs_info':"
curl -s "$DB/mydoc?revs_info=true"
# Note that you would have to retrieve the conflicting versions yourself
echo "Now updating version $rev1"
resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev1",
"type":"test",
"data":"baz"
}]}
JSON
rev3=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev3
echo
echo "Getting the auto-selected version:"
curl -s "$DB/mydoc"
echo
echo "Getting the auto-selected version with 'conflicts':"
curl -s "$DB/mydoc?conflicts=true"
---- 8< -------------
Is this a sensible API? You decide. I've given my opinion previously.
This API seems weird, but it's the closest thing we can have to
multi-document transactions in CouchDB while remaining a distributed,
partitioned database. It's pretty much impossible to support
all-or-nothing, conflict-checking transactions on a partitioned database
without some sort of double-lock checking, which is slow and expensive.
Also, replication doesn't replicate transactions, only documents, so we
don't wish to confuse users by introducing transactions that aren't
supported by the rest of CouchDB.
If you want an easier API for saving documents into a conflicted state
(something like ?conflict=ok), that would be a fairly easy patch to
make. But I'm not sure why users would want that for a single document.
Thanks for this write-up; you seem to have given a good high-level
description of how conflicts work in CouchDB.
-Damien
HTH,
Brian.