On Jun 30, 2011, at 11:37 PM, Zdravko Gligic wrote:
> But neither one even bothered trying to answer my question of whether
> just the last updated header or perhaps the last few are ever used.
Just the last one. But at any point in time, that most recent header is vital
for recovery. It only becomes obsolete once a newer one has been successfully
appended.
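
To make the recovery idea concrete, here is a toy sketch in Python. It is not CouchDB's actual on-disk format: the "HDR1" marker, the 16-byte header layout, and the names commit and find_last_header are all made up for illustration. It only shows the general technique of scanning backward from EOF for the newest header that passes a checksum, so a torn final write falls back to the previous header.

```python
import struct
import zlib

MAGIC = b"HDR1"                   # hypothetical marker, not CouchDB's
HEADER = struct.Struct(">4sQI")   # magic, root offset, CRC32 of the first 12 bytes

def make_header(root_offset):
    body = struct.pack(">4sQ", MAGIC, root_offset)
    return body + struct.pack(">I", zlib.crc32(body))

def commit(log, payload):
    """Append the new root data, then the header that points at it."""
    root_offset = len(log)
    log.extend(payload)
    log.extend(make_header(root_offset))
    return root_offset

def find_last_header(data):
    """Scan backward from EOF; return the root offset stored in the newest
    intact header, or None if no intact header exists."""
    pos = len(data) - HEADER.size
    while pos >= 0:
        magic, root_offset, crc = HEADER.unpack_from(data, pos)
        if magic == MAGIC and zlib.crc32(data[pos:pos + 12]) == crc:
            return root_offset
        pos -= 1
    return None

log = bytearray()
first = commit(log, b"root-v1")
second = commit(log, b"root-v2")

# Simulate a crash partway through writing a third header.
torn = log + b"root-v3" + make_header(len(log))[:10]
assert find_last_header(torn) == second
```

In this sketch the older header is never consulted while the newer one is intact; it only matters if the newest write did not complete.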
> I was also under an impression that the update headers (pointers to
> the root of btree) are also somehow being used for reading
> consistency. If so then this might suggest that a database could be
> rolled back to some previous point in time. How far back and how
> practical is another question.
Sort of. My understanding (I haven’t looked at the source) is that when a
request handler begins, it reads the header at the current EOF and finds the
root node. After that it reads by starting from that root node. But I think
that after the request handling begins, the header isn’t looked at anymore.
I am not sure whether the db looks up older revisions of documents by starting
from an earlier header (“going back in time”); I don’t think so, because this
would be inefficient (O(N)) for finding a specific revision of a document.
Instead my hunch is that each document points back to the position in the file
of its previous revision. (Again, disclaimer, I am extrapolating based on my
knowledge of similar data structures.)
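
Here is a rough Python sketch of the read-consistency part as I described it above: a request pins the root named by the newest header when it starts and never looks at the header again. This is an assumption-laden toy, not CouchDB's code. The AppendOnlyStore class and its methods are hypothetical, a list stands in for the file, and a plain dict stands in for the B-tree root.

```python
class AppendOnlyStore:
    def __init__(self):
        self.file = []      # stands in for the append-only file
        self.headers = []   # positions of root nodes, newest last

    def commit(self, key, value):
        old_root = self.file[self.headers[-1]] if self.headers else {}
        new_root = {**old_root, key: value}   # copy-on-write: old root untouched
        self.file.append(new_root)
        self.headers.append(len(self.file) - 1)

    def begin_read(self):
        """Pin the root named by the newest header at request start."""
        return self.file[self.headers[-1]]

store = AppendOnlyStore()
store.commit("doc", "rev1")
snapshot = store.begin_read()      # request starts here
store.commit("doc", "rev2")        # a later write appends a new root

print(snapshot["doc"])             # rev1
print(store.begin_read()["doc"])   # rev2
```

Because nothing is overwritten in place, the reader that started before the second commit keeps seeing a consistent view without any locking.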
You might find this blog post on CouchDB’s internal structures interesting. It’s
two years old, though, so I don’t know how much of it is still accurate:
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
—Jens