Wednesday, April 14, 2004, 1:16:54 AM, Andrew Piskorski wrote:

>[...]

> Doug Currie's "Shadow Paging" design sounds promising.
> Unfortunately, I have not been able to download the referenced
> papers at all (where can I get them?),

There are three sources for the papers; the two linked from the wiki
page have been reliably available:
http://www.sqlite.org/cvstrac/wiki?p=BlueSky

> but as far as I can tell, it seems to be describing a system with
> the usual Oracle/PostgreSQL MVCC semantics, EXCEPT of course that
> Currie proposes that each Write transaction must take a lock on the
> database as a whole.

I agree with your summary.

> [...]

> Since Currie's design has only one db-wide write lock, it is
> semantically equivalent to PostgreSQL's "serializable" isolation
> level, correct?

I believe that is true.
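
To make the equivalence concrete, here is a minimal sketch (C with
pthreads; this is not code from the design, and all names are
illustrative): one global write mutex plus snapshot reads. With at most
one write transaction at a time, and readers only ever seeing fully
committed snapshots, any interleaving is equivalent to some serial
order. Reclaiming old snapshots is omitted.

  #include <pthread.h>

  typedef struct Snapshot Snapshot;    /* opaque: an immutable committed state */

  static Snapshot *current;            /* latest committed snapshot */
  static pthread_mutex_t writeLock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t snapLock  = PTHREAD_MUTEX_INITIALIZER;

  Snapshot *begin_read(void){          /* readers never block the writer */
    pthread_mutex_lock(&snapLock);
    Snapshot *s = current;             /* pin the committed snapshot */
    pthread_mutex_unlock(&snapLock);
    return s;
  }

  void begin_write(void){              /* the one db-wide write lock */
    pthread_mutex_lock(&writeLock);
  }

  void commit_write(Snapshot *newSnapshot){
    pthread_mutex_lock(&snapLock);
    current = newSnapshot;             /* publish atomically to new readers */
    pthread_mutex_unlock(&snapLock);
    pthread_mutex_unlock(&writeLock);
  }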

> How could this be extended to support table locking and PostgreSQL's
> default "read committed" isolation level? Would the smallest locking
> granularity possible in Currie's design be one page of rows, however
> many rows that happens to be?

Things get *much* more complicated once you have multiple simultaneous
write transactions. I didn't want to go there.

One way to get table level locking without a great deal of pain is to
integrate the shadow paging ideas with BTree management. Rather than
using page tables for the shadow pages, use the BTrees themselves.
This means that any change to a BTree requires changes along the
entire path back to the root, so that only free pages are used to
store new data, including the BTree pages themselves. Writing the root
page(s) of the BTree(s) commits the changes to the corresponding
table(s).
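
Here is a rough sketch of that copy-on-write path update, against a toy
in-memory pager; none of these names come from SQLite or the papers.
Modifying a leaf writes fresh copies of every page on the path back to
the root, and recording the new root page number is what commits the
change to that table.

  #include <stdint.h>

  #define BT_FANOUT 32
  #define N_PAGES   1024

  typedef uint32_t Pgno;

  typedef struct BtPage {
    int  nCell;
    Pgno aChild[BT_FANOUT];            /* child page numbers (interior pages) */
    /* keys and payload omitted */
  } BtPage;

  /* toy stand-ins for the real pager, so the sketch is self-contained */
  static BtPage disk[N_PAGES];
  static Pgno   nextFree = 1;
  static Pgno alloc_free_page(void){ return nextFree++; }
  static void read_page(Pgno p, BtPage *out){ *out = disk[p]; }
  static void write_page(Pgno p, const BtPage *in){ disk[p] = *in; }

  /* Rewrite the path from the root to a modified leaf onto free pages
  ** and return the new root page number.  path[0] is the old root and
  ** path[depth-1] the leaf; idx[i] is the child slot taken at level i. */
  Pgno cow_update_path(const Pgno *path, const int *idx, int depth,
                       const BtPage *newLeaf){
    Pgno childNew = alloc_free_page();
    write_page(childNew, newLeaf);     /* fresh copy of the modified leaf */

    for(int i = depth-2; i >= 0; i--){ /* walk back up toward the root */
      BtPage node;
      read_page(path[i], &node);
      node.aChild[idx[i]] = childNew;  /* point at the freshly written child */
      childNew = alloc_free_page();
      write_page(childNew, &node);     /* the old page is never overwritten */
    }
    return childNew;                   /* committing == recording this root */
  }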

> The one process, many threads aspect of Currie's design sounds just
> fine to me.  The one write lock for the whole database, on the other
> hand, could be quite limiting.

It happens to fit my applications well. I have many short duration
write transactions (data logging) and lots of long duration analysis
transactions.

>[...] Currie's design also seems to defer writing any data to disk
> until the transaction commits

This is not really true. The design does attempt to defer the writing
of data until commit, (a) to "batch write" to as many contiguous
sectors as possible (to reduce seek/rotation latency), and (b) to
avoid writing pages more than once (in case there are multiple
modifications to a page within a transaction). However, if the
in-memory cache fills, data pages are spilled to disk, and each
spilled page is written only that once if there are no further
changes to it.
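
A rough illustration of that pager behaviour, with invented names and a
stubbed-out disk: dirty pages stay in memory until commit, are flushed
in page-number order so contiguous sectors can go out in one pass, and
are spilled early only when the cache overflows.

  #include <stdlib.h>

  #define PAGE_SIZE 4096

  typedef struct CachedPage {
    unsigned pgno;
    int      isDirty;
    char     data[PAGE_SIZE];
  } CachedPage;

  static void os_write(unsigned pgno, const char *data){
    /* stand-in for the real OS write of one page at offset pgno*PAGE_SIZE */
    (void)pgno; (void)data;
  }

  static int cmp_pgno(const void *a, const void *b){
    const CachedPage *pa = *(CachedPage *const *)a;
    const CachedPage *pb = *(CachedPage *const *)b;
    return (pa->pgno > pb->pgno) - (pa->pgno < pb->pgno);
  }

  /* At commit: write every dirty page in page-number order, so runs of
  ** adjacent pages become one mostly-sequential pass over the disk. */
  void flush_on_commit(CachedPage **cache, int nPage){
    qsort(cache, nPage, sizeof(CachedPage*), cmp_pgno);
    for(int i = 0; i < nPage; i++){
      if( cache[i]->isDirty ){
        os_write(cache[i]->pgno, cache[i]->data);
        cache[i]->isDirty = 0;
      }
    }
  }

  /* If the cache fills mid-transaction, spill one dirty page early; it
  ** is written again at commit only if modified after the spill. */
  void spill_one(CachedPage *victim){
    if( victim->isDirty ){
      os_write(victim->pgno, victim->data);
      victim->isDirty = 0;
    }
  }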

e

