Wednesday, April 14, 2004, 1:16:54 AM, Andrew Piskorski wrote:


> Doug Currie's "Shadow Paging" design sounds promising.
> Unfortunately, I have not been able to download the referenced
> papers at all (where can I get them?),

There are three sources for the papers. The two links on the wiki page
have been reliably available.

> but as far as I can tell, it seems to be describing a system with
> the usual Oracle/PostgreSQL MVCC semantics, EXCEPT of course that
> Currie proposes that each Write transaction must take a lock on the
> database as a whole.

I agree with your summary.

> [...]

> Since Currie's design has only one db-wide write lock, it is
> semantically equivalent to PostgreSQL's "serializable" isolation
> level, correct?

I believe that is true.
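To see why a single database-wide write lock gives serializable behavior while readers stay unblocked, here is a minimal sketch (all names hypothetical, in-memory dicts standing in for the page store): readers snapshot the last committed root, and writers serialize on one mutex, building a shadow copy and committing by swapping the root pointer.

```python
import threading

class ShadowDB:
    def __init__(self):
        self.committed_root = {}           # last committed state
        self.write_lock = threading.Lock() # the one db-wide write lock

    def read_snapshot(self):
        # Readers never block: they just capture the committed root
        # and keep reading that version.
        return self.committed_root

    def write_txn(self, updates):
        with self.write_lock:              # one writer at a time
            shadow = dict(self.committed_root)  # copy, never modify in place
            shadow.update(updates)
            self.committed_root = shadow   # atomic pointer swap = commit

db = ShadowDB()
snap = db.read_snapshot()   # a reader's snapshot, taken before the write
db.write_txn({"k": 1})
# snap is unchanged; new readers see the committed update.
```

Since writes are fully serialized and each reader sees one committed version throughout, every schedule is equivalent to some serial order, which is exactly the serializable isolation level.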

> How could this be extended to support table locking and PostgreSQL's
> default "read committed" isolation level? Would the smallest locking
> granularity possible in Currie's design be one page of rows, however
> many rows that happens to be?

Things get *much* more complicated once you have multiple simultaneous
write transactions. I didn't want to go there.

One way to get table level locking without a great deal of pain is to
integrate the shadow paging ideas with BTree management. Rather than
using page tables for the shadow pages, use the BTrees themselves.
This means that any change to a BTree requires rewriting every page
along the path back to the root, so that only free pages are used to
store new data, including the BTree pages themselves. Writing the root
page(s) of the BTree(s) commits the changes to that table (or those
tables).
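The path-copying idea above can be sketched in a few lines (a toy two-level tree with hypothetical names, not a real pager): an insert copies only the pages on the root-to-leaf path, untouched subtrees are shared, and publishing the new root is the commit.

```python
class Node:
    """Toy BTree page: leaves hold key->value, interior nodes hold children."""
    def __init__(self, keys, children=None):
        self.keys = dict(keys)       # copied, never shared mutably
        self.children = children     # None for leaves

def cow_insert(node, key, value):
    """Return a NEW root reflecting the insert; the old tree is untouched,
    so readers holding the old root are unaffected."""
    if node.children is None:                       # leaf page
        new = Node(node.keys)
        new.keys[key] = value
        return new
    # Interior page: copy it, replacing only the child on the path.
    idx = 0  # child choice simplified for the sketch
    new_children = list(node.children)
    new_children[idx] = cow_insert(node.children[idx], key, value)
    return Node(node.keys, new_children)

old_root = Node({}, [Node({1: "a"}), Node({5: "e"})])
new_root = cow_insert(old_root, 3, "c")
# old_root still shows the pre-transaction state; new_root shares the
# unmodified right subtree. Writing new_root to a free page and then
# publishing it is the per-table commit.
```

In the on-disk version the copies go to free pages, so a crash before the root is written leaves the old tree fully intact.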

> The one process, many threads aspect of Currie's design sounds just
> fine to me.  The one write lock for the whole database, on the other
> hand, could be quite limiting.

It happens to fit my applications well. I have many short duration
write transactions (data logging) and lots of long duration analysis
(read) transactions.

>[...] Currie's design also seems to defer writing any data to disk
> until the transaction commits

This is not really true. The design does attempt to defer the writing
of data until commit, (a) to "batch write" to as many contiguous
sectors as possible (reducing seek/rotation latency), and (b) to avoid
writing a page more than once (in case a transaction modifies the same
page several times). However, if the in-memory cache fills, data pages
are spilled to disk early, and a spilled page is still written only
that once as long as there are no further changes to it.
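That write policy can be sketched as a small dirty-page cache (hypothetical names, a dict standing in for the database file): pages stay dirty in memory until commit, the oldest dirty page spills when the cache fills, and a spilled page is not rewritten unless it is dirtied again.

```python
class PageCache:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk          # page_no -> bytes; stands in for the file
        self.dirty = {}           # pages modified but not yet written

    def write_page(self, page_no, data):
        self.dirty[page_no] = data          # re-dirties even a spilled page
        if len(self.dirty) > self.capacity:
            self._spill()

    def _spill(self):
        # Cache full: write out the oldest dirty page early. It will not
        # be written again unless the transaction modifies it once more.
        page_no, data = next(iter(self.dirty.items()))
        self.disk[page_no] = data
        del self.dirty[page_no]

    def commit(self):
        # Batch-write the remaining dirty pages, ideally to contiguous
        # free sectors, each exactly once.
        for page_no in sorted(self.dirty):
            self.disk[page_no] = self.dirty[page_no]
        self.dirty.clear()

disk = {}
cache = PageCache(capacity=2, disk=disk)
cache.write_page(7, b"a")
cache.write_page(8, b"b")
cache.write_page(9, b"c")   # cache overflows: page 7 spills early
cache.commit()              # pages 8 and 9 are written once, at commit
```

So in the common case every dirty page is written exactly once, whether at spill time or at commit.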

