Thursday, April 15, 2004, 9:16:01 AM, Christian Smith wrote:

> On Wed, 14 Apr 2004, Doug Currie wrote:

>>One way to get table level locking without a great deal of pain is to
>>integrate the shadow paging ideas with BTree management. Rather than
>>using page tables for the shadow pages, use the BTrees themselves.
>>This means that any change to a BTree requires changes along the
>>entire path back to the root so that only free pages are used to store
>>new data, including the BTree itself. Writing the root page(s) of the
>>BTree(s) commits the changes to that table (these tables).

> Actually, this gets my vote. Keeps the pager layer the same,

The pager gets *much* simpler because it doesn't need to make a log
file. The log file is not necessary because writes only go to free
pages.

Well, there would be one write-ahead log. It's needed to prevent
partial updates to the page number pointers to the roots page(s) of
the BTree(s) at commit. This log is created at commit time, and is
much simpler and much smaller than the present log file.

> and only requires a cache of the root btree for each object
> (table/index) in the database to be maintained on a per-transaction
> basis

Yes, you need to cache the page number of each BTree root at
transaction start.

You'd also need a forest of free pages organized by transaction so
they can be freed at the right time (when the oldest read-transaction
that can reference them has completed).

> , reducing the complications of what to do under memory pressure
> when pages are spilled from the cache as we should be able to keep
> them in memory all the time.

Yes.

> Committing of a transaction would then be an atomic update root btree page
> number in the catalog table.

Yes, atomically for all the BTrees modified. This is probably a single
page of data (4 to 8 bytes of root page number per BTree, i.e., per
table and per index). Well, I usually assume fairly large pages
compared with SQLite's default of 1K. Using larger pages also
decreases the depth of the BTree which reduces the number of pages
written.

This design works well. It has the advantage (compared with shadow
pager) that reads are not burdened with page table indirection. It has
the potential disadvantage (compared with SQLite 2.8) that small
writes can modify several pages (based on the depth of the BTree).

I used this design in a proprietary database in the late 1980s. The
only reason I didn't consider modifying SQLite this way up until now
is that I was anticipating BTree changes for 3.0, so I confined my
efforts to the pager layer.

e


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to