G'day,
"D. Richard Hipp" <[EMAIL PROTECTED]>
12/06/2004 08:16 AM
To: [EMAIL PROTECTED]
cc:
Subject: [sqlite] Locking and concurrency in SQLite version 3.0
> http://www.sqlite.org/lockingv3.html
My thoughts are listed as they come to me.
Thought 1:
Section 5.0, entitled "Writing to a database file":
After the in-memory cache initially spills to disk the exclusive lock must
be maintained because the database file is changed. One way to avoid this
happening might be to change the database file and log structure as
follows:
1) Add a numeric entry to each page in the database file that refers to a
specific page (or file offset) into the log file.
2) Add an entry to each log file entry indicating a 0 or 1.
If a page is read from the database and has a non-zero file offset, that
page refers to the roll-forward log that superceeds it. A zero in the log
file entry indicates it is a rollback entry, while a 1 indicates it is a
roll-forward entry.
The algorithms described would change in the following ways:
1) Instead of writing the dirty page to the main file when memory
spilliage occurs, write it to the journal. If the main file entry already
has a file offset encoded into it, write the page to that offset. If the
main file entry has no offset, write it at the end of the journal file and
overwrite only the offset of the main page.
2) Readers with shared locks should always overlook any such offsets it
finds in main files. Readers with any of the writer locks should refer to
the journal for the updated version of such pages.
3) When rolling back a journal file, only rollback pages with a 0 entry in
the rollback/roll-forward field.
4) When committing a transaction write all pages from memory, but also
commit any pages in the journal with a 1 in the rollback/roll-forward
field.
5) You might have to rethink any vacuum operation and some other small
aspects of life. By using the main file as an index into the roll-forward
log you make truncating the database file more difficult.
One extra alternative to throw in is to keep the roll-back and
roll-forward journals in separate files. That would avoid the need to
identify the individual log entries as roll-back or roll-forward and may
improve performance of large changes. The roll-forward file would never
have to be committed.
This approach differs slightly from previous suggestions of the shadow
pager or of creating tree structures in the journal file. It does not
completely virtualise the pager level, although the concept is similar. It
requires only trivial extra structure in the journal file since it uses
the real main file as an index into the roll-forward section of the
journal. If this kind of scheme were to be implimented in the future the
groundwork in file format changes could be laid now in a
forward-compatable way by allocating the necessary spaces and always
ensuring they had a zero value.
Thought 2:
I'm a little concerned about when SQLITE_BUSY can be returned. In section
7.0, entitled "Transaction Control At The SQL Level" a mention is made of
locks not be acquired with the BEGIN statement. Personally I don't like to
see SQLITE_BUSY at all. I currently modify my sqlite version to use
blocking locks in restricted ways to avoid getting the message and ensure
optimum fairness. If they do occur, I would prefer they happen at
well-defined and designated places. Hmmm... I guess I can't think of any
cases where this is really an issue, though.
I would like to see blocking locks supported by SQLITE. If that's not
possible it's ok, but my preference is that the capability should exist.
Currently sqlite provides an API to execute a function when SQLITE_BUSY
would be returned. That's ok, but doesn't suit blocking locks well for two
reasons: 1) The locking semantics of sqlite use operating system locks in
specific ways that would be unwise to mess with in a callback function. 2)
I don't belive there is an API to register a corresponding unlock function
to the sqlite_busy_callback, so whatever locks might be put in place can't
be unmade at appropriate times. Perhaps the API should be changed to
support replacement of the various os.c lock functions for each of the
specific lock types in the new sqlite locking model.
As a matter of interest, the current sqlite isn't far off being able to
work with blocking lock in place of its existing non-blocking locks. The
main prohibition that needs to be imposed is that shared locks cannot be
upgraded to exclusive locks. The current sqlite can be "tuned" to ensure
exclusive locks are obtained early to prevent blocking locks from
deadlocking. I haven't seen the new sqlite3 code and haven't seen detail
of how the various locking mode transitions will be implimented in a posix
environment to know whether extra problems will be introduced in this
area.
By my reading the allowable transitions will be these:
Unlocked -> Shared (non-blocking, may return busy)
Shared -> Pending (non-blocking, my return busy)
Exclusive -> Shared (will always work immediately)
Shared -> Reserved (non-blocking, may return busy)
Reserved -> Pending (will always work immediately when preparing to
write?)
Pending -> Exclusive (non-blocking, may return busy when preparing to
rollback)
Pending -> Exclusive (blocking, returns succesful eventually when
preparing to write)
Again, I'm not sure of the exact posix operations involved in each
transition but I read this as a lot of shared locks that are transfomed
into exclusive locks only twice: Once in a non-blocking manner during
rollback, and once in a blocking manner during write operations. More may
be introduced in the fiddling with reserved and pending locks. I would
guess that Shared -> Reserved requires this at least once, and that
Reserved -> Pending does too.
Either way, it looks like blocking locks could be used most of the time,
at least in principle. Hmmm... I'm rambling a little... but to get to the
point: Making blocking locks can be tricky. It kinda looks like it'll be
trickier under the new system. It's only natural when you're trying to
avoid holding exclusive locks for any extended period of time, but still
guaranteeing that you will be able to get one when the time is right. I
suppose the balance is right there, and it'll be my code that has to
change in the end to get the most of this sqlite version. Anything that
preserves the new functionality and reduces the opportunities for getting
SQLITE_BUSY would be good, though.
Benjamin.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]