[sqlite] A quick code review, analysis of locking model, "fine-tuning" suggestions

ben . carlyle Thu, 24 Jun 2004 04:53:10 -0700

G'day,

I thought I'd so somewhat of a code review on the lower-level pieces of 
sqlite 3.0.1, so here goes:


v----- code review ----v

<file name="os_unix.c">
        <comment lines="546-549" function="sqlite3OsWrite" 
importance="high">
                You use a loop here to try and write all data 
synchronously to disk. I had to read it a couple of times before I 
realised it was actually performing the check on write's return correctly, 
but I am happy with it
                Is there a reason why similar logic doesn't appear in 
sqlite3OsRead? 
                Background: Under unix, multiple reads/writes may be 
required for "slow" devices. One read/write should alway suffice for a 
"fast" device. A disk is considered a fast device for this purpose. An nfs 
mount would be considered a slow device.
        </comment>
        <comment function="sqlite3OsLock" importance="medium-buggsme">
                I've always been uneasy about the interaction in sqlite 
between the pager an os layer with regards to locking. It seems like 
excessive coupling to me. The os layer is essentially written to know the 
lock state of the pager, and then participate in a dance whereby the pager 
and os always approximately agree on the lock status. The sqlite3OsLock 
function is an example of this, where an attempt to grab a lock of a less 
than or equal status to the current lock is a no-op. To me this means that 
the pager is making uncessary calls and doesn't seem particularly aware of 
its own state. The os layer knows too much about what's happening ("I know 
that you didn't really mean to call this function"), and the pager doesn't 
know enough ("I'll make this call, just in case"). I think that all such 
no-op forms of os layer functions should be replaced with assertions and 
the code above made self-contained.
                The concept of upgrading and downgrading locks 
transparently has also consistently bugged me, especially when I've wanted 
to use blocking locks. It makes things much harder when the aren't 
clearly-defined individual lifecycles for reader and writer locks, and 
when writers are forced to share the early lifecycle of readers.
        </comment>
        <comment lines="742" function="sqlite3OsLock" importance="low">
                Extraneous assert. The if condition tests this assert 
already.
        </comment>
        <comment lines="743" function="sqlite3OsLock" importance="low">
                Use the #define value rather than numeric value of 
NO_LOCK.
        </comment>
        <comment lines="761" function="sqlite3OsLock" 
importance="critical">
                F_RDLCK should be replaced with F_WRLCK. Locking only with 
F_RDLCK has no effect.
        </comment>
        <comment lines="804" function="sqlite3OsLock" importance="low">
                Use the #define value rather than numeric value of 
NO_LOCK.
        </comment>
        <comment lines="808" function="sqlite3OsLock" importance="low">
                lock.l_len should be set to 1 when it needs to be that 
value. Doing it at the top of function reduces code clarity and introduces 
uncecessary asymmetry in conditional branches.
        </comment>
        <comment function="sqlite3OsLock" importance="medium-buggsme">
                I'm nervy about the different locks that might be held by 
an process in EXCLUSIVE_LOCK state depending on how it reached that state. 
If it went through SHARED_LOCK it has SHARED_FIRST through SHARED_FIRST + 
SHARED_SIZE write locked + PENDING_BYTE through PENDING_BYTE + 1 read 
locked. If it came through the reserved state it also has a write lock on 
RESERVED_BYTE through RESERVED_BYTE + 1. This appears to be dealt with in 
the unlock code, but it grates a little. I actually don't like the SHARED 
-> PENDING path at all, and think it should be removed for simplicity. It 
effectively creates two versions of the pending and exclusive states, 
respectively.
        </comment>
        <comment lines="826-829" function="sqlite3OsLock" 
importance="low">
                I don't like the setting of PENDING_LOCK here. Surely the 
code would be clearer if it were set back when the pending lock was 
obtained. I understand that you still want access to the "old value" 
during the rest of the function, but couldn't you copy it?
        </comment>
        Ok, I haven't reviewed much past this point. I was hoping to get 
in some comments on the pager itself which I haven't read, yet... but I've 
been at this for a little too long now.
</file>

v---- analysis of locking model ----v

Just to get this straight in my head, this is the current unix locking 
model (does this appear somewhere in comments?):

States

NO_LOCK = Nothing locked
SHARED_LOCK = SHARED_FIRST through SHARED_FIRST + SHARED_SIZE read-locked
RESERVED_LOCK = RESERVED_BYTE through RESERVED_BYTE + 1 write-locked + 
locks of SHARED_LOCK state
PENDING_LOCK = PENDING_BYTE through PENDING_BYTE + 1 read-locked + either 
locks of SHARED_LOCK state, or locks of RESERVED_LOCK state
EXCLUSIVE_LOCK = SHARED_FIRST through SHARED_FIRST + SHARED_SIZE 
write-locked + locks of PENDING_LOCK state

Transitions

NO_LOCK -> SHARED_LOCK:
1. Pick up locks for pending state
2. Pick up locks for shared state
3. Drop locks for pending state

SHARED_LOCK -> RESERVED_LOCK:
1. Pick up locks for reserved state. Reserved lock is exclusive, so only 
one process can be in reserved state at any one time but concurrency with 
readers is ok.

SHARED_LOCK -> EXCLUSIVE_LOCK
1. Pick up locks for pending state
2. Pick up locks for exclusive state

RESERVED_LOCK -> EXCLUSIVE_LOCK
1. Pick up locks for pending state
2. Pick up locks for exclusive state

PENDING_LOCK -> EXCLUSIVE_LOCK
2. Pick up locks for exclusive state

This makes three main locking lifecycles:
1. NO_LOCK -> SHARED_LOCK -> NO_LOCK
2. NO_LOCK -> SHARED_LOCK -> SHARED_PENDING_LOCK -> SHARED_EXCLUSIVE_LOCK 
-> NO_LOCK
3. NO_LOCK -> SHARED_LOCK -> RESERVED_LOCK -> RESERVED_PENDING_LOCK -> 
RESERVED_EXCLUSIVE_LOCK -> NO_LOCK

Thoughts

My comments on compatability with a blocking lock mechanism:
* Path 1 is safe to replace with blocking locks. A strict sequence of 
locks is followed and no lock upgrade occurs.
* Paths 2 and 3 are not safe. They put a pending lock on the database 
while they still have a read lock on the shared section. When they 
subsequently put a write lock on the shared section they can't guarantee 
that all existing read locks will eventually clear. Other processes may be 
trying the same manouver.

v----- My suggestion, from here down -----v

I think things are too complicated as they are, and don't map well to 
blocking semantics. It's not far off, though, and fixes should be simple. 
Here's what I think should be done:

Create separate locking lifecycles for a database reader and a database 
writer. The shared "SHARED_LOCK" state does noone any good. I recommend 
the following transitions:
1. NO_LOCK -> PENDING_LOCK -> SHARED_LOCK -> NO_LOCK
2. NO_LOCK -> RESERVED_LOCK -> PENDING_LOCK -> EXCLUSIVE_LOCK

Consider the following lock states:
NO_LOCK = Nothing locked
SHARED_LOCK = SHARED_FIRST through SHARED_FIRST + SHARED_SIZE read-locked
RESERVED_LOCK = RESERVED_BYTE through RESERVED_BYTE + 1 write-locked + 
locks of SHARED_LOCK state
PENDING_LOCK = PENDING_BYTE through PENDING_BYTE + 1 write-locked only
EXCLUSIVE_LOCK = SHARED_FIRST through SHARED_FIRST + SHARED_SIZE 
write-locked only

Consider the following transitions:
NO_LOCK -> SHARED_LOCK
1. Acquire exclusive pending lock
2. Acquire shared lock

NO_LOCK -> RESERVED_LOCK
1. Acquire reserved lock (don't do anything with pending locks)
2. Acquire shared lock

RESERVED_LOCK -> EXCLUSIVE_LOCK
1. Acquire pending lock
2. Upgrade shared lock to exclusive lock
3. Drop reserved and pending locks

* Use the reader lifecycle for SELECT statements outside of transactions. 
Create the exclusive pending lock, acquire the reader lock, then drop the 
pending lock. Reader locks must eventually be unlocked.
* When a transaction is started (explicitly or implicitly) drop any shared 
locks back to the NO_LOCK state, then move to the RESERVED_LOCK state 
immediately or on the first SQL statement inside the transaction. This 
should be done by acquiring only the reserve lock.
* When actual modification to the database is required (either at commit 
or page cache overflow time), acquire the pending lock exclusively, then 
acquire the writer lock and drop the pending lock. Writer locks must 
eventually be unlocked.
* All error cases must return to the NO_LOCK state unless recovery and 
continuing on in the current lifecycle is possible.

I believe this set of states and transitions has all the positive 
qualities associated with the current implementation. Additionally, it 
supports blocking locks and to my mind is simpler. Here are the sequences:
Reader :-
1. Reader acquires pending lock so one of no writer, reserved writer, or 
exclusive writer is true. No writer will not interfere, nor will reserved 
writer. Exclusive writer will finish writing and unlock eventually.
2. Reader acquires shared lock so one of no writer, or reserved writer is 
true. Read to your heart's content. Eventually reader will finish reading 
and unlock

Writer :-
1. Writer acquires reserved lock so readers may or may not be present (and 
will terminate eventually), and a writer in the exclusive state may be 
present (and will terminate eventually)
2. Writer acquires shared lock so readers may or may not be present (and 
will terminate eventually), and no other writer has a lock
3. Writer acquires pending lock so readers may or may not be present (and 
will terminate eventually), new readers will be blocked out (and not begin 
any read operations), and no other writer has a lock
4. Writer upgrades shared lock to exclusive lock so no readers are present 
and no writers are present. Party time.
5. Writer unlocks the reserved lock and pending lock to let the next 
writer get in line. That writer won't be able to become active until it 
obtains a shared lock and we are done.

Is there any chance of this locking model going in before the public 3.0 
release? If so, yippee for me. If not, are they compatible? Essentially 
the same locks are used, so I think they will all behave so long as 
blocking locks are not attempted with the current model. It's actually 
getting late, now. I've been reviewing the locking model for about four 
hours now to get to this point so I don't think I can spend any longer 
thinking about compatibility issues. Hopefully what I've written to this 
point is clear and correct.

Benjamin.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[sqlite] A quick code review, analysis of locking model, "fine-tuning" suggestions

Reply via email to