G'day, I've been mulling over this for a little while, but only got around to writing anything down last night. I may have made some errors, perhaps serious ones. Either way, I thought I'd just put it out there. This email contains an algorithm for permitting concurrent reader and writer access to an sqlite database. The algorithm has the following characteristics: - File-based locking (rather than page or row-level locking) is retained - Semantics of current operations remains the same, as far as I can tell - A change to the sqlite file format would be required - The solution is (probably) not backwards-compatible - Multiple readers can access the database at the same time - A single writer can prepare changes to the database while readers are still reading - The writer locks out all readers and other writers while it performs a commit operation - Blocking locks would be supported (and may be required in some places) - Lock periods are short, even when transactions are held open for a long time. This may be blocking locks are more appropriate than they were previously. A process that holds a database open for a second at a time to improve throughput will only block readers on the second boundary, although it will still block other writers for the full second duration. - Inconsistency only occurs if a writer fails to apply its transaction to the main at commit time, not if it fails while the transaction is still open. - More synching may be proposed than in the current algorithm, but I'm pretty picky about synchs - The journal file is used for writer locking contention, so can't be erased at the end of each transaction. It can be erased safely only when there are no writers currently waiting. - The algorithm doesn't allow modification of main file (apart from journal-based recovery) until the commit phase of the writer algorithm.
Reader algorithm: open main file read-lock main file Check for inconsistency marker in main file If inconsistent release main file read-lock[1a] open journal file write-lock journal file write-lock main file[1b] perform journal playback routine[2] sync main file clear inconsistency marker in main file sync main file truncate journal file release journal file write-lock downgrade main file write-lock to read-lock [3] perform queries flush block cache release main file read-lock Notes: [1] Don't upgrade read lock to write lock, otherwise deadlock may occur for blocking lock. Never attempt to lock the journal file while you have a lock on the main file. [2] May be a no-op as a process who got the write-lock first may have already performed the playback [3] Consistency is now guaranteed, and a valid read-lock exists on the main file Writer algorithm[1]: open journal file write-lock journal file write-lock main file[2a] Check for inconsistency marker in main file If inconsistent perform journal playback routine sync main file clear inconsistency marker in main file sync main file downgrade main file write-lock to read-lock[2b] truncate journal [3] perform updates in-memory and in-journal only. Do not modify main file. Commit: sync journal file upgrade main file read-lock to write-lock write inconsistency marker into main file[4] sync main file write changes into main file sync main file clear inconsistency marker for main file sync main file[5] truncate journal file[6] Notes: [1] To avoid deadlock, a reader should never attempt to "become" a writer. The reader must first drop its read lock (with associated cache flush), and then begin the Writer algorithm steps. [2] Write-lock main instead of read-lock as a precaution. If we read-lock it, readers could open the database concurrently, see the same inconsistency as we do, and get caught behind our transaction as writers trying to resolve it. With a write-lock up-front until we're sure no inconsistency exists we get to avoid this degenerate case. This ends up meaning a writer briefly locks the main file twice: once at the beginning of the transaction, and once at the end. [3] At this point the writer has a write lock on the journal, a read lock on the database, and the journal is empty. No inconsistency exists in the database. [4] The use of an inconsistency marker would be the main incompatability between proposed and existing sqlite implementations. I see this most probably as an inconspicuous boolean that could sneak into an sqlite database header. [5] Why so many syncs? Well, to be safe. We want to know all the data we might roll back in to the main is written to disk before we record the fact that that roll-back has to occur. We want to know that the inconsistency is noted before we start modifying the database, otherwise the modifications could be written but not the inconsistency record. We want to know the the data has been fully updated before we clear the inconsistency marker, and finally we want to know that the consistency marker is clear before we clear out the journal or allow the next writer to start modifying it. This may be excessive, and I think its more detail than the current sqlite goes into, but I can't see any other way around it. The current sqlite should probably be performing similar procedures for handling its journal and its directory entry. [6] No need to sync, since all inconsistency checks hinge off the inconsistency marker in the main file So, now that it's out there... comments? I think it addresses the most commonly expressed issue in the not-so-recent thread on this subject, which is concurrent read with a writer that has long(ish) transactions. It also deals with my own issues regarding blocking locks. You might have to put some more thought into it to work out how to do non-blocking locks in these algorithms (I haven't described any abort paths), but again I don't really think they're necessary in the general case given the algorithms outlined. The most important place to have non-blocking locks is probably the initial lock of the journal by the writer algorithm. That's the only lock that might be held up for a long time (i.e. longer than it takes to commit a transaction). That's the place it could be held-up behind another writer with a long transaction. I guess I would propose that that be the only place where non-blocking locks could be used and that blocking locks still be used there by default. Benjamin. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]