On 21/01/2019 18:46, andrew.g...@l3t.com wrote:
Okay, I put in some instrumentation. Basically I print out all database
queries as they happen, along with all calls to sqlite3OsLock() and
sqlite3OsUnlock() (including their lockType argument and any abnormal return
code). Also I print out how many times sqlite3InvokeBusyHandler() has to call
the busy handler.
For the moment, the solution that is working for me is to disable syncing with
PRAGMA synchronous = OFF. This is acceptable in this particular application
because a power failure or OS crash will necessitate restarting the data
gathering process anyway. I'll explain later why this change helps.
In looking at the logs, I'm seeing several unlucky locking patterns. Let's
just pick one. Process A gets starved by process B since B does many write
transactions in a row, each time successfully getting an EXCLUSIVE lock. When
B calls fdatasync() (actually fsync()), it blocks while holding EXCLUSIVE. A
wakes up, but A can't immediately get a SHARED lock because B holds EXCLUSIVE.
A goes to sleep while holding no locks, then B wakes up when the fsync()
completes. B then releases locks but grabs EXCLUSIVE again to complete its
next transaction, and the cycle repeats. A still can't get its SHARED lock, so
it goes to sleep again, and then B continues to monopolize the lock. This goes
on long enough to exhaust A's patience, at which point SQLITE_BUSY propagates
to the application.
Long ago, when I was at PowerFlex, on an OS long dead (Siemens-Nixdorf
SINIX) using the MIPS RM series of processors, we saw a similar problem.
In that case it was a batch process locking out interactive users, but
*only* on the top-of-the-line machine, the lower end, slower CPUs were fine.
One of S-N's finest engineers informed the customer that we "didn't know
anything about file-locking on UNIX(TM) systems," which was slightly
annoying, to say the least.
So we started into space, and we drew diagrams on the whiteboard, and I
scribbled on my notepad, and stared at my locking code, and eventually
decided that it was probably down to the batch process on the faster CPU
just not giving up the lock when the interactive processes were ready to
run. I.e. "Here's the lock... Oh, I've still got some time left, I'll
hang on to it... My time has run out but I've still got the lock, nyah,
nyah, nyah."
We added "sleep(0)" after the "unlock()" call.
The problem went away.
This may help you, it may not.
Cheers,
Gary B-)
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users