On 21/01/2019 18:46, andrew.g...@l3t.com wrote:
Okay, I put in some instrumentation.  Basically I print out all database 
queries as they happen, along with all calls to sqlite3OsLock() and 
sqlite3OsUnlock() (including their lockType argument and any abnormal return 
code).  Also I print out how many times sqlite3InvokeBusyHandler() has to call 
the busy handler.

For the moment, the solution that is working for me is to disable syncing with 
PRAGMA synchronous = OFF.  This is acceptable in this particular application 
because a power failure or OS crash will necessitate restarting the data 
gathering process anyway.  I'll explain later why this change helps.

In looking at the logs, I'm seeing several unlucky locking patterns.  Let's 
just pick one.  Process A gets starved by process B since B does many write 
transactions in a row, each time successfully getting an EXCLUSIVE lock.  When 
B calls fdatasync() (actually fsync()), it blocks while holding EXCLUSIVE.  A 
wakes up, but A can't immediately get a SHARED lock because B holds EXCLUSIVE.  
A goes to sleep while holding no locks, then B wakes up when the fsync() 
completes.  B then releases locks but grabs EXCLUSIVE again to complete its 
next transaction, and the cycle repeats.  A still can't get its SHARED lock, so 
it goes to sleep again, and then B continues to monopolize the lock.  This goes 
on long enough to exhaust A's patience, at which point SQLITE_BUSY propagates 
to the application.

Long ago, when I was at PowerFlex, on an OS long dead (Siemens-Nixdorf SINIX) using the MIPS RM series of processors, we saw a similar problem.

In that case it was a batch process locking out interactive users, but *only* on the top-of-the-line machine, the lower end, slower CPUs were fine.

One of S-N's finest engineers informed the customer that we "didn't know anything about file-locking on UNIX(TM) systems," which was slightly annoying, to say the least.

So we started into space, and we drew diagrams on the whiteboard, and I scribbled on my notepad, and stared at my locking code, and eventually decided that it was probably down to the batch process on the faster CPU just not giving up the lock when the interactive processes were ready to run. I.e. "Here's the lock... Oh, I've still got some time left, I'll hang on to it... My time has run out but I've still got the lock, nyah, nyah, nyah."

We added "sleep(0)" after the "unlock()" call.

The problem went away.

This may help you, it may not.

        Cheers,
                Gary    B-)
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to