I am not running on *ix, so I cannot test this. However, your mention of
multi-CPU machines brings something to mind.
In Windows, most synchronization functions rely on some form of Interlocked
operation. These operations are implemented differently on multi-CPU
machines (hyperthreading counts as multi-CPU in this context) than on
single-CPU machines.
Namely, in the multi-CPU kernel, each interlocked operation carries a bus
LOCK prefix, whereas in the single-CPU kernel the prefix is replaced by a
NOP. As a result, if multi-threaded code runs on a machine that has multiple
CPUs but a single-CPU kernel, all hell breaks loose (it can happen due to a
bad installation). Your problem is suspiciously similar.
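
To illustrate what the LOCK prefix buys you, here is a minimal sketch in
portable C11 with pthreads. It is not the Windows Interlocked API and it is
not code from sqlite; the names count_with_atomics, atomic_worker and
ATOMIC_MAX_THREADS are made up for this example. On x86, compilers typically
turn atomic_fetch_add into a LOCK-prefixed instruction, so no increment is
lost even when the threads run on different CPUs. A plain `counter++` has no
such guarantee.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ATOMIC_MAX_THREADS 16   /* hypothetical upper bound for this sketch */

/* Shared counter, only ever touched through atomic operations. */
static atomic_long atomic_counter;

/* Each worker adds 1 to the shared counter `iterations` times. */
static void *atomic_worker(void *p) {
    long n = *(const long *)p;
    for (long i = 0; i < n; i++) {
        /* Atomic read-modify-write: the whole increment is indivisible. */
        atomic_fetch_add(&atomic_counter, 1);
    }
    return NULL;
}

/* Runs `nthreads` workers and returns the final counter value,
 * or -1 if the arguments are invalid or a thread could not be started. */
long count_with_atomics(int nthreads, long iterations) {
    pthread_t tids[ATOMIC_MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > ATOMIC_MAX_THREADS || iterations < 0) {
        return -1;
    }
    atomic_store(&atomic_counter, 0);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, atomic_worker, &iterations) == 0) {
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    if (started != nthreads) {
        return -1;
    }
    return atomic_load(&atomic_counter);
}
```

With 4 threads and 100000 iterations each, the function should return
exactly 400000. If you swap the atomic for a plain long and a plain
increment, the total will usually come up short on a multi-CPU machine.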

Here are some things to check:

1. Is your *ix kernel compiled for multi-CPU? I think that Linux requires a
special build of the kernel for multi-CPU machines.
2. Is your sqlite code compiled with SQLITE_UNIX_THREADS enabled (and
THREADSAFE enabled)? If not, the default mechanism used for mutexes in
os_unix.c will certainly FAIL on multi-CPU machines, as it does not use
atomic, LOCK-prefixed operations.
If yes, then the behavior of the mutexes (sqlite3OsEnterMutex) depends on
how well the POSIX (or whatever) lock mechanism works. From the comments at
the beginning of the file, I suspect that locking in Linux is horribly
unstable...

So, I would first check that the mutex mechanism works as expected in your
environment before looking for a bug in the sqlite code (unless the default,
simplistic mutex mechanism in sqlite3OsEnterMutex counts as a bug).
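
One way to run that check is a small stress test, sketched below with plain
pthreads. This is an assumption about how you might test it, not code taken
from sqlite: mutex_check, locked_worker and CHECK_MAX_THREADS are
hypothetical names. Several threads increment an ordinary (non-atomic)
counter while holding a mutex. If the mutex works, no increment is lost.

```c
#include <pthread.h>

#define CHECK_MAX_THREADS 16   /* hypothetical upper bound for this sketch */

static pthread_mutex_t check_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;      /* plain long, protected only by check_mutex */

/* Each worker increments the shared counter while holding the mutex. */
static void *locked_worker(void *p) {
    long n = *(const long *)p;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&check_mutex);
        shared_total++;
        pthread_mutex_unlock(&check_mutex);
    }
    return NULL;
}

/* Returns 0 if every increment was counted, 1 if some were lost,
 * and -1 if the arguments are invalid or a thread could not be started. */
int mutex_check(int nthreads, long iterations) {
    pthread_t tids[CHECK_MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > CHECK_MAX_THREADS || iterations < 0) {
        return -1;
    }
    shared_total = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, locked_worker, &iterations) == 0) {
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    if (started != nthreads) {
        return -1;
    }
    return shared_total == (long)nthreads * iterations ? 0 : 1;
}

#ifdef MUTEX_CHECK_MAIN
/* Optional standalone driver: build with -DMUTEX_CHECK_MAIN -pthread. */
#include <stdio.h>
int main(void) {
    int rc = mutex_check(8, 1000000);
    printf("mutex_check returned %d\n", rc);
    return rc == 0 ? 0 : 1;
}
#endif
```

A return value of 1 on the multi-CPU machine would point at the locking
primitives or the kernel rather than at sqlite. A return value of 0 does not
prove the mutexes are correct, but it makes them a less likely culprit.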

-----Original Message-----
From: Eli Burke [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 12, 2005 6:34 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] multiple thread concurrency problem with exclusive
transaction locks

I hate to beat on a tired horse (threads and db locking issues), but I am
running into what I believe is a bug as we scale up the number of threads in
our application. A little background: there is a main scheduler thread that
does most of the processing, and client threads to handle remote
connections. These clients don't touch the database very often, but on
occasion (in particular when they exit) they request an exclusive lock to
remove themselves from the database.

The problem that I see is that with multiple threads all attempting to
"BEGIN EXCLUSIVE", they will occasionally *all* fail, calling the busy
handler repeatedly until it finally returns SQLITE_BUSY. Let me re-state for
clarity's sake: 10 threads all try "BEGIN EXCLUSIVE" at the same time. One
succeeds, processes, and COMMITs. The other 9 will sometimes repeatedly call
the busy handler over and over until they fail with SQLITE_BUSY, even though
the database *should be* available to start a new exclusive transaction.
