I am not running on *ix, so I cannot test. However, your mention of multi-cpu machines brings something to mind. In Windows, most synchronization functions rely on some form of interlocked operation. These operations are implemented differently on multi-cpu machines (and hyperthreading counts as multi-cpu in this context) than on single-cpu machines. Namely, in the multi-cpu kernel, each interlocked operation carries a bus LOCK prefix, whereas in the single-cpu kernel, it is prefixed by a NOP. As a result, if multi-threaded code runs on a machine that has multiple cpus but a single-cpu kernel, all hell breaks loose (it can happen, due to a bad installation). Your problem is suspiciously similar.
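To make the point about interlocked operations concrete, here is a rough sketch of the kind of sanity check I have in mind. It is not taken from sqlite; it assumes POSIX threads and a compiler with C11 <stdatomic.h>, and every name in it (check_state, worker, check_sync_primitives, MAX_THREADS) is made up for illustration. Several threads bump one counter under a pthread mutex and another with an atomic add. If either primitive is not really atomic across cpus, updates get lost and the totals come up short.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64  /* arbitrary cap chosen for this sketch */

/* Shared state for one run; all names here are illustrative. */
struct check_state {
    pthread_mutex_t lock;
    long            locked_count;  /* only touched while holding lock */
    atomic_long     atomic_count;  /* updated with an atomic read-modify-write */
    long            iterations;    /* increments performed by each thread */
};

static void *worker(void *arg)
{
    struct check_state *st = arg;

    for (long i = 0; i < st->iterations; i++) {
        /* Path 1: increment protected by a POSIX mutex. */
        pthread_mutex_lock(&st->lock);
        st->locked_count++;
        pthread_mutex_unlock(&st->lock);

        /* Path 2: increment done as a single atomic operation. */
        atomic_fetch_add(&st->atomic_count, 1);
    }
    return NULL;
}

/*
 * Returns 0 if both counters end at nthreads * iterations,
 * -1 on bad arguments, thread creation failure, or a lost update.
 */
int check_sync_primitives(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct check_state st;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0)
        return -1;
    if (pthread_mutex_init(&st.lock, NULL) != 0)
        return -1;

    st.locked_count = 0;
    atomic_init(&st.atomic_count, 0);
    st.iterations = iterations;

    /* Start the workers; stop early if a thread cannot be created. */
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &st) != 0)
            break;
    }
    /* Wait for every thread that actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&st.lock);

    if (started != nthreads)
        return -1;

    long expected = (long)nthreads * iterations;
    if (st.locked_count != expected)
        return -1;
    if (atomic_load(&st.atomic_count) != expected)
        return -1;
    return 0;
}
```

Calling check_sync_primitives(10, 1000000) from a small main() and checking for 0 would be the obvious way to use it. A clean run proves nothing by itself, but a short count on a multi-cpu box would point at the kernel or the thread library rather than at sqlite.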
Here are some things to check for:

1. Is your *ix kernel compiled for multi-cpu? I think that Linux requires a special build of the kernel for multi-cpu machines.

2. Is your sqlite code compiled with SQLITE_UNIX_THREADS enabled (and THREADSAFE enabled)? If not, the default mechanism used for mutexes in os_unix.c will certainly FAIL on multi-cpu machines, as it does not have the atomic LOCK prefixes. If yes, then the functionality of the mutexes (sqlite3OsEnterMutex) depends on how well the POSIX (or whatever) lock mechanism works. From the comments at the beginning of the file, I suspect that locking in Linux is horribly unstable...

So, I would first check that the mutex mechanism works as expected in your environment before looking for a bug in the sqlite code (unless the default, simplistic mutex mechanism in sqlite3OsEnterMutex counts as a bug).

-----Original Message-----
From: Eli Burke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 12, 2005 6:34 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] multiple thread concurrency problem with exclusive transaction locks

I hate to beat on a tired horse (threads and db locking issues), but I am running into what I believe is a bug as we scale up the number of threads in our application.

A little background: there is a main scheduler thread that does most of the processing, and client threads to handle remote connections. These clients don't touch the database very often, but on occasion (in particular when they exit) they request an exclusive lock to remove themselves from the database.

The problem that I see is that with multiple threads all attempting to "BEGIN EXCLUSIVE", they will occasionally *all* fail, calling the busy handler repeatedly until it finally returns SQLITE_BUSY.

Let me re-state for clarity's sake: 10 threads all try "BEGIN EXCLUSIVE" at the same time. One succeeds, processes, and COMMITs.
The other 9 will sometimes call the busy handler over and over until they fail with SQLITE_BUSY, even though the database *should be* available to start a new exclusive transaction.