On 2 July 2018 at 23:32, Charles Samuels <char...@derkarl.org> wrote:
> I have found that when my process has a lot of threads each of which opens > a > DIFFERENT database, they each block on eachother while opening each > database. > > This is at least on conflict with the documentation, as the documentation > suggests that a thread is a process for the purposes of sqlite's > concurrency, Actually I believe the mutex acquisition here is to provide those semantics, because unix file locks are necessarily bound to a process and not a thread. So sqlite must manage some of the per-thread details itself to correctly handle the case where multiple threads are using the same DB, possibly with different locking levels, and the mutex protects those shared structures. Intuitively it seems like a mutex held on the sqlite3_file structure rather than a global mutex could be used for this purpose which would prevent unrelated databases from contesting each other, however... > I also feel like it's a bug in > general, as opening two unrelated databases should not block eachother. > ... unix is something of a complex beast and the question of whether two databases are unrelated is not as simple as it may seem. You might have two databases /tmp/db and /var/tmp/db, but what if /var/tmp is a symlink to /tmp? Or what if the two db files are hardlinks of the same inode? Identifying multiple opens of the same inode is the other problem the mutex is helping to solve here. And it's a detail sqlite _must_ be aware of because of the bizarre POSIX locking semantics where closing a file descriptor relinquishes all locks associated with its inode, across the entire process. So if sqlite isn't careful about this, corruption can easily result. I'm not saying there isn't room for improvement here, but it's not a simple problem! Each thread blocks at this point: > > #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86 > _64/lowlevellock.S:135 > #1 0x00007ffff74e4b95 in __GI___pthread_mutex_lock (mutex=0x7ffff7dd8148 > <staticMutexes.10049+360>) > at ../nptl/pthread_mutex_lock.c:80 > #2 0x00007ffff7b26419 in unixEnterMutex () at sqlite3.c:31952 > #3 unixLock (id=0x7ffff4a25180, eFileLock=1) at sqlite3.c:32894 > #4 0x00007ffff7b1f2fa in sqlite3OsLock (lockType=1, id=<optimized out>) > at > sqlite3.c:21299 > #5 pagerLockDb (pPager=pPager@entry=0x7ffff4a25008, eLock=eLock@entry=1) > at > sqlite3.c:50293 > #6 0x00007ffff7b1f34b in pagerLockDb (eLock=1, pPager=0x7ffff4a25008) at > sqlite3.c:53054 > #7 pager_wait_on_lock (pPager=pPager@entry=0x7ffff4a25008, > locktype=locktype@entry=1) at sqlite3.c:53051 > #8 0x00007ffff7b55d6c in sqlite3PagerSharedLock (pPager=0x7ffff4a25008) > at > sqlite3.c:54293 > #9 0x00007ffff7b56835 in lockBtree (pBt=0x7ffff4a151e8) at sqlite3.c:64591 > #10 sqlite3BtreeBeginTrans (p=0x7ffff4a1b508, wrflag=wrflag@entry=0) at > sqlite3.c:64956 > #11 0x00007ffff7b82997 in sqlite3InitOne (db=0x7ffff4a12008, iDb=iDb@entry=0, > > pzErrMsg=pzErrMsg@entry=0x7ffff51fe778) at sqlite3.c:119558 > #12 0x00007ffff7b82aca in sqlite3Init (db=0x7ffff4a12008, > pzErrMsg=pzErrMsg@entry=0x7ffff51fe778) > at sqlite3.c:119740 > #13 0x00007ffff7b82b00 in sqlite3ReadSchema > (pParse=pParse@entry=0x7ffff51fe770) at sqlite3.c:119765 > #14 0x00007ffff7b8d8e4 in sqlite3Pragma (pParse=0x7ffff51fe770, > pId1=pId1@entry=0x7ffff51fddd0, > pId2=pId2@entry=0x7ffff51fdde8, pValue=pValue@entry=0x7ffff51fde18, > minusFlag=minusFlag@entry=0) > at sqlite3.c:117300 > Hm, this backtrace doesn't seem to have anything to do with WAL mode, so I'm not sure why using the rollback journal makes for a 5 time speedup. Perhaps because the same mutex is used to protect shared-memory accesses? sqlite3OSLock really doesn't do much with the mutex held; is there another thread holding things up via a different codepath? -Rowan _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users