Thanks Quan Yhong and Stephen. Today I did more experiment and finally narrowed
down the cause (but not the root cause) of the problem.
Under the following situation the in-memory source database and the NFS disk
destination database will meet deadlock across 2 threads:
* Thread 1 (T1) creates an in-memory database with a dir-path-like database
name, such as “/NFS/some_dir/database” (/NFS/some_dir/ exists)
* std::string m_tmpFileName =
"file:/NFS/some_dir/database?mode=memory&&cache=shared”;
* int rv = sqlite3_open_v2(m_tmpFileName.c_str(), &m_sqlObj,
SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_SHAREDCACHE |
SQLITE_OPEN_URI, NULL);
* Thread 2 (T2) creates a NFS disk database with exactly the same
dir-path-like database name - “/NFS/some_dir/database”
* std::string m_fileNameOnNFS = "/NFS/some_dir/database”;
* int rc = sqlite3_open(m_fileNameOnNFS.c_str(), &m_sqlBackupDestObj);
This is the event sequence in my scenario when deadlock is reproduced:
1. T1 is writing to the in-memory db. Meanwhile, T2 is trying to back up the
in-memory db to NFS db but fails due to in-memory db is busy (sqlite3_back_step
returns SQLITE_BUSY).
2. T1 finishes the writing.
3. T2 retries the back-up. It acquires the in-memory db’s mutex and Btree’s
mutex. Then acquires the NFS db’s mutex. But finally fails to acquire the NFS
db’s Btree’s lock somehow (probably due to the 2 databases share the same name)
4. T1 tries to write to in-memory db again. However, it fails to acquire the
mutex since T2 has acquired it in step #3. Now, both threads cannot go ahead.
It looks sqlite does not handle the locking well for this case where both
databases use the same dir-path-like database name. I have tried that if the
name does not include any slashes (“/“) there would be no problem.
Thanks
From: Quan Yong Zhai <[email protected]<mailto:[email protected]>>
Date: Tuesday, July 19, 2016 at 23:56
To: Yihong Zhan <[email protected]<mailto:[email protected]>>,
"[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: RE: [sqlite] In what case will the online backup api hang for
acquiring the Btree lock of the destination on-disk file?
https://www.sqlite.org/faq.html#q5<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.sqlite.org_faq.html-23q5&d=CwMFaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=hWCczoV-_c3EeFxFPpde-lvcji311JR9aqXM5bT-Au8&m=Zz6_m7HMAnSuoPYhyhHzjB8USuFBhD8fjBYZtz7eNDc&s=vCz3acNtAD1M-d-CtQ3o-vSyJEdCE3Jv9W2uhhEu6ds&e=>
“But use caution: this locking mechanism might not work correctly if the
database file is kept on an NFS filesystem. This is because fcntl() file
locking is broken on many NFS implementations. You should avoid putting SQLite
database files on NFS if multiple processes might try to access the file at the
same time.”
Sent from
Mail<https://urldefense.proofpoint.com/v2/url?u=https-3A__go.microsoft.com_fwlink_-3FLinkId-3D550986&d=CwMFaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=hWCczoV-_c3EeFxFPpde-lvcji311JR9aqXM5bT-Au8&m=Zz6_m7HMAnSuoPYhyhHzjB8USuFBhD8fjBYZtz7eNDc&s=npAE2j3TT1TbjUwOqt4dkwBdBs_usUwMILcjsYd1nCk&e=>
for Windows 10
From: Yihong Zhan<mailto:[email protected]>
Sent: 2016年7月19日 19:44
To:
[email protected]<mailto:[email protected]>
Subject: [sqlite] In what case will the online backup api hang for acquiring
the Btree lock of the destination on-disk file?
Hi sqlite experts,
I am currently incorporating the sqlite online backup API (version 3.8.0.2)
into our product. I meet a tricky problem, where the sqlite3_backup_step hangs
at acquiring the BTree’s lock of the destination file.
Specifically, I am backing up in-memory database to NFS disk file. The
in-memory database is manipulated in the main thread (T1) while the back-up
runs in another thread (T2). There are only 2 threads in the process and there
is no other process existing in our product environment.
The stack trace I got in the T2 when the hanging happens is as following:
#0 0x0000003de740d654 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003de7408f4a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003de7408e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaacc20f6 in pthreadMutexEnter (p=0x23084300) at sqlite3.c:18299
#4 0x00002aaaaacc1f31 in sqlite3_mutex_enter (p=0x23084300) at sqlite3.c:17812
#5 0x00002aaaaacd9e4f in lockBtreeMutex (p=0x2aaad8001040) at sqlite3.c:49323
#6 0x00002aaaaacd9f49 in sqlite3BtreeEnter (p=0x2aaad8001040) at
sqlite3.c:49410
#7 0x00002aaaaacdd970 in sqlite3BtreeBeginTrans (p=0x2aaad8001040, wrflag=2)
at sqlite3.c:52240
#8 0x00002aaaaace71b7 in sqlite3_backup_step (p=0x2aaad8000fc0, nPage=50) at
sqlite3.c:58451
#9 0x00002aaace885b37 in ResDB::backup (this=0x2314c3f0, numBlocks=10,
numPagesPerBlock=50, blockIntervalInMillisec=50) at dbEngine.cpp:659
#10 0x00002aaace977373 in ResDBBackupManager::process (this=0x23030880) at
dbBackupManager.cpp:53
#11 0x00002aaace976e3a in DBBackupEventHandler::event (this=0x23040870,
event=0x2aaad8000990) at dbBackupEventHandler.cpp:35
The sqlite3_backup_step hangs at the very first call and it cannot go any
further so that nothing is actually back-uped and the file on-disk is still
empty, which disappoints me very much.
-rw-rw-r-- 1 yhzhan cadence3 0 2016-07-19 06:59 Interactive.3.rdb
Take a look at the code of sqlite3_backup_step. It looks there are 2 locks for
both source and destination database. One is for the database itself and the
other is for the tables (Btree). The flow hangs at sqlite3BtreeBeginTrans
SQLITE_API int sqlite3_backup_step(sqlite3_backup *p, int nPage){
sqlite3_mutex_enter(p->pSrcDb->mutex);
sqlite3BtreeEnter(p->pSrc);
if( p->pDestDb ){
sqlite3_mutex_enter(p->pDestDb->mutex);
}
……
/* Lock the destination database, if it is not locked already. */
if( SQLITE_OK==rc && p->bDestLocked==0
&& SQLITE_OK==(rc = sqlite3BtreeBeginTrans(p->pDest, 2))
){
p->bDestLocked = 1;
sqlite3BtreeGetMeta(p->pDest, BTREE_SCHEMA_VERSION, &p->iDestSchema);
}
Also tried the latest version of sqlite 3.13.0.0, the problem remains.
My back-upcode is quite simple, almost the duplicate of online backup api
example
https://www.sqlite.org/backup.html<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.sqlite.org_backup.html&d=CwMFaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=hWCczoV-_c3EeFxFPpde-lvcji311JR9aqXM5bT-Au8&m=Zz6_m7HMAnSuoPYhyhHzjB8USuFBhD8fjBYZtz7eNDc&s=oIQVENv_MuC6W604iN4cAgM-Wb6P0fl0ctt76smyr2Q&e=>
bool ResDB::backup(int numBlocks, int numPagesPerBlock, int
blockIntervalInMillisec)
{
if (!m_sqlBackupDestObj) {
int rc = sqlite3_open(m_fileNameOnNFS.c_str(), &m_sqlBackupDestObj);
Q_ASSERT(rc == SQLITE_OK);
}
if (!m_backupHandle) {
m_backupHandle = sqlite3_backup_init(m_sqlBackupDestObj, "main",
m_sqlObj, "main");
Q_ASSERT(m_backupHandle);
}
bool finished = false;
int i = 0;
while (!finished && i < numBlocks) {
while (true) {
int rc = sqlite3_backup_step(m_backupHandle, numPagesPerBlock);
if (rc == SQLITE_OK) {
sqlite3_sleep(blockIntervalInMillisec);
++i;
break;
} else if (rc == SQLITE_DONE) {
finished = true;
break;
} else if (rc == SQLITE_LOCKED) {
sqlite3_sleep(blockIntervalInMillisec);
continue;
} else if (rc == SQLITE_BUSY) {
sqlite3_sleep(blockIntervalInMillisec);
continue;
} else {
finished = true;
Q_ASSERT(false);
break;
}
}
}
if (finished) {
sqlite3_backup_finish(m_backupHandle);
m_backupHandle = NULL;
sqlite3_close(m_sqlBackupDestObj);
m_sqlBackupDestObj = NULL;
}
return finished;
}
So I am wondering what factor will result in lockBtreeMutex hang? I am not sure
whether there is any misuse of the online backup api. Currently I am clueless.
Do you have any suggestions?
_______________________________________________
sqlite-users mailing list
[email protected]<mailto:[email protected]>
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users<https://urldefense.proofpoint.com/v2/url?u=http-3A__mailinglists.sqlite.org_cgi-2Dbin_mailman_listinfo_sqlite-2Dusers&d=CwMFaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=hWCczoV-_c3EeFxFPpde-lvcji311JR9aqXM5bT-Au8&m=Zz6_m7HMAnSuoPYhyhHzjB8USuFBhD8fjBYZtz7eNDc&s=bJMXiHJ4JeSCgTOnv_U6NQmt27qPCSnAdNMGIMKrPt4&e=>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users