On 05/27/2014 01:49 PM, Török Edwin wrote: > On 05/27/2014 01:40 PM, Richard Hipp wrote: >>> SQLite 3.8.4.3 still has the SQLITE_PROTOCOL bug with WAL databases [1], >>> and I don't see anything about that in the changes list. >>> Is it still a know bug in 3.8.5? >>> >>> I'm confused on whats the status of that bug. >>> When I reported the rng fork issue to the ML it got fixed really quickly >>> (thanks for that!), but when I asked [1] about the SQLITE_PROTOCOL >>> thats been known since 3 years [2] it got ignored. >>> Tried opening a ticket, and that one got deleted. >>> >> >> Nothing has changed with SQLITE_PROTOCOL since 3.7.6. >> >> If you are getting PROTOCOL errors, then the thing to do is to describe the >> circumstances under which the errors occur so that we can begin to try to >> understand what is going wrong. > > The circumstances are very similar to those described in this commit (in a > different application though): > https://github.com/NixOS/nix/commit/d05bf044441dbf8e000036d545df9689bdec0b72 > > I'll create a testcase that illustrates my problem and get back to you.
Hi, I made a little progress on the testcase: now I have a machine where SQLITE_PROTOCOL triggers at least once if I import large amounts of data overnight. So I can test patches, but unfortunately I wasn't able to create a small self-contained testcase yet. I added some debugging code to sqlite3.c to know which WAL_RETRYs trigger (it collects them all and prints them when cnt>100): http://gitweb.skylable.com/gitweb/?p=sx.git;a=blob_plain;f=3rdparty/libsqlite3/sqlite3.c;hb=HEAD And this is what I got in my logs: walTryBeginRead: WAL_RETRY exceeded walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49 000 49000 49000 49000 49000 49000 49000 49000 49000 walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49 000 49000 49000 49000 49000 49000 49000 49000 49000 walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49 000 49000 49000 49000 49000 49000 49000 49000 49000 walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 statement aborts at 33: [INSERT INTO topush (block, size, node) VALUES (:b, :s, :n)] locking protocol So it looks like it triggers here all the time: rc = walLockShared(pWal, WAL_READ_LOCK(mxI)); if( rc ){ pWal->retry_locs[cnt] = __LINE__;// added for debugging return rc==SQLITE_BUSY ? WAL_RETRY : rc; } Database size: -rw-r--r-- 1 nobody nogroup 29M Jun 6 07:50 /mnt/spare/sx/xfers.db -rw-r--r-- 1 nobody nogroup 224K Jun 6 07:50 /mnt/spare/sx/xfers.db-shm -rw-r--r-- 1 nobody nogroup 25M Jun 6 07:50 /mnt/spare/sx/xfers.db-wal Database schema: CREATE TABLE hashfs (key TEXT NOT NULL PRIMARY KEY, value TEXT NOT NULL); CREATE TABLE topush (id INTEGER NOT NULL PRIMARY KEY, block BLOB(20) NOT NULL, size INTEGER NOT NULL, node BLOB(16) NOT NULL, sched_time TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f')), expiry_time TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now', '900 seconds')), UNIQUE (block, size, node)); CREATE INDEX topush_sched ON topush(sched_time ASC, expiry_time); WAL mode: PRAGMA synchronous=NORMAL PRAGMA journal_mode=WAL busy timeout: 20s there is also a WAL hook that does a passive checkpoint at 10000 pages, and a restart checkpoint at 20000 pages (and a passive checkpoint at 5000 pages when idle), however I've been getting these sqlite_protocol errors even when using sqlite's default checkpoints. On another machine: [3299] walTryBeginRead: WAL_RETRY exceeded [3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3350] walTryBeginRead: WAL_RETRY exceeded [3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3299] statement aborts at 14: [SELECT blockno FROM blocks WHERE hash = :hash] locking protocol [3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 [3350] statement aborts at 14: [SELECT blockno FROM blocks WHERE hash = :hash] locking protocol Database size: -rw-r--r-- 1 nobody nogroup 13M Jun 5 10:10 /mnt/spare/sx/hm00000008.db -rw-r--r-- 1 nobody nogroup 96K Jun 6 06:25 /mnt/spare/sx/hm00000008.db-shm -rw-r--r-- 1 nobody nogroup 11M Jun 6 06:25 /mnt/spare/sx/hm00000008.db-wal Database schema: CREATE TABLE hashfs (key TEXT NOT NULL PRIMARY KEY, value TEXT NOT NULL); CREATE TABLE blocks (hash BLOB(20) NOT NULL PRIMARY KEY, blockno INTEGER NOT NULL); CREATE TABLE avail (blocknumber INTEGER NOT NULL PRIMARY KEY ASC); WAL mode: PRAGMA synchronous=NORMAL PRAGMA journal_mode=WAL same busy timeout/wal hook AFAICT there were no checkpoint running during those WAL_RETRY errors. Some info about these two machines: machine1: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 machine2: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 gcc version 4.7.2 (Debian 4.7.2-5) sqlite 3.8.4.3 amalgamation, compile flags: -DSQLITE_OMIT_LOAD_EXTENSION -DSQLITE_THREADSAFE=0 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_USLEEP=1 -g -O2 sqlite3_sourceid(): 2014-04-03 16:53:12 a611fa96c4a848614efe899130359c9f6fb889c3 sqlite3_compileoption_get(): OMIT_LOAD_EXTENSION SYSTEM_MALLOC THREADSAFE=0 The database is on a HDD: /dev/mapper/vg0-indian01--spare on /mnt/spare type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered) vg0 uses /dev/md1 as a physical volume, which is a RAID1 of 2 Seagate ST2000NM0033-9ZM175 drives. Server memory usage now (unfortunately I don't have the exact values from when the WAL_RETRY happened): KiB Mem: 49560152 total, 49065808 used, 494344 free, 9312792 buffers KiB Swap: 16777212 total, 2888208 used, 13889004 free, 16245816 cached Best regards, --Edwin _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users