On 05/27/2014 01:49 PM, Török Edwin wrote:
> On 05/27/2014 01:40 PM, Richard Hipp wrote:
>>> SQLite 3.8.4.3 still has the SQLITE_PROTOCOL bug with WAL databases [1],
>>> and I don't see anything about that in the changes list.
>>> Is it still a know bug in 3.8.5?
>>>
>>> I'm confused on whats the status of that bug.
>>> When I reported the rng fork issue to the ML it got fixed really quickly
>>> (thanks for that!), but when I asked [1] about the SQLITE_PROTOCOL
>>> thats been known since 3 years [2] it got ignored.
>>> Tried opening a ticket, and that one got deleted.
>>>
>>
>> Nothing has changed with SQLITE_PROTOCOL since 3.7.6.
>>
>> If you are getting PROTOCOL errors, then the thing to do is to describe the
>> circumstances under which the errors occur so that we can begin to try to
>> understand what is going wrong.
> 
> The circumstances are very similar to those described in this commit (in a 
> different application though):
> https://github.com/NixOS/nix/commit/d05bf044441dbf8e000036d545df9689bdec0b72
> 
> I'll create a testcase that illustrates my problem and get back to you.

Hi,

I made a little progress on the testcase: now I have a machine where 
SQLITE_PROTOCOL triggers at least once if I import large amounts of data 
overnight.
So I can test patches, but unfortunately I wasn't able to create a small 
self-contained testcase yet. I added some debugging code to sqlite3.c to know 
which WAL_RETRYs trigger (it collects them all and prints them when cnt>100):
http://gitweb.skylable.com/gitweb/?p=sx.git;a=blob_plain;f=3rdparty/libsqlite3/sqlite3.c;hb=HEAD

And this is what I got in my logs:

walTryBeginRead: WAL_RETRY exceeded
walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49
000 49000 49000 49000 49000 49000 49000 49000 49000
walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49
000 49000 49000 49000 49000 49000 49000 49000 49000
walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49
000 49000 49000 49000 49000 49000 49000 49000 49000
walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000
statement aborts at 33: [INSERT INTO topush (block, size, node) VALUES (:b, :s, 
:n)] locking protocol

So it looks like it triggers here all the time:
    rc = walLockShared(pWal, WAL_READ_LOCK(mxI));
    if( rc ){
      pWal->retry_locs[cnt] = __LINE__;// added for debugging
      return rc==SQLITE_BUSY ? WAL_RETRY : rc;
    }

Database size:
-rw-r--r-- 1 nobody nogroup  29M Jun  6 07:50 /mnt/spare/sx/xfers.db
-rw-r--r-- 1 nobody nogroup 224K Jun  6 07:50 /mnt/spare/sx/xfers.db-shm
-rw-r--r-- 1 nobody nogroup  25M Jun  6 07:50 /mnt/spare/sx/xfers.db-wal

Database schema:
CREATE TABLE hashfs (key TEXT NOT NULL PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE topush (id INTEGER NOT NULL PRIMARY KEY, block BLOB(20) NOT NULL, 
size INTEGER NOT NULL, node BLOB(16) NOT NULL, sched_time TEXT NOT NULL DEFAULT 
(strftime('%Y-%m-%d %H:%M:%f')), expiry_time TEXT NOT NULL DEFAULT 
(strftime('%Y-%m-%d %H:%M:%f', 'now', '900 seconds')), UNIQUE (block, size, 
node));
CREATE INDEX topush_sched ON topush(sched_time ASC, expiry_time);

WAL mode:
PRAGMA synchronous=NORMAL
PRAGMA journal_mode=WAL

busy timeout: 20s
there is also a WAL hook that does a passive checkpoint at 10000 pages, and a 
restart checkpoint at 20000 pages
(and a passive checkpoint at 5000 pages when idle), however I've been getting 
these sqlite_protocol errors
even when using sqlite's default checkpoints.

On another machine:
[3299] walTryBeginRead: WAL_RETRY exceeded
[3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3350] walTryBeginRead: WAL_RETRY exceeded
[3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3299] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000
[3299] statement aborts at 14: [SELECT blockno FROM blocks WHERE hash = :hash] 
locking protocol
[3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000 49000 49000 49000
[3350] walTryBeginRead WAL_RETRY locs: 49000 49000 49000 49000 49000 49000 
49000 49000 49000 49000 49000 49000 49000
[3350] statement aborts at 14: [SELECT blockno FROM blocks WHERE hash = :hash] 
locking protocol

Database size:
-rw-r--r-- 1 nobody nogroup 13M Jun  5 10:10 /mnt/spare/sx/hm00000008.db
-rw-r--r-- 1 nobody nogroup 96K Jun  6 06:25 /mnt/spare/sx/hm00000008.db-shm
-rw-r--r-- 1 nobody nogroup 11M Jun  6 06:25 /mnt/spare/sx/hm00000008.db-wal

Database schema:
CREATE TABLE hashfs (key TEXT NOT NULL PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE blocks (hash BLOB(20) NOT NULL PRIMARY KEY, blockno INTEGER NOT 
NULL);
CREATE TABLE avail (blocknumber INTEGER NOT NULL PRIMARY KEY ASC);

WAL mode:
PRAGMA synchronous=NORMAL
PRAGMA journal_mode=WAL

same busy timeout/wal hook
AFAICT there were no checkpoint running during those WAL_RETRY errors.

Some info about these two machines:
machine1: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64
machine2: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64

gcc version 4.7.2 (Debian 4.7.2-5)
sqlite 3.8.4.3 amalgamation, compile flags: -DSQLITE_OMIT_LOAD_EXTENSION 
-DSQLITE_THREADSAFE=0  -DHAVE_POSIX_FALLOCATE=1 -DHAVE_USLEEP=1 -g -O2
sqlite3_sourceid(): 2014-04-03 16:53:12 a611fa96c4a848614efe899130359c9f6fb889c3
sqlite3_compileoption_get(): OMIT_LOAD_EXTENSION SYSTEM_MALLOC THREADSAFE=0

The database is on a HDD:
/dev/mapper/vg0-indian01--spare on /mnt/spare type ext4 
(rw,relatime,user_xattr,barrier=1,data=ordered)
vg0 uses /dev/md1 as a physical volume, which is a RAID1 of 2 Seagate 
ST2000NM0033-9ZM175 drives.

Server memory usage now (unfortunately I don't have the exact values from when 
the WAL_RETRY happened):
KiB Mem:  49560152 total, 49065808 used,   494344 free,  9312792 buffers
KiB Swap: 16777212 total,  2888208 used, 13889004 free, 16245816 cached

Best regards,
--Edwin
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to