On 11/25/2015 05:37 PM, Richard Hipp wrote:
> On 11/25/15, T?r?k Edwin <edwin+sqlite3 at etorok.net> wrote:
>> Hi,
>>
>> A user of our application (CC-ed) reported a corruption in an SQLite DB on
>> FreeBSD (see below). There was no mmap used in this case, just a WAL DB with
>> PRAGMA SYNCHRONOUS=NORMAL.
>> From my reading of the SQLite docs this should not result in a corrupt DB,
>> worst case some COMMITs could be lost/missing.
>>
> 
> The database file "temp.db" and the WAL file "temp.db-wal" appear
> unrelated to one another.  It as if they came from different places,
> or from different points in time.
> 
> The database file contains 272 pages, but only the first 8 pages are
> used for content.  The final 264 pages of the file are all "freelist"
> pages, ready to be reused.
> 
> The WAL file contains 15 separate transactions.  But every single one
> of those transactions says that the database is only 24 pages long.
> Furthermore, 12 of the 15 transactions (including the first two and
> the last one) only touch pages that are on the freelist of the
> database, which is not really possible.

Hmm after the WAL is recovered the DB is indeed 24 pages long:
-rw-r--r-- 1 edwin edwin 24K Nov 25 17:47 temp.db

Here are the original files on the original server:
# stat /usr/local/sxserver/lib/sxserver/storage
79 244829 drwxr-x--- 2 nobody nobody 486168 7168 "Jul 23 17:24:08 2015" "Nov 25 
12:57:30 2015" "Nov 25 12:57:30 2015" "Jul 23 17:24:08 2015" 32768 16 0 
/usr/local/sxserver/lib/sxserver/storage

# stat /usr/local/sxserver/lib/sxserver/storage/temp.db
79 244943 -rw-r--r-- 1 nobody nobody 526624 278528 "Nov 25 13:57:34 2015" "Oct 
13 15:30:47 2015" "Oct 13 15:30:47 2015" "Jul 23 17:24:09 2015" 32768 544 0 
/usr/local/sxserver/lib/sxserver/storage/temp.db

# stat /usr/local/sxserver/lib/sxserver/storage/temp.db-wal
79 241313 -rw-r--r-- 1 nobody nobody 495152 36712 "Nov 25 13:57:34 2015" "Nov 
25 13:51:49 2015" "Nov 25 13:51:49 2015" "Nov 25 12:57:29 2015" 32768 72 0 
/usr/local/sxserver/lib/sxserver/storage/temp.db-wal

It looks like the timestamps are ordered from oldest to newest: temp.db, 
<directory>, temp.db-wal.

We do use SQLITE_CHECKPOINT_PASSIVE and SQLITE_CHECKPOINT_RESTART when certain 
thresholds or timer trigger, what happens if during a passive checkpoint it 
doesn't sync the DB?
"Checkpoint as many frames as possible without waiting for any database readers 
or writers to finish, then sync the database file if all frames in the log were 
checkpointed."
Would it still update the size of the .db file without syncing the parent dir?

> 
> I don't know how this could come about.  It's like you took a random
> database file and a random WAL file and paired them together by giving
> them similar names.
> 
> Do you have any additional information about how this problem might have 
> arisen?

Power was lost, we're doing some tests to reproduce the problem.

The I/O path is more complicated than I originally thought though:
FreeBSD server (running our app with SQLite) with UFS+SU -> Linux (Xen host) -> 
FreeBSD (ISCSI + ZFS)

Thanks for the quick response,
-- 
Edwin T?r?k | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com

Reply via email to