Re: [sqlite] Btree page corruption

2017-12-28 Thread Simon Slavin


On 29 Dec 2017, at 4:10am, Rowan Worth  wrote:

> do any of your processes open the database file, for any
> reason, without going through sqlite's API?

Just to note that a major offender in this respect is anti-virus software.  So 
don’t think just of things that might want to read a SQLite database, but think 
of anything that might read all files, or changed files.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-28 Thread Rowan Worth
On 28 December 2017 at 02:55, Simon Slavin  wrote:

> On 27 Dec 2017, at 6:10pm, Nikhil Deshpande  wrote:
>
> >> Can you include a "pragma integrity_check" at startup ?
> >> Can you include a "pragma integrity_check" executed at regular
> intervals ?
> > The writer process does "pragma quick_check" on every startup at init,
> > bails out on failure and spawns a separate thread to do same
> > "pragma quick_check" every 5 minutes (opens it's own separate DB handle
> > and closes it). Would changing quick_check to integrity_check be
> > helpful? (Would integrity_check catch such corruption earlier than
> > quick_check? Would it hold longer exclusive locks on the DB file
> > that could prevent reads?)
>
> Changing "quick_check" to "integrity_check" may help in this case may be
> useful in investigating your problem.  "integrity_check" is far more
> thorough and checks every aspect of data integrity from both directions.
> However, the type of corruption you’re experiencing may be one which is
> spotted just as well by "quick_check".  There’s no way to know without
> checking it.
>

I haven't checked the code but the docs say:

https://sqlite.org/pragma.html#pragma_quick_check

> The pragma is like integrity_check
 except that it does
not verify UNIQUE constraints and does
> not verify that index content matches table content. By skipping UNIQUE
and index consistency checks,
> quick_check is able to run much faster than integrity_check. Otherwise
the two pragmas are the same.

Based on my personal experience/testing, quick_check still reads the entire
DB. The only type of corruption I've ever seen integrity_check report which
quick_check didn't was an index lacking an entry for a particular ROWID (in
an otherwise well-formed DB). Coupled with Nikhil's tests I'm certain
quick_check will catch this type of corruption.

So the only scenario left to worry about is an index corruption (ie. one
only detected by integrity_check) happening first, which somehow leads to a
DB page corruption AND masks the original index corruption. Doesn't seem
likely, but switching to integrity_check would rule out any chance.

It does take longer than quick_check, but how much longer will depend on
how big your indices are. On a 2GB DB I've measured 200 seconds for
quick_check vs 300 seconds for integrity_check, but that was over NFS.

I don't think either check takes an EXCLUSIVE lock on the DB? But a
competing writer might, which ends up locking out readers until the check
finishes (and the write completes).


I agree with Simon you're not doing anything obviously wrong... This is
listed in the how to corrupt page, but I'll ask anyway because it's a
subtle one: do any of your processes open the database file, for any
reason, without going through sqlite's API?

-Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-27 Thread Simon Slavin
On 27 Dec 2017, at 6:55pm, Simon Slavin  wrote:

> An alternative might be to run "integrity_check" on backup copies which don’t 
> show up anything on "quick_check".  This could be done without blocking the 
> production system.  If you never find anything then you know "quick_check" is 
> all you need. But it might spot corruption in some databases you don’t think 
> have been corrupted yet.

Just to be clear, SQLite database files can be corrupted in ways that don’t 
show up until a particular row of a particular index is used.  If a row pointer 
of a particular index becomes corrupt, and something tries to UPDATE that row, 
SQLite will corrupt the file even more, causing corruption to spread to the 
table, and to other indexes of that table.

This is why using a database without problems for months, then suddenly having 
a problem, does not mean that the database only just became corrupt.

The only way to check everything (at least, the developers hope it’s 
everything) is "PRAGMA integrity_check".

Note: the above is simplified for the sake of simplicity.  SQLite does a 
certain amount of internal checking.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-27 Thread Nikhil Deshpande

Thanks Richard for the response!

On 12/21/17 5:07 PM, Richard Hipp wrote:

On 12/21/17, Nikhil Deshpande  wrote:


There were no power-off or reboots in near time vicinity when the
corruption was detected.


(1) Might the corruption have been sitting dormant due to some far
away power-off or reboot and was only recently discovered?  How much
do you trust the fsync() system call and the filesystems ability to
honor that system call on the VM on which this code is running?  Do
you run "PRAGMA quick_check" on initial power-up?


The previous reboot was almost 4 days before the corruption was
detected by the reader process. Trawling through the logs for those
4 days showed no anomalies or different behavior (the writer was
functioning and inserting data as expected, writer did not detect
the corruption that the reader process ran into, our guess is that
the writer was not touching the corrupt btree pages, writing to some
other part of the DB file).


(2) Is this the only corruption instance you have seen?

We have seen such 4 instances on 4 different VMs in path 4 months.
The corruption symptom is same, btree page header lacking the correct
bits set.


(3) The showdb outputs you sent only shows me the first 8 bytes of the
page.  From those 8 bytes, it looks to be an overflow page that is
being used as a btree page.  Are there any further clues in the hex
dump of the bad pages?


Attaching full dump of the corrupt pages 46 and 3818.
This page 46 seems to contain reasonable strings (written by the app,
metadata about the timeseries tables), not garbage/arbitrary data.
Page 3818 (belongs to a timeseries table) seems to contain zeros
except at the head 16 bytes and tail end 16 bytes of the page.


(4) What does "showdb pgidx" look like?

Attaching "showdb pgidx" output, command runs fine (exiting with 0),
showing these 2 pages as corrupt:
   46: corrupt node [sm_stats_metadata], child 2 of page 2
 3818: root corrupt node [sm_stat_disk_latency_rate_dm-9]


(5) How many threads are accessing the database?  Are you sure that
you have the database set in a thread-safe mode?

There are total 3 threads in the writer process accessing the DB,
two threads do periodic "INSERT OR REPLACE (timestamp, val)" into
bunch of tables. The third thread runs "pragma quick_check"
periodically.

Logging sqlite3_threadsafe() output shows value 1 (serialized mode
build time setting) and sqlite3_db_mutex() on a db handle shows
a non-NULL value. The DB file is opened using sqlite3_open_v2()
with SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE flags, it does
not explicitly pass SQLITE_OPEN_FULLMUTEX flag (relies on the compile
time threading mode to be inherited, which is serialized threading
mode build time setting).

Thanks,
 Nikhil
Pagesize: 4096
Available pages: 1..16865
Page 46:   (offsets 0x2d000..0x2dfff)
 : 00 00 00 00 0b 01 25 00 0e a4 0d 43 0b dd 0a b0 ..%C
 0010: 09 7f 08 13 06 bc 05 73 04 00 02 86 01 25 00 00 ...s.%..
 0020: 00 65 00 65 00 65 00 00 00 00 00 00 00 00 00 00 .e.e.e..
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 82 71 81 4b .q.K
 0040: 05 59 69 84 3d 73 74 6f 72 61 67 65 2e 74 6f 74 .Yi.=storage.tot
 0050: 61 6c 73 69 7a 65 2e 64 69 72 65 63 74 6f 72 79 alsize.directory
 0060: 2e 76 63 64 62 5f 74 61 73 6b 73 73 6d 5f 73 74 .vcdb_taskssm_st
 0070: 61 74 5f 73 74 6f 72 61 67 65 5f 74 6f 74 61 6c at_storage_total
 0080: 73 69 7a 65 5f 64 69 72 65 63 74 6f 72 79 5f 76 size_directory_v
 0090: 63 64 62 5f 74 61 73 6b 73 63 61 74 65 67 6f 72 cdb_taskscategor
 00a0: 79 3d 63 6f 6d 2e 76 6d 77 61 72 65 2e 61 70 70 y=com.vmware.app
 00b0: 6c 6d 67 6d 74 2e 6d 6f 6e 2e 63 61 74 2e 73 74 lmgmt.mon.cat.st
 00c0: 6f 72 61 67 65 3b 75 6e 69 74 73 3d 63 6f 6d 2e orage;units=com.
 00d0: 76 6d 77 61 72 65 2e 61 70 70 6c 6d 67 6d 74 2e vmware.applmgmt.
 00e0: 6d 6f 6e 2e 75 6e 69 74 2e 6b 62 3b 69 6e 73 74 mon.unit.kb;inst
 00f0: 61 6e 63 65 3d 2f 73 74 6f 72 61 67 65 2f 73 65 ance=/storage/se
 0100: 61 74 2f 76 70 6f 73 74 67 72 65 73 2f 74 61 73 at/vpostgres/tas
 0110: 6b 74 62 6c 73 70 3b 82 76 82 1a 05 5b 6b 84 43 ktblsp;.v...[k.C
 0120: 73 74 6f 72 61 82 5d 82 68 05 55 65 84 1d 73 74 stora.].h.Ue..st
 0130: 6f 72 61 67 65 2e 75 73 65 64 2e 66 69 6c 65 73 orage.used.files
 0140: 79 73 74 65 6d 2e 69 6d 61 67 65 62 75 69 6c 64 ystem.imagebuild
 0150: 65 72 73 6d 5f 73 74 61 74 5f 73 74 6f 72 61 67 ersm_stat_storag
 0160: 65 5f 75 73 65 64 5f 66 69 6c 65 73 79 73 74 65 e_used_filesyste
 0170: 6d 5f 69 6d 61 67 65 62 75 69 6c 64 65 72 63 61 m_imagebuilderca
 0180: 74 65 67 6f 72 79 3d 63 6f 6d 2e 76 6d 77 61 72 tegory=com.vmwar
 0190: 65 2e 61 70 70 6c 6d 67 6d 74 2e 6d 6f 6e 2e 63 e.applmgmt.mon.c
 01a0: 61 74 2e 73 74 6f 72 61 67 65 3b 75 6e 69 74 73 at.storage;units
 01b0: 3d 63 6f 6d 2e 76 6d 77 61 72 65 2e 61 70 70 6c =com.vmware.appl
 01c0: 6d 67 6d 74 2e 6d 6f 6e 2e 75 6e 69 74 2e 6b 62 mgmt.mon.unit.kb
 01d0: 3b 69 6e 73 74 61 6e 63 65 3d 2f 73 74 6f 72 61 

Re: [sqlite] Btree page corruption

2017-12-27 Thread Simon Slavin


On 27 Dec 2017, at 6:10pm, Nikhil Deshpande  wrote:

>> Can you include a "pragma integrity_check" at startup ?
>> Can you include a "pragma integrity_check" executed at regular intervals ?
> The writer process does "pragma quick_check" on every startup at init,
> bails out on failure and spawns a separate thread to do same
> "pragma quick_check" every 5 minutes (opens it's own separate DB handle
> and closes it). Would changing quick_check to integrity_check be
> helpful? (Would integrity_check catch such corruption earlier than
> quick_check? Would it hold longer exclusive locks on the DB file
> that could prevent reads?)

Thanks Nikhil.

Had I noticed your email address I wouldn’t have asked picky questions about 
abnormal termination and VM management.

Your responses indicate that you are doing everything properly.  I don’t see 
anything there that makes me think that you’re missing a cause of corruption.  
However people who know more than I do about SQLite might spot something.

Changing "quick_check" to "integrity_check" may help in this case may be useful 
in investigating your problem.  "integrity_check" is far more thorough and 
checks every aspect of data integrity from both directions.  However, the type 
of corruption you’re experiencing may be one which is spotted just as well by 
"quick_check".  There’s no way to know without checking it.

But it does suffer from the problem you noted (a 65 Meg database would be 
locked for longer) and this might be unacceptable in your environment.  Or you 
might be able to do it just once an hour instead of every 5 minutes.

An alternative might be to run "integrity_check" on backup copies which don’t 
show up anything on "quick_check".  This could be done without blocking the 
production system.  If you never find anything then you know "quick_check" is 
all you need.  But it might spot corruption in some databases you don’t think 
have been corrupted yet.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-27 Thread Nikhil Deshpande

On 12/21/17 9:45 PM, Rowan Worth wrote:

Does either process take backups of the DB? If so, how is that implemented?


Thanks Rowan for the response!
Backup is done by a separate process through command:
   sqlite3 /path/to/db_file .dump > dump.sql
and not using the sqlite3 backup API.

Thanks,
 Nikhil
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-27 Thread Nikhil Deshpande

Thanks Simon for the response!

On 12/21/17 5:05 PM, Simon Slavin wrote:

When "pragma integrity_check" detects an error, does "PRAGMA quick_check" do 
too ?

Yes, both pragmas return same error.


Is your database file really about 65 Meg in size ?  Just roughly.

Yes:
$ du -sh *
66M appliance_stats.sqlite
32K appliance_stats.sqlite-shm
4.2Mappliance_stats.sqlite-wal

num_pages X page_size seem to match file system metadata, fsck has not
revealed any inconsistencies.


How is your application terminated ?  User action or self-determination ?

The writer process receives a SIGTERM and in that handler signals a
condition that is waited by all other threads (when not doing any work).
If the threads are doing any work, they finish it and then wait on the
condition. Threads are not abnormally terminated.
The writer process runs till the VM is shutdown.


Does your application call sqlite3_close() ?  Does it cbeck the code returned ?

Does your application call sqlite3_shutdown() ?  Does it cbeck the code 
returned ?

Yes, the return codes are logged and used as process exit code, our logs
do not show non-zero error codes/logs.


How does your VM get shutdown ?  Does it correctly shut down your application 
before it quits ?

VM shutdown is normal shutdown with systemd shutting down a simple
watchdog that forwards shutdown to the SQLite DB writer process.


Can you include a "pragma integrity_check" at startup ?
Can you include a "pragma integrity_check" executed at regular intervals ?

The writer process does "pragma quick_check" on every startup at init,
bails out on failure and spawns a separate thread to do same
"pragma quick_check" every 5 minutes (opens it's own separate DB handle
and closes it). Would changing quick_check to integrity_check be
helpful? (Would integrity_check catch such corruption earlier than
quick_check? Would it hold longer exclusive locks on the DB file
that could prevent reads?)


Thanks,
 Nikhil
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-21 Thread Rowan Worth
Does either process take backups of the DB? If so, how is that implemented?
-Rowan

On 22 December 2017 at 05:47, Nikhil Deshpande 
wrote:

> Hi,
>
> We have an application that in a Linux VM that's running into
> SQLite DB corruption (after weeks and months of running,
> 4 such instances yet in different VMs).
>
> We would appreciate some help in debugging this further to identify
> source of corruption!
>
> Symptom is btree page corruption, e.g.
>
> > $ sqlite3 stats.sqlite "pragma integrity_check;"
> > *** in database main ***
> > Page 3818: btreeInitPage() returns error code 11
> > Page 46: btreeInitPage() returns error code 11
> > Error: database disk image is malformed
> (Same error is raised for SELECT queries too.)
>
> There were no power-off or reboots in near time vicinity when the
> corruption was detected. We have poured over this document
> https://sqlite.org/howtocorrupt.html
> many times to check if any of the conditions could apply,
> but so far no leads yet.
>
> We have also been unable to reproduce the corruption by stressing
> application's SQLite DB read/write code paths for a week.
>
> I'm attaching showdb output for the DB header and 2 corrupt pages
> if it's of any hint.
>
> ---
>
> A bit more application setup context/information:
>
> - Linux kernel 4.4.41
> - glibc 2.22
> - Ext4 file system, mounted as (rw,relatime,data=ordered).
>
> - Writer C++ process: sqlite-3.17
>   - Creates a set of "time series" tables, each table has 2 numeric
> columns (timestamp, int) during initialization.
>   - Every 1 minute, 2 threads will do total 15 writes. (using "INSERT OR
> REPLACE ... (timestamp, int)" SQL into 15 tables).
>   - SQLite DB opened with SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
> flags, initialized with "PRAGMA journal_mode=wal;", threading mode
> is Serialized for the libsqlite build, uses default VFS ("unix").
> All other config params are default (e.g. autovacuum is disabled
> etc.).
>   - A separate thread runs "PRAGMA quick_check;" periodically every 5
> minutes, in its own separate DB connection.
> - Reader process: sqlite-3.11 + Python 2.7.11
>   - Periodically reads time series tables for a given timestamp range
> (usually latest 5 minutes) using SELECT queries (no INSERT/UPDATE/
> DELETE from this process).
>   - Uses same same "PRAGMA journal_mode=wal", uses the sqlite3 DBAPI
> module from Python standard library.
> Apart from above 2, no other processes are accessing the SQLite DB file.
>
> We have updated both the reader and writer to use latest SQLite 3.21,
> but without understanding the cause of corruption, we are unable to
> say if this update to latest 3.21 would indeed prevent further
> occurrences.
>
> Thanks,
>  Nikhil
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-21 Thread Richard Hipp
On 12/21/17, Nikhil Deshpande  wrote:
>
> There were no power-off or reboots in near time vicinity when the
> corruption was detected.

(1) Might the corruption have been sitting dormant due to some far
away power-off or reboot and was only recently discovered?  How much
do you trust the fsync() system call and the filesystems ability to
honor that system call on the VM on which this code is running?  Do
you run "PRAGMA quick_check" on initial power-up?

(2) Is this the only corruption instance you have seen?

(3) The showdb outputs you sent only shows me the first 8 bytes of the
page.  From those 8 bytes, it looks to be an overflow page that is
being used as a btree page.  Are there any further clues in the hex
dump of the bad pages?

(4) What does "showdb pgidx" look like?

(5) How many threads are accessing the database?  Are you sure that
you have the database set in a thread-safe mode?


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Btree page corruption

2017-12-21 Thread Simon Slavin
On 21 Dec 2017, at 9:47pm, Nikhil Deshpande  wrote:

> We have an application that in a Linux VM that's running into
> SQLite DB corruption (after weeks and months of running,
> 4 such instances yet in different VMs).
> 
> [snip]
> 
> There were no power-off or reboots in near time vicinity when the
> corruption was detected. We have poured over this document
> https://sqlite.org/howtocorrupt.html
> many times to check if any of the conditions could apply,
> but so far no leads yet.
> 
> We have also been unable to reproduce the corruption by stressing
> application's SQLite DB read/write code paths for a week.

Well done for finding the appropriate web page.  You’ve done all the easy stuff 
which leaves us with just the hard questions.  These won’t necessarily give us 
an answer, but they might remind one of your readers of something.

When "pragma integrity_check" detects an error, does "PRAGMA quick_check" do 
too ?

Is your database file really about 65 Meg in size ?  Just roughly.

How is your application terminated ?  User action or self-determination ?

Does your application call sqlite3_close() ?  Does it cbeck the code returned ?

Does your application call sqlite3_shutdown() ?  Does it cbeck the code 
returned ?

How does your VM get shutdown ?  Does it correctly shut down your application 
before it quits ?

Can you include a "pragma integrity_check" at startup ?
Can you include a "pragma integrity_check" executed at regular intervals ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users