Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

R Smith Tue, 15 Nov 2016 15:12:01 -0800


On 2016/11/15 10:34 PM, Jens Alfke wrote:

On Nov 15, 2016, at 10:57 AM, Simon Slavin <slav...@bigfraud.org> wrote:

sqlite> PRAGMA checkpoint_fullfsync;
1

I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value 
is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. 
(Contradicting the docs.)

So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, 
but not at other times that it fsyncs. What does that mean in actual use, 
assuming that I always use WAL mode? Is there still an opportunity for 
corruption in the face of power failures?

(Sorry to be frothing at the mouth about this; but my team’s dealing with a few 
users/customers whose apps encounter db corruption, on Android as well as 
macOS, and we’re getting really frustrated trying to figure out what’s going 
on.)

Quite OK to be unsettled by learning that a flaw in the system that you assumed did not exist might be the cause of your troubles. I think, however, that something is missing from the complete understanding, so to be clear:

Calling F_FULLFSYNC, when checkpointing or otherwise, invokes a contract between the running software (your system) and the Operating System whereby the Operating System promises to A - put the current buffer's worth of written data INTO the BUS feeding the writable media, AND B - then ask said media to confirm the writing has happened (committed) BEFORE handing back control (moving your thread pointer along). This is not exactly the same for all OSes, but more or less similar.
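To make that contract concrete, here is a minimal sketch of what an application-level "write, then insist it is committed" call might look like. The helper names full_sync and durable_write are made up for illustration, and this is not how SQLite itself does it internally. It assumes Python's fcntl module exposes F_FULLFSYNC (it does so only on macOS builds) and falls back to a plain fsync everywhere else.

```python
import fcntl
import os


def full_sync(fd: int) -> str:
    """Flush fd as durably as the platform allows; return the method used."""
    # F_FULLFSYNC is only defined on macOS builds of Python.
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            # Ask the OS to push the data to the drive AND wait for the
            # drive to report that it has committed it.
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # some filesystems reject the request; fall back below
    os.fsync(fd)  # weaker on macOS: data may still sit in the drive cache
    return "fsync"


def durable_write(path: str, data: bytes) -> str:
    """Hypothetical helper: write data to path, then sync before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        return full_sync(fd)
    finally:
        os.close(fd)
```

Control only returns to the caller after the sync call returns, which is exactly where the latency cost comes from.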

This can slow things down, but sometimes the security is worth the price paid in latency. However, hard drive manufacturers tend to sometimes lie about having committed your data. It is a well-known and almost universally used method in standard desktop / laptop computers for the hard drive to tell the OS: "YES indeed, I have committed" when in fact it is still piping data onto the platters. Yes, SSDs are better at this by simple virtue of lower latency from buffer to silicon, but they are not above lying either.

This means that unless you have a SERVER quality drive with typically its own battery-backup that guarantees ANY buffered writes to reach the platters, there simply is zero guarantee that all writes WILL go to disk, and any normal system that guarantees it lies.

This does not mean, however, that you should be experiencing corruption. SQLite might not be able to guarantee all writes reaching the disk, but in most cases the usual last step in committing a transaction is deleting / truncating a journal file or writing a checkpoint marker or such. If that final write did not happen, the entire transaction should roll back (next time you open the DB) and leave you in a non-corrupt state. IF this does not happen, it means a write may have happened out of order (not very common, but it can happen) or some other worse problem occurred. Most importantly, F_FULLFSYNC isn't the wild goose to be chasing. Whether or not any single write happened is never an acceptable cause of corruption, so wrestling with the thing that promises to make writes happen "more", as if it were causally related to a corruption problem, is simply moot. (This is vigorously tested with every release of SQLite too.)
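If you want to see what a given file looks like after SQLite has had its chance to recover, something along these lines might help. It is only a sketch: sync_report is a name I made up, and it uses Python's standard sqlite3 module. Opening the database and running a query lets SQLite roll back any leftover journal first; the function then reports the sync-related pragmas discussed in this thread along with the result of PRAGMA integrity_check.

```python
import sqlite3


def sync_report(db_path: str) -> dict:
    """Hypothetical helper: report sync-related settings and integrity."""
    conn = sqlite3.connect(db_path)
    try:
        report = {}
        for name in ("journal_mode", "synchronous",
                     "fullfsync", "checkpoint_fullfsync"):
            row = conn.execute(f"PRAGMA {name}").fetchone()
            # Be defensive: treat a missing row as "unknown".
            report[name] = row[0] if row else None
        # integrity_check yields a single row "ok" for a healthy file,
        # otherwise one row per problem found.
        report["integrity_check"] = [
            r[0] for r in conn.execute("PRAGMA integrity_check")
        ]
        return report
    finally:
        conn.close()
```

Note that the fullfsync values are per-connection settings, so the report shows the defaults of the SQLite build you are linked against, not whatever the application set on its own connection.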

If you can get the DB files (journals and all) from such a system where a user claims to be able to reproduce the corruption reliably, that would be an easy thing to check, and the Devs here might learn something from it. You can simply make something that copies all the DB files before opening them at startup, until you have produced a corrupt DB; those last copied files will then be the corrupted DB files that can be investigated.
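A rough sketch of such a "copy before open" step is below. All names (snapshot_db_files, the snap- directory prefix, the keep parameter) are my own inventions for illustration. It assumes the usual SQLite sidecar naming (-wal, -shm, -journal next to the main file) and that nothing else has the database open while the copy runs; otherwise the snapshot itself may be inconsistent.

```python
import shutil
import time
from pathlib import Path

# Main file plus the sidecar files SQLite may leave next to it.
SIDECAR_SUFFIXES = ("", "-wal", "-shm", "-journal")


def snapshot_db_files(db_path, backup_root, keep=5):
    """Copy the DB and its sidecars into a fresh snapshot directory.

    Call this BEFORE opening the database at startup. Returns the list of
    copied paths. Only the newest `keep` snapshots are retained (keep >= 1).
    """
    db_path = Path(db_path)
    backup_root = Path(backup_root)
    dest = backup_root / f"snap-{time.time_ns()}"
    dest.mkdir(parents=True)

    copied = []
    for suffix in SIDECAR_SUFFIXES:
        src = db_path.with_name(db_path.name + suffix)
        if src.exists():
            copied.append(Path(shutil.copy2(src, dest / src.name)))

    # Prune old snapshots; names sort chronologically by timestamp.
    snaps = sorted(p for p in backup_root.iterdir()
                   if p.is_dir() and p.name.startswith("snap-"))
    if keep >= 1:
        for old in snaps[:-keep]:
            shutil.rmtree(old)
    return copied
```

Once a user hits corruption, the newest snapshot directory holds the files exactly as they were before the app touched them on that run.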


Good luck!
Ryan

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
