Re: [sqlite] Potential bug in crash-recovery code: unlink() and friends are not synchronous

Richard Hipp Tue, 21 May 2013 09:53:03 -0700

On Tue, May 21, 2013 at 12:04 PM, Thanumalayan Sankaranarayana Pillai <
madth...@cs.wisc.edu> wrote:


> Hi all,
>
> Did anyone look into this? I might be setting some config option wrong,
> so it would be great if you sent me a "you did something wrong" reply if
> you feel that I might have the wrong config (or might be doing something
> totally idiotic).
>

There are no configuration options set wrong.  You shouldn't ever have to
set configuration options in order to get SQLite to work right.   I think
instead you are just running up against fundamental limits of modern
consumer-grade hardware.  I don't think there is anything anybody can do
about it.


>
> I tested with a few other Linux machines and a few different SQLite
> versions, and found that there is an upto-5-second delay (the default
> filesystem commit interval) in the transactions getting durable (with power
> crashes, that is).
>
> Thanks in advance,
> Thanu
>
> On Sat, May 18, 2013 at 3:41 AM, thanumalayan mad <madth...@gmail.com
> >wrote:
>
> > Hi All,
> >
> > I was testing out SQLite with a framework I developed. I believe, while
> > running on Linux, transactions might not be durable when a power crash
> > happens immediately after a commit. I observed this using "SQLite version
> > 3.7.16.2 2013-04-12 11:52:43", and kernel "3.8.4-102.fc17.x86_64". Steps
> to
> > reproduce:
> >
> > 1. Use a Linux machine with an Ext4 filesystem (default mount options).
> > 2. Create a database file, set journal_mode to DELETE, perform a
> > transaction using "begin transaction ... commit;".
> > 3. Pull the power plug immediately after "commit;" returns.
> > 4. Put plug back in, power on the machine, open database file with
> SQLite,
> > and examine whether the transaction has been executed.
> >
> > Expected result: You always find that the transaction had been executed.
> > Observed result: You sometimes find that the transaction did not execute.
> > (To increase the chance that you end with the transaction not having
> > executed, create an Ext4 partition in an unmounted hard disk, mount the
> > partition with the "-o commit=30" mount option, and do the experiment
> with
> > the database in that partition).
> >
> > Of course, it's possible that the transactions are durable, and I'm just
> > being totally bonkers, or running SQLite in some really wrong way. If
> not,
> > I believe that the problem lies in SQLite not doing an fsync() on the
> > parent directory after unlink()-ing the rollback journal file. It seems
> > that, in Ext4, unlink() is atomic, but not synchronous: the rollback
> > journal can exist after rebooting if a power crash follows the unlink().
> > Also, ftruncate() doesn't seem to be synchronous either (non-durability
> > even if the journal_mode is set to TRUNCATE).
> >
> > I haven't examined how other filesystems treat unlink(). I haven't also
> > tested with other pragma options and other journal_modes. I expect it
> > wouldn't be a problem with WAL (it should be correct even if the log
> exists
> > after the reboot); however, with things like VACUUM-ing, the database
> size
> > would probably not be reduced if a power crash happened, because of
> > ftruncate() not being synchronous. More generally, I think some of the
> code
> > assumes that all non-write and non-create operations that modify a file
> (in
> > other words, unlink and ftruncate) are synchronous; but that's true in
> > Linux only if the partition is mounted with "-o dirsync". One more random
> > thought: if I'm not wrong, "SQLite crash-tests" currently tests
> > partial-flushing only with write() calls....
> >
> > Again, sorry for the spam if I'm just being bonkers.
> >
> > Thanks,
> > Thanu
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Potential bug in crash-recovery code: unlink() and friends are not synchronous

Reply via email to