Hi All, I was testing out SQLite with a framework I developed. I believe, while running on Linux, transactions might not be durable when a power crash happens immediately after a commit. I observed this using "SQLite version 3.7.16.2 2013-04-12 11:52:43", and kernel "3.8.4-102.fc17.x86_64". Steps to reproduce:
1. Use a Linux machine with an Ext4 filesystem (default mount options). 2. Create a database file, set journal_mode to DELETE, perform a transaction using "begin transaction ... commit;". 3. Pull the power plug immediately after "commit;" returns. 4. Put plug back in, power on the machine, open database file with SQLite, and examine whether the transaction has been executed. Expected result: You always find that the transaction had been executed. Observed result: You sometimes find that the transaction did not execute. (To increase the chance that you end with the transaction not having executed, create an Ext4 partition in an unmounted hard disk, mount the partition with the "-o commit=30" mount option, and do the experiment with the database in that partition). Of course, it's possible that the transactions are durable, and I'm just being totally bonkers, or running SQLite in some really wrong way. If not, I believe that the problem lies in SQLite not doing an fsync() on the parent directory after unlink()-ing the rollback journal file. It seems that, in Ext4, unlink() is atomic, but not synchronous: the rollback journal can exist after rebooting if a power crash follows the unlink(). Also, ftruncate() doesn't seem to be synchronous either (non-durability even if the journal_mode is set to TRUNCATE). I haven't examined how other filesystems treat unlink(). I haven't also tested with other pragma options and other journal_modes. I expect it wouldn't be a problem with WAL (it should be correct even if the log exists after the reboot); however, with things like VACUUM-ing, the database size would probably not be reduced if a power crash happened, because of ftruncate() not being synchronous. More generally, I think some of the code assumes that all non-write and non-create operations that modify a file (in other words, unlink and ftruncate) are synchronous; but that's true in Linux only if the partition is mounted with "-o dirsync". One more random thought: if I'm not wrong, "SQLite crash-tests" currently tests partial-flushing only with write() calls.... Again, sorry for the spam if I'm just being bonkers. Thanks, Thanu _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

