On Sat, Apr 13, 2013 at 11:30 PM, someone <newsbo...@gmail.com> wrote:
> On 04/13/2013 01:39 PM, Chris Angelico wrote:
>> Note that there's a caveat: You have to tell SQLite to be ACID
>> compliant, effectively.
>
> So, you're saying to me that by default SQLite isn't ACID compliant,
> if I begin to use it in my own small programs?
> ...
> Do I understand you correct, that by "You have to tell SQLite to be
> ACID compliant, effectively", you're saying that by default SQLite
> isn't ACID compliant?
First off: I am NOT inherently familiar with SQLite. I'm more familiar
with PostgreSQL, DB2, and MySQL. I'm also not an expert at database
engine design, so this discussion is from the point of view of an
applications developer who has used databases from his apps.

True ACID compliance demands support at every level:

1) The application has to operate in logical units of work, which -
apart from with DB2 - requires an explicit "BEGIN" query, or
single-statement transactions.

2) The database engine must employ some form of write-ahead log.
Different databases do this somewhat differently (according to the
page I linked to, SQLite does this in reverse, maintaining a log
that's sufficient to *undo* the transaction, while PostgreSQL does
this forwards, maintaining a log that's sufficient to *redo* it as
well - more effort, but it can be used for database replication), but
one way or another, there must be a way to detect half-done
transactions.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait
for the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc.) must respond
to the fsync call by actually writing the content to persistent
storage before returning.

Failure at any level means the overall system is not ACID compliant.
PostgreSQL has a huge amount of code in it to try to deal with (or at
least recognize) a level-3 failure, but nothing in the database engine
can deal with level-1 or level-4 issues. You'd have to actually test
it. The easiest way is to get two computers, side by side, and run the
database engine on one and a monitor on the other.
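To make level 3 a bit more concrete, here's a minimal Python sketch of
what "forced file synchronization" means to a database engine: write,
flush, then fsync, so that (assuming the media at level 4 is honest)
the data is on persistent storage before the function returns. The
file name and record format are purely illustrative, not any engine's
actual journal format.

```python
import os

def durable_append(path, record):
    """Append a record and force it to persistent storage.

    This is the primitive an engine relies on at level 3: until
    os.fsync() returns, the bytes may exist only in OS buffers,
    and a power failure would silently lose them.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to the disk

durable_append("journal.log", b"BEGIN;UPDATE...;COMMIT\n")
```

If the drive lies about fsync (level 4), this call returns before the
data is really persistent, and nothing at the software level can tell
the difference without a pull-the-plug test.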
To test some SSDs at work, I knocked together a little program that
worked somewhat thus:

* Connect to the database over TCP/IP (easy, as we were doing this
  with PostgreSQL)
* Create a table with a number of rows with an ID and a counter,
  initialized to 0
* Repeatedly, in parallel, perform a transaction:
  - Increment the counter on one of the rows (at random)
  - Increment a "possible" in-memory counter for that row
  - Commit the database transaction
  - Increment a "confirmed" in-memory counter for that row
* When an error of "database seems to be down" is detected, wait for
  it to come up again, then query the table. The counters must all be
  at least their corresponding "confirmed" value and at most the
  "possible".

With that running, I simply pulled the plug on the database computer.
With a properly-configured hard disk, every one of the counters was
within its correct range. With a lying SSD, though, they could be
anywhere from "pretty close" (with a low workload - simulated by
having only a single thread doing transactions and having it sleep
for a few ms each iteration) to "pretty appalling" (with a bunch of
threads spinning tightly, keeping the workload high). Once the SSD
starts doing major write reordering, its throughput soars, but at the
cost of trustworthiness.

> Next question: Is it something I should worry about in my own
> programs (I'm not sure, I'm an SQL noob)... ?

Yes, it most certainly is. If you have any data that you care about,
put together some kind of test that will allow you to literally pull
the plug on the database, while still knowing whether or not your
transaction was completed (so you'll most likely need some kind of
"possible" / "confirmed" counter pair as I used above).

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list