[sqlite] Bug: Successfully committed transaction rolled back after power failure

Yannick Duchêne Tue, 2 Feb 2016 22:57:49 +0100

On Thu, 28 Jan 2016 14:55:28 +0000
Simon Slavin <slavins at bigfraud.org> wrote:

> 
> On 28 Jan 2016, at 1:38pm, Bernard McNeill <bm.email01 at gmail.com> wrote:
> 
> > ===
> > Like the user reading "saving OK" and throwing away the
> > Post-It with the original information
> > ===
> > 
> > This is exactly my concern.
> > The user throwing away the Post-It is entirely reasonable if he sees a
> > message like that.
> > 
> > Do you happen to know if Linux/Debian (which I think uses a journalling
> > filesystem) carries this risk?
> 
> The problem is not at the software level.  Various operating systems and file 
> systems are correctly programmed with regard to waiting for write commands to 
> complete.  I don't know specifically about Debian but Linux has a good 
> reputation for such things, and anyone who bothers to write a journalling 
> file system would understand how to do things properly.
> 
> The problem is at the hardware level.  Standard disk drives (including their 
> motherboard if they have one, and their firmware) are designed for speed, not 
> integrity.  The assumption is that you will be using them to play games or 
> write your CV in Word, not to keep vital data.  So they are set up, using 
> their default jumper positions, to lie.  In order to keep their computer 
> running as fast as possible, instead of
> 
> 1) receive write command
> 2) perform write command
> 3) read that bit of disk to confirm the change
> 4) if not, bring SMART system into play and try writing it somewhere else
> 5) if succeed, tell the computer "I wrote that and it worked."
> 6) otherwise tell the computer "I wrote that and it failed."
> 
> they do this
> 
> 1) receive write command
> 2) tell the computer "I wrote that and it worked."
> 3) perform write command
> 4) read that bit of disk to confirm the change
> 5) if not, bring SMART system into play and try writing it somewhere else

Coincidence: I just had a funny incident; maybe it's related.

I just modified a program so that it creates four triggers in a database. I ran 
the program, then got an error from APSW (the program uses Python) complaining 
about an I/O or disk error. This frightened me a bit, but I thought it might be 
due to SQLiteBrowser being open on the same DB: I had forgotten to close it, 
and closed it just after starting the program that hit the I/O error. I opened 
the DB and could see only two of the four triggers; two were missing. So I 
deleted the DB and regenerated it three times, without error messages, but 
still with two triggers missing. Only on the fourth attempt were all four 
triggers there.
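
A check like the following would have flagged the missing triggers without 
opening the DB by hand. It is only a minimal sketch: it uses Python's standard 
sqlite3 module rather than APSW, and the table and trigger names (item, log, 
trg_a to trg_d) are hypothetical, not the ones from my program.

```python
import sqlite3

# Hypothetical trigger names and the events they fire on.
TRIGGERS = {
    "trg_a": "AFTER INSERT",
    "trg_b": "AFTER UPDATE",
    "trg_c": "AFTER DELETE",
    "trg_d": "BEFORE INSERT",
}


def create_schema(conn):
    """Create two hypothetical tables and the four triggers."""
    conn.execute("CREATE TABLE IF NOT EXISTS item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    for name, event in TRIGGERS.items():
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS {name} {event} ON item "
            f"BEGIN INSERT INTO log VALUES ('{name}'); END"
        )
    conn.commit()


def missing_triggers(conn):
    """Return the expected trigger names that are absent from the schema."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'trigger'")
    found = {name for (name,) in rows}
    return set(TRIGGERS) - found
```

Calling missing_triggers() right after generating the DB, and failing loudly 
if the result is not empty, turns a silent loss into a visible error. It only 
checks what SQLite reports at that moment, not what survives a power failure.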

The SMART data indicates zero reallocated sectors.

In the SMART utility, I noticed there is a hardware cache, which I disabled, 
just in case, thinking of the "lying devices" I remembered from this message.
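
On the SQLite side, the related setting is PRAGMA synchronous. A small sketch, 
again with the standard sqlite3 module instead of APSW (open_durable and 
synchronous_level are names made up for this example). Note that this only 
controls how often SQLite asks the OS to sync; it cannot help if the drive 
acknowledges a write before actually performing it.

```python
import sqlite3


def open_durable(path):
    """Open a database and ask SQLite to sync at every critical moment."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = FULL")
    return conn


def synchronous_level(conn):
    """Return the current setting: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]
```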

This is frightening to me, as I got an error message only the first time, but 
not the other times, although something wrong seems to have happened then too. 
Also, I noticed something was missing in the DB (even when it was generated 
without an error notification) only because it concerned something I was 
looking at precisely at that moment; if it had been some rows missing from a 
table, I would not have noticed.
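
To not depend on luck for noticing, the generator could verify its own output. 
A minimal sketch, assuming the program knows how many rows it meant to write 
to each table; verify_database and the expected_counts mapping are 
hypothetical, and the standard sqlite3 module stands in for APSW. PRAGMA 
integrity_check detects a damaged file, not rows that were never written, 
hence the separate count comparison.

```python
import sqlite3


def verify_database(conn, expected_counts):
    """Return a list of problems found; an empty list means all checks passed.

    expected_counts maps table names (trusted, from the program itself)
    to the number of rows the generator intended to write.
    """
    problems = []

    # integrity_check yields a single row ('ok',) when no damage is found.
    rows = conn.execute("PRAGMA integrity_check").fetchall()
    if rows != [("ok",)]:
        problems.extend(f"integrity_check: {row[0]}" for row in rows)

    for table, expected in expected_counts.items():
        (actual,) = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, found {actual}")

    return problems
```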

Hardware failure? OS failure? Software failure? Can't tell for sure...

-- 
Yannick Duchêne
