[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-07 Thread Simon Slavin

On 7 Feb 2016, at 1:51pm, Bernard McNeill  wrote:

> Is it correct to say that, under Linux, if SQLITE_EXTRA_DURABLE is set (and
> all other settings are left as default values), and with the further
> assumption that the hardware
> reports write status accurately to the OS, then SQLITE_OK will only be
> returned after a transaction is irreversibly fixed on the disk?

As far as we know, yes.  But the assumption you mentioned never holds on a 
setup intended for normal desktop use.  It holds only when you have bought 
server-grade storage devices and have their jumpers set for server use.

I once put together a computer with everything in server mode and installed a 
standard copy of Windows XP and Office on it.  It was horribly laggy to use.  
It took two minutes to boot, over a minute to load Word, and managed about 
three keystrokes a second once Word was loaded.  Switching which app was in 
the foreground took something like 20 seconds.  I loaded Doom on it and got 
something like ten frames a second.

I really must repeat this experiment, since the figures I got won't mean 
anything to today's users; they need something they can relate to.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-07 Thread Bernard McNeill
===

https://www.sqlite.org/src/timeline?y=ci=af92401826f5cf49e62c


===

To clarify:

Is it correct to say that, under Linux, if SQLITE_EXTRA_DURABLE is set (and
all other settings are left as default values), and with the further
assumption that the hardware
reports write status accurately to the OS, then SQLITE_OK will only be
returned after a transaction is irreversibly fixed on the disk?

Best regards



On Sat, Feb 6, 2016 at 2:45 PM, Richard Hipp  wrote:

> On 2/6/16, Bernard McNeill  wrote:
> > Please can I formally propose that, for Linux:
> > 1. A Pragma or other compile-time option is created such that SQLITE_OK
> is
> > not issued on file writes/modifications/deletes until the hardware
> > indicates that all associated Directory syncs, etc., are done.
>
> https://www.sqlite.org/src/timeline?y=ci=af92401826f5cf49e62c
>
> > 2. Since the absence of 1. appears to break the 'D' in ACID, the option
> is
> > set on by default,
>
> No.  This would be a serious performance hit for billions and billions
> of devices that have soldered-in batteries and generally do not care
> about durability in the event of an OS crash.  Remember that SQLite
> has never synced directories like this before in its 15+ year history
> and that was never a problem for anyone until a few weeks ago. If
> handset manufacturers, and/or OS vendors, want to make it the default
> on their systems, there is a compile-time option for that:
> https://www.sqlite.org/draft/compile.html#extra_durable
> --
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-06 Thread Bernard McNeill
Please can I formally propose that, for Linux:
1. A Pragma or other compile-time option is created such that SQLITE_OK is
not issued on file writes/modifications/deletes until the hardware
indicates that all associated Directory syncs, etc., are done.
2. Since the absence of 1. appears to break the 'D' in ACID, the option is
set on by default, but the docs make the point that it could produce a
substantial performance hit.
The docs could also make the point that if the hardware returns incorrect
status to the OS,
then, even if the option is on, 'D' may still not be true (but not Sqlite's
fault).

Best regards


On Mon, Jan 25, 2016 at 3:47 PM, Richard Hipp  wrote:

> On 1/25/16, Matthias-Christian Ott  wrote:
> >
> > Does this mean that if I build SQLite with SQLITE_EXTRA_DURABLE=0, PRAGMA
> > journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
> > transaction that it said was committed, depending on the VFS?
>
> Sort of.  This appears to be true if you are running on QNX and you
> lose power (or do a hard reset) shortly after the transaction commits.
> It might be the case on other OSes/filesystems but it has never before
> been observed.
>
> This is not new behavior.  This is apparently what SQLite has been
> doing for 15 years, across quadrillions of transactions on many
> billions of devices, and it has never before caused any issues, until
> just recently when Mr. Meinlschmidt upgraded to a newer version of
> QNX.
>
> >
> > If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> > be more important than performance, except when the constraints are such
> > that correctness has to be sacrificed for performance?
> >
> > The trade-off that is described in the description of SQLite
> > SQLITE_EXTRA_DURABLE reads like an excerpt from the MySQL manual when
> > MyISAM was still widely used. Perhaps I'm also too irritated by
> > discussions with advocates of MySQL who would argue against the fact
> > that proper transactions were necessary because the DBMS would be faster
> > without them. That is not to say that the ACID properties and
> > transactions solve every concurrency or correctness problem but they
> > help significantly.
> >
>
> As you point out, it is an engineering tradeoff.
>
> The feedback I receive is that most users of SQLite would much rather
> avoid the extra directory syncs, even if it means having the last
> transaction rollback following a power loss.  Most developers do not
> care that much about durability, at least not enough to want to take
> the performance hit of syncing the directory after every unlink.
> Non-durable commits on power-loss have long been the default in WAL
> mode (run-time fixable by setting PRAGMA synchronous=FULL) and nobody
> has ever complained before.  Most people consider this a feature.  In
> fact, if I recall correctly, we first made synchronous=NORMAL the
> default in WAL mode by popular request.  WAL mode used to default to
> power-loss durable but people requested the change for performance
> reasons.
>
> Note especially that this is about durability, not consistency.
> SQLite guarantees consistency regardless.  People care about
> consistency.  Durability, not so much.  I'm not a MySQL expert, but I
> think the historical complaints about MyISAM had more to do with
> consistency than with durability, did they not?
>
> --
> D. Richard Hipp
> drh at sqlite.org
>
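For readers who want to experiment with the durability trade-offs discussed
above, here is a minimal sketch using Python's standard-library sqlite3
module (the file name is illustrative):

```python
import sqlite3

# Open (or create) a database and choose the durability trade-off explicitly.
con = sqlite3.connect("example.db")

# WAL mode defaults to synchronous=NORMAL, which trades power-loss
# durability of the last transaction(s) for speed.
con.execute("PRAGMA journal_mode=WAL")

# FULL restores durable commits in WAL mode, at the cost of extra syncs.
con.execute("PRAGMA synchronous=FULL")

con.execute("CREATE TABLE IF NOT EXISTS t(x)")
with con:  # commits on successful exit of the block
    con.execute("INSERT INTO t VALUES (1)")
con.close()
```

Querying `PRAGMA journal_mode` and `PRAGMA synchronous` afterwards confirms
which combination is actually in effect for the connection.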


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-06 Thread Richard Hipp
On 2/6/16, Bernard McNeill  wrote:
> Please can I formally propose that, for Linux:
> 1. A Pragma or other compile-time option is created such that SQLITE_OK is
> not issued on file writes/modifications/deletes until the hardware
> indicates that all associated Directory syncs, etc., are done.

https://www.sqlite.org/src/timeline?y=ci=af92401826f5cf49e62c

> 2. Since the absence of 1. appears to break the 'D' in ACID, the option is
> set on by default,

No.  This would be a serious performance hit for billions and billions
of devices that have soldered-in batteries and generally do not care
about durability in the event of an OS crash.  Remember that SQLite
has never synced directories like this before in its 15+ year history
and that was never a problem for anyone until a few weeks ago. If
handset manufacturers, and/or OS vendors, want to make it the default
on their systems, there is a compile-time option for that:
https://www.sqlite.org/draft/compile.html#extra_durable
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-03 Thread Yannick DuchĂȘne
On Wed, 03 Feb 2016 06:30:13 -0700
"Keith Medcalf"  wrote:

> 
> Is this on windows?  Any errors in the Eventlogs to the tune "Oooopsie -- 
> accidentally threw away your data instead of writing it to disk"?  Windows 
> does this quite commonly under some circumstances.  Microsoft created the bug 
> in NT 4 and has been unable to locate or fix it since -- though there is no 
> problem producing an error message about 50% of the time it happens.

No, that's Ubuntu Trusty 32-bit on Ext4.  As an anecdote, I personally never 
had file system issues with Windows; the only cases I had were with Ubuntu 
(especially with Btrfs, which I left for this reason).

> [...]
> > 
> > Coincidence.  I just had a funny incident; maybe it's related.
> > 
> > I just modified a program so that it creates four triggers in a database.
> > I ran the program, then got an error from APSW (the program uses Python)
> > complaining about an I/O or disk error.  This frightened me a bit, but I
> > thought it might be due to SQLiteBrowser being open on the same DB, which
> > I had forgotten to close and closed only after starting the program that
> > hit the I/O issue.  I opened the DB and could see only two of the four
> > triggers.  So I deleted the DB and regenerated it three times, without
> > error messages, but still two triggers were missing.  Only on the fourth
> > repetition were all four triggers there.
> > 
> > The SMART data indicates zero reallocated sectors.
> > 
> > In the SMART utility, I noticed there is a hardware cache, which I
> > disabled just in case, thinking about the "lying devices" I remembered
> > from this message.
> > 
> > This is frightening to me, as I got an error message the first time only,
> > but not the other times, even though something wrong seems to have
> > happened then too.  Also, I only noticed something was missing in the DB
> > (even when it was generated without an error notification) because it
> > concerned something I was looking at precisely at that moment; if it had
> > been about some rows missing in a table, I would not have noticed it.
> > 
> > Hardware failure?  OS failure?  Software failure?  Can't tell for sure...

-- 
Yannick Duchêne


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-03 Thread Meinlschmidt Stefan
> Is this on windows?

The original problem was on QNX 6.5 employing a QNX6 filesystem on mNAND
flash.

S.M.
-- 
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
Fax: +49-8458-3332-20-531


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-03 Thread Keith Medcalf

Is this on windows?  Any errors in the Eventlogs to the tune "Oooopsie -- 
accidentally threw away your data instead of writing it to disk"?  Windows does 
this quite commonly under some circumstances.  Microsoft created the bug in NT 
4 and has been unable to locate or fix it since -- though there is no problem 
producing an error message about 50% of the time it happens.

> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> bounces at mailinglists.sqlite.org] On Behalf Of Yannick Duch?ne
> Sent: Tuesday, 2 February, 2016 14:58
> To: sqlite-users at mailinglists.sqlite.org
> Subject: Re: [sqlite] Bug: Successfully committed transaction rolled back
> after power failure
> 
> On Thu, 28 Jan 2016 14:55:28 +
> Simon Slavin  wrote:
> 
> >
> > On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
> >
> > > ===
> > > Like the user reading "saving OK" and throwing away the
> > > Post-It with the original information
> > > ===
> > >
> > > This is exactly my concern.
> > > The user throwing away the Post-It is entirely reasonable if he sees a
> > > message like that.
> > >
> > > Do you happen to know if Linux/Debian (which I think uses a
> journalling
> > > filesystem) carries this risk?
> >
> > The problem is not at the software level.  Various operating systems and
> file systems are correctly programmed with regard to waiting for write
> commands to complete.  I don't know specifically about Debian but Linux
> has a good reputation for such things, and anyone who bothers to write a
> journalling file system would understand how to do things properly.
> >
> > The problem is at the hardware level.  Standard disk drives (including
> their motherboard if they have one, and their firmware) are designed for
> speed, not integrity.  The assumption is that you will be using them to
> play games or write your CV in Word, not to keep vital data.  So they are
> set up, using their default jumper positions, to lie.  In order to keep
> their computer running as fast as possible, instead of
> >
> > 1) receive write command
> > 2) perform write command
> > 3) read that bit of disk to confirm the change
> > 4) if not, bring SMART system into play and try writing it somewhere
> else
> > 5) if succeed, tell the computer "I wrote that and it worked."
> > 6) otherwise tell the computer "I wrote that and it failed."
> >
> > they do this
> >
> > 1) receive write command
> > 2) tell the computer "I wrote that and it worked."
> > 3) perform write command
> > 4) read that bit of disk to confirm the change
> > 5) if not, bring SMART system into play and try writing it somewhere
> else
> 
> Coincidence.  I just had a funny incident; maybe it's related.
> 
> I just modified a program so that it creates four triggers in a database.
> I ran the program, then got an error from APSW (the program uses Python)
> complaining about an I/O or disk error.  This frightened me a bit, but I
> thought it might be due to SQLiteBrowser being open on the same DB, which
> I had forgotten to close and closed only after starting the program that
> hit the I/O issue.  I opened the DB and could see only two of the four
> triggers.  So I deleted the DB and regenerated it three times, without
> error messages, but still two triggers were missing.  Only on the fourth
> repetition were all four triggers there.
> 
> The SMART data indicates zero reallocated sectors.
> 
> In the SMART utility, I noticed there is a hardware cache, which I
> disabled just in case, thinking about the "lying devices" I remembered
> from this message.
> 
> This is frightening to me, as I got an error message the first time only,
> but not the other times, even though something wrong seems to have
> happened then too.  Also, I only noticed something was missing in the DB
> (even when it was generated without an error notification) because it
> concerned something I was looking at precisely at that moment; if it had
> been about some rows missing in a table, I would not have noticed it.
> 
> Hardware failure?  OS failure?  Software failure?  Can't tell for sure...
> 
> 
> 
> --
> Yannick Duchêne





[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread Yannick DuchĂȘne
On Thu, 28 Jan 2016 14:55:28 +
Simon Slavin  wrote:

> 
> On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
> 
> > ===
> > Like the user reading "saving OK" and throwing away the
> > Post-It with the original information
> > ===
> > 
> > This is exactly my concern.
> > The user throwing away the Post-It is entirely reasonable if he sees a
> > message like that.
> > 
> > Do you happen to know if Linux/Debian (which I think uses a journalling
> > filesystem) carries this risk?
> 
> The problem is not at the software level.  Various operating systems and file 
> systems are correctly programmed with regard to waiting for write commands to 
> complete.  I don't know specifically about Debian but Linux has a good 
> reputation for such things, and anyone who bothers to write a journalling 
> file system would understand how to do things properly.
> 
> The problem is at the hardware level.  Standard disk drives (including their 
> motherboard if they have one, and their firmware) are designed for speed, not 
> integrity.  The assumption is that you will be using them to play games or 
> write your CV in Word, not to keep vital data.  So they are set up, using 
> their default jumper positions, to lie.  In order to keep their computer 
> running as fast as possible, instead of
> 
> 1) receive write command
> 2) perform write command
> 3) read that bit of disk to confirm the change
> 4) if not, bring SMART system into play and try writing it somewhere else
> 5) if succeed, tell the computer "I wrote that and it worked."
> 6) otherwise tell the computer "I wrote that and it failed."
> 
> they do this
> 
> 1) receive write command
> 2) tell the computer "I wrote that and it worked."
> 3) perform write command
> 4) read that bit of disk to confirm the change
> 5) if not, bring SMART system into play and try writing it somewhere else

Coincidence.  I just had a funny incident; maybe it's related.

I just modified a program so that it creates four triggers in a database.  I 
ran the program, then got an error from APSW (the program uses Python) 
complaining about an I/O or disk error.  This frightened me a bit, but I 
thought it might be due to SQLiteBrowser being open on the same DB, which I 
had forgotten to close and closed only after starting the program that hit 
the I/O issue.  I opened the DB and could see only two of the four triggers.  
So I deleted the DB and regenerated it three times, without error messages, 
but still two triggers were missing.  Only on the fourth repetition were all 
four triggers there.

The SMART data indicates zero reallocated sectors.

In the SMART utility, I noticed there is a hardware cache, which I disabled 
just in case, thinking about the "lying devices" I remembered from this 
message.

This is frightening to me, as I got an error message the first time only, but 
not the other times, even though something wrong seems to have happened then 
too.  Also, I only noticed something was missing in the DB (even when it was 
generated without an error notification) because it concerned something I was 
looking at precisely at that moment; if it had been about some rows missing 
in a table, I would not have noticed it.

Hardware failure?  OS failure?  Software failure?  Can't tell for sure...



-- 
Yannick Duchêne


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread James K. Lowden
On Sun, 31 Jan 2016 20:27:56 -0700
Scott Robison  wrote:

> On Sun, Jan 31, 2016 at 7:35 PM, Rowan Worth  wrote:
> 
> > On 31 January 2016 at 03:56, James K. Lowden
> >  wrote:
> >
> > > Surely SQLite does both -- fsync on file and directory -- as part
> > > of a commit. That's not in doubt, is it?
> > >
> >
> > No, SQLite does not. On COMMIT it fsyncs the database file and
> > unlinks the journal[1], but does not fsync the directory. This is
> > demonstrated by the logs in Stefan's initial post, and indeed the
> > purpose of his patch was to introduce an fsync on the directory
> > after unlinking the journal.
> >
> 
> Perhaps there could be a "paranoid" journal mode, where it first
> zeros out the header ala persist, then truncates the file, then
> deletes the file.

Well, I would say there are four choices:

1.  Something like you suggest, but just mark the "committed" fact in the log 
file.  It would be enough to write "we're done here" at the end of the file, 
followed by fsync, followed by unlink.  Then if the log file is found 
(undeleted) after a crash, the information in the file suffices.  

2.  Call fsync after unlink.  

3.  Call fsync after unlink if directory's extended attributes don't
include FS_DIRSYNC_FL *or* the filesystem was mounted with MS_DIRSYNC.  

4.  For systems that support it (Linux 2.6+) just go ahead and set
FS_DIRSYNC_FL.  

The 3rd option underscores the pitfalls of directory synchronization
under Linux.  NetBSD for example has neither option; afaik all
directory operations are synchronous.  (POSIX doesn't mention it.)  

Personally, I like to keep things simple.  Update & fsync the log file
one last time before deleting it.  Works on every system, and is
impervious to future permutations to directory synchronization.  
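Option 2 above can be sketched on POSIX systems roughly as follows (an
illustration of the fsync-directory-after-unlink technique, not SQLite's
actual code; the helper name is made up):

```python
import os

def durable_unlink(path: str) -> None:
    """Unlink a file, then fsync its parent directory so the removal
    of the directory entry itself reaches stable storage (POSIX)."""
    os.unlink(path)
    # On Linux a directory can be opened read-only and fsync'd; this
    # flushes the directory-entry update to disk.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the directory fsync, the unlink may sit only in the in-memory
directory cache when power is lost, which is exactly the window this thread
is about.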


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread Rowan Worth
On 2 February 2016 at 08:22, Stephen Chrzanowski 
wrote:

> On Mon, Feb 1, 2016 at 11:20 AM, Rowan Worth  wrote:

> > As I indicated in the last paragraph of my mail, I'm not in favour of
> > fsync-directory-on-commit in the general case. But that's because I worry
> > about the performance impact, not because SQLite is doing its best and at
> > the mercy of the OS. I can't remember who in this thread suggested it but
> > fsync-directory-on-close seemed like a great compromise.
>
> *ONLY* if configurable, and *ONLY* if not a default.  Most of my apps are
> setup at highest risk because the data they contain doesn't matter and I
> want performance vs safety.


Performance over safety is exactly the reason we are here in the first
place. If performance wasn't a factor the OS would go straight to the disk
platter in response to a write() call.

But if we accept reality, we know that we have no guarantee that the data we
just wrote to a file is on the platter until we call fsync().  And secondly,
we have no guarantee that the directory change we just made (e.g. by
unlinking the journal) is on the platter until we call fsync() on the
directory.

The OS/hardware can still lie, sure.  But sqlite hasn't given it a chance to
tell the truth, since it doesn't fsync the directory -- this is what I meant
when I said it "hasn't even asked the question."  SQLite doesn't ask for the
directory update to hit the platter, and then proceeds execution as if it
has.

To reiterate, I don't have a problem with sqlite's behaviour at all. I
think it's perfectly acceptable. But I think it deserves to be acknowledged
and understood for what it is; a pragmatic decision by sqlite to improve
performance in the general case at the "expense" of a
not-strictly-necessary rollback in an obscure failure scenario. It's not
fair to meet it with the double standard of "fsyncing the directory every
COMMIT would hurt general-case performance too much" and "the OS is
responsible for ensuring data gets to disk", because that essentially asks
the OS to fsync directory after every *i/o operation*, whether asked to or
not. Which would hurt general-case performance much more!

I'm right there behind you that it shouldn't be forced on users of sqlite :)

-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread Olivier Mascia
> On 2 Feb 2016, at 03:25, Rowan Worth wrote:
> 
> To reiterate, I don't have a problem with sqlite's behaviour at all. I
> think it's perfectly acceptable. But I think it deserves to be acknowledged
> and understood for what it is; a pragmatic decision by sqlite to improve
> performance in the general case at the "expense" of a
> not-strictly-necessary rollback in an obscure failure scenario. It's not
> fair to meet it with the double standard of "fsyncing the directory every
> COMMIT would hurt general-case performance too much" and "the OS is
> responsible for ensuring data gets to disk", because that essentially asks
> the OS to fsync directory after every *i/o operation*, whether asked to or
> not. Which would hurt general-case performance much more!
> 
> I'm right there behind you that it shouldn't be forced on users of sqlite :)

Indeed, and those users insisting on journal_mode=DELETE while living with a 
filesystem that requires an fsync of the directory after creating/removing 
files could also fsync() the directory themselves after a successful SQLite 
commit, before considering it "done".  Or they could use one of the alternative 
modes which do not delete the journal, but either reset it (TRUNCATE) or clear 
it (PERSIST).  Or even consider WAL mode, which has very interesting properties 
in itself.

https://www.sqlite.org/pragma.html#pragma_journal_mode

Compared to the amount of (generally bad) sample code seen on the internet 
(outside of sqlite.org), where people seem to think it is ordinary business to 
do PRAGMA synchronous=OFF or even PRAGMA journal_mode=OFF, I live happily with 
this journal_mode=DELETE behaviour (a mode I happen not to use). :)
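That application-level workaround might look like this in Python with the
standard sqlite3 module (a sketch assuming a POSIX system; the helper name is
invented):

```python
import os
import sqlite3

def commit_durably(con: sqlite3.Connection, db_path: str) -> None:
    """Commit, then fsync the database file's directory so that the
    journal unlink performed by the commit (in journal_mode=DELETE)
    is itself on stable storage before we consider the commit done."""
    con.commit()
    dir_fd = os.open(os.path.dirname(os.path.abspath(db_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The caller treats the transaction as durable only after `commit_durably`
returns, which closes the window this thread describes without changing
SQLite itself.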

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om




[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread Rowan Worth
Hi Stephen,

On 1 February 2016 at 21:45, Stephen Chrzanowski 
wrote:
>
> SQLite is nothing more than part of a program run by the OS.  It completely
> relies on whatever the OS tells it.  If the OS tells it that things are OK,
> then that is all that can be done.  SQLite can't take on the responsibility
> of making sure that my taxes are done right, it won't ever wash my car (I'd
> be afraid the T would scratch the paint anyways) and it absolutely cannot
> defend against the lies an OS gives it.


Of course! SQLite is a remarkably reliable bit of software and I hold it in
high regard, but as you say software has its limits.  I'm not one to expect
more of SQLite than it can reasonably provide.

Going back to the OS, I've seen instances, on both Windows and Linux based
> platforms (RH and Debian), where file sizes do not change even though streams
> of data are being pumped into that particular file.


Stands to reason on linux if the directory hasn't been synced to disk yet.
The file's dirent is stored in a completely independent extent.


> I've seen situations
> where files have been deleted in Linux (Not allowed in Windows) and data
> still gets dumped into the now deleted file.


?
This is not an error, this is standard POSIX file semantics - if you have a
file descriptor referring to a file which gets deleted from disk, the file
descriptor is guaranteed to remain valid until it is closed. NFS even goes
to some lengths to make this work.
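The semantics described above are easy to demonstrate (a POSIX-only sketch):

```python
import os
import tempfile

# Create a file, keep its descriptor open, then delete its name.
fd, path = tempfile.mkstemp()
os.unlink(path)               # the name is gone from the directory...
assert not os.path.exists(path)

os.write(fd, b"still here")   # ...but the descriptor remains valid
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 64)        # reads back what was just written
os.close(fd)                  # the storage is reclaimed only now
```

After the unlink, `data` still comes back as `b"still here"`: writes and
reads through the open descriptor keep working until it is closed.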

Is it the program's fault that
> the file is physically deleted but the program isn't aware?  Is it supposed
> to check every single time it writes a byte to see if the file exists, if
> the partition is healthy, and the root file system passes a FSCK at every
> commit?
>

:)


> SQLite doesn't know or care what sector the bytes
> live at on the drive physically.  It doesn't know about fragmentation, it
> doesn't know what size the clusters are, or anything that is important for
> file storage.


But SQLite *does* know how to interface with the file system. It's all very
well to say the OS/hardware sometimes lies, but in this specific instance
SQLite hasn't even asked the question.

As I indicated in the last paragraph of my mail, I'm not in favour of
fsync-directory-on-commit in the general case. But that's because I worry
about the performance impact, not because SQLite is doing its best and at
the mercy of the OS. I can't remember who in this thread suggested it but
fsync-directory-on-close seemed like a great compromise.

-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread bm.emai...@gmail.com
Maybe, for Linux, there could be a  Pragma to choose between 
fsync-directory-on-commit,
fsync-directory-on-close, or no fsync-directory.

Particularly if it is a documented property (not unintended bug) that Linux 
filesystem operates this way.

Best regards

Sent from my BlackBerry® smartphone on O2

-Original Message-
From: Rowan Worth <row...@dugeo.com>
Sender: sqlite-users-bounces at mailinglists.sqlite.orgDate: Tue, 2 Feb 2016 
00:20:55 
To: SQLite mailing list
Reply-To: SQLite mailing list 
Subject: Re: [sqlite] Bug: Successfully committed transaction rolled back
 after power failure

Hi Stephen,

On 1 February 2016 at 21:45, Stephen Chrzanowski 
wrote:
>
> SQLite is nothing more than part of a program run by the OS.  It completely
> relies on whatever the OS tells it.  If the OS tells it that things are OK,
> then that is all that can be done.  SQLite can't take on the responsibility
> of making sure that my taxes are done right, it won't ever wash my car (I'd
> be afraid the T would scratch the paint anyways) and it absolutely cannot
> defend against the lies an OS gives it.


Of course! SQLite is a remarkably reliable bit of software and I hold it in
high regard, but as you say software has its limits.  I'm not one to expect
more of SQLite than it can reasonably provide.

Going back to the OS, I've seen instances, on both Windows and Linux based
> platforms (RH and Debian), where file sizes do not change even though streams
> of data are being pumped into that particular file.


Stands to reason on linux if the directory hasn't been synced to disk yet.
The file's dirent is stored in a completely independent extent.


> I've seen situations
> where files have been deleted in Linux (Not allowed in Windows) and data
> still gets dumped into the now deleted file.


?
This is not an error, this is standard POSIX file semantics - if you have a
file descriptor referring to a file which gets deleted from disk, the file
descriptor is guaranteed to remain valid until it is closed. NFS even goes
to some lengths to make this work.

Is it the program's fault that
> the file is physically deleted but the program isn't aware?  Is it supposed
> to check every single time it writes a byte to see if the file exists, if
> the partition is healthy, and the root file system passes a FSCK at every
> commit?
>

:)


> SQLite doesn't know or care what sector the bytes
> live at on the drive physically.  It doesn't know about fragmentation, it
> doesn't know what size the clusters are, or anything that is important for
> file storage.


But SQLite *does* know how to interface with the file system. It's all very
well to say the OS/hardware sometimes lies, but in this specific instance
SQLite hasn't even asked the question.

As I indicated in the last paragraph of my mail, I'm not in favour of
fsync-directory-on-commit in the general case. But that's because I worry
about the performance impact, not because SQLite is doing its best and at
the mercy of the OS. I can't remember who in this thread suggested it but
fsync-directory-on-close seemed like a great compromise.

-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Rowan Worth
On 1 February 2016 at 18:58, Simon Slavin  wrote:

>
> On 1 Feb 2016, at 9:23am, bm.email01 at gmail.com wrote:
>
> > ---
> > No, SQLite does not. On COMMIT it fsyncs the database file and unlinks
> the
> > journal[1], but does not fsync the directory.
> > ---
> >
> > Since that can cause the last transaction to be lost, despite Sqlite
> returning a 'Commit successful' code to the application, doesn't that mean
> that Sqlite is _not_ truly 'Durable' (per ACID)?
>
> 1) The fault here is the file system not SQLite.  This one particular file
> system has this strange requirement that you fsync the directory.  SQLite
> is ACID.  It's the file system that messes it up.  SQLite cannot defend
> against untrustworthy middleware -- sooner or later a programmer has to
> trust their hardware is doing what it says it's doing.
>

I take your point, but as Florian pointed out it's not just one file
system; it's a somewhat well-known quirk of POSIX fsync.
http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

It's a bit disingenuous to say "the file system requires a particular dance
to ensure an unlink persists on disk, but the dance is excessively complex
so we're going to skip it and blame any problems on the file system." The
fact remains that sqlite returned SQLITE_OK for a transaction that never
succeeded, because it didn't take the steps required for the i/o to hit
disk. However...
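The "dance" being discussed can be made concrete. Below is a minimal POSIX-only sketch (in Python, for brevity) of what a durable unlink looks like: remove the file, then fsync the *parent directory*, since the directory entry is the directory's metadata. This is an illustration of the technique, not SQLite's actual code.

```python
import os
import tempfile

def durable_unlink(path):
    # Unlink, then fsync the parent directory so the removal itself
    # reaches disk.  POSIX-only sketch: on Windows you cannot open
    # and fsync a directory this way.
    parent = os.path.dirname(os.path.abspath(path))
    os.unlink(path)
    dfd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

# Demo against a stand-in journal file in a scratch directory.
d = tempfile.mkdtemp()
journal = os.path.join(d, "test.db-journal")
open(journal, "wb").close()
durable_unlink(journal)
```

Without the directory fsync, the unlink may sit in the kernel's cache and "reappear" after a power cut, which is exactly the rollback scenario from the OP.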


> 3) For both the above problems SQLite still neither loses data nor
> corrupts the database file.  If the program crashes the file is
> automatically closed.  When the SQLite API is used to reopen the file the
> unexpected close will be discovered and corrected for.
>

I agree with this 100%. SQLite absolutely provides consistency here, which
I would argue is the most important. In an actual power-loss scenario, what
is an application going to do between receiving SQLITE_OK and before losing
power that possibly matters? If the disk cache hasn't been flushed by the
time the power drops, then nothing else you've done in the meantime will
have hit disk either.

Note that the OP's situation is very unusual in that a controlled shutdown
is initiated after sqlite returns SQLITE_OK, but that shutdown *does not
sync the filesystem*! That is the only reason for the 30 second window. In
usual circumstances, my feeling is that the window between "power failing
in the middle of sqlite_exec" and "power failing after sqlite_exec has
returned but before the unlinked journal is synced to disk" is so small
that sqlite is within its rights to blur the two.

-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Stephen Chrzanowski
On Mon, Feb 1, 2016 at 11:20 AM, Rowan Worth  wrote:

>
> > I've seen situations
> > where files have been deleted in Linux (Not allowed in Windows) and data
> > still gets dumped into the now deleted file.
>
>
> ?
> This is not an error, this is standard POSIX file semantics - if you have a
> file descriptor referring to a file which gets deleted from disk, the file
> descriptor is guaranteed to remain valid until it is closed. NFS even goes
> to some lengths to make this work.
>
>
I didn't claim that this was an error, but rather that it is a possible
source of confusion and data corruption.

For instance, if the first DB file gets deleted while the controlling
program still has it open, the OS still allows that first
instance to do what it needs to do with the file.  The only 'things' that
know that file exists are the OS and that single instance of the application.

Now say a second instance of the application starts up, recreates the
original file (new inode) and starts creating journal/WAL files; all of a
sudden you have two programs that have very real and very valid temp files
that could potentially be accessed by both processes, potentially getting
your data into a really nasty state.  Is the application going to pass or
fail ACID because of external pressures?  Yanking the power cable is
considered an external pressure.

This has nothing to do with the thread, but just an illustration that your
software is at the mercy of the OS.  And if the OS isn't telling your
program the truth then it is the OS's fault, not the application or SQLite.


> Is it the programs fault that
> > the file is physically deleted but the program isn't aware?  Is it
> supposed
> > to check every single time it writes a byte to see if the file exists, if
> > the partition is healthy, and the root file system passes a FSCK at every
> > commit?
> >
>
> :)
>
>

>
> > SQLite doesn't know or care what sector the bytes
> > live at on the drive physically.  It doesn't know about fragmentation, it
> > doesn't know what size the clusters are, or anything that is important
> for
> > file storage.
>
>
> But SQLite *does* know how to interface with the file system. It's all very
> well to say the OS/hardware sometimes lies, but in this specific instance
> SQLite hasn't even asked the question.
>
>
That is all it has.  An interface.  Good word.  The information exchanged
is cut down to the basics.  It doesn't know what the OS REALLY did with
that fsync, or if those magnetic fields are aligned properly on the
platter, or stored in those transistor cells properly.

I'm not sure that I understand where you're going with "hasn't even asked
the question"?  Are you referring to pre- or post- hard power off?  What I've
understood this thread to be about is "My application received a SQLITE_OK for a
commit, but, after a hard power loss by yanking the power cable, the reboot
comes back and my commit isn't there."  It's because the OS didn't write the
data when the software asked it to.  My point is: how far can the software
go to make sure the OS is doing its job?  Which is why I wrote that
paragraph above where you smiled. ;)


> As I indicated in the last paragraph of my mail, I'm not in favour of
> fsync-directory-on-commit in the general case. But that's because I worry
> about the performance impact, not because SQLite is doing its best and at
> the mercy of the OS. I can't remember who in this thread suggested it but
> fsync-directory-on-close seemed like a great compromise.
>
> -Rowan
>

*ONLY* if configurable, and *ONLY* if not a default.  Most of my apps are
setup at highest risk because the data they contain doesn't matter and I
want performance vs safety.  There are applications I have written that are
data critical and I've got full sync on; my desk is illuminated by the HDD
light while I wait for that INSERT to finish.
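The two postures described above map directly onto SQLite pragmas. A minimal sketch (the table name and values are made up for the demo; the pragmas are real SQLite settings):

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db)

# "Highest risk, best performance" would be:
#   PRAGMA journal_mode=MEMORY; PRAGMA synchronous=OFF;
# Here we configure the "data critical" posture instead:
conn.execute("PRAGMA journal_mode=DELETE")   # rollback journal on disk
conn.execute("PRAGMA synchronous=FULL")      # fsync on every commit

conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
```

With `synchronous=OFF` the commit returns as soon as the data is handed to the OS, which is precisely the "all bets are off on power loss" trade being made.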


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Stephen Chrzanowski
This is steering off topic, but, here's my deal with computers as a whole.
Bluntly, I love them, I can't be without them, and I wouldn't change my
experiences with them for anything.  That doesn't mean that I implicitly
trust them.

My expectations are "low" because experience has taught me to keep them low
in the 30 years I've been dealing not only with software but hardware as
well going back to the Vic-20.  I was 8 when we got that machine, and by 9
I was already tearing it apart to "see how it worked".  I don't think my
parents ever found out though {smirks at the good ol' days}  I've seen
pretty funky things with computers, especially when it comes to flaky
electricity supplies.  I only point that out because this thread originated
on the statement "Pulled the plug".

At my old job as a repair tech at a mom-and-pop computer shop for 6 years,
I've had my hand on literally thousands of computers, and except for the
very few times when I was to build a few at the same time, they were all
unique.  I've had situations with customers machine who literally didn't
like the "taste" of electricity at the owners house, but would work
flawlessly on my work bench.  I set that machine, and one other I had
laying around up at the customers house, and hers failed to work while in
the OS while mine started no problem.  Possible reason is power sags and
power spikes affected the PSU.  I don't remember what I did with that
machine afterwards.  (I might still have the notes in my database.)  When
you yank that power cord, who knows which way that read/write head is
going to go, let alone what kind of condition the rest of your hardware is
going to be in.  You may think that yanking the power cord is the be-all
and end-all, but there are sparks of electricity that jump between the
connector plate and the plug in the PSU, and THAT is what causes the
unknowns.

Right now, at work, I have an external drive plugged in via ESATA so that I
can keep my own OS instead of sharing an OS with the other gents that use
that machine.  I'm looking at odd electrical problems with it right now,
probably because of the constant power switching over the past 3 years I've
been running it.  I have to let that drive spin up, warm up for a few
minutes, power it off, then back on, THEN turn the computer on.  I've
ALWAYS treated that drive and enclosure with the respect it needs as there
are no computer shops open 24/7.  I've never lost data, but, now I'm having
problems with it in a cold state.  So yes, again another reason to lower
that expectation bar.

I'm the kind of tech that doesn't use the PSU power switch to power off a
machine that is in a bad state unless I absolutely have to.  I soft power it
off with the power button for 4 seconds.  Only then would I use the PSU
power switch, or yank the cable if the supply didn't have a switch.

I'm pretty firm with my statement of "You pull that plug, all bets are off".

All software depends on hardware.  You yank that plug, you have an unknown
state until you bring it up.  Power surges and sags are evil to computer
hardware.

On Mon, Feb 1, 2016 at 10:14 AM, Howard Chu  wrote:

Stephen Chrzanowski wrote:
>
> Your expectations are pretty low. On a properly configured Unix host,
> there's no reason for a powerfail to prevent a successful reboot. E.g., if
> you mount boot and root filesystems as read-only filesystems, they can
> never get corrupted. If you're using modern filesystems for your writable
> partitions (e.g., FSs with journaling) then there's also no reason for them
> to fail to come back online.
>
> So it just comes down to your application code being reliable.
>
> I should note that SQLightning has none of the problems being described in
> this thread - in its default mode, it is full-ACID and a powerfail cannot
> lose data or corrupt the database. And it does all this while being at
> least 30% faster on writes than vanilla SQLite.
>
> Yes, the OS could have bugs. Yes, the hardware could physically fail.
> That's pretty rare though; HDDs R/W heads auto-retract on powerfail so
> unless the entire mechanism actually jammed, there's no way for a powerfail
> to cause a head crash or any other destructive event.
>
> Bottom line - if your OS reboots successfully, there's no excuse for your
> database to not also come up successfully, fully intact.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.   http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Jean-Christophe Deschamps
At 17:55 01/02/2016, you wrote:
>The above about implementation of RAID is good. There were battery
>backed up caching controllers 20 years ago. In the event of a power
>loss, the cached writes could be completed later.

I run one such Areca RAID controller with 8 server disks in RAID6. Even 
after trying hard to make writes fail, I never had a single incident in 
many years. I'm sure many other users still run such setups today; no 
need to go that far back in time.



[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Howard Chu
Stephen Chrzanowski wrote:
> @Rowan;
>
> First off, whether the OS or SQLite is ACID or not, if you pull the plug on
> your hardware, all bets are off on whether it'll BOOT, let alone recover a
> single transaction.  I get that this could be a useful tool when doing
> disaster proofing, but, at that stage in the game of bulletproofing, you
> can't win every battle, and you're running into that at 100 miles an hour.

Your expectations are pretty low. On a properly configured Unix host, there's 
no reason for a powerfail to prevent a successful reboot. E.g., if you mount 
boot and root filesystems as read-only filesystems, they can never get 
corrupted. If you're using modern filesystems for your writable partitions 
(e.g., FSs with journaling) then there's also no reason for them to fail to 
come back online.

So it just comes down to your application code being reliable.

I should note that SQLightning has none of the problems being described in 
this thread - in its default mode, it is full-ACID and a powerfail cannot lose 
data or corrupt the database. And it does all this while being at least 30% 
faster on writes than vanilla SQLite.

Yes, the OS could have bugs. Yes, the hardware could physically fail. That's 
pretty rare though; HDDs R/W heads auto-retract on powerfail so unless the 
entire mechanism actually jammed, there's no way for a powerfail to cause a 
head crash or any other destructive event.

Bottom line - if your OS reboots successfully, there's no excuse for your 
database to not also come up successfully, fully intact.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Adam Devita
At the risk of repeating something mentioned last week on this thread.
One tactic to reduce/avoid the no-directory-sync problem is to use WAL
mode. The commit in WAL is a write to the WAL file, so the directory
sync problem goes away.
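This tactic is a one-line configuration change. A minimal sketch (file names are arbitrary; the pragmas are real):

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "wal.db")
conn = sqlite3.connect(db)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=FULL")  # fsync the WAL on each commit
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()
# The commit appended to wal.db-wal; nothing was unlinked, so the
# directory-sync question never arises on the commit path.
wal_present = os.path.exists(db + "-wal")
```

The unlink only happens later, at checkpoint/close time, which moves the durability question out of the per-commit path.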

If you want to be in paranoid mode, don't trust others. Why not use
the backup api before checkpointing and after the checkpoint has
succeeded, check that both dbs have the same state before deleting (or
archiving) the backup?

Another tactic to handle that hasn't been discussed (if memory serves
me) that I'm curious if the following would work to get around the
directory sync issue:
Separate SQLite telling your program that the transaction is done
from the program telling the user. Don't tell the *user* that the
transaction is done until you have confirmed the .journal file that
existed before your commit no longer exists after it, so a power off
rollback can't happen. Could the OS lie and say the .journal file has
been deleted only to have it re-appear if the power failure is at the
'wrong' time?


The above about implementation of RAID is good. There were battery
backed up caching controllers 20 years ago. In the event of a power
loss, the cached writes could be completed later.

regards,
Adam




On Mon, Feb 1, 2016 at 10:14 AM, Howard Chu  wrote:
> Stephen Chrzanowski wrote:
>>
>> @Rowan;
>>
>> First off, whether the OS or SQLite is ACID or not, if you pull the plug
>> on
>> your hardware, all bets are off on whether it'll BOOT, let alone recover a
>> single transaction.  I get that this could be a useful tool when doing
>> disaster proofing, but, at that stage in the game of bulletproofing, you
>> can't win every battle, and you're running into that at 100 miles an hour.
>
>
> Your expectations are pretty low. On a properly configured Unix host,
> there's no reason for a powerfail to prevent a successful reboot. E.g., if
> you mount boot and root filesystems as read-only filesystems, they can never
> get corrupted. If you're using modern filesystems for your writable
> partitions (e.g., FSs with journaling) then there's also no reason for them
> to fail to come back online.
>
> So it just comes down to your application code being reliable.
>
> I should note that SQLightning has none of the problems being described in
> this thread - in its default mode, it is full-ACID and a powerfail cannot
> lose data or corrupt the database. And it does all this while being at least
> 30% faster on writes than vanilla SQLite.
>
> Yes, the OS could have bugs. Yes, the hardware could physically fail. That's
> pretty rare though; HDDs R/W heads auto-retract on powerfail so unless the
> entire mechanism actually jammed, there's no way for a powerfail to cause a
> head crash or any other destructive event.
>
> Bottom line - if your OS reboots successfully, there's no excuse for your
> database to not also come up successfully, fully intact.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.   http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/



-- 
--
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Simon Slavin

On 1 Feb 2016, at 11:24am, Rowan Worth  wrote:
> 
> I take your point, but as Florian pointed out it's not just one file
> system; it's a somewhat well-known quirk of POSIX fsync.
> http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/
> 
> It's a bit disingenuous to say "the file system requires a particular dance
> to ensure an unlink persists on disk, but the dance is excessively complex
> so we're going to skip it and blame any problems on the file system." The
> fact remains that sqlite returned SQLITE_OK for a transaction that never
> succeeded, because it didn't take the steps required for the i/o to hit
> disk.

Well, that's a fair point.  The problem is that SQLite does not know it's 
running on POSIX.  It does, however, have different VFSes for different 
platforms.  So the logical place to put this would be in the VFS.  At least 
that's what it looks like to me but I don't know much about the programming 
that goes into SQLite.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Florian Weimer
On 01/25/2016 04:47 PM, Richard Hipp wrote:
> On 1/25/16, Matthias-Christian Ott  wrote:
>>
>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
>> transaction that it said was committed, depending on the VFS?
> 
> Sort of.  This appears to be true if you are running on QNX and you
> lose power (or do a hard reset) shortly after the transaction commits.
> It might be the case on other OSes/filesystems but it has never before
> been observed.

Both XFS and Ceph on Linux require the directory fsync operation to make
an unlink persistent.  (I did not run experiments, I asked people
familiar with these file systems.)

Please activate directory fsyncs for Linux, too.

Florian



[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Simon Slavin

On 1 Feb 2016, at 9:23am, bm.email01 at gmail.com wrote:

> ---
> No, SQLite does not. On COMMIT it fsyncs the database file and unlinks the
> journal[1], but does not fsync the directory.
> ---
> 
> Since that can cause the last transaction to be lost, despite Sqlite 
> returning a 'Commit successful' code to the application, doesn't that mean 
> that Sqlite is _not_ truly 'Durable' (per ACID)?

1) The fault here is the file system not SQLite.  This one particular file 
system has this strange requirement that you fsync the directory.  SQLite is 
ACID.  It's the file system that messes it up.  SQLite cannot defend against 
untrustworthy middleware -- sooner or later a programmer has to trust their 
hardware is doing what it says it's doing.

2) Even if this were fixed, with an fsync of the directory included in the VFS, 
your storage system lies to the OS.  So you still wouldn't get ACID behaviour 
because your storage system isn't ACID.  SQLite cannot defend against 
untrustworthy storage -- sooner or later a programmer has to trust their 
hardware is doing what it says it's doing.

3) For both the above problems SQLite still neither loses data nor corrupts the 
database file.  If the program crashes the file is automatically closed.  When 
the SQLite API is used to reopen the file the unexpected close will be 
discovered and corrected for.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Rowan Worth
On 31 January 2016 at 03:56, James K. Lowden 
wrote:

> Surely SQLite does both -- fsync on file and directory -- as part of a
> commit. That's not in doubt, is it?
>

No, SQLite does not. On COMMIT it fsyncs the database file and unlinks the
journal[1], but does not fsync the directory. This is demonstrated by the
logs in Stefan's initial post, and indeed the purpose of his patch was to
introduce an fsync on the directory after unlinking the journal.

[1] with the default PRAGMA journal_mode=delete, obviously
-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Scott Robison
On Mon, Feb 1, 2016 at 9:55 AM, Adam Devita  wrote:

> At the risk of repeating something mentioned last week on this thread.
> One tactic to reduce/avoid the no-directory sync problem is to use WAL
> mode. The commit in WAL is a write to the WAL file, so the directory
> sync problem goes away.
>

Further, if durability is important to your application, there are truncate
and persist options in addition to delete that might be more effective for
a particular use case.

Again, as I suggested last night (somewhat tongue in cheek) we could have a
paranoid journal mode that does persist, truncate and delete in that order.
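That order (persist, then truncate, then delete) can be illustrated on an ordinary stand-in file, fsyncing after each step. This is only a sketch of the idea, not SQLite's actual journal code; the 28-byte figure is just a header-sized region for the demo.

```python
import os
import tempfile

def paranoid_remove(path):
    # "Persist" -> "truncate" -> "delete", with an fsync after each
    # durable step, so whichever state survives a power cut, the
    # journal is never mistaken for a valid hot journal.
    fd = os.open(path, os.O_RDWR)
    try:
        os.write(fd, b"\x00" * 28)  # persist: zero the header region
        os.fsync(fd)
        os.ftruncate(fd, 0)         # truncate: drop the contents
        os.fsync(fd)
    finally:
        os.close(fd)
    os.unlink(path)                 # delete comes last

d = tempfile.mkdtemp()
j = os.path.join(d, "stand-in-journal")
with open(j, "wb") as f:
    f.write(b"J" * 512)
paranoid_remove(j)
```

Even if the unlink itself is lost to the directory-sync problem, the surviving file has a zeroed header and zero length, so a recovery pass would ignore it.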


> If you want to be in paranoid mode, don't trust others. Why not use
> the backup api before checkpointing and after the checkpoint has
> succeeded, check that both dbs have the same state before deleting (or
> archiving) the backup?
>

In this case you aren't necessarily ahead because if the information is
cached in RAM but not flushed to disk, you could (theoretically) get a 100%
match yet not have the bits of either file truly synced to disk.
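For reference, the backup-and-compare idea looks like this with the online backup API (exposed in Python 3.7+ as `Connection.backup`; table and file names here are made up for the demo):

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
src = sqlite3.connect(os.path.join(d, "main.db"))
src.execute("CREATE TABLE t(x)")
src.execute("INSERT INTO t VALUES (1)")
src.commit()

dst = sqlite3.connect(os.path.join(d, "backup.db"))
src.backup(dst)  # online backup of the whole database

# Logical comparison: proves the two databases agree, but -- per the
# caveat above -- not that either file's bytes have hit the platter.
a = src.execute("SELECT x FROM t").fetchall()
b = dst.execute("SELECT x FROM t").fetchall()
```

The comparison can only ever be as trustworthy as the caching layers underneath it, which is the point being made.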


> Another tactic to handle that hasn't been discussed (if memory serves
> me) that I'm curious if the following would work to get around the
> directory sync issue:
> Separate Sqlite  telling your program that the transaction is done
> from the program telling the user. Don't tell the *user* that the
> transaction is done until you have confirmed the .journal file that
> existed before your commit no longer exists after it, so a power off
> rollback can't happen. Could the OS lie and say the .journal file has
> been deleted only to have it re-appear if the power failure is at the
> 'wrong' time?
>

Isn't that the exact nature of the problem being discussed? Journal is
deleted, hard power loss occurs, and journal file still exists when the OS
next powers up and looks for the journal file?

-- 
Scott Robison


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread bm.emai...@gmail.com
---
No, SQLite does not. On COMMIT it fsyncs the database file and unlinks the
journal[1], but does not fsync the directory.
---

Since that can cause the last transaction to be lost, despite Sqlite returning 
a 'Commit successful' code to the application, doesn't that mean that Sqlite is 
_not_ truly 'Durable' (per ACID)?

Best regards



Sent from my BlackBerry® smartphone on O2

-Original Message-
From: Rowan Worth <row...@dugeo.com>
Sender: sqlite-users-bounces at mailinglists.sqlite.org
Date: Mon, 1 Feb 2016 10:35:34
To: SQLite mailing list
Reply-To: SQLite mailing list
Subject: Re: [sqlite] Bug: Successfully committed transaction rolled back after power failure

On 31 January 2016 at 03:56, James K. Lowden 
wrote:

> Surely SQLite does both -- fsync on file and directory -- as part of a
> commit. That's not in doubt, is it?
>

No, SQLite does not. On COMMIT it fsyncs the database file and unlinks the
journal[1], but does not fsync the directory. This is demonstrated by the
logs in Stefan's initial post, and indeed the purpose of his patch was to
introduce an fsync on the directory after unlinking the journal.

[1] with the default PRAGMA journal_mode=delete, obviously
-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Stephen Chrzanowski
@Rowan;

First off, whether the OS or SQLite is ACID or not, if you pull the plug on
your hardware, all bets are off on whether it'll BOOT, let alone recover a
single transaction.  I get that this could be a useful tool when doing
disaster proofing, but, at that stage in the game of bulletproofing, you
can't win every battle, and you're running into that at 100 miles an hour.

Next, SQLite *CAN* tell the OS that it is done with its data, that the OS
can close file handles, and that the OS can flush the cache to the disk.
Whether or not the OS, drivers, or disk drive obeys the command is an
entirely different story.

Now for my spiel;

SQLite is nothing more than part of a program run by the OS.  It completely
relies on whatever the OS tells it.  If the OS tells it that things are OK,
then that is all that can be done.  SQLite can't take on the responsibility
of making sure that my taxes are done right, it won't ever wash my car (I'd
be afraid the T would scratch the paint anyways) and it absolutely cannot
defend against the lies an OS gives it.  Believe me, computers are
sometimes worse liars than humans are when it comes to internal stability.
Look at SMART and the number of times it does error correction.  Were you
aware that digital TVs do error correction when getting data from your PVR,
Satellite receiver, or cable TV due to electrical noise?  Looks all good to
you while you're watching the tube, but it is constantly correcting errors
and lying that things are OK.

SQLite doesn't manage your OS, it doesn't manage the firmware running on
the drives.  It doesn't care whether it runs over SATA, IDE, SCSI, USB,
Firewire, Ethernet, or SAN.  It relies on the OS to get whatever data it
asks for, in either a read or write situation, and depends entirely on the
OS to get the job done.  SQLite does not manage the sector by sector reads
and writes, it doesn't handle the physical writing of the bits to the
platter.  The OS handles that via its drivers, and if the drivers lie, or
the OS isn't resilient enough for your needs, find a different OS or file
system.

The problem can also persist beyond the OS.  The hard drive's own firmware
can lie to the OS.  Error correction happens after it tries to write the
data.  Some drives will make the write attempt once, fail but still send back
an OK, try again, and THEN send a fail back at the next operation, which may
not even be related to your program.

Going back to the OS, I've seen instances, on both Windows and Linux based
platforms (RH and Debian), where file sizes do not change even though streams
of data are being pumped into that particular file.  I've seen situations
where files have been deleted in Linux (not allowed in Windows) and data
still gets dumped into the now-deleted file.  Is it the program's fault that
the file is physically deleted but the program isn't aware?  Is it supposed
to check every single time it writes a byte to see if the file exists, if
the partition is healthy, and the root file system passes an FSCK at every
commit?

So yes, you HAVE to blame the file system and/or OS because SQLite isn't
managing a file system.  It manages a stream of data packaged into
particular lengths of raw data we call pages, and tells the OS that it needs to
read a certain number of bytes at a specific offset from the
beginning of a file.  SQLite doesn't know or care what sector the bytes
live at on the drive physically.  It doesn't know about fragmentation, it
doesn't know what size the clusters are, or anything that is important for
file storage.

On Mon, Feb 1, 2016 at 6:53 AM, Simon Slavin  wrote:

>
> On 1 Feb 2016, at 11:24am, Rowan Worth  wrote:
> >
> > I take your point, but as Florian pointed out it's not just one file
> > system; it's a somewhat well-known quirk of POSIX fsync.
> >
> http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/
> >
> > It's a bit disingenuous to say "the file system requires a particular
> dance
> > to ensure an unlink persists on disk, but the dance is excessively
> complex
> > so we're going to skip it and blame any problems on the file system." The
> > fact remains that sqlite returned SQLITE_OK for a transaction that never
> > succeeded, because it didn't take the steps required for the i/o to hit
> > disk.
>
> Well, that's a fair point.  The problem is that SQLite does not know it's
> running on POSIX.  It does, however, have different VFSes for different
> platforms.  So the logical place to put this would be in the VFS.  At least
> that's what it looks like to me but I don't know much about the programming
> that goes into SQLite.
>
> Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-31 Thread Scott Robison
On Sun, Jan 31, 2016 at 7:35 PM, Rowan Worth  wrote:

> On 31 January 2016 at 03:56, James K. Lowden 
> wrote:
>
> > Surely SQLite does both -- fsync on file and directory -- as part of a
> > commit. That's not in doubt, is it?
> >
>
> No, SQLite does not. On COMMIT it fsyncs the database file and unlinks the
> journal[1], but does not fsync the directory. This is demonstrated by the
> logs in Stefan's initial post, and indeed the purpose of his patch was to
> introduce an fsync on the directory after unlinking the journal.
>

Perhaps there could be a "paranoid" journal mode, where it first zeros out
the header à la persist, then truncates the file, then deletes the file.

-- 
Scott Robison


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-31 Thread James K. Lowden
On Sat, 30 Jan 2016 20:00:19 +
Simon Slavin  wrote:

> On 30 Jan 2016, at 7:56pm, James K. Lowden 
> wrote:
> 
> > Given that the fsync has returned successfully, I don't know of any
> > hardware that then will take 1000 ms to complete the write.  That's
> > the basis for my "subsecond interval" assumption. 
> 
> Writing to a RAID which has other write commands queued.

As Keith says, you get what you pay for.  I would say that hardware
that behaves as you describe is broken.  Hardware must expose some
kind of "yes, definitely written" notification interface.  If it does,
that's what fsync abstracts.  If it doesn't, it's broken, because
"write" only means, "whatever, dude".  

A 7200 RPM drive is capable of ~75 I/O operations/second.  To reach my
1-second threshold, the RAID controller would have to have that many
queued operations.  Not only that: it would have to have the chutzpah
to claim to have written the data, knowing it couldn't guarantee its
promise.  Is Bernie Madoff consulting for Seagate?  

You might know more about this than I do.  The last time I dealt with
anything in the vicinity, write-queue depths of 10 were considered
problematic.  100 would have been catastrophic.  

Stefan Meinlschmidt mentioned "on Linux the interval can be, IIRC, up
to 30s".  I think that's a point of confusion.  That 30s sounds to me
suspiciously like the kernel's sync checkpoint interval.  Surely
buffered writes inside the device are not subject to "up to" 30s; the
device (in general) has no such timer.  

--jkl


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-30 Thread Simon Slavin

On 30 Jan 2016, at 7:56pm, James K. Lowden  wrote:

> Given that the fsync has returned successfully, I don't know of any
> hardware that then will take 1000 ms to complete the write.  That's the
> basis for my "subsecond interval" assumption. 

Writing to a RAID which has other write commands queued.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-30 Thread James K. Lowden
On Thu, 28 Jan 2016 08:00:08 +
Meinlschmidt Stefan  wrote:

> > But I ask you, what action could the application possibly take, in
> > that subsecond interval, that it matters?
> 
> Under the QNX OS using a QNX6 filesystem with default configuration,
> that "subsecond interval" is actually up to 10s. For any
> non-journalling filesystem (SD cards, anyone?) mounted without
> immediate write-through (for efficiency) on Linux the interval can
> be, IIRC, up to 30s. 

Are we talking about the same thing?  write(2) makes no guarantee about
when the bits will actually be written.  It transfers the data to the
kernel.  fsync(2) *does* commit the data to media and does not return
until the hardware indicates the write is complete.  

That's the data, the extent of the inode.  For metadata, e.g. mtime,
Linux requires a second fsync, on the containing
directory.

Surely SQLite does both -- fsync on file and directory -- as part of a
commit. That's not in doubt, is it?  

Given that the fsync has returned successfully, I don't know of any
hardware that then will take 1000 ms to complete the write.  That's the
basis for my "subsecond interval" assumption.  

The OP's question involved Linux, and that I think is covered. As for
QNX, if the default configuration has broken fsync semantics, it's hard
to see what SQLite can do about that.  The DBMS has to rely on the OS to
write the data, and there has to be some way to ascertain that's been
done.  


> The application could for example write to some non-SQLite
> storage  Or (what we did) it could shut off device power.

Yes.  In general it's better not to maintain state in two places; if
the application writes "committed to database" status after committing
to the database, it's going to have to contend with inconsistencies on
startup after a crash.  And your scenario is reasonable: you committed
the data, and now you can turn off the device until it's needed again.  

In such a constrained environment, you need reliable status from the
device.  If you can't get that, you're definitely hosed.  If you can
get it, what does it look like?  Perhaps the SQLite VFS can expose that
as a callback or a knob.  

--jkl




[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-30 Thread Keith Medcalf

That varies with your definition of RAID.  With a decent hardware RAID 
controller that works properly, the RAID controller can have megabytes or 
gigabytes of data which has not been written to the physical spinning disk when 
power is abruptly turned off.  And none of it will be lost.

Then you have a range of getting ever cheaper hardware RAID where varying 
amounts of data get lost, either from the controller or the embedded drive 
cache.

Then you have "fake RAID", also known as software RAID, where all data not 
physically written is lost.

In other words you get what you pay for.

> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> bounces at mailinglists.sqlite.org] On Behalf Of Simon Slavin
> Sent: Saturday, 30 January, 2016 13:00
> To: SQLite mailing list
> Subject: Re: [sqlite] Bug: Successfully committed transaction rolled back
> after power failure
> 
> 
> On 30 Jan 2016, at 7:56pm, James K. Lowden 
> wrote:
> 
> > Given that the fsync has returned successfully, I don't know of any
> > hardware that then will take 1000 ms to complete the write.  That's the
> > basis for my "subsecond interval" assumption.
> 
> Writing to a RAID which has other write commands queued.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users





[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-29 Thread Bernard McNeill
===
...you must fsync the containing directory...
===

Is there an Sqlite option to achieve that?


In fact, to summarise:
Suppose I would like to maximise my chances of avoiding the 'Lost Post-It'
problem described above.
What are _all_ the Sqlite compile-time options, and their values, needed
under Linux?

(I appreciate the disk hard/firmware will need to be looked at as well)

Best regards



On Thu, Jan 28, 2016 at 4:37 PM, Howard Chu  wrote:

> Simon Slavin wrote:
>
>>
>> On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
>>
>> ===
>>> Like the user reading "saving OK" and throwing away the
>>> Post-It with the original information
>>> ===
>>>
>>> This is exactly my concern.
>>> The user throwing away the Post-It is entirely reasonable if he sees a
>>> message like that.
>>>
>>> Do you happen to know if Linux/Debian (which I think uses a journalling
>>> filesystem) carries this risk?
>>>
>>
>> The problem is not at the software level.
>>
>
> Not true. There *is* a problem at the software level - on Linux, current
> BSD (and apparently also on QNX) you must fsync the containing directory
> when you make changes to the contents of a directory (create/delete/rename
> files). This is above and beyond whatever lies the hardware layer may tell
> you. It's a documented requirement in Linux, at least. It is also
> independent of whether or not the filesystem uses journaling.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.   http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Howard Chu
Simon Slavin wrote:
>
> On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
>
>> ===
>> Like the user reading "saving OK" and throwing away the
>> Post-It with the original information
>> ===
>>
>> This is exactly my concern.
>> The user throwing away the Post-It is entirely reasonable if he sees a
>> message like that.
>>
>> Do you happen to know if Linux/Debian (which I think uses a journalling
>> filesystem) carries this risk?
>
> The problem is not at the software level.

Not true. There *is* a problem at the software level - on Linux, current BSD 
(and apparently also on QNX) you must fsync the containing directory when you 
make changes to the contents of a directory (create/delete/rename files). This 
is above and beyond whatever lies the hardware layer may tell you. It's a 
documented requirement in Linux, at least. It is also independent of whether 
or not the filesystem uses journaling.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Simon Slavin

On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:

> ===
> Like the user reading "saving OK" and throwing away the
> Post-It with the original information
> ===
> 
> This is exactly my concern.
> The user throwing away the Post-It is entirely reasonable if he sees a
> message like that.
> 
> Do you happen to know if Linux/Debian (which I think uses a journalling
> filesystem) carries this risk?

The problem is not at the software level.  Various operating systems and file 
systems are correctly programmed with regard to waiting for write commands to 
complete.  I don't know specifically about Debian but Linux has a good 
reputation for such things, and anyone who bothers to write a journalling file 
system would understand how to do things properly.

The problem is at the hardware level.  Standard disk drives (including their 
motherboard if they have one, and their firmware) are designed for speed, not 
integrity.  The assumption is that you will be using them to play games or 
write your CV in Word, not to keep vital data.  So they are set up, using their 
default jumper positions, to lie.  In order to keep their computer running as 
fast as possible, instead of

1) receive write command
2) perform write command
3) read that bit of disk to confirm the change
4) if not, bring SMART system into play and try writing it somewhere else
5) if succeed, tell the computer "I wrote that and it worked."
6) otherwise tell the computer "I wrote that and it failed."

they do this

1) receive write command
2) tell the computer "I wrote that and it worked."
3) perform write command
4) read that bit of disk to confirm the change
5) if not, bring SMART system into play and try writing it somewhere else

Doing things this way around means that the computer has to wait less than half 
the time to continue with the next task.  Which is what users want.

Notice that with this behaviour it doesn't matter how fault-tolerant or ACID 
your file system is, because it has to trust its storage system and the storage 
system is lying to it.

You can get server-quality disks (sometimes called "enterprise drives") which 
do this correctly.  They cost more because they have to survive longer and more 
complicated testing at the factory before they're sold to you.  And if you put 
one in someone's computer they'd complain that it takes 20 seconds to load a 
Facebook page and they get lousy frame rates on their favourite game.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Meinlschmidt Stefan
> ===
> Like the user reading "saving OK" and throwing away the
> Post-It with the original information
> ===
> 
> This is exactly my concern.
> The user throwing away the Post-It is entirely reasonable if he sees a
> message like that.
> 
> Do you happen to know if Linux/Debian (which I think uses a journalling
> filesystem) carries this risk?

I do not know Linux' current behaviour, especially with recent
filesystems. I seem to remember from last century that Linux by default
automatically issues a sync every 30s. So every filesystem that is
writable and uses a write-back cache (i.e. that doesn't write right
away) would have a 0s to 30s amnesia window. (V)FAT would be a
candidate, but SD
cards usually are parameterized at mount to actually write through, i.e.
are less efficient but have no problem. I have no clue what journalling
file systems do here.

S.M.
-- 
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
Fax: +49-8458-3332-20-531


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Bernard McNeill
===
Like the user reading "saving OK" and throwing away the
Post-It with the original information
===

This is exactly my concern.
The user throwing away the Post-It is entirely reasonable if he sees a
message like that.

Do you happen to know if Linux/Debian (which I think uses a journalling
filesystem) carries this risk?

Best regards


On Thu, Jan 28, 2016 at 8:00 AM, Meinlschmidt Stefan <
Stefan.Meinlschmidt at esolutions.de> wrote:

> >> Using the standard defaults (which avoid WAL), is there any
> >> possibility whatsoever of that last SQL transaction being lost?
> >
> > I have an unusual answer: Yes, and it doesn't matter.
>
> While I agree in principle, your answer depends on some assumptions that
> need not hold.
>
> > Let's suppose, as you did, that the application got back "Commit
> > Successful" followed quickly by a power failure.  You want to know,
> > could the transaction be lost anyway?  But I ask you, what action could
> > the application possibly take, in that subsecond interval, that it
> > matters?
>
> Under the QNX OS using a QNX6 filesystem with default configuration,
> that "subsecond interval" is actually up to 10s. For any non-journalling
> filesystem (SD cards, anyone?) mounted without immediate write-through
> (for efficiency) on Linux the interval can be, IIRC, up to 30s. So how
> much can happen in this period is very sensitive to details not
> necessarily under control of or even available to the SQLite user.
>
> The application could for example write to some non-SQLite storage
> (other file system, raw flash, physical journal printout, ...) and try to
> guarantee global consistency by waiting for the SQLite transaction to
> complete. Like the user reading "saving OK" and throwing away the
> Post-It with the original information. Or (what we did) it could shut
> off device power.
>
> > There is no God's-eye view of application state.  The important service
> > provided by the DBMS is not "what's committed is definitely saved", but
> > rather that "what's committed is definitely *consistent*".
>
> So when your application requires consistency of some broader scope, you
> need the DBMS to give you enough rope^h^h^h^h^h^h^h^h^h^h^hthe tools to
> implement that yourself. Without a durability guarantee you're screwed.
>
> The more frequent simpler usecases of course are not susceptible to that
> and then indeed it doesn't matter.
>
> S.M.
> --
> Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
> Am Wolfsmantel 46, 91058 Tennenlohe, Germany
> Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
> Fax: +49-8458-3332-20-531
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Meinlschmidt Stefan
>> Using the standard defaults (which avoid WAL), is there any
>> possibility whatsoever of that last SQL transaction being lost?
> 
> I have an unusual answer: Yes, and it doesn't matter.  

While I agree in principle, your answer depends on some assumptions that
need not hold.

> Let's suppose, as you did, that the application got back "Commit
> Successful" followed quickly by a power failure.  You want to know,
> could the transaction be lost anyway?  But I ask you, what action could
> the application possibly take, in that subsecond interval, that it
> matters?

Under the QNX OS using a QNX6 filesystem with default configuration,
that "subsecond interval" is actually up to 10s. For any non-journalling
filesystem (SD cards, anyone?) mounted without immediate write-through
(for efficiency) on Linux the interval can be, IIRC, up to 30s. So how
much can happen in this period is very sensitive to details not
necessarily under control of or even available to the SQLite user.

The application could for example write to some non-SQLite storage
(other file system, raw flash, physical journal printout, ...) and try to
guarantee global consistency by waiting for the SQLite transaction to
complete. Like the user reading "saving OK" and throwing away the
Post-It with the original information. Or (what we did) it could shut
off device power.

> There is no God's-eye view of application state.  The important service
> provided by the DBMS is not "what's committed is definitely saved", but
> rather that "what's committed is definitely *consistent*".

So when your application requires consistency of some broader scope, you
need the DBMS to give you enough rope^h^h^h^h^h^h^h^h^h^h^hthe tools to
implement that yourself. Without a durability guarantee you're screwed.

The more frequent simpler usecases of course are not susceptible to that
and then indeed it doesn't matter.

S.M.
-- 
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
Fax: +49-8458-3332-20-531


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-27 Thread James K. Lowden
On Wed, 27 Jan 2016 08:51:16 +
Bernard McNeill  wrote:

> Using the standard defaults (which avoid WAL), is there any
> possibility whatsoever of that last SQL transaction being lost?

I have an unusual answer: Yes, and it doesn't matter.  

Let's suppose, as you did, that the application got back "Commit
Successful" followed quickly by a power failure.  You want to know,
could the transaction be lost anyway?  But I ask you, what action could
the application possibly take, in that subsecond interval, that it
matters?  The failure could have happened an instant earlier, just
before "Commit Successful", and it would have to recover from that.  I
can think of no scenario in which the committed-but-lost transaction is
very different from the uncommitted-and-lost transaction.  

There is no God's-eye view of application state.  The important service
provided by the DBMS is not "what's committed is definitely saved", but
rather that "what's committed is definitely *consistent*".  If on
recovery the DBMS finds that,  contrary to its prior report, the state
of the database does not warrant including the final transaction, it
rolls it back, and presents the data in an internally consistent
state.  The application doesn't have to cope with the ledger being out
of balance and suchlike.  

--jkl


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-27 Thread Rowan Worth
On 25 January 2016 at 18:26, Meinlschmidt Stefan <
Stefan.Meinlschmidt at esolutions.de> wrote:

> > In your case it sounds like a controlled shutdown - is there a reason you
> > don't do a full disk sync before that?
>
> Yes, it is a controlled shutdown, so in my case the /* post-commit logic
> */ basically pulls the plug.
>
> Trouble is that I only control the database, not the shutdown procedure
> (this is a commercial product with several hundred people working on
> different aspects of the system). So while I can try to ask the shutdown
> crew to sync like any real computer would do, I ultimately have no
> saying in that but still need to ensure survival of data.
>
> Without the patch (which I do have applied locally, of course), me and
> everyone with a similar usecase get into trouble for relying on the
> quite excellent reputation of SQLite. Actually I first fingerpointed to
the flash hardware or its driver, because "SQLite is well-tested and
doesn't have this kind of bug" :-)
>

But if the shutdown procedure doesn't actually sync the disk, surely it's
not just sqlite users that will suffer? _Anything_ which has done a write()
in the last ~10 seconds without sync()ing is going to lose data, and unlike
sqlite the vast majority of standard tools _never_ call sync().

Obviously I don't know much about your project/environment, so perhaps this
is what you want. It just strikes me as a problem easier solved at the
system level is all :)

-Rowan


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-27 Thread Simon Slavin

On 27 Jan 2016, at 8:51am, Bernard McNeill  wrote:

> Situation: Under Linux/Debian, Sqlite opens an entirely valid DB, and runs
> an entirely valid SQL transaction against that database.
> Following a Commit, the application gets back a 'Commit Successful' code.
> (Ignore any issues of disks returning hardware 'write done' flags
> prematurely).

We can ignore this, but almost all systems are set up that way.  So we 
shouldn't.

> There is then a power/OS failure.
> 
> Using the standard defaults (which avoid WAL), is there any possibility
> whatsoever of that last SQL transaction being lost?

If all of these ...

1) there was a gap of a second or so for the writing commands to complete
2) the power failure doesn't cause spurious data to be written to disk
3) the power failure doesn't cause low-level disk format problems
4) the power failure doesn't physically damage the drive
5) some other part of the OS doesn't get confused and write over the database

then you should have an uncorrupted database with the last transaction 
correctly written.

Under these circumstances the database is held not just in the database file 
but partly in the journal file.  If you analyse just the database file you may 
conclude that you have a corrupted database or that the most recent transaction 
has been lost.  One of the jobs of the SQLite _open() commands is to notice 
that the database file and/or the journal file indicate that the database 
wasn't closed cleanly, and to rescue the most recent possible uncorrupted 
database.  It will do this without informing the calling program.  The calling 
program will just think the database was fine to begin with.

Simon.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-27 Thread Bernard McNeill
Just to be clear

Situation: Under Linux/Debian, Sqlite opens an entirely valid DB, and runs
an entirely valid SQL transaction against that database.
Following a Commit, the application gets back a 'Commit Successful' code.
(Ignore any issues of disks returning hardware 'write done' flags
prematurely).
There is then a power/OS failure.

Using the standard defaults (which avoid WAL), is there any possibility
whatsoever of that last SQL transaction being lost?


Best regards

On Mon, Jan 25, 2016 at 8:39 PM, Richard Hipp  wrote:

> On 1/25/16, Howard Chu  wrote:
> >
> > This is actually quite an unusual requirement; on older Unix systems you
> > couldn't even *open* a directory, let alone obtain write access to it or
> > fsync it.
>
> Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
> we disable the directory sync logic for this reason.  Some unixes
> (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
> MacOS, and *BSD all work without it, so I thought I'd just not bring
> that up...
> --
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-26 Thread Howard Chu
Richard Hipp wrote:
> On 1/25/16, Howard Chu  wrote:
>>
>> This is actually quite an unusual requirement; on older Unix systems you
>> couldn't even *open* a directory, let alone obtain write access to it or
>> fsync it.
>
> Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
> we disable the directory sync logic for this reason.  Some unixes
> (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
> MacOS, and *BSD all work without it, so I thought I'd just not bring
> that up...

I would have to say this is a BSD OS bug as it breaks the guarantees stated in 
the manpages. I.e., rename() and unlink() syscalls are documented to be 
atomic, and fsync() doesn't say anything about being needed to sync a directory.

http://www.unix.com/man-page/FreeBSD/2/fsync
http://www.unix.com/man-page/FreeBSD/2/unlink
http://www.unix.com/man-page/FreeBSD/2/rename/

I no longer have BSD source code on hand but I'd bet that when those manpages 
were written, all directory modifications in the BSD ffs were always 
synchronous. Linux obviously changed this but at least their fsync() manpage 
documents the behavior.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-26 Thread Warren Young
On Jan 25, 2016, at 8:47 AM, Richard Hipp  wrote:
> 
> The feedback I receive is that most users of SQLite would much rather
> avoid the extra directory syncs, even if it means having the last
> transaction rollback following a power loss.

Why not do the directory fsync in sqlite3_close_v2()?

As I understand the original problem, this is happening in a system doing a 
controlled shutdown, not a crash or spontaneous reboot.

I'm close to this problem at the moment because we recently switched to using 
WAL mode by default, and I noticed that if there is a code path that causes 
sqlite3_close_v2() to be skipped, the WAL file remains behind, causing a 
SQLITE_NOTICE_RECOVER_WAL complaint when the app starts back up again.

If the last writer to a SQLite DB closes its connection down gracefully, there 
should be no uncertainty about whether all transactions have durably hit the 
disk.

I can live with such uncertainty if the last writer *doesn't* gracefully close 
its connection.  That's kind of concomitant with using an in-process DBMS.

(Contrasting with a client-server DBMS, where durability is not compromised if 
a remote client disappears after COMMIT without gracefully closing its TCP 
connection afterward.)


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread R Smith


On 2016/01/25 5:08 PM, Matthias-Christian Ott wrote:
> On 25/01/16 14:14, Richard Hipp wrote:
>> On 1/19/16, Meinlschmidt Stefan  wrote:
>>> Shutting down power right after a successfully committed
>>> transaction rolls back that transaction on next startup.
>> Patches checked in:
>>
>>  https://www.sqlite.org/src/info/30671345b1c1ee55
>>  https://www.sqlite.org/draft/compile.html#extra_durable
> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
> transaction that it said to be committed depending on the VFS?
>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?

I understand your concern, but don't confuse the loss of a transaction 
with any form of inconsistency or broken database integrity. It 
essentially means that on some OSes (to knowledge, only QNX currently) 
in the event of power loss or system breakdown right at the moment you 
were editing and saving some value, you might suffer the inconvenience 
of having to add that one song to your music list again, or have to add 
the name of that one contact again. This is not really an 
end-of-the-World-OMG-we-all-gonna-die! problem in general application 
terms. It's quite acceptable to 99% of databases.

Sure, if your database is critical and needs the durability to be 
absolute, you HAVE to go full synchronous regardless of OS, but that 
must already be obvious at the very onset of your design and everything 
you do or decide must keep this in mind. Such a developer would never 
simply roll with the defaults of any DB system - I'd hope.

The very idea of defaults is to cater for the majority, not the rarest 
of use-cases.



[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Matthias-Christian Ott
On 2016-01-25 16:47, Richard Hipp wrote:
> On 1/25/16, Matthias-Christian Ott  wrote:
>>
>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
>> transaction that it said to be committed depending on the VFS?
> 
> Sort of.  This appears to be true if you are running on QNX and you
> lose power (or do a hard reset) shortly after the transaction commits.
> It might be the case on other OSes/filesystems but it has never before
> been observed.
> 
> This is not new behavior.  This is apparently what SQLite has been
> doing for 15 years, across quadrillions of transactions on many
> billions of devices, and it has never before caused any issues, until
> just recently when Mr. Meinlschmidt upgraded to a newer version of
> QNX.

So it would make sense to add a note that you should check whether your
target VFS and target operating environment needs an fsync after a
journal commit if you want to use this journal mode. Would it be
possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux
distributions package SQLite and therefore not every application can
compile SQLite with different options.

> Note especially that this is about durability, not consistency.
> SQLite guarantees consistency regardless.  People care about
> consistency.  Durability, not so much.  I'm not a MySQL expert, but I
> think the historical complaints about MyISAM had more to do with
> consistency than with durability, did they not?

That's also my understanding. It's unarguably an odd comparison but I
was referring to the ACID properties in general.

- Matthias-Christian


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Howard Chu
Matthias-Christian Ott wrote:
> On 2016-01-25 16:47, Richard Hipp wrote:
>> On 1/25/16, Matthias-Christian Ott  wrote:
>>>
>>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
>>> transaction that it said to be committed depending on the VFS?
>>
>> Sort of.  This appears to be true if you are running on QNX and you
>> lose power (or do a hard reset) shortly after the transaction commits.
>> It might be the case on other OSes/filesystems but it has never before
>> been observed.
>>
>> This is not new behavior.  This is apparently what SQLite has been
>> doing for 15 years, across quadrillions of transactions on many
>> billions of devices, and it has never before caused any issues, until
>> just recently when Mr. Meinlschmidt upgraded to a newer version of
>> QNX.
>
> So it would make sense to add a note that you should check whether your
> target VFS and target operating environment needs an fsync after a
> journal commit if you want to use this journal mode. Would it be
> possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux
> distributions package SQLite and therefore not every application can
> compile SQLite with different options.

The question isn't just whether "an fsync is needed" with journal mode - the 
question is *which* fsync is needed? The issue here is that file 
creation/deletion/rename ops require an fsync *on the containing directory*. 
This is actually quite an unusual requirement; on older Unix systems you 
couldn't even *open* a directory, let alone obtain write access to it or fsync 
it.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Stephan Beal
On Mon, Jan 25, 2016 at 5:08 PM, Richard Hipp  wrote:

> On 1/25/16, Stephen Chrzanowski  wrote:
> >
> > You also have to look at balance across many millions (or is it
> billions?)
> > of devices out there that use SQLite for their primary operations.
>
> Billions and billions.
>

https://en.wikipedia.org/wiki/Billions_and_Billions


> I don't think it is even that serious.  The problem is that if you
> (for example) set a new bookmark on your browser just as the cat is
> tripping over the power cord, then after reboot the bookmark
> disappears.  The bookmark database is still completely intact - it
> just went backwards in time a little.
>

Or, alternately, it _never went forward_ in time.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Richard Hipp
On 1/25/16, Howard Chu  wrote:
>
> This is actually quite an unusual requirement; on older Unix systems you
> couldn't even *open* a directory, let alone obtain write access to it or
> fsync it.

Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
we disable the directory sync logic for this reason.  Some unixes
(HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
MacOS, and *BSD all work without it, so I thought I'd just not bring
that up...
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Matthias-Christian Ott
On 25/01/16 14:14, Richard Hipp wrote:
> On 1/19/16, Meinlschmidt Stefan  wrote:
>>
>> Shutting down power right after a successfully committed
>> transaction rolls back that transaction on next startup.
> 
> Patches checked in:
> 
> https://www.sqlite.org/src/info/30671345b1c1ee55
> https://www.sqlite.org/draft/compile.html#extra_durable

Does this mean that if I use SQLite with SQLITE_EXTRA_DURABLE=0, PRAGMA
journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
transaction that it said was committed, depending on the VFS?

If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
be more important than performance, except when the constraints are such
that correctness has to be sacrificed for performance?

The trade-off that is described in the description of SQLite
SQLITE_EXTRA_DURABLE reads like an excerpt from the MySQL manual when
MyISAM was still widely used. Perhaps I'm also too irritated by
discussions with advocates of MySQL who would argue against the fact
that proper transactions were necessary because the DBMS would be faster
without them. That is not to say that the ACID properties and
transactions solve every concurrency or correctness problem but they
help significantly.

- Matthias-Christian



[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Richard Hipp
On 1/25/16, Stephen Chrzanowski  wrote:
>
> You also have to look at balance across many millions (or is it billions?)
> of devices out there that use SQLite for their primary operations.

Billions and billions.

> Slapping a serious performance decrease on devices where time and
> performance are essential BY DEFAULT seems a little strict.  And all at the
> cost of one transaction?  To some, maybe a worthwhile trade off, sure, but
> I'm not sure I'd be overly outraged if my wife's music box (iPod) suddenly
> had to recompile its music list.

I don't think it is even that serious.  The problem is that if you
(for example) set a new bookmark on your browser just as the cat is
tripping over the power cord, then after reboot the bookmark
disappears.  The bookmark database is still completely intact - it
just went backwards in time a little.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Stephen Chrzanowski
On Mon, Jan 25, 2016 at 10:08 AM, Matthias-Christian Ott 
wrote:

>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?
>

I wouldn't want that, as it would entirely depend on the application.
Maybe I'm an edge case, but I've written apps where the database is
'production', but not 'valuable' enough to justify a performance hit.  The
app didn't care if the data was destroyed.  It would go and recompile what it
needed from the external sources the now-destroyed database was
created from.  It was basically a cache-type implementation, but changes to
the database were then pushed to whatever the external source was.

SQLite is also mostly aimed at devices that are embedded, limited in
performance, and without a lot of wiggle room, where a performance hit would
seriously degrade the "worth-while-ness" of the device.  ... which is why I no
longer have a Rogers PVR, as it'd take 5+ seconds to do anything... ahem... Anyways..

You also have to look at balance across many millions (or is it billions?)
of devices out there that use SQLite for their primary operations.
Slapping a serious performance decrease on devices where time and
performance are essential BY DEFAULT seems a little strict.  And all at the
cost of one transaction?  To some, maybe a worthwhile trade off, sure, but
I'm not sure I'd be overly outraged if my wife's music box (iPod) suddenly
had to recompile its music list.


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Richard Hipp
On 1/25/16, Matthias-Christian Ott  wrote:
>
> Does this mean that if I use SQLite with SQLITE_EXTRA_DURABLE=0, PRAGMA
> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could lose a
> transaction that it said was committed, depending on the VFS?

Sort of.  This appears to be true if you are running on QNX and you
lose power (or do a hard reset) shortly after the transaction commits.
It might be the case on other OSes/filesystems but it has never before
been observed.

This is not new behavior.  This is apparently what SQLite has been
doing for 15 years, across quadrillions of transactions on many
billions of devices, and it has never before caused any issues, until
just recently when Mr. Meinlschmidt upgraded to a newer version of
QNX.

>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?
>
> The trade-off that is described in the description of SQLite
> SQLITE_EXTRA_DURABLE reads like an excerpt from the MySQL manual when
> MyISAM was still widely used. Perhaps I'm also too irritated by
> discussions with advocates of MySQL who would argue against the fact
> that proper transactions were necessary because the DBMS would be faster
> without them. That is not to say that the ACID properties and
> transactions solve every concurrency or correctness problem but they
> help significantly.
>

As you point out, it is an engineering tradeoff.

The feedback I receive is that most users of SQLite would much rather
avoid the extra directory syncs, even if it means having the last
transaction roll back following a power loss.  Most developers do not
care that much about durability, at least not enough to want to take
the performance hit of syncing the directory after every unlink.
Non-durable commits on power-loss have long been the default in WAL
mode (run-time fixable by setting PRAGMA synchronous=FULL) and nobody
has ever complained before.  Most people consider this a feature.  In
fact, if I recall correctly, we first made synchronous=NORMAL the
default in WAL mode by popular request.  WAL mode used to default to
power-loss durable but people requested the change for performance
reasons.

Note especially that this is about durability, not consistency.
SQLite guarantees consistency regardless.  People care about
consistency.  Durability, not so much.  I'm not a MySQL expert, but I
think the historical complaints about MyISAM had more to do with
consistency than with durability, did they not?

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Meinlschmidt Stefan
Hi Rowan!

>> Shutting down power right after a successfully committed
>> transaction rolls back that transaction on next startup.
> 
> nitpick: This is sqlite behaving as advertised. See
> https://www.sqlite.org/lockingv3.html section 5.0 step 6, and
> https://www.sqlite.org/atomiccommit.html section 3.11 which explain that
> the presence of a [well-formed] journal file is the mechanism via which
> sqlite discriminates between a committed and in-progress transaction.
> 
> ie. according to sqlite a transaction is *not* successfully committed if
> the journal file is still present, so it's well within its rights to
> roll back in this scenario.

Yes, a hot journal file means an incomplete transaction and should
absolutely roll back. What actually bugs me is that this happens after
COMMIT has returned without an error.

> That said, syncing the directory doesn't sound like a terrible idea. But
> it's not clear to me that the cost of another sync every transaction is
> worth a marginal reduction in the power-failure-leads-to-rollback window.

The actual costs are not clear to me either. I hope that someone on the
list with more experience in that field knows more.

> That's if it even reduces the window; it wouldn't surprise me to find that
> the journal is removed from disk after the same delay both with and without
> dirsync, the difference being that the dirsync prevents sqlite from
> returning control to your code until it's certain the commit has persisted.

The filesystem used (QNX6) is advertised as performing a full sync on
every fsync and automatically some delay after each write (10s),
and it looks like it is really doing so. If it didn't sync on fsync,
or if fsync just waited until the automatic sync was through, I would
surely open a ticket with QNX and have some stern words with them.

That said I would expect the dirsync to shorten the window of
unnecessary rollback by an average of 5s in our case, which may or may
not be relevant to other users. On other filesystems with longer times
until it syncs anyway (I remember 30s from my first Linux box) the
reduction might be more substantial.

> There's certainly a surprising result here:
> 
> if (sqlite3_exec(db, "COMMIT", 0, 0, 0) == SQLITE_OK) {
>     /* post-commit logic */
> }
> 
> Logically, the code in the if block can reasonably assume that the
> transaction will not rollback.

And that's where the potential for a tradeoff lies. Should SQLite
guarantee that in post-commit the transaction is done, durably, even at
the cost of the additional sync? Or should it not guarantee durability and be
a bit (or much, I don't know) faster?

Personally I tend to assume a database is made for guaranteeing ACID
properties, except when the user explicitly decides otherwise. So in my
eyes an fsync plus possibly an #ifdef for explicitly disabling it would
be the cleanest thing. OTOH it might surprise users who are used to the
higher performance/lower load and don't care about durability, so an
#ifdef for explicitly enabling the fsync instead might still be a good
compromise.

> But as you have discovered this is not
> always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be
> fair I don't think there are many power-failure scenarios where the
> post-commit logic would have a chance to do anything significant, so the
> incorrect assumption will usually be moot.

Yes, in a normal power failure you "only" win more time, a few seconds
or more, depending on the filesystem's details.

BTW, as you have mentioned JOURNAL_MODE=DELETE: I have not examined what
happens in the other journal modes, i.e. I'm not sure whether
invalidating the journal by other means has the same lack of sync or
not. A proper patch might need to cover the other modes, too.

> In your case it sounds like a controlled shutdown - is there a reason you
> don't do a full disk sync before that?

Yes, it is a controlled shutdown, so in my case the /* post-commit logic
*/ basically pulls the plug.

Trouble is that I only control the database, not the shutdown procedure
(this is a commercial product with several hundred people working on
different aspects of the system). So while I can try to ask the shutdown
crew to sync like any real computer would do, I ultimately have no
say in that but still need to ensure survival of the data.

Without the patch (which I do have applied locally, of course), I and
everyone with a similar use case get into trouble for relying on the
quite excellent reputation of SQLite. Actually I first pointed the finger
at the flash hardware or its driver, because "SQLite is well-tested and
doesn't have this kind of bug" :-)

S.M.
-- 
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
Fax: +49-8458-3332-20-531


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Richard Hipp
On 1/19/16, Meinlschmidt Stefan  wrote:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

Patches checked in:

https://www.sqlite.org/src/info/30671345b1c1ee55
https://www.sqlite.org/draft/compile.html#extra_durable

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Richard Hipp
On 1/19/16, Meinlschmidt Stefan  wrote:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

As you observe, this is a file-system dependent thing, and probably
only happens on the QNX filesystem.  I will see if we can add a
compile-time option to the next release that will allow you to force a
directory sync after the rollback journal commits.  But this won't be
on by default because it would do nothing but slow down commits on the
overwhelming majority of SQLite users.

Meanwhile, a good work-around for you might be to use either

 PRAGMA journal_mode=TRUNCATE;
 PRAGMA journal_mode=PERSIST;

Both of which sync the rollback-journal upon commit.  Or, use:

 PRAGMA journal_mode=WAL; PRAGMA synchronous=FULL;

which causes the write-ahead log to be synced following every commit.
Note that the default behavior for WAL mode is that the WAL is not
synced, and hence committed transactions might rollback following a
power loss.  This is the behavior most people desire (not that their
transactions roll back but rather they are willing to endure that in
exchange for fewer fsyncs.)  You must set "PRAGMA synchronous=FULL" to
cause the WAL to be synced after each transaction.
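
For anyone who wants to try these settings, here is a minimal sketch of the
pragmas above using Python's stdlib sqlite3 bindings (my illustration, not
part of the SQLite sources; the same pragma strings apply verbatim through
sqlite3_exec() in C):

```python
import os
import sqlite3
import tempfile

# Sketch of the suggested work-arounds, driven through Python's
# stdlib sqlite3 bindings.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit mode

# Work-around 1: TRUNCATE/PERSIST keep the journal file around instead
# of unlinking it, so durability does not depend on a directory sync.
mode = con.execute("PRAGMA journal_mode=PERSIST").fetchone()[0]
print(mode)  # -> persist

# Work-around 2: WAL plus synchronous=FULL syncs the write-ahead log
# after every transaction.
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # -> wal
con.execute("PRAGMA synchronous=FULL")

con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")  # committed per the pragmas above
con.close()
```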

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-22 Thread Rowan Worth
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

nitpick: This is sqlite behaving as advertised. See
https://www.sqlite.org/lockingv3.html section 5.0 step 6, and
https://www.sqlite.org/atomiccommit.html section 3.11 which explain that
the presence of a [well-formed] journal file is the mechanism via which
sqlite discriminates between a committed and in-progress transaction.

ie. according to sqlite a transaction is *not* successfully committed if
the journal file is still present, so it's well within its rights to
roll back in this scenario.
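
To make that mechanism concrete, a small sketch (my addition, using Python's
stdlib sqlite3 bindings) shows the rollback journal appearing during a write
transaction and disappearing at COMMIT:

```python
import os
import sqlite3
import tempfile

# In journal_mode=DELETE, the "-journal" file exists while a write
# transaction is in progress and is unlinked at commit -- which is the
# committed/in-progress discrimination described above.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGIN
con.execute("PRAGMA journal_mode=DELETE")
con.execute("CREATE TABLE t(x)")

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")
print(os.path.exists(path + "-journal"))   # True: transaction in progress

con.execute("COMMIT")
print(os.path.exists(path + "-journal"))   # False: transaction committed
con.close()
```

The crash-recovery question in this thread is exactly whether that unlink
has reached the disk before the power goes.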


That said, syncing the directory doesn't sound like a terrible idea. But
it's not clear to me that the cost of another sync every transaction is
worth a marginal reduction in the power-failure-leads-to-rollback window.
That's if it even reduces the window; it wouldn't surprise me to find that
the journal is removed from disk after the same delay both with and without
dirsync, the difference being that the dirsync prevents sqlite from
returning control to your code until it's certain the commit has persisted.

There's certainly a surprising result here:

if (sqlite3_exec(db, "COMMIT", 0, 0, 0) == SQLITE_OK) {
    /* post-commit logic */
}

Logically, the code in the if block can reasonably assume that the
transaction will not rollback. But as you have discovered this is not
always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be
fair I don't think there are many power-failure scenarios where the
post-commit logic would have a chance to do anything significant, so the
incorrect assumption will usually be moot.

In your case it sounds like a controlled shutdown - is there a reason you
don't do a full disk sync before that?

-Rowan

On 19 January 2016 at 21:33, Meinlschmidt Stefan <
Stefan.Meinlschmidt at esolutions.de> wrote:

> Hi everybody!
>
> TL;DR:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup. This is a
> problem in write-on-shutdown-then-power-off scenarios and violates my
> expectation of SQLite's transactions being ACID. This can be fixed by
> setting the dirSync-flag of the sqlite3OsDelete-call within
> pager_end_transaction.
>
>


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-19 Thread Meinlschmidt Stefan
Hi everybody!

TL;DR:

Shutting down power right after a successfully committed
transaction rolls back that transaction on next startup. This is a
problem in write-on-shutdown-then-power-off scenarios and violates my
expectation of SQLite's transactions being ACID. This can be fixed by
setting the dirSync-flag of the sqlite3OsDelete-call within
pager_end_transaction.

Full:

We are using a slightly patched SQLite 3.7.5 in an embedded system
running under QNX 6.5 as a data store that does the right thing even
when power fails, which has worked remarkably well up to now. After
migrating the database onto a QNX6 filesystem (which does not write
through synchronously) we almost reliably lose the last transaction on
power off, which happens ~1.5s after the commit. We use rollback-journal
mode, BTW, for its more reliably limited file sizes.

While 3.7.5 is quite old, the same behaviour can easily be demonstrated
with a simple test program using sqlite-amalgamation-201601141433. To
try yourself, unpack sqlite3.{c,h} into the same directory as the
attached files and run make.
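
(The attachments are not reproduced in this archive; a rough, hypothetical
Python analogue of such a test program, rather than the actual attached C
source, would be:)

```python
import sqlite3

# Hypothetical analogue of the described test program: open a database
# in the current directory, create a table, insert a row, then invite
# the user to cut power and check after reboot whether the row survived.
con = sqlite3.connect("uncommit.sqlite", isolation_level=None)
con.execute("CREATE TABLE IF NOT EXISTS t(stamp)")
con.execute("INSERT INTO t VALUES (strftime('%s','now'))")
rows = con.execute("SELECT count(*) FROM t").fetchone()[0]
print("committed %d row(s) - power off now, then check after reboot" % rows)
con.close()
```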

The test program opens a database in the current directory, creates a
table and adds a row to it, then asks you to power off. I did this on
our device, running the snoopy filesystem access logger alongside to see
what's happening:

> # ls -l
> total 16
> drwxr-xr-x  2 root  root   4096 Jan 01 00:02 .
> drwxrwxr-x  4 root  root   4096 Jan 01 00:02 ..
> # /fs/sda0/snoopy `pwd`
> # /fs/sda0/uncommit
> opening uncommit.sqlite...
> 4382854,1,,READLINK,/mnt/boardbook/test/uncommit.sqlite,No such file or 
> directory
> 4382854,1,,OPEN,/mnt/boardbook/test/.,No error,
> 4382854,1,,CLOSE,/mnt/boardbook/test/.,No error
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No such file or directory,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,rwc
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,DEVCTL,/mnt/boardbook/test/uncommit.sqlite,No 
> error,40a8020d,FSYS_STATVFS
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 4382854,1,,READ,/mnt/boardbook/test/uncommit.sqlite,No error,0/100
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or 
> directory,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or 
> directory
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or 
> directory,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or 
> directory
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,rwc
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 4382854,1,,CHOWN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0:0
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite-journal,No error,512/512
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,1024
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 4382854,1,,FSYNC,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
>