2015-06-25 21:06 GMT+07:00 Simon Slavin <slavins at bigfraud.org>:

>
> On 25 Jun 2015, at 11:49am, Mikael <mikael.trash at gmail.com> wrote:
>
> > Hard drives can fail in the most remarkable, surprising ways imaginable,
> as
> > for instance reflected on
> >
> https://www.usenix.org/legacy/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/index.html
> > and https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/
> .
>
> A couple of nice pieces of detective work.
>
> > (E.g., writes may fail transparently, random sectors may be zeroed, the
> > disk's internal sector map may break so reads may return any garbled
> data,
> > RAM-based disk cache may hide write failures so data initially seems to
> > have been written but was not, etc.)
>
> Don't be unduly deterred from SSDs by some reports you'll read on the
> internet.  Many of these articles start from the false premise that "We
> know rotating drives aren't perfect, but SSDs are better technology with no
> moving parts so they should be perfect."


Ah, no, I didn't mean to exclude magnetic drives.


> The actual figures for faulty firmware in rotating drives compared with
> SSDs are about the same for each new model (around 1 in 150 newly launched
> models of drive has faulty firmware).  So they're about equally likely to
> show firmware-related faults.
>

Oh dear.

> > This motivates me to ask for your complete answer to,
> >
> > 1) What is the worst-case SQLite misbehavior(s) that disk failure can
> > lead to, and,
>
> Any misbehaviour by SQLite that gets reported here gets fixed by the
> developer team.  They're really good.  It's one reason why I'm still
> working with SQLite.  So I'm going to answer your question as if it refers
> to the entire system being used: program, APIs, and hardware.
>

Ah no no - by misbehavior here I meant: what, in the process of opening an
SQLite database and making queries against it, could fail?

I.e., I'm not talking about what bugs could appear in SQLite - I share your
trust in its quality there - but what the error handling would look like.
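
For concreteness, a minimal sketch of the calls I mean, against the
standard C API (the file name and query are just placeholders), with
every result code checked:

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db = NULL;
        int bad = 0;

        /* Opening can fail: SQLITE_CANTOPEN, SQLITE_NOTADB, ... */
        int rc = sqlite3_open("test.db", &db);
        if (rc != SQLITE_OK) {
            fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }

        /* Querying can fail: SQLITE_IOERR, SQLITE_CORRUPT, ... */
        rc = sqlite3_exec(db, "SELECT count(*) FROM sqlite_master;", 0, 0, 0);
        if (rc != SQLITE_OK) {
            fprintf(stderr, "query failed: %s\n", sqlite3_errmsg(db));
            bad = 1;
        }

        /* Even closing can fail, e.g. SQLITE_BUSY. */
        rc = sqlite3_close(db);
        if (rc != SQLITE_OK) {
            fprintf(stderr, "close failed: %d\n", rc);
            bad = 1;
        }
        return bad;
    }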

> It depends on what you're using the disk for.  In banking, the worst-case
> misbehaviour happens not when the drive stops working, but when incorrect
> data is returned.  This generally takes two simultaneous
> parity/checksum/hashing failures which happens more than you'd think.


Wait, can you detail this further, and say what you would do about it?

ZFS, for instance, has Fletcher checksums
(https://en.wikipedia.org/wiki/Fletcher's_checksum) all the way up to the
root of the block tree, and the community seems to trust them quite well?
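
If I understand the ZFS design right, the per-block checksum is a
fletcher4-style sum: four 64-bit running sums over the block's 32-bit
words. A rough sketch in C, simplified from what the real code does:

    #include <stdint.h>
    #include <stddef.h>

    /* fletcher4-style checksum: each higher-order sum accumulates the
       one below it, so reordered or swapped words change the result. */
    void fletcher4(const uint32_t *words, size_t nwords, uint64_t sum[4]) {
        uint64_t a = 0, b = 0, c = 0, d = 0;
        for (size_t i = 0; i < nwords; i++) {
            a += words[i];   /* plain sum of the data words */
            b += a;          /* sum of the running sums     */
            c += b;          /* third order                 */
            d += c;          /* fourth order                */
        }
        sum[0] = a; sum[1] = b; sum[2] = c; sum[3] = d;
    }

The real implementation also deals with endianness and vectorised
variants, but this is the arithmetic being trusted.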

(Curiously enough, ZFS RAID1 won't actually read from both disks on a read
operation, and would thus be useless in this regard.)

> It's far better for the bank for the disk to fail obviously and
> completely.  A bank is rich, has an extremely good backup protocol, and can
> easily afford to buy a new disk when one disk fails.  If hardware fails in
> a bank, it just means that a couple of techies are going to claim overtime.
>
> If, on the other hand, you have a small measuring device with an embedded
> controller (e.g. GPS unit, measurement devices, calculators), a single
> incorrect figure will probably be obviously wrong, and it will just make
> you shake your head and try again.  If you're at a customer site and you
> spot the number is wrong, the customer will never notice you retrying.  But
> if you're at a customer site and your tools stop working completely it
> makes you look like you buy cheap untrustworthy tools and that can make you
> or your company look bad to the customer.
>

Among these levels of correctness requirements you described, let's go
with the "bank" one: complete infallibility.


> > 2) Through what return values and otherwise does SQLite report non-exotic
> > disk failures, and,
>
> <https://www.sqlite.org/c3ref/c_abort.html>
>
> If the fault happens when the data is not being processed (computer idle
> or file closed) you'll probably get something like _CORRUPT or _NOTADB.  If
> failure happens while data is being handled you'll probably get something
> like _IOERR.  It is rare to see any of these in the wild unless you already
> know your hardware is faulty.  For example, if your computer shows
> unexpected, unexplained error messages during bootup, don't carry on as
> normal.
>
> A further breakdown of a number of reasons for SQLITE_IOERR can be found at
>
> <https://www.sqlite.org/c3ref/c_abort_rollback.html>
>
> The lesson here is that your code should check and report any non-
> SQLITE_OK result codes returned by /all/ SQLite calls, even if you don't
> think a fault can result, or if the program can't do anything about the
> error.  Sometimes the error for writing to disk doesn't show up in the
> "UPDATE" command but later on when you try to close the file.  Programs I
> write for life-critical or mission-critical situations check and report
> everything.  Programs I write for games, quick lookups or calculations for
> trivial purposes might not.
>
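
For concreteness, a minimal sketch of that lesson - assuming db is an
already-open handle, and that report() and the accounts table are
hypothetical stand-ins - showing how a write error can stay hidden until
COMMIT or even close:

    int rc;
    rc = sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    if (rc != SQLITE_OK) report(rc);   /* hypothetical error reporter */

    rc = sqlite3_exec(db,
        "UPDATE accounts SET balance = balance - 100 WHERE id = 1;",
        0, 0, 0);
    if (rc != SQLITE_OK) report(rc);   /* may still be SQLITE_OK here... */

    rc = sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    if (rc != SQLITE_OK) report(rc);   /* ...and fail here, e.g. SQLITE_IOERR */

    rc = sqlite3_close(db);
    if (rc != SQLITE_OK) report(rc);   /* closing must be checked too */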
> > 3) In a server setup, is there any panacea against disk failures - basing
> > the storage on RAID1 perhaps?
>
> Nope.  For this question ignore entirely the fact that you're using SQLite
> and just imagine what you'd do for any storage device.  RAID1, for example,
> is just a mirrored pair of disks.  If one returns incorrect data for a
> 'read' your system still falls over because it doesn't know which disk to
> believe.
>
> If data matters to you, buy a drive rated for server use (24/7 uptime,
> parity/checksums checked, write-before-acknowledge, jumpers set and drivers
> configured for the drive being used in a server).  It will be slower than
> most drives but more trustworthy.  Even then you must assume that your
> hardware will fail sometime, so create a backup procedure, take the
> backups, and occasionally try to restore one to make sure it's a usable
> backup and your copy/restore procedure works.  The most common failure
> modes in a backup system are not a failure to take backups but faults in
> the procedure so that backups don't contain all the data needed, or that
> the backup can only be restored on the hardware which has just failed.
>
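
One way to take such backups from inside the program is SQLite's Online
Backup API (https://www.sqlite.org/backup.html). A minimal sketch
following the documented pattern - dest_path is a placeholder, and the
restore test still has to happen outside the program:

    #include <sqlite3.h>

    /* Copy the open database src into a new file at dest_path. */
    int backup_db(sqlite3 *src, const char *dest_path) {
        sqlite3 *dest = NULL;
        int rc = sqlite3_open(dest_path, &dest);
        if (rc == SQLITE_OK) {
            sqlite3_backup *b =
                sqlite3_backup_init(dest, "main", src, "main");
            if (b) {
                (void)sqlite3_backup_step(b, -1);  /* copy all pages */
                (void)sqlite3_backup_finish(b);
            }
            rc = sqlite3_errcode(dest);  /* error, if any, lands here */
        }
        (void)sqlite3_close(dest);
        return rc;  /* the caller must check: backups can fail too */
    }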

Even a particular SSD can fail, even if it's of a nice brand and properly
configured, with fine-grained flushing.


It seems to me that the ZFS guys are quite confident that a mirrored ZFS
setup (so roughly ZFS RAID1, with the mirror there only to make the storage
redundant, not to double-read data) offers absolute data storage integrity.

Do you disagree?
