[sqlite] Atomic commit assumptions regarding underlying writes

Drake Wilson Sun, 03 Oct 2010 04:13:18 -0700

Here are two parts of the doc on atomic commit behavior in SQLite, from
http://sqlite.org/atomiccommit.html as retrieved on 2010-10-03.


Section 2.0, "Hardware Assumptions", states:
| SQLite does not assume that a sector write is atomic. However, it does
| assume that a sector write is linear. By "linear" we mean that SQLite
| assumes that when writing a sector, the hardware begins at one end of
| the data and writes byte by byte until it gets to the other end. The
| write might go from beginning to end or from end to beginning. If a
| power failure occurs in the middle of a sector write it might be that
| part of the sector was modified and another part was left
| unchanged. The key assumption by SQLite is that if any part of the
| sector gets changed, then either the first or the last bytes will be
| changed.

I interpret this to imply that if we have a sector S that contains the
bitstring D and a system crash occurs while writing a different string
D' to S, a subsequent read of S will return E such that E[j] = D[j]
for all j where D[j] = D'[j].  In other words, bits that are not being
changed do not get corrupted by a partial sector write.  (This is
weaker than my full reading of the above, which is that the bits that
get flipped will be in a contiguous region starting from one of the
ends.)
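
To make that interpretation concrete, here is a minimal Python sketch of
the "linear write" model as I read it.  It is only my reading, not
anything taken from the SQLite source: the function names are made up
for illustration, and it works on bytes rather than bits for brevity.

```python
def linear_partial_write(old, new, written, from_start=True):
    """Model a sector write that is interrupted after `written` bytes.

    Assumption (my reading of section 2.0): the write proceeds byte by
    byte from one end, and bytes not yet reached keep their old value.
    """
    assert len(old) == len(new)
    assert 0 <= written <= len(old)
    if from_start:
        # The first `written` bytes are new, the rest are still old.
        return new[:written] + old[written:]
    # Writing from the end: the last `written` bytes are new.
    keep = len(old) - written
    return old[:keep] + new[keep:]


def unchanged_bytes_preserved(old, new, result):
    """Check E[j] == D[j] for every j where D[j] == D'[j]."""
    return all(r == o for o, n, r in zip(old, new, result) if o == n)


old = b"AAAABBBBCCCCDDDD"   # D
new = b"AAAAXXXXCCCCDDDD"   # D'

# Try every crash point in both write directions.
for written in range(len(old) + 1):
    for from_start in (True, False):
        result = linear_partial_write(old, new, written, from_start)
        assert unchanged_bytes_preserved(old, new, result)
```

Under this model the property holds for every crash point, because each
byte of the result is taken from either D or D'.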

However, in section 6.1, "Always Journal Complete Sectors", I see:
| It is important to store all pages of a sector in the rollback journal
| in order to prevent database corruption following a power loss while
| writing the sector. Suppose that pages 1, 2, 3, and 4 are all stored
| in sector 1 and that page 2 is modified. In order to write the changes
| to page 2, the underlying hardware must also rewrite the content of
| pages 1, 3, and 4 since the hardware must write the complete
| sector. If this write operation is interrupted by a power outage, one
| or more of the pages 1, 3, or 4 might be left with incorrect data.

This would seem to mean that my initial interpretation of the
paragraph in section 2.0 is wrong, because that interpretation would
imply that the other pages remain untouched.  If the identical page 1 is written back
to the database file, then a partial page 2, and then a crash occurs,
pages 1, 3, and 4 should all remain intact, and similarly if the write
begins from the end or if the crash occurs in any other location.  Is
it that the parts that are "touched" by a linear write are potentially
totally corrupted by a crash?
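
The two readings can be compared with a small Python sketch of the
four-pages-per-sector example.  Both crash models below are my own
guesses at what the text might mean, not documented SQLite behavior;
the 4-byte page size and all names are purely illustrative, and only a
front-to-back write is modelled.

```python
PAGE_SIZE = 4  # illustrative only; real pages are far larger


def split_pages(sector):
    """Split a sector into consecutive PAGE_SIZE-byte pages."""
    return [sector[i:i + PAGE_SIZE] for i in range(0, len(sector), PAGE_SIZE)]


def strict_linear_crash(old, new, written):
    # Reading A: bytes already passed hold the new data, the rest are old.
    return new[:written] + old[written:]


def touched_is_garbage_crash(old, new, written):
    # Reading B (hypothetical): every byte the write has passed over may
    # hold arbitrary data after a crash; "?" stands in for that garbage.
    return b"?" * written + old[written:]


old_sector = b"1111" b"2222" b"3333" b"4444"
new_sector = b"1111" b"2xx2" b"3333" b"4444"  # only page 2 is modified


def damaged_neighbours(crash_model):
    """Return which of pages 1, 3, 4 can differ from their old content."""
    original = split_pages(old_sector)
    damaged = set()
    # Crash after 0 .. len-1 bytes, i.e. before the write completes.
    for written in range(len(old_sector)):
        pages = split_pages(crash_model(old_sector, new_sector, written))
        for index in (0, 2, 3):
            if pages[index] != original[index]:
                damaged.add(index + 1)
    return sorted(damaged)


print(damaged_neighbours(strict_linear_crash))       # []
print(damaged_neighbours(touched_is_garbage_crash))  # [1, 3, 4]
```

Under reading A the neighbouring pages can never be damaged, so the
requirement in section 6.1 only seems necessary under something like
reading B.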

Could anyone provide some clarification on what states the underlying
OS+hardware stack is "allowed" to expose after a crashed sector write
in order for the journaling mechanism to work properly?

I'm also curious as to the source data for these sorts of assumptions
regarding what modern OS+hardware stacks do when rewriting sectors of
a mass storage device.  In particular, I haven't found good data on
what results occur after power failures during rewriting sectors of a
hard disk or "rewriting" sectors of a flash-based device, or similar
information for filesystems that don't do in-place writes.  Pointers
would be appreciated.

   ---> Drake Wilson
