On 08.03.2013 05:31, Bruce Momjian wrote:
Also, don't all modern storage drives have built-in checksums, and
report problems to the system administrator?  Does smartctl help report
storage corruption?

Let me take a guess at answering this --- we have several layers in a
database server:

        1 storage
        2 storage controller
        3 file system
        4 RAM
        5 CPU

My guess is that storage checksums only cover layer 1, while our patch
covers layers 1-3, and probably not 4-5 because we only compute the
checksum on write.

There is a thing called "Data Integrity Field" and/or "Data Integrity Extensions", which allows storing a checksum with each disk sector and verifying the checksum in each layer. The basic idea is that instead of 512-byte sectors, the drive is formatted to use 520-byte sectors, with the extra 8 bytes used for the checksum and some other metadata. That gets around the problem we have in PostgreSQL, and that filesystems have, which is that you need to store the checksum somewhere along with the data.
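To make the 520-byte layout concrete, here is a rough sketch in Python. The split of the 8 extra bytes into a 16-bit guard tag (CRC), a 16-bit application tag and a 32-bit reference tag follows my understanding of the T10 DIF format and is not spelled out above; the function names format_sector and parse_sector are made up for illustration.

```python
import struct

SECTOR_DATA = 512
# Assumed tuple layout: guard tag (16-bit CRC), application tag, reference tag.
TUPLE_FMT = ">HHI"
TUPLE_SIZE = struct.calcsize(TUPLE_FMT)  # 8 bytes


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def format_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append the 8-byte integrity tuple to 512 bytes of data (hypothetical helper)."""
    if len(data) != SECTOR_DATA:
        raise ValueError("expected exactly 512 bytes of data")
    tuple_bytes = struct.pack(TUPLE_FMT, crc16_t10dif(data), app_tag, lba & 0xFFFFFFFF)
    return data + tuple_bytes


def parse_sector(sector: bytes):
    """Split a 520-byte sector into (data, guard, app_tag, ref_tag)."""
    if len(sector) != SECTOR_DATA + TUPLE_SIZE:
        raise ValueError("expected exactly 520 bytes")
    guard, app_tag, ref_tag = struct.unpack(TUPLE_FMT, sector[SECTOR_DATA:])
    return sector[:SECTOR_DATA], guard, app_tag, ref_tag
```

The point of the sketch is only that the checksum travels with the sector itself, so nothing has to be carved out of the 512 data bytes to hold it.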

When a write I/O request is made in the OS, the OS calculates the checksum and passes it through the controller to the drive. The drive verifies the checksum, and aborts the I/O request if it doesn't match. On a read, the checksum is read from the drive along with the actual data, passed through the controller, and the OS verifies it. This covers layers 1-2 or 1-3.
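The write and read paths described above can be simulated roughly like this. It is a toy model, not how any kernel or drive implements it: zlib.crc32 stands in for the real per-sector checksum, and Drive, controller, os_write and os_read are hypothetical names. The faulty flag makes the simulated controller flip one bit in transit.

```python
import zlib


class IntegrityError(IOError):
    """Raised when data and checksum disagree."""


def checksum(data: bytes) -> int:
    # Stand-in checksum; an assumption for illustration only.
    return zlib.crc32(data) & 0xFFFFFFFF


class Drive:
    def __init__(self):
        self.sectors = {}

    def write(self, lba, data, tag):
        # The drive verifies before committing and aborts on a mismatch.
        if checksum(data) != tag:
            raise IntegrityError("write aborted: checksum mismatch at LBA %d" % lba)
        self.sectors[lba] = (data, tag)

    def read(self, lba):
        # Data and checksum are returned together.
        return self.sectors[lba]


def controller(data, tag, faulty=False):
    # Passes data and checksum through; optionally corrupts one bit of the data.
    if faulty:
        data = bytes([data[0] ^ 0x01]) + data[1:]
    return data, tag


def os_write(drive, lba, data, faulty=False):
    tag = checksum(data)  # computed by the OS
    drive.write(lba, *controller(data, tag, faulty))


def os_read(drive, lba, faulty=False):
    data, tag = controller(*drive.read(lba), faulty)
    if checksum(data) != tag:  # verified by the OS
        raise IntegrityError("read failed: checksum mismatch at LBA %d" % lba)
    return data
```

With a healthy controller, os_write followed by os_read returns the original bytes. With faulty=True on either path, the corruption is caught at the next layer that checks, instead of being stored or returned silently.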

Now, this requires all the components to have support for that. I'm not an expert on these things, but I'd guess that that's a tall order today. I don't know which hardware vendors and kernel versions support that. But things usually keep improving, and hopefully in a few years, you can easily buy a hardware stack that supports DIF all the way through.

In theory, the OS could also expose the DIF field to the application, so that you get end-to-end protection from the application to the disk. This means that the application somehow gets access to those extra bytes in each sector, and you have to calculate and verify the checksum in the application. There are no standard APIs for that yet, though.

See https://www.kernel.org/doc/Documentation/block/data-integrity.txt.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
