Re: [HACKERS] Enabling Checksums

Greg Smith Thu, 12 Sep 2013 18:27:29 -0700

On 3/18/13 10:52 AM, Bruce Momjian wrote:

With a potential 10-20% overhead, I am unclear who would enable this at
initdb time.

If you survey people who are running PostgreSQL on "cloud" hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them would pay quite a bit of performance to make their storage more reliable. To pick one common measurement for popularity, a Google search on "ebs corruption" returns 17 million hits. To quote one of those, Baron Schwartz of Percona talking about MySQL on EC2:

"BTW, I have seen data corruption on EBS volumes. It’s not clear whether it was InnoDB’s fault (extremely unlikely IMO), the operating system’s fault, EBS’s fault, or something else."


http://www.mysqlperformanceblog.com/2011/08/04/mysql-performance-on-ec2ebs-versus-rds/

*That* uncertainty is where a lot of the demand for this feature is coming from. People deploy into the cloud, their data gets corrupted, and no one can tell them what/why/how it happened. And that means they don't even know what to change to make it better. The only people I see really doing something about this problem all seem years off, and I'm not sure they are going to help--especially since some of them are targeting "enterprise" storage rather than the cloud-style installations.

I assume a user would wait until they suspected corruption to turn it
on, and because it is only initdb-enabled, they would have to
dump/reload their cluster.  The open question is whether this is a
usable feature as written, or whether we should wait until 9.4.

The reliability issues of both physical and virtual hardware are so widely known that many people will deploy with this on as their default configuration.

If you don't trust your existing data, you can't retroactively check it. A checksum of an already corrupt block is useless. Accordingly, there is no use case for converting an installation with real or even suspected problems to a checksummed one. If you wait until you suspect corruption to care about checksums, it's really too late. There is only one available next step: you must do a dump to figure out what's readable. That is the spot that all of the incoming data recovery customers we see at 2ndQuadrant are already in when we're called. The cluster is suspicious, sometimes they can get data out of it with a dump, and if we hack up their install we can usually recover a bit more than they could.
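
To make the "too late" point concrete, here is a rough sketch in Python. It is not how PostgreSQL computes page checksums: CRC-32 from zlib is only a stand-in, and the helper names (stamp, find_damaged, flip_byte) are made up for this example. It only shows that a checksum recorded after the damage happened validates the damaged bytes.

```python
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size; used here only for illustration


def checksum(block: bytes) -> int:
    # CRC-32 is a stand-in; it is NOT the algorithm PostgreSQL uses for page checksums.
    return zlib.crc32(block)


def stamp(blocks):
    """Record a checksum for each block as it is 'written' (hypothetical helper)."""
    return [checksum(b) for b in blocks]


def find_damaged(blocks, stored):
    """Return indexes of blocks whose current contents no longer match their checksum."""
    return [i for i, (b, c) in enumerate(zip(blocks, stored)) if checksum(b) != c]


def flip_byte(block: bytes, offset: int) -> bytes:
    """Simulate storage corruption by inverting one byte."""
    damaged = bytearray(block)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


# Case 1: checksums were recorded while the data was still good.
good = [bytes([i]) * BLOCK_SIZE for i in range(4)]
stored_early = stamp(good)
on_disk = list(good)
on_disk[2] = flip_byte(on_disk[2], 100)
damaged_early = find_damaged(on_disk, stored_early)  # block 2 is flagged

# Case 2: checksums were recorded only after the corruption had happened.
stored_late = stamp(on_disk)
damaged_late = find_damaged(on_disk, stored_late)  # nothing is flagged
```

In the first case the damaged block is identified by number, which is what makes block-level repair thinkable. In the second case every block "verifies", including the corrupt one.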

After the data from a partially corrupted database is dumped, someone who has just been through that pain might decide they should turn checksums on when they restore the dump. When it's on, they can assess future damage easily at the block level when it happens, and possibly repair it without doing a full dump/reload. What's implemented in the feature we're talking about has a good enough UI to handle this entire cycle I see damaged installations go through.

In fact, this feature is going to need
pg_upgrade changes to detect from pg_controldata that the old/new
clusters have the same checksum setting.


I think that's done already, but it's certainly something to test out too.
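
For anyone who wants to test that by hand, a minimal sketch of the comparison in Python follows. This is not pg_upgrade's code (that is C). It assumes pg_controldata prints a line labeled "Data page checksum version", which is the label I believe it uses, and it assumes a cluster that does not report the field can be treated as having checksums off. The sample output below is abbreviated and made up.

```python
CHECKSUM_LABEL = "Data page checksum version"  # assumed label in pg_controldata output


def checksum_version(controldata_text: str) -> int:
    """Extract the checksum version from pg_controldata-style text."""
    for line in controldata_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == CHECKSUM_LABEL:
            return int(value.strip())
    # Assumption: a missing field means the cluster has checksums disabled.
    return 0


def same_checksum_setting(old_text: str, new_text: str) -> bool:
    """True when both clusters report the same checksum version."""
    return checksum_version(old_text) == checksum_version(new_text)


# Abbreviated, made-up sample output for two clusters.
old_cluster = """\
Database block size:                  8192
Data page checksum version:           0
"""
new_cluster = """\
Database block size:                  8192
Data page checksum version:           1
"""

compatible = same_checksum_setting(old_cluster, new_cluster)  # settings differ here
```

In a real test the two strings would come from running pg_controldata against the old and new data directories.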

Good questions, Bruce. I don't think the reasons behind this feature's demand have been highlighted very well before. I try not to spook the world by talking regularly about how many corrupt PostgreSQL databases I've seen, but they do happen. Most of my regular ranting on crappy SSDs that lie about writes comes from a TB-scale PostgreSQL install that got corrupted due to the write-cache flaws of the early Intel SSDs--twice. They would have happily lost even the worst-case 20% of regular performance to avoid going down for two days each time they saw corruption, where we had to dump/reload to get them going again. If the install had checksums, I could have figured out which blocks were damaged and manually fixed them. Without checksums, there's no way to even tell for sure what is broken.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


