2013/3/8 Bruce Momjian <br...@momjian.us>:
> On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
>> Putting aside the not-so-rosy predictions seen elsewhere in this
>> thread about the availability of a high performance, reliable
>> checksumming file system available on common platforms, I'd like to
>> express what benefit this feature will have to me:
>>
>> Corruption has easily occupied more than one person-month of time last
>> year for us. This year to date I've burned two weeks, although
>> admittedly this was probably the result of statistical clustering.
>> Other colleagues of mine have probably put in a week or two in
>> aggregate in this year to date. The ability to quickly, accurately,
>> and maybe at some later date proactively finding good backups to run
>> WAL recovery from is one of the biggest strides we can make in the
>> operation of Postgres. The especially ugly cases are where the page
>> header is not corrupt, so full page images can carry along malformed
>> tuples...basically, when the corruption works its way into the WAL,
>> we're in much worse shape. Checksums would hopefully prevent this
>> case, converting them into corrupt pages that will not be modified.
>>
>> It would be better yet if I could write tools to find the last-good
>> version of pages, and so I think tight integration with Postgres will
>> see a lot of benefits that would be quite difficult and non-portable
>> when relying on file system checksumming.
>
> I see Heroku has corruption experience, and I know Jim Nasby has
> struggled with corruption in the past.
>
> I also see the checksum patch is taking a beating. I wanted to step
> back and ask what percentage of known corruptions cases will this
> checksum patch detect? What percentage of these corruptions would
> filesystem checksums have detected?
>
> Also, don't all modern storage drives have built-in checksums, and
> report problems to the system administrator? Does smartctl help report
> storage corruption?
>
> Let me take a guess at answering this --- we have several layers in a
> database server:
>
> 1 storage
> 2 storage controller
> 3 file system
> 4 RAM
> 5 CPU
>
> My guess is that storage checksums only cover layer 1, while our patch
> covers layers 1-3, and probably not 4-5 because we only compute the
> checksum on write.
>
> If that is correct, the open question is what percentage of corruption
> happens in layers 1-3?
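
To make the "checksum on write" point above concrete, here is a minimal sketch of why a write-time checksum would catch damage in layers 1-3 but not in layers 4-5. Everything in it is an assumption for illustration: zlib's CRC-32 stands in for whatever algorithm the patch uses, and write_page, read_page and flip_bit are hypothetical helpers, not PostgreSQL functions.

```python
import zlib
from typing import Tuple

PAGE_SIZE = 8192  # assumption: PostgreSQL's default block size


def write_page(data: bytes) -> Tuple[bytes, int]:
    """Hypothetical helper: compute the checksum at write time and keep it with the page."""
    return data, zlib.crc32(data)  # CRC-32 is a stand-in, not the patch's algorithm


def read_page(stored: bytes, checksum: int) -> bool:
    """Hypothetical helper: True if the page still matches its write-time checksum."""
    return zlib.crc32(stored) == checksum


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single-bit corruption at the given bit offset."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


good = bytes(PAGE_SIZE)

# Layers 1-3 (storage, controller, file system): the page is damaged
# after the checksum was computed, so the mismatch shows up on read.
stored, checksum = write_page(good)
detected_after_write = not read_page(flip_bit(stored, 123), checksum)

# Layers 4-5 (RAM, CPU): the page is damaged before the checksum is
# computed, so the checksum faithfully covers the already-bad data.
stored, checksum = write_page(flip_bit(good, 123))
detected_before_write = not read_page(stored, checksum)

print(detected_after_write, detected_before_write)
```

In this toy model the first case is flagged and the second is not, which matches the guess that the patch covers layers 1-3 but probably not 4-5.
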
I work with a major Czech bank, and they are asking for checksums, as
they would for any other tool that improves the chances of detecting
failures. So the lack of checksums hurts PostgreSQL's usability for
critical systems; speed is not that important there.

Regards

Pavel

>
> --
>   Bruce Momjian  <br...@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers