Re: [HACKERS] Enabling Checksums

Greg Smith Mon, 18 Mar 2013 19:14:44 -0700

On 3/18/13 5:36 PM, Daniel Farina wrote:

> Clarification, because I think this assessment as delivered feeds some
> unnecessary FUD about EBS:
>
> EBS is quite reliable.  Presuming that all noticed corruptions are
> strictly EBS's problem (that's quite a stretch), I'd say the defect
> rate falls somewhere in the range of volume-centuries.

I wasn't trying to flog EBS as any more or less reliable than other types of storage. What I was trying to emphasize, similarly to your "quite a stretch" comment, was the uncertainty involved when such deployments fail. Failures happen due to many causes outside of just EBS itself. But people are so far removed from the physical objects that fail, it's harder now to point blame the right way when things fail.

A quick example will demonstrate what I mean. Let's say my server at home dies. There's some terrible log messages, it crashes, and when it comes back up it's broken. Troubleshooting and possibly replacement parts follow. I will normally expect an eventual resolution that includes data like "the drive showed X SMART errors" or "I swapped the memory with a similar system and the problem followed the RAM". I'll learn something about what failed that I might use as feedback to adjust my practices. But an EC2+EBS failure doesn't let you get to the root cause effectively most of the time, and that makes people nervous.

I can already see "how do checksums alone help narrow the blame?" as the next question. I'll post something summarizing how I use them for that tomorrow, just out of juice for that tonight.


--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers