On 3/18/13 5:36 PM, Daniel Farina wrote:
> Clarification, because I think this assessment as delivered feeds some
> unnecessary FUD about EBS:
>
> EBS is quite reliable.  Presuming that all noticed corruptions are
> strictly EBS's problem (that's quite a stretch), I'd say the defect
> rate falls somewhere in the range of volume-centuries.

I wasn't trying to flog EBS as any more or less reliable than other types of storage. What I was trying to emphasize, similar to your "quite a stretch" comment, was the uncertainty involved when such deployments fail. Failures have many causes beyond EBS itself. But people are now so far removed from the physical objects that fail that it's harder to assign blame correctly when something goes wrong.

A quick example will demonstrate what I mean. Let's say my server at home dies. There are some terrible log messages, it crashes, and when it comes back up it's broken. Troubleshooting and possibly replacement parts follow. I will normally expect an eventual resolution that includes data like "the drive showed X SMART errors" or "I swapped the memory with a similar system and the problem followed the RAM". I'll learn something about what failed that I might use as feedback to adjust my practices. But an EC2+EBS failure usually doesn't let you get to the root cause effectively, and that makes people nervous.

I can already see "how do checksums alone help narrow the blame?" as the next question. I'll post something tomorrow summarizing how I use them for that; I'm just out of juice for it tonight.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
