> You made a comment about the outage in April. I know that we have taken it very
> seriously and have identified and implemented measures to prevent similar things from
> happening. It was an all-hands-on-deck situation. On the side, I manage a friend's site,
> and even though he was affected and is a single-instance user (the site doesn't get enough
> traffic to justify anything else, yet), I was able to take a snapshot of the instance,
> launch it in a different AZ, and get him back up after a few hours, even though the main
> instance that was affected was stuck for the whole duration of the outage.
On the other hand, I had a client lose an entire site during the outage. He was in a single AZ, his data was stored on an EBS volume, and that volume somehow became unrecoverable even after the outage was over. Thankfully, his weekly backup had run a few days before, so we were able to move his data to a shared host temporarily and restore it once things were sorted out. Still, I agree that if you design it with a different mindset you can quickly build highly survivable infrastructure, but in my opinion it does require a completely different mindset.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
