That's a great list of tips, Nathan. Interestingly, our datacentre (Primus) lost power 5 days after I posed this question to the list. The backup generators failed and some customers were without power for over four hours.
I visited the datacentre as one server failed to boot due to a full log in an IPMI card*. When I got there I saw *lots* of sysadmins. It appears a lot of systems didn't recover well after the power cut.

- fsck can take a *long* time on large disks.
- RAID arrays don't take kindly to having the power yoinked.
- Ensure BIOS is set to boot server after power outage.

I would like to have been able to cut over to Amazon EC2 during the outage. This would be a *relatively* cheap DR/continuity option and wouldn't cost much when not in use (mainly storage and backup traffic).

I'm getting a group together to talk about the technical side of using the cloud for this sort of thing. Get in touch if you're interested.

- Mike

* It turns out there is a firmware fix to avoid this.

On Tue, Jan 27, 2009 at 11:14 PM, Nathan de Vries <[email protected]> wrote:
>
> On 27/01/2009, at 4:34 PM, Mike Bailey wrote:
>> I'm interested in hearing from people who have good Disaster
>> Recovery setups.
>
> Some simple steps I try and abide by (if it's a serious app, for
> novelty apps I generally don't care):
>
> * Keep deployment automated and cheap
> * Set low TTLs for all DNS records
> * MySQL master/slave replication with slave used for backing up. Live
>   dumps from a non-replicated DB is fine with low traffic, though.
> * s3sync.rb backing up MySQL dumps from the slave, as well as
>   user-generated files/content (if applicable)
> * Pre-configured maintenance pages with the ability to include
>   downtime messages to users
> * Configuration options to disable critical features (e.g. checkout if
>   payment gateway is down), re-configurable without code-redeployment
> * No architectural astronauting
> * Don't let disaster-resilience get in the way of normal development
>
> Simplest seems to be the best. Working with up-front "code-for-scale"
> tends to make programming unbearable for me.
> I prefer being reactive on the whole, but with the above proactive
> measures depending on the seriousness of the project.
>
> Cheers,
>
> --
> Nathan de Vries

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby or Rails Oceania" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/rails-oceania?hl=en
-~----------~----~----~----~------~----~------~--~---
