Thanks Alex,

Your post got me thinking. I found some time (and a couple of dev servers!) to have another attempt at live migration of slices between hosts. Success!
(For those unfamiliar with Xen: 'domU' means the same as 'slice'.)

I did a fresh install of Debian Etch (stable) on two servers and installed the Xen version that came with it (xen-3.0). I made the disk and swap images for the domU available via AoE from a third server. The live migration went smoothly: I was able to move the running domU from one host to the other without a noticeable delay in an active file download or ssh session.

One slight problem appeared when I migrated a domU that had no sessions (download or otherwise) open to it. The Xen version that ships with Etch doesn't send a gratuitous ARP, which is needed to tell the other machines on the network that the domU has moved. When I migrated with no active sessions open and pinged the domU from another host, the pings stopped during the migration; they only started again once I pinged out from the migrated domU (via its console). This appears to be a known problem with the Xen version shipped with Etch (stable). Amazon EC2 seems to be using xen-3.1.0. What version are you using?

The biggest question in my mind is how to protect against the same domU being manually started on two hosts. Mounting the same network block device (AoE) from two hosts will most likely corrupt the filesystem. Are you using some form of fencing?

I also wonder whether it's safe to use lvm2 on a shared block device. I could create a volume group for domUs on a block device and have logical volumes for each domU. I imagine there would be a risk of corrupting the LVM metadata if more than one host were to create logical volumes. Perhaps it's enough to restrict volume creation to a single host and make sure vgscan is run on all the other hosts after each new volume is created. I've tried in the past to get clvm working on Ubuntu and had no success.

> I suggest keeping it to no more than one domU per CPU core unless you
> have a good reason to do otherwise.

Interesting. I've not heard that suggestion before, but I'm sure it comes from hard-earned experience! :-) A number of our slices use very little CPU (mail, web redirectors) and we have some high-availability hot standbys, so we've found 12-16 slices on an 8-core server have been running OK. I'd like to investigate pinning slices to VCPUs to give some protection to the services that need it, though.

> Depending on how many CPU cores you have, one server consuming too
> much CPU time has the potential to adversely affect other servers.

I found this to be true when every slice ran Ubuntu's cron.daily tasks at the same time. Urgh! The 'nice' value only matters inside a slice; the Xen scheduler dividing CPU between slices knows nothing about it. I dithered the times that cron.daily runs on the slices, but I'd love to know a better solution. (One slice per core would be one answer.)

> > > # We can manage disk more easily
> >
> > Isolation of disk between virtual servers is a benefit as it reduces the
> > damage a full disk would cause (our monitoring alerts us well before that
> > happens though!).
>
> This is largely a moot point. On a standalone server, you would have
> at *least* a RAID 1 setup to handle disk failures.

I was referring to "no space left on device" errors. We got one the other day when Starling filled a disk, and only that slice felt it. I agree with you about the RAID, though. I've never had a disk failure, but I've never regretted the money my clients invest in RAID cards.

Rough sketches of the commands and configs I'm talking about are below; the hostnames, device paths and sizes in them are just placeholders.
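For anyone who wants to try the same thing, this is roughly what the setup looked like. I'm writing it from memory, so treat it as a sketch rather than a recipe; 'slice1', 'host2' and the AoE shelf/slot numbers are placeholders.

    # /etc/xen/xend-config.sxp on both dom0s: allow incoming relocations
    # (restart xend after editing: /etc/init.d/xend restart)
    (xend-relocation-server yes)
    (xend-relocation-port 8002)
    (xend-relocation-hosts-allow '^192\.168\.1\.[0-9]+$')

    # domU config: point the disks at the AoE devices exported by the third server
    disk = [ 'phy:/dev/etherd/e0.0,sda1,w',
             'phy:/dev/etherd/e0.1,sda2,w' ]

    # then, on the host currently running the domU:
    xm migrate --live slice1 host2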
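On the gratuitous ARP problem, the workaround I'm considering is simply to send one from inside the domU once it lands on the new host, using arping from the iputils package (interface and address below are placeholders):

    # run as root inside the migrated domU; -U sends unsolicited (gratuitous) ARP
    arping -U -c 3 -I eth0 192.168.1.50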
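On the double-start question: we don't have real fencing, but even a crude check before 'xm create' would catch the most likely accident, i.e. me starting a domU by hand on the wrong host. A minimal sketch, assuming the dom0s can ssh to each other ('host2' and the config path are placeholders):

    #!/bin/sh
    # start-domu NAME: refuse to start a domU that is already running on the peer.
    # This only guards against fat fingers; it is not fencing.
    DOMU="$1"
    PEER="host2"
    if ssh "$PEER" xm list | awk '{print $1}' | grep -qx "$DOMU"; then
        echo "$DOMU is already running on $PEER; refusing to start it here." >&2
        exit 1
    fi
    exec xm create "/etc/xen/$DOMU.cfg"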
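And the "restrict LVM changes to one host" idea would look roughly like this. vg_domu, the device and the sizes are made up, and I don't yet know whether a plain vgscan on the other hosts is really enough, so this is thinking out loud rather than a recommendation:

    # on the single host that is allowed to change LVM metadata
    pvcreate /dev/etherd/e0.0
    vgcreate vg_domu /dev/etherd/e0.0
    lvcreate -L 10G -n slice3-disk vg_domu
    lvcreate -L 1G  -n slice3-swap vg_domu

    # on every other host, after each change
    vgscan
    vgchange -ay vg_domu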
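The VCPU pinning I mentioned is just xm vcpu-pin, or the 'cpus' option in the domU config so the pinning survives a restart (domain name and CPU numbers are only examples):

    # pin VCPU 0 of slice1 to physical CPU 2, at runtime
    xm vcpu-pin slice1 0 2

    # or in /etc/xen/slice1.cfg
    vcpus = 1
    cpus  = "2"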
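And for the cron.daily pile-up, the "dithering" is nothing cleverer than staggering the start time per slice. A random sleep before the daily run would achieve much the same thing without hand-editing every slice; the file name below is made up, and you'd want to disable the stock cron.daily entry in /etc/crontab so the jobs don't run twice:

    # /etc/cron.d/dithered-daily on each slice
    SHELL=/bin/bash
    # sleep up to ~55 minutes so slices on the same host don't all hit the CPU at once
    30 6 * * * root sleep $((RANDOM / 10)) && run-parts --report /etc/cron.daily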
Thanks again for sharing your experiences Alex. Xen experts seem to be a bit scarce.

I've been hearing that kvm is 'the next big thing', but discovering that Amazon EC2 runs on Xen has strengthened my view that Xen is still a great choice for virtualisation.

- Mike