Thanks Alex,

Your post got me thinking. I found some time (and a couple of dev servers!)
to have another attempt at live migration of slices between hosts. Success!

<For those unfamiliar with Xen: 'domU' means the same as 'slice'>

I did a fresh install of Debian Etch (stable) on two servers and installed
the Xen version that came with it (xen-3.0). I made the disk and swap images
for the domU available via AoE on a third server. The live migration went
smoothly. I was able to move the running domU from one host to the other
without a noticeable delay in an active file download or SSH session.
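
Roughly, the setup looked like this (the hostnames, image paths and AoE
shelf/slot numbers below are placeholders, so treat it as a sketch of the
idea rather than my exact commands):

    # On the storage server: export each domU's disk and swap image over AoE
    # (shelf 0, slots 0 and 1, on eth0).
    vblade 0 0 eth0 /srv/aoe/domu1-disk.img &
    vblade 0 1 eth0 /srv/aoe/domu1-swap.img &

    # On both Xen hosts: load the AoE driver and discover the exported devices
    # (they show up under /dev/etherd/).
    modprobe aoe
    aoe-discover

    # The domU config on both hosts points at the shared AoE devices, e.g.:
    #   disk = ['phy:/dev/etherd/e0.0,sda1,w', 'phy:/dev/etherd/e0.1,sda2,w']
    # (xend-config.sxp also needs the relocation server enabled on both hosts.)

    # Then, from whichever host is currently running the domU:
    xm migrate --live domu1 xenhost2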

One slight problem I noticed was when I migrated a domU without any sessions
(download or otherwise) open to it. The Xen version that ships with Etch
doesn't send a gratuitous ARP, which is required to notify other servers on
the network that the domU has moved. When I migrated without any active
sessions open to the host and pinged it from another host, the pings stopped
during the migration. They started again when I pinged out from the migrated
domU (via its console). This appears to be a known problem with the Xen
version shipped with Etch (stable). Amazon EC2 seems to be using xen-3.1.0.
What version are you using?
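
As a workaround, I'm thinking of firing an unsolicited ARP from inside the
domU as soon as the migration completes. Something like the following
(untested; the interface and IP are placeholders):

    # Run inside the migrated domU (or via its console) to refresh the ARP
    # caches of the other machines on the LAN.
    # -U = send unsolicited (gratuitous) ARP, -c 3 = three packets.
    arping -U -c 3 -I eth0 192.0.2.10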

The biggest question in my mind is how to protect against the same domU
being manually started on two hosts. Mounting the same network block device
(AoE) from two hosts will most likely corrupt the filesystem. Are you using
some form of fencing?
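
For now the best I can come up with is a poor man's fence: a small wrapper
on each host that refuses to start a domU if it appears to be running on the
other host. A rough sketch (the peer hostname and config path are invented,
and this is no substitute for proper fencing):

    #!/bin/sh
    # start-domu.sh NAME: start a domU only if the peer host isn't running it.
    DOMU="$1"
    PEER="xenhost2"   # the other dom0

    # 'xm list NAME' exits non-zero when the domain doesn't exist on that host.
    if ssh "$PEER" xm list "$DOMU" >/dev/null 2>&1; then
        echo "$DOMU appears to be running on $PEER; refusing to start it here." >&2
        exit 1
    fi

    xm create "/etc/xen/$DOMU.cfg"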

I also wonder whether it's safe to use lvm2 on a shared block device. I
could create a volume group for domUs on a block device and have logical
volumes for each domU. I imagine there would be a risk of corrupting the
volume group metadata if more than one host were to create logical volumes.
Perhaps it's possible to restrict volume creation to a single host and
ensure that vgscan is run on all other hosts after each new volume is
created. I've tried in the past to get clvm working on Ubuntu and had no
success.
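
To make that concrete, this is roughly what I have in mind (device and
volume names are placeholders; without clvm it relies entirely on discipline
about which host is allowed to change the volume group):

    # On the single "management" host only: set up the VG and create new LVs.
    pvcreate /dev/etherd/e0.0
    vgcreate vg_domu /dev/etherd/e0.0
    lvcreate -L 10G -n domu1-disk vg_domu

    # On every other host, after each change: rescan and activate so the new
    # logical volume becomes visible there too.
    vgscan
    lvchange -ay /dev/vg_domu/domu1-disk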

> I suggest keeping it to no more than one domU per CPU
> core unless you have a good reason to do otherwise.


Interesting. I've not heard that suggestion before but I'm sure it comes from
hard-earned experience! :-)

A number of our slices use very little CPU (mail, web redirectors) and we
have some high-availability hot standbys, so we've found that 12-16 slices on
an 8-core server run OK. I'd like to investigate pinning slices' VCPUs to
physical cores to give some protection to the services that need it, though.
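
From what I've read, the pinning itself is straightforward (the domain name
and core number here are just examples):

    # On the dom0: pin VCPU 0 of the 'mail' domU to physical core 3.
    xm vcpu-pin mail 0 3

    # Or make it permanent in the domU's config file:
    #   vcpus = 1
    #   cpus  = "3"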


>
> Depending on how many CPU cores you have, one
> server consuming too much CPU time has the potential to adversely
> affect other servers.


I found this to be true when each slice ran Ubuntu's cron.daily tasks. Urgh!
The 'nice' value inside a slice means nothing to the Xen scheduler on the
host. I dithered the times that cron.daily runs on the slices, but I'd love
to know a better solution. (One slice per core would be one answer.)
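
For what it's worth, the dithering amounts to something along these lines on
each slice (a sketch, not the exact crontab line):

    # /etc/crontab on each slice: sleep a random 0-30 minutes before the
    # daily jobs so the slices don't all hit the disk at the same time.
    25 6 * * * root perl -e 'sleep int(rand(1800))' && run-parts --report /etc/cron.daily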


>
> > # We can manage disk more easily
> >
> > Isolation of disk between virtual servers is a benefit as it reduces the
> > damage a full disk would cause (our monitoring alerts us well before that
> > happens though!).
>
> This is largely a moot point. On a standalone server, you would have
> at *least* a RAID 1 setup to handle disk failures.


I was referring to "no space left on device" errors. We got one the other
day when Starling filled a disk. Only that slice felt it.

I agree with you about the RAID though. I've never had a disk failure but
I've never regretted the money my clients invest in RAID cards.

Thanks again for sharing your experiences Alex. Xen experts seem to be a bit
scarce. I've been hearing that KVM is 'the next big thing', but discovering
that Amazon EC2 runs on Xen has strengthened my view that Xen is still a
great choice for virtualization.

- Mike
