----- Original Message -----
> From: "Daniel P. Berrange" <berra...@redhat.com>
> To: "Robert Collins" <robe...@robertcollins.net>
> Cc: "OpenStack Development Mailing List (not for usage questions)"
> <email@example.com>,
> openstack-operat...@lists.openstack.org
> Sent: Monday, 2 February, 2015 5:56:56 AM
> Subject: Re: [openstack-dev] [nova][libvirt] RFC: ensuring live migration
> ends
>
> On Mon, Feb 02, 2015 at 08:24:20AM +1300, Robert Collins wrote:
> > On 31 January 2015 at 05:47, Daniel P. Berrange <berra...@redhat.com>
> > wrote:
> > > In working on a recent Nova migration bug
> > >
> > > https://bugs.launchpad.net/nova/+bug/1414065
> > >
> > > I had cause to refactor the way the nova libvirt driver monitors live
> > > migration completion/failure/progress. This refactor has opened the
> > > door for doing more intelligent active management of the live migration
> > > process.
> > ...
> > > What kind of things would be the biggest win from Operators' or tenants'
> > > POV ?
> >
> > Awesome. Couple thoughts from my perspective. Firstly, there's a bunch
> > of situation dependent tuning. One thing Crowbar does really nicely is
> > that you specify the host layout in broad abstract terms - e.g. 'first
> > 10G network link' and so on : some of your settings above like whether
> > to compress page are going to be heavily dependent on the bandwidth
> > available (I doubt that compression is a win on a 100G link for
> > instance, and would be suspect at 10G even). So it would be nice if
> > there was a single dial or two to set and Nova would auto-calculate
> > good defaults from that (with appropriate overrides being available).
>
> I wonder how such an idea would fit into Nova, since it doesn't really
> have that kind of knowledge about the network deployment characteristics.
>
> > Operationally avoiding trouble is better than being able to fix it, so
> > I quite like the idea of defaulting the auto-converge option on, or
> > perhaps making it controllable via flavours, so that operators can
> > offer (and identify!) those particularly performance sensitive
> > workloads rather than having to guess which instances are special and
> > which aren't.
>
> I'll investigate the auto-converge further to find out what the
> potential downsides of it are. If we can unconditionally enable
> it, it would be simpler than adding yet more tunables.
>
> > Being able to cancel the migration would be good. Relatedly being able
> > to restart nova-compute while a migration is going on would be good
> > (or put differently, a migration happening shouldn't prevent a deploy
> > of Nova code: interlocks like that make continuous deployment much
> > harder).
> >
> > If we can't already, I'd like as a user to be able to see that the
> > migration is happening (allows diagnosis of transient issues during
> > the migration). Some ops folk may want to hide that of course.
> >
> > I'm not sure that automatically rolling back after N minutes makes
> > sense : if the impact on the cluster is significant then 1 minute vs
> > 10 doesn't intrinsically matter: what matters more is preventing too
> > many concurrent migrations, so that would be another feature that I
> > don't think we have yet: don't allow more than some N inbound and M
> > outbound live migrations to a compute host at any time, to prevent IO
> > storms. We may want to log with NOTIFICATION migrations that are still
> > progressing but appear to be having trouble completing. And of course
> > an admin API to query all migrations in progress to allow API driven
> > health checks by monitoring tools - which gives the power to manage
> > things to admins without us having to write a probably-too-simple
> > config interface.
>
> Interesting, the point about concurrent migrations hadn't occurred to
> me before, but it does of course make sense since migration is
> primarily network bandwidth limited, though disk bandwidth is relevant
> too if doing block migration.
Indeed, a lot of time was spent investigating this topic (again in oVirt), and eventually it was decided to expose a config option, with a default of 3 concurrent migrations:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/config.py.in#L126

>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
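For illustration only, here is a minimal sketch of the per-host cap Robert describes in the thread: no more than N inbound and M outbound live migrations on a compute host at any time. It is not Nova or vdsm code. The `MigrationLimiter` class, its `slot()` method, and the default of 3 per direction are hypothetical. The default is loosely borrowed from the oVirt figure mentioned above, and the thread does not say whether vdsm applies its limit per direction. The sketch also assumes that migrations are driven from threads inside a single compute-service process.

```python
import threading
from contextlib import contextmanager


class MigrationLimiter:
    """Hypothetical per-host cap on concurrent live migrations.

    Not a Nova or vdsm API; a sketch of "no more than N inbound and
    M outbound live migrations to a compute host at any time".
    """

    def __init__(self, max_inbound=3, max_outbound=3):
        # One bounded semaphore per direction; each permit is one migration slot.
        self._slots = {
            "inbound": threading.BoundedSemaphore(max_inbound),
            "outbound": threading.BoundedSemaphore(max_outbound),
        }

    @contextmanager
    def slot(self, direction, timeout=None):
        """Hold one migration slot for the duration of a with-block.

        timeout=None waits indefinitely for a free slot; timeout=0 refuses
        immediately if the host is already at its limit.
        """
        sem = self._slots[direction]
        if not sem.acquire(timeout=timeout):
            raise RuntimeError(f"too many {direction} migrations on this host")
        try:
            yield
        finally:
            # Always free the slot, even if the migration itself failed.
            sem.release()


# Usage: with max_inbound=1, a second concurrent inbound migration is refused.
limiter = MigrationLimiter(max_inbound=1, max_outbound=3)
with limiter.slot("inbound", timeout=0):
    try:
        with limiter.slot("inbound", timeout=0):
            pass
    except RuntimeError as exc:
        print(exc)  # too many inbound migrations on this host
```

Refusing immediately (`timeout=0`) lets the caller report "host busy" right away instead of queueing the request. Whether excess migrations should queue or be rejected is a design choice the thread leaves open.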