On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
> On 02/22/2016 10:43 AM, Chris Friesen wrote:
> > Hi all,
> >
> > We've recently run into some interesting behaviour that I thought I
> > should bring up to see if we want to do anything about it.
> >
> > Basically the problem seems to be that nova-compute is doing disk I/O
> > from the main thread, and if it blocks then it can block all of
> > nova-compute (since all eventlets will be blocked). Examples that we've
> > found include glance image download, file renaming, instance directory
> > creation, opening the instance xml file, etc. We've seen nova-compute
> > block for upwards of 50 seconds.
> >
> > Now the specific case where we hit this is not a production
> > environment. It's only got one spinning disk shared by all the guests,
> > the guests were hammering on the disk pretty hard, the IO scheduler for
> > the instance disk was CFQ which seems to be buggy in our kernel.
> >
> > But the fact remains that nova-compute is doing disk I/O from the main
> > thread, and if the guests push that disk hard enough then nova-compute
> > is going to suffer.
> >
> > Given the above...would it make sense to use eventlet.tpool or similar
> > to perform all disk access in a separate OS thread? There'd likely be a
> > bit of a performance hit, but at least it would isolate the main thread
> > from IO blocking.
>
> Making nova-compute more robust is fine, though the reality is once you
> IO starve a system, a lot of stuff is going to fall over weird.
>
> So there has to be a tradeoff of the complexity of any new code vs. what
> it gains. I think individual patches should be evaluated as such, or a
> spec if this is going to get really invasive.

There are OS-level mechanisms (e.g. the cgroups blkio controller) for
doing I/O prioritization that you could use to give Nova higher priority
than the VMs, to reduce (if not eliminate) the possibility that a busy
VM can inflict a denial of service on the mgmt layer. Of course,
figuring out how to use that mechanism correctly is not entirely
trivial.

I think it is probably worth focusing effort in that area before
jumping into making all the I/O-related code in Nova more complicated,
e.g. have someone investigate & write up a recommendation in the Nova
docs for how to configure the host OS & Nova such that VMs cannot
inflict an I/O denial of service attack on the mgmt service.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
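
As a rough starting point for such an investigation, here is a minimal
sketch of giving the mgmt services a larger proportional I/O share than
the guests via the blkio controller. It assumes a cgroup v1 hierarchy
mounted at /sys/fs/cgroup/blkio, where blkio.weight accepts values from
10 to 1000 and is honoured by the CFQ scheduler. The cgroup names
"nova-mgmt" and "machine" and both helper functions are hypothetical,
chosen only for illustration; this is not something Nova or libvirt
ships.

```python
import os

BLKIO_WEIGHT_MIN = 10    # lowest value cgroup v1 accepts for blkio.weight
BLKIO_WEIGHT_MAX = 1000  # highest value cgroup v1 accepts for blkio.weight


def plan_blkio_weights(mgmt_groups, guest_groups,
                       mgmt_weight=1000, guest_weight=100):
    """Map each cgroup name to the blkio weight it should receive."""
    for weight in (mgmt_weight, guest_weight):
        if not BLKIO_WEIGHT_MIN <= weight <= BLKIO_WEIGHT_MAX:
            raise ValueError("blkio weight out of range: %d" % weight)
    plan = {}
    for group in guest_groups:
        plan[group] = guest_weight
    # Mgmt groups are applied last so they win if a name is listed twice.
    for group in mgmt_groups:
        plan[group] = mgmt_weight
    return plan


def apply_blkio_weights(plan, blkio_root="/sys/fs/cgroup/blkio"):
    """Write each planned weight to <blkio_root>/<group>/blkio.weight.

    The cgroup directories must already exist; on a real host this
    needs root privileges.
    """
    written = []
    for group, weight in sorted(plan.items()):
        path = os.path.join(blkio_root, group, "blkio.weight")
        with open(path, "w") as f:
            f.write("%d\n" % weight)
        written.append(path)
    return written
```

With the hypothetical names above, calling
apply_blkio_weights(plan_blkio_weights(["nova-mgmt"], ["machine"]))
would give the mgmt group ten times the proportional weight of the
guests. Weights only matter while the disk is contended, and whether
they actually help depends on the I/O scheduler and kernel in use, so
any recommendation would still need testing on the target hosts.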