On 22 February 2016 at 17:38, Sean Dague <s...@dague.net> wrote:
> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> Hi all,
>>>>
>>>> We've recently run into some interesting behaviour that I thought I
>>>> should bring up to see if we want to do anything about it.
>>>>
>>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>>> from the main thread, and if it blocks then it can block all of
>>>> nova-compute (since all eventlets will be blocked). Examples that we've
>>>> found include glance image download, file renaming, instance directory
>>>> creation, opening the instance xml file, etc. We've seen nova-compute
>>>> block for upwards of 50 seconds.
>>>>
>>>> Now the specific case where we hit this is not a production
>>>> environment. It's only got one spinning disk shared by all the guests,
>>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>>
>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>> is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>> to perform all disk access in a separate OS thread? There'd likely be a
>>>> bit of a performance hit, but at least it would isolate the main thread
>>>> from IO blocking.
>>>
>>> Making nova-compute more robust is fine, though the reality is once you
>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>> it gains. I think individual patches should be evaluated as such, or a
>>> spec if this is going to get really invasive.
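To make the eventlet.tpool suggestion quoted above concrete, here is a minimal sketch, assuming eventlet is installed, of handing blocking file operations to a real OS thread with `tpool.execute`. The helper names (`run_blocking`, `ensure_instance_dir`, `read_instance_xml`) and the file names are hypothetical and are not Nova code. The `concurrent.futures` fallback is an assumption added only so the sketch runs without eventlet; it mirrors the call shape but blocks the caller, so it does not give the isolation being discussed.

```python
import os
import tempfile

try:
    # tpool.execute runs the callable in a native OS thread; the calling
    # green thread yields to the hub until the result is ready.
    from eventlet import tpool

    def run_blocking(func, *args, **kwargs):
        return tpool.execute(func, *args, **kwargs)

except ImportError:
    # Assumption: stand-in for environments without eventlet. Same call
    # shape, but the caller simply waits for the worker thread.
    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)

    def run_blocking(func, *args, **kwargs):
        return _pool.submit(func, *args, **kwargs).result()


def ensure_instance_dir(path):
    """Blocking directory creation (hypothetical stand-in)."""
    os.makedirs(path, exist_ok=True)
    return path


def read_instance_xml(path):
    """Blocking file read (hypothetical stand-in)."""
    with open(path, "r") as f:
        return f.read()


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    inst_dir = run_blocking(ensure_instance_dir,
                            os.path.join(base, "instance-0001"))
    xml_path = os.path.join(inst_dir, "libvirt.xml")
    with open(xml_path, "w") as f:
        f.write("<domain/>")
    print(run_blocking(read_instance_xml, xml_path))
```

Each wrapped call costs a thread hand-off, which is the "bit of a performance hit" mentioned above; whether that is worth it per call site is the tradeoff Sean describes.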
>>
>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>> I/O prioritization that you could use to give Nova higher priority over
>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>> can inflict a denial of service on the mgmt layer. Of course figuring
>> out how to use that mechanism correctly is not entirely trivial.
>>
>> I think it is probably worth focusing effort in that area, before jumping
>> into making all the I/O related code in Nova more complicated. eg have
>> someone investigate & write up recommendation in Nova docs for how to
>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>> of service attack on the mgmt service.
>
> +1 that would be much nicer.
>
> We've got some set of bugs in the tracker right now which are basically
> "after the compute node being at loadavg of 11 for an hour, nova-compute
> starts failing". Having some basic methodology to use Linux
> prioritization on the worker process would mitigate those quite a bit,
> and could be used by all users immediately, vs. complex nova-compute
> changes which would only apply to new / upgraded deploys.
>
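As a rough illustration of the cgroups blkio approach quoted above, here is a minimal sketch, assuming a cgroup v1 hierarchy with the blkio controller mounted at `/sys/fs/cgroup/blkio` and the CFQ scheduler in use (proportional weights depend on it). The function name `set_blkio_weight`, the group name `mgmt`, and the chosen weights are hypothetical; this is not a tested recommendation for any particular platform, and it needs root on a real host.

```python
import os

# cgroup v1 blkio.weight accepts values from 10 to 1000.
BLKIO_MIN_WEIGHT = 10
BLKIO_MAX_WEIGHT = 1000


def set_blkio_weight(group, weight, pids=(),
                     cgroup_root="/sys/fs/cgroup/blkio"):
    """Create (or reuse) a blkio cgroup, set its weight, and move PIDs in.

    cgroup_root is a parameter so the sketch can be pointed at a scratch
    directory for a dry run instead of the real cgroup filesystem.
    """
    if not BLKIO_MIN_WEIGHT <= weight <= BLKIO_MAX_WEIGHT:
        raise ValueError("blkio weight must be between %d and %d"
                         % (BLKIO_MIN_WEIGHT, BLKIO_MAX_WEIGHT))

    group_dir = os.path.join(cgroup_root, group)
    os.makedirs(group_dir, exist_ok=True)

    # A higher weight gives the group a larger share of disk time
    # under contention.
    with open(os.path.join(group_dir, "blkio.weight"), "w") as f:
        f.write(str(weight))

    # cgroup.procs takes one PID per write, so open it once per PID.
    for pid in pids:
        with open(os.path.join(group_dir, "cgroup.procs"), "w") as f:
            f.write(str(pid))

    return group_dir
```

A call such as `set_blkio_weight("mgmt", 1000, pids=[nova_compute_pid])`, paired with a lower weight on whatever group holds the guests, is the general shape; working out the right groups and values per platform is the "not entirely trivial" part.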
+1

Does that turn into improved deployment docs that cover how you do
that on various platforms?

Maybe some tools to help with that also go in here?
http://git.openstack.org/cgit/openstack/osops-tools-generic/

Thanks,
John

PS
FWIW, how xenapi runs nova-compute in VM has a similar outcome, albeit
in a more heavy handed way.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev