Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
For what it's worth, Glance API also has to deal with file I/O blocking all greenthreads, and has CooperativeReaders/Writers that yield around the file I/O to mitigate starvation. A while ago I hit an issue with 5 concurrent VM snapshots starving nova-compute eventlets due to the excessive file I/O of reading the snapshot file to upload. This could be remedied by taking the cooperative reader from Glance API and using it in glanceclient [1]. It's not perfect, but something similar could help with the glance image download issues without needing to tweak the OS.

[1] https://bugs.launchpad.net/python-glanceclient/+bug/1327248

On Mon, Feb 22, 2016 at 1:13 PM, Chris Friesen wrote:
> On 02/22/2016 11:20 AM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>>
>>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
>>
>> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.
> The 50+ second delays were with CFQ as the disk scheduler. (No cgroups though, just CFQ with equal priorities on nova-compute and the guests.) This was with a 3.10 kernel though, so maybe CFQ behaves better on newer kernels.
>
> If you put nova-compute at high priority then glance image downloads, qemu-img format conversions, and volume clearing will also run at the higher priority, potentially impacting running VMs.
>
> In an ideal world we'd have per-VM cgroups and all activity on behalf of a particular VM would be done in the context of that VM's cgroup.
>
> Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
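The Glance-style cooperative reader mentioned above can be sketched in a few lines. This is an illustration, not glanceclient's actual class: the class name, chunk size, and pluggable yield_func are assumptions. In real eventlet code the yield call would be eventlet.sleep(0); the time.sleep(0) default here is a stdlib stand-in so the sketch runs without eventlet installed.

```python
import io
import time


class CooperativeReader:
    """Read a file-like object in chunks, yielding control between reads.

    Sketch of the Glance-style cooperative reader: after each chunk the
    reader invokes a yield function so other greenthreads get a turn,
    instead of one large upload hogging the event loop. Under eventlet
    you would pass yield_func=lambda: eventlet.sleep(0); time.sleep(0)
    is a stand-in here.
    """

    def __init__(self, fileobj, chunk_size=64 * 1024, yield_func=None):
        self._fileobj = fileobj
        self._chunk_size = chunk_size
        self._yield = yield_func or (lambda: time.sleep(0))

    def read(self, size=None):
        if size is None or size > self._chunk_size:
            size = self._chunk_size
        data = self._fileobj.read(size)
        self._yield()  # cooperative yield point between chunks
        return data

    def __iter__(self):
        while True:
            chunk = self.read()
            if not chunk:
                return
            yield chunk


if __name__ == "__main__":
    reader = CooperativeReader(io.BytesIO(b"x" * 200000))
    print(sum(len(chunk) for chunk in reader))  # 200000
```

The wrapper is transparent to callers that just loop over chunks, which is why dropping it into an upload path is a small change compared with restructuring the I/O code.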
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 11:20 AM, Daniel P. Berrange wrote:
> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>
>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>
>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
>
> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.

The 50+ second delays were with CFQ as the disk scheduler. (No cgroups though, just CFQ with equal priorities on nova-compute and the guests.) This was with a 3.10 kernel though, so maybe CFQ behaves better on newer kernels.

If you put nova-compute at high priority then glance image downloads, qemu-img format conversions, and volume clearing will also run at the higher priority, potentially impacting running VMs.

In an ideal world we'd have per-VM cgroups and all activity on behalf of a particular VM would be done in the context of that VM's cgroup.

Chris
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 22/02/16 19:07, "John Garbutt" wrote:
> On 22 February 2016 at 17:38, Sean Dague wrote:
>> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>>> Hi all,
>>>>>
>>>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>>>
>>>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>>>
>>>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>>>
>>>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>>>
>>>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>>>
>>>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>>>
>>>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
>>> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.
>>>
>>> I think it is probably worth focusing effort in that area, before jumping into making all the I/O related code in Nova more complicated. eg have someone investigate & write up recommendation in Nova docs for how to configure the host OS & Nova such that VMs cannot inflict an I/O denial of service attack on the mgmt service.
>>
>> +1 that would be much nicer.
>>
>> We've got some set of bugs in the tracker right now which are basically "after the compute node being at loadavg of 11 for an hour, nova-compute starts failing". Having some basic methodology to use Linux prioritization on the worker process would mitigate those quite a bit, and could be used by all users immediately, vs. complex nova-compute changes which would only apply to new / upgraded deploys.
>
> +1
>
> Does that turn into improved deployment docs that cover how you do that on various platforms?
>
> Maybe some tools to help with that also go in here?
> http://git.openstack.org/cgit/openstack/osops-tools-generic/

And some easy configuration in the puppet/ansible/chef standard recipes would also help.

> Thanks,
> John
>
> PS
> FWIW, how xenapi runs nova-compute in VM has a similar outcome, albeit in a more heavy handed way.
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 22/02/16 19:07, "John Garbutt" wrote:
> On 22 February 2016 at 17:38, Sean Dague wrote:
>> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>>> Hi all,
>>>>>
>>>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>>>
>>>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>>>
>>>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>>>
>>>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>>>
>>>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>>>
>>>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>>>
>>>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
>>> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.
>>>
>>> I think it is probably worth focusing effort in that area, before jumping into making all the I/O related code in Nova more complicated. eg have someone investigate & write up recommendation in Nova docs for how to configure the host OS & Nova such that VMs cannot inflict an I/O denial of service attack on the mgmt service.
>>
>> +1 that would be much nicer.
>>
>> We've got some set of bugs in the tracker right now which are basically "after the compute node being at loadavg of 11 for an hour, nova-compute starts failing". Having some basic methodology to use Linux prioritization on the worker process would mitigate those quite a bit, and could be used by all users immediately, vs. complex nova-compute changes which would only apply to new / upgraded deploys.
>
> +1
>
> Does that turn into improved deployment docs that cover how you do that on various platforms?
>
> Maybe some tools to help with that also go in here?
> http://git.openstack.org/cgit/openstack/osops-tools-generic/

I think we could also include something in the puppet/chef/ansible/… configurations to do the appropriate settings.

> Thanks,
> John
>
> PS
> FWIW, how xenapi runs nova-compute in VM has a similar outcome, albeit in a more heavy handed way.
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 22 February 2016 at 17:38, Sean Dague wrote:
> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> Hi all,
>>>>
>>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>>
>>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>>
>>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>>
>>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>>
>>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
>> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.
>>
>> I think it is probably worth focusing effort in that area, before jumping into making all the I/O related code in Nova more complicated. eg have someone investigate & write up recommendation in Nova docs for how to configure the host OS & Nova such that VMs cannot inflict an I/O denial of service attack on the mgmt service.
>
> +1 that would be much nicer.
>
> We've got some set of bugs in the tracker right now which are basically "after the compute node being at loadavg of 11 for an hour, nova-compute starts failing". Having some basic methodology to use Linux prioritization on the worker process would mitigate those quite a bit, and could be used by all users immediately, vs. complex nova-compute changes which would only apply to new / upgraded deploys.

+1

Does that turn into improved deployment docs that cover how you do that on various platforms?

Maybe some tools to help with that also go in here?
http://git.openstack.org/cgit/openstack/osops-tools-generic/

Thanks,
John

PS
FWIW, how xenapi runs nova-compute in VM has a similar outcome, albeit in a more heavy handed way.
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>>
>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>
>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>
>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>
>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>
>> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>>
>> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.
> There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer. Of course figuring out how to use that mechanism correctly is not entirely trivial.
>
> I think it is probably worth focusing effort in that area, before jumping into making all the I/O related code in Nova more complicated. eg have someone investigate & write up recommendation in Nova docs for how to configure the host OS & Nova such that VMs cannot inflict an I/O denial of service attack on the mgmt service.

+1 that would be much nicer.

We've got some set of bugs in the tracker right now which are basically "after the compute node being at loadavg of 11 for an hour, nova-compute starts failing". Having some basic methodology to use Linux prioritization on the worker process would mitigate those quite a bit, and could be used by all users immediately, vs. complex nova-compute changes which would only apply to new / upgraded deploys.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On Mon, Feb 22, 2016, at 12:15 PM, Mike Bayer wrote:
> On 02/22/2016 11:30 AM, Chris Friesen wrote:
>> On 02/22/2016 11:17 AM, Jay Pipes wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> Hi all,
>>>>
>>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>>
>>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>>
>>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>>
>>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>>
>>> This is probably a good idea, but will require quite a bit of code change. I think in the past we've taken the expedient route of just exec'ing problematic code in a greenthread using utils.spawn().
>>
>> I'm not an expert on eventlet, but from what I've seen this isn't sufficient to deal with disk access in a robust way.
>> It's my understanding that utils.spawn() will result in the code running in the same OS thread, but in a separate eventlet greenthread. If that code tries to access the disk via a potentially-blocking call the eventlet subsystem will not jump to another greenthread. Because of this it can potentially block the whole OS thread (and thus all other greenthreads running in that OS thread).
>
> not sure what utils.spawn() does but if it is in fact an "exec" (or if Jay is suggesting that an exec() be used within) then the code would be in a different process entirely, and communicating with it becomes an issue of pipe IO over unix sockets which IIRC can do non blocking.

utils.spawn() is just a wrapper around eventlet.spawn(), mostly there to be stubbed out in testing.

>> I think we need to use eventlet.tpool for disk IO (or else fork a whole separate process). Basically we need to ensure that the main OS thread never issues a potentially-blocking syscall.
>
> tpool would probably be easier (and more performant because no socket needed).
>
>> Chris
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>> Hi all,
>>
>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>
>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>
>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>
>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>
>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>
> Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.
>
> So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.

There are OS level mechanisms (eg cgroups blkio controller) for doing I/O prioritization that you could use to give Nova higher priority over the VMs, to reduce (if not eliminate) the possibility that a busy VM can inflict a denial of service on the mgmt layer.
Of course figuring out how to use that mechanism correctly is not entirely trivial.

I think it is probably worth focusing effort in that area, before jumping into making all the I/O related code in Nova more complicated. eg have someone investigate & write up recommendations in Nova docs for how to configure the host OS & Nova such that VMs cannot inflict an I/O denial of service attack on the mgmt service.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
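As a concrete illustration of the cgroups blkio suggestion, host configuration amounts to creating a blkio cgroup, setting its proportional weight, and moving the service's PIDs into it. This is a hedged sketch, not a recommended Nova configuration: the paths follow the cgroup-v1 blkio controller layout, the 10–1000 weight range is my reading of the v1 docs and can vary by kernel and I/O scheduler, and on a real host this needs root plus a mounted blkio controller. The blkio_root parameter exists only so the sketch can be exercised against a scratch directory.

```python
import os


def set_blkio_weight(cgroup_name, weight, pids,
                     blkio_root="/sys/fs/cgroup/blkio"):
    """Create a cgroup-v1 blkio cgroup, set its weight, and add PIDs.

    Sketch only: on a real host, mkdir under the controller mount
    auto-creates the control files and writing to them requires root.
    """
    if not 10 <= weight <= 1000:
        raise ValueError("blkio.weight must be within [10, 1000]")
    cgroup = os.path.join(blkio_root, cgroup_name)
    os.makedirs(cgroup, exist_ok=True)
    with open(os.path.join(cgroup, "blkio.weight"), "w") as f:
        f.write(str(weight))
    with open(os.path.join(cgroup, "cgroup.procs"), "a") as f:
        for pid in pids:
            f.write("%d\n" % pid)


if __name__ == "__main__":
    import tempfile

    # Demo against a scratch directory standing in for the controller
    # mount. On a real host you would keep the default blkio_root, run
    # as root, and give nova-compute more weight (e.g. 800) than the
    # default 500 everything else gets under CFQ.
    root = tempfile.mkdtemp()
    set_blkio_weight("nova-compute", 800, pids=[os.getpid()],
                     blkio_root=root)
```

This is also where Chris's later caveat bites: the weight applies to everything in the cgroup, so glance downloads and qemu-img conversions run boosted too unless they are placed in separate cgroups.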
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 11:30 AM, Chris Friesen wrote:
> On 02/22/2016 11:17 AM, Jay Pipes wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>>
>>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>>
>>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>>
>>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>>
>>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>>
>> This is probably a good idea, but will require quite a bit of code change. I think in the past we've taken the expedient route of just exec'ing problematic code in a greenthread using utils.spawn().
>
> I'm not an expert on eventlet, but from what I've seen this isn't sufficient to deal with disk access in a robust way.
>
> It's my understanding that utils.spawn() will result in the code running in the same OS thread, but in a separate eventlet greenthread. If that code tries to access the disk via a potentially-blocking call the eventlet subsystem will not jump to another greenthread.
> Because of this it can potentially block the whole OS thread (and thus all other greenthreads running in that OS thread).

not sure what utils.spawn() does but if it is in fact an "exec" (or if Jay is suggesting that an exec() be used within) then the code would be in a different process entirely, and communicating with it becomes an issue of pipe IO over unix sockets which IIRC can do non blocking.

> I think we need to use eventlet.tpool for disk IO (or else fork a whole separate process). Basically we need to ensure that the main OS thread never issues a potentially-blocking syscall.

tpool would probably be easier (and more performant because no socket needed).

> Chris
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 10:43 AM, Chris Friesen wrote:
> Hi all,
>
> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>
> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>
> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>
> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>
> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.

Making nova-compute more robust is fine, though the reality is once you IO starve a system, a lot of stuff is going to fall over weird.

So there has to be a tradeoff of the complexity of any new code vs. what it gains. I think individual patches should be evaluated as such, or a spec if this is going to get really invasive.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 11:17 AM, Jay Pipes wrote:
> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>> Hi all,
>>
>> We've recently run into some interesting behaviour that I thought I should bring up to see if we want to do anything about it.
>>
>> Basically the problem seems to be that nova-compute is doing disk I/O from the main thread, and if it blocks then it can block all of nova-compute (since all eventlets will be blocked). Examples that we've found include glance image download, file renaming, instance directory creation, opening the instance xml file, etc. We've seen nova-compute block for upwards of 50 seconds.
>>
>> Now the specific case where we hit this is not a production environment. It's only got one spinning disk shared by all the guests, the guests were hammering on the disk pretty hard, the IO scheduler for the instance disk was CFQ which seems to be buggy in our kernel.
>>
>> But the fact remains that nova-compute is doing disk I/O from the main thread, and if the guests push that disk hard enough then nova-compute is going to suffer.
>>
>> Given the above...would it make sense to use eventlet.tpool or similar to perform all disk access in a separate OS thread? There'd likely be a bit of a performance hit, but at least it would isolate the main thread from IO blocking.
>
> This is probably a good idea, but will require quite a bit of code change. I think in the past we've taken the expedient route of just exec'ing problematic code in a greenthread using utils.spawn().

I'm not an expert on eventlet, but from what I've seen this isn't sufficient to deal with disk access in a robust way.

It's my understanding that utils.spawn() will result in the code running in the same OS thread, but in a separate eventlet greenthread. If that code tries to access the disk via a potentially-blocking call the eventlet subsystem will not jump to another greenthread. Because of this it can potentially block the whole OS thread (and thus all other greenthreads running in that OS thread).
I think we need to use eventlet.tpool for disk IO (or else fork a whole
separate process). Basically we need to ensure that the main OS thread
never issues a potentially-blocking syscall.

Chris

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
On 02/22/2016 10:43 AM, Chris Friesen wrote:
> Hi all,
>
> We've recently run into some interesting behaviour that I thought I
> should bring up to see if we want to do anything about it.
>
> Basically the problem seems to be that nova-compute is doing disk I/O
> from the main thread, and if it blocks then it can block all of
> nova-compute (since all eventlets will be blocked). Examples that
> we've found include glance image download, file renaming, instance
> directory creation, opening the instance xml file, etc. We've seen
> nova-compute block for upwards of 50 seconds.
>
> Now the specific case where we hit this is not a production
> environment. It's only got one spinning disk shared by all the
> guests, the guests were hammering on the disk pretty hard, and the IO
> scheduler for the instance disk was CFQ, which seems to be buggy in
> our kernel.
>
> But the fact remains that nova-compute is doing disk I/O from the
> main thread, and if the guests push that disk hard enough then
> nova-compute is going to suffer.
>
> Given the above...would it make sense to use eventlet.tpool or
> similar to perform all disk access in a separate OS thread? There'd
> likely be a bit of a performance hit, but at least it would isolate
> the main thread from IO blocking.

This is probably a good idea, but will require quite a bit of code
change. I think in the past we've taken the expedient route of just
exec'ing problematic code in a greenthread using utils.spawn().

Best,
-jay

[1]
[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
Hi all,

We've recently run into some interesting behaviour that I thought I
should bring up to see if we want to do anything about it.

Basically the problem seems to be that nova-compute is doing disk I/O
from the main thread, and if it blocks then it can block all of
nova-compute (since all eventlets will be blocked). Examples that we've
found include glance image download, file renaming, instance directory
creation, opening the instance xml file, etc. We've seen nova-compute
block for upwards of 50 seconds.

Now the specific case where we hit this is not a production
environment. It's only got one spinning disk shared by all the guests,
the guests were hammering on the disk pretty hard, and the IO scheduler
for the instance disk was CFQ, which seems to be buggy in our kernel.

But the fact remains that nova-compute is doing disk I/O from the main
thread, and if the guests push that disk hard enough then nova-compute
is going to suffer.

Given the above...would it make sense to use eventlet.tpool or similar
to perform all disk access in a separate OS thread? There'd likely be a
bit of a performance hit, but at least it would isolate the main thread
from IO blocking.

Chris