Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-25 Thread Sam Matzek
For what it's worth, Glance API also has to deal with file I/O blocking
all greenthreads, and has CooperativeReaders/Writers that yield around
the file I/O to mitigate starvation.  A while ago I hit an issue where
5 concurrent VM snapshots starved nova-compute's eventlets due to the
excessive file IO of reading the snapshot files for upload.  This could
be remedied by taking the cooperative reader from Glance API and using
it in glanceclient [1].  It's not perfect, but something similar could
help with the glance image download issues without needing to tweak the
OS.

[1] https://bugs.launchpad.net/python-glanceclient/+bug/1327248
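
To make the pattern concrete, here is a minimal sketch of the idea (not
the actual Glance CooperativeReader; the class name, chunk size, and
usage are purely illustrative):

import eventlet

class CooperativeReader(object):
    """Wrap a file-like object and yield to other greenthreads between
    chunks, so a long sequential read can't monopolize the event loop.
    Each individual read() can still block the OS thread, which is why
    this is a mitigation rather than a complete fix."""

    def __init__(self, fileobj, chunk_size=64 * 1024):
        self.fileobj = fileobj
        self.chunk_size = chunk_size

    def read(self, size=None):
        chunk = self.fileobj.read(size or self.chunk_size)
        eventlet.sleep(0)  # explicitly give other greenthreads a turn
        return chunk

# Illustrative usage: wrap the snapshot file before uploading it.
# reader = CooperativeReader(open('/path/to/snapshot.img', 'rb'))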

On Mon, Feb 22, 2016 at 1:13 PM, Chris Friesen wrote:
> On 02/22/2016 11:20 AM, Daniel P. Berrange wrote:
>>
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>
>
>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>> is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>> bit of a performance hit, but at least it would isolate the main thread
>>>> from IO blocking.
>>>
>>>
>>> Making nova-compute more robust is fine, though the reality is once you
>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>> it gains. I think individual patches should be evaluated as such, or a
>>> spec if this is going to get really invasive.
>>
>>
>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>> I/O prioritization that you could use to give Nova higher priority over
>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>> can inflict a denial of service on the mgmt layer.  Of course figuring
>> out how to use that mechanism correctly is not entirely trivial.
>
>
> The 50+ second delays were with CFQ as the disk scheduler.  (No cgroups
> though, just CFQ with equal priorities on nova-compute and the guests.)
> This was with a 3.10 kernel though, so maybe CFQ behaves better on newer
> kernels.
>
> If you put nova-compute at high priority then glance image downloads,
> qemu-img format conversions, and volume clearing will also run at the higher
> priority, potentially impacting running VMs.
>
> In an ideal world we'd have per-VM cgroups and all activity on behalf of a
> particular VM would be done in the context of that VM's cgroup.
>
> Chris
>
>


Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Chris Friesen

On 02/22/2016 11:20 AM, Daniel P. Berrange wrote:
> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> But the fact remains that nova-compute is doing disk I/O from the main
>>> thread, and if the guests push that disk hard enough then nova-compute
>>> is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar
>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>> bit of a performance hit, but at least it would isolate the main thread
>>> from IO blocking.
>>
>> Making nova-compute more robust is fine, though the reality is once you
>> IO starve a system, a lot of stuff is going to fall over weird.
>>
>> So there has to be a tradeoff of the complexity of any new code vs. what
>> it gains. I think individual patches should be evaluated as such, or a
>> spec if this is going to get really invasive.
>
> There are OS level mechanisms (eg cgroups blkio controller) for doing
> I/O prioritization that you could use to give Nova higher priority over
> the VMs, to reduce (if not eliminate) the possibility that a busy VM
> can inflict a denial of service on the mgmt layer.  Of course figuring
> out how to use that mechanism correctly is not entirely trivial.


The 50+ second delays were with CFQ as the disk scheduler.  (No cgroups though, 
just CFQ with equal priorities on nova-compute and the guests.)  This was with a 
3.10 kernel though, so maybe CFQ behaves better on newer kernels.


If you put nova-compute at high priority then glance image downloads, qemu-img 
format conversions, and volume clearing will also run at the higher priority, 
potentially impacting running VMs.


In an ideal world we'd have per-VM cgroups and all activity on behalf of a 
particular VM would be done in the context of that VM's cgroup.


Chris



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Tim Bell




On 22/02/16 19:07, "John Garbutt" wrote:

>On 22 February 2016 at 17:38, Sean Dague wrote:
>> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>>> Hi all,
>>>>>
>>>>> We've recently run into some interesting behaviour that I thought I
>>>>> should bring up to see if we want to do anything about it.
>>>>>
>>>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>>>> from the main thread, and if it blocks then it can block all of
>>>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>>>> found include glance image download, file renaming, instance directory
>>>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>>>> block for upwards of 50 seconds.
>>>>>
>>>>> Now the specific case where we hit this is not a production
>>>>> environment.  It's only got one spinning disk shared by all the guests,
>>>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>>>
>>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>>> is going to suffer.
>>>>>
>>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>>> bit of a performance hit, but at least it would isolate the main thread
>>>>> from IO blocking.
>>>>
>>>> Making nova-compute more robust is fine, though the reality is once you
>>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>>
>>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>>> it gains. I think individual patches should be evaluated as such, or a
>>>> spec if this is going to get really invasive.
>>>
>>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>>> I/O prioritization that you could use to give Nova higher priority over
>>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>>> can inflict a denial of service on the mgmt layer.  Of course figuring
>>> out how to use that mechanism correctly is not entirely trivial.
>>>
>>> I think it is probably worth focusing effort in that area, before jumping
>>> into making all the I/O related code in Nova more complicated. eg have
>>> someone investigate & write up recommendations in Nova docs for how to
>>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>>> of service attack on the mgmt service.
>>
>> +1 that would be much nicer.
>>
>> We've got some set of bugs in the tracker right now which are basically
>> "after the compute node being at loadavg of 11 for an hour, nova-compute
>> starts failing". Having some basic methodology to use Linux
>> prioritization on the worker process would mitigate those quite a bit,
>> and could be used by all users immediately, vs. complex nova-compute
>> changes which would only apply to new / upgraded deploys.
>>
>
>+1
>
>Does that turn into improved deployment docs that cover how you do
>that on various platforms?
>
>Maybe some tools to help with that also go in here?
>http://git.openstack.org/cgit/openstack/osops-tools-generic/

And some easy configuration in the puppet/ansible/chef standard recipes would 
also help.

>
>Thanks,
>John
>
>PS
>FWIW, how xenapi runs nova-compute in a VM has a similar outcome, albeit
>in a more heavy-handed way.
>


Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Tim Bell

On 22/02/16 19:07, "John Garbutt" wrote:

>On 22 February 2016 at 17:38, Sean Dague wrote:
>> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>>> Hi all,
>>>>>
>>>>> We've recently run into some interesting behaviour that I thought I
>>>>> should bring up to see if we want to do anything about it.
>>>>>
>>>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>>>> from the main thread, and if it blocks then it can block all of
>>>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>>>> found include glance image download, file renaming, instance directory
>>>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>>>> block for upwards of 50 seconds.
>>>>>
>>>>> Now the specific case where we hit this is not a production
>>>>> environment.  It's only got one spinning disk shared by all the guests,
>>>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>>>
>>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>>> is going to suffer.
>>>>>
>>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>>> bit of a performance hit, but at least it would isolate the main thread
>>>>> from IO blocking.
>>>>
>>>> Making nova-compute more robust is fine, though the reality is once you
>>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>>
>>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>>> it gains. I think individual patches should be evaluated as such, or a
>>>> spec if this is going to get really invasive.
>>>
>>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>>> I/O prioritization that you could use to give Nova higher priority over
>>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>>> can inflict a denial of service on the mgmt layer.  Of course figuring
>>> out how to use that mechanism correctly is not entirely trivial.
>>>
>>> I think it is probably worth focusing effort in that area, before jumping
>>> into making all the I/O related code in Nova more complicated. eg have
>>> someone investigate & write up recommendations in Nova docs for how to
>>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>>> of service attack on the mgmt service.
>>
>> +1 that would be much nicer.
>>
>> We've got some set of bugs in the tracker right now which are basically
>> "after the compute node being at loadavg of 11 for an hour, nova-compute
>> starts failing". Having some basic methodology to use Linux
>> prioritization on the worker process would mitigate those quite a bit,
>> and could be used by all users immediately, vs. complex nova-compute
>> changes which would only apply to new / upgraded deploys.
>>
>
>+1
>
>Does that turn into improved deployment docs that cover how you do
>that on various platforms?
>
>Maybe some tools to help with that also go in here?
>http://git.openstack.org/cgit/openstack/osops-tools-generic/

I think we could also include something in the puppet/chef/ansible/…
configurations to do the appropriate settings.

>
>Thanks,
>John
>
>PS
>FWIW, how xenapi runs nova-compute in a VM has a similar outcome, albeit
>in a more heavy-handed way.
>


Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread John Garbutt
On 22 February 2016 at 17:38, Sean Dague wrote:
> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> Hi all,
>>>>
>>>> We've recently run into some interesting behaviour that I thought I
>>>> should bring up to see if we want to do anything about it.
>>>>
>>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>>> from the main thread, and if it blocks then it can block all of
>>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>>> found include glance image download, file renaming, instance directory
>>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>>> block for upwards of 50 seconds.
>>>>
>>>> Now the specific case where we hit this is not a production
>>>> environment.  It's only got one spinning disk shared by all the guests,
>>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>>
>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>> is going to suffer.
>>>>
>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>> bit of a performance hit, but at least it would isolate the main thread
>>>> from IO blocking.
>>>
>>> Making nova-compute more robust is fine, though the reality is once you
>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>> it gains. I think individual patches should be evaluated as such, or a
>>> spec if this is going to get really invasive.
>>
>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>> I/O prioritization that you could use to give Nova higher priority over
>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>> can inflict a denial of service on the mgmt layer.  Of course figuring
>> out how to use that mechanism correctly is not entirely trivial.
>>
>> I think it is probably worth focusing effort in that area, before jumping
>> into making all the I/O related code in Nova more complicated. eg have
>> someone investigate & write up recommendations in Nova docs for how to
>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>> of service attack on the mgmt service.
>
> +1 that would be much nicer.
>
> We've got some set of bugs in the tracker right now which are basically
> "after the compute node being at loadavg of 11 for an hour, nova-compute
> starts failing". Having some basic methodology to use Linux
> prioritization on the worker process would mitigate those quite a bit,
> and could be used by all users immediately, vs. complex nova-compute
> changes which would only apply to new / upgraded deploys.
>

+1

Does that turn into improved deployment docs that cover how you do
that on various platforms?

Maybe some tools to help with that also go in here?
http://git.openstack.org/cgit/openstack/osops-tools-generic/

Thanks,
John

PS
FWIW, how xenapi runs nova-compute in a VM has a similar outcome, albeit
in a more heavy-handed way.



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Sean Dague
On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>>
>>> We've recently run into some interesting behaviour that I thought I
>>> should bring up to see if we want to do anything about it.
>>>
>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>> from the main thread, and if it blocks then it can block all of
>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>> found include glance image download, file renaming, instance directory
>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>> block for upwards of 50 seconds.
>>>
>>> Now the specific case where we hit this is not a production
>>> environment.  It's only got one spinning disk shared by all the guests,
>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>
>>> But the fact remains that nova-compute is doing disk I/O from the main
>>> thread, and if the guests push that disk hard enough then nova-compute
>>> is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar
>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>> bit of a performance hit, but at least it would isolate the main thread
>>> from IO blocking.
>>
>> Making nova-compute more robust is fine, though the reality is once you
>> IO starve a system, a lot of stuff is going to fall over weird.
>>
>> So there has to be a tradeoff of the complexity of any new code vs. what
>> it gains. I think individual patches should be evaluated as such, or a
>> spec if this is going to get really invasive.
> 
> There are OS level mechanisms (eg cgroups blkio controller) for doing
> I/O prioritization that you could use to give Nova higher priority over
> the VMs, to reduce (if not eliminate) the possibility that a busy VM
> can inflict a denial of service on the mgmt layer.  Of course figuring
> out how to use that mechanism correctly is not entirely trivial.
>
> I think it is probably worth focusing effort in that area, before jumping
> into making all the I/O related code in Nova more complicated. eg have
> someone investigate & write up recommendations in Nova docs for how to
> configure the host OS & Nova such that VMs cannot inflict an I/O denial
> of service attack on the mgmt service.

+1 that would be much nicer.

We've got some set of bugs in the tracker right now which are basically
"after the compute node being at loadavg of 11 for an hour, nova-compute
starts failing". Having some basic methodology to use Linux
prioritization on the worker process would mitigate those quite a bit,
and could be used by all users immediately, vs. complex nova-compute
changes which would only apply to new / upgraded deploys.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Andrew Laski


On Mon, Feb 22, 2016, at 12:15 PM, Mike Bayer wrote:
> 
> 
> On 02/22/2016 11:30 AM, Chris Friesen wrote:
> > On 02/22/2016 11:17 AM, Jay Pipes wrote:
> >> On 02/22/2016 10:43 AM, Chris Friesen wrote:
> >>> Hi all,
> >>>
> >>> We've recently run into some interesting behaviour that I thought I
> >>> should bring up to see if we want to do anything about it.
> >>>
> >>> Basically the problem seems to be that nova-compute is doing disk I/O
> >>> from the main thread, and if it blocks then it can block all of
> >>> nova-compute (since all eventlets will be blocked).  Examples that we've
> >>> found include glance image download, file renaming, instance directory
> >>> creation, opening the instance xml file, etc.  We've seen nova-compute
> >>> block for upwards of 50 seconds.
> >>>
> >>> Now the specific case where we hit this is not a production
> >>> environment.  It's only got one spinning disk shared by all the guests,
> >>> the guests were hammering on the disk pretty hard, the IO scheduler for
> >>> the instance disk was CFQ which seems to be buggy in our kernel.
> >>>
> >>> But the fact remains that nova-compute is doing disk I/O from the main
> >>> thread, and if the guests push that disk hard enough then nova-compute
> >>> is going to suffer.
> >>>
> >>> Given the above...would it make sense to use eventlet.tpool or similar
> >>> to perform all disk access in a separate OS thread?  There'd likely be a
> >>> bit of a performance hit, but at least it would isolate the main thread
> >>> from IO blocking.
> >>
> >> This is probably a good idea, but will require quite a bit of code
> >> change. I
> >> think in the past we've taken the expedient route of just exec'ing
> >> problematic
> >> code in a greenthread using utils.spawn().
> >
> > I'm not an expert on eventlet, but from what I've seen this isn't
> > sufficient to deal with disk access in a robust way.
> >
> > It's my understanding that utils.spawn() will result in the code running
> > in the same OS thread, but in a separate eventlet greenthread.  If that
> > code tries to access the disk via a potentially-blocking call the
> > eventlet subsystem will not jump to another greenthread.  Because of
> > this it can potentially block the whole OS thread (and thus all other
> > greenthreads running in that OS thread).
> 
> not sure what utils.spawn() does but if it is in fact an "exec" (or if 
> Jay is suggesting that an exec() be used within) then the code would be 
> in a different process entirely, and communicating with it becomes an 
> issue of pipe IO over unix sockets which IIRC can do non blocking.

utils.spawn() is just a wrapper around eventlet.spawn(), mostly there to
be stubbed out in testing.
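
In other words, roughly this (paraphrased for illustration, not the
exact nova source):

import eventlet

def spawn(func, *args, **kwargs):
    # Hands off to eventlet.spawn, so the callable still runs as a
    # greenthread inside the same OS thread; a blocking syscall in it
    # will still block the whole thread.
    return eventlet.spawn(func, *args, **kwargs)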


> 
> 
> >
> > I think we need to use eventlet.tpool for disk IO (or else fork a whole
> > separate process).  Basically we need to ensure that the main OS thread
> > never issues a potentially-blocking syscall.
> 
> tpool would probably be easier (and more performant because no socket 
> needed).
> 
> 
> >
> > Chris
> >


Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Daniel P. Berrange
On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
> On 02/22/2016 10:43 AM, Chris Friesen wrote:
> > Hi all,
> > 
> > We've recently run into some interesting behaviour that I thought I
> > should bring up to see if we want to do anything about it.
> > 
> > Basically the problem seems to be that nova-compute is doing disk I/O
> > from the main thread, and if it blocks then it can block all of
> > nova-compute (since all eventlets will be blocked).  Examples that we've
> > found include glance image download, file renaming, instance directory
> > creation, opening the instance xml file, etc.  We've seen nova-compute
> > block for upwards of 50 seconds.
> > 
> > Now the specific case where we hit this is not a production
> > environment.  It's only got one spinning disk shared by all the guests,
> > the guests were hammering on the disk pretty hard, the IO scheduler for
> > the instance disk was CFQ which seems to be buggy in our kernel.
> > 
> > But the fact remains that nova-compute is doing disk I/O from the main
> > thread, and if the guests push that disk hard enough then nova-compute
> > is going to suffer.
> > 
> > Given the above...would it make sense to use eventlet.tpool or similar
> > to perform all disk access in a separate OS thread?  There'd likely be a
> > bit of a performance hit, but at least it would isolate the main thread
> > from IO blocking.
> 
> Making nova-compute more robust is fine, though the reality is once you
> IO starve a system, a lot of stuff is going to fall over weird.
> 
> So there has to be a tradeoff of the complexity of any new code vs. what
> it gains. I think individual patches should be evaluated as such, or a
> spec if this is going to get really invasive.

There are OS level mechanisms (eg cgroups blkio controller) for doing
I/O prioritization that you could use to give Nova higher priority over
the VMs, to reduce (if not eliminate) the possibility that a busy VM
can inflict a denial of service on the mgmt layer.  Of course figuring
out how to use that mechanism correctly is not entirely trivial.

I think it is probably worth focusing effort in that area, before jumping
into making all the I/O related code in Nova more complicated. eg have
someone investigate & write up recommendations in Nova docs for how to
configure the host OS & Nova such that VMs cannot inflict an I/O denial
of service attack on the mgmt service.
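
As a rough illustration of the kind of thing I mean (cgroup v1 blkio
controller; the proportional weight only takes effect with the CFQ
scheduler, and the group names, weights, and helper below are made up
for the example):

import os

BLKIO_ROOT = '/sys/fs/cgroup/blkio'

def set_blkio_weight(group, weight, pid=None):
    # Create the cgroup, set its proportional I/O weight (100-1000),
    # and optionally move a process into it.
    path = os.path.join(BLKIO_ROOT, group)
    if not os.path.isdir(path):
        os.mkdir(path)
    with open(os.path.join(path, 'blkio.weight'), 'w') as f:
        f.write(str(weight))
    if pid is not None:
        with open(os.path.join(path, 'cgroup.procs'), 'w') as f:
            f.write(str(pid))

# e.g. weight the mgmt layer well above the guests:
# set_blkio_weight('nova', 1000, pid=nova_compute_pid)
# set_blkio_weight('vms', 100)   # where libvirt would place the qemu tasks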

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Mike Bayer



On 02/22/2016 11:30 AM, Chris Friesen wrote:
> On 02/22/2016 11:17 AM, Jay Pipes wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>>
>>> We've recently run into some interesting behaviour that I thought I
>>> should bring up to see if we want to do anything about it.
>>>
>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>> from the main thread, and if it blocks then it can block all of
>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>> found include glance image download, file renaming, instance directory
>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>> block for upwards of 50 seconds.
>>>
>>> Now the specific case where we hit this is not a production
>>> environment.  It's only got one spinning disk shared by all the guests,
>>> the guests were hammering on the disk pretty hard, the IO scheduler for
>>> the instance disk was CFQ which seems to be buggy in our kernel.
>>>
>>> But the fact remains that nova-compute is doing disk I/O from the main
>>> thread, and if the guests push that disk hard enough then nova-compute
>>> is going to suffer.
>>>
>>> Given the above...would it make sense to use eventlet.tpool or similar
>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>> bit of a performance hit, but at least it would isolate the main thread
>>> from IO blocking.
>>
>> This is probably a good idea, but will require quite a bit of code
>> change. I think in the past we've taken the expedient route of just
>> exec'ing problematic code in a greenthread using utils.spawn().
>
> I'm not an expert on eventlet, but from what I've seen this isn't
> sufficient to deal with disk access in a robust way.
>
> It's my understanding that utils.spawn() will result in the code running
> in the same OS thread, but in a separate eventlet greenthread.  If that
> code tries to access the disk via a potentially-blocking call the
> eventlet subsystem will not jump to another greenthread.  Because of
> this it can potentially block the whole OS thread (and thus all other
> greenthreads running in that OS thread).

not sure what utils.spawn() does but if it is in fact an "exec" (or if
Jay is suggesting that an exec() be used within) then the code would be
in a different process entirely, and communicating with it becomes an
issue of pipe IO over unix sockets which IIRC can be non-blocking.
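
e.g. something along these lines with eventlet's green subprocess
module (the command and helper are illustrative, not a proposal for an
actual nova API):

from eventlet.green import subprocess

def run_io_heavy_command(cmd):
    # The child process does the disk-heavy work; reading its stdout
    # through eventlet's green pipe suspends only this greenthread
    # rather than blocking the main OS thread.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out

# out = run_io_heavy_command(['qemu-img', 'info', '/path/to/disk.img'])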





> I think we need to use eventlet.tpool for disk IO (or else fork a whole
> separate process).  Basically we need to ensure that the main OS thread
> never issues a potentially-blocking syscall.

tpool would probably be easier (and more performant because no socket
needed).

> Chris




Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Sean Dague
On 02/22/2016 10:43 AM, Chris Friesen wrote:
> Hi all,
> 
> We've recently run into some interesting behaviour that I thought I
> should bring up to see if we want to do anything about it.
> 
> Basically the problem seems to be that nova-compute is doing disk I/O
> from the main thread, and if it blocks then it can block all of
> nova-compute (since all eventlets will be blocked).  Examples that we've
> found include glance image download, file renaming, instance directory
> creation, opening the instance xml file, etc.  We've seen nova-compute
> block for upwards of 50 seconds.
> 
> Now the specific case where we hit this is not a production
> environment.  It's only got one spinning disk shared by all the guests,
> the guests were hammering on the disk pretty hard, the IO scheduler for
> the instance disk was CFQ which seems to be buggy in our kernel.
> 
> But the fact remains that nova-compute is doing disk I/O from the main
> thread, and if the guests push that disk hard enough then nova-compute
> is going to suffer.
> 
> Given the above...would it make sense to use eventlet.tpool or similar
> to perform all disk access in a separate OS thread?  There'd likely be a
> bit of a performance hit, but at least it would isolate the main thread
> from IO blocking.

Making nova-compute more robust is fine, though the reality is once you
IO starve a system, a lot of stuff is going to fall over weird.

So there has to be a tradeoff of the complexity of any new code vs. what
it gains. I think individual patches should be evaluated as such, or a
spec if this is going to get really invasive.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Chris Friesen

On 02/22/2016 11:17 AM, Jay Pipes wrote:

> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>> Hi all,
>>
>> We've recently run into some interesting behaviour that I thought I
>> should bring up to see if we want to do anything about it.
>>
>> Basically the problem seems to be that nova-compute is doing disk I/O
>> from the main thread, and if it blocks then it can block all of
>> nova-compute (since all eventlets will be blocked).  Examples that we've
>> found include glance image download, file renaming, instance directory
>> creation, opening the instance xml file, etc.  We've seen nova-compute
>> block for upwards of 50 seconds.
>>
>> Now the specific case where we hit this is not a production
>> environment.  It's only got one spinning disk shared by all the guests,
>> the guests were hammering on the disk pretty hard, the IO scheduler for
>> the instance disk was CFQ which seems to be buggy in our kernel.
>>
>> But the fact remains that nova-compute is doing disk I/O from the main
>> thread, and if the guests push that disk hard enough then nova-compute
>> is going to suffer.
>>
>> Given the above...would it make sense to use eventlet.tpool or similar
>> to perform all disk access in a separate OS thread?  There'd likely be a
>> bit of a performance hit, but at least it would isolate the main thread
>> from IO blocking.
>
> This is probably a good idea, but will require quite a bit of code change. I
> think in the past we've taken the expedient route of just exec'ing problematic
> code in a greenthread using utils.spawn().


I'm not an expert on eventlet, but from what I've seen this isn't sufficient to 
deal with disk access in a robust way.


It's my understanding that utils.spawn() will result in the code running in the 
same OS thread, but in a separate eventlet greenthread.  If that code tries to 
access the disk via a potentially-blocking call the eventlet subsystem will not 
jump to another greenthread.  Because of this it can potentially block the whole 
OS thread (and thus all other greenthreads running in that OS thread).


I think we need to use eventlet.tpool for disk IO (or else fork a whole
separate process).  Basically we need to ensure that the main OS thread
never issues a potentially-blocking syscall.
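
Something like this is what I have in mind (a minimal sketch; the
helper name and the path in the example are just for illustration):

from eventlet import tpool

def read_file(path):
    def _read():
        with open(path, 'rb') as f:
            return f.read()
    # tpool.execute() runs the callable in one of eventlet's native OS
    # worker threads and suspends only the calling greenthread, so the
    # main thread keeps servicing other greenthreads even if the read
    # blocks on a saturated disk.
    return tpool.execute(_read)

# xml = read_file('/etc/libvirt/qemu/instance-00000001.xml')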


Chris



Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Jay Pipes

On 02/22/2016 10:43 AM, Chris Friesen wrote:

> Hi all,
>
> We've recently run into some interesting behaviour that I thought I
> should bring up to see if we want to do anything about it.
>
> Basically the problem seems to be that nova-compute is doing disk I/O
> from the main thread, and if it blocks then it can block all of
> nova-compute (since all eventlets will be blocked).  Examples that we've
> found include glance image download, file renaming, instance directory
> creation, opening the instance xml file, etc.  We've seen nova-compute
> block for upwards of 50 seconds.
>
> Now the specific case where we hit this is not a production
> environment.  It's only got one spinning disk shared by all the guests,
> the guests were hammering on the disk pretty hard, the IO scheduler for
> the instance disk was CFQ which seems to be buggy in our kernel.
>
> But the fact remains that nova-compute is doing disk I/O from the main
> thread, and if the guests push that disk hard enough then nova-compute
> is going to suffer.
>
> Given the above...would it make sense to use eventlet.tpool or similar
> to perform all disk access in a separate OS thread?  There'd likely be a
> bit of a performance hit, but at least it would isolate the main thread
> from IO blocking.


This is probably a good idea, but will require quite a bit of code 
change. I think in the past we've taken the expedient route of just 
exec'ing problematic code in a greenthread using utils.spawn().


Best,
-jay



[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Chris Friesen

Hi all,

We've recently run into some interesting behaviour that I thought I should bring 
up to see if we want to do anything about it.


Basically the problem seems to be that nova-compute is doing disk I/O from the 
main thread, and if it blocks then it can block all of nova-compute (since all 
eventlets will be blocked).  Examples that we've found include glance image 
download, file renaming, instance directory creation, opening the instance xml 
file, etc.  We've seen nova-compute block for upwards of 50 seconds.


Now the specific case where we hit this is not a production environment.  It's 
only got one spinning disk shared by all the guests, the guests were hammering 
on the disk pretty hard, the IO scheduler for the instance disk was CFQ which 
seems to be buggy in our kernel.


But the fact remains that nova-compute is doing disk I/O from the main thread, 
and if the guests push that disk hard enough then nova-compute is going to suffer.


Given the above...would it make sense to use eventlet.tpool or similar to 
perform all disk access in a separate OS thread?  There'd likely be a bit of a 
performance hit, but at least it would isolate the main thread from IO blocking.


Chris
