Re: [openstack-dev] [oslo] rpc concurrency control rfc

2014-03-04 Thread Daniel P. Berrange
On Tue, Mar 04, 2014 at 04:15:03PM +, Duncan Thomas wrote:
> On 28 November 2013 10:14, Daniel P. Berrange  wrote:
> 
> > For this specific block zero'ing case it occurred to me that it might
> > be sufficient to just invoke 'ionice dd' instead of 'dd' and give it
> > a lower I/O priority class than normal.
> 
> Excuse the thread necromancy, I've just been searching for thoughts
> about this very issue. I've merged a patch that uses ionice, and it
> helps, but it is easy to DoS a volume server by creating and deleting
> volumes fast while maintaining a high I/O load... the zeroing never
> runs and so you run out of allocatable space.

Oh well, thanks for experimenting with this idea anyway.

> I'll take a look at writing something with more controls than dd for
> doing the zeroing...

Someone already beat you to it

  commit 71946855591a41dcc87ef59656a8a340774eeaf2
  Author: Pádraig Brady 
  Date:   Tue Feb 11 11:51:39 2014 +

libvirt: support configurable wipe methods for LVM backed instances

Provide configurable methods to clear these volumes.
The new 'volume_clear' and 'volume_clear_size' options
are the same as currently supported by cinder.

* nova/virt/libvirt/imagebackend.py: Define the new options.
* nova/virt/libvirt/utils.py (clear_logical_volume): Support the
new options. Refactor the existing dd method out to
_zero_logic_volume().
* nova/tests/virt/libvirt/test_libvirt_utils.py: Add missing test cases
for the existing clear_logical_volume code, and for the new code
supporting the new clearing methods.
* etc/nova/nova.conf.sample: Add the 2 new config descriptions
to the [libvirt] section.

Change-Id: I5551197f9ec89ae2f9b051696bccdeb1af2c031f
Closes-Bug: #889299

This matches the equivalent config in cinder.
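
For reference, the resulting knobs end up looking something like this in
nova.conf (the values here are purely illustrative; the none/zero/shred
choices mirror cinder's volume_clear):

  [libvirt]
  # How to wipe LVM-backed instance disks on delete: none, zero or shred
  volume_clear = zero
  # How many MiB to wipe at the start of the volume; 0 means the whole volume
  volume_clear_size = 0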


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] rpc concurrency control rfc

2014-03-04 Thread Duncan Thomas
On 28 November 2013 10:14, Daniel P. Berrange  wrote:

> For this specific block zero'ing case it occurred to me that it might
> be sufficient to just invoke 'ionice dd' instead of 'dd' and give it
> a lower I/O priority class than normal.

Excuse the thread necromancy, I've just been searching for thoughts
about this very issue. I've merged a patch that uses ionice, and it
helps, but it is easy to DoS a volume server by creating and deleting
volumes fast while maintaining a high I/O load... the zeroing never
runs and so you run out of allocatable space.
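
For anyone who hasn't looked at it, the 'ionice dd' approach amounts to
roughly the following (just a sketch; the device path and size are invented,
and the real code goes through the usual execute/rootwrap plumbing rather
than calling subprocess directly):

    import subprocess

    # Rough sketch of zeroing an LVM volume at idle I/O priority, which is
    # what 'ionice dd' boils down to.  Device path and size are invented.
    volume = '/dev/mapper/cinder--volumes-volume--to--wipe'
    size_mb = 1024  # a real implementation would query the LV size first

    subprocess.check_call([
        'ionice', '-c', '3',              # class 3 = idle: only use otherwise-idle disk time
        'dd', 'if=/dev/zero', 'of=%s' % volume,
        'bs=1M', 'count=%d' % size_mb,
        'oflag=direct',                   # bypass the page cache so writes go straight to disk
    ])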

I'll take a look at writing something with more controls than dd for
doing the zeroing...

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-28 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 07:34:15PM +, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 06:43:42PM +, Edward Hope-Morley wrote:
> > On 27/11/13 18:20, Daniel P. Berrange wrote:
> > > On Wed, Nov 27, 2013 at 06:10:47PM +, Edward Hope-Morley wrote:
> > >> On 27/11/13 17:43, Daniel P. Berrange wrote:
> > >>> On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
> >  On 27/11/13 15:49, Daniel P. Berrange wrote:
> > > On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
> > >> Moving this to the ml as requested, would appreciate
> > >> comments/thoughts/feedback.
> > >>
> > >> So, I recently proposed a small patch to the oslo rpc code 
> > >> (initially in
> > >> oslo-incubator then moved to oslo.messaging) which extends the 
> > >> existing
> > >> support for limiting the rpc thread pool so that concurrent requests 
> > >> can
> > >> be limited based on type/method. The blueprint and patch are here:
> > >>
> > >> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> > >>
> > >> The basic idea is that if you have a server with limited resources you may
> > >> want to restrict operations that would impact those resources e.g. live
> > >> migrations on a specific hypervisor or volume formatting on a particular
> > >> volume node. This patch allows you, admittedly in a very crude way, to
> > >> apply a fixed limit to a set of rpc methods. I would like to know
> > >> whether or not people think this sort of thing would be useful or
> > >> whether it alludes to a more fundamental issue that should be dealt with
> > >> in a different manner.
> > > Based on this description of the problem I have some observations
> > >
> > >  - I/O load from the guest OS itself is just as important to consider
> > >as I/O load from management operations Nova does for a guest. Both
> > >have the capability to impose denial-of-service on a host. IIUC, 
> > > the
> > >flavour specs have the ability to express resource constraints for
> > >the virtual machines to prevent a guest OS initiated DOS-attack
> > >
> > >  - I/O load from live migration is attributable to the running
> > >virtual machine. As such I'd expect that any resource controls
> > >associated with the guest (from the flavour specs) should be
> > >applied to control the load from live migration.
> > >
> > >Unfortunately life isn't quite this simple with KVM/libvirt
> > >currently. For networking we've associated each virtual TAP
> > >device with traffic shaping filters. For migration you have
> > >to set a bandwidth cap explicitly via the API. For network
> > >based storage backends, you don't directly control network
> > >usage, but instead I/O operations/bytes. Ultimately though
> > >there should be a way to enforce limits on anything KVM does,
> > >similarly I expect other hypervisors can do the same
> > >
> > >  - I/O load from operations that Nova does on behalf of a guest
> > >that may be running, or yet to be launched. These are not
> > >directly known to the hypervisor, so existing resource limits
> > >won't apply. Nova however should have some capability for
> > >applying resource limits to I/O intensive things it does and
> > >somehow associate them with the flavour limits  or some global
> > >per user cap perhaps.
> > >
> > >> Thoughts?
> > > Overall I think that trying to apply caps on the number of API calls
> > > that can be made is not really a credible way to avoid users inflicting
> > > a DOS attack on the host OS. Not least because it does nothing to control
> > > what a guest OS itself may do. If you do caps based on the number of API calls
> > > in a time period, you end up having to do an extremely pessimistic
> > > calculation - basically have to consider the worst case for any single
> > > API call, even if most don't hit the worst case. This is going to hurt
> > > scalability of the system as a whole IMHO.
> > >
> > > Regards,
> > > Daniel
> >  Daniel, thanks for this, these are all valid points and essentially tie
> >  with the fundamental issue of dealing with DOS attacks but for this bp 
> >  I
> >  actually want to stay away from this area i.e. this is not intended to
> >  solve any tenant-based attack issues in the rpc layer (although that
> >  definitely warrants a discussion e.g. how do we stop a single tenant
> >  from consuming the entire thread pool with requests) but rather I'm
> >  thinking more from a QOS perspective i.e. to allow an admin to account
> >  for a resource bias e.g. slow RAID controller, on a given node (not
>

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Edward Hope-Morley
On 27/11/13 19:34, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 06:43:42PM +, Edward Hope-Morley wrote:
>> On 27/11/13 18:20, Daniel P. Berrange wrote:
>>> On Wed, Nov 27, 2013 at 06:10:47PM +, Edward Hope-Morley wrote:
 On 27/11/13 17:43, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
>> On 27/11/13 15:49, Daniel P. Berrange wrote:
>>> On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
 Moving this to the ml as requested, would appreciate
 comments/thoughts/feedback.

 So, I recently proposed a small patch to the oslo rpc code (initially 
 in
 oslo-incubator then moved to oslo.messaging) which extends the existing
 support for limiting the rpc thread pool so that concurrent requests 
 can
 be limited based on type/method. The blueprint and patch are here:

 https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control

 The basic idea is that if you have a server with limited resources you may
 want to restrict operations that would impact those resources e.g. live
 migrations on a specific hypervisor or volume formatting on a particular
 volume node. This patch allows you, admittedly in a very crude way, to
 apply a fixed limit to a set of rpc methods. I would like to know
 whether or not people think this sort of thing would be useful or
 whether it alludes to a more fundamental issue that should be dealt with
 in a different manner.
>>> Based on this description of the problem I have some observations
>>>
>>>  - I/O load from the guest OS itself is just as important to consider
>>>as I/O load from management operations Nova does for a guest. Both
>>>have the capability to impose denial-of-service on a host. IIUC, the
>>>flavour specs have the ability to express resource constraints for
>>>the virtual machines to prevent a guest OS initiated DOS-attack
>>>
>>>  - I/O load from live migration is attributable to the running
>>>virtual machine. As such I'd expect that any resource controls
>>>associated with the guest (from the flavour specs) should be
>>>applied to control the load from live migration.
>>>
>>>Unfortunately life isn't quite this simple with KVM/libvirt
>>>currently. For networking we've associated each virtual TAP
>>>device with traffic shaping filters. For migration you have
>>>to set a bandwidth cap explicitly via the API. For network
>>>based storage backends, you don't directly control network
>>>usage, but instead I/O operations/bytes. Ultimately though
>>>there should be a way to enforce limits on anything KVM does,
>>>similarly I expect other hypervisors can do the same
>>>
>>>  - I/O load from operations that Nova does on behalf of a guest
>>>that may be running, or yet to be launched. These are not
>>>directly known to the hypervisor, so existing resource limits
>>>won't apply. Nova however should have some capability for
>>>applying resource limits to I/O intensive things it does and
>>>somehow associate them with the flavour limits  or some global
>>>per user cap perhaps.
>>>
 Thoughts?
>>> Overall I think that trying to apply caps on the number of API calls
>>> that can be made is not really a credible way to avoid users inflicting
>>> a DOS attack on the host OS. Not least because it does nothing to control
>>> what a guest OS itself may do. If you do caps based on the number of API calls
>>> in a time period, you end up having to do an extremely pessimistic
>>> calculation - basically have to consider the worst case for any single
>>> API call, even if most don't hit the worst case. This is going to hurt
>>> scalability of the system as a whole IMHO.
>>>
>>> Regards,
>>> Daniel
>> Daniel, thanks for this, these are all valid points and essentially tie
>> with the fundamental issue of dealing with DOS attacks but for this bp I
>> actually want to stay away from this area i.e. this is not intended to
>> solve any tenant-based attack issues in the rpc layer (although that
>> definitely warrants a discussion e.g. how do we stop a single tenant
>> from consuming the entire thread pool with requests) but rather I'm
>> thinking more from a QOS perspective i.e. to allow an admin to account
>> for a resource bias e.g. slow RAID controller, on a given node (not
>> necessarily Nova/HV) which could be alleviated with this sort of crude
>> rate limiting. Of course one problem with this approach is that
>> blocked/limited requests still reside in the same pool as other requests
>> so if we did want to use this it 

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 06:43:42PM +, Edward Hope-Morley wrote:
> On 27/11/13 18:20, Daniel P. Berrange wrote:
> > On Wed, Nov 27, 2013 at 06:10:47PM +, Edward Hope-Morley wrote:
> >> On 27/11/13 17:43, Daniel P. Berrange wrote:
> >>> On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
>  On 27/11/13 15:49, Daniel P. Berrange wrote:
> > On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
> >> Moving this to the ml as requested, would appreciate
> >> comments/thoughts/feedback.
> >>
> >> So, I recently proposed a small patch to the oslo rpc code (initially 
> >> in
> >> oslo-incubator then moved to oslo.messaging) which extends the existing
> >> support for limiting the rpc thread pool so that concurrent requests 
> >> can
> >> be limited based on type/method. The blueprint and patch are here:
> >>
> >> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> >>
> >> The basic idea is that if you have a server with limited resources you may
> >> want to restrict operations that would impact those resources e.g. live
> >> migrations on a specific hypervisor or volume formatting on a particular
> >> volume node. This patch allows you, admittedly in a very crude way, to
> >> apply a fixed limit to a set of rpc methods. I would like to know
> >> whether or not people think this sort of thing would be useful or
> >> whether it alludes to a more fundamental issue that should be dealt with
> >> in a different manner.
> > Based on this description of the problem I have some observations
> >
> >  - I/O load from the guest OS itself is just as important to consider
> >as I/O load from management operations Nova does for a guest. Both
> >have the capability to impose denial-of-service on a host. IIUC, the
> >flavour specs have the ability to express resource constraints for
> >the virtual machines to prevent a guest OS initiated DOS-attack
> >
> >  - I/O load from live migration is attributable to the running
> >virtual machine. As such I'd expect that any resource controls
> >associated with the guest (from the flavour specs) should be
> >applied to control the load from live migration.
> >
> >Unfortunately life isn't quite this simple with KVM/libvirt
> >currently. For networking we've associated each virtual TAP
> >device with traffic shaping filters. For migration you have
> >to set a bandwidth cap explicitly via the API. For network
> >based storage backends, you don't directly control network
> >usage, but instead I/O operations/bytes. Ultimately though
> >there should be a way to enforce limits on anything KVM does,
> >similarly I expect other hypervisors can do the same
> >
> >  - I/O load from operations that Nova does on behalf of a guest
> >that may be running, or yet to be launched. These are not
> >directly known to the hypervisor, so existing resource limits
> >won't apply. Nova however should have some capability for
> >applying resource limits to I/O intensive things it does and
> >somehow associate them with the flavour limits  or some global
> >per user cap perhaps.
> >
> >> Thoughts?
> > Overall I think that trying to apply caps on the number of API calls
> > that can be made is not really a credible way to avoid users inflicting
> > a DOS attack on the host OS. Not least because it does nothing to control
> > what a guest OS itself may do. If you do caps based on the number of API calls
> > in a time period, you end up having to do an extremely pessimistic
> > calculation - basically have to consider the worst case for any single
> > API call, even if most don't hit the worst case. This is going to hurt
> > scalability of the system as a whole IMHO.
> >
> > Regards,
> > Daniel
>  Daniel, thanks for this, these are all valid points and essentially tie
>  with the fundamental issue of dealing with DOS attacks but for this bp I
>  actually want to stay away from this area i.e. this is not intended to
>  solve any tenant-based attack issues in the rpc layer (although that
>  definitely warrants a discussion e.g. how do we stop a single tenant
>  from consuming the entire thread pool with requests) but rather I'm
>  thinking more from a QOS perspective i.e. to allow an admin to account
>  for a resource bias e.g. slow RAID controller, on a given node (not
>  necessarily Nova/HV) which could be alleviated with this sort of crude
>  rate limiting. Of course one problem with this approach is that
>  blocked/limited requests still reside in the same pool as other requests
>  so if we did want to use this it may be worth considering offloading
>  block

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Edward Hope-Morley
On 27/11/13 18:20, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 06:10:47PM +, Edward Hope-Morley wrote:
>> On 27/11/13 17:43, Daniel P. Berrange wrote:
>>> On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
 On 27/11/13 15:49, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
>> Moving this to the ml as requested, would appreciate
>> comments/thoughts/feedback.
>>
>> So, I recently proposed a small patch to the oslo rpc code (initially in
>> oslo-incubator then moved to oslo.messaging) which extends the existing
>> support for limiting the rpc thread pool so that concurrent requests can
>> be limited based on type/method. The blueprint and patch are here:
>>
>> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
>>
>> The basic idea is that if you have a server with limited resources you may
>> want to restrict operations that would impact those resources e.g. live
>> migrations on a specific hypervisor or volume formatting on a particular
>> volume node. This patch allows you, admittedly in a very crude way, to
>> apply a fixed limit to a set of rpc methods. I would like to know
>> whether or not people think this sort of thing would be useful or
>> whether it alludes to a more fundamental issue that should be dealt with
>> in a different manner.
> Based on this description of the problem I have some observations
>
>  - I/O load from the guest OS itself is just as important to consider
>as I/O load from management operations Nova does for a guest. Both
>have the capability to impose denial-of-service on a host. IIUC, the
>flavour specs have the ability to express resource constraints for
>the virtual machines to prevent a guest OS initiated DOS-attack
>
>  - I/O load from live migration is attributable to the running
>virtual machine. As such I'd expect that any resource controls
>associated with the guest (from the flavour specs) should be
>applied to control the load from live migration.
>
>Unfortunately life isn't quite this simple with KVM/libvirt
>currently. For networking we've associated each virtual TAP
>device with traffic shaping filters. For migration you have
>to set a bandwidth cap explicitly via the API. For network
>based storage backends, you don't directly control network
>usage, but instead I/O operations/bytes. Ultimately though
>there should be a way to enforce limits on anything KVM does,
>similarly I expect other hypervisors can do the same
>
>  - I/O load from operations that Nova does on behalf of a guest
>that may be running, or yet to be launched. These are not
>directly known to the hypervisor, so existing resource limits
>won't apply. Nova however should have some capability for
>applying resource limits to I/O intensive things it does and
>somehow associate them with the flavour limits  or some global
>per user cap perhaps.
>
>> Thoughts?
> Overall I think that trying to apply caps on the number of API calls
> that can be made is not really a credible way to avoid users inflicting
> a DOS attack on the host OS. Not least because it does nothing to control
> what a guest OS itself may do. If you do caps based on the number of API calls
> in a time period, you end up having to do an extremely pessimistic
> calculation - basically have to consider the worst case for any single
> API call, even if most don't hit the worst case. This is going to hurt
> scalability of the system as a whole IMHO.
>
> Regards,
> Daniel
 Daniel, thanks for this, these are all valid points and essentially tie
 with the fundamental issue of dealing with DOS attacks but for this bp I
 actually want to stay away from this area i.e. this is not intended to
 solve any tenant-based attack issues in the rpc layer (although that
 definitely warrants a discussion e.g. how do we stop a single tenant
 from consuming the entire thread pool with requests) but rather I'm
 thinking more from a QOS perspective i.e. to allow an admin to account
 for a resource bias e.g. slow RAID controller, on a given node (not
 necessarily Nova/HV) which could be alleviated with this sort of crude
 rate limiting. Of course one problem with this approach is that
 blocked/limited requests still reside in the same pool as other requests
 so if we did want to use this it may be worth considering offloading
 blocked requests or giving them their own pool altogether.

 ...or maybe this is just pie in the sky after all.
>>> I don't think it is valid to ignore tenant-based attacks in this. You
>>> have a single resource here and it can be consumed by the tenant
>>> OS, 

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 06:10:47PM +, Edward Hope-Morley wrote:
> On 27/11/13 17:43, Daniel P. Berrange wrote:
> > On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
> >> On 27/11/13 15:49, Daniel P. Berrange wrote:
> >>> On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
>  Moving this to the ml as requested, would appreciate
>  comments/thoughts/feedback.
> 
>  So, I recently proposed a small patch to the oslo rpc code (initially in
>  oslo-incubator then moved to oslo.messaging) which extends the existing
>  support for limiting the rpc thread pool so that concurrent requests can
>  be limited based on type/method. The blueprint and patch are here:
> 
>  https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> 
>  The basic idea is that if you have a server with limited resources you may
>  want to restrict operations that would impact those resources e.g. live
>  migrations on a specific hypervisor or volume formatting on a particular
>  volume node. This patch allows you, admittedly in a very crude way, to
>  apply a fixed limit to a set of rpc methods. I would like to know
>  whether or not people think this sort of thing would be useful or
>  whether it alludes to a more fundamental issue that should be dealt with
>  in a different manner.
> >>> Based on this description of the problem I have some observations
> >>>
> >>>  - I/O load from the guest OS itself is just as important to consider
> >>>as I/O load from management operations Nova does for a guest. Both
> >>>have the capability to impose denial-of-service on a host. IIUC, the
> >>>flavour specs have the ability to express resource constraints for
> >>>the virtual machines to prevent a guest OS initiated DOS-attack
> >>>
> >>>  - I/O load from live migration is attributable to the running
> >>>virtual machine. As such I'd expect that any resource controls
> >>>associated with the guest (from the flavour specs) should be
> >>>applied to control the load from live migration.
> >>>
> >>>Unfortunately life isn't quite this simple with KVM/libvirt
> >>>currently. For networking we've associated each virtual TAP
> >>>device with traffic shaping filters. For migration you have
> >>>to set a bandwidth cap explicitly via the API. For network
> >>>based storage backends, you don't directly control network
> >>>usage, but instead I/O operations/bytes. Ultimately though
> >>>there should be a way to enforce limits on anything KVM does,
> >>>similarly I expect other hypervisors can do the same
> >>>
> >>>  - I/O load from operations that Nova does on behalf of a guest
> >>>that may be running, or yet to be launched. These are not
> >>>directly known to the hypervisor, so existing resource limits
> >>>won't apply. Nova however should have some capability for
> >>>applying resource limits to I/O intensive things it does and
> >>>somehow associate them with the flavour limits  or some global
> >>>per user cap perhaps.
> >>>
>  Thoughts?
> >>> Overall I think that trying to apply caps on the number of API calls
> >>> that can be made is not really a credible way to avoid users inflicting
> >>> a DOS attack on the host OS. Not least because it does nothing to control
> >>> what a guest OS itself may do. If you do caps based on the number of API calls
> >>> in a time period, you end up having to do an extremely pessimistic
> >>> calculation - basically have to consider the worst case for any single
> >>> API call, even if most don't hit the worst case. This is going to hurt
> >>> scalability of the system as a whole IMHO.
> >>>
> >>> Regards,
> >>> Daniel
> >> Daniel, thanks for this, these are all valid points and essentially tie
> >> with the fundamental issue of dealing with DOS attacks but for this bp I
> >> actually want to stay away from this area i.e. this is not intended to
> >> solve any tenant-based attack issues in the rpc layer (although that
> >> definitely warrants a discussion e.g. how do we stop a single tenant
> >> from consuming the entire thread pool with requests) but rather I'm
> >> thinking more from a QOS perspective i.e. to allow an admin to account
> >> for a resource bias e.g. slow RAID controller, on a given node (not
> >> necessarily Nova/HV) which could be alleviated with this sort of crude
> >> rate limiting. Of course one problem with this approach is that
> >> blocked/limited requests still reside in the same pool as other requests
> >> so if we did want to use this it may be worth considering offloading
> >> blocked requests or giving them their own pool altogether.
> >>
> >> ...or maybe this is just pie in the sky after all.
> > I don't think it is valid to ignore tenant-based attacks in this. You
> > have a single resource here and it can be consumed by the tenant
> > OS, by the VM associated with the tenant or by Nova 

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Edward Hope-Morley
On 27/11/13 17:43, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
>> On 27/11/13 15:49, Daniel P. Berrange wrote:
>>> On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
 Moving this to the ml as requested, would appreciate
 comments/thoughts/feedback.

 So, I recently proposed a small patch to the oslo rpc code (initially in
 oslo-incubator then moved to oslo.messaging) which extends the existing
 support for limiting the rpc thread pool so that concurrent requests can
 be limited based on type/method. The blueprint and patch are here:

 https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control

 The basic idea is that if you have a server with limited resources you may
 want to restrict operations that would impact those resources e.g. live
 migrations on a specific hypervisor or volume formatting on a particular
 volume node. This patch allows you, admittedly in a very crude way, to
 apply a fixed limit to a set of rpc methods. I would like to know
 whether or not people think this sort of thing would be useful or
 whether it alludes to a more fundamental issue that should be dealt with
 in a different manner.
>>> Based on this description of the problem I have some observations
>>>
>>>  - I/O load from the guest OS itself is just as important to consider
>>>as I/O load from management operations Nova does for a guest. Both
>>>have the capability to impose denial-of-service on a host. IIUC, the
>>>flavour specs have the ability to express resource constraints for
>>>the virtual machines to prevent a guest OS initiated DOS-attack
>>>
>>>  - I/O load from live migration is attributable to the running
>>>virtual machine. As such I'd expect that any resource controls
>>>associated with the guest (from the flavour specs) should be
>>>applied to control the load from live migration.
>>>
>>>Unfortunately life isn't quite this simple with KVM/libvirt
>>>currently. For networking we've associated each virtual TAP
>>>device with traffic shaping filters. For migration you have
>>>to set a bandwidth cap explicitly via the API. For network
>>>based storage backends, you don't directly control network
>>>usage, but instead I/O operations/bytes. Ultimately though
>>>there should be a way to enforce limits on anything KVM does,
>>>similarly I expect other hypervisors can do the same
>>>
>>>  - I/O load from operations that Nova does on behalf of a guest
>>>that may be running, or yet to be launched. These are not
>>>directly known to the hypervisor, so existing resource limits
>>>won't apply. Nova however should have some capability for
>>>applying resource limits to I/O intensive things it does and
>>>somehow associate them with the flavour limits  or some global
>>>per user cap perhaps.
>>>
 Thoughts?
>>> Overall I think that trying to apply caps on the number of API calls
>>> that can be made is not really a credible way to avoid users inflicting
>>> a DOS attack on the host OS. Not least because it does nothing to control
>>> what a guest OS itself may do. If you do caps based on the number of API calls
>>> in a time period, you end up having to do an extremely pessimistic
>>> calculation - basically have to consider the worst case for any single
>>> API call, even if most don't hit the worst case. This is going to hurt
>>> scalability of the system as a whole IMHO.
>>>
>>> Regards,
>>> Daniel
>> Daniel, thanks for this, these are all valid points and essentially tie
>> with the fundamental issue of dealing with DOS attacks but for this bp I
>> actually want to stay away from this area i.e. this is not intended to
>> solve any tenant-based attack issues in the rpc layer (although that
>> definitely warrants a discussion e.g. how do we stop a single tenant
>> from consuming the entire thread pool with requests) but rather I'm
>> thinking more from a QOS perspective i.e. to allow an admin to account
>> for a resource bias e.g. slow RAID controller, on a given node (not
>> necessarily Nova/HV) which could be alleviated with this sort of crude
>> rate limiting. Of course one problem with this approach is that
>> blocked/limited requests still reside in the same pool as other requests
>> so if we did want to use this it may be worth considering offloading
>> blocked requests or giving them their own pool altogether.
>>
>> ...or maybe this is just pie in the sky after all.
> I don't think it is valid to ignore tenant-based attacks in this. You
> have a single resource here and it can be consumed by the tenant
> OS, by the VM associated with the tenant or by Nova itself. As such,
> IMHO adding rate limiting to Nova APIs alone is a non-solution because
> you've still left it wide open to starvation by any number of other
> routes which are arguably even more critical to address than the API
>

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 05:39:30PM +, Edward Hope-Morley wrote:
> On 27/11/13 15:49, Daniel P. Berrange wrote:
> > On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
> >> Moving this to the ml as requested, would appreciate
> >> comments/thoughts/feedback.
> >>
> >> So, I recently proposed a small patch to the oslo rpc code (initially in
> >> oslo-incubator then moved to oslo.messaging) which extends the existing
> >> support for limiting the rpc thread pool so that concurrent requests can
> >> be limited based on type/method. The blueprint and patch are here:
> >>
> >> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> >>
> >> The basic idea is that if you have a server with limited resources you may
> >> want to restrict operations that would impact those resources e.g. live
> >> migrations on a specific hypervisor or volume formatting on a particular
> >> volume node. This patch allows you, admittedly in a very crude way, to
> >> apply a fixed limit to a set of rpc methods. I would like to know
> >> whether or not people think this sort of thing would be useful or
> >> whether it alludes to a more fundamental issue that should be dealt with
> >> in a different manner.
> > Based on this description of the problem I have some observations
> >
> >  - I/O load from the guest OS itself is just as important to consider
> >as I/O load from management operations Nova does for a guest. Both
> >have the capability to impose denial-of-service on a host. IIUC, the
> >flavour specs have the ability to express resource constraints for
> >the virtual machines to prevent a guest OS initiated DOS-attack
> >
> >  - I/O load from live migration is attributable to the running
> >virtual machine. As such I'd expect that any resource controls
> >associated with the guest (from the flavour specs) should be
> >applied to control the load from live migration.
> >
> >Unfortunately life isn't quite this simple with KVM/libvirt
> >currently. For networking we've associated each virtual TAP
> >device with traffic shaping filters. For migration you have
> >to set a bandwidth cap explicitly via the API. For network
> >based storage backends, you don't directly control network
> >usage, but instead I/O operations/bytes. Ultimately though
> >there should be a way to enforce limits on anything KVM does,
> >similarly I expect other hypervisors can do the same
> >
> >  - I/O load from operations that Nova does on behalf of a guest
> >that may be running, or yet to be launched. These are not
> >directly known to the hypervisor, so existing resource limits
> >won't apply. Nova however should have some capability for
> >applying resource limits to I/O intensive things it does and
> >somehow associate them with the flavour limits  or some global
> >per user cap perhaps.
> >
> >> Thoughts?
> > Overall I think that trying to apply caps on the number of API calls
> > that can be made is not really a credible way to avoid users inflicting
> > a DOS attack on the host OS. Not least because it does nothing to control
> > what a guest OS itself may do. If you do caps based on the number of API calls
> > in a time period, you end up having to do an extremely pessimistic
> > calculation - basically have to consider the worst case for any single
> > API call, even if most don't hit the worst case. This is going to hurt
> > scalability of the system as a whole IMHO.
> >
> > Regards,
> > Daniel
> Daniel, thanks for this, these are all valid points and essentially tie
> with the fundamental issue of dealing with DOS attacks but for this bp I
> actually want to stay away from this area i.e. this is not intended to
> solve any tenant-based attack issues in the rpc layer (although that
> definitely warrants a discussion e.g. how do we stop a single tenant
> from consuming the entire thread pool with requests) but rather I'm
> thinking more from a QOS perspective i.e. to allow an admin to account
> for a resource bias e.g. slow RAID controller, on a given node (not
> necessarily Nova/HV) which could be alleviated with this sort of crude
> rate limiting. Of course one problem with this approach is that
> blocked/limited requests still reside in the same pool as other requests
> so if we did want to use this it may be worth considering offloading
> blocked requests or giving them their own pool altogether.
> 
> ...or maybe this is just pie in the sky after all.

I don't think it is valid to ignore tenant-based attacks in this. You
have a single resource here and it can be consumed by the tenant
OS, by the VM associated with the tenant or by Nova itself. As such,
IMHO adding rate limiting to Nova APIs alone is a non-solution because
you've still left it wide open to starvation by any number of other
routes which are arguably even more critical to address than the API
calls.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.co

Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Edward Hope-Morley
On 27/11/13 15:49, Daniel P. Berrange wrote:
> On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
>> Moving this to the ml as requested, would appreciate
>> comments/thoughts/feedback.
>>
>> So, I recently proposed a small patch to the oslo rpc code (initially in
>> oslo-incubator then moved to oslo.messaging) which extends the existing
>> support for limiting the rpc thread pool so that concurrent requests can
>> be limited based on type/method. The blueprint and patch are here:
>>
>> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
>>
>> The basic idea is that if you have a server with limited resources you may
>> want to restrict operations that would impact those resources e.g. live
>> migrations on a specific hypervisor or volume formatting on a particular
>> volume node. This patch allows you, admittedly in a very crude way, to
>> apply a fixed limit to a set of rpc methods. I would like to know
>> whether or not people think this sort of thing would be useful or
>> whether it alludes to a more fundamental issue that should be dealt with
>> in a different manner.
> Based on this description of the problem I have some observations
>
>  - I/O load from the guest OS itself is just as important to consider
>as I/O load from management operations Nova does for a guest. Both
>have the capability to impose denial-of-service on a host. IIUC, the
>flavour specs have the ability to express resource constraints for
>the virtual machines to prevent a guest OS initiated DOS-attack
>
>  - I/O load from live migration is attributable to the running
>virtual machine. As such I'd expect that any resource controls
>associated with the guest (from the flavour specs) should be
>applied to control the load from live migration.
>
>Unfortunately life isn't quite this simple with KVM/libvirt
>currently. For networking we've associated each virtual TAP
>device with traffic shaping filters. For migration you have
>to set a bandwidth cap explicitly via the API. For network
>based storage backends, you don't directly control network
>usage, but instead I/O operations/bytes. Ultimately though
>there should be a way to enforce limits on anything KVM does,
>similarly I expect other hypervisors can do the same
>
>  - I/O load from operations that Nova does on behalf of a guest
>that may be running, or yet to be launched. These are not
>directly known to the hypervisor, so existing resource limits
>won't apply. Nova however should have some capability for
>applying resource limits to I/O intensive things it does and
>somehow associate them with the flavour limits  or some global
>per user cap perhaps.
>
>> Thoughts?
> Overall I think that trying to apply caps on the number of API calls
> that can be made is not really a credible way to avoid users inflicting
> a DOS attack on the host OS. Not least because it does nothing to control
> what a guest OS itself may do. If you do caps based on the number of API calls
> in a time period, you end up having to do an extremely pessimistic
> calculation - basically have to consider the worst case for any single
> API call, even if most don't hit the worst case. This is going to hurt
> scalability of the system as a whole IMHO.
>
> Regards,
> Daniel
Daniel, thanks for this; these are all valid points and essentially tie in
with the fundamental issue of dealing with DOS attacks, but for this bp I
actually want to stay away from that area, i.e. this is not intended to
solve any tenant-based attack issues in the rpc layer (although that
definitely warrants a discussion, e.g. how do we stop a single tenant
from consuming the entire thread pool with requests?). Rather, I'm
thinking more from a QOS perspective, i.e. to allow an admin to account
for a resource bias, e.g. a slow RAID controller, on a given node (not
necessarily Nova/HV), which could be alleviated with this sort of crude
rate limiting. Of course, one problem with this approach is that
blocked/limited requests still reside in the same pool as other requests,
so if we did want to use this it may be worth considering offloading
blocked requests or giving them their own pool altogether.

...or maybe this is just pie in the sky after all.

Ed.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 02:45:22PM +, Edward Hope-Morley wrote:
> Moving this to the ml as requested, would appreciate
> comments/thoughts/feedback.
> 
> So, I recently proposed a small patch to the oslo rpc code (initially in
> oslo-incubator then moved to oslo.messaging) which extends the existing
> support for limiting the rpc thread pool so that concurrent requests can
> be limited based on type/method. The blueprint and patch are here:
> 
> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> 
> The basic idea is that if you have a server with limited resources you may
> want to restrict operations that would impact those resources e.g. live
> migrations on a specific hypervisor or volume formatting on a particular
> volume node. This patch allows you, admittedly in a very crude way, to
> apply a fixed limit to a set of rpc methods. I would like to know
> whether or not people think this sort of thing would be useful or
> whether it alludes to a more fundamental issue that should be dealt with
> in a different manner.

Based on this description of the problem I have some observations

 - I/O load from the guest OS itself is just as important to consider
   as I/O load from management operations Nova does for a guest. Both
   have the capability to impose denial-of-service on a host. IIUC, the
   flavour specs have the ability to express resource constraints for
   the virtual machines to prevent a guest OS initiated DOS-attack

 - I/O load from live migration is attributable to the running
   virtual machine. As such I'd expect that any resource controls
   associated with the guest (from the flavour specs) should be
   applied to control the load from live migration.

   Unfortunately life isn't quite this simple with KVM/libvirt
   currently. For networking we've associated each virtual TAP
   device with traffic shaping filters. For migration you have
   to set a bandwidth cap explicitly via the API. For network
   based storage backends, you don't directly control network
   usage, but instead I/O operations/bytes. Ultimately though
   there should be a way to enforce limits on anything KVM does,
   similarly I expect other hypervisors can do the same

 - I/O load from operations that Nova does on behalf of a guest
   that may be running, or yet to be launched. These are not
   directly known to the hypervisor, so existing resource limits
   won't apply. Nova however should have some capability for
   applying resource limits to I/O intensive things it does and
   somehow associate them with the flavour limits  or some global
   per user cap perhaps.
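
To make the second point a bit more concrete, the per-guest controls
mentioned above are exposed through libvirt, so something along these lines
is already possible (a rough sketch using the libvirt-python bindings; the
domain, device and numbers are purely illustrative):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # illustrative domain name

    # Cap live migration bandwidth for this guest, in MiB/s
    dom.migrateSetMaxSpeed(100, 0)

    # Throttle block I/O on one of the guest's disks (bytes/sec limits)
    dom.setBlockIoTune('vda', {'read_bytes_sec': 50 * 1024 * 1024,
                               'write_bytes_sec': 50 * 1024 * 1024}, 0)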

> Thoughts?

Overall I think that trying to apply caps on the number of API calls
that can be made is not really a credible way to avoid users inflicting
a DOS attack on the host OS. Not least because it does nothing to control
what a guest OS itself may do. If you do caps based on the number of API calls
in a time period, you end up having to do an extremely pessimistic
calculation - basically have to consider the worst case for any single
API call, even if most don't hit the worst case. This is going to hurt
scalability of the system as a whole IMHO.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Mark McLoughlin
Hi,

On Wed, 2013-11-27 at 14:45 +, Edward Hope-Morley wrote:
> Moving this to the ml as requested, would appreciate
> comments/thoughts/feedback.

Thanks, I too would appreciate input from others.

> So, I recently proposed a small patch to the oslo rpc code (initially in
> oslo-incubator then moved to oslo.messaging) which extends the existing
> support for limiting the rpc thread pool so that concurrent requests can
> be limited based on type/method. The blueprint and patch are here:
> 
> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> 
> The basic idea is that if you have a server with limited resources you may
> want to restrict operations that would impact those resources e.g. live
> migrations on a specific hypervisor or volume formatting on a particular
> volume node. This patch allows you, admittedly in a very crude way, to
> apply a fixed limit to a set of rpc methods. I would like to know
> whether or not people think this sort of thing would be useful or
> whether it alludes to a more fundamental issue that should be dealt with
> in a different manner.

Just to be clear for everyone about what we're talking about: your patch
means that if an operator sees that requests to the 'foo' and 'bar' RPC
methods for a given service are overwhelming the capacity of the
machine, they can throttle them by adding e.g.

  concurrency_control_enabled = true
  concurrency_control_actions = foo,bar
  concurrency_control_limit = 2

to the service's configuration file.

If you accept the premise of what's required here, I think you really
want to have e.g. a json policy file which can control the concurrency
limit on each method individually:

{
"compute": {
"baseapi": {
"ping": 10
},
"": {
"foo": 1,
"bar": 2
}
}
}

but that starts feeling pretty ridiculous.

My concern is that we're avoiding addressing a more fundamental issue
here. From IRC:

  "avoid specific concurrent operations from consuming too many
  system resources and starving other less resource intensive
  actions"
  I'd like us to think about whether we can come up with a
  solution that fixes the problem for people, without them
  having to mess with this type of configuration
  but yeah ... if we can't figure out a way of doing that, there
  is an argument for giving operators and interim workaround
  I wouldn't be in favour of an interim fix without first
  exploring the options for a more fundamental fix
  this isn't easily removable later, because once people start
  to rely on it we would need to put it through a deprecation
  period to remove it
  also, an interim solution like this takes away the pressure on
  us to find a more fundamental solution ... and we may wind up
  never doing that


So, I guess my first question is ... what specific RPC methods have you
seen issues with and feel you need to throttle?

Thanks,
Mark.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo] rpc concurrency control rfc

2013-11-27 Thread Edward Hope-Morley
Moving this to the ml as requested, would appreciate
comments/thoughts/feedback.

So, I recently proposed a small patch to the oslo rpc code (initially in
oslo-incubator then moved to oslo.messaging) which extends the existing
support for limiting the rpc thread pool so that concurrent requests can
be limited based on type/method. The blueprint and patch are here:

https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control

The basic idea is that if you have a server with limited resources you may
want to restrict operations that would impact those resources e.g. live
migrations on a specific hypervisor or volume formatting on a particular
volume node. This patch allows you, admittedly in a very crude way, to
apply a fixed limit to a set of rpc methods. I would like to know
whether or not people think this sort of thing would be useful or
whether it alludes to a more fundamental issue that should be dealt with
in a different manner.
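
To illustrate the sort of thing I mean, the effect of the patch is roughly
the following (just a sketch, not the actual change, which hooks into the
rpc dispatcher and its thread pool; the method names and limits here are
invented):

    import threading
    from functools import wraps

    # Invented per-method limits an operator might configure.
    _limits = {'live_migration': 1, 'clear_volume': 2}
    _semaphores = dict((name, threading.BoundedSemaphore(limit))
                       for name, limit in _limits.items())

    def concurrency_limited(name):
        """Cap how many calls to the named rpc method may run at once."""
        sem = _semaphores.get(name)

        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                if sem is None:                  # method is not throttled
                    return func(*args, **kwargs)
                with sem:                        # blocks once the limit is reached
                    return func(*args, **kwargs)
            return wrapper
        return decorator

    @concurrency_limited('clear_volume')
    def clear_volume(volume_path):
        pass  # the expensive zeroing work would happen here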

Thoughts?

Ed.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev