Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas duncan.tho...@gmail.com
wrote:

 For LVM-thin I believe it is already disabled? It is only really
 needed on LVM-thick, where the returning zeros behaviour is not done.

 On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com wrote:
  I would say that wipe-on-delete is not necessary in most deployments.
 
  Most storage backends exhibit the following behavior:
  1. Delete volume A that has data on physical sectors 1-10
  2. Create new volume B
  3. Read from volume B before writing, which happens to map to physical
  sector 5 - backend should return zeroes here, and not data from volume A
 
  In case the backend doesn't provide this rather standard behavior, data
 must
  be wiped immediately.  Otherwise, the only risk is physical security,
 and if
  that's not adequate, customers shouldn't be storing all their data there
  regardless.  You could also run a periodic job to wipe deleted volumes to
  reduce the window of vulnerability, without making delete_volume take a
  ridiculously long time.
 
  Encryption is a good option as well, and of course it protects the data
  before deletion as well (as long as your keys are protected...)
 
  Bottom line - I too think the default in devstack should be to disable
 this
  option, and think we should consider making the default False in Cinder
  itself.  This isn't the first time someone has asked why volume deletion
  takes 20 minutes...
 
  As for queuing backup operations and managing bandwidth for various
  operations, ideally this would be done with a holistic view, so that for
  example Cinder operations won't interfere with Nova, or different Nova
  operations won't interfere with each other, but that is probably far down
  the road.
 
  Thanks,
  Avishay
 
 
  On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
 chris.frie...@windriver.com
  wrote:
 
  On 10/19/2014 09:33 AM, Avishay Traeger wrote:
 
  Hi Preston,
  Replies to some of your cinder-related questions:
  1. Creating a snapshot isn't usually an I/O intensive operation.  Are
  you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
  CPU usage of cinder-api spike sometimes - not sure why.
  2. The 'dd' processes that you see are Cinder wiping the volumes during
  deletion.  You can either disable this in cinder.conf, or you can use a
  relatively new option to manage the bandwidth used for this.
 
  IMHO, deployments should be optimized to not do very long/intensive
  management operations - for example, use backends with efficient
  snapshots, use CoW operations wherever possible rather than copying
 full
  volumes/images, disabling wipe on delete, etc.
 
 
  In a public-cloud environment I don't think it's reasonable to disable
  wipe-on-delete.
 
  Arguably it would be better to use encryption instead of wipe-on-delete.
  When done with the backing store, just throw away the key and it'll be
  secure enough for most purposes.
 
  Chris
 
 
 
 



 --
 Duncan Thomas



We disable this in the gate: CINDER_SECURE_DELETE=False

ThinLVM (which hopefully will be the default upon release of Kilo) doesn't
need it, because internally it returns zeros when reading unallocated
blocks, so it's a non-issue.

The debate over whether or not to wipe LVs is a long-running issue.  The
default behavior in Cinder is to leave it enabled, and IMHO that's how it
should stay.  Anything that might be construed as less secure, and that has
been defaulted to the more secure setting, should be left as it is.
It's simple to turn this off.

Also, nobody seems to have mentioned that for Cinder operations like
copy-volume and delete you also have the ability to set bandwidth limits on
these operations, and in the case of delete you can even specify different
schemes (not just enabled/disabled, but other options that may be more or
less I/O intensive).
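
To make that concrete, here is a rough cinder.conf sketch of the knobs in
question (option names as I understand them from the LVM driver's config
options around this release; the values are illustrative, not
recommendations):

    [DEFAULT]
    # Wipe scheme used on delete: zero (the default), shred, or none.
    volume_clear = zero
    # Throttle the dd used for volume copy/clear, in bytes per second
    # (0 means unthrottled).
    volume_copy_bps_limit = 104857600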

For further reference, check out the config options [1].

Thanks,
John

[1]:
https://github.com/openstack/cinder/blob/master/cinder/volume/driver.py#L69
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
John,

As a (new) OpenStack developer, I just discovered the
CINDER_SECURE_DELETE option.

As an *implicit* default, I entirely approve.  Production OpenStack
installations should *absolutely* ensure there is no information leakage
from one instance to the next.

As an *explicit* default, I am not so sure. Low-end storage requires you to
do this explicitly. High-end storage can ensure information never leaks.
Counting on the underlying storage to do this can make the upper levels more
efficient, which can be a good thing.

The debate about whether to wipe LVs depends massively on the intelligence
of the underlying store. If the lower-level storage never returns accidental
information ... explicit zeroes are not needed.
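
(For anyone else who is new to this knob: in a devstack environment it is a
one-line setting, assuming the usual localrc / local.conf mechanism:

    CINDER_SECURE_DELETE=False

which, as I understand it, translates to volume_clear = none in the
generated cinder.conf.)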



On Wed, Oct 22, 2014 at 11:15 PM, John Griffith john.griffi...@gmail.com
wrote:



 On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas duncan.tho...@gmail.com
 wrote:

 For LVM-thin I believe it is already disabled? It is only really
 needed on LVM-thick, where the returning zeros behaviour is not done.

 On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com
 wrote:
  I would say that wipe-on-delete is not necessary in most deployments.
 
  Most storage backends exhibit the following behavior:
  1. Delete volume A that has data on physical sectors 1-10
  2. Create new volume B
  3. Read from volume B before writing, which happens to map to physical
  sector 5 - backend should return zeroes here, and not data from volume A
 
  In case the backend doesn't provide this rather standard behavior, data
 must
  be wiped immediately.  Otherwise, the only risk is physical security,
 and if
  that's not adequate, customers shouldn't be storing all their data there
  regardless.  You could also run a periodic job to wipe deleted volumes
 to
  reduce the window of vulnerability, without making delete_volume take a
  ridiculously long time.
 
  Encryption is a good option as well, and of course it protects the data
  before deletion as well (as long as your keys are protected...)
 
  Bottom line - I too think the default in devstack should be to disable
 this
  option, and think we should consider making the default False in Cinder
  itself.  This isn't the first time someone has asked why volume deletion
  takes 20 minutes...
 
  As for queuing backup operations and managing bandwidth for various
  operations, ideally this would be done with a holistic view, so that for
  example Cinder operations won't interfere with Nova, or different Nova
  operations won't interfere with each other, but that is probably far
 down
  the road.
 
  Thanks,
  Avishay
 
 
  On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
 chris.frie...@windriver.com
  wrote:
 
  On 10/19/2014 09:33 AM, Avishay Traeger wrote:
 
  Hi Preston,
  Replies to some of your cinder-related questions:
  1. Creating a snapshot isn't usually an I/O intensive operation.  Are
  you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
  CPU usage of cinder-api spike sometimes - not sure why.
  2. The 'dd' processes that you see are Cinder wiping the volumes
 during
  deletion.  You can either disable this in cinder.conf, or you can use
 a
  relatively new option to manage the bandwidth used for this.
 
  IMHO, deployments should be optimized to not do very long/intensive
  management operations - for example, use backends with efficient
  snapshots, use CoW operations wherever possible rather than copying
 full
  volumes/images, disabling wipe on delete, etc.
 
 
  In a public-cloud environment I don't think it's reasonable to disable
  wipe-on-delete.
 
  Arguably it would be better to use encryption instead of
 wipe-on-delete.
  When done with the backing store, just throw away the key and it'll be
  secure enough for most purposes.
 
  Chris
 
 
 
 



 --
 Duncan Thomas



 We disable this in the Gates CINDER_SECURE_DELETE=False

 ThinLVM (which hopefully will be default upon release of Kilo) doesn't
 need it because internally it returns zeros when reading unallocated blocks
 so it's a non-issue.

 The debate of to wipe LV's or not to is a long running issue.  The default
 behavior in Cinder is to leave it enable and IMHO that's how it should
 stay.  The fact is anything that might be construed as less secure and
 has been defaulted to the more secure setting should be left as it is.
 It's simple to turn this off.

 Also, nobody seemed to mention that in the case of Cinder operations like
 

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Duncan Thomas
On 23 October 2014 08:30, Preston L. Bannister pres...@bannister.us wrote:
 John,

 As a (new) OpenStack developer, I just discovered the CINDER_SECURE_DELETE
 option.

 As an *implicit* default, I entirely approve.  Production OpenStack
 installations should *absolutely* insure there is no information leakage
 from one instance to the next.

 As an *explicit* default, I am not so sure. Low-end storage requires you do
 this explicitly. High-end storage can insure information never leaks.
 Counting on high level storage can make the upper levels more efficient, can
 be a good thing.

 The debate about whether to wipe LV's pretty much massively depends on the
 intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.

The security requirements regarding wiping are totally and utterly
site-dependent - some places care and are happy to pay the cost (some
even using an entirely pointless multi-write scrub out of historically
rooted paranoia), whereas some don't care in the slightest. The LVM thin
provisioning that John mentioned is no worse or better than most 'smart'
arrays - unless you happen to hit a bug, it won't return previous info.

That's a good default. If your site needs better, there are lots of
config options to go looking into for a whole variety of things, and you
should probably be doing your own security audits of the code base and
other deep analysis, as well as reading and contributing to the security
guide.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister pres...@bannister.us
wrote:

 John,

 As a (new) OpenStack developer, I just discovered the
 CINDER_SECURE_DELETE option.

 As an *implicit* default, I entirely approve.  Production OpenStack
 installations should *absolutely* insure there is no information leakage
 from one instance to the next.

 As an *explicit* default, I am not so sure. Low-end storage requires you
 do this explicitly. High-end storage can insure information never leaks.
 Counting on high level storage can make the upper levels more efficient,
 can be a good thing.


Not entirely sure of the distinction intended as far as
implicit/explicit... but one other thing I should probably point out: this
ONLY applies to the LVM driver, which is maybe what you're getting at.  It
would probably be better to advertise it as an LVM driver option (easy
enough to do in the config options help message).

Anyway, I just wanted to point to some of the options, like using io-nice,
clear-size, blkio cgroups, bps_limit...
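
A hedged sketch of what those look like in cinder.conf (names as I recall
them from the driver's option definitions; exact names and defaults may
differ by release):

    [DEFAULT]
    # Run the wipe dd under ionice, e.g. in the idle scheduling class.
    volume_clear_ionice = -c3
    # Only wipe the first N MiB of each volume (0 = wipe it all).
    volume_clear_size = 0
    # Name of the blkio cgroup used to throttle copy/clear bandwidth.
    volume_copy_blkio_cgroup_name = cinder-volume-copy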

It doesn't suck as badly as you might have thought, or as some of the other
respondents on this thread seem to think.  There's certainly room for
improvement and growth, but it hasn't been completely ignored on the Cinder
side.



 The debate about whether to wipe LV's pretty much massively depends on the
 intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.



 On Wed, Oct 22, 2014 at 11:15 PM, John Griffith john.griffi...@gmail.com
 wrote:



 On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas duncan.tho...@gmail.com
 wrote:

 For LVM-thin I believe it is already disabled? It is only really
 needed on LVM-thick, where the returning zeros behaviour is not done.

 On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com
 wrote:
  I would say that wipe-on-delete is not necessary in most deployments.
 
  Most storage backends exhibit the following behavior:
  1. Delete volume A that has data on physical sectors 1-10
  2. Create new volume B
  3. Read from volume B before writing, which happens to map to physical
  sector 5 - backend should return zeroes here, and not data from volume
 A
 
  In case the backend doesn't provide this rather standard behavior,
 data must
  be wiped immediately.  Otherwise, the only risk is physical security,
 and if
  that's not adequate, customers shouldn't be storing all their data
 there
  regardless.  You could also run a periodic job to wipe deleted volumes
 to
  reduce the window of vulnerability, without making delete_volume take a
  ridiculously long time.
 
  Encryption is a good option as well, and of course it protects the data
  before deletion as well (as long as your keys are protected...)
 
  Bottom line - I too think the default in devstack should be to disable
 this
  option, and think we should consider making the default False in Cinder
  itself.  This isn't the first time someone has asked why volume
 deletion
  takes 20 minutes...
 
  As for queuing backup operations and managing bandwidth for various
  operations, ideally this would be done with a holistic view, so that
 for
  example Cinder operations won't interfere with Nova, or different Nova
  operations won't interfere with each other, but that is probably far
 down
  the road.
 
  Thanks,
  Avishay
 
 
  On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
 chris.frie...@windriver.com
  wrote:
 
  On 10/19/2014 09:33 AM, Avishay Traeger wrote:
 
  Hi Preston,
  Replies to some of your cinder-related questions:
  1. Creating a snapshot isn't usually an I/O intensive operation.  Are
  you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen
 the
  CPU usage of cinder-api spike sometimes - not sure why.
  2. The 'dd' processes that you see are Cinder wiping the volumes
 during
  deletion.  You can either disable this in cinder.conf, or you can
 use a
  relatively new option to manage the bandwidth used for this.
 
  IMHO, deployments should be optimized to not do very long/intensive
  management operations - for example, use backends with efficient
  snapshots, use CoW operations wherever possible rather than copying
 full
  volumes/images, disabling wipe on delete, etc.
 
 
  In a public-cloud environment I don't think it's reasonable to disable
  wipe-on-delete.
 
  Arguably it would be better to use encryption instead of
 wipe-on-delete.
  When done with the backing store, just throw away the key and it'll be
  secure enough for most purposes.
 
  Chris
 
 
 
 



 --
 Duncan Thomas

 

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 8:50 AM, John Griffith john.griffi...@gmail.com
wrote:



 On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister 
 pres...@bannister.us wrote:

 John,

 As a (new) OpenStack developer, I just discovered the
 CINDER_SECURE_DELETE option.


OHHH... Most importantly, I almost forgot.  Welcome!!!


 As an *implicit* default, I entirely approve.  Production OpenStack
 installations should *absolutely* insure there is no information leakage
 from one instance to the next.

 As an *explicit* default, I am not so sure. Low-end storage requires you
 do this explicitly. High-end storage can insure information never leaks.
 Counting on high level storage can make the upper levels more efficient,
 can be a good thing.


 Not entirely sure of the distinction intended as far as
 implicit/explicit... but one other thing I should probably point out; this
 ONLY applies to the LVM driver, maybe that's what you're getting at.  Would
 be better probably to advertise as an LVM Driver option (easy enough to do
 in the config options help message).

 Anyway, I just wanted to point to some of the options like using io-nice,
 clear-size, blkio cgroups, bps_limit..

 It doesn't suck as bad as you might have thought or some of the other
 respondents on this thread seem to think.  There's certainly room for
 improvement and growth but it hasn't been completely ignored on the Cinder
 side.



 The debate about whether to wipe LV's pretty much massively depends on
 the intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.



 On Wed, Oct 22, 2014 at 11:15 PM, John Griffith john.griffi...@gmail.com
  wrote:



 On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas duncan.tho...@gmail.com
 wrote:

 For LVM-thin I believe it is already disabled? It is only really
 needed on LVM-thick, where the returning zeros behaviour is not done.

 On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com
 wrote:
  I would say that wipe-on-delete is not necessary in most deployments.
 
  Most storage backends exhibit the following behavior:
  1. Delete volume A that has data on physical sectors 1-10
  2. Create new volume B
  3. Read from volume B before writing, which happens to map to physical
  sector 5 - backend should return zeroes here, and not data from
 volume A
 
  In case the backend doesn't provide this rather standard behavior,
 data must
  be wiped immediately.  Otherwise, the only risk is physical security,
 and if
  that's not adequate, customers shouldn't be storing all their data
 there
  regardless.  You could also run a periodic job to wipe deleted
 volumes to
  reduce the window of vulnerability, without making delete_volume take
 a
  ridiculously long time.
 
  Encryption is a good option as well, and of course it protects the
 data
  before deletion as well (as long as your keys are protected...)
 
  Bottom line - I too think the default in devstack should be to
 disable this
  option, and think we should consider making the default False in
 Cinder
  itself.  This isn't the first time someone has asked why volume
 deletion
  takes 20 minutes...
 
  As for queuing backup operations and managing bandwidth for various
  operations, ideally this would be done with a holistic view, so that
 for
  example Cinder operations won't interfere with Nova, or different Nova
  operations won't interfere with each other, but that is probably far
 down
  the road.
 
  Thanks,
  Avishay
 
 
  On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
 chris.frie...@windriver.com
  wrote:
 
  On 10/19/2014 09:33 AM, Avishay Traeger wrote:
 
  Hi Preston,
  Replies to some of your cinder-related questions:
  1. Creating a snapshot isn't usually an I/O intensive operation.
 Are
  you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen
 the
  CPU usage of cinder-api spike sometimes - not sure why.
  2. The 'dd' processes that you see are Cinder wiping the volumes
 during
  deletion.  You can either disable this in cinder.conf, or you can
 use a
  relatively new option to manage the bandwidth used for this.
 
  IMHO, deployments should be optimized to not do very long/intensive
  management operations - for example, use backends with efficient
  snapshots, use CoW operations wherever possible rather than copying
 full
  volumes/images, disabling wipe on delete, etc.
 
 
  In a public-cloud environment I don't think it's reasonable to
 disable
  wipe-on-delete.
 
  Arguably it would be better to use encryption instead of
 wipe-on-delete.
  When done with the backing store, just throw away the key and it'll
 be
  secure enough for most purposes.
 
  Chris
 
 
 

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
On Thu, Oct 23, 2014 at 7:51 AM, John Griffith john.griffi...@gmail.com
wrote:

 On Thu, Oct 23, 2014 at 8:50 AM, John Griffith john.griffi...@gmail.com
 wrote:

 On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister 
 pres...@bannister.us wrote:

 John,

 As a (new) OpenStack developer, I just discovered the
 CINDER_SECURE_DELETE option.


 OHHH... Most importantly, I almost forgot.  Welcome!!!


Thanks! (I think...)




 It doesn't suck as bad as you might have thought or some of the other
 respondents on this thread seem to think.  There's certainly room for
 improvement and growth but it hasn't been completely ignored on the Cinder
 side.


To be clear, I am fairly impressed with what has gone into OpenStack as a
whole. Given the breadth, complexity, and growth ... not everything is
going to be perfect (yet?).

So ... not trying to disparage past work, but noting what does not seem
right. (Also, I know I could easily be missing something.)





 The debate about whether to wipe LV's pretty much massively depends on the
 intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.


Yes, that is pretty much the key.

Does LVM let you read physical blocks that have never been written? Or zero
out virgin segments on read? If not, then dd of zeroes is a way of doing
the right thing (if *very* expensive).
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister pres...@bannister.us
wrote:


 On Thu, Oct 23, 2014 at 7:51 AM, John Griffith john.griffi...@gmail.com
 wrote:

 On Thu, Oct 23, 2014 at 8:50 AM, John Griffith john.griffi...@gmail.com
 wrote:

 On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister 
 pres...@bannister.us wrote:

 John,

 As a (new) OpenStack developer, I just discovered the
 CINDER_SECURE_DELETE option.


 OHHH... Most importantly, I almost forgot.  Welcome!!!


 Thanks! (I think...)

:)





 It doesn't suck as bad as you might have thought or some of the other
 respondents on this thread seem to think.  There's certainly room for
 improvement and growth but it hasn't been completely ignored on the Cinder
 side.


 To be clear, I am fairly impressed with what has gone into OpenStack as a
 whole. Given the breadth, complexity, and growth ... not everything is
 going to be perfect (yet?).

 So ... not trying to disparage past work, but noting what does not seem
 right. (Also know I could easily be missing something.)

Sure, I didn't mean anything by that at all, and certainly didn't take it
that way.






 The debate about whether to wipe LV's pretty much massively depends on
 the intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.


 Yes, that is pretty much the key.

 Does LVM let you read physical blocks that have never been written? Or
 zero out virgin segments on read? If not, then dd of zeroes is a way of
 doing the right thing (if *very* expensive).


Yeah... so that's the crux of the issue on LVM (thick).  It's quite
possible for a new LV to be allocated from the VG and to include blocks that
previously belonged to another LV.  So in essence, if somebody were to sit
there in a cloud env and just create volumes and read the blocks over and
over and over, they could gather some previous or other tenants' data (or
pieces of it at any rate).  Wiping is definitely the right thing to do if
you're in an env where you need some level of security between tenants.
There are other ways to solve it, of course, but this is what we've got.
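
For context, the wipe the LVM driver does on delete boils down to a dd of
zeros over the whole LV - something along these lines (an illustrative
sketch only; the real command is assembled by the driver, and the device
path, block size, and ionice/cgroup wrapping all depend on configuration):

    # Hypothetical example of the kind of process you see in top during a delete
    dd if=/dev/zero of=/dev/mapper/stack--volumes-volume--<uuid> \
        bs=1M count=<volume size in MiB> oflag=direct

which is why even a few concurrent deletes can saturate a single backing
disk.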






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
On Thu, Oct 23, 2014 at 3:04 PM, John Griffith john.griffi...@gmail.com
wrote:

The debate about whether to wipe LV's pretty much massively depends on the
 intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.



 On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister 
 pres...@bannister.us wrote:


 Yes, that is pretty much the key.

 Does LVM let you read physical blocks that have never been written? Or
 zero out virgin segments on read? If not, then dd of zeroes is a way of
 doing the right thing (if *very* expensive).


 Yeah... so that's the crux of the issue on LVM (Thick).  It's quite
 possible for a new LV to be allocated from the VG and a block from a
 previous LV can be allocated.  So in essence if somebody were to sit there
 in a cloud env and just create volumes and read the blocks over and over
 and over they could gather some previous or other tenants data (or pieces
 of it at any rate).  It's def the right thing to do if you're in an env
 where you need some level of security between tenants.  There are other
 ways to solve it of course but this is what we've got.



Has anyone raised this issue with the LVM folk? Returning zeros on
unwritten blocks would require a bit of extra bookkeeping, but it would be a
lot more efficient overall.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Chris Friesen

On 10/23/2014 04:24 PM, Preston L. Bannister wrote:

On Thu, Oct 23, 2014 at 3:04 PM, John Griffith john.griffi...@gmail.com wrote:

The debate about whether to wipe LV's pretty much massively
depends on the intelligence of the underlying store. If the
lower level storage never returns accidental information ...
explicit zeroes are not needed.

On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister pres...@bannister.us wrote:


Yes, that is pretty much the key.

Does LVM let you read physical blocks that have never been
written? Or zero out virgin segments on read? If not, then dd
of zeroes is a way of doing the right thing (if *very* expensive).

Yeah... so that's the crux of the issue on LVM (Thick).  It's quite
possible for a new LV to be allocated from the VG and a block from a
previous LV can be allocated.  So in essence if somebody were to sit
there in a cloud env and just create volumes and read the blocks
over and over and over they could gather some previous or other
tenants data (or pieces of it at any rate).  It's def the right
thing to do if you're in an env where you need some level of
security between tenants.  There are other ways to solve it of
course but this is what we've got.



Has anyone raised this issue with the LVM folk? Returning zeros on
unwritten blocks would require a bit of extra bookkeeping, but a lot
more efficient overall.


For Cinder volumes, I think that if you have new enough versions of 
everything, you can specify lvm_type = thin and it will use thin 
provisioning.  Among other things, this should improve snapshot 
performance and also avoid the need to explicitly wipe on delete (since 
the next user of the storage will be provided zeros for a read of any 
page it hasn't written).
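
A minimal cinder.conf sketch of that option, assuming a stock LVM backend
section (the section name and volume group below are illustrative and will
differ per deployment):

    [lvmdriver-1]
    volume_group = stack-volumes
    # Allocate thin LVs; unwritten extents read back as zeros, so no wipe is
    # needed on delete.
    lvm_type = thin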


As far as I know this is not supported for ephemeral storage.

Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-21 Thread Chris Friesen

On 10/19/2014 09:33 AM, Avishay Traeger wrote:

Hi Preston,
Replies to some of your cinder-related questions:
1. Creating a snapshot isn't usually an I/O intensive operation.  Are
you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
CPU usage of cinder-api spike sometimes - not sure why.
2. The 'dd' processes that you see are Cinder wiping the volumes during
deletion.  You can either disable this in cinder.conf, or you can use a
relatively new option to manage the bandwidth used for this.

IMHO, deployments should be optimized to not do very long/intensive
management operations - for example, use backends with efficient
snapshots, use CoW operations wherever possible rather than copying full
volumes/images, disabling wipe on delete, etc.


In a public-cloud environment I don't think it's reasonable to disable 
wipe-on-delete.


Arguably it would be better to use encryption instead of wipe-on-delete. 
 When done with the backing store, just throw away the key and it'll be 
secure enough for most purposes.


Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-21 Thread Avishay Traeger
I would say that wipe-on-delete is not necessary in most deployments.

Most storage backends exhibit the following behavior:
1. Delete volume A that has data on physical sectors 1-10
2. Create new volume B
3. Read from volume B before writing, which happens to map to physical
sector 5 - backend should return zeroes here, and not data from volume A

In case the backend doesn't provide this rather standard behavior, data
must be wiped immediately.  Otherwise, the only risk is physical security,
and if that's not adequate, customers shouldn't be storing all their data
there regardless.  You could also run a periodic job to wipe deleted
volumes to reduce the window of vulnerability, without making delete_volume
take a ridiculously long time.

Encryption is a good option as well, and of course it protects the data
before deletion as well (as long as your keys are protected...)

Bottom line - I too think the default in devstack should be to disable this
option, and think we should consider making the default False in Cinder
itself.  This isn't the first time someone has asked why volume deletion
takes 20 minutes...

As for queuing backup operations and managing bandwidth for various
operations, ideally this would be done with a holistic view, so that for
example Cinder operations won't interfere with Nova, or different Nova
operations won't interfere with each other, but that is probably far down
the road.

Thanks,
Avishay


On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen chris.frie...@windriver.com
wrote:

 On 10/19/2014 09:33 AM, Avishay Traeger wrote:

 Hi Preston,
 Replies to some of your cinder-related questions:
 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
 you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
 CPU usage of cinder-api spike sometimes - not sure why.
 2. The 'dd' processes that you see are Cinder wiping the volumes during
 deletion.  You can either disable this in cinder.conf, or you can use a
 relatively new option to manage the bandwidth used for this.

 IMHO, deployments should be optimized to not do very long/intensive
 management operations - for example, use backends with efficient
 snapshots, use CoW operations wherever possible rather than copying full
 volumes/images, disabling wipe on delete, etc.


 In a public-cloud environment I don't think it's reasonable to disable
 wipe-on-delete.

 Arguably it would be better to use encryption instead of wipe-on-delete.
 When done with the backing store, just throw away the key and it'll be
 secure enough for most purposes.

 Chris




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-21 Thread Duncan Thomas
For LVM-thin I believe it is already disabled? It is only really
needed on LVM-thick, where the returning-zeros behaviour is not provided.

On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com wrote:
 I would say that wipe-on-delete is not necessary in most deployments.

 Most storage backends exhibit the following behavior:
 1. Delete volume A that has data on physical sectors 1-10
 2. Create new volume B
 3. Read from volume B before writing, which happens to map to physical
 sector 5 - backend should return zeroes here, and not data from volume A

 In case the backend doesn't provide this rather standard behavior, data must
 be wiped immediately.  Otherwise, the only risk is physical security, and if
 that's not adequate, customers shouldn't be storing all their data there
 regardless.  You could also run a periodic job to wipe deleted volumes to
 reduce the window of vulnerability, without making delete_volume take a
 ridiculously long time.

 Encryption is a good option as well, and of course it protects the data
 before deletion as well (as long as your keys are protected...)

 Bottom line - I too think the default in devstack should be to disable this
 option, and think we should consider making the default False in Cinder
 itself.  This isn't the first time someone has asked why volume deletion
 takes 20 minutes...

 As for queuing backup operations and managing bandwidth for various
 operations, ideally this would be done with a holistic view, so that for
 example Cinder operations won't interfere with Nova, or different Nova
 operations won't interfere with each other, but that is probably far down
 the road.

 Thanks,
 Avishay


 On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen chris.frie...@windriver.com
 wrote:

 On 10/19/2014 09:33 AM, Avishay Traeger wrote:

 Hi Preston,
 Replies to some of your cinder-related questions:
 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
 you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
 CPU usage of cinder-api spike sometimes - not sure why.
 2. The 'dd' processes that you see are Cinder wiping the volumes during
 deletion.  You can either disable this in cinder.conf, or you can use a
 relatively new option to manage the bandwidth used for this.

 IMHO, deployments should be optimized to not do very long/intensive
 management operations - for example, use backends with efficient
 snapshots, use CoW operations wherever possible rather than copying full
 volumes/images, disabling wipe on delete, etc.


 In a public-cloud environment I don't think it's reasonable to disable
 wipe-on-delete.

 Arguably it would be better to use encryption instead of wipe-on-delete.
 When done with the backing store, just throw away the key and it'll be
 secure enough for most purposes.

 Chris







-- 
Duncan Thomas

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Avishay Traeger
Hi Preston,
Replies to some of your cinder-related questions:
1. Creating a snapshot isn't usually an I/O intensive operation.  Are you
seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the CPU
usage of cinder-api spike sometimes - not sure why.
2. The 'dd' processes that you see are Cinder wiping the volumes during
deletion.  You can either disable this in cinder.conf, or you can use a
relatively new option to manage the bandwidth used for this.

IMHO, deployments should be optimized to not do very long/intensive
management operations - for example, use backends with efficient snapshots,
use CoW operations wherever possible rather than copying full
volumes/images, disabling wipe on delete, etc.

Thanks,
Avishay

On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister pres...@bannister.us
wrote:

 OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or
 not.

 Have a DevStack, running in a VM (VirtualBox), backed by a single flash
 drive (on my current generation MacBook). Could be I have something off in
 my setup.

 Testing nova backup - first the existing implementation, then my (much
 changed) replacement.

 Simple scripts for testing. Create images. Create instances (five). Run
 backup on all instances.

 Currently found in:
 https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

 First time I started backups of all (five) instances, load on the Devstack
 VM went insane, and all but one backup failed. Seems that all of the
 backups were performed immediately (or attempted), without any sort of
 queuing or load management. Huh. Well, maybe just the backup implementation
 is naive...

 I will write on this at greater length, but backup should interfere as
 little as possible with foreground processing. Overloading a host is
 entirely unacceptable.

 Replaced the backup implementation so it does proper queuing (among other
 things). Iterating forward - implementing and testing.

 Fired off snapshots on five Cinder volumes (attached to five instances).
 Again the load shot very high. Huh. Well, in a full-scale OpenStack setup,
 maybe storage can handle that much I/O more gracefully ... or not. Again,
 should taking snapshots interfere with foreground activity? I would say,
 most often not. Queuing and serializing snapshots would strictly limit the
 interference with foreground. Also, very high end storage can perform
 snapshots *very* quickly, so serialized snapshots will not be slow. My take
 is that the default behavior should be to queue and serialize all heavy I/O
 operations, with non-default allowances for limited concurrency.

 Cleaned up (which required reboot/unstack/stack and more). Tried again.

 Ran two test backups (which in the current iteration create Cinder volume
 snapshots). Asked Cinder to delete the snapshots. Again, very high load
 factors, and in top I can see two long-running dd processes. (Given I
 have a single disk, more than one dd is not good.)

 Running too many heavyweight operations against storage can lead to
 thrashing. Queuing can strictly limit that load, and insure better and
 reliable performance. I am not seeing evidence of this thought in my
 OpenStack testing.

 So far it looks like there is no thought to managing the impact of disk
 intensive management operations. Am I missing something?







___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Preston L. Bannister
Avishay,

Thanks for the tip on [cinder.conf] volume_clear. The corresponding option
in devstack is CINDER_SECURE_DELETE=False.

Also I *may* have been bitten by the related bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1023755

(All I know at this point is the devstack VM became unresponsive - have not
yet identified the cause. But the symptoms fit.)

Not sure if there are spikes on Cinder snapshot creation. Perhaps not. (Too
many different failures and oddities. Have not sorted them all yet.)

I am of the opinion that CINDER_SECURE_DELETE=False should be the default
for devstack, especially as the current default invokes bug-like behavior.

Also, unbounded concurrent dd operations are not a good idea. (Which is
generally what you meant, I believe.)

Onwards



On Sun, Oct 19, 2014 at 8:33 AM, Avishay Traeger avis...@stratoscale.com
wrote:

 Hi Preston,
 Replies to some of your cinder-related questions:
 1. Creating a snapshot isn't usually an I/O intensive operation.  Are you
 seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the CPU
 usage of cinder-api spike sometimes - not sure why.
 2. The 'dd' processes that you see are Cinder wiping the volumes during
 deletion.  You can either disable this in cinder.conf, or you can use a
 relatively new option to manage the bandwidth used for this.

 IMHO, deployments should be optimized to not do very long/intensive
 management operations - for example, use backends with efficient snapshots,
 use CoW operations wherever possible rather than copying full
 volumes/images, disabling wipe on delete, etc.

 Thanks,
 Avishay

 On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister 
 pres...@bannister.us wrote:

 OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or
 not.

 Have a DevStack, running in a VM (VirtualBox), backed by a single flash
 drive (on my current generation MacBook). Could be I have something off in
 my setup.

 Testing nova backup - first the existing implementation, then my (much
 changed) replacement.

 Simple scripts for testing. Create images. Create instances (five). Run
 backup on all instances.

 Currently found in:

 https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

 First time I started backups of all (five) instances, load on the
 Devstack VM went insane, and all but one backup failed. Seems that all of
 the backups were performed immediately (or attempted), without any sort of
 queuing or load management. Huh. Well, maybe just the backup implementation
 is naive...

 I will write on this at greater length, but backup should interfere as
 little as possible with foreground processing. Overloading a host is
 entirely unacceptable.

 Replaced the backup implementation so it does proper queuing (among other
 things). Iterating forward - implementing and testing.

 Fired off snapshots on five Cinder volumes (attached to five instances).
 Again the load shot very high. Huh. Well, in a full-scale OpenStack setup,
 maybe storage can handle that much I/O more gracefully ... or not. Again,
 should taking snapshots interfere with foreground activity? I would say,
 most often not. Queuing and serializing snapshots would strictly limit the
 interference with foreground. Also, very high end storage can perform
 snapshots *very* quickly, so serialized snapshots will not be slow. My take
 is that the default behavior should be to queue and serialize all heavy I/O
 operations, with non-default allowances for limited concurrency.

 Cleaned up (which required reboot/unstack/stack and more). Tried again.

 Ran two test backups (which in the current iteration create Cinder volume
 snapshots). Asked Cinder to delete the snapshots. Again, very high load
 factors, and in top I can see two long-running dd processes. (Given I
 have a single disk, more than one dd is not good.)

 Running too many heavyweight operations against storage can lead to
 thrashing. Queuing can strictly limit that load, and insure better and
 reliable performance. I am not seeing evidence of this thought in my
 OpenStack testing.

 So far it looks like there is no thought to managing the impact of disk
 intensive management operations. Am I missing something?







___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Jay Pipes
Hi Preston, some great questions in here. Some comments inline, but 
tl;dr my answer is yes, we need to be doing a much better job thinking 
about how I/O intensive operations affect other things running on 
providers of compute and block storage resources.


On 10/19/2014 06:41 AM, Preston L. Bannister wrote:

OK, I am fairly new here (to OpenStack). Maybe I am missing something.
Or not.

Have a DevStack, running in a VM (VirtualBox), backed by a single flash
drive (on my current generation MacBook). Could be I have something off
in my setup.

Testing nova backup - first the existing implementation, then my (much
changed) replacement.

Simple scripts for testing. Create images. Create instances (five). Run
backup on all instances.

Currently found in:
https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

First time I started backups of all (five) instances, load on the
Devstack VM went insane, and all but one backup failed. Seems that all
of the backups were performed immediately (or attempted), without any
sort of queuing or load management. Huh. Well, maybe just the backup
implementation is naive...


Yes, you are exactly correct. There is no queuing behaviour for any of 
the "backup" operations (I put "backup operations" in quotes because IMO 
it is silly to refer to them as backup operations, since all they are 
really doing is a snapshot action against the instance/volume -- and 
then attempting to be a poor man's "cloud cron").


The backup is initiated from the admin_actions API extension here:

https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297

which calls the nova.compute.api.API.backup() method here:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031

which, after creating some image metadata in Glance for the snapshot, 
calls the compute RPC API here:


https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759

Which sends an RPC asynchronous message to the compute node to execute 
the instance snapshot and rotate backups:


https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969

That method eventually calls the blocking snapshot() operation on the 
virt driver:


https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041

And it is the nova.virt.libvirt.Driver.snapshot() method that is quite 
icky, with lots of logic to determine the type of snapshot to do and 
how to do it:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607

The gist of the driver's snapshot() method calls 
ImageBackend.snapshot(), which is responsible for doing the actual 
snapshot of the instance:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685

and then once the snapshot is done, the method calls to the Glance API 
to upload the snapshotted disk image to Glance:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734

All of which is I/O intensive and AFAICT, mostly done in a blocking 
manner, with no queuing or traffic control measures, so as you correctly 
point out, if the compute node daemon receives 5 backup requests, it 
will go ahead and do 5 snapshot operations and 5 uploads to Glance all 
as fast as it can. It will do it in 5 different eventlet greenthreads, 
but there are no designs in place to prioritize the snapshotting I/O 
lower than active VM I/O.



I will write on this at greater length, but backup should interfere as
little as possible with foreground processing. Overloading a host is
entirely unacceptable.


Agree with you completely.


Replaced the backup implementation so it does proper queuing (among
other things). Iterating forward - implementing and testing.


Is this code up somewhere we can take a look at?


Fired off snapshots on five Cinder volumes (attached to five instances).
Again the load shot very high. Huh. Well, in a full-scale OpenStack
setup, maybe storage can handle that much I/O more gracefully ... or
not. Again, should taking snapshots interfere with foreground activity?
I would say, most often not. Queuing and serializing snapshots would
strictly limit the interference with foreground. Also, very high end
storage can perform snapshots *very* quickly, so serialized snapshots
will not be slow. My take is that the default behavior should be to
queue and serialize all heavy I/O operations, with non-default
allowances for limited concurrency.

Cleaned up (which required reboot/unstack/stack and more). Tried again.

Ran two test backups (which in the current iteration create Cinder
volume snapshots). Asked Cinder to delete the snapshots. Again, very
high load factors, and in top I can see two long-running dd
processes. (Given I have a single disk, more than one dd is not good.)

Running too many heavyweight operations against storage can lead to
thrashing. Queuing can strictly limit that load, and insure better and
reliable performance. I am not 

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Preston L. Bannister
Jay,

Thanks very much for the insight and links. In fact, I had already visited
*almost* all of the places mentioned. Added clarity is good. :)

Also, to your earlier comment (in an earlier thread) about backup not
really belonging in Nova - in the main I agree. The backup API belongs in
Nova (as this maps cleanly to the equivalent in AWS), but the bulk of the
implementation can and should be distinct (in my opinion).

My current work is at:
https://github.com/dreadedhill-work/stack-backup

I also have matching changes to Nova and the Nova client under the same
Github account.

Please note this is very much a work in progress (as you might guess from
my prior comments). It needs a longer, proper write-up and a cleaner Git
history. The code is a fair way along, but should be considered more of a
rough draft than a final version.

For the next few weeks, I am enormously crunched for time, as I have
promised a PoC at a site with a very large OpenStack deployment.

Noted your suggestion about the Rally team. It might be a bit before I can
pursue it. :)

Again, Thanks.





On Sun, Oct 19, 2014 at 10:13 AM, Jay Pipes jaypi...@gmail.com wrote:

 Hi Preston, some great questions in here. Some comments inline, but tl;dr
 my answer is yes, we need to be doing a much better job thinking about how
 I/O intensive operations affect other things running on providers of
 compute and block storage resources

 On 10/19/2014 06:41 AM, Preston L. Bannister wrote:

 OK, I am fairly new here (to OpenStack). Maybe I am missing something.
 Or not.

 Have a DevStack, running in a VM (VirtualBox), backed by a single flash
 drive (on my current generation MacBook). Could be I have something off
 in my setup.

 Testing nova backup - first the existing implementation, then my (much
 changed) replacement.

 Simple scripts for testing. Create images. Create instances (five). Run
 backup on all instances.

 Currently found in:
 https://github.com/dreadedhill-work/stack-backup/
 tree/master/backup-scripts

 First time I started backups of all (five) instances, load on the
 Devstack VM went insane, and all but one backup failed. Seems that all
 of the backups were performed immediately (or attempted), without any
 sort of queuing or load management. Huh. Well, maybe just the backup
 implementation is naive...


 Yes, you are exactly correct. There is no queuing behaviour for any of the
 backup operations (I put backup operations in quotes because IMO it is
 silly to refer to them as backup operations, since all they are doing
 really is a snapshot action against the instance/volume -- and then
 attempting to be a poor man's cloud cron).

 The backup is initiated from the admin_actions API extension here:

 https://github.com/openstack/nova/blob/master/nova/api/
 openstack/compute/contrib/admin_actions.py#L297

 which calls the nova.compute.api.API.backup() method here:

 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031

 which, after creating some image metadata in Glance for the snapshot,
 calls the compute RPC API here:

 https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759

 Which sends an RPC asynchronous message to the compute node to execute the
 instance snapshot and rotate backups:

 https://github.com/openstack/nova/blob/master/nova/compute/
 manager.py#L2969

 That method eventually calls the blocking snapshot() operation on the virt
 driver:

 https://github.com/openstack/nova/blob/master/nova/compute/
 manager.py#L3041

 And it is the nova.virt.libvirt.Driver.snapshot() method that is quite
 icky, with lots of logic to determine the type of snapshot to do and how
 to do it:

 https://github.com/openstack/nova/blob/master/nova/virt/
 libvirt/driver.py#L1607

 The gist of the driver's snapshot() method calls ImageBackend.snapshot(),
 which is responsible for doing the actual snapshot of the instance:

 https://github.com/openstack/nova/blob/master/nova/virt/
 libvirt/driver.py#L1685

 and then once the snapshot is done, the method calls to the Glance API to
 upload the snapshotted disk image to Glance:

 https://github.com/openstack/nova/blob/master/nova/virt/
 libvirt/driver.py#L1730-L1734

 All of which is I/O intensive and AFAICT, mostly done in a blocking
 manner, with no queuing or traffic control measures, so as you correctly
 point out, if the compute node daemon receives 5 backup requests, it will
 go ahead and do 5 snapshot operations and 5 uploads to Glance all as fast
 as it can. It will do it in 5 different eventlet greenthreads, but there
 are no designs in place to prioritize the snapshotting I/O lower than
 active VM I/O.

  I will write on this at greater length, but backup should interfere as
 little as possible with foreground processing. Overloading a host is
 entirely unacceptable.


 Agree with you completely.

  Replaced the backup implementation so it does proper queuing (among
 other things). Iterating forward - implementing and testing.


 Is this code up