Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On 10/23/2014 04:24 PM, Preston L. Bannister wrote:

> On Thu, Oct 23, 2014 at 3:04 PM, John Griffith <john.griffi...@gmail.com> wrote:
>> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.
>
> On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister <pres...@bannister.us> wrote:
>> Yes, that is pretty much the key.
>>
>> Does LVM let you read physical blocks that have never been written? Or zero out virgin segments on read? If not, then "dd" of zeroes is a way of doing the right thing (if *very* expensive).
>
> Yeah... so that's the crux of the issue with LVM (thick). It's quite possible for a new LV allocated from the VG to contain a block that belonged to a previous LV. So in essence, if somebody were to sit in a cloud environment and just create volumes and read the blocks over and over, they could gather some other tenant's data (or pieces of it, at any rate). Wiping is definitely the "right" thing to do if you're in an environment where you need some level of security between tenants. There are other ways to solve it, of course, but this is what we've got.

Has anyone raised this issue with the LVM folk? Returning zeros on unwritten blocks would require a bit of extra bookkeeping, but would be a lot more efficient overall.

For Cinder volumes, I think that if you have new enough versions of everything you can specify "lvm_type = thin" and it will use thin provisioning. Among other things, this should improve snapshot performance and also avoid the need to explicitly wipe on delete (since the next user of the storage will be given zeros on a read of any page it hasn't written). As far as I know this is not supported for ephemeral storage.

Chris

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
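[Editor's illustration] The "lvm_type = thin" setting Chris mentions lives in the LVM backend section of cinder.conf. A minimal sketch of such a section — the section name and volume-group name here are deployment-specific assumptions, not taken from the thread:

```ini
# cinder.conf -- hypothetical LVM backend section; adjust names to your deployment.
[lvm]
volume_group = cinder-volumes
# Thin provisioning: unwritten extents read back as zeros, so the
# explicit wipe-on-delete pass becomes unnecessary.
lvm_type = thin
```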
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 3:04 PM, John Griffith wrote:

> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.

On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister <pres...@bannister.us> wrote:

> Yes, that is pretty much the key.
>
> Does LVM let you read physical blocks that have never been written? Or zero out virgin segments on read? If not, then "dd" of zeroes is a way of doing the right thing (if *very* expensive).

> Yeah... so that's the crux of the issue with LVM (thick). It's quite possible for a new LV allocated from the VG to contain a block that belonged to a previous LV. So in essence, if somebody were to sit in a cloud environment and just create volumes and read the blocks over and over, they could gather some other tenant's data (or pieces of it, at any rate). Wiping is definitely the "right" thing to do if you're in an environment where you need some level of security between tenants. There are other ways to solve it, of course, but this is what we've got.

Has anyone raised this issue with the LVM folk? Returning zeros on unwritten blocks would require a bit of extra bookkeeping, but would be a lot more efficient overall.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister wrote:

> On Thu, Oct 23, 2014 at 7:51 AM, John Griffith wrote:
>> OHHH... Most importantly, I almost forgot. Welcome!!!
>
> Thanks! (I think...)

:)

>> It doesn't suck as bad as you might have thought, or as some of the other respondents on this thread seem to think. There's certainly room for improvement and growth, but it hasn't been completely ignored on the Cinder side.
>
> To be clear, I am fairly impressed with what has gone into OpenStack as a whole. Given the breadth, complexity, and growth, not everything is going to be perfect (yet?).
>
> So I am not trying to disparage past work, just noting what does not seem right. (Also know I could easily be missing something.)

Sure, I didn't mean anything by that at all, and certainly didn't take it that way.

>> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.
>
> Yes, that is pretty much the key.
>
> Does LVM let you read physical blocks that have never been written? Or zero out virgin segments on read? If not, then "dd" of zeroes is a way of doing the right thing (if *very* expensive).

Yeah... so that's the crux of the issue with LVM (thick). It's quite possible for a new LV allocated from the VG to contain a block that belonged to a previous LV. So in essence, if somebody were to sit in a cloud environment and just create volumes and read the blocks over and over, they could gather some other tenant's data (or pieces of it, at any rate). Wiping is definitely the "right" thing to do if you're in an environment where you need some level of security between tenants. There are other ways to solve it, of course, but this is what we've got.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 7:51 AM, John Griffith wrote:

> OHHH... Most importantly, I almost forgot. Welcome!!!

Thanks! (I think...)

> It doesn't suck as bad as you might have thought, or as some of the other respondents on this thread seem to think. There's certainly room for improvement and growth, but it hasn't been completely ignored on the Cinder side.

To be clear, I am fairly impressed with what has gone into OpenStack as a whole. Given the breadth, complexity, and growth, not everything is going to be perfect (yet?).

So I am not trying to disparage past work, just noting what does not seem right. (Also know I could easily be missing something.)

> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.

Yes, that is pretty much the key.

Does LVM let you read physical blocks that have never been written? Or zero out virgin segments on read? If not, then "dd" of zeroes is a way of doing the right thing (if *very* expensive).
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 8:50 AM, John Griffith wrote:

> On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister <pres...@bannister.us> wrote:
>> John,
>>
>> As a (new) OpenStack developer, I just discovered the "CINDER_SECURE_DELETE" option.
>
> OHHH... Most importantly, I almost forgot. Welcome!!!
>
>> As an *implicit* default, I entirely approve. Production OpenStack installations should *absolutely* ensure there is no information leakage from one instance to the next.
>>
>> As an *explicit* default, I am not so sure. Low-end storage requires you to do this explicitly. High-end storage can ensure information never leaks. Counting on the high-end storage to make the upper levels more efficient can be a good thing.
>
> Not entirely sure of the distinction intended as far as implicit/explicit... but one other thing I should probably point out: this ONLY applies to the LVM driver; maybe that's what you're getting at. It would probably be better to advertise it as an LVM driver option (easy enough to do in the config options help message).
>
> Anyway, I just wanted to point to some of the options, like using io-nice, clear-size, blkio cgroups, bps_limit...
>
> It doesn't suck as bad as you might have thought, or as some of the other respondents on this thread seem to think. There's certainly room for improvement and growth, but it hasn't been completely ignored on the Cinder side.
>
> [...]
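[Editor's illustration] The "dd" wipe described in the quoted text amounts to streaming zeros over the whole device before its extents return to the pool. A minimal Python sketch of the same idea, run against an ordinary file standing in for a logical volume (the file name and chunk size are invented for the example):

```python
import os

def zero_wipe(path: str, chunk_size: int = 1024 * 1024) -> None:
    """Overwrite every byte of *path* with zeros, as 'dd if=/dev/zero' would."""
    size = os.path.getsize(path)
    zeros = b"\x00" * chunk_size
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros actually hit the disk

# Demo against a small scratch file standing in for an LV.
with open("fake_lv.img", "wb") as f:
    f.write(b"tenant secret data" * 100)

zero_wipe("fake_lv.img")

with open("fake_lv.img", "rb") as f:
    data = f.read()
print(data == b"\x00" * len(data))  # True: nothing readable remains
```

Note that the size is preserved but every byte is zeroed; the "relatively new option to manage the bandwidth" mentioned above corresponds to throttling how fast this loop is allowed to write.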
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister wrote:

> John,
>
> As a (new) OpenStack developer, I just discovered the "CINDER_SECURE_DELETE" option.
>
> As an *implicit* default, I entirely approve. Production OpenStack installations should *absolutely* ensure there is no information leakage from one instance to the next.
>
> As an *explicit* default, I am not so sure. Low-end storage requires you to do this explicitly. High-end storage can ensure information never leaks. Counting on the high-end storage to make the upper levels more efficient can be a good thing.

Not entirely sure of the distinction intended as far as implicit/explicit... but one other thing I should probably point out: this ONLY applies to the LVM driver; maybe that's what you're getting at. It would probably be better to advertise it as an LVM driver option (easy enough to do in the config options help message).

Anyway, I just wanted to point to some of the options, like using io-nice, clear-size, blkio cgroups, bps_limit...

It doesn't suck as bad as you might have thought, or as some of the other respondents on this thread seem to think. There's certainly room for improvement and growth, but it hasn't been completely ignored on the Cinder side.

> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.
>
> [...]
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On 23 October 2014 08:30, Preston L. Bannister wrote:

> John,
>
> As a (new) OpenStack developer, I just discovered the "CINDER_SECURE_DELETE" option.
>
> As an *implicit* default, I entirely approve. Production OpenStack installations should *absolutely* ensure there is no information leakage from one instance to the next.
>
> As an *explicit* default, I am not so sure. Low-end storage requires you to do this explicitly. High-end storage can ensure information never leaks. Counting on the high-end storage to make the upper levels more efficient can be a good thing.
>
> The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.

The security requirements around wiping are totally and utterly site-dependent: some places care and are happy to pay the cost (some even use an entirely pointless multi-pass scrub out of historically rooted paranoia), whereas some don't care in the slightest.

The LVM-thin that John mentioned is no worse or better than most "smart" arrays: unless you happen to hit a bug, it won't return previous info. That's a good default. If your site needs better, there are lots of config options to go looking into for a whole variety of things, and you should probably be doing your own security audits of the code base and other deep analysis, as well as reading and contributing to the security guide.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
John,

As a (new) OpenStack developer, I just discovered the "CINDER_SECURE_DELETE" option.

As an *implicit* default, I entirely approve. Production OpenStack installations should *absolutely* ensure there is no information leakage from one instance to the next.

As an *explicit* default, I am not so sure. Low-end storage requires you to do this explicitly. High-end storage can ensure information never leaks. Counting on the high-end storage to make the upper levels more efficient can be a good thing.

The debate about whether to wipe LVs pretty much depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information, explicit zeroes are not needed.

On Wed, Oct 22, 2014 at 11:15 PM, John Griffith wrote:

> [...]
>
> We disable this in the gates: "CINDER_SECURE_DELETE=False"
>
> ThinLVM (which hopefully will be the default upon release of Kilo) doesn't need it, because internally it returns zeros when reading unallocated blocks, so it's a non-issue.
>
> The debate over whether or not to wipe LVs is a long-running issue. The default behavior in Cinder is to leave it enabled, and IMHO that's how it should stay.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas wrote:

> For LVM-thin I believe it is already disabled? It is only really needed on LVM-thick, where the returning-zeros behaviour is not done.
>
> [...]

We disable this in the gates: "CINDER_SECURE_DELETE=False"

ThinLVM (which hopefully will be the default upon release of Kilo) doesn't need it, because internally it returns zeros when reading unallocated blocks, so it's a non-issue.

The debate over whether or not to wipe LVs is a long-running issue. The default behavior in Cinder is to leave it enabled, and IMHO that's how it should stay. The fact is, anything that might be construed as "less secure" and has been defaulted to the "more secure" setting should be left as it is. It's simple to turn this off.

Also, nobody seemed to mention that for Cinder operations like copy-volume and the delete process, you also have the ability to set bandwidth limits on these operations, and in the case of delete even specify different schemes (not just enabled/disabled, but other options that may be more or less I/O intensive). For further reference, check out the config options [1].

Thanks,
John

[1]: https://github.com/openstack/cinder/blob/master/cinder/volume/driver.py#L69
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
For LVM-thin I believe it is already disabled? It is only really needed on LVM-thick, where the returning-zeros behaviour is not done.

On 21 October 2014 08:29, Avishay Traeger wrote:

> I would say that wipe-on-delete is not necessary in most deployments.
>
> [...]
>
> Bottom line - I too think the default in devstack should be to disable this option, and we should consider making the default False in Cinder itself. This isn't the first time someone has asked why volume deletion takes 20 minutes...

--
Duncan Thomas
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
I would say that wipe-on-delete is not necessary in most deployments.

Most storage backends exhibit the following behavior:
1. Delete volume A that has data on physical sectors 1-10
2. Create new volume B
3. Read from volume B before writing, which happens to map to physical sector 5 - the backend should return zeroes here, and not data from volume A

In case the backend doesn't provide this rather standard behavior, data must be wiped immediately. Otherwise, the only risk is physical security, and if that's not adequate, customers shouldn't be storing all their data there regardless. You could also run a periodic job to wipe deleted volumes to reduce the window of vulnerability, without making delete_volume take a ridiculously long time.

Encryption is a good option as well, and of course it also protects the data before deletion (as long as your keys are protected...).

Bottom line - I too think the default in devstack should be to disable this option, and think we should consider making the default False in Cinder itself. This isn't the first time someone has asked why volume deletion takes 20 minutes...

As for queuing backup operations and managing bandwidth for various operations, ideally this would be done with a holistic view, so that for example Cinder operations won't interfere with Nova, or different Nova operations won't interfere with each other, but that is probably far down the road.

Thanks,
Avishay

On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen wrote:
> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
>> Hi Preston,
>> Replies to some of your cinder-related questions:
>> 1. Creating a snapshot isn't usually an I/O-intensive operation. Are you seeing an I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why.
>> 2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this.
>>
>> IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disable wipe on delete, etc.
>
> In a public-cloud environment I don't think it's reasonable to disable wipe-on-delete.
>
> Arguably it would be better to use encryption instead of wipe-on-delete. When done with the backing store, just throw away the key and it'll be secure enough for most purposes.
>
> Chris
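The three-step scenario above is easy to demonstrate with a toy model. This is a hypothetical illustration only (not Cinder or LVM code): a thick-provisioned allocator that recycles freed extents without zeroing them hands tenant A's bytes straight to tenant B.

```python
class ThickPool:
    """Toy thick-provisioned pool: freed blocks are reused, never zeroed,
    unless the caller explicitly wipes them (Cinder's dd pass)."""

    def __init__(self, nblocks):
        self.blocks = [b"\x00" * 4 for _ in range(nblocks)]
        self.free = list(range(nblocks))

    def create_volume(self, nblocks):
        return [self.free.pop(0) for _ in range(nblocks)]

    def delete_volume(self, extents, wipe=False):
        for e in extents:
            if wipe:
                self.blocks[e] = b"\x00" * 4   # wipe-on-delete
            self.free.append(e)

pool = ThickPool(4)
vol_a = pool.create_volume(4)            # "volume A"
pool.blocks[vol_a[2]] = b"SECR"          # tenant A writes secret data
pool.delete_volume(vol_a, wipe=False)    # step 1: delete without wiping
vol_b = pool.create_volume(4)            # step 2: tenant B's "volume B"
# step 3: read before writing -- tenant A's data is still there
leaked = [pool.blocks[e] for e in vol_b if pool.blocks[e] != b"\x00" * 4]
print(leaked)  # [b'SECR']
```

Passing `wipe=True` (the dd pass) or having the backend return zeroes for never-written extents (LVM-thin) both close the leak.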
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On 10/19/2014 09:33 AM, Avishay Traeger wrote:
> Hi Preston,
> Replies to some of your cinder-related questions:
> 1. Creating a snapshot isn't usually an I/O-intensive operation. Are you seeing an I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why.
> 2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this.
>
> IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disable wipe on delete, etc.

In a public-cloud environment I don't think it's reasonable to disable wipe-on-delete.

Arguably it would be better to use encryption instead of wipe-on-delete. When done with the backing store, just throw away the key and it'll be secure enough for most purposes.

Chris
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Jay,

Thanks very much for the insight and links. In fact, I have already visited *almost* all the places mentioned. Added clarity is good. :)

Also, to your earlier comment (on an earlier thread) about backup not really belonging in Nova - in the main I agree. The "backup" API belongs in Nova (as this maps cleanly to the equivalent in AWS), but the bulk of the implementation can and should be distinct (in my opinion).

My current work is at:
https://github.com/dreadedhill-work/stack-backup

I also have matching changes to Nova and the Nova client under the same Github account. Please note this is very much a work in progress (as you might guess from my prior comments). This needs a longer proper write-up, and a cleaner Git history. The code is a pretty fair ways along, but should be considered more a rough draft than a final version.

For the next few weeks I am enormously crunched for time, as I have promised a PoC at a site with a very large OpenStack deployment.

Noted your suggestion about the Rally team. Might be a bit before I can pursue it. :)

Again, thanks.

On Sun, Oct 19, 2014 at 10:13 AM, Jay Pipes wrote:
> Hi Preston, some great questions in here. Some comments inline, but tl;dr my answer is "yes, we need to be doing a much better job thinking about how I/O intensive operations affect other things running on providers of compute and block storage resources".
>
> On 10/19/2014 06:41 AM, Preston L. Bannister wrote:
>> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not.
>>
>> Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup.
>>
>> Testing nova backup - first the existing implementation, then my (much changed) replacement.
>>
>> Simple scripts for testing. Create images. Create instances (five). Run backup on all instances.
>>
>> Currently found in:
>> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>>
>> First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...
>
> Yes, you are exactly correct. There is no queuing behaviour for any of the "backup" operations (I put "backup" operations in quotes because IMO it is silly to refer to them as backup operations, since all they are really doing is a snapshot action against the instance/volume -- and then attempting to be a poor man's cloud cron).
>
> The backup is initiated from the admin_actions API extension here:
>
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297
>
> which calls the nova.compute.api.API.backup() method here:
>
> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031
>
> which, after creating some image metadata in Glance for the snapshot, calls the compute RPC API here:
>
> https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759
>
> which sends an asynchronous RPC message to the compute node to execute the instance snapshot and "rotate backups":
>
> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969
>
> That method eventually calls the blocking snapshot() operation on the virt driver:
>
> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041
>
> And it is the nova.virt.libvirt.Driver.snapshot() method that is quite "icky", with lots of logic to determine the type of snapshot to do and how to do it:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607
>
> The gist of the driver's snapshot() method calls ImageBackend.snapshot(), which is responsible for doing the actual snapshot of the instance:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685
>
> and then, once the snapshot is done, the method calls the Glance API to upload the snapshotted disk image to Glance:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734
>
> All of which is I/O intensive and, AFAICT, mostly done in a blocking manner, with no queuing or traffic-control measures, so as you correctly point out, if the compute node daemon receives 5 backup requests, it will go ahead and do 5 snapshot operations and 5 uploads to Glance as fast as it can. It will do them in 5 different eventlet greenthreads, but there are no designs in place to prioritize the snapshot I/O below active VM I/O.
>
>> I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.
>
> Agree with you completely.
>
>> Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Hi Preston, some great questions in here. Some comments inline, but tl;dr my answer is "yes, we need to be doing a much better job thinking about how I/O intensive operations affect other things running on providers of compute and block storage resources".

On 10/19/2014 06:41 AM, Preston L. Bannister wrote:
> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not.
>
> Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup.
>
> Testing nova backup - first the existing implementation, then my (much changed) replacement.
>
> Simple scripts for testing. Create images. Create instances (five). Run backup on all instances.
>
> Currently found in:
> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>
> First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...

Yes, you are exactly correct. There is no queuing behaviour for any of the "backup" operations (I put "backup" operations in quotes because IMO it is silly to refer to them as backup operations, since all they are really doing is a snapshot action against the instance/volume -- and then attempting to be a poor man's cloud cron).
The backup is initiated from the admin_actions API extension here:

https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297

which calls the nova.compute.api.API.backup() method here:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031

which, after creating some image metadata in Glance for the snapshot, calls the compute RPC API here:

https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759

which sends an asynchronous RPC message to the compute node to execute the instance snapshot and "rotate backups":

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969

That method eventually calls the blocking snapshot() operation on the virt driver:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041

And it is the nova.virt.libvirt.Driver.snapshot() method that is quite "icky", with lots of logic to determine the type of snapshot to do and how to do it:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607

The gist of the driver's snapshot() method calls ImageBackend.snapshot(), which is responsible for doing the actual snapshot of the instance:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685

and then, once the snapshot is done, the method calls the Glance API to upload the snapshotted disk image to Glance:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734

All of which is I/O intensive and, AFAICT, mostly done in a blocking manner, with no queuing or traffic-control measures, so as you correctly point out, if the compute node daemon receives 5 backup requests, it will go ahead and do 5 snapshot operations and 5 uploads to Glance as fast as it can. It will do them in 5 different eventlet greenthreads, but there are no designs in place to prioritize the snapshot I/O below active VM I/O.
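The dispatch pattern described above (one worker per request, no cap) can be contrasted with a throttled version in a few lines. This is a hypothetical sketch using ordinary threads and a semaphore, not Nova code; `MAX_CONCURRENT` is an invented knob, not an existing Nova option:

```python
import threading
import time

MAX_CONCURRENT = 2                       # invented limit; Nova has no such option
gate = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0
peak = 0

def snapshot(instance_id):
    """Stand-in for the blocking snapshot + Glance upload."""
    global active, peak
    with gate:                           # remove this gate and all 5 requests
        with lock:                       # run at once -- today's behavior
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)                 # simulated I/O
        with lock:
            active -= 1

threads = [threading.Thread(target=snapshot, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent snapshots:", peak)   # never exceeds MAX_CONCURRENT
```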
> I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.

Agree with you completely.

> Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.

Is this code up somewhere we can take a look at?

> Fired off snapshots on five Cinder volumes (attached to five instances). Again the load shot very high. Huh. Well, in a full-scale OpenStack setup, maybe storage can handle that much I/O more gracefully ... or not. Again, should taking snapshots interfere with foreground activity? I would say, most often not. Queuing and serializing snapshots would strictly limit the interference with foreground. Also, very high end storage can perform snapshots *very* quickly, so serialized snapshots will not be slow. My take is that the default behavior should be to queue and serialize all heavy I/O operations, with non-default allowances for limited concurrency.
>
> Cleaned up (which required reboot/unstack/stack and more). Tried again.
>
> Ran two test backups (which in the current iteration create Cinder volume snapshots). Asked Cinder to delete the snapshots. Again, very high load factors, and in "top" I can see two long-running "dd" processes. (Given I have a single disk, more than one "dd" is not good.)
>
> Running too many heavyweight operations against storage can lead to thrashing. Queuing can strictly limit that load, and ensure better and more reliable performance.
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Avishay,

Thanks for the tip on [cinder.conf] volume_clear. The corresponding option in devstack is CINDER_SECURE_DELETE=False.

Also I *may* have been bitten by the related bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1023755
(All I know at this point is that the devstack VM became unresponsive - I have not yet identified the cause. But the symptoms fit.)

Not sure if there are spikes on Cinder snapshot creation. Perhaps not. (Too many different failures and oddities. Have not sorted them all, yet.)

I am of the opinion that CINDER_SECURE_DELETE=False should be the default for devstack, especially as the current default invokes bug-like behavior. Also, unbounded concurrent "dd" operations are not a good idea. (Which is generally what you meant, I believe.)

Onwards.

On Sun, Oct 19, 2014 at 8:33 AM, Avishay Traeger wrote:
> Hi Preston,
> Replies to some of your cinder-related questions:
> 1. Creating a snapshot isn't usually an I/O-intensive operation. Are you seeing an I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why.
> 2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this.
>
> IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disable wipe on delete, etc.
>
> Thanks,
> Avishay
>
> On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister wrote:
>> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not.
>>
>> Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup.
>>
>> Testing nova backup - first the existing implementation, then my (much changed) replacement.
>>
>> Simple scripts for testing. Create images. Create instances (five). Run backup on all instances.
>>
>> Currently found in:
>> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>>
>> First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...
>>
>> I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.
>>
>> Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.
>>
>> Fired off snapshots on five Cinder volumes (attached to five instances). Again the load shot very high. Huh. Well, in a full-scale OpenStack setup, maybe storage can handle that much I/O more gracefully ... or not. Again, should taking snapshots interfere with foreground activity? I would say, most often not. Queuing and serializing snapshots would strictly limit the interference with foreground. Also, very high end storage can perform snapshots *very* quickly, so serialized snapshots will not be slow. My take is that the default behavior should be to queue and serialize all heavy I/O operations, with non-default allowances for limited concurrency.
>>
>> Cleaned up (which required reboot/unstack/stack and more). Tried again.
>>
>> Ran two test backups (which in the current iteration create Cinder volume snapshots). Asked Cinder to delete the snapshots. Again, very high load factors, and in "top" I can see two long-running "dd" processes. (Given I have a single disk, more than one "dd" is not good.)
>>
>> Running too many heavyweight operations against storage can lead to thrashing. Queuing can strictly limit that load, and ensure better and more reliable performance. I am not seeing evidence of this thought in my OpenStack testing.
>>
>> So far it looks like there is no thought given to managing the impact of disk-intensive management operations. Am I missing something?
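The devstack switch mentioned above goes in the localrc section of local.conf. A sketch; the variable name is as given in the thread, and the file layout assumes the usual devstack local.conf conventions:

```ini
[[local|localrc]]
# Skip Cinder's dd wipe-on-delete. This avoids the long-running dd
# processes (and the hang resembling LP#1023755) on a single-disk dev box.
CINDER_SECURE_DELETE=False
```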
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Hi Preston,
Replies to some of your cinder-related questions:
1. Creating a snapshot isn't usually an I/O-intensive operation. Are you seeing an I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why.
2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this.

IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disable wipe on delete, etc.

Thanks,
Avishay

On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister wrote:
> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not.
>
> Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup.
>
> Testing nova backup - first the existing implementation, then my (much changed) replacement.
>
> Simple scripts for testing. Create images. Create instances (five). Run backup on all instances.
>
> Currently found in:
> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>
> First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...
>
> I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.
>
> Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.
>
> Fired off snapshots on five Cinder volumes (attached to five instances). Again the load shot very high. Huh. Well, in a full-scale OpenStack setup, maybe storage can handle that much I/O more gracefully ... or not. Again, should taking snapshots interfere with foreground activity? I would say, most often not. Queuing and serializing snapshots would strictly limit the interference with foreground. Also, very high end storage can perform snapshots *very* quickly, so serialized snapshots will not be slow. My take is that the default behavior should be to queue and serialize all heavy I/O operations, with non-default allowances for limited concurrency.
>
> Cleaned up (which required reboot/unstack/stack and more). Tried again.
>
> Ran two test backups (which in the current iteration create Cinder volume snapshots). Asked Cinder to delete the snapshots. Again, very high load factors, and in "top" I can see two long-running "dd" processes. (Given I have a single disk, more than one "dd" is not good.)
>
> Running too many heavyweight operations against storage can lead to thrashing. Queuing can strictly limit that load, and ensure better and more reliable performance. I am not seeing evidence of this thought in my OpenStack testing.
>
> So far it looks like there is no thought given to managing the impact of disk-intensive management operations. Am I missing something?
[openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not.

Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup.

Testing nova backup - first the existing implementation, then my (much changed) replacement.

Simple scripts for testing. Create images. Create instances (five). Run backup on all instances.

Currently found in:
https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...

I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.

Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.

Fired off snapshots on five Cinder volumes (attached to five instances). Again the load shot very high. Huh. Well, in a full-scale OpenStack setup, maybe storage can handle that much I/O more gracefully ... or not. Again, should taking snapshots interfere with foreground activity? I would say, most often not. Queuing and serializing snapshots would strictly limit the interference with foreground. Also, very high end storage can perform snapshots *very* quickly, so serialized snapshots will not be slow. My take is that the default behavior should be to queue and serialize all heavy I/O operations, with non-default allowances for limited concurrency.

Cleaned up (which required reboot/unstack/stack and more). Tried again.

Ran two test backups (which in the current iteration create Cinder volume snapshots). Asked Cinder to delete the snapshots.
Again, very high load factors, and in "top" I can see two long-running "dd" processes. (Given I have a single disk, more than one "dd" is not good.)

Running too many heavyweight operations against storage can lead to thrashing. Queuing can strictly limit that load, and ensure better and more reliable performance. I am not seeing evidence of this thought in my OpenStack testing.

So far it looks like there is no thought given to managing the impact of disk-intensive management operations. Am I missing something?
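The default argued for above (queue and serialize heavy I/O, with non-default allowances for limited concurrency) is a small amount of code. A hypothetical sketch with one worker draining a FIFO; the operation names are invented, and a real system would run the dd/snapshot in place of the list append:

```python
import queue
import threading

ops = queue.Queue()
completed = []

def worker():
    """One worker = strict serialization: at most one heavy I/O job at a time."""
    while True:
        op = ops.get()
        if op is None:                 # sentinel: shut down
            break
        completed.append(op)           # a real system would run the I/O here
        ops.task_done()

t = threading.Thread(target=worker)
t.start()

# Callers enqueue and return immediately; a burst of requests never
# runs more than one dd-style job against the disk at once.
for name in ["snapshot-vol1", "snapshot-vol2", "wipe-vol3"]:
    ops.put(name)

ops.put(None)
t.join()
print(completed)   # FIFO order: ['snapshot-vol1', 'snapshot-vol2', 'wipe-vol3']
```

For limited concurrency instead of strict serialization, start N workers reading from the same queue.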