Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Chris Friesen

On 10/23/2014 04:24 PM, Preston L. Bannister wrote:

On Thu, Oct 23, 2014 at 3:04 PM, John Griffith wrote:

The debate about whether to wipe LV's pretty much massively
depends on the intelligence of the underlying store. If the
lower level storage never returns accidental information ...
explicit zeroes are not needed.

On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister wrote:


Yes, that is pretty much the key.

Does LVM let you read physical blocks that have never been
written? Or zero out virgin segments on read? If not, then "dd"
of zeroes is a way of doing the right thing (if *very* expensive).

Yeah... so that's the crux of the issue on LVM (Thick).  It's quite
possible for a new LV to be allocated from the VG and a block from a
previous LV can be allocated.  So in essence if somebody were to sit
there in a cloud env and just create volumes and read the blocks
over and over and over they could gather some previous or other
tenants data (or pieces of it at any rate).  It's def the "right"
thing to do if you're in an env where you need some level of
security between tenants.  There are other ways to solve it of
course but this is what we've got.



Has anyone raised this issue with the LVM folk? Returning zeros on
unwritten blocks would require a bit of extra bookkeeping, but would be a
lot more efficient overall.


For Cinder volumes, I think that if you have new enough versions of 
everything you can specify "lvm_type = thin" and it will use thin 
provisioning.  Among other things this should improve snapshot 
performance and also avoid the need to explicitly wipe on delete (since 
the next user of the storage will be provided zeros for a read of any 
page it hasn't written).
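
In cinder.conf terms that is roughly the following (a sketch from memory,
so double-check the option names against your release):

    [DEFAULT]
    # LVM driver with thin provisioning: reads of unwritten extents
    # return zeros, so wiping on delete is not needed for isolation.
    lvm_type = thin
    volume_clear = none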


As far as I know this is not supported for ephemeral storage.

Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
On Thu, Oct 23, 2014 at 3:04 PM, John Griffith 
wrote:

The debate about whether to wipe LV's pretty much massively depends on the
>> intelligence of the underlying store. If the lower level storage never
>> returns accidental information ... explicit zeroes are not needed.
>>
>

> On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister <
> pres...@bannister.us> wrote:
>

>> Yes, that is pretty much the key.
>>
>> Does LVM let you read physical blocks that have never been written? Or
>> zero out virgin segments on read? If not, then "dd" of zeroes is a way of
>> doing the right thing (if *very* expensive).
>>
>
> Yeah... so that's the crux of the issue on LVM (Thick).  It's quite
> possible for a new LV to be allocated from the VG and a block from a
> previous LV can be allocated.  So in essence if somebody were to sit there
> in a cloud env and just create volumes and read the blocks over and over
> and over they could gather some previous or other tenants data (or pieces
> of it at any rate).  It's def the "right" thing to do if you're in an env
> where you need some level of security between tenants.  There are other
> ways to solve it of course but this is what we've got.
>


Has anyone raised this issue with the LVM folk? Returning zeros on
unwritten blocks would require a bit of extra bookkeeping, but would be a lot
more efficient overall.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister 
wrote:

>
> On Thu, Oct 23, 2014 at 7:51 AM, John Griffith 
> wrote:
>>
>> On Thu, Oct 23, 2014 at 8:50 AM, John Griffith 
>> wrote:
>>>
>>> On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister <
>>> pres...@bannister.us> wrote:
>>>
 John,

 As a (new) OpenStack developer, I just discovered the
 "CINDER_SECURE_DELETE" option.

>>>
>> OHHH... Most importantly, I almost forgot.  Welcome!!!
>>
>
> Thanks! (I think...)
>
:)

>
>
>
>
>> It doesn't suck as bad as you might have thought or some of the other
>>> respondents on this thread seem to think.  There's certainly room for
>>> improvement and growth but it hasn't been completely ignored on the Cinder
>>> side.
>>>
>>
> To be clear, I am fairly impressed with what has gone into OpenStack as a
> whole. Given the breadth, complexity, and growth ... not everything is
> going to be perfect (yet?).
>
> So ... not trying to disparage past work, but noting what does not seem
> right. (Also know I could easily be missing something.)
>
Sure, I didn't mean anything by that at all, and certainly didn't take it
that way.

>
>
>
>
>
>> The debate about whether to wipe LV's pretty much massively depends on
 the intelligence of the underlying store. If the lower level storage never
 returns accidental information ... explicit zeroes are not needed.

>>>
> Yes, that is pretty much the key.
>
> Does LVM let you read physical blocks that have never been written? Or
> zero out virgin segments on read? If not, then "dd" of zeroes is a way of
> doing the right thing (if *very* expensive).
>

Yeah... so that's the crux of the issue on LVM (Thick).  It's quite
possible for a new LV to be allocated from the VG and a block from a
previous LV can be allocated.  So in essence if somebody were to sit there
in a cloud env and just create volumes and read the blocks over and over
and over they could gather some previous or other tenants data (or pieces
of it at any rate).  It's def the "right" thing to do if you're in an env
where you need some level of security between tenants.  There are other
ways to solve it of course but this is what we've got.
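
If anyone wants to see this for themselves, here is a rough sketch on a
scratch VG (the VG/LV names are just examples, and whether you actually land
on reused extents depends on the allocator, so treat it as illustrative
rather than a guaranteed repro):

    # Create a thick LV, fill it with recognizable data, then delete it.
    lvcreate -L 1G -n old-tenant scratch-vg
    dd if=/dev/urandom of=/dev/scratch-vg/old-tenant bs=1M count=1024
    lvremove -f scratch-vg/old-tenant

    # A new LV may be handed the same physical extents...
    lvcreate -L 1G -n new-tenant scratch-vg

    # ...and on thick LVM a read can return the previous tenant's bytes
    # instead of zeros:
    dd if=/dev/scratch-vg/new-tenant bs=1M count=4 | hexdump -C | head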

>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
On Thu, Oct 23, 2014 at 7:51 AM, John Griffith 
wrote:
>
> On Thu, Oct 23, 2014 at 8:50 AM, John Griffith 
> wrote:
>>
>> On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister <
>> pres...@bannister.us> wrote:
>>
>>> John,
>>>
>>> As a (new) OpenStack developer, I just discovered the
>>> "CINDER_SECURE_DELETE" option.
>>>
>>
> OHHH... Most importantly, I almost forgot.  Welcome!!!
>

Thanks! (I think...)




> It doesn't suck as bad as you might have thought or some of the other
>> respondents on this thread seem to think.  There's certainly room for
>> improvement and growth but it hasn't been completely ignored on the Cinder
>> side.
>>
>
To be clear, I am fairly impressed with what has gone into OpenStack as a
whole. Given the breadth, complexity, and growth ... not everything is
going to be perfect (yet?).

So ... not trying to disparage past work, but noting what does not seem
right. (Also know I could easily be missing something.)





> The debate about whether to wipe LV's pretty much massively depends on the
>>> intelligence of the underlying store. If the lower level storage never
>>> returns accidental information ... explicit zeroes are not needed.
>>>
>>
Yes, that is pretty much the key.

Does LVM let you read physical blocks that have never been written? Or zero
out virgin segments on read? If not, then "dd" of zeroes is a way of doing
the right thing (if *very* expensive).
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 8:50 AM, John Griffith 
wrote:

>
>
> On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister <
> pres...@bannister.us> wrote:
>
>> John,
>>
>> As a (new) OpenStack developer, I just discovered the
>> "CINDER_SECURE_DELETE" option.
>>
>
OHHH... Most importantly, I almost forgot.  Welcome!!!

>
>> As an *implicit* default, I entirely approve.  Production OpenStack
>> installations should *absolutely* insure there is no information leakage
>> from one instance to the next.
>>
>> As an *explicit* default, I am not so sure. Low-end storage requires you
>> do this explicitly. High-end storage can insure information never leaks.
>> Counting on high level storage can make the upper levels more efficient,
>> which can be a good thing.
>>
>
> Not entirely sure of the distinction intended as far as
> implicit/explicit... but one other thing I should probably point out; this
> ONLY applies to the LVM driver, maybe that's what you're getting at.  Would
> be better probably to advertise as an LVM Driver option (easy enough to do
> in the config options help message).
>
> Anyway, I just wanted to point to some of the options like using io-nice,
> clear-size, blkio cgroups, bps_limit..
>
> It doesn't suck as bad as you might have thought or some of the other
> respondents on this thread seem to think.  There's certainly room for
> improvement and growth but it hasn't been completely ignored on the Cinder
> side.
>
>
>>
>> The debate about whether to wipe LV's pretty much massively depends on
>> the intelligence of the underlying store. If the lower level storage never
>> returns accidental information ... explicit zeroes are not needed.
>>
>>
>>
>> On Wed, Oct 22, 2014 at 11:15 PM, John Griffith wrote:
>>
>>>
>>>
>>> On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas 
>>> wrote:
>>>
 For LVM-thin I believe it is already disabled? It is only really
 needed on LVM-thick, where the returning zeros behaviour is not done.

 On 21 October 2014 08:29, Avishay Traeger 
 wrote:
 > I would say that wipe-on-delete is not necessary in most deployments.
 >
 > Most storage backends exhibit the following behavior:
 > 1. Delete volume A that has data on physical sectors 1-10
 > 2. Create new volume B
 > 3. Read from volume B before writing, which happens to map to physical
 > sector 5 - backend should return zeroes here, and not data from
 volume A
 >
 > In case the backend doesn't provide this rather standard behavior,
 data must
 > be wiped immediately.  Otherwise, the only risk is physical security,
 and if
 > that's not adequate, customers shouldn't be storing all their data
 there
 > regardless.  You could also run a periodic job to wipe deleted
 volumes to
 > reduce the window of vulnerability, without making delete_volume take
 a
 > ridiculously long time.
 >
 > Encryption is a good option as well, and of course it protects the
 data
 > before deletion as well (as long as your keys are protected...)
 >
 > Bottom line - I too think the default in devstack should be to
 disable this
 > option, and think we should consider making the default False in
 Cinder
 > itself.  This isn't the first time someone has asked why volume
 deletion
 > takes 20 minutes...
 >
 > As for queuing backup operations and managing bandwidth for various
 > operations, ideally this would be done with a holistic view, so that
 for
 > example Cinder operations won't interfere with Nova, or different Nova
 > operations won't interfere with each other, but that is probably far
 down
 > the road.
 >
 > Thanks,
 > Avishay
 >
 >
 > On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen <
 chris.frie...@windriver.com>
 > wrote:
 >>
 >> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
 >>>
 >>> Hi Preston,
 >>> Replies to some of your cinder-related questions:
 >>> 1. Creating a snapshot isn't usually an I/O intensive operation.
 Are
 >>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen
 the
 >>> CPU usage of cinder-api spike sometimes - not sure why.
 >>> 2. The 'dd' processes that you see are Cinder wiping the volumes
 during
 >>> deletion.  You can either disable this in cinder.conf, or you can
 use a
 >>> relatively new option to manage the bandwidth used for this.
 >>>
 >>> IMHO, deployments should be optimized to not do very long/intensive
 >>> management operations - for example, use backends with efficient
 >>> snapshots, use CoW operations wherever possible rather than copying
 full
 >>> volumes/images, disabling wipe on delete, etc.
 >>
 >>
 >> In a public-cloud environment I don't think it's reasonable to
 disable
 >> wipe-on-delete.
 >>
 >> Arguably it would be better to use encryption instead of
 >> wipe-on-delete.

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread John Griffith
On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister 
wrote:

> John,
>
> As a (new) OpenStack developer, I just discovered the
> "CINDER_SECURE_DELETE" option.
>
> As an *implicit* default, I entirely approve.  Production OpenStack
> installations should *absolutely* insure there is no information leakage
> from one instance to the next.
>
> As an *explicit* default, I am not so sure. Low-end storage requires you
> do this explicitly. High-end storage can insure information never leaks.
> Counting on high level storage can make the upper levels more efficient,
> which can be a good thing.
>

Not entirely sure of the distinction intended as far as
implicit/explicit... but one other thing I should probably point out; this
ONLY applies to the LVM driver, maybe that's what you're getting at.  Would
be better probably to advertise as an LVM Driver option (easy enough to do
in the config options help message).

Anyway, I just wanted to point to some of the options like using io-nice,
clear-size, blkio cgroups, bps_limit..

It doesn't suck as bad as you might have thought or some of the other
respondents on this thread seem to think.  There's certainly room for
improvement and growth but it hasn't been completely ignored on the Cinder
side.


>
> The debate about whether to wipe LV's pretty much massively depends on the
> intelligence of the underlying store. If the lower level storage never
> returns accidental information ... explicit zeroes are not needed.
>
>
>
> On Wed, Oct 22, 2014 at 11:15 PM, John Griffith 
> wrote:
>
>>
>>
>> On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas 
>> wrote:
>>
>>> For LVM-thin I believe it is already disabled? It is only really
>>> needed on LVM-thick, where the returning zeros behaviour is not done.
>>>
>>> On 21 October 2014 08:29, Avishay Traeger 
>>> wrote:
>>> > I would say that wipe-on-delete is not necessary in most deployments.
>>> >
>>> > Most storage backends exhibit the following behavior:
>>> > 1. Delete volume A that has data on physical sectors 1-10
>>> > 2. Create new volume B
>>> > 3. Read from volume B before writing, which happens to map to physical
>>> > sector 5 - backend should return zeroes here, and not data from volume
>>> A
>>> >
>>> > In case the backend doesn't provide this rather standard behavior,
>>> data must
>>> > be wiped immediately.  Otherwise, the only risk is physical security,
>>> and if
>>> > that's not adequate, customers shouldn't be storing all their data
>>> there
>>> > regardless.  You could also run a periodic job to wipe deleted volumes
>>> to
>>> > reduce the window of vulnerability, without making delete_volume take a
>>> > ridiculously long time.
>>> >
>>> > Encryption is a good option as well, and of course it protects the data
>>> > before deletion as well (as long as your keys are protected...)
>>> >
>>> > Bottom line - I too think the default in devstack should be to disable
>>> this
>>> > option, and think we should consider making the default False in Cinder
>>> > itself.  This isn't the first time someone has asked why volume
>>> deletion
>>> > takes 20 minutes...
>>> >
>>> > As for queuing backup operations and managing bandwidth for various
>>> > operations, ideally this would be done with a holistic view, so that
>>> for
>>> > example Cinder operations won't interfere with Nova, or different Nova
>>> > operations won't interfere with each other, but that is probably far
>>> down
>>> > the road.
>>> >
>>> > Thanks,
>>> > Avishay
>>> >
>>> >
>>> > On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen <
>>> chris.frie...@windriver.com>
>>> > wrote:
>>> >>
>>> >> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
>>> >>>
>>> >>> Hi Preston,
>>> >>> Replies to some of your cinder-related questions:
>>> >>> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
>>> >>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen
>>> the
>>> >>> CPU usage of cinder-api spike sometimes - not sure why.
>>> >>> 2. The 'dd' processes that you see are Cinder wiping the volumes
>>> during
>>> >>> deletion.  You can either disable this in cinder.conf, or you can
>>> use a
>>> >>> relatively new option to manage the bandwidth used for this.
>>> >>>
>>> >>> IMHO, deployments should be optimized to not do very long/intensive
>>> >>> management operations - for example, use backends with efficient
>>> >>> snapshots, use CoW operations wherever possible rather than copying
>>> full
>>> >>> volumes/images, disabling wipe on delete, etc.
>>> >>
>>> >>
>>> >> In a public-cloud environment I don't think it's reasonable to disable
>>> >> wipe-on-delete.
>>> >>
>>> >> Arguably it would be better to use encryption instead of
>>> wipe-on-delete.
>>> >> When done with the backing store, just throw away the key and it'll be
>>> >> secure enough for most purposes.
>>> >>
>>> >> Chris
>>> >>
>>> >>
>>> >>
>>> >> ___
>>> >> OpenStack-dev mailing list
>>> >> OpenStack-dev@lists.openstack.org
>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Duncan Thomas
On 23 October 2014 08:30, Preston L. Bannister  wrote:
> John,
>
> As a (new) OpenStack developer, I just discovered the "CINDER_SECURE_DELETE"
> option.
>
> As an *implicit* default, I entirely approve.  Production OpenStack
> installations should *absolutely* insure there is no information leakage
> from one instance to the next.
>
> As an *explicit* default, I am not so sure. Low-end storage requires you do
> this explicitly. High-end storage can insure information never leaks.
> Counting on high level storage can make the upper levels more efficient, which
> can be a good thing.
>
> The debate about whether to wipe LV's pretty much massively depends on the
> intelligence of the underlying store. If the lower level storage never
> returns accidental information ... explicit zeroes are not needed.

The security requirements regarding wiping are totally and utterly
site dependent - some places care and are happy to pay the cost (some
even using an entirely pointless multi-write scrub out of historically
rooted paranoia) whereas some don't care in the slightest. LVM thin
that John mentioned is no worse or better than most 'smart' arrays -
unless you happen to hit a bug, it won't return previous info.

That's a good default; if your site needs better, then there are lots
of config options to go looking into for a whole variety of things,
and you should probably be doing your own security audits of the code
base and other deep analysis, as well as reading and contributing to
the security guide.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-23 Thread Preston L. Bannister
John,

As a (new) OpenStack developer, I just discovered the
"CINDER_SECURE_DELETE" option.

As an *implicit* default, I entirely approve.  Production OpenStack
installations should *absolutely* insure there is no information leakage
from one instance to the next.

As an *explicit* default, I am not so sure. Low-end storage requires you do
this explicitly. High-end storage can insure information never leaks.
Counting on high level storage can make the upper levels more efficient,
which can be a good thing.

The debate about whether to wipe LV's pretty much massively depends on the
intelligence of the underlying store. If the lower level storage never
returns accidental information ... explicit zeroes are not needed.



On Wed, Oct 22, 2014 at 11:15 PM, John Griffith 
wrote:

>
>
> On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas 
> wrote:
>
>> For LVM-thin I believe it is already disabled? It is only really
>> needed on LVM-thick, where the returning zeros behaviour is not done.
>>
>> On 21 October 2014 08:29, Avishay Traeger 
>> wrote:
>> > I would say that wipe-on-delete is not necessary in most deployments.
>> >
>> > Most storage backends exhibit the following behavior:
>> > 1. Delete volume A that has data on physical sectors 1-10
>> > 2. Create new volume B
>> > 3. Read from volume B before writing, which happens to map to physical
>> > sector 5 - backend should return zeroes here, and not data from volume A
>> >
>> > In case the backend doesn't provide this rather standard behavior, data
>> must
>> > be wiped immediately.  Otherwise, the only risk is physical security,
>> and if
>> > that's not adequate, customers shouldn't be storing all their data there
>> > regardless.  You could also run a periodic job to wipe deleted volumes
>> to
>> > reduce the window of vulnerability, without making delete_volume take a
>> > ridiculously long time.
>> >
>> > Encryption is a good option as well, and of course it protects the data
>> > before deletion as well (as long as your keys are protected...)
>> >
>> > Bottom line - I too think the default in devstack should be to disable
>> this
>> > option, and think we should consider making the default False in Cinder
>> > itself.  This isn't the first time someone has asked why volume deletion
>> > takes 20 minutes...
>> >
>> > As for queuing backup operations and managing bandwidth for various
>> > operations, ideally this would be done with a holistic view, so that for
>> > example Cinder operations won't interfere with Nova, or different Nova
>> > operations won't interfere with each other, but that is probably far
>> down
>> > the road.
>> >
>> > Thanks,
>> > Avishay
>> >
>> >
>> > On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen <
>> chris.frie...@windriver.com>
>> > wrote:
>> >>
>> >> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
>> >>>
>> >>> Hi Preston,
>> >>> Replies to some of your cinder-related questions:
>> >>> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
>> >>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
>> >>> CPU usage of cinder-api spike sometimes - not sure why.
>> >>> 2. The 'dd' processes that you see are Cinder wiping the volumes
>> during
>> >>> deletion.  You can either disable this in cinder.conf, or you can use
>> a
>> >>> relatively new option to manage the bandwidth used for this.
>> >>>
>> >>> IMHO, deployments should be optimized to not do very long/intensive
>> >>> management operations - for example, use backends with efficient
>> >>> snapshots, use CoW operations wherever possible rather than copying
>> full
>> >>> volumes/images, disabling wipe on delete, etc.
>> >>
>> >>
>> >> In a public-cloud environment I don't think it's reasonable to disable
>> >> wipe-on-delete.
>> >>
>> >> Arguably it would be better to use encryption instead of
>> wipe-on-delete.
>> >> When done with the backing store, just throw away the key and it'll be
>> >> secure enough for most purposes.
>> >>
>> >> Chris
>> >>
>> >>
>> >>
>> >> ___
>> >> OpenStack-dev mailing list
>> >> OpenStack-dev@lists.openstack.org
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>> >
>> >
>> > ___
>> > OpenStack-dev mailing list
>> > OpenStack-dev@lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>>
>>
>>
>> --
>> Duncan Thomas
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> We disable this in the Gates "CINDER_SECURE_DELETE=False"
>
> ThinLVM (which hopefully will be default upon release of Kilo) doesn't
> need it because internally it returns zeros when reading unallocated blocks
> so it's a non-issue.
>
> The debate over whether or not to wipe LV's is a long-running issue.  The default
> behavior in Cinder is to leave it enabled, and IMHO that's how it should stay.

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-22 Thread John Griffith
On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas 
wrote:

> For LVM-thin I believe it is already disabled? It is only really
> needed on LVM-thick, where the returning zeros behaviour is not done.
>
> On 21 October 2014 08:29, Avishay Traeger  wrote:
> > I would say that wipe-on-delete is not necessary in most deployments.
> >
> > Most storage backends exhibit the following behavior:
> > 1. Delete volume A that has data on physical sectors 1-10
> > 2. Create new volume B
> > 3. Read from volume B before writing, which happens to map to physical
> > sector 5 - backend should return zeroes here, and not data from volume A
> >
> > In case the backend doesn't provide this rather standard behavior, data
> must
> > be wiped immediately.  Otherwise, the only risk is physical security,
> and if
> > that's not adequate, customers shouldn't be storing all their data there
> > regardless.  You could also run a periodic job to wipe deleted volumes to
> > reduce the window of vulnerability, without making delete_volume take a
> > ridiculously long time.
> >
> > Encryption is a good option as well, and of course it protects the data
> > before deletion as well (as long as your keys are protected...)
> >
> > Bottom line - I too think the default in devstack should be to disable
> this
> > option, and think we should consider making the default False in Cinder
> > itself.  This isn't the first time someone has asked why volume deletion
> > takes 20 minutes...
> >
> > As for queuing backup operations and managing bandwidth for various
> > operations, ideally this would be done with a holistic view, so that for
> > example Cinder operations won't interfere with Nova, or different Nova
> > operations won't interfere with each other, but that is probably far down
> > the road.
> >
> > Thanks,
> > Avishay
> >
> >
> > On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen <
> chris.frie...@windriver.com>
> > wrote:
> >>
> >> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
> >>>
> >>> Hi Preston,
> >>> Replies to some of your cinder-related questions:
> >>> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
> >>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
> >>> CPU usage of cinder-api spike sometimes - not sure why.
> >>> 2. The 'dd' processes that you see are Cinder wiping the volumes during
> >>> deletion.  You can either disable this in cinder.conf, or you can use a
> >>> relatively new option to manage the bandwidth used for this.
> >>>
> >>> IMHO, deployments should be optimized to not do very long/intensive
> >>> management operations - for example, use backends with efficient
> >>> snapshots, use CoW operations wherever possible rather than copying
> full
> >>> volumes/images, disabling wipe on delete, etc.
> >>
> >>
> >> In a public-cloud environment I don't think it's reasonable to disable
> >> wipe-on-delete.
> >>
> >> Arguably it would be better to use encryption instead of wipe-on-delete.
> >> When done with the backing store, just throw away the key and it'll be
> >> secure enough for most purposes.
> >>
> >> Chris
> >>
> >>
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
>
> --
> Duncan Thomas
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

We disable this in the Gates "CINDER_SECURE_DELETE=False"

ThinLVM (which hopefully will be default upon release of Kilo) doesn't need
it because internally it returns zeros when reading unallocated blocks so
it's a non-issue.

The debate over whether or not to wipe LV's is a long-running issue.  The default
behavior in Cinder is to leave it enabled, and IMHO that's how it should
stay.  The fact is anything that might be construed as "less secure" and
has been defaulted to the "more secure" setting should be left as it is.
It's simple to turn this off.

Also, nobody seemed to mention that in the case of Cinder operations like
copy-volume and the delete process you also have the ability to set
bandwidth limits on these operations, and in the case of delete even
specify different schemes (not just enabled/disabled but other options that
may be less or more IO intensive).

For further reference checkout the config options [1]

Thanks,
John

[1]:
https://github.com/openstack/cinder/blob/master/cinder/volume/driver.py#L69
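
For the archives, those knobs look roughly like this in cinder.conf (names as
of the Juno-era code; the driver.py link above is the authoritative list):

    [DEFAULT]
    # What to do with data on delete: none, zero or shred
    volume_clear = zero
    # Only wipe the first N MiB of the volume (0 = wipe everything)
    volume_clear_size = 0
    # ionice the wipe, e.g. idle priority so it yields to real I/O
    volume_clear_ionice = -c3
    # Throttle copy/clear bandwidth via blkio cgroups (bytes/sec, 0 = off)
    volume_copy_bps_limit = 0
    volume_copy_blkio_cgroup_name = cinder-volume-copy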
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-21 Thread Duncan Thomas
For LVM-thin I believe it is already disabled? It is only really
needed on LVM-thick, where the returning zeros behaviour is not done.

On 21 October 2014 08:29, Avishay Traeger  wrote:
> I would say that wipe-on-delete is not necessary in most deployments.
>
> Most storage backends exhibit the following behavior:
> 1. Delete volume A that has data on physical sectors 1-10
> 2. Create new volume B
> 3. Read from volume B before writing, which happens to map to physical
> sector 5 - backend should return zeroes here, and not data from volume A
>
> In case the backend doesn't provide this rather standard behavior, data must
> be wiped immediately.  Otherwise, the only risk is physical security, and if
> that's not adequate, customers shouldn't be storing all their data there
> regardless.  You could also run a periodic job to wipe deleted volumes to
> reduce the window of vulnerability, without making delete_volume take a
> ridiculously long time.
>
> Encryption is a good option as well, and of course it protects the data
> before deletion as well (as long as your keys are protected...)
>
> Bottom line - I too think the default in devstack should be to disable this
> option, and think we should consider making the default False in Cinder
> itself.  This isn't the first time someone has asked why volume deletion
> takes 20 minutes...
>
> As for queuing backup operations and managing bandwidth for various
> operations, ideally this would be done with a holistic view, so that for
> example Cinder operations won't interfere with Nova, or different Nova
> operations won't interfere with each other, but that is probably far down
> the road.
>
> Thanks,
> Avishay
>
>
> On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
> wrote:
>>
>> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
>>>
>>> Hi Preston,
>>> Replies to some of your cinder-related questions:
>>> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
>>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
>>> CPU usage of cinder-api spike sometimes - not sure why.
>>> 2. The 'dd' processes that you see are Cinder wiping the volumes during
>>> deletion.  You can either disable this in cinder.conf, or you can use a
>>> relatively new option to manage the bandwidth used for this.
>>>
>>> IMHO, deployments should be optimized to not do very long/intensive
>>> management operations - for example, use backends with efficient
>>> snapshots, use CoW operations wherever possible rather than copying full
>>> volumes/images, disabling wipe on delete, etc.
>>
>>
>> In a public-cloud environment I don't think it's reasonable to disable
>> wipe-on-delete.
>>
>> Arguably it would be better to use encryption instead of wipe-on-delete.
>> When done with the backing store, just throw away the key and it'll be
>> secure enough for most purposes.
>>
>> Chris
>>
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Duncan Thomas

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-21 Thread Avishay Traeger
I would say that wipe-on-delete is not necessary in most deployments.

Most storage backends exhibit the following behavior:
1. Delete volume A that has data on physical sectors 1-10
2. Create new volume B
3. Read from volume B before writing, which happens to map to physical
sector 5 - backend should return zeroes here, and not data from volume A

In case the backend doesn't provide this rather standard behavior, data
must be wiped immediately.  Otherwise, the only risk is physical security,
and if that's not adequate, customers shouldn't be storing all their data
there regardless.  You could also run a periodic job to wipe deleted
volumes to reduce the window of vulnerability, without making delete_volume
take a ridiculously long time.

Encryption is a good option as well, and of course it protects the data
before deletion as well (as long as your keys are protected...)

Bottom line - I too think the default in devstack should be to disable this
option, and think we should consider making the default False in Cinder
itself.  This isn't the first time someone has asked why volume deletion
takes 20 minutes...

As for queuing backup operations and managing bandwidth for various
operations, ideally this would be done with a holistic view, so that for
example Cinder operations won't interfere with Nova, or different Nova
operations won't interfere with each other, but that is probably far down
the road.

Thanks,
Avishay


On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen 
wrote:

> On 10/19/2014 09:33 AM, Avishay Traeger wrote:
>
>> Hi Preston,
>> Replies to some of your cinder-related questions:
>> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are
>> you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
>> CPU usage of cinder-api spike sometimes - not sure why.
>> 2. The 'dd' processes that you see are Cinder wiping the volumes during
>> deletion.  You can either disable this in cinder.conf, or you can use a
>> relatively new option to manage the bandwidth used for this.
>>
>> IMHO, deployments should be optimized to not do very long/intensive
>> management operations - for example, use backends with efficient
>> snapshots, use CoW operations wherever possible rather than copying full
>> volumes/images, disabling wipe on delete, etc.
>>
>
> In a public-cloud environment I don't think it's reasonable to disable
> wipe-on-delete.
>
> Arguably it would be better to use encryption instead of wipe-on-delete.
> When done with the backing store, just throw away the key and it'll be
> secure enough for most purposes.
>
> Chris
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-20 Thread Chris Friesen

On 10/19/2014 09:33 AM, Avishay Traeger wrote:

Hi Preston,
Replies to some of your cinder-related questions:
1. Creating a snapshot isn't usually an I/O intensive operation.  Are
you seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the
CPU usage of cinder-api spike sometimes - not sure why.
2. The 'dd' processes that you see are Cinder wiping the volumes during
deletion.  You can either disable this in cinder.conf, or you can use a
relatively new option to manage the bandwidth used for this.

IMHO, deployments should be optimized to not do very long/intensive
management operations - for example, use backends with efficient
snapshots, use CoW operations wherever possible rather than copying full
volumes/images, disabling wipe on delete, etc.


In a public-cloud environment I don't think it's reasonable to disable 
wipe-on-delete.


Arguably it would be better to use encryption instead of wipe-on-delete. 
 When done with the backing store, just throw away the key and it'll be 
secure enough for most purposes.


Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Preston L. Bannister
Jay,

Thanks very much for the insight and links. In fact, I have visited
*almost* all the places mentioned, prior. Added clarity is good. :)

Also, to your earlier comment (to an earlier thread) about backup not
really belonging in Nova - in main I agree. The "backup" API belongs in
Nova (as this maps cleanly to the equivalent in AWS), but the bulk of the
implementation can and should be distinct (in my opinion).

My current work is at:
https://github.com/dreadedhill-work/stack-backup

I also have matching changes to Nova and the Nova client under the same
Github account.

Please note this is very much a work in progress (as you might guess from
my prior comments). This needs a longer proper write up, and a cleaner Git
history. The code is a pretty fair ways along, but should be considered
more a rough draft, rather than a final version.

For the next few weeks, I am enormously crunched for time, as I have
promised a PoC at a site with a very large OpenStack deployment.

Noted your suggestion about the Rally team. Might be a bit before I can
pursue. :)

Again, Thanks.





On Sun, Oct 19, 2014 at 10:13 AM, Jay Pipes  wrote:

> Hi Preston, some great questions in here. Some comments inline, but tl;dr
> my answer is "yes, we need to be doing a much better job thinking about how
> I/O intensive operations affect other things running on providers of
> compute and block storage resources"
>
> On 10/19/2014 06:41 AM, Preston L. Bannister wrote:
>
>> OK, I am fairly new here (to OpenStack). Maybe I am missing something.
>> Or not.
>>
>> Have a DevStack, running in a VM (VirtualBox), backed by a single flash
>> drive (on my current generation MacBook). Could be I have something off
>> in my setup.
>>
>> Testing nova backup - first the existing implementation, then my (much
>> changed) replacement.
>>
>> Simple scripts for testing. Create images. Create instances (five). Run
>> backup on all instances.
>>
>> Currently found in:
>> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>>
>> First time I started backups of all (five) instances, load on the
>> Devstack VM went insane, and all but one backup failed. Seems that all
>> of the backups were performed immediately (or attempted), without any
>> sort of queuing or load management. Huh. Well, maybe just the backup
>> implementation is naive...
>>
>
> Yes, you are exactly correct. There is no queuing behaviour for any of the
> "backup" operations (I put "backup" operations in quotes because IMO it is
> silly to refer to them as backup operations, since all they are doing
> really is a snapshot action against the instance/volume -- and then
> attempting to be a poor man's cloud cron).
>
> The backup is initiated from the admin_actions API extension here:
>
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297
>
> which calls the nova.compute.api.API.backup() method here:
>
> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031
>
> which, after creating some image metadata in Glance for the snapshot,
> calls the compute RPC API here:
>
> https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759
>
> Which sends an RPC asynchronous message to the compute node to execute the
> instance snapshot and "rotate backups":
>
> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969
>
> That method eventually calls the blocking snapshot() operation on the virt
> driver:
>
> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041
>
> And it is the nova.virt.libvirt.Driver.snapshot() method that is quite
> "icky", with lots of logic to determine the type of snapshot to do and how
> to do it:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607
>
> The gist of the driver's snapshot() method calls ImageBackend.snapshot(),
> which is responsible for doing the actual snapshot of the instance:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685
>
> and then once the snapshot is done, the method calls to the Glance API to
> upload the snapshotted disk image to Glance:
>
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734
>
> All of which is I/O intensive and AFAICT, mostly done in a blocking
> manner, with no queuing or traffic control measures, so as you correctly
> point out, if the compute node daemon receives 5 backup requests, it will
> go ahead and do 5 snapshot operations and 5 uploads to Glance all as fast
> as it can. It will do it in 5 different eventlet greenthreads, but there
> are no designs in place to prioritize the snapshotting I/O lower than
> active VM I/O.
>
>  I will write on this at greater length, but backup should interfere as
>> little as possible with foreground processing. Overloading a host is
>> entirely unacceptable.
>>
>
> Agree with you completely.
>
>  Replaced the backup implementation so it does proper queuing (among other
>  things). Iterating forward - implementing and testing.

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Jay Pipes
Hi Preston, some great questions in here. Some comments inline, but 
tl;dr my answer is "yes, we need to be doing a much better job thinking 
about how I/O intensive operations affect other things running on 
providers of compute and block storage resources"


On 10/19/2014 06:41 AM, Preston L. Bannister wrote:

OK, I am fairly new here (to OpenStack). Maybe I am missing something.
Or not.

Have a DevStack, running in a VM (VirtualBox), backed by a single flash
drive (on my current generation MacBook). Could be I have something off
in my setup.

Testing nova backup - first the existing implementation, then my (much
changed) replacement.

Simple scripts for testing. Create images. Create instances (five). Run
backup on all instances.

Currently found in:
https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

First time I started backups of all (five) instances, load on the
Devstack VM went insane, and all but one backup failed. Seems that all
of the backups were performed immediately (or attempted), without any
sort of queuing or load management. Huh. Well, maybe just the backup
implementation is naive...


Yes, you are exactly correct. There is no queuing behaviour for any of 
the "backup" operations (I put "backup" operations in quotes because IMO 
it is silly to refer to them as backup operations, since all they are 
doing really is a snapshot action against the instance/volume -- and 
then attempting to be a poor man's cloud cron).


The backup is initiated from the admin_actions API extension here:

https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297

which calls the nova.compute.api.API.backup() method here:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031

which, after creating some image metadata in Glance for the snapshot, 
calls the compute RPC API here:


https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759

Which sends an RPC asynchronous message to the compute node to execute 
the instance snapshot and "rotate backups":


https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969

That method eventually calls the blocking snapshot() operation on the 
virt driver:


https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041

And it is the nova.virt.libvirt.Driver.snapshot() method that is quite 
"icky", with lots of logic to determine the type of snapshot to do and 
how to do it:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607

The gist of the driver's snapshot() method calls 
ImageBackend.snapshot(), which is responsible for doing the actual 
snapshot of the instance:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685

and then once the snapshot is done, the method calls to the Glance API 
to upload the snapshotted disk image to Glance:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734

All of which is I/O intensive and AFAICT, mostly done in a blocking 
manner, with no queuing or traffic control measures, so as you correctly 
point out, if the compute node daemon receives 5 backup requests, it 
will go ahead and do 5 snapshot operations and 5 uploads to Glance all 
as fast as it can. It will do it in 5 different eventlet greenthreads, 
but there are no designs in place to prioritize the snapshotting I/O 
lower than active VM I/O.
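
Just to make the missing piece concrete, here is the sort of thing I mean --
a toy sketch, not actual Nova code, assuming the Juno-era driver.snapshot()
signature -- that would at least cap concurrent snapshot work per host:

    from eventlet import semaphore

    # Hypothetical per-host cap; a real patch would make this configurable.
    MAX_CONCURRENT_SNAPSHOTS = 1
    _snapshot_sem = semaphore.Semaphore(MAX_CONCURRENT_SNAPSHOTS)

    def throttled_snapshot(driver, context, instance, image_id,
                           update_task_state):
        """Run the driver snapshot, but never more than N at once per host.

        Extra greenthreads block here instead of all hammering the disk
        with qemu-img/dd work and Glance uploads at the same time.
        """
        with _snapshot_sem:
            return driver.snapshot(context, instance, image_id,
                                   update_task_state)

It doesn't prioritize snapshot I/O below guest I/O, but it would stop five
backup requests from turning into five simultaneous disk-heavy jobs.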



I will write on this at greater length, but backup should interfere as
little as possible with foreground processing. Overloading a host is
entirely unacceptable.


Agree with you completely.


Replaced the backup implementation so it does proper queuing (among
other things). Iterating forward - implementing and testing.


Is this code up somewhere we can take a look at?


Fired off snapshots on five Cinder volumes (attached to five instances).
Again the load shot very high. Huh. Well, in a full-scale OpenStack
setup, maybe storage can handle that much I/O more gracefully ... or
not. Again, should taking snapshots interfere with foreground activity?
I would say, most often not. Queuing and serializing snapshots would
strictly limit the interference with foreground. Also, very high end
storage can perform snapshots *very* quickly, so serialized snapshots
will not be slow. My take is that the default behavior should be to
queue and serialize all heavy I/O operations, with non-default
allowances for limited concurrency.

Cleaned up (which required reboot/unstack/stack and more). Tried again.

Ran two test backups (which in the current iteration create Cinder
volume snapshots). Asked Cinder to delete the snapshots. Again, very
high load factors, and in "top" I can see two long-running "dd"
processes. (Given I have a single disk, more than one "dd" is not good.)

Running too many heavyweight operations against storage can lead to
thrashing. Queuing can strictly limit that load, and insure better and
reliable performance

Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Preston L. Bannister
Avishay,

Thanks for the tip on [cinder.conf] volume_clear. The corresponding option
in devstack is CINDER_SECURE_DELETE=False.
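
(For anyone following along, that is a one-line change in devstack -- in
localrc, or the [[local|localrc]] section of local.conf on newer devstacks:)

    # Skip the dd wipe of LVs on delete in this test environment
    CINDER_SECURE_DELETE=False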

Also I *may* have been bitten by the related bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1023755

(All I know at this point is the devstack VM became unresponsive - have not
yet identified the cause. But the symptoms fit.)

Not sure if there are spikes on Cinder snapshot creation. Perhaps not. (Too
many different failures and oddities. Have not sorted all, yet.)

I am of the opinion that CINDER_SECURE_DELETE=False should be the default for
devstack, especially as the current default invokes bug-like behavior.

Also, unbounded concurrent "dd" operations are not a good idea. (Which is
generally what you meant, I believe.)

Onwards



On Sun, Oct 19, 2014 at 8:33 AM, Avishay Traeger 
wrote:

> Hi Preston,
> Replies to some of your cinder-related questions:
> 1. Creating a snapshot isn't usually an I/O intensive operation.  Are you
> seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the CPU
> usage of cinder-api spike sometimes - not sure why.
> 2. The 'dd' processes that you see are Cinder wiping the volumes during
> deletion.  You can either disable this in cinder.conf, or you can use a
> relatively new option to manage the bandwidth used for this.
>
> IMHO, deployments should be optimized to not do very long/intensive
> management operations - for example, use backends with efficient snapshots,
> use CoW operations wherever possible rather than copying full
> volumes/images, disabling wipe on delete, etc.
>
> Thanks,
> Avishay
>
> On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister <
> pres...@bannister.us> wrote:
>
>> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or
>> not.
>>
>> Have a DevStack, running in a VM (VirtualBox), backed by a single flash
>> drive (on my current generation MacBook). Could be I have something off in
>> my setup.
>>
>> Testing nova backup - first the existing implementation, then my (much
>> changed) replacement.
>>
>> Simple scripts for testing. Create images. Create instances (five). Run
>> backup on all instances.
>>
>> Currently found in:
>>
>> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>>
>> First time I started backups of all (five) instances, load on the
>> Devstack VM went insane, and all but one backup failed. Seems that all of
>> the backups were performed immediately (or attempted), without any sort of
>> queuing or load management. Huh. Well, maybe just the backup implementation
>> is naive...
>>
>> I will write on this at greater length, but backup should interfere as
>> little as possible with foreground processing. Overloading a host is
>> entirely unacceptable.
>>
>> Replaced the backup implementation so it does proper queuing (among other
>> things). Iterating forward - implementing and testing.
>>
>> Fired off snapshots on five Cinder volumes (attached to five instances).
>> Again the load shot very high. Huh. Well, in a full-scale OpenStack setup,
>> maybe storage can handle that much I/O more gracefully ... or not. Again,
>> should taking snapshots interfere with foreground activity? I would say,
>> most often not. Queuing and serializing snapshots would strictly limit the
>> interference with foreground. Also, very high end storage can perform
>> snapshots *very* quickly, so serialized snapshots will not be slow. My take
>> is that the default behavior should be to queue and serialize all heavy I/O
>> operations, with non-default allowances for limited concurrency.
>>
>> Cleaned up (which required reboot/unstack/stack and more). Tried again.
>>
>> Ran two test backups (which in the current iteration create Cinder volume
>> snapshots). Asked Cinder to delete the snapshots. Again, very high load
>> factors, and in "top" I can see two long-running "dd" processes. (Given I
>> have a single disk, more than one "dd" is not good.)
>>
>> Running too many heavyweight operations against storage can lead to
>> thrashing. Queuing can strictly limit that load, and insure better and
>> reliable performance. I am not seeing evidence of this thought in my
>> OpenStack testing.
>>
>> So far it looks like there is no thought to managing the impact of disk
>> intensive management operations. Am I missing something?
>>
>>
>>
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Avishay Traeger
Hi Preston,
Replies to some of your cinder-related questions:
1. Creating a snapshot isn't usually an I/O intensive operation.  Are you
seeing I/O spike or CPU?  If you're seeing CPU load, I've seen the CPU
usage of cinder-api spike sometimes - not sure why.
2. The 'dd' processes that you see are Cinder wiping the volumes during
deletion.  You can either disable this in cinder.conf, or you can use a
relatively new option to manage the bandwidth used for this.

IMHO, deployments should be optimized to not do very long/intensive
management operations - for example, use backends with efficient snapshots,
use CoW operations wherever possible rather than copying full
volumes/images, disabling wipe on delete, etc.

Thanks,
Avishay

On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister 
wrote:

> OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or
> not.
>
> Have a DevStack, running in a VM (VirtualBox), backed by a single flash
> drive (on my current generation MacBook). Could be I have something off in
> my setup.
>
> Testing nova backup - first the existing implementation, then my (much
> changed) replacement.
>
> Simple scripts for testing. Create images. Create instances (five). Run
> backup on all instances.
>
> Currently found in:
> https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts
>
> First time I started backups of all (five) instances, load on the Devstack
> VM went insane, and all but one backup failed. Seems that all of the
> backups were performed immediately (or attempted), without any sort of
> queuing or load management. Huh. Well, maybe just the backup implementation
> is naive...
>
> I will write on this at greater length, but backup should interfere as
> little as possible with foreground processing. Overloading a host is
> entirely unacceptable.
>
> Replaced the backup implementation so it does proper queuing (among other
> things). Iterating forward - implementing and testing.
>
> Fired off snapshots on five Cinder volumes (attached to five instances).
> Again the load shot very high. Huh. Well, in a full-scale OpenStack setup,
> maybe storage can handle that much I/O more gracefully ... or not. Again,
> should taking snapshots interfere with foreground activity? I would say,
> most often not. Queuing and serializing snapshots would strictly limit the
> interference with foreground. Also, very high end storage can perform
> snapshots *very* quickly, so serialized snapshots will not be slow. My take
> is that the default behavior should be to queue and serialize all heavy I/O
> operations, with non-default allowances for limited concurrency.
>
> Cleaned up (which required reboot/unstack/stack and more). Tried again.
>
> Ran two test backups (which in the current iteration create Cinder volume
> snapshots). Asked Cinder to delete the snapshots. Again, very high load
> factors, and in "top" I can see two long-running "dd" processes. (Given I
> have a single disk, more than one "dd" is not good.)
>
> Running too many heavyweight operations against storage can lead to
> thrashing. Queuing can strictly limit that load, and insure better and
> reliable performance. I am not seeing evidence of this thought in my
> OpenStack testing.
>
> So far it looks like there is no thought to managing the impact of disk
> intensive management operations. Am I missing something?
>
>
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?

2014-10-19 Thread Preston L. Bannister
OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or
not.

Have a DevStack, running in a VM (VirtualBox), backed by a single flash
drive (on my current generation MacBook). Could be I have something off in
my setup.

Testing nova backup - first the existing implementation, then my (much
changed) replacement.

Simple scripts for testing. Create images. Create instances (five). Run
backup on all instances.

Currently found in:
https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

First time I started backups of all (five) instances, load on the Devstack
VM went insane, and all but one backup failed. Seems that all of the
backups were performed immediately (or attempted), without any sort of
queuing or load management. Huh. Well, maybe just the backup implementation
is naive...

I will write on this at greater length, but backup should interfere as
little as possible with foreground processing. Overloading a host is
entirely unacceptable.

Replaced the backup implementation so it does proper queuing (among other
things). Iterating forward - implementing and testing.

Fired off snapshots on five Cinder volumes (attached to five instances).
Again the load shot very high. Huh. Well, in a full-scale OpenStack setup,
maybe storage can handle that much I/O more gracefully ... or not. Again,
should taking snapshots interfere with foreground activity? I would say,
most often not. Queuing and serializing snapshots would strictly limit the
interference with foreground. Also, very high end storage can perform
snapshots *very* quickly, so serialized snapshots will not be slow. My take
is that the default behavior should be to queue and serialize all heavy I/O
operations, with non-default allowances for limited concurrency.

Cleaned up (which required reboot/unstack/stack and more). Tried again.

Ran two test backups (which in the current iteration create Cinder volume
snapshots). Asked Cinder to delete the snapshots. Again, very high load
factors, and in "top" I can see two long-running "dd" processes. (Given I
have a single disk, more than one "dd" is not good.)

Running too many heavyweight operations against storage can lead to
thrashing. Queuing can strictly limit that load, and insure better and
reliable performance. I am not seeing evidence of this thought in my
OpenStack testing.

So far it looks like there is no thought to managing the impact of disk
intensive management operations. Am I missing something?
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev