Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-15 Thread Andrew Laski


On 08/15/2014 11:43 AM, Matthew Booth wrote:

On 15/08/14 16:12, Andrew Laski wrote:

On 08/08/2014 08:42 AM, Nikola Đipanov wrote:

On 08/06/2014 07:54 PM, Jay Pipes wrote:

I bring this up on the mailing list because I think Liyi's patch offers
an interesting future direction to the way that we think about our retry
approach in Nova. Instead of having hard-coded or configurable interval
times, I think Liyi's approach of calculating the interval length based
on some input values is a good direction to take.


This indeed is a problem that we've seen bite us a number of times, and
I tried to solve it by proposing [1] but didn't get to work on it
further yet.

Having said that - after thinking about it more, I was not sure I like
my own approach in [1] on the grounds of it being too generic (and
overly elaborate) for the particular problem it is solving.

I was then thinking of something similar to what is proposed here, where
we would have a waiting time that is a function of a value that we could
query Cinder on. Allocation rate proposed here seems to fit this nicely,
but in my mind we would have a way to query cinder about it instead of
hardcoding it, however this can be done later and in cooperation with
the Cinder team.

I like this direction a lot, because the allocation time depends on
Cinder and its backend more than anything.  Excepting perhaps image
transfer speeds.


2) We should deprecate the CONF.block_device_allocate_retries_interval
option only, and keep the CONF.block_device_allocate_retries
configuration option as-is, changing the help text to read something
like "Max number of retries. We calculate the interval of the retry
based on the size of the volume."


I'd go with this as the number of retries can still be useful as a tool
for easy workaround and troubleshooting, but I'd put a big disclaimer
that it is mostly meant for debugging/workaround purposes.

I disagree a bit with having a knob purely for debugging/workaround.  I
think this is a place where it's very useful to have the knob though.
The figures in the review seem sound, but this is probably not a place
where one size is going to fit all.  Until/unless we can get some
coordination between Nova and Cinder on this I think having a knob that
is intended to be used is the right way to go.

FYI, this bug is admittedly only from code inspection, but I think many
interval config variables are currently broken in Nova:

https://bugs.launchpad.net/nova/+bug/1354403

If they've never been used, they're presumably of limited value.


Fair point.  Though it's likely they'll be fixed as soon as someone 
starts tuning them and finds them broken.  And it'll be easier to fix 
them than to remove them and add them in later.  Which is not to say 
that all of them are useful.


But the two values in question here are ones I've been tuning recently 
and know that they work.




Matt


N.

[1] https://review.openstack.org/#/c/87546/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-15 Thread Matthew Booth
On 15/08/14 16:12, Andrew Laski wrote:
> 
> On 08/08/2014 08:42 AM, Nikola Đipanov wrote:
>> On 08/06/2014 07:54 PM, Jay Pipes wrote:
>>> I bring this up on the mailing list because I think Liyi's patch offers
>>> an interesting future direction to the way that we think about our retry
>>> approach in Nova. Instead of having hard-coded or configurable interval
>>> times, I think Liyi's approach of calculating the interval length based
>>> on some input values is a good direction to take.
>>>
>> This indeed is a problem that we've seen bite us a number of times, and
>> I tried to solve it by proposing [1] but didn't get to work on it
>> further yet.
>>
>> Having said that - after thinking about it more, I was not sure I like
>> my own approach in [1] on the grounds of it being too generic (and
>> overly elaborate) for the particular problem it is solving.
>>
>> I was then thinking of something similar to what is proposed here, where
>> we would have a waiting time that is a function of a value that we could
>> query Cinder on. Allocation rate proposed here seems to fit this nicely,
>> but in my mind we would have a way to query cinder about it instead of
>> hardcoding it, however this can be done later and in cooperation with
>> the Cinder team.
> 
> I like this direction a lot, because the allocation time depends on
> Cinder and its backend more than anything.  Excepting perhaps image
> transfer speeds.
> 
>>> 2) We should deprecate the CONF.block_device_allocate_retries_interval
>>> option only, and keep the CONF.block_device_allocate_retries
>>> configuration option as-is, changing the help text to read something
>>> like "Max number of retries. We calculate the interval of the retry
>>> based on the size of the volume."
>>>
>> I'd go with this as the number of retries can still be useful as a tool
>> for easy workaround and troubleshooting, but I'd put a big disclaimer
>> that it is mostly meant for debugging/workaround purposes.
> 
> I disagree a bit with having a knob purely for debugging/workaround.  I
> think this is a place where it's very useful to have the knob though. 
> The figures in the review seem sound, but this is probably not a place
> where one size is going to fit all.  Until/unless we can get some
> coordination between Nova and Cinder on this I think having a knob that
> is intended to be used is the right way to go.

FYI, this bug is admittedly only from code inspection, but I think many
interval config variables are currently broken in Nova:

https://bugs.launchpad.net/nova/+bug/1354403

If they've never been used, they're presumably of limited value.

Matt

> 
>> N.
>>
>> [1] https://review.openstack.org/#/c/87546/
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-15 Thread Andrew Laski


On 08/08/2014 08:42 AM, Nikola Đipanov wrote:

On 08/06/2014 07:54 PM, Jay Pipes wrote:

I bring this up on the mailing list because I think Liyi's patch offers
an interesting future direction to the way that we think about our retry
approach in Nova. Instead of having hard-coded or configurable interval
times, I think Liyi's approach of calculating the interval length based
on some input values is a good direction to take.


This indeed is a problem that we've seen bite us a number of times, and
I tried to solve it by proposing [1] but didn't get to work on it
further yet.

Having said that - after thinking about it more, I was not sure I like
my own approach in [1] on the grounds of it being too generic (and
overly elaborate) for the particular problem it is solving.

I was then thinking of something similar to what is proposed here, where
we would have a waiting time that is a function of a value that we could
query Cinder on. Allocation rate proposed here seems to fit this nicely,
but in my mind we would have a way to query cinder about it instead of
hardcoding it, however this can be done later and in cooperation with
the Cinder team.


I like this direction a lot, because the allocation time depends on 
Cinder and its backend more than anything.  Excepting perhaps image 
transfer speeds.



2) We should deprecate the CONF.block_device_allocate_retries_interval
option only, and keep the CONF.block_device_allocate_retries
configuration option as-is, changing the help text to read something
like "Max number of retries. We calculate the interval of the retry
based on the size of the volume."


I'd go with this as the number of retries can still be useful as a tool
for easy workaround and troubleshooting, but I'd put a big disclaimer
that it is mostly meant for debugging/workaround purposes.


I disagree a bit with having a knob purely for debugging/workaround.  I 
think this is a place where it's very useful to have the knob though.  
The figures in the review seem sound, but this is probably not a place 
where one size is going to fit all.  Until/unless we can get some 
coordination between Nova and Cinder on this I think having a knob that 
is intended to be used is the right way to go.



N.

[1] https://review.openstack.org/#/c/87546/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-13 Thread Liyi Meng

Hi Nikola,

Thanks a lot for the input! May I kindly invite you to review the change 
as well?


BR/Liyi

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-08 Thread Nikola Đipanov
On 08/06/2014 07:54 PM, Jay Pipes wrote:
> I bring this up on the mailing list because I think Liyi's patch offers
> an interesting future direction to the way that we think about our retry
> approach in Nova. Instead of having hard-coded or configurable interval
> times, I think Liyi's approach of calculating the interval length based
> on some input values is a good direction to take.
> 

This indeed is a problem that we've seen bite us a number of times, and
I tried to solve it by proposing [1] but didn't get to work on it
further yet.

Having said that - after thinking about it more, I was not sure I like
my own approach in [1] on the grounds of it being too generic (and
overly elaborate) for the particular problem it is solving.

I was then thinking of something similar to what is proposed here, where
we would have a waiting time that is a function of a value that we could
query Cinder on. Allocation rate proposed here seems to fit this nicely,
but in my mind we would have a way to query cinder about it instead of
hardcoding it, however this can be done later and in cooperation with
the Cinder team.

>
> 2) We should deprecate the CONF.block_device_allocate_retries_interval
> option only, and keep the CONF.block_device_allocate_retries
> configuration option as-is, changing the help text to read something
> like "Max number of retries. We calculate the interval of the retry
> based on the size of the volume."
>

I'd go with this as the number of retries can still be useful as a tool
for easy workaround and troubleshooting, but I'd put a big disclaimer
that it is mostly meant for debugging/workaround purposes.

N.

[1] https://review.openstack.org/#/c/87546/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-08 Thread Liyi Meng
Hi John, 

I have some comments as well. see blow :) 


_On 6 August 2014 18:54, Jay Pipes  
wrote:
> So, Liyi Meng has an interesting patch up for Nova:
>
> https://review.openstack.org/#/c/104876
>
> 1) We should just deprecate both the options, with a note in the option help
> text that these options are not used when volume size is not 0, and that the
> interval is calculated based on volume size

This feels bad.

Liyi: Not worse than having these two options in action actually. Assuming you 
know you have volume size vary from 1G to 64G in your OPS deployment. To create 
volumes from these images, time consuming is from 30s to 30 x 64s with your 
storage
backend. How will you choose the right value for 
block_device_allocate_retries_interval
and block_device_allocate_retries? What we know for sure is that you have to 
make 
  block_device_allocate_retries_interval * block_device_allocate_retries >= 30 
x 64 seconds

Lets say you have decided to choose: 
block_device_allocate_retries_interval  = 30 
block_device_allocate_retries = 64

This is obviously optimized for image with 64G size, what happen if a 1G image 
is used? 
The worse scenario is that the image is ready at the 30th second and you have 
done your status 
pulling at the 29th second, so your next query will come in the 59th second. 
Your VM booting up 
process therefore is slower 29 seconds than expected!

If applying my algorithm, the calculated interval is  int(30 /60 + 1) = 1 
second. i.e. if you boot up your 
1G VM, you waste less 1 second in waiting it comes up!  1 second v.s. 29 
seconds! This is big difference!

If booting a 64G VM, the figure should be similar to the hard coded value 
above, but for end user, if you know 
you have to wait for more than 32 minutes to get your VM up, will you care if 
you have wasted half 
of a minute somewhere?  

> 2) We should deprecate the CONF.block_device_allocate_retries_interval
> option only, and keep the CONF.block_device_allocate_retries configuration
> option as-is, changing the help text to read something like "Max number of
> retries. We calculate the interval of the retry based on the size of the
> volume."

What about a slight modification to (2)...

Liyi: If we do want to go in this option, block_device_allocate_retries is the 
one to keep. 
I don't have strong opinion here. My two cents is to prefer the option 1 above. 
This option will 
keep block_device_allocate_retries. IMHO, not a wise idea. When we have too 
many 
configuration options, OPS becomes difficult to use for operator, get less 
widely used. 
 If there is an advanced user there want high level customization, they would 
dig into 
source code. OPS is in python, not in C code. handling source is not huge 
different from
 handling the config. In someway, it is even more straightforward. 

3) CONF.block_device_allocate_retries_interval=-1 means calculate
using volume size, and we make it the default, so people can still
override it if they want to. But we also deprecate the option with a
view of removing it during Kilo? Move
CONF.block_device_allocate_retries as max retries.

> I bring this up on the mailing list because I think Liyi's patch offers an
> interesting future direction to the way that we think about our retry
> approach in Nova. Instead of having hard-coded or configurable interval
> times, I think Liyi's approach of calculating the interval length based on
> some input values is a good direction to take.

Seems like the right direction.

But I do worry that its quite dependent on the storage backend.
Sometimes the volume create is almost "free" regardless of the volume
size (with certain types of CoW). So maybe we end up needing some kind
of scaling factor on the weights. I kinda hope I am over thinking
that, and in reality it all works fine. I suspect that is the case.

Liyi:  agree here, and in my implementation I have covered this consideration 
as your see.
e.g. when mapping a 1G image into 64G volume, I call l this function with 
passing in 1G of size, 
not 64G. 


Thanks,
John___
From: Liyi Meng
Sent: Thursday, 07 August 2014 10:09 AM
To: Michael Still; OpenStack Development Mailing List (not for usage questions)
Subject: RE: [openstack-dev] [nova] Deprecating 
CONF.block_device_allocate_retries_interval

Hi Michael,

Not sure if I am getting your right. I think your proposal doesn't not perform 
well in reality.

Firstly, it is difficult to guess a good time that fix all problems, except  
you take "forever". Just take the volume creation in my bugfix as example 
(https://review.openstack.org/#/c/104876/). If a couple of large volumes are 
requested to create at the same time toward a fast storage backend, it would 
take a long time for each to create. It is quite normal to see it takes more 
than an hour to create a volume from a 60G image. That is why I prop

Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-07 Thread John Garbutt
On 6 August 2014 18:54, Jay Pipes  wrote:
> So, Liyi Meng has an interesting patch up for Nova:
>
> https://review.openstack.org/#/c/104876
>
> 1) We should just deprecate both the options, with a note in the option help
> text that these options are not used when volume size is not 0, and that the
> interval is calculated based on volume size

This feels bad.

> 2) We should deprecate the CONF.block_device_allocate_retries_interval
> option only, and keep the CONF.block_device_allocate_retries configuration
> option as-is, changing the help text to read something like "Max number of
> retries. We calculate the interval of the retry based on the size of the
> volume."

What about a slight modification to (2)...

3) CONF.block_device_allocate_retries_interval=-1 means calculate
using volume size, and we make it the default, so people can still
override it if they want to. But we also deprecate the option with a
view of removing it during Kilo? Move
CONF.block_device_allocate_retries as max retries.

> I bring this up on the mailing list because I think Liyi's patch offers an
> interesting future direction to the way that we think about our retry
> approach in Nova. Instead of having hard-coded or configurable interval
> times, I think Liyi's approach of calculating the interval length based on
> some input values is a good direction to take.

Seems like the right direction.

But I do worry that its quite dependent on the storage backend.
Sometimes the volume create is almost "free" regardless of the volume
size (with certain types of CoW). So maybe we end up needing some kind
of scaling factor on the weights. I kinda hope I am over thinking
that, and in reality it all works fine. I suspect that is the case.

Thanks,
John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-07 Thread Liyi Meng
Hi Michael, 

Not sure if I am getting your right. I think your proposal doesn't not perform 
well in reality. 

Firstly, it is difficult to guess a good time that fix all problems, except  
you take "forever". Just take the volume creation in my bugfix as example 
(https://review.openstack.org/#/c/104876/). If a couple of large volumes are 
requested to create at the same time toward a fast storage backend, it would 
take a long time for each to create. It is quite normal to see it takes more 
than an hour to create a volume from a 60G image. That is why I proposal we 
need to "guess" a total timeout base on image size in the bugfix.  

Secondly, are you suggesting Eventlet sleep for "15 minute" then check the 
result of operation, without doing anything in between? IMHO, this would be 
very bad experience for end user! Because they ALWAYS have to wait for 15 
minutes to move on regardless what operation they have issued. 

BR/Liyi 

 

From: mikalst...@gmail.com [mikalst...@gmail.com] on behalf of Michael Still 
[mi...@stillhq.com]
Sent: Wednesday, 06 August 2014 10:57 PM
To: OpenStack Development Mailing List (not for usage questions)
Cc: Liyi Meng
Subject: Re: [openstack-dev] [nova] Deprecating 
CONF.block_device_allocate_retries_interval

Maybe we should change how we wait?

I get that we don't want to sit around forever, but perhaps we should
specify a total maximum time to wait instead of a number of iterations
of a loop? Something like "15 minutes should be long enough for
anyone!". Eventlet sleeps are also pretty cheap, so having a bigger
sleep time inside them just means that we overshoot more than we would
otherwise.

Michael

On Thu, Aug 7, 2014 at 3:54 AM, Jay Pipes  wrote:
> Hi Stackers!
>
> So, Liyi Meng has an interesting patch up for Nova:
>
> https://review.openstack.org/#/c/104876
>
> that changes the way that the interval and number of retries is calculated
> for a piece of code that waits around for a block device mapping to become
> active.
>
> Namely, the patch discards the value of the following configuration options
> *if the volume size is not 0* (which is a typical case):
>
> * CONF.block_device_allocate_retries_interval
> * CONF.block_device_allocate_retries
>
> and in their place, instead uses a hard-coded 60 max number of retries and
> calculates a more "appropriate" interval by looking at the size of the
> volume to be created. The algorithm uses the sensible idea that it will take
> longer to allocate larger volumes than smaller volumes, and therefore the
> interval time for larger volumes should be longer than smaller ones.
>
> So... here's the question: since this code essentially makes the above two
> configuration options obselete for the majority of cases (where volume size
> is not 0), should we do one of the following?
>
> 1) We should just deprecate both the options, with a note in the option help
> text that these options are not used when volume size is not 0, and that the
> interval is calculated based on volume size
>
> or
>
> 2) We should deprecate the CONF.block_device_allocate_retries_interval
> option only, and keep the CONF.block_device_allocate_retries configuration
> option as-is, changing the help text to read something like "Max number of
> retries. We calculate the interval of the retry based on the size of the
> volume."
>
> I bring this up on the mailing list because I think Liyi's patch offers an
> interesting future direction to the way that we think about our retry
> approach in Nova. Instead of having hard-coded or configurable interval
> times, I think Liyi's approach of calculating the interval length based on
> some input values is a good direction to take.
>
> Thoughts?
>
> -jay
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-06 Thread Michael Still
Maybe we should change how we wait?

I get that we don't want to sit around forever, but perhaps we should
specify a total maximum time to wait instead of a number of iterations
of a loop? Something like "15 minutes should be long enough for
anyone!". Eventlet sleeps are also pretty cheap, so having a bigger
sleep time inside them just means that we overshoot more than we would
otherwise.

Michael

On Thu, Aug 7, 2014 at 3:54 AM, Jay Pipes  wrote:
> Hi Stackers!
>
> So, Liyi Meng has an interesting patch up for Nova:
>
> https://review.openstack.org/#/c/104876
>
> that changes the way that the interval and number of retries is calculated
> for a piece of code that waits around for a block device mapping to become
> active.
>
> Namely, the patch discards the value of the following configuration options
> *if the volume size is not 0* (which is a typical case):
>
> * CONF.block_device_allocate_retries_interval
> * CONF.block_device_allocate_retries
>
> and in their place, instead uses a hard-coded 60 max number of retries and
> calculates a more "appropriate" interval by looking at the size of the
> volume to be created. The algorithm uses the sensible idea that it will take
> longer to allocate larger volumes than smaller volumes, and therefore the
> interval time for larger volumes should be longer than smaller ones.
>
> So... here's the question: since this code essentially makes the above two
> configuration options obselete for the majority of cases (where volume size
> is not 0), should we do one of the following?
>
> 1) We should just deprecate both the options, with a note in the option help
> text that these options are not used when volume size is not 0, and that the
> interval is calculated based on volume size
>
> or
>
> 2) We should deprecate the CONF.block_device_allocate_retries_interval
> option only, and keep the CONF.block_device_allocate_retries configuration
> option as-is, changing the help text to read something like "Max number of
> retries. We calculate the interval of the retry based on the size of the
> volume."
>
> I bring this up on the mailing list because I think Liyi's patch offers an
> interesting future direction to the way that we think about our retry
> approach in Nova. Instead of having hard-coded or configurable interval
> times, I think Liyi's approach of calculating the interval length based on
> some input values is a good direction to take.
>
> Thoughts?
>
> -jay
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval

2014-08-06 Thread Jay Pipes

Hi Stackers!

So, Liyi Meng has an interesting patch up for Nova:

https://review.openstack.org/#/c/104876

that changes the way that the interval and number of retries is 
calculated for a piece of code that waits around for a block device 
mapping to become active.


Namely, the patch discards the value of the following configuration 
options *if the volume size is not 0* (which is a typical case):


* CONF.block_device_allocate_retries_interval
* CONF.block_device_allocate_retries

and in their place, instead uses a hard-coded 60 max number of retries 
and calculates a more "appropriate" interval by looking at the size of 
the volume to be created. The algorithm uses the sensible idea that it 
will take longer to allocate larger volumes than smaller volumes, and 
therefore the interval time for larger volumes should be longer than 
smaller ones.


So... here's the question: since this code essentially makes the above 
two configuration options obselete for the majority of cases (where 
volume size is not 0), should we do one of the following?


1) We should just deprecate both the options, with a note in the option 
help text that these options are not used when volume size is not 0, and 
that the interval is calculated based on volume size


or

2) We should deprecate the CONF.block_device_allocate_retries_interval 
option only, and keep the CONF.block_device_allocate_retries 
configuration option as-is, changing the help text to read something 
like "Max number of retries. We calculate the interval of the retry 
based on the size of the volume."


I bring this up on the mailing list because I think Liyi's patch offers 
an interesting future direction to the way that we think about our retry 
approach in Nova. Instead of having hard-coded or configurable interval 
times, I think Liyi's approach of calculating the interval length based 
on some input values is a good direction to take.


Thoughts?

-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev