Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
On 08/15/2014 11:43 AM, Matthew Booth wrote: On 15/08/14 16:12, Andrew Laski wrote: On 08/08/2014 08:42 AM, Nikola Đipanov wrote: On 08/06/2014 07:54 PM, Jay Pipes wrote: I bring this up on the mailing list because I think Liyi's patch offers an interesting future direction to the way that we think about our retry approach in Nova. Instead of having hard-coded or configurable interval times, I think Liyi's approach of calculating the interval length based on some input values is a good direction to take. This indeed is a problem that we've seen bite us a number of times, and I tried to solve it by proposing [1] but didn't get to work on it further yet. Having said that - after thinking about it more, I was not sure I like my own approach in [1] on the grounds of it being too generic (and overly elaborate) for the particular problem it is solving. I was then thinking of something similar to what is proposed here, where we would have a waiting time that is a function of a value that we could query Cinder on. Allocation rate proposed here seems to fit this nicely, but in my mind we would have a way to query cinder about it instead of hardcoding it, however this can be done later and in cooperation with the Cinder team. I like this direction a lot, because the allocation time depends on Cinder and its backend more than anything. Excepting perhaps image transfer speeds. 2) We should deprecate the CONF.block_device_allocate_retries_interval option only, and keep the CONF.block_device_allocate_retries configuration option as-is, changing the help text to read something like "Max number of retries. We calculate the interval of the retry based on the size of the volume." I'd go with this as the number of retries can still be useful as a tool for easy workaround and troubleshooting, but I'd put a big disclaimer that it is mostly meant for debugging/workaround purposes. I disagree a bit with having a knob purely for debugging/workaround. I think this is a place where it's very useful to have the knob though. The figures in the review seem sound, but this is probably not a place where one size is going to fit all. Until/unless we can get some coordination between Nova and Cinder on this I think having a knob that is intended to be used is the right way to go. FYI, this bug is admittedly only from code inspection, but I think many interval config variables are currently broken in Nova: https://bugs.launchpad.net/nova/+bug/1354403 If they've never been used, they're presumably of limited value. Fair point. Though it's likely they'll be fixed as soon as someone starts tuning them and finds them broken. And it'll be easier to fix them than to remove them and add them in later. Which is not to say that all of them are useful. But the two values in question here are ones I've been tuning recently and know that they work. Matt N. [1] https://review.openstack.org/#/c/87546/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
On 15/08/14 16:12, Andrew Laski wrote: > > On 08/08/2014 08:42 AM, Nikola Đipanov wrote: >> On 08/06/2014 07:54 PM, Jay Pipes wrote: >>> I bring this up on the mailing list because I think Liyi's patch offers >>> an interesting future direction to the way that we think about our retry >>> approach in Nova. Instead of having hard-coded or configurable interval >>> times, I think Liyi's approach of calculating the interval length based >>> on some input values is a good direction to take. >>> >> This indeed is a problem that we've seen bite us a number of times, and >> I tried to solve it by proposing [1] but didn't get to work on it >> further yet. >> >> Having said that - after thinking about it more, I was not sure I like >> my own approach in [1] on the grounds of it being too generic (and >> overly elaborate) for the particular problem it is solving. >> >> I was then thinking of something similar to what is proposed here, where >> we would have a waiting time that is a function of a value that we could >> query Cinder on. Allocation rate proposed here seems to fit this nicely, >> but in my mind we would have a way to query cinder about it instead of >> hardcoding it, however this can be done later and in cooperation with >> the Cinder team. > > I like this direction a lot, because the allocation time depends on > Cinder and its backend more than anything. Excepting perhaps image > transfer speeds. > >>> 2) We should deprecate the CONF.block_device_allocate_retries_interval >>> option only, and keep the CONF.block_device_allocate_retries >>> configuration option as-is, changing the help text to read something >>> like "Max number of retries. We calculate the interval of the retry >>> based on the size of the volume." >>> >> I'd go with this as the number of retries can still be useful as a tool >> for easy workaround and troubleshooting, but I'd put a big disclaimer >> that it is mostly meant for debugging/workaround purposes. > > I disagree a bit with having a knob purely for debugging/workaround. I > think this is a place where it's very useful to have the knob though. > The figures in the review seem sound, but this is probably not a place > where one size is going to fit all. Until/unless we can get some > coordination between Nova and Cinder on this I think having a knob that > is intended to be used is the right way to go. FYI, this bug is admittedly only from code inspection, but I think many interval config variables are currently broken in Nova: https://bugs.launchpad.net/nova/+bug/1354403 If they've never been used, they're presumably of limited value. Matt > >> N. >> >> [1] https://review.openstack.org/#/c/87546/ >> >> ___ >> OpenStack-dev mailing list >> OpenStack-dev@lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) GPG ID: D33C3490 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
On 08/08/2014 08:42 AM, Nikola Đipanov wrote: On 08/06/2014 07:54 PM, Jay Pipes wrote: I bring this up on the mailing list because I think Liyi's patch offers an interesting future direction to the way that we think about our retry approach in Nova. Instead of having hard-coded or configurable interval times, I think Liyi's approach of calculating the interval length based on some input values is a good direction to take. This indeed is a problem that we've seen bite us a number of times, and I tried to solve it by proposing [1] but didn't get to work on it further yet. Having said that - after thinking about it more, I was not sure I like my own approach in [1] on the grounds of it being too generic (and overly elaborate) for the particular problem it is solving. I was then thinking of something similar to what is proposed here, where we would have a waiting time that is a function of a value that we could query Cinder on. Allocation rate proposed here seems to fit this nicely, but in my mind we would have a way to query cinder about it instead of hardcoding it, however this can be done later and in cooperation with the Cinder team. I like this direction a lot, because the allocation time depends on Cinder and its backend more than anything. Excepting perhaps image transfer speeds. 2) We should deprecate the CONF.block_device_allocate_retries_interval option only, and keep the CONF.block_device_allocate_retries configuration option as-is, changing the help text to read something like "Max number of retries. We calculate the interval of the retry based on the size of the volume." I'd go with this as the number of retries can still be useful as a tool for easy workaround and troubleshooting, but I'd put a big disclaimer that it is mostly meant for debugging/workaround purposes. I disagree a bit with having a knob purely for debugging/workaround. I think this is a place where it's very useful to have the knob though. The figures in the review seem sound, but this is probably not a place where one size is going to fit all. Until/unless we can get some coordination between Nova and Cinder on this I think having a knob that is intended to be used is the right way to go. N. [1] https://review.openstack.org/#/c/87546/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
Hi Nikola, Thanks a lot for the input! May I kindly invite you to review the change as well? BR/Liyi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
On 08/06/2014 07:54 PM, Jay Pipes wrote: > I bring this up on the mailing list because I think Liyi's patch offers > an interesting future direction to the way that we think about our retry > approach in Nova. Instead of having hard-coded or configurable interval > times, I think Liyi's approach of calculating the interval length based > on some input values is a good direction to take. > This indeed is a problem that we've seen bite us a number of times, and I tried to solve it by proposing [1] but didn't get to work on it further yet. Having said that - after thinking about it more, I was not sure I like my own approach in [1] on the grounds of it being too generic (and overly elaborate) for the particular problem it is solving. I was then thinking of something similar to what is proposed here, where we would have a waiting time that is a function of a value that we could query Cinder on. Allocation rate proposed here seems to fit this nicely, but in my mind we would have a way to query cinder about it instead of hardcoding it, however this can be done later and in cooperation with the Cinder team. > > 2) We should deprecate the CONF.block_device_allocate_retries_interval > option only, and keep the CONF.block_device_allocate_retries > configuration option as-is, changing the help text to read something > like "Max number of retries. We calculate the interval of the retry > based on the size of the volume." > I'd go with this as the number of retries can still be useful as a tool for easy workaround and troubleshooting, but I'd put a big disclaimer that it is mostly meant for debugging/workaround purposes. N. [1] https://review.openstack.org/#/c/87546/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
Hi John, I have some comments as well. see blow :) _On 6 August 2014 18:54, Jay Pipes wrote: > So, Liyi Meng has an interesting patch up for Nova: > > https://review.openstack.org/#/c/104876 > > 1) We should just deprecate both the options, with a note in the option help > text that these options are not used when volume size is not 0, and that the > interval is calculated based on volume size This feels bad. Liyi: Not worse than having these two options in action actually. Assuming you know you have volume size vary from 1G to 64G in your OPS deployment. To create volumes from these images, time consuming is from 30s to 30 x 64s with your storage backend. How will you choose the right value for block_device_allocate_retries_interval and block_device_allocate_retries? What we know for sure is that you have to make block_device_allocate_retries_interval * block_device_allocate_retries >= 30 x 64 seconds Lets say you have decided to choose: block_device_allocate_retries_interval = 30 block_device_allocate_retries = 64 This is obviously optimized for image with 64G size, what happen if a 1G image is used? The worse scenario is that the image is ready at the 30th second and you have done your status pulling at the 29th second, so your next query will come in the 59th second. Your VM booting up process therefore is slower 29 seconds than expected! If applying my algorithm, the calculated interval is int(30 /60 + 1) = 1 second. i.e. if you boot up your 1G VM, you waste less 1 second in waiting it comes up! 1 second v.s. 29 seconds! This is big difference! If booting a 64G VM, the figure should be similar to the hard coded value above, but for end user, if you know you have to wait for more than 32 minutes to get your VM up, will you care if you have wasted half of a minute somewhere? > 2) We should deprecate the CONF.block_device_allocate_retries_interval > option only, and keep the CONF.block_device_allocate_retries configuration > option as-is, changing the help text to read something like "Max number of > retries. We calculate the interval of the retry based on the size of the > volume." What about a slight modification to (2)... Liyi: If we do want to go in this option, block_device_allocate_retries is the one to keep. I don't have strong opinion here. My two cents is to prefer the option 1 above. This option will keep block_device_allocate_retries. IMHO, not a wise idea. When we have too many configuration options, OPS becomes difficult to use for operator, get less widely used. If there is an advanced user there want high level customization, they would dig into source code. OPS is in python, not in C code. handling source is not huge different from handling the config. In someway, it is even more straightforward. 3) CONF.block_device_allocate_retries_interval=-1 means calculate using volume size, and we make it the default, so people can still override it if they want to. But we also deprecate the option with a view of removing it during Kilo? Move CONF.block_device_allocate_retries as max retries. > I bring this up on the mailing list because I think Liyi's patch offers an > interesting future direction to the way that we think about our retry > approach in Nova. Instead of having hard-coded or configurable interval > times, I think Liyi's approach of calculating the interval length based on > some input values is a good direction to take. Seems like the right direction. But I do worry that its quite dependent on the storage backend. Sometimes the volume create is almost "free" regardless of the volume size (with certain types of CoW). So maybe we end up needing some kind of scaling factor on the weights. I kinda hope I am over thinking that, and in reality it all works fine. I suspect that is the case. Liyi: agree here, and in my implementation I have covered this consideration as your see. e.g. when mapping a 1G image into 64G volume, I call l this function with passing in 1G of size, not 64G. Thanks, John___ From: Liyi Meng Sent: Thursday, 07 August 2014 10:09 AM To: Michael Still; OpenStack Development Mailing List (not for usage questions) Subject: RE: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval Hi Michael, Not sure if I am getting your right. I think your proposal doesn't not perform well in reality. Firstly, it is difficult to guess a good time that fix all problems, except you take "forever". Just take the volume creation in my bugfix as example (https://review.openstack.org/#/c/104876/). If a couple of large volumes are requested to create at the same time toward a fast storage backend, it would take a long time for each to create. It is quite normal to see it takes more than an hour to create a volume from a 60G image. That is why I prop
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
On 6 August 2014 18:54, Jay Pipes wrote: > So, Liyi Meng has an interesting patch up for Nova: > > https://review.openstack.org/#/c/104876 > > 1) We should just deprecate both the options, with a note in the option help > text that these options are not used when volume size is not 0, and that the > interval is calculated based on volume size This feels bad. > 2) We should deprecate the CONF.block_device_allocate_retries_interval > option only, and keep the CONF.block_device_allocate_retries configuration > option as-is, changing the help text to read something like "Max number of > retries. We calculate the interval of the retry based on the size of the > volume." What about a slight modification to (2)... 3) CONF.block_device_allocate_retries_interval=-1 means calculate using volume size, and we make it the default, so people can still override it if they want to. But we also deprecate the option with a view of removing it during Kilo? Move CONF.block_device_allocate_retries as max retries. > I bring this up on the mailing list because I think Liyi's patch offers an > interesting future direction to the way that we think about our retry > approach in Nova. Instead of having hard-coded or configurable interval > times, I think Liyi's approach of calculating the interval length based on > some input values is a good direction to take. Seems like the right direction. But I do worry that its quite dependent on the storage backend. Sometimes the volume create is almost "free" regardless of the volume size (with certain types of CoW). So maybe we end up needing some kind of scaling factor on the weights. I kinda hope I am over thinking that, and in reality it all works fine. I suspect that is the case. Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
Hi Michael, Not sure if I am getting your right. I think your proposal doesn't not perform well in reality. Firstly, it is difficult to guess a good time that fix all problems, except you take "forever". Just take the volume creation in my bugfix as example (https://review.openstack.org/#/c/104876/). If a couple of large volumes are requested to create at the same time toward a fast storage backend, it would take a long time for each to create. It is quite normal to see it takes more than an hour to create a volume from a 60G image. That is why I proposal we need to "guess" a total timeout base on image size in the bugfix. Secondly, are you suggesting Eventlet sleep for "15 minute" then check the result of operation, without doing anything in between? IMHO, this would be very bad experience for end user! Because they ALWAYS have to wait for 15 minutes to move on regardless what operation they have issued. BR/Liyi From: mikalst...@gmail.com [mikalst...@gmail.com] on behalf of Michael Still [mi...@stillhq.com] Sent: Wednesday, 06 August 2014 10:57 PM To: OpenStack Development Mailing List (not for usage questions) Cc: Liyi Meng Subject: Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval Maybe we should change how we wait? I get that we don't want to sit around forever, but perhaps we should specify a total maximum time to wait instead of a number of iterations of a loop? Something like "15 minutes should be long enough for anyone!". Eventlet sleeps are also pretty cheap, so having a bigger sleep time inside them just means that we overshoot more than we would otherwise. Michael On Thu, Aug 7, 2014 at 3:54 AM, Jay Pipes wrote: > Hi Stackers! > > So, Liyi Meng has an interesting patch up for Nova: > > https://review.openstack.org/#/c/104876 > > that changes the way that the interval and number of retries is calculated > for a piece of code that waits around for a block device mapping to become > active. > > Namely, the patch discards the value of the following configuration options > *if the volume size is not 0* (which is a typical case): > > * CONF.block_device_allocate_retries_interval > * CONF.block_device_allocate_retries > > and in their place, instead uses a hard-coded 60 max number of retries and > calculates a more "appropriate" interval by looking at the size of the > volume to be created. The algorithm uses the sensible idea that it will take > longer to allocate larger volumes than smaller volumes, and therefore the > interval time for larger volumes should be longer than smaller ones. > > So... here's the question: since this code essentially makes the above two > configuration options obselete for the majority of cases (where volume size > is not 0), should we do one of the following? > > 1) We should just deprecate both the options, with a note in the option help > text that these options are not used when volume size is not 0, and that the > interval is calculated based on volume size > > or > > 2) We should deprecate the CONF.block_device_allocate_retries_interval > option only, and keep the CONF.block_device_allocate_retries configuration > option as-is, changing the help text to read something like "Max number of > retries. We calculate the interval of the retry based on the size of the > volume." > > I bring this up on the mailing list because I think Liyi's patch offers an > interesting future direction to the way that we think about our retry > approach in Nova. Instead of having hard-coded or configurable interval > times, I think Liyi's approach of calculating the interval length based on > some input values is a good direction to take. > > Thoughts? > > -jay > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
Maybe we should change how we wait? I get that we don't want to sit around forever, but perhaps we should specify a total maximum time to wait instead of a number of iterations of a loop? Something like "15 minutes should be long enough for anyone!". Eventlet sleeps are also pretty cheap, so having a bigger sleep time inside them just means that we overshoot more than we would otherwise. Michael On Thu, Aug 7, 2014 at 3:54 AM, Jay Pipes wrote: > Hi Stackers! > > So, Liyi Meng has an interesting patch up for Nova: > > https://review.openstack.org/#/c/104876 > > that changes the way that the interval and number of retries is calculated > for a piece of code that waits around for a block device mapping to become > active. > > Namely, the patch discards the value of the following configuration options > *if the volume size is not 0* (which is a typical case): > > * CONF.block_device_allocate_retries_interval > * CONF.block_device_allocate_retries > > and in their place, instead uses a hard-coded 60 max number of retries and > calculates a more "appropriate" interval by looking at the size of the > volume to be created. The algorithm uses the sensible idea that it will take > longer to allocate larger volumes than smaller volumes, and therefore the > interval time for larger volumes should be longer than smaller ones. > > So... here's the question: since this code essentially makes the above two > configuration options obselete for the majority of cases (where volume size > is not 0), should we do one of the following? > > 1) We should just deprecate both the options, with a note in the option help > text that these options are not used when volume size is not 0, and that the > interval is calculated based on volume size > > or > > 2) We should deprecate the CONF.block_device_allocate_retries_interval > option only, and keep the CONF.block_device_allocate_retries configuration > option as-is, changing the help text to read something like "Max number of > retries. We calculate the interval of the retry based on the size of the > volume." > > I bring this up on the mailing list because I think Liyi's patch offers an > interesting future direction to the way that we think about our retry > approach in Nova. Instead of having hard-coded or configurable interval > times, I think Liyi's approach of calculating the interval length based on > some input values is a good direction to take. > > Thoughts? > > -jay > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] Deprecating CONF.block_device_allocate_retries_interval
Hi Stackers! So, Liyi Meng has an interesting patch up for Nova: https://review.openstack.org/#/c/104876 that changes the way that the interval and number of retries is calculated for a piece of code that waits around for a block device mapping to become active. Namely, the patch discards the value of the following configuration options *if the volume size is not 0* (which is a typical case): * CONF.block_device_allocate_retries_interval * CONF.block_device_allocate_retries and in their place, instead uses a hard-coded 60 max number of retries and calculates a more "appropriate" interval by looking at the size of the volume to be created. The algorithm uses the sensible idea that it will take longer to allocate larger volumes than smaller volumes, and therefore the interval time for larger volumes should be longer than smaller ones. So... here's the question: since this code essentially makes the above two configuration options obselete for the majority of cases (where volume size is not 0), should we do one of the following? 1) We should just deprecate both the options, with a note in the option help text that these options are not used when volume size is not 0, and that the interval is calculated based on volume size or 2) We should deprecate the CONF.block_device_allocate_retries_interval option only, and keep the CONF.block_device_allocate_retries configuration option as-is, changing the help text to read something like "Max number of retries. We calculate the interval of the retry based on the size of the volume." I bring this up on the mailing list because I think Liyi's patch offers an interesting future direction to the way that we think about our retry approach in Nova. Instead of having hard-coded or configurable interval times, I think Liyi's approach of calculating the interval length based on some input values is a good direction to take. Thoughts? -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev