Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On Tue, Nov 12, 2013 at 8:46 AM, Solly Ross sr...@redhat.com wrote:
> I'd like to get some sort of consensus on this before I start working on it. Now that people are back from Summit, what would you propose?
> Best Regards, Solly Ross

I think the best solution here is to clean up the setting of error status for volumes during create/download and skip the timeout altogether. Last time I looked, even this wasn't in that bad a state (with the exception of the phantom "VG doesn't exist" error that none of us seem to be able to recreate). I'm not a fan of complex variable-timeout algorithms, and I'm even less of a fan of adding API functions to gather timeout info. I would like to hear whether there's actually a solution offered by callbacks that the rest of us just aren't seeing here; I don't know how that solves the problem, though.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
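The poll-until-terminal approach sketched above could look roughly like the following. This is illustrative Python only, not actual Nova code; `get_status` stands in for whatever Cinder API call reports the volume's status. The point is that no arbitrary deadline is needed once Cinder reliably reaches 'error':

```python
import time

# Terminal volume states: once Cinder reliably moves failed creations into
# 'error', the caller can simply wait for a terminal state instead of
# guessing a deadline.
TERMINAL = {"available", "error"}

def wait_for_volume(get_status, poll_interval=1.0):
    """Poll until the volume reaches a terminal state; no fixed timeout."""
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
```

The trade-off, as noted elsewhere in the thread, is that this only works if Cinder never gets stuck in a non-terminal state such as 'downloading'.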
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On 11/12/2013 8:09 AM, John Griffith wrote:
> On Tue, Nov 12, 2013 at 8:46 AM, Solly Ross sr...@redhat.com wrote:
>> Also, that's still an overly complicated process for one or two VMs. The idea behind the Nova command was to minimize the steps in the image-volume-VM process for a single VM.

Complexity is not an issue. Bandwidth and latency are issues. Any solution that achieves the user objectives can be managed by a taskflow. It will be simple for the user to apply. The amount of code involved is relatively low on the list of factors to compare. Taking extra time and consuming extra bandwidth that were not required are serious issues.

My assumption is that the Cinder backend will be able to employ copy-on-write when cloning volumes, to at least make a thinly provisioned version available almost instantly (even if the full space is allocated and then copied asynchronously). Permanently thin clones just require that the relationship be tracked. Currently that is up to the volume driver, but we could always make these relationships legitimate by recognizing them in Cinder proper.

The goal here is not to require new behaviors of backends, but to enable solutions that already exist to be deployed to the benefit of end users. Requiring synchronous multi-GB copies (locally, or even worse over the network) is not a minor price that we should expect customers to endure for the sake of software uniformity.
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On Tue, Nov 12, 2013 at 10:25 AM, Caitlin Bestler caitlin.best...@nexenta.com wrote:
> Complexity is not an issue. Bandwidth and latency are issues. Any solution that achieves the user objectives can be managed by a taskflow. It will be simple for the user to apply. My assumption is that the cinder backend will be able to employ copy-on-write when cloning volumes to at least make a thinly provisioned version available almost instantly.

Sorry, but I'm not seeing where you're going with this in relation to the question being asked. The question is how to deal with creating a new bootable volume from the nova boot command and being able to tell whether it has timed out or errored while waiting for creation.

Not sure I'm following your solution here. In an ideal scenario, yes, if the backend has a volume with the image already available, it could utilize things like cloning or snapshot features, but that's a pretty significant pre-req, and I'm not sure how it relates to the general problem that's being discussed.

> The goal here is not to require new behaviors of backends, but to enable solutions that already exist to be deployed to the benefit of end users. Requiring synchronous multi-GB copies (locally or even worse over the network) is not a minor price that we should expect customers to endure for the sake of software uniformity.
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On 11/05/2013 01:27 AM, Avishay Traeger wrote:
> I think the proper fix is to make sure that Cinder is moving the volume into 'error' state in all cases where there is an error. Nova can then poll as long as it's in the 'downloading' state, until it's 'available' or 'error'. Is there a reason why Cinder would legitimately get stuck in 'downloading'?

There's always the "cinder service crashed and couldn't restart" case. :)

Chris
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On Nov 5, 2013 3:33 PM, Avishay Traeger avis...@il.ibm.com wrote:
> So while doubling the timeout will fix some cases, there will be cases with larger volumes and/or slower systems where the bug will still hit. Even timing out on the download progress can lead to unnecessary timeouts (if it's really slow, or the volume is really big, it can stay at 5% for some time). I think the proper fix is to make sure that Cinder is moving the volume into 'error' state in all cases where there is an error. Nova can then poll as long as it's in the 'downloading' state, until it's 'available' or 'error'.

Agree

> Is there a reason why Cinder would legitimately get stuck in 'downloading'?
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
Wouldn't you still need variable timeouts? I'm assuming that copying multi-gig Cinder volumes might take a while, even if it's local. (Or are you assuming copy-on-write?)

Chris

On 11/05/2013 01:43 AM, Caitlin Bestler wrote:
> Replication of snapshots is one solution to this. You create a Cinder Volume once, snapshot it, then replicate to the hosts that need it (this is the piece currently missing). Then you clone there.
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
Also, that's still an overly complicated process for one or two VMs. The idea behind the Nova command was to minimize the steps in the image-volume-VM process for a single VM.

- Original Message -
From: Chris Friesen chris.frie...@windriver.com
To: openstack-dev@lists.openstack.org
Sent: Tuesday, November 5, 2013 9:23:39 AM
Subject: Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
> Wouldn't you still need variable timeouts? I'm assuming that copying multi-gig cinder volumes might take a while, even if it's local. (Or are you assuming copy-on-write?)
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
Chris Friesen chris.frie...@windriver.com wrote on 11/05/2013 10:21:07 PM:
>> I think the proper fix is to make sure that Cinder is moving the volume into 'error' state in all cases where there is an error. Nova can then poll as long as it's in the 'downloading' state, until it's 'available' or 'error'. Is there a reason why Cinder would legitimately get stuck in 'downloading'?
> There's always the "cinder service crashed and couldn't restart" case. :)

Well, we should fix that too. :) Your Cinder processes should be properly HA'ed, and yes, Cinder needs to be robust enough to resume operations. I don't see how adding a callback would help; wouldn't you still need to time out if you don't get a callback?

Thanks, Avishay
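Avishay's objection above can be made concrete with a small sketch: even a callback-driven caller still needs a fallback deadline for the case where the callback never arrives (e.g. the service crashed). This is illustrative code, not an OpenStack interface; the names are hypothetical:

```python
import threading

# A caller that accepts a completion callback but falls back to a timeout
# if no callback is ever delivered.
class CallbackWaiter:
    def __init__(self):
        self._done = threading.Event()
        self.status = None

    def on_complete(self, status):
        # Invoked by the (hypothetical) notifying service.
        self.status = status
        self._done.set()

    def wait(self, timeout_s):
        # Event.wait returns False when the deadline expires first.
        if not self._done.wait(timeout=timeout_s):
            self.status = "timeout"
        return self.status
```

So callbacks reduce latency on the happy path, but they don't remove the need to decide how long "too long" is.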
[openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
So, there's currently an outstanding issue with regard to a Nova shortcut command that creates a volume from an image and then boots from it in one fell swoop. The gist of the issue is that there is currently a fixed timeout which can fire before the volume creation has finished (it's designed to time out in case there is an error), in cases where the image download or volume creation takes an extended period of time (e.g. under a Gluster backend for Cinder with certain network conditions).

The proposed solution is a modification to the Cinder API to provide more detail on what exactly is going on, so that we could programmatically tune the timeout. My initial thought is to create a new column in the Volume table called 'status_detail' to provide more detailed information about the current status. For instance, for the 'downloading' status, we could have 'status_detail' be the completion percentage, or JSON containing the total size and the current amount copied. This way, at each interval we could check whether the amount copied had changed, and trigger the timeout only if it had not, instead of blindly assuming that the operation will complete within a given amount of time.

What do people think? Would there be a better way to do this?

Best Regards,
Solly Ross
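To illustrate the stall-detection idea above: this is a sketch of the proposal, not existing Nova or Cinder code, and the shape of `status_detail` (a dict with `copied`/`total` byte counts) is an assumption about what the new column could carry. The caller times out only when the copy makes no progress between polls:

```python
import time

def wait_with_stall_timeout(get_volume, stall_limit=60.0, poll_interval=5.0):
    """Time out only when no copy progress is seen for `stall_limit` seconds."""
    last_copied = -1
    last_progress = time.monotonic()
    while True:
        vol = get_volume()
        if vol["status"] in ("available", "error"):
            return vol["status"]
        # Hypothetical status_detail field: {"copied": <bytes>, "total": <bytes>}
        copied = vol.get("status_detail", {}).get("copied", 0)
        if copied != last_copied:
            # Progress was made; reset the stall clock.
            last_copied = copied
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_limit:
            raise TimeoutError("no copy progress for %ss" % stall_limit)
        time.sleep(poll_interval)
```

A slow-but-moving download never trips the timeout, while a genuinely wedged one fails after `stall_limit` seconds rather than an arbitrary total deadline.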
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On 11/04/2013 03:49 PM, Solly Ross wrote:
> The proposed solution is a modification to the Cinder API to provide more detail on what exactly is going on, so that we could programmatically tune the timeout. My initial thought is to create a new column in the Volume table called 'status_detail' to provide more detailed information about the current status.

The only other option I can think of would be some kind of callback that Cinder could explicitly call to drive updates and/or notifications of faults, rather than needing to wait for a timeout. Possibly a combination of both would be best; that way you could add a --poll option to the volume-create and boot CLI commands.

I come from the kernel-hacking world, where most things involve event-driven callbacks. Looking at the OpenStack code I was kind of surprised to see hardcoded timeouts and RPC casts with no callbacks to indicate completion.

Chris
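The callback mechanism suggested above could be as simple as the following sketch. It is purely illustrative (no such interface exists in Cinder); the service would invoke registered callbacks on completion or fault instead of leaving callers to wait for a timeout:

```python
# Minimal registry of per-volume completion callbacks. A real implementation
# would sit behind the RPC/notification layer; this only shows the shape.
class VolumeEvents:
    def __init__(self):
        self._callbacks = {}

    def on_done(self, volume_id, callback):
        # Register a callable(volume_id, status) to fire once.
        self._callbacks.setdefault(volume_id, []).append(callback)

    def notify(self, volume_id, status):
        # Called by the service when the volume reaches a terminal state;
        # callbacks are consumed so each fires at most once.
        for cb in self._callbacks.pop(volume_id, []):
            cb(volume_id, status)
```

As Avishay notes later in the thread, a caller using this would still want a fallback timeout in case the notification is never delivered.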
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
On Tue, Nov 5, 2013 at 6:29 AM, Chris Friesen chris.frie...@windriver.com wrote:
> The only other option I can think of would be some kind of callback that cinder could explicitly call to drive updates and/or notifications of faults rather than needing to wait for a timeout. Possibly a combination of both would be best, that way you could add a --poll option to the create volume and boot CLI command.

I believe you're referring to [1], which was closed after a patch was added to Nova to double the timeout length. Based on the comments, it sounds like you're still seeing issues on some Gluster (maybe other) setups? Rather than mess with the API in order to debug, why don't you use the info in the cinder logs?

[1] https://bugs.launchpad.net/nova/+bug/1213953
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
So while doubling the timeout will fix some cases, there will be cases with larger volumes and/or slower systems where the bug will still hit. Even timing out on the download progress can lead to unnecessary timeouts (if it's really slow, or the volume is really big, it can stay at 5% for some time).

I think the proper fix is to make sure that Cinder is moving the volume into the 'error' state in all cases where there is an error. Nova can then poll as long as it's in the 'downloading' state, until it's 'available' or 'error'. Is there a reason why Cinder would legitimately get stuck in 'downloading'?

Thanks, Avishay

From: John Griffith john.griff...@solidfire.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 11/05/2013 07:41 AM
Subject: Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
> I believe you're referring to [1], which was closed after a patch was added to nova to double the timeout length. Based on comments, it sounds like you're still seeing issues on some Gluster (maybe other) setups? Rather than mess with the API in order to debug, why don't you use the info in the cinder-logs?
> [1] https://bugs.launchpad.net/nova/+bug/1213953
Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953
Replication of snapshots is one solution to this. You create a Cinder Volume once, snapshot it, then replicate to the hosts that need it (this is the piece currently missing). Then you clone there.

I will be giving an hour-long conference session on this and other uses of snapshots in the last time slot Wednesday.

On Nov 5, 2013 5:58 AM, Solly Ross sr...@redhat.com wrote:
> So, there's currently an outstanding issue with regards to a Nova shortcut command that creates a volume from an image and then boots from it in one fell swoop. The gist of the issue is that there is currently a set timeout which can time out before the volume creation has finished, in cases where the image download or volume creation takes an extended period of time. The proposed solution is a modification to the Cinder API to provide more detail on what exactly is going on, so that we could programmatically tune the timeout.
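The create-once / snapshot / clone flow described above could be sketched as follows. Treat this as an assumption about how the flow might look, not working deployment code: `cinder` is assumed to expose python-cinderclient-v2-style managers (`volumes.create`, `volume_snapshots.create`), and the replication step is exactly the piece noted as missing, so it appears only as a comment:

```python
def provision_boot_volume(cinder, image_id, size_gb):
    # 1. Create the golden volume from the image once (the slow step,
    #    paid a single time).
    golden = cinder.volumes.create(size=size_gb, name="golden",
                                   imageRef=image_id)
    # 2. Snapshot it.
    snap = cinder.volume_snapshots.create(golden.id, name="golden-snap")
    # 3. (Missing piece) replicate the snapshot to the backends/hosts
    #    that need it.
    # 4. Clone a bootable volume from the snapshot near each host; with
    #    copy-on-write backends this can be nearly instant.
    return cinder.volumes.create(size=size_gb, snapshot_id=snap.id,
                                 name="vm-root")
```

Whether step 4 avoids a multi-GB copy depends entirely on the backend's clone implementation, which is the crux of the timeout debate in this thread.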