Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-12 Thread John Griffith
On Tue, Nov 12, 2013 at 8:46 AM, Solly Ross sr...@redhat.com wrote:
 I'd like to get some sort of consensus on this before I start working on it.  
 Now that people are back from Summit, what would you propose?

 Best Regards,
 Solly Ross

 - Original Message -
 From: Solly Ross sr...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Tuesday, November 5, 2013 10:40:48 AM
 Subject: Re: [openstack-dev] Improvement of Cinder API wrt 
 https://bugs.launchpad.net/nova/+bug/1213953

 Also, that's still an overly complicated process for one or two VMs.  The 
 idea behind the Nova command was to minimize the steps in the 
 image -> volume -> VM process for a single VM.

 - Original Message -
 From: Chris Friesen chris.frie...@windriver.com
 To: openstack-dev@lists.openstack.org
 Sent: Tuesday, November 5, 2013 9:23:39 AM
 Subject: Re: [openstack-dev] Improvement of Cinder API wrt  
 https://bugs.launchpad.net/nova/+bug/1213953

 Wouldn't you still need variable timeouts?  I'm assuming that copying
 multi-gig cinder volumes might take a while, even if it's local.  (Or
 are you assuming copy-on-write?)

 Chris

 On 11/05/2013 01:43 AM, Caitlin Bestler wrote:
 Replication of snapshots is one solution to this.

 You create a Cinder volume once, snapshot it, then replicate it to the
 hosts that need it (this is the piece currently missing). Then you clone
 there.

 I will be giving a conference session on this and other uses of snapshots
 in the last time slot Wednesday.

 On Nov 5, 2013 5:58 AM, Solly Ross sr...@redhat.com wrote:

 So,
 There's currently an outstanding issue with regards to a Nova
 shortcut command that creates a volume from an image and then boots
 from it in one fell swoop.  The gist of the issue is that there is
 currently a set timeout which can time out before the volume
 creation has finished (it's designed to time out in case there is an
 error), in cases where the image download or volume creation takes
 an extended period of time (e.g. under a Gluster backend for Cinder
 with certain network conditions).

 The proposed solution is a modification to the Cinder API to provide
 more detail on what exactly is going on, so that we could
 programmatically tune the timeout.  My initial thought is to create
 a new column in the Volume table called 'status_detail' to provide
 more detailed information about the current status.  For instance,
 for the 'downloading' status, we could have 'status_detail' be the
 completion percentage or JSON containing the total size and the
 current amount copied.  This way, at each interval we could check to
 see if the amount copied had changed, and trigger the timeout if it
 had not, instead of blindly assuming that the operation will
 complete within a given amount of time.

 What do people think?  Would there be a better way to do this?

 Best Regards,
 Solly Ross

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




I think the best solution here is to clean up the setting of
error status for volumes during create/download and skip the timeout
altogether.  Last time I looked, even this wasn't in bad shape
(with the exception of the phantom 'VG doesn't exist' failure that none
of us seem to be able to reproduce).  I'm not a fan of complex variable
timeout algorithms, and I'm even less of a fan of adding API
functions to gather timeout info.

I'd like to hear whether callbacks actually offer a solution that the
rest of us just aren't seeing here; I don't see how they solve the
problem.



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-12 Thread Caitlin Bestler

On 11/12/2013 8:09 AM, John Griffith wrote:

On Tue, Nov 12, 2013 at 8:46 AM, Solly Ross sr...@redhat.com wrote:

I'd like to get some sort of consensus on this before I start working on it.  
Now that people are back from Summit, what would you propose?

Best Regards,
Solly Ross

- Original Message -
From: Solly Ross sr...@redhat.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: Tuesday, November 5, 2013 10:40:48 AM
Subject: Re: [openstack-dev] Improvement of Cinder API wrt 
https://bugs.launchpad.net/nova/+bug/1213953

Also, that's still an overly complicated process for one or two VMs.  The idea behind 
the Nova command was to minimize the steps in the image -> volume -> VM process for 
a single VM.



Complexity is not an issue. Bandwidth and latency are issues.

Any solution that achieves the user's objectives can be managed by a
taskflow, and it will be simple for the user to apply. The amount of code
involved is low on the list of factors to compare; taking extra time and
consuming extra bandwidth that were not required are serious issues.

My assumption is that the Cinder backend will be able to employ
copy-on-write when cloning volumes, to at least make a thinly provisioned
version available almost instantly (even if the full space is allocated
and then copied asynchronously). Permanently thin clones just require
that the relationship be tracked. Currently that is up to the volume
driver, but we could always make these relationships legitimate by
recognizing them in Cinder proper.


The goal here is not to require new behaviors of backends, but to enable
solutions that already exist to be deployed to the benefit of end users.
Requiring synchronous multi-GB copies (locally, or even worse over the
network) is not a minor price that we should expect customers to endure
for the sake of software uniformity.





Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-12 Thread John Griffith
On Tue, Nov 12, 2013 at 10:25 AM, Caitlin Bestler
caitlin.best...@nexenta.com wrote:
 On 11/12/2013 8:09 AM, John Griffith wrote:

 On Tue, Nov 12, 2013 at 8:46 AM, Solly Ross sr...@redhat.com wrote:

 I'd like to get some sort of consensus on this before I start working on
 it.  Now that people are back from Summit, what would you propose?

 Best Regards,
 Solly Ross

 - Original Message -
 From: Solly Ross sr...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org
 Sent: Tuesday, November 5, 2013 10:40:48 AM
 Subject: Re: [openstack-dev] Improvement of Cinder API wrt
 https://bugs.launchpad.net/nova/+bug/1213953

 Also, that's still an overly complicated process for one or two VMs.  The
 idea behind the Nova command was to minimize the steps in the
 image -> volume -> VM process for a single VM.


 Complexity is not an issue. Bandwidth and latency are issues.

 Any solution that achieves the user's objectives can be managed by a taskflow,
 and it will be simple for the user to apply. The amount of code
 involved is low on the list of factors to compare; taking extra time
 and consuming extra bandwidth that were not required are serious issues.

 My assumption is that the Cinder backend will be able to employ
 copy-on-write when cloning volumes, to at least make a thinly provisioned
 version available almost instantly (even if the full space is allocated and
 then copied asynchronously). Permanently thin clones just require that the
 relationship be tracked. Currently that is up to the volume driver, but we
 could always make these relationships legitimate by recognizing them in
 Cinder proper.

Sorry, but I'm not seeing where you're going with this in relation to
the question being asked.  The question is how to deal with creating a
new bootable volume from the nova boot command and how to tell whether
it timed out or errored while waiting for creation.  I'm not sure I'm
following your solution here.  In an ideal scenario, yes, if the backend
already has a volume with the image available it could use cloning or
snapshot features, but that's a pretty significant prerequisite and I'm
not sure how it relates to the general problem being discussed.


 The goal here is not to require new behaviors of backends, but to enable
 solutions that already exist to be deployed to the benefit of end users.
 Requiring synchronous multi-GB copies (locally, or even worse over the
 network) is not a minor price that we should expect customers to endure
 for the sake of software uniformity.






Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-05 Thread Chris Friesen

On 11/05/2013 01:27 AM, Avishay Traeger wrote:


I think the proper fix is to make sure that Cinder is moving the volume
into 'error' state in all cases where there is an error.  Nova can then
poll as long as it's in the 'downloading' state, until it's 'available' or
'error'.  Is there a reason why Cinder would legitimately get stuck in
'downloading'?


There's always the 'cinder service crashed and couldn't restart' case. :)

Chris




Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-05 Thread John Griffith
On Nov 5, 2013 3:33 PM, Avishay Traeger avis...@il.ibm.com wrote:

 So while doubling the timeout will fix some cases, there will be cases
 with larger volumes and/or slower systems where the bug will still hit.
 Even timing out on the download progress can lead to unnecessary timeouts
 (if it's really slow, or the volume is really big, it can stay at 5% for
 some time).

 I think the proper fix is to make sure that Cinder is moving the volume
 into 'error' state in all cases where there is an error.  Nova can then
 poll as long as it's in the 'downloading' state, until it's 'available' or
 'error'.

Agree

 Is there a reason why Cinder would legitimately get stuck in
 'downloading'?

 Thanks,
 Avishay



 From:   John Griffith john.griff...@solidfire.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org,
 Date:   11/05/2013 07:41 AM
 Subject:Re: [openstack-dev] Improvement of Cinder API wrt
 https://bugs.launchpad.net/nova/+bug/1213953



 On Tue, Nov 5, 2013 at 7:27 AM, John Griffith
 john.griff...@solidfire.com wrote:
  On Tue, Nov 5, 2013 at 6:29 AM, Chris Friesen
  chris.frie...@windriver.com wrote:
  On 11/04/2013 03:49 PM, Solly Ross wrote:
 
  So, There's currently an outstanding issue with regards to a Nova
  shortcut command that creates a volume from an image and then boots
  from it in one fell swoop.  The gist of the issue is that there is
  currently a set timeout which can time out before the volume creation
  has finished (it's designed to time out in case there is an error),
  in cases where the image download or volume creation takes an
  extended period of time (e.g. under a Gluster backend for Cinder with
  certain network conditions).
 
  The proposed solution is a modification to the Cinder API to provide
  more detail on what exactly is going on, so that we could
  programmatically tune the timeout.  My initial thought is to create a
  new column in the Volume table called 'status_detail' to provide more
  detailed information about the current status.  For instance, for the
  'downloading' status, we could have 'status_detail' be the completion
  percentage or JSON containing the total size and the current amount
  copied.  This way, at each interval we could check to see if the
  amount copied had changed, and trigger the timeout if it had not,
  instead of blindly assuming that the operation will complete within a
  given amount of time.
 
  What do people think?  Would there be a better way to do this?
 
 
  The only other option I can think of would be some kind of callback
  that cinder could explicitly call to drive updates and/or notifications
  of faults rather than needing to wait for a timeout.  Possibly a
  combination of both would be best; that way you could add a --poll
  option to the create volume and boot CLI command.
 
  I come from the kernel-hacking world and most things there involve
  event-driven callbacks.  Looking at the openstack code I was kind of
  surprised to see hardcoded timeouts and RPC casts with no callbacks to
  indicate completion.
 
  Chris
 
 

 I believe you're referring to [1], which was closed after a patch was
 added to nova to double the timeout length.  Based on the comments it
 sounds like you're still seeing issues on some Gluster (and maybe other)
 setups?

 Rather than mess with the API in order to debug, why not use the info
 in the cinder logs?

 [1] https://bugs.launchpad.net/nova/+bug/1213953



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-05 Thread Chris Friesen
Wouldn't you still need variable timeouts?  I'm assuming that copying 
multi-gig cinder volumes might take a while, even if it's local.  (Or 
are you assuming copy-on-write?)


Chris

On 11/05/2013 01:43 AM, Caitlin Bestler wrote:

Replication of snapshots is one solution to this.

You create a Cinder volume once, snapshot it, then replicate it to the
hosts that need it (this is the piece currently missing). Then you clone
there.

I will be giving a conference session on this and other uses of snapshots
in the last time slot Wednesday.

On Nov 5, 2013 5:58 AM, Solly Ross sr...@redhat.com wrote:

So,
There's currently an outstanding issue with regards to a Nova
shortcut command that creates a volume from an image and then boots
from it in one fell swoop.  The gist of the issue is that there is
currently a set timeout which can time out before the volume
creation has finished (it's designed to time out in case there is an
error), in cases where the image download or volume creation takes
an extended period of time (e.g. under a Gluster backend for Cinder
with certain network conditions).

The proposed solution is a modification to the Cinder API to provide
more detail on what exactly is going on, so that we could
programmatically tune the timeout.  My initial thought is to create
a new column in the Volume table called 'status_detail' to provide
more detailed information about the current status.  For instance,
for the 'downloading' status, we could have 'status_detail' be the
completion percentage or JSON containing the total size and the
current amount copied.  This way, at each interval we could check to
see if the amount copied had changed, and trigger the timeout if it
had not, instead of blindly assuming that the operation will
complete within a given amount of time.

What do people think?  Would there be a better way to do this?

Best Regards,
Solly Ross



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-05 Thread Solly Ross
Also, that's still an overly complicated process for one or two VMs.  The idea 
behind the Nova command was to minimize the steps in the image -> volume -> VM 
process for a single VM.

- Original Message -
From: Chris Friesen chris.frie...@windriver.com
To: openstack-dev@lists.openstack.org
Sent: Tuesday, November 5, 2013 9:23:39 AM
Subject: Re: [openstack-dev] Improvement of Cinder API wrt  
https://bugs.launchpad.net/nova/+bug/1213953

Wouldn't you still need variable timeouts?  I'm assuming that copying 
multi-gig cinder volumes might take a while, even if it's local.  (Or 
are you assuming copy-on-write?)

Chris

On 11/05/2013 01:43 AM, Caitlin Bestler wrote:
 Replication of snapshots is one solution to this.

 You create a Cinder volume once, snapshot it, then replicate it to the
 hosts that need it (this is the piece currently missing). Then you clone
 there.

 I will be giving a conference session on this and other uses of snapshots
 in the last time slot Wednesday.

 On Nov 5, 2013 5:58 AM, Solly Ross sr...@redhat.com wrote:

 So,
 There's currently an outstanding issue with regards to a Nova
 shortcut command that creates a volume from an image and then boots
 from it in one fell swoop.  The gist of the issue is that there is
 currently a set timeout which can time out before the volume
 creation has finished (it's designed to time out in case there is an
 error), in cases where the image download or volume creation takes
 an extended period of time (e.g. under a Gluster backend for Cinder
 with certain network conditions).

 The proposed solution is a modification to the Cinder API to provide
 more detail on what exactly is going on, so that we could
 programmatically tune the timeout.  My initial thought is to create
 a new column in the Volume table called 'status_detail' to provide
 more detailed information about the current status.  For instance,
 for the 'downloading' status, we could have 'status_detail' be the
 completion percentage or JSON containing the total size and the
 current amount copied.  This way, at each interval we could check to
 see if the amount copied had changed, and trigger the timeout if it
 had not, instead of blindly assuming that the operation will
 complete within a given amount of time.

 What do people think?  Would there be a better way to do this?

 Best Regards,
 Solly Ross



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-05 Thread Avishay Traeger
Chris Friesen chris.frie...@windriver.com wrote on 11/05/2013 10:21:07
PM:
  I think the proper fix is to make sure that Cinder is moving the volume
  into 'error' state in all cases where there is an error.  Nova can then
  poll as long as it's in the 'downloading' state, until it's 'available'
  or 'error'.  Is there a reason why Cinder would legitimately get stuck
  in 'downloading'?

 There's always the 'cinder service crashed and couldn't restart' case. :)

Well, we should fix that too. :)
Your Cinder processes should be properly HA'ed, and yes, Cinder needs to be
robust enough to resume operations.
I don't see how adding a callback would help - wouldn't you still need to
time out if you don't get a callback?

Thanks,
Avishay




[openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-04 Thread Solly Ross
So,
There's currently an outstanding issue with regards to a Nova shortcut command 
that creates a volume from an image and then boots from it in one fell swoop.  
The gist of the issue is that there is currently a set timeout which can time 
out before the volume creation has finished (it's designed to time out in case 
there is an error), in cases where the image download or volume creation takes 
an extended period of time (e.g. under a Gluster backend for Cinder with 
certain network conditions).

The proposed solution is a modification to the Cinder API to provide more 
detail on what exactly is going on, so that we could programmatically tune the 
timeout.  My initial thought is to create a new column in the Volume table 
called 'status_detail' to provide more detailed information about the current 
status.  For instance, for the 'downloading' status, we could have 
'status_detail' be the completion percentage or JSON containing the total size 
and the current amount copied.  This way, at each interval we could check to 
see if the amount copied had changed, and trigger the timeout if it had not, 
instead of blindly assuming that the operation will complete within a given 
amount of time.
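The stall-based timeout described above could be sketched roughly as follows. This is a minimal illustration, not actual Nova code; `get_status` is a hypothetical stand-in for a Cinder API call that would return the volume status plus the proposed 'status_detail' progress counter:

```python
import time


def wait_for_volume(get_status, poll_interval=2.0, stall_limit=60.0):
    """Poll a volume until it reaches 'available' or 'error'.

    Instead of one fixed deadline, time out only if the amount copied
    (as the proposed 'status_detail' field would report it) stops
    changing for `stall_limit` seconds.  `get_status` is a hypothetical
    stand-in for a real Cinder status lookup; it returns
    (status, bytes_copied).
    """
    last_copied = -1
    last_progress = time.monotonic()
    while True:
        status, copied = get_status()
        if status in ('available', 'error'):
            return status
        if copied != last_copied:
            # Progress was made: reset the stall clock.
            last_copied = copied
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_limit:
            raise TimeoutError('volume creation stalled')
        time.sleep(poll_interval)
```

Under this scheme a slow-but-steady Gluster download never times out, while a genuinely stuck operation fails after `stall_limit` seconds of zero progress.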

What do people think?  Would there be a better way to do this?

Best Regards,
Solly Ross



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-04 Thread Chris Friesen

On 11/04/2013 03:49 PM, Solly Ross wrote:

So, There's currently an outstanding issue with regards to a Nova
shortcut command that creates a volume from an image and then boots
from it in one fell swoop.  The gist of the issue is that there is
currently a set timeout which can time out before the volume creation
has finished (it's designed to time out in case there is an error),
in cases where the image download or volume creation takes an
extended period of time (e.g. under a Gluster backend for Cinder with
certain network conditions).

The proposed solution is a modification to the Cinder API to provide
more detail on what exactly is going on, so that we could
programmatically tune the timeout.  My initial thought is to create a
new column in the Volume table called 'status_detail' to provide more
detailed information about the current status.  For instance, for the
'downloading' status, we could have 'status_detail' be the completion
percentage or JSON containing the total size and the current amount
copied.  This way, at each interval we could check to see if the
amount copied had changed, and trigger the timeout if it had not,
instead of blindly assuming that the operation will complete within a
given amount of time.

What do people think?  Would there be a better way to do this?


The only other option I can think of would be some kind of callback that 
cinder could explicitly call to drive updates and/or notifications of 
faults rather than needing to wait for a timeout.  Possibly a 
combination of both would be best, that way you could add a --poll 
option to the create volume and boot CLI command.
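A callback-driven waiter of the kind suggested above might look like the sketch below. Everything here is hypothetical (there is no such hook in Cinder today); the point is that the caller blocks on an explicit completion event rather than guessing a deadline, while still keeping a backstop timeout for the case where the service dies and no callback ever arrives:

```python
import threading


class VolumeWaiter:
    """Sketch of a callback-driven alternative to polling.

    A (hypothetical) Cinder-side notifier would invoke `on_update` as
    the volume changes state; the caller blocks in `wait` until a
    terminal state arrives.
    """

    TERMINAL = {'available', 'error'}

    def __init__(self):
        self._done = threading.Event()
        self.final_status = None

    def on_update(self, status, detail=None):
        # Called by the notifier on every state change; only terminal
        # states release the waiter.
        if status in self.TERMINAL:
            self.final_status = status
            self._done.set()

    def wait(self, timeout=None):
        # A backstop timeout is still needed: a crashed service sends
        # no callback at all.
        if not self._done.wait(timeout):
            raise TimeoutError('no completion callback received')
        return self.final_status
```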


I come from the kernel-hacking world and most things there involve 
event-driven callbacks.  Looking at the openstack code I was kind of 
surprised to see hardcoded timeouts and RPC casts with no callbacks to 
indicate completion.


Chris



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-04 Thread John Griffith
On Tue, Nov 5, 2013 at 7:27 AM, John Griffith
john.griff...@solidfire.com wrote:
 On Tue, Nov 5, 2013 at 6:29 AM, Chris Friesen
 chris.frie...@windriver.com wrote:
 On 11/04/2013 03:49 PM, Solly Ross wrote:

 So, There's currently an outstanding issue with regards to a Nova
 shortcut command that creates a volume from an image and then boots
 from it in one fell swoop.  The gist of the issue is that there is
 currently a set timeout which can time out before the volume creation
 has finished (it's designed to time out in case there is an error),
 in cases where the image download or volume creation takes an
 extended period of time (e.g. under a Gluster backend for Cinder with
 certain network conditions).

 The proposed solution is a modification to the Cinder API to provide
 more detail on what exactly is going on, so that we could
 programmatically tune the timeout.  My initial thought is to create a
 new column in the Volume table called 'status_detail' to provide more
 detailed information about the current status.  For instance, for the
 'downloading' status, we could have 'status_detail' be the completion
 percentage or JSON containing the total size and the current amount
 copied.  This way, at each interval we could check to see if the
 amount copied had changed, and trigger the timeout if it had not,
 instead of blindly assuming that the operation will complete within a
 given amount of time.

 What do people think?  Would there be a better way to do this?


 The only other option I can think of would be some kind of callback that
 cinder could explicitly call to drive updates and/or notifications of faults
 rather than needing to wait for a timeout.  Possibly a combination of both
 would be best, that way you could add a --poll option to the create volume
 and boot CLI command.

 I come from the kernel-hacking world and most things there involve
 event-driven callbacks.  Looking at the openstack code I was kind of
 surprised to see hardcoded timeouts and RPC casts with no callbacks to
 indicate completion.

 Chris



I believe you're referring to [1], which was closed after a patch was
added to nova to double the timeout length.  Based on the comments it
sounds like you're still seeing issues on some Gluster (and maybe other)
setups?

Rather than mess with the API in order to debug, why not use the info
in the cinder logs?

[1] https://bugs.launchpad.net/nova/+bug/1213953



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-04 Thread Avishay Traeger
So while doubling the timeout will fix some cases, there will be cases with
larger volumes and/or slower systems where the bug will still hit.  Even
timing out on the download progress can lead to unnecessary timeouts (if
it's really slow, or volume is really big, it can stay at 5% for some
time).

I think the proper fix is to make sure that Cinder is moving the volume
into 'error' state in all cases where there is an error.  Nova can then
poll as long as it's in the 'downloading' state, until it's 'available' or
'error'.  Is there a reason why Cinder would legitimately get stuck in
'downloading'?
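The fix proposed above removes the progress heuristic entirely: Cinder guarantees that any failure lands the volume in 'error', so Nova only has to poll for a terminal state. A minimal sketch (with `get_status` as a hypothetical stand-in for a Cinder volume-status call, and an optional `max_wait` backstop for the crashed-service case raised elsewhere in the thread):

```python
import time


def poll_until_terminal(get_status, poll_interval=2.0, max_wait=None):
    """Poll with no progress heuristic: trust Cinder to set 'error' on
    any failure, so the only terminal states are 'available' and
    'error'.  `max_wait` remains only as a backstop for a crashed
    service that never reaches a terminal state."""
    deadline = None if max_wait is None else time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in ('available', 'error'):
            return status
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError('volume stuck in non-terminal state')
        time.sleep(poll_interval)
```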

Thanks,
Avishay



From:   John Griffith john.griff...@solidfire.com
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org,
Date:   11/05/2013 07:41 AM
Subject:Re: [openstack-dev] Improvement of Cinder API wrt
https://bugs.launchpad.net/nova/+bug/1213953



On Tue, Nov 5, 2013 at 7:27 AM, John Griffith
john.griff...@solidfire.com wrote:
 On Tue, Nov 5, 2013 at 6:29 AM, Chris Friesen
 chris.frie...@windriver.com wrote:
 On 11/04/2013 03:49 PM, Solly Ross wrote:

 So, There's currently an outstanding issue with regards to a Nova
 shortcut command that creates a volume from an image and then boots
 from it in one fell swoop.  The gist of the issue is that there is
 currently a set timeout which can time out before the volume creation
 has finished (it's designed to time out in case there is an error),
 in cases where the image download or volume creation takes an
 extended period of time (e.g. under a Gluster backend for Cinder with
 certain network conditions).

 The proposed solution is a modification to the Cinder API to provide
 more detail on what exactly is going on, so that we could
 programmatically tune the timeout.  My initial thought is to create a
 new column in the Volume table called 'status_detail' to provide more
 detailed information about the current status.  For instance, for the
 'downloading' status, we could have 'status_detail' be the completion
 percentage or JSON containing the total size and the current amount
 copied.  This way, at each interval we could check to see if the
 amount copied had changed, and trigger the timeout if it had not,
 instead of blindly assuming that the operation will complete within a
 given amount of time.

 What do people think?  Would there be a better way to do this?


 The only other option I can think of would be some kind of callback that
 cinder could explicitly call to drive updates and/or notifications of
 faults rather than needing to wait for a timeout.  Possibly a combination
 of both would be best; that way you could add a --poll option to the
 create volume and boot CLI command.

 I come from the kernel-hacking world and most things there involve
 event-driven callbacks.  Looking at the openstack code I was kind of
 surprised to see hardcoded timeouts and RPC casts with no callbacks to
 indicate completion.

 Chris



I believe you're referring to [1], which was closed after a patch was
added to nova to double the timeout length.  Based on the comments it
sounds like you're still seeing issues on some Gluster (and maybe other)
setups?

Rather than mess with the API in order to debug, why not use the info
in the cinder logs?

[1] https://bugs.launchpad.net/nova/+bug/1213953



Re: [openstack-dev] Improvement of Cinder API wrt https://bugs.launchpad.net/nova/+bug/1213953

2013-11-04 Thread Caitlin Bestler
Replication of snapshots is one solution to this.

You create a Cinder volume once, snapshot it, then replicate it to the hosts
that need it (this is the piece currently missing). Then you clone there.

I will be giving a conference session on this and other uses of snapshots in
the last time slot Wednesday.
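The fan-out described above can be sketched as a simple workflow. All the callables here are hypothetical stand-ins for backend operations (this is not real Cinder API); `replicate` is the missing piece the message mentions. The payoff is that the expensive image download happens exactly once, and each VM then gets a near-instant copy-on-write clone:

```python
def provision_vms_from_image(image, hosts, create_volume, snapshot,
                             replicate, clone):
    """Sketch of the snapshot-replication workflow: one expensive image
    download, then cheap copy-on-write fan-out per host.  Each callable
    is a hypothetical stand-in for a backend operation."""
    master = create_volume(image)           # expensive: one image download
    snap = snapshot(master)                 # cheap point-in-time snapshot
    volumes = []
    for host in hosts:
        local_snap = replicate(snap, host)  # missing piece: ship snapshot once per host
        volumes.append(clone(local_snap))   # COW clone: near-instant
    return volumes
```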
 On Nov 5, 2013 5:58 AM, Solly Ross sr...@redhat.com wrote:

 So,
 There's currently an outstanding issue with regards to a Nova shortcut
 command that creates a volume from an image and then boots from it in one
 fell swoop.  The gist of the issue is that there is currently a set timeout
 which can time out before the volume creation has finished (it's designed
 to time out in case there is an error), in cases where the image download
 or volume creation takes an extended period of time (e.g. under a Gluster
 backend for Cinder with certain network conditions).

 The proposed solution is a modification to the Cinder API to provide more
 detail on what exactly is going on, so that we could programmatically tune
 the timeout.  My initial thought is to create a new column in the Volume
 table called 'status_detail' to provide more detailed information about the
 current status.  For instance, for the 'downloading' status, we could have
 'status_detail' be the completion percentage or JSON containing the total
 size and the current amount copied.  This way, at each interval we could
 check to see if the amount copied had changed, and trigger the timeout if
 it had not, instead of blindly assuming that the operation will complete
 within a given amount of time.

 What do people think?  Would there be a better way to do this?

 Best Regards,
 Solly Ross
