Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Sylvain Bauza



On 24/09/2015 09:04, Duncan Thomas wrote:

Hi

I thought I was late on this thread, but looking at the time stamps, 
it is just something that escalated very quickly. I am honestly 
surprised a cross-project interaction option went from 'we don't seem 
to understand this' to 'deprecation merged' in 4 hours, with only a 12 
hour discussion on the mailing list, right at the end of a cycle when 
we're supposed to be stabilising features.




So, I agree it was maybe a bit too quick, hence the revert. That said, 
Nova master is now Mitaka, which means that the deprecation change was 
provided for the next cycle, not the one currently stabilising.


Anyway, I'm really all for discussing why Cinder needs to know the 
Nova AZs.


I proposed a session at the Tokyo summit for a discussion of Cinder 
AZs, since there was clear confusion about what they are intended for 
and how they should be configured.


Cool, count me in from the Nova standpoint.

Since then I've reached out to, and gotten good feedback from, a number 
of operators. There are two distinct configurations for AZ behaviour 
in cinder, and both sort-of worked until very recently.


1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of the 
operators who responded so far are using Ceph, though that isn't 
required). The storage takes care of availability concerns, and any AZ 
info from nova should just be ignored.


2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples 
storage to nova AZs. It may be that an AZ is used as a unit of 
scaling, or it could be a real storage failure domain. Either way, 
there are a number of operators who have this configuration and want 
to keep it. Storage can certainly have a failure domain, and limiting 
the scalability problem of storage to a single compute AZ can have 
definite advantages in failure scenarios. These people do not want 
cross-az attach.




Ahem, Nova AZs are not failure domains - I mean with the current 
implementation, in the sense that many people understand a failure 
domain, i.e. a physical unit of machines (a bay, a room, a floor, a 
datacenter).
All the AZs in Nova share the same control plane with the same message 
queue and database, which means that one failure can be propagated to 
the other AZs.


To be honest, there is one very specific use case where AZs *are* failure 
domains: when cells exactly match AZs (i.e. one AZ grouping all the 
hosts behind one cell). That's the very specific use case that Sam is 
mentioning in his email, and I certainly understand we need to keep that.


What AZs are in Nova is pretty well explained in a fairly old blog post: 
http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/


We also added a few comments in our developer doc here 
http://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs


tl;dr: AZs are aggregate metadata that makes those aggregates of compute 
nodes visible to users. Nothing more than that, no magic sauce. 
They're just a logical abstraction that can map to your physical 
deployment but, like I said, one that shares the same bus and DB.
Of course, you could still provide distinct networks between AZs, but 
that just gives you L2 isolation, not a real failure domain in the 
Business Continuity Plan sense.
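
To make the "just aggregate metadata" point concrete, here is a minimal 
sketch with python-novaclient showing that an AZ is nothing more than a 
host aggregate carrying an availability_zone value (the credentials, 
aggregate and host names are illustrative placeholders):

    from novaclient import client

    nova = client.Client('2', USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)

    # An "AZ" is just a host aggregate created with an availability_zone;
    # hosts added to that aggregate then show up under that AZ to users.
    agg = nova.aggregates.create('rack-a1', 'az-east')
    nova.aggregates.add_host(agg, 'compute-a1-01')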


What puzzles me is how Cinder manages a datacenter level of 
isolation given there is no cells concept, AFAIK. I assume that 
cinder-volume services belong to a specific datacenter, but how is their 
control plane managed? I can certainly understand the need for affinity 
placement between physical units, but I'm missing that piece, and 
consequently I wonder why Nova needs to provide AZs to Cinder in the 
general case.




My hope at the summit session was to agree these two configurations, 
discuss any scenarios not covered by these two configurations, and nail 
down the changes we need to get these to work properly. There's 
definitely been interest and activity in the operator community in 
making nova and cinder AZs interact, and every desired interaction 
I've gotten details about so far matches one of the above models.




I'm all with you about providing a way for users to get volume affinity 
for Nova. That's a long story I'm trying to consider and we are 
constantly trying to improve the nova scheduler interfaces so that other 
projects could provide resources to the nova scheduler for decision 
making. I just want to consider whether AZs are the best concept for 
that or whether we should do this some other way (again, because AZs are 
not what people expect).


Again, count me in for the Cinder session, and just lemme know when the 
session is planned so I could attend it.


-Sylvain




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Duncan Thomas
Hi

I thought I was late on this thread, but looking at the time stamps, it is
just something that escalated very quickly. I am honestly surprised a
cross-project interaction option went from 'we don't seem to understand
this' to 'deprecation merged' in 4 hours, with only a 12 hour discussion on
the mailing list, right at the end of a cycle when we're supposed to be
stabilising features.

I proposed a session at the Tokyo summit for a discussion of Cinder AZs,
since there was clear confusion about what they are intended for and how
they should be configured. Since then I've reached out to, and gotten good
feedback from, a number of operators. There are two distinct configurations
for AZ behaviour in cinder, and both sort-of worked until very recently.

1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of the operators
who responded so far are using Ceph, though that isn't required). The
storage takes care of availability concerns, and any AZ info from nova
should just be ignored.

2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples
storage to nova AZs. It may be that an AZ is used as a unit of scaling,
or it could be a real storage failure domain. Either way, there are a
number of operators who have this configuration and want to keep it.
Storage can certainly have a failure domain, and limiting the scalability
problem of storage to a single compute AZ can have definite advantages in
failure scenarios. These people do not want cross-az attach.
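
To make the two configurations concrete, here is a rough sketch of what 
each tends to look like in practice. The option names are from cinder.conf 
and nova.conf as I understand them (exact names and sections may vary by 
release), and the AZ names are purely illustrative:

    # Config 1: no AZs in cinder - effectively ignore whatever AZ nova sends
    # cinder.conf
    [DEFAULT]
    storage_availability_zone = nova
    allow_availability_zone_fallback = True   # Liberty option: fall back to the
                                               # default AZ instead of erroring

    # Config 2: cinder AZs mirror nova AZs, and no cross-AZ attach
    # cinder.conf on the cinder-volume hosts serving az-east
    [DEFAULT]
    storage_availability_zone = az-east

    # nova.conf
    [cinder]
    cross_az_attach = False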

My hope at the summit session was to agree these two configurations,
discuss any scenarios not covered by these two configurations, and nail down
the changes we need to get these to work properly. There's definitely been
interest and activity in the operator community in making nova and cinder
AZs interact, and every desired interaction I've gotten details about so
far matches one of the above models.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Walter A. Boring IV

>> To be honest this is probably my fault, AZs were pulled in as part of
>> the nova-volume migration to Cinder and just sort of died.  Quite
>> frankly I wasn't sure "what" to do with them but brought over the
>> concept and the zones that existed in Nova-Volume.  It's been an issue
>> since day 1 of Cinder, and as you note there are little hacks here and
>> there over the years to do different things.
>>
>> I think your question about whether they should be there at all or not
>> is a good one.  We have had some interest from folks lately that want to
>> couple Nova and Cinder AZ's (I'm really not sure of any details or
>> use-cases here).
>>
>> My opinion would be that, until somebody proposes a clear use case and a
>> need that actually works, we should consider deprecating it.
>>
>> While we're on the subject (kinda), I've never been very fond of having
>> Nova create the volume during the boot process either; there's a number of
>> things that go wrong here (timeouts almost guaranteed for a "real"
>> image) and some things that were missing last I looked, like type
>> selection, etc.
>>
>> We do have a proposal to talk about this at the Summit, so maybe we'll
>> have a decent primer before we get there :)
>>
>> Thanks,
>>
>> John
>>
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> Heh, so when I just asked in the cinder channel if we can just
> deprecate nova boot from volume with source=(image|snapshot|blank)
> (which automatically creates the volume and polls for it to be
> available) and then add a microversion that doesn't allow it, I was
> half joking, but I see we're on the same page.  This scenario seems to
> introduce a lot of orchestration work that nova shouldn't necessarily
> be in the business of handling.
I tend to agree with this.   I believe the ability to boot from a volume
with source=image was just a convenience thing and shortcut for users. 
As John stated, we know that we have issues with large images and/or
volumes here with timeouts.  If we want to continue to support this,
then the only way to make sure we don't run into timeout issues is to
look into a callback mechanism from Cinder to Nova, but that seems
awfully heavy-handed just to continue to support Nova orchestrating 
this.   The good thing about the Nova and Cinder clients/APIs is that
anyone can write a quick python script to do the orchestration
themselves, if we want to deprecate this.  I'm all for deprecating this.
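
As a rough illustration of what such a script would look like (a simplified 
sketch only: the client constructor arguments, the polling loop and all 
names/IDs are illustrative placeholders, not a polished tool), the 
orchestration is basically create the volume, wait for it to become 
available, then boot from it:

    import time

    from cinderclient import client as cinder_client
    from novaclient import client as nova_client

    cinder = cinder_client.Client('2', USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
    nova = nova_client.Client('2', USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)

    # Create a bootable volume from an image, then poll until it is usable.
    vol = cinder.volumes.create(size=20, imageRef=IMAGE_ID, name='boot-vol')
    while vol.status not in ('available', 'error'):
        time.sleep(5)
        vol = cinder.volumes.get(vol.id)
    if vol.status == 'error':
        raise RuntimeError('volume creation failed')

    # Boot from the pre-created volume instead of letting nova create it.
    bdm = [{'uuid': vol.id, 'source_type': 'volume',
            'destination_type': 'volume', 'boot_index': 0,
            'delete_on_termination': True}]
    nova.servers.create('my-server', image=None, flavor=FLAVOR_ID,
                        block_device_mapping_v2=bdm)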

Walt


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Mathieu Gagné
On 2015-09-24 11:53 AM, Walter A. Boring IV wrote:
> The good thing about the Nova and Cinder clients/APIs is that
> anyone can write a quick python script to do the orchestration
> themselves, if we want to deprecate this.  I'm all for deprecating this.

I don't like this kind of reasoning, which can justify almost anything.
It's easy to make those suggestions when you know Python. Please
consider non-technical/non-developer users when suggesting deprecating
features or proposing alternative solutions.

I could also say (in bad faith, I know): why have Heat when you can
write your own Python script. And yet, I don't think we would appreciate
anyone making such a controversial statement.

Our users don't know Python; they use 3rd party tools (which often don't
support orchestration) or the Horizon dashboard. They don't want
to have to learn Heat or Python so they can orchestrate volume creation
in place of Nova for a single instance. You don't write CloudFormation
templates on AWS just to boot an instance on volume. That's not the UX I
want to offer to my users.

-- 
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Sylvain Bauza



On 24/09/2015 18:16, Mathieu Gagné wrote:

On 2015-09-24 11:53 AM, Walter A. Boring IV wrote:

The good thing about the Nova and Cinder clients/APIs is that
anyone can write a quick python script to do the orchestration
themselves, if we want to deprecate this.  I'm all for deprecating this.

I don't like this kind of reasoning which can justify close to anything.
It's easy to make those suggestions when you know Python. Please
consider non-technical/non-developers users when suggesting deprecating
features or proposing alternative solutions.

I could also say (in bad faith, I know): why have Heat when you can
write your own Python script. And yet, I don't think we would appreciate
anyone making such a controversial statement.

Our users don't know Python, use 3rd party tools (which don't often
perform/support orchestration) or the Horizon dashboard. They don't want
to have to learn Heat or Python so they can orchestrate volume creation
in place of Nova for a single instance. You don't write CloudFormation
templates on AWS just to boot an instance on volume. That's not the UX I
want to offer to my users.



I'd tend to answer that if it's a user problem, then I would prefer to 
see the orchestration done by a CLI wrapper module in python-novaclient, 
like we have for host-evacuate (for example), and deprecate the REST and 
novaclient APIs. It would still be possible for users to get the 
orchestration done by the same CLI, but the API would no longer support it.


-Sylvain


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Mathieu Gagné
Hi Matt,

On 2015-09-24 1:45 PM, Matt Riedemann wrote:
> 
> 
> On 9/24/2015 11:50 AM, Mathieu Gagné wrote:
>>
>> May I suggest the following solutions:
>>
>> 1) Add ability to disable this whole AZ concept in Cinder so it doesn't
>> fail to create volumes when Nova asks for a specific AZ. This could
>> result in the same behavior as cinder.cross_az_attach config.
> 
> That's essentially what this does:
> 
> https://review.openstack.org/#/c/217857/
> 
> It defaults to False though so you have to be aware and set it if you're
> hitting this problem.
> 
> The nova block_device code that tries to create the volume and passes
> the nova AZ should have probably been taking into account the
> cinder.cross_az_attach config option, because just blindly passing it
> was the reason why cinder added that option.  There is now a change up
> for review to consider cinder.cross_az_attach in block_device:
> 
> https://review.openstack.org/#/c/225119/
> 
> But that's still making the assumption that we should be passing the AZ
> on the volume create request and will still fail if the AZ isn't in
> cinder (and allow_availability_zone_fallback=False in cinder.conf).
> 
> In talking with Duncan this morning he's going to propose a spec for an
> attempt to clean some of this up and decouple nova from handling this
> logic.  Basically a new Cinder API where you give it an AZ and it tells
> you if that's OK.  We could then use this on the nova side before we
> ever get to the compute node and fail.

IMO, the confusion comes from what I consider a wrong usage of AZ. To
quote Sylvain Bauza from a recent review [1][2]:

"because Nova AZs and Cinder AZs are very different failure domains"

This is not the concept of AZ I came to know from cloud providers,
where an AZ is global to the region, not per-service.

Google Cloud Platform:
- Persistent disks are per-zone resources. [3]
- Resources that are specific to a zone or a region can only be used by
other resources in the same zone or region. For example, disks and
instances are both zonal resources. To attach a disk to an instance,
both resources must be in the same zone. [4]

Amazon Web Services:
- Instances and disks are per-zone resources. [5]

So now we are stuck with AZs not being consistent across services and
confusing people.


[1] https://review.openstack.org/#/c/225119/2
[2] https://review.openstack.org/#/c/225119/2/nova/virt/block_device.py
[3] https://cloud.google.com/compute/docs/disks/persistent-disks
[4] https://cloud.google.com/compute/docs/zones
[5] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/resources.html

-- 
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Andrew Laski

On 09/24/15 at 12:16pm, Mathieu Gagné wrote:

On 2015-09-24 11:53 AM, Walter A. Boring IV wrote:

The good thing about the Nova and Cinder clients/APIs is that
anyone can write a quick python script to do the orchestration
themselves, if we want to deprecate this.  I'm all for deprecating this.


I don't like this kind of reasoning which can justify close to anything.
It's easy to make those suggestions when you know Python. Please
consider non-technical/non-developers users when suggesting deprecating
features or proposing alternative solutions.

I could also say (in bad faith, I know): why have Heat when you can
write your own Python script. And yet, I don't think we would appreciate
anyone making such a controversial statement.

Our users don't know Python, use 3rd party tools (which don't often
perform/support orchestration) or the Horizon dashboard. They don't want
to have to learn Heat or Python so they can orchestrate volume creation
in place of Nova for a single instance. You don't write CloudFormation
templates on AWS just to boot an instance on volume. That's not the UX I
want to offer to my users.


The issues that I've seen with having this happen in Nova are that there 
are many different ways for this process to fail and the user is 
provided no control or visibility.


As an example we have some images that should convert to volumes quickly 
so failure would be defined as taking longer than x amount of time, but 
for another set of images that are expected to take longer failure would 
be 3x amount of time.  Nova shouldn't be the place to decide how long 
volume creation should take, and I wouldn't expect to ask users to pass 
this in during an API request.


When volume creation does take a decent amount of time there is no 
indication of progress in the Nova API.  When monitoring it via the 
Cinder API you can get a rough approximation of progress.  I don't 
expect Nova to expose volume creation progress as part of the feedback 
during an instance boot request.


At the moment the volume creation request happens from the computes 
themselves.  This means that a failure presents itself as a build 
failure leading to a reschedule and ultimately the user is given a 
NoValidHost.  This is unhelpful and as an operator tracking down the 
root cause is time consuming.


When there is a failure to build an instance while Cinder is creating a 
volume it's possible to end up with the volume left around while the 
instance is deleted.  This is not at all made visible to users in the 
Nova API unless they query the list of volumes and see one they don't 
expect, though it's often immediately clear in the DELETE request sent 
to Cinder.


In short, it ends up being much nicer for users to control the process 
themselves.  Alternatively it would be nice if there was an 
orchestration system that could handle it for them.  But Nova is not 
designed to do that very well.





--
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Matt Riedemann



On 9/24/2015 11:50 AM, Mathieu Gagné wrote:

On 2015-09-24 3:04 AM, Duncan Thomas wrote:


I proposed a session at the Tokyo summit for a discussion of Cinder AZs,
since there was clear confusion about what they are intended for and how
they should be configured. Since then I've reached out to, and gotten
good feedback from, a number of operators.


Thanks for your proposition. I will make sure to attend this session.



There are two distinct
configurations for AZ behaviour in cinder, and both sort-of worked until
very recently.

1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of the
operators who responded so far are using Ceph, though that isn't
required). The storage takes care of availability concerns, and any AZ
info from nova should just be ignored.


Unless I'm very mistaken, I think it's the main "feature" missing from
OpenStack itself. The concept of AZ isn't global and anyone can still
make it so Nova AZ != Cinder AZ.

In my opinion, AZ should be a global concept where they are available
and the same for all services so Nova AZ == Cinder AZ. This could result
in a behavior similar to "regions within regions".

We should survey and ask how AZs are actually used by operators and
users. Some might create an AZ for each server rack, others for each
power segment in their datacenter, or even for business units so they can
segregate workloads to specific physical servers. Some AZ use cases might
just be a "perverted" way of bypassing shortcomings in OpenStack itself. We
should find out those use cases and see if we should still support them
or offer existing or new alternatives.

(I don't run Ceph yet, only SolidFire but I guess the same could apply)

For people running Ceph (or other big clustered block storage), they
will have one big Cinder backend. For resources or business reasons,
they can't afford to create as many clusters (and Cinder AZs) as there
are AZs in Nova. So they end up with one big Cinder AZ (let's call it
az-1) in Cinder. Nova won't be able to create volumes in Cinder az-2 if
an instance is created in Nova az-2.

May I suggest the following solutions:

1) Add ability to disable this whole AZ concept in Cinder so it doesn't
fail to create volumes when Nova asks for a specific AZ. This could
result in the same behavior as cinder.cross_az_attach config.


That's essentially what this does:

https://review.openstack.org/#/c/217857/

It defaults to False though so you have to be aware and set it if you're 
hitting this problem.
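
For anyone hitting this, the setting (as I understand it, on a release 
that has the option) goes in cinder.conf:

    [DEFAULT]
    # Fall back to cinder's default AZ instead of erroring when the
    # requested AZ does not exist in cinder.
    allow_availability_zone_fallback = True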


The nova block_device code that tries to create the volume and passes 
the nova AZ should have probably been taking into account the 
cinder.cross_az_attach config option, because just blindly passing it 
was the reason why cinder added that option.  There is now a change up 
for review to consider cinder.cross_az_attach in block_device:


https://review.openstack.org/#/c/225119/

But that's still making the assumption that we should be passing the AZ 
on the volume create request and will still fail if the AZ isn't in 
cinder (and allow_availability_zone_fallback=False in cinder.conf).


In talking with Duncan this morning he's going to propose a spec for an 
attempt to clean some of this up and decouple nova from handling this 
logic.  Basically a new Cinder API where you give it an AZ and it tells 
you if that's OK.  We could then use this on the nova side before we 
ever get to the compute node and fail.
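
As a rough, hypothetical sketch of that kind of nova-side pre-check (this 
approximates the not-yet-proposed API with the existing cinder 
availability-zone listing call; the helper name is made up):

    def volume_az_is_valid(cinder, az):
        # 'cinder' is a cinderclient instance; returns True only if the AZ
        # is one cinder actually knows about, so nova could fail fast.
        return az in [z.zoneName for z in cinder.availability_zones.list()]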




2) Add ability for a volume backend to be in multiple AZ. Of course,
this would defeat the whole AZ concept. This could however be something
our operators/users might accept.


I'd nix this on the point about it defeating the purpose of AZs.





2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples
storage to nova AZs. It may be that an AZ is used as a unit of
scaling, or it could be a real storage failure domain. Either way, there
are a number of operators who have this configuration and want to keep
it. Storage can certainly have a failure domain, and limiting the
scalability problem of storage to a single compute AZ can have definite
advantages in failure scenarios. These people do not want cross-az attach.

My hope at the summit session was to agree these two configurations,
discuss any scenarios not covered by these two configurations, and nail
down the changes we need to get these to work properly. There's
definitely been interest and activity in the operator community in
making nova and cinder AZs interact, and every desired interaction I've
gotten details about so far matches one of the above models.





--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Sylvain Bauza



On 24/09/2015 19:45, Matt Riedemann wrote:



On 9/24/2015 11:50 AM, Mathieu Gagné wrote:

On 2015-09-24 3:04 AM, Duncan Thomas wrote:


I proposed a session at the Tokyo summit for a discussion of Cinder AZs,
since there was clear confusion about what they are intended for and how
they should be configured. Since then I've reached out to, and gotten
good feedback from, a number of operators.


Thanks for your proposition. I will make sure to attend this session.



There are two distinct
configurations for AZ behaviour in cinder, and both sort-of worked until
very recently.

1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of the
operators who responded so far are using Ceph, though that isn't
required). The storage takes care of availability concerns, and any AZ
info from nova should just be ignored.


Unless I'm very mistaken, I think it's the main "feature" missing from
OpenStack itself. The concept of AZ isn't global and anyone can still
make it so Nova AZ != Cinder AZ.

In my opinion, AZ should be a global concept where they are available
and the same for all services so Nova AZ == Cinder AZ. This could result
in a behavior similar to "regions within regions".

We should survey and ask how AZs are actually used by operators and
users. Some might create an AZ for each server rack, others for each
power segment in their datacenter, or even for business units so they can
segregate workloads to specific physical servers. Some AZ use cases might
just be a "perverted" way of bypassing shortcomings in OpenStack itself. We
should find out those use cases and see if we should still support them
or offer existing or new alternatives.

(I don't run Ceph yet, only SolidFire but I guess the same could apply)

For people running Ceph (or other big clustered block storage), they
will have one big Cinder backend. For resources or business reasons,
they can't afford to create as many clusters (and Cinder AZs) as there
are AZs in Nova. So they end up with one big Cinder AZ (let's call it
az-1) in Cinder. Nova won't be able to create volumes in Cinder az-2 if
an instance is created in Nova az-2.

May I suggest the following solutions:

1) Add ability to disable this whole AZ concept in Cinder so it doesn't
fail to create volumes when Nova asks for a specific AZ. This could
result in the same behavior as cinder.cross_az_attach config.


That's essentially what this does:

https://review.openstack.org/#/c/217857/

It defaults to False though so you have to be aware and set it if 
you're hitting this problem.


The nova block_device code that tries to create the volume and passes 
the nova AZ should have probably been taking into account the 
cinder.cross_az_attach config option, because just blindly passing it 
was the reason why cinder added that option.  There is now a change up 
for review to consider cinder.cross_az_attach in block_device:


https://review.openstack.org/#/c/225119/

But that's still making the assumption that we should be passing the 
AZ on the volume create request and will still fail if the AZ isn't in 
cinder (and allow_availability_zone_fallback=False in cinder.conf).


In talking with Duncan this morning he's going to propose a spec for 
an attempt to clean some of this up and decouple nova from handling 
this logic.  Basically a new Cinder API where you give it an AZ and it 
tells you if that's OK.  We could then use this on the nova side 
before we ever get to the compute node and fail.


My opinion is the same as yours: we should decouple Nova AZs from Cinder 
AZs and just have a lazy relationship between them, by having a way to ask 
Cinder about the AZ before calling the scheduler.







2) Add ability for a volume backend to be in multiple AZ. Of course,
this would defeat the whole AZ concept. This could however be something
our operators/users might accept.


I'd nix this on the point about it defeating the purpose of AZs.


Well, if we rename Cinder AZs to something else, then I'm honestly not 
really opinionated, since it's already always confusing, because Nova AZs 
are groups of hosts, not anything else.


If we keep the naming as AZs, then I'm not OK since it creates more 
confusion.


-Sylvain








2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples
storage to nova AZs. It may be that an AZ is used as a unit of
scaling, or it could be a real storage failure domain. Either way, there
are a number of operators who have this configuration and want to keep
it. Storage can certainly have a failure domain, and limiting the
scalability problem of storage to a single compute AZ can have definite
advantages in failure scenarios. These people do not want cross-az 
attach.


My hope at the summit session was to agree these two configurations,
discuss any scenarios not covered by these two configurations, and nail
down the changes we need to get these to work properly. There's
definitely been interest and activity in the 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Sam Morrison

> On 24 Sep 2015, at 6:19 pm, Sylvain Bauza  wrote:
> 
> Ahem, Nova AZs are not failure domains - I mean the current implementation, 
> in the sense that many people understand a failure domain, i.e. a 
> physical unit of machines (a bay, a room, a floor, a datacenter).
> All the AZs in Nova share the same controlplane with the same message queue 
> and database, which means that one failure can be propagated to the other AZ.
> 
> To be honest, there is one very specific usecase where AZs *are* failure 
> domains : when cells exact match with AZs (ie. one AZ grouping all the hosts 
> behind one cell). That's the very specific usecase that Sam is mentioning in 
> his email, and I certainly understand we need to keep that.
> 
> What are AZs in Nova is pretty well explained in a quite old blogpost : 
> http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/
>  
> 
Yes, an AZ may not be considered a failure domain in terms of control 
infrastructure; I think all operators understand this. If you want 
control-infrastructure failure domains, use regions.

However, from a resource level (e.g. a running instance or a running volume) I 
would consider them some kind of failure domain. It's a way of saying to a user: 
if you have resources running in 2 AZs, you have a more available service.

Every cloud will have a different definition of what an AZ is - a rack, a 
collection of racks, a DC, etc. OpenStack doesn't need to decide what that is.

Sam

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread James Penick
On Thu, Sep 24, 2015 at 2:22 PM, Sam Morrison  wrote:

>
> Yes an AZ may not be considered a failure domain in terms of control
> infrastructure, I think all operators understand this. If you want control
> infrastructure failure domains use regions.
>
> However from a resource level (eg. running instance/ running volume) I
> would consider them some kind of failure domain. It’s a way of saying to a
> user if you have resources running in 2 AZs you have a more available
> service.
>
> Every cloud will have a different definition of what an AZ is, a
> rack/collection of racks/DC etc. openstack doesn’t need to decide what that
> is.
>
> Sam
>

This seems to map more closely to how we use AZs.

Turning it around to the user perspective:
 My users want to be sure that when they boot compute resources, they can
do so in such a way that their application will be immune to a certain
amount of physical infrastructure failure.

Use cases I get from my users:
1. "I want to boot 10 instances, and be sure that if a single leg of power
goes down, I wont lose more than 2 instances"
2. "My instances move a lot of network traffic. I want to ensure that I
don't have more than 3 of my instances per rack, or else they'll saturate
the ToR"
3. "Compute room #1 has been overrun by crazed ferrets. I need to boot new
instances in compute room #2."
4. "I want to boot 10 instances, striped across at least two power domains,
under no less than 5 top of rack switches, with access to network security
zone X."

For my users, abstractions for availability and scale of the control plane
should be hidden from their view. I've almost never been asked by my users
whether or not the control plane is resilient. They assume that my team, as
the deployers, have taken adequate steps to ensure that the control plane
is deployed in a resilient and highly available fashion.

I think it would be good for the operator community to come to an agreement
on what an AZ should be from the perspective of those who deploy both
public and private clouds and bring that back to the dev teams.

-James
:)=
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Mathieu Gagné
On 2015-09-24 3:04 AM, Duncan Thomas wrote:
> 
> I proposed a session at the Tokyo summit for a discussion of Cinder AZs,
> since there was clear confusion about what they are intended for and how
> they should be configured. Since then I've reached out to, and gotten
> good feedback from, a number of operators.

Thanks for your proposition. I will make sure to attend this session.


> There are two distinct
> configurations for AZ behaviour in cinder, and both sort-of worked until
> very recently.
> 
> 1) No AZs in cinder
> This is the config where there is a single 'blob' of storage (most of the
> operators who responded so far are using Ceph, though that isn't
> required). The storage takes care of availability concerns, and any AZ
> info from nova should just be ignored.

Unless I'm very mistaken, I think it's the main "feature" missing from
OpenStack itself. The concept of AZ isn't global and anyone can still
make it so Nova AZ != Cinder AZ.

In my opinion, AZ should be a global concept where they are available
and the same for all services so Nova AZ == Cinder AZ. This could result
in a behavior similar to "regions within regions".

We should survey and ask how AZs are actually used by operators and
users. Some might create an AZ for each server rack, others for each
power segment in their datacenter, or even for business units so they can
segregate workloads to specific physical servers. Some AZ use cases might
just be a "perverted" way of bypassing shortcomings in OpenStack itself. We
should find out those use cases and see if we should still support them
or offer existing or new alternatives.

(I don't run Ceph yet, only SolidFire but I guess the same could apply)

For people running Ceph (or other big clustered block storage), they
will have one big Cinder backend. For resources or business reasons,
they can't afford to create as many clusters (and Cinder AZs) as there
are AZs in Nova. So they end up with one big Cinder AZ (let's call it
az-1) in Cinder. Nova won't be able to create volumes in Cinder az-2 if
an instance is created in Nova az-2.

May I suggest the following solutions:

1) Add ability to disable this whole AZ concept in Cinder so it doesn't
fail to create volumes when Nova asks for a specific AZ. This could
result in the same behavior as cinder.cross_az_attach config.

2) Add ability for a volume backend to be in multiple AZ. Of course,
this would defeat the whole AZ concept. This could however be something
our operators/users might accept.


> 2) Cinder AZs map to Nova AZs
> In this case, some combination of storage / networking / etc couples
> storage to nova AZs. It may be that an AZ is used as a unit of
> scaling, or it could be a real storage failure domain. Either way, there
> are a number of operators who have this configuration and want to keep
> it. Storage can certainly have a failure domain, and limiting the
> scalability problem of storage to a single compute AZ can have definite
> advantages in failure scenarios. These people do not want cross-az attach.
> 
> My hope at the summit session was to agree these two configurations,
> discuss any scenarios not covered by these two configurations, and nail
> down the changes we need to get these to work properly. There's
> definitely been interest and activity in the operator community in
> making nova and cinder AZs interact, and every desired interaction I've
> gotten details about so far matches one of the above models.


-- 
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Ivan Kolodyazhny
Hi Matt,

In Liberty, we introduced the allow_availability_zone_fallback option [1] in
the Cinder config as a fix for bug [2]. If you set this option, Cinder will
create the volume in the default AZ instead of setting the volume into the
error state.

[1]
https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31
[2] https://bugs.launchpad.net/cinder/+bug/1489575

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann 
wrote:

> I came across bug 1496235 [1] today.  In this case the user is booting an
> instance from a volume using source=image, so nova actually does the volume
> create call to the volume API.  They are booting the instance into a valid
> nova availability zone, but that same AZ isn't defined in Cinder, so the
> volume create request fails (since nova passes the instance AZ to cinder
> [2]).
>
> I marked this as invalid given how the code works.
>
> I'm posting here since I'm wondering if there are alternatives worth
> pursuing.  For example, nova could get the list of AZs from the volume API
> and if the nova AZ isn't in that list, don't provide it on the volume
> create request.  That's essentially the same as first creating the volume
> outside of nova and not specifying an AZ, then when doing the boot from
> volume, provide the volume_id as the source.
>
> The question is, is it worth doing that?  I'm not familiar enough with how
> availability zones are meant to work between nova and cinder so it's hard
> for me to have much of an opinion here.
>
> [1] https://bugs.launchpad.net/nova/+bug/1496235
> [2]
> https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread John Griffith
On Wed, Sep 23, 2015 at 1:34 PM, Matt Riedemann 
wrote:

>
>
> On 9/23/2015 2:15 PM, Matt Riedemann wrote:
>
>>
>>
>> On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:
>>
>>> Hi Matt,
>>>
>>> In Liberty, we introduced allow_availability_zone_fallback [1] option in
>>> Cinder config as fix for bug [2]. If you set this option, Cinder will
>>> create volume in a default AZ instead of set volume into the error state
>>>
>>> [1]
>>>
>>> https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31
>>>
>>> [2] https://bugs.launchpad.net/cinder/+bug/1489575
>>>
>>> Regards,
>>> Ivan Kolodyazhny
>>>
>>> On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann
>>> > wrote:
>>>
>>> I came across bug 1496235 [1] today.  In this case the user is
>>> booting an instance from a volume using source=image, so nova
>>> actually does the volume create call to the volume API.  They are
>>> booting the instance into a valid nova availability zone, but that
>>> same AZ isn't defined in Cinder, so the volume create request fails
>>> (since nova passes the instance AZ to cinder [2]).
>>>
>>> I marked this as invalid given how the code works.
>>>
>>> I'm posting here since I'm wondering if there are alternatives worth
>>> pursuing.  For example, nova could get the list of AZs from the
>>> volume API and if the nova AZ isn't in that list, don't provide it
>>> on the volume create request.  That's essentially the same as first
>>> creating the volume outside of nova and not specifying an AZ, then
>>> when doing the boot from volume, provide the volume_id as the source.
>>>
>>> The question is, is it worth doing that?  I'm not familiar enough
>>> with how availability zones are meant to work between nova and
>>> cinder so it's hard for me to have much of an opinion here.
>>>
>>> [1] https://bugs.launchpad.net/nova/+bug/1496235
>>> [2]
>>>
>>>
>>> https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383
>>>
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Matt Riedemann
>>>
>>>
>>>
>>>
>>> __
>>>
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>>
>>> 
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>>
>>>
>>>
>>> __
>>>
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>> Sorry but that seems like a hack.
>>
>> I'm trying to figure out the relationship between AZs in nova and cinder
>> and so far no one seems to really know.  In the cinder IRC channel I was
>> told there isn't one, which would mean we shouldn't even try creating
>> the volume using the server instance AZ.
>>
>> Also, if there is no relationship, I was trying to figure out why there
>> is the cinder.cross_az_attach config option.  That was added in grizzly
>> [1].  I was thinking maybe it was a legacy artifact from nova-volume,
>> but that was dropped in grizzly.
>>
>> So is cinder.cross_az_attach even useful?
>>
>> [1] https://review.openstack.org/#/c/21672/
>>
>>
> The plot thickens.
>
> I was checking to see what change was made to start passing the server
> instance az on the volume create call during boot from volume, and that was
> [1] which was added in kilo to fix a bug where boot from volume into a nova
> az will fail if cinder.cross_az_attach=False and storage_availability_zone
> is set in cinder.conf.
>
> So I guess we can't just stop passing the instance az to the volume create
> call.
>
> But what I'd really like to know is how this is all used between cinder
> and nova, or was this all some work done as part of a larger effort that
> was never completed?  Basically, can we deprecate the
> cinder.cross_az_attach config option in nova and start decoupling this code?
>
> [1] https://review.openstack.org/#/c/157041/
>
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
​To be honest this is probably my fault, AZ's were pulled in as part of the
nova-volume migration to Cinder and just sort of died.  Quite frankly I
wasn't sure "what" to do with them but brought over the concept and the
zones that existed in Nova-Volume.  It's been an issue since day 1 of
Cinder, and as you note there are little hacks here and there over the
years to do different things.

[openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Matt Riedemann
I came across bug 1496235 [1] today.  In this case the user is booting 
an instance from a volume using source=image, so nova actually does the 
volume create call to the volume API.  They are booting the instance 
into a valid nova availability zone, but that same AZ isn't defined in 
Cinder, so the volume create request fails (since nova passes the 
instance AZ to cinder [2]).


I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are alternatives worth 
pursuing.  For example, nova could get the list of AZs from the volume 
API and if the nova AZ isn't in that list, don't provide it on the 
volume create request.  That's essentially the same as first creating 
the volume outside of nova and not specifying an AZ, then when doing the 
boot from volume, provide the volume_id as the source.
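
For illustration, that alternative would look roughly like the sketch below on 
the nova side. The helper name and the cinder client call are assumptions, not 
existing nova code.

    # Hypothetical sketch: only pass the instance AZ on the volume create
    # request when Cinder actually knows about that AZ.
    def volume_az_for(cinder, instance_az):
        # Listing AZs via the os-availability-zone extension; the exact
        # client call here is an assumption.
        cinder_azs = [az.zoneName for az in cinder.availability_zones.list()]
        if instance_az in cinder_azs:
            return instance_az
        return None  # omit the AZ and let Cinder use its own default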


The question is, is it worth doing that?  I'm not familiar enough with 
how availability zones are meant to work between nova and cinder so it's 
hard for me to have much of an opinion here.


[1] https://bugs.launchpad.net/nova/+bug/1496235
[2] 
https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Matt Riedemann



On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

Hi Matt,

In Liberty, we introduced the allow_availability_zone_fallback [1] option in
the Cinder config as a fix for bug [2]. If you set this option, Cinder will
create the volume in a default AZ instead of putting the volume into the error state

[1]
https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31
[2] https://bugs.launchpad.net/cinder/+bug/1489575
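
For context, the fallback that option enables looks roughly like this on the 
Cinder side (a simplified sketch of the change in [1], not the exact code):

    # Simplified sketch (assumed): what allow_availability_zone_fallback does
    # when the requested AZ is unknown to Cinder.
    def pick_availability_zone(requested_az, known_azs, conf):
        if requested_az in known_azs:
            return requested_az
        if conf.allow_availability_zone_fallback:
            # Fall back to Cinder's default zone instead of erroring the volume.
            return conf.default_availability_zone or conf.storage_availability_zone
        raise ValueError("Availability zone '%s' is invalid" % requested_az)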

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann
> wrote:

I came across bug 1496235 [1] today.  In this case the user is
booting an instance from a volume using source=image, so nova
actually does the volume create call to the volume API.  They are
booting the instance into a valid nova availability zone, but that
same AZ isn't defined in Cinder, so the volume create request fails
(since nova passes the instance AZ to cinder [2]).

I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are alternatives worth
pursuing.  For example, nova could get the list of AZs from the
volume API and if the nova AZ isn't in that list, don't provide it
on the volume create request.  That's essentially the same as first
creating the volume outside of nova and not specifying an AZ, then
when doing the boot from volume, provide the volume_id as the source.

The question is, is it worth doing that?  I'm not familiar enough
with how availability zones are meant to work between nova and
cinder so it's hard for me to have much of an opinion here.

[1] https://bugs.launchpad.net/nova/+bug/1496235
[2]

https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Sorry but that seems like a hack.

I'm trying to figure out the relationship between AZs in nova and cinder 
and so far no one seems to really know.  In the cinder IRC channel I was 
told there isn't one, which would mean we shouldn't even try creating 
the volume using the server instance AZ.


Also, if there is no relationship, I was trying to figure out why there 
is the cinder.cross_az_attach config option.  That was added in grizzly 
[1].  I was thinking maybe it was a legacy artifact from nova-volume, 
but that was dropped in grizzly.


So is cinder.cross_az_attach even useful?

[1] https://review.openstack.org/#/c/21672/
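
For reference, the check that option gates looks roughly like this in nova 
(a simplified sketch of the attach-time check, not the actual code):

    # Simplified sketch (assumed): what cinder.cross_az_attach=False enforces
    # when a volume is attached to an instance.
    def check_attach(instance_az, volume_az, cross_az_attach):
        if cross_az_attach:
            return  # default (True): AZs are never compared
        if instance_az != volume_az:
            raise ValueError("Instance and volume are not in the same "
                             "availability zone")

The option defaults to True, in which case the AZ comparison is skipped 
entirely.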

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Matt Riedemann



On 9/23/2015 2:15 PM, Matt Riedemann wrote:



On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

Hi Matt,

In Liberty, we introduced the allow_availability_zone_fallback [1] option in
the Cinder config as a fix for bug [2]. If you set this option, Cinder will
create the volume in a default AZ instead of putting the volume into the error state

[1]
https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31

[2] https://bugs.launchpad.net/cinder/+bug/1489575

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann
> wrote:

I came across bug 1496235 [1] today.  In this case the user is
booting an instance from a volume using source=image, so nova
actually does the volume create call to the volume API.  They are
booting the instance into a valid nova availability zone, but that
same AZ isn't defined in Cinder, so the volume create request fails
(since nova passes the instance AZ to cinder [2]).

I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are alternatives worth
pursuing.  For example, nova could get the list of AZs from the
volume API and if the nova AZ isn't in that list, don't provide it
on the volume create request.  That's essentially the same as first
creating the volume outside of nova and not specifying an AZ, then
when doing the boot from volume, provide the volume_id as the source.

The question is, is it worth doing that?  I'm not familiar enough
with how availability zones are meant to work between nova and
cinder so it's hard for me to have much of an opinion here.

[1] https://bugs.launchpad.net/nova/+bug/1496235
[2]

https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


--

Thanks,

Matt Riedemann



__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe


http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Sorry but that seems like a hack.

I'm trying to figure out the relationship between AZs in nova and cinder
and so far no one seems to really know.  In the cinder IRC channel I was
told there isn't one, which would mean we shouldn't even try creating
the volume using the server instance AZ.

Also, if there is no relationship, I was trying to figure out why there
is the cinder.cross_az_attach config option.  That was added in grizzly
[1].  I was thinking maybe it was a legacy artifact from nova-volume,
but that was dropped in grizzly.

So is cinder.cross_az_attach even useful?

[1] https://review.openstack.org/#/c/21672/



The plot thickens.

I was checking to see what change was made to start passing the server 
instance az on the volume create call during boot from volume, and that 
was [1] which was added in kilo to fix a bug where boot from volume into 
a nova az will fail if cinder.cross_az_attach=False and 
storage_availability_zone is set in cinder.conf.
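
That change boils down to something like the following in the boot-from-volume 
path (a simplified sketch of the nova/virt/block_device.py code linked earlier 
in this thread, not the exact code):

    # Simplified sketch (assumed): nova forwards the instance's AZ on the
    # volume create call so the later cross_az_attach check can pass.
    def create_boot_volume(volume_api, context, instance, image, size):
        return volume_api.create(context, size, '', '',
                                 image_id=image.id,
                                 availability_zone=instance.availability_zone)
    # Before [1], no availability_zone was passed, so Cinder always picked its
    # own default zone.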


So I guess we can't just stop passing the instance az to the volume 
create call.


But what I'd really like to know is how this is all used between cinder 
and nova, or was this all some work done as part of a larger effort that 
was never completed?  Basically, can we deprecate the 
cinder.cross_az_attach config option in nova and start decoupling this code?


[1] https://review.openstack.org/#/c/157041/

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Mathieu Gagné
On 2015-09-23 4:50 PM, Andrew Laski wrote:
> On 09/23/15 at 04:30pm, Mathieu Gagné wrote:
>> On 2015-09-23 4:12 PM, Andrew Laski wrote:
>>> On 09/23/15 at 02:55pm, Matt Riedemann wrote:

 Heh, so when I just asked in the cinder channel if we can just
 deprecate nova boot from volume with source=(image|snapshot|blank)
 (which automatically creates the volume and polls for it to be
 available) and then add a microversion that doesn't allow it, I was
 half joking, but I see we're on the same page.  This scenario seems to
 introduce a lot of orchestration work that nova shouldn't necessarily
 be in the business of handling.
>>>
>>> I am very much in support of this.  This has been a source of
>>> frustration for our users because it is prone to failures we can't
>>> properly expose to users and timeouts.  There are much better places to
>>> handle the orchestration of creating a volume and then booting from it
>>> than Nova.
>>>
>>
>> Unfortunately, this is a feature our users *heavily* rely on and we
>> worked very hard to make it happen. We had a private patch on our side
>> for years to optimize boot-from-volume before John Griffith came up with
>> an upstream solution for SolidFire [2] and others with a generic
>> solution [3] [4].
>>
>> Being able to "nova boot" and have everything done for you is awesome.
>> Just see what Monty Taylor mentioned in his thread about sane default
>> networking [1]. Having orchestration on the client side is just
>> something our users don't want to have to do and often complain about.
> 
> At risk of getting too offtopic I think there's an alternate solution to
> doing this in Nova or on the client side.  I think we're missing some
> sort of OpenStack API and service that can handle this.  Nova is a low
> level infrastructure API and service, it is not designed to handle these
> orchestrations.  I haven't checked in on Heat in a while but perhaps
> this is a role that it could fill.
> 
> I think that too many people consider Nova to be *the* OpenStack API
> when considering instances/volumes/networking/images and that's not
> something I would like to see continue.  Or at the very least I would
> like to see a split between the orchestration/proxy pieces and the
> "manage my VM/container/baremetal" bits.
> 

"too many people" happens to include a lot of 3rd party tools supporting
OpenStack which our users complain a lot about. Just see all the
possible way to get an external IP [5]. Introducing yet another service
would increase the pain on our users which will see their tools and
products not working even more.

Just see how EC2 is doing it [6], you won't see them suggest to use yet
another service to orchestrate what I consider a fundamental feature "I
wish to boot an instance on a volume".

The current ease to boot from volume is THE selling feature our users
want and heavily/actively use. We fought very hard to make it work and
reading about how it should be removed is frustrating.

Issues we identified shouldn't be a reason to drop this feature. Other
providers are making it work and I don't see why we couldn't. I'm
convinced we can do better.

[5]
https://github.com/openstack-infra/shade/blob/03c1556a12aabfc21de60a9fac97aea7871485a3/shade/meta.py#L106-L173
[6]
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html

Mathieu

>>
>> [1]
>> http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html
>>
>> [2] https://review.openstack.org/#/c/142859/
>> [3] https://review.openstack.org/#/c/195795/
>> [4] https://review.openstack.org/#/c/201754/
>>
>> -- 
>> Mathieu


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sylvain Bauza



Le 23/09/2015 21:45, John Griffith a écrit :



On Wed, Sep 23, 2015 at 1:34 PM, Matt Riedemann 
> wrote:




On 9/23/2015 2:15 PM, Matt Riedemann wrote:



On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

Hi Matt,

In Liberty, we introduced allow_availability_zone_fallback
[1] option in
Cinder config as fix for bug [2]. If you set this option,
Cinder will
create volume in a default AZ instead of set volume into
the error state

[1]

https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31

[2] https://bugs.launchpad.net/cinder/+bug/1489575

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann

>> wrote:

I came across bug 1496235 [1] today.  In this case the
user is
booting an instance from a volume using source=image,
so nova
actually does the volume create call to the volume
API.  They are
booting the instance into a valid nova availability
zone, but that
same AZ isn't defined in Cinder, so the volume create
request fails
(since nova passes the instance AZ to cinder [2]).

I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are
alternatives worth
pursuing.  For example, nova could get the list of AZs
from the
volume API and if the nova AZ isn't in that list,
don't provide it
on the volume create request.  That's essentially the
same as first
creating the volume outside of nova and not specifying
an AZ, then
when doing the boot from volume, provide the volume_id
as the source.

The question is, is it worth doing that?  I'm not
familiar enough
with how availability zones are meant to work between
nova and
cinder so it's hard for me to have much of an opinion
here.

[1] https://bugs.launchpad.net/nova/+bug/1496235
[2]


https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


--

Thanks,

Matt Riedemann




__

OpenStack Development Mailing List (not for usage
questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe





http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe


http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Sorry but that seems like a hack.

I'm trying to figure out the relationship between AZs in nova
and cinder
and so far no one seems to really know.  In the cinder IRC
channel I was
told there isn't one, which would mean we shouldn't even try
creating
the volume using the server instance AZ.

Also, if there is no relationship, I was trying to figure out
why there
is the cinder.cross_az_attach config option.  That was added
in grizzly
[1].  I was thinking maybe it was a legacy artifact from
nova-volume,
but that was dropped in grizzly.

So is cinder.cross_az_attach even useful?

[1] https://review.openstack.org/#/c/21672/


The plot thickens.

I was checking to see what change was made to start passing the
server instance az on the volume create call during boot from
volume, and that was [1] which was added in kilo to fix a bug
where boot from volume into a nova az will fail if
cinder.cross_az_attach=False and storage_availability_zone is set
in cinder.conf.

So I guess we can't just stop passing the instance az to the
volume create call.

But what I'd really like 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Matt Riedemann



On 9/23/2015 2:57 PM, Sylvain Bauza wrote:



Le 23/09/2015 21:45, John Griffith a écrit :



On Wed, Sep 23, 2015 at 1:34 PM, Matt Riedemann
> wrote:



On 9/23/2015 2:15 PM, Matt Riedemann wrote:



On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

Hi Matt,

In Liberty, we introduced allow_availability_zone_fallback
[1] option in
Cinder config as fix for bug [2]. If you set this option,
Cinder will
create volume in a default AZ instead of set volume into
the error state

[1]

https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31

[2] https://bugs.launchpad.net/cinder/+bug/1489575

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann

>> wrote:

I came across bug 1496235 [1] today.  In this case the
user is
booting an instance from a volume using source=image,
so nova
actually does the volume create call to the volume
API.  They are
booting the instance into a valid nova availability
zone, but that
same AZ isn't defined in Cinder, so the volume create
request fails
(since nova passes the instance AZ to cinder [2]).

I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are
alternatives worth
pursuing.  For example, nova could get the list of AZs
from the
volume API and if the nova AZ isn't in that list,
don't provide it
on the volume create request.  That's essentially the
same as first
creating the volume outside of nova and not specifying
an AZ, then
when doing the boot from volume, provide the volume_id
as the source.

The question is, is it worth doing that?  I'm not
familiar enough
with how availability zones are meant to work between
nova and
cinder so it's hard for me to have much of an opinion
here.

[1] https://bugs.launchpad.net/nova/+bug/1496235
[2]


https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


--

Thanks,

Matt Riedemann




__

OpenStack Development Mailing List (not for usage
questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe





http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe


http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Sorry but that seems like a hack.

I'm trying to figure out the relationship between AZs in nova
and cinder
and so far no one seems to really know.  In the cinder IRC
channel I was
told there isn't one, which would mean we shouldn't even try
creating
the volume using the server instance AZ.

Also, if there is no relationship, I was trying to figure out
why there
is the cinder.cross_az_attach config option.  That was added
in grizzly
[1].  I was thinking maybe it was a legacy artifact from
nova-volume,
but that was dropped in grizzly.

So is cinder.cross_az_attach even useful?

[1] https://review.openstack.org/#/c/21672/


The plot thickens.

I was checking to see what change was made to start passing the
server instance az on the volume create call during boot from
volume, and that was [1] which was added in kilo to fix a bug
where boot from volume into a nova az will fail if
cinder.cross_az_attach=False and storage_availability_zone is set
in cinder.conf.

So I guess we can't just stop passing the instance az to the

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Mathieu Gagné
On 2015-09-23 4:12 PM, Andrew Laski wrote:
> On 09/23/15 at 02:55pm, Matt Riedemann wrote:
>>
>> Heh, so when I just asked in the cinder channel if we can just
>> deprecate nova boot from volume with source=(image|snapshot|blank)
>> (which automatically creates the volume and polls for it to be
>> available) and then add a microversion that doesn't allow it, I was
>> half joking, but I see we're on the same page.  This scenario seems to
>> introduce a lot of orchestration work that nova shouldn't necessarily
>> be in the business of handling.
> 
> I am very much in support of this.  This has been a source of
> frustration for our users because it is prone to failures we can't
> properly expose to users and timeouts.  There are much better places to
> handle the orchestration of creating a volume and then booting from it
> than Nova.
> 

Unfortunately, this is a feature our users *heavily* rely on and we
worked very hard to make it happen. We had a private patch on our side
for years to optimize boot-from-volume before John Griffith came up with
an upstream solution for SolidFire [2] and others with a generic
solution [3] [4].

Being able to "nova boot" and have everything done for you is awesome.
Just see what Monty Taylor mentioned in his thread about sane default
networking [1]. Having orchestration on the client side is just
something our users don't want to have to do and often complain about.
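
For illustration, the single call being discussed is a boot-from-volume request 
like the one below (python-novaclient, with placeholder credentials and IDs; 
a rough sketch, not a recommended recipe):

    from novaclient import client as nova_client

    # Placeholder values; substitute real credentials, flavor and image IDs.
    nova = nova_client.Client('2', 'user', 'password', 'project',
                              'http://keystone:5000/v2.0')

    server = nova.servers.create(
        name='bfv-example',
        image=None,                   # root disk comes from the volume below
        flavor='FLAVOR_ID',
        block_device_mapping_v2=[{
            'boot_index': 0,
            'uuid': 'IMAGE_UUID',     # source image for the new volume
            'source_type': 'image',
            'destination_type': 'volume',
            'volume_size': 10,        # GB
            'delete_on_termination': True,
        }],
    )
    # Nova creates the volume from the image, waits for it, and boots from it.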

[1]
http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html
[2] https://review.openstack.org/#/c/142859/
[3] https://review.openstack.org/#/c/195795/
[4] https://review.openstack.org/#/c/201754/

-- 
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Matt Riedemann



On 9/23/2015 2:45 PM, John Griffith wrote:



On Wed, Sep 23, 2015 at 1:34 PM, Matt Riedemann
> wrote:



On 9/23/2015 2:15 PM, Matt Riedemann wrote:



On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

Hi Matt,

In Liberty, we introduced allow_availability_zone_fallback
[1] option in
Cinder config as fix for bug [2]. If you set this option,
Cinder will
create volume in a default AZ instead of set volume into the
error state

[1]

https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31

[2] https://bugs.launchpad.net/cinder/+bug/1489575

Regards,
Ivan Kolodyazhny

On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann

>> wrote:

 I came across bug 1496235 [1] today.  In this case the
user is
 booting an instance from a volume using source=image,
so nova
 actually does the volume create call to the volume
API.  They are
 booting the instance into a valid nova availability
zone, but that
 same AZ isn't defined in Cinder, so the volume create
request fails
 (since nova passes the instance AZ to cinder [2]).

 I marked this as invalid given how the code works.

 I'm posting here since I'm wondering if there are
alternatives worth
 pursuing.  For example, nova could get the list of AZs
from the
 volume API and if the nova AZ isn't in that list, don't
provide it
 on the volume create request.  That's essentially the
same as first
 creating the volume outside of nova and not specifying
an AZ, then
 when doing the boot from volume, provide the volume_id
as the source.

 The question is, is it worth doing that?  I'm not
familiar enough
 with how availability zones are meant to work between
nova and
 cinder so it's hard for me to have much of an opinion here.

 [1] https://bugs.launchpad.net/nova/+bug/1496235
 [2]


https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


 --

 Thanks,

 Matt Riedemann




__

 OpenStack Development Mailing List (not for usage
questions)
 Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe 




http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe 

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Sorry but that seems like a hack.

I'm trying to figure out the relationship between AZs in nova
and cinder
and so far no one seems to really know.  In the cinder IRC
channel I was
told there isn't one, which would mean we shouldn't even try
creating
the volume using the server instance AZ.

Also, if there is no relationship, I was trying to figure out
why there
is the cinder.cross_az_attach config option.  That was added in
grizzly
[1].  I was thinking maybe it was a legacy artifact from
nova-volume,
but that was dropped in grizzly.

So is cinder.cross_az_attach even useful?

[1] https://review.openstack.org/#/c/21672/


The plot thickens.

I was checking to see what change was made to start passing the
server instance az on the volume create call during boot from
volume, and that was [1] which was added in kilo to fix a bug where
boot from volume into a nova az will fail if
cinder.cross_az_attach=False and storage_availability_zone is set in
cinder.conf.

So I guess we can't just stop passing the instance az to the volume
create call.

But what I'd really like to know is how 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Andrew Laski

On 09/23/15 at 01:45pm, John Griffith wrote:



​To be honest this is probably my fault, AZ's were pulled in as part of the
nova-volume migration to Cinder and just sort of died.  Quite frankly I
wasn't sure "what" to do with them but brought over the concept and the
zones that existed in Nova-Volume.  It's been an issue since day 1 of
Cinder, and as you note there are little hacks here and there over the
years to do different things.

I think your question about whether they should be there at all or not is a
good one.  We have had some interest from folks lately that want to couple
Nova and Cinder AZ's (I'm really not sure of any details or use-cases here).

My opinion would be that, until somebody proposes a clear use case and need that
actually works, we should consider deprecating it.


I've heard some discussion about trying to use coupled AZs in order to 
schedule volumes close to instances.  However I think that is occurring 
because it's possible to do that, not because that would be a good way 
to handle the coordinated scheduling problem.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Andrew Laski

On 09/23/15 at 02:55pm, Matt Riedemann wrote:



On 9/23/2015 2:45 PM, John Griffith wrote:



On Wed, Sep 23, 2015 at 1:34 PM, Matt Riedemann
> wrote:



   On 9/23/2015 2:15 PM, Matt Riedemann wrote:



   On 9/23/2015 1:46 PM, Ivan Kolodyazhny wrote:

   Hi Matt,

   In Liberty, we introduced allow_availability_zone_fallback
   [1] option in
   Cinder config as fix for bug [2]. If you set this option,
   Cinder will
   create volume in a default AZ instead of set volume into the
   error state

   [1]
   
https://github.com/openstack/cinder/commit/b85d2812a8256ff82934d150dbc4909e041d8b31

   [2] https://bugs.launchpad.net/cinder/+bug/1489575

   Regards,
   Ivan Kolodyazhny

   On Wed, Sep 23, 2015 at 9:00 PM, Matt Riedemann
   
   >> wrote:

I came across bug 1496235 [1] today.  In this case the
   user is
booting an instance from a volume using source=image,
   so nova
actually does the volume create call to the volume
   API.  They are
booting the instance into a valid nova availability
   zone, but that
same AZ isn't defined in Cinder, so the volume create
   request fails
(since nova passes the instance AZ to cinder [2]).

I marked this as invalid given how the code works.

I'm posting here since I'm wondering if there are
   alternatives worth
pursuing.  For example, nova could get the list of AZs
   from the
volume API and if the nova AZ isn't in that list, don't
   provide it
on the volume create request.  That's essentially the
   same as first
creating the volume outside of nova and not specifying
   an AZ, then
when doing the boot from volume, provide the volume_id
   as the source.

The question is, is it worth doing that?  I'm not
   familiar enough
with how availability zones are meant to work between
   nova and
cinder so it's hard for me to have much of an opinion here.

[1] https://bugs.launchpad.net/nova/+bug/1496235
[2]

   
https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L381-L383


--

Thanks,

Matt Riedemann



   
__

OpenStack Development Mailing List (not for usage
   questions)
Unsubscribe:
   openstack-dev-requ...@lists.openstack.org?subject:unsubscribe 


   

   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




   
__

   OpenStack Development Mailing List (not for usage questions)
   Unsubscribe:
   openstack-dev-requ...@lists.openstack.org?subject:unsubscribe 

   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


   Sorry but that seems like a hack.

   I'm trying to figure out the relationship between AZs in nova
   and cinder
   and so far no one seems to really know.  In the cinder IRC
   channel I was
   told there isn't one, which would mean we shouldn't even try
   creating
   the volume using the server instance AZ.

   Also, if there is no relationship, I was trying to figure out
   why there
   is the cinder.cross_az_attach config option.  That was added in
   grizzly
   [1].  I was thinking maybe it was a legacy artifact from
   nova-volume,
   but that was dropped in grizzly.

   So is cinder.cross_az_attach even useful?

   [1] https://review.openstack.org/#/c/21672/


   The plot thickens.

   I was checking to see what change was made to start passing the
   server instance az on the volume create call during boot from
   volume, and that was [1] which was added in kilo to fix a bug where
   boot from volume into a nova az will fail if
   cinder.cross_az_attach=False and storage_availability_zone is set in
   cinder.conf.

   So I guess we can't just stop passing the instance az to the volume
   create call.

   But what I'd really like to know is how this is all used between
   cinder and nova, or 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sylvain Bauza



Le 23/09/2015 22:15, Andrew Laski a écrit :

On 09/23/15 at 01:45pm, John Griffith wrote:



​To be honest this is probably my fault, AZ's were pulled in as part 
of the nova-volume migration to Cinder and just sort of died.  Quite frankly I
wasn't sure "what" to do with them but brought over the concept and the
zones that existed in Nova-Volume.  It's been an issue since day 1 of
Cinder, and as you note there are little hacks here and there over the
years to do different things.

I think your question about whether they should be there at all or not is a
good one.  We have had some interest from folks lately that want to couple
Nova and Cinder AZ's (I'm really not sure of any details or use-cases here).

My opinion would be that, until somebody proposes a clear use case and need that
actually works, we should consider deprecating it.


I've heard some discussion about trying to use coupled AZs in order to 
schedule volumes close to instances.  However I think that is 
occurring because it's possible to do that, not because that would be 
a good way to handle the coordinated scheduling problem.




So, while I think it's understandable to have that done, since Nova AZs 
are related to compute nodes and Cinder AZs could be related to volumes, 
I'd tend to ask Cinder to rename the AZ concept into something else less 
confusing.


Also, there is a long story about trying to have Cinder provide 
resources to the Nova scheduler so that we could have volume affinity 
when booting, so I would prefer to go that way instead of trying to 
misuse AZs.


I'm about to ask for a Nova/Cinder/Neutron room at the Summit to discuss 
how Cinder and Neutron could provide resources to the scheduler, I'd 
love to get feedback from those teams there.


 -Sylvain

__ 


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Andrew Laski

On 09/23/15 at 04:30pm, Mathieu Gagné wrote:

On 2015-09-23 4:12 PM, Andrew Laski wrote:

On 09/23/15 at 02:55pm, Matt Riedemann wrote:


Heh, so when I just asked in the cinder channel if we can just
deprecate nova boot from volume with source=(image|snapshot|blank)
(which automatically creates the volume and polls for it to be
available) and then add a microversion that doesn't allow it, I was
half joking, but I see we're on the same page.  This scenario seems to
introduce a lot of orchestration work that nova shouldn't necessarily
be in the business of handling.


I am very much in support of this.  This has been a source of
frustration for our users because it is prone to failures we can't
properly expose to users and timeouts.  There are much better places to
handle the orchestration of creating a volume and then booting from it
than Nova.



Unfortunately, this is a feature our users *heavily* rely on and we
worked very hard to make it happen. We had a private patch on our side
for years to optimize boot-from-volume before John Griffith came up with
an upstream solution for SolidFire [2] and others with a generic
solution [3] [4].

Being able to "nova boot" and have everything done for you is awesome.
Just see what Monty Taylor mentioned in his thread about sane default
networking [1]. Having orchestration on the client side is just
something our users don't want to have to do and often complain about.


At risk of getting too offtopic I think there's an alternate solution to 
doing this in Nova or on the client side.  I think we're missing some 
sort of OpenStack API and service that can handle this.  Nova is a low 
level infrastructure API and service, it is not designed to handle these 
orchestrations.  I haven't checked in on Heat in a while but perhaps 
this is a role that it could fill.


I think that too many people consider Nova to be *the* OpenStack API 
when considering instances/volumes/networking/images and that's not 
something I would like to see continue.  Or at the very least I would 
like to see a split between the orchestration/proxy pieces and the 
"manage my VM/container/baremetal" bits.




[1]
http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html
[2] https://review.openstack.org/#/c/142859/
[3] https://review.openstack.org/#/c/195795/
[4] https://review.openstack.org/#/c/201754/

--
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Ikuo Kumagai
Hi All,

I'm sorry, I was on vacation yesterday (in JST) and did not notice this
discussion. I am the one who registered "bug 1496235".

In our case, there are two Nova AZs (az1, az2) and one Cinder AZ (default).
The Cinder backend is ceph, which is a cluster spanning the compute nodes in
both Nova AZs. Both Nova AZs always use the Cinder default zone.

When I registered the bug, the option I wanted was to be able to select "sync"
or "async" AZ behaviour between nova and cinder.

Regards,
IKUO Kumagai
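
An illustration of the sync/async choice being asked for above (purely 
hypothetical; no such option exists in nova today):

    # Hypothetical sketch of a per-deployment "sync" / "async" AZ mode.
    def volume_az_for_boot(mode, instance_az, cinder_azs, default_az=None):
        if mode == 'sync':
            # Require the nova AZ to exist in Cinder and use it as-is.
            if instance_az not in cinder_azs:
                raise ValueError("AZ %s is not defined in Cinder" % instance_az)
            return instance_az
        # "async": ignore the nova AZ and let Cinder use its default zone.
        return default_az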


2015-09-24 10:05 GMT+09:00 Sam Morrison :

>
> > On 24 Sep 2015, at 9:59 am, Andrew Laski  wrote:
> >
> > I was perhaps hasty in approving that patch and didn't realize that Matt
> had reached out for operator feedback at the same time that he proposed it.
> Since this is being used in production I wouldn't want it to be removed
> without at least having an alternative, and hopefully better, method of
> achieving your goal.  Reverting the deprecation seems reasonable to me for
> now while we work out the details around Cinder/Nova AZ interactions.
>
> Thanks Andrew,
>
> What we basically want is for our users to have instances and volumes on a
> section of hardware and then for them to be able to have other instances
> and volumes in another section of hardware.
>
> If one section dies then the other section is fine. For us we use
> availability-zones for this. If this is not the intended use for AZs what
> is a better way for us to do this.
>
> Cheers,
> Sam
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Andrew Laski

On 09/24/15 at 09:34am, Sam Morrison wrote:

Just got alerted to this on the operator list.

We very much rely on this.

We have multiple availability zones in nova and each zone has a corresponding 
cinder-volume service(s) in the same availability zone.

We don’t want people attaching a volume from one zone to another as the network 
won’t allow that as the zones are in different network domains and different 
data centres.

I wonder if you guys can reconsider deprecating this option as it is very 
useful to us.


I was perhaps hasty in approving that patch and didn't realize that Matt 
had reached out for operator feedback at the same time that he proposed 
it.  Since this is being used in production I wouldn't want it to be 
removed without at least having an alternative, and hopefully better, 
method of achieving your goal.  Reverting the deprecation seems 
reasonable to me for now while we work out the details around 
Cinder/Nova AZ interactions.






Cheers,
Sam




On 24 Sep 2015, at 7:43 am, Mathieu Gagné  wrote:

On 2015-09-23 4:50 PM, Andrew Laski wrote:

On 09/23/15 at 04:30pm, Mathieu Gagné wrote:

On 2015-09-23 4:12 PM, Andrew Laski wrote:

On 09/23/15 at 02:55pm, Matt Riedemann wrote:


Heh, so when I just asked in the cinder channel if we can just
deprecate nova boot from volume with source=(image|snapshot|blank)
(which automatically creates the volume and polls for it to be
available) and then add a microversion that doesn't allow it, I was
half joking, but I see we're on the same page.  This scenario seems to
introduce a lot of orchestration work that nova shouldn't necessarily
be in the business of handling.


I am very much in support of this.  This has been a source of
frustration for our users because it is prone to failures we can't
properly expose to users and timeouts.  There are much better places to
handle the orchestration of creating a volume and then booting from it
than Nova.



Unfortunately, this is a feature our users *heavily* rely on and we
worked very hard to make it happen. We had a private patch on our side
for years to optimize boot-from-volume before John Griffith came up with
an upstream solution for SolidFire [2] and others with a generic
solution [3] [4].

Being able to "nova boot" and have everything done for you is awesome.
Just see what Monty Taylor mentioned in his thread about sane default
networking [1]. Having orchestration on the client side is just
something our users don't want to have to do and often complain about.


At risk of getting too offtopic I think there's an alternate solution to
doing this in Nova or on the client side.  I think we're missing some
sort of OpenStack API and service that can handle this.  Nova is a low
level infrastructure API and service, it is not designed to handle these
orchestrations.  I haven't checked in on Heat in a while but perhaps
this is a role that it could fill.

I think that too many people consider Nova to be *the* OpenStack API
when considering instances/volumes/networking/images and that's not
something I would like to see continue.  Or at the very least I would
like to see a split between the orchestration/proxy pieces and the
"manage my VM/container/baremetal" bits.



"too many people" happens to include a lot of 3rd party tools supporting
OpenStack which our users complain a lot about. Just see all the
possible way to get an external IP [5]. Introducing yet another service
would increase the pain on our users which will see their tools and
products not working even more.

Just see how EC2 is doing it [6], you won't see them suggest to use yet
another service to orchestrate what I consider a fundamental feature "I
wish to boot an instance on a volume".

The current ease to boot from volume is THE selling feature our users
want and heavily/actively use. We fought very hard to make it work and
reading about how it should be removed is frustrating.

Issues we identified shouldn't be a reason to drop this feature. Other
providers are making it work and I don't see why we couldn't. I'm
convinced we can do better.

[5]
https://github.com/openstack-infra/shade/blob/03c1556a12aabfc21de60a9fac97aea7871485a3/shade/meta.py#L106-L173
[6]
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html

Mathieu



[1]
http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html

[2] https://review.openstack.org/#/c/142859/
[3] https://review.openstack.org/#/c/195795/
[4] https://review.openstack.org/#/c/201754/

--
Mathieu



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 

Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sam Morrison
Just got alerted to this on the operator list.

We very much rely on this.

We have multiple availability zones in nova and each zone has a corresponding 
cinder-volume service(s) in the same availability zone.

We don’t want people attaching a volume from one zone to another, since the network 
won’t allow it: the zones are in different network domains and different 
data centres.

I wonder if you guys can reconsider deprecating this option as it is very 
useful to us.

Cheers,
Sam



> On 24 Sep 2015, at 7:43 am, Mathieu Gagné  wrote:
> 
> On 2015-09-23 4:50 PM, Andrew Laski wrote:
>> On 09/23/15 at 04:30pm, Mathieu Gagné wrote:
>>> On 2015-09-23 4:12 PM, Andrew Laski wrote:
 On 09/23/15 at 02:55pm, Matt Riedemann wrote:
> 
> Heh, so when I just asked in the cinder channel if we can just
> deprecate nova boot from volume with source=(image|snapshot|blank)
> (which automatically creates the volume and polls for it to be
> available) and then add a microversion that doesn't allow it, I was
> half joking, but I see we're on the same page.  This scenario seems to
> introduce a lot of orchestration work that nova shouldn't necessarily
> be in the business of handling.
 
 I am very much in support of this.  This has been a source of
 frustration for our users because it is prone to failures we can't
 properly expose to users and timeouts.  There are much better places to
 handle the orchestration of creating a volume and then booting from it
 than Nova.
 
>>> 
>>> Unfortunately, this is a feature our users *heavily* rely on and we
>>> worked very hard to make it happen. We had a private patch on our side
>>> for years to optimize boot-from-volume before John Griffith came up with
>>> an upstream solution for SolidFire [2] and others with a generic
>>> solution [3] [4].
>>> 
>>> Being able to "nova boot" and have everything done for you is awesome.
>>> Just see what Monty Taylor mentioned in his thread about sane default
>>> networking [1]. Having orchestration on the client side is just
>>> something our users don't want to have to do and often complain about.
>> 
>> At risk of getting too offtopic I think there's an alternate solution to
>> doing this in Nova or on the client side.  I think we're missing some
>> sort of OpenStack API and service that can handle this.  Nova is a low
>> level infrastructure API and service, it is not designed to handle these
>> orchestrations.  I haven't checked in on Heat in a while but perhaps
>> this is a role that it could fill.
>> 
>> I think that too many people consider Nova to be *the* OpenStack API
>> when considering instances/volumes/networking/images and that's not
>> something I would like to see continue.  Or at the very least I would
>> like to see a split between the orchestration/proxy pieces and the
>> "manage my VM/container/baremetal" bits.
>> 
> 
> "too many people" happens to include a lot of 3rd party tools supporting
> OpenStack which our users complain a lot about. Just see all the
> possible way to get an external IP [5]. Introducing yet another service
> would increase the pain on our users which will see their tools and
> products not working even more.
> 
> Just see how EC2 is doing it [6], you won't see them suggest to use yet
> another service to orchestrate what I consider a fundamental feature "I
> wish to boot an instance on a volume".
> 
> The current ease to boot from volume is THE selling feature our users
> want and heavily/actively use. We fought very hard to make it work and
> reading about how it should be removed is frustrating.
> 
> Issues we identified shouldn't be a reason to drop this feature. Other
> providers are making it work and I don't see why we couldn't. I'm
> convinced we can do better.
> 
> [5]
> https://github.com/openstack-infra/shade/blob/03c1556a12aabfc21de60a9fac97aea7871485a3/shade/meta.py#L106-L173
> [6]
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html
> 
> Mathieu
> 
>>> 
>>> [1]
>>> http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html
>>> 
>>> [2] https://review.openstack.org/#/c/142859/
>>> [3] https://review.openstack.org/#/c/195795/
>>> [4] https://review.openstack.org/#/c/201754/
>>> 
>>> -- 
>>> Mathieu
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sam Morrison

> On 24 Sep 2015, at 9:59 am, Andrew Laski  wrote:
> 
> I was perhaps hasty in approving that patch and didn't realize that Matt had 
> reached out for operator feedback at the same time that he proposed it. Since 
> this is being used in production I wouldn't want it to be removed without at 
> least having an alternative, and hopefully better, method of achieving your 
> goal.  Reverting the deprecation seems reasonable to me for now while we work 
> out the details around Cinder/Nova AZ interactions.

Thanks Andrew,

What we basically want is for our users to have instances and volumes on a 
section of hardware and then for them to be able to have other instances and 
volumes in another section of hardware.

If one section dies then the other section is fine. For this we use 
availability zones. If this is not the intended use for AZs, what is a 
better way for us to do this?

Cheers,
Sam



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev