[openstack-dev] [openstack-ansible] dropping xenial jobs

2018-10-09 Thread Mohammed Naser
Hi everyone!

So I’ve been thinking of dropping the Xenial jobs on master to reduce our overall 
gate usage, since we don’t support Xenial there. 

However, I was a bit torn on this because I realize it’s possible for us 
to write things and backport them, only to find out that they’d break under 
Xenial, which can still be deployed with Rocky. 

Thoughts?  Ideas?  I was thinking maybe an experimental job... not really 
sure on the specifics, but I’d like to bring in more feedback. 

Thanks,
Mohammed
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Open API 3.0 for OpenStack API

2018-10-09 Thread Gilles Dubreuil



On 09/10/18 23:58, Jeremy Stanley wrote:

On 2018-10-09 08:52:52 -0400 (-0400), Jim Rollenhagen wrote:
[...]

It seems to me that a major goal of openstacksdk is to hide differences
between clouds from the user. If the user is meant to use a GraphQL library
themselves, we lose this and the user needs to figure it out themselves.
Did I understand that correctly?

This is especially useful where the SDK implements business logic
for common operations like "if the user requested A and the cloud
supports features B+C+D then use those to fulfil the request,
otherwise fall back to using features E+F".
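
(As a purely illustrative sketch of that kind of fallback logic; the 'cloud'
object, its methods and the feature names below are made up, not part of the
openstacksdk API:)

    def fulfil_request_a(cloud, request):
        # Prefer the richer path when the cloud exposes all of B, C and D.
        if all(cloud.supports(feature) for feature in ('B', 'C', 'D')):
            return cloud.fulfil_with_bcd(request)
        # Otherwise fall back to the less capable features.
        return cloud.fulfil_with_ef(request)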



The features offered to the user don't have to change, it's just a 
different architecture.


The user doesn't have to deal with a GraphQL library; only the client 
applications (consuming OpenStack APIs) do.
And there are also UI tools, such as GraphiQL, which allow interacting 
directly with GraphQL servers.
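
(For illustration, a client application talking to a hypothetical GraphQL
endpoint might look roughly like this; the URL, token handling and schema
fields are invented, since no such OpenStack GraphQL service exists today:)

    import requests

    QUERY = """
    {
      servers {
        id
        name
        status
      }
    }
    """

    resp = requests.post(
        'https://cloud.example.com/graphql',       # hypothetical endpoint
        json={'query': QUERY},
        headers={'X-Auth-Token': '<keystone token>'},
    )
    print(resp.json())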



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Chris Friesen

On 10/9/2018 1:20 PM, Jay Pipes wrote:

On 10/09/2018 11:04 AM, Balázs Gibizer wrote:

If you do the force flag removal in a new microversion that also means
(at least to me) that you should not change the behavior of the force
flag in the old microversions.


Agreed.

Keep the old, buggy and unsafe behaviour for the old microversion and in 
a new microversion remove the --force flag entirely and always call GET 
/a_c, followed by a claim_resources() on the destination host.


Agreed.  Once you start looking at more complicated resource topologies, 
you pretty much need to handle allocations properly.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Eric Fried


On 10/09/2018 02:20 PM, Jay Pipes wrote:
> On 10/09/2018 11:04 AM, Balázs Gibizer wrote:
>> If you do the force flag removal in a new microversion that also means
>> (at least to me) that you should not change the behavior of the force
>> flag in the old microversions.
> 
> Agreed.
> 
> Keep the old, buggy and unsafe behaviour for the old microversion and in
> a new microversion remove the --force flag entirely and always call GET
> /a_c, followed by a claim_resources() on the destination host.
> 
> For the old microversion behaviour, continue to do the "blind copy" of
> allocations from the source compute node provider to the destination
> compute node provider.

TBC, for nested/sharing source, we should consolidate all the resources
into a single allocation against the destination's root provider?

> That "blind copy" will still fail if there isn't
> capacity for the new allocations on the destination host anyway, because
> the blind copy is just issuing a POST /allocations, and that code path
> still checks capacity on the target resource providers.

What happens when the migration fails, either because of that POST
/allocations, or afterwards? Do we still have the old allocation around
to restore? Cause we can't re-figure it from the now-monolithic
destination allocation.

> There isn't a
> code path in the placement API that allows a provider's inventory
> capacity to be exceeded by new allocations.
> 
> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [manila] nominating Amit Oren for manila core

2018-10-09 Thread Amit Oren
Thank you Tom for nominating me and thank you all for your votes :)

- Amit

On Tue, Oct 9, 2018 at 11:24 PM Erlon Cruz  wrote:

> Hey Amit,
>
> Welcome!
>
> Em ter, 9 de out de 2018 às 16:52, Tom Barron  escreveu:
>
>> On 02/10/18 13:58 -0400, Tom Barron wrote:
>> >Amit Oren has contributed high quality reviews in the last
>> >couple of cycles so I would like to nominated him for manila
>> >core.
>> >
>> >Please respond with your +1 or -1 votes.  We'll hold voting
>> >open for 7 days.
>> >
>> >Thanks,
>> >
>> >-- Tom Barron (tbarron)
>> >
>>
>> We've had lots of +1s for Amit Oren as manila core and no -1s so I've
>> added him.
>>
>> Welcome, Amit!
>>
>> -- Tom
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [manila] nominating Amit Oren for manila core

2018-10-09 Thread Erlon Cruz
Hey Amit,

Welcome!

Em ter, 9 de out de 2018 às 16:52, Tom Barron  escreveu:

> On 02/10/18 13:58 -0400, Tom Barron wrote:
> >Amit Oren has contributed high quality reviews in the last
> >couple of cycles so I would like to nominated him for manila
> >core.
> >
> >Please respond with your +1 or -1 votes.  We'll hold voting
> >open for 7 days.
> >
> >Thanks,
> >
> >-- Tom Barron (tbarron)
> >
>
> We've had lots of +1s for Amit Oren as manila core and no -1s so I've
> added him.
>
> Welcome, Amit!
>
> -- Tom
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][qa] Enabling online volume_extend tests by default

2018-10-09 Thread Erlon Cruz
Hi Ghanshyam,


Though I have concern over running those tests by default(making config
> options True by default), because it is not confirmed all cinder backends
> implements this functionality and it only works for nova libvirt driver. We
> need to keep config options default as False and Devstack/CI can make it
> True to run the tests.
>
>
The discussion at the PTG was about whether we should run this on the gate,
which would actually break the CIs. Once that happens, vendors will have 3 options:

#1: fix their drivers by properly implementing volume_extend and run
the positive tests
#2: fix their drivers by reporting that they do not support volume_extend
and run the negative tests
#3: disable the volume extend tests altogether (not recommended), but this
still gives us a hint on whether the vendor supports this or not


> If this feature becomes mandatory functionality (or cinder say standard
> feature i think) to implement for every backends and it work with all nova
> driver also(in term of instance action events) then, we can enable this
> feature tests by default. But until then, we should keep them disable by
> default in Tempest but we can enable them on gate via Devstack (patch you
> mentioned) and test them daily on integrated-gate.
>

It's not mandatory that the driver implement online_extend, but if the
driver does not support it, the driver should report it as such.


> Overall, I am ok with Devstack change to make these tests enable for every
> Cinder backends but we need to keep the config options false in Tempest.
>

So, the outcome from the PTG was that we would first merge the tempest test
and give vendors time to get their drivers fixed. Then we would change it
in devstack to push vendors to fix their drivers in case they hadn't done
so already.

Erlon



>
> I will review those patch and leave comments on gerrit (i saw those patch
> introduce new config option than using the existing one)
>
> -gmann
>
>  > Please let us know if you have any question or concerns about it.
>  > Kind regards, Erlon
>  > [1] https://review.openstack.org/#/c/572188/
>  > [2] https://review.openstack.org/#/c/578463/
> __
>  > OpenStack Development Mailing List (not for usage questions)
>  > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>  > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>  >
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [manila] nominating Amit Oren for manila core

2018-10-09 Thread Tom Barron

On 02/10/18 13:58 -0400, Tom Barron wrote:
Amit Oren has contributed high quality reviews in the last 
couple of cycles, so I would like to nominate him for manila 
core.


Please respond with your +1 or -1 votes.  We'll hold voting 
open for 7 days.


Thanks,

-- Tom Barron (tbarron)



We've had lots of +1s for Amit Oren as manila core and no -1s so I've 
added him.


Welcome, Amit!

-- Tom


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Lance Bragstad
On Tue, Oct 9, 2018 at 10:56 AM Doug Hellmann  wrote:

> Matthew Thode  writes:
>
> > On 18-10-09 11:12:30, Doug Hellmann wrote:
> >> Matthew Thode  writes:
> >>
> >> > several projects have had problems with the new release, some have
> ways
> >> > of working around it, and some do not.  I'm sending this just to raise
> >> > the issue and allow a place to discuss solutions.
> >> >
> >> > Currently there is a review proposed to blacklist 9.0.0, but if this
> is
> >> > going to still be an issue somehow in further releases we may need
> >> > another solution.
> >> >
> >> > https://review.openstack.org/#/c/608835/
> >> >
> >> > --
> >> > Matthew Thode (prometheanfire)
> >> >
> __
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >> Do you have links to the failure logs or bug reports or something? If I
> >> wanted to help I wouldn't even know where to start.
> >>
> >
> >
> http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz
>
> These failures look like we should add a proper API to oslo.messaging to
> set the notification and rpc backends for testing. The configuration
> options are *not* part of the API of the library.
>
> There is already an oslo_messaging.conffixture module with a fixture
> class, but it looks like it defaults to rabbit. Maybe someone wants to
> propose a patch to make that a parameter to the constructor?
>
> >
> http://logs.openstack.org/21/607521/2/check/cross-glance-py35/e2161d7/testr_results.html.gz
>
> These failures should be fixed by releasing the patch that Mehdi
> provided that ensures there is a valid default transport configured.
>
> >
> http://logs.openstack.org/21/607521/2/check/cross-keystone-py35/908a1c2/testr_results.html.gz
>
> Lance has already described these as mocking implementation details of
> the library. I expect we'll need someone with keystone experience to
> work out what the best solution is to do there.
>

So - I think it's apparent there are two things to do to fix this for
keystone, which could be true for other projects as well.

To recap, keystone has tests to assert the plumbing to send a notification
was called, or not called, depending on configuration options in keystone
(we allow operators to opt out of noisy notifications, like authenticate).

As noted earlier, we shouldn't be making these assertions using an internal
method of oslo.messaging. I have a patch up to refactor that to use the
public API instead [0]. Even with that fix [0], the tests mentioned by Matt
still fail because there isn't a sane default. I have a separate patch up
to make keystone's tests work by supplying the default introduced in
version 9.0.1 [1], overriding the configuration option for transport_url.
This got a bit hairy in a circular-dependency kind of way because
get_notification_transport() [2] is what registers the default options,
which is broken. I have a patch to keystone [3] showing how I worked around
this, which might not be needed if we allow the constructor to accept an
override for transport_url.

[0] https://review.openstack.org/#/c/609072/
[1] https://review.openstack.org/#/c/608196/3/oslo_messaging/transport.py
[2]
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/notify/notifier.py#n167
[3] https://review.openstack.org/#/c/609106/
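
(A rough sketch of asserting through the public API instead; this is
illustrative only, not the actual keystone patch, and self._authenticate()
is a hypothetical helper standing in for whatever keystone action emits the
event:)

    from unittest import mock

    import oslo_messaging

    def test_authenticate_emits_notification(self):
        # Patch the public Notifier.info() method rather than an
        # oslo.messaging internal.
        with mock.patch.object(oslo_messaging.Notifier, 'info') as mock_info:
            self._authenticate()
            self.assertTrue(mock_info.called)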


>
> >
> > --
> > Matthew Thode (prometheanfire)
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 03:10 PM, Fox, Kevin M wrote:

Oh, this does raise an interesting question... Should such information be reported by the 
projects up to users through labels? Something like, "percona_multimaster=safe" 
It's really difficult for folks to know which projects can and cannot be used that way 
currently.


Are you referring to k8s labels/selectors? or are you referring to 
project tags (you know, part of that whole Big Tent thing...)?


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Rocky RC time regression analysis

2018-10-09 Thread Matt Riedemann

On 10/5/2018 6:59 PM, melanie witt wrote:
5) when live migration fails due to an internal error, rollback is not 
handled correctly https://bugs.launchpad.net/nova/+bug/1788014


- Bug was reported on 2018-08-20
- The change that caused the regression landed on 2018-07-26, FF day 
https://review.openstack.org/434870

- Unrelated to a blueprint, the regression was part of a bug fix
- Was found because sean-k-mooney was doing live migrations and found 
that when an LM failed because of a QEMU internal error, the VM remained 
ACTIVE but no longer had network connectivity.

- Question: why wasn't this caught earlier?
- Answer: We would need a live migration job scenario that intentionally 
initiates and fails a live migration, then verify network connectivity 
after the rollback occurs.

- Question: can we add something like that?


Not in Tempest, no, but we could run something in the 
nova-live-migration job since that executes via its own script. We could 
hack something in like what we have proposed for testing evacuate:


https://review.openstack.org/#/c/602174/

The trick is figuring out how to introduce a fault in the destination 
host without taking down the service, because if the compute service is 
down we won't schedule to it.




6) nova-manage db online_data_migrations hangs on instances with no host 
set https://bugs.launchpad.net/nova/+bug/1788115


- Bug was reported on 2018-08-21
- The patch that introduced the bug landed on 2018-05-30 
https://review.openstack.org/567878

- Unrelated to a blueprint, the regression was part of a bug fix
- Question: why wasn't this caught earlier?
- Answer: To hit the bug, you had to have had instances with no host set 
(that failed to schedule) in your database during an upgrade. This does 
not happen during the grenade job
- Question: could we add anything to the grenade job that would leave 
some instances with no host set to cover cases like this?


Probably - I'd think creating a server on the old side with some 
parameters that we know won't schedule would do it, maybe requesting an 
AZ that doesn't exist, or some other kind of scheduler hint that we know 
won't work so we get a NoValidHost. However, online_data_migrations in 
grenade probably don't run on the cell0 database, so I'm not sure we 
would have caught that case.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Jay Pipes

On 10/09/2018 11:04 AM, Balázs Gibizer wrote:

If you do the force flag removal in a new microversion that also means
(at least to me) that you should not change the behavior of the force
flag in the old microversions.


Agreed.

Keep the old, buggy and unsafe behaviour for the old microversion and in 
a new microversion remove the --force flag entirely and always call GET 
/a_c, followed by a claim_resources() on the destination host.


For the old microversion behaviour, continue to do the "blind copy" of 
allocations from the source compute node provider to the destination 
compute node provider. That "blind copy" will still fail if there isn't 
capacity for the new allocations on the destination host anyway, because 
the blind copy is just issuing a POST /allocations, and that code path 
still checks capacity on the target resource providers. There isn't a 
code path in the placement API that allows a provider's inventory 
capacity to be exceeded by new allocations.


Best,
-jay
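
(For illustration, a rough pseudocode sketch of the new-microversion flow
described above; the 'placement' helper names are stand-ins for the steps
mentioned in this thread, not real nova or placement signatures:)

    class NoValidHost(Exception):
        pass

    def move_without_force(instance, dest_host, placement):
        # Always ask placement which providers can satisfy the request,
        # including nested and sharing providers (GET /allocation_candidates).
        candidates = placement.get_allocation_candidates(instance.request_spec)
        allocation = next((c for c in candidates if c.host == dest_host), None)
        if allocation is None:
            raise NoValidHost()
        # Claim the (possibly nested) allocation on the destination before
        # starting the move.
        placement.claim_resources(instance.uuid, allocation)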

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Fox, Kevin M
Oh, this does raise an interesting question... Should such information be 
reported by the projects up to users through labels? Something like 
"percona_multimaster=safe". It's really difficult for folks to know which 
projects can and cannot be used that way currently.

Is this a TC question?

Thanks,
Kevin

From: melanie witt [melwi...@gmail.com]
Sent: Tuesday, October 09, 2018 10:35 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, 
fabio and FQDN endpoints

On Tue, 9 Oct 2018 07:23:03 -0400, Jay Pipes wrote:
> That explains where the source of the problem comes from (it's the use
> of SELECT FOR UPDATE, which has been removed from Nova's quota-handling
> code in the Rocky release).

Small correction, the SELECT FOR UPDATE was removed from Nova's
quota-handling code in the Pike release.

-melanie




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Cyborg] Core Team Update

2018-10-09 Thread Li Liu
Hi Cyborg Team,

I want to nominate Xinran Wang as a new core reviewer for the Cyborg project.
Xinran has been working hard and has kept contributing to the project [1][2].
Keep Calm and Carry On :)

[1]
https://review.openstack.org/#/q/owner:xin-ran.wang%2540intel.com+status:open
[2]
http://stackalytics.com/?module=cyborg-group=person-day=rocky_id=xinran
-- 
Thank you

Regards

Li
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [cyborg] Weekly Meeting this week is happening on ZOOM

2018-10-09 Thread Li Liu
Hi Team,

This week's Cyborg meeting will be held on Zoom.  Sundar will give a
presentation.

LI LIU is inviting you to a scheduled Zoom meeting.

Topic: Cyborg Meeting
Time: Oct 10, 2018 10:00 AM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/6637468814

Or iPhone one-tap :
US: +16465588665,,6637468814#  or +16699006833,,6637468814#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 646 558 8665  or +1 669 900 6833
Meeting ID: 663 746 8814
International numbers available: https://zoom.us/u/bXRmjYQX


-- 
Thank you

Regards

Li
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-09 Thread Matt Riedemann

On 10/9/2018 11:08 AM, Miguel Lavalle wrote:
Since it has been more than a week since this nomination was posted and 
we have received only positive feedback, can we move ahead and add 
Bernard Cafarelli to Neutron Stable core team?


Done:

https://review.openstack.org/#/admin/groups/539,members

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread melanie witt

On Tue, 9 Oct 2018 10:35:23 -0700, Melanie Witt wrote:

On Tue, 9 Oct 2018 07:23:03 -0400, Jay Pipes wrote:

That explains where the source of the problem comes from (it's the use
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling
code in the Rocky release).

Small correction, the SELECT FOR UPDATE was removed from Nova's
quota-handling code in the Pike release.


Elaboration: the calls to quota reserve/commit/rollback were removed in 
the Pike release, so with_lockmode('update') is not called for quota 
operations, even though the reserve/commit/rollback methods are still 
there for use by old (Ocata) computes during an Ocata => Pike upgrade. 
Then, the reserve/commit/rollback methods were removed in Queens once no 
old computes could be calling them.


-melanie




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [masakari][congress] what is a host?

2018-10-09 Thread Eric K
Got it, thanks very much, Sampath!

From:  Sam P 
Reply-To:  "OpenStack Development Mailing List (not for usage questions)"

Date:  Saturday, October 6, 2018 at 8:29 PM
To:  "OpenStack Development Mailing List (not for usage questions)"

Subject:  Re: [openstack-dev] [masakari][congress] what is a host?

> Hi Eric,
>  
> (1) "virtual machine" is bug. This need to be corrected as physical machine or
> hypervisor.
>   Masakari host is a physical host/hypervisor. I will correct this.
> 
> (2) Not through masakari APIs. You have to add metadata key 'HA_Enabled=True'
> to each VM by using nova API.
>  Masakeri monitors check for failures and send notification to masakari
> API if detected any failure (i.e host, VM or process failures).
>  In host failure (hypervisor down) scenario, Masakari engine get the VM
> list on that hypervisor and start evacuate VMs.
>  Operator can configure masakari to evacuate all VMs or only the VMs with
> the metadata key   'HA_Enabled=True.
>  Please see the config file [1] section [host_failure] for more details.
> 
> Let me know if you need more info on this.
> 
> [1] https://docs.openstack.org/masakari/latest/sample_config.html
> 
> --- Regards,
> Sampath
> 
> 
> 
> On Sun, Oct 7, 2018 at 8:55 AM Eric K  wrote:
>> Hi all, I'm working on a potential integration between masakari and
>> congress. But I am stuck on some basic usage questions I could not
>> answer in my search of docs and demos. Any clarification or references
>> would be much appreciated!
>> 
>> 1. What does a host refer to in masakari API? Here's the explanation in API
>> doc:
>> "Host can be any kind of virtual machine which can have compute
>> service running on it."
>> (https://developer.openstack.org/api-ref/instance-ha/#hosts-hosts)
>> 
>> So is a masakari host usually a nova instance/server instead of a
>> host/hypervisor?
>> 
>> 2. Through the masakari api, how does one go about configuring a VM to
>> be managed by masakari instance HA?
>> 
>> Thanks so much!
>> 
>> Eric Kao
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> 
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> __
> OpenStack Development Mailing List (not for usage questions) Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Ken Giusti
On Tue, Oct 9, 2018 at 12:30 PM Doug Hellmann  wrote:

> Ken Giusti  writes:
>
> > On Tue, Oct 9, 2018 at 11:56 AM Doug Hellmann 
> wrote:
> >
> >> Matthew Thode  writes:
> >>
> >> > On 18-10-09 11:12:30, Doug Hellmann wrote:
> >> >> Matthew Thode  writes:
> >> >>
> >> >> > several projects have had problems with the new release, some have
> >> ways
> >> >> > of working around it, and some do not.  I'm sending this just to
> raise
> >> >> > the issue and allow a place to discuss solutions.
> >> >> >
> >> >> > Currently there is a review proposed to blacklist 9.0.0, but if
> this
> >> is
> >> >> > going to still be an issue somehow in further releases we may need
> >> >> > another solution.
> >> >> >
> >> >> > https://review.openstack.org/#/c/608835/
> >> >> >
> >> >> > --
> >> >> > Matthew Thode (prometheanfire)
> >> >> >
> >>
> __
> >> >> > OpenStack Development Mailing List (not for usage questions)
> >> >> > Unsubscribe:
> >> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >>
> >> >> Do you have links to the failure logs or bug reports or something?
> If I
> >> >> wanted to help I wouldn't even know where to start.
> >> >>
> >> >
> >> >
> >>
> http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz
> >>
> >> These failures look like we should add a proper API to oslo.messaging to
> >> set the notification and rpc backends for testing. The configuration
> >> options are *not* part of the API of the library.
> >>
> >> There is already an oslo_messaging.conffixture module with a fixture
> >> class, but it looks like it defaults to rabbit. Maybe someone wants to
> >> propose a patch to make that a parameter to the constructor?
> >>
> >
> > oslo.messaging's conffixture uses whatever the config default for
> > transport_url is unless the test
> > specifically overrides it by setting the transport_url attribute.
> > The o.m. unit tests's base test class sets conffixture.transport_url to
> > "fake:/" to use the fake in memory driver.
> > That's the existing practice (I believe it's used like that outside of
> o.m.)
>
> OK, so it sounds like the fixture is relying on the configuration to be
> set up in advance, and that's the thing we need to change. We don't want
> users outside of the library to set up tests by using the configuration
> options, right?
>

That's the intent of ConfFixture, it seems - to provide a wrapper API so tests
don't have to monkey with the config directly.

How about this:

  https://review.openstack.org/609063


>
> Doug
>


-- 
Ken Giusti  (kgiu...@gmail.com)
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread melanie witt

On Tue, 9 Oct 2018 07:23:03 -0400, Jay Pipes wrote:

That explains where the source of the problem comes from (it's the use
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling
code in the Rocky release).


Small correction, the SELECT FOR UPDATE was removed from Nova's 
quota-handling code in the Pike release.


-melanie




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Ken Giusti  writes:

> On Tue, Oct 9, 2018 at 11:56 AM Doug Hellmann  wrote:
>
>> Matthew Thode  writes:
>>
>> > On 18-10-09 11:12:30, Doug Hellmann wrote:
>> >> Matthew Thode  writes:
>> >>
>> >> > several projects have had problems with the new release, some have
>> ways
>> >> > of working around it, and some do not.  I'm sending this just to raise
>> >> > the issue and allow a place to discuss solutions.
>> >> >
>> >> > Currently there is a review proposed to blacklist 9.0.0, but if this
>> is
>> >> > going to still be an issue somehow in further releases we may need
>> >> > another solution.
>> >> >
>> >> > https://review.openstack.org/#/c/608835/
>> >> >
>> >> > --
>> >> > Matthew Thode (prometheanfire)
>> >> >
>> __
>> >> > OpenStack Development Mailing List (not for usage questions)
>> >> > Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >>
>> >> Do you have links to the failure logs or bug reports or something? If I
>> >> wanted to help I wouldn't even know where to start.
>> >>
>> >
>> >
>> http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz
>>
>> These failures look like we should add a proper API to oslo.messaging to
>> set the notification and rpc backends for testing. The configuration
>> options are *not* part of the API of the library.
>>
>> There is already an oslo_messaging.conffixture module with a fixture
>> class, but it looks like it defaults to rabbit. Maybe someone wants to
>> propose a patch to make that a parameter to the constructor?
>>
>
> oslo.messaging's conffixture uses whatever the config default for
> transport_url is unless the test
> specifically overrides it by setting the transport_url attribute.
> The o.m. unit tests's base test class sets conffixture.transport_url to
> "fake:/" to use the fake in memory driver.
> That's the existing practice (I believe it's used like that outside of o.m.)

OK, so it sounds like the fixture is relying on the configuration to be
set up in advance, and that's the thing we need to change. We don't want
users outside of the library to set up tests by using the configuration
options, right?

Doug

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Ken Giusti
On Tue, Oct 9, 2018 at 11:56 AM Doug Hellmann  wrote:

> Matthew Thode  writes:
>
> > On 18-10-09 11:12:30, Doug Hellmann wrote:
> >> Matthew Thode  writes:
> >>
> >> > several projects have had problems with the new release, some have
> ways
> >> > of working around it, and some do not.  I'm sending this just to raise
> >> > the issue and allow a place to discuss solutions.
> >> >
> >> > Currently there is a review proposed to blacklist 9.0.0, but if this
> is
> >> > going to still be an issue somehow in further releases we may need
> >> > another solution.
> >> >
> >> > https://review.openstack.org/#/c/608835/
> >> >
> >> > --
> >> > Matthew Thode (prometheanfire)
> >> >
> __
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >> Do you have links to the failure logs or bug reports or something? If I
> >> wanted to help I wouldn't even know where to start.
> >>
> >
> >
> http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz
>
> These failures look like we should add a proper API to oslo.messaging to
> set the notification and rpc backends for testing. The configuration
> options are *not* part of the API of the library.
>
> There is already an oslo_messaging.conffixture module with a fixture
> class, but it looks like it defaults to rabbit. Maybe someone wants to
> propose a patch to make that a parameter to the constructor?
>

oslo.messaging's conffixture uses whatever the config default for
transport_url is, unless the test specifically overrides it by setting the
transport_url attribute.
The o.m. unit tests' base test class sets conffixture.transport_url to
"fake:/" to use the fake in-memory driver.
That's the existing practice (I believe it's used like that outside of o.m. as well).
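
(A minimal sketch of that existing practice, assuming an oslotest-style base
class; the test class name is illustrative only:)

    from oslo_config import cfg
    from oslo_messaging import conffixture
    from oslotest import base

    class NotificationTestCase(base.BaseTestCase):
        def setUp(self):
            super(NotificationTestCase, self).setUp()
            # Point the fixture at the in-memory fake driver so the tests
            # never need a running rabbit.
            self.messaging_conf = self.useFixture(
                conffixture.ConfFixture(cfg.CONF))
            self.messaging_conf.transport_url = 'fake:/'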


>
> >
> http://logs.openstack.org/21/607521/2/check/cross-glance-py35/e2161d7/testr_results.html.gz
>
> These failures should be fixed by releasing the patch that Mehdi
> provided that ensures there is a valid default transport configured.
>
> >
> http://logs.openstack.org/21/607521/2/check/cross-keystone-py35/908a1c2/testr_results.html.gz
>
> Lance has already described these as mocking implementation details of
> the library. I expect we'll need someone with keystone experience to
> work out what the best solution is to do there.
>
> >
> > --
> > Matthew Thode (prometheanfire)
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Ken Giusti  (kgiu...@gmail.com)
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Fox, Kevin M
etcd is an already approved OpenStack dependency. Could that be used instead of 
Consul, so as to not add yet another storage system? CoreDNS with the 
https://coredns.io/plugins/etcd/ plugin might do what you need.

Thanks,
Kevin

From: Florian Engelmann [florian.engelm...@everyware.ch]
Sent: Monday, October 08, 2018 3:14 AM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio 
and FQDN endpoints

Hi,

I would like to start a discussion about some changes and additions I
would like to see in kolla and kolla-ansible.

1. Keepalived is a problem in layer-3 spine-leaf networks, as any floating
IP can only exist in one leaf (and VRRP is a problem in layer 3). I would
like to use Consul and registrar to get rid of the "internal" floating
IP and use Consul's DNS service discovery to connect all services with
each other.

2. Using "ports" for external API (endpoint) access is a major headache
if a firewall is involved. I would like to configure the HAProxy (or
fabio) for the external access to use "Host:" like, eg. "Host:
keystone.somedomain.tld", "Host: nova.somedomain.tld", ... with HTTPS.
Any customer would just need HTTPS access and not have to open all those
ports in his firewall. For some enterprise customers it is not possible
to request FW changes like that.

3. HAProxy is not capable of handling a "read/write" split with Galera. I
would like to introduce ProxySQL to be able to scale Galera.

4. HAProxy is fine, but fabio integrates well with Consul and statsd, and
could be connected to a Vault cluster to manage secure certificate access.

5. I would like to add Vault as a Barbican backend.

6. I would like to add an option to enable tokenless authentication
between all services to get rid of all the OpenStack service
passwords (a security issue).

What do you think about it?

All the best,
Florian

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Fox, Kevin M
There are specific cases where it expects the client to retry, and not all code 
tests for that case. It's safe to funnel all traffic to one server; it can be 
unsafe to do otherwise.

Thanks,
Kevin

From: Jay Pipes [jaypi...@gmail.com]
Sent: Monday, October 08, 2018 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, 
fabio and FQDN endpoints

On 10/08/2018 06:14 AM, Florian Engelmann wrote:
> 3. HAProxy is not capable to handle "read/write" split with Galera. I
> would like to introduce ProxySQL to be able to scale Galera.

Why not send all read and all write traffic to a single haproxy endpoint
and just have haproxy spread all traffic across each Galera node?

Galera, after all, is multi-master synchronous replication... so it
shouldn't matter which node in the Galera cluster you send traffic to.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-09 Thread Miguel Lavalle
Hi Stable Team,

Since it has been more than a week since this nomination was posted and we
have received only positive feedback, can we move ahead and add Bernard
Cafarelli to Neutron Stable core team?

Thanks and regards

Miguel

On Wed, Oct 3, 2018 at 8:32 AM Nate Johnston 
wrote:

> On Tue, Oct 02, 2018 at 10:41:58AM -0500, Miguel Lavalle wrote:
>
> > I want to nominate Bernard Cafarelli as a stable core reviewer for
> Neutron
> > and related projects. Bernard has been increasing the number of stable
> > reviews he is doing for the project [1]. Besides that, he is a stable
> > maintainer downstream for his employer (Red Hat), so he can bring that
> > valuable experience to the Neutron stable team.
>
> I'm not on the stable team, but an enthusiastic +1 from me!
>
> Nate
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Matthew Thode  writes:

> On 18-10-09 11:12:30, Doug Hellmann wrote:
>> Matthew Thode  writes:
>> 
>> > several projects have had problems with the new release, some have ways
>> > of working around it, and some do not.  I'm sending this just to raise
>> > the issue and allow a place to discuss solutions.
>> >
>> > Currently there is a review proposed to blacklist 9.0.0, but if this is
>> > going to still be an issue somehow in further releases we may need
>> > another solution.
>> >
>> > https://review.openstack.org/#/c/608835/
>> >
>> > -- 
>> > Matthew Thode (prometheanfire)
>> > __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
>> Do you have links to the failure logs or bug reports or something? If I
>> wanted to help I wouldn't even know where to start.
>> 
>
> http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz

These failures look like we should add a proper API to oslo.messaging to
set the notification and rpc backends for testing. The configuration
options are *not* part of the API of the library.

There is already an oslo_messaging.conffixture module with a fixture
class, but it looks like it defaults to rabbit. Maybe someone wants to
propose a patch to make that a parameter to the constructor?
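
(Roughly, the suggested change would look something like this sketch; the
parameter name and default are illustrative, not an actual oslo.messaging
patch:)

    import fixtures

    # Hypothetical sketch of a fixture that accepts the transport URL.
    class ConfFixture(fixtures.Fixture):
        def __init__(self, conf, transport_url=None):
            super(ConfFixture, self).__init__()
            self.conf = conf
            # Let tests choose the backend explicitly instead of relying on
            # whatever the configured default happens to be.
            self.transport_url = transport_url or 'fake:/'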

> http://logs.openstack.org/21/607521/2/check/cross-glance-py35/e2161d7/testr_results.html.gz

These failures should be fixed by releasing the patch that Mehdi
provided that ensures there is a valid default transport configured.

> http://logs.openstack.org/21/607521/2/check/cross-keystone-py35/908a1c2/testr_results.html.gz

Lance has already described these as mocking implementation details of
the library. I expect we'll need someone with keystone experience to
work out what the best solution is to do there.

>
> -- 
> Matthew Thode (prometheanfire)
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Lance Bragstad
On Tue, Oct 9, 2018 at 10:31 AM Ben Nemec  wrote:

>
>
> On 10/9/18 9:06 AM, Lance Bragstad wrote:
> > Keystone is failing because it's missing a fix from oslo.messaging [0].
> > That said, keystone is also relying on an internal implementation detail
> > in oslo.messaging by mocking it in tests [1]. The notification work has
> > been around in keystone for a *long* time, but it's apparent that we
> > should revisit these tests to make sure we aren't testing something that
> > is already tested by oslo.messaging if we're mocking internal
> > implementation details of a library.
>
> This is actually the same problem Cinder and Glance had, it's just being
> hidden because there is an exception handler in Keystone that buried the
> original exception message in log output. 9.0.1 will get Keystone
> working too.
>
> But mocking library internals is still naughty and you should stop that.
> :-P
>

Agreed. I have a note to investigate and see if I can rip those bits out or
rewrite them.


>
> >
> > Regardless, blacklisting version 9.0.0 will work for keystone, but we
> > can work around it another way by either rewriting the tests to not care
> > about oslo.messaging specifics, or removing them if they're obsolete.
> >
> > [0] https://review.openstack.org/#/c/608196/
> > [1]
> >
> https://git.openstack.org/cgit/openstack/keystone/tree/keystone/tests/unit/common/test_notifications.py#n1343
> >
> > On Mon, Oct 8, 2018 at 10:59 PM Matthew Thode  > > wrote:
> >
> > several projects have had problems with the new release, some have
> ways
> > of working around it, and some do not.  I'm sending this just to
> raise
> > the issue and allow a place to discuss solutions.
> >
> > Currently there is a review proposed to blacklist 9.0.0, but if this
> is
> > going to still be an issue somehow in further releases we may need
> > another solution.
> >
> > https://review.openstack.org/#/c/608835/
> >
> > --
> > Matthew Thode (prometheanfire)
> >
>  __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > <
> http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Ben Nemec  writes:

> On 10/9/18 10:19 AM, Doug Hellmann wrote:
>> Brian Rosmaita  writes:
>> 
>>> On 10/8/18 11:59 PM, Matthew Thode wrote:
 several projects have had problems with the new release, some have ways
 of working around it, and some do not.  I'm sending this just to raise
 the issue and allow a place to discuss solutions.

 Currently there is a review proposed to blacklist 9.0.0, but if this is
 going to still be an issue somehow in further releases we may need
 another solution.

 https://review.openstack.org/#/c/608835/
>>>
>>> As indicated in the commit message on the above patch, 9.0.0 contains a
>>> bug that's been fixed in oslo.messaging master, so I don't think there's
>>> any question that 9.0.0 has to be blacklisted.
>> 
>> I've proposed releasing oslo.messaging 9.0.1 in
>> https://review.openstack.org/609030
>
> I also included it in https://review.openstack.org/#/c/609031/ (which I 
> see you found).

Yeah, I abandoned my separate patch to do the same in favor of your
omnibus patch.

>> If we don't land the constraint update to allow 9.0.1 in, then there's
>> no rush to blacklist anything, is there?
>
> Probably not. We'll want to blacklist it before we allow 9.0.1, but I 
> suspect this is mostly a test problem since in production the transport 
> would have to be set explicitly.
>
>> 
>>> As far as the timing/content of 9.0.1, however, that may require further
>>> discussion.
>>>
>>> (In other words, I'm saying that when you say 'another solution', my
>>> position is that we should take 'another' to mean 'additional', not
>>> 'different'.)
>> 
>> I'm not sure what that means.
>> 
>> Doug
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Balázs Gibizer


On Tue, Oct 9, 2018 at 5:32 PM, Sylvain Bauza  
wrote:
> 
> 
> Le mar. 9 oct. 2018 à 17:09, Balázs Gibizer 
>  a écrit :
>> 
>> 
>> On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza 
>> 
>> wrote:
>> >
>> >
>> > Le mar. 9 oct. 2018 à 16:39, Eric Fried  a
>> > écrit :
>> >> IIUC, the primary thing the force flag was intended to do - allow 
>> an
>> >> instance to land on the requested destination even if that means
>> >> oversubscription of the host's resources - doesn't happen anymore
>> >> since
>> >> we started making the destination claim in placement.
>> >>
>> >> IOW, since pike, you don't actually see a difference in behavior 
>> by
>> >> using the force flag or not. (If you do, it's more likely a bug 
>> than
>> >> what you were expecting.)
>> >>
>> >> So there's no reason to keep it around. We can remove it in a new
>> >> microversion (or not); but even in the current microversion we 
>> need
>> >> not
>> >> continue making convoluted attempts to observe it.
>> >>
>> >> What that means is that we should simplify everything down to 
>> ignore
>> >> the
>> >> force flag and always call GET /a_c. Problem solved - for nested
>> >> and/or
>> >> sharing, NUMA or not, root resources or no, on the source and/or
>> >> destination.
>> >>
>> >
>> >
>> > While I tend to agree with Eric here (and I commented on the review
>> > accordingly by saying we should signal the new behaviour by a
>> > microversion), I still think we need to properly advertise this,
>> > adding openstack-operators@ accordingly.
>> 
>> Question for you as well: if we remove (or change) the force flag in 
>> a
>> new microversion then how should the old microversions behave when
>> nested allocations would be required?
>> 
> 
> In that case (ie. old microversions with either "force=None and 
> target" or 'force=True', we should IMHO not allocate any migration.
> Thoughts ?

Do you mean that on old microversions we implement option #D)?

Cheers,
gibi


> 
>> Cheers,
>> gibi
>> 
>> > Disclaimer : since we have gaps on OSC, the current OSC behaviour
>> > when you "openstack server live-migrate " is to *force* the
>> > destination by not calling the scheduler. Yeah, it sucks.
>> >
>> > Operators, what are the exact cases (for those running clouds newer
>> > than Mitaka, ie. Newton and above) when you make use of the --force
>> > option for live migration with a microversion newer or equal 2.29 ?
>> > In general, even in the case of an emergency, you still want to 
>> make
>> > sure you don't throw your compute under the bus by massively
>> > migrating instances that would create an undetected snowball effect
>> > by having this compute refusing new instances. Or are you disabling
>> > the target compute service first and throw your pet instances up
>> > there ?
>> >
>> > -Sylvain
>> >
>> >
>> >
>> >> -efried
>> >>
>> >> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> >> > Hi,
>> >> >
>> >> > Setup
>> >> > -
>> >> >
>> >> > nested allocation: an allocation that contains resources from 
>> one
>> >> or
>> >> > more nested RPs. (if you have better term for this then please
>> >> suggest).
>> >> >
>> >> > If an instance has nested allocation it means that the compute, 
>> it
>> >> > allocates from, has a nested RP tree. BUT if a compute has a
>> >> nested RP
>> >> > tree it does not automatically means that the instance, 
>> allocating
>> >> from
>> >> > that compute, has a nested allocation (e.g. bandwidth inventory
>> >> will be
>> >> > on a nested RPs but not every instance will require bandwidth)
>> >> >
>> >> > Afaiu, as soon as we have NUMA modelling in place the most 
>> trivial
>> >> > servers will have nested allocations as CPU and MEMORY 
>> inverntory
>> >> will
>> >> > be moved to the nested NUMA RPs. But NUMA is still in the 
>> future.
>> >> >
>> >> > Sidenote: there is an edge case reported by bauzas when an 
>> instance
>> >> > allocates _only_ from nested RPs. This was discussed on last
>> >> Friday and
>> >> > it resulted in a new patch[0] but I would like to keep that
>> >> discussion
>> >> > separate from this if possible.
>> >> >
>> >> > Sidenote: the current problem somewhat related to not just 
>> nested
>> >> PRs
>> >> > but to sharing RPs as well. However I'm not aiming to implement
>> >> sharing
>> >> > support in Nova right now so I also try to keep the sharing
>> >> disscussion
>> >> > separated if possible.
>> >> >
>> >> > There was already some discussion on the Monday's scheduler
>> >> meeting but
>> >> > I could not attend.
>> >> >
>> >> 
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >> >
>> >> >
>> >> > The meat
>> >> > 
>> >> >
>> >> > Both live-migrate[1] and evacuate[2] has an optional force flag 
>> on
>> >> the
>> >> > nova REST API. The documentation says: "Force  by 
>> not
>> >> > verifying the provided destination host by the scheduler."
>> >> >
>> >> > Nova implements this statement by not calling the scheduler if
>> >> > force=True BUT 

Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Ben Nemec



On 10/9/18 10:19 AM, Doug Hellmann wrote:

Brian Rosmaita  writes:


On 10/8/18 11:59 PM, Matthew Thode wrote:

several projects have had problems with the new release, some have ways
of working around it, and some do not.  I'm sending this just to raise
the issue and allow a place to discuss solutions.

Currently there is a review proposed to blacklist 9.0.0, but if this is
going to still be an issue somehow in further releases we may need
another solution.

https://review.openstack.org/#/c/608835/


As indicated in the commit message on the above patch, 9.0.0 contains a
bug that's been fixed in oslo.messaging master, so I don't think there's
any question that 9.0.0 has to be blacklisted.


I've proposed releasing oslo.messaging 9.0.1 in
https://review.openstack.org/609030


I also included it in https://review.openstack.org/#/c/609031/ (which I 
see you found).




If we don't land the constraint update to allow 9.0.1 in, then there's
no rush to blacklist anything, is there?


Probably not. We'll want to blacklist it before we allow 9.0.1, but I 
suspect this is mostly a test problem since in production the transport 
would have to be set explicitly.





As far as the timing/content of 9.0.1, however, that may require further
discussion.

(In other words, I'm saying that when you say 'another solution', my
position is that we should take 'another' to mean 'additional', not
'different'.)


I'm not sure what that means.

Doug

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
> Shit, I forgot to add openstack-operators@...
> Operators, see my question for you here :
>
>
>> Le mar. 9 oct. 2018 à 16:39, Eric Fried  a écrit :
>>
>>> IIUC, the primary thing the force flag was intended to do - allow an
>>> instance to land on the requested destination even if that means
>>> oversubscription of the host's resources - doesn't happen anymore since
>>> we started making the destination claim in placement.
>>>
>>> IOW, since pike, you don't actually see a difference in behavior by
>>> using the force flag or not. (If you do, it's more likely a bug than
>>> what you were expecting.)
>>>
>>> So there's no reason to keep it around. We can remove it in a new
>>> microversion (or not); but even in the current microversion we need not
>>> continue making convoluted attempts to observe it.
>>>
>>> What that means is that we should simplify everything down to ignore the
>>> force flag and always call GET /a_c. Problem solved - for nested and/or
>>> sharing, NUMA or not, root resources or no, on the source and/or
>>> destination.
>>>
>>>
>>
>> While I tend to agree with Eric here (and I commented on the review
>> accordingly by saying we should signal the new behaviour by a
>> microversion), I still think we need to properly advertise this, adding
>> openstack-operators@ accordingly.
>> Disclaimer : since we have gaps on OSC, the current OSC behaviour when
>> you "openstack server live-migrate " is to *force* the destination
>> by not calling the scheduler. Yeah, it sucks.
>>
>> Operators, what are the exact cases (for those running clouds newer than
>> Mitaka, ie. Newton and above) when you make use of the --force option for
>> live migration with a microversion newer or equal 2.29 ?
>> In general, even in the case of an emergency, you still want to make sure
>> you don't throw your compute under the bus by massively migrating instances
>> that would create an undetected snowball effect by having this compute
>> refusing new instances. Or are you disabling the target compute service
>> first and throw your pet instances up there ?
>>
>> -Sylvain
>>
>>
>>
>> -efried
>>>
>>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>>> > Hi,
>>> >
>>> > Setup
>>> > -
>>> >
>>> > nested allocation: an allocation that contains resources from one or
>>> > more nested RPs. (if you have better term for this then please
>>> suggest).
>>> >
>>> > If an instance has nested allocation it means that the compute, it
>>> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
>>> > tree it does not automatically means that the instance, allocating
>>> from
>>> > that compute, has a nested allocation (e.g. bandwidth inventory will
>>> be
>>> > on a nested RPs but not every instance will require bandwidth)
>>> >
>>> > Afaiu, as soon as we have NUMA modelling in place the most trivial
>>> > servers will have nested allocations as CPU and MEMORY inverntory will
>>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>>> >
>>> > Sidenote: there is an edge case reported by bauzas when an instance
>>> > allocates _only_ from nested RPs. This was discussed on last Friday
>>> and
>>> > it resulted in a new patch[0] but I would like to keep that discussion
>>> > separate from this if possible.
>>> >
>>> > Sidenote: the current problem somewhat related to not just nested PRs
>>> > but to sharing RPs as well. However I'm not aiming to implement
>>> sharing
>>> > support in Nova right now so I also try to keep the sharing
>>> disscussion
>>> > separated if possible.
>>> >
>>> > There was already some discussion on the Monday's scheduler meeting
>>> but
>>> > I could not attend.
>>> >
>>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>> >
>>> >
>>> > The meat
>>> > 
>>> >
>>> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
>>> > nova REST API. The documentation says: "Force  by not
>>> > verifying the provided destination host by the scheduler."
>>> >
>>> > Nova implements this statement by not calling the scheduler if
>>> > force=True BUT still try to manage allocations in placement.
>>> >
>>> > To have allocation on the destination host Nova blindly copies the
>>> > instance allocation from the source host to the destination host
>>> during
>>> > these operations. Nova can do that as 1) the whole allocation is
>>> > against a single RP (the compute RP) and 2) Nova knows both the source
>>> > compute RP and the destination compute RP.
>>> >
>>> > However as soon as we bring nested allocations into the picture that
>>> > blind copy will not be feasible. Possible cases
>>> > 0) The instance has non-nested allocation on the source and would need
>>> > non nested allocation on the destination. This works with blindy copy
>>> > today.
>>> > 1) The instance has a nested allocation on the source and would need a
>>> > nested allocation on the destination as well.
>>> > 2) The instance has a non-nested 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
Shit, I forgot to add openstack-operators@...
Operators, see my question for you here :


> Le mar. 9 oct. 2018 à 16:39, Eric Fried  a écrit :
>
>> IIUC, the primary thing the force flag was intended to do - allow an
>> instance to land on the requested destination even if that means
>> oversubscription of the host's resources - doesn't happen anymore since
>> we started making the destination claim in placement.
>>
>> IOW, since pike, you don't actually see a difference in behavior by
>> using the force flag or not. (If you do, it's more likely a bug than
>> what you were expecting.)
>>
>> So there's no reason to keep it around. We can remove it in a new
>> microversion (or not); but even in the current microversion we need not
>> continue making convoluted attempts to observe it.
>>
>> What that means is that we should simplify everything down to ignore the
>> force flag and always call GET /a_c. Problem solved - for nested and/or
>> sharing, NUMA or not, root resources or no, on the source and/or
>> destination.
>>
>>
>
> While I tend to agree with Eric here (and I commented on the review
> accordingly by saying we should signal the new behaviour by a
> microversion), I still think we need to properly advertise this, adding
> openstack-operators@ accordingly.
> Disclaimer : since we have gaps on OSC, the current OSC behaviour when you
> "openstack server live-migrate " is to *force* the destination by
> not calling the scheduler. Yeah, it sucks.
>
> Operators, what are the exact cases (for those running clouds newer than
> Mitaka, ie. Newton and above) when you make use of the --force option for
> live migration with a microversion newer or equal 2.29 ?
> In general, even in the case of an emergency, you still want to make sure
> you don't throw your compute under the bus by massively migrating instances
> that would create an undetected snowball effect by having this compute
> refusing new instances. Or are you disabling the target compute service
> first and throw your pet instances up there ?
>
> -Sylvain
>
>
>
> -efried
>>
>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> > Hi,
>> >
>> > Setup
>> > -
>> >
>> > nested allocation: an allocation that contains resources from one or
>> > more nested RPs. (if you have better term for this then please suggest).
>> >
>> > If an instance has nested allocation it means that the compute, it
>> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
>> > tree it does not automatically means that the instance, allocating from
>> > that compute, has a nested allocation (e.g. bandwidth inventory will be
>> > on a nested RPs but not every instance will require bandwidth)
>> >
>> > Afaiu, as soon as we have NUMA modelling in place the most trivial
>> > servers will have nested allocations as CPU and MEMORY inverntory will
>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>> >
>> > Sidenote: there is an edge case reported by bauzas when an instance
>> > allocates _only_ from nested RPs. This was discussed on last Friday and
>> > it resulted in a new patch[0] but I would like to keep that discussion
>> > separate from this if possible.
>> >
>> > Sidenote: the current problem somewhat related to not just nested PRs
>> > but to sharing RPs as well. However I'm not aiming to implement sharing
>> > support in Nova right now so I also try to keep the sharing disscussion
>> > separated if possible.
>> >
>> > There was already some discussion on the Monday's scheduler meeting but
>> > I could not attend.
>> >
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >
>> >
>> > The meat
>> > 
>> >
>> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
>> > nova REST API. The documentation says: "Force  by not
>> > verifying the provided destination host by the scheduler."
>> >
>> > Nova implements this statement by not calling the scheduler if
>> > force=True BUT still try to manage allocations in placement.
>> >
>> > To have allocation on the destination host Nova blindly copies the
>> > instance allocation from the source host to the destination host during
>> > these operations. Nova can do that as 1) the whole allocation is
>> > against a single RP (the compute RP) and 2) Nova knows both the source
>> > compute RP and the destination compute RP.
>> >
>> > However as soon as we bring nested allocations into the picture that
>> > blind copy will not be feasible. Possible cases
>> > 0) The instance has non-nested allocation on the source and would need
>> > non nested allocation on the destination. This works with blindy copy
>> > today.
>> > 1) The instance has a nested allocation on the source and would need a
>> > nested allocation on the destination as well.
>> > 2) The instance has a non-nested allocation on the source and would
>> > need a nested allocation on the destination.
>> > 3) The instance has a nested allocation on the source and 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
On Tue, Oct 9, 2018 at 17:09, Balázs Gibizer  wrote:

>
>
> On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza 
> wrote:
> >
> >
> > Le mar. 9 oct. 2018 à 16:39, Eric Fried  a
> > écrit :
> >> IIUC, the primary thing the force flag was intended to do - allow an
> >> instance to land on the requested destination even if that means
> >> oversubscription of the host's resources - doesn't happen anymore
> >> since
> >> we started making the destination claim in placement.
> >>
> >> IOW, since pike, you don't actually see a difference in behavior by
> >> using the force flag or not. (If you do, it's more likely a bug than
> >> what you were expecting.)
> >>
> >> So there's no reason to keep it around. We can remove it in a new
> >> microversion (or not); but even in the current microversion we need
> >> not
> >> continue making convoluted attempts to observe it.
> >>
> >> What that means is that we should simplify everything down to ignore
> >> the
> >> force flag and always call GET /a_c. Problem solved - for nested
> >> and/or
> >> sharing, NUMA or not, root resources or no, on the source and/or
> >> destination.
> >>
> >
> >
> > While I tend to agree with Eric here (and I commented on the review
> > accordingly by saying we should signal the new behaviour by a
> > microversion), I still think we need to properly advertise this,
> > adding openstack-operators@ accordingly.
>
> Question for you as well: if we remove (or change) the force flag in a
> new microversion then how should the old microversions behave when
> nested allocations would be required?
>
>
In that case (i.e. old microversions with either "force=None and a target" or
'force=True'), we should IMHO not create any allocations for the migration.
Thoughts?


> Cheers,
> gibi
>
> > Disclaimer : since we have gaps on OSC, the current OSC behaviour
> > when you "openstack server live-migrate " is to *force* the
> > destination by not calling the scheduler. Yeah, it sucks.
> >
> > Operators, what are the exact cases (for those running clouds newer
> > than Mitaka, ie. Newton and above) when you make use of the --force
> > option for live migration with a microversion newer or equal 2.29 ?
> > In general, even in the case of an emergency, you still want to make
> > sure you don't throw your compute under the bus by massively
> > migrating instances that would create an undetected snowball effect
> > by having this compute refusing new instances. Or are you disabling
> > the target compute service first and throw your pet instances up
> > there ?
> >
> > -Sylvain
> >
> >
> >
> >> -efried
> >>
> >> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> >> > Hi,
> >> >
> >> > Setup
> >> > -
> >> >
> >> > nested allocation: an allocation that contains resources from one
> >> or
> >> > more nested RPs. (if you have better term for this then please
> >> suggest).
> >> >
> >> > If an instance has nested allocation it means that the compute, it
> >> > allocates from, has a nested RP tree. BUT if a compute has a
> >> nested RP
> >> > tree it does not automatically means that the instance, allocating
> >> from
> >> > that compute, has a nested allocation (e.g. bandwidth inventory
> >> will be
> >> > on a nested RPs but not every instance will require bandwidth)
> >> >
> >> > Afaiu, as soon as we have NUMA modelling in place the most trivial
> >> > servers will have nested allocations as CPU and MEMORY inverntory
> >> will
> >> > be moved to the nested NUMA RPs. But NUMA is still in the future.
> >> >
> >> > Sidenote: there is an edge case reported by bauzas when an instance
> >> > allocates _only_ from nested RPs. This was discussed on last
> >> Friday and
> >> > it resulted in a new patch[0] but I would like to keep that
> >> discussion
> >> > separate from this if possible.
> >> >
> >> > Sidenote: the current problem somewhat related to not just nested
> >> PRs
> >> > but to sharing RPs as well. However I'm not aiming to implement
> >> sharing
> >> > support in Nova right now so I also try to keep the sharing
> >> disscussion
> >> > separated if possible.
> >> >
> >> > There was already some discussion on the Monday's scheduler
> >> meeting but
> >> > I could not attend.
> >> >
> >>
> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> >> >
> >> >
> >> > The meat
> >> > 
> >> >
> >> > Both live-migrate[1] and evacuate[2] has an optional force flag on
> >> the
> >> > nova REST API. The documentation says: "Force  by not
> >> > verifying the provided destination host by the scheduler."
> >> >
> >> > Nova implements this statement by not calling the scheduler if
> >> > force=True BUT still try to manage allocations in placement.
> >> >
> >> > To have allocation on the destination host Nova blindly copies the
> >> > instance allocation from the source host to the destination host
> >> during
> >> > these operations. Nova can do that as 1) the whole allocation is
> >> > against a single RP (the compute RP) and 2) Nova knows 

Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Florian Engelmann

On 10/9/18 at 1:23 PM, Jay Pipes wrote:

On 10/09/2018 06:34 AM, Florian Engelmann wrote:

On 10/9/18 at 11:41 AM, Jay Pipes wrote:

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes  wrote:

Why not send all read and all write traffic to a single haproxy 
endpoint and just have haproxy spread all traffic across each 
Galera node?


Galera, after all, is multi-master synchronous replication... so it 
shouldn't matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses 
cluster-wide optimistic locking. This may cause some transactions to 
rollback. With an increasing number of writeable masters, the 
transaction rollback rate may increase, especially if there is write 
contention on the same dataset. It is of course possible to retry 
the transaction and perhaps it will COMMIT in the retries, but this 
will add to the transaction latency. However, some designs are 
deadlock prone, e.g sequence tables.

—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial 



Have you seen the above in production?


Yes of course. Just depends on the application and how high the 
workload gets.


Please read about deadlocks and nova in the following report by Intel:

http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf 



I have read the above. It's a synthetic workload analysis, which is why 
I asked if you'd seen this in production.


For the record, we addressed much of the contention/races mentioned in 
the above around scheduler resource consumption in the Ocata and Pike 
releases of Nova.


I'm aware that the report above identifies the quota handling code in 
Nova as the primary culprit of the deadlock issues but again, it's a 
synthetic workload that is designed to find breaking points. It doesn't 
represent a realistic production workload.


You can read about the deadlock issue in depth on my blog here:

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/ 



That explains where the source of the problem comes from (it's the use 
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling 
code in the Rocky release).
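
For readers who have not seen the pattern, compare-and-swap replaces the 
locking read with a guarded UPDATE plus a retry; a minimal sketch 
(illustrative SQL, not Nova's actual quota code):

  -- old, Galera-unfriendly pattern:
  BEGIN;
  SELECT used FROM quota_usages WHERE project_id = 'p1' FOR UPDATE;
  UPDATE quota_usages SET used = used + 1 WHERE project_id = 'p1';
  COMMIT;

  -- compare-and-swap: read without locking, then update only if the row is
  -- unchanged; zero matched rows means another writer won the race, so the
  -- caller re-reads and retries instead of holding a lock.
  SELECT used FROM quota_usages WHERE project_id = 'p1';        -- e.g. 5
  UPDATE quota_usages SET used = 6
   WHERE project_id = 'p1' AND used = 5;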


Thank you very much for your link. Great article!!! Took me a while to 
read it and understand everything, but it helped a lot!!!




If just Nova is affected we could also create an additional HAProxy 
listener using all Galera nodes with round-robin for all other services?


I fail to see the point of using Galera with a single writer. At that 
point, why bother with Galera at all? Just use a single database node 
with a single slave for backup purposes.


From my point of view Galera is easy to manage and great for HA. Having 
to handle a manual failover in production with mysql master/slave never 
was fun...
Indeed writing to a single node and not using the other nodes (even for 
read, like it is done in kolla-ansible) is not the best solution. Galera 
is slower than a standalone MySQL.
Using ProxySQL would enable us to use caching and read/write split to 
speed up database queries while HA and management are still good.
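
To make that concrete, the single-writer pattern plus a separate round-robin 
listener for reads could look roughly like this in HAProxy (an illustrative 
sketch, addresses and names are made up):

  listen mariadb_write
      bind 10.0.0.10:3306
      option mysql-check user haproxy
      server galera1 10.0.0.11:3306 check
      server galera2 10.0.0.12:3306 check backup
      server galera3 10.0.0.13:3306 check backup

  listen mariadb_read
      bind 10.0.0.10:3307
      balance roundrobin
      option mysql-check user haproxy
      server galera1 10.0.0.11:3306 check
      server galera2 10.0.0.12:3306 check
      server galera3 10.0.0.13:3306 check

ProxySQL removes the need for the second listener by routing SELECTs and 
writes to different hostgroups via query rules, which is where the caching 
and read/write split mentioned above come from.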






Anyway - proxySQL would be a great extension.


I don't disagree that proxySQL is a good extension. However, it adds yet 
another services to the mesh that needs to be deployed, configured and 
maintained.


True. I guess we will start with an external MySQL installation to 
collect some experience.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Ben Nemec



On 10/9/18 9:06 AM, Lance Bragstad wrote:
Keystone is failing because it's missing a fix from oslo.messaging [0]. 
That said, keystone is also relying on an internal implementation detail 
in oslo.messaging by mocking it in tests [1]. The notification work has 
been around in keystone for a *long* time, but it's apparent that we 
should revisit these tests to make sure we aren't testing something that 
is already tested by oslo.messaging if we're mocking internal 
implementation details of a library.


This is actually the same problem Cinder and Glance had, it's just being 
hidden because there is an exception handler in Keystone that buried the 
original exception message in log output. 9.0.1 will get Keystone 
working too.


But mocking library internals is still naughty and you should stop that. :-P
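
As an illustration only (not keystone's actual tests), the difference is 
between patching whatever private helper the library uses internally versus 
stubbing the public entry point the application calls:

  from unittest import mock

  import oslo_messaging

  # do_the_thing() is a placeholder for whatever code under test emits the
  # notification. Stubbing the public Notifier API means library-internal
  # refactors cannot break the test.
  with mock.patch.object(oslo_messaging.Notifier, 'info') as fake_info:
      do_the_thing()
      fake_info.assert_called_once()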



Regardless, blacklisting version 9.0.0 will work for keystone, but we 
can work around it another way by either rewriting the tests to not care 
about oslo.messaging specifics, or removing them if they're obsolete.


[0] https://review.openstack.org/#/c/608196/
[1] 
https://git.openstack.org/cgit/openstack/keystone/tree/keystone/tests/unit/common/test_notifications.py#n1343


On Mon, Oct 8, 2018 at 10:59 PM Matthew Thode > wrote:


several projects have had problems with the new release, some have ways
of working around it, and some do not.  I'm sending this just to raise
the issue and allow a place to discuss solutions.

Currently there is a review proposed to blacklist 9.0.0, but if this is
going to still be an issue somehow in further releases we may need
another solution.

https://review.openstack.org/#/c/608835/

-- 
Matthew Thode (prometheanfire)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Matthew Thode
On 18-10-09 11:12:30, Doug Hellmann wrote:
> Matthew Thode  writes:
> 
> > several projects have had problems with the new release, some have ways
> > of working around it, and some do not.  I'm sending this just to raise
> > the issue and allow a place to discuss solutions.
> >
> > Currently there is a review proposed to blacklist 9.0.0, but if this is
> > going to still be an issue somehow in further releases we may need
> > another solution.
> >
> > https://review.openstack.org/#/c/608835/
> >
> > -- 
> > Matthew Thode (prometheanfire)
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> Do you have links to the failure logs or bug reports or something? If I
> wanted to help I wouldn't even know where to start.
> 

http://logs.openstack.org/21/607521/2/check/cross-cinder-py35/e15722e/testr_results.html.gz
http://logs.openstack.org/21/607521/2/check/cross-glance-py35/e2161d7/testr_results.html.gz
http://logs.openstack.org/21/607521/2/check/cross-keystone-py35/908a1c2/testr_results.html.gz

-- 
Matthew Thode (prometheanfire)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Lance Bragstad  writes:

> Keystone is failing because it's missing a fix from oslo.messaging [0].
> That said, keystone is also relying on an internal implementation detail in
> oslo.messaging by mocking it in tests [1]. The notification work has been
> around in keystone for a *long* time, but it's apparent that we should
> revisit these tests to make sure we aren't testing something that is
> already tested by oslo.messaging if we're mocking internal implementation
> details of a library.
>
> Regardless, blacklisting version 9.0.0 will work for keystone, but we can
> work around it another way by either rewriting the tests to not care about
> oslo.messaging specifics, or removing them if they're obsolete.

Yeah, we keep running into these sorts of problems when folks mock past
the public API boundary of a library, so let's eliminate them as we find
them.

If there's a way to add a fixture to oslo.messaging to support the tests
we can do that, too.

Doug

>
> [0] https://review.openstack.org/#/c/608196/
> [1]
> https://git.openstack.org/cgit/openstack/keystone/tree/keystone/tests/unit/common/test_notifications.py#n1343
>
> On Mon, Oct 8, 2018 at 10:59 PM Matthew Thode 
> wrote:
>
>> several projects have had problems with the new release, some have ways
>> of working around it, and some do not.  I'm sending this just to raise
>> the issue and allow a place to discuss solutions.
>>
>> Currently there is a review proposed to blacklist 9.0.0, but if this is
>> going to still be an issue somehow in further releases we may need
>> another solution.
>>
>> https://review.openstack.org/#/c/608835/
>>
>> --
>> Matthew Thode (prometheanfire)
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Brian Rosmaita  writes:

> On 10/8/18 11:59 PM, Matthew Thode wrote:
>> several projects have had problems with the new release, some have ways
>> of working around it, and some do not.  I'm sending this just to raise
>> the issue and allow a place to discuss solutions.
>> 
>> Currently there is a review proposed to blacklist 9.0.0, but if this is
>> going to still be an issue somehow in further releases we may need
>> another solution.
>> 
>> https://review.openstack.org/#/c/608835/
>
> As indicated in the commit message on the above patch, 9.0.0 contains a
> bug that's been fixed in oslo.messaging master, so I don't think there's
> any question that 9.0.0 has to be blacklisted.

I've proposed releasing oslo.messaging 9.0.1 in
https://review.openstack.org/609030

If we don't land the constraint update to allow 9.0.1 in, then there's
no rush to blacklist anything, is there?

> As far as the timing/content of 9.0.1, however, that may require further
> discussion.
>
> (In other words, I'm saying that when you say 'another solution', my
> position is that we should take 'another' to mean 'additional', not
> 'different'.)

I'm not sure what that means.

Doug

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Doug Hellmann
Matthew Thode  writes:

> several projects have had problems with the new release, some have ways
> of working around it, and some do not.  I'm sending this just to raise
> the issue and allow a place to discuss solutions.
>
> Currently there is a review proposed to blacklist 9.0.0, but if this is
> going to still be an issue somehow in further releases we may need
> another solution.
>
> https://review.openstack.org/#/c/608835/
>
> -- 
> Matthew Thode (prometheanfire)
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Do you have links to the failure logs or bug reports or something? If I
wanted to help I wouldn't even know where to start.

Doug

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Ben Nemec



On 10/9/18 8:22 AM, Brian Rosmaita wrote:

On 10/8/18 11:59 PM, Matthew Thode wrote:

several projects have had problems with the new release, some have ways
of working around it, and some do not.  I'm sending this just to raise
the issue and allow a place to discuss solutions.

Currently there is a review proposed to blacklist 9.0.0, but if this is
going to still be an issue somehow in further releases we may need
another solution.

https://review.openstack.org/#/c/608835/


As indicated in the commit message on the above patch, 9.0.0 contains a
bug that's been fixed in oslo.messaging master, so I don't think there's
any question that 9.0.0 has to be blacklisted.


Agreed.



As far as the timing/content of 9.0.1, however, that may require further
discussion.


I'll get the release request for 9.0.1 up today. That should fix 
everyone but Keystone. I'm not sure yet what is going to be needed to 
get that working. They are mocking a private function from 
oslo.messaging, but we didn't remove it so I'm not sure why those tests 
started failing.




(In other words, I'm saying that when you say 'another solution', my
position is that we should take 'another' to mean 'additional', not
'different'.)

cheers,
brian

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Balázs Gibizer


On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza  
wrote:
> 
> 
> Le mar. 9 oct. 2018 à 16:39, Eric Fried  a 
> écrit :
>> IIUC, the primary thing the force flag was intended to do - allow an
>> instance to land on the requested destination even if that means
>> oversubscription of the host's resources - doesn't happen anymore 
>> since
>> we started making the destination claim in placement.
>> 
>> IOW, since pike, you don't actually see a difference in behavior by
>> using the force flag or not. (If you do, it's more likely a bug than
>> what you were expecting.)
>> 
>> So there's no reason to keep it around. We can remove it in a new
>> microversion (or not); but even in the current microversion we need 
>> not
>> continue making convoluted attempts to observe it.
>> 
>> What that means is that we should simplify everything down to ignore 
>> the
>> force flag and always call GET /a_c. Problem solved - for nested 
>> and/or
>> sharing, NUMA or not, root resources or no, on the source and/or
>> destination.
>> 
> 
> 
> While I tend to agree with Eric here (and I commented on the review 
> accordingly by saying we should signal the new behaviour by a 
> microversion), I still think we need to properly advertise this, 
> adding openstack-operators@ accordingly.

Question for you as well: if we remove (or change) the force flag in a 
new microversion then how should the old microversions behave when 
nested allocations would be required?

Cheers,
gibi

> Disclaimer : since we have gaps on OSC, the current OSC behaviour 
> when you "openstack server live-migrate " is to *force* the 
> destination by not calling the scheduler. Yeah, it sucks.
> 
> Operators, what are the exact cases (for those running clouds newer 
> than Mitaka, ie. Newton and above) when you make use of the --force 
> option for live migration with a microversion newer or equal 2.29 ?
> In general, even in the case of an emergency, you still want to make 
> sure you don't throw your compute under the bus by massively 
> migrating instances that would create an undetected snowball effect 
> by having this compute refusing new instances. Or are you disabling 
> the target compute service first and throw your pet instances up 
> there ?
> 
> -Sylvain
> 
> 
> 
>> -efried
>> 
>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> > Hi,
>> >
>> > Setup
>> > -
>> >
>> > nested allocation: an allocation that contains resources from one 
>> or
>> > more nested RPs. (if you have better term for this then please 
>> suggest).
>> >
>> > If an instance has nested allocation it means that the compute, it
>> > allocates from, has a nested RP tree. BUT if a compute has a 
>> nested RP
>> > tree it does not automatically means that the instance, allocating 
>> from
>> > that compute, has a nested allocation (e.g. bandwidth inventory 
>> will be
>> > on a nested RPs but not every instance will require bandwidth)
>> >
>> > Afaiu, as soon as we have NUMA modelling in place the most trivial
>> > servers will have nested allocations as CPU and MEMORY inverntory 
>> will
>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>> >
>> > Sidenote: there is an edge case reported by bauzas when an instance
>> > allocates _only_ from nested RPs. This was discussed on last 
>> Friday and
>> > it resulted in a new patch[0] but I would like to keep that 
>> discussion
>> > separate from this if possible.
>> >
>> > Sidenote: the current problem somewhat related to not just nested 
>> PRs
>> > but to sharing RPs as well. However I'm not aiming to implement 
>> sharing
>> > support in Nova right now so I also try to keep the sharing 
>> disscussion
>> > separated if possible.
>> >
>> > There was already some discussion on the Monday's scheduler 
>> meeting but
>> > I could not attend.
>> > 
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >
>> >
>> > The meat
>> > 
>> >
>> > Both live-migrate[1] and evacuate[2] has an optional force flag on 
>> the
>> > nova REST API. The documentation says: "Force  by not
>> > verifying the provided destination host by the scheduler."
>> >
>> > Nova implements this statement by not calling the scheduler if
>> > force=True BUT still try to manage allocations in placement.
>> >
>> > To have allocation on the destination host Nova blindly copies the
>> > instance allocation from the source host to the destination host 
>> during
>> > these operations. Nova can do that as 1) the whole allocation is
>> > against a single RP (the compute RP) and 2) Nova knows both the 
>> source
>> > compute RP and the destination compute RP.
>> >
>> > However as soon as we bring nested allocations into the picture 
>> that
>> > blind copy will not be feasible. Possible cases
>> > 0) The instance has non-nested allocation on the source and would 
>> need
>> > non nested allocation on the destination. This works with blindy 
>> copy
>> > today.
>> > 1) The instance has a nested 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Balázs Gibizer


On Tue, Oct 9, 2018 at 4:39 PM, Eric Fried  wrote:
> IIUC, the primary thing the force flag was intended to do - allow an
> instance to land on the requested destination even if that means
> oversubscription of the host's resources - doesn't happen anymore 
> since
> we started making the destination claim in placement.

Can we simply still do that by not creating allocations in placement 
during the move? (see option #D)

> 
> IOW, since pike, you don't actually see a difference in behavior by
> using the force flag or not. (If you do, it's more likely a bug than
> what you were expecting.)

There is still a difference between force=True and force=False today. 
When you say force=False, nova calls placement's a_c and placement tries to 
satisfy the requested resources, required traits, and aggregate membership. 
When you say force=True, the nova conductor takes the resource allocation 
from the source host and copies it blindly to the destination, but it 
does not check any traits or aggregate membership. So force=True 
still ignores a lot of rules and safeties.
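
To make the blind copy problem concrete, here is roughly what the copied 
allocation looks like in the flat case versus the nested case (UUIDs, 
resource classes and amounts are illustrative):

  # flat allocation - everything is against the compute node RP, so it can
  # be copied to the destination by just swapping the compute RP UUID:
  {"allocations": {
      "<compute-node-rp-uuid>": {
          "resources": {"VCPU": 2, "MEMORY_MB": 4096}}}}

  # nested allocation - part of it is against a child RP (e.g. a NIC's
  # bandwidth provider); the destination has its own, different child RPs,
  # so a blind copy cannot know where that piece should land:
  {"allocations": {
      "<compute-node-rp-uuid>": {
          "resources": {"VCPU": 2, "MEMORY_MB": 4096}},
      "<nic-bandwidth-child-rp-uuid>": {
          "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}}}}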

> 
> So there's no reason to keep it around. We can remove it in a new
> microversion (or not); but even in the current microversion we need 
> not
> continue making convoluted attempts to observe it.

If we remove it in a new microversion (option #C) then we still need 
to define how the old microversions should behave when a nested allocation 
would be needed. I don't fully get what you mean by 'not continue 
making convoluted attempts to observe it.'

> 
> What that means is that we should simplify everything down to ignore 
> the
> force flag and always call GET /a_c. Problem solved - for nested 
> and/or
> sharing, NUMA or not, root resources or no, on the source and/or
> destination.

If you do the force flag removal in a new microversion, that also means 
(at least to me) that you should not change the behavior of the force 
flag in the old microversions.

Cheers,
gibi

> 
> -efried
> 
> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>>  Hi,
>> 
>>  Setup
>>  -
>> 
>>  nested allocation: an allocation that contains resources from one or
>>  more nested RPs. (if you have better term for this then please 
>> suggest).
>> 
>>  If an instance has nested allocation it means that the compute, it
>>  allocates from, has a nested RP tree. BUT if a compute has a nested 
>> RP
>>  tree it does not automatically means that the instance, allocating 
>> from
>>  that compute, has a nested allocation (e.g. bandwidth inventory 
>> will be
>>  on a nested RPs but not every instance will require bandwidth)
>> 
>>  Afaiu, as soon as we have NUMA modelling in place the most trivial
>>  servers will have nested allocations as CPU and MEMORY inverntory 
>> will
>>  be moved to the nested NUMA RPs. But NUMA is still in the future.
>> 
>>  Sidenote: there is an edge case reported by bauzas when an instance
>>  allocates _only_ from nested RPs. This was discussed on last Friday 
>> and
>>  it resulted in a new patch[0] but I would like to keep that 
>> discussion
>>  separate from this if possible.
>> 
>>  Sidenote: the current problem somewhat related to not just nested 
>> PRs
>>  but to sharing RPs as well. However I'm not aiming to implement 
>> sharing
>>  support in Nova right now so I also try to keep the sharing 
>> disscussion
>>  separated if possible.
>> 
>>  There was already some discussion on the Monday's scheduler meeting 
>> but
>>  I could not attend.
>>  
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> 
>> 
>>  The meat
>>  
>> 
>>  Both live-migrate[1] and evacuate[2] has an optional force flag on 
>> the
>>  nova REST API. The documentation says: "Force  by not
>>  verifying the provided destination host by the scheduler."
>> 
>>  Nova implements this statement by not calling the scheduler if
>>  force=True BUT still try to manage allocations in placement.
>> 
>>  To have allocation on the destination host Nova blindly copies the
>>  instance allocation from the source host to the destination host 
>> during
>>  these operations. Nova can do that as 1) the whole allocation is
>>  against a single RP (the compute RP) and 2) Nova knows both the 
>> source
>>  compute RP and the destination compute RP.
>> 
>>  However as soon as we bring nested allocations into the picture that
>>  blind copy will not be feasible. Possible cases
>>  0) The instance has non-nested allocation on the source and would 
>> need
>>  non nested allocation on the destination. This works with blindy 
>> copy
>>  today.
>>  1) The instance has a nested allocation on the source and would 
>> need a
>>  nested allocation on the destination as well.
>>  2) The instance has a non-nested allocation on the source and would
>>  need a nested allocation on the destination.
>>  3) The instance has a nested allocation on the source and would 
>> need a
>>  non nested allocation on the destination.
>> 
>>  Nova cannot generate 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
On Tue, Oct 9, 2018 at 16:39, Eric Fried  wrote:

> IIUC, the primary thing the force flag was intended to do - allow an
> instance to land on the requested destination even if that means
> oversubscription of the host's resources - doesn't happen anymore since
> we started making the destination claim in placement.
>
> IOW, since pike, you don't actually see a difference in behavior by
> using the force flag or not. (If you do, it's more likely a bug than
> what you were expecting.)
>
> So there's no reason to keep it around. We can remove it in a new
> microversion (or not); but even in the current microversion we need not
> continue making convoluted attempts to observe it.
>
> What that means is that we should simplify everything down to ignore the
> force flag and always call GET /a_c. Problem solved - for nested and/or
> sharing, NUMA or not, root resources or no, on the source and/or
> destination.
>
>

While I tend to agree with Eric here (and I commented on the review
accordingly by saying we should signal the new behaviour by a
microversion), I still think we need to properly advertise this, adding
openstack-operators@ accordingly.
Disclaimer : since we have gaps on OSC, the current OSC behaviour when you
"openstack server live-migrate " is to *force* the destination by
not calling the scheduler. Yeah, it sucks.

Operators, what are the exact cases (for those running clouds newer than
Mitaka, i.e. Newton and above) when you make use of the --force option for
live migration with a microversion newer than or equal to 2.29?
In general, even in the case of an emergency, you still want to make sure
you don't throw your compute under the bus by massively migrating instances
that would create an undetected snowball effect by having this compute
refusing new instances. Or are you disabling the target compute service
first and throwing your pet instances up there?
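
(For reference, with those microversions a forced live migration boils down 
to a single compute API call like the following; values are illustrative:)

  POST /servers/{server_id}/action
  {"os-migrateLive": {"host": "dest-compute-01",
                      "block_migration": "auto",
                      "force": true}}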

-Sylvain



-efried
>
> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> > Hi,
> >
> > Setup
> > -
> >
> > nested allocation: an allocation that contains resources from one or
> > more nested RPs. (if you have better term for this then please suggest).
> >
> > If an instance has nested allocation it means that the compute, it
> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
> > tree it does not automatically means that the instance, allocating from
> > that compute, has a nested allocation (e.g. bandwidth inventory will be
> > on a nested RPs but not every instance will require bandwidth)
> >
> > Afaiu, as soon as we have NUMA modelling in place the most trivial
> > servers will have nested allocations as CPU and MEMORY inverntory will
> > be moved to the nested NUMA RPs. But NUMA is still in the future.
> >
> > Sidenote: there is an edge case reported by bauzas when an instance
> > allocates _only_ from nested RPs. This was discussed on last Friday and
> > it resulted in a new patch[0] but I would like to keep that discussion
> > separate from this if possible.
> >
> > Sidenote: the current problem somewhat related to not just nested PRs
> > but to sharing RPs as well. However I'm not aiming to implement sharing
> > support in Nova right now so I also try to keep the sharing disscussion
> > separated if possible.
> >
> > There was already some discussion on the Monday's scheduler meeting but
> > I could not attend.
> >
> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> >
> >
> > The meat
> > 
> >
> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
> > nova REST API. The documentation says: "Force  by not
> > verifying the provided destination host by the scheduler."
> >
> > Nova implements this statement by not calling the scheduler if
> > force=True BUT still try to manage allocations in placement.
> >
> > To have allocation on the destination host Nova blindly copies the
> > instance allocation from the source host to the destination host during
> > these operations. Nova can do that as 1) the whole allocation is
> > against a single RP (the compute RP) and 2) Nova knows both the source
> > compute RP and the destination compute RP.
> >
> > However as soon as we bring nested allocations into the picture that
> > blind copy will not be feasible. Possible cases
> > 0) The instance has non-nested allocation on the source and would need
> > non nested allocation on the destination. This works with blindy copy
> > today.
> > 1) The instance has a nested allocation on the source and would need a
> > nested allocation on the destination as well.
> > 2) The instance has a non-nested allocation on the source and would
> > need a nested allocation on the destination.
> > 3) The instance has a nested allocation on the source and would need a
> > non nested allocation on the destination.
> >
> > Nova cannot generate nested allocations easily without reimplementing
> > some of the placement allocation candidate (a_c) code. However I don't
> > 

Re: [openstack-dev] [cinder] [nova] Do we need a "force" parameter in cinder "re-image" API?

2018-10-09 Thread Jay S Bryant



On 10/8/2018 8:54 AM, Sean McGinnis wrote:

On Mon, Oct 08, 2018 at 03:09:36PM +0800, Yikun Jiang wrote:

In Denver, we agreed to add a new "re-image" API in cinder to support
volume-backed server rebuild with a new image.

An initial blueprint has been drafted in [3], welcome to review it, thanks.
: )

[snip]

The "force" parameter idea comes from [4], means that
1. we can re-image an "available" volume directly.
2. we can't re-image "in-use"/"reserved" volume directly.
3. we can only re-image an "in-use"/"reserved" volume with "force"
parameter.

And it means nova need to always call re-image API with an extra "force"
parameter,
because the volume status is "in-use" or "reserve" when we rebuild the
server.

*So, what's your idea? Do we really want to add this "force" parameter?*


I would prefer we have the "force" parameter, even if it is something that will
always be defaulted to True from Nova.

Having this exposed as a REST API means anyone could call it, not just Nova
code. So as protection from someone doing something that they are not really
clear on the full implications of, having a flag in there to guard volumes that
are already attached or reserved for shelved instances is worth the very minor
extra overhead.
I concur with Sean's assessment.  I think putting a safety switch in 
place in this design is important to ensure that people using the API 
directly are less likely to do something that they may not actually want 
to do.


Jay

[1] https://etherpad.openstack.org/p/nova-ptg-stein L483
[2] https://etherpad.openstack.org/p/cinder-ptg-stein-thursday-rebuild L12
[3] https://review.openstack.org/#/c/605317
[4]
https://review.openstack.org/#/c/605317/1/specs/stein/add-volume-re-image-api.rst@75

Regards,
Yikun

Jiang Yikun(Kero)
Mail: yikunk...@gmail.com
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Eric Fried
IIUC, the primary thing the force flag was intended to do - allow an
instance to land on the requested destination even if that means
oversubscription of the host's resources - doesn't happen anymore since
we started making the destination claim in placement.

IOW, since pike, you don't actually see a difference in behavior by
using the force flag or not. (If you do, it's more likely a bug than
what you were expecting.)

So there's no reason to keep it around. We can remove it in a new
microversion (or not); but even in the current microversion we need not
continue making convoluted attempts to observe it.

What that means is that we should simplify everything down to ignore the
force flag and always call GET /a_c. Problem solved - for nested and/or
sharing, NUMA or not, root resources or no, on the source and/or
destination.
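
(Concretely, "always call GET /a_c" means issuing a request like the 
following, with illustrative amounts/trait, and then claiming one of the 
returned candidates on the destination:)

  GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:20
                            &required=HW_CPU_X86_AVX2
                            &member_of=<aggregate-uuid>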

-efried

On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> Hi,
> 
> Setup
> -
> 
> nested allocation: an allocation that contains resources from one or 
> more nested RPs. (if you have better term for this then please suggest).
> 
> If an instance has nested allocation it means that the compute, it 
> allocates from, has a nested RP tree. BUT if a compute has a nested RP 
> tree it does not automatically mean that the instance, allocating from 
> that compute, has a nested allocation (e.g. bandwidth inventory will be 
> on a nested RPs but not every instance will require bandwidth)
> 
> Afaiu, as soon as we have NUMA modelling in place the most trivial 
> servers will have nested allocations as CPU and MEMORY inventory will 
> be moved to the nested NUMA RPs. But NUMA is still in the future.
> 
> Sidenote: there is an edge case reported by bauzas when an instance 
> allocates _only_ from nested RPs. This was discussed on last Friday and 
> it resulted in a new patch[0] but I would like to keep that discussion 
> separate from this if possible.
> 
> Sidenote: the current problem is somewhat related not just to nested RPs 
> but to sharing RPs as well. However I'm not aiming to implement sharing 
> support in Nova right now so I also try to keep the sharing discussion 
> separated if possible.
> 
> There was already some discussion on the Monday's scheduler meeting but 
> I could not attend.
> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> 
> 
> The meat
> 
> 
> Both live-migrate[1] and evacuate[2] has an optional force flag on the 
> nova REST API. The documentation says: "Force  by not 
> verifying the provided destination host by the scheduler."
> 
> Nova implements this statement by not calling the scheduler if 
> force=True BUT still try to manage allocations in placement.
> 
> To have allocation on the destination host Nova blindly copies the 
> instance allocation from the source host to the destination host during 
> these operations. Nova can do that as 1) the whole allocation is 
> against a single RP (the compute RP) and 2) Nova knows both the source 
> compute RP and the destination compute RP.
> 
> However as soon as we bring nested allocations into the picture that 
> blind copy will not be feasible. Possible cases
> 0) The instance has a non-nested allocation on the source and would need 
> a non-nested allocation on the destination. This works with the blind copy 
> today.
> 1) The instance has a nested allocation on the source and would need a 
> nested allocation on the destination as well.
> 2) The instance has a non-nested allocation on the source and would 
> need a nested allocation on the destination.
> 3) The instance has a nested allocation on the source and would need a 
> non nested allocation on the destination.
> 
> Nova cannot generate nested allocations easily without reimplementing 
> some of the placement allocation candidate (a_c) code. However I don't 
> like the idea of duplicating some of the a_c code in Nova.
> 
> Nova cannot detect what kind of allocation (nested or non-nested) an 
> instance would need on the destination without calling placement a_c. 
> So knowing when to call placement is a chicken and egg problem.
> 
> Possible solutions:
> A) fail fast
> 
> 0) Nova can detect that the source allocation is non-nested and try 
> the blind copy and it will succeed.
> 1) Nova can detect that the source allocation is nested and fail the 
> operation.
> 2) Nova only sees a non-nested source allocation. Even if the dest RP 
> tree is nested it does not mean that the allocation will be nested. We 
> cannot fail fast. Nova can try the blind copy and allocate every 
> resource from the root RP of the destination. If the instance requires 
> a nested allocation instead, the claim will fail in placement. So nova can 
> fail the operation a bit later than in 1).
> 3) Nova can detect that the source allocation is nested and fail the 
> operation. However, an enhanced blind copy that tries to allocate 
> everything from the root RP on the destination would have worked.
> 
> B) Guess 

Re: [openstack-dev] [cinder] [nova] Do we need a "force" parameter in cinder "re-image" API?

2018-10-09 Thread Matt Riedemann

On 10/9/2018 8:04 AM, Erlon Cruz wrote:
If you are planning to re-image an image on a bootable volume then yes 
you should use a force parameter. I have lost the discussion about this 
on PTG. What is the main use cases? This seems to me something that 
could be leveraged with the current revert-to-snapshot API, which would 
be even better. The flow would be:


1 - create a volume from image
2 - create an snapshot
3 - do whatever you wan't
4 - revert the snapshot

Would that help in your the use cases?


As the spec mentions, this is for enabling re-imaging the root volume on 
a server when nova rebuilds the server. That is not allowed today 
because the compute service can't re-image the root volume. We don't 
want to jump through a bunch of gross alternative hoops to create a new 
root volume with the new image and swap them out (the reasons why are in 
the spec, and have been discussed previously in the ML). So nova is 
asking cinder to provide an API to change the image in a volume which 
the nova rebuild operation will use to re-image the root volume on a 
volume-backed server. I don't know if revert-to-snapshot solves that use 
case, but it doesn't sound like it. With the nova rebuild API, the user 
provides an image reference and that is used to re-image the root disk 
on the server. So it might not be a snapshot, it could be something new.
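
Just to sketch the shape of it (the action name and fields here are purely 
illustrative, not an existing cinder API; they are whatever the spec finally 
settles on), nova's rebuild flow would end up calling something like:

  POST /v3/{project_id}/volumes/{volume_id}/action
  {"os-reimage": {"image_id": "<new-image-uuid>",
                  "force": true}}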


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Lance Bragstad
Keystone is failing because it's missing a fix from oslo.messaging [0].
That said, keystone is also relying on an internal implementation detail in
oslo.messaging by mocking it in tests [1]. The notification work has been
around in keystone for a *long* time, but it's apparent that we should
revisit these tests to make sure we aren't testing something that is
already tested by oslo.messaging if we're mocking internal implementation
details of a library.

Regardless, blacklisting version 9.0.0 will work for keystone, but we can
work around it another way by either rewriting the tests to not care about
oslo.messaging specifics, or removing them if they're obsolete.

[0] https://review.openstack.org/#/c/608196/
[1]
https://git.openstack.org/cgit/openstack/keystone/tree/keystone/tests/unit/common/test_notifications.py#n1343

On Mon, Oct 8, 2018 at 10:59 PM Matthew Thode 
wrote:

> several projects have had problems with the new release, some have ways
> of working around it, and some do not.  I'm sending this just to raise
> the issue and allow a place to discuss solutions.
>
> Currently there is a review proposed to blacklist 9.0.0, but if this is
> going to still be an issue somehow in further releases we may need
> another solution.
>
> https://review.openstack.org/#/c/608835/
>
> --
> Matthew Thode (prometheanfire)
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][glance][cinder][keystone][requirements] blocking oslo.messaging 9.0.0

2018-10-09 Thread Brian Rosmaita
On 10/8/18 11:59 PM, Matthew Thode wrote:
> several projects have had problems with the new release, some have ways
> of working around it, and some do not.  I'm sending this just to raise
> the issue and allow a place to discuss solutions.
> 
> Currently there is a review proposed to blacklist 9.0.0, but if this is
> going to still be an issue somehow in further releases we may need
> another solution.
> 
> https://review.openstack.org/#/c/608835/

As indicated in the commit message on the above patch, 9.0.0 contains a
bug that's been fixed in oslo.messaging master, so I don't think there's
any question that 9.0.0 has to be blacklisted.

As far as the timing/content of 9.0.1, however, that may require further
discussion.

(In other words, I'm saying that when you say 'another solution', my
position is that we should take 'another' to mean 'additional', not
'different'.)
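
(For reference, the blacklisting itself is just a version exclusion in the
requirements repo, along the lines of the following global-requirements.txt
entry; the lower bound shown is only a placeholder:

  oslo.messaging!=9.0.0,>=5.29.0  # Apache-2.0

so the open question is really about what lands in 9.0.1 and when.)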

cheers,
brian

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][stadium][networking] Seeking proposals for non-voting Stadium projects in Neutron check queue

2018-10-09 Thread Michel Peterson
On Sun, Sep 30, 2018 at 3:43 AM Miguel Lavalle  wrote:

> The next step is for each project to propose the jobs that they want to
> run against Neutron patches.
>

This is fantastic. Do you plan to have all patches under a single topic for
easier tracking? I'll be handling the proposal of these jobs for
networking-odl and would like to know this before sending them for review.

In addition, since these will be non-voting for Stadium projects, what will
the mechanics be to avoid breaking such projects?
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [nova] Do we need a "force" parameter in cinder "re-image" API?

2018-10-09 Thread Erlon Cruz
If you are planning to re-image a bootable volume then yes, you should use a
force parameter. I missed the discussion about this at the PTG.
What are the main use cases? This seems to me like something that could be
handled with the current revert-to-snapshot API, which would be even
better. The flow would be:

1 - create a volume from image
2 - create a snapshot
3 - do whatever you want
4 - revert the snapshot

Would that help in your use cases?

Em seg, 8 de out de 2018 às 10:54, Sean McGinnis 
escreveu:

> On Mon, Oct 08, 2018 at 03:09:36PM +0800, Yikun Jiang wrote:
> > In Denver, we agreed to add a new "re-image" API in cinder to support
> > volume-backed server rebuild with a new image.
> >
> > An initial blueprint has been drafted in [3], welcome to review it,
> thanks.
> > : )
> >
> > [snip]
> >
> > The "force" parameter idea comes from [4], means that
> > 1. we can re-image an "available" volume directly.
> > 2. we can't re-image "in-use"/"reserved" volume directly.
> > 3. we can only re-image an "in-use"/"reserved" volume with "force"
> > parameter.
> >
> > And it means nova need to always call re-image API with an extra "force"
> > parameter,
> > because the volume status is "in-use" or "reserve" when we rebuild the
> > server.
> >
> > *So, what's your idea? Do we really want to add this "force" parameter?*
> >
>
> I would prefer we have the "force" parameter, even if it is something that
> will
> always be defaulted to True from Nova.
>
> Having this exposed as a REST API means anyone could call it, not just Nova
> code. So as protection from someone doing something that they are not
> really
> clear on the full implications of, having a flag in there to guard volumes
> that
> are already attached or reserved for shelved instances is worth the very
> minor
> extra overhead.
>
> > [1] https://etherpad.openstack.org/p/nova-ptg-stein L483
> > [2] https://etherpad.openstack.org/p/cinder-ptg-stein-thursday-rebuild
> L12
> > [3] https://review.openstack.org/#/c/605317
> > [4]
> >
> https://review.openstack.org/#/c/605317/1/specs/stein/add-volume-re-image-api.rst@75
> >
> > Regards,
> > Yikun
> > 
> > Jiang Yikun(Kero)
> > Mail: yikunk...@gmail.com
>
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Open API 3.0 for OpenStack API

2018-10-09 Thread Jeremy Stanley
On 2018-10-09 08:52:52 -0400 (-0400), Jim Rollenhagen wrote:
[...]
> It seems to me that a major goal of openstacksdk is to hide differences
> between clouds from the user. If the user is meant to use a GraphQL library
> themselves, we lose this and the user needs to figure it out themselves.
> Did I understand that correctly?

This is especially useful where the SDK implements business logic
for common operations like "if the user requested A and the cloud
supports features B+C+D then use those to fulfil the request,
otherwise fall back to using features E+F".
-- 
Jeremy Stanley


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Open API 3.0 for OpenStack API

2018-10-09 Thread Jim Rollenhagen
On Mon, Oct 8, 2018 at 5:58 AM Gilles Dubreuil  wrote:

>
> On 05/10/18 21:54, Jim Rollenhagen wrote:
>
> GraphQL has introspection features that allow clients to pull the schema
> (types, queries, mutations, etc): https://graphql.org/learn/introspection/
>
> That said, it seems like using this in a client like OpenStackSDK would
> get messy quickly. Instead of asking for which versions are supported,
> you'd have to fetch the schema, map it to actual features somehow, and
> adjust queries based on this info.
>
>
> A main difference in software architecture when using GraphQL is that a
> client makes use of a GraphQL client library instead of relying on an SDK.
>

It seems to me that a major goal of openstacksdk is to hide differences
between clouds from the user. If the user is meant to use a GraphQL library
themselves, we lose this and the user needs to figure it out themselves.
Did I understand that correctly?
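
For reference, introspection is just another query, so "figuring it out"
would look roughly like the sketch below (the /graphql endpoint and field
names are hypothetical, since no OpenStack service exposes one today):

  import requests

  INTROSPECTION = '{ __schema { queryType { fields { name } } } }'

  resp = requests.post('https://cloud.example.com/graphql',
                       json={'query': INTROSPECTION})
  fields = {f['name']
            for f in resp.json()['data']['__schema']['queryType']['fields']}

  # The client now has to map raw schema fields to features itself,
  # which is exactly the per-cloud logic openstacksdk currently hides.
  if 'servers' in fields:
      query = '{ servers { id name } }'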

// jim
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Mark Goddard
On Tue, 9 Oct 2018 at 12:03, Florian Engelmann <
florian.engelm...@everyware.ch> wrote:

> Am 10/9/18 um 11:04 AM schrieb Mark Goddard:
> > Thanks for these suggestions Florian, there are some interesting ideas
> > in here. I'm a little concerned about the maintenance overhead of adding
> > support for all of these things, and wonder if some of them could be
> > done without explicit support in kolla and kolla-ansible. The kolla
> > projects have been able to move quickly by providing a flexible
> > configuration mechanism that avoids the need to maintain support for
> > every OpenStack feature. Other thoughts inline.
> >
>
> I do understand your apprehensions Mark. For some of the suggested
> changes/additions I agree. But adding those components without
> kolla/kolla-ansible integration doesn't feel right.
>

I'm not entirely against adding some of these things, if enough people in
the community want them. I'd just like to make sure that if they can be
done in a sane way without changes, then we do that and document how to do
it instead.

>
> >
> > On Mon, 8 Oct 2018 at 11:15, Florian Engelmann
> > mailto:florian.engelm...@everyware.ch>>
>
> > wrote:
> >
> > Hi,
> >
> > I would like to start a discussion about some changes and additions I
> > would like to see in in kolla and kolla-ansible.
> >
> > 1. Keepalived is a problem in layer3 spine leaf networks as any
> > floating
> > IP can only exist in one leaf (and VRRP is a problem in layer3). I
> > would
> > like to use consul and registrar to get rid of the "internal"
> floating
> > IP and use consuls DNS service discovery to connect all services with
> > each other.
> >
> >
> > Without reading up, I'm not sure exactly how this fits together. If
> > kolla-ansible made the API host configurable for each service rather
> > than globally, would that be a step in the right direction?
>
> No that would not help. The problem is HA. Right now there is a
> "central" floating IP (kolla_internal_vip_address) that is used for all
> services to connect to (each other). Keepalived fails that IP over
> if the "active" host fails. In a layer3 (CLOS/Spine-Leaf) network this
> IP is only available in one leaf/rack. So that rack is a "SPOF".
> Using service discovery fits perfectly in a CLOS network and scales much
> better as an HA solution.
>
> Right, but what I'm saying as a thought experiment is, if we gave you the
required variables in kolla-ansible (e.g. nova_internal_fqdn) to make this
possible with an externally managed consul service, could that work?

>
> >
> > 2. Using "ports" for external API (endpoint) access is a major
> headache
> > if a firewall is involved. I would like to configure the HAProxy (or
> > fabio) for the external access to use "Host:" like, eg. "Host:
> > keystone.somedomain.tld", "Host: nova.somedomain.tld", ... with
> HTTPS.
> > Any customer would just need HTTPS access and not have to open all
> > those
> > ports in his firewall. For some enterprise customers it is not
> possible
> > to request FW changes like that.
> >
> > 3. HAProxy is not capable to handle "read/write" split with Galera. I
> > would like to introduce ProxySQL to be able to scale Galera.
> >
> >
> > It's now possible to use an external database server with kolla-ansible,
> > instead of deploying a mariadb/galera cluster. This could be implemented
> > how you like, see
> >
> https://docs.openstack.org/kolla-ansible/latest/reference/external-mariadb-guide.html
> .
>
> Yes I agree. And this is what we will do in our first production
> deployment. But I would love to see ProxySQL in Kolla as well.
> As a side note: Kolla-ansible does use:
>
> option mysql-check user haproxy post-41
>
> to check Galera, but that check does not fail if the node is out of sync
> with the other nodes!
>
> http://galeracluster.com/documentation-webpages/monitoringthecluster.html
>
> That's good to know. Could you raise a bug in kolla-ansible on launchpad,
and offer advice on how to improve this check if you have any?

>
> >
> > 4. HAProxy is fine but fabio integrates well with consul, statsd and
> > could be connected to a vault cluster to manage secure certificate
> > access.
> >
> > As above.
> >
> > 5. I would like to add vault as Barbican backend.
> >
> > Does this need explicit support in kolla and kolla-ansible, or could it
> > be done through configuration of barbican.conf? Are there additional
> > packages required in the barbican container? If so, see
> >
> https://docs.openstack.org/kolla/latest/admin/image-building.html#package-customisation
> .
>
> True but the vault (and consul) containers could be deployed and managed
> by kolla-ansible.
>
> I'd like to see if anyone else is interested in this. Kolla ansible
already deploys a large number of services, which is great. As with many
other projects I'm seeing the resources of core contributors fall off a
little, and I think we need to 

Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 06:34 AM, Florian Engelmann wrote:

Am 10/9/18 um 11:41 AM schrieb Jay Pipes:

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes  wrote:

Why not send all read and all write traffic to a single haproxy 
endpoint and just have haproxy spread all traffic across each Galera 
node?


Galera, after all, is multi-master synchronous replication... so it 
shouldn't matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses 
cluster-wide optimistic locking. This may cause some transactions to 
rollback. With an increasing number of writeable masters, the 
transaction rollback rate may increase, especially if there is write 
contention on the same dataset. It is of course possible to retry the 
transaction and perhaps it will COMMIT in the retries, but this will 
add to the transaction latency. However, some designs are deadlock 
prone, e.g sequence tables.

—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial 



Have you seen the above in production?


Yes of course. Just depends on the application and how high the workload 
gets.


Please read about deadlocks and Nova in the following report by Intel:

http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf 


I have read the above. It's a synthetic workload analysis, which is why 
I asked if you'd seen this in production.


For the record, we addressed much of the contention/races mentioned in 
the above around scheduler resource consumption in the Ocata and Pike 
releases of Nova.


I'm aware that the report above identifies the quota handling code in 
Nova as the primary culprit of the deadlock issues but again, it's a 
synthetic workload that is designed to find breaking points. It doesn't 
represent a realistic production workload.


You can read about the deadlock issue in depth on my blog here:

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/

That explains where the source of the problem comes from (it's the use 
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling 
code in the Rocky release).
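
The gist of the compare-and-swap approach, for anyone who doesn't want to 
read the whole post (illustrative Python/SQLAlchemy only, not the actual 
Nova code; table and column names are just examples):

  from sqlalchemy import create_engine, text

  engine = create_engine("mysql+pymysql://nova:secret@db/nova")  # placeholder DSN

  def increment_usage(project_id, delta, max_retries=5):
      for _ in range(max_retries):
          with engine.begin() as conn:
              current = conn.execute(
                  text("SELECT in_use FROM quota_usages WHERE project_id = :p"),
                  {"p": project_id}).scalar()
              # Guarded UPDATE instead of SELECT ... FOR UPDATE: it only
              # succeeds if nobody changed the row in the meantime.
              result = conn.execute(
                  text("UPDATE quota_usages SET in_use = :new "
                       "WHERE project_id = :p AND in_use = :old"),
                  {"new": current + delta, "p": project_id, "old": current})
              if result.rowcount == 1:
                  return
      raise RuntimeError("concurrent update retries exhausted")

No row locks are held while deciding the new value; the retry loop handles 
the case where a concurrent writer got there first.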


If just Nova is affected we could also create an additional HAProxy 
listener using all Galera nodes with round-robin for all other services?


I fail to see the point of using Galera with a single writer. At that 
point, why bother with Galera at all? Just use a single database node 
with a single slave for backup purposes.



Anyway - proxySQL would be a great extension.


I don't disagree that proxySQL is a good extension. However, it adds yet 
another service to the mesh that needs to be deployed, configured and 
maintained.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Chair for Edge squad meeting this week

2018-10-09 Thread James Slagle
I won't be able to chair the edge squad meeting this week. Can anyone
take it over? If not, we'll pick it back up next week.

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Florian Engelmann

Am 10/9/18 um 11:04 AM schrieb Mark Goddard:
Thanks for these suggestions Florian, there are some interesting ideas 
in here. I'm a little concerned about the maintenance overhead of adding 
support for all of these things, and wonder if some of them could be 
done without explicit support in kolla and kolla-ansible. The kolla 
projects have been able to move quickly by providing a flexible 
configuration mechanism that avoids the need to maintain support for 
every OpenStack feature. Other thoughts inline.




I do understand your apprehensions Mark. For some of the suggested 
changes/additions I agree. But adding those components without 
kolla/kolla-ansible integration doesn't feel right.




On Mon, 8 Oct 2018 at 11:15, Florian Engelmann 
mailto:florian.engelm...@everyware.ch>> 
wrote:


Hi,

I would like to start a discussion about some changes and additions I
would like to see in in kolla and kolla-ansible.

1. Keepalived is a problem in layer3 spine leaf networks as any
floating
IP can only exist in one leaf (and VRRP is a problem in layer3). I
would
like to use consul and registrar to get rid of the "internal" floating
IP and use consuls DNS service discovery to connect all services with
each other.


Without reading up, I'm not sure exactly how this fits together. If 
kolla-ansible made the API host configurable for each service rather 
than globally, would that be a step in the right direction?


No that would not help. The problem is HA. Right now there is a 
"central" floating IP (kolla_internal_vip_address) that is used for all 
services to connect to (each other). Keepalived fails that IP over 
if the "active" host fails. In a layer3 (CLOS/Spine-Leaf) network this 
IP is only available in one leaf/rack. So that rack is a "SPOF".
Using service discovery fits perfectly in a CLOS network and scales much 
better as an HA solution.
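
Just to illustrate what I mean (names and ports are only examples), each 
service would register itself with the local consul agent, e.g.:

  {
    "service": {
      "name": "keystone-internal",
      "port": 5000,
      "check": {
        "http": "http://127.0.0.1:5000/v3",
        "interval": "10s"
      }
    }
  }

and everything else would talk to "keystone-internal.service.consul" via 
consul DNS instead of the keepalived VIP, so a failed node simply drops 
out of the DNS answer.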





2. Using "ports" for external API (endpoint) access is a major headache
if a firewall is involved. I would like to configure the HAProxy (or
fabio) for the external access to use "Host:" like, eg. "Host:
keystone.somedomain.tld", "Host: nova.somedomain.tld", ... with HTTPS.
Any customer would just need HTTPS access and not have to open all
those
ports in his firewall. For some enterprise customers it is not possible
to request FW changes like that.

3. HAProxy is not capable to handle "read/write" split with Galera. I
would like to introduce ProxySQL to be able to scale Galera.


It's now possible to use an external database server with kolla-ansible, 
instead of deploying a mariadb/galera cluster. This could be implemented 
how you like, see 
https://docs.openstack.org/kolla-ansible/latest/reference/external-mariadb-guide.html.


Yes I agree. And this is what we will do in our first production 
deployment. But I would love to see ProxySQL in Kolla as well.

As a side note: Kolla-ansible does use:

option mysql-check user haproxy post-41

to check Galera, but that check does not fail if the node is out of sync 
with the other nodes!


http://galeracluster.com/documentation-webpages/monitoringthecluster.html
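
One common way to make the check Galera-aware is an HTTP check against a 
clustercheck script instead (sketch only, addresses and ports assumed):

  listen mariadb
      bind 192.0.2.250:3306
      mode tcp
      option httpchk
      # clustercheck (usually run via xinetd on port 9200) only returns
      # HTTP 200 when wsrep_local_state == 4 (Synced)
      server ctl1 192.0.2.11:3306 check port 9200 inter 2000 rise 2 fall 3
      server ctl2 192.0.2.12:3306 check port 9200 inter 2000 rise 2 fall 3 backup
      server ctl3 192.0.2.13:3306 check port 9200 inter 2000 rise 2 fall 3 backup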




4. HAProxy is fine but fabio integrates well with consul, statsd and
could be connected to a vault cluster to manage secure certificate
access.

As above.

5. I would like to add vault as Barbican backend.

Does this need explicit support in kolla and kolla-ansible, or could it 
be done through configuration of barbican.conf? Are there additional 
packages required in the barbican container? If so, see 
https://docs.openstack.org/kolla/latest/admin/image-building.html#package-customisation.


True but the vault (and consul) containers could be deployed and managed 
by kolla-ansible.




6. I would like to add an option to enable tokenless authentication for
all services with each other to get rid of all the openstack service
passwords (security issue).

Again, could this be done without explicit support?


We did not investigate here. Changes to the apache configuration are 
needed. I guess we will have to change the kolla container itself to do 
so? Is it possible to "inject" files in a container using kolla-ansible?


smime.p7s
Description: S/MIME cryptographic signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Florian Engelmann

Am 10/9/18 um 11:41 AM schrieb Jay Pipes:

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes  wrote:

Why not send all read and all write traffic to a single haproxy 
endpoint and just have haproxy spread all traffic across each Galera 
node?


Galera, after all, is multi-master synchronous replication... so it 
shouldn't matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses 
cluster-wide optimistic locking. This may cause some transactions to 
rollback. With an increasing number of writeable masters, the 
transaction rollback rate may increase, especially if there is write 
contention on the same dataset. It is of course possible to retry the 
transaction and perhaps it will COMMIT in the retries, but this will 
add to the transaction latency. However, some designs are deadlock 
prone, e.g sequence tables.

—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial 



Have you seen the above in production?



Yes of course. Just depends on the application and how high the workload 
gets.


Please read about deadlocks and Nova in the following report by Intel:

http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf

If just Nova is affected we could also create an additional HAProxy 
listener using all Galera nodes with round-robin for all other services?


Anyway - proxySQL would be a great extension.


smime.p7s
Description: S/MIME cryptographic signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] assigning new liaisons to projects

2018-10-09 Thread Davanum Srinivas
On Mon, Oct 8, 2018 at 11:57 PM Ghanshyam Mann 
wrote:

>
>
>
>   On Mon, 08 Oct 2018 23:27:06 +0900 Doug Hellmann <
> d...@doughellmann.com> wrote 
>  > TC members,
>  >
>  > Since we are starting a new term, and have several new members, we need
>  > to decide how we want to rotate the liaisons attached to each our
>  > project teams, SIGs, and working groups [1].
>  >
>  > Last term we went through a period of volunteer sign-up and then I
>  > randomly assigned folks to slots to fill out the roster evenly. During
>  > the retrospective we talked a bit about how to ensure we had an
>  > objective perspective for each team by not having PTLs sign up for their
>  > own teams, but I don't think we settled on that as a hard rule.
>  >
>  > I think the easiest and fairest (to new members) way to manage the list
>  > will be to wipe it and follow the same process we did last time. If you
>  > agree, I will update the page this week and we can start collecting
>  > volunteers over the next week or so.
>
> +1, sounds good to me.
>
> -gmann
>

+1 from me as well.



>  >
>  > Doug
>  >
>  > [1] https://wiki.openstack.org/wiki/OpenStack_health_tracker
>  >
>  >
> __
>  > OpenStack Development Mailing List (not for usage questions)
>  > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>  > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>  >
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Davanum Srinivas :: https://twitter.com/dims
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Balázs Gibizer
Hi,

Setup
-

nested allocation: an allocation that contains resources from one or 
more nested RPs. (if you have a better term for this then please suggest it).

If an instance has a nested allocation it means that the compute it 
allocates from has a nested RP tree. BUT if a compute has a nested RP 
tree it does not automatically mean that the instance, allocating from 
that compute, has a nested allocation (e.g. bandwidth inventory will be 
on nested RPs but not every instance will require bandwidth).
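
To make "nested allocation" concrete, such an allocation in placement 
looks roughly like this (uuids shortened, resource classes only examples):

  {
    "allocations": {
      "<compute-node-rp-uuid>": {
        "resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}
      },
      "<pf1-nested-rp-uuid>": {
        "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}
      }
    }
  }

i.e. the consumer holds resources against more than one RP in the same 
tree, instead of everything hanging off the single compute node RP.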

Afaiu, as soon as we have NUMA modelling in place even the most trivial 
servers will have nested allocations, as CPU and MEMORY inventory will 
be moved to the nested NUMA RPs. But NUMA is still in the future.

Sidenote: there is an edge case reported by bauzas when an instance 
allocates _only_ from nested RPs. This was discussed on last Friday and 
it resulted in a new patch[0] but I would like to keep that discussion 
separate from this if possible.

Sidenote: the current problem is somewhat related not just to nested RPs 
but to sharing RPs as well. However I'm not aiming to implement sharing 
support in Nova right now so I also try to keep the sharing discussion 
separate if possible.

There was already some discussion on the Monday's scheduler meeting but 
I could not attend.
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20


The meat


Both live-migrate[1] and evacuate[2] have an optional force flag on the 
nova REST API. The documentation says: "Force  by not 
verifying the provided destination host by the scheduler."

Nova implements this statement by not calling the scheduler if 
force=True BUT still try to manage allocations in placement.

To have allocation on the destination host Nova blindly copies the 
instance allocation from the source host to the destination host during 
these operations. Nova can do that as 1) the whole allocation is 
against a single RP (the compute RP) and 2) Nova knows both the source 
compute RP and the destination compute RP.

However as soon as we bring nested allocations into the picture that 
blind copy will not be feasible. Possible cases
0) The instance has a non-nested allocation on the source and would need 
a non-nested allocation on the destination. This works with the blind copy 
today.
1) The instance has a nested allocation on the source and would need a 
nested allocation on the destination as well.
2) The instance has a non-nested allocation on the source and would 
need a nested allocation on the destination.
3) The instance has a nested allocation on the source and would need a 
non-nested allocation on the destination.

Nova cannot generate nested allocations easily without reimplementing 
some of the placement allocation candidate (a_c) code. However I don't 
like the idea of duplicating some of the a_c code in Nova.

Nova cannot detect what kind of allocation (nested or non-nested) an 
instance would need on the destination without calling placement a_c. 
So knowing when to call placement is a chicken and egg problem.

Possible solutions:
A) fail fast

0) Nova can detect that the source allocation is non-nested and try 
the blind copy and it will succeed.
1) Nova can detect that the source allocation is nested and fail the 
operation.
2) Nova only sees a non-nested source allocation. Even if the dest RP 
tree is nested it does not mean that the allocation will be nested. We 
cannot fail fast. Nova can try the blind copy and allocate every 
resource from the root RP of the destination. If the instance requires 
a nested allocation instead, the claim will fail in placement. So nova can 
fail the operation a bit later than in 1).
3) Nova can detect that the source allocation is nested and fail the 
operation. However an enhanced blind copy that tries to allocate 
everything from the root RP on the destination would have worked.

B) Guess when to ignore the force flag and call the scheduler
-
0) keep the blind copy as it works
1) Nova detects that the source allocation is nested. It ignores the force 
flag and calls the scheduler that will call placement a_c. The move 
operation can succeed.
2) Nova only sees a non-nested source allocation so it will fall back 
to the blind copy and fail at the claim on the destination.
3) Nova detects that the source allocation is nested. It ignores the force 
flag and calls the scheduler that will call placement a_c. The move 
operation can succeed.

This solution would go against the API doc, which states that nova does not 
call the scheduler if the operation is forced. However, in the case of forced 
live-migration Nova already verifies the target host from a couple of 
perspectives in [3].
This solution is already proposed for live-migrate in [4] and for 
evacuate in [5], so the complexity of the solution can be seen in the 
reviews.

C) Remove the force flag from the API in a new microversion

Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes  wrote:

Why not send all read and all write traffic to a single haproxy endpoint and 
just have haproxy spread all traffic across each Galera node?

Galera, after all, is multi-master synchronous replication... so it shouldn't 
matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses cluster-wide 
optimistic locking. This may cause some transactions to rollback. With an 
increasing number of writeable masters, the transaction rollback rate may 
increase, especially if there is write contention on the same dataset. It is of 
course possible to retry the transaction and perhaps it will COMMIT in the 
retries, but this will add to the transaction latency. However, some designs 
are deadlock prone, e.g sequence tables.
—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial


Have you seen the above in production?

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Mark Goddard
Thanks for these suggestions Florian, there are some interesting ideas in
here. I'm a little concerned about the maintenance overhead of adding
support for all of these things, and wonder if some of them could be done
without explicit support in kolla and kolla-ansible. The kolla projects
have been able to move quickly by providing a flexible configuration
mechanism that avoids the need to maintain support for every OpenStack
feature. Other thoughts inline.

Regards,
Mark

On Mon, 8 Oct 2018 at 11:15, Florian Engelmann <
florian.engelm...@everyware.ch> wrote:

> Hi,
>
> I would like to start a discussion about some changes and additions I
> would like to see in in kolla and kolla-ansible.
>
> 1. Keepalived is a problem in layer3 spine leaf networks as any floating
> IP can only exist in one leaf (and VRRP is a problem in layer3). I would
> like to use consul and registrar to get rid of the "internal" floating
> IP and use consuls DNS service discovery to connect all services with
> each other.
>

Without reading up, I'm not sure exactly how this fits together. If
kolla-ansible made the API host configurable for each service rather than
globally, would that be a step in the right direction?

>
> 2. Using "ports" for external API (endpoint) access is a major headache
> if a firewall is involved. I would like to configure the HAProxy (or
> fabio) for the external access to use "Host:" like, eg. "Host:
> keystone.somedomain.tld", "Host: nova.somedomain.tld", ... with HTTPS.
> Any customer would just need HTTPS access and not have to open all those
> ports in his firewall. For some enterprise customers it is not possible
> to request FW changes like that.
>
> 3. HAProxy is not capable to handle "read/write" split with Galera. I
> would like to introduce ProxySQL to be able to scale Galera.
>

It's now possible to use an external database server with kolla-ansible,
instead of deploying a mariadb/galera cluster. This could be implemented
how you like, see
https://docs.openstack.org/kolla-ansible/latest/reference/external-mariadb-guide.html
.

4. HAProxy is fine but fabio integrates well with consul, statsd and
> could be connected to a vault cluster to manage secure certificate access.
>
> As above.


> 5. I would like to add vault as Barbican backend.
>
> Does this need explicit support in kolla and kolla-ansible, or could it
be done through configuration of barbican.conf? Are there additional
packages required in the barbican container? If so, see
https://docs.openstack.org/kolla/latest/admin/image-building.html#package-customisation
.

> 6. I would like to add an option to enable tokenless authentication for
> all services with each other to get rid of all the openstack service
> passwords (security issue).
>
> Again, could this be done without explicit support?


> What do you think about it?
>
> All the best,
> Florian
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Christian Berendt


> On 8. Oct 2018, at 19:48, Jay Pipes  wrote:
> 
> Why not send all read and all write traffic to a single haproxy endpoint and 
> just have haproxy spread all traffic across each Galera node?
> 
> Galera, after all, is multi-master synchronous replication... so it shouldn't 
> matter which node in the Galera cluster you send traffic to.

Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses cluster-wide 
optimistic locking. This may cause some transactions to rollback. With an 
increasing number of writeable masters, the transaction rollback rate may 
increase, especially if there is write contention on the same dataset. It is of 
course possible to retry the transaction and perhaps it will COMMIT in the 
retries, but this will add to the transaction latency. However, some designs 
are deadlock prone, e.g sequence tables.
—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial

Christian.

-- 
Christian Berendt
Chief Executive Officer (CEO)

Mail: bere...@betacloud-solutions.de
Web: https://www.betacloud-solutions.de

Betacloud Solutions GmbH
Teckstrasse 62 / 70190 Stuttgart / Deutschland

Geschäftsführer: Christian Berendt
Unternehmenssitz: Stuttgart
Amtsgericht: Stuttgart, HRB 756139


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [kolla-ansible][zun] Request to make another release for stable/queens

2018-10-09 Thread Mark Goddard
Hi Hongbin,

I'll add this to our meeting agenda for tomorrow, but I see no reason why
we should not make another queens series release.

Cheers,
Mark

On Sun, 7 Oct 2018 at 18:50, Hongbin Lu  wrote:

> Hi Kolla team,
>
> I have a fixup on the configuration of Zun service:
> https://review.openstack.org/#/c/591256/ . The fix has been backported to
> stable/queens and I wonder if it is possible to release kolla-ansible 6.2.0
> that contains this patch. The deployment of Zun service needs this patch to
> work properly.
>
> Best regards,
> Hongbin
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [SIGS] Ops Tools SIG

2018-10-09 Thread Miguel Angel Ajo Pelayo
Hello

Yesterday, during the Oslo meeting we discussed [6] the possibility of
creating a new Special Interest Group [1][2] to provide a home and release
means for operator-related tools [3][4][5].

I continued the discussion with M.Hillsman later, and he made me aware
of the operator working group and mailing list, which existed even before
the SIGs.

I believe it could be a very good idea to give life and more
visibility to all those very useful tools (for example, I didn't know some
of them existed ...).

   Given this, I have two questions:

   1) Do you know of more tools which could find a home under an Ops Tools
SIG umbrella?

   2) Do you want to join us?


Best regards and have a great day.


[1] https://governance.openstack.org/sigs/
[2] http://git.openstack.org/cgit/openstack/governance-sigs/tree/sigs.yaml
[3] https://wiki.openstack.org/wiki/Osops
[4] http://git.openstack.org/cgit/openstack/ospurge/tree/
[5] http://git.openstack.org/cgit/openstack/os-log-merger/tree/
[6]
http://eavesdrop.openstack.org/meetings/oslo/2018/oslo.2018-10-08-15.00.log.html#l-130



-- 
Miguel Ángel Ajo
OSP / Networking DFG, OVN Squad Engineering
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat][senlin] Action Required. Idea to propose for a forum for autoscaling features integration

2018-10-09 Thread Rico Lin
A reminder for all: please put your ideas/thoughts/suggested actions in our
etherpad [1],
which we are going to use for further discussion at the Forum, or at the PTG
if we don't get a forum for it.
That way we won't miss anything.



[1] https://etherpad.openstack.org/p/autoscaling-integration-and-feedback

On Tue, Oct 9, 2018 at 2:22 PM Qiming Teng  wrote:

> > >One approach would be to switch the underlying Heat AutoScalingGroup
> > >implementation to use Senlin and then deprecate the AutoScalingGroup
> > >resource type in favor of the Senlin resource type over several
> > >cycles.
> >
> > The hard part (or one hard part, at least) of that is migrating the
> existing
> > data.
>
> Agreed. In an ideal world, we can transparently transplant the "scaling
> group" resource implementation onto something (e.g. a library or an
> interface). This sounds like an option for both teams to brainstorm
> together.
>
> - Qiming
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
May The Force of OpenStack Be With You,

*Rico Lin*irc: ricolin
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat][senlin] Action Required. Idea to propose for a forum for autoscaling features integration

2018-10-09 Thread Qiming Teng
> >One approach would be to switch the underlying Heat AutoScalingGroup
> >implementation to use Senlin and then deprecate the AutoScalingGroup
> >resource type in favor of the Senlin resource type over several
> >cycles.
> 
> The hard part (or one hard part, at least) of that is migrating the existing
> data.

Agreed. In an ideal world, we can transparently transplant the "scaling
group" resource implementation onto something (e.g. a library or an
interface). This sounds like an option for both teams to brainstorm
together.

- Qiming


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev