date:20170519

So making a subnet ID mandatory for a port creation and a RouterInterface
ID mandatory for a Floating IP creation are both possible in Heat without
Neutron changes. Presumably you haven't done that because it's
backwards-incompatible, but you would need to implement the change anyway
if the Neutron API was changed to require it.

It seems like Heat has a backwards-compatibility requirement for supporting
old templates that aren't explicit. That will be the real blocker to
actually making any of these changes, no? i.e. Neutron isn't preventing
Heat from being more strict, it's the legacy Heat modeling that is
preventing it.

>(a) drop the requirement that the Network has to be connected to the
external network with the FloatingIPs with a RouterInterface prior to
creating the FloatingIP. IIUC only *some* Neutron backends require this.

This can produce difficult to debug situations when multiple routers
attached to different external networks are attached to different subnets
of the same network and the user associates a floating IP to the wrong
fixed IP of the instance. Right now the interface check will prevent that,
but if we remove it the floating IP would just sit in the DOWN state.

If a backend supports floating IPs without router interfaces entirely, it's
likely making assumptions that prevent it from supporting multi-router
scenarios. A single fixed IP on a port can have multiple floating IPs
associated with it from different external networks. So the only way to
distinguish which floating IP to translate to is which router the traffic
is being directed to by the instance, which requires router interfaces.

Cheers

On Fri, May 19, 2017 at 3:29 PM, Zane Bitter  wrote:

> On 19/05/17 17:03, Kevin Benton wrote:
>
>> I split this conversation off of the "Is the pendulum swinging on PaaS
>> layers?" thread [1] to discuss some improvements we can make to Neutron
>> to make orchestration easier.
>>
>> There are some pain points that heat has when working with the Neutron
>> API. I would like to get them converted into requests for enhancements
>> in Neutron so the wider community is aware of them.
>>
>> Starting with the port/subnet/network relationship - it's important to
>> understand that IP addresses are not required on a port.
>>
>> So knowing now that a Network is a layer-2 network segment and a Subnet
>>>
>> is... effectively a glorified DHCP address pool
>>
>> Yes, a Subnet controls IP address allocation as well as setting up
>> routing for routers, which is why routers reference subnets instead of
>> networks (different routers can route for different subnets on the same
>> network). It essentially dictates things related to L3 addressing and
>> provides information for L3 reachability.
>>
>> But at the end of the day, I still can't create a Port until a Subnet
>>> exists
>>>
>>
>> This is only true if you want an IP address on the port. This sounds
>> silly for most use cases, but there are a non-trivial portion of NFV
>> workloads that do not want IP addresses at all so they create a network
>> and just attach ports without creating any subnets.
>>
>
> Fair. A more precise statement of the problem would be that given a
> template containing both a Port and a Subnet that it will be attached to,
> there is a specific order in which those need to be created that is _not_
> reflected in the data flow between them.
>
> I still don't know what Subnet a Port will be attached to (unless the
>>>
>> user specifies it explicitly using the --fixed-ip option... regardless
>> of whether they actually specify a fixed IP),
>>
>> So what would you like Neutron to do differently here? Always force a
>> user to pick which subnet they want an allocation from
>>
>
> That would work.
>
> if there are
>> multiple?
>>
>
> Ideally even if there aren't.
>
> If so, can't you just force that explicitness in Heat?
>>
>
> I think the answer here is exactly the same as for Neutron: yes, we
> totally could have if we'd realised that it was a problem at the time.
>
> and I have no way in general of telling which Subnets can be deleted
>>> before a given Port is and which will fail to delete until the Port
>>> disappears.
>>>
>>
>> A given port will only block subnet deletions from subnets it is
>> attached to. Conversely, you can see all ports with allocations from a
>> subnet with 'neutron port-list --fixed-ips subnet_id='.  So
>> is the issue here that the dependency wasn't made explicit in the heat
>> modeling (leading to the problem above and this one)?
>>
>
> Yes, that's exactly the issue. The Heat modelling was based on 1:1 with
> the Neutron API to minimise user confusion.
>
> For the individual bugs you highlighted, it would be good if you can
>> provide some details about what changes we could make to help.
>>
>>
>> https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result
>> of partially specified floating IPs (no fixed_ip). What can we
>> add/change here to help? Or can heat just

[openstack-dev] [nova] Bug heads up: Significant increase in DB connections with cells since Newton

2017-05-19 Thread melanie witt


Hey Folks,

I wanted to give everyone a heads up that we have found and fixed a bug 
[1] on master where the number of database connections had increased so 
much that we were starting to hit a "OperationalError 
(pymysql.err.OperationalError) (1040, u'Too many connections')" in the 
gate on some work-in-progress patches. The problem of an increased 
number of database connections goes all the way back to Newton, based on 
when the cell database connecting code was written along with a report 
from at least one operator who has upgraded to Newton.


Upon investigation, we found that the way we were establishing 
connections to cell databases was effectively "leaking" connections and 
we have merged a fix on master and have backports to Ocata and Newton in 
progress [2].


Once we have merged the backports, we'll be releasing new versions from 
stable/newton and stable/ocata with the fix ASAP.


Thanks,
melanie

[1] https://bugs.launchpad.net/nova/+bug/1691545
[2] https://review.openstack.org/#/q/topic:bug/1691545

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [neutron] - Summit "The Future of VPN as a Service (VPNaaS)" summary and next steps

Hi,

Here are the key points from "The Future of VPN as a Service" session:
https://etherpad.openstack.org/p/boston-vpnaas

We had a good showing of operators that have VPNaaS deployed who are
relatively happy with it and definitely want to see it continue being
developed. We have a list of 9 people who have volunteered to help keep the
repo maintained, which should be plenty to at least keep it from bit
rotting.

The assessment is in process upstream to make it part of the stadium again (
https://review.openstack.org/#/c/452032/) and it looks like we have the
community to back it.

Here are a few pain points using VPNaaS:

- Better error details need to be reported back to user (e.g. which IPsec
phase failed). Currently this requires a support call to the operator.[1]
- VPN code should load in the existing L3 agent using some kind of
extension framework so there doesn't need to be a separate subclass for
VPNaaS with its own binary.[2]
- IPv6 support needs to be covered with tests [3]
- Improved documentation [4]:
- installation (maybe an entry in the networking guide)
- behavior in L3 HA failover scenarios
- several useful documentation links from user's perspective available
in etherpad

1. https://bugs.launchpad.net/neutron/+bug/1692126
2. https://bugs.launchpad.net/neutron/+bug/1692128
3. https://bugs.launchpad.net/neutron/+bug/1692130
4. https://bugs.launchpad.net/neutron/+bug/1692131

Cheers,
Kevin Benton
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [neutron] - Summit "Distributed SNAT with DVR" summary and next steps

Hi,

https://etherpad.openstack.org/p/boston-dvr

The summary from this session is somewhat short. It looks like for this
cycle we don't have the bandwidth to either implement or review the data
model, API, and agent changes required to implement this feature in the
in-tree DVR implementation.

If this is a burning point for you as an operator, consider looking into
Dragonflow, which now has support for distributed SNAT.

Cheers,
Kevin Benton
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [neutron] - Summit "Neutron Pain points session" summary and next steps

Hi,

Here are the key points from the "Neutron Pain Points" session:
https://etherpad.openstack.org/p/neutron-boston-painpoints


FWaaS v1 to v2 - We should try to support running both v1 and v2
concurrently so operators aren't forced to immediately kill all existing v1
users just to start to switch to v2. If not, we need some sort of migration
at minimum if v1 is to be removed at some point.[1]

dsnmasq alternatives for DHCP - There was some interest in using other DHCP
servers. If you are interested, contact the people listed on the volunteer
list to work with them on putting up patches.

Security groups API docs needs improvement - security groups support more
than just ICMP, UDP, TCP but that isn't clear via the docs.[2]

Altering the preset default security group rules - this came up again, but
I don't think any different conclusions were reached other than "make your
on-boarding script do it".  This is because we can't really change them at
this point without a big backwards compatibility problem and if we make
them easily configurable we end up with a cross-cloud compatibility problem.

Let me know if I left out anything important. Also, keep an eye out for the
summary of the closely related session "Making Neutron easy for people who
want basic Networking" from Sukhdev if your topics were covered that
session.

1. https://bugs.launchpad.net/neutron/+bug/1692133
2. https://bugs.launchpad.net/neutron/+bug/1692134

Cheers,
Kevin Benton
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

2017-05-19 Thread Zane Bitter


On 19/05/17 17:03, Kevin Benton wrote:

I split this conversation off of the "Is the pendulum swinging on PaaS
layers?" thread [1] to discuss some improvements we can make to Neutron
to make orchestration easier.

There are some pain points that heat has when working with the Neutron
API. I would like to get them converted into requests for enhancements
in Neutron so the wider community is aware of them.

Starting with the port/subnet/network relationship - it's important to
understand that IP addresses are not required on a port.


So knowing now that a Network is a layer-2 network segment and a Subnet

is... effectively a glorified DHCP address pool

Yes, a Subnet controls IP address allocation as well as setting up
routing for routers, which is why routers reference subnets instead of
networks (different routers can route for different subnets on the same
network). It essentially dictates things related to L3 addressing and
provides information for L3 reachability.


But at the end of the day, I still can't create a Port until a Subnet exists


This is only true if you want an IP address on the port. This sounds
silly for most use cases, but there are a non-trivial portion of NFV
workloads that do not want IP addresses at all so they create a network
and just attach ports without creating any subnets.


Fair. A more precise statement of the problem would be that given a 
template containing both a Port and a Subnet that it will be attached 
to, there is a specific order in which those need to be created that is 
_not_ reflected in the data flow between them.



I still don't know what Subnet a Port will be attached to (unless the

user specifies it explicitly using the --fixed-ip option... regardless
of whether they actually specify a fixed IP),

So what would you like Neutron to do differently here? Always force a
user to pick which subnet they want an allocation from


That would work.


if there are
multiple?


Ideally even if there aren't.


If so, can't you just force that explicitness in Heat?


I think the answer here is exactly the same as for Neutron: yes, we 
totally could have if we'd realised that it was a problem at the time.



and I have no way in general of telling which Subnets can be deleted before a 
given Port is and which will fail to delete until the Port disappears.


A given port will only block subnet deletions from subnets it is
attached to. Conversely, you can see all ports with allocations from a
subnet with 'neutron port-list --fixed-ips subnet_id='.  So
is the issue here that the dependency wasn't made explicit in the heat
modeling (leading to the problem above and this one)?


Yes, that's exactly the issue. The Heat modelling was based on 1:1 with 
the Neutron API to minimise user confusion.



For the individual bugs you highlighted, it would be good if you can
provide some details about what changes we could make to help.


https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result
of partially specified floating IPs (no fixed_ip). What can we
add/change here to help? Or can heat just always force the user to
specify a fixed IP for the case where disambiguation on multiple
fixed_ip ports is needed?


This is the issue from which all the others on that list were spawned 
(see https://bugs.launchpad.net/heat/+bug/1442121/comments/10), so the 
only thing we're planning to actually do for this one is to catch any 
exceptions closer to where they occur than we're doing in the fix for 
https://bugs.launchpad.net/heat/+bug/1554625



https://launchpad.net/bugs/1626607


Note that this one is fixed.


- I see this is about a dependency
between RouterGateways and RouterInterfaces, but it's not clear to me
why that dependency exists. Is it to solve a lack of visibility into the
interfaces required for a floating IP?


Yes, exactly.

We essentially solved the RouterGateway/RouterInterface half of the 
problem in Heat back in Juno, by deprecating the 
OS::Neutron::RouterGateway resource and replacing it with an 
"external_gateway_info" property in OS::Neutron::Router. Old templates 
never die though.



https://bugs.launchpad.net/heat/+bug/1626619,
https://bugs.launchpad.net/heat/+bug/1626630, and
https://bugs.launchpad.net/heat/+bug/1626634 - These seems similar to
1626607.


The first and third are the RouterInterface/FloatingIP half of the 
problem. And to work around that we also have to work around the 
Subnet/Port problem (that's the third bug). The second bug is the 
RouterGateway/RouterInterface equivalent of the third.



Can we just expose the interfaces/router a floating IP is
depending on explicitly in the API for you to fix these?


Not really. We need to know before any of them are actually created. 
Preferably without making any REST calls, because REST calls are slow 
and tend to raise exceptions at unfortunate times.



If not, what
can we do to help here?


In principle, either:

(a) drop the requirement that the Network has to be connected to the

Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

2017-05-19 Thread Armando M.

On 19 May 2017 at 14:54, Clark Boylan  wrote:

> On Fri, May 19, 2017, at 02:03 PM, Kevin Benton wrote:
> > I split this conversation off of the "Is the pendulum swinging on PaaS
> > layers?" thread [1] to discuss some improvements we can make to Neutron
> > to
> > make orchestration easier.
> >
> > There are some pain points that heat has when working with the Neutron
> > API.
> > I would like to get them converted into requests for enhancements in
> > Neutron so the wider community is aware of them.
> >
> > Starting with the port/subnet/network relationship - it's important to
> > understand that IP addresses are not required on a port.
> >
> > >So knowing now that a Network is a layer-2 network segment and a Subnet
> > is... effectively a glorified DHCP address pool
> >
> > Yes, a Subnet controls IP address allocation as well as setting up
> > routing
> > for routers, which is why routers reference subnets instead of networks
> > (different routers can route for different subnets on the same network).
> > It
> > essentially dictates things related to L3 addressing and provides
> > information for L3 reachability.
>
> One thing that is odd about this is when creating a router you specify
> the gateway information using a network which is l2 not l3. Seems like
> it would be more correct to use a subnet rather than a network there?
>

I think this is due the way external networks ended up being modeled in
neutron. I suppose we could have allowed the user to specify a subnet, so
long that it fell in the bucket of subnets that belong to a router:external
network.


>
> Clark
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] Summary of BOS Summit session: User API Improvements

2017-05-19 Thread Rochelle Grober

Hey folks,

I’ve summarized the User API Improvement forum session.  I seem to recall that 
Clark Boyland and I were “volunteered” to project manage the effort to get 
these tracked and scheduled (which I would assume also means eiterh spec’ed, 
bp’ed or bugged), but I’m real fuzzy on that.  Any/all comments welcome.  The 
etherpad for this session is here:
https://etherpad.openstack.org/p/openstack-user-api-improvements

--Rocky

Summary:
This session’s focus was how to both improve the user experience in using 
OpenStack APIs through identification of issues and inconsistencies and raising 
their visibility in the developer community.  There were general observations, 
then observation specific to individual projects. The major themes of this 
session were:
Consistency:

· Use the same verbs for same/similar actions across all OpenStack

·  Make states the same across all projects

· UTF8 everywhere for user provided info and metadata

· Make ports 80/443 defaults across OpenStack so that deployments that 
have to think about nonstandard port assignments

· Services other than core/base services should be design to run *on* 
cloud, not *in* cloud

· All clouds should accept qcow2 uploads and convert to cloud native if 
necessary

· Cloud provided images should have immutable names

· Label ephemeral drives and swap disks consistently across clouds

· Enforce consistency in the service catalog across projects
Missing Functionality:

· Search function with wildcarding for entities within a cloud

· Automation for API self description (sort of like APINAME –help)

· Ability to take a “show” or “get” response and pass response to a 
“create” call (piping)

· support for image aliases

· Image annotations, both for cloud provider and user

· Provide information with net addresses as to whether internal to 
cloud only or internet accessible

· Clarify DHCP APIs and documentation

· Document config drive – important functionality the currently 
requires reading the code

· Nested virt

· Multi-attach read-only volumes

· Support for master+slave impactless backups
Improve Functionality:

· Better info/differentiation on custom attributes of images:  
read-only vs user-defined vs ??

· Create internet attached network with a single API call based on user 
expressible rules/options

· Move towards Neutron+IPv6 as default networking

· Default Security groups default needs improvement

· Improve clarity of which device each volume is attached to

· Make quota management simpler

· Horizon: move security groups to networking menus

· User facing docs on how to use all the varieties of auth and scopes

· Heat should be able to run on top of clouds (improves consistency and 
interop)

· Heat support for multi region and multi cloud




华为技术有限公司 Huawei Technologies Co., Ltd.
[Company_logo]
Rochelle Grober
Sr. Staff Architect, Open Source
Office Phone:408-330-5472
Email:rochelle.gro...@huawei.com

 本邮件及其附件含有华为公司的保密信息，仅限于发送给上面地址中列出的个人或群组。禁
止任何其他人以任何形式使用（包括但不限于全部或部分地泄露、复制、或散发）本邮件中
的信息。如果您错收了本邮件，请您立即电话或邮件通知发件人并删除本邮件！
This e-mail and its attachments contain confidential information from HUAWEI, 
which
is intended only for the person or entity whose address is listed above. Any 
use of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender by
phone or email immediately and delete it!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 5/19/2017 3:35 PM, Monty Taylor wrote:
Heck - while I'm on floating ips ... if you have some pre-existing 
floating ips and you want to boot servers on them and you want to do 
that in parallel, you can't. You can boot a server with a floating ip 
that did not pre-exist if you get the port id of the fixed ip of the 
server then pass that id to the floating ip create call. Of course, the 
server doesn't return the port id in the server record, so at the very 
least you need to make a GET /ports.json?device_id={server_id} call. Of 
course what you REALLY need to find is the port_id of the ip of the 
server that came from a subnet that has 'gateway_ip' defined, which is 
even more fun since ips are associated with _networks_ on the server 
record and not with subnets.


A few weeks ago I think we went down this rabbit hole in the nova 
channel, which led to this etherpad:


https://etherpad.openstack.org/p/nova-os-interfaces

It was really a discussion about the weird APIs that nova has and a lot 
of the time our first question is, "why does it return this, or that, or 
how is this consumed even?", at which point we put out the Monty signal.


During a seemingly unrelated forum session on integrating searchlight 
with nova-api, operators in the room were saying they wanted to see 
ports returned in the server response body, which I think Monty was also 
saying when we were going through that etherpad above.


This goes back to a common issue we/I have in nova which is we don't 
know who is using which APIs and how. The user survey isn't going to 
give us this data. Operators probably don't have this data, unless they 
are voicing it as API users themselves. But it would be really useful to 
know, which gaps are various tools in the ecosystem needing to overcome 
by making multiple API calls to possibly multiple services to get a 
clear picture to answer some question, and how can we fix that in a 
single place (maybe the compute API)? A backlog spec in nova could be a 
simple place to start, or just explaining the gaps in the mailing list 
(separate targeted thread of course).


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On Fri, May 19, 2017 at 4:01 PM, Matt Riedemann  wrote:
> I'm confused by this. Creating a server takes a volume ID if you're booting
> from volume, and that's actually preferred (by nova devs) since then Nova
> doesn't have to orchestrate the creation of the volume in the compute
> service and then poll until it's available.
>
> Same for ports - nova can create the port (default action) or get a port at
> server creation time, which is required if you're doing trunk ports or
> sr-iov / fancy pants ports.
>
> Am I misunderstanding what you're saying is missing?

It turns out those are bad examples, they do accept IDs.

The point though was there _are_ times when what you want is not what
the opinionated composed API gives you (as much as I _do_ like those).
It isn't so much making more REST calls, but a similar number of
different ones that are actually more efficient in the long run.

dt

-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 5/19/2017 9:36 AM, Zane Bitter wrote:


The problem is that orchestration done inside APIs is very easy to do 
badly in ways that cause lots of downstream pain for users and external 
orchestrators. For example, Nova already does some orchestration: it 
creates a Neutron port for a server if you don't specify one. (And then 
promptly forgets that it has done so.) There is literally an entire 
inner platform, an orchestrator within an orchestrator, inside Heat to 
try to manage the fallout from this. And the inner platform shares none 
of the elegance, such as it is, of Heat itself, but is rather a 
collection of cobbled-together hacks to deal with the seemingly infinite 
explosion of edge cases that we kept running into over a period of at 
least 5 releases.


I'm assuming you're talking about how nova used to (years ago) not keep 
track of which ports it created and which ones were provided when 
creating a server or attaching ports to an existing server. That was 
fixed quite awhile ago, so I assume anything in Heat at this point is no 
longer necessary and if it is, then it's a bug in nova. i.e. if you 
provide a port when creating a server, when you delete the server, nova 
should not delete the port. If nova creates the port and you delete the 
server, nova should then delete the port also.




The get-me-a-network thing is... better, but there's no provision for 
changes after the server is created, which means we have to copy-paste 
the Nova implementation into Heat to deal with update.[1] Which sounds 
like a maintenance nightmare in the making. That seems to be a common 
mistake: to assume that once users create something they'll never need 
to touch it again, except to delete it when they're done.


I'm not really sure what you're referring to here with 'update' and [1]. 
Can you expand on that? I know it's a bit of a tangent.




Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done 
superbly well, provide transparency for external orchestration tools 
that need to hook in to the data flow, and should be developed in 
consultation with potential consumers like Shade and Heat.


Agree, this is why we push back on baking in more orchestration into 
Nova, because we generally don't do it well, or don't test it well, and 
end up having half-baked things which are a constant source of pain, 
e.g. boot from volume - that might work fine when creating and deleting 
a server, but what happens when you try to migrate, resize, rebuild, 
evacuate or shelve that server?





Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?


(Aside: can we stop using the term 'PaaS' to refer to "everything that 
Nova doesn't do"? This habit is not helping us to communicate clearly.)


Sorry, as I said in response to sdague elsewhere in this thread, I tend 
to lump PaaS and orchestration / porcelain tools together, but that's 
not my intent in starting this thread. I was going to say we should have 
a glossary for terms in OpenStack, but we do, and both are listed. :)


https://docs.openstack.org/user-guide/common/glossary.html

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

2017-05-19 Thread Clark Boylan

On Fri, May 19, 2017, at 02:03 PM, Kevin Benton wrote:
> I split this conversation off of the "Is the pendulum swinging on PaaS
> layers?" thread [1] to discuss some improvements we can make to Neutron
> to
> make orchestration easier.
> 
> There are some pain points that heat has when working with the Neutron
> API.
> I would like to get them converted into requests for enhancements in
> Neutron so the wider community is aware of them.
> 
> Starting with the port/subnet/network relationship - it's important to
> understand that IP addresses are not required on a port.
> 
> >So knowing now that a Network is a layer-2 network segment and a Subnet
> is... effectively a glorified DHCP address pool
> 
> Yes, a Subnet controls IP address allocation as well as setting up
> routing
> for routers, which is why routers reference subnets instead of networks
> (different routers can route for different subnets on the same network).
> It
> essentially dictates things related to L3 addressing and provides
> information for L3 reachability.

One thing that is odd about this is when creating a router you specify
the gateway information using a network which is l2 not l3. Seems like
it would be more correct to use a subnet rather than a network there?

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [glance] Stepping Down

2017-05-19 Thread Hemanth Makkapati

Glancers,
Due to a significant change to my job description, I wouldn't be able to
contribute to Glance in the capacity of a core reviewer going forward.
Hence, I'd like to step down from my role immediately.
For the same reason, I'd like to step down from Glance coresec and release
liaison roles as well.

Thanks for all the help!

Rooting for Glance to do great things,
Hemanth Makkapati
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master




On 05/18/2017 06:13 PM, Adrian Turjak wrote:


So, specifically in the realm of Keystone, since we are using sqlalchemy
we already have Postgresql support, and since Cockroachdb does talk
Postgres it shouldn't be too hard to back Keystone with it. At that
stage you have a Keystone DB that could be multi-region, multi-master,
consistent, and mostly impervious to disaster. Is that not the holy
grail for a service like Keystone? Combine that with fernet tokens and
suddenly Keystone becomes a service you can't really kill, and can
mostly forget about.


So this is exhibit A for why I think keeping some level of "this might 
need to work on other databases" within a codebase is always a great 
idea even if you are not actively supporting other DBs at the moment. 
Even if Openstack dumped Postgresql completely, I'd not take the 
rudimental PG-related utilities out of oslo.db nor would I rename all 
the "mysql_XYZ" facilities to be "XYZ".


Cockroachdb advertises SQLAlchemy compatibility very prominently.  While 
their tutorial at 
https://www.cockroachlabs.com/docs/build-a-python-app-with-cockroachdb-sqlalchemy.html 
says it uses psycopg2 as the database driver, they have implemented 
their own "cockroachdb://" dialect on top of it, which likely smooths 
out the SQL dialect and connectivity quirks between real Postgresql and 
CockroachDB.


This is not the first "distributed database" to build on the Postgresql 
protocol, I did a bunch of work for a database that started out called 
"Akiban", then got merged to "FoundationDB", and then sadly was sucked 
into a black hole shaped like a huge Apple and the entire product and 
staff were gone forever.  CockroachDB seems to be filling in that same 
hole that I was hoping FoundationDB was going to do (until they fell 
into said hole).




I'm welcome to being called mad, but I am curious if anyone has looked
at this. I'm likely to do some tests at some stage regarding this,
because I'm hoping this is the solution I've been hoping to find for
quite a long time.


I'd have a blast if Keystone wanted to get into this.   Distributed / 
NewSQL is something I have a lot of optimism about.   Please keep me 
looped in.






Further reading:
https://www.cockroachlabs.com/
https://github.com/cockroachdb/cockroach
https://www.cockroachlabs.com/docs/build-a-python-app-with-cockroachdb-sqlalchemy.html

Cheers,
- Adrian Turjak


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

We could potentially alter Neutron to inspect each of the fixed IPs to see
if it's eligible to be associated to that floating IP to find the one that
works. Unfortunately that still won't solve multiple eligible fixed IPs. So
is the right thing to do to just always force the user to specify port+IP?

On Fri, May 19, 2017 at 2:13 PM, Monty Taylor  wrote:

> On 05/19/2017 04:03 PM, Kevin Benton wrote:
>
>> I split this conversation off of the "Is the pendulum swinging on PaaS
>> layers?" thread [1] to discuss some improvements we can make to Neutron
>> to make orchestration easier.
>>
>> There are some pain points that heat has when working with the Neutron
>> API. I would like to get them converted into requests for enhancements
>> in Neutron so the wider community is aware of them.
>>
>> Starting with the port/subnet/network relationship - it's important to
>> understand that IP addresses are not required on a port.
>>
>> So knowing now that a Network is a layer-2 network segment and a Subnet
>>>
>> is... effectively a glorified DHCP address pool
>>
>> Yes, a Subnet controls IP address allocation as well as setting up
>> routing for routers, which is why routers reference subnets instead of
>> networks (different routers can route for different subnets on the same
>> network). It essentially dictates things related to L3 addressing and
>> provides information for L3 reachability.
>>
>> But at the end of the day, I still can't create a Port until a Subnet
>>> exists
>>>
>>
>> This is only true if you want an IP address on the port. This sounds
>> silly for most use cases, but there are a non-trivial portion of NFV
>> workloads that do not want IP addresses at all so they create a network
>> and just attach ports without creating any subnets.
>>
>> I still don't know what Subnet a Port will be attached to (unless the
>>>
>> user specifies it explicitly using the --fixed-ip option... regardless
>> of whether they actually specify a fixed IP),
>>
>> So what would you like Neutron to do differently here? Always force a
>> user to pick which subnet they want an allocation from if there are
>> multiple? If so, can't you just force that explicitness in Heat?
>>
>> and I have no way in general of telling which Subnets can be deleted
>>> before a given Port is and which will fail to delete until the Port
>>> disappears.
>>>
>>
>> A given port will only block subnet deletions from subnets it is
>> attached to. Conversely, you can see all ports with allocations from a
>> subnet with 'neutron port-list --fixed-ips subnet_id='.  So
>> is the issue here that the dependency wasn't made explicit in the heat
>> modeling (leading to the problem above and this one)?
>>
>>
>> For the individual bugs you highlighted, it would be good if you can
>> provide some details about what changes we could make to help.
>>
>>
>> https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result
>> of partially specified floating IPs (no fixed_ip). What can we
>> add/change here to help? Or can heat just always force the user to
>> specify a fixed IP for the case where disambiguation on multiple
>> fixed_ip ports is needed?
>>
>
> If the server has more than one fixed_ip ports, it's possible that only
> one of them will be able to receive a floating ip. The subnet a port comes
> from must have gateway_ip set for a floating_ip to attach to it. So if you
> have a server, you can poke and find the right fixed_ip in all cases except
> when the server has more than one fixed_ip and each of them are from a
> subnet with a gateway_ip. In that case, a user _must_ provide a fixed_ip,
> because there is no way to know what they intend.
>
> https://launchpad.net/bugs/1626607 - I see this is about a dependency
>> between RouterGateways and RouterInterfaces, but it's not clear to me
>> why that dependency exists. Is it to solve a lack of visibility into the
>> interfaces required for a floating IP?
>>
>> https://bugs.launchpad.net/heat/+bug/1626619,
>> https://bugs.launchpad.net/heat/+bug/1626630, and
>> https://bugs.launchpad.net/heat/+bug/1626634 - These seems similar to
>> 1626607. Can we just expose the interfaces/router a floating IP is
>> depending on explicitly in the API for you to fix these? If not, what
>> can we do to help here?
>>
>>
>> 1. http://lists.openstack.org/pipermail/openstack-dev/2017-May/
>> 117106.html
>>
>> Cheers,
>> Kevin Benton
>>
>> On Fri, May 19, 2017 at 1:05 PM, Zane Bitter > > wrote:
>>
>> On 19/05/17 15:06, Kevin Benton wrote:
>>
>> Don't even get me started on Neutron.[2]
>>
>>
>> It seems to me the conclusion to that thread was that the
>> majority of
>> your issues stemmed from the fact that we had poor documentation
>> at the
>> time.  A major component of the complaints resulted from you
>> misunderstanding the difference between networks/subnets in
>> Neutron.
>>
>>
>>

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 5/19/2017 3:03 PM, Monty Taylor wrote:

On 05/19/2017 01:04 PM, Sean Dague wrote:

On 05/19/2017 01:38 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
 wrote:

..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova 
boot
request should only take neutron port ids and cinder volume ids.  
The actual

setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this 
subset of

simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands".  I think 
there's
an argument to be made that it would be better to be consistent even 
for the

simple things.


cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.


It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.


Yup. Agree. Making forward progress on that is paramount.


I consider those things duct tape, leading use to the eventually
consistent place where we actually do that work internally. Because,
having seen with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system, that's
going to need to be highly available, and replicate a bunch of your data
to be performent, and then have inconsistency issues, because a user
deployed API proxy can't have access to the notification bus, and... 
boom.


You can actually get fairly far (with a few notable exceptions - I'm 
looking at you unattached floating ips) without state tracking. It comes 
at the cost of more API spidering after a failure/restart. Being able to 
cache stuff aggressively combined with batching/rate-limiting of 
requests to the cloud API allows one to do most of this to a fairly 
massive scale statelessly. However, caching, batching and rate-limiting 
are all pretty much required else you wind up crashing public clouds. :)


I agree that the things are currently duct tape, but I don't think that 
has to be a bad thing. The duct tape is currently needed client-side no 
matter what we do, and will be for some time no matter what we do 
because of older clouds. What's often missing is closing the loop so 
that we can, as OpenStack, eventually provide out of the box the consume 
experience that people currently get from using one of the client-side 
duct tapes. That part is harder, but it's definitely important.



You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10


Right. This is why communication is essential. I'm optimistic we can do 
well on this topic, because we are MUCH better are talking to each other 
now than we were back when ceilometer was started.


Also, a REST-consuming porcelain like oaktree gets to draw on real-world 
experience consuming OpenStack's REST APIs at scale. So it's also not 
the same problem setup, since it's not a from-scratch new thing.


This is, incidentally, why experience with caching and batching is 
important. There is a reason why we do GET /servers/detail once every 5 
seconds rather than doing a specific GET /server/{id}/detail calls for 
each booting VM.


Look at what we could learn just from that... Users using shade are 
doing a full detailed server list because it scales better for 
concurrency. It's obviously more expensive on a single-call basis. BUT - 
maybe it's useful information that doing optimization work on GET 
/servers/detail could be beneficial.


This reminds me that I suspect we're lazy-loading server detail 
information in certain cases, i.e. going back to the DB to do a join 
per-instance after we've already pulled all instances in an initial set 
(with some initial joins). I need to pull this thread again...

Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration


On 05/19/2017 04:03 PM, Kevin Benton wrote:

I split this conversation off of the "Is the pendulum swinging on PaaS
layers?" thread [1] to discuss some improvements we can make to Neutron
to make orchestration easier.

There are some pain points that heat has when working with the Neutron
API. I would like to get them converted into requests for enhancements
in Neutron so the wider community is aware of them.

Starting with the port/subnet/network relationship - it's important to
understand that IP addresses are not required on a port.


So knowing now that a Network is a layer-2 network segment and a Subnet

is... effectively a glorified DHCP address pool

Yes, a Subnet controls IP address allocation as well as setting up
routing for routers, which is why routers reference subnets instead of
networks (different routers can route for different subnets on the same
network). It essentially dictates things related to L3 addressing and
provides information for L3 reachability.


But at the end of the day, I still can't create a Port until a Subnet exists


This is only true if you want an IP address on the port. This sounds
silly for most use cases, but there are a non-trivial portion of NFV
workloads that do not want IP addresses at all so they create a network
and just attach ports without creating any subnets.


I still don't know what Subnet a Port will be attached to (unless the

user specifies it explicitly using the --fixed-ip option... regardless
of whether they actually specify a fixed IP),

So what would you like Neutron to do differently here? Always force a
user to pick which subnet they want an allocation from if there are
multiple? If so, can't you just force that explicitness in Heat?


and I have no way in general of telling which Subnets can be deleted before a 
given Port is and which will fail to delete until the Port disappears.


A given port will only block subnet deletions from subnets it is
attached to. Conversely, you can see all ports with allocations from a
subnet with 'neutron port-list --fixed-ips subnet_id='.  So
is the issue here that the dependency wasn't made explicit in the heat
modeling (leading to the problem above and this one)?


For the individual bugs you highlighted, it would be good if you can
provide some details about what changes we could make to help.


https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result
of partially specified floating IPs (no fixed_ip). What can we
add/change here to help? Or can heat just always force the user to
specify a fixed IP for the case where disambiguation on multiple
fixed_ip ports is needed?


If the server has more than one fixed_ip ports, it's possible that only 
one of them will be able to receive a floating ip. The subnet a port 
comes from must have gateway_ip set for a floating_ip to attach to it. 
So if you have a server, you can poke and find the right fixed_ip in all 
cases except when the server has more than one fixed_ip and each of them 
are from a subnet with a gateway_ip. In that case, a user _must_ provide 
a fixed_ip, because there is no way to know what they intend.



https://launchpad.net/bugs/1626607 - I see this is about a dependency
between RouterGateways and RouterInterfaces, but it's not clear to me
why that dependency exists. Is it to solve a lack of visibility into the
interfaces required for a floating IP?

https://bugs.launchpad.net/heat/+bug/1626619,
https://bugs.launchpad.net/heat/+bug/1626630, and
https://bugs.launchpad.net/heat/+bug/1626634 - These seems similar to
1626607. Can we just expose the interfaces/router a floating IP is
depending on explicitly in the API for you to fix these? If not, what
can we do to help here?


1. http://lists.openstack.org/pipermail/openstack-dev/2017-May/117106.html

Cheers,
Kevin Benton

On Fri, May 19, 2017 at 1:05 PM, Zane Bitter > wrote:

On 19/05/17 15:06, Kevin Benton wrote:

Don't even get me started on Neutron.[2]


It seems to me the conclusion to that thread was that the
majority of
your issues stemmed from the fact that we had poor documentation
at the
time.  A major component of the complaints resulted from you
misunderstanding the difference between networks/subnets in Neutron.


It's true that I was completely off base as to what the various
primitives in Neutron actually do. (Thanks for educating me!) The
implications for orchestration are largely unchanged though. It's a
giant pain that we have to infer implicit dependencies between stuff
to get them to create/delete in the right order, pretty much
independently of what that stuff does.

So knowing now that a Network is a layer-2 network segment and a
Subnet is... effectively a glorified DHCP address pool, I understand
better why it probably seemed like a good idea to hook stuff up
magically. But at the end of the day, I still can't

Re: [openstack-dev] [oslo] can we make everyone drop eventlet? (was: Can we stop global requirements update?)


FTFY



On 05/19/2017 03:58 PM, Joshua Harlow wrote:

Mehdi Abaakouk wrote:

Not really, I just put some comments on reviews and discus this on IRC.
Since nobody except Telemetry have expressed/try to get rid of eventlet.


Octavia is using cotyledon and they have gotten rid of eventlet. Didn't 
seem like it was that hard either to do it (of course the experience in 
how easy it was is likely not transferable to other projects...)


-Josh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [oslo] Can we stop global requirements update?




On 05/19/2017 04:23 AM, Mehdi Abaakouk wrote:



And some applications rely

on implicit internal contract/behavior/assumption.


IMO that's a bug for them.I'm inspired to see that Keystone, Nova 
etc. are able to move between and eventlet backend and a mod_wsgi 
backend.IMO eventlet is really not needed for those services that 
present a REST interface.   Although for a message queue with lots of 
long-running connections that receive events, that's a place where I 
*would* want to use a polling / non-blocking model.  But I'd use it 
explicitly, not with monkeypatching.




Since a new API is needed, why not writing a new lib. Anyways when you
get rid of eventlet you have so many thing to change to ensure your
performance will not drop. 


While I don't know the specifics for your project(s), I don't buy that 
in general because IMO eventlet is not giving us any performance boost 
in the majority of cases.   most of our IO is blocking on the database 
and all the applications have DB connections throttled at about 50 per 
process at the max, and that's only recently, it used to be just 15.




Changing from oslo.service to cotyledon is

really easy on the side.


I'd ask why not oslo.cotyledon but it seems there's a faction here that 
is overall moving out of the Openstack umbrella in any case.





Docs state: "oslo.service being impossible to fix and bringing an 
heavy dependency on eventlet, "  is there a discussion thread on that?


Not really, I just put some comments on reviews and discus this on IRC.
Since nobody except Telemetry have expressed/try to get rid of eventlet.


Many (most?) of the web services can run under mod_wsgi with threads, 
Keystone seems to be standard on this now and I get the impression Nova 
is going in that direction too.(anyone correct me if I'm wrong on 
any of that, I looked to ask around on IRC but it's too late in the day).






For the story we first get rid of eventlet in Telemetry, fixes couple of
performance issue due to using threading/process instead
greenlet/greenthread.

Then we fall into some weird issue due to oslo.service internal
implementation. Process not exiting properly, signals not received,
deadlock when signal are received, unkillable process,
tooz/oslo.messaging heartbeat not scheduled correctly, worker not
restarted when they are dead. All of what we expect from oslo.service
was not working correctly anymore because we remove the line
'eventlet.monkeypatch()'.


So, I've used gevent more than eventlet in my own upstream non-blocking 
work, and while this opinion is like spilling water in the ocean, I 
think applications should never use monkeypatching.   They should call 
into the eventlet/gevent greenlet API directly if that's what they want 
to do.


Of course this means that database logic has to move out of greenlets 
entirely since none of the DBAPIs use non-blocking IO.  That's fine. 
Database-oriented business logic should not be in greenlets.I've 
written about this as well.If one is familiar enough with greenlets 
and threads you can write an application that makes explicit use of 
both.   However, that's application level stuff.   Web service apps like 
Nova conductor  / Neutron / Keystone should not be aware of any of that. 
  They should be coded to assume nothing about context switching.  IMO 
the threading model is "safer" to code towards since you have to handle 
locking and concurrency contingencies explicitly without hardwiring that 
to your assumptions about when context switching is to take place and 
when it's not.





For example, when oslo.service receive a signal, it can arrive on any
thread, this thread is paused, the callback is run in this thread
context, but if the callback try to discus to your code in this thread,
the process lockup, because your code is paused. Python
offers tool to avoid that (signal.set_wakeup_fd), but oslo.service don't
use it. I have tried to run callbacks only on the main thread with
set_wakeup_fd, to avoid this kind of issue but I fail. The whole
oslo.service code is clearly not designed to be threadsafe/signalsafe.
Well, it works for eventlet because you have only one real thread.

And this is just one example on complicated thing I have tried to fix,
before starting cotyledon.


I've no doubt oslo.service has major eventlet problems baked in, I've 
looked at it a little bit but didn't go too far with it.   That still 
doesn't mean there shouldn't be an "oslo.service2" that can effectively 
produce a concurrency-agnostic platform.It of course would have the 
goal in mind of moving projects off eventlet since as I mentioned, 
eventlet monkeypatching should not be used which means our services 
should do most of their "implicitly concurrent" work within threads.


Basically I think openstack should be getting off eventlet in a big way 
so I guess my sentiment here is that the Gnocchi / Cotyledon /etc. 
faction is just splitting off rather than serving as any kind of

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 5/19/2017 1:04 PM, Sean Dague wrote:

Anyway, this gets pretty meta pretty fast. I agree with Zane saying "I
want my server to build", or "I'd like Nova to build a volume for me"
are very odd things to call PaaS. I think of PaaS as "here is a ruby on
rails app, provision me a db for it, and make it go". Heroku style.


Yeah as soon as I sent the original email I realized that I was munging 
PaaS and orchestration services/libraries and probably shouldn't have, 
that wasn't my intent. I just tend to lump them together in my head. My 
point was trying to see if I'm missing a change in people's opinions 
about really wanting Nova to be doing more orchestration than we expect.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

Started a new Neutron-specific thread so we can get some stuff turned into
improvements in Neutron from this:
http://lists.openstack.org/pipermail/openstack-dev/2017-May/117112.html

On Fri, May 19, 2017 at 1:05 PM, Zane Bitter  wrote:

> On 19/05/17 15:06, Kevin Benton wrote:
>
>> Don't even get me started on Neutron.[2]
>>>
>>
>> It seems to me the conclusion to that thread was that the majority of
>> your issues stemmed from the fact that we had poor documentation at the
>> time.  A major component of the complaints resulted from you
>> misunderstanding the difference between networks/subnets in Neutron.
>>
>
> It's true that I was completely off base as to what the various primitives
> in Neutron actually do. (Thanks for educating me!) The implications for
> orchestration are largely unchanged though. It's a giant pain that we have
> to infer implicit dependencies between stuff to get them to create/delete
> in the right order, pretty much independently of what that stuff does.
>
> So knowing now that a Network is a layer-2 network segment and a Subnet
> is... effectively a glorified DHCP address pool, I understand better why it
> probably seemed like a good idea to hook stuff up magically. But at the end
> of the day, I still can't create a Port until a Subnet exists, I still
> don't know what Subnet a Port will be attached to (unless the user
> specifies it explicitly using the --fixed-ip option... regardless of
> whether they actually specify a fixed IP), and I have no way in general of
> telling which Subnets can be deleted before a given Port is and which will
> fail to delete until the Port disappears.
>
> There are some legitimate issues in there about the extra routes
>> extension being replace-only and the routers API not accepting a list of
>> interfaces in POST.  However, it hardly seems that those are worthy of
>> "Don't even get me started on Neutron."
>>
>
> https://launchpad.net/bugs/1626607
> https://launchpad.net/bugs/1442121
> https://launchpad.net/bugs/1626619
> https://launchpad.net/bugs/1626630
> https://launchpad.net/bugs/1626634
>
> It would be nice if you could write up something about current gaps that
>> would make Heat's life easier, because a large chunk of that initial
>> email is incorrect and linking to it as a big list of "issues" is
>> counter-productive.
>>
>
> Yes, agreed. I wish I had a clean thread to link to. It's a huge amount of
> work to research it all though.
>
> cheers,
> Zane.
>
> On Fri, May 19, 2017 at 7:36 AM, Zane Bitter > > wrote:
>>
>> On 18/05/17 20:19, Matt Riedemann wrote:
>>
>> I just wanted to blurt this out since it hit me a few times at the
>> summit, and see if I'm misreading the rooms.
>>
>> For the last few years, Nova has pushed back on adding
>> orchestration to
>> the compute API, and even define a policy for it since it comes
>> up so
>> much [1]. The stance is that the compute API should expose
>> capabilities
>> that a higher-level orchestration service can stitch together
>> for a more
>> fluid end user experience.
>>
>>
>> I think this is a wise policy.
>>
>> One simple example that comes up time and again is allowing a
>> user to
>> pass volume type to the compute API when booting from volume
>> such that
>> when nova creates the backing volume in Cinder, it passes
>> through the
>> volume type. If you need a non-default volume type for boot from
>> volume,
>> the way you do this today is first create the volume with said
>> type in
>> Cinder and then provide that volume to the compute API when
>> creating the
>> server. However, people claim that is bad UX or hard for users to
>> understand, something like that (at least from a command line, I
>> assume
>> Horizon hides this, and basic users should probably be using
>> Horizon
>> anyway right?).
>>
>>
>> As always, there's a trade-off between simplicity and flexibility. I
>> can certainly understand the logic in wanting to make the simple
>> stuff simple. But users also need to be able to progress from simple
>> stuff to more complex stuff without having to give up and start
>> over. There's a danger of leading them down the garden path.
>>
>> While talking about claims in the scheduler and a top-level
>> conductor
>> for cells v2 deployments, we've talked about the desire to
>> eliminate
>> "up-calls" from the compute service to the top-level controller
>> services
>> (nova-api, nova-conductor and nova-scheduler). Build retries is
>> one such
>> up-call. CERN disables build retries, but others rely on them,
>> because
>> of how racy claims in the computes are (that's another story and
>>

[openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

I split this conversation off of the "Is the pendulum swinging on PaaS
layers?" thread [1] to discuss some improvements we can make to Neutron to
make orchestration easier.

There are some pain points that heat has when working with the Neutron API.
I would like to get them converted into requests for enhancements in
Neutron so the wider community is aware of them.

Starting with the port/subnet/network relationship - it's important to
understand that IP addresses are not required on a port.

>So knowing now that a Network is a layer-2 network segment and a Subnet
is... effectively a glorified DHCP address pool

Yes, a Subnet controls IP address allocation as well as setting up routing
for routers, which is why routers reference subnets instead of networks
(different routers can route for different subnets on the same network). It
essentially dictates things related to L3 addressing and provides
information for L3 reachability.

>But at the end of the day, I still can't create a Port until a Subnet
exists

This is only true if you want an IP address on the port. This sounds silly
for most use cases, but there are a non-trivial portion of NFV workloads
that do not want IP addresses at all so they create a network and just
attach ports without creating any subnets.

>I still don't know what Subnet a Port will be attached to (unless the user
specifies it explicitly using the --fixed-ip option... regardless of
whether they actually specify a fixed IP),

So what would you like Neutron to do differently here? Always force a user
to pick which subnet they want an allocation from if there are multiple? If
so, can't you just force that explicitness in Heat?

> and I have no way in general of telling which Subnets can be deleted
before a given Port is and which will fail to delete until the Port
disappears.

A given port will only block subnet deletions from subnets it is attached
to. Conversely, you can see all ports with allocations from a subnet with
'neutron port-list --fixed-ips subnet_id='.  So is the issue
here that the dependency wasn't made explicit in the heat modeling (leading
to the problem above and this one)?

For the individual bugs you highlighted, it would be good if you can
provide some details about what changes we could make to help.

https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result of
partially specified floating IPs (no fixed_ip). What can we add/change here
to help? Or can heat just always force the user to specify a fixed IP for
the case where disambiguation on multiple fixed_ip ports is needed?

https://launchpad.net/bugs/1626607 - I see this is about a dependency
between RouterGateways and RouterInterfaces, but it's not clear to me why
that dependency exists. Is it to solve a lack of visibility into the
interfaces required for a floating IP?

https://bugs.launchpad.net/heat/+bug/1626619,
https://bugs.launchpad.net/heat/+bug/1626630, and
https://bugs.launchpad.net/heat/+bug/1626634 - These seems similar to
1626607. Can we just expose the interfaces/router a floating IP is
depending on explicitly in the API for you to fix these? If not, what can
we do to help here?

1. http://lists.openstack.org/pipermail/openstack-dev/2017-May/117106.html

Cheers,
Kevin Benton

On Fri, May 19, 2017 at 1:05 PM, Zane Bitter  wrote:

> On 19/05/17 15:06, Kevin Benton wrote:
>
>> Don't even get me started on Neutron.[2]
>>>
>>
>> It seems to me the conclusion to that thread was that the majority of
>> your issues stemmed from the fact that we had poor documentation at the
>> time.  A major component of the complaints resulted from you
>> misunderstanding the difference between networks/subnets in Neutron.
>>
>
> It's true that I was completely off base as to what the various primitives
> in Neutron actually do. (Thanks for educating me!) The implications for
> orchestration are largely unchanged though. It's a giant pain that we have
> to infer implicit dependencies between stuff to get them to create/delete
> in the right order, pretty much independently of what that stuff does.
>
> So knowing now that a Network is a layer-2 network segment and a Subnet
> is... effectively a glorified DHCP address pool, I understand better why it
> probably seemed like a good idea to hook stuff up magically. But at the end
> of the day, I still can't create a Port until a Subnet exists, I still
> don't know what Subnet a Port will be attached to (unless the user
> specifies it explicitly using the --fixed-ip option... regardless of
> whether they actually specify a fixed IP), and I have no way in general of
> telling which Subnets can be deleted before a given Port is and which will
> fail to delete until the Port disappears.
>
> There are some legitimate issues in there about the extra routes
>> extension being replace-only and the routers API not accepting a list of
>> interfaces in POST.  However, it hardly seems that those are worthy of
>> "Don't even get me started on

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 5/19/2017 12:38 PM, Dean Troyer wrote:

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available.  Just picking "POST
/server" as an example, we do not have that today.  Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes.  I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.


I'm confused by this. Creating a server takes a volume ID if you're 
booting from volume, and that's actually preferred (by nova devs) since 
then Nova doesn't have to orchestrate the creation of the volume in the 
compute service and then poll until it's available.


Same for ports - nova can create the port (default action) or get a port 
at server creation time, which is required if you're doing trunk ports 
or sr-iov / fancy pants ports.


Am I misunderstanding what you're saying is missing?

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

2017-05-19 Thread Lance Bragstad

Project: Keystone
Attendees: 12 - 15

We conflicted with one of the Baremetal/VM sessions

I attempted to document most of the session in my recap [0].

We started out by doing a round-the-room of introductions so that folks
could put IRC nicks to faces (we also didn't have a packed room so this
went pretty quick). After that we cruised through a summary of keystone,
the format of the projects, and the various processes we use. All of this
took *maybe* 30 minutes.

>From there we had an open discussion and things evolved organically. We
ended up going through:

   - the differences between the v2.0 and v3 APIs
   - keystonemiddleware architecture, how it aids services, and how it
   interacts with keystone
  - we essentially followed an API call for creating a instance from
  keystone -> nova -> glance
   - how authentication scoping works and why it works that way
   - how federation works and why it's setup the way it is
   - how federated authentication works (https://goo.gl/NfY3mr)

All of this was pretty well-received and generated a lot of productive
discussion. We also had several seasoned keystone contributors in the room,
which helped a lot. Most of the attendees were all curious about similar
topics, which was great, but we totally could have split into separate
groups given the experience we had in the room (we'll save that in our back
pocket for next time).

[0] https://www.lbragstad.com/blog/openstack-boston-summit-recap
[1] https://www.slideshare.net/LanceBragstad/keystone-project-onboarding

On Fri, May 19, 2017 at 10:37 AM, Michał Jastrzębski 
wrote:

> Kolla:
> Attendees - full room (20-30?)
> Notes - Conflict with kolla-k8s demo probably didn't help
>
> While we didn't have etherpad, slides, recording (and video dongle
> that could fit my laptop), we had great session with analog tools
> (whiteboard and my voice chords). We walked through architecture of
> each Kolla project, how they relate to each other and so on.
>
> Couple things to take out from our onboarding:
> 1. Bring dongles
> 2. We could've used bigger room - people were leaving because we had
> no chairs left
> 3. Recording would be awesome
> 4. Low tech is not a bad tech
>
> All and all, when we started session I didn't know what to expect or
> what people will expect so we just...rolled with it, and people seemed
> to be happy with it:) I think onboarding rooms were great idea (kudos
> to whoever came up with it)! I'll be happy to run it again in Sydney.
>
> Cheers,
> Michal
>
>
> On 19 May 2017 at 08:12, Julien Danjou  wrote:
> > On Fri, May 19 2017, Sean Dague wrote:
> >
> >> If you ran a room, please post the project, what you did in the room,
> >> what you think worked, what you would have done differently. If you
> >> attended a room you didn't run, please provide feedback about which one
> >> it was, and what you thought worked / didn't work from the other side of
> >> the table.
> >
> > We shared a room for Telemetry and CloudKitty for 90 minutes.
> > I was there with Gordon Chung for Telemetry.
> > Christophe Sauthier was there for CloudKitty.
> >
> > We only had 3 people showing up in the session. One wanted to read his
> > emails in a quiet room, the two others had a couple of question on
> > Telemetry – though it was not really related to contribution as far as I
> > can recall.
> >
> > I had to leave after 45 minutes because they was an overlap with a talk
> > I was doing and rescheduling did not seem possible. And everybody left a
> > few minutes after I left apparently.
> >
> > --
> > Julien Danjou
> > -- Free Software hacker
> > -- https://julien.danjou.info
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 05/19/2017 03:05 PM, Zane Bitter wrote:

On 19/05/17 15:06, Kevin Benton wrote:

Don't even get me started on Neutron.[2]


It seems to me the conclusion to that thread was that the majority of
your issues stemmed from the fact that we had poor documentation at the
time.  A major component of the complaints resulted from you
misunderstanding the difference between networks/subnets in Neutron.


It's true that I was completely off base as to what the various
primitives in Neutron actually do. (Thanks for educating me!) The
implications for orchestration are largely unchanged though. It's a
giant pain that we have to infer implicit dependencies between stuff to
get them to create/delete in the right order, pretty much independently
of what that stuff does.

So knowing now that a Network is a layer-2 network segment and a Subnet
is... effectively a glorified DHCP address pool, I understand better why
it probably seemed like a good idea to hook stuff up magically. But at
the end of the day, I still can't create a Port until a Subnet exists, I
still don't know what Subnet a Port will be attached to (unless the user
specifies it explicitly using the --fixed-ip option... regardless of
whether they actually specify a fixed IP), and I have no way in general
of telling which Subnets can be deleted before a given Port is and which
will fail to delete until the Port disappears.


There are some legitimate issues in there about the extra routes
extension being replace-only and the routers API not accepting a list of
interfaces in POST.  However, it hardly seems that those are worthy of
"Don't even get me started on Neutron."


https://launchpad.net/bugs/1626607
https://launchpad.net/bugs/1442121
https://launchpad.net/bugs/1626619
https://launchpad.net/bugs/1626630
https://launchpad.net/bugs/1626634


It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial
email is incorrect and linking to it as a big list of "issues" is
counter-productive.


I used to have angst at the Neutron API but have come to like it more 
and more over time.


I think the main thing I run in to is that Neutron's API is modelling a 
a pile of data to allow for power users to do very flexible things. What 
it's missing most of the time is an easy button.


I'll give some examples:

My favorite for-instance, which I mentioned in a different thread this 
week and have mentioned in almost every talk I've given over the last 3 
years - is that there is no way to find out if a given network can 
provide connectivity to a resource from outside of the cloud.


There are _many_ reasons why it's hard to fully express a completely 
accurate answer to this problem. "What does external mean" "what if 
there are multiple external networks" etc. Those are all valid, and all 
speak to real workloads and real user scenarios ...


But there's also:

As a user I want to boot a VM on this cloud and have my users who are 
not necessarily on this cloud be able to connect a service I'm going to 
run on it. (aka, I want to run a wordpress)


and

As a user I want to boot a VM on this cloud and I do not want anyone who 
is not another resource on this cloud to be able to connect to anything 
it's running. ( aka, I want to run a mysql)


Unless you know things about the cloud already somehow not from the API, 
it is impossible to consistently perform those two tasks.


We've done a great job empowering the power users to do a bunch of 
really cool things. But we missed booting a wordpress as a basic use case.


Other things exist but aren't anyone's fault really. We still can't as a 
community agree on a consistent worldview related to fixed ips, neutron 
ports and floating ips. Neutron amazingly supports ALL of the use case 
combinations for those topics ... it just doesn't always do so in all of 
the clouds.


Heck - while I'm on floating ips ... if you have some pre-existing 
floating ips and you want to boot servers on them and you want to do 
that in parallel, you can't. You can boot a server with a floating ip 
that did not pre-exist if you get the port id of the fixed ip of the 
server then pass that id to the floating ip create call. Of course, the 
server doesn't return the port id in the server record, so at the very 
least you need to make a GET /ports.json?device_id={server_id} call. Of 
course what you REALLY need to find is the port_id of the ip of the 
server that came from a subnet that has 'gateway_ip' defined, which is 
even more fun since ips are associated with _networks_ on the server 
record and not with subnets.


Possibly to Zane's point, you basically have to recreate a multi-table 
data model client side and introspect relationships between objects to 
be able to figure out how to correctly get a floating ip on to a server. 
NOW - as opposed to the external network bit- it IS possible to do and 
to do correctly and have it work every time.


But if you want

Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master

2017-05-19 Thread Lance Bragstad

On Thu, May 18, 2017 at 6:43 PM, Curtis  wrote:

> On Thu, May 18, 2017 at 4:13 PM, Adrian Turjak 
> wrote:
> > Hello fellow OpenStackers,
> >
> > For the last while I've been looking at options for multi-region
> > multi-master Keystone, as well as multi-master for other services I've
> > been developing and one thing that always came up was there aren't many
> > truly good options for a true multi-master backend. Recently I've been
> > looking at Cockroachdb and while I haven't had the chance to do any
> > testing I'm curious if anyone else has looked into it. It sounds like
> > the perfect solution, and if it can be proved to be stable enough it
> > could solve a lot of problems.
> >
> > So, specifically in the realm of Keystone, since we are using sqlalchemy
> > we already have Postgresql support, and since Cockroachdb does talk
> > Postgres it shouldn't be too hard to back Keystone with it. At that
> > stage you have a Keystone DB that could be multi-region, multi-master,
> > consistent, and mostly impervious to disaster. Is that not the holy
> > grail for a service like Keystone? Combine that with fernet tokens and
> > suddenly Keystone becomes a service you can't really kill, and can
> > mostly forget about.
>

++


> >
> > I'm welcome to being called mad, but I am curious if anyone has looked
> > at this. I'm likely to do some tests at some stage regarding this,
> > because I'm hoping this is the solution I've been hoping to find for
> > quite a long time.
>
> I was going to take a look at this a bit myself, just try it out. I
> can't completely speak for the Fog/Edge/Massively Distributed working
> group in OpenStack, but I feel like this might be something they look
> into.
>
> For standard multi-site I don't know how much it would help, say if
> you only had a couple or three clouds, but more than that maybe this
> starts to make sense. Also running Galera has gotten easier but still
> not that easy.
>

When we originally tested a PoC fernet implementation, we did it globally
distributed across five data centers. We didn't generate enough non-token
load to notice significant service degradation due to replication lag or
issues. I have heard replication across regions in the double digits is
where you start getting into some real interesting problems (gyee was one
of the folks in keystone who knew more about that). Dusting off those cases
with something like cockroachdb would be an interesting exercise!


>
> I had thought that the OpenStack community was deprecating Postgres
> support though, so that could make things a bit harder here (I might
> be wrong about this).
>
> Thanks,
> Curtis.
>
> >
> > Further reading:
> > https://www.cockroachlabs.com/
> > https://github.com/cockroachdb/cockroach
> > https://www.cockroachlabs.com/docs/build-a-python-app-with-
> cockroachdb-sqlalchemy.html
> >
> > Cheers,
> > - Adrian Turjak
> >
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> Blog: serverascode.com
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 05/19/2017 01:53 PM, Sean Dague wrote:

On 05/19/2017 02:34 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper documentation,
performance implications, and the like), not an end game on their own.


I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).


Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.


You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".


Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.


Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read only data (without any caching) is
the kind of problem that rises from "just fix it in an upper layer". So
it does provide an experience at a cost.


We also searching of all resources by name-or-id in shade. But it's one 
call - GET /images - and then we test to see if the given value matches 
the name field or the id field. And there is caching, so the list call 
is done once in the session.


The thing I'm the saddest about is the Nova flavor "extra_info" that one 
needs to grab for backwards compat but almost never has anything useful 
in it. This causes me to make a billion API calls for the initial flavor 
list (which is then cached of course) It would be WAY nicer if there was 
a GET /flavors/detail that would just get me the whole lot in one go, fwiw.


Dean has a harder time than I do with that one because osc interactions 
are lots of process invocations from scratch. We chatted a bit about how 
to potentially share caching things in Boston, but not sure we've come 
up with more.



All for new and better experiences. I think that's great. Where I think
we want to be really careful is deciding the path to creating better
experiences is by not engaging with the services and just writing around
it. That feedback has to come back. Those reasons have to come back, and
we need to roll sensible improvements back into base services.

If you want to go fast, go alone, if you want to go far, go together.


Couldn't agree more . I think we're getting better at that communication.

We still have a hole, which is that the path from "this is a problem and 
here's how I'm working around it" to "there are devs tasked to work on 
solving that problem" is a hard one, because while the communication 
from those of us doing client-layer stuff with the folks doing the 
servers is pretty good - the communication loop with the folks at the 
companies who are prioritizing work ... not so much. Look at the number 
of people hacking on shade or python-openstackclient or writing 
user-facing docs compared to folks adding backend features to the services.


So - yes, I totally agree. But also, we can make and are making a lot of 
progress in some areas with tiny crews. That's gonna likely be the state 
of the world for a while until we're better able to point our fingers at 
the problem and characterize it such that our friends who provide 
resources value these problems enough to fund working on them.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Zane Bitter


On 19/05/17 15:06, Kevin Benton wrote:

Don't even get me started on Neutron.[2]


It seems to me the conclusion to that thread was that the majority of
your issues stemmed from the fact that we had poor documentation at the
time.  A major component of the complaints resulted from you
misunderstanding the difference between networks/subnets in Neutron.


It's true that I was completely off base as to what the various 
primitives in Neutron actually do. (Thanks for educating me!) The 
implications for orchestration are largely unchanged though. It's a 
giant pain that we have to infer implicit dependencies between stuff to 
get them to create/delete in the right order, pretty much independently 
of what that stuff does.


So knowing now that a Network is a layer-2 network segment and a Subnet 
is... effectively a glorified DHCP address pool, I understand better why 
it probably seemed like a good idea to hook stuff up magically. But at 
the end of the day, I still can't create a Port until a Subnet exists, I 
still don't know what Subnet a Port will be attached to (unless the user 
specifies it explicitly using the --fixed-ip option... regardless of 
whether they actually specify a fixed IP), and I have no way in general 
of telling which Subnets can be deleted before a given Port is and which 
will fail to delete until the Port disappears.



There are some legitimate issues in there about the extra routes
extension being replace-only and the routers API not accepting a list of
interfaces in POST.  However, it hardly seems that those are worthy of
"Don't even get me started on Neutron."


https://launchpad.net/bugs/1626607
https://launchpad.net/bugs/1442121
https://launchpad.net/bugs/1626619
https://launchpad.net/bugs/1626630
https://launchpad.net/bugs/1626634


It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial
email is incorrect and linking to it as a big list of "issues" is
counter-productive.


Yes, agreed. I wish I had a clean thread to link to. It's a huge amount 
of work to research it all though.


cheers,
Zane.


On Fri, May 19, 2017 at 7:36 AM, Zane Bitter > wrote:

On 18/05/17 20:19, Matt Riedemann wrote:

I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding
orchestration to
the compute API, and even define a policy for it since it comes
up so
much [1]. The stance is that the compute API should expose
capabilities
that a higher-level orchestration service can stitch together
for a more
fluid end user experience.


I think this is a wise policy.

One simple example that comes up time and again is allowing a
user to
pass volume type to the compute API when booting from volume
such that
when nova creates the backing volume in Cinder, it passes
through the
volume type. If you need a non-default volume type for boot from
volume,
the way you do this today is first create the volume with said
type in
Cinder and then provide that volume to the compute API when
creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I
assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).


As always, there's a trade-off between simplicity and flexibility. I
can certainly understand the logic in wanting to make the simple
stuff simple. But users also need to be able to progress from simple
stuff to more complex stuff without having to give up and start
over. There's a danger of leading them down the garden path.

While talking about claims in the scheduler and a top-level
conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller
services
(nova-api, nova-conductor and nova-scheduler). Build retries is
one such
up-call. CERN disables build retries, but others rely on them,
because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked,
"why
not just do away with build retries in nova altogether? If the
scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."


(FWIW Heat does this for you already.)

But during several different Forum sessions, like user API
improvements
[2] but also the cells v2 and claims in the scheduler

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On 05/19/2017 01:04 PM, Sean Dague wrote:

On 05/19/2017 01:38 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
 wrote:

..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids.  The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of
simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands".  I think there's
an argument to be made that it would be better to be consistent even for the
simple things.


cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.


It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.


Yup. Agree. Making forward progress on that is paramount.


I consider those things duct tape, leading use to the eventually
consistent place where we actually do that work internally. Because,
having seen with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system, that's
going to need to be highly available, and replicate a bunch of your data
to be performent, and then have inconsistency issues, because a user
deployed API proxy can't have access to the notification bus, and... boom.


You can actually get fairly far (with a few notable exceptions - I'm 
looking at you unattached floating ips) without state tracking. It comes 
at the cost of more API spidering after a failure/restart. Being able to 
cache stuff aggressively combined with batching/rate-limiting of 
requests to the cloud API allows one to do most of this to a fairly 
massive scale statelessly. However, caching, batching and rate-limiting 
are all pretty much required else you wind up crashing public clouds. :)


I agree that the things are currently duct tape, but I don't think that 
has to be a bad thing. The duct tape is currently needed client-side no 
matter what we do, and will be for some time no matter what we do 
because of older clouds. What's often missing is closing the loop so 
that we can, as OpenStack, eventually provide out of the box the consume 
experience that people currently get from using one of the client-side 
duct tapes. That part is harder, but it's definitely important.



You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10


Right. This is why communication is essential. I'm optimistic we can do 
well on this topic, because we are MUCH better are talking to each other 
now than we were back when ceilometer was started.


Also, a REST-consuming porcelain like oaktree gets to draw on real-world 
experience consuming OpenStack's REST APIs at scale. So it's also not 
the same problem setup, since it's not a from-scratch new thing.


This is, incidentally, why experience with caching and batching is 
important. There is a reason why we do GET /servers/detail once every 5 
seconds rather than doing a specific GET /server/{id}/detail calls for 
each booting VM.


Look at what we could learn just from that... Users using shade are 
doing a full detailed server list because it scales better for 
concurrency. It's obviously more expensive on a single-call basis. BUT - 
maybe it's useful information that doing optimization work on GET 
/servers/detail could be beneficial.



(http://superuser.openstack.org/articles/cern-cloud-architecture-update/
last graph). That was an anti pattern. We should have gotten to the
bottom of the mismatches and communication issues early on, because the
end state we all inflicted on users to get a totally reasonable set of
features, was not good. Please lets not do this again.


++


These should

Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-19 Thread Joshua Harlow


Mehdi Abaakouk wrote:

Not really, I just put some comments on reviews and discus this on IRC.
Since nobody except Telemetry have expressed/try to get rid of eventlet.


Octavia is using cotyledon and they have gotten rid of eventlet. Didn't 
seem like it was that hard either to do it (of course the experience in 
how easy it was is likely not transferable to other projects...)


-Josh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [glance][openstack-ansible] Moving on

2017-05-19 Thread Jesse Pretorius

This is most unfortunate. :(

I do wish you the absolute best with whatever lies ahead. I have found your 
skills, patience and willingness to debate and learn most inspirational.

From: Steve Lewis 
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 

Date: Friday, May 19, 2017 at 4:55 AM
To: OpenStack 
Subject: [openstack-dev] [glance][openstack-ansible] Moving on

It is clear to me now that I won't be able to work on OpenStack as a part of my 
next day job, wherever that ends up being. As such, I’ll no longer be able to 
invest the time and energy required to maintain my involvement in the 
community. It's time to resign my role as a core reviewer, effective 
immediately.
Thanks for all the fish.
--
SteveL

Rackspace Limited is a company registered in England & Wales (company 
registered number 03897010) whose registered office is at 5 Millington Road, 
Hyde Park Hayes, Middlesex UB3 4AZ. Rackspace Limited privacy policy can be 
viewed at www.rackspace.co.uk/legal/privacy-policy - This e-mail message may 
contain confidential or privileged information intended for the recipient. Any 
dissemination, distribution or copying of the enclosed material is prohibited. 
If you receive this transmission in error, please notify us immediately by 
e-mail at ab...@rackspace.com and delete the original message. Your cooperation 
is appreciated.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [trove] Trove reboot meeting

2017-05-19 Thread MCCASLAND, TREVOR

As a result of a large number of new contributors looking for direction from 
our project, we would like to host a focused meeting on the project's scope. 

Please let us know your availability for this one time meeting by using this 
doodle poll[1]

We have brainstormed a few ideas for discussion [2], the project's scope is not 
limited to these ideas so if you would like to include something, please add it.

Traditionally these kind of meetings are done at the PTG but we wanted to get 
ahead of that timeline to keep the interest going.

The current meeting time and place is not decided yet but it will most likely 
be an impromptu virtual meeting on google hangouts or some variant but we will 
also try our best to loop the conversation back in to the mailing list, our 
channel #openstack-trove and/or our project's meeting time.

When the time is right, probably Tuesday morning. I will announce what time 
works the best for everyone, how and where to participate. 

[1] https://beta.doodle.com/poll/s36ywdz5mfwqkdvu
[2] https://etherpad.openstack.org/p/trove-reboot


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova][neutron] massive overhead processing "network-changed" events during live migration


On 5/19/2017 1:40 PM, Chris Friesen wrote:
Recently we noticed failures in Newton when we attempted to live-migrate 
an instance with 16 vifs.  We tracked it down to an RPC timeout in nova 
which timed out waiting for the 'refresh_cache-%s' lock in 
get_instance_nw_info().  This led to a few other discoveries.


First, we have no fair locking in OpenStack.  The live migration code 
path was waiting for the lock, but the code processing the incoming 
"network-changed" events kept getting the lock instead even though they 
arrived while the live migration code was already blocked waiting for 
the lock.


I'm told that etcd gives us a DLM which is unicorns and rainbows, would 
that help us here?




Second, it turns out the cost of processing the "network-changed" events 
is astronomical.


1) In Newton nova commit 5de902a was merged to fix evacuate bugs, but it 
meant both source and dest compute nodes got the "network-changed" 
events.  This doubled the number of neutron API calls during a live 
migration.


As you noted below, that change was made specifically for evacuate. With 
the migration object we know the type of migration and could scope this 
behavior to just evacuate. However, I'm sort of confused by that change 
- why are we sending external events to the source compute during an 
evacuation? Isn't the source compute down and thus can't receive and 
process the event?




2) A "network-changed" event is sent from neutron each time something 
changes. There are multiple of these events for each vif during a 
live-migration.  In the current upstream code the only information 
passed with the event is the instance id, so nova will loop over all the 
ports in the instance and build up all the information about 
subnets/floatingIP/fixedIP/etc. for that instance.  This results in 
O(N^2) neutron API calls where N is the number of vifs in the instance.


While working on the patches you reference in #3 I was also working on 
seeing if we can do some bulk queries to Neutron:


https://review.openstack.org/#/c/465792/

It looks like that's not working though. Kevin Benton seemed to think at 
the time (it was late the other night) that passing a list of filter 
parameters would get turned into an OR in the database query, but I'm 
not sure that's happening (see that Tempest failed on that patch). I 
don't have a devstack handy but it seems we could prove this via simple 
curl requests.




3) mriedem has proposed a patch series 
(https://review.openstack.org/#/c/465783 and 
https://review.openstack.org/#/c/465787) that would change neutron to 
include the port ID, and allow nova to update just that port.  This 
reduces the cost to O(N), but it's still significant.


In a hardware lab with 4 compute nodes I created 4 boot-from-volume 
instances, each with 16 vifs.  I then live-migrated them all in 
parallel.  (The one on compute-0 was migrated to compute-1, the one on 
compute-1 was migrated to compute-2, etc.)  The aggregate CPU usage for 
a few critical components on the controller node is shown below.  Note 
in particular the CPU usage for neutron--it's using most of 10 CPUs for 
~10 seconds, spiking to 13 CPUs.  This seems like an absurd amount of 
work to do just to update the cache in nova.



Labels:
   L0: neutron-server
   L1: nova-conductor
   L2: beam.smp
   L3: postgres
-  - -  L0  L1  L2  L3
date   time dt occ occ occ occ
/mm/dd hh:mm:ss.dec(s) (%) (%) (%) (%)
2017-05-19 17:51:38.710  2.173   19.751.282.851.96
2017-05-19 17:51:40.012  1.3021.021.753.805.07
2017-05-19 17:51:41.334  1.3222.342.665.251.76
2017-05-19 17:51:42.681  1.347   91.793.315.275.64
2017-05-19 17:51:44.035  1.354   40.787.273.487.34
2017-05-19 17:51:45.406  1.3717.12   21.358.66   19.58
2017-05-19 17:51:46.784  1.378   16.71  196.296.87   15.93
2017-05-19 17:51:48.133  1.349   18.51  362.468.57   25.70
2017-05-19 17:51:49.508  1.375  284.16  199.304.58   18.49
2017-05-19 17:51:50.919  1.411  512.88   17.617.47   42.88
2017-05-19 17:51:52.322  1.403  412.348.909.15   19.24
2017-05-19 17:51:53.734  1.411  320.245.20   10.599.08
2017-05-19 17:51:55.129  1.396  304.922.27   10.65   10.29
2017-05-19 17:51:56.551  1.422  556.09   14.56   10.74   18.85
2017-05-19 17:51:57.977  1.426  979.63   43.41   14.17   21.32
2017-05-19 17:51:59.382  1.405  902.56   48.31   13.69   18.59
2017-05-19 17:52:00.808  1.425 1140.99   74.28   15.12   17.18
2017-05-19 17:52:02.238  1.430 1013.91   69.77   16.46   21.19
2017-05-19 17:52:03.647  1.409  964.94  175.09   15.81   27.23
2017-05-19 17:52:05.077  1.430  838.15  109.13   15.70   34.12
2017-05-19 17:52:06.502  1.425  525.88   79.09   14.42   11.09
2017-05-19 17:52:07.954  1.452  614.58   38.38   12.20   17.89
2017-05-19 17:52:09.380  1.426  763.25   68.40   12.36   16.08
2017-05-19

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

>Don't even get me started on Neutron.[2]

It seems to me the conclusion to that thread was that the majority of your
issues stemmed from the fact that we had poor documentation at the time.  A
major component of the complaints resulted from you misunderstanding the
difference between networks/subnets in Neutron.

There are some legitimate issues in there about the extra routes extension
being replace-only and the routers API not accepting a list of interfaces
in POST.  However, it hardly seems that those are worthy of "Don't even get
me started on Neutron."

It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial email
is incorrect and linking to it as a big list of "issues" is
counter-productive.


On Fri, May 19, 2017 at 7:36 AM, Zane Bitter  wrote:

> On 18/05/17 20:19, Matt Riedemann wrote:
>
>> I just wanted to blurt this out since it hit me a few times at the
>> summit, and see if I'm misreading the rooms.
>>
>> For the last few years, Nova has pushed back on adding orchestration to
>> the compute API, and even define a policy for it since it comes up so
>> much [1]. The stance is that the compute API should expose capabilities
>> that a higher-level orchestration service can stitch together for a more
>> fluid end user experience.
>>
>
> I think this is a wise policy.
>
> One simple example that comes up time and again is allowing a user to
>> pass volume type to the compute API when booting from volume such that
>> when nova creates the backing volume in Cinder, it passes through the
>> volume type. If you need a non-default volume type for boot from volume,
>> the way you do this today is first create the volume with said type in
>> Cinder and then provide that volume to the compute API when creating the
>> server. However, people claim that is bad UX or hard for users to
>> understand, something like that (at least from a command line, I assume
>> Horizon hides this, and basic users should probably be using Horizon
>> anyway right?).
>>
>
> As always, there's a trade-off between simplicity and flexibility. I can
> certainly understand the logic in wanting to make the simple stuff simple.
> But users also need to be able to progress from simple stuff to more
> complex stuff without having to give up and start over. There's a danger of
> leading them down the garden path.
>
> While talking about claims in the scheduler and a top-level conductor
>> for cells v2 deployments, we've talked about the desire to eliminate
>> "up-calls" from the compute service to the top-level controller services
>> (nova-api, nova-conductor and nova-scheduler). Build retries is one such
>> up-call. CERN disables build retries, but others rely on them, because
>> of how racy claims in the computes are (that's another story and why
>> we're working on fixing it). While talking about this, we asked, "why
>> not just do away with build retries in nova altogether? If the scheduler
>> picks a host and the build fails, it fails, and you have to
>> retry/rebuild/delete/recreate from a top-level service."
>>
>
> (FWIW Heat does this for you already.)
>
> But during several different Forum sessions, like user API improvements
>> [2] but also the cells v2 and claims in the scheduler sessions, I was
>> hearing about how operators only wanted to expose the base IaaS services
>> and APIs and end API users wanted to only use those, which means any
>> improvements in those APIs would have to be in the base APIs (nova,
>> cinder, etc). To me, that generally means any orchestration would have
>> to be baked into the compute API if you're not using Heat or something
>> similar.
>>
>
> The problem is that orchestration done inside APIs is very easy to do
> badly in ways that cause lots of downstream pain for users and external
> orchestrators. For example, Nova already does some orchestration: it
> creates a Neutron port for a server if you don't specify one. (And then
> promptly forgets that it has done so.) There is literally an entire inner
> platform, an orchestrator within an orchestrator, inside Heat to try to
> manage the fallout from this. And the inner platform shares none of the
> elegance, such as it is, of Heat itself, but is rather a collection of
> cobbled-together hacks to deal with the seemingly infinite explosion of
> edge cases that we kept running into over a period of at least 5 releases.
>
> The get-me-a-network thing is... better, but there's no provision for
> changes after the server is created, which means we have to copy-paste the
> Nova implementation into Heat to deal with update.[1] Which sounds like a
> maintenance nightmare in the making. That seems to be a common mistake: to
> assume that once users create something they'll never need to touch it
> again, except to delete it when they're done.
>
> Don't even get me started on Neutron.[2]
>
> Any orchestration that is done behind-the-scenes needs to be

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration




On 05/19/2017 02:46 AM, joehuang wrote:

Support sort and pagination together will be the biggest challenge: it's up to 
how many cells will be involved in the query, 3,5 may be OK, you can search 
each cells, and cached data. But how about 20, 50 or more, and how many data 
will be cached?



I've talked to Matthew in Boston and I am also a little concerned about 
this.The approach involves trying to fetch just the smallest number 
of records possible from each backend, merging them as they come in, and 
then discarding the rest (unfetched) once there's enough for a page. 
But there is latency around invoking query before any results are 
received, and the database driver really wants to send out all the rows 
as well, not to mention the ORM (with configurability) wants to convert 
the whole set of rows received to objects, all has overhead.


To at least handle the problem of 50 connections that have all executed 
a statement and waiting on results, to parallelize that means there 
needs to be a threadpool , greenlet pool, or explicit non-blocking 
approach put in place.  The "thread pool" would be the approach that's 
possible, which with eventlet monkeypatching transparently becomes a 
greenlet pool.  But that's where this starts getting a little intense 
for something you want to do in the context of "a web request".   So I 
think the DB-based solution here is feasible but I'm a little skeptical 
of it at higher scale.   Usually, the search engine would be something 
pluggable, like, "SQL" or "searchlight".








More over, during the query there are instances operation( create, delete)  in 
parallel during the pagination/sort query, there is situation some cells may 
not provide response in time, or network connection broken, etc, many abnormal 
cases may happen. How to deal with some of cells abnormal query response is 
also one great factor to be considered.

It's not good idea to support pagination and sort at the same time (may not 
provide exactly the result end user want) if searchlight should not be 
integrated.

In fact in Tricircle, when query ports from neutron where tricircle central 
plugin is installed, the tricircle central plugin do the similar cross local 
Neutron ports query, and not support pagination/sort together.

Best Regards
Chaoyi Huang (joehuang)


From: Matt Riedemann [mriede...@gmail.com]
Sent: 19 May 2017 5:21
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [nova] Boston Forum session recap - searchlight
integration

Hi everyone,

After previous summits where we had vertical tracks for Nova sessions I
would provide a recap for each session.

The Forum in Boston was a bit different, so here I'm only attempting to
recap the Forum sessions that I ran. Dan Smith led a session on Cells
v2, John Garbutt led several sessions on the VM and Baremetal platform
concept, and Sean Dague led sessions on hierarchical quotas and API
microversions, and I'm going to leave recaps for those sessions to them.

I'll do these one at a time in separate emails.


Using Searchlight to list instances across cells in nova-api


The etherpad for this session is here [1]. The goal for this session was
to explain the problem and proposed plan from the spec [2] to the
operators in the room and get feedback.

Polling the room we found that not many people are deploying Searchlight
but most everyone was using ElasticSearch.

An immediate concern that came up was the complexity involved with
integrating Searchlight, especially around issues with latency for state
changes and questioning how this does not redo the top-level cells v1
sync issue. It admittedly does to an extent, but we don't have all of
the weird side code paths with cells v1 and it should be self-healing.
Kris Lindgren noted that the instance.usage.exists periodic notification
from the computes hammers their notification bus; we suggested he report
a bug so we can fix that.

It was also noted that if data is corrupted in ElasticSearch or is out
of sync, you could re-sync that from nova to searchlight, however,
searchlight syncs up with nova via the compute REST API, which if the
compute REST API is using searchlight in the backend, you end up getting
into an infinite loop of broken. This could probably be fixed with
bypass query options in the compute API, but it's not a fun problem.

It was also suggested that we store a minimal set of data about
instances in the top-level nova API database's instance_mappings table,
where all we have today is the uuid. Anything that is set in the API
would probably be OK for this, but operators in the room noted that they
frequently need to filter instances by an IP, which is set in the
compute. So this option turns into a slippery slope, and is potentially
not inter-operable across clouds.

Matt Booth is also skeptical that we can't have a multi-cell query
perform well, and

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On 05/19/2017 02:34 PM, Dean Troyer wrote:
> On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:
>> These should be used as ways to experiment with the kinds of interfaces
>> we want cheaply, then take them back into services (which is a more
>> expensive process involving compatibility stories, deeper documentation,
>> performance implications, and the like), not an end game on their own.
> 
> I totally agree here.  But I also see the rate of progress for many
> and varied reasons, and want to make users lives easier now.
> 
> Have any of the lessons already learned from Shade or OSC made it into
> services yet?  I think a few may have, "get me a network" being the
> obvious one.  But that still took a lot of work (granted that one _is_
> complicated).

Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.

>> You can get the behavior. It also has other behaviors. I'm not sure any
>> user has actually argued for "please make me do more rest calls to
>> create a server".
> 
> Maybe not in those words, but "give me the tools to do what I need"
> has been heard often.  Sometimes those tools are composable
> primitives, sometimes they are helpful opinionated interfaces.  I've
> already done the helpful opinionated stuff in OSC here (accept flavor
> and image names when the non-unique names _do_ identify a single
> result).  Having that control lets me give the user more options in
> handling edge cases.

Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read only data (without any caching) is
the kind of problem that rises from "just fix it in an upper layer". So
it does provide an experience at a cost.

All for new and better experiences. I think that's great. Where I think
we want to be really careful is deciding the path to creating better
experiences is by not engaging with the services and just writing around
it. That feedback has to come back. Those reasons have to come back, and
we need to roll sensible improvements back into base services.

If you want to go fast, go alone, if you want to go far, go together.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [trove] Trove meeting time update

2017-05-19 Thread MCCASLAND, TREVOR

I have submitted a patch for updating the regular trove meeting time to 1500 
UTC. This was decided during the trove project update meeting last week[1]

If you weren't able to make it and want to voice your opinion or If you feel 
like a better time is more suitable, feel free to make a suggestion here[2] 

[1] https://youtu.be/g8tKXn_Axhs?t=23m50s
[2] https://review.openstack.org/#/c/466381/ 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova][neutron] massive overhead processing "network-changed" events during live migration

2017-05-19 Thread Chris Friesen

Recently we noticed failures in Newton when we attempted to live-migrate an 
instance with 16 vifs.  We tracked it down to an RPC timeout in nova which timed 
out waiting for the 'refresh_cache-%s' lock in get_instance_nw_info().  This led 
to a few other discoveries.


First, we have no fair locking in OpenStack.  The live migration code path was 
waiting for the lock, but the code processing the incoming "network-changed" 
events kept getting the lock instead even though they arrived while the live 
migration code was already blocked waiting for the lock.


Second, it turns out the cost of processing the "network-changed" events is 
astronomical.


1) In Newton nova commit 5de902a was merged to fix evacuate bugs, but it meant 
both source and dest compute nodes got the "network-changed" events.  This 
doubled the number of neutron API calls during a live migration.


2) A "network-changed" event is sent from neutron each time something changes. 
There are multiple of these events for each vif during a live-migration.  In the 
current upstream code the only information passed with the event is the instance 
id, so nova will loop over all the ports in the instance and build up all the 
information about subnets/floatingIP/fixedIP/etc. for that instance.  This 
results in O(N^2) neutron API calls where N is the number of vifs in the instance.


3) mriedem has proposed a patch series (https://review.openstack.org/#/c/465783 
and https://review.openstack.org/#/c/465787) that would change neutron to 
include the port ID, and allow nova to update just that port.  This reduces the 
cost to O(N), but it's still significant.


In a hardware lab with 4 compute nodes I created 4 boot-from-volume instances, 
each with 16 vifs.  I then live-migrated them all in parallel.  (The one on 
compute-0 was migrated to compute-1, the one on compute-1 was migrated to 
compute-2, etc.)  The aggregate CPU usage for a few critical components on the 
controller node is shown below.  Note in particular the CPU usage for 
neutron--it's using most of 10 CPUs for ~10 seconds, spiking to 13 CPUs.  This 
seems like an absurd amount of work to do just to update the cache in nova.



Labels:
  L0: neutron-server
  L1: nova-conductor
  L2: beam.smp
  L3: postgres
-  - -  L0  L1  L2  L3
date   time dt occ occ occ occ
/mm/dd hh:mm:ss.dec(s) (%) (%) (%) (%)
2017-05-19 17:51:38.710  2.173   19.751.282.851.96
2017-05-19 17:51:40.012  1.3021.021.753.805.07
2017-05-19 17:51:41.334  1.3222.342.665.251.76
2017-05-19 17:51:42.681  1.347   91.793.315.275.64
2017-05-19 17:51:44.035  1.354   40.787.273.487.34
2017-05-19 17:51:45.406  1.3717.12   21.358.66   19.58
2017-05-19 17:51:46.784  1.378   16.71  196.296.87   15.93
2017-05-19 17:51:48.133  1.349   18.51  362.468.57   25.70
2017-05-19 17:51:49.508  1.375  284.16  199.304.58   18.49
2017-05-19 17:51:50.919  1.411  512.88   17.617.47   42.88
2017-05-19 17:51:52.322  1.403  412.348.909.15   19.24
2017-05-19 17:51:53.734  1.411  320.245.20   10.599.08
2017-05-19 17:51:55.129  1.396  304.922.27   10.65   10.29
2017-05-19 17:51:56.551  1.422  556.09   14.56   10.74   18.85
2017-05-19 17:51:57.977  1.426  979.63   43.41   14.17   21.32
2017-05-19 17:51:59.382  1.405  902.56   48.31   13.69   18.59
2017-05-19 17:52:00.808  1.425 1140.99   74.28   15.12   17.18
2017-05-19 17:52:02.238  1.430 1013.91   69.77   16.46   21.19
2017-05-19 17:52:03.647  1.409  964.94  175.09   15.81   27.23
2017-05-19 17:52:05.077  1.430  838.15  109.13   15.70   34.12
2017-05-19 17:52:06.502  1.425  525.88   79.09   14.42   11.09
2017-05-19 17:52:07.954  1.452  614.58   38.38   12.20   17.89
2017-05-19 17:52:09.380  1.426  763.25   68.40   12.36   16.08
2017-05-19 17:52:10.825  1.445  901.57   73.59   15.90   41.12
2017-05-19 17:52:12.252  1.427  966.15   42.97   16.76   23.07
2017-05-19 17:52:13.702  1.450  902.40   70.98   19.66   17.50
2017-05-19 17:52:15.173  1.471 1023.33   59.71   19.78   18.91
2017-05-19 17:52:16.605  1.432 1127.04   64.19   16.41   26.80
2017-05-19 17:52:18.046  1.442 1300.56   68.22   16.29   24.39
2017-05-19 17:52:19.517  1.471 1055.60   71.74   14.39   17.09
2017-05-19 17:52:20.983  1.465  845.30   61.48   15.24   22.86
2017-05-19 17:52:22.447  1.464 1027.33   65.53   15.94   26.85
2017-05-19 17:52:23.919  1.472 1003.08   56.97   14.39   28.93
2017-05-19 17:52:25.367  1.448  702.50   45.42   11.78   20.53
2017-05-19 17:52:26.814  1.448  558.63   66.48   13.22   29.64
2017-05-19 17:52:28.276  1.462  620.34  206.63   14.58   17.17
2017-05-19 17:52:29.749  1.473  555.62  110.37   10.95   13.27
2017-05-19 17:52:31.228  1.479  436.66   33.659.00   21.55
2017-05-19 17:52:32.685  1.456  417.12   87.44   13.44   12.27
2017-05-19 17:52:34.128  1.443  368.31   87.08   11.95   14.70
2017-05-19

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:
> These should be used as ways to experiment with the kinds of interfaces
> we want cheaply, then take them back into services (which is a more
> expensive process involving compatibility stories, deeper documentation,
> performance implications, and the like), not an end game on their own.

I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).

> You can get the behavior. It also has other behaviors. I'm not sure any
> user has actually argued for "please make me do more rest calls to
> create a server".

Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.

dt

-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

On Thu, May 18, 2017 at 4:21 PM, Matt Riedemann  wrote:
> Since sorting instances across cells is the main issue, it was also
> suggested that we allow a config option to disable sorting in the API. It
> was stated this would be without a microversion, and filtering/paging would
> still be supported. I'm personally skeptical about how this could be
> consider inter-operable or discoverable for API users, and would need more
> thought and input from users like Monty Taylor and Clark Boylan.

Please please please make that config option discoverable, do not
propagate that silent config option pattern any more.  Please.

This is totally a microversion-required situation in my view as the
API will behave differently and clients will need to do the sorting
locally if that is what they require.  Doing it locally is (usually)
fine, but we need to know.

Now the question of how to actually do this?  If we had some
side-channel to return results metadata then this config change would
be discoverable after-the-fact, which in this case would be acceptable
as the condition checking happens after (at least some of) results are
returned anyway.

dt

-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Clint Byrum

Excerpts from Clark Boylan's message of 2017-05-19 10:03:23 -0700:
> On Fri, May 19, 2017, at 05:59 AM, Duncan Thomas wrote:
> > On 19 May 2017 at 12:24, Sean Dague  wrote:
> > 
> > > I do get the concerns of extra logic in Nova, but the decision to break
> > > up the working compute with network and storage problem space across 3
> > > services and APIs doesn't mean we shouldn't still make it easy to
> > > express some pretty basic and common intents.
> > 
> > Given that we've similar needs for retries and race avoidance in and
> > between glance, nova, cinder and neutron, and a need or orchestrate
> > between at least these three (arguably other infrastructure projects
> > too, I'm not trying to get into specifics), maybe the answer is to put
> > that logic in a new service, that talks to those four, and provides a
> > nice simple API, while allowing the cinder, nova etc APIs to remove
> > things like internal retries?
> 
> The big issue with trying to solve the problem this way is that various
> clouds won't deploy this service then your users are stuck with the
> "base" APIs anyway or deploying this service themselves. This is mostly
> ok until you realize that we rarely build services to run "on" cloud
> rather than "in" cloud so I as the user can't sanely deploy a new
> service this way, and even if I can I'm stuck deploying it for the 6
> clouds and 15 regions (numbers not exact) because even more rarely do we
> write software that is multicloud/region aware.
> 
> We need to be very careful if this is the path we take because it often
> doesn't actually make the user experience better.

I think an argument can be made that if the community were to rally
around something like Enamel and Oaktree, that it would be deployed
broadly.

As Zane pointed out elsewhere in the thread, Heat does some of this
for you, and has seen a lot of adoption, but nowhere near the level
of Neutron and Cinder. However, I believe Heat is missing from some
clouds because it is stateful, and thus, requires a large investment to
deploy. Oaktree is specifically _not_ stateful, and not dependent on admin
access to function, so I could see rallying around something that _can_
be deployed by users, but would be much more popular for deployers to
add in as users ask for it.

So whatever gets chosen as the popular porcelein API, it sounds to me
like it's worth getting serious about it.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On 05/19/2017 01:38 PM, Dean Troyer wrote:
> On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
>  wrote:
>> ..., but it seems to me that the logical
>> extension of that is to expose simple orthogonal APIs where the nova boot
>> request should only take neutron port ids and cinder volume ids.  The actual
>> setup of those ports/volumes would be done by neutron and cinder.
>>
>> It seems somewhat arbitrary to say "for historical reasons this subset of
>> simple things can be done directly in a nova boot command, but for more
>> complicated stuff you have to go use these other commands".  I think there's
>> an argument to be made that it would be better to be consistent even for the
>> simple things.
> 
> cdent mentioned enamel[0] above, and there is also oaktree[1], both of
> which are wrapper/proxy services in front of existing OpenStack APIs.
> I don't know enough about enamel yet, but one of the things I like
> about oaktree is that it is not required to be deployed by the cloud
> operator to be useful, I could set it up and proxy Rax and/or
> CityCloud and/or mtreinish's closet cloud equally well.
> 
> The fact that these exist, and things like shade itself, are clues
> that we are not meeting the needs of API consumers.  I don't think
> anyone disagrees with that; let me know if you do and I'll update my
> thoughts.

It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.

I consider those things duct tape, leading use to the eventually
consistent place where we actually do that work internally. Because,
having seen with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system, that's
going to need to be highly available, and replicate a bunch of your data
to be performent, and then have inconsistency issues, because a user
deployed API proxy can't have access to the notification bus, and... boom.

You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10
(http://superuser.openstack.org/articles/cern-cloud-architecture-update/
last graph). That was an anti pattern. We should have gotten to the
bottom of the mismatches and communication issues early on, because the
end state we all inflicted on users to get a totally reasonable set of
features, was not good. Please lets not do this again.

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper documentation,
performance implications, and the like), not an end game on their own.

> First and foremost, we need to have the primitive operations that get
> composed into the higher-level ones available.  Just picking "POST
> /server" as an example, we do not have that today.  Chris mentions
> above the low-level version should take IDs for all of the associated
> resources and no magic happening behind the scenes.  I think this
> should be our top priority, everything else builds on top of that, via
> either in-service APIs or proxies or library wrappers, whatever a) can
> get implemented and b) makes sense for the use case.

You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".

Anyway, this gets pretty meta pretty fast. I agree with Zane saying "I
want my server to build", or "I'd like Nova to build a volume for me"
are very odd things to call PaaS. I think of PaaS as "here is a ruby on
rails app, provision me a db for it, and make it go". Heroku style.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
 wrote:
> ..., but it seems to me that the logical
> extension of that is to expose simple orthogonal APIs where the nova boot
> request should only take neutron port ids and cinder volume ids.  The actual
> setup of those ports/volumes would be done by neutron and cinder.
>
> It seems somewhat arbitrary to say "for historical reasons this subset of
> simple things can be done directly in a nova boot command, but for more
> complicated stuff you have to go use these other commands".  I think there's
> an argument to be made that it would be better to be consistent even for the
> simple things.

cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available.  Just picking "POST
/server" as an example, we do not have that today.  Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes.  I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.

dt

[BTW, I made this same type of proposal for the OpenStack SDK a few
years ago and it went unmerged, so at some level folks do not agree
this is necessary. I look now at what the Shade folk are doing about
building low-level REST layer that they then compose and wish I had
been more persistent then.]

[0] https://github.com/jaypipes/enamel
[1] http://git.openstack.org/cgit/openstack/oaktree
-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [gnocchi] Migration to GitHub

2017-05-19 Thread Julien Danjou

On Thu, May 18 2017, Julien Danjou wrote:

> I've started to migrate Gnocchi itself to GitHub. The Launchpad bugs
> have been re-created at https://github.com/gnocchixyz/gnocchi/issues and
> I'll move the repository as soon as all opened reviews are merged.

Everything has been merged today so the repository is now live at
GitHub.
The rest of the deprecation patches are up for review:
  
https://review.openstack.org/#/q/status:open+project:openstack-infra/project-config+branch:master+topic:jd/move-gnocchi-out

-- 
Julien Danjou
/* Free Software hacker
   https://julien.danjou.info */


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Clark Boylan

On Fri, May 19, 2017, at 05:59 AM, Duncan Thomas wrote:
> On 19 May 2017 at 12:24, Sean Dague  wrote:
> 
> > I do get the concerns of extra logic in Nova, but the decision to break
> > up the working compute with network and storage problem space across 3
> > services and APIs doesn't mean we shouldn't still make it easy to
> > express some pretty basic and common intents.
> 
> Given that we've similar needs for retries and race avoidance in and
> between glance, nova, cinder and neutron, and a need or orchestrate
> between at least these three (arguably other infrastructure projects
> too, I'm not trying to get into specifics), maybe the answer is to put
> that logic in a new service, that talks to those four, and provides a
> nice simple API, while allowing the cinder, nova etc APIs to remove
> things like internal retries?

The big issue with trying to solve the problem this way is that various
clouds won't deploy this service then your users are stuck with the
"base" APIs anyway or deploying this service themselves. This is mostly
ok until you realize that we rarely build services to run "on" cloud
rather than "in" cloud so I as the user can't sanely deploy a new
service this way, and even if I can I'm stuck deploying it for the 6
clouds and 15 regions (numbers not exact) because even more rarely do we
write software that is multicloud/region aware.

We need to be very careful if this is the path we take because it often
doesn't actually make the user experience better.

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Chris Friesen

On 05/19/2017 07:18 AM, Sean Dague wrote:

There was a conversation in the Cell v2 discussion around searchlight
that puts me more firmly in the anti enamel camp. Because of some
complexities around server list, Nova was planning on using Searchlight
to provide an efficient backend.

Q: Who in this room is running ELK already in their environment?
A: 100% of operators in room

Q: Who would be ok with standing up Searchlight for this?
A: 0% of operators in the room

We've now got an ecosystem that understands how to talk to our APIs
(yay! -
https://docs.google.com/presentation/d/1WAWHrVw8-u6XC7AG9ANdre8-Su0a3fdI-scjny3QOnk/pub?slide=id.g1d9d78a72b_0_0)
so saying "you need to also run this other service to *actually* do the
thing you want, and redo all your applications, and 3rd party SDKs" is
just weird.

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).

A while back I suggested adding the vif-model as an attribute on the network
during a nova boot request, and we were shot down because "that should be done
in neutron".

I have some sympathy for this argument, but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids. The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of simple
things can be done directly in a nova boot command, but for more complicated
stuff you have to go use these other commands". I think there's an argument to
be made that it would be better to be consistent even for the simple things.

Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-19 Thread Doug Hellmann

Excerpts from Mehdi Abaakouk's message of 2017-05-19 10:23:09 +0200:
> On Thu, May 18, 2017 at 03:16:20PM -0400, Mike Bayer wrote:
> >
> >
> >On 05/18/2017 02:37 PM, Julien Danjou wrote:
> >>On Thu, May 18 2017, Mike Bayer wrote:
> >>
> >>>I'm not understanding this?  do you mean this?
> >>
> >>In the long run, yes. Unfortunately, we're not happy with the way Oslo
> >>libraries are managed and too OpenStack centric. I've tried for the last
> >>couple of years to move things on, but it's barely possible to deprecate
> >>anything and contribute, so I feel it's safer to start fresh and better
> >>alternative. Cotyledon by Mehdi is a good example of what can be
> >>achieved.
> >
> >
> >here's cotyledon:
> >
> >https://cotyledon.readthedocs.io/en/latest/
> >
> >
> >replaces oslo.service with a multiprocessing approach that doesn't use 
> >eventlet.  great!  any openstack service that rides on oslo.service 
> >would like to be able to transparently switch from eventlet to 
> >multiprocessing the same way they can more or less switch to mod_wsgi 
> >at the moment.   IMO this should be part of oslo.service itself.   
> 
> I have quickly presented cotyledon some summit ago, we said we will wait
> to see if other projects want to get rid of eventlet before adopting
> such new lib (or merge it with oslo.service).
> 
> But for now, the lib is still under telemetry umbrella.
> 
> Keeping the current API and supporting both are (I think) impossible.
> The current API is too eventlet centric. And some applications rely
> on implicit internal contract/behavior/assumption.
> 
> Dealing about concurrent/thread/signal safety in multithreading app or
> eventlet app is already hard enough. So having the lib that deals with
> both is even harder. We already have oslo.messaging that deals with
> 3 threads models, this is just an unending story of race conditions.
> 
> Since a new API is needed, why not writing a new lib. Anyways when you
> get rid of eventlet you have so many thing to change to ensure your
> performance will not drop. Changing from oslo.service to cotyledon is
> really easy on the side.
> 
> >Docs state: "oslo.service being impossible to fix and bringing an 
> >heavy dependency on eventlet, "  is there a discussion thread on that?
> 
> Not really, I just put some comments on reviews and discus this on IRC.
> Since nobody except Telemetry have expressed/try to get rid of eventlet.
> 
> For the story we first get rid of eventlet in Telemetry, fixes couple of
> performance issue due to using threading/process instead
> greenlet/greenthread.
> 
> Then we fall into some weird issue due to oslo.service internal
> implementation. Process not exiting properly, signals not received,
> deadlock when signal are received, unkillable process,
> tooz/oslo.messaging heartbeat not scheduled correctly, worker not
> restarted when they are dead. All of what we expect from oslo.service
> was not working correctly anymore because we remove the line
> 'eventlet.monkeypatch()'.
> 
> For example, when oslo.service receive a signal, it can arrive on any
> thread, this thread is paused, the callback is run in this thread
> context, but if the callback try to discus to your code in this thread,
> the process lockup, because your code is paused. Python
> offers tool to avoid that (signal.set_wakeup_fd), but oslo.service don't
> use it. I have tried to run callbacks only on the main thread with
> set_wakeup_fd, to avoid this kind of issue but I fail. The whole
> oslo.service code is clearly not designed to be threadsafe/signalsafe.
> Well, it works for eventlet because you have only one real thread.
> 
> And this is just one example on complicated thing I have tried to fix,
> before starting cotyledon.
>
> >I'm finding it hard to believe that only a few years ago, everyone saw 
> >the wisdom of not re-implementing everything in their own projects and 
> >using a common layer like oslo, and already that whole situation is 
> >becoming forgotten - not just for consistency, but also when a bug is 
> >found, if fixed in oslo it gets fixed for everyone.
> 
> Because the internal of cotyledon and oslo.service are so different.
> Having the code in oslo or not doesn't help for maintenance anymore.
> Cotyledon is a lib, code and bugs :) can already be shared between
> projects that doesn't want eventlet.

Yes, I remember discussing this some time ago and I agree that starting
a new library was the right approach. The changes needed to make
oslo.service work without eventlet are too big, and rather than have 2
separate implementations in the same library a second library makes
sense.

> >An increase in the scope of oslo is essential to dealing with the 
> >issue of "complexity" in openstack. 
> 
> Increasing the scope of oslo works only if libs have maintainers. But
> most of them lack of people today. Most of oslo libs are in maintenance
> mode. But that another subject.
> 
> > The state of openstack as dozens 
> >of individual software projects

Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

2017-05-19 Thread Michał Jastrzębski

Kolla:
Attendees - full room (20-30?)
Notes - Conflict with kolla-k8s demo probably didn't help

While we didn't have etherpad, slides, recording (and video dongle
that could fit my laptop), we had great session with analog tools
(whiteboard and my voice chords). We walked through architecture of
each Kolla project, how they relate to each other and so on.

Couple things to take out from our onboarding:
1. Bring dongles
2. We could've used bigger room - people were leaving because we had
no chairs left
3. Recording would be awesome
4. Low tech is not a bad tech

All and all, when we started session I didn't know what to expect or
what people will expect so we just...rolled with it, and people seemed
to be happy with it:) I think onboarding rooms were great idea (kudos
to whoever came up with it)! I'll be happy to run it again in Sydney.

Cheers,
Michal

On 19 May 2017 at 08:12, Julien Danjou  wrote:
> On Fri, May 19 2017, Sean Dague wrote:
>
>> If you ran a room, please post the project, what you did in the room,
>> what you think worked, what you would have done differently. If you
>> attended a room you didn't run, please provide feedback about which one
>> it was, and what you thought worked / didn't work from the other side of
>> the table.
>
> We shared a room for Telemetry and CloudKitty for 90 minutes.
> I was there with Gordon Chung for Telemetry.
> Christophe Sauthier was there for CloudKitty.
>
> We only had 3 people showing up in the session. One wanted to read his
> emails in a quiet room, the two others had a couple of question on
> Telemetry – though it was not really related to contribution as far as I
> can recall.
>
> I had to leave after 45 minutes because they was an overlap with a talk
> I was doing and rescheduling did not seem possible. And everybody left a
> few minutes after I left apparently.
>
> --
> Julien Danjou
> -- Free Software hacker
> -- https://julien.danjou.info
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova]next notification subteam meeting

2017-05-19 Thread Balazs Gibizer


Hi,

The next two notification subteam meetings are canceled so the next 
meeting will be held on 6th of June.

https://www.timeanddate.com/worldclock/fixedtime.html?iso=20170606T17

Cheers,
gibi




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 20)

2017-05-19 Thread James Slagle

On Fri, May 19, 2017 at 5:23 AM, Attila Darazs  wrote:
> If the topics below interest you and you want to contribute to the
> discussion, feel free to join the next meeting:
>
> Time: Thursdays, 14:30-15:30 UTC
> Place: https://bluejeans.com/4113567798/
>
> Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
>
> = Using RDO Cloud for OVB jobs =
>
> We spent some time discussing the steps needed to start running a few OVB
> TripleO jobs on the new RDO Cloud, which seems to be a good shape to start
> utilizing it. We need to create new users for it and add the cloud
> definition to project-config among other things.
>
> When all is set up, we will ramp up the amount of jobs ran there slowly to
> test the stability and bottlenecks.
>
> = Old OVB jobs running without Quickstart =
>
> There are a couple of jobs that is still not transitioned running on a few
> repos. We need to figure out if those jobs are still needed and if yes,
> what's holding them back from transition.

Hi, is there a plan or potential dates for transitioning the remaining
multinode jobs to quickstart?

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova] Boston Forum session recap - cellsv2

2017-05-19 Thread Dan Smith

The etherpad for this session is here [1]. The goal of the session was
to get some questions answered that the developers had for operators
around the topic of cellsv2.

The bulk of the time was spent discussing ways to limit instance
scheduling retries in a cellsv2 world where placement eliminates
resource-reservation races. Reschedules would be upcalls from the cell,
which we are trying to avoid.

While placement should eliminate 95% (or more) of reschedules due to
pre-claiming resources before booting, there will still be cases where
we may want to reschedule due to unexpected transient failures. How many
of those remain, and whether or not rescheduling for them is really
useful is in question.

The compromise that seemed popular in the room was to grab more than one
host at the time of scheduling, claim for that one, but pass the rest to
the cell. If the cell needs to reschedule, the cell conductor would try
one of the alternates that came as part of the original boot request,
instead of asking scheduler again.

During the discussion of this, an operator raised the concern that
without reschedules, a single compute that fails to boot 100% of the
time ends up becoming a magnet for all future builds, looking like an
excellent target for the scheduler, but failing anything that is sent to
it. If we don't reschedule, that situation could be very problematic. An
idea came out that we should really have compute monitor and disable
itself if a certain number of _consecutive_ build failures crosses a
threshold. That would mitigate/eliminate the "fail magnet" behavior and
further reduce the need for retries. A patch has been proposed for this,
and so far enjoys wide support [2].

We also discussed the transition to counting quotas, and what that means
for operators. The room seemed in favor of this, and discussion was brief.

Finally, I made the call for people with reasonably-sized pre-prod
environments to begin testing cellsv2 to help prove it out and find the
gremlins. CERN and NeCTAR specifically volunteered for this effort.

[1]
https://etherpad.openstack.org/p/BOS-forum-cellsv2-developer-community-coordination
[2] https://review.openstack.org/#/c/463597/

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

2017-05-19 Thread Julien Danjou

On Fri, May 19 2017, Sean Dague wrote:

> If you ran a room, please post the project, what you did in the room,
> what you think worked, what you would have done differently. If you
> attended a room you didn't run, please provide feedback about which one
> it was, and what you thought worked / didn't work from the other side of
> the table.

We shared a room for Telemetry and CloudKitty for 90 minutes.
I was there with Gordon Chung for Telemetry.
Christophe Sauthier was there for CloudKitty.

We only had 3 people showing up in the session. One wanted to read his
emails in a quiet room, the two others had a couple of question on
Telemetry – though it was not really related to contribution as far as I
can recall.

I had to leave after 45 minutes because they was an overlap with a talk
I was doing and rescheduling did not seem possible. And everybody left a
few minutes after I left apparently.

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info

signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [horizon][api][docs] Feedback requested on proposed formatting change to API docs

2017-05-19 Thread Joe Topjian

On Fri, May 19, 2017 at 8:00 AM, Sean Dague  wrote:

> On 05/19/2017 08:36 AM, Monty Taylor wrote:
> > On 05/17/2017 10:14 AM, Joe Topjian wrote:
> >>
> >>
> >> On Tue, May 16, 2017 at 4:13 PM, Monty Taylor  >> > wrote:
> >>
> >> Hey all!
> >>
> >> I read the API docs A LOT. (thank you to all of you who have worked
> >> on writing them)
> >>
> >> As I do, a gotcha I hit up against a non-zero amount is mapping the
> >> descriptions of the response parameters to the form of the response
> >> itself. Most of the time there is a top level parameter under which
> >> either an object or a list resides, but the description lists list
> >> the top level and the sub-parameters as siblings.
> >>
> >> So I wrote a patch to os-api-ref taking a stab at providing a way to
> >> show things a little differently:
> >>
> >> https://review.openstack.org/#/c/464255/
> >> 
> >>
> >> You can see the output here:
> >>
> >>
> >> http://docs-draft.openstack.org/55/464255/5/check/gate-
> nova-api-ref-src/f02b170//api-ref/build/html/
> >>
> >>
> >>  nova-api-ref-src/f02b170//api-ref/build/html/>
> >>
> >>
> >> If you go expand either the GET / or the GET /servers/details
> >> sections and go to look at their Response sections, you can see it
> >> in action.
> >>
> >> We'd like some feedback on impact from humans who read the API docs
> >> decently regularly...
> >>
> >> The questions:
> >>
> >> - Does this help, hurt, no difference?
> >>
> >>
> >> It helps. It seems noisy at first glance, but the information being
> >> denoted is important. It's one of those situations where once you start
> >> reading deeper into the information, this kind of markup makes the API
> >> more understandable more quickly.
> >
> > Awesome. Thanks!
> >
> >> - servers[].name - servers is a list, containing objects with a name
> >> field. Good or bad?
> >> - servers[].addresses.$network-name - addresses is an object and
> the
> >> keys of the object are the name of the network in question.
> >>
> >>
> >> Again, these seem noisy at first, but having parsed complex paths,
> >> especially the above address info, by dumping variables too many times,
> >> I really appreciate the above syntax.
> >>
> >> Going even further:
> >>
> >> servers[].addresses.$network-name[].OS-EXT-IPS-MAC:mac_addr
> >>
> >> looks a mess, but I can see how exactly to navigate to the leaf as well
> >> as understand what types make up the path. Being able to succinctly
> >> (relatively/subjectively speaking) describe something like the above is
> >> very helpful. This definitely gets my support.
> >
> > Sweet. thanks for the feedback! I just pushed up a new rev yesterday
> > based on some feedback from Anne:
> >
> > http://docs-draft.openstack.org/55/464255/5/check/gate-
> nova-api-ref-src/f02b170//api-ref/build/html/
>
> http://docs-draft.openstack.org/55/464255/7/check/gate-
> nova-api-ref-src/88a33cc//api-ref/build/html/
> is actually the revision with the  tags.


+1 on the  tags.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [Heat] revised structure of the heat-templates repository. Suggestions

2017-05-19 Thread Lance Haig


Hi,

As we know the heat-templates repository has become out of date in some 
respects and also has been difficult to be maintained from a community 
perspective.


For me the repository is quiet confusing with different styles that are 
used to show certain aspects and other styles for older template examples.


This I think leads to confusion and perhaps many people who give up on 
heat as a resource as things are not that clear.


From discussions in other threads and on the IRC channel I have seen 
that there is a need to change things a bit.



This is why I would like to start the discussion that we rethink the 
template example repository.


I would like to open the discussion with mys suggestions.

 * We need to differentiate templates that work on earlier versions of
   heat that what is the current supported versions.
 o I have suggested that we create directories that relate to
   different versions so that you can create a stable version of
   examples for the heat version and they should always remain
   stable for that version and once it goes out of support can
   remain there.
 o This would mean people can find their version of heat and know
   these templates all work on their version
 * We should consider adding a docs section that that includes training
   for new users.
 o I know that there are documents hosted in the developer area and
   these could be utilized but I would think having a documentation
   section in the repository would be a good way to keep the
   examples and the documents in the same place.
 o This docs directory could also host some training for new users
   and old ones on new features etc.. In a similar line to what is
   here in this repo https://github.com/heat-extras/heat-tutorial
 * We should include examples form the default hooks e.g. ansible salt
   etc... with SoftwareDeployments.
 o We found this quiet helpful for new users to understand what is
   possible.
 * We should make sure that the validation running against the
   templates runs without ignoring errors.
 o This was noted in IRC that some errors were ignored as the
   endpoints or catalog was not available. It would be good to have
   some form of headless catalog server that tests can be run
   against so that developers of templates can validate before
   submitting patches.


These points are here to open the discussions around this topic

Please feel free to make your suggestions.

Lance

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [L2-Gateway] Tempest plugin structure

2017-05-19 Thread Ricardo Noriega De Soto

Hello guys,

I'm trying to include Neutron L2GW tempest tests into puppet-openstack CI
pipeline and I'm having some troubles. I noticed that the structure of the
code:

https://github.com/openstack/networking-l2gw/tree/master/networking_l2gw/tests

It's not as a standard tempest plugin. I was wondering if there is any
reason why this plugin was done like this, or if there is any intention of
standarizing it with cookiecutter or so. Having the following variables:

https://github.com/openstack/networking-l2gw/blob/master/tox.ini#L49-L53

make harder to get the tests running in the pipeline.

Thanks guys!!

-- 
Ricardo Noriega

Senior Software Engineer - NFV Partner Engineer | Office of Technology  |
Red Hat
irc: rnoriega @freenode
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Zane Bitter


On 18/05/17 20:19, Matt Riedemann wrote:

I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding orchestration to
the compute API, and even define a policy for it since it comes up so
much [1]. The stance is that the compute API should expose capabilities
that a higher-level orchestration service can stitch together for a more
fluid end user experience.


I think this is a wise policy.


One simple example that comes up time and again is allowing a user to
pass volume type to the compute API when booting from volume such that
when nova creates the backing volume in Cinder, it passes through the
volume type. If you need a non-default volume type for boot from volume,
the way you do this today is first create the volume with said type in
Cinder and then provide that volume to the compute API when creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).


As always, there's a trade-off between simplicity and flexibility. I can 
certainly understand the logic in wanting to make the simple stuff 
simple. But users also need to be able to progress from simple stuff to 
more complex stuff without having to give up and start over. There's a 
danger of leading them down the garden path.



While talking about claims in the scheduler and a top-level conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller services
(nova-api, nova-conductor and nova-scheduler). Build retries is one such
up-call. CERN disables build retries, but others rely on them, because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked, "why
not just do away with build retries in nova altogether? If the scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."


(FWIW Heat does this for you already.)


But during several different Forum sessions, like user API improvements
[2] but also the cells v2 and claims in the scheduler sessions, I was
hearing about how operators only wanted to expose the base IaaS services
and APIs and end API users wanted to only use those, which means any
improvements in those APIs would have to be in the base APIs (nova,
cinder, etc). To me, that generally means any orchestration would have
to be baked into the compute API if you're not using Heat or something
similar.


The problem is that orchestration done inside APIs is very easy to do 
badly in ways that cause lots of downstream pain for users and external 
orchestrators. For example, Nova already does some orchestration: it 
creates a Neutron port for a server if you don't specify one. (And then 
promptly forgets that it has done so.) There is literally an entire 
inner platform, an orchestrator within an orchestrator, inside Heat to 
try to manage the fallout from this. And the inner platform shares none 
of the elegance, such as it is, of Heat itself, but is rather a 
collection of cobbled-together hacks to deal with the seemingly infinite 
explosion of edge cases that we kept running into over a period of at 
least 5 releases.


The get-me-a-network thing is... better, but there's no provision for 
changes after the server is created, which means we have to copy-paste 
the Nova implementation into Heat to deal with update.[1] Which sounds 
like a maintenance nightmare in the making. That seems to be a common 
mistake: to assume that once users create something they'll never need 
to touch it again, except to delete it when they're done.


Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done 
superbly well, provide transparency for external orchestration tools 
that need to hook in to the data flow, and should be developed in 
consultation with potential consumers like Shade and Heat.



Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?


(Aside: can we stop using the term 'PaaS' to refer to "everything that 
Nova doesn't do"? This habit is not helping us to communicate clearly.)


cheers,
Zane.

[1] https://review.openstack.org/#/c/407328/
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

2017-05-19 Thread Kendall Nelson

Thank you so much for getting this started Sean!

I have gotten a lot of feedback that people liked the on-boarding rooms,
but I would be interested to know more about what people did so we can
coordinate better next time. This round I left a lot of the decisions up to
the different teams since this was a new type of session for the Summit so
we could figure out what works best.

I started a resource collection here[1] to round up materials.  I am trying
to find a place to post them so that people that weren't able to attend to
look at since we weren't able to get recordings in the room this time-
definitely something to try to coordinate next round!


Thanks again,

-Kendall (diablo_rojo)

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/116513.html




On Fri, May 19, 2017 at 8:40 AM Sean Dague  wrote:

> On 05/19/2017 09:22 AM, Sean Dague wrote:
> > If you ran a room, please post the project, what you did in the room,
> > what you think worked, what you would have done differently. If you
> > attended a room you didn't run, please provide feedback about which one
> > it was, and what you thought worked / didn't work from the other side of
> > the table.
>
> Project: Nova
> Attendees: 25 - 30
> Notes: (this conflicted with Baremetal/VM platform part 1, may have
> impacted attendance)
> Etherpad:
> https://etherpad.openstack.org/p/BOS-forum-nova-project-onboarding
>
> What we did:
>
> To get the room warmed up (it was the first post keynote session), we
> prepared a document which was an annotated flow of the logs of booting
> up a server with openstack client -
> https://github.com/sdague/nova-boot-flow/blob/master/flow.rst - and
> talked through all of that, fielding questions along the way. That
> actually took about 45 minutes because 20 minutes in the room had warmed
> up and started asking a bunch of questions (especially around scheduling
> which always seems like a hot area).
>
> We used the back half of the session for just audience questions. Some
> of the more interesting ones were diving into what a context really is
> (that's a pretty core concept in multiple projects, but one we forget is
> new to people).
>
> We did an adhoc diagramming of the basic api.py -> rpcapi.py ->
> manager.py pattern in the code that hits all the different daemons. And
> even looked at some of the directory structures on how this is organized.
>
> There was a good conversation towards the end on debug strategies. Most
> of us are print/log debuggers, but guru mediation was news to most folks
> in the room. Definitely clear that there is a need for a pdb guide for
> OpenStack (by someone that regularly uses it).
>
> There was also a good discussion around types of arguments in Nova
> function calls, and how much one can trust they know what a variable
> named "instance" really is.
>
>
> What worked:
>
> It was really good to have some interactive technical content pre canned
> to get the conversation going. Rooms start cold, and you need to get
> people interactive.
>
> Questions phase turned out really good. They also seemed pretty spread
> around the audience.
>
>
> Do differently next time:
>
> Recording would have been great.
>
> We did a poor job of fielding questions off the etherpad because my
> laptop was being used show flows or answers. Next time it would be good
> to have 2 computers up, one on the etherpad watching for questions from
> quieter people there, while we have other relevant answer material on
> the projector.
>
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [deployment][tripleo][puppet] using puppet to push configuration into etcd

2017-05-19 Thread Emilien Macchi

While some efforts are made to support k/v to store OpenStack
configurations [1] [2] [3], I spent some time this week to investigate
how Puppet modules could still be used to manage data binding, but
push the config into etcd instead of the files.

A containerized TripleO environment only use Puppet modules to manage
configuration file (and some other bits, but unrelated to $topic). The
configuration management is done in 2 steps: the data binding (exposed
to the end-user) and writing the configuration somewhere (in files
right now e.g. /etc/keystone/keystone.conf). As we are moving to etcd
to store configuration, we still need to maintain a stable interface
for data binding until we figure out the replacement. In other words,
we haven't found yet how we could replace Hiera to bind Heat
parameters into actual OpenStack parameters consumed by the services.

I've been thinking of a transition where we would use etcd to store
configs but we would still use Puppet and Hiera to handle data binding
and push the config into etcd. I think it would make the transition to
etcd smoother since we wouldn't change any logic in parameters and it
would give us more time to find a way to manage the future tool that
will actually push the configuration directly from the interface
(instead of using Puppet). Puppet would still be run during the
TripleO deployment but it wouldn't write any configuration file.
Instead, it would push the config into etcd before or during deploying
containers.

+-+
|TripleO Interface   |
| (Heat Templates, UI, etc) |
+--+
  |
+--v+
| Puppet / Hiera |
++
 |
+--v--+
| etcd |
+--+

I started this WIP work:
https://review.openstack.org/#/c/466292/

It's a first draft of what could be done in puppet-openstacklib to use
etcd as a configuration store backend, instead of configuration file.
During the investigation, I found some limitations to ruby-etcdv3 and
also found out that the openstack_config ruby provider would certainly
require some refacto to be used by etcd backend (a lot of bits are
written for inifile provider).

I'm looking for early feedback on this work, and also potential
contributors willing to help in this effort.

Thanks,

[1] 
https://etherpad.openstack.org/p/BOS-forum-future-of-configuration-management
[2] https://review.openstack.org/#/c/466109/
[3] https://review.openstack.org/#/c/454897
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [ptg] How to slice the week to minimize conflicts

Emilien Macchi wrote:
> On Thu, May 18, 2017 at 5:27 AM, Thierry Carrez  wrote:
>> After giving it some thought, my current thinking is that we should
>> still split the week in two, but should move away from an arbitrary
>> horizontal/vertical split. My strawman proposal would be to split the
>> week between inter-project work (+ teams that rely mostly on liaisons in
>> other teams) on Monday-Tuesday, and team-specific work on Wednesday-Friday:
>>
>> Example of Monday-Tuesday rooms:
>> Interop WG, Docs, QA, API WG, Packaging WG, Oslo, Goals helproom,
>> Infra/RelMgt/support teams helpdesk, TC/SWG room, VM Working group...
>>
>> Example of Wednesday-Thursday or Wednesday-Friday rooms:
>> Nova, Cinder, Neutron, Swift, TripleO, Kolla, Infra...
> 
> I like the idea of continuing to have Deployment tools part of
> vertical projects room.
> Though once it's confirmed, I would like to setup a 2 hours slot where
> we meet together and make some cross-deployment-project collaboration.
> In Atlanta, we managed to do it on last minute and I found it
> extremely useful, let's repeat this but scheduled this time.

Actually if you look above, I added the "Packaging WG" in the
Monday-Tuesday rooms example. You could easily have 1 or 2 days there to
discuss collaboration between packaging projects, before breaking out
for 2 or 3 days with your own project team.

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

Le 19/05/2017 15:14, Chris Dent a écrit :
> On Thu, 18 May 2017, Matt Riedemann wrote:
> 
>> We didn't really get into this during the forum session, but there are
>> different opinions within the nova dev team on how to do claims in the
>> controller services (conductor vs scheduler). Sylvain Bauza has a
>> series which uses the conductor service, and Ed Leafe has a series
>> using the scheduler. More on that in the mailing list [3].
> 
> Since we've got multiple threads going on this topic, I put some
> of my concerns in a comment on one of Ed's reviews:
> 
> https://review.openstack.org/#/c/465171/3//COMMIT_MSG@30
> 
> It's a bit left fieldy but tries to ask about some of the long term
> concerns we may need to be thinking about here, with regard to other
> services using placement and maybe them needing a
> scheduler-like-thing too (because placement cannot do everything).
> 

That's actually a good question that I would translate to :
'Are other projects having interest in scheduling other things than just
instances ?'.

To be honest, that's something I wondered since ages and I tried during
the VM/BM Forum session [1] to ask operators/developers their usecases
they'd like to see for placement given the priority they gave.
If you look at the etherpad, you will see a couple of given usecases but
none of them are related to a generic scheduler, rather a compute
scheduler doing multi-projects affinity, which is already in our scope
thanks to Placement.

So, while I think it's a reasonable question to ask, it shouldn't divert
our current priority effort as it can't be said a motivation.
Also, I'm not particularly concerned by the interface between conductor
and scheduler that we have, as that interface is flexible enough for not
blocking us in the future, should we need to implement a generic scheduler.

-Sylvain

> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [glance][openstack-ansible] Moving on

2017-05-19 Thread Brian Rosmaita

On Thu, May 18, 2017 at 11:55 PM, Steve Lewis  wrote:
> It is clear to me now that I won't be able to work on OpenStack as a part of
> my next day job, wherever that ends up being. As such, I’ll no longer be
> able to invest the time and energy required to maintain my involvement in
> the community. It's time to resign my role as a core reviewer, effective
> immediately.

Steve, on behalf of the Glance community, thank you for your service
to the project.
We all wish you luck in your future endeavors, and we hope you'll find
some time for Glance again at some point.

cheers,
brian

>
> Thanks for all the fish.
> --
> SteveL

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [horizon][api][docs] Feedback requested on proposed formatting change to API docs

On 05/19/2017 08:36 AM, Monty Taylor wrote:
> On 05/17/2017 10:14 AM, Joe Topjian wrote:
>>
>>
>> On Tue, May 16, 2017 at 4:13 PM, Monty Taylor > > wrote:
>>
>> Hey all!
>>
>> I read the API docs A LOT. (thank you to all of you who have worked
>> on writing them)
>>
>> As I do, a gotcha I hit up against a non-zero amount is mapping the
>> descriptions of the response parameters to the form of the response
>> itself. Most of the time there is a top level parameter under which
>> either an object or a list resides, but the description lists list
>> the top level and the sub-parameters as siblings.
>>
>> So I wrote a patch to os-api-ref taking a stab at providing a way to
>> show things a little differently:
>>
>> https://review.openstack.org/#/c/464255/
>> 
>>
>> You can see the output here:
>>
>>
>> http://docs-draft.openstack.org/55/464255/5/check/gate-nova-api-ref-src/f02b170//api-ref/build/html/
>>
>>
>> 
>>
>>
>> If you go expand either the GET / or the GET /servers/details
>> sections and go to look at their Response sections, you can see it
>> in action.
>>
>> We'd like some feedback on impact from humans who read the API docs
>> decently regularly...
>>
>> The questions:
>>
>> - Does this help, hurt, no difference?
>>
>>
>> It helps. It seems noisy at first glance, but the information being
>> denoted is important. It's one of those situations where once you start
>> reading deeper into the information, this kind of markup makes the API
>> more understandable more quickly.
> 
> Awesome. Thanks!
> 
>> - servers[].name - servers is a list, containing objects with a name
>> field. Good or bad?
>> - servers[].addresses.$network-name - addresses is an object and the
>> keys of the object are the name of the network in question.
>>
>>
>> Again, these seem noisy at first, but having parsed complex paths,
>> especially the above address info, by dumping variables too many times,
>> I really appreciate the above syntax.
>>
>> Going even further:
>>
>> servers[].addresses.$network-name[].OS-EXT-IPS-MAC:mac_addr
>>
>> looks a mess, but I can see how exactly to navigate to the leaf as well
>> as understand what types make up the path. Being able to succinctly
>> (relatively/subjectively speaking) describe something like the above is
>> very helpful. This definitely gets my support.
> 
> Sweet. thanks for the feedback! I just pushed up a new rev yesterday
> based on some feedback from Anne:
> 
> http://docs-draft.openstack.org/55/464255/5/check/gate-nova-api-ref-src/f02b170//api-ref/build/html/

http://docs-draft.openstack.org/55/464255/7/check/gate-nova-api-ref-src/88a33cc//api-ref/build/html/
is actually the revision with the  tags.

I think that satisfies my concerns on visual skimming. +2 on moving
forward conceptually. Will leave any other comments in the review.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master


On Fri, 19 May 2017, Adrian Turjak wrote:

On 19 May 2017 11:43 am, Curtis  wrote:
  I had thought that the OpenStack community was deprecating Postgres
  support though, so that could make things a bit harder here (I might
  be wrong about this).

I really hope not, because that will take Cockroachdb off the table entirely 
(unless they add MySQL support) and it may prove to be a
great option overall once it is known to be stable and has been tested in 
larger scale setups.

I remember reading about the possibility of deprecating Postgres but there are 
people using it in production so I assumed we didn't go
down that path. Would be good to have someone confirm.


Deprecating postgreSQL is not a done deal, it's up for review at
[1] and [2]. And at this point it is more about documenting reality
that postgreSQL is not a focus of upstream development.

Deprecation is likely to happen, however, if there isn't an increase
in the number people willing to:

* actively share pg knowledge in the OpenStack community
* help with ensuring there is gate testing and responsiveness to
  failures
* address some of the mysql-oriented issue listed in [1]

I'd rather not see it happen, especially if it allows an easy step
to using cockroachdb. So I'd encourage you (and anyone else) to
participate in those reviews, especially if they are able to make
some commitments about future involvement.

[1] https://review.openstack.org/#/c/427880/
[2] https://review.openstack.org/#/c/465589/

--
Chris Dent  ┬──┬◡ﾉ(° -°ﾉ)   https://anticdent.org/
freenode: cdent tw: @anticdent__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

On 05/19/2017 09:22 AM, Sean Dague wrote:
> If you ran a room, please post the project, what you did in the room,
> what you think worked, what you would have done differently. If you
> attended a room you didn't run, please provide feedback about which one
> it was, and what you thought worked / didn't work from the other side of
> the table.

Project: Nova
Attendees: 25 - 30
Notes: (this conflicted with Baremetal/VM platform part 1, may have
impacted attendance)
Etherpad: https://etherpad.openstack.org/p/BOS-forum-nova-project-onboarding

What we did:

To get the room warmed up (it was the first post keynote session), we
prepared a document which was an annotated flow of the logs of booting
up a server with openstack client -
https://github.com/sdague/nova-boot-flow/blob/master/flow.rst - and
talked through all of that, fielding questions along the way. That
actually took about 45 minutes because 20 minutes in the room had warmed
up and started asking a bunch of questions (especially around scheduling
which always seems like a hot area).

We used the back half of the session for just audience questions. Some
of the more interesting ones were diving into what a context really is
(that's a pretty core concept in multiple projects, but one we forget is
new to people).

We did an adhoc diagramming of the basic api.py -> rpcapi.py ->
manager.py pattern in the code that hits all the different daemons. And
even looked at some of the directory structures on how this is organized.

There was a good conversation towards the end on debug strategies. Most
of us are print/log debuggers, but guru mediation was news to most folks
in the room. Definitely clear that there is a need for a pdb guide for
OpenStack (by someone that regularly uses it).

There was also a good discussion around types of arguments in Nova
function calls, and how much one can trust they know what a variable
named "instance" really is.

What worked:

It was really good to have some interactive technical content pre canned
to get the conversation going. Rooms start cold, and you need to get
people interactive.

Questions phase turned out really good. They also seemed pretty spread
around the audience.

Do differently next time:

Recording would have been great.

We did a poor job of fielding questions off the etherpad because my
laptop was being used show flows or answers. Next time it would be good
to have 2 computers up, one on the etherpad watching for questions from
quieter people there, while we have other relevant answer material on
the projector.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration


On 5/19/2017 1:46 AM, joehuang wrote:

Support sort and pagination together will be the biggest challenge: it's up to 
how many cells will be involved in the query, 3,5 may be OK, you can search 
each cells, and cached data. But how about 20, 50 or more, and how many data 
will be cached?

More over, during the query there are instances operation( create, delete)  in 
parallel during the pagination/sort query, there is situation some cells may 
not provide response in time, or network connection broken, etc, many abnormal 
cases may happen. How to deal with some of cells abnormal query response is 
also one great factor to be considered.


I think we've always stated that paging and sorting is not guaranteed to 
be perfect. With paging the marker is the last instance uuid in the last 
page, and if you create a new instance before querying for the next 
page, you might not find that new instance in the results. I don't think 
integrating searchlight is going to fix that as there is still latency 
in getting the new instance.create event results to searchlight so it's 
indexed.




It's not good idea to support pagination and sort at the same time (may not 
provide exactly the result end user want) if searchlight should not be 
integrated.


As noted above, I don't see how Searchlight is going to fix the 
"instance created while in the middle of paging" issue. Searchlight may 
increase the performance of querying a large number of instances across 
dozens of cells, yes, that was the point in going down this path in the 
first place.




In fact in Tricircle, when query ports from neutron where tricircle central 
plugin is installed, the tricircle central plugin do the similar cross local 
Neutron ports query, and not support pagination/sort together.


Doesn't that break the contract on the networking API if paging/sorting 
isn't supported when using Tricircle but it is supported when using 
Neutron's networking API directly? It's my understanding that Tricircle 
(and Cascading before it) are proxies to separate OpenStack deployments, 
which can be at various versions (maybe one deployment is mitaka, others 
are newton). But I would expect that the end user facing API is 
compatible with the native APIs, or is that not the case - and users 
understand that when using Tricircle / Cascading? If so, then how do 
libraries/SDKs and CLIs like openstackclient work with Tricircle?


The point of what we're trying to do in nova is expose the same API and 
honor it regardless of whether or not you're using a single cell or 10 
cells - it should be transparent to the end user of the cloud.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

This is a thread for anyone that participated in the onboarding rooms,
on either the presenter or audience side. Because we all went into this
creating things from whole cloth, I'm sure there are lots of lessons
learned.

If you ran a room, please post the project, what you did in the room,
what you think worked, what you would have done differently. If you
attended a room you didn't run, please provide feedback about which one
it was, and what you thought worked / didn't work from the other side of
the table.

Hopefully we can consolidate some of that feedback for best practices
going forward.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On 05/19/2017 09:04 AM, Chris Dent wrote:
> On Fri, 19 May 2017, Duncan Thomas wrote:
> 
>> On 19 May 2017 at 12:24, Sean Dague  wrote:
>>
>>> I do get the concerns of extra logic in Nova, but the decision to break
>>> up the working compute with network and storage problem space across 3
>>> services and APIs doesn't mean we shouldn't still make it easy to
>>> express some pretty basic and common intents.
>>
>> Given that we've similar needs for retries and race avoidance in and
>> between glance, nova, cinder and neutron, and a need or orchestrate
>> between at least these three (arguably other infrastructure projects
>> too, I'm not trying to get into specifics), maybe the answer is to put
>> that logic in a new service, that talks to those four, and provides a
>> nice simple API, while allowing the cinder, nova etc APIs to remove
>> things like internal retries?
> 
> This is what enamel was going to be, but we got stalled out because
> of lack of resources and the usual raft of other commitments:
> 
> https://github.com/jaypipes/enamel

There was a conversation in the Cell v2 discussion around searchlight
that puts me more firmly in the anti enamel camp. Because of some
complexities around server list, Nova was planning on using Searchlight
to provide an efficient backend.

Q: Who in this room is running ELK already in their environment?
A: 100% of operators in room

Q: Who would be ok with standing up Searchlight for this?
A: 0% of operators in the room

We've now got an ecosystem that understands how to talk to our APIs
(yay! -
https://docs.google.com/presentation/d/1WAWHrVw8-u6XC7AG9ANdre8-Su0a3fdI-scjny3QOnk/pub?slide=id.g1d9d78a72b_0_0)
so saying "you need to also run this other service to *actually* do the
thing you want, and redo all your applications, and 3rd party SDKs" is
just weird.

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova] placement/resource providers update 23

Here's your placement and resource providers update 23. There wasn't
one last week because of summit. As usual, please let me know if
there is anything incorrect or missing.

# From Summit

Placement was all over the place at summit, coming up as a relevant
piece of the puzzle in many sessions.

Jay's sessions were well attended and the questions seemed to
indicate that people understood the potential. Videos:

* https://www.openstack.org/videos/boston-2017/scheduler-wars-a-new-hope
*
https://www.openstack.org/videos/boston-2017/scheduler-wars-revenge-of-the-split

One (of three) of the VM and Baremetal sessions chose "cross project
placement" as a high priorty. See the "winners" section at the end
of:

https://etherpad.openstack.org/p/BOS-forum-operating-vm-and-baremetal

# What Matters Most

Claims against the placement API remain the highest priorty. There's
plenty of other work in progress too which needs to advance. Lots of
links within.

# What's Changed

Besides advancing some of the code (notably in claims and the
api-ref), and the additional of some alternate approaches on doing
claims, many people are still digesting summit.

"get providers sharing capacity"
https://review.openstack.org/#/c/460798/ merged just before summit.
This is the first in a stack of changes to allow selecting resource
providers based on association in aggregates where one of the
members has a trait of MISC_SHARES_VIA_AGGREGATE. This doesn't do
anything yet in part because it is big stack and in part because
nothing is yet managing the addition of the trait and aggregate
creation.

# Help Wanted

(This section not changed since last time)

Areas where volunteers are needed.

* General attention to bugs tagged placement:
https://bugs.launchpad.net/nova/+bugs?field.tag=placement

* Helping to create api documentation for placement (see the Docs
section below).

* Helping to create and evaluate functional tests of the resource
tracker and the ways in which it and nova-scheduler use the
reporting client. For some info see

https://etherpad.openstack.org/p/nova-placement-functional

and talk to edleafe. He has a work in progress at:

https://review.openstack.org/#/c/446123/

that seeks input and assistance.

* Performance testing. If you have access to some nodes, some basic
benchmarking and profiling would be very useful.

# Main Themes

## Claims in the Scheduler

Work has started on placement-claims blueprint:

https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/placement-claims

There are two different tracks of that work. One explores doing the
claims from the conductor, one from within the scheduler. There's a
related email thread about the summit session on this topic which is
worth reviewing (along with the dicussions on gerrit):

http://lists.openstack.org/pipermail/openstack-dev/2017-May/117028.html

There's an interesting mix of trying to maintain sane code and
architecture while optimizing for common cases, preserving existing
(but not necessarily ideal) behaviors, and protecting against
pathology. Fun times.

## Traits

The main API is in place. Debate raged on how best to manage updates
of standard os-traits. In the end a cache similar to the one used
for resource classes was created:

https://review.openstack.org/#/c/462769/

The code at that review needs some tweaks before it can be merged,
but is almost there.

Work will be required at some point on filtering resource providers
based on traits, and adding traits to resource providers from the
resource tracker. There's been some discussion on that in the
reviews of shared providers (below) because it will be a part of
the same mass (MASS!) of SQL.

## Shared Resource Providers

This work (when finished) makes it possible (amongst other things)
for use of shared disk resources to be tracked correctly.

https://review.openstack.org/#/q/status:open+topic:bp/shared-resources-pike

Reviewers should be aware that the patches, at least as of today,
are structured in a way that evolves from the current state to the
eventual desired state in a way that duplicates some effort and
code. This was done intentionally by Jay to make the testing and
review more incremental. It's probably best to read through the
entire stack before jumping to any conclusions.

## Docs

https://review.openstack.org/#/q/topic:cd/placement-api-ref

Several reviews are in progress for documenting the placement API.
This is likely going to take quite a few iterations as we work out
the patterns and tooling. But it's great to see the progress and
when looking at the draft rendered docs it makes placement feel like
a real thing™.

Find me (cdent) or Andrey (avolkov) if you want to help out or have
other questions.

## Nested Resource Providers

(This section has not changed since last week)

On hold while attention is given to traits and claims. There's a
stack of code waiting until all of that settles:

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)


On Thu, 18 May 2017, Matt Riedemann wrote:

We didn't really get into this during the forum session, but there are 
different opinions within the nova dev team on how to do claims in the 
controller services (conductor vs scheduler). Sylvain Bauza has a series 
which uses the conductor service, and Ed Leafe has a series using the 
scheduler. More on that in the mailing list [3].


Since we've got multiple threads going on this topic, I put some
of my concerns in a comment on one of Ed's reviews:

https://review.openstack.org/#/c/465171/3//COMMIT_MSG@30

It's a bit left fieldy but tries to ask about some of the long term
concerns we may need to be thinking about here, with regard to other
services using placement and maybe them needing a
scheduler-like-thing too (because placement cannot do everything).

--
Chris Dent  ┬──┬◡ﾉ(° -°ﾉ)   https://anticdent.org/
freenode: cdent tw: @anticdent__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?


On Fri, 19 May 2017, Duncan Thomas wrote:


On 19 May 2017 at 12:24, Sean Dague  wrote:


I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.


Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need or orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?


This is what enamel was going to be, but we got stalled out because
of lack of resources and the usual raft of other commitments:

https://github.com/jaypipes/enamel

--
Chris Dent  ┬──┬◡ﾉ(° -°ﾉ)   https://anticdent.org/
freenode: cdent tw: @anticdent__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [telemetry] Room during the next PTG

2017-05-19 Thread gordon chung

On 18/05/17 05:56 PM, Julien Danjou wrote:
> Hi team,
>
> It's time for us to request a room (or share one) for the next PTG in
> September if we want to meet. Last time we did not. Do we want one this
> time?
>

if there are cross project discussions we can line up beforehand, it 
might be worthwhile. if not, i imagine it's not worthwhile to have us 
all fly to a different city so we can do exactly what we can do at home.

cheers,

-- 
gord

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Duncan Thomas

On 19 May 2017 at 12:24, Sean Dague  wrote:

> I do get the concerns of extra logic in Nova, but the decision to break
> up the working compute with network and storage problem space across 3
> services and APIs doesn't mean we shouldn't still make it easy to
> express some pretty basic and common intents.

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need or orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?

-- 
Duncan Thomas

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [horizon][api][docs] Feedback requested on proposed formatting change to API docs

On 05/17/2017 10:14 AM, Joe Topjian wrote:

On Tue, May 16, 2017 at 4:13 PM, Monty Taylor > wrote:

Hey all!

I read the API docs A LOT. (thank you to all of you who have worked
on writing them)

As I do, a gotcha I hit up against a non-zero amount is mapping the
descriptions of the response parameters to the form of the response
itself. Most of the time there is a top level parameter under which
either an object or a list resides, but the description lists list
the top level and the sub-parameters as siblings.

So I wrote a patch to os-api-ref taking a stab at providing a way to
show things a little differently:

https://review.openstack.org/#/c/464255/

You can see the output here:

http://docs-draft.openstack.org/55/464255/5/check/gate-nova-api-ref-src/f02b170//api-ref/build/html/

If you go expand either the GET / or the GET /servers/details
sections and go to look at their Response sections, you can see it
in action.

We'd like some feedback on impact from humans who read the API docs
decently regularly...

The questions:

- Does this help, hurt, no difference?

It helps. It seems noisy at first glance, but the information being
denoted is important. It's one of those situations where once you start
reading deeper into the information, this kind of markup makes the API
more understandable more quickly.

Awesome. Thanks!

- servers[].name - servers is a list, containing objects with a name
field. Good or bad?
- servers[].addresses.$network-name - addresses is an object and the
keys of the object are the name of the network in question.

Again, these seem noisy at first, but having parsed complex paths,
especially the above address info, by dumping variables too many times,
I really appreciate the above syntax.

Going even further:

servers[].addresses.$network-name[].OS-EXT-IPS-MAC:mac_addr

looks a mess, but I can see how exactly to navigate to the leaf as well
as understand what types make up the path. Being able to succinctly
(relatively/subjectively speaking) describe something like the above is
very helpful. This definitely gets my support.

Sweet. thanks for the feedback! I just pushed up a new rev yesterday
based on some feedback from Anne:

http://docs-draft.openstack.org/55/464255/5/check/gate-nova-api-ref-src/f02b170//api-ref/build/html/

That adds tags around the leaf element. so basically:

servers[].addresses.$network-name[].OS-EXT-IPS-MAC:mac_addr

becomes

servers[].addresses.$network-name[].**OS-EXT-IPS-MAC:mac_addr**

which I hope will ease any confusion when leaf elements have punctuation
in their names - like OS-EXT-IPS-MAC:mac_addr

Re: [openstack-dev] [glance][openstack-ansible] Moving on

2017-05-19 Thread Kekane, Abhishek

Thank you, Steve for your help.
Hope to see you back. .

All the best ☺

Abhishek


From: Steve Lewis [mailto:steve...@gmail.com]
Sent: Friday, May 19, 2017 9:26 AM
To: OpenStack
Subject: [openstack-dev] [glance][openstack-ansible] Moving on

It is clear to me now that I won't be able to work on OpenStack as a part of my 
next day job, wherever that ends up being. As such, I’ll no longer be able to 
invest the time and energy required to maintain my involvement in the 
community. It's time to resign my role as a core reviewer, effective 
immediately.
Thanks for all the fish.
--
SteveL

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 20)

2017-05-19 Thread Emilien Macchi

On Fri, May 19, 2017 at 5:23 AM, Attila Darazs  wrote:
> If the topics below interest you and you want to contribute to the
> discussion, feel free to join the next meeting:
>
> Time: Thursdays, 14:30-15:30 UTC
> Place: https://bluejeans.com/4113567798/
>
> Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Sorry I couldn't make it yesterday.

> = Using RDO Cloud for OVB jobs =
>
> We spent some time discussing the steps needed to start running a few OVB
> TripleO jobs on the new RDO Cloud, which seems to be a good shape to start
> utilizing it. We need to create new users for it and add the cloud
> definition to project-config among other things.
>
> When all is set up, we will ramp up the amount of jobs ran there slowly to
> test the stability and bottlenecks.
>
> = Old OVB jobs running without Quickstart =
>
> There are a couple of jobs that is still not transitioned running on a few
> repos. We need to figure out if those jobs are still needed and if yes,
> what's holding them back from transition.
>
> = CI jobs with containers =
>
> We talked about possible ways to update all the containers with fresh and
> gating packages. It's not a trivial problem and we will probably involve
> more container folks in it. The current idea is to create a container that
> could locally serve the DLRN hash packages, avoiding downloading them for
> each containers. However this will be still an IO intensive solution, but
> probably there's no way around it.

Just an FYI about new container jobs: https://review.openstack.org/#/c/466041/
They are now available, feel free to use them with "check experimental".

> = Gate instability, critical bug =
>
> The pingest failures are still plaguing the ovb-ha job, we really need a
> solution for this critical bug[1], as it fails around ~30 percent of the
> time. Please take a look if you can!

The last time I checked on Wednesday, I thought it was a timeout.
Could it be related to the transition to quickstart where some tasks
take more time (like image building, etc).

> Thank you for reading the summary.
>
> Best regards,
> Attila
>
> [1] https://bugs.launchpad.net/tripleo/+bug/1680195
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [ptg] How to slice the week to minimize conflicts

2017-05-19 Thread Emilien Macchi

On Thu, May 18, 2017 at 5:27 AM, Thierry Carrez  wrote:
> Hi everyone,
>
> For the PTG events we have a number of rooms available for 5 days, of
> which we need to make the best usage. We also want to keep it simple and
> productive, so we want to minimize room changes (allocating the same
> room to the same group for one or more days).
>
> For the first PTG in Atlanta, we split the week into two groups.
> Monday-Tuesday for "horizontal" project team meetups (Infra, QA...) and
> workgroups (API WG, Goals helprooms...), and Wednesday-Friday for
> "vertical" project team meetups (Nova, Swift...). This kinda worked, but
> the feedback we received called for more optimizations and reduced
> conflicts.
>
> In particular, some projects which have a lot of contributors overlap
> (Storlets/Swift, or Manila/Cinder) were all considered "vertical" and
> happened at the same time. Also horizontal team members ended up having
> issues to attend workgroups, and had nowhere to go for the rest of the
> week. Finally, on Monday-Tuesday the rooms that had the most success
> were inter-project ones we didn't really anticipate (like the API WG),
> while rooms with horizontal project team meetups were a bit
> under-attended. While we have a lot of constraints, I think we can
> optimize a bit better.
>
> After giving it some thought, my current thinking is that we should
> still split the week in two, but should move away from an arbitrary
> horizontal/vertical split. My strawman proposal would be to split the
> week between inter-project work (+ teams that rely mostly on liaisons in
> other teams) on Monday-Tuesday, and team-specific work on Wednesday-Friday:
>
> Example of Monday-Tuesday rooms:
> Interop WG, Docs, QA, API WG, Packaging WG, Oslo, Goals helproom,
> Infra/RelMgt/support teams helpdesk, TC/SWG room, VM Working group...
>
> Example of Wednesday-Thursday or Wednesday-Friday rooms:
> Nova, Cinder, Neutron, Swift, TripleO, Kolla, Infra...

I like the idea of continuing to have Deployment tools part of
vertical projects room.
Though once it's confirmed, I would like to setup a 2 hours slot where
we meet together and make some cross-deployment-project collaboration.
In Atlanta, we managed to do it on last minute and I found it
extremely useful, let's repeat this but scheduled this time.

> (NB: in this example infra team members end up being available in a
> general support team helpdesk room in the first part of the week, and
> having a regular team meetup on the second part of the week)
>
> In summary, Monday-Tuesday would be mostly around themes, while
> Wednesday-Friday would be mostly around teams. In addition to that,
> teams that /prefer/ to run on Monday-Tuesday to avoid conflicting with
> another project meetup (like Manila wanting to avoid conflicting with
> Cinder, or Storlets wanting to avoid conflicting with Swift) could
> *choose* to go for Monday-Tuesday instead of Wednesday-Friday.
>
> It's a bit of a long shot (we'd still want to equilibrate both sides in
> terms of room usage, so it's likely that the teams that are late to
> decide to participate would be pushed on one side or the other), but I
> think it's a good incremental change that could solve some of the issues
> reported in the Atlanta week slicing, as well as generally make
> inter-project coordination simpler.
>
> If we adopt that format, we need to be pretty flexible in terms of what
> is a "workgroup": to me, any inter-project work that would like to have
> a one-day or two-day room should be able to get some.
> Nova-{Cinder,Neutron,Ironic} discussions would for example happen in the
> VM & BM working group room, but we can imagine others just like it.
>
> Let me know what you think. Also feel free to propose alternate creative
> ways to slice the space and time we'll have. We need to open
> registration very soon (June 1st is the current target), and we'd like
> to have a rough idea of the program before we do that (so that people
> can report which days they will attend more accurately).
>
> --
> Thierry Carrez (ttx)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [glance][openstack-ansible] Moving on

2017-05-19 Thread Major Hayden

On 05/18/2017 10:55 PM, Steve Lewis wrote:
> It is clear to me now that I won't be able to work on OpenStack as a part of 
> my next day job, wherever that ends up being. As such, I’ll no longer be able 
> to invest the time and energy required to maintain my involvement in the 
> community. It's time to resign my role as a core reviewer, effective 
> immediately.
> 
> Thanks for all the fish.

You will definitely be missed, Steve!  Thanks for everything you've done so far 
and for helping so many of us level up along the way. :)

--
Major Hayden

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

On 05/18/2017 08:19 PM, Matt Riedemann wrote:
> I just wanted to blurt this out since it hit me a few times at the
> summit, and see if I'm misreading the rooms.
> 
> For the last few years, Nova has pushed back on adding orchestration to
> the compute API, and even define a policy for it since it comes up so
> much [1]. The stance is that the compute API should expose capabilities
> that a higher-level orchestration service can stitch together for a more
> fluid end user experience.
> 
> One simple example that comes up time and again is allowing a user to
> pass volume type to the compute API when booting from volume such that
> when nova creates the backing volume in Cinder, it passes through the
> volume type. If you need a non-default volume type for boot from volume,
> the way you do this today is first create the volume with said type in
> Cinder and then provide that volume to the compute API when creating the
> server. However, people claim that is bad UX or hard for users to
> understand, something like that (at least from a command line, I assume
> Horizon hides this, and basic users should probably be using Horizon
> anyway right?).
> 
> While talking about claims in the scheduler and a top-level conductor
> for cells v2 deployments, we've talked about the desire to eliminate
> "up-calls" from the compute service to the top-level controller services
> (nova-api, nova-conductor and nova-scheduler). Build retries is one such
> up-call. CERN disables build retries, but others rely on them, because
> of how racy claims in the computes are (that's another story and why
> we're working on fixing it). While talking about this, we asked, "why
> not just do away with build retries in nova altogether? If the scheduler
> picks a host and the build fails, it fails, and you have to
> retry/rebuild/delete/recreate from a top-level service."
> 
> But during several different Forum sessions, like user API improvements
> [2] but also the cells v2 and claims in the scheduler sessions, I was
> hearing about how operators only wanted to expose the base IaaS services
> and APIs and end API users wanted to only use those, which means any
> improvements in those APIs would have to be in the base APIs (nova,
> cinder, etc). To me, that generally means any orchestration would have
> to be baked into the compute API if you're not using Heat or something
> similar.
> 
> Am I missing the point, or is the pendulum really swinging away from
> PaaS layer services which abstract the dirty details of the lower-level
> IaaS APIs? Or was this always something people wanted and I've just
> never made the connection until now?

Lots of people just want IaaS. See the fact that Google and Microsoft
both didn't offer it at first in their public clouds, and got pretty
marginal uptake while AWS ate the world. They have both reversed course
there.

The predictability of whether an intent is going to be fullfilled, and
"POST /servers" is definitely pretty clear intent, is directly related
to how much will are going to be willing to use the platform, build
tools for it. If it's much more complicated to build tooling on
OpenStack IaaS because that tooling needs to put everything in it's own
retry work queue, lots of folks will just give up.

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [forum] Writing applications for the VM and Baremetal Platform

2017-05-19 Thread John Garbutt

A quick summary of what happened in the writing applications for the
VM and Baremetal forum session. The etherpad is available here:
https://etherpad.openstack.org/p/BOS-forum-using-vm-and-baremetal

We had a good number of API users and API developers talking together
about the issues facing API users. It would be nice to have involved a
more diverse set of API users, but we have a reasonable starting
place.

There was general agreement on the need for API keys for applications
to access OpenStack APIs rather than forcing the user of passwords,
etc Plan A was to have a voting excercise on the most important
problems facing writing applications for the VM and Baremetal
platform. This was abandoned because there was a clear winer in
Keystone API keys. For example, LDAP passwords give you access to more
things than OpenStack, so you probably don't want to hand those out.
Currently service configuration files have lots of service user
passwords in them, API keys for each node feels a much better
solution, etc, etc.

Saddly the people previously working on this feature are no longer
working on OpenStack. Lance has been asking for help in this email
thread, where the conversation is now continuing:
http://lists.openstack.org/pipermail/openstack-dev/2017-May/116596.html

We agreed a clear next step, once API keys are implemented, was
working out how to limit the access that is granted to a particular
API key. Discussion around this was defferred to the forum session
called "Cloud-Aware Application support", more details here:
https://etherpad.openstack.org/p/pike-forum-cloud-applications

Many thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [openstack-doc] [dev] What's up doc? Summit recap edition

2017-05-19 Thread Alexandra Settle

Hi everyone,

The OpenStack manuals project had a really productive week at the OpenStack 
summit in Boston. You can find a list of all the etherpads and attendees here: 
https://etherpad.openstack.org/p/docs-summit

As we all know, we are rapidly losing key contributors and core reviewers. We 
are not alone, this is happening across the board. It is making things harder, 
but not impossible. Since our inception in 2010, we’ve been climbing higher and 
higher trying to achieve the best documentation we could, and uphold our high 
standards. This is something to be incredibly proud of. However, we now need to 
take a step back and realise that the amount of work we are attempting to 
maintain is now out of reach for the team size that we have. At the moment we 
have 13 cores, of which none are full time contributors or reviewers. This 
includes myself.

That being said! I have spent the last week at the summit talking to some of 
our leaders, including Doug Hellmann (cc’d), Jonathan Bryce and Mike Perez 
regarding the future of the project. Between myself and other community 
members, we have been drafting plans and coming up with a new direction that 
will hopefully be sustainable in the long-term.

I am interested to hear your thoughts. I want to make sure that everyone feels 
that we’re headed in the right direction first and foremost. All of these 
action items are documented in this WIP etherpad: 
https://etherpad.openstack.org/p/doc-planning

Some further highlights from the event…


· The documentation team was represented by myself, Olga, and Alex 
Adamov for the Project Update: Documentation on the Monday. If you’d like to 
catch up with what we talked about, the video is available online now: 
https://www.youtube.com/watch?v=jcfbKxbpRvc The translation team PTL, Ian Choi, 
also had a session about getting more involved with the I18N team. You can view 
that video here: https://www.youtube.com/watch?v=ybFI4nez_Z8


· Ian and I also hosted the joint I18N and documentation onboarding 
session. We were visited by some friendly faces, and some new ones. Between Ian 
and myself, we discussed the documentation and translation workflows, and how 
to get involved (the mailing list, IRC channel, etc). Which was lots of fun :) 
we’d love to see more people there in the future, hopefully we’ll slowly get 
there!


· This week I was focusing heavily on making the community aware that 
the documentation team was struggling to maintain contributors, but continuing 
with the same amount of work. This was a heavy conversation to be having, but 
it posed some really interesting questions to key leaders, and hopefully raised 
appropriate concerns. Ildiko and I hosted “OpenStack documentation: The future 
depends on all of us”. This was a really interesting session. I was able to 
pose to the group of attendees that the documentation team was struggling to 
maintain contributions. Major Hayden was kind enough to take notes during the 
session, you can find those here: https://etherpad.openstack.org/p/doc-future  
The project teams that came and represented their groups were interested in 
discussing the project-specific documentation (is living in the project’s repo 
tree the best place?) and voiced concerns I had otherwise not heard before. I 
recommend reading the notes to get a better idea :)


· Kendall Nelson and Ildiko also hosted a session on the OpenStack 
Upstream Institute highlights. I recommend watching the video which is now live 
and available here: 
https://www.openstack.org/videos/boston-2017/openstack-upstream-institute-highlights


· One of the key takeaways from the summit was the session that I joint 
moderated with Melvin Hillsman regarding the Operations and Administration 
Guides. You can find the etherpad with notes here: 
https://etherpad.openstack.org/p/admin-ops-guides  The session was really 
helpful – we were able to discuss with the operators present the current 
situation of the documentation team, and how they could help us maintain the 
two guides, aimed at the same audience. The operator’s present at the session 
agreed that the Administration Guide was important, and could be maintained 
upstream. However, they voted and agreed that the best course of action for the 
Operations Guide was for it to be pulled down and put into a wiki that the 
operators could manage themselves. We will be looking at actioning this item as 
soon as possible.

These action items will free up the documentation team to become gate keepers 
and reviewers of documentation. Our key focus as a team will be on the tooling 
for the docs.openstack.org site (including the API docs).

I’m really interested to hear everyone’s thoughts going forward – this is not 
set in stone. We need to change our strategy, and now is the time. If you’d 
rather reach out and discuss this personally, asettle on IRC is always the best 
place to find me.

Thanks,

Alex

Re: [openstack-dev] [oslo][devstack][tooz][all] etcd 3.x as a base service

On 05/18/2017 07:31 PM, Davanum Srinivas wrote:
> Team,
> 
> Please take a look at this devstack review that adds a new etcd3 service:
> https://review.openstack.org/#/c/445432/
> 
> Waiting on infra team to help with creating a directory on
> tarballs.openstack.org with etcd release binaries as so far i haven't
> been able to get time/effort from ubuntu/debian distro folks. Fedora
> already has 3.1.x so no problem there. Another twist is that the ppc64
> arch support is not present in 3.1.x etcd.
> 
> Here are two options to enable the DLM use case with tooz (for
> eventlet based services, Note that non-eventlet based services can
> already use tooz with etcd3 with the driver added by Jay and Julien):
> https://review.openstack.org/#/c/466098/
> https://review.openstack.org/#/c/466109/
> 
> Please let me know here or in the review which one you would lean
> towards. The first one neatly separates the etcg3+v3alpha-grpc/gateway
> into a separate driver. the second one tries to be a bit more clever
> when to use grpc directly and when to use the v3alpha-grpc/gateway.

I'd like to put that sha256sum checking in before we land it and start
using it.

Also, when can we delete all the zookeeper in devstack? As the push
forward here was that this was our supported backend, I'd like to not
put devstack in the business of also zookeeper management.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)



Le 19/05/2017 12:19, John Garbutt a écrit :
> On 19 May 2017 at 10:03, Sylvain Bauza  wrote:
>>
>>
>> Le 19/05/2017 10:02, Sylvain Bauza a écrit :
>>>
>>>
>>> Le 19/05/2017 02:55, Matt Riedemann a écrit :
 The etherpad for this session is here [1]. The goal for this session was
 to inform operators and get feedback on the plan for what we're doing
 with moving claims from the computes to the control layer (scheduler or
 conductor).

 We mostly talked about retries, which also came up in the cells v2
 session that Dan Smith led [2] and will recap later.

 Without getting into too many details, in the cells v2 session we came
 to a compromise on build retries and said that we could pass hosts down
 to the cell so that the cell-level conductor could retry if needed (even
 though we expect doing claims at the top will fix the majority of
 reasons you'd have a reschedule in the first place).

>>>
>>> And during that session, we said that given cell-local conductors (when
>>> there is a reschedule) can't upcall the global (for all cells)
>>> schedulers, that's why we agreed to use the conductor to be calling
>>> Placement API for allocations.
>>>
>>>
 During the claims in the scheduler session, a new wrinkle came up which
 is the hosts that the scheduler returns to the top-level conductor may
 be in different cells. So if we have two cells, A and B, with hosts x
 and y in cell A and host z in cell B, we can't send z to A for retries,
 or x or y to B for retries. So we need some kind of post-filter/weigher
 filtering such that hosts are grouped by cell and then they can be sent
 to the cells for retries as necessary.

>>>
>>> That's already proposed for reviews in
>>> https://review.openstack.org/#/c/465175/
>>>
>>>
 There was also some side discussion asking if we somehow regressed
 pack-first strategies by using Placement in Ocata. John Garbutt and Dan
 Smith have the context on this (I think) so I'm hoping they can clarify
 if we really need to fix something in Ocata at this point, or is this
 more of a case of closing a loop-hole?

>>>
>>> The problem is that the scheduler doesn't verify the cells when trying
>>> to find a destination for an instance, it's just using weights for packing.
>>>
>>> So, for example, say I have N hosts and 2 cells, the first weighting
>>> host could be in cell1 while the second could be in cell2. Then, even if
>>> the operator uses the weighers for packing, for example a RequestSpec
>>> with num_instances=2 could push one instance in cell1 and the other in
>>> cell2.
>>>
>>> From a scheduler point of view, I think we could possibly add a
>>> CellWeigher that would help to pack instances within the same cell.
>>> Anyway, that's not related to the claims series, so we could possibly
>>> backport it for Ocata hopefully.
>>>
>>
>> Melanie actually made a good point about the current logic based on the
>> `host_subset_size`config option. If you're leaving it defaulted to 1, in
>> theory all instances coming along the scheduler would get a sorted list
>> of hosts by weights and only pick the first one (ie. packing all the
>> instances onto the same host) which is good for that (except of course
>> some user request that fits all the space of the host and where a spread
>> could be better by shuffling between multiple hosts).
>>
>> So, while I began deprecating that option because I thought the race
>> condition would be fixed by conductor claims, I think we should keep it
>> for the time being until we clearly identify whether it's still necessary.
>>
>> All what I said earlier above remains valid tho. In a world where 2
>> hosts are given as the less weighed ones, we could send instances from
>> the same user request onto different cells, but that only ties the
>> problem to a multi-instance boot problem, which is far less impactful.
> 
> FWIW, I think we need to keep this.
> 
> If you have *lots* of contention when picking your host, increasing
> host_subset_size should help reduce that contention (and maybe help
> increase the throughput). I haven't written a simulator to test it
> out, but it feels like we will still need to keep the fuzzy select.
> That might just be a different way to say the same thing mel was
> saying, not sure.
> 

Yup, agreed, thanks to Mel, that's why I'm providing a new revision that
is no longer removing this conf opt.

Melanie, very good point!

-Sylvain

> Thanks,
> johnthetubaguy
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:

[openstack-dev] [forum] [vm] Action required to help improve Ops <-> Dev feedback loops

2017-05-19 Thread John Garbutt

Hi,

On the ops list I have started a thread about the forum session sumary
and the actions needed to keep things going. Please do join in over
there:
http://lists.openstack.org/pipermail/openstack-operators/2017-May/013448.html

Many thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread John Garbutt

On 19 May 2017 at 10:03, Sylvain Bauza  wrote:
>
>
> Le 19/05/2017 10:02, Sylvain Bauza a écrit :
>>
>>
>> Le 19/05/2017 02:55, Matt Riedemann a écrit :
>>> The etherpad for this session is here [1]. The goal for this session was
>>> to inform operators and get feedback on the plan for what we're doing
>>> with moving claims from the computes to the control layer (scheduler or
>>> conductor).
>>>
>>> We mostly talked about retries, which also came up in the cells v2
>>> session that Dan Smith led [2] and will recap later.
>>>
>>> Without getting into too many details, in the cells v2 session we came
>>> to a compromise on build retries and said that we could pass hosts down
>>> to the cell so that the cell-level conductor could retry if needed (even
>>> though we expect doing claims at the top will fix the majority of
>>> reasons you'd have a reschedule in the first place).
>>>
>>
>> And during that session, we said that given cell-local conductors (when
>> there is a reschedule) can't upcall the global (for all cells)
>> schedulers, that's why we agreed to use the conductor to be calling
>> Placement API for allocations.
>>
>>
>>> During the claims in the scheduler session, a new wrinkle came up which
>>> is the hosts that the scheduler returns to the top-level conductor may
>>> be in different cells. So if we have two cells, A and B, with hosts x
>>> and y in cell A and host z in cell B, we can't send z to A for retries,
>>> or x or y to B for retries. So we need some kind of post-filter/weigher
>>> filtering such that hosts are grouped by cell and then they can be sent
>>> to the cells for retries as necessary.
>>>
>>
>> That's already proposed for reviews in
>> https://review.openstack.org/#/c/465175/
>>
>>
>>> There was also some side discussion asking if we somehow regressed
>>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>>> Smith have the context on this (I think) so I'm hoping they can clarify
>>> if we really need to fix something in Ocata at this point, or is this
>>> more of a case of closing a loop-hole?
>>>
>>
>> The problem is that the scheduler doesn't verify the cells when trying
>> to find a destination for an instance, it's just using weights for packing.
>>
>> So, for example, say I have N hosts and 2 cells, the first weighting
>> host could be in cell1 while the second could be in cell2. Then, even if
>> the operator uses the weighers for packing, for example a RequestSpec
>> with num_instances=2 could push one instance in cell1 and the other in
>> cell2.
>>
>> From a scheduler point of view, I think we could possibly add a
>> CellWeigher that would help to pack instances within the same cell.
>> Anyway, that's not related to the claims series, so we could possibly
>> backport it for Ocata hopefully.
>>
>
> Melanie actually made a good point about the current logic based on the
> `host_subset_size`config option. If you're leaving it defaulted to 1, in
> theory all instances coming along the scheduler would get a sorted list
> of hosts by weights and only pick the first one (ie. packing all the
> instances onto the same host) which is good for that (except of course
> some user request that fits all the space of the host and where a spread
> could be better by shuffling between multiple hosts).
>
> So, while I began deprecating that option because I thought the race
> condition would be fixed by conductor claims, I think we should keep it
> for the time being until we clearly identify whether it's still necessary.
>
> All what I said earlier above remains valid tho. In a world where 2
> hosts are given as the less weighed ones, we could send instances from
> the same user request onto different cells, but that only ties the
> problem to a multi-instance boot problem, which is far less impactful.

FWIW, I think we need to keep this.

If you have *lots* of contention when picking your host, increasing
host_subset_size should help reduce that contention (and maybe help
increase the throughput). I haven't written a simulator to test it
out, but it feels like we will still need to keep the fuzzy select.
That might just be a different way to say the same thing mel was
saying, not sure.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [Openstack-dev][Tacker] How to update vnf with user_data

2017-05-19 Thread Vishnu Pajjuri

Hi,
   Someone can help me to update vnf with user_data by using noop or
openwrt mgmt_driver.

Below is my tosca-config-user-data.yaml

vdus:
  VDU1:
config:
user_data: |
  #!/bin/sh
  echo "my hostname is `hostname`" > /tmp/hostname
  df -h > /tmp/diskinfo


Regards,
-vishnu
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [tc] Status update, May 19

Hi!

This new regular email will give you an update on the status of a number
of TC-proposed governance changes, in an attempt to rely less on a
weekly meeting to convey that information.

In this first version, this email will highlight a number of topics
requiring more immediate attention. If interested, you can find the full
list of open topics at:
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee


== Open discussions ==

The discussion around postgresql support in OpenStack is still very much
going on, with two slightly-different proposals up:

* Declare plainly the current state of Posgresql in OpenStack
[https://review.openstack.org/427880] (sdague)
* Document lack of postgresql support
[https://review.openstack.org/465589] (cdent)

Feel free to jump in there (or on the ML discussion at
http://lists.openstack.org/pipermail/openstack-dev/2017-May/116642.html)

We are also still discussing the details on new delays around TC voting
(in order to include all feedback before "approving"), as well as what
approval means in terms of required votes. If interested, jump on the
review @ https://review.openstack.org/463141


== Voting in progress ==

We have two items that now seem ready for voting:

* Add Queens goal split out tempest plugins
[https://review.openstack.org/369749]
* Fix comment about design summits [https://review.openstack.org/454070]


== Blocked items ==

Two new project teams were recently proposed.

The first one, Stackube, needs to be set up on OpenStack infrastructure
(and experiment with open colaboration there for a while) before the
proposal can be officially considered. However, we keep the review open
so that the discussion on scope fit with the OpenStack mission can
continue. dims has volunteer as a sponsor for this application, helping
the Stackube team through the process.

The second one is the Gluon proposal. A number of questions have been
asked, and we are waiting for the Gluon folks to address them. We could
use a TC member volunteer to "sponsor" that application, i.e. help that
team navigate through the process and be the preferred gateway for
communication. Any taker ?


== TC member actions for the coming week(s) ==

johnthetubaguy, cdent, dtroyer to distill TC vision feedback into
actionable points (and split between cosmetic and significant changes)
[https://review.openstack.org/453262]

sdague to follow up with mtreinish to see if he will push
assert:supports-api-compatibility to the end
[https://review.openstack.org/418010]

ttx to follow up with mordred to see if he will push
assert:never-breaks-compat to the end [https://review.openstack.org/446561]

johnthetubaguy to update "Describe what upstream support means" with a
new revision [https://review.openstack.org/440601]

ttx, dims to start discussion on official "help wanted" list

dims, ttx to finalize setting up TC discussion channel and defining TC
office hours

johnthetubaguy to update "Decisions should be globally inclusive" with a
new revision [https://review.openstack.org/460946]

johnthetubaguy to consider separating "Stop requiring public IRC
meetings" from parent change [https://review.openstack.org/462077]

flaper87 to update "Drop Technical Committee meetings" with a new
revision [https://review.openstack.org/459848]

dhellmann to post proposal for a policy on binary images publication,
following the thread at
http://lists.openstack.org/pipermail/openstack-dev/2017-May/116677.html''


== Need for a TC meeting next Tuesday ==

Based on the current status, nothing seems to require a synchronous TC
meeting to make progress. As discussed at the last meeting, we'll
therefore skip the weekly IRC meeting next week.

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [tripleo] CI Squad Meeting Summary (week 20)

2017-05-19 Thread Attila Darazs

If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Using RDO Cloud for OVB jobs =

We spent some time discussing the steps needed to start running a few 
OVB TripleO jobs on the new RDO Cloud, which seems to be a good shape to 
start utilizing it. We need to create new users for it and add the cloud 
definition to project-config among other things.


When all is set up, we will ramp up the amount of jobs ran there slowly 
to test the stability and bottlenecks.


= Old OVB jobs running without Quickstart =

There are a couple of jobs that is still not transitioned running on a 
few repos. We need to figure out if those jobs are still needed and if 
yes, what's holding them back from transition.


= CI jobs with containers =

We talked about possible ways to update all the containers with fresh 
and gating packages. It's not a trivial problem and we will probably 
involve more container folks in it. The current idea is to create a 
container that could locally serve the DLRN hash packages, avoiding 
downloading them for each containers. However this will be still an IO 
intensive solution, but probably there's no way around it.


= Gate instability, critical bug =

The pingest failures are still plaguing the ovb-ha job, we really need a 
solution for this critical bug[1], as it fails around ~30 percent of the 
time. Please take a look if you can!


Thank you for reading the summary.

Best regards,
Attila

[1] https://bugs.launchpad.net/tripleo/+bug/1680195

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)



Le 19/05/2017 10:02, Sylvain Bauza a écrit :
> 
> 
> Le 19/05/2017 02:55, Matt Riedemann a écrit :
>> The etherpad for this session is here [1]. The goal for this session was
>> to inform operators and get feedback on the plan for what we're doing
>> with moving claims from the computes to the control layer (scheduler or
>> conductor).
>>
>> We mostly talked about retries, which also came up in the cells v2
>> session that Dan Smith led [2] and will recap later.
>>
>> Without getting into too many details, in the cells v2 session we came
>> to a compromise on build retries and said that we could pass hosts down
>> to the cell so that the cell-level conductor could retry if needed (even
>> though we expect doing claims at the top will fix the majority of
>> reasons you'd have a reschedule in the first place).
>>
> 
> And during that session, we said that given cell-local conductors (when
> there is a reschedule) can't upcall the global (for all cells)
> schedulers, that's why we agreed to use the conductor to be calling
> Placement API for allocations.
> 
> 
>> During the claims in the scheduler session, a new wrinkle came up which
>> is the hosts that the scheduler returns to the top-level conductor may
>> be in different cells. So if we have two cells, A and B, with hosts x
>> and y in cell A and host z in cell B, we can't send z to A for retries,
>> or x or y to B for retries. So we need some kind of post-filter/weigher
>> filtering such that hosts are grouped by cell and then they can be sent
>> to the cells for retries as necessary.
>>
> 
> That's already proposed for reviews in
> https://review.openstack.org/#/c/465175/
> 
> 
>> There was also some side discussion asking if we somehow regressed
>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>> Smith have the context on this (I think) so I'm hoping they can clarify
>> if we really need to fix something in Ocata at this point, or is this
>> more of a case of closing a loop-hole?
>>
> 
> The problem is that the scheduler doesn't verify the cells when trying
> to find a destination for an instance, it's just using weights for packing.
> 
> So, for example, say I have N hosts and 2 cells, the first weighting
> host could be in cell1 while the second could be in cell2. Then, even if
> the operator uses the weighers for packing, for example a RequestSpec
> with num_instances=2 could push one instance in cell1 and the other in
> cell2.
> 
> From a scheduler point of view, I think we could possibly add a
> CellWeigher that would help to pack instances within the same cell.
> Anyway, that's not related to the claims series, so we could possibly
> backport it for Ocata hopefully.
> 

Melanie actually made a good point about the current logic based on the
`host_subset_size`config option. If you're leaving it defaulted to 1, in
theory all instances coming along the scheduler would get a sorted list
of hosts by weights and only pick the first one (ie. packing all the
instances onto the same host) which is good for that (except of course
some user request that fits all the space of the host and where a spread
could be better by shuffling between multiple hosts).

So, while I began deprecating that option because I thought the race
condition would be fixed by conductor claims, I think we should keep it
for the time being until we clearly identify whether it's still necessary.

All what I said earlier above remains valid tho. In a world where 2
hosts are given as the less weighed ones, we could send instances from
the same user request onto different cells, but that only ties the
problem to a multi-instance boot problem, which is far less impactful.



> 
>> We also spent a good chunk of the session talking about overhead
>> calculations for memory_mb and disk_gb which happens in the compute and
>> on a per-hypervisor basis. In the absence of automating ways to adjust
>> for overhead, our solution for now is operators can adjust reserved host
>> resource values (vcpus, memory, disk) via config options and be
>> conservative or aggressive as they see fit. Chris Dent and I also noted
>> that you can adjust those reserved values via the placement REST API but
>> they will be overridden by the config in a periodic task - which may be
>> a bug, if not at least a surprise to an operator.
>>
>> We didn't really get into this during the forum session, but there are
>> different opinions within the nova dev team on how to do claims in the
>> controller services (conductor vs scheduler). Sylvain Bauza has a series
>> which uses the conductor service, and Ed Leafe has a series using the
>> scheduler. More on that in the mailing list [3].
>>
> 
> Sorry, but I do remember we had a consensus on using conductor at least
> during the cells v2 session.
> 
> What I'm a bit afraid is that we're duplicating efforts on a sole
> blueprint while we all agreed to go that way.
> 
>> Next steps are going to be weighing both options between Sylvain and Ed,
>>

Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-19 Thread Julien Danjou

On Fri, May 19 2017, Mehdi Abaakouk wrote:

> Not really, I just put some comments on reviews and discus this on IRC.
> Since nobody except Telemetry have expressed/try to get rid of eventlet.

TBH Keystone get rid of it too. But they only provide WSGI servers. They
don't build any daemon so they don't need and user either Cotyledon nor
oslo.service. :)

> Because the internal of cotyledon and oslo.service are so different.
> Having the code in oslo or not doesn't help for maintenance anymore.
> Cotyledon is a lib, code and bugs :) can already be shared between
> projects that doesn't want eventlet.

Cotyledon is explicitly better just by being out of Oslo, because it's
usable by the whole Python ecosystem. :)

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info

signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [release] Release countdown for week R-14 and R-13, May 22 - June 2

Welcome to our regular release countdown email!

Development Focus
-

At this stage, a few weeks before the Pike-2 milestone, work on major
feature work should be well under way. Team members who had the chance
to attend the Forum in Boston should collect their thoughts and bring
the feedback they received back to the rest of their team.


Actions
---

The release membership freeze[0] hits at the same date as the Pike-2
milestone, on June 8. If you want a new deliverable to be included in
the Pike release, you should publish a milestone or an intermediary
release for it before that date !

A significant number of projects are late answering for their Pike
series goals (answer was due for the Pike-1 milestone on April 13).

For the python35 goal [1]:
cinder, cloudkitty, designate, dragonflow, ec2api, freezer, fuel,
horizon, infrastructure, kolla, magnum, manila, mistral, charms,
openstackclient, oslo, packaging-{deb,rpm}, rally, refstack,
requirements, sahara, searchlight, security, senlin, swift, tacker,
telemetry, tricircle, winstackers, zaqar

For the deploy-api-in-wsgi goal [2]:
cinder, cloudkitty, designate, dragonflow, ec2api, freezer, fuel,
horizon, infrastructure, kolla, magnum, manila, mistral, charms,
openstackclient, oslo, packaging-{deb,rpm}, rally, refstack,
searchlight, security, senlin, swift, tacker, telemetry, tricircle,
winstackers, zaqar

Note that you have to submit a response even if the goal ends up not
triggering any work. See [3] for details.

[0] https://releases.openstack.org/pike/schedule.html#p-mf
[1]
http://git.openstack.org/cgit/openstack/governance/tree/goals/pike/python35.rst
[2]
http://git.openstack.org/cgit/openstack/governance/tree/goals/pike/deploy-api-in-wsgi.rst
[3] https://governance.openstack.org/tc/goals/index.html


Upcoming Deadlines & Dates
--

Pike-2 milestone: June 8
Queens PTG in Denver: Sept 11-15

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [gnocchi] Running Gnocchi API in specific interface

2017-05-19 Thread aalvarez

Understood. 

I have submitted a patch to pbr for review here:
https://review.openstack.org/#/c/466225/



--
View this message in context: 
http://openstack.10931.n7.nabble.com/gnocchi-Running-Gnocchi-API-in-specific-interface-tp135004p135095.html
Sent from the Developer mailing list archive at Nabble.com.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-19 Thread Mehdi Abaakouk


On Thu, May 18, 2017 at 03:16:20PM -0400, Mike Bayer wrote:



On 05/18/2017 02:37 PM, Julien Danjou wrote:

On Thu, May 18 2017, Mike Bayer wrote:


I'm not understanding this?  do you mean this?


In the long run, yes. Unfortunately, we're not happy with the way Oslo
libraries are managed and too OpenStack centric. I've tried for the last
couple of years to move things on, but it's barely possible to deprecate
anything and contribute, so I feel it's safer to start fresh and better
alternative. Cotyledon by Mehdi is a good example of what can be
achieved.



here's cotyledon:

https://cotyledon.readthedocs.io/en/latest/


replaces oslo.service with a multiprocessing approach that doesn't use 
eventlet.  great!  any openstack service that rides on oslo.service 
would like to be able to transparently switch from eventlet to 
multiprocessing the same way they can more or less switch to mod_wsgi 
at the moment.   IMO this should be part of oslo.service itself.   


I have quickly presented cotyledon some summit ago, we said we will wait
to see if other projects want to get rid of eventlet before adopting
such new lib (or merge it with oslo.service).

But for now, the lib is still under telemetry umbrella.

Keeping the current API and supporting both are (I think) impossible.
The current API is too eventlet centric. And some applications rely
on implicit internal contract/behavior/assumption.

Dealing about concurrent/thread/signal safety in multithreading app or
eventlet app is already hard enough. So having the lib that deals with
both is even harder. We already have oslo.messaging that deals with
3 threads models, this is just an unending story of race conditions.

Since a new API is needed, why not writing a new lib. Anyways when you
get rid of eventlet you have so many thing to change to ensure your
performance will not drop. Changing from oslo.service to cotyledon is
really easy on the side.

Docs state: "oslo.service being impossible to fix and bringing an 
heavy dependency on eventlet, "  is there a discussion thread on that?


Not really, I just put some comments on reviews and discus this on IRC.
Since nobody except Telemetry have expressed/try to get rid of eventlet.

For the story we first get rid of eventlet in Telemetry, fixes couple of
performance issue due to using threading/process instead
greenlet/greenthread.

Then we fall into some weird issue due to oslo.service internal
implementation. Process not exiting properly, signals not received,
deadlock when signal are received, unkillable process,
tooz/oslo.messaging heartbeat not scheduled correctly, worker not
restarted when they are dead. All of what we expect from oslo.service
was not working correctly anymore because we remove the line
'eventlet.monkeypatch()'.

For example, when oslo.service receive a signal, it can arrive on any
thread, this thread is paused, the callback is run in this thread
context, but if the callback try to discus to your code in this thread,
the process lockup, because your code is paused. Python
offers tool to avoid that (signal.set_wakeup_fd), but oslo.service don't
use it. I have tried to run callbacks only on the main thread with
set_wakeup_fd, to avoid this kind of issue but I fail. The whole
oslo.service code is clearly not designed to be threadsafe/signalsafe.
Well, it works for eventlet because you have only one real thread.

And this is just one example on complicated thing I have tried to fix,
before starting cotyledon.

I'm finding it hard to believe that only a few years ago, everyone saw 
the wisdom of not re-implementing everything in their own projects and 
using a common layer like oslo, and already that whole situation is 
becoming forgotten - not just for consistency, but also when a bug is 
found, if fixed in oslo it gets fixed for everyone.


Because the internal of cotyledon and oslo.service are so different.
Having the code in oslo or not doesn't help for maintenance anymore.
Cotyledon is a lib, code and bugs :) can already be shared between
projects that doesn't want eventlet.

An increase in the scope of oslo is essential to dealing with the 
issue of "complexity" in openstack. 


Increasing the scope of oslo works only if libs have maintainers. But
most of them lack of people today. Most of oslo libs are in maintenance
mode. But that another subject.

The state of openstack as dozens 
of individual software projects each with their own idiosyncratic 
quirks, CLIs, process and deployment models, and everything else that 
is visible to operators is ground zero for perceived operator 
complexity.


Cotyledon have been written to be Openstack agnostic. But I have also
write an optional module within the library to glue oslo.config and
cotyledon. Mainly to mimic the oslo.config options/reload of
oslo.service and make operators experience unchanged for Openstack
people.

--
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht


signature.asc
Description: PGP signature

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

Matt Riedemann wrote:
> [...]
> Am I missing the point, or is the pendulum really swinging away from
> PaaS layer services which abstract the dirty details of the lower-level
> IaaS APIs? Or was this always something people wanted and I've just
> never made the connection until now?

I feel like this is driven by a need for better UX on the IaaS APIs
layer (less calls, or more intuitive calls, as shown by shade UI). Even
if that IaaS layer is mostly accessed programmatically, it's not an
excuse for requiring 5 convoluted API calls and reading 5 pages of doc
for a basic action, when you could make it a single call.

So I'm not sure it's a recent change, or that it shows the demise of
PaaS layers, but that certainly shows that direct usage of IaaS APIs is
still a thing. If anything, the rise of application orchestration
frameworks like Kubernetes only separated the concerns -- provisioning
of application clusters might be done by someone else, but it still is
done by someone.

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)