Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-26 Thread Zane Bitter

On 25/05/17 18:34, Matt Riedemann wrote:

On 5/22/2017 11:01 AM, Zane Bitter wrote:

If the user does a stack update that changes the network from 'auto'
to 'none', or vice-versa.


OK I guess we should make this a side discussion at some point, or hit 
me up in IRC, but if you're requesting networks='none' with microversion 
>= 2.37 then nova should not allocate any networking, it should not 
even attempt to do so.

Maybe the issue is the server is created with networks='auto' and has a
port, and then when you 'update the stack' it doesn't delete that server
and create a new one, but it tries to do something with the same server,


Yes, exactly. There are circumstances where Heat will replace a server 
because of a change in the configuration, but we want to have as few as 
possible of them and this is not one.



and in this case you'd have to detach the port(s) that were previously
created?


Yep, although this part is not that much different from what we had to 
do already when ports/networks change. The new part is handling the case 
where the user updates the network from 'none' -> 'auto'.



I don't know how Heat works, but if that's the case, then yeah that
doesn't sound fun, but I think Nova provides the APIs to be able to do
this.


Yep, it's all possible, since Nova talks to Neutron over a public API. 
Here is the implementation in Heat:


https://review.openstack.org/#/c/407328/16/heat/engine/resources/openstack/nova/server_network_mixin.py

The downside is that (in the update case) Heat has to call Neutron's 
get_auto_allocated_topology() itself rather than let Nova do it, so we 
now have some amount of duplicated logic that has to be kept in sync if 
anything ever changes in Nova/Neutron. It's definitely not the end of 
the world, but it's not entirely ideal either.
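
For illustration, a minimal sketch of that duplicated call, assuming 
python-neutronclient, an already-authenticated keystoneauth session 
`sess`, and a `project_id` (the real logic lives in the review linked 
above):

    from neutronclient.v2_0 import client as neutron_client

    neutron = neutron_client.Client(session=sess)

    # The same call Nova makes internally for networks='auto': ask
    # Neutron for (and, if needed, build) the project's default topology.
    topology = neutron.get_auto_allocated_topology(project_id)
    network_id = topology['auto_allocated_topology']['id']

    # The returned network id can then be used to create a port to
    # attach to the existing server.
    port = neutron.create_port({'port': {'network_id': network_id}})['port']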


cheers,
Zane.



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-26 Thread Jay Pipes

On 05/26/2017 02:53 AM, Chris Friesen wrote:

On 05/19/2017 04:06 PM, Dean Troyer wrote:
On Fri, May 19, 2017 at 4:01 PM, Matt Riedemann wrote:

I'm confused by this. Creating a server takes a volume ID if you're 
booting from volume, and that's actually preferred (by nova devs) since 
then Nova doesn't have to orchestrate the creation of the volume in the 
compute service and then poll until it's available.

Same for ports - nova can create the port (default action) or get a 
port at server creation time, which is required if you're doing trunk 
ports or sr-iov / fancy pants ports.

Am I misunderstanding what you're saying is missing?


It turns out those are bad examples, they do accept IDs.


I was actually suggesting that maybe these commands in nova should 
*only* take IDs, and that nova itself should not set up either block 
storage or networking for you.


It seems non-intuitive to me that nova will do some basic stuff for you, 
but if you want something more complicated then you need to go do it a 
totally different way.


It seems to me that it'd be more logical if we always set up 
volumes/ports first, then passed the resulting UUIDs to nova.  This 
could maybe be hidden from the end-user by doing it in the client or 
some intermediate layer, but arguably nova proper shouldn't be in the 
proxying business.
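
To make that concrete, a hedged sketch of the "set everything up first, 
then pass only IDs" flow using the existing Python clients; `sess`, 
`image_id`, `network_id` and `flavor_id` are assumed, and polling for 
the volume to become available is elided:

    from cinderclient import client as cinder_client
    from neutronclient.v2_0 import client as neutron_client
    from novaclient import client as nova_client

    cinder = cinder_client.Client('3', session=sess)
    neutron = neutron_client.Client(session=sess)
    nova = nova_client.Client('2.37', session=sess)

    # Set up block storage and networking first, with full control over
    # volume type, port attributes, etc. ...
    volume = cinder.volumes.create(size=10, imageRef=image_id,
                                   volume_type='fast')
    port = neutron.create_port({'port': {'network_id': network_id}})['port']

    # ... then hand Nova nothing but the resulting IDs.
    server = nova.servers.create(
        'demo', image=None, flavor=flavor_id,
        nics=[{'port-id': port['id']}],
        block_device_mapping_v2=[{'uuid': volume.id,
                                  'source_type': 'volume',
                                  'destination_type': 'volume',
                                  'boot_index': 0}])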


You are describing the porcelain API that we've been talking about. :)

Viva enamel!

-jay



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-26 Thread Chris Friesen

On 05/19/2017 04:06 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 4:01 PM, Matt Riedemann wrote:

I'm confused by this. Creating a server takes a volume ID if you're booting
from volume, and that's actually preferred (by nova devs) since then Nova
doesn't have to orchestrate the creation of the volume in the compute
service and then poll until it's available.

Same for ports - nova can create the port (default action) or get a port at
server creation time, which is required if you're doing trunk ports or
sr-iov / fancy pants ports.

Am I misunderstanding what you're saying is missing?


It turns out those are bad examples, they do accept IDs.


I was actually suggesting that maybe these commands in nova should *only* take 
IDs, and that nova itself should not set up either block storage or networking 
for you.


It seems non-intuitive to me that nova will do some basic stuff for you, but if 
you want something more complicated then you need to go do it a totally 
different way.


It seems to me that it'd be more logical if we always set up volumes/ports 
first, then passed the resulting UUIDs to nova.  This could maybe be hidden from 
the end-user by doing it in the client or some intermediate layer, but arguably 
nova proper shouldn't be in the proxying business.


Lastly, the existence of a partial proxy means that people ask for a more 
complete proxy--for example, specifying the vnic_type (for a port) or volume 
type (for a volume) when booting an instance.


Chris



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-25 Thread Matt Riedemann

On 5/23/2017 10:23 AM, Zane Bitter wrote:
Yes! Everything is much easier if you tell all the users to re-architect 
their applications from scratch :) Which, I mean, if you can... great! 
Meanwhile here on planet Earth, it's 2017 and 95% of payment card 
transactions are still processed using COBOL at some point. (Studies 
show that 79% of statistics are made up, but I actually legit read this 
last week.)


That's one reason I don't buy any of the 'OpenStack is dead' commentary. 
If we respond appropriately to the needs of users who run a *mixture* of 
legacy, cloud-aware, and cloud-native applications then OpenStack will 
be relevant for a very long time indeed.


I enjoyed this, thank you.

--

Thanks,

Matt



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-25 Thread Matt Riedemann

On 5/22/2017 11:01 AM, Zane Bitter wrote:
If the user does a stack update that changes the network from 'auto' to 
'none', or vice-versa.


OK I guess we should make this a side discussion at some point, or hit 
me up in IRC, but if you're requesting networks='none' with microversion 
>= 2.37 then nova should not allocate any networking, it should not 
even attempt to do so.


Maybe the issue is the server is created with networks='auto' and has a 
port, and then when you 'update the stack' it doesn't delete that server 
and create a new one, but it tries to do something with the same server, 
and in this case you'd have to detach the port(s) that were previously 
created?


I don't know how Heat works, but if that's the case, then yeah that 
doesn't sound fun, but I think Nova provides the APIs to be able to do this.


--

Thanks,

Matt



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-25 Thread Chris Friesen

On 05/20/2017 10:36 AM, Monty Taylor wrote:

On 05/19/2017 03:13 PM, Monty Taylor wrote:

On 05/19/2017 01:53 PM, Sean Dague wrote:

On 05/19/2017 02:34 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 1:04 PM, Sean Dague wrote:

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper
documentation,
performance implications, and the like), not an end game on their own.


I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).


Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.


You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".


Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.


Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read only data (without any caching) is
the kind of problem that arises from "just fix it in an upper layer". So
it does provide an experience at a cost.


We also search all resources by name-or-id in shade. But it's one
call - GET /images - and then we test to see if the given value matches
the name field or the id field. And there is caching, so the list call
is done once in the session.
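
A minimal sketch of that lookup, assuming `images` holds the one cached 
GET /images result:

    def find_by_name_or_id(resources, name_or_id):
        # One list call already fetched (and cached) everything; the
        # match against either field happens locally.
        return [r for r in resources
                if r.get('id') == name_or_id or r.get('name') == name_or_id]

    matches = find_by_name_or_id(images, 'ubuntu-16.04')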

The thing I'm the saddest about is the Nova flavor "extra_info" that one
needs to grab for backwards compat but almost never has anything useful
in it. This causes me to make a billion API calls for the initial flavor
list (which is then cached, of course). It would be WAY nicer if there was 
a GET /flavors/detail that would just get me the whole lot in one go, fwiw.
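
The N+1 pattern being lamented, sketched with python-novaclient (a 
`nova` client object is assumed; GET /flavors/detail returns the flavors 
but not their extra specs, hence the per-flavor round trips):

    flavors = nova.flavors.list()      # one GET /flavors/detail
    specs = {}
    for flavor in flavors:
        # ...plus one GET /flavors/{id}/os-extra_specs per flavor.
        specs[flavor.id] = flavor.get_keys()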


Quick follow up on this one.

It was "extra_specs" I was thinking about - not "extra_info"

It used to be included in the flavor as part of an extension (with a longer 
name); we fetch them in shade for backwards compat with the days when they 
were just there. However, I've also learned from a follow-up in IRC that 
these aren't really things that were intended for me.


For what it's worth, there are cases where extra_specs are important to normal 
users because they constrain what image properties you are allowed to set.


Things like cpu_policy, cpu_thread_policy, memory page size, number of NUMA 
nodes, etc. can all be set in both places, and they behave differently if there 
is a mismatch between the flavor extra_spec and the image property.


Because of this I think it makes sense for a normal person to be able to look at 
flavor extra_specs so that they can create an image with suitable properties to 
be able to boot up an instance with that image on that flavor.
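
As a sketch of the two sides involved (a `flavor` object from 
python-novaclient and a glance v2 client are assumed; the property names 
are the standard hw ones):

    # Flavor side: extra_specs, typically set by an admin.
    flavor.set_keys({'hw:cpu_policy': 'dedicated',
                     'hw:cpu_thread_policy': 'prefer'})

    # Image side: properties, set by a normal user. These need to be
    # compatible with the flavor's extra_specs, which is why users need
    # to be able to read the latter.
    glance.images.update(image_id, hw_cpu_policy='dedicated')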


Chris



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-23 Thread Zane Bitter

On 22/05/17 22:58, Jay Pipes wrote:

On 05/22/2017 12:01 PM, Zane Bitter wrote:

On 19/05/17 17:59, Matt Riedemann wrote:

I'm not really sure what you're referring to here with 'update' and [1].
Can you expand on that? I know it's a bit of a tangent.


If the user does a stack update that changes the network from 'auto'
to 'none', or vice-versa.


Detour here, apologies...

Why would it matter whether a user changes a stack definition for some
resource from auto-created network to none? Why would you want
*anything* to change about instances that had already been created by
Heat with the previous version of the stack definition?


The short answer is that's just how Heat works. A large part of Heat's 
value is the ability to make changes to your application over time by 
describing it declaratively. (In the past I've compared this to 
the advantage configuration management tools provided over shell scripts 
- e.g. in 
https://www.openstack.org/videos/atlanta-2013/introduction-to-openstack-orchestration).



In other words, why shouldn't the change to the stack simply affect
*new* resources that the stack might create?


Our job is to make the world look like the template the user provides. 
If the user changes something, Heat takes them seriously and does not 
imagine that it knows better than the user what the user wants. If the 
user doesn't want to change anything then they're welcome to not change 
the template.


(We *could* do better on protection against accidental changes... 
there's an update-preview command and ways of marking resources as 
immutable such that updates will fail if they try to change it, but I 
don't know that the workflow/UX is great. There are some technical 
limitations on how much we can even determine in update-preview.)



After all, get-me-a-network
is intended for instance *creation* and nothing else...


So it may be intended for that, but there's any number of legitimate 
reasons why a user might want to change things after the server is created:


* Server was created with network: none, but something went horribly 
wrong and now you need to ssh in to debug it.
* Server was created with network: auto, but it was compromised by an 
attacker and now you want to get it off the network while you conduct a 
post-mortem through the console.
* Server was created with network: auto, but now you need more 
sophisticated networking and you don't want to delete your server and 
all its data to change it.




That's why it's dangerous, as Matt said in another part of the thread, 
to just do the easy part of the job (create) and forget about how a 
feature will interact with all of the other things that can happen over 
time. At the very least you want a way for users to move from the 'easy' 
way to the 'full control' way without starting over. (Semi-professional 
cameras and digital oscilloscopes are a couple of examples of where this 
is routinely done very well.)


(None of this is to suggest that get-me-a-network is a particularly bad 
offender here - it isn't IMO.)



Why not treat already-provisioned resources of a stack as immutable once
provisioned? That is, after all, one of the primary benefits of a "cloud
native application" -- immutability of application images once deployed
and the clean separation of configuration from data.


I could equally ask why Nova and Neutron allow stuff to be changed after 
it has been provisioned? Heat is only providing an interface to public 
APIs that exist. You can bet that if we told our users that they can't 
use those APIs because we know better than them, we'd have a long list 
of feature requests and many fewer users.


There are some things that cannot be changed through the underlying APIs 
once a resource is created, and in those cases we mark the property with 
'update_allowed=False' in the resource schema. However, if it _does_ 
change then Heat will create a _new_ resource with the property value 
you want, and delete the original. So we could have done that with the 
get-me-a-network thing, but it wouldn't have been the Right Thing for 
our users.
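
For reference, a minimal sketch of how such a property is marked in a 
Heat resource's schema (not a complete resource plug-in):

    from heat.engine import properties

    properties_schema = {
        'network': properties.Schema(
            properties.Schema.STRING,
            'Network to attach the server to.',
            # update_allowed=False: a change to this property cannot be
            # applied in place, so Heat replaces the resource instead.
            update_allowed=False,
        ),
    }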



This is one of the reasons that the (application) container world has it
easy with regards to resource management.


Yes! Everything is much easier if you tell all the users to re-architect 
their applications from scratch :) Which, I mean, if you can... great! 
Meanwhile here on planet Earth, it's 2017 and 95% of payment card 
transactions are still processed using COBOL at some point. (Studies 
show that 79% of statistics are made up, but I actually legit read this 
last week.)


That's one reason I don't buy any of the 'OpenStack is dead' commentary. 
If we respond appropriately to the needs of users who run a *mixture* of 
legacy, cloud-aware, and cloud-native applications then OpenStack will 
be relevant for a very long time indeed.



If you need to change the
sizing of a deployment [1], Kubernetes doesn't need to go through all
the hoops we do in 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-22 Thread Jay Pipes

On 05/22/2017 12:01 PM, Zane Bitter wrote:

On 19/05/17 17:59, Matt Riedemann wrote:

I'm not really sure what you're referring to here with 'update' and [1].
Can you expand on that? I know it's a bit of a tangent.


If the user does a stack update that changes the network from 'auto' to 
'none', or vice-versa.


Detour here, apologies...

Why would it matter whether a user changes a stack definition for some 
resource from auto-created network to none? Why would you want 
*anything* to change about instances that had already been created by 
Heat with the previous version of the stack definition?


In other words, why shouldn't the change to the stack simply affect 
*new* resources that the stack might create? After all, get-me-a-network 
is intended for instance *creation* and nothing else...


Why not treat already-provisioned resources of a stack as immutable once 
provisioned? That is, after all, one of the primary benefits of a "cloud 
native application" -- immutability of application images once deployed 
and the clean separation of configuration from data.


This is one of the reasons that the (application) container world has it 
easy with regards to resource management. If you need to change the 
sizing of a deployment [1], Kubernetes doesn't need to go through all 
the hoops we do in resize/migrate/live-migrate. They just blow away one 
or more of the application container replicas [2] and start up new ones. 
[3] Of course, this doesn't work out so well with stateful applications 
(aka the good ol' Nova VM), which is why there's a whole slew of 
constraints on the automatic orchestration potential of StatefulSets in 
Kubernetes [4], constraints that (surprise!) map pretty much one-to-one 
with all the Heat resource dependency management bugs that you 
highlighted in a previous ML response (network identifier is static and 
must follow a pre-described pattern, storage for all pods in the 
StatefulSet must be a PersistentVolume, updating a StatefulSet is 
currently a manual process, etc).


Best,
-jay

[1] A deployment in the Kubernetes sense of the term, ala
https://kubernetes.io/docs/concepts/workloads/controllers/deployment

[2] 
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L508


[3] In fact, changing the size/scale of a deployment *does not* 
automatically trigger any action in Kubernetes. Only changes to the 
configuration of the deployment's containers (.spec.template) will 
automatically trigger some action being taken.


[4] 
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations




Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-22 Thread Jay Pipes

On 05/21/2017 03:56 PM, Monty Taylor wrote:

On 05/19/2017 05:10 PM, Matt Riedemann wrote:

On 5/19/2017 3:35 PM, Monty Taylor wrote:

Heck - while I'm on floating ips ... if you have some pre-existing
floating ips and you want to boot servers on them and you want to do
that in parallel, you can't. You can boot a server with a floating ip
that did not pre-exist if you get the port id of the fixed ip of the
server then pass that id to the floating ip create call. Of course,
the server doesn't return the port id in the server record, so at the
very least you need to make a GET /ports.json?device_id={server_id}
call. Of course what you REALLY need to find is the port_id of the ip
of the server that came from a subnet that has 'gateway_ip' defined,
which is even more fun since ips are associated with _networks_ on the
server record and not with subnets.
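
Spelled out, that dance looks roughly like this with python-neutronclient 
(a `neutron` client, `server_id` and `external_net_id` are assumed; error 
handling and the parallel-boot race are omitted):

    ports = neutron.list_ports(device_id=server_id)['ports']

    # Find the port whose fixed ip came from a subnet with a gateway.
    target_port_id = None
    for port in ports:
        for fixed_ip in port['fixed_ips']:
            subnet = neutron.show_subnet(fixed_ip['subnet_id'])['subnet']
            if subnet.get('gateway_ip'):
                target_port_id = port['id']

    neutron.create_floatingip({'floatingip': {
        'floating_network_id': external_net_id,
        'port_id': target_port_id}})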


A few weeks ago I think we went down this rabbit hole in the nova
channel, which led to this etherpad:

https://etherpad.openstack.org/p/nova-os-interfaces

It was really a discussion about the weird APIs that nova has and a lot
of the time our first question is, "why does it return this, or that, or
how is this consumed even?", at which point we put out the Monty signal.


That was a fun conversation!


During a seemingly unrelated forum session on integrating searchlight
with nova-api, operators in the room were saying they wanted to see
ports returned in the server response body, which I think Monty was also
saying when we were going through that etherpad above.


I'd honestly like the contents you get from os-interfaces to just always be 
returned as part of the server record. Having it as a second REST call 
isn't terribly helpful - if I need to make an additional call per 
server, I might as well just go call neutron. That way the only 
per-server query I really need to make is GET 
/ports.json?device_id={server_id} - since networks and subnets can be 
cached.


However, if I could do GET /servers/really-detailed or something and get 
/servers/detail + /os-interfaces in one go for all of the servers in my 
project, that would be an efficiency win.


It seems you're really asking us to get rid of REST and implement a 
GraphQL API.


Best,
-jay



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-22 Thread Zane Bitter

On 19/05/17 17:59, Matt Riedemann wrote:

On 5/19/2017 9:36 AM, Zane Bitter wrote:


The problem is that orchestration done inside APIs is very easy to do
badly in ways that cause lots of downstream pain for users and
external orchestrators. For example, Nova already does some
orchestration: it creates a Neutron port for a server if you don't
specify one. (And then promptly forgets that it has done so.) There is
literally an entire inner platform, an orchestrator within an
orchestrator, inside Heat to try to manage the fallout from this. And
the inner platform shares none of the elegance, such as it is, of Heat
itself, but is rather a collection of cobbled-together hacks to deal
with the seemingly infinite explosion of edge cases that we kept
running into over a period of at least 5 releases.


I'm assuming you're talking about how nova used to (years ago) not keep
track of which ports it created and which ones were provided when
creating a server or attaching ports to an existing server. That was
fixed quite a while ago, so I assume anything in Heat at this point is no
longer necessary and if it is, then it's a bug in nova. i.e. if you
provide a port when creating a server, when you delete the server, nova
should not delete the port. If nova creates the port and you delete the
server, nova should then delete the port also.


Yeah, you're right, I believe that (long-fixed) bug may have been the 
genesis of it: https://bugs.launchpad.net/nova/+bug/1158684 but I could 
be mixing some issues up in my head, because I personally haven't done a 
lot of reviews in this specific area of the code.


Here is the most recent corner-case fix, which is a good example some of 
the subtleties involved in managing a combination of explicit and 
'magical' interactions with other resources:


https://review.openstack.org/#/c/450724/2/heat/engine/resources/openstack/nova/server_network_mixin.py


The get-me-a-network thing is... better, but there's no provision for
changes after the server is created, which means we have to copy-paste
the Nova implementation into Heat to deal with update.[1] Which sounds
like a maintenance nightmare in the making. That seems to be a common
mistake: to assume that once users create something they'll never need
to touch it again, except to delete it when they're done.


I'm not really sure what you're referring to here with 'update' and [1].
Can you expand on that? I know it's a bit of a tangent.


If the user does a stack update that changes the network from 'auto' to 
'none', or vice-versa.



Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done
superbly well, provide transparency for external orchestration tools
that need to hook in to the data flow, and should be developed in
consultation with potential consumers like Shade and Heat.


Agree, this is why we push back on baking in more orchestration into
Nova, because we generally don't do it well, or don't test it well, and
end up having half-baked things which are a constant source of pain,
e.g. boot from volume - that might work fine when creating and deleting
a server, but what happens when you try to migrate, resize, rebuild,
evacuate or shelve that server?


Yeah, exactly. There is a really long tail of stuff that is easy to forget.


Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?


(Aside: can we stop using the term 'PaaS' to refer to "everything that
Nova doesn't do"? This habit is not helping us to communicate clearly.)


Sorry, as I said in response to sdague elsewhere in this thread, I tend
to lump PaaS and orchestration / porcelain tools together, but that's
not my intent in starting this thread. I was going to say we should have
a glossary for terms in OpenStack, but we do, and both are listed. :)

https://docs.openstack.org/user-guide/common/glossary.html


Hmm, I don't love the example in that definition either.
https://review.openstack.org/466773

cheers,
Zane.



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-21 Thread Monty Taylor

On 05/19/2017 05:10 PM, Matt Riedemann wrote:

On 5/19/2017 3:35 PM, Monty Taylor wrote:

Heck - while I'm on floating ips ... if you have some pre-existing
floating ips and you want to boot servers on them and you want to do
that in parallel, you can't. You can boot a server with a floating ip
that did not pre-exist if you get the port id of the fixed ip of the
server then pass that id to the floating ip create call. Of course,
the server doesn't return the port id in the server record, so at the
very least you need to make a GET /ports.json?device_id={server_id}
call. Of course what you REALLY need to find is the port_id of the ip
of the server that came from a subnet that has 'gateway_ip' defined,
which is even more fun since ips are associated with _networks_ on the
server record and not with subnets.


A few weeks ago I think we went down this rabbit hole in the nova
channel, which led to this etherpad:

https://etherpad.openstack.org/p/nova-os-interfaces

It was really a discussion about the weird APIs that nova has and a lot
of the time our first question is, "why does it return this, or that, or
how is this consumed even?", at which point we put out the Monty signal.


That was a fun conversation!


During a seemingly unrelated forum session on integrating searchlight
with nova-api, operators in the room were saying they wanted to see
ports returned in the server response body, which I think Monty was also
saying when we were going through that etherpad above.


I'd honestly like the contents you get from os-interfaces to just always be 
returned as part of the server record. Having it as a second REST call 
isn't terribly helpful - if I need to make an additional call per 
server, I might as well just go call neutron. That way the only 
per-server query I really need to make is GET 
/ports.json?device_id={server_id} - since networks and subnets can be 
cached.


However, if I could do GET /servers/really-detailed or something and get 
/servers/detail + /os-interfaces in one go for all of the servers in my 
project, that would be an efficiency win.



This goes back to a common issue we/I have in nova which is we don't
know who is using which APIs and how. The user survey isn't going to
give us this data. Operators probably don't have this data, unless they
are voicing it as API users themselves. But it would be really useful to
know, which gaps are various tools in the ecosystem needing to overcome
by making multiple API calls to possibly multiple services to get a
clear picture to answer some question, and how can we fix that in a
single place (maybe the compute API)? A backlog spec in nova could be a
simple place to start, or just explaining the gaps in the mailing list
(separate targeted thread of course).






Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-21 Thread Mehdi Abaakouk

On Fri, May 19, 2017 at 02:04:05PM -0400, Sean Dague wrote:

You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed).


Not related to the topic, but Ceilometer doesn't have this issue
anymore, since Nova writes the uuid of the instance inside the libvirt
instance metadata. We just associate libvirt metrics with the instance
uuid, and then correlate them with the full metadata we receive via
notifications. We don't poll nova at all anymore.
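
A sketch of that association with libvirt-python (assumes access to the 
local libvirt socket; Nova also sets the libvirt domain UUID to the 
instance uuid, and the metadata namespace below is the one Nova's 
libvirt driver writes):

    import libvirt

    conn = libvirt.open('qemu:///system')
    for dom in conn.listAllDomains():
        instance_uuid = dom.UUIDString()
        nova_md = dom.metadata(
            libvirt.VIR_DOMAIN_METADATA_ELEMENT,
            'http://openstack.org/xmlns/libvirt/nova/1.0')
        # ...collect this domain's metrics and tag them with
        # instance_uuid; no call to the Nova API is needed.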

--
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht




Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-20 Thread Monty Taylor

On 05/19/2017 04:27 PM, Matt Riedemann wrote:

On 5/19/2017 3:03 PM, Monty Taylor wrote:

On 05/19/2017 01:04 PM, Sean Dague wrote:

On 05/19/2017 01:38 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen wrote:

..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids.  The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of
simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands".  I think there's
an argument to be made that it would be better to be consistent even for the
simple things.


cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.


It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.


Yup. Agree. Making forward progress on that is paramount.


I consider those things duct tape, leading us to the eventually
consistent place where we actually do that work internally. Because,
having seen this with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system, that's
going to need to be highly available, and replicate a bunch of your data
to be performant, and then have inconsistency issues, because a user
deployed API proxy can't have access to the notification bus, and...
boom.


You can actually get fairly far (with a few notable exceptions - I'm
looking at you unattached floating ips) without state tracking. It
comes at the cost of more API spidering after a failure/restart. Being
able to cache stuff aggressively combined with batching/rate-limiting
of requests to the cloud API allows one to do most of this to a fairly
massive scale statelessly. However, caching, batching and
rate-limiting are all pretty much required else you wind up crashing
public clouds. :)

I agree that the things are currently duct tape, but I don't think
that has to be a bad thing. The duct tape is currently needed
client-side no matter what we do, and will be for some time no matter
what we do because of older clouds. What's often missing is closing
the loop so that we can, as OpenStack, eventually provide out of the
box the consume experience that people currently get from using one of
the client-side duct tapes. That part is harder, but it's definitely
important.


You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10.


Right. This is why communication is essential. I'm optimistic we can
do well on this topic, because we are MUCH better at talking to each
other now than we were back when ceilometer was started.

Also, a REST-consuming porcelain like oaktree gets to draw on
real-world experience consuming OpenStack's REST APIs at scale. So
it's also not the same problem setup, since it's not a from-scratch
new thing.

This is, incidentally, why experience with caching and batching is
important. There is a reason why we do GET /servers/detail once every
5 seconds rather than doing a specific GET /server/{id}/detail calls
for each booting VM.
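
Roughly, with python-novaclient (a `nova` client and a set of 
`booting_server_ids` are assumed) - one batched list call feeds the 
status checks for every pending server:

    import time

    waiting = set(booting_server_ids)
    while waiting:
        # One GET /servers/detail covers all pending servers.
        servers = {s.id: s for s in nova.servers.list()}
        for sid in list(waiting):
            if sid in servers and servers[sid].status == 'ACTIVE':
                waiting.discard(sid)
        time.sleep(5)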

Look at what we could learn just from that... Users using shade are
doing a full detailed server list because it scales better for
concurrency. It's obviously more expensive on a single-call basis. BUT
- maybe it's useful information that doing optimization work on GET
/servers/detail could be beneficial.


This reminds me that I suspect we're lazy-loading server detail
information in certain cases, i.e. going back to the DB to do a join
per-instance after we've already pulled all instances in an initial set
(with some initial joins). I need to pull this thread again...





Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-20 Thread Monty Taylor

On 05/19/2017 03:13 PM, Monty Taylor wrote:

On 05/19/2017 01:53 PM, Sean Dague wrote:

On 05/19/2017 02:34 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 1:04 PM, Sean Dague wrote:

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper
documentation,
performance implications, and the like), not an end game on their own.


I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).


Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.


You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".


Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.


Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read only data (without any caching) is
the kind of problem that arises from "just fix it in an upper layer". So
it does provide an experience at a cost.


We also search all resources by name-or-id in shade. But it's one
call - GET /images - and then we test to see if the given value matches
the name field or the id field. And there is caching, so the list call
is done once in the session.

The thing I'm the saddest about is the Nova flavor "extra_info" that one
needs to grab for backwards compat but almost never has anything useful
in it. This causes me to make a billion API calls for the initial flavor
list (which is then cached, of course). It would be WAY nicer if there was
a GET /flavors/detail that would just get me the whole lot in one go, fwiw.


Quick follow up on this one.

It was "extra_specs" I was thinking about - not "extra_info"

It used to be included in the flavor as part of an extension (with a longer 
name); we fetch them in shade for backwards compat with the days when they 
were just there. However, I've also learned from a follow-up in IRC that 
these aren't really things that were intended for me.


So I'll re-frame this point slightly ...

As a user it's often quite difficult to tell what general intent is 
related to use of resources - whether they are intended for general 
users, or whether they are intended for admins. I guess a lot, and 
sometimes I get it right, and sometimes I don't. I know, I know - policy 
makes it so that different cloud deployers can have _vastly_ different 
opinions on this. But a clear intent from us (greatly helped, btw, by 
putting default policy in code) of "this call, this resource, this field 
is intended for normal users, but this one is intended for admin users" 
would have certainly helped me many times in the past.


Thanks for the IRC chat!


Dean has a harder time than I do with that one because osc interactions
are lots of process invocations from scratch. We chatted a bit about how
to potentially share caching things in Boston, but not sure we've come
up with more.


All for new and better experiences. I think that's great. Where I think
we want to be really careful is deciding the path to creating better
experiences is by not engaging with the services and just writing around
it. That feedback has to come back. Those reasons have to come back, and
we need to roll sensible improvements back into base services.

If you want to go fast, go alone, if you want to go far, go together.


Couldn't agree more . I think we're getting better at that communication.

We still have a hole, which is that the path from "this is a problem and
here's how I'm working around it" to "there are devs tasked to work on
solving that problem" is a hard one, because while the communication
from those of us doing client-layer stuff with the folks doing the
servers is pretty good - the communication loop with the folks at the
companies who are prioritizing work ... not so much. Look at the number
of people hacking on shade or python-openstackclient or writing
user-facing docs compared to folks adding backend features to the services.

So - yes, I totally agree. But also, we can make and are making a lot of
progress in some areas with tiny crews. That's gonna likely be the state
of the world 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Matt Riedemann

On 5/19/2017 3:35 PM, Monty Taylor wrote:
Heck - while I'm on floating ips ... if you have some pre-existing 
floating ips and you want to boot servers on them and you want to do 
that in parallel, you can't. You can boot a server with a floating ip 
that did not pre-exist if you get the port id of the fixed ip of the 
server then pass that id to the floating ip create call. Of course, the 
server doesn't return the port id in the server record, so at the very 
least you need to make a GET /ports.json?device_id={server_id} call. Of 
course what you REALLY need to find is the port_id of the ip of the 
server that came from a subnet that has 'gateway_ip' defined, which is 
even more fun since ips are associated with _networks_ on the server 
record and not with subnets.


A few weeks ago I think we went down this rabbit hole in the nova 
channel, which led to this etherpad:


https://etherpad.openstack.org/p/nova-os-interfaces

It was really a discussion about the weird APIs that nova has and a lot 
of the time our first question is, "why does it return this, or that, or 
how is this consumed even?", at which point we put out the Monty signal.


During a seemingly unrelated forum session on integrating searchlight 
with nova-api, operators in the room were saying they wanted to see 
ports returned in the server response body, which I think Monty was also 
saying when we were going through that etherpad above.


This goes back to a common issue we/I have in nova which is we don't 
know who is using which APIs and how. The user survey isn't going to 
give us this data. Operators probably don't have this data, unless they 
are voicing it as API users themselves. But it would be really useful to 
know, which gaps are various tools in the ecosystem needing to overcome 
by making multiple API calls to possibly multiple services to get a 
clear picture to answer some question, and how can we fix that in a 
single place (maybe the compute API)? A backlog spec in nova could be a 
simple place to start, or just explaining the gaps in the mailing list 
(separate targeted thread of course).


--

Thanks,

Matt



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Dean Troyer
On Fri, May 19, 2017 at 4:01 PM, Matt Riedemann wrote:
> I'm confused by this. Creating a server takes a volume ID if you're booting
> from volume, and that's actually preferred (by nova devs) since then Nova
> doesn't have to orchestrate the creation of the volume in the compute
> service and then poll until it's available.
>
> Same for ports - nova can create the port (default action) or get a port at
> server creation time, which is required if you're doing trunk ports or
> sr-iov / fancy pants ports.
>
> Am I misunderstanding what you're saying is missing?

It turns out those are bad examples, they do accept IDs.

The point though was there _are_ times when what you want is not what
the opinionated composed API gives you (as much as I _do_ like those).
It isn't so much making more REST calls, but a similar number of
different ones that are actually more efficient in the long run.

dt

-- 

Dean Troyer
dtro...@gmail.com



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Matt Riedemann

On 5/19/2017 9:36 AM, Zane Bitter wrote:


The problem is that orchestration done inside APIs is very easy to do 
badly in ways that cause lots of downstream pain for users and external 
orchestrators. For example, Nova already does some orchestration: it 
creates a Neutron port for a server if you don't specify one. (And then 
promptly forgets that it has done so.) There is literally an entire 
inner platform, an orchestrator within an orchestrator, inside Heat to 
try to manage the fallout from this. And the inner platform shares none 
of the elegance, such as it is, of Heat itself, but is rather a 
collection of cobbled-together hacks to deal with the seemingly infinite 
explosion of edge cases that we kept running into over a period of at 
least 5 releases.


I'm assuming you're talking about how nova used to (years ago) not keep 
track of which ports it created and which ones were provided when 
creating a server or attaching ports to an existing server. That was 
fixed quite a while ago, so I assume anything in Heat at this point is no 
longer necessary and if it is, then it's a bug in nova. i.e. if you 
provide a port when creating a server, when you delete the server, nova 
should not delete the port. If nova creates the port and you delete the 
server, nova should then delete the port also.




The get-me-a-network thing is... better, but there's no provision for 
changes after the server is created, which means we have to copy-paste 
the Nova implementation into Heat to deal with update.[1] Which sounds 
like a maintenance nightmare in the making. That seems to be a common 
mistake: to assume that once users create something they'll never need 
to touch it again, except to delete it when they're done.


I'm not really sure what you're referring to here with 'update' and [1]. 
Can you expand on that? I know it's a bit of a tangent.




Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done 
superbly well, provide transparency for external orchestration tools 
that need to hook in to the data flow, and should be developed in 
consultation with potential consumers like Shade and Heat.


Agree, this is why we push back on baking in more orchestration into 
Nova, because we generally don't do it well, or don't test it well, and 
end up having half-baked things which are a constant source of pain, 
e.g. boot from volume - that might work fine when creating and deleting 
a server, but what happens when you try to migrate, resize, rebuild, 
evacuate or shelve that server?





Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?


(Aside: can we stop using the term 'PaaS' to refer to "everything that 
Nova doesn't do"? This habit is not helping us to communicate clearly.)


Sorry, as I said in response to sdague elsewhere in this thread, I tend 
to lump PaaS and orchestration / porcelain tools together, but that's 
not my intent in starting this thread. I was going to say we should have 
a glossary for terms in OpenStack, but we do, and both are listed. :)


https://docs.openstack.org/user-guide/common/glossary.html

--

Thanks,

Matt



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Matt Riedemann

On 5/19/2017 3:03 PM, Monty Taylor wrote:

On 05/19/2017 01:04 PM, Sean Dague wrote:

On 05/19/2017 01:38 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen wrote:

..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids.  The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of
simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands".  I think there's
an argument to be made that it would be better to be consistent even for the
simple things.


cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.


It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.


Yup. Agree. Making forward progress on that is paramount.


I consider those things duct tape, leading us to the eventually
consistent place where we actually do that work internally. Because,
having seen this with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system, that's
going to need to be highly available, and replicate a bunch of your data
to be performant, and then have inconsistency issues, because a user
deployed API proxy can't have access to the notification bus, and... 
boom.


You can actually get fairly far (with a few notable exceptions - I'm 
looking at you unattached floating ips) without state tracking. It comes 
at the cost of more API spidering after a failure/restart. Being able to 
cache stuff aggressively combined with batching/rate-limiting of 
requests to the cloud API allows one to do most of this to a fairly 
massive scale statelessly. However, caching, batching and rate-limiting 
are all pretty much required else you wind up crashing public clouds. :)


I agree that the things are currently duct tape, but I don't think that 
has to be a bad thing. The duct tape is currently needed client-side no 
matter what we do, and will be for some time no matter what we do 
because of older clouds. What's often missing is closing the loop so 
that we can, as OpenStack, eventually provide out of the box the consume 
experience that people currently get from using one of the client-side 
duct tapes. That part is harder, but it's definitely important.



You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10.


Right. This is why communication is essential. I'm optimistic we can do 
well on this topic, because we are MUCH better at talking to each other 
now than we were back when ceilometer was started.


Also, a REST-consuming porcelain like oaktree gets to draw on real-world 
experience consuming OpenStack's REST APIs at scale. So it's also not 
the same problem setup, since it's not a from-scratch new thing.


This is, incidentally, why experience with caching and batching is 
important. There is a reason why we do GET /servers/detail once every 5 
seconds rather than doing a specific GET /server/{id}/detail calls for 
each booting VM.


Look at what we could learn just from that... Users using shade are 
doing a full detailed server list because it scales better for 
concurrency. It's obviously more expensive on a single-call basis. BUT - 
maybe it's useful information that doing optimization work on GET 
/servers/detail could be beneficial.


This reminds me that I suspect we're lazy-loading server detail 
information in certain cases, i.e. going back to the DB to do a join 
per-instance after we've already pulled all instances in an initial set 
(with some initial joins). I need to pull this thread again...






Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Matt Riedemann

On 5/19/2017 1:04 PM, Sean Dague wrote:

Anyway, this gets pretty meta pretty fast. I agree with Zane saying "I
want my server to build", or "I'd like Nova to build a volume for me"
are very odd things to call PaaS. I think of PaaS as "here is a ruby on
rails app, provision me a db for it, and make it go". Heroku style.


Yeah as soon as I sent the original email I realized that I was munging 
PaaS and orchestration services/libraries and probably shouldn't have, 
that wasn't my intent. I just tend to lump them together in my head. My 
point was trying to see if I'm missing a change in people's opinions 
about really wanting Nova to be doing more orchestration than we expect.


--

Thanks,

Matt



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Kevin Benton
Started a new Neutron-specific thread so we can get some stuff turned into
improvements in Neutron from this:
http://lists.openstack.org/pipermail/openstack-dev/2017-May/117112.html

On Fri, May 19, 2017 at 1:05 PM, Zane Bitter wrote:

> On 19/05/17 15:06, Kevin Benton wrote:
>
>> Don't even get me started on Neutron.[2]
>>>
>>
>> It seems to me the conclusion to that thread was that the majority of
>> your issues stemmed from the fact that we had poor documentation at the
>> time.  A major component of the complaints resulted from you
>> misunderstanding the difference between networks/subnets in Neutron.
>>
>
> It's true that I was completely off base as to what the various primitives
> in Neutron actually do. (Thanks for educating me!) The implications for
> orchestration are largely unchanged though. It's a giant pain that we have
> to infer implicit dependencies between stuff to get them to create/delete
> in the right order, pretty much independently of what that stuff does.
>
> So knowing now that a Network is a layer-2 network segment and a Subnet
> is... effectively a glorified DHCP address pool, I understand better why it
> probably seemed like a good idea to hook stuff up magically. But at the end
> of the day, I still can't create a Port until a Subnet exists, I still
> don't know what Subnet a Port will be attached to (unless the user
> specifies it explicitly using the --fixed-ip option... regardless of
> whether they actually specify a fixed IP), and I have no way in general of
> telling which Subnets can be deleted before a given Port is and which will
> fail to delete until the Port disappears.
>
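
Concretely, the ordering described above, sketched with 
python-neutronclient (a `neutron` client is assumed; pinning the subnet 
via fixed_ips is optional, which is why the dependency has to be 
inferred):

    net = neutron.create_network({'network': {'name': 'app-net'}})['network']

    # A port on this network cannot be created until a subnet exists.
    subnet = neutron.create_subnet({'subnet': {'network_id': net['id'],
                                               'ip_version': 4,
                                               'cidr': '10.0.0.0/24'}})['subnet']

    # Without an explicit fixed_ips entry the port's subnet is implicit;
    # with one, the dependency is at least visible.
    port = neutron.create_port({'port': {
        'network_id': net['id'],
        'fixed_ips': [{'subnet_id': subnet['id']}]}})['port']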
> There are some legitimate issues in there about the extra routes
>> extension being replace-only and the routers API not accepting a list of
>> interfaces in POST.  However, it hardly seems that those are worthy of
>> "Don't even get me started on Neutron."
>>
>
> https://launchpad.net/bugs/1626607
> https://launchpad.net/bugs/1442121
> https://launchpad.net/bugs/1626619
> https://launchpad.net/bugs/1626630
> https://launchpad.net/bugs/1626634
>
> It would be nice if you could write up something about current gaps that
>> would make Heat's life easier, because a large chunk of that initial
>> email is incorrect and linking to it as a big list of "issues" is
>> counter-productive.
>>
>
> Yes, agreed. I wish I had a clean thread to link to. It's a huge amount of
> work to research it all though.
>
> cheers,
> Zane.
>
> On Fri, May 19, 2017 at 7:36 AM, Zane Bitter wrote:
>>
>> On 18/05/17 20:19, Matt Riedemann wrote:
>>
>> I just wanted to blurt this out since it hit me a few times at the
>> summit, and see if I'm misreading the rooms.
>>
>> For the last few years, Nova has pushed back on adding
>> orchestration to
>> the compute API, and even define a policy for it since it comes
>> up so
>> much [1]. The stance is that the compute API should expose
>> capabilities
>> that a higher-level orchestration service can stitch together
>> for a more
>> fluid end user experience.
>>
>>
>> I think this is a wise policy.
>>
>> One simple example that comes up time and again is allowing a
>> user to
>> pass volume type to the compute API when booting from volume
>> such that
>> when nova creates the backing volume in Cinder, it passes
>> through the
>> volume type. If you need a non-default volume type for boot from
>> volume,
>> the way you do this today is first create the volume with said
>> type in
>> Cinder and then provide that volume to the compute API when
>> creating the
>> server. However, people claim that is bad UX or hard for users to
>> understand, something like that (at least from a command line, I
>> assume
>> Horizon hides this, and basic users should probably be using
>> Horizon
>> anyway right?).
>>
>>
>> As always, there's a trade-off between simplicity and flexibility. I
>> can certainly understand the logic in wanting to make the simple
>> stuff simple. But users also need to be able to progress from simple
>> stuff to more complex stuff without having to give up and start
>> over. There's a danger of leading them down the garden path.
>>
>> While talking about claims in the scheduler and a top-level
>> conductor
>> for cells v2 deployments, we've talked about the desire to
>> eliminate
>> "up-calls" from the compute service to the top-level controller
>> services
>> (nova-api, nova-conductor and nova-scheduler). Build retries is
>> one such
>> up-call. CERN disables build retries, but others rely on them,
>> because
>> of how racy claims in the computes are (that's another story and
>> 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Matt Riedemann

On 5/19/2017 12:38 PM, Dean Troyer wrote:

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available.  Just picking "POST
/server" as an example, we do not have that today.  Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes.  I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.


I'm confused by this. Creating a server takes a volume ID if you're 
booting from volume, and that's actually preferred (by nova devs) since 
then Nova doesn't have to orchestrate the creation of the volume in the 
compute service and then poll until it's available.


Same for ports - nova can create the port (default action) or get a port 
at server creation time, which is required if you're doing trunk ports 
or sr-iov / fancy pants ports.


Am I misunderstanding what you're saying is missing?

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Monty Taylor

On 05/19/2017 03:05 PM, Zane Bitter wrote:

On 19/05/17 15:06, Kevin Benton wrote:

Don't even get me started on Neutron.[2]


It seems to me the conclusion to that thread was that the majority of
your issues stemmed from the fact that we had poor documentation at the
time.  A major component of the complaints resulted from you
misunderstanding the difference between networks/subnets in Neutron.


It's true that I was completely off base as to what the various
primitives in Neutron actually do. (Thanks for educating me!) The
implications for orchestration are largely unchanged though. It's a
giant pain that we have to infer implicit dependencies between stuff to
get them to create/delete in the right order, pretty much independently
of what that stuff does.

So knowing now that a Network is a layer-2 network segment and a Subnet
is... effectively a glorified DHCP address pool, I understand better why
it probably seemed like a good idea to hook stuff up magically. But at
the end of the day, I still can't create a Port until a Subnet exists, I
still don't know what Subnet a Port will be attached to (unless the user
specifies it explicitly using the --fixed-ip option... regardless of
whether they actually specify a fixed IP), and I have no way in general
of telling which Subnets can be deleted before a given Port is and which
will fail to delete until the Port disappears.


There are some legitimate issues in there about the extra routes
extension being replace-only and the routers API not accepting a list of
interfaces in POST.  However, it hardly seems that those are worthy of
"Don't even get me started on Neutron."


https://launchpad.net/bugs/1626607
https://launchpad.net/bugs/1442121
https://launchpad.net/bugs/1626619
https://launchpad.net/bugs/1626630
https://launchpad.net/bugs/1626634


It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial
email is incorrect and linking to it as a big list of "issues" is
counter-productive.


I used to have angst about the Neutron API but have come to like it more 
and more over time.


I think the main thing I run into is that Neutron's API is modelling a 
pile of data to allow power users to do very flexible things. What 
it's missing most of the time is an easy button.


I'll give some examples:

My favorite for-instance, which I mentioned in a different thread this 
week and have mentioned in almost every talk I've given over the last 3 
years, is that there is no way to find out whether a given network can 
provide connectivity to a resource from outside of the cloud.


There are _many_ reasons why it's hard to fully express a completely 
accurate answer to this problem. "What does external mean?" "What if 
there are multiple external networks?" etc. Those are all valid, and all 
speak to real workloads and real user scenarios ...


But there's also:

As a user I want to boot a VM on this cloud and have my users who are 
not necessarily on this cloud be able to connect to a service I'm going to 
run on it. (aka, I want to run a wordpress)


and

As a user I want to boot a VM on this cloud and I do not want anyone who 
is not another resource on this cloud to be able to connect to anything 
it's running. (aka, I want to run a mysql)


Unless you know things about the cloud already somehow not from the API, 
it is impossible to consistently perform those two tasks.
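
The closest the API gets today is the 'router:external' flag on
networks. A rough sketch of the heuristic (raw REST via
python-requests; endpoint URL and token handling are assumed) shows
why that isn't enough:

    import requests

    def external_network_candidates(neutron_url, token):
        # 'router:external' says a network can host routers/floating
        # ips. It does not say whether *this* network makes my
        # wordpress reachable from outside, or keeps my mysql private.
        resp = requests.get(
            neutron_url + '/v2.0/networks',
            params={'router:external': True},
            headers={'X-Auth-Token': token})
        return resp.json()['networks']

    # Zero candidates, or more than one, and there is no API-only way
    # to choose - you need out-of-band knowledge about the cloud.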


We've done a great job empowering the power users to do a bunch of 
really cool things. But we missed booting a wordpress as a basic use case.


Other things exist but aren't anyone's fault really. We still can't as a 
community agree on a consistent worldview related to fixed ips, neutron 
ports and floating ips. Neutron amazingly supports ALL of the use case 
combinations for those topics ... it just doesn't always do so in all of 
the clouds.


Heck - while I'm on floating ips ... if you have some pre-existing 
floating ips and you want to boot servers on them and you want to do 
that in parallel, you can't. You can boot a server with a floating ip 
that did not pre-exist if you get the port id of the fixed ip of the 
server then pass that id to the floating ip create call. Of course, the 
server doesn't return the port id in the server record, so at the very 
least you need to make a GET /ports.json?device_id={server_id} call. Of 
course what you REALLY need to find is the port_id of the ip of the 
server that came from a subnet that has 'gateway_ip' defined, which is 
even more fun since ips are associated with _networks_ on the server 
record and not with subnets.
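
As a sketch, that dance looks something like this (raw REST via
python-requests; endpoint URLs and auth are assumed, error handling
omitted):

    import requests

    def port_for_floating_ip(neutron_url, token, server_id):
        headers = {'X-Auth-Token': token}
        # The server record doesn't carry port ids, so ask Neutron.
        ports = requests.get(
            neutron_url + '/v2.0/ports',
            params={'device_id': server_id},
            headers=headers).json()['ports']
        for port in ports:
            for fixed_ip in port['fixed_ips']:
                subnet = requests.get(
                    neutron_url + '/v2.0/subnets/' + fixed_ip['subnet_id'],
                    headers=headers).json()['subnet']
                # We want the ip that came from a subnet with a gateway.
                if subnet.get('gateway_ip'):
                    return port['id']
        return None

    def attach_floating_ip(neutron_url, token, floating_net_id, port_id):
        body = {'floatingip': {'floating_network_id': floating_net_id,
                               'port_id': port_id}}
        return requests.post(neutron_url + '/v2.0/floatingips',
                             json=body,
                             headers={'X-Auth-Token': token}).json()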


Possibly to Zane's point, you basically have to recreate a multi-table 
data model client side and introspect relationships between objects to 
be able to figure out how to correctly get a floating ip onto a server. 
NOW - as opposed to the external network bit - it IS possible to do and 
to do correctly and have it work every time.


But if you want 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Monty Taylor

On 05/19/2017 01:53 PM, Sean Dague wrote:

On 05/19/2017 02:34 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper documentation,
performance implications, and the like), not an end game on their own.


I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users' lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).


Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.


You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".


Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.


Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read-only data (without any caching) is
the kind of problem that arises from "just fix it in an upper layer". So
it does provide an experience at a cost.


We also do name-or-id searching of all resources in shade. But it's one 
call - GET /images - and then we test to see if the given value matches 
the name field or the id field. And there is caching, so the list call 
is done once in the session.
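
Roughly this pattern, sketched here for Glance v2 (not shade's actual
code; endpoint URL and auth are assumed, and the cache is per session
with no invalidation):

    import requests

    _session_cache = {}

    def find_images(glance_url, token, name_or_id):
        # One list call per session; matching happens client-side.
        if 'images' not in _session_cache:
            resp = requests.get(glance_url + '/v2/images',
                                headers={'X-Auth-Token': token})
            _session_cache['images'] = resp.json()['images']
        return [image for image in _session_cache['images']
                if name_or_id in (image['id'], image['name'])]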


The thing I'm the saddest about is the Nova flavor extra_specs that one 
needs to grab for backwards compat but that almost never have anything 
useful in them. This causes me to make a billion API calls for the initial 
flavor list (which is then cached, of course). It would be WAY nicer if 
GET /flavors/detail included the extra_specs and just got me the whole lot 
in one go, fwiw.
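
The shape of the problem, sketched (not shade's actual code; endpoint
URL and auth assumed) - one os-extra_specs GET per flavor on top of
the list call:

    import requests

    def flavors_with_extra_specs(nova_url, token):
        headers = {'X-Auth-Token': token}
        flavors = requests.get(nova_url + '/flavors/detail',
                               headers=headers).json()['flavors']
        for flavor in flavors:
            # One extra round trip per flavor, usually for an empty dict.
            flavor['extra_specs'] = requests.get(
                nova_url + '/flavors/' + flavor['id'] + '/os-extra_specs',
                headers=headers).json()['extra_specs']
        return flavors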


Dean has a harder time than I do with that one because osc interactions 
are lots of process invocations from scratch. We chatted a bit in Boston 
about how to potentially share caching, but I'm not sure we've come up 
with anything more.



All for new and better experiences. I think that's great. Where I think
we want to be really careful is deciding the path to creating better
experiences is by not engaging with the services and just writing around
it. That feedback has to come back. Those reasons have to come back, and
we need to roll sensible improvements back into base services.

If you want to go fast, go alone, if you want to go far, go together.


Couldn't agree more. I think we're getting better at that communication.

We still have a hole, which is that the path from "this is a problem and 
here's how I'm working around it" to "there are devs tasked to work on 
solving that problem" is a hard one. While the communication between 
those of us doing client-layer stuff and the folks doing the servers is 
pretty good, the communication loop with the folks at the companies who 
are prioritizing work is ... not so much. Look at the number of people 
hacking on shade or python-openstackclient or writing user-facing docs 
compared to the number of folks adding backend features to the services.


So - yes, I totally agree. But also, we can make and are making a lot of 
progress in some areas with tiny crews. That's likely gonna be the state 
of the world for a while, until we're better able to point our fingers at 
the problem and characterize it such that our friends who provide 
resources value these problems enough to fund working on them.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Zane Bitter

On 19/05/17 15:06, Kevin Benton wrote:

Don't even get me started on Neutron.[2]


It seems to me the conclusion to that thread was that the majority of
your issues stemmed from the fact that we had poor documentation at the
time.  A major component of the complaints resulted from you
misunderstanding the difference between networks/subnets in Neutron.


It's true that I was completely off base as to what the various 
primitives in Neutron actually do. (Thanks for educating me!) The 
implications for orchestration are largely unchanged though. It's a 
giant pain that we have to infer implicit dependencies between stuff to 
get them to create/delete in the right order, pretty much independently 
of what that stuff does.


So knowing now that a Network is a layer-2 network segment and a Subnet 
is... effectively a glorified DHCP address pool, I understand better why 
it probably seemed like a good idea to hook stuff up magically. But at 
the end of the day, I still can't create a Port until a Subnet exists, I 
still don't know what Subnet a Port will be attached to (unless the user 
specifies it explicitly using the --fixed-ip option... regardless of 
whether they actually specify a fixed IP), and I have no way in general 
of telling which Subnets can be deleted before a given Port is and which 
will fail to delete until the Port disappears.
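
To illustrate the last point: you only learn which Subnet a Port landed 
on after the fact, by reading fixed_ips back out of the create response 
(a sketch; endpoint URL and auth assumed):

    import requests

    def create_port(neutron_url, token, network_id):
        body = {'port': {'network_id': network_id}}
        # Fails if the network has no Subnet yet.
        port = requests.post(neutron_url + '/v2.0/ports', json=body,
                             headers={'X-Auth-Token': token}).json()['port']
        # Neutron picked the subnet(s) for us; only now can we see which.
        for fixed_ip in port['fixed_ips']:
            print(fixed_ip['subnet_id'], fixed_ip['ip_address'])
        return port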



There are some legitimate issues in there about the extra routes
extension being replace-only and the routers API not accepting a list of
interfaces in POST.  However, it hardly seems that those are worthy of
"Don't even get me started on Neutron."


https://launchpad.net/bugs/1626607
https://launchpad.net/bugs/1442121
https://launchpad.net/bugs/1626619
https://launchpad.net/bugs/1626630
https://launchpad.net/bugs/1626634


It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial
email is incorrect and linking to it as a big list of "issues" is
counter-productive.


Yes, agreed. I wish I had a clean thread to link to. It's a huge amount 
of work to research it all though.


cheers,
Zane.


On Fri, May 19, 2017 at 7:36 AM, Zane Bitter wrote:

On 18/05/17 20:19, Matt Riedemann wrote:

I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding
orchestration to
the compute API, and even define a policy for it since it comes
up so
much [1]. The stance is that the compute API should expose
capabilities
that a higher-level orchestration service can stitch together
for a more
fluid end user experience.


I think this is a wise policy.

One simple example that comes up time and again is allowing a
user to
pass volume type to the compute API when booting from volume
such that
when nova creates the backing volume in Cinder, it passes
through the
volume type. If you need a non-default volume type for boot from
volume,
the way you do this today is first create the volume with said
type in
Cinder and then provide that volume to the compute API when
creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I
assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).


As always, there's a trade-off between simplicity and flexibility. I
can certainly understand the logic in wanting to make the simple
stuff simple. But users also need to be able to progress from simple
stuff to more complex stuff without having to give up and start
over. There's a danger of leading them down the garden path.

While talking about claims in the scheduler and a top-level
conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller
services
(nova-api, nova-conductor and nova-scheduler). Build retries is
one such
up-call. CERN disables build retries, but others rely on them,
because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked,
"why
not just do away with build retries in nova altogether? If the
scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."


(FWIW Heat does this for you already.)

But during several different Forum sessions, like user API
improvements
[2] but also the cells v2 and claims in the scheduler 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Monty Taylor

On 05/19/2017 01:04 PM, Sean Dague wrote:

On 05/19/2017 01:38 PM, Dean Troyer wrote:

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
 wrote:

..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids.  The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of
simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands".  I think there's
an argument to be made that it would be better to be consistent even for the
simple things.


cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.


It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.


Yup. Agree. Making forward progress on that is paramount.


I consider those things duct tape, leading us to the eventually
consistent place where we actually do that work internally. Because,
having seen it with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system that's
going to need to be highly available, and replicate a bunch of your data
to be performant, and then have inconsistency issues, because a
user-deployed API proxy can't have access to the notification bus, and... boom.


You can actually get fairly far (with a few notable exceptions - I'm 
looking at you, unattached floating ips) without state tracking. It comes 
at the cost of more API spidering after a failure/restart. Being able to 
cache stuff aggressively, combined with batching/rate-limiting of 
requests to the cloud API, allows one to do most of this at a fairly 
massive scale statelessly. However, caching, batching and rate-limiting 
are all pretty much required, else you wind up crashing public clouds. :)


I agree that the things are currently duct tape, but I don't think that 
has to be a bad thing. The duct tape is currently needed client-side no 
matter what we do, and will be for some time no matter what we do 
because of older clouds. What's often missing is closing the loop so 
that we can, as OpenStack, eventually provide out of the box the consumer 
experience that people currently get from using one of the client-side 
duct tapes. That part is harder, but it's definitely important.



You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10


Right. This is why communication is essential. I'm optimistic we can do 
well on this topic, because we are MUCH better at talking to each other 
now than we were back when ceilometer was started.


Also, a REST-consuming porcelain like oaktree gets to draw on real-world 
experience consuming OpenStack's REST APIs at scale. So it's also not 
the same problem setup, since it's not a from-scratch new thing.


This is, incidentally, why experience with caching and batching is 
important. There is a reason why we do GET /servers/detail once every 5 
seconds rather than doing specific GET /servers/{id} calls for each 
booting VM.


Look at what we could learn just from that... Users using shade are 
doing a full detailed server list because it scales better for 
concurrency. It's obviously more expensive on a single-call basis. BUT - 
maybe that's useful information: it suggests that doing optimization work 
on GET /servers/detail could be beneficial.
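
The batched pattern, sketched (endpoint URL and auth assumed; not
shade's actual code) - one list call per tick shared by all waiters,
instead of one call per booting server:

    import time

    import requests

    def wait_for_servers(nova_url, token, server_ids, interval=5):
        headers = {'X-Auth-Token': token}
        pending = set(server_ids)
        while pending:
            # One GET /servers/detail per tick, however many VMs boot.
            servers = requests.get(nova_url + '/servers/detail',
                                   headers=headers).json()['servers']
            for server in servers:
                if server['id'] in pending and server['status'] != 'BUILD':
                    pending.discard(server['id'])
            if pending:
                time.sleep(interval)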



(http://superuser.openstack.org/articles/cern-cloud-architecture-update/
last graph). That was an anti pattern. We should have gotten to the
bottom of the mismatches and communication issues early on, because the
end state we all inflicted on users to get a totally reasonable set of
features, was not good. Please lets not do this again.


++


These should 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Kevin Benton
> Don't even get me started on Neutron.[2]

It seems to me the conclusion to that thread was that the majority of your
issues stemmed from the fact that we had poor documentation at the time.  A
major component of the complaints resulted from you misunderstanding the
difference between networks/subnets in Neutron.

There are some legitimate issues in there about the extra routes extension
being replace-only and the routers API not accepting a list of interfaces
in POST.  However, it hardly seems that those are worthy of "Don't even get
me started on Neutron."

It would be nice if you could write up something about current gaps that
would make Heat's life easier, because a large chunk of that initial email
is incorrect and linking to it as a big list of "issues" is
counter-productive.


On Fri, May 19, 2017 at 7:36 AM, Zane Bitter  wrote:

> On 18/05/17 20:19, Matt Riedemann wrote:
>
>> I just wanted to blurt this out since it hit me a few times at the
>> summit, and see if I'm misreading the rooms.
>>
>> For the last few years, Nova has pushed back on adding orchestration to
>> the compute API, and even define a policy for it since it comes up so
>> much [1]. The stance is that the compute API should expose capabilities
>> that a higher-level orchestration service can stitch together for a more
>> fluid end user experience.
>>
>
> I think this is a wise policy.
>
> One simple example that comes up time and again is allowing a user to
>> pass volume type to the compute API when booting from volume such that
>> when nova creates the backing volume in Cinder, it passes through the
>> volume type. If you need a non-default volume type for boot from volume,
>> the way you do this today is first create the volume with said type in
>> Cinder and then provide that volume to the compute API when creating the
>> server. However, people claim that is bad UX or hard for users to
>> understand, something like that (at least from a command line, I assume
>> Horizon hides this, and basic users should probably be using Horizon
>> anyway right?).
>>
>
> As always, there's a trade-off between simplicity and flexibility. I can
> certainly understand the logic in wanting to make the simple stuff simple.
> But users also need to be able to progress from simple stuff to more
> complex stuff without having to give up and start over. There's a danger of
> leading them down the garden path.
>
> While talking about claims in the scheduler and a top-level conductor
>> for cells v2 deployments, we've talked about the desire to eliminate
>> "up-calls" from the compute service to the top-level controller services
>> (nova-api, nova-conductor and nova-scheduler). Build retries is one such
>> up-call. CERN disables build retries, but others rely on them, because
>> of how racy claims in the computes are (that's another story and why
>> we're working on fixing it). While talking about this, we asked, "why
>> not just do away with build retries in nova altogether? If the scheduler
>> picks a host and the build fails, it fails, and you have to
>> retry/rebuild/delete/recreate from a top-level service."
>>
>
> (FWIW Heat does this for you already.)
>
> But during several different Forum sessions, like user API improvements
>> [2] but also the cells v2 and claims in the scheduler sessions, I was
>> hearing about how operators only wanted to expose the base IaaS services
>> and APIs and end API users wanted to only use those, which means any
>> improvements in those APIs would have to be in the base APIs (nova,
>> cinder, etc). To me, that generally means any orchestration would have
>> to be baked into the compute API if you're not using Heat or something
>> similar.
>>
>
> The problem is that orchestration done inside APIs is very easy to do
> badly in ways that cause lots of downstream pain for users and external
> orchestrators. For example, Nova already does some orchestration: it
> creates a Neutron port for a server if you don't specify one. (And then
> promptly forgets that it has done so.) There is literally an entire inner
> platform, an orchestrator within an orchestrator, inside Heat to try to
> manage the fallout from this. And the inner platform shares none of the
> elegance, such as it is, of Heat itself, but is rather a collection of
> cobbled-together hacks to deal with the seemingly infinite explosion of
> edge cases that we kept running into over a period of at least 5 releases.
>
> The get-me-a-network thing is... better, but there's no provision for
> changes after the server is created, which means we have to copy-paste the
> Nova implementation into Heat to deal with update.[1] Which sounds like a
> maintenance nightmare in the making. That seems to be a common mistake: to
> assume that once users create something they'll never need to touch it
> again, except to delete it when they're done.
>
> Don't even get me started on Neutron.[2]
>
> Any orchestration that is done behind-the-scenes needs to be 

Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Sean Dague
On 05/19/2017 02:34 PM, Dean Troyer wrote:
> On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:
>> These should be used as ways to experiment with the kinds of interfaces
>> we want cheaply, then take them back into services (which is a more
>> expensive process involving compatibility stories, deeper documentation,
>> performance implications, and the like), not an end game on their own.
> 
> I totally agree here.  But I also see the rate of progress for many
> and varied reasons, and want to make users' lives easier now.
> 
> Have any of the lessons already learned from Shade or OSC made it into
> services yet?  I think a few may have, "get me a network" being the
> obvious one.  But that still took a lot of work (granted that one _is_
> complicated).

Doing hard things is hard. I don't expect changing APIs to be easy at
this level of deployedness of OpenStack.

>> You can get the behavior. It also has other behaviors. I'm not sure any
>> user has actually argued for "please make me do more rest calls to
>> create a server".
> 
> Maybe not in those words, but "give me the tools to do what I need"
> has been heard often.  Sometimes those tools are composable
> primitives, sometimes they are helpful opinionated interfaces.  I've
> already done the helpful opinionated stuff in OSC here (accept flavor
> and image names when the non-unique names _do_ identify a single
> result).  Having that control lets me give the user more options in
> handling edge cases.

Sure, it does. The fact that it makes 3 API calls every time when doing
flavors by name (404 on the name, list all flavors, local search, get
the flavor by real id) on mostly read-only data (without any caching) is
the kind of problem that arises from "just fix it in an upper layer". So
it does provide an experience at a cost.

All for new and better experiences. I think that's great. Where I think
we want to be really careful is deciding the path to creating better
experiences is by not engaging with the services and just writing around
it. That feedback has to come back. Those reasons have to come back, and
we need to roll sensible improvements back into base services.

If you want to go fast, go alone, if you want to go far, go together.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Dean Troyer
On Fri, May 19, 2017 at 1:04 PM, Sean Dague  wrote:
> These should be used as ways to experiment with the kinds of interfaces
> we want cheaply, then take them back into services (which is a more
> expensive process involving compatibility stories, deeper documentation,
> performance implications, and the like), not an end game on their own.

I totally agree here.  But I also see the rate of progress for many
and varied reasons, and want to make users' lives easier now.

Have any of the lessons already learned from Shade or OSC made it into
services yet?  I think a few may have, "get me a network" being the
obvious one.  But that still took a lot of work (granted that one _is_
complicated).

> You can get the behavior. It also has other behaviors. I'm not sure any
> user has actually argued for "please make me do more rest calls to
> create a server".

Maybe not in those words, but "give me the tools to do what I need"
has been heard often.  Sometimes those tools are composable
primitives, sometimes they are helpful opinionated interfaces.  I've
already done the helpful opinionated stuff in OSC here (accept flavor
and image names when the non-unique names _do_ identify a single
result).  Having that control lets me give the user more options in
handling edge cases.

dt

-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Clint Byrum
Excerpts from Clark Boylan's message of 2017-05-19 10:03:23 -0700:
> On Fri, May 19, 2017, at 05:59 AM, Duncan Thomas wrote:
> > On 19 May 2017 at 12:24, Sean Dague  wrote:
> > 
> > > I do get the concerns of extra logic in Nova, but the decision to break
> > > up the working compute with network and storage problem space across 3
> > > services and APIs doesn't mean we shouldn't still make it easy to
> > > express some pretty basic and common intents.
> > 
> > Given that we've similar needs for retries and race avoidance in and
> > between glance, nova, cinder and neutron, and a need to orchestrate
> > between at least these three (arguably other infrastructure projects
> > too, I'm not trying to get into specifics), maybe the answer is to put
> > that logic in a new service, that talks to those four, and provides a
> > nice simple API, while allowing the cinder, nova etc APIs to remove
> > things like internal retries?
> 
> The big issue with trying to solve the problem this way is that various
> clouds won't deploy this service, and then your users are stuck with the
> "base" APIs anyway or deploying this service themselves. This is mostly
> ok until you realize that we rarely build services to run "on" cloud
> rather than "in" cloud, so I as the user can't sanely deploy a new
> service this way, and even if I can I'm stuck deploying it for the 6
> clouds and 15 regions (numbers not exact), because even more rarely do we
> write software that is multicloud/region aware.
> 
> We need to be very careful if this is the path we take because it often
> doesn't actually make the user experience better.

I think an argument can be made that if the community were to rally
around something like Enamel and Oaktree, it would be deployed
broadly.

As Zane pointed out elsewhere in the thread, Heat does some of this
for you, and has seen a lot of adoption, but nowhere near the level
of Neutron and Cinder. However, I believe Heat is missing from some
clouds because it is stateful, and thus, requires a large investment to
deploy. Oaktree is specifically _not_ stateful, and not dependent on admin
access to function, so I could see rallying around something that _can_
be deployed by users, but would be much more popular for deployers to
add in as users ask for it.

So whatever gets chosen as the popular porcelain API, it sounds to me
like it's worth getting serious about it.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Sean Dague
On 05/19/2017 01:38 PM, Dean Troyer wrote:
> On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
>  wrote:
>> ..., but it seems to me that the logical
>> extension of that is to expose simple orthogonal APIs where the nova boot
>> request should only take neutron port ids and cinder volume ids.  The actual
>> setup of those ports/volumes would be done by neutron and cinder.
>>
>> It seems somewhat arbitrary to say "for historical reasons this subset of
>> simple things can be done directly in a nova boot command, but for more
>> complicated stuff you have to go use these other commands".  I think there's
>> an argument to be made that it would be better to be consistent even for the
>> simple things.
> 
> cdent mentioned enamel[0] above, and there is also oaktree[1], both of
> which are wrapper/proxy services in front of existing OpenStack APIs.
> I don't know enough about enamel yet, but one of the things I like
> about oaktree is that it is not required to be deployed by the cloud
> operator to be useful, I could set it up and proxy Rax and/or
> CityCloud and/or mtreinish's closet cloud equally well.
> 
> The fact that these exist, and things like shade itself, are clues
> that we are not meeting the needs of API consumers.  I don't think
> anyone disagrees with that; let me know if you do and I'll update my
> thoughts.

It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you install a client side API
server for your own requests" misses the point of the easier part. It's
cool you can do it as a power user. It's cool things like Heat exist for
people that don't want to write API calls (and just do templates). But
it's also not helping on the number of pieces of complexity to manage in
OpenStack to have a workable cloud.

I consider those things duct tape, leading us to the eventually
consistent place where we actually do that work internally. Because,
having seen it with the ec2-api proxy, the moment you get beyond trivial
mapping, you now end up with a complex state tracking system that's
going to need to be highly available, and replicate a bunch of your data
to be performant, and then have inconsistency issues, because a
user-deployed API proxy can't have access to the notification bus, and... boom.

You end up replicating the Ceilometer issue where there was a break down
in getting needs expressed / implemented, and the result was a service
doing heavy polling of other APIs (because that's the only way it could
get the data it needed). Literally increasing the load on the API
surfaces by a factor of 10
(http://superuser.openstack.org/articles/cern-cloud-architecture-update/
last graph). That was an anti pattern. We should have gotten to the
bottom of the mismatches and communication issues early on, because the
end state we all inflicted on users to get a totally reasonable set of
features, was not good. Please lets not do this again.

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper documentation,
performance implications, and the like), not an end game on their own.

> First and foremost, we need to have the primitive operations that get
> composed into the higher-level ones available.  Just picking "POST
> /server" as an example, we do not have that today.  Chris mentions
> above the low-level version should take IDs for all of the associated
> resources and no magic happening behind the scenes.  I think this
> should be our top priority, everything else builds on top of that, via
> either in-service APIs or proxies or library wrappers, whatever a) can
> get implemented and b) makes sense for the use case.

You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more rest calls to
create a server".

Anyway, this gets pretty meta pretty fast. I agree with Zane that saying "I
want my server to build" or "I'd like Nova to build a volume for me"
are very odd things to call PaaS. I think of PaaS as "here is a ruby on
rails app, provision me a db for it, and make it go". Heroku style.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Dean Troyer
On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
 wrote:
> ..., but it seems to me that the logical
> extension of that is to expose simple orthogonal APIs where the nova boot
> request should only take neutron port ids and cinder volume ids.  The actual
> setup of those ports/volumes would be done by neutron and cinder.
>
> It seems somewhat arbitrary to say "for historical reasons this subset of
> simple things can be done directly in a nova boot command, but for more
> complicated stuff you have to go use these other commands".  I think there's
> an argument to be made that it would be better to be consistent even for the
> simple things.

cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers.  I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available.  Just picking "POST
/server" as an example, we do not have that today.  Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes.  I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.

dt

[BTW, I made this same type of proposal for the OpenStack SDK a few
years ago and it went unmerged, so at some level folks do not agree
this is necessary. I look now at what the Shade folks are doing about
building a low-level REST layer that they then compose, and wish I had
been more persistent then.]

[0] https://github.com/jaypipes/enamel
[1] http://git.openstack.org/cgit/openstack/oaktree
-- 

Dean Troyer
dtro...@gmail.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Clark Boylan
On Fri, May 19, 2017, at 05:59 AM, Duncan Thomas wrote:
> On 19 May 2017 at 12:24, Sean Dague  wrote:
> 
> > I do get the concerns of extra logic in Nova, but the decision to break
> > up the working compute with network and storage problem space across 3
> > services and APIs doesn't mean we shouldn't still make it easy to
> > express some pretty basic and common intents.
> 
> Given that we've similar needs for retries and race avoidance in and
> between glance, nova, cinder and neutron, and a need to orchestrate
> between at least these three (arguably other infrastructure projects
> too, I'm not trying to get into specifics), maybe the answer is to put
> that logic in a new service, that talks to those four, and provides a
> nice simple API, while allowing the cinder, nova etc APIs to remove
> things like internal retries?

The big issue with trying to solve the problem this way is that various
clouds won't deploy this service, and then your users are stuck with the
"base" APIs anyway or deploying this service themselves. This is mostly
ok until you realize that we rarely build services to run "on" cloud
rather than "in" cloud, so I as the user can't sanely deploy a new
service this way, and even if I can I'm stuck deploying it for the 6
clouds and 15 regions (numbers not exact), because even more rarely do we
write software that is multicloud/region aware.

We need to be very careful if this is the path we take because it often
doesn't actually make the user experience better.

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Chris Friesen

On 05/19/2017 07:18 AM, Sean Dague wrote:


There was a conversation in the Cell v2 discussion around searchlight
that puts me more firmly in the anti enamel camp. Because of some
complexities around server list, Nova was planning on using Searchlight
to provide an efficient backend.

Q: Who in this room is running ELK already in their environment?
A: 100% of operators in room

Q: Who would be ok with standing up Searchlight for this?
A: 0% of operators in the room

We've now got an ecosystem that understands how to talk to our APIs
(yay! -
https://docs.google.com/presentation/d/1WAWHrVw8-u6XC7AG9ANdre8-Su0a3fdI-scjny3QOnk/pub?slide=id.g1d9d78a72b_0_0)
so saying "you need to also run this other service to *actually* do the
thing you want, and redo all your applications, and 3rd party SDKs" is
just weird.

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).


A while back I suggested adding the vif-model as an attribute on the network 
during a nova boot request, and we were shot down because "that should be done 
in neutron".


I have some sympathy for this argument, but it seems to me that the logical 
extension of that is to expose simple orthogonal APIs where the nova boot 
request should only take neutron port ids and cinder volume ids.  The actual 
setup of those ports/volumes would be done by neutron and cinder.


It seems somewhat arbitrary to say "for historical reasons this subset of simple 
things can be done directly in a nova boot command, but for more complicated 
stuff you have to go use these other commands".  I think there's an argument to 
be made that it would be better to be consistent even for the simple things.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Zane Bitter

On 18/05/17 20:19, Matt Riedemann wrote:

I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding orchestration to
the compute API, and even define a policy for it since it comes up so
much [1]. The stance is that the compute API should expose capabilities
that a higher-level orchestration service can stitch together for a more
fluid end user experience.


I think this is a wise policy.


One simple example that comes up time and again is allowing a user to
pass volume type to the compute API when booting from volume such that
when nova creates the backing volume in Cinder, it passes through the
volume type. If you need a non-default volume type for boot from volume,
the way you do this today is first create the volume with said type in
Cinder and then provide that volume to the compute API when creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).


As always, there's a trade-off between simplicity and flexibility. I can 
certainly understand the logic in wanting to make the simple stuff 
simple. But users also need to be able to progress from simple stuff to 
more complex stuff without having to give up and start over. There's a 
danger of leading them down the garden path.
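
For concreteness, the two-step dance Matt describes above looks roughly 
like this - a sketch with raw REST calls, invented names and sizes, and 
the volume-status polling elided:

    import requests

    def boot_from_typed_volume(cinder_url, nova_url, token,
                               image_id, flavor_id):
        headers = {'X-Auth-Token': token}
        # Step 1: create the typed, bootable volume in Cinder.
        volume = requests.post(
            cinder_url + '/volumes', headers=headers,
            json={'volume': {'size': 10, 'volume_type': 'fast-ssd',
                             'imageRef': image_id}}).json()['volume']
        # ... poll GET /volumes/{id} until status == 'available' ...
        # Step 2: hand the finished volume to Nova at create time.
        return requests.post(
            nova_url + '/servers', headers=headers,
            json={'server': {'name': 'bfv-server',
                             'flavorRef': flavor_id,
                             'block_device_mapping_v2': [{
                                 'uuid': volume['id'],
                                 'source_type': 'volume',
                                 'destination_type': 'volume',
                                 'boot_index': 0}]}})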



While talking about claims in the scheduler and a top-level conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller services
(nova-api, nova-conductor and nova-scheduler). Build retries is one such
up-call. CERN disables build retries, but others rely on them, because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked, "why
not just do away with build retries in nova altogether? If the scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."


(FWIW Heat does this for you already.)


But during several different Forum sessions, like user API improvements
[2] but also the cells v2 and claims in the scheduler sessions, I was
hearing about how operators only wanted to expose the base IaaS services
and APIs and end API users wanted to only use those, which means any
improvements in those APIs would have to be in the base APIs (nova,
cinder, etc). To me, that generally means any orchestration would have
to be baked into the compute API if you're not using Heat or something
similar.


The problem is that orchestration done inside APIs is very easy to do 
badly in ways that cause lots of downstream pain for users and external 
orchestrators. For example, Nova already does some orchestration: it 
creates a Neutron port for a server if you don't specify one. (And then 
promptly forgets that it has done so.) There is literally an entire 
inner platform, an orchestrator within an orchestrator, inside Heat to 
try to manage the fallout from this. And the inner platform shares none 
of the elegance, such as it is, of Heat itself, but is rather a 
collection of cobbled-together hacks to deal with the seemingly infinite 
explosion of edge cases that we kept running into over a period of at 
least 5 releases.


The get-me-a-network thing is... better, but there's no provision for 
changes after the server is created, which means we have to copy-paste 
the Nova implementation into Heat to deal with update.[1] Which sounds 
like a maintenance nightmare in the making. That seems to be a common 
mistake: to assume that once users create something they'll never need 
to touch it again, except to delete it when they're done.
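
For reference, the create-time API in question is microversion 2.37's 
networks='auto' (a sketch; endpoint URL and auth assumed). Note there is 
no update-time equivalent, which is the gap Heat has to fill itself:

    import requests

    def create_server_auto_network(nova_url, token, image_id, flavor_id):
        headers = {'X-Auth-Token': token,
                   # networks='auto'/'none' requires microversion >= 2.37
                   'X-OpenStack-Nova-API-Version': '2.37'}
        body = {'server': {'name': 'get-me-a-network-demo',
                           'imageRef': image_id,
                           'flavorRef': flavor_id,
                           'networks': 'auto'}}
        return requests.post(nova_url + '/servers', json=body,
                             headers=headers)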


Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done 
superbly well, provide transparency for external orchestration tools 
that need to hook into the data flow, and should be developed in 
consultation with potential consumers like Shade and Heat.



Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?


(Aside: can we stop using the term 'PaaS' to refer to "everything that 
Nova doesn't do"? This habit is not helping us to communicate clearly.)


cheers,
Zane.

[1] https://review.openstack.org/#/c/407328/
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Sean Dague
On 05/19/2017 09:04 AM, Chris Dent wrote:
> On Fri, 19 May 2017, Duncan Thomas wrote:
> 
>> On 19 May 2017 at 12:24, Sean Dague  wrote:
>>
>>> I do get the concerns of extra logic in Nova, but the decision to break
>>> up the working compute with network and storage problem space across 3
>>> services and APIs doesn't mean we shouldn't still make it easy to
>>> express some pretty basic and common intents.
>>
>> Given that we've similar needs for retries and race avoidance in and
>> between glance, nova, cinder and neutron, and a need to orchestrate
>> between at least these three (arguably other infrastructure projects
>> too, I'm not trying to get into specifics), maybe the answer is to put
>> that logic in a new service, that talks to those four, and provides a
>> nice simple API, while allowing the cinder, nova etc APIs to remove
>> things like internal retries?
> 
> This is what enamel was going to be, but we got stalled out because
> of lack of resources and the usual raft of other commitments:
> 
> https://github.com/jaypipes/enamel

There was a conversation in the Cell v2 discussion around searchlight
that puts me more firmly in the anti enamel camp. Because of some
complexities around server list, Nova was planning on using Searchlight
to provide an efficient backend.

Q: Who in this room is running ELK already in their environment?
A: 100% of operators in room

Q: Who would be ok with standing up Searchlight for this?
A: 0% of operators in the room

We've now got an ecosystem that understands how to talk to our APIs
(yay! -
https://docs.google.com/presentation/d/1WAWHrVw8-u6XC7AG9ANdre8-Su0a3fdI-scjny3QOnk/pub?slide=id.g1d9d78a72b_0_0)
so saying "you need to also run this other service to *actually* do the
thing you want, and redo all your applications, and 3rd party SDKs" is
just weird.

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Chris Dent

On Fri, 19 May 2017, Duncan Thomas wrote:


On 19 May 2017 at 12:24, Sean Dague  wrote:


I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.


Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?


This is what enamel was going to be, but we got stalled out because
of lack of resources and the usual raft of other commitments:

https://github.com/jaypipes/enamel

--
Chris Dent  ┬──┬◡ノ(° -°ノ)   https://anticdent.org/
freenode: cdent tw: @anticdent
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Duncan Thomas
On 19 May 2017 at 12:24, Sean Dague  wrote:

> I do get the concerns of extra logic in Nova, but the decision to break
> up the working compute with network and storage problem space across 3
> services and APIs doesn't mean we shouldn't still make it easy to
> express some pretty basic and common intents.

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?




-- 
Duncan Thomas

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Sean Dague
On 05/18/2017 08:19 PM, Matt Riedemann wrote:
> I just wanted to blurt this out since it hit me a few times at the
> summit, and see if I'm misreading the rooms.
> 
> For the last few years, Nova has pushed back on adding orchestration to
> the compute API, and even define a policy for it since it comes up so
> much [1]. The stance is that the compute API should expose capabilities
> that a higher-level orchestration service can stitch together for a more
> fluid end user experience.
> 
> One simple example that comes up time and again is allowing a user to
> pass volume type to the compute API when booting from volume such that
> when nova creates the backing volume in Cinder, it passes through the
> volume type. If you need a non-default volume type for boot from volume,
> the way you do this today is first create the volume with said type in
> Cinder and then provide that volume to the compute API when creating the
> server. However, people claim that is bad UX or hard for users to
> understand, something like that (at least from a command line, I assume
> Horizon hides this, and basic users should probably be using Horizon
> anyway right?).
> 
> While talking about claims in the scheduler and a top-level conductor
> for cells v2 deployments, we've talked about the desire to eliminate
> "up-calls" from the compute service to the top-level controller services
> (nova-api, nova-conductor and nova-scheduler). Build retries is one such
> up-call. CERN disables build retries, but others rely on them, because
> of how racy claims in the computes are (that's another story and why
> we're working on fixing it). While talking about this, we asked, "why
> not just do away with build retries in nova altogether? If the scheduler
> picks a host and the build fails, it fails, and you have to
> retry/rebuild/delete/recreate from a top-level service."
> 
> But during several different Forum sessions, like user API improvements
> [2] but also the cells v2 and claims in the scheduler sessions, I was
> hearing about how operators only wanted to expose the base IaaS services
> and APIs and end API users wanted to only use those, which means any
> improvements in those APIs would have to be in the base APIs (nova,
> cinder, etc). To me, that generally means any orchestration would have
> to be baked into the compute API if you're not using Heat or something
> similar.
> 
> Am I missing the point, or is the pendulum really swinging away from
> PaaS layer services which abstract the dirty details of the lower-level
> IaaS APIs? Or was this always something people wanted and I've just
> never made the connection until now?

Lots of people just want IaaS. See the fact that Google and Microsoft
both didn't offer it at first in their public clouds, and got pretty
marginal uptake while AWS ate the world. They have both reversed course
there.

The predictability of whether an intent is going to be fulfilled, and
"POST /servers" is definitely pretty clear intent, is directly related
to how many people are going to be willing to use the platform and
build tools for it. If it's much more complicated to build tooling on
OpenStack IaaS because that tooling needs to put everything in its own
retry work queue, lots of folks will just give up.
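
For illustration, that retry work queue ends up looking roughly like
this in every consumer's codebase (a sketch assuming an
already-authenticated novaclient handle in `nova`; the names and the
cleanup policy here are illustrative, not a recommendation):

import time

def create_server_with_retries(nova, name, image, flavor, attempts=3):
    # The loop nova would no longer run for you: create, poll, and on
    # failure clean up and try again from the top level.
    for _ in range(attempts):
        server = nova.servers.create(name, image, flavor)
        while server.status == 'BUILD':
            time.sleep(5)
            server = nova.servers.get(server.id)
        if server.status == 'ACTIVE':
            return server
        # Build failed on the chosen host; delete and roll the dice again.
        server.delete()
    raise RuntimeError('gave up after %d attempts' % attempts)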

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-19 Thread Thierry Carrez
Matt Riedemann wrote:
> [...]
> Am I missing the point, or is the pendulum really swinging away from
> PaaS layer services which abstract the dirty details of the lower-level
> IaaS APIs? Or was this always something people wanted and I've just
> never made the connection until now?

I feel like this is driven by a need for better UX at the IaaS API
layer (fewer calls, or more intuitive calls, as shown by shade's UI).
Even if that IaaS layer is mostly accessed programmatically, that's no
excuse for requiring 5 convoluted API calls and reading 5 pages of docs
for a basic action when you could make it a single call.
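
For illustration, the single-call version is roughly what shade gives
you today (a sketch from memory, so treat the exact kwargs as
approximate):

import shade

# One call: shade resolves the image and flavor names, creates the
# server, waits for it to go ACTIVE, and handles the floating IP dance.
cloud = shade.openstack_cloud(cloud='mycloud')
server = cloud.create_server('web-1',
                             image='ubuntu-16.04',
                             flavor='m1.small',
                             auto_ip=True,
                             wait=True)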

So I'm not sure it's a recent change, or that it shows the demise of
PaaS layers, but that certainly shows that direct usage of IaaS APIs is
still a thing. If anything, the rise of application orchestration
frameworks like Kubernetes only separated the concerns -- provisioning
of application clusters might be done by someone else, but it still is
done by someone.

-- 
Thierry Carrez (ttx)



[openstack-dev] Is the pendulum swinging on PaaS layers?

2017-05-18 Thread Matt Riedemann
I just wanted to blurt this out since it hit me a few times at the 
summit, and see if I'm misreading the rooms.


For the last few years, Nova has pushed back on adding orchestration to 
the compute API, and even defined a policy for it since it comes up so 
much [1]. The stance is that the compute API should expose capabilities 
that a higher-level orchestration service can stitch together for a more 
fluid end user experience.


One simple example that comes up time and again is allowing a user to 
pass volume type to the compute API when booting from volume such that 
when nova creates the backing volume in Cinder, it passes through the 
volume type. If you need a non-default volume type for boot from volume, 
the way you do this today is first create the volume with said type in 
Cinder and then provide that volume to the compute API when creating the 
server. However, people claim that is bad UX or hard for users to 
understand, something like that (at least from a command line, I assume 
Horizon hides this, and basic users should probably be using Horizon 
anyway, right?).
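
Concretely, today's two-step dance reads something like this with shade
(again a sketch; the exact kwargs are from memory and approximate):

import shade

cloud = shade.openstack_cloud(cloud='mycloud')
# Step 1: create the bootable volume in Cinder with the non-default type.
vol = cloud.create_volume(size=20, image='ubuntu-16.04',
                          volume_type='ssd', wait=True)
# Step 2: hand the finished volume to Nova when creating the server.
server = cloud.create_server('web-1', flavor='m1.small',
                             boot_volume=vol['id'], wait=True)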


While talking about claims in the scheduler and a top-level conductor 
for cells v2 deployments, we've talked about the desire to eliminate 
"up-calls" from the compute service to the top-level controller services 
(nova-api, nova-conductor and nova-scheduler). Build retries is one such 
up-call. CERN disables build retries, but others rely on them, because 
of how racy claims in the computes are (that's another story and why 
we're working on fixing it). While talking about this, we asked, "why 
not just do away with build retries in nova altogether? If the scheduler 
picks a host and the build fails, it fails, and you have to 
retry/rebuild/delete/recreate from a top-level service."


But during several different Forum sessions, like user API improvements 
[2] but also the cells v2 and claims in the scheduler sessions, I was 
hearing about how operators only wanted to expose the base IaaS services 
and APIs and end API users wanted to only use those, which means any 
improvements in those APIs would have to be in the base APIs (nova, 
cinder, etc). To me, that generally means any orchestration would have 
to be baked into the compute API if you're not using Heat or something 
similar.


Am I missing the point, or is the pendulum really swinging away from 
PaaS layer services which abstract the dirty details of the lower-level 
IaaS APIs? Or was this always something people wanted and I've just 
never made the connection until now?


[1] https://docs.openstack.org/developer/nova/project_scope.html#api-scope
[2] 
https://etherpad.openstack.org/p/BOS-forum-openstack-user-api-improvements


--

Thanks,

Matt
