Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

Zane Bitter Fri, 19 May 2017 15:30:52 -0700

On 19/05/17 17:03, Kevin Benton wrote:

I split this conversation off of the "Is the pendulum swinging on PaaS
layers?" thread [1] to discuss some improvements we can make to Neutron
to make orchestration easier.


There are some pain points that heat has when working with the Neutron
API. I would like to get them converted into requests for enhancements
in Neutron so the wider community is aware of them.

Starting with the port/subnet/network relationship - it's important to
understand that IP addresses are not required on a port.

So knowing now that a Network is a layer-2 network segment and a Subnet

is... effectively a glorified DHCP address pool

Yes, a Subnet controls IP address allocation as well as setting up
routing for routers, which is why routers reference subnets instead of
networks (different routers can route for different subnets on the same
network). It essentially dictates things related to L3 addressing and
provides information for L3 reachability.

But at the end of the day, I still can't create a Port until a Subnet exists


This is only true if you want an IP address on the port. This sounds
silly for most use cases, but there is a non-trivial portion of NFV
workloads that do not want IP addresses at all so they create a network
and just attach ports without creating any subnets.

Fair. A more precise statement of the problem would be that given a template containing both a Port and a Subnet that it will be attached to, there is a specific order in which those need to be created that is _not_ reflected in the data flow between them.
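
As a rough illustration of the ordering problem, here is a minimal sketch in Python. It assumes a hypothetical, heavily simplified template in which each resource name maps to the set of resources its properties reference; the names (`net`, `subnet`, `port`) and helper functions are made up for the example and are not Heat's actual implementation.

```python
from graphlib import TopologicalSorter


def creation_order(template):
    """Return a creation order implied by property references alone."""
    return list(TopologicalSorter(template).static_order())


def must_precede(template, first, second):
    """True if `second` references `first`, directly or transitively."""
    seen, stack = set(), [second]
    while stack:
        for dep in template.get(stack.pop(), ()):
            if dep == first:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


# The Port names only the Network, so nothing in the data flow says
# the Subnet has to exist first (hypothetical resource names).
implicit = {"net": set(), "subnet": {"net"}, "port": {"net"}}

# The Port also names the Subnet (e.g. via a fixed IP entry), so the
# required order is visible in the template itself.
explicit = {"net": set(), "subnet": {"net"}, "port": {"net", "subnet"}}

print(must_precede(implicit, "subnet", "port"))  # False
print(must_precede(explicit, "subnet", "port"))  # True
print(creation_order(explicit))  # ['net', 'subnet', 'port']
```

In the implicit case a scheduler working purely from references is free to create the Port before the Subnet, which is the situation described above.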

I still don't know what Subnet a Port will be attached to (unless the

user specifies it explicitly using the --fixed-ip option... regardless
of whether they actually specify a fixed IP),

So what would you like Neutron to do differently here? Always force a
user to pick which subnet they want an allocation from


That would work.

if there are
multiple?


Ideally even if there aren't.

If so, can't you just force that explicitness in Heat?

I think the answer here is exactly the same as for Neutron: yes, we totally could have if we'd realised that it was a problem at the time.

and I have no way in general of telling which Subnets can be deleted before a 
given Port is and which will fail to delete until the Port disappears.


A given port will only block subnet deletions from subnets it is
attached to. Conversely, you can see all ports with allocations from a
subnet with 'neutron port-list --fixed-ips subnet_id=<subnet-UUID>'.  So
is the issue here that the dependency wasn't made explicit in the heat
modeling (leading to the problem above and this one)?
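
The same filtering can be sketched in Python over port records. This assumes ports shaped like Neutron API port objects, each carrying a `fixed_ips` list of `subnet_id`/`ip_address` entries; the function name and the sample data are hypothetical.

```python
def ports_blocking_subnet(ports, subnet_id):
    """Return the IDs of ports that hold an IP allocation from subnet_id."""
    return [
        port["id"]
        for port in ports
        if any(ip.get("subnet_id") == subnet_id
               for ip in port.get("fixed_ips", []))
    ]


# Hypothetical sample data for illustration only.
ports = [
    {"id": "port-a",
     "fixed_ips": [{"subnet_id": "subnet-1", "ip_address": "10.0.0.5"}]},
    {"id": "port-b", "fixed_ips": []},  # a port with no IP address at all
    {"id": "port-c",
     "fixed_ips": [{"subnet_id": "subnet-2", "ip_address": "10.0.1.7"}]},
]

print(ports_blocking_subnet(ports, "subnet-1"))  # ['port-a']
```

Only ports with an allocation from the subnet show up, so a port without any fixed IPs never blocks a subnet deletion.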

Yes, that's exactly the issue. The Heat modelling was based on 1:1 with the Neutron API to minimise user confusion.

For the individual bugs you highlighted, it would be good if you can
provide some details about what changes we could make to help.


https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a result
of partially specified floating IPs (no fixed_ip). What can we
add/change here to help? Or can heat just always force the user to
specify a fixed IP for the case where disambiguation on multiple
fixed_ip ports is needed?

This is the issue from which all the others on that list were spawned (see https://bugs.launchpad.net/heat/+bug/1442121/comments/10), so the only thing we're planning to actually do for this one is to catch any exceptions closer to where they occur than we're doing in the fix for https://bugs.launchpad.net/heat/+bug/1554625

https://launchpad.net/bugs/1626607


Note that this one is fixed.

- I see this is about a dependency
between RouterGateways and RouterInterfaces, but it's not clear to me
why that dependency exists. Is it to solve a lack of visibility into the
interfaces required for a floating IP?


Yes, exactly.

We essentially solved the RouterGateway/RouterInterface half of the problem in Heat back in Juno, by deprecating the OS::Neutron::RouterGateway resource and replacing it with an "external_gateway_info" property in OS::Neutron::Router. Old templates never die though.

https://bugs.launchpad.net/heat/+bug/1626619,
https://bugs.launchpad.net/heat/+bug/1626630, and
https://bugs.launchpad.net/heat/+bug/1626634 - These seem similar to
1626607.

The first and third are the RouterInterface/FloatingIP half of the problem. And to work around that we also have to work around the Subnet/Port problem (that's the third bug). The second bug is the RouterGateway/RouterInterface equivalent of the third.

Can we just expose the interfaces/router a floating IP is
depending on explicitly in the API for you to fix these?

Not really. We need to know before any of them are actually created. Preferably without making any REST calls, because REST calls are slow and tend to raise exceptions at unfortunate times.

If not, what
can we do to help here?


In principle, either:

(a) drop the requirement that the Network has to be connected to the external network with the FloatingIPs with a RouterInterface prior to creating the FloatingIP. IIUC only *some* Neutron backends require this.

or

(b) require the user to provide the UUID of the RouterInterface through which they wish to connect when they create the FloatingIP.
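
To illustrate why option (b) would help, here is a minimal sketch in Python of deriving dependencies from a template alone, with no REST calls. It assumes a template expressed as nested dicts using `{"get_resource": name}` references. The `router_interface` property on the FloatingIP is purely hypothetical (it stands in for what option (b) proposes and does not exist today), and the helper name is made up.

```python
def referenced_resources(value):
    """Collect resource names referenced via {"get_resource": name}."""
    if isinstance(value, dict):
        if set(value) == {"get_resource"}:
            return {value["get_resource"]}
        return set().union(*(referenced_resources(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(referenced_resources(v) for v in value))
    return set()


template = {
    "router": {
        "type": "OS::Neutron::Router",
        "properties": {"external_gateway_info": {"network": "public"}},
    },
    "router_interface": {
        "type": "OS::Neutron::RouterInterface",
        "properties": {
            "router": {"get_resource": "router"},
            "subnet": {"get_resource": "subnet"},
        },
    },
    "floating_ip": {
        "type": "OS::Neutron::FloatingIP",
        "properties": {
            "floating_network": "public",
            "port_id": {"get_resource": "port"},
            # HYPOTHETICAL property illustrating option (b); not a real one.
            "router_interface": {"get_resource": "router_interface"},
        },
    },
}

# Dependencies fall out of the template text before anything is created.
deps = {name: referenced_resources(res["properties"])
        for name, res in template.items()}

print(sorted(deps["floating_ip"]))  # ['port', 'router_interface']
```

With the explicit reference in place, the FloatingIP's dependency on the RouterInterface is plain data flow rather than something that has to be inferred.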


cheers,
Zane.

1. http://lists.openstack.org/pipermail/openstack-dev/2017-May/117106.html

Cheers,
Kevin Benton

On Fri, May 19, 2017 at 1:05 PM, Zane Bitter <[email protected]> wrote:

    On 19/05/17 15:06, Kevin Benton wrote:

            Don't even get me started on Neutron.[2]


        It seems to me the conclusion to that thread was that the majority
        of your issues stemmed from the fact that we had poor documentation
        at the time.  A major component of the complaints resulted from you
        misunderstanding the difference between networks/subnets in Neutron.


    It's true that I was completely off base as to what the various
    primitives in Neutron actually do. (Thanks for educating me!) The
    implications for orchestration are largely unchanged though. It's a
    giant pain that we have to infer implicit dependencies between stuff
    to get them to create/delete in the right order, pretty much
    independently of what that stuff does.

    So knowing now that a Network is a layer-2 network segment and a
    Subnet is... effectively a glorified DHCP address pool, I understand
    better why it probably seemed like a good idea to hook stuff up
    magically. But at the end of the day, I still can't create a Port
    until a Subnet exists, I still don't know what Subnet a Port will be
    attached to (unless the user specifies it explicitly using the
    --fixed-ip option... regardless of whether they actually specify a
    fixed IP), and I have no way in general of telling which Subnets can
    be deleted before a given Port is and which will fail to delete
    until the Port disappears.

        There are some legitimate issues in there about the extra routes
        extension being replace-only and the routers API not accepting a
        list of interfaces in POST.  However, it hardly seems that those
        are worthy of "Don't even get me started on Neutron."


    https://launchpad.net/bugs/1626607
    https://launchpad.net/bugs/1442121
    https://launchpad.net/bugs/1626619
    https://launchpad.net/bugs/1626630
    https://launchpad.net/bugs/1626634

        It would be nice if you could write up something about current
        gaps that would make Heat's life easier, because a large chunk of
        that initial email is incorrect and linking to it as a big list of
        "issues" is counter-productive.


    Yes, agreed. I wish I had a clean thread to link to. It's a huge
    amount of work to research it all though.

    cheers,
    Zane.

        On Fri, May 19, 2017 at 7:36 AM, Zane Bitter <[email protected]> wrote:

            On 18/05/17 20:19, Matt Riedemann wrote:

                I just wanted to blurt this out since it hit me a few times
                at the summit, and see if I'm misreading the rooms.

                For the last few years, Nova has pushed back on adding
                orchestration to the compute API, and even defined a policy
                for it since it comes up so much [1]. The stance is that the
                compute API should expose capabilities that a higher-level
                orchestration service can stitch together for a more fluid
                end user experience.


            I think this is a wise policy.

                One simple example that comes up time and again is allowing
                a user to pass volume type to the compute API when booting
                from volume such that when nova creates the backing volume
                in Cinder, it passes through the volume type. If you need a
                non-default volume type for boot from volume, the way you do
                this today is first create the volume with said type in
                Cinder and then provide that volume to the compute API when
                creating the server. However, people claim that is bad UX or
                hard for users to understand, something like that (at least
                from a command line, I assume Horizon hides this, and basic
                users should probably be using Horizon anyway right?).


            As always, there's a trade-off between simplicity and
            flexibility. I can certainly understand the logic in wanting to
            make the simple stuff simple. But users also need to be able to
            progress from simple stuff to more complex stuff without having
            to give up and start over. There's a danger of leading them down
            the garden path.

                While talking about claims in the scheduler and a top-level
                conductor for cells v2 deployments, we've talked about the
                desire to eliminate "up-calls" from the compute service to
                the top-level controller services (nova-api, nova-conductor
                and nova-scheduler). Build retries is one such up-call. CERN
                disables build retries, but others rely on them, because of
                how racy claims in the computes are (that's another story
                and why we're working on fixing it). While talking about
                this, we asked, "why not just do away with build retries in
                nova altogether? If the scheduler picks a host and the build
                fails, it fails, and you have to
                retry/rebuild/delete/recreate from a top-level service."


            (FWIW Heat does this for you already.)

                But during several different Forum sessions, like user API
                improvements [2] but also the cells v2 and claims in the
                scheduler sessions, I was hearing about how operators only
                wanted to expose the base IaaS services and APIs and end API
                users wanted to only use those, which means any improvements
                in those APIs would have to be in the base APIs (nova,
                cinder, etc). To me, that generally means any orchestration
                would have to be baked into the compute API if you're not
                using Heat or something similar.


            The problem is that orchestration done inside APIs is very easy
            to do badly in ways that cause lots of downstream pain for users
            and external orchestrators. For example, Nova already does some
            orchestration: it creates a Neutron port for a server if you
            don't specify one. (And then promptly forgets that it has done
            so.) There is literally an entire inner platform, an
            orchestrator within an orchestrator, inside Heat to try to
            manage the fallout from this. And the inner platform shares none
            of the elegance, such as it is, of Heat itself, but is rather a
            collection of cobbled-together hacks to deal with the seemingly
            infinite explosion of edge cases that we kept running into over
            a period of at least 5 releases.

            The get-me-a-network thing is... better, but there's no
            provision for changes after the server is created, which means
            we have to copy-paste the Nova implementation into Heat to deal
            with update.[1] Which sounds like a maintenance nightmare in the
            making. That seems to be a common mistake: to assume that once
            users create something they'll never need to touch it again,
            except to delete it when they're done.

            Don't even get me started on Neutron.[2]

            Any orchestration that is done behind-the-scenes needs to be
            done superbly well, provide transparency for external
            orchestration tools that need to hook in to the data flow, and
            should be developed in consultation with potential consumers
            like Shade and Heat.

                Am I missing the point, or is the pendulum really swinging
                away from PaaS layer services which abstract the dirty
                details of the lower-level IaaS APIs? Or was this always
                something people wanted and I've just never made the
                connection until now?


            (Aside: can we stop using the term 'PaaS' to refer to
            "everything that Nova doesn't do"? This habit is not helping us
            to communicate clearly.)

            cheers,
            Zane.

            [1] https://review.openstack.org/#/c/407328/
            [2] http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html



        
            __________________________________________________________________________
            OpenStack Development Mailing List (not for usage questions)
            Unsubscribe: [email protected]?subject:unsubscribe
            http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




        
