Re: [openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

Zane Bitter Tue, 23 May 2017 11:51:12 -0700

On 19/05/17 19:53, Kevin Benton wrote:

So making a subnet ID mandatory for a port creation and
a RouterInterface ID mandatory for a Floating IP creation are both
possible in Heat without Neutron changes. Presumably you haven't done
that because it's backwards-incompatible, but you would need to
implement the change anyway if the Neutron API was changed to require it.


It seems like Heat has a backwards-compatibility requirement for
supporting old templates that aren't explicit. That will be the real
blocker to actually making any of these changes, no? i.e. Neutron isn't
preventing Heat from being more strict, it's the legacy Heat modeling
that is preventing it.

We have a translation mechanism for resource properties (much improved
in Pike - thanks prazumovsky!) that could in theory help us to make such
a change (with or without a corresponding change in the Neutron API)
without breaking existing users (although it would probably require a
bunch of expensive API calls at inopportune times). That would likely be
just as much of a pain to maintain as the workarounds we have now, so
tbh we're likely to stick with reflecting the Neutron API directly,
whatever it does.

I've long since chalked this one up to 'lessons learned'; if I keep
harping on it, it's because I want to make sure that everyone really
does learn the lessons.

(a) drop the requirement that the Network has to be connected to the
external network with the FloatingIPs with a RouterInterface prior to
creating the FloatingIP. IIUC only *some* Neutron backends require this.

This can produce difficult-to-debug situations when multiple routers
attached to different external networks are attached to different
subnets of the same network and the user associates a floating IP to the
wrong fixed IP of the instance. Right now the interface check will
prevent that, but if we remove it the floating IP would just sit in the
DOWN state.

If a backend supports floating IPs without router interfaces entirely,
it's likely making assumptions that prevent it from supporting
multi-router scenarios. A single fixed IP on a port can have multiple
floating IPs associated with it from different external networks. So the
only way to distinguish which floating IP to translate to is which
router the traffic is being directed to by the instance, which requires
router interfaces.
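
A minimal sketch of the disambiguation described above, in Python. This is not Neutron code: the table, the router names, and the addresses (taken from documentation ranges) are all hypothetical. It only illustrates why the fixed IP alone is not enough to pick a floating IP when several routers serve the same port.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical NAT table: one fixed IP has a floating IP on each of two
# external networks, and each external network sits behind its own router.
# Key: (fixed_ip, router_id) -> floating_ip
NAT_TABLE: Dict[Tuple[str, str], str] = {
    ("10.0.0.5", "router-a"): "203.0.113.10",
    ("10.0.0.5", "router-b"): "198.51.100.20",
}


def translate_source(fixed_ip: str, router_id: str) -> Optional[str]:
    """Pick the floating IP for traffic leaving fixed_ip via router_id."""
    return NAT_TABLE.get((fixed_ip, router_id))


def floating_ips_for(fixed_ip: str) -> List[str]:
    """All floating IPs associated with a fixed IP, regardless of router."""
    return sorted(fip for (ip, _), fip in NAT_TABLE.items() if ip == fixed_ip)


if __name__ == "__main__":
    # The fixed IP alone is ambiguous: it maps to two floating IPs.
    print(floating_ips_for("10.0.0.5"))
    # The router the instance sends traffic to resolves the ambiguity.
    print(translate_source("10.0.0.5", "router-a"))
```

In this toy model the router is part of the lookup key, which is the role the router interface plays in the argument above.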

Cheers

On Fri, May 19, 2017 at 3:29 PM, Zane Bitter <zbit...@redhat.com> wrote:

    On 19/05/17 17:03, Kevin Benton wrote:

        I split this conversation off of the "Is the pendulum swinging on
        PaaS layers?" thread [1] to discuss some improvements we can make
        to Neutron to make orchestration easier.

        There are some pain points that heat has when working with the
        Neutron API. I would like to get them converted into requests for
        enhancements in Neutron so the wider community is aware of them.

        Starting with the port/subnet/network relationship - it's
        important to understand that IP addresses are not required on a
        port.

            So knowing now that a Network is a layer-2 network segment
            and a Subnet is... effectively a glorified DHCP address pool

        Yes, a Subnet controls IP address allocation as well as setting up
        routing for routers, which is why routers reference subnets
        instead of networks (different routers can route for different
        subnets on the same network). It essentially dictates things
        related to L3 addressing and provides information for L3
        reachability.

            But at the end of the day, I still can't create a Port until
            a Subnet exists


        This is only true if you want an IP address on the port. This
        sounds silly for most use cases, but there is a non-trivial
        portion of NFV workloads that do not want IP addresses at all, so
        they create a network and just attach ports without creating any
        subnets.


    Fair. A more precise statement of the problem would be that given a
    template containing both a Port and a Subnet that it will be
    attached to, there is a specific order in which those need to be
    created that is _not_ reflected in the data flow between them.
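
To make the ordering problem concrete, here is a small Python sketch. It is not Heat's dependency code: the resource names are hypothetical and the standard-library graphlib module (Python 3.9+) stands in for a real orchestrator. It shows that property references alone leave the Port and the Subnet unordered, and that an extra, inferred edge is needed to pin the order.

```python
from graphlib import TopologicalSorter

# Hypothetical template: each resource maps to the resources it references
# through its properties, i.e. the data flow visible in the template.
explicit_refs = {
    "network": set(),
    "subnet": {"network"},
    "port": {"network"},  # the port references only the network
}


def creation_order(deps):
    """Return one valid creation order for a dependency mapping."""
    return list(TopologicalSorter(deps).static_order())


def with_implicit_edges(deps, implicit):
    """Copy deps and add (dependent, requirement) edges that the
    template's data flow does not express."""
    merged = {name: set(reqs) for name, reqs in deps.items()}
    for dependent, requirement in implicit:
        merged[dependent].add(requirement)
    return merged


# From references alone, "subnet" and "port" are unordered relative to each
# other, so an orchestrator may legally create the port first.
unordered = creation_order(explicit_refs)

# Adding the inferred edge forces the subnet to exist before the port.
ordered = creation_order(with_implicit_edges(explicit_refs, [("port", "subnet")]))

if __name__ == "__main__":
    print(unordered)
    print(ordered)
```

The second call is the workaround in miniature: the orchestrator has to know about the ("port", "subnet") edge from somewhere other than the template.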

            I still don't know what Subnet a Port will be attached to
            (unless the user specifies it explicitly using the --fixed-ip
            option... regardless of whether they actually specify a fixed
            IP),

        So what would you like Neutron to do differently here? Always
        force a user to pick which subnet they want an allocation from


    That would work.

        if there are
        multiple?


    Ideally even if there aren't.

        If so, can't you just force that explicitness in Heat?


    I think the answer here is exactly the same as for Neutron: yes, we
    totally could have if we'd realised that it was a problem at the time.

            and I have no way in general of telling which Subnets can be
            deleted before a given Port is and which will fail to delete
            until the Port disappears.


        A given port will only block subnet deletions from subnets it is
        attached to. Conversely, you can see all ports with allocations
        from a subnet with
        'neutron port-list --fixed-ips subnet_id=<subnet-UUID>'. So is the
        issue here that the dependency wasn't made explicit in the heat
        modeling (leading to the problem above and this one)?


    Yes, that's exactly the issue. The Heat modelling was based on 1:1
    with the Neutron API to minimise user confusion.
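
As an illustration of the port/subnet relationship discussed here, the following Python sketch filters port records by subnet. It assumes port dictionaries shaped like Neutron's port representation, where "fixed_ips" is a list of entries with "subnet_id" and "ip_address"; the IDs and addresses are made up, and the helper name is hypothetical.

```python
from typing import Dict, List


def ports_blocking_subnet(ports: List[Dict], subnet_id: str) -> List[str]:
    """Return IDs of ports holding an IP allocation from subnet_id.

    These are the ports that would have to go away before the subnet
    can be deleted.
    """
    return [
        port["id"]
        for port in ports
        if any(ip["subnet_id"] == subnet_id for ip in port.get("fixed_ips", []))
    ]


# Hypothetical sample data.
PORTS = [
    {"id": "port-1", "fixed_ips": [{"subnet_id": "subnet-a", "ip_address": "10.0.0.5"}]},
    {"id": "port-2", "fixed_ips": [{"subnet_id": "subnet-b", "ip_address": "10.0.1.7"}]},
    # A port with no IP address at all, as in the NFV case mentioned earlier.
    {"id": "port-3", "fixed_ips": []},
]

if __name__ == "__main__":
    print(ports_blocking_subnet(PORTS, "subnet-a"))
```

Note that this lookup only works once the ports exist, which is why it does not help an orchestrator that needs the dependency before anything has been created.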

        For the individual bugs you highlighted, it would be good if you can
        provide some details about what changes we could make to help.


        https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a
        result of partially specified floating IPs (no fixed_ip). What can
        we add/change here to help? Or can heat just always force the user
        to specify a fixed IP for the case where disambiguation on
        multiple fixed_ip ports is needed?


    This is the issue from which all the others on that list were
    spawned (see
    https://bugs.launchpad.net/heat/+bug/1442121/comments/10), so the
    only thing we're planning to actually do for this one is to catch
    any exceptions closer to where they occur than we're doing in the
    fix for https://bugs.launchpad.net/heat/+bug/1554625

        https://launchpad.net/bugs/1626607


    Note that this one is fixed.

        - I see this is about a dependency between RouterGateways and
        RouterInterfaces, but it's not clear to me why that dependency
        exists. Is it to solve a lack of visibility into the interfaces
        required for a floating IP?


    Yes, exactly.

    We essentially solved the RouterGateway/RouterInterface half of the
    problem in Heat back in Juno, by deprecating the
    OS::Neutron::RouterGateway resource and replacing it with an
    "external_gateway_info" property in OS::Neutron::Router. Old
    templates never die though.

        https://bugs.launchpad.net/heat/+bug/1626619,
        https://bugs.launchpad.net/heat/+bug/1626630, and
        https://bugs.launchpad.net/heat/+bug/1626634 - These seem similar
        to 1626607.


    The first and third are the RouterInterface/FloatingIP half of the
    problem. And to work around that we also have to work around the
    Subnet/Port problem (that's the third bug). The second bug is the
    RouterGateway/RouterInterface equivalent of the third.

        Can we just expose the interfaces/router a floating IP is
        depending on explicitly in the API for you to fix these?


    Not really. We need to know before any of them are actually created.
    Preferably without making any REST calls, because REST calls are
    slow and tend to raise exceptions at unfortunate times.

        If not, what
        can we do to help here?


    In principle, either:

    (a) drop the requirement that the Network has to be connected to the
    external network with the FloatingIPs with a RouterInterface prior
    to creating the FloatingIP. IIUC only *some* Neutron backends
    require this.

    or

    (b) require the user to provide the UUID of the RouterInterface
    through which they wish to connect when they create the FloatingIP.

    cheers,
    Zane.

        1.
        http://lists.openstack.org/pipermail/openstack-dev/2017-May/117106.html

        Cheers,
        Kevin Benton

        On Fri, May 19, 2017 at 1:05 PM, Zane Bitter <zbit...@redhat.com> wrote:

            On 19/05/17 15:06, Kevin Benton wrote:

                    Don't even get me started on Neutron.[2]


                It seems to me the conclusion to that thread was that the
                majority of your issues stemmed from the fact that we had
                poor documentation at the time.  A major component of the
                complaints resulted from you misunderstanding the
                difference between networks/subnets in Neutron.


            It's true that I was completely off base as to what the various
            primitives in Neutron actually do. (Thanks for educating me!)
            The implications for orchestration are largely unchanged
            though. It's a giant pain that we have to infer implicit
            dependencies between stuff to get them to create/delete in the
            right order, pretty much independently of what that stuff does.

            So knowing now that a Network is a layer-2 network segment and a
            Subnet is... effectively a glorified DHCP address pool, I
            understand better why it probably seemed like a good idea to
            hook stuff up magically. But at the end of the day, I still
            can't create a Port until a Subnet exists, I still don't know
            what Subnet a Port will be attached to (unless the user
            specifies it explicitly using the --fixed-ip option...
            regardless of whether they actually specify a fixed IP), and I
            have no way in general of telling which Subnets can be deleted
            before a given Port is and which will fail to delete until the
            Port disappears.

                There are some legitimate issues in there about the extra
                routes extension being replace-only and the routers API not
                accepting a list of interfaces in POST.  However, it hardly
                seems that those are worthy of "Don't even get me started
                on Neutron."


            https://launchpad.net/bugs/1626607
            https://launchpad.net/bugs/1442121
            https://launchpad.net/bugs/1626619
            https://launchpad.net/bugs/1626630
            https://launchpad.net/bugs/1626634

                It would be nice if you could write up something about
                current gaps that would make Heat's life easier, because a
                large chunk of that initial email is incorrect and linking
                to it as a big list of "issues" is counter-productive.


            Yes, agreed. I wish I had a clean thread to link to. It's a huge
            amount of work to research it all though.

            cheers,
            Zane.

                On Fri, May 19, 2017 at 7:36 AM, Zane Bitter <zbit...@redhat.com> wrote:

                    On 18/05/17 20:19, Matt Riedemann wrote:

                        I just wanted to blurt this out since it hit me a
                        few times at the summit, and see if I'm misreading
                        the rooms.

                        For the last few years, Nova has pushed back on
                        adding orchestration to the compute API, and even
                        defined a policy for it since it comes up so much
                        [1]. The stance is that the compute API should
                        expose capabilities that a higher-level
                        orchestration service can stitch together for a
                        more fluid end user experience.


                    I think this is a wise policy.

                        One simple example that comes up time and again is
                        allowing a user to pass volume type to the compute
                        API when booting from volume such that when nova
                        creates the backing volume in Cinder, it passes
                        through the volume type. If you need a non-default
                        volume type for boot from volume, the way you do
                        this today is first create the volume with said
                        type in Cinder and then provide that volume to the
                        compute API when creating the server. However,
                        people claim that is bad UX or hard for users to
                        understand, something like that (at least from a
                        command line, I assume Horizon hides this, and
                        basic users should probably be using Horizon anyway
                        right?).


                    As always, there's a trade-off between simplicity and
                    flexibility. I can certainly understand the logic in
                    wanting to make the simple stuff simple. But users also
                    need to be able to progress from simple stuff to more
                    complex stuff without having to give up and start over.
                    There's a danger of leading them down the garden path.

                        While talking about claims in the scheduler and a
                        top-level conductor for cells v2 deployments, we've
                        talked about the desire to eliminate "up-calls"
                        from the compute service to the top-level
                        controller services (nova-api, nova-conductor and
                        nova-scheduler). Build retries is one such up-call.
                        CERN disables build retries, but others rely on
                        them, because of how racy claims in the computes
                        are (that's another story and why we're working on
                        fixing it). While talking about this, we asked,
                        "why not just do away with build retries in nova
                        altogether? If the scheduler picks a host and the
                        build fails, it fails, and you have to
                        retry/rebuild/delete/recreate from a top-level
                        service."


                    (FWIW Heat does this for you already.)

                        But during several different Forum sessions, like
                        user API improvements [2] but also the cells v2 and
                        claims in the scheduler sessions, I was hearing
                        about how operators only wanted to expose the base
                        IaaS services and APIs and end API users wanted to
                        only use those, which means any improvements in
                        those APIs would have to be in the base APIs (nova,
                        cinder, etc). To me, that generally means any
                        orchestration would have to be baked into the
                        compute API if you're not using Heat or something
                        similar.


                    The problem is that orchestration done inside APIs is
                    very easy to do badly in ways that cause lots of
                    downstream pain for users and external orchestrators.
                    For example, Nova already does some orchestration: it
                    creates a Neutron port for a server if you don't
                    specify one. (And then promptly forgets that it has
                    done so.) There is literally an entire inner platform,
                    an orchestrator within an orchestrator, inside Heat to
                    try to manage the fallout from this. And the inner
                    platform shares none of the elegance, such as it is, of
                    Heat itself, but is rather a collection of
                    cobbled-together hacks to deal with the seemingly
                    infinite explosion of edge cases that we kept running
                    into over a period of at least 5 releases.

                    The get-me-a-network thing is... better, but there's no
                    provision for changes after the server is created,
                    which means we have to copy-paste the Nova
                    implementation into Heat to deal with update.[1] Which
                    sounds like a maintenance nightmare in the making. That
                    seems to be a common mistake: to assume that once users
                    create something they'll never need to touch it again,
                    except to delete it when they're done.

                    Don't even get me started on Neutron.[2]

                    Any orchestration that is done behind-the-scenes needs
                    to be done superbly well, provide transparency for
                    external orchestration tools that need to hook in to
                    the data flow, and should be developed in consultation
                    with potential consumers like Shade and Heat.

                        Am I missing the point, or is the pendulum really
                        swinging away from PaaS layer services which
                        abstract the dirty details of the lower-level IaaS
                        APIs? Or was this always something people wanted
                        and I've just never made the connection until now?


                    (Aside: can we stop using the term 'PaaS' to refer to
                    "everything that Nova doesn't do"? This habit is not
                    helping us to communicate clearly.)

                    cheers,
                    Zane.

                    [1] https://review.openstack.org/#/c/407328/
                    [2] http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html




        
__________________________________________________________________________
                    OpenStack Development Mailing List (not for usage questions)
                    Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
                    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





        



