Re: [openstack-dev] [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane

Bogdan Dobrelya Wed, 14 Feb 2018 05:02:48 -0800

On 2/14/18 11:58 AM, Daniel Alvarez Sanchez wrote:

On Wed, Feb 14, 2018 at 5:40 AM, Brian Haley <[email protected]> wrote:


    On 02/13/2018 05:08 PM, Armando M. wrote:



        On 13 February 2018 at 14:02, Brent Eagles <[email protected]> wrote:

             Hi,

             The neutron agents are implemented in such a way that key
             functionality is implemented in terms of haproxy, dnsmasq,
             keepalived and radvd configuration. The agents manage instances
             of these services but, by design, the parent is the top-most
             (pid 1).

             On baremetal this has the advantage that, while control plane
             changes cannot be made while the agents are not available, the
             configuration at the time the agents were stopped will work (for
             example, VMs that are restarted can request their IPs, etc). In
             short, the dataplane is not affected by shutting down the agents.

             In the TripleO containerized version of these agents, the
             supporting processes (haproxy, dnsmasq, etc.) are run within the
             agent's container so when the container is stopped, the
             supporting processes are also stopped. That is, the behavior with
             the current containers is significantly different than on
             baremetal and stopping/restarting containers effectively breaks
             the dataplane. At the moment this is being considered a blocker
             and unless we can find a resolution, we may need to recommend
             running the L3, DHCP and metadata agents on baremetal.


    I didn't think the neutron metadata agent was affected but just the
    ovn-metadata agent?  Or is there a problem with the UNIX domain
    sockets the haproxy instances use to connect to it when the
    container is restarted?

That's right. In ovn-metadata-agent we spawn haproxy inside the q-ovnmeta
namespace and this is where we'll find a problem if the process goes away.
As you said, neutron metadata agent is basically receiving the proxied
requests from haproxies residing in either q-router or q-dhcp namespaces on
its UNIX socket and sending them to Nova.




        There's quite a bit to unpack here: are you suggesting that
        running these services in HA configuration doesn't help either
        with the data plane being gone after a stop/restart? Ultimately
        this boils down to where the state is persisted, and while
        certain agents rely on namespaces and processes whose ephemeral
        nature is hard to persist, enough could be done to allow for a
        non-disruptive bumping of the aforementioned services.


    Armando - https://review.openstack.org/#/c/542858/ (if accepted) should help
    with dataplane downtime, as sharing the namespaces lets them
    persist, which eases what the agent has to configure on the restart
    of a container (think of what the l3-agent needs to create for 1000
    routers).

    But it doesn't address dnsmasq being unavailable when the dhcp-agent
    container is restarted like it is today.  Maybe one way around that
    is to run 2+ agents per network, but that still leaves a regression
    from how it works today.  Even with l3-ha I'm not sure things are
    perfect; we might wind up with two masters sometimes.

    I've seen one suggestion of putting all these processes in their own
    container instead of the agent container so they continue to run; it
    just might be invasive to the neutron code.  Maybe there is another
    option?

I had some idea based on that one to reduce the impact on neutron code and
its dependency on containers. Basically, we would be running dnsmasq,
haproxy, keepalived, radvd, etc. in separate containers (it makes sense as
they have independent lifecycles) and we would drive


+1 for that separation

those through the docker socket from neutron agents. In order to reduce
this dependency, I thought of having some sort of 'rootwrap-daemon-docker'
which takes the

Let's please avoid using 'docker' in names. Could it be rootwrap-cri or
rootwrap-engine-moby or something?

commands and checks if it has to spawn the process in a separate container
(for example, iptables wouldn't be the case) and if so, it'll use the docker
socket to do it.
We'll also have to monitor the PID files on those containers to respawn them
in case they die.
IMHO, this is far from the containers philosophy since we're using host
networking, privileged access, sharing namespaces, relying on 'sidecar'
containers... but I can't think of a better way to do it.
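
As an illustration only, the dispatch and PID-file check described above
could be sketched in Python roughly as follows. Everything named here is
hypothetical and is not part of neutron or oslo.rootwrap: the
SIDECAR_COMMANDS set, needs_own_container(), pid_is_alive() and plan(). The
sketch only decides what to do; it does not talk to any container engine
socket. It also assumes the PID file holds a PID that is valid in the
checker's own PID namespace, which would not be true for a daemon running in
a separate PID namespace.

```python
import os

# Assumption: the long-running helper daemons named in this thread.
SIDECAR_COMMANDS = frozenset({"dnsmasq", "haproxy", "keepalived", "radvd"})


def needs_own_container(argv):
    """Return True if the command should be spawned in a separate container."""
    if not argv:
        raise ValueError("empty command")
    return os.path.basename(argv[0]) in SIDECAR_COMMANDS


def pid_is_alive(pid_file):
    """Return True if pid_file names a process that currently exists."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or malformed PID file
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a daemon
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def plan(argv, pid_file=None):
    """Decide what the hypothetical wrapper would do with one command."""
    if not needs_own_container(argv):
        return "exec-in-agent"  # e.g. iptables runs where the agent runs
    if pid_file is not None and pid_is_alive(pid_file):
        return "already-running"
    return "spawn-container"  # also covers respawning a daemon that died
```

With these assumptions, plan(["iptables", "-L"]) yields "exec-in-agent",
while plan(["dnsmasq"], "/no/such/file.pid") yields "spawn-container". How
the container is actually started (docker socket, CRI, or something else)
is deliberately left out.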

This still looks like a good fit for the k8s pods concept [0], with
healthchecks, shared namespaces and logical coupling of sidecars, which here
are the agents and the helper daemons running in namespaces. I hope it does.


[0] https://kubernetes.io/docs/concepts/workloads/pods/pod/




    -Brian


    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
    [email protected]?subject:unsubscribe
    <http://[email protected]?subject:unsubscribe>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
    <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>






--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

