Hi Kevin,

Sorry for the late reply. We tried that and were still seeing the same issues. I don't think that bug is quite the same as what we were seeing.

Unfortunately, we have had to roll back to Mitaka: we had a tight deadline, and being unable to create networks or run HA was fairly critical. Interestingly, now that we are back on Mitaka, everything is working fine.

I will try to get a testing environment set up to see whether I get the same results we saw when we upgraded from Mitaka to Newton. I am not sure whether it has something to do with our specific setup, but we followed the OSA guidelines, and as everything was working on Liberty and Mitaka I assume we have it all set up correctly.

I will keep you posted on our findings, as we may be onto another bug.

Regards,


On 06/12/16 14:07, Kevin Benton wrote:
There was a bug, with fixes only recently merged, where removing a router on the L3 agent was done in the wrong order, which caused problems cleaning up the interfaces with Linux Bridge + L3HA: https://bugs.launchpad.net/neutron/+bug/1629159

It could be that there is an orphaned veth pair in a deleted namespace, left over from when the same router was removed from the L3 agent.

For each L3 agent, can you shut down the L3 agent, run the netns cleanup script, ensure all keepalived processes are dead, and then start the agent again?
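
A rough sketch of that sequence for one L3 agent host is below. The service name (neutron-l3-agent), the `service` wrapper, and the config paths under /etc/neutron are assumptions about a typical install; OSA runs the agents inside containers and may name things differently. The --force flag is also an assumption added here so that namespaces still holding devices get removed; it may touch other Neutron namespaces on the same host, so leave it off if unsure. With DRY_RUN=1 (the default in this sketch) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: service name and config paths are assumptions,
# adjust them for your deployment before setting DRY_RUN=0.
L3_SERVICE="${L3_SERVICE:-neutron-l3-agent}"
NEUTRON_CONF="${NEUTRON_CONF:-/etc/neutron/neutron.conf}"
L3_CONF="${L3_CONF:-/etc/neutron/l3_agent.ini}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_l3_agent() {
    # 1. Stop the L3 agent so nothing is recreated during the cleanup.
    run service "$L3_SERVICE" stop
    # 2. Run the netns cleanup script (--force is an assumption, see above).
    run neutron-netns-cleanup --config-file "$NEUTRON_CONF" \
        --config-file "$L3_CONF" --force
    # 3. Kill leftover keepalived processes, then list any that remain.
    run pkill keepalived
    run pgrep -a keepalived
    # 4. Start the agent again so it rebuilds the routers.
    run service "$L3_SERVICE" start
}
```

Calling `reset_l3_agent` as-is prints the five commands for review; running it with DRY_RUN=0 as root executes them.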

On Tue, Dec 6, 2016 at 4:59 AM, Grant Morley <[email protected]> wrote:

    They both appear to be "ACTIVE" which is what I would expect:

    root@management-1-utility-container-f1222d05:~# neutron port-show 8cd027f1-9f8c-4077-9c8a-92abc62fadd4
    +-----------------------+--------------------------------------------------------------------------------------+
    | Field                 | Value                                                                                |
    +-----------------------+--------------------------------------------------------------------------------------+
    | admin_state_up        | True                                                                                 |
    | allowed_address_pairs |                                                                                      |
    | binding:host_id       | network-1-neutron-agents-container-11d47568                                          |
    | binding:profile       | {}                                                                                   |
    | binding:vif_details   | {"port_filter": true}                                                                |
    | binding:vif_type      | bridge                                                                               |
    | binding:vnic_type     | normal                                                                               |
    | created_at            | 2016-12-05T10:58:01Z                                                                 |
    | description           |                                                                                      |
    | device_id             | a8a10308-d62f-420f-99cf-f3727ef2b784                                                 |
    | device_owner          | network:router_ha_interface                                                          |
    | extra_dhcp_opts       |                                                                                      |
    | fixed_ips             | {"subnet_id": "6495d542-4b78-40df-84af-31500aaa0bf8", "ip_address": "169.254.192.5"} |
    | id                    | 8cd027f1-9f8c-4077-9c8a-92abc62fadd4                                                 |
    | mac_address           | fa:16:3e:58:a1:a4                                                                    |
    | name                  | HA port tenant e0ffdeb1e910469d9e625b95f2fa6c54                                      |
    | network_id            | 2b04fc3a-5c0d-4f55-996f-8888d8bd1e1d                                                 |
    | port_security_enabled | False                                                                                |
    | project_id            |                                                                                      |
    | revision_number       | 23                                                                                   |
    | security_groups       |                                                                                      |
    | status                | ACTIVE                                                                               |
    | tenant_id             |                                                                                      |
    | updated_at            | 2016-12-06T10:18:00Z                                                                 |
    +-----------------------+--------------------------------------------------------------------------------------+
    root@management-1-utility-container-f1222d05:~# neutron port-show bda1f324-3178-46e5-8638-0f454ba09cab
    +-----------------------+--------------------------------------------------------------------------------------+
    | Field                 | Value                                                                                |
    +-----------------------+--------------------------------------------------------------------------------------+
    | admin_state_up        | True                                                                                 |
    | allowed_address_pairs |                                                                                      |
    | binding:host_id       | network-2-neutron-agents-container-40906bfc                                          |
    | binding:profile       | {}                                                                                   |
    | binding:vif_details   | {"port_filter": true}                                                                |
    | binding:vif_type      | bridge                                                                               |
    | binding:vnic_type     | normal                                                                               |
    | created_at            | 2016-12-05T10:58:01Z                                                                 |
    | description           |                                                                                      |
    | device_id             | a8a10308-d62f-420f-99cf-f3727ef2b784                                                 |
    | device_owner          | network:router_ha_interface                                                          |
    | extra_dhcp_opts       |                                                                                      |
    | fixed_ips             | {"subnet_id": "6495d542-4b78-40df-84af-31500aaa0bf8", "ip_address": "169.254.192.1"} |
    | id                    | bda1f324-3178-46e5-8638-0f454ba09cab                                                 |
    | mac_address           | fa:16:3e:c3:8a:14                                                                    |
    | name                  | HA port tenant e0ffdeb1e910469d9e625b95f2fa6c54                                      |
    | network_id            | 2b04fc3a-5c0d-4f55-996f-8888d8bd1e1d                                                 |
    | port_security_enabled | False                                                                                |
    | project_id            |                                                                                      |
    | revision_number       | 15                                                                                   |
    | security_groups       |                                                                                      |
    | status                | ACTIVE                                                                               |
    | tenant_id             |                                                                                      |
    | updated_at            | 2016-12-05T14:35:16Z                                                                 |
    +-----------------------+--------------------------------------------------------------------------------------+



    On 06/12/16 12:53, Kevin Benton wrote:
    Can you do a 'neutron port-show' for both of those HA ports to
    check their status field?

    On Tue, Dec 6, 2016 at 2:29 AM, Grant Morley <[email protected]> wrote:

        Hi Kevin & Neil,

        Many thanks for the reply. I have attached a screenshot
        showing that we cannot ping between the L3 HA nodes from the
        router namespaces. This was previously working fine with
        Mitaka, and has only stopped working since the upgrade to Newton.

        From the packet captures and tcpdump output, the traffic
        doesn't even seem to be leaving the namespace.

        In the attachment, the left-hand side shows the state of
        keepalived, with both HA agents as master, and the right-hand
        side is the ping attempt.

        Regards,

        On 06/12/16 10:14, Kevin Benton wrote:
        Yes, that is a misleading warning. What is happening is that
        it first tries to load the interface driver as an alias,
        which produces the stevedore warning you see, and then it
        falls back to loading it by the class path, which is what
        you have configured. We will need to see whether we can
        suppress that warning when the call to load by an alias
        fails.

        If you switch your interface driver to just 'linuxbridge',
        that should get rid of the warning.
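
        A minimal sketch of that change, assuming the setting lives
        in an `interface_driver` line in the agent's ini file and
        that the file sits at /etc/neutron/dhcp_agent.ini (the path
        and the helper name are assumptions; OSA may template these
        files, in which case the override belongs in the deployment
        config instead):

        ```shell
        # Sketch only: the default path below is an assumption.
        # Replace the class path on the interface_driver line with the alias.
        use_linuxbridge_alias() {
            conf="${1:-/etc/neutron/dhcp_agent.ini}"
            # Keep a .bak copy of the file, then rewrite the setting in place.
            sed -i.bak 's|^interface_driver *=.*|interface_driver = linuxbridge|' "$conf"
            # Show the resulting line so the change can be checked.
            grep '^interface_driver' "$conf"
        }
        ```

        The agent would need a restart afterwards to pick up the
        new value.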


        For both L3 HA nodes becoming master, we need a little more
        info to figure out the root cause. Can you try switching
        into the router namespace on one of the L3 HA nodes and see
        if you can ping the other router instance across the L3 HA
        network for that router?
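
        A sketch of that check, run as root on the node hosting one
        router instance. The router ID and peer address defaults are
        the device_id and HA addresses from the port-show output in
        this thread (169.254.192.5 on network-1, 169.254.192.1 on
        network-2); the helper names are hypothetical. Neutron names
        router namespaces qrouter-<router id>.

        ```shell
        # Sketch only: substitute your own router ID and peer HA address.
        ROUTER_ID="${ROUTER_ID:-a8a10308-d62f-420f-99cf-f3727ef2b784}"
        PEER_HA_IP="${PEER_HA_IP:-169.254.192.1}"

        # Build the namespace name for a router ID.
        router_ns() {
            echo "qrouter-$1"
        }

        # Ping the other instance's HA address from inside the namespace.
        ping_ha_peer() {
            ip netns exec "$(router_ns "$ROUTER_ID")" ping -c 3 "$PEER_HA_IP"
        }
        ```

        Running `ping_ha_peer` on each node in turn (with PEER_HA_IP
        set to the other node's address) shows whether the HA
        network carries traffic in both directions.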

        On Mon, Dec 5, 2016 at 7:54 AM, Neil Jerram <[email protected]> wrote:

            I have also recently been seeing 'Could not load
            <whatever>InterfaceDriver' warnings from the DHCP agent,
            and haven't yet understood them, although I'm pretty
            sure that my interface driver really is being loaded;
            otherwise none of my networking would work at all.

            So it's possible that that part of your report is
            benign, and just a misleading warning.  That said, I am
            still worried about it too, and would like to understand
            it properly.

            I'm not aware of seeing the other symptoms you mentioned.

                 Neil


            On Mon, Dec 5, 2016 at 3:14 PM Grant Morley <[email protected]> wrote:

                Hi All,

                We have just upgraded from Mitaka to Newton. We are
                running OSA and we seem to have come across some
                weird networking issues since the upgrade. Basically
                network access to instances is very intermittent and
                seems to randomly stop working.

                We are running neutron in HA, and it appears that
                both of the neutron nodes are now trying to be
                master and both are trying to bring up the gateway
                IP addresses, which would cause conflicts.

                We are also seeing a lot of the following in the
                "neutron-dhcp-agent" log files:

                2016-12-05 14:42:24.837 2020 WARNING stevedore.named
                [req-1955d0a1-1453-4c65-a93a-54e8ea39b230
                1ac995c0729142289f7237222f335806
                3cc95dbe91c84e3e8ebbb9893ee54d20 - - -] Could not
                load neutron.agent.linux.interface.BridgeInterfaceDriver
                2016-12-05 14:42:42.803 2020 INFO
                neutron.agent.dhcp.agent
                [req-fad7d2bb-9d3c-4192-868a-0164b382aecf
                1ac995c0729142289f7237222f335806
                3cc95dbe91c84e3e8ebbb9893ee54d20 - - -] Trigger
                reload_allocations for port admin_state_up=True,
                allowed_address_pairs=[], binding:host_id=,
                binding:profile=, binding:vif_details=,
                binding:vif_type=unbound, binding:vnic_type=normal,
                created_at=2016-12-05T14:42:42Z, description=,
                device_id=8752effa-2ff2-4ce1-be70-e9f2243612cb,
                device_owner=network:floatingip, extra_dhcp_opts=[],
                fixed_ips=[{u'subnet_id':
                u'4ca7db2d-544a-4a97-b5a4-3cbf2467a4b7',
                u'ip_address': u'XXX.XXX.XXX.XXX'}],
                id=b3cf223d-8e76-484a-a649-d8a7dd435124,
                mac_address=fa:16:3e:ff:0d:50, name=,
                network_id=af5db886-0178-4f8d-9189-f55f773b37fa,
                port_security_enabled=False, project_id=,
                revision_number=4, security_groups=[], status=N/A,
                tenant_id=, updated_at=2016-12-05T14:42:42Z

                I am a bit concerned about neutron not being able to
                load the Bridge interface driver.

                Has anyone else come across this, or does anyone
                have any pointers? This was working fine in Mitaka;
                the issues have only appeared since the upgrade to
                Newton.

                I am able to provide more logs if they are needed.

                Regards,

                --
                Grant Morley
                Cloud Lead
                Absolute DevOps Ltd
                Units H, J & K, Gateway 1000, Whittle Way,
                Stevenage, Herts, SG1 2FP
                www.absolutedevops.io
                [email protected] 0845 874 0580
                _______________________________________________
                OpenStack-operators mailing list
                [email protected]
                http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



--
Grant Morley
Cloud Lead
Absolute DevOps Ltd
Units H, J & K, Gateway 1000, Whittle Way, Stevenage, Herts, SG1 2FP
www.absolutedevops.io <http://www.absolutedevops.io/> [email protected] <mailto:[email protected]> 0845 874 0580