I didn't add any extra tcpdump yet, but here is a job that will
enable TRACE for ofp.impl.

it's only running the four connectivity/ folder suites.

https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-1node-openstack-ocata-jamo-upstream-stateful-carbon/11/

JamO


On 1/24/18 8:04 PM, D Arunprakash wrote:
Please enable the logs for the below package,

org.opendaylight.openflowplugin.impl

Regards,

Arun

*From:*Sam Hague [mailto:[email protected]]
*Sent:* Thursday, January 25, 2018 9:32 AM
*To:* D Arunprakash <[email protected]>
*Cc:* Anil Vishnoi <[email protected]>; openflowplugin-dev <[email protected]>; Vishal Thapar <[email protected]>; Faseela K <[email protected]>; Josh Hershberg <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?

What logs to enable? Or just the whole openflowplugin, which is very noisy.

Is packet capture on port 6653 good?

On Jan 24, 2018 10:54 PM, "D Arunprakash" <[email protected] <mailto:[email protected]>> wrote:

    Hi Sam,

    We need to enable Openflowplugin logs and do packet capture for the
    time window when the tunnel port is delete and added back.

    Regards,

    Arun

    *From:*Anil Vishnoi [mailto:[email protected]
    <mailto:[email protected]>]
    *Sent:* Thursday, January 25, 2018 1:17 AM
    *To:* Sam Hague <[email protected] <mailto:[email protected]>>
    *Cc:* D Arunprakash <[email protected]
    <mailto:[email protected]>>; openflowplugin-dev
    <[email protected]
    <mailto:[email protected]>>; Vishal Thapar
    <[email protected] <mailto:[email protected]>>;
    Faseela K <[email protected] <mailto:[email protected]>>;
    Josh Hershberg <[email protected] <mailto:[email protected]>>;
    Jamo Luhrsen <[email protected] <mailto:[email protected]>>;
    Manu B <[email protected] <mailto:[email protected]>>


    *Subject:* Re: is dhcp issue fixed on carbon?

    Hi Sam,

    Looks like Arun is looking at it ?

    Arun, if you are not looking at it currently, please let me know I
    will take a look at it.

    Thanks

    Anil

    On Wed, Jan 24, 2018 at 4:25 AM, Sam Hague <[email protected]
    <mailto:[email protected]>> wrote:

        Adding openflow to thread.

        Anil, could someone take a look at this for carbon? We are
        seeing a connection flapping and end up missing port status
        updates. This leads to stale models and flows.

        This is blocking the carbon sr3.

        On Jan 24, 2018 12:58 AM, "D Arunprakash"
        <[email protected] <mailto:[email protected]>>
        wrote:

            Ignore my previous email.

            The tunnel port got deleted around 18:49:29.373 and added
            back on 18:52:46.26

            2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653
            <http://10.30.170.63:6653>: sent (Success): OFPT_PORT_STATUS
            (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b):
            addr:3e:0c:ed:2e:a9:ba

            2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653
            <http://10.30.170.63:6653>: sent (Success): OFPT_PORT_STATUS
            (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b):
            addr:8a:2f:9f:c6:fe:d9

            Immediately after tunnel delete, I’m seeing so multiple
            switch flaps for quite sometime,

            2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix:
            entering ACTIVE

            2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent
            (Success): OFPT_HELLO (OF1.3) (xid=0x75):

            version bitmap: 0x04

            2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received:
            OFPT_HELLO (OF1.3) (xid=0x1):

            2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix:
            connection closed by peer

            2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix:
            entering DISCONNECTED

            2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix:
            entering ACTIVE

            Also, I’m seeing error in karaf log

            2018-01-23 18:49:29,378 | WARN  | entLoopGroup-7-3 |
            DeviceContextImpl                | 280 -
            org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT |
            writePortStatusMessage

            2018-01-23 18:49:29,379 | WARN  | entLoopGroup-7-3 |
            DeviceContextImpl                | 280 -
            org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT |
            submit transaction for write port status message

            2018-01-23 18:49:29,379 | WARN  | rd-dispatcher-23 |
            ShardDataTree                    | 184 -
            org.opendaylight.controller.sa
            <http://org.opendaylight.controller.sa>l-distributed-datastore
            - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational:
            Store Tx member-1-datastore-operational-fe-0-chn-8-txn-11-0:
            Data validation failed for path
            
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow.

            
org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException:
            Node
            
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
            (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
            
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow
            does not exist. Cannot apply modification to its children.

            We need to check why there is multiple switch disconnect and
            reconnect and how ofp handles the same.

            Regards,

            Arun

            *From:*Vishal Thapar
            *Sent:* Wednesday, January 24, 2018 9:52 AM
            *To:* Faseela K <[email protected]
            <mailto:[email protected]>>; Sam Hague
            <[email protected] <mailto:[email protected]>>; Josh
            Hershberg <[email protected]
            <mailto:[email protected]>>; D Arunprakash
            <[email protected] <mailto:[email protected]>>
            *Cc:* Jamo Luhrsen <[email protected]
            <mailto:[email protected]>>; Manu B <[email protected]
            <mailto:[email protected]>>
            *Subject:* RE: is dhcp issue fixed on carbon?

            Missed adding most important detail and added Arun.

            Inventory operational is still showing old port and new port
            for some reason. I guess that is what caused problems.

            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56

            
{"id":"openflow:246869078989547:4","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":4,"flow-node-inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}

            
{"id":"openflow:246869078989547:9","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":9,"flow-node-inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}

            OVS output from same set of logs:

            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4

            9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9

                 config: 0

                 state: 0

                 speed: 0 Mbps now, 0 Mbps max

            So for now I’d peg it as OFPlugin issue. It didn’t detect or
            inform us of old port delete and that is why we didn’t
            delete old flows. Though wondering if something else in IFM
            code could’ve handled it, but don’t think we handle OfPort
            number changes, expect a delete+add in such scenarios.
            Faseela can pitch in why we have service binding entry with
            new port number but flow is still using old one.

            Regards,

            Vishal.

            *From:*Vishal Thapar
            *Sent:* 24 January 2018 09:26
            *To:* Faseela K <[email protected]
            <mailto:[email protected]>>; Sam Hague
            <[email protected] <mailto:[email protected]>>; Josh
            Hershberg <[email protected] <mailto:[email protected]>>
            *Cc:* Jamo Luhrsen <[email protected]
            <mailto:[email protected]>>; Manu B <[email protected]
            <mailto:[email protected]>>
            *Subject:* RE: is dhcp issue fixed on carbon?

            Quick analysis:

            Not related to policy stuff. Service binding has entry for
            the new port number but table 220 flow is still using old
            port number.

            { "bound-services": [ { "flow-cookie": 134217735,
            "flow-priority": 9, "instruction": [ { "apply-actions": {
            "action": [ { "order": 0, "output-action": { "max-length":
            0, "output-node-connector": "*9*" } } ] }, "order": 0 } ],
            "service-name": "default.tun55fb50d0a2b",
            "service-priority": 9, "service-type":
            "interface-service-bindings:service-type-flow-based" } ],
            "interface-name": "tun55fb50d0a2b", "service-mode":
            "interface-service-bindings:service-mode-egress" }

            
{"id":"246869078989547.220.tun55fb50d0a2b.0","priority":9,"table_id":220,"installHw":true,"hard-timeout":0,"match":{"openflowplugin-extension-general:extension-list":[{"extension-key":"openflowplugin-extension-nicira-match:nxm-nx-reg6-key","extension":{"openflowplugin-extension-nicira-match:nxm-nx-reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"}}}]},"cookie":134217735,"flow-name":"default.tun55fb50d0a2b","strict":true,"instructions":{"instruction":[{"order":0,"apply-actions":{"action":[{"order":0,"output-action":{"max-length":0,"output-node-connector":"*4*"}}]}}]},"barrier":false,"idle-timeout":0}

            cookie=0x8000007, duration=403.965s, table=220, n_packets=0,
            n_bytes=0, priority=9,reg6=0x1000 actions=output:*4*

            In OVS logs you can see this tunnel port getting deleted and
            then coming back in with a different OfPort.

            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/compute_2/ovs-vswitchd.log.gz

            It goes from 4 to 9. This happens due to clean up in
            previous suite which doesn’t actually clean up everything
            and leaves entry for that old service binding. Can confirm
            it from interfaces-state entry for same port in first and
            second suites. So we have stale flows and stale service
            bindings for old tunnel port. Could probably check with
            OFPlugin how they handle update of a flow, may probably not
            work.

            We need to check if cleanup has been done completely before
            moving to next suite. This is where the work we been doing
            on tools comes in.

            Regards,

            Vishal.

            *From:*Faseela K
            *Sent:* 24 January 2018 08:10
            *To:* Sam Hague <[email protected]
            <mailto:[email protected]>>; Josh Hershberg
            <[email protected] <mailto:[email protected]>>
            *Cc:* Vishal Thapar <[email protected]
            <mailto:[email protected]>>; Jamo Luhrsen
            <[email protected] <mailto:[email protected]>>; Manu B
            <[email protected] <mailto:[email protected]>>
            *Subject:* RE: is dhcp issue fixed on carbon?

            Looks more or less similar issue, tunnel flow is programmed
            in table 220 with older tunnel’s port number, which was
            deleted in l2 suite. However policy code has not kicked in.
            I will take a detailed look on what is causing this issue now.

            Thanks,

            Faseela

            *From:*Faseela K
            *Sent:* Wednesday, January 24, 2018 7:48 AM
            *To:* 'Sam Hague' <[email protected]
            <mailto:[email protected]>>; Josh Hershberg
            <[email protected] <mailto:[email protected]>>
            *Cc:* Vishal Thapar <[email protected]
            <mailto:[email protected]>>; Jamo Luhrsen
            <[email protected] <mailto:[email protected]>>; Manu B
            <[email protected] <mailto:[email protected]>>
            *Subject:* RE: is dhcp issue fixed on carbon?

            Thanks Sam for initial triaging.

            I will take a look at this.

            *From:*Sam Hague [mailto:[email protected]]
            *Sent:* Wednesday, January 24, 2018 6:54 AM
            *To:* Faseela K <[email protected]
            <mailto:[email protected]>>; Josh Hershberg
            <[email protected] <mailto:[email protected]>>
            *Cc:* Vishal Thapar <[email protected]
            <mailto:[email protected]>>; Jamo Luhrsen
            <[email protected] <mailto:[email protected]>>; Manu B
            <[email protected] <mailto:[email protected]>>
            *Subject:* Re: is dhcp issue fixed on carbon?

            OK, seems pretty consistent that table 220 flows are not
            showing up. Vishal, Faseela, can you see if it is like the
            policymgr one where the bind/unbind was wrong? That seems
            the closest culprit as those were the last patches merged.

            Here is another case where the table 220 flow is missing in
            suite [5] of job [6]. This time the port missing is a tunnel
            port. "9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9" is missing
            from table 220. And then in suite [7] of the same job this
            port has the same issue where the tunnel port is missing:
            "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3"

            [5]
            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4

            [6]
            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz

            [7]
            
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_04_security_group.html.gz

            On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague
            <[email protected] <mailto:[email protected]>> wrote:

                further details for Josh since the original email
                doesn't have many...

                - so the "l3.Check Vm Instances Have Ip Address" test
                fails with the net1 not being able to get all the vm ips
                for it's three vms.

                - '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None'
                - this means the first vm of the three did not get a ip

                
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k8

                - looks at the neutron ports to find which port goes
                with vm1

                
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2

                get the missing ip as 31.0.0.6, then look at next log to
                get the port

                - look at the 31.0.0.x addresses, we know 31,0.0

                
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2

                3862fa17-4e7d-4d41-9237-c372fca11c03 | |
                fa:16:3e:96:06:3f | ip_address='31.0.0.6',
                subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |

                - I know the first vm (and second) are both on the
                compute_1 so look at the ovs logs on compute_1

                
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4

                - compute_1, in the ofctl show br-int, we see port 7

                7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f

                - then check flows to see if there is a table 220 flow
                for port 7

                
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4

                And the table 220 flow for port 7 is not there, so the
                vm can't get an IP.

                [3] is the patch vishal pushed to fix a similar issue
                the first time we saw this. What we found is that the
                elan tag was being reused, because a port was deleted
                and then a new one created and the elan tag reused. So
                you ended up with a tunnel port stomping on a vm port.

                [3] https://git.opendaylight.org/gerrit/#/c/67009/

                On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague
                <[email protected] <mailto:[email protected]>> wrote:

                    Adding Josh to thread.

                    On Tue, Jan 23, 2018 at 2:25 PM, Faseela K
                    <[email protected]
                    <mailto:[email protected]>> wrote:

                        Manu,

                            Could you please take a look at the DHCP
                        failure in the below run?

                             I am caught up with something else, will
                        help you out in initial triaging.

                        Thanks,

                        Faseela

                        *From:*Sam Hague [mailto:[email protected]
                        <mailto:[email protected]>]
                        *Sent:* Monday, January 22, 2018 10:57 PM
                        *To:* Vishal Thapar <[email protected]
                        <mailto:[email protected]>>; Faseela K
                        <[email protected]
                        <mailto:[email protected]>>; Jamo Luhrsen
                        <[email protected] <mailto:[email protected]>>
                        *Subject:* is dhcp issue fixed on carbon?

                        Vishal, Faseela,

                        can you look at this job run to see if the issue
                        you fixed with the policymgr binding is fixed?
                        in this build the whole poligymgr bundle has
                        been removed. This is carbon so I just removed
                        the whole bundle as we would never use it. Could
                        that have uncovered something that the code was
                        doing? If so, then even master and nitrogen
                        should have the issue since there we disabled
                        building policymgr - so should be the same as
                        removing it.

                        Other thing, merged in carbon is the bind/unbind
                        patches for elan and dhcp. Could those have an
                        impact?

                        Thanks, Sam

                        I don't see "7(tap3862fa17-4e):
                        addr:fe:16:3e:96:06:3f" pop up in the table 220
                        flows which was the problem before.

                        3862fa17-4e7d-4d41-9237-c372fca11c03 | |
                        fa:16:3e:96:06:3f | ip_address='31.0.0.6',
                        subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd'
                        | ACTIVE |

                        Thanks, Sam

                        [1]
                        
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4



--
    Thanks

    Anil


_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev

Reply via email to