Hi Sam, looks like Arun is looking at it?
Arun, if you are not looking at it currently, please let me know and I will take a look at it.

Thanks,
Anil

On Wed, Jan 24, 2018 at 4:25 AM, Sam Hague <[email protected]> wrote:
> Adding openflow to thread.
>
> Anil, could someone take a look at this for carbon? We are seeing a
> connection flapping and end up missing port status updates. This leads to
> stale models and flows.
>
> This is blocking the carbon sr3.
>
> On Jan 24, 2018 12:58 AM, "D Arunprakash" <[email protected]> wrote:
>
>> Ignore my previous email.
>>
>> The tunnel port got deleted around 18:49:29.373 and added back at 18:52:46.261:
>>
>> 2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): addr:3e:0c:ed:2e:a9:ba
>>
>> 2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
>>
>> Immediately after the tunnel delete, I'm seeing multiple switch flaps for quite some time:
>>
>> 2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE
>> 2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent (Success): OFPT_HELLO (OF1.3) (xid=0x75): version bitmap: 0x04
>> 2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received: OFPT_HELLO (OF1.3) (xid=0x1):
>>
>> 2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection closed by peer
>> 2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering DISCONNECTED
>> 2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE
>>
>> Also, I'm seeing errors in the karaf log:
>>
>> 2018-01-23 18:49:29,378 | WARN | entLoopGroup-7-3 | DeviceContextImpl | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | writePortStatusMessage
>>
>> 2018-01-23 18:49:29,379 | WARN | entLoopGroup-7-3 | DeviceContextImpl | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | submit transaction for write port status message
>>
>> 2018-01-23 18:49:29,379 | WARN | rd-dispatcher-23 | ShardDataTree | 184 - org.opendaylight.controller.sal-distributed-datastore - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational: Store Tx member-1-datastore-operational-fe-0-chn-8-txn-11-0: Data validation failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow.
>>
>> org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException: Node /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow does not exist. Cannot apply modification to its children.
>>
>> We need to check why there are multiple switch disconnects and reconnects, and how ofp handles them.
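The DEL/ADD pair in the OVS log above can be spotted mechanically. Here is a minimal sketch (plain Python, not part of any ODL or OVS tooling) that assumes only the ovs-vswitchd OFPT_PORT_STATUS log format quoted above, and flags a port name that comes back with a different OpenFlow port number:

```python
import re

# Match lines like:
#   ...: OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): addr:...
PORT_STATUS = re.compile(r'OFPT_PORT_STATUS.*\b(ADD|DEL|MOD): (\d+)\(([^)]+)\)')

def find_ofport_changes(lines):
    """Return {port_name: (old_ofport, new_ofport)} for ports deleted
    and later re-added with a different OpenFlow port number."""
    deleted = {}   # port name -> ofport seen in its DEL message
    changes = {}
    for line in lines:
        m = PORT_STATUS.search(line)
        if not m:
            continue
        reason, ofport, name = m.group(1), int(m.group(2)), m.group(3)
        if reason == 'DEL':
            deleted[name] = ofport
        elif reason == 'ADD' and name in deleted and deleted[name] != ofport:
            changes[name] = (deleted.pop(name), ofport)
    return changes

# The two lines from the ovs-vswitchd log quoted in this thread:
log = [
    "2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent "
    "(Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): "
    "addr:3e:0c:ed:2e:a9:ba",
    "2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent "
    "(Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): "
    "addr:8a:2f:9f:c6:fe:d9",
]
print(find_ofport_changes(log))  # {'tun55fb50d0a2b': (4, 9)}
```

Running this over the full compute_2 ovs-vswitchd.log would show any other ports that changed numbers across the suites.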
>> Regards,
>> Arun
>>
>> *From:* Vishal Thapar
>> *Sent:* Wednesday, January 24, 2018 9:52 AM
>> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; Josh Hershberg <[email protected]>; D Arunprakash <[email protected]>
>> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>> Missed adding the most important detail, and added Arun.
>>
>> Inventory operational is still showing both the old port and the new port for some reason. I guess that is what caused the problems.
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56
>>
>> {"id":"openflow:246869078989547:4","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":4,"flow-node-inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}
>>
>> {"id":"openflow:246869078989547:9","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":9,"flow-node-inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}
>>
>> OVS output from the same set of logs:
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4
>>
>> 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
>>     config: 0
>>     state: 0
>>     speed: 0 Mbps now, 0 Mbps max
>>
>> So for now I'd peg it as an OFPlugin issue. It didn't detect or inform us of the old port delete, and that is why we didn't delete the old flows. I'm wondering whether something else in the IFM code could have handled it, but I don't think we handle OfPort number changes; we expect a delete+add in such scenarios. Faseela can pitch in on why we have a service binding entry with the new port number while the flow is still using the old one.
>>
>> Regards,
>> Vishal.
>>
>> *From:* Vishal Thapar
>> *Sent:* 24 January 2018 09:26
>> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
>> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>> Quick analysis:
>>
>> Not related to the policy stuff. Service binding has an entry for the new port number, but the table 220 flow is still using the old port number.
>> >> >> >> { "bound-services": [ { "flow-cookie": 134217735, "flow-priority": 9, >> "instruction": [ { "apply-actions": { "action": [ { "order": 0, >> "output-action": { "max-length": 0, "output-node-connector": "*9*" } } ] >> }, "order": 0 } ], "service-name": "default.tun55fb50d0a2b", >> "service-priority": 9, "service-type": >> "interface-service-bindings:service-type-flow-based" >> } ], "interface-name": "tun55fb50d0a2b", "service-mode": >> "interface-service-bindings:service-mode-egress" } >> >> >> >> {"id":"246869078989547.220.tun55fb50d0a2b.0","priority":9," >> table_id":220,"installHw":true,"hard-timeout":0,"match":{" >> openflowplugin-extension-general:extension-list":[{"extensio >> n-key":"openflowplugin-extension-nicira-match:nxm-nx- >> reg6-key","extension":{"openflowplugin-extension- >> nicira-match:nxm-nx-reg":{"value":4096,"reg":"nicira- >> match:nxm-nx-reg6"}}}]},"cookie":134217735,"flow-name": >> "default.tun55fb50d0a2b","strict":true,"instructions":{" >> instruction":[{"order":0,"apply-actions":{"action":[{"order" >> :0,"output-action":{"max-length":0,"output-node-connector":"*4* >> "}}]}}]},"barrier":false,"idle-timeout":0} >> >> >> >> cookie=0x8000007, duration=403.965s, table=220, n_packets=0, n_bytes=0, >> priority=9,reg6=0x1000 actions=output:*4* >> >> >> >> In OVS logs you can see this tunnel port getting deleted and then coming >> back in with a different OfPort. >> >> >> >> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/n >> etvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/ >> compute_2/ovs-vswitchd.log.gz >> >> >> >> It goes from 4 to 9. This happens due to clean up in previous suite which >> doesn’t actually clean up everything and leaves entry for that old service >> binding. Can confirm it from interfaces-state entry for same port in first >> and second suites. So we have stale flows and stale service bindings for >> old tunnel port. 
>> We could check with OFPlugin how they handle an update of a flow; it may not work.
>>
>> We need to check that cleanup has been done completely before moving to the next suite. This is where the work we've been doing on tools comes in.
>>
>> Regards,
>> Vishal.
>>
>> *From:* Faseela K
>> *Sent:* 24 January 2018 08:10
>> *To:* Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>> Looks like a more or less similar issue: the tunnel flow is programmed in table 220 with the older tunnel's port number, which was deleted in the l2 suite. However, the policy code has not kicked in. I will take a detailed look at what is causing this issue now.
>>
>> Thanks,
>> Faseela
>>
>> *From:* Faseela K
>> *Sent:* Wednesday, January 24, 2018 7:48 AM
>> *To:* 'Sam Hague' <[email protected]>; Josh Hershberg <[email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>> Thanks, Sam, for the initial triaging.
>> I will take a look at this.
>>
>> *From:* Sam Hague [mailto:[email protected]]
>> *Sent:* Wednesday, January 24, 2018 6:54 AM
>> *To:* Faseela K <[email protected]>; Josh Hershberg <[email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* Re: is dhcp issue fixed on carbon?
>>
>> OK, it seems pretty consistent that table 220 flows are not showing up. Vishal, Faseela, can you see if it is like the policymgr one where the bind/unbind was wrong? That seems the closest culprit, as those were the last patches merged.
>>
>> Here is another case where the table 220 flow is missing, in suite [5] of job [6].
>> This time the missing port is a tunnel port: "9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7] of the same job another port hits the same issue and is missing: "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3"
>>
>> [5] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4
>>
>> [6] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz
>>
>> [7] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_04_security_group.html.gz
>>
>> On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague <[email protected]> wrote:
>>
>> Further details for Josh, since the original email doesn't have many...
>>
>> - The "l3.Check Vm Instances Have Ip Address" test fails, with net1 not being able to get all the VM IPs for its three VMs.
>> - '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None' - this means the first VM of the three did not get an IP.
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k8
>>
>> - Look at the neutron ports to find which port goes with vm1:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2
>> Get the missing IP as 31.0.0.6, then look at the next log to get the port.
>>
>> - Look at the 31.0.0.x addresses; we know 31.0.0.6 is the missing one:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2
>> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
>>
>> - I know the first VM (and the second) are both on compute_1, so look at the OVS logs on compute_1:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
>>
>> - On compute_1, in the ofctl show br-int output, we see port 7:
>> 7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f
>>
>> - Then check the flows to see if there is a table 220 flow for port 7:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4
>> The table 220 flow for port 7 is not there, so the VM can't get an IP.
>>
>> [3] is the patch Vishal pushed to fix a similar issue the first time we saw this. What we found is that the elan tag was being reused, because a port was deleted and then a new one created and the elan tag reused.
>> So you ended up with a tunnel port stomping on a VM port.
>>
>> [3] https://git.opendaylight.org/gerrit/#/c/67009/
>>
>> On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague <[email protected]> wrote:
>>
>> Adding Josh to thread.
>>
>> On Tue, Jan 23, 2018 at 2:25 PM, Faseela K <[email protected]> wrote:
>>
>> Manu,
>> Could you please take a look at the DHCP failure in the below run?
>> I am caught up with something else, but will help you out with the initial triaging.
>>
>> Thanks,
>> Faseela
>>
>> *From:* Sam Hague [mailto:[email protected]]
>> *Sent:* Monday, January 22, 2018 10:57 PM
>> *To:* Vishal Thapar <[email protected]>; Faseela K <[email protected]>; Jamo Luhrsen <[email protected]>
>> *Subject:* is dhcp issue fixed on carbon?
>>
>> Vishal, Faseela,
>>
>> Can you look at this job run to see if the issue you fixed with the policymgr binding is fixed? In this build the whole policymgr bundle has been removed. This is carbon, so I just removed the whole bundle as we would never use it. Could that have uncovered something that the code was doing? If so, then even master and nitrogen should have the issue, since there we disabled building policymgr - so it should be the same as removing it.
>>
>> Other thing: merged in carbon are the bind/unbind patches for elan and dhcp. Could those have an impact?
>>
>> I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the table 220 flows, which was the problem before.
>>
>> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
>>
>> Thanks, Sam
>>
>> [1] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4

--
Thanks,
Anil
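The manual triage walked through above (find the tap port carrying the VM's MAC in the `ofctl show br-int` output, then check whether any table 220 flow outputs to that port) can be sketched as a quick check. This is illustrative only; the patterns assume nothing beyond the output formats quoted in this thread:

```python
import re

def ofport_for_mac(show_output, mac):
    """Find the OpenFlow port number listed for a MAC in `ovs-ofctl show` output."""
    m = re.search(r'(\d+)\([^)]*\): addr:%s' % re.escape(mac), show_output)
    return int(m.group(1)) if m else None

def has_table220_flow(flows_output, ofport):
    """True if any dumped flow in table 220 outputs to the given port."""
    return any('table=220' in line and ('output:%d' % ofport) in line
               for line in flows_output.splitlines())

# Excerpts from the logs quoted in this thread:
show = "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f"
flows = ("cookie=0x8000007, duration=403.965s, table=220, n_packets=0, "
         "n_bytes=0, priority=9,reg6=0x1000 actions=output:4")

port = ofport_for_mac(show, 'fe:16:3e:96:06:3f')
print(port, has_table220_flow(flows, port))  # 7 False -- the table 220 flow is missing
```

Run against the full `ofctl show` and `dump-flows` captures from a CSIT job, this reproduces the check Sam did by hand for each suspect port.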
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
