Hi Sam,

Looks like Arun is looking at it?

Arun, if you are not looking at it currently, please let me know and I will
take a look at it.

Thanks
Anil

On Wed, Jan 24, 2018 at 4:25 AM, Sam Hague <[email protected]> wrote:

> Adding openflow to thread.
>
> Anil, could someone take a look at this for Carbon? We are seeing the
> connection flapping and ending up missing port status updates. This leads to
> stale models and flows.
>
> This is blocking Carbon SR3.
>
> On Jan 24, 2018 12:58 AM, "D Arunprakash" <[email protected]>
> wrote:
>
>> Ignore my previous email.
>>
>>
>>
>> The tunnel port got deleted around 18:49:29.373 and added back at
>> 18:52:46.261
>>
>>
>>
>> 2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): addr:3e:0c:ed:2e:a9:ba
>>
>> 2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
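>>
>> In case it helps, here is a rough, hypothetical way to pull the full DEL/ADD timeline for that interface out of a downloaded copy of the ovs-vswitchd.log (the file path and interface name are script arguments, not anything from the job itself):
>>
>> import re
>> import sys
>>
>> # Print every OFPT_PORT_STATUS ADD/DEL/MOD event for one interface name.
>> name = sys.argv[2] if len(sys.argv) > 2 else "tun55fb50d0a2b"
>> pattern = re.compile(r"OFPT_PORT_STATUS.*?(ADD|DEL|MOD): (\d+)\(" + re.escape(name) + r"\)")
>>
>> with open(sys.argv[1]) as log:          # path to the ovs-vswitchd.log copy
>>     for line in log:
>>         m = pattern.search(line)
>>         if m:
>>             print(line.split("|", 1)[0], m.group(1), "port", m.group(2))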
>>
>>
>>
>> Immediately after the tunnel delete, I’m seeing multiple switch flaps for
>> quite some time:
>>
>>
>>
>> 2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE
>>
>> 2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent (Success): OFPT_HELLO (OF1.3) (xid=0x75):
>>  version bitmap: 0x04
>>
>> 2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received: OFPT_HELLO (OF1.3) (xid=0x1):
>>
>> 2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection closed by peer
>>
>> 2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering DISCONNECTED
>>
>> 2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE
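>>
>> A quick, hypothetical way to see how long the flapping goes on is to count the rconn state transitions for br-int in the same ovs-vswitchd.log copy, for example:
>>
>> import re
>> import sys
>> from collections import Counter
>>
>> # Matches lines like "...|rconn|DBG|br-int<->unix: entering DISCONNECTED".
>> pattern = re.compile(r"\|rconn\|DBG\|br-int<->\S+ (entering \w+|connection closed by peer)")
>>
>> counts = Counter()
>> with open(sys.argv[1]) as log:          # path to the ovs-vswitchd.log copy
>>     for line in log:
>>         m = pattern.search(line)
>>         if m:
>>             counts[m.group(1)] += 1
>>
>> for event, count in counts.most_common():
>>     print(count, event)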
>>
>>
>>
>> Also, I’m seeing errors in the karaf log:
>>
>>
>>
>> 2018-01-23 18:49:29,378 | WARN  | entLoopGroup-7-3 | DeviceContextImpl                | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | writePortStatusMessage
>>
>> 2018-01-23 18:49:29,379 | WARN  | entLoopGroup-7-3 | DeviceContextImpl                | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | submit transaction for write port status message
>>
>> 2018-01-23 18:49:29,379 | WARN  | rd-dispatcher-23 | ShardDataTree                    | 184 - org.opendaylight.controller.sal-distributed-datastore - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational: Store Tx member-1-datastore-operational-fe-0-chn-8-txn-11-0: Data validation failed for path
>> /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}
>> /(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow.
>>
>> org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException:
>> Node /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
>> (urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}
>> /(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow does not exist. Cannot apply modification to its children.
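>>
>> To see whether that path really was gone from operational at the time, a quick RESTCONF check along the lines of the hypothetical sketch below could help (host, port, credentials and the exact table path are assumptions based on the usual CSIT defaults):
>>
>> import requests
>>
>> # Rough sketch: ask the controller whether the table the failed transaction
>> # tried to touch is still present in the operational datastore.
>> ODL = "http://10.30.170.63:8181"   # controller IP taken from the OVS log above
>> NODE = "openflow:246869078989547"
>> URL = (ODL + "/restconf/operational/opendaylight-inventory:nodes/node/"
>>        + NODE + "/table/50")       # add the flow-node-inventory: prefix if your RESTCONF wants it
>>
>> resp = requests.get(URL, auth=("admin", "admin"), timeout=10)
>> print(resp.status_code)            # a 404 here would match the "does not exist" error
>> if resp.ok:
>>     print(resp.json())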
>>
>>
>>
>> We need to check why there are multiple switch disconnects and reconnects,
>> and how the OpenFlow plugin handles them.
>>
>>
>>
>> Regards,
>>
>> Arun
>>
>>
>>
>> *From:* Vishal Thapar
>> *Sent:* Wednesday, January 24, 2018 9:52 AM
>> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>;
>> Josh Hershberg <[email protected]>; D Arunprakash <
>> [email protected]>
>> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>>
>>
>> Missed adding most important detail and added Arun.
>>
>>
>>
>> Inventory operational is still showing both the old port and the new port
>> for some reason. I guess that is what caused the problems.
>>
>>
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56
>>
>>
>>
>> {"id":"openflow:246869078989547:4","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":4,"flow-node-inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}
>>
>>
>>
>> {"id":"openflow:246869078989547:9","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":9,"flow-node-inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}
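>>
>> A hypothetical helper to spot this kind of leftover mechanically: group the node-connectors of that node by interface name and flag duplicates (controller address and credentials are assumptions, same as the usual CSIT defaults):
>>
>> import requests
>> from collections import defaultdict
>>
>> # Sketch: flag interface names that appear on more than one node-connector,
>> # which is what we see above for tun55fb50d0a2b (ports 4 and 9).
>> ODL = "http://10.30.170.63:8181"
>> NODE = "openflow:246869078989547"
>> url = ODL + "/restconf/operational/opendaylight-inventory:nodes/node/" + NODE
>>
>> node = requests.get(url, auth=("admin", "admin"), timeout=10).json()["node"][0]
>> by_name = defaultdict(list)
>> for nc in node.get("node-connector", []):
>>     by_name[nc.get("flow-node-inventory:name")].append(nc["id"])
>>
>> for name, ids in by_name.items():
>>     if len(ids) > 1:
>>         print(name, ids)   # e.g. tun55fb50d0a2b ['openflow:...:4', 'openflow:...:9']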
>>
>>
>>
>> OVS output from same set of logs:
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4
>>
>>
>>
>> 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
>>
>>     config: 0
>>
>>     state: 0
>>
>>     speed: 0 Mbps now, 0 Mbps max
>>
>>
>>
>> So for now I’d peg it as an OFPlugin issue. It didn’t detect or inform us of
>> the old port delete, and that is why we didn’t delete the old flows. I’m
>> wondering whether something else in the IFM code could have handled it, but
>> I don’t think we handle OF port number changes; we expect a delete + add in
>> such scenarios. Faseela can pitch in on why we have a service binding entry
>> with the new port number while the flow is still using the old one.
>>
>>
>>
>> Regards,
>>
>> Vishal.
>>
>>
>>
>> *From:* Vishal Thapar
>> *Sent:* 24 January 2018 09:26
>> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>;
>> Josh Hershberg <[email protected]>
>> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>>
>>
>> Quick analysis:
>>
>>
>>
>> Not related to the policy stuff. The service binding has an entry for the
>> new port number, but the table 220 flow is still using the old port number.
>>
>>
>>
>> { "bound-services": [ { "flow-cookie": 134217735, "flow-priority": 9,
>> "instruction": [ { "apply-actions": { "action": [ { "order": 0,
>> "output-action": { "max-length": 0, "output-node-connector": "*9*" } } ]
>> }, "order": 0 } ], "service-name": "default.tun55fb50d0a2b",
>> "service-priority": 9, "service-type": 
>> "interface-service-bindings:service-type-flow-based"
>> } ], "interface-name": "tun55fb50d0a2b", "service-mode":
>> "interface-service-bindings:service-mode-egress" }
>>
>>
>>
>> {"id":"246869078989547.220.tun55fb50d0a2b.0","priority":9,"table_id":220,"installHw":true,"hard-timeout":0,"match":{"openflowplugin-extension-general:extension-list":[{"extension-key":"openflowplugin-extension-nicira-match:nxm-nx-reg6-key","extension":{"openflowplugin-extension-nicira-match:nxm-nx-reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"}}}]},"cookie":134217735,"flow-name":"default.tun55fb50d0a2b","strict":true,"instructions":{"instruction":[{"order":0,"apply-actions":{"action":[{"order":0,"output-action":{"max-length":0,"output-node-connector":"*4*"}}]}}]},"barrier":false,"idle-timeout":0}
>>
>>
>>
>> cookie=0x8000007, duration=403.965s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x1000 actions=output:*4*
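>>
>> Just to make the mismatch mechanical, a small hypothetical check along these lines could pull the output-node-connector out of both the service binding and the table 220 flow (the two JSON objects pasted above) and compare them; the file names are placeholders:
>>
>> import json
>>
>> def output_ports(instructions):
>>     """Yield every output-node-connector in an apply-actions instruction list."""
>>     for instruction in instructions:
>>         for action in instruction.get("apply-actions", {}).get("action", []):
>>             out = action.get("output-action")
>>             if out:
>>                 yield str(out["output-node-connector"])
>>
>> # bound_service.json / table220_flow.json hold the two objects pasted above.
>> bound_service = json.load(open("bound_service.json"))["bound-services"][0]
>> flow = json.load(open("table220_flow.json"))
>>
>> binding_port = next(output_ports(bound_service["instruction"]))       # "9" above
>> flow_port = next(output_ports(flow["instructions"]["instruction"]))   # "4" above
>> if binding_port != flow_port:
>>     print("stale: binding outputs to", binding_port, "but flow outputs to", flow_port)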
>>
>>
>>
>> In the OVS logs you can see this tunnel port getting deleted and then coming
>> back with a different OF port.
>>
>>
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/compute_2/ovs-vswitchd.log.gz
>>
>>
>>
>> It goes from 4 to 9. This happens because the cleanup in the previous suite
>> doesn’t actually clean up everything and leaves an entry for the old service
>> binding. You can confirm it from the interfaces-state entry for the same port
>> in the first and second suites. So we have stale flows and stale service
>> bindings for the old tunnel port. We could check with OFPlugin how they
>> handle an update of a flow, though it may well not work.
>>
>>
>>
>> We need to check whether cleanup has been done completely before moving to
>> the next suite. This is where the work we have been doing on tooling comes in.
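>>
>> For that check, even something as simple as the sketch below, run on each compute node between suites, would flag the leftover. It is a hypothetical helper, assuming ovs-ofctl is runnable locally and that every table 220 output should point at a port that still exists on br-int:
>>
>> import re
>> import subprocess
>>
>> def ovs(*args):
>>     return subprocess.check_output(["ovs-ofctl", "-O", "OpenFlow13"] + list(args),
>>                                    universal_newlines=True)
>>
>> # Ports currently on the bridge, by OF port number.
>> existing = set(re.findall(r"^\s*(\d+)\(", ovs("show", "br-int"), re.MULTILINE))
>>
>> # Any table 220 flow whose output port is no longer on the bridge is stale.
>> for line in ovs("dump-flows", "br-int", "table=220").splitlines():
>>     for port in re.findall(r"output:(\d+)", line):
>>         if port not in existing:
>>             print("stale table 220 flow:", line.strip())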
>>
>>
>>
>> Regards,
>>
>> Vishal.
>>
>>
>>
>> *From:* Faseela K
>> *Sent:* 24 January 2018 08:10
>> *To:* Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
>> [email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>>
>>
>> Looks like a more or less similar issue: the tunnel flow is programmed in
>> table 220 with the older tunnel’s port number, which was deleted in the l2
>> suite. However, the policy code has not kicked in. I will take a detailed
>> look now at what is causing this issue.
>>
>>
>>
>> Thanks,
>>
>> Faseela
>>
>>
>>
>> *From:* Faseela K
>> *Sent:* Wednesday, January 24, 2018 7:48 AM
>> *To:* 'Sam Hague' <[email protected]>; Josh Hershberg <
>> [email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
>> [email protected]>; Manu B <[email protected]>
>> *Subject:* RE: is dhcp issue fixed on carbon?
>>
>>
>>
>> Thanks, Sam, for the initial triaging.
>>
>> I will take a look at this.
>>
>>
>>
>> *From:* Sam Hague [mailto:[email protected]]
>> *Sent:* Wednesday, January 24, 2018 6:54 AM
>> *To:* Faseela K <[email protected]>; Josh Hershberg <
>> [email protected]>
>> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
>> [email protected]>; Manu B <[email protected]>
>> *Subject:* Re: is dhcp issue fixed on carbon?
>>
>>
>>
>> OK, it seems pretty consistent that table 220 flows are not showing up.
>> Vishal, Faseela, can you see if it is like the policymgr one where the
>> bind/unbind was wrong? That seems the closest culprit, as those were the
>> last patches merged.
>>
>>
>>
>> Here is another case where the table 220 flow is missing, in suite [5] of
>> job [6]. This time the missing port is a tunnel port: "9(tun55fb50d0a2b):
>> addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7] of
>> the same job another port has the same issue, where the port missing from
>> table 220 is "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3".
>>
>>
>>
>> [5] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4
>>
>>
>>
>> [6] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz
>>
>>
>>
>> [7] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_04_security_group.html.gz
>>
>>
>>
>> On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague <[email protected]> wrote:
>>
>> further details for Josh since the original email doesn't have many...
>>
>>
>>
>> - so the "l3.Check Vm Instances Have Ip Address" test fails, with net1 not
>> being able to get all the VM IPs for its three VMs.
>>
>> - '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None' - this means
>> the first VM of the three did not get an IP
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k8
>>
>>
>>
>> - look at the neutron ports to find which port goes with vm1
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2
>>
>> - get the missing IP as 31.0.0.6, then look at the next log to get the port
>>
>> - look at the 31.0.0.x addresses; we know 31.0.0.6 is the one missing
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2
>>
>> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
>>
>>
>>
>> - I know the first VM (and the second) are both on compute_1, so look at
>> the OVS logs on compute_1
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
>>
>>
>>
>> - on compute_1, in the ofctl show br-int output, we see port 7
>>
>> 7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f
>>
>>
>>
>> - then check the flows to see if there is a table 220 flow for port 7
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4
>>
>> And the table 220 flow for port 7 is not there, so the VM can't get an IP.
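>>
>> To make this kind of eyeballing quicker next time, here is a rough, hypothetical helper: feed it the two sections copied out of the robot log - the ofctl show output and the table 220 flow dump - and it lists ports that no table 220 flow outputs to. Some ports (patch ports, LOCAL) legitimately won't have one, so treat it as a hint rather than a verdict:
>>
>> import re
>> import sys
>>
>> # argv[1]: text of "ovs-ofctl show br-int", argv[2]: text of the table=220 dump.
>> show_text = open(sys.argv[1]).read()
>> flows_text = open(sys.argv[2]).read()
>>
>> # e.g. " 7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" -> ("7", "tap3862fa17-4e")
>> ports = dict(re.findall(r"^\s*(\d+)\((\S+)\):", show_text, re.MULTILINE))
>> outputs = set(re.findall(r"output:(\d+)", flows_text))
>>
>> for num, name in sorted(ports.items(), key=lambda kv: int(kv[0])):
>>     if num not in outputs:
>>         print("no table 220 flow outputs to port", num, "(" + name + ")")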
>>
>>
>>
>> [3] is the patch Vishal pushed to fix a similar issue the first time we saw
>> this. What we found is that the elan tag was being reused: a port was
>> deleted, then a new one was created and the elan tag was reused. So you
>> ended up with a tunnel port stomping on a VM port.
>>
>> [3]  https://git.opendaylight.org/gerrit/#/c/67009/
>>
>>
>>
>> On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague <[email protected]> wrote:
>>
>> Adding Josh to thread.
>>
>>
>>
>> On Tue, Jan 23, 2018 at 2:25 PM, Faseela K <[email protected]>
>> wrote:
>>
>> Manu,
>>
>>    Could you please take a look at the DHCP failure in the below run?
>>
>>     I am caught up with something else, but will help you out with the
>> initial triaging.
>>
>> Thanks,
>>
>> Faseela
>>
>>
>>
>> *From:* Sam Hague [mailto:[email protected]]
>> *Sent:* Monday, January 22, 2018 10:57 PM
>> *To:* Vishal Thapar <[email protected]>; Faseela K <
>> [email protected]>; Jamo Luhrsen <[email protected]>
>> *Subject:* is dhcp issue fixed on carbon?
>>
>>
>>
>> Vishal, Faseela,
>>
>>
>>
>> Can you look at this job run to see if the issue you fixed with the
>> policymgr binding is fixed? In this build the whole policymgr bundle has
>> been removed. This is Carbon, so I just removed the whole bundle since we
>> would never use it. Could that have uncovered something the code was doing?
>> If so, then even master and Nitrogen should have the issue, since there we
>> disabled building policymgr - which should be the same as removing it.
>>
>>
>>
>> The other thing merged in Carbon is the bind/unbind patches for elan and
>> dhcp. Could those have an impact?
>>
>>
>>
>> Thanks, Sam
>>
>>
>>
>> I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the
>> table 220 flows, which was the problem before.
>>
>>
>>
>> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
>>
>>
>>
>> Thanks, Sam
>>
>>
>>
>> [1] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
>>
>>
>>
>>
>>
>>
>>
>


-- 
Thanks
Anil
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
