Hi Sam,
We need to enable Openflowplugin logs and do packet capture for the time window 
when the tunnel port is delete and added back.

Regards,
Arun

From: Anil Vishnoi [mailto:[email protected]]
Sent: Thursday, January 25, 2018 1:17 AM
To: Sam Hague <[email protected]>
Cc: D Arunprakash <[email protected]>; openflowplugin-dev 
<[email protected]>; Vishal Thapar 
<[email protected]>; Faseela K <[email protected]>; Josh 
Hershberg <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B 
<[email protected]>
Subject: Re: is dhcp issue fixed on carbon?

Hi Sam,

Looks like Arun is looking at it ?

Arun, if you are not looking at it currently, please let me know I will take a 
look at it.

Thanks
Anil

On Wed, Jan 24, 2018 at 4:25 AM, Sam Hague 
<[email protected]<mailto:[email protected]>> wrote:
Adding openflow to thread.

Anil, could someone take a look at this for carbon? We are seeing a connection 
flapping and end up missing port status updates. This leads to stale models and 
flows.

This is blocking the carbon sr3.

On Jan 24, 2018 12:58 AM, "D Arunprakash" 
<[email protected]<mailto:[email protected]>> wrote:
Ignore my previous email.

The tunnel port got deleted around 18:49:29.373 and added back on 18:52:46.26

2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653<http://10.30.170.63:6653>:
 sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): 
addr:3e:0c:ed:2e:a9:ba

2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653<http://10.30.170.63:6653>:
 sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): 
addr:8a:2f:9f:c6:fe:d9

Immediately after tunnel delete, I’m seeing so multiple switch flaps for quite 
sometime,

2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE
2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent (Success): OFPT_HELLO 
(OF1.3) (xid=0x75):
version bitmap: 0x04
2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received: OFPT_HELLO (OF1.3) 
(xid=0x1):

2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection closed by 
peer
2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering DISCONNECTED
2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE

Also, I’m seeing error in karaf log

2018-01-23 18:49:29,378 | WARN  | entLoopGroup-7-3 | DeviceContextImpl          
      | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | 
writePortStatusMessage
2018-01-23 18:49:29,379 | WARN  | entLoopGroup-7-3 | DeviceContextImpl          
      | 280 - org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT | submit 
transaction for write port status message
2018-01-23 18:49:29,379 | WARN  | rd-dispatcher-23 | ShardDataTree              
      | 184 - 
org.opendaylight.controller.sa<http://org.opendaylight.controller.sa>l-distributed-datastore
 - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational: Store Tx 
member-1-datastore-operational-fe-0-chn-8-txn-11-0: Data validation failed for 
path 
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
 (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)table, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)group, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)software, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)description, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow.
org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException:
 Node 
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
 (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)table, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)group, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)software, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)description, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, 
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow
 does not exist. Cannot apply modification to its children.

We need to check why there is multiple switch disconnect and reconnect and how 
ofp handles the same.

Regards,
Arun

From: Vishal Thapar
Sent: Wednesday, January 24, 2018 9:52 AM
To: Faseela K <[email protected]<mailto:[email protected]>>; Sam 
Hague <[email protected]<mailto:[email protected]>>; Josh Hershberg 
<[email protected]<mailto:[email protected]>>; D Arunprakash 
<[email protected]<mailto:[email protected]>>
Cc: Jamo Luhrsen <[email protected]<mailto:[email protected]>>; Manu B 
<[email protected]<mailto:[email protected]>>
Subject: RE: is dhcp issue fixed on carbon?

Missed adding most important detail and added Arun.

Inventory operational is still showing old port and new port for some reason. I 
guess that is what caused problems.

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56

{"id":"openflow:246869078989547:4","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":4,"flow-node-inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}

{"id":"openflow:246869078989547:9","flow-node-inventory:supported":"","flow-node-inventory:peer-features":"","flow-node-inventory:port-number":9,"flow-node-inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node-inventory:current-feature":"","flow-node-inventory:maximum-speed":0,"flow-node-inventory:reason":"add","flow-node-inventory:configuration":"","flow-node-inventory:advertised-features":"","flow-node-inventory:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-inventory:state":{"link-down":false,"blocked":false,"live":false}}

OVS output from same set of logs:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4

9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
    config: 0
    state: 0
    speed: 0 Mbps now, 0 Mbps max

So for now I’d peg it as OFPlugin issue. It didn’t detect or inform us of old 
port delete and that is why we didn’t delete old flows. Though wondering if 
something else in IFM code could’ve handled it, but don’t think we handle 
OfPort number changes, expect a delete+add in such scenarios. Faseela can pitch 
in why we have service binding entry with new port number but flow is still 
using old one.

Regards,
Vishal.

From: Vishal Thapar
Sent: 24 January 2018 09:26
To: Faseela K <[email protected]<mailto:[email protected]>>; Sam 
Hague <[email protected]<mailto:[email protected]>>; Josh Hershberg 
<[email protected]<mailto:[email protected]>>
Cc: Jamo Luhrsen <[email protected]<mailto:[email protected]>>; Manu B 
<[email protected]<mailto:[email protected]>>
Subject: RE: is dhcp issue fixed on carbon?

Quick analysis:

Not related to policy stuff. Service binding has entry for the new port number 
but table 220 flow is still using old port number.

{ "bound-services": [ { "flow-cookie": 134217735, "flow-priority": 9, 
"instruction": [ { "apply-actions": { "action": [ { "order": 0, 
"output-action": { "max-length": 0, "output-node-connector": "9" } } ] }, 
"order": 0 } ], "service-name": "default.tun55fb50d0a2b", "service-priority": 
9, "service-type": "interface-service-bindings:service-type-flow-based" } ], 
"interface-name": "tun55fb50d0a2b", "service-mode": 
"interface-service-bindings:service-mode-egress" }

{"id":"246869078989547.220.tun55fb50d0a2b.0","priority":9,"table_id":220,"installHw":true,"hard-timeout":0,"match":{"openflowplugin-extension-general:extension-list":[{"extension-key":"openflowplugin-extension-nicira-match:nxm-nx-reg6-key","extension":{"openflowplugin-extension-nicira-match:nxm-nx-reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"}}}]},"cookie":134217735,"flow-name":"default.tun55fb50d0a2b","strict":true,"instructions":{"instruction":[{"order":0,"apply-actions":{"action":[{"order":0,"output-action":{"max-length":0,"output-node-connector":"4"}}]}}]},"barrier":false,"idle-timeout":0}

cookie=0x8000007, duration=403.965s, table=220, n_packets=0, n_bytes=0, 
priority=9,reg6=0x1000 actions=output:4

In OVS logs you can see this tunnel port getting deleted and then coming back 
in with a different OfPort.

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/compute_2/ovs-vswitchd.log.gz

It goes from 4 to 9. This happens due to clean up in previous suite which 
doesn’t actually clean up everything and leaves entry for that old service 
binding. Can confirm it from interfaces-state entry for same port in first and 
second suites. So we have stale flows and stale service bindings for old tunnel 
port. Could probably check with OFPlugin how they handle update of a flow, may 
probably not work.

We need to check if cleanup has been done completely before moving to next 
suite. This is where the work we been doing on tools comes in.

Regards,
Vishal.

From: Faseela K
Sent: 24 January 2018 08:10
To: Sam Hague <[email protected]<mailto:[email protected]>>; Josh Hershberg 
<[email protected]<mailto:[email protected]>>
Cc: Vishal Thapar 
<[email protected]<mailto:[email protected]>>; Jamo Luhrsen 
<[email protected]<mailto:[email protected]>>; Manu B 
<[email protected]<mailto:[email protected]>>
Subject: RE: is dhcp issue fixed on carbon?

Looks more or less similar issue, tunnel flow is programmed in table 220 with 
older tunnel’s port number, which was deleted in l2 suite. However policy code 
has not kicked in. I will take a detailed look on what is causing this issue 
now.

Thanks,
Faseela

From: Faseela K
Sent: Wednesday, January 24, 2018 7:48 AM
To: 'Sam Hague' <[email protected]<mailto:[email protected]>>; Josh Hershberg 
<[email protected]<mailto:[email protected]>>
Cc: Vishal Thapar 
<[email protected]<mailto:[email protected]>>; Jamo Luhrsen 
<[email protected]<mailto:[email protected]>>; Manu B 
<[email protected]<mailto:[email protected]>>
Subject: RE: is dhcp issue fixed on carbon?

Thanks Sam for initial triaging.
I will take a look at this.

From: Sam Hague [mailto:[email protected]]
Sent: Wednesday, January 24, 2018 6:54 AM
To: Faseela K <[email protected]<mailto:[email protected]>>; Josh 
Hershberg <[email protected]<mailto:[email protected]>>
Cc: Vishal Thapar 
<[email protected]<mailto:[email protected]>>; Jamo Luhrsen 
<[email protected]<mailto:[email protected]>>; Manu B 
<[email protected]<mailto:[email protected]>>
Subject: Re: is dhcp issue fixed on carbon?

OK, seems pretty consistent that table 220 flows are not showing up. Vishal, 
Faseela, can you see if it is like the policymgr one where the bind/unbind was 
wrong? That seems the closest culprit as those were the last patches merged.

Here is another case where the table 220 flow is missing in suite [5] of job 
[6]. This time the port missing is a tunnel port. "9(tun55fb50d0a2b): 
addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7] of the 
same job this port has the same issue where the tunnel port is missing: 
"16(tap28760838-a7): addr:fe:16:3e:26:0a:e3"

[5] 
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4

[6] 
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz

[7] 
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_04_security_group.html.gz

On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague 
<[email protected]<mailto:[email protected]>> wrote:
further details for Josh since the original email doesn't have many...

- so the "l3.Check Vm Instances Have Ip Address" test fails with the net1 not 
being able to get all the vm ips for it's three vms.
- '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None' - this means the first 
vm of the three did not get a ip
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k8

- looks at the neutron ports to find which port goes with vm1
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2
get the missing ip as 31.0.0.6, then look at next log to get the port
- look at the 31.0.0.x addresses, we know 31,0.0
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2
3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | 
ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | 
ACTIVE |

- I know the first vm (and second) are both on the compute_1 so look at the ovs 
logs on compute_1
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4

- compute_1, in the ofctl show br-int, we see port 7
7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f

- then check flows to see if there is a table 220 flow for port 7
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4
And the table 220 flow for port 7 is not there, so the vm can't get an IP.

[3] is the patch vishal pushed to fix a similar issue the first time we saw 
this. What we found is that the elan tag was being reused, because a port was 
deleted and then a new one created and the elan tag reused. So you ended up 
with a tunnel port stomping on a vm port.
[3]  https://git.opendaylight.org/gerrit/#/c/67009/

On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague 
<[email protected]<mailto:[email protected]>> wrote:
Adding Josh to thread.

On Tue, Jan 23, 2018 at 2:25 PM, Faseela K 
<[email protected]<mailto:[email protected]>> wrote:
Manu,
   Could you please take a look at the DHCP failure in the below run?
    I am caught up with something else, will help you out in initial triaging.
Thanks,
Faseela

From: Sam Hague [mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, January 22, 2018 10:57 PM
To: Vishal Thapar 
<[email protected]<mailto:[email protected]>>; Faseela K 
<[email protected]<mailto:[email protected]>>; Jamo Luhrsen 
<[email protected]<mailto:[email protected]>>
Subject: is dhcp issue fixed on carbon?

Vishal, Faseela,

can you look at this job run to see if the issue you fixed with the policymgr 
binding is fixed? in this build the whole poligymgr bundle has been removed. 
This is carbon so I just removed the whole bundle as we would never use it. 
Could that have uncovered something that the code was doing? If so, then even 
master and nitrogen should have the issue since there we disabled building 
policymgr - so should be the same as removing it.

Other thing, merged in carbon is the bind/unbind patches for elan and dhcp. 
Could those have an impact?

Thanks, Sam

I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the table 220 
flows which was the problem before.

3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | 
ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | 
ACTIVE |

Thanks, Sam

[1] 
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4






--
Thanks
Anil
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev

Reply via email to