Please enable the logs for the below package:
org.opendaylight.openflowplugin.impl
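For reference, this can be done from the karaf shell (assuming the standard karaf log commands):

  # raise openflowplugin impl logging to DEBUG (TRACE is noisier still)
  log:set DEBUG org.opendaylight.openflowplugin.impl
  # confirm the effective level
  log:get org.opendaylight.openflowplugin.impl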
Regards,
Arun
*From:* Sam Hague [mailto:[email protected]]
*Sent:* Thursday, January 25, 2018 9:32 AM
*To:* D Arunprakash <[email protected]>
*Cc:* Anil Vishnoi <[email protected]>; openflowplugin-dev
<[email protected]>; Vishal Thapar
<[email protected]>; Faseela K <[email protected]>;
Josh Hershberg <[email protected]>; Jamo Luhrsen
<[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?
What logs to enable? Or just the whole openflowplugin, which is very noisy.
Is packet capture on port 6653 good?
On Jan 24, 2018 10:54 PM, "D Arunprakash" <[email protected]> wrote:
Hi Sam,
We need to enable the Openflowplugin logs and do a packet capture for the time window when the tunnel port is deleted and added back.
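Something along these lines should cover that window on the compute node (a sketch, assuming tcpdump is available; the capture file name is just an example):

  # capture OpenFlow traffic on the default controller port 6653
  sudo tcpdump -i any -s 0 -w openflow-6653.pcap tcp port 6653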
Regards,
Arun
*From:* Anil Vishnoi [mailto:[email protected]]
*Sent:* Thursday, January 25, 2018 1:17 AM
*To:* Sam Hague <[email protected]>
*Cc:* D Arunprakash <[email protected]>; openflowplugin-dev <[email protected]>; Vishal Thapar <[email protected]>; Faseela K <[email protected]>; Josh Hershberg <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* Re: is dhcp issue fixed on carbon?
Hi Sam,
Looks like Arun is looking at it?
Arun, if you are not looking at it currently, please let me know and I will take a look at it.
Thanks
Anil
On Wed, Jan 24, 2018 at 4:25 AM, Sam Hague <[email protected]> wrote:
Adding openflow to thread.
Anil, could someone take a look at this for carbon? We are seeing connection flapping and ending up missing port status updates. This leads to stale models and flows.
This is blocking the Carbon SR3.
On Jan 24, 2018 12:58 AM, "D Arunprakash" <[email protected]> wrote:
Ignore my previous email.
The tunnel port got deleted around 18:49:29.373 and added back at 18:52:46.261.
2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): addr:3e:0c:ed:2e:a9:ba
2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
Immediately after the tunnel delete, I'm seeing multiple switch flaps for quite some time:
2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix:
entering ACTIVE
2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent
(Success): OFPT_HELLO (OF1.3) (xid=0x75):
version bitmap: 0x04
2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received:
OFPT_HELLO (OF1.3) (xid=0x1):
2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix:
connection closed by peer
2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix:
entering DISCONNECTED
2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix:
entering ACTIVE
Also, I'm seeing errors in the karaf log:
2018-01-23 18:49:29,378 | WARN | entLoopGroup-7-3 |
DeviceContextImpl | 280 -
org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT |
writePortStatusMessage
2018-01-23 18:49:29,379 | WARN | entLoopGroup-7-3 |
DeviceContextImpl | 280 -
org.opendaylight.openflowplugin.impl - 0.4.3.SNAPSHOT |
submit transaction for write port status message
2018-01-23 18:49:29,379 | WARN | rd-dispatcher-23 |
ShardDataTree | 184 -
org.opendaylight.controller.sal-distributed-datastore
- 1.5.3.SNAPSHOT | member-1-shard-inventory-operational:
Store Tx member-1-datastore-operational-fe-0-chn-8-txn-11-0:
Data validation failed for path
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
(urn:opendaylight:flow:inventory?revision=2013-08-19)table,
(urn:opendaylight:flow:inventory?revision=2013-08-19)group,
(urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
(urn:opendaylight:flow:inventory?revision=2013-08-19)software,
(urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
(urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
(urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
(urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
(urn:opendaylight:flow:inventory?revision=2013-08-19)description,
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow.
org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException:
Node
/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:246869078989547}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
(urn:opendaylight:flow:inventory?revision=2013-08-19)table,
(urn:opendaylight:flow:inventory?revision=2013-08-19)group,
(urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
(urn:opendaylight:flow:inventory?revision=2013-08-19)software,
(urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
(urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
(urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
(urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
(urn:opendaylight:flow:inventory?revision=2013-08-19)description,
(urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
(urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
(urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow
does not exist. Cannot apply modification to its children.
We need to check why there are multiple switch disconnects and reconnects, and how OFP handles them.
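One quick way to watch the flapping from the OVS side (a sketch, assuming ovs-vsctl and the usual vswitchd log location; paths may differ on the CSIT nodes):

  # controller target and current connection state
  ovs-vsctl --columns=target,is_connected,status list controller
  # connection state transitions in the vswitchd log
  grep -E 'rconn.*(ACTIVE|DISCONNECTED)' /var/log/openvswitch/ovs-vswitchd.log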
Regards,
Arun
*From:* Vishal Thapar
*Sent:* Wednesday, January 24, 2018 9:52 AM
*To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; Josh Hershberg <[email protected]>; D Arunprakash <[email protected]>
*Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?
Missed adding the most important detail, and added Arun.
Inventory operational is still showing both the old port and the new port for some reason. I guess that is what caused the problems.
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56
{"id":"openflow:246869078989547:4","flow-node-inventory:supported":""
,
"flow-node-inventory:peer-features":"","flow-node-inventory:port-numb
e
r":4,"flow-node-inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow
-
node-inventory:current-feature":"","flow-node-inventory:maximum-speed"
:0,"flow-node-inventory:reason":"add","flow-node-inventory:configurat
i
on":"","flow-node-inventory:advertised-features":"","flow-node-invent
o
ry:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow
-
node-inventory:state":{"link-down":false,"blocked":false,"live":false
}
}
{"id":"openflow:246869078989547:9","flow-node-inventory:supported":""
,
"flow-node-inventory:peer-features":"","flow-node-inventory:port-numb
e
r":9,"flow-node-inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow
-
node-inventory:current-feature":"","flow-node-inventory:maximum-speed"
:0,"flow-node-inventory:reason":"add","flow-node-inventory:configurat
i
on":"","flow-node-inventory:advertised-features":"","flow-node-invent
o
ry:current-speed":0,"flow-node-inventory:name":"tun55fb50d0a2b","flow
-
node-inventory:state":{"link-down":false,"blocked":false,"live":false
}
}
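For anyone reproducing this, the same entries can be pulled straight from the operational datastore over RESTCONF (Carbon-era URL layout; admin/admin is the default credential and may differ per setup):

  # both node-connector entries (old port 4 and new port 9) show up here
  curl -s -u admin:admin \
    "http://localhost:8181/restconf/operational/opendaylight-inventory:nodes/node/openflow:246869078989547"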
OVS output from same set of logs:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4
9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
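(That is the kind of output you get from something like the following on the compute node:)

  # list ports with their current OpenFlow port numbers
  ovs-ofctl -O OpenFlow13 show br-int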
So for now I'd peg it as an OFPlugin issue. It didn't detect or inform us of the old port delete, and that is why we didn't delete the old flows. Though I'm wondering if something else in the IFM code could've handled it; I don't think we handle OfPort number changes, we expect a delete+add in such scenarios. Faseela can pitch in on why we have a service binding entry with the new port number but the flow is still using the old one.
Regards,
Vishal.
*From:* Vishal Thapar
*Sent:* 24 January 2018 09:26
*To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
*Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?
Quick analysis:
Not related to the policy stuff. The service binding has an entry for the new port number, but the table 220 flow is still using the old port number.
{ "bound-services": [ { "flow-cookie": 134217735,
"flow-priority": 9, "instruction": [ { "apply-actions": {
"action": [ { "order": 0, "output-action": { "max-length":
0, "output-node-connector": "*9*" } } ] }, "order": 0 } ],
"service-name": "default.tun55fb50d0a2b",
"service-priority": 9, "service-type":
"interface-service-bindings:service-type-flow-based" } ],
"interface-name": "tun55fb50d0a2b", "service-mode":
"interface-service-bindings:service-mode-egress" }
{"id":"246869078989547.220.tun55fb50d0a2b.0","priority":9,"table_id":
2
20,"installHw":true,"hard-timeout":0,"match":{"openflowplugin-extensi
o
n-general:extension-list":[{"extension-key":"openflowplugin-extension
-
nicira-match:nxm-nx-reg6-key","extension":{"openflowplugin-extension-
n
icira-match:nxm-nx-reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"
}}}]},"cookie":134217735,"flow-name":"default.tun55fb50d0a2b","strict"
:true,"instructions":{"instruction":[{"order":0,"apply-actions":{"act
i
on":[{"order":0,"output-action":{"max-length":0,"output-node-connecto
r ":"*4*"}}]}}]},"barrier":false,"idle-timeout":0}
cookie=0x8000007, duration=403.965s, table=220, n_packets=0,
n_bytes=0, priority=9,reg6=0x1000 actions=output:*4*
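(For reference, the flow above can be pulled with something like:)

  # dump only the egress dispatcher table
  ovs-ofctl -O OpenFlow13 dump-flows br-int table=220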
In the OVS logs you can see this tunnel port getting deleted and then coming back with a different OfPort:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/compute_2/ovs-vswitchd.log.gz
It goes from 4 to 9. This happens due to cleanup in the previous suite, which doesn't actually clean up everything and leaves an entry for the old service binding. You can confirm it from the interfaces-state entry for the same port in the first and second suites. So we have stale flows and stale service bindings for the old tunnel port. We could check with OFPlugin how they handle an update of a flow; it may well not work.
We need to check whether cleanup has been done completely before moving to the next suite. This is where the work we have been doing on tools comes in.
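A cheap end-of-suite check could be as simple as the sketch below (illustrative only; the baseline would come from a known-clean setup):

  # no leftover tun*/tap* ports should remain on br-int
  ovs-vsctl list-ports br-int
  # flow count should match the post-cleanup baseline
  ovs-ofctl -O OpenFlow13 dump-flows br-int | wc -l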
Regards,
Vishal.
*From:* Faseela K
*Sent:* 24 January 2018 08:10
*To:* Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
*Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?
Looks like a more or less similar issue: the tunnel flow is programmed in table 220 with the older tunnel's port number, which was deleted in the l2 suite. However, the policy code has not kicked in. I will take a detailed look at what is causing this issue now.
Thanks,
Faseela
*From:* Faseela K
*Sent:* Wednesday, January 24, 2018 7:48 AM
*To:* 'Sam Hague' <[email protected]>; Josh Hershberg <[email protected]>
*Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* RE: is dhcp issue fixed on carbon?
Thanks, Sam, for the initial triaging.
I will take a look at this.
*From:* Sam Hague [mailto:[email protected]]
*Sent:* Wednesday, January 24, 2018 6:54 AM
*To:* Faseela K <[email protected]>; Josh Hershberg <[email protected]>
*Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
*Subject:* Re: is dhcp issue fixed on carbon?
OK, it seems pretty consistent that table 220 flows are not showing up. Vishal, Faseela, can you see if it is like the policymgr one where the bind/unbind was wrong? That seems the closest culprit, as those were the last patches merged.
Here is another case where the table 220 flow is missing, in suite [5] of job [6]. This time the missing port is a tunnel port: "9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7] of the same job another port has the same issue: "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3" is missing.
[5] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4
[6] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_02_l3.html.gz
[7] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/263/log_04_security_group.html.gz
On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague <[email protected]> wrote:
further details for Josh since the original email doesn't have many...
- so the "l3.Check Vm Instances Have Ip Address" test fails, with net1 not able to get all the VM IPs for its three VMs
- '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None'
- this means the first VM of the three did not get an IP
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k8
- look at the neutron ports to find which port goes with vm1
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2
get the missing IP as 31.0.0.6, then look at the next log to get the port
- look at the 31.0.0.x addresses; we know the 31.0.0 subnet
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2
3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
- I know the first VM (and the second) are both on compute_1, so look at the OVS logs on compute_1
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
- on compute_1, in the ofctl show br-int output, we see port 7
7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f
- then check flows to see if there is a table 220 flow
for port 7
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4
And the table 220 flow for port 7 is not there, so the VM can't get an IP.
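(The quick check being something like:)

  # look for an egress dispatcher flow outputting to OF port 7
  ovs-ofctl -O OpenFlow13 dump-flows br-int table=220 | grep 'output:7'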
[3] is the patch Vishal pushed to fix a similar issue the first time we saw this. What we found is that the ELAN tag was being reused: a port was deleted, then a new one was created and the ELAN tag was reused. So you ended up with a tunnel port stomping on a VM port.
[3] https://git.opendaylight.org/gerrit/#/c/67009/
On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague <[email protected]> wrote:
Adding Josh to thread.
On Tue, Jan 23, 2018 at 2:25 PM, Faseela K <[email protected]> wrote:
Manu,
Could you please take a look at the DHCP failure in the below run?
I am caught up with something else, but will help you out with the initial triaging.
Thanks,
Faseela
*From:* Sam Hague [mailto:[email protected]]
*Sent:* Monday, January 22, 2018 10:57 PM
*To:* Vishal Thapar <[email protected]>; Faseela K <[email protected]>; Jamo Luhrsen <[email protected]>
*Subject:* is dhcp issue fixed on carbon?
Vishal, Faseela,
can you look at this job run to see if the issue you fixed with the policymgr binding is fixed? In this build the whole policymgr bundle has been removed. This is carbon, so I just removed the whole bundle as we would never use it. Could that have uncovered something that the code was doing? If so, then even master and nitrogen should have the issue, since there we disabled building policymgr - so it should be the same as removing it.
Other thing: merged in carbon are the bind/unbind patches for elan and dhcp. Could those have an impact?
Thanks, Sam
I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the table 220 flows, which was the problem before.
3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | ACTIVE |
Thanks, Sam
[1] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
--
Thanks
Anil
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev