Hi all,
I'm using ovs-dpdk (ovs: 2.17.1, dpdk: 21.11.1).
I found a bug: ovs-vswitchd crashes and does not recover after I set
mtu_request several times in a row.
1. How to reproduce, and my analysis:
```
# start ovs, then add the bridges, ports, and OpenFlow rules
[root@bogon ~]# ovs-vsctl show
0444869c-dc4d-462f-8caf-074ecbab1a55
    Bridge br-int
        datapath_type: netdev
        Port p0
            Interface p0
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.0"}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-phy
        datapath_type: netdev
        Port pf1vf0
            Interface pf1vf0
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
        Port pf1vf1
            Interface pf1vf1
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
        Port br-phy
            Interface br-phy
                type: internal
        Port pf1vf3
            Interface pf1vf3
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
        Port pf1vf2
            Interface pf1vf2
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
    ovs_version: "2.17.2"
[root@bogon ~]# ovs-ofctl dump-flows br-int
 cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262, n_bytes=984712027272, priority=0 actions=NORMAL
865084 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:48.23 revalidator53
865123 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:00.43 revalidator92
865158 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:58.49 revalidator127
865171 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 176:29.69 revalidator140
865058 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:58.03 revalidator27
865091 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 175:41.81 revalidator60
865111 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:05.97 revalidator80
865113 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 177:09.64 revalidator82
865130 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:16.27 revalidator99
865155 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:11.22 revalidator124
865097 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 177:00.22 revalidator66
865110 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 175:16.52 revalidator79
865149 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 176:00.84 revalidator118
865151 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 176:29.06 revalidator120
865057 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 178:03.60 revalidator26
865070 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 176:17.63 revalidator39
865112 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 175:35.65 revalidator81
865083 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:21.53 revalidator52
865124 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 175:31.27 revalidator93
865127 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:59.65 revalidator96
865147 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:51.85 revalidator116
865164 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 177:34.16 revalidator133
865051 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:27.68 revalidator20
865066 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:54.05 revalidator35
865087 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:38.54 revalidator56
865100 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:12.42 revalidator69
865118 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 176:02.57 revalidator87
865121 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 176:06.20 revalidator90
865132 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:24.71 revalidator101
865148 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 179:07.53 revalidator117
865162 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:18.34 revalidator131
865047 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 176:30.75 revalidator16
865080 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 175:36.41 revalidator49
865117 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 176:03.18 revalidator86
865125 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 177:15.42 revalidator94
865122 root 10 -10 522.9g 1.6g 42808 S 15.0 0.6 176:45.37 revalidator91
865065 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 176:49.66 revalidator34
865116 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 174:57.67 revalidator85
865161 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 175:10.52 revalidator130
865133 root 10 -10 522.9g 1.6g 42808 S 14.3 0.6 174:49.83 revalidator102
865016 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 1:27.68 ovs-vswitchd
865017 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:14.57 eal-intr-thread
865020 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 bond_cmd_parse_
865021 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 telemetry-v2
865022 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.65 dpdk_watchdog1
865023 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:10.16 urcu2
865025 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:36.14 ct_clean3
865026 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.04 ipf_clean4
865027 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:12.28 hw_offload5
865028 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c106/id:6
865030 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c88/id:8
865031 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c21/id:9
865032 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c78/id:10
865033 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c124/id:11
865035 root 10 -10 522.9g 1.6g 42808 S 0.0 0.6 0:00.00 pmd-c96/id:13
# Note: with only one revalidator thread, the bug does not occur.
# So this may be a race condition between revalidator threads?
# run these commands
ovs-vsctl set interface p0 mtu_request=3000
ovs-vsctl set interface p0 mtu_request=1000
ovs-vsctl set interface p0 mtu_request=2000
ovs-vsctl set interface p0 mtu_request=3100
ovs-vsctl set interface p0 mtu_request=200
ovs-vsctl set interface p0 mtu_request=300
ovs-vsctl set interface p0 mtu_request=500
ovs-vsctl set interface p0 mtu_request=3000
ovs-vsctl set interface p0 mtu_request=1500
ovs-vsctl set interface p0 mtu_request=1300
ovs-vsctl set interface p0 mtu_request=1200
ovs-vsctl set interface p0 mtu_request=800
ovs-vsctl set interface p0 mtu_request=4000
ovs-vsctl set interface p0 mtu_request=5000
ovs-vsctl set interface p0 mtu_request=600
ovs-vsctl set interface p0 mtu_request=2400
ovs-vsctl set interface p0 mtu_request=4800
# Note: the bug may occur when these commands are pasted and run all at once.
# If I run them one at a time, waiting a while after each command, the bug
# does NOT occur.
# So this may be a race condition between revalidator threads?
# the bug occurs
2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating with signal 15 (Terminated)
# 1st: ovs-vswitchd restarts. I think this is because there are not enough hugepages?
2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use device '0000:c1:00.0' which is already in use by 'p0'
2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration (Address already in use)
2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
# 2nd: after the restart, this log appears many times.
# Is this caused by a race condition between revalidator threads, where one
# thread adds p0 and another thread adds p0 again?
# The key point is that this state cannot be recovered with, for example,
# `ovs-vsctl del-port br-int p0` or `ovs-vsctl set interface p0 mtu_request=1500`.
# Only restarting ovs-vswitchd recovers it.
```
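The reproduction steps above can be sketched as a small shell function that issues the same mtu_request sequence with no pause between calls. This is only a sketch: the function name `replay_mtu_requests` and the `OVS_VSCTL` override variable are assumptions introduced here for illustration (they are not part of OVS), so that the loop can be dry-run with `OVS_VSCTL=echo` before it touches a live switch.

```shell
# Sketch: replay the mtu_request sequence from this report back to back.
# replay_mtu_requests and OVS_VSCTL are hypothetical names used only here.
replay_mtu_requests() {
    iface="${1:-p0}"                  # interface from the report
    cmd="${OVS_VSCTL:-ovs-vsctl}"     # set OVS_VSCTL=echo for a dry run

    # MTU values in the same order as the commands in the report.
    for mtu in 3000 1000 2000 3100 200 300 500 3000 1500 \
               1300 1200 800 4000 5000 600 2400 4800; do
        # No sleep between calls: the report says that waiting between
        # commands hides the problem. Stop at the first failing command.
        "$cmd" set interface "$iface" mtu_request="$mtu" || return 1
    done
}
```

For example, `OVS_VSCTL=echo replay_mtu_requests p0` only prints the 17 commands, while `replay_mtu_requests p0` runs them against the real switch.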
2. My questions:
```
- Is this a known bug that has already been fixed? If so, in which commit?
- If not, how can I resolve it?
```
Thanks~
----
Simon Jones