In the OVS code, the bridge_reconfigure() function should ONLY be called from
the ovs-vswitchd thread.

But what about this case:
- ovs-vswitchd is starting, and the ovs-vswitchd thread is running
bridge_reconfigure().
- At the same time, `ovs-vsctl set interface xxx mtu_request=4800` arrives at
the ovs-vswitchd thread.

Is ovs-vswitchd then interrupted in the middle of bridge_reconfigure() to
process the mtu_request change?

----
Simon Jones


Ilya Maximets <[email protected]> wrote on Fri, May 31, 2024 at 17:43:

> On 5/31/24 04:00, Simon Jones wrote:
> > Hi all,
> >
> > I'm using ovs-dpdk(ovs:2.17.1, dpdk:21.11.1).
> > I have found a BUG: OVS crashes and can NOT recover after mtu_request
> > is set.
> >
> > 1. How to reproduce and my Analysis:
> > ```
> > # start ovs and add bridge and port and openflow
> >
> > [root@bogon ~]# ovs-vsctl show
> > 0444869c-dc4d-462f-8caf-074ecbab1a55
> >     Bridge br-int
> >         datapath_type: netdev
> >         Port p0
> >             Interface p0
> >                 type: dpdk
> >                 options: {dpdk-devargs="0000:c1:00.0"}
> >         Port br-int
> >             Interface br-int
> >                 type: internal
> >     Bridge br-phy
> >         datapath_type: netdev
> >         Port pf1vf0
> >             Interface pf1vf0
> >                 type: dpdk
> >                 options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
> >         Port pf1vf1
> >             Interface pf1vf1
> >                 type: dpdk
> >                 options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
> >         Port br-phy
> >             Interface br-phy
> >                 type: internal
> >         Port pf1vf3
> >             Interface pf1vf3
> >                 type: dpdk
> >                 options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
> >         Port pf1vf2
> >             Interface pf1vf2
> >                 type: dpdk
> >                 options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
> >     ovs_version: "2.17.2"
> >
> > [root@bogon ~]# ovs-ofctl dump-flows br-int
> >  cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262,
> > n_bytes=984712027272, priority=0 actions=NORMAL
> >
> >  865084 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:48.23
> > revalidator53
> >  865123 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:00.43
> > revalidator92
> >  865158 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:58.49
> > revalidator127
> >  865171 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 176:29.69
> > revalidator140
> >  865058 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:58.03
> > revalidator27
> >  865091 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 175:41.81
> > revalidator60
> >  865111 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:05.97
> > revalidator80
> >  865113 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 177:09.64
> > revalidator82
> >  865130 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:16.27
> > revalidator99
> >  865155 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:11.22
> > revalidator124
> >  865097 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 177:00.22
> > revalidator66
> >  865110 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 175:16.52
> > revalidator79
> >  865149 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:00.84
> > revalidator118
> >  865151 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:29.06
> > revalidator120
> >  865057 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 178:03.60
> > revalidator26
> >  865070 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 176:17.63
> > revalidator39
> >  865112 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 175:35.65
> > revalidator81
> >  865083 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:21.53
> > revalidator52
> >  865124 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 175:31.27
> > revalidator93
> >  865127 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:59.65
> > revalidator96
> >  865147 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:51.85
> > revalidator116
> >  865164 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 177:34.16
> > revalidator133
> >  865051 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:27.68
> > revalidator20
> >  865066 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:54.05
> > revalidator35
> >  865087 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:38.54
> > revalidator56
> >  865100 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:12.42
> > revalidator69
> >  865118 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:02.57
> > revalidator87
> >  865121 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:06.20
> > revalidator90
> >  865132 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:24.71
> > revalidator101
> >  865148 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 179:07.53
> > revalidator117
> >  865162 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:18.34
> > revalidator131
> >  865047 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:30.75
> > revalidator16
> >  865080 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 175:36.41
> > revalidator49
> >  865117 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:03.18
> > revalidator86
> >  865125 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 177:15.42
> > revalidator94
> >  865122 root      10 -10  522.9g   1.6g  42808 S  15.0   0.6 176:45.37
> > revalidator91
> >  865065 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 176:49.66
> > revalidator34
> >  865116 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 174:57.67
> > revalidator85
> >  865161 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 175:10.52
> > revalidator130
> >  865133 root      10 -10  522.9g   1.6g  42808 S  14.3   0.6 174:49.83
> > revalidator102
> >  865016 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   1:27.68
> > ovs-vswitchd
> >  865017 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:14.57
> > eal-intr-thread
> >  865020 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > bond_cmd_parse_
> >  865021 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > telemetry-v2
> >  865022 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.65
> > dpdk_watchdog1
> >  865023 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:10.16
> > urcu2
> >  865025 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:36.14
> > ct_clean3
> >  865026 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.04
> > ipf_clean4
> >  865027 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:12.28
> > hw_offload5
> >  865028 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c106/id:6
> >  865030 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c88/id:8
> >  865031 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c21/id:9
> >  865032 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c78/id:10
> >  865033 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c124/id:11
> >  865035 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00
> > pmd-c96/id:13
> >
> > Note: I found that with only one revalidator, the BUG does not occur.
> > So maybe this is a thread race condition among the revalidators?
> >
> > # type these commands
> >
> > ovs-vsctl set interface p0 mtu_request=3000
> > ovs-vsctl set interface p0 mtu_request=1000
> > ovs-vsctl set interface p0 mtu_request=2000
> > ovs-vsctl set interface p0 mtu_request=3100
> > ovs-vsctl set interface p0 mtu_request=200
> > ovs-vsctl set interface p0 mtu_request=300
> > ovs-vsctl set interface p0 mtu_request=500
> > ovs-vsctl set interface p0 mtu_request=3000
> > ovs-vsctl set interface p0 mtu_request=1500
> > ovs-vsctl set interface p0 mtu_request=1300
> > ovs-vsctl set interface p0 mtu_request=1200
> > ovs-vsctl set interface p0 mtu_request=800
> > ovs-vsctl set interface p0 mtu_request=4000
> > ovs-vsctl set interface p0 mtu_request=5000
> > ovs-vsctl set interface p0 mtu_request=600
> > ovs-vsctl set interface p0 mtu_request=2400
> > ovs-vsctl set interface p0 mtu_request=4800
> >
> > Note: if these commands are all entered at once, the BUG may happen.
> > But if they are entered one by one, waiting a while after each command,
> > the BUG will NOT happen.
> > So maybe this is a thread race condition among the revalidators?
> >
> > # The BUG happens
> >
> >
> > 2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating
> > with signal 15 (Terminated)
>
> This is not a crash or a bug.  Signal 15 is a SIGTERM.  It was sent by some
> other process to ask OVS to terminate itself.  You need to find the process
> that sends it.
>
> In case you're running OVS inside the container, the usual suspect would be
> the container termination.  Container runtimes usually send SIGTERM to the
> processes inside before stopping the container.
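
The point that the SIGTERM came from another process can be illustrated with a small standalone sketch. It is not OVS code, and the log line quoted above does not record who sent the signal. The sketch only shows that, on Linux, the kernel attaches the sender's PID to a signal delivered with kill(), and that Python's `signal.sigwaitinfo()` exposes it as `si_pid`. The helper name `wait_for_sigterm_sender` is hypothetical, and the process signals itself purely as a stand-in for the unknown external sender.

```python
import os
import signal


def wait_for_sigterm_sender():
    """Wait for a (blocked) SIGTERM and return the PID of the process that sent it."""
    info = signal.sigwaitinfo({signal.SIGTERM})
    return info.si_pid


if __name__ == "__main__":
    # Block SIGTERM so it stays pending instead of terminating this process.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})

    # Stand-in for the unknown external sender: signal ourselves.
    os.kill(os.getpid(), signal.SIGTERM)

    sender = wait_for_sigterm_sender()
    print(f"my pid: {os.getpid()}, SIGTERM sent by pid: {sender}")
```

On a running ovs-vswitchd this approach does not apply directly, since the daemon installs its own handler; there the sender would have to be found from outside the process, for example with system-level tracing or auditing.
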
>
> > # 1st, ovs-vswitchd restarts; I think this is because there are not
> > # enough hugepages?
> > 2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use
> > device '0000:c1:00.0' which is already in use by 'p0'
>
> This looks strange, I'm not sure how that can happen.
>
> > 2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration
> > (Address already in use)
> > 2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
> > # 2nd, after the restart, there are lots of these log messages.
> > # Is this caused by a thread race condition among the revalidators, where
> > # one thread adds p0 but another adds p0 again?
>
> Port additions are happening in a single thread, so there should be no
> race.
>
> >
> > But the key point is that this condition can not be recovered with
> > commands such as `ovs-vsctl del-port br-int p0` or
> > `ovs-vsctl set interface p0 mtu_request=1500`.
> > Only restarting ovs-vswitchd recovers it.
> > ```
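
The reproduction steps quoted above (issuing all the mtu_request changes back to back rather than one at a time) could be scripted roughly as follows. This is a minimal sketch: the MTU values and the interface name `p0` are copied from the quoted commands, while the helper `build_commands` and the `--run` switch are hypothetical names introduced here. Without `--run` the script only prints the commands, so it is safe to try on a machine without OVS.

```python
import shlex
import subprocess
import sys

# MTU values in the same order as the quoted reproduction commands.
MTUS = [3000, 1000, 2000, 3100, 200, 300, 500, 3000, 1500,
        1300, 1200, 800, 4000, 5000, 600, 2400, 4800]


def build_commands(iface, mtus):
    """Return one ovs-vsctl argv list per requested MTU."""
    return [["ovs-vsctl", "set", "interface", iface, f"mtu_request={mtu}"]
            for mtu in mtus]


def main(argv):
    run = "--run" in argv  # hypothetical switch: actually execute the commands
    for cmd in build_commands("p0", MTUS):
        if run:
            # No pause between commands, matching the "all at once" case.
            subprocess.run(cmd, check=False)
        else:
            print(shlex.join(cmd))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Adding a sleep between the commands would correspond to the "one by one" case, in which the problem was not observed.
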
> >
> > 2. My question
> > ```
> > - Is this a BUG that has already been fixed? If so, in which commit?
> > - How can this BUG be resolved?
> > ```
> >
> > Thanks~
> >
> > ----
> > Simon Jones
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
