On Mon, Jun 03, 2024 at 10:18:05AM GMT, Simon Jones wrote:
> In ovs code, bridge_reconfigure function should ONLY be called in
> ovs-vswitchd thread.
>
> But how about this case:
> - ovs-vswitchd is starting, the ovs-vswitch thread calls bridge_reconfigure
> function.
> - at the same time, ovs-vsctl set port xxx request_mtu=4800 comes to
> ovs-vswitchd thread.
> - the ovs-vswitchd interrupt to process request_mtu.

ovs-vsctl does not interrupt ovs-vswitchd. It writes into the ovsdb-server.
Changes in the content of the database, in particular changes in the port
configuration, will be picked up and processed by ovs-vswitchd's main thread
on the next run of bridge_run(). There should be no race condition here.

Maybe the port failed to get reconfigured and was left in some
partially-initialized state. Could you please enable netdev-dpdk and ofproto
debug logs and attach the full ovs-vswitchd.log?

>
> ???
>
> ----
> Simon Jones
>
>
> Ilya Maximets <[email protected]> wrote on Fri, May 31, 2024 at 17:43:
>
> > On 5/31/24 04:00, Simon Jones wrote:
> > > Hi all,
> > >
> > > I'm using ovs-dpdk(ovs:2.17.1, dpdk:21.11.1).
> > > Now I found a BUG that ovs crash and could NOT fix again after set
> > > request_mtu.
> > >
> > > 1. How to reproduce and my Analysis:
> > > ```
> > > # start ovs and add bridge and port and openflow
> > >
> > > [root@bogon ~]# ovs-vsctl show
> > > 0444869c-dc4d-462f-8caf-074ecbab1a55
> > >     Bridge br-int
> > >         datapath_type: netdev
> > >         Port p0
> > >             Interface p0
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.0"}
> > >         Port br-int
> > >             Interface br-int
> > >                 type: internal
> > >     Bridge br-phy
> > >         datapath_type: netdev
> > >         Port pf1vf0
> > >             Interface pf1vf0
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
> > >         Port pf1vf1
> > >             Interface pf1vf1
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
> > >         Port br-phy
> > >             Interface br-phy
> > >                 type: internal
> > >         Port pf1vf3
> > >             Interface pf1vf3
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
> > >         Port pf1vf2
> > >             Interface pf1vf2
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
> > >     ovs_version: "2.17.2"
> > >
> > > [root@bogon ~]# ovs-ofctl dump-flows br-int
> > >  cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262, n_bytes=984712027272, priority=0 actions=NORMAL
> > >
> > >
> > >  865084 root  10 -10  522.9g  1.6g  42808 S  17.3  0.6 175:48.23 revalidator53
> > >  865123 root  10 -10  522.9g  1.6g  42808 S  17.3  0.6 175:00.43 revalidator92
> > >  865158 root  10 -10  522.9g  1.6g  42808 S  17.3  0.6 175:58.49 revalidator127
> > >  865171 root  10 -10  522.9g  1.6g  42808 S  17.3  0.6 176:29.69 revalidator140
> > >  865058 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 176:58.03 revalidator27
> > >  865091 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 175:41.81 revalidator60
> > >  865111 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 176:05.97 revalidator80
> > >  865113 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 177:09.64 revalidator82
> > >  865130 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 176:16.27 revalidator99
> > >  865155 root  10 -10  522.9g  1.6g  42808 S  16.9  0.6 176:11.22 revalidator124
> > >  865097 root  10 -10  522.9g  1.6g  42808 S  16.6  0.6 177:00.22 revalidator66
> > >  865110 root  10 -10  522.9g  1.6g  42808 S  16.6  0.6 175:16.52 revalidator79
> > >  865149 root  10 -10  522.9g  1.6g  42808 S  16.6  0.6 176:00.84 revalidator118
> > >  865151 root  10 -10  522.9g  1.6g  42808 S  16.6  0.6 176:29.06 revalidator120
> > >  865057 root  10 -10  522.9g  1.6g  42808 S  16.3  0.6 178:03.60 revalidator26
> > >  865070 root  10 -10  522.9g  1.6g  42808 S  16.3  0.6 176:17.63 revalidator39
> > >  865112 root  10 -10  522.9g  1.6g  42808 S  16.3  0.6 175:35.65 revalidator81
> > >  865083 root  10 -10  522.9g  1.6g  42808 S  15.9  0.6 176:21.53 revalidator52
> > >  865124 root  10 -10  522.9g  1.6g  42808 S  15.9  0.6 175:31.27 revalidator93
> > >  865127 root  10 -10  522.9g  1.6g  42808 S  15.9  0.6 176:59.65 revalidator96
> > >  865147 root  10 -10  522.9g  1.6g  42808 S  15.9  0.6 176:51.85 revalidator116
> > >  865164 root  10 -10  522.9g  1.6g  42808 S  15.9  0.6 177:34.16 revalidator133
> > >  865051 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 175:27.68 revalidator20
> > >  865066 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 175:54.05 revalidator35
> > >  865087 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 175:38.54 revalidator56
> > >  865100 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 177:12.42 revalidator69
> > >  865118 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 176:02.57 revalidator87
> > >  865121 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 176:06.20 revalidator90
> > >  865132 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 177:24.71 revalidator101
> > >  865148 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 179:07.53 revalidator117
> > >  865162 root  10 -10  522.9g  1.6g  42808 S  15.6  0.6 177:18.34 revalidator131
> > >  865047 root  10 -10  522.9g  1.6g  42808 S  15.3  0.6 176:30.75 revalidator16
> > >  865080 root  10 -10  522.9g  1.6g  42808 S  15.3  0.6 175:36.41 revalidator49
> > >  865117 root  10 -10  522.9g  1.6g  42808 S  15.3  0.6 176:03.18 revalidator86
> > >  865125 root  10 -10  522.9g  1.6g  42808 S  15.3  0.6 177:15.42 revalidator94
> > >  865122 root  10 -10  522.9g  1.6g  42808 S  15.0  0.6 176:45.37 revalidator91
> > >  865065 root  10 -10  522.9g  1.6g  42808 S  14.6  0.6 176:49.66 revalidator34
> > >  865116 root  10 -10  522.9g  1.6g  42808 S  14.6  0.6 174:57.67 revalidator85
> > >  865161 root  10 -10  522.9g  1.6g  42808 S  14.6  0.6 175:10.52 revalidator130
> > >  865133 root  10 -10  522.9g  1.6g  42808 S  14.3  0.6 174:49.83 revalidator102
> > >  865016 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   1:27.68 ovs-vswitchd
> > >  865017 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:14.57 eal-intr-thread
> > >  865020 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 bond_cmd_parse_
> > >  865021 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 telemetry-v2
> > >  865022 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.65 dpdk_watchdog1
> > >  865023 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:10.16 urcu2
> > >  865025 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:36.14 ct_clean3
> > >  865026 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.04 ipf_clean4
> > >  865027 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:12.28 hw_offload5
> > >  865028 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c106/id:6
> > >  865030 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c88/id:8
> > >  865031 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c21/id:9
> > >  865032 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c78/id:10
> > >  865033 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c124/id:11
> > >  865035 root  10 -10  522.9g  1.6g  42808 S   0.0  0.6   0:00.00 pmd-c96/id:13
> > >
> > > Notice here, I found that if only one revalidator, there is no BUG.
> > > So maybe thread race-condition of revalidator?
> > >
> > > # type these commands
> > >
> > > ovs-vsctl set interface p0 mtu_request=3000
> > > ovs-vsctl set interface p0 mtu_request=1000
> > > ovs-vsctl set interface p0 mtu_request=2000
> > > ovs-vsctl set interface p0 mtu_request=3100
> > > ovs-vsctl set interface p0 mtu_request=200
> > > ovs-vsctl set interface p0 mtu_request=300
> > > ovs-vsctl set interface p0 mtu_request=500
> > > ovs-vsctl set interface p0 mtu_request=3000
> > > ovs-vsctl set interface p0 mtu_request=1500
> > > ovs-vsctl set interface p0 mtu_request=1300
> > > ovs-vsctl set interface p0 mtu_request=1200
> > > ovs-vsctl set interface p0 mtu_request=800
> > > ovs-vsctl set interface p0 mtu_request=4000
> > > ovs-vsctl set interface p0 mtu_request=5000
> > > ovs-vsctl set interface p0 mtu_request=600
> > > ovs-vsctl set interface p0 mtu_request=2400
> > > ovs-vsctl set interface p0 mtu_request=4800
> > >
> > > Notice, type these commands at one time, the BUG may happen.
> > > But if type commands one by one, which type one command and wait for a
> > > time, the BUG will NOT happen.
> > > So maybe thread race-condition revalidator?
> > >
> > > # BUG happen
> > >
> > > 2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating with signal 15 (Terminated)
> >
> > This is not a crash or a bug. Signal 15 is a SIGTERM. It was sent by some
> > other process to ask OVS to terminate itself. You need to find the process
> > that sends it.
> >
> > In case you're running OVS inside the container, the usual suspect would be
> > the container termination. Container runtimes usually send SIGTERM to the
> > processes inside before stopping the container.
> >
> > > # 1st, ovs-vswitch restart, I think this is because hugepage is not enough?
> > > 2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use device '0000:c1:00.0' which is already in use by 'p0'
> >
> > This looks strange, I'm not sure how that can happen.
> >
> > > 2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration (Address already in use)
> > > 2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
> > > # 2nd, after restart, lots of this log.
> > > # Is this caused by thread race-condition of revalidator? Which one thread
> > > add p0, but another add p0 again?
> >
> > Port additions are happening in a single thread, so there should be no
> > race.
> >
> > >
> > > But the key is, this condition could not recover by such as `ovs-vsctl
> > > del-port br-int p0` or `ovs-vsctl set interface p0 mtu_request=1500`.
> > > Only restart ovs-vswitch could recover.
> > > ```
> > >
> > > 2. My question
> > > ```
> > > - Is this a BUG which has already been resolved? If it is, which commit?
> > > - How to resolve this BUG?
> > > ```
> > >
> > > Thanks~
> > >
> > > ----
> > > Simon Jones
> > >
> >
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
