On Mon, Jun 03, 2024 at 10:18:05AM GMT, Simon Jones wrote:
> In the OVS code, the bridge_reconfigure() function should ONLY be called
> from the ovs-vswitchd thread.
>
> But what about this case:
> - ovs-vswitchd is starting, and the ovs-vswitchd thread calls
> bridge_reconfigure().
> - At the same time, `ovs-vsctl set interface xxx mtu_request=4800` is sent
> to the ovs-vswitchd thread.
> - ovs-vswitchd is interrupted to process the mtu_request.

ovs-vsctl does not interrupt ovs-vswitchd. It writes into the
ovsdb-server. Changes in the content of the database, in particular
changes in the port configuration, will be picked up and processed by
ovs-vswitchd's main thread on the next run of bridge_run().

There should be no race condition here.
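To make that sequence concrete, here is a toy model in C. It is not OVS code: struct fake_db, db_write(), reconfigure() and main_loop_iteration() are hypothetical stand-ins for ovsdb-server, a committed ovs-vsctl transaction, bridge_reconfigure() and one pass of bridge_run(). It assumes, in simplified form, that the main loop compares a database sequence number with the last one it processed and, if it changed, reconfigures from the current database contents.

```c
#include <assert.h>

/* Hypothetical stand-in for the database: it keeps only the latest
 * committed mtu_request and a sequence number bumped on every commit. */
struct fake_db {
    int mtu_request;
    unsigned int seqno;
};

/* Hypothetical stand-in for the state that reconfiguration produces. */
struct fake_bridge {
    int applied_mtu;
    unsigned int last_seqno;
    int reconfigure_calls;
};

/* Models an "ovs-vsctl set interface ..." commit: it only changes the
 * database and never calls into the bridge code directly. */
static void db_write(struct fake_db *db, int mtu)
{
    db->mtu_request = mtu;
    db->seqno++;
}

/* Models reconfiguration: it always reads the current database contents. */
static void reconfigure(struct fake_bridge *br, const struct fake_db *db)
{
    br->applied_mtu = db->mtu_request;
    br->reconfigure_calls++;
}

/* Models one pass of the main loop.  Only this function, running on a
 * single thread, ever calls reconfigure(). */
static void main_loop_iteration(struct fake_bridge *br,
                                const struct fake_db *db)
{
    if (db->seqno != br->last_seqno) {
        reconfigure(br, db);
        br->last_seqno = db->seqno;
    }
    /* After every pass, the bridge has caught up with the database. */
    assert(br->last_seqno == db->seqno);
}
```

If several commits (say 3000, 1000, 4800) land before the next pass, reconfigure() runs once and applies only 4800; otherwise each commit is applied on its own pass. Either way, reconfiguration happens on one thread, one pass at a time, never concurrently.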

Maybe the port failed to get reconfigured and was left in some
partially initialized state.

Could you please enable netdev-dpdk and ofproto debug logs and attach
the full ovs-vswitchd.log?

>
> ???
>
> ----
> Simon Jones
>
>
> On Fri, May 31, 2024 at 17:43, Ilya Maximets <[email protected]> wrote:
>
> > On 5/31/24 04:00, Simon Jones wrote:
> > > Hi all,
> > >
> > > I'm using OVS-DPDK (OVS 2.17.1, DPDK 21.11.1).
> > > I have found a bug: OVS crashes after setting mtu_request and does NOT
> > > recover.
> > >
> > > 1. How to reproduce, and my analysis:
> > > ```
> > > # Start OVS, then add the bridges, ports and OpenFlow flows
> > >
> > > [root@bogon ~]# ovs-vsctl show
> > > 0444869c-dc4d-462f-8caf-074ecbab1a55
> > >     Bridge br-int
> > >         datapath_type: netdev
> > >         Port p0
> > >             Interface p0
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.0"}
> > >         Port br-int
> > >             Interface br-int
> > >                 type: internal
> > >     Bridge br-phy
> > >         datapath_type: netdev
> > >         Port pf1vf0
> > >             Interface pf1vf0
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
> > >         Port pf1vf1
> > >             Interface pf1vf1
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
> > >         Port br-phy
> > >             Interface br-phy
> > >                 type: internal
> > >         Port pf1vf3
> > >             Interface pf1vf3
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
> > >         Port pf1vf2
> > >             Interface pf1vf2
> > >                 type: dpdk
> > >                 options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
> > >     ovs_version: "2.17.2"
> > >
> > > [root@bogon ~]# ovs-ofctl dump-flows br-int
> > >  cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262, n_bytes=984712027272, priority=0 actions=NORMAL
> > >
> > >  865084 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:48.23 revalidator53
> > >  865123 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:00.43 revalidator92
> > >  865158 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:58.49 revalidator127
> > >  865171 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 176:29.69 revalidator140
> > >  865058 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:58.03 revalidator27
> > >  865091 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 175:41.81 revalidator60
> > >  865111 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:05.97 revalidator80
> > >  865113 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 177:09.64 revalidator82
> > >  865130 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:16.27 revalidator99
> > >  865155 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:11.22 revalidator124
> > >  865097 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 177:00.22 revalidator66
> > >  865110 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 175:16.52 revalidator79
> > >  865149 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:00.84 revalidator118
> > >  865151 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:29.06 revalidator120
> > >  865057 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 178:03.60 revalidator26
> > >  865070 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 176:17.63 revalidator39
> > >  865112 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 175:35.65 revalidator81
> > >  865083 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:21.53 revalidator52
> > >  865124 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 175:31.27 revalidator93
> > >  865127 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:59.65 revalidator96
> > >  865147 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:51.85 revalidator116
> > >  865164 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 177:34.16 revalidator133
> > >  865051 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:27.68 revalidator20
> > >  865066 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:54.05 revalidator35
> > >  865087 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:38.54 revalidator56
> > >  865100 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:12.42 revalidator69
> > >  865118 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:02.57 revalidator87
> > >  865121 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:06.20 revalidator90
> > >  865132 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:24.71 revalidator101
> > >  865148 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 179:07.53 revalidator117
> > >  865162 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:18.34 revalidator131
> > >  865047 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:30.75 revalidator16
> > >  865080 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 175:36.41 revalidator49
> > >  865117 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:03.18 revalidator86
> > >  865125 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 177:15.42 revalidator94
> > >  865122 root      10 -10  522.9g   1.6g  42808 S  15.0   0.6 176:45.37 revalidator91
> > >  865065 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 176:49.66 revalidator34
> > >  865116 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 174:57.67 revalidator85
> > >  865161 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 175:10.52 revalidator130
> > >  865133 root      10 -10  522.9g   1.6g  42808 S  14.3   0.6 174:49.83 revalidator102
> > >  865016 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   1:27.68 ovs-vswitchd
> > >  865017 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:14.57 eal-intr-thread
> > >  865020 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 bond_cmd_parse_
> > >  865021 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 telemetry-v2
> > >  865022 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.65 dpdk_watchdog1
> > >  865023 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:10.16 urcu2
> > >  865025 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:36.14 ct_clean3
> > >  865026 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.04 ipf_clean4
> > >  865027 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:12.28 hw_offload5
> > >  865028 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c106/id:6
> > >  865030 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c88/id:8
> > >  865031 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c21/id:9
> > >  865032 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c78/id:10
> > >  865033 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c124/id:11
> > >  865035 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c96/id:13
> > >
> > > Note: with only one revalidator thread, the bug does not occur.
> > > So maybe this is a race condition between revalidator threads?
> > >
> > > # type these commands
> > >
> > > ovs-vsctl set interface p0 mtu_request=3000
> > > ovs-vsctl set interface p0 mtu_request=1000
> > > ovs-vsctl set interface p0 mtu_request=2000
> > > ovs-vsctl set interface p0 mtu_request=3100
> > > ovs-vsctl set interface p0 mtu_request=200
> > > ovs-vsctl set interface p0 mtu_request=300
> > > ovs-vsctl set interface p0 mtu_request=500
> > > ovs-vsctl set interface p0 mtu_request=3000
> > > ovs-vsctl set interface p0 mtu_request=1500
> > > ovs-vsctl set interface p0 mtu_request=1300
> > > ovs-vsctl set interface p0 mtu_request=1200
> > > ovs-vsctl set interface p0 mtu_request=800
> > > ovs-vsctl set interface p0 mtu_request=4000
> > > ovs-vsctl set interface p0 mtu_request=5000
> > > ovs-vsctl set interface p0 mtu_request=600
> > > ovs-vsctl set interface p0 mtu_request=2400
> > > ovs-vsctl set interface p0 mtu_request=4800
> > >
> > > Note: if these commands are entered all at once, the bug may happen.
> > > But if they are entered one by one, waiting a while after each command,
> > > the bug does NOT happen.
> > > So maybe this is a race condition between revalidator threads?
> > >
> > > # The bug happens
> > >
> > > 2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating with signal 15 (Terminated)
> >
> > This is not a crash or a bug.  Signal 15 is a SIGTERM.  It was sent by some
> > other process to ask OVS to terminate itself.  You need to find the process
> > that sends it.
> >
> > In case you're running OVS inside the container, the usual suspect would be
> > the container termination.  Container runtimes usually send SIGTERM to the
> > processes inside before stopping the container.
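On finding the process that sends SIGTERM: the fatal_signal log line above only records the signal number, not the sender. As a hedged illustration (a standalone C sketch, not an OVS patch), a process can install a SA_SIGINFO handler and read si_pid and si_uid from the siginfo_t it receives. The names install_sigterm_recorder() and sender_comm() are hypothetical, and the /proc/<pid>/comm lookup is Linux-specific and best effort, since the sender may have exited by the time it runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Filled in by the handler; sig_atomic_t keeps the writes signal-safe. */
static volatile sig_atomic_t term_received;
static volatile sig_atomic_t term_sender_pid;
static volatile sig_atomic_t term_sender_uid;

/* SA_SIGINFO handler: only records who sent the signal.  Logging is left
 * to normal code, because stdio is not async-signal-safe. */
static void on_sigterm(int signo, siginfo_t *info, void *ucontext)
{
    (void) signo;
    (void) ucontext;
    term_sender_pid = (sig_atomic_t) info->si_pid;
    term_sender_uid = (sig_atomic_t) info->si_uid;
    term_received = 1;
}

/* Installs the handler; returns 0 on success, -1 on error (see errno). */
static int install_sigterm_recorder(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigterm;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Best effort (Linux only): reads the command name of 'pid' from
 * /proc/<pid>/comm.  Returns -1 if the process is already gone. */
static int sender_comm(pid_t pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;

    assert(buf != NULL && len > 1);
    snprintf(path, sizeof path, "/proc/%ld/comm", (long) pid);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (!fgets(buf, (int) len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* Strip the trailing newline. */
    return 0;
}
```

A program using this would check term_received in its main loop and then log term_sender_pid together with the name returned by sender_comm(). If the sender lives outside the receiver's PID namespace (for example, a runtime outside the container), si_pid may be reported as 0.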
> >
> > > # 1st, ovs-vswitchd restarts. I think this is because there are not
> > > # enough hugepages?
> > > 2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use
> > > device '0000:c1:00.0' which is already in use by 'p0'
> >
> > This looks strange, I'm not sure how that can happen.
>
>
> > > 2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration (Address already in use)
> > > 2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
> > > # 2nd, after the restart, this log appears many times.
> > > # Is this caused by a race condition between revalidator threads, where
> > > # one thread adds p0 and another thread adds p0 again?
> >
> > Port additions are happening in a single thread, so there should be no
> > race.
> >
> > >
> > > But the key point is that this state cannot be recovered with commands
> > > such as `ovs-vsctl del-port br-int p0` or
> > > `ovs-vsctl set interface p0 mtu_request=1500`.
> > > Only restarting ovs-vswitchd recovers it.
> > > ```
> > >
> > > 2. My questions
> > > ```
> > > - Is this a known bug that has already been fixed? If so, in which commit?
> > > - How can this bug be resolved?
> > > ```
> > >
> > > Thanks~
> > >
> > > ----
> > > Simon Jones
> >
> >
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
