Hi OVS-team,

We had just met a problem in our customer environments, Neutron HA router 
active/standby had switched after high ovs-vswitchd cpu usage.

From the ovs-vswitchd log:
2021-07-22T23:35:00.282Z|420006320|bridge|INFO|bridge br-int: deleted interface 
tapb3eee529-02 on port 1017
2021-07-22T23:35:00.318Z|420006321|bridge|INFO|bridge br-int: added interface 
tapc257bed3-fc on port 1024
2021-07-22T23:35:00.366Z|420006322|bridge|INFO|bridge br-int: added interface 
tap7c1963c9-32 on port 934
2021-07-22T23:35:00.403Z|420006323|bridge|INFO|bridge br-int: added interface 
tap5de0f4ed-70 on port 1094
2021-07-22T23:35:00.441Z|420006324|bridge|INFO|bridge br-int: added interface 
tap76421ff2-e1 on port 990
2021-07-22T23:35:00.487Z|420006325|bridge|INFO|bridge br-int: added interface 
tap71a4dae8-75 on port 1084
2021-07-22T23:35:00.516Z|420006326|bridge|INFO|bridge br-int: added interface 
tapd3177fe4-19 on port 956
2021-07-22T23:35:00.558Z|420006327|bridge|INFO|bridge br-int: added interface 
tapb3eee529-02 on port 1017
2021-07-22T23:35:00.880Z|420006328|bridge|INFO|bridge br-int: deleted interface 
tap64c1f4fe-8b on port 1024
2021-07-22T23:35:00.881Z|420006329|bridge|INFO|bridge br-int: deleted interface 
tap29b6ec20-65 on port 990
2021-07-22T23:35:00.882Z|420006330|bridge|INFO|bridge br-int: deleted interface 
tap7f5e65fd-59 on port 1084
2021-07-22T23:35:00.883Z|420006331|bridge|INFO|bridge br-int: deleted interface 
tape3046139-8c on port 934
2021-07-22T23:35:00.884Z|420006332|bridge|INFO|bridge br-int: deleted interface 
tap9eba01c2-54 on port 1017
2021-07-22T23:35:00.885Z|420006333|bridge|INFO|bridge br-int: deleted interface 
tap26652576-4e on port 1094
2021-07-22T23:35:00.885Z|420006334|bridge|INFO|bridge br-int: deleted interface 
tap7799343f-22 on port 956
2021-07-22T23:35:00.983Z|420006335|bridge|INFO|bridge br-int: added interface 
tap64c1f4fe-8b on port 1024
2021-07-22T23:35:01.056Z|420006336|bridge|INFO|bridge br-int: added interface 
tap29b6ec20-65 on port 990
2021-07-22T23:35:01.101Z|420006337|bridge|INFO|bridge br-int: added interface 
tap7f5e65fd-59 on port 1084
2021-07-22T23:35:01.206Z|420006338|bridge|INFO|bridge br-int: added interface 
tape3046139-8c on port 934
2021-07-22T23:35:01.297Z|420006339|bridge|INFO|bridge br-int: added interface 
tap9eba01c2-54 on port 1017
2021-07-22T23:35:01.657Z|420006340|bridge|INFO|bridge br-int: added interface 
tap26652576-4e on port 1094
2021-07-22T23:35:01.733Z|02155|ofproto_dpif_upcall(handler228)|WARN|Dropped 
9793 log messages in last 62 seconds (most recently, 2 seconds ago) due to 
excessive rate
2021-07-22T23:35:01.733Z|02156|ofproto_dpif_upcall(handler228)|WARN|upcall: 
datapath flow limit reached
2021-07-22T23:35:02.395Z|420006341|bridge|INFO|bridge br-int: added interface 
tap7799343f-22 on port 956
2021-07-22T23:35:03.363Z|00328|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:03.398Z|420006342|ovs_rcu|WARN|blocked 1000 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:03.413Z|138790|timeval(handler196)|WARN|Unreasonably long 
1049ms poll interval (0ms user, 763ms system)
2021-07-22T23:35:03.413Z|138791|timeval(handler196)|WARN|context switches: 236 
voluntary, 107 involuntary
2021-07-22T23:35:03.413Z|138792|coverage(handler196)|INFO|Skipping details of 
duplicate event coverage for hash=ec24845d
2021-07-22T23:35:05.222Z|420006343|ovs_rcu|WARN|blocked 1000 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:05.224Z|00329|ovs_rcu(urcu9)|WARN|blocked 1001 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:05.492Z|138793|timeval(handler196)|WARN|Unreasonably long 
1272ms poll interval (0ms user, 982ms system)
2021-07-22T23:35:05.492Z|138794|timeval(handler196)|WARN|context switches: 240 
voluntary, 104 involuntary
2021-07-22T23:35:05.492Z|138795|coverage(handler196)|INFO|Skipping details of 
duplicate event coverage for hash=ec24845d
2021-07-22T23:35:05.492Z|138796|poll_loop(handler196)|INFO|Dropped 1088 log 
messages in last 6 seconds (most recently, 4 seconds ago) due to excessive rate
2021-07-22T23:35:05.492Z|138797|poll_loop(handler196)|INFO|wakeup due to 0-ms 
timeout at ofproto/ofproto-dpif-upcall.c:752 (79% CPU usage)
2021-07-22T23:35:09.472Z|138798|poll_loop(handler196)|INFO|Dropped 6 log 
messages in last 4 seconds (most recently, 1 seconds ago) due to excessive rate
2021-07-22T23:35:09.473Z|138799|poll_loop(handler196)|INFO|wakeup due to 0-ms 
timeout at ofproto/ofproto-dpif-upcall.c:752 (78% CPU usage)
2021-07-22T23:35:09.473Z|138800|dpif_netlink(handler196)|WARN|system@ovs-system:
 lost packet on port channel 44 of handler 2
2021-07-22T23:35:09.501Z|420006344|bridge|INFO|bridge br-int: deleted interface 
tapc257bed3-fc on port 1024
2021-07-22T23:35:10.473Z|00330|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:10.501Z|420006345|ovs_rcu|WARN|blocked 1000 ms waiting for 
handler196 to quiesce
2021-07-22T23:35:11.473Z|00331|ovs_rcu(urcu9)|WARN|blocked 2000 ms waiting for 
handler196 to quiesce

In the time 23:35:05(UTC time), the handler196 thread cpu usage has high value 
79%. Before that time the log displays blocked handler threads and involuntary 
context switches.
Also there is strange behavior, different tap devices added or deleted on the 
same ovs ofport, the same tap device occurs 50+ times in 1 minutes:
bridge br-int: added interface tapc257bed3-fc on port 1024
bridge br-int: deleted interface tap64c1f4fe-8b on port 1024

It’s no cule to find why the tap device continually added and deleted on the 
same ovs ofport, it seems had lasted for more than one month.


And then we observed in /var/log/messages, the keepalived VRRP switched:
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66825]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55727]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55735]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55709]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66784]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[68081]: 
VRRP_Instance(VR_192) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) 
Received advert with higher priority 50, ours 50
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) 
Entering BACKUP STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[53128]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64102]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[72616]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66712]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[70117]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) 
Received advert with higher priority 50, ours 50
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) 
Entering BACKUP STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[63906]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[58249]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[186176]: 
VRRP_Instance(VR_220) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[67535]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[58302]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[61177]: VRRP_Instance(VR_1) 
Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[72059]: VRRP_Instance(VR_1) 
Transition to MASTER STATE


After keepalived VRRP switched, there are losts of lost packet and flow limit 
reached logs:
|WARN|system@ovs-system: lost packet on port channel
|WARN|upcall: datapath flow limit reached 




_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to