And as a follow-up: I sometimes see the following error messages in the ovn-controller log, even though a much higher inactivity probe interval (in seconds) is configured for the chassis:
ovs|04426|rconn|ERR|unix:/run/openvswitch/br-int.mgmt: no response to inactivity probe after 8 seconds, disconnecting
ovs|26560|rconn|ERR|unix:/run/openvswitch/br-int.mgmt: no response to inactivity probe after 10 seconds, disconnecting

Server with the first log message:
# ovs-vsctl get open . external_ids:ovn-openflow-probe-interval
"30"

Server with the second log message:
# ovs-vsctl get open . external_ids:ovn-openflow-probe-interval
"60"
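For reference, the value was raised via the usual external_ids knob on the local Open_vSwitch record; as far as I can tell ovn-controller re-reads this setting at runtime, but please correct me if a restart is actually required — that could be one explanation for the mismatch:

# ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=60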
> On 30 Mar 2023, at 23:19, Vladislav Odintsov <[email protected]> wrote:
>
> Hi all,
>
> I'm facing a false-positive failover when the OpenFlow connection gets dropped after an inactivity probe timeout, caused by ovn-controller being under high load while handling a huge amount of OVSDB updates.
> I wonder whether this is a bug or whether I have to tune something specific. See the ovn-controller logs:
>
> 2023-03-30T07:57:24.469Z|58440|inc_proc_eng|INFO|node: logical_flow_output, recompute (failed handler for input port_groups) took 35774ms
> 2023-03-30T07:57:28.521Z|58441|lflow_cache|INFO|Detected cache inactivity (last active 40006 ms ago): trimming cache
> 2023-03-30T07:57:28.528Z|58442|timeval|WARN|Unreasonably long 40084ms poll interval (32947ms user, 6968ms system)
> 2023-03-30T07:57:28.528Z|58443|timeval|WARN|faults: 43468 minor, 0 major
> …..
> 2023-03-30T07:57:48.784Z|58496|reconnect|ERR|ssl:xxxx:6642: no response to inactivity probe after 60.2 seconds, disconnecting
> 2023-03-30T07:57:48.784Z|58497|reconnect|INFO|ssl:xxxx:6642: connection dropped
> 2023-03-30T07:57:48.905Z|58498|main|INFO|OVNSB commit failed, force recompute next time.
> 2023-03-30T07:57:49.786Z|58499|reconnect|INFO|ssl:yyyy:6642: connecting...
> 2023-03-30T07:57:49.924Z|58500|reconnect|INFO|ssl:yyyy:6642: connected
> 2023-03-30T07:57:29.781Z|58501|poll_loop|INFO|wakeup due to 0-ms timeout at lib/stream-ssl.c:838 (74% CPU usage)
> 2023-03-30T07:57:29.836Z|58502|poll_loop|INFO|wakeup due to 0-ms timeout at lib/stream-ssl.c:838 (74% CPU usage)
> ….
> 2023-03-30T07:57:05.976Z|58545|poll_loop|INFO|wakeup due to 0-ms timeout at lib/stream-ssl.c:838 (101% CPU usage)
> 2023-03-30T07:57:07.002Z|58546|memory|INFO|peak resident set size grew 54% in last 682257.2 seconds, from 4435696 kB to 6824820 kB
> 2023-03-30T07:57:07.002Z|58547|memory|INFO|idl-cells-OVN_Southbound:11491359 idl-cells-Open_vSwitch:14416 lflow-cache-entries-cache-expr:530298 lflow-cache-entries-cache-matches:31770 lflow-cache-size-KB:769553 local_datapath_usage-KB:630 ofctrl_desired_flow_usage-KB:577052 ofctrl_installed_flow_usage-KB:409538 ofctrl_sb_flow_ref_usage-KB:273707
> …
> 2023-03-30T07:57:31.667Z|58552|rconn|WARN|unix:/run/openvswitch/br-int.mgmt: connection dropped (Broken pipe)
> 2023-03-30T07:57:31.739Z|58553|binding|INFO|Releasing lport cr-xxx from this chassis (sb_readonly=0)
> 2023-03-30T07:57:31.739Z|58554|if_status|WARN|Dropped 1 log messages in last 1254725 seconds (most recently, 1254725 seconds ago) due to excessive rate
> 2023-03-30T07:57:31.739Z|58555|if_status|WARN|Trying to release unknown interface cr-xxx
> 2023-03-30T07:57:31.743Z|58556|binding|INFO|Releasing lport cr-yyy from this chassis (sb_readonly=0)
> 2023-03-30T07:57:31.743Z|58557|binding|INFO|Releasing lport cr-zzz from this chassis (sb_readonly=0)
> …
>
> After some time these ports are claimed back:
>
> 2023-03-30T07:57:49.320Z|59137|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
> 2023-03-30T07:57:49.320Z|59138|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected
> 2023-03-30T07:57:49.523Z|59139|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:4296 (99% CPU usage)
> 2023-03-30T07:57:49.574Z|59140|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:4286 (99% CPU usage)
> 2023-03-30T07:58:01.451Z|59141|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (xxxx:34736<->yyyy:6642) at lib/stream-ssl.c:836 (99% CPU usage)
> 2023-03-30T07:58:02.449Z|59142|binding|INFO|Claiming lport cr-xxx for this chassis.
> 2023-03-30T07:58:02.449Z|59143|binding|INFO|cr-xxx: Claiming xx:xx:xx:xx:xx x.x.x.x/xx
> 2023-03-30T07:58:02.453Z|59144|binding|INFO|Claiming lport cr-yyy for this chassis.
> 2023-03-30T07:58:02.453Z|59145|binding|INFO|cr-yyy: Claiming yy:yy:yy:yy:yy:yy y.y.y.y/yy
> …
>
> Previously I ran into a similar problem when the OpenFlow probe interval was 30 seconds, and I raised it to 60 and then to 120 seconds.
> But it seems that OVS itself was left at the default inactivity probe timeout for OpenFlow — 60 seconds — so the error above occurred. In the OVS logs it looks like this:
>
> 2023-03-30T07:57:28.632Z|61003|rconn|ERR|br-int<->unix#8: no response to inactivity probe after 60 seconds, disconnecting
>
> This node is a dedicated l3gateway which has the is-interconn=true setting (interconnection case) and a rather big OpenFlow table of around 2.7M flows.
>
> So, I've got some questions for OVN and OVS:
> 1. I couldn't find where in the ovn-controller code an OpenFlow connection drop is handled. Is it a bug that the claimed cr-ports are released after the OpenFlow connection drops? I guess yes. Can you please point me to the correct place in the code to start investigating?
> 2. I see almost the same amount of OpenFlow flows on each l3gateway with the interconnection role. Even l3gateways which have no allocated cr-port-bindings have 2M+ OpenFlow flows. Is this normal operation, or is there room for optimisation? It seems that excess flows are installed.
> 3. How should long OVS OpenFlow updates be handled, given that they block the inactivity probe on the OpenFlow connection until it gets dropped? From ovn-controller it's possible to configure the inactivity probe; is it possible, and should I do it, from the ovs-vswitchd side as well?
> 4. There is a warning about trying to release an unknown interface which was released right before. Is this a normal situation, or should it also be investigated?
>
> Thanks.
>
> Regards,
> Vladislav Odintsov

Regards,
Vladislav Odintsov
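P.S. Regarding question 2: in case somebody wants to compare numbers on their own gateways, I count the flows with plain ovs-ofctl against br-int. dump-aggregate and dump-tables stay cheap even with millions of flows, while dump-flows at this scale is painful, so I avoid it:

# ovs-ofctl dump-aggregate br-int
# ovs-ofctl dump-tables br-int

The first reports the total flow_count; the second shows the active flow count per OpenFlow table, which makes it easier to see where the 2M+ flows on the gateways without cr-port bindings actually live.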
