Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work

Jake Yip via discuss Thu, 16 Mar 2023 15:06:37 -0700

Hi all,

Apologies for jumping into this thread. We are seeing the same and it'snice to find someone with similar issues :)


On 8/3/2023 3:43 am, Ilya Maximets via discuss wrote:


We see failures on the OVSDB Relay side:

2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to 
inactivity probe after 5 seconds, disconnecting
2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped
2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected
2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response to 
inactivity probe after 5 seconds, disconnecting
2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection dropped
2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected
2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response to 
inactivity probe after 5 seconds, disconnecting
2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection dropped
2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected
2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response to 
inactivity probe after 5 seconds, disconnecting
2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection dropped
2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected

On the DB cluster this looks like:

2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL 
connection close
2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection dropped 
(Protocol error)


OK.  These are symptoms.  The cause must be something like
'Unreasonably long MANY ms poll interval' on the DB cluster side.
i.e. the reason why the main DB cluster didn't reply to the
probes sent from the relay.  Because as soon as server receives
the probe, it replies right back.  If it didn't reply, it was
doing something else for an extended period of time.  "MANY" is
more than 5 seconds.


We are seeing the same issue here after moving to OVN relay.

- On the relay "no response to inactivity probe after 5 seconds"
- On the OVSDB cluster
  - "Unreasonably long 1726ms poll interval"
  - "connection dropped (Input/output error)"
  - "SSL_write: system error (Broken pipe)"
  - 100% CPU on northd process

Is there anything we could look for on the OVSDB side to narrow downwhat may be causing the load on the cluster side?

A brief history - We are migrating an OpenStack cloud from MidoNet toOVN. This cloud has roughly


- 400 neutron networks / ovn logical switches
- 300 neutron routers
- 14000 neutron ports / ovn logical switchports
- 28000 neutron security groups / ovn port group
- 80000 neutron secgroup rules / acl

We populated the OVN DB by using OpenStack/Neutron ovn sync script.

We have attempted the migration twice previously (2021, 2022) but faileddue to load issues. We've reported issues and have seen lots ofperformance improvements over the last two years. Here is a BIG thankyou to the dev teams!


We are now on the following versions

- OVS 2.17
- OVN 22.03

We are exploring upgrade as an option, but I am concerned if there'ssomething fundamentally wrong with the data / config we have that iscausing the high load, and would like to rule that out first. Please letme know if you need more information, will be happy to start a newthread too.


Regards,
Jake
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work

Reply via email to