On 4/3/24 22:15, Brian Haley via discuss wrote:
> Hi,
> 
> I recently have been seeing issues in a large environment where the 
> listen backlog of ovsdb-server, both NB and SB, is overflowing,
> for example:
> 
> 17842 times the listen queue of a socket overflowed
> 17842 SYNs to LISTEN sockets dropped

Does this cause significant re-connection delays or is it just an
observation?

> 
> There are more on NB than SB, but I was surprised to see any. I can
> only guess at the moment that it is happening when the leader changes
> and hundreds of nodes try to reconnect.

This sounds a little strange.  Do you have hundreds of leader-only
clients for the Northbound DB?  In general, only write-heavy clients
actually need to be leader-only.
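
To be clear, leader-only is a per-client choice.  With the C IDL, a
read-mostly client can drop that requirement explicitly; a minimal
sketch (the remote address is a placeholder, and nbrec_idl_class is
OVN's generated Northbound class, used here purely for illustration):

    #include <stdbool.h>
    #include "ovsdb-idl.h"        /* OVS C IDL. */
    #include "lib/ovn-nb-idl.h"   /* OVN-generated NB schema bindings. */

    /* Hypothetical read-mostly client: connect to the NB database, but
     * allow the session to land on any cluster member, not only the
     * leader. */
    struct ovsdb_idl *idl
        = ovsdb_idl_create("tcp:127.0.0.1:6641", &nbrec_idl_class,
                           true, true);
    ovsdb_idl_set_leader_only(idl, false);  /* The default is true. */

Clients that are not leader-only can also stay connected across a
leadership change instead of all reconnecting to the new leader at once.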

> 
> Looking at their sockets I can see the backlog is only set to 10:
> 
> $ ss -ltm | grep 664
> LISTEN 0      10           0.0.0.0:6641        0.0.0.0:*
> LISTEN 0      10           0.0.0.0:6642        0.0.0.0:*
> 
> Digging into the code, there are only two places where listen() is
> called, one being inet_open_passive():
> 
>      /* Listen. */
>      if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>          error = sock_errno();
>          VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>          goto error;
>      }
> 
> There is no way to configure around this to even test whether
> increasing it would help in a running environment.
> 
> So my question is two-fold:
> 
> 1) Should this be increased? 128, 256, 1024? I can send a patch.
> 
> 2) Should this be configurable?
> 
> Has anyone else seen this?

I don't remember having any significant issues related to connection
timeouts, as they usually get resolved quickly.  And if the server
doesn't accept connections fast enough, it means that the server is
busy, and there may not be a real benefit from having more connections
in the backlog.  It may just hide the connection timeout warning while
the service remains unavailable for roughly the same amount of time
anyway.  Having a lower backlog may allow clients to re-connect to a
less loaded server faster.

That said, the original code clearly wasn't designed for a high
number of simultaneous connection attempts, so it makes sense to
increase the backlog to some higher value.  I see Ihar re-posted his
patch doing that here:
  https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
I'll take a look at it.
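
For reference, making the value tunable would not need much code.  A
rough sketch below, where the OVS_LISTEN_BACKLOG environment variable
and its default are made up for illustration (the actual patch may take
a different approach):

    #include <stdlib.h>

    /* Hypothetical helper for lib/socket-util.c: let an environment
     * variable override the hard-coded backlog. */
    static int
    listen_backlog(void)
    {
        const char *s = getenv("OVS_LISTEN_BACKLOG");
        int n = s ? atoi(s) : 0;

        return n > 0 ? n : 64;    /* Fall back to a larger default. */
    }

    /* Listen. */
    if (style == SOCK_STREAM && listen(fd, listen_backlog()) < 0) {
        ...
    }

Note that whatever value ends up in listen() is still capped by the
kernel's net.core.somaxconn.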

One other thing we could do is accept more connections at a time.
Currently, we accept one connection per event loop iteration.  But we
need to be careful here, as handling multiple initial monitor requests
for the database within a single iteration may be costly and may reduce
the overall responsiveness of the server.  Needs some research.
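
To make that concrete, the change would roughly take the shape below
(a sketch only; MAX_ACCEPTS_PER_RUN is a made-up name, 'listener' stands
for the server's struct pstream, and the real accept path in
ovsdb/jsonrpc-server.c has more error handling):

    /* Accept up to a small, fixed number of pending connections per
     * poll-loop iteration instead of only one. */
    #define MAX_ACCEPTS_PER_RUN 8

    for (int i = 0; i < MAX_ACCEPTS_PER_RUN; i++) {
        struct stream *stream;
        int error = pstream_accept(listener, &stream);

        if (error == EAGAIN) {
            break;            /* No more pending connections. */
        } else if (error) {
            break;            /* Same error handling as today. */
        }
        /* Create the jsonrpc session for 'stream' as usual. */
    }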

Having hundreds of leader-only clients for the Northbound DB still
sounds a little strange to me, though.

Best regards, Ilya Maximets.