On 4/4/24 18:07, Brian Haley wrote:
> Hi,
> 
> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>> Hi,
>>>
>>> I have recently been seeing issues in a large environment where the
>>> listen backlog of ovsdb-server, both NB and SB, is overflowing, for
>>> example:
>>>
>>> 17842 times the listen queue of a socket overflowed
>>> 17842 SYNs to LISTEN sockets dropped
>>
>> Does this cause significant re-connection delays or is it just an
>> observation?
> 
> It is just an observation at this point.

Ack.
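
(For reference, those two counters are the ListenOverflows / ListenDrops
entries from "netstat -s", also visible as TcpExtListenOverflows and
TcpExtListenDrops in "nstat"; and for listening sockets the Recv-Q
column of "ss -lt" shows the current accept-queue occupancy against the
Send-Q limit, so it is possible to watch how close to the limit the
servers actually get.)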

> 
>>> There are more overflows on NB than SB, but I was surprised to see any
>>> at all. My guess at the moment is that it happens when the leader
>>> changes and hundreds of nodes try to reconnect.
>>
>> This sounds a little strange.  Do you have hundreds of leader-only
>> clients for the Northbound DB?  In general, only write-heavy clients
>> actually need to be leader-only.
> 
> There are a lot of leader-only clients due to the way the neutron API 
> server runs - each worker thread has a connection, and they are scaled 
> depending on processor count, so typically there are at least 32. Then 
> multiply that by three since there is HA involved.
> 
> Actually, I had a look at a recent report and there were 61 NB/62 SB 
> connections per system, so that would make ~185 connections to each 
> server. I would think in a typical deployment there might be closer 
> to 100.
> 
>>> Looking at their sockets I can see the backlog is only set to 10:
>>>
>>> $ ss -ltm | grep 664
>>> LISTEN 0      10           0.0.0.0:6641        0.0.0.0:*
>>> LISTEN 0      10           0.0.0.0:6642        0.0.0.0:*
>>>
>>> Digging into the code, there are only two places where listen() is
>>> called, one being inet_open_passive():
>>>
>>>       /* Listen. */
>>>       if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>           error = sock_errno();
>>>           VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>           goto error;
>>>       }
>>>
>>> There is no way to configure around this, even to test whether
>>> increasing it would help in a running environment.
>>>
>>> So my question is two-fold:
>>>
>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>
>>> 2) Should this be configurable?
>>>
>>> Has anyone else seen this?
>>
>> I don't remember having any significant issues related to connection
>> timeouts, as they usually get resolved quickly.  And if the server
>> doesn't accept the connection fast enough, it means that the server is
>> busy and there may not be a real benefit from having more connections
>> in the backlog.  It may just hide the connection timeout warning while
>> the service will not actually be available for roughly the same amount
>> of time anyway.  Having a lower backlog may allow clients to re-connect
>> to a less loaded server faster.
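
(To spell out the mechanics: on Linux, the listen() backlog bounds the
queue of fully established connections waiting for accept().  When that
queue is full the kernel, by default, simply drops incoming SYNs, which
is what the "SYNs to LISTEN sockets dropped" counter records; the
client then retransmits the SYN after a timeout.  So the usual symptom
is added connection latency rather than a hard failure, which matches
the "it may just hide the warning" point above.)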
> 
> Understood, increasing the backlog might just hide the warnings and not 
> fix the issue.
> 
> I'll explain what seems to be happening, at least from looking at the 
> logs I have. All the worker threads in question are happily connected to 
> the leader. When the leader changes there is a bit of a stampede while 
> they all try to re-connect to the new leader. But since they don't know 
> which of the three (again, HA) systems is the leader, they just pick 
> one of the other two. When they don't get the leader, they disconnect 
> and try another.
> 
> It might be there is something we can do on the neutron side as well, 
> the 10 backlog just seemed like the first place to start.

I believe I heard something about adjusting the number of connections
in neutron, but I don't have any specific pointers.  Maybe Ihar knows
something about it?
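
As a data point, the OVS client-side reconnect logic (lib/reconnect.c,
and its python-ovs counterpart) already backs off exponentially between
attempts, which should take the edge off the stampede after the first
round or two.  If the neutron side ever needs its own throttling, the
usual shape is jittered exponential backoff; a generic sketch only (not
neutron or OVS code, and all the constants are arbitrary):

    #include <stdlib.h>
    #include <time.h>

    /* Generic sketch: sleep base * 2^attempt milliseconds, capped,
     * with +/-50% random jitter so clients don't retry in lockstep. */
    static void
    backoff_sleep(int attempt)
    {
        long base_ms = 100, max_ms = 8000;
        long delay = base_ms << (attempt < 7 ? attempt : 7);
        if (delay > max_ms) {
            delay = max_ms;
        }
        /* Uniformly pick from roughly [delay/2, delay * 3/2). */
        delay = delay / 2 + rand() % delay;
        struct timespec ts = { delay / 1000, (delay % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }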

> 
>> That said, the original code clearly wasn't designed for a high
>> number of simultaneous connection attempts, so it makes sense to
>> increase the backlog to some higher value.  I see Ihar re-posted his
>> patch doing that here:
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>> I'll take a look at it.
> 
> Thanks, I plan on testing that as well.
> 
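
For reference, the mechanical part of such a change is tiny.  A minimal
sketch of the shape it could take (this is not Ihar's actual patch, see
the link above for that; the OVS_LISTEN_BACKLOG environment variable
and the default of 64 are purely my own illustration):

    /* Sketch only: let the backlog be overridden at runtime via a
     * hypothetical OVS_LISTEN_BACKLOG environment variable, with a
     * larger compile-time default than today's hard-coded 10. */
    #include <stdlib.h>

    #define DEFAULT_LISTEN_BACKLOG 64

    static int
    listen_backlog(void)
    {
        const char *s = getenv("OVS_LISTEN_BACKLOG");
        int n = s ? atoi(s) : 0;
        return n > 0 ? n : DEFAULT_LISTEN_BACKLOG;
    }

    /* ... then in inet_open_passive(): */
    if (style == SOCK_STREAM && listen(fd, listen_backlog()) < 0) {
        error = sock_errno();
        VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
        goto error;
    }

Note that whatever value ends up being passed to listen() is silently
capped by the kernel at net.core.somaxconn, so that sysctl would need
raising too before very large values have any effect.
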
>> One other thing that we could do is to accept more connections at a time.
>> Currently we accept one connection per event loop iteration.  But we
>> need to be careful here, as handling multiple initial monitor requests
>> for the database within a single iteration may be costly and may reduce
>> the overall responsiveness of the server.  Needs some research.
>>
>> Having hundreds of leader-only clients for the NB still sounds a little
>> strange to me though.
> 
> There might be a better way, or I might be misunderstanding as well. We 
> actually have some meetings next week and I can add this as a discussion 
> topic.

I believe newer versions of Neutron moved away from leader-only connections
in most places, at least on the SB side:
  https://review.opendev.org/c/openstack/neutron/+/803268
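
Regarding the idea above of accepting more connections per event loop
iteration: the usual shape is a capped accept loop, so one iteration
can soak up a reconnection burst without starving other work.  A
generic sketch only (not actual ovsdb-server code; the cap of 8 is an
arbitrary number for illustration):

    #include <sys/socket.h>

    #define MAX_ACCEPTS_PER_ITERATION 8

    /* Drain up to a fixed number of pending connections from a
     * non-blocking listening socket in one pass. */
    static void
    accept_pending(int listen_fd)
    {
        for (int i = 0; i < MAX_ACCEPTS_PER_ITERATION; i++) {
            int fd = accept(listen_fd, NULL, NULL);
            if (fd < 0) {
                /* EAGAIN/EWOULDBLOCK: backlog drained; other errors
                 * are left to the caller's normal handling. */
                break;
            }
            /* ... hand 'fd' off to a new client session here ... */
        }
    }

The interesting part, as noted above, is not the loop itself but
whether servicing several initial monitor requests in one iteration
hurts latency for everyone else; the cap is what bounds that cost.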

Best regards, Ilya Maximets.