On 4/4/24 18:07, Brian Haley wrote:
> Hi,
>
> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>> Hi,
>>>
>>> I recently have been seeing issues in a large environment where the
>>> listen backlog of ovsdb-server, both NB and SB, is getting
>>> overflowed, for example:
>>>
>>>     17842 times the listen queue of a socket overflowed
>>>     17842 SYNs to LISTEN sockets dropped
>>
>> Does this cause significant re-connection delays or is it just an
>> observation?
>
> It is just an observation at this point.

Ack.

>>> There are more on NB than SB, but I was surprised to see any. I can
>>> only guess at the moment that it is happening when the leader changes
>>> and hundreds of nodes try to reconnect.
>>
>> This sounds a little strange. Do you have hundreds of leader-only
>> clients for the Northbound DB? In general, only write-heavy clients
>> actually need to be leader-only.
>
> There are a lot of leader-only clients due to the way the neutron API
> server runs - each worker thread has a connection, and they are scaled
> depending on processor count, so typically there are at least 32. Then
> multiply that by three since there is HA involved.
>
> Actually, I had a look in a recent report and there were 61 NB/62 SB
> connections per system, so that would make ~185 for each server. I
> would think in a typical deployment there might be closer to 100.
>
>>> Looking at their sockets, I can see the backlog is only set to 10:
>>>
>>> $ ss -ltm | grep 664
>>> LISTEN 0      10      0.0.0.0:6641       0.0.0.0:*
>>> LISTEN 0      10      0.0.0.0:6642       0.0.0.0:*
>>>
>>> Digging into the code, there are only two places where listen() is
>>> called, one being inet_open_passive():
>>>
>>>     /* Listen. */
>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>         error = sock_errno();
>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>         goto error;
>>>     }
>>>
>>> There is no way to configure around this to even test whether
>>> increasing it would help in a running environment.
>>>
>>> So my question is two-fold:
>>>
>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>
>>> 2) Should this be configurable?
>>>
>>> Has anyone else seen this?
>>
>> I don't remember having any significant issues related to connection
>> timeouts as they usually get resolved quickly. And if the server
>> doesn't accept the connection fast enough, it means that the server
>> is busy and there may not be a real benefit from having more
>> connections in the backlog. It may just hide the connection timeout
>> warning while the service will not actually be available for roughly
>> the same amount of time anyway. Having a lower backlog may allow
>> clients to re-connect to a less loaded server faster.
>
> Understood, increasing the backlog might just hide the warnings and
> not fix the issue.
>
> I'll explain what seems to be happening, at least from looking at the
> logs I have. All the worker threads in question are happily connected
> to the leader. When the leader changes, there is a bit of a stampede
> while they all try to re-connect to the new leader. But since they
> don't know which of the three (again, HA) systems is the leader, they
> just pick one of the other two. When they don't get the leader, they
> disconnect and try another.
>
> It might be that there is something we can do on the neutron side as
> well; the 10 backlog just seemed like the first place to start.

I believe I heard something about adjusting the number of connections
in neutron, but I don't have any specific pointers. Maybe Ihar knows
something about it?

>> Saying that, the original code clearly wasn't designed for a high
>> number of simultaneous connection attempts, so it makes sense to
>> increase the backlog to some higher value. I see Ihar re-posted his
>> patch doing that here:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>>
>> I'll take a look at it.
>
> Thanks, I plan on testing that as well.
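For experimenting before any patch lands, the mechanics of a tunable
backlog are small. Below is a rough standalone sketch, not the actual
OVS code: the OVSDB_LISTEN_BACKLOG environment variable is invented
here purely for illustration, and note that the kernel silently caps
whatever value is passed to listen() at net.core.somaxconn.

    /* Sketch only: OVSDB_LISTEN_BACKLOG is an invented knob, not an
     * existing OVS option. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    #define DEFAULT_LISTEN_BACKLOG 10  /* The current hard-coded value. */

    static int
    listen_backlog(void)
    {
        const char *s = getenv("OVSDB_LISTEN_BACKLOG");
        int n = s ? atoi(s) : 0;

        return n > 0 ? n : DEFAULT_LISTEN_BACKLOG;
    }

    int
    main(void)
    {
        struct sockaddr_in sin;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sin, 0, sizeof sin);
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(6641);

        if (fd < 0
            || bind(fd, (struct sockaddr *) &sin, sizeof sin) < 0
            || listen(fd, listen_backlog()) < 0) {
            perror("listen setup");
            return 1;
        }

        printf("listening on 6641 with backlog %d\n", listen_backlog());
        pause();  /* Never accept(), so the backlog can actually fill. */
        return 0;
    }

Running this and pointing a few dozen TCP clients at port 6641 should
reproduce the "listen queue of a socket overflowed" counters from the
original report once more connections pile up than the backlog holds.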
>> One other thing that we could do is to accept more connections at a
>> time. Currently we accept one connection per event loop iteration.
>> But we need to be careful here as handling multiple initial monitor
>> requests for the database within a single iteration may be costly
>> and may reduce the overall responsiveness of the server. Needs some
>> research.
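To make the shape of that idea concrete, it would be something along
these lines (a sketch only; MAX_ACCEPTS_PER_RUN and
handle_new_connection are invented names, not the actual ovsdb-server
code):

    #include <unistd.h>
    #include <sys/socket.h>

    #define MAX_ACCEPTS_PER_RUN 8  /* Bounds the work per iteration. */

    static void
    handle_new_connection(int fd)
    {
        /* Stub: the real server would set up a JSON-RPC session here,
         * which is the potentially costly part mentioned above. */
        close(fd);
    }

    static void
    accept_batch(int listen_fd)
    {
        int i;

        for (i = 0; i < MAX_ACCEPTS_PER_RUN; i++) {
            /* Assumes listen_fd is non-blocking. */
            int fd = accept(listen_fd, NULL, NULL);

            if (fd < 0) {
                /* EAGAIN/EWOULDBLOCK means the backlog is drained;
                 * real code would distinguish other errors. */
                break;
            }
            handle_new_connection(fd);
        }
    }

The cap is the interesting tunable: too small and a reconnection
stampede still takes many iterations to drain, too large and a single
iteration can stall on the initial monitor requests.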
>> Having hundreds of leader-only clients for NB still sounds a little
>> strange to me though.
>
> There might be a better way, or I might be misunderstanding as well.
> We actually have some meetings next week and I can add this as a
> discussion topic.

I believe newer versions of Neutron went away from leader-only
connections in most places. At least on the SB side:
  https://review.opendev.org/c/openstack/neutron/+/803268

Best regards, Ilya Maximets.