Re: [ovs-discuss] Question on listen backlog

2024-04-04 Thread Brian Haley via discuss




On 4/4/24 12:59 PM, Ilya Maximets wrote:

On 4/4/24 18:07, Brian Haley wrote:

Hi,

On 4/4/24 6:12 AM, Ilya Maximets wrote:

On 4/3/24 22:15, Brian Haley via discuss wrote:

Hi,

I recently have been seeing issues in a large environment where the
listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
for example:

17842 times the listen queue of a socket overflowed
17842 SYNs to LISTEN sockets dropped


Does this cause significant re-connection delays or is it just an
observation?


It is just an observation at this point.


Ack.




There is more on NB than SB, but I was surprised to see any. I can only
guess at the moment it is happening when the leader changes and hundreds
of nodes try and reconnect.


This sounds a little strange.  Do you have hundreds of leader-only clients
for Northbound DB?  In general, only write-heavy clients actually need
to be leader-only.


There are a lot of leader-only clients due to the way the neutron API
server runs - each worker thread has a connection, and they are scaled
depending on processor count, so typically there are at least 32. Then
multiply that by three since there is HA involved.

Actually I had a look in a recent report and there were 61 NB/62 SB
connections per system, so that would make ~185 for each server. I would
think in a typical deployment there might be closer to 100.


Looking at their sockets I can see the backlog is only set to 10:

$ ss -ltm | grep 664
LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*

Digging into the code, there are only two places where listen() is
called, one being inet_open_passive():

    /* Listen. */
    if (style == SOCK_STREAM && listen(fd, 10) < 0) {
        error = sock_errno();
        VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
        goto error;
    }

There is no way to config around this to even test if increasing would
help in a running environment.

So my question is two-fold:

1) Should this be increased? 128, 256, 1024? I can send a patch.

2) Should this be configurable?

Has anyone else seen this?
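
As a side note, here is a minimal sketch of what a tunable backlog could
look like, purely for illustration: the environment variable name
OVS_LISTEN_BACKLOG and the fallback of 10 are assumptions for this
example only, not part of OVS or of any posted patch.

    #include <stdlib.h>

    /* Hypothetical helper: read the listen backlog from an environment
     * variable, falling back to the current hard-coded default of 10.
     * The variable name and the default are assumptions for illustration. */
    static int
    listen_backlog(void)
    {
        static int backlog;

        if (!backlog) {
            const char *s = getenv("OVS_LISTEN_BACKLOG");
            int value = s ? atoi(s) : 0;

            backlog = value > 0 ? value : 10;
        }
        return backlog;
    }

inet_open_passive() could then call listen(fd, listen_backlog()) instead
of listen(fd, 10), which would at least make the value testable in a
running environment without rebuilding.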


I don't remember having any significant issues related to connection
timeouts, as they usually get resolved quickly.  And if the server
doesn't accept the connection fast enough, it means that the server is
busy and there may not be a real benefit from having more connections
in the backlog.  It may just hide the connection timeout warning while
the service will not actually be available for roughly the same amount
of time anyway.  Having a lower backlog may allow clients to re-connect
to a less loaded server faster.


Understood, increasing the backlog might just hide the warnings and not
fix the issue.

I'll explain what seems to be happening, at least from looking at the
logs I have. All the worker threads in question are happily connected to
the leader. When the leader changes there is a bit of a stampede while
they all try and re-connect to the new leader. But since they don't know
which of the three (again, HA) systems is the leader, they just pick
one of the other two. When they don't get the leader they disconnect and
try another.

It might be that there is something we can do on the neutron side as well;
the backlog of 10 just seemed like the first place to start.
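
One generic client-side mitigation for this kind of reconnect stampede is
exponential backoff with random jitter, so reconnecting workers spread out
instead of hitting the new leader's listen queue in the same instant. A
rough sketch, not what Neutron or the OVS IDL actually does; all names and
values here are assumptions:

    #include <stdlib.h>

    /* Illustrative only: exponential backoff capped at 8 seconds, plus up
     * to 50% random jitter, indexed by the number of failed attempts. */
    static unsigned int
    reconnect_backoff_ms(unsigned int attempts)
    {
        unsigned int backoff = 1000u << (attempts < 3 ? attempts : 3);

        return backoff + (unsigned int) rand() % (backoff / 2 + 1);
    }

A client would then wait reconnect_backoff_ms(n) milliseconds after its
n-th failed attempt before trying the next server, instead of retrying
immediately.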


I believe I heard something about adjusting the number of connections
in neutron, but I don't have any specific pointers.  Maybe Ihar knows
something about it?


We can set the number of worker threads to run; in this case the values
are set for a specific workload, so reducing them would have a negative
effect on overall API performance. It's a trade-off.



Saying that, the original code clearly wasn't designed for a high
number of simultaneous connection attempts, so it makes sense to
increase the backlog to some higher value.  I see Ihar re-posted his
patch doing that here:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
I'll take a look at it.


Thanks, I plan on testing that as well.


One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration.  But we
need to be careful here as handling multiple initial monitor requests
for the database within a single iteration may be costly and may reduce
overall responsiveness of the server.  Needs some research.
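
For reference, the "accept more connections at a time" idea could look
roughly like the sketch below. This is not the actual ovsdb-server event
loop; the batch cap and the callback are assumptions, and the cap is what
keeps a single iteration from being swamped by initial monitor requests.

    #include <stddef.h>
    #include <sys/socket.h>

    #define ACCEPT_BATCH 8   /* Assumed per-iteration cap, chosen arbitrarily. */

    /* Illustrative only: drain up to ACCEPT_BATCH pending connections from
     * a non-blocking listening socket in one event-loop iteration, instead
     * of accepting a single connection per iteration. */
    static void
    accept_pending(int listen_fd, void (*on_new_conn)(int fd))
    {
        for (int i = 0; i < ACCEPT_BATCH; i++) {
            int fd = accept(listen_fd, NULL, NULL);

            if (fd < 0) {
                /* EAGAIN/EWOULDBLOCK: the queue is drained for this
                 * iteration; real code would also log unexpected errors. */
                break;
            }
            on_new_conn(fd);
        }
    }

Capping the batch keeps the responsiveness concern above bounded while
still clearing a burst of SYNs faster than one accept per loop.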

Having hundreds of leader-only clients for NB still sounds a little strange
to me though.


There might be a better way, or I might be misunderstanding as well. We
actually have some meetings next week and I can add this as a discussion
topic.


I believe newer versions of Neutron went away from leader-only connections
in most places.  At least on Sb side:
   https://review.opendev.org/c/openstack/neutron/+/803268


Hah, we actually have that patch applied; if I hadn't just done a | wc -l
I would have noticed that the SB connections are divided amongst the three
units.

Re: [ovs-discuss] Question on listen backlog

2024-04-04 Thread Ilya Maximets via discuss
On 4/4/24 18:07, Brian Haley wrote:
> Hi,
> 
> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>> Hi,
>>>
>>> I recently have been seeing issues in a large environment where the
>>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>>> for example:
>>>
>>> 17842 times the listen queue of a socket overflowed
>>> 17842 SYNs to LISTEN sockets dropped
>>
>> Does this cause significant re-connection delays or is it just an
>> observation?
> 
> It is just an observation at this point.

Ack.

> 
>>> There is more on NB than SB, but I was surprised to see any. I can only
>>> guess at the moment it is happening when the leader changes and hundreds
>>> of nodes try and reconnect.
>>
>> This sounds a little strange.  Do you have hundreds of leader-only clients
>> for Northbound DB?  In general, only write-heavy clients actually need
>> to be leader-only.
> 
> There are a lot of leader-only clients due to the way the neutron API 
> server runs - each worker thread has a connection, and they are scaled 
> depending on processor count, so typically there are at least 32. Then 
> multiply that by three since there is HA involved.
> 
> Actually I had a look in a recent report and there were 61 NB/62 SB 
> connections per system, so that would make ~185 for each server. I would 
> think in a typical deployment there might be closer to 100.
> 
>>> Looking at their sockets I can see the backlog is only set to 10:
>>>
>>> $ ss -ltm | grep 664
>>> LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
>>> LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*
>>>
>>> Digging into the code, there are only two places where listen() is
>>> called, one being inet_open_passive():
>>>
>>>     /* Listen. */
>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>         error = sock_errno();
>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>         goto error;
>>>     }
>>>
>>> There is no way to config around this to even test if increasing would
>>> help in a running environment.
>>>
>>> So my question is two-fold:
>>>
>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>
>>> 2) Should this be configurable?
>>>
>>> Has anyone else seen this?
>>
>> I don't remember having any significant issues related to connection
>> timeouts, as they usually get resolved quickly.  And if the server
>> doesn't accept the connection fast enough, it means that the server is
>> busy and there may not be a real benefit from having more connections
>> in the backlog.  It may just hide the connection timeout warning while
>> the service will not actually be available for roughly the same amount
>> of time anyway.  Having a lower backlog may allow clients to re-connect
>> to a less loaded server faster.
> 
> Understood, increasing the backlog might just hide the warnings and not 
> fix the issue.
> 
> I'll explain what seems to be happening, at least from looking at the 
> logs I have. All the worker threads in question are happily connected to 
> the leader. When the leader changes there is a bit of a stampede while 
> they all try and re-connect to the new leader. But since they don't know 
> which of the three (again, HA) systems is the leader, they just pick 
> one of the other two. When they don't get the leader they disconnect and 
> try another.
> 
> It might be that there is something we can do on the neutron side as well;
> the backlog of 10 just seemed like the first place to start.

I believe I heard something about adjusting the number of connections
in neutron, but I don't have any specific pointers.  Maybe Ihar knows
something about it?

> 
>> Saying that, the original code clearly wasn't designed for a high
>> number of simultaneous connection attempts, so it makes sense to
>> increase the backlog to some higher value.  I see Ihar re-posted his
>> patch doing that here:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>> I'll take a look at it.
> 
> Thanks, I plan on testing that as well.
> 
>> One other thing that we could do is to accept more connections at a time.
>> Currently we accept one connection per event loop iteration.  But we
>> need to be careful here as handling multiple initial monitor requests
>> for the database within a single iteration may be costly and may reduce
>> overall responsiveness of the server.  Needs some research.
>>
>> Having hundreds of leader-only clients for NB still sounds a little strange
>> to me though.
> 
> There might be a better way, or I might be misunderstanding as well. We 
> actually have some meetings next week and I can add this as a discussion 
> topic.

I believe newer versions of Neutron went away from leader-only connections
in most places.  At least on Sb side:
  https://review.opendev.org/c/openstack/neutron/+/803268

Best regards, Ilya Maximets.

Re: [ovs-discuss] Question on listen backlog

2024-04-04 Thread Brian Haley via discuss

Hi,

On 4/4/24 6:12 AM, Ilya Maximets wrote:

On 4/3/24 22:15, Brian Haley via discuss wrote:

Hi,

I recently have been seeing issues in a large environment where the
listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
for example:

17842 times the listen queue of a socket overflowed
17842 SYNs to LISTEN sockets dropped


Does this cause significant re-connection delays or is it just an
observation?


It is just an observation at this point.


There is more on NB than SB, but I was surprised to see any. I can only
guess at the moment it is happening when the leader changes and hundreds
of nodes try and reconnect.


This sounds a little strange.  Do you have hundreds of leader-only clients
for Northbound DB?  In general, only write-heavy clients actually need
to be leader-only.


There are a lot of leader-only clients due to the way the neutron API 
server runs - each worker thread has a connection, and they are scaled 
depending on processor count, so typically there are at least 32. Then 
multiply that by three since there is HA involved.


Actually I had a look in a recent report and there were 61 NB/62 SB 
connections per system, so that would make ~185 for each server. I would 
think in a typical deployment there might be closer to 100.



Looking at their sockets I can see the backlog is only set to 10:

$ ss -ltm | grep 664
LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*

Digging into the code, there are only two places where listen() is
called, one being inet_open_passive():

    /* Listen. */
    if (style == SOCK_STREAM && listen(fd, 10) < 0) {
        error = sock_errno();
        VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
        goto error;
    }

There is no way to config around this to even test if increasing would
help in a running environment.

So my question is two-fold:

1) Should this be increased? 128, 256, 1024? I can send a patch.

2) Should this be configurable?

Has anyone else seen this?


I don't remember having any significant issues related to connection
timeouts, as they usually get resolved quickly.  And if the server
doesn't accept the connection fast enough, it means that the server is
busy and there may not be a real benefit from having more connections
in the backlog.  It may just hide the connection timeout warning while
the service will not actually be available for roughly the same amount
of time anyway.  Having a lower backlog may allow clients to re-connect
to a less loaded server faster.


Understood, increasing the backlog might just hide the warnings and not 
fix the issue.


I'll explain what seems to be happening, at least from looking at the 
logs I have. All the worker threads in question are happily connected to 
the leader. When the leader changes there is a bit of a stampede while 
they all try and re-connect to the new leader. But since they don't know 
which of the three (again, HA) systems is the leader, they just pick 
one of the other two. When they don't get the leader they disconnect and 
try another.


It might be that there is something we can do on the neutron side as well;
the backlog of 10 just seemed like the first place to start.



Saying that, the original code clearly wasn't designed for a high
number of simultaneous connection attempts, so it makes sense to
increase the backlog to some higher value.  I see Ihar re-posted his
patch doing that here:
   
https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
I'll take a look at it.


Thanks, I plan on testing that as well.


One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration.  But we
need to be careful here as handling multiple initial monitor requests
for the database within a single iteration may be costly and may reduce
overall responsiveness of the server.  Needs some research.

Having hundreds of leader-only clients for NB still sounds a little strange
to me though.


There might be a better way, or I might be misunderstanding as well. We 
actually have some meetings next week and I can add this as a discussion 
topic.


Thanks,

-Brian


Re: [ovs-discuss] Question on listen backlog

2024-04-04 Thread Ilya Maximets via discuss
On 4/3/24 22:15, Brian Haley via discuss wrote:
> Hi,
> 
> I recently have been seeing issues in a large environment where the 
> listen backlog of ovsdb-server, both NB and SB, is getting overflowed, 
> for example:
> 
> 17842 times the listen queue of a socket overflowed
> 17842 SYNs to LISTEN sockets dropped

Does this cause significant re-connection delays or is it just an
observation?

> 
> There is more on NB than SB, but I was surprised to see any. I can only 
> guess at the moment it is happening when the leader changes and hundreds 
> of nodes try and reconnect.

This sounds a little strange.  Do you have hundreds of leader-only clients
for Northbound DB?  In general, only write-heavy clients actually need
to be leader-only.

> 
> Looking at their sockets I can see the backlog is only set to 10:
> 
> $ ss -ltm | grep 664
> LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
> LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*
> 
> Digging into the code, there are only two places where listen() is 
> called, one being inet_open_passive():
> 
>     /* Listen. */
>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>         error = sock_errno();
>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>         goto error;
>     }
> 
> There is no way to config around this to even test if increasing would 
> help in a running environment.
> 
> So my question is two-fold:
> 
> 1) Should this be increased? 128, 256, 1024? I can send a patch.
> 
> 2) Should this be configurable?
> 
> Has anyone else seen this?

I don't remember having any significant issues related to connection
timeouts, as they usually get resolved quickly.  And if the server
doesn't accept the connection fast enough, it means that the server is
busy and there may not be a real benefit from having more connections
in the backlog.  It may just hide the connection timeout warning while
the service will not actually be available for roughly the same amount
of time anyway.  Having a lower backlog may allow clients to re-connect
to a less loaded server faster.

Saying that, the original code clearly wasn't designed for a high
number of simultaneous connection attempts, so it makes sense to
increase the backlog to some higher value.  I see Ihar re-posted his
patch doing that here:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
I'll take a look at it.

One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration.  But we
need to be careful here as handling multiple initial monitor requests
for the database within a single iteration may be costly and may reduce
overall responsiveness of the server.  Needs some research.

Having hundreds of leader-only clients for NB still sounds a little strange
to me though.

Best regards, Ilya Maximets.


Re: [ovs-discuss] Question on listen backlog

2024-04-03 Thread Brian Haley via discuss

Hi Ihar,

On 4/3/24 4:37 PM, Ihar Hrachyshka wrote:
On Wed, Apr 3, 2024 at 4:15 PM Brian Haley via discuss
<ovs-discuss@openvswitch.org> wrote:


Hi,

I recently have been seeing issues in a large environment where the
listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
for example:

17842 times the listen queue of a socket overflowed
17842 SYNs to LISTEN sockets dropped

There is more on NB than SB, but I was surprised to see any. I can only
guess at the moment it is happening when the leader changes and
hundreds
of nodes try and reconnect.

Looking at their sockets I can see the backlog is only set to 10:

$ ss -ltm | grep 664
LISTEN 0      10 0.0.0.0:6641         0.0.0.0:*
LISTEN 0      10 0.0.0.0:6642         0.0.0.0:*

Digging into the code, there are only two places where listen() is
called, one being inet_open_passive():

      /* Listen. */
      if (style == SOCK_STREAM && listen(fd, 10) < 0) {
          error = sock_errno();
          VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
          goto error;
      }

There is no way to config around this to even test if increasing would
help in a running environment.

So my question is two-fold:

1) Should this be increased? 128, 256, 1024? I can send a patch.

2) Should this be configurable?

Has anyone else seen this?


Yes, I've seen it, though I was focusing on AF_UNIX behavior. You may 
want to check this series: 
https://patchwork.ozlabs.org/project/openvswitch/list/?series=382739=%2A=both 


Specifically, it includes a patch to bump the value for INET sockets to 
64: 
https://patchwork.ozlabs.org/project/openvswitch/patch/20231118010703.4154866-2-ihrac...@redhat.com/ 


The series includes a number of other fixes related to unixctl socket 
handling; I am planning to revive this series in the next few months, but 
perhaps we could untangle this backlog-bump patch from the series and 
merge it independently.


As to the question of potential configurability, I have nothing to 
contribute.


Thanks for the quick response, Ihar. I had actually missed the Python 
code as I only looked in *.c files. I can see if bumping to 64 helps; I 
just don't know how quickly it will be deployed and tested.


Regarding merging it independently, it seems from looking at the series 
it might be something that could/should be backported in its entirety. 
If not, I can try to break out that one patch for previous versions.


And regarding configurability, I can only say it would have been good to 
have it so this could be debugged easily, as opposed to rebuilding things.


-Brian


Re: [ovs-discuss] Question on listen backlog

2024-04-03 Thread Ihar Hrachyshka via discuss
On Wed, Apr 3, 2024 at 4:15 PM Brian Haley via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hi,
>
> I recently have been seeing issues in a large environment where the
> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
> for example:
>
> 17842 times the listen queue of a socket overflowed
> 17842 SYNs to LISTEN sockets dropped
>
> There is more on NB than SB, but I was surprised to see any. I can only
> guess at the moment it is happening when the leader changes and hundreds
> of nodes try and reconnect.
>
> Looking at their sockets I can see the backlog is only set to 10:
>
> $ ss -ltm | grep 664
> LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
> LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*
>
> Digging into the code, there are only two places where listen() is
> called, one being inet_open_passive():
>
>     /* Listen. */
>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>         error = sock_errno();
>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>         goto error;
>     }
>
> There is no way to config around this to even test if increasing would
> help in a running environment.
>
> So my question is two-fold:
>
> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>
> 2) Should this be configurable?
>
> Has anyone else seen this?
>

Yes, I've seen it, though I was focusing on AF_UNIX behavior. You may want
to check this series:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=382739=%2A=both

Specifically, it includes a patch to bump the value for INET sockets to 64:
https://patchwork.ozlabs.org/project/openvswitch/patch/20231118010703.4154866-2-ihrac...@redhat.com/

The series includes a number of other fixes related to unixctl socket
handling; I am planning to revive this series in the next few months, but
perhaps we could untangle this backlog-bump patch from the series and
merge it independently.

As to the question of potential configurability, I have nothing to
contribute.

Ihar


>
> Thanks for any thoughts.
>
> -Brian


[ovs-discuss] Question on listen backlog

2024-04-03 Thread Brian Haley via discuss

Hi,

I recently have been seeing issues in a large environment where the 
listen backlog of ovsdb-server, both NB and SB, is getting overflowed, 
for example:


17842 times the listen queue of a socket overflowed
17842 SYNs to LISTEN sockets dropped

There is more on NB than SB, but I was surprised to see any. I can only 
guess at the moment it is happening when the leader changes and hundreds 
of nodes try and reconnect.


Looking at their sockets I can see the backlog is only set to 10:

$ ss -ltm | grep 664
LISTEN 0      10     0.0.0.0:6641      0.0.0.0:*
LISTEN 0      10     0.0.0.0:6642      0.0.0.0:*

Digging into the code, there are only two places where listen() is 
called, one being inet_open_passive():

    /* Listen. */
    if (style == SOCK_STREAM && listen(fd, 10) < 0) {
        error = sock_errno();
        VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
        goto error;
    }

There is no way to config around this to even test if increasing would 
help in a running environment.


So my question is two-fold:

1) Should this be increased? 128, 256, 1024? I can send a patch.

2) Should this be configurable?

Has anyone else seen this?

Thanks for any thoughts.

-Brian