Re: [ovs-discuss] Question on listen backlog
On 4/4/24 12:59 PM, Ilya Maximets wrote:
> On 4/4/24 18:07, Brian Haley wrote:
>> Hi,
>>
>> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>>> Hi,
>>>>
>>>> I recently have been seeing issues in a large environment where the
>>>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>>>> for example:
>>>>
>>>> 17842 times the listen queue of a socket overflowed
>>>> 17842 SYNs to LISTEN sockets dropped
>>>
>>> Does this cause significant re-connection delays or is it just an
>>> observation?
>>
>> It is just an observation at this point.
>
> Ack.
>
>>>> There is more on NB than SB, but I was surprised to see any. I can
>>>> only guess at the moment it is happening when the leader changes and
>>>> hundreds of nodes try and reconnect.
>>>
>>> This sounds a little strange. Do you have hundreds of leader-only
>>> clients for the Northbound DB? In general, only write-heavy clients
>>> actually need to be leader-only.
>>
>> There are a lot of leader-only clients due to the way the neutron API
>> server runs - each worker thread has a connection, and they are scaled
>> depending on processor count, so typically there are at least 32. Then
>> multiply that by three since there is HA involved.
>>
>> Actually I had a look at a recent report and there were 61 NB/62 SB
>> connections per system, so that would make ~185 for each server. I
>> would think in a typical deployment there might be closer to 100.
>>
>>>> Looking at their sockets I can see the backlog is only set to 10:
>>>>
>>>> $ ss -ltm | grep 664
>>>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>>>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>>>
>>>> Digging into the code, there are only two places where listen() is
>>>> called, one being inet_open_passive():
>>>>
>>>>     /* Listen. */
>>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>>         error = sock_errno();
>>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>>         goto error;
>>>>     }
>>>>
>>>> There is no way to configure around this to even test whether
>>>> increasing it would help in a running environment.
>>>>
>>>> So my question is two-fold:
>>>>
>>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>>
>>>> 2) Should this be configurable?
>>>>
>>>> Has anyone else seen this?
>>>
>>> I don't remember having any significant issues related to connection
>>> timeouts as they usually get resolved quickly. And if the server
>>> doesn't accept the connection fast enough, it means that the server is
>>> busy and there may not be a real benefit from having more connections
>>> in the backlog. It may just hide the connection timeout warning while
>>> the service will not actually be available for roughly the same amount
>>> of time anyway. Having a lower backlog may allow clients to re-connect
>>> to a less loaded server faster.
>>
>> Understood, increasing the backlog might just hide the warnings and not
>> fix the issue.
>>
>> I'll explain what seems to be happening, at least from looking at the
>> logs I have. All the worker threads in question are happily connected
>> to the leader. When the leader changes there is a bit of a stampede
>> while they all try and re-connect to the new leader. But since they
>> don't know which of the three (again, HA) systems is the leader, they
>> just pick one of the other two. When they don't get the leader they
>> disconnect and try another.
>>
>> It might be there is something we can do on the neutron side as well;
>> the 10 backlog just seemed like the first place to start.
>
> I believe I heard something about adjusting the number of connections in
> neutron, but I don't have any specific pointers. Maybe Ihar knows
> something about it?

We can set the number of worker threads to run; in this case the values
are set for a specific workload, so reducing them would have a negative
effect on overall API performance. Trade-off.

>>> Saying that, the original code clearly wasn't designed for a high
>>> number of simultaneous connection attempts, so it makes sense to
>>> increase the backlog to some higher value. I see Ihar re-posted his
>>> patch doing that here:
>>>
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>>>
>>> I'll take a look at it.
>>
>> Thanks, I plan on testing that as well.
>
>>> One other thing that we could do is to accept more connections at a
>>> time. Currently we accept one connection per event loop iteration. But
>>> we need to be careful here as handling multiple initial monitor
>>> requests for the database within a single iteration may be costly and
>>> may reduce overall responsiveness of the server. Needs some research.
>>>
>>> Having hundreds of leader-only clients for Nb still sounds a little
>>> strange to me though.
>>
>> There might be a better way, or I might be mis-understanding as well.
>> We actually have some meetings next week and I can add this as a
>> discussion topic.
>
> I believe newer versions of Neutron went away from leader-only
> connections in most places. At least on the Sb side:
>
> https://review.opendev.org/c/openstack/neutron/+/803268

Hah, we actually have that patch applied; if I hadn't done a |wc -l I
would have noticed the SB connections are divided amongst the three
units.
Re: [ovs-discuss] Segmentation fault on logical router nat entry addition at nbctl_lr_nat_add
Hi Dumitru,

I am on 23.09. Here is the output.

[root@ovnkube-db-0 ~]# rpm -qa | grep ovn
ovn23.09.1-23.09.1-11.el9.x86_64
ovn23.09.1-central-23.09.1-11.el9.x86_64

thanks
Srini

On Thu, Apr 4, 2024 at 2:05 AM Dumitru Ceara wrote:
> On 4/4/24 01:44, Sri kor wrote:
>> Hi Dumitru,
>> I have been facing a segmentation fault every time I trigger
>> lr-nat-add with dnat_and_snat. It is a distro package from CentOS and
>> it is on Rocky 9.1.
>>
>>> # ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_3a735720_c52f_4e6c_856c_4a0dac424a1e
>>> dnat_and_snat 91.106.223.172 10.100.140.137
>>> Segmentation fault (core dumped)
>>>
>>> [root@ovn ~]# cat /etc/os-release
>>> NAME="Rocky Linux"
>>> VERSION="9.1 (Blue Onyx)"
>>> ID="rocky"
>>> ID_LIKE="rhel centos fedora"
>>> VERSION_ID="9.1"
>>> PLATFORM_ID="platform:el9"
>>> PRETTY_NAME="Rocky Linux 9.1 (Blue Onyx)"
>>> ANSI_COLOR="0;32"
>>> LOGO="fedora-logo-icon"
>>> CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
>>> HOME_URL="https://rockylinux.org/"
>>> BUG_REPORT_URL="https://bugs.rockylinux.org/"
>>> ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
>>> ROCKY_SUPPORT_PRODUCT_VERSION="9.1"
>>> REDHAT_SUPPORT_PRODUCT="Rocky Linux"
>>> REDHAT_SUPPORT_PRODUCT_VERSION="9.1"
>
> It's still not clear which ovn version you're running exactly. Can you
> please share the output of:
>
> rpm -qa | grep ovn
>
> Thanks!
>
>> thanks
>> Srini
>>
>> On Mon, Feb 19, 2024 at 6:25 AM Dumitru Ceara wrote:
>>
>>> On 2/13/24 00:10, Sri kor via discuss wrote:
>>>> Hi Team,
>>>> When I am trying to add a nat entry for an LR, ovn-nbctl cored. Here
>>>> is the backtrace.
>>>
>>> Hi,
>>>
>>>> [root@ovnkube-db-0 ~]# ovn-nbctl --no-leader-only lr-nat-add
>>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>>> Segmentation fault (core dumped)
>>>> [root@ovnkube-db-0 ~]# ovn-nbctl --version
>>>> ovn-nbctl 23.09.1
>>>> Open vSwitch Library 3.2.2
>>>> DB Schema 7.1.0
>>>> [root@ovnkube-db-0 ~]#
>>>>
>>>> (gdb) core-file core.ovn-nbctl.60392.1707778809
>>>> [New LWP 60392]
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `ovn-nbctl --no-leader-only lr-nat-add
>>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr'.
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>>> (gdb) bt
>>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>>> #1  0x55f5ecdb1d39 in nbctl_lr_nat_add.lto_priv ()
>>>> #2  0x55f5ecd9d39a in main_loop.lto_priv ()
>>>> #3  0x55f5ecd998a2 in main ()
>>>> (gdb)
>>>
>>> I tried this in a sandbox built from v23.09.1:
>>>
>>> $ ovn-nbctl lr-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> $ ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>> $
>>>
>>> And it doesn't seem to crash. Is OVN built from source in your case?
>>> If so, can you please share the first part of config.log, e.g.:
>>>
>>> $ head -10 config.log
>>>
>>> If this is a distro provided package, can you please share the distro
>>> and version?
>>>
>>> Thanks,
>>> Dumitru

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Question on listen backlog
On 4/4/24 18:07, Brian Haley wrote:
> Hi,
>
> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>> Hi,
>>>
>>> I recently have been seeing issues in a large environment where the
>>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>>> for example:
>>>
>>> 17842 times the listen queue of a socket overflowed
>>> 17842 SYNs to LISTEN sockets dropped
>>
>> Does this cause significant re-connection delays or is it just an
>> observation?
>
> It is just an observation at this point.

Ack.

>>> There is more on NB than SB, but I was surprised to see any. I can
>>> only guess at the moment it is happening when the leader changes and
>>> hundreds of nodes try and reconnect.
>>
>> This sounds a little strange. Do you have hundreds of leader-only
>> clients for the Northbound DB? In general, only write-heavy clients
>> actually need to be leader-only.
>
> There are a lot of leader-only clients due to the way the neutron API
> server runs - each worker thread has a connection, and they are scaled
> depending on processor count, so typically there are at least 32. Then
> multiply that by three since there is HA involved.
>
> Actually I had a look at a recent report and there were 61 NB/62 SB
> connections per system, so that would make ~185 for each server. I would
> think in a typical deployment there might be closer to 100.
>
>>> Looking at their sockets I can see the backlog is only set to 10:
>>>
>>> $ ss -ltm | grep 664
>>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>>
>>> Digging into the code, there are only two places where listen() is
>>> called, one being inet_open_passive():
>>>
>>>     /* Listen. */
>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>         error = sock_errno();
>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>         goto error;
>>>     }
>>>
>>> There is no way to configure around this to even test whether
>>> increasing it would help in a running environment.
>>>
>>> So my question is two-fold:
>>>
>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>
>>> 2) Should this be configurable?
>>>
>>> Has anyone else seen this?
>>
>> I don't remember having any significant issues related to connection
>> timeouts as they usually get resolved quickly. And if the server
>> doesn't accept the connection fast enough, it means that the server is
>> busy and there may not be a real benefit from having more connections
>> in the backlog. It may just hide the connection timeout warning while
>> the service will not actually be available for roughly the same amount
>> of time anyway. Having a lower backlog may allow clients to re-connect
>> to a less loaded server faster.
>
> Understood, increasing the backlog might just hide the warnings and not
> fix the issue.
>
> I'll explain what seems to be happening, at least from looking at the
> logs I have. All the worker threads in question are happily connected to
> the leader. When the leader changes there is a bit of a stampede while
> they all try and re-connect to the new leader. But since they don't know
> which of the three (again, HA) systems is the leader, they just pick one
> of the other two. When they don't get the leader they disconnect and try
> another.
>
> It might be there is something we can do on the neutron side as well;
> the 10 backlog just seemed like the first place to start.

I believe I heard something about adjusting the number of connections in
neutron, but I don't have any specific pointers. Maybe Ihar knows
something about it?

>> Saying that, the original code clearly wasn't designed for a high
>> number of simultaneous connection attempts, so it makes sense to
>> increase the backlog to some higher value. I see Ihar re-posted his
>> patch doing that here:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>>
>> I'll take a look at it.
>
> Thanks, I plan on testing that as well.

>> One other thing that we could do is to accept more connections at a
>> time. Currently we accept one connection per event loop iteration. But
>> we need to be careful here as handling multiple initial monitor
>> requests for the database within a single iteration may be costly and
>> may reduce overall responsiveness of the server. Needs some research.
>>
>> Having hundreds of leader-only clients for Nb still sounds a little
>> strange to me though.
>
> There might be a better way, or I might be mis-understanding as well. We
> actually have some meetings next week and I can add this as a discussion
> topic.

I believe newer versions of Neutron went away from leader-only
connections in most places. At least on the Sb side:

https://review.opendev.org/c/openstack/neutron/+/803268

Best regards, Ilya Maximets.
Re: [ovs-discuss] Question on listen backlog
Hi,

On 4/4/24 6:12 AM, Ilya Maximets wrote:
> On 4/3/24 22:15, Brian Haley via discuss wrote:
>> Hi,
>>
>> I recently have been seeing issues in a large environment where the
>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>> for example:
>>
>> 17842 times the listen queue of a socket overflowed
>> 17842 SYNs to LISTEN sockets dropped
>
> Does this cause significant re-connection delays or is it just an
> observation?

It is just an observation at this point.

>> There is more on NB than SB, but I was surprised to see any. I can only
>> guess at the moment it is happening when the leader changes and
>> hundreds of nodes try and reconnect.
>
> This sounds a little strange. Do you have hundreds of leader-only
> clients for the Northbound DB? In general, only write-heavy clients
> actually need to be leader-only.

There are a lot of leader-only clients due to the way the neutron API
server runs - each worker thread has a connection, and they are scaled
depending on processor count, so typically there are at least 32. Then
multiply that by three since there is HA involved.

Actually I had a look at a recent report and there were 61 NB/62 SB
connections per system, so that would make ~185 for each server. I would
think in a typical deployment there might be closer to 100.

>> Looking at their sockets I can see the backlog is only set to 10:
>>
>> $ ss -ltm | grep 664
>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>
>> Digging into the code, there are only two places where listen() is
>> called, one being inet_open_passive():
>>
>>     /* Listen. */
>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>         error = sock_errno();
>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>         goto error;
>>     }
>>
>> There is no way to configure around this to even test whether
>> increasing it would help in a running environment.
>>
>> So my question is two-fold:
>>
>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>
>> 2) Should this be configurable?
>>
>> Has anyone else seen this?
>
> I don't remember having any significant issues related to connection
> timeouts as they usually get resolved quickly. And if the server doesn't
> accept the connection fast enough, it means that the server is busy and
> there may not be a real benefit from having more connections in the
> backlog. It may just hide the connection timeout warning while the
> service will not actually be available for roughly the same amount of
> time anyway. Having a lower backlog may allow clients to re-connect to a
> less loaded server faster.

Understood, increasing the backlog might just hide the warnings and not
fix the issue.

I'll explain what seems to be happening, at least from looking at the
logs I have. All the worker threads in question are happily connected to
the leader. When the leader changes there is a bit of a stampede while
they all try and re-connect to the new leader. But since they don't know
which of the three (again, HA) systems is the leader, they just pick one
of the other two. When they don't get the leader they disconnect and try
another.

It might be there is something we can do on the neutron side as well;
the 10 backlog just seemed like the first place to start.

> Saying that, the original code clearly wasn't designed for a high
> number of simultaneous connection attempts, so it makes sense to
> increase the backlog to some higher value. I see Ihar re-posted his
> patch doing that here:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>
> I'll take a look at it.

Thanks, I plan on testing that as well.

> One other thing that we could do is to accept more connections at a
> time. Currently we accept one connection per event loop iteration. But
> we need to be careful here as handling multiple initial monitor requests
> for the database within a single iteration may be costly and may reduce
> overall responsiveness of the server. Needs some research.
>
> Having hundreds of leader-only clients for Nb still sounds a little
> strange to me though.

There might be a better way, or I might be mis-understanding as well. We
actually have some meetings next week and I can add this as a discussion
topic.

Thanks,

-Brian
Re: [ovs-discuss] Question on listen backlog
On 4/3/24 22:15, Brian Haley via discuss wrote:
> Hi,
>
> I recently have been seeing issues in a large environment where the
> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
> for example:
>
> 17842 times the listen queue of a socket overflowed
> 17842 SYNs to LISTEN sockets dropped

Does this cause significant re-connection delays or is it just an
observation?

> There is more on NB than SB, but I was surprised to see any. I can only
> guess at the moment it is happening when the leader changes and hundreds
> of nodes try and reconnect.

This sounds a little strange. Do you have hundreds of leader-only clients
for the Northbound DB? In general, only write-heavy clients actually need
to be leader-only.

> Looking at their sockets I can see the backlog is only set to 10:
>
> $ ss -ltm | grep 664
> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>
> Digging into the code, there are only two places where listen() is
> called, one being inet_open_passive():
>
>     /* Listen. */
>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>         error = sock_errno();
>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>         goto error;
>     }
>
> There is no way to configure around this to even test whether increasing
> it would help in a running environment.
>
> So my question is two-fold:
>
> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>
> 2) Should this be configurable?
>
> Has anyone else seen this?

I don't remember having any significant issues related to connection
timeouts as they usually get resolved quickly. And if the server doesn't
accept the connection fast enough, it means that the server is busy and
there may not be a real benefit from having more connections in the
backlog. It may just hide the connection timeout warning while the
service will not actually be available for roughly the same amount of
time anyway. Having a lower backlog may allow clients to re-connect to a
less loaded server faster.

Saying that, the original code clearly wasn't designed for a high number
of simultaneous connection attempts, so it makes sense to increase the
backlog to some higher value. I see Ihar re-posted his patch doing that
here:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/

I'll take a look at it.

One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration. But we need
to be careful here as handling multiple initial monitor requests for the
database within a single iteration may be costly and may reduce overall
responsiveness of the server. Needs some research.

Having hundreds of leader-only clients for Nb still sounds a little
strange to me though.

Best regards, Ilya Maximets.
Re: [ovs-discuss] Segmentation fault on logical router nat entry addition at nbctl_lr_nat_add
On 4/4/24 01:44, Sri kor wrote:
> Hi Dumitru,
> I have been facing a segmentation fault every time I trigger
> lr-nat-add with dnat_and_snat. It is a distro package from CentOS and it
> is on Rocky 9.1.
>
>> # ovn-nbctl --no-leader-only lr-nat-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_3a735720_c52f_4e6c_856c_4a0dac424a1e
>> dnat_and_snat 91.106.223.172 10.100.140.137
>> Segmentation fault (core dumped)
>>
>> [root@ovn ~]# cat /etc/os-release
>> NAME="Rocky Linux"
>> VERSION="9.1 (Blue Onyx)"
>> ID="rocky"
>> ID_LIKE="rhel centos fedora"
>> VERSION_ID="9.1"
>> PLATFORM_ID="platform:el9"
>> PRETTY_NAME="Rocky Linux 9.1 (Blue Onyx)"
>> ANSI_COLOR="0;32"
>> LOGO="fedora-logo-icon"
>> CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
>> HOME_URL="https://rockylinux.org/"
>> BUG_REPORT_URL="https://bugs.rockylinux.org/"
>> ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
>> ROCKY_SUPPORT_PRODUCT_VERSION="9.1"
>> REDHAT_SUPPORT_PRODUCT="Rocky Linux"
>> REDHAT_SUPPORT_PRODUCT_VERSION="9.1"

It's still not clear which ovn version you're running exactly. Can you
please share the output of:

rpm -qa | grep ovn

Thanks!

> thanks
> Srini
>
> On Mon, Feb 19, 2024 at 6:25 AM Dumitru Ceara wrote:
>
>> On 2/13/24 00:10, Sri kor via discuss wrote:
>>> Hi Team,
>>> When I am trying to add a nat entry for an LR, ovn-nbctl cored. Here
>>> is the backtrace.
>>
>> Hi,
>>
>>> [root@ovnkube-db-0 ~]# ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>> Segmentation fault (core dumped)
>>> [root@ovnkube-db-0 ~]# ovn-nbctl --version
>>> ovn-nbctl 23.09.1
>>> Open vSwitch Library 3.2.2
>>> DB Schema 7.1.0
>>> [root@ovnkube-db-0 ~]#
>>>
>>> (gdb) core-file core.ovn-nbctl.60392.1707778809
>>> [New LWP 60392]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>> (gdb) bt
>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>> #1  0x55f5ecdb1d39 in nbctl_lr_nat_add.lto_priv ()
>>> #2  0x55f5ecd9d39a in main_loop.lto_priv ()
>>> #3  0x55f5ecd998a2 in main ()
>>> (gdb)
>>
>> I tried this in a sandbox built from v23.09.1:
>>
>> $ ovn-nbctl lr-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>> $ ovn-nbctl --no-leader-only lr-nat-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>> dnat_and_snat 91.106.221.145 10.10.10.10
>> $
>>
>> And it doesn't seem to crash. Is OVN built from source in your case?
>> If so, can you please share the first part of config.log, e.g.:
>>
>> $ head -10 config.log
>>
>> If this is a distro provided package, can you please share the distro
>> and version?
>>
>> Thanks,
>> Dumitru