Re: [ovs-discuss] Question on listen backlog
On 4/4/24 12:59 PM, Ilya Maximets wrote:
> On 4/4/24 18:07, Brian Haley wrote:
>> Hi,
>>
>> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>>> Hi,
>>>>
>>>> I recently have been seeing issues in a large environment where the
>>>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>>>> for example:
>>>>
>>>> 17842 times the listen queue of a socket overflowed
>>>> 17842 SYNs to LISTEN sockets dropped
>>>
>>> Does this cause significant re-connection delays or is it just an
>>> observation?
>>
>> It is just an observation at this point.
>
> Ack.
>
>>>> There is more on NB than SB, but I was surprised to see any. I can
>>>> only guess at the moment it is happening when the leader changes and
>>>> hundreds of nodes try and reconnect.
>>>
>>> This sounds a little strange. Do you have hundreds of leader-only
>>> clients for the Northbound DB? In general, only write-heavy clients
>>> actually need to be leader-only.
>>
>> There are a lot of leader-only clients due to the way the neutron API
>> server runs - each worker thread has a connection, and they are scaled
>> depending on processor count, so typically there are at least 32. Then
>> multiply that by three since there is HA involved.
>>
>> Actually I had a look at a recent report and there were 61 NB/62 SB
>> connections per system, so that would make ~185 for each server. I
>> would think in a typical deployment there might be closer to 100.
>>
>>>> Looking at their sockets I can see the backlog is only set to 10:
>>>>
>>>> $ ss -ltm | grep 664
>>>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>>>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>>>
>>>> Digging into the code, there are only two places where listen() is
>>>> called, one being inet_open_passive():
>>>>
>>>>     /* Listen. */
>>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>>         error = sock_errno();
>>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>>         goto error;
>>>>     }
>>>>
>>>> There is no way to configure around this to even test whether
>>>> increasing it would help in a running environment.
>>>>
>>>> So my question is two-fold:
>>>>
>>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>>
>>>> 2) Should this be configurable?
>>>>
>>>> Has anyone else seen this?
>>>
>>> I don't remember having any significant issues related to connection
>>> timeouts as they usually get resolved quickly. And if the server
>>> doesn't accept the connection fast enough, it means that the server is
>>> busy and there may not be a real benefit from having more connections
>>> in the backlog. It may just hide the connection timeout warning while
>>> the service will not actually be available for roughly the same amount
>>> of time anyway. Having a lower backlog may allow clients to re-connect
>>> to a less loaded server faster.
>>
>> Understood, increasing the backlog might just hide the warnings and not
>> fix the issue.
>>
>> I'll explain what seems to be happening, at least from looking at the
>> logs I have. All the worker threads in question are happily connected
>> to the leader. When the leader changes there is a bit of a stampede
>> while they all try and re-connect to the new leader. But since they
>> don't know which of the three (again, HA) systems is the leader, they
>> just pick one of the other two. When they don't get the leader they
>> disconnect and try another.
>>
>> It might be there is something we can do on the neutron side as well;
>> the 10 backlog just seemed like the first place to start.
>
> I believe I heard something about adjusting the number of connections in
> neutron, but I don't have any specific pointers. Maybe Ihar knows
> something about it?

We can set the number of worker threads to run; in this case the values
are set for a specific workload, so reducing them would have a negative
effect on overall API performance. Trade-off.

>>> Saying that, the original code clearly wasn't designed for a high
>>> number of simultaneous connection attempts, so it makes sense to
>>> increase the backlog to some higher value. I see Ihar re-posted his
>>> patch doing that here:
>>>
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>>>
>>> I'll take a look at it.
>>
>> Thanks, I plan on testing that as well.
>
>>> One other thing that we could do is to accept more connections at a
>>> time. Currently we accept one connection per event loop iteration. But
>>> we need to be careful here as handling multiple initial monitor
>>> requests for the database within a single iteration may be costly and
>>> may reduce overall responsiveness of the server. Needs some research.
>>>
>>> Having hundreds of leader-only clients for Nb still sounds a little
>>> strange to me though.
>>
>> There might be a better way, or I might be mis-understanding as well.
>> We actually have some meetings next week and I can add this as a
>> discussion topic.
>
> I believe newer versions of Neutron went away from leader-only
> connections in most places. At least on the Sb side:
>
> https://review.opendev.org/c/openstack/neutron/+/803268

Hah, we actually have that patch applied; if I hadn't done a |wc -l I
would have noticed the SB connections are divided amongst the three
units.
Re: [ovs-discuss] Segmentation fault on logical router nat entry addition at nbctl_lr_nat_add
Hi Dumitru,

I am on 23.09. Here is the output.

[root@ovnkube-db-0 ~]# rpm -qa | grep ovn
ovn23.09.1-23.09.1-11.el9.x86_64
ovn23.09.1-central-23.09.1-11.el9.x86_64

thanks
Srini

On Thu, Apr 4, 2024 at 2:05 AM Dumitru Ceara wrote:
> On 4/4/24 01:44, Sri kor wrote:
>> Hi Dumitru,
>> I have been facing a segmentation fault every time I trigger
>> lr-nat-add with dnat_and_snat. It is a distro package from CentOS and
>> it is on Rocky 9.1.
>>
>>> # ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_3a735720_c52f_4e6c_856c_4a0dac424a1e
>>> dnat_and_snat 91.106.223.172 10.100.140.137
>>> Segmentation fault (core dumped)
>>>
>>> [root@ovn ~]# cat /etc/os-release
>>> NAME="Rocky Linux"
>>> VERSION="9.1 (Blue Onyx)"
>>> ID="rocky"
>>> ID_LIKE="rhel centos fedora"
>>> VERSION_ID="9.1"
>>> PLATFORM_ID="platform:el9"
>>> PRETTY_NAME="Rocky Linux 9.1 (Blue Onyx)"
>>> ANSI_COLOR="0;32"
>>> LOGO="fedora-logo-icon"
>>> CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
>>> HOME_URL="https://rockylinux.org/"
>>> BUG_REPORT_URL="https://bugs.rockylinux.org/"
>>> ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
>>> ROCKY_SUPPORT_PRODUCT_VERSION="9.1"
>>> REDHAT_SUPPORT_PRODUCT="Rocky Linux"
>>> REDHAT_SUPPORT_PRODUCT_VERSION="9.1"
>
> It's still not clear which ovn version you're running exactly. Can you
> please share the output of:
>
> rpm -qa | grep ovn
>
> Thanks!
>
>> thanks
>> Srini
>>
>> On Mon, Feb 19, 2024 at 6:25 AM Dumitru Ceara wrote:
>>
>>> On 2/13/24 00:10, Sri kor via discuss wrote:
>>>> Hi Team,
>>>> When I am trying to add a nat entry for an LR, ovn-nbctl cored. Here
>>>> is the backtrace.
>>>
>>> Hi,
>>>
>>>> [root@ovnkube-db-0 ~]# ovn-nbctl --no-leader-only lr-nat-add
>>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>>> Segmentation fault (core dumped)
>>>> [root@ovnkube-db-0 ~]# ovn-nbctl --version
>>>> ovn-nbctl 23.09.1
>>>> Open vSwitch Library 3.2.2
>>>> DB Schema 7.1.0
>>>> [root@ovnkube-db-0 ~]#
>>>>
>>>> (gdb) core-file core.ovn-nbctl.60392.1707778809
>>>> [New LWP 60392]
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `ovn-nbctl --no-leader-only lr-nat-add
>>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr'.
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>>> (gdb) bt
>>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>>> #1  0x55f5ecdb1d39 in nbctl_lr_nat_add.lto_priv ()
>>>> #2  0x55f5ecd9d39a in main_loop.lto_priv ()
>>>> #3  0x55f5ecd998a2 in main ()
>>>> (gdb)
>>>
>>> I tried this in a sandbox built from v23.09.1:
>>>
>>> $ ovn-nbctl lr-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> $ ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>> $
>>>
>>> And it doesn't seem to crash. Is OVN built from source in your case?
>>> If so, can you please share the first part of config.log, e.g.:
>>>
>>> $ head -10 config.log
>>>
>>> If this is a distro provided package, can you please share the distro
>>> and version?
>>>
>>> Thanks,
>>> Dumitru

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Question on listen backlog
On 4/4/24 18:07, Brian Haley wrote:
> Hi,
>
> On 4/4/24 6:12 AM, Ilya Maximets wrote:
>> On 4/3/24 22:15, Brian Haley via discuss wrote:
>>> Hi,
>>>
>>> I recently have been seeing issues in a large environment where the
>>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>>> for example:
>>>
>>> 17842 times the listen queue of a socket overflowed
>>> 17842 SYNs to LISTEN sockets dropped
>>
>> Does this cause significant re-connection delays or is it just an
>> observation?
>
> It is just an observation at this point.

Ack.

>>> There is more on NB than SB, but I was surprised to see any. I can
>>> only guess at the moment it is happening when the leader changes and
>>> hundreds of nodes try and reconnect.
>>
>> This sounds a little strange. Do you have hundreds of leader-only
>> clients for the Northbound DB? In general, only write-heavy clients
>> actually need to be leader-only.
>
> There are a lot of leader-only clients due to the way the neutron API
> server runs - each worker thread has a connection, and they are scaled
> depending on processor count, so typically there are at least 32. Then
> multiply that by three since there is HA involved.
>
> Actually I had a look at a recent report and there were 61 NB/62 SB
> connections per system, so that would make ~185 for each server. I would
> think in a typical deployment there might be closer to 100.
>
>>> Looking at their sockets I can see the backlog is only set to 10:
>>>
>>> $ ss -ltm | grep 664
>>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>>
>>> Digging into the code, there are only two places where listen() is
>>> called, one being inet_open_passive():
>>>
>>>     /* Listen. */
>>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>>         error = sock_errno();
>>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>>         goto error;
>>>     }
>>>
>>> There is no way to configure around this to even test whether
>>> increasing it would help in a running environment.
>>>
>>> So my question is two-fold:
>>>
>>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>>
>>> 2) Should this be configurable?
>>>
>>> Has anyone else seen this?
>>
>> I don't remember having any significant issues related to connection
>> timeouts as they usually get resolved quickly. And if the server
>> doesn't accept the connection fast enough, it means that the server is
>> busy and there may not be a real benefit from having more connections
>> in the backlog. It may just hide the connection timeout warning while
>> the service will not actually be available for roughly the same amount
>> of time anyway. Having a lower backlog may allow clients to re-connect
>> to a less loaded server faster.
>
> Understood, increasing the backlog might just hide the warnings and not
> fix the issue.
>
> I'll explain what seems to be happening, at least from looking at the
> logs I have. All the worker threads in question are happily connected to
> the leader. When the leader changes there is a bit of a stampede while
> they all try and re-connect to the new leader. But since they don't know
> which of the three (again, HA) systems is the leader, they just pick one
> of the other two. When they don't get the leader they disconnect and try
> another.
>
> It might be there is something we can do on the neutron side as well;
> the 10 backlog just seemed like the first place to start.

I believe I heard something about adjusting the number of connections in
neutron, but I don't have any specific pointers. Maybe Ihar knows
something about it?

>> Saying that, the original code clearly wasn't designed for a high
>> number of simultaneous connection attempts, so it makes sense to
>> increase the backlog to some higher value. I see Ihar re-posted his
>> patch doing that here:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>>
>> I'll take a look at it.
>
> Thanks, I plan on testing that as well.

>> One other thing that we could do is to accept more connections at a
>> time. Currently we accept one connection per event loop iteration. But
>> we need to be careful here as handling multiple initial monitor
>> requests for the database within a single iteration may be costly and
>> may reduce overall responsiveness of the server. Needs some research.
>>
>> Having hundreds of leader-only clients for Nb still sounds a little
>> strange to me though.
>
> There might be a better way, or I might be mis-understanding as well. We
> actually have some meetings next week and I can add this as a discussion
> topic.

I believe newer versions of Neutron went away from leader-only
connections in most places. At least on the Sb side:

https://review.opendev.org/c/openstack/neutron/+/803268

Best regards, Ilya Maximets.
Re: [ovs-discuss] Question on listen backlog
Hi,

On 4/4/24 6:12 AM, Ilya Maximets wrote:
> On 4/3/24 22:15, Brian Haley via discuss wrote:
>> Hi,
>>
>> I recently have been seeing issues in a large environment where the
>> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
>> for example:
>>
>> 17842 times the listen queue of a socket overflowed
>> 17842 SYNs to LISTEN sockets dropped
>
> Does this cause significant re-connection delays or is it just an
> observation?

It is just an observation at this point.

>> There is more on NB than SB, but I was surprised to see any. I can only
>> guess at the moment it is happening when the leader changes and
>> hundreds of nodes try and reconnect.
>
> This sounds a little strange. Do you have hundreds of leader-only
> clients for the Northbound DB? In general, only write-heavy clients
> actually need to be leader-only.

There are a lot of leader-only clients due to the way the neutron API
server runs - each worker thread has a connection, and they are scaled
depending on processor count, so typically there are at least 32. Then
multiply that by three since there is HA involved.

Actually I had a look at a recent report and there were 61 NB/62 SB
connections per system, so that would make ~185 for each server. I would
think in a typical deployment there might be closer to 100.

>> Looking at their sockets I can see the backlog is only set to 10:
>>
>> $ ss -ltm | grep 664
>> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
>> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>>
>> Digging into the code, there are only two places where listen() is
>> called, one being inet_open_passive():
>>
>>     /* Listen. */
>>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>>         error = sock_errno();
>>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>         goto error;
>>     }
>>
>> There is no way to configure around this to even test whether
>> increasing it would help in a running environment.
>>
>> So my question is two-fold:
>>
>> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>>
>> 2) Should this be configurable?
>>
>> Has anyone else seen this?
>
> I don't remember having any significant issues related to connection
> timeouts as they usually get resolved quickly. And if the server doesn't
> accept the connection fast enough, it means that the server is busy and
> there may not be a real benefit from having more connections in the
> backlog. It may just hide the connection timeout warning while the
> service will not actually be available for roughly the same amount of
> time anyway. Having a lower backlog may allow clients to re-connect to a
> less loaded server faster.

Understood, increasing the backlog might just hide the warnings and not
fix the issue.

I'll explain what seems to be happening, at least from looking at the
logs I have. All the worker threads in question are happily connected to
the leader. When the leader changes there is a bit of a stampede while
they all try and re-connect to the new leader. But since they don't know
which of the three (again, HA) systems is the leader, they just pick one
of the other two. When they don't get the leader they disconnect and try
another.

It might be there is something we can do on the neutron side as well;
the 10 backlog just seemed like the first place to start.

> Saying that, the original code clearly wasn't designed for a high
> number of simultaneous connection attempts, so it makes sense to
> increase the backlog to some higher value. I see Ihar re-posted his
> patch doing that here:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
>
> I'll take a look at it.

Thanks, I plan on testing that as well.

> One other thing that we could do is to accept more connections at a
> time. Currently we accept one connection per event loop iteration. But
> we need to be careful here as handling multiple initial monitor requests
> for the database within a single iteration may be costly and may reduce
> overall responsiveness of the server. Needs some research.
>
> Having hundreds of leader-only clients for Nb still sounds a little
> strange to me though.

There might be a better way, or I might be mis-understanding as well. We
actually have some meetings next week and I can add this as a discussion
topic.

Thanks,

-Brian
Re: [ovs-discuss] Question on listen backlog
On 4/3/24 22:15, Brian Haley via discuss wrote:
> Hi,
>
> I recently have been seeing issues in a large environment where the
> listen backlog of ovsdb-server, both NB and SB, is getting overflowed,
> for example:
>
> 17842 times the listen queue of a socket overflowed
> 17842 SYNs to LISTEN sockets dropped

Does this cause significant re-connection delays or is it just an
observation?

> There is more on NB than SB, but I was surprised to see any. I can only
> guess at the moment it is happening when the leader changes and hundreds
> of nodes try and reconnect.

This sounds a little strange. Do you have hundreds of leader-only clients
for the Northbound DB? In general, only write-heavy clients actually need
to be leader-only.

> Looking at their sockets I can see the backlog is only set to 10:
>
> $ ss -ltm | grep 664
> LISTEN 0      10      0.0.0.0:6641    0.0.0.0:*
> LISTEN 0      10      0.0.0.0:6642    0.0.0.0:*
>
> Digging into the code, there are only two places where listen() is
> called, one being inet_open_passive():
>
>     /* Listen. */
>     if (style == SOCK_STREAM && listen(fd, 10) < 0) {
>         error = sock_errno();
>         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>         goto error;
>     }
>
> There is no way to configure around this to even test whether increasing
> it would help in a running environment.
>
> So my question is two-fold:
>
> 1) Should this be increased? 128, 256, 1024? I can send a patch.
>
> 2) Should this be configurable?
>
> Has anyone else seen this?

I don't remember having any significant issues related to connection
timeouts as they usually get resolved quickly. And if the server doesn't
accept the connection fast enough, it means that the server is busy and
there may not be a real benefit from having more connections in the
backlog. It may just hide the connection timeout warning while the
service will not actually be available for roughly the same amount of
time anyway. Having a lower backlog may allow clients to re-connect to a
less loaded server faster.

Saying that, the original code clearly wasn't designed for a high number
of simultaneous connection attempts, so it makes sense to increase the
backlog to some higher value. I see Ihar re-posted his patch doing that
here:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/

I'll take a look at it.

One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration. But we need
to be careful here as handling multiple initial monitor requests for the
database within a single iteration may be costly and may reduce overall
responsiveness of the server. Needs some research.

Having hundreds of leader-only clients for Nb still sounds a little
strange to me though.

Best regards, Ilya Maximets.
Re: [ovs-discuss] Segmentation fault on logical router nat entry addition at nbctl_lr_nat_add
On 4/4/24 01:44, Sri kor wrote:
> Hi Dumitru,
> I have been facing a segmentation fault every time I trigger
> lr-nat-add with dnat_and_snat. It is a distro package from CentOS and it
> is on Rocky 9.1.
>
>> # ovn-nbctl --no-leader-only lr-nat-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_3a735720_c52f_4e6c_856c_4a0dac424a1e
>> dnat_and_snat 91.106.223.172 10.100.140.137
>> Segmentation fault (core dumped)
>>
>> [root@ovn ~]# cat /etc/os-release
>> NAME="Rocky Linux"
>> VERSION="9.1 (Blue Onyx)"
>> ID="rocky"
>> ID_LIKE="rhel centos fedora"
>> VERSION_ID="9.1"
>> PLATFORM_ID="platform:el9"
>> PRETTY_NAME="Rocky Linux 9.1 (Blue Onyx)"
>> ANSI_COLOR="0;32"
>> LOGO="fedora-logo-icon"
>> CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
>> HOME_URL="https://rockylinux.org/"
>> BUG_REPORT_URL="https://bugs.rockylinux.org/"
>> ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
>> ROCKY_SUPPORT_PRODUCT_VERSION="9.1"
>> REDHAT_SUPPORT_PRODUCT="Rocky Linux"
>> REDHAT_SUPPORT_PRODUCT_VERSION="9.1"

It's still not clear which ovn version you're running exactly. Can you
please share the output of:

rpm -qa | grep ovn

Thanks!

> thanks
> Srini
>
> On Mon, Feb 19, 2024 at 6:25 AM Dumitru Ceara wrote:
>
>> On 2/13/24 00:10, Sri kor via discuss wrote:
>>> Hi Team,
>>> When I am trying to add a nat entry for an LR, ovn-nbctl cored. Here
>>> is the backtrace.
>>
>> Hi,
>>
>>> [root@ovnkube-db-0 ~]# ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>>> dnat_and_snat 91.106.221.145 10.10.10.10
>>> Segmentation fault (core dumped)
>>> [root@ovnkube-db-0 ~]# ovn-nbctl --version
>>> ovn-nbctl 23.09.1
>>> Open vSwitch Library 3.2.2
>>> DB Schema 7.1.0
>>> [root@ovnkube-db-0 ~]#
>>>
>>> (gdb) core-file core.ovn-nbctl.60392.1707778809
>>> [New LWP 60392]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-nbctl --no-leader-only lr-nat-add
>>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>> (gdb) bt
>>> #0  0x79ebeb4d958b in __strcmp_avx2 () from /lib64/libc.so.6
>>> #1  0x55f5ecdb1d39 in nbctl_lr_nat_add.lto_priv ()
>>> #2  0x55f5ecd9d39a in main_loop.lto_priv ()
>>> #3  0x55f5ecd998a2 in main ()
>>> (gdb)
>>
>> I tried this in a sandbox built from v23.09.1:
>>
>> $ ovn-nbctl lr-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>> $ ovn-nbctl --no-leader-only lr-nat-add
>> a_4a3e9209_8826_4561_9c58_4a852bd61c45_lr_a2b89a8a_530a_446f_bf6c_ecf223a7af22
>> dnat_and_snat 91.106.221.145 10.10.10.10
>> $
>>
>> And it doesn't seem to crash. Is OVN built from source in your case?
>> If so, can you please share the first part of config.log, e.g.:
>>
>> $ head -10 config.log
>>
>> If this is a distro provided package, can you please share the distro
>> and version?
>>
>> Thanks,
>> Dumitru