Thanks for the sysctl recommendations! I'll start by changing the net.core.rmem_* and wmem_* values first, to make it easier to identify the solution. Along those lines ...
I can't seem to find confirmation, but I assume net.core.rmem_* and wmem_* are the only params that will have any effect on netlink sockets. The net.ipv4.* params seem to be specific to the AF_INET address family and shouldn't affect netlink. And judging by the name, I'm also assuming net.core.netdev_max_backlog only affects packets coming through real networking devices, not netlink. Does that sound correct to you?

"Errno 105" is in the error message I posted initially ... Is that what you're looking for? It seems to correspond to ENOBUFS.

Thanks again, Paul.

On Fri, Jan 25, 2019 at 10:08 PM Paul Wouters <[email protected]> wrote:

> On Fri, 25 Jan 2019, Alan Szlosek wrote:
>
> > We're using Libreswan 3.25 (netkey) on Linux 4.15.0-1020-aws and are
> > seeing the following error pop up in our pluto logs from time to time,
> > sometimes several per hour.
> >
> > ERROR: recvfrom() failed in netlink_get. Errno 105: No buffer space
> > available
>
> That looks like the kernel ran out of memory perhaps?
>
> > Our CPU usage has stayed below 40%, and memory usage has stayed below
> > 50%.
>
> But you say that's not the case. hmm. odd.
>
> > Is there some value we can tune with sysctl that will affect the buffer
> > associated with these netlink sockets?
>
> You can try something like:
>
> # /etc/sysctl.d/pwouters-highspeed.conf
> # increase TCP max buffer size setable using setsockopt()
> net.core.rmem_max = 536870912
> net.core.wmem_max = 536870912
> # increase Linux autotuning TCP buffer limit
> net.ipv4.tcp_rmem = 16384 349520 16777216
> net.ipv4.tcp_wmem = 16384 349520 16777216
> # recommended to increase this for CentOS6 with 10G NICS or higher
> net.core.netdev_max_backlog = 250000
> # don't cache ssthresh from previous connection
> net.ipv4.tcp_no_metrics_save = 1
> # If you are using Jumbo Frames, also set this
> net.ipv4.tcp_mtu_probing = 1
> # recommended for CentOS7/Debian8 hosts
> net.core.default_qdisc = fq
>
> > What else should we be considering?
>
> You can try git master or apply this patch which should at least give us
> the proper failure code from recvfrom():
>
> https://github.com/libreswan/libreswan/commit/9482cbfd03bee42aa8ad4f0b7a2c3f84d02cf550
>
> Paul
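
For what it's worth, here is a rough sketch of how the two assumptions above could be checked from userspace. It is only an illustration, not anything taken from pluto: the helper names errno_name and rcvbuf_sizes are made up for this example, and it assumes that a netlink socket's receive buffer follows the generic SO_RCVBUF behaviour described in socket(7) (default taken from net.core.rmem_default, setsockopt() requests capped by net.core.rmem_max). NETLINK_ROUTE is used only because it is generally available; pluto itself talks to a different netlink protocol.

```python
import errno
import socket


def errno_name(code):
    """Return the symbolic name for an errno value on this platform."""
    return errno.errorcode.get(code, "unknown")


def rcvbuf_sizes(family, sock_type, proto=0, requested=None):
    """Open a throwaway socket and return its SO_RCVBUF as (before, after).

    If `requested` is given, ask for that size with setsockopt() first.
    The kernel may adjust or cap the value it actually applies, so the
    second number is whatever getsockopt() reports afterwards.
    """
    with socket.socket(family, sock_type, proto) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        if requested is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after


if __name__ == "__main__":
    # Which symbolic name does this platform give to errno 105?
    print("errno 105 is", errno_name(105))

    # AF_NETLINK only exists on Linux, so guard the netlink probe.
    if hasattr(socket, "AF_NETLINK"):
        try:
            sizes = rcvbuf_sizes(
                socket.AF_NETLINK,
                socket.SOCK_RAW,
                socket.NETLINK_ROUTE,
                requested=4 * 1024 * 1024,  # arbitrary 4 MiB request
            )
            print("netlink SO_RCVBUF (before, after):", sizes)
        except OSError as exc:
            print("could not open a netlink socket:", exc)
```

Running it before and after changing net.core.rmem_default and net.core.rmem_max should show whether the numbers reported for the netlink socket move with those sysctls. If they do, that would support the guess that the net.core.* values are the ones that matter here.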
_______________________________________________
Swan mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan
