Re: After a while of system running no incoming UDP any more?
On Fri, Aug 11, 2017 at 04:34:53PM +0200, Marc Haber wrote: > On Fri, Jul 28, 2017 at 02:14:34PM +0200, Marc Haber wrote: > > I can confirm that these two changes make a system in bad state work > > again immediately. Will try the patch on 4.12.4 later today. > > After upgrading my test systems to 4.12.5, the issue reappeared. This > shows me that the patch indeed helped (my patched 4.12.4 kernels didn't > show the bad behavior), and that the patch didn't make its way into > 4.12.5. The patch applied to 4.12.5, kernels are building. It seems to be in the freshly released 4.12.6. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: After a while of system running no incoming UDP any more?
On Fri, Jul 28, 2017 at 02:14:34PM +0200, Marc Haber wrote: > I can confirm that these two changes make a system in bad state work > again immediately. Will try the patch on 4.12.4 later today. After upgrading my test systems to 4.12.5, the issue reappeared. This shows me that the patch indeed helped (my patched 4.12.4 kernels didn't show the bad behavior), and that the patch didn't make its way into 4.12.5. The patch applied to 4.12.5, kernels are building. The run-time fix works as well. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: After a while of system running no incoming UDP any more?
On Fri, Jul 28, 2017 at 10:07:57AM +0200, Paolo Abeni wrote: > Ad a workaround you can disable UDP early demux: > > echo 0 > /proc/sys/net/ipv4/udp_early_demux > > (will affect both ipv4 and ipv6). > > and (if the system is already into the bad state) increase the udp > accounted memory limit, writing in /proc/sys/net/ipv4/udp_mem greater > values than the current ones (the actual values depends on the system > total memory). I can confirm that these two changes make a system in bad state work again immediately. Will try the patch on 4.12.4 later today. Thanks for helping! Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: After a while of system running no incoming UDP any more?
On Fri, 2017-07-28 at 10:15 +0200, Paolo Abeni wrote: > Hi, > > On Fri, 2017-07-28 at 01:05 -0700, Eric Dumazet wrote: > I think that the leaked sockets are still unhashed on close() (via > udp_lib_close() -> sk_common_release() ->sk_proto->unhash()), so they > should not be listed there ?!? (Rush answer, subject to lack of coffee > issues) If the daemon is still running, sockets are not closed and we can watch refcount being increased. We can see this even on an UDP_RR workload (no UDP flood at all) lpk51:~# while : > do > ./super_netperf 10 -H 2002:af6:b34:: -t UDP_RR -l 1 -- -N > done lpk52:~# cat /proc/net/udp6|grep -v " 2 " sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops 293: F60A0220340B:F0A9 F60A0220330B:AB11 01 : 00: 00 583763 387 880ddb862c00 0 3251: F60A0220340B:FC37 F60A0220330B:EA7F 01 : 00: 00 586126 356 880dd25a7940 0 4043: F60A0220340B:FF4F F60A0220330B:9A35 01 0500: 00: 00 583760 409 880ddb863180 0 7832: F60A0220340B:8E1C F60A0220330B:DEC1 01 : 00: 00 583766 387 880ddb863700 0 8568: F60A0220340B:90FC F60A0220330B:8A60 01 : 00: 00 580369 379 880d9e7b2700 0 13085: F60A0220340B:A2A1 F60A0220330B:CDB3 01 : 00: 00 586915 397 880e31994140 0 14975: F60A0220340B:AA03 F60A0220330B:9C71 01 : 00: 00 586123 355 880d9d1baec0 0 15155: F60A0220340B:AAB7 F60A0220330B:D324 01 : 00: 00 583769 367 880dd88141c0 0 25421: F60A0220340B:D2D1 F60A0220330B:CA33 01 : 00: 00 586120 424 880dfda4b0c0 0 28774: F60A0220340B:DFEA F60A0220330B:DD05 01 0500: 00: 00 583757 411 880ddb862680 0
Re: After a while of system running no incoming UDP any more?
Hi, On Fri, 2017-07-28 at 01:05 -0700, Eric Dumazet wrote: > On Fri, 2017-07-28 at 08:26 +0200, Marc Haber wrote: > > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > > Once that a system enter the buggy status, do the packets reach the > > > relevant socket's queue? > > > > > > ss -u > > > nstat |grep -e Udp -e Ip > > > > > > will help checking that. > > > > I now have the issue on one machine, a Xen guest acting as authoritative > > nameserver for my domains. Here are the outputs during normal use, with > > artificial queries coming in on eth0: > > > > [9/1075]mh@impetus:~ $ ss -u > > Recv-Q Send-Q Local Address:Port > > Peer Address:Port > > > > 0 0 127.0.0.1:56547 > > 127.0.0.1:domain > > > > 0 0 216.231.132.60:27667 > > 198.41.0.4:domain > > > > 0 0 216.231.132.60:44121 > >8.8.8.8:domain > > > > 0 0 216.231.132.60:29814 > > 198.41.0.4:domain > > > > [10/1076]mh@impetus:~ $ ss -u > > Recv-Q Send-Q Local Address:Port > > Peer Address:Port > > > > [11/1076]mh@impetus:~ $ ss -u > > Recv-Q Send-Q Local Address:Port > > Peer Address:Port > > > > [12/1076]mh@impetus:~ $ ss -u > > Recv-Q Send-Q Local Address:Port > > Peer Address:Port > > > > [13/1076]mh@impetus:~ $ ss -u > > Recv-Q Send-Q Local Address:Port > > Peer Address:Port > > > > [14/1076]mh@impetus:~ $ nstat | grep -e Udp -e Ip > > IpInReceives400688 0.0 > > IpInAddrErrors 18567 0.0 > > IpInUnknownProtos 3 0.0 > > IpInDelivers330634 0.0 > > IpOutRequests 283637 0.0 > > UdpInDatagrams 145860 0.0 > > UdpNoPorts 1313 0.0 > > UdpInErrors 9356 0.0 > > UdpOutDatagrams 153093 0.0 > > UdpIgnoredMulti 34148 0.0 > > Ip6InReceives 161178 0.0 > > Ip6InNoRoutes 8 0.0 > > Ip6InDelivers 73841 0.0 > > Ip6OutRequests 77575 0.0 > > Ip6InMcastPkts 87332 0.0 > > Ip6OutMcastPkts 1090.0 > > Ip6InOctets 21880674 0.0 > > Ip6OutOctets96330590.0 > > Ip6InMcastOctets93714830.0 > > Ip6OutMcastOctets 6636 0.0 > > Ip6InNoECTPkts 161202 0.0 > > Ip6InECT1Pkts 15 0.0 > > Ip6InECT0Pkts 11 0.0 > > Ip6InCEPkts 4 0.0 > > Udp6InDatagrams 11725 0.0 > > Udp6NoPorts 2 0.0 > > Udp6InErrors1989 0.0 > > Udp6OutDatagrams14483 0.0 > > IpExtInBcastPkts34148 0.0 > > IpExtInOctets 47462716 0.0 > > IpExtOutOctets 31262696 0.0 > > IpExtInBcastOctets 74760590.0 > > IpExtInNoECTPkts400178 0.0 > > IpExtInECT1Pkts 22 0.0 > > IpExtInECT0Pkts 4810.0 > > IpExtInCEPkts 14 0.0 > > [15/1077]mh@impetus:~ $ nstat | grep -e Udp -e Ip > > IpInReceives25 0.0 > > IpInDelivers25 0.0 > > IpOutRequests 16 0.0 > > UdpInDatagrams 1 0.0 > > UdpInErrors 24 0.0 > > UdpOutDatagrams 16 0.0 > > Ip6InReceives 15 0.0 > > Ip6InDelivers 14 0.0 > > Ip6OutRe
Re: After a while of system running no incoming UDP any more?
Hi, On Fri, 2017-07-28 at 08:26 +0200, Marc Haber wrote: > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > Once that a system enter the buggy status, do the packets reach the > > relevant socket's queue? > > > > ss -u > > nstat |grep -e Udp -e Ip > > > > will help checking that. > > I now have the issue on one machine, a Xen guest acting as authoritative > nameserver for my domains. Here are the outputs during normal use, with > artificial queries coming in on eth0: > > [9/1075]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > 0 0 127.0.0.1:56547 >127.0.0.1:domain > 0 0 216.231.132.60:27667 > 198.41.0.4:domain > 0 0 216.231.132.60:44121 > 8.8.8.8:domain > 0 0 216.231.132.60:29814 > 198.41.0.4:domain > [10/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [11/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [12/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [13/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [14/1076]mh@impetus:~ $ nstat | grep -e Udp -e Ip > IpInReceives400688 0.0 > IpInAddrErrors 18567 0.0 > IpInUnknownProtos 3 0.0 > IpInDelivers330634 0.0 > IpOutRequests 283637 0.0 > UdpInDatagrams 145860 0.0 > UdpNoPorts 1313 0.0 > UdpInErrors 9356 0.0 Thanks for the info. This is compatible with what reported on: https://bugzilla.kernel.org/show_bug.cgi?id=196469 and should be fixed by this patch: http://marc.info/?l=linux-netdev&m=150115960024825&w=2 (approval pending) Ad a workaround you can disable UDP early demux: echo 0 > /proc/sys/net/ipv4/udp_early_demux (will affect both ipv4 and ipv6). and (if the system is already into the bad state) increase the udp accounted memory limit, writing in /proc/sys/net/ipv4/udp_mem greater values than the current ones (the actual values depends on the system total memory). Feel free to test the above patch on your systems. Cheers, Paolo
Re: After a while of system running no incoming UDP any more?
On Fri, 2017-07-28 at 08:26 +0200, Marc Haber wrote: > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > Once that a system enter the buggy status, do the packets reach the > > relevant socket's queue? > > > > ss -u > > nstat |grep -e Udp -e Ip > > > > will help checking that. > > I now have the issue on one machine, a Xen guest acting as authoritative > nameserver for my domains. Here are the outputs during normal use, with > artificial queries coming in on eth0: > > [9/1075]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > 0 0 127.0.0.1:56547 >127.0.0.1:domain > 0 0 216.231.132.60:27667 > 198.41.0.4:domain > 0 0 216.231.132.60:44121 > 8.8.8.8:domain > 0 0 216.231.132.60:29814 > 198.41.0.4:domain > [10/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [11/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [12/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [13/1076]mh@impetus:~ $ ss -u > Recv-Q Send-Q Local Address:Port > Peer Address:Port > [14/1076]mh@impetus:~ $ nstat | grep -e Udp -e Ip > IpInReceives400688 0.0 > IpInAddrErrors 18567 0.0 > IpInUnknownProtos 3 0.0 > IpInDelivers330634 0.0 > IpOutRequests 283637 0.0 > UdpInDatagrams 145860 0.0 > UdpNoPorts 1313 0.0 > UdpInErrors 9356 0.0 > UdpOutDatagrams 153093 0.0 > UdpIgnoredMulti 34148 0.0 > Ip6InReceives 161178 0.0 > Ip6InNoRoutes 8 0.0 > Ip6InDelivers 73841 0.0 > Ip6OutRequests 77575 0.0 > Ip6InMcastPkts 87332 0.0 > Ip6OutMcastPkts 1090.0 > Ip6InOctets 21880674 0.0 > Ip6OutOctets96330590.0 > Ip6InMcastOctets93714830.0 > Ip6OutMcastOctets 6636 0.0 > Ip6InNoECTPkts 161202 0.0 > Ip6InECT1Pkts 15 0.0 > Ip6InECT0Pkts 11 0.0 > Ip6InCEPkts 4 0.0 > Udp6InDatagrams 11725 0.0 > Udp6NoPorts 2 0.0 > Udp6InErrors1989 0.0 > Udp6OutDatagrams14483 0.0 > IpExtInBcastPkts34148 0.0 > IpExtInOctets 47462716 0.0 > IpExtOutOctets 31262696 0.0 > IpExtInBcastOctets 74760590.0 > IpExtInNoECTPkts400178 0.0 > IpExtInECT1Pkts 22 0.0 > IpExtInECT0Pkts 4810.0 > IpExtInCEPkts 14 0.0 > [15/1077]mh@impetus:~ $ nstat | grep -e Udp -e Ip > IpInReceives25 0.0 > IpInDelivers25 0.0 > IpOutRequests 16 0.0 > UdpInDatagrams 1 0.0 > UdpInErrors 24 0.0 > UdpOutDatagrams 16 0.0 > Ip6InReceives 15 0.0 > Ip6InDelivers 14 0.0 > Ip6OutRequests 12 0.0 > Ip6InMcastPkts 1 0.0 > Ip6InOctets 1219 0.0 > Ip6OutOctets4384 0.0 > Ip6InMcastOctets1310.
Re: After a while of system running no incoming UDP any more?
On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > Once that a system enter the buggy status, do the packets reach the > relevant socket's queue? > > ss -u > nstat |grep -e Udp -e Ip > > will help checking that. I now have the issue on one machine, a Xen guest acting as authoritative nameserver for my domains. Here are the outputs during normal use, with artificial queries coming in on eth0: [9/1075]mh@impetus:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port 0 0 127.0.0.1:56547 127.0.0.1:domain 0 0 216.231.132.60:27667 198.41.0.4:domain 0 0 216.231.132.60:44121 8.8.8.8:domain 0 0 216.231.132.60:29814 198.41.0.4:domain [10/1076]mh@impetus:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port [11/1076]mh@impetus:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port [12/1076]mh@impetus:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port [13/1076]mh@impetus:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port [14/1076]mh@impetus:~ $ nstat | grep -e Udp -e Ip IpInReceives400688 0.0 IpInAddrErrors 18567 0.0 IpInUnknownProtos 3 0.0 IpInDelivers330634 0.0 IpOutRequests 283637 0.0 UdpInDatagrams 145860 0.0 UdpNoPorts 1313 0.0 UdpInErrors 9356 0.0 UdpOutDatagrams 153093 0.0 UdpIgnoredMulti 34148 0.0 Ip6InReceives 161178 0.0 Ip6InNoRoutes 8 0.0 Ip6InDelivers 73841 0.0 Ip6OutRequests 77575 0.0 Ip6InMcastPkts 87332 0.0 Ip6OutMcastPkts 1090.0 Ip6InOctets 21880674 0.0 Ip6OutOctets96330590.0 Ip6InMcastOctets93714830.0 Ip6OutMcastOctets 6636 0.0 Ip6InNoECTPkts 161202 0.0 Ip6InECT1Pkts 15 0.0 Ip6InECT0Pkts 11 0.0 Ip6InCEPkts 4 0.0 Udp6InDatagrams 11725 0.0 Udp6NoPorts 2 0.0 Udp6InErrors1989 0.0 Udp6OutDatagrams14483 0.0 IpExtInBcastPkts34148 0.0 IpExtInOctets 47462716 0.0 IpExtOutOctets 31262696 0.0 IpExtInBcastOctets 74760590.0 IpExtInNoECTPkts400178 0.0 IpExtInECT1Pkts 22 0.0 IpExtInECT0Pkts 4810.0 IpExtInCEPkts 14 0.0 [15/1077]mh@impetus:~ $ nstat | grep -e Udp -e Ip IpInReceives25 0.0 IpInDelivers25 0.0 IpOutRequests 16 0.0 UdpInDatagrams 1 0.0 UdpInErrors 24 0.0 UdpOutDatagrams 16 0.0 Ip6InReceives 15 0.0 Ip6InDelivers 14 0.0 Ip6OutRequests 12 0.0 Ip6InMcastPkts 1 0.0 Ip6InOctets 1219 0.0 Ip6OutOctets4384 0.0 Ip6InMcastOctets1310.0 Ip6InNoECTPkts 15 0.0 IpExtInOctets 11779 0.0 IpExtOutOctets 1023 0.0 IpExtInNoECTPkts25 0.0 [16/1
Re: After a while of system running no incoming UDP any more?
On Wed, 2017-07-26 at 10:10 +0200, Marc Haber wrote: > On Tue, Jul 25, 2017 at 02:17:52PM +0200, Paolo Abeni wrote: > > On Tue, 2017-07-25 at 13:57 +0200, Marc Haber wrote: > > > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > > > Once that a system enter the buggy status, do the packets reach the > > > > relevant socket's queue? > > > > > > > > ss -u > > > > > > That one only shows table headers on an unaffected system in normal > > > operation, right? > > > > This one shows the current lenght of the socket receive queue (Recv-Q, > > the first column). If the packets land into the skbuff (and the user > > space reader for some reason is not woken up) such value will grow over > > time. > > Only that there is no value: > [4/4992]mh@swivel:~ $ ss -u > Recv-Q Send-Q Local Address:Port Peer Address:Port > > [5/4992]mh@swivel:~ $ > > (is that the intended behavior on a system thiat is not affected by the > issue?) That means there are no open UDP connected sockets in the system at the moment you run ss -u. I forgot to specify you must add also the '-a' command line option to the 'ss' tool to show all udp sockets regardless theis state: ss -ua Anyway this issue looks quite similar to: https://bugzilla.kernel.org/show_bug.cgi?id=196469 Which contains some more information. I suggest to follow-up on such bz. Cheers, Paolo
Re: After a while of system running no incoming UDP any more?
On Tue, Jul 25, 2017 at 02:17:52PM +0200, Paolo Abeni wrote: > On Tue, 2017-07-25 at 13:57 +0200, Marc Haber wrote: > > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > > Once that a system enter the buggy status, do the packets reach the > > > relevant socket's queue? > > > > > > ss -u > > > > That one only shows table headers on an unaffected system in normal > > operation, right? > > This one shows the current lenght of the socket receive queue (Recv-Q, > the first column). If the packets land into the skbuff (and the user > space reader for some reason is not woken up) such value will grow over > time. Only that there is no value: [4/4992]mh@swivel:~ $ ss -u Recv-Q Send-Q Local Address:Port Peer Address:Port [5/4992]mh@swivel:~ $ (is that the intended behavior on a system thiat is not affected by the issue?) > > > nstat |grep -e Udp -e Ip > > > > > > will help checking that. > > > > An unaffected system will show UdpInDatagrams, right? > > > > But where is the connection to the relevant socket's queue? > > If the socket queue lenght (as reported above) does not increase, > IP/UDP stats could give an hint of where and why the packets stop > traversing the network stack. We'll see. Still waiting for the phenomenon to show up again. > Beyond that, you can try using perf probes or kprobe/systemtap to [try > to] track the relevant packets inside the kernel. That's way beyond my kernel foo, I'm afraid. Thanks for helping, I'll report back. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: After a while of system running no incoming UDP any more?
On Tue, 2017-07-25 at 13:57 +0200, Marc Haber wrote: > On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > > Once that a system enter the buggy status, do the packets reach the > > relevant socket's queue? > > > > ss -u > > That one only shows table headers on an unaffected system in normal > operation, right? This one shows the current lenght of the socket receive queue (Recv-Q, the first column). If the packets land into the skbuff (and the user space reader for some reason is not woken up) such value will grow over time. > > nstat |grep -e Udp -e Ip > > > > will help checking that. > > An unaffected system will show UdpInDatagrams, right? > > But where is the connection to the relevant socket's queue? If the socket queue lenght (as reported above) does not increase, IP/UDP stats could give an hint of where and why the packets stop traversing the network stack. Beyond that, you can try using perf probes or kprobe/systemtap to [try to] track the relevant packets inside the kernel. /P
Re: After a while of system running no incoming UDP any more?
Hi Paolo, thanks for your answer. I appreciate that. On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote: > On Mon, 2017-07-24 at 14:09 +0200, Marc Haber wrote: > > Before I begin running older kernels on productive systems, I would like > > to ask wether there have been recent changes in the 4.11 => 4.12 > > development cycle that might cause an issue like that. > > While there has been some activity regarding the UDP protocol lately, > almost nothing touched UDP in the 4.11 release cycle. 4.11 is good, 4.12 is bad. > The issue you describe looks similar to the bug fixed by the commit > 9bd780f5e066 ("udp: fix poll()"), but the bugged code is only in later > kernels. That one is in v4.13-rc1 and v4.13-rc2, but it doesn't apply to my 4.12 trees. > > Any idea what might be happening here and what else I could try? > > Once that a system enter the buggy status, do the packets reach the > relevant socket's queue? > > ss -u That one only shows table headers on an unaffected system in normal operation, right? > nstat |grep -e Udp -e Ip > > will help checking that. An unaffected system will show UdpInDatagrams, right? But where is the connection to the relevant socket's queue? Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: After a while of system running no incoming UDP any more?
Hi, On Mon, 2017-07-24 at 14:09 +0200, Marc Haber wrote: > Before I begin running older kernels on productive systems, I would like > to ask wether there have been recent changes in the 4.11 => 4.12 > development cycle that might cause an issue like that. While there has been some activity regarding the UDP protocol lately, almost nothing touched UDP in the 4.11 release cycle. The issue you describe looks similar to the bug fixed by the commit 9bd780f5e066 ("udp: fix poll()"), but the bugged code is only in later kernels. > Any idea what might be happening here and what else I could try? Once that a system enter the buggy status, do the packets reach the relevant socket's queue? ss -u nstat |grep -e Udp -e Ip will help checking that. Thanks, Paol