Re: [REGRESSION] Select hang with zero sized UDP packets
On Wed, 24 Aug 2016 11:22:09 +0300 "Dan Akunis"wrote: > When select wakes up on a UDP socket, user is expecting to get data. Getting > 0 from recvfrom() or whatever read function she uses, is a wrong attitude. > I agree with David. > > The unit test that expects select to wake up is wrong and should be changed. The unit test is correct. The behaviour of a 0 byte frame is actually well established and a 0 byte data frame is a meaningful message is several protocols. It is distinct from no pending data because that would report EWOULDBLOCK. It's more fun with regard to EOF but UDP has no EOF semantics. If you want to understand how to handle zero length datagrams in a connection oriented protocol the old DECnet documentation covers it in all its pain 8) Alan
Re: [REGRESSION] Select hang with zero sized UDP packets
On Wed, 2016-08-24 at 11:22 +0300, Dan Akunis wrote: > When select wakes up on a UDP socket, user is expecting to get data. Getting > 0 from recvfrom() or whatever read function she uses, is a wrong attitude. > I agree with David. > > The unit test that expects select to wake up is wrong and should be changed. > Please do not top post on netdev mailing list. Program is fine and wont be changed to work around a kernel bug. We definitely can send and receive UDP messages with 0 payload. So select() should unblock when one such frame is received, otherwise you could fill up the receive queue with a lot of frames like that and when SO_RCVBUF limit is reached, block future messages. UDP is a datagram protocol. Bug fix is merged : https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=e83c6744e81abc93a20d0eb3b7f504a176a6126a
Re: [REGRESSION] Select hang with zero sized UDP packets
When select wakes up on a UDP socket, user is expecting to get data. Getting 0 from recvfrom() or whatever read function she uses, is a wrong attitude. I agree with David. The unit test that expects select to wake up is wrong and should be changed. -Original Message- From: David Miller Sent: Tuesday, August 23, 2016 9:25 PM To: labb...@redhat.com Cc: kuz...@ms2.inr.ac.ru ; jmor...@namei.org ; yoshf...@linux-ipv6.org ; ka...@trash.net ; samanthaku...@google.com ; will...@google.com ; netdev@vger.kernel.org ; linux-ker...@vger.kernel.org Subject: Re: [REGRESSION] Select hang with zero sized UDP packets From: Laura Abbott <labb...@redhat.com> Date: Tue, 23 Aug 2016 10:53:26 -0700 Fedora received a report[1] of a unit test failing on Ruby when using the 4.7 kernel. This was a test to send a zero sized UDP packet. With the 4.7 kernel, the test now timing out on a select instead of completing. The reduced ruby test is def test_udp_recvfrom_nonblock u1 = UDPSocket.new u2 = UDPSocket.new u1.bind("127.0.0.1", 0) u2.send("", 0, u1.getsockname) IO.select [u1] # test gets stuck here ensure u1.close if u1 u2.close if u2 end Well, if there is no data, should select really wake up? I think it's valid not to.
Re: [REGRESSION] Select hang with zero sized UDP packets
On Tue, 2016-08-23 at 13:06 -0700, Laura Abbott wrote: > > Fixes the test for me. You're welcome to take this as a Tested-by. Thanks Laura, I will submit an official patch immediately.
Re: [REGRESSION] Select hang with zero sized UDP packets
On 08/23/2016 12:03 PM, Eric Dumazet wrote: On Tue, 2016-08-23 at 11:25 -0700, David Miller wrote: From: Laura AbbottDate: Tue, 23 Aug 2016 10:53:26 -0700 Fedora received a report[1] of a unit test failing on Ruby when using the 4.7 kernel. This was a test to send a zero sized UDP packet. With the 4.7 kernel, the test now timing out on a select instead of completing. The reduced ruby test is def test_udp_recvfrom_nonblock u1 = UDPSocket.new u2 = UDPSocket.new u1.bind("127.0.0.1", 0) u2.send("", 0, u1.getsockname) IO.select [u1] # test gets stuck here ensure u1.close if u1 u2.close if u2 end Well, if there is no data, should select really wake up? I think it's valid not to. There are skb in receive queue, with skb->len = 0 This looks like a bug in first_packet_length() or poll logic. Definitely something we can fix. Maybe with : diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e61f7cd65d08..380c05a84041 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1184,11 +1184,11 @@ out: * Drops all bad checksum frames, until a valid one is found. * Returns the length of found skb, or 0 if none is found. */ -static unsigned int first_packet_length(struct sock *sk) +static int first_packet_length(struct sock *sk) { struct sk_buff_head list_kill, *rcvq = >sk_receive_queue; struct sk_buff *skb; - unsigned int res; + int res; __skb_queue_head_init(_kill); @@ -1203,7 +1203,7 @@ static unsigned int first_packet_length(struct sock *sk) __skb_unlink(skb, rcvq); __skb_queue_tail(_kill, skb); } - res = skb ? skb->len : 0; + res = skb ? skb->len : -1; spin_unlock_bh(>lock); if (!skb_queue_empty(_kill)) { @@ -1232,7 +1232,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg) case SIOCINQ: { - unsigned int amount = first_packet_length(sk); + int amount = max(0, first_packet_length(sk)); return put_user(amount, (int __user *)arg); } @@ -2184,7 +2184,7 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait) /* Check for false positives due to checksum errors */ if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) && - !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk)) + !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) mask &= ~(POLLIN | POLLRDNORM); return mask; Fixes the test for me. You're welcome to take this as a Tested-by. Thanks, Laura
Re: [REGRESSION] Select hang with zero sized UDP packets
On Tue, 2016-08-23 at 11:25 -0700, David Miller wrote: > From: Laura Abbott> Date: Tue, 23 Aug 2016 10:53:26 -0700 > > > Fedora received a report[1] of a unit test failing on Ruby when using > > the > > 4.7 kernel. This was a test to send a zero sized UDP packet. With the > > 4.7 kernel, the test now timing out on a select instead of completing. > > The reduced ruby test is > > > > def test_udp_recvfrom_nonblock > > u1 = UDPSocket.new > > u2 = UDPSocket.new > > u1.bind("127.0.0.1", 0) > > u2.send("", 0, u1.getsockname) > > IO.select [u1] # test gets stuck here > > ensure > > u1.close if u1 > > u2.close if u2 > > end > > Well, if there is no data, should select really wake up? > > I think it's valid not to. There are skb in receive queue, with skb->len = 0 This looks like a bug in first_packet_length() or poll logic. Definitely something we can fix. Maybe with : diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e61f7cd65d08..380c05a84041 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1184,11 +1184,11 @@ out: * Drops all bad checksum frames, until a valid one is found. * Returns the length of found skb, or 0 if none is found. */ -static unsigned int first_packet_length(struct sock *sk) +static int first_packet_length(struct sock *sk) { struct sk_buff_head list_kill, *rcvq = >sk_receive_queue; struct sk_buff *skb; - unsigned int res; + int res; __skb_queue_head_init(_kill); @@ -1203,7 +1203,7 @@ static unsigned int first_packet_length(struct sock *sk) __skb_unlink(skb, rcvq); __skb_queue_tail(_kill, skb); } - res = skb ? skb->len : 0; + res = skb ? skb->len : -1; spin_unlock_bh(>lock); if (!skb_queue_empty(_kill)) { @@ -1232,7 +1232,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg) case SIOCINQ: { - unsigned int amount = first_packet_length(sk); + int amount = max(0, first_packet_length(sk)); return put_user(amount, (int __user *)arg); } @@ -2184,7 +2184,7 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait) /* Check for false positives due to checksum errors */ if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) && - !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk)) + !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) mask &= ~(POLLIN | POLLRDNORM); return mask;
Re: [REGRESSION] Select hang with zero sized UDP packets
From: Laura AbbottDate: Tue, 23 Aug 2016 10:53:26 -0700 > Fedora received a report[1] of a unit test failing on Ruby when using > the > 4.7 kernel. This was a test to send a zero sized UDP packet. With the > 4.7 kernel, the test now timing out on a select instead of completing. > The reduced ruby test is > > def test_udp_recvfrom_nonblock > u1 = UDPSocket.new > u2 = UDPSocket.new > u1.bind("127.0.0.1", 0) > u2.send("", 0, u1.getsockname) > IO.select [u1] # test gets stuck here > ensure > u1.close if u1 > u2.close if u2 > end Well, if there is no data, should select really wake up? I think it's valid not to.