Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-24 Thread One Thousand Gnomes
On Wed, 24 Aug 2016 11:22:09 +0300
"Dan Akunis"  wrote:

> When select wakes up on a UDP socket, user is expecting to get data. Getting 
> 0 from recvfrom() or whatever read function she uses, is a wrong attitude.
> I agree with David.
> 
> The unit test that expects select to wake up is wrong and should be changed.

The unit test is correct.

The behaviour of a 0 byte frame is actually well established and a 0 byte
data frame is a meaningful message is several protocols. It is distinct
from no pending data because that would report EWOULDBLOCK. It's more fun
with regard to EOF but UDP has no EOF semantics. If you want to
understand how to handle zero length datagrams in a connection oriented
protocol the old DECnet documentation covers it in all its pain 8)

Alan


Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-24 Thread Eric Dumazet
On Wed, 2016-08-24 at 11:22 +0300, Dan Akunis wrote:
> When select wakes up on a UDP socket, user is expecting to get data. Getting 
> 0 from recvfrom() or whatever read function she uses, is a wrong attitude.
> I agree with David.
> 
> The unit test that expects select to wake up is wrong and should be changed.
> 

Please do not top post on netdev mailing list.

Program is fine and wont be changed to work around a kernel bug.

We definitely can send and receive UDP messages with 0 payload.

So select() should unblock when one such frame is received, otherwise
you could fill up the receive queue with a lot of frames like that and
when SO_RCVBUF limit is reached, block future messages.

UDP is a datagram protocol.

Bug fix is merged :
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=e83c6744e81abc93a20d0eb3b7f504a176a6126a





Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-24 Thread Dan Akunis
When select wakes up on a UDP socket, user is expecting to get data. Getting 
0 from recvfrom() or whatever read function she uses, is a wrong attitude.

I agree with David.

The unit test that expects select to wake up is wrong and should be changed.

-Original Message- 
From: David Miller

Sent: Tuesday, August 23, 2016 9:25 PM
To: labb...@redhat.com
Cc: kuz...@ms2.inr.ac.ru ; jmor...@namei.org ; yoshf...@linux-ipv6.org ; 
ka...@trash.net ; samanthaku...@google.com ; will...@google.com ; 
netdev@vger.kernel.org ; linux-ker...@vger.kernel.org

Subject: Re: [REGRESSION] Select hang with zero sized UDP packets

From: Laura Abbott <labb...@redhat.com>
Date: Tue, 23 Aug 2016 10:53:26 -0700


Fedora received a report[1] of a unit test failing on Ruby when using
the
4.7 kernel. This was a test to send a zero sized UDP packet. With the
4.7 kernel, the test now timing out on a select instead of completing.
The reduced ruby test is

  def test_udp_recvfrom_nonblock
u1 = UDPSocket.new
u2 = UDPSocket.new
u1.bind("127.0.0.1", 0)
u2.send("", 0, u1.getsockname)
IO.select [u1]  # test gets stuck here
  ensure
u1.close if u1
u2.close if u2
  end


Well, if there is no data, should select really wake up?

I think it's valid not to. 



Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-23 Thread Eric Dumazet
On Tue, 2016-08-23 at 13:06 -0700, Laura Abbott wrote:

> 
> Fixes the test for me. You're welcome to take this as a Tested-by.

Thanks Laura, I will submit an official patch immediately.




Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-23 Thread Laura Abbott

On 08/23/2016 12:03 PM, Eric Dumazet wrote:

On Tue, 2016-08-23 at 11:25 -0700, David Miller wrote:

From: Laura Abbott 
Date: Tue, 23 Aug 2016 10:53:26 -0700


Fedora received a report[1] of a unit test failing on Ruby when using
the
4.7 kernel. This was a test to send a zero sized UDP packet. With the
4.7 kernel, the test now timing out on a select instead of completing.
The reduced ruby test is

  def test_udp_recvfrom_nonblock
u1 = UDPSocket.new
u2 = UDPSocket.new
u1.bind("127.0.0.1", 0)
u2.send("", 0, u1.getsockname)
IO.select [u1]  # test gets stuck here
  ensure
u1.close if u1
u2.close if u2
  end


Well, if there is no data, should select really wake up?

I think it's valid not to.

There are skb in receive queue, with skb->len = 0

This looks like a bug in first_packet_length() or poll logic.

Definitely something we can fix.

Maybe with :

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e61f7cd65d08..380c05a84041 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1184,11 +1184,11 @@ out:
  * Drops all bad checksum frames, until a valid one is found.
  * Returns the length of found skb, or 0 if none is found.
  */
-static unsigned int first_packet_length(struct sock *sk)
+static int first_packet_length(struct sock *sk)
 {
struct sk_buff_head list_kill, *rcvq = >sk_receive_queue;
struct sk_buff *skb;
-   unsigned int res;
+   int res;

__skb_queue_head_init(_kill);

@@ -1203,7 +1203,7 @@ static unsigned int first_packet_length(struct sock *sk)
__skb_unlink(skb, rcvq);
__skb_queue_tail(_kill, skb);
}
-   res = skb ? skb->len : 0;
+   res = skb ? skb->len : -1;
spin_unlock_bh(>lock);

if (!skb_queue_empty(_kill)) {
@@ -1232,7 +1232,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)

case SIOCINQ:
{
-   unsigned int amount = first_packet_length(sk);
+   int amount = max(0, first_packet_length(sk));

return put_user(amount, (int __user *)arg);
}
@@ -2184,7 +2184,7 @@ unsigned int udp_poll(struct file *file, struct socket 
*sock, poll_table *wait)

/* Check for false positives due to checksum errors */
if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
-   !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk))
+   !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
mask &= ~(POLLIN | POLLRDNORM);

return mask;




Fixes the test for me. You're welcome to take this as a Tested-by.

Thanks,
Laura


Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-23 Thread Eric Dumazet
On Tue, 2016-08-23 at 11:25 -0700, David Miller wrote:
> From: Laura Abbott 
> Date: Tue, 23 Aug 2016 10:53:26 -0700
> 
> > Fedora received a report[1] of a unit test failing on Ruby when using
> > the
> > 4.7 kernel. This was a test to send a zero sized UDP packet. With the
> > 4.7 kernel, the test now timing out on a select instead of completing.
> > The reduced ruby test is
> > 
> >   def test_udp_recvfrom_nonblock
> > u1 = UDPSocket.new
> > u2 = UDPSocket.new
> > u1.bind("127.0.0.1", 0)
> > u2.send("", 0, u1.getsockname)
> > IO.select [u1]  # test gets stuck here
> >   ensure
> > u1.close if u1
> > u2.close if u2
> >   end
> 
> Well, if there is no data, should select really wake up?
> 
> I think it's valid not to.
There are skb in receive queue, with skb->len = 0

This looks like a bug in first_packet_length() or poll logic.

Definitely something we can fix.

Maybe with :

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e61f7cd65d08..380c05a84041 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1184,11 +1184,11 @@ out:
  * Drops all bad checksum frames, until a valid one is found.
  * Returns the length of found skb, or 0 if none is found.
  */
-static unsigned int first_packet_length(struct sock *sk)
+static int first_packet_length(struct sock *sk)
 {
struct sk_buff_head list_kill, *rcvq = >sk_receive_queue;
struct sk_buff *skb;
-   unsigned int res;
+   int res;
 
__skb_queue_head_init(_kill);
 
@@ -1203,7 +1203,7 @@ static unsigned int first_packet_length(struct sock *sk)
__skb_unlink(skb, rcvq);
__skb_queue_tail(_kill, skb);
}
-   res = skb ? skb->len : 0;
+   res = skb ? skb->len : -1;
spin_unlock_bh(>lock);
 
if (!skb_queue_empty(_kill)) {
@@ -1232,7 +1232,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 
case SIOCINQ:
{
-   unsigned int amount = first_packet_length(sk);
+   int amount = max(0, first_packet_length(sk));
 
return put_user(amount, (int __user *)arg);
}
@@ -2184,7 +2184,7 @@ unsigned int udp_poll(struct file *file, struct socket 
*sock, poll_table *wait)
 
/* Check for false positives due to checksum errors */
if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
-   !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk))
+   !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
mask &= ~(POLLIN | POLLRDNORM);
 
return mask;




Re: [REGRESSION] Select hang with zero sized UDP packets

2016-08-23 Thread David Miller
From: Laura Abbott 
Date: Tue, 23 Aug 2016 10:53:26 -0700

> Fedora received a report[1] of a unit test failing on Ruby when using
> the
> 4.7 kernel. This was a test to send a zero sized UDP packet. With the
> 4.7 kernel, the test now timing out on a select instead of completing.
> The reduced ruby test is
> 
>   def test_udp_recvfrom_nonblock
> u1 = UDPSocket.new
> u2 = UDPSocket.new
> u1.bind("127.0.0.1", 0)
> u2.send("", 0, u1.getsockname)
> IO.select [u1]  # test gets stuck here
>   ensure
> u1.close if u1
> u2.close if u2
>   end

Well, if there is no data, should select really wake up?

I think it's valid not to.