subject:"\[PATCH\/RFC\] Re\: recvmmsg\(\) timeout behavior strangeness \[RESEND\]"

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-27 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/28/2014 02:20 PM, Michael Kerrisk (man-pages) wrote:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
>> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
>> escreveu:
>>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>>  wrote:
 Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>> Can you try the attached patch on top of the first one?

> Patches on patches is a way to make your testers work unnecessarily
> harder. Also, it means that anyone else who was interested in this

 It was meant to highlight the changes with regard to the previous patch,
 i.e. to make things easier for reviewing.
>>>
>>> (I don't think that works...)
>>
>> Lets try both then, 
> 
> That's better!
> 
>> attached goes the updated patch, and this is the
>> diff to the last combined one:
>>
>> diff --git a/net/socket.c b/net/socket.c
>> index 310a50971769..379be43879db 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
>> __user *, mmsg,
>>  
>>  datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, _sys);
>>  
>> -if (datagrams > 0 &&
>> -copy_to_user(timeout, _sys, sizeof(timeout_sys)))
>> +if (copy_to_user(timeout, _sys, sizeof(timeout_sys)))
>>  datagrams = -EFAULT;
>>  
>>  return datagrams;
>>  
>> --
>>
>> This is a quick thing just to show where the problem lies, need to think
>> how to report an -EFAULT at this point properly, i.e. look at
>> __sys_recvmmsg for something related (returning the number of
>> successfully copied datagrams to userspace while storing the error for
>> subsequent reporting):
>>
>> if (err == 0)
>> return datagrams;
>>
>> if (datagrams != 0) {
>> /*
>>  * We may return less entries than requested (vlen) if
>>  * the
>>  * sock is non block and there aren't enough
>>  * datagrams...
>>  */
>> if (err != -EAGAIN) {
>> /*
>>  * ... or  if recvmsg returns an error after we
>>  * received some datagrams, where we record the
>>  * error to return on the next call or if the
>>  * app asks about it using getsockopt(SO_ERROR).
>>  */
>> sock->sk->sk_err = -err;
>> }
>>
>> return datagrams;
>> }
>>
>> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
>> more about it, sidetracked now, will be back to this.
>>
>> Anyway, attached goes the current combined patch.
> 
> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:
> 
> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.
> 
> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

So, returning to your recvmmsg-timeout-v3.patch. I think the behavior as
implemented, and described above is okay.

> But, another question...
> 
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to
> check. There is at least one other possibility:
>
> * Fetch no datagrams (i.e., the datagrams are left to receive in a
>   future call), and the call fails with EINTR, and 'timeout' is updated.
> 
> Maybe that possibility is hard to implement (not sure). But my main point
> is to make the current behavior clear, note the alternative, and ask:
> is the current behavior the best choice. (I'm not saying it's not, but I
> do want the choice to be a conscious one.)

So, I think (can't find the mail right now) that you explained elsewhere
that the above would be hard to implement. And in any case, I'm not sure
it's desirable; I only wanted to check that the choice was a deliberate one.

However, there is still a weirdness, which relates to the discussion you
and David Laight had. 

Suppose the following scenario.

1. We do a recvmmsg() with 10 second timeout, asking for 5 messages.
2. 3 messages arrive
3. 6 seconds after the call, a signal handler interrupts the call.
4. recvmmsg() returns success, telling us it got 3 messages.

So far, so good. But

5. We make a further recvmmsg() call.
6. That call returns immediately, with an EINTR error.

That really should not be happening. As noted elsewhere in this
thread, EINTR is a property of a specific system call,

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-27 Thread Michael Kerrisk (man-pages)

On 06/24/2014 10:25 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) 
> escreveu:
>> Hi Arnaldo,
>>
>> Things have gone quiet ;-). What's the current state of this patch?
> 
> Yeah, I kept meaning to prod the other people on this thread about what
> they thought about my last messages, patches, etc. :-)
> 
> Can I have acked-by or even tested-by on those? Is it ok?

I just need to go back and test one point that sounds like it might still be 
broken.

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-27 Thread Michael Kerrisk (man-pages)

On 06/24/2014 10:25 PM, Arnaldo Carvalho de Melo wrote:
 Em Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 Hi Arnaldo,

 Things have gone quiet ;-). What's the current state of this patch?
 
 Yeah, I kept meaning to prod the other people on this thread about what
 they thought about my last messages, patches, etc. :-)
 
 Can I have acked-by or even tested-by on those? Is it ok?

I just need to go back and test one point that sounds like it might still be 
broken.

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-27 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/28/2014 02:20 PM, Michael Kerrisk (man-pages) wrote:
 On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
 Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
 a...@ghostprotocols.net wrote:
 Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
 Can you try the attached patch on top of the first one?

 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this

 It was meant to highlight the changes with regard to the previous patch,
 i.e. to make things easier for reviewing.

 (I don't think that works...)

 Lets try both then, 
 
 That's better!
 
 attached goes the updated patch, and this is the
 diff to the last combined one:

 diff --git a/net/socket.c b/net/socket.c
 index 310a50971769..379be43879db 100644
 --- a/net/socket.c
 +++ b/net/socket.c
 @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
 __user *, mmsg,
  
  datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, timeout_sys);
  
 -if (datagrams  0 
 -copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
 +if (copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
  datagrams = -EFAULT;
  
  return datagrams;
  
 --

 This is a quick thing just to show where the problem lies, need to think
 how to report an -EFAULT at this point properly, i.e. look at
 __sys_recvmmsg for something related (returning the number of
 successfully copied datagrams to userspace while storing the error for
 subsequent reporting):

 if (err == 0)
 return datagrams;

 if (datagrams != 0) {
 /*
  * We may return less entries than requested (vlen) if
  * the
  * sock is non block and there aren't enough
  * datagrams...
  */
 if (err != -EAGAIN) {
 /*
  * ... or  if recvmsg returns an error after we
  * received some datagrams, where we record the
  * error to return on the next call or if the
  * app asks about it using getsockopt(SO_ERROR).
  */
 sock-sk-sk_err = -err;
 }

 return datagrams;
 }

 I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
 more about it, sidetracked now, will be back to this.

 Anyway, attached goes the current combined patch.
 
 So, I applied against net-next as you suggested offlist.
 Builds and generally tests fine. Some observations:
 
 * In the case that the call is interrupted by a signal handler and no
   datagrams have been received, the call fails with EINTR, as expected.
 
 * The call always updates 'timeout', both in the success case and in the
   EINTR case. (That seems fine.)

So, returning to your recvmmsg-timeout-v3.patch. I think the behavior as
implemented, and described above is okay.

 But, another question...
 
 In the case that the call is interrupted by a signal handler and some
 datagrams have already been received, then the call succeeds, and
 returns the number of datagrams received, and 'timeout' is updated with
 the remaining time. Maybe that's the right behavior, but I just want to
 check. There is at least one other possibility:

 * Fetch no datagrams (i.e., the datagrams are left to receive in a
   future call), and the call fails with EINTR, and 'timeout' is updated.
 
 Maybe that possibility is hard to implement (not sure). But my main point
 is to make the current behavior clear, note the alternative, and ask:
 is the current behavior the best choice. (I'm not saying it's not, but I
 do want the choice to be a conscious one.)

So, I think (can't find the mail right now) that you explained elsewhere
that the above would be hard to implement. And in any case, I'm not sure
it's desirable; I only wanted to check that the choice was a deliberate one.

However, there is still a weirdness, which relates to the discussion you
and David Laight had. 

Suppose the following scenario.

1. We do a recvmmsg() with 10 second timeout, asking for 5 messages.
2. 3 messages arrive
3. 6 seconds after the call, a signal handler interrupts the call.
4. recvmmsg() returns success, telling us it got 3 messages.

So far, so good. But

5. We make a further recvmmsg() call.
6. That call returns immediately, with an EINTR error.

That really should not be happening. As noted elsewhere in this
thread, EINTR is a property of a specific system call, not of the
thread or the socket. By the time of step 5, the kernel should
already have forgotten about the signal that occurred at step 3.
I don't think I saw

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-24 Thread Arnaldo Carvalho de Melo

Em Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) escreveu:
> Hi Arnaldo,
> 
> Things have gone quiet ;-). What's the current state of this patch?

Yeah, I kept meaning to prod the other people on this thread about what
they thought about my last messages, patches, etc. :-)

Can I have acked-by or even tested-by on those? Is it ok?

- Arnaldo

> Thanks,
> 
> Michael
> 
> 
> On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
>  wrote:
> > Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> >> From: 'Arnaldo Carvalho de Melo'
> >> ...
> >> > > I remember some discussions from an XNET standards meeting (I've 
> >> > > forgotten
> >> > > exactly which errors on which calls were being discussed).
> >> > > My recollection is that you return success with a partial transfer
> >> > > count for ANY error that happens after some data has been transferred.
> >> > > The actual error will be returned when it happens again on the next
> >> > > system call - Note the AGAIN, not a saved error.
> >
> >> > A saved error, for the right entity, in the recvmmsg case, that
> >> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> >> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> >> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> >> > subsystems) overhead.
> >
> >> > Perhaps we can have something in between, i.e. for things like EFAULT,
> >> > we should report straight away, effectively dropping whatever datagrams
> >> > successfully received in the current batch, do you agree?
> >
> >> Not unreasonable - EFAULT shouldn't happen unless the application
> >> is buggy.
> >
> > Ok.
> >
> >> > For transient errors the existing mechanism, fixed so that only per
> >> > socket errors are saved for later, as today, could be kept?
> >
> >> I don't think it is ever necessary to save an errno value for the
> >> next system call at all.
> >> Just process the next system call and see what happens.
> >
> >> If the call returns with less than the maximum number of datagrams
> >> and with a non-zero timeout left - then the application can infer
> >> that it was terminated by an abnormal event of some kind.
> >> This might be a signal.
> >
> > Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> > error on the next call, but we provide a way for the app to retrieve the
> > reason for the smaller than expected batch?
> >
> >> I'm not sure if an icmp error on a connected datagram socket could
> >> generate a 'disconnect'. It might happen if the interface is being
> >> used for something like SCTP.
> >> In either case the next call will detect the error.
> >
> > - Arnaldo
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-24 Thread Arnaldo Carvalho de Melo

Em Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) escreveu:
 Hi Arnaldo,
 
 Things have gone quiet ;-). What's the current state of this patch?

Yeah, I kept meaning to prod the other people on this thread about what
they thought about my last messages, patches, etc. :-)

Can I have acked-by or even tested-by on those? Is it ok?

- Arnaldo

 Thanks,
 
 Michael
 
 
 On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
 a...@ghostprotocols.net wrote:
  Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
  From: 'Arnaldo Carvalho de Melo'
  ...
I remember some discussions from an XNET standards meeting (I've 
forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.
 
   A saved error, for the right entity, in the recvmmsg case, that
   basically is batching multiple recvmsg syscalls, doesn't sound like a
   problem, i.e. the idea is to, as much as possible, mimic what multiple
   recvmsg calls would do, but reduce its in/out kernel (and inside kernel
   subsystems) overhead.
 
   Perhaps we can have something in between, i.e. for things like EFAULT,
   we should report straight away, effectively dropping whatever datagrams
   successfully received in the current batch, do you agree?
 
  Not unreasonable - EFAULT shouldn't happen unless the application
  is buggy.
 
  Ok.
 
   For transient errors the existing mechanism, fixed so that only per
   socket errors are saved for later, as today, could be kept?
 
  I don't think it is ever necessary to save an errno value for the
  next system call at all.
  Just process the next system call and see what happens.
 
  If the call returns with less than the maximum number of datagrams
  and with a non-zero timeout left - then the application can infer
  that it was terminated by an abnormal event of some kind.
  This might be a signal.
 
  Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
  error on the next call, but we provide a way for the app to retrieve the
  reason for the smaller than expected batch?
 
  I'm not sure if an icmp error on a connected datagram socket could
  generate a 'disconnect'. It might happen if the interface is being
  used for something like SCTP.
  In either case the next call will detect the error.
 
  - Arnaldo
 
 
 
 -- 
 Michael Kerrisk
 Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
 Linux/UNIX System Programming Training: http://man7.org/training/
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-16 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

Things have gone quiet ;-). What's the current state of this patch?

Thanks,

Michael


On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
 wrote:
> Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
>> From: 'Arnaldo Carvalho de Melo'
>> ...
>> > > I remember some discussions from an XNET standards meeting (I've 
>> > > forgotten
>> > > exactly which errors on which calls were being discussed).
>> > > My recollection is that you return success with a partial transfer
>> > > count for ANY error that happens after some data has been transferred.
>> > > The actual error will be returned when it happens again on the next
>> > > system call - Note the AGAIN, not a saved error.
>
>> > A saved error, for the right entity, in the recvmmsg case, that
>> > basically is batching multiple recvmsg syscalls, doesn't sound like a
>> > problem, i.e. the idea is to, as much as possible, mimic what multiple
>> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
>> > subsystems) overhead.
>
>> > Perhaps we can have something in between, i.e. for things like EFAULT,
>> > we should report straight away, effectively dropping whatever datagrams
>> > successfully received in the current batch, do you agree?
>
>> Not unreasonable - EFAULT shouldn't happen unless the application
>> is buggy.
>
> Ok.
>
>> > For transient errors the existing mechanism, fixed so that only per
>> > socket errors are saved for later, as today, could be kept?
>
>> I don't think it is ever necessary to save an errno value for the
>> next system call at all.
>> Just process the next system call and see what happens.
>
>> If the call returns with less than the maximum number of datagrams
>> and with a non-zero timeout left - then the application can infer
>> that it was terminated by an abnormal event of some kind.
>> This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?
>
>> I'm not sure if an icmp error on a connected datagram socket could
>> generate a 'disconnect'. It might happen if the interface is being
>> used for something like SCTP.
>> In either case the next call will detect the error.
>
> - Arnaldo



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-16 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

Things have gone quiet ;-). What's the current state of this patch?

Thanks,

Michael


On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
a...@ghostprotocols.net wrote:
 Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
 From: 'Arnaldo Carvalho de Melo'
 ...
   I remember some discussions from an XNET standards meeting (I've 
   forgotten
   exactly which errors on which calls were being discussed).
   My recollection is that you return success with a partial transfer
   count for ANY error that happens after some data has been transferred.
   The actual error will be returned when it happens again on the next
   system call - Note the AGAIN, not a saved error.

  A saved error, for the right entity, in the recvmmsg case, that
  basically is batching multiple recvmsg syscalls, doesn't sound like a
  problem, i.e. the idea is to, as much as possible, mimic what multiple
  recvmsg calls would do, but reduce its in/out kernel (and inside kernel
  subsystems) overhead.

  Perhaps we can have something in between, i.e. for things like EFAULT,
  we should report straight away, effectively dropping whatever datagrams
  successfully received in the current batch, do you agree?

 Not unreasonable - EFAULT shouldn't happen unless the application
 is buggy.

 Ok.

  For transient errors the existing mechanism, fixed so that only per
  socket errors are saved for later, as today, could be kept?

 I don't think it is ever necessary to save an errno value for the
 next system call at all.
 Just process the next system call and see what happens.

 If the call returns with less than the maximum number of datagrams
 and with a non-zero timeout left - then the application can infer
 that it was terminated by an abnormal event of some kind.
 This might be a signal.

 Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
 error on the next call, but we provide a way for the app to retrieve the
 reason for the smaller than expected batch?

 I'm not sure if an icmp error on a connected datagram socket could
 generate a 'disconnect'. It might happen if the interface is being
 used for something like SCTP.
 In either case the next call will detect the error.

 - Arnaldo



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH/RFC] Handle EFAULT in partial recvmmsg was Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 11:17:05AM -0300, 'Arnaldo Carvalho de Melo' escreveu:
> Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've 
> > > > forgotten
> > > > exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
 
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
 
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
  
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
 
> Ok.

So the patch below should handle it, and record that the packets were
dropped, not at the transport level, like UDP_MIB_INERRORS, for
instance, would indicate, but at the batching, recvmmsg level, so
perhaps we'll need a MIB variable for that.

Also a counterpart to the trace_kfree_skb(skb, udp_recvmsg) tracepoint
for dropwatch and similar tools to use, Neil?

I'm keeping this separate from the timeout update patch.

- Arnaldo

diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..63491f015912 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2415,13 +2415,17 @@ out_put:
return datagrams;
 
if (datagrams != 0) {
+   if (err == -EFAULT) {
+   atomic_add(datagrams, >sk->sk_drops);
+   return -EFAULT;
+   }
/*
 * We may return less entries than requested (vlen) if the
 * sock is non block and there aren't enough datagrams...
 */
if (err != -EAGAIN) {
/*
-* ... or  if recvmsg returns an error after we
+* ... or if recvmsg returns a socket error after we
 * received some datagrams, where we record the
 * error to return on the next call or if the
 * app asks about it using getsockopt(SO_ERROR).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de Melo'
> Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've 
> > > > forgotten
> > > > exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
> 
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
> 
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
> 
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
> 
> Ok.
> 
> > > For transient errors the existing mechanism, fixed so that only per
> > > socket errors are saved for later, as today, could be kept?
> 
> > I don't think it is ever necessary to save an errno value for the
> > next system call at all.
> > Just process the next system call and see what happens.
> 
> > If the call returns with less than the maximum number of datagrams
> > and with a non-zero timeout left - then the application can infer
> > that it was terminated by an abnormal event of some kind.
> > This might be a signal.
> 
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the
control structure.
But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout
was used as an inter-datagram timer instead of an overall timeout.

I suspect that most application won't actually care about the
'time left', nor the actual number of returned datagrams.
They will just process what they are given and then wait for
the next batch.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> From: 'Arnaldo Carvalho de Melo'
> ...
> > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > exactly which errors on which calls were being discussed).
> > > My recollection is that you return success with a partial transfer
> > > count for ANY error that happens after some data has been transferred.
> > > The actual error will be returned when it happens again on the next
> > > system call - Note the AGAIN, not a saved error.

> > A saved error, for the right entity, in the recvmmsg case, that
> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > subsystems) overhead.

> > Perhaps we can have something in between, i.e. for things like EFAULT,
> > we should report straight away, effectively dropping whatever datagrams
> > successfully received in the current batch, do you agree?
 
> Not unreasonable - EFAULT shouldn't happen unless the application
> is buggy.

Ok.
 
> > For transient errors the existing mechanism, fixed so that only per
> > socket errors are saved for later, as today, could be kept?
 
> I don't think it is ever necessary to save an errno value for the
> next system call at all.
> Just process the next system call and see what happens.
 
> If the call returns with less than the maximum number of datagrams
> and with a non-zero timeout left - then the application can infer
> that it was terminated by an abnormal event of some kind.
> This might be a signal.

Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
error on the next call, but we provide a way for the app to retrieve the
reason for the smaller than expected batch?

> I'm not sure if an icmp error on a connected datagram socket could
> generate a 'disconnect'. It might happen if the interface is being
> used for something like SCTP.
> In either case the next call will detect the error.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de Melo'
...
> > I remember some discussions from an XNET standards meeting (I've forgotten
> > exactly which errors on which calls were being discussed).
> > My recollection is that you return success with a partial transfer
> > count for ANY error that happens after some data has been transferred.
> > The actual error will be returned when it happens again on the next
> > system call - Note the AGAIN, not a saved error.
> 
> A saved error, for the right entity, in the recvmmsg case, that
> basically is batching multiple recvmsg syscalls, doesn't sound like a
> problem, i.e. the idea is to, as much as possible, mimic what multiple
> recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> subsystems) overhead.
> 
> Perhaps we can have something in between, i.e. for things like EFAULT,
> we should report straight away, effectively dropping whatever datagrams
> successfully received in the current batch, do you agree?

Not unreasonable - EFAULT shouldn't happen unless the application
is buggy.

> For transient errors the existing mechanism, fixed so that only per
> socket errors are saved for later, as today, could be kept?

I don't think it is ever necessary to save an errno value for the
next system call at all.
Just process the next system call and see what happens.

If the call returns with less than the maximum number of datagrams
and with a non-zero timeout left - then the application can infer
that it was terminated by an abnormal event of some kind.
This might be a signal.
I'm not sure if an icmp error on a connected datagram socket could
generate a 'disconnect'. It might happen if the interface is being
used for something like SCTP.
In either case the next call will detect the error.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread Michael Kerrisk (man-pages)

On 05/29/2014 12:53 PM, David Laight wrote:
> From: 'Arnaldo Carvalho de 
> ...
 So, yes, the user _can_ process the packets already copied to userspace,
 i.e. no packet loss, and then, on the next call, will receive the signal
 notification.
>>
>>> The application shouldn't need to see an EINTR response, any signal handler
>>> should be run when the system call returns to user (regardless of the
>>> system call result code).
>>> If that doesn't happen Linux is badly broken!
>>> >From an application point of view this is exactly the same as the signal
>>> occurring just before/after the kernel entry/exit for the system call.
>>>
>>> The call should just return early with success status.
>>> No need to preserve the EINTR response for later.
>>>
>>> The same might be appropriate for other errors - maybe including EFAULT
>>> copying non-initial messages to userspace.
>>> Put the message being processed back on the socket queue and return
>>> success with the (non-zero) partial message count.
>>
>> We don't need to put anything back, if we get an EFAULT for a datagram,
>> then we stop processing that packet, _dropping_ it (and that is just
>> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
>> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
>> and stop the batch, and if no datagrams were received, return the error
>> straight away.
>>
>> But if some datagrams were successfully received, and at that point
>> _already_ removed from queues and sent successfully to userspace,
>> recvmmsg will return the number of successfully copied datagrams and
>> store the error so that it can return on the next syscall,
> 
> That just doesn't make any sense.

Agreed.

> Saving an errno code would only make any sense if the error were a
> property of the socket - 

Back in http://marc.info/?l=linux-netdev=124298156121906=2
(the follow-on from the discussion that Arnaldo mentions below), 
it was noted:

: Normally you'd expect the call to return what it has read without an
: error, and then the socket error would be picked up on the next call.

and the key point in that sentence was "*socket* error."

> but EFAULT is a property of the system call,
> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).

Exactly. Interruption by a signal should just result in an early
success return, unless no datagrams have been received so far, in
which case it should produce an EINTR failure. No error should be 
saved for a future call.

> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.
> 
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.

Not generally, I think. (I think you're thinking of SIGSEGV when
a process  touches a nonexistent address in user mode.)

> You definitely don't want to send one to a different process.

But it's true that the EFAULT or EINTR shouldn't be returned
to another process.
 
>> Please refer to the original discussion on how to report how many
>> successfully copied datagrams and also report that it stopped before the
>> timeout and the number of requested datagrams in a batch:
>>
>> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com
> 
> I do remember the original problem.
> I don't recall error reporting being referenced.

(See above.)

>> What is being discussed here is how to return the EFAULT that may happen
>> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
>> plain returning all that was requested, with no errors.
> 
> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.
> 
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
> 
>> This EFAULT _after_ datagram processing may happen when updating the
>> remaining timeout, because then how can userspace both receive the
>> number of successfully copied datagrams (in any of the cases mentioned
>> in the previous paragraph) and know that that timeout can't be used
>> because there was a problem while trying to copy it to userspace
>> (EFAULT)?
> 
> Failure to write the control structure back to userspace probably
> deserves an EFAULT return - the application is buggy.
> IIRC normal recvmsg() copies out the control structure at the end
> of processing - that can fail.
> I wouldn't worry about datagram discards on any of those late
> EFAULT conditions.

Agree on all

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 10:53:22AM +, David Laight escreveu:
> From: 'Arnaldo Carvalho de 
> ...
> > > > So, yes, the user _can_ process the packets already copied to userspace,
> > > > i.e. no packet loss, and then, on the next call, will receive the signal
> > > > notification.
> > 
> > > The application shouldn't need to see an EINTR response, any signal 
> > > handler
> > > should be run when the system call returns to user (regardless of the
> > > system call result code).
> > > If that doesn't happen Linux is badly broken!
> > > >From an application point of view this is exactly the same as the signal
> > > occurring just before/after the kernel entry/exit for the system call.

> > > The call should just return early with success status.
> > > No need to preserve the EINTR response for later.

> > > The same might be appropriate for other errors - maybe including EFAULT
> > > copying non-initial messages to userspace.
> > > Put the message being processed back on the socket queue and return
> > > success with the (non-zero) partial message count.
 
> > We don't need to put anything back, if we get an EFAULT for a datagram,
> > then we stop processing that packet, _dropping_ it (and that is just
> > like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
> > and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
> > and stop the batch, and if no datagrams were received, return the error
> > straight away.
 
> > But if some datagrams were successfully received, and at that point
> > _already_ removed from queues and sent successfully to userspace,
> > recvmmsg will return the number of successfully copied datagrams and
> > store the error so that it can return on the next syscall,
 
> That just doesn't make any sense.

Yeah for things like EFAULT, storing it in a per socket area for later
reporting is a bug, a separate bug.

> Saving an errno code would only make any sense if the error were a
> property of the socket - but EFAULT is a property of the system call,

Agreed, so for the errors that are socket related, the mechanism should
work, not for things that are thread specific, then we should either
straight away signal it despite of any successfully received packets in
the batch so far in the current recvmmsg syscall or mimic what would
happen if the user issued multiple recvmsg syscalls instead, i.e. in the
next call _for this thread_, the EFAULT will take place.

> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).
> 
> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.

Right, storing thread specific errors on the socket is a bug and has to
be fixed. I.e. _if_ we keep the saving error for next syscall strategy,
then that error has, for the per thread cases, be stored in a per thread
area error field for socket operations.
 
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.
> You definitely don't want to send one to a different process.

Right.
 
> > Please refer to the original discussion on how to report how many
> > successfully copied datagrams and also report that it stopped before the
> > timeout and the number of requested datagrams in a batch:
 
> > http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com
 
> I do remember the original problem.
> I don't recall error reporting being referenced.
 
> > What is being discussed here is how to return the EFAULT that may happen
> > _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> > plain returning all that was requested, with no errors.

> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.

A saved error, for the right entity, in the recvmmsg case, that
basically is batching multiple recvmsg syscalls, doesn't sound like a
problem, i.e. the idea is to, as much as possible, mimic what multiple
recvmsg calls would do, but reduce its in/out kernel (and inside kernel
subsystems) overhead.

Perhaps we can have something in between, i.e. for things like EFAULT,
we should report straight away, effectively dropping whatever datagrams
successfully received in the current batch, do you agree?

For transient errors the existing mechanism, fixed so that only per
socket errors are saved for later, as today, could be kept?
 
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
> 
> >

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de 
...
> > > So, yes, the user _can_ process the packets already copied to userspace,
> > > i.e. no packet loss, and then, on the next call, will receive the signal
> > > notification.
> 
> > The application shouldn't need to see an EINTR response, any signal handler
> > should be run when the system call returns to user (regardless of the
> > system call result code).
> > If that doesn't happen Linux is badly broken!
> > >From an application point of view this is exactly the same as the signal
> > occurring just before/after the kernel entry/exit for the system call.
> >
> > The call should just return early with success status.
> > No need to preserve the EINTR response for later.
> >
> > The same might be appropriate for other errors - maybe including EFAULT
> > copying non-initial messages to userspace.
> > Put the message being processed back on the socket queue and return
> > success with the (non-zero) partial message count.
> 
> We don't need to put anything back, if we get an EFAULT for a datagram,
> then we stop processing that packet, _dropping_ it (and that is just
> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
> and stop the batch, and if no datagrams were received, return the error
> straight away.
> 
> But if some datagrams were successfully received, and at that point
> _already_ removed from queues and sent successfully to userspace,
> recvmmsg will return the number of successfully copied datagrams and
> store the error so that it can return on the next syscall,

That just doesn't make any sense.
Saving an errno code would only make any sense if the error were a
property of the socket - but EFAULT is a property of the system call,
and EINTR a property of the process (it exists so that the process
can return to userspace to execute a signal handler - relying on
SIGALRM to timeout blocking system calls is a recipe for disaster).

The next system call could be from an entirely different process,
neither EFAULT nor EINTR would mean anything to it at all.

ISTR that returning EFAULT generates a signal that will typically
terminate the process.
You definitely don't want to send one to a different process.

> Please refer to the original discussion on how to report how many
> successfully copied datagrams and also report that it stopped before the
> timeout and the number of requested datagrams in a batch:
> 
> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

I do remember the original problem.
I don't recall error reporting being referenced.

> What is being discussed here is how to return the EFAULT that may happen
> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> plain returning all that was requested, with no errors.

I remember some discussions from an XNET standards meeting (I've forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.

Things like blocking send/write being interrupted spring to mind.
Possibly even copyin/out failures part way through a read/write call.

> This EFAULT _after_ datagram processing may happen when updating the
> remaining timeout, because then how can userspace both receive the
> number of successfully copied datagrams (in any of the cases mentioned
> in the previous paragraph) and know that that timeout can't be used
> because there was a problem while trying to copy it to userspace
> (EFAULT)?

Failure to write the control structure back to userspace probably
deserves an EFAULT return - the application is buggy.
IIRC normal recvmsg() copies out the control structure at the end
of processing - that can fail.
I wouldn't worry about datagram discards on any of those late
EFAULT conditions.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de 
...
   So, yes, the user _can_ process the packets already copied to userspace,
   i.e. no packet loss, and then, on the next call, will receive the signal
   notification.
 
  The application shouldn't need to see an EINTR response, any signal handler
  should be run when the system call returns to user (regardless of the
  system call result code).
  If that doesn't happen Linux is badly broken!
  From an application point of view this is exactly the same as the signal
  occurring just before/after the kernel entry/exit for the system call.
 
  The call should just return early with success status.
  No need to preserve the EINTR response for later.
 
  The same might be appropriate for other errors - maybe including EFAULT
  copying non-initial messages to userspace.
  Put the message being processed back on the socket queue and return
  success with the (non-zero) partial message count.
 
 We don't need to put anything back, if we get an EFAULT for a datagram,
 then we stop processing that packet, _dropping_ it (and that is just
 like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
 and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
 and stop the batch, and if no datagrams were received, return the error
 straight away.
 
 But if some datagrams were successfully received, and at that point
 _already_ removed from queues and sent successfully to userspace,
 recvmmsg will return the number of successfully copied datagrams and
 store the error so that it can return on the next syscall,

That just doesn't make any sense.
Saving an errno code would only make any sense if the error were a
property of the socket - but EFAULT is a property of the system call,
and EINTR a property of the process (it exists so that the process
can return to userspace to execute a signal handler - relying on
SIGALRM to timeout blocking system calls is a recipe for disaster).

The next system call could be from an entirely different process,
neither EFAULT nor EINTR would mean anything to it at all.

ISTR that returning EFAULT generates a signal that will typically
terminate the process.
You definitely don't want to send one to a different process.

 Please refer to the original discussion on how to report how many
 successfully copied datagrams and also report that it stopped before the
 timeout and the number of requested datagrams in a batch:
 
 http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

I do remember the original problem.
I don't recall error reporting being referenced.

 What is being discussed here is how to return the EFAULT that may happen
 _after_ datagram processing, be it interrupted by an EFAULT, signal, or
 plain returning all that was requested, with no errors.

I remember some discussions from an XNET standards meeting (I've forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.

Things like blocking send/write being interrupted spring to mind.
Possibly even copyin/out failures part way through a read/write call.

 This EFAULT _after_ datagram processing may happen when updating the
 remaining timeout, because then how can userspace both receive the
 number of successfully copied datagrams (in any of the cases mentioned
 in the previous paragraph) and know that that timeout can't be used
 because there was a problem while trying to copy it to userspace
 (EFAULT)?

Failure to write the control structure back to userspace probably
deserves an EFAULT return - the application is buggy.
IIRC normal recvmsg() copies out the control structure at the end
of processing - that can fail.
I wouldn't worry about datagram discards on any of those late
EFAULT conditions.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 10:53:22AM +, David Laight escreveu:
 From: 'Arnaldo Carvalho de 
 ...
So, yes, the user _can_ process the packets already copied to userspace,
i.e. no packet loss, and then, on the next call, will receive the signal
notification.
  
   The application shouldn't need to see an EINTR response, any signal 
   handler
   should be run when the system call returns to user (regardless of the
   system call result code).
   If that doesn't happen Linux is badly broken!
   From an application point of view this is exactly the same as the signal
   occurring just before/after the kernel entry/exit for the system call.

   The call should just return early with success status.
   No need to preserve the EINTR response for later.

   The same might be appropriate for other errors - maybe including EFAULT
   copying non-initial messages to userspace.
   Put the message being processed back on the socket queue and return
   success with the (non-zero) partial message count.
 
  We don't need to put anything back, if we get an EFAULT for a datagram,
  then we stop processing that packet, _dropping_ it (and that is just
  like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
  and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
  and stop the batch, and if no datagrams were received, return the error
  straight away.
 
  But if some datagrams were successfully received, and at that point
  _already_ removed from queues and sent successfully to userspace,
  recvmmsg will return the number of successfully copied datagrams and
  store the error so that it can return on the next syscall,
 
 That just doesn't make any sense.

Yeah for things like EFAULT, storing it in a per socket area for later
reporting is a bug, a separate bug.

 Saving an errno code would only make any sense if the error were a
 property of the socket - but EFAULT is a property of the system call,

Agreed, so for the errors that are socket related, the mechanism should
work, not for things that are thread specific, then we should either
straight away signal it despite of any successfully received packets in
the batch so far in the current recvmmsg syscall or mimic what would
happen if the user issued multiple recvmsg syscalls instead, i.e. in the
next call _for this thread_, the EFAULT will take place.

 and EINTR a property of the process (it exists so that the process
 can return to userspace to execute a signal handler - relying on
 SIGALRM to timeout blocking system calls is a recipe for disaster).
 
 The next system call could be from an entirely different process,
 neither EFAULT nor EINTR would mean anything to it at all.

Right, storing thread specific errors on the socket is a bug and has to
be fixed. I.e. _if_ we keep the saving error for next syscall strategy,
then that error has, for the per thread cases, be stored in a per thread
area error field for socket operations.
 
 ISTR that returning EFAULT generates a signal that will typically
 terminate the process.
 You definitely don't want to send one to a different process.

Right.
 
  Please refer to the original discussion on how to report how many
  successfully copied datagrams and also report that it stopped before the
  timeout and the number of requested datagrams in a batch:
 
  http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com
 
 I do remember the original problem.
 I don't recall error reporting being referenced.
 
  What is being discussed here is how to return the EFAULT that may happen
  _after_ datagram processing, be it interrupted by an EFAULT, signal, or
  plain returning all that was requested, with no errors.

 I remember some discussions from an XNET standards meeting (I've forgotten
 exactly which errors on which calls were being discussed).
 My recollection is that you return success with a partial transfer
 count for ANY error that happens after some data has been transferred.
 The actual error will be returned when it happens again on the next
 system call - Note the AGAIN, not a saved error.

A saved error, for the right entity, in the recvmmsg case, that
basically is batching multiple recvmsg syscalls, doesn't sound like a
problem, i.e. the idea is to, as much as possible, mimic what multiple
recvmsg calls would do, but reduce its in/out kernel (and inside kernel
subsystems) overhead.

Perhaps we can have something in between, i.e. for things like EFAULT,
we should report straight away, effectively dropping whatever datagrams
successfully received in the current batch, do you agree?

For transient errors the existing mechanism, fixed so that only per
socket errors are saved for later, as today, could be kept?
 
 Things like blocking send/write being interrupted spring to mind.
 Possibly even copyin/out failures part way through a read/write call.
 
  This EFAULT _after_ datagram processing may happen when updating the
  remaining timeout, because then how can

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread Michael Kerrisk (man-pages)

On 05/29/2014 12:53 PM, David Laight wrote:
 From: 'Arnaldo Carvalho de 
 ...
 So, yes, the user _can_ process the packets already copied to userspace,
 i.e. no packet loss, and then, on the next call, will receive the signal
 notification.

 The application shouldn't need to see an EINTR response, any signal handler
 should be run when the system call returns to user (regardless of the
 system call result code).
 If that doesn't happen Linux is badly broken!
 From an application point of view this is exactly the same as the signal
 occurring just before/after the kernel entry/exit for the system call.

 The call should just return early with success status.
 No need to preserve the EINTR response for later.

 The same might be appropriate for other errors - maybe including EFAULT
 copying non-initial messages to userspace.
 Put the message being processed back on the socket queue and return
 success with the (non-zero) partial message count.

 We don't need to put anything back, if we get an EFAULT for a datagram,
 then we stop processing that packet, _dropping_ it (and that is just
 like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
 and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
 and stop the batch, and if no datagrams were received, return the error
 straight away.

 But if some datagrams were successfully received, and at that point
 _already_ removed from queues and sent successfully to userspace,
 recvmmsg will return the number of successfully copied datagrams and
 store the error so that it can return on the next syscall,
 
 That just doesn't make any sense.

Agreed.

 Saving an errno code would only make any sense if the error were a
 property of the socket - 

Back in http://marc.info/?l=linux-netdevm=124298156121906w=2
(the follow-on from the discussion that Arnaldo mentions below), 
it was noted:

: Normally you'd expect the call to return what it has read without an
: error, and then the socket error would be picked up on the next call.

and the key point in that sentence was *socket* error.

 but EFAULT is a property of the system call,
 and EINTR a property of the process (it exists so that the process
 can return to userspace to execute a signal handler - relying on
 SIGALRM to timeout blocking system calls is a recipe for disaster).

Exactly. Interruption by a signal should just result in an early
success return, unless no datagrams have been received so far, in
which case it should produce an EINTR failure. No error should be 
saved for a future call.

 The next system call could be from an entirely different process,
 neither EFAULT nor EINTR would mean anything to it at all.
 
 ISTR that returning EFAULT generates a signal that will typically
 terminate the process.

Not generally, I think. (I think you're thinking of SIGSEGV when
a process  touches a nonexistent address in user mode.)

 You definitely don't want to send one to a different process.

But it's true that the EFAULT or EINTR shouldn't be returned
to another process.
 
 Please refer to the original discussion on how to report how many
 successfully copied datagrams and also report that it stopped before the
 timeout and the number of requested datagrams in a batch:

 http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com
 
 I do remember the original problem.
 I don't recall error reporting being referenced.

(See above.)

 What is being discussed here is how to return the EFAULT that may happen
 _after_ datagram processing, be it interrupted by an EFAULT, signal, or
 plain returning all that was requested, with no errors.
 
 I remember some discussions from an XNET standards meeting (I've forgotten
 exactly which errors on which calls were being discussed).
 My recollection is that you return success with a partial transfer
 count for ANY error that happens after some data has been transferred.
 The actual error will be returned when it happens again on the next
 system call - Note the AGAIN, not a saved error.
 
 Things like blocking send/write being interrupted spring to mind.
 Possibly even copyin/out failures part way through a read/write call.
 
 This EFAULT _after_ datagram processing may happen when updating the
 remaining timeout, because then how can userspace both receive the
 number of successfully copied datagrams (in any of the cases mentioned
 in the previous paragraph) and know that that timeout can't be used
 because there was a problem while trying to copy it to userspace
 (EFAULT)?
 
 Failure to write the control structure back to userspace probably
 deserves an EFAULT return - the application is buggy.
 IIRC normal recvmsg() copies out the control structure at the end
 of processing - that can fail.
 I wouldn't worry about datagram discards on any of those late
 EFAULT conditions.

Agree on all of the above, and that last point certainly seems
like the right approach to me.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de Melo'
...
  I remember some discussions from an XNET standards meeting (I've forgotten
  exactly which errors on which calls were being discussed).
  My recollection is that you return success with a partial transfer
  count for ANY error that happens after some data has been transferred.
  The actual error will be returned when it happens again on the next
  system call - Note the AGAIN, not a saved error.
 
 A saved error, for the right entity, in the recvmmsg case, that
 basically is batching multiple recvmsg syscalls, doesn't sound like a
 problem, i.e. the idea is to, as much as possible, mimic what multiple
 recvmsg calls would do, but reduce its in/out kernel (and inside kernel
 subsystems) overhead.
 
 Perhaps we can have something in between, i.e. for things like EFAULT,
 we should report straight away, effectively dropping whatever datagrams
 successfully received in the current batch, do you agree?

Not unreasonable - EFAULT shouldn't happen unless the application
is buggy.

 For transient errors the existing mechanism, fixed so that only per
 socket errors are saved for later, as today, could be kept?

I don't think it is ever necessary to save an errno value for the
next system call at all.
Just process the next system call and see what happens.

If the call returns with less than the maximum number of datagrams
and with a non-zero timeout left - then the application can infer
that it was terminated by an abnormal event of some kind.
This might be a signal.
I'm not sure if an icmp error on a connected datagram socket could
generate a 'disconnect'. It might happen if the interface is being
used for something like SCTP.
In either case the next call will detect the error.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
 From: 'Arnaldo Carvalho de Melo'
 ...
   I remember some discussions from an XNET standards meeting (I've forgotten
   exactly which errors on which calls were being discussed).
   My recollection is that you return success with a partial transfer
   count for ANY error that happens after some data has been transferred.
   The actual error will be returned when it happens again on the next
   system call - Note the AGAIN, not a saved error.

  A saved error, for the right entity, in the recvmmsg case, that
  basically is batching multiple recvmsg syscalls, doesn't sound like a
  problem, i.e. the idea is to, as much as possible, mimic what multiple
  recvmsg calls would do, but reduce its in/out kernel (and inside kernel
  subsystems) overhead.

  Perhaps we can have something in between, i.e. for things like EFAULT,
  we should report straight away, effectively dropping whatever datagrams
  successfully received in the current batch, do you agree?
 
 Not unreasonable - EFAULT shouldn't happen unless the application
 is buggy.

Ok.
 
  For transient errors the existing mechanism, fixed so that only per
  socket errors are saved for later, as today, could be kept?
 
 I don't think it is ever necessary to save an errno value for the
 next system call at all.
 Just process the next system call and see what happens.
 
 If the call returns with less than the maximum number of datagrams
 and with a non-zero timeout left - then the application can infer
 that it was terminated by an abnormal event of some kind.
 This might be a signal.

Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
error on the next call, but we provide a way for the app to retrieve the
reason for the smaller than expected batch?

 I'm not sure if an icmp error on a connected datagram socket could
 generate a 'disconnect'. It might happen if the interface is being
 used for something like SCTP.
 In either case the next call will detect the error.

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread David Laight

From: 'Arnaldo Carvalho de Melo'
 Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
  From: 'Arnaldo Carvalho de Melo'
  ...
I remember some discussions from an XNET standards meeting (I've 
forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.

   A saved error, for the right entity, in the recvmmsg case, that
   basically is batching multiple recvmsg syscalls, doesn't sound like a
   problem, i.e. the idea is to, as much as possible, mimic what multiple
   recvmsg calls would do, but reduce its in/out kernel (and inside kernel
   subsystems) overhead.

   Perhaps we can have something in between, i.e. for things like EFAULT,
   we should report straight away, effectively dropping whatever datagrams
   successfully received in the current batch, do you agree?

  Not unreasonable - EFAULT shouldn't happen unless the application
  is buggy.

 Ok.

   For transient errors the existing mechanism, fixed so that only per
   socket errors are saved for later, as today, could be kept?

  I don't think it is ever necessary to save an errno value for the
  next system call at all.
  Just process the next system call and see what happens.

  If the call returns with less than the maximum number of datagrams
  and with a non-zero timeout left - then the application can infer
  that it was terminated by an abnormal event of some kind.
  This might be a signal.

 Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
 error on the next call, but we provide a way for the app to retrieve the
 reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the
control structure.
But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout
was used as an inter-datagram timer instead of an overall timeout.

I suspect that most application won't actually care about the
'time left', nor the actual number of returned datagrams.
They will just process what they are given and then wait for
the next batch.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH/RFC] Handle EFAULT in partial recvmmsg was Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-29 Thread 'Arnaldo Carvalho de Melo'

Em Thu, May 29, 2014 at 11:17:05AM -0300, 'Arnaldo Carvalho de Melo' escreveu:
 Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
  From: 'Arnaldo Carvalho de Melo'
  ...
I remember some discussions from an XNET standards meeting (I've 
forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.
 
   A saved error, for the right entity, in the recvmmsg case, that
   basically is batching multiple recvmsg syscalls, doesn't sound like a
   problem, i.e. the idea is to, as much as possible, mimic what multiple
   recvmsg calls would do, but reduce its in/out kernel (and inside kernel
   subsystems) overhead.
 
   Perhaps we can have something in between, i.e. for things like EFAULT,
   we should report straight away, effectively dropping whatever datagrams
   successfully received in the current batch, do you agree?
  
  Not unreasonable - EFAULT shouldn't happen unless the application
  is buggy.
 
 Ok.

So the patch below should handle it, and record that the packets were
dropped, not at the transport level, like UDP_MIB_INERRORS, for
instance, would indicate, but at the batching, recvmmsg level, so
perhaps we'll need a MIB variable for that.

Also a counterpart to the trace_kfree_skb(skb, udp_recvmsg) tracepoint
for dropwatch and similar tools to use, Neil?

I'm keeping this separate from the timeout update patch.

- Arnaldo

diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..63491f015912 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2415,13 +2415,17 @@ out_put:
return datagrams;
 
if (datagrams != 0) {
+   if (err == -EFAULT) {
+   atomic_add(datagrams, sock-sk-sk_drops);
+   return -EFAULT;
+   }
/*
 * We may return less entries than requested (vlen) if the
 * sock is non block and there aren't enough datagrams...
 */
if (err != -EAGAIN) {
/*
-* ... or  if recvmsg returns an error after we
+* ... or if recvmsg returns a socket error after we
 * received some datagrams, where we record the
 * error to return on the next call or if the
 * app asks about it using getsockopt(SO_ERROR).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread 'Arnaldo Carvalho de Melo'

Em Wed, May 28, 2014 at 03:33:51PM -0600, Chris Friesen escreveu:
> On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:
 
> >What is being discussed here is how to return the EFAULT that may happen
> >_after_ datagram processing, be it interrupted by an EFAULT, signal, or
> >plain returning all that was requested, with no errors.

> >This EFAULT _after_ datagram processing may happen when updating the
> >remaining timeout, because then how can userspace both receive the
> >number of successfully copied datagrams (in any of the cases mentioned
> >in the previous paragraph) and know that that timeout can't be used
> >because there was a problem while trying to copy it to userspace
> >(EFAULT)?
 
> How does select() handle this problem?  It updates the timeout and also
> modifies other data.
 
> Could we just check whether the timeout pointer is valid before doing
> anything else?  Of course we could still fault the page out while waiting
> for messages and then fail to fault it back in later, but that seems like a
> not-very-likely scenario.

I'll check how select behaves, and yes, I think it is not-very-likely
and what we're doing now is reasonable for datagram protocols, i.e. to
return -EFAULT when updating the timeout fails, not reporting if packets
were successfully received, i.e. they end up being "dropped", as
userspace can't easily figure out if some was received short of painting
it with some pattern and then checking the ones that aren't with that
pattern.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Chris Friesen


On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:


What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?



How does select() handle this problem?  It updates the timeout and also 
modifies other data.


Could we just check whether the timeout pointer is valid before doing 
anything else?  Of course we could still fault the page out while 
waiting for messages and then fail to fault it back in later, but that 
seems like a not-very-likely scenario.


Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread 'Arnaldo Carvalho de Melo'

Em Wed, May 28, 2014 at 03:17:40PM +, David Laight escreveu:
> From: Arnaldo Carvalho de Melo
> ...
> > > But, another question...
> > >
> > > In the case that the call is interrupted by a signal handler and some
> > > datagrams have already been received, then the call succeeds, and
> > > returns the number of datagrams received, and 'timeout' is updated with
> > > the remaining time. Maybe that's the right behavior, but I just want to
 
> > Note that what the comment in the existing code says should apply here,
> > namely that the next recv (m or mmsg) syscall on this socket will return
> > what is in sock->sk->sk_err, that is the signal:
 
> ...
 
> > So, yes, the user _can_ process the packets already copied to userspace,
> > i.e. no packet loss, and then, on the next call, will receive the signal
> > notification.
 
> The application shouldn't need to see an EINTR response, any signal handler
> should be run when the system call returns to user (regardless of the
> system call result code).
> If that doesn't happen Linux is badly broken!
> >From an application point of view this is exactly the same as the signal
> occurring just before/after the kernel entry/exit for the system call.
> 
> The call should just return early with success status.
> No need to preserve the EINTR response for later.
> 
> The same might be appropriate for other errors - maybe including EFAULT
> copying non-initial messages to userspace.
> Put the message being processed back on the socket queue and return
> success with the (non-zero) partial message count.

We don't need to put anything back, if we get an EFAULT for a datagram,
then we stop processing that packet, _dropping_ it (and that is just
like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
and stop the batch, and if no datagrams were received, return the error
straight away.

But if some datagrams were successfully received, and at that point
_already_ removed from queues and sent successfully to userspace,
recvmmsg will return the number of successfully copied datagrams and
store the error so that it can return on the next syscall,

Please refer to the original discussion on how to report how many
successfully copied datagrams and also report that it stopped before the
timeout and the number of requested datagrams in a batch:

http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread David Laight

From: Arnaldo Carvalho de Melo
...
> > But, another question...
> >
> > In the case that the call is interrupted by a signal handler and some
> > datagrams have already been received, then the call succeeds, and
> > returns the number of datagrams received, and 'timeout' is updated with
> > the remaining time. Maybe that's the right behavior, but I just want to
> 
> Note that what the comment in the existing code says should apply here,
> namely that the next recv (m or mmsg) syscall on this socket will return
> what is in sock->sk->sk_err, that is the signal:
> 
...
> 
> So, yes, the user _can_ process the packets already copied to userspace,
> i.e. no packet loss, and then, on the next call, will receive the signal
> notification.

The application shouldn't need to see an EINTR response, any signal handler
should be run when the system call returns to user (regardless of the
system call result code).
If that doesn't happen Linux is badly broken!
>From an application point of view this is exactly the same as the signal
occurring just before/after the kernel entry/exit for the system call.

The call should just return early with success status.
No need to preserve the EINTR response for later.

The same might be appropriate for other errors - maybe including EFAULT
copying non-initial messages to userspace.
Put the message being processed back on the socket queue and return
success with the (non-zero) partial message count.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Arnaldo Carvalho de Melo

Em Wed, May 28, 2014 at 02:20:10PM +0200, Michael Kerrisk (man-pages) escreveu:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> > attached goes the updated patch, and this is the
> > diff to the last combined one:
> > 
> > diff --git a/net/socket.c b/net/socket.c
> > index 310a50971769..379be43879db 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
> > __user *, mmsg,
> >  
> > datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, _sys);
> >  
> > -   if (datagrams > 0 &&
> > -   copy_to_user(timeout, _sys, sizeof(timeout_sys)))
> > +   if (copy_to_user(timeout, _sys, sizeof(timeout_sys)))
> > datagrams = -EFAULT;
> >  
> > return datagrams;
> >  
> > --
> > 
> > This is a quick thing just to show where the problem lies, need to think
> > how to report an -EFAULT at this point properly, i.e. look at

Ok, so I can live with the way things were before this fix, i.e. if the
user specifies a timeout, then if it fails when copying to remaining
time to userspace (copy_to_user call above), then we return -EFAULT.

I.e. there would be no change in behaviour, but then perhaps we should
go with the interface that is in place when we received some datagrams
and then some error happens, see comment in the existing code, below:

> > __sys_recvmmsg for something related (returning the number of
> > successfully copied datagrams to userspace while storing the error for
> > subsequent reporting):
> > 
> > if (err == 0)
> > return datagrams;
> > 
> > if (datagrams != 0) {
> > /*
> >  * We may return less entries than requested (vlen) if
> >  * the sock is non block and there aren't enough
> >  * datagrams...
> >  */
> > if (err != -EAGAIN) {
> > /*
> >  * ... or  if recvmsg returns an error after we
> >  * received some datagrams, where we record the
> >  * error to return on the next call or if the
> >  * app asks about it using getsockopt(SO_ERROR).
> >  */
> > sock->sk->sk_err = -err;
> > }
> > 
> > return datagrams;
> > }
> > 
> > I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> > more about it, sidetracked now, will be back to this.
> > 
> > Anyway, attached goes the current combined patch.
 
> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:
 
> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.

Ok
 
> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

Agreed that it is how it should behave.
 
> But, another question...
> 
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to

Note that what the comment in the existing code says should apply here,
namely that the next recv (m or mmsg) syscall on this socket will return
what is in sock->sk->sk_err, that is the signal:

  sys_recvmmsg()
  sock->ops->recvmsg() (e.g. inet_recvmsg)
  sk->prot->recvmsg() (e.g., udp_recvmsg)
  skb_recv_datagram()
  wait_for_more_packets()
  sock_intr_errno()
*err = -EINTR
  sock->sk->sk_err = err

Next recv will end up calling skb_recv_datagram and that does:

struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
int *peeked, int *off, int *err, long 
*timeop)
{
struct sk_buff *skb, *last;
long timeo;
/*
 * Caller is allowed not to check sk->sk_err before
 * skb_recv_datagram()
 */
int error = sock_error(sk);

if (error)
goto no_packet;

out:
if (timeop)
*timeop = timeo;
return NULL;

no_packet:
*err = error;
goto out;
}

So, yes, the user _can_ process the packets already copied to userspace,
i.e. no packet loss, and then, on the next call, will receive the signal
notification.

So, the user can just try the next call and see the signal, and it is
also possible to notice that the timeout didn't expire and less than
vlen packets were received, so something went wrong and calling
getsockopt(SO_ERROR) will clarify things.

This is not some new error reporting facility, it predates recvmmsg,
that merely uses it for consistency.

How to

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Michael Kerrisk (man-pages)

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
> escreveu:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>  wrote:
>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
>>> escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> Can you try the attached patch on top of the first one?
>>>
 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
> 
> Lets try both then, 

That's better!

> attached goes the updated patch, and this is the
> diff to the last combined one:
> 
> diff --git a/net/socket.c b/net/socket.c
> index 310a50971769..379be43879db 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
> __user *, mmsg,
>  
>   datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, _sys);
>  
> - if (datagrams > 0 &&
> - copy_to_user(timeout, _sys, sizeof(timeout_sys)))
> + if (copy_to_user(timeout, _sys, sizeof(timeout_sys)))
>   datagrams = -EFAULT;
>  
>   return datagrams;
>  
> --
> 
> This is a quick thing just to show where the problem lies, need to think
> how to report an -EFAULT at this point properly, i.e. look at
> __sys_recvmmsg for something related (returning the number of
> successfully copied datagrams to userspace while storing the error for
> subsequent reporting):
> 
> if (err == 0)
> return datagrams;
> 
> if (datagrams != 0) {
> /*
>  * We may return less entries than requested (vlen) if
>  * the
>  * sock is non block and there aren't enough
>  * datagrams...
>  */
> if (err != -EAGAIN) {
> /*
>  * ... or  if recvmsg returns an error after we
>  * received some datagrams, where we record the
>  * error to return on the next call or if the
>  * app asks about it using getsockopt(SO_ERROR).
>  */
> sock->sk->sk_err = -err;
> }
> 
> return datagrams;
> }
> 
> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> more about it, sidetracked now, will be back to this.
> 
> Anyway, attached goes the current combined patch.

So, I applied against net-next as you suggested offlist.
Builds and generally tests fine. Some observations:

* In the case that the call is interrupted by a signal handler and no
  datagrams have been received, the call fails with EINTR, as expected.

* The call always updates 'timeout', both in the success case and in the
  EINTR case. (That seems fine.)

But, another question...

In the case that the call is interrupted by a signal handler and some
datagrams have already been received, then the call succeeds, and
returns the number of datagrams received, and 'timeout' is updated with
the remaining time. Maybe that's the right behavior, but I just want to
check. There is at least one other possibility:

* Fetch no datagrams (i.e., the datagrams are left to receive in a
  future call), and the call fails with EINTR, and 'timeout' is updated.

Maybe that possibility is hard to implement (not sure). But my main point
is to make the current behavior clear, note the alternative, and ask:
is the current behavior the best choice. (I'm not saying it's not, but I
do want the choice to be a conscious one.)

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Chris Friesen


On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:


What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?



How does select() handle this problem?  It updates the timeout and also 
modifies other data.


Could we just check whether the timeout pointer is valid before doing 
anything else?  Of course we could still fault the page out while 
waiting for messages and then fail to fault it back in later, but that 
seems like a not-very-likely scenario.


Chris
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread 'Arnaldo Carvalho de Melo'

Em Wed, May 28, 2014 at 03:33:51PM -0600, Chris Friesen escreveu:
 On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:
 
 What is being discussed here is how to return the EFAULT that may happen
 _after_ datagram processing, be it interrupted by an EFAULT, signal, or
 plain returning all that was requested, with no errors.

 This EFAULT _after_ datagram processing may happen when updating the
 remaining timeout, because then how can userspace both receive the
 number of successfully copied datagrams (in any of the cases mentioned
 in the previous paragraph) and know that that timeout can't be used
 because there was a problem while trying to copy it to userspace
 (EFAULT)?
 
 How does select() handle this problem?  It updates the timeout and also
 modifies other data.
 
 Could we just check whether the timeout pointer is valid before doing
 anything else?  Of course we could still fault the page out while waiting
 for messages and then fail to fault it back in later, but that seems like a
 not-very-likely scenario.

I'll check how select behaves, and yes, I think it is not-very-likely
and what we're doing now is reasonable for datagram protocols, i.e. to
return -EFAULT when updating the timeout fails, not reporting if packets
were successfully received, i.e. they end up being dropped, as
userspace can't easily figure out if some was received short of painting
it with some pattern and then checking the ones that aren't with that
pattern.

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Michael Kerrisk (man-pages)

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
 Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
 a...@ghostprotocols.net wrote:
 Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
 Can you try the attached patch on top of the first one?

 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this

 It was meant to highlight the changes with regard to the previous patch,
 i.e. to make things easier for reviewing.

 (I don't think that works...)
 
 Lets try both then, 

That's better!

 attached goes the updated patch, and this is the
 diff to the last combined one:
 
 diff --git a/net/socket.c b/net/socket.c
 index 310a50971769..379be43879db 100644
 --- a/net/socket.c
 +++ b/net/socket.c
 @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
 __user *, mmsg,
  
   datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, timeout_sys);
  
 - if (datagrams  0 
 - copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
 + if (copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
   datagrams = -EFAULT;
  
   return datagrams;
  
 --
 
 This is a quick thing just to show where the problem lies, need to think
 how to report an -EFAULT at this point properly, i.e. look at
 __sys_recvmmsg for something related (returning the number of
 successfully copied datagrams to userspace while storing the error for
 subsequent reporting):
 
 if (err == 0)
 return datagrams;
 
 if (datagrams != 0) {
 /*
  * We may return less entries than requested (vlen) if
  * the
  * sock is non block and there aren't enough
  * datagrams...
  */
 if (err != -EAGAIN) {
 /*
  * ... or  if recvmsg returns an error after we
  * received some datagrams, where we record the
  * error to return on the next call or if the
  * app asks about it using getsockopt(SO_ERROR).
  */
 sock-sk-sk_err = -err;
 }
 
 return datagrams;
 }
 
 I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
 more about it, sidetracked now, will be back to this.
 
 Anyway, attached goes the current combined patch.

So, I applied against net-next as you suggested offlist.
Builds and generally tests fine. Some observations:

* In the case that the call is interrupted by a signal handler and no
  datagrams have been received, the call fails with EINTR, as expected.

* The call always updates 'timeout', both in the success case and in the
  EINTR case. (That seems fine.)

But, another question...

In the case that the call is interrupted by a signal handler and some
datagrams have already been received, then the call succeeds, and
returns the number of datagrams received, and 'timeout' is updated with
the remaining time. Maybe that's the right behavior, but I just want to
check. There is at least one other possibility:

* Fetch no datagrams (i.e., the datagrams are left to receive in a
  future call), and the call fails with EINTR, and 'timeout' is updated.

Maybe that possibility is hard to implement (not sure). But my main point
is to make the current behavior clear, note the alternative, and ask:
is the current behavior the best choice. (I'm not saying it's not, but I
do want the choice to be a conscious one.)

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread Arnaldo Carvalho de Melo

Em Wed, May 28, 2014 at 02:20:10PM +0200, Michael Kerrisk (man-pages) escreveu:
 On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
  attached goes the updated patch, and this is the
  diff to the last combined one:
  
  diff --git a/net/socket.c b/net/socket.c
  index 310a50971769..379be43879db 100644
  --- a/net/socket.c
  +++ b/net/socket.c
  @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr 
  __user *, mmsg,
   
  datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, timeout_sys);
   
  -   if (datagrams  0 
  -   copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
  +   if (copy_to_user(timeout, timeout_sys, sizeof(timeout_sys)))
  datagrams = -EFAULT;
   
  return datagrams;
   
  --
  
  This is a quick thing just to show where the problem lies, need to think
  how to report an -EFAULT at this point properly, i.e. look at

Ok, so I can live with the way things were before this fix, i.e. if the
user specifies a timeout, then if it fails when copying to remaining
time to userspace (copy_to_user call above), then we return -EFAULT.

I.e. there would be no change in behaviour, but then perhaps we should
go with the interface that is in place when we received some datagrams
and then some error happens, see comment in the existing code, below:

  __sys_recvmmsg for something related (returning the number of
  successfully copied datagrams to userspace while storing the error for
  subsequent reporting):
  
  if (err == 0)
  return datagrams;
  
  if (datagrams != 0) {
  /*
   * We may return less entries than requested (vlen) if
   * the sock is non block and there aren't enough
   * datagrams...
   */
  if (err != -EAGAIN) {
  /*
   * ... or  if recvmsg returns an error after we
   * received some datagrams, where we record the
   * error to return on the next call or if the
   * app asks about it using getsockopt(SO_ERROR).
   */
  sock-sk-sk_err = -err;
  }
  
  return datagrams;
  }
  
  I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
  more about it, sidetracked now, will be back to this.
  
  Anyway, attached goes the current combined patch.
 
 So, I applied against net-next as you suggested offlist.
 Builds and generally tests fine. Some observations:
 
 * In the case that the call is interrupted by a signal handler and no
   datagrams have been received, the call fails with EINTR, as expected.

Ok
 
 * The call always updates 'timeout', both in the success case and in the
   EINTR case. (That seems fine.)

Agreed that it is how it should behave.
 
 But, another question...
 
 In the case that the call is interrupted by a signal handler and some
 datagrams have already been received, then the call succeeds, and
 returns the number of datagrams received, and 'timeout' is updated with
 the remaining time. Maybe that's the right behavior, but I just want to

Note that what the comment in the existing code says should apply here,
namely that the next recv (m or mmsg) syscall on this socket will return
what is in sock-sk-sk_err, that is the signal:

  sys_recvmmsg()
  sock-ops-recvmsg() (e.g. inet_recvmsg)
  sk-prot-recvmsg() (e.g., udp_recvmsg)
  skb_recv_datagram()
  wait_for_more_packets()
  sock_intr_errno()
*err = -EINTR
  sock-sk-sk_err = err

Next recv will end up calling skb_recv_datagram and that does:

struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
int *peeked, int *off, int *err, long 
*timeop)
{
struct sk_buff *skb, *last;
long timeo;
/*
 * Caller is allowed not to check sk-sk_err before
 * skb_recv_datagram()
 */
int error = sock_error(sk);

if (error)
goto no_packet;
SNIP
out:
if (timeop)
*timeop = timeo;
return NULL;

no_packet:
*err = error;
goto out;
}

So, yes, the user _can_ process the packets already copied to userspace,
i.e. no packet loss, and then, on the next call, will receive the signal
notification.

So, the user can just try the next call and see the signal, and it is
also possible to notice that the timeout didn't expire and less than
vlen packets were received, so something went wrong and calling
getsockopt(SO_ERROR) will clarify things.

This is not some new error reporting facility, it predates recvmmsg,
that merely uses it for consistency.

How to properly report the -EFAULT when copying the remaining timeout to
userspace is the special case here, with

RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread David Laight

From: Arnaldo Carvalho de Melo
...
  But, another question...
 
  In the case that the call is interrupted by a signal handler and some
  datagrams have already been received, then the call succeeds, and
  returns the number of datagrams received, and 'timeout' is updated with
  the remaining time. Maybe that's the right behavior, but I just want to
 
 Note that what the comment in the existing code says should apply here,
 namely that the next recv (m or mmsg) syscall on this socket will return
 what is in sock-sk-sk_err, that is the signal:
 
...
 
 So, yes, the user _can_ process the packets already copied to userspace,
 i.e. no packet loss, and then, on the next call, will receive the signal
 notification.

The application shouldn't need to see an EINTR response, any signal handler
should be run when the system call returns to user (regardless of the
system call result code).
If that doesn't happen Linux is badly broken!
From an application point of view this is exactly the same as the signal
occurring just before/after the kernel entry/exit for the system call.

The call should just return early with success status.
No need to preserve the EINTR response for later.

The same might be appropriate for other errors - maybe including EFAULT
copying non-initial messages to userspace.
Put the message being processed back on the socket queue and return
success with the (non-zero) partial message count.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-28 Thread 'Arnaldo Carvalho de Melo'

Em Wed, May 28, 2014 at 03:17:40PM +, David Laight escreveu:
 From: Arnaldo Carvalho de Melo
 ...
   But, another question...
  
   In the case that the call is interrupted by a signal handler and some
   datagrams have already been received, then the call succeeds, and
   returns the number of datagrams received, and 'timeout' is updated with
   the remaining time. Maybe that's the right behavior, but I just want to
 
  Note that what the comment in the existing code says should apply here,
  namely that the next recv (m or mmsg) syscall on this socket will return
  what is in sock-sk-sk_err, that is the signal:
 
 ...
 
  So, yes, the user _can_ process the packets already copied to userspace,
  i.e. no packet loss, and then, on the next call, will receive the signal
  notification.
 
 The application shouldn't need to see an EINTR response, any signal handler
 should be run when the system call returns to user (regardless of the
 system call result code).
 If that doesn't happen Linux is badly broken!
 From an application point of view this is exactly the same as the signal
 occurring just before/after the kernel entry/exit for the system call.
 
 The call should just return early with success status.
 No need to preserve the EINTR response for later.
 
 The same might be appropriate for other errors - maybe including EFAULT
 copying non-initial messages to userspace.
 Put the message being processed back on the socket queue and return
 success with the (non-zero) partial message count.

We don't need to put anything back, if we get an EFAULT for a datagram,
then we stop processing that packet, _dropping_ it (and that is just
like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
and stop the batch, and if no datagrams were received, return the error
straight away.

But if some datagrams were successfully received, and at that point
_already_ removed from queues and sent successfully to userspace,
recvmmsg will return the number of successfully copied datagrams and
store the error so that it can return on the next syscall,

Please refer to the original discussion on how to report how many
successfully copied datagrams and also report that it stopped before the
timeout and the number of requested datagrams in a batch:

http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
> escreveu:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>  wrote:
>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
>>> escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> Can you try the attached patch on top of the first one?
>>>
 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
> 
> Lets try both then, attached goes the updated patch, and this is the
> diff to the last combined one:

What tree does this apply to? I tried applying to 3.15-rc7, but a piece 
was rejected, and the fix was not obvious.

Cheers,

Michael


drivers/net/tun.c.rej

--- drivers/net/tun.c
+++ drivers/net/tun.c
@@ -1343,7 +1343,7 @@
 
/* Read frames from queue */
skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
- , , );
+ , , , timeop);
if (skb) {
ret = tun_put_user(tun, tfile, skb, iv, len);
kfree_skb(skb);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
 wrote:
> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
> escreveu:
>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>> > Can you try the attached patch on top of the first one?
>
>> Patches on patches is a way to make your testers work unnecessarily
>> harder. Also, it means that anyone else who was interested in this
>
> It was meant to highlight the changes with regard to the previous patch,
> i.e. to make things easier for reviewing.

(I don't think that works...)

>> thread likely got lost at this point, because they probably didn't
>> save the first patch. All of this to say: it makes life much easier
>> if you provide a complete new self-contained patch on each iteration.
>
> If you prefer it that way, find one attached, that I was about to send
> (but you can wait till I use your program to test it ;-) )
>
>> > It starts adding explicit parentheses on a ternary, as David requested,
>> > and then should return the remaining timeouts in cases like signals,
>> > etc.
>> >
>> > Please let me know if this is enough.
>>
>> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)
>
> What was the problem experienced?

The problem is that after EINTR, the timeout is not updated with the
remaining time until expiry. (This was true with just patch 1 applied,
and is also true with both patch 1 and patch 2 applied.)

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Arnaldo Carvalho de Melo

Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> > Can you try the attached patch on top of the first one?
 
> Patches on patches is a way to make your testers work unnecessarily
> harder. Also, it means that anyone else who was interested in this

It was meant to highlight the changes with regard to the previous patch,
i.e. to make things easier for reviewing.

> thread likely got lost at this point, because they probably didn't 
> save the first patch. All of this to say: it makes life much easier 
> if you provide a complete new self-contained patch on each iteration.

If you prefer it that way, find one attached, that I was about to send
(but you can wait till I use your program to test it ;-) )
 
> > It starts adding explicit parentheses on a ternary, as David requested,
> > and then should return the remaining timeouts in cases like signals,
> > etc.
> > 
> > Please let me know if this is enough.
> 
> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

What was the problem experienced?
 
> > P.S. compile testing while sending this message :-)
> 
> Okay -- how about some real testing for the next version ;-). I've appended

Hey, you were provinding that real testing! thanks for that! :-)

> my test program below. You can use it as follows:
> 
> ./t_recvmmsg   ...
> 
> (The timeout can also be '-' meaning use NULL as the timeout argument.)

Thanks for the test proggie, will use it.
 
> Cheers,
> 
> Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) 
>> escreveu:
>>> Thanks! I applied this patch against 3.15-rc6.
> 
>>> recvmmsg() now (mostly) does what I expect: 
>>> * it waits until either the timeout expires or vlen messages 
>>>   have been received
>>> * If no message is received before timeout, it returns -1/EAGAIN.
>>> * If vlen messages are received before the timeout expires, then
>>>   the remaining time is returned in timeout.
> 
>>> One question: in the event that the call is interrupted by a signal 
>>> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
>>> not updated with the remaining time on the timer. Would it be desirable 
>>> to emulate the behavior of select() (and other syscalls) in this 
>>> respect, and instead return the remaining time if interrupted by 
>>> a signal?
>  
>> I think so, will check how to achieve that!
> 
> Can you try the attached patch on top of the first one?

Patches on patches is a way to make your testers work unnecessarily
harder. Also, it means that anyone else who was interested in this
thread likely got lost at this point, because they probably didn't 
save the first patch. All of this to say: it makes life much easier 
if you provide a complete new self-contained patch on each iteration.

> It starts adding explicit parentheses on a ternary, as David requested,
> and then should return the remaining timeouts in cases like signals,
> etc.
> 
> Please let me know if this is enough.

Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

> P.S. compile testing while sending this message :-)

Okay -- how about some real testing for the next version ;-). I've appended
my test program below. You can use it as follows:

./t_recvmmsg   ...

(The timeout can also be '-' meaning use NULL as the timeout argument.)

Cheers,

Michael


/* t_recvmmsg.c

   A simple test program for the Linux-specific recvmmsg() system call.
*/
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \
} while (0)


static int  /* Public interfaces: inetBind() and inetListen() */
createBoundSocket(const char *service, int type, socklen_t *addrlen)
{
struct addrinfo hints;
struct addrinfo *result, *rp;
int sfd, optval, s;

memset(, 0, sizeof(struct addrinfo));
hints.ai_canonname = NULL;
hints.ai_addr = NULL;
hints.ai_next = NULL;
hints.ai_socktype = type;
hints.ai_family = AF_UNSPEC;/* Allows IPv4 or IPv6 */
hints.ai_flags = AI_PASSIVE;/* Use wildcard IP address */

s = getaddrinfo(NULL, service, , );
if (s != 0)
return -1;

/* Walk through returned list until we find an address structure
   that can be used to successfully create and bind a socket */

optval = 1;
for (rp = result; rp != NULL; rp = rp->ai_next) {
sfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
if (sfd == -1)
continue;   /* On error, try next address */

if (bind(sfd, rp->ai_addr, rp->ai_addrlen) == 0)
break;  /* Success */

/* bind() failed: close this socket and try next address */

close(sfd);
}

if (rp != NULL && addrlen != NULL)
*addrlen = rp->ai_addrlen;  /* Return address structure size */

freeaddrinfo(result);

return (rp == NULL) ? -1 : sfd;
}


static void
handler()
{
/* Just interrupt a syscall */
}


int
main(int argc, char *argv[])
{
int sfd, vlen, j, s;
struct mmsghdr *msgvecp;
struct timespec ts;
struct timespec *tsp;
struct sigaction sa;

if (argc < 4) {
fprintf(stderr, "Usage: %s port tmo-secs buf-len...\n", argv[0]);
exit(EXIT_FAILURE);
}

sfd = createBoundSocket(argv[1], SOCK_DGRAM, NULL);
if (sfd == -1) {
fprintf(stderr, "Could not create server socket (%s)",
strerror(errno));
exit(EXIT_FAILURE);
}

/* Handle a signal, so we can test behaviour when recvmmsg()
   is interrupted by a signal */

sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset(_mask);
if (sigaction (SIGQUIT, , NULL) == -1)
errExit("sigaction");

/* argv[2] specifies recvmmsg() timeout in seconds, or is '-', meaning
   using NULL argument to get infinite timeout */

if (argv[2][0] == '-') {
tsp = NULL;

} else {
ts.tv_sec = atoi(argv[2]);
ts.tv_nsec = 0;
tsp = 
}

/* Remaining command-line arguments specify the size of recvmmsg()
   buffers */

/* The second argument to recvmmsg() is a pointer to an array of
   mmsghdr structures. Each

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
 Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
 Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 Thanks! I applied this patch against 3.15-rc6.
 
 recvmmsg() now (mostly) does what I expect: 
 * it waits until either the timeout expires or vlen messages 
   have been received
 * If no message is received before timeout, it returns -1/EAGAIN.
 * If vlen messages are received before the timeout expires, then
   the remaining time is returned in timeout.
 
 One question: in the event that the call is interrupted by a signal 
 handler, it fails (as expected) with EINTR, but the 'timeout' value is 
 not updated with the remaining time on the timer. Would it be desirable 
 to emulate the behavior of select() (and other syscalls) in this 
 respect, and instead return the remaining time if interrupted by 
 a signal?
  
 I think so, will check how to achieve that!
 
 Can you try the attached patch on top of the first one?

Patches on patches is a way to make your testers work unnecessarily
harder. Also, it means that anyone else who was interested in this
thread likely got lost at this point, because they probably didn't 
save the first patch. All of this to say: it makes life much easier 
if you provide a complete new self-contained patch on each iteration.

 It starts adding explicit parentheses on a ternary, as David requested,
 and then should return the remaining timeouts in cases like signals,
 etc.
 
 Please let me know if this is enough.

Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

 P.S. compile testing while sending this message :-)

Okay -- how about some real testing for the next version ;-). I've appended
my test program below. You can use it as follows:

./t_recvmmsg port timeout-in-secs bufsize...

(The timeout can also be '-' meaning use NULL as the timeout argument.)

Cheers,

Michael


/* t_recvmmsg.c

   A simple test program for the Linux-specific recvmmsg() system call.
*/
#define _GNU_SOURCE
#include sys/time.h
#include signal.h
#include sys/socket.h
#include netdb.h
#include sys/types.h
#include stdio.h
#include stdlib.h
#include unistd.h
#include string.h
#include errno.h


#define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \
} while (0)


static int  /* Public interfaces: inetBind() and inetListen() */
createBoundSocket(const char *service, int type, socklen_t *addrlen)
{
struct addrinfo hints;
struct addrinfo *result, *rp;
int sfd, optval, s;

memset(hints, 0, sizeof(struct addrinfo));
hints.ai_canonname = NULL;
hints.ai_addr = NULL;
hints.ai_next = NULL;
hints.ai_socktype = type;
hints.ai_family = AF_UNSPEC;/* Allows IPv4 or IPv6 */
hints.ai_flags = AI_PASSIVE;/* Use wildcard IP address */

s = getaddrinfo(NULL, service, hints, result);
if (s != 0)
return -1;

/* Walk through returned list until we find an address structure
   that can be used to successfully create and bind a socket */

optval = 1;
for (rp = result; rp != NULL; rp = rp-ai_next) {
sfd = socket(rp-ai_family, rp-ai_socktype, rp-ai_protocol);
if (sfd == -1)
continue;   /* On error, try next address */

if (bind(sfd, rp-ai_addr, rp-ai_addrlen) == 0)
break;  /* Success */

/* bind() failed: close this socket and try next address */

close(sfd);
}

if (rp != NULL  addrlen != NULL)
*addrlen = rp-ai_addrlen;  /* Return address structure size */

freeaddrinfo(result);

return (rp == NULL) ? -1 : sfd;
}


static void
handler()
{
/* Just interrupt a syscall */
}


int
main(int argc, char *argv[])
{
int sfd, vlen, j, s;
struct mmsghdr *msgvecp;
struct timespec ts;
struct timespec *tsp;
struct sigaction sa;

if (argc  4) {
fprintf(stderr, Usage: %s port tmo-secs buf-len...\n, argv[0]);
exit(EXIT_FAILURE);
}

sfd = createBoundSocket(argv[1], SOCK_DGRAM, NULL);
if (sfd == -1) {
fprintf(stderr, Could not create server socket (%s),
strerror(errno));
exit(EXIT_FAILURE);
}

/* Handle a signal, so we can test behaviour when recvmmsg()
   is interrupted by a signal */

sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset(sa.sa_mask);
if (sigaction (SIGQUIT, sa, NULL) == -1)
errExit(sigaction);

/* argv[2] specifies recvmmsg() timeout in seconds, or is '-', meaning
   using NULL argument to get infinite timeout */

if (argv[2][0] == '-') {
tsp = NULL;

} else {
ts.tv_sec = atoi(argv[2]);
ts.tv_nsec = 0;
tsp = ts;
}

/* Remaining command-line arguments specify the size of recvmmsg()
   buffers */

/* The second argument to recvmmsg()

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Arnaldo Carvalho de Melo

Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
  Can you try the attached patch on top of the first one?
 
 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this

It was meant to highlight the changes with regard to the previous patch,
i.e. to make things easier for reviewing.

 thread likely got lost at this point, because they probably didn't 
 save the first patch. All of this to say: it makes life much easier 
 if you provide a complete new self-contained patch on each iteration.

If you prefer it that way, find one attached, that I was about to send
(but you can wait till I use your program to test it ;-) )
 
  It starts adding explicit parentheses on a ternary, as David requested,
  and then should return the remaining timeouts in cases like signals,
  etc.
  
  Please let me know if this is enough.
 
 Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

What was the problem experienced?
 
  P.S. compile testing while sending this message :-)
 
 Okay -- how about some real testing for the next version ;-). I've appended

Hey, you were provinding that real testing! thanks for that! :-)

 my test program below. You can use it as follows:
 
 ./t_recvmmsg port timeout-in-secs bufsize...
 
 (The timeout can also be '-' meaning use NULL as the timeout argument.)

Thanks for the test proggie, will use it.
 
 Cheers,
 
 Michael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
a...@ghostprotocols.net wrote:
 Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
  Can you try the attached patch on top of the first one?

 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this

 It was meant to highlight the changes with regard to the previous patch,
 i.e. to make things easier for reviewing.

(I don't think that works...)

 thread likely got lost at this point, because they probably didn't
 save the first patch. All of this to say: it makes life much easier
 if you provide a complete new self-contained patch on each iteration.

 If you prefer it that way, find one attached, that I was about to send
 (but you can wait till I use your program to test it ;-) )

  It starts adding explicit parentheses on a ternary, as David requested,
  and then should return the remaining timeouts in cases like signals,
  etc.
 
  Please let me know if this is enough.

 Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

 What was the problem experienced?

The problem is that after EINTR, the timeout is not updated with the
remaining time until expiry. (This was true with just patch 1 applied,
and is also true with both patch 1 and patch 2 applied.)

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
 Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
 a...@ghostprotocols.net wrote:
 Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
 Can you try the attached patch on top of the first one?

 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this

 It was meant to highlight the changes with regard to the previous patch,
 i.e. to make things easier for reviewing.

 (I don't think that works...)
 
 Lets try both then, attached goes the updated patch, and this is the
 diff to the last combined one:

What tree does this apply to? I tried applying to 3.15-rc7, but a piece 
was rejected, and the fix was not obvious.

Cheers,

Michael


drivers/net/tun.c.rej

--- drivers/net/tun.c
+++ drivers/net/tun.c
@@ -1343,7 +1343,7 @@
 
/* Read frames from queue */
skb = __skb_recv_datagram(tfile-socket.sk, noblock ? MSG_DONTWAIT : 0,
- peeked, off, err);
+ peeked, off, err, timeop);
if (skb) {
ret = tun_put_user(tun, tfile, skb, iv, len);
kfree_skb(skb);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-26 Thread Arnaldo Carvalho de Melo

Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) 
> escreveu:
> > Thanks! I applied this patch against 3.15-rc6.

> > recvmmsg() now (mostly) does what I expect: 
> > * it waits until either the timeout expires or vlen messages 
> >   have been received
> > * If no message is received before timeout, it returns -1/EAGAIN.
> > * If vlen messages are received before the timeout expires, then
> >   the remaining time is returned in timeout.

> > One question: in the event that the call is interrupted by a signal 
> > handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> > not updated with the remaining time on the timer. Would it be desirable 
> > to emulate the behavior of select() (and other syscalls) in this 
> > respect, and instead return the remaining time if interrupted by 
> > a signal?
 
> I think so, will check how to achieve that!

Can you try the attached patch on top of the first one?

It starts adding explicit parentheses on a ternary, as David requested,
and then should return the remaining timeouts in cases like signals,
etc.

Please let me know if this is enough.

- Arnaldo

P.S. compile testing while sending this message :-)
diff --git a/include/net/sock.h b/include/net/sock.h
index aef3d7f9c3fa..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2106,7 +2106,7 @@ static inline long sock_rcvtimeo(const struct sock *sk, 
bool noblock)
 
 static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool 
noblock)
 {
-   return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
+   return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
 }
 
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a08c4c9dcd23..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,12 +224,14 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, 
unsigned int flags,
goto no_packet;
 
} while (!wait_for_more_packets(sk, err, , last));
-
+out:
+   if (timeop)
+   *timeop = timeo;
return NULL;
 
 no_packet:
*err = error;
-   return NULL;
+   goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
 
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index feaacaa0c970..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, 
struct socket *sock,
 
finish_wait(sk_sleep(sk), );
 
-   if (err)
-   return err;
+   if (err) {
+   copied = err;
+   break;
+   }
if (sk->sk_shutdown & RCV_SHUTDOWN)
break;
 
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e8b8bb3d50ab..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
release_sock(>sk);
if (continue_call)
rxrpc_put_call(continue_call);
-   return -ENODATA;
+   copied = -ENODATA;
+   goto out_copied;
}
}
 
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
release_sock(>sk);
rxrpc_put_call(continue_call);
_leave(" = %d [noncont]", copied);
-   return copied;
+   goto out_copied;
}
}
 
@@ -251,9 +252,10 @@ out:
rxrpc_put_call(call);
if (continue_call)
rxrpc_put_call(continue_call);
+   _leave(" = %d [data]", copied);
+out_copied:
if (timeop)
*timeop = timeo;
-   _leave(" = %d [data]", copied);
return copied;
 
/* handle non-DATA messages such as aborts, incoming connections and
@@ -330,7 +332,8 @@ terminal_message:
if (continue_call)
rxrpc_put_call(continue_call);
_leave(" = %d", ret);
-   return ret;
+   copied = ret;
+   goto out_copied;
 
 copy_error:
_debug("copy error");
@@ -339,7 +342,8 @@ copy_error:
if (continue_call)
rxrpc_put_call(continue_call);
_leave(" = %d", ret);
-   return ret;
+   copied = ret;
+   goto out_copied;
 
 wait_interrupted:
ret = sock_intr_errno(timeo);
@@ -350,8 +354,7 @@ wait_error:
if (copied)
copied = ret;
_leave(" = %d [waitfail

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-26 Thread Arnaldo Carvalho de Melo

Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) escreveu:
> Hi Arnaldo,
> 
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> > Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
> >> escreveu:
> >>> Hi Arnaldo,
> >  
> >>> Ping!
> > 
> >> I acknowledge the problem, the timeout has to be passed to the
> >> underlying ->recvmsg() implementations that should return the time spent
> >> waiting for each packet, so that we can accrue that at recvmmsg level.
> >  
> >> We can do either passing an extra timeout parameter to the recvmsg
> >> implementations or using some struct sock member to specify that
> >> timeout.
> >  
> >> The first approach is intrusive, touches tons of files, so I'll try
> >> making it all mostly transparent by hooking into sock_rcvtimeo()
> >> somehow.
> > 
> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.
> > 
> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.
> > 
> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.
> > 
> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.
> > 
> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.
> 
> Thanks! I applied this patch against 3.15-rc6.
> 
> recvmmsg() now (mostly) does what I expect: 
> * it waits until either the timeout expires or vlen messages 
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
> 
> One question: in the event that the call is interrupted by a signal 
> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> not updated with the remaining time on the timer. Would it be desirable 
> to emulate the behavior of select() (and other syscalls) in this 
> respect, and instead return the remaining time if interrupted by 
> a signal?

I think so, will check how to achieve that!
 
> Cheers,
> 
> Michael
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-26 Thread Arnaldo Carvalho de Melo

Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) escreveu:
 Hi Arnaldo,
 
 On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
  Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
  Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
  escreveu:
  Hi Arnaldo,
   
  Ping!
  
  I acknowledge the problem, the timeout has to be passed to the
  underlying -recvmsg() implementations that should return the time spent
  waiting for each packet, so that we can accrue that at recvmmsg level.
   
  We can do either passing an extra timeout parameter to the recvmsg
  implementations or using some struct sock member to specify that
  timeout.
   
  The first approach is intrusive, touches tons of files, so I'll try
  making it all mostly transparent by hooking into sock_rcvtimeo()
  somehow.
  
  But after thinking a bit more, looks like we need to do that, please
  take a look at the attached patch to see if it addresses the problem.
  
  Mostly it adds a new timeop to the per protocol recvmsg()
  implementations, that, if not NULL, should be used instead of
  SO_RCVTIMEO.
  
  since the underlying recvmsg implementations already check that timeout,
  return what is remaining, that will then be used in subsequent recvmsg
  calls, at the end we just convert it back to timespec format.
  
  In most cases it is just passed to skb_recv_datagram, that will check
  the pointer, use it and update if not NULL.
  
  Should have no problems, but I only did a boot with a system with this
  patch applied, no problems noticed on a normal desktop session, ssh,
  etc.
 
 Thanks! I applied this patch against 3.15-rc6.
 
 recvmmsg() now (mostly) does what I expect: 
 * it waits until either the timeout expires or vlen messages 
   have been received
 * If no message is received before timeout, it returns -1/EAGAIN.
 * If vlen messages are received before the timeout expires, then
   the remaining time is returned in timeout.
 
 One question: in the event that the call is interrupted by a signal 
 handler, it fails (as expected) with EINTR, but the 'timeout' value is 
 not updated with the remaining time on the timer. Would it be desirable 
 to emulate the behavior of select() (and other syscalls) in this 
 respect, and instead return the remaining time if interrupted by 
 a signal?

I think so, will check how to achieve that!
 
 Cheers,
 
 Michael
 
 -- 
 Michael Kerrisk
 Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
 Linux/UNIX System Programming Training: http://man7.org/training/
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-26 Thread Arnaldo Carvalho de Melo

Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
 Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
  Thanks! I applied this patch against 3.15-rc6.

  recvmmsg() now (mostly) does what I expect: 
  * it waits until either the timeout expires or vlen messages 
have been received
  * If no message is received before timeout, it returns -1/EAGAIN.
  * If vlen messages are received before the timeout expires, then
the remaining time is returned in timeout.

  One question: in the event that the call is interrupted by a signal 
  handler, it fails (as expected) with EINTR, but the 'timeout' value is 
  not updated with the remaining time on the timer. Would it be desirable 
  to emulate the behavior of select() (and other syscalls) in this 
  respect, and instead return the remaining time if interrupted by 
  a signal?
 
 I think so, will check how to achieve that!

Can you try the attached patch on top of the first one?

It starts adding explicit parentheses on a ternary, as David requested,
and then should return the remaining timeouts in cases like signals,
etc.

Please let me know if this is enough.

- Arnaldo

P.S. compile testing while sending this message :-)
diff --git a/include/net/sock.h b/include/net/sock.h
index aef3d7f9c3fa..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2106,7 +2106,7 @@ static inline long sock_rcvtimeo(const struct sock *sk, 
bool noblock)
 
 static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool 
noblock)
 {
-   return noblock ? 0 : timeop ? *timeop : sk-sk_rcvtimeo;
+   return noblock ? 0 : (timeop ? *timeop : sk-sk_rcvtimeo);
 }
 
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a08c4c9dcd23..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,12 +224,14 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, 
unsigned int flags,
goto no_packet;
 
} while (!wait_for_more_packets(sk, err, timeo, last));
-
+out:
+   if (timeop)
+   *timeop = timeo;
return NULL;
 
 no_packet:
*err = error;
-   return NULL;
+   goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
 
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index feaacaa0c970..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, 
struct socket *sock,
 
finish_wait(sk_sleep(sk), wait);
 
-   if (err)
-   return err;
+   if (err) {
+   copied = err;
+   break;
+   }
if (sk-sk_shutdown  RCV_SHUTDOWN)
break;
 
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e8b8bb3d50ab..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
release_sock(rx-sk);
if (continue_call)
rxrpc_put_call(continue_call);
-   return -ENODATA;
+   copied = -ENODATA;
+   goto out_copied;
}
}
 
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
release_sock(rx-sk);
rxrpc_put_call(continue_call);
_leave( = %d [noncont], copied);
-   return copied;
+   goto out_copied;
}
}
 
@@ -251,9 +252,10 @@ out:
rxrpc_put_call(call);
if (continue_call)
rxrpc_put_call(continue_call);
+   _leave( = %d [data], copied);
+out_copied:
if (timeop)
*timeop = timeo;
-   _leave( = %d [data], copied);
return copied;
 
/* handle non-DATA messages such as aborts, incoming connections and
@@ -330,7 +332,8 @@ terminal_message:
if (continue_call)
rxrpc_put_call(continue_call);
_leave( = %d, ret);
-   return ret;
+   copied = ret;
+   goto out_copied;
 
 copy_error:
_debug(copy error);
@@ -339,7 +342,8 @@ copy_error:
if (continue_call)
rxrpc_put_call(continue_call);
_leave( = %d, ret);
-   return ret;
+   copied = ret;
+   goto out_copied;
 
 wait_interrupted:
ret = sock_intr_errno(timeo);
@@ -350,8 +354,7 @@ wait_error:
if (copied)
copied = ret;
_leave( = %d [waitfail %d], copied, ret);
-

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-24 Thread Michael Kerrisk (man-pages)

Ping!

On 05/22/2014 04:27 PM, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
> 
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
>> Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
>>> escreveu:
 Hi Arnaldo,
>>  
 Ping!
>>
>>> I acknowledge the problem, the timeout has to be passed to the
>>> underlying ->recvmsg() implementations that should return the time spent
>>> waiting for each packet, so that we can accrue that at recvmmsg level.
>>  
>>> We can do either passing an extra timeout parameter to the recvmsg
>>> implementations or using some struct sock member to specify that
>>> timeout.
>>  
>>> The first approach is intrusive, touches tons of files, so I'll try
>>> making it all mostly transparent by hooking into sock_rcvtimeo()
>>> somehow.
>>
>> But after thinking a bit more, looks like we need to do that, please
>> take a look at the attached patch to see if it addresses the problem.
>>
>> Mostly it adds a new timeop to the per protocol recvmsg()
>> implementations, that, if not NULL, should be used instead of
>> SO_RCVTIMEO.
>>
>> since the underlying recvmsg implementations already check that timeout,
>> return what is remaining, that will then be used in subsequent recvmsg
>> calls, at the end we just convert it back to timespec format.
>>
>> In most cases it is just passed to skb_recv_datagram, that will check
>> the pointer, use it and update if not NULL.
>>
>> Should have no problems, but I only did a boot with a system with this
>> patch applied, no problems noticed on a normal desktop session, ssh,
>> etc.
> 
> Thanks! I applied this patch against 3.15-rc6.
> 
> recvmmsg() now (mostly) does what I expect: 
> * it waits until either the timeout expires or vlen messages 
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
> 
> One question: in the event that the call is interrupted by a signal 
> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> not updated with the remaining time on the timer. Would it be desirable 
> to emulate the behavior of select() (and other syscalls) in this 
> respect, and instead return the remaining time if interrupted by 
> a signal?
> 
> Cheers,
> 
> Michael
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-24 Thread Michael Kerrisk (man-pages)

On 05/23/2014 09:55 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
>> From: Arnaldo Carvalho de Melo 
>> Date: Wed, 21 May 2014 18:05:35 -0300
> 
>>> But after thinking a bit more, looks like we need to do that, please
>>> take a look at the attached patch to see if it addresses the problem.
> 
>>> Mostly it adds a new timeop to the per protocol recvmsg()
>>> implementations, that, if not NULL, should be used instead of
>>> SO_RCVTIMEO.
> 
>>> since the underlying recvmsg implementations already check that timeout,
>>> return what is remaining, that will then be used in subsequent recvmsg
>>> calls, at the end we just convert it back to timespec format.
> 
>>> In most cases it is just passed to skb_recv_datagram, that will check
>>> the pointer, use it and update if not NULL.
> 
>>> Should have no problems, but I only did a boot with a system with this
>>> patch applied, no problems noticed on a normal desktop session, ssh,
>>> etc.
>  
>> This looks fine to me, but I have a small request:
>  
>> +return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
>  
>> I keep forgetting which way these expressions associate, so if you could
>> parenthesize the innermost ?: I'd appreciate it. :)
> 
> Ok, I actually wrote a sample program to verify that these ternaries did
> what I meant 8)
> 
> I'll finish the cset log and do this clarification change.
> 
> Would be great to get Acked-by tags from the original reporter, Michael
> and whoever had a look at this change, if possible. Michael, Elie?

Arnaldo, I already sent you a reply (will reping on that one),
but got no response. My light testing got the expected results,
but I still had one question about the semantics.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-24 Thread Michael Kerrisk (man-pages)

On 05/23/2014 09:55 PM, Arnaldo Carvalho de Melo wrote:
 Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
 From: Arnaldo Carvalho de Melo a...@kernel.org
 Date: Wed, 21 May 2014 18:05:35 -0300

 But after thinking a bit more, looks like we need to do that, please
 take a look at the attached patch to see if it addresses the problem.

 Mostly it adds a new timeop to the per protocol recvmsg()
 implementations, that, if not NULL, should be used instead of
 SO_RCVTIMEO.

 since the underlying recvmsg implementations already check that timeout,
 return what is remaining, that will then be used in subsequent recvmsg
 calls, at the end we just convert it back to timespec format.

 In most cases it is just passed to skb_recv_datagram, that will check
 the pointer, use it and update if not NULL.

 Should have no problems, but I only did a boot with a system with this
 patch applied, no problems noticed on a normal desktop session, ssh,
 etc.

 This looks fine to me, but I have a small request:

 +return noblock ? 0 : timeop ? *timeop : sk-sk_rcvtimeo;

 I keep forgetting which way these expressions associate, so if you could
 parenthesize the innermost ?: I'd appreciate it. :)

 Ok, I actually wrote a sample program to verify that these ternaries did
 what I meant 8)

 I'll finish the cset log and do this clarification change.

 Would be great to get Acked-by tags from the original reporter, Michael
 and whoever had a look at this change, if possible. Michael, Elie?

Arnaldo, I already sent you a reply (will reping on that one),
but got no response. My light testing got the expected results,
but I still had one question about the semantics.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-24 Thread Michael Kerrisk (man-pages)

Ping!

On 05/22/2014 04:27 PM, Michael Kerrisk (man-pages) wrote:
 Hi Arnaldo,
 
 On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
 Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
 Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 Hi Arnaldo,
  
 Ping!

 I acknowledge the problem, the timeout has to be passed to the
 underlying -recvmsg() implementations that should return the time spent
 waiting for each packet, so that we can accrue that at recvmmsg level.
  
 We can do either passing an extra timeout parameter to the recvmsg
 implementations or using some struct sock member to specify that
 timeout.
  
 The first approach is intrusive, touches tons of files, so I'll try
 making it all mostly transparent by hooking into sock_rcvtimeo()
 somehow.

 But after thinking a bit more, looks like we need to do that, please
 take a look at the attached patch to see if it addresses the problem.

 Mostly it adds a new timeop to the per protocol recvmsg()
 implementations, that, if not NULL, should be used instead of
 SO_RCVTIMEO.

 since the underlying recvmsg implementations already check that timeout,
 return what is remaining, that will then be used in subsequent recvmsg
 calls, at the end we just convert it back to timespec format.

 In most cases it is just passed to skb_recv_datagram, that will check
 the pointer, use it and update if not NULL.

 Should have no problems, but I only did a boot with a system with this
 patch applied, no problems noticed on a normal desktop session, ssh,
 etc.
 
 Thanks! I applied this patch against 3.15-rc6.
 
 recvmmsg() now (mostly) does what I expect: 
 * it waits until either the timeout expires or vlen messages 
   have been received
 * If no message is received before timeout, it returns -1/EAGAIN.
 * If vlen messages are received before the timeout expires, then
   the remaining time is returned in timeout.
 
 One question: in the event that the call is interrupted by a signal 
 handler, it fails (as expected) with EINTR, but the 'timeout' value is 
 not updated with the remaining time on the timer. Would it be desirable 
 to emulate the behavior of select() (and other syscalls) in this 
 respect, and instead return the remaining time if interrupted by 
 a signal?
 
 Cheers,
 
 Michael
 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-23 Thread Arnaldo Carvalho de Melo

Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
> From: Arnaldo Carvalho de Melo 
> Date: Wed, 21 May 2014 18:05:35 -0300

> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.

> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.

> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.

> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.

> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.
 
> This looks fine to me, but I have a small request:
 
> + return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
 
> I keep forgetting which way these expressions associate, so if you could
> parenthesize the innermost ?: I'd appreciate it. :)

Ok, I actually wrote a sample program to verify that these ternaries did
what I meant 8)

I'll finish the cset log and do this clarification change.

Would be great to get Acked-by tags from the original reporter, Michael
and whoever had a look at this change, if possible. Michael, Elie?
 
> Thanks!

Thanks a lot for reviewing it!

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-23 Thread David Miller

From: Arnaldo Carvalho de Melo 
Date: Wed, 21 May 2014 18:05:35 -0300

> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
> 
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
> 
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
> 
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
> 
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.

This looks fine to me, but I have a small request:

+   return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;

I keep forgetting which way these expressions associate, so if you could
parenthesize the innermost ?: I'd appreciate it. :)

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-23 Thread David Miller

From: Arnaldo Carvalho de Melo a...@kernel.org
Date: Wed, 21 May 2014 18:05:35 -0300

 But after thinking a bit more, looks like we need to do that, please
 take a look at the attached patch to see if it addresses the problem.

 Mostly it adds a new timeop to the per protocol recvmsg()
 implementations, that, if not NULL, should be used instead of
 SO_RCVTIMEO.

 since the underlying recvmsg implementations already check that timeout,
 return what is remaining, that will then be used in subsequent recvmsg
 calls, at the end we just convert it back to timespec format.

 In most cases it is just passed to skb_recv_datagram, that will check
 the pointer, use it and update if not NULL.

 Should have no problems, but I only did a boot with a system with this
 patch applied, no problems noticed on a normal desktop session, ssh,
 etc.

This looks fine to me, but I have a small request:

+   return noblock ? 0 : timeop ? *timeop : sk-sk_rcvtimeo;

I keep forgetting which way these expressions associate, so if you could
parenthesize the innermost ?: I'd appreciate it. :)

Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-23 Thread Arnaldo Carvalho de Melo

Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
 From: Arnaldo Carvalho de Melo a...@kernel.org
 Date: Wed, 21 May 2014 18:05:35 -0300

  But after thinking a bit more, looks like we need to do that, please
  take a look at the attached patch to see if it addresses the problem.

  Mostly it adds a new timeop to the per protocol recvmsg()
  implementations, that, if not NULL, should be used instead of
  SO_RCVTIMEO.

  since the underlying recvmsg implementations already check that timeout,
  return what is remaining, that will then be used in subsequent recvmsg
  calls, at the end we just convert it back to timespec format.

  In most cases it is just passed to skb_recv_datagram, that will check
  the pointer, use it and update if not NULL.

  Should have no problems, but I only did a boot with a system with this
  patch applied, no problems noticed on a normal desktop session, ssh,
  etc.

 This looks fine to me, but I have a small request:

 + return noblock ? 0 : timeop ? *timeop : sk-sk_rcvtimeo;

 I keep forgetting which way these expressions associate, so if you could
 parenthesize the innermost ?: I'd appreciate it. :)

Ok, I actually wrote a sample program to verify that these ternaries did
what I meant 8)

I'll finish the cset log and do this clarification change.

Would be great to get Acked-by tags from the original reporter, Michael
and whoever had a look at this change, if possible. Michael, Elie?

 Thanks!

Thanks a lot for reviewing it!

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-22 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
>> escreveu:
>>> Hi Arnaldo,
>  
>>> Ping!
> 
>> I acknowledge the problem, the timeout has to be passed to the
>> underlying ->recvmsg() implementations that should return the time spent
>> waiting for each packet, so that we can accrue that at recvmmsg level.
>  
>> We can do either passing an extra timeout parameter to the recvmsg
>> implementations or using some struct sock member to specify that
>> timeout.
>  
>> The first approach is intrusive, touches tons of files, so I'll try
>> making it all mostly transparent by hooking into sock_rcvtimeo()
>> somehow.
> 
> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
> 
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
> 
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
> 
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
> 
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.

Thanks! I applied this patch against 3.15-rc6.

recvmmsg() now (mostly) does what I expect: 
* it waits until either the timeout expires or vlen messages 
  have been received
* If no message is received before timeout, it returns -1/EAGAIN.
* If vlen messages are received before the timeout expires, then
  the remaining time is returned in timeout.

One question: in the event that the call is interrupted by a signal 
handler, it fails (as expected) with EINTR, but the 'timeout' value is 
not updated with the remaining time on the timer. Would it be desirable 
to emulate the behavior of select() (and other syscalls) in this 
respect, and instead return the remaining time if interrupted by 
a signal?

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-22 Thread Michael Kerrisk (man-pages)

Hi Arnaldo,

On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
 Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
 Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) 
 escreveu:
 Hi Arnaldo,
  
 Ping!
 
 I acknowledge the problem, the timeout has to be passed to the
 underlying -recvmsg() implementations that should return the time spent
 waiting for each packet, so that we can accrue that at recvmmsg level.
  
 We can do either passing an extra timeout parameter to the recvmsg
 implementations or using some struct sock member to specify that
 timeout.
  
 The first approach is intrusive, touches tons of files, so I'll try
 making it all mostly transparent by hooking into sock_rcvtimeo()
 somehow.
 
 But after thinking a bit more, looks like we need to do that, please
 take a look at the attached patch to see if it addresses the problem.
 
 Mostly it adds a new timeop to the per protocol recvmsg()
 implementations, that, if not NULL, should be used instead of
 SO_RCVTIMEO.
 
 since the underlying recvmsg implementations already check that timeout,
 return what is remaining, that will then be used in subsequent recvmsg
 calls, at the end we just convert it back to timespec format.
 
 In most cases it is just passed to skb_recv_datagram, that will check
 the pointer, use it and update if not NULL.
 
 Should have no problems, but I only did a boot with a system with this
 patch applied, no problems noticed on a normal desktop session, ssh,
 etc.

Thanks! I applied this patch against 3.15-rc6.

recvmmsg() now (mostly) does what I expect: 
* it waits until either the timeout expires or vlen messages 
  have been received
* If no message is received before timeout, it returns -1/EAGAIN.
* If vlen messages are received before the timeout expires, then
  the remaining time is returned in timeout.

One question: in the event that the call is interrupted by a signal 
handler, it fails (as expected) with EINTR, but the 'timeout' value is 
not updated with the remaining time on the timer. Would it be desirable 
to emulate the behavior of select() (and other syscalls) in this 
respect, and instead return the remaining time if interrupted by 
a signal?

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

56 matches

Mail list logo