Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 06/24/2014 10:25 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages)
> wrote:
>> Hi Arnaldo,
>>
>> Things have gone quiet ;-). What's the current state of this patch?
>
> Yeah, I kept meaning to prod the other people on this thread about what
> they thought about my last messages, patches, etc. :-)
>
> Can I have acked-by or even tested-by on those? Is it ok?

I just need to go back and test one point that sounds like it might
still be broken.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Hi Arnaldo,

On 05/28/2014 02:20 PM, Michael Kerrisk (man-pages) wrote:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
>> On Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages)
>> wrote:
>>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>> a...@ghostprotocols.net wrote:
>>>> On Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages)
>>>> wrote:
>>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>>> Can you try the attached patch on top of the first one?
>>>>>
>>>>> Patches on patches is a way to make your testers work unnecessarily
>>>>> harder. Also, it means that anyone else who was interested in this
>>>>
>>>> It was meant to highlight the changes with regard to the previous
>>>> patch, i.e. to make things easier for reviewing.
>>>
>>> (I don't think that works...)
>>
>> Let's try both then,
>
> That's better!
>
>> attached goes the updated patch, and this is the
>> diff to the last combined one:
>>
>> diff --git a/net/socket.c b/net/socket.c
>> index 310a50971769..379be43879db 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
>>
>>          datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
>>
>> -        if (datagrams > 0 &&
>> -            copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>> +        if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>>                  datagrams = -EFAULT;
>>
>>          return datagrams;
>>
>> --
>>
>> This is a quick thing just to show where the problem lies, need to think
>> how to report an -EFAULT at this point properly, i.e. look at
>> __sys_recvmmsg for something related (returning the number of
>> successfully copied datagrams to userspace while storing the error for
>> subsequent reporting):
>>
>>          if (err == 0)
>>                  return datagrams;
>>
>>          if (datagrams != 0) {
>>                  /*
>>                   * We may return less entries than requested (vlen) if
>>                   * the sock is non block and there aren't enough
>>                   * datagrams...
>>                   */
>>                  if (err != -EAGAIN) {
>>                          /*
>>                           * ... or if recvmsg returns an error after we
>>                           * received some datagrams, where we record the
>>                           * error to return on the next call or if the
>>                           * app asks about it using getsockopt(SO_ERROR).
>>                           */
>>                          sock->sk->sk_err = -err;
>>                  }
>>
>>                  return datagrams;
>>          }
>>
>> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
>> more about it, sidetracked now, will be back to this.
>>
>> Anyway, attached goes the current combined patch.
>
> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:
>
> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.
>
> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

So, returning to your recvmmsg-timeout-v3.patch. I think the behavior
as implemented, and described above, is okay.

> But, another question...
>
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to
> check. There is at least one other possibility:
>
> * Fetch no datagrams (i.e., the datagrams are left to receive in a
>   future call), and the call fails with EINTR, and 'timeout' is updated.
>
> Maybe that possibility is hard to implement (not sure). But my main point
> is to make the current behavior clear, note the alternative, and ask:
> is the current behavior the best choice. (I'm not saying it's not, but I
> do want the choice to be a conscious one.)

So, I think (can't find the mail right now) that you explained elsewhere
that the above would be hard to implement. And in any case, I'm not sure
it's desirable; I only wanted to check that the choice was a deliberate
one.

However, there is still a weirdness, which relates to the discussion you
and David Laight had. Suppose the following scenario.

1. We do a recvmmsg() with 10 second timeout, asking for 5 messages.
2. 3 messages arrive
3. 6 seconds after the call, a signal handler interrupts the call.
4. recvmmsg() returns success, telling us it got 3 messages.

So far, so good. But

5. We make a further recvmmsg() call.
6. That call returns immediately, with an EINTR error.

That really should not be happening. As noted elsewhere in this thread,
EINTR is a property of a specific system call, not of the thread or the
socket. By the time of step 5, the kernel should already have forgotten
about the signal that occurred at step 3.

I don't think I saw
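For readers who want to reproduce steps 1 to 6 above, the call in question has roughly the following shape. This is only a minimal sketch: the helper name `recv_batch`, the buffer sizes, and the use of a static buffer are illustrative assumptions and do not come from the thread or from any test program mentioned in it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* recvmmsg() and struct mmsghdr are GNU extensions */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH_MAX 5             /* "asking for 5 messages" */
#define DGRAM_MAX 512           /* arbitrary per-datagram buffer size (assumption) */

/*
 * Hypothetical helper: receive up to vlen (capped at BATCH_MAX) datagrams.
 * 'timeout' may be NULL. If it is non-NULL, the kernel may overwrite it
 * with the time remaining, which is the behavior debated in this thread.
 * Returns the number of datagrams received, or -1 with errno set.
 * Uses a static buffer, so it is not reentrant.
 */
static int recv_batch(int fd, unsigned int vlen, int flags,
                      struct timespec *timeout)
{
    static char bufs[BATCH_MAX][DGRAM_MAX];
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iov[BATCH_MAX];
    unsigned int i;

    if (vlen > BATCH_MAX)
        vlen = BATCH_MAX;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < vlen; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    return recvmmsg(fd, msgs, vlen, flags, timeout);
}
```

To exercise the scenario, one would call `recv_batch(fd, 5, 0, &ts)` with `ts` set to 10 seconds, deliver a signal while it blocks, and then call it a second time to see whether EINTR is reported by the second call.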
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
>
> Things have gone quiet ;-). What's the current state of this patch?

Yeah, I kept meaning to prod the other people on this thread about what
they thought about my last messages, patches, etc. :-)

Can I have acked-by or even tested-by on those? Is it ok?

- Arnaldo

> Thanks,
>
> Michael
>
>
> On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
> a...@ghostprotocols.net wrote:
> > On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
> >> From: 'Arnaldo Carvalho de Melo'
> >> ...
> >> > > I remember some discussions from an XNET standards meeting (I've
> >> > > forgotten exactly which errors on which calls were being discussed).
> >> > > My recollection is that you return success with a partial transfer
> >> > > count for ANY error that happens after some data has been transferred.
> >> > > The actual error will be returned when it happens again on the next
> >> > > system call - Note the AGAIN, not a saved error.
> >
> >> > A saved error, for the right entity, in the recvmmsg case, that
> >> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> >> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> >> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> >> > subsystems) overhead.
> >
> >> > Perhaps we can have something in between, i.e. for things like EFAULT,
> >> > we should report straight away, effectively dropping whatever datagrams
> >> > successfully received in the current batch, do you agree?
> >
> >> Not unreasonable - EFAULT shouldn't happen unless the application
> >> is buggy.
> >
> > Ok.
> >
> >> > For transient errors the existing mechanism, fixed so that only per
> >> > socket errors are saved for later, as today, could be kept?
> >
> >> I don't think it is ever necessary to save an errno value for the
> >> next system call at all.
> >> Just process the next system call and see what happens.
> >
> >> If the call returns with less than the maximum number of datagrams
> >> and with a non-zero timeout left - then the application can infer
> >> that it was terminated by an abnormal event of some kind.
> >> This might be a signal.
> >
> > Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> > error on the next call, but we provide a way for the app to retrieve the
> > reason for the smaller than expected batch?
> >
> >> I'm not sure if an icmp error on a connected datagram socket could
> >> generate a 'disconnect'. It might happen if the interface is being
> >> used for something like SCTP.
> >> In either case the next call will detect the error.
> >
> > - Arnaldo
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Hi Arnaldo,

Things have gone quiet ;-). What's the current state of this patch?

Thanks,

Michael


On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
a...@ghostprotocols.net wrote:
> On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
>> From: 'Arnaldo Carvalho de Melo'
>> ...
>> > > I remember some discussions from an XNET standards meeting (I've
>> > > forgotten exactly which errors on which calls were being discussed).
>> > > My recollection is that you return success with a partial transfer
>> > > count for ANY error that happens after some data has been transferred.
>> > > The actual error will be returned when it happens again on the next
>> > > system call - Note the AGAIN, not a saved error.
>
>> > A saved error, for the right entity, in the recvmmsg case, that
>> > basically is batching multiple recvmsg syscalls, doesn't sound like a
>> > problem, i.e. the idea is to, as much as possible, mimic what multiple
>> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
>> > subsystems) overhead.
>
>> > Perhaps we can have something in between, i.e. for things like EFAULT,
>> > we should report straight away, effectively dropping whatever datagrams
>> > successfully received in the current batch, do you agree?
>
>> Not unreasonable - EFAULT shouldn't happen unless the application
>> is buggy.
>
> Ok.
>
>> > For transient errors the existing mechanism, fixed so that only per
>> > socket errors are saved for later, as today, could be kept?
>
>> I don't think it is ever necessary to save an errno value for the
>> next system call at all.
>> Just process the next system call and see what happens.
>
>> If the call returns with less than the maximum number of datagrams
>> and with a non-zero timeout left - then the application can infer
>> that it was terminated by an abnormal event of some kind.
>> This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?
>
>> I'm not sure if an icmp error on a connected datagram socket could
>> generate a 'disconnect'. It might happen if the interface is being
>> used for something like SCTP.
>> In either case the next call will detect the error.
>
> - Arnaldo

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
[PATCH/RFC] Handle EFAULT in partial recvmmsg was Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Thu, May 29, 2014 at 11:17:05AM -0300, 'Arnaldo Carvalho de Melo' wrote:
> On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've
> > > > forgotten exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
>
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
>
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
>
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
>
> Ok.

So the patch below should handle it, and record that the packets were
dropped, not at the transport level, like UDP_MIB_INERRORS, for
instance, would indicate, but at the batching, recvmmsg level, so
perhaps we'll need a MIB variable for that.

Also a counterpart to the trace_kfree_skb(skb, udp_recvmsg) tracepoint
for dropwatch and similar tools to use, Neil?

I'm keeping this separate from the timeout update patch.

- Arnaldo

diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..63491f015912 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2415,13 +2415,17 @@ out_put:
 		return datagrams;
 
 	if (datagrams != 0) {
+		if (err == -EFAULT) {
+			atomic_add(datagrams, &sock->sk->sk_drops);
+			return -EFAULT;
+		}
 		/*
 		 * We may return less entries than requested (vlen) if the
 		 * sock is non block and there aren't enough datagrams...
 		 */
 		if (err != -EAGAIN) {
 			/*
-			 * ... or if recvmsg returns an error after we
+			 * ... or if recvmsg returns a socket error after we
 			 * received some datagrams, where we record the
 			 * error to return on the next call or if the
 			 * app asks about it using getsockopt(SO_ERROR).
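The decision the patch makes at the end of `__sys_recvmmsg` can be modeled outside the kernel as a small pure function, which makes the cases easier to compare. This is a userspace illustration only, not kernel code; `struct batch_outcome` and `batch_result` are hypothetical names, and the final "no datagrams, return the error" branch is taken from the description elsewhere in this thread rather than from the hunk shown.

```c
#include <errno.h>

/* Hypothetical model of the post-loop decision in the patched
 * __sys_recvmmsg(); an illustration, not the kernel implementation. */
struct batch_outcome {
    int ret;        /* value the system call would return */
    int saved_err;  /* positive errno recorded on the socket, 0 if none */
    int dropped;    /* datagrams that would be added to sk_drops */
};

/* 'datagrams' is the count received so far; 'err' is 0 or a negative
 * errno from the recvmsg step that ended the batch. */
static struct batch_outcome batch_result(int datagrams, int err)
{
    struct batch_outcome out = { 0, 0, 0 };

    if (err == 0) {                 /* batch ended without error */
        out.ret = datagrams;
        return out;
    }
    if (datagrams == 0) {           /* nothing received: report error now */
        out.ret = err;
        return out;
    }
    if (err == -EFAULT) {           /* new in the patch: drop the batch */
        out.ret = -EFAULT;
        out.dropped = datagrams;
        return out;
    }
    if (err != -EAGAIN)             /* existing behavior: save for later */
        out.saved_err = -err;
    out.ret = datagrams;
    return out;
}
```

Under this model `batch_result(3, -EINTR)` returns 3 with `saved_err` set to EINTR, which corresponds to the saved error that later surfaces as the unexpected EINTR on the following call.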
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: 'Arnaldo Carvalho de Melo'
> On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've
> > > > forgotten exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
>
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
>
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
>
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
>
> Ok.
>
> > > For transient errors the existing mechanism, fixed so that only per
> > > socket errors are saved for later, as today, could be kept?
>
> > I don't think it is ever necessary to save an errno value for the
> > next system call at all.
> > Just process the next system call and see what happens.
>
> > If the call returns with less than the maximum number of datagrams
> > and with a non-zero timeout left - then the application can infer
> > that it was terminated by an abnormal event of some kind.
> > This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the
control structure.
But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout was used
as an inter-datagram timer instead of an overall timeout.

I suspect that most applications won't actually care about the 'time left',
nor the actual number of returned datagrams. They will just process what
they are given and then wait for the next batch.

	David
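The difference between an inter-datagram timer and an overall timeout comes down to where the wait for each datagram is measured from. With an overall timeout, every wait is derived from one fixed deadline, so the time left can only shrink as datagrams arrive. The sketch below shows that arithmetic in userspace terms; `time_left` is a hypothetical helper and is not how the kernel patch computes the value.

```c
#include <time.h>

/*
 * Hypothetical helper: time remaining until 'deadline' as seen at 'now'.
 * Returns {0, 0} once the deadline has passed, so the result never goes
 * negative. Both arguments are assumed to come from the same clock.
 */
static struct timespec time_left(struct timespec deadline, struct timespec now)
{
    struct timespec left = { 0, 0 };

    if (now.tv_sec > deadline.tv_sec ||
        (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
        return left;                    /* deadline already reached */

    left.tv_sec = deadline.tv_sec - now.tv_sec;
    left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (left.tv_nsec < 0) {             /* borrow one second */
        left.tv_sec -= 1;
        left.tv_nsec += 1000000000L;
    }
    return left;
}
```

An inter-datagram timer would instead restart the full interval after each datagram, so a steady trickle of traffic could keep the call blocked well past the time the caller asked for.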
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
> From: 'Arnaldo Carvalho de Melo'
> ...
> > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > exactly which errors on which calls were being discussed).
> > > My recollection is that you return success with a partial transfer
> > > count for ANY error that happens after some data has been transferred.
> > > The actual error will be returned when it happens again on the next
> > > system call - Note the AGAIN, not a saved error.

> > A saved error, for the right entity, in the recvmmsg case, that
> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > subsystems) overhead.

> > Perhaps we can have something in between, i.e. for things like EFAULT,
> > we should report straight away, effectively dropping whatever datagrams
> > successfully received in the current batch, do you agree?

> Not unreasonable - EFAULT shouldn't happen unless the application
> is buggy.

Ok.

> > For transient errors the existing mechanism, fixed so that only per
> > socket errors are saved for later, as today, could be kept?

> I don't think it is ever necessary to save an errno value for the
> next system call at all.
> Just process the next system call and see what happens.

> If the call returns with less than the maximum number of datagrams
> and with a non-zero timeout left - then the application can infer
> that it was terminated by an abnormal event of some kind.
> This might be a signal.

Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
error on the next call, but we provide a way for the app to retrieve the
reason for the smaller than expected batch?

> I'm not sure if an icmp error on a connected datagram socket could
> generate a 'disconnect'. It might happen if the interface is being
> used for something like SCTP.
> In either case the next call will detect the error.

- Arnaldo
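The getsockopt(SO_ERROR) route suggested above would look something like this from the application side. A minimal sketch: `fetch_pending_error` is a hypothetical helper name, and whether recvmmsg() should rely on this mechanism at all is exactly what the thread is debating.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: read the pending error recorded on a socket.
 * Returns a positive errno value if one is pending, 0 if none, and -1
 * if getsockopt() itself fails (errno then describes that failure).
 * Reading SO_ERROR also clears the pending error on the socket.
 */
static int fetch_pending_error(int fd)
{
    int soerr = 0;
    socklen_t len = sizeof(soerr);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return -1;
    return soerr;
}
```

An application could call this after a short batch to learn why the batch ended early, instead of discovering the saved error as the result of its next receive call.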
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: 'Arnaldo Carvalho de Melo' ... > > I remember some discussions from an XNET standards meeting (I've forgotten > > exactly which errors on which calls were being discussed). > > My recollection is that you return success with a partial transfer > > count for ANY error that happens after some data has been transferred. > > The actual error will be returned when it happens again on the next > > system call - Note the AGAIN, not a saved error. > > A saved error, for the right entity, in the recvmmsg case, that > basically is batching multiple recvmsg syscalls, doesn't sound like a > problem, i.e. the idea is to, as much as possible, mimic what multiple > recvmsg calls would do, but reduce its in/out kernel (and inside kernel > subsystems) overhead. > > Perhaps we can have something in between, i.e. for things like EFAULT, > we should report straight away, effectively dropping whatever datagrams > successfully received in the current batch, do you agree? Not unreasonable - EFAULT shouldn't happen unless the application is buggy. > For transient errors the existing mechanism, fixed so that only per > socket errors are saved for later, as today, could be kept? I don't think it is ever necessary to save an errno value for the next system call at all. Just process the next system call and see what happens. If the call returns with less than the maximum number of datagrams and with a non-zero timeout left - then the application can infer that it was terminated by an abnormal event of some kind. This might be a signal. I'm not sure if an icmp error on a connected datagram socket could generate a 'disconnect'. It might happen if the interface is being used for something like SCTP. In either case the next call will detect the error. 
David
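David's inference rule can be sketched from the application side. This is a minimal sketch, not code from the thread: `classify_batch` and the `batch_status` names are hypothetical, and it assumes a blocking socket and that the kernel writes the time remaining back into the `timeout` argument of `recvmmsg()`, as the patch under discussion does.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical names, for illustration only. */
enum batch_status {
	BATCH_ERROR,      /* recvmmsg() returned -1, consult errno */
	BATCH_FULL,       /* all vlen datagrams were received */
	BATCH_TIMED_OUT,  /* short batch, no time left */
	BATCH_CUT_SHORT   /* short batch with time left: signal or socket error */
};

/*
 * received:  return value of recvmmsg()
 * vlen:      number of datagrams that were requested
 * remaining: the timeout as left by the call (assumed to hold the time left)
 */
static enum batch_status classify_batch(int received, unsigned int vlen,
					const struct timespec *remaining)
{
	assert(vlen > 0 && remaining != NULL);

	if (received < 0)
		return BATCH_ERROR;
	if ((unsigned int)received >= vlen)
		return BATCH_FULL;
	if (remaining->tv_sec == 0 && remaining->tv_nsec == 0)
		return BATCH_TIMED_OUT;
	return BATCH_CUT_SHORT;
}
```

An application would call this right after `recvmmsg(fd, msgs, vlen, 0, &timeout)`. On `BATCH_CUT_SHORT` it processes what it received and simply issues the next call, which reports the error if the condition persists.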
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Thu, May 29, 2014 at 10:53:22AM +, David Laight escreveu: > From: 'Arnaldo Carvalho de > ... > > > > So, yes, the user _can_ process the packets already copied to userspace, > > > > i.e. no packet loss, and then, on the next call, will receive the signal > > > > notification. > > > > > The application shouldn't need to see an EINTR response, any signal > > > handler > > > should be run when the system call returns to user (regardless of the > > > system call result code). > > > If that doesn't happen Linux is badly broken! > > > >From an application point of view this is exactly the same as the signal > > > occurring just before/after the kernel entry/exit for the system call. > > > The call should just return early with success status. > > > No need to preserve the EINTR response for later. > > > The same might be appropriate for other errors - maybe including EFAULT > > > copying non-initial messages to userspace. > > > Put the message being processed back on the socket queue and return > > > success with the (non-zero) partial message count. > > We don't need to put anything back, if we get an EFAULT for a datagram, > > then we stop processing that packet, _dropping_ it (and that is just > > like recvmsg works, look at __skb_recv_datagram, the skb_unlink there, > > and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails) > > and stop the batch, and if no datagrams were received, return the error > > straight away. > > But if some datagrams were successfully received, and at that point > > _already_ removed from queues and sent successfully to userspace, > > recvmmsg will return the number of successfully copied datagrams and > > store the error so that it can return on the next syscall, > That just doesn't make any sense. Yeah for things like EFAULT, storing it in a per socket area for later reporting is a bug, a separate bug. 
> Saving an errno code would only make any sense if the error were a > property of the socket - but EFAULT is a property of the system call, Agreed, so for the errors that are socket related, the mechanism should work, not for things that are thread specific, then we should either straight away signal it despite of any successfully received packets in the batch so far in the current recvmmsg syscall or mimic what would happen if the user issued multiple recvmsg syscalls instead, i.e. in the next call _for this thread_, the EFAULT will take place. > and EINTR a property of the process (it exists so that the process > can return to userspace to execute a signal handler - relying on > SIGALRM to timeout blocking system calls is a recipe for disaster). > > The next system call could be from an entirely different process, > neither EFAULT nor EINTR would mean anything to it at all. Right, storing thread specific errors on the socket is a bug and has to be fixed. I.e. _if_ we keep the saving error for next syscall strategy, then that error has, for the per thread cases, be stored in a per thread area error field for socket operations. > ISTR that returning EFAULT generates a signal that will typically > terminate the process. > You definitely don't want to send one to a different process. Right. > > Please refer to the original discussion on how to report how many > > successfully copied datagrams and also report that it stopped before the > > timeout and the number of requested datagrams in a batch: > > http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com > I do remember the original problem. > I don't recall error reporting being referenced. > > What is being discussed here is how to return the EFAULT that may happen > > _after_ datagram processing, be it interrupted by an EFAULT, signal, or > > plain returning all that was requested, with no errors. 
> I remember some discussions from an XNET standards meeting (I've forgotten > exactly which errors on which calls were being discussed). > My recollection is that you return success with a partial transfer > count for ANY error that happens after some data has been transferred. > The actual error will be returned when it happens again on the next > system call - Note the AGAIN, not a saved error. A saved error, for the right entity, in the recvmmsg case, that basically is batching multiple recvmsg syscalls, doesn't sound like a problem, i.e. the idea is to, as much as possible, mimic what multiple recvmsg calls would do, but reduce its in/out kernel (and inside kernel subsystems) overhead. Perhaps we can have something in between, i.e. for things like EFAULT, we should report straight away, effectively dropping whatever datagrams successfully received in the current batch, do you agree? For transient errors the existing mechanism, fixed so that only per socket errors are saved for later, as today, could be kept? > Things like blocking send/write being interrupted spring to mind. > Possibly even copyin/out failures part way through a read/write call. > > >
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: 'Arnaldo Carvalho de ... > > > So, yes, the user _can_ process the packets already copied to userspace, > > > i.e. no packet loss, and then, on the next call, will receive the signal > > > notification. > > > The application shouldn't need to see an EINTR response, any signal handler > > should be run when the system call returns to user (regardless of the > > system call result code). > > If that doesn't happen Linux is badly broken! > > >From an application point of view this is exactly the same as the signal > > occurring just before/after the kernel entry/exit for the system call. > > > > The call should just return early with success status. > > No need to preserve the EINTR response for later. > > > > The same might be appropriate for other errors - maybe including EFAULT > > copying non-initial messages to userspace. > > Put the message being processed back on the socket queue and return > > success with the (non-zero) partial message count. > > We don't need to put anything back, if we get an EFAULT for a datagram, > then we stop processing that packet, _dropping_ it (and that is just > like recvmsg works, look at __skb_recv_datagram, the skb_unlink there, > and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails) > and stop the batch, and if no datagrams were received, return the error > straight away. > > But if some datagrams were successfully received, and at that point > _already_ removed from queues and sent successfully to userspace, > recvmmsg will return the number of successfully copied datagrams and > store the error so that it can return on the next syscall, That just doesn't make any sense. Saving an errno code would only make any sense if the error were a property of the socket - but EFAULT is a property of the system call, and EINTR a property of the process (it exists so that the process can return to userspace to execute a signal handler - relying on SIGALRM to timeout blocking system calls is a recipe for disaster). 
The next system call could be from an entirely different process, neither EFAULT nor EINTR would mean anything to it at all. ISTR that returning EFAULT generates a signal that will typically terminate the process. You definitely don't want to send one to a different process. > Please refer to the original discussion on how to report how many > successfully copied datagrams and also report that it stopped before the > timeout and the number of requested datagrams in a batch: > > http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com I do remember the original problem. I don't recall error reporting being referenced. > What is being discussed here is how to return the EFAULT that may happen > _after_ datagram processing, be it interrupted by an EFAULT, signal, or > plain returning all that was requested, with no errors. I remember some discussions from an XNET standards meeting (I've forgotten exactly which errors on which calls were being discussed). My recollection is that you return success with a partial transfer count for ANY error that happens after some data has been transferred. The actual error will be returned when it happens again on the next system call - Note the AGAIN, not a saved error. Things like blocking send/write being interrupted spring to mind. Possibly even copyin/out failures part way through a read/write call. > This EFAULT _after_ datagram processing may happen when updating the > remaining timeout, because then how can userspace both receive the > number of successfully copied datagrams (in any of the cases mentioned > in the previous paragraph) and know that that timeout can't be used > because there was a problem while trying to copy it to userspace > (EFAULT)? Failure to write the control structure back to userspace probably deserves an EFAULT return - the application is buggy. IIRC normal recvmsg() copies out the control structure at the end of processing - that can fail. 
I wouldn't worry about datagram discards on any of those late EFAULT conditions. David
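The return-value rule argued for in this message (once at least one datagram has been delivered, the partial count wins over the error, and nothing is saved for a later call) can be modelled as below. This is a user-space model for illustration only, not code from net/socket.c; `batch_return_value` is a hypothetical name, and the late EFAULT on the timeout write-back, which David would report as an error, is deliberately not modelled.

```c
#include <errno.h>

/*
 * datagrams: number of datagrams already copied to userspace
 * err:       0, or a negative errno from the datagram that failed
 */
static int batch_return_value(int datagrams, int err)
{
	/* Something was delivered: report the count and drop the error. */
	if (datagrams > 0)
		return datagrams;
	/* Nothing was delivered: report the error (or 0) straight away. */
	return err;
}
```

Under this rule a signal that arrives after three datagrams yields a return value of 3, and the same signal with an empty batch yields -EINTR.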
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 05/29/2014 12:53 PM, David Laight wrote:
> From: 'Arnaldo Carvalho de
> ...
>>>> So, yes, the user _can_ process the packets already copied to userspace,
>>>> i.e. no packet loss, and then, on the next call, will receive the signal
>>>> notification.
>>>
>>> The application shouldn't need to see an EINTR response, any signal handler
>>> should be run when the system call returns to user (regardless of the
>>> system call result code).
>>> If that doesn't happen Linux is badly broken!
>>> From an application point of view this is exactly the same as the signal
>>> occurring just before/after the kernel entry/exit for the system call.
>>>
>>> The call should just return early with success status.
>>> No need to preserve the EINTR response for later.
>>>
>>> The same might be appropriate for other errors - maybe including EFAULT
>>> copying non-initial messages to userspace.
>>> Put the message being processed back on the socket queue and return
>>> success with the (non-zero) partial message count.
>>
>> We don't need to put anything back, if we get an EFAULT for a datagram,
>> then we stop processing that packet, _dropping_ it (and that is just
>> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
>> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
>> and stop the batch, and if no datagrams were received, return the error
>> straight away.
>>
>> But if some datagrams were successfully received, and at that point
>> _already_ removed from queues and sent successfully to userspace,
>> recvmmsg will return the number of successfully copied datagrams and
>> store the error so that it can return on the next syscall,
>
> That just doesn't make any sense.

Agreed.

> Saving an errno code would only make any sense if the error were a
> property of the socket -

Back in http://marc.info/?l=linux-netdev&m=124298156121906&w=2
(the follow-on from the discussion that Arnaldo mentions below),
it was noted:

: Normally you'd expect the call to return what it has read without an
: error, and then the socket error would be picked up on the next call.

and the key point in that sentence was "*socket* error."

> but EFAULT is a property of the system call,
> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).

Exactly. Interruption by a signal should just result in an early success
return, unless no datagrams have been received so far, in which case it
should produce an EINTR failure. No error should be saved for a future call.

> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.
>
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.

Not generally, I think. (I think you're thinking of SIGSEGV when a
process touches a nonexistent address in user mode.)

> You definitely don't want to send one to a different process.

But it's true that the EFAULT or EINTR shouldn't be returned to another
process.

>> Please refer to the original discussion on how to report how many
>> successfully copied datagrams and also report that it stopped before the
>> timeout and the number of requested datagrams in a batch:
>>
>> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com
>
> I do remember the original problem.
> I don't recall error reporting being referenced.

(See above.)

>> What is being discussed here is how to return the EFAULT that may happen
>> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
>> plain returning all that was requested, with no errors.
>
> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.
>
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
>
>> This EFAULT _after_ datagram processing may happen when updating the
>> remaining timeout, because then how can userspace both receive the
>> number of successfully copied datagrams (in any of the cases mentioned
>> in the previous paragraph) and know that that timeout can't be used
>> because there was a problem while trying to copy it to userspace
>> (EFAULT)?
>
> Failure to write the control structure back to userspace probably
> deserves an EFAULT return - the application is buggy.
> IIRC normal recvmsg() copies out the control structure at the end
> of processing - that can fail.
> I wouldn't worry about datagram discards on any of those late
> EFAULT conditions.

Agree on all of the above, and that last point certainly seems like
the right approach to me.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Thu, May 29, 2014 at 02:06:04PM +, David Laight wrote:
> From: 'Arnaldo Carvalho de Melo'
> ...
> > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > exactly which errors on which calls were being discussed).
> > > My recollection is that you return success with a partial transfer
> > > count for ANY error that happens after some data has been transferred.
> > > The actual error will be returned when it happens again on the next
> > > system call - Note the AGAIN, not a saved error.
> >
> > A saved error, for the right entity, in the recvmmsg case, that
> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > subsystems) overhead.
> >
> > Perhaps we can have something in between, i.e. for things like EFAULT,
> > we should report straight away, effectively dropping whatever datagrams
> > successfully received in the current batch, do you agree?
>
> Not unreasonable - EFAULT shouldn't happen unless the application is buggy.

Ok.

> > For transient errors the existing mechanism, fixed so that only per
> > socket errors are saved for later, as today, could be kept?
>
> I don't think it is ever necessary to save an errno value for the next
> system call at all.
> Just process the next system call and see what happens.
>
> If the call returns with less than the maximum number of datagrams
> and with a non-zero timeout left - then the application can infer
> that it was terminated by an abnormal event of some kind.
> This might be a signal.

Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
error on the next call, but we provide a way for the app to retrieve the
reason for the smaller than expected batch?

> I'm not sure if an icmp error on a connected datagram socket could
> generate a 'disconnect'. It might happen if the interface is being
> used for something like SCTP.
> In either case the next call will detect the error.
- Arnaldo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
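The getsockopt(SO_ERROR) idea above can be sketched from the application side. This is only an illustration under assumptions: the helper names `batch_ended_early` and `fetch_so_error` are hypothetical, and the first helper presumes the remaining time has been written back to the caller's timespec, which is the behaviour being discussed in this thread rather than a guaranteed one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Hypothetical helper: true when a batch came back short (fewer than vlen
 * datagrams) even though some of the timeout was still left, which is the
 * "abnormal event" hint described in the mail above.
 */
static bool batch_ended_early(int received, unsigned int vlen,
                              const struct timespec *left)
{
    assert(left != NULL);
    if (received < 0 || (unsigned int)received >= vlen)
        return false;
    return left->tv_sec > 0 || left->tv_nsec > 0;
}

/*
 * Hypothetical helper: read the pending socket error with SO_ERROR.
 * Returns 0 if nothing is pending, a positive errno value if an error was
 * pending, or -errno if getsockopt() itself failed.
 */
static int fetch_so_error(int fd)
{
    int soerr = 0;
    socklen_t len = sizeof(soerr);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return -errno;
    return soerr;
}
```

An application would call `batch_ended_early()` with the return value of recvmmsg(), the vlen it passed in, and the timespec it handed to the call, and only then ask `fetch_so_error()` for the reason.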
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: 'Arnaldo Carvalho de Melo'
> Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've
> > > > forgotten exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
> > >
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
> > >
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
> >
> > Not unreasonable - EFAULT shouldn't happen unless the application is buggy.
>
> Ok.
>
> > > For transient errors the existing mechanism, fixed so that only per
> > > socket errors are saved for later, as today, could be kept?
> >
> > I don't think it is ever necessary to save an errno value for the next
> > system call at all.
> > Just process the next system call and see what happens.
> >
> > If the call returns with less than the maximum number of datagrams and
> > with a non-zero timeout left - then the application can infer that it
> > was terminated by an abnormal event of some kind.
> > This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the control
structure. But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout was used as
an inter-datagram timer instead of an overall timeout.

I suspect that most applications won't actually care about the 'time left',
nor the actual number of returned datagrams. They will just process what
they are given and then wait for the next batch.

David
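The difference between an inter-datagram timer and an overall timeout comes down to measuring against one fixed deadline instead of restarting the clock for every datagram. A minimal sketch of the "time left" arithmetic follows; `time_left` and `NANOS_PER_SEC` are hypothetical names, and this is not the kernel's implementation.

```c
#include <assert.h>
#include <time.h>

#define NANOS_PER_SEC 1000000000L

/*
 * Hypothetical helper: time remaining until a fixed deadline, as seen from
 * 'now'. The deadline is computed once, when the batch starts, so every
 * datagram in the batch is charged against the same overall budget.
 * Returns {0, 0} once the deadline has passed.
 */
static struct timespec time_left(const struct timespec *deadline,
                                 const struct timespec *now)
{
    struct timespec left = { 0, 0 };

    assert(deadline != NULL && now != NULL);

    if (now->tv_sec > deadline->tv_sec ||
        (now->tv_sec == deadline->tv_sec &&
         now->tv_nsec >= deadline->tv_nsec))
        return left;

    left.tv_sec = deadline->tv_sec - now->tv_sec;
    left.tv_nsec = deadline->tv_nsec - now->tv_nsec;
    if (left.tv_nsec < 0) {
        /* Borrow one second to keep tv_nsec in [0, NANOS_PER_SEC). */
        left.tv_sec -= 1;
        left.tv_nsec += NANOS_PER_SEC;
    }
    return left;
}
```

With an inter-datagram timer the equivalent of `deadline` would be pushed forward after each received datagram, so a slow trickle of traffic could keep the call blocked far longer than the caller asked for.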
[PATCH/RFC] Handle EFAULT in partial recvmmsg was Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Thu, May 29, 2014 at 11:17:05AM -0300, 'Arnaldo Carvalho de Melo' escreveu:
> Em Thu, May 29, 2014 at 02:06:04PM +, David Laight escreveu:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've
> > > > forgotten exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
> > >
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
> > >
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
> >
> > Not unreasonable - EFAULT shouldn't happen unless the application is buggy.
>
> Ok.

So the patch below should handle it, and record that the packets were
dropped, not at the transport level, like UDP_MIB_INERRORS, for instance,
would indicate, but at the batching, recvmmsg level, so perhaps we'll need
a MIB variable for that.

Also a counterpart to the trace_kfree_skb(skb, udp_recvmsg) tracepoint for
dropwatch and similar tools to use, Neil?

I'm keeping this separate from the timeout update patch.

- Arnaldo

diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..63491f015912 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2415,13 +2415,17 @@ out_put:
 		return datagrams;
 
 	if (datagrams != 0) {
+		if (err == -EFAULT) {
+			atomic_add(datagrams, &sock->sk->sk_drops);
+			return -EFAULT;
+		}
 		/*
 		 * We may return less entries than requested (vlen) if the
 		 * sock is non block and there aren't enough datagrams...
 		 */
 		if (err != -EAGAIN) {
 			/*
-			 * ... or if recvmsg returns an error after we
+			 * ... or if recvmsg returns a socket error after we
 			 * received some datagrams, where we record the
 			 * error to return on the next call or if the
 			 * app asks about it using getsockopt(SO_ERROR).
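The decision the patched tail of __sys_recvmmsg would make can be modelled as a small pure function, which makes the four outcomes easy to compare. This is a userspace sketch of the logic as described in the patch and the surrounding mails, not kernel code; `struct batch_outcome` and `finish_batch` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical summary of how one recvmmsg batch ends. */
struct batch_outcome {
    int ret;        /* value the syscall would return */
    int saved_err;  /* positive errno recorded for later, 0 if none */
    int dropped;    /* datagrams accounted as dropped */
};

/*
 * 'datagrams' is the number already copied to userspace, 'err' is the
 * negative errno (or 0) from the recvmsg that ended the batch.
 */
static struct batch_outcome finish_batch(int datagrams, int err)
{
    struct batch_outcome out = { 0, 0, 0 };

    assert(datagrams >= 0 && err <= 0);

    if (err == 0) {                 /* clean end of batch */
        out.ret = datagrams;
        return out;
    }
    if (datagrams == 0) {           /* nothing received: report the error */
        out.ret = err;
        return out;
    }
    if (err == -EFAULT) {           /* proposed: fail now, count the drops */
        out.dropped = datagrams;
        out.ret = -EFAULT;
        return out;
    }
    if (err != -EAGAIN)             /* other errors: remember for later */
        out.saved_err = -err;
    out.ret = datagrams;            /* partial batch is still a success */
    return out;
}
```

The only branch the patch adds is the -EFAULT one; the other three mirror the existing code quoted elsewhere in the thread.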
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Wed, May 28, 2014 at 03:33:51PM -0600, Chris Friesen escreveu:
> On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:
> >What is being discussed here is how to return the EFAULT that may happen
> >_after_ datagram processing, be it interrupted by an EFAULT, signal, or
> >plain returning all that was requested, with no errors.
>
> >This EFAULT _after_ datagram processing may happen when updating the
> >remaining timeout, because then how can userspace both receive the
> >number of successfully copied datagrams (in any of the cases mentioned
> >in the previous paragraph) and know that that timeout can't be used
> >because there was a problem while trying to copy it to userspace
> >(EFAULT)?
>
> How does select() handle this problem? It updates the timeout and also
> modifies other data.
>
> Could we just check whether the timeout pointer is valid before doing
> anything else? Of course we could still fault the page out while waiting
> for messages and then fail to fault it back in later, but that seems like a
> not-very-likely scenario.

I'll check how select behaves, and yes, I think it is not-very-likely and
what we're doing now is reasonable for datagram protocols, i.e. to return
-EFAULT when updating the timeout fails, not reporting if packets were
successfully received, i.e. they end up being "dropped", as userspace
can't easily figure out if some was received short of painting it with
some pattern and then checking the ones that aren't with that pattern.

- Arnaldo
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:
> What is being discussed here is how to return the EFAULT that may happen
> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> plain returning all that was requested, with no errors.
>
> This EFAULT _after_ datagram processing may happen when updating the
> remaining timeout, because then how can userspace both receive the
> number of successfully copied datagrams (in any of the cases mentioned
> in the previous paragraph) and know that that timeout can't be used
> because there was a problem while trying to copy it to userspace
> (EFAULT)?

How does select() handle this problem? It updates the timeout and also
modifies other data.

Could we just check whether the timeout pointer is valid before doing
anything else? Of course we could still fault the page out while waiting
for messages and then fail to fault it back in later, but that seems like a
not-very-likely scenario.

Chris
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Wed, May 28, 2014 at 03:17:40PM +, David Laight escreveu:
> From: Arnaldo Carvalho de Melo
> ...
> > > But, another question...
> > >
> > > In the case that the call is interrupted by a signal handler and some
> > > datagrams have already been received, then the call succeeds, and
> > > returns the number of datagrams received, and 'timeout' is updated with
> > > the remaining time. Maybe that's the right behavior, but I just want to
> >
> > Note that what the comment in the existing code says should apply here,
> > namely that the next recv (m or mmsg) syscall on this socket will return
> > what is in sock->sk->sk_err, that is the signal:
> ...
> > So, yes, the user _can_ process the packets already copied to userspace,
> > i.e. no packet loss, and then, on the next call, will receive the signal
> > notification.
>
> The application shouldn't need to see an EINTR response, any signal handler
> should be run when the system call returns to user (regardless of the
> system call result code).
> If that doesn't happen Linux is badly broken!
> From an application point of view this is exactly the same as the signal
> occurring just before/after the kernel entry/exit for the system call.
>
> The call should just return early with success status.
> No need to preserve the EINTR response for later.
>
> The same might be appropriate for other errors - maybe including EFAULT
> copying non-initial messages to userspace.
> Put the message being processed back on the socket queue and return
> success with the (non-zero) partial message count.

We don't need to put anything back, if we get an EFAULT for a datagram,
then we stop processing that packet, _dropping_ it (and that is just like
recvmsg works, look at __skb_recv_datagram, the skb_unlink there, and
udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails) and
stop the batch, and if no datagrams were received, return the error
straight away.

But if some datagrams were successfully received, and at that point
_already_ removed from queues and sent successfully to userspace, recvmmsg
will return the number of successfully copied datagrams and store the
error so that it can return on the next syscall.

Please refer to the original discussion on how to report how many
successfully copied datagrams and also report that it stopped before the
timeout and the number of requested datagrams in a batch:

http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?

- Arnaldo
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: Arnaldo Carvalho de Melo
...
> > But, another question...
> >
> > In the case that the call is interrupted by a signal handler and some
> > datagrams have already been received, then the call succeeds, and
> > returns the number of datagrams received, and 'timeout' is updated with
> > the remaining time. Maybe that's the right behavior, but I just want to
>
> Note that what the comment in the existing code says should apply here,
> namely that the next recv (m or mmsg) syscall on this socket will return
> what is in sock->sk->sk_err, that is the signal:
...
> So, yes, the user _can_ process the packets already copied to userspace,
> i.e. no packet loss, and then, on the next call, will receive the signal
> notification.

The application shouldn't need to see an EINTR response, any signal handler
should be run when the system call returns to user (regardless of the
system call result code).
If that doesn't happen Linux is badly broken!
From an application point of view this is exactly the same as the signal
occurring just before/after the kernel entry/exit for the system call.

The call should just return early with success status.
No need to preserve the EINTR response for later.

The same might be appropriate for other errors - maybe including EFAULT
copying non-initial messages to userspace.
Put the message being processed back on the socket queue and return
success with the (non-zero) partial message count.

David
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Wed, May 28, 2014 at 02:20:10PM +0200, Michael Kerrisk (man-pages) escreveu:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> > attached goes the updated patch, and this is the
> > diff to the last combined one:
> >
> > diff --git a/net/socket.c b/net/socket.c
> > index 310a50971769..379be43879db 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
> >
> >  	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
> >
> > -	if (datagrams > 0 &&
> > -	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> > +	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> >  		datagrams = -EFAULT;
> >
> >  	return datagrams;
> >
> > --
> >
> > This is a quick thing just to show where the problem lies, need to think
> > how to report an -EFAULT at this point properly, i.e. look at

Ok, so I can live with the way things were before this fix, i.e. if the
user specifies a timeout, then if it fails when copying the remaining time
to userspace (copy_to_user call above), then we return -EFAULT.

I.e. there would be no change in behaviour, but then perhaps we should go
with the interface that is in place when we received some datagrams and
then some error happens, see comment in the existing code, below:

> > __sys_recvmmsg for something related (returning the number of
> > successfully copied datagrams to userspace while storing the error for
> > subsequent reporting):
> >
> > 	if (err == 0)
> > 		return datagrams;
> >
> > 	if (datagrams != 0) {
> > 		/*
> > 		 * We may return less entries than requested (vlen) if
> > 		 * the sock is non block and there aren't enough
> > 		 * datagrams...
> > 		 */
> > 		if (err != -EAGAIN) {
> > 			/*
> > 			 * ... or if recvmsg returns an error after we
> > 			 * received some datagrams, where we record the
> > 			 * error to return on the next call or if the
> > 			 * app asks about it using getsockopt(SO_ERROR).
> > 			 */
> > 			sock->sk->sk_err = -err;
> > 		}
> >
> > 		return datagrams;
> > 	}
> >
> > I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> > more about it, sidetracked now, will be back to this.
> >
> > Anyway, attached goes the current combined patch.

> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:

> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.

Ok

> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

Agreed that it is how it should behave.

> But, another question...
>
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to

Note that what the comment in the existing code says should apply here,
namely that the next recv (m or mmsg) syscall on this socket will return
what is in sock->sk->sk_err, that is the signal:

sys_recvmmsg()
   sock->ops->recvmsg() (e.g. inet_recvmsg)
      sk->prot->recvmsg() (e.g., udp_recvmsg)
         skb_recv_datagram()
            wait_for_more_packets()
               sock_intr_errno()
               *err = -EINTR
   sock->sk->sk_err = err

Next recv will end up calling skb_recv_datagram and that does:

struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
				    int *peeked, int *off, int *err,
				    long *timeop)
{
	struct sk_buff *skb, *last;
	long timeo;
	/*
	 * Caller is allowed not to check sk->sk_err before
	 * skb_recv_datagram()
	 */
	int error = sock_error(sk);

	if (error)
		goto no_packet;
<SNIP>
out:
	if (timeop)
		*timeop = timeo;
	return NULL;

no_packet:
	*err = error;
	goto out;
}

So, yes, the user _can_ process the packets already copied to userspace,
i.e. no packet loss, and then, on the next call, will receive the signal
notification.

So, the user can just try the next call and see the signal, and it is also
possible to notice that the timeout didn't expire and less than vlen
packets were received, so something went wrong and calling
getsockopt(SO_ERROR) will clarify things.

This is not some new error reporting facility, it predates recvmmsg, that
merely uses it for consistency.

How to properly report the -EFAULT when copying the remaining timeout to
userspace is the special case here, with
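The save-now, report-later mechanism described above can be modelled in userspace to show why the error is seen exactly once. The sketch assumes, as far as I understand it, that the kernel's sock_error() fetches sk_err and resets it to zero in one atomic exchange; `struct model_sock`, `model_save_error` and `model_sock_error` are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the one field of struct sock that matters here. */
struct model_sock {
    atomic_int sk_err;  /* pending error as a positive errno, 0 if none */
};

/* What the batch code does with a late error: sk_err = -err. */
static void model_save_error(struct model_sock *sk, int err)
{
    assert(err <= 0);
    atomic_store(&sk->sk_err, -err);
}

/*
 * What the next receive does first: take the pending error and clear it
 * in one step, returning it as a negative errno (0 if nothing pending).
 */
static int model_sock_error(struct model_sock *sk)
{
    return -atomic_exchange(&sk->sk_err, 0);
}
```

Because the read also clears the field, an application that retrieves the error through getsockopt(SO_ERROR) would not see it again on its next receive call, and vice versa.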
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) escreveu:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>> Can you try the attached patch on top of the first one?
>>>>
>>>> Patches on patches is a way to make your testers work unnecessarily
>>>> harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
>
> Lets try both then,

That's better!

> attached goes the updated patch, and this is the
> diff to the last combined one:
>
> diff --git a/net/socket.c b/net/socket.c
> index 310a50971769..379be43879db 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
>
>  	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
>
> -	if (datagrams > 0 &&
> -	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> +	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>  		datagrams = -EFAULT;
>
>  	return datagrams;
>
> --
>
> This is a quick thing just to show where the problem lies, need to think
> how to report an -EFAULT at this point properly, i.e. look at
> __sys_recvmmsg for something related (returning the number of
> successfully copied datagrams to userspace while storing the error for
> subsequent reporting):
>
> 	if (err == 0)
> 		return datagrams;
>
> 	if (datagrams != 0) {
> 		/*
> 		 * We may return less entries than requested (vlen) if
> 		 * the sock is non block and there aren't enough
> 		 * datagrams...
> 		 */
> 		if (err != -EAGAIN) {
> 			/*
> 			 * ... or if recvmsg returns an error after we
> 			 * received some datagrams, where we record the
> 			 * error to return on the next call or if the
> 			 * app asks about it using getsockopt(SO_ERROR).
> 			 */
> 			sock->sk->sk_err = -err;
> 		}
>
> 		return datagrams;
> 	}
>
> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> more about it, sidetracked now, will be back to this.
>
> Anyway, attached goes the current combined patch.

So, I applied against net-next as you suggested offlist.
Builds and generally tests fine. Some observations:

* In the case that the call is interrupted by a signal handler and no
  datagrams have been received, the call fails with EINTR, as expected.

* The call always updates 'timeout', both in the success case and in the
  EINTR case. (That seems fine.)

But, another question...

In the case that the call is interrupted by a signal handler and some
datagrams have already been received, then the call succeeds, and
returns the number of datagrams received, and 'timeout' is updated with
the remaining time. Maybe that's the right behavior, but I just want to
check. There is at least one other possibility:

* Fetch no datagrams (i.e., the datagrams are left to receive in a
  future call), and the call fails with EINTR, and 'timeout' is updated.

Maybe that possibility is hard to implement (not sure). But my main point
is to make the current behavior clear, note the alternative, and ask:
is the current behavior the best choice? (I'm not saying it's not, but I
do want the choice to be a conscious one.)

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Em Wed, May 28, 2014 at 02:20:10PM +0200, Michael Kerrisk (man-pages) escreveu: On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote: attached goes the updated patch, and this is the diff to the last combined one: diff --git a/net/socket.c b/net/socket.c index 310a50971769..379be43879db 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg, datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, timeout_sys); - if (datagrams 0 - copy_to_user(timeout, timeout_sys, sizeof(timeout_sys))) + if (copy_to_user(timeout, timeout_sys, sizeof(timeout_sys))) datagrams = -EFAULT; return datagrams; -- This is a quick thing just to show where the problem lies, need to think how to report an -EFAULT at this point properly, i.e. look at Ok, so I can live with the way things were before this fix, i.e. if the user specifies a timeout, then if it fails when copying to remaining time to userspace (copy_to_user call above), then we return -EFAULT. I.e. there would be no change in behaviour, but then perhaps we should go with the interface that is in place when we received some datagrams and then some error happens, see comment in the existing code, below: __sys_recvmmsg for something related (returning the number of successfully copied datagrams to userspace while storing the error for subsequent reporting): if (err == 0) return datagrams; if (datagrams != 0) { /* * We may return less entries than requested (vlen) if * the sock is non block and there aren't enough * datagrams... */ if (err != -EAGAIN) { /* * ... or if recvmsg returns an error after we * received some datagrams, where we record the * error to return on the next call or if the * app asks about it using getsockopt(SO_ERROR). */ sock-sk-sk_err = -err; } return datagrams; } I.e. userspace would have to use getsockopt(SO_ERROR)... need to think more about it, sidetracked now, will be back to this. Anyway, attached goes the current combined patch. 
So, I applied against net-next as you suggested offlist. Builds and generally tests fine. Some observations: * In the case that the call is interrupted by a signal handler and no datagrams have been received, the call fails with EINTR, as expected. Ok * The call always updates 'timeout', both in the success case and in the EINTR case. (That seems fine.) Agreed that it is how it should behave. But, another question... In the case that the call is interrupted by a signal handler and some datagrams have already been received, then the call succeeds, and returns the number of datagrams received, and 'timeout' is updated with the remaining time. Maybe that's the right behavior, but I just want to Note that what the comment in the existing code says should apply here, namely that the next recv (m or mmsg) syscall on this socket will return what is in sock-sk-sk_err, that is the signal: sys_recvmmsg() sock-ops-recvmsg() (e.g. inet_recvmsg) sk-prot-recvmsg() (e.g., udp_recvmsg) skb_recv_datagram() wait_for_more_packets() sock_intr_errno() *err = -EINTR sock-sk-sk_err = err Next recv will end up calling skb_recv_datagram and that does: struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags, int *peeked, int *off, int *err, long *timeop) { struct sk_buff *skb, *last; long timeo; /* * Caller is allowed not to check sk-sk_err before * skb_recv_datagram() */ int error = sock_error(sk); if (error) goto no_packet; SNIP out: if (timeop) *timeop = timeo; return NULL; no_packet: *err = error; goto out; } So, yes, the user _can_ process the packets already copied to userspace, i.e. no packet loss, and then, on the next call, will receive the signal notification. So, the user can just try the next call and see the signal, and it is also possible to notice that the timeout didn't expire and less than vlen packets were received, so something went wrong and calling getsockopt(SO_ERROR) will clarify things. 
This is not some new error reporting facility: it predates recvmmsg, which merely uses it for consistency. How to properly report the -EFAULT when copying the remaining timeout to userspace is the special case here, with
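For readers following the getsockopt(SO_ERROR) suggestion above, here is a minimal userspace sketch of how an application might query the error that was recorded on the socket after a short batch. The helper name pending_socket_error() is made up for illustration and is not part of any patch in this thread; only the getsockopt() call itself is the standard interface.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch (and thereby clear) the pending error
 * recorded on a socket. Returns the pending errno value, 0 if nothing
 * is pending, or -1 if getsockopt() itself failed (errno is then set).
 */
static int pending_socket_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	assert(fd >= 0);
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
		return -1;
	return err;
}
```

An application could call this after recvmmsg() returns fewer datagrams than requested, instead of waiting for the next receive call to report the stored error.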
RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: Arnaldo Carvalho de Melo
...
>> But, another question...
>>
>> In the case that the call is interrupted by a signal handler and some
>> datagrams have already been received, then the call succeeds, and
>> returns the number of datagrams received, and 'timeout' is updated with
>> the remaining time. Maybe that's the right behavior, but I just want to
>
> Note that what the comment in the existing code says should apply here,
> namely that the next recv (m or mmsg) syscall on this socket will return
> what is in sock->sk->sk_err, that is the signal:
...
> So, yes, the user _can_ process the packets already copied to userspace,
> i.e. no packet loss, and then, on the next call, will receive the signal
> notification.

The application shouldn't need to see an EINTR response; any signal handler should be run when the system call returns to user (regardless of the system call result code). If that doesn't happen Linux is badly broken!

From an application point of view this is exactly the same as the signal occurring just before/after the kernel entry/exit for the system call. The call should just return early with success status. No need to preserve the EINTR response for later.

The same might be appropriate for other errors - maybe including EFAULT copying non-initial messages to userspace. Put the message being processed back on the socket queue and return success with the (non-zero) partial message count.

	David
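The point that the handler has already run by the time the interrupted call returns can be observed from userspace. The sketch below is an illustration only, with made-up names (on_alarm, recv_until_signal); it uses plain recv() rather than recvmmsg(), and assumes a handler installed without SA_RESTART on a socket that has no data queued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

static volatile sig_atomic_t handler_ran;

static void on_alarm(int sig)
{
	(void)sig;
	handler_ran = 1;
}

/*
 * Block in recv() on fd and arrange for SIGALRM to arrive after `usec`
 * microseconds. Returns 0 if data arrived first, otherwise the errno
 * reported by the failing call.
 */
static int recv_until_signal(int fd, long usec)
{
	struct sigaction sa;
	struct itimerval it;
	char buf[64];

	assert(usec > 0 && usec < 1000000);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sa.sa_flags = 0;		/* no SA_RESTART: recv() fails with EINTR */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		return errno;

	memset(&it, 0, sizeof(it));
	it.it_value.tv_usec = usec;	/* one-shot timer */
	if (setitimer(ITIMER_REAL, &it, NULL) == -1)
		return errno;

	if (recv(fd, buf, sizeof(buf), 0) == -1)
		return errno;
	return 0;
}
```

On an idle UDP socket this is expected to return EINTR with handler_ran already set, which is the ordering the argument above relies on.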
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Wed, May 28, 2014 at 03:17:40PM, David Laight wrote:
> From: Arnaldo Carvalho de Melo
> ...
>>> But, another question...
>>>
>>> In the case that the call is interrupted by a signal handler and some
>>> datagrams have already been received, then the call succeeds, and
>>> returns the number of datagrams received, and 'timeout' is updated with
>>> the remaining time. Maybe that's the right behavior, but I just want to
>>
>> Note that what the comment in the existing code says should apply here,
>> namely that the next recv (m or mmsg) syscall on this socket will return
>> what is in sock->sk->sk_err, that is the signal:
> ...
>> So, yes, the user _can_ process the packets already copied to userspace,
>> i.e. no packet loss, and then, on the next call, will receive the signal
>> notification.
>
> The application shouldn't need to see an EINTR response, any signal handler
> should be run when the system call returns to user (regardless of the
> system call result code). If that doesn't happen Linux is badly broken!
>
> From an application point of view this is exactly the same as the signal
> occurring just before/after the kernel entry/exit for the system call.
> The call should just return early with success status.
> No need to preserve the EINTR response for later.
>
> The same might be appropriate for other errors - maybe including EFAULT
> copying non-initial messages to userspace.
> Put the message being processed back on the socket queue and return
> success with the (non-zero) partial message count.

We don't need to put anything back. If we get an EFAULT for a datagram, then we stop processing that packet, _dropping_ it (that is just how recvmsg works: look at __skb_recv_datagram, the skb_unlink there, and at udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails), and stop the batch. If no datagrams were received, we return the error straight away.

But if some datagrams were successfully received, and at that point _already_ removed from queues and sent successfully to userspace, recvmmsg will return the number of successfully copied datagrams and store the error so that it can be returned on the next syscall.

Please refer to the original discussion on how to report how many datagrams were successfully copied and also report that the call stopped before the timeout expired and before the requested number of datagrams in a batch was reached:

http://lkml.kernel.org/r/200905221022.48790.remi.denis-courm...@nokia.com

What is being discussed here is how to return the EFAULT that may happen _after_ datagram processing, be it interrupted by an EFAULT, signal, or plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the remaining timeout, because then how can userspace both receive the number of successfully copied datagrams (in any of the cases mentioned in the previous paragraph) and know that that timeout can't be used because there was a problem while trying to copy it to userspace (EFAULT)?

- Arnaldo
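As a rough userspace illustration of the "stopped before the timeout and before vlen datagrams" condition discussed above, the sketch below wraps recvmmsg() and flags a batch that ended early. The names batch_stopped_early() and recv_batch() are invented for this example, and it assumes the behaviour under discussion in this thread, namely that 'timeout' holds the remaining time when the call returns.

```c
#define _GNU_SOURCE		/* for recvmmsg() and struct mmsghdr */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

/*
 * True when a batch ended short of vlen datagrams although time was still
 * left on the timeout, i.e. something other than the timer stopped it.
 * A NULL timeout means "no timeout", so any short batch counts.
 */
static bool batch_stopped_early(int received, unsigned int vlen,
				const struct timespec *remaining)
{
	assert(vlen > 0);
	if (received < 0 || (unsigned int)received >= vlen)
		return false;
	if (remaining == NULL)
		return true;
	return remaining->tv_sec > 0 || remaining->tv_nsec > 0;
}

/* Receive one batch, then report whether it stopped early. */
static int recv_batch(int fd, struct mmsghdr *msgs, unsigned int vlen,
		      int flags, struct timespec *timeout, bool *early)
{
	int n = recvmmsg(fd, msgs, vlen, flags, timeout);

	*early = batch_stopped_early(n, vlen, timeout);
	return n;
}
```

When *early is set, the caller would process the datagrams it did get and then look for the stored error, either via getsockopt(SO_ERROR) or on the next receive call.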
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> On Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) wrote:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo wrote:
>>> On Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) wrote:
>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>> Can you try the attached patch on top of the first one?
>>>
>>>> Patches on patches is a way to make your testers work unnecessarily
>>>> harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
>
> Lets try both then, attached goes the updated patch, and this is the
> diff to the last combined one:

What tree does this apply to? I tried applying to 3.15-rc7, but a piece was rejected, and the fix was not obvious.

Cheers,

Michael

drivers/net/tun.c.rej

--- drivers/net/tun.c
+++ drivers/net/tun.c
@@ -1343,7 +1343,7 @@
 	/* Read frames from queue */
 	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (skb) {
 		ret = tun_put_user(tun, tfile, skb, iv, len);
 		kfree_skb(skb);
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo wrote:
> On Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) wrote:
>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>> > Can you try the attached patch on top of the first one?
>
>> Patches on patches is a way to make your testers work unnecessarily
>> harder. Also, it means that anyone else who was interested in this
>
> It was meant to highlight the changes with regard to the previous patch,
> i.e. to make things easier for reviewing.

(I don't think that works...)

>> thread likely got lost at this point, because they probably didn't
>> save the first patch. All of this to say: it makes life much easier
>> if you provide a complete new self-contained patch on each iteration.
>
> If you prefer it that way, find one attached, that I was about to send
> (but you can wait till I use your program to test it ;-) )
>
>> > It starts adding explicit parentheses on a ternary, as David requested,
>> > and then should return the remaining timeouts in cases like signals,
>> > etc.
>> >
>> > Please let me know if this is enough.
>>
>> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)
>
> What was the problem experienced?

The problem is that after EINTR, the timeout is not updated with the remaining time until expiry. (This was true with just patch 1 applied, and is also true with both patch 1 and patch 2 applied.)

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) wrote:
> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> > Can you try the attached patch on top of the first one?

> Patches on patches is a way to make your testers work unnecessarily
> harder. Also, it means that anyone else who was interested in this

It was meant to highlight the changes with regard to the previous patch, i.e. to make things easier for reviewing.

> thread likely got lost at this point, because they probably didn't
> save the first patch. All of this to say: it makes life much easier
> if you provide a complete new self-contained patch on each iteration.

If you prefer it that way, find one attached, that I was about to send (but you can wait till I use your program to test it ;-) )

> > It starts adding explicit parentheses on a ternary, as David requested,
> > and then should return the remaining timeouts in cases like signals,
> > etc.
> >
> > Please let me know if this is enough.
>
> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

What was the problem experienced?

> > P.S. compile testing while sending this message :-)
>
> Okay -- how about some real testing for the next version ;-). I've appended

Hey, you were providing that real testing! Thanks for that! :-)

> my test program below. You can use it as follows:
>
> ./t_recvmmsg <port> <timeout-in-secs> <bufsize>...
>
> (The timeout can also be '-' meaning use NULL as the timeout argument.)

Thanks for the test proggie, will use it.

> Cheers,
>
> Michael
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Hi Arnaldo,

On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo wrote:
>> On Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) wrote:
>>> Thanks! I applied this patch against 3.15-rc6.
>
>>> recvmmsg() now (mostly) does what I expect:
>>> * it waits until either the timeout expires or vlen messages
>>>   have been received
>>> * If no message is received before timeout, it returns -1/EAGAIN.
>>> * If vlen messages are received before the timeout expires, then
>>>   the remaining time is returned in timeout.
>
>>> One question: in the event that the call is interrupted by a signal
>>> handler, it fails (as expected) with EINTR, but the 'timeout' value is
>>> not updated with the remaining time on the timer. Would it be desirable
>>> to emulate the behavior of select() (and other syscalls) in this
>>> respect, and instead return the remaining time if interrupted by
>>> a signal?
>
>> I think so, will check how to achieve that!
>
> Can you try the attached patch on top of the first one?

Patches on patches is a way to make your testers work unnecessarily harder. Also, it means that anyone else who was interested in this thread likely got lost at this point, because they probably didn't save the first patch. All of this to say: it makes life much easier if you provide a complete new self-contained patch on each iteration.

> It starts adding explicit parentheses on a ternary, as David requested,
> and then should return the remaining timeouts in cases like signals,
> etc.
>
> Please let me know if this is enough.

Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

> P.S. compile testing while sending this message :-)

Okay -- how about some real testing for the next version ;-). I've appended my test program below. You can use it as follows:

    ./t_recvmmsg <port> <timeout-in-secs> <bufsize>...

(The timeout can also be '-' meaning use NULL as the timeout argument.)

Cheers,

Michael

/* t_recvmmsg.c

   A simple test program for the Linux-specific recvmmsg() system call.
*/
#define _GNU_SOURCE
#include <sys/time.h>
#include <signal.h>
#include <sys/socket.h>
#include <netdb.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static int              /* Public interfaces: inetBind() and inetListen() */
createBoundSocket(const char *service, int type, socklen_t *addrlen)
{
    struct addrinfo hints;
    struct addrinfo *result, *rp;
    int sfd, optval, s;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_canonname = NULL;
    hints.ai_addr = NULL;
    hints.ai_next = NULL;
    hints.ai_socktype = type;
    hints.ai_family = AF_UNSPEC;        /* Allows IPv4 or IPv6 */
    hints.ai_flags = AI_PASSIVE;        /* Use wildcard IP address */

    s = getaddrinfo(NULL, service, &hints, &result);
    if (s != 0)
        return -1;

    /* Walk through returned list until we find an address structure
       that can be used to successfully create and bind a socket */

    optval = 1;
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        sfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sfd == -1)
            continue;                   /* On error, try next address */

        if (bind(sfd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                      /* Success */

        /* bind() failed: close this socket and try next address */

        close(sfd);
    }

    if (rp != NULL && addrlen != NULL)
        *addrlen = rp->ai_addrlen;      /* Return address structure size */

    freeaddrinfo(result);

    return (rp == NULL) ? -1 : sfd;
}

static void
handler()
{
    /* Just interrupt a syscall */
}

int
main(int argc, char *argv[])
{
    int sfd, vlen, j, s;
    struct mmsghdr *msgvecp;
    struct timespec ts;
    struct timespec *tsp;
    struct sigaction sa;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s port tmo-secs buf-len...\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    sfd = createBoundSocket(argv[1], SOCK_DGRAM, NULL);
    if (sfd == -1) {
        fprintf(stderr, "Could not create server socket (%s)",
                strerror(errno));
        exit(EXIT_FAILURE);
    }

    /* Handle a signal, so we can test behaviour when recvmmsg()
       is interrupted by a signal */

    sa.sa_handler = handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction (SIGQUIT, &sa, NULL) == -1)
        errExit("sigaction");

    /* argv[2] specifies recvmmsg() timeout in seconds, or is '-',
       meaning using NULL argument to get infinite timeout */

    if (argv[2][0] == '-') {
        tsp = NULL;
    } else {
        ts.tv_sec = atoi(argv[2]);
        ts.tv_nsec = 0;
        tsp = &ts;
    }

    /* Remaining command-line arguments specify the size of
       recvmmsg() buffers */

    /* The second argument to recvmmsg() is a pointer to an array of
       mmsghdr structures. Each
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) wrote:
> > Thanks! I applied this patch against 3.15-rc6.

> > recvmmsg() now (mostly) does what I expect:
> > * it waits until either the timeout expires or vlen messages
> >   have been received
> > * If no message is received before timeout, it returns -1/EAGAIN.
> > * If vlen messages are received before the timeout expires, then
> >   the remaining time is returned in timeout.

> > One question: in the event that the call is interrupted by a signal
> > handler, it fails (as expected) with EINTR, but the 'timeout' value is
> > not updated with the remaining time on the timer. Would it be desirable
> > to emulate the behavior of select() (and other syscalls) in this
> > respect, and instead return the remaining time if interrupted by
> > a signal?

> I think so, will check how to achieve that!

Can you try the attached patch on top of the first one?

It starts adding explicit parentheses on a ternary, as David requested, and then should return the remaining timeouts in cases like signals, etc.

Please let me know if this is enough.

- Arnaldo

P.S. compile testing while sending this message :-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aef3d7f9c3fa..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2106,7 +2106,7 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 static inline long sock_rcvtimeop(const struct sock *sk, long *timeop,
 				   bool noblock)
 {
-	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
+	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
 }

 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a08c4c9dcd23..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,12 +224,14 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			goto no_packet;

 	} while (!wait_for_more_packets(sk, err, &timeo, last));
-
+out:
+	if (timeop)
+		*timeop = timeo;
 	return NULL;

 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index feaacaa0c970..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,

 			finish_wait(sk_sleep(sk), &wait);

-			if (err)
-				return err;
+			if (err) {
+				copied = err;
+				break;
+			}
 			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;

diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e8b8bb3d50ab..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				if (continue_call)
 					rxrpc_put_call(continue_call);
-				return -ENODATA;
+				copied = -ENODATA;
+				goto out_copied;
 			}
 		}

@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				rxrpc_put_call(continue_call);
 				_leave(" = %d [noncont]", copied);
-				return copied;
+				goto out_copied;
 			}
 		}

@@ -251,9 +252,10 @@ out:
 		rxrpc_put_call(call);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
+	_leave(" = %d [data]", copied);
+out_copied:
 	if (timeop)
 		*timeop = timeo;
-	_leave(" = %d [data]", copied);
 	return copied;

 	/* handle non-DATA messages such as aborts, incoming connections and
@@ -330,7 +332,8 @@ terminal_message:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;

 copy_error:
 	_debug("copy error");
@@ -339,7 +342,8 @@ copy_error:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;

 wait_interrupted:
 	ret = sock_intr_errno(timeo);
@@ -350,8 +354,7 @@ wait_error:
 	if (copied)
 		copied = ret;
 	_leave(" = %d [waitfail
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
>
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> > On Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo wrote:
> >> On Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) wrote:
> >>> Hi Arnaldo,
> >
> >>> Ping!
> >
> >> I acknowledge the problem, the timeout has to be passed to the
> >> underlying ->recvmsg() implementations that should return the time spent
> >> waiting for each packet, so that we can accrue that at recvmmsg level.
> >
> >> We can do either passing an extra timeout parameter to the recvmsg
> >> implementations or using some struct sock member to specify that
> >> timeout.
> >
> >> The first approach is intrusive, touches tons of files, so I'll try
> >> making it all mostly transparent by hooking into sock_rcvtimeo()
> >> somehow.
> >
> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.
> >
> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.
> >
> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.
> >
> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.
> >
> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.
>
> Thanks! I applied this patch against 3.15-rc6.
>
> recvmmsg() now (mostly) does what I expect:
> * it waits until either the timeout expires or vlen messages
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
>
> One question: in the event that the call is interrupted by a signal
> handler, it fails (as expected) with EINTR, but the 'timeout' value is
> not updated with the remaining time on the timer. Would it be desirable
> to emulate the behavior of select() (and other syscalls) in this
> respect, and instead return the remaining time if interrupted by
> a signal?

I think so, will check how to achieve that!

> Cheers,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
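To make the timeout selection described in the quoted patch summary concrete, here is a small userspace mock. It is a sketch only: struct mock_sock and mock_rcvtimeop() are hypothetical stand-ins written for this illustration, not kernel code, and they only mirror the rule that a non-NULL per-call timeop overrides the socket's SO_RCVTIMEO value, with non-blocking calls never waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct sock. */
struct mock_sock {
	long sk_rcvtimeo;	/* per-socket receive timeout (SO_RCVTIMEO) */
};

/*
 * Pick the timeout for one receive: 0 when non-blocking, the caller's
 * remaining time when a timeop is supplied, else the socket default.
 */
static long mock_rcvtimeop(const struct mock_sock *sk, const long *timeop,
			   bool noblock)
{
	assert(sk != NULL);
	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
}
```

The explicit parentheses do not change how the expression is evaluated, since the conditional operator is right-associative; they only make the nesting visible to the reader.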
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) wrote:
> > Thanks! I applied this patch against 3.15-rc6.
> >
> > recvmmsg() now (mostly) does what I expect:
> > * it waits until either the timeout expires or vlen messages
> >   have been received
> > * If no message is received before timeout, it returns -1/EAGAIN.
> > * If vlen messages are received before the timeout expires, then
> >   the remaining time is returned in timeout.
> >
> > One question: in the event that the call is interrupted by a signal
> > handler, it fails (as expected) with EINTR, but the 'timeout' value is
> > not updated with the remaining time on the timer. Would it be desirable
> > to emulate the behavior of select() (and other syscalls) in this
> > respect, and instead return the remaining time if interrupted by
> > a signal?
>
> I think so, will check how to achieve that!

Can you try the attached patch on top of the first one?

It starts adding explicit parentheses on a ternary, as David requested,
and then should return the remaining timeouts in cases like signals,
etc.

Please let me know if this is enough.

- Arnaldo

P.S. compile testing while sending this message :-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aef3d7f9c3fa..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2106,7 +2106,7 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool noblock)
 {
-	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
+	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
 }
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a08c4c9dcd23..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,12 +224,14 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			goto no_packet;
 	} while (!wait_for_more_packets(sk, err, &timeo, last));
-
+out:
+	if (timeop)
+		*timeop = timeo;
 	return NULL;
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index feaacaa0c970..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 			finish_wait(sk_sleep(sk), &wait);
-			if (err)
-				return err;
+			if (err) {
+				copied = err;
+				break;
+			}
 			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e8b8bb3d50ab..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				if (continue_call)
 					rxrpc_put_call(continue_call);
-				return -ENODATA;
+				copied = -ENODATA;
+				goto out_copied;
 			}
 		}
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				rxrpc_put_call(continue_call);
 				_leave(" = %d [noncont]", copied);
-				return copied;
+				goto out_copied;
 			}
 		}
@@ -251,9 +252,10 @@ out:
 		rxrpc_put_call(call);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
+	_leave(" = %d [data]", copied);
+out_copied:
 	if (timeop)
 		*timeop = timeo;
-	_leave(" = %d [data]", copied);
 	return copied;
 	/* handle non-DATA messages such as aborts, incoming connections and
@@ -330,7 +332,8 @@ terminal_message:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 copy_error:
 	_debug("copy error");
@@ -339,7 +342,8 @@ copy_error:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
@@ -350,8 +354,7 @@ wait_error:
 	if (copied)
 		copied = ret;
 	_leave(" = %d [waitfail %d]", copied, ret);
-
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Ping!

On 05/22/2014 04:27 PM, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
>
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
>> On Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo wrote:
>>> On Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages)
>>> wrote:
>>>> Hi Arnaldo,
>>
>>>> Ping!
>>
>>> I acknowledge the problem, the timeout has to be passed to the
>>> underlying ->recvmsg() implementations that should return the time spent
>>> waiting for each packet, so that we can accrue that at recvmmsg level.
>>
>>> We can do either passing an extra timeout parameter to the recvmsg
>>> implementations or using some struct sock member to specify that
>>> timeout.
>>
>>> The first approach is intrusive, touches tons of files, so I'll try
>>> making it all mostly transparent by hooking into sock_rcvtimeo()
>>> somehow.
>>
>> But after thinking a bit more, looks like we need to do that, please
>> take a look at the attached patch to see if it addresses the problem.
>>
>> Mostly it adds a new timeop to the per protocol recvmsg()
>> implementations, that, if not NULL, should be used instead of
>> SO_RCVTIMEO.
>>
>> since the underlying recvmsg implementations already check that timeout,
>> return what is remaining, that will then be used in subsequent recvmsg
>> calls, at the end we just convert it back to timespec format.
>>
>> In most cases it is just passed to skb_recv_datagram, that will check
>> the pointer, use it and update if not NULL.
>>
>> Should have no problems, but I only did a boot with a system with this
>> patch applied, no problems noticed on a normal desktop session, ssh,
>> etc.
>
> Thanks! I applied this patch against 3.15-rc6.
>
> recvmmsg() now (mostly) does what I expect:
> * it waits until either the timeout expires or vlen messages
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
>
> One question: in the event that the call is interrupted by a signal
> handler, it fails (as expected) with EINTR, but the 'timeout' value is
> not updated with the remaining time on the timer. Would it be desirable
> to emulate the behavior of select() (and other syscalls) in this
> respect, and instead return the remaining time if interrupted by
> a signal?
>
> Cheers,
>
> Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
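The idea quoted above, one timeout budget that each per-protocol receive consumes and writes back through a pointer, can be illustrated with a toy user-space model. This is a sketch only and not kernel code: waiting is simulated with plain tick counts, and model_recvmsg() and model_recvmmsg() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy stand-in for a per-protocol recvmsg(): the datagram "arrives" after
 * wait_needed ticks. If timeop is not NULL it overrides the socket's own
 * receive timeout (sk_rcvtimeo), and the unused budget is written back
 * through it on every exit path.
 */
static int model_recvmsg(long *timeop, long sk_rcvtimeo, long wait_needed)
{
	long timeo = timeop ? *timeop : sk_rcvtimeo;
	int err = 0;

	if (wait_needed > timeo) {
		timeo = 0;		/* budget exhausted while waiting */
		err = -EAGAIN;
	} else {
		timeo -= wait_needed;	/* keep what is left for the next call */
	}
	if (timeop)
		*timeop = timeo;
	return err;
}

/*
 * Toy stand-in for recvmmsg(): up to vlen receives share one budget, so the
 * time spent on earlier datagrams is no longer available to later ones.
 */
static int model_recvmmsg(const long *waits, int vlen, long *timeout)
{
	int datagrams = 0;

	assert(timeout != NULL);
	while (datagrams < vlen) {
		if (model_recvmsg(timeout, 0, waits[datagrams]) != 0)
			break;
		datagrams++;
	}
	return datagrams > 0 ? datagrams : -EAGAIN;
}
```

With a budget of 10 ticks and datagrams needing 3, 4 and 10 ticks, the model returns two datagrams and leaves the budget at zero, which mirrors the "timeout expires or vlen messages received" behaviour described in the thread.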
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On 05/23/2014 09:55 PM, Arnaldo Carvalho de Melo wrote:
> On Fri, May 23, 2014 at 03:00:55PM -0400, David Miller wrote:
>> From: Arnaldo Carvalho de Melo <a...@kernel.org>
>> Date: Wed, 21 May 2014 18:05:35 -0300
>
>>> But after thinking a bit more, looks like we need to do that, please
>>> take a look at the attached patch to see if it addresses the problem.
>
>>> Mostly it adds a new timeop to the per protocol recvmsg()
>>> implementations, that, if not NULL, should be used instead of
>>> SO_RCVTIMEO.
>
>>> since the underlying recvmsg implementations already check that timeout,
>>> return what is remaining, that will then be used in subsequent recvmsg
>>> calls, at the end we just convert it back to timespec format.
>
>>> In most cases it is just passed to skb_recv_datagram, that will check
>>> the pointer, use it and update if not NULL.
>
>>> Should have no problems, but I only did a boot with a system with this
>>> patch applied, no problems noticed on a normal desktop session, ssh,
>>> etc.
>
>> This looks fine to me, but I have a small request:
>
>> +return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
>
>> I keep forgetting which way these expressions associate, so if you could
>> parenthesize the innermost ?: I'd appreciate it. :)
>
> Ok, I actually wrote a sample program to verify that these ternaries did
> what I meant 8)
>
> I'll finish the cset log and do this clarification change.
>
> Would be great to get Acked-by tags from the original reporter, Michael
> and whoever had a look at this change, if possible. Michael, Elie?

Arnaldo,

I already sent you a reply (will reping on that one), but got no
response. My light testing got the expected results, but I still had
one question about the semantics.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
On Fri, May 23, 2014 at 03:00:55PM -0400, David Miller wrote:
> From: Arnaldo Carvalho de Melo <a...@kernel.org>
> Date: Wed, 21 May 2014 18:05:35 -0300

> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.

> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.

> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.

> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.

> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.

> This looks fine to me, but I have a small request:

> +	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;

> I keep forgetting which way these expressions associate, so if you could
> parenthesize the innermost ?: I'd appreciate it. :)

Ok, I actually wrote a sample program to verify that these ternaries did
what I meant 8)

I'll finish the cset log and do this clarification change.

Would be great to get Acked-by tags from the original reporter, Michael
and whoever had a look at this change, if possible. Michael, Elie?

> Thanks!

Thanks a lot for reviewing it!

- Arnaldo
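The sample program mentioned above is not part of the thread. A minimal sketch of what such a check could look like follows, relying only on the C rule that the conditional operator associates right to left; struct fake_sock and both function names are hypothetical stand-ins for the kernel's struct sock and sock_rcvtimeop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock, reduced to the one field used. */
struct fake_sock {
	long sk_rcvtimeo;
};

/* The expression as originally posted, without inner parentheses. */
static long rcvtimeo_plain(const struct fake_sock *sk, const long *timeop,
			   bool noblock)
{
	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
}

/* The same expression with the innermost ?: parenthesized, as requested. */
static long rcvtimeo_parens(const struct fake_sock *sk, const long *timeop,
			    bool noblock)
{
	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
}
```

The telling case is noblock set together with a non-NULL timeop: both versions return 0, whereas a left-to-right grouping, (noblock ? 0 : timeop) ? *timeop : sk->sk_rcvtimeo, would have fallen through to sk_rcvtimeo.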
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
From: Arnaldo Carvalho de Melo <a...@kernel.org>
Date: Wed, 21 May 2014 18:05:35 -0300

> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
>
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
>
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
>
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
>
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.

This looks fine to me, but I have a small request:

+	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;

I keep forgetting which way these expressions associate, so if you could
parenthesize the innermost ?: I'd appreciate it. :)

Thanks!
Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
Hi Arnaldo,

On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo wrote:
>> On Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages)
>> wrote:
>>> Hi Arnaldo,
>
>>> Ping!
>
>> I acknowledge the problem, the timeout has to be passed to the
>> underlying ->recvmsg() implementations that should return the time spent
>> waiting for each packet, so that we can accrue that at recvmmsg level.
>
>> We can do either passing an extra timeout parameter to the recvmsg
>> implementations or using some struct sock member to specify that
>> timeout.
>
>> The first approach is intrusive, touches tons of files, so I'll try
>> making it all mostly transparent by hooking into sock_rcvtimeo()
>> somehow.
>
> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
>
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
>
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
>
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
>
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.

Thanks! I applied this patch against 3.15-rc6.

recvmmsg() now (mostly) does what I expect:
* it waits until either the timeout expires or vlen messages
  have been received
* If no message is received before timeout, it returns -1/EAGAIN.
* If vlen messages are received before the timeout expires, then
  the remaining time is returned in timeout.

One question: in the event that the call is interrupted by a signal
handler, it fails (as expected) with EINTR, but the 'timeout' value is
not updated with the remaining time on the timer. Would it be desirable
to emulate the behavior of select() (and other syscalls) in this
respect, and instead return the remaining time if interrupted by
a signal?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
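For readers unfamiliar with the call under test, here is a small user-space sketch of recvmmsg() with a timeout, using only the glibc prototype. recv_batch(), BATCH and BUFSZ are hypothetical names chosen for this example. Whether and how the kernel updates the timeout argument depends on the kernel in use, which is exactly what this thread is about, so the sketch deliberately ignores the value left in it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#define BATCH 8		/* hypothetical upper bound on vlen */
#define BUFSZ 256	/* hypothetical per-datagram buffer size */

/*
 * Hypothetical helper: receive up to vlen (at most BATCH) datagrams on fd,
 * passing a timeout of timeout_sec seconds. Returns whatever recvmmsg()
 * returns; on success lens[i] holds the length of datagram i.
 */
static int recv_batch(int fd, unsigned int vlen, time_t timeout_sec,
		      unsigned int *lens)
{
	static char bufs[BATCH][BUFSZ];
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };
	int i, n;

	if (vlen > BATCH)
		vlen = BATCH;
	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < (int)vlen; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = BUFSZ;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	n = recvmmsg(fd, msgs, vlen, 0, &timeout);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

When vlen datagrams are already queued, the call returns at once with vlen; the cases discussed in this thread (timeout expiry and EINTR) arise only when it has to block.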