Re: pselect/etc semantics
"Eric W. Biederman" wrote: > Frankly the only reason this appears to be worth touching is that we > have a userspace regression. Otherwise I would call the current > behavior more correct and desirable than ignoring the signal longer. > > If I am reading sitaution properly I suggest we go back to resoring the > sigmask by hand in epoll_pwait, and put in a big fat comment about a > silly userspace program depending on that behavior. > > That will be easier to backport and it will just fix the regression and > not overfix the problem for reasonable applications. Fwiw, the cmogstored (userspace program which regressed) only hit this regression in an obscure code path for tuning; that code path probably sees no use outside of the test suite. Add to that, cmogstored is an unofficial and optional component to the obscure, largely-forgotten MogileFS. Finally, the main users of cmogstored are on FreeBSD. They hit the kqueue+self-pipe code path instead of epoll_pwait. I only used epoll_pwait on Linux since I figured I could save a few bytes of memory by skipping eventfd/self-pipe... Anyways, I updated cmogstored a few weeks back to call `note_run' (the signal dispatcher) if epoll_pwait (wrapped by `mog_idleq_wait_intr') returns 0 instead of -1 (EINTR) to workaround this kernel change: https://bogomips.org/cmogstored-public/20190511075630.17811-...@80x24.org/ I could easily make a change to call `note_run' unconditionally when `mog_idleq_wait_intr' returns, too. But that said, there could be other users hitting the same problem I did. So maybe cmogstored's primary use on Linux these days is finding regressions :>
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
On Thu, May 30, 2019 at 4:41 PM Oleg Nesterov wrote: > On 05/30, Arnd Bergmann wrote: > Plus every file touched by this patch asks for more cleanups. Say, do_poll() > should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly > EINTR->ERESTARTNOHAND in its callers. And more. > > > For the stable > > kernels, I think we want just the addition of the 'bool interrupted' > > argument > > to restore_user_sigmask() > > or simply revert this patch. I will check if this is possible today... At > first > glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did > "ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can > turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow. Right, there were several differences between the system calls that Deepa's original change got rid of. I don't know if any ones besides the do_pselect() return code can be observed in practice. > > > - ret = set_user_sigmask(ksig.sigmask, , , > > > ksig.sigsetsize); > > > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > > > if (ret) > > > return ret; > > > > > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : > > > NULL); > > > - restore_user_sigmask(ksig.sigmask, ); > > > - if (signal_pending(current) && !ret) > > > + > > > + interrupted = signal_pending(current); > > > + update_xxx(interrupted); > > > > Maybe name this > > > >restore_saved_sigmask_if(!interrupted); > > Yes, I thought about restore_if(), but to me > > restore_saved_sigmask_if(ret != -EINTR); > > doesn't look readable... May be > > restore_saved_sigmask_unless(ret == -EINTR); > > ? but actually I agree with any naming. Yes, restore_saved_sigmask_unless() probably better. > > With some of the recent discussions about compat syscall handling, > > I now think that we want to just fold set_compat_user_sigmask() > > into set_user_sigmask() > > agreed, and I thought about this too. But again, I'd prefer to do this > and other cleanups later, on top of this patch. Ok, fair enough. I don't care much about the order as long as the regression fix comes first. Arnd
Re: pselect/etc semantics
On Thu, May 30, 2019 at 3:54 AM Eric W. Biederman wrote: > Arnd Bergmann writes: > > On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov wrote: > > > > Not sure about the order of the cleanups, but probably something like > > this would work: > > > > 1. fix the race (to be backported) > > 2. unify set_compat_user_sigmask/set_user_sigmask > > 3. remove unneeded compat handlers > > 4. replace restore_user_sigmask with restore_saved_sigmask_if() > > 5. also unify compat_get_fd_set()/get_fd_set() and kill off > > compat select() variants. > > Are new system calls added preventing a revert of the patch in question > for stable kernels? Yes, a straight revert would not work, as it was done as a cleanup in order to simplify the following conversion. I suppose one could undo the cleanup in both the time32 and time64 versions of each syscall, but I would consider that a more risky change than just fixing the bug that was reported. Arnd
Re: pselect/etc semantics
Oleg Nesterov writes: > On 05/30, Eric W. Biederman wrote: >> >> ebied...@xmission.com (Eric W. Biederman) writes: >> >> > Which means I believe we have a semantically valid change in behavior >> > that is causing a regression. >> >> I haven't made a survey of all of the functions yet but >> fucntions return -ENORESTARTNOHAND will never return -EINTR and are >> immune from this problem. > > Hmm. handle_signal: > > case -ERESTARTNOHAND: > regs->ax = -EINTR; > break; > > but I am not sure I understand which problem do you mean.. Yes. My mistake. I looked at the transparent restart case for when a signal is not pending and failed to look at what happens when a signal is delivered. So yes. Everything changed does appear to have a behavioral difference where they can now succeed and not return -EINTR. Eric
Re: pselect/etc semantics
On Thu, May 30, 2019 at 8:48 AM Deepa Dinamani wrote: > > > On May 30, 2019, at 8:38 AM, Eric W. Biederman > > wrote: > > > > ebied...@xmission.com (Eric W. Biederman) writes: > > > >> Which means I believe we have a semantically valid change in behavior > >> that is causing a regression. > > > > I haven't made a survey of all of the functions yet but > > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > > immune from this problem. > > > > AKA pselect is fine. While epoll_pwait can be affected. > > This was my understanding as well. I think I was misremembered here. I had noted this before: https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=p...@mail.gmail.com/ "sys_io_pgetevents() does not seem to have this problem as we are still checking signal_pending() here. sys_pselect6() seems to have a similar problem. The changes to sys_pselect6() also impact sys_select() as the changes are in the common code path." This was the code replaced for io_pgetevents by 854a6ed56839a40f6b is as below. No matter what events completed, there was signal_pending() check after the return from do_io_getevents(). --- a/fs/aio.c +++ b/fs/aio.c @@ -2110,18 +2110,9 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - if (signal_pending(current)) { - if (ksig.sigmask) { - current->saved_sigmask = sigsaved; - set_restore_sigmask(); - } - - if (!ret) - ret = -ERESTARTNOHAND; - } else { - if (ksig.sigmask) - sigprocmask(SIG_SETMASK, , NULL); - } + restore_user_sigmask(ksig.sigmask, ); + if (signal_pending(current) && !ret) + ret = -ERESTARTNOHAND; Can I ask a simple question for my understanding? man page for epoll_pwait says EINTR The call was interrupted by a signal handler before either any of the requested events occurred or the timeout expired; see signal(7). But it is not clear to me if we can figure out(without race) the chronological order if one of the requested events are completed or a signal came first. Is this a correct exectation? Also like pointed out above, this behavior is not consistent for all such syscalls(io_pgetevents). Was this also by design? -Deepa
RE: pselect/etc semantics
From: Eric W. Biederman > Sent: 30 May 2019 16:38 > ebied...@xmission.com (Eric W. Biederman) writes: > > > Which means I believe we have a semantically valid change in behavior > > that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. Eh? ERESTARTNOHAND just makes the system call restart if there is no signal handler, EINTR should still be returned if there is a handler. All the functions that have a temporary signal mask are likely to be expected to work the same way and thus have the same bugs. http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html# isn't overly helpful. But I think it should return EINTR even if there is no handler unless SA_RESTART is set: [EINTR] The function was interrupted while blocked waiting for any of the selected descriptors to become ready and before the timeout interval expired. If SA_RESTART has been set for the interrupting signal, it is implementation-defined whether the function restarts or returns with [EINTR]. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: pselect/etc semantics
On 05/30, David Laight wrote: > > 4) As an optimisation a signal that arrives after the timer >expires, but before the mask is restored can be 'deemed' >to have happened before the timeout expired and EINTR >returned. This is what pselect/ppoll already does. Oleg.
Re: pselect/etc semantics
On 05/30, Eric W. Biederman wrote: > > ebied...@xmission.com (Eric W. Biederman) writes: > > > Which means I believe we have a semantically valid change in behavior > > that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. Hmm. handle_signal: case -ERESTARTNOHAND: regs->ax = -EINTR; break; but I am not sure I understand which problem do you mean.. Oleg.
Re: pselect/etc semantics
On 05/30, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand > > each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > If I have read this thread correctly the core issue is that ther is a > program that used to work and that fails now. The question is why. I didn't even try to investigate, I wasn't cc'ed initially and I then I had enough confusion when I started to look at the patch. However, 854a6ed56839a40f6 obviously introduced the user-visible change so I am not surprised it breaks something. And yes, personally I think this change is not right. > Which means I believe we have a semantically valid change in behavior > that is causing a regression. See below, > void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) > { > > if (!usigmask) > return; > /* >* When signals are pending, do not restore them here. >* Restoring sigmask here can lead to delivering signals that the above >* syscalls are intended to block because of the sigmask passed in. >*/ > if (signal_pending(current)) { > current->saved_sigmask = *sigsaved; > set_restore_sigmask(); > return; > } > > /* >* This is needed because the fast syscall return path does not restore >* saved_sigmask when signals are not pending. >*/ > set_current_blocked(sigsaved); > } > > Which has been reported results in a return value of 0, 0 or success > and a signal > delivered, where previously that did not happen. Yes. And to me this doesn't look right. OK, OK, probably this is because I never read the docs, only the source code in fs/select.c. But to me pselect() should either return success/timeout or deliver a signal. Not both. Even if the signal was already pending before pselect() was called. To clarify, "a signal" means a signal which was blocked before pselect(sigmask) and temporary unblocked in this syscall. And even if this doesn't violate posix, I see no reason to change the historic behaviour. And this regression probably means we can't ;) Oleg.
Re: pselect/etc semantics
> On May 30, 2019, at 8:38 AM, Eric W. Biederman wrote: > > ebied...@xmission.com (Eric W. Biederman) writes: > >> Which means I believe we have a semantically valid change in behavior >> that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. > > AKA pselect is fine. While epoll_pwait can be affected. This was my understanding as well. > Has anyone contacted Omar Kilani to see if that is his issue? > https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/ Omar was cc-ed when this regression was reported. I did cc him on fix and asked if he could try it. We have not heard from him. > So far the only regression report I am seeing is from Eric Wong. > AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/ > Are there any others? How did we get to be talking about more > than just epoll_pwait? This is the only report that I know of. I’m not sure why people started talking about pselect. I was also confused why instead of reviewing the patch and discussing the fix, we ended up talking about how to simplify the code. We have deviated much from what should have been a code review. -Deepa
Re: pselect/etc semantics
ebied...@xmission.com (Eric W. Biederman) writes: > Which means I believe we have a semantically valid change in behavior > that is causing a regression. I haven't made a survey of all of the functions yet but fucntions return -ENORESTARTNOHAND will never return -EINTR and are immune from this problem. AKA pselect is fine. While epoll_pwait can be affected. Has anyone contacted Omar Kilani to see if that is his issue? https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/ So far the only regression report I am seeing is from Eric Wong. AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/ Are there any others? How did we get to be talking about more than just epoll_pwait? Eric
RE: pselect/etc semantics
From: Eric W. Biederman > Sent: 30 May 2019 14:01 > Oleg Nesterov writes: > > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand > > each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > If I have read this thread correctly the core issue is that ther is a > program that used to work and that fails now. The question is why. > > There are two ways the semantics for a sigmask changing system call > can be defined. The naive way and by reading posix. I will pick > on pselect. > > The naive way: > int pselect(int nfds, fd_set *readfds, fd_set *writefds, > fd_set *exceptfds, const struct timespec *timeout, > const sigset_t *sigmask) > { > sigset_t oldmask; > int ret; > > if (sigmask) > sigprocmask(SIG_SETMASK, sigmask, ); > > ret = select(nfds, readfds, writefds, exceptfds, timeout); > > if (sigmask) > sigprocmask(SIG_SETMASK, , NULL); > > return ret; > } > > The standard for pselect behavior at > https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html > says: > > If sigmask is not a null pointer, then the pselect() function shall > > replace the signal mask of the caller by the set of signals pointed to > > by sigmask before examining the descriptors, and shall restore the > > signal mask of the calling thread before returning. Right, but that bit says nothing about when signals are 'delivered'. Section 2.4 of http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html isn't very enlightening either - well, pretty impenetrable really. But since the ToG specs don't require a user-kernel boundary I believe that the signal handler is expected to be called as soon as it is unmasked. Now, for async signals, the kernel can always pretend that they happened slightly later than they actually did. So signal delivery is delayed until 'return to user'. In some cases system calls are restarted to make the whole thing transparent. For pselect() etc I think this means: 1) Signal handlers can only be called if EINTR is returned. 2) If a signal is deliverable after the mask is changed then the signal hander must be called. This means EINTR must be returned. ie this must be detected before checking anything else. 3) A signal that is raised after the wait completes can be 'deemed' to have happened after the mask is restored and left masked until the applications next pselect() call. 4) As an optimisation a signal that arrives after the timer expires, but before the mask is restored can be 'deemed' to have happened before the timeout expired and EINTR returned. Now not all system calls that use a temporary mask may want to work that way. In particular they may want to return 'success' and have the signal handlers called. So maybe the set_xxx() function that installs to user's mask should return: -ve: errno value 0: no signal pending 1: signal pending. And the update_xxx() function that restores the saved mask should take a parameter to indicate whether signal handlers should be called maybe 'bool call_handlers' and return 0/1 depending on whether signals were pending. pselect() then contains: rval = set_xxx(); if (rval) { if (rval < 0) return rval; update_xxx(true); return -EINTR: } rval = wait(); #if 0 // don't report late signals update_xxx(rval == -EINTR); #else // report signals after timeout if (update_xxx(!rval || rval == -EINTR)) rval == -EINTR; #endif return rval; } David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
On 05/30, Arnd Bergmann wrote: > > I think this is a nice simplification, but it would help not to mix up the > minimal regression fix with the rewrite of those functions. Yes, yes, agreed. Plus every file touched by this patch asks for more cleanups. Say, do_poll() should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly EINTR->ERESTARTNOHAND in its callers. And more. > For the stable > kernels, I think we want just the addition of the 'bool interrupted' argument > to restore_user_sigmask() or simply revert this patch. I will check if this is possible today... At first glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did "ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow. > > - ret = set_user_sigmask(ksig.sigmask, , , > > ksig.sigsetsize); > > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > > if (ret) > > return ret; > > > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : > > NULL); > > - restore_user_sigmask(ksig.sigmask, ); > > - if (signal_pending(current) && !ret) > > + > > + interrupted = signal_pending(current); > > + update_xxx(interrupted); > > Maybe name this > >restore_saved_sigmask_if(!interrupted); Yes, I thought about restore_if(), but to me restore_saved_sigmask_if(ret != -EINTR); doesn't look readable... May be restore_saved_sigmask_unless(ret == -EINTR); ? but actually I agree with any naming. > and make restore_saved_sigmask_if() an inline function > next to restore_saved_sigmask()? agreed, > With some of the recent discussions about compat syscall handling, > I now think that we want to just fold set_compat_user_sigmask() > into set_user_sigmask() agreed, and I thought about this too. But again, I'd prefer to do this and other cleanups later, on top of this patch. Oleg.
Re: pselect/etc semantics
Eric Wong writes: > Agreed... I believe cmogstored has always had a bug in the way > it uses epoll_pwait because it failed to check interrupts if: > > a) an FD is ready + interrupt > b) epoll_pwait returns 0 on interrupt > > The bug remains in userspace for a), which I will fix by adding > an interrupt check when an FD is ready. The window is very > small for a) and difficult to trigger, and also in a rare code > path. > > The b) case is the kernel bug introduced in 854a6ed56839a40f > ("signal: Add restore_user_sigmask()"). > > I don't think there's any disagreement that b) is a kernel bug. See my reply to Oleg. I think (b) is a regression that needs to be fixed. I do not think that (b) is a kernel bug. Both versions of the of what sigmask means posix and naive will allow (b). Because fundamentally the sigmask is restored after the rest of the system call happens. Eric
Re: pselect/etc semantics
Oleg Nesterov writes: > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. If I have read this thread correctly the core issue is that ther is a program that used to work and that fails now. The question is why. There are two ways the semantics for a sigmask changing system call can be defined. The naive way and by reading posix. I will pick on pselect. The naive way: int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask) { sigset_t oldmask; int ret; if (sigmask) sigprocmask(SIG_SETMASK, sigmask, ); ret = select(nfds, readfds, writefds, exceptfds, timeout); if (sigmask) sigprocmask(SIG_SETMASK, , NULL); return ret; } The standard for pselect behavior at https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html says: > If sigmask is not a null pointer, then the pselect() function shall > replace the signal mask of the caller by the set of signals pointed to > by sigmask before examining the descriptors, and shall restore the > signal mask of the calling thread before returning. ... > On failure, the objects pointed to by the readfds, writefds, and > errorfds arguments shall not be modified. If the timeout interval > expires without the specified condition being true for any of the > specified file descriptors, the objects pointed to by the readfds, > writefds, and errorfds arguments shall have all bits set to 0. So the standard specified behavior is if the return value is -EINTR you know you were interrupted by a signal, for any other return value you don't know. An error value just indicates that the file descriptor sets have not been updated. That is what the standard and reasonable people will expect from the system call. Looking at set_user_sigmask and restore_user_sigmask I believe we are don't violate those semantics today. Which means I believe we have a semantically valid change in behavior that is causing a regression. >From my inspection of the code the change in behavior is from these two pieces of code: >From the v4.18 epoll_pwait. /* * If we changed the signal mask, we need to restore the original one. * In case we've got a signal while waiting, we do not restore the * signal mask yet, and we allow do_signal() to deliver the signal on * the way back to userspace, before the signal mask is restored. */ if (sigmask) { if (error == -EINTR) { memcpy(>saved_sigmask, , sizeof(sigsaved)); set_restore_sigmask(); } else set_current_blocked(); } /* * restore_user_sigmask: * usigmask: sigmask passed in from userland. * sigsaved: saved sigmask when the syscall started and changed the sigmask to * usigmask. * * This is useful for syscalls such as ppoll, pselect, io_pgetevents and * epoll_pwait where a new sigmask is passed in from userland for the syscalls. */ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) { if (!usigmask) return; /* * When signals are pending, do not restore them here. * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ if (signal_pending(current)) { current->saved_sigmask = *sigsaved; set_restore_sigmask(); return; } /* * This is needed because the fast syscall return path does not restore * saved_sigmask when signals are not pending. */ set_current_blocked(sigsaved); } Which has been reported results in a return value of 0, and a signal delivered, where previously that did not happen. They only way I can see that happening is that set_current_blocked in __set_task_blocked clears pending signals (that will be blocked) and calls retarget_shared_pending. Frankly the only reason this appears to be worth touching is that we have a userspace regression. Otherwise I would call the current behavior more correct and desirable than ignoring the signal longer. If I am reading sitaution properly I suggest we go back to resoring the sigmask by hand in epoll_pwait, and put in a big fat comment about a silly userspace program depending on that behavior. That will be easier to backport and it will just fix the regression and not overfix the problem for reasonable applications. Oleg your cleanup also seems reasonable. But I would lik
RE: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
From: Eric Wong > Sent: 29 May 2019 19:50 ... > > Personally I think that is wrong. > > Given code like the above that has: > > while (!interrupted) { > > pselect(..., ); > > // process available data > > } > > > > You want the signal handler to be executed even if one of the fds > > always has available data. > > Otherwise you can't interrupt a process that is always busy. FWIW in the past I've seen a process that loops select-accept-fork-exec get its priority reduced to the point where it never blocked in select. The client side retries made it go badly wrong! If it had limited when SIG_INT was detected it would have been a little more difficult to kill. > Agreed... I believe cmogstored has always had a bug in the way > it uses epoll_pwait because it failed to check interrupts if: > > a) an FD is ready + interrupt > b) epoll_pwait returns 0 on interrupt But the kernel code seems to only call the signal handler (for signals that are enabled during pselect() (etc)) if the underlying wait is interrupted. > The bug remains in userspace for a), which I will fix by adding > an interrupt check when an FD is ready. The window is very > small for a) and difficult to trigger, and also in a rare code > path. > > The b) case is the kernel bug introduced in 854a6ed56839a40f > ("signal: Add restore_user_sigmask()"). > > I don't think there's any disagreement that b) is a kernel bug. If the signal is raised after the wait has timed out but before the signal mask is restored. > So the confusion is for a), and POSIX is not clear w.r.t. how > pselect/poll works when there's both FD readiness and an > interrupt. > > Thus I'm inclined to believe *select/*poll/epoll_*wait should > follow POSIX read() semantics: > >If a read() is interrupted by a signal before it reads any data, it > shall >return −1 with errno set to [EINTR]. > >If a read() is interrupted by a signal after it has successfully > read >some data, it shall return the number of bytes read. Except that above you want different semantics :-) For read() any signal handler is always called. And the return value of a non-blocking read that returns no data is not affected by any pending signals. For pselect() that would mean that if the wait timed out (result 0) and then a signal was raised (before the mask got changed back) then the return value would be zero and the signal handler would be called. So your (b) above is not a bug. Even select() can return 'timeout' and have a signal handler called. There are really two separate issues: 1) Signals that are pending when the new mask is applied. 2) Signals that are raised after the wait terminates (success or timeout). If the signal handlers for (2) are not called then they become (1) next time around the application loop. Maybe the 'restore sigmask' function should be passed an indication of whether it is allowed to let signal handler be called and return whether they would be called (ie whether it restored the signal mask or left it for the syscall exit code to do after calling the signal handlers). That would allow epoll() to convert timeout+pending signal to EINTR, or to allow all handlers be called regardless of the return value. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: pselect/etc semantics
Arnd Bergmann writes: > On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov wrote: >> >> Al, Linus, Eric, please help. >> >> The previous discussion was very confusing, we simply can not understand each >> other. >> >> To me everything looks very simple and clear, but perhaps I missed something >> obvious? Please correct me. > > Thanks for the elaborate explanation in this patch, it all starts making sense > to me now. I also looked at your patch in detail and thought I had found > a few mistakes at first but those all turned out to be mistakes in my reading. > >> See the compile-tested patch at the end. Of course, the new _xxx() helpers >> should be renamed somehow. fs/aio.c doesn't look right with or without this >> patch, but iiuc this is what it did before 854a6ed56839a. > > I think this is a nice simplification, but it would help not to mix up the > minimal regression fix with the rewrite of those functions. For the stable > kernels, I think we want just the addition of the 'bool interrupted' argument > to restore_user_sigmask() to close the race that was introduced > 854a6ed56839a. Following up on that for future kernels, your patch > improves the readability, but we can probably take it even further. > >> - ret = set_user_sigmask(ksig.sigmask, , , >> ksig.sigsetsize); >> + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); >> if (ret) >> return ret; >> >> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : >> NULL); >> - restore_user_sigmask(ksig.sigmask, ); >> - if (signal_pending(current) && !ret) >> + >> + interrupted = signal_pending(current); >> + update_xxx(interrupted); > > Maybe name this > >restore_saved_sigmask_if(!interrupted); > > and make restore_saved_sigmask_if() an inline function > next to restore_saved_sigmask()? > >> @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, >> if (usig && copy_from_user(, usig, sizeof(ksig))) >> return -EFAULT; >> >> - ret = set_compat_user_sigmask(ksig.sigmask, , , >> ksig.sigsetsize); >> + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); >> if (ret) >> return ret; > > With some of the recent discussions about compat syscall handling, > I now think that we want to just fold set_compat_user_sigmask() > into set_user_sigmask() (whatever they get called in the end) > with an in_compat_syscall() conditional inside it, and completely get > rid of the COMPAT_SYSCALL_DEFINEx() definitions for those > system calls for which this is the only difference. > > Unfortunately we still need the time32/time64 distinction, but removing > syscall handlers is a significant cleanup here already, and we can > move most of the function body of sys_io_pgetevents() into > do_io_getevents() in the process. Same for some of the other calls. > > Not sure about the order of the cleanups, but probably something like > this would work: > > 1. fix the race (to be backported) > 2. unify set_compat_user_sigmask/set_user_sigmask > 3. remove unneeded compat handlers > 4. replace restore_user_sigmask with restore_saved_sigmask_if() > 5. also unify compat_get_fd_set()/get_fd_set() and kill off > compat select() variants. Are new system calls added preventing a revert of the patch in question for stable kernels? Eric
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov wrote: > > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. Thanks for the elaborate explanation in this patch, it all starts making sense to me now. I also looked at your patch in detail and thought I had found a few mistakes at first but those all turned out to be mistakes in my reading. > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. I think this is a nice simplification, but it would help not to mix up the minimal regression fix with the rewrite of those functions. For the stable kernels, I think we want just the addition of the 'bool interrupted' argument to restore_user_sigmask() to close the race that was introduced 854a6ed56839a. Following up on that for future kernels, your patch improves the readability, but we can probably take it even further. > - ret = set_user_sigmask(ksig.sigmask, , , > ksig.sigsetsize); > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : > NULL); > - restore_user_sigmask(ksig.sigmask, ); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); Maybe name this restore_saved_sigmask_if(!interrupted); and make restore_saved_sigmask_if() an inline function next to restore_saved_sigmask()? > @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, > if (usig && copy_from_user(, usig, sizeof(ksig))) > return -EFAULT; > > - ret = set_compat_user_sigmask(ksig.sigmask, , , > ksig.sigsetsize); > + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; With some of the recent discussions about compat syscall handling, I now think that we want to just fold set_compat_user_sigmask() into set_user_sigmask() (whatever they get called in the end) with an in_compat_syscall() conditional inside it, and completely get rid of the COMPAT_SYSCALL_DEFINEx() definitions for those system calls for which this is the only difference. Unfortunately we still need the time32/time64 distinction, but removing syscall handlers is a significant cleanup here already, and we can move most of the function body of sys_io_pgetevents() into do_io_getevents() in the process. Same for some of the other calls. Not sure about the order of the cleanups, but probably something like this would work: 1. fix the race (to be backported) 2. unify set_compat_user_sigmask/set_user_sigmask 3. remove unneeded compat handlers 4. replace restore_user_sigmask with restore_saved_sigmask_if() 5. also unify compat_get_fd_set()/get_fd_set() and kill off compat select() variants. Arnd
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
David Laight wrote: > From: Oleg Nesterov > > Sent: 29 May 2019 17:12 > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand > > each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > > > I think that the following code is correct > > > > int interrupted = 0; > > > > void sigint_handler(int sig) > > { > > interrupted = 1; > > } > > > > int main(void) > > { > > sigset_t sigint, empty; > > > > sigemptyset(); > > sigaddset(, SIGINT); > > sigprocmask(SIG_BLOCK, , NULL); > > > > signal(SIGINT, sigint_handler); > > > > sigemptyset();// so pselect() unblocks SIGINT > > > > ret = pselect(..., ); > ^ sigint > > > > if (ret >= 0) // sucess or timeout > > assert(!interrupted); > > > > if (interrupted) > > assert(ret == -EINTR); > > } > > > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, > > this > > signal should not be delivered if a ready fd was found or timeout. The > > signal > > handle should only run if ret == -EINTR. > > Personally I think that is wrong. > Given code like the above that has: > while (!interrupted) { > pselect(..., ); > // process available data > } > > You want the signal handler to be executed even if one of the fds > always has available data. > Otherwise you can't interrupt a process that is always busy. Agreed... I believe cmogstored has always had a bug in the way it uses epoll_pwait because it failed to check interrupts if: a) an FD is ready + interrupt b) epoll_pwait returns 0 on interrupt The bug remains in userspace for a), which I will fix by adding an interrupt check when an FD is ready. The window is very small for a) and difficult to trigger, and also in a rare code path. The b) case is the kernel bug introduced in 854a6ed56839a40f ("signal: Add restore_user_sigmask()"). I don't think there's any disagreement that b) is a kernel bug. So the confusion is for a), and POSIX is not clear w.r.t. how pselect/poll works when there's both FD readiness and an interrupt. Thus I'm inclined to believe *select/*poll/epoll_*wait should follow POSIX read() semantics: If a read() is interrupted by a signal before it reads any data, it shall return −1 with errno set to [EINTR]. If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes read. > One option is to return -EINTR if a signal is pending when the mask > is updated - before even looking at anything else. > > Signals that happen later on (eg after a timeout) need not be reported > (until the next time around the loop). I'm not sure that's necessary and it would cause delays in signal handling.
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov wrote: > > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(); > sigaddset(, SIGINT); > sigprocmask(SIG_BLOCK, , NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset();// so pselect() unblocks SIGINT > > ret = pselect(..., ); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. I do not think we discussed this part earlier. But, if this is true then this is what is wrong as part of 854a6ed56839a. I missed that before. > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply > can't > avoid this). This patch is wrong because I did not know that it was ok to deliver a signal and not set the errno before. I also admitted to this. And proposed another way to revert the patch.: https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6...@mail.gmail.com/ -Deepa
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
Resending due to inadvertent conversion of prior message to html. On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov wrote: > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(); > sigaddset(, SIGINT); > sigprocmask(SIG_BLOCK, , NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset();// so pselect() unblocks SIGINT > > ret = pselect(..., ); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. > > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply > can't > avoid this). > > This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), > now this is broken by the signal_pending() check in restore_user_sigmask(). > > This patch > https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/ > turns 0 into -EINTR if signal_pending(), but I think we should simply restore > the old behaviour and simplify the code. > > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. > > Let me show the code with the patch applied. I am using epoll_pwait() as an > example because it looks very simple. > > > static inline void set_restore_sigmask(void) > { > // WARN_ON(!TIF_SIGPENDING) was removed by this patch > current->restore_sigmask = true; > } > > int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > sigset_t *kmask; > > if (!umask) > return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning > // until the syscall returns. > set_restore_sigmask(); > current->saved_sigmask = current->blocked; > set_current_blocked(kmask); > > return 0; > } > > > void update_xxx(bool interrupted) > { > // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was > "moved" > // from set_restore_sigmask() above. > if (interrupted) > WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > else > restore_saved_sigmask(); > } > > SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, > events, > int, maxevents, int, timeout, const sigset_t __user > *, sigmask, > size_t, sigsetsize) > { > int error; > > error = set_xxx(sigmask, sigsetsize); > if (error) > return error; > > error = do_epoll_wait(epfd, events, maxevents, timeout); > update_xxx(error == -EINTR); > > return error; > } > > Oleg. > --- > > fs/aio.c | 40 ++- > fs/eventpoll.c | 12 +++ > fs/io_uring.c| 12 +++ > fs/select.c | 40 +++ > include/linux/compat.h | 4 +-- > include/linux/sched/signal.h | 2 -- > include/linux/signal.h | 6 ++-- > kernel/signal.c | 77 > -
RE: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
From: Oleg Nesterov > Sent: 29 May 2019 17:12 > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(); > sigaddset(, SIGINT); > sigprocmask(SIG_BLOCK, , NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset();// so pselect() unblocks SIGINT > > ret = pselect(..., ); ^ sigint > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. Personally I think that is wrong. Given code like the above that has: while (!interrupted) { pselect(..., ); // process available data } You want the signal handler to be executed even if one of the fds always has available data. Otherwise you can't interrupt a process that is always busy. One option is to return -EINTR if a signal is pending when the mask is updated - before even looking at anything else. Signals that happen later on (eg after a timeout) need not be reported (until the next time around the loop). > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply > can't > avoid this). You mean any signal that isn't blocked when pselect() is called > This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), > now this is broken by the signal_pending() check in restore_user_sigmask(). > > This patch > https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/ > turns 0 into -EINTR if signal_pending(), but I think we should simply restore > the old behaviour and simplify the code. > > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. > > Let me show the code with the patch applied. I am using epoll_pwait() as an > example because it looks very simple. > > > static inline void set_restore_sigmask(void) > { > // WARN_ON(!TIF_SIGPENDING) was removed by this patch > current->restore_sigmask = true; > } > > int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > sigset_t *kmask; ^ no '*' here, add & before uses. > > if (!umask) > return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning > // until the syscall returns. > set_restore_sigmask(); > current->saved_sigmask = current->blocked; > set_current_blocked(kmask); > > return 0; > } > > > void update_xxx(bool interrupted) > { > // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was > "moved" > // from set_restore_sigmask() above. > if (interrupted) > WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > else > restore_saved_sigmask(); > } I looked at the code earlier, but failed to find the code that actually delivers the signals. It may be 'racy' with update_xxx() regardless of whether that is looking for -EINTR or just a pending signal. I assume that TIF_SIGPENGING is used to (not) short-circuit the system call return path so that signals get delivered. So that it is important that update_xxx() calls restore_saved_sigmask() if there is no signal pending. (Although a signal can happe
pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
Al, Linus, Eric, please help. The previous discussion was very confusing, we simply can not understand each other. To me everything looks very simple and clear, but perhaps I missed something obvious? Please correct me. I think that the following code is correct int interrupted = 0; void sigint_handler(int sig) { interrupted = 1; } int main(void) { sigset_t sigint, empty; sigemptyset(); sigaddset(, SIGINT); sigprocmask(SIG_BLOCK, , NULL); signal(SIGINT, sigint_handler); sigemptyset();// so pselect() unblocks SIGINT ret = pselect(..., ); if (ret >= 0) // sucess or timeout assert(!interrupted); if (interrupted) assert(ret == -EINTR); } IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this signal should not be delivered if a ready fd was found or timeout. The signal handle should only run if ret == -EINTR. (pselect() can be interrupted by any other signal which has a handler. In this case the handler can be called even if ret >= 0. This is correct, I fail to understand why some people think this is wrong, and in any case we simply can't avoid this). This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), now this is broken by the signal_pending() check in restore_user_sigmask(). This patch https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/ turns 0 into -EINTR if signal_pending(), but I think we should simply restore the old behaviour and simplify the code. See the compile-tested patch at the end. Of course, the new _xxx() helpers should be renamed somehow. fs/aio.c doesn't look right with or without this patch, but iiuc this is what it did before 854a6ed56839a. Let me show the code with the patch applied. I am using epoll_pwait() as an example because it looks very simple. static inline void set_restore_sigmask(void) { // WARN_ON(!TIF_SIGPENDING) was removed by this patch current->restore_sigmask = true; } int set_xxx(const sigset_t __user *umask, size_t sigsetsize) { sigset_t *kmask; if (!umask) return 0; if (sigsetsize != sizeof(sigset_t)) return -EINVAL; if (copy_from_user(kmask, umask, sizeof(sigset_t))) return -EFAULT; // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning // until the syscall returns. set_restore_sigmask(); current->saved_sigmask = current->blocked; set_current_blocked(kmask); return 0; } void update_xxx(bool interrupted) { // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was "moved" // from set_restore_sigmask() above. if (interrupted) WARN_ON(!test_thread_flag(TIF_SIGPENDING)); else restore_saved_sigmask(); } SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, int, maxevents, int, timeout, const sigset_t __user *, sigmask, size_t, sigsetsize) { int error; error = set_xxx(sigmask, sigsetsize); if (error) return error; error = do_epoll_wait(epfd, events, maxevents, timeout); update_xxx(error == -EINTR); return error; } Oleg. --- fs/aio.c | 40 ++- fs/eventpoll.c | 12 +++ fs/io_uring.c| 12 +++ fs/select.c | 40 +++ include/linux/compat.h | 4 +-- include/linux/sched/signal.h | 2 -- include/linux/signal.h | 6 ++-- kernel/signal.c | 77 8 files changed, 74 insertions(+), 119 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1f..8315bd2 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents, const struct __aio_sigset __user *, usig) { struct __aio_sigset ksig = { NULL, }; - sigset_tksigmask, sigsaved; struct timespec64 ts; + bool interrupted; int ret; if (timeout && unlikely(get_timespec64(, timeout))) @@ -2103,13 +2103,15 @@ SYSCALL_DEFINE6(io_pgetevents, if (usig && copy_from_user(, usig, sizeof(ksig))) return -EFAULT; - ret = set_user_sigmask(ksig.sigmask, , , ksig.sigsetsize
Re: [PATCH v4 0/5] y2038: Make ppoll, io_pgetevents and pselect y2038 safe
On 9/20/18, Deepa Dinamani wrote: > The series transitions the ppoll, io_getevents, and pselect syscalls > to be y2038 safe. > > This is part of the work proceeding for syscalls for y2038. > It is based on the series [1] from Arnd Bergmann. > > The overview of the series is as below: > 1. Refactor sigmask handling logic for the above syscalls. > 2. Provide y2038 safe versions of the syscalls for all ABIs. > > [1] https://lkml.org/lkml/2018/8/27/651 I'm sorry for dropping the ball on this earlier, so it already missed the 4.20 merge window, and is getting almost too late for 4.21 as well now. I've applied your five patches to my y2038 tree now, so it should at least be in linux-next as of tomorrow and make it into 4.21. I'll see what other patches I have in my backlog that we should get into that release before I send him a pull request. Arnd
[RFC PATCH 2/3] use hrtimer in select and pselect
This changes the select and pselect system calls to use the new schedule_timeout_hr function. Since many applications use the select function instead of nanosleep, this provides a higher resolution sleep to them. BUG: the same needs to be done for the compat syscalls, the current patch breaks building on 64 bit machines. Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]> Index: linux-cg/fs/select.c === --- linux-cg.orig/fs/select.c +++ linux-cg/fs/select.c @@ -189,7 +189,7 @@ get_max: #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) #define POLLEX_SET (POLLPRI) -int do_select(int n, fd_set_bits *fds, s64 *timeout) +int do_select(int n, fd_set_bits *fds, ktime_t *timeout) { struct poll_wqueues table; poll_table *wait; @@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s poll_initwait(); wait = - if (!*timeout) + if (timeout && !timeout->tv64) wait = NULL; retval = 0; for (;;) { unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp; - long __timeout; set_current_state(TASK_INTERRUPTIBLE); @@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s *rexp = res_ex; } wait = NULL; - if (retval || !*timeout || signal_pending(current)) + if (retval || (timeout && !timeout->tv64) + || signal_pending(current)) break; if(table.error) { retval = table.error; break; } - if (*timeout < 0) { + if (!timeout || timeout->tv64 < 0) /* Wait indefinitely */ - __timeout = MAX_SCHEDULE_TIMEOUT; - } else if (unlikely(*timeout >= (s64)MAX_SCHEDULE_TIMEOUT - 1)) { - /* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in a loop */ - __timeout = MAX_SCHEDULE_TIMEOUT - 1; - *timeout -= __timeout; - } else { - __timeout = *timeout; - *timeout = 0; - } - __timeout = schedule_timeout(__timeout); - if (*timeout >= 0) - *timeout += __timeout; + schedule(); + else + *timeout = schedule_timeout_hr(*timeout); } __set_current_state(TASK_RUNNING); @@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s ((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1) static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, - fd_set __user *exp, s64 *timeout) + fd_set __user *exp, ktime_t *timeout) { fd_set_bits fds; void *bits; @@ -384,7 +375,7 @@ out_nofds: asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct timeval __user *tvp) { - s64 timeout = -1; + ktime_t timeout, *timeoutp = NULL; struct timeval tv; int ret; @@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set if (tv.tv_sec < 0 || tv.tv_usec < 0) return -EINVAL; + timeout = timeval_to_ktime(tv); /* Cast to u64 to make GCC stop complaining */ - if ((u64)tv.tv_sec >= (u64)MAX_INT64_SECONDS) - timeout = -1; /* infinite */ - else { - timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ); - timeout += tv.tv_sec * HZ; - } + if ((u64)tv.tv_sec < (u64)MAX_INT64_SECONDS) + timeoutp = } - ret = core_sys_select(n, inp, outp, exp, ); + ret = core_sys_select(n, inp, outp, exp, timeoutp); if (tvp) { struct timeval rtv; if (current->personality & STICKY_TIMEOUTS) goto sticky; - rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)), HZ)); - rtv.tv_sec = timeout; + rtv = ktime_to_timeval(timeout); if (timeval_compare(, ) >= 0) rtv = tv; if (copy_to_user(tvp, , sizeof(rtv))) { @@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s fd_set __user *exp, struct timespec __user *tsp, const sigset_t __user *sigmask, size_t sigsetsize) { - s64 timeout = MAX_SCHEDULE_TIMEOUT; + ktime_t timeout, *timeoutp = NULL; sigset_t ksigmask, sigsaved; struct timespec ts; int ret; @@ -450,13 +437,11 @@ asmlinkag
[RFC PATCH 2/3] use hrtimer in select and pselect
This changes the select and pselect system calls to use the new schedule_timeout_hr function. Since many applications use the select function instead of nanosleep, this provides a higher resolution sleep to them. BUG: the same needs to be done for the compat syscalls, the current patch breaks building on 64 bit machines. Signed-off-by: Arnd Bergmann [EMAIL PROTECTED] Index: linux-cg/fs/select.c === --- linux-cg.orig/fs/select.c +++ linux-cg/fs/select.c @@ -189,7 +189,7 @@ get_max: #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) #define POLLEX_SET (POLLPRI) -int do_select(int n, fd_set_bits *fds, s64 *timeout) +int do_select(int n, fd_set_bits *fds, ktime_t *timeout) { struct poll_wqueues table; poll_table *wait; @@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s poll_initwait(table); wait = table.pt; - if (!*timeout) + if (timeout !timeout-tv64) wait = NULL; retval = 0; for (;;) { unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp; - long __timeout; set_current_state(TASK_INTERRUPTIBLE); @@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s *rexp = res_ex; } wait = NULL; - if (retval || !*timeout || signal_pending(current)) + if (retval || (timeout !timeout-tv64) + || signal_pending(current)) break; if(table.error) { retval = table.error; break; } - if (*timeout 0) { + if (!timeout || timeout-tv64 0) /* Wait indefinitely */ - __timeout = MAX_SCHEDULE_TIMEOUT; - } else if (unlikely(*timeout = (s64)MAX_SCHEDULE_TIMEOUT - 1)) { - /* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in a loop */ - __timeout = MAX_SCHEDULE_TIMEOUT - 1; - *timeout -= __timeout; - } else { - __timeout = *timeout; - *timeout = 0; - } - __timeout = schedule_timeout(__timeout); - if (*timeout = 0) - *timeout += __timeout; + schedule(); + else + *timeout = schedule_timeout_hr(*timeout); } __set_current_state(TASK_RUNNING); @@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s ((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1) static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, - fd_set __user *exp, s64 *timeout) + fd_set __user *exp, ktime_t *timeout) { fd_set_bits fds; void *bits; @@ -384,7 +375,7 @@ out_nofds: asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct timeval __user *tvp) { - s64 timeout = -1; + ktime_t timeout, *timeoutp = NULL; struct timeval tv; int ret; @@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set if (tv.tv_sec 0 || tv.tv_usec 0) return -EINVAL; + timeout = timeval_to_ktime(tv); /* Cast to u64 to make GCC stop complaining */ - if ((u64)tv.tv_sec = (u64)MAX_INT64_SECONDS) - timeout = -1; /* infinite */ - else { - timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ); - timeout += tv.tv_sec * HZ; - } + if ((u64)tv.tv_sec (u64)MAX_INT64_SECONDS) + timeoutp = timeout; } - ret = core_sys_select(n, inp, outp, exp, timeout); + ret = core_sys_select(n, inp, outp, exp, timeoutp); if (tvp) { struct timeval rtv; if (current-personality STICKY_TIMEOUTS) goto sticky; - rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)timeout), HZ)); - rtv.tv_sec = timeout; + rtv = ktime_to_timeval(timeout); if (timeval_compare(rtv, tv) = 0) rtv = tv; if (copy_to_user(tvp, rtv, sizeof(rtv))) { @@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s fd_set __user *exp, struct timespec __user *tsp, const sigset_t __user *sigmask, size_t sigsetsize) { - s64 timeout = MAX_SCHEDULE_TIMEOUT; + ktime_t timeout, *timeoutp = NULL; sigset_t ksigmask, sigsaved; struct timespec ts; int ret; @@ -450,13 +437,11 @@ asmlinkage long sys_pselect7(int n, fd_s
Re: Add pselect, ppoll system calls.
David, > On Fri, 2005-08-05 at 14:49 +0200, Michael Kerrisk wrote: > > If I change your program to do something like the above, I also > > do not see a message from the handler -- i.e., it is not being > > called, and I'm pretty sure it should be. > > Hm, yes. What happens is we come back out of the select() immediately > because of the pending signal, but on the way back to userspace we put the > old signal mask back... so by the time we check for it, there _is_ no > (unblocked) signal pending. > > If it's mandatory that we actually call the signal handler, I'm just about to go off on holiday, and don't have a chance to pull up all the relevant standards details at them moment. However, I'm fairly sure that the signal handler should be called. (Try running a modified version of my program on Solaris 10 or the Unix-03 conversion of AIX (5.3?).) > then we need to > play tricks like sigsuspend() does to leave the old signal mask on the > stack frame. That's a bit painful atm because do_signal is different > between architectures. Yes, I'd say the behaviour should in fact be like what sigsuspend() does. Cheers, Michael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add pselect, ppoll system calls.
David, On Fri, 2005-08-05 at 14:49 +0200, Michael Kerrisk wrote: If I change your program to do something like the above, I also do not see a message from the handler -- i.e., it is not being called, and I'm pretty sure it should be. Hm, yes. What happens is we come back out of the select() immediately because of the pending signal, but on the way back to userspace we put the old signal mask back... so by the time we check for it, there _is_ no (unblocked) signal pending. If it's mandatory that we actually call the signal handler, I'm just about to go off on holiday, and don't have a chance to pull up all the relevant standards details at them moment. However, I'm fairly sure that the signal handler should be called. (Try running a modified version of my program on Solaris 10 or the Unix-03 conversion of AIX (5.3?).) then we need to play tricks like sigsuspend() does to leave the old signal mask on the stack frame. That's a bit painful atm because do_signal is different between architectures. Yes, I'd say the behaviour should in fact be like what sigsuspend() does. Cheers, Michael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add pselect, ppoll system calls.
David Woodhouse wrote: > If it's mandatory that we actually call the signal handler, then we need > to play tricks like sigsuspend() does to leave the old signal mask on > the stack frame. That's a bit painful atm because do_signal is different > between architectures. It is necessary that the handler is called. This is the purpose of these interfaces. If this means more complexity is needed then this is how the cookie crumbles. One use case for pselect would be something like this: int got_signal; void sigint_handler(int sig) { got_signal = 1; } { ... while (1) { if (!got_signal) pselect() if (got_signal) { handle signal got_signal = 0; } } ... } -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Description: OpenPGP digital signature
Re: Add pselect, ppoll system calls.
David Woodhouse wrote: If it's mandatory that we actually call the signal handler, then we need to play tricks like sigsuspend() does to leave the old signal mask on the stack frame. That's a bit painful atm because do_signal is different between architectures. It is necessary that the handler is called. This is the purpose of these interfaces. If this means more complexity is needed then this is how the cookie crumbles. One use case for pselect would be something like this: int got_signal; void sigint_handler(int sig) { got_signal = 1; } { ... while (1) { if (!got_signal) pselect() if (got_signal) { handle signal got_signal = 0; } } ... } -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Description: OpenPGP digital signature
Re: pselect() modifying timeout
On Gwe, 2005-08-05 at 12:42 +0200, Michael Kerrisk wrote: > 1. POSIX made the behaviour of pselect() explicit -- the >timeout must not be modified. The idea was to avoid the >vagueness of the select() specification; it had to be vague >because of existing implementations. By contrast, there were Unfortunately it made the wrong choice with pselect, as Linux select experience has shown the modified timeout is *very* useful data to some applications. So the patch is better than the POSUX behaviour. The library can wrap it to provide the poorer standards compliant API while not stopping people using the better one for Linux specific apps. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pselect() modifying timeout
On Gwe, 2005-08-05 at 12:42 +0200, Michael Kerrisk wrote: 1. POSIX made the behaviour of pselect() explicit -- the timeout must not be modified. The idea was to avoid the vagueness of the select() specification; it had to be vague because of existing implementations. By contrast, there were Unfortunately it made the wrong choice with pselect, as Linux select experience has shown the modified timeout is *very* useful data to some applications. So the patch is better than the POSUX behaviour. The library can wrap it to provide the poorer standards compliant API while not stopping people using the better one for Linux specific apps. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pselect() modifying timeout
> Michael Kerrisk wrote: > > Please consider making Linux pselect() conform to POSIX on this > > point. > > There is no question the implementation will conform. But this not > dependent on changing the syscall interface. We deliberately decided to > not require the kernel interface to be different from select. The > userlevel code will take care of the difference. The kernel code is good > as proposed. Okay -- thanks for the info. Cheers, Michael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pselect() modifying timeout
Michael Kerrisk wrote: > Please consider making Linux pselect() conform to POSIX on this > point. There is no question the implementation will conform. But this not dependent on changing the syscall interface. We deliberately decided to not require the kernel interface to be different from select. The userlevel code will take care of the difference. The kernel code is good as proposed. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Description: OpenPGP digital signature
Re: Add pselect, ppoll system calls.
> From: David Woodhouse <[EMAIL PROTECTED]> > Subject: Re: Add pselect, ppoll system calls. > Date: Wed, 15 Jun 2005 12:36:54 +0100 > > On Mon, 2005-06-13 at 08:38 -0700, Ulrich Drepper wrote: > > > Eep -- I hadn't noticed that difference. Will update patch > > accordingly. > > > > And change [the poll timeout] to expect a 64bit value I hope... > > I believe that 64-bit int types in syscalls are a pain on some > architectures because of restrictions on precisely which register pairs > may be used. I think I'd rather just use a struct timespec, which also > makes it consistent with pselect(). > > > > The other documented difference (other than the signal mask bit) is > > > that > > > pselect() is documented not to modify the timeout parameter. I'm not > > > sure whether I should preserve that difference, or not. > > > > As long as there is a configuration where the timeout value is not > > modified, it doesn't matter. That is the case for select() using a > > personality switch. > > I've made pselect() consistent with select() by using the same > personality switch to control its behaviour. > > I've also fixed select() so that timeouts of greater than LONG_MAX > jiffies are implemented accurately, instead of being infinite. Hi David, I applied the 15 Jun version of your patch agains 2.6.12 and tried a test program. I see some behaviour that puzzles me. When I run the program below, and type control-C, the program tells me that the SIGINT hander was *not* called: ./t_pselect - Making syscall(SYS_pselect6) call [Type ^C] pselect: Unknown error 514 <-- ERESTNOHAND ready = -1 stdin IS NOT readable signal handler WAS NOT called I'm not sure whether this is possibly a mistake in the way I've constructed my test program (I don't think so, but I'm not !00% confident), or some bug in the implementation (I did try making a syscall(SYS__newselect), and that *did* show the signal handler being called. Also, I now have a test result on Solaris 10 for pselect(), and it shows the signal handler is called in this case.) Cheers, Michael /* t_pselect.c Michael Kerrisk, Aug 2005 */ #if defined(__linux__) && !defined(NO_SYSCALL) #define USE_PSELECT_SYSCALL 1 #endif #ifdef USE_PSELECT_SYSCALL #define _GNU_SOURCE #include #endif #include #include #include #include #include #include #include #include #include #define errMsg(msg) do { perror(msg); } while (0) #define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \ } while (0) #ifdef USE_PSELECT_SYSCALL /* Following are for x86 */ #define SYS_pselect6 289 #define SYS_ppoll 290 #endif sig_atomic_t gotsig = 0; static void handler(int sig) { gotsig = 1; printf("Caught signal %d\n", sig); /* UNSAFE (see Section $$$) */ } /* handler */ int main(int argc, char *argv[]) { fd_set readfds; int ready, nfds; struct timespec timeout; struct timespec *pto; struct sigaction sa; #ifdef USE_PSELECT_SYSCALL struct { sigset_t *sp; size_t size; } pselarg6; #endif sigset_t empty, all; if (argc != 2) { fprintf(stderr, "Usage: %s {nsecs|-}\n", argv[1]); exit(EXIT_FAILURE); } setbuf(stdout, NULL); /* Block all sigs */ sigfillset(); if (sigprocmask(SIG_BLOCK, , NULL) == -1) errExit("sigprocmask"); /* Establish hander for SIGINT */ sa.sa_handler = handler; sigemptyset(_mask); sa.sa_flags = 0; if (sigaction(SIGINT, , NULL) == -1) errExit("sigaction"); /* Timeout is specified in argv[1] */ if (strcmp(argv[1], "-") == 0) { pto = NULL; /* Infinite timeout */ } else { pto = timeout.tv_sec = atoi(argv[1]); timeout.tv_nsec = 0; } /* Initialize descriptor set */ nfds = 1; FD_ZERO(); FD_SET(STDIN_FILENO, ); sigemptyset(); /* sigaddset(, SIGINT); */ /* Make the call */ #ifdef USE_PSELECT_SYSCALL #if 1 printf("Making syscall(SYS_pselect6) call\n"); pselarg6.sp = pselarg6.size = 8; /* sizeof(sigset_t) */ ready = syscall(SYS_pselect6, nfds, , NULL, NULL, pto, ); #else /* The following is just some testing cruft */ { struct timeval tv; struct timeval *ptv; if (strcmp(argv[1], "-") == 0) { ptv = NULL; /* Infinite timeout */ } else { ptv = tv.tv_sec = atoi(argv[1]); tv.tv_usec = 0; } if (sigprocmask(SIG_SETMASK, , NULL) == -1) errExit("sigprocmask"); printf("Making syscall(SYS__newselect) call\n"); ready = syscall(SYS__newselect, nfds, , NULL, NULL, ptv); printf(" Ignore timeout informati
pselect() modifying timeout
Hello David, By the way, looking at the comments to the last version of the pselect()/ppoll()patch, I see that the treatment of the timeout argument is made dependent on the personality. http://marc.theaimsgroup.com/?l=linux-kernel=111883591220436=2 I'm not sure that this is a good idea; my reasons as follows: 1. POSIX made the behaviour of pselect() explicit -- the timeout must not be modified. The idea was to avoid the vagueness of the select() specification; it had to be vague because of existing implementations. By contrast, there were no pre-existing implementations when pselect() was specified. (By the way, although one or two posts in the earlier thread implied that pselect() has long/widely been present on some systems, this is almost certainly not true. The only systems where I believe it is currently implemented are two that were recently Unix 03 certified: Solaris 10 and AIX (5.3?). I know from doing quite a bit of checking that it is not present as a kernel implementation on most (all?) other systems (even though it was already described by POSIX.1g and Richard Stevens 7 years ago)) I haven't tested Solaris 10 and AIX, but I think one can be reasonably sure that they would conform to the letter of POSIX law. Lacking any strong reason to the contrary, Linux should (IMO) too (why gratuitously introduce differences across implementations?). 2. The existing (non-atomic) glibc pselect() implementation does not change the timeout argument. Please consider making Linux pselect() conform to POSIX on this point. Cheers, Michael -- 5 GB Mailbox, 50 FreeSMS http://www.gmx.net/de/go/promail +++ GMX - die erste Adresse für Mail, Message, More +++ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
pselect() modifying timeout
Hello David, By the way, looking at the comments to the last version of the pselect()/ppoll()patch, I see that the treatment of the timeout argument is made dependent on the personality. http://marc.theaimsgroup.com/?l=linux-kernelm=111883591220436w=2 I'm not sure that this is a good idea; my reasons as follows: 1. POSIX made the behaviour of pselect() explicit -- the timeout must not be modified. The idea was to avoid the vagueness of the select() specification; it had to be vague because of existing implementations. By contrast, there were no pre-existing implementations when pselect() was specified. (By the way, although one or two posts in the earlier thread implied that pselect() has long/widely been present on some systems, this is almost certainly not true. The only systems where I believe it is currently implemented are two that were recently Unix 03 certified: Solaris 10 and AIX (5.3?). I know from doing quite a bit of checking that it is not present as a kernel implementation on most (all?) other systems (even though it was already described by POSIX.1g and Richard Stevens 7 years ago)) I haven't tested Solaris 10 and AIX, but I think one can be reasonably sure that they would conform to the letter of POSIX law. Lacking any strong reason to the contrary, Linux should (IMO) too (why gratuitously introduce differences across implementations?). 2. The existing (non-atomic) glibc pselect() implementation does not change the timeout argument. Please consider making Linux pselect() conform to POSIX on this point. Cheers, Michael -- 5 GB Mailbox, 50 FreeSMS http://www.gmx.net/de/go/promail +++ GMX - die erste Adresse für Mail, Message, More +++ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add pselect, ppoll system calls.
From: David Woodhouse [EMAIL PROTECTED] Subject: Re: Add pselect, ppoll system calls. Date: Wed, 15 Jun 2005 12:36:54 +0100 On Mon, 2005-06-13 at 08:38 -0700, Ulrich Drepper wrote: Eep -- I hadn't noticed that difference. Will update patch accordingly. And change [the poll timeout] to expect a 64bit value I hope... I believe that 64-bit int types in syscalls are a pain on some architectures because of restrictions on precisely which register pairs may be used. I think I'd rather just use a struct timespec, which also makes it consistent with pselect(). The other documented difference (other than the signal mask bit) is that pselect() is documented not to modify the timeout parameter. I'm not sure whether I should preserve that difference, or not. As long as there is a configuration where the timeout value is not modified, it doesn't matter. That is the case for select() using a personality switch. I've made pselect() consistent with select() by using the same personality switch to control its behaviour. I've also fixed select() so that timeouts of greater than LONG_MAX jiffies are implemented accurately, instead of being infinite. Hi David, I applied the 15 Jun version of your patch agains 2.6.12 and tried a test program. I see some behaviour that puzzles me. When I run the program below, and type control-C, the program tells me that the SIGINT hander was *not* called: ./t_pselect - Making syscall(SYS_pselect6) call [Type ^C] pselect: Unknown error 514 -- ERESTNOHAND ready = -1 stdin IS NOT readable signal handler WAS NOT called I'm not sure whether this is possibly a mistake in the way I've constructed my test program (I don't think so, but I'm not !00% confident), or some bug in the implementation (I did try making a syscall(SYS__newselect), and that *did* show the signal handler being called. Also, I now have a test result on Solaris 10 for pselect(), and it shows the signal handler is called in this case.) Cheers, Michael /* t_pselect.c Michael Kerrisk, Aug 2005 */ #if defined(__linux__) !defined(NO_SYSCALL) #define USE_PSELECT_SYSCALL 1 #endif #ifdef USE_PSELECT_SYSCALL #define _GNU_SOURCE #include syscall.h #endif #include sys/time.h #include sys/select.h #include signal.h #include sys/types.h #include stdio.h #include stdlib.h #include unistd.h #include string.h #include errno.h #define errMsg(msg) do { perror(msg); } while (0) #define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \ } while (0) #ifdef USE_PSELECT_SYSCALL /* Following are for x86 */ #define SYS_pselect6 289 #define SYS_ppoll 290 #endif sig_atomic_t gotsig = 0; static void handler(int sig) { gotsig = 1; printf(Caught signal %d\n, sig); /* UNSAFE (see Section $$$) */ } /* handler */ int main(int argc, char *argv[]) { fd_set readfds; int ready, nfds; struct timespec timeout; struct timespec *pto; struct sigaction sa; #ifdef USE_PSELECT_SYSCALL struct { sigset_t *sp; size_t size; } pselarg6; #endif sigset_t empty, all; if (argc != 2) { fprintf(stderr, Usage: %s {nsecs|-}\n, argv[1]); exit(EXIT_FAILURE); } setbuf(stdout, NULL); /* Block all sigs */ sigfillset(all); if (sigprocmask(SIG_BLOCK, all, NULL) == -1) errExit(sigprocmask); /* Establish hander for SIGINT */ sa.sa_handler = handler; sigemptyset(sa.sa_mask); sa.sa_flags = 0; if (sigaction(SIGINT, sa, NULL) == -1) errExit(sigaction); /* Timeout is specified in argv[1] */ if (strcmp(argv[1], -) == 0) { pto = NULL; /* Infinite timeout */ } else { pto = timeout; timeout.tv_sec = atoi(argv[1]); timeout.tv_nsec = 0; } /* Initialize descriptor set */ nfds = 1; FD_ZERO(readfds); FD_SET(STDIN_FILENO, readfds); sigemptyset(empty); /* sigaddset(empty, SIGINT); */ /* Make the call */ #ifdef USE_PSELECT_SYSCALL #if 1 printf(Making syscall(SYS_pselect6) call\n); pselarg6.sp = empty; pselarg6.size = 8; /* sizeof(sigset_t) */ ready = syscall(SYS_pselect6, nfds, readfds, NULL, NULL, pto, pselarg6); #else /* The following is just some testing cruft */ { struct timeval tv; struct timeval *ptv; if (strcmp(argv[1], -) == 0) { ptv = NULL; /* Infinite timeout */ } else { ptv = tv; tv.tv_sec = atoi(argv[1]); tv.tv_usec = 0; } if (sigprocmask(SIG_SETMASK, empty, NULL) == -1) errExit(sigprocmask); printf(Making syscall(SYS__newselect) call\n); ready = syscall(SYS__newselect, nfds, readfds, NULL, NULL, ptv); printf( Ignore timeout information printed below \n); } #endif #else /* This is how a proper pselect() call looks on other implementations, or when calling
Re: pselect() modifying timeout
Michael Kerrisk wrote: Please consider making Linux pselect() conform to POSIX on this point. There is no question the implementation will conform. But this not dependent on changing the syscall interface. We deliberately decided to not require the kernel interface to be different from select. The userlevel code will take care of the difference. The kernel code is good as proposed. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Description: OpenPGP digital signature
Re: pselect() modifying timeout
Michael Kerrisk wrote: Please consider making Linux pselect() conform to POSIX on this point. There is no question the implementation will conform. But this not dependent on changing the syscall interface. We deliberately decided to not require the kernel interface to be different from select. The userlevel code will take care of the difference. The kernel code is good as proposed. Okay -- thanks for the info. Cheers, Michael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pselect
>For people who prefer programming above documenting, >here is a simple small thing to do: > >POSIX.1g and Austin document a pselect() call intended to >remove the race condition that is present when one wants >to wait on either a signal or some file descriptor. >(See also Stevens, Unix Network Programming, Volume 1, 2nd Ed., >1998, p. 168 and the pselect.2 man page released today.) >Glibc 2.0 has a bad version (wrong number of parameters) >and glibc 2.1 a better version, but the whole purpose >of pselect is to avoid the race, and glibc cannot do that, >one needs kernel support. >So, probably someone should make a system call pselect >almost identical to the present select, adding a sigmask >parameter. (Or something more general.) Well, pselect is not that interesting I suppose, ppoll would be my favorite call to make use of. -- Jens-Uwe Mager mailto:62CFDB25> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pselect
For people who prefer programming above documenting, here is a simple small thing to do: POSIX.1g and Austin document a pselect() call intended to remove the race condition that is present when one wants to wait on either a signal or some file descriptor. (See also Stevens, Unix Network Programming, Volume 1, 2nd Ed., 1998, p. 168 and the pselect.2 man page released today.) Glibc 2.0 has a bad version (wrong number of parameters) and glibc 2.1 a better version, but the whole purpose of pselect is to avoid the race, and glibc cannot do that, one needs kernel support. So, probably someone should make a system call pselect almost identical to the present select, adding a sigmask parameter. (Or something more general.) Well, pselect is not that interesting I suppose, ppoll would be my favorite call to make use of. -- Jens-Uwe Mager pgp-mailto:62CFDB25 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
pselect
For people who prefer programming above documenting, here is a simple small thing to do: POSIX.1g and Austin document a pselect() call intended to remove the race condition that is present when one wants to wait on either a signal or some file descriptor. (See also Stevens, Unix Network Programming, Volume 1, 2nd Ed., 1998, p. 168 and the pselect.2 man page released today.) Glibc 2.0 has a bad version (wrong number of parameters) and glibc 2.1 a better version, but the whole purpose of pselect is to avoid the race, and glibc cannot do that, one needs kernel support. So, probably someone should make a system call pselect almost identical to the present select, adding a sigmask parameter. (Or something more general.) Andries - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
pselect
For people who prefer programming above documenting, here is a simple small thing to do: POSIX.1g and Austin document a pselect() call intended to remove the race condition that is present when one wants to wait on either a signal or some file descriptor. (See also Stevens, Unix Network Programming, Volume 1, 2nd Ed., 1998, p. 168 and the pselect.2 man page released today.) Glibc 2.0 has a bad version (wrong number of parameters) and glibc 2.1 a better version, but the whole purpose of pselect is to avoid the race, and glibc cannot do that, one needs kernel support. So, probably someone should make a system call pselect almost identical to the present select, adding a sigmask parameter. (Or something more general.) Andries - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/