Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Sat, Dec 15, 2012 at 05:26:30PM +0100, Jonas Bonn wrote: > That said, let me point again to the series I posted for review a year > ago that attempts to make the restart logic more generic: > > https://lkml.org/lkml/2011/10/23/80 > > The entires patch series, which doesn't necessarily even apply to > Linux master anymore due to other changes along the way, can be found > at: > > git://openrisc.net/~jonas/linux ('signal-arch' branch) > > Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there. > > Al, what do you think? In principle that's a nice cleanup, but... For one thing, I really do not believe that we should be returning to userland for handlerless restarts. At all. In that respect s390 and arm do it right. Moreover, the loop calling do_notify_resume() is better off in C (see e.g. alpha and arm for examples of that approach). Another thing is interplay with ptrace ;-/ And ptrace ABI is pretty much "whatever gdb and its ilk happen to rely upon" - it's not formalized at all and by now it's too late for existing ports. Worse, we get to keep it working as is; old binaries shall not be broken. Hell knows... ptrace vs. signals is a thing of horror, indeed - I had a damn good reason for not including it into this braindump. Note that there are several aspects of behaviour that vary between the architectures: * get_signal_to_deliver() is a chance for tracer to play with our state; what instruction pointer value should it see in case of possible restart? * Does the change of instruction pointer done by tracer cancel restart? Or is there some other way for tracer to signal the need of such cancel? * Does the change of syscall return value done by tracer affect (or cancel) the restart? And that's just for starters - I'm leaving aside the unholy mess around single-stepping vs. signals. The question here is not just "what behaviour would make sense if we designed it from scratch", it's "what behaviour is demonstrated by this architecture". gdb is chock-full of arch-dependent workarounds and yes, we need to keep that working. Moreover, gdb isn't the only thing tied to that crap. What we need is a decent review of what's in there for all architectures. And that'll take cooperation from maintainers - it's far too easy to miss things on 3rd-party review in that area ;-/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 8 December 2012 08:44, Al Viro wrote: > On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: >> What we need to guarantee is >> * restarts do not happen on signals caught in interrupts or exceptions >> * restarts do not happen on signals caught in sigreturn() >> * restart should happen only once, even if we get through do_signal() many >> times. > > FWIW, here's the current situation: > > openrisc: broken. regs->orig_gpr11 can be easily used to fix - it fits the > usual model, but isn't set by sigreturn/restarts. BTW, the comment around > the call of do_notify_resume() in the asm glue is deeply confused - we *do* > want the userspace pt_regs; fortunately, there can't be any on top of those > at that point. > Right, I've known about this for a while and have even had a patch for this lying about in a side-branch. I got side-tracked while fixing this into trying to make the restart logic more generic, and the openrisc fix never got merged upstream. I'll need to revisit this. That said, let me point again to the series I posted for review a year ago that attempts to make the restart logic more generic: https://lkml.org/lkml/2011/10/23/80 The entires patch series, which doesn't necessarily even apply to Linux master anymore due to other changes along the way, can be found at: git://openrisc.net/~jonas/linux ('signal-arch' branch) Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there. Al, what do you think? /Jonas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 8 December 2012 08:44, Al Viro v...@zeniv.linux.org.uk wrote: On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: What we need to guarantee is * restarts do not happen on signals caught in interrupts or exceptions * restarts do not happen on signals caught in sigreturn() * restart should happen only once, even if we get through do_signal() many times. FWIW, here's the current situation: openrisc: broken. regs-orig_gpr11 can be easily used to fix - it fits the usual model, but isn't set by sigreturn/restarts. BTW, the comment around the call of do_notify_resume() in the asm glue is deeply confused - we *do* want the userspace pt_regs; fortunately, there can't be any on top of those at that point. Right, I've known about this for a while and have even had a patch for this lying about in a side-branch. I got side-tracked while fixing this into trying to make the restart logic more generic, and the openrisc fix never got merged upstream. I'll need to revisit this. That said, let me point again to the series I posted for review a year ago that attempts to make the restart logic more generic: https://lkml.org/lkml/2011/10/23/80 The entires patch series, which doesn't necessarily even apply to Linux master anymore due to other changes along the way, can be found at: git://openrisc.net/~jonas/linux ('signal-arch' branch) Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there. Al, what do you think? /Jonas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Sat, Dec 15, 2012 at 05:26:30PM +0100, Jonas Bonn wrote: That said, let me point again to the series I posted for review a year ago that attempts to make the restart logic more generic: https://lkml.org/lkml/2011/10/23/80 The entires patch series, which doesn't necessarily even apply to Linux master anymore due to other changes along the way, can be found at: git://openrisc.net/~jonas/linux ('signal-arch' branch) Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there. Al, what do you think? In principle that's a nice cleanup, but... For one thing, I really do not believe that we should be returning to userland for handlerless restarts. At all. In that respect s390 and arm do it right. Moreover, the loop calling do_notify_resume() is better off in C (see e.g. alpha and arm for examples of that approach). Another thing is interplay with ptrace ;-/ And ptrace ABI is pretty much whatever gdb and its ilk happen to rely upon - it's not formalized at all and by now it's too late for existing ports. Worse, we get to keep it working as is; old binaries shall not be broken. Hell knows... ptrace vs. signals is a thing of horror, indeed - I had a damn good reason for not including it into this braindump. Note that there are several aspects of behaviour that vary between the architectures: * get_signal_to_deliver() is a chance for tracer to play with our state; what instruction pointer value should it see in case of possible restart? * Does the change of instruction pointer done by tracer cancel restart? Or is there some other way for tracer to signal the need of such cancel? * Does the change of syscall return value done by tracer affect (or cancel) the restart? And that's just for starters - I'm leaving aside the unholy mess around single-stepping vs. signals. The question here is not just what behaviour would make sense if we designed it from scratch, it's what behaviour is demonstrated by this architecture. gdb is chock-full of arch-dependent workarounds and yes, we need to keep that working. Moreover, gdb isn't the only thing tied to that crap. What we need is a decent review of what's in there for all architectures. And that'll take cooperation from maintainers - it's far too easy to miss things on 3rd-party review in that area ;-/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 08/12/12 18:14, Al Viro wrote: > On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: > >> What we need to guarantee is >> * restarts do not happen on signals caught in interrupts or exceptions >> * restarts do not happen on signals caught in sigreturn() Since we don't currently have an orig syscall in our pt_regs that we can make invalid, is it acceptable to just explicitly exclude rt_sigreturn? e.g. using something like this around the switch statements that check for -ERESTART*: /* * Decide whether a syscall restart should be checked for. * This needs to exclude non-syscalls (syscall == -1), and sys_rt_sigreturn. */ static int restartable_syscall(int syscall) { return syscall >= 0 && syscall != __NR_rt_sigreturn; } >> * restart should happen only once, even if we get through do_signal() many >> times. > > BTW, signal caught in sigreturn is *not* something requiring a narrow > race. It's perfectly normal to have some signals blocked for the > duration of signal handler - the signal itself is blocked by default > unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing > to block an arbitrary set of signals. Upon return from signal handler > we undo that and if any of the temporary blocked signals has arrived > while we'd been in the handler (e.g. had been raised by the handler itself), > it will be caught as soon as we unblock it, i.e. in sigreturn. > > Unfortunately, the testcases are somewhat architecture-dependent. See, for > example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there > might be a way to arrange for asm-free equivalent if one played with -O0, > but it's probably not worth the trouble. That one deals with sigreturn > from signal caught in interrupt; sigreturn from signal caught in syscall > is a bit trickier. TBH, I don't understand your syscall calling conventions > well enough to cook one up; your restart logics looks really strange. > You leave ->DX[0].U0 modified - fair enough, it doesn't seem to be used > by syscall entry path - *and* you revert ->DX[0].U1 to the state you > used to have on entry. The thing is, I don't see any place that would > have changed it in between; why do you need that > regs->REG_SYSCALL = orig_syscall; > in there at all? Yes, this doesn't seem to be necessary. The only other place REG_SYSCALL is changed is when it's set to __NR_restart_syscall, in which case I presume it never needs to be reset i.e. if a syscall returns -ERESTART_RESTARTBLOCK, it either doesn't return a different -ERESTART* from the restart block callback, or it's acceptable in that case to restart sys_restart_syscall rather than the original syscall that returned -ERESTART_RESTARTBLOCK. Is that right? > > BTW, could you (and other folks submitting ports) document the ABI? > See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example... > Good idea, something like below? Thanks James Date: Tue, 11 Dec 2012 10:08:26 + Subject: [PATCH 1/1] metag: add kernel-ABI document Add a document in Documentation/metag/ describing the Linux ABI for metag. It includes an outline of the registers, which ones are special in userland and the kernel, the system call ABI, and an overview of the calling conventions. It was suggested that new architecture ports should have some ABI documentation available, with Documentation/frv/kernel-ABI.txt referenced as a decent example, from which some inspiration was drawn for this patch. Signed-off-by: James Hogan Reported-by: Al Viro --- Documentation/metag/00-INDEX |2 + Documentation/metag/kernel-ABI.txt | 256 2 files changed, 258 insertions(+), 0 deletions(-) create mode 100644 Documentation/metag/kernel-ABI.txt diff --git a/Documentation/metag/00-INDEX b/Documentation/metag/00-INDEX index 4fe2e16..4a93a27 100644 --- a/Documentation/metag/00-INDEX +++ b/Documentation/metag/00-INDEX @@ -6,3 +6,5 @@ coremem.txt - Documents the core memory interface used by suspend code comet/ - Documentation specific to the Comet SoC +kernel-ABI.txt + - Documents metag ABI details diff --git a/Documentation/metag/kernel-ABI.txt b/Documentation/metag/kernel-ABI.txt new file mode 100644 index 000..7b8dee8 --- /dev/null +++ b/Documentation/metag/kernel-ABI.txt @@ -0,0 +1,256 @@ + == + KERNEL ABIS FOR METAG ARCH + == + +This document describes the Linux ABIs for the metag architecture, and has the +following sections: + + (*) Outline of registers + (*) Userland registers + (*) Kernel registers + (*) System call ABI + (*) Calling conventions + + + +OUTLINE OF REGISTERS + + +The main Meta core registers are arranged in units: + + UNITTypeDESCRIPTION GP EXT PRIVGLOBAL + === === === === === ===
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 08/12/12 18:14, Al Viro wrote: On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: What we need to guarantee is * restarts do not happen on signals caught in interrupts or exceptions * restarts do not happen on signals caught in sigreturn() Since we don't currently have an orig syscall in our pt_regs that we can make invalid, is it acceptable to just explicitly exclude rt_sigreturn? e.g. using something like this around the switch statements that check for -ERESTART*: /* * Decide whether a syscall restart should be checked for. * This needs to exclude non-syscalls (syscall == -1), and sys_rt_sigreturn. */ static int restartable_syscall(int syscall) { return syscall = 0 syscall != __NR_rt_sigreturn; } * restart should happen only once, even if we get through do_signal() many times. BTW, signal caught in sigreturn is *not* something requiring a narrow race. It's perfectly normal to have some signals blocked for the duration of signal handler - the signal itself is blocked by default unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing to block an arbitrary set of signals. Upon return from signal handler we undo that and if any of the temporary blocked signals has arrived while we'd been in the handler (e.g. had been raised by the handler itself), it will be caught as soon as we unblock it, i.e. in sigreturn. Unfortunately, the testcases are somewhat architecture-dependent. See, for example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there might be a way to arrange for asm-free equivalent if one played with -O0, but it's probably not worth the trouble. That one deals with sigreturn from signal caught in interrupt; sigreturn from signal caught in syscall is a bit trickier. TBH, I don't understand your syscall calling conventions well enough to cook one up; your restart logics looks really strange. You leave -DX[0].U0 modified - fair enough, it doesn't seem to be used by syscall entry path - *and* you revert -DX[0].U1 to the state you used to have on entry. The thing is, I don't see any place that would have changed it in between; why do you need that regs-REG_SYSCALL = orig_syscall; in there at all? Yes, this doesn't seem to be necessary. The only other place REG_SYSCALL is changed is when it's set to __NR_restart_syscall, in which case I presume it never needs to be reset i.e. if a syscall returns -ERESTART_RESTARTBLOCK, it either doesn't return a different -ERESTART* from the restart block callback, or it's acceptable in that case to restart sys_restart_syscall rather than the original syscall that returned -ERESTART_RESTARTBLOCK. Is that right? BTW, could you (and other folks submitting ports) document the ABI? See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example... Good idea, something like below? Thanks James Date: Tue, 11 Dec 2012 10:08:26 + Subject: [PATCH 1/1] metag: add kernel-ABI document Add a document in Documentation/metag/ describing the Linux ABI for metag. It includes an outline of the registers, which ones are special in userland and the kernel, the system call ABI, and an overview of the calling conventions. It was suggested that new architecture ports should have some ABI documentation available, with Documentation/frv/kernel-ABI.txt referenced as a decent example, from which some inspiration was drawn for this patch. Signed-off-by: James Hogan james.ho...@imgtec.com Reported-by: Al Viro v...@zeniv.linux.org.uk --- Documentation/metag/00-INDEX |2 + Documentation/metag/kernel-ABI.txt | 256 2 files changed, 258 insertions(+), 0 deletions(-) create mode 100644 Documentation/metag/kernel-ABI.txt diff --git a/Documentation/metag/00-INDEX b/Documentation/metag/00-INDEX index 4fe2e16..4a93a27 100644 --- a/Documentation/metag/00-INDEX +++ b/Documentation/metag/00-INDEX @@ -6,3 +6,5 @@ coremem.txt - Documents the core memory interface used by suspend code comet/ - Documentation specific to the Comet SoC +kernel-ABI.txt + - Documents metag ABI details diff --git a/Documentation/metag/kernel-ABI.txt b/Documentation/metag/kernel-ABI.txt new file mode 100644 index 000..7b8dee8 --- /dev/null +++ b/Documentation/metag/kernel-ABI.txt @@ -0,0 +1,256 @@ + == + KERNEL ABIS FOR METAG ARCH + == + +This document describes the Linux ABIs for the metag architecture, and has the +following sections: + + (*) Outline of registers + (*) Userland registers + (*) Kernel registers + (*) System call ABI + (*) Calling conventions + + + +OUTLINE OF REGISTERS + + +The main Meta core registers are arranged in units: + + UNITTypeDESCRIPTION GP EXT PRIVGLOBAL + === === === === === ===
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 06/12/12 22:09, Al Viro wrote: > On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote: > >> Agreed, it looks wrong. Looking at the sh version, is there a particular >> reason to only check for -EFAULT and not the other errors that >> do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)? > > See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 Thanks :) > >>> BTW, what's to stop the syscall restart triggering if you catch a signal >>> while in rt_sigreturn(2)? >> >> I'm not sure I understand how that could cause a problem. Could you >> elaborate the sequence of events? >> >> The signal restart is triggered by the return value register, so >> rt_sigreturn would have to return -ERESTART*. This could happen if the >> signal handler overwrote the return value in the sigcontext (which as >> far as I can tell could also happen on ARM), or if the syscall that was >> originally interrupted by the signal has -ERESTARTNOINTR || >> (-ERESTARTSYS && SA_RESTART), but in that case rt_sigreturn has already >> switched back to the context of the original syscall so that's the right >> thing to do isn't it? I've probably missed something important :-) > > [we probably need something along the lines of braindump below in > somewhere in Documentation/*; comments and improvements are very > welcome - this is just a starting point. We *do* need some coherent > explanation of signal semantics, judging by how often people step on > the same landmines...] Thanks, this helps a lot. Cheers James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On 06/12/12 22:09, Al Viro wrote: On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote: Agreed, it looks wrong. Looking at the sh version, is there a particular reason to only check for -EFAULT and not the other errors that do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)? See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 Thanks :) BTW, what's to stop the syscall restart triggering if you catch a signal while in rt_sigreturn(2)? I'm not sure I understand how that could cause a problem. Could you elaborate the sequence of events? The signal restart is triggered by the return value register, so rt_sigreturn would have to return -ERESTART*. This could happen if the signal handler overwrote the return value in the sigcontext (which as far as I can tell could also happen on ARM), or if the syscall that was originally interrupted by the signal has -ERESTARTNOINTR || (-ERESTARTSYS SA_RESTART), but in that case rt_sigreturn has already switched back to the context of the original syscall so that's the right thing to do isn't it? I've probably missed something important :-) [we probably need something along the lines of braindump below in somewhere in Documentation/*; comments and improvements are very welcome - this is just a starting point. We *do* need some coherent explanation of signal semantics, judging by how often people step on the same landmines...] Thanks, this helps a lot. Cheers James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: > What we need to guarantee is > * restarts do not happen on signals caught in interrupts or exceptions > * restarts do not happen on signals caught in sigreturn() > * restart should happen only once, even if we get through do_signal() many > times. BTW, signal caught in sigreturn is *not* something requiring a narrow race. It's perfectly normal to have some signals blocked for the duration of signal handler - the signal itself is blocked by default unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing to block an arbitrary set of signals. Upon return from signal handler we undo that and if any of the temporary blocked signals has arrived while we'd been in the handler (e.g. had been raised by the handler itself), it will be caught as soon as we unblock it, i.e. in sigreturn. Unfortunately, the testcases are somewhat architecture-dependent. See, for example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there might be a way to arrange for asm-free equivalent if one played with -O0, but it's probably not worth the trouble. That one deals with sigreturn from signal caught in interrupt; sigreturn from signal caught in syscall is a bit trickier. TBH, I don't understand your syscall calling conventions well enough to cook one up; your restart logics looks really strange. You leave ->DX[0].U0 modified - fair enough, it doesn't seem to be used by syscall entry path - *and* you revert ->DX[0].U1 to the state you used to have on entry. The thing is, I don't see any place that would have changed it in between; why do you need that regs->REG_SYSCALL = orig_syscall; in there at all? BTW, could you (and other folks submitting ports) document the ABI? See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: What we need to guarantee is * restarts do not happen on signals caught in interrupts or exceptions * restarts do not happen on signals caught in sigreturn() * restart should happen only once, even if we get through do_signal() many times. BTW, signal caught in sigreturn is *not* something requiring a narrow race. It's perfectly normal to have some signals blocked for the duration of signal handler - the signal itself is blocked by default unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing to block an arbitrary set of signals. Upon return from signal handler we undo that and if any of the temporary blocked signals has arrived while we'd been in the handler (e.g. had been raised by the handler itself), it will be caught as soon as we unblock it, i.e. in sigreturn. Unfortunately, the testcases are somewhat architecture-dependent. See, for example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there might be a way to arrange for asm-free equivalent if one played with -O0, but it's probably not worth the trouble. That one deals with sigreturn from signal caught in interrupt; sigreturn from signal caught in syscall is a bit trickier. TBH, I don't understand your syscall calling conventions well enough to cook one up; your restart logics looks really strange. You leave -DX[0].U0 modified - fair enough, it doesn't seem to be used by syscall entry path - *and* you revert -DX[0].U1 to the state you used to have on entry. The thing is, I don't see any place that would have changed it in between; why do you need that regs-REG_SYSCALL = orig_syscall; in there at all? BTW, could you (and other folks submitting ports) document the ABI? See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: > "Subtle and undocumented" is an extremely polite way to describe that. > By now we had at least a dozen architectures step on that trap, simply because > they had different calling conventions and the same logics did *not* "just > work" there. > > What we need to guarantee is > * restarts do not happen on signals caught in interrupts or exceptions > * restarts do not happen on signals caught in sigreturn() > * restart should happen only once, even if we get through do_signal() many > times. FWIW, here's the current situation: alpha: works. Double restarts are prevented by the loop in do_work_pending() resetting 'r0' (syscall number or 0 if restarts should not be done) to 0 after the first call of do_signal(); all restart logics is conditional on r0 != 0. The logics making sure that we get the right value passed to do_work_pending() is convoluted and had cost us at least one bug (sigreturn/rt_sigreturn had stopped only once in case of straced process; strace(1) got seriously confused and produced garbage). arm: works. Double restarts are prevented by logics similar to alpha do_work_pending(); prevention of restarts on non-syscalls and sigreturn is done by asm glue setting r8 ('why', aka 'tbl') to 0 in non-syscall entry points and to syscall table address in syscall entry; zeroed in asm wrappers for sigreturn/rt_sigreturn. Used to be broken until several years ago. arm64: works. Syscall number is in pt_regs (->syscallno); -1 for non-syscall ones. Reaching do_signal() the first time around will set it to -1 and so will sigreturn (in restore_sigframe()). Restart logics is conditional on ->syscallno being non-negative. avr32: _very_ odd logics used to decide whether to do restarts or not and frankly, I do not believe that it could possibly work correctly - whatever we do when building a sigframe, we don't touch SYSREG_SR in process, so that won't prevent double restarts. And if we had r12 (first argument of syscall) restart-worthy at the entry, setup_syscall_restart() will leave us with restart-worthy value in ->r12. So e.g. pause(2) called when r12 happened to contain -514 (it's a zero-argument syscall, so calling it doesn't involve assignments to r12) will happily hit double restarts if we have e.g. SIGCHLD coming often enough. If that thing works, I would really appreciate a detailed explanation of how it manages to do that - it definitely deserves one. blackfin: doesn't handle multiple signals; if you get a SIGSEGV generated by failing attempt to set a sigframe up, too bad - you'll pass to userland and coredump will hit at some later point when a hardware interrupt happend. Restarts on sigreturn and non-syscalls are prevented by checking if ->orig_p0 is non-negative (similar to arm64 solution above) and it's easy to turn into prevention of double restarts, which will become necessary as soon as multiple signal handling gets fixed. Actually, it's almost OK as is - ERESTART_RESTARTBLOCK case is the only problematic one. c6x: there's a flag next to pt_regs on stack and it's non-zero if and only if we have a syscall. Passed explicitly to do_notify_resume() to tell if restarts are allowed. As far as I can see, it is vulnerable to bogus restarts on sigreturn (can be fixed by clearing the same flag in do_rt_sigreturn() - simple *(long *)(regs + 1) = 0 in there will do). It might be vulnerable to double restarts as well - pause(2) is not enabled there, but it's not much comfort. In the best case we are relying on the following property: no syscall can return -ERESTART... when called with the first argument equal to that value. Might be true (the usual suspects are pause() and ancient sigsuspend() of 3-argument variety and neither is used here), but it's brittle as hell. Come to think of that, clone(2) would probably fit the bill - we ignore all unknown bits in the mask and clone(2) *can* return -ERESTARTNOINTR. So it's almost certainly vulnerable. The same fix would do, but explicit loop a-la arm might be better. cris: same lossage as on blackfin (quits after the first signal). Vulnerable to bogus restarts on sigreturn. Would be vulnerable to double restarts if it handled multiple signals. frv: works (similar to the situation on arm64. Used to be broken until a couple of years ago. h8300: works. Prevention of restarts on sigreturn and non-syscalls based on sign of ->orig_er0; the first pass through syscall restart logics renders regs->er0 (return value) non-restart-worthy - anything that used to be restart-worthy becomes either non-negative (->orig_er0 has to be, or we won't touch that at all) or -EINTR. In other words, we can't hit that sucker twice. hexagon: broken. Prevention of restarts on non-syscalls is based on sign of ->syscall_nr. sigreturn carefully sets it *positive* and that makes it vulnerable to bogus restarts. Moreover, double restarts are possible as well, same as on c6x. ia64:
Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote: Subtle and undocumented is an extremely polite way to describe that. By now we had at least a dozen architectures step on that trap, simply because they had different calling conventions and the same logics did *not* just work there. What we need to guarantee is * restarts do not happen on signals caught in interrupts or exceptions * restarts do not happen on signals caught in sigreturn() * restart should happen only once, even if we get through do_signal() many times. FWIW, here's the current situation: alpha: works. Double restarts are prevented by the loop in do_work_pending() resetting 'r0' (syscall number or 0 if restarts should not be done) to 0 after the first call of do_signal(); all restart logics is conditional on r0 != 0. The logics making sure that we get the right value passed to do_work_pending() is convoluted and had cost us at least one bug (sigreturn/rt_sigreturn had stopped only once in case of straced process; strace(1) got seriously confused and produced garbage). arm: works. Double restarts are prevented by logics similar to alpha do_work_pending(); prevention of restarts on non-syscalls and sigreturn is done by asm glue setting r8 ('why', aka 'tbl') to 0 in non-syscall entry points and to syscall table address in syscall entry; zeroed in asm wrappers for sigreturn/rt_sigreturn. Used to be broken until several years ago. arm64: works. Syscall number is in pt_regs (-syscallno); -1 for non-syscall ones. Reaching do_signal() the first time around will set it to -1 and so will sigreturn (in restore_sigframe()). Restart logics is conditional on -syscallno being non-negative. avr32: _very_ odd logics used to decide whether to do restarts or not and frankly, I do not believe that it could possibly work correctly - whatever we do when building a sigframe, we don't touch SYSREG_SR in process, so that won't prevent double restarts. And if we had r12 (first argument of syscall) restart-worthy at the entry, setup_syscall_restart() will leave us with restart-worthy value in -r12. So e.g. pause(2) called when r12 happened to contain -514 (it's a zero-argument syscall, so calling it doesn't involve assignments to r12) will happily hit double restarts if we have e.g. SIGCHLD coming often enough. If that thing works, I would really appreciate a detailed explanation of how it manages to do that - it definitely deserves one. blackfin: doesn't handle multiple signals; if you get a SIGSEGV generated by failing attempt to set a sigframe up, too bad - you'll pass to userland and coredump will hit at some later point when a hardware interrupt happend. Restarts on sigreturn and non-syscalls are prevented by checking if -orig_p0 is non-negative (similar to arm64 solution above) and it's easy to turn into prevention of double restarts, which will become necessary as soon as multiple signal handling gets fixed. Actually, it's almost OK as is - ERESTART_RESTARTBLOCK case is the only problematic one. c6x: there's a flag next to pt_regs on stack and it's non-zero if and only if we have a syscall. Passed explicitly to do_notify_resume() to tell if restarts are allowed. As far as I can see, it is vulnerable to bogus restarts on sigreturn (can be fixed by clearing the same flag in do_rt_sigreturn() - simple *(long *)(regs + 1) = 0 in there will do). It might be vulnerable to double restarts as well - pause(2) is not enabled there, but it's not much comfort. In the best case we are relying on the following property: no syscall can return -ERESTART... when called with the first argument equal to that value. Might be true (the usual suspects are pause() and ancient sigsuspend() of 3-argument variety and neither is used here), but it's brittle as hell. Come to think of that, clone(2) would probably fit the bill - we ignore all unknown bits in the mask and clone(2) *can* return -ERESTARTNOINTR. So it's almost certainly vulnerable. The same fix would do, but explicit loop a-la arm might be better. cris: same lossage as on blackfin (quits after the first signal). Vulnerable to bogus restarts on sigreturn. Would be vulnerable to double restarts if it handled multiple signals. frv: works (similar to the situation on arm64. Used to be broken until a couple of years ago. h8300: works. Prevention of restarts on sigreturn and non-syscalls based on sign of -orig_er0; the first pass through syscall restart logics renders regs-er0 (return value) non-restart-worthy - anything that used to be restart-worthy becomes either non-negative (-orig_er0 has to be, or we won't touch that at all) or -EINTR. In other words, we can't hit that sucker twice. hexagon: broken. Prevention of restarts on non-syscalls is based on sign of -syscall_nr. sigreturn carefully sets it *positive* and that makes it vulnerable to bogus restarts. Moreover, double restarts are possible as well, same as on c6x. ia64: really, really weird. The
[braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote: > Agreed, it looks wrong. Looking at the sh version, is there a particular > reason to only check for -EFAULT and not the other errors that > do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)? See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 > > BTW, what's to stop the syscall restart triggering if you catch a signal > > while in rt_sigreturn(2)? > > I'm not sure I understand how that could cause a problem. Could you > elaborate the sequence of events? > > The signal restart is triggered by the return value register, so > rt_sigreturn would have to return -ERESTART*. This could happen if the > signal handler overwrote the return value in the sigcontext (which as > far as I can tell could also happen on ARM), or if the syscall that was > originally interrupted by the signal has -ERESTARTNOINTR || > (-ERESTARTSYS && SA_RESTART), but in that case rt_sigreturn has already > switched back to the context of the original syscall so that's the right > thing to do isn't it? I've probably missed something important :-) [we probably need something along the lines of braindump below in somewhere in Documentation/*; comments and improvements are very welcome - this is just a starting point. We *do* need some coherent explanation of signal semantics, judging by how often people step on the same landmines...] To understand what's going on with the signals, it's better to think of crossing the kernel boundary not as of function call and return from it, but as of coroutines passing control back and forth. When we enter the kernel mode, we start with saving CPU state. Usually (and you are going to hate that word well before you read to the end) it's stored in struct pt_regs, but it might be more complicated. For our purposes it's better to think of it as abstract saved state, leaving aside the question of how it's represented. What happens next depends on why we'd been called - it might have been an interrupt or exception or it might have been a syscall. The main difference is, of course, that on return from interrupt or exception we want registers unchanged while on return from syscall we expect to find return value in a register. Another thing is that after syscall we want to advance to the next instruction, while e.g. a page fault usually wants to repeat the instruction that had triggered it. It's _very_ architecture-dependent; in any case, we usually know very early which userland instruction would we normally have executed next, so let's consider that knowledge a part of saved state. For now I'm going to ignore ptrace-related horrors; it's a separate story (and an ugly one). In absense of that, the syscall execution is conceptually simple - * choose which function to call and what arguments to pass to it (both derived from saved state; usually that's picked directly from values left in registers by userland, but there might be more convoluted cases when e.g. syscall number is encoded as part of instruction or some arguments are passed on userland stack, etc.) * do a function call * shove return value into the saved state At that point we are ready to resume the execution of userland code. Usually. Unless a signal had arrived. So far it looked pretty much like a function call - sure, we had been using the kernel stack instead of the userland one, the calling sequence had been unusual and the call chain included a nasty glob of assembler glue, but as far as control flow is concerned, everything looks as if we'd called a function and returned from it. It's about to get more complicated, though. Signal arrival is indicated by TIF_SIGPENDING flag set on the target process. If we see it set when we are about to resume the execution in userland, we find out which signal has it been and what handler (if any) do we have for it. For now I'm going to leave the syscall restart logics aside; we'll get back to it shortly. Suppose we had a successful call of open(2) or e.g. just handled a page fault. We have saved state ready for resuming the userland execution. It contains the address of next instruction to execute and, in case of open(2), has the return value of sys_open() already reflected in it. Now we find ourselves wanting to have a signal handler executed; it's a userland function and we can't have that run in kernel mode. Here's what we normally do: * create a new userland coroutine context - that's what sigframe is about. * encode the saved state and store it in sigframe * modify the saved state so that the next instruction to execute would be at the beginning of signal handler, registers would look as if we were passing the right arguments to that handler and return address (either in register or in stack frame built for the handler - depends on ABI) would point to a small piece of code that would do cleanup and terminate that coroutine (more
[braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote: Agreed, it looks wrong. Looking at the sh version, is there a particular reason to only check for -EFAULT and not the other errors that do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)? See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 BTW, what's to stop the syscall restart triggering if you catch a signal while in rt_sigreturn(2)? I'm not sure I understand how that could cause a problem. Could you elaborate the sequence of events? The signal restart is triggered by the return value register, so rt_sigreturn would have to return -ERESTART*. This could happen if the signal handler overwrote the return value in the sigcontext (which as far as I can tell could also happen on ARM), or if the syscall that was originally interrupted by the signal has -ERESTARTNOINTR || (-ERESTARTSYS SA_RESTART), but in that case rt_sigreturn has already switched back to the context of the original syscall so that's the right thing to do isn't it? I've probably missed something important :-) [we probably need something along the lines of braindump below in somewhere in Documentation/*; comments and improvements are very welcome - this is just a starting point. We *do* need some coherent explanation of signal semantics, judging by how often people step on the same landmines...] To understand what's going on with the signals, it's better to think of crossing the kernel boundary not as of function call and return from it, but as of coroutines passing control back and forth. When we enter the kernel mode, we start with saving CPU state. Usually (and you are going to hate that word well before you read to the end) it's stored in struct pt_regs, but it might be more complicated. For our purposes it's better to think of it as abstract saved state, leaving aside the question of how it's represented. What happens next depends on why we'd been called - it might have been an interrupt or exception or it might have been a syscall. The main difference is, of course, that on return from interrupt or exception we want registers unchanged while on return from syscall we expect to find return value in a register. Another thing is that after syscall we want to advance to the next instruction, while e.g. a page fault usually wants to repeat the instruction that had triggered it. It's _very_ architecture-dependent; in any case, we usually know very early which userland instruction would we normally have executed next, so let's consider that knowledge a part of saved state. For now I'm going to ignore ptrace-related horrors; it's a separate story (and an ugly one). In absense of that, the syscall execution is conceptually simple - * choose which function to call and what arguments to pass to it (both derived from saved state; usually that's picked directly from values left in registers by userland, but there might be more convoluted cases when e.g. syscall number is encoded as part of instruction or some arguments are passed on userland stack, etc.) * do a function call * shove return value into the saved state At that point we are ready to resume the execution of userland code. Usually. Unless a signal had arrived. So far it looked pretty much like a function call - sure, we had been using the kernel stack instead of the userland one, the calling sequence had been unusual and the call chain included a nasty glob of assembler glue, but as far as control flow is concerned, everything looks as if we'd called a function and returned from it. It's about to get more complicated, though. Signal arrival is indicated by TIF_SIGPENDING flag set on the target process. If we see it set when we are about to resume the execution in userland, we find out which signal has it been and what handler (if any) do we have for it. For now I'm going to leave the syscall restart logics aside; we'll get back to it shortly. Suppose we had a successful call of open(2) or e.g. just handled a page fault. We have saved state ready for resuming the userland execution. It contains the address of next instruction to execute and, in case of open(2), has the return value of sys_open() already reflected in it. Now we find ourselves wanting to have a signal handler executed; it's a userland function and we can't have that run in kernel mode. Here's what we normally do: * create a new userland coroutine context - that's what sigframe is about. * encode the saved state and store it in sigframe * modify the saved state so that the next instruction to execute would be at the beginning of signal handler, registers would look as if we were passing the right arguments to that handler and return address (either in register or in stack frame built for the handler - depends on ABI) would point to a small piece of code that would do cleanup and terminate that coroutine (more on that in a minute).