Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-15 Thread Al Viro
On Sat, Dec 15, 2012 at 05:26:30PM +0100, Jonas Bonn wrote:

> That said, let me point again to the series I posted for review a year
> ago that attempts to make the restart logic more generic:
> 
> https://lkml.org/lkml/2011/10/23/80
> 
> The entires patch series, which doesn't necessarily even apply to
> Linux master anymore due to other changes along the way, can be found
> at:
> 
> git://openrisc.net/~jonas/linux  ('signal-arch' branch)
> 
> Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there.
> 
> Al, what do you think?

In principle that's a nice cleanup, but...  For one thing, I really do not
believe that we should be returning to userland for handlerless restarts.
At all.  In that respect s390 and arm do it right.  Moreover, the loop calling
do_notify_resume() is better off in C (see e.g. alpha and arm for examples of
that approach).

Another thing is interplay with ptrace ;-/  And ptrace ABI is pretty much
"whatever gdb and its ilk happen to rely upon" - it's not formalized at
all and by now it's too late for existing ports.  Worse, we get to keep it
working as is; old binaries shall not be broken.

Hell knows...  ptrace vs. signals is a thing of horror, indeed - I had a damn
good reason for not including it into this braindump.  Note that there are
several aspects of behaviour that vary between the architectures:
* get_signal_to_deliver() is a chance for tracer to play with our
state; what instruction pointer value should it see in case of possible
restart?
* Does the change of instruction pointer done by tracer cancel
restart?  Or is there some other way for tracer to signal the need of
such cancel?
* Does the change of syscall return value done by tracer affect
(or cancel) the restart?
And that's just for starters - I'm leaving aside the unholy mess around
single-stepping vs. signals.  The question here is not just "what behaviour
would make sense if we designed it from scratch", it's "what behaviour is
demonstrated by this architecture".  gdb is chock-full of arch-dependent
workarounds and yes, we need to keep that working.  Moreover, gdb isn't
the only thing tied to that crap.

What we need is a decent review of what's in there for all
architectures.  And that'll take cooperation from maintainers -
it's far too easy to miss things on 3rd-party review in that area ;-/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-15 Thread Jonas Bonn
On 8 December 2012 08:44, Al Viro  wrote:
> On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
>>   What we need to guarantee is
>> * restarts do not happen on signals caught in interrupts or exceptions
>> * restarts do not happen on signals caught in sigreturn()
>> * restart should happen only once, even if we get through do_signal() many
>> times.
>
> FWIW, here's the current situation:
>

> openrisc: broken.  regs->orig_gpr11 can be easily used to fix - it fits the
> usual model, but isn't set by sigreturn/restarts.  BTW, the comment around
> the call of do_notify_resume() in the asm glue is deeply confused - we *do*
> want the userspace pt_regs; fortunately, there can't be any on top of those
> at that point.
>

Right, I've known about this for a while and have even had a patch for
this lying about in a side-branch.  I got side-tracked while fixing
this into trying to make the restart logic more generic, and the
openrisc fix never got merged upstream.  I'll need to revisit this.

That said, let me point again to the series I posted for review a year
ago that attempts to make the restart logic more generic:

https://lkml.org/lkml/2011/10/23/80

The entires patch series, which doesn't necessarily even apply to
Linux master anymore due to other changes along the way, can be found
at:

git://openrisc.net/~jonas/linux  ('signal-arch' branch)

Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there.

Al, what do you think?

/Jonas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-15 Thread Jonas Bonn
On 8 December 2012 08:44, Al Viro v...@zeniv.linux.org.uk wrote:
 On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
   What we need to guarantee is
 * restarts do not happen on signals caught in interrupts or exceptions
 * restarts do not happen on signals caught in sigreturn()
 * restart should happen only once, even if we get through do_signal() many
 times.

 FWIW, here's the current situation:


 openrisc: broken.  regs-orig_gpr11 can be easily used to fix - it fits the
 usual model, but isn't set by sigreturn/restarts.  BTW, the comment around
 the call of do_notify_resume() in the asm glue is deeply confused - we *do*
 want the userspace pt_regs; fortunately, there can't be any on top of those
 at that point.


Right, I've known about this for a while and have even had a patch for
this lying about in a side-branch.  I got side-tracked while fixing
this into trying to make the restart logic more generic, and the
openrisc fix never got merged upstream.  I'll need to revisit this.

That said, let me point again to the series I posted for review a year
ago that attempts to make the restart logic more generic:

https://lkml.org/lkml/2011/10/23/80

The entires patch series, which doesn't necessarily even apply to
Linux master anymore due to other changes along the way, can be found
at:

git://openrisc.net/~jonas/linux  ('signal-arch' branch)

Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there.

Al, what do you think?

/Jonas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-15 Thread Al Viro
On Sat, Dec 15, 2012 at 05:26:30PM +0100, Jonas Bonn wrote:

 That said, let me point again to the series I posted for review a year
 ago that attempts to make the restart logic more generic:
 
 https://lkml.org/lkml/2011/10/23/80
 
 The entires patch series, which doesn't necessarily even apply to
 Linux master anymore due to other changes along the way, can be found
 at:
 
 git://openrisc.net/~jonas/linux  ('signal-arch' branch)
 
 Commit 4aa1797d978fe2d45ececceee535257e19374df8 is the interesting one there.
 
 Al, what do you think?

In principle that's a nice cleanup, but...  For one thing, I really do not
believe that we should be returning to userland for handlerless restarts.
At all.  In that respect s390 and arm do it right.  Moreover, the loop calling
do_notify_resume() is better off in C (see e.g. alpha and arm for examples of
that approach).

Another thing is interplay with ptrace ;-/  And ptrace ABI is pretty much
whatever gdb and its ilk happen to rely upon - it's not formalized at
all and by now it's too late for existing ports.  Worse, we get to keep it
working as is; old binaries shall not be broken.

Hell knows...  ptrace vs. signals is a thing of horror, indeed - I had a damn
good reason for not including it into this braindump.  Note that there are
several aspects of behaviour that vary between the architectures:
* get_signal_to_deliver() is a chance for tracer to play with our
state; what instruction pointer value should it see in case of possible
restart?
* Does the change of instruction pointer done by tracer cancel
restart?  Or is there some other way for tracer to signal the need of
such cancel?
* Does the change of syscall return value done by tracer affect
(or cancel) the restart?
And that's just for starters - I'm leaving aside the unholy mess around
single-stepping vs. signals.  The question here is not just what behaviour
would make sense if we designed it from scratch, it's what behaviour is
demonstrated by this architecture.  gdb is chock-full of arch-dependent
workarounds and yes, we need to keep that working.  Moreover, gdb isn't
the only thing tied to that crap.

What we need is a decent review of what's in there for all
architectures.  And that'll take cooperation from maintainers -
it's far too easy to miss things on 3rd-party review in that area ;-/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-12 Thread James Hogan
On 08/12/12 18:14, Al Viro wrote:
> On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
> 
>>  What we need to guarantee is
>> * restarts do not happen on signals caught in interrupts or exceptions
>> * restarts do not happen on signals caught in sigreturn()

Since we don't currently have an orig syscall in our pt_regs that
we can make invalid, is it acceptable to just explicitly exclude
rt_sigreturn? e.g. using something like this around the switch
statements that check for -ERESTART*:

/*
 * Decide whether a syscall restart should be checked for.
 * This needs to exclude non-syscalls (syscall == -1), and sys_rt_sigreturn.
 */
static int restartable_syscall(int syscall)
{
return syscall >= 0 && syscall != __NR_rt_sigreturn;
}

>> * restart should happen only once, even if we get through do_signal() many
>> times.
> 
> BTW, signal caught in sigreturn is *not* something requiring a narrow
> race.  It's perfectly normal to have some signals blocked for the
> duration of signal handler - the signal itself is blocked by default
> unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing
> to block an arbitrary set of signals.  Upon return from signal handler
> we undo that and if any of the temporary blocked signals has arrived
> while we'd been in the handler (e.g. had been raised by the handler itself),
> it will be caught as soon as we unblock it, i.e. in sigreturn.
> 
> Unfortunately, the testcases are somewhat architecture-dependent.  See, for
> example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there
> might be a way to arrange for asm-free equivalent if one played with -O0,
> but it's probably not worth the trouble.  That one deals with sigreturn
> from signal caught in interrupt; sigreturn from signal caught in syscall
> is a bit trickier.  TBH, I don't understand your syscall calling conventions
> well enough to cook one up; your restart logics looks really strange.
> You leave ->DX[0].U0 modified - fair enough, it doesn't seem to be used
> by syscall entry path - *and* you revert ->DX[0].U1 to the state you
> used to have on entry.  The thing is, I don't see any place that would
> have changed it in between; why do you need that
>   regs->REG_SYSCALL = orig_syscall;
> in there at all?

Yes, this doesn't seem to be necessary. The only other place
REG_SYSCALL is changed is when it's set to __NR_restart_syscall, in
which case I presume it never needs to be reset i.e. if a syscall
returns -ERESTART_RESTARTBLOCK, it either doesn't return a different
-ERESTART* from the restart block callback, or it's acceptable in
that case to restart sys_restart_syscall rather than the original
syscall that returned -ERESTART_RESTARTBLOCK. Is that right?

> 
> BTW, could you (and other folks submitting ports) document the ABI?
> See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example...
> 

Good idea, something like below?

Thanks
James

Date: Tue, 11 Dec 2012 10:08:26 +
Subject: [PATCH 1/1] metag: add kernel-ABI document

Add a document in Documentation/metag/ describing the Linux ABI for
metag. It includes an outline of the registers, which ones are special
in userland and the kernel, the system call ABI, and an overview of the
calling conventions.

It was suggested that new architecture ports should have some ABI
documentation available, with Documentation/frv/kernel-ABI.txt
referenced as a decent example, from which some inspiration was drawn
for this patch.

Signed-off-by: James Hogan 
Reported-by: Al Viro 
---
 Documentation/metag/00-INDEX   |2 +
 Documentation/metag/kernel-ABI.txt |  256 
 2 files changed, 258 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/metag/kernel-ABI.txt

diff --git a/Documentation/metag/00-INDEX b/Documentation/metag/00-INDEX
index 4fe2e16..4a93a27 100644
--- a/Documentation/metag/00-INDEX
+++ b/Documentation/metag/00-INDEX
@@ -6,3 +6,5 @@ coremem.txt
- Documents the core memory interface used by suspend code
 comet/
- Documentation specific to the Comet SoC
+kernel-ABI.txt
+   - Documents metag ABI details
diff --git a/Documentation/metag/kernel-ABI.txt 
b/Documentation/metag/kernel-ABI.txt
new file mode 100644
index 000..7b8dee8
--- /dev/null
+++ b/Documentation/metag/kernel-ABI.txt
@@ -0,0 +1,256 @@
+   ==
+   KERNEL ABIS FOR METAG ARCH
+   ==
+
+This document describes the Linux ABIs for the metag architecture, and has the
+following sections:
+
+ (*) Outline of registers
+ (*) Userland registers
+ (*) Kernel registers
+ (*) System call ABI
+ (*) Calling conventions
+
+
+
+OUTLINE OF REGISTERS
+
+
+The main Meta core registers are arranged in units:
+
+   UNITTypeDESCRIPTION GP  EXT PRIVGLOBAL
+   === === === === === === 

Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-12 Thread James Hogan
On 08/12/12 18:14, Al Viro wrote:
 On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
 
  What we need to guarantee is
 * restarts do not happen on signals caught in interrupts or exceptions
 * restarts do not happen on signals caught in sigreturn()

Since we don't currently have an orig syscall in our pt_regs that
we can make invalid, is it acceptable to just explicitly exclude
rt_sigreturn? e.g. using something like this around the switch
statements that check for -ERESTART*:

/*
 * Decide whether a syscall restart should be checked for.
 * This needs to exclude non-syscalls (syscall == -1), and sys_rt_sigreturn.
 */
static int restartable_syscall(int syscall)
{
return syscall = 0  syscall != __NR_rt_sigreturn;
}

 * restart should happen only once, even if we get through do_signal() many
 times.
 
 BTW, signal caught in sigreturn is *not* something requiring a narrow
 race.  It's perfectly normal to have some signals blocked for the
 duration of signal handler - the signal itself is blocked by default
 unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing
 to block an arbitrary set of signals.  Upon return from signal handler
 we undo that and if any of the temporary blocked signals has arrived
 while we'd been in the handler (e.g. had been raised by the handler itself),
 it will be caught as soon as we unblock it, i.e. in sigreturn.
 
 Unfortunately, the testcases are somewhat architecture-dependent.  See, for
 example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there
 might be a way to arrange for asm-free equivalent if one played with -O0,
 but it's probably not worth the trouble.  That one deals with sigreturn
 from signal caught in interrupt; sigreturn from signal caught in syscall
 is a bit trickier.  TBH, I don't understand your syscall calling conventions
 well enough to cook one up; your restart logics looks really strange.
 You leave -DX[0].U0 modified - fair enough, it doesn't seem to be used
 by syscall entry path - *and* you revert -DX[0].U1 to the state you
 used to have on entry.  The thing is, I don't see any place that would
 have changed it in between; why do you need that
   regs-REG_SYSCALL = orig_syscall;
 in there at all?

Yes, this doesn't seem to be necessary. The only other place
REG_SYSCALL is changed is when it's set to __NR_restart_syscall, in
which case I presume it never needs to be reset i.e. if a syscall
returns -ERESTART_RESTARTBLOCK, it either doesn't return a different
-ERESTART* from the restart block callback, or it's acceptable in
that case to restart sys_restart_syscall rather than the original
syscall that returned -ERESTART_RESTARTBLOCK. Is that right?

 
 BTW, could you (and other folks submitting ports) document the ABI?
 See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example...
 

Good idea, something like below?

Thanks
James

Date: Tue, 11 Dec 2012 10:08:26 +
Subject: [PATCH 1/1] metag: add kernel-ABI document

Add a document in Documentation/metag/ describing the Linux ABI for
metag. It includes an outline of the registers, which ones are special
in userland and the kernel, the system call ABI, and an overview of the
calling conventions.

It was suggested that new architecture ports should have some ABI
documentation available, with Documentation/frv/kernel-ABI.txt
referenced as a decent example, from which some inspiration was drawn
for this patch.

Signed-off-by: James Hogan james.ho...@imgtec.com
Reported-by: Al Viro v...@zeniv.linux.org.uk
---
 Documentation/metag/00-INDEX   |2 +
 Documentation/metag/kernel-ABI.txt |  256 
 2 files changed, 258 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/metag/kernel-ABI.txt

diff --git a/Documentation/metag/00-INDEX b/Documentation/metag/00-INDEX
index 4fe2e16..4a93a27 100644
--- a/Documentation/metag/00-INDEX
+++ b/Documentation/metag/00-INDEX
@@ -6,3 +6,5 @@ coremem.txt
- Documents the core memory interface used by suspend code
 comet/
- Documentation specific to the Comet SoC
+kernel-ABI.txt
+   - Documents metag ABI details
diff --git a/Documentation/metag/kernel-ABI.txt 
b/Documentation/metag/kernel-ABI.txt
new file mode 100644
index 000..7b8dee8
--- /dev/null
+++ b/Documentation/metag/kernel-ABI.txt
@@ -0,0 +1,256 @@
+   ==
+   KERNEL ABIS FOR METAG ARCH
+   ==
+
+This document describes the Linux ABIs for the metag architecture, and has the
+following sections:
+
+ (*) Outline of registers
+ (*) Userland registers
+ (*) Kernel registers
+ (*) System call ABI
+ (*) Calling conventions
+
+
+
+OUTLINE OF REGISTERS
+
+
+The main Meta core registers are arranged in units:
+
+   UNITTypeDESCRIPTION GP  EXT PRIVGLOBAL
+   === === === === === === 

Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-10 Thread James Hogan
On 06/12/12 22:09, Al Viro wrote:
> On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote:
> 
>> Agreed, it looks wrong. Looking at the sh version, is there a particular
>> reason to only check for -EFAULT and not the other errors that
>> do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)?
> 
> See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2

Thanks :)

> 
>>> BTW, what's to stop the syscall restart triggering if you catch a signal
>>> while in rt_sigreturn(2)?
>>
>> I'm not sure I understand how that could cause a problem. Could you
>> elaborate the sequence of events?
>>
>> The signal restart is triggered by the return value register, so
>> rt_sigreturn would have to return -ERESTART*. This could happen if the
>> signal handler overwrote the return value in the sigcontext (which as
>> far as I can tell could also happen on ARM), or if the syscall that was
>> originally interrupted by the signal has -ERESTARTNOINTR ||
>> (-ERESTARTSYS && SA_RESTART), but in that case rt_sigreturn has already
>> switched back to the context of the original syscall so that's the right
>> thing to do isn't it? I've probably missed something important :-)
> 
> [we probably need something along the lines of braindump below in 
> somewhere in Documentation/*; comments and improvements are very
> welcome - this is just a starting point.  We *do* need some coherent
> explanation of signal semantics, judging by how often people step on
> the same landmines...]

Thanks, this helps a lot.

Cheers
James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-10 Thread James Hogan
On 06/12/12 22:09, Al Viro wrote:
 On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote:
 
 Agreed, it looks wrong. Looking at the sh version, is there a particular
 reason to only check for -EFAULT and not the other errors that
 do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)?
 
 See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2

Thanks :)

 
 BTW, what's to stop the syscall restart triggering if you catch a signal
 while in rt_sigreturn(2)?

 I'm not sure I understand how that could cause a problem. Could you
 elaborate the sequence of events?

 The signal restart is triggered by the return value register, so
 rt_sigreturn would have to return -ERESTART*. This could happen if the
 signal handler overwrote the return value in the sigcontext (which as
 far as I can tell could also happen on ARM), or if the syscall that was
 originally interrupted by the signal has -ERESTARTNOINTR ||
 (-ERESTARTSYS  SA_RESTART), but in that case rt_sigreturn has already
 switched back to the context of the original syscall so that's the right
 thing to do isn't it? I've probably missed something important :-)
 
 [we probably need something along the lines of braindump below in 
 somewhere in Documentation/*; comments and improvements are very
 welcome - this is just a starting point.  We *do* need some coherent
 explanation of signal semantics, judging by how often people step on
 the same landmines...]

Thanks, this helps a lot.

Cheers
James

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-08 Thread Al Viro
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:

>   What we need to guarantee is
> * restarts do not happen on signals caught in interrupts or exceptions
> * restarts do not happen on signals caught in sigreturn()
> * restart should happen only once, even if we get through do_signal() many
> times.

BTW, signal caught in sigreturn is *not* something requiring a narrow
race.  It's perfectly normal to have some signals blocked for the
duration of signal handler - the signal itself is blocked by default
unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing
to block an arbitrary set of signals.  Upon return from signal handler
we undo that and if any of the temporary blocked signals has arrived
while we'd been in the handler (e.g. had been raised by the handler itself),
it will be caught as soon as we unblock it, i.e. in sigreturn.

Unfortunately, the testcases are somewhat architecture-dependent.  See, for
example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there
might be a way to arrange for asm-free equivalent if one played with -O0,
but it's probably not worth the trouble.  That one deals with sigreturn
from signal caught in interrupt; sigreturn from signal caught in syscall
is a bit trickier.  TBH, I don't understand your syscall calling conventions
well enough to cook one up; your restart logics looks really strange.
You leave ->DX[0].U0 modified - fair enough, it doesn't seem to be used
by syscall entry path - *and* you revert ->DX[0].U1 to the state you
used to have on entry.  The thing is, I don't see any place that would
have changed it in between; why do you need that
regs->REG_SYSCALL = orig_syscall;
in there at all?

BTW, could you (and other folks submitting ports) document the ABI?
See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-08 Thread Al Viro
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:

   What we need to guarantee is
 * restarts do not happen on signals caught in interrupts or exceptions
 * restarts do not happen on signals caught in sigreturn()
 * restart should happen only once, even if we get through do_signal() many
 times.

BTW, signal caught in sigreturn is *not* something requiring a narrow
race.  It's perfectly normal to have some signals blocked for the
duration of signal handler - the signal itself is blocked by default
unless you have set SA_NODEFER in sa_flags and there's sa_mask allowing
to block an arbitrary set of signals.  Upon return from signal handler
we undo that and if any of the temporary blocked signals has arrived
while we'd been in the handler (e.g. had been raised by the handler itself),
it will be caught as soon as we unblock it, i.e. in sigreturn.

Unfortunately, the testcases are somewhat architecture-dependent.  See, for
example, arm one in commit 653d48b22166db2d8b1515ebe6f9f0f7c95dfc86; there
might be a way to arrange for asm-free equivalent if one played with -O0,
but it's probably not worth the trouble.  That one deals with sigreturn
from signal caught in interrupt; sigreturn from signal caught in syscall
is a bit trickier.  TBH, I don't understand your syscall calling conventions
well enough to cook one up; your restart logics looks really strange.
You leave -DX[0].U0 modified - fair enough, it doesn't seem to be used
by syscall entry path - *and* you revert -DX[0].U1 to the state you
used to have on entry.  The thing is, I don't see any place that would
have changed it in between; why do you need that
regs-REG_SYSCALL = orig_syscall;
in there at all?

BTW, could you (and other folks submitting ports) document the ABI?
See e.g. Documentation/frv/kernel-ABI.txt for fairly decent example...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-07 Thread Al Viro
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
>   "Subtle and undocumented" is an extremely polite way to describe that.
> By now we had at least a dozen architectures step on that trap, simply because
> they had different calling conventions and the same logics did *not* "just
> work" there.  
> 
>   What we need to guarantee is
> * restarts do not happen on signals caught in interrupts or exceptions
> * restarts do not happen on signals caught in sigreturn()
> * restart should happen only once, even if we get through do_signal() many
> times.

FWIW, here's the current situation:

alpha: works.  Double restarts are prevented by the loop in do_work_pending()
resetting 'r0' (syscall number or 0 if restarts should not be done)
to 0 after the first call of do_signal(); all restart logics is conditional
on r0 != 0.  The logics making sure that we get the right value passed
to do_work_pending() is convoluted and had cost us at least one bug
(sigreturn/rt_sigreturn had stopped only once in case of straced process;
strace(1) got seriously confused and produced garbage).

arm: works.  Double restarts are prevented by logics similar to alpha
do_work_pending(); prevention of restarts on non-syscalls and sigreturn is
done by asm glue setting r8 ('why', aka 'tbl') to 0 in non-syscall entry points
and to syscall table address in syscall entry; zeroed in asm wrappers for
sigreturn/rt_sigreturn.  Used to be broken until several years ago.

arm64: works.  Syscall number is in pt_regs (->syscallno); -1 for non-syscall
ones.  Reaching do_signal() the first time around will set it to -1 and so will
sigreturn (in restore_sigframe()).  Restart logics is conditional on
->syscallno being non-negative.

avr32: _very_ odd logics used to decide whether to do restarts or not and
frankly, I do not believe that it could possibly work correctly - whatever
we do when building a sigframe, we don't touch SYSREG_SR in process, so
that won't prevent double restarts.  And if we had r12 (first argument of
syscall) restart-worthy at the entry, setup_syscall_restart() will leave
us with restart-worthy value in ->r12.  So e.g. pause(2) called when
r12 happened to contain -514 (it's a zero-argument syscall, so calling it
doesn't involve assignments to r12) will happily hit double restarts if
we have e.g. SIGCHLD coming often enough.  If that thing works, I would
really appreciate a detailed explanation of how it manages to do that - it
definitely deserves one.

blackfin: doesn't handle multiple signals; if you get a SIGSEGV generated
by failing attempt to set a sigframe up, too bad - you'll pass to userland
and coredump will hit at some later point when a hardware interrupt happend.
Restarts on sigreturn and non-syscalls are prevented by checking if
->orig_p0 is non-negative (similar to arm64 solution above) and it's easy to
turn into prevention of double restarts, which will become necessary as soon
as multiple signal handling gets fixed.  Actually, it's almost OK as is -
ERESTART_RESTARTBLOCK case is the only problematic one.

c6x: there's a flag next to pt_regs on stack and it's non-zero if and only
if we have a syscall.  Passed explicitly to do_notify_resume() to tell if
restarts are allowed.  As far as I can see, it is vulnerable to bogus
restarts on sigreturn (can be fixed by clearing the same flag in
do_rt_sigreturn() - simple *(long *)(regs + 1) = 0 in there will do).
It might be vulnerable to double restarts as well - pause(2) is not
enabled there, but it's not much comfort.  In the best case we are relying
on the following property:
no syscall can return -ERESTART... when called with the first
argument equal to that value.
Might be true (the usual suspects are pause() and ancient sigsuspend() of
3-argument variety and neither is used here), but it's brittle as hell.
Come to think of that, clone(2) would probably fit the bill - we ignore
all unknown bits in the mask and clone(2) *can* return -ERESTARTNOINTR.
So it's almost certainly vulnerable.  The same fix would do, but explicit
loop a-la arm might be better.

cris: same lossage as on blackfin (quits after the first signal).  Vulnerable
to bogus restarts on sigreturn.  Would be vulnerable to double restarts if it
handled multiple signals.

frv: works (similar to the situation on arm64.  Used to be broken until a
couple of years ago.

h8300: works.  Prevention of restarts on sigreturn and non-syscalls based
on sign of ->orig_er0; the first pass through syscall restart logics renders
regs->er0 (return value) non-restart-worthy - anything that used to be
restart-worthy becomes either non-negative (->orig_er0 has to be, or we
won't touch that at all) or -EINTR.  In other words, we can't hit that
sucker twice.

hexagon: broken.  Prevention of restarts on non-syscalls is based on
sign of ->syscall_nr.  sigreturn carefully sets it *positive* and that
makes it vulnerable to bogus restarts.  Moreover, double restarts are
possible as well, same as on c6x.

ia64: 

Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-07 Thread Al Viro
On Thu, Dec 06, 2012 at 10:09:55PM +, Al Viro wrote:
   Subtle and undocumented is an extremely polite way to describe that.
 By now we had at least a dozen architectures step on that trap, simply because
 they had different calling conventions and the same logics did *not* just
 work there.  
 
   What we need to guarantee is
 * restarts do not happen on signals caught in interrupts or exceptions
 * restarts do not happen on signals caught in sigreturn()
 * restart should happen only once, even if we get through do_signal() many
 times.

FWIW, here's the current situation:

alpha: works.  Double restarts are prevented by the loop in do_work_pending()
resetting 'r0' (syscall number or 0 if restarts should not be done)
to 0 after the first call of do_signal(); all restart logics is conditional
on r0 != 0.  The logics making sure that we get the right value passed
to do_work_pending() is convoluted and had cost us at least one bug
(sigreturn/rt_sigreturn had stopped only once in case of straced process;
strace(1) got seriously confused and produced garbage).

arm: works.  Double restarts are prevented by logics similar to alpha
do_work_pending(); prevention of restarts on non-syscalls and sigreturn is
done by asm glue setting r8 ('why', aka 'tbl') to 0 in non-syscall entry points
and to syscall table address in syscall entry; zeroed in asm wrappers for
sigreturn/rt_sigreturn.  Used to be broken until several years ago.

arm64: works.  Syscall number is in pt_regs (-syscallno); -1 for non-syscall
ones.  Reaching do_signal() the first time around will set it to -1 and so will
sigreturn (in restore_sigframe()).  Restart logics is conditional on
-syscallno being non-negative.

avr32: _very_ odd logics used to decide whether to do restarts or not and
frankly, I do not believe that it could possibly work correctly - whatever
we do when building a sigframe, we don't touch SYSREG_SR in process, so
that won't prevent double restarts.  And if we had r12 (first argument of
syscall) restart-worthy at the entry, setup_syscall_restart() will leave
us with restart-worthy value in -r12.  So e.g. pause(2) called when
r12 happened to contain -514 (it's a zero-argument syscall, so calling it
doesn't involve assignments to r12) will happily hit double restarts if
we have e.g. SIGCHLD coming often enough.  If that thing works, I would
really appreciate a detailed explanation of how it manages to do that - it
definitely deserves one.

blackfin: doesn't handle multiple signals; if you get a SIGSEGV generated
by failing attempt to set a sigframe up, too bad - you'll pass to userland
and coredump will hit at some later point when a hardware interrupt happend.
Restarts on sigreturn and non-syscalls are prevented by checking if
-orig_p0 is non-negative (similar to arm64 solution above) and it's easy to
turn into prevention of double restarts, which will become necessary as soon
as multiple signal handling gets fixed.  Actually, it's almost OK as is -
ERESTART_RESTARTBLOCK case is the only problematic one.

c6x: there's a flag next to pt_regs on stack and it's non-zero if and only
if we have a syscall.  Passed explicitly to do_notify_resume() to tell if
restarts are allowed.  As far as I can see, it is vulnerable to bogus
restarts on sigreturn (can be fixed by clearing the same flag in
do_rt_sigreturn() - simple *(long *)(regs + 1) = 0 in there will do).
It might be vulnerable to double restarts as well - pause(2) is not
enabled there, but it's not much comfort.  In the best case we are relying
on the following property:
no syscall can return -ERESTART... when called with the first
argument equal to that value.
Might be true (the usual suspects are pause() and ancient sigsuspend() of
3-argument variety and neither is used here), but it's brittle as hell.
Come to think of that, clone(2) would probably fit the bill - we ignore
all unknown bits in the mask and clone(2) *can* return -ERESTARTNOINTR.
So it's almost certainly vulnerable.  The same fix would do, but explicit
loop a-la arm might be better.

cris: same lossage as on blackfin (quits after the first signal).  Vulnerable
to bogus restarts on sigreturn.  Would be vulnerable to double restarts if it
handled multiple signals.

frv: works (similar to the situation on arm64.  Used to be broken until a
couple of years ago.

h8300: works.  Prevention of restarts on sigreturn and non-syscalls based
on sign of -orig_er0; the first pass through syscall restart logics renders
regs-er0 (return value) non-restart-worthy - anything that used to be
restart-worthy becomes either non-negative (-orig_er0 has to be, or we
won't touch that at all) or -EINTR.  In other words, we can't hit that
sucker twice.

hexagon: broken.  Prevention of restarts on non-syscalls is based on
sign of -syscall_nr.  sigreturn carefully sets it *positive* and that
makes it vulnerable to bogus restarts.  Moreover, double restarts are
possible as well, same as on c6x.

ia64: really, really weird.  The 

[braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-06 Thread Al Viro
On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote:

> Agreed, it looks wrong. Looking at the sh version, is there a particular
> reason to only check for -EFAULT and not the other errors that
> do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)?

See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2

> > BTW, what's to stop the syscall restart triggering if you catch a signal
> > while in rt_sigreturn(2)?
> 
> I'm not sure I understand how that could cause a problem. Could you
> elaborate the sequence of events?
> 
> The signal restart is triggered by the return value register, so
> rt_sigreturn would have to return -ERESTART*. This could happen if the
> signal handler overwrote the return value in the sigcontext (which as
> far as I can tell could also happen on ARM), or if the syscall that was
> originally interrupted by the signal has -ERESTARTNOINTR ||
> (-ERESTARTSYS && SA_RESTART), but in that case rt_sigreturn has already
> switched back to the context of the original syscall so that's the right
> thing to do isn't it? I've probably missed something important :-)

[we probably need something along the lines of braindump below in 
somewhere in Documentation/*; comments and improvements are very
welcome - this is just a starting point.  We *do* need some coherent
explanation of signal semantics, judging by how often people step on
the same landmines...]

To understand what's going on with the signals, it's better
to think of crossing the kernel boundary not as of function call and
return from it, but as of coroutines passing control back and forth.

When we enter the kernel mode, we start with saving CPU state.
Usually (and you are going to hate that word well before you read to
the end) it's stored in struct pt_regs, but it might be more complicated.
For our purposes it's better to think of it as abstract saved state,
leaving aside the question of how it's represented.

What happens next depends on why we'd been called - it might
have been an interrupt or exception or it might have been a syscall.
The main difference is, of course, that on return from interrupt or
exception we want registers unchanged while on return from syscall we
expect to find return value in a register.  Another thing is that
after syscall we want to advance to the next instruction, while e.g.
a page fault usually wants to repeat the instruction that had triggered
it.  It's _very_ architecture-dependent; in any case, we usually know
very early which userland instruction would we normally have executed
next, so let's consider that knowledge a part of saved state.

For now I'm going to ignore ptrace-related horrors; it's a
separate story (and an ugly one).  In absense of that, the syscall
execution is conceptually simple -
* choose which function to call and what arguments to pass
to it (both derived from saved state; usually that's picked directly
from values left in registers by userland, but there might be more
convoluted cases when e.g. syscall number is encoded as part of
instruction or some arguments are passed on userland stack, etc.)
* do a function call
* shove return value into the saved state
At that point we are ready to resume the execution of userland code.
Usually.  Unless a signal had arrived.  So far it looked pretty much
like a function call - sure, we had been using the kernel stack instead
of the userland one, the calling sequence had been unusual and the call
chain included a nasty glob of assembler glue, but as far as control
flow is concerned, everything looks as if we'd called a function and
returned from it.  It's about to get more complicated, though.

Signal arrival is indicated by TIF_SIGPENDING flag set on the
target process.  If we see it set when we are about to resume the
execution in userland, we find out which signal has it been and what
handler (if any) do we have for it.  For now I'm going to leave the
syscall restart logics aside; we'll get back to it shortly.  Suppose
we had a successful call of open(2) or e.g. just handled a page fault.
We have saved state ready for resuming the userland execution.  It
contains the address of next instruction to execute and, in case of
open(2), has the return value of sys_open() already reflected in it.
Now we find ourselves wanting to have a signal handler executed; it's
a userland function and we can't have that run in kernel mode.
Here's what we normally do:
* create a new userland coroutine context - that's what sigframe
is about.
* encode the saved state and store it in sigframe
* modify the saved state so that the next instruction to execute
would be at the beginning of signal handler, registers would look as if
we were passing the right arguments to that handler and return address
(either in register or in stack frame built for the handler - depends on
ABI) would point to a small piece of code that would do cleanup and
terminate that coroutine (more 

[braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)

2012-12-06 Thread Al Viro
On Thu, Dec 06, 2012 at 11:17:34AM +, James Hogan wrote:

 Agreed, it looks wrong. Looking at the sh version, is there a particular
 reason to only check for -EFAULT and not the other errors that
 do_sigaltstack can return (-EPERM, -EINVAL, and -ENOMEM)?

See commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2

  BTW, what's to stop the syscall restart triggering if you catch a signal
  while in rt_sigreturn(2)?
 
 I'm not sure I understand how that could cause a problem. Could you
 elaborate the sequence of events?
 
 The signal restart is triggered by the return value register, so
 rt_sigreturn would have to return -ERESTART*. This could happen if the
 signal handler overwrote the return value in the sigcontext (which as
 far as I can tell could also happen on ARM), or if the syscall that was
 originally interrupted by the signal has -ERESTARTNOINTR ||
 (-ERESTARTSYS  SA_RESTART), but in that case rt_sigreturn has already
 switched back to the context of the original syscall so that's the right
 thing to do isn't it? I've probably missed something important :-)

[we probably need something along the lines of braindump below in 
somewhere in Documentation/*; comments and improvements are very
welcome - this is just a starting point.  We *do* need some coherent
explanation of signal semantics, judging by how often people step on
the same landmines...]

To understand what's going on with the signals, it's better
to think of crossing the kernel boundary not as of function call and
return from it, but as of coroutines passing control back and forth.

When we enter the kernel mode, we start with saving CPU state.
Usually (and you are going to hate that word well before you read to
the end) it's stored in struct pt_regs, but it might be more complicated.
For our purposes it's better to think of it as abstract saved state,
leaving aside the question of how it's represented.

What happens next depends on why we'd been called - it might
have been an interrupt or exception or it might have been a syscall.
The main difference is, of course, that on return from interrupt or
exception we want registers unchanged while on return from syscall we
expect to find return value in a register.  Another thing is that
after syscall we want to advance to the next instruction, while e.g.
a page fault usually wants to repeat the instruction that had triggered
it.  It's _very_ architecture-dependent; in any case, we usually know
very early which userland instruction would we normally have executed
next, so let's consider that knowledge a part of saved state.

For now I'm going to ignore ptrace-related horrors; it's a
separate story (and an ugly one).  In absense of that, the syscall
execution is conceptually simple -
* choose which function to call and what arguments to pass
to it (both derived from saved state; usually that's picked directly
from values left in registers by userland, but there might be more
convoluted cases when e.g. syscall number is encoded as part of
instruction or some arguments are passed on userland stack, etc.)
* do a function call
* shove return value into the saved state
At that point we are ready to resume the execution of userland code.
Usually.  Unless a signal had arrived.  So far it looked pretty much
like a function call - sure, we had been using the kernel stack instead
of the userland one, the calling sequence had been unusual and the call
chain included a nasty glob of assembler glue, but as far as control
flow is concerned, everything looks as if we'd called a function and
returned from it.  It's about to get more complicated, though.

Signal arrival is indicated by TIF_SIGPENDING flag set on the
target process.  If we see it set when we are about to resume the
execution in userland, we find out which signal has it been and what
handler (if any) do we have for it.  For now I'm going to leave the
syscall restart logics aside; we'll get back to it shortly.  Suppose
we had a successful call of open(2) or e.g. just handled a page fault.
We have saved state ready for resuming the userland execution.  It
contains the address of next instruction to execute and, in case of
open(2), has the return value of sys_open() already reflected in it.
Now we find ourselves wanting to have a signal handler executed; it's
a userland function and we can't have that run in kernel mode.
Here's what we normally do:
* create a new userland coroutine context - that's what sigframe
is about.
* encode the saved state and store it in sigframe
* modify the saved state so that the next instruction to execute
would be at the beginning of signal handler, registers would look as if
we were passing the right arguments to that handler and return address
(either in register or in stack frame built for the handler - depends on
ABI) would point to a small piece of code that would do cleanup and
terminate that coroutine (more on that in a minute).