Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-19 Thread Eric W. Biederman
Dave Martin  writes:

> On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote:
>
> [...]
>
>> The other thing we should do is to get rid of the stupid padding.
>> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
>> completely insane, when it's always just zero in the kernel.
>
> Agreed, inside the kernel the padding achieves nothing.
>
>> So put that _pad[] thing inside #ifndef __KERNEL__, and make
>> copy_siginfo_to_user() write the padding zeroes when copying to user
>> space. The reason for the padding is "future expansion", so we do want
>> to tell the user space that it's maybe up to 128 bytes in size, but if
>> we don't fill it all, we shouldn't waste time and memory on clearing
>> the padding internally.
>> 
>> I'm certainly *hoping* nobody depends on the whole 128 bytes in
>> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
>> is negative), but the man-page only says "si_value", and the compat
>> function doesn't copy any more than that either, so any user that
>> tries to fill in more than si_value is already broken. In fact, it
>> might even be worth enforcing that in rt_sigqueueinfo(), just to see
>> if anybody plays any games..
>
> [...]
>
> Digression:
>
> Since we don't traditionally zero the tail-padding in the user sigframe,
> is there a reliable way for userspace to detect newly-added fields in
> siginfo other than by having an explicit sigaction sa_flags flag to
> request them?  I imagine something like [1] below from the userspace
> perspective.
>
> On a separate thread, the issue of how to report syndrome information
> for SIGSEGV came up [2] (such as whether the faulting instruction was a
> read or a write).  This information is useful (and used) by things like
> userspace sanitisers and qemu.  Currently, reporting this to userspace
> relies on arch-specific cruft in the sigframe.
>
> We're committed to maintaining what's already in each arch sigframe,
> but it would be preferable to have a portable way of adding information
> to siginfo in a generic way.  si_code doesn't really work for that,
> since si_codes are mutually exclusive: I can't see a way of adding
> supplementary information using si_code.
>
> Anyway, that would be a separate RFC in the future (if ever).

So far what I have seen done is ``registers'' added to sigcontext.
Which it looks like you have done with esr.

Scrubbing information from faults to where the addresses point
outside of the userspace mapping makes sense.

I think before I would pursue what you are talking about on a generic
level I would want to look at the fact that we handle unblockable faults
wrong.  While unlikely it is possible for someone to send a thread
specific signal at just the right time, and have that signal
delivered before the synchronous fault.

Then we could pass through additional arguments through that new
``generic'' path.  Especially what are arguments such as
tsk->thread.fault_address and tsk->thread.fault_code.

We can do anything we like with a new SA_flag as that allows us to
change the format of the sigframe.

If we are very careful we can add generic fields after that crazy union
anonymous union in the _sigfault case of struct siginfo.

The trick would be to find something that would be enough so that people
don't need to implement their own instruction decoder to see what is
going on.  Something that is applicable to every sigfault case not just
SIGSEGV.  Something that we can and want to implement on multiple
architectures.

The point being doing something generic can be a lot of work, even if it
is worth it in the end.


> [1]
>
> static volatile int have_extflags = 0;
>
> static void handler(int n, siginfo_t *si, void *uc)
> {
>   /* ... */
>
>   if (have_extflags) {
>   /* Check si->si_extflags */
>   } else {
>   /* fallback */
>   }
>
>   /* ... */
> }
>
> int main(void)
> {
>   /* ... */
>
>   struct sigaction sa;
>
>   /* ... */
>
>   sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS;
>   sa.sa_sigaction = handler;
>   if (!sigaction(SIGSEGV, &sa, NULL)) {
>   have_extflags = 1;
>   } else {
>   sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS;
>   if (sigaction(SIGSEGV, &sa, NULL))
>   goto error;
>   }
>
>   /* ... */
> }
>
> [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault 
> on kernel VA
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html

Eric


Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-19 Thread Dave Martin
On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote:

[...]

> The other thing we should do is to get rid of the stupid padding.
> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
> completely insane, when it's always just zero in the kernel.

Agreed, inside the kernel the padding achieves nothing.

> So put that _pad[] thing inside #ifndef __KERNEL__, and make
> copy_siginfo_to_user() write the padding zeroes when copying to user
> space. The reason for the padding is "future expansion", so we do want
> to tell the user space that it's maybe up to 128 bytes in size, but if
> we don't fill it all, we shouldn't waste time and memory on clearing
> the padding internally.
> 
> I'm certainly *hoping* nobody depends on the whole 128 bytes in
> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
> is negative), but the man-page only says "si_value", and the compat
> function doesn't copy any more than that either, so any user that
> tries to fill in more than si_value is already broken. In fact, it
> might even be worth enforcing that in rt_sigqueueinfo(), just to see
> if anybody plays any games..

[...]

Digression:

Since we don't traditionally zero the tail-padding in the user sigframe,
is there a reliable way for userspace to detect newly-added fields in
siginfo other than by having an explicit sigaction sa_flags flag to
request them?  I imagine something like [1] below from the userspace
perspective.

On a separate thread, the issue of how to report syndrome information
for SIGSEGV came up [2] (such as whether the faulting instruction was a
read or a write).  This information is useful (and used) by things like
userspace sanitisers and qemu.  Currently, reporting this to userspace
relies on arch-specific cruft in the sigframe.

We're committed to maintaining what's already in each arch sigframe,
but it would be preferable to have a portable way of adding information
to siginfo in a generic way.  si_code doesn't really work for that,
since si_codes are mutually exclusive: I can't see a way of adding
supplementary information using si_code.

Anyway, that would be a separate RFC in the future (if ever).

Cheers
---Dave


[1]

static volatile int have_extflags = 0;

static void handler(int n, siginfo_t *si, void *uc)
{
/* ... */

if (have_extflags) {
/* Check si->si_extflags */
} else {
/* fallback */
}

/* ... */
}

int main(void)
{
/* ... */

struct sigaction sa;

/* ... */

sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS;
sa.sa_sigaction = handler;
if (!sigaction(SIGSEGV, &sa, NULL)) {
have_extflags = 1;
} else {
sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS;
if (sigaction(SIGSEGV, &sa, NULL))
goto error;
}

/* ... */
}

[2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on 
kernel VA
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html


Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-18 Thread Eric W. Biederman
Linus Torvalds  writes:

> (
>
> On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
>  wrote:

[snip bit about wanting what is effectively force_sig_fault instead of
clear_siginfo everywhere]

> The other thing we should do is to get rid of the stupid padding.
> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
> completely insane, when it's always just zero in the kernel.
>
> So put that _pad[] thing inside #ifndef __KERNEL__, and make
> copy_siginfo_to_user() write the padding zeroes when copying to user
> space. The reason for the padding is "future expansion", so we do want
> to tell the user space that it's maybe up to 128 bytes in size, but if
> we don't fill it all, we shouldn't waste time and memory on clearing
> the padding internally.
>
> I'm certainly *hoping* nobody depends on the whole 128 bytes in
> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
> is negative), but the man-page only says "si_value", and the compat
> function doesn't copy any more than that either, so any user that
> tries to fill in more than si_value is already broken. In fact, it
> might even be worth enforcing that in rt_sigqueueinfo(), just to see
> if anybody plays any games..


>From my earlier looking I think this is all doable except detecting
if

> On x86-64, without the pointless padding, the size of 'struct siginfo'
> inside the kernel would be 48 bytes. That's quite a big difference for
> something that is often allocated on the kernel stack.

>From my earlier looking I can say that I know of no case where signals
are injected into the kernel that we need more bytes than the kernel
currently provides.

The two cases I know of are signal reinjection for checkpoint/restart
or something that just uses pid, uid and ptr.   Generally that is
enough.

If we just truncate siginfo to everything except the pad bytes in the
kernel there should be no problems.  What I don't see how to do is to
limit the size of struct siginfo the kernel accepts in a forward
compatible way.  Something that would fail if for some reason you used
more siginfo bytes.

Looking at userspace.  glibc always memsets siginfo_t to 0.
Criu just uses whatever it captured with PTRACE_PEEKSIGINFO,
and criu uses unitialized memory for the target of PTRACE_PEEKSIGINFO.

If we truncate siginfo in the kernel then we check for a known si_code
in which case we are safe to truncate siginfo.  If the si_code is
unknown then we should check to see if the extra bytes are 0.  That
should work with everything.  Plus it is a natural point to test to
see if userspace is using signals that the kernel does not currently
support.

I will put that in my queue.

> So I'm certainly willing to make those kinds of changes, but let's
> make them real *improvements* now, ok? Wasn't that the point of all
> the cleanups in the end?

Definitely.  With the strace test case causing people to talk about
regressions I was just looking to see if it would make sense to do
something minimal for -rc2 so reduce concerns about regressions.

Now I am going to focus on getting what I can ready for the next merge
window.

Eric






Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-15 Thread Eric W. Biederman
Linus Torvalds  writes:

> (
>
> On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
>  wrote:
>>
>> Would you consider the patchset below for -rc2?
>
> Ugh.

The point of this series is to squash the potential for regressions even
from the weird broken code that fills in fields for si_code 0 that do
not match SI_USER.

The copy_siginfo_to_user32 never handled that so we don't need to worry
about that.  All of the signals that abuse si_code 0 go through
force_sig_info so signalfd can not catch them.  Which means that only
copy_siginfo_to_user needs to be worried about.

Last time I was benchmarking I could not see a difference between
copying the entire siginfo and only a small part so I don't see
the point of optimizing now.

For an actual cleanup I intend to go farther than you are proposing and
convert everthing to the set of helper functions I have added to
kernel/signal.c force_sig_fault, force_sig_mceerr, force_sig_bnderr,
force_sig_pkuerr.

When I am done there will be maybe 5 instances of clear_siginfo outside
of those helper functions.  At which point I won't care if we remove
clear_siginfo and just use memset.  That is a real improvement
that looks something like: "105 files changed, 933 insertions(+), 1698 
deletions(-)"
That is my end goal with all of this.

You complained to me about regressions and you are right with the
current handling of FPE_FIXME TRAP_FIXME the code is not bug compatible
with old versions of linux, and people have noticed.

What this patchset represents is the bare minimum needed to convert
copy_siginfo_to_user to a copy_to_user, and remove the possibility
of regressions.

To that end I need to ensure every struct siginfo in the entire
kernel is memset to 0.  A structure initializer is absolutely
not enough.  So I don't mind people thinking clear_siginfo is necessary.


To that end clear_siginfo where it does not take the size of
struct siginfo as a parameter is much less suceptible to typos,
and allows me to ensure every instance of struct siginfo is fully
initialized.

I fully intend to remove practically every instance of struct siginfo
from the architecture specific code, and use the aforementioned helpers.
The trouble is that some of the architecture code need refactoring
for that to happen.  Even your proposed init_siginfo can't be widely
used as a common pattern is to fill in siginfo partially in the
code with si_code and si_signo being filled in almost last.

So in short.  I intend to remove most of these clear_siginfo calls when
I remove the siginfos later.  This series is focused on making
copy_siginfo_to_user simple enough we don't need to use siginfo_layout,
and thus can be fully backwards compatibile with ourselves and we won't
need to worry about regressions.  I have aimed to keep this simple
enough we can merge this after -rc1 because we don't want regressions.

Eric

ps.  I intend to place this change first in my series wether or not it makes
it into -rc2 so that I can be certain we remove any possible regressions
in behavior on the buggy architectures.  Then I can take my time and
ensure the non-trivial changes of refactoring etc are done carefully so
I don't introduce other bugs.  I need that so I can sleep at night.

pps.  I can look at some of your other suggestions but cleverness leads
to regressions, and if you are going to complain at me harshly when I
have been being careful and taking things seriously I am not
particularly willing to take unnecessary chances.


Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-15 Thread Linus Torvalds
(

On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
 wrote:
>
> Would you consider the patchset below for -rc2?

Ugh.

I have an irrational dislike of "clear_siginfo()". It's a nasty broken
interface, which just mis-spells "memset()", and makes it non-obvious
that you could clear it other ways too (eg "kzalloc()" or whatever).

And this series brings in a lot of new users.

Honestly, both "clear_siginfo()" and "copy_siginfo()" are crazy
interfaces. They may have made sense when converting code, but not so
much now.

If we want to have a function that initializes a siginfo, I think it
should _actually_ initialize it, and at least take the two fields that
a siginfo has to have to be valid: si_signo and si_code.

So I'd rather see a patch that does something like this:

  - clear_siginfo(info);
  - info->si_signo = sig;
  - info->si_errno = 0;
  - info->si_code = SI_USER;
  - info->si_pid = 0;
  - info->si_uid = 0;
  + init_siginfo(info, sig, SI_USER);

which at least makes that function be *worth* something, not just a
bad spelling of memset. It's not just the removal of pointless "set to
zero", it's the whole concept of "this function actually makes a valid
siginfo", which is lacking in the existing function.

(Yeah, yeah, si_errno is "generic" too and always part of a siginfo,
but nobody cares. It's pretty much always set to zero, it would be
stupid to add that interface to the "init_siginfo()" function. So just
clearing it is fine, the one or two places that want to set it to some
silly value can do it themselves).

Then your series would incidentally also:

 (a) make for fewer lines overall, rather than add lines

 (b) make it clear that we always initialize si_code, which now *must*
be a valid value with all the recent siginfo changes.

Hmm?

The other thing we should do is to get rid of the stupid padding.
Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
completely insane, when it's always just zero in the kernel.

So put that _pad[] thing inside #ifndef __KERNEL__, and make
copy_siginfo_to_user() write the padding zeroes when copying to user
space. The reason for the padding is "future expansion", so we do want
to tell the user space that it's maybe up to 128 bytes in size, but if
we don't fill it all, we shouldn't waste time and memory on clearing
the padding internally.

I'm certainly *hoping* nobody depends on the whole 128 bytes in
rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
is negative), but the man-page only says "si_value", and the compat
function doesn't copy any more than that either, so any user that
tries to fill in more than si_value is already broken. In fact, it
might even be worth enforcing that in rt_sigqueueinfo(), just to see
if anybody plays any games..

On x86-64, without the pointless padding, the size of 'struct siginfo'
inside the kernel would be 48 bytes. That's quite a big difference for
something that is often allocated on the kernel stack.

So I'm certainly willing to make those kinds of changes, but let's
make them real *improvements* now, ok? Wasn't that the point of all
the cleanups in the end?

   Linus


[RFC PATCH 0/3] Dealing with the aliases of SI_USER

2018-04-15 Thread Eric W. Biederman

Linus,

Would you consider the patchset below for -rc2?

Dealing with the aliases of SI_USER has been a challenge as we have had
a b0rked ABI in some cases since 2.5.

So far no one except myself has suggested that changing the si_code of
from 0 to something else for those problematic aliases of SI_USER is
going to be a problem.  So it looks like just fixing the issue is a real
possibility.

Fixing the cases that do kill(SIGFPE, ...) because at least test cases
care seems important.

The best fixes to backport appear to be the real architecture fixes that
remove the aliases for SI_USER as those are deep fixes that
fundamentally fix the problems, and are also very small changes.

I am not yet brave enough to merge architectural fixes like that without
arch maintainers buy-in.   Getting at least an ack if nothing else takes
a little bit of time.

Still we have a arm fix upthread and David Miller has given his nod
to a sparc fix that uses FPE_FLTUNK.  So it appears real architecture
fixes are progressing.  Further I have looked and that leaves only
powerpc, parisc, ia64, and alpha.   The new si_code FPE_FLTUNK appears
to address most of those, and there is an untested patch for parisc.

So real progress appears possible.

The generic code can do better, and that is what this rfc patchset is
about.  It ensures siginfo is fully initialized and uses copy_to_user
to copy siginfo to userspace.  This takes siginfo_layout out of
the picture and so for non-compat non-signalfd siginfos the status
quo returns to what it was before I introduced siginfo_layout (AKA
regressions go bye-bye).

I believe given the issues these changes are a candiate for -rc2.
Otherwise I will keep these changes for the next merge window.

Eric W. Biederman (3):
  signal: Ensure every siginfo we send has all bits initialized
  signal: Reduce copy_siginfo_to_user to just copy_to_user
  signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout

 arch/alpha/kernel/osf_sys.c   |  1 +
 arch/alpha/kernel/signal.c|  2 +
 arch/alpha/kernel/traps.c |  5 ++
 arch/alpha/mm/fault.c |  2 +
 arch/arc/mm/fault.c   |  2 +
 arch/arm/kernel/ptrace.c  |  1 +
 arch/arm/kernel/swp_emulate.c |  1 +
 arch/arm/kernel/traps.c   |  5 ++
 arch/arm/mm/alignment.c   |  1 +
 arch/arm/mm/fault.c   |  5 ++
 arch/arm/vfp/vfpmodule.c  |  3 +-
 arch/arm64/kernel/fpsimd.c|  2 +-
 arch/arm64/kernel/sys_compat.c|  1 +
 arch/arm64/kernel/traps.c |  1 +
 arch/arm64/mm/fault.c | 18 --
 arch/c6x/kernel/traps.c   |  1 +
 arch/hexagon/kernel/traps.c   |  1 +
 arch/hexagon/mm/vm_fault.c|  1 +
 arch/ia64/kernel/brl_emu.c|  1 +
 arch/ia64/kernel/signal.c |  2 +
 arch/ia64/kernel/traps.c  | 27 -
 arch/ia64/kernel/unaligned.c  |  1 +
 arch/ia64/mm/fault.c  |  4 +-
 arch/m68k/kernel/traps.c  |  2 +
 arch/microblaze/kernel/exceptions.c   |  1 +
 arch/microblaze/mm/fault.c|  4 +-
 arch/mips/mm/fault.c  |  1 +
 arch/nds32/kernel/traps.c |  6 +-
 arch/nds32/mm/fault.c |  1 +
 arch/nios2/kernel/traps.c |  1 +
 arch/openrisc/kernel/traps.c  |  5 +-
 arch/openrisc/mm/fault.c  |  1 +
 arch/parisc/kernel/ptrace.c   |  1 +
 arch/parisc/kernel/traps.c|  2 +
 arch/parisc/kernel/unaligned.c|  1 +
 arch/parisc/math-emu/driver.c |  1 +
 arch/parisc/mm/fault.c|  1 +
 arch/powerpc/kernel/process.c |  1 +
 arch/powerpc/kernel/traps.c   |  3 +-
 arch/powerpc/mm/fault.c   |  1 +
 arch/powerpc/platforms/cell/spufs/fault.c |  2 +-
 arch/riscv/kernel/traps.c |  1 +
 arch/s390/kernel/traps.c  |  5 +-
 arch/s390/mm/fault.c  |  2 +
 arch/sh/kernel/hw_breakpoint.c|  1 +
 arch/sh/kernel/traps_32.c |  2 +
 arch/sh/math-emu/math.c   |  1 +
 arch/sh/mm/fault.c|  1 +
 arch/sparc/kernel/process_64.c|  1 +
 arch/sparc/kernel/sys_sparc_32.c  |  1 +
 arch/sparc/kernel/traps_32.c  | 10 
 arch/sparc/kernel/traps_64.c  | 14 +
 arch/sparc/kernel/unaligned_32.c  |  1 +
 arch/sparc/mm/fault_32.c  |  1 +
 arch/sparc/mm/fault_64.c  |  1 +
 arch/um/kernel/trap.c |  2 +
 arch/unicore32/kernel/fpu-ucf64.c |  2 +-
 arch/unicore32/mm/fault.c |  3 +
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/kernel/ptrace.c  |  2 +