Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER
Dave Martin writes: > On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote: > > [...] > >> The other thing we should do is to get rid of the stupid padding. >> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is >> completely insane, when it's always just zero in the kernel. > > Agreed, inside the kernel the padding achieves nothing. > >> So put that _pad[] thing inside #ifndef __KERNEL__, and make >> copy_siginfo_to_user() write the padding zeroes when copying to user >> space. The reason for the padding is "future expansion", so we do want >> to tell the user space that it's maybe up to 128 bytes in size, but if >> we don't fill it all, we shouldn't waste time and memory on clearing >> the padding internally. >> >> I'm certainly *hoping* nobody depends on the whole 128 bytes in >> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code >> is negative), but the man-page only says "si_value", and the compat >> function doesn't copy any more than that either, so any user that >> tries to fill in more than si_value is already broken. In fact, it >> might even be worth enforcing that in rt_sigqueueinfo(), just to see >> if anybody plays any games.. > > [...] > > Digression: > > Since we don't traditionally zero the tail-padding in the user sigframe, > is there a reliable way for userspace to detect newly-added fields in > siginfo other than by having an explicit sigaction sa_flags flag to > request them? I imagine something like [1] below from the userspace > perspective. > > On a separate thread, the issue of how to report syndrome information > for SIGSEGV came up [2] (such as whether the faulting instruction was a > read or a write). This information is useful (and used) by things like > userspace sanitisers and qemu. Currently, reporting this to userspace > relies on arch-specific cruft in the sigframe. > > We're committed to maintaining what's already in each arch sigframe, > but it would be preferable to have a portable way of adding information > to siginfo in a generic way. si_code doesn't really work for that, > since si_codes are mutually exclusive: I can't see a way of adding > supplementary information using si_code. > > Anyway, that would be a separate RFC in the future (if ever). So far what I have seen done is ``registers'' added to sigcontext. Which it looks like you have done with esr. Scrubbing information from faults to where the addresses point outside of the userspace mapping makes sense. I think before I would pursue what you are talking about on a generic level I would want to look at the fact that we handle unblockable faults wrong. While unlikely it is possible for someone to send a thread specific signal at just the right time, and have that signal delivered before the synchronous fault. Then we could pass through additional arguments through that new ``generic'' path. Especially what are arguments such as tsk->thread.fault_address and tsk->thread.fault_code. We can do anything we like with a new SA_flag as that allows us to change the format of the sigframe. If we are very careful we can add generic fields after that crazy union anonymous union in the _sigfault case of struct siginfo. The trick would be to find something that would be enough so that people don't need to implement their own instruction decoder to see what is going on. Something that is applicable to every sigfault case not just SIGSEGV. Something that we can and want to implement on multiple architectures. The point being doing something generic can be a lot of work, even if it is worth it in the end. > [1] > > static volatile int have_extflags = 0; > > static void handler(int n, siginfo_t *si, void *uc) > { > /* ... */ > > if (have_extflags) { > /* Check si->si_extflags */ > } else { > /* fallback */ > } > > /* ... */ > } > > int main(void) > { > /* ... */ > > struct sigaction sa; > > /* ... */ > > sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS; > sa.sa_sigaction = handler; > if (!sigaction(SIGSEGV, &sa, NULL)) { > have_extflags = 1; > } else { > sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS; > if (sigaction(SIGSEGV, &sa, NULL)) > goto error; > } > > /* ... */ > } > > [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault > on kernel VA > http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html Eric
Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER
On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote: [...] > The other thing we should do is to get rid of the stupid padding. > Right now "struct siginfo" is pointlessly padded to 128 bytes. That is > completely insane, when it's always just zero in the kernel. Agreed, inside the kernel the padding achieves nothing. > So put that _pad[] thing inside #ifndef __KERNEL__, and make > copy_siginfo_to_user() write the padding zeroes when copying to user > space. The reason for the padding is "future expansion", so we do want > to tell the user space that it's maybe up to 128 bytes in size, but if > we don't fill it all, we shouldn't waste time and memory on clearing > the padding internally. > > I'm certainly *hoping* nobody depends on the whole 128 bytes in > rt_sigqueueinfo(). In theory you can fill it all (as long as si_code > is negative), but the man-page only says "si_value", and the compat > function doesn't copy any more than that either, so any user that > tries to fill in more than si_value is already broken. In fact, it > might even be worth enforcing that in rt_sigqueueinfo(), just to see > if anybody plays any games.. [...] Digression: Since we don't traditionally zero the tail-padding in the user sigframe, is there a reliable way for userspace to detect newly-added fields in siginfo other than by having an explicit sigaction sa_flags flag to request them? I imagine something like [1] below from the userspace perspective. On a separate thread, the issue of how to report syndrome information for SIGSEGV came up [2] (such as whether the faulting instruction was a read or a write). This information is useful (and used) by things like userspace sanitisers and qemu. Currently, reporting this to userspace relies on arch-specific cruft in the sigframe. We're committed to maintaining what's already in each arch sigframe, but it would be preferable to have a portable way of adding information to siginfo in a generic way. si_code doesn't really work for that, since si_codes are mutually exclusive: I can't see a way of adding supplementary information using si_code. Anyway, that would be a separate RFC in the future (if ever). Cheers ---Dave [1] static volatile int have_extflags = 0; static void handler(int n, siginfo_t *si, void *uc) { /* ... */ if (have_extflags) { /* Check si->si_extflags */ } else { /* fallback */ } /* ... */ } int main(void) { /* ... */ struct sigaction sa; /* ... */ sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS; sa.sa_sigaction = handler; if (!sigaction(SIGSEGV, &sa, NULL)) { have_extflags = 1; } else { sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS; if (sigaction(SIGSEGV, &sa, NULL)) goto error; } /* ... */ } [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html
Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER
Linus Torvalds writes: > ( > > On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman > wrote: [snip bit about wanting what is effectively force_sig_fault instead of clear_siginfo everywhere] > The other thing we should do is to get rid of the stupid padding. > Right now "struct siginfo" is pointlessly padded to 128 bytes. That is > completely insane, when it's always just zero in the kernel. > > So put that _pad[] thing inside #ifndef __KERNEL__, and make > copy_siginfo_to_user() write the padding zeroes when copying to user > space. The reason for the padding is "future expansion", so we do want > to tell the user space that it's maybe up to 128 bytes in size, but if > we don't fill it all, we shouldn't waste time and memory on clearing > the padding internally. > > I'm certainly *hoping* nobody depends on the whole 128 bytes in > rt_sigqueueinfo(). In theory you can fill it all (as long as si_code > is negative), but the man-page only says "si_value", and the compat > function doesn't copy any more than that either, so any user that > tries to fill in more than si_value is already broken. In fact, it > might even be worth enforcing that in rt_sigqueueinfo(), just to see > if anybody plays any games.. >From my earlier looking I think this is all doable except detecting if > On x86-64, without the pointless padding, the size of 'struct siginfo' > inside the kernel would be 48 bytes. That's quite a big difference for > something that is often allocated on the kernel stack. >From my earlier looking I can say that I know of no case where signals are injected into the kernel that we need more bytes than the kernel currently provides. The two cases I know of are signal reinjection for checkpoint/restart or something that just uses pid, uid and ptr. Generally that is enough. If we just truncate siginfo to everything except the pad bytes in the kernel there should be no problems. What I don't see how to do is to limit the size of struct siginfo the kernel accepts in a forward compatible way. Something that would fail if for some reason you used more siginfo bytes. Looking at userspace. glibc always memsets siginfo_t to 0. Criu just uses whatever it captured with PTRACE_PEEKSIGINFO, and criu uses unitialized memory for the target of PTRACE_PEEKSIGINFO. If we truncate siginfo in the kernel then we check for a known si_code in which case we are safe to truncate siginfo. If the si_code is unknown then we should check to see if the extra bytes are 0. That should work with everything. Plus it is a natural point to test to see if userspace is using signals that the kernel does not currently support. I will put that in my queue. > So I'm certainly willing to make those kinds of changes, but let's > make them real *improvements* now, ok? Wasn't that the point of all > the cleanups in the end? Definitely. With the strace test case causing people to talk about regressions I was just looking to see if it would make sense to do something minimal for -rc2 so reduce concerns about regressions. Now I am going to focus on getting what I can ready for the next merge window. Eric
Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER
Linus Torvalds writes: > ( > > On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman > wrote: >> >> Would you consider the patchset below for -rc2? > > Ugh. The point of this series is to squash the potential for regressions even from the weird broken code that fills in fields for si_code 0 that do not match SI_USER. The copy_siginfo_to_user32 never handled that so we don't need to worry about that. All of the signals that abuse si_code 0 go through force_sig_info so signalfd can not catch them. Which means that only copy_siginfo_to_user needs to be worried about. Last time I was benchmarking I could not see a difference between copying the entire siginfo and only a small part so I don't see the point of optimizing now. For an actual cleanup I intend to go farther than you are proposing and convert everthing to the set of helper functions I have added to kernel/signal.c force_sig_fault, force_sig_mceerr, force_sig_bnderr, force_sig_pkuerr. When I am done there will be maybe 5 instances of clear_siginfo outside of those helper functions. At which point I won't care if we remove clear_siginfo and just use memset. That is a real improvement that looks something like: "105 files changed, 933 insertions(+), 1698 deletions(-)" That is my end goal with all of this. You complained to me about regressions and you are right with the current handling of FPE_FIXME TRAP_FIXME the code is not bug compatible with old versions of linux, and people have noticed. What this patchset represents is the bare minimum needed to convert copy_siginfo_to_user to a copy_to_user, and remove the possibility of regressions. To that end I need to ensure every struct siginfo in the entire kernel is memset to 0. A structure initializer is absolutely not enough. So I don't mind people thinking clear_siginfo is necessary. To that end clear_siginfo where it does not take the size of struct siginfo as a parameter is much less suceptible to typos, and allows me to ensure every instance of struct siginfo is fully initialized. I fully intend to remove practically every instance of struct siginfo from the architecture specific code, and use the aforementioned helpers. The trouble is that some of the architecture code need refactoring for that to happen. Even your proposed init_siginfo can't be widely used as a common pattern is to fill in siginfo partially in the code with si_code and si_signo being filled in almost last. So in short. I intend to remove most of these clear_siginfo calls when I remove the siginfos later. This series is focused on making copy_siginfo_to_user simple enough we don't need to use siginfo_layout, and thus can be fully backwards compatibile with ourselves and we won't need to worry about regressions. I have aimed to keep this simple enough we can merge this after -rc1 because we don't want regressions. Eric ps. I intend to place this change first in my series wether or not it makes it into -rc2 so that I can be certain we remove any possible regressions in behavior on the buggy architectures. Then I can take my time and ensure the non-trivial changes of refactoring etc are done carefully so I don't introduce other bugs. I need that so I can sleep at night. pps. I can look at some of your other suggestions but cleverness leads to regressions, and if you are going to complain at me harshly when I have been being careful and taking things seriously I am not particularly willing to take unnecessary chances.
Re: [RFC PATCH 0/3] Dealing with the aliases of SI_USER
( On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman wrote: > > Would you consider the patchset below for -rc2? Ugh. I have an irrational dislike of "clear_siginfo()". It's a nasty broken interface, which just mis-spells "memset()", and makes it non-obvious that you could clear it other ways too (eg "kzalloc()" or whatever). And this series brings in a lot of new users. Honestly, both "clear_siginfo()" and "copy_siginfo()" are crazy interfaces. They may have made sense when converting code, but not so much now. If we want to have a function that initializes a siginfo, I think it should _actually_ initialize it, and at least take the two fields that a siginfo has to have to be valid: si_signo and si_code. So I'd rather see a patch that does something like this: - clear_siginfo(info); - info->si_signo = sig; - info->si_errno = 0; - info->si_code = SI_USER; - info->si_pid = 0; - info->si_uid = 0; + init_siginfo(info, sig, SI_USER); which at least makes that function be *worth* something, not just a bad spelling of memset. It's not just the removal of pointless "set to zero", it's the whole concept of "this function actually makes a valid siginfo", which is lacking in the existing function. (Yeah, yeah, si_errno is "generic" too and always part of a siginfo, but nobody cares. It's pretty much always set to zero, it would be stupid to add that interface to the "init_siginfo()" function. So just clearing it is fine, the one or two places that want to set it to some silly value can do it themselves). Then your series would incidentally also: (a) make for fewer lines overall, rather than add lines (b) make it clear that we always initialize si_code, which now *must* be a valid value with all the recent siginfo changes. Hmm? The other thing we should do is to get rid of the stupid padding. Right now "struct siginfo" is pointlessly padded to 128 bytes. That is completely insane, when it's always just zero in the kernel. So put that _pad[] thing inside #ifndef __KERNEL__, and make copy_siginfo_to_user() write the padding zeroes when copying to user space. The reason for the padding is "future expansion", so we do want to tell the user space that it's maybe up to 128 bytes in size, but if we don't fill it all, we shouldn't waste time and memory on clearing the padding internally. I'm certainly *hoping* nobody depends on the whole 128 bytes in rt_sigqueueinfo(). In theory you can fill it all (as long as si_code is negative), but the man-page only says "si_value", and the compat function doesn't copy any more than that either, so any user that tries to fill in more than si_value is already broken. In fact, it might even be worth enforcing that in rt_sigqueueinfo(), just to see if anybody plays any games.. On x86-64, without the pointless padding, the size of 'struct siginfo' inside the kernel would be 48 bytes. That's quite a big difference for something that is often allocated on the kernel stack. So I'm certainly willing to make those kinds of changes, but let's make them real *improvements* now, ok? Wasn't that the point of all the cleanups in the end? Linus
[RFC PATCH 0/3] Dealing with the aliases of SI_USER
Linus, Would you consider the patchset below for -rc2? Dealing with the aliases of SI_USER has been a challenge as we have had a b0rked ABI in some cases since 2.5. So far no one except myself has suggested that changing the si_code of from 0 to something else for those problematic aliases of SI_USER is going to be a problem. So it looks like just fixing the issue is a real possibility. Fixing the cases that do kill(SIGFPE, ...) because at least test cases care seems important. The best fixes to backport appear to be the real architecture fixes that remove the aliases for SI_USER as those are deep fixes that fundamentally fix the problems, and are also very small changes. I am not yet brave enough to merge architectural fixes like that without arch maintainers buy-in. Getting at least an ack if nothing else takes a little bit of time. Still we have a arm fix upthread and David Miller has given his nod to a sparc fix that uses FPE_FLTUNK. So it appears real architecture fixes are progressing. Further I have looked and that leaves only powerpc, parisc, ia64, and alpha. The new si_code FPE_FLTUNK appears to address most of those, and there is an untested patch for parisc. So real progress appears possible. The generic code can do better, and that is what this rfc patchset is about. It ensures siginfo is fully initialized and uses copy_to_user to copy siginfo to userspace. This takes siginfo_layout out of the picture and so for non-compat non-signalfd siginfos the status quo returns to what it was before I introduced siginfo_layout (AKA regressions go bye-bye). I believe given the issues these changes are a candiate for -rc2. Otherwise I will keep these changes for the next merge window. Eric W. Biederman (3): signal: Ensure every siginfo we send has all bits initialized signal: Reduce copy_siginfo_to_user to just copy_to_user signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout arch/alpha/kernel/osf_sys.c | 1 + arch/alpha/kernel/signal.c| 2 + arch/alpha/kernel/traps.c | 5 ++ arch/alpha/mm/fault.c | 2 + arch/arc/mm/fault.c | 2 + arch/arm/kernel/ptrace.c | 1 + arch/arm/kernel/swp_emulate.c | 1 + arch/arm/kernel/traps.c | 5 ++ arch/arm/mm/alignment.c | 1 + arch/arm/mm/fault.c | 5 ++ arch/arm/vfp/vfpmodule.c | 3 +- arch/arm64/kernel/fpsimd.c| 2 +- arch/arm64/kernel/sys_compat.c| 1 + arch/arm64/kernel/traps.c | 1 + arch/arm64/mm/fault.c | 18 -- arch/c6x/kernel/traps.c | 1 + arch/hexagon/kernel/traps.c | 1 + arch/hexagon/mm/vm_fault.c| 1 + arch/ia64/kernel/brl_emu.c| 1 + arch/ia64/kernel/signal.c | 2 + arch/ia64/kernel/traps.c | 27 - arch/ia64/kernel/unaligned.c | 1 + arch/ia64/mm/fault.c | 4 +- arch/m68k/kernel/traps.c | 2 + arch/microblaze/kernel/exceptions.c | 1 + arch/microblaze/mm/fault.c| 4 +- arch/mips/mm/fault.c | 1 + arch/nds32/kernel/traps.c | 6 +- arch/nds32/mm/fault.c | 1 + arch/nios2/kernel/traps.c | 1 + arch/openrisc/kernel/traps.c | 5 +- arch/openrisc/mm/fault.c | 1 + arch/parisc/kernel/ptrace.c | 1 + arch/parisc/kernel/traps.c| 2 + arch/parisc/kernel/unaligned.c| 1 + arch/parisc/math-emu/driver.c | 1 + arch/parisc/mm/fault.c| 1 + arch/powerpc/kernel/process.c | 1 + arch/powerpc/kernel/traps.c | 3 +- arch/powerpc/mm/fault.c | 1 + arch/powerpc/platforms/cell/spufs/fault.c | 2 +- arch/riscv/kernel/traps.c | 1 + arch/s390/kernel/traps.c | 5 +- arch/s390/mm/fault.c | 2 + arch/sh/kernel/hw_breakpoint.c| 1 + arch/sh/kernel/traps_32.c | 2 + arch/sh/math-emu/math.c | 1 + arch/sh/mm/fault.c| 1 + arch/sparc/kernel/process_64.c| 1 + arch/sparc/kernel/sys_sparc_32.c | 1 + arch/sparc/kernel/traps_32.c | 10 arch/sparc/kernel/traps_64.c | 14 + arch/sparc/kernel/unaligned_32.c | 1 + arch/sparc/mm/fault_32.c | 1 + arch/sparc/mm/fault_64.c | 1 + arch/um/kernel/trap.c | 2 + arch/unicore32/kernel/fpu-ucf64.c | 2 +- arch/unicore32/mm/fault.c | 3 + arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/ptrace.c | 2 +