Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Mon, Sep 14, 2020 at 2:14 PM Dave Hansen wrote: > > On 9/14/20 11:31 AM, Andy Lutomirski wrote: > > No matter what we do, the effects of calling vfork() are going to be a > > bit odd with SHSTK enabled. I suppose we could disallow this, but > > that seems likely to cause its own issues. > > What's odd about it? If you're a vfork()'d child, you can't touch the > stack at all, right? If you do, you or your parent will probably die a > horrible death. > An evil program could vfork(), have the child do a bunch of returns and a bunch of calls, and exit. The net effect would be to change the parent's shadow stack contents. In a sufficiently strict model, this is potentially problematic. The question is: how much do we want to protect userspace from itself? --Andy
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/16/2020 6:52 AM, Andy Lutomirski wrote: On Mon, Sep 14, 2020 at 2:14 PM Dave Hansen wrote: On 9/14/20 11:31 AM, Andy Lutomirski wrote: No matter what we do, the effects of calling vfork() are going to be a bit odd with SHSTK enabled. I suppose we could disallow this, but that seems likely to cause its own issues. What's odd about it? If you're a vfork()'d child, you can't touch the stack at all, right? If you do, you or your parent will probably die a horrible death. An evil program could vfork(), have the child do a bunch of returns and a bunch of calls, and exit. The net effect would be to change the parent's shadow stack contents. In a sufficiently strict model, this is potentially problematic. When a vfork child returns, its parent's shadow stack pointer is where it was before the child starts. To move the shadow stack pointer and re-use the content left by the child, the parent needs to use CALL, RET, INCSSP, or RSTORSSP. This seems to be difficult. The question is: how much do we want to protect userspace from itself? > If any issue comes up, people can always find ways to counter it. --Andy
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/15/2020 12:24 PM, Dave Hansen wrote: On 9/15/20 12:08 PM, Yu-cheng Yu wrote: On Mon, 2020-09-14 at 17:12 -0700, Yu, Yu-cheng wrote: On 9/14/2020 7:50 AM, Dave Hansen wrote: On 9/11/20 3:59 PM, Yu-cheng Yu wrote: ... Here are the changes if we take the mprotect(PROT_SHSTK) approach. Any comments/suggestions? I still don't like it. :) I'll also be much happier when there's a proper changelog to accompany this which also spells out the alternatives any why they suck so much. [...] I revised it. If this turns out needing more work/discussion, we can split it out from the shadow stack series. Where does that leave things? You only get shadow stacks for single-threaded apps which have the ELF bits set? As long as the system supports shadow stack, any application can mmap()/mprotect() a shadow stack. A pthread can allocate a shadow stack too. However, only shadow stack-enabled programs can activate/use the shadow stack.
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/15/20 12:08 PM, Yu-cheng Yu wrote: > On Mon, 2020-09-14 at 17:12 -0700, Yu, Yu-cheng wrote: >> On 9/14/2020 7:50 AM, Dave Hansen wrote: >>> On 9/11/20 3:59 PM, Yu-cheng Yu wrote: >>> ... Here are the changes if we take the mprotect(PROT_SHSTK) approach. Any comments/suggestions? >>> I still don't like it. :) >>> >>> I'll also be much happier when there's a proper changelog to accompany >>> this which also spells out the alternatives any why they suck so much. > [...] > > I revised it. If this turns out needing more work/discussion, we can split it > out from the shadow stack series. Where does that leave things? You only get shadow stacks for single-threaded apps which have the ELF bits set?
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Mon, 2020-09-14 at 17:12 -0700, Yu, Yu-cheng wrote: > On 9/14/2020 7:50 AM, Dave Hansen wrote: > > On 9/11/20 3:59 PM, Yu-cheng Yu wrote: > > ... > > > Here are the changes if we take the mprotect(PROT_SHSTK) approach. > > > Any comments/suggestions? > > > > I still don't like it. :) > > > > I'll also be much happier when there's a proper changelog to accompany > > this which also spells out the alternatives any why they suck so much. [...] I revised it. If this turns out needing more work/discussion, we can split it out from the shadow stack series. Thanks, Yu-cheng -- >From 114f3108207e3eb15fe13b3ff43d1ac273b5072a Mon Sep 17 00:00:00 2001 From: Yu-cheng Yu Date: Thu, 10 Sep 2020 16:41:30 -0700 Subject: [PATCH] mm: Introduce PROT_SHSTK for shadow stack There are three possible options to create a shadow stack allocation API: an arch_prctl, a new syscall, or adding PROT_SHSTK to mmap()/mprotect(). Each has its advantages and compromises. An arch_prctl() is the least intrusive. However, the existing x86 arch_prctl() takes only two parameters. Multiple parameters must be passed in a memory buffer. There is a proposal to pass more parameters in registers [1], but no active discussion on that. A new syscall minimizes compatibility issues and offers an extensible frame work to other architectures, but this will likely result in some overlap of mmap()/mprotect(). The introduction of PROT_SHSTK to mmap()/mprotect() takes advantage of existing APIs. The x86-specific PROT_SHSTK is translated to VM_SHSTK, and a shadow stack mapping is created without reinventing the wheel. There are potential pitfalls though. The most obvious one would be using this as a bypass to shadow stack protection. However, the attacker would have to get to the syscall first. Since arch_calc_vm_prot_bits() is modified, I have moved arch_vm_get_page _prot() and arch_calc_vm_prot_bits() to x86/include/asm/mman.h. This will be more consistent with other architectures. [1] https://lore.kernel.org/lkml/20200828121624.108243-1-hjl.to...@gmail.com/ Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/mman.h | 63 arch/x86/include/uapi/asm/mman.h | 28 ++ mm/mprotect.c| 10 + 3 files changed, 77 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/mman.h diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h new file mode 100644 index ..2f19e429c9e2 --- /dev/null +++ b/arch/x86/include/asm/mman.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MMAN_H +#define _ASM_X86_MMAN_H + +#include +#include + +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS +/* + * Take the 4 protection key bits out of the vma->vm_flags + * value and turn them in to the bits that we can put in + * to a pte. + * + * Only override these if Protection Keys are available + * (which is only on 64-bit). + */ +#define arch_vm_get_page_prot(vm_flags)__pgprot( \ + ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \ + ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \ + ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ + ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) + +#define pkey_vm_prot_bits(prot, key) ( \ + ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ + ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ + ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ + ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) +#else +#define pkey_vm_prot_bits(prot, key) (0) +#endif + +static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, + unsigned long pkey) +{ + unsigned long vm_prot_bits = pkey_vm_prot_bits(prot, pkey); + + vm_prot_bits |= ((prot & PROT_SHSTK) ? VM_SHSTK : 0); + return vm_prot_bits; +} +#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) + +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) +{ + unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM; + + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) && + static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) { + supported |= PROT_SHSTK; + + /* +* A shadow stack mapping is indirectly writable by only +* the CALL and WRUSS instructions, but not other write +* instructions). PROT_SHSTK and PROT_WRITE are mutually +* exclusive. +*/ + supported &= ~PROT_WRITE; + } + + return (prot & ~supported) == 0; +} +#define arch_validate_prot arch_validate_prot + +#endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index d4a8d0424bfb..39bb7db344a6 100644 --- a/arch/x86/include/uapi/asm/mman.h +++
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/14/20 11:31 AM, Andy Lutomirski wrote: > No matter what we do, the effects of calling vfork() are going to be a > bit odd with SHSTK enabled. I suppose we could disallow this, but > that seems likely to cause its own issues. What's odd about it? If you're a vfork()'d child, you can't touch the stack at all, right? If you do, you or your parent will probably die a horrible death. The extra shadow stacks sanity checks means we'll probably see shadow stack exceptions instead of the slightly more chaotic death without them.
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/14/2020 11:31 AM, Andy Lutomirski wrote: On Sep 14, 2020, at 7:50 AM, Dave Hansen wrote: On 9/11/20 3:59 PM, Yu-cheng Yu wrote: ... Here are the changes if we take the mprotect(PROT_SHSTK) approach. Any comments/suggestions? I still don't like it. :) I'll also be much happier when there's a proper changelog to accompany this which also spells out the alternatives any why they suck so much. Let’s take a step back here. Ignoring the precise API, what exactly is a shadow stack from the perspective of a Linux user program? The simplest answer is that it’s just memory that happens to have certain protections. This enables all kinds of shenanigans. A program could map a memfd twice, once as shadow stack and once as non-shadow-stack, and change its control flow. Similarly, a program could mprotect its shadow stack, modify it, and mprotect it back. In What if we do the following: - If the mapping has VM_SHARED, it cannot be turned to shadow stack. Shadow stack cannot be shared anyway. - Only allow an anonymous mapping to be converted to shadow stack, but not the other way. some threat models, though could be seen as a WRSS bypass. (Although if an attacker can coerce a process to call mprotect(), the game is likely mostly over anyway.) But we could be more restrictive, or perhaps we could allow user code to opt into more restrictions. For example, we could have shadow stacks be special memory that cannot be written from usermode by any means other than ptrace() and friends, WRSS, and actual shadow stack usage. What is the goal? There primary goal is to allocate/mmap a shadow stack from user space. No matter what we do, the effects of calling vfork() are going to be a bit odd with SHSTK enabled. I suppose we could disallow this, but that seems likely to cause its own issues. Do you mean vfork() has issues with call/return? That is taken care of in GLIBC. Or do you mean it has issues with mprotect(PROT_SHSTK)?
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
> On Sep 14, 2020, at 7:50 AM, Dave Hansen wrote: > > On 9/11/20 3:59 PM, Yu-cheng Yu wrote: > ... >> Here are the changes if we take the mprotect(PROT_SHSTK) approach. >> Any comments/suggestions? > > I still don't like it. :) > > I'll also be much happier when there's a proper changelog to accompany > this which also spells out the alternatives any why they suck so much. > Let’s take a step back here. Ignoring the precise API, what exactly is a shadow stack from the perspective of a Linux user program? The simplest answer is that it’s just memory that happens to have certain protections. This enables all kinds of shenanigans. A program could map a memfd twice, once as shadow stack and once as non-shadow-stack, and change its control flow. Similarly, a program could mprotect its shadow stack, modify it, and mprotect it back. In some threat models, though could be seen as a WRSS bypass. (Although if an attacker can coerce a process to call mprotect(), the game is likely mostly over anyway.) But we could be more restrictive, or perhaps we could allow user code to opt into more restrictions. For example, we could have shadow stacks be special memory that cannot be written from usermode by any means other than ptrace() and friends, WRSS, and actual shadow stack usage. What is the goal? No matter what we do, the effects of calling vfork() are going to be a bit odd with SHSTK enabled. I suppose we could disallow this, but that seems likely to cause its own issues.
Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/11/20 3:59 PM, Yu-cheng Yu wrote: ... > Here are the changes if we take the mprotect(PROT_SHSTK) approach. > Any comments/suggestions? I still don't like it. :) I'll also be much happier when there's a proper changelog to accompany this which also spells out the alternatives any why they suck so much. > diff --git a/arch/x86/include/uapi/asm/mman.h > b/arch/x86/include/uapi/asm/mman.h > index d4a8d0424bfb..024f006fcfe8 100644 > --- a/arch/x86/include/uapi/asm/mman.h > +++ b/arch/x86/include/uapi/asm/mman.h > @@ -4,6 +4,8 @@ > > #define MAP_32BIT0x40/* only give out 32bit addresses */ > > +#define PROT_SHSTK 0x10/* shadow stack pages */ > + > #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS > /* > * Take the 4 protection key bits out of the vma->vm_flags > @@ -19,13 +21,35 @@ > ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ > ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) > > -#define arch_calc_vm_prot_bits(prot, key) ( \ > +#define pkey_vm_prot_bits(prot, key) ( \ > ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ > ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ > ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ > ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) > +#else > +#define pkey_vm_prot_bits(prot, key) > #endif My inner compiler doesn't think this will compile: ( | shstk_vm_prot_bits(prot)) > +#define shstk_vm_prot_bits(prot) ( \ > + (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) ? \ > + VM_SHSTK : 0) Why do you need to filter PROT_SHSTK twice. Won't the prot passed in here be filtered by arch_validate_prot()? > +#define arch_calc_vm_prot_bits(prot, key) \ > + (pkey_vm_prot_bits(prot, key) | shstk_vm_prot_bits(prot)) > + IMNHO, this is eminently more readable if you do: #define arch_calc_vm_prot_bits(prot, key) \ (shstk_vm_prot_bits(prot)) \ pkey_vm_prot_bits(prot, key)) BTW, can these be static inlines? I forget if I had a good reason for making them #defines. > +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) > +{ > + unsigned long supported = PROT_READ | PROT_EXEC | PROT_SEM; > + > + if (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) > + supported |= PROT_SHSTK; > + else > + supported |= PROT_WRITE; I generally like to make the common case dirt simple to understand. That would probably be: unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM; if (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) { supported |= PROT_SHSTK; // Comment about why SHSTK and WRITE // are mutually exclusive. supported &= ~PROT_WRITE; } > #endif /* _ASM_X86_MMAN_H */ > diff --git a/mm/mprotect.c b/mm/mprotect.c > index a8edbcb3af99..520bd8caa005 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -571,6 +571,17 @@ static int do_mprotect_pkey(unsigned long start, size_t > len, > goto out; > } > } > + > + /* > + * Only anonymous mapping is suitable for shadow stack. > + */ Why? > + if (prot & PROT_SHSTK) { > + if (vma->vm_file) { > + error = -EINVAL; > + goto out; > + } > + } You can also save a couple of lines there. The two conditions are pretty small.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Wed, 2020-09-09 at 16:29 -0700, Dave Hansen wrote: > On 9/9/20 4:25 PM, Yu, Yu-cheng wrote: > > On 9/9/2020 4:11 PM, Dave Hansen wrote: > > > On 9/9/20 4:07 PM, Yu, Yu-cheng wrote: > > > > What if a writable mapping is passed to madvise(MADV_SHSTK)? Should > > > > that be rejected? > > > > > > It doesn't matter to me. Even if it's readable, it _stops_ being even > > > directly readable after it's a shadow stack, right? I don't think > > > writes are special in any way. If anything, we *want* it to be writable > > > because that indicates that it can be written to, and we will want to > > > write to it soon. > > > > > But in a PROT_WRITE mapping, all the pte's have _PAGE_BIT_RW set. To > > change them to shadow stack, we need to clear that bit from the pte's. > > That will be like mprotect_fixup()/change_protection_range(). > > The page table hardware bits don't matter. The user-visible protection > effects matter. > > For instance, we have PROT_EXEC, which *CLEARS* a hardware NX PTE bit. > The PROT_ permissions are independent of the hardware. > > I don't think the interface should be influenced at *all* by what whacko > PTE bit combinations we have to set to get the behavior. Here are the changes if we take the mprotect(PROT_SHSTK) approach. Any comments/suggestions? --- arch/x86/include/uapi/asm/mman.h | 26 +- mm/mprotect.c| 11 +++ 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index d4a8d0424bfb..024f006fcfe8 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -4,6 +4,8 @@ #define MAP_32BIT 0x40/* only give out 32bit addresses */ +#define PROT_SHSTK 0x10/* shadow stack pages */ + #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS /* * Take the 4 protection key bits out of the vma->vm_flags @@ -19,13 +21,35 @@ ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) -#define arch_calc_vm_prot_bits(prot, key) (\ +#define pkey_vm_prot_bits(prot, key) ( \ ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) +#else +#define pkey_vm_prot_bits(prot, key) #endif +#define shstk_vm_prot_bits(prot) ( \ + (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) ? \ + VM_SHSTK : 0) + +#define arch_calc_vm_prot_bits(prot, key) \ + (pkey_vm_prot_bits(prot, key) | shstk_vm_prot_bits(prot)) + #include +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) +{ + unsigned long supported = PROT_READ | PROT_EXEC | PROT_SEM; + + if (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) + supported |= PROT_SHSTK; + else + supported |= PROT_WRITE; + + return (prot & ~supported) == 0; +} +#define arch_validate_prot arch_validate_prot + #endif /* _ASM_X86_MMAN_H */ diff --git a/mm/mprotect.c b/mm/mprotect.c index a8edbcb3af99..520bd8caa005 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -571,6 +571,17 @@ static int do_mprotect_pkey(unsigned long start, size_t len, goto out; } } + + /* +* Only anonymous mapping is suitable for shadow stack. +*/ + if (prot & PROT_SHSTK) { + if (vma->vm_file) { + error = -EINVAL; + goto out; + } + } + if (start > vma->vm_start) prev = vma; --
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/2020 4:29 PM, Dave Hansen wrote: On 9/9/20 4:25 PM, Yu, Yu-cheng wrote: On 9/9/2020 4:11 PM, Dave Hansen wrote: On 9/9/20 4:07 PM, Yu, Yu-cheng wrote: What if a writable mapping is passed to madvise(MADV_SHSTK)? Should that be rejected? It doesn't matter to me. Even if it's readable, it _stops_ being even directly readable after it's a shadow stack, right? I don't think writes are special in any way. If anything, we *want* it to be writable because that indicates that it can be written to, and we will want to write to it soon. But in a PROT_WRITE mapping, all the pte's have _PAGE_BIT_RW set. To change them to shadow stack, we need to clear that bit from the pte's. That will be like mprotect_fixup()/change_protection_range(). The page table hardware bits don't matter. The user-visible protection effects matter. For instance, we have PROT_EXEC, which *CLEARS* a hardware NX PTE bit. The PROT_ permissions are independent of the hardware. Same for shadow stack here. We consider shadow stack "writable", but we want to clear _PAGE_BIT_RW, which is set for the PTEs of the other type of "writable" mapping. To change a writable data mapping to a writable shadow stack mapping, we need to call change_protection_range(). I don't think the interface should be influenced at *all* by what whacko PTE bit combinations we have to set to get the behavior.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/20 4:25 PM, Yu, Yu-cheng wrote: > On 9/9/2020 4:11 PM, Dave Hansen wrote: >> On 9/9/20 4:07 PM, Yu, Yu-cheng wrote: >>> What if a writable mapping is passed to madvise(MADV_SHSTK)? Should >>> that be rejected? >> >> It doesn't matter to me. Even if it's readable, it _stops_ being even >> directly readable after it's a shadow stack, right? I don't think >> writes are special in any way. If anything, we *want* it to be writable >> because that indicates that it can be written to, and we will want to >> write to it soon. >> > But in a PROT_WRITE mapping, all the pte's have _PAGE_BIT_RW set. To > change them to shadow stack, we need to clear that bit from the pte's. > That will be like mprotect_fixup()/change_protection_range(). The page table hardware bits don't matter. The user-visible protection effects matter. For instance, we have PROT_EXEC, which *CLEARS* a hardware NX PTE bit. The PROT_ permissions are independent of the hardware. I don't think the interface should be influenced at *all* by what whacko PTE bit combinations we have to set to get the behavior.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/20 4:07 PM, Yu, Yu-cheng wrote: > What if a writable mapping is passed to madvise(MADV_SHSTK)? Should > that be rejected? It doesn't matter to me. Even if it's readable, it _stops_ being even directly readable after it's a shadow stack, right? I don't think writes are special in any way. If anything, we *want* it to be writable because that indicates that it can be written to, and we will want to write to it soon.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/2020 3:59 PM, Dave Hansen wrote: On 9/9/20 3:08 PM, Yu, Yu-cheng wrote: After looking at this more, I found the changes are more similar to mprotect() than madvise(). We are going to change an anonymous mapping to a read-only mapping, and add the VM_SHSTK flag to it. Would an x86-specific mprotect(PROT_SHSTK) make more sense? One alternative would be requiring a read-only mapping for madvise(MADV_SHSTK). But that is inconvenient for the application. Why? It's just: mmap()/malloc(); mprotect(PROT_READ); madvise(MADV_SHSTK); vs. mmap()/malloc(); mprotect(PROT_SHSTK); I'm not sure a single syscall counts as inconvenient. I don't quite think we should use a PROT_ bit for this. It seems like the kind of thing that could be fragile and break existing expectations. I don't care _that_ strongly though. What if a writable mapping is passed to madvise(MADV_SHSTK)? Should that be rejected?
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/2020 4:11 PM, Dave Hansen wrote: On 9/9/20 4:07 PM, Yu, Yu-cheng wrote: What if a writable mapping is passed to madvise(MADV_SHSTK)? Should that be rejected? It doesn't matter to me. Even if it's readable, it _stops_ being even directly readable after it's a shadow stack, right? I don't think writes are special in any way. If anything, we *want* it to be writable because that indicates that it can be written to, and we will want to write to it soon. But in a PROT_WRITE mapping, all the pte's have _PAGE_BIT_RW set. To change them to shadow stack, we need to clear that bit from the pte's. That will be like mprotect_fixup()/change_protection_range().
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/9/20 3:08 PM, Yu, Yu-cheng wrote: > After looking at this more, I found the changes are more similar to > mprotect() than madvise(). We are going to change an anonymous mapping > to a read-only mapping, and add the VM_SHSTK flag to it. Would an > x86-specific mprotect(PROT_SHSTK) make more sense? > > One alternative would be requiring a read-only mapping for > madvise(MADV_SHSTK). But that is inconvenient for the application. Why? It's just: mmap()/malloc(); mprotect(PROT_READ); madvise(MADV_SHSTK); vs. mmap()/malloc(); mprotect(PROT_SHSTK); I'm not sure a single syscall counts as inconvenient. I don't quite think we should use a PROT_ bit for this. It seems like the kind of thing that could be fragile and break existing expectations. I don't care _that_ strongly though.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/8/2020 11:25 AM, Yu, Yu-cheng wrote: On 9/8/2020 10:57 AM, Dave Hansen wrote: On 9/8/20 10:50 AM, Yu, Yu-cheng wrote: What about this: - Do not add any new syscall or arch_prctl for creating a new shadow stack. - Add a new arch_prctl that can turn an anonymous mapping to a shadow stack mapping. This allows the application to do whatever is necessary. It can even allow GDB or JIT code to create or fix a call stack. Fine with me. But, it's going to effectively be arch_prctl(PR_CONVERT_TO_SHS..., addr, len); when it could just as easily be: madvise(addr, len, MADV_SHSTK...); Or a new syscall. The only question in my mind is whether we want to do something generic that we can use for other similar things in the future, like: madvise2(addr, len, flags, MADV2_SHSTK...); I don't really feel strongly about it, though. Could you please share your logic on why you want a prctl() as opposed to a whole new syscall? A new syscall is more intrusive, I think. When creating a new shadow stack, the kernel also installs a restore token on the top of the new shadow stack, and it is somewhat x86-specific. So far no other arch's need this. Yes, madvise is better if the kernel only needs to change the mapping. The application itself can create the restore token before calling madvise(). After looking at this more, I found the changes are more similar to mprotect() than madvise(). We are going to change an anonymous mapping to a read-only mapping, and add the VM_SHSTK flag to it. Would an x86-specific mprotect(PROT_SHSTK) make more sense? One alternative would be requiring a read-only mapping for madvise(MADV_SHSTK). But that is inconvenient for the application. Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/8/2020 10:57 AM, Dave Hansen wrote: On 9/8/20 10:50 AM, Yu, Yu-cheng wrote: What about this: - Do not add any new syscall or arch_prctl for creating a new shadow stack. - Add a new arch_prctl that can turn an anonymous mapping to a shadow stack mapping. This allows the application to do whatever is necessary. It can even allow GDB or JIT code to create or fix a call stack. Fine with me. But, it's going to effectively be arch_prctl(PR_CONVERT_TO_SHS..., addr, len); when it could just as easily be: madvise(addr, len, MADV_SHSTK...); Or a new syscall. The only question in my mind is whether we want to do something generic that we can use for other similar things in the future, like: madvise2(addr, len, flags, MADV2_SHSTK...); I don't really feel strongly about it, though. Could you please share your logic on why you want a prctl() as opposed to a whole new syscall? A new syscall is more intrusive, I think. When creating a new shadow stack, the kernel also installs a restore token on the top of the new shadow stack, and it is somewhat x86-specific. So far no other arch's need this. Yes, madvise is better if the kernel only needs to change the mapping. The application itself can create the restore token before calling madvise().
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/8/20 10:50 AM, Yu, Yu-cheng wrote: > What about this: > > - Do not add any new syscall or arch_prctl for creating a new shadow stack. > > - Add a new arch_prctl that can turn an anonymous mapping to a shadow > stack mapping. > > This allows the application to do whatever is necessary. It can even > allow GDB or JIT code to create or fix a call stack. Fine with me. But, it's going to effectively be arch_prctl(PR_CONVERT_TO_SHS..., addr, len); when it could just as easily be: madvise(addr, len, MADV_SHSTK...); Or a new syscall. The only question in my mind is whether we want to do something generic that we can use for other similar things in the future, like: madvise2(addr, len, flags, MADV2_SHSTK...); I don't really feel strongly about it, though. Could you please share your logic on why you want a prctl() as opposed to a whole new syscall?
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Tue, Sep 01, 2020 at 11:11:37AM -0700, Dave Hansen wrote: > On 9/1/20 10:45 AM, Andy Lutomirski wrote: > >>> For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect > >>> family of calls. One or two additional arch-specific mmap flags are > >>> sufficient for now. > >>> > >>> Is x86 definitely not going to fit within those calls? > >> That can work for x86. Andy, what if we create PROT_SHSTK, which can > >> been seen only from the user. Once in kernel, it is translated to > >> VM_SHSTK. One question for mremap/mprotect is, do we allow a normal > >> data area to become shadow stack? > > I'm unconvinced that we want to use a somewhat precious PROT_ or VM_ > > bit for this. Using a flag bit makes sense if we expect anyone to > > ever map an fd or similar as a shadow stack, but that seems a bit odd > > in the first place. To me, it seems more logical for a shadow stack > > to be a special sort of mapping with a special vm_ops, not a normal > > mapping with a special flag set. Although I realize that we want > > shadow stacks to work like anonymous memory with respect to fork(). > > Dave? > > I actually don't like the idea of *creating* mappings much. > > I think the pkey model has worked out pretty well where we separate > creating the mapping from doing something *to* it, like changing > protections. For instance, it would be nice if we could preserve things > like using hugetlbfs or heck even doing KSM for shadow stacks. > > If we're *creating* mappings, we've pretty much ruled out things like > hugetlbfs. > > Something like mprotect_shstk() would allow an implementation today that > only works on anonymous memory *and* sets up a special vm_ops. But, the > same exact ABI could do wonky stuff in the future if we decided we > wanted to do shadow stacks on DAX or hugetlbfs or whatever. > > I don't really like the idea of PROT_SHSTK those are plumbed into a > bunch of interfaces. But, I also can't deny that it seems to be working > fine for the arm64 folks. Note, there are some rough edges, such as what happens when someone calls mprotect() on memory marked with PROT_BTI. Unless the caller knows whether PROT_BTI should be set for that page, the flag may get unintentionally cleared. Since the flag only applies to text pages though, it's not _that_ much of a concern. Software that deals with writable text pages is also usually involved in generating the code and so will know about PROT_BTI. That's was the theory anyway. In the longer term, it might be preferable to have a mprotect2() that can leave some flags unmodified, and that doesn't silently ignore unknown flags (at least one of mmap or mprotect does; I don't recall which). We attempt didn't go this far, for now. For arm64 it seemed fairly natural for the BTI flag to be a PROT_ flag, but I don't know enough detail about x86 shstk to know whether it's a natural fit there. Cheers ---Dave
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/1/2020 11:17 AM, Florian Weimer wrote: * Yu-cheng Yu: On 9/1/2020 10:50 AM, Florian Weimer wrote: * Yu-cheng Yu: Like other arch_prctl()'s, this parameter was 'unsigned long' earlier. The idea was, since this arch_prctl is only implemented for the 64-bit kernel, we wanted it to look as 64-bit only. I will change it back to 'unsigned long'. What about x32? In general, long is rather problematic for x32. The problem is the size of 'long', right? Because this parameter is passed in a register, and only the lower bits are used, x32 works as well. The userspace calling convention leaves the upper 32-bit undefined. Therefore, this only works by accident if the kernel does not check that the upper 32-bit are zero, which is probably a kernel bug. It's unclear to me what you are trying to accomplish. Why do you want to use unsigned long here? The correct type appears to be unsigned int. This correctly expresses that the upper 32 bits of the register do not matter. Yes, you are right. I will make it 'unsigned int'. Thanks, Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Tue, Sep 1, 2020 at 11:17 AM Florian Weimer wrote: > > * Yu-cheng Yu: > > > On 9/1/2020 10:50 AM, Florian Weimer wrote: > >> * Yu-cheng Yu: > >> > >>> Like other arch_prctl()'s, this parameter was 'unsigned long' > >>> earlier. The idea was, since this arch_prctl is only implemented for > >>> the 64-bit kernel, we wanted it to look as 64-bit only. I will change > >>> it back to 'unsigned long'. > >> What about x32? In general, long is rather problematic for x32. > > > > The problem is the size of 'long', right? > > Because this parameter is passed in a register, and only the lower > > bits are used, x32 works as well. > > The userspace calling convention leaves the upper 32-bit undefined. > Therefore, this only works by accident if the kernel does not check that > the upper 32-bit are zero, which is probably a kernel bug. > > It's unclear to me what you are trying to accomplish. Why do you want > to use unsigned long here? The correct type appears to be unsigned int. > This correctly expresses that the upper 32 bits of the register do not > matter. > unsigned int is the correct type since only the lower 32 bits are used. -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* Yu-cheng Yu: > On 9/1/2020 10:50 AM, Florian Weimer wrote: >> * Yu-cheng Yu: >> >>> Like other arch_prctl()'s, this parameter was 'unsigned long' >>> earlier. The idea was, since this arch_prctl is only implemented for >>> the 64-bit kernel, we wanted it to look as 64-bit only. I will change >>> it back to 'unsigned long'. >> What about x32? In general, long is rather problematic for x32. > > The problem is the size of 'long', right? > Because this parameter is passed in a register, and only the lower > bits are used, x32 works as well. The userspace calling convention leaves the upper 32-bit undefined. Therefore, this only works by accident if the kernel does not check that the upper 32-bit are zero, which is probably a kernel bug. It's unclear to me what you are trying to accomplish. Why do you want to use unsigned long here? The correct type appears to be unsigned int. This correctly expresses that the upper 32 bits of the register do not matter. Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/1/20 10:45 AM, Andy Lutomirski wrote: >>> For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect >>> family of calls. One or two additional arch-specific mmap flags are >>> sufficient for now. >>> >>> Is x86 definitely not going to fit within those calls? >> That can work for x86. Andy, what if we create PROT_SHSTK, which can >> been seen only from the user. Once in kernel, it is translated to >> VM_SHSTK. One question for mremap/mprotect is, do we allow a normal >> data area to become shadow stack? > I'm unconvinced that we want to use a somewhat precious PROT_ or VM_ > bit for this. Using a flag bit makes sense if we expect anyone to > ever map an fd or similar as a shadow stack, but that seems a bit odd > in the first place. To me, it seems more logical for a shadow stack > to be a special sort of mapping with a special vm_ops, not a normal > mapping with a special flag set. Although I realize that we want > shadow stacks to work like anonymous memory with respect to fork(). > Dave? I actually don't like the idea of *creating* mappings much. I think the pkey model has worked out pretty well where we separate creating the mapping from doing something *to* it, like changing protections. For instance, it would be nice if we could preserve things like using hugetlbfs or heck even doing KSM for shadow stacks. If we're *creating* mappings, we've pretty much ruled out things like hugetlbfs. Something like mprotect_shstk() would allow an implementation today that only works on anonymous memory *and* sets up a special vm_ops. But, the same exact ABI could do wonky stuff in the future if we decided we wanted to do shadow stacks on DAX or hugetlbfs or whatever. I don't really like the idea of PROT_SHSTK those are plumbed into a bunch of interfaces. But, I also can't deny that it seems to be working fine for the arm64 folks.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/1/2020 10:50 AM, Florian Weimer wrote: * Yu-cheng Yu: Like other arch_prctl()'s, this parameter was 'unsigned long' earlier. The idea was, since this arch_prctl is only implemented for the 64-bit kernel, we wanted it to look as 64-bit only. I will change it back to 'unsigned long'. What about x32? In general, long is rather problematic for x32. The problem is the size of 'long', right? Because this parameter is passed in a register, and only the lower bits are used, x32 works as well.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* Yu-cheng Yu: > Like other arch_prctl()'s, this parameter was 'unsigned long' > earlier. The idea was, since this arch_prctl is only implemented for > the 64-bit kernel, we wanted it to look as 64-bit only. I will change > it back to 'unsigned long'. What about x32? In general, long is rather problematic for x32. Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/27/2020 7:08 AM, H.J. Lu wrote: On Thu, Aug 27, 2020 at 7:07 AM H.J. Lu wrote: On Thu, Aug 27, 2020 at 6:36 AM Florian Weimer wrote: * H. J. Lu: On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: * Dave Martin: You're right that this has implications: for i386, libc probably pulls more arguments off the stack than are really there in some situations. This isn't a new problem though. There are already generic prctls with fewer than 4 args that are used on x86. As originally posted, glibc prctl would have to know that it has to pull an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But then the u64 argument is a problem for arch_prctl as well. Argument of ARCH_X86_CET_DISABLE is int and passed in register. The commit message and the C source say otherwise, I think (not sure about the C source, not a kernel hacker). It should read: arch_prctl(ARCH_X86_CET_DISABLE, unsigned long features) Or arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) Like other arch_prctl()'s, this parameter was 'unsigned long' earlier. The idea was, since this arch_prctl is only implemented for the 64-bit kernel, we wanted it to look as 64-bit only. I will change it back to 'unsigned long'. Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Tue, Sep 1, 2020 at 10:23 AM Yu, Yu-cheng wrote: > > On 9/1/2020 3:28 AM, Dave Martin wrote: > > On Thu, Aug 27, 2020 at 06:26:11AM -0700, H.J. Lu wrote: > >> On Wed, Aug 26, 2020 at 12:57 PM Dave Hansen wrote: > >>> > >>> On 8/26/20 11:49 AM, Yu, Yu-cheng wrote: > > I would expect things like Go and various JITs to call it directly. > > > > If we wanted to be fancy and add a potentially more widely useful > > syscall, how about: > > > > mmap_special(void *addr, size_t length, int prot, int flags, int type); > > > > Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, > > this is really just mmap() except that we want to map something a bit > > magical, and we don't want to require opening a device node to do it. > > One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. > Does ARM have similar needs for memory mapping, Dave? > >>> > >>> No idea. > >>> > >>> But, mmap_special() is *basically* mmap2() with extra-big flags space. > >>> I suspect it will grow some more uses on top of shadow stacks. It could > >>> have, for instance, been used to allocate MPX bounds tables. > >> > >> There is no reason we can't use > >> > >> long arch_prctl (int, unsigned long, unsigned long, unsigned long, ..); > >> > >> for ARCH_X86_CET_MMAP_SHSTK. We just need to use > >> > >> syscall (SYS_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, ...); > > > > > > For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect > > family of calls. One or two additional arch-specific mmap flags are > > sufficient for now. > > > > Is x86 definitely not going to fit within those calls? > > That can work for x86. Andy, what if we create PROT_SHSTK, which can > been seen only from the user. Once in kernel, it is translated to > VM_SHSTK. One question for mremap/mprotect is, do we allow a normal > data area to become shadow stack? I'm unconvinced that we want to use a somewhat precious PROT_ or VM_ bit for this. Using a flag bit makes sense if we expect anyone to ever map an fd or similar as a shadow stack, but that seems a bit odd in the first place. To me, it seems more logical for a shadow stack to be a special sort of mapping with a special vm_ops, not a normal mapping with a special flag set. Although I realize that we want shadow stacks to work like anonymous memory with respect to fork(). Dave? --Andy
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 9/1/2020 3:28 AM, Dave Martin wrote: On Thu, Aug 27, 2020 at 06:26:11AM -0700, H.J. Lu wrote: On Wed, Aug 26, 2020 at 12:57 PM Dave Hansen wrote: On 8/26/20 11:49 AM, Yu, Yu-cheng wrote: I would expect things like Go and various JITs to call it directly. If we wanted to be fancy and add a potentially more widely useful syscall, how about: mmap_special(void *addr, size_t length, int prot, int flags, int type); Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, this is really just mmap() except that we want to map something a bit magical, and we don't want to require opening a device node to do it. One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. Does ARM have similar needs for memory mapping, Dave? No idea. But, mmap_special() is *basically* mmap2() with extra-big flags space. I suspect it will grow some more uses on top of shadow stacks. It could have, for instance, been used to allocate MPX bounds tables. There is no reason we can't use long arch_prctl (int, unsigned long, unsigned long, unsigned long, ..); for ARCH_X86_CET_MMAP_SHSTK. We just need to use syscall (SYS_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, ...); For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect family of calls. One or two additional arch-specific mmap flags are sufficient for now. Is x86 definitely not going to fit within those calls? That can work for x86. Andy, what if we create PROT_SHSTK, which can been seen only from the user. Once in kernel, it is translated to VM_SHSTK. One question for mremap/mprotect is, do we allow a normal data area to become shadow stack? For now, I can't see what arg[2] is used for (and hence the type argument of mmap_special()), but I haven't dug through the whole series. If we use the approach above, then we don't need arch_prctl changes. Thanks, Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 06:26:11AM -0700, H.J. Lu wrote: > On Wed, Aug 26, 2020 at 12:57 PM Dave Hansen wrote: > > > > On 8/26/20 11:49 AM, Yu, Yu-cheng wrote: > > >> I would expect things like Go and various JITs to call it directly. > > >> > > >> If we wanted to be fancy and add a potentially more widely useful > > >> syscall, how about: > > >> > > >> mmap_special(void *addr, size_t length, int prot, int flags, int type); > > >> > > >> Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, > > >> this is really just mmap() except that we want to map something a bit > > >> magical, and we don't want to require opening a device node to do it. > > > > > > One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. > > > Does ARM have similar needs for memory mapping, Dave? > > > > No idea. > > > > But, mmap_special() is *basically* mmap2() with extra-big flags space. > > I suspect it will grow some more uses on top of shadow stacks. It could > > have, for instance, been used to allocate MPX bounds tables. > > There is no reason we can't use > > long arch_prctl (int, unsigned long, unsigned long, unsigned long, ..); > > for ARCH_X86_CET_MMAP_SHSTK. We just need to use > > syscall (SYS_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, ...); For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect family of calls. One or two additional arch-specific mmap flags are sufficient for now. Is x86 definitely not going to fit within those calls? For now, I can't see what arg[2] is used for (and hence the type argument of mmap_special()), but I haven't dug through the whole series. Cheers ---Dave
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Fri, Aug 28, 2020 at 10:39 AM Andy Lutomirski wrote: > > On Fri, Aug 28, 2020 at 4:38 AM H.J. Lu wrote: > > > > On Thu, Aug 27, 2020 at 11:24 PM Florian Weimer wrote: > > > > > > * H. J. Lu: > > > > > > > Can you think of ANY issues of passing more arguments to arch_prctl? > > > > > > On x32, the glibc arch_prctl system call wrapper only passes two > > > arguments to the kernel, and applications have no way of detecting that. > > > musl only passes two arguments on all architectures. It happens to work > > > anyway with default compiler flags, but that's an accident. > > > > In the current glibc, there is no arch_prctl wrapper for i386. There are > > arch_prctl wrappers with 2 arguments for x86-64 and x32. But this isn't an > > issue for glibc since glibc is both the provider and the user of the new > > arch_prctl extension. Besides, > > > > long syscall(long number, ...); > > > > is always available. > > Userspace is probably full of tools and libraries that contain tables > of system calls and their signatures. Think tracing, audit, container > management, etc. I don't know how they will react to the addition of > new arguments. Yes, they need to be updated to understand other new operations added for CET. -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Fri, Aug 28, 2020 at 4:38 AM H.J. Lu wrote: > > On Thu, Aug 27, 2020 at 11:24 PM Florian Weimer wrote: > > > > * H. J. Lu: > > > > > Can you think of ANY issues of passing more arguments to arch_prctl? > > > > On x32, the glibc arch_prctl system call wrapper only passes two > > arguments to the kernel, and applications have no way of detecting that. > > musl only passes two arguments on all architectures. It happens to work > > anyway with default compiler flags, but that's an accident. > > In the current glibc, there is no arch_prctl wrapper for i386. There are > arch_prctl wrappers with 2 arguments for x86-64 and x32. But this isn't an > issue for glibc since glibc is both the provider and the user of the new > arch_prctl extension. Besides, > > long syscall(long number, ...); > > is always available. Userspace is probably full of tools and libraries that contain tables of system calls and their signatures. Think tracing, audit, container management, etc. I don't know how they will react to the addition of new arguments.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 11:24 PM Florian Weimer wrote: > > * H. J. Lu: > > > Can you think of ANY issues of passing more arguments to arch_prctl? > > On x32, the glibc arch_prctl system call wrapper only passes two > arguments to the kernel, and applications have no way of detecting that. > musl only passes two arguments on all architectures. It happens to work > anyway with default compiler flags, but that's an accident. In the current glibc, there is no arch_prctl wrapper for i386. There are arch_prctl wrappers with 2 arguments for x86-64 and x32. But this isn't an issue for glibc since glibc is both the provider and the user of the new arch_prctl extension. Besides, long syscall(long number, ...); is always available. -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* H. J. Lu: > Can you think of ANY issues of passing more arguments to arch_prctl? On x32, the glibc arch_prctl system call wrapper only passes two arguments to the kernel, and applications have no way of detecting that. musl only passes two arguments on all architectures. It happens to work anyway with default compiler flags, but that's an accident. Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 6:35 PM Andy Lutomirski wrote: > > On Thu, Aug 27, 2020 at 12:38 PM H.J. Lu wrote: > > > > On Thu, Aug 27, 2020 at 11:56 AM Andy Lutomirski > > wrote: > > > > > > > > > > > > > On Aug 27, 2020, at 11:13 AM, Yu, Yu-cheng > > > > wrote: > > > > > > > > On 8/27/2020 6:36 AM, Florian Weimer wrote: > > > >> * H. J. Lu: > > > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer > > > wrote: > > > > > > > > * Dave Martin: > > > > > > > >> You're right that this has implications: for i386, libc probably > > > >> pulls > > > >> more arguments off the stack than are really there in some > > > >> situations. > > > >> This isn't a new problem though. There are already generic prctls > > > >> with > > > >> fewer than 4 args that are used on x86. > > > > > > > > As originally posted, glibc prctl would have to know that it has to > > > > pull > > > > an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > > > > then the u64 argument is a problem for arch_prctl as well. > > > > > > > >>> > > > >>> Argument of ARCH_X86_CET_DISABLE is int and passed in register. > > > >> The commit message and the C source say otherwise, I think (not sure > > > >> about the C source, not a kernel hacker). > > > > > > > > H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, > > > > and then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags > > > > and size are all in registers, this also solves problems being pointed > > > > out earlier. Without a wrapper, the shadow stack mmap call (from user > > > > space) will be: > > > > > > > > syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). > > > > > > I admit I don’t see a show stopping technical reason we can’t add > > > arguments to an existing syscall, but I’m pretty sure it’s unprecedented, > > > and it doesn’t seem like a good idea. > > > > prctl prototype is: > > > > extern int prctl (int __option, ...) > > > > and implemented in kernel as: > > > > int prctl(int option, unsigned long arg2, unsigned long arg3, > > unsigned long arg4, unsigned long arg5); > > > > Not all prctl operations take all 5 arguments. It also applies > > to arch_prctl. It is quite normal for different operations of > > arch_prctl to take different numbers of arguments. > > If by "quite normal" you mean "does not happen", then I agree. > > In any event, I will not have anything to do with a patch that changes > an existing syscall signature unless Linus personally acks it. So if > you want to email him and linux-abi, be my guest. Can you think of ANY issues of passing more arguments to arch_prctl? syscall () provided by glibc always passes 6 arguments to the kernel. Arguments are already in the registers. What kind of problems do you see? -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 12:38 PM H.J. Lu wrote: > > On Thu, Aug 27, 2020 at 11:56 AM Andy Lutomirski wrote: > > > > > > > > > On Aug 27, 2020, at 11:13 AM, Yu, Yu-cheng wrote: > > > > > > On 8/27/2020 6:36 AM, Florian Weimer wrote: > > >> * H. J. Lu: > > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer > > wrote: > > > > > > * Dave Martin: > > > > > >> You're right that this has implications: for i386, libc probably > > >> pulls > > >> more arguments off the stack than are really there in some > > >> situations. > > >> This isn't a new problem though. There are already generic prctls > > >> with > > >> fewer than 4 args that are used on x86. > > > > > > As originally posted, glibc prctl would have to know that it has to > > > pull > > > an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > > > then the u64 argument is a problem for arch_prctl as well. > > > > > >>> > > >>> Argument of ARCH_X86_CET_DISABLE is int and passed in register. > > >> The commit message and the C source say otherwise, I think (not sure > > >> about the C source, not a kernel hacker). > > > > > > H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, > > > and then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags > > > and size are all in registers, this also solves problems being pointed > > > out earlier. Without a wrapper, the shadow stack mmap call (from user > > > space) will be: > > > > > > syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). > > > > I admit I don’t see a show stopping technical reason we can’t add arguments > > to an existing syscall, but I’m pretty sure it’s unprecedented, and it > > doesn’t seem like a good idea. > > prctl prototype is: > > extern int prctl (int __option, ...) > > and implemented in kernel as: > > int prctl(int option, unsigned long arg2, unsigned long arg3, > unsigned long arg4, unsigned long arg5); > > Not all prctl operations take all 5 arguments. It also applies > to arch_prctl. It is quite normal for different operations of > arch_prctl to take different numbers of arguments. If by "quite normal" you mean "does not happen", then I agree. In any event, I will not have anything to do with a patch that changes an existing syscall signature unless Linus personally acks it. So if you want to email him and linux-abi, be my guest.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 11:56 AM Andy Lutomirski wrote: > > > > > On Aug 27, 2020, at 11:13 AM, Yu, Yu-cheng wrote: > > > > On 8/27/2020 6:36 AM, Florian Weimer wrote: > >> * H. J. Lu: > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer > wrote: > > > > * Dave Martin: > > > >> You're right that this has implications: for i386, libc probably pulls > >> more arguments off the stack than are really there in some situations. > >> This isn't a new problem though. There are already generic prctls with > >> fewer than 4 args that are used on x86. > > > > As originally posted, glibc prctl would have to know that it has to pull > > an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > > then the u64 argument is a problem for arch_prctl as well. > > > >>> > >>> Argument of ARCH_X86_CET_DISABLE is int and passed in register. > >> The commit message and the C source say otherwise, I think (not sure > >> about the C source, not a kernel hacker). > > > > H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, and > > then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags and > > size are all in registers, this also solves problems being pointed out > > earlier. Without a wrapper, the shadow stack mmap call (from user space) > > will be: > > > > syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). > > I admit I don’t see a show stopping technical reason we can’t add arguments > to an existing syscall, but I’m pretty sure it’s unprecedented, and it > doesn’t seem like a good idea. prctl prototype is: extern int prctl (int __option, ...) and implemented in kernel as: int prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); Not all prctl operations take all 5 arguments. It also applies to arch_prctl. It is quite normal for different operations of arch_prctl to take different numbers of arguments. -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/27/2020 11:56 AM, Andy Lutomirski wrote: On Aug 27, 2020, at 11:13 AM, Yu, Yu-cheng wrote: On 8/27/2020 6:36 AM, Florian Weimer wrote: * H. J. Lu: On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: * Dave Martin: You're right that this has implications: for i386, libc probably pulls more arguments off the stack than are really there in some situations. This isn't a new problem though. There are already generic prctls with fewer than 4 args that are used on x86. As originally posted, glibc prctl would have to know that it has to pull an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But then the u64 argument is a problem for arch_prctl as well. Argument of ARCH_X86_CET_DISABLE is int and passed in register. The commit message and the C source say otherwise, I think (not sure about the C source, not a kernel hacker). H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, and then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags and size are all in registers, this also solves problems being pointed out earlier. Without a wrapper, the shadow stack mmap call (from user space) will be: syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). I admit I don’t see a show stopping technical reason we can’t add arguments to an existing syscall, but I’m pretty sure it’s unprecedented, and it doesn’t seem like a good idea. There are nine existing arch_prctl calls now. If the concern is the extra new arguments getting misused, we can mask them out for the existing calls. Otherwise, I have not seen anything that can break. Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
> On Aug 27, 2020, at 11:13 AM, Yu, Yu-cheng wrote: > > On 8/27/2020 6:36 AM, Florian Weimer wrote: >> * H. J. Lu: On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: > > * Dave Martin: > >> You're right that this has implications: for i386, libc probably pulls >> more arguments off the stack than are really there in some situations. >> This isn't a new problem though. There are already generic prctls with >> fewer than 4 args that are used on x86. > > As originally posted, glibc prctl would have to know that it has to pull > an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > then the u64 argument is a problem for arch_prctl as well. > >>> >>> Argument of ARCH_X86_CET_DISABLE is int and passed in register. >> The commit message and the C source say otherwise, I think (not sure >> about the C source, not a kernel hacker). > > H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, and > then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags and size > are all in registers, this also solves problems being pointed out earlier. > Without a wrapper, the shadow stack mmap call (from user space) will be: > > syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). I admit I don’t see a show stopping technical reason we can’t add arguments to an existing syscall, but I’m pretty sure it’s unprecedented, and it doesn’t seem like a good idea.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/27/2020 6:36 AM, Florian Weimer wrote: * H. J. Lu: On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: * Dave Martin: You're right that this has implications: for i386, libc probably pulls more arguments off the stack than are really there in some situations. This isn't a new problem though. There are already generic prctls with fewer than 4 args that are used on x86. As originally posted, glibc prctl would have to know that it has to pull an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But then the u64 argument is a problem for arch_prctl as well. Argument of ARCH_X86_CET_DISABLE is int and passed in register. The commit message and the C source say otherwise, I think (not sure about the C source, not a kernel hacker). H.J. Lu suggested that we fix x86 arch_prctl() to take four arguments, and then keep MMAP_SHSTK as an arch_prctl(). Because now the map flags and size are all in registers, this also solves problems being pointed out earlier. Without a wrapper, the shadow stack mmap call (from user space) will be: syscall(_NR_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, size, MAP_32BIT). I think this would be a nice alternative to another new syscall. If this looks good to everyone, I can send out new patches as response to my current version, and then after all issues fixed, send v12. Thanks, Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Wed, Aug 26, 2020 at 12:57 PM Dave Hansen wrote: > > On 8/26/20 11:49 AM, Yu, Yu-cheng wrote: > >> I would expect things like Go and various JITs to call it directly. > >> > >> If we wanted to be fancy and add a potentially more widely useful > >> syscall, how about: > >> > >> mmap_special(void *addr, size_t length, int prot, int flags, int type); > >> > >> Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, > >> this is really just mmap() except that we want to map something a bit > >> magical, and we don't want to require opening a device node to do it. > > > > One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. > > Does ARM have similar needs for memory mapping, Dave? > > No idea. > > But, mmap_special() is *basically* mmap2() with extra-big flags space. > I suspect it will grow some more uses on top of shadow stacks. It could > have, for instance, been used to allocate MPX bounds tables. There is no reason we can't use long arch_prctl (int, unsigned long, unsigned long, unsigned long, ..); for ARCH_X86_CET_MMAP_SHSTK. We just need to use syscall (SYS_arch_prctl, ARCH_X86_CET_MMAP_SHSTK, ...); -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 6:36 AM Florian Weimer wrote: > > * H. J. Lu: > > > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: > >> > >> * Dave Martin: > >> > >> > You're right that this has implications: for i386, libc probably pulls > >> > more arguments off the stack than are really there in some situations. > >> > This isn't a new problem though. There are already generic prctls with > >> > fewer than 4 args that are used on x86. > >> > >> As originally posted, glibc prctl would have to know that it has to pull > >> an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > >> then the u64 argument is a problem for arch_prctl as well. > >> > > > > Argument of ARCH_X86_CET_DISABLE is int and passed in register. > > The commit message and the C source say otherwise, I think (not sure > about the C source, not a kernel hacker). It should read: arch_prctl(ARCH_X86_CET_DISABLE, unsigned long features) -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 7:07 AM H.J. Lu wrote: > > On Thu, Aug 27, 2020 at 6:36 AM Florian Weimer wrote: > > > > * H. J. Lu: > > > > > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: > > >> > > >> * Dave Martin: > > >> > > >> > You're right that this has implications: for i386, libc probably pulls > > >> > more arguments off the stack than are really there in some situations. > > >> > This isn't a new problem though. There are already generic prctls with > > >> > fewer than 4 args that are used on x86. > > >> > > >> As originally posted, glibc prctl would have to know that it has to pull > > >> an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > > >> then the u64 argument is a problem for arch_prctl as well. > > >> > > > > > > Argument of ARCH_X86_CET_DISABLE is int and passed in register. > > > > The commit message and the C source say otherwise, I think (not sure > > about the C source, not a kernel hacker). > > It should read: > > arch_prctl(ARCH_X86_CET_DISABLE, unsigned long features) > Or arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* H. J. Lu: > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: >> >> * Dave Martin: >> >> > You're right that this has implications: for i386, libc probably pulls >> > more arguments off the stack than are really there in some situations. >> > This isn't a new problem though. There are already generic prctls with >> > fewer than 4 args that are used on x86. >> >> As originally posted, glibc prctl would have to know that it has to pull >> an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But >> then the u64 argument is a problem for arch_prctl as well. >> > > Argument of ARCH_X86_CET_DISABLE is int and passed in register. The commit message and the C source say otherwise, I think (not sure about the C source, not a kernel hacker). Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: > > * Dave Martin: > > > You're right that this has implications: for i386, libc probably pulls > > more arguments off the stack than are really there in some situations. > > This isn't a new problem though. There are already generic prctls with > > fewer than 4 args that are used on x86. > > As originally posted, glibc prctl would have to know that it has to pull > an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But > then the u64 argument is a problem for arch_prctl as well. > Argument of ARCH_X86_CET_DISABLE is int and passed in register. -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* Dave Martin: > You're right that this has implications: for i386, libc probably pulls > more arguments off the stack than are really there in some situations. > This isn't a new problem though. There are already generic prctls with > fewer than 4 args that are used on x86. As originally posted, glibc prctl would have to know that it has to pull an u64 argument off the argument list for ARCH_X86_CET_DISABLE. But then the u64 argument is a problem for arch_prctl as well. Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/26/20 11:49 AM, Yu, Yu-cheng wrote: >> I would expect things like Go and various JITs to call it directly. >> >> If we wanted to be fancy and add a potentially more widely useful >> syscall, how about: >> >> mmap_special(void *addr, size_t length, int prot, int flags, int type); >> >> Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, >> this is really just mmap() except that we want to map something a bit >> magical, and we don't want to require opening a device node to do it. > > One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. > Does ARM have similar needs for memory mapping, Dave? No idea. But, mmap_special() is *basically* mmap2() with extra-big flags space. I suspect it will grow some more uses on top of shadow stacks. It could have, for instance, been used to allocate MPX bounds tables.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Wed, Aug 26, 2020 at 11:49 AM Yu, Yu-cheng wrote: > > On 8/26/2020 10:04 AM, Andy Lutomirski wrote: > > On Wed, Aug 26, 2020 at 9:52 AM Florian Weimer wrote: > >> > >> * Dave Martin: > >> > >>> On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: > On 8/25/2020 4:20 PM, Dave Hansen wrote: > > On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: > I think this is more arch-specific. Even if it becomes a new > syscall, > we still need to pass the same parameters. > >>> > >>> Right, but without the copying in and out of memory. > >>> > >> Linux-api is already on the Cc list. Do we need to add more people to > >> get some agreements for the syscall? > > What kind of agreement are you looking for? I'd suggest just coding it > > up and posting the patches. Adding syscalls really is really pretty > > straightforward and isn't much code at all. > > > > Sure, I will do that. > >>> > >>> Alternatively, would a regular prctl() work here? > >> > >> Is this something appliation code has to call, or just the dynamic > >> loader? > >> > >> prctl in glibc is a variadic function, so if there's a mismatch between > >> the kernel/userspace syscall convention and the userspace calling > >> convention (for variadic functions) for specific types, it can't be made > >> to work in a generic way. > >> > >> The loader can use inline assembly for system calls and does not have > >> this issue, but applications would be implcated by it. > >> > > > > I would expect things like Go and various JITs to call it directly. > > > > If we wanted to be fancy and add a potentially more widely useful > > syscall, how about: > > > > mmap_special(void *addr, size_t length, int prot, int flags, int type); > > > > Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, > > this is really just mmap() except that we want to map something a bit > > magical, and we don't want to require opening a device node to do it. > > > > One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. > Does ARM have similar needs for memory mapping, Dave? > arch/arm64/include/uapi/asm/mman.h: #define PROT_BTI 0x10 /* BTI guarded page */ -- H.J.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/26/2020 10:04 AM, Andy Lutomirski wrote: On Wed, Aug 26, 2020 at 9:52 AM Florian Weimer wrote: * Dave Martin: On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: On 8/25/2020 4:20 PM, Dave Hansen wrote: On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: I think this is more arch-specific. Even if it becomes a new syscall, we still need to pass the same parameters. Right, but without the copying in and out of memory. Linux-api is already on the Cc list. Do we need to add more people to get some agreements for the syscall? What kind of agreement are you looking for? I'd suggest just coding it up and posting the patches. Adding syscalls really is really pretty straightforward and isn't much code at all. Sure, I will do that. Alternatively, would a regular prctl() work here? Is this something appliation code has to call, or just the dynamic loader? prctl in glibc is a variadic function, so if there's a mismatch between the kernel/userspace syscall convention and the userspace calling convention (for variadic functions) for specific types, it can't be made to work in a generic way. The loader can use inline assembly for system calls and does not have this issue, but applications would be implcated by it. I would expect things like Go and various JITs to call it directly. If we wanted to be fancy and add a potentially more widely useful syscall, how about: mmap_special(void *addr, size_t length, int prot, int flags, int type); Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, this is really just mmap() except that we want to map something a bit magical, and we don't want to require opening a device node to do it. One benefit of MMAP_SPECIAL_* is there are more free bits than MAP_*. Does ARM have similar needs for memory mapping, Dave? Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Wed, Aug 26, 2020 at 06:51:48PM +0200, Florian Weimer wrote: > * Dave Martin: > > > On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: > >> On 8/25/2020 4:20 PM, Dave Hansen wrote: > >> >On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: > >> I think this is more arch-specific. Even if it becomes a new syscall, > >> we still need to pass the same parameters. > >> >>> > >> >>>Right, but without the copying in and out of memory. > >> >>> > >> >>Linux-api is already on the Cc list. Do we need to add more people to > >> >>get some agreements for the syscall? > >> >What kind of agreement are you looking for? I'd suggest just coding it > >> >up and posting the patches. Adding syscalls really is really pretty > >> >straightforward and isn't much code at all. > >> > > >> > >> Sure, I will do that. > > > > Alternatively, would a regular prctl() work here? > > Is this something appliation code has to call, or just the dynamic > loader? > > prctl in glibc is a variadic function, so if there's a mismatch between > the kernel/userspace syscall convention and the userspace calling > convention (for variadic functions) for specific types, it can't be made > to work in a generic way. > > The loader can use inline assembly for system calls and does not have > this issue, but applications would be implcated by it. To the extent that this is a problem, libc's prctl() wrapper has to handle it already. New prctl() calls tend to demand precisely 4 arguments and require unused arguments to be 0, but this is more down to policy rather than because anything breaks otherwise. You're right that this has implications: for i386, libc probably pulls more arguments off the stack than are really there in some situations. This isn't a new problem though. There are already generic prctls with fewer than 4 args that are used on x86. Merging the actual prctl() and arch_prctl() syscalls doesn't acutally stop libc from retaining separate wrappers if they have different argument marshaling requirements in some corner cases. There might be some underlying reason by x86 has its own call and nobody else followed the same model, but I don't know what it is. Cheers ---Dave
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Wed, Aug 26, 2020 at 9:52 AM Florian Weimer wrote: > > * Dave Martin: > > > On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: > >> On 8/25/2020 4:20 PM, Dave Hansen wrote: > >> >On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: > >> I think this is more arch-specific. Even if it becomes a new syscall, > >> we still need to pass the same parameters. > >> >>> > >> >>>Right, but without the copying in and out of memory. > >> >>> > >> >>Linux-api is already on the Cc list. Do we need to add more people to > >> >>get some agreements for the syscall? > >> >What kind of agreement are you looking for? I'd suggest just coding it > >> >up and posting the patches. Adding syscalls really is really pretty > >> >straightforward and isn't much code at all. > >> > > >> > >> Sure, I will do that. > > > > Alternatively, would a regular prctl() work here? > > Is this something appliation code has to call, or just the dynamic > loader? > > prctl in glibc is a variadic function, so if there's a mismatch between > the kernel/userspace syscall convention and the userspace calling > convention (for variadic functions) for specific types, it can't be made > to work in a generic way. > > The loader can use inline assembly for system calls and does not have > this issue, but applications would be implcated by it. > I would expect things like Go and various JITs to call it directly. If we wanted to be fancy and add a potentially more widely useful syscall, how about: mmap_special(void *addr, size_t length, int prot, int flags, int type); Where type is something like MMAP_SPECIAL_X86_SHSTK. Fundamentally, this is really just mmap() except that we want to map something a bit magical, and we don't want to require opening a device node to do it. --Andy
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
* Dave Martin: > On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: >> On 8/25/2020 4:20 PM, Dave Hansen wrote: >> >On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: >> I think this is more arch-specific. Even if it becomes a new syscall, >> we still need to pass the same parameters. >> >>> >> >>>Right, but without the copying in and out of memory. >> >>> >> >>Linux-api is already on the Cc list. Do we need to add more people to >> >>get some agreements for the syscall? >> >What kind of agreement are you looking for? I'd suggest just coding it >> >up and posting the patches. Adding syscalls really is really pretty >> >straightforward and isn't much code at all. >> > >> >> Sure, I will do that. > > Alternatively, would a regular prctl() work here? Is this something appliation code has to call, or just the dynamic loader? prctl in glibc is a variadic function, so if there's a mismatch between the kernel/userspace syscall convention and the userspace calling convention (for variadic functions) for specific types, it can't be made to work in a generic way. The loader can use inline assembly for system calls and does not have this issue, but applications would be implcated by it. Thanks, Florian
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: > On 8/25/2020 4:20 PM, Dave Hansen wrote: > >On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: > I think this is more arch-specific. Even if it becomes a new syscall, > we still need to pass the same parameters. > >>> > >>>Right, but without the copying in and out of memory. > >>> > >>Linux-api is already on the Cc list. Do we need to add more people to > >>get some agreements for the syscall? > >What kind of agreement are you looking for? I'd suggest just coding it > >up and posting the patches. Adding syscalls really is really pretty > >straightforward and isn't much code at all. > > > > Sure, I will do that. Alternatively, would a regular prctl() work here? arch_prctl() feels like a historical weirdness for x86 -- other arches all seem to be using regular prctl(), which allows for 4 args. I don't know the history behind the difference here. (Since prctl() and arch_prctl() use non-clashing command numbers, I had wondered whether it would be worth just merging the x86 calls in with the rest and making the two calls aliases. That's one for later, though...) Cheers ---Dave
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/25/2020 4:20 PM, Dave Hansen wrote: On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: I think this is more arch-specific. Even if it becomes a new syscall, we still need to pass the same parameters. Right, but without the copying in and out of memory. Linux-api is already on the Cc list. Do we need to add more people to get some agreements for the syscall? What kind of agreement are you looking for? I'd suggest just coding it up and posting the patches. Adding syscalls really is really pretty straightforward and isn't much code at all. Sure, I will do that. Thanks, Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: >>> I think this is more arch-specific. Even if it becomes a new syscall, >>> we still need to pass the same parameters. >> >> Right, but without the copying in and out of memory. >> > Linux-api is already on the Cc list. Do we need to add more people to > get some agreements for the syscall? What kind of agreement are you looking for? I'd suggest just coding it up and posting the patches. Adding syscalls really is really pretty straightforward and isn't much code at all.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/25/2020 12:19 PM, Dave Hansen wrote: On 8/25/20 11:43 AM, Yu, Yu-cheng wrote: arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) Allocate a new shadow stack. The parameter 'args' is a pointer to a user buffer. *args = desired size *(args + 1) = MAP_32BIT or MAP_POPULATE On returning, *args is the allocated shadow stack address. This is hideous. Would this be better as a new syscall? Could you point out why this is hideous, so that I can modify the arch_prctl? Passing values in memory is hideous when we don't have to. A syscall would let you have separate arguments for size and flags and would also let you have a nice return value instead of needing to do that in memory too. That is a good justification. I think this is more arch-specific. Even if it becomes a new syscall, we still need to pass the same parameters. Right, but without the copying in and out of memory. Linux-api is already on the Cc list. Do we need to add more people to get some agreements for the syscall?
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/25/20 11:43 AM, Yu, Yu-cheng wrote: >>> arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) >>> Allocate a new shadow stack. >>> >>> The parameter 'args' is a pointer to a user buffer. >>> >>> *args = desired size >>> *(args + 1) = MAP_32BIT or MAP_POPULATE >>> >>> On returning, *args is the allocated shadow stack address. >> >> This is hideous. Would this be better as a new syscall? > > Could you point out why this is hideous, so that I can modify the > arch_prctl? Passing values in memory is hideous when we don't have to. A syscall would let you have separate arguments for size and flags and would also let you have a nice return value instead of needing to do that in memory too. > I think this is more arch-specific. Even if it becomes a new syscall, > we still need to pass the same parameters. Right, but without the copying in and out of memory.
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 8/24/2020 5:36 PM, Andy Lutomirski wrote: On Mon, Aug 24, 2020 at 5:30 PM Yu-cheng Yu wrote: arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) Allocate a new shadow stack. The parameter 'args' is a pointer to a user buffer. *args = desired size *(args + 1) = MAP_32BIT or MAP_POPULATE On returning, *args is the allocated shadow stack address. This is hideous. Would this be better as a new syscall? Could you point out why this is hideous, so that I can modify the arch_prctl? I think this is more arch-specific. Even if it becomes a new syscall, we still need to pass the same parameters. Yu-cheng
Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On Mon, Aug 24, 2020 at 5:30 PM Yu-cheng Yu wrote: > arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) > Allocate a new shadow stack. > > The parameter 'args' is a pointer to a user buffer. > > *args = desired size > *(args + 1) = MAP_32BIT or MAP_POPULATE > > On returning, *args is the allocated shadow stack address. This is hideous. Would this be better as a new syscall? --Andy
[PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
arch_prctl(ARCH_X86_CET_STATUS, u64 *args) Get CET feature status. The parameter 'args' is a pointer to a user buffer. The kernel returns the following information: *args = shadow stack/IBT status *(args + 1) = shadow stack base address *(args + 2) = shadow stack size arch_prctl(ARCH_X86_CET_DISABLE, u64 features) Disable CET features specified in 'features'. Return -EPERM if CET is locked. arch_prctl(ARCH_X86_CET_LOCK) Lock in CET features. arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) Allocate a new shadow stack. The parameter 'args' is a pointer to a user buffer. *args = desired size *(args + 1) = MAP_32BIT or MAP_POPULATE On returning, *args is the allocated shadow stack address. Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to 'arg2', as it is now also passed to prctl_cet(). Signed-off-by: Yu-cheng Yu --- v11: - Check input for invalid features. - Fix prctl_cet() return values. - Change ARCH_X86_CET_ALLOC_SHSTK to ARCH_X86_CET_MMAP_SHSTK to take MAP_32BIT, MAP_POPULATE as inputs. v10: - Verify CET is enabled before handling arch_prctl. - Change input parameters from unsigned long to u64, to make it clear they are 64-bit. arch/x86/include/asm/cet.h | 4 + arch/x86/include/uapi/asm/prctl.h | 5 ++ arch/x86/kernel/Makefile| 2 +- arch/x86/kernel/cet.c | 26 +++ arch/x86/kernel/cet_prctl.c | 98 + arch/x86/kernel/process.c | 6 +- tools/arch/x86/include/uapi/asm/prctl.h | 5 ++ 7 files changed, 142 insertions(+), 4 deletions(-) create mode 100644 arch/x86/kernel/cet_prctl.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 71dc92acd2f2..f7eb197998ad 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -14,16 +14,20 @@ struct sc_ext; struct cet_status { unsigned long shstk_base; unsigned long shstk_size; + unsigned intlocked:1; }; #ifdef CONFIG_X86_INTEL_CET +int prctl_cet(int option, u64 arg2); int cet_setup_shstk(void); int cet_setup_thread_shstk(struct task_struct *p); +unsigned long cet_alloc_shstk(unsigned long size, int flags); void cet_disable_free_shstk(struct task_struct *p); int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; } static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; } static inline void cet_disable_free_shstk(struct task_struct *p) {} static inline void cet_restore_signal(struct sc_ext *sc) { return; } diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 5a6aac9fa41f..3aaac13cdc87 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -14,4 +14,9 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +#define ARCH_X86_CET_STATUS0x3001 +#define ARCH_X86_CET_DISABLE 0x3002 +#define ARCH_X86_CET_LOCK 0x3003 +#define ARCH_X86_CET_MMAP_SHSTK0x3004 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 76f27f518266..97556e4204d6 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,7 +145,7 @@ obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o -obj-$(CONFIG_X86_INTEL_CET)+= cet.o +obj-$(CONFIG_X86_INTEL_CET)+= cet.o cet_prctl.o ### # 64 bit specific files diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index b30c61a66c8e..2bf1a6b6abb6 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -148,6 +148,32 @@ static int create_rstor_token(bool ia32, unsigned long ssp, return 0; } +unsigned long cet_alloc_shstk(unsigned long len, int flags) +{ + unsigned long token; + unsigned long addr, ssp; + + addr = alloc_shstk(round_up(len, PAGE_SIZE), flags); + + if (IS_ERR_VALUE(addr)) + return addr; + + /* Restore token is 8 bytes and aligned to 8 bytes */ + ssp = addr + len; + token = ssp; + + if (!in_ia32_syscall()) + token |= TOKEN_MODE_64; + ssp -= 8; + + if (write_user_shstk_64(ssp, token)) { + vm_munmap(addr, len); + return -EINVAL; + } + + return addr; +} + int cet_setup_shstk(void) { unsigned long addr, size; diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c new file mode 100644 index ..cc49eef08ab0 --- /dev/null +++ b/arch/x86/kernel/cet_prctl.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: