kettenis, mortimer, and I have had long discussions about the uvm_map_inentry() checks in MD trap.c being excessively broad. We can call this check function less often.
I don't want to explain the full picture of how the SP and PC checks in syscall() and trap() hinder ROP, ROP stack pivoting, gadgets with unfortunate memory-access side effects, etc. But... when a ROP chain pivots onto a non-MAP_STACK chunk of data, to use a data/heap-loaded secondary chain, it will eventually want to do a system call.

(a) Thus, we force the chain to pivot back to MAP_STACK memory before doing the first syscall, otherwise the SP check in syscall() will kill it. This pivot-back is additional labour not necessary for an attack on other operating systems.

(b) We also do SP checking at trap time. On most architectures it is currently done for MANY types of userland traps. (Additionally, it can tsleep, which exposed a bunch of old bugs on many architectures that we fixed at k2k20.)

We now propose only performing uvm_map_inentry() for the standard userland "page fault" case which goes to uvm_fault(). It is pointless (and costly) to use interrupts, or other strange and rare traps, hoping to catch a ROP chain which is just chewing through properly mapped memory. Those other traps also tend to get us into issues like tsleep -- a performance cost, for negligible security benefit.

However, unmapped memory is the achilles heel. A poorly written chain will eventually access unmapped memory, because it is using a gadget with a memory-access side effect (either a reference to a not-yet-mapped page -- cow fault, zfod, or demand-from-vnode -- or because it is using memory-scanning or sled methods). Additionally, such unmapped-memory accesses may be done as pointer dereferences against uncontrolled registers, where libc randomization plays a powerful role (but the margin is way too narrow).

So for an attacker, this reduces gadget selection to ONLY THE BEST GADGETS -- meaning, minimal or no side effects. We've reduced gadget count by a lot, and placed code in very random places, so this becomes a further hindrance in selection.
By narrowing the check, this brings back a bit of the performance lost when uvm_map_inentry() was added to trap.c. Here are the first four simple ones: i386, amd64, alpha, sh. Others are a bit more complex, and will follow.

Index: alpha/alpha/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/alpha/alpha/trap.c,v
retrieving revision 1.89
diff -u -p -u -r1.89 trap.c
--- alpha/alpha/trap.c	19 Aug 2020 10:10:57 -0000	1.89
+++ alpha/alpha/trap.c	23 Sep 2020 12:18:25 -0000
@@ -244,10 +244,6 @@ trap(a0, a1, a2, entry, framep)
 	if (user) {
 		p->p_md.md_tf = framep;
 		refreshcreds(p);
-		if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
-		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
-		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
-			goto out;
 	}
 
 	switch (entry) {
@@ -390,6 +386,12 @@ trap(a0, a1, a2, entry, framep)
 		struct vm_map *map;
 		int rv;
 		extern struct vm_map *kernel_map;
+
+		if (user &&
+		    !uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
+		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
+		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
+			goto out;
 
 		switch (a2) {
 		case -1:		/* instruction fetch fault */
Index: amd64/amd64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.81
diff -u -p -u -r1.81 trap.c
--- amd64/amd64/trap.c	14 Sep 2020 12:51:28 -0000	1.81
+++ amd64/amd64/trap.c	23 Sep 2020 12:17:10 -0000
@@ -343,11 +343,6 @@ usertrap(struct trapframe *frame)
 
 	p->p_md.md_regs = frame;
 	refreshcreds(p);
-	if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
-	    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
-	    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
-		goto out;
-
 	switch (type) {
 	case T_PROTFLT:			/* protection fault */
 	case T_TSSFLT:
@@ -381,6 +376,10 @@ usertrap(struct trapframe *frame)
 		break;
 
 	case T_PAGEFLT:			/* page fault */
+		if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
+		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
+		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
+			goto out;
 		if (pageflttrap(frame, cr2, 1))
 			goto out;
 		/* FALLTHROUGH */
Index: i386/i386/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/trap.c,v
retrieving revision 1.145
diff -u -p -u -r1.145 trap.c
--- i386/i386/trap.c	15 Sep 2020 09:30:31 -0000	1.145
+++ i386/i386/trap.c	23 Sep 2020 12:16:23 -0000
@@ -153,10 +153,6 @@ trap(struct trapframe *frame)
 		type |= T_USER;
 		p->p_md.md_regs = frame;
 		refreshcreds(p);
-		if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
-		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
-		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
-			goto out;
 	}
 
 	switch (type) {
@@ -352,6 +348,10 @@ trap(struct trapframe *frame)
 		int error;
 		int signal, sicode;
 
+		if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
+		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
+		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
+			goto out;
 		KERNEL_LOCK();
	faultcommon:
 		vm = p->p_vmspace;
Index: sh/sh/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sh/sh/trap.c,v
retrieving revision 1.42
diff -u -p -u -r1.42 trap.c
--- sh/sh/trap.c	22 Sep 2020 15:50:20 -0000	1.42
+++ sh/sh/trap.c	23 Sep 2020 12:19:30 -0000
@@ -173,12 +173,6 @@ general_exception(struct proc *p, struct
 		KDASSERT(p->p_md.md_regs == tf); /* check exception depth */
 		expevt |= EXP_USER;
 		refreshcreds(p);
-		if (tra != _SH_TRA_SYSCALL << 2) {
-			if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
-			    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
-			    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
-				goto out;
-		}
 	}
 
 	switch (expevt) {
@@ -389,6 +383,11 @@ tlb_exception(struct proc *p, struct tra
 
 	/* Select address space */
 	if (usermode) {
+		if (!uvm_map_inentry(p, &p->p_spinentry, PROC_STACK(p),
+		    "[%s]%d/%d sp=%lx inside %lx-%lx: not MAP_STACK\n",
+		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
+			goto out;
+
 		TLB_ASSERT(p != NULL, "no curproc");
 		map = &p->p_vmspace->vm_map;
 		pmap = map->pmap;