Re: [PATCH] x86: panic when a kernel stack overflow is detected
On Sun, Jul 28, 2019 at 08:53:58PM -0700, Andy Lutomirski wrote:
> On Sun, Jul 28, 2019 at 6:59 PM Daniel Axtens wrote:
> >
> > Currently, when a kernel stack overflow is detected via VMAP_STACK,
> > the task is killed with die().
> >
> > This isn't safe, because we don't know how that process has affected
> > kernel state. In particular, we don't know what locks have been taken.
> > For example, we can hit a case with lkdtm where a thread takes a
> > stack overflow in printk() after taking the logbuf_lock. In that case,
> > we deadlock when the kernel next does a printk.
> >
> > Do not attempt to kill the process when a kernel stack overflow is
> > detected. The system state is unknown, the only safe thing to do is
> > panic(). (panic() also prints without taking locks so a useful debug
> > splat is printed even when logbuf_lock is held.)
>
> The thing I don't like about this is that it reduces the chance that
> we successfully log anything to disk.
>
> PeterZ, do you have any useful input here? I wonder if we could do
> something like printk_oh_crap() that is just printk() except that it
> panics if it fails to return after a few seconds.

People are already hard at work rewriting printk. The current thing is
unfixable.

Then again, I don't know if there are any sane options aside from early
serial. Still, mucking with printk won't help you at all if the task is
holding some other/filesystem lock required to do that writeback.
Re: [PATCH] x86: panic when a kernel stack overflow is detected
On Sun, Jul 28, 2019 at 6:59 PM Daniel Axtens wrote:
>
> Currently, when a kernel stack overflow is detected via VMAP_STACK,
> the task is killed with die().
>
> This isn't safe, because we don't know how that process has affected
> kernel state. In particular, we don't know what locks have been taken.
> For example, we can hit a case with lkdtm where a thread takes a
> stack overflow in printk() after taking the logbuf_lock. In that case,
> we deadlock when the kernel next does a printk.
>
> Do not attempt to kill the process when a kernel stack overflow is
> detected. The system state is unknown, the only safe thing to do is
> panic(). (panic() also prints without taking locks so a useful debug
> splat is printed even when logbuf_lock is held.)

The thing I don't like about this is that it reduces the chance that
we successfully log anything to disk.

PeterZ, do you have any useful input here? I wonder if we could do
something like printk_oh_crap() that is just printk() except that it
panics if it fails to return after a few seconds.

--Andy
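[Editor's sketch] To make the printk_oh_crap() idea concrete, here is a minimal user-space model, not kernel code. Every name in it (sketch_log_lock, sketch_panicked, sketch_printk_oh_crap) is hypothetical, and it swaps in a simpler technique than the one proposed: it bounds the wait for the log lock instead of watching for the whole call to return. A flag stands in for panic() so the model can be exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for logbuf_lock; this is not the kernel's lock. */
static atomic_flag sketch_log_lock = ATOMIC_FLAG_INIT;

/* Set instead of calling panic(), so the sketch can run in user space. */
static bool sketch_panicked;

/*
 * Try to take the log lock for roughly timeout_s seconds (time() has
 * one-second granularity). Returns true if the message could be
 * logged, false if the sketch gave up and escalated to "panic".
 */
static bool sketch_printk_oh_crap(double timeout_s)
{
	time_t start = time(NULL);

	assert(timeout_s >= 0.0);

	do {
		if (!atomic_flag_test_and_set(&sketch_log_lock)) {
			/* Lock acquired: the message would be emitted here. */
			atomic_flag_clear(&sketch_log_lock);
			return true;
		}
	} while (difftime(time(NULL), start) < timeout_s);

	/* The lock never became free: assume its holder is gone. */
	sketch_panicked = true;
	return false;
}
```

The model only covers a stuck log lock. It does nothing for a task that died holding some other lock needed to get the log to disk.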
[PATCH] x86: panic when a kernel stack overflow is detected
Currently, when a kernel stack overflow is detected via VMAP_STACK,
the task is killed with die().

This isn't safe, because we don't know how that process has affected
kernel state. In particular, we don't know what locks have been taken.
For example, we can hit a case with lkdtm where a thread takes a
stack overflow in printk() after taking the logbuf_lock. In that case,
we deadlock when the kernel next does a printk.

Do not attempt to kill the process when a kernel stack overflow is
detected. The system state is unknown, the only safe thing to do is
panic(). (panic() also prints without taking locks so a useful debug
splat is printed even when logbuf_lock is held.)

Reported-by: Marco Elver
Signed-off-by: Daniel Axtens
---
 arch/x86/kernel/traps.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..bfb0ec667c09 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -301,13 +301,14 @@ __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
 						unsigned long fault_address)
 {
-	printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
-		 (void *)fault_address, current->stack,
-		 (char *)current->stack + THREAD_SIZE - 1);
-	die(message, regs, 0);
+	/*
+	 * It's not safe to kill the task, as it's in kernel space and
+	 * might be holding important locks. Just panic.
+	 */
 
-	/* Be absolutely certain we don't return. */
-	panic("%s", message);
+	panic("%s - stack guard page was hit at %p (stack is %p..%p)",
+	      message, (void *)fault_address, current->stack,
+	      (char *)current->stack + THREAD_SIZE - 1);
 }
 #endif

-- 
2.20.1
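[Editor's sketch] The deadlock described in the commit message can be modelled in a few lines of user-space C. This is a toy, not the kernel's printk: model_logbuf_lock, model_printk() and the result enum are hypothetical names, and a try-lock is used so the model reports the would-be deadlock instead of hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for logbuf_lock. */
static atomic_flag model_logbuf_lock = ATOMIC_FLAG_INIT;

enum model_result {
	MODEL_LOGGED,            /* lock taken, message logged, lock released */
	MODEL_DIED_HOLDING_LOCK, /* task killed mid-print, lock never released */
	MODEL_WOULD_DEADLOCK,    /* lock still held by a task that no longer exists */
};

/*
 * Model of a print call. If overflow_midway is true, the caller "hits
 * the stack guard page" after taking the lock and is killed, as with
 * die(), so the unlock never runs.
 */
static enum model_result model_printk(bool overflow_midway)
{
	/* test_and_set returns the previous value: true means already held. */
	if (atomic_flag_test_and_set(&model_logbuf_lock))
		return MODEL_WOULD_DEADLOCK;

	if (overflow_midway)
		return MODEL_DIED_HOLDING_LOCK;

	atomic_flag_clear(&model_logbuf_lock);
	return MODEL_LOGGED;
}
```

Calling model_printk(true) once and then model_printk(false) yields MODEL_WOULD_DEADLOCK on the second call. That is the state the patch avoids by panicking instead of killing the task.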