Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 10:51 AM Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju > wrote: > > > > [Old patch from yesterday] > > > > After applying your patch on top on linux next tag 20201015 > > there are two observations, > > 1) i386 build failed. please find build error build > > Yes, this was expected. That patch explicitly only works on x86-64, > because 32-bit needs the double register handling for 64-bit values > (mainly loff_t). > > > 2) x86_64 kasan test PASS and the reported error not found. > > Ok, good. That confirms that the problem you reported is indeed the > register allocation. > > The patch I sent an hour ago (the one based on Rasmus' one from > yesterday) should fix things too, and - unlike yesterday's - work on > 32-bit. > > But I'll wait for confirmation (and hopefully a sign-off from Rasmus > so that I can give him authorship) before actually committing it. > > Linus My test vm failed to boot since commit d55564cfc222326e944893eff0c4118353e349ec x86: Make __put_user() generate an out-of-line call The patch also fixed it. Thanks! Song
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju wrote: > > [Old patch from yesterday] > > After applying your patch on top on linux next tag 20201015 > there are two observations, > 1) i386 build failed. please find build error build Yes, this was expected. That patch explicitly only works on x86-64, because 32-bit needs the double register handling for 64-bit values (mainly loff_t). > 2) x86_64 kasan test PASS and the reported error not found. Ok, good. That confirms that the problem you reported is indeed the register allocation. The patch I sent an hour ago (the one based on Rasmus' one from yesterday) should fix things too, and - unlike yesterday's - work on 32-bit. But I'll wait for confirmation (and hopefully a sign-off from Rasmus so that I can give him authorship) before actually committing it. Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, 23 Oct 2020 at 22:03, Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds > wrote: > > > > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes > > wrote: > > > > > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > > > could break something else), but the obvious first step of rearranging > > > so that the ptr argument is evaluated before the assignment to __val_pu > > > > Ack. We could do that. > > > > I'm more inclined to just bite the bullet and go back to the ugly > > conditional on the size that I had hoped to avoid, but if that turns > > out too ugly, mind signing off on your patch and I'll have that as a > > fallback? > > Actually, looking at that code, and the fact that we've used the > "register asm()" format forever for the get_user() side, I think your > approach is the right one. > > I'd rename the internal ptr variable to "__ptr_pu", and make sure the > assignments happen just before the asm call (with the __val_pu > assignment being the final thing). > > lso, it needs to be > > void __user *__ptr_pu; > > instead of > > __typeof__(ptr) __ptr = (ptr); > > because "ptr" may actually be an array, and we need to have the usual > C "array to pointer" conversions happen, rather than try to make > __ptr_pu be an array too. > > So the patch would become something like the appended instead, but I'd > still like your sign-off (and I'd put you as author of the fix). > > Narest, can you confirm that this patch fixes the issue for you? This patch fixed the reported problem. Tested-by: Naresh Kamboju Build location: https://builds.tuxbuild.com/uDAiW8jkN61oWoyxZDkEYA/ Test logs, https://lkft.validation.linaro.org/scheduler/job/1868045#L1597 - Naresh
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, 23 Oct 2020 at 08:35, Linus Torvalds wrote: > > On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > > > The kernel Naresh originally referred to is here: > > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ > > is unnecessary (because the 8-byte case is still just a single > register, no %eax:%edx games needed), it would be interesting to hear > if the attached patch fixes it. That would confirm that the problem > really is due to some register allocation issue interaction (or, > alternatively, it would tell me that there's something else going on). [Old patch from yesterday] After applying your patch on top on linux next tag 20201015 there are two observations, 1) i386 build failed. please find build error build 2) x86_64 kasan test PASS and the reported error not found. i386 build failure, -- make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=i386 HOSTCC=gcc CC="sccache gcc" O=build # In file included from ../include/linux/uaccess.h:11, from ../arch/x86/include/asm/fpu/xstate.h:5, from ../arch/x86/include/asm/pgtable.h:26, from ../include/linux/pgtable.h:6, from ../include/linux/mm.h:33, from ../include/linux/memblock.h:13, from ../fs/proc/page.c:2: ../fs/proc/page.c: In function ‘kpagecgroup_read’: ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../fs/proc/page.c:307:7: note: in expansion of macro ‘put_user’ 307 | if (put_user(ino, out)) { | ^~~~ make[3]: *** [../scripts/Makefile.build:283: fs/proc/page.o] Error 1 make[3]: Target '__build' not remade because of errors. make[2]: *** [../scripts/Makefile.build:500: fs/proc] Error 2 In file included from ../include/linux/uaccess.h:11, from ../include/linux/sched/task.h:11, from ../include/linux/sched/signal.h:9, from ../include/linux/rcuwait.h:6, from ../include/linux/percpu-rwsem.h:7, from ../include/linux/fs.h:33, from ../include/linux/cgroup.h:17, from ../include/linux/memcontrol.h:13, from ../include/linux/swap.h:9, from ../include/linux/suspend.h:5, from ../kernel/power/user.c:10: ../kernel/power/user.c: In function ‘snapshot_ioctl’: ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:340:11: note: in expansion of macro ‘put_user’ 340 | error = put_user(size, (loff_t __user *)arg); | ^~~~ ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:346:11: note: in expansion of macro ‘put_user’ 346 | error = put_user(size, (loff_t __user *)arg); | ^~~~ ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:357:12: note: in expansion of macro ‘put_user’ 357 |error = put_user(offset, (loff_t __user *)arg); |^~~~ x86_64 Kasan tested and the reported issue not found. https://lkft.validation.linaro.org/scheduler/job/1868029#L2374 - Naresh
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes > wrote: > > > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > > could break something else), but the obvious first step of rearranging > > so that the ptr argument is evaluated before the assignment to __val_pu > > Ack. We could do that. > > I'm more inclined to just bite the bullet and go back to the ugly > conditional on the size that I had hoped to avoid, but if that turns > out too ugly, mind signing off on your patch and I'll have that as a > fallback? Actually, looking at that code, and the fact that we've used the "register asm()" format forever for the get_user() side, I think your approach is the right one. I'd rename the internal ptr variable to "__ptr_pu", and make sure the assignments happen just before the asm call (with the __val_pu assignment being the final thing). lso, it needs to be void __user *__ptr_pu; instead of __typeof__(ptr) __ptr = (ptr); because "ptr" may actually be an array, and we need to have the usual C "array to pointer" conversions happen, rather than try to make __ptr_pu be an array too. So the patch would become something like the appended instead, but I'd still like your sign-off (and I'd put you as author of the fix). Narest, can you confirm that this patch fixes the issue for you? Linus patch Description: Binary data
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes wrote: > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > could break something else), but the obvious first step of rearranging > so that the ptr argument is evaluated before the assignment to __val_pu Ack. We could do that. I'm more inclined to just bite the bullet and go back to the ugly conditional on the size that I had hoped to avoid, but if that turns out too ugly, mind signing off on your patch and I'll have that as a fallback? Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 10:02 PM Sean Christopherson wrote: > > I haven't reproduced the crash, but I did find a smoking gun that confirms the > "register shenanigans are evil shenanigans" theory. I ran into a similar > thing > recently where a seemingly innocuous line of code after loading a value into a > register variable wreaked havoc because it clobbered the input register. Yup, that certainly looks like the smoking gun. Thanks for finding an example of this, clearly I'll have to either go back to the "conditionally use 'A' or 'a' depending on size" model, or perhaps try Rasmus' patch. Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On 23/10/2020 07.02, Sean Christopherson wrote: > On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote: >> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: >>> >>> The kernel Naresh originally referred to is here: >>> https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ >> >> Thanks. >> >> And when I started looking at it, I realized that my original idea >> ("just look for __put_user_nocheck_X calls, there aren't so many of >> those") was garbage, and that I was just being stupid. >> >> Yes, the commit that broke was about __put_user(), but in order to not >> duplicate all the code, it re-used the regular put_user() >> infrastructure, and so all the normal put_user() calls are potential >> problem spots too if this is about the compiler interaction with KASAN >> and the asm changes. >> >> So it's not just a couple of special cases to look at, it's all the >> normal cases too. >> >> Ok, back to the drawing board, but I think reverting it is probably >> the right thing to do if I can't think of something smart. >> >> That said, since you see this on x86-64, where the whole ugly trick with that >> >>register asm("%"_ASM_AX) >> >> is unnecessary (because the 8-byte case is still just a single >> register, no %eax:%edx games needed), it would be interesting to hear >> if the attached patch fixes it. That would confirm that the problem >> really is due to some register allocation issue interaction (or, >> alternatively, it would tell me that there's something else going on). > > I haven't reproduced the crash, but I did find a smoking gun that confirms the > "register shenanigans are evil shenanigans" theory. I ran into a similar > thing > recently where a seemingly innocuous line of code after loading a value into a > register variable wreaked havoc because it clobbered the input register. > > This put_user() in schedule_tail(): > >if (current->set_child_tid) >put_user(task_pid_vnr(current), current->set_child_tid); > > generates the following assembly with KASAN out-of-line: > >0x810dccc9 <+73>: xor%edx,%edx >0x810dcccb <+75>: xor%esi,%esi >0x810dcccd <+77>: mov%rbp,%rdi >0x810dccd0 <+80>: callq 0x810bf5e0 <__task_pid_nr_ns> >0x810dccd5 <+85>: mov%r12,%rdi >0x810dccd8 <+88>: callq 0x81388c60 <__asan_load8> >0x810dccdd <+93>: mov0x590(%rbp),%rcx >0x810dcce4 <+100>: callq 0x817708a0 <__put_user_4> >0x810dcce9 <+105>: pop%rbx >0x810dccea <+106>: pop%rbp >0x810dcceb <+107>: pop%r12 > > __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by > __asan_load8()'s check on current for the current->set_child_tid dereference. > Yup, and you don't need KASAN to implicitly generate function calls for you. With x86_64 defconfig, I get extern u64 __user *get_destination(int x, int y); void pu_test(void) { u64 big = 0x1234abcd5678; if (put_user(big, get_destination(4, 5))) pr_warn("uh"); } to generate 4d60 : 4d60: 53 push %rbx 4d61: be 05 00 00 00 mov$0x5,%esi 4d66: bf 04 00 00 00 mov$0x4,%edi 4d6b: e8 00 00 00 00 callq 4d70 4d6c: R_X86_64_PC32 get_destination-0x4 4d70: 48 89 c1mov%rax,%rcx 4d73: e8 00 00 00 00 callq 4d78 4d74: R_X86_64_PC32 __put_user_8-0x4 4d78: 85 c9 test %ecx,%ecx 4d7a: 75 02 jne4d7e 4d7c: 5b pop%rbx 4d7d: c3 retq 4d7e: 5b pop%rbx 4d7f: 48 c7 c7 00 00 00 00mov$0x0,%rdi 4d82: R_X86_64_32S .rodata.str1.1+0xfa 4d86: e9 00 00 00 00 jmpq 4d8b 4d87: R_X86_64_PC32 printk-0x4 That's certainly garbage. Now, I don't know if it's a sufficient fix (or could break something else), but the obvious first step of rearranging so that the ptr argument is evaluated before the assignment to __val_pu diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 477c503f2753..b5d3290fcd09 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void); #define do_put_user_call(fn,x,ptr) \ ({ \ int __ret_pu; \ - register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); \ + __typeof__(ptr) __ptr = (ptr); \ + register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX)
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote: > On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > > > The kernel Naresh originally referred to is here: > > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ > > Thanks. > > And when I started looking at it, I realized that my original idea > ("just look for __put_user_nocheck_X calls, there aren't so many of > those") was garbage, and that I was just being stupid. > > Yes, the commit that broke was about __put_user(), but in order to not > duplicate all the code, it re-used the regular put_user() > infrastructure, and so all the normal put_user() calls are potential > problem spots too if this is about the compiler interaction with KASAN > and the asm changes. > > So it's not just a couple of special cases to look at, it's all the > normal cases too. > > Ok, back to the drawing board, but I think reverting it is probably > the right thing to do if I can't think of something smart. > > That said, since you see this on x86-64, where the whole ugly trick with that > >register asm("%"_ASM_AX) > > is unnecessary (because the 8-byte case is still just a single > register, no %eax:%edx games needed), it would be interesting to hear > if the attached patch fixes it. That would confirm that the problem > really is due to some register allocation issue interaction (or, > alternatively, it would tell me that there's something else going on). I haven't reproduced the crash, but I did find a smoking gun that confirms the "register shenanigans are evil shenanigans" theory. I ran into a similar thing recently where a seemingly innocuous line of code after loading a value into a register variable wreaked havoc because it clobbered the input register. This put_user() in schedule_tail(): if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); generates the following assembly with KASAN out-of-line: 0x810dccc9 <+73>: xor%edx,%edx 0x810dcccb <+75>: xor%esi,%esi 0x810dcccd <+77>: mov%rbp,%rdi 0x810dccd0 <+80>: callq 0x810bf5e0 <__task_pid_nr_ns> 0x810dccd5 <+85>: mov%r12,%rdi 0x810dccd8 <+88>: callq 0x81388c60 <__asan_load8> 0x810dccdd <+93>: mov0x590(%rbp),%rcx 0x810dcce4 <+100>: callq 0x817708a0 <__put_user_4> 0x810dcce9 <+105>: pop%rbx 0x810dccea <+106>: pop%rbp 0x810dcceb <+107>: pop%r12 __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by __asan_load8()'s check on current for the current->set_child_tid dereference.
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > The kernel Naresh originally referred to is here: > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ Thanks. And when I started looking at it, I realized that my original idea ("just look for __put_user_nocheck_X calls, there aren't so many of those") was garbage, and that I was just being stupid. Yes, the commit that broke was about __put_user(), but in order to not duplicate all the code, it re-used the regular put_user() infrastructure, and so all the normal put_user() calls are potential problem spots too if this is about the compiler interaction with KASAN and the asm changes. So it's not just a couple of special cases to look at, it's all the normal cases too. Ok, back to the drawing board, but I think reverting it is probably the right thing to do if I can't think of something smart. That said, since you see this on x86-64, where the whole ugly trick with that register asm("%"_ASM_AX) is unnecessary (because the 8-byte case is still just a single register, no %eax:%edx games needed), it would be interesting to hear if the attached patch fixes it. That would confirm that the problem really is due to some register allocation issue interaction (or, alternatively, it would tell me that there's something else going on). Linus patch Description: Binary data
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
Hello! On Thu, 22 Oct 2020 at 19:11, Linus Torvalds wrote: > On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds > Would you mind sending me the problematic vmlinux file in private (or, > likely better - a pointer to some place I can download it, it's going > to be huge). The kernel Naresh originally referred to is here: https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ Greetings! Daniel Díaz daniel.d...@linaro.org