Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 10:51 AM Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju > wrote: > > > > [Old patch from yesterday] > > > > After applying your patch on top on linux next tag 20201015 > > there are two observations, > > 1) i386 build failed. please find build error build > > Yes, this was expected. That patch explicitly only works on x86-64, > because 32-bit needs the double register handling for 64-bit values > (mainly loff_t). > > > 2) x86_64 kasan test PASS and the reported error not found. > > Ok, good. That confirms that the problem you reported is indeed the > register allocation. > > The patch I sent an hour ago (the one based on Rasmus' one from > yesterday) should fix things too, and - unlike yesterday's - work on > 32-bit. > > But I'll wait for confirmation (and hopefully a sign-off from Rasmus > so that I can give him authorship) before actually committing it. > > Linus My test vm failed to boot since commit d55564cfc222326e944893eff0c4118353e349ec x86: Make __put_user() generate an out-of-line call The patch also fixed it. Thanks! Song
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju wrote: > > [Old patch from yesterday] > > After applying your patch on top on linux next tag 20201015 > there are two observations, > 1) i386 build failed. please find build error build Yes, this was expected. That patch explicitly only works on x86-64, because 32-bit needs the double register handling for 64-bit values (mainly loff_t). > 2) x86_64 kasan test PASS and the reported error not found. Ok, good. That confirms that the problem you reported is indeed the register allocation. The patch I sent an hour ago (the one based on Rasmus' one from yesterday) should fix things too, and - unlike yesterday's - work on 32-bit. But I'll wait for confirmation (and hopefully a sign-off from Rasmus so that I can give him authorship) before actually committing it. Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, 23 Oct 2020 at 22:03, Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds > wrote: > > > > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes > > wrote: > > > > > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > > > could break something else), but the obvious first step of rearranging > > > so that the ptr argument is evaluated before the assignment to __val_pu > > > > Ack. We could do that. > > > > I'm more inclined to just bite the bullet and go back to the ugly > > conditional on the size that I had hoped to avoid, but if that turns > > out too ugly, mind signing off on your patch and I'll have that as a > > fallback? > > Actually, looking at that code, and the fact that we've used the > "register asm()" format forever for the get_user() side, I think your > approach is the right one. > > I'd rename the internal ptr variable to "__ptr_pu", and make sure the > assignments happen just before the asm call (with the __val_pu > assignment being the final thing). > > lso, it needs to be > > void __user *__ptr_pu; > > instead of > > __typeof__(ptr) __ptr = (ptr); > > because "ptr" may actually be an array, and we need to have the usual > C "array to pointer" conversions happen, rather than try to make > __ptr_pu be an array too. > > So the patch would become something like the appended instead, but I'd > still like your sign-off (and I'd put you as author of the fix). > > Narest, can you confirm that this patch fixes the issue for you? This patch fixed the reported problem. Tested-by: Naresh Kamboju Build location: https://builds.tuxbuild.com/uDAiW8jkN61oWoyxZDkEYA/ Test logs, https://lkft.validation.linaro.org/scheduler/job/1868045#L1597 - Naresh
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, 23 Oct 2020 at 08:35, Linus Torvalds wrote: > > On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > > > The kernel Naresh originally referred to is here: > > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ > > is unnecessary (because the 8-byte case is still just a single > register, no %eax:%edx games needed), it would be interesting to hear > if the attached patch fixes it. That would confirm that the problem > really is due to some register allocation issue interaction (or, > alternatively, it would tell me that there's something else going on). [Old patch from yesterday] After applying your patch on top on linux next tag 20201015 there are two observations, 1) i386 build failed. please find build error build 2) x86_64 kasan test PASS and the reported error not found. i386 build failure, -- make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=i386 HOSTCC=gcc CC="sccache gcc" O=build # In file included from ../include/linux/uaccess.h:11, from ../arch/x86/include/asm/fpu/xstate.h:5, from ../arch/x86/include/asm/pgtable.h:26, from ../include/linux/pgtable.h:6, from ../include/linux/mm.h:33, from ../include/linux/memblock.h:13, from ../fs/proc/page.c:2: ../fs/proc/page.c: In function ‘kpagecgroup_read’: ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../fs/proc/page.c:307:7: note: in expansion of macro ‘put_user’ 307 | if (put_user(ino, out)) { | ^~~~ make[3]: *** [../scripts/Makefile.build:283: fs/proc/page.o] Error 1 make[3]: Target '__build' not remade because of errors. make[2]: *** [../scripts/Makefile.build:500: fs/proc] Error 2 In file included from ../include/linux/uaccess.h:11, from ../include/linux/sched/task.h:11, from ../include/linux/sched/signal.h:9, from ../include/linux/rcuwait.h:6, from ../include/linux/percpu-rwsem.h:7, from ../include/linux/fs.h:33, from ../include/linux/cgroup.h:17, from ../include/linux/memcontrol.h:13, from ../include/linux/swap.h:9, from ../include/linux/suspend.h:5, from ../kernel/power/user.c:10: ../kernel/power/user.c: In function ‘snapshot_ioctl’: ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:340:11: note: in expansion of macro ‘put_user’ 340 | error = put_user(size, (loff_t __user *)arg); | ^~~~ ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:346:11: note: in expansion of macro ‘put_user’ 346 | error = put_user(size, (loff_t __user *)arg); | ^~~~ ../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’ 217 | asm volatile("call __" #fn "_%P[size]"\ | ^~~ ../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’ 244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); }) |^~~~ ../kernel/power/user.c:357:12: note: in expansion of macro ‘put_user’ 357 |error = put_user(offset, (loff_t __user *)arg); |^~~~ x86_64 Kasan tested and the reported issue not found. https://lkft.validation.linaro.org/scheduler/job/1868029#L2374 - Naresh
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds wrote: > > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes > wrote: > > > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > > could break something else), but the obvious first step of rearranging > > so that the ptr argument is evaluated before the assignment to __val_pu > > Ack. We could do that. > > I'm more inclined to just bite the bullet and go back to the ugly > conditional on the size that I had hoped to avoid, but if that turns > out too ugly, mind signing off on your patch and I'll have that as a > fallback? Actually, looking at that code, and the fact that we've used the "register asm()" format forever for the get_user() side, I think your approach is the right one. I'd rename the internal ptr variable to "__ptr_pu", and make sure the assignments happen just before the asm call (with the __val_pu assignment being the final thing). lso, it needs to be void __user *__ptr_pu; instead of __typeof__(ptr) __ptr = (ptr); because "ptr" may actually be an array, and we need to have the usual C "array to pointer" conversions happen, rather than try to make __ptr_pu be an array too. So the patch would become something like the appended instead, but I'd still like your sign-off (and I'd put you as author of the fix). Narest, can you confirm that this patch fixes the issue for you? Linus patch Description: Binary data
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes wrote: > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or > could break something else), but the obvious first step of rearranging > so that the ptr argument is evaluated before the assignment to __val_pu Ack. We could do that. I'm more inclined to just bite the bullet and go back to the ugly conditional on the size that I had hoped to avoid, but if that turns out too ugly, mind signing off on your patch and I'll have that as a fallback? Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 10:02 PM Sean Christopherson wrote: > > I haven't reproduced the crash, but I did find a smoking gun that confirms the > "register shenanigans are evil shenanigans" theory. I ran into a similar > thing > recently where a seemingly innocuous line of code after loading a value into a > register variable wreaked havoc because it clobbered the input register. Yup, that certainly looks like the smoking gun. Thanks for finding an example of this, clearly I'll have to either go back to the "conditionally use 'A' or 'a' depending on size" model, or perhaps try Rasmus' patch. Linus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On 23/10/2020 07.02, Sean Christopherson wrote: > On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote: >> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: >>> >>> The kernel Naresh originally referred to is here: >>> https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ >> >> Thanks. >> >> And when I started looking at it, I realized that my original idea >> ("just look for __put_user_nocheck_X calls, there aren't so many of >> those") was garbage, and that I was just being stupid. >> >> Yes, the commit that broke was about __put_user(), but in order to not >> duplicate all the code, it re-used the regular put_user() >> infrastructure, and so all the normal put_user() calls are potential >> problem spots too if this is about the compiler interaction with KASAN >> and the asm changes. >> >> So it's not just a couple of special cases to look at, it's all the >> normal cases too. >> >> Ok, back to the drawing board, but I think reverting it is probably >> the right thing to do if I can't think of something smart. >> >> That said, since you see this on x86-64, where the whole ugly trick with that >> >>register asm("%"_ASM_AX) >> >> is unnecessary (because the 8-byte case is still just a single >> register, no %eax:%edx games needed), it would be interesting to hear >> if the attached patch fixes it. That would confirm that the problem >> really is due to some register allocation issue interaction (or, >> alternatively, it would tell me that there's something else going on). > > I haven't reproduced the crash, but I did find a smoking gun that confirms the > "register shenanigans are evil shenanigans" theory. I ran into a similar > thing > recently where a seemingly innocuous line of code after loading a value into a > register variable wreaked havoc because it clobbered the input register. > > This put_user() in schedule_tail(): > >if (current->set_child_tid) >put_user(task_pid_vnr(current), current->set_child_tid); > > generates the following assembly with KASAN out-of-line: > >0x810dccc9 <+73>: xor%edx,%edx >0x810dcccb <+75>: xor%esi,%esi >0x810dcccd <+77>: mov%rbp,%rdi >0x810dccd0 <+80>: callq 0x810bf5e0 <__task_pid_nr_ns> >0x810dccd5 <+85>: mov%r12,%rdi >0x810dccd8 <+88>: callq 0x81388c60 <__asan_load8> >0x810dccdd <+93>: mov0x590(%rbp),%rcx >0x810dcce4 <+100>: callq 0x817708a0 <__put_user_4> >0x810dcce9 <+105>: pop%rbx >0x810dccea <+106>: pop%rbp >0x810dcceb <+107>: pop%r12 > > __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by > __asan_load8()'s check on current for the current->set_child_tid dereference. > Yup, and you don't need KASAN to implicitly generate function calls for you. With x86_64 defconfig, I get extern u64 __user *get_destination(int x, int y); void pu_test(void) { u64 big = 0x1234abcd5678; if (put_user(big, get_destination(4, 5))) pr_warn("uh"); } to generate 4d60 : 4d60: 53 push %rbx 4d61: be 05 00 00 00 mov$0x5,%esi 4d66: bf 04 00 00 00 mov$0x4,%edi 4d6b: e8 00 00 00 00 callq 4d70 4d6c: R_X86_64_PC32 get_destination-0x4 4d70: 48 89 c1mov%rax,%rcx 4d73: e8 00 00 00 00 callq 4d78 4d74: R_X86_64_PC32 __put_user_8-0x4 4d78: 85 c9 test %ecx,%ecx 4d7a: 75 02 jne4d7e 4d7c: 5b pop%rbx 4d7d: c3 retq 4d7e: 5b pop%rbx 4d7f: 48 c7 c7 00 00 00 00mov$0x0,%rdi 4d82: R_X86_64_32S .rodata.str1.1+0xfa 4d86: e9 00 00 00 00 jmpq 4d8b 4d87: R_X86_64_PC32 printk-0x4 That's certainly garbage. Now, I don't know if it's a sufficient fix (or could break something else), but the obvious first step of rearranging so that the ptr argument is evaluated before the assignment to __val_pu diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 477c503f2753..b5d3290fcd09 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void); #define do_put_user_call(fn,x,ptr) \ ({ \ int __ret_pu; \ - register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); \ + __typeof__(ptr) __ptr = (ptr); \ + register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX)
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote: > On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > > > The kernel Naresh originally referred to is here: > > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ > > Thanks. > > And when I started looking at it, I realized that my original idea > ("just look for __put_user_nocheck_X calls, there aren't so many of > those") was garbage, and that I was just being stupid. > > Yes, the commit that broke was about __put_user(), but in order to not > duplicate all the code, it re-used the regular put_user() > infrastructure, and so all the normal put_user() calls are potential > problem spots too if this is about the compiler interaction with KASAN > and the asm changes. > > So it's not just a couple of special cases to look at, it's all the > normal cases too. > > Ok, back to the drawing board, but I think reverting it is probably > the right thing to do if I can't think of something smart. > > That said, since you see this on x86-64, where the whole ugly trick with that > >register asm("%"_ASM_AX) > > is unnecessary (because the 8-byte case is still just a single > register, no %eax:%edx games needed), it would be interesting to hear > if the attached patch fixes it. That would confirm that the problem > really is due to some register allocation issue interaction (or, > alternatively, it would tell me that there's something else going on). I haven't reproduced the crash, but I did find a smoking gun that confirms the "register shenanigans are evil shenanigans" theory. I ran into a similar thing recently where a seemingly innocuous line of code after loading a value into a register variable wreaked havoc because it clobbered the input register. This put_user() in schedule_tail(): if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); generates the following assembly with KASAN out-of-line: 0x810dccc9 <+73>: xor%edx,%edx 0x810dcccb <+75>: xor%esi,%esi 0x810dcccd <+77>: mov%rbp,%rdi 0x810dccd0 <+80>: callq 0x810bf5e0 <__task_pid_nr_ns> 0x810dccd5 <+85>: mov%r12,%rdi 0x810dccd8 <+88>: callq 0x81388c60 <__asan_load8> 0x810dccdd <+93>: mov0x590(%rbp),%rcx 0x810dcce4 <+100>: callq 0x817708a0 <__put_user_4> 0x810dcce9 <+105>: pop%rbx 0x810dccea <+106>: pop%rbp 0x810dcceb <+107>: pop%r12 __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by __asan_load8()'s check on current for the current->set_child_tid dereference.
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: > > The kernel Naresh originally referred to is here: > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ Thanks. And when I started looking at it, I realized that my original idea ("just look for __put_user_nocheck_X calls, there aren't so many of those") was garbage, and that I was just being stupid. Yes, the commit that broke was about __put_user(), but in order to not duplicate all the code, it re-used the regular put_user() infrastructure, and so all the normal put_user() calls are potential problem spots too if this is about the compiler interaction with KASAN and the asm changes. So it's not just a couple of special cases to look at, it's all the normal cases too. Ok, back to the drawing board, but I think reverting it is probably the right thing to do if I can't think of something smart. That said, since you see this on x86-64, where the whole ugly trick with that register asm("%"_ASM_AX) is unnecessary (because the 8-byte case is still just a single register, no %eax:%edx games needed), it would be interesting to hear if the attached patch fixes it. That would confirm that the problem really is due to some register allocation issue interaction (or, alternatively, it would tell me that there's something else going on). Linus patch Description: Binary data
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
Hello! On Thu, 22 Oct 2020 at 19:11, Linus Torvalds wrote: > On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds > Would you mind sending me the problematic vmlinux file in private (or, > likely better - a pointer to some place I can download it, it's going > to be huge). The kernel Naresh originally referred to is here: https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ Greetings! Daniel Díaz daniel.d...@linaro.org
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 5:11 PM Linus Torvalds wrote: > > In particular, I wonder if it's that KASAN causes some reload pattern, > and the whole > > register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); > .. > asm volatile(.. "r" (__val_pu) ..) > > thing causes problems. That pattern isn't new (see the same pattern and the comment above get_user). But our previous use of that pattern had it as an output of the asm, and the new use is as an input. That obviously shouldn't matter, but if it's some odd compiler code generation interaction, all bets are off.. Linus
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds wrote: > > Thanks. Very funky, but thanks. I've been running that commit on my > machine for over half a year, and it still looks "trivially correct" > to me, but let me go look at it one more time. Can't argue with a > reliable bisect and revert.. Hmm. The fact that it only happens with KASAN makes me suspect it's some bad interaction with the inline asm syntax change (and explains why I've run with this for half a year without issues). In particular, I wonder if it's that KASAN causes some reload pattern, and the whole register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); .. asm volatile(.. "r" (__val_pu) ..) thing causes problems. That's an ugly pattern, but it's written that way to get gcc to handle the 64-bit case properly (with the value in %rax:%rdx). It turns out that the decode of the user-mode SIGSEGV code is a variation of system calls, ie 0: b8 18 00 00 00mov$0x18,%eax 5: 0f 05syscall 7: 48 3d 01 f0 ff ffcmp$0xf001,%rax d: 73 01jae0x10 f:* c3retq<-- trapping instruction or 0: 41 52push %r10 2: 52push %rdx 3: 4d 31 d2 xor%r10,%r10 6: ba 02 00 00 00mov$0x2,%edx b: be 80 00 00 00mov$0x80,%esi 10: 39 d0cmp%edx,%eax 12: 75 07jne0x1b 14: b8 ca 00 00 00mov$0xca,%eax 19: 0f 05syscall 1b: 89 d0mov%edx,%eax 1d: 87 07xchg %eax,(%rdi) 1f: 85 c0test %eax,%eax 21: 75 f1jne0x14 23:* 5apop%rdx <-- trapping instruction 24: 41 5apop%r10 26: c3retq so in both cases it looks like 'syscall' returned with a bad stack pointer. Which is certainly a sign of some code generation issue. Very annoying, because it probably means that it's compiler-specific too. And that "syscall 018" looks very odd. I think that's sched_yield() on x86-64, which doesn't have any __put_user() cases at all.. Would you mind sending me the problematic vmlinux file in private (or, likely better - a pointer to some place I can download it, it's going to be huge). Linus
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Thu, Oct 22, 2020 at 1:55 PM Naresh Kamboju wrote: > > The bad commit points to, > > commit d55564cfc222326e944893eff0c4118353e349ec > x86: Make __put_user() generate an out-of-line call > > I have reverted this single patch and confirmed the reported > problem is not seen anymore. Thanks. Very funky, but thanks. I've been running that commit on my machine for over half a year, and it still looks "trivially correct" to me, but let me go look at it one more time. Can't argue with a reliable bisect and revert.. Linus
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Wed, 21 Oct 2020 at 22:52, Naresh Kamboju wrote: > > On Wed, 21 Oct 2020 at 22:35, Linus Torvalds > wrote: > > > > On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju > > wrote: > > > > > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) > > > tested on > > > x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds. > > > This regression started happening from next-20201015 nowards > > > > Is it repeatable enough to be bisectable? > > Yes. This is easily reproducible. > I will bisect and report here. The bad commit points to, commit d55564cfc222326e944893eff0c4118353e349ec x86: Make __put_user() generate an out-of-line call I have reverted this single patch and confirmed the reported problem is not seen anymore. - Naresh
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Wed, 21 Oct 2020 at 22:35, Linus Torvalds wrote: > > On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju > wrote: > > > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) > > tested on > > x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds. > > This regression started happening from next-20201015 nowards > > Is it repeatable enough to be bisectable? Yes. This is easily reproducible. I will bisect and report here. > > Linus - Naresh
Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju wrote: > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested > on > x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds. > This regression started happening from next-20201015 nowards Is it repeatable enough to be bisectable? Linus
mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested on x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds. This regression started happening from next-20201015 nowards There are few more regression on linux next, ltp-cve-tests: * cve-2015-7550 ltp-math-tests: * float_bessel * float_exp_log * float_iperb * float_power * float_trigo ltp-mm-tests: * mallocstress01 * mtest05 * mtest06_3 ltp-syscalls-tests: * clone08 * clone301 * fcntl34 * fcntl34_64 * fcntl36 * fcntl36_64 * keyctl02 * rt_tgsigqueueinfo01 metadata: git branch: master git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git describe: next-20201015 kernel-config: https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/kernel.config steps to reproduce: # boot x86_64 with KASAN enabled kernel and run tests # cd /opt/ltp/testcases/bin # ./mmstress # ./mmap3 -x 0.002 -p # ./mallocstress mtest05 (mmstress) : mmstress0 TINFO : run mmstress -h for all options mmstress0 TINFO : test1: Test case tests the race condition between simultaneous read faults in the same address space. [ 279.469207] mmstress[1309]: segfault at 7f3d71a36ee8 ip 7f3d77132bdf sp 7f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000] [ 279.469305] audit: type=1701 audit(1602818315.656:3): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1307 comm=\"mmstress\" exe=\"/opt/ltp/testcases/bin/mmstress\" sig=11 res=1 [ 279.481636] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff ff 73 01 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f [ 279.498212] mmstress[1311]: segfault at 7f3d70a34ee8 ip 7f3d77132bdf sp 7f3d70a34ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000] [ 279.516839] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff ff 73 01 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s tst_test.c:1246: INFO: Timeout per run is 0h 09m 00s tst_test.c:1291: BROK: Test killed by SIGBUS! mtest06_3 (mmap3 -x 0.002 -p) : --- mmap3.c:154: INFO: Seed 22 mmap3.c:155: INFO: Number of loops 1000 mmap3.c:156: INFO: Number of threads 40 mmap3.c:157: INFO: MAP[ 286.657788] mmap3[1350]: segfault at 7f12179d4680 ip 7f121859951d sp 7f12179d1e10 error 6 in libpthread-2.27.so[7f1218589000+19000] _PRIVATE = 1 mm[ 286.671184] Code: c4 10 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 15 99 8a 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 48 8b 15 85 8a 20 00 f7 d8 <64> 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 b8 01 00 [ 286.677386] audit: type=1701 audit(1602818322.844:6): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1348 comm=\"mmap3\" exe=\"/opt/ltp/testcases/bin/mmap3\" sig=11 res=1 ap3.c:158: INFO: Execution time 0.002000H mallocstress01 (mallocstress) : -- pid[1496]: shmat_rd_wr(): shmget():success got segment id 32830 pid[1496]: do_shmat_shmadt(): got shmat address = 0x7f301eae9000 pid[1496]: shmat_rd_wr(): shmget():success got segment id 328[ 291.851376] mallocstress[1502]: segfault at 0 ip sp 7f80dea3ec50 error 14 30 pid[1496]: d[ 291.851466] mallocstress[1507]: segfault at 7f80dc239c98 ip 7f80df2bf81c sp 7f80dc239c98 error 4 o_shmat_shmadt()[ 291.851485] mallocstress[1505]: segfault at 7f80dd23bc38 ip 7f80df33fe93 sp 7f80dd23bc38 error 4 [ 291.851490] Code: 00 00 00 00 0f 1f 00 41 52 52 4d 31 d2 ba 02 00 00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 89 d0 87 07 85 c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52 c7 07 00 00 00 00 be : got shmat addr[ 291.851565] audit: type=1701 audit(1602818328.038:7): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1500 comm=\"mallocstress\" exe=\"/opt/ltp/testcases/bin/mallocstress\" sig=11 res=1 [ 291.852984] mallocstress[1504]: segfault at 7f80dda3cc38 ip 7f80df33fe93 sp 7f80dda3cc38 error 4 ess = 0x7f301e85[ 291.852988] Code: 00 00 00 00 0f 1f 00 41 52 52 4d 31 d2 ba 02 00 00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 89 d0 87 07 85 c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52 c7 07 00 00 00 00 be [ 291.853045] audit: type=1701 audit(1602818328.040:8): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1500 comm=\"mallocstress\" exe=\"/opt/ltp/testcases/bin/mallocstress\" sig=11 res=1 5000 tst_test.c[ 291.860373] Code: Unable to access opcode bytes at RIP 0xffd6. [ 291.860453] mallocstress[1506]: segfault at 7f80dca3ac98 ip 7f80df2bf81c sp 7f80dca3ac98 error 4 :1246: INFO: Tim[ 291.860654] audit: type=