Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Song Liu
On Fri, Oct 23, 2020 at 10:51 AM Linus Torvalds
 wrote:
>
> On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju
>  wrote:
> >
> > [Old patch from yesterday]
> >
> > After applying your patch on top of linux-next tag 20201015
> > there are two observations,
> >   1) i386 build failed. Please find the build error below.
>
> Yes, this was expected. That patch explicitly only works on x86-64,
> because 32-bit needs the double register handling for 64-bit values
> (mainly loff_t).
>
> >   2) x86_64 kasan test PASS and the reported error not found.
>
> Ok, good. That confirms that the problem you reported is indeed the
> register allocation.
>
> The patch I sent an hour ago (the one based on Rasmus' one from
> yesterday) should fix things too, and - unlike yesterday's - work on
> 32-bit.
>
> But I'll wait for confirmation (and hopefully a sign-off from Rasmus
> so that I can give him authorship) before actually committing it.
>
>   Linus

My test vm failed to boot since

commit d55564cfc222326e944893eff0c4118353e349ec
x86: Make __put_user() generate an out-of-line call

The patch also fixed it.

Thanks!
Song


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Linus Torvalds
On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju
 wrote:
>
> [Old patch from yesterday]
>
> After applying your patch on top of linux-next tag 20201015
> there are two observations,
>   1) i386 build failed. Please find the build error below.

Yes, this was expected. That patch explicitly only works on x86-64,
because 32-bit needs the double register handling for 64-bit values
(mainly loff_t).

>   2) x86_64 kasan test PASS and the reported error not found.

Ok, good. That confirms that the problem you reported is indeed the
register allocation.

The patch I sent an hour ago (the one based on Rasmus' one from
yesterday) should fix things too, and - unlike yesterday's - work on
32-bit.

But I'll wait for confirmation (and hopefully a sign-off from Rasmus
so that I can give him authorship) before actually committing it.

  Linus


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Naresh Kamboju
On Fri, 23 Oct 2020 at 22:03, Linus Torvalds
 wrote:
>
> On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds
>  wrote:
> >
> > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
> >  wrote:
> > >
> > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> > > could break something else), but the obvious first step of rearranging
> > > so that the ptr argument is evaluated before the assignment to __val_pu
> >
> > Ack. We could do that.
> >
> > I'm more inclined to just bite the bullet and go back to the ugly
> > conditional on the size that I had hoped to avoid, but if that turns
> > out too ugly, mind signing off on your patch and I'll have that as a
> > fallback?
>
> Actually, looking at that code, and the fact that we've used the
> "register asm()" format forever for the get_user() side, I think your
> approach is the right one.
>
> I'd rename the internal ptr variable to "__ptr_pu", and make sure the
> assignments happen just before the asm call (with the __val_pu
> assignment being the final thing).
>
> Also, it needs to be
>
> void __user *__ptr_pu;
>
> instead of
>
> __typeof__(ptr) __ptr = (ptr);
>
> because "ptr" may actually be an array, and we need to have the usual
> C "array to pointer" conversions happen, rather than try to make
> __ptr_pu be an array too.
>
> So the patch would become something like the appended instead, but I'd
> still like your sign-off (and I'd put you as author of the fix).
>
> Naresh, can you confirm that this patch fixes the issue for you?

This patch fixed the reported problem.

Tested-by: Naresh Kamboju 

Build location:
https://builds.tuxbuild.com/uDAiW8jkN61oWoyxZDkEYA/

Test logs,
https://lkft.validation.linaro.org/scheduler/job/1868045#L1597

- Naresh


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Naresh Kamboju
On Fri, 23 Oct 2020 at 08:35, Linus Torvalds
 wrote:
>
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz  wrote:
> >
> > The kernel Naresh originally referred to is here:
> >   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

[Old patch from yesterday]

After applying your patch on top of linux-next tag 20201015
there are two observations,
  1) i386 build failed. Please find the build error below.
  2) x86_64 kasan test PASS and the reported error not found.


i386 build failure,
--
make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=i386 HOSTCC=gcc CC="sccache gcc" O=build
#
In file included from ../include/linux/uaccess.h:11,
                 from ../arch/x86/include/asm/fpu/xstate.h:5,
                 from ../arch/x86/include/asm/pgtable.h:26,
                 from ../include/linux/pgtable.h:6,
                 from ../include/linux/mm.h:33,
                 from ../include/linux/memblock.h:13,
                 from ../fs/proc/page.c:2:
../fs/proc/page.c: In function ‘kpagecgroup_read’:
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"\
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~
../fs/proc/page.c:307:7: note: in expansion of macro ‘put_user’
  307 |   if (put_user(ino, out)) {
      |       ^~~~
make[3]: *** [../scripts/Makefile.build:283: fs/proc/page.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [../scripts/Makefile.build:500: fs/proc] Error 2
In file included from ../include/linux/uaccess.h:11,
                 from ../include/linux/sched/task.h:11,
                 from ../include/linux/sched/signal.h:9,
                 from ../include/linux/rcuwait.h:6,
                 from ../include/linux/percpu-rwsem.h:7,
                 from ../include/linux/fs.h:33,
                 from ../include/linux/cgroup.h:17,
                 from ../include/linux/memcontrol.h:13,
                 from ../include/linux/swap.h:9,
                 from ../include/linux/suspend.h:5,
                 from ../kernel/power/user.c:10:
../kernel/power/user.c: In function ‘snapshot_ioctl’:
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"\
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~
../kernel/power/user.c:340:11: note: in expansion of macro ‘put_user’
  340 |   error = put_user(size, (loff_t __user *)arg);
      |           ^~~~
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"\
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~
../kernel/power/user.c:346:11: note: in expansion of macro ‘put_user’
  346 |   error = put_user(size, (loff_t __user *)arg);
      |           ^~~~
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"\
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro ‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~
../kernel/power/user.c:357:12: note: in expansion of macro ‘put_user’
  357 |    error = put_user(offset, (loff_t __user *)arg);
      |            ^~~~


x86_64 Kasan tested and the reported issue not found.
https://lkft.validation.linaro.org/scheduler/job/1868029#L2374

- Naresh


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Linus Torvalds
On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds
 wrote:
>
> On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
>  wrote:
> >
> > That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> > could break something else), but the obvious first step of rearranging
> > so that the ptr argument is evaluated before the assignment to __val_pu
>
> Ack. We could do that.
>
> I'm more inclined to just bite the bullet and go back to the ugly
> conditional on the size that I had hoped to avoid, but if that turns
> out too ugly, mind signing off on your patch and I'll have that as a
> fallback?

Actually, looking at that code, and the fact that we've used the
"register asm()" format forever for the get_user() side, I think your
approach is the right one.

I'd rename the internal ptr variable to "__ptr_pu", and make sure the
assignments happen just before the asm call (with the __val_pu
assignment being the final thing).

Also, it needs to be

void __user *__ptr_pu;

instead of

__typeof__(ptr) __ptr = (ptr);

because "ptr" may actually be an array, and we need to have the usual
C "array to pointer" conversions happen, rather than try to make
__ptr_pu be an array too.

So the patch would become something like the appended instead, but I'd
still like your sign-off (and I'd put you as author of the fix).

Naresh, can you confirm that this patch fixes the issue for you?

  Linus


patch
Description: Binary data
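
For readers following the array-to-pointer remark in the message above,
here is a minimal user-space sketch. It is not the kernel macro: buf,
typeof_size() and as_pointer() are hypothetical names used only for
illustration. When the macro argument is an array, __typeof__(ptr) is an
array type, and an array object cannot be initialized from an array
expression, whereas assigning to a plain pointer triggers the usual decay.

```c
#include <stddef.h>

/* Hypothetical buffer standing in for an array passed as "ptr". */
static int buf[4];

/*
 * __typeof__(buf) is "int[4]", not "int *". A declaration such as
 *     __typeof__(buf) p = (buf);
 * would therefore try to initialize an array from an array and fail to
 * compile. The size below shows that the type is still the full array.
 */
static size_t typeof_size(void)
{
    return sizeof(__typeof__(buf));
}

/* Assigning to a plain pointer lets the array decay to &buf[0]. */
static void *as_pointer(void)
{
    void *p = buf;

    return p;
}
```

typeof_size() yields the size of the whole array rather than of a
pointer, while as_pointer() returns the address of the first element,
which is what a pointer-typed temporary in the macro needs.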


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Linus Torvalds
On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
 wrote:
>
> That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> could break something else), but the obvious first step of rearranging
> so that the ptr argument is evaluated before the assignment to __val_pu

Ack. We could do that.

I'm more inclined to just bite the bullet and go back to the ugly
conditional on the size that I had hoped to avoid, but if that turns
out too ugly, mind signing off on your patch and I'll have that as a
fallback?

 Linus


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Linus Torvalds
On Thu, Oct 22, 2020 at 10:02 PM Sean Christopherson
 wrote:
>
> I haven't reproduced the crash, but I did find a smoking gun that confirms the
> "register shenanigans are evil shenanigans" theory.  I ran into a similar thing
> recently where a seemingly innocuous line of code after loading a value into a
> register variable wreaked havoc because it clobbered the input register.

Yup, that certainly looks like the smoking gun.

Thanks for finding an example of this, clearly I'll have to either go
back to the "conditionally use 'A' or 'a' depending on size" model, or
perhaps try Rasmus' patch.

  Linus


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Rasmus Villemoes
On 23/10/2020 07.02, Sean Christopherson wrote:
> On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
>> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz  wrote:
>>>
>>> The kernel Naresh originally referred to is here:
>>>   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>>
>> Thanks.
>>
>> And when I started looking at it, I realized that my original idea
>> ("just look for __put_user_nocheck_X calls, there aren't so many of
>> those") was garbage, and that I was just being stupid.
>>
>> Yes, the commit that broke was about __put_user(), but in order to not
>> duplicate all the code, it re-used the regular put_user()
>> infrastructure, and so all the normal put_user() calls are potential
>> problem spots too if this is about the compiler interaction with KASAN
>> and the asm changes.
>>
>> So it's not just a couple of special cases to look at, it's all the
>> normal cases too.
>>
>> Ok, back to the drawing board, but I think reverting it is probably
>> the right thing to do if I can't think of something smart.
>>
>> That said, since you see this on x86-64, where the whole ugly trick with that
>>
>>    register asm("%"_ASM_AX)
>>
>> is unnecessary (because the 8-byte case is still just a single
>> register, no %eax:%edx games needed), it would be interesting to hear
>> if the attached patch fixes it. That would confirm that the problem
>> really is due to some register allocation issue interaction (or,
>> alternatively, it would tell me that there's something else going on).
> 
> I haven't reproduced the crash, but I did find a smoking gun that confirms the
> "register shenanigans are evil shenanigans" theory.  I ran into a similar thing
> recently where a seemingly innocuous line of code after loading a value into a
> register variable wreaked havoc because it clobbered the input register.
> 
> This put_user() in schedule_tail():
> 
>    if (current->set_child_tid)
>        put_user(task_pid_vnr(current), current->set_child_tid);
> 
> generates the following assembly with KASAN out-of-line:
> 
>    0x810dccc9 <+73>:  xor    %edx,%edx
>    0x810dcccb <+75>:  xor    %esi,%esi
>    0x810dcccd <+77>:  mov    %rbp,%rdi
>    0x810dccd0 <+80>:  callq  0x810bf5e0 <__task_pid_nr_ns>
>    0x810dccd5 <+85>:  mov    %r12,%rdi
>    0x810dccd8 <+88>:  callq  0x81388c60 <__asan_load8>
>    0x810dccdd <+93>:  mov    0x590(%rbp),%rcx
>    0x810dcce4 <+100>: callq  0x817708a0 <__put_user_4>
>    0x810dcce9 <+105>: pop    %rbx
>    0x810dccea <+106>: pop    %rbp
>    0x810dcceb <+107>: pop    %r12
> 
> __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
> __asan_load8()'s check on current for the current->set_child_tid dereference.
> 

Yup, and you don't need KASAN to implicitly generate function calls for
you. With x86_64 defconfig, I get

extern u64 __user *get_destination(int x, int y);

void pu_test(void)
{
	u64 big = 0x1234abcd5678;

	if (put_user(big, get_destination(4, 5)))
		pr_warn("uh");
}

to generate

4d60 <pu_test>:
    4d60:   53                      push   %rbx
    4d61:   be 05 00 00 00          mov    $0x5,%esi
    4d66:   bf 04 00 00 00          mov    $0x4,%edi
    4d6b:   e8 00 00 00 00          callq  4d70 <pu_test+0x10>
                        4d6c: R_X86_64_PC32     get_destination-0x4
    4d70:   48 89 c1                mov    %rax,%rcx
    4d73:   e8 00 00 00 00          callq  4d78 <pu_test+0x18>
                        4d74: R_X86_64_PC32     __put_user_8-0x4
    4d78:   85 c9                   test   %ecx,%ecx
    4d7a:   75 02                   jne    4d7e <pu_test+0x1e>
    4d7c:   5b                      pop    %rbx
    4d7d:   c3                      retq
    4d7e:   5b                      pop    %rbx
    4d7f:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        4d82: R_X86_64_32S      .rodata.str1.1+0xfa
    4d86:   e9 00 00 00 00          jmpq   4d8b <pu_test+0x2b>
                        4d87: R_X86_64_PC32     printk-0x4


That's certainly garbage. Now, I don't know if it's a sufficient fix (or
could break something else), but the obvious first step of rearranging
so that the ptr argument is evaluated before the assignment to __val_pu

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 477c503f2753..b5d3290fcd09 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void);
 #define do_put_user_call(fn,x,ptr) \
 ({ \
int __ret_pu;   \
-   register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);   \
+   __typeof__(ptr) __ptr = (ptr);  \
+   register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX) 

Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Sean Christopherson
On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz  wrote:
> >
> > The kernel Naresh originally referred to is here:
> >   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
> 
> Thanks.
> 
> And when I started looking at it, I realized that my original idea
> ("just look for __put_user_nocheck_X calls, there aren't so many of
> those") was garbage, and that I was just being stupid.
> 
> Yes, the commit that broke was about __put_user(), but in order to not
> duplicate all the code, it re-used the regular put_user()
> infrastructure, and so all the normal put_user() calls are potential
> problem spots too if this is about the compiler interaction with KASAN
> and the asm changes.
> 
> So it's not just a couple of special cases to look at, it's all the
> normal cases too.
> 
> Ok, back to the drawing board, but I think reverting it is probably
> the right thing to do if I can't think of something smart.
> 
> That said, since you see this on x86-64, where the whole ugly trick with that
> 
>    register asm("%"_ASM_AX)
> 
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

I haven't reproduced the crash, but I did find a smoking gun that confirms the
"register shenanigans are evil shenanigans" theory.  I ran into a similar thing
recently where a seemingly innocuous line of code after loading a value into a
register variable wreaked havoc because it clobbered the input register.

This put_user() in schedule_tail():

   if (current->set_child_tid)
       put_user(task_pid_vnr(current), current->set_child_tid);

generates the following assembly with KASAN out-of-line:

   0x810dccc9 <+73>:  xor    %edx,%edx
   0x810dcccb <+75>:  xor    %esi,%esi
   0x810dcccd <+77>:  mov    %rbp,%rdi
   0x810dccd0 <+80>:  callq  0x810bf5e0 <__task_pid_nr_ns>
   0x810dccd5 <+85>:  mov    %r12,%rdi
   0x810dccd8 <+88>:  callq  0x81388c60 <__asan_load8>
   0x810dccdd <+93>:  mov    0x590(%rbp),%rcx
   0x810dcce4 <+100>: callq  0x817708a0 <__put_user_4>
   0x810dcce9 <+105>: pop    %rbx
   0x810dccea <+106>: pop    %rbp
   0x810dcceb <+107>: pop    %r12

__task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
__asan_load8()'s check on current for the current->set_child_tid dereference.
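
The ordering problem described above can be modelled in portable C. This
is only a sketch under stated assumptions: fake_ax stands in for %rax, and
model_put_user(), model_lookup(), store_buggy() and store_fixed() are
hypothetical names, not kernel functions. The point is that a callee that
takes its value from a fixed register sees whatever the last call left
there, so the value has to be loaded after every other operand has been
evaluated.

```c
/* Portable model only: this variable plays the role of %rax. */
static long fake_ax;

/* Like an out-of-line put_user helper: the value comes from the "register". */
static void model_put_user(int *dst)
{
    *dst = (int)fake_ax;
}

/* Any ordinary call may overwrite a call-clobbered register. */
static int *model_lookup(int *p)
{
    fake_ax = 0x5a5a;   /* whatever the callee happened to leave behind */
    return p;
}

/* Buggy order: value loaded first, pointer expression evaluated afterwards. */
static int store_buggy(int value, int *p)
{
    int *dst;

    fake_ax = value;
    dst = model_lookup(p);      /* clobbers the already-loaded value */
    model_put_user(dst);
    return *p;
}

/* Fixed order: evaluate the pointer first, load the value last. */
static int store_fixed(int value, int *p)
{
    int *dst = model_lookup(p);

    fake_ax = value;
    model_put_user(dst);
    return *p;
}
```

In this model store_buggy() writes the stale 0x5a5a instead of the
requested value, which mirrors the pid being lost in the disassembly
above, while store_fixed() writes the requested value.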


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Linus Torvalds
On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz  wrote:
>
> The kernel Naresh originally referred to is here:
>   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/

Thanks.

And when I started looking at it, I realized that my original idea
("just look for __put_user_nocheck_X calls, there aren't so many of
those") was garbage, and that I was just being stupid.

Yes, the commit that broke was about __put_user(), but in order to not
duplicate all the code, it re-used the regular put_user()
infrastructure, and so all the normal put_user() calls are potential
problem spots too if this is about the compiler interaction with KASAN
and the asm changes.

So it's not just a couple of special cases to look at, it's all the
normal cases too.

Ok, back to the drawing board, but I think reverting it is probably
the right thing to do if I can't think of something smart.

That said, since you see this on x86-64, where the whole ugly trick with that

   register asm("%"_ASM_AX)

is unnecessary (because the 8-byte case is still just a single
register, no %eax:%edx games needed), it would be interesting to hear
if the attached patch fixes it. That would confirm that the problem
really is due to some register allocation issue interaction (or,
alternatively, it would tell me that there's something else going on).

  Linus


patch
Description: Binary data


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Daniel Díaz
Hello!

On Thu, 22 Oct 2020 at 19:11, Linus Torvalds
 wrote:
> On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds
> Would you mind sending me the problematic vmlinux file in private (or,
> likely better - a pointer to some place I can download it, it's going
> to be huge).

The kernel Naresh originally referred to is here:
  https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/

Greetings!

Daniel Díaz
daniel.d...@linaro.org


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Linus Torvalds
On Thu, Oct 22, 2020 at 5:11 PM Linus Torvalds
 wrote:
>
> In particular, I wonder if it's that KASAN causes some reload pattern,
> and the whole
>
>  register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);
> ..
>  asm volatile(.. "r" (__val_pu) ..)
>
> thing causes problems.

That pattern isn't new (see the same pattern and the comment above get_user).

But our previous use of that pattern had it as an output of the asm,
and the new use is as an input. That obviously shouldn't matter, but
if it's some odd compiler code generation interaction, all bets are
off..

Linus
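
As an illustration of the pattern discussed above with the register
variable used as an asm input, here is a small user-space sketch. It
assumes GCC or clang on x86-64 and falls back to a plain store elsewhere;
pick_slot() and store_via_ax() are hypothetical names. The GCC
documentation on local register variables only guarantees the register
contents at the asm statement itself, so the sketch evaluates everything
else first and assigns the register variable as the last step.

```c
/* Hypothetical helper standing in for an arbitrary pointer computation. */
static long *pick_slot(long *slots, int idx)
{
    return &slots[idx];
}

static void store_via_ax(long value, long *slots, int idx)
{
    /* Evaluate everything that might involve a call first. */
    long *dst = pick_slot(slots, idx);

#if defined(__GNUC__) && defined(__x86_64__)
    /* Pin the value to %rax as the final step before the asm. */
    register long val_ax __asm__("rax") = value;

    __asm__ volatile("movq %%rax, (%[p])"
                     : /* no outputs */
                     : "r"(val_ax), [p] "r"(dst)
                     : "memory");
#else
    /* Fallback for other compilers and architectures. */
    *dst = value;
#endif
}
```

Calling store_via_ax(7, slots, 1) stores 7 into slots[1] and leaves the
other elements alone. Moving the pick_slot() call below the register
assignment would recreate the hazard, because nothing keeps a function
call from reusing %rax in between.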


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Linus Torvalds
On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds
 wrote:
>
> Thanks. Very funky, but thanks. I've been running that commit on my
> machine for over half a year, and it still looks "trivially correct"
> to me, but let me go look at it one more time. Can't argue with a
> reliable bisect and revert..

Hmm. The fact that it only happens with KASAN makes me suspect it's
some bad interaction with the inline asm syntax change (and explains
why I've run with this for half a year without issues).

In particular, I wonder if it's that KASAN causes some reload pattern,
and the whole

 register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);
..
 asm volatile(.. "r" (__val_pu) ..)

thing causes problems. That's an ugly pattern, but it's written that
way to get gcc to handle the 64-bit case properly (with the value in
%rax:%rdx).

It turns out that the decode of the user-mode SIGSEGV code is a
variation of system calls, ie

   0: b8 18 00 00 00       mov    $0x18,%eax
   5: 0f 05                syscall
   7: 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax
   d: 73 01                jae    0x10
   f:* c3                  retq   <-- trapping instruction

or

   0: 41 52                push   %r10
   2: 52                   push   %rdx
   3: 4d 31 d2             xor    %r10,%r10
   6: ba 02 00 00 00       mov    $0x2,%edx
   b: be 80 00 00 00       mov    $0x80,%esi
  10: 39 d0                cmp    %edx,%eax
  12: 75 07                jne    0x1b
  14: b8 ca 00 00 00       mov    $0xca,%eax
  19: 0f 05                syscall
  1b: 89 d0                mov    %edx,%eax
  1d: 87 07                xchg   %eax,(%rdi)
  1f: 85 c0                test   %eax,%eax
  21: 75 f1                jne    0x14
  23:* 5a                  pop    %rdx <-- trapping instruction
  24: 41 5a                pop    %r10
  26: c3                   retq

so in both cases it looks like 'syscall' returned with a bad stack pointer.

Which is certainly a sign of some code generation issue.

Very annoying, because it probably means that it's compiler-specific
too. And that "syscall 0x18" looks very odd. I think that's
sched_yield() on x86-64, which doesn't have any __put_user() cases at
all..

Would you mind sending me the problematic vmlinux file in private (or,
likely better - a pointer to some place I can download it, it's going
to be huge).

  Linus
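
The "error 4", "error 6" and "error 14" fields in the segfault lines of
this thread are x86 page-fault error codes, which the kernel prints in
hex. The sketch below decodes them; the bit values follow the kernel's
X86_PF_* flags as I understand them, while the PF_* macro names and
pf_describe() are hypothetical and only meant for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (illustrative local names). */
#define PF_PROT   0x01  /* 0: page not present, 1: protection violation */
#define PF_WRITE  0x02  /* 0: read access, 1: write access */
#define PF_USER   0x04  /* fault happened in user mode */
#define PF_INSTR  0x10  /* fault was an instruction fetch */

/* Format a short description of @code into @buf and return @buf. */
static const char *pf_describe(unsigned long code, char *buf, size_t len)
{
    const char *mode = (code & PF_USER) ? "user" : "kernel";
    const char *access = (code & PF_INSTR) ? "instruction fetch" :
                         (code & PF_WRITE) ? "write" : "read";
    const char *cause = (code & PF_PROT) ? "protection violation" :
                                           "page not present";

    snprintf(buf, len, "%s %s, %s", mode, access, cause);
    return buf;
}
```

Under these assumptions, 0x4 decodes to a user-mode read of a page that
is not present, 0x6 to a user-mode write, and 0x14 to a user-mode
instruction fetch, which fits the mallocstress line that faults at
address 0.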


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Linus Torvalds
On Thu, Oct 22, 2020 at 1:55 PM Naresh Kamboju
 wrote:
>
> The bad commit points to,
>
> commit d55564cfc222326e944893eff0c4118353e349ec
> x86: Make __put_user() generate an out-of-line call
>
> I have reverted this single patch and confirmed the reported
> problem is not seen anymore.

Thanks. Very funky, but thanks. I've been running that commit on my
machine for over half a year, and it still looks "trivially correct"
to me, but let me go look at it one more time. Can't argue with a
reliable bisect and revert..

Linus


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-22 Thread Naresh Kamboju
On Wed, 21 Oct 2020 at 22:52, Naresh Kamboju  wrote:
>
> On Wed, 21 Oct 2020 at 22:35, Linus Torvalds
>  wrote:
> >
> > On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
> >  wrote:
> > >
> > > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress)
> > > > fail when tested on an x86 KASAN-enabled build, but the tests PASS on
> > > > non-KASAN builds.
> > > > This regression started happening from next-20201015 onwards.
> >
> > Is it repeatable enough to be bisectable?
>
> Yes. This is easily reproducible.
> I will bisect and report here.

The bad commit points to,

commit d55564cfc222326e944893eff0c4118353e349ec
x86: Make __put_user() generate an out-of-line call

I have reverted this single patch and confirmed the reported
problem is not seen anymore.

- Naresh


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-21 Thread Naresh Kamboju
On Wed, 21 Oct 2020 at 22:35, Linus Torvalds
 wrote:
>
> On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
>  wrote:
> >
> > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress)
> > > fail when tested on an x86 KASAN-enabled build, but the tests PASS on
> > > non-KASAN builds.
> > > This regression started happening from next-20201015 onwards.
>
> Is it repeatable enough to be bisectable?

Yes. This is easily reproducible.
I will bisect and report here.

>
>  Linus

- Naresh


Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-21 Thread Linus Torvalds
On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
 wrote:
>
> LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress)
> fail when tested on an x86 KASAN-enabled build, but the tests PASS on
> non-KASAN builds.
> This regression started happening from next-20201015 onwards.

Is it repeatable enough to be bisectable?

 Linus


mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-21 Thread Naresh Kamboju
LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress)
fail when tested on an x86 KASAN-enabled build, but the tests PASS on
non-KASAN builds.
This regression started happening from next-20201015 onwards.

There are a few more regressions on linux-next:
  ltp-cve-tests:
* cve-2015-7550
  ltp-math-tests:
* float_bessel
* float_exp_log
* float_iperb
* float_power
* float_trigo
  ltp-mm-tests:
* mallocstress01
* mtest05
* mtest06_3
  ltp-syscalls-tests:
* clone08
* clone301
* fcntl34
* fcntl34_64
* fcntl36
* fcntl36_64
* keyctl02
* rt_tgsigqueueinfo01

metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git describe: next-20201015
  kernel-config:
https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/kernel.config

steps to reproduce:
  # boot x86_64 with KASAN enabled kernel and run tests
  # cd /opt/ltp/testcases/bin
  # ./mmstress
  # ./mmap3 -x 0.002 -p
  # ./mallocstress

mtest05  (mmstress) :

mmstress0  TINFO  :  run mmstress -h for all options
mmstress0  TINFO  :  test1: Test case tests the race condition
between simultaneous read faults in the same address space.
[  279.469207] mmstress[1309]: segfault at 7f3d71a36ee8 ip
7f3d77132bdf sp 7f3d71a36ee8 error 4 in
libc-2.27.so[7f3d77058000+1aa000]
[  279.469305] audit: type=1701 audit(1602818315.656:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1307
comm=\"mmstress\" exe=\"/opt/ltp/testcases/bin/mmstress\" sig=11 res=1
[  279.481636] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff
ff 73 01 <c3> 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f
[  279.498212] mmstress[1311]: segfault at 7f3d70a34ee8 ip
7f3d77132bdf sp 7f3d70a34ee8 error 4 in
libc-2.27.so[7f3d77058000+1aa000]
[  279.516839] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff
ff 73 01 <c3> 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f
tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s
tst_test.c:1246: INFO: Timeout per run is 0h 09m 00s
tst_test.c:1291: BROK: Test killed by SIGBUS!

mtest06_3 (mmap3 -x 0.002 -p) :
---
mmap3.c:154: INFO: Seed 22
mmap3.c:155: INFO: Number of loops 1000
mmap3.c:156: INFO: Number of threads 40
mmap3.c:157: INFO: MAP[  286.657788] mmap3[1350]: segfault at
7f12179d4680 ip 7f121859951d sp 7f12179d1e10 error 6 in
libpthread-2.27.so[7f1218589000+19000]
_PRIVATE = 1
mm[  286.671184] Code: c4 10 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 15
99 8a 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 48 8b 15 85 8a 20
00 f7 d8 <64> 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 b8
01 00
[  286.677386] audit: type=1701 audit(1602818322.844:6):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1348
comm=\"mmap3\" exe=\"/opt/ltp/testcases/bin/mmap3\" sig=11 res=1
ap3.c:158: INFO: Execution time 0.002000H

mallocstress01 (mallocstress) :
--
pid[1496]: shmat_rd_wr(): shmget():success got segment id 32830
pid[1496]: do_shmat_shmadt(): got shmat address = 0x7f301eae9000
pid[1496]: shmat_rd_wr(): shmget():success got segment id 328[
291.851376] mallocstress[1502]: segfault at 0 ip  sp
7f80dea3ec50 error 14
30
pid[1496]: d[  291.851466] mallocstress[1507]: segfault at
7f80dc239c98 ip 7f80df2bf81c sp 7f80dc239c98 error 4
o_shmat_shmadt()[  291.851485] mallocstress[1505]: segfault at
7f80dd23bc38 ip 7f80df33fe93 sp 7f80dd23bc38 error 4
[  291.851490] Code: 00 00 00 00 0f 1f 00 41 52 52 4d 31 d2 ba 02 00
00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 89 d0 87 07 85
c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52 c7 07 00 00 00
00 be
: got shmat addr[  291.851565] audit: type=1701
audit(1602818328.038:7): auid=4294967295 uid=0 gid=0 ses=4294967295
subj=kernel pid=1500 comm=\"mallocstress\"
exe=\"/opt/ltp/testcases/bin/mallocstress\" sig=11 res=1
[  291.852984] mallocstress[1504]: segfault at 7f80dda3cc38 ip
7f80df33fe93 sp 7f80dda3cc38 error 4
ess = 0x7f301e85[  291.852988] Code: 00 00 00 00 0f 1f 00 41 52 52 4d
31 d2 ba 02 00 00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05
89 d0 87 07 85 c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52
c7 07 00 00 00 00 be
[  291.853045] audit: type=1701 audit(1602818328.040:8):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1500
comm=\"mallocstress\" exe=\"/opt/ltp/testcases/bin/mallocstress\"
sig=11 res=1
5000
tst_test.c[  291.860373] Code: Unable to access opcode bytes at RIP
0xffd6.
[  291.860453] mallocstress[1506]: segfault at 7f80dca3ac98 ip
7f80df2bf81c sp 7f80dca3ac98 error 4
:1246: INFO: Tim[  291.860654] audit: type=