Re: [vma caching] BUG: unable to handle kernel paging request at ffff880008142f40

2014-03-11 Thread Fengguang Wu
Hi Davidlohr,

On Mon, Mar 10, 2014 at 02:08:08PM -0700, Davidlohr Bueso wrote:
> On Mon, 2014-03-10 at 10:43 +0800, Fengguang Wu wrote:
> > Hi Davidlohr,
> > 
> > I got the below dmesg and the first bad commit is
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > commit 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
> > Author: Davidlohr Bueso 
> > AuthorDate: Thu Mar 6 11:01:48 2014 +1100
> > Commit: Stephen Rothwell 
> > CommitDate: Thu Mar 6 11:01:48 2014 +1100
> > 
> > mm: per-thread vma caching
> > 
> > hwclock: can't open '/dev/misc/rtc': No such file or directory
> > Running postinst /etc/rpm-postinsts/100...
> 
> Hmm this kind of errors strike me as dealing with some bogus vma from a
> stale cache.
> 
> > [3.658976] BUG: unable to handle kernel paging request at 
> > 880008142f40
> > [3.661422] IP: [] vmacache_find+0x78/0x90
> > [3.662223] PGD 2542067 PUD 2543067 PMD fba5067 PTE 88142060
> > [3.662223] Oops:  [#1] DEBUG_PAGEALLOC
> > [3.662223] Modules linked in:
> > [3.662223] CPU: 0 PID: 326 Comm: 90-trinity Not tainted 
> > 3.14.0-rc5-next-20140307 #1
> 
> Have you only seen this through DEBUG_PAGEALLOC + trinity?

Yes.

> > [3.662223] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [3.662223] task: 8800083020d0 ti: 8800082a task.ti: 
> > 8800082a
> > [3.662223] RIP: 0010:[]  [] 
> > vmacache_find+0x78/0x90
> > [3.662223] RSP: :8800082a1e00  EFLAGS: 00010282
> > [3.662223] RAX: 880008142f40 RBX: 00a9 RCX: 
> > 8800083020d0
> > [3.662223] RDX: 0002 RSI: 7fff8a141698 RDI: 
> > 880008124bc0
> > [3.662223] RBP: 8800082a1e00 R08:  R09: 
> > 0001
> > [3.662223] R10: 8800083020d0 R11:  R12: 
> > 7fff8a141698
> > [3.662223] R13: 880008124bc0 R14: 8800082a1f58 R15: 
> > 8800083020d0
> > [3.662223] FS:  7fe3ca364700() GS:81a06000() 
> > knlGS:
> > [3.662223] CS:  0010 DS:  ES:  CR0: 80050033
> > [3.662223] CR2: 880008142f40 CR3: 0824c000 CR4: 
> > 06b0
> > [3.662223] Stack:
> > [3.662223]  8800082a1e28 81125219 00a9 
> > 7fff8a141698
> > [3.662223]  880008124bc0 8800082a1f28 816d71fe 
> > 0246
> > [3.662223]  0002 880008124c58 0006 
> > 0010
> > [3.662223] Call Trace:
> > [3.662223]  [] find_vma+0x19/0x70
> > [3.662223]  [] __do_page_fault+0x29e/0x560
> > [3.662223]  [] ? mntput_no_expire+0x6f/0x1a0
> > [3.662223]  [] ? mntput_no_expire+0x11/0x1a0
> > [3.662223]  [] ? mntput+0x35/0x40
> > [3.662223]  [] ? __fput+0x24f/0x290
> > [3.662223]  [] ? trace_hardirqs_off_thunk+0x3a/0x3c
> > [3.662223]  [] do_page_fault+0xe/0x10
> > [3.662223]  [] do_async_page_fault+0x35/0x90
> > [3.662223]  [] async_page_fault+0x25/0x30
> > [3.662223] Code: c7 81 b0 02 00 00 00 00 00 00 eb 32 0f 1f 80 00 00 00 
> > 00 31 d2 66 0f 1f 44 00 00 48 63 c2 48 8b 84 c1 98 02 00 00 48 85 c0 74 0b 
> > <48> 39 30 77 06 48 3b 70 08 72 0a 83 c2 01 83 fa 04 75 dd 31 c0 
> > [3.662223] RIP  [] vmacache_find+0x78/0x90
> 
> So this is:
>0:   c7 81 b0 02 00 00 00movl   $0x0,0x2b0(%rcx)
>7:   00 00 00 
>a:   eb 32   jmp0x3e
>c:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
>   13:   31 d2   xor%edx,%edx
>   15:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
>   1b:   48 63 c2movslq %edx,%rax
>   1e:   48 8b 84 c1 98 02 00mov0x298(%rcx,%rax,8),%rax
>   25:   00 
>   26:   48 85 c0test   %rax,%rax
>   29:   74 0b   je 0x36
>   2b:*  48 39 30cmp%rsi,(%rax)  <-- trapping 
> instruction
> 
> 
> which seems to be the following, where vma is stale:
>   if (vma && vma->vm_start <= addr && vma->vm_end > addr)
>   return vma;
> 
> 
>   2e:   77 06   ja 0x36
>   30:   48 3b 70 08 cmp0x8(%rax),%rsi
>   34:   72 0a   jb 0x40
>   36:   83 c2 01add$0x1,%edx
>   39:   83 fa 04cmp$0x4,%edx
>   3c:   75 dd   jne0x1b
>   3e:   31 c0   xor%eax,%eax
> 
> Could you please try this fix: https://lkml.org/lkml/2014/3/10/505 - it
> is fix for a race Oleg found that can cause us to keep bogus vmas in the
> cache under certain VM_CLONE scenarios.

With the patch applied on top of 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
I still get this:

sed: /lib/modules/3.14.0-rc5-00226-g0d9ad422-dirty/modules.dep: No such file or 
directory
xargs: modprobe: No such file or directory
run-parts: /etc/kernel-tests/01-modprobe exited with code 127
[3.807108] BUG: 

Re: [vma caching] BUG: unable to handle kernel paging request at ffff880008142f40

2014-03-11 Thread Fengguang Wu
Hi Davidlohr,

On Mon, Mar 10, 2014 at 02:08:08PM -0700, Davidlohr Bueso wrote:
 On Mon, 2014-03-10 at 10:43 +0800, Fengguang Wu wrote:
  Hi Davidlohr,
  
  I got the below dmesg and the first bad commit is
  
  git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
  commit 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
  Author: Davidlohr Bueso davidl...@hp.com
  AuthorDate: Thu Mar 6 11:01:48 2014 +1100
  Commit: Stephen Rothwell s...@canb.auug.org.au
  CommitDate: Thu Mar 6 11:01:48 2014 +1100
  
  mm: per-thread vma caching
  
  hwclock: can't open '/dev/misc/rtc': No such file or directory
  Running postinst /etc/rpm-postinsts/100...
 
 Hmm this kind of errors strike me as dealing with some bogus vma from a
 stale cache.
 
  [3.658976] BUG: unable to handle kernel paging request at 
  880008142f40
  [3.661422] IP: [8111a1d8] vmacache_find+0x78/0x90
  [3.662223] PGD 2542067 PUD 2543067 PMD fba5067 PTE 88142060
  [3.662223] Oops:  [#1] DEBUG_PAGEALLOC
  [3.662223] Modules linked in:
  [3.662223] CPU: 0 PID: 326 Comm: 90-trinity Not tainted 
  3.14.0-rc5-next-20140307 #1
 
 Have you only seen this through DEBUG_PAGEALLOC + trinity?

Yes.

  [3.662223] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [3.662223] task: 8800083020d0 ti: 8800082a task.ti: 
  8800082a
  [3.662223] RIP: 0010:[8111a1d8]  [8111a1d8] 
  vmacache_find+0x78/0x90
  [3.662223] RSP: :8800082a1e00  EFLAGS: 00010282
  [3.662223] RAX: 880008142f40 RBX: 00a9 RCX: 
  8800083020d0
  [3.662223] RDX: 0002 RSI: 7fff8a141698 RDI: 
  880008124bc0
  [3.662223] RBP: 8800082a1e00 R08:  R09: 
  0001
  [3.662223] R10: 8800083020d0 R11:  R12: 
  7fff8a141698
  [3.662223] R13: 880008124bc0 R14: 8800082a1f58 R15: 
  8800083020d0
  [3.662223] FS:  7fe3ca364700() GS:81a06000() 
  knlGS:
  [3.662223] CS:  0010 DS:  ES:  CR0: 80050033
  [3.662223] CR2: 880008142f40 CR3: 0824c000 CR4: 
  06b0
  [3.662223] Stack:
  [3.662223]  8800082a1e28 81125219 00a9 
  7fff8a141698
  [3.662223]  880008124bc0 8800082a1f28 816d71fe 
  0246
  [3.662223]  0002 880008124c58 0006 
  0010
  [3.662223] Call Trace:
  [3.662223]  [81125219] find_vma+0x19/0x70
  [3.662223]  [816d71fe] __do_page_fault+0x29e/0x560
  [3.662223]  [8116cc6f] ? mntput_no_expire+0x6f/0x1a0
  [3.662223]  [8116cc11] ? mntput_no_expire+0x11/0x1a0
  [3.662223]  [8116cdd5] ? mntput+0x35/0x40
  [3.662223]  [8114f51f] ? __fput+0x24f/0x290
  [3.662223]  [812794ca] ? trace_hardirqs_off_thunk+0x3a/0x3c
  [3.662223]  [816d74ce] do_page_fault+0xe/0x10
  [3.662223]  [816d6ad5] do_async_page_fault+0x35/0x90
  [3.662223]  [816d3b05] async_page_fault+0x25/0x30
  [3.662223] Code: c7 81 b0 02 00 00 00 00 00 00 eb 32 0f 1f 80 00 00 00 
  00 31 d2 66 0f 1f 44 00 00 48 63 c2 48 8b 84 c1 98 02 00 00 48 85 c0 74 0b 
  48 39 30 77 06 48 3b 70 08 72 0a 83 c2 01 83 fa 04 75 dd 31 c0 
  [3.662223] RIP  [8111a1d8] vmacache_find+0x78/0x90
 
 So this is:
0:   c7 81 b0 02 00 00 00movl   $0x0,0x2b0(%rcx)
7:   00 00 00 
a:   eb 32   jmp0x3e
c:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
   13:   31 d2   xor%edx,%edx
   15:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
   1b:   48 63 c2movslq %edx,%rax
   1e:   48 8b 84 c1 98 02 00mov0x298(%rcx,%rax,8),%rax
   25:   00 
   26:   48 85 c0test   %rax,%rax
   29:   74 0b   je 0x36
   2b:*  48 39 30cmp%rsi,(%rax)  -- trapping 
 instruction
 
 
 which seems to be the following, where vma is stale:
   if (vma  vma-vm_start = addr  vma-vm_end  addr)
   return vma;
 
 
   2e:   77 06   ja 0x36
   30:   48 3b 70 08 cmp0x8(%rax),%rsi
   34:   72 0a   jb 0x40
   36:   83 c2 01add$0x1,%edx
   39:   83 fa 04cmp$0x4,%edx
   3c:   75 dd   jne0x1b
   3e:   31 c0   xor%eax,%eax
 
 Could you please try this fix: https://lkml.org/lkml/2014/3/10/505 - it
 is fix for a race Oleg found that can cause us to keep bogus vmas in the
 cache under certain VM_CLONE scenarios.

With the patch applied on top of 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
I still get this:

sed: /lib/modules/3.14.0-rc5-00226-g0d9ad422-dirty/modules.dep: No such file or 
directory
xargs: modprobe: No such file or directory

Re: [vma caching] BUG: unable to handle kernel paging request at ffff880008142f40

2014-03-10 Thread Davidlohr Bueso
On Mon, 2014-03-10 at 10:43 +0800, Fengguang Wu wrote:
> Hi Davidlohr,
> 
> I got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
> Author: Davidlohr Bueso 
> AuthorDate: Thu Mar 6 11:01:48 2014 +1100
> Commit: Stephen Rothwell 
> CommitDate: Thu Mar 6 11:01:48 2014 +1100
> 
> mm: per-thread vma caching
> 
> hwclock: can't open '/dev/misc/rtc': No such file or directory
> Running postinst /etc/rpm-postinsts/100...

Hmm this kind of errors strike me as dealing with some bogus vma from a
stale cache.

> [3.658976] BUG: unable to handle kernel paging request at 880008142f40
> [3.661422] IP: [] vmacache_find+0x78/0x90
> [3.662223] PGD 2542067 PUD 2543067 PMD fba5067 PTE 88142060
> [3.662223] Oops:  [#1] DEBUG_PAGEALLOC
> [3.662223] Modules linked in:
> [3.662223] CPU: 0 PID: 326 Comm: 90-trinity Not tainted 
> 3.14.0-rc5-next-20140307 #1

Have you only seen this through DEBUG_PAGEALLOC + trinity?

> [3.662223] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [3.662223] task: 8800083020d0 ti: 8800082a task.ti: 
> 8800082a
> [3.662223] RIP: 0010:[]  [] 
> vmacache_find+0x78/0x90
> [3.662223] RSP: :8800082a1e00  EFLAGS: 00010282
> [3.662223] RAX: 880008142f40 RBX: 00a9 RCX: 
> 8800083020d0
> [3.662223] RDX: 0002 RSI: 7fff8a141698 RDI: 
> 880008124bc0
> [3.662223] RBP: 8800082a1e00 R08:  R09: 
> 0001
> [3.662223] R10: 8800083020d0 R11:  R12: 
> 7fff8a141698
> [3.662223] R13: 880008124bc0 R14: 8800082a1f58 R15: 
> 8800083020d0
> [3.662223] FS:  7fe3ca364700() GS:81a06000() 
> knlGS:
> [3.662223] CS:  0010 DS:  ES:  CR0: 80050033
> [3.662223] CR2: 880008142f40 CR3: 0824c000 CR4: 
> 06b0
> [3.662223] Stack:
> [3.662223]  8800082a1e28 81125219 00a9 
> 7fff8a141698
> [3.662223]  880008124bc0 8800082a1f28 816d71fe 
> 0246
> [3.662223]  0002 880008124c58 0006 
> 0010
> [3.662223] Call Trace:
> [3.662223]  [] find_vma+0x19/0x70
> [3.662223]  [] __do_page_fault+0x29e/0x560
> [3.662223]  [] ? mntput_no_expire+0x6f/0x1a0
> [3.662223]  [] ? mntput_no_expire+0x11/0x1a0
> [3.662223]  [] ? mntput+0x35/0x40
> [3.662223]  [] ? __fput+0x24f/0x290
> [3.662223]  [] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [3.662223]  [] do_page_fault+0xe/0x10
> [3.662223]  [] do_async_page_fault+0x35/0x90
> [3.662223]  [] async_page_fault+0x25/0x30
> [3.662223] Code: c7 81 b0 02 00 00 00 00 00 00 eb 32 0f 1f 80 00 00 00 00 
> 31 d2 66 0f 1f 44 00 00 48 63 c2 48 8b 84 c1 98 02 00 00 48 85 c0 74 0b <48> 
> 39 30 77 06 48 3b 70 08 72 0a 83 c2 01 83 fa 04 75 dd 31 c0 
> [3.662223] RIP  [] vmacache_find+0x78/0x90

So this is:
   0:   c7 81 b0 02 00 00 00movl   $0x0,0x2b0(%rcx)
   7:   00 00 00 
   a:   eb 32   jmp0x3e
   c:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
  13:   31 d2   xor%edx,%edx
  15:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
  1b:   48 63 c2movslq %edx,%rax
  1e:   48 8b 84 c1 98 02 00mov0x298(%rcx,%rax,8),%rax
  25:   00 
  26:   48 85 c0test   %rax,%rax
  29:   74 0b   je 0x36
  2b:*  48 39 30cmp%rsi,(%rax)  <-- trapping 
instruction


which seems to be the following, where vma is stale:
if (vma && vma->vm_start <= addr && vma->vm_end > addr)
return vma;


  2e:   77 06   ja 0x36
  30:   48 3b 70 08 cmp0x8(%rax),%rsi
  34:   72 0a   jb 0x40
  36:   83 c2 01add$0x1,%edx
  39:   83 fa 04cmp$0x4,%edx
  3c:   75 dd   jne0x1b
  3e:   31 c0   xor%eax,%eax

Could you please try this fix: https://lkml.org/lkml/2014/3/10/505 - it
is fix for a race Oleg found that can cause us to keep bogus vmas in the
cache under certain VM_CLONE scenarios. Also, how frequently are you
able to trigger this?

Thanks for the report.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [vma caching] BUG: unable to handle kernel paging request at ffff880008142f40

2014-03-10 Thread Davidlohr Bueso
On Mon, 2014-03-10 at 10:43 +0800, Fengguang Wu wrote:
 Hi Davidlohr,
 
 I got the below dmesg and the first bad commit is
 
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
 commit 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3
 Author: Davidlohr Bueso davidl...@hp.com
 AuthorDate: Thu Mar 6 11:01:48 2014 +1100
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu Mar 6 11:01:48 2014 +1100
 
 mm: per-thread vma caching
 
 hwclock: can't open '/dev/misc/rtc': No such file or directory
 Running postinst /etc/rpm-postinsts/100...

Hmm this kind of errors strike me as dealing with some bogus vma from a
stale cache.

 [3.658976] BUG: unable to handle kernel paging request at 880008142f40
 [3.661422] IP: [8111a1d8] vmacache_find+0x78/0x90
 [3.662223] PGD 2542067 PUD 2543067 PMD fba5067 PTE 88142060
 [3.662223] Oops:  [#1] DEBUG_PAGEALLOC
 [3.662223] Modules linked in:
 [3.662223] CPU: 0 PID: 326 Comm: 90-trinity Not tainted 
 3.14.0-rc5-next-20140307 #1

Have you only seen this through DEBUG_PAGEALLOC + trinity?

 [3.662223] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [3.662223] task: 8800083020d0 ti: 8800082a task.ti: 
 8800082a
 [3.662223] RIP: 0010:[8111a1d8]  [8111a1d8] 
 vmacache_find+0x78/0x90
 [3.662223] RSP: :8800082a1e00  EFLAGS: 00010282
 [3.662223] RAX: 880008142f40 RBX: 00a9 RCX: 
 8800083020d0
 [3.662223] RDX: 0002 RSI: 7fff8a141698 RDI: 
 880008124bc0
 [3.662223] RBP: 8800082a1e00 R08:  R09: 
 0001
 [3.662223] R10: 8800083020d0 R11:  R12: 
 7fff8a141698
 [3.662223] R13: 880008124bc0 R14: 8800082a1f58 R15: 
 8800083020d0
 [3.662223] FS:  7fe3ca364700() GS:81a06000() 
 knlGS:
 [3.662223] CS:  0010 DS:  ES:  CR0: 80050033
 [3.662223] CR2: 880008142f40 CR3: 0824c000 CR4: 
 06b0
 [3.662223] Stack:
 [3.662223]  8800082a1e28 81125219 00a9 
 7fff8a141698
 [3.662223]  880008124bc0 8800082a1f28 816d71fe 
 0246
 [3.662223]  0002 880008124c58 0006 
 0010
 [3.662223] Call Trace:
 [3.662223]  [81125219] find_vma+0x19/0x70
 [3.662223]  [816d71fe] __do_page_fault+0x29e/0x560
 [3.662223]  [8116cc6f] ? mntput_no_expire+0x6f/0x1a0
 [3.662223]  [8116cc11] ? mntput_no_expire+0x11/0x1a0
 [3.662223]  [8116cdd5] ? mntput+0x35/0x40
 [3.662223]  [8114f51f] ? __fput+0x24f/0x290
 [3.662223]  [812794ca] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [3.662223]  [816d74ce] do_page_fault+0xe/0x10
 [3.662223]  [816d6ad5] do_async_page_fault+0x35/0x90
 [3.662223]  [816d3b05] async_page_fault+0x25/0x30
 [3.662223] Code: c7 81 b0 02 00 00 00 00 00 00 eb 32 0f 1f 80 00 00 00 00 
 31 d2 66 0f 1f 44 00 00 48 63 c2 48 8b 84 c1 98 02 00 00 48 85 c0 74 0b 48 
 39 30 77 06 48 3b 70 08 72 0a 83 c2 01 83 fa 04 75 dd 31 c0 
 [3.662223] RIP  [8111a1d8] vmacache_find+0x78/0x90

So this is:
   0:   c7 81 b0 02 00 00 00movl   $0x0,0x2b0(%rcx)
   7:   00 00 00 
   a:   eb 32   jmp0x3e
   c:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
  13:   31 d2   xor%edx,%edx
  15:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
  1b:   48 63 c2movslq %edx,%rax
  1e:   48 8b 84 c1 98 02 00mov0x298(%rcx,%rax,8),%rax
  25:   00 
  26:   48 85 c0test   %rax,%rax
  29:   74 0b   je 0x36
  2b:*  48 39 30cmp%rsi,(%rax)  -- trapping 
instruction


which seems to be the following, where vma is stale:
if (vma  vma-vm_start = addr  vma-vm_end  addr)
return vma;


  2e:   77 06   ja 0x36
  30:   48 3b 70 08 cmp0x8(%rax),%rsi
  34:   72 0a   jb 0x40
  36:   83 c2 01add$0x1,%edx
  39:   83 fa 04cmp$0x4,%edx
  3c:   75 dd   jne0x1b
  3e:   31 c0   xor%eax,%eax

Could you please try this fix: https://lkml.org/lkml/2014/3/10/505 - it
is fix for a race Oleg found that can cause us to keep bogus vmas in the
cache under certain VM_CLONE scenarios. Also, how frequently are you
able to trigger this?

Thanks for the report.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/