Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote:
> @@ -8,6 +8,7 @@ foo (int a, int b)
>    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
>    return 0;
>  lab:
> +  asm ("");
>    return 0;
>  }

Or alternatively put the asm (""); right after the asm goto:
  asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
  asm ("");
  return ...;
 lab:
  return ...;
What generates better code remains to be tested.  In any case, please
conditionalize the hacks on non-fixed compilers once the fix is released.
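A minimal sketch of both placements, assuming an x86 target and a compiler with asm goto support (GCC 4.5 or later, or a recent clang). The function names are hypothetical, and btsl carries an explicit size suffix so the assembler does not have to guess the operand width. Both paths return 0 on purpose: a goto label that leads to the same code as the fall-through is the shape PR58670 is about.

```c
/*
 * Sketch only: x86 and asm goto support assumed. __asm__ is used instead
 * of asm so the code also builds in strict ISO C modes.
 */
#if defined(__x86_64__) || defined(__i386__)

/* Placement 1: empty asm right at the goto label. */
static int set_bit1_quirk_at_label(int *p)
{
	__asm__ __volatile__ goto ("btsl $1, %0; jc %l[lab]"
				   : : "m" (*p) : "memory" : lab);
	return 0;
lab:
	__asm__ ("");	/* label no longer shares the fall-through block */
	return 0;
}

/* Placement 2: empty asm directly after the asm goto. */
static int set_bit1_quirk_after_goto(int *p)
{
	__asm__ __volatile__ goto ("btsl $1, %0; jc %l[lab]"
				   : : "m" (*p) : "memory" : lab);
	__asm__ ("");	/* fall-through path gets a block of its own */
	return 0;
lab:
	return 0;
}

#else
/* Portable stand-in so the sketch still compiles on other targets. */
static int set_bit1_plain(int *p)
{
	*p |= 1 << 1;
	return 0;
}
#define set_bit1_quirk_at_label   set_bit1_plain
#define set_bit1_quirk_after_goto set_bit1_plain
#endif
```

On a correct compiler both functions behave the same; the empty asm only changes how the basic blocks are laid out, not what the code does.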

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote:
> > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > >
> > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of 
> > > > 4.[6-9] miscompile it.  Will have a look tomorrow unless somebody 
> > > > beats me to it.  But historically, the case where asm goto labels 
> > > > jump to fallthru basic block had numerous problems in the past.
> > > 
> > > That bug lists the component as middle end; this suggests x86_64 would 
> > > be vulnerable too, can you confirm? So far we've only observed the 
> > > wrong code on i386 targets, x86_64 targets appeared correct.
> > 
> > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and 
> > even say on ppc64 (sure, one would have to rewrite the asm to have it 
> > fail at runtime).
> 
> Please let us know once you know enough about the bug to suggest 
> workarounds. Because it's a nice optimization, even extra instruction(s) 
> would be acceptable, I suspect: we could perhaps put a NOP into a slowpath, 
> with an (unused) goto to it, or something like that?

IMHO you don't need to put a nop there; I guess asm (""); would be enough.
It still makes sure the label is never in the fallthru basic block, so
the whole class of issues with asm goto labels in the fallthru bb can't
hit.  The disadvantage is that it generates worse code.

@@ -8,6 +8,7 @@ foo (int a, int b)
   asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
   return 0;
 lab:
+  asm ("");
   return 0;
 }

on the testcase from the PR results in something like:
#APP
# 8 "pr58670-1.c" 1
bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L5:
xorl    %eax, %eax
ret
.p2align 4,,10
.p2align 3
.L3:
xorl    %eax, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
movl    $-3, %eax
ret
while the code without the extra asm (""), compiled by a fixed compiler, looks like:
#APP
# 6 "pr58670.c" 1
bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L3:
xorl    %eax, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
.L2:
movl    $-3, %eax
ret

FYI, past compiler issues with asm goto include:
PR54127, PR46226, PR44071, PR52650, PR54455, PR51767.

I hope we get this fixed for 4.8.2, so you could then avoid
these hacks for GCC 4.8.2 and later.
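A minimal sketch of conditionalizing the quirk on the compiler version. Assumptions: the 4.8.2 cut-off only reflects the hope expressed above (check the PR for the release that actually carries the fix), all SKETCH_*/sketch_* names are hypothetical, and clang is excluded because it also defines __GNUC__ but is a different compiler that does not share this GCC bug.

```c
/* Fold the GCC version into one number, e.g. 4.8.2 -> 40802. */
#if defined(__GNUC__) && !defined(__clang__)
# define SKETCH_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
# define SKETCH_GCC_VERSION 0	/* not GCC proper: PR58670 does not apply */
#endif

#if SKETCH_GCC_VERSION != 0 && SKETCH_GCC_VERSION < 40802
# define SKETCH_PR58670_QUIRK 1
/* Possibly affected GCC: follow every asm goto with an empty asm. */
# define sketch_asm_goto(...) \
	do { __asm__ goto (__VA_ARGS__); __asm__ (""); } while (0)
#else
# define SKETCH_PR58670_QUIRK 0
# define sketch_asm_goto(...) __asm__ goto (__VA_ARGS__)
#endif

static int sketch_quirk_enabled(void)
{
	return SKETCH_PR58670_QUIRK;
}

#if defined(__x86_64__) || defined(__i386__)
/* Usage: the same shape as the diff above, both paths returning 0. */
static int sketch_set_bit1(int *p)
{
	sketch_asm_goto ("btsl $1, %0; jc %l[lab]" : : "m" (*p) : "memory" : lab);
	return 0;
lab:
	return 0;
}
#else
/* Portable stand-in for non-x86 targets. */
static int sketch_set_bit1(int *p)
{
	*p |= 1 << 1;
	return 0;
}
#endif
```

Wrapping the quirk in one macro keeps call sites unchanged, so dropping it later is a one-line edit.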

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-10 Thread Ingo Molnar

* Jakub Jelinek  wrote:

> On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> >
> > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of 
> > > 4.[6-9] miscompile it.  Will have a look tomorrow unless somebody 
> > > beats me to it.  But historically, the case where asm goto labels 
> > > jump to fallthru basic block had numerous problems in the past.
> > 
> > That bug lists the component as middle end; this suggests x86_64 would 
> > be vulnerable too, can you confirm? So far we've only observed the 
> > wrong code on i386 targets, x86_64 targets appeared correct.
> 
> Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and 
> even say on ppc64 (sure, one would have to rewrite the asm to have it 
> fail at runtime).

Please let us know once you know enough about the bug to suggest 
workarounds. Because it's a nice optimization, even extra instruction(s) 
would be acceptable, I suspect: we could perhaps put a NOP into a slowpath, 
with an (unused) goto to it, or something like that?

Thanks,

Ingo


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Mike Galbraith
On Wed, 2013-10-09 at 19:18 +0200, Ingo Molnar wrote: 
> * Ingo Molnar  wrote:
> 
> > 
> > * Peter Zijlstra  wrote:
> > 
> > > On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > > > show the result of
> > > > > > 
> > > > > > $ kernel/task_work.s
> > > > 
> > > > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> > > 
> > > > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > > > bts $1, 8(%eax); setc %dl   #,, c
> > > 
> > > That compiler doesn't appear to have asm goto support, so we fall back 
> > > to the code we already knew worked :-)
> > 
> > I'm using 4.7.2 with randconfig testing, which has asm goto support, and 
> > I haven't seen this crash yet.
> > 
> > Unless my testing is off, it might be a bug in GCC 4.8, or a pre-existing 
> > bug exposed by GCC 4.8.
> 
> And as it happens, just a few hours later I hit a very similar crash, this 
> time compiled with both 4.7.3 and 4.7.2! (config attached)
> 
> This has a weird-x86-arch tuning knob as well:
> 
>   CONFIG_MGEODE_LX=y
> 
> So I think we might need to turn off asm goto for all things 32-bit x86.

Hm, 32 bit x86...

I built 4.8.1 yesterday, so I can now build x86_64 tip, but I suspect I
won't be the only one with a compiler that goes belly up:

net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

gcc-4.6.2 (openSUSE 12.1) has happily chewed up humongous piles of
source, but finds this asm goto stuff to be toxic.

-Mike



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> > Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
> > unless somebody beats me to it.  But historically, the case where
> > asm goto labels jump to fallthru basic block had numerous problems in the
> > past.
> 
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the wrong
> code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
even say on ppc64 (sure, one would have to rewrite the asm to have it fail
at runtime).

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
> unless somebody beats me to it.  But historically, the case where
> asm goto labels jump to fallthru basic block had numerous problems in the
> past.

That bug lists the component as middle end; this suggests x86_64 would
be vulnerable too, can you confirm? So far we've only observed the wrong
code on i386 targets, x86_64 targets appeared correct.


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Linus Torvalds
On Wed, Oct 9, 2013 at 11:16 AM, Jakub Jelinek  wrote:
>
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
> unless somebody beats me to it.  But historically, the case where
> asm goto labels jump to fallthru basic block had numerous problems in the
> past.

Ok, so it isn't even specific to x86-32, because your test-case shows
the bug for me on 64-bit too. Apparently we just have a harder time
hitting it in practice in the kernel on x86-64.

Too bad. It makes me nervous about all our _traditional_ uses of asm
goto too, never mind the new ones..

  Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force an x86_64 build using the 'same' config it goes away and
> > > generates 'sensible' code again (although I don't see why L9 isn't
> > > merged with L2):
> > 
> > i386-SMP also generates correct code afaict; a tad stupid but not wrong.
> > 
> > If I remove ftrace from the .config it's still broken.
> > If I also remove the likely/unlikely tracer it's still broken and a lot
> > smaller:
> 
> OK, it's -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
unless somebody beats me to it.  But historically, the case where
asm goto labels jump to fallthru basic block had numerous problems in the
past.

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> Once I force an x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again (although I don't see why L9 isn't
> merged with L2):

i386-SMP also generates correct code afaict; a tad stupid but not wrong.

If I remove ftrace from the .config it's still broken.
If I also remove the likely/unlikely tracer it's still broken and a lot
smaller:

.p2align 4,,15
.globl  task_work_add
.type   task_work_add, @function
task_work_add:
pushl   %ebp    #
movl    %esp, %ebp  #,
pushl   %edi    #
pushl   %esi    #
pushl   %ebx    #
movl    %eax, %esi  # task, task
.p2align 4,,15
.L4:
movl    904(%esi), %ebx # task_5(D)->task_works, __old
cmpl    $work_exited, %ebx  #, __old
je  .L5 #,
movl    %ebx, (%edx)    # __old, work_10(D)->next
movl    %ebx, %eax  # __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
cmpxchgl %edx,904(%esi) # work, *__ptr_12
# 0 "" 2
#NO_APP
cmpl    %eax, %ebx  # __ret, __old
jne .L4 #,
testb   %cl, %cl    # notify
je  .L6 #,
movl    4(%esi), %eax   # task_5(D)->stack, task_5(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)_18],
# 0 "" 2
#NO_APP
.L6:
xorl    %edi, %edi  # D.14172
.L2:
movl    %edi, %eax  # D.14172,
popl    %ebx    #
popl    %esi    #
popl    %edi    #
popl    %ebp    #
ret
.L5:
movl    $-3, %edi   #, D.14172
jmp .L2 #
.size   task_work_add, .-task_work_add

That "jc .L2" needs to be "jc .L6"! It looks like GCC fails to deal with
the empty branch.

Why this thing needs to use EDI is anybody's guess, I suppose. It would've
made much more sense to have:

.L6:
xorl %eax, %eax
.L2:
popl %ebx
popl %esi
popl %ebp
ret
.L5:
movl $-3, %eax
jmp .L2

At least it's not duplicating the popl+ret bits 3 times anymore.


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> > This is what we are going to return. But note that -20(%ebp) was not
> > initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
> > above. IOW, in this case we seem to return a random value from stack.
> 
> I think you're quite right, and I can confirm I can reproduce this with
> gcc-4.8.1 and Wu's .config:
>
> [...]
>
> Once I force an x86_64 build using the 'same' config it goes away and 
> generates 'sensible' code again [...]

So this at least opens up the possibility that we can create a not too 
painful quirk and only use the 'asm goto' optimization tricks on 64-bit 
kernels?

Thanks,

Ingo


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Oleg Nesterov
OK, thanks...

I didn't notice Richard and Jakub were not cc'ed... Adding them; perhaps
they can take a look.

On 10/09, Peter Zijlstra wrote:
>
> On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
> > I may well be wrong, since my asm skills are close to zero... but this
> > code looks wrong to me, and it could explain the oopses.
> >
> > > task_work_add:
> > >   pushl   %ebp    #
> > >   movl    %esp, %ebp  #,
> > >   pushl   %edi    #
> > >   pushl   %esi    #
> > >   pushl   %ebx    #
> > >   subl    $12, %esp   #,
> > >   call    mcount
> > >   movl    %eax, %edi  # task, task
> > >   movl    %edx, -16(%ebp) # work, %sfp
> > >   movb    %cl, -21(%ebp)  # notify, %sfp
> > >   .p2align 4,,15
> > > .L3:
> > >   movl    904(%edi), %esi # task_3(D)->task_works, head
> > >   cmpl    $work_exited, %esi  #, head
> > >   sete    %bl #, D.14145
> > >   andl    $255, %ebx  #, D.14145
> > >   xorl    %ecx, %ecx  #
> > >   movl    %ebx, %edx  # D.14145,
> > >   movl    $__f.14042, %eax    #,
> > >   call    ftrace_likely_update    #
> > >   testl   %ebx, %ebx  # D.14145
> > >   jne .L4 #,
> > >   movl    -16(%ebp), %edx # %sfp,
> > >   movl    %esi, (%edx)    # head, work_13(D)->next
> > >   movl    %esi, %eax  # head, __ret
> > > #APP
> > > # 34 "/c/wfg/tip/kernel/task_work.c" 1
> > >   cmpxchgl %edx,904(%edi) #, *__ptr_16
> > > # 0 "" 2
> > > #NO_APP
> > >   cmpl    %eax, %esi  # __ret, head
> > >   jne .L3 #,
> >
> > OK, we added the new work successfully, we should return 0. If we return
> > non-zero, fput() (the likely caller) assumes that it should use the workqueues
> > to close/free this file. Then later task_work_run() will do __fput() again.
> >
> > >   cmpb    $0, -21(%ebp)   #, %sfp
> > >   je  .L5 #,
> > >   movl    4(%edi), %eax   # task_3(D)->stack, task_3(D)->stack
> > > #APP
> > > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > >   bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)D.14203_29],
> >
> > This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).
> >
> > > # 0 "" 2
> > > #NO_APP
> > > .L5:
> > >   movl    $0, -20(%ebp)   #, %sfp
> > > .L2:
> > >   movl    -20(%ebp), %eax # %sfp,
> >
> > This is what we are going to return. But note that -20(%ebp) was not
> > initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
> > above. IOW, in this case we seem to return a random value from stack.
>
> I think you're quite right, and I can confirm I can reproduce this with
> gcc-4.8.1 and Wu's .config:
>
> .p2align 4,,15
> .globl  task_work_add
> .type   task_work_add, @function
> task_work_add:
> pushl   %ebp#
> movl%esp, %ebp  #,
> pushl   %edi#
> pushl   %esi#
> pushl   %ebx#
> subl$12, %esp   #,
> callmcount
> movl%eax, %esi  # task, task
> movl%edx, %edi  # work, work
> movl%ecx, -24(%ebp) # notify, %sfp
> jmp .L4 #
> .p2align 4,,15
> .L9:
> movl%ebx, (%edi)# __old, work_15(D)->next
> movl%ebx, %eax  # __old, __ret
> #APP
> # 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
> cmpxchgl %edi,904(%esi) # work, *__ptr_17
> # 0 "" 2
> #NO_APP
> cmpl%eax, %ebx  # __ret, __old
> je  .L8 #,
> .L4:
> movl904(%esi), %ebx # task_7(D)->task_works, __old
> cmpl$work_exited, %ebx  #, __old
> sete-13(%ebp)   #, %sfp
> xorl%edx, %edx  # __r
> movb-13(%ebp), %dl  # %sfp, __r
> xorl%ecx, %ecx  #
> movl$__f.14204, %eax#,
> callftrace_likely_update#
> cmpb$0, -13(%ebp)   #, %sfp
> je  .L9 #,
> movl$-3, -20(%ebp)  #, %sfp
> .L2:
> movl-20(%ebp), %eax # %sfp,
> addl$12, %esp   #,
> popl%ebx#
> popl%esi#
> popl%edi#
> popl%ebp#
> ret
> .p2align 4,,15
> .L8:
> cmpb$0, -24(%ebp)   #, %sfp
> je  .L6 #,
> movl4(%esi), %eax   # task_7(D)->stack, task_7(D)->stack
> #APP
> # 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
> bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)_23],
> # 0 "" 2
> #NO_APP
> .L6:
> movl$0, -20(%ebp)   #, %sfp
> movl-20(%ebp), %eax # %sfp,
> addl$12, %esp   #,
> popl%ebx#
> popl%esi#
> popl%edi#
> popl%ebp#
> ret
> .size   task_work_add, .-task_work_add
>
> Once I force an x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again (although I don't see why L9 isn't
> merged with L2):
>
> .p2align 4,,15
> 

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
> I may well be wrong, since my asm skills are close to zero... but this
> code looks wrong to me, and it could explain the oopses.
> 
> > task_work_add:
> > pushl   %ebp    #
> > movl    %esp, %ebp  #,
> > pushl   %edi    #
> > pushl   %esi    #
> > pushl   %ebx    #
> > subl    $12, %esp   #,
> > call    mcount
> > movl    %eax, %edi  # task, task
> > movl    %edx, -16(%ebp) # work, %sfp
> > movb    %cl, -21(%ebp)  # notify, %sfp
> > .p2align 4,,15
> > .L3:
> > movl    904(%edi), %esi # task_3(D)->task_works, head
> > cmpl    $work_exited, %esi  #, head
> > sete    %bl #, D.14145
> > andl    $255, %ebx  #, D.14145
> > xorl    %ecx, %ecx  #
> > movl    %ebx, %edx  # D.14145,
> > movl    $__f.14042, %eax    #,
> > call    ftrace_likely_update    #
> > testl   %ebx, %ebx  # D.14145
> > jne .L4 #,
> > movl    -16(%ebp), %edx # %sfp,
> > movl    %esi, (%edx)    # head, work_13(D)->next
> > movl    %esi, %eax  # head, __ret
> > #APP
> > # 34 "/c/wfg/tip/kernel/task_work.c" 1
> > cmpxchgl %edx,904(%edi) #, *__ptr_16
> > # 0 "" 2
> > #NO_APP
> > cmpl    %eax, %esi  # __ret, head
> > jne .L3 #,
> 
> OK, we added the new work successfully, we should return 0. If we return
> non-zero, fput() (the likely caller) assumes that it should use the workqueues
> to close/free this file. Then later task_work_run() will do __fput() again.
> 
> > cmpb    $0, -21(%ebp)   #, %sfp
> > je  .L5 #,
> > movl    4(%edi), %eax   # task_3(D)->stack, task_3(D)->stack
> > #APP
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)D.14203_29],
> 
> This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).
> 
> > # 0 "" 2
> > #NO_APP
> > .L5:
> > movl    $0, -20(%ebp)   #, %sfp
> > .L2:
> > movl    -20(%ebp), %eax # %sfp,
> 
> This is what we are going to return. But note that -20(%ebp) was not
> initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
> above. IOW, in this case we seem to return a random value from stack.

I think you're quite right, and I can confirm I can reproduce this with
gcc-4.8.1 and Wu's .config:

.p2align 4,,15
.globl  task_work_add
.type   task_work_add, @function
task_work_add:
pushl   %ebp    #
movl    %esp, %ebp  #,
pushl   %edi    #
pushl   %esi    #
pushl   %ebx    #
subl    $12, %esp   #,
call    mcount
movl    %eax, %esi  # task, task
movl    %edx, %edi  # work, work
movl    %ecx, -24(%ebp) # notify, %sfp
jmp .L4 #
.p2align 4,,15
.L9:
movl    %ebx, (%edi)    # __old, work_15(D)->next
movl    %ebx, %eax  # __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
cmpxchgl %edi,904(%esi) # work, *__ptr_17
# 0 "" 2
#NO_APP
cmpl    %eax, %ebx  # __ret, __old
je  .L8 #,
.L4:
movl    904(%esi), %ebx # task_7(D)->task_works, __old
cmpl    $work_exited, %ebx  #, __old
sete    -13(%ebp)   #, %sfp
xorl    %edx, %edx  # __r
movb    -13(%ebp), %dl  # %sfp, __r
xorl    %ecx, %ecx  #
movl    $__f.14204, %eax    #,
call    ftrace_likely_update    #
cmpb    $0, -13(%ebp)   #, %sfp
je  .L9 #,
movl    $-3, -20(%ebp)  #, %sfp
.L2:
movl    -20(%ebp), %eax # %sfp,
addl    $12, %esp   #,
popl    %ebx    #
popl    %esi    #
popl    %edi    #
popl    %ebp    #
ret
.p2align 4,,15
.L8:
cmpb    $0, -24(%ebp)   #, %sfp
je  .L6 #,
movl    4(%esi), %eax   # task_7(D)->stack, task_7(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)_23],
# 0 "" 2
#NO_APP
.L6:
movl    $0, -20(%ebp)   #, %sfp
movl    -20(%ebp), %eax # %sfp,
addl    $12, %esp   #,
popl    %ebx    #
popl    %esi    #
popl    %edi    #
popl    %ebp    #
ret
.size   task_work_add, .-task_work_add

Once I force an x86_64 build using the 'same' config it goes away and
generates 'sensible' code again (although I don't see why L9 isn't
merged with L2):

.p2align 4,,15
.globl  task_work_add
.type   task_work_add, @function
task_work_add:
call    __fentry__
pushq   %rbp    #
movq    %rsp, %rbp  #,
pushq   %r15    #
pushq   %r14    #
movl    %edx, %r14d # notify, notify
pushq   %r13    #
movq    %rsi, %r13  # work, work
pushq   

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Fengguang Wu
On Wed, Oct 09, 2013 at 02:27:05PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > > 
> > > > $ kernel/task_work.s
> > 
> > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> 
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > bts $1, 8(%eax); setc %dl   #,, c
> 
> That compiler doesn't appear to have asm goto support, so we fall back
> to the code we already knew worked :-)

Ah OK..

btw, here is a simple script I used to reproduce the problem. I'll
attach the 3MB yocto initrd in another email. However, I suspect
any initrd would do.

Thanks,
Fengguang


kvm-0day.sh
Description: Bourne shell script


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > > 
> > > > $ kernel/task_work.s
> > 
> > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> 
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > bts $1, 8(%eax); setc %dl   #,, c
> 
> That compiler doesn't appear to have asm goto support, so we fall back
> to the code we already knew worked :-)

I'm using 4.7.2 with randconfig testing, which has asm goto support, and I 
haven't seen this crash yet.

Unless my testing is off, it might be a bug in GCC 4.8, or a pre-existing 
bug exposed by GCC 4.8.

Thanks,

Ingo


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Oleg Nesterov
Hi Fengguang,

On 10/09, Fengguang Wu wrote:
>
> Thanks for looking into this. Attached is the task_work.s for you.

Thanks a lot!

I may well be wrong, since my asm skills are close to zero... but this
code looks wrong to me, and it could explain the oopses.

> task_work_add:
>   pushl   %ebp    #
>   movl    %esp, %ebp  #,
>   pushl   %edi    #
>   pushl   %esi    #
>   pushl   %ebx    #
>   subl    $12, %esp   #,
>   call    mcount
>   movl    %eax, %edi  # task, task
>   movl    %edx, -16(%ebp) # work, %sfp
>   movb    %cl, -21(%ebp)  # notify, %sfp
>   .p2align 4,,15
> .L3:
>   movl    904(%edi), %esi # task_3(D)->task_works, head
>   cmpl    $work_exited, %esi  #, head
>   sete    %bl #, D.14145
>   andl    $255, %ebx  #, D.14145
>   xorl    %ecx, %ecx  #
>   movl    %ebx, %edx  # D.14145,
>   movl    $__f.14042, %eax    #,
>   call    ftrace_likely_update    #
>   testl   %ebx, %ebx  # D.14145
>   jne .L4 #,
>   movl    -16(%ebp), %edx # %sfp,
>   movl    %esi, (%edx)    # head, work_13(D)->next
>   movl    %esi, %eax  # head, __ret
> #APP
> # 34 "/c/wfg/tip/kernel/task_work.c" 1
>   cmpxchgl %edx,904(%edi) #, *__ptr_16
> # 0 "" 2
> #NO_APP
>   cmpl    %eax, %esi  # __ret, head
>   jne .L3 #,

OK, we added the new work successfully, we should return 0. If we return
non-zero, fput() (the likely caller) assumes that it should use the workqueues
to close/free this file. Then later task_work_run() will do __fput() again.

>   cmpb    $0, -21(%ebp)   #, %sfp
>   je  .L5 #,
>   movl    4(%edi), %eax   # task_3(D)->stack, task_3(D)->stack
> #APP
> # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
>   bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)D.14203_29],

This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).
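To see why the jc target and the fall-through end up meeting, here is a userspace sketch of what the inlined set_notify_resume() plausibly reduces to on a !CONFIG_SMP build. Assumptions: set_notify_resume() calls kick_process() only when the test-and-set found the bit clear, kick_process() is an empty inline on UP, and bit 1 stands in for TIF_NOTIFY_RESUME to match the "bts $1" above. All function names are hypothetical stand-ins. Once the empty call is inlined away, both outcomes of the asm goto continue at the same code, which is the label-in-the-fall-through-block case behind PR58670.

```c
/* Hypothetical stand-ins; not the kernel's definitions. */
static void kick_process_up(void)
{
	/* !CONFIG_SMP: no other CPU to kick, so nothing to do */
}

/* Sets bit 1 and returns its previous value. */
static int test_and_set_bit1(int *flags)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ goto ("btsl $1, %0; jc %l[was_set]"
				   : : "m" (*flags) : "memory" : was_set);
	return 0;
was_set:
	return 1;
#else
	int old = (*flags >> 1) & 1;

	*flags |= 1 << 1;
	return old;
#endif
}

static void set_notify_resume_up(int *flags)
{
	if (!test_and_set_bit1(flags))
		kick_process_up();	/* empty, so both outcomes rejoin here */
}

/* Tail of a task_work_add()-like function: 0 is the only valid result. */
static int notify_tail(int *flags, int notify)
{
	if (notify)
		set_notify_resume_up(flags);
	return 0;
}
```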

> # 0 "" 2
> #NO_APP
> .L5:
>   movl    $0, -20(%ebp)   #, %sfp
> .L2:
>   movl    -20(%ebp), %eax # %sfp,

This is what we are going to return. But note that -20(%ebp) was not
initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
above. IOW, in this case we seem to return a random value from stack.
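The failure mode can be modelled in userspace. This is a sketch, not kernel code: the names echo the listing (task_works, work_exited), the error value is assumed to be -ESRCH (3 on Linux, matching the $-3 in the error path), C11 atomics stand in for cmpxchg(), and releases_for() is a hypothetical helper that counts how often the object gets released for a given return value.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; the names mirror the listing, the types are made up. */
struct callback_head {
	struct callback_head *next;
};

struct task_model {
	_Atomic(struct callback_head *) task_works;
};

/* Sentinel installed when the task exits; no more work is accepted. */
static struct callback_head work_exited;

static void task_model_init(struct task_model *t)
{
	atomic_init(&t->task_works, NULL);
}

/* Push work onto the list; 0 on success, -ESRCH once the task has exited. */
static int task_work_add_model(struct task_model *task,
			       struct callback_head *work)
{
	struct callback_head *head = atomic_load(&task->task_works);

	do {
		if (head == &work_exited)
			return -ESRCH;
		work->next = head;
		/* On failure, head is reloaded with the current list head. */
	} while (!atomic_compare_exchange_weak(&task->task_works, &head, work));

	return 0;	/* queued: the task itself will run and release it */
}

/*
 * Hypothetical caller accounting: a non-zero return makes the caller
 * release the object via its fallback path, and queued work is released
 * again when the task runs it. A garbage non-zero value after a
 * successful queue therefore releases the object twice.
 */
static int releases_for(int ret_seen_by_caller, bool actually_queued)
{
	int releases = 0;

	if (ret_seen_by_caller != 0)
		releases++;	/* fallback path, e.g. the workqueue */
	if (actually_queued)
		releases++;	/* later task_work_run() */
	return releases;
}
```

The last case in the test below is the miscompiled one: the work was queued, but a stale stack slot made the caller see a non-zero value.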

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > Fengguang, I do not think this will help, but just in case. Could you
> > > show the result of
> > > 
> > > $ kernel/task_work.s
> 
> Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!

> # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
>   bts $1, 8(%eax); setc %dl   #,, c

That compiler doesn't appear to have asm goto support, so we fall back
to the code we already knew worked :-)
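For context on why the two builds differ: in the option lists quoted in this thread, only the gcc 4.6.3 build defines CC_HAVE_ASM_GOTO. That macro is defined only when a test compile using asm goto succeeds. The function below is a minimal probe of that kind. Its name and shape are illustrative and do not reproduce the kernel's own check. It compiles only with a compiler that accepts asm goto (gcc 4.5 and later). Because the template is empty, the asm never actually jumps.

```c
/*
 * Hypothetical probe: this compiles only if the compiler accepts asm goto.
 * The template is empty, so control always falls through and 1 is returned;
 * listing 'taken' only tells the compiler that the asm may jump there.
 */
int asm_goto_probe(void)
{
    __asm__ goto ("" : : : : taken);
    return 1;
taken:
    return 0;
}
```

A compiler that rejects this, like the gcc 4.4.7 used above, leaves CC_HAVE_ASM_GOTO undefined. The bitops code then uses the "bts; setc" form Peter quoted.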


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Fengguang Wu
On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > Fengguang, I do not think this will help, but just in case. Could you
> > > show the result of
> > > 
> > > $ kernel/task_work.s
> 
> Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> 
> Attached is the new kernel/task_work.s.

Here is the diff:

gcc 4.6.3 vs 4.4.7
==
--- task_work.s 2013-10-09 20:19:48.312272579 +0800
+++ /tmp/task_work.s 2013-10-09 20:18:14.0 +0800
@@ -1,136 +1,150 @@
.file   "task_work.c"
-# GNU C (Debian 4.6.3-1) version 4.6.3 (x86_64-linux-gnu)
-#  compiled by GNU C version 4.6.3, GMP version 5.0.4, MPFR version 3.1.0-p3, MPC version 0.9
-# warning: GMP header version 5.0.4 differs from library version 5.0.2.
-# warning: MPFR header version 3.1.0-p3 differs from library version 3.1.1-p2.
+# GNU C (Debian 4.4.7-4) version 4.4.7 (x86_64-linux-gnu)
+#  compiled by GNU C version 4.4.7, GMP version 5.1.1, MPFR version 3.1.1-p2.
+# warning: GMP header version 5.1.1 differs from library version 5.0.2.
 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
-# options passed:  -nostdinc -I /c/wfg/tip/arch/x86/include
-# -I arch/x86/include/generated -I /c/wfg/tip/include -I include
-# -I /c/wfg/tip/arch/x86/include/uapi -I arch/x86/include/generated/uapi
-# -I /c/wfg/tip/include/uapi -I include/generated/uapi -I /c/wfg/tip/kernel
-# -I kernel -imultilib 32 -imultiarch i386-linux-gnu -D __KERNEL__
-# -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1
-# -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1
-# -D CC_HAVE_ASM_GOTO -D KBUILD_STR(s)=#s
-# -D KBUILD_BASENAME=KBUILD_STR(task_work)
-# -D KBUILD_MODNAME=KBUILD_STR(task_work)
-# -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include
-# -include /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
+# options passed:  -nostdinc -I/c/wfg/tip/arch/x86/include
+# -Iarch/x86/include/generated -I/c/wfg/tip/include -Iinclude
+# -I/c/wfg/tip/arch/x86/include/uapi -Iarch/x86/include/generated/uapi
+# -I/c/wfg/tip/include/uapi -Iinclude/generated/uapi -I/c/wfg/tip/kernel
+# -Ikernel -imultilib 32 -imultiarch i386-linux-gnu -D__KERNEL__
+# -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
+# -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1
+# -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(task_work)
+# -DKBUILD_MODNAME=KBUILD_STR(task_work) -isystem
+# /usr/lib/gcc/x86_64-linux-gnu/4.4.7/include -include
+# /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
 # /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
 # -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
-# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
-# -auxbase-strip kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes
-# -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security
-# -Wno-sign-compare -Wframe-larger-than=1024 -Wno-unused-but-set-variable
-# -Wdeclaration-after-statement -Wno-pointer-sign -p -fno-strict-aliasing
-# -fno-common -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
+# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -auxbase-strip
+# kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
+# -Werror-implicit-function-declaration -Wno-format-security
+# -Wno-sign-compare -Wframe-larger-than=1024 -Wdeclaration-after-statement
+# -Wno-pointer-sign -p -fno-strict-aliasing -fno-common
+# -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
 # -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
 # -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
 # -fconserve-stack -fverbose-asm
-# options enabled:  -fauto-inc-dec -fbranch-count-reg -fcaller-saves
-# -fcombine-stack-adjustments -fcompare-elim -fcprop-registers
-# -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize
-# -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types
-# -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse
-# -fgcse-lm -fguess-branch-probability -fident -fif-conversion
-# -fif-conversion2 -findirect-inlining -finline
-# -finline-functions-called-once -finline-small-functions -fipa-cp
-# -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
+# options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
+# -fbranch-count-reg -fcaller-saves -fcprop-registers -fcrossjumping
+# -fcse-follow-jumps -fdefer-pop -fdwarf2-cfi-asm -fearly-inlining
+# -feliminate-unused-debug-types -fexpensive-optimizations
+# -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
+# -fguess-branch-probability -fident -fif-conversion -fif-conversion2
+# -findirect-inlining -finline -finline-functions-called-once
+# -finline-small-functions -fipa-cp -fipa-pure-const -fipa-reference
 # -fira-share-save-slots -fira-share-spill-slots -fivopts
 # -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings 

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Fengguang Wu
> > Fengguang, I do not think this will help, but just in case. Could you
> > show the result of
> > 
> > $ kernel/task_work.s

Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!

Attached is the new kernel/task_work.s.

Thanks,
Fengguang
.file   "task_work.c"
# GNU C (Debian 4.4.7-4) version 4.4.7 (x86_64-linux-gnu)
#   compiled by GNU C version 4.4.7, GMP version 5.1.1, MPFR version 3.1.1-p2.
# warning: GMP header version 5.1.1 differs from library version 5.0.2.
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -nostdinc -I/c/wfg/tip/arch/x86/include
# -Iarch/x86/include/generated -I/c/wfg/tip/include -Iinclude
# -I/c/wfg/tip/arch/x86/include/uapi -Iarch/x86/include/generated/uapi
# -I/c/wfg/tip/include/uapi -Iinclude/generated/uapi -I/c/wfg/tip/kernel
# -Ikernel -imultilib 32 -imultiarch i386-linux-gnu -D__KERNEL__
# -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
# -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1
# -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(task_work)
# -DKBUILD_MODNAME=KBUILD_STR(task_work) -isystem
# /usr/lib/gcc/x86_64-linux-gnu/4.4.7/include -include
# /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
# /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
# -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -auxbase-strip
# kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
# -Werror-implicit-function-declaration -Wno-format-security
# -Wno-sign-compare -Wframe-larger-than=1024 -Wdeclaration-after-statement
# -Wno-pointer-sign -p -fno-strict-aliasing -fno-common
# -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
# -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
# -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
# -fconserve-stack -fverbose-asm
# options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcaller-saves -fcprop-registers -fcrossjumping
# -fcse-follow-jumps -fdefer-pop -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -fexpensive-optimizations
# -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
# -fguess-branch-probability -fident -fif-conversion -fif-conversion2
# -findirect-inlining -finline -finline-functions-called-once
# -finline-small-functions -fipa-cp -fipa-pure-const -fipa-reference
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
# -fmerge-debug-strings -fmove-loop-invariants -foptimize-register-move
# -fpeephole -fpeephole2 -fprofile -freg-struct-return -fregmove
# -freorder-blocks -freorder-functions -frerun-cse-after-loop
# -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros
# -fsplit-ivs-in-unroller -fsplit-wide-types -fthread-jumps
# -ftoplevel-reorder -ftrapping-math -ftree-builtin-call-dce -ftree-ccp
# -ftree-ch -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
# -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-loop-im
# -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
# -ftree-pre -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-sra
# -ftree-switch-conversion -ftree-ter -ftree-vect-loop-version -ftree-vrp
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mfused-madd -mglibc
# -mieee-fp -mno-fancy-math-387 -mno-red-zone -mno-sse4 -mpush-args -msahf
# -mtls-direct-seg-refs

# Compiler executable checksum: f7c11247ad5a53a602823d9bd673a474

.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "/c/wfg/tip/kernel/task_work.c"
.text
.p2align 4,,15
.globl task_work_run
.type   task_work_run, @function
task_work_run:
pushl   %ebp    #
movl    %esp, %ebp  #,
pushl   %edi    #
pushl   %esi    #
pushl   %ebx    #
call    mcount
#APP
# 14 "/c/wfg/tip/arch/x86/include/asm/current.h" 1
movl current_task,%edi  #, task
# 0 "" 2
#NO_APP
leal    904(%edi), %ebx #, D.18648
.p2align 4,,15
.L15:
movl    (%ebx), %edx    #* D.18648, work
testl   %edx, %edx  # work
je  .L17    #,
.L2:
xorl    %ecx, %ecx  # head.458
.L3:
movl    %edx, %eax  # work, __ret
#APP
# 99 "/c/wfg/tip/kernel/task_work.c" 1
cmpxchgl %ecx,(%ebx)    # head.458,* D.18648
# 0 "" 2
#NO_APP
cmpl    %eax, %edx  # __ret, work
jne .L15    #,
testl   %edx, %edx  # work
je  .L10    #,
.p2align 4,,15
.L12:
#APP
# 656 "/c/wfg/tip/arch/x86/include/asm/processor.h" 1
rep; nop
# 0 "" 2
#NO_APP
movl    960(%edi), %eax # <variable>.pi_lock.raw_lock.slock, D.18658
testl   %eax, %eax  # D.18658
je  .L12    #,

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Fengguang Wu
Hi Oleg,

Thanks for looking into this. Attached is the task_work.s for you.

> Fengguang, I do not think this will help, but just in case. Could you
> show the result of
> 
> $ kernel/task_work.s
> 
> ?

Sorry, I lost some emails and found this one again on LKML. I had too many
mutt clients open.

Thanks,
Fengguang
.file   "task_work.c"
# GNU C (Debian 4.6.3-1) version 4.6.3 (x86_64-linux-gnu)
#   compiled by GNU C version 4.6.3, GMP version 5.0.4, MPFR version 3.1.0-p3, MPC version 0.9
# warning: GMP header version 5.0.4 differs from library version 5.0.2.
# warning: MPFR header version 3.1.0-p3 differs from library version 3.1.1-p2.
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -nostdinc -I /c/wfg/tip/arch/x86/include
# -I arch/x86/include/generated -I /c/wfg/tip/include -I include
# -I /c/wfg/tip/arch/x86/include/uapi -I arch/x86/include/generated/uapi
# -I /c/wfg/tip/include/uapi -I include/generated/uapi -I /c/wfg/tip/kernel
# -I kernel -imultilib 32 -imultiarch i386-linux-gnu -D __KERNEL__
# -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1
# -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1
# -D CC_HAVE_ASM_GOTO -D KBUILD_STR(s)=#s
# -D KBUILD_BASENAME=KBUILD_STR(task_work)
# -D KBUILD_MODNAME=KBUILD_STR(task_work)
# -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include
# -include /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
# /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
# -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
# -auxbase-strip kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes
# -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security
# -Wno-sign-compare -Wframe-larger-than=1024 -Wno-unused-but-set-variable
# -Wdeclaration-after-statement -Wno-pointer-sign -p -fno-strict-aliasing
# -fno-common -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
# -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
# -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
# -fconserve-stack -fverbose-asm
# options enabled:  -fauto-inc-dec -fbranch-count-reg -fcaller-saves
# -fcombine-stack-adjustments -fcompare-elim -fcprop-registers
# -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize
# -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types
# -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse
# -fgcse-lm -fguess-branch-probability -fident -fif-conversion
# -fif-conversion2 -findirect-inlining -finline
# -finline-functions-called-once -finline-small-functions -fipa-cp
# -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
# -fmerge-debug-strings -fmove-loop-invariants -foptimize-register-move
# -fpartial-inlining -fpeephole -fpeephole2 -fprefetch-loop-arrays
# -fprofile -freg-struct-return -fregmove -freorder-blocks
# -freorder-functions -frerun-cse-after-loop
# -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-volatile-bitfields
# -fthread-jumps -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
# -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
# -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
# -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pre -ftree-pta
# -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize
# -ftree-sra -ftree-switch-conversion -ftree-ter -ftree-vect-loop-version
# -ftree-vrp -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mglibc -mieee-fp
# -mno-fancy-math-387 -mno-red-zone -mno-sse4 -mpush-args -msahf
# -mtls-direct-seg-refs

# Compiler executable checksum: aa5cb4c8e9c62c6cc9349213df314c34

.text
.p2align 4,,15
.globl  task_work_add
.type   task_work_add, @function
task_work_add:
pushl   %ebp    #
movl    %esp, %ebp  #,
pushl   %edi    #
pushl   %esi    #
pushl   %ebx    #
subl    $12, %esp   #,
call    mcount
movl    %eax, %edi  # task, task
movl    %edx, -16(%ebp) # work, %sfp
movb    %cl, -21(%ebp)  # notify, %sfp
.p2align 4,,15
.L3:
movl    904(%edi), %esi # task_3(D)->task_works, head
cmpl    $work_exited, %esi  #, head
sete    %bl #, D.14145
andl    $255, %ebx  #, D.14145

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Oleg Nesterov
Hi Fengguang,

On 10/09, Fengguang Wu wrote:

 Thanks for looking into this. Attached is the task_work.s for you.

Thanks a lot!

I'm afraid I am wrong, my asm skills are close to zero... but this
code looks wrong to me, and this can explain the oopses.

 task_work_add:
   pushl   %ebp#
   movl%esp, %ebp  #,
   pushl   %edi#
   pushl   %esi#
   pushl   %ebx#
   subl$12, %esp   #,
   callmcount
   movl%eax, %edi  # task, task
   movl%edx, -16(%ebp) # work, %sfp
   movb%cl, -21(%ebp)  # notify, %sfp
   .p2align 4,,15
 .L3:
   movl904(%edi), %esi # task_3(D)-task_works, head
   cmpl$work_exited, %esi  #, head
   sete%bl #, D.14145
   andl$255, %ebx  #, D.14145
   xorl%ecx, %ecx  #
   movl%ebx, %edx  # D.14145,
   movl$__f.14042, %eax#,
   callftrace_likely_update#
   testl   %ebx, %ebx  # D.14145
   jne .L4 #,
   movl-16(%ebp), %edx # %sfp,
   movl%esi, (%edx)# head, work_13(D)-next
   movl%esi, %eax  # head, __ret
 #APP
 # 34 /c/wfg/tip/kernel/task_work.c 1
   cmpxchgl %edx,904(%edi) #, *__ptr_16
 # 0  2
 #NO_APP
   cmpl%eax, %esi  # __ret, head
   jne .L3 #,

OK, we added the new work successfully, we should return 0. If we return
non-zero, fput() (the likely caller) assumes that it should use the workqueues
to close/free this file. Then later task_work_run() will do __fput() again.

   cmpb    $0, -21(%ebp)  #, %sfp
   je      .L5  #,
   movl    4(%edi), %eax  # task_3(D)->stack, task_3(D)->stack
 #APP
 # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
   bts $1, 8(%eax); jc .L2  #, MEM[(volatile long unsigned int *)D.14203_29],

This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).

 # 0 "" 2
 #NO_APP
 .L5:
   movl    $0, -20(%ebp)  #, %sfp
 .L2:
   movl    -20(%ebp), %eax  # %sfp,

This is what we are going to return. But note that -20(%ebp) was not
initialized if TIF_NOTIFY_RESUME was already set, jc .L2 skips .L5
above. IOW, in this case we seem to return a random value from stack.
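To make the expected behaviour concrete, here is a small userspace model of the C logic behind the listing above, written against C11 atomics. The struct layouts, the names task_work_add_model() and test_and_set_flag(), and the use of bit 1 as a stand-in for TIF_NOTIFY_RESUME (matching the "bts $1" in the listing) are illustrative assumptions, not kernel code. The property Oleg describes is the final return: after a successful cmpxchg the function returns 0 whether or not the notify bit was already set.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel types; not the real definitions. */
struct callback_head {
    struct callback_head *next;
};

struct task {
    _Atomic(struct callback_head *) task_works;
    unsigned long flags;               /* bit 1 stands in for TIF_NOTIFY_RESUME */
};

#define NOTIFY_RESUME_BIT 1            /* matches the "bts $1" in the listing */

/* Sentinel installed once the task has run its exit-time work. */
static struct callback_head work_exited;

/* Non-atomic test_and_set_bit() stand-in, as on !CONFIG_SMP. */
static int test_and_set_flag(unsigned long *flags, int nr)
{
    unsigned long mask = 1UL << nr;
    int old = (*flags & mask) != 0;

    *flags |= mask;
    return old;
}

static int task_work_add_model(struct task *task, struct callback_head *work,
                               int notify)
{
    struct callback_head *head = atomic_load(&task->task_works);

    assert(work != NULL);
    do {
        if (head == &work_exited)
            return -ESRCH;
        work->next = head;
        /* On failure, head is refreshed with the current list head. */
    } while (!atomic_compare_exchange_weak(&task->task_works, &head, work));

    if (notify && !test_and_set_flag(&task->flags, NOTIFY_RESUME_BIT)) {
        /* kick_process() would go here; it is absent on !CONFIG_SMP,
         * which is what leaves this branch empty. */
    }
    return 0;                          /* on both paths, bit already set or not */
}
```

Both calls with notify set return 0. Only the first one actually flips the flag; the second is the case where the miscompiled i386 code read the uninitialized -20(%ebp) slot.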

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Ingo Molnar

* Peter Zijlstra <pet...@infradead.org> wrote:

 On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
Fengguang, I do not think this will help, but just in case. Could you
show the result of

$ kernel/task_work.s
  
  Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
 
  # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
  bts $1, 8(%eax); setc %dl   #,, c
 
 That compiler doesn't appear to have asm goto support, so we fall back
 to the code we already knew worked :-)

I'm using 4.7.2 with randconfig testing, which has asm goto support, and I 
haven't seen this crash yet.

Unless my testing is off, it might be a bug in GCC 4.8, or a pre-existing
bug that GCC 4.8 exposes.

Thanks,

Ingo


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Fengguang Wu
On Wed, Oct 09, 2013 at 02:27:05PM +0200, Peter Zijlstra wrote:
 On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
Fengguang, I do not think this will help, but just in case. Could you
show the result of

$ kernel/task_work.s
  
  Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
 
  # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
  bts $1, 8(%eax); setc %dl   #,, c
 
 That compiler doesn't appear to have asm goto support, so we fall back
 to the code we already knew worked :-)

Ah OK..

btw, here is a simple script I used to reproduce the problem. I'll
attach the 3MB yocto initrd in another email, though I suspect
any initrd would do.

Thanks,
Fengguang


kvm-0day.sh
Description: Bourne shell script


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
 I'm afraid I am wrong, my asm skills are close to zero... but this
 code looks wrong to me, and this can explain the oopses.
 
  task_work_add:
    pushl   %ebp  #
    movl    %esp, %ebp  #,
    pushl   %edi  #
    pushl   %esi  #
    pushl   %ebx  #
    subl    $12, %esp  #,
    call    mcount
    movl    %eax, %edi  # task, task
    movl    %edx, -16(%ebp)  # work, %sfp
    movb    %cl, -21(%ebp)  # notify, %sfp
    .p2align 4,,15
  .L3:
    movl    904(%edi), %esi  # task_3(D)->task_works, head
    cmpl    $work_exited, %esi  #, head
    sete    %bl  #, D.14145
    andl    $255, %ebx  #, D.14145
    xorl    %ecx, %ecx  #
    movl    %ebx, %edx  # D.14145,
    movl    $__f.14042, %eax  #,
    call    ftrace_likely_update  #
    testl   %ebx, %ebx  # D.14145
    jne     .L4  #,
    movl    -16(%ebp), %edx  # %sfp,
    movl    %esi, (%edx)  # head, work_13(D)->next
    movl    %esi, %eax  # head, __ret
  #APP
  # 34 "/c/wfg/tip/kernel/task_work.c" 1
    cmpxchgl %edx,904(%edi)  #, *__ptr_16
  # 0 "" 2
  #NO_APP
    cmpl    %eax, %esi  # __ret, head
    jne     .L3  #,
 
 OK, we added the new work successfully, we should return 0. If we return
 non-zero, fput() (the likely caller) assumes that it should use the workqueues
 to close/free this file. Then later task_work_run() will do __fput() again.
 
    cmpb    $0, -21(%ebp)  #, %sfp
    je      .L5  #,
    movl    4(%edi), %eax  # task_3(D)->stack, task_3(D)->stack
  #APP
  # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
    bts $1, 8(%eax); jc .L2  #, MEM[(volatile long unsigned int *)D.14203_29],
 
 This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).
 
  # 0 "" 2
  #NO_APP
  .L5:
    movl    $0, -20(%ebp)  #, %sfp
  .L2:
    movl    -20(%ebp), %eax  # %sfp,
 
 This is what we are going to return. But note that -20(%ebp) was not
 initialized if TIF_NOTIFY_RESUME was already set, jc .L2 skips .L5
 above. IOW, in this case we seem to return a random value from stack.

I think you're quite right, and I can confirm I can reproduce this with
gcc-4.8.1 and Wu's .config:

    .p2align 4,,15
    .globl  task_work_add
    .type   task_work_add, @function
task_work_add:
    pushl   %ebp  #
    movl    %esp, %ebp  #,
    pushl   %edi  #
    pushl   %esi  #
    pushl   %ebx  #
    subl    $12, %esp  #,
    call    mcount
    movl    %eax, %esi  # task, task
    movl    %edx, %edi  # work, work
    movl    %ecx, -24(%ebp)  # notify, %sfp
    jmp     .L4  #
    .p2align 4,,15
.L9:
    movl    %ebx, (%edi)  # __old, work_15(D)->next
    movl    %ebx, %eax  # __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
    cmpxchgl %edi,904(%esi)  # work, *__ptr_17
# 0 "" 2
#NO_APP
    cmpl    %eax, %ebx  # __ret, __old
    je      .L8  #,
.L4:
    movl    904(%esi), %ebx  # task_7(D)->task_works, __old
    cmpl    $work_exited, %ebx  #, __old
    sete    -13(%ebp)  #, %sfp
    xorl    %edx, %edx  # __r
    movb    -13(%ebp), %dl  # %sfp, __r
    xorl    %ecx, %ecx  #
    movl    $__f.14204, %eax  #,
    call    ftrace_likely_update  #
    cmpb    $0, -13(%ebp)  #, %sfp
    je      .L9  #,
    movl    $-3, -20(%ebp)  #, %sfp
.L2:
    movl    -20(%ebp), %eax  # %sfp,
    addl    $12, %esp  #,
    popl    %ebx  #
    popl    %esi  #
    popl    %edi  #
    popl    %ebp  #
    ret
    .p2align 4,,15
.L8:
    cmpb    $0, -24(%ebp)  #, %sfp
    je      .L6  #,
    movl    4(%esi), %eax  # task_7(D)->stack, task_7(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
    bts $1, 8(%eax); jc .L2  #, MEM[(volatile long unsigned int *)_23],
# 0 "" 2
#NO_APP
.L6:
    movl    $0, -20(%ebp)  #, %sfp
    movl    -20(%ebp), %eax  # %sfp,
    addl    $12, %esp  #,
    popl    %ebx  #
    popl    %esi  #
    popl    %edi  #
    popl    %ebp  #
    ret
    .size   task_work_add, .-task_work_add

Once I force an x86_64 build using the 'same' config it goes away and
generates 'sensible' code again (although I don't see why L9 isn't
merged with L2):

    .p2align 4,,15
    .globl  task_work_add
    .type   task_work_add, @function
task_work_add:
    call    __fentry__
    pushq   %rbp  #
    movq    %rsp, %rbp  #,
    pushq   %r15  #
    pushq   %r14  #
    movl    %edx, %r14d  # notify, notify
    pushq   %r13  #
    movq    %rsi, %r13  # work, work
    pushq   %r12  #
    movq    %rdi, %r12  # task, task
    pushq   %rbx  #
    jmp     .L4  #
    .p2align 4,,10
   

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Oleg Nesterov
OK, thanks...

I didn't notice Richard and Jakub were not cc'ed... Adding them;
perhaps they can take a look.

On 10/09, Peter Zijlstra wrote:

 [...]

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Ingo Molnar

* Peter Zijlstra <pet...@infradead.org> wrote:

  This is what we are going to return. But note that -20(%ebp) was not
  initialized if TIF_NOTIFY_RESUME was already set, jc .L2 skips .L5
  above. IOW, in this case we seem to return a random value from stack.
 
 I think you're quite right, and I can confirm I can reproduce this with
 gcc-4.8.1 and Wu's .config:

 [...]

 Once I force an x86_64 build using the 'same' config it goes away and
 generates 'sensible' code again [...]

So this at least opens up the possibility that we can create a not too 
painful quirk and only use the 'asm goto' optimization tricks on 64-bit 
kernels?

Thanks,

Ingo


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
 Once I force an x86_64 build using the 'same' config it goes away and
 generates 'sensible' code again (although I don't see why L9 isn't
 merged with L2):

i386-SMP also generates correct code afaict; a tad stupid but not wrong.

If I remove ftrace from the .config it's still broken.
If I also remove the likely/unlikely tracer it's still broken and a lot
smaller:

    .p2align 4,,15
    .globl  task_work_add
    .type   task_work_add, @function
task_work_add:
    pushl   %ebp  #
    movl    %esp, %ebp  #,
    pushl   %edi  #
    pushl   %esi  #
    pushl   %ebx  #
    movl    %eax, %esi  # task, task
    .p2align 4,,15
.L4:
    movl    904(%esi), %ebx  # task_5(D)->task_works, __old
    cmpl    $work_exited, %ebx  #, __old
    je      .L5  #,
    movl    %ebx, (%edx)  # __old, work_10(D)->next
    movl    %ebx, %eax  # __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
    cmpxchgl %edx,904(%esi)  # work, *__ptr_12
# 0 "" 2
#NO_APP
    cmpl    %eax, %ebx  # __ret, __old
    jne     .L4  #,
    testb   %cl, %cl  # notify
    je      .L6  #,
    movl    4(%esi), %eax  # task_5(D)->stack, task_5(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
    bts $1, 8(%eax); jc .L2  #, MEM[(volatile long unsigned int *)_18],
# 0 "" 2
#NO_APP
.L6:
    xorl    %edi, %edi  # D.14172
.L2:
    movl    %edi, %eax  # D.14172,
    popl    %ebx  #
    popl    %esi  #
    popl    %edi  #
    popl    %ebp  #
    ret
.L5:
    movl    $-3, %edi  #, D.14172
    jmp     .L2  #
    .size   task_work_add, .-task_work_add

That jc .L2 needs to be jc .L6! It looks like gcc fails to deal with the
empty branch.
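At the C level, the failing shape is an asm goto whose jump target and fall-through both reach the same "return 0". On !CONFIG_SMP the branch that would call kick_process() is empty. Below is a minimal sketch of that shape, assuming GCC or Clang inline asm on x86. The names tas_bit_goto() and notify_tail() are hypothetical, and this is not the testcase attached to PR58670. A correct compiler must return 0 whether or not jc is taken. In the listing above, the taken jump lands on .L2 and skips the zeroing at .L6.

```c
#include <assert.h>

/* Hypothetical helper in the style of the asm goto bitops: set bit nr in
 * *addr and report whether it was already set. No lock prefix (!SMP). */
static inline int tas_bit_goto(long nr, unsigned long *addr)
{
    assert(nr >= 0 && nr < (long)(8 * sizeof(*addr)));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* gcc 4.x does not allow output operands on asm goto, so *addr is an
     * "m" input and the "memory" clobber covers the write. */
    __asm__ volatile goto("bts %1, %0; jc %l[was_set]"
                          : /* no outputs */
                          : "m"(*addr), "r"(nr)
                          : "memory", "cc"
                          : was_set);
    return 0;
was_set:
    return 1;
#else
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;

    *addr |= mask;
    return old;
#endif
}

/* Same control flow as task_work_add()'s tail on !CONFIG_SMP: the
 * "bit was clear" and "bit was set" paths both reach the same return 0. */
static int notify_tail(unsigned long *flags, int notify)
{
    if (notify && !tas_bit_goto(1, flags)) {
        /* kick_process(): nothing on !CONFIG_SMP */
    }
    return 0;
}
```

The third call in a sequence of notify_tail(&flags, 0), notify_tail(&flags, 1), notify_tail(&flags, 1) is the one that takes the jc path. It must still return 0.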

Why this thing needs to use EDI is anybody's guess, I suppose. It would
have made much more sense to have:

.L6:
    xorl    %eax, %eax
.L2:
    popl    %ebx
    popl    %esi
    popl    %ebp
    ret
.L5:
    movl    $-3, %eax
    jmp     .L2

At least it's not duplicating the popl+ret bits 3 times anymore.


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
 On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
  On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
   Once I force an x86_64 build using the 'same' config it goes away and
   generates 'sensible' code again (although I don't see why L9 isn't
   merged with L2):
  
  i386-SMP also generates correct code afaict; a tad stupid but not wrong.
  
  If I remove ftrace from the .config it's still broken.
  If I also remove the likely/unlikely tracer it's still broken and a lot
  smaller:
 
 OK, it's -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
unless somebody beats me to it.  But historically, the case where
asm goto labels jump to fallthru basic block had numerous problems in the
past.

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Linus Torvalds
On Wed, Oct 9, 2013 at 11:16 AM, Jakub Jelinek <ja...@redhat.com> wrote:

 Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
 Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
 unless somebody beats me to it.  But historically, the case where
 asm goto labels jump to fallthru basic block had numerous problems in the
 past.

Ok, so it isn't even specific to x86-32, because your test-case shows
the bug for me on 64-bit too. Apparently we just have a harder time
hitting it in practice in the kernel on x86-64.

Too bad. It makes me nervous about all our _traditional_ uses of asm
goto too, never mind the new ones..

  Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
 Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
 Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
 unless somebody beats me to it.  But historically, the case where
 asm goto labels jump to fallthru basic block had numerous problems in the
 past.

That bug lists the component as middle end; this suggests x86_64 would
be vulnerable too, can you confirm? So far we've only observed the wrong
code on i386 targets, x86_64 targets appeared correct.


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
 On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
  Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
  Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
  unless somebody beats me to it.  But historically, the case where
  asm goto labels jump to fallthru basic block had numerous problems in the
  past.
 
 That bug lists the component as middle end; this suggests x86_64 would
 be vulnerable too, can you confirm? So far we've only observed the wrong
 code on i386 targets, x86_64 targets appeared correct.

Any target: the testcase in the bugzilla aborts on x86_64 with -O2, and
even on, say, ppc64 (sure, one would have to rewrite the asm to have it
fail at runtime).

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Mike Galbraith
On Wed, 2013-10-09 at 19:18 +0200, Ingo Molnar wrote: 
 * Ingo Molnar <mi...@kernel.org> wrote:
 
  
  * Peter Zijlstra <pet...@infradead.org> wrote:
  
   On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
  Fengguang, I do not think this will help, but just in case. Could you
  show the result of
  
  $ kernel/task_work.s

Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
   
# 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
bts $1, 8(%eax); setc %dl   #,, c
   
   That compiler doesn't appear to have asm goto support, so we fall back 
   to the code we already knew worked :-)
  
  I'm using 4.7.2 with randconfig testing, which has asm goto support, and 
  I haven't seen this crash yet.
  
  Unless my testing is off, it might be a bug in GCC 4.8, or a pre-existing
  bug that GCC 4.8 exposes.
 
 And as it happens, just a few hours later I hit a very similar crash, this 
 time compiled with both 4.7.3 and 4.7.2! (config attached)
 
 This has a weird-x86-arch tuning knob as well:
 
   CONFIG_MGEODE_LX=y
 
 So I think we might need to turn off asm goto for all things 32-bit x86.

Hm, 32 bit x86...

I built 4.8.1 yesterday, so I can now build x86_64 tip, but I suspect I'll
not be the only one with a compiler that goes belly up.

net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

gcc-4.6.2 (opensuse 12.1) has happily chewed up humongous piles of
source, but finds this asm goto stuff to be toxic.

-Mike



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Mike Galbraith
On Tue, 2013-10-08 at 21:05 +0200, Jakub Jelinek wrote: 
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> > On 10/08, Linus Torvalds wrote:
> > >
> > > (not yet merged), see:
> > >
> > > 
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> > 
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> > 
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > + : : "m" (var), ## __VA_ARGS__ \
> >   ^
> > 
> > don't we need
> > 
> > "+m" (var)
> > 
> > here?
> 
> You actually can't have output operands with asm goto, only inputs
> and clobbers.  But the "memory" clobber should be enough here.
> 
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

gcc version 4.6.2 (SUSE Linux) won't produce output, but where it dies
might point in the general direction of newer gcc troubles? 

  CC [M]  net/sunrpc/xprtsock.o
net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

-Mike



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:35 PM, Oleg Nesterov  wrote:
>
> Cough... sorry for off-topic question,
>
> static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
> {
>         int oldbit;
>
>         asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
>                      "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> doesn't this mean that "ADDR" doesn't need "+" as well?

We use ADDR for some of the non-barrier ones too, which don't have the
barrier (the "memory" clobber). See clear_bit() and friends.
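Linus's answer can be illustrated with a pair of userspace sketches, assuming GCC-style inline asm on x86. The BITOP_ADDR/ADDR definitions here are modeled on x86 bitops.h of that period and should be treated as an assumption. The *_sketch function names are hypothetical. In the non-barrier clear_bit_sketch(), the "+m" operand is the only thing telling gcc that the word changes, so ADDR must carry the "+". In test_and_set_bit_sketch(), the "memory" clobber already implies it, which is Oleg's observation.

```c
#include <assert.h>

/* Modeled on the x86 bitops.h of the time: a read-write memory operand
 * naming exactly the word being modified. */
#define BITOP_ADDR(x) "+m" (*(volatile long *)(x))
#define ADDR BITOP_ADDR(addr)

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Non-barrier op: there is no "memory" clobber, so "+m" is what makes gcc
 * assume *addr was both read and written. */
static inline void clear_bit_sketch(long nr, volatile unsigned long *addr)
{
    assert(nr >= 0 && nr < (long)(8 * sizeof(*addr)));
    __asm__ volatile("btr %1, %0" : ADDR : "r"(nr) : "cc");
}

/* Barrier op: the "memory" clobber already covers *addr, so the "+" in ADDR
 * is redundant here; the macro is simply shared with the non-barrier ops. */
static inline int test_and_set_bit_sketch(long nr, volatile unsigned long *addr)
{
    int oldbit;

    __asm__ volatile("bts %2, %1\n\t"
                     "sbb %0, %0"        /* oldbit = CF ? -1 : 0 */
                     : "=r"(oldbit), ADDR
                     : "r"(nr)
                     : "memory", "cc");
    return oldbit;
}
#else
static inline void clear_bit_sketch(long nr, volatile unsigned long *addr)
{
    *addr &= ~(1UL << nr);
}

static inline int test_and_set_bit_sketch(long nr, volatile unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int oldbit = (*addr & mask) ? -1 : 0;

    *addr |= mask;
    return oldbit;
}
#endif
```

As in the kernel, the test_and_set variant returns 0 or a non-zero value (all ones from sbb) rather than exactly 1.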

> Or at least, perhaps it makes sense to identify the include file which
> makes the difference. Say, revert the changes in bitops.h, retest, then
> in atomic.h if the kernel still fails, etc.

Yeah, except Fengguang is the only one seeing this in his automated tests..

  Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
On 10/08, Jakub Jelinek wrote:
>
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> >
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> >
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > + : : "m" (var), ## __VA_ARGS__ \
> >   ^
> >
> > don't we need
> >
> > "+m" (var)
> >
> > here?
>
> You actually can't have output operands with asm goto, only inputs
> and clobbers.  But the "memory" clobber should be enough here.

Thanks Jakub and Linus.

Cough... sorry for off-topic question,

static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
{
        int oldbit;

        asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
                     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

doesn't this mean that "ADDR" doesn't need "+" as well?


> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

Or at least, perhaps it makes sense to identify the include file which
makes the difference. Say, revert the changes in bitops.h, retest, then
in atomic.h if the kernel still fails, etc.

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:20 PM, Linus Torvalds
 wrote:
>
> I'll try to see if I can reproduce this on my hardware

Yeah, doesn't reproduce here..

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:05 PM, Jakub Jelinek  wrote:
>
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

It is indeed just an optimization, and we could in theory switch
between the two versions on a case-by-case basis, but we don't have
any sane way to really do that.

I'll try to see if I can reproduce this on my hardware (just applying
that patch on top of my own tip) and see if I can try to narrow things
down. But I looked at the assembly for a couple of files, and it all
looked good, and I know this patch works fine for others (ie all the
normal -tip testing), so I suspect it's something specific to what
Fengguang does.

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> 
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
> 
>   +#define __GEN_RMWcc(fullop, var, cc, ...) \
>   +do { \
>   + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>   + : : "m" (var), ## __VA_ARGS__ \
> ^
> 
> don't we need
> 
>   "+m" (var)
> 
> here?

You actually can't have output operands with asm goto, only inputs
and clobbers.  But the "memory" clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to
a single object file (if I've skimmed the patch right, it is just an
optimization, where object files compiled without and with the patch
should actually coexist fine in the same kernel), ideally to a single
routine if possible and post a preprocessed source + gcc command line
+ version of gcc?

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 11:51 AM, Oleg Nesterov  wrote:
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> + : : "m" (var), ## __VA_ARGS__ \
>   ^
>
> don't we need
>
> "+m" (var)

We have a memory clobber instead. So the memory is marked as input and
clobbered.

And we'd love to mark it "+m", but "asm goto" cannot have outputs.

For the serializing ones, the memory clobber is ok - they have barrier
semantics anyway. But we'd actually *want* to use "asm goto" for some
cases where the memory clobber is too big of a hammer, so if we ever
get input/output constraints to "asm goto" we'll be happy.
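The constraint layout Linus describes (memory as an input, plus a "memory" clobber, with no outputs) can be sketched in userspace with a macro shaped like the __GEN_RMWcc quoted in this thread. This is a hypothetical adaptation for illustration, assuming GCC or Clang on x86 and relying on the GNU ", ## __VA_ARGS__" comma-elision extension. dec_and_test_sketch() is not a kernel function. The "cc" clobber is the one Fengguang tested without any change in behaviour.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical userspace adaptation of the __GEN_RMWcc pattern: run fullop,
 * jump on condition cc, and turn the jump into a 0/1 return value. The
 * operand var is an input only; the "memory" clobber says it was written. */
#define GEN_RMWCC_SKETCH(fullop, var, cc, ...)                         \
do {                                                                   \
        __asm__ volatile goto (fullop "; j" cc " %l[cc_label]"         \
                               : : "m" (var), ## __VA_ARGS__           \
                               : "memory", "cc" : cc_label);           \
        return 0;                                                      \
cc_label:                                                              \
        return 1;                                                      \
} while (0)

/* Decrement *v; return 1 if it reached zero (no lock prefix, as on !SMP). */
static int dec_and_test_sketch(int *v)
{
        assert(v != NULL);
        GEN_RMWCC_SKETCH("decl %0", *v, "e");
}
#else
static int dec_and_test_sketch(int *v)
{
        assert(v != NULL);
        return --*v == 0;
}
#endif
```

Starting from 2, the first call leaves 1 and returns 0, and the second leaves 0 and returns 1. The caller only sees the updated value because the "memory" clobber forces gcc to reload it.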

Of course, right now it looks like we shouldn't be in a rush to use
"asm goto" at all...

   Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
On 10/08, Linus Torvalds wrote:
>
> (not yet merged), see:
>
> 
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

I do not really understand inline assembly constraints, but I'll ask
anyway.

+#define __GEN_RMWcc(fullop, var, cc, ...) \
+do { \
+ asm volatile goto (fullop "; j" cc " %l[cc_label]" \
+ : : "m" (var), ## __VA_ARGS__ \
  ^

don't we need

"+m" (var)

here?

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
On 10/08, Fengguang Wu wrote:
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Can't understand how this can affect task_work.c...

Well, task_work_add() does test_and_set_bit(), so that patch actually
changes this code, but still I can't see how this can lead to these
OOPSes.

Fengguang, I do not think this will help, but just in case. Could you
show the result of

$ kernel/task_work.s

?

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
[ Richard and Jakub added to cc, they can perhaps help or at least
point us to the right gcc person.

  Richard, Jakub, the bug is triggered by kernel commit 0c44c2d0f459
(not yet merged), see:


http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

  for the patch. The actual inline asm is pretty dang small, and the
non-asm-goto version works fine. Can you take a look? ]

On Tue, Oct 8, 2013 at 12:51 AM, Fengguang Wu  wrote:
> [9.844709] *pdpt = 072c1001 *pde = 
>
>> That said, Fengguang, can you try two things just to check:
>>
>>  - add "cc" to the clobbers list for the asm goto (technically it
>> should be on the non-asm-goto as well, but we never had that, and
>> maybe the fact that gcc always ends up testing a register afterwards
>> hides the need for the clobber).
>>
>> So it would look like this in arch/x86/include/asm/rmwcc.h
>>
>>   #define __GEN_RMWcc(fullop, var, cc, ...) \
>>   do { \
>>   asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>>   : : "m" (var), ## __VA_ARGS__ \
>>   : "memory", "cc" : cc_label); \
>>   return 0; \
>>   cc_label: \
>>   return 1; \
>>
>> (where that "cc" thing is new). I'm not sure if "cc" really matters on
>> x86 at all (it didn't use to, long long ago), but maybe it does these
>> days..
>
> Tests show that it makes no difference by adding the "cc" this way:
>
> -   : "memory" : cc_label); \
> +   : "memory", "cc" : cc_label); \

Ok, that was a long shot; I don't think gcc actually ever assumes cc
is live over an asm on x86.

>> If that makes no difference, please just verify that the non-asm-goto
>> version works fine, by changing the
>>
>>   #ifdef CC_HAVE_ASM_GOTO
>>
>> into a simple "#if 0" to disable the asm-goto version.
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Ok. So it looks very much like "asm goto()" is simply buggered. Too
bad, since it generated nice clear code.

I suspect it's the memory clobber - maybe it only marks memory as
clobbered for the fallthrough case, and the actual "goto" case might
use old cached values? What do I know, it's just a theory.

We do have "asm goto" with memory clobbers elsewhere (our x86 version
of __mutex_fastpath_lock()), but that use is very limited and only
gets expanded in a single place. The new bitop cases get expanded
*everywhere*, so if there is something subtly wrong wrt code
generation that requires some particular pattern, they'd trigger it
much more easily.

Anybody have any ideas?

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
I'll try to find other messages to understand what you are talking
about, just one note for now

On 10/07, Linus Torvalds wrote:
>
> Your oops makes very little sense, it looks like task_work_run() just
> called out to random crap, probably because the work was already
> released, so "work->func()" ends up being bad.

Or task_work_run() can hit work->func == NULL if do_exit() is called
twice if, say, the task does BUG() after exit_task_work().

> participants anyway, just in case there is some race. The comment says
> that it can race with task_work_cancel() playing with *work. Oleg,
> comments?

The comment tries to say that if we are racing with task_work_cancel(),
it can't delete the first entry == work: we won the race, so its
cmpxchg(task->task_works) should fail.

However, task_work_cancel() can delete one of the next entries and
change, say, work->next. And we need to wait anyway if it scans this
list.
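The race Oleg describes can be sketched with C11 atomics. This is a simplified, hypothetical model, not the kernel's kernel/task_work.c: `work_add()` pushes with a compare-and-swap loop, `work_take_all()` detaches the whole list in one step (roughly what task_work_run() does with its cmpxchg loop), and `work_cancel_first()` only models cancelling the *first* entry. If the runner has already taken the list, the cancel's compare-and-swap sees a different head and fails, which is the "we won the race" case. Cancelling a later entry, which rewrites `work->next`, is deliberately not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's callback_head. */
struct work {
	struct work *next;
	void (*func)(struct work *);
};

/* Push w onto the list head with a compare-and-swap loop. */
static void work_add(_Atomic(struct work *) *head, struct work *w)
{
	struct work *old = atomic_load(head);

	do {
		w->next = old;	/* on CAS failure, old is refreshed */
	} while (!atomic_compare_exchange_weak(head, &old, w));
}

/* Runner: detach the whole list in one atomic step. */
static struct work *work_take_all(_Atomic(struct work *) *head)
{
	return atomic_exchange(head, (struct work *)NULL);
}

/*
 * Cancel w only while it is still the first entry: the CAS succeeds only
 * if *head == w. If the runner already detached the list, *head differs
 * and the cancel loses the race.
 */
static bool work_cancel_first(_Atomic(struct work *) *head, struct work *w)
{
	struct work *expected = w;

	return atomic_compare_exchange_strong(head, &expected, w->next);
}
```

Pushing `a` then `b` leaves `b` at the head with `b.next == &a`; after `work_take_all()`, `work_cancel_first(&head, &b)` returns false because the head is already NULL.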

I'll try to recheck, but so far I do not see anything wrong.

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Fengguang Wu
Hi Linus,

On Mon, Oct 07, 2013 at 11:47:39AM -0700, Linus Torvalds wrote:
> On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu  wrote:
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 0c44c2d0f459 ("x86: Use asm goto to implement better modify_and_test() functions"
> 
> Hmm. I'm looking at the final version of that patch, and I'm not
> seeing anything wrong. It may trigger a compiler bug - there aren't
> that many "asm goto" users, and using them for the bitops adds a lot
> of new cases.
> 
> Your oops makes very little sense, it looks like task_work_run() just
> called out to random crap, probably because the work was already
> released, so "work->func()" ends up being bad. I'm adding Oleg to the
> participants anyway, just in case there is some race. The comment says
> that it can race with task_work_cancel() playing with *work. Oleg,
> comments?
> 
> However, I don't see any actual bit-op code in task_work_run() itself,
> so it's something else that got miscompiled and corrupted memory. In
> that respect, the oops you have looks more like the oopses you got
> with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

The option was set:

DEBUG_KOBJECT_RELEASE=y
 
I tried disabling it, and found the error still remains:

[9.719060] Write protecting the kernel text: 6116k
[9.720356] Write protecting the kernel read-only data: 2616k
[9.721586] NX-protecting the kernel data: 6172k
[9.750420] BUG: unable to handle kernel NULL pointer dereference at   (null)
[9.750870] IP: [<  (null)>]   (null)
[9.750870] *pdpt = 072be001 *pde = 
[9.750870] Oops: 0010 [#1] DEBUG_PAGEALLOC
[9.750870] CPU: 0 PID: 84 Comm: rc.local Not tainted 3.12.0-rc1-00081-g6bfa687 #4
[9.750870] task: 872c4000 ti: 872c6000 task.ti: 872c6000
[9.750870] EIP: 0060:[<>] EFLAGS: 00010246 CPU: 0
[9.750870] EIP is at 0x0
[9.750870] EAX: 82076134 EBX: 872b2780 ECX:  EDX: 82076134
[9.750870] ESI: 872c4000 EDI: 872c4388 EBP: 872c7f9c ESP: 872c7f8c
[9.750870]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[9.750870] CR0: 8005003b CR2:  CR3: 072bd000 CR4: 06b0
[9.750870] Stack:
[9.750870]  810545b9 0001 789ecf58 7767dff4 872c7fac 81002358  78a03903
[9.750870]  872c6000 815f6bd0
[9.750870]   007b 007b   000b 777d81d0 0073
[9.750870] Call Trace:
[9.750870]  [<810545b9>] ? task_work_run+0x79/0xb0
[9.750870]  [<81002358>] do_notify_resume+0x58/0x70
[9.750870]  [<815f6bd0>] work_notifysig+0x2b/0x3b
[9.750870] Code:  Bad EIP value.
[9.750870] EIP: [<>] 0x0 SS:ESP 0068:872c7f8c
[9.750870] CR2: 
[9.769399] ---[ end trace da54692b95c91495 ]---
[9.777566] BUG: unable to handle kernel paging request at 05140060
[9.778845] IP: [<81054594>] task_work_run+0x54/0xb0
[9.779774] *pdpt =  *pde = f000ff53f000ff53
[9.780708] Oops:  [#2] DEBUG_PAGEALLOC
[9.781431] CPU: 0 PID: 85 Comm: hostname Tainted: G  D  3.12.0-rc1-00081-g6bfa687 #4
[9.781721] task: 872c8000 ti: 872ca000 task.ti: 872ca000
[9.781721] EIP: 0060:[<81054594>] EFLAGS: 00010206 CPU: 0
[9.781721] EIP is at task_work_run+0x54/0xb0
[9.781721] EAX: 05140060 EBX: 8729b900 ECX:  EDX: 05140060
[9.781721] ESI: 872c8000 EDI: 872c8388 EBP: 872cbf3c ESP: 872cbf30
[9.781721]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[9.781721] CR0: 8005003b CR2: 05140060 CR3: 072cc000 CR4: 06b0
[9.781721] Stack:
[9.781721]   872af400 872c8000 872cbf8c 8103a02a 0014 776cefb8 8105b49b
[9.781721]   872cbfac 0001 0015 61636f6c 736f686c 6f6c2e74 872af458
[9.781721]  69616d6f 872af46e 872af458   872ae980 872c8000 872cbfa4
[9.781721] Call Trace:
[9.781721]  [<8103a02a>] do_exit+0x2aa/0x920
[9.781721]  [<8105b49b>] ? up_write+0x1b/0x30
[9.781721]  [<8103a732>] do_group_exit+0x52/0xb0
[9.781721]  [<8103a7a8>] SyS_exit_group+0x18/0x20
[9.781721]  [<815f7130>] sysenter_do_call+0x12/0x3c
[9.781721] Code: 36 31 c9 89 d0 0f b1 0f 39 c2 75 eb 85 d2 74 67 8d b4 26 00 00 00 00 f3 90 8b 86 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 46 0c 04 74 c4 b9 04 d0
[9.781721] EIP: [<81054594>] task_work_run+0x54/0xb0 SS:ESP 0068:872cbf30
[9.781721] CR2: 05140060
[9.802246] ---[ end trace da54692b95c91496 ]---
[9.802881] Fixing recursive fault but reboot is needed!
[9.811986] BUG: unable to handle kernel paging request at 0805a000
[9.812911] IP: [<81054594>] task_work_run+0x54/0xb0
[9.813683] *pdpt = 072e2001 *pde = 072cf067 *pte = 
[9.815024] Oops:  [#3] DEBUG_PAGEALLOC
[9.815623] CPU: 0 PID: 86 Comm: plymouthd Tainted: G  D  

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
On 10/08, Linus Torvalds wrote:
>
> (not yet merged), see:
>
>
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

I do not really understand inline assembly constraints, but I'll ask
anyway.

+#define __GEN_RMWcc(fullop, var, cc, ...) \
+do { \
+ asm volatile goto (fullop "; j" cc " %l[cc_label]" \
+ : : "m" (var), ## __VA_ARGS__ \
      ^

don't we need

"+m" (var)

here?

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 11:51 AM, Oleg Nesterov <o...@redhat.com> wrote:
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> + : : "m" (var), ## __VA_ARGS__ \
>       ^
>
> don't we need
>
> "+m" (var)

We have a memory clobber instead. So the memory is marked as input and
clobbered.

And we'd love to mark it "+m", but asm goto cannot have outputs.

For the serializing ones, the memory clobber is ok - they have barrier
semantics anyway. But we'd actually *want* to use asm goto for some
cases where the memory clobber is too big of a hammer, so if we ever
get input/output constraints to asm goto we'll be happy.

Of course, right now it looks like we shouldn't be in a rush to use
asm goto at all...

   Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> >
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> + : : "m" (var), ## __VA_ARGS__ \
>       ^
>
> don't we need
>
> "+m" (var)
>
> here?

You actually can't have output operands with asm goto, only inputs
and clobbers.  But the memory clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to
a single object file (if I've skimmed the patch right, it is just an
optimization, where object files compiled without and with the patch
should actually coexist fine in the same kernel), ideally to a single
routine if possible and post a preprocessed source + gcc command line
+ version of gcc?

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:05 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

It is indeed just an optimization, and we could in theory switch
between the two versions on a case-by-case basis, but we don't have
any sane way to really do that.

I'll try to see if I can reproduce this on my hardware (just applying
that patch on top of my own tip) and see if I can try to narrow things
down. But I looked at the assembly for a couple of files, and it all
looked good, and I know this patch works fine for others (ie all the
normal -tip testing), so I suspect it's something specific to what
Fengguang does.

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:20 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I'll try to see if I can reproduce this on my hardware

Yeah, doesn't reproduce here..

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Oleg Nesterov
On 10/08, Jakub Jelinek wrote:
>
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> >
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> >
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > + : : "m" (var), ## __VA_ARGS__ \
> >       ^
> >
> > don't we need
> >
> > "+m" (var)
> >
> > here?
>
> You actually can't have output operands with asm goto, only inputs
> and clobbers.  But the memory clobber should be enough here.

Thanks Jakub and Linus.

Cough... sorry for off-topic question,

static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
{
	int oldbit;

	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
		     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

doesn't this mean that ADDR doesn't need "+" as well?


> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

Or at least, perhaps it makes sense to identify the include file which
makes the difference. Say, revert the changes in bitops.h, retest, then
in atomic.h if the kernel still fails, etc.

Oleg.



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 12:35 PM, Oleg Nesterov <o...@redhat.com> wrote:
>
> Cough... sorry for off-topic question,
>
> static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
> {
> 	int oldbit;
>
> 	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> 		     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> doesn't this mean that ADDR doesn't need "+" as well?

We use ADDR for some of the non-barrier ones too, that don't have the
barrier. See clear_bit() and friends..

> Or at least, perhaps it makes sense to identify the include file which
> makes the difference. Say, revert the changes in bitops.h, retest, then
> in atomic.h if the kernel still fails, etc.

Yeah, except Fengguang is the only one seeing this in his automated tests..

  Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Mike Galbraith
On Tue, 2013-10-08 at 21:05 +0200, Jakub Jelinek wrote:
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> > On 10/08, Linus Torvalds wrote:
> > >
> > > (not yet merged), see:
> > >
> > >
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> >
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> >
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > + : : "m" (var), ## __VA_ARGS__ \
> >       ^
> >
> > don't we need
> >
> > "+m" (var)
> >
> > here?
>
> You actually can't have output operands with asm goto, only inputs
> and clobbers.  But the memory clobber should be enough here.
>
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

gcc version 4.6.2 (SUSE Linux) won't produce output, but where it dies
might point in the general direction of newer gcc troubles? 

  CC [M]  net/sunrpc/xprtsock.o
net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

-Mike



Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Linus Torvalds
On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu  wrote:
>
> I got the below dmesg and the first bad commit is
>
> commit 0c44c2d0f459 ("x86: Use asm goto to implement better modify_and_test() functions"

Hmm. I'm looking at the final version of that patch, and I'm not
seeing anything wrong. It may trigger a compiler bug - there aren't
that many "asm goto" users, and using them for the bitops adds a lot
of new cases.

Your oops makes very little sense, it looks like task_work_run() just
called out to random crap, probably because the work was already
released, so "work->func()" ends up being bad. I'm adding Oleg to the
participants anyway, just in case there is some race. The comment says
that it can race with task_work_cancel() playing with *work. Oleg,
comments?

However, I don't see any actual bit-op code in task_work_run() itself,
so it's something else that got miscompiled and corrupted memory. In
that respect, the oops you have looks more like the oopses you got
with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

That said, Fengguang, can you try two things just to check:

 - add "cc" to the clobbers list for the asm goto (technically it
should be on the non-asm-goto as well, but we never had that, and
maybe the fact that gcc always ends up testing a register afterwards
hides the need for the clobber).

So it would look like this in arch/x86/include/asm/rmwcc.h

  #define __GEN_RMWcc(fullop, var, cc, ...) \
  do { \
  asm volatile goto (fullop "; j" cc " %l[cc_label]" \
  : : "m" (var), ## __VA_ARGS__ \
  : "memory", "cc" : cc_label); \
  return 0; \
  cc_label: \
  return 1; \

(where that "cc" thing is new). I'm not sure if "cc" really matters on
x86 at all (it didn't use to, long long ago), but maybe it does these
days..

If that makes no difference, please just verify that the non-asm-goto
version works fine, by changing the

  #ifdef CC_HAVE_ASM_GOTO

into a simple "#if 0" to disable the asm-goto version.

Linus


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Fengguang Wu
On Mon, Oct 07, 2013 at 11:08:56AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> > Wu, do you use the same compiler version for all the builds that crash
> > like this (I'm assuming the other email was this same commit)? Does a
> > different compiler make things work again?
> 
> OK, so in a further email you say you use gcc-4.8.1 (which is actually
> newer than the one I used for most of the work although I do have it and
> tried it iirc).

Right. And I've got the boot result for gcc-4.6.3: the problem is
still there. Here is the qemu cmdline and new dmesg.

=
cmd=(
	qemu-system-x86_64 -cpu kvm64 -enable-kvm
	-kernel $1
	-initrd /kernel-tests/initrd/quantal-core-i386.cgz -m 256M -smp 2
	-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
	-net user,vlan=0,hostfwd=tcp::10661-:22
	-net nic,vlan=1,model=e1000 -net user,vlan=1 -boot order=nc -no-reboot
	-watchdog i6300esb
	-serial file:/dev/shm/serial-0day -daemonize -display none -monitor null
)

"${cmd[@]}" -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw'
=

[0.00] Linux version 3.12.0-rc1-00081-g6bfa687 (wfg@bee) (gcc version 4.6.3 (Debian 4.6.3-1) ) #1 Mon Oct 7 19:18:22 CST 2013
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
[0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] Hypervisor detected: KVM
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
[0.00] Scanning 1 areas for low memory corruption
[0.00] initial memory mapped: [mem 0x-0x029f]
[0.00] Base memory trampoline at [8009b000] 9b000 size 16384
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] init_memory_mapping: [mem 0x0e40-0x0e5f]
[0.00]  [mem 0x0e40-0x0e5f] page 4k
[0.00] BRK [0x020a7000, 0x020a7fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0c00-0x0e3f]
[0.00]  [mem 0x0c00-0x0e3f] page 4k
[0.00] BRK [0x020a8000, 0x020a8fff] PGTABLE
[0.00] BRK [0x020a9000, 0x020a9fff] PGTABLE
[0.00] BRK [0x020aa000, 0x020aafff] PGTABLE
[0.00] BRK [0x020ab000, 0x020abfff] PGTABLE
[0.00] BRK [0x020ac000, 0x020acfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0010-0x0bff]
[0.00]  [mem 0x0010-0x0bff] page 4k
[0.00] init_memory_mapping: [mem 0x0e60-0x0fffdfff]
[0.00]  [mem 0x0e60-0x0fffdfff] page 4k
[0.00] log_buf_len: 8388608
[0.00] early log buf free: 128876(98%)
[0.00] RAMDISK: [mem 0x0e73f000-0x0ffe]
[0.00] ACPI: RSDP 000f16b0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 0fffe3f0 00034 (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 0f80 00074 (v01 BOCHS  BXPCFACP 0001 BXPC 0001)
[0.00] ACPI: DSDT 0fffe430 01137 (v01   BXPC   BXDSDT 0001 INTL 20100528)
[0.00] ACPI: FACS 0f40 00040
[0.00] ACPI: SSDT 06a0 00899 (v01 BOCHS  BXPCSSDT 0001 BXPC 0001)
[0.00] ACPI: APIC 05b0 00080 (v01 BOCHS  BXPCAPIC 0001 BXPC 0001)
[0.00] ACPI: HPET 0570 00038 (v01 BOCHS  BXPCHPET 0001 BXPC 0001)
[0.00] 255MB LOWMEM available.
[0.00]   mapped low ram: 0 - 0fffe000
[0.00]   low ram: 0 - 0fffe000
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:fffd001, boot clock
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x1000-0x0fffdfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009efff]
[0.00]   node   0: [mem 0x0010-0x0fffdfff]
[0.00] On node 0 totalpages: 65436
[0.00]   Normal zone: 576 pages used for memmap
[0.00]   Normal zone: 

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Fengguang Wu
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> On Sun, Oct 06, 2013 at 07:44:30AM +0800, Fengguang Wu wrote:
> > Greetings,
> > 
> > I got the below dmesg and the first bad commit is
> > 
> > commit 0c44c2d0f459cd7e275242b72f500137c4fa834d
> > Author: Peter Zijlstra 
> > Date:   Wed Sep 11 15:19:24 2013 +0200
> > 
> > x86: Use asm goto to implement better modify_and_test() functions
> > 
> > Linus suggested using asm goto to get rid of the typical SETcc + TEST
> > instruction pair -- which also clobbers an extra register -- for our
> > typical modify_and_test() functions.
> > 
> > Because asm goto doesn't allow output fields it has to include an
> > unconditional memory clobber when it changes a memory variable to force
> > a reload.
> > 
> > Luckily all atomic ops already imply a compiler barrier to go along
> > with their memory barrier semantics.
> > 
> > Suggested-by: Linus Torvalds 
> > Signed-off-by: Peter Zijlstra 
> > Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap142...@git.kernel.org
> > Signed-off-by: Ingo Molnar 
> 
> 
> Well that blows,.. Anybody got any clue as to where to start looking?
> I've not actually seen anything like this on my own machines.

Perhaps it's related to one of

- the randconfig
- kvm
- gcc

At the end of the dmesg file, there is the qemu command line used to run the kernel:

qemu-system-x86_64 -cpu kvm64 -enable-kvm \
  -kernel /kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab \
  -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw link=/kernel-tests/run-queue/kvm/i386-randconfig-j1-10052106/next:master/.vmlinuz-a0cf1abc25ac197dd97b857c0f6341066a8cb1cf-20131005211923-7-athens branch=next/master BOOT_IMAGE=/kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab' \
  -initrd /kernel-tests/initrd/quantal-core-i386.cgz -m 256M -smp 2 \
  -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio \
  -net user,vlan=0,hostfwd=tcp::10661-:22 \
  -net nic,vlan=1,model=e1000 -net user,vlan=1 \
  -boot order=nc -no-reboot -watchdog i6300esb \
  -drive file=/fs/LABEL=KVM/disk0-quantal-athens-6,media=disk,if=virtio \
  -drive file=/fs/LABEL=KVM/disk1-quantal-athens-6,media=disk,if=virtio \
  -drive file=/fs/LABEL=KVM/disk2-quantal-athens-6,media=disk,if=virtio \
  -drive file=/fs/LABEL=KVM/disk3-quantal-athens-6,media=disk,if=virtio \
  -drive file=/fs/LABEL=KVM/disk4-quantal-athens-6,media=disk,if=virtio \
  -drive file=/fs/LABEL=KVM/disk5-quantal-athens-6,media=disk,if=virtio \
  -pidfile /dev/shm/kboot/pid-quantal-athens-6 \
  -serial file:/dev/shm/kboot/serial-quantal-athens-6 \
  -daemonize -display none -monitor null

> Wu, do you use the same compiler version for all the builds that crash
> like this (I'm assuming the other email was this same commit)? Does a
> different compiler make things work again?

Good point. I'll try a different compiler.

Thanks,
Fengguang


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Peter Zijlstra
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> Wu, do you use the same compiler version for all the builds that crash
> like this (I'm assuming the other email was this same commit)? Does a
> different compiler make things work again?

OK, so in a further email you say you use gcc-4.8.1 (which is actually
newer than the one I used for most of the work although I do have it and
tried it iirc).




Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Peter Zijlstra
On Sun, Oct 06, 2013 at 07:44:30AM +0800, Fengguang Wu wrote:
> Greetings,
> 
> I got the below dmesg and the first bad commit is
> 
> commit 0c44c2d0f459cd7e275242b72f500137c4fa834d
> Author: Peter Zijlstra 
> Date:   Wed Sep 11 15:19:24 2013 +0200
> 
> x86: Use asm goto to implement better modify_and_test() functions
> 
> Linus suggested using asm goto to get rid of the typical SETcc + TEST
> instruction pair -- which also clobbers an extra register -- for our
> typical modify_and_test() functions.
> 
> Because asm goto doesn't allow output fields it has to include an
> unconditional memory clobber when it changes a memory variable to force
> a reload.
> 
> Luckily all atomic ops already imply a compiler barrier to go along
> with their memory barrier semantics.
> 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap142...@git.kernel.org
> Signed-off-by: Ingo Molnar 


Well that blows,.. Anybody got any clue as to where to start looking?
I've not actually seen anything like this on my own machines.

Wu, do you use the same compiler version for all the builds that crash
like this (I'm assuming the other email was this same commit)? Does a
different compiler make things work again?


> [3.336040] Write protecting the kernel read-only data: 2644k
> [3.336982] NX-protecting the kernel data: 6152k
> [3.375173] BUG: unable to handle kernel paging request at 00740060
> [3.376162] IP: [<81053fc4>] task_work_run+0x54/0xa0
> [3.376837] *pdpt = 072e1001 *pde =  
> [3.377579] Oops:  [#1] DEBUG_PAGEALLOC
> [3.378158] CPU: 0 PID: 85 Comm: hostname Not tainted 3.12.0-rc2-next-20130927-03100-ga0cf1ab #5
> [3.378206] task: 8730c000 ti: 8730e000 task.ti: 8730e000
> [3.378206] EIP: 0060:[<81053fc4>] EFLAGS: 00010206 CPU: 0
> [3.378206] EIP is at task_work_run+0x54/0xa0
> [3.378206] EAX: 00740060 EBX: 87309000 ECX:  EDX: 00740060
> [3.378206] ESI: 8730c388 EDI: 8730c000 EBP: 8730ff40 ESP: 8730ff34
> [3.378206]  DS: 007b ES: 007b FS:  GS:  SS: 0068
> [3.378206] CR0: 8005003b CR2: 00740060 CR3: 072d7000 CR4: 06b0
> [3.378206] Stack:
> [3.378206]   87308058 8730c000 8730ff8c 81039315 77675fb8 8105af7b
> [3.378206]  8730ffac 0001 6c0e41a5 61636f6c 736f686c 6f6c2e74 646c6163 8730c398
> [3.378206]  815fc8fe 81022f40   872f1880 8730c000 8730ffa4 81039a0a
> [3.378206] Call Trace:
> [3.378206]  [<81039315>] do_exit+0x2a5/0x910
> [3.378206]  [<8105af7b>] ? up_write+0x1b/0x30
> [3.378206]  [<815fc8fe>] ? restore_all+0xf/0xf
> [3.378206]  [<81022f40>] ? kvm_read_and_reset_pf_reason+0x40/0x40
> [3.378206]  [<81039a0a>] do_group_exit+0x4a/0xa0
> [3.378206]  [<81039a78>] SyS_exit_group+0x18/0x20
> [3.378206]  [<815fcf50>] sysenter_do_call+0x12/0x3c
> [3.378206] Code: 36 31 c9 89 d0 0f b1 0e 39 c2 75 eb 85 d2 74 5c 8d b4 26 00 00 00 00 f3 90 8b 87 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 47 0c 04 74 c4 b9 f0 af
> [3.378206] EIP: [<81053fc4>] task_work_run+0x54/0xa0 SS:ESP 0068:8730ff34
> [3.378206] CR2: 00740060
> [3.394549] ---[ end trace a6f697254c888db0 ]---
> 


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Fengguang Wu
On Mon, Oct 07, 2013 at 11:08:56AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> > Wu, do you use the same compiler version for all the builds that crash
> > like this (I'm assuming the other email was this same commit)? Does a
> > different compiler make things work again?
> 
> OK, so in a further email you say you use gcc-4.8.1 (which is actually
> newer than the one I used for most of the work although I do have it and
> tried it iirc).

Right. And I've got the boot result for gcc-4.6.3: the problem is
still there. Here is the qemu cmdline and new dmesg.

=
cmd=(
qemu-system-x86_64 -cpu kvm64 -enable-kvm
-kernel $1
  -initrd /kernel-tests/initrd/quantal-core-i386.cgz -m 256M -smp 2 -net 
nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio -net 
user,vlan=0,hostfwd=tcp::10661-:22
-net nic,vlan=1,model=e1000 -net user,vlan=1 -boot order=nc -no-reboot 
-watchdog i6300esb
-serial file:/dev/shm/serial-0day -daemonize -display none -monitor null
)

"${cmd[@]}" -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 
log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk 
sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 
console=tty0 vga=normal  root=/dev/ram0 rw'
=

[0.00] Linux version 3.12.0-rc1-00081-g6bfa687 (wfg@bee) (gcc version 4.6.3 (Debian 4.6.3-1) ) #1 Mon Oct 7 19:18:22 CST 2013
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
[0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] Hypervisor detected: KVM
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
[0.00] Scanning 1 areas for low memory corruption
[0.00] initial memory mapped: [mem 0x-0x029f]
[0.00] Base memory trampoline at [8009b000] 9b000 size 16384
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] init_memory_mapping: [mem 0x0e40-0x0e5f]
[0.00]  [mem 0x0e40-0x0e5f] page 4k
[0.00] BRK [0x020a7000, 0x020a7fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0c00-0x0e3f]
[0.00]  [mem 0x0c00-0x0e3f] page 4k
[0.00] BRK [0x020a8000, 0x020a8fff] PGTABLE
[0.00] BRK [0x020a9000, 0x020a9fff] PGTABLE
[0.00] BRK [0x020aa000, 0x020aafff] PGTABLE
[0.00] BRK [0x020ab000, 0x020abfff] PGTABLE
[0.00] BRK [0x020ac000, 0x020acfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0010-0x0bff]
[0.00]  [mem 0x0010-0x0bff] page 4k
[0.00] init_memory_mapping: [mem 0x0e60-0x0fffdfff]
[0.00]  [mem 0x0e60-0x0fffdfff] page 4k
[0.00] log_buf_len: 8388608
[0.00] early log buf free: 128876(98%)
[0.00] RAMDISK: [mem 0x0e73f000-0x0ffe]
[0.00] ACPI: RSDP 000f16b0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 0fffe3f0 00034 (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 0f80 00074 (v01 BOCHS  BXPCFACP 0001 BXPC 0001)
[0.00] ACPI: DSDT 0fffe430 01137 (v01   BXPC   BXDSDT 0001 INTL 20100528)
[0.00] ACPI: FACS 0f40 00040
[0.00] ACPI: SSDT 06a0 00899 (v01 BOCHS  BXPCSSDT 0001 BXPC 0001)
[0.00] ACPI: APIC 05b0 00080 (v01 BOCHS  BXPCAPIC 0001 BXPC 0001)
[0.00] ACPI: HPET 0570 00038 (v01 BOCHS  BXPCHPET 0001 BXPC 0001)
[0.00] 255MB LOWMEM available.
[0.00]   mapped low ram: 0 - 0fffe000
[0.00]   low ram: 0 - 0fffe000
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:fffd001, boot clock
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x1000-0x0fffdfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009efff]
[0.00]   node   0: [mem 0x0010-0x0fffdfff]
[0.00] On node 0 totalpages: 65436
[0.00]   Normal zone: 576 pages used for memmap
[0.00]   Normal zone: 0 pages 

Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-07 Thread Linus Torvalds
On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu <fengguang...@intel.com> wrote:

> I got the below dmesg and the first bad commit is
>
> commit 0c44c2d0f459 ("x86: Use asm goto to implement better modify_and_test()
> functions")

Hmm. I'm looking at the final version of that patch, and I'm not
seeing anything wrong. It may trigger a compiler bug - there aren't
that many asm goto users, and using them for the bitops adds a lot
of new cases.

Your oops makes very little sense, it looks like task_work_run() just
called out to random crap, probably because the work was already
released, so work->func() ends up being bad. I'm adding Oleg to the
participants anyway, just in case there is some race. The comment says
that it can race with task_work_cancel() playing with *work. Oleg,
comments?

However, I don't see any actual bit-op code in task_work_run() itself,
so it's something else that got miscompiled and corrupted memory. In
that respect, the oops you have looks more like the oopses you got
with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

That said, Fengguang, can you try two things just to check:

 - add "cc" to the clobbers list for the asm goto (technically it
should be on the non-asm-goto as well, but we never had that, and
maybe the fact that gcc always ends up testing a register afterwards
hides the need for the clobber).

So it would look like this in arch/x86/include/asm/rmwcc.h

  #define __GEN_RMWcc(fullop, var, cc, ...) \
  do { \
          asm volatile goto (fullop "; j" cc " %l[cc_label]" \
                  : : "m" (var), ## __VA_ARGS__ \
                  : "memory", "cc" : cc_label); \
          return 0; \
  cc_label: \
          return 1; \
  } while (0)

(where that "cc" thing is new). I'm not sure if "cc" really matters on
x86 at all (it didn't use to, long long ago), but maybe it does these
days..

If that makes no difference, please just verify that the non-asm-goto
version works fine, by changing the

  #ifdef CC_HAVE_ASM_GOTO

into a simple #if 0 to disable the asm-goto version.

Linus