Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote:
> @@ -8,6 +8,7 @@ foo (int a, int b)
>    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
>    return 0;
>  lab:
> +  asm ("");
>    return 0;
>  }

Or alternatively put the asm (""); right after the asm goto:

  asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
  asm ("");
  return ...;
lab:
  return ...;

Which of the two generates better code remains to be tested. In any case, please conditionalize the hacks on non-fixed compilers once the fix is released.

	Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
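The request to conditionalize the hack could look roughly like the sketch below. This is only an illustration under assumptions: ASM_GOTO_QUIRK, GCC_VERSION_NUM, ASSUMED_FIXED_GCC and bit1_was_set are made-up names, the 4.8.2 cut-off is the release hoped for in this thread rather than a confirmed one, "btsl" carries an explicit size suffix, and the non-x86 branch exists only so the file builds elsewhere.

```c
#include <assert.h>

/* clang also defines __GNUC__ (as 4.2.1), so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#define GCC_VERSION_NUM \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define GCC_VERSION_NUM 0
#endif

/* Assumption: first fixed release, as hoped for in the thread (4.8.2). */
#define ASSUMED_FIXED_GCC 40802

#if GCC_VERSION_NUM != 0 && GCC_VERSION_NUM < ASSUMED_FIXED_GCC
/* Empty asm right after the asm goto keeps the label out of the fallthru bb. */
#define ASM_GOTO_QUIRK() __asm__ ("")
#else
#define ASM_GOTO_QUIRK() do { } while (0)
#endif

/* Hypothetical helper: returns 1 if bit 1 of word was set, else 0. */
__attribute__((noinline)) static int bit1_was_set(unsigned int word)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ goto ("btsl $1, %0; jc %l[lab]"
                               : /* asm goto takes no outputs here */
                               : "m" (word)
                               : "memory", "cc"
                               : lab);
    ASM_GOTO_QUIRK();
    return 0;
lab:
    return 1;
#else
    return (int)((word >> 1) & 1u);
#endif
}

/* Small usage check for the helper above. */
static void bit1_selfcheck(void)
{
    assert(bit1_was_set(0u) == 0);
    assert(bit1_was_set(2u) == 1);
}
```

On a compiler at or above the assumed cut-off the macro expands to nothing, so the fast path pays no cost for the workaround.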
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote:
> > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > > >
> > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of
> > > > 4.[6-9] miscompile it. Will have a look tomorrow unless somebody
> > > > beats me to it. But historically, the case where asm goto labels
> > > > jump to fallthru basic block had numerous problems in the past.
> > >
> > > That bug lists the component as middle end; this suggests x86_64 would
> > > be vulnerable too, can you confirm? So far we've only observed the
> > > wrong code on i386 targets, x86_64 targets appeared correct.
> >
> > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
> > even say on ppc64 (sure, one would have to rewrite the asm to have it
> > fail at runtime).
>
> Please let us know once you know enough about the bug to suggest
> workarounds. Because it's a nice optimization even extra instruction(s)
> would be acceptable I suspect: we could perhaps put a NOP into a slowpath,
> with an (unused) goto to it, or something like that?

IMHO you don't need to put a nop there; I guess asm (""); would be enough. That will still make sure the label is never in the fallthru basic block, so the whole class of issues with asm goto labels in the fallthru bb can't hit. The disadvantage is that it will generate worse code.

@@ -8,6 +8,7 @@ foo (int a, int b)
   asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
   return 0;
 lab:
+  asm ("");
   return 0;
 }

on the testcase from the PR results in something like:

#APP
# 8 "pr58670-1.c" 1
	bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L5:
	xorl	%eax, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L3:
	xorl	%eax, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L4:
	movl	$-3, %eax
	ret

while code without the extra asm (""); and with a fixed compiler:

#APP
# 6 "pr58670.c" 1
	bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L3:
	xorl	%eax, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L4:
.L2:
	movl	$-3, %eax
	ret

FYI, the list of past compiler issues with asm goto includes: PR54127, PR46226, PR44071, PR52650, PR54455, PR51767.

I hope we get this fixed for 4.8.2, so you could then avoid these hacks for GCC 4.8.2 and later.

	Jakub
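For readers who want to try the label-side workaround outside the kernel, here is a minimal user-space sketch modelled on the diff above. It is not the PR testcase itself: the "if (a) return -3;" branch is only inferred from the "movl $-3, %eax" in the listings, "btsl" carries an explicit size suffix, foo_selfcheck is a made-up helper, and the non-x86 fallback is added purely so the file builds on other targets.

```c
#include <assert.h>

/*
 * Returns -3 when a is non-zero; otherwise 0, whether or not bit 1 of b
 * was already set (both asm goto outcomes lead to "return 0").
 */
__attribute__((noinline)) static int foo(int a, int b)
{
    if (a)
        return -3;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ goto ("btsl $1, %0; jc %l[lab]"
                               : /* asm goto takes no outputs here */
                               : "m" (b)
                               : "memory", "cc"
                               : lab);
    return 0;
lab:
    /* Workaround: the empty asm keeps lab out of the fallthru basic block. */
    __asm__ ("");
    return 0;
#else
    (void)b;
    return 0;
#endif
}

/* Exercises the not-taken jump, the taken jump, and the early return. */
static void foo_selfcheck(void)
{
    assert(foo(0, 0) == 0);   /* carry clear: falls through */
    assert(foo(0, 2) == 0);   /* carry set: jumps to lab */
    assert(foo(1, 0) == -3);  /* early return */
}
```

Per the analysis in this thread, the interesting case is foo(0, 2), where the jump to the label is taken and a miscompiled build may return something other than 0.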
Re: [x86] BUG: unable to handle kernel paging request at 00740060
* Jakub Jelinek wrote:

> On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > >
> > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of
> > > 4.[6-9] miscompile it. Will have a look tomorrow unless somebody
> > > beats me to it. But historically, the case where asm goto labels
> > > jump to fallthru basic block had numerous problems in the past.
> >
> > That bug lists the component as middle end; this suggests x86_64 would
> > be vulnerable too, can you confirm? So far we've only observed the
> > wrong code on i386 targets, x86_64 targets appeared correct.
>
> Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
> even say on ppc64 (sure, one would have to rewrite the asm to have it
> fail at runtime).

Please let us know once you know enough about the bug to suggest workarounds. Because it's a nice optimization even extra instruction(s) would be acceptable I suspect: we could perhaps put a NOP into a slowpath, with an (unused) goto to it, or something like that?

Thanks,

	Ingo
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, 2013-10-09 at 19:18 +0200, Ingo Molnar wrote:
> * Ingo Molnar wrote:
>
> > * Peter Zijlstra wrote:
> >
> > > On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > > > show the result of
> > > > > >
> > > > > > $ kernel/task_work.s
> > > >
> > > > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> > >
> > > > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > > > bts $1, 8(%eax); setc %dl #,, c
> > >
> > > That compiler doesn't appear to have asm goto support, so we fall back
> > > to the code we already knew worked :-)
> >
> > I'm using 4.7.2 with randconfig testing, which has asm goto support, and
> > I haven't seen this crash yet.
> >
> > Unless my testing is off it might be a bug in GCC 4.8, or a pre-existing
> > bug gets exposed by GCC 4.8.
>
> And as it happens, just a few hours later I hit a very similar crash, this
> time compiled with both 4.7.3 and 4.7.2! (config attached)
>
> This has a weird-x86-arch tuning knob as well:
>
> CONFIG_MGEODE_LX=y
>
> So I think we might need to turn off asm goto for all things 32-bit x86.

Hm, 32 bit x86... I built 4.8.1 yesterday, so can now build x86_64 tip, but I suspect I'll not be the only one with a compiler that goes belly up.

net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

gcc-4.6.2 (opensuse 12.1) has happily chewed up humongous piles of source, but finds this asm goto stuff to be toxic.

-Mike
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> > Seems all of 4.[6-9] miscompile it. Will have a look tomorrow
> > unless somebody beats me to it. But historically, the case where
> > asm goto labels jump to fallthru basic block had numerous problems in the
> > past.
>
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the wrong
> code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and even say on ppc64 (sure, one would have to rewrite the asm to have it fail at runtime).

	Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> Seems all of 4.[6-9] miscompile it. Will have a look tomorrow
> unless somebody beats me to it. But historically, the case where
> asm goto labels jump to fallthru basic block had numerous problems in the
> past.

That bug lists the component as middle end; this suggests x86_64 would be vulnerable too, can you confirm? So far we've only observed the wrong code on i386 targets, x86_64 targets appeared correct.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 9, 2013 at 11:16 AM, Jakub Jelinek wrote:
>
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> Seems all of 4.[6-9] miscompile it. Will have a look tomorrow
> unless somebody beats me to it. But historically, the case where
> asm goto labels jump to fallthru basic block had numerous problems in the
> past.

Ok, so it isn't even specific to x86-32, because your test-case shows the bug for me on 64-bit too. Apparently we just have a harder time hitting it in practice in the kernel on x86-64.

Too bad. It makes me nervous about all our _traditional_ uses of asm goto too, never mind the new ones..

	Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force a x86_64 build using the 'same' config it goes away and
> > > generates 'sensible' code again (although I don't see why L9 isn't
> > > merged with L2):
> >
> > i386-SMP also generates correct code afaict; a tad stupid but not wrong.
> >
> > If I remove ftrace from the .config it's still broken..
> > If I also remove the likely/unlikely tracer it's still broken and lots
> > smaller:
>
> OK, it's -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670

Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless somebody beats me to it. But historically, the case where asm goto labels jump to fallthru basic block had numerous problems in the past.

	Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> Once I force a x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again (although I don't see why L9 isn't
> merged with L2):

i386-SMP also generates correct code afaict; a tad stupid but not wrong.

If I remove ftrace from the .config it's still broken..

If I also remove the likely/unlikely tracer it's still broken and lots smaller:

	.p2align 4,,15
	.globl	task_work_add
	.type	task_work_add, @function
task_work_add:
	pushl	%ebp	#
	movl	%esp, %ebp	#,
	pushl	%edi	#
	pushl	%esi	#
	pushl	%ebx	#
	movl	%eax, %esi	# task, task
	.p2align 4,,15
.L4:
	movl	904(%esi), %ebx	# task_5(D)->task_works, __old
	cmpl	$work_exited, %ebx	#, __old
	je	.L5	#,
	movl	%ebx, (%edx)	# __old, work_10(D)->next
	movl	%ebx, %eax	# __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
	cmpxchgl %edx,904(%esi)	# work, *__ptr_12
# 0 "" 2
#NO_APP
	cmpl	%eax, %ebx	# __ret, __old
	jne	.L4	#,
	testb	%cl, %cl	# notify
	je	.L6	#,
	movl	4(%esi), %eax	# task_5(D)->stack, task_5(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
	bts $1, 8(%eax); jc .L2	#, MEM[(volatile long unsigned int *)_18],
# 0 "" 2
#NO_APP
.L6:
	xorl	%edi, %edi	# D.14172
.L2:
	movl	%edi, %eax	# D.14172,
	popl	%ebx	#
	popl	%esi	#
	popl	%edi	#
	popl	%ebp	#
	ret
.L5:
	movl	$-3, %edi	#, D.14172
	jmp	.L2	#
	.size	task_work_add, .-task_work_add

That "jc .L2" needs to be .L6 ! It looks like it fails to deal with the empty branch.

Why this thing needs to use EDI is anybody's guess I suppose. Would've made much more sense to have:

.L6:
	xorl	%eax, %eax
.L2:
	popl	%ebx
	popl	%esi
	popl	%ebp
	ret
.L5:
	movl	$-3, %eax
	jmp	.L2

At least it's not duplicating the popl+ret bits 3 times anymore.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
* Peter Zijlstra wrote:

> > This is what we are going to return. But note that -20(%ebp) was not
> > initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
> > above. IOW, in this case we seem to return a random value from stack.
>
> I think you're quite right, and I can confirm I can reproduce this with
> gcc-4.8.1 and Wu's .config:
>
> [...]
>
> Once I force a x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again [...]

So this at least opens up the possibility that we can create a not too painful quirk and only use the 'asm goto' optimization tricks on 64-bit kernels?

Thanks,

	Ingo
Re: [x86] BUG: unable to handle kernel paging request at 00740060
OK, thanks... I didn't notice Richard and Jakub were not cc'ed... Add them, perhaps they can take a look. On 10/09, Peter Zijlstra wrote: > > On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote: > > I'm afraid I am wrong, my asm skills are close to zero... but this > > code looks wrong to me, and this can explain the oopses. > > > > > task_work_add: > > > pushl %ebp# > > > movl%esp, %ebp #, > > > pushl %edi# > > > pushl %esi# > > > pushl %ebx# > > > subl$12, %esp #, > > > callmcount > > > movl%eax, %edi # task, task > > > movl%edx, -16(%ebp) # work, %sfp > > > movb%cl, -21(%ebp) # notify, %sfp > > > .p2align 4,,15 > > > .L3: > > > movl904(%edi), %esi # task_3(D)->task_works, head > > > cmpl$work_exited, %esi #, head > > > sete%bl #, D.14145 > > > andl$255, %ebx #, D.14145 > > > xorl%ecx, %ecx # > > > movl%ebx, %edx # D.14145, > > > movl$__f.14042, %eax#, > > > callftrace_likely_update# > > > testl %ebx, %ebx # D.14145 > > > jne .L4 #, > > > movl-16(%ebp), %edx # %sfp, > > > movl%esi, (%edx)# head, work_13(D)->next > > > movl%esi, %eax # head, __ret > > > #APP > > > # 34 "/c/wfg/tip/kernel/task_work.c" 1 > > > cmpxchgl %edx,904(%edi) #, *__ptr_16 > > > # 0 "" 2 > > > #NO_APP > > > cmpl%eax, %esi # __ret, head > > > jne .L3 #, > > > > OK, we added the new work successfully, we should return 0. If we return > > non-zero, fput() (the likely caller) assumes that it should use the > > workqueues > > to close/free this file. Then later task_work_run() will do __fput() again. > > > > > cmpb$0, -21(%ebp) #, %sfp > > > je .L5 #, > > > movl4(%edi), %eax # task_3(D)->stack, task_3(D)->stack > > > #APP > > > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1 > > > bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int > > > *)D.14203_29], > > > > This is set_notify_resume(). Probably !CONFIG_SMP (I do not see > > kick_process). 
> > > > > # 0 "" 2 > > > #NO_APP > > > .L5: > > > movl$0, -20(%ebp) #, %sfp > > > .L2: > > > movl-20(%ebp), %eax # %sfp, > > > > This is what we are going to return. But note that -20(%ebp) was not > > initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5 > > above. IOW, in this case we seem to return a random value from stack. > > I think you're quite right, and I can confirm I can reproduce this with > gcc-4.8.1 and Wu's .config: > > .p2align 4,,15 > .globl task_work_add > .type task_work_add, @function > task_work_add: > pushl %ebp# > movl%esp, %ebp #, > pushl %edi# > pushl %esi# > pushl %ebx# > subl$12, %esp #, > callmcount > movl%eax, %esi # task, task > movl%edx, %edi # work, work > movl%ecx, -24(%ebp) # notify, %sfp > jmp .L4 # > .p2align 4,,15 > .L9: > movl%ebx, (%edi)# __old, work_15(D)->next > movl%ebx, %eax # __old, __ret > #APP > # 34 "/usr/src/linux-2.6/kernel/task_work.c" 1 > cmpxchgl %edi,904(%esi) # work, *__ptr_17 > # 0 "" 2 > #NO_APP > cmpl%eax, %ebx # __ret, __old > je .L8 #, > .L4: > movl904(%esi), %ebx # task_7(D)->task_works, __old > cmpl$work_exited, %ebx #, __old > sete-13(%ebp) #, %sfp > xorl%edx, %edx # __r > movb-13(%ebp), %dl # %sfp, __r > xorl%ecx, %ecx # > movl$__f.14204, %eax#, > callftrace_likely_update# > cmpb$0, -13(%ebp) #, %sfp > je .L9 #, > movl$-3, -20(%ebp) #, %sfp > .L2: > movl-20(%ebp), %eax # %sfp, > addl$12, %esp #, > popl%ebx# > popl%esi# > popl%edi# > popl%ebp# > ret > .p2align 4,,15 > .L8: > cmpb$0, -24(%ebp) #, %sfp > je .L6 #, > movl4(%esi), %eax # task_7(D)->stack, task_7(D)->stack > #APP > # 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1 > bts $1, 8(%eax); jc .L2 #, MEM[(volatile long unsigned int *)_23], > # 0 "" 2 > #NO_APP > .L6: > movl$0, -20(%ebp) #, %sfp > movl-20(%ebp), %eax # %sfp, > addl$12, %esp #, > popl%ebx# > popl%esi# > popl%edi# > popl%ebp# > ret > .size task_work_add, .-task_work_add > > Once I force a x86_64 build using the 'same' config it goes away and > generates 
'sensible' code again (although I don't see why L9 isn't > merged with L2): > > .p2align 4,,15 >
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
> I'm afraid I am wrong, my asm skills are close to zero... but this
> code looks wrong to me, and this can explain the oopses.
>
> > task_work_add:
> > 	pushl	%ebp	#
> > 	movl	%esp, %ebp	#,
> > 	pushl	%edi	#
> > 	pushl	%esi	#
> > 	pushl	%ebx	#
> > 	subl	$12, %esp	#,
> > 	call	mcount
> > 	movl	%eax, %edi	# task, task
> > 	movl	%edx, -16(%ebp)	# work, %sfp
> > 	movb	%cl, -21(%ebp)	# notify, %sfp
> > 	.p2align 4,,15
> > .L3:
> > 	movl	904(%edi), %esi	# task_3(D)->task_works, head
> > 	cmpl	$work_exited, %esi	#, head
> > 	sete	%bl	#, D.14145
> > 	andl	$255, %ebx	#, D.14145
> > 	xorl	%ecx, %ecx	#
> > 	movl	%ebx, %edx	# D.14145,
> > 	movl	$__f.14042, %eax	#,
> > 	call	ftrace_likely_update	#
> > 	testl	%ebx, %ebx	# D.14145
> > 	jne	.L4	#,
> > 	movl	-16(%ebp), %edx	# %sfp,
> > 	movl	%esi, (%edx)	# head, work_13(D)->next
> > 	movl	%esi, %eax	# head, __ret
> > #APP
> > # 34 "/c/wfg/tip/kernel/task_work.c" 1
> > 	cmpxchgl %edx,904(%edi)	#, *__ptr_16
> > # 0 "" 2
> > #NO_APP
> > 	cmpl	%eax, %esi	# __ret, head
> > 	jne	.L3	#,
>
> OK, we added the new work successfully, we should return 0. If we return
> non-zero, fput() (the likely caller) assumes that it should use the workqueues
> to close/free this file. Then later task_work_run() will do __fput() again.
>
> > 	cmpb	$0, -21(%ebp)	#, %sfp
> > 	je	.L5	#,
> > 	movl	4(%edi), %eax	# task_3(D)->stack, task_3(D)->stack
> > #APP
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > 	bts $1, 8(%eax); jc .L2	#, MEM[(volatile long unsigned int *)D.14203_29],
>
> This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).
>
> > # 0 "" 2
> > #NO_APP
> > .L5:
> > 	movl	$0, -20(%ebp)	#, %sfp
> > .L2:
> > 	movl	-20(%ebp), %eax	# %sfp,
>
> This is what we are going to return. But note that -20(%ebp) was not
> initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
> above. IOW, in this case we seem to return a random value from stack.

I think you're quite right, and I can confirm I can reproduce this with gcc-4.8.1 and Wu's .config:

	.p2align 4,,15
	.globl	task_work_add
	.type	task_work_add, @function
task_work_add:
	pushl	%ebp	#
	movl	%esp, %ebp	#,
	pushl	%edi	#
	pushl	%esi	#
	pushl	%ebx	#
	subl	$12, %esp	#,
	call	mcount
	movl	%eax, %esi	# task, task
	movl	%edx, %edi	# work, work
	movl	%ecx, -24(%ebp)	# notify, %sfp
	jmp	.L4	#
	.p2align 4,,15
.L9:
	movl	%ebx, (%edi)	# __old, work_15(D)->next
	movl	%ebx, %eax	# __old, __ret
#APP
# 34 "/usr/src/linux-2.6/kernel/task_work.c" 1
	cmpxchgl %edi,904(%esi)	# work, *__ptr_17
# 0 "" 2
#NO_APP
	cmpl	%eax, %ebx	# __ret, __old
	je	.L8	#,
.L4:
	movl	904(%esi), %ebx	# task_7(D)->task_works, __old
	cmpl	$work_exited, %ebx	#, __old
	sete	-13(%ebp)	#, %sfp
	xorl	%edx, %edx	# __r
	movb	-13(%ebp), %dl	# %sfp, __r
	xorl	%ecx, %ecx	#
	movl	$__f.14204, %eax	#,
	call	ftrace_likely_update	#
	cmpb	$0, -13(%ebp)	#, %sfp
	je	.L9	#,
	movl	$-3, -20(%ebp)	#, %sfp
.L2:
	movl	-20(%ebp), %eax	# %sfp,
	addl	$12, %esp	#,
	popl	%ebx	#
	popl	%esi	#
	popl	%edi	#
	popl	%ebp	#
	ret
	.p2align 4,,15
.L8:
	cmpb	$0, -24(%ebp)	#, %sfp
	je	.L6	#,
	movl	4(%esi), %eax	# task_7(D)->stack, task_7(D)->stack
#APP
# 208 "/usr/src/linux-2.6/arch/x86/include/asm/bitops.h" 1
	bts $1, 8(%eax); jc .L2	#, MEM[(volatile long unsigned int *)_23],
# 0 "" 2
#NO_APP
.L6:
	movl	$0, -20(%ebp)	#, %sfp
	movl	-20(%ebp), %eax	# %sfp,
	addl	$12, %esp	#,
	popl	%ebx	#
	popl	%esi	#
	popl	%edi	#
	popl	%ebp	#
	ret
	.size	task_work_add, .-task_work_add

Once I force a x86_64 build using the 'same' config it goes away and generates 'sensible' code again (although I don't see why L9 isn't merged with L2):

	.p2align 4,,15
	.globl	task_work_add
	.type	task_work_add, @function
task_work_add:
	call	__fentry__
	pushq	%rbp	#
	movq	%rsp, %rbp	#,
	pushq	%r15	#
	pushq	%r14	#
	movl	%edx, %r14d	# notify, notify
	pushq	%r13	#
	movq	%rsi, %r13	# work, work
	pushq
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 02:27:05PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > >
> > > > $ kernel/task_work.s
> >
> > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
>
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > bts $1, 8(%eax); setc %dl #,, c
>
> That compiler doesn't appear to have asm goto support, so we fall back
> to the code we already knew worked :-)

Ah OK.. btw, here is a simple script I used to reproduce the problem. I'll attach the 3MB yocto initrd in another email. However, I suspect any initrd would be OK.

Thanks,
Fengguang

kvm-0day.sh
Description: Bourne shell script
Re: [x86] BUG: unable to handle kernel paging request at 00740060
* Peter Zijlstra wrote:

> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > >
> > > > $ kernel/task_work.s
> >
> > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
>
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > bts $1, 8(%eax); setc %dl #,, c
>
> That compiler doesn't appear to have asm goto support, so we fall back
> to the code we already knew worked :-)

I'm using 4.7.2 with randconfig testing, which has asm goto support, and I haven't seen this crash yet.

Unless my testing is off it might be a bug in GCC 4.8, or a pre-existing bug gets exposed by GCC 4.8.

Thanks,

	Ingo
Re: [x86] BUG: unable to handle kernel paging request at 00740060
Hi Fengguang,

On 10/09, Fengguang Wu wrote:
>
> Thanks for looking into this. Attached is the task_work.s for you.

Thanks a lot!

I'm afraid I am wrong, my asm skills are close to zero... but this code looks wrong to me, and this can explain the oopses.

> task_work_add:
> 	pushl	%ebp	#
> 	movl	%esp, %ebp	#,
> 	pushl	%edi	#
> 	pushl	%esi	#
> 	pushl	%ebx	#
> 	subl	$12, %esp	#,
> 	call	mcount
> 	movl	%eax, %edi	# task, task
> 	movl	%edx, -16(%ebp)	# work, %sfp
> 	movb	%cl, -21(%ebp)	# notify, %sfp
> 	.p2align 4,,15
> .L3:
> 	movl	904(%edi), %esi	# task_3(D)->task_works, head
> 	cmpl	$work_exited, %esi	#, head
> 	sete	%bl	#, D.14145
> 	andl	$255, %ebx	#, D.14145
> 	xorl	%ecx, %ecx	#
> 	movl	%ebx, %edx	# D.14145,
> 	movl	$__f.14042, %eax	#,
> 	call	ftrace_likely_update	#
> 	testl	%ebx, %ebx	# D.14145
> 	jne	.L4	#,
> 	movl	-16(%ebp), %edx	# %sfp,
> 	movl	%esi, (%edx)	# head, work_13(D)->next
> 	movl	%esi, %eax	# head, __ret
> #APP
> # 34 "/c/wfg/tip/kernel/task_work.c" 1
> 	cmpxchgl %edx,904(%edi)	#, *__ptr_16
> # 0 "" 2
> #NO_APP
> 	cmpl	%eax, %esi	# __ret, head
> 	jne	.L3	#,

OK, we added the new work successfully, we should return 0. If we return non-zero, fput() (the likely caller) assumes that it should use the workqueues to close/free this file. Then later task_work_run() will do __fput() again.

> 	cmpb	$0, -21(%ebp)	#, %sfp
> 	je	.L5	#,
> 	movl	4(%edi), %eax	# task_3(D)->stack, task_3(D)->stack
> #APP
> # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> 	bts $1, 8(%eax); jc .L2	#, MEM[(volatile long unsigned int *)D.14203_29],

This is set_notify_resume(). Probably !CONFIG_SMP (I do not see kick_process).

> # 0 "" 2
> #NO_APP
> .L5:
> 	movl	$0, -20(%ebp)	#, %sfp
> .L2:
> 	movl	-20(%ebp), %eax	# %sfp,

This is what we are going to return. But note that -20(%ebp) was not initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5 above. IOW, in this case we seem to return a random value from stack.

Oleg.
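To make the intended behaviour easier to follow than the listing, here is a rough user-space model of the control flow that the assembly annotations suggest. It is a reconstruction for illustration, not the kernel source: struct task_model, struct work_item, task_work_add_model and the self-check are hypothetical names, C11 atomics stand in for cmpxchg and bts, and -3 is taken directly from the "movl $-3" in the listing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct work_item {
    struct work_item *next;
};

struct task_model {
    _Atomic(struct work_item *) task_works; /* head of the pending-work list */
    atomic_uint flags;                      /* bit 1 models TIF_NOTIFY_RESUME */
};

/* Sentinel meaning "task has exited", compared by address as in the listing. */
static struct work_item work_exited;

static int task_work_add_model(struct task_model *task,
                               struct work_item *work, bool notify)
{
    struct work_item *head;

    do {
        head = atomic_load(&task->task_works);
        if (head == &work_exited)
            return -3;              /* the "movl $-3" path */
        work->next = head;
    } while (!atomic_compare_exchange_weak(&task->task_works, &head, work));

    if (notify)
        atomic_fetch_or(&task->flags, 1u << 1); /* the "bts $1" step */

    return 0; /* must be 0 whether or not the bit was already set */
}

/* Covers: bit newly set, bit already set, and the exited-task case. */
static void task_work_add_selfcheck(void)
{
    struct task_model t;
    struct work_item a, b;

    atomic_init(&t.task_works, NULL);
    atomic_init(&t.flags, 0u);

    assert(task_work_add_model(&t, &a, true) == 0);
    assert(task_work_add_model(&t, &b, true) == 0);
    assert(b.next == &a);

    atomic_store(&t.task_works, &work_exited);
    assert(task_work_add_model(&t, &a, false) == -3);
}
```

The second call in the self-check is the case discussed above: the notify bit is already set, and the function still has to return 0.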
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > Fengguang, I do not think this will help, but just in case. Could you
> > > show the result of
> > >
> > > $ kernel/task_work.s
>
> Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!

> # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> bts $1, 8(%eax); setc %dl #,, c

That compiler doesn't appear to have asm goto support, so we fall back to the code we already knew worked :-)
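The two code shapes seen in the listings ("bts; setc" from the older compiler, "bts; jc" with asm goto) can be put side by side in a small user-space sketch. This is an illustration only, not the kernel's bitops.h: tas_bit1_setc, tas_bit1_goto and the self-check are made-up names, "btsl" carries an explicit size suffix, and the non-x86 branch is a plain C stand-in so the file builds elsewhere.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
/* Fallback shape: set bit 1, then materialise the old bit with setc. */
static int tas_bit1_setc(unsigned int *word)
{
    unsigned char carry;

    __asm__ __volatile__ ("btsl $1, %0; setc %1"
                          : "+m" (*word), "=q" (carry)
                          :
                          : "cc");
    return carry;
}

/* asm goto shape: set bit 1, then branch on the carry flag directly. */
static int tas_bit1_goto(unsigned int *word)
{
    __asm__ __volatile__ goto ("btsl $1, %0; jc %l[was_set]"
                               : /* asm goto takes no outputs here */
                               : "m" (*word)
                               : "memory", "cc"
                               : was_set);
    return 0;
was_set:
    return 1;
}
#else
/* Portable stand-ins (not atomic) so the sketch builds on other targets. */
static int tas_bit1_setc(unsigned int *word)
{
    int old = (int)((*word >> 1) & 1u);

    *word |= 2u;
    return old;
}

static int tas_bit1_goto(unsigned int *word)
{
    return tas_bit1_setc(word);
}
#endif

/* Both variants should report 0 the first time and 1 afterwards. */
static void tas_selfcheck(void)
{
    unsigned int w = 0, v = 0;

    assert(tas_bit1_setc(&w) == 0);
    assert(tas_bit1_setc(&w) == 1);
    assert(tas_bit1_goto(&v) == 0);
    assert(tas_bit1_goto(&v) == 1);
}
```

The asm goto variant saves the setc and the later test of the result, which is the optimization the thread wants to keep; the setc variant is what compilers without asm goto support end up with.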
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > Fengguang, I do not think this will help, but just in case. Could you
> > > show the result of
> > >
> > > $ kernel/task_work.s
> > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
> > Attached is the new kernel/task_work.s.

Here is the diff:

gcc 4.6.3 vs 4.4.7
==

--- task_work.s	2013-10-09 20:19:48.312272579 +0800
+++ /tmp/task_work.s	2013-10-09 20:18:14.0 +0800
@@ -1,136 +1,150 @@
 	.file	"task_work.c"
-# GNU C (Debian 4.6.3-1) version 4.6.3 (x86_64-linux-gnu)
-# compiled by GNU C version 4.6.3, GMP version 5.0.4, MPFR version 3.1.0-p3, MPC version 0.9
-# warning: GMP header version 5.0.4 differs from library version 5.0.2.
-# warning: MPFR header version 3.1.0-p3 differs from library version 3.1.1-p2.
+# GNU C (Debian 4.4.7-4) version 4.4.7 (x86_64-linux-gnu)
+# compiled by GNU C version 4.4.7, GMP version 5.1.1, MPFR version 3.1.1-p2.
+# warning: GMP header version 5.1.1 differs from library version 5.0.2.
 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
-# options passed: -nostdinc -I /c/wfg/tip/arch/x86/include
-# -I arch/x86/include/generated -I /c/wfg/tip/include -I include
-# -I /c/wfg/tip/arch/x86/include/uapi -I arch/x86/include/generated/uapi
-# -I /c/wfg/tip/include/uapi -I include/generated/uapi -I /c/wfg/tip/kernel
-# -I kernel -imultilib 32 -imultiarch i386-linux-gnu -D __KERNEL__
-# -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1
-# -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1
-# -D CC_HAVE_ASM_GOTO -D KBUILD_STR(s)=#s
-# -D KBUILD_BASENAME=KBUILD_STR(task_work)
-# -D KBUILD_MODNAME=KBUILD_STR(task_work)
-# -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include
-# -include /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
+# options passed: -nostdinc -I/c/wfg/tip/arch/x86/include
+# -Iarch/x86/include/generated -I/c/wfg/tip/include -Iinclude
+# -I/c/wfg/tip/arch/x86/include/uapi -Iarch/x86/include/generated/uapi
+# -I/c/wfg/tip/include/uapi -Iinclude/generated/uapi -I/c/wfg/tip/kernel
+# -Ikernel -imultilib 32 -imultiarch i386-linux-gnu -D__KERNEL__
+# -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
+# -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1
+# -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(task_work)
+# -DKBUILD_MODNAME=KBUILD_STR(task_work) -isystem
+# /usr/lib/gcc/x86_64-linux-gnu/4.4.7/include -include
+# /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
 # /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
 # -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
-# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
-# -auxbase-strip kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes
-# -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security
-# -Wno-sign-compare -Wframe-larger-than=1024 -Wno-unused-but-set-variable
-# -Wdeclaration-after-statement -Wno-pointer-sign -p -fno-strict-aliasing
-# -fno-common -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
+# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -auxbase-strip
+# kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
+# -Werror-implicit-function-declaration -Wno-format-security
+# -Wno-sign-compare -Wframe-larger-than=1024 -Wdeclaration-after-statement
+# -Wno-pointer-sign -p -fno-strict-aliasing -fno-common
+# -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
 # -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
 # -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
 # -fconserve-stack -fverbose-asm
-# options enabled: -fauto-inc-dec -fbranch-count-reg -fcaller-saves
-# -fcombine-stack-adjustments -fcompare-elim -fcprop-registers
-# -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize
-# -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types
-# -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse
-# -fgcse-lm -fguess-branch-probability -fident -fif-conversion
-# -fif-conversion2 -findirect-inlining -finline
-# -finline-functions-called-once -finline-small-functions -fipa-cp
-# -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
+# options enabled: -falign-loops -fargument-alias -fauto-inc-dec
+# -fbranch-count-reg -fcaller-saves -fcprop-registers -fcrossjumping
+# -fcse-follow-jumps -fdefer-pop -fdwarf2-cfi-asm -fearly-inlining
+# -feliminate-unused-debug-types -fexpensive-optimizations
+# -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
+# -fguess-branch-probability -fident -fif-conversion -fif-conversion2
+# -findirect-inlining -finline -finline-functions-called-once
+# -finline-small-functions -fipa-cp -fipa-pure-const -fipa-reference
 # -fira-share-save-slots -fira-share-spill-slots -fivopts
 # -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants
Re: [x86] BUG: unable to handle kernel paging request at 00740060
> > Fengguang, I do not think this will help, but just in case. Could you
> > show the result of
> >
> > $ kernel/task_work.s

Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
Attached is the new kernel/task_work.s.

Thanks,
Fengguang

	.file	"task_work.c"
# GNU C (Debian 4.4.7-4) version 4.4.7 (x86_64-linux-gnu)
# compiled by GNU C version 4.4.7, GMP version 5.1.1, MPFR version 3.1.1-p2.
# warning: GMP header version 5.1.1 differs from library version 5.0.2.
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -nostdinc -I/c/wfg/tip/arch/x86/include
# -Iarch/x86/include/generated -I/c/wfg/tip/include -Iinclude
# -I/c/wfg/tip/arch/x86/include/uapi -Iarch/x86/include/generated/uapi
# -I/c/wfg/tip/include/uapi -Iinclude/generated/uapi -I/c/wfg/tip/kernel
# -Ikernel -imultilib 32 -imultiarch i386-linux-gnu -D__KERNEL__
# -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
# -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1
# -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(task_work)
# -DKBUILD_MODNAME=KBUILD_STR(task_work) -isystem
# /usr/lib/gcc/x86_64-linux-gnu/4.4.7/include -include
# /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
# /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
# -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -auxbase-strip
# kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
# -Werror-implicit-function-declaration -Wno-format-security
# -Wno-sign-compare -Wframe-larger-than=1024 -Wdeclaration-after-statement
# -Wno-pointer-sign -p -fno-strict-aliasing -fno-common
# -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
# -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
# -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
# -fconserve-stack -fverbose-asm
# options enabled: -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcaller-saves -fcprop-registers -fcrossjumping
# -fcse-follow-jumps -fdefer-pop -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -fexpensive-optimizations
# -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
# -fguess-branch-probability -fident -fif-conversion -fif-conversion2
# -findirect-inlining -finline -finline-functions-called-once
# -finline-small-functions -fipa-cp -fipa-pure-const -fipa-reference
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
# -fmerge-debug-strings -fmove-loop-invariants -foptimize-register-move
# -fpeephole -fpeephole2 -fprofile -freg-struct-return -fregmove
# -freorder-blocks -freorder-functions -frerun-cse-after-loop
# -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros
# -fsplit-ivs-in-unroller -fsplit-wide-types -fthread-jumps
# -ftoplevel-reorder -ftrapping-math -ftree-builtin-call-dce -ftree-ccp
# -ftree-ch -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
# -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-loop-im
# -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
# -ftree-pre -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-sra
# -ftree-switch-conversion -ftree-ter -ftree-vect-loop-version -ftree-vrp
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mfused-madd -mglibc
# -mieee-fp -mno-fancy-math-387 -mno-red-zone -mno-sse4 -mpush-args -msahf
# -mtls-direct-seg-refs

# Compiler executable checksum: f7c11247ad5a53a602823d9bd673a474

	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"/c/wfg/tip/kernel/task_work.c"
	.text
	.p2align 4,,15
.globl task_work_run
	.type	task_work_run, @function
task_work_run:
	pushl	%ebp	#
	movl	%esp, %ebp	#,
	pushl	%edi	#
	pushl	%esi	#
	pushl	%ebx	#
	call	mcount
#APP
# 14 "/c/wfg/tip/arch/x86/include/asm/current.h" 1
	movl current_task,%edi	#, task
# 0 "" 2
#NO_APP
	leal	904(%edi), %ebx	#, D.18648
	.p2align 4,,15
.L15:
	movl	(%ebx), %edx	#* D.18648, work
	testl	%edx, %edx	# work
	je	.L17	#,
.L2:
	xorl	%ecx, %ecx	# head.458
.L3:
	movl	%edx, %eax	# work, __ret
#APP
# 99 "/c/wfg/tip/kernel/task_work.c" 1
	cmpxchgl %ecx,(%ebx)	# head.458,* D.18648
# 0 "" 2
#NO_APP
	cmpl	%eax, %edx	# __ret, work
	jne	.L15	#,
	testl	%edx, %edx	# work
	je	.L10	#,
	.p2align 4,,15
.L12:
#APP
# 656 "/c/wfg/tip/arch/x86/include/asm/processor.h" 1
	rep; nop
# 0 "" 2
#NO_APP
	movl	960(%edi), %eax	# <variable>.pi_lock.raw_lock.slock, D.18658
	testl	%eax, %eax	# D.18658
	je	.L12	#,
Re: [x86] BUG: unable to handle kernel paging request at 00740060
Hi Oleg,

Thanks for looking into this. Attached is the task_work.s for you.

> Fengguang, I do not think this will help, but just in case. Could you
> show the result of
>
> $ kernel/task_work.s
>
> ?

Sorry I lost some emails and found it back in LKML. Opened up too many
mutt clients..

Thanks,
Fengguang

	.file	"task_work.c"
# GNU C (Debian 4.6.3-1) version 4.6.3 (x86_64-linux-gnu)
# compiled by GNU C version 4.6.3, GMP version 5.0.4, MPFR version 3.1.0-p3, MPC version 0.9
# warning: GMP header version 5.0.4 differs from library version 5.0.2.
# warning: MPFR header version 3.1.0-p3 differs from library version 3.1.1-p2.
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -nostdinc -I /c/wfg/tip/arch/x86/include
# -I arch/x86/include/generated -I /c/wfg/tip/include -I include
# -I /c/wfg/tip/arch/x86/include/uapi -I arch/x86/include/generated/uapi
# -I /c/wfg/tip/include/uapi -I include/generated/uapi -I /c/wfg/tip/kernel
# -I kernel -imultilib 32 -imultiarch i386-linux-gnu -D __KERNEL__
# -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1
# -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1
# -D CC_HAVE_ASM_GOTO -D KBUILD_STR(s)=#s
# -D KBUILD_BASENAME=KBUILD_STR(task_work)
# -D KBUILD_MODNAME=KBUILD_STR(task_work)
# -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include
# -include /c/wfg/tip/include/linux/kconfig.h -MD kernel/.task_work.s.d
# /c/wfg/tip/kernel/task_work.c -m32 -msoft-float -mregparm=3
# -mpreferred-stack-boundary=2 -march=winchip2 -maccumulate-outgoing-args
# -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
# -auxbase-strip kernel/task_work.s -O2 -Wall -Wundef -Wstrict-prototypes
# -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security
# -Wno-sign-compare -Wframe-larger-than=1024 -Wno-unused-but-set-variable
# -Wdeclaration-after-statement -Wno-pointer-sign -p -fno-strict-aliasing
# -fno-common -fno-delete-null-pointer-checks -freg-struct-return -fno-pic
# -ffreestanding -fno-asynchronous-unwind-tables -fno-stack-protector
# -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow
# -fconserve-stack -fverbose-asm
# options enabled: -fauto-inc-dec -fbranch-count-reg -fcaller-saves
# -fcombine-stack-adjustments -fcompare-elim -fcprop-registers
# -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize
# -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types
# -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse
# -fgcse-lm -fguess-branch-probability -fident -fif-conversion
# -fif-conversion2 -findirect-inlining -finline
# -finline-functions-called-once -finline-small-functions -fipa-cp
# -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
# -fmerge-debug-strings -fmove-loop-invariants -foptimize-register-move
# -fpartial-inlining -fpeephole -fpeephole2 -fprefetch-loop-arrays
# -fprofile -freg-struct-return -fregmove -freorder-blocks
# -freorder-functions -frerun-cse-after-loop
# -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-volatile-bitfields
# -fthread-jumps -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
# -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
# -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
# -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pre -ftree-pta
# -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize
# -ftree-sra -ftree-switch-conversion -ftree-ter -ftree-vect-loop-version
# -ftree-vrp -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mglibc -mieee-fp
# -mno-fancy-math-387 -mno-red-zone -mno-sse4 -mpush-args -msahf
# -mtls-direct-seg-refs

# Compiler executable checksum: aa5cb4c8e9c62c6cc9349213df314c34

	.text
	.p2align 4,,15
	.globl	task_work_add
	.type	task_work_add, @function
task_work_add:
	pushl	%ebp	#
	movl	%esp, %ebp	#,
	pushl	%edi	#
	pushl	%esi	#
	pushl	%ebx	#
	subl	$12, %esp	#,
	call	mcount
	movl	%eax, %edi	# task, task
	movl	%edx, -16(%ebp)	# work, %sfp
	movb	%cl, -21(%ebp)	# notify, %sfp
	.p2align 4,,15
.L3:
	movl	904(%edi), %esi	# task_3(D)->task_works, head
	cmpl	$work_exited, %esi	#, head
	sete	%bl	#, D.14145
	andl	$255, %ebx	#, D.14145
Re: [x86] BUG: unable to handle kernel paging request at 00740060
Hi Fengguang,

On 10/09, Fengguang Wu wrote:
>
> Thanks for looking into this. Attached is the task_work.s for you.

Thanks a lot!

I'm afraid I am wrong, my asm skills are close to zero... but this code
looks wrong to me, and this can explain the oopses.

	task_work_add:
		pushl	%ebp	#
		movl	%esp, %ebp	#,
		pushl	%edi	#
		pushl	%esi	#
		pushl	%ebx	#
		subl	$12, %esp	#,
		call	mcount
		movl	%eax, %edi	# task, task
		movl	%edx, -16(%ebp)	# work, %sfp
		movb	%cl, -21(%ebp)	# notify, %sfp
		.p2align 4,,15
	.L3:
		movl	904(%edi), %esi	# task_3(D)->task_works, head
		cmpl	$work_exited, %esi	#, head
		sete	%bl	#, D.14145
		andl	$255, %ebx	#, D.14145
		xorl	%ecx, %ecx	#
		movl	%ebx, %edx	# D.14145,
		movl	$__f.14042, %eax	#,
		call	ftrace_likely_update	#
		testl	%ebx, %ebx	# D.14145
		jne	.L4	#,
		movl	-16(%ebp), %edx	# %sfp,
		movl	%esi, (%edx)	# head, work_13(D)->next
		movl	%esi, %eax	# head, __ret
	#APP
	# 34 "/c/wfg/tip/kernel/task_work.c" 1
		cmpxchgl %edx,904(%edi)	#, *__ptr_16
	# 0 "" 2
	#NO_APP
		cmpl	%eax, %esi	# __ret, head
		jne	.L3	#,

OK, we added the new work successfully, we should return 0. If we return
non-zero, fput() (the likely caller) assumes that it should use the
workqueues to close/free this file. Then later task_work_run() will do
__fput() again.

		cmpb	$0, -21(%ebp)	#, %sfp
		je	.L5	#,
		movl	4(%edi), %eax	# task_3(D)->stack, task_3(D)->stack
	#APP
	# 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
		bts $1, 8(%eax); jc .L2	#, MEM[(volatile long unsigned int *)D.14203_29],

This is set_notify_resume(). Probably !CONFIG_SMP (I do not see
kick_process).

	# 0 "" 2
	#NO_APP
	.L5:
		movl	$0, -20(%ebp)	#, %sfp
	.L2:
		movl	-20(%ebp), %eax	# %sfp,

This is what we are going to return. But note that -20(%ebp) was not
initialized if TIF_NOTIFY_RESUME was already set, "jc .L2" skips .L5
above.

IOW, in this case we seem to return a random value from stack.

Oleg.
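As a reading aid for the analysis above, here is a user-space C model of what the listing is supposed to compute. It is a sketch, not the kernel source: the struct layouts, the names task_work_add_model and test_and_set_flag, the NOTIFY_RESUME_BIT constant, and the use of the GCC/Clang __sync builtins are all assumptions made for illustration, and -ESRCH as the error for the work_exited case is likewise assumed. The point is the tail of the function: whether or not the notify bit was already set, the result must be 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types involved. */
struct work {
	struct work *next;
};

struct task {
	struct work *task_works;	/* head of the pending work list */
	unsigned long flags;		/* stand-in for the thread_info flags */
};

#define NOTIFY_RESUME_BIT 1		/* matches the "bts $1" in the listing */

static struct work work_exited;	/* sentinel: task no longer accepts work */

/* Atomically set a flag bit and report whether it was already set. */
static int test_and_set_flag(unsigned long *flags, int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = __sync_fetch_and_or(flags, mask);

	return (old & mask) != 0;
}

static int task_work_add_model(struct task *task, struct work *work, int notify)
{
	struct work *head;

	/* Push work onto the list with a compare-and-swap retry loop. */
	do {
		head = task->task_works;
		if (head == &work_exited)
			return -ESRCH;	/* assumed error code */
		work->next = head;
	} while (__sync_val_compare_and_swap(&task->task_works, head, work) != head);

	/*
	 * The result of the test-and-set is deliberately ignored: both the
	 * "bit was clear" and the "bit was already set" paths must end up
	 * at the same "return 0".
	 */
	if (notify)
		(void)test_and_set_flag(&task->flags, NOTIFY_RESUME_BIT);
	return 0;
}
```

In the miscompiled listing the "bit was already set" exit jumps to .L2, which reloads -20(%ebp) without the store at .L5 having run, so the caller can see a non-zero value where this model always yields 0.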
Re: [x86] BUG: unable to handle kernel paging request at 00740060
* Peter Zijlstra <pet...@infradead.org> wrote:

> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > >
> > > > $ kernel/task_work.s
> > > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
>
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > 	bts $1, 8(%eax); setc %dl	#,, c
>
> That compiler doesn't appear to have asm goto support, so we fall back to
> the code we already knew worked :-)

I'm using 4.7.2 with randconfig testing, which has asm goto support, and I
haven't seen this crash yet.

Unless my testing is off it might be a bug in GCC 4.8, or a pre-existing
bug gets exposed by GCC 4.8.

Thanks,

	Ingo
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 02:27:05PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > Fengguang, I do not think this will help, but just in case. Could you
> > > > show the result of
> > > >
> > > > $ kernel/task_work.s
> > > Update: I recompiled the kernel with gcc 4.4.7 and find it booting fine!
>
> > # 208 "/c/wfg/tip/arch/x86/include/asm/bitops.h" 1
> > 	bts $1, 8(%eax); setc %dl	#,, c
>
> That compiler doesn't appear to have asm goto support, so we fall back to
> the code we already knew worked :-)

Ah OK.. btw, here is a simple script I used to reproduce the problem. I'll
attach the 3MB yocto initrd in another email. However I suspect whatever
initrd would be OK.

Thanks,
Fengguang

kvm-0day.sh
Description: Bourne shell script
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
> I'm afraid I am wrong, my asm skills are close to zero... but this
> code looks wrong to me, and this can explain the oopses.
>
> task_work_add:
>         pushl   %ebp                    #
>         movl    %esp, %ebp              #,
>         pushl   %edi                    #
>         pushl   %esi                    #
>         pushl   %ebx                    #
>         subl    $12, %esp               #,
>         call    mcount
>         movl    %eax, %edi              # task, task
>         movl    %edx, -16(%ebp)         # work, %sfp
>         movb    %cl, -21(%ebp)          # notify, %sfp
>         .p2align 4,,15
> .L3:
>         movl    904(%edi), %esi         # task_3(D)->task_works, head
>         cmpl    $work_exited, %esi      #, head
>         sete    %bl                     #, D.14145
>         andl    $255, %ebx              #, D.14145
>         xorl    %ecx, %ecx              #
>         movl    %ebx, %edx              # D.14145,
>         movl    $__f.14042, %eax        #,
>         call    ftrace_likely_update    #
>         testl   %ebx, %ebx              # D.14145
>         jne     .L4                     #,
>         movl    -16(%ebp), %edx         # %sfp,
>         movl    %esi, (%edx)            # head, work_13(D)->next
>         movl    %esi, %eax              # head, __ret
> #APP
> # 34 /c/wfg/tip/kernel/task_work.c 1
>         cmpxchgl %edx,904(%edi)         #, *__ptr_16
> # 0 2
> #NO_APP
>         cmpl    %eax, %esi              # __ret, head
>         jne     .L3                     #,
>
> OK, we added the new work successfully, we should return 0. If we
> return non-zero, fput() (the likely caller) assumes that it should use
> the workqueues to close/free this file. Then later task_work_run() will
> do __fput() again.
>
>         cmpb    $0, -21(%ebp)           #, %sfp
>         je      .L5                     #,
>         movl    4(%edi), %eax           # task_3(D)->stack, task_3(D)->stack
> #APP
> # 208 /c/wfg/tip/arch/x86/include/asm/bitops.h 1
>         bts $1, 8(%eax); jc .L2         #, MEM[(volatile long unsigned int *)D.14203_29],
>
> This is set_notify_resume(). Probably !CONFIG_SMP (I do not see
> kick_process).
>
> # 0 2
> #NO_APP
> .L5:
>         movl    $0, -20(%ebp)           #, %sfp
> .L2:
>         movl    -20(%ebp), %eax         # %sfp,
>
> This is what we are going to return. But note that -20(%ebp) was not
> initialized if TIF_NOTIFY_RESUME was already set, jc .L2 skips .L5
> above.
>
> IOW, in this case we seem to return a random value from stack.
I think you're quite right, and I can confirm I can reproduce this with
gcc-4.8.1 and Wu's .config:

        .p2align 4,,15
        .globl  task_work_add
        .type   task_work_add, @function
task_work_add:
        pushl   %ebp                    #
        movl    %esp, %ebp              #,
        pushl   %edi                    #
        pushl   %esi                    #
        pushl   %ebx                    #
        subl    $12, %esp               #,
        call    mcount
        movl    %eax, %esi              # task, task
        movl    %edx, %edi              # work, work
        movl    %ecx, -24(%ebp)         # notify, %sfp
        jmp     .L4                     #
        .p2align 4,,15
.L9:
        movl    %ebx, (%edi)            # __old, work_15(D)->next
        movl    %ebx, %eax              # __old, __ret
#APP
# 34 /usr/src/linux-2.6/kernel/task_work.c 1
        cmpxchgl %edi,904(%esi)         # work, *__ptr_17
# 0 2
#NO_APP
        cmpl    %eax, %ebx              # __ret, __old
        je      .L8                     #,
.L4:
        movl    904(%esi), %ebx         # task_7(D)->task_works, __old
        cmpl    $work_exited, %ebx      #, __old
        sete    -13(%ebp)               #, %sfp
        xorl    %edx, %edx              # __r
        movb    -13(%ebp), %dl          # %sfp, __r
        xorl    %ecx, %ecx              #
        movl    $__f.14204, %eax        #,
        call    ftrace_likely_update    #
        cmpb    $0, -13(%ebp)           #, %sfp
        je      .L9                     #,
        movl    $-3, -20(%ebp)          #, %sfp
.L2:
        movl    -20(%ebp), %eax         # %sfp,
        addl    $12, %esp               #,
        popl    %ebx                    #
        popl    %esi                    #
        popl    %edi                    #
        popl    %ebp                    #
        ret
        .p2align 4,,15
.L8:
        cmpb    $0, -24(%ebp)           #, %sfp
        je      .L6                     #,
        movl    4(%esi), %eax           # task_7(D)->stack, task_7(D)->stack
#APP
# 208 /usr/src/linux-2.6/arch/x86/include/asm/bitops.h 1
        bts $1, 8(%eax); jc .L2         #, MEM[(volatile long unsigned int *)_23],
# 0 2
#NO_APP
.L6:
        movl    $0, -20(%ebp)           #, %sfp
        movl    -20(%ebp), %eax         # %sfp,
        addl    $12, %esp               #,
        popl    %ebx                    #
        popl    %esi                    #
        popl    %edi                    #
        popl    %ebp                    #
        ret
        .size   task_work_add, .-task_work_add

Once I force a x86_64 build using the 'same' config it goes away and
generates 'sensible' code again (although I don't see why L9 isn't
merged with L2):

        .p2align 4,,15
        .globl  task_work_add
        .type   task_work_add, @function
task_work_add:
        call    __fentry__
        pushq   %rbp                    #
        movq    %rsp, %rbp              #,
        pushq   %r15                    #
        pushq   %r14                    #
        movl    %edx, %r14d             # notify, notify
        pushq   %r13                    #
        movq    %rsi, %r13              # work, work
        pushq   %r12                    #
        movq    %rdi, %r12              # task, task
        pushq   %rbx                    #
        jmp     .L4                     #
        .p2align 4,,10
Re: [x86] BUG: unable to handle kernel paging request at 00740060
OK, thanks... I didn't notice Richard and Jakub were not cc'ed... Add
them, perhaps they can take a look.

On 10/09, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 02:43:10PM +0200, Oleg Nesterov wrote:
> > I'm afraid I am wrong, my asm skills are close to zero... but this
> > code looks wrong to me, and this can explain the oopses.
> [...]
Re: [x86] BUG: unable to handle kernel paging request at 00740060
* Peter Zijlstra pet...@infradead.org wrote:

> > This is what we are going to return. But note that -20(%ebp) was not
> > initialized if TIF_NOTIFY_RESUME was already set, jc .L2 skips .L5
> > above.
> >
> > IOW, in this case we seem to return a random value from stack.
>
> I think you're quite right, and I can confirm I can reproduce this with
> gcc-4.8.1 and Wu's .config:
>
> [...]
>
> Once I force a x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again [...]

So this at least opens up the possibility that we can create a not too
painful quirk and only use the 'asm goto' optimization tricks on 64-bit
kernels?

Thanks,

Ingo
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> Once I force a x86_64 build using the 'same' config it goes away and
> generates 'sensible' code again (although I don't see why L9 isn't
> merged with L2):

i386-SMP also generates correct code afaict; a tad stupid but not wrong.

If I remove ftrace from the .config it's still broken..

If I also remove the likely/unlikely tracer it's still broken and lots
smaller:

        .p2align 4,,15
        .globl  task_work_add
        .type   task_work_add, @function
task_work_add:
        pushl   %ebp                    #
        movl    %esp, %ebp              #,
        pushl   %edi                    #
        pushl   %esi                    #
        pushl   %ebx                    #
        movl    %eax, %esi              # task, task
        .p2align 4,,15
.L4:
        movl    904(%esi), %ebx         # task_5(D)->task_works, __old
        cmpl    $work_exited, %ebx      #, __old
        je      .L5                     #,
        movl    %ebx, (%edx)            # __old, work_10(D)->next
        movl    %ebx, %eax              # __old, __ret
#APP
# 34 /usr/src/linux-2.6/kernel/task_work.c 1
        cmpxchgl %edx,904(%esi)         # work, *__ptr_12
# 0 2
#NO_APP
        cmpl    %eax, %ebx              # __ret, __old
        jne     .L4                     #,
        testb   %cl, %cl                # notify
        je      .L6                     #,
        movl    4(%esi), %eax           # task_5(D)->stack, task_5(D)->stack
#APP
# 208 /usr/src/linux-2.6/arch/x86/include/asm/bitops.h 1
        bts $1, 8(%eax); jc .L2         #, MEM[(volatile long unsigned int *)_18],
# 0 2
#NO_APP
.L6:
        xorl    %edi, %edi              # D.14172
.L2:
        movl    %edi, %eax              # D.14172,
        popl    %ebx                    #
        popl    %esi                    #
        popl    %edi                    #
        popl    %ebp                    #
        ret
.L5:
        movl    $-3, %edi               #, D.14172
        jmp     .L2                     #
        .size   task_work_add, .-task_work_add

That jc .L2 needs to be .L6 ! It looks like it fails to deal with the
empty branch.

Why this thing needs to use EDI is anybody's guess I suppose. Would've
made much more sense to have:

.L6:
        xorl    %eax, %eax
.L2:
        popl    %ebx
        popl    %esi
        popl    %ebp
        ret
.L5:
        movl    $-3, %eax
        jmp     .L2

At least it's not duplicating the popl+ret bits 3 times anymore.
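For readers following the assembly, here is a rough C-level sketch of the logic the listings above appear to implement: a lock-free push onto the task's work list, an optional "notify" bit set, and a return value that should be 0 on every successful add. It is only an illustration under stated assumptions. The struct layouts, the `_sketch` names, and the use of C11 atomics in place of the kernel's cmpxchg() and bitops are hypothetical stand-ins, not the kernel source; the -3 return value is taken from the `movl $-3` visible in the listings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct work {
    struct work *next;
};

struct task {
    _Atomic(struct work *) works;   /* head of the pending-work list */
    atomic_uint flags;              /* bit 1 plays the role of TIF_NOTIFY_RESUME */
};

#define NOTIFY_BIT (1u << 1)

/* Sentinel meaning "this task no longer accepts work". */
static struct work work_exited;

/* Models "bts $1, mem": sets the bit, returns whether it was already set. */
static bool test_and_set_notify(struct task *t)
{
    return (atomic_fetch_or(&t->flags, NOTIFY_BIT) & NOTIFY_BIT) != 0;
}

static int task_work_add_sketch(struct task *t, struct work *w, bool notify)
{
    struct work *head;

    assert(t != NULL && w != NULL);

    /* Retry until w is installed as the new list head. */
    do {
        head = atomic_load(&t->works);
        if (head == &work_exited)
            return -3;              /* the ".L5: movl $-3" path in the listing */
        w->next = head;
    } while (!atomic_compare_exchange_weak(&t->works, &head, w));

    /*
     * The bit test result is deliberately ignored: whether or not the
     * bit was already set, the add succeeded and the result must be 0.
     * The miscompiled code leaves the result undefined on the
     * "already set" branch.
     */
    if (notify)
        (void)test_and_set_notify(t);

    return 0;
}
```

Calling `task_work_add_sketch()` twice with `notify` set exercises exactly the branch discussed above: the second call finds the bit already set and still has to return 0.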
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force a x86_64 build using the 'same' config it goes away
> > > and generates 'sensible' code again (although I don't see why L9
> > > isn't merged with L2):
> >
> > i386-SMP also generates correct code afaict; a tad stupid but not
> > wrong.
> >
> > If I remove ftrace from the .config it's still broken..
> >
> > If I also remove the likely/unlikely tracer it's still broken and
> > lots smaller:
>
> OK, it's -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670

Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless
somebody beats me to it. But historically, the case where asm goto
labels jump to fallthru basic block had numerous problems in the past.

Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 9, 2013 at 11:16 AM, Jakub Jelinek ja...@redhat.com wrote:
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
>
> Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless
> somebody beats me to it. But historically, the case where asm goto
> labels jump to fallthru basic block had numerous problems in the past.

Ok, so it isn't even specific for x86-32, because your test-case shows
the bug for me on 64-bit too. Apparently we just have a harder time
hitting it in practice in the kernel on x86-64.

Too bad. It makes me nervous about all our _traditional_ uses of asm
goto too, never mind the new ones..

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
>
> Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless
> somebody beats me to it. But historically, the case where asm goto
> labels jump to fallthru basic block had numerous problems in the past.

That bug lists the component as middle end; this suggests x86_64 would
be vulnerable too, can you confirm? So far we've only observed the
wrong code on i386 targets, x86_64 targets appeared correct.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> >
> > Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless
> > somebody beats me to it. But historically, the case where asm goto
> > labels jump to fallthru basic block had numerous problems in the
> > past.
>
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the
> wrong code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
even say on ppc64 (sure, one would have to rewrite the asm to have it
fail at runtime).

Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, 2013-10-09 at 19:18 +0200, Ingo Molnar wrote:
> * Ingo Molnar mi...@kernel.org wrote:
>
> > * Peter Zijlstra pet...@infradead.org wrote:
> >
> > > On Wed, Oct 09, 2013 at 08:19:11PM +0800, Fengguang Wu wrote:
> > > > > Fengguang, I do not think this will help, but just in case.
> > > > > Could you show the result of
> > > > >
> > > > > $ kernel/task_work.s
> > > >
> > > > Update: I recompiled the kernel with gcc 4.4.7 and find it
> > > > booting fine!
> > > >
> > > > # 208 /c/wfg/tip/arch/x86/include/asm/bitops.h 1
> > > > bts $1, 8(%eax); setc %dl #,, c
> > >
> > > That compiler doesn't appear to have asm goto support, so we fall
> > > back to the code we already knew worked :-)
> >
> > I'm using 4.7.2 with randconfig testing, which has asm goto support,
> > and I haven't seen this crash yet.
> >
> > Unless my testing is off it might be a bug in GCC 4.8, or a
> > pre-existing bug gets exposed by GCC 4.8.
>
> And as it happens, just a few hours later I hit a very similar crash,
> this time compiled with both 4.7.3 and 4.7.2! (config attached)
>
> This has a weird-x86-arch tuning knob as well:
>
> CONFIG_MGEODE_LX=y
>
> So I think we might need to turn off asm goto for all things 32-bit
> x86.

Hm, 32 bit x86... I built 4.8.1 yesterday, so can now build x86_64 tip,
but I suspect I'll not be the only one with a compiler that goes belly
up.

net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

gcc-4.6.2 (opensuse 12.1) has happily chewed up humongous piles of
source, but finds this asm goto stuff to be toxic.

-Mike
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, 2013-10-08 at 21:05 +0200, Jakub Jelinek wrote:
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> > On 10/08, Linus Torvalds wrote:
> > >
> > > (not yet merged), see:
> > >
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> >
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> >
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > +        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > +                : : "m" (var), ## __VA_ARGS__ \
> >                       ^
> >
> > don't we need
> >
> > "+m" (var)
> >
> > here?
>
> You actually can't have output operands with asm goto, only inputs
> and clobbers. But the "memory" clobber should be enough here.
>
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

gcc version 4.6.2 (SUSE Linux) won't produce output, but where it dies
might point in the general direction of newer gcc troubles?

CC [M] net/sunrpc/xprtsock.o
net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

-Mike
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:35 PM, Oleg Nesterov wrote:
>
> Cough... sorry for off-topic question,
>
> static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
> {
>         int oldbit;
>
>         asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
>                      "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> doesn't this mean that "ADDR" doesn't need "+" as well?

We use ADDR for some of the non-barrier ones too, that don't have the
barrier. See clear_bit() and friends..

> Or at least, perhaps it makes sense to identify the include file which
> makes the difference. Say, revert the changes in bitops.h, retest, then
> in atomic.h if the kernel still fails, etc.

Yeah, except Fengguang is the only one seeing this in his automated
tests..

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Jakub Jelinek wrote:
>
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> >
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> >
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > +        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > +                : : "m" (var), ## __VA_ARGS__ \
> >                       ^
> >
> > don't we need
> >
> > "+m" (var)
> >
> > here?
>
> You actually can't have output operands with asm goto, only inputs
> and clobbers. But the "memory" clobber should be enough here.

Thanks Jakub and Linus.

Cough... sorry for off-topic question,

static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
{
        int oldbit;

        asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
                     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

doesn't this mean that "ADDR" doesn't need "+" as well?

> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

Or at least, perhaps it makes sense to identify the include file which
makes the difference. Say, revert the changes in bitops.h, retest, then
in atomic.h if the kernel still fails, etc.

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:20 PM, Linus Torvalds wrote:
>
> I'll try to see if I can reproduce this on my hardware

Yeah, doesn't reproduce here..

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:05 PM, Jakub Jelinek wrote:
>
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

It is indeed just an optimization, and we could in theory switch
between the two versions on a case-by-case basis, but we don't have any
sane way to really do that.

I'll try to see if I can reproduce this on my hardware (just applying
that patch on top of my own tip) and see if I can try to narrow things
down. But I looked at the assembly for a couple of files, and it all
looked good, and I know this patch works fine for others (ie all the
normal -tip testing), so I suspect it's something specific to what
Fengguang does.

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> +        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> +                : : "m" (var), ## __VA_ARGS__ \
>                       ^
>
> don't we need
>
> "+m" (var)
>
> here?

You actually can't have output operands with asm goto, only inputs
and clobbers. But the "memory" clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to
a single object file (if I've skimmed the patch right, it is just an
optimization, where object files compiled without and with the patch
should actually coexist fine in the same kernel), ideally to a single
routine if possible and post a preprocessed source + gcc command line
+ version of gcc?

Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 11:51 AM, Oleg Nesterov wrote:
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> +        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> +                : : "m" (var), ## __VA_ARGS__ \
>                       ^
>
> don't we need
>
> "+m" (var)

We have a memory clobber instead. So the memory is marked as input and
clobbered.

And we'd love to mark it "+m", but "asm goto" cannot have outputs. For
the serializing ones, the memory clobber is ok - they have barrier
semantics anyway. But we'd actually *want* to use "asm goto" for some
cases where the memory clobber is too big of a hammer, so if we ever
get input/output constraints to "asm goto" we'll be happy.

Of course, right now it looks like we shouldn't be in a rush to use
"asm goto" at all...

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Linus Torvalds wrote:
>
> (not yet merged), see:
>
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

I do not really understand inline assembly constraints, but I'll ask
anyway.

+#define __GEN_RMWcc(fullop, var, cc, ...) \
+do { \
+        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
+                : : "m" (var), ## __VA_ARGS__ \
                      ^

don't we need

"+m" (var)

here?

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Fengguang Wu wrote:
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Can't understand how this can affect task_work.c... Well,
task_work_add() does test_and_set_bit(), so that patch actually changes
this code, but still I can't see how this can lead to these OOPSes.

Fengguang, I do not think this will help, but just in case. Could you
show the result of

$ kernel/task_work.s

?

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
[ Richard and Jakub added to cc, they can perhaps help or at least
point us to the right gcc person. Richard, Jakub, the bug is triggered
by kernel commit 0c44c2d0f459 (not yet merged), see:

http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

for the patch. The actual inline asm is pretty dang small, and the
non-asm-goto version works fine. Can you take a look? ]

On Tue, Oct 8, 2013 at 12:51 AM, Fengguang Wu wrote:
> [9.844709] *pdpt = 072c1001 *pde =
>
>> That said, Fengguang, can you try two things just to check:
>>
>> - add "cc" to the clobbers list for the asm goto (technically it
>> should be on the non-asm-goto as well, but we never had that, and
>> maybe the fact that gcc always ends up testing a register afterwards
>> hides the need for the clobber).
>>
>> So it would look like this in arch/x86/include/asm/rmwcc.h
>>
>> #define __GEN_RMWcc(fullop, var, cc, ...) \
>> do { \
>>         asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>>                 : : "m" (var), ## __VA_ARGS__ \
>>                 : "memory", "cc" : cc_label); \
>>         return 0; \
>> cc_label: \
>>         return 1; \
>>
>> (where that "cc" thing is new). I'm not sure if "cc" really matters on
>> x86 at all (it didn't use to, long long ago), but maybe it does these
>> days..
>
> Tests show that it makes no difference by adding the "cc" this way:
>
> - : "memory" : cc_label); \
> + : "memory", "cc" : cc_label); \

Ok, that was a long shot, I don't think gcc actually ever assumes cc is
live over an asm on x86.

>> If that makes no difference, please just verify that the non-asm-goto
>> version works fine, by changing the
>>
>> #ifdef CC_HAVE_ASM_GOTO
>>
>> into a simple "#if 0" to disable the asm-goto version.
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Ok. So it looks very much like "asm goto()" is simply buggered. Too
bad, since it generated nice clear code.
I suspect it's the memory clobber - maybe it only marks memory as
clobbered for the fallthrough case, and the actual "goto" case might
use old cached values? What do I know, it's just a theory.

We do have "asm goto" with memory clobbers elsewhere (our x86 version
of __mutex_fastpath_lock()), but that use is very limited and only gets
expanded in a single place. The new bitop cases get expanded
*everywhere*, so if there is something subtly wrong wrt code generation
that requires some particular pattern, they'd trigger it much more
easily.

Anybody have any ideas?

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
I'll try to find other messages to understand what you are talking
about, just one note for now

On 10/07, Linus Torvalds wrote:
>
> Your oops makes very little sense, it looks like task_work_run() just
> called out to random crap, probably because the work was already
> released, so "work->func()" ends up being bad.

Or task_work_run() can hit work->func == NULL if do_exit() is called
twice if, say, the task does BUG() after exit_task_work().

> participants anyway, just in case there is some race. The comment says
> that it can race with task_work_cancel() playing with *work. Oleg,
> comments?

The comment tries to say that if we are racing with task_work_cancel()
it can't delete the first entry == work, we won the race, its
cmpxchg(task->task_works) should fail.

However, task_work_cancel() can delete one of the next entries and
change, say, work->next. And we need to wait anyway if it scans this
list.

I'll try to recheck, but so far I do not see anything wrong.

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
Hi Linus,

On Mon, Oct 07, 2013 at 11:47:39AM -0700, Linus Torvalds wrote:
> On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu wrote:
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 0c44c2d0f459 ("x86: Use asm goto to implement better
> > modify_and_test() functions")
>
> Hmm. I'm looking at the final version of that patch, and I'm not
> seeing anything wrong. It may trigger a compiler bug - there aren't
> that many "asm goto" users, and using them for the bitops adds a lot
> of new cases.
>
> Your oops makes very little sense, it looks like task_work_run() just
> called out to random crap, probably because the work was already
> released, so "work->func()" ends up being bad. I'm adding Oleg to the
> participants anyway, just in case there is some race. The comment says
> that it can race with task_work_cancel() playing with *work. Oleg,
> comments?
>
> However, I don't see any actual bit-op code in task_work_run() itself,
> so it's something else that got miscompiled and corrupted memory. In
> that respect, the oops you have looks more like the oopses you got
> with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?
The option was set:

DEBUG_KOBJECT_RELEASE=y

I tried disabling it, and found the error still remains:

[9.719060] Write protecting the kernel text: 6116k
[9.720356] Write protecting the kernel read-only data: 2616k
[9.721586] NX-protecting the kernel data: 6172k
[9.750420] BUG: unable to handle kernel NULL pointer dereference at (null)
[9.750870] IP: [< (null)>] (null)
[9.750870] *pdpt = 072be001 *pde =
[9.750870] Oops: 0010 [#1] DEBUG_PAGEALLOC
[9.750870] CPU: 0 PID: 84 Comm: rc.local Not tainted 3.12.0-rc1-00081-g6bfa687 #4
[9.750870] task: 872c4000 ti: 872c6000 task.ti: 872c6000
[9.750870] EIP: 0060:[<>] EFLAGS: 00010246 CPU: 0
[9.750870] EIP is at 0x0
[9.750870] EAX: 82076134 EBX: 872b2780 ECX: EDX: 82076134
[9.750870] ESI: 872c4000 EDI: 872c4388 EBP: 872c7f9c ESP: 872c7f8c
[9.750870] DS: 007b ES: 007b FS: GS: SS: 0068
[9.750870] CR0: 8005003b CR2: CR3: 072bd000 CR4: 06b0
[9.750870] Stack:
[9.750870] 810545b9 0001 789ecf58 7767dff4 872c7fac 81002358 78a03903
[9.750870] 872c6000 815f6bd0
[9.750870] 007b 007b 000b 777d81d0 0073
[9.750870] Call Trace:
[9.750870] [<810545b9>] ? task_work_run+0x79/0xb0
[9.750870] [<81002358>] do_notify_resume+0x58/0x70
[9.750870] [<815f6bd0>] work_notifysig+0x2b/0x3b
[9.750870] Code: Bad EIP value.
[9.750870] EIP: [<>] 0x0 SS:ESP 0068:872c7f8c [9.750870] CR2: [9.769399] ---[ end trace da54692b95c91495 ]--- [9.777566] BUG: unable to handle kernel paging request at 05140060 [9.778845] IP: [<81054594>] task_work_run+0x54/0xb0 [9.779774] *pdpt = *pde = f000ff53f000ff53 [9.780708] Oops: [#2] DEBUG_PAGEALLOC [9.781431] CPU: 0 PID: 85 Comm: hostname Tainted: G D 3.12.0-rc1-00081-g6bfa687 #4 [9.781721] task: 872c8000 ti: 872ca000 task.ti: 872ca000 [9.781721] EIP: 0060:[<81054594>] EFLAGS: 00010206 CPU: 0 [9.781721] EIP is at task_work_run+0x54/0xb0 [9.781721] EAX: 05140060 EBX: 8729b900 ECX: EDX: 05140060 [9.781721] ESI: 872c8000 EDI: 872c8388 EBP: 872cbf3c ESP: 872cbf30 [9.781721] DS: 007b ES: 007b FS: GS: SS: 0068 [9.781721] CR0: 8005003b CR2: 05140060 CR3: 072cc000 CR4: 06b0 [9.781721] Stack: [9.781721] 872af400 872c8000 872cbf8c 8103a02a 0014 776cefb8 8105b49b [9.781721] 872cbfac 0001 0015 61636f6c 736f686c 6f6c2e74 872af458 [9.781721] 69616d6f 872af46e 872af458 872ae980 872c8000 872cbfa4 [9.781721] Call Trace: [9.781721] [<8103a02a>] do_exit+0x2aa/0x920 [9.781721] [<8105b49b>] ? up_write+0x1b/0x30 [9.781721] [<8103a732>] do_group_exit+0x52/0xb0 [9.781721] [<8103a7a8>] SyS_exit_group+0x18/0x20 [9.781721] [<815f7130>] sysenter_do_call+0x12/0x3c [9.781721] Code: 36 31 c9 89 d0 0f b1 0f 39 c2 75 eb 85 d2 74 67 8d b4 26 00 00 00 00 f3 90 8b 86 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 46 0c 04 74 c4 b9 04 d0 [9.781721] EIP: [<81054594>] task_work_run+0x54/0xb0 SS:ESP 0068:872cbf30 [9.781721] CR2: 05140060 [9.802246] ---[ end trace da54692b95c91496 ]--- [9.802881] Fixing recursive fault but reboot is needed! [9.811986] BUG: unable to handle kernel paging request at 0805a000 [9.812911] IP: [<81054594>] task_work_run+0x54/0xb0 [9.813683] *pdpt = 072e2001 *pde = 072cf067 *pte = [9.815024] Oops: [#3] DEBUG_PAGEALLOC [9.815623] CPU: 0 PID: 86 Comm: plymouthd Tainted: G D
Re: [x86] BUG: unable to handle kernel paging request at 00740060
I'll try to find other messages to understand what you are talking about, just one note for now.

On 10/07, Linus Torvalds wrote:
> Your oops makes very little sense, it looks like task_work_run() just called out to random crap, probably because the work was already released, so "work->func()" ends up being bad.

Or task_work_run() can hit work->func == NULL if do_exit() is called twice, if, say, the task does BUG() after exit_task_work().

> participants anyway, just in case there is some race. The comment says that it can race with task_work_cancel() playing with *work. Oleg, comments?

The comment tries to say that if we are racing with task_work_cancel(), it can't delete the first entry == work: we won the race, so its cmpxchg(task->task_works) should fail. However, task_work_cancel() can delete one of the next entries and change, say, work->next. And we need to wait anyway if it scans this list.

I'll try to recheck, but so far I do not see anything wrong.

Oleg.
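As a rough illustration of the cmpxchg race Oleg describes, here is a minimal user-space sketch in C11. It is not the kernel's task_work.c: the names work_add, work_take_all, work_run and mark_ran are made up for the example, C11 atomics stand in for the kernel's cmpxchg(), and the task_work_cancel() side (plus the wait it requires) is left out. It only shows that the runner detaches the whole list with one compare-and-swap on the head, so whoever wins that swap owns the first entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's callback entry. */
struct work {
    struct work *next;
    void (*func)(struct work *);
};

/* Hypothetical stand-in for task->task_works; starts out NULL. */
static _Atomic(struct work *) works_head;

/* Push one entry on the list head with a compare-and-swap loop. */
static void work_add(struct work *w)
{
    assert(w != NULL && w->func != NULL);
    struct work *head = atomic_load(&works_head);
    do {
        w->next = head;   /* 'head' is refreshed when the swap fails */
    } while (!atomic_compare_exchange_weak(&works_head, &head, w));
}

/*
 * Detach the whole list: swap the head from 'first' to NULL.
 * Once this succeeds, nobody else can find 'first' via the head.
 */
static struct work *work_take_all(void)
{
    struct work *first = atomic_load(&works_head);
    while (first != NULL &&
           !atomic_compare_exchange_weak(&works_head, &first, NULL))
        ;                 /* lost a race; 'first' now holds the new head */
    return first;
}

/* Run every detached entry (newest first in this sketch). */
static int work_run(void)
{
    int n = 0;
    struct work *next;
    for (struct work *w = work_take_all(); w != NULL; w = next) {
        next = w->next;   /* read before the callback may free 'w' */
        w->func(w);
        n++;
    }
    return n;
}

/* Example callback, only here so the sketch can be exercised. */
static int ran;
static void mark_ran(struct work *w) { (void)w; ran++; }
```

If an entry's func pointer or next pointer is corrupted by an unrelated stray write, the loop in work_run() is where the damage would surface, which is consistent with the oopses pointing at task_work_run() even though it contains no bit-op code itself.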
Re: [x86] BUG: unable to handle kernel paging request at 00740060
[ Richard and Jakub added to cc, they can perhaps help or at least point us to the right gcc person. Richard, Jakub, the bug is triggered by kernel commit 0c44c2d0f459 (not yet merged), see:

http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

for the patch. The actual inline asm is pretty dang small, and the non-asm-goto version works fine. Can you take a look? ]

On Tue, Oct 8, 2013 at 12:51 AM, Fengguang Wu <fengguang...@intel.com> wrote:
> [9.844709] *pdpt = 072c1001 *pde =
>
> > That said, Fengguang, can you try two things just to check:
> >
> > - add "cc" to the clobbers list for the asm goto (technically it should be on the non-asm-goto as well, but we never had that, and maybe the fact that gcc always ends up testing a register afterwards hides the need for the clobber). So it would look like this in arch/x86/include/asm/rmwcc.h
> >
> >   #define __GEN_RMWcc(fullop, var, cc, ...)                 \
> >   do {                                                      \
> >         asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> >                         : : "m" (var), ## __VA_ARGS__       \
> >                         : "memory", "cc" : cc_label);       \
> >         return 0;                                           \
> >   cc_label:                                                 \
> >         return 1;                                           \
> >
> > (where that "cc" thing is new). I'm not sure if "cc" really matters on x86 at all (it didn't use to, long long ago), but maybe it does these days..
>
> Tests show that it makes no difference by adding the "cc" this way:
>
> -                       : "memory" : cc_label);             \
> +                       : "memory", "cc" : cc_label);       \

Ok, that was a long shot, I don't think gcc actually ever assumes "cc" is live over an asm on x86.

> > If that makes no difference, please just verify that the non-asm-goto version works fine, by changing the #ifdef CC_HAVE_ASM_GOTO into a simple "#if 0" to disable the asm-goto version.
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Ok. So it looks very much like "asm goto" is simply buggered. Too bad, since it generated nice clear code.

I suspect it's the memory clobber - maybe it only marks memory as clobbered for the fallthrough case, and the actual goto case might use old cached values? What do I know, it's just a theory.

We do have asm goto with memory clobbers elsewhere (our x86 version of __mutex_fastpath_lock()), but that use is very limited and only gets expanded in a single place. The new bitop cases get expanded *everywhere*, so if there is something subtly wrong wrt code generation that requires some particular pattern, they'd trigger it much more easily.

Anybody have any ideas?

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Fengguang Wu wrote:
>
> Yeah, this will quiet the oops messages:
>
> -#ifdef CC_HAVE_ASM_GOTO
> +#if 0

Can't understand how this can affect task_work.c...

Well, task_work_add() does test_and_set_bit(), so that patch actually changes this code, but still I can't see how this can lead to these OOPSes.

Fengguang, I do not think this will help, but just in case. Could you show the result of $ kernel/task_work.s ?

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Linus Torvalds wrote:
>
> (not yet merged), see:
>
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d

I do not really understand inline assembly constraints, but I'll ask anyway.

> +#define __GEN_RMWcc(fullop, var, cc, ...)                  \
> +do {                                                       \
> +       asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> +                       : : "m" (var), ## __VA_ARGS__       \
                              ^
don't we need "+m" (var) here?

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 11:51 AM, Oleg Nesterov <o...@redhat.com> wrote:
>
> I do not really understand inline assembly constraints, but I'll ask anyway.
>
> > +#define __GEN_RMWcc(fullop, var, cc, ...)                  \
> > +do {                                                       \
> > +       asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> > +                       : : "m" (var), ## __VA_ARGS__       \
>                               ^
> don't we need "+m" (var)

We have a memory clobber instead. So the memory is marked as input and clobbered.

And we'd love to mark it "+m", but asm goto cannot have outputs.

For the serializing ones, the memory clobber is ok - they have barrier semantics anyway. But we'd actually *want* to use asm goto for some cases where the memory clobber is too big of a hammer, so if we ever get input/output constraints to asm goto we'll be happy.

Of course, right now it looks like we shouldn't be in a rush to use asm goto at all...

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
>
> I do not really understand inline assembly constraints, but I'll ask anyway.
>
> > +#define __GEN_RMWcc(fullop, var, cc, ...)                  \
> > +do {                                                       \
> > +       asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> > +                       : : "m" (var), ## __VA_ARGS__       \
>                               ^
> don't we need "+m" (var) here?

You actually can't have output operands with asm goto, only inputs and clobbers. But the memory clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc?

Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:05 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>
> If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc?

It is indeed just an optimization, and we could in theory switch between the two versions on a case-by-case basis, but we don't have any sane way to really do that.

I'll try to see if I can reproduce this on my hardware (just applying that patch on top of my own tip) and see if I can try to narrow things down.

But I looked at the assembly for a couple of files, and it all looked good, and I know this patch works fine for others (ie all the normal -tip testing), so I suspect it's something specific to what Fengguang does.

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:20 PM, Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> I'll try to see if I can reproduce this on my hardware

Yeah, doesn't reproduce here..

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On 10/08, Jakub Jelinek wrote:
>
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> >
> > I do not really understand inline assembly constraints, but I'll ask anyway.
> >
> > > +#define __GEN_RMWcc(fullop, var, cc, ...)                  \
> > > +do {                                                       \
> > > +       asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> > > +                       : : "m" (var), ## __VA_ARGS__       \
> >                               ^
> > don't we need "+m" (var) here?
>
> You actually can't have output operands with asm goto, only inputs and clobbers. But the memory clobber should be enough here.

Thanks Jakub and Linus.

Cough... sorry for off-topic question,

	static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
	{
		int oldbit;

		asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
			     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

doesn't this mean that ADDR doesn't need "+" as well?

> If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc?

Or at least, perhaps it makes sense to identify the include file which makes the difference. Say, revert the changes in bitops.h, retest, then in atomic.h if the kernel still fails, etc.

Oleg.
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 8, 2013 at 12:35 PM, Oleg Nesterov <o...@redhat.com> wrote:
>
> Cough... sorry for off-topic question,
>
>	static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
>	{
>		int oldbit;
>
>		asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
>			     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> doesn't this mean that ADDR doesn't need "+" as well?

We use ADDR for some of the non-barrier ones too, that don't have the barrier. See clear_bit() and friends..

> Or at least, perhaps it makes sense to identify the include file which makes the difference. Say, revert the changes in bitops.h, retest, then in atomic.h if the kernel still fails, etc.

Yeah, except Fengguang is the only one seeing this in his automated tests..

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, 2013-10-08 at 21:05 +0200, Jakub Jelinek wrote:
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> > On 10/08, Linus Torvalds wrote:
> > >
> > > (not yet merged), see:
> > >
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> >
> > I do not really understand inline assembly constraints, but I'll ask anyway.
> >
> > > +#define __GEN_RMWcc(fullop, var, cc, ...)                  \
> > > +do {                                                       \
> > > +       asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
> > > +                       : : "m" (var), ## __VA_ARGS__       \
> >                               ^
> > don't we need "+m" (var) here?
>
> You actually can't have output operands with asm goto, only inputs and clobbers. But the memory clobber should be enough here.
>
> If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc?

gcc version 4.6.2 (SUSE Linux) won't produce output, but where it dies might point in the general direction of newer gcc troubles?

  CC [M]  net/sunrpc/xprtsock.o
net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at haifa-sched.c:2353

-Mike
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu wrote:
>
> I got the below dmesg and the first bad commit is
>
> commit 0c44c2d0f459 ("x86: Use asm goto to implement better modify_and_test() functions"

Hmm. I'm looking at the final version of that patch, and I'm not seeing anything wrong. It may trigger a compiler bug - there aren't that many "asm goto" users, and using them for the bitops adds a lot of new cases.

Your oops makes very little sense, it looks like task_work_run() just called out to random crap, probably because the work was already released, so "work->func()" ends up being bad. I'm adding Oleg to the participants anyway, just in case there is some race. The comment says that it can race with task_work_cancel() playing with *work. Oleg, comments?

However, I don't see any actual bit-op code in task_work_run() itself, so it's something else that got miscompiled and corrupted memory. In that respect, the oops you have looks more like the oopses you got with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

That said, Fengguang, can you try two things just to check:

 - add "cc" to the clobbers list for the asm goto (technically it should be on the non-asm-goto as well, but we never had that, and maybe the fact that gcc always ends up testing a register afterwards hides the need for the clobber). So it would look like this in arch/x86/include/asm/rmwcc.h

   #define __GEN_RMWcc(fullop, var, cc, ...)                 \
   do {                                                      \
         asm volatile goto (fullop "; j" cc " %l[cc_label]"  \
                         : : "m" (var), ## __VA_ARGS__       \
                         : "memory", "cc" : cc_label);       \
         return 0;                                           \
   cc_label:                                                 \
         return 1;                                           \

   (where that "cc" thing is new). I'm not sure if "cc" really matters on x86 at all (it didn't use to, long long ago), but maybe it does these days..

If that makes no difference, please just verify that the non-asm-goto version works fine, by changing the #ifdef CC_HAVE_ASM_GOTO into a simple "#if 0" to disable the asm-goto version.

Linus
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 11:08:56AM +0200, Peter Zijlstra wrote: > On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote: > > Wu, do you use the same compiler version for all the builds that crash > > like this (I'm assuming the other email was this same commit)? Does a > > different compiler make things work again? > > OK, so in a further email you say you use gcc-4.8.1 (which is actually > newer than the one I used for most of the work although I do have it and > tried it iirc). Right. And I've got the boot result for gcc-4.6.3: the problem is still there. Here is the qemu cmdline and new dmesg. = cmd=( qemu-system-x86_64 -cpu kvm64 -enable-kvm -kernel $1 -initrd /kernel-tests/initrd/quantal-core-i386.cgz -m 256M -smp 2 -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio -net user,vlan=0,hostfwd=tcp::10661-:22 -net nic,vlan=1,model=e1000 -net user,vlan=1 -boot order=nc -no-reboot -watchdog i6300esb -serial file:/dev/shm/serial-0day -daemonize -display none -monitor null ) "${cmd[@]}" -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw' = [0.00] Linux version 3.12.0-rc1-00081-g6bfa687 (wfg@bee) (gcc version 4.6.3 (Debian 4.6.3-1) ) #1 Mon Oct 7 19:18:22 CST 2013 [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable [0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] debug: ignoring loglevel setting. 
[0.00] NX (Execute Disable) protection: active [0.00] Hypervisor detected: KVM [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100 [0.00] Scanning 1 areas for low memory corruption [0.00] initial memory mapped: [mem 0x-0x029f] [0.00] Base memory trampoline at [8009b000] 9b000 size 16384 [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] [mem 0x-0x000f] page 4k [0.00] init_memory_mapping: [mem 0x0e40-0x0e5f] [0.00] [mem 0x0e40-0x0e5f] page 4k [0.00] BRK [0x020a7000, 0x020a7fff] PGTABLE [0.00] init_memory_mapping: [mem 0x0c00-0x0e3f] [0.00] [mem 0x0c00-0x0e3f] page 4k [0.00] BRK [0x020a8000, 0x020a8fff] PGTABLE [0.00] BRK [0x020a9000, 0x020a9fff] PGTABLE [0.00] BRK [0x020aa000, 0x020aafff] PGTABLE [0.00] BRK [0x020ab000, 0x020abfff] PGTABLE [0.00] BRK [0x020ac000, 0x020acfff] PGTABLE [0.00] init_memory_mapping: [mem 0x0010-0x0bff] [0.00] [mem 0x0010-0x0bff] page 4k [0.00] init_memory_mapping: [mem 0x0e60-0x0fffdfff] [0.00] [mem 0x0e60-0x0fffdfff] page 4k [0.00] log_buf_len: 8388608 [0.00] early log buf free: 128876(98%) [0.00] RAMDISK: [mem 0x0e73f000-0x0ffe] [0.00] ACPI: RSDP 000f16b0 00014 (v00 BOCHS ) [0.00] ACPI: RSDT 0fffe3f0 00034 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.00] ACPI: FACP 0f80 00074 (v01 BOCHS BXPCFACP 0001 BXPC 0001) [0.00] ACPI: DSDT 0fffe430 01137 (v01 BXPC BXDSDT 0001 INTL 20100528) [0.00] ACPI: FACS 0f40 00040 [0.00] ACPI: SSDT 06a0 00899 (v01 BOCHS BXPCSSDT 0001 BXPC 0001) [0.00] ACPI: APIC 05b0 00080 (v01 BOCHS BXPCAPIC 0001 BXPC 0001) [0.00] ACPI: HPET 0570 00038 (v01 BOCHS BXPCHPET 0001 BXPC 0001) [0.00] 255MB LOWMEM available. 
[0.00] mapped low ram: 0 - 0fffe000 [0.00] low ram: 0 - 0fffe000 [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00 [0.00] kvm-clock: cpu 0, msr 0:fffd001, boot clock [0.00] Zone ranges: [0.00] Normal [mem 0x1000-0x0fffdfff] [0.00] Movable zone start for each node [0.00] Early memory node ranges [0.00] node 0: [mem 0x1000-0x0009efff] [0.00] node 0: [mem 0x0010-0x0fffdfff] [0.00] On node 0 totalpages: 65436 [0.00] Normal zone: 576 pages used for memmap [0.00] Normal zone:
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote: > On Sun, Oct 06, 2013 at 07:44:30AM +0800, Fengguang Wu wrote: > > Greetings, > > > > I got the below dmesg and the first bad commit is > > > > commit 0c44c2d0f459cd7e275242b72f500137c4fa834d > > Author: Peter Zijlstra > > Date: Wed Sep 11 15:19:24 2013 +0200 > > > > x86: Use asm goto to implement better modify_and_test() functions > > > > Linus suggested using asm goto to get rid of the typical SETcc + TEST > > instruction pair -- which also clobbers an extra register -- for our > > typical modify_and_test() functions. > > > > Because asm goto doesn't allow output fields it has to include an > > unconditinal memory clobber when it changes a memory variable to force > > a reload. > > > > Luckily all atomic ops already imply a compiler barrier to go along > > with their memory barrier semantics. > > > > Suggested-by: Linus Torvalds > > Signed-off-by: Peter Zijlstra > > Link: > > http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap142...@git.kernel.org > > Signed-off-by: Ingo Molnar > > > Well that blows,.. Anybody got any clue as to where to start looking? > I've not actually seen anything like this on my own machines. 
Perhaps it's related to one of - the randconfig - kvm - gcc In the end of dmesg file, there is the qemu command line to run the kernel: qemu-system-x86_64 -cpu kvm64 -enable-kvm -kernel /kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw link=/kernel-tests/run-queue/kvm/i386-randconfig-j1-10052106/next:master/.vmlinuz-a0cf1abc25ac197dd97b857c0f6341066a8cb1cf-20131005211923-7-athens branch=next/master BOOT_IMAGE=/kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab' -initrd /kernel-tests/initrd/quantal-core-i386.cgz -m 256M -smp 2 -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio -net user,vlan=0,hostfwd=tcp::10661-:22 -net nic,vlan=1,model=e1000 -net user,vlan=1 -boot order=nc -no-reboot -watchdog i6300esb -drive file=/fs/LABEL=KVM/disk0-quantal-athens-6,media=disk,if=virtio -drive file=/fs/LABEL=KVM/disk1-quantal-athens-6,media=disk,if=virtio -drive file=/fs/LABEL=KVM/disk2-quantal-athens-6,media=disk,if=virtio -drive file=/fs/LABEL=KVM/disk3-quantal-athens-6,media=disk,if=virtio -drive file=/fs/LABEL=KVM/disk4-quantal-athens-6,media=disk,if=virtio -drive file=/fs/LABEL=KVM/disk5-quantal-athens-6,media=disk,if=virtio -pidfile /dev/shm/kboot/pid-quantal-athens-6 -serial file:/dev/shm/kboot/serial-quantal-athens-6 -daemonize -display none -monitor null > Wu, do you use the same compiler version for all the builds that crash > like this (I'm assuming the other email was this same commit)? Does a > different compiler make things work again? Good point. I'll try a different compiler. 
Thanks, Fengguang
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> Wu, do you use the same compiler version for all the builds that crash like this (I'm assuming the other email was this same commit)? Does a different compiler make things work again?

OK, so in a further email you say you use gcc-4.8.1 (which is actually newer than the one I used for most of the work although I do have it and tried it iirc).
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Sun, Oct 06, 2013 at 07:44:30AM +0800, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 0c44c2d0f459cd7e275242b72f500137c4fa834d
> Author: Peter Zijlstra
> Date:   Wed Sep 11 15:19:24 2013 +0200
>
>     x86: Use asm goto to implement better modify_and_test() functions
>
>     Linus suggested using asm goto to get rid of the typical SETcc + TEST
>     instruction pair -- which also clobbers an extra register -- for our
>     typical modify_and_test() functions.
>
>     Because asm goto doesn't allow output fields it has to include an
>     unconditional memory clobber when it changes a memory variable to force
>     a reload.
>
>     Luckily all atomic ops already imply a compiler barrier to go along
>     with their memory barrier semantics.
>
>     Suggested-by: Linus Torvalds
>     Signed-off-by: Peter Zijlstra
>     Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap142...@git.kernel.org
>     Signed-off-by: Ingo Molnar

Well that blows,.. Anybody got any clue as to where to start looking? I've
not actually seen anything like this on my own machines.

Wu, do you use the same compiler version for all the builds that crash like
this (I'm assuming the other email was this same commit)? Does a different
compiler make things work again?

> [3.336040] Write protecting the kernel read-only data: 2644k
> [3.336982] NX-protecting the kernel data: 6152k
> [3.375173] BUG: unable to handle kernel paging request at 00740060
> [3.376162] IP: [<81053fc4>] task_work_run+0x54/0xa0
> [3.376837] *pdpt = 072e1001 *pde =
> [3.377579] Oops: [#1] DEBUG_PAGEALLOC
> [3.378158] CPU: 0 PID: 85 Comm: hostname Not tainted 3.12.0-rc2-next-20130927-03100-ga0cf1ab #5
> [3.378206] task: 8730c000 ti: 8730e000 task.ti: 8730e000
> [3.378206] EIP: 0060:[<81053fc4>] EFLAGS: 00010206 CPU: 0
> [3.378206] EIP is at task_work_run+0x54/0xa0
> [3.378206] EAX: 00740060 EBX: 87309000 ECX: EDX: 00740060
> [3.378206] ESI: 8730c388 EDI: 8730c000 EBP: 8730ff40 ESP: 8730ff34
> [3.378206] DS: 007b ES: 007b FS: GS: SS: 0068
> [3.378206] CR0: 8005003b CR2: 00740060 CR3: 072d7000 CR4: 06b0
> [3.378206] Stack:
> [3.378206] 87308058 8730c000 8730ff8c 81039315 77675fb8 8105af7b
> [3.378206] 8730ffac 0001 6c0e41a5 61636f6c 736f686c 6f6c2e74 646c6163 8730c398
> [3.378206] 815fc8fe 81022f40 872f1880 8730c000 8730ffa4 81039a0a
> [3.378206] Call Trace:
> [3.378206] [<81039315>] do_exit+0x2a5/0x910
> [3.378206] [<8105af7b>] ? up_write+0x1b/0x30
> [3.378206] [<815fc8fe>] ? restore_all+0xf/0xf
> [3.378206] [<81022f40>] ? kvm_read_and_reset_pf_reason+0x40/0x40
> [3.378206] [<81039a0a>] do_group_exit+0x4a/0xa0
> [3.378206] [<81039a78>] SyS_exit_group+0x18/0x20
> [3.378206] [<815fcf50>] sysenter_do_call+0x12/0x3c
> [3.378206] Code: 36 31 c9 89 d0 0f b1 0e 39 c2 75 eb 85 d2 74 5c 8d b4 26 00 00 00 00 f3 90 8b 87 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 47 0c 04 74 c4 b9 f0 af
> [3.378206] EIP: [<81053fc4>] task_work_run+0x54/0xa0 SS:ESP 0068:8730ff34
> [3.378206] CR2: 00740060
> [3.394549] ---[ end trace a6f697254c888db0 ]---
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> Wu, do you use the same compiler version for all the builds that crash like
> this (I'm assuming the other email was this same commit)? Does a different
> compiler make things work again?

OK, so in a further email you say you use gcc-4.8.1 (which is actually newer
than the one I used for most of the work although I do have it and tried it
iirc).
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> On Sun, Oct 06, 2013 at 07:44:30AM +0800, Fengguang Wu wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 0c44c2d0f459cd7e275242b72f500137c4fa834d
> > Author: Peter Zijlstra <pet...@infradead.org>
> > Date:   Wed Sep 11 15:19:24 2013 +0200
> >
> >     x86: Use asm goto to implement better modify_and_test() functions
> >
> >     Linus suggested using asm goto to get rid of the typical SETcc + TEST
> >     instruction pair -- which also clobbers an extra register -- for our
> >     typical modify_and_test() functions.
> >
> >     Because asm goto doesn't allow output fields it has to include an
> >     unconditional memory clobber when it changes a memory variable to force
> >     a reload.
> >
> >     Luckily all atomic ops already imply a compiler barrier to go along
> >     with their memory barrier semantics.
> >
> >     Suggested-by: Linus Torvalds <torva...@linux-foundation.org>
> >     Signed-off-by: Peter Zijlstra <pet...@infradead.org>
> >     Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap142...@git.kernel.org
> >     Signed-off-by: Ingo Molnar <mi...@kernel.org>
>
> Well that blows,.. Anybody got any clue as to where to start looking? I've
> not actually seen anything like this on my own machines.
Perhaps it's related to one of

- the randconfig
- kvm
- gcc

At the end of the dmesg file, there is the qemu command line used to run the kernel:

    qemu-system-x86_64 -cpu kvm64 -enable-kvm \
        -kernel /kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab \
        -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw link=/kernel-tests/run-queue/kvm/i386-randconfig-j1-10052106/next:master/.vmlinuz-a0cf1abc25ac197dd97b857c0f6341066a8cb1cf-20131005211923-7-athens branch=next/master BOOT_IMAGE=/kernel/i386-randconfig-j1-10052106/a0cf1abc25ac197dd97b857c0f6341066a8cb1cf/vmlinuz-3.12.0-rc2-next-20130927-03100-ga0cf1ab' \
        -initrd /kernel-tests/initrd/quantal-core-i386.cgz \
        -m 256M -smp 2 \
        -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio \
        -net user,vlan=0,hostfwd=tcp::10661-:22 \
        -net nic,vlan=1,model=e1000 \
        -net user,vlan=1 \
        -boot order=nc -no-reboot -watchdog i6300esb \
        -drive file=/fs/LABEL=KVM/disk0-quantal-athens-6,media=disk,if=virtio \
        -drive file=/fs/LABEL=KVM/disk1-quantal-athens-6,media=disk,if=virtio \
        -drive file=/fs/LABEL=KVM/disk2-quantal-athens-6,media=disk,if=virtio \
        -drive file=/fs/LABEL=KVM/disk3-quantal-athens-6,media=disk,if=virtio \
        -drive file=/fs/LABEL=KVM/disk4-quantal-athens-6,media=disk,if=virtio \
        -drive file=/fs/LABEL=KVM/disk5-quantal-athens-6,media=disk,if=virtio \
        -pidfile /dev/shm/kboot/pid-quantal-athens-6 \
        -serial file:/dev/shm/kboot/serial-quantal-athens-6 \
        -daemonize -display none -monitor null

> Wu, do you use the same compiler version for all the builds that crash like
> this (I'm assuming the other email was this same commit)? Does a different
> compiler make things work again?

Good point. I'll try a different compiler.

Thanks,
Fengguang
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Mon, Oct 07, 2013 at 11:08:56AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 07, 2013 at 10:55:33AM +0200, Peter Zijlstra wrote:
> > Wu, do you use the same compiler version for all the builds that crash like
> > this (I'm assuming the other email was this same commit)? Does a different
> > compiler make things work again?
>
> OK, so in a further email you say you use gcc-4.8.1 (which is actually newer
> than the one I used for most of the work although I do have it and tried it
> iirc).

Right. And I've got the boot result for gcc-4.6.3: the problem is still
there. Here is the qemu cmdline and new dmesg.

=
cmd=(
        qemu-system-x86_64 -cpu kvm64 -enable-kvm
        -kernel "$1"
        -initrd /kernel-tests/initrd/quantal-core-i386.cgz
        -m 256M -smp 2
        -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
        -net user,vlan=0,hostfwd=tcp::10661-:22
        -net nic,vlan=1,model=e1000
        -net user,vlan=1
        -boot order=nc
        -no-reboot
        -watchdog i6300esb
        -serial file:/dev/shm/serial-0day
        -daemonize
        -display none
        -monitor null
)

"${cmd[@]}" -append 'hung_task_panic=1 rcutree.rcu_cpu_stall_timeout=100 log_buf_len=8M ignore_loglevel debug sched_debug apic=debug dynamic_printk sysrq_always_enabled panic=10 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw'
=
[0.00] Linux version 3.12.0-rc1-00081-g6bfa687 (wfg@bee) (gcc version 4.6.3 (Debian 4.6.3-1) ) #1 Mon Oct 7 19:18:22 CST 2013
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
[0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] Hypervisor detected: KVM
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
[0.00] Scanning 1 areas for low memory corruption
[0.00] initial memory mapped: [mem 0x-0x029f]
[0.00] Base memory trampoline at [8009b000] 9b000 size 16384
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] [mem 0x-0x000f] page 4k
[0.00] init_memory_mapping: [mem 0x0e40-0x0e5f]
[0.00] [mem 0x0e40-0x0e5f] page 4k
[0.00] BRK [0x020a7000, 0x020a7fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0c00-0x0e3f]
[0.00] [mem 0x0c00-0x0e3f] page 4k
[0.00] BRK [0x020a8000, 0x020a8fff] PGTABLE
[0.00] BRK [0x020a9000, 0x020a9fff] PGTABLE
[0.00] BRK [0x020aa000, 0x020aafff] PGTABLE
[0.00] BRK [0x020ab000, 0x020abfff] PGTABLE
[0.00] BRK [0x020ac000, 0x020acfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0010-0x0bff]
[0.00] [mem 0x0010-0x0bff] page 4k
[0.00] init_memory_mapping: [mem 0x0e60-0x0fffdfff]
[0.00] [mem 0x0e60-0x0fffdfff] page 4k
[0.00] log_buf_len: 8388608
[0.00] early log buf free: 128876(98%)
[0.00] RAMDISK: [mem 0x0e73f000-0x0ffe]
[0.00] ACPI: RSDP 000f16b0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 0fffe3f0 00034 (v01 BOCHS BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 0f80 00074 (v01 BOCHS BXPCFACP 0001 BXPC 0001)
[0.00] ACPI: DSDT 0fffe430 01137 (v01 BXPC BXDSDT 0001 INTL 20100528)
[0.00] ACPI: FACS 0f40 00040
[0.00] ACPI: SSDT 06a0 00899 (v01 BOCHS BXPCSSDT 0001 BXPC 0001)
[0.00] ACPI: APIC 05b0 00080 (v01 BOCHS BXPCAPIC 0001 BXPC 0001)
[0.00] ACPI: HPET 0570 00038 (v01 BOCHS BXPCHPET 0001 BXPC 0001)
[0.00] 255MB LOWMEM available.
[0.00] mapped low ram: 0 - 0fffe000
[0.00] low ram: 0 - 0fffe000
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:fffd001, boot clock
[0.00] Zone ranges:
[0.00] Normal [mem 0x1000-0x0fffdfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00] node 0: [mem 0x1000-0x0009efff]
[0.00] node 0: [mem 0x0010-0x0fffdfff]
[0.00] On node 0 totalpages: 65436
[0.00] Normal zone: 576 pages used for memmap
[0.00] Normal zone: 0 pages
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu <fengguang...@intel.com> wrote:
>
> I got the below dmesg and the first bad commit is
>
> commit 0c44c2d0f459 ("x86: Use asm goto to implement better
> modify_and_test() functions")

Hmm. I'm looking at the final version of that patch, and I'm not seeing
anything wrong. It may trigger a compiler bug - there aren't that many
asm goto users, and using them for the bitops adds a lot of new cases.

Your oops makes very little sense, it looks like task_work_run() just
called out to random crap, probably because the work was already
released, so work->func() ends up being bad.

I'm adding Oleg to the participants anyway, just in case there is some
race. The comment says that it can race with task_work_cancel() playing
with *work. Oleg, comments?

However, I don't see any actual bit-op code in task_work_run() itself, so
it's something else that got miscompiled and corrupted memory. In that
respect, the oops you have looks more like the oopses you got with
DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

That said, Fengguang, can you try two things just to check:

 - add "cc" to the clobbers list for the asm goto (technically it should
   be on the non-asm-goto as well, but we never had that, and maybe the
   fact that gcc always ends up testing a register afterwards hides the
   need for the clobber).

   So it would look like this in arch/x86/include/asm/rmwcc.h

    #define __GEN_RMWcc(fullop, var, cc, ...)                          \
    do {                                                                \
            asm volatile goto (fullop "; j" cc " %l[cc_label]"         \
                            : : "m" (var), ## __VA_ARGS__               \
                            : "memory", "cc" : cc_label);               \
            return 0;                                                   \
    cc_label:                                                           \
            return 1;                                                   \
    } while (0)

   (where that "cc" thing is new). I'm not sure if "cc" really matters on
   x86 at all (it didn't use to, long long ago), but maybe it does these
   days..

If that makes no difference, please just verify that the non-asm-goto
version works fine, by changing the #ifdef CC_HAVE_ASM_GOTO into a simple
#if 0 to disable the asm-goto version.

               Linus