Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, 18 Mar 2015, Andy Lutomirski wrote:

> > I posted the same problem to the opensuse kernel list shortly before turning
> > to LKML. There, Michal Kubecek noted:
> >
> > "I encountered a similar problem recently. The thing is, x86
> > specification says that on a double fault, RIP and RSP registers are
> > undefined, i.e. you not only can't expect them to contain values
> > corresponding to the first or second fault but you can't even expect
> > them to have any usable values at all. Unfortunately the kernel double
> > fault handler doesn't take this into account and does try to display
> > usual crash related information so that it itself does usually crash
> > when trying to show stack content (that's the show_stack_log_lvl()
> > crash).
>
> I think that's not entirely true. RIP is reliable for many classes of
> double faults, and we rely on that for espfix64. The fact that hpa
> was willing to write that code strongly suggests that Intel chips at
> least really do work that way.

A #DF won't deliberately clobber the instruction or the stack pointer.
It's only that it may happen at a stage where either or both original
pointers have been lost and replaced with new values already, possibly
making them inconsistent with the corresponding segment selectors too
(as they are not written at the same time).

This will only happen in certain degenerate corner cases such as e.g. a
problem with TSS (#TS) in the processing of a task gate used for taking
the original exception, where a part of the new context has already been
loaded before #DF resulted. Another case will be a stack segment limit
violation (#SS), where stack has been switched in the processing of a
trap or interrupt gate, preventing return information and error code
from being pushed for the original exception. These are not conditions
we'd normally observe in Linux.

In other cases both the original instruction and the original stack
pointer will have been retained.
Maciej -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
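
Whichever reading of the specification is right, the crash Michal describes comes from dumping stack contents through a saved RSP that is not usable. A minimal sketch of a guard follows, assuming a dump routine that knows the bounds of the stacks it may legitimately read. The names `stack_range` and `saved_sp_is_usable` are hypothetical and are not kernel API; this is an illustration of the range check, not the kernel's actual handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one known-good stack (not a kernel type). */
struct stack_range {
    uintptr_t base;   /* lowest valid address */
    uintptr_t size;   /* length in bytes */
};

/*
 * Return true only if sp is word aligned and [sp, sp + len) lies entirely
 * inside the given stack, so a dump loop could read len bytes from sp
 * without leaving the stack.
 */
bool saved_sp_is_usable(const struct stack_range *stk, uintptr_t sp,
                        uintptr_t len)
{
    assert(stk != NULL);

    if (sp % sizeof(uintptr_t) != 0)
        return false;
    if (sp < stk->base)
        return false;

    /* Compare offsets instead of end addresses to avoid overflow. */
    uintptr_t off = sp - stk->base;
    if (off > stk->size)
        return false;
    return len <= stk->size - off;
}
```

A handler built this way would print the registers unconditionally and skip the stack dump when the check fails, for example when the saved RSP points into user space.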
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 12:07 PM, Denys Vlasenko wrote:
> On 03/23/2015 07:38 PM, Andy Lutomirski wrote:
>>>     cmpq $__NR_syscall_max,%rax
>>>     ja ret_from_sys_call
>>>     movq %r10,%rcx
>>>     call *sys_call_table(,%rax,8) # XXX:rip relative
>>>     movq %rax,RAX-ARGOFFSET(%rsp)
>>> ret_from_sys_call:
>>>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>>>
>>>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>>>     LOCKDEP_SYS_EXIT
>>>     DISABLE_INTERRUPTS(CLBR_NONE)
>>>     TRACE_IRQS_OFF
>>>     ...
>>>     ...
>>> int_ret_from_sys_call_fixup:
>>>     FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>>>     jmp int_ret_from_sys_call
>>>     ...
>>>     ...
>>> GLOBAL(int_ret_from_sys_call)
>>>     DISABLE_INTERRUPTS(CLBR_NONE)
>>>     TRACE_IRQS_OFF
>>>
>>> You reverted that by moving this insn to be after first
>>> DISABLE_INTERRUPTS(CLBR_NONE).
>>>
>>> I also don't see how moving that check (even if it is wrong in a more
>>> benign way) can have such a drastic effect.
>>
>> I bet I see it. I have the advantage of having stared at KVM code and
>> cursed at it more recently than you, I suspect. KVM does awful, awful
>> things to CPU state, and, as an optimization, it allows kernel code to
>> run with CPU state that would be totally invalid in user mode. This
>> happens through a bunch of hooks, including this bit in __switch_to:
>>
>>     /*
>>      * Now maybe reload the debug registers and handle I/O bitmaps
>>      */
>>     if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>>                  task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>>         __switch_to_xtra(prev_p, next_p, tss);
>>
>> IOW, we *change* tif during context switches.
>>
>>
>> The race looks like this:
>>
>>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>>
>> --- preempted here, switch to KVM guest ---
>>
>> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
>> happen to be a *32-bit* KVM guest, perhaps?
>>
>> Now KVM schedules, calling __switch_to. __switch_to sets
>> _TIF_USER_RETURN_NOTIFY.
>
> Clear up to now...
>
>> We IRET back to the syscall exit code,
>
> So we end up being just after the "testl", right?
> We go into "int_ret_from_sys_call_fixup".

Nope, other way around. We saw no work bits set in testl, but one or
more of those bits was set when we're preempted and return. Now we
*don't* go to int_ret_from_sys_call_fixup.

I don't think that the resulting sysret itself is harmful, but I think
we're now running user code with some MSRs programmed wrong. The next
syscall could do bad things, such as failing to clear IF.

--Andy
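
The race described above is a check-then-act problem: the flags are tested while preemption is still possible, and the result is acted on later. A small user-space model in C can make the ordering visible. Everything here is an assumption made for illustration: `task_model`, `WORK_PENDING`, and the helper names do not exist in the kernel, and "preemption" is simulated by a function call placed where the window would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; the real _TIF_* values differ. */
#define WORK_PENDING 0x1u

struct task_model {
    unsigned int flags;   /* stands in for thread_info->flags */
    bool irqs_enabled;    /* stands in for EFLAGS.IF */
};

/*
 * Simulated preemption that sets a work flag, as the context switch does
 * in the scenario above. It can only happen while interrupts are enabled.
 */
static void preempt_and_set_work(struct task_model *t)
{
    if (t->irqs_enabled)
        t->flags |= WORK_PENDING;
}

/* Ordering after commit 96b6352c1271: test first, disable interrupts later. */
bool exit_test_then_disable(struct task_model *t)
{
    assert(t != NULL);
    bool slow_path = (t->flags & WORK_PENDING) != 0;
    preempt_and_set_work(t);   /* window between the test and the disable */
    t->irqs_enabled = false;
    return slow_path;          /* false although work is now pending */
}

/* Ordering with the test inside the interrupt-disabled region. */
bool exit_disable_then_test(struct task_model *t)
{
    assert(t != NULL);
    preempt_and_set_work(t);   /* preemption before the disable is harmless */
    t->irqs_enabled = false;
    bool slow_path = (t->flags & WORK_PENDING) != 0;
    preempt_and_set_work(t);   /* no effect: interrupts are off */
    return slow_path;
}
```

In the first ordering the function reports the fast path while the flag is set, which corresponds to returning to user mode with pending work ignored. In the second ordering no flag can change between the test and the return.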
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/23/2015 07:38 PM, Andy Lutomirski wrote:
>>     cmpq $__NR_syscall_max,%rax
>>     ja ret_from_sys_call
>>     movq %r10,%rcx
>>     call *sys_call_table(,%rax,8) # XXX:rip relative
>>     movq %rax,RAX-ARGOFFSET(%rsp)
>> ret_from_sys_call:
>>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>>
>>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>>     LOCKDEP_SYS_EXIT
>>     DISABLE_INTERRUPTS(CLBR_NONE)
>>     TRACE_IRQS_OFF
>>     ...
>>     ...
>> int_ret_from_sys_call_fixup:
>>     FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>>     jmp int_ret_from_sys_call
>>     ...
>>     ...
>> GLOBAL(int_ret_from_sys_call)
>>     DISABLE_INTERRUPTS(CLBR_NONE)
>>     TRACE_IRQS_OFF
>>
>> You reverted that by moving this insn to be after first
>> DISABLE_INTERRUPTS(CLBR_NONE).
>>
>> I also don't see how moving that check (even if it is wrong in a more
>> benign way) can have such a drastic effect.
>
> I bet I see it. I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect. KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode. This
> happens through a bunch of hooks, including this bit in __switch_to:
>
>     /*
>      * Now maybe reload the debug registers and handle I/O bitmaps
>      */
>     if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>                  task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>         __switch_to_xtra(prev_p, next_p, tss);
>
> IOW, we *change* tif during context switches.
>
>
> The race looks like this:
>
>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>
> --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?
>
> Now KVM schedules, calling __switch_to. __switch_to sets
> _TIF_USER_RETURN_NOTIFY.

Clear up to now...

> We IRET back to the syscall exit code,

So we end up being just after the "testl", right?
We go into "int_ret_from_sys_call_fixup".
We FIXUP_TOP_OF_STACK - now iret frame contains correct values.
Then we jump to "int_ret_from_sys_call".

> turn off interrupts, and do sysret. We are now screwed.

I don't understand. Where exactly it would go wrong?

On sysret, rsp would be restored from PER_CPU(old_rsp), right?
We'd end up in *userspace* with userspace rsp.

More to it. Since we FIXUPed the iret frame, it does not even matter
how we'll exit to userspace. Either sysret or iret would work.
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 11:48:42 -0700,
Andy Lutomirski wrote:
>
> On Mon, Mar 23, 2015 at 11:38 AM, Andy Lutomirski wrote:
> > [...]
> > Takashi, can you re-send your patch so we can review it for real in
> > light of this race?
>
> Never mind, I'm testing a slightly fancier patch.

OK, I'll wait for your test patch.


thanks,

Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 11:38:30 -0700,
Andy Lutomirski wrote:
>
> On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko wrote:
> > On 03/23/2015 02:22 PM, Takashi Iwai wrote:
> >> At Mon, 23 Mar 2015 10:35:41 +0100,
> >> Takashi Iwai wrote:
> >>>
> >>> At Mon, 23 Mar 2015 10:02:52 +0100,
> >>> Takashi Iwai wrote:
> >>>>
> >>>> At Fri, 20 Mar 2015 19:16:53 +0100,
> >>>> Denys Vlasenko wrote:
> >>
> >> I'm really puzzled now. We have a few pieces of information:
> >>
> >> - git bisection pointed the commit 96b6352c1271:
> >>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
> >>   and reverting this "fixes" the problem indeed. Even just moving two
> >>   lines
> >>     LOCKDEP_SYS_EXIT
> >>     DISABLE_INTERRUPTS(CLBR_NONE)
> >>   at the beginning of ret_from_sys_call already fixes. (Of course I
> >>   can't prove the fix but it stabilizes for a day without crash while
> >>   usually I hit the bug in 10 minutes in full test running.)
> >
> > The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from
> > interrupt-disabled region to interrupt-enabled:
> >
> >     cmpq $__NR_syscall_max,%rax
> >     ja ret_from_sys_call
> >     movq %r10,%rcx
> >     call *sys_call_table(,%rax,8) # XXX:rip relative
> >     movq %rax,RAX-ARGOFFSET(%rsp)
> > ret_from_sys_call:
> >     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >
> >     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
> >     LOCKDEP_SYS_EXIT
> >     DISABLE_INTERRUPTS(CLBR_NONE)
> >     TRACE_IRQS_OFF
> >     ...
> >     ...
> > int_ret_from_sys_call_fixup:
> >     FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
> >     jmp int_ret_from_sys_call
> >     ...
> >     ...
> > GLOBAL(int_ret_from_sys_call)
> >     DISABLE_INTERRUPTS(CLBR_NONE)
> >     TRACE_IRQS_OFF
> >
> > You reverted that by moving this insn to be after first
> > DISABLE_INTERRUPTS(CLBR_NONE).
> >
> > I also don't see how moving that check (even if it is wrong in a more
> > benign way) can have such a drastic effect.
>
> I bet I see it. I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect. KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode. This
> happens through a bunch of hooks, including this bit in __switch_to:
>
>     /*
>      * Now maybe reload the debug registers and handle I/O bitmaps
>      */
>     if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>                  task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>         __switch_to_xtra(prev_p, next_p, tss);
>
> IOW, we *change* tif during context switches.
>
>
> The race looks like this:
>
>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>
> --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?
>
> Now KVM schedules, calling __switch_to. __switch_to sets
> _TIF_USER_RETURN_NOTIFY. We IRET back to the syscall exit code, turn
> off interrupts, and do sysret. We are now screwed.

Thanks for enlightening! That looks like a feasible scenario.
(I tested only a 64bit KVM guest, BTW.)

> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
>
> FWIW, this will affect things other than KVM. For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
>
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

The patch below worked. I'll double-check tomorrow whether this
really cures reliably.


thanks,

Takashi

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1d74d161687c..5340ac7f88a9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -364,12 +364,12 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-    testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-    jnz int_ret_from_sys_call_fixup /* Go the the slow path */
-
     LOCKDEP_SYS_EXIT
     DISABLE_INTERRUPTS(CLBR_NONE)
     TRACE_IRQS_OFF
+    testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+    jnz int_ret_from_sys_call_fixup /* Go the the slow path */
+
     CFI_REMEMBER_STATE
     /*
      * sysretq will re-enable interrupts:
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 23.03.2015 at 19:38, Andy Lutomirski wrote:
> I bet I see it. I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect. KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode. This
> happens through a bunch of hooks, including this bit in __switch_to:
>
>     /*
>      * Now maybe reload the debug registers and handle I/O bitmaps
>      */
>     if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>                  task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>         __switch_to_xtra(prev_p, next_p, tss);
>
> IOW, we *change* tif during context switches.
>
>
> The race looks like this:
>
>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>     jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>
> --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?

not in my case (penryn CPU), there it was 64bit guests.

> Now KVM schedules, calling __switch_to. __switch_to sets
> _TIF_USER_RETURN_NOTIFY. We IRET back to the syscall exit code, turn
> off interrupts, and do sysret. We are now screwed.
>
> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
>
> FWIW, this will affect things other than KVM. For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
>
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 11:38 AM, Andy Lutomirski wrote:
> [...]
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

Never mind, I'm testing a slightly fancier patch.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 18:46:45 +0100,
Denys Vlasenko wrote:
>
> On 03/23/2015 06:18 PM, Takashi Iwai wrote:
> > At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote:
> >>>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
> >>>> succeeded to get a better message:
> >>>>
> >>>> kvm: zapping shadow pages for mmio generation wraparound
> >>>> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
> >>>> Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
> >>>> RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
> >>>> PANIC: double fault, error_code: 0x0
> >>>> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
> >>>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> >>>> task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3
> >>>> RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
> >>>> RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
> >>>> RAX: RBX: 0005 RCX: c101
> >>>> RDX: RSI: 0001 RDI: 7ffd22c23ef0
> >>
> >> FYI: the disassembly of netlink_attachskb (from "Code:" line) is:
> >>
> >>    0: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
> >>    5: 55                    push   %rbp
> >>    6: 48 89 e5              mov    %rsp,%rbp
> >>    9: 41 56                 push   %r14
> >>    b: 41 55                 push   %r13
> >>    d: 49 89 d5              mov    %rdx,%r13
> >>   10: 41 54                 push   %r12
> >>   12: 49 89 f4              mov    %rsi,%r12
> >>   15: 53                    push   %rbx
> >>   16: 48 89 fb              mov    %rdi,%rbx
> >>   19: 48 83 ec 30           sub    $0x30,%rsp
> >>   1d: 8b 87 68 01 00 00     mov    0x168(%rdi),%eax
> >>       ^
> >>   23: 39 87 9c 01 00 00     cmp    %eax,0x19c(%rdi)
> >>   29: 7c 25                 jl     50 <_start+0x50>
> >>   2b: 48 8b 87 88 04 00 00  mov    0x488(%rdi),%rax
> >>
> >> The ^ instruction is the one which faults. Since you said it
> >> consistently happens here, this should be a page fault, not an external
> >> hardware interrupt.
> >>
> >> The code corresponds to the comparison in if():
> >>
> >> int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
> >>                       long *timeo, struct sock *ssk)
> >> {
> >>         struct netlink_sock *nlk;
> >>
> >>         nlk = nlk_sk(sk);
> >>
> >>         if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
> >>
> >>> - Another piece is that the bug happens only when a KVM is running.
> >>>   The kernel ran without problem over days with similar tasks
> >>>   (compiling kernel, etc) when no KVM was used.
> >>
> >> Conceivably virtualization support in CPUs can have nasty erratas.
> >> However, you and other reporter have different CPUs - yours
> >> is Ivy Bridge, his CPU is a Penryn.
> >>
> >> I don't see the path how KVM helps to trigger this.
> >>
> >>> - And now I get the trace as above, pointing netlink_attachskb().
> >>>
> >>> I have a difficulty to imagine how all these pieces fit into a single
> >>> picture. Is something already screwed up before that?
> >>
> >> Well, a tiny bit more info will be seen if you'd change %rdi
> >> to, say, %r15 in these two lines in my patch:
> >>
> >>         /* Save bogus RSP value */
> >>         movq %rsp,%rdi
> >> ...
> >>         push %rdi /* pt_regs->sp */
> >>
> >> Then original %rdi will be visible in the crash message.
> >
> > OK, here we go.
> >
> > kvm: zapping shadow pages for mmio generation wraparound
> > kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
> > Exception on user stack 7fff1d7e5ec0: RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002
> > RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
> > PANIC: double fault, error_code: 0x0
> > CPU: 5 PID: 14285 Comm: fixdep Tainted: GW 4.0.0-rc5-debug1+ #3
> > Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> > task: 88020ba1c690 ti: 880206ba4000 task.ti: 880206ba4000
> > RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
> > RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002
> > RAX: RBX: RCX: c101
> > RDX: RSI: 1ebb RDI:
>
> Thanks for your testing. So the %rdi was NULL... not very informative.
>
> Notice that your every crash is preceded by
>
> kvm: zapping shadow pages for mmio generation wraparound
> kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
>
> This hints that kvm _is_ somehow responsible.

It's likely irrelevant, as this appears at the time a VM starting, not
at the crash time. I've got this message all the time. Sorry for
confusing.


Takashi
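
The two panic reports quoted above can be sanity-checked with a little bit arithmetic. The sketch below decodes the reported EFLAGS values (00010006 and 00010002) and tests whether the reported RSP values fall in the user half of the address space. The bit positions are the architectural ones; the macro and function names are local to this example, and the user/kernel split assumes 48-bit virtual addresses (4-level paging), which is an assumption about the reporter's configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions; the names are local to this sketch. */
#define FLAGS_FIXED (1u << 1)    /* reserved, always reads as 1 */
#define FLAGS_PF    (1u << 2)    /* parity flag */
#define FLAGS_IF    (1u << 9)    /* interrupt enable flag */
#define FLAGS_RF    (1u << 16)   /* resume flag */

/* True if maskable interrupts were enabled in the saved flags. */
bool irqs_were_enabled(uint32_t eflags)
{
    return (eflags & FLAGS_IF) != 0;
}

/*
 * True if addr lies in the lower canonical half, i.e. user space,
 * assuming 48-bit virtual addresses.
 */
bool is_user_half_address(uint64_t addr)
{
    return addr < (UINT64_C(1) << 47);
}
```

Under these assumptions, 0x00010006 decomposes into RF, PF and the fixed bit, and 0x00010002 into RF and the fixed bit. IF is clear in both, and both RSP values (7ffd22c23f28 and 7fff1d7e5ef8) are user addresses, while the reported RIP is inside netlink_attachskb, which is kernel code.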
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko wrote: > On 03/23/2015 02:22 PM, Takashi Iwai wrote: >> At Mon, 23 Mar 2015 10:35:41 +0100, >> Takashi Iwai wrote: >>> >>> At Mon, 23 Mar 2015 10:02:52 +0100, >>> Takashi Iwai wrote: At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote: >> I'm really puzzled now. We have a few pieces of information: >> >> - git bisection pointed the commit 96b6352c1271: >> x86_64, entry: Remove the syscall exit audit and schedule optimizations >> and reverting this "fixes" the problem indeed. Even just moving two >> lines >> LOCKDEP_SYS_EXIT >> DISABLE_INTERRUPTS(CLBR_NONE) >> at the beginning of ret_from_sys_call already fixes. (Of course I >> can't prove the fix but it stabilizes for a day without crash while >> usually I hit the bug in 10 minutes in full test running.) > > The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from > interrupt-disabled region to interrupt-enabled: > > cmpq $__NR_syscall_max,%rax > ja ret_from_sys_call > movq %r10,%rcx > call *sys_call_table(,%rax,8) # XXX:rip relative > movq %rax,RAX-ARGOFFSET(%rsp) > ret_from_sys_call: > testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) > > jnz int_ret_from_sys_call_fixup /* Go the the slow path */ > LOCKDEP_SYS_EXIT > DISABLE_INTERRUPTS(CLBR_NONE) > TRACE_IRQS_OFF > ... > ... > int_ret_from_sys_call_fixup: > FIXUP_TOP_OF_STACK %r11, -ARGOFFSET > jmp int_ret_from_sys_call > ... > ... > GLOBAL(int_ret_from_sys_call) > DISABLE_INTERRUPTS(CLBR_NONE) > TRACE_IRQS_OFF > > You reverted that by moving this insn to be after first > DISABLE_INTERRUPTS(CLBR_NONE). > > I also don't see how moving that check (even if it is wrong in a more > benign way) can have such a drastic effect. I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. KVM does awful, awful things to CPU state, and, as an optimization, it allows kernel code to run with CPU state that would be totally invalid in user mode. 
This happens through a bunch of hooks, including this bit in __switch_to: /* * Now maybe reload the debug registers and handle I/O bitmaps */ if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT || task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV)) __switch_to_xtra(prev_p, next_p, tss); IOW, we *change* tif during context switches. The race looks like this: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP) jnz int_ret_from_sys_call_fixup/* Go the the slow path */ --- preempted here, switch to KVM guest --- KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't happen to be a *32-bit* KVM guest, perhaps? Now KVM schedules, calling __switch_to. __switch_to sets _TIF_USER_RETURN_NOTIFY. We IRET back to the syscall exit code, turn off interrupts, and do sysret. We are now screwed. I don't know why this manifests in this particular failure, but any number of terrible things could happen now. FWIW, this will affect things other than KVM. For example, SIGKILL sent while a process is sleeping in that two-instruction window won't work. Takashi, can you re-send your patch so we can review it for real in light of this race? > > > Shot-in-the-dark idea. At this code revision we did not yet > store user's %rsp in pt_regs->sp, we used a fixup to populate it: > > .macro FIXUP_TOP_OF_STACK tmp offset=0 > movq PER_CPU_VAR(old_rsp),\tmp > movq \tmp,RSP+\offset(%rsp) > > (There are pending patches to fix this mess). > > If an interrupt interrupting *kernel code* would go into a code path > which does FIXUP_TOP_OF_STACK, it'd overwrite the correct saved %rsp > with a user's one. The iret from interrupt would work, > but the resulting CPU state would be inconsistent. But I don't see > such a code path from interrupts to FIXUP_TOP_OF_STACK... I don't buy it. Anything that does that is so completely broken that I'd hope we'd have found it long ago. 
--Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
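
The window described in the message above is a check-then-act race: the work flags are tested while interrupts are still enabled, so a context switch can set a flag after the test but before the fast return. A minimal user-space sketch of that ordering problem follows. It is a model under stated assumptions, not kernel code: `struct task_model`, `TIF_MODEL_NOTIFY`, `preempt_hook_fn`, `hook_set_notify`, `model_syscall_exit` and `missed_work` are hypothetical names invented for illustration, and preemption is modelled as a callback that runs at the only point where it is still allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work flag such as _TIF_USER_RETURN_NOTIFY. */
#define TIF_MODEL_NOTIFY 0x1u

/* Models the per-task flags word that the exit path tests. */
struct task_model {
    unsigned int flags;
};

enum exit_path { EXIT_FAST_SYSRET, EXIT_SLOW_PATH };

/* Models whatever runs while the task is preempted (a context switch). */
typedef void (*preempt_hook_fn)(struct task_model *t);

/* Models a context switch setting a work flag behind the task's back. */
static void hook_set_notify(struct task_model *t)
{
    t->flags |= TIF_MODEL_NOTIFY;
}

/*
 * Model of the exit decision.
 *
 * irqs_off_before_test == true:  preemption can only happen before the
 *   test, so the test sees every flag that was set.
 * irqs_off_before_test == false: the test runs first and preemption can
 *   still happen afterwards, so the decision may be stale.
 */
static enum exit_path model_syscall_exit(struct task_model *t,
                                         bool irqs_off_before_test,
                                         preempt_hook_fn hook)
{
    bool work_pending;

    if (irqs_off_before_test) {
        if (hook != NULL)
            hook(t);                    /* preempted before the test */
        work_pending = (t->flags & TIF_MODEL_NOTIFY) != 0;
    } else {
        work_pending = (t->flags & TIF_MODEL_NOTIFY) != 0;
        if (hook != NULL)
            hook(t);                    /* preempted inside the window */
    }
    return work_pending ? EXIT_SLOW_PATH : EXIT_FAST_SYSRET;
}

/* True if the fast path was taken although work was pending. */
static bool missed_work(const struct task_model *t, enum exit_path p)
{
    return p == EXIT_FAST_SYSRET && (t->flags & TIF_MODEL_NOTIFY) != 0;
}
```

Calling `model_syscall_exit(&t, false, hook_set_notify)` on a task with no flags set takes the fast path while the flag is pending, which is the bad case; passing `true` for the second argument makes the same preemption visible to the test and selects the slow path.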
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/23/2015 06:18 PM, Takashi Iwai wrote:
> At Mon, 23 Mar 2015 17:07:15 +0100,
> Denys Vlasenko wrote:
>>>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
>>>> succeeded to get a better message:
>>>>
>>>> kvm: zapping shadow pages for mmio generation wraparound
>>>> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
>>>> Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28
>>>> EFLAGS: 00010006
>>>> RIP: 0010:[] []
>>>> netlink_attachskb+0x1d/0x1d0
>>>> PANIC: double fault, error_code: 0x0
>>>> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
>>>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>>>> task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3
>>>> RIP: 0010:[] []
>>>> netlink_attachskb+0x1d/0x1d0
>>>> RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
>>>> RAX: RBX: 0005 RCX: c101
>>>> RDX: RSI: 0001 RDI: 7ffd22c23ef0
>>
>> FYI: the disassembly of netlink_attachskb (from "Code:" line) is:
>>
>>    0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>>    5:   55                      push   %rbp
>>    6:   48 89 e5                mov    %rsp,%rbp
>>    9:   41 56                   push   %r14
>>    b:   41 55                   push   %r13
>>    d:   49 89 d5                mov    %rdx,%r13
>>   10:   41 54                   push   %r12
>>   12:   49 89 f4                mov    %rsi,%r12
>>   15:   53                      push   %rbx
>>   16:   48 89 fb                mov    %rdi,%rbx
>>   19:   48 83 ec 30             sub    $0x30,%rsp
>>   1d:   8b 87 68 01 00 00       mov    0x168(%rdi),%eax
>>         ^
>>   23:   39 87 9c 01 00 00       cmp    %eax,0x19c(%rdi)
>>   29:   7c 25                   jl     50 <_start+0x50>
>>   2b:   48 8b 87 88 04 00 00    mov    0x488(%rdi),%rax
>>
>> The ^ instruction is the one which faults. Since you said it
>> consistently happens here, this should be a page fault, not an external
>> hardware interrupt.
>>
>> The code corresponds to the comparison in if():
>>
>> int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
>>                       long *timeo, struct sock *ssk)
>> {
>>         struct netlink_sock *nlk;
>>
>>         nlk = nlk_sk(sk);
>>
>>         if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
>>
>>> - Another piece is that the bug happens only when a KVM is running.
>>>   The kernel ran without problem over days with similar tasks
>>>   (compiling kernel, etc) when no KVM was used.
>>
>> Conceivably virtualization support in CPUs can have nasty erratas.
>> However, you and other reporter have different CPUs - yours
>> is Ivy Bridge, his CPU is a Penryn.
>>
>> I don't see the path how KVM helps to trigger this.
>>
>>> - And now I get the trace as above, pointing netlink_attachskb().
>>>
>>> I have a difficulty to imagine how all these pieces fit into a single
>>> picture.  Is something already screwed up before that?
>>
>> Well, a tiny bit more info will be seen if you'd change %rdi
>> to, say, %r15 in these two lines in my patch:
>>
>>         /* Save bogus RSP value */
>>         movq    %rsp,%rdi
>>         ...
>>         push    %rdi            /* pt_regs->sp */
>>
>> Then original %rdi will be visible in the crash message.
>
> OK, here we go.
>
> kvm: zapping shadow pages for mmio generation wraparound
> kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
> Exception on user stack 7fff1d7e5ec0: RSP: 0018:7fff1d7e5ef8
> EFLAGS: 00010002
> RIP: 0010:[] []
> netlink_attachskb+0x1d/0x1d0
> PANIC: double fault, error_code: 0x0
> CPU: 5 PID: 14285 Comm: fixdep Tainted: GW 4.0.0-rc5-debug1+ #3
> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> task: 88020ba1c690 ti: 880206ba4000 task.ti: 880206ba4000
> RIP: 0010:[] []
> netlink_attachskb+0x1d/0x1d0
> RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002
> RAX: RBX: RCX: c101
> RDX: RSI: 1ebb RDI:

Thanks for your testing.

So the %rdi was NULL... not very informative.

Notice that your every crash is preceded by

kvm: zapping shadow pages for mmio generation wraparound
kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x

This hints that kvm _is_ somehow responsible.
I'm no expert on kvm, I need to take a look around that code...
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote: > > On 03/23/2015 02:22 PM, Takashi Iwai wrote: > > At Mon, 23 Mar 2015 10:35:41 +0100, > > Takashi Iwai wrote: > >> > >> At Mon, 23 Mar 2015 10:02:52 +0100, > >> Takashi Iwai wrote: > >>> > >>> At Fri, 20 Mar 2015 19:16:53 +0100, > >>> Denys Vlasenko wrote: > Takashi, are you willing to reproduce the panic one more time, > with this patch? I would like to see whether oops messages > are more informative with it. > >>> > >>> It can't be applied to 4.0-rc5, unfortunately. > >>> > >>> arch/x86/kernel/entry_64.S: Assembler messages: > >>> arch/x86/kernel/entry_64.S:1725: Error: no such instruction: > >>> `alloc_pt_gpregs_on_stack' > >>> arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* > >>> sections) for `+' > >>> scripts/Makefile.build:294: recipe for target > >>> 'arch/x86/kernel/entry_64.o' failed > >> > >> I pulled tip tree on top of 4.0-rc5, built with your patch and now > >> succeeded to get a better message: > >> > >> kvm: zapping shadow pages for mmio generation wraparound > >> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x > >> Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 > >> EFLAGS: 00010006 > >> RIP: 0010:[] [] > >> netlink_attachskb+0x1d/0x1d0 > >> PANIC: double fault, error_code: 0x0 > >> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 > >> Hardware name: Dell Inc. 
OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 > >> task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3 > >> RIP: 0010:[] [] > >> netlink_attachskb+0x1d/0x1d0 > >> RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 > >> RAX: RBX: 0005 RCX: c101 > >> RDX: RSI: 0001 RDI: 7ffd22c23ef0 > >> RBP: 0ea7 R08: 1ea7 R09: > >> R10: 0309dbf8 R11: 0246 R12: 0001 > >> R13: R14: 03026e40 R15: 0309cd50 > >> FS: 7f89c83c2800() GS:88021d24() > >> knlGS: > >> CS: 0010 DS: ES: CR0: 80050033 > >> CR2: 016d CR3: d90a CR4: 001427e0 > >> Stack: > >> 0ea7 03099c10 0ea7 > >> 0ea7 0001 03099c10 0ea7 > >> 00c84696 03099c88 7f0122c23fb8 0302f610 > >> Call Trace: > >> > >> Code: > >> 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 > >> 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 > >> 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00 > >> Kernel panic - not syncing: Machine halted. > >> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 > >> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 > >> 8800d1b33e28 816f80d2 > >> 81a22f81 8800d1b33ea8 816f2358 58d7 > >> 0008 8800d1b33eb8 8800d1b33e58 8800d1b33ea8 > >> Call Trace: > >> [] dump_stack+0x4c/0x6e > >> [] panic+0xc0/0x1f3 > >> [] df_debug+0x35/0x40 > >> [] do_double_fault+0x87/0x100 > >> [] do_userpsace_rsp_in_kernel+0x107/0x140 > >> [] ? netlink_attachskb+0x1d/0x1d0 > >> [] userpsace_rsp_in_kernel+0x36/0x40 > >> [] ? netlink_attachskb+0x1d/0x1d0 > >> > >> > >> So, it seems hitting in netlink_attachskb(). > >> I'd need to check whether this consistently hits there or just at > >> random. > > > > I managed to reproduce the bug two more times, and all three show the > > very same stack trace like the above. So, it's well reproducible. 
> > FYI: the disassembly of netlink_attachskb (from "Code:" line) is: > >0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >5: 55 push %rbp >6: 48 89 e5mov%rsp,%rbp >9: 41 56 push %r14 >b: 41 55 push %r13 >d: 49 89 d5mov%rdx,%r13 > 10: 41 54 push %r12 > 12: 49 89 f4mov%rsi,%r12 > 15: 53 push %rbx > 16: 48 89 fbmov%rdi,%rbx > 19: 48 83 ec 30 sub$0x30,%rsp > 1d: 8b 87 68 01 00 00 mov0x168(%rdi),%eax > ^ > 23: 39 87 9c 01 00 00 cmp%eax,0x19c(%rdi) > 29: 7c 25 jl 50 <_start+0x50> > 2b: 48 8b 87 88 04 00 00mov0x488(%rdi),%rax > > The ^ instruction is the one which faults. Since you said it > consistently happens here, this should be a page fault, not an external > hardware interrupt. > > The code corresponds to the comparison in if(): > > int netlink_attachskb(struct sock *sk, struct
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/23/2015 02:22 PM, Takashi Iwai wrote:
> At Mon, 23 Mar 2015 10:35:41 +0100,
> Takashi Iwai wrote:
>>
>> At Mon, 23 Mar 2015 10:02:52 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Fri, 20 Mar 2015 19:16:53 +0100,
>>> Denys Vlasenko wrote:
>>>> Takashi, are you willing to reproduce the panic one more time,
>>>> with this patch? I would like to see whether oops messages
>>>> are more informative with it.
>>>
>>> It can't be applied to 4.0-rc5, unfortunately.
>>>
>>> arch/x86/kernel/entry_64.S: Assembler messages:
>>> arch/x86/kernel/entry_64.S:1725: Error: no such instruction:
>>> `alloc_pt_gpregs_on_stack'
>>> arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND*
>>> sections) for `+'
>>> scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o'
>>> failed
>>
>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
>> succeeded to get a better message:
>>
>> kvm: zapping shadow pages for mmio generation wraparound
>> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
>> Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28
>> EFLAGS: 00010006
>> RIP: 0010:[] []
>> netlink_attachskb+0x1d/0x1d0
>> PANIC: double fault, error_code: 0x0
>> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>> task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3
>> RIP: 0010:[] []
>> netlink_attachskb+0x1d/0x1d0
>> RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
>> RAX: RBX: 0005 RCX: c101
>> RDX: RSI: 0001 RDI: 7ffd22c23ef0
>> RBP: 0ea7 R08: 1ea7 R09:
>> R10: 0309dbf8 R11: 0246 R12: 0001
>> R13: R14: 03026e40 R15: 0309cd50
>> FS: 7f89c83c2800() GS:88021d24() knlGS:
>> CS: 0010 DS: ES: CR0: 80050033
>> CR2: 016d CR3: d90a CR4: 001427e0
>> Stack:
>> 0ea7 03099c10 0ea7
>> 0ea7 0001 03099c10 0ea7
>> 00c84696 03099c88 7f0122c23fb8 0302f610
>> Call Trace:
>>
>> Code:
>> 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56
>> 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 00 39
>> 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00
>> Kernel panic - not syncing: Machine halted.
>> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>> 8800d1b33e28 816f80d2
>> 81a22f81 8800d1b33ea8 816f2358 58d7
>> 0008 8800d1b33eb8 8800d1b33e58 8800d1b33ea8
>> Call Trace:
>> [] dump_stack+0x4c/0x6e
>> [] panic+0xc0/0x1f3
>> [] df_debug+0x35/0x40
>> [] do_double_fault+0x87/0x100
>> [] do_userpsace_rsp_in_kernel+0x107/0x140
>> [] ? netlink_attachskb+0x1d/0x1d0
>> [] userpsace_rsp_in_kernel+0x36/0x40
>> [] ? netlink_attachskb+0x1d/0x1d0
>>
>>
>> So, it seems hitting in netlink_attachskb().
>> I'd need to check whether this consistently hits there or just at
>> random.
>
> I managed to reproduce the bug two more times, and all three show the
> very same stack trace like the above.  So, it's well reproducible.

FYI: the disassembly of netlink_attachskb (from "Code:" line) is:

   0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   5:   55                      push   %rbp
   6:   48 89 e5                mov    %rsp,%rbp
   9:   41 56                   push   %r14
   b:   41 55                   push   %r13
   d:   49 89 d5                mov    %rdx,%r13
  10:   41 54                   push   %r12
  12:   49 89 f4                mov    %rsi,%r12
  15:   53                      push   %rbx
  16:   48 89 fb                mov    %rdi,%rbx
  19:   48 83 ec 30             sub    $0x30,%rsp
  1d:   8b 87 68 01 00 00       mov    0x168(%rdi),%eax
        ^
  23:   39 87 9c 01 00 00       cmp    %eax,0x19c(%rdi)
  29:   7c 25                   jl     50 <_start+0x50>
  2b:   48 8b 87 88 04 00 00    mov    0x488(%rdi),%rax

The ^ instruction is the one which faults. Since you said it
consistently happens here, this should be a page fault, not an external
hardware interrupt.

The code corresponds to the comparison in if():

int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
                      long *timeo, struct sock *ssk)
{
        struct netlink_sock *nlk;

        nlk = nlk_sk(sk);

        if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||

%rdi (which is 1st param, "struct sock *sk") is 7ffd22c23ef0 (userspace address), but it's
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 10:35:41 +0100,
Takashi Iwai wrote:
>
> At Mon, 23 Mar 2015 10:02:52 +0100,
> Takashi Iwai wrote:
> >
> > At Fri, 20 Mar 2015 19:16:53 +0100,
> > Denys Vlasenko wrote:
> > > Takashi, are you willing to reproduce the panic one more time,
> > > with this patch? I would like to see whether oops messages
> > > are more informative with it.
> >
> > It can't be applied to 4.0-rc5, unfortunately.
> >
> > arch/x86/kernel/entry_64.S: Assembler messages:
> > arch/x86/kernel/entry_64.S:1725: Error: no such instruction:
> > `alloc_pt_gpregs_on_stack'
> > arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND*
> > sections) for `+'
> > scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o'
> > failed
>
> I pulled tip tree on top of 4.0-rc5, built with your patch and now
> succeeded to get a better message:
>
> kvm: zapping shadow pages for mmio generation wraparound
> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
> Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28
> EFLAGS: 00010006
> RIP: 0010:[] []
> netlink_attachskb+0x1d/0x1d0
> PANIC: double fault, error_code: 0x0
> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3
> RIP: 0010:[] []
> netlink_attachskb+0x1d/0x1d0
> RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
> RAX: RBX: 0005 RCX: c101
> RDX: RSI: 0001 RDI: 7ffd22c23ef0
> RBP: 0ea7 R08: 1ea7 R09:
> R10: 0309dbf8 R11: 0246 R12: 0001
> R13: R14: 03026e40 R15: 0309cd50
> FS: 7f89c83c2800() GS:88021d24() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 016d CR3: d90a CR4: 001427e0
> Stack:
> 0ea7 03099c10 0ea7
> 0ea7 0001 03099c10 0ea7
> 00c84696 03099c88 7f0122c23fb8 0302f610
> Call Trace:
>
> Code:
> 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56
> 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 00 39
> 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00
> Kernel panic - not syncing: Machine halted.
> CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> 8800d1b33e28 816f80d2
> 81a22f81 8800d1b33ea8 816f2358 58d7
> 0008 8800d1b33eb8 8800d1b33e58 8800d1b33ea8
> Call Trace:
> [] dump_stack+0x4c/0x6e
> [] panic+0xc0/0x1f3
> [] df_debug+0x35/0x40
> [] do_double_fault+0x87/0x100
> [] do_userpsace_rsp_in_kernel+0x107/0x140
> [] ? netlink_attachskb+0x1d/0x1d0
> [] userpsace_rsp_in_kernel+0x36/0x40
> [] ? netlink_attachskb+0x1d/0x1d0
>
>
> So, it seems hitting in netlink_attachskb().
> I'd need to check whether this consistently hits there or just at
> random.

I managed to reproduce the bug two more times, and all three show the
very same stack trace like the above.  So, it's well reproducible.

I'm really puzzled now.  We have a few pieces of information:

- git bisection pointed the commit 96b6352c1271:
    x86_64, entry: Remove the syscall exit audit and schedule optimizations
  and reverting this "fixes" the problem indeed.  Even just moving two
  lines
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_NONE)
  at the beginning of ret_from_sys_call already fixes.  (Of course I
  can't prove the fix but it stabilizes for a day without crash while
  usually I hit the bug in 10 minutes in full test running.)

- Another piece is that the bug happens only when a KVM is running.
  The kernel ran without problem over days with similar tasks
  (compiling kernel, etc) when no KVM was used.

- And now I get the trace as above, pointing netlink_attachskb().

I have a difficulty to imagine how all these pieces fit into a single
picture.  Is something already screwed up before that?


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 10:02:52 +0100,
Takashi Iwai wrote:
>
> At Fri, 20 Mar 2015 19:16:53 +0100,
> Denys Vlasenko wrote:
> > Takashi, are you willing to reproduce the panic one more time,
> > with this patch? I would like to see whether oops messages
> > are more informative with it.
>
> It can't be applied to 4.0-rc5, unfortunately.
>
> arch/x86/kernel/entry_64.S: Assembler messages:
> arch/x86/kernel/entry_64.S:1725: Error: no such instruction:
> `alloc_pt_gpregs_on_stack'
> arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND*
> sections) for `+'
> scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o'
> failed

I pulled tip tree on top of 4.0-rc5, built with your patch and now
succeeded to get a better message:

kvm: zapping shadow pages for mmio generation wraparound
kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
PANIC: double fault, error_code: 0x0
CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3
RIP: 0010:[] [] netlink_attachskb+0x1d/0x1d0
RSP: 0018:7ffd22c23f28 EFLAGS: 00010006
RAX: RBX: 0005 RCX: c101
RDX: RSI: 0001 RDI: 7ffd22c23ef0
RBP: 0ea7 R08: 1ea7 R09:
R10: 0309dbf8 R11: 0246 R12: 0001
R13: R14: 03026e40 R15: 0309cd50
FS: 7f89c83c2800() GS:88021d24() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 016d CR3: d90a CR4: 001427e0
Stack:
0ea7 03099c10 0ea7
0ea7 0001 03099c10 0ea7
00c84696 03099c88 7f0122c23fb8 0302f610
Call Trace:

Code: 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00
Kernel panic - not syncing: Machine halted.
CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
8800d1b33e28 816f80d2
81a22f81 8800d1b33ea8 816f2358 58d7
0008 8800d1b33eb8 8800d1b33e58 8800d1b33ea8
Call Trace:
[] dump_stack+0x4c/0x6e
[] panic+0xc0/0x1f3
[] df_debug+0x35/0x40
[] do_double_fault+0x87/0x100
[] do_userpsace_rsp_in_kernel+0x107/0x140
[] ? netlink_attachskb+0x1d/0x1d0
[] userpsace_rsp_in_kernel+0x36/0x40
[] ? netlink_attachskb+0x1d/0x1d0


So, it seems hitting in netlink_attachskb().
I'd need to check whether this consistently hits there or just at
random.


Takashi
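
The "Code:" line in the oops above marks the byte at the faulting RIP with angle brackets, so the reported offset (netlink_attachskb+0x1d) can be cross-checked against the dump by counting bytes. The following is a small sketch of that arithmetic. The helper names `code_bytes_before_marker` and `function_start_index` are hypothetical and only illustrate the counting; they are not taken from any kernel tool. The byte string is the one quoted in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* "Code:" bytes as quoted in the oops; <8b> marks the faulting byte. */
static const char example_code[] =
    "10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 "
    "e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 "
    "<8b> 87 68 01 00 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00";

/*
 * Count the whitespace-separated byte tokens that precede the token
 * starting with '<'. Returns -1 if the dump contains no marker.
 */
static int code_bytes_before_marker(const char *code)
{
    int count = 0;
    const char *p = code;

    while (*p != '\0') {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                        /* skip separators */
        if (*p == '\0')
            break;
        if (*p == '<')
            return count;               /* reached the marked byte */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                        /* skip one byte token */
        count++;
    }
    return -1;
}

/*
 * Index in the dump at which the function begins, given the number of
 * bytes before the marker and the symbol offset printed next to RIP.
 */
static int function_start_index(int bytes_before_marker, int rip_offset)
{
    return bytes_before_marker - rip_offset;
}
```

For this dump the marker is preceded by 43 bytes, and 43 - 0x1d gives index 14, which is where the `0f 1f 44 00 00` (nopl) that opens the function sits. That agrees with the faulting instruction being the one at offset 0x1d.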
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Fri, 20 Mar 2015 19:16:53 +0100,
Denys Vlasenko wrote:
>
> Hi,
>
> This particular crash was hard to diagnose because of two reasons:
>
> * CPU would happily use userspace RSP in kernel mode.
>   Crash comes only later, when we run off the stack.
>   We lose information when it started.
>
> * Kernel's error handling code is ill prepared for RSP pointing
>   to user stack. So we take another page fault trying
>   to dump stack.
>
> I prepared a patch which helps with both problems.
>
> For testing, I inserted an invalid instruction right before SYSRET
> to induce a similar bug, and booted resulting kernel in qemu.
>
> Before my patch, double fault output starts like this:
>
> [0.715216] PANIC: double fault, error_code: 0x0
> [0.716033] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #7
> [0.716033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [0.716033] task: 880007588000 ti: 88000759 task.ti:
> 88000759
> [0.716033] RIP: 0010:[] []
> do_error_trap+0x47/0x120
> [0.716033] RSP: 0018:7ffd89e7ffb8 EFLAGS: 00010006
>
> The key here is that it doesn't show at which RIP we took the first
> "bad" exception. The only useful detail visible here is bad RSP.
> "do_error_trap+0x47" is useless.
>
> After the patch, the very moment of "bad" exception is caught:
>
> [0.666758] Exception on user stack 7ffc1fd0c388: RSP:
> 0018:7ffc1fd0c3b0 EFLAGS: 00010006
> [0.667285] RIP: 0010:[] []
> ret_from_sys_call+0x5f/0x67
> [0.667285] PANIC: double fault, error_code: 0x
> [0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
> [0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [0.667285] task: 880007588000 ti: 88000759 task.ti:
> 88000759
> [0.667285] RIP: 0010:[] []
> ret_from_sys_call+0x5f/0x67
> [0.667285] RSP: 0018:7ffc1fd0c3b0 EFLAGS: 00010006
>
> The exception happened at "ret_from_sys_call+0x5f".
> We also won't take another page fault any more,
> output proceeds like this:
>
> ...
> [0.667285] RAX: 07a0 RBX: 7ffc1fd0c4e0 RCX:
> c101
> [0.667285] RDX: 8800 RSI: 5401 RDI:
> 7ffc1fd0c388
> [0.667285] RBP: 7ffc1fd0c570 R08: 0010 R09:
>
> [0.667285] R10: 7ffc1fd0c650 R11: 0202 R12:
> 0120
> [0.667285] R13: 005f7b78 R14: R15:
> 004c9d44
> [0.667285] FS: () GS:880007a0()
> knlGS:
> [0.667285] CS: 0010 DS: ES: CR0: 8005003b
> [0.667285] CR2: 004ad1e4 CR3: 00101000 CR4:
> 07f0
> [0.667285] Stack:
> [0.667285] 0018 7ffc1fd0c490 7ffc1fd0c3d0
>
> [0.667285] 7ffc1fd0c490
>
> [0.667285]
>
> [0.667285] Call Trace:
> [0.667285]
> [0.667285] Code: 8b 44 24 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70
> 48 8b 8c 24 80 00 00 00 4c 8b 9c 24 90 00 00 00 48 8b a4 24 98 00 00 00 <0f>
> 0b 0f 01 f8 48 0f 07 48 c7 84 24 a0 00 00 00 2b 00 00 00 48
> [0.667285] Kernel panic - not syncing: Machine halted.
> [0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
> [0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [0.667285] 880007593e28 81789625
> 880007588000
> [0.667285] 81a3b181 880007593ea8 817840aa
> 88000759
> [0.667285] 0008 880007593eb8 880007593e58
> 0001
> [0.667285] Call Trace:
> [0.667285] [] dump_stack+0x4c/0x65
> [0.667285] [] panic+0xc6/0x1ff
> [0.667285] [] df_debug+0x35/0x40
> [0.667285] [] do_double_fault+0x87/0x100
> [0.667285] [] do_userpsace_rsp_in_kernel+0x107/0x140
> [0.667285] [] ? ret_from_sys_call+0x5f/0x67
> [0.667285] [] userpsace_rsp_in_kernel+0x39/0x40
> [0.667285] [] ? ret_from_sys_call+0x5f/0x67
> [0.667285] Kernel Offset: disabled
> [0.667285] Rebooting in 1 seconds..
>
> Takashi, are you willing to reproduce the panic one more time,
> with this patch? I would like to see whether oops messages
> are more informative with it.

It can't be applied to 4.0-rc5, unfortunately.

arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:1725: Error: no such instruction: `alloc_pt_gpregs_on_stack'
arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* sections) for `+'
scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o' failed


Takashi
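
The condition the quoted patch description reports ("userspace RSP in kernel mode") can be expressed as a simple predicate over the CS and RSP values printed in the oops. The sketch below is an assumption-laden illustration, not the patch itself: it assumes x86-64 with 48-bit virtual addresses (user addresses at or below 0x00007fffffffffff), and the helper names `is_user_address`, `is_kernel_cs` and `user_rsp_in_kernel` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, user space is the lower half. */
#define USER_ADDR_MAX UINT64_C(0x00007fffffffffff)

/* True if addr lies in the user half of the address space. */
static bool is_user_address(uint64_t addr)
{
    return addr <= USER_ADDR_MAX;
}

/* The low two bits of a code segment selector hold the privilege level. */
static bool is_kernel_cs(uint16_t cs)
{
    return (cs & 3u) == 0;
}

/* The inconsistent state: running at ring 0 with a user-space RSP. */
static bool user_rsp_in_kernel(uint16_t cs, uint64_t rsp)
{
    return is_kernel_cs(cs) && is_user_address(rsp);
}
```

With the values from the reports in this thread (CS 0010, RSP 7ffd22c23f28), `user_rsp_in_kernel(0x10, 0x7ffd22c23f28)` is true. A user-mode selector such as 0x33 with the same RSP, or a kernel selector with an upper-half stack address, yields false.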
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/23/2015 07:38 PM, Andy Lutomirski wrote: cmpq $__NR_syscall_max,%rax ja ret_from_sys_call movq %r10,%rcx call *sys_call_table(,%rax,8) # XXX:rip relative movq %rax,RAX-ARGOFFSET(%rsp) ret_from_sys_call: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) jnz int_ret_from_sys_call_fixup /* Go the the slow path */ LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF ... ... int_ret_from_sys_call_fixup: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_ret_from_sys_call ... ... GLOBAL(int_ret_from_sys_call) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE). I also don't see how moving that check (even if it is wrong in a more benign way) can have such a drastic effect. I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. KVM does awful, awful things to CPU state, and, as an optimization, it allows kernel code to run with CPU state that would be totally invalid in user mode. This happens through a bunch of hooks, including this bit in __switch_to: /* * Now maybe reload the debug registers and handle I/O bitmaps */ if (unlikely(task_thread_info(next_p)-flags _TIF_WORK_CTXSW_NEXT || task_thread_info(prev_p)-flags _TIF_WORK_CTXSW_PREV)) __switch_to_xtra(prev_p, next_p, tss); IOW, we *change* tif during context switches. The race looks like this: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP) jnz int_ret_from_sys_call_fixup/* Go the the slow path */ --- preempted here, switch to KVM guest --- KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't happen to be a *32-bit* KVM guest, perhaps? Now KVM schedules, calling __switch_to. __switch_to sets _TIF_USER_RETURN_NOTIFY. Clear up to now... We IRET back to the syscall exit code, So we end up being just after the testl, right? We go into int_ret_from_sys_call_fixup. We FIXUP_TOP_OF_STACK - now iret frame contains correct values. 
Then we jump to int_ret_from_sys_call.

> turn off interrupts, and do sysret.  We are now screwed.

I don't understand. Where exactly would it go wrong?

On sysret, rsp would be restored from PER_CPU(old_rsp), right?
We'd end up in *userspace* with userspace rsp.

More to it: since we FIXUPed the iret frame, it does not even matter
how we'll exit to userspace. Either sysret or iret would work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 12:07 PM, Denys Vlasenko dvlas...@redhat.com wrote: On 03/23/2015 07:38 PM, Andy Lutomirski wrote: cmpq $__NR_syscall_max,%rax ja ret_from_sys_call movq %r10,%rcx call *sys_call_table(,%rax,8) # XXX:rip relative movq %rax,RAX-ARGOFFSET(%rsp) ret_from_sys_call: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) jnz int_ret_from_sys_call_fixup /* Go the the slow path */ LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF ... ... int_ret_from_sys_call_fixup: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_ret_from_sys_call ... ... GLOBAL(int_ret_from_sys_call) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE). I also don't see how moving that check (even if it is wrong in a more benign way) can have such a drastic effect. I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. KVM does awful, awful things to CPU state, and, as an optimization, it allows kernel code to run with CPU state that would be totally invalid in user mode. This happens through a bunch of hooks, including this bit in __switch_to: /* * Now maybe reload the debug registers and handle I/O bitmaps */ if (unlikely(task_thread_info(next_p)-flags _TIF_WORK_CTXSW_NEXT || task_thread_info(prev_p)-flags _TIF_WORK_CTXSW_PREV)) __switch_to_xtra(prev_p, next_p, tss); IOW, we *change* tif during context switches. The race looks like this: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP) jnz int_ret_from_sys_call_fixup/* Go the the slow path */ --- preempted here, switch to KVM guest --- KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't happen to be a *32-bit* KVM guest, perhaps? Now KVM schedules, calling __switch_to. __switch_to sets _TIF_USER_RETURN_NOTIFY. Clear up to now... We IRET back to the syscall exit code, So we end up being just after the testl, right? 
> We go into int_ret_from_sys_call_fixup.

Nope, other way around.  We saw no work bits set in testl, but one or
more of those bits was set when we're preempted and return.  Now we
*don't* go to int_ret_from_sys_call_fixup.

I don't think that the resulting sysret itself is harmful, but I think
we're now running user code with some MSRs programmed wrong.  The next
syscall could do bad things, such as failing to clear IF.

--Andy
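[Editorial sketch] To make the two-instruction window concrete, here is a minimal user-space model in C of the check-then-act race described in this message. It is only an illustration under stated assumptions: the struct, the flag value, and every `sketch_` name are hypothetical and are not kernel code. Preemption is modelled as a function that can only take effect while "interrupts" are enabled, and the fixed variant mirrors the reordering discussed in this thread (disable interrupts first, then test the work flags).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's thread-info work flags. */
#define SKETCH_TIF_USER_RETURN_NOTIFY (1u << 0)
#define SKETCH_TIF_ALLWORK_MASK       SKETCH_TIF_USER_RETURN_NOTIFY

struct sketch_task {
    uint32_t flags;        /* models thread_info->flags */
    bool irqs_enabled;     /* models the CPU's IF bit */
    bool preempt_pending;  /* a context switch is waiting to happen */
};

/*
 * Models an interrupt plus context switch that sets a work flag behind
 * our back. It can only take effect while interrupts are enabled.
 */
void sketch_maybe_preempt(struct sketch_task *t)
{
    assert(t != NULL);
    if (t->irqs_enabled && t->preempt_pending) {
        t->flags |= SKETCH_TIF_USER_RETURN_NOTIFY;
        t->preempt_pending = false;
    }
}

/*
 * Racy exit path: test the flags, then disable interrupts.
 * Returns true if we would return to user mode with work still pending.
 */
bool sketch_exit_racy(struct sketch_task *t)
{
    assert(t != NULL);
    bool slow_path = (t->flags & SKETCH_TIF_ALLWORK_MASK) != 0; /* testl */
    sketch_maybe_preempt(t);   /* the window: interrupts are still on */
    t->irqs_enabled = false;   /* DISABLE_INTERRUPTS */
    if (slow_path)
        t->flags &= ~SKETCH_TIF_ALLWORK_MASK; /* slow path handles the work */
    return (t->flags & SKETCH_TIF_ALLWORK_MASK) != 0;
}

/* Fixed exit path: disable interrupts first, then test the flags. */
bool sketch_exit_fixed(struct sketch_task *t)
{
    assert(t != NULL);
    sketch_maybe_preempt(t);   /* preemption before the cli is harmless */
    t->irqs_enabled = false;   /* DISABLE_INTERRUPTS */
    bool slow_path = (t->flags & SKETCH_TIF_ALLWORK_MASK) != 0; /* testl */
    sketch_maybe_preempt(t);   /* no effect: interrupts are off */
    if (slow_path)
        t->flags &= ~SKETCH_TIF_ALLWORK_MASK;
    return (t->flags & SKETCH_TIF_ALLWORK_MASK) != 0;
}
```

With a preemption pending, the racy variant reports leftover work while the fixed variant does not. The model says nothing about why the leftover work leads to a double fault; it only shows how the flag can be missed.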
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote: Hi, This particular crash was hard to diagnose because of two reasons: * CPU would happily use userspace RSP in kernel mode. Crash comes only later, when we run off the stack. We lose information when it started. * Kernel's error handling code is ill prepared for RSP pointing to user stack. So we take another page fault trying to dump stack. I prepared a patch which helps with both problems. For testing, I inserted an invalid instruction right before SYSRET to induce a similar bug, and booted resulting kernel in qemu. Before my patch, double fault output starts like this: [0.715216] PANIC: double fault, error_code: 0x0 [0.716033] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #7 [0.716033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.716033] task: 880007588000 ti: 88000759 task.ti: 88000759 [0.716033] RIP: 0010:[81017057] [81017057] do_error_trap+0x47/0x120 [0.716033] RSP: 0018:7ffd89e7ffb8 EFLAGS: 00010006 The key here is that it doesn't show at which RIP we took the first bad exception. The only useful detail visible here is bad RSP. do_error_trap+0x47 is useless. After the patch, the very moment of bad exception is caught: [0.666758] Exception on user stack 7ffc1fd0c388: RSP: 0018:7ffc1fd0c3b0 EFLAGS: 00010006 [0.667285] RIP: 0010:[81793688] [81793688] ret_from_sys_call+0x5f/0x67 [0.667285] PANIC: double fault, error_code: 0x [0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13 [0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.667285] task: 880007588000 ti: 88000759 task.ti: 88000759 [0.667285] RIP: 0010:[81793688] [81793688] ret_from_sys_call+0x5f/0x67 [0.667285] RSP: 0018:7ffc1fd0c3b0 EFLAGS: 00010006 The exception happened at ret_from_sys_call+0x5f. We also won't take another page fault any more, output proceeds like this: ... 
[0.667285] RAX: 07a0 RBX: 7ffc1fd0c4e0 RCX: c101 [0.667285] RDX: 8800 RSI: 5401 RDI: 7ffc1fd0c388 [0.667285] RBP: 7ffc1fd0c570 R08: 0010 R09: [0.667285] R10: 7ffc1fd0c650 R11: 0202 R12: 0120 [0.667285] R13: 005f7b78 R14: R15: 004c9d44 [0.667285] FS: () GS:880007a0() knlGS: [0.667285] CS: 0010 DS: ES: CR0: 8005003b [0.667285] CR2: 004ad1e4 CR3: 00101000 CR4: 07f0 [0.667285] Stack: [0.667285] 0018 7ffc1fd0c490 7ffc1fd0c3d0 [0.667285] 7ffc1fd0c490 [0.667285] [0.667285] Call Trace: [0.667285] UNK [0.667285] Code: 8b 44 24 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 48 8b 8c 24 80 00 00 00 4c 8b 9c 24 90 00 00 00 48 8b a4 24 98 00 00 00 0f 0b 0f 01 f8 48 0f 07 48 c7 84 24 a0 00 00 00 2b 00 00 00 48 [0.667285] Kernel panic - not syncing: Machine halted. [0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13 [0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.667285] 880007593e28 81789625 880007588000 [0.667285] 81a3b181 880007593ea8 817840aa 88000759 [0.667285] 0008 880007593eb8 880007593e58 0001 [0.667285] Call Trace: [0.667285] [81789625] dump_stack+0x4c/0x65 [0.667285] [817840aa] panic+0xc6/0x1ff [0.667285] [81059ee5] df_debug+0x35/0x40 [0.667285] [81017e37] do_double_fault+0x87/0x100 [0.667285] [81017fb7] do_userpsace_rsp_in_kernel+0x107/0x140 [0.667285] [81793688] ? ret_from_sys_call+0x5f/0x67 [0.667285] [81795b49] userpsace_rsp_in_kernel+0x39/0x40 [0.667285] [81793688] ? ret_from_sys_call+0x5f/0x67 [0.667285] Kernel Offset: disabled [0.667285] Rebooting in 1 seconds.. Takashi, are you willing to reproduce the panic one more time, with this patch? I would like to see whether oops messages are more informative with it. It can't be applied to 4.0-rc5, unfortunately. 
arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:1725: Error: no such instruction: `alloc_pt_gpregs_on_stack'
arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* sections) for `+'
scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o' failed

Takashi
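[Editorial sketch] The condition the patch reports ("Exception on user stack") can be pictured as a simple classification of the saved CS and RSP. The C sketch below is a rough illustration, not the patch itself: the `sketch_` names and the struct are hypothetical, and it assumes 48-bit virtual addresses (4-level paging), where canonical user addresses occupy the lower half of the address space. The privilege level is taken from the low two bits of the code segment selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest canonical user address, assuming 48-bit virtual addresses. */
#define SKETCH_USER_ADDR_MAX 0x00007fffffffffffULL

/* Hypothetical, reduced view of the saved register frame. */
struct sketch_regs {
    uint16_t cs;  /* code segment selector, e.g. 0x10 in the oops above */
    uint64_t sp;  /* saved stack pointer */
};

/* True if addr lies in the lower (user) half of the address space. */
bool sketch_is_user_address(uint64_t addr)
{
    return addr <= SKETCH_USER_ADDR_MAX;
}

/*
 * True if the frame says "kernel mode" (CPL 0 in the low two bits of CS)
 * while the stack pointer still points into user space.
 */
bool sketch_user_rsp_in_kernel(const struct sketch_regs *regs)
{
    assert(regs != NULL);
    bool kernel_mode = (regs->cs & 3) == 0;
    return kernel_mode && sketch_is_user_address(regs->sp);
}
```

Fed the values from the oops (CS 0x10, RSP 0x7ffd22c23f28), the check fires; with a user-mode selector it does not.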
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote: On 03/23/2015 02:22 PM, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:35:41 +0100, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:02:52 +0100, Takashi Iwai wrote: At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote: Takashi, are you willing to reproduce the panic one more time, with this patch? I would like to see whether oops messages are more informative with it. It can't be applied to 4.0-rc5, unfortunately. arch/x86/kernel/entry_64.S: Assembler messages: arch/x86/kernel/entry_64.S:1725: Error: no such instruction: `alloc_pt_gpregs_on_stack' arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* sections) for `+' scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o' failed I pulled tip tree on top of 4.0-rc5, built with your patch and now succeeded to get a better message: kvm: zapping shadow pages for mmio generation wraparound kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 PANIC: double fault, error_code: 0x0 CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 Hardware name: Dell Inc. 
OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RAX: RBX: 0005 RCX: c101 RDX: RSI: 0001 RDI: 7ffd22c23ef0 RBP: 0ea7 R08: 1ea7 R09: R10: 0309dbf8 R11: 0246 R12: 0001 R13: R14: 03026e40 R15: 0309cd50 FS: 7f89c83c2800() GS:88021d24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 016d CR3: d90a CR4: 001427e0 Stack: 0ea7 03099c10 0ea7 0ea7 0001 03099c10 0ea7 00c84696 03099c88 7f0122c23fb8 0302f610 Call Trace: UNK Code: 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 8b 87 68 01 00 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00 Kernel panic - not syncing: Machine halted. CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 8800d1b33e28 816f80d2 81a22f81 8800d1b33ea8 816f2358 58d7 0008 8800d1b33eb8 8800d1b33e58 8800d1b33ea8 Call Trace: [816f80d2] dump_stack+0x4c/0x6e [816f2358] panic+0xc0/0x1f3 [81046e65] df_debug+0x35/0x40 [81003fe7] do_double_fault+0x87/0x100 [81004167] do_userpsace_rsp_in_kernel+0x107/0x140 [8162681d] ? netlink_attachskb+0x1d/0x1d0 [81703ca6] userpsace_rsp_in_kernel+0x36/0x40 [8162681d] ? netlink_attachskb+0x1d/0x1d0 So, it seems hitting in netlink_attachskb(). I'd need to check whether this consistently hits there or just at random. I managed to reproduce the bug two more times, and all three show the very same stack trace like the above. So, it's well reproducible. 
FYI: the disassembly of netlink_attachskb (from Code: line) is: 0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 5: 55 push %rbp 6: 48 89 e5mov%rsp,%rbp 9: 41 56 push %r14 b: 41 55 push %r13 d: 49 89 d5mov%rdx,%r13 10: 41 54 push %r12 12: 49 89 f4mov%rsi,%r12 15: 53 push %rbx 16: 48 89 fbmov%rdi,%rbx 19: 48 83 ec 30 sub$0x30,%rsp 1d: 8b 87 68 01 00 00 mov0x168(%rdi),%eax ^ 23: 39 87 9c 01 00 00 cmp%eax,0x19c(%rdi) 29: 7c 25 jl 50 _start+0x50 2b: 48 8b 87 88 04 00 00mov0x488(%rdi),%rax The ^ instruction is the one which faults. Since you said it consistently happens here, this should be a page fault, not an external hardware interrupt. The code corresponds to the comparison in if(): int netlink_attachskb(struct sock *sk, struct sk_buff *skb, long *timeo, struct sock *ssk) { struct netlink_sock
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 23.03.2015 at 19:38, Andy Lutomirski wrote:
> I bet I see it.  I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect.  KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode.  This
> happens through a bunch of hooks, including this bit in __switch_to:
>
>         /*
>          * Now maybe reload the debug registers and handle I/O bitmaps
>          */
>         if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>                      task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>                 __switch_to_xtra(prev_p, next_p, tss);
>
> IOW, we *change* tif during context switches.
>
> The race looks like this:
>
>         testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>         jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>
> --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?

Not in my case (Penryn CPU); there it was 64-bit guests.

> Now KVM schedules, calling __switch_to.  __switch_to sets
> _TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
> off interrupts, and do sysret.  We are now screwed.
>
> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
>
> FWIW, this will affect things other than KVM.  For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
>
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?
-- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
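[Editorial sketch] Why would a wrong MSR_SYSCALL_MASK matter? On x86-64 the SYSCALL instruction saves RFLAGS in R11 and then clears every RFLAGS bit that is set in the mask MSR. The kernel normally includes IF in that mask so that it enters with interrupts disabled. The C sketch below models only that masking step; the `sketch_` names are hypothetical, the flag value 0x246 is an illustrative user-mode RFLAGS (IF, ZF, PF and the always-set bit 1), and a mask of 0 is an assumed example of a "wrong" value, since the thread does not say what value was actually left in the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_X86_EFLAGS_FIXED (1ull << 1) /* bit 1 always reads as 1 */
#define SKETCH_X86_EFLAGS_IF    (1ull << 9) /* interrupt enable flag */

/*
 * Models the flag handling of SYSCALL: the new RFLAGS is the old RFLAGS
 * with every bit in the mask MSR cleared.
 */
uint64_t sketch_rflags_after_syscall(uint64_t rflags, uint64_t fmask)
{
    assert((rflags & SKETCH_X86_EFLAGS_FIXED) != 0);
    return rflags & ~fmask;
}

/* True if interrupts would be enabled with this RFLAGS value. */
bool sketch_irqs_enabled(uint64_t rflags)
{
    return (rflags & SKETCH_X86_EFLAGS_IF) != 0;
}
```

With IF in the mask, the modelled kernel entry runs with interrupts off. With IF missing from the mask, the entry code would start executing with interrupts still enabled, which is the "failing to clear IF" scenario raised elsewhere in this thread.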
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 11:48:42 -0700, Andy Lutomirski wrote: On Mon, Mar 23, 2015 at 11:38 AM, Andy Lutomirski l...@amacapital.net wrote: On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko dvlas...@redhat.com wrote: On 03/23/2015 02:22 PM, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:35:41 +0100, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:02:52 +0100, Takashi Iwai wrote: At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote: I'm really puzzled now. We have a few pieces of information: - git bisection pointed the commit 96b6352c1271: x86_64, entry: Remove the syscall exit audit and schedule optimizations and reverting this fixes the problem indeed. Even just moving two lines LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) at the beginning of ret_from_sys_call already fixes. (Of course I can't prove the fix but it stabilizes for a day without crash while usually I hit the bug in 10 minutes in full test running.) The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from interrupt-disabled region to interrupt-enabled: cmpq $__NR_syscall_max,%rax ja ret_from_sys_call movq %r10,%rcx call *sys_call_table(,%rax,8) # XXX:rip relative movq %rax,RAX-ARGOFFSET(%rsp) ret_from_sys_call: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) jnz int_ret_from_sys_call_fixup /* Go the the slow path */ LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF ... ... int_ret_from_sys_call_fixup: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_ret_from_sys_call ... ... GLOBAL(int_ret_from_sys_call) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE). I also don't see how moving that check (even if it is wrong in a more benign way) can have such a drastic effect. I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. 
> > KVM does awful, awful things to CPU state, and, as an optimization,
> > it allows kernel code to run with CPU state that would be totally
> > invalid in user mode.  This happens through a bunch of hooks,
> > including this bit in __switch_to:
> >
> >         /*
> >          * Now maybe reload the debug registers and handle I/O bitmaps
> >          */
> >         if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
> >                      task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
> >                 __switch_to_xtra(prev_p, next_p, tss);
> >
> > IOW, we *change* tif during context switches.
> >
> > The race looks like this:
> >
> >         testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> >         jnz int_ret_from_sys_call_fixup /* Go the the slow path */
> >
> > --- preempted here, switch to KVM guest ---
> >
> > KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
> > happen to be a *32-bit* KVM guest, perhaps?
> >
> > Now KVM schedules, calling __switch_to.  __switch_to sets
> > _TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
> > off interrupts, and do sysret.  We are now screwed.
> >
> > I don't know why this manifests in this particular failure, but any
> > number of terrible things could happen now.
> >
> > FWIW, this will affect things other than KVM.  For example, SIGKILL
> > sent while a process is sleeping in that two-instruction window won't
> > work.
> >
> > Takashi, can you re-send your patch so we can review it for real in
> > light of this race?
>
> Never mind, I'm testing a slightly fancier patch.

OK, I'll wait for your test patch.

thanks,

Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 18:46:45 +0100, Denys Vlasenko wrote: On 03/23/2015 06:18 PM, Takashi Iwai wrote: At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote: I pulled tip tree on top of 4.0-rc5, built with your patch and now succeeded to get a better message: kvm: zapping shadow pages for mmio generation wraparound kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 PANIC: double fault, error_code: 0x0 CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RAX: RBX: 0005 RCX: c101 RDX: RSI: 0001 RDI: 7ffd22c23ef0 FYI: the disassembly of netlink_attachskb (from Code: line) is: 0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 5: 55 push %rbp 6: 48 89 e5mov%rsp,%rbp 9: 41 56 push %r14 b: 41 55 push %r13 d: 49 89 d5mov%rdx,%r13 10: 41 54 push %r12 12: 49 89 f4mov%rsi,%r12 15: 53 push %rbx 16: 48 89 fbmov%rdi,%rbx 19: 48 83 ec 30 sub$0x30,%rsp 1d: 8b 87 68 01 00 00 mov0x168(%rdi),%eax ^ 23: 39 87 9c 01 00 00 cmp%eax,0x19c(%rdi) 29: 7c 25 jl 50 _start+0x50 2b: 48 8b 87 88 04 00 00mov0x488(%rdi),%rax The ^ instruction is the one which faults. Since you said it consistently happens here, this should be a page fault, not an external hardware interrupt. The code corresponds to the comparison in if(): int netlink_attachskb(struct sock *sk, struct sk_buff *skb, long *timeo, struct sock *ssk) { struct netlink_sock *nlk; nlk = nlk_sk(sk); if ((atomic_read(sk-sk_rmem_alloc) sk-sk_rcvbuf || - Another piece is that the bug happens only when a KVM is running. The kernel ran without problem over days with similar tasks (compiling kernel, etc) when no KVM was used. Conceivably virtualization support in CPUs can have nasty erratas. 
However, you and other reporter have different CPUs - yours is Ivy Bridge, his CPU is a Penryn. I don't see the path how KVM helps to trigger this. - And now I get the trace as above, pointing netlink_attachskb(). I have a difficulty to imagine how all these pieces fit into a single picture. Is something already screwed up before that? Well, a tiny bit more info will be seen if you'd change %rdi to, say, %r15 in these two lines in my patch: /* Save bogus RSP value */ movq%rsp,%rdi ... push%rdi/* pt_regs-sp */ Then original %rdi will be visible in the crash message. OK, here we go. kvm: zapping shadow pages for mmio generation wraparound kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x Exception on user stack 7fff1d7e5ec0: RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 PANIC: double fault, error_code: 0x0 CPU: 5 PID: 14285 Comm: fixdep Tainted: GW 4.0.0-rc5-debug1+ #3 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: 88020ba1c690 ti: 880206ba4000 task.ti: 880206ba4000 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002 RAX: RBX: RCX: c101 RDX: RSI: 1ebb RDI: Thanks for your testing. So the %rdi was NULL... not very informative. Notice that your every crash is preceded by kvm: zapping shadow pages for mmio generation wraparound kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x This hints that kvm _is_ somehow responsible. It's likely irrelevant, as this appears at the time a VM starting, not at the crash time. I've got this message all the time. Sorry for confusing. Takashi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/23/2015 06:18 PM, Takashi Iwai wrote: At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote: I pulled tip tree on top of 4.0-rc5, built with your patch and now succeeded to get a better message: kvm: zapping shadow pages for mmio generation wraparound kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x Exception on user stack 7ffd22c23ef0: RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 PANIC: double fault, error_code: 0x0 CPU: 1 PID: 10819 Comm: cc1 Tainted: GW 4.0.0-rc5-debug1+ #2 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: 8800d1b34b10 ti: 8800d1b3 task.ti: 8800d1b3 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 RSP: 0018:7ffd22c23f28 EFLAGS: 00010006 RAX: RBX: 0005 RCX: c101 RDX: RSI: 0001 RDI: 7ffd22c23ef0 FYI: the disassembly of netlink_attachskb (from Code: line) is: 0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 5: 55 push %rbp 6: 48 89 e5mov%rsp,%rbp 9: 41 56 push %r14 b: 41 55 push %r13 d: 49 89 d5mov%rdx,%r13 10: 41 54 push %r12 12: 49 89 f4mov%rsi,%r12 15: 53 push %rbx 16: 48 89 fbmov%rdi,%rbx 19: 48 83 ec 30 sub$0x30,%rsp 1d: 8b 87 68 01 00 00 mov0x168(%rdi),%eax ^ 23: 39 87 9c 01 00 00 cmp%eax,0x19c(%rdi) 29: 7c 25 jl 50 _start+0x50 2b: 48 8b 87 88 04 00 00mov0x488(%rdi),%rax The ^ instruction is the one which faults. Since you said it consistently happens here, this should be a page fault, not an external hardware interrupt. The code corresponds to the comparison in if(): int netlink_attachskb(struct sock *sk, struct sk_buff *skb, long *timeo, struct sock *ssk) { struct netlink_sock *nlk; nlk = nlk_sk(sk); if ((atomic_read(sk-sk_rmem_alloc) sk-sk_rcvbuf || - Another piece is that the bug happens only when a KVM is running. The kernel ran without problem over days with similar tasks (compiling kernel, etc) when no KVM was used. Conceivably virtualization support in CPUs can have nasty erratas. 
However, you and other reporter have different CPUs - yours is Ivy Bridge, his CPU is a Penryn. I don't see the path how KVM helps to trigger this. - And now I get the trace as above, pointing netlink_attachskb(). I have a difficulty to imagine how all these pieces fit into a single picture. Is something already screwed up before that? Well, a tiny bit more info will be seen if you'd change %rdi to, say, %r15 in these two lines in my patch: /* Save bogus RSP value */ movq%rsp,%rdi ... push%rdi/* pt_regs-sp */ Then original %rdi will be visible in the crash message. OK, here we go. kvm: zapping shadow pages for mmio generation wraparound kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x Exception on user stack 7fff1d7e5ec0: RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 PANIC: double fault, error_code: 0x0 CPU: 5 PID: 14285 Comm: fixdep Tainted: GW 4.0.0-rc5-debug1+ #3 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: 88020ba1c690 ti: 880206ba4000 task.ti: 880206ba4000 RIP: 0010:[8162681d] [8162681d] netlink_attachskb+0x1d/0x1d0 RSP: 0018:7fff1d7e5ef8 EFLAGS: 00010002 RAX: RBX: RCX: c101 RDX: RSI: 1ebb RDI: Thanks for your testing. So the %rdi was NULL... not very informative. Notice that your every crash is preceded by kvm: zapping shadow pages for mmio generation wraparound kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x This hints that kvm _is_ somehow responsible. I'm no expert on kvm, I need to take a look around that code... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko dvlas...@redhat.com wrote: On 03/23/2015 02:22 PM, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:35:41 +0100, Takashi Iwai wrote: At Mon, 23 Mar 2015 10:02:52 +0100, Takashi Iwai wrote: At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote: I'm really puzzled now. We have a few pieces of information: - git bisection pointed the commit 96b6352c1271: x86_64, entry: Remove the syscall exit audit and schedule optimizations and reverting this fixes the problem indeed. Even just moving two lines LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) at the beginning of ret_from_sys_call already fixes. (Of course I can't prove the fix but it stabilizes for a day without crash while usually I hit the bug in 10 minutes in full test running.) The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from interrupt-disabled region to interrupt-enabled: cmpq $__NR_syscall_max,%rax ja ret_from_sys_call movq %r10,%rcx call *sys_call_table(,%rax,8) # XXX:rip relative movq %rax,RAX-ARGOFFSET(%rsp) ret_from_sys_call: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) jnz int_ret_from_sys_call_fixup /* Go the the slow path */ LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF ... ... int_ret_from_sys_call_fixup: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_ret_from_sys_call ... ... GLOBAL(int_ret_from_sys_call) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE). I also don't see how moving that check (even if it is wrong in a more benign way) can have such a drastic effect. I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. KVM does awful, awful things to CPU state, and, as an optimization, it allows kernel code to run with CPU state that would be totally invalid in user mode. 
This happens through a bunch of hooks, including this bit in
__switch_to:

        /*
         * Now maybe reload the debug registers and handle I/O bitmaps
         */
        if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
                     task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
                __switch_to_xtra(prev_p, next_p, tss);

IOW, we *change* tif during context switches.

The race looks like this:

        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
        jnz int_ret_from_sys_call_fixup /* Go the the slow path */

--- preempted here, switch to KVM guest ---

KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
happen to be a *32-bit* KVM guest, perhaps?

Now KVM schedules, calling __switch_to.  __switch_to sets
_TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
off interrupts, and do sysret.  We are now screwed.

I don't know why this manifests in this particular failure, but any
number of terrible things could happen now.

FWIW, this will affect things other than KVM.  For example, SIGKILL
sent while a process is sleeping in that two-instruction window won't
work.

Takashi, can you re-send your patch so we can review it for real in
light of this race?

> Shot-in-the-dark idea.
> At this code revision we did not yet store user's %rsp in pt_regs->sp,
> we used a fixup to populate it:
>
>         .macro FIXUP_TOP_OF_STACK tmp offset=0
>         movq PER_CPU_VAR(old_rsp),\tmp
>         movq \tmp,RSP+\offset(%rsp)
>
> (There are pending patches to fix this mess).
>
> If an interrupt interrupting *kernel code* would go into a code path
> which does FIXUP_TOP_OF_STACK, it'd overwrite the correct saved %rsp
> with a user's one. The iret from interrupt would work, but the
> resulting CPU state would be inconsistent.
>
> But I don't see such a code path from interrupts to FIXUP_TOP_OF_STACK...

I don't buy it.  Anything that does that is so completely broken that
I'd hope we'd have found it long ago.
--Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
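For readers following the thread, the race described above is a check-then-act problem: the flag test happens while interrupts are still enabled, so the flags can change after the decision is made. The following is only a user-space sketch under stated assumptions, not kernel code. Every name prefixed with `model_` or `MODEL_` is hypothetical, and the "preemption" is modelled as a plain callback invoked between the test and the return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for _TIF_USER_RETURN_NOTIFY (not the kernel's value). */
#define MODEL_TIF_USER_RETURN_NOTIFY (1u << 0)
/* Hypothetical stand-in for _TIF_ALLWORK_MASK. */
#define MODEL_TIF_ALLWORK_MASK       MODEL_TIF_USER_RETURN_NOTIFY

enum model_exit_path { MODEL_EXIT_FAST_SYSRET, MODEL_EXIT_SLOW_PATH };

struct model_task {
    uint32_t ti_flags;   /* stands in for thread_info->flags */
};

/* Models "an interrupt preempts us here"; may be NULL for no preemption. */
typedef void (*model_preempt_hook)(struct model_task *);

/* Models the context switch through KVM that sets the notify flag. */
void model_kvm_preemption(struct model_task *t)
{
    t->ti_flags |= MODEL_TIF_USER_RETURN_NOTIFY;
}

/*
 * Ordering as described for commit 96b6352c1271: the flags are tested
 * first, and interrupts are only disabled afterwards, so the hook can
 * run after the decision has already been taken.
 */
enum model_exit_path
model_exit_check_then_disable(struct model_task *t, model_preempt_hook irq)
{
    bool work_pending = (t->ti_flags & MODEL_TIF_ALLWORK_MASK) != 0;

    if (irq != NULL)
        irq(t);   /* window: interrupts on, decision already made */

    return work_pending ? MODEL_EXIT_SLOW_PATH : MODEL_EXIT_FAST_SYSRET;
}

/* True when the fast path was chosen although work is now pending. */
bool model_missed_work(const struct model_task *t, enum model_exit_path p)
{
    return p == MODEL_EXIT_FAST_SYSRET &&
           (t->ti_flags & MODEL_TIF_ALLWORK_MASK) != 0;
}
```

Calling `model_exit_check_then_disable(&t, model_kvm_preemption)` on a task with no flags set returns the fast path while leaving the notify flag set, which is the stale-decision situation the message describes. What the real CPU state looks like at that point is not modelled here.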
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Mon, Mar 23, 2015 at 11:38 AM, Andy Lutomirski l...@amacapital.net wrote:
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

Never mind, I'm testing a slightly fancier patch.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Mon, 23 Mar 2015 11:38:30 -0700, Andy Lutomirski wrote:
> The race looks like this:
>
>         testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>         jnz int_ret_from_sys_call_fixup/* Go the the slow path */
>
>         --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?
>
> Now KVM schedules, calling __switch_to. __switch_to sets
> _TIF_USER_RETURN_NOTIFY.
>
> We IRET back to the syscall exit code, turn off interrupts, and do
> sysret. We are now screwed.

Thanks for enlightening! That looks like a feasible scenario.
(I tested only a 64bit KVM guest, BTW.)

> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
>
> FWIW, this will affect things other than KVM. For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
>
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

The patch below worked. I'll double-check tomorrow whether this
really cures reliably.


thanks,

Takashi

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1d74d161687c..5340ac7f88a9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -364,12 +364,12 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
-
         LOCKDEP_SYS_EXIT
         DISABLE_INTERRUPTS(CLBR_NONE)
         TRACE_IRQS_OFF
+        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
+
         CFI_REMEMBER_STATE
         /*
          * sysretq will re-enable interrupts:
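The effect of the reordering in this patch can be illustrated with a small user-space sketch. This is an assumption-laden model, not the kernel's behaviour: the `sketch_` names are hypothetical, and an interrupt is reduced to "sets a work flag if it is delivered". The point it tries to show is that once interrupts are disabled before the test, nothing can change the flags between the test and the return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one bit of _TIF_ALLWORK_MASK. */
#define SKETCH_WORK_FLAG (1u << 0)

struct sketch_cpu {
    bool irqs_enabled;
    bool irq_pending;    /* an interrupt whose handler would set the work flag */
    uint32_t ti_flags;   /* stands in for thread_info->flags */
};

/* Deliver the pending interrupt, but only while interrupts are enabled. */
static void sketch_irq_window(struct sketch_cpu *c)
{
    if (c->irqs_enabled && c->irq_pending) {
        c->ti_flags |= SKETCH_WORK_FLAG;
        c->irq_pending = false;
    }
}

/*
 * Patched ordering: disable interrupts first, then test the flags.
 * Returns true if the fast path would be taken. If irq_arrives_late
 * is set, an interrupt is raised only after interrupts are disabled.
 */
bool sketch_exit_disable_then_check(struct sketch_cpu *c, bool irq_arrives_late)
{
    sketch_irq_window(c);        /* interrupts may still be delivered here */
    c->irqs_enabled = false;     /* models DISABLE_INTERRUPTS */

    if (irq_arrives_late)
        c->irq_pending = true;   /* raised, but cannot be delivered now */

    bool work = (c->ti_flags & SKETCH_WORK_FLAG) != 0;   /* models testl */
    sketch_irq_window(c);        /* no-op: interrupts are off */

    return !work;
}
```

In this model an interrupt that arrives before the disable is seen by the test and sends the task to the slow path, while one that arrives afterwards stays pending and leaves the flags untouched on the fast path.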
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Fri, 20 Mar 2015 19:16:53 +0100, Denys Vlasenko wrote:
>
> Takashi, are you willing to reproduce the panic one more time,
> with this patch? I would like to see whether oops messages
> are more informative with it.

Sure, I'll do it, but you'll have to wait until the next Monday as the
bug is triggered only on a machine in my office. I checked my local
laptop, but it doesn't show the problem.

Maybe someone else can test it beforehand...


thanks,

Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi,

This particular crash was hard to diagnose because of two reasons:

* CPU would happily use userspace RSP in kernel mode. Crash comes only
  later, when we run off the stack. We lose information when it started.
* Kernel's error handling code is ill prepared for RSP pointing to
  user stack. So we take another page fault trying to dump stack.

I prepared a patch which helps with both problems.

For testing, I inserted an invalid instruction right before SYSRET
to induce a similar bug, and booted resulting kernel in qemu.

Before my patch, double fault output starts like this:

[0.715216] PANIC: double fault, error_code: 0x0
[0.716033] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #7
[0.716033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[0.716033] task: 880007588000 ti: 88000759 task.ti: 88000759
[0.716033] RIP: 0010:[81017057] [81017057] do_error_trap+0x47/0x120
[0.716033] RSP: 0018:7ffd89e7ffb8 EFLAGS: 00010006

The key here is that it doesn't show at which RIP we took the first
"bad" exception. The only useful detail visible here is bad RSP.
"do_error_trap+0x47" is useless.

After the patch, the very moment of "bad" exception is caught:

[0.666758] Exception on user stack 7ffc1fd0c388: RSP: 0018:7ffc1fd0c3b0 EFLAGS: 00010006
[0.667285] RIP: 0010:[81793688] [81793688] ret_from_sys_call+0x5f/0x67
[0.667285] PANIC: double fault, error_code: 0x
[0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[0.667285] task: 880007588000 ti: 88000759 task.ti: 88000759
[0.667285] RIP: 0010:[81793688] [81793688] ret_from_sys_call+0x5f/0x67
[0.667285] RSP: 0018:7ffc1fd0c3b0 EFLAGS: 00010006

The exception happened at "ret_from_sys_call+0x5f".

We also won't take another page fault any more, output proceeds like this:

...
[0.667285] RAX: 07a0 RBX: 7ffc1fd0c4e0 RCX: c101
[0.667285] RDX: 8800 RSI: 5401 RDI: 7ffc1fd0c388
[0.667285] RBP: 7ffc1fd0c570 R08: 0010 R09:
[0.667285] R10: 7ffc1fd0c650 R11: 0202 R12: 0120
[0.667285] R13: 005f7b78 R14: R15: 004c9d44
[0.667285] FS: () GS:880007a0() knlGS:
[0.667285] CS: 0010 DS: ES: CR0: 8005003b
[0.667285] CR2: 004ad1e4 CR3: 00101000 CR4: 07f0
[0.667285] Stack:
[0.667285] 0018 7ffc1fd0c490 7ffc1fd0c3d0
[0.667285] 7ffc1fd0c490
[0.667285]
[0.667285] Call Trace:
[0.667285] UNK
[0.667285] Code: 8b 44 24 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 48 8b 8c 24 80 00 00 00 4c 8b 9c 24 90 00 00 00 48 8b a4 24 98 00 00 00 <0f> 0b 0f 01 f8 48 0f 07 48 c7 84 24 a0 00 00 00 2b 00 00 00 48
[0.667285] Kernel panic - not syncing: Machine halted.
[0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[0.667285] 880007593e28 81789625 880007588000
[0.667285] 81a3b181 880007593ea8 817840aa 88000759
[0.667285] 0008 880007593eb8 880007593e58 0001
[0.667285] Call Trace:
[0.667285] [81789625] dump_stack+0x4c/0x65
[0.667285] [817840aa] panic+0xc6/0x1ff
[0.667285] [81059ee5] df_debug+0x35/0x40
[0.667285] [81017e37] do_double_fault+0x87/0x100
[0.667285] [81017fb7] do_userpsace_rsp_in_kernel+0x107/0x140
[0.667285] [81793688] ? ret_from_sys_call+0x5f/0x67
[0.667285] [81795b49] userpsace_rsp_in_kernel+0x39/0x40
[0.667285] [81793688] ? ret_from_sys_call+0x5f/0x67
[0.667285] Kernel Offset: disabled
[0.667285] Rebooting in 1 seconds..

Takashi, are you willing to reproduce the panic one more time,
with this patch? I would like to see whether oops messages
are more informative with it.

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7d..92a35e6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -70,6 +70,7 @@ dotraplinkage void do_segment_not_present(struct pt_regs *, long);
 dotraplinkage void do_stack_segment(struct pt_regs *, long);
 #ifdef CONFIG_X86_64
 dotraplinkage void do_double_fault(struct pt_regs *, long);
+dotraplinkage void do_userpsace_rsp_in_kernel(struct pt_regs *regs);
 asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0c91256..fb85c26 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -958,6 +958,12 @@ ENTRY(\sym)
         INTR_FRAME
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Thu, Mar 19, 2015 at 8:51 AM, Takashi Iwai wrote:
> At Thu, 19 Mar 2015 08:41:57 -0700,
> Andy Lutomirski wrote:
>> so with or without your little patch, we're turning off IRQs very
>> quickly. retint_swapgs also turns off interrupts before doing
>> anything. So I don't see how your patch would have any effect.
>
> What about LOCKDEP_SYS_EXIT?

There's a LOCKDEP_SYS_EXIT_IRQ a few lines down in
int_ret_from_sys_call, and the syscall slow path falls through
directly to int_ret_from_sys_call.

I'm going to try to write a diagnostic patch now. I have four
separate contractors coming starting half an hour ago*, so it might
take a while.

* Yeah, right.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 08:41:57 -0700, Andy Lutomirski wrote:
> The crash you're seeing could certainly be caused by an IRQ at the
> wrong time. However:
>
> int_ret_from_sys_call_fixup:
>         FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>         jmp int_ret_from_sys_call
>
> and
>
> GLOBAL(int_ret_from_sys_call)
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
>
> so with or without your little patch, we're turning off IRQs very
> quickly. retint_swapgs also turns off interrupts before doing
> anything. So I don't see how your patch would have any effect.

What about LOCKDEP_SYS_EXIT?


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Thu, Mar 19, 2015 at 8:22 AM, Takashi Iwai wrote:
> It survived long enough, so this looks like the spot.
>
> Also, I checked the patch below instead of reverting the commit, and
> this seems working, too.
>
>
> Takashi
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1d74d161687c..5340ac7f88a9 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -364,12 +364,12 @@ system_call_fastpath:
>   * Has incomplete stack frame and undefined top of stack.
>   */
>  ret_from_sys_call:
> -        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> -        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
> -
>          LOCKDEP_SYS_EXIT
>          DISABLE_INTERRUPTS(CLBR_NONE)
>          TRACE_IRQS_OFF
> +        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> +        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
> +
>          CFI_REMEMBER_STATE
>          /*
>           * sysretq will re-enable interrupts:

The crash you're seeing could certainly be caused by an IRQ at the
wrong time. However:

int_ret_from_sys_call_fixup:
        FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
        jmp int_ret_from_sys_call

and

GLOBAL(int_ret_from_sys_call)
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF

so with or without your little patch, we're turning off IRQs very
quickly. retint_swapgs also turns off interrupts before doing
anything. So I don't see how your patch would have any effect.

I'm starting to wonder if the problem has something to do with running
fire_user_return_notifiers with IRQs on. We appear to do that, and it
seems rather questionable to me that it's safe, given the sneaky
things that KVM does in there. If we end up in user mode with a bad
MSR_SYSCALL_MASK, we could see your crash, although I don't see how
that would happen either.

I'll try to write a diagnostic patch later this morning.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 15:55:26 +0100, Takashi Iwai wrote:
> Reverting the commit
>    96b6352c12711d5c0bb7157f49c92580248e8146
>    x86_64, entry: Remove the syscall exit audit and schedule optimizations
>
> seems enough. After reverting this one, the machine runs stable with
> the kvm stress test.
>
> (I'll keep test running for a while; at the previous bisection, I hit
> the bug right after posting the mail ;)

It survived long enough, so this looks like the spot.

Also, I checked the patch below instead of reverting the commit, and
this seems working, too.


Takashi

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1d74d161687c..5340ac7f88a9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -364,12 +364,12 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
-
         LOCKDEP_SYS_EXIT
         DISABLE_INTERRUPTS(CLBR_NONE)
         TRACE_IRQS_OFF
+        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
+
         CFI_REMEMBER_STATE
         /*
          * sysretq will re-enable interrupts:
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 14:47:12 +0100, Takashi Iwai wrote:
> At Thu, 19 Mar 2015 13:48:56 +0100,
> Denys Vlasenko wrote:
>> Try reverting them in sequence starting from x0001.patch
>> and see reverting which one makes crash disappear.
>
> OK, I'm going to check these git series.

Reverting the commit
   96b6352c12711d5c0bb7157f49c92580248e8146
   x86_64, entry: Remove the syscall exit audit and schedule optimizations

seems enough. After reverting this one, the machine runs stable with
the kvm stress test.

(I'll keep test running for a while; at the previous bisection, I hit
the bug right after posting the mail ;)

BTW, I also tried to reproduce this on another machine (a Haswell
laptop), but I failed, even with the very same kernel. So the bug
really seems depending on CPU.


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 13:48:56 +0100,
Denys Vlasenko wrote:
>
> Having no more ideas at the moment, here is a tarball of 13 patches
> of commits touching entry_64.S up to 4.0.0-rc1.
>
> x0001.patch is the latest, x0015.patch is the oldest.
>
> Patches 0003 and 0008 are not there since 0003 is empty merge patch
> and 0008 does some PCI fixup.
>
> If this breakage is recent, it ought to be one of these.
> Most of them do some non-trivial surgery.
>
> Even though I did not spot anything suspicious in them,
> entry.S is notorious for subtle breakage.
>
> Try reverting them in sequence starting from x0001.patch
> and see reverting which one makes crash disappear.

OK, I'm going to check these git series.


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/18/2015 10:55 PM, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 2:42 PM, Denys Vlasenko wrote:
>>> in 'irq_return_via_sysret' is new to 4.0, and instead of entering the
>>> kernel with a user stack pointer, maybe we're *exiting* the kernel,
>>> and have just reloaded the user stack pointer when "USERGS_SYSRET64"
>>> takes some fault.
>>
>> Yes, so far we happily thought that SYSRET never fails...
>>
>> This merits adding some code which would at least BUG_ON
>> if the faulting address is seen to match SYSRET64.
>
> sysret64 can only fail with #GP, and we're totally screwed if that
> happens, although I agree about the BUG_ON in principle.  Where would
> we add it that would help in this case, though?  We never even made it
> to C code.

I propose to widen such check to catch any cases where we enter
an exception from CPL0 and find that our RSP is bad.
This will cover the case of faulting SYSRET and possible
future obscure bugs.

What this patch does is stop the CPU dead if it finds itself
with a userspace RSP (not the saved RSP, but the _actual_ %RSP register)
in an exception handler prologue:

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a0a3a6e..53a34ba 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -930,6 +930,12 @@ ENTRY(\sym)
         INTR_FRAME
         .endif

+        testq   %rsp,%rsp
+        /* If RSP is positive, we are in kernel but have userspace RSP. */
+        /* We corrupted user stack already by storing iret frame there. */
+        /* This is supposed to be impossible. */
+0:      jns     0b
+
         ASM_CLAC
         PARAVIRT_ADJUST_EXCEPTION_FRAME

Hopefully then NMI watchdog will kill it, and we'll get better data.
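For readers less used to the assembly idiom: the `testq %rsp,%rsp` / `jns` pair in the patch above tests bit 63 of the stack pointer. On x86-64 Linux, kernel addresses sit in the upper canonical half of the address space (bit 63 set, so negative when read as a signed value), while user addresses have bit 63 clear. Below is a minimal C sketch of the same predicate. The function names and the two sample addresses are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not a kernel function).
 * Mirrors "testq %rsp,%rsp; jns ...": TEST sets the sign flag from bit 63,
 * and JNS is taken when that bit is clear.
 */
bool rsp_looks_like_userspace(uint64_t rsp)
{
    /* Bit 63 clear => lower half of the address space => not a kernel stack. */
    return (rsp >> 63) == 0;
}

/* Small self-check with two illustrative (made-up) stack pointer values. */
void rsp_check_demo(void)
{
    assert(!rsp_looks_like_userspace(0xffff880000000000ULL)); /* upper half */
    assert(rsp_looks_like_userspace(0x00007ffd12345000ULL));  /* lower half */
}
```

In the patch the "true" case spins forever in `0: jns 0b` rather than returning, so that the NMI watchdog can report the state.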
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Having no more ideas at the moment, here is a tarball of 13 patches
of commits touching entry_64.S up to 4.0.0-rc1.

x0001.patch is the latest, x0015.patch is the oldest.

Patches 0003 and 0008 are not there since 0003 is empty merge patch
and 0008 does some PCI fixup.

If this breakage is recent, it ought to be one of these.
Most of them do some non-trivial surgery.

Even though I did not spot anything suspicious in them,
entry.S is notorious for subtle breakage.

Try reverting them in sequence starting from x0001.patch
and see reverting which one makes crash disappear.

revert_me_13.tar.gz
Description: GNU Zip compressed data
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 11:58:19 +0100,
Denys Vlasenko wrote:
>
> On 03/19/2015 11:16 AM, Takashi Iwai wrote:
> > The kconfig is attached
>
> You also have PARAVIRT enabled, like Stefan.
>
> Just to obtain an additional data point, can you guys
> try reproducing it with PARAVIRT off?
>
> It won't help us that much if it won't trigger with PARAVIRT off
> (the bug may just become much harder to trigger), but if it would
> still happen, that'd reduce the number of things we can suspect.

I tried w/o PARAVIRT and the bug is still seen.
The dmesg is attached below.


Takashi

[0.00] CPU0 microcode updated early to revision 0x1b, date = 2014-05-29
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 4.0.0-rc4-testz+ (tiwai@alsa1) (gcc version 4.8.3 20141208 [gcc-4_8-branch revision 218481] (SUSE Linux) ) #126 SMP PREEMPT Thu Mar 19 12:07:56 CET 2015
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.0.0-rc4-testz+ root=UUID=1190c997-9457-4dde-8a57-0cce0aae93c6 resume=/dev/disk/by-id/ata-INTEL_SSDSA2M080G2GN_CVPO9412011S080BGN-part1 splash=silent quiet showopts crashkernel=512M-:256M
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1fff] usable
[0.00] BIOS-e820: [mem 0x2000-0x201f] reserved
[0.00] BIOS-e820: [mem 0x2020-0x40003fff] usable
[0.00] BIOS-e820: [mem 0x40004000-0x40004fff] reserved
[0.00] BIOS-e820: [mem 0x40005000-0xd6709fff] usable
[0.00] BIOS-e820: [mem 0xd670a000-0xd67f] reserved
[0.00] BIOS-e820: [mem 0xd680-0xd6f55fff] usable
[0.00] BIOS-e820: [mem 0xd6f56000-0xd6ff] reserved
[0.00] BIOS-e820: [mem 0xd700-0xd77b3fff] usable
[0.00] BIOS-e820: [mem 0xd77b4000-0xd77f] ACPI data
[0.00] BIOS-e820: [mem 0xd780-0xd8f1dfff] usable
[0.00] BIOS-e820: [mem 0xd8f1e000-0xd8ff] ACPI NVS
[0.00] BIOS-e820: [mem 0xd900-0xda6e2fff] usable
[0.00] BIOS-e820: [mem 0xda6e3000-0xda8e1fff] reserved
[0.00] BIOS-e820: [mem 0xda8e2000-0xda924fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xda925000-0xdaff] usable
[0.00] BIOS-e820: [mem 0xdb80-0xdf9f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00021e5f] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x21e600 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-D3FFF write-protect
[0.00]   D4000-E7FFF uncachable
[0.00]   E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0 mask E write-back
[0.00]   1 base 2 mask FE000 write-back
[0.00]   2 base 0E000 mask FE000 uncachable
[0.00]   3 base 0DC00 mask FFC00 uncachable
[0.00]   4 base 0DB80 mask FFF80 uncachable
[0.00]   5 base 21F00 mask FFF00 uncachable
[0.00]   6 base 21E80 mask FFF80 uncachable
[0.00]   7 base 21E60 mask FFFE0 uncachable
[0.00]   8 disabled
[0.00]   9 disabled
[0.00] PAT configuration [0-7]: WB WC UC- UC WB WC UC- UC
[0.00] e820: update [mem 0xdb80-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xdb000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fda40-0x000fda4f] mapped at [880fda40]
[
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/19/2015 11:16 AM, Takashi Iwai wrote:
> The kconfig is attached

You also have PARAVIRT enabled, like Stefan.

Just to obtain an additional data point, can you guys
try reproducing it with PARAVIRT off?

It won't help us that much if it won't trigger with PARAVIRT off
(the bug may just become much harder to trigger), but if it would
still happen, that'd reduce the number of things we can suspect.
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi,

sorry for taking so long to get back to this topic.

At Wed, 18 Mar 2015 15:29:14 -0700,
Andy Lutomirski wrote:
>
> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote:
> > On Wed, 18 Mar 2015, Andy Lutomirski wrote:
> >
> >> sysret64 can only fail with #GP, and we're totally screwed if that
> >> happens,
> >
> > But what if the GPF handler pagefaults afterwards? It'd be operating on
> > user stack already.
>
> Good point.
>
> Stefan, can you try changing the first "jne
> opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in
> entry_64.S and seeing if you can reproduce this?  (Is it easy enough
> to reproduce that this would tell us anything?)

I tried this, and the same crash still happens.

On my machine (a Dell desktop with IvyBridge 4-core, 8GB RAM), I could
reproduce it relatively easily.  Start a desktop session as usual,
start a KVM guest with 1GB memory and 4 CPUs, and start compiling a
kernel in the VM with make -j4.  Meanwhile, start compiling a kernel
with make -j8 on the host, too.  So nothing too special there.

The kconfig is attached below.  Currently I haven't set up kdump on
this machine due to lack of disk space.  I will try to sort that out
from now on.


Takashi

.config
Description: Binary data
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Good Morning :-)

On 19.03.2015 at 01:57, Andy Lutomirski wrote:
> Stefan, do you happen to know whether your disassembly of page_fault
> came from the instructions in memory or if they came from the vmlinux
> file?  Not that I have any relevant ideas there.

I think they came from memory.
At least, the disassemble in crash...

crash> disassemble page_fault
Dump of assembler code for function page_fault:
   0x816834a0 <+0>:     data32 xchg %ax,%ax
   0x816834a3 <+3>:     data32 xchg %ax,%ax
   0x816834a6 <+6>:     data32 xchg %ax,%ax
   0x816834a9 <+9>:     sub    $0x78,%rsp
   0x816834ad <+13>:    callq  0x81683620 <error_entry>
   0x816834b2 <+18>:    mov    %rsp,%rdi
   0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0x816834ba <+26>:    movq   $0x,0x78(%rsp)
   0x816834c3 <+35>:    callq  0x810504e0 <do_page_fault>
   0x816834c8 <+40>:    jmpq   0x816836d0 <error_exit>
End of assembler dump.

...is different than the one from loading vmlinux in gdb:

Reading symbols from vmlinux-4.0.0-rc3-2.gd5c547f-desktop...done.
Reading symbols from /usr/lib/debug/boot/vmlinux-4.0.0-rc3-2.gd5c547f-desktop.debug...done.
(gdb) disassemble page_fault
Dump of assembler code for function page_fault:
   0x816834a0 <+0>:     data16 xchg %ax,%ax
   0x816834a3 <+3>:     callq  *0x7a5b07(%rip)        # 0x81e28fb0 <pv_irq_ops+48>
   0x816834a9 <+9>:     sub    $0x78,%rsp
   0x816834ad <+13>:    callq  0x81683620 <error_entry>
   0x816834b2 <+18>:    mov    %rsp,%rdi
   0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0x816834ba <+26>:    movq   $0x,0x78(%rsp)
   0x816834c3 <+35>:    callq  0x810504e0 <do_page_fault>
   0x816834c8 <+40>:    jmpq   0x816836d0 <error_exit>
End of assembler dump.

Best regards,

	Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
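A note on reading the two listings: in the vmlinux version, the indirect call at offset +3 is 6 bytes long (it ends where `sub` starts at +9), which is exactly the space taken by the two 3-byte `data32 xchg %ax,%ax` no-ops in the in-memory version. That is what one would expect if this paravirt call site was patched to no-ops at boot, although the thread itself does not confirm it. The annotated target of the RIP-relative call can be checked by hand: it is the address of the next instruction plus the displacement. The following is a small C sketch of that arithmetic; the function names are made up for illustration, and the addresses are used exactly as they are printed in the listing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name): the memory operand of a
 * RIP-relative instruction is located at
 *     address of the *next* instruction + sign-extended 32-bit displacement.
 */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Check the "# 0x81e28fb0" annotation from the gdb listing of page_fault+3. */
void rip_relative_demo(void)
{
    /* callq *0x7a5b07(%rip) at 0x816834a3, 6 bytes long (next insn at +9). */
    assert(rip_relative_target(0x816834a3ULL, 6, 0x7a5b07) == 0x81e28fb0ULL);
}
```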
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Thu, Mar 19, 2015 at 8:51 AM, Takashi Iwai <ti...@suse.de> wrote:
> At Thu, 19 Mar 2015 08:41:57 -0700,
> Andy Lutomirski wrote:
>> On Thu, Mar 19, 2015 at 8:22 AM, Takashi Iwai <ti...@suse.de> wrote:
>>> At Thu, 19 Mar 2015 15:55:26 +0100,
>>> Takashi Iwai wrote:
>>>> At Thu, 19 Mar 2015 14:47:12 +0100,
>>>> Takashi Iwai wrote:
>>>>> At Thu, 19 Mar 2015 13:48:56 +0100,
>>>>> Denys Vlasenko wrote:
>>>>>> Having no more ideas at the moment, here is a tarball of 13 patches
>>>>>> of commits touching entry_64.S up to 4.0.0-rc1.
>>>>>>
>>>>>> x0001.patch is the latest, x0015.patch is the oldest.
>>>>>>
>>>>>> Patches 0003 and 0008 are not there since 0003 is empty merge patch
>>>>>> and 0008 does some PCI fixup.
>>>>>>
>>>>>> If this breakage is recent, it ought to be one of these.
>>>>>> Most of them do some non-trivial surgery.
>>>>>>
>>>>>> Even though I did not spot anything suspicious in them,
>>>>>> entry.S is notorious for subtle breakage.
>>>>>>
>>>>>> Try reverting them in sequence starting from x0001.patch
>>>>>> and see reverting which one makes crash disappear.
>>>>>
>>>>> OK, I'm going to check these git series.
>>>>
>>>> Reverting the commit 96b6352c12711d5c0bb7157f49c92580248e8146
>>>>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
>>>> seems enough.  After reverting this one, the machine runs stable with
>>>> the kvm stress test.
>>>>
>>>> (I'll keep test running for a while; at the previous bisection, I hit
>>>>  the bug right after posting the mail ;)
>>>
>>> It survived long enough, so this looks like the spot.
>>>
>>> Also, I checked the patch below instead of reverting the commit, and
>>> this seems working, too.
>>>
>>>
>>> Takashi
>>>
>>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>>> index 1d74d161687c..5340ac7f88a9 100644
>>> --- a/arch/x86/kernel/entry_64.S
>>> +++ b/arch/x86/kernel/entry_64.S
>>> @@ -364,12 +364,12 @@ system_call_fastpath:
>>>   * Has incomplete stack frame and undefined top of stack.
>>>   */
>>>  ret_from_sys_call:
>>> -        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>>> -        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>>> -
>>>          LOCKDEP_SYS_EXIT
>>>          DISABLE_INTERRUPTS(CLBR_NONE)
>>>          TRACE_IRQS_OFF
>>> +        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>>> +        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>>> +
>>>          CFI_REMEMBER_STATE
>>>          /*
>>>           * sysretq will re-enable interrupts:
>>
>> The crash you're seeing could certainly be caused by an IRQ at the
>> wrong time.  However:
>>
>> int_ret_from_sys_call_fixup:
>>         FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>>         jmp int_ret_from_sys_call
>>
>> and
>>
>> GLOBAL(int_ret_from_sys_call)
>>         DISABLE_INTERRUPTS(CLBR_NONE)
>>         TRACE_IRQS_OFF
>>
>> so with or without your little patch, we're turning off IRQs very
>> quickly.  retint_swapgs also turns off interrupts before doing
>> anything.  So I don't see how your patch would have any effect.
>
> What about LOCKDEP_SYS_EXIT?

There's a LOCKDEP_SYS_EXIT_IRQ a few lines down in
int_ret_from_sys_call, and the syscall slow path falls through
directly to int_ret_from_sys_call.

I'm going to try to write a diagnostic patch now.  I have four
separate contractors coming starting half an hour ago*, so it might
take a while.

* Yeah, right.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 15:55:26 +0100,
Takashi Iwai wrote:
> At Thu, 19 Mar 2015 14:47:12 +0100,
> Takashi Iwai wrote:
>> At Thu, 19 Mar 2015 13:48:56 +0100,
>> Denys Vlasenko wrote:
>>> Having no more ideas at the moment, here is a tarball of 13 patches
>>> of commits touching entry_64.S up to 4.0.0-rc1.
>>>
>>> x0001.patch is the latest, x0015.patch is the oldest.
>>>
>>> Patches 0003 and 0008 are not there since 0003 is empty merge patch
>>> and 0008 does some PCI fixup.
>>>
>>> If this breakage is recent, it ought to be one of these.
>>> Most of them do some non-trivial surgery.
>>>
>>> Even though I did not spot anything suspicious in them,
>>> entry.S is notorious for subtle breakage.
>>>
>>> Try reverting them in sequence starting from x0001.patch
>>> and see reverting which one makes crash disappear.
>>
>> OK, I'm going to check these git series.
>
> Reverting the commit 96b6352c12711d5c0bb7157f49c92580248e8146
>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
> seems enough.  After reverting this one, the machine runs stable with
> the kvm stress test.
>
> (I'll keep test running for a while; at the previous bisection, I hit
>  the bug right after posting the mail ;)

It survived long enough, so this looks like the spot.

Also, I checked the patch below instead of reverting the commit, and
this seems working, too.


Takashi

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1d74d161687c..5340ac7f88a9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -364,12 +364,12 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
-
         LOCKDEP_SYS_EXIT
         DISABLE_INTERRUPTS(CLBR_NONE)
         TRACE_IRQS_OFF
+        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
+
         CFI_REMEMBER_STATE
         /*
          * sysretq will re-enable interrupts:
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Thu, 19 Mar 2015 08:41:57 -0700,
Andy Lutomirski wrote:
> On Thu, Mar 19, 2015 at 8:22 AM, Takashi Iwai <ti...@suse.de> wrote:
>> At Thu, 19 Mar 2015 15:55:26 +0100,
>> Takashi Iwai wrote:
>>> At Thu, 19 Mar 2015 14:47:12 +0100,
>>> Takashi Iwai wrote:
>>>> At Thu, 19 Mar 2015 13:48:56 +0100,
>>>> Denys Vlasenko wrote:
>>>>> Having no more ideas at the moment, here is a tarball of 13 patches
>>>>> of commits touching entry_64.S up to 4.0.0-rc1.
>>>>>
>>>>> x0001.patch is the latest, x0015.patch is the oldest.
>>>>>
>>>>> Patches 0003 and 0008 are not there since 0003 is empty merge patch
>>>>> and 0008 does some PCI fixup.
>>>>>
>>>>> If this breakage is recent, it ought to be one of these.
>>>>> Most of them do some non-trivial surgery.
>>>>>
>>>>> Even though I did not spot anything suspicious in them,
>>>>> entry.S is notorious for subtle breakage.
>>>>>
>>>>> Try reverting them in sequence starting from x0001.patch
>>>>> and see reverting which one makes crash disappear.
>>>>
>>>> OK, I'm going to check these git series.
>>>
>>> Reverting the commit 96b6352c12711d5c0bb7157f49c92580248e8146
>>>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
>>> seems enough.  After reverting this one, the machine runs stable with
>>> the kvm stress test.
>>>
>>> (I'll keep test running for a while; at the previous bisection, I hit
>>>  the bug right after posting the mail ;)
>>
>> It survived long enough, so this looks like the spot.
>>
>> Also, I checked the patch below instead of reverting the commit, and
>> this seems working, too.
>>
>>
>> Takashi
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 1d74d161687c..5340ac7f88a9 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -364,12 +364,12 @@ system_call_fastpath:
>>   * Has incomplete stack frame and undefined top of stack.
>>   */
>>  ret_from_sys_call:
>> -        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> -        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>> -
>>          LOCKDEP_SYS_EXIT
>>          DISABLE_INTERRUPTS(CLBR_NONE)
>>          TRACE_IRQS_OFF
>> +        testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> +        jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>> +
>>          CFI_REMEMBER_STATE
>>          /*
>>           * sysretq will re-enable interrupts:
>
> The crash you're seeing could certainly be caused by an IRQ at the
> wrong time.  However:
>
> int_ret_from_sys_call_fixup:
>         FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>         jmp int_ret_from_sys_call
>
> and
>
> GLOBAL(int_ret_from_sys_call)
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
>
> so with or without your little patch, we're turning off IRQs very
> quickly.  retint_swapgs also turns off interrupts before doing
> anything.  So I don't see how your patch would have any effect.

What about LOCKDEP_SYS_EXIT?


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Thu, Mar 19, 2015 at 8:22 AM, Takashi Iwai ti...@suse.de wrote: At Thu, 19 Mar 2015 15:55:26 +0100, Takashi Iwai wrote: At Thu, 19 Mar 2015 14:47:12 +0100, Takashi Iwai wrote: At Thu, 19 Mar 2015 13:48:56 +0100, Denys Vlasenko wrote: Having no more ideas at the moment, here is a tarball of 13 patches of commits touching entry_64.S up to 4.0.0-rc1. x0001.patch is the latest, x0015.patch is the oldest. Patches 0003 and 0008 are not there since 0003 is empty merge patch and 0008 does some PCI fixup. If this breakage is recent, it ought to be one of these. Most of them do some non-trivial surgery. Even though I did not spot anything suspicious in them, entry.S is notorious for subtle breakage. Try reverting them in sequence starting from x0001.patch and see reverting which one makes crash disappear. OK, I'm going to check these git series. Reverting the commit 96b6352c12711d5c0bb7157f49c92580248e8146 x86_64, entry: Remove the syscall exit audit and schedule optimizations seems enough. After reverting this one, the machine runs stable with the kvm stress test. (I'll keep test running for a while; at the previous bisection, I hit the bug right after posting the mail ;) It survived long enough, so this looks like the spot. Also, I checked the patch below instead of reverting the commit, and this seems working, too. Takashi diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 1d74d161687c..5340ac7f88a9 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -364,12 +364,12 @@ system_call_fastpath: * Has incomplete stack frame and undefined top of stack. 
*/ ret_from_sys_call: - testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) - jnz int_ret_from_sys_call_fixup /* Go the the slow path */ - LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF + testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) + jnz int_ret_from_sys_call_fixup /* Go the the slow path */ + CFI_REMEMBER_STATE /* * sysretq will re-enable interrupts: The crash you're seeing could certainly be caused by an IRQ at the wrong time. However: int_ret_from_sys_call_fixup: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_ret_from_sys_call and GLOBAL(int_ret_from_sys_call) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF so with or without your little patch, we're turning off IRQs very quickly. retint_swapgs also turnes off interrupts before doing anything. So I don't see how your patch would have any effect. I'm starting to wonder if the problem has something to do with running fire_user_return_notifiers with IRQs on. We appear to do that, and it seems rather questionable to me that it's safe, given the sneaky things that KVM does in there. If we end up in user mode with a bad MSR_SYSCALL_MASK, we could see your crash, although I don't see how that would happen either. I'll try to write a diagnostic patch later this morning. --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 5:57 PM, Andy Lutomirski wrote:
>
>> sp = 140735967860552,
>
> 0x7fffa55f1748
>
> Note that the double fault happened with rsp == 0x7fffa55eafb8,
> which is the saved rsp here - 0x6790. That difference is kind of large
> to make sense if this is a sysret problem. Not that I have a better
> explanation...

Actually, that kind of large difference is what I'd expect if it's a GP fault on sysret that then cascades to more faults because our kernel stack pointer is crap.

So it starts with getting a GP fault due to the sysret, but now we're in la-la-land with really odd core register state, so who's to say that we don't get a recursive fault.

We don't use the kernel stack pointer for getting thread-info any more like we used to, but we still have code like this in entry_64.S:

    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)

which seems to know that the thread info is below the kernel stack.

So let's say that the GP fault starts taking recursive GP faults (or recursive page faults) due to confusion with thread_info accesses or something. And the stack keeps growing down, because all the faults just fault themselves.

Until finally we hit an unmapped area, and that stops it - because while we had recursive faulting before, it was our kernel code that was confused. But now the fault handling ends up taking a page fault while setting up the error information.

You would *not* expect the stack to be unmapped just under the original %rsp value. User space has big frames and probably had deep call chains before it ever hit the problematic case, so there's some "slop" on the user stack. Only when we run out of slop do we get the double-fault.

Which explains why you should *not* expect the %rsp values to be similar. And around 30kB of stack before that happens sounds quite reasonable.
Now, to be honest, I don't see why we'd get the cascading faults, I just get this feeling that if %rsp is crap, just about anything might go wrong, and that if it's sysret taking a #GP fault, we're just screwed.

Linus
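As a quick check of the arithmetic in this message, here is a minimal C sketch that recomputes the gap between the saved user sp from the dumped pt_regs and the rsp reported at the double fault. The two addresses come from this thread; the helper names and the 4 KiB page size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses quoted in this thread. */
#define SAVED_USER_SP   0x7fffa55f1748ULL /* sp in the dumped pt_regs */
#define DOUBLE_FAULT_SP 0x7fffa55eafb8ULL /* rsp reported at the #DF */

/*
 * Hypothetical helper: bytes of stack consumed between two stack
 * pointers. The stack grows down, so the older pointer is the higher one.
 */
static uint64_t stack_gap_bytes(uint64_t older_sp, uint64_t newer_sp)
{
    assert(older_sp >= newer_sp);
    return older_sp - newer_sp;
}

/*
 * Hypothetical helper: number of page boundaries between the two
 * pointers, assuming 4 KiB pages (shift of 12).
 */
static uint64_t pages_crossed(uint64_t older_sp, uint64_t newer_sp)
{
    assert(older_sp >= newer_sp);
    return (older_sp >> 12) - (newer_sp >> 12);
}
```

Calling stack_gap_bytes(SAVED_USER_SP, DOUBLE_FAULT_SP) yields 0x6790, which is 26512 bytes and spans 7 page boundaries under the 4 KiB assumption. That is in the same ballpark as the "around 30kB" estimate above.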
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 5:23 PM, Stefan Seyfried wrote: > Am 19.03.2015 um 00:22 schrieb Andy Lutomirski: >> On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski wrote: >>> Yes, it's userspace. Thanks for checking, though. >> >> One more stupid hunch: >> >> Can you do: >> x/21xg 8801013d4f58 >> >> If I counted right, that'll dump task_pt_regs(current). > > That's all zeroes: > crash> x /21xg 0x8801013d4f58 > 0x8801013d4f58: 0x 0x > 0x8801013d4f68: 0x 0x > 0x8801013d4f78: 0x 0x > 0x8801013d4f88: 0x 0x > 0x8801013d4f98: 0x 0x > 0x8801013d4fa8: 0x 0x > 0x8801013d4fb8: 0x 0x > 0x8801013d4fc8: 0x 0x > 0x8801013d4fd8: 0x 0x > 0x8801013d4fe8: 0x 0x > 0x8801013d4ff8: 0x > > But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h > wrong, which is at least as likely...). > > #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) > > => I have the task_struct readily available decoded in the crash utility. > > crash> task, search for thread, in thread: > sp0 = 18446612136629993472 > crash> eval 18446612136629993472 > hexadecimal: 8801013d8000 (18014269664677728KB) I did indeed count wrong -- THREAD_SIZE != 0x1000. Whoops. > > crash> print *(struct pt_regs *)(18446612136629993472 - sizeof(struct > pt_regs)) Looks like we last entered via an io_submit syscall. > $20 = { > r15 = 18446744071585666077, > r14 = 16, > r13 = 582, > r12 = 18446612136629993352, > bp = 24, > bx = 18446744071585666061, > r11 = 582, ==flags, which is consistent with a syscall. However, Denys' big cleanup isn't in play here, so we probably did FIXUP_TOP_OF_STACK, maybe even in the syscall in question. > r10 = 10760856, > r9 = 140712613762160, > r8 = 140735967861216, > ax = 1, Entirely resonable if we're trying to exit from io_submit. 
> cx = 140712476030103,

0x7ffa2d263497

> dx = 140712613782304,
> si = 1,
> di = 140712589295616,
> orig_ax = 209,

__NR_io_submit

> ip = 140712571864823,

0x7ffa32dc86f7, which is not equal to cx (oddly, given that this seems to have been a syscall) and is canonical.

To me, this suggests that FIXUP_TOP_OF_STACK last executed on a different syscall, in which case all this opportunistic sysret stuff is a red herring - we never executed FIXUP_TOP_OF_STACK for this syscall.

> cs = 51,

__USER_CS

> flags = 582,

0x246 (i.e. totally normal for userspace, I think)

> sp = 140735967860552,

0x7fffa55f1748

Note that the double fault happened with rsp == 0x7fffa55eafb8, which is the saved rsp here - 0x6790. That difference is kind of large to make sense if this is a sysret problem. Not that I have a better explanation...

OTOH, if it's a syscall problem, then these regs are from the previous syscall, so 0x6790 bytes of additional user stack usage is entirely sensible. Alternatively, we could have taken a whole pile of nested page faults until we crossed into the land of unwritable user stack pages.

> ss = 43

__USER_DS

> }
>
> =>
> r15 = 8168141d
> r12 = 8801013d7f88
> bx = 8168140d
> r9 = 7ffa355bd470
> ip = 7ffa32dc86f7
> sp = 7fffa55f1748
>
> looks somehow legit, to my totally untrained eye (ip and sp actually).

One potentially interesting thing that changed is that we now return from KVM to userspace (to the next scheduled task, not necessarily to the run ioctl) via sysret *even if the user return notifier runs*. This was part of the point of the opportunistic sysret code, and KVM seems to be involved here.

>
> I'm off to bed now (01:20 around here ;), will be back in about 7 hours.

Thanks for the evening debugging help :)

FWIW, I just noticed that stub_execveat incorrectly calls RESTORE_TOP_OF_STACK before jumping to int_ret_from_sys_call. Actually, there seems to be an impressive number of bugs like that (the syscall slow path totally screws this up, but it seems harmless to me).
I'm really glad that Denys is removing that code...

Stefan, do you happen to know whether your disassembly of page_fault came from the instructions in memory or if they came from the vmlinux file? Not that I have any relevant ideas there.

--Andy
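The annotations on cs, ss and flags in this message can be checked mechanically. The sketch below decodes the decimal values from the dump using the architectural layout of RFLAGS and of segment selectors (RPL in the low two bits, table indicator in bit 2, descriptor index above that). The macro and function names are made up for this illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits; the names are local to this sketch. */
#define FLAG_ALWAYS_ONE (1u << 1) /* reserved bit, always reads as 1 */
#define FLAG_PF         (1u << 2) /* parity flag */
#define FLAG_ZF         (1u << 6) /* zero flag */
#define FLAG_IF         (1u << 9) /* interrupt enable flag */

/* Requested privilege level: low two bits of a segment selector. */
static unsigned selector_rpl(uint16_t sel)
{
    return sel & 3u;
}

/* Descriptor table index: selector with RPL and TI bits shifted out. */
static unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* True if the saved flags would run with interrupts enabled. */
static bool interrupts_enabled(uint64_t flags)
{
    return (flags & FLAG_IF) != 0;
}
```

With the values from the dump, flags = 582 is 0x246, which is exactly IF, ZF, PF and the always-set bit 1. cs = 51 (0x33) and ss = 43 (0x2b) both carry RPL 3, which fits the reading that these registers describe a return to user mode.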
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 19.03.2015 um 00:22 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski wrote: >> Yes, it's userspace. Thanks for checking, though. > > One more stupid hunch: > > Can you do: > x/21xg 8801013d4f58 > > If I counted right, that'll dump task_pt_regs(current). That's all zeroes: crash> x /21xg 0x8801013d4f58 0x8801013d4f58: 0x 0x 0x8801013d4f68: 0x 0x 0x8801013d4f78: 0x 0x 0x8801013d4f88: 0x 0x 0x8801013d4f98: 0x 0x 0x8801013d4fa8: 0x 0x 0x8801013d4fb8: 0x 0x 0x8801013d4fc8: 0x 0x 0x8801013d4fd8: 0x 0x 0x8801013d4fe8: 0x 0x 0x8801013d4ff8: 0x But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h wrong, which is at least as likely...). #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) => I have the task_struct readily available decoded in the crash utility. crash> task, search for thread, in thread: sp0 = 18446612136629993472 crash> eval 18446612136629993472 hexadecimal: 8801013d8000 (18014269664677728KB) crash> print *(struct pt_regs *)(18446612136629993472 - sizeof(struct pt_regs)) $20 = { r15 = 18446744071585666077, r14 = 16, r13 = 582, r12 = 18446612136629993352, bp = 24, bx = 18446744071585666061, r11 = 582, r10 = 10760856, r9 = 140712613762160, r8 = 140735967861216, ax = 1, cx = 140712476030103, dx = 140712613782304, si = 1, di = 140712589295616, orig_ax = 209, ip = 140712571864823, cs = 51, flags = 582, sp = 140735967860552, ss = 43 } => r15 = 8168141d r12 = 8801013d7f88 bx = 8168140d r9 = 7ffa355bd470 ip = 7ffa32dc86f7 sp = 7fffa55f1748 looks somehow legit, to my totally untrained eye (ip and sp actually). I'm off to bed now (01:20 around here ;), will be back in about 7 hours. 
Best regards,

Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
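The pointer arithmetic behind the task_pt_regs() macro quoted in this message can be sketched in plain C. The struct below is only a stand-in: it assumes that struct pt_regs on this kernel is 21 eight-byte slots, which matches the "x/21xg" dump requested earlier but is not taken from the kernel headers. The sp0 value is the decimal number printed by the crash utility.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's struct pt_regs. Assumption: 21 registers
 * of 8 bytes each, as implied by "x/21xg" in this thread.
 */
struct pt_regs_sketch {
    uint64_t slot[21];
};

/* sp0 as printed by crash for the faulting task (0xffff8801013d8000). */
#define TASK_SP0 18446612136629993472ULL

/*
 * Same arithmetic as ((struct pt_regs *)sp0 - 1): the saved user
 * registers sit exactly one struct below the top of the kernel stack.
 */
static uint64_t pt_regs_address(uint64_t sp0)
{
    return sp0 - sizeof(struct pt_regs_sketch);
}
```

Under these assumptions pt_regs_address(TASK_SP0) is 0xffff8801013d7f58, that is 168 bytes below sp0. The address tried first ended in 4f58, which would explain why that dump showed only zeroes.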
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski wrote: > On Wed, Mar 18, 2015 at 3:38 PM, Stefan Seyfried > wrote: >> Am 18.03.2015 um 23:29 schrieb Andy Lutomirski: >>> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote: On Wed, 18 Mar 2015, Andy Lutomirski wrote: > sysret64 can only fail with #GP, and we're totally screwed if that > happens, But what if the GPF handler pagefaults afterwards? It'd be operating on user stack already. >>> >>> Good point. >>> >>> Stefan, can you try changing the first "jne >>> opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in >>> entry_64.S and seeing if you can reproduce this? (Is it easy enough >>> to reproduce that this would tell us anything?) >> >> I have no good way of reproducing the issue (happens once per week...) >> but apparently Takashi has, so I'd like to hand this task over to him. >> >>> It's a shame that double_fault doesn't record what gs was on entry. >>> If we did sysret -> general_protection -> page_fault -> double_fault, >>> then we'd enter double_fault with usergs, whereas syscall -> >>> page_fault -> double_fault would enter double_fault with kernelgs. >>> >>> Hmm. We may be able to answer this more directly. Stefan, can you >>> dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your >>> page_fault stack at the time of the failure)? That will tell us the >>> faulting address. If that fails, try starting at 7fffa55eb000 >>> instead. >> >> Unfortunately not, is this userspace memory? It's not in the dump I have. >> This issue is the first I have seen where having a full dump would be >> really helpful apart from cosmetic reasons... > > Yes, it's userspace. Thanks for checking, though. One more stupid hunch: Can you do: x/21xg 8801013d4f58 If I counted right, that'll dump task_pt_regs(current). 
--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:38 PM, Stefan Seyfried wrote: > Am 18.03.2015 um 23:29 schrieb Andy Lutomirski: >> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote: >>> On Wed, 18 Mar 2015, Andy Lutomirski wrote: >>> sysret64 can only fail with #GP, and we're totally screwed if that happens, >>> >>> But what if the GPF handler pagefaults afterwards? It'd be operating on >>> user stack already. >> >> Good point. >> >> Stefan, can you try changing the first "jne >> opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in >> entry_64.S and seeing if you can reproduce this? (Is it easy enough >> to reproduce that this would tell us anything?) > > I have no good way of reproducing the issue (happens once per week...) > but apparently Takashi has, so I'd like to hand this task over to him. > >> It's a shame that double_fault doesn't record what gs was on entry. >> If we did sysret -> general_protection -> page_fault -> double_fault, >> then we'd enter double_fault with usergs, whereas syscall -> >> page_fault -> double_fault would enter double_fault with kernelgs. >> >> Hmm. We may be able to answer this more directly. Stefan, can you >> dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your >> page_fault stack at the time of the failure)? That will tell us the >> faulting address. If that fails, try starting at 7fffa55eb000 >> instead. > > Unfortunately not, is this userspace memory? It's not in the dump I have. > This issue is the first I have seen where having a full dump would be > really helpful apart from cosmetic reasons... Yes, it's userspace. Thanks for checking, though. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 23:29 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote: >> On Wed, 18 Mar 2015, Andy Lutomirski wrote: >> >>> sysret64 can only fail with #GP, and we're totally screwed if that >>> happens, >> >> But what if the GPF handler pagefaults afterwards? It'd be operating on >> user stack already. > > Good point. > > Stefan, can you try changing the first "jne > opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in > entry_64.S and seeing if you can reproduce this? (Is it easy enough > to reproduce that this would tell us anything?) I have no good way of reproducing the issue (happens once per week...) but apparently Takashi has, so I'd like to hand this task over to him. > It's a shame that double_fault doesn't record what gs was on entry. > If we did sysret -> general_protection -> page_fault -> double_fault, > then we'd enter double_fault with usergs, whereas syscall -> > page_fault -> double_fault would enter double_fault with kernelgs. > > Hmm. We may be able to answer this more directly. Stefan, can you > dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your > page_fault stack at the time of the failure)? That will tell us the > faulting address. If that fails, try starting at 7fffa55eb000 > instead. Unfortunately not, is this userspace memory? It's not in the dump I have. This issue is the first I have seen where having a full dump would be really helpful apart from cosmetic reasons... -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote: > On Wed, 18 Mar 2015, Andy Lutomirski wrote: > >> sysret64 can only fail with #GP, and we're totally screwed if that >> happens, > > But what if the GPF handler pagefaults afterwards? It'd be operating on > user stack already. Good point. Stefan, can you try changing the first "jne opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in entry_64.S and seeing if you can reproduce this? (Is it easy enough to reproduce that this would tell us anything?) It's a shame that double_fault doesn't record what gs was on entry. If we did sysret -> general_protection -> page_fault -> double_fault, then we'd enter double_fault with usergs, whereas syscall -> page_fault -> double_fault would enter double_fault with kernelgs. Hmm. We may be able to answer this more directly. Stefan, can you dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your page_fault stack at the time of the failure)? That will tell us the faulting address. If that fails, try starting at 7fffa55eb000 instead. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:28 PM, Linus Torvalds wrote:
> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote:
>>
>> But what if the GPF handler pagefaults afterwards? It'd be operating on
>> user stack already.
>
> So I think this might be the answer. We don't see the GP fault,
> because we don't have a backtrace, because that backtrace is on the
> user stack (which is why the stack trace dumping fails - we should
> probably fix that, btw - the second oops is just confusing and not
> helpful).
>
> Is the intel check for canonical address (that __VIRTUAL_MASK_SHIFT
> thing) perhaps wrong or not as strict as Intel CPU's do? We'd never
> notice in normal situations..

I explicitly tested that I could blow up the kernel if I intentionally broke that test, and I couldn't blow it up with the test as written. That doesn't prove it's correct, though.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote:
>
> But what if the GPF handler pagefaults afterwards? It'd be operating on
> user stack already.

So I think this might be the answer. We don't see the GP fault, because we don't have a backtrace, because that backtrace is on the user stack (which is why the stack trace dumping fails - we should probably fix that, btw - the second oops is just confusing and not helpful).

Is the Intel check for canonical address (that __VIRTUAL_MASK_SHIFT thing) perhaps wrong, or not as strict as what Intel CPUs do? We'd never notice in normal situations..

Linus
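For readers unfamiliar with the property being questioned here, the following is a minimal C sketch of a canonical-address test. It is not the kernel's check. It assumes 48-bit virtual addresses, so that bit 47 acts as the sign bit and bits 63..47 must all be equal; the shift value of 47 and both function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bit 47 is the sign bit. */
#define VA_SIGN_BIT 47

/*
 * An address is canonical when bits 63..47 are all zeros or all ones,
 * i.e. the upper 17 bits are a sign extension of bit 47.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> VA_SIGN_BIT; /* the upper 17 bits */
    return top == 0 || top == 0x1ffffULL;
}

/*
 * Stricter test in the spirit of a "shift and compare with zero"
 * check: only lower-half (user) addresses pass.
 */
static bool is_user_canonical(uint64_t addr)
{
    return (addr >> VA_SIGN_BIT) == 0;
}
```

Under this assumption the saved ip from the dump elsewhere in this thread, 0x7ffa32dc86f7, passes both tests, while 0x0000800000000000 is the lowest non-canonical address and fails both.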
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 11:20 PM, Andy Lutomirski wrote:
>> There is an easy way to test the theory that SYSRET is to blame.
>>
>> Just replace
>>
>> movq RCX(%rsp),%rcx
>> cmpq %rcx,RIP(%rsp) /* RCX == RIP */
>> jne opportunistic_sysret_failed
>>
>> this "jne" with "jmp", and try to reproduce.
>>
>
> This is a classic root exploit, and it's why we check for
> non-canonical RIP. In theory, that's the only way this can happen.
> Intel screwed up -- AMD never fails SYSRET.

I'm not saying the code needs to be changed. I'm saying that *people who see the crash* can make this change, run the modified kernel, and if the crash disappears, then it is caused by "opportunistic SYSRET".
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:18 PM, Linus Torvalds wrote:
> On Wed, Mar 18, 2015 at 2:55 PM, Andy Lutomirski wrote:
>>
>> On Xen, it goes to xen_sysret64, which touches the same percpu
>> variables that we touch on entry. So I still like my percpu vmap
>> fault hypothesis, even though I don't understand what would trigger
>> it.
>
> I don't dislike the theory per se, but not only don't I see how it
> could happen on regular execution on a laptop, but I also don't see
> why this fault behavior would be new to 4.0.
>
> (And I do believe that we should make sure that CPU bringup ends up
> faulting in the percpu area, even if I don't really see why that would
> be the issue here)
>
> Afaik, the system call entry code hasn't changed at all.
>
> What *has* changed is the "paranoid" handling (double-fault has that
> magical "paranoid=2" thing, for example) and the return to user-space
> code.

Indeed. If this were #DB, #BP, or #MC, I'd believe that, but the page fault code didn't change. And double-fault didn't materially change -- the paranoid=2 thing means to opt *out* of the recent changes. So I'm not convinced by that theory.

>
> Which is really why I don't believe in that syscall thing. Not because
> it isn't the obvious culprit, but simply because it hasn't *changed*.
>
> Or is there something subtle I've missed?

We did change one thing here: for the first time* it's possible to exit using sysret when we didn't enter using syscall. But this really shouldn't matter on native, since we don't touch any memory at all between the stack switch and sysret.

--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, 18 Mar 2015, Andy Lutomirski wrote:

> sysret64 can only fail with #GP, and we're totally screwed if that
> happens,

But what if the GPF handler pagefaults afterwards? It'd be operating on user stack already.

--
Jiri Kosina
SUSE Labs
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:17 PM, Denys Vlasenko wrote: > On 03/18/2015 10:55 PM, Andy Lutomirski wrote: >> On Wed, Mar 18, 2015 at 2:42 PM, Denys Vlasenko wrote: >>> On 03/18/2015 10:32 PM, Linus Torvalds wrote: On Wed, Mar 18, 2015 at 12:26 PM, Andy Lutomirski wrote: >> >> crash> disassemble page_fault >> Dump of assembler code for function page_fault: >>0x816834a0 <+0>: data32 xchg %ax,%ax >>0x816834a3 <+3>: data32 xchg %ax,%ax >>0x816834a6 <+6>: data32 xchg %ax,%ax >>0x816834a9 <+9>: sub$0x78,%rsp >>0x816834ad <+13>:callq 0x81683620 > > The callq was the double-faulting instruction, and it is indeed the > first function in here that would have accessed the stack. (The sub > *changes* rsp but isn't a memory access.) So, since RSP is bogus, we > page fault, and the page fault is promoted to a double fault. The > surprising thing is that the page fault itself seems to have been > delivered okay, and RSP wasn't on a page boundary. Not at all surprising, and sure it was on a page boundry.. Look closer. %rsp is 7fffa55eafb8. But that's *after* page_fault has done that sub$0x78,%rsp so %rsp when the page fault happened was 0x7fffa55eb030. Which is a different page. >> >> Ah, I forgot to add 0x78. You're right, of course. >> And that page happened to be mapped. So what happened is: - we somehow entered kernel mode without switching stacks (ie presumably syscall) - the user stack was still fine - we took a page fault, which once again didn't switch stacks, because we were already in kernel mode. And this page fault worked, because it just pushed the error code onto the user stack which was mapped. - we now took a second page fault within the page fault handler, because now the stack pointer has been decremented and points one user page down that is *not* mapped, so now that page fault cannot push the error code and return information. Now, how we took that original page fault is sadly not very clear at all. 
I agree that it's something about system-call (how could we not change stacks otherwise), but why it should have started now, I don't know. I don't think "system_call" has changed at all. Maybe there is something wrong with the new "ret_from_sys_call" logic, and that "use sysret to return to user mode" thing. Because this code sequence: + movq (RSP-RIP)(%rsp),%rsp + USERGS_SYSRET64 in 'irq_return_via_sysret' is new to 4.0, and instead of entering the kernel with a user stack poiinter, maybe we're *exiting* the kernel, and have just reloaded the user stack pointer when "USERGS_SYSRET64" takes some fault. >>> >>> Yes, so far we happily thought that SYSRET never fails... >>> >>> This merits adding some code which would at least BUG_ON >>> if the faulting address is seen to match SYSRET64. >> >> sysret64 can only fail with #GP, and we're totally screwed if that >> happens, although I agree about the BUG_ON in principle. Where would >> we add it that would help in this case, though? We never even made it >> to C code. >> >> In any event, this was a page fault. sysret64 doesn't access memory. > > Let's see. > > Faulting SYSRET will still be in CPL0. > It would drop CPU into the #GP handler > but %rsp is already loaded with _user_ %rsp (!). > > #GP handler will start pushing stuff onto stack, > happily thinking that it is a kernel stack. > > This can cause a page fault. > > Most likely, this page fault won't succeed, > and we'd get a double fault with %pir somewhere in #GP handler. > > Yes, this doesn't entirely matches what we see... > > There is an easy way to test the theory that SYSRET is to blame. > > Just replace > > movq RCX(%rsp),%rcx > cmpq %rcx,RIP(%rsp) /* RCX == RIP */ > jne opportunistic_sysret_failed > > this "jne" with "jmp", and try to reproduce. > This is a classic root exploit, and it's why we check for non-canonical RIP. In theory, that's the only way this can happen. Intel screwed up -- AMD never fails SYSRET. 
--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 2:55 PM, Andy Lutomirski wrote:
>
> On Xen, it goes to xen_sysret64, which touches the same percpu
> variables that we touch on entry. So I still like my percpu vmap
> fault hypothesis, even though I don't understand what would trigger
> it.

I don't dislike the theory per se, but not only don't I see how it could happen on regular execution on a laptop, but I also don't see why this fault behavior would be new to 4.0.

(And I do believe that we should make sure that CPU bringup ends up faulting in the percpu area, even if I don't really see why that would be the issue here)

Afaik, the system call entry code hasn't changed at all.

What *has* changed is the "paranoid" handling (double-fault has that magical "paranoid=2" thing, for example) and the return to user-space code.

Which is really why I don't believe in that syscall thing. Not because it isn't the obvious culprit, but simply because it hasn't *changed*.

Or is there something subtle I've missed?

Linus
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/18/2015 10:55 PM, Andy Lutomirski wrote: > On Wed, Mar 18, 2015 at 2:42 PM, Denys Vlasenko wrote: >> On 03/18/2015 10:32 PM, Linus Torvalds wrote: >>> On Wed, Mar 18, 2015 at 12:26 PM, Andy Lutomirski >>> wrote: > > crash> disassemble page_fault > Dump of assembler code for function page_fault: >0x816834a0 <+0>: data32 xchg %ax,%ax >0x816834a3 <+3>: data32 xchg %ax,%ax >0x816834a6 <+6>: data32 xchg %ax,%ax >0x816834a9 <+9>: sub$0x78,%rsp >0x816834ad <+13>:callq 0x81683620 The callq was the double-faulting instruction, and it is indeed the first function in here that would have accessed the stack. (The sub *changes* rsp but isn't a memory access.) So, since RSP is bogus, we page fault, and the page fault is promoted to a double fault. The surprising thing is that the page fault itself seems to have been delivered okay, and RSP wasn't on a page boundary. >>> >>> Not at all surprising, and sure it was on a page boundry.. >>> >>> Look closer. >>> >>> %rsp is 7fffa55eafb8. >>> >>> But that's *after* page_fault has done that >>> >>> sub$0x78,%rsp >>> >>> so %rsp when the page fault happened was 0x7fffa55eb030. Which is a >>> different page. > > Ah, I forgot to add 0x78. You're right, of course. > >>> >>> And that page happened to be mapped. >>> >>> So what happened is: >>> >>> - we somehow entered kernel mode without switching stacks >>> >>>(ie presumably syscall) >>> >>> - the user stack was still fine >>> >>> - we took a page fault, which once again didn't switch stacks, >>> because we were already in kernel mode. And this page fault worked, >>> because it just pushed the error code onto the user stack which was >>> mapped. >>> >>> - we now took a second page fault within the page fault handler, >>> because now the stack pointer has been decremented and points one user >>> page down that is *not* mapped, so now that page fault cannot push the >>> error code and return information. 
>>>
>>> Now, how we took that original page fault is sadly not very clear at
>>> all. I agree that it's something about system-call (how could we not
>>> change stacks otherwise), but why it should have started now, I don't
>>> know. I don't think "system_call" has changed at all.
>>>
>>> Maybe there is something wrong with the new "ret_from_sys_call" logic,
>>> and that "use sysret to return to user mode" thing. Because this code
>>> sequence:
>>>
>>> + movq (RSP-RIP)(%rsp),%rsp
>>> + USERGS_SYSRET64
>>>
>>> in 'irq_return_via_sysret' is new to 4.0, and instead of entering the
>>> kernel with a user stack pointer, maybe we're *exiting* the kernel,
>>> and have just reloaded the user stack pointer when "USERGS_SYSRET64"
>>> takes some fault.
>>
>> Yes, so far we happily thought that SYSRET never fails...
>>
>> This merits adding some code which would at least BUG_ON
>> if the faulting address is seen to match SYSRET64.
>
> sysret64 can only fail with #GP, and we're totally screwed if that
> happens, although I agree about the BUG_ON in principle. Where would
> we add it that would help in this case, though? We never even made it
> to C code.
>
> In any event, this was a page fault. sysret64 doesn't access memory.

Let's see.

Faulting SYSRET will still be in CPL0.
It would drop CPU into the #GP handler
but %rsp is already loaded with _user_ %rsp (!).

#GP handler will start pushing stuff onto stack,
happily thinking that it is a kernel stack.

This can cause a page fault. Most likely, this page fault
won't succeed, and we'd get a double fault
with %rip somewhere in #GP handler.

Yes, this doesn't entirely match what we see...

There is an easy way to test the theory that SYSRET is to blame.

Just replace

        movq RCX(%rsp),%rcx
        cmpq %rcx,RIP(%rsp)             /* RCX == RIP */
        jne opportunistic_sysret_failed

this "jne" with "jmp", and try to reproduce.
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 2:42 PM, Denys Vlasenko wrote: > On 03/18/2015 10:32 PM, Linus Torvalds wrote: >> On Wed, Mar 18, 2015 at 12:26 PM, Andy Lutomirski >> wrote: crash> disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 <+0>: data32 xchg %ax,%ax 0x816834a3 <+3>: data32 xchg %ax,%ax 0x816834a6 <+6>: data32 xchg %ax,%ax 0x816834a9 <+9>: sub$0x78,%rsp 0x816834ad <+13>:callq 0x81683620 >>> >>> The callq was the double-faulting instruction, and it is indeed the >>> first function in here that would have accessed the stack. (The sub >>> *changes* rsp but isn't a memory access.) So, since RSP is bogus, we >>> page fault, and the page fault is promoted to a double fault. The >>> surprising thing is that the page fault itself seems to have been >>> delivered okay, and RSP wasn't on a page boundary. >> >> Not at all surprising, and sure it was on a page boundry.. >> >> Look closer. >> >> %rsp is 7fffa55eafb8. >> >> But that's *after* page_fault has done that >> >> sub$0x78,%rsp >> >> so %rsp when the page fault happened was 0x7fffa55eb030. Which is a >> different page. Ah, I forgot to add 0x78. You're right, of course. >> >> And that page happened to be mapped. >> >> So what happened is: >> >> - we somehow entered kernel mode without switching stacks >> >>(ie presumably syscall) >> >> - the user stack was still fine >> >> - we took a page fault, which once again didn't switch stacks, >> because we were already in kernel mode. And this page fault worked, >> because it just pushed the error code onto the user stack which was >> mapped. >> >> - we now took a second page fault within the page fault handler, >> because now the stack pointer has been decremented and points one user >> page down that is *not* mapped, so now that page fault cannot push the >> error code and return information. >> >> Now, how we took that original page fault is sadly not very clear at >> all. 
I agree that it's something about system-call (how could we not >> change stacks otherwise), but why it should have started now, I don't >> know. I don't think "system_call" has changed at all. >> >> Maybe there is something wrong with the new "ret_from_sys_call" logic, >> and that "use sysret to return to user mode" thing. Because this code >> sequence: >> >> + movq (RSP-RIP)(%rsp),%rsp >> + USERGS_SYSRET64 >> >> in 'irq_return_via_sysret' is new to 4.0, and instead of entering the >> kernel with a user stack poiinter, maybe we're *exiting* the kernel, >> and have just reloaded the user stack pointer when "USERGS_SYSRET64" >> takes some fault. > > Yes, so far we happily thought that SYSRET never fails... > > This merits adding some code which would at least BUG_ON > if the faulting address is seen to match SYSRET64. sysret64 can only fail with #GP, and we're totally screwed if that happens, although I agree about the BUG_ON in principle. Where would we add it that would help in this case, though? We never even made it to C code. In any event, this was a page fault. sysret64 doesn't access memory. > > Now we only check for faulting IRETQ: > > error_kernelspace: > CFI_REL_OFFSET rcx, RCX+8 > incl %ebx > leaq native_irq_return_iret(%rip),%rcx > cmpq %rcx,RIP+8(%rsp) > je error_bad_iret > >> >> Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault' >> makes me suspect it is, and that that is some paravirt rewriting >> area. What does paravirt go for that USERGS_SYSRET64 (or for >> SWAPGS_UNSAFE_STACK, for that matter). On Xen, it goes to xen_sysret64, which touches the same percpu variables that we touch on entry. So I still like my percpu vmap fault hypothesis, even though I don't understand what would trigger it. At the risk of asking awful questions, what happens if we deliver an IST interrupt in vmx_handle_external_intr? Can that happen? It can't be a good thing if it happens. 
--Andy
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 18.03.2015 at 22:49, Denys Vlasenko wrote:
> Stefan, Takashi, can you post your /proc/cpuinfo
> and dmesg after boot?

susi:~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
stepping        : 10
microcode       : 0xa0c
cpu MHz         : 1867.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bugs            :
bogomips        : 3723.96
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(repeats for second core :)

I'm running 3.19 now, but the dmesg extracted from the crash dump of
4.0-rc3 is at http://paste.opensuse.org/48196621
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
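Editorial sketch: the SMAP question raised elsewhere in the thread can be answered from the "flags" line of the cpuinfo output in this message. The snippet below is only an illustration; the helper name `has_cpu_flag` is made up here, and the flags string is copied from the message rather than read from a live /proc/cpuinfo.

```python
# Flags line copied verbatim from the /proc/cpuinfo output in this message.
CPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc "
    "arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est "
    "tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi "
    "flexpriority"
)


def has_cpu_flag(flags_line: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole word in a cpuinfo flags line.

    Hypothetical helper for illustration. Splitting on whitespace avoids
    false positives such as "sse" matching inside "sse2".
    """
    return flag in flags_line.split()


if __name__ == "__main__":
    for name in ("smap", "vmx"):
        print(name, has_cpu_flag(CPU_FLAGS, name))
```

With the flags as posted, "smap" is absent and "vmx" is present, which fits a Core 2 Duo that can run KVM guests but predates SMAP.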
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Stefan, Takashi, can you post your /proc/cpuinfo
and dmesg after boot?
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 18.03.2015 at 22:32, Linus Torvalds wrote:
> Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault'
> makes me suspect it is, and that that is some paravirt rewriting
> area. What does paravirt go for that USERGS_SYSRET64 (or for
> SWAPGS_UNSAFE_STACK, for that matter).

This is from the newer kernel package, but I doubt this configuration has
been changed in the openSUSE kernel:

susi:~ # grep PARAVIRT /boot/config-4.0.0-rc4-1.g126fc64-desktop
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y

So yes, PARAVIRT is enabled.

Best regards,

        Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/18/2015 10:32 PM, Linus Torvalds wrote: > On Wed, Mar 18, 2015 at 12:26 PM, Andy Lutomirski wrote: >>> >>> crash> disassemble page_fault >>> Dump of assembler code for function page_fault: >>>0x816834a0 <+0>: data32 xchg %ax,%ax >>>0x816834a3 <+3>: data32 xchg %ax,%ax >>>0x816834a6 <+6>: data32 xchg %ax,%ax >>>0x816834a9 <+9>: sub$0x78,%rsp >>>0x816834ad <+13>:callq 0x81683620 >> >> The callq was the double-faulting instruction, and it is indeed the >> first function in here that would have accessed the stack. (The sub >> *changes* rsp but isn't a memory access.) So, since RSP is bogus, we >> page fault, and the page fault is promoted to a double fault. The >> surprising thing is that the page fault itself seems to have been >> delivered okay, and RSP wasn't on a page boundary. > > Not at all surprising, and sure it was on a page boundry.. > > Look closer. > > %rsp is 7fffa55eafb8. > > But that's *after* page_fault has done that > > sub$0x78,%rsp > > so %rsp when the page fault happened was 0x7fffa55eb030. Which is a > different page. > > And that page happened to be mapped. > > So what happened is: > > - we somehow entered kernel mode without switching stacks > >(ie presumably syscall) > > - the user stack was still fine > > - we took a page fault, which once again didn't switch stacks, > because we were already in kernel mode. And this page fault worked, > because it just pushed the error code onto the user stack which was > mapped. > > - we now took a second page fault within the page fault handler, > because now the stack pointer has been decremented and points one user > page down that is *not* mapped, so now that page fault cannot push the > error code and return information. > > Now, how we took that original page fault is sadly not very clear at > all. I agree that it's something about system-call (how could we not > change stacks otherwise), but why it should have started now, I don't > know. I don't think "system_call" has changed at all. 
> > Maybe there is something wrong with the new "ret_from_sys_call" logic, > and that "use sysret to return to user mode" thing. Because this code > sequence: > > + movq (RSP-RIP)(%rsp),%rsp > + USERGS_SYSRET64 > > in 'irq_return_via_sysret' is new to 4.0, and instead of entering the > kernel with a user stack poiinter, maybe we're *exiting* the kernel, > and have just reloaded the user stack pointer when "USERGS_SYSRET64" > takes some fault. Yes, so far we happily thought that SYSRET never fails... This merits adding some code which would at least BUG_ON if the faulting address is seen to match SYSRET64. Now we only check for faulting IRETQ: error_kernelspace: CFI_REL_OFFSET rcx, RCX+8 incl %ebx leaq native_irq_return_iret(%rip),%rcx cmpq %rcx,RIP+8(%rsp) je error_bad_iret > > Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault' > makes me suspect it is, and that that is some paravirt rewriting > area. What does paravirt go for that USERGS_SYSRET64 (or for > SWAPGS_UNSAFE_STACK, for that matter). > > Linus > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:21 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried > wrote: >> Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: >>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried >>> wrote: >> > The relevant thread's stack is here (see ti in the trace): > > 8801013d4000 > > It could be interesting to see what's there. > > I don't suppose you want to try to walk the paging structures to see > if 88023bc8 (i.e. gsbase) and, more specifically, > 88023bc8 + old_rsp and 88023bc8 + kernel_stack are > present? You'd only have to walk one level -- presumably, if the PGD > entry is there, the rest of the entries are okay, too. That's all greek to me :-) I see that there is something at 88023bc8: crash> x /64xg 0x88023bc8 0x88023bc8: 0x 0x 0x88023bc80010: 0x 0x 0x88023bc80020: 0x 0x6686ada9 0x88023bc80030: 0x 0x 0x88023bc80040: 0x 0x [all zeroes] 0x88023bc801f0: 0x 0x old_rsp and kernel_stack seem bogus: crash> print old_rsp Cannot access memory at address 0xa200 gdb: gdb request failed: print old_rsp crash> print kernel_stack Cannot access memory at address 0xaa48 gdb: gdb request failed: print kernel_stack kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: >>> >>> Yup. old_rsp and kernel_stack are offsets relative to gsbase. >>> crash> x /64xg 0x88023bc8aa00 0x88023bc8aa00: 0x 0x >>> >>> [...] >>> >>> I don't know enough about crashkernel to know whether the fact that >>> this worked means anything. >> >> AFAIK this just means that the memory at this location is included in >> the dump :-) >> >>> Can you dump the page of physical memory at 0x4779a067? That's the PGD. 
>> >> Unfortunately not, this is a partial dump (I think the default config in >> openSUSE, but I might have changed it some time ago) and the dump_level >> is 31 which means that the following are excluded: >> >> | |cache |cache | | >> dump | zero |without|with | user | free >>level | page |private|private| data | page >> ---+--+---+---+--+-- >> 31 | X | X | X | X | X >> >> so this: >> crash> x /64xg 0x4779a067 >> 0x4779a067: Cannot access memory at address 0x4779a067 >> gdb: gdb request failed: x /64xg >> >> probably just means, that the PGD falls in one of the above excluded >> categories. > > I suspect that it actually means that gdb sees virtual addresses, not > physical addresses. But I screwed up completely -- "PGD" in the dump > is the PGD *entry*, not the PGD pointer. in crash, usually physical addresses work (it's a sophisticated wrapper around gdb AFAICT) > > We could plausibly fish it out from current->mm, but that's a mess. I'll come to that later I > don't suppose that "info registers" or "p/x $cr3" will show the cr3 > value? No, that does not work from crash. But current->mm is easy: crash> task|grep mm start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" mm = 0x8800b8a9c040, active_mm = 0x8800b8a9c040, comm = "qemu-system-x86", and (guessing the type :-) crash> print *(struct mm_struct *)0x8800b8a9c040|grep pgd pgd = 0x880002d7e000, But if that's correct, pgd contains all zeroes: crash> print *(pgd_t *)0x880002d7e000 $15 = { pgd = 0 } crash> x /16xg 0x880002d7e000 0x880002d7e000: 0x 0x 0x880002d7e010: 0x 0x 0x880002d7e020: 0x 0x 0x880002d7e030: 0x 0x 0x880002d7e040: 0x 0x 0x880002d7e050: 0x 0x 0x880002d7e060: 0x 0x 0x880002d7e070: 0x 0x > In any case, Denys is right -- my theory doesn't really hold water on > non-SMAP systems. 
Mine is definitely not new enough for this feature :) Maybe it would be more helpful if Takashi who is able to reproduce this more reliably than me would do a crash dump, preferably with a lower dumplevel, to investigate on. I have seen the bug two or three times in a week or two, which makes waiting for it to happen a boring experience. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 12:26 PM, Andy Lutomirski wrote: >> >> crash> disassemble page_fault >> Dump of assembler code for function page_fault: >>0x816834a0 <+0>: data32 xchg %ax,%ax >>0x816834a3 <+3>: data32 xchg %ax,%ax >>0x816834a6 <+6>: data32 xchg %ax,%ax >>0x816834a9 <+9>: sub$0x78,%rsp >>0x816834ad <+13>:callq 0x81683620 > > The callq was the double-faulting instruction, and it is indeed the > first function in here that would have accessed the stack. (The sub > *changes* rsp but isn't a memory access.) So, since RSP is bogus, we > page fault, and the page fault is promoted to a double fault. The > surprising thing is that the page fault itself seems to have been > delivered okay, and RSP wasn't on a page boundary. Not at all surprising, and sure it was on a page boundry.. Look closer. %rsp is 7fffa55eafb8. But that's *after* page_fault has done that sub$0x78,%rsp so %rsp when the page fault happened was 0x7fffa55eb030. Which is a different page. And that page happened to be mapped. So what happened is: - we somehow entered kernel mode without switching stacks (ie presumably syscall) - the user stack was still fine - we took a page fault, which once again didn't switch stacks, because we were already in kernel mode. And this page fault worked, because it just pushed the error code onto the user stack which was mapped. - we now took a second page fault within the page fault handler, because now the stack pointer has been decremented and points one user page down that is *not* mapped, so now that page fault cannot push the error code and return information. Now, how we took that original page fault is sadly not very clear at all. I agree that it's something about system-call (how could we not change stacks otherwise), but why it should have started now, I don't know. I don't think "system_call" has changed at all. Maybe there is something wrong with the new "ret_from_sys_call" logic, and that "use sysret to return to user mode" thing. 
Because this code sequence:

  + movq (RSP-RIP)(%rsp),%rsp
  + USERGS_SYSRET64

in 'irq_return_via_sysret' is new to 4.0, and instead of entering the
kernel with a user stack pointer, maybe we're *exiting* the kernel,
and have just reloaded the user stack pointer when "USERGS_SYSRET64"
takes some fault.

Is PARAVIRT enabled? The three nops at the beginning of 'page_fault'
make me suspect it is, and that that is some paravirt rewriting
area. What does paravirt go for that USERGS_SYSRET64 (or for
SWAPGS_UNSAFE_STACK, for that matter)?

                          Linus
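Editorial sketch: the stack-pointer arithmetic in this message can be checked in a few lines. The two input values come from the oops and the disassembly quoted in the thread; the 4 KiB page size is an assumption (the usual x86-64 base page size), and the function name `page_base` is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: ordinary 4 KiB pages

RSP_AT_DOUBLE_FAULT = 0x7FFFA55EAFB8  # RSP reported in the oops
FRAME_ADJUST = 0x78                   # from "sub $0x78,%rsp" in page_fault


def page_base(addr: int) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


# RSP as it was when page_fault was entered, before the sub.
rsp_on_entry = RSP_AT_DOUBLE_FAULT + FRAME_ADJUST

# Slot the callq would write its return address to.
call_push_slot = RSP_AT_DOUBLE_FAULT - 8

if __name__ == "__main__":
    print("rsp on entry      :", hex(rsp_on_entry))
    print("page on entry     :", hex(page_base(rsp_on_entry)))
    print("page at the callq :", hex(page_base(call_push_slot)))
```

The entry RSP comes out as 0x7fffa55eb030, in the page starting at 0x7fffa55eb000, while the push done by the callq lands in the page starting at 0x7fffa55ea000. That is the "different page" referred to above.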
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried wrote: > Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: >> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried >> wrote: > The relevant thread's stack is here (see ti in the trace): 8801013d4000 It could be interesting to see what's there. I don't suppose you want to try to walk the paging structures to see if 88023bc8 (i.e. gsbase) and, more specifically, 88023bc8 + old_rsp and 88023bc8 + kernel_stack are present? You'd only have to walk one level -- presumably, if the PGD entry is there, the rest of the entries are okay, too. >>> >>> That's all greek to me :-) >>> >>> I see that there is something at 88023bc8: >>> >>> crash> x /64xg 0x88023bc8 >>> 0x88023bc8: 0x 0x >>> 0x88023bc80010: 0x 0x >>> 0x88023bc80020: 0x 0x6686ada9 >>> 0x88023bc80030: 0x 0x >>> 0x88023bc80040: 0x 0x >>> [all zeroes] >>> 0x88023bc801f0: 0x 0x >>> >>> old_rsp and kernel_stack seem bogus: >>> crash> print old_rsp >>> Cannot access memory at address 0xa200 >>> gdb: gdb request failed: print old_rsp >>> crash> print kernel_stack >>> Cannot access memory at address 0xaa48 >>> gdb: gdb request failed: print kernel_stack >>> >>> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: >> >> Yup. old_rsp and kernel_stack are offsets relative to gsbase. >> >>> >>> crash> x /64xg 0x88023bc8aa00 >>> 0x88023bc8aa00: 0x 0x >> >> [...] >> >> I don't know enough about crashkernel to know whether the fact that >> this worked means anything. > > AFAIK this just means that the memory at this location is included in > the dump :-) > >> Can you dump the page of physical memory at 0x4779a067? That's the PGD. 
>
> Unfortunately not, this is a partial dump (I think the default config in
> openSUSE, but I might have changed it some time ago) and the dump_level
> is 31 which means that the following are excluded:
>
>       |      |cache  |cache  |      |
>  dump | zero |without|with   | user | free
> level | page |private|private| data | page
> ------+------+-------+-------+------+------
>    31 |  X   |   X   |   X   |  X   |  X
>
> so this:
> crash> x /64xg 0x4779a067
> 0x4779a067: Cannot access memory at address 0x4779a067
> gdb: gdb request failed: x /64xg
>
> probably just means, that the PGD falls in one of the above excluded
> categories.

I suspect that it actually means that gdb sees virtual addresses, not
physical addresses. But I screwed up completely -- "PGD" in the dump
is the PGD *entry*, not the PGD pointer.

We could plausibly fish it out from current->mm, but that's a mess. I
don't suppose that "info registers" or "p/x $cr3" will show the cr3
value?

In any case, Denys is right -- my theory doesn't really hold water on
non-SMAP systems.

--Andy

>
> Best regards,
>
> Stefan
> --
> Stefan Seyfried
> Linux Consultant & Developer -- GPG Key: 0x731B665B
>
> B1 Systems GmbH
> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
> GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

--
Andy Lutomirski
AMA Capital Management, LLC
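Editorial sketch: the distinction between a PGD *entry* and a PGD pointer shows up when the value 0x4779a067 from the dump is split into its fields. The bit layout below is the standard x86-64 paging-entry format (flag bits in the low 12 bits, physical frame address in bits 12..51); the names `ENTRY_FLAGS`, `ADDR_MASK` and `decode_entry` are made up for this illustration, and only the low flag bits are decoded.

```python
# Low flag bits of an x86-64 paging-structure entry.
ENTRY_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",  # set by Linux in table entries; hardware ignores it for non-leaf entries
}

ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical address of the next-level table


def decode_entry(entry: int):
    """Split a paging entry into (physical address, list of flag names)."""
    phys = entry & ADDR_MASK
    flags = [name for bit, name in ENTRY_FLAGS.items() if entry & (1 << bit)]
    return phys, flags


if __name__ == "__main__":
    phys, flags = decode_entry(0x4779A067)
    print(hex(phys), flags)
```

Under these assumptions the entry points at a next-level table at physical address 0x4779a000, with the low bits 0x067 being flags. That is why treating 0x4779a067 itself as an address to dump does not give the PGD page.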
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 2:06 PM, Denys Vlasenko wrote: > On 03/18/2015 09:49 PM, Andy Lutomirski wrote: >> On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko wrote: >>> On 03/18/2015 08:26 PM, Andy Lutomirski wrote: Hi Linus- You seem to enjoy debugging these things. Want to give this a shot? My guess is a vmalloc fault accessing either old_rsp or kernel_stack right after swapgs in syscall entry. >>> >>> The code is: >>> >>> ENTRY(system_call) >>> SWAPGS_UNSAFE_STACK >>> GLOBAL(system_call_after_swapgs) >>> movq%rsp,PER_CPU_VAR(rsp_scratch) >>> movqPER_CPU_VAR(kernel_stack),%rsp >>> >>> If PER_CPU_VAR(var) memory access can page fault >>> (I was thinking this is ensured to never fault), >>> then on these two instructions such page fault >>> will be fatal: we will still have userspace %rsp. >>> >>> I thought we can only get a NMI or debug interrupt here, >>> and they are both set up to use IST stacks >>> to prevent this scenario (among other reasons). >> >> I don't think that #DB is possible -- we should never have a >> watchpoint on percpu memory like that (unless we're using kgdb, in >> which case I think that kgdb should be fixed). > > And #DB shouldn't cause a problem even if it happens (it's on > an IST stack). > > I was thinking about it more and the thing is, CPU did manage > to enter page fault handler. > > It means that it managed to store iret frame. > > This means that stores to (%rsp) worked, whatever %rsp is > (even if it points to user's page). > > The double fault happened only when CALL insn inside the handler > attempted to push yet another word. _This_ is what did not work. > > Why? > > I almost ready to declare that it's SMAP triggering: > that attempts to access (write to) userspace were caught. 
> However, disassembly shows > > crash> disassemble page_fault > Dump of assembler code for function page_fault: >0x816834a0 <+0>: data32 xchg %ax,%ax >0x816834a3 <+3>: data32 xchg %ax,%ax >0x816834a6 <+6>: data32 xchg %ax,%ax >0x816834a9 <+9>: sub$0x78,%rsp >0x816834ad <+13>:callq 0x81683620 > KABOOM HERE^^^ >0x816834b2 <+18>:mov%rsp,%rdi >0x816834b5 <+21>:mov0x78(%rsp),%rsi >0x816834ba <+26>:movq $0x,0x78(%rsp) >0x816834c3 <+35>:callq 0x810504e0 >0x816834c8 <+40>:jmpq 0x816836d0 > End of assembler dump. > > Those NOPs at the beginning are ASM_CLAC and PARAVIRT_ADJUST_EXCEPTION_FRAME > from this source: > > > .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 > ENTRY(\sym) > /* Sanity check */ > .if \shift_ist != -1 && \paranoid == 0 > .error "using shift_ist requires paranoid=1" > .endif > > .if \has_error_code > XCPT_FRAME > .else > INTR_FRAME > .endif > > ASM_CLAC > PARAVIRT_ADJUST_EXCEPTION_FRAME > > subq $ORIG_RAX-R15, %rsp > call error_entry > ... > > If ASM_CLAC is replaced by NOPs, this CPU must be not SMAP capable. > If so, then another store to (%rsp) should have worked too... > > > Stefan, Takashi - are you seeing this on SMAP-capable CPUs? That's why I asked if this was Broadwell. It's not :( --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried > wrote: >>> The relevant thread's stack is here (see ti in the trace): >>> >>> 8801013d4000 >>> >>> It could be interesting to see what's there. >>> >>> I don't suppose you want to try to walk the paging structures to see >>> if 88023bc8 (i.e. gsbase) and, more specifically, >>> 88023bc8 + old_rsp and 88023bc8 + kernel_stack are >>> present? You'd only have to walk one level -- presumably, if the PGD >>> entry is there, the rest of the entries are okay, too. >> >> That's all greek to me :-) >> >> I see that there is something at 88023bc8: >> >> crash> x /64xg 0x88023bc8 >> 0x88023bc8: 0x 0x >> 0x88023bc80010: 0x 0x >> 0x88023bc80020: 0x 0x6686ada9 >> 0x88023bc80030: 0x 0x >> 0x88023bc80040: 0x 0x >> [all zeroes] >> 0x88023bc801f0: 0x 0x >> >> old_rsp and kernel_stack seem bogus: >> crash> print old_rsp >> Cannot access memory at address 0xa200 >> gdb: gdb request failed: print old_rsp >> crash> print kernel_stack >> Cannot access memory at address 0xaa48 >> gdb: gdb request failed: print kernel_stack >> >> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: > > Yup. old_rsp and kernel_stack are offsets relative to gsbase. > >> >> crash> x /64xg 0x88023bc8aa00 >> 0x88023bc8aa00: 0x 0x > > [...] > > I don't know enough about crashkernel to know whether the fact that > this worked means anything. AFAIK this just means that the memory at this location is included in the dump :-) > Can you dump the page of physical memory at 0x4779a067? That's the PGD. 
Unfortunately not, this is a partial dump (I think the default config in
openSUSE, but I might have changed it some time ago) and the dump_level
is 31 which means that the following are excluded:

      |      |cache  |cache  |      |
 dump | zero |without|with   | user | free
level | page |private|private| data | page
------+------+-------+-------+------+------
   31 |  X   |   X   |   X   |  X   |  X

so this:
crash> x /64xg 0x4779a067
0x4779a067: Cannot access memory at address 0x4779a067
gdb: gdb request failed: x /64xg

probably just means that the PGD falls in one of the above excluded
categories.

Best regards,

        Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
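Editorial sketch: the dump_level value is a bit mask, which is why 31 excludes every category in the table. The bit assignments below follow the makedumpfile documentation as best recalled (1, 2, 4, 8, 16 for the five columns, left to right); the message itself only shows the row for 31, so treat the individual bit values as an assumption. The names `DUMP_LEVEL_BITS` and `excluded_page_types` are made up for illustration.

```python
# Assumed bit values for the five page categories, in table-column order.
DUMP_LEVEL_BITS = {
    1: "zero page",
    2: "cache without private",
    4: "cache with private",
    8: "user data",
    16: "free page",
}


def excluded_page_types(dump_level: int):
    """List the page categories a given dump_level leaves out of the dump."""
    return [name for bit, name in DUMP_LEVEL_BITS.items() if dump_level & bit]


if __name__ == "__main__":
    for level in (0, 17, 31):
        print(level, excluded_page_types(level))
```

A lower dump level (fewer bits set) keeps more pages in the dump, which is what would be needed to inspect the paging structures or user stack pages after the next crash.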
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/18/2015 09:49 PM, Andy Lutomirski wrote: > On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko wrote: >> On 03/18/2015 08:26 PM, Andy Lutomirski wrote: >>> Hi Linus- >>> >>> You seem to enjoy debugging these things. Want to give this a shot? >>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack >>> right after swapgs in syscall entry. >> >> The code is: >> >> ENTRY(system_call) >> SWAPGS_UNSAFE_STACK >> GLOBAL(system_call_after_swapgs) >> movq%rsp,PER_CPU_VAR(rsp_scratch) >> movqPER_CPU_VAR(kernel_stack),%rsp >> >> If PER_CPU_VAR(var) memory access can page fault >> (I was thinking this is ensured to never fault), >> then on these two instructions such page fault >> will be fatal: we will still have userspace %rsp. >> >> I thought we can only get a NMI or debug interrupt here, >> and they are both set up to use IST stacks >> to prevent this scenario (among other reasons). > > I don't think that #DB is possible -- we should never have a > watchpoint on percpu memory like that (unless we're using kgdb, in > which case I think that kgdb should be fixed). And #DB shouldn't cause a problem even if it happens (it's on an IST stack). I was thinking about it more and the thing is, CPU did manage to enter page fault handler. It means that it managed to store iret frame. This means that stores to (%rsp) worked, whatever %rsp is (even if it points to user's page). The double fault happened only when CALL insn inside the handler attempted to push yet another word. _This_ is what did not work. Why? I almost ready to declare that it's SMAP triggering: that attempts to access (write to) userspace were caught. 
However, disassembly shows crash> disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 <+0>: data32 xchg %ax,%ax 0x816834a3 <+3>: data32 xchg %ax,%ax 0x816834a6 <+6>: data32 xchg %ax,%ax 0x816834a9 <+9>: sub$0x78,%rsp 0x816834ad <+13>:callq 0x81683620 KABOOM HERE^^^ 0x816834b2 <+18>:mov%rsp,%rdi 0x816834b5 <+21>:mov0x78(%rsp),%rsi 0x816834ba <+26>:movq $0x,0x78(%rsp) 0x816834c3 <+35>:callq 0x810504e0 0x816834c8 <+40>:jmpq 0x816836d0 End of assembler dump. Those NOPs at the beginning are ASM_CLAC and PARAVIRT_ADJUST_EXCEPTION_FRAME from this source: .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ENTRY(\sym) /* Sanity check */ .if \shift_ist != -1 && \paranoid == 0 .error "using shift_ist requires paranoid=1" .endif .if \has_error_code XCPT_FRAME .else INTR_FRAME .endif ASM_CLAC PARAVIRT_ADJUST_EXCEPTION_FRAME subq $ORIG_RAX-R15, %rsp call error_entry ... If ASM_CLAC is replaced by NOPs, this CPU must be not SMAP capable. If so, then another store to (%rsp) should have worked too... Stefan, Takashi - are you seeing this on SMAP-capable CPUs? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried wrote: > Hi Andy, > > Am 18.03.2015 um 20:26 schrieb Andy Lutomirski: >> Hi Linus- >> >> You seem to enjoy debugging these things. Want to give this a shot? >> My guess is a vmalloc fault accessing either old_rsp or kernel_stack >> right after swapgs in syscall entry. >> >> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried >> wrote: >>> Hi all, >>> >>> first, I'm kind of happy that I'm not the only one seeing this, and >>> thus my beloved Thinkpad can stay for a bit longer... :-) >>> >>> Then, I'm mostly an amateur when it comes to kernel debugging, so bear >>> with me when I'm stumbling through the code... >>> >>> Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote: > At Wed, 18 Mar 2015 18:43:52 +0100, > Takashi Iwai wrote: >> >> At Wed, 18 Mar 2015 15:16:42 +0100, >> Takashi Iwai wrote: >>> >>> At Sun, 15 Mar 2015 09:17:15 +0100, >>> Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 OK, we double faulted. Too bad that x86 CPUs don't tell us why. [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[] [] page_fault+0xd/0x30 The double fault happened during page fault processing. Could you disassemble your page_fault function to find the offending instruction? 
>>> >>> This one is easy: >>> >>> crash> disassemble page_fault >>> Dump of assembler code for function page_fault: >>>0x816834a0 <+0>: data32 xchg %ax,%ax >>>0x816834a3 <+3>: data32 xchg %ax,%ax >>>0x816834a6 <+6>: data32 xchg %ax,%ax >>>0x816834a9 <+9>: sub$0x78,%rsp >>>0x816834ad <+13>:callq 0x81683620 >> >> The callq was the double-faulting instruction, and it is indeed the >> first function in here that would have accessed the stack. (The sub >> *changes* rsp but isn't a memory access.) So, since RSP is bogus, we >> page fault, and the page fault is promoted to a double fault. The >> surprising thing is that the page fault itself seems to have been >> delivered okay, and RSP wasn't on a page boundary. >> >> You wouldn't happen to be using a Broadwell machine? > > No, this is a quite old Thinkpad X200s, Core2duo > processor : 1 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz > stepping: 10 > microcode : 0xa0c > >> The only way to get here with bogus RSP is if we interrupted something >> that was previously running at CPL0 with similarly bogus RSP. >> >> I don't know if I trust CR2. It's 16 bytes lower than I'd expect. >> >>>0x816834b2 <+18>:mov%rsp,%rdi >>>0x816834b5 <+21>:mov0x78(%rsp),%rsi >>>0x816834ba <+26>:movq $0x,0x78(%rsp) >>>0x816834c3 <+35>:callq 0x810504e0 >>>0x816834c8 <+40>:jmpq 0x816836d0 >>> End of assembler dump. >>> >>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 Uh, what? That RSP is a user address. 
[242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 This is suspicious. We need to have died, again, of a fatal page fault while dumping the stack. >>> >>> I posted the same problem to the opensuse kernel list shortly before turning >>> to LKML. There,
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko wrote:
> On 03/18/2015 08:26 PM, Andy Lutomirski wrote:
>> Hi Linus-
>>
>> You seem to enjoy debugging these things.  Want to give this a shot?
>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>> right after swapgs in syscall entry.
>
> The code is:
>
> ENTRY(system_call)
>         SWAPGS_UNSAFE_STACK
> GLOBAL(system_call_after_swapgs)
>         movq    %rsp,PER_CPU_VAR(rsp_scratch)
>         movq    PER_CPU_VAR(kernel_stack),%rsp
>
> If PER_CPU_VAR(var) memory access can page fault
> (I was thinking this is ensured to never fault),
> then on these two instructions such page fault
> will be fatal: we will still have userspace %rsp.
>
> I thought we can only get a NMI or debug interrupt here,
> and they are both set up to use IST stacks
> to prevent this scenario (among other reasons).

I don't think that #DB is possible -- we should never have a
watchpoint on percpu memory like that (unless we're using kgdb, in
which case I think that kgdb should be fixed).

On the other hand, we can and do take page faults on percpu memory,
because percpu lives in vmap space and we lazily populate PGD entries
in per-mm PGDs.  (That is, when we allocate a kernel PGD entry, we
populate it in init_mm's pgd, but we don't proactively copy it during
context switches.)  But the affected system is a laptop, so there
shouldn't be CPU hotplug or enough memory for this to happen.

Confused.

--Andy
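The lazy PGD population described above can be pictured with a toy model. This is only a sketch in Python, not kernel code: the Mm class, map_kernel_range() and access() are hypothetical names, a PGD is modelled as a plain dict, and the example address is arbitrary. It shows why the first touch of a newly mapped kernel range from a given mm faults once and then succeeds.

```python
PGDIR_SHIFT = 39  # 4-level x86-64 paging: one PGD slot covers 512 GiB


class Mm:
    """Toy address space: the PGD is a dict from PGD index to an entry."""

    def __init__(self):
        self.pgd = {}


def pgd_index(addr):
    return (addr >> PGDIR_SHIFT) & 0x1FF


def map_kernel_range(init_mm, addr, entry):
    """New kernel PGD entry: only the reference PGD (init_mm) is updated."""
    init_mm.pgd[pgd_index(addr)] = entry


def access(mm, init_mm, addr):
    """Return (entry, faulted); a missing entry is copied lazily from init_mm."""
    idx = pgd_index(addr)
    if idx in mm.pgd:
        return mm.pgd[idx], False
    if idx not in init_mm.pgd:
        raise LookupError("no mapping in init_mm either: a genuine bad access")
    mm.pgd[idx] = init_mm.pgd[idx]  # the lazy sync performed on the fault
    return mm.pgd[idx], True
```

In this model the fault is harmless. The concern in the thread is different: if that one-time fault is taken while %rsp still holds the user stack pointer, the fault itself cannot be delivered safely.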
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On 03/18/2015 08:26 PM, Andy Lutomirski wrote:
> Hi Linus-
>
> You seem to enjoy debugging these things.  Want to give this a shot?
> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
> right after swapgs in syscall entry.

The code is:

ENTRY(system_call)
        SWAPGS_UNSAFE_STACK
GLOBAL(system_call_after_swapgs)
        movq    %rsp,PER_CPU_VAR(rsp_scratch)
        movq    PER_CPU_VAR(kernel_stack),%rsp

If a PER_CPU_VAR(var) memory access can page fault
(I thought this was guaranteed never to fault),
then on these two instructions such a page fault
will be fatal: we will still have the userspace %rsp.

I thought we could only get an NMI or debug interrupt here,
and they are both set up to use IST stacks
to prevent this scenario (among other reasons).

--
vda
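Why a fault in that window is fatal comes down to which stack the CPU uses to push the exception frame. The Python sketch below models the x86-64 selection rule as I understand it (an IST gate always switches stacks, a CPL3 to CPL0 transition loads RSP0 from the TSS, and a same-privilege exception keeps the current RSP, aligned down to 16 bytes). The function name and all addresses are hypothetical, chosen only to illustrate the point.

```python
def exception_stack(cpl, current_rsp, rsp0, ist_index=0, ist=None):
    """Return the stack pointer at which the exception frame starts.

    cpl         -- privilege level of the interrupted code (0 or 3)
    current_rsp -- RSP at the time of the exception
    rsp0        -- kernel stack pointer from the TSS (used on CPL3 -> CPL0)
    ist_index   -- IST field of the IDT gate (0 means "no IST")
    ist         -- dict mapping IST index (1..7) to a stack pointer
    """
    if ist_index:
        new_rsp = ist[ist_index]   # IST gates always switch stacks
    elif cpl == 3:
        new_rsp = rsp0             # privilege change: RSP0 from the TSS
    else:
        new_rsp = current_rsp      # same privilege: stay on the current stack
    return new_rsp & ~0xF          # 64-bit mode aligns the frame to 16 bytes


# Hypothetical values: kernel code (CPL0) still running on a user RSP.
USER_RSP = 0x7FFFA55EB038
RSP0 = 0xFFFF880000004000
IST = {1: 0xFFFF880000010000}

print(hex(exception_stack(0, USER_RSP, RSP0)))          # page fault, no IST
print(hex(exception_stack(0, USER_RSP, RSP0, 1, IST)))  # NMI-style IST gate
```

With no IST entry for the page fault vector, the first call returns an address on the user stack, which is the scenario the two movq instructions leave open.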
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi Andy, Am 18.03.2015 um 20:26 schrieb Andy Lutomirski: > Hi Linus- > > You seem to enjoy debugging these things. Want to give this a shot? > My guess is a vmalloc fault accessing either old_rsp or kernel_stack > right after swapgs in syscall entry. > > On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried > wrote: >> Hi all, >> >> first, I'm kind of happy that I'm not the only one seeing this, and >> thus my beloved Thinkpad can stay for a bit longer... :-) >> >> Then, I'm mostly an amateur when it comes to kernel debugging, so bear >> with me when I'm stumbling through the code... >> >> Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: >>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote: At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote: > > At Wed, 18 Mar 2015 15:16:42 +0100, > Takashi Iwai wrote: >> >> At Sun, 15 Mar 2015 09:17:15 +0100, >> Stefan Seyfried wrote: >>> >>> Hi all, >>> >>> in 4.0-rc I have recently seen a few crashes, always when running >>> KVM guests (IIRC). Today I was able to capture a crash dump, this >>> is the backtrace from dmesg.txt: >>> >>> [242060.604870] PANIC: double fault, error_code: 0x0 >>> >>> OK, we double faulted. Too bad that x86 CPUs don't tell us why. >>> >>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G >>> W 4.0.0-rc3-2.gd5c547f-desktop #1 >>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW >>> (3.21 ) 12/13/2011 >>> [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: >>> 8801013d4000 >>> [242060.604885] RIP: 0010:[] [] >>> page_fault+0xd/0x30 >>> >>> The double fault happened during page fault processing. Could you >>> disassemble your page_fault function to find the offending >>> instruction? 
>>
>> This one is easy:
>>
>> crash> disassemble page_fault
>> Dump of assembler code for function page_fault:
>>    0x816834a0 <+0>:     data32 xchg %ax,%ax
>>    0x816834a3 <+3>:     data32 xchg %ax,%ax
>>    0x816834a6 <+6>:     data32 xchg %ax,%ax
>>    0x816834a9 <+9>:     sub    $0x78,%rsp
>>    0x816834ad <+13>:    callq  0x81683620
>
> The callq was the double-faulting instruction, and it is indeed the
> first function in here that would have accessed the stack.  (The sub
> *changes* rsp but isn't a memory access.)  So, since RSP is bogus, we
> page fault, and the page fault is promoted to a double fault.  The
> surprising thing is that the page fault itself seems to have been
> delivered okay, and RSP wasn't on a page boundary.
>
> You wouldn't happen to be using a Broadwell machine?

No, this is a quite old Thinkpad X200s, Core2duo

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
stepping        : 10
microcode       : 0xa0c

> The only way to get here with bogus RSP is if we interrupted something
> that was previously running at CPL0 with similarly bogus RSP.
>
> I don't know if I trust CR2.  It's 16 bytes lower than I'd expect.
>
>>    0x816834b2 <+18>:    mov    %rsp,%rdi
>>    0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
>>    0x816834ba <+26>:    movq   $0x,0x78(%rsp)
>>    0x816834c3 <+35>:    callq  0x810504e0
>>    0x816834c8 <+40>:    jmpq   0x816836d0
>> End of assembler dump.
>>
>>
>>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016
>>>
>>> Uh, what?  That RSP is a user address.
>>> >>> [242060.604895] RAX: aa40 RBX: 0001 RCX: >>> 81682237 >>> [242060.604896] RDX: aa40 RSI: RDI: >>> 7fffa55eb078 >>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: >>> >>> [242060.604900] R10: R11: 0293 R12: >>> 004a >>> [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: >>> 7ffa3556cf20 >>> [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() >>> knlGS: >>> [242060.604906] CS: 0010 DS: ES: CR0: 80050033 >>> [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: >>> 000427e0 >>> [242060.604909] Stack: >>> [242060.604942] BUG: unable to handle kernel paging request at >>> 7fffa55eafb8 >>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 >>> >>> This is suspicious. We need to have died, again, of a fatal page >>> fault while dumping the stack. >> >> I posted the same problem to the opensuse kernel list shortly before turning >> to LKML. There, Michal Kubecek noted: >> >> "I encountered a similar problem recently. The thing is, x86 >> specification says that on a double fault, RIP and RSP registers are >> undefined, i.e. you
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi Linus-

You seem to enjoy debugging these things.  Want to give this a shot?
My guess is a vmalloc fault accessing either old_rsp or kernel_stack
right after swapgs in syscall entry.

On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried wrote:
> Hi all,
>
> first, I'm kind of happy that I'm not the only one seeing this, and
> thus my beloved Thinkpad can stay for a bit longer... :-)
>
> Then, I'm mostly an amateur when it comes to kernel debugging, so bear
> with me when I'm stumbling through the code...
>
> On 18.03.2015 at 19:03, Andy Lutomirski wrote:
>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote:
>>> At Wed, 18 Mar 2015 18:43:52 +0100,
>>> Takashi Iwai wrote:
>>>>
>>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>>> Takashi Iwai wrote:
>>>>>
>>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>>> Stefan Seyfried wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>>> is the backtrace from dmesg.txt:
>>>>>>
>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>
>> OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
>>
>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>> [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000
>>>>>> [242060.604885] RIP: 0010:[] [] page_fault+0xd/0x30
>>
>> The double fault happened during page fault processing.  Could you
>> disassemble your page_fault function to find the offending
>> instruction?
>
> This one is easy:
>
> crash> disassemble page_fault
> Dump of assembler code for function page_fault:
>    0x816834a0 <+0>:     data32 xchg %ax,%ax
>    0x816834a3 <+3>:     data32 xchg %ax,%ax
>    0x816834a6 <+6>:     data32 xchg %ax,%ax
>    0x816834a9 <+9>:     sub    $0x78,%rsp
>    0x816834ad <+13>:    callq  0x81683620

The callq was the double-faulting instruction, and it is indeed the
first function in here that would have accessed the stack.  (The sub
*changes* rsp but isn't a memory access.)  So, since RSP is bogus, we
page fault, and the page fault is promoted to a double fault.  The
surprising thing is that the page fault itself seems to have been
delivered okay, and RSP wasn't on a page boundary.

You wouldn't happen to be using a Broadwell machine?

The only way to get here with bogus RSP is if we interrupted something
that was previously running at CPL0 with similarly bogus RSP.

I don't know if I trust CR2.  It's 16 bytes lower than I'd expect.

>    0x816834b2 <+18>:    mov    %rsp,%rdi
>    0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
>    0x816834ba <+26>:    movq   $0x,0x78(%rsp)
>    0x816834c3 <+35>:    callq  0x810504e0
>    0x816834c8 <+40>:    jmpq   0x816836d0
> End of assembler dump.
>
>
>>>>>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016
>>
>> Uh, what?  That RSP is a user address.
>>
>>>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237
>>>>>> [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078
>>>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09:
>>>>>> [242060.604900] R10: R11: 0293 R12: 004a
>>>>>> [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20
>>>>>> [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS:
>>>>>> [242060.604906] CS: 0010 DS: ES: CR0: 80050033
>>>>>> [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0
>>>>>> [242060.604909] Stack:
>>>>>> [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8
>>>>>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190
>>
>> This is suspicious.  We need to have died, again, of a fatal page
>> fault while dumping the stack.
>
> I posted the same problem to the opensuse kernel list shortly before turning
> to LKML. There, Michal Kubecek noted:
>
> "I encountered a similar problem recently. The thing is, x86
> specification says that on a double fault, RIP and RSP registers are
> undefined, i.e. you not only can't expect them to contain values
> corresponding to the first or second fault but you can't even expect
> them to have any usable values at all. Unfortunately the kernel double
> fault handler doesn't take this into account and does try to display
> usual crash related information so that it itself does usually crash
> when trying to show stack content (that's the show_stack_log_lvl()
> crash).

I think that's not entirely true.
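The address arithmetic behind the remarks about RSP and CR2 can be checked directly. This is a small Python sketch using the RSP and CR2 values from the register dump, assuming they are the low 48 bits of the real register values (the dump shows them without leading zeros). Which baseline the "16 bytes lower" remark refers to is not stated in the thread, so the sketch prints the distance from both RSP and from the slot the callq would write.

```python
PAGE_SIZE = 4096
USER_TOP = 0x00007FFFFFFFFFFF  # highest address in the lower canonical half

rsp = 0x7FFFA55EAFB8  # RSP reported for the double fault
cr2 = 0x7FFFA55EAFA8  # CR2 reported in the same dump


def is_user_address(addr):
    """True if addr lies in the user (lower canonical) half on x86-64."""
    return addr <= USER_TOP


def same_page(a, b):
    return a // PAGE_SIZE == b // PAGE_SIZE


push_target = rsp - 8  # callq stores its 8-byte return address at RSP - 8

print("RSP is a user address:", is_user_address(rsp))
print("RSP offset within page:", hex(rsp % PAGE_SIZE))
print("push stays on RSP's page:", same_page(rsp, push_target))
print("RSP - CR2:", rsp - cr2)
print("push target - CR2:", push_target - cr2)
```

The page offset is 0xfb8, so neither RSP nor the callq's write sits at a page boundary, and CR2 is 16 bytes below RSP (8 bytes below the slot the callq writes).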
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi all,

first, I'm kind of happy that I'm not the only one seeing this, and
thus my beloved Thinkpad can stay for a bit longer... :-)

Then, I'm mostly an amateur when it comes to kernel debugging, so bear
with me when I'm stumbling through the code...

On 18.03.2015 at 19:03, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote:
>> At Wed, 18 Mar 2015 18:43:52 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>> Takashi Iwai wrote:
>>>>
>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>> Stefan Seyfried wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>> is the backtrace from dmesg.txt:
>>>>>
>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>
> OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
>
>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>> [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000
>>>>> [242060.604885] RIP: 0010:[] [] page_fault+0xd/0x30
>
> The double fault happened during page fault processing.  Could you
> disassemble your page_fault function to find the offending
> instruction?

This one is easy:

crash> disassemble page_fault
Dump of assembler code for function page_fault:
   0x816834a0 <+0>:     data32 xchg %ax,%ax
   0x816834a3 <+3>:     data32 xchg %ax,%ax
   0x816834a6 <+6>:     data32 xchg %ax,%ax
   0x816834a9 <+9>:     sub    $0x78,%rsp
   0x816834ad <+13>:    callq  0x81683620
   0x816834b2 <+18>:    mov    %rsp,%rdi
   0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0x816834ba <+26>:    movq   $0x,0x78(%rsp)
   0x816834c3 <+35>:    callq  0x810504e0
   0x816834c8 <+40>:    jmpq   0x816836d0
End of assembler dump.

>>>>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016
>
> Uh, what?  That RSP is a user address.
>
>>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237
>>>>> [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078
>>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09:
>>>>> [242060.604900] R10: R11: 0293 R12: 004a
>>>>> [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20
>>>>> [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS:
>>>>> [242060.604906] CS: 0010 DS: ES: CR0: 80050033
>>>>> [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0
>>>>> [242060.604909] Stack:
>>>>> [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8
>>>>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190
>
> This is suspicious.  We need to have died, again, of a fatal page
> fault while dumping the stack.

I posted the same problem to the opensuse kernel list shortly before turning
to LKML. There, Michal Kubecek noted:

"I encountered a similar problem recently. The thing is, x86
specification says that on a double fault, RIP and RSP registers are
undefined, i.e. you not only can't expect them to contain values
corresponding to the first or second fault but you can't even expect
them to have any usable values at all. Unfortunately the kernel double
fault handler doesn't take this into account and does try to display
usual crash related information so that it itself does usually crash
when trying to show stack content (that's the show_stack_log_lvl()
crash).

The result is a double fault (which itself would be very hard to
debug) followed by a crash in its handler so that analysing the outcome
is extremely difficult."

I cannot judge if this is true, but it sounded related to solving the
problem to me.

> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 > [242060.605078] Oops: [#1] PREEMPT SMP > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace > sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp > ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac > algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge > stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter > ip_tables x_tables af_packet bnep dm_crypt ecb
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote:
> At Wed, 18 Mar 2015 18:43:52 +0100,
> Takashi Iwai wrote:
>>
>> At Wed, 18 Mar 2015 15:16:42 +0100,
>> Takashi Iwai wrote:
>> >
>> > At Sun, 15 Mar 2015 09:17:15 +0100,
>> > Stefan Seyfried wrote:
>> > >
>> > > Hi all,
>> > >
>> > > in 4.0-rc I have recently seen a few crashes, always when running
>> > > KVM guests (IIRC). Today I was able to capture a crash dump, this
>> > > is the backtrace from dmesg.txt:
>> > >
>> > > [242060.604870] PANIC: double fault, error_code: 0x0

OK, we double faulted.  Too bad that x86 CPUs don't tell us why.

>> > > [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>> > > [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>> > > [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000
>> > > [242060.604885] RIP: 0010:[] [] page_fault+0xd/0x30

The double fault happened during page fault processing.  Could you
disassemble your page_fault function to find the offending
instruction?

>> > > [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016

Uh, what?  That RSP is a user address.

>> > > [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237
>> > > [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078
>> > > [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09:
>> > > [242060.604900] R10: R11: 0293 R12: 004a
>> > > [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20
>> > > [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS:
>> > > [242060.604906] CS: 0010 DS: ES: CR0: 80050033
>> > > [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0
>> > > [242060.604909] Stack:
>> > > [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8
>> > > [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190

This is suspicious.
We need to have died, again, of a fatal page fault while dumping the stack. >> > > [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 >> > > [242060.605078] Oops: [#1] PREEMPT SMP >> > > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 >> > > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace >> > > sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp >> > > ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac >> > > algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE >> > > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 >> > > nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge >> > > stp llc ebtable_filter ebtables ip6table_filter ip6_tables >> > > iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc >> > > algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant >> > > snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel >> > > snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm >> > > [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp >> > > kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 >> > > uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core >> > > btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 >> > > pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei >> > > cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq >> > > i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video >> > > button processor sg loop >> > > [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G >> > > W 4.0.0-rc3-2.gd5c547f-desktop #1 >> > > [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW >> > > (3.21 ) 12/13/2011 >> > > [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: >> > > 8801013d4000 >> > > [242060.605396] RIP: 0010:[] 
[] >> > > show_stack_log_lvl+0x124/0x190 >> > > [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 >> > > [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: >> > > 88023bc7ffc0 >> > > [242060.605396] RDX: RSI: 88023bc84f58 RDI: >> > > >> > > [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: >> > > 0020 >> > > [242060.605396] R10: 0afb R11: 88023bc84bee R12: >> > > 88023bc84f58 >> > > [242060.605396] R13: R14: 81a2fe15 R15: >> > > >> > > [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() >> > > knlGS: >> > >
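The "PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0" line in the oops quoted above is the page-table walk for the faulting address 7fffa55eafb8. The Python sketch below decodes it under the assumption of ordinary 4-level paging with 4 KiB pages; the helper names are made up for illustration, and the entry values are used exactly as they appear in the archived dump.

```python
# Low flag bits of an x86-64 page-table entry. Bit 6 is "dirty" only in
# leaf entries; in upper-level entries the hardware ignores it.
FLAGS = {0: "present", 1: "writable", 2: "user", 5: "accessed", 6: "dirty"}


def decode_entry(entry):
    """Return the names of the low flag bits set in a page-table entry."""
    return [name for bit, name in sorted(FLAGS.items()) if (entry >> bit) & 1]


def table_indices(addr):
    """Split a 48-bit virtual address into four 9-bit indices plus offset."""
    return {
        "pgd": (addr >> 39) & 0x1FF,
        "pud": (addr >> 30) & 0x1FF,
        "pmd": (addr >> 21) & 0x1FF,
        "pte": (addr >> 12) & 0x1FF,
        "offset": addr & 0xFFF,
    }


walk = {"PGD": 0x4779A067, "PUD": 0x40E3E067, "PMD": 0x4769E067, "PTE": 0x0}

print(table_indices(0x7FFFA55EAFB8))
for level, entry in walk.items():
    print(level, hex(entry), decode_entry(entry) or "not present")
```

The three upper levels are present, writable and user-accessible, while the PTE is zero: the user stack page simply was not mapped, which matches a fatal fault while show_stack_log_lvl() reads from the bogus stack pointer.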
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote: > > At Wed, 18 Mar 2015 15:16:42 +0100, > Takashi Iwai wrote: > > > > At Sun, 15 Mar 2015 09:17:15 +0100, > > Stefan Seyfried wrote: > > > > > > Hi all, > > > > > > in 4.0-rc I have recently seen a few crashes, always when running > > > KVM guests (IIRC). Today I was able to capture a crash dump, this > > > is the backtrace from dmesg.txt: > > > > > > [242060.604870] PANIC: double fault, error_code: 0x0 > > > [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G > > > W 4.0.0-rc3-2.gd5c547f-desktop #1 > > > [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW > > > (3.21 ) 12/13/2011 > > > [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: > > > 8801013d4000 > > > [242060.604885] RIP: 0010:[] [] > > > page_fault+0xd/0x30 > > > [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 > > > [242060.604895] RAX: aa40 RBX: 0001 RCX: > > > 81682237 > > > [242060.604896] RDX: aa40 RSI: RDI: > > > 7fffa55eb078 > > > [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: > > > > > > [242060.604900] R10: R11: 0293 R12: > > > 004a > > > [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: > > > 7ffa3556cf20 > > > [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() > > > knlGS: > > > [242060.604906] CS: 0010 DS: ES: CR0: 80050033 > > > [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: > > > 000427e0 > > > [242060.604909] Stack: > > > [242060.604942] BUG: unable to handle kernel paging request at > > > 7fffa55eafb8 > > > [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 > > > [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 > > > [242060.605078] Oops: [#1] PREEMPT SMP > > > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 > > > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace > > > sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp > > > ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac > > > 
algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE > > > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > > > nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge > > > stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter > > > ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg > > > xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt > > > iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec > > > snd_hwdep snd_pcm_oss snd_pcm > > > [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp > > > kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 > > > uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core > > > btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 > > > pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei > > > cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq > > > i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video > > > button processor sg loop > > > [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G > > > W 4.0.0-rc3-2.gd5c547f-desktop #1 > > > [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW > > > (3.21 ) 12/13/2011 > > > [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: > > > 8801013d4000 > > > [242060.605396] RIP: 0010:[] [] > > > show_stack_log_lvl+0x124/0x190 > > > [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 > > > [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: > > > 88023bc7ffc0 > > > [242060.605396] RDX: RSI: 88023bc84f58 RDI: > > > > > > [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: > > > 0020 > > > [242060.605396] R10: 0afb R11: 88023bc84bee R12: > > > 88023bc84f58 > > > [242060.605396] R13: R14: 81a2fe15 R15: > > > > > > [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() > > > knlGS: > > > [242060.605396] CS: 
0010 DS: ES: CR0: 80050033 > > > [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: > > > 000427e0 > > > [242060.605396] Stack: > > > [242060.605396] 02d7e000 0008 88023bc84ee8 > > > 7fffa55eafb8 > > > [242060.605396] 88023bc84f58 7fffa55eafb8 > > > 0040 > > > [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 > > > 81005c36 > > >
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote: > > At Sun, 15 Mar 2015 09:17:15 +0100, > Stefan Seyfried wrote: > > > > Hi all, > > > > in 4.0-rc I have recently seen a few crashes, always when running > > KVM guests (IIRC). Today I was able to capture a crash dump, this > > is the backtrace from dmesg.txt: > > > > [242060.604870] PANIC: double fault, error_code: 0x0 > > [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW > > 4.0.0-rc3-2.gd5c547f-desktop #1 > > [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 > > ) 12/13/2011 > > [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: > > 8801013d4000 > > [242060.604885] RIP: 0010:[] [] > > page_fault+0xd/0x30 > > [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 > > [242060.604895] RAX: aa40 RBX: 0001 RCX: > > 81682237 > > [242060.604896] RDX: aa40 RSI: RDI: > > 7fffa55eb078 > > [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: > > > > [242060.604900] R10: R11: 0293 R12: > > 004a > > [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: > > 7ffa3556cf20 > > [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() > > knlGS: > > [242060.604906] CS: 0010 DS: ES: CR0: 80050033 > > [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: > > 000427e0 > > [242060.604909] Stack: > > [242060.604942] BUG: unable to handle kernel paging request at > > 7fffa55eafb8 > > [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 > > [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 > > [242060.605078] Oops: [#1] PREEMPT SMP > > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 > > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace > > sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp > > ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac > > algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE > > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > > 
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge > > stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter > > ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg > > xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt > > iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec > > snd_hwdep snd_pcm_oss snd_pcm > > [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel > > kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo > > videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb > > v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr > > e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp > > wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci > > xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg > > loop > > [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW > > 4.0.0-rc3-2.gd5c547f-desktop #1 > > [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 > > ) 12/13/2011 > > [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: > > 8801013d4000 > > [242060.605396] RIP: 0010:[] [] > > show_stack_log_lvl+0x124/0x190 > > [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 > > [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: > > 88023bc7ffc0 > > [242060.605396] RDX: RSI: 88023bc84f58 RDI: > > > > [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: > > 0020 > > [242060.605396] R10: 0afb R11: 88023bc84bee R12: > > 88023bc84f58 > > [242060.605396] R13: R14: 81a2fe15 R15: > > > > [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() > > knlGS: > > [242060.605396] CS: 0010 DS: ES: CR0: 80050033 > > [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: > > 000427e0 > > [242060.605396] Stack: > > [242060.605396] 02d7e000 0008 88023bc84ee8 > > 7fffa55eafb8 > > [242060.605396] 88023bc84f58 
7fffa55eafb8 > > 0040 > > [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 > > 81005c36 > > [242060.605396] Call Trace: > > [242060.605396] [] show_regs+0x86/0x210 > > [242060.605396] [] df_debug+0x1f/0x30 > > [242060.605396] [] do_double_fault+0x84/0x100 > > [242060.605396] [] double_fault+0x28/0x30 > > [242060.605396] [] page_fault+0xd/0x30
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Wed, 18 Mar 2015 15:16:42 +0100,
Takashi Iwai wrote:
>
> IIRC, this didn't happen with the early 4.0-rc, but can't say 100%
> sure.

I could reproduce the panic on 4.0-rc1, so scratch this comment.


Takashi
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: > > Hi all, > > in 4.0-rc I have recently seen a few crashes, always when running > KVM guests (IIRC). Today I was able to capture a crash dump, this > is the backtrace from dmesg.txt: > > [242060.604870] PANIC: double fault, error_code: 0x0 > [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW >4.0.0-rc3-2.gd5c547f-desktop #1 > [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) > 12/13/2011 > [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: > 8801013d4000 > [242060.604885] RIP: 0010:[] [] > page_fault+0xd/0x30 > [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 > [242060.604895] RAX: aa40 RBX: 0001 RCX: > 81682237 > [242060.604896] RDX: aa40 RSI: RDI: > 7fffa55eb078 > [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: > > [242060.604900] R10: R11: 0293 R12: > 004a > [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: > 7ffa3556cf20 > [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() > knlGS: > [242060.604906] CS: 0010 DS: ES: CR0: 80050033 > [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: > 000427e0 > [242060.604909] Stack: > [242060.604942] BUG: unable to handle kernel paging request at > 7fffa55eafb8 > [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 > [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 > [242060.605078] Oops: [#1] PREEMPT SMP > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc > fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async > crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr > ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp > llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter > 
ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs > libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt > iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep > snd_pcm_oss snd_pcm > [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel > kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo > videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb > v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr > e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp > wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci > xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg > loop > [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW >4.0.0-rc3-2.gd5c547f-desktop #1 > [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) > 12/13/2011 > [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: > 8801013d4000 > [242060.605396] RIP: 0010:[] [] > show_stack_log_lvl+0x124/0x190 > [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 > [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: > 88023bc7ffc0 > [242060.605396] RDX: RSI: 88023bc84f58 RDI: > > [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: > 0020 > [242060.605396] R10: 0afb R11: 88023bc84bee R12: > 88023bc84f58 > [242060.605396] R13: R14: 81a2fe15 R15: > > [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() > knlGS: > [242060.605396] CS: 0010 DS: ES: CR0: 80050033 > [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: > 000427e0 > [242060.605396] Stack: > [242060.605396] 02d7e000 0008 88023bc84ee8 > 7fffa55eafb8 > [242060.605396] 88023bc84f58 7fffa55eafb8 > 0040 > [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 > 81005c36 > [242060.605396] Call Trace: > [242060.605396] [] show_regs+0x86/0x210 > [242060.605396] [] df_debug+0x1f/0x30 > [242060.605396] [] do_double_fault+0x84/0x100 > 
[242060.605396] [] double_fault+0x28/0x30 > [242060.605396] [] page_fault+0xd/0x30 > [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 > cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 > <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89 > [242060.605396] RIP []
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30 [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 [242060.605078] Oops: [#1] PREEMPT SMP [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg 
xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.605396] RIP: 0010:[81005b44] [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: 88023bc7ffc0 [242060.605396] RDX: RSI: 88023bc84f58 RDI: [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: 0020 [242060.605396] R10: 0afb R11: 88023bc84bee R12: 88023bc84f58 [242060.605396] R13: R14: 81a2fe15 R15: [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.605396] CS: 0010 DS: ES: CR0: 80050033 [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: 000427e0 [242060.605396] Stack: [242060.605396] 02d7e000 0008 88023bc84ee8 7fffa55eafb8 [242060.605396] 88023bc84f58 7fffa55eafb8 0040 [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 81005c36 [242060.605396] Call Trace: [242060.605396] [81005c36] show_regs+0x86/0x210 [242060.605396] [8104636f] df_debug+0x1f/0x30 [242060.605396] [810041a4] do_double_fault+0x84/0x100 [242060.605396] [81683088] double_fault+0x28/0x30 [242060.605396] [816834ad] page_fault+0xd/0x30 [242060.605396] Code: fe a2 81 
31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 48 8b 33 48 c7 c7 0d fe a2 81
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote: At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote: At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30 [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 [242060.605078] Oops: [#1] PREEMPT SMP [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.605396] RIP: 0010:[81005b44] [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: 88023bc7ffc0 [242060.605396] RDX: RSI: 88023bc84f58 RDI: [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: 0020 [242060.605396] R10: 0afb R11: 88023bc84bee R12: 88023bc84f58 [242060.605396] R13: R14: 81a2fe15 R15: [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.605396] CS: 0010 DS: ES: CR0: 80050033 [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: 000427e0 [242060.605396] Stack: [242060.605396] 02d7e000 0008 88023bc84ee8 7fffa55eafb8 [242060.605396] 88023bc84f58 7fffa55eafb8 0040 [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 81005c36 [242060.605396] Call Trace: [242060.605396] [81005c36] show_regs+0x86/0x210 [242060.605396] [8104636f] df_debug+0x1f/0x30 [242060.605396] [810041a4]
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote: At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30 [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 [242060.605078] Oops: [#1] PREEMPT SMP [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: GW 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.605396] RIP: 0010:[81005b44] [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: 88023bc7ffc0 [242060.605396] RDX: RSI: 88023bc84f58 RDI: [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: 0020 [242060.605396] R10: 0afb R11: 88023bc84bee R12: 88023bc84f58 [242060.605396] R13: R14: 81a2fe15 R15: [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.605396] CS: 0010 DS: ES: CR0: 80050033 [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: 000427e0 [242060.605396] Stack: [242060.605396] 02d7e000 0008 88023bc84ee8 7fffa55eafb8 [242060.605396] 88023bc84f58 7fffa55eafb8 0040 [242060.605396] 7ffa356b5d60 000f 7ffa3556cf20 81005c36 [242060.605396] Call Trace: [242060.605396] [81005c36] show_regs+0x86/0x210 [242060.605396] [8104636f] df_debug+0x1f/0x30 [242060.605396] [810041a4] do_double_fault+0x84/0x100 [242060.605396] [81683088] double_fault+0x28/0x30 [242060.605396] 
[816834ad] page_fault+0xd/0x30 [242060.605396] Code: fe a2 81
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai ti...@suse.de wrote:
At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote:
At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote:
At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote:
Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:

[242060.604870] PANIC: double fault, error_code: 0x0

OK, we double faulted. Too bad that x86 CPUs don't tell us why.

[242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
[242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
[242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000
[242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30

The double fault happened during page fault processing. Could you disassemble your page_fault function to find the offending instruction?

[242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016

Uh, what? That RSP is a user address.

[242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237
[242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078
[242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09:
[242060.604900] R10: R11: 0293 R12: 004a
[242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20
[242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS:
[242060.604906] CS: 0010 DS: ES: CR0: 80050033
[242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0
[242060.604909] Stack:
[242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8
[242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190

This is suspicious. We need to have died, again, of a fatal page fault while dumping the stack.
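To make the remark about RSP concrete, here is a minimal Python sketch (not part of the original thread) that classifies an x86-64 virtual address as belonging to the user or kernel half of the address space. It assumes 48-bit canonical addressing with 4-level paging; the function name `classify` and the constant names are illustrative, and the kernel-half address in the last check is a generic boundary value rather than one taken from the log.

```python
# Sketch only: assumes 48-bit canonical x86-64 addresses (4-level paging).
USER_MAX = 0x0000_7FFF_FFFF_FFFF    # top of the lower (user) canonical half
KERNEL_MIN = 0xFFFF_8000_0000_0000  # bottom of the upper (kernel) canonical half


def classify(addr: int) -> str:
    """Return 'user', 'kernel', or 'non-canonical' for a 64-bit address."""
    if not 0 <= addr < 2**64:
        raise ValueError("address does not fit in 64 bits")
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


# Values copied from the double fault report quoted above.
rsp = 0x7FFFA55EAFB8  # RSP: 0018:7fffa55eafb8
cr2 = 0x7FFFA55EAFA8  # CR2: 7fffa55eafa8

print(classify(rsp))         # lower half, so a user address
print(rsp - cr2)             # distance between RSP and the faulting address
print(classify(KERNEL_MIN))  # illustrative kernel-half address
```

With these values the RSP falls in the user half, and CR2 in the double fault report lies 16 bytes below it. The follow-up oops reports its faulting address as 7fffa55eafb8, the same value as that RSP, which matches the failure in show_stack_log_lvl while it tried to print the stack.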
[242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 [242060.605078] Oops: [#1] PREEMPT SMP [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.605396] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.605396] RIP: 0010:[81005b44] [81005b44] show_stack_log_lvl+0x124/0x190 [242060.605396] RSP: 0018:88023bc84e88 EFLAGS: 00010046 [242060.605396] RAX: 7fffa55eafc0 RBX: 7fffa55eafb8 RCX: 88023bc7ffc0 [242060.605396] RDX: RSI: 88023bc84f58 RDI: [242060.605396] RBP: 88023bc83fc0 R08: 81a2fe15 R09: 0020 
[242060.605396] R10: 0afb R11: 88023bc84bee R12: 88023bc84f58 [242060.605396] R13: R14: 81a2fe15 R15: [242060.605396] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.605396] CS: 0010 DS: ES: CR0: 80050033 [242060.605396] CR2: 7fffa55eafb8 CR3: 02d7e000 CR4: 000427e0 [242060.605396] Stack: [242060.605396] 02d7e000 0008 88023bc84ee8