Bug#611832: linux-image-2.6.32-5-amd64: general protection fault at reboot under qemu: native_stop_other_cpus+0x86/0x90
[Excuse the duplicate; this is properly cc'd to bugs.debian.org.]

On Tue, 2011-02-15 at 18:52 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings b...@decadent.org.uk writes:
> > It's a kernel feature to be more efficient when running in a recognised
> > virtual machine implementation (PV = paravirtualisation).
>
> thanks. I think it is the following code from vmi_32.c:
[...]
> I don't understand how the first xchg instruction in
>
>     0x00600889 f+41: 57        push %rdi
>     0x0060088a f+42: 9d        popfq
>     0x0060088b f+43: 66 66 90  xchg %ax,%ax
>     0x0060088e f+46: 66 90     xchg %ax,%ax
>
> can generate a general protection fault. I googled around and found
>
>     yes - it smells like it tries to deliver vector 0, after the panic
>     code has deinitialized the lapic / ioapic which suggests a qemu bug
>
> from http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html
>
> Shall I reassign the bug or do you know how to investigate this more?

Sorry, I don't have a good idea how to investigate this further. The message you're referring to is quite old and I would expect the bug to have been fixed in qemu since then. Is the KVM host using an old version?

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Ben Hutchings b...@decadent.org.uk writes:
> It's a kernel feature to be more efficient when running in a recognised
> virtual machine implementation (PV = paravirtualisation).

thanks. I think it is the following code from vmi_32.c:

/*
 * Apply patch if appropriate, return length of new instruction
 * sequence. The callee does nop padding for us.
 */
static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
			  unsigned long ip, unsigned len)
{
	switch (type) {
	case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
		return patch_internal(VMI_CALL_DisableInterrupts, len, insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.irq_enable):
		return patch_internal(VMI_CALL_EnableInterrupts, len, insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
		return patch_internal(VMI_CALL_SetInterruptMask, len, insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.save_fl):
		return patch_internal(VMI_CALL_GetInterruptMask, len, insns, ip);
	case PARAVIRT_PATCH(pv_cpu_ops.iret):
		return patch_internal(VMI_CALL_IRET, len, insns, ip);
	case PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit):
		return patch_internal(VMI_CALL_SYSEXIT, len, insns, ip);
	default:
		break;
	}
	return len;
}

I don't understand how the first xchg instruction in

    0x00600889 f+41: 57        push %rdi
    0x0060088a f+42: 9d        popfq
    0x0060088b f+43: 66 66 90  xchg %ax,%ax
    0x0060088e f+46: 66 90     xchg %ax,%ax

can generate a general protection fault. I googled around and found

    yes - it smells like it tries to deliver vector 0, after the panic
    code has deinitialized the lapic / ioapic which suggests a qemu bug

from http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html

Shall I reassign the bug or do you know how to investigate this more?
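The comment in the quoted vmi_patch() says the callee pads the patched site with NOPs. As a rough illustration only, the sketch below overwrites a fixed-size call site with a shorter replacement and fills the remainder with NOPs. It is not the kernel's patching code: patch_site() and fill_nops() are hypothetical names, and the rule "NOP chunks of at most 3 bytes" is an assumption inferred from the byte patterns in the dump in this thread (66 66 90 66 90 after a 2-byte replacement, 66 66 90 66 66 90 after a 1-byte one).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill dst[0..len) with NOPs, using chunks of at most 3 bytes.
 * Assumption: chunking rule inferred from the dump, not from kernel source. */
static void fill_nops(unsigned char *dst, size_t len)
{
    while (len > 0) {
        size_t chunk = len >= 3 ? 3 : len;
        /* (chunk - 1) operand-size prefixes followed by a one-byte NOP */
        memset(dst, 0x66, chunk - 1);
        dst[chunk - 1] = 0x90;
        dst += chunk;
        len -= chunk;
    }
}

/* Overwrite a call site of site_len bytes with repl and pad with NOPs.
 * Returns 0 on success, -1 if the replacement does not fit. */
int patch_site(unsigned char *site, size_t site_len,
               const unsigned char *repl, size_t repl_len)
{
    assert(site != NULL && repl != NULL);
    if (repl_len > site_len)
        return -1;
    memcpy(site, repl, repl_len);
    fill_nops(site + repl_len, site_len - repl_len);
    return 0;
}
```

Applied to the 7-byte callq quoted in the report (ff 14 25 f8 69 46 81) with the replacement 57 9d (push %rdi; popfq), this yields 57 9d 66 66 90 66 90, which matches the fragment being asked about.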
On Thu, 2011-02-03 at 00:50 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings b...@decadent.org.uk writes:
> > Which version of qemu are you using in the host? If you are using
> > kvm-qemu, which kernel version are you using in the host?
>
> The host is a xen domU:

So this is ordinary qemu, not using hardware virtualisation?

> lindi1:~$ qemu-system-x86_64 --version
> QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
> lindi1:~$ dpkg-query -W qemu
> qemu    0.12.5+dfsg-3
> lindi1:~$ dmesg|head -n3
> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgroup subsys cpu
> [0.00] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011
>
> > > 0x00600889 f+41: 57        push %rdi
> > > 0x0060088a f+42: 9d        popfq
> > > 0x0060088b f+43: 66 66 90  xchg %ax,%ax
> > > 0x0060088e f+46: 66 90     xchg %ax,%ax
> >
> > This looks like deliberate patching by the PV-alternatives mechanism.
>
> Is this PV-alternatives a linux or qemu feature or are they both
> cooperating? I tried to look around but couldn't find the code yet.

It's a kernel feature to be more efficient when running in a recognised virtual machine implementation (PV = paravirtualisation).

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Package: linux-2.6
Version: 2.6.32-30
Severity: normal

Sometimes when I use shutdown -r now under qemu I get a general protection fault:

6[ 103.542142] e1000 :00:03.0: PCI INT A disabled
0[ 103.543710] Restarting system.
4[ 103.543772] machine restart
0[ 103.544118] general protection fault: fff2 [#1] SMP
0[ 103.544118] last sysfs file: /sys/devices/pci:00/:00:01.1/host0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/manage_start_stop
4[ 103.544118] CPU 0
4[ 103.544118] Modules linked in: parport_pc psmouse mtdchar parport i2c_piix4 processor button pcspkr evdev serio_raw i2c_core ext2 mbcache softdog jffs2 zlib_deflate lzo_decompress lzo_compress mtdblock mtd_blkdevs mtdram mtd sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix thermal libata floppy thermal_sys scsi_mod e1000 [last unloaded: scsi_wait_scan]
6[ 103.544118] Pid: 1020, comm: reboot Not tainted 2.6.32-5-amd64 #1 Bochs
6[ 103.544118] RIP: 0010:[810239db] [810239db] native_stop_other_cpus+0x86/0x90
6[ 103.544118] RSP: 0018:88001f2b3e08 EFLAGS: 0246
6[ 103.544118] RAX: 0001 RBX: 0246 RCX: 0001
6[ 103.544118] RDX: 0101010101010101 RSI: 00ff RDI: 0246
6[ 103.544118] RBP: 0001 R08: R09: 0008
6[ 103.544118] R10: R11: 81027dfc R12: 814d6740
6[ 103.544118] R13: R14: R15: 0001
6[ 103.544118] FS: 7ff3e26fe700() GS:88000180() knlGS:
6[ 103.544118] CS: 0010 DS: ES: CR0: 8005003b
6[ 103.544118] CR2: 7ff3e2701000 CR3: 1ddda000 CR4: 06f0
6[ 103.544118] DR0: DR1: DR2:
6[ 103.544118] DR3: DR6: DR7:
4[ 103.544118] Process reboot (pid: 1020, threadinfo 88001f2b2000, task 88001e89e350)
0[ 103.544118] Stack:
4[ 103.544118] 01234567 28121969 fee1dead 8102388b
4[ 103.544118] 0 810235f4 01234567 8105f961
4[ 103.544118] 0 02359000 7ff3e2186000
0[ 103.544118] Call Trace:
4[ 103.544118] [8102388b] ? native_machine_shutdown+0x56/0x6f
4[ 103.544118] [810235f4] ? native_machine_restart+0x21/0x37
4[ 103.544118] [8105f961] ? sys_reboot+0x146/0x190
4[ 103.544118] [810cea3e] ? free_pgtables+0x9c/0xbe
4[ 103.544118] [810fea38] ? dput+0x2c/0x15e
4[ 103.544118] [81103295] ? mntput_no_expire+0x23/0xee
4[ 103.544118] [810ecd86] ? filp_close+0x5b/0x62
4[ 103.544118] [81010b42] ? system_call_fastpath+0x16/0x1b
0[ 103.544118] Code: 76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 e8 1e 0b 00 00 48 89 df 57 9d 66 66 90 66 90 5b 5d 41 5c c3 48 83 ec 08 48 8b 05 b0 27 4b 00
1[ 103.544118] RIP [810239db] native_stop_other_cpus+0x86/0x90
4[ 103.544118] RSP 88001f2b3e08
4[ 103.544118] ---[ end trace c7434bd0d5312ada ]---

More info:

1) I disassembled the Code: part with gdb:

Dump of assembler code for function f:
0x00600860 f+0:   76 0e                 jbe    0x600870 f+16
0x00600862 f+2:   85 ed                 test   %ebp,%ebp
0x00600864 f+4:   75 e0                 jne    0x600846 data_start+6
0x00600866 f+6:   48 85 db              test   %rbx,%rbx
0x00600869 f+9:   74 05                 je     0x600870 f+16
0x0060086b f+11:  48 ff cb              dec    %rbx
0x0060086e f+14:  eb d6                 jmp    0x600846 data_start+6
0x00600870 f+16:  9c                    pushfq
0x00600871 f+17:  58                    pop    %rax
0x00600872 f+18:  66 66 90              xchg   %ax,%ax
0x00600875 f+21:  66 90                 xchg   %ax,%ax
0x00600877 f+23:  48 89 c3              mov    %rax,%rbx
0x0060087a f+26:  fa                    cli
0x0060087b f+27:  66 66 90              xchg   %ax,%ax
0x0060087e f+30:  66 66 90              xchg   %ax,%ax
0x00600881 f+33:  e8 1e 0b 00 00        callq  0x6013a4
0x00600886 f+38:  48 89 df              mov    %rbx,%rdi
0x00600889 f+41:  57                    push   %rdi
0x0060088a f+42:  9d                    popfq
0x0060088b f+43:  66 66 90              xchg   %ax,%ax
0x0060088e f+46:  66 90                 xchg   %ax,%ax
0x00600890 f+48:  5b                    pop    %rbx
0x00600891 f+49:  5d                    pop    %rbp
0x00600892 f+50:  41 5c                 pop    %r12
0x00600894 f+52:  c3                    retq
0x00600895 f+53:  48 83 ec 08           sub    $0x8,%rsp
0x00600899 f+57:  48 8b 05 b0 27 4b 00  mov    0x4b27b0(%rip),%rax  # 0xab3050
0x006008a0 f+64:  00 00                 add    %al,(%rax)

2) I then run objdump -axdt /usr/lib/debug/boot/vmlinux-2.6.32-5-amd64 to see the
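Step 1 above starts from the raw bytes on the Code: line. A minimal sketch of pulling those bytes out of an oops line, so they can be written to a file and fed to a disassembler, might look like the following. The function name parse_code_bytes() is hypothetical. The kernel normally wraps the byte at RIP in <..>; that marker does not appear in the copy of the log above, so the sketch treats it as optional and reports -1 when it is absent.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the hex bytes of an oops "Code:" line into out[] (at most cap bytes).
 * Returns the number of bytes stored. If a byte is wrapped in <..>,
 * *marked receives its index; otherwise *marked is set to -1.
 */
size_t parse_code_bytes(const char *line, unsigned char *out, size_t cap,
                        long *marked)
{
    const char *s;
    size_t n = 0;

    assert(line != NULL && out != NULL && marked != NULL);
    s = strstr(line, "Code:");
    s = s ? s + 5 : line;          /* skip the "Code:" label if present */
    *marked = -1;

    while (*s != '\0' && n < cap) {
        if (*s == '<') {
            *marked = (long)n;     /* next byte parsed is the one at RIP */
            s++;
        } else if (isxdigit((unsigned char)s[0]) &&
                   isxdigit((unsigned char)s[1])) {
            char hex[3] = { s[0], s[1], '\0' };
            out[n++] = (unsigned char)strtoul(hex, NULL, 16);
            s += 2;
        } else {
            s++;                   /* spaces, '>' and other separators */
        }
    }
    return n;
}
```

In the Code: line of this report, 43 bytes precede the 66 that follows 57 9d, which agrees with the f+43 offset of the first xchg in the gdb listing.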
On Wed, 2011-02-02 at 19:42 +0200, Timo Juhani Lindfors wrote:
> Package: linux-2.6
> Version: 2.6.32-30
> Severity: normal
>
> Sometimes when I use shutdown -r now under qemu I get a general
> protection fault:

Which version of qemu are you using in the host? If you are using kvm-qemu, which kernel version are you using in the host?

[...]
> 4) Observation: RIP == 0x810239db is in the middle of the
>
>     810239d9: ff 14 25 f8 69 46 81  callq *0x814669f8
>
> instruction! If you compare the on-disk data to the Code: dump you see
> that two calls have been replaced with the mysterious fragment
>
>     0x00600889 f+41: 57        push %rdi
>     0x0060088a f+42: 9d        popfq
>     0x0060088b f+43: 66 66 90  xchg %ax,%ax
>     0x0060088e f+46: 66 90     xchg %ax,%ax
>
> Is this memory corruption? Or is linux trying to patch the calls?
[...]

This looks like deliberate patching by the PV-alternatives mechanism.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
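The observation that RIP lies "in the middle of" the call only holds for the on-disk image. The arithmetic can be checked with a small sketch: given the start address of a sequence and the length of each instruction, find which instruction (if any) begins exactly at RIP. The helper insn_index_at() is a hypothetical name; the addresses and instruction lengths are the ones quoted in this thread, used as they appear here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the instruction that starts exactly at rip, given
 * the start address of a sequence and the length of each instruction.
 * Returns -1 if rip falls inside an instruction or outside the sequence.
 */
int insn_index_at(unsigned long start, const size_t *lens, size_t n,
                  unsigned long rip)
{
    unsigned long addr = start;

    assert(lens != NULL);
    for (size_t i = 0; i < n; i++) {
        if (addr == rip)
            return (int)i;
        addr += lens[i];
    }
    return -1;
}

/* On disk: a single 7-byte callq at 0x810239d9. */
const size_t on_disk[] = { 7 };
/* In memory: push %rdi, popfq, 3-byte nop, 2-byte nop (also 7 bytes). */
const size_t patched[] = { 1, 1, 3, 2 };
```

With start 0x810239d9 and RIP 0x810239db, the on-disk layout gives -1 (mid-instruction), while the patched layout gives index 2: the first xchg %ax,%ax, that is, the instruction immediately after popfq. This agrees with the f+43 offset in the gdb listing and with the replacement being a deliberate same-length patch rather than memory corruption.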
Ben Hutchings b...@decadent.org.uk writes:
> Which version of qemu are you using in the host? If you are using
> kvm-qemu, which kernel version are you using in the host?

The host is a xen domU:

lindi1:~$ qemu-system-x86_64 --version
QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
lindi1:~$ dpkg-query -W qemu
qemu    0.12.5+dfsg-3
lindi1:~$ dmesg|head -n3
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011

> > 0x00600889 f+41: 57        push %rdi
> > 0x0060088a f+42: 9d        popfq
> > 0x0060088b f+43: 66 66 90  xchg %ax,%ax
> > 0x0060088e f+46: 66 90     xchg %ax,%ax
>
> This looks like deliberate patching by the PV-alternatives mechanism.

Is this PV-alternatives a linux or qemu feature or are they both cooperating? I tried to look around but couldn't find the code yet.