Bug#611832: linux-image-2.6.32-5-amd64: general protection fault at reboot under qemu: native_stop_other_cpus+0x86/0x90

2011-03-06 Thread Ben Hutchings
[Excuse the duplicate; this is properly cc'd to bugs.debian.org.]

On Tue, 2011-02-15 at 18:52 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings b...@decadent.org.uk writes:
> > It's a kernel feature to be more efficient when running in a recognised
> > virtual machine implementation (PV = paravirtualisation).
>
> thanks. I think it is the following code from vmi_32.c:
[...]
> I don't understand how the first xchg instruction in
>
> 0x00600889 <f+41>:   57                     push   %rdi
> 0x0060088a <f+42>:   9d                     popfq
> 0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
> 0x0060088e <f+46>:   66 90                  xchg   %ax,%ax
>
> can generate a general protection fault. I googled around and found
>
>    yes - it smells like it tries to deliver vector 0, after the panic
>    code has deinitialized the lapic / ioapic
>
> which suggests a qemu bug from
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html
>
> Shall I reassign the bug or do you know how to investigate this more?

Sorry, I don't have a good idea how to investigate this further.  The
message you're referring to is quite old and I would expect the bug to
have been fixed in qemu since then.  Is the KVM host using an old
version?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.





2011-02-15 Thread Timo Juhani Lindfors
Ben Hutchings b...@decadent.org.uk writes:
> It's a kernel feature to be more efficient when running in a recognised
> virtual machine implementation (PV = paravirtualisation).

thanks. I think it is the following code from vmi_32.c:

/*
 * Apply patch if appropriate, return length of new instruction
 * sequence.  The callee does nop padding for us.
 */
static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
			  unsigned long ip, unsigned len)
{
	switch (type) {
	case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
		return patch_internal(VMI_CALL_DisableInterrupts, len,
				      insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.irq_enable):
		return patch_internal(VMI_CALL_EnableInterrupts, len,
				      insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
		return patch_internal(VMI_CALL_SetInterruptMask, len,
				      insns, ip);
	case PARAVIRT_PATCH(pv_irq_ops.save_fl):
		return patch_internal(VMI_CALL_GetInterruptMask, len,
				      insns, ip);
	case PARAVIRT_PATCH(pv_cpu_ops.iret):
		return patch_internal(VMI_CALL_IRET, len, insns, ip);
	case PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit):
		return patch_internal(VMI_CALL_SYSEXIT, len, insns, ip);
	default:
		break;
	}
	return len;
}
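
A minimal Python sketch of the contract stated in the comment above: the patch routine returns how many bytes of the site hold the new instruction sequence, and the remainder of the site is NOP-padded outside that routine. The names patch_site, pad_with_nops and apply_patch are hypothetical, the padding uses single-byte 0x90 NOPs for simplicity (the dumps in this thread show multi-byte forms such as 66 66 90), and the example bytes are the 7-byte call and the push/popfq pair quoted elsewhere in this thread. Whether vmi_32.c is the code that actually patched this amd64 guest is not established here.

```python
NOP = 0x90  # single-byte x86 NOP (simplification; real padding may use longer forms)


def patch_site(site: bytearray, replacement: bytes) -> int:
    """Hypothetical patch routine: copy `replacement` into `site` if it fits.

    Returns the number of bytes of the site occupied by the new sequence.
    Returning len(site) without writing means "leave the original
    instructions alone", like the default case in vmi_patch().
    """
    if len(replacement) > len(site):
        return len(site)
    site[: len(replacement)] = replacement
    return len(replacement)


def pad_with_nops(site: bytearray, used: int) -> None:
    """Hypothetical padding step: fill the unused tail of the site with NOPs."""
    for i in range(used, len(site)):
        site[i] = NOP


def apply_patch(original: bytes, replacement: bytes) -> bytes:
    """Patch a copy of the call site and pad whatever the patch left over."""
    site = bytearray(original)
    used = patch_site(site, replacement)
    pad_with_nops(site, used)
    return bytes(site)


if __name__ == "__main__":
    call = bytes.fromhex("ff1425f8694681")  # 7-byte indirect call from the objdump output
    native = bytes.fromhex("579d")          # push %rdi; popfq
    print(apply_patch(call, native).hex(" "))
```

The site keeps its original length, so code after it does not move; only the bytes inside the site change.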

I don't understand how the first xchg instruction in

0x00600889 <f+41>:   57                     push   %rdi
0x0060088a <f+42>:   9d                     popfq
0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
0x0060088e <f+46>:   66 90                  xchg   %ax,%ax

can generate a general protection fault. I googled around and found

   yes - it smells like it tries to deliver vector 0, after the panic
   code has deinitialized the lapic / ioapic

which suggests a qemu bug from
http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html
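
One way to see why the instruction right after popfq is a plausible place for an interrupt-related fault: push %rdi; popfq loads RDI into RFLAGS, and the oops in the original report shows RDI as 0246 (as rendered in this archive). A small Python sketch decoding that value with the architectural x86 flag bit positions; the helper name decode_eflags is made up for this illustration.

```python
# Architectural x86 EFLAGS bit positions (status and control flags only).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_eflags(value):
    """Return the names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    rdi = 0x246  # RDI from the oops; popfq copies it into RFLAGS
    print(decode_eflags(rdi))
```

IF is among the flags set in 0x246, so the popfq re-enables interrupts and a pending interrupt can be taken at the next instruction boundary. That is consistent with the interrupt-delivery explanation quoted above, but does not prove it.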

Shall I reassign the bug or do you know how to investigate this more?







2011-02-14 Thread Ben Hutchings
On Thu, 2011-02-03 at 00:50 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings b...@decadent.org.uk writes:
> > Which version of qemu are you using in the host?  If you are using
> > kvm-qemu, which kernel version are you using in the host?
>
> The host is a xen domU:

So this is ordinary qemu, not using hardware virtualisation?

> lindi1:~$ qemu-system-x86_64 --version
> QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
> lindi1:~$ dpkg-query -W qemu
> qemu	0.12.5+dfsg-3
> lindi1:~$ dmesg|head -n3
> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgroup subsys cpu
> [0.00] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011
>
> > 0x00600889 <f+41>:   57                     push   %rdi
> > 0x0060088a <f+42>:   9d                     popfq
> > 0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
> > 0x0060088e <f+46>:   66 90                  xchg   %ax,%ax
>
> > This looks like deliberate patching by the PV-alternatives mechanism.
>
> Is this PV-alternatives a linux or qemu feature or are they both
> cooperating?
>
> I tried to look around but couldn't find the code yet.

It's a kernel feature to be more efficient when running in a recognised
virtual machine implementation (PV = paravirtualisation).

Ben.



2011-02-02 Thread Timo Juhani Lindfors
Package: linux-2.6
Version: 2.6.32-30
Severity: normal

Sometimes when I use

shutdown -r now

under qemu I get a general protection fault:

<6>[  103.542142] e1000 :00:03.0: PCI INT A disabled
<0>[  103.543710] Restarting system.
<4>[  103.543772] machine restart
<0>[  103.544118] general protection fault: fff2 [#1] SMP
<0>[  103.544118] last sysfs file: /sys/devices/pci:00/:00:01.1/host0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/manage_start_stop
<4>[  103.544118] CPU 0
<4>[  103.544118] Modules linked in: parport_pc psmouse mtdchar parport i2c_piix4 processor button pcspkr evdev serio_raw i2c_core ext2 mbcache softdog jffs2 zlib_deflate lzo_decompress lzo_compress mtdblock mtd_blkdevs mtdram mtd sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix thermal libata floppy thermal_sys scsi_mod e1000 [last unloaded: scsi_wait_scan]
<6>[  103.544118] Pid: 1020, comm: reboot Not tainted 2.6.32-5-amd64 #1 Bochs
<6>[  103.544118] RIP: 0010:[810239db]  [810239db] native_stop_other_cpus+0x86/0x90
<6>[  103.544118] RSP: 0018:88001f2b3e08  EFLAGS: 0246
<6>[  103.544118] RAX: 0001 RBX: 0246 RCX: 0001
<6>[  103.544118] RDX: 0101010101010101 RSI: 00ff RDI: 0246
<6>[  103.544118] RBP: 0001 R08:  R09: 0008
<6>[  103.544118] R10:  R11: 81027dfc R12: 814d6740
<6>[  103.544118] R13:  R14:  R15: 0001
<6>[  103.544118] FS:  7ff3e26fe700() GS:88000180() knlGS:
<6>[  103.544118] CS:  0010 DS:  ES:  CR0: 8005003b
<6>[  103.544118] CR2: 7ff3e2701000 CR3: 1ddda000 CR4: 06f0
<6>[  103.544118] DR0:  DR1:  DR2:
<6>[  103.544118] DR3:  DR6:  DR7:
<4>[  103.544118] Process reboot (pid: 1020, threadinfo 88001f2b2000, task 88001e89e350)
<0>[  103.544118] Stack:
<4>[  103.544118]  01234567 28121969 fee1dead 8102388b
<4>[  103.544118] <0>  810235f4 01234567 8105f961
<4>[  103.544118] <0>   02359000 7ff3e2186000
<0>[  103.544118] Call Trace:
<4>[  103.544118]  [8102388b] ? native_machine_shutdown+0x56/0x6f
<4>[  103.544118]  [810235f4] ? native_machine_restart+0x21/0x37
<4>[  103.544118]  [8105f961] ? sys_reboot+0x146/0x190
<4>[  103.544118]  [810cea3e] ? free_pgtables+0x9c/0xbe
<4>[  103.544118]  [810fea38] ? dput+0x2c/0x15e
<4>[  103.544118]  [81103295] ? mntput_no_expire+0x23/0xee
<4>[  103.544118]  [810ecd86] ? filp_close+0x5b/0x62
<4>[  103.544118]  [81010b42] ? system_call_fastpath+0x16/0x1b
<0>[  103.544118] Code: 76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 e8 1e 0b 00 00 48 89 df 57 9d 66 66 90 66 90 5b 5d 41 5c c3 48 83 ec 08 48 8b 05 b0 27 4b 00
<1>[  103.544118] RIP  [810239db] native_stop_other_cpus+0x86/0x90
<4>[  103.544118]  RSP 88001f2b3e08
<4>[  103.544118] ---[ end trace c7434bd0d5312ada ]---

More info:
1) I disassembled the Code: part with gdb:

Dump of assembler code for function f:
0x00600860 <f+0>:    76 0e                  jbe    0x600870 <f+16>
0x00600862 <f+2>:    85 ed                  test   %ebp,%ebp
0x00600864 <f+4>:    75 e0                  jne    0x600846 <data_start+6>
0x00600866 <f+6>:    48 85 db               test   %rbx,%rbx
0x00600869 <f+9>:    74 05                  je     0x600870 <f+16>
0x0060086b <f+11>:   48 ff cb               dec    %rbx
0x0060086e <f+14>:   eb d6                  jmp    0x600846 <data_start+6>
0x00600870 <f+16>:   9c                     pushfq
0x00600871 <f+17>:   58                     pop    %rax
0x00600872 <f+18>:   66 66 90               xchg   %ax,%ax
0x00600875 <f+21>:   66 90                  xchg   %ax,%ax
0x00600877 <f+23>:   48 89 c3               mov    %rax,%rbx
0x0060087a <f+26>:   fa                     cli
0x0060087b <f+27>:   66 66 90               xchg   %ax,%ax
0x0060087e <f+30>:   66 66 90               xchg   %ax,%ax
0x00600881 <f+33>:   e8 1e 0b 00 00         callq  0x6013a4
0x00600886 <f+38>:   48 89 df               mov    %rbx,%rdi

0x00600889 <f+41>:   57                     push   %rdi
0x0060088a <f+42>:   9d                     popfq
0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
0x0060088e <f+46>:   66 90                  xchg   %ax,%ax

0x00600890 <f+48>:   5b                     pop    %rbx
0x00600891 <f+49>:   5d                     pop    %rbp
0x00600892 <f+50>:   41 5c                  pop    %r12
0x00600894 <f+52>:   c3                     retq
0x00600895 <f+53>:   48 83 ec 08            sub    $0x8,%rsp
0x00600899 <f+57>:   48 8b 05 b0 27 4b 00   mov    0x4b27b0(%rip),%rax        # 0xab3050
0x006008a0 <f+64>:   00 00                  add    %al,(%rax)

2) I then ran objdump -axdt
/usr/lib/debug/boot/vmlinux-2.6.32-5-amd64 to see the [...]
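
A short Python sketch that locates the faulting instruction inside the Code: line of the oops above and cross-checks it against the gdb listing in item 1. It assumes the usual x86 oops layout for this kernel generation, in which 64 code bytes are dumped and 43 of them precede RIP. That layout is an assumption here, since the archive has stripped the angle brackets that normally mark the RIP byte. The helper names are hypothetical.

```python
CODE = (
    "76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6 9c 58 "
    "66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 e8 1e 0b 00 00 "
    "48 89 df 57 9d 66 66 90 66 90 5b 5d 41 5c c3 48 83 ec 08 "
    "48 8b 05 b0 27 4b 00"
)
PROLOGUE = 43      # assumed number of dumped bytes that precede RIP
RIP = 0x810239DB   # as shown in this archive (upper address bits lost)

code = bytes.fromhex(CODE)


def before_rip(n):
    """Last n dumped bytes before the assumed RIP position."""
    return code[PROLOGUE - n:PROLOGUE]


def at_rip(n):
    """First n dumped bytes starting at the assumed RIP position."""
    return code[PROLOGUE:PROLOGUE + n]


def dump_start(rip):
    """Address of the first dumped byte, i.e. what gdb shows as f+0."""
    return rip - PROLOGUE


if __name__ == "__main__":
    print("dump length :", len(code))
    print("before RIP  :", before_rip(2).hex(" "))
    print("at RIP      :", at_rip(3).hex(" "))
    print("dump starts :", hex(dump_start(RIP)))
```

Under that assumption the byte at RIP corresponds to f+43 in the gdb listing, the first xchg after push %rdi; popfq.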

2011-02-02 Thread Ben Hutchings
On Wed, 2011-02-02 at 19:42 +0200, Timo Juhani Lindfors wrote:
> Package: linux-2.6
> Version: 2.6.32-30
> Severity: normal
>
> Sometimes when I use
>
> shutdown -r now
>
> under qemu I get a general protection fault:

Which version of qemu are you using in the host?  If you are using
kvm-qemu, which kernel version are you using in the host?

[...]
> 4) Observation: RIP == 0x810239db is in the middle of the
>
> 810239d9:	ff 14 25 f8 69 46 81	callq  *0x814669f8
>
> instruction! If you compare the on-disk data to the Code: dump you
> see that two calls have been replaced with the mysterious fragment
>
> 0x00600889 <f+41>:   57                     push   %rdi
> 0x0060088a <f+42>:   9d                     popfq
> 0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
> 0x0060088e <f+46>:   66 90                  xchg   %ax,%ax
>
>
> Is this memory corruption? Or is linux trying to patch the calls?
[...]

This looks like deliberate patching by the PV-alternatives mechanism.
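
The observation quoted above, that RIP falls in the middle of the on-disk callq, is what one would expect if the 7-byte call site was overwritten in memory by shorter instructions plus NOP padding. A Python sketch of that arithmetic, using only the addresses and bytes quoted in this thread (addresses as rendered by the archive). The instruction lengths in PATCHED are read off the gdb listing, and the helper name instruction_at is hypothetical.

```python
CALL_ADDR = 0x810239D9  # address of the on-disk callq
CALL_LEN = 7            # ff 14 25 f8 69 46 81
RIP = 0x810239DB        # faulting address from the oops

# In-memory contents of the same 7 bytes, per the Code: dump.
PATCHED = [
    ("push %rdi", bytes.fromhex("57")),
    ("popfq", bytes.fromhex("9d")),
    ("xchg %ax,%ax", bytes.fromhex("666690")),
    ("xchg %ax,%ax", bytes.fromhex("6690")),
]


def instruction_at(addr):
    """Return (mnemonic, offset) if addr starts an instruction in the
    patched site, or None if it falls inside one or outside the site."""
    offset = addr - CALL_ADDR
    pos = 0
    for name, raw in PATCHED:
        if pos == offset:
            return name, offset
        pos += len(raw)
    return None


if __name__ == "__main__":
    assert sum(len(raw) for _, raw in PATCHED) == CALL_LEN
    print(instruction_at(RIP))
```

RIP is two bytes into the site: mid-instruction for the on-disk call, but exactly at the first padding instruction after popfq in the patched bytes.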

Ben.



2011-02-02 Thread Timo Juhani Lindfors
Ben Hutchings b...@decadent.org.uk writes:
> Which version of qemu are you using in the host?  If you are using
> kvm-qemu, which kernel version are you using in the host?

The host is a xen domU:

lindi1:~$ qemu-system-x86_64 --version
QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
lindi1:~$ dpkg-query -W qemu
qemu	0.12.5+dfsg-3
lindi1:~$ dmesg|head -n3
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011

> 0x00600889 <f+41>:   57                     push   %rdi
> 0x0060088a <f+42>:   9d                     popfq
> 0x0060088b <f+43>:   66 66 90               xchg   %ax,%ax
> 0x0060088e <f+46>:   66 90                  xchg   %ax,%ax
>
> This looks like deliberate patching by the PV-alternatives mechanism.

Is this PV-alternatives a linux or qemu feature or are they both
cooperating?

I tried to look around but couldn't find the code yet.




