Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 12:00, Chris Webb wrote:
 I have a qemu-kvm guest (apparently an Ubuntu 11.04 x86-64 install) which has
 stopped and refuses to continue:
 
   (qemu) info status
   VM status: paused
   (qemu) cont
   (qemu) info status
   VM status: paused
 
 The host is running Linux 2.6.39.2 with qemu-kvm 0.14.1 on a 24-core Opteron
 6176 box, and has nine other 2GB production guests on it running absolutely
 fine.
 
 It's been a while since I've seen one of these. When I last saw a cluster of
 them, they were emulation failures (big real mode instructions, maybe?). I
 also remember a message about abnormal exit in the dmesg previously, but I
 don't have that here. This time, there is no host kernel output at all, just
 the paused guest.
 
 I have qemu monitor access and can even strace the relevant qemu process if
 necessary: is it possible to use this to diagnose what's caused this guest
 to stop, e.g. the unsupported instruction if it's an emulation failure?

Another common cause of stopped VMs is an I/O error, for example a write
to a sparse image when the disk is full.
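
For a file-backed raw image, one rough way to check for this case from the host is to compare the image's apparent size with the blocks actually allocated and with the free space left on its filesystem. This is only a sketch under those assumptions: the function names are hypothetical, and it does not apply to images stored on block devices or to formats such as qcow2, whose virtual size is not the file size.

```python
import os


def sparse_headroom(image_path):
    """Return (apparent_bytes, allocated_bytes, free_bytes) for an image file."""
    st = os.stat(image_path)
    apparent = st.st_size
    # st_blocks is counted in 512-byte units, independent of the fs block size.
    allocated = st.st_blocks * 512
    vfs = os.statvfs(os.path.dirname(os.path.abspath(image_path)))
    # f_bavail: blocks available to unprivileged users (root may have more).
    free = vfs.f_bavail * vfs.f_frsize
    return apparent, allocated, free


def may_run_out_of_space(image_path):
    """True if the not-yet-allocated part of the image exceeds the free space."""
    apparent, allocated, free = sparse_headroom(image_path)
    unwritten = max(apparent - allocated, 0)
    return unwritten > free
```

If `may_run_out_of_space()` returns True, guest writes into the holes of the image could eventually fail on the host even though the guest believes its disk has room.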

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 On 24.10.2011 12:00, Chris Webb wrote:
  I have qemu monitor access and can even strace the relevant qemu process if
  necessary: is it possible to use this to diagnose what's caused this guest
  to stop, e.g. the unsupported instruction if it's an emulation failure?
 
 Another common cause of stopped VMs is an I/O error, for example a write
 to a sparse image when the disk is full.

This guest is backed by LVM LVs, so I don't think they can return ENOSPC, but I
could imagine read errors, so I've just done a trivial test to make sure I can
read them end-to-end:

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
  3136+0 records in
  3136+0 records out
  3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
  276+0 records in
  276+0 records out
  289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s

Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
problems from emulation problems from anything else?

Cheers,

Chris.


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 12:58, Chris Webb wrote:
 Kevin Wolf kw...@redhat.com writes:
 
 On 24.10.2011 12:00, Chris Webb wrote:
 I have qemu monitor access and can even strace the relevant qemu process if
 necessary: is it possible to use this to diagnose what's caused this guest
 to stop, e.g. the unsupported instruction if it's an emulation failure?

 Another common cause of stopped VMs is an I/O error, for example a write
 to a sparse image when the disk is full.
 
 This guest is backed by LVM LVs, so I don't think they can return ENOSPC, but I
 could imagine read errors, so I've just done a trivial test to make sure I can
 read them end-to-end:
 
   0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
   3136+0 records in
   3136+0 records out
   3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s
 
   0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
   276+0 records in
   276+0 records out
   289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s
 
 Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
 problems from emulation problems from anything else?

In qemu 1.0 we'll have an extended 'info status' that includes the stop
reason, but 0.14 doesn't have this yet (was committed to git master only
recently).

If you attach a QMP monitor (see QMP/README, don't forget to send the
capabilities command, it's part of creating the connection) you will
receive messages for I/O errors, though.
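
The handshake described above can be sketched as follows. It assumes the guest was started with a QMP monitor on a UNIX socket (for example `-qmp unix:/tmp/qmp.sock,server,nowait`; the socket path is hypothetical), that QMP messages arrive as one JSON object per line, and that I/O errors are reported as `BLOCK_IO_ERROR` events. The function names are made up for illustration.

```python
import json
import socket


def qmp_negotiate(rfile, wfile):
    """Read the QMP greeting, then send qmp_capabilities to enter command mode."""
    line = rfile.readline()
    if not line:
        raise RuntimeError("connection closed before QMP greeting")
    greeting = json.loads(line)
    if "QMP" not in greeting:
        raise RuntimeError("unexpected greeting: %r" % (greeting,))
    wfile.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    wfile.flush()
    while True:
        line = rfile.readline()
        if not line:
            raise RuntimeError("connection closed during negotiation")
        reply = json.loads(line)
        if "return" in reply:
            return greeting
        if "error" in reply:
            raise RuntimeError("qmp_capabilities failed: %r" % (reply,))


def describe_io_error(msg):
    """Return a one-line summary for a BLOCK_IO_ERROR event, else None."""
    if msg.get("event") != "BLOCK_IO_ERROR":
        return None
    data = msg.get("data", {})
    return "%s: %s error, action=%s" % (
        data.get("device"), data.get("operation"), data.get("action"))


def watch(socket_path):
    """Connect to the QMP socket and print I/O error events as they arrive."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(socket_path)
    rfile, wfile = sock.makefile("r"), sock.makefile("w")
    qmp_negotiate(rfile, wfile)
    for line in rfile:
        if line.strip():
            summary = describe_io_error(json.loads(line))
            if summary:
                print(summary)
```

Calling `watch("/tmp/qmp.sock")` blocks and prints one line per reported I/O error; events emitted before the monitor was attached are not replayed.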

Kevin


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 In qemu 1.0 we'll have an extended 'info status' that includes the stop
 reason, but 0.14 doesn't have this yet (was committed to git master only
 recently).

Right, okay. I might take a look at cherry-picking and back-porting that to
our version of qemu-kvm if it's not too entangled with other changes. It
would be very useful in these situations.

 If you attach a QMP monitor (see QMP/README, don't forget to send the
 capabilities command, it's part of creating the connection) you will
 receive messages for I/O errors, though.

Thanks. I don't think I can do this with an already-running qemu-kvm that's
in a stopped state, can I? Presumably I need to start a new qemu-kvm
invocation and wait to catch the problem again.

Cheers,

Chris.


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 13:29, Chris Webb wrote:
 Kevin Wolf kw...@redhat.com writes:
 
 In qemu 1.0 we'll have an extended 'info status' that includes the stop
 reason, but 0.14 doesn't have this yet (was committed to git master only
 recently).
 
 Right, okay. I might take a look at cherry-picking and back-porting that to
 our version of qemu-kvm if it's not too entangled with other changes. It
 would be very useful in these situations.

I'm afraid that it depends on many other changes, but you can try.

 
 If you attach a QMP monitor (see QMP/README, don't forget to send the
 capabilities command, it's part of creating the connection) you will
 receive messages for I/O errors, though.
 
 Thanks. I don't think I can do this with an already-running qemu-kvm that's
 in a stopped state, can I? Presumably I need to start a new qemu-kvm
 invocation and wait to catch the problem again.

Good point... The only other thing that I can think of would be
attaching gdb and setting a breakpoint in vm_stop() or something.

Kevin


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 Good point... The only other thing that I can think of would be
 attaching gdb and setting a breakpoint in vm_stop() or something.

Perfect, that seems to have identified what's going on very nicely:

(gdb) break vm_stop
Breakpoint 1 at 0x407d10: file /home/root/packages/qemu-kvm/src-UMBurO/cpus.c, line 318.
(gdb) fg
Continuing.

Breakpoint 1, vm_stop (reason=0)
    at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
318     /home/root/packages/qemu-kvm/src-UMBurO/cpus.c: No such file or directory.
        in /home/root/packages/qemu-kvm/src-UMBurO/cpus.c
(gdb) bt
#0  vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
#1  0x0058585f in ide_handle_rw_error (s=0x20330d8, error=28, op=8)
    at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:468
#2  0x00588376 in ide_dma_cb (opaque=0x20330d8, ret=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:494
#3  0x00590092 in dma_bdrv_cb (opaque=0x2043a10, ret=-28)
    at /home/root/packages/qemu-kvm/src-UMBurO/dma-helpers.c:94
#4  0x0044d64a in qcow2_aio_write_cb (opaque=0x2034900, ret=-28)
    at block/qcow2.c:714
#5  0x0043df6d in posix_aio_process_queue (opaque=<value optimized out>)
    at posix-aio-compat.c:462
#6  0x0043e07d in posix_aio_read (opaque=0x17c8110)
    at posix-aio-compat.c:503
#7  0x00415fca in main_loop_wait (nonblocking=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1383
#8  0x0042ca37 in kvm_main_loop ()
    at /home/root/packages/qemu-kvm/src-UMBurO/qemu-kvm.c:1589
#9  0x004170a3 in main (argc=32, argv=<value optimized out>, envp=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1429
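
The `error=28` and `ret=-28` values in frames #1, #3 and #4 are errno codes; on Linux, 28 is ENOSPC ("no space left on device"). A small sketch for decoding such values on the host, with a hypothetical helper name:

```python
import errno
import os


def describe_errno(ret):
    """Map an errno-style return value (positive or negated) to (name, message)."""
    code = abs(ret)
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


name, message = describe_errno(-28)
print(name, "-", message)
```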

I see what's happened here: we're not explicitly setting format=raw when we
start that guest and someone's uploaded a qcow2 image directly to a block
device. Ouch. Sorry for the noise!
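
To spot other block devices in the same situation, a quick check is to look for the qcow magic bytes at the start of the device. This is a minimal sketch: it assumes read access to the device, the function name is hypothetical, and it only recognises the qcow/qcow2 header (the bytes `QFI\xfb` followed by a big-endian version number), not other image formats.

```python
import struct

QCOW_MAGIC = b"QFI\xfb"


def probe_qcow(path):
    """Return the qcow version if `path` starts with a qcow header, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return None
    # Bytes 4..7 hold the format version as a big-endian 32-bit integer.
    (version,) = struct.unpack(">I", header[4:8])
    return version
```

A device for which `probe_qcow()` returns a version number will be opened as qcow2 unless the format is given explicitly, for instance with `format=raw` on the `-drive` option.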

Best wishes,

Chris.