Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Another hint: I've never seen this using qemu 1.3.1.

Stefan

Am 13.02.2013 08:49, schrieb Stefan Priebe - Profihost AG:
 Hi Paolo,
 
 sadly no luck. A VM crashed again.
 
 [ ~]# addr2line -e /usr/lib/debug/usr/bin/kvm -f 24040c
 virtio_scsi_command_complete
 hw/virtio-scsi.c:429
 
 Same point as last time:
 static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
                                          size_t resid)
 {
     VirtIOSCSIReq *req = r->hba_private;
     uint32_t sense_len;
 
 429 =>  req->resp.cmd->response = VIRTIO_SCSI_S_OK;
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 
 Greets
 Stefan
 
 Am 12.02.2013 15:34, schrieb Paolo Bonzini:
 Il 12/02/2013 14:46, Stefan Priebe - Profihost AG ha scritto:
 Hi,

 Thanks - I applied the patch to the latest master. I hope this will
 solve my issue. Will this one get integrated into 1.4 final?



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi,

could this be this one?

commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
Author: Paolo Bonzini pbonz...@redhat.com
Date:   Thu Jan 10 15:49:08 2013 +0100

virtio-scsi: abort in-flight I/O when the device is reset

When the device is reset, the SCSI bus should also be reset so
that in-flight I/O is cancelled.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com

Greets,
Stefan
Am 13.02.2013 09:01, schrieb Stefan Priebe - Profihost AG:
 Another hint: I've never seen this using qemu 1.3.1.
 
 Stefan
 
 Am 13.02.2013 08:49, schrieb Stefan Priebe - Profihost AG:
 Hi Paolo,

 sadly no luck. A VM crashed again.

 ...



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Paolo Bonzini
Il 13/02/2013 09:19, Stefan Priebe - Profihost AG ha scritto:
 Hi,
 
 could this be this one?
 
 commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
 Author: Paolo Bonzini pbonz...@redhat.com
 Date:   Thu Jan 10 15:49:08 2013 +0100
 
 virtio-scsi: abort in-flight I/O when the device is reset
 
 When the device is reset, the SCSI bus should also be reset so
 that in-flight I/O is cancelled.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com

You can certainly try reverting it, but this patch is fixing a real bug.

Paolo

 Greets,
 Stefan
 Am 13.02.2013 09:01, schrieb Stefan Priebe - Profihost AG:
 Another hint: I've never seen this using qemu 1.3.1.

 Stefan

 ...
 
 




Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi,
Am 13.02.2013 09:57, schrieb Paolo Bonzini:
 Il 13/02/2013 09:19, Stefan Priebe - Profihost AG ha scritto:
 Hi,

 could this be this one?

 commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
...
 You can certainly try reverting it, but this patch is fixing a real bug.

Will try that. But even if this patch fixes one bug, it raises another
(the kvm segfault), which is the worse one. That should be fixed too.

Greets,
Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Paolo Bonzini
Il 13/02/2013 10:07, Stefan Priebe - Profihost AG ha scritto:
 
  commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
 ...
  You can certainly try reverting it, but this patch is fixing a real bug.
 Will try that. Yes but even if it fixes a bug and raises another one
 (kvm segfault) which is the worst one. It should be fixed.

The KVM segfault is exposing a potential consistency problem.  What is
worse is not obvious.  Also, it is happening at reset time if this is
the culprit.  Reset usually happens at places where no data loss is caused.

Can you find out what the VM was doing when it segfaulted?  (Or even,
can you place the corefile and kvm executable somewhere where I can
download it?)

I'll prepare a test program that resets the adapter while doing I/O and
try to reproduce it myself, in the meanwhile: can you grep the VM's
/var/log/messages with kernel messages regarding the storage (aborting
cmd and other things after it)?  If not, do your VMs reset themselves
often for example?  Can you reproduce it on non-rbd storage?

Paolo



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi,
Am 13.02.2013 12:36, schrieb Paolo Bonzini:
 Il 13/02/2013 10:07, Stefan Priebe - Profihost AG ha scritto:

 commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
 ...
 You can certainly try reverting it, but this patch is fixing a real bug.
 Will try that. Yes but even if it fixes a bug and raises another one
 (kvm segfault) which is the worst one. It should be fixed.
 
 The KVM segfault is exposing a potential consistency problem.  What is
 worse is not obvious.  Also, it is happening at reset time if this is
 the culprit.  Reset usually happens at places where no data loss is caused.
 
 Can you find out what the VM was doing when it segfaulted?  (Or even,
 can you place the corefile and kvm executable somewhere where I can
 download it?)

Yes, it was doing an fstrim -v /, which resulted in:

[45648.453698] end_request: I/O error, dev sda, sector 9066952

 I'll prepare a test program that resets the adapter while doing I/O and
 try to reproduce it myself, in the meanwhile: can you grep the VM's
 /var/log/messages with kernel messages regarding the storage (aborting
 cmd and other things after it)? 
Sadly not, as I don't have a core dump. The kvm processes are started
through various daemons, there seems to be no way to activate core dumps
for an already running process, and I don't know which VM will crash next.

 If not, do your VMs reset themselves
 often for example?
No

 Can you reproduce it on non-rbd storage?
I don't have another storage type. ;-(

Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Paolo Bonzini
Il 13/02/2013 13:55, Stefan Priebe - Profihost AG ha scritto:
 Hi,
 Am 13.02.2013 12:36, schrieb Paolo Bonzini:
 Il 13/02/2013 10:07, Stefan Priebe - Profihost AG ha scritto:

 commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
 ...
 You can certainly try reverting it, but this patch is fixing a real bug.
 Will try that. Yes but even if it fixes a bug and raises another one
 (kvm segfault) which is the worst one. It should be fixed.

 The KVM segfault is exposing a potential consistency problem.  What is
 worse is not obvious.  Also, it is happening at reset time if this is
 the culprit.  Reset usually happens at places where no data loss is caused.

 Can you find out what the VM was doing when it segfaulted?  (Or even,
 can you place the corefile and kvm executable somewhere where I can
 download it?)
 
 Yes it was doing an fstrim -v / which resulted in:
 
 [45648.453698] end_request: I/O error, dev sda, sector 9066952

Ok, very helpful.  One thing is to find why this failed.  This can
come later though.

First of all, please run cat /sys/block/*/device/scsi_disk/*/provisioning_mode
in a VM with a similar configuration as the one that crashed last.

Second, I attach another patch.

Third, if possible please compile QEMU with --enable-trace-backend=simple,
and run it with

  -trace events='bdrv_aio_discard
scsi_req_cancel
',file=qemu.$$.trace

This can give some clues.  The files should remain quite small,
so you can enable it on all VMs safely.
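
Since quoting a multi-line events list on the command line can be awkward
(e.g. when a management daemon builds the argv), the event names can also be
listed one per line in a file passed to events=; a minimal sketch:

```shell
# Write the trace event names to a file, one per line, instead of
# embedding newlines in the -trace option string.
cat > /tmp/events <<'EOF'
bdrv_aio_discard
scsi_req_cancel
EOF
cat /tmp/events
# qemu would then be started with:
#   -trace events=/tmp/events,file=qemu.$$.trace
```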

 Sadly not, as I don't have a core dump. The kvm processes are started
 through various daemons, there seems to be no way to activate core dumps
 for an already running process, and I don't know which VM will crash next.

Probably the next that invokes fstrim. :)

 If not, do your VMs reset themselves
 often for example?
 No

Ok, good to know.

 Can you reproduce it on non-rbd storage?
 I don't have another storage type. ;-(

No problem.

Paolo
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index a97f1cd..01e1dec 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1508,6 +1508,10 @@ void scsi_req_unref(SCSIRequest *req)
    will start the next chunk or complete the command.  */
 void scsi_req_continue(SCSIRequest *req)
 {
+    if (req->io_canceled) {
+        trace_scsi_req_continue_canceled(req->dev->id, req->lun, req->tag);
+        return;
+    }
     trace_scsi_req_continue(req->dev->id, req->lun, req->tag);
     if (req->cmd.mode == SCSI_XFER_TO_DEV) {
         req->ops->write_data(req);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index d411586..4a0673c 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -178,6 +178,9 @@ static void scsi_aio_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -223,6 +226,10 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    if (r->req.io_canceled) {
+        goto done;
+    }
+
     if (scsi_is_cmd_fua(&r->req.cmd)) {
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, 0, BDRV_ACCT_FLUSH);
         r->req.aiocb = bdrv_aio_flush(s->qdev.conf.bs, scsi_aio_complete, r);
@@ -230,6 +237,8 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
     }
 
     scsi_req_complete(&r->req, GOOD);
+
+done:
     if (!r->req.io_canceled) {
         scsi_req_unref(&r->req);
     }
@@ -243,6 +252,9 @@ static void scsi_dma_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -274,6 +286,9 @@ static void scsi_read_complete(void * opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -305,6 +320,9 @@ static void scsi_do_read(void *opaque, int ret)
         r->req.aiocb = NULL;
         bdrv_acct_done(s->qdev.conf.bs, &r->acct);
     }
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -312,10 +330,6 @@ static void scsi_do_read(void *opaque, int ret)
         }
     }
 
-    if (r->req.io_canceled) {
-        return;
-    }
-
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
 
@@ -423,6 +437,9 @@ static void scsi_write_complete(void * opaque, int ret)
         r->req.aiocb = NULL;
         bdrv_acct_done(s->qdev.conf.bs, &r->acct);
     }
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -1478,13 +1495,17 @@ static void scsi_unmap_complete(void *opaque, int ret)
     uint32_t nb_sectors;
 
     r->req.aiocb = NULL;
+    if 

Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi Paolo,

thanks for your work. Should I still apply your old patch to scsi-disk,
or should I remove it?

Stefan
Am 13.02.2013 14:39, schrieb Paolo Bonzini:
 ...
 



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Output of cat:
[: ~]# cat /sys/block/*/device/scsi_disk/*/provisioning_mode
writesame_16

Stefan

Am 13.02.2013 14:39, schrieb Paolo Bonzini:
 ...
 



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi,

I added this:
-trace events=/tmp/events,file=/root/qemu.123.trace

and put the events in the events file, as I couldn't handle \n in my app
that starts the kvm process. But even when doing an fstrim, the trace file
stays at 24 bytes - is this correct?

Stefan
Am 13.02.2013 14:39, schrieb Paolo Bonzini:
 ...
 



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Paolo Bonzini
Il 13/02/2013 15:30, Stefan Priebe - Profihost AG ha scritto:
 I added this:
 -trace events=/tmp/events,file=/root/qemu.123.trace
 
 and put the events in the events file as i couldn't handle \n in my app
 starting the kvm process. But even when doing an fstrim the trace file
 stays at 24 bytes - is this correct?

Right... it would eventually flush, but not if qemu-kvm crashes.

Answering your other question, the patch subsumes the other.  But if the
provisioning mode is writesame_16, this hunk alone will most likely fix
the crash:

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index d411586..4a0673c 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -178,6 +178,9 @@ static void scsi_aio_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {

Paolo



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe

Hi,

Am 13.02.2013 16:24, schrieb Paolo Bonzini:

Il 13/02/2013 15:30, Stefan Priebe - Profihost AG ha scritto:

I added this:
-trace events=/tmp/events,file=/root/qemu.123.trace

and put the events in the events file as i couldn't handle \n in my app
starting the kvm process. But even when doing an fstrim the trace file
stays at 24 bytes - is this correct?


Right... it would eventually flush, but not if qemu-kvm crash.

Answering your other question, the patch subsumes the other.  But if the
provisioning mode is writesame_16, this hunk alone will most likely fix
the crash:


I've now applied your big patch and removed everything you sent me in the 
past. Let's see what happens tomorrow morning (GMT+1).


Thanks!

Greets,
Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-13 Thread Stefan Priebe - Profihost AG
Hi,

no VM crashed this morning.

Stefan

Am 13.02.2013 16:24, schrieb Paolo Bonzini:
 ...
 



Re: [Qemu-devel] kvm segfaulting

2013-02-12 Thread Stefan Priebe - Profihost AG
Hi,

Thanks - I applied the patch to the latest master. I hope this will
solve my issue. Will this one get integrated into 1.4 final?

Greets,
Stefan

Am 11.02.2013 15:42, schrieb Paolo Bonzini:
 Il 11/02/2013 15:18, Stefan Priebe - Profihost AG ha scritto:
 Some trace that a request was actually cancelled, but I think I
 believe
 Ah but that must be in guest not on host right? How to grab that from
 client when it is crashing?
 
 Serial console could have something like sda: aborting command.  It is 
 actually interesting to see what is causing commands to be aborted (typically 
 a timeout, but what causes the timeout? :).
 
 that.  This seems to be the same issue as commits
 1bd075f29ea6d11853475c7c42734595720c3ac6 (iSCSI) and
 473c7f0255920bcaf37411990a3725898772817f (rbd), where the cancelled
 callback is called before the complete callback.
 If there is the same code in virtio-scsi it might be.
 
 No, virtio-scsi is relying on the backends (including scsi-disk)
 doing it correctly.  The RBD code looks okay, so it's still my
 fault :) but not virtio-scsi's.
 
 I think this happens when a request is split into multiple parts,
 and one of them is canceled.  Then the next part is fired, but
 virtio-scsi's cancellation callbacks have fired already.
 
 You can test this patch:
 
 diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
 index 07220e4..1d8289c 100644
 --- a/hw/scsi/scsi-disk.c
 +++ b/hw/scsi/scsi-disk.c
 @@ -221,6 +221,10 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
  {
      SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
  
 +    if (r->req.io_canceled) {
 +        return;
 +    }
 +
      if (scsi_is_cmd_fua(&r->req.cmd)) {
          bdrv_acct_start(s->qdev.conf.bs, &r->acct, 0, BDRV_ACCT_FLUSH);
          r->req.aiocb = bdrv_aio_flush(s->qdev.conf.bs, scsi_aio_complete, r);
 @@ -352,6 +356,10 @@ static void scsi_read_data(SCSIRequest *req)
      /* No data transfer may already be in progress */
      assert(r->req.aiocb == NULL);
  
 +    if (r->req.io_canceled) {
 +        return;
 +    }
 +
      /* The request is used as the AIO opaque value, so add a ref.  */
      scsi_req_ref(&r->req);
      if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
 @@ -455,6 +463,10 @@ static void scsi_write_data(SCSIRequest *req)
      /* No data transfer may already be in progress */
      assert(r->req.aiocb == NULL);
  
 +    if (r->req.io_canceled) {
 +        return;
 +    }
 +
      /* The request is used as the AIO opaque value, so add a ref.  */
      scsi_req_ref(&r->req);
      if (r->req.cmd.mode != SCSI_XFER_TO_DEV) {
 
 Paolo
 



Re: [Qemu-devel] kvm segfaulting

2013-02-12 Thread Paolo Bonzini
Il 12/02/2013 14:46, Stefan Priebe - Profihost AG ha scritto:
 Hi,
 
 thanks - i applied the patch to the latest master. I hope that this will
 solve my issue. Will this one get integrated in 1.4 final?

No, only 1.4.1 and 1.5 unfortunately.  Let's give it a week for you to
test it.

Paolo



Re: [Qemu-devel] kvm segfaulting

2013-02-12 Thread Stefan Priebe - Profihost AG
Hi Paolo,

sadly no luck. A VM crashed again.

[ ~]# addr2line -e /usr/lib/debug/usr/bin/kvm -f 24040c
virtio_scsi_command_complete
hw/virtio-scsi.c:429

Same point as last time:
static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
                                         size_t resid)
{
    VirtIOSCSIReq *req = r->hba_private;
    uint32_t sense_len;

429 =>  req->resp.cmd->response = VIRTIO_SCSI_S_OK;
    req->resp.cmd->status = status;
    if (req->resp.cmd->status == GOOD) {
        req->resp.cmd->resid = tswap32(resid);
    } else {
        req->resp.cmd->resid = 0;
        sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                       VIRTIO_SCSI_SENSE_SIZE);
        req->resp.cmd->sense_len = tswap32(sense_len);
    }
    virtio_scsi_complete_req(req);

Greets
Stefan

Am 12.02.2013 15:34, schrieb Paolo Bonzini:
 Il 12/02/2013 14:46, Stefan Priebe - Profihost AG ha scritto:
 Hi,

 thanks - i applied the patch to the latest master. I hope that this will
 solve my issue. Will this one get integrated in 1.4 final?



[Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hello list,

I've seen segfaults of the kvm process. Sadly I have no core dumps, just
the line from dmesg:
kvm[26268]: segfault at c050 ip 7fcfc3465eac sp 7fffe85a0d00
error 4 in kvm[7fcfc3223000+3ba000]

Is it possible to get the function and some more details out of this
line? I've symbol files and debugging files of the kvm binary.

Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
Il 11/02/2013 08:46, Stefan Priebe - Profihost AG ha scritto:
 i've seen segfaults of the kvm process. Sadly i've no core dumps just
 the line from dmesg:
 kvm[26268]: segfault at c050 ip 7fcfc3465eac sp 7fffe85a0d00
 error 4 in kvm[7fcfc3223000+3ba000]
 
 Is it possible to get the function and some more details out of this
 line? I've symbol files and debugging files of the kvm binary.

You can run it under gdb.  Alternatively, disable address space
randomization (/proc/sys/kernel/randomize_va_space) and then use
something like addr2line -e /path/to/binary/kvm 0x7fcfc3465eac
(using the IP address from the dmesg line).

Paolo



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Michael Tokarev

11.02.2013 11:46, Stefan Priebe - Profihost AG wrote:

Hello list,

i've seen segfaults of the kvm process. Sadly i've no core dumps just
the line from dmesg:
kvm[26268]: segfault at c050 ip 7fcfc3465eac sp 7fffe85a0d00
error 4 in kvm[7fcfc3223000+3ba000]

Is it possible to get the function and some more details out of this
line? I've symbol files and debugging files of the kvm binary.


First of all you need to provide at least minimal info about what _is_
your kvm process.  Official qemu binary for x86 is named qemu-system-x86_64.
Maybe some words about version, if it is some distribution-specific thing
(debian?) - maybe name the distribution, etc.  How it is started.

Enabling core dumps is a separate topic: basically, your process should
have a writable current directory and must not change its uid, or else you'll
have to do some tricks.  Once you have a coredump, run gdb on it with the
symbols file to see where it goes.
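
A sketch of that advice for a Linux host (assumptions: a reasonably recent
util-linux for prlimit and gdb's gcore installed; the kvm PID is a
placeholder):

```shell
# Where the kernel writes core files; a bare "core" pattern means the
# dump goes to the process's (writable) current directory.
cat /proc/sys/kernel/core_pattern

# Raise the core-size limit of an already-running process, no restart needed:
#   prlimit --pid <kvm-pid> --core=unlimited:unlimited
# Or snapshot a core immediately without killing the process:
#   gcore -o /tmp/kvm-core <kvm-pid>
```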

Thanks,

/mjt



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Hajnoczi
On Mon, Feb 11, 2013 at 08:46:03AM +0100, Stefan Priebe - Profihost AG wrote:
 i've seen segfaults of the kvm process. Sadly i've no core dumps just
 the line from dmesg:
 kvm[26268]: segfault at c050 ip 7fcfc3465eac sp 7fffe85a0d00
 error 4 in kvm[7fcfc3223000+3ba000]
 
 Is it possible to get the function and some more details out of this
 line? I've symbol files and debugging files of the kvm binary.

Accessed address: c050
Address of the instruction that segfaulted: 7fcfc3465eac
Base memory address where kvm code was mmapped: 7fcfc3223000
Length of mmap: 3ba000

Try the following:

  $ printf '%x' $((0x7fcfc3465eac - 0x7fcfc3223000))
  242eac
  $ addr2line -e path/to/qemu-kvm-symbols -f 242eac

I also suggest posting about 10 lines before/after 0x242eac from the
objdump -d path/to/kvm output.  That way we can sanity check that the
instruction accesses memory and see what the surrounding instructions
are doing.
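
The arithmetic above as a self-contained shell sketch (binary paths are
placeholders):

```shell
# dmesg said: ip 7fcfc3465eac, code mapped at 7fcfc3223000 (length 3ba000).
# The offset to feed addr2line/objdump is the ip minus the mapping base.
ip=0x7fcfc3465eac
base=0x7fcfc3223000
off=$(printf '%x' $((ip - base)))
echo "$off"   # prints 242eac
# addr2line -e /path/to/qemu-kvm-symbols -f "$off"
# objdump -d /path/to/kvm | grep -B 10 -A 10 " ${off}:"
```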

Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi Stefan,
Am 11.02.2013 10:40, schrieb Stefan Hajnoczi:
 On Mon, Feb 11, 2013 at 08:46:03AM +0100, Stefan Priebe - Profihost AG wrote:
 i've seen segfaults of the kvm process. Sadly i've no core dumps just
 the line from dmesg:
 kvm[26268]: segfault at c050 ip 7fcfc3465eac sp 7fffe85a0d00
 error 4 in kvm[7fcfc3223000+3ba000]

 Is it possible to get the function and some more details out of this
 line? I've symbol files and debugging files of the kvm binary.
 
 Accessed address: c050
 Address of the instruction that segfaulted: 7fcfc3465eac
 Base memory address where kvm code was mmapped: 7fcfc3223000
 Length of mmap: 3ba000
 
 Try the following:
 
   $ printf '%x' $((0x7fcfc3465eac - 0x7fcfc3223000))
   242eac
   $ addr2line -e path/to/qemu-kvm-symbols -f 242eac
 
 I also suggest posting about 10 lines before/after 0x242eac from the
 objdump -d path/to/kvm output.  That way we can sanity check that the
 instruction accesses memory and see what the surrounding instructions
 are doing.

Great thing! This is current git master.

[: ~]# addr2line -e /usr/lib/debug/usr/bin/kvm -f 242eac

virtio_scsi_command_complete
/opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/hw/virtio-scsi.c:429

static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
                                         size_t resid)
{
    VirtIOSCSIReq *req = r->hba_private;
    uint32_t sense_len;

=> THIS IS 429:  req->resp.cmd->response = VIRTIO_SCSI_S_OK;
    req->resp.cmd->status = status;
    if (req->resp.cmd->status == GOOD) {
        req->resp.cmd->resid = tswap32(resid);
    } else {
        req->resp.cmd->resid = 0;
        sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                       VIRTIO_SCSI_SENSE_SIZE);
        req->resp.cmd->sense_len = tswap32(sense_len);
    }
    virtio_scsi_complete_req(req);
}

Greets,
Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
So it looks a bit like a race condition in the virtio-scsi driver: a
command got canceled and then completed, or something like that.

Stefan

Am 11.02.2013 10:40, schrieb Stefan Hajnoczi:
 ...



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
On 11/02/2013 10:48, Stefan Priebe - Profihost AG wrote:
 Great thing! This is current git master.
 
 [: ~]# addr2line -e /usr/lib/debug/usr/bin/kvm -f 242eac
 
 virtio_scsi_command_complete
 /opt/debianpackages/pve-squeeze.sources/pve-qemu-kvm/qemu-kvm/hw/virtio-scsi.c:429
 
 static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
                                          size_t resid)
 {
     VirtIOSCSIReq *req = r->hba_private;
     uint32_t sense_len;
 
     req->resp.cmd->response = VIRTIO_SCSI_S_OK;   /* <= THIS IS LINE 429 */
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }

Can you reproduce this?

Paolo



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi,
On 11.02.2013 13:48, Paolo Bonzini wrote:
 Il 11/02/2013 10:48, Stefan Priebe - Profihost AG ha scritto:
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }
 
 Can you reproduce this?

Sadly no - but i've seen this 3 times in the last 4 weeks. I checked all
addresses / dmesg messages and all crashes point to that line.

Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Hajnoczi
On Mon, Feb 11, 2013 at 2:08 PM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
 Hi,
 Am 11.02.2013 13:48, schrieb Paolo Bonzini:
 Il 11/02/2013 10:48, Stefan Priebe - Profihost AG ha scritto:
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }

 Can you reproduce this?

 Sadly no - but i've seen this 3 times in the last 4 weeks. I checked all
 addresses / dmesg messages and all crashes point to that line.

Just for sanity, because I never trust addr2line, can you please
confirm that you are using virtio-scsi in your guest?

The kvm command-line should contain -device virtio-scsi-pci.

Stefan



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi Stefan,

Yes, I use virtio-scsi-pci in all my guests, as it is the only one where
I can use fstrim from guest to storage with rbd ;-)

Stefan
On 11.02.2013 14:21, Stefan Hajnoczi wrote:
 On Mon, Feb 11, 2013 at 2:08 PM, Stefan Priebe - Profihost AG
 s.pri...@profihost.ag wrote:
 Hi,
 Am 11.02.2013 13:48, schrieb Paolo Bonzini:
 Il 11/02/2013 10:48, Stefan Priebe - Profihost AG ha scritto:
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }

 Can you reproduce this?

 Sadly no - but i've seen this 3 times in the last 4 weeks. I checked all
 addresses / dmesg messages and all crashes point to that line.
 
 Just for sanity, because I never trust addr2line, can you please
 confirm that you are using virtio-scsi in your guest?
 
 The kvm command-line should contain -device virtio-scsi-pci.
 
 Stefan
 



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
On 11/02/2013 14:35, Stefan Priebe - Profihost AG wrote:
 Hi Stefan,
 
 yes i use virtio-scsi-pci in all my guests. As it is the only one where
 i can use fstrim from guest to storage with rbd ;-)

Can you check for anything suspicious in the kernel console output?

Paolo

 Stefan
 Am 11.02.2013 14:21, schrieb Stefan Hajnoczi:
 On Mon, Feb 11, 2013 at 2:08 PM, Stefan Priebe - Profihost AG
 s.pri...@profihost.ag wrote:
 Hi,
 Am 11.02.2013 13:48, schrieb Paolo Bonzini:
 Il 11/02/2013 10:48, Stefan Priebe - Profihost AG ha scritto:
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }

 Can you reproduce this?

 Sadly no - but i've seen this 3 times in the last 4 weeks. I checked all
 addresses / dmesg messages and all crashes point to that line.

 Just for sanity, because I never trust addr2line, can you please
 confirm that you are using virtio-scsi in your guest?

 The kvm command-line should contain -device virtio-scsi-pci.






Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi Paolo,

As the guest crashes, I can't check the guest. On the host I just have
the segmentation fault line. Anything else is from the boot process or
enabling the tap device. So nothing suspicious.

Greets,
Stefan

On 11.02.2013 14:56, Paolo Bonzini wrote:
 Il 11/02/2013 14:35, Stefan Priebe - Profihost AG ha scritto:
 Hi Stefan,

 yes i use virtio-scsi-pci in all my guests. As it is the only one where
 i can use fstrim from guest to storage with rbd ;-)
 
 Can you check for anything suspicious in the kernel console output?
 
 Paolo
 
 Stefan
 Am 11.02.2013 14:21, schrieb Stefan Hajnoczi:
 On Mon, Feb 11, 2013 at 2:08 PM, Stefan Priebe - Profihost AG
 s.pri...@profihost.ag wrote:
 Hi,
 Am 11.02.2013 13:48, schrieb Paolo Bonzini:
 Il 11/02/2013 10:48, Stefan Priebe - Profihost AG ha scritto:
     req->resp.cmd->status = status;
     if (req->resp.cmd->status == GOOD) {
         req->resp.cmd->resid = tswap32(resid);
     } else {
         req->resp.cmd->resid = 0;
         sense_len = scsi_req_get_sense(r, req->resp.cmd->sense,
                                        VIRTIO_SCSI_SENSE_SIZE);
         req->resp.cmd->sense_len = tswap32(sense_len);
     }
     virtio_scsi_complete_req(req);
 }

 Can you reproduce this?

 Sadly no - but i've seen this 3 times in the last 4 weeks. I checked all
 addresses / dmesg messages and all crashes point to that line.

 Just for sanity, because I never trust addr2line, can you please
 confirm that you are using virtio-scsi in your guest?

 The kvm command-line should contain -device virtio-scsi-pci.
 
 
 



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
On 11/02/2013 14:58, Stefan Priebe - Profihost AG wrote:
 Hi Paolo,
 
 as the guest crashes i can't check the guest. On the host i just have
 the segmentation fault line. Anything else is from the bootprocess or
 enabling the tap device. So nothing suspicious.

What about log from the serial console?

Paolo

 Greets,
 Stefan
 
 Am 11.02.2013 14:56, schrieb Paolo Bonzini:
  Il 11/02/2013 14:35, Stefan Priebe - Profihost AG ha scritto:
  Hi Stefan,
 
  yes i use virtio-scsi-pci in all my guests. As it is the only one where
  i can use fstrim from guest to storage with rbd ;-)
  
  Can you check for anything suspicious in the kernel console output?
  




Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi,

nothing. What are you searching for?

Stefan
On 11.02.2013 14:59, Paolo Bonzini wrote:
 Il 11/02/2013 14:58, Stefan Priebe - Profihost AG ha scritto:
 Hi Paolo,

 as the guest crashes i can't check the guest. On the host i just have
 the segmentation fault line. Anything else is from the bootprocess or
 enabling the tap device. So nothing suspicious.
 
 What about log from the serial console?
 
 Paolo
 
 Greets,
 Stefan

 Am 11.02.2013 14:56, schrieb Paolo Bonzini:
 Il 11/02/2013 14:35, Stefan Priebe - Profihost AG ha scritto:
 Hi Stefan,

 yes i use virtio-scsi-pci in all my guests. As it is the only one where
 i can use fstrim from guest to storage with rbd ;-)

 Can you check for anything suspicious in the kernel console output?

 



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
On 11/02/2013 15:02, Stefan Priebe - Profihost AG wrote:
 Hi,
 
 nothing. What are you searching for?

Some trace that a request was actually cancelled, but I think I believe
that.  This seems to be the same issue as commits
1bd075f29ea6d11853475c7c42734595720c3ac6 (iSCSI) and
473c7f0255920bcaf37411990a3725898772817f (rbd), where the cancelled
callback is called before the complete callback.

Note how virtio_scsi_request_cancelled is protected against
r->hba_private == NULL, while virtio_scsi_command_complete is not.
That's by design, because the latter should never happen or you could
get data corruption in the guest.

What version are you running?  Does the rbd backend have the above commit?

Paolo

 Stefan
 Am 11.02.2013 14:59, schrieb Paolo Bonzini:
 Il 11/02/2013 14:58, Stefan Priebe - Profihost AG ha scritto:
 Hi Paolo,

 as the guest crashes i can't check the guest. On the host i just have
 the segmentation fault line. Anything else is from the bootprocess or
 enabling the tap device. So nothing suspicious.

 What about log from the serial console?

 Paolo

 Greets,
 Stefan

 Am 11.02.2013 14:56, schrieb Paolo Bonzini:
 Il 11/02/2013 14:35, Stefan Priebe - Profihost AG ha scritto:
 Hi Stefan,

 yes i use virtio-scsi-pci in all my guests. As it is the only one where
 i can use fstrim from guest to storage with rbd ;-)

 Can you check for anything suspicious in the kernel console output?






Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Stefan Priebe - Profihost AG
Hi,

 Some trace that a request was actually cancelled, but I think I
 believe
Ah, but that must be in the guest, not on the host, right? How do I grab
that from the guest when it is crashing?

 that.  This seems to be the same issue as commits
 1bd075f29ea6d11853475c7c42734595720c3ac6 (iSCSI) and
 473c7f0255920bcaf37411990a3725898772817f (rbd), where the cancelled
 callback is called before the complete callback.
If there is the same code in virtio-scsi it might be.

 Note how virtio_scsi_request_cancelled is protected against
 r->hba_private == NULL, while virtio_scsi_command_complete doesn't.
 That's by design, because the latter should never happen or you could
 get data corruption in the guest.
No idea ;-) I'm not a C programmer at all.

 What version are you running?  Does the rbd backend have the above
 commit?
librbd 0.56.2
qemu current master git (10442558ab1797bfbb01285b909e34c5cf038f12)

Stefan
On 11.02.2013 15:12, Paolo Bonzini wrote:
 Il 11/02/2013 15:02, Stefan Priebe - Profihost AG ha scritto:
 Hi,

 nothing. What are you searching for?
 
 Some trace that a request was actually cancelled, but I think I believe
 that.  This seems to be the same issue as commits
 1bd075f29ea6d11853475c7c42734595720c3ac6 (iSCSI) and
 473c7f0255920bcaf37411990a3725898772817f (rbd), where the cancelled
 callback is called before the complete callback.
 
 Note how virtio_scsi_request_cancelled is protected against
 r->hba_private == NULL, while virtio_scsi_command_complete doesn't.
 That's by design, because the latter should never happen or you could
 get data corruption in the guest.
 
 What version are you running?  Does the rbd backend have the above commit?
 
 Paolo
 
 Stefan
 Am 11.02.2013 14:59, schrieb Paolo Bonzini:
 Il 11/02/2013 14:58, Stefan Priebe - Profihost AG ha scritto:
 Hi Paolo,

 as the guest crashes i can't check the guest. On the host i just have
 the segmentation fault line. Anything else is from the bootprocess or
 enabling the tap device. So nothing suspicious.

 What about log from the serial console?

 Paolo

 Greets,
 Stefan

 Am 11.02.2013 14:56, schrieb Paolo Bonzini:
 Il 11/02/2013 14:35, Stefan Priebe - Profihost AG ha scritto:
 Hi Stefan,

 yes i use virtio-scsi-pci in all my guests. As it is the only one where
 i can use fstrim from guest to storage with rbd ;-)

 Can you check for anything suspicious in the kernel console output?


 



Re: [Qemu-devel] kvm segfaulting

2013-02-11 Thread Paolo Bonzini
On 11/02/2013 15:18, Stefan Priebe - Profihost AG wrote:
  Some trace that a request was actually cancelled, but I think I
  believe
 Ah but that must be in guest not on host right? How to grab that from
 client when it is crashing?

The serial console could have something like "sda: aborting command".  It is
actually interesting to see what is causing commands to be aborted (typically a
timeout, but what causes the timeout? :).

  that.  This seems to be the same issue as commits
  1bd075f29ea6d11853475c7c42734595720c3ac6 (iSCSI) and
  473c7f0255920bcaf37411990a3725898772817f (rbd), where the cancelled
  callback is called before the complete callback.
 If there is the same code in virtio-scsi it might be.

No, virtio-scsi is relying on the backends (including scsi-disk)
doing it correctly.  The RBD code looks okay, so it's still my
fault :) but not virtio-scsi's.

I think this happens when a request is split into multiple parts,
and one of them is canceled.  Then the next part is fired, but
virtio-scsi's cancellation callbacks have fired already.

You can test this patch:

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 07220e4..1d8289c 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -221,6 +221,10 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    if (r->req.io_canceled) {
+        return;
+    }
+
     if (scsi_is_cmd_fua(&r->req.cmd)) {
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, 0, BDRV_ACCT_FLUSH);
         r->req.aiocb = bdrv_aio_flush(s->qdev.conf.bs, scsi_aio_complete, r);
@@ -352,6 +356,10 @@ static void scsi_read_data(SCSIRequest *req)
     /* No data transfer may already be in progress */
     assert(r->req.aiocb == NULL);
 
+    if (r->req.io_canceled) {
+        return;
+    }
+
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
     if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
@@ -455,6 +463,10 @@ static void scsi_write_data(SCSIRequest *req)
     /* No data transfer may already be in progress */
     assert(r->req.aiocb == NULL);
 
+    if (r->req.io_canceled) {
+        return;
+    }
+
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
     if (r->req.cmd.mode != SCSI_XFER_TO_DEV) {

Paolo