[Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2021-01-11 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  Expired

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions



[Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2020-11-12 Thread Thomas Huth
The QEMU project is currently considering to move its bug tracking to another 
system. For this we need to know which bugs are still valid and which could be 
closed already. Thus we are setting older bugs to "Incomplete" now.
If you still think this bug report here is valid, then please switch the state 
back to "New" within the next 60 days, otherwise this report will be marked as 
"Expired". Or mark it as "Fix Released" if the problem has been solved with a 
newer version of QEMU already. Thank you and sorry for the inconvenience.

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  Incomplete

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions



[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2018-09-25 Thread Matthew Schumacher
Okay, this is probably a race condition bug.  If remove:

1
and
iothread='1' from the disk which causes the command to change from: 

-device virtio-blk-
pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-
disk0,id=virtio-disk0,bootindex=2,write-cache=on

to

-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-
disk0,id=virtio-disk0,bootindex=2,write-cache=on

I don't get crashes anymore.

So for sure it has something to do with iothreads.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  New

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions



[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2018-09-24 Thread Matthew Schumacher
>From the core:

structured = {magic = 1732535960, flags = 0, type = 0, handle =
94174913593865, length = 0}

You would think that would pass:

chunk = >structured;

if (chunk->type == NBD_REPLY_TYPE_NONE) {
/* NBD_REPLY_FLAG_DONE is already checked in nbd_co_receive_one_chunk */
assert(chunk->flags & NBD_REPLY_FLAG_DONE);
goto break_loop;
}

Given:

#define NBD_REPLY_TYPE_NONE  0


Perhaps this is a problem with my compiler.  (or maybe it's an ignorant guess) 
I'm using:

gcc version 5.5.0 (GCC)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  New

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions



[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2018-09-24 Thread Matthew Schumacher
I'm back to trying to figure this out.  I can't use migrate and copy
storage until this bug is fixed, so I'm pretty motivated.  Today I
configured libvirt/qemu to dump the core, and I compiled qemu with
debugging symbols.  Here is the backtrace.  I'm not sure it says
anything we don't already know.

I may try to hack in some more debugging later today, but my C is
terrible.  Any other ideas on ways I can help?

(gdb) bt full
#0  0x7f1a6a3313f8 in raise () at /lib64/libc.so.6
#1  0x7f1a6a332ffa in abort () at /lib64/libc.so.6
#2  0x7f1a6a329c17 in __assert_fail_base () at /lib64/libc.so.6
#3  0x7f1a6a329cc2 in  () at /lib64/libc.so.6
#4  0x55a6cba705a6 in nbd_reply_chunk_iter_receive 
(s=s@entry=0x55a6ce458200, iter=iter@entry=0x7f1945fe8890, 
handle=handle@entry=94174913593865, qiov=qiov@entry=0x0, reply=0x7f1945fe8800,
reply@entry=0x0, payload=payload@entry=0x0) at block/nbd-client.c:606
local_reply =
{simple = {magic = 1732535960, error = 0, handle = 94174913593865}, 
structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, 
length = 0}, {magic = 1732535960, _skip = 0, handle = 94174913593865}}
chunk = 0x7f1945fe8800
local_err = 0x0
__func__ = "nbd_reply_chunk_iter_receive"
__PRETTY_FUNCTION__ = "nbd_reply_chunk_iter_receive"
#5  0x55a6cba706d6 in nbd_co_request (errp=0x7f1945fe, 
handle=94174913593865, s=0x55a6ce458200) at block/nbd-client.c:634
iter = {ret = 0, fatal = false, err = 0x0, done = false, 
only_structured = true}
ret = 
local_err = 0x0
client = 0x55a6ce458200
__PRETTY_FUNCTION__ = "nbd_co_request"
#6  0x55a6cba706d6 in nbd_co_request (bs=bs@entry=0x55a6ce450130, 
request=request@entry=0x7f1945fe88e0, write_qiov=write_qiov@entry=0x0) at 
block/nbd-client.c:772
ret = 
local_err = 0x0
client = 0x55a6ce458200
__PRETTY_FUNCTION__ = "nbd_co_request"
#7  0x55a6cba70cb5 in nbd_client_co_pwrite_zeroes (bs=0x55a6ce450130, 
offset=2483027968, bytes=16777216, flags=) at 
block/nbd-client.c:860
client = 
request = {handle = 94174913593865, from = 2483027968, len = 16777216, 
flags = 0, type = 6}
__PRETTY_FUNCTION__ = "nbd_client_co_pwrite_zeroes"
#8  0x55a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce450130, 
offset=offset@entry=2483027968, bytes=bytes@entry=16777216, 
flags=flags@entry=6) at block/io.c:1410
num = 16777216
drv = 0x55a6cc3b0600 
qiov = {iov = 0x10, niov = -834338512, nalloc = 21926, size = 
1831862272}
iov = {iov_base = 0x0, iov_len = 0}
ret = -95
need_flush = false
head = 0
tail = 0
max_write_zeroes = 33554432
alignment = 512
max_transfer = 16777216
__PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
#9  0x55a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8b50, 
offset=offset@entry=2483027968, bytes=bytes@entry=16777216, 
align=align@entry=512, qiov=0x0, flags=6, child=0x55a6ce333f50, 
child=0x55a6ce333f50)
at block/io.c:1522
bs = 0x55a6ce450130
drv = 0x55a6cc3b0600 
waited = 
ret = 
end_sector = 4882432
bytes_remaining = 16777216
max_transfer = 33554432
#10 0x55a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8b50, flags=6, 
bytes=16777216, offset=2483027968, child=0x55a6ce333f50) at block/io.c:1625
aligned_bytes = 16777216
bs = 0x55a6ce450130
buf = 
tail_padding_bytes = 0
---Type  to continue, or q  to quit---
local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 1825570816}
align = 512
head_padding_bytes = 
ret = 0
iov = {iov_base = 0x7f1945fe8bc0, iov_len = 1}
bs = 0x55a6ce450130
req =
  {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = 
BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, 
overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co 
= 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 
0x7f1945fe8b98}}, waiting_for = 0x0}
align = 
head_buf = 0x0
tail_buf = 0x0
local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 
94171452932608}
use_local_qiov = false
ret = 
__PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#11 0x55a6cba69a42 in bdrv_co_pwritev (child=child@entry=0x55a6ce333f50, 
offset=offset@entry=2483027968, bytes=bytes@entry=16777216, 
qiov=qiov@entry=0x0, flags=flags@entry=6) at block/io.c:1698
bs = 0x55a6ce450130
req =
  {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = 
BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, 
overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co 
= 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 

[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2018-09-21 Thread Matthew Schumacher
Hi Eric,

Thanks for looking at this.

I looked at the nbd/server.c code and couldn't see how it could send a
NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit.
The only place NBD_REPLY_TYPE_NONE is set is on line 1603:

set_be_chunk(, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE,
handle, 0);

Anyway, here is the command line generated:

/usr/bin/qemu-system-x86_64 -name guest=dng-smokeping,debug-threads=on
-S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-dng-
smokeping/master-key.aes -machine pc-1.1,accel=kvm,usb=off,dump-guest-
core=off -cpu qemu64,pmu=off -m 4096 -realtime mlock=off -smp
2,sockets=2,cores=1,threads=1 -object iothread,id=iothread1 -uuid
3d0e1603-ad08-4876-9d9f-2d563fac07ea -no-user-config -nodefaults
-chardev socket,id=charmonitor,fd=26,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc
base=localtime,clock=vm,driftfix=slew -global kvm-
pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device piix3
-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/datastore/vm/dng-
smokeping.raw,format=raw,if=none,id=drive-virtio-
disk0,cache=writeback,aio=threads -device virtio-blk-
pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-
disk0,id=virtio-disk0,bootindex=2,write-cache=on -drive if=none,id
=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-
ide0-1-0,id=ide0-1-0,bootindex=1 -netdev
tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-
pci,netdev=hostnet0,id=net0,mac=52:54:00:1d:da:b9,bus=pci.0,addr=0x3
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:59 -device
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-
pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on


Is there some way to turn on NBD trace?  I don't see any trace code around the 
assert, so I'm guessing it would need to be written   Is there a log event 
in QMP?  Can that be used to trace what is going on?  If so it would be easy to 
make libvirt log all of that, which should tell us what is going on...

If that won't work, I can run the VM outside of libvirt and tell it to
migrate over the QMP socket

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  New

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions



[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

2018-09-21 Thread Matthew Schumacher
Tested with Qemu 3.0.0 and this still happens.

Also tested with kernel 4.9.128 on one side and 4.9.76 on the other
thinking it might be a kernel 4.14 issue.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793791

Title:
  Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags &
  NBD_REPLY_FLAG_DONE' failed

Status in QEMU:
  New

Bug description:
  Qemu version on both sides: 2.12.1
  Host A Linux: 4.9.76
  Host B Linux: 4.14.67

  While calling from Host A:
  virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource 
--persistent --verbose --copy-storage-all

  I get a qemu crash with:

  2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : 
OBJECT_UNREF: obj=0x7f922c03d990
  qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: 
Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
  2018-09-21 16:12:41.230+: shutting down, reason=crashed
  2018-09-21 16:12:52.900+: shutting down, reason=failed

  It doesn't do it every time, but most of the time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions