[Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
[Expired for QEMU because there has been no activity for 60 days.] ** Changed in: qemu Status: Incomplete => Expired -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: Expired Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions
[Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now. If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience. ** Changed in: qemu Status: New => Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: Incomplete Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions
[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
Okay, this is probably a race condition bug. If remove: 1 and iothread='1' from the disk which causes the command to change from: -device virtio-blk- pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio- disk0,id=virtio-disk0,bootindex=2,write-cache=on to -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio- disk0,id=virtio-disk0,bootindex=2,write-cache=on I don't get crashes anymore. So for sure it has something to do with iothreads. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: New Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions
[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
>From the core: structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0} You would think that would pass: chunk = >structured; if (chunk->type == NBD_REPLY_TYPE_NONE) { /* NBD_REPLY_FLAG_DONE is already checked in nbd_co_receive_one_chunk */ assert(chunk->flags & NBD_REPLY_FLAG_DONE); goto break_loop; } Given: #define NBD_REPLY_TYPE_NONE 0 Perhaps this is a problem with my compiler. (or maybe it's an ignorant guess) I'm using: gcc version 5.5.0 (GCC) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: New Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions
[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
I'm back to trying to figure this out. I can't use migrate and copy storage until this bug is fixed, so I'm pretty motivated. Today I configured libvirt/qemu to dump the core, and I compiled qemu with debugging symbols. Here is the backtrace. I'm not sure it says anything we don't already know. I may try to hack in some more debugging later today, but my C is terrible. Any other ideas on ways I can help? (gdb) bt full #0 0x7f1a6a3313f8 in raise () at /lib64/libc.so.6 #1 0x7f1a6a332ffa in abort () at /lib64/libc.so.6 #2 0x7f1a6a329c17 in __assert_fail_base () at /lib64/libc.so.6 #3 0x7f1a6a329cc2 in () at /lib64/libc.so.6 #4 0x55a6cba705a6 in nbd_reply_chunk_iter_receive (s=s@entry=0x55a6ce458200, iter=iter@entry=0x7f1945fe8890, handle=handle@entry=94174913593865, qiov=qiov@entry=0x0, reply=0x7f1945fe8800, reply@entry=0x0, payload=payload@entry=0x0) at block/nbd-client.c:606 local_reply = {simple = {magic = 1732535960, error = 0, handle = 94174913593865}, structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0}, {magic = 1732535960, _skip = 0, handle = 94174913593865}} chunk = 0x7f1945fe8800 local_err = 0x0 __func__ = "nbd_reply_chunk_iter_receive" __PRETTY_FUNCTION__ = "nbd_reply_chunk_iter_receive" #5 0x55a6cba706d6 in nbd_co_request (errp=0x7f1945fe, handle=94174913593865, s=0x55a6ce458200) at block/nbd-client.c:634 iter = {ret = 0, fatal = false, err = 0x0, done = false, only_structured = true} ret = local_err = 0x0 client = 0x55a6ce458200 __PRETTY_FUNCTION__ = "nbd_co_request" #6 0x55a6cba706d6 in nbd_co_request (bs=bs@entry=0x55a6ce450130, request=request@entry=0x7f1945fe88e0, write_qiov=write_qiov@entry=0x0) at block/nbd-client.c:772 ret = local_err = 0x0 client = 0x55a6ce458200 __PRETTY_FUNCTION__ = "nbd_co_request" #7 0x55a6cba70cb5 in nbd_client_co_pwrite_zeroes (bs=0x55a6ce450130, offset=2483027968, bytes=16777216, flags=) at block/nbd-client.c:860 client = request = {handle = 94174913593865, from = 2483027968, len = 16777216, flags = 0, type = 6} __PRETTY_FUNCTION__ = "nbd_client_co_pwrite_zeroes" #8 0x55a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce450130, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, flags=flags@entry=6) at block/io.c:1410 num = 16777216 drv = 0x55a6cc3b0600 qiov = {iov = 0x10, niov = -834338512, nalloc = 21926, size = 1831862272} iov = {iov_base = 0x0, iov_len = 0} ret = -95 need_flush = false head = 0 tail = 0 max_write_zeroes = 33554432 alignment = 512 max_transfer = 16777216 __PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes" #9 0x55a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8b50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, align=align@entry=512, qiov=0x0, flags=6, child=0x55a6ce333f50, child=0x55a6ce333f50) at block/io.c:1522 bs = 0x55a6ce450130 drv = 0x55a6cc3b0600 waited = ret = end_sector = 4882432 bytes_remaining = 16777216 max_transfer = 33554432 #10 0x55a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8b50, flags=6, bytes=16777216, offset=2483027968, child=0x55a6ce333f50) at block/io.c:1625 aligned_bytes = 16777216 bs = 0x55a6ce450130 buf = tail_padding_bytes = 0 ---Type to continue, or q to quit--- local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 1825570816} align = 512 head_padding_bytes = ret = 0 iov = {iov_base = 0x7f1945fe8bc0, iov_len = 1} bs = 0x55a6ce450130 req = {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8b98}}, waiting_for = 0x0} align = head_buf = 0x0 tail_buf = 0x0 local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 94171452932608} use_local_qiov = false ret = __PRETTY_FUNCTION__ = "bdrv_co_pwritev" #11 0x55a6cba69a42 in bdrv_co_pwritev (child=child@entry=0x55a6ce333f50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, qiov=qiov@entry=0x0, flags=flags@entry=6) at block/io.c:1698 bs = 0x55a6ce450130 req = {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last =
[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
Hi Eric, Thanks for looking at this. I looked at the nbd/server.c code and couldn't see how it could send a NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit. The only place NBD_REPLY_TYPE_NONE is set is on line 1603: set_be_chunk(, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0); Anyway, here is the command line generated: /usr/bin/qemu-system-x86_64 -name guest=dng-smokeping,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-dng- smokeping/master-key.aes -machine pc-1.1,accel=kvm,usb=off,dump-guest- core=off -cpu qemu64,pmu=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object iothread,id=iothread1 -uuid 3d0e1603-ad08-4876-9d9f-2d563fac07ea -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -global kvm- pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device piix3 -usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/datastore/vm/dng- smokeping.raw,format=raw,if=none,id=drive-virtio- disk0,cache=writeback,aio=threads -device virtio-blk- pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio- disk0,id=virtio-disk0,bootindex=2,write-cache=on -drive if=none,id =drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive- ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=52:54:00:1d:da:b9,bus=pci.0,addr=0x3 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:59 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon- pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on Is there some way to turn on NBD trace? I don't see any trace code around the assert, so I'm guessing it would need to be written Is there a log event in QMP? Can that be used to trace what is going on? If so it would be easy to make libvirt log all of that, which should tell us what is going on... If that won't work, I can run the VM outside of libvirt and tell it to migrate over the QMP socket -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: New Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions
[Qemu-devel] [Bug 1793791] Re: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
Tested with Qemu 3.0.0 and this still happens. Also tested with kernel 4.9.128 on one side and 4.9.76 on the other thinking it might be a kernel 4.14 issue. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793791 Title: Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed Status in QEMU: New Bug description: Qemu version on both sides: 2.12.1 Host A Linux: 4.9.76 Host B Linux: 4.14.67 While calling from Host A: virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all I get a qemu crash with: 2018-09-21 16:12:23.073+: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990 qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed. 2018-09-21 16:12:41.230+: shutting down, reason=crashed 2018-09-21 16:12:52.900+: shutting down, reason=failed It doesn't do it every time, but most of the time. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793791/+subscriptions