Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/13/2018 06:52 AM, Marc Olson via Qemu-devel wrote:
> On 11/11/18 11:36 PM, Dongli Zhang wrote:
>> On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
>>> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>>>> The 'write' latency of sector=40960 is set to a very large value. When
>>>> the I/O is stalled in guest due to that sector=40960 is accessed, I do
>>>> see below messages in guest log:
>>>>
>>>> [ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>>>> [ 80.808095] nvme nvme0: Abort status: 0x4001
>>>>
>>>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>>>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do
>>>> not work. I need to kill qemu with "kill -9"
>>>>
>>>> The same result for virtio-scsi and qemu is stuck as well.
>>> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
>>> using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
>>> trying against?
>> Would you like to reproduce the "aborting" message or the qemu hang?
> I could not reproduce IO hanging in the guest, but I can reproduce qemu
> hanging.
>> guest image: ubuntu 16.04
>> guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
>> qemu: qemu-3.0.0 (with the blkdebug delay patch)
>>
>> Would you be able to see the nvme abort (which is indeed not supported by
>> qemu) message in guest kernel?
> Yes.
>> Once I see that message, I would not be able to kill the qemu-system-x86_64
>> command line with Ctrl+C.
>
> I missed this part. I wasn't expecting to handle very long timeouts, but what
> appears to be happening is that the sleep doesn't get interrupted on
> shutdown. I suspect something like this, on top of the series I sent last
> night, should help:
>
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 6b1f2d6..0bfb91b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -557,8 +557,11 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          remove_active_rule(s, delay_rule);
>      }
>
> -    if (latency != 0) {
> -        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
> +    while (latency > 0 && !aio_external_disabled(bdrv_get_aio_context(bs))) {
> +        int64_t cur_latency = MIN(latency, 10ULL);
> +
> +        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, cur_latency);
> +        latency -= cur_latency;
>      }
>  }
>
> /marc

I am able to interrupt qemu with above patch to periodically wake up and
sleep again.

Dongli Zhang
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/11/18 11:36 PM, Dongli Zhang wrote:
> On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
>> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>>> The 'write' latency of sector=40960 is set to a very large value. When
>>> the I/O is stalled in guest due to that sector=40960 is accessed, I do
>>> see below messages in guest log:
>>>
>>> [ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>>> [ 80.808095] nvme nvme0: Abort status: 0x4001
>>>
>>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do
>>> not work. I need to kill qemu with "kill -9"
>>>
>>> The same result for virtio-scsi and qemu is stuck as well.
>> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
>> using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
>> trying against?
> Would you like to reproduce the "aborting" message or the qemu hang?

I could not reproduce IO hanging in the guest, but I can reproduce qemu
hanging.

> guest image: ubuntu 16.04
> guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
> qemu: qemu-3.0.0 (with the blkdebug delay patch)
>
> Would you be able to see the nvme abort (which is indeed not supported by
> qemu) message in guest kernel?

Yes.

> Once I see that message, I would not be able to kill the qemu-system-x86_64
> command line with Ctrl+C.

I missed this part. I wasn't expecting to handle very long timeouts, but what
appears to be happening is that the sleep doesn't get interrupted on
shutdown. I suspect something like this, on top of the series I sent last
night, should help:

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 6b1f2d6..0bfb91b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -557,8 +557,11 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
         remove_active_rule(s, delay_rule);
     }

-    if (latency != 0) {
-        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
+    while (latency > 0 && !aio_external_disabled(bdrv_get_aio_context(bs))) {
+        int64_t cur_latency = MIN(latency, 10ULL);
+
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, cur_latency);
+        latency -= cur_latency;
     }
 }

/marc
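
The idea behind the patch above (replace one long sleep with many short ones
and re-check a stop condition between them) can be illustrated outside of
QEMU. The following is only a minimal standalone sketch, not the blkdebug
code: nanosleep() stands in for qemu_co_sleep_ns(), a caller-supplied
callback stands in for the aio_external_disabled() check, and the names
sleep_in_slices, stop_fn and stop_after_n_polls are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns true once the wait should be abandoned. */
typedef bool (*stop_fn)(void *opaque);

/*
 * Sleep for up to total_ns in slices of at most slice_ns, polling
 * should_stop() before every slice.  Returns the nanoseconds that were
 * not slept because the stop condition became true (0 after a full sleep).
 */
int64_t sleep_in_slices(int64_t total_ns, int64_t slice_ns,
                        stop_fn should_stop, void *opaque)
{
    assert(slice_ns > 0);

    while (total_ns > 0 && !should_stop(opaque)) {
        int64_t cur = total_ns < slice_ns ? total_ns : slice_ns;
        struct timespec ts = {
            .tv_sec = cur / 1000000000LL,
            .tv_nsec = cur % 1000000000LL,
        };

        /* A signal may end one slice early; the loop simply moves on. */
        nanosleep(&ts, NULL);
        total_ns -= cur;
    }
    return total_ns;
}

/* Example stop condition for illustration: fires after a fixed number of polls. */
bool stop_after_n_polls(void *opaque)
{
    int *polls_left = opaque;

    return (*polls_left)-- <= 0;
}
```

With a 10 ms total, 1 ms slices and a stop condition that fires on the fourth
poll, the call returns after about 3 ms and reports 7 ms left unslept. The
slice length is the trade-off: shorter slices react to shutdown faster but
wake up more often.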
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>> Hi all,
>>
>> I tried with the patch at:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>
>> The patch is applied to qemu-3.0.0.
>>
>> Below configuration is used to test the feature for guest VM nvme.
>>
>> # qemu-system-x86_64 \
>> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
>> -net nic -net user,hostfwd=tcp::5022-:22 \
>> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
>> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
>> -object iothread,id=io1 \
>> -device nvme,drive=nvme1,serial=deadbeaf1 \
>> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>>
>> # cat blkdebug.config
>> [delay]
>> event = "write_aio"
>> latency = "99"
>> sector = "40960"
>>
>> The 'write' latency of sector=40960 is set to a very large value. When the
>> I/O is stalled in guest due to that sector=40960 is accessed, I do see
>> below messages in guest log:
>>
>> [ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>> [ 80.808095] nvme nvme0: Abort status: 0x4001
>>
>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do
>> not work. I need to kill qemu with "kill -9"
>>
>> The same result for virtio-scsi and qemu is stuck as well.
> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
> using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
> trying against?

Would you like to reproduce the "aborting" message or the qemu hang?

guest image: ubuntu 16.04
guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
qemu: qemu-3.0.0 (with the blkdebug delay patch)

Would you be able to see the nvme abort (which is indeed not supported by
qemu) message in guest kernel?

Once I see that message, I would not be able to kill the qemu-system-x86_64
command line with Ctrl+C.

Dongli Zhang
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/3/18 10:24 AM, Dongli Zhang wrote:
> Hi all,
>
> I tried with the patch at:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>
> The patch is applied to qemu-3.0.0.
>
> Below configuration is used to test the feature for guest VM nvme.
>
> # qemu-system-x86_64 \
> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
> -net nic -net user,hostfwd=tcp::5022-:22 \
> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
> -object iothread,id=io1 \
> -device nvme,drive=nvme1,serial=deadbeaf1 \
> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>
> # cat blkdebug.config
> [delay]
> event = "write_aio"
> latency = "99"
> sector = "40960"
>
> The 'write' latency of sector=40960 is set to a very large value. When the
> I/O is stalled in guest due to that sector=40960 is accessed, I do see below
> messages in guest log:
>
> [ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
> [ 80.808095] nvme nvme0: Abort status: 0x4001
>
> However, then nothing happens further. nvme I/O hangs in guest. I am not
> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not
> work. I need to kill qemu with "kill -9"
>
> The same result for virtio-scsi and qemu is stuck as well.

While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
trying against?

/marc
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/03/2018 01:24 PM, Dongli Zhang wrote:
> Hi all,
>

Hi, please reply below the quoted text when writing to qemu-devel in the
future; my reply is below.

> I tried with the patch at:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>
> The patch is applied to qemu-3.0.0.
>
> Below configuration is used to test the feature for guest VM nvme.
>
> # qemu-system-x86_64 \
> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
> -net nic -net user,hostfwd=tcp::5022-:22 \
> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
> -object iothread,id=io1 \
> -device nvme,drive=nvme1,serial=deadbeaf1 \
> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>
> # cat blkdebug.config
> [delay]
> event = "write_aio"
> latency = "99"
> sector = "40960"
>
> The 'write' latency of sector=40960 is set to a very large value. When the
> I/O is stalled in guest due to that sector=40960 is accessed, I do see below
> messages in guest log:
>
> [ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
> [ 80.808095] nvme nvme0: Abort status: 0x4001
>
> However, then nothing happens further. nvme I/O hangs in guest. I am not
> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not
> work. I need to kill qemu with "kill -9"
>
> The same result for virtio-scsi and qemu is stuck as well.
>

OK, sounds like a bug in the delay implementation here, then; or something
I've not considered with the locking/drain specifics. Thanks for the report.

> About blkdebug, I can only trigger the error by the config file. Is there a
> way to inject error or latency via qemu monitor? For instance, I would like
> to inject error not for a specific sector or state, but for the entire disk
> when I input some command via qemu monitor.
>

I don't recall. There are some tricks you can play with set-state and rules
that only apply when in a certain state. I don't remember if there are
monitor or QMP commands to set the state explicitly.

I'm looking at docs/devel/blkdebug.txt and don't see anything immediately.

There's maybe a way you can use blockdev-add to create the blkdebug node and
insert it live into the graph when you want it, and live-remove it when you
don't, but I'm not sure of the syntax right away. (maybe that's not
possible?)

--js

> Dongli Zhang
>
> On 11/03/2018 02:17 AM, John Snow wrote:
>> On 11/02/2018 01:55 PM, Marc Olson wrote:
>>> On 11/2/18 10:49 AM, John Snow wrote:
>>>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>>>> injection in VM kernel) without modifying qemu source code?
>>>>>
>>>>> For instance, I would like to observe/study/debug the I/O timeout
>>>>> handling of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>>>
>>>>> Is there a way to trigger this on purpose on qemu side?
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> Dongli Zhang
>>>>>
>>>> I don't think the blkdebug driver supports arbitrary delays right now.
>>>> Maybe we could augment it to do so?
>>>>
>>>> (I thought someone already had, but maybe it wasn't merged?)
>>>>
>>>> Aha, here:
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>>>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>>>
>>>> Let's work from there.
>>>
>>> I've got updates to that patch series that fell on the floor due to
>>> other competing things. I'll get some screen time this weekend to work
>>> on them and submit v3.
>>>
>>> /marc
>>>
>>
>> Great! Please CC the usual maintainers, but also include me.
>>
>> In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
>> that helps you out for your use case? Report back if it works for you or
>> not.
>>
>> --js
>>
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
Hi all,

I tried with the patch at:

https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html

The patch is applied to qemu-3.0.0.

Below configuration is used to test the feature for guest VM nvme.

# qemu-system-x86_64 \
-smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
-net nic -net user,hostfwd=tcp::5022-:22 \
-drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
-device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
-object iothread,id=io1 \
-device nvme,drive=nvme1,serial=deadbeaf1 \
-drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1

# cat blkdebug.config
[delay]
event = "write_aio"
latency = "99"
sector = "40960"

The 'write' latency of sector=40960 is set to a very large value. When the
I/O is stalled in guest due to that sector=40960 is accessed, I do see below
messages in guest log:

[ 80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
[ 80.808095] nvme nvme0: Abort status: 0x4001

However, then nothing happens further. nvme I/O hangs in guest. I am not able
to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work.
I need to kill qemu with "kill -9"

The same result for virtio-scsi and qemu is stuck as well.

About blkdebug, I can only trigger the error by the config file. Is there a
way to inject error or latency via qemu monitor? For instance, I would like
to inject error not for a specific sector or state, but for the entire disk
when I input some command via qemu monitor.

Dongli Zhang

On 11/03/2018 02:17 AM, John Snow wrote:
> On 11/02/2018 01:55 PM, Marc Olson wrote:
>> On 11/2/18 10:49 AM, John Snow wrote:
>>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>>> Hi,
>>>>
>>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>>> injection in VM kernel) without modifying qemu source code?
>>>>
>>>> For instance, I would like to observe/study/debug the I/O timeout
>>>> handling of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>>
>>>> Is there a way to trigger this on purpose on qemu side?
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>> I don't think the blkdebug driver supports arbitrary delays right now.
>>> Maybe we could augment it to do so?
>>>
>>> (I thought someone already had, but maybe it wasn't merged?)
>>>
>>> Aha, here:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>>
>>> Let's work from there.
>>
>> I've got updates to that patch series that fell on the floor due to
>> other competing things. I'll get some screen time this weekend to work
>> on them and submit v3.
>>
>> /marc
>>
>
> Great! Please CC the usual maintainers, but also include me.
>
> In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
> that helps you out for your use case? Report back if it works for you or
> not.
>
> --js
>
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/02/2018 01:55 PM, Marc Olson wrote:
> On 11/2/18 10:49 AM, John Snow wrote:
>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>> Hi,
>>>
>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>> injection in VM kernel) without modifying qemu source code?
>>>
>>> For instance, I would like to observe/study/debug the I/O timeout
>>> handling of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>
>>> Is there a way to trigger this on purpose on qemu side?
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>> I don't think the blkdebug driver supports arbitrary delays right now.
>> Maybe we could augment it to do so?
>>
>> (I thought someone already had, but maybe it wasn't merged?)
>>
>> Aha, here:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>
>> Let's work from there.
>
> I've got updates to that patch series that fell on the floor due to
> other competing things. I'll get some screen time this weekend to work
> on them and submit v3.
>
> /marc
>

Great! Please CC the usual maintainers, but also include me.

In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
that helps you out for your use case? Report back if it works for you or
not.

--js
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/2/18 10:49 AM, John Snow wrote:
> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>> Hi,
>>
>> Is there any way to emulate I/O timeout on qemu side (not fault injection
>> in VM kernel) without modifying qemu source code?
>>
>> For instance, I would like to observe/study/debug the I/O timeout handling
>> of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>
>> Is there a way to trigger this on purpose on qemu side?
>>
>> Thank you very much!
>>
>> Dongli Zhang
> I don't think the blkdebug driver supports arbitrary delays right now.
> Maybe we could augment it to do so?
>
> (I thought someone already had, but maybe it wasn't merged?)
>
> Aha, here:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>
> Let's work from there.

I've got updates to that patch series that fell on the floor due to other
competing things. I'll get some screen time this weekend to work on them and
submit v3.

/marc
Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?
On 11/02/2018 04:11 AM, Dongli Zhang wrote:
> Hi,
>
> Is there any way to emulate I/O timeout on qemu side (not fault injection
> in VM kernel) without modifying qemu source code?
>
> For instance, I would like to observe/study/debug the I/O timeout handling
> of nvme, scsi, virtio-blk (not supported) of VM kernel.
>
> Is there a way to trigger this on purpose on qemu side?
>
> Thank you very much!
>
> Dongli Zhang
>

I don't think the blkdebug driver supports arbitrary delays right now.
Maybe we could augment it to do so?

(I thought someone already had, but maybe it wasn't merged?)

Aha, here:

https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html

Let's work from there.

--js