Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-12 Thread Dongli Zhang



On 11/13/2018 06:52 AM, Marc Olson via Qemu-devel wrote:
> On 11/11/18 11:36 PM, Dongli Zhang wrote:
>> On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
>>> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>>>> The 'write' latency of sector=40960 is set to a very large value. When the
>>>> I/O is stalled in guest due to that sector=40960 is accessed, I do see below
>>>> messages in guest log:
>>>>
>>>> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>>>> [   80.808095] nvme nvme0: Abort status: 0x4001
>>>>
>>>>
>>>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>>>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not
>>>> work. I need to kill qemu with "kill -9"
>>>>
>>>>
>>>> The same result for virtio-scsi and qemu is stuck as well.
>>> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
>>> using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
>>> trying against?
>> Would you like to reproduce the "aborting" message or the qemu hang?
> I could not reproduce IO hanging in the guest, but I can reproduce qemu
> hanging.
>> guest image: ubuntu 16.04
>> guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
>> qemu: qemu-3.0.0 (with the blkdebug delay patch)
>>
>> Would you be able to see the nvme abort (which is indeed not supported by
>> qemu) message in guest kernel?
> Yes.
>> Once I see that message, I would not be able to kill the qemu-system-x86_64
>> command line with Ctrl+C.
> 
> I missed this part. I wasn't expecting to handle very long timeouts, but what
> appears to be happening is that the sleep doesn't get interrupted on
> shutdown. I suspect something like this, on top of the series I sent last
> night, should help:
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 6b1f2d6..0bfb91b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -557,8 +557,11 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          remove_active_rule(s, delay_rule);
>      }
>
> -    if (latency != 0) {
> -        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
> +    while (latency > 0 && !aio_external_disabled(bdrv_get_aio_context(bs))) {
> +        int64_t cur_latency = MIN(latency, 10ULL);
> +
> +        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, cur_latency);
> +        latency -= cur_latency;
>      }
>  }
> 
> 
> /marc
> 
> 

With the above patch, which makes the delay wake up periodically and go back
to sleep, I am able to interrupt qemu.

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-12 Thread Marc Olson

On 11/11/18 11:36 PM, Dongli Zhang wrote:
> On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
>> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>>> The 'write' latency of sector=40960 is set to a very large value. When the
>>> I/O is stalled in guest due to that sector=40960 is accessed, I do see below
>>> messages in guest log:
>>>
>>> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>>> [   80.808095] nvme nvme0: Abort status: 0x4001
>>>
>>>
>>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not
>>> work. I need to kill qemu with "kill -9"
>>>
>>>
>>> The same result for virtio-scsi and qemu is stuck as well.
>> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
>> using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
>> trying against?
> Would you like to reproduce the "aborting" message or the qemu hang?

I could not reproduce IO hanging in the guest, but I can reproduce qemu
hanging.

> guest image: ubuntu 16.04
> guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
> qemu: qemu-3.0.0 (with the blkdebug delay patch)
>
> Would you be able to see the nvme abort (which is indeed not supported by
> qemu) message in guest kernel?

Yes.

> Once I see that message, I would not be able to kill the qemu-system-x86_64
> command line with Ctrl+C.


I missed this part. I wasn't expecting to handle very long timeouts, but 
what appears to be happening is that the sleep doesn't get interrupted 
on shutdown. I suspect something like this, on top of the series I sent 
last night, should help:


diff --git a/block/blkdebug.c b/block/blkdebug.c
index 6b1f2d6..0bfb91b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -557,8 +557,11 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
         remove_active_rule(s, delay_rule);
     }

-    if (latency != 0) {
-        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
+    while (latency > 0 && !aio_external_disabled(bdrv_get_aio_context(bs))) {
+        int64_t cur_latency = MIN(latency, 10ULL);
+
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, cur_latency);
+        latency -= cur_latency;
     }
 }


/marc
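
For readers who want to see the idea behind the patch above outside of QEMU, here is a minimal sketch in plain C. It is not QEMU code: nanosleep() stands in for qemu_co_sleep_ns(), and the stop_requested flag, sleep_ns() and sliced_sleep_ns() are hypothetical names invented for this illustration, standing in for the aio_external_disabled() check. The only point is that one long sleep is split into short slices, with the stop condition re-checked before each slice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stop flag; a signal handler or shutdown path would set it. */
static volatile sig_atomic_t stop_requested;

/* Sleep for ns nanoseconds with POSIX nanosleep(). If a signal interrupts
 * the call it returns early, which only shortens the current slice. */
static void sleep_ns(int64_t ns)
{
    struct timespec ts = {
        .tv_sec = (time_t)(ns / 1000000000LL),
        .tv_nsec = (long)(ns % 1000000000LL),
    };
    nanosleep(&ts, NULL);
}

/*
 * Sleep for total_ns in slices of at most slice_ns, re-checking the stop
 * flag before every slice. Returns the nanoseconds that were not slept
 * (0 when the whole delay elapsed).
 */
static int64_t sliced_sleep_ns(int64_t total_ns, int64_t slice_ns)
{
    assert(slice_ns > 0); /* a zero or negative slice would never finish */

    while (total_ns > 0 && !stop_requested) {
        int64_t cur = total_ns < slice_ns ? total_ns : slice_ns;

        sleep_ns(cur);
        total_ns -= cur;
    }
    return total_ns;
}
```

With a slice of, say, one millisecond, a stop request is noticed at most about one slice later, however long the total delay is. The slice length is a trade-off between wakeup overhead and how quickly the stop request is seen.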




Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-11 Thread Dongli Zhang



On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>> Hi all,
>>
>> I tried with the patch at:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>
>> The patch is applied to qemu-3.0.0.
>>
>>
>> Below configuration is used to test the feature for guest VM nvme.
>>
>> # qemu-system-x86_64 \
>> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
>> -net nic -net user,hostfwd=tcp::5022-:22 \
>> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
>> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
>> -object iothread,id=io1 \
>> -device nvme,drive=nvme1,serial=deadbeaf1 \
>> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>>
>> # cat blkdebug.config
>> [delay]
>> event = "write_aio"
>> latency = "99"
>> sector = "40960"
>>
>>
>> The 'write' latency of sector=40960 is set to a very large value. When the
>> I/O is stalled in guest due to that sector=40960 is accessed, I do see below
>> messages in guest log:
>>
>> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>> [   80.808095] nvme nvme0: Abort status: 0x4001
>>
>>
>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>> able to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not
>> work. I need to kill qemu with "kill -9"
>>
>>
>> The same result for virtio-scsi and qemu is stuck as well.
> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior using
> nvme on Ubuntu 18.04 (4.15). What image and kernel version are you trying 
> against?

Would you like to reproduce the "aborting" message or the qemu hang?

guest image: ubuntu 16.04
guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
qemu: qemu-3.0.0 (with the blkdebug delay patch)

Would you be able to see the nvme abort (which is indeed not supported by qemu)
message in guest kernel?

Once I see that message, I would not be able to kill the qemu-system-x86_64
command line with Ctrl+C.

Dongli Zhang
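
As a side note on the "Abort status: 0x4001" line quoted in this thread: assuming the guest driver prints the NVMe completion status with the phase bit already shifted out (which is my understanding of the Linux nvme driver, not something stated in the thread), the value splits into a Do Not Retry bit (0x4000) and status code 0x01 in the generic status code type, i.e. Invalid Command Opcode. That would be consistent with the Abort command not being implemented by the emulated controller. The sketch below only decodes the bit fields; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field layout of the 15-bit status value (completion status >> 1),
 * following the NVMe completion queue entry definition. */
#define STATUS_SC_MASK   0x00ffu  /* status code */
#define STATUS_SCT_SHIFT 8        /* status code type, 3 bits */
#define STATUS_SCT_MASK  0x7u
#define STATUS_MORE      0x2000u  /* more information available */
#define STATUS_DNR       0x4000u  /* do not retry */

/* Hypothetical helper type for this example only. */
struct nvme_status_fields {
    unsigned sc;
    unsigned sct;
    int more;
    int dnr;
};

static struct nvme_status_fields decode_status(uint16_t status)
{
    struct nvme_status_fields f;

    /* The phase bit was shifted out, so bit 15 must be clear. */
    assert((status & 0x8000u) == 0);

    f.sc = status & STATUS_SC_MASK;
    f.sct = (status >> STATUS_SCT_SHIFT) & STATUS_SCT_MASK;
    f.more = (status & STATUS_MORE) != 0;
    f.dnr = (status & STATUS_DNR) != 0;
    return f;
}

static void print_status(uint16_t status)
{
    struct nvme_status_fields f = decode_status(status);

    printf("status 0x%04x: sct=%u sc=0x%02x more=%d dnr=%d\n",
           (unsigned)status, f.sct, f.sc, f.more, f.dnr);
}
```

Calling print_status(0x4001) reports sct=0, sc=0x01, more=0, dnr=1.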



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-11 Thread Marc Olson

On 11/3/18 10:24 AM, Dongli Zhang wrote:

> Hi all,
>
> I tried with the patch at:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>
> The patch is applied to qemu-3.0.0.
>
>
> Below configuration is used to test the feature for guest VM nvme.
>
> # qemu-system-x86_64 \
> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
> -net nic -net user,hostfwd=tcp::5022-:22 \
> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
> -object iothread,id=io1 \
> -device nvme,drive=nvme1,serial=deadbeaf1 \
> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>
> # cat blkdebug.config
> [delay]
> event = "write_aio"
> latency = "99"
> sector = "40960"
>
>
> The 'write' latency of sector=40960 is set to a very large value. When the I/O
> is stalled in guest due to that sector=40960 is accessed, I do see below
> messages in guest log:
>
> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
> [   80.808095] nvme nvme0: Abort status: 0x4001
>
>
> However, then nothing happens further. nvme I/O hangs in guest. I am not able
> to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work.
> I need to kill qemu with "kill -9"
>
>
> The same result for virtio-scsi and qemu is stuck as well.

While I didn't try virtio-scsi, I wasn't able to reproduce this behavior
using nvme on Ubuntu 18.04 (4.15). What image and kernel version are you
trying against?


/marc




Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-05 Thread John Snow



On 11/03/2018 01:24 PM, Dongli Zhang wrote:
> Hi all,
> 

Hi, please reply below the quoted text when writing to qemu-devel in the
future; my reply is below.

> I tried with the patch at:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
> 
> The patch is applied to qemu-3.0.0.
> 
> 
> Below configuration is used to test the feature for guest VM nvme.
> 
> # qemu-system-x86_64 \
> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
> -net nic -net user,hostfwd=tcp::5022-:22 \
> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
> -object iothread,id=io1 \
> -device nvme,drive=nvme1,serial=deadbeaf1 \
> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
> 
> # cat blkdebug.config
> [delay]
> event = "write_aio"
> latency = "99"
> sector = "40960"
> 
> 
> The 'write' latency of sector=40960 is set to a very large value. When the I/O
> is stalled in guest due to that sector=40960 is accessed, I do see below
> messages in guest log:
> 
> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
> [   80.808095] nvme nvme0: Abort status: 0x4001
> 
> 
> However, then nothing happens further. nvme I/O hangs in guest. I am not able
> to kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work.
> I need to kill qemu with "kill -9"
>
> The same result for virtio-scsi and qemu is stuck as well.
> 

OK, sounds like a bug in the delay implementation here, then; or
something I've not considered with the locking/drain specifics. Thanks
for the report.

> 
> About blkdebug, I can only trigger the error by the config file. Is there a
> way to inject error or latency via qemu monitor? For instance, I would like to
> inject error not for a specific sector or state, but for the entire disk when
> I input some command via qemu monitor.
> 

I don't recall.

There are some tricks you can play with set-state and rules that only
apply when in a certain state. I don't remember if there are monitor or
QMP commands to set the state explicitly.

I'm looking at docs/devel/blkdebug.txt and don't see anything immediately.

There's maybe a way you can use blockdev-add to create the blkdebug node
and insert it live into the graph when you want it, and live-remove it
when you don't, but I'm not sure of the syntax right away.

(maybe that's not possible?)

--js

> Dongli Zhang
> 
> 
> On 11/03/2018 02:17 AM, John Snow wrote:
>>
>>
>> On 11/02/2018 01:55 PM, Marc Olson wrote:
>>> On 11/2/18 10:49 AM, John Snow wrote:
>>>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>>>> injection in VM kernel) without modifying qemu source code?
>>>>>
>>>>> For instance, I would like to observe/study/debug the I/O timeout
>>>>> handling of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>>>
>>>>> Is there a way to trigger this on purpose on qemu side?
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> Dongli Zhang
>>>>>
>>>> I don't think the blkdebug driver supports arbitrary delays right now.
>>>> Maybe we could augment it to do so?
>>>>
>>>> (I thought someone already had, but maybe it wasn't merged?)
>>>>
>>>> Aha, here:
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>>>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>>>
>>>> Let's work from there.
>>>
>>> I've got updates to that patch series that fell on the floor due to
>>> other competing things. I'll get some screen time this weekend to work
>>> on them and submit v3.
>>>
>>> /marc
>>>
>>
>> Great! Please CC the usual maintainers, but also include me.
>>
>> In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
>> that helps you out for your use case? Report back if it works for you or
>> not.
>>
>> --js
>>



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-03 Thread Dongli Zhang
Hi all,

I tried with the patch at:

https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html

The patch is applied to qemu-3.0.0.


Below configuration is used to test the feature for guest VM nvme.

# qemu-system-x86_64 \
-smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
-net nic -net user,hostfwd=tcp::5022-:22 \
-drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
-device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
-object iothread,id=io1 \
-device nvme,drive=nvme1,serial=deadbeaf1 \
-drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1

# cat blkdebug.config
[delay]
event = "write_aio"
latency = "99"
sector = "40960"


The 'write' latency of sector=40960 is set to a very large value. When the I/O
is stalled in guest due to that sector=40960 is accessed, I do see below
messages in guest log:

[   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
[   80.808095] nvme nvme0: Abort status: 0x4001


However, then nothing happens further. nvme I/O hangs in guest. I am not able to
kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work. I
need to kill qemu with "kill -9"


The same result for virtio-scsi and qemu is stuck as well.


About blkdebug, I can only trigger the error by the config file. Is there a way
to inject error or latency via qemu monitor? For instance, I would like to inject
error not for a specific sector or state, but for the entire disk when I input
some command via qemu monitor.

Dongli Zhang


On 11/03/2018 02:17 AM, John Snow wrote:
> 
> 
> On 11/02/2018 01:55 PM, Marc Olson wrote:
>> On 11/2/18 10:49 AM, John Snow wrote:
>>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>>> Hi,
>>>>
>>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>>> injection in VM kernel) without modifying qemu source code?
>>>>
>>>> For instance, I would like to observe/study/debug the I/O timeout
>>>> handling of nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>>
>>>> Is there a way to trigger this on purpose on qemu side?
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>> I don't think the blkdebug driver supports arbitrary delays right now.
>>> Maybe we could augment it to do so?
>>>
>>> (I thought someone already had, but maybe it wasn't merged?)
>>>
>>> Aha, here:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>>
>>> Let's work from there.
>>
>> I've got updates to that patch series that fell on the floor due to
>> other competing things. I'll get some screen time this weekend to work
>> on them and submit v3.
>>
>> /marc
>>
> 
> Great! Please CC the usual maintainers, but also include me.
> 
> In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
> that helps you out for your use case? Report back if it works for you or
> not.
> 
> --js
> 



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-02 Thread John Snow



On 11/02/2018 01:55 PM, Marc Olson wrote:
> On 11/2/18 10:49 AM, John Snow wrote:
>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>> Hi,
>>>
>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>> injection in VM
>>> kernel) without modifying qemu source code?
>>>
>>> For instance, I would like to observe/study/debug the I/O timeout
>>> handling of
>>> nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>
>>> Is there a way to trigger this on purpose on qemu side?
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>> I don't think the blkdebug driver supports arbitrary delays right now.
>> Maybe we could augment it to do so?
>>
>> (I thought someone already had, but maybe it wasn't merged?)
>>
>> Aha, here:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>
>> Let's work from there.
> 
> I've got updates to that patch series that fell on the floor due to
> other competing things. I'll get some screen time this weekend to work
> on them and submit v3.
> 
> /marc
> 

Great! Please CC the usual maintainers, but also include me.

In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
that helps you out for your use case? Report back if it works for you or
not.

--js



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-02 Thread Marc Olson

On 11/2/18 10:49 AM, John Snow wrote:

> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>> Hi,
>>
>> Is there any way to emulate I/O timeout on qemu side (not fault injection in
>> VM kernel) without modifying qemu source code?
>>
>> For instance, I would like to observe/study/debug the I/O timeout handling of
>> nvme, scsi, virtio-blk (not supported) of VM kernel.
>>
>> Is there a way to trigger this on purpose on qemu side?
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
> I don't think the blkdebug driver supports arbitrary delays right now.
> Maybe we could augment it to do so?
>
> (I thought someone already had, but maybe it wasn't merged?)
>
> Aha, here:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>
> Let's work from there.


I've got updates to that patch series that fell on the floor due to 
other competing things. I'll get some screen time this weekend to work 
on them and submit v3.


/marc




Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-02 Thread John Snow



On 11/02/2018 04:11 AM, Dongli Zhang wrote:
> Hi,
> 
> Is there any way to emulate I/O timeout on qemu side (not fault injection in
> VM kernel) without modifying qemu source code?
> 
> For instance, I would like to observe/study/debug the I/O timeout handling of
> nvme, scsi, virtio-blk (not supported) of VM kernel.
> 
> Is there a way to trigger this on purpose on qemu side?
> 
> Thank you very much!
> 
> Dongli Zhang
> 

I don't think the blkdebug driver supports arbitrary delays right now.
Maybe we could augment it to do so?

(I thought someone already had, but maybe it wasn't merged?)

Aha, here:

https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html

Let's work from there.

--js