Re: [PATCH v5 0/9] block: Add retry for werror=/rerror= mechanism

2021-02-23 Thread Jiahui Cen
Hi Stefan, On 2021/2/23 17:40, Stefan Hajnoczi wrote: > On Fri, Feb 05, 2021 at 06:13:06PM +0800, Jiahui Cen wrote: >> This patch series proposes to extend the werror=/rerror= mechanism to add >> a 'retry' feature. It can automatically retry failed I/O requests on error >>

Re: [PATCH v5 0/9] block: Add retry for werror=/rerror= mechanism

2021-02-09 Thread Jiahui Cen
Kindly ping. Any comments and reviews are welcome :) Thanks, Jiahui On 2021/2/5 18:13, Jiahui Cen wrote: > A VM in the cloud environment may use a virtual disk as the backend storage, > and there are usually filesystems on the virtual block device. When backend > storage is tempora

[PATCH v5 5/9] block-backend: Add timeout support for retry

2021-02-05 Thread Jiahui Cen
Retry should only be triggered when timeout is not reached, so let's check timeout before retry. Device should also reset retry_start_time after successful retry. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 25 +++- include/sysemu
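
The timeout rule described in this patch can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: only the name retry_start_time comes from the commit message, while RetryState, retry_timeout_ms, retry_allowed() and retry_succeeded() are hypothetical names, and times are plain millisecond counters rather than QEMU clock values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-backend retry state; not QEMU's struct. */
typedef struct RetryState {
    int64_t retry_timeout_ms;     /* stop retrying once this much time passed */
    int64_t retry_start_time_ms;  /* time of the first failure, 0 when idle */
} RetryState;

/* Returns true if a failed request may still be retried at time now_ms. */
static bool retry_allowed(RetryState *s, int64_t now_ms)
{
    assert(now_ms > 0);
    if (s->retry_start_time_ms == 0) {
        /* The first failure opens the retry window. */
        s->retry_start_time_ms = now_ms;
    }
    return now_ms - s->retry_start_time_ms < s->retry_timeout_ms;
}

/* Called after a retry completes successfully: close the window. */
static void retry_succeeded(RetryState *s)
{
    s->retry_start_time_ms = 0;
}
```

Under these assumptions, a caller would check retry_allowed() before re-arming its retry timer, fall back to reporting the error once it returns false, and call retry_succeeded() so that a later failure starts a fresh window.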

[PATCH v5 4/9] block-backend: Enable retry action on errors

2021-02-05 Thread Jiahui Cen
Enable retry action when backend's retry timer is available. It would trigger the timer to do device specific retry action. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/block-backend.c b/block

[PATCH v5 0/9] block: Add retry for werror=/rerror= mechanism

2021-02-05 Thread Jiahui Cen
ile problems. * Fix incorrect removal of rehandle list. * Provide rehandle pause interface. REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html Jiahui Cen (9): qapi/block-core: Add retry option for error action block-backend: Introduce retry timer block-backend: Add device sp

[PATCH v5 6/9] block: Add error retry param setting

2021-02-05 Thread Jiahui Cen
Add "retry_interval" and "retry_timeout" parameters for the drive and device options. These parameters are valid only when werror/rerror=retry. e.g. --drive file=image,rerror=retry,retry_interval=1000,retry_timeout=5000 Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- b

[PATCH v5 3/9] block-backend: Add device specific retry callback

2021-02-05 Thread Jiahui Cen
Add retry_request_cb in BlockDevOps to do device specific retry action. Backend's timer would be registered only when the backend is set 'retry' on errors and the device supports retry action. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 8
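
The registration rule in this message (a timer exists only if the backend is configured for 'retry' and the device supplies a callback) might look like the sketch below. Only retry_request_cb is taken from the commit message; DevOps, Backend, ErrAction, has_retry_timer and the helper functions are hypothetical simplifications, and the "timer" is reduced to a flag rather than a real QEMU timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ERR_ACTION_REPORT, ERR_ACTION_STOP, ERR_ACTION_RETRY } ErrAction;

/* Hypothetical, cut-down device ops table with one retry hook. */
typedef struct DevOps {
    void (*retry_request_cb)(void *opaque);
} DevOps;

/* Hypothetical backend: error policy plus the attached device. */
typedef struct Backend {
    ErrAction on_read_error;
    ErrAction on_write_error;
    const DevOps *dev_ops;
    void *dev_opaque;
    bool has_retry_timer;  /* stands in for an allocated timer */
} Backend;

/* Attach device ops; register the timer only when both sides opt in. */
static void backend_set_dev_ops(Backend *b, const DevOps *ops, void *opaque)
{
    assert(b != NULL);
    bool wants_retry = b->on_read_error == ERR_ACTION_RETRY ||
                       b->on_write_error == ERR_ACTION_RETRY;
    b->dev_ops = ops;
    b->dev_opaque = opaque;
    b->has_retry_timer = wants_retry && ops != NULL &&
                         ops->retry_request_cb != NULL;
}

/* Timer expiry: hand control to the device-specific retry action. */
static void backend_retry_timer_fired(Backend *b)
{
    if (b->has_retry_timer) {
        b->dev_ops->retry_request_cb(b->dev_opaque);
    }
}

/* Example device callback: just counts how often it was invoked. */
static void count_retries(void *opaque)
{
    ++*(int *)opaque;
}
```

Keeping the retry action behind a callback lets each device model decide how its own failed requests are resubmitted, while the backend only owns the policy and the timing.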

[PATCH v5 2/9] block-backend: Introduce retry timer

2021-02-05 Thread Jiahui Cen
Add a timer to regularly trigger retry on errors. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 21 1 file changed, 21 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index e493f17515..3a9d55cbe3 100644 --- a/block

[PATCH v5 9/9] scsi-disk: Add support for retry on errors

2021-02-05 Thread Jiahui Cen
Mark failed requests as to be retried and implement retry_request_cb to handle these requests. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/scsi/scsi-disk.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index

[PATCH v5 8/9] scsi-bus: Refactor the code that retries requests

2021-02-05 Thread Jiahui Cen
Move the code that retries requests from scsi_dma_restart_bh() to its own, non-static, function. This will allow us to call it from the retry_request_cb() of scsi-disk in a future patch. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/scsi/scsi-bus.c | 16

[PATCH v5 7/9] virtio_blk: Add support for retry on errors

2021-02-05 Thread Jiahui Cen
Insert failed requests into device's list for later retry and handle queued requests to implement retry_request_cb. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/block/virtio-blk.c | 21 +--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/hw/block
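
As a rough illustration of the approach described here, the sketch below keeps failed requests on a per-device list and drains that list from the retry callback. It is not the virtio-blk code: Req, Dev, dev_queue_for_retry() and dev_retry_request_cb() are hypothetical names, and resubmission is modelled by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with an intrusive list link. */
typedef struct Req {
    int id;
    struct Req *next;
} Req;

/* Hypothetical device state. */
typedef struct Dev {
    Req *retry_head;  /* failed requests waiting for the next retry */
    int resubmitted;  /* counter standing in for real resubmission */
} Dev;

/* Error path: park the failed request instead of completing it. */
static void dev_queue_for_retry(Dev *d, Req *r)
{
    assert(d != NULL && r != NULL);
    r->next = d->retry_head;
    d->retry_head = r;
}

/* Device-specific retry action: resubmit everything that was parked. */
static void dev_retry_request_cb(void *opaque)
{
    Dev *d = opaque;
    Req *r = d->retry_head;

    /* Detach the list first so a request that fails again can be requeued. */
    d->retry_head = NULL;
    while (r != NULL) {
        Req *next = r->next;
        r->next = NULL;
        d->resubmitted++;
        r = next;
    }
}
```

Detaching the list before walking it is a design choice of this sketch: a request that fails again during resubmission can be parked on the fresh list without disturbing the iteration.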

[PATCH v5 1/9] qapi/block-core: Add retry option for error action

2021-02-05 Thread Jiahui Cen
Add a new error action 'retry' to support retry on errors. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- blockdev.c | 2 ++ qapi/block-core.json | 9 +++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/blockdev.c b/blockdev.c index b250b9b959..ece1d8ae58

Ping: [PATCH v4 0/7] block: Add retry for werror=/rerror= mechanism

2021-01-05 Thread Jiahui Cen
Hi Kevin, What do you think of these patches? Thanks, Jiahui On 2020/12/15 20:30, Jiahui Cen wrote: > A VM in the cloud environment may use a virtual disk as the backend storage, > and there are usually filesystems on the virtual block device. When backend > storage is temporarily down

Re: [PATCH v4 0/7] block: Add retry for werror=/rerror= mechanism

2020-12-20 Thread Jiahui Cen
Kindly ping... On 2020/12/15 20:30, Jiahui Cen wrote: > A VM in the cloud environment may use a virtual disk as the backend storage, > and there are usually filesystems on the virtual block device. When backend > storage is temporarily down, any I/O issued to the virtual block device >

[PATCH v4 4/7] block-backend: Enable retry action on errors

2020-12-15 Thread Jiahui Cen
Enable retry action when backend's retry timer is available. It would trigger the timer to do device specific retry action. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/block/block-backend.c b/block

[PATCH v4 6/7] block: Add error retry param setting

2020-12-15 Thread Jiahui Cen
Add "retry_interval" and "retry_timeout" parameters for the drive and device options. These parameters are valid only when werror/rerror=retry. e.g. --drive file=image,rerror=retry,retry_interval=1000,retry_timeout=5000 Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- b

[PATCH v4 5/7] block-backend: Add timeout support for retry

2020-12-15 Thread Jiahui Cen
Retry should only be triggered when timeout is not reached, so let's check timeout before retry. Device should also reset retry_start_time after successful retry. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 25 +++- include/sysemu

[PATCH v4 0/7] block: Add retry for werror=/rerror= mechanism

2020-12-15 Thread Jiahui Cen
html/qemu-devel/2020-10/msg06560.html Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang Jiahui Cen (7): qapi/block-core: Add retry option for error action block-backend: Introduce retry timer block-backend: Add device specific retry callback block-backend: Enable retry action on error

[PATCH v4 2/7] block-backend: Introduce retry timer

2020-12-15 Thread Jiahui Cen
Add a timer to regularly trigger retry on errors. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 21 1 file changed, 21 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index ce78d30794..fe775ea298 100644 --- a/block

[PATCH v4 3/7] block-backend: Add device specific retry callback

2020-12-15 Thread Jiahui Cen
Add retry_request_cb in BlockDevOps to do device specific retry action. Backend's timer would be registered only when the backend is set 'retry' on errors and the device supports retry action. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 8

[PATCH v4 7/7] virtio_blk: Add support for retry on errors

2020-12-15 Thread Jiahui Cen
Insert failed requests into device's list for later retry and handle queued requests to implement retry_request_cb. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/block/virtio-blk.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/hw/block

[PATCH v4 1/7] qapi/block-core: Add retry option for error action

2020-12-15 Thread Jiahui Cen
Add a new error action 'retry' to support retry on errors. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- blockdev.c | 2 ++ qapi/block-core.json | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/blockdev.c b/blockdev.c index 412354b4b6..47c0e6db52

[PATCH v3 3/9] block-backend: add I/O hang timeout

2020-10-22 Thread Jiahui Cen
Not all errors would be fixed, so it is better to add a rehandle timeout for I/O hang. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 99 +- include/sysemu/block-backend.h | 2 + 2 files changed, 100 insertions(+), 1

[PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause

2020-10-22 Thread Jiahui Cen
-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 60 +++--- include/sysemu/block-backend.h | 2 ++ 2 files changed, 58 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 90fcc678b5..c16d95a2c9 100644

[PATCH v3 0/9] block-backend: Introduce I/O hang

2020-10-22 Thread Jiahui Cen
smoothly when I/O is recovered with this feature enabled. v2->v3: * Add a doc to describe I/O hang. v1->v2: * Rebase to fix compile problems. * Fix incorrect removal of rehandle list. * Provide rehandle pause interface. Jiahui Cen (9): block-backend: introduce I/O rehandle info block-b

[PATCH v3 5/9] block-backend: enable I/O hang when timeout is set

2020-10-22 Thread Jiahui Cen
Setting a non-zero timeout of I/O hang indicates I/O hang is enabled for the block backend. And when the block backend is going to be deleted, we should disable I/O hang. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 40

[PATCH v3 1/9] block-backend: introduce I/O rehandle info

2020-10-22 Thread Jiahui Cen
The I/O hang feature is built on a rehandle mechanism. Each block backend will have a list to store hanging block AIOs, and a timer to regularly resend these AIOs. In order to issue the AIOs again, each block AIO also needs to store its coroutine entry. Signed-off-by: Jiahui Cen Signed

[PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event

2020-10-22 Thread Jiahui Cen
Sometimes hypervisor management tools like libvirt may need to monitor I/O hang events. Let's report I/O hang and I/O hang timeout events via QAPI. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 3 +++ qapi/block-core.json | 26 ++ 2

[PATCH v3 6/9] virtio-blk: pause I/O hang when resetting

2020-10-22 Thread Jiahui Cen
When resetting virtio-blk, we have to drain all AIOs but do not care about the results. So it is necessary to disable I/O hang before resetting virtio-blk, and enable it after resetting. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/block/virtio-blk.c | 8 1 file changed

[PATCH v3 9/9] docs: add a doc about I/O hang

2020-10-22 Thread Jiahui Cen
Give some details about the I/O hang and how to use it. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- docs/io-hang.rst | 45 + 1 file changed, 45 insertions(+) create mode 100644 docs/io-hang.rst diff --git a/docs/io-hang.rst b/docs/io

[PATCH v3 2/9] block-backend: rehandle block aios when EIO

2020-10-22 Thread Jiahui Cen
situations, the returned error is often an EIO. To avoid this unavailability, we can store the failed AIOs, and resend them later. If the error is temporary, the retries can succeed and the AIOs can be successfully completed. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block

[PATCH v3 7/9] qemu-option: add I/O hang timeout option

2020-10-22 Thread Jiahui Cen
The I/O hang timeout should differ between situations, so it is better to provide an option that lets the user set the I/O hang timeout for each block device. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- blockdev.c | 11 +++ 1 file changed, 11 insertions(+) diff --git