Hi Stefan,
On 2021/2/23 17:40, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2021 at 06:13:06PM +0800, Jiahui Cen wrote:
>> This patch series proposes to extend the werror=/rerror= mechanism to add
>> a 'retry' feature. It can automatically retry failed I/O requests on error
>>
Kindly ping.
Any comments and reviews are welcome :)
Thanks,
Jiahui
On 2021/2/5 18:13, Jiahui Cen wrote:
> A VM in the cloud environment may use a virtual disk as the backend storage,
> and there are usually filesystems on the virtual block device. When backend
> storage is tempora
A retry should only be triggered while the timeout has not been reached, so
let's check the timeout before retrying. The device should also reset
retry_start_time after a successful retry.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 25 +++-
include/sysemu
Enable retry action when backend's retry timer is available. It would
trigger the timer to do device specific retry action.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 9 +
1 file changed, 9 insertions(+)
diff --git a/block/block-backend.c b/block
ile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.
REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html
Jiahui Cen (9):
qapi/block-core: Add retry option for error action
block-backend: Introduce retry timer
block-backend: Add device sp
Add "retry_interval" and "retry_timeout" parameters for the drive and device
options. These parameters are valid only when werror/rerror=retry.
e.g. --drive file=image,rerror=retry,retry_interval=1000,retry_timeout=5000
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
b
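
To illustrate how the two parameters sit in the option string from the example
above, here is a small standalone lookup helper. It is only a sketch:
opt_get_number() is a hypothetical name, and QEMU parses drive options with its
own option-handling code rather than anything like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key=<number>" in a comma-separated option string and return the
 * number, or 'fallback' if the key is not present. Hypothetical helper.
 */
static long opt_get_number(const char *opts, const char *key, long fallback)
{
    assert(opts != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *p = opts;

    while (p != NULL && *p != '\0') {
        /* Match the whole key, followed immediately by '='. */
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            return strtol(p + klen + 1, NULL, 10);
        }
        /* Skip to the start of the next comma-separated field. */
        p = strchr(p, ',');
        if (p != NULL) {
            p++;
        }
    }
    return fallback;
}
```

Requiring '=' right after the key keeps a lookup for "retry" from matching
"retry_interval" by prefix.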
Add retry_request_cb in BlockDevOps to do device specific retry action.
Backend's timer would be registered only when the backend is set 'retry'
on errors and the device supports retry action.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 8
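
The registration rule in the commit message above (a timer only when the
backend is set to 'retry' and the device provides a callback) could look
roughly like the following. The types here are simplified stand-ins invented
for this sketch, not QEMU's BlockBackend or BlockDevOps; only the name
retry_request_cb is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; these are NOT QEMU's real definitions. */
typedef enum { ON_ERROR_REPORT, ON_ERROR_STOP, ON_ERROR_RETRY } OnErrorAction;

typedef struct DevOpsSketch {
    void (*retry_request_cb)(void *opaque); /* NULL if the device has none */
} DevOpsSketch;

typedef struct BackendSketch {
    OnErrorAction on_read_error;
    OnErrorAction on_write_error;
    const DevOpsSketch *dev_ops;
    void *dev_opaque;
} BackendSketch;

/* A retry timer is wanted only if both conditions from the message hold. */
static bool backend_wants_retry_timer(const BackendSketch *blk)
{
    assert(blk != NULL);
    bool retry_configured = blk->on_read_error == ON_ERROR_RETRY ||
                            blk->on_write_error == ON_ERROR_RETRY;
    bool device_supports = blk->dev_ops != NULL &&
                           blk->dev_ops->retry_request_cb != NULL;
    return retry_configured && device_supports;
}

/* What the timer would do when it fires: hand over to the device. */
static void backend_retry_timer_fired(BackendSketch *blk)
{
    if (backend_wants_retry_timer(blk)) {
        blk->dev_ops->retry_request_cb(blk->dev_opaque);
    }
}

/* Example device callback that just counts how often it was invoked. */
static void count_retry(void *opaque)
{
    ++*(int *)opaque;
}
```

Keeping the retry action behind a per-device callback lets each device model
(virtio-blk, scsi-disk) resubmit requests from its own queue.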
Add a timer to regularly trigger retry on errors.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 21
1 file changed, 21 insertions(+)
diff --git a/block/block-backend.c b/block/block-backend.c
index e493f17515..3a9d55cbe3 100644
--- a/block
Mark failed requests as to be retried and implement retry_request_cb to
handle these requests.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
hw/scsi/scsi-disk.c | 16
1 file changed, 16 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index
Move the code that retries requests from scsi_dma_restart_bh() to its own,
non-static, function. This will allow us to call it from the
retry_request_cb() of scsi-disk in a future patch.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
hw/scsi/scsi-bus.c | 16
Insert failed requests into device's list for later retry and handle
queued requests to implement retry_request_cb.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
hw/block/virtio-blk.c | 21 +---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/hw/block
Add a new error action 'retry' to support retry on errors.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
blockdev.c | 2 ++
qapi/block-core.json | 9 +++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index b250b9b959..ece1d8ae58
Hi Kevin,
What do you think of these patches?
Thanks,
Jiahui
On 2020/12/15 20:30, Jiahui Cen wrote:
> A VM in the cloud environment may use a virtual disk as the backend storage,
> and there are usually filesystems on the virtual block device. When backend
> storage is temporarily down
Kindly ping...
On 2020/12/15 20:30, Jiahui Cen wrote:
> A VM in the cloud environment may use a virtual disk as the backend storage,
> and there are usually filesystems on the virtual block device. When backend
> storage is temporarily down, any I/O issued to the virtual block device
>
Enable retry action when backend's retry timer is available. It would
trigger the timer to do device specific retry action.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/block/block-backend.c b/block
Add "retry_interval" and "retry_timeout" parameters for the drive and device
options. These parameters are valid only when werror/rerror=retry.
e.g. --drive file=image,rerror=retry,retry_interval=1000,retry_timeout=5000
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
b
A retry should only be triggered while the timeout has not been reached, so
let's check the timeout before retrying. The device should also reset
retry_start_time after a successful retry.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 25 +++-
include/sysemu
html/qemu-devel/2020-10/msg06560.html
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
Jiahui Cen (7):
qapi/block-core: Add retry option for error action
block-backend: Introduce retry timer
block-backend: Add device specific retry callback
block-backend: Enable retry action on error
Add a timer to regularly trigger retry on errors.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 21
1 file changed, 21 insertions(+)
diff --git a/block/block-backend.c b/block/block-backend.c
index ce78d30794..fe775ea298 100644
--- a/block
Add retry_request_cb in BlockDevOps to do device specific retry action.
Backend's timer would be registered only when the backend is set 'retry'
on errors and the device supports retry action.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 8
Insert failed requests into device's list for later retry and handle
queued requests to implement retry_request_cb.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
hw/block/virtio-blk.c | 19 ---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/hw/block
Add a new error action 'retry' to support retry on errors.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
blockdev.c | 2 ++
qapi/block-core.json | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index 412354b4b6..47c0e6db52
Not all errors would be fixed, so it is better to add a rehandle timeout
for I/O hang.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 99 +-
include/sysemu/block-backend.h | 2 +
2 files changed, 100 insertions(+), 1
-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 60 +++---
include/sysemu/block-backend.h | 2 ++
2 files changed, 58 insertions(+), 4 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 90fcc678b5..c16d95a2c9 100644
smoothly when I/O is recovered with this feature enabled.
v2->v3:
* Add a doc to describe I/O hang.
v1->v2:
* Rebase to fix compile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.
Jiahui Cen (9):
block-backend: introduce I/O rehandle info
block-b
Setting a non-zero timeout of I/O hang indicates I/O hang is enabled for the
block backend. And when the block backend is going to be deleted, we should
disable I/O hang.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 40
The I/O hang feature is realized based on a rehandle mechanism.
Each block backend will have a list to store hanging block AIOs,
and a timer to regularly resend these AIOs. In order to issue
the AIOs again, each block AIO also needs to store its coroutine entry.
Signed-off-by: Jiahui Cen
Signed
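
The rehandle mechanism described above (a per-backend list of hanging AIOs
that a timer resends) can be sketched as below. This is an assumption-laden
illustration, not the patch: all type and function names are hypothetical, and
a plain function pointer stands in for the stored coroutine entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one hanging AIO. */
typedef struct HangingAio {
    bool (*resubmit)(struct HangingAio *aio); /* stand-in for coroutine entry */
    struct HangingAio *next;
    int attempts;
} HangingAio;

/* Hypothetical per-backend list of AIOs waiting to be resent. */
typedef struct RehandleList {
    HangingAio *head;
} RehandleList;

/* Store a failed AIO so that the timer can resend it later. */
static void rehandle_insert(RehandleList *list, HangingAio *aio)
{
    assert(list != NULL && aio != NULL);
    aio->next = list->head;
    list->head = aio;
}

/*
 * Timer body: resend every stored AIO once. AIOs that complete are
 * unlinked; AIOs that fail again stay queued. Returns how many remain.
 */
static int rehandle_run(RehandleList *list)
{
    HangingAio **link = &list->head;
    int pending = 0;

    while (*link != NULL) {
        HangingAio *aio = *link;
        aio->attempts++;
        if (aio->resubmit(aio)) {
            *link = aio->next; /* completed: remove from the list */
            aio->next = NULL;
        } else {
            link = &aio->next; /* still failing: keep for the next tick */
            pending++;
        }
    }
    return pending;
}

/* Example resubmit functions used to exercise the list. */
static bool succeed_on_second_try(HangingAio *aio)
{
    return aio->attempts >= 2;
}

static bool always_ok(HangingAio *aio)
{
    (void)aio;
    return true;
}
```

Walking the list through a pointer to the link allows an entry to be removed
during iteration without tracking a separate "previous" node.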
Sometimes hypervisor management tools like libvirt may need to monitor
I/O hang events. Let's report I/O hang and I/O hang timeout events via QAPI.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block-backend.c | 3 +++
qapi/block-core.json | 26 ++
2
When resetting virtio-blk, we have to drain all AIOs but do not care about the
results. So it is necessary to disable I/O hang before resetting virtio-blk,
and enable it after resetting.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
hw/block/virtio-blk.c | 8
1 file changed
Give some details about the I/O hang and how to use it.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
docs/io-hang.rst | 45 +
1 file changed, 45 insertions(+)
create mode 100644 docs/io-hang.rst
diff --git a/docs/io-hang.rst b/docs/io
situations,
the returned error is often an EIO.
To avoid this unavailability, we can store the failed AIOs, and resend them
later. If the error is temporary, the retries can succeed and the AIOs can
be successfully completed.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
block/block
The I/O hang timeout should differ between situations, so it is
better to provide an option for the user to set the I/O hang timeout for
each block device.
Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
blockdev.c | 11 +++
1 file changed, 11 insertions(+)
diff --git