A VM in a cloud environment may use a virtual disk as its backend storage, usually with filesystems on top of the virtual block device. When the backend storage is temporarily down, any I/O issued to the virtual block device fails with an error. For example, an I/O error in an ext4 filesystem can make the filesystem read-only. In a production environment, cloud backend storage usually recovers quickly. For example, an IP-SAN may go down due to a network failure and come back online soon after the network recovers. However, the error state in the filesystem may persist until the device is reattached or the system is restarted. An I/O retry mechanism is therefore needed to make the system self-healing.
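
As a rough illustration of the retry behaviour argued for above, the following standalone C sketch retries a failing request at a fixed interval and only reports the error once a timeout expires. It is not code from this series: ErrorAction, RetryPolicy, FakeBackend, fake_submit() and submit_with_retry() are hypothetical names, and advancing a simulated clock stands in for arming a real timer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error actions; not the enum used by QEMU. */
typedef enum { ACTION_REPORT, ACTION_RETRY } ErrorAction;

/* Hypothetical per-device retry policy. */
typedef struct {
    ErrorAction on_error;      /* what to do when a request fails */
    int64_t retry_interval_ms; /* delay between two attempts */
    int64_t retry_timeout_ms;  /* give up after this long; 0 = never */
} RetryPolicy;

/* Simulated backend that fails until the clock reaches recover_at_ms. */
typedef struct {
    int64_t now_ms;        /* simulated clock */
    int64_t recover_at_ms; /* time at which storage comes back */
    int attempts;          /* number of submissions seen so far */
} FakeBackend;

/* Submit one request: 0 on success, -5 (EIO-like) while storage is down. */
static int fake_submit(FakeBackend *b)
{
    b->attempts++;
    return b->now_ms >= b->recover_at_ms ? 0 : -5;
}

/*
 * Submit a request and apply the retry policy on failure.
 * Returns 0 on success, or the last error once it must be reported.
 */
static int submit_with_retry(FakeBackend *b, const RetryPolicy *p)
{
    int64_t first_error_ms = -1;

    /* A non-positive interval would never advance the simulated clock. */
    assert(p->retry_interval_ms > 0);

    for (;;) {
        int ret = fake_submit(b);

        if (ret == 0 || p->on_error != ACTION_RETRY) {
            return ret; /* success, or the policy says report at once */
        }
        if (first_error_ms < 0) {
            first_error_ms = b->now_ms; /* remember when failures began */
        }
        if (p->retry_timeout_ms > 0 &&
            b->now_ms - first_error_ms >= p->retry_timeout_ms) {
            return ret; /* timed out: the error finally becomes visible */
        }
        /* Stand-in for arming a timer and retrying when it fires. */
        b->now_ms += p->retry_interval_ms;
    }
}
```

In this sketch, a backend that recovers within the timeout yields a successful request after a few hidden attempts, while a longer outage surfaces the error only after the timeout has elapsed. Choosing a timeout of 0 would retry forever, which is why a real implementation needs a way to pause or cancel pending retries.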

This patch series proposes extending the werror=/rerror= mechanism with a
'retry' action. It automatically retries failed I/O requests without
reporting the error to the guest, so the guest can resume running smoothly
once I/O has recovered.

v4->v5:
* Add documentation for 'retry' in qapi.
* Support werror=/rerror=retry for scsi-disk.
* Pause retry when draining.

v3->v4:
* Adapt to werror=/rerror= mechanism.

v2->v3:
* Add a doc to describe I/O hang.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.

REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html

Jiahui Cen (9):
  qapi/block-core: Add retry option for error action
  block-backend: Introduce retry timer
  block-backend: Add device specific retry callback
  block-backend: Enable retry action on errors
  block-backend: Add timeout support for retry
  block: Add error retry param setting
  virtio_blk: Add support for retry on errors
  scsi-bus: Refactor the code that retries requests
  scsi-disk: Add support for retry on errors

 block/block-backend.c          | 68 ++++++++++++++++++++
 blockdev.c                     | 52 +++++++++++++++
 hw/block/block.c               | 10 +++
 hw/block/virtio-blk.c          | 21 +++++-
 hw/scsi/scsi-bus.c             | 16 +++--
 hw/scsi/scsi-disk.c            | 16 +++++
 include/hw/block/block.h       |  7 +-
 include/hw/scsi/scsi.h         |  1 +
 include/sysemu/block-backend.h | 10 +++
 qapi/block-core.json           |  9 ++-
 10 files changed, 199 insertions(+), 11 deletions(-)

--
2.29.2
