On 16/11/20 19:31, Hannes Reinecke wrote:
Hi all,
one of our customers reported an infinite guest hang following an FC link loss
when using scsi-disk.
The problem is that scsi-disk issues SG_IO commands with a timeout of
UINT_MAX, which essentially signals 'no timeout' to the host kernel. So if
a command gets lost, e.g. during an unexpected link loss, the HBA driver
will never attempt to abort or return the command. Hence the guest will
hang forever, and the only way to resolve things is to reboot the host.
To solve this, the patchset adds an 'io_timeout' parameter to scsi-disk
and scsi-generic, which allows the admin to specify a command timeout for
SG_IO requests. It is initialized to 30 seconds to avoid the infinite hang
mentioned above.
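For readers unfamiliar with the SG_IO interface, here is a minimal sketch of how a timeout given in seconds could end up in the request header. It is not the code from the patches: sg_timeout_ms() and sg_prepare_hdr() are hypothetical helpers written for illustration. It relies only on the Linux sg_io_hdr_t layout from <scsi/sg.h>, where the timeout field is in milliseconds and UINT_MAX is treated as 'no timeout'.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <scsi/sg.h>

/*
 * Hypothetical helper: convert a timeout in seconds to the millisecond
 * value expected by sg_io_hdr.timeout. Large inputs are clamped so the
 * result never wraps around and never equals UINT_MAX, which the host
 * kernel would read as 'no timeout'.
 */
static unsigned int sg_timeout_ms(unsigned int seconds)
{
    if (seconds > UINT_MAX / 1000) {
        seconds = UINT_MAX / 1000;
    }
    return seconds * 1000;
}

/*
 * Hypothetical helper: prepare a header for a command without a data
 * phase (e.g. TEST UNIT READY), using a finite timeout.
 */
static void sg_prepare_hdr(sg_io_hdr_t *hdr, unsigned char *cdb,
                           unsigned char cdb_len, unsigned int io_timeout)
{
    assert(hdr != NULL && cdb != NULL);

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';               /* required by the sg driver */
    hdr->dxfer_direction = SG_DXFER_NONE;  /* no data transfer */
    hdr->cmdp = cdb;
    hdr->cmd_len = cdb_len;
    hdr->timeout = sg_timeout_ms(io_timeout);
}
```

A caller would then pass the prepared header to ioctl(fd, SG_IO, &hdr) on an open SCSI device node. With a finite timeout the host kernel can start error handling once the command is overdue, rather than waiting indefinitely.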
As usual, comments and reviews are welcome.
Hannes Reinecke (3):
  virtio-scsi: trace events
  scsi: make io_timeout configurable
  scsi: add tracing for SG_IO commands
 hw/scsi/scsi-disk.c    |  9 ++++++---
 hw/scsi/scsi-generic.c | 25 ++++++++++++++++++-------
 hw/scsi/trace-events   | 13 +++++++++++++
 hw/scsi/virtio-scsi.c  | 30 +++++++++++++++++++++++++++++-
 include/hw/scsi/scsi.h |  4 +++-
 5 files changed, 69 insertions(+), 12 deletions(-)
The UINT_MAX timeout predates me, but I think the idea was to make it
sort of like NFS's hard option. Without a timeout you cannot be quite
sure if/when the command will stay in some buffer of the HBA or the SAN
or the target, and there could be unintended reordering of writes.
Though I guess at some point you'll restart the VM on another host
anyway, and the same reordering can happen, so I've queued the patch.
Paolo