Hi there.
I'm working on moving my NFS setup to ZFS over iSCSI. I'm using a CentOS 7.6
box with ZoL 0.8.1, with the LIO backend (but this shouldn't be relevant, see
below). On the PVE side, I'm running PVE 6 with all updates applied.
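For reference, the storage is defined more or less like this in
/etc/pve/storage.cfg (storage name, pool, portal and IQN below are just
placeholders, not my real values):

  zfs: zfs-iscsi
      iscsiprovider LIO
      portal 10.0.0.10
      target iqn.2003-01.org.linux-iscsi.storage.x8664:sn.0123456789ab
      pool tank
      lio_tpg tpg1
      content images
      blocksize 4k
      sparse 1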
Apart from a few minor issues I found in the LIO backend (for which I sent a
patch series earlier today), most things work nicely. Except for one, which is
important to me: I can't move a disk from ZFS over iSCSI to any other storage.
The destination storage type doesn't matter, but the problem is 100%
reproducible whenever the source storage is ZFS over iSCSI.
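The reproducer is as simple as starting a live disk move while the guest is
running, either from the GUI or with something like this (VMID, disk and
destination storage are just examples):

  # live move of scsi0 to another storage, guest still running
  qm move_disk 100 scsi0 local-lvm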
A few seconds after the disk move starts, the guest FS will "panic". For
example, with an EL7 guest using XFS, I get:
kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00
kernel: blk_update_request: I/O error, dev sda, sector 7962536
kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00
kernel: blk_update_request: I/O error, dev sda, sector 7962536
kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 bc 0e 28 00 00 08 00
kernel: blk_update_request: I/O error, dev sda, sector 12324392
And the guest system crashes completely. The data itself is not impacted: I
can restart the guest and everything appears OK. It doesn't matter whether I
let the disk move operation finish or cancel it.
Moving the disk offline works as expected.
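For example, something like this (same placeholder values as above) completes
without a single error:

  # offline move: shut the guest down first, then move the disk
  qm shutdown 100
  qm move_disk 100 scsi0 local-lvm
  qm start 100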
Whether the zvol backend is sparse or non-sparse doesn't matter either.
I searched a lot about this issue, and found at least two other people hitting
the same, or a very similar, issue:
* One using ZoL but with SCST, see
  https://sourceforge.net/p/scst/mailman/message/35241011/
* Another using OmniOS, so with COMSTAR, see
  https://forum.proxmox.com/threads/storage-iscsi-move-results-to-io-error.38848/
Both are likely running PVE5, so it looks like it's not a recently introduced
regression.
I was also able to reproduce the issue with a FreeNAS storage, i.e. using ctld.
As the issue is present with so many different target stacks, I think we can
rule out a problem on the storage side. The problem is most likely in QEMU, in
its iSCSI block implementation.
The SCST-Devel thread is interesting, but unfortunately, it's beyond my skills.
Any advice on how to debug this further? I can reproduce it whenever I want on
a test setup, and I'm happy to provide any useful information.
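For example, I could capture the iSCSI traffic on the target while reproducing
the crash, and inspect the aborted commands in Wireshark, with something like
this (interface name is just an example):

  # record everything on the iSCSI port during a failing disk move
  tcpdump -ni eth0 -s 0 -w /tmp/move-disk.pcap port 3260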
Regards, Daniel
--
Daniel Berteaud
FIREWALL-SERVICES SAS, network security
A free software services company
Tel: +33.5 56 64 15 32
Matrix: @dani:fws.fr
https://www.firewall-services.com