On Mon Sep 29, 2025 at 7:00 PM CEST, Lorne Guse wrote:
> TrueNAS has indicated that being able to reboot the storage device is "a
> basic requirement in the Enterprise." I've now done several TrueNAS
> upgrades and reboots without any issues. I don't have any Windows VMs on
> my cluster ATM, but I intend to build a few for testing purposes.
>
> Thanks again for your input.

You're welcome! Glad it's all working!

> ________________________________
> From: Max R. Carrara <[email protected]>
> Sent: Monday, September 29, 2025 6:06 AM
> To: Lorne Guse <[email protected]>; Proxmox VE development
> discussion <[email protected]>
> Subject: Re: How does proxmox handle loss of connection / reboot of
> iSCSI storage
>
> On Fri Sep 26, 2025 at 6:41 PM CEST, Lorne Guse wrote:
> > TIL what nerd-sniping is. I was worried that I broke some kind of
> > rule at first. LOL
>
> Hahaha, oh no, it's all good 😄
>
> >
> > Thank you for your response. I will do some more extensive testing
> > to see if there is a limit. Some TrueNAS updates can take longer
> > than 3 min.
>
> You're welcome!
>
> >
> > I imagine it might be guest-dependent.
> >
> > I always assumed that I had to shut down my VMs before updating
> > TrueNAS. On the next update I'll run some backups and update while
> > my Proxmox cluster is online.
>
> Yeah, I would say it's the combination of storage and guest; that is,
> it depends on what's running inside the guest and on what kind of
> storage the guest's disks are residing on.
>
> Also, not sure if I've expressed this properly in my previous response,
> but I definitely wouldn't *rely* on things being fine if some storage
> is down for a bit. The safe option naturally is to shut down any guests
> using that storage before updating (unless the VMs can be migrated to
> a different node that's using a different storage).
>
> > ________________________________
> > From: Max R. Carrara <[email protected]>
> > Sent: Friday, September 26, 2025 7:32 AM
> > To: Lorne Guse <[email protected]>; Proxmox VE development
> > discussion <[email protected]>
> > Subject: Re: How does proxmox handle loss of connection / reboot of
> > iSCSI storage
> >
> > On Fri Sep 26, 2025 at 4:06 AM CEST, Lorne Guse wrote:
> > > RE: TrueNAS over iSCSI Custom Storage Plugin
> > >
> > > TrueNAS has asked me to investigate how Proxmox reacts to a reboot
> > > of the storage server while VMs and cluster are active. This is
> > > especially relevant for updates to TrueNAS.
> > >
> > > > The one test we'd like to see work is a reboot of the TrueNAS
> > > > node while VMs and cluster are operational… does it "resume"
> > > > cleanly? A TrueNAS software update will be similar.
> > >
> > > I don't think the storage plugin is responsible for this level of
> > > interaction with the storage server. Is there anything that can be
> > > done at the storage plugin level to facilitate graceful recovery
> > > when the storage server goes down?
> > >
> > >
> > > --
> > > Lorne Guse
> >
> > From what I have experienced, it depends entirely on the underlying
> > storage implementation. Since you nerd-sniped me a little here, I
> > decided to do some testing.
> >
> > On ZFS over iSCSI (using LIO), the downtime does not affect the VM at
> > all, except that I/O is stalled while the remote storage is
> > rebooting. So while I/O operations might take a little while to go
> > through from the VM's perspective, nothing broke here (in my Debian
> > VM at least).
> >
> > Note that with "broke" I mean that the VM kept on running, the OS and
> > its parts didn't throw any errors, no systemd units failed, etc.
> > Of course, if an application running inside the VM for example sets a
> > timeout on some disk operation and throws an error because of that,
> > that's an "issue" with the application.
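> >
> > (If such timeouts ever do become a problem, one knob on the guest
> > side is the kernel's per-device SCSI command timeout; a rough sketch,
> > assuming the disk shows up as /dev/sda via (virtio-)SCSI and that
> > 300 seconds is an acceptable value:
> >
> >     # current SCSI command timeout in seconds (the default is usually 30)
> >     cat /sys/block/sda/device/timeout
> >     # give the device more headroom before the kernel starts error
> >     # handling during a longer storage outage
> >     echo 300 > /sys/block/sda/device/timeout
> >
> > That only covers the kernel side; application-level timeouts would
> > still have to be handled by the applications themselves.)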
> >
> > I even shut down the ZFS-over-iSCSI-via-LIO remote for a couple of
> > minutes to see if it would throw any errors eventually, but nope, it
> > doesn't; things just take a while:
> >
> > Starting: Fri Sep 26 02:32:52 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
> > Done: Fri Sep 26 02:32:58 PM CEST 2025
> > Starting: Fri Sep 26 02:32:59 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
> > Done: Fri Sep 26 02:33:04 PM CEST 2025
> > Starting: Fri Sep 26 02:33:05 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
> > Done: Fri Sep 26 02:36:16 PM CEST 2025
> > Starting: Fri Sep 26 02:36:17 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
> > Done: Fri Sep 26 02:36:23 PM CEST 2025
> > Starting: Fri Sep 26 02:36:24 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
> > Done: Fri Sep 26 02:36:29 PM CEST 2025
> >
> > The timestamps there show that the storage was down for ~3 minutes,
> > which is a *lot*, but nevertheless everything kept on running.
> >
> > The above is the output of the following:
> >
> >     while sleep 1; do echo "Starting: $(date)"; sha256sum foo; echo "Done: $(date)"; done
> >
> > ... where "foo" is a 4 GiB file I had created with:
> >
> >     dd if=/dev/urandom of=./foo bs=1M count=4000
> >
> > With the TrueNAS legacy plugin (also ZFS over iSCSI, as you know),
> > reboots of TrueNAS are also handled "gracefully" in this way; I was
> > able to observe the same behavior as with the LIO iSCSI provider. So
> > if you keep using iSCSI for the new plugin (which I think you do,
> > IIRC), everything should be fine. But as I said, it's up to the
> > applications inside the guest whether long disk I/O latencies are a
> > problem or not.
> >
> > On a side note, I'm not too familiar with how QEMU handles iSCSI
> > sessions in particular, but from what I can tell it just waits until
> > the iSCSI session resumes; at least that's what I'm assuming here.
> >
> > For curiosity's sake I also tested this with my SSHFS plugin [0], and
> > in that case the VM remained online, but threw I/O errors immediately
> > and remained in an unusable state even once the storage was up again.
> > (I'll actually see if I can prevent that from happening; IIRC there's
> > an option for reconnecting, unless I'm mistaken.)
> >
> > Regarding your question about what the plugin can do to facilitate
> > graceful recovery: In your case, things should be fine "out of the
> > box" because of the magic intricacies of iSCSI + QEMU; with other
> > plugins & storage implementations it really depends.
> >
> > Hope that helps clear some things up!
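> >
> > P.S.: The reconnect option I have in mind is sshfs's own "reconnect"
> > mount option; a rough, untested sketch of a manual mount using it
> > (user, host and both paths are just placeholders):
> >
> >     # re-establish the SSH connection automatically if it drops, and
> >     # use keepalive probes so a dead connection is actually noticed
> >     sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3 \
> >         user@host:/remote/path /mnt/sshfs
> >
> > Whether that alone is enough to keep a running VM usable across an
> > outage is something I'd still have to verify.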
> >
> > [0]: https://git.proxmox.com/?p=pve-storage-plugin-examples.git;a=blob;f=plugin-sshfs/src/PVE/Storage/Custom/SSHFSPlugin.pm;h=2d1612b139a3342e7a91b9d2809c2cf209ed9b05;hb=refs/heads/master


_______________________________________________
pve-devel mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
