On Mon Sep 29, 2025 at 7:00 PM CEST, Lorne Guse wrote:
> TrueNAS has indicated that being able to reboot the storage device "is a 
> basic requirement in the Enterprise." I've now done several TrueNAS upgrades 
> and reboots without any issues. I don't have any Windows VMs on my cluster 
> ATM, but I intend to build a few for testing purposes.
>
> Thanks again for your input.

You're welcome! Glad it's all working!

> ________________________________
> From: Max R. Carrara <[email protected]>
> Sent: Monday, September 29, 2025 6:06 AM
> To: Lorne Guse <[email protected]>; Proxmox VE development discussion 
> <[email protected]>
> Subject: Re: How does proxmox handle loss of connection / reboot of iSCSI 
> storage
>
> On Fri Sep 26, 2025 at 6:41 PM CEST, Lorne Guse wrote:
> > TIL what nerd-sniping is. I was worried that I broke some kind of rule at 
> > first. LOL
>
> Hahaha, oh no, it's all good 😄
>
> >
> > Thank you for your response. I will do some more extensive testing to see 
> > if there is a limit. Some TrueNAS updates can take longer than 3 min.
>
> You're welcome!
>
> >
> > I imagine it might be guest-dependent.
> >
> > I always assumed that I had to shut down my VMs before updating TrueNAS. On 
> > the next update I'll run some backups and update while my proxmox cluster 
> > is online.
>
> Yeah, I would say it's the combination of storage and guest; that is,
> it depends on what's running inside the guest and on what kind of
> storage the guest's disks are residing on.
>
> Also, not sure if I've expressed this properly in my previous response,
> but I definitely wouldn't *rely* on things being fine if some storage is
> down for a bit. The safe option naturally is to shut down any guests
> using that storage before updating (unless the VMs can be migrated to
> a different node that's using a different storage).
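>
> For example, something along the following lines could be used to find
> and shut down all VMs on a node whose config references a given storage
> (rough, untested sketch; "tank-iscsi" is just a placeholder for the
> storage ID):
>
>     STORAGE="tank-iscsi"
>     # skip the header line of `qm list` and loop over the VMIDs
>     for vmid in $(qm list | awk 'NR > 1 { print $1 }'); do
>         # match config lines like "scsi0: tank-iscsi:vm-100-disk-0,..."
>         if qm config "$vmid" | grep -q ": ${STORAGE}:"; then
>             echo "Shutting down VM ${vmid} (uses storage ${STORAGE})"
>             qm shutdown "$vmid"
>         fi
>     done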
>
> > ________________________________
> > From: Max R. Carrara <[email protected]>
> > Sent: Friday, September 26, 2025 7:32 AM
> > To: Lorne Guse <[email protected]>; Proxmox VE development 
> > discussion <[email protected]>
> > Subject: Re: How does proxmox handle loss of connection / reboot of iSCSI 
> > storage
> >
> > On Fri Sep 26, 2025 at 4:06 AM CEST, Lorne Guse wrote:
> > > RE: TrueNAS over iSCSI Custom Storage Plugin
> > >
> > > TrueNAS has asked me to investigate how Proxmox reacts to reboot of the 
> > > storage server while VMs and cluster are active. This is especially 
> > > relevant for updates to TrueNAS.
> > >
> > > >The one test we'd like to see work is reboot of TrueNAS node while VMs 
> > > >and cluster are operational… does it "resume" cleanly? A TrueNAS 
> > > >software update will be similar.
> > >
> > > I don't think the storage plugin is responsible for this level of 
> > > interaction with the storage server. Is there anything that can be done 
> > > at the storage plugin level to facilitate graceful recovery when the 
> > > storage server goes down?
> > >
> > >
> > > --
> > > Lorne Guse
> >
> > From what I have experienced, it depends entirely on the underlying
> > storage implementation. Since you nerd-sniped me a little here, I
> > decided to do some testing.
> >
> > On ZFS over iSCSI (using LIO), the downtime does not affect the VM at
> > all, except that I/O is stalled while the remote storage is rebooting.
> > So while I/O operations might take a little while to go through from the
> > VM's perspective, nothing broke here (in my Debian VM at least).
> >
> > Note that by "nothing broke" I mean that the VM kept on running, the OS
> > and its parts didn't throw any errors, no systemd units failed, etc.
> > Of course, if an application running inside the VM sets a timeout on
> > some disk operation, for example, and throws an error because of that,
> > that's an "issue" with the application itself.
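> >
> > As a contrived example, something like this inside the guest would
> > report a failure as soon as the stall exceeds the chosen timeout, even
> > though the VM itself keeps running just fine (the path is of course
> > just a placeholder):
> >
> >     timeout 30 dd if=/path/to/some/file of=/dev/null bs=1M \
> >         || echo "read timed out or failed"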
> >
> > I even shut down the ZFS-over-iSCSI-via-LIO remote for a couple of
> > minutes to see if it would eventually throw any errors, but nope, it
> > didn't; things just took a while:
> >
> > Starting: Fri Sep 26 02:32:52 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87  foo
> > Done: Fri Sep 26 02:32:58 PM CEST 2025
> > Starting: Fri Sep 26 02:32:59 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87  foo
> > Done: Fri Sep 26 02:33:04 PM CEST 2025
> > Starting: Fri Sep 26 02:33:05 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87  foo
> > Done: Fri Sep 26 02:36:16 PM CEST 2025
> > Starting: Fri Sep 26 02:36:17 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87  foo
> > Done: Fri Sep 26 02:36:23 PM CEST 2025
> > Starting: Fri Sep 26 02:36:24 PM CEST 2025
> > d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87  foo
> > Done: Fri Sep 26 02:36:29 PM CEST 2025
> >
> > The timestamps there show that the storage was down for ~3 minutes,
> > which is a *lot*, but nevertheless everything kept on running.
> >
> > The above is the output of the following:
> >
> >     while sleep 1; do
> >         echo "Starting: $(date)"
> >         sha256sum foo
> >         echo "Done: $(date)"
> >     done
> >
> > ... where "foo" is a roughly 4 GiB file I had created with:
> >
> >     dd if=/dev/urandom of=./foo bs=1M count=4000
> >
> > With the TrueNAS legacy plugin (also ZFS over iSCSI, as you know),
> > reboots of TrueNAS are also handled "gracefully" in this way; I was able
> > to observe the same behavior as with the LIO iSCSI provider. So if you
> > keep using iSCSI for the new plugin (which I think you do, IIRC),
> > everything should be fine. But as I said, it's up to the applications
> > inside the guest whether long disk I/O latencies are a problem or not.
> >
> > On a side note, I'm not too familiar with how QEMU handles iSCSI
> > sessions in particular, but from what I can tell it just waits until
> > the iSCSI session resumes; at least that's what I'm assuming here.
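> >
> > (If one wanted to tune how QEMU reacts to actual read/write errors on
> > a disk, the werror/rerror disk options in the VM config should be the
> > knobs for that, IIRC; e.g. something roughly like the following, where
> > the VM ID and volume name are placeholders and I haven't verified the
> > exact syntax for this setup:
> >
> >     qm set 100 --scsi0 mystorage:vm-100-disk-0,werror=stop,rerror=stop
> >
> > In my tests the defaults were fine, though.)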
> >
> > For curiosity's sake I also tested this with my SSHFS plugin [0], and
> > in that case the VM remained online, but threw I/O errors immediately
> > and remained in an unusable state even once the storage was up again.
> > (I'll actually see if I can prevent that from happening; IIRC there's
> > an option for reconnecting, unless I'm mistaken.)
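> >
> > The option I have in mind is sshfs's "-o reconnect", usually combined
> > with SSH keep-alives; a rough, untested sketch of such a mount would
> > look something like this (host and paths are placeholders):
> >
> >     sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3 \
> >         user@storage-host:/export/path /mnt/sshfs
> >
> > Whether that is actually enough to make the VM recover cleanly is
> > something I'd still have to test.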
> >
> > Regarding your question about what the plugin can do to facilitate
> > graceful recovery: in your case, things should be fine "out of the box"
> > because of the magic intricacies of iSCSI + QEMU; with other plugins and
> > storage implementations, it really depends.
> >
> > Hope that helps clear some things up!
> >
> > [0]: https://git.proxmox.com/?p=pve-storage-plugin-examples.git;a=blob;f=plugin-sshfs/src/PVE/Storage/Custom/SSHFSPlugin.pm;h=2d1612b139a3342e7a91b9d2809c2cf209ed9b05;hb=refs/heads/master



