Also, make sure you have disabled NFSv4 delegations. They can cause all kinds of weird
problems. Maybe that's the reason why NFSv3 is working fine for you.

https://utcc.utoronto.ca/~cks/space/blog/linux/NFSv4DelegationMandatoryLock

On 14.11.25 at 09:58, "Jürgen Gotteswinter" <[email protected]> wrote:


Have you tried mounting your shares with "hard" instead of "soft"? Reducing
timeo and retrans might also improve your situation.
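To make the hard/soft trade-off concrete, here is a back-of-envelope Python sketch. It assumes the simplified model that nfs(5) describes for NFS over TCP: `timeo` is given in tenths of a second, each retry waits `timeo` longer than the previous attempt (linear backoff, capped at 600 seconds), a `soft` mount fails the request with an error after `retrans` retries, and a `hard` mount keeps retrying. The kernel's real RPC timers are more involved, so treat the numbers as rough estimates; `soft_mount_budget` and `survives_outage` are hypothetical helper names.

```python
MAX_TIMEOUT_S = 600.0  # nfs(5): linear backoff is capped at 600 seconds


def soft_mount_budget(timeo_ds: int, retrans: int) -> float:
    """Approximate seconds a soft mount waits before failing a request.

    Simplified model: the first attempt waits timeo, and each of the
    `retrans` retries waits timeo longer than the attempt before it.
    """
    timeo_s = timeo_ds / 10.0  # timeo is expressed in tenths of a second
    return sum(min(k * timeo_s, MAX_TIMEOUT_S) for k in range(1, retrans + 2))


def survives_outage(outage_s: float, soft: bool, timeo_ds: int, retrans: int) -> bool:
    """Would a request sent at the start of an outage avoid an I/O error?"""
    if not soft:
        return True  # hard mounts retry indefinitely; the caller just blocks
    return soft_mount_budget(timeo_ds, retrans) > outage_s


if __name__ == "__main__":
    # Values from the mount options in this thread: timeo=100, retrans=3.
    print(soft_mount_budget(100, 3))  # 10 + 20 + 30 + 40 seconds
    print(survives_outage(8.0, soft=True, timeo_ds=100, retrans=3))
```

Under this model, smaller `timeo`/`retrans` values make a `soft` mount give up sooner, while on a `hard` mount a smaller `timeo` mainly means the client retries sooner once the storage is back.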


From: Marty Godsey <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 14 November 2025 at 09:45
To: "[email protected]" <[email protected]>
Subject: Re: QCOW2 on NFS shared storage


Hello guys,


I will answer all the different responses I got in this email.




1. Describe HA setup.


* It's a basic two-node HA cluster using ZFS: a TrueNAS F60, all-NVMe. The
network uses basic MLAG, and the Linux hosts use MLAG as well. I understand
this is not a network failover, but I wanted to describe the network a little.


2. NFS version.


* I am using NFS 4.1 now. Here are my mount options:
nfs vers=4.1,rsize=1048576,wsize=1048576,nconnect=16,_netdev,soft,intr,timeo=100,retrans=3,noatime,nodiratime,async 0 0


* When I switch to NFSv3, I can perform a failover without the VM locking up,
though throughput is lower, which is expected with v3.


3. Disk timeout issue.


* It does not take 30 seconds. From the start of the failover test to NFS
being writable again takes about 5-8 seconds. The network drops no test
pings, or at most one, and the mounts on the hosts can be written to again at
around the 8-10 second mark.
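Marty's NFS 4.1 mount options can be checked against the points raised in this thread with a small Python sketch. The checks are limited to what nfs(5) and mount(8) document: with `soft`, a request that times out after `retrans` retries fails with an error to the caller (here, QEMU); `intr` is ignored by kernels after 2.6.25; `nodiratime` is not needed when `noatime` is set; and `timeo` is in tenths of a second. `review_nfs_options` is a hypothetical name, and the sketch says nothing about delegations or server-side settings.

```python
from typing import Optional


def review_nfs_options(options: str) -> list[str]:
    """Return notes on the NFS mount options discussed in this thread."""
    # Split "key=value" entries and bare flags into a dict; flags map to None.
    opts: dict[str, Optional[str]] = {}
    for item in options.split(","):
        key, sep, value = item.strip().partition("=")
        if key:
            opts[key] = value if sep else None

    notes: list[str] = []
    if "soft" in opts:
        notes.append("soft: after retrans retries the request fails with an error to the caller (here, QEMU)")
    if "intr" in opts:
        notes.append("intr: ignored by Linux kernels after 2.6.25")
    if "noatime" in opts and "nodiratime" in opts:
        notes.append("nodiratime: not needed when noatime is set")
    timeo = opts.get("timeo")
    if timeo:
        # timeo is expressed in tenths of a second
        notes.append(f"timeo={timeo}: first retry after {int(timeo) / 10:g} s")
    return notes


if __name__ == "__main__":
    mount_opts = ("vers=4.1,rsize=1048576,wsize=1048576,nconnect=16,_netdev,soft,"
                  "intr,timeo=100,retrans=3,noatime,nodiratime,async")
    for note in review_nfs_options(mount_opts):
        print(note)
```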





Marty




From: Eric Green <[email protected]>
Date: Friday, November 14, 2025 at 1:19 AM
To: [email protected] <[email protected]>
Subject: Re: QCOW2 on NFS shared storage




This isn't a qcow2 issue. This is a file system timeout issue in the virtual
machine. Your NFS failover event is taking longer than 30 seconds, which is the
default Linux block device timeout. The Linux guest then switches the file
system to read-only mode, which sends everything to heck in a handbasket. On
Windows VMs you get a blue screen of death and a reboot to an "OS Not Found"
prompt.


You can either increase the block device timeout on Linux or speed up your 
failover.
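The first option, raising the guest's block device timeout, can be sketched in Python for a Linux guest. Assumptions: the disks expose their SCSI command timeout as /sys/block/<dev>/device/timeout (in seconds, 30 by default); devices without that attribute, which may include virtio-blk disks, are skipped; writing needs root and does not survive a reboot. `read_disk_timeouts` and `raise_disk_timeouts` are hypothetical helper names, and the sysfs root is a parameter so the logic can be tried on a scratch directory.

```python
from pathlib import Path


def read_disk_timeouts(sysfs_block: Path = Path("/sys/block")) -> dict[str, int]:
    """Map block device names to their command timeout in seconds.

    Devices that do not expose device/timeout are skipped.
    """
    timeouts: dict[str, int] = {}
    if not sysfs_block.is_dir():
        return timeouts
    for dev in sorted(sysfs_block.iterdir()):
        timeout_file = dev / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def raise_disk_timeouts(minimum_s: int, sysfs_block: Path = Path("/sys/block")) -> list[str]:
    """Raise every timeout below minimum_s; returns the devices changed.

    On a real guest this needs root, and the change is not persistent.
    """
    changed: list[str] = []
    for name, current in read_disk_timeouts(sysfs_block).items():
        if current < minimum_s:
            (sysfs_block / name / "device" / "timeout").write_text(f"{minimum_s}\n")
            changed.append(name)
    return changed


if __name__ == "__main__":
    for name, seconds in read_disk_timeouts().items():
        print(f"{name}: {seconds} s")
```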


VMware ESXi handled this situation by pausing the virtual machine when it
detected NFS delays. I'm not aware of qemu/kvm having that ability: it pauses
the VM when migrating it to a different physical server, but not when there are
delays in the underlying NFS. CloudStack can only use functionality provided by
the hypervisor.
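For reference alongside the ESXi comparison: libvirt's disk `<driver>` element accepts `error_policy` and `rerror_policy` attributes, and the value `stop` asks QEMU to pause the guest when the host hits an I/O error on the image (QEMU's `werror`/`rerror` drive options) instead of passing the error to the guest. This only triggers when an error actually reaches QEMU, for example from a `soft` mount, not on a slow request that eventually succeeds; whether CloudStack lets you set it is not established in this thread. The Python sketch below only builds such an element with the standard library; the image path and target device are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def qcow2_disk_xml(image_path: str, target_dev: str = "vda", policy: str = "stop") -> str:
    """Build a libvirt <disk> element for a qcow2 image with an I/O error policy."""
    disk = ET.Element("disk", type="file", device="disk")
    # error_policy applies to writes (and to reads unless rerror_policy is set);
    # "stop" pauses the guest instead of reporting the error to it.
    ET.SubElement(disk, "driver", name="qemu", type="qcow2",
                  error_policy=policy, rerror_policy=policy)
    ET.SubElement(disk, "source", file=image_path)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical image path on NFS-backed primary storage.
    print(qcow2_disk_xml("/mnt/primary/example-vm.qcow2"))
```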


________________________________
From: Marty Godsey <[email protected]>
Sent: Thursday, November 13, 2025 3:25 PM
To: [email protected] <[email protected]>
Subject: QCOW2 on NFS shared storage


Hello everyone.


So, I am learning, or at least reading, that the QCOW2 file format on shared
HA NFS storage does not like to fail over.


I have an HA NAS running NFS 4.1 and everything works fine until I test the 
failure scenario of a failover on the storage nodes. When I do this, the entire 
VM locks up and must be hard reset to recover.


Is this true? Do people not use QCOW2 on HA NFS storage? Are there timeouts I
can set, etc.?


Thank you for all the input.




Marty


