Hi Matthew, 

I'm also using an HA TrueNAS for storage. I have NFS as well as iSCSI shares 
and have done some in-place upgrades. The failover went more or less smoothly; 
the main issue was on the TrueNAS side, where the different VLANs didn't come 
up. This caused the engine to take down the storage domain, and it took some 
time until everything was up again. The VMs in oVirt went into paused mode 
and resumed as soon as the failover was done. I triggered the failover by 
rebooting one of the TrueNAS nodes, and it took some time for the other 
node to take over. I was thinking about asking the TrueNAS guys if there is a 
command or procedure to speed up the failover. All in all, I didn't stop any 
VMs, although the VMs paused. Depending on the criticality of the VMs, you 
might want to move them to another storage. 
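
To put a number on how long the takeover actually stalls I/O, something like the following rough sketch could run from a test VM or host during a failover. It is only an illustration: the function names and the probe path are hypothetical, and it assumes a hard-mounted share where fsync blocks until the storage is back.

```python
import os
import time


def longest_stall(timestamps):
    """Return the largest gap in seconds between consecutive timestamps."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0.0)


def probe(path, duration_s=300.0, interval_s=1.0):
    """Write and fsync a small record every interval_s seconds.

    Returns the monotonic completion time of every write. On a hard mount,
    fsync blocks while the storage is unavailable, so a failover shows up
    as one unusually large gap between two completion times.
    """
    done = []
    deadline = time.monotonic() + duration_s
    with open(path, "w") as fh:
        while time.monotonic() < deadline:
            fh.write(f"{time.time()}\n")
            fh.flush()
            os.fsync(fh.fileno())
            done.append(time.monotonic())
            time.sleep(interval_s)
    return done


if __name__ == "__main__":
    # Hypothetical path on the share under test; adjust before running.
    times = probe("/mnt/test-share/failover-probe.txt")
    print(f"longest stall: {longest_stall(times):.1f} s")
```

The reported gap includes the sleep interval, so with the defaults a healthy run shows roughly one second.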

Sven 

-----Original Message-----
From: users-boun...@ovirt.org [mailto:users-boun...@ovirt.org] On Behalf Of 
Matthew Trent
Sent: Monday, 5 June 2017 23:48
To: users <users@ovirt.org>
Subject: [ovirt-users] Seamless SAN HA failovers with oVirt?

I'm using two TrueNAS HA SANs (FreeBSD-based ZFS) to provide storage via NFS to 
7 oVirt boxes and about 25 VMs.

For SAN system upgrades I've always scheduled a maintenance window, shut down 
all the oVirt stuff, upgraded the SANs, and spun everything back up. It's 
pretty disruptive, but I assumed that was the thing to do.

However, in talking with the TrueNAS vendor, they said the majority of their 
customers are using VMware and they almost always do TrueNAS updates in 
production. They just upgrade one head of the TrueNAS HA pair, then fail over 
to the other head and upgrade it too. There's a 30-ish second pause in I/O 
while the disk arrays are taken over by the other HA head, but VMware just 
tolerates it and continues without skipping a beat. They say this is standard 
procedure in the SAN world and virtualization systems should tolerate 30-60 
seconds of I/O pause for HA failovers seamlessly.

It sounds great to me, but I wanted to pick this list's brain -- is anyone 
doing this with oVirt? Are you able to fail over your HA SAN with 30-60 seconds 
of no I/O without oVirt freaking out?

If not, are there any tunables relating to this? I see the default NFS mount 
options look fairly tolerant (proto=tcp,timeo=600,retrans=6), but are there 
VDSM or sanlock or some other oVirt timeouts that will kick in and start 
putting storage domains into error states, fencing hosts or something before 
that? I've never timed anything, but I want to say my past experience is that 
the oVirt hosted engine started showing errors almost immediately when we've 
had SAN issues in the past.
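
For reference, here is a small sketch of what those NFS mount options work out to in seconds. Per nfs(5), timeo is given in tenths of a second, and for NFS over TCP the client backs off linearly, adding timeo after each retransmission up to a 600-second cap. The function names are made up for illustration, and treating retrans as the number of timeouts to list is an assumption about how retries are counted. It says nothing about VDSM or sanlock timeouts, which are separate.

```python
def nfs_tcp_timeouts(timeo_deciseconds=600, retrans=6, cap_seconds=600):
    """List successive request timeouts in seconds under linear backoff.

    Assumption: one timeout per retry counted by retrans; the exact
    counting on a given kernel may differ by one.
    """
    base = timeo_deciseconds / 10  # timeo is in tenths of a second
    return [min(base * (i + 1), cap_seconds) for i in range(retrans)]


def pause_within_first_timeout(pause_seconds, timeo_deciseconds=600):
    """True if an I/O pause ends before the first NFS retransmission."""
    return pause_seconds < timeo_deciseconds / 10


if __name__ == "__main__":
    print(nfs_tcp_timeouts())              # per-attempt timeouts in seconds
    print(pause_within_first_timeout(30))  # a 30-second failover pause
```

With timeo=600 the first timeout is 60 seconds, so at the NFS client level a 30-second pause ends before the first retransmission is even attempted.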

Thanks!

--
Matthew Trent
Network Engineer
Lewis County IT Services
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users