Hi,

you are aware that EQL sync replication is only about replication and provides no high availability at all? I am not even sure it does IP failover by itself, so better expect interruptions of minutes rather than seconds.
Anyway, don't count on oVirt's pause/unpause. There's a real chance that it will go horribly wrong. A scheduled maintenance window where everything gets shut down would be best practice.

Juergen

On 5/30/2016 at 3:07 PM, Nicolas Ecarnot wrote:
> Hello,
>
> We're planning a move from our old building towards a new one a few
> meters away.
>
> In a similar way to Martijn
> (https://www.mail-archive.com/[email protected]/msg33182.html), I have
> maintenance planned on our storage side.
>
> Say an oVirt DC is using a SAN's LUN via iSCSI (Equallogic).
> This SAN allows me to set up block replication between two SANs, seen by
> oVirt as one (Dell is naming it SyncRep).
> Then switch all the iSCSI accesses to the replicated LUN.
>
> When doing this, the iSCSI stack of each oVirt host notices the
> disconnection, tries to reconnect, and succeeds.
> Amongst our hosts, this happens between 4 and 15 seconds.
>
> When this happens fast enough, oVirt engine and the VMs don't even
> notice, and they keep running happily.
>
> When this takes more than 4 seconds, there are 2 cases:
>
> 1 - The hosts and/or oVirt and/or the SPM (I actually don't know)
> notices that there is a storage failure, and pauses the VMs.
> When the iSCSI stack reconnects, the VMs are automatically recovered
> from pause, and this all takes less than 30 seconds. That is very
> acceptable for us, as this action is extremely rare.
>
> 2 - Same storage failure, VMs paused, and some VMs stay in pause mode
> forever.
> Manual "run" action is mandatory.
> When done, everything recovers correctly.
> This is also quite acceptable, but here come my questions:
>
> My questions: (!)
> - *WHAT* process or piece of code or what oVirt part is responsible for
> deciding when to UN-pause a VM, and under what conditions?
> That would help me to understand why some cases are working even more
> smoothly than others.
> - Are there related timeouts I could play with in engine-config options?
> - [a bit off-topic] Is it safe to increase some iSCSI timeouts or
> buffer sizes in the hope this kind of disconnection would go unnoticed?
>

_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

