The power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem. If I can make it work so that, when the VM is paused, it gets switched off and (HA-style) reboots itself, I'm perfectly happy :-).
Kind regards,

---------- Forwarded message ----------
From: Koen Vanoppen <[email protected]>
Date: 2014-04-11 14:47 GMT+02:00
Subject: Re: [ovirt-users] [Users] HA
To: Michal Skrivanek <[email protected]>

[same message as above]

Kind regards,

2014-04-11 9:37 GMT+02:00 Michal Skrivanek <[email protected]>:

> On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:
>
> Hi All,
>
> Any news about this? VDSM hook or anything?
> Thanks!
>
> Kind regards
>
> 2014-04-09 9:37 GMT+02:00 Omer Frenkel <[email protected]>:
>
>> ----- Original Message -----
>> > From: "Koen Vanoppen" <[email protected]>
>> > To: [email protected]
>> > Sent: Tuesday, April 8, 2014 3:41:02 PM
>> > Subject: Re: [Users] HA
>> >
>> > Or in other words, the SPM and the VM should move almost immediately
>> > after the storage connections on the hypervisor are gone. I know, I'm
>> > maybe asking too much, but we would be very happy :-) :-).
>> >
>> > So, a sketch:
>> >
>> > Mercury1 (SPM)
>> > Mercury2
>> >
>> > Mercury1 loses both fibre connections --> it goes non-operational, and
>> > the VM goes into paused state and stays that way until I manually
>> > reboot the host so it fences.
>> >
>> > What I would like is that when Mercury1 loses both fibre connections,
>> > it fences immediately, so the VMs are also moved almost instantly... if
>> > this is possible... :-)
>> >
>> > Kind regards and thanks for all the help!
>>
>> Michal, is there a vdsm hook for a vm moved to pause?
>> If so, you could send KILL to it, and the engine will identify the vm as
>> killed+HA, so it will be restarted, and there is no need to reboot the
>> host; it will stay non-operational until the storage is fixed.
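Omer's idea could look roughly like the following. This is a minimal sketch, not a tested hook: it assumes a VDSM `after_vm_pause` hook point exists, that the hook is dropped into `/usr/libexec/vdsm/hooks/after_vm_pause/`, and that VDSM exports the VM UUID as `$vmId` and the pause reason as `$pauseCode` — all of those names are assumptions, not verified against a specific VDSM release.

```shell
#!/bin/bash
# Hypothetical "after_vm_pause" VDSM hook: destroy a VM that paused on a
# storage I/O error so the engine sees it as killed and, because the VM
# is HA, restarts it on another operational host.
# ASSUMPTIONS: hook point name, $vmId / $pauseCode variables, and the
# pause codes below are illustrative, not confirmed VDSM behavior.

# React only to storage-related pause reasons, not user-initiated pauses.
is_storage_pause() {
    case "$1" in
        EIO|EOTHER|ENOSPC) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "$vmId" ] && is_storage_pause "$pauseCode"; then
    # Killing the paused VM lets the engine mark it as failed; HA then
    # schedules it to restart elsewhere, without rebooting this host.
    vdsClient -s 0 destroy "$vmId"
fi
```

Note the trade-off: the host itself stays non-operational until the storage is fixed, but the HA VMs come back quickly on another host.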
>
> you have to differentiate - if only the VMs were paused, yes, you can do
> anything (also change the error reporting policy to not pause the VM),
> but if the host becomes non-operational then it simply doesn't work; vdsm
> got stuck somewhere (often in getting block device stats).
> Proper power management config should fence it.
>
> Thanks,
> michal
>
>> > 2014-04-08 14:26 GMT+02:00 Koen Vanoppen <[email protected]>:
>> >
>> > Ok,
>> > Thanks already for all the help. I adapted some things for a quicker
>> > response:
>> >
>> > engine-config --get FenceQuietTimeBetweenOperationsInSec  --> 180
>> > engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>> >
>> > engine-config --get StorageDomainFalureTimeoutInMinutes  --> 180
>> > engine-config --set StorageDomainFalureTimeoutInMinutes=1
>> >
>> > engine-config --get SpmCommandFailOverRetries  --> 5
>> > engine-config --set SpmCommandFailOverRetries
>> >
>> > engine-config --get SPMFailOverAttempts  --> 3
>> > engine-config --set SPMFailOverAttempts=1
>> >
>> > engine-config --get NumberOfFailedRunsOnVds  --> 3
>> > engine-config --set NumberOfFailedRunsOnVds=1
>> >
>> > engine-config --get vdsTimeout  --> 180
>> > engine-config --set vdsTimeout=30
>> >
>> > engine-config --get VDSAttemptsToResetCount  --> 2
>> > engine-config --set VDSAttemptsToResetCount=1
>> >
>> > engine-config --get TimeoutToResetVdsInSeconds  --> 60
>> > engine-config --set TimeoutToResetVdsInSeconds=30
>> >
>> > Now the result of this is that when the VM is not running on the SPM,
>> > it will migrate before going into pause mode.
>> > But when we tried it with the vm running on the SPM, it gets into
>> > paused mode (for safety reasons, I know ;-) ) and stays there until the
>> > host gets MANUALLY fenced by rebooting it. So now my question is...
>> > how can I make the hypervisor fence (so it reboots and the vm is moved)
>> > quicker?
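For reference, the tuning above collapses into a short script. The keys and values are copied verbatim from the thread (including the oddly spelled `StorageDomainFalureTimeoutInMinutes`, which is how the key is quoted there); they reflect Koen's choices, not recommendations. The `SpmCommandFailOverRetries` line in the thread lacks a value, so it is left out here. engine-config changes only take effect after the engine service is restarted.

```shell
# Sketch: apply the thread's timeout tuning in one go on the engine host.
# Values are the ones chosen in the thread, not general recommendations.
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --set StorageDomainFalureTimeoutInMinutes=1   # key spelled as in oVirt
engine-config --set SPMFailOverAttempts=1
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --set vdsTimeout=30
engine-config --set VDSAttemptsToResetCount=1
engine-config --set TimeoutToResetVdsInSeconds=30

# engine-config changes are only picked up after a restart (oVirt 3.3):
service ovirt-engine restart
```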
>> >
>> > Kind regards,
>> >
>> > Koen
>> >
>> > 2014-04-04 16:28 GMT+02:00 Koen Vanoppen <[email protected]>:
>> >
>> > [Translated from Dutch:] Yes, that's true. But I was driving... so I'm
>> > just forwarding it then :-). I have already adjusted the timeout. It
>> > was set to 5 minutes before it would time out; it is now at 2 minutes.
>> >
>> > On Apr 4, 2014 4:14 PM, "David Van Zeebroeck"
>> > <[email protected]> wrote:
>> >
>> > [Translated from Dutch:] I have them too, you know.
>> > But normally the fencing should have worked, from what I read here.
>> > So something went wrong somewhere, by the looks of it.
>> >
>> > From: Koen Vanoppen [mailto:[email protected]]
>> > Sent: Friday, April 4, 2014 16:07
>> > To: David Van Zeebroeck
>> > Subject: Fwd: Re: [Users] HA
>> >
>> > David Van Zeebroeck
>> > Product Manager Unix Infrastructure
>> > Information & Communication Technology
>> > Brussels Airport Company
>> > T +32 (0)2 753 66 24
>> > M +32 (0)497 02 17 31
>> > [email protected]
>> > www.brusselsairport.be
>> >
>> > ---------- Forwarded message ----------
>> > From: "Michal Skrivanek" <[email protected]>
>> > Date: Apr 4, 2014 3:39 PM
>> > Subject: Re: [Users] HA
>> > To: "Koen Vanoppen" <[email protected]>
>> > Cc: "ovirt-users Users" <[email protected]>
>> >
>> > On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>> >
>> > Do you have power management configured?
>> > Was the "failed" host fenced/rebooted?
>> >
>> > On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <[email protected]>
>> > wrote:
>> >
>> > So... is it possible to have a fully automatic migration of the VM to
>> > another hypervisor in case the storage connection fails?
>> >
>> > How can we make this happen? Because for the moment, when we tested the
>> > situation, they stayed in paused state.
>> >
>> > (Test situation:
>> >
>> > * Unplug the 2 fibre cables from the hypervisor
>> > * VMs go into paused state
>> > * VMs stayed in paused state until the failure was solved
>> >
>> > as said before, it's not safe, hence we (try to) not migrate them.
>> >
>> > They only get paused when they actually access the storage, which may
>> > not always be the case. I.e. the storage connection is severed, the
>> > host is deemed NonOperational, and VMs are getting migrated from it;
>> > then some of them will succeed if they didn't access that "bad"
>> > storage... The paused VMs will remain (mostly; it can still happen
>> > that they appear paused, migrated on another host, when the disk access
>> > occurs only at the last stage of migration).
>> >
>> > So in other words, if you want to migrate the VMs without interruption,
>> > it's sometimes not possible.
>> >
>> > If you are fine with the VMs being restarted in a short time on another
>> > host, then power management/fencing will help here.
>> >
>> > Thanks,
>> > michal
>> >
>> > )
>> >
>> > They only returned when we restored the fibre connection to the
>> > hypervisor...
>> >
>> > yes, since 3.3 we have the autoresume feature
>> >
>> > Thanks,
>> > michal
>> >
>> > Kind Regards,
>> > Koen
>> >
>> > 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <[email protected]>:
>> >
>> > So... is it possible to have a fully automatic migration of the VM to
>> > another hypervisor in case the storage connection fails?
>> >
>> > How can we make this happen? Because for the moment, when we tested the
>> > situation, they stayed in paused state.
>> >
>> > (Test situation:
>> >
>> > * Unplug the 2 fibre cables from the hypervisor
>> > * VMs go into paused state
>> > * VMs stayed in paused state until the failure was solved
>> >
>> > )
>> >
>> > They only returned when we restored the fibre connection to the
>> > hypervisor...
>> >
>> > Kind Regards,
>> > Koen
>> >
>> > 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <[email protected]>:
>> >
>> > ---------- Forwarded message ----------
>> > From: "Doron Fediuck" <[email protected]>
>> > Date: Apr 3, 2014 4:51 PM
>> > Subject: Re: [Users] HA
>> > To: "Koen Vanoppen" <[email protected]>
>> > Cc: "Omer Frenkel" <[email protected]>, <[email protected]>,
>> > "Federico Simoncelli" <[email protected]>, "Allon Mureinik"
>> > <[email protected]>
>> >
>> > ----- Original Message -----
>> > > From: "Koen Vanoppen" <[email protected]>
>> > > To: "Omer Frenkel" <[email protected]>, [email protected]
>> > > Sent: Wednesday, April 2, 2014 4:17:36 PM
>> > > Subject: Re: [Users] HA
>> > >
>> > > Yes, indeed. I meant not-operational. Sorry.
>> > > So, if I understand this correctly: whenever we get into a situation
>> > > where we lose both storage connections on our hypervisor, we will
>> > > have to manually restore the connections first?
>> > >
>> > > And thanks for the tip for speeding up things :-).
>> > >
>> > > Kind regards,
>> > >
>> > > Koen
>> > >
>> > > 2014-04-02 15:14 GMT+02:00 Omer Frenkel <[email protected]>:
>> > >
>> > > ----- Original Message -----
>> > > > From: "Koen Vanoppen" <[email protected]>
>> > > > To: [email protected]
>> > > > Sent: Wednesday, April 2, 2014 4:07:19 PM
>> > > > Subject: [Users] HA
>> > > >
>> > > > Dear All,
>> > > >
>> > > > During our acceptance testing, we discovered something. (Document
>> > > > will follow.)
>> > > > When we disable one fibre path, no problem: multipath finds its way
>> > > > and no pings are lost.
>> > > > BUT when we disabled both fibre paths (so one of the storage
>> > > > domains is gone on this host, but still available on the other
>> > > > host), vms go into paused mode... It chooses a new SPM (can we
>> > > > speed this up?), puts the host in non-responsive (can we speed this
>> > > > up? more important), and the VMs stay in paused mode... I would
>> > > > expect that they would be migrated (yes, HA is
>> > >
>> > > i guess you mean the host moves to not-operational (in contrast to
>> > > non-responsive)?
>> > > if so, the engine will not migrate vms that are paused due to io
>> > > error, because of the data corruption risk.
>> > >
>> > > to speed up you can look at the storage domain monitoring timeout:
>> > > engine-config --get StorageDomainFalureTimeoutInMinutes
>> > >
>> > > > enabled) to the other host and rebooted there... Any solution? We
>> > > > are still using oVirt 3.3.1, but we are planning an upgrade to 3.4
>> > > > after the Easter holiday.
>> > > >
>> > > > Kind Regards,
>> > > >
>> > > > Koen
>> >
>> > Hi Koen,
>> > Resuming from paused due to io issues is supported (adding relevant
>> > folks).
>> > Regardless, if you did not define power management, you should manually
>> > approve that the source host was rebooted in order for the migration to
>> > proceed. Otherwise we risk a split-brain scenario.
>> >
>> > Doron
>> >
>> > _______________________________________________
>> > Users mailing list
>> > [email protected]
>> > http://lists.ovirt.org/mailman/listinfo/users
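Since working fencing is the recurring answer in this thread, it can help to verify power management outside the engine before relying on it. A hypothetical check, assuming the host has an IPMI-capable management board; the address and credentials below are placeholders:

```shell
# Sketch: exercise a host's power management directly with fence_ipmilan
# (from the fence-agents package). Address/login/password are placeholders.
fence_ipmilan -a 10.0.0.42 -l admin -p secret -o status

# If the status query works, a reboot should too (this really reboots the
# host, so only run it against a host you can afford to fence):
# fence_ipmilan -a 10.0.0.42 -l admin -p secret -o reboot
```

The engine's host power management dialog also has a Test button that performs a similar status check with the configured agent.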

