On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:

> Hi All,
>
> Any news about this? A VDSM hook or anything?
> Thanx!
>
> Kind regards
>
> 2014-04-09 9:37 GMT+02:00 Omer Frenkel <ofren...@redhat.com>:
>
>> ----- Original Message -----
>>> From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>>> To: users@ovirt.org
>>> Sent: Tuesday, April 8, 2014 3:41:02 PM
>>> Subject: Re: [Users] HA
>>>
>>> In other words: the SPM and the VMs should move almost immediately
>>> after the storage connections on the hypervisor are gone. I know I
>>> may be asking too much, but we would be very happy :-) :-).
>>>
>>> A sketch:
>>>
>>> Mercury1 (SPM)
>>> Mercury2
>>>
>>> Mercury1 loses both fibre connections --> it goes non-operational,
>>> and the VM goes into paused state and stays that way until I
>>> manually reboot the host so it fences.
>>>
>>> What I would like is that when Mercury1 loses both fibre
>>> connections, it fences immediately, so the VMs are also moved
>>> almost instantly... if this is possible :-)
>>>
>>> Kind regards and thanks for all the help!
>>
>> Michal, is there a VDSM hook for a VM moved to pause?
>> If so, you could send KILL to it; the engine will identify that an HA
>> VM was killed, so it will be restarted, and there is no need to
>> reboot the host; it will stay non-operational until the storage is
>> fixed.
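Omer's suggestion above (send KILL to a paused HA VM so the engine restarts it elsewhere) could be sketched as a hook script. To be clear: whether VDSM exposes a pause hook point at all is exactly the open question in this thread, and the environment variable names used below (`pause_reason`, `vm_ha`, `qemu_pid`) are invented for illustration, not real VDSM API.

```shell
#!/bin/sh
# Hypothetical VDSM hook sketch: kill a paused HA guest so the engine
# sees the VM as down and, because it is HA, restarts it on another
# host. The hook point, env vars, and HA marker are ASSUMPTIONS.

# Decide whether to kill: only HA guests paused on an I/O error;
# guests paused for other reasons (user, migration) are left alone.
should_kill() {
    reason="$1"; ha="$2"
    case "$reason" in
        EIO|ENOSPC)
            if [ "$ha" = "true" ]; then echo yes; else echo no; fi
            ;;
        *) echo no ;;
    esac
}

# Hook body (assumed env vars supplied by the hypothetical hook point).
if [ "$(should_kill "${pause_reason:-}" "${vm_ha:-}")" = "yes" ] \
   && [ -n "${qemu_pid:-}" ]; then
    # SIGKILL makes the engine treat the VM as crashed; with HA enabled
    # it schedules a restart on another host.
    kill -KILL "$qemu_pid"
fi
```

The decision is kept in a separate function so that the "which pauses are safe to kill" policy can be tested and changed independently of the kill action.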
You have to differentiate: if only the VMs were paused, then yes, you
can do anything (you can also change the error-reporting policy so the
VM is not paused). But if the host becomes non-operational, then it
simply doesn't work; vdsm got stuck somewhere (often in getting block
device stats), and a proper power management configuration should
fence it.

Thanks,
michal

>> 2014-04-08 14:26 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> Ok,
>> Thanx already for all the help. I adapted some things for a quicker
>> response:
>>
>> engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
>> engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>>
>> engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
>> engine-config --set StorageDomainFalureTimeoutInMinutes=1
>>
>> engine-config --get SpmCommandFailOverRetries --> 5
>> engine-config --set SpmCommandFailOverRetries
>>
>> engine-config --get SPMFailOverAttempts --> 3
>> engine-config --set SPMFailOverAttempts=1
>>
>> engine-config --get NumberOfFailedRunsOnVds --> 3
>> engine-config --set NumberOfFailedRunsOnVds=1
>>
>> engine-config --get vdsTimeout --> 180
>> engine-config --set vdsTimeout=30
>>
>> engine-config --get VDSAttemptsToResetCount --> 2
>> engine-config --set VDSAttemptsToResetCount=1
>>
>> engine-config --get TimeoutToResetVdsInSeconds --> 60
>> engine-config --set TimeoutToResetVdsInSeconds=30
>>
>> The result is that when the VM is not running on the SPM, it
>> migrates before going into pause mode. But when we tried it with the
>> VM running on the SPM, it goes into paused mode (for safety reasons,
>> I know ;-) ) and stays there until the host is MANUALLY fenced by
>> rebooting it. So now my question is: how can I make the hypervisor
>> fence (i.e. reboot, so the VM is moved) quicker?
>>
>> Kind regards,
>>
>> Koen
>>
>> 2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> Yes, that's true.
>> But I was driving... so I'll just forward it then :-). I have
>> already adjusted the timeout: it was set to 5 min before it would
>> time out, and it is now at 2 min.
>>
>> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck"
>> <david.van.zeebro...@brusselsairport.be> wrote:
>>
>>> I have them too, you know.
>>> But normally the fencing should have worked, from what I read here.
>>> So something went wrong somewhere, by the looks of it.
>>>
>>> From: Koen Vanoppen [mailto: vanoppen.k...@gmail.com ]
>>> Sent: Friday 4 April 2014 16:07
>>> To: David Van Zeebroeck
>>> Subject: Fwd: Re: [Users] HA
>>>
>>> David Van Zeebroeck
>>> Product Manager Unix Infrastructure
>>> Brussels Airport Company
>>
>> ---------- Forwarded message ----------
>> From: "Michal Skrivanek" <michal.skriva...@redhat.com>
>> Date: Apr 4, 2014 3:39 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> Cc: "ovirt-users Users" <users@ovirt.org>
>>
>>> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>>>
>>>> Do you have power management configured?
>>>> Was the "failed" host fenced/rebooted?
>>>>
>>>> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen
>>>> <vanoppen.k...@gmail.com> wrote:
>>>>
>>>> So... is it possible to fully automatically migrate the VM to
>>>> another hypervisor in case the storage connection fails? How can
>>>> we make this happen? Because for the moment, when we tested the
>>>> situation, they stayed in pause state.
>>>> (Test situation:
>>>> * Unplug the 2 fibre cables from the hypervisor
>>>> * VMs go into pause state
>>>> * VMs stayed in pause state until the failure was solved
>>
>> As said before, it's not safe, hence we (try to) not migrate them.
>> They only get paused when they actually access the storage, which
>> may not always be the case. I.e. the storage connection is severed,
>> the host is deemed NonOperational and VMs are getting migrated off
>> it; some of them will then succeed if they didn't access that "bad"
>> storage ... the paused VMs will remain (mostly; it can still happen
>> that they appear paused on the other host when the disk access
>> occurs only at the last stage of migration).
>>
>> So in other words: if you want to migrate the VMs without
>> interruption, it is sometimes not possible. If you are fine with the
>> VMs being restarted on another host after a short time, then power
>> management/fencing will help here.
>>
>> Thanks,
>> michal
>>
>>>> )
>>>>
>>>> They only returned when we restored the fibre connection to the
>>>> hypervisor...
>>
>> Yes, since 3.3 we have the autoresume feature.
>>
>> Thanks,
>> michal
>>
>>>> Kind Regards,
>>>>
>>>> Koen
>>>>
>>>> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>>>
>>>> So... is it possible to fully automatically migrate the VM to
>>>> another hypervisor in case the storage connection fails? How can
>>>> we make this happen? Because for the moment, when we tested the
>>>> situation, they stayed in pause state.
>>>> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: "Doron Fediuck" <dfedi...@redhat.com>
>>>> Date: Apr 3, 2014 4:51 PM
>>>> Subject: Re: [Users] HA
>>>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>>>> Cc: "Omer Frenkel" <ofren...@redhat.com>, <users@ovirt.org>,
>>>> "Federico Simoncelli" <fsimo...@redhat.com>, "Allon Mureinik"
>>>> <amure...@redhat.com>
>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>>>>>> To: "Omer Frenkel" <ofren...@redhat.com>, users@ovirt.org
>>>>>> Sent: Wednesday, April 2, 2014 4:17:36 PM
>>>>>> Subject: Re: [Users] HA
>>>>>>
>>>>>> Yes, indeed, I meant not-operational. Sorry.
>>>>>> So, if I understand this correctly: whenever we get into a
>>>>>> situation where we lose both storage connections on our
>>>>>> hypervisor, we will have to manually restore the connections
>>>>>> first?
>>>>>>
>>>>>> And thanx for the tip for speeding up things :-).
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Koen
>>>>>>
>>>>>> 2014-04-02 15:14 GMT+02:00 Omer Frenkel <ofren...@redhat.com>:
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>>>>>>>> To: users@ovirt.org
>>>>>>>> Sent: Wednesday, April 2, 2014 4:07:19 PM
>>>>>>>> Subject: [Users] HA
>>>>>>>>
>>>>>>>> Dear All,
>>>>>>>>
>>>>>>>> During our acceptance testing, we discovered something (a
>>>>>>>> document will follow). When we disable one fibre path, no
>>>>>>>> problem: multipath finds its way and no pings are lost.
>>>>>>>> BUT when we disabled both fibre paths (so one of the storage
>>>>>>>> domains is gone on this host, but still available on the other
>>>>>>>> host), VMs go into paused mode... It chooses a new SPM (can we
>>>>>>>> speed this up?), puts the host in non-responsive (can we speed
>>>>>>>> this up? more important) and the VMs stay in paused mode... I
>>>>>>>> would expect that they would be migrated (yes, HA is
>>>>>>>
>>>>>>> I guess you mean the host moves to not-operational (in contrast
>>>>>>> to non-responsive)? If so, the engine will not migrate VMs that
>>>>>>> are paused due to IO error, because of the data corruption
>>>>>>> risk.
>>>>>>>
>>>>>>> To speed things up you can look at the storage domain
>>>>>>> monitoring timeout:
>>>>>>> engine-config --get StorageDomainFalureTimeoutInMinutes
>>>>>>>
>>>>>>>> enabled) to the other host and reboot there... Any solution?
>>>>>>>> We are still using oVirt 3.3.1, but we are planning an upgrade
>>>>>>>> to 3.4 after the Easter holiday.
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>>
>>>>>>>> Koen
>>>>>
>>>>> Hi Koen,
>>>>> Resuming from paused due to IO issues is supported (adding
>>>>> relevant folks). Regardless, if you did not define power
>>>>> management, you should manually approve that the source host was
>>>>> rebooted in order for migration to proceed. Otherwise we risk a
>>>>> split-brain scenario.
>>>>> Doron

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
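For readers skimming the thread: the engine-config tuning Koen posted above can be collected into a single script. This is a sketch of what was posted, not a recommendation; the values trade robustness for reaction speed, must be run on the engine host as root, and only take effect after an engine restart. `SpmCommandFailOverRetries` is omitted because the thread does not show the value it was set to, and the `StorageDomainFalureTimeoutInMinutes` spelling is kept exactly as it appears in the thread.

```shell
#!/bin/sh
# Sketch: apply the fencing/monitoring tuning from this thread in one
# go, on the oVirt engine host (3.3/3.4 era). Review each value first.

engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --set StorageDomainFalureTimeoutInMinutes=1  # key spelled as in the thread
engine-config --set SPMFailOverAttempts=1
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --set vdsTimeout=30
engine-config --set VDSAttemptsToResetCount=1
engine-config --set TimeoutToResetVdsInSeconds=30

# engine-config changes only take effect after an engine restart
service ovirt-engine restart
```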
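A closing note on the pause behaviour discussed above: how quickly a guest sees the I/O error when both fibre paths drop is governed on the host side by multipath's `no_path_retry` setting. The fragment below is a hypothetical `/etc/multipath.conf` sketch with illustrative values only; the right settings depend on the storage array vendor's recommendations.

```
# Hypothetical /etc/multipath.conf fragment -- illustrative values only.
defaults {
    # With "no_path_retry queue", I/O queues forever when all paths
    # are lost and guests simply hang. With a finite retry count, the
    # I/O fails after roughly no_path_retry x polling_interval
    # seconds, QEMU pauses the guest on the error, and oVirt >= 3.3
    # can auto-resume it once the paths return.
    polling_interval 5
    no_path_retry 12
}
```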