Re: [ovirt-users] [Users] HA
Hi All, any news about this? A vdsm hook or anything? Thanks! Kind regards

2014-04-09 9:37 GMT+02:00 Omer Frenkel ofren...@redhat.com:

----- Original Message -----
From: Koen Vanoppen vanoppen.k...@gmail.com
To: users@ovirt.org
Sent: Tuesday, April 8, 2014 3:41:02 PM
Subject: Re: [Users] HA

In other words: the SPM and the VM should move almost immediately after the storage connections on the hypervisor are gone. I know I may be asking too much, but we would be very happy :-) :-).

A sketch of the situation: Mercury1 (the SPM) and Mercury2. Mercury1 loses both fibre connections, goes non-operational, and the VM goes into paused state and stays that way until I manually reboot the host so it fences. What I would like is that when Mercury1 loses both fibre connections, it fences immediately, so the VMs are also moved almost instantly... if this is possible :-). Kind regards and thanks for all the help!

[Omer Frenkel:] Michal, is there a vdsm hook for a VM moving to paused state? If so, you could send KILL to the VM; the engine will identify that an HA VM was killed, so it will be restarted, and there is no need to reboot the host. The host will stay non-operational until the storage is fixed.

2014-04-08 14:26 GMT+02:00 Koen Vanoppen vanoppen.k...@gmail.com:

Ok, thanks already for all the help.
I adapted some things for a quicker response:

engine-config --get FenceQuietTimeBetweenOperationsInSec   (was 180)
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --get StorageDomainFalureTimeoutInMinutes   (was 180)
engine-config --set StorageDomainFalureTimeoutInMinutes=1
engine-config --get SpmCommandFailOverRetries   (was 5)
engine-config --set SpmCommandFailOverRetries
engine-config --get SPMFailOverAttempts   (was 3)
engine-config --set SPMFailOverAttempts=1
engine-config --get NumberOfFailedRunsOnVds   (was 3)
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --get vdsTimeout   (was 180)
engine-config --set vdsTimeout=30
engine-config --get VDSAttemptsToResetCount   (was 2)
engine-config --set VDSAttemptsToResetCount=1
engine-config --get TimeoutToResetVdsInSeconds   (was 60)
engine-config --set TimeoutToResetVdsInSeconds=30

Now the result is that when the VM is not running on the SPM, it migrates before going into pause mode. But when we tried it with the VM running on the SPM, it goes into paused mode (for safety reasons, I know ;-) ) and stays there until the host is MANUALLY fenced by rebooting it. So now my question is: how can I make the hypervisor fence (i.e. reboot, so the VM is moved) more quickly? Kind regards, Koen

2014-04-04 16:28 GMT+02:00 Koen Vanoppen vanoppen.k...@gmail.com:

Yes, that's true, but I was driving... so I'm just forwarding it :-). I have already adjusted the timeout; it was set to 5 minutes before it would time out.
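The engine-config changes above follow a fixed --get/--set pattern, so they can be generated and reviewed before being run on the engine host. The sketch below assumes nothing beyond the key names quoted in the message (including the historical "Falure" spelling, copied verbatim); it only prints the commands, it does not run them:

```python
# Sketch: generate the engine-config tuning commands from the thread so
# they can be reviewed before running them on the oVirt engine host.
# Key names are taken verbatim from the message above.
TUNING = {
    "FenceQuietTimeBetweenOperationsInSec": 60,
    "StorageDomainFalureTimeoutInMinutes": 1,
    "SPMFailOverAttempts": 1,
    "NumberOfFailedRunsOnVds": 1,
    "vdsTimeout": 30,
    "VDSAttemptsToResetCount": 1,
    "TimeoutToResetVdsInSeconds": 30,
}

def engine_config_cmds(tuning):
    """Return the --get/--set command pair for each key, in key order."""
    cmds = []
    for key, value in sorted(tuning.items()):
        cmds.append("engine-config --get %s" % key)       # show current value
        cmds.append("engine-config --set %s=%s" % (key, value))
    return cmds

if __name__ == "__main__":
    print("\n".join(engine_config_cmds(TUNING)))
```

SpmCommandFailOverRetries is deliberately left out of the dict, since the original message shows no value for its --set.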
It is now set to 2 minutes.

On Apr 4, 2014 4:14 PM, David Van Zeebroeck david.van.zeebro...@brusselsairport.be wrote:

I have them too. But normally the fencing should have worked, from what I read here, so something went wrong somewhere.

From: Koen Vanoppen [mailto: vanoppen.k...@gmail.com ]
Sent: Friday 4 April 2014 16:07
To: David Van Zeebroeck
Subject: Fwd: Re: [Users] HA

David Van Zeebroeck
Product Manager Unix Infrastructure
Information Communication Technology
Brussels Airport Company
T +32 (0)2 753 66 24
M +32 (0)497 02 17 31
david.van.zeebro...@brusselsairport.be
www.brusselsairport.be

---------- Forwarded message ----------
From: Michal Skrivanek michal.skriva...@redhat.com
Date: Apr 4, 2014 3:39 PM
Subject: Re: [Users] HA
To: Koen Vanoppen vanoppen.k...@gmail.com
Cc: ovirt-users Users users@ovirt.org

On 4 Apr 2014, at 15:14, Sander Grendelman wrote:

Do you have power management configured? Was the failed host fenced/rebooted?

On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen vanoppen.k...@gmail.com wrote:

So... is a fully automatic migration of the VM to another hypervisor possible in case the storage connection fails? How can we make this happen? Because for the moment, when we tested the situation, the VMs stayed in paused state. (Test situation: unplug the 2 fibre cables from the hypervisor; the VMs go into paused state; the VMs stayed in paused state until the failure was solved.)

[Michal Skrivanek:] As said before, it's not safe, hence we (try to) not migrate them. They only get paused when they actually access the storage, which may not always be the case. I.e. the storage connection is severed, the host is deemed NonOperational and the VMs are getting migrated away from it; some of them will succeed if they didn't access that bad storage... the paused VMs will remain (mostly; it can still
On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:

Hi All, any news about this? A vdsm hook or anything? Thanks! Kind regards

[...]

[Michal Skrivanek:] You have to differentiate. If only the VMs were paused, yes, you can do anything (also change the error reporting policy to not pause the VM). But if the host becomes non-operational, then it simply doesn't work: vdsm got stuck somewhere (often in getting block device stats). A proper power management config should fence it.

Thanks,
michal
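Omer's suggestion earlier in the thread (send KILL to a paused HA VM so the engine sees it as killed and restarts it) could be tried as a vdsm hook. The sketch below is an untested illustration, not a known-good hook: it assumes vdsm exports the VM UUID to hook scripts in the vmId environment variable and that the script would be dropped into the after_vm_pause hook directory; only the pid lookup is shown in full.

```python
# Hypothetical vdsm after_vm_pause hook sketch: find the qemu process
# backing the paused VM and SIGKILL it, so the engine restarts the HA VM
# on another host. Assumes vdsm passes the VM UUID via the vmId env var.
import os
import signal

def find_qemu_pid(vm_uuid, proc_entries):
    """Return the pid whose qemu command line names vm_uuid, or None.

    proc_entries is an iterable of (pid, cmdline) pairs, e.g. gathered
    from /proc/*/cmdline.
    """
    for pid, cmdline in proc_entries:
        if "qemu" in cmdline and vm_uuid in cmdline:
            return pid
    return None

def scan_proc():
    """Collect (pid, cmdline) pairs for all processes from /proc."""
    entries = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % name, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
            entries.append((int(name), cmdline))
        except OSError:
            continue  # process exited while scanning
    return entries

if __name__ == "__main__":
    vm_uuid = os.environ.get("vmId")
    if vm_uuid:
        pid = find_qemu_pid(vm_uuid, scan_proc())
        if pid is not None:
            os.kill(pid, signal.SIGKILL)
```

As Michal notes below, it is not certain this hook point fires for involuntary pauses at all, so this would need to be verified on a test host first.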
On 11 Apr 2014, at 14:47, Koen Vanoppen wrote:

Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.

[Michal Skrivanek:] Ah, I see.

If I can make it so that, when the VM is paused, it gets switched off and (the HA way) reboots itself, I'm perfectly happy :-).

[Michal Skrivanek:] I'm not entirely sure that the after_vm_pause() hook gets invoked in this case. It was not intended for involuntary pauses... but give it a try! :) Otherwise... well, you can always do a periodic query, though that is not very effective.

Thanks,
michal
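Michal's fallback idea, the periodic query, would amount to polling the engine for paused HA VMs and restarting them. The filtering step can be sketched as below; the dict shape is a hypothetical stand-in for a parsed response from the engine's REST API (actual paths and authentication depend on the engine version and are omitted):

```python
# Sketch of the "periodic query" fallback: given VM status records
# polled from the engine, pick out the paused, highly-available VMs
# that a watcher script would then restart.

def paused_ha_vms(vms):
    """Return the names of VMs that are paused and marked HA.

    vms is a list of dicts like {"name": ..., "status": ..., "ha": bool},
    e.g. built from the engine's VM listing.
    """
    return [vm["name"] for vm in vms
            if vm.get("status") == "paused" and vm.get("ha")]
```

A cron job or loop would run this against the polled VM list every few seconds and issue a start for each returned name; as Michal says, this polling approach is not very effective compared to proper fencing.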