Re: [ovirt-users] [Users] HA

2014-04-11 Thread Koen Vanoppen
Hi All,

Any news about this? A VDSM hook or anything?
Thanks!

Kind regards


2014-04-09 9:37 GMT+02:00 Omer Frenkel ofren...@redhat.com:



 - Original Message -
  From: Koen Vanoppen vanoppen.k...@gmail.com
  To: users@ovirt.org
  Sent: Tuesday, April 8, 2014 3:41:02 PM
  Subject: Re: [Users] HA
 
  In other words, the SPM and the VM should move almost immediately after
  the storage connections on the hypervisor are gone. I know, maybe I'm asking
  too much, but we would be very happy :-) :-).
 
  A sketch:
 
  Mercury1 (SPM)
  Mercury2
 
  Mercury1 loses both fibre connections -> it goes non-operational and the VM
  goes into the paused state and stays that way until I manually reboot the
  host so that it gets fenced.
 
  What I would like is that when Mercury1 loses both fibre connections, it is
  fenced immediately, so the VMs are also moved almost instantly... if this is
  possible... :-)
 
  Kind regards and thanks for all the help!
 

 Michal, is there a vdsm hook for a VM moved to pause?
 If so, you could send KILL to it, and the engine will identify that the VM
 was killed and is HA, so it will be restarted. There is then no need to
 reboot the host; it will stay non-operational until the storage is fixed.
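Omer's suggestion could be sketched as a vdsm hook in shell. This is an illustration under assumptions, not a tested recipe: it assumes vdsm runs executables placed in `/usr/libexec/vdsm/hooks/after_vm_pause/` and exports the domain XML path in `_hook_domxml`, that `virsh -r` works read-only on the host, and that libvirt keeps the qemu PID in `/var/run/libvirt/qemu/<name>.pid`. The file name and the functions `domain_name`, `is_io_error_pause` and `kill_if_eio_paused` are hypothetical. It checks the libvirt pause reason so that a VM paused on purpose is left alone; as Michal points out later in the thread, `after_vm_pause` may not fire for an involuntary pause at all.

```shell
#!/bin/sh
# Hypothetical hook file: /usr/libexec/vdsm/hooks/after_vm_pause/50_kill_eio_paused
# Assumptions: vdsm sets _hook_domxml to a file holding the domain XML,
# `virsh -r` works read-only here, and libvirt writes the qemu PID to
# $QEMU_PID_DIR/<domain name>.pid.

QEMU_PID_DIR=${QEMU_PID_DIR:-/var/run/libvirt/qemu}

# Print the domain name (the first <name> element) from the XML file in $1.
domain_name() {
  sed -n 's:.*<name>\([^<]*\)</name>.*:\1:p' "$1" | head -n 1
}

# Succeed only when libvirt reports a pause caused by an I/O error.
is_io_error_pause() {
  case "$1" in
    "paused (I/O error)"*) return 0 ;;
    *) return 1 ;;
  esac
}

kill_if_eio_paused() {
  name=$(domain_name "$_hook_domxml")
  [ -n "$name" ] || return 0
  state=$(virsh -r domstate --reason "$name" 2>/dev/null)
  # Leave VMs that were paused on purpose alone.
  is_io_error_pause "$state" || return 0
  pidfile="$QEMU_PID_DIR/$name.pid"
  [ -r "$pidfile" ] || return 0
  # SIGKILL, as Omer suggests: the engine should then see an HA VM that
  # died unexpectedly and start it again on another host.
  kill -KILL "$(cat "$pidfile")"
}

# Only act when vdsm runs this file as a hook.
if [ -n "${_hook_domxml:-}" ]; then
  kill_if_eio_paused
fi
```

The hook file has to be executable on every host that should run it.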

 
 
  2014-04-08 14:26 GMT+02:00 Koen Vanoppen  vanoppen.k...@gmail.com  :
 
 
 
  OK,
  thanks already for all the help. I adapted some settings for a quicker
  response:
  engine-config --get FenceQuietTimeBetweenOperationsInSec -> 180
  engine-config --set FenceQuietTimeBetweenOperationsInSec=60
 
  engine-config --get StorageDomainFalureTimeoutInMinutes -> 180
  engine-config --set StorageDomainFalureTimeoutInMinutes=1
 
  engine-config --get SpmCommandFailOverRetries -> 5
  engine-config --set SpmCommandFailOverRetries
 
  engine-config --get SPMFailOverAttempts -> 3
  engine-config --set SPMFailOverAttempts=1
 
  engine-config --get NumberOfFailedRunsOnVds -> 3
  engine-config --set NumberOfFailedRunsOnVds=1
 
  engine-config --get vdsTimeout -> 180
  engine-config --set vdsTimeout=30
 
  engine-config --get VDSAttemptsToResetCount -> 2
  engine-config --set VDSAttemptsToResetCount=1
 
  engine-config --get TimeoutToResetVdsInSeconds -> 60
  engine-config --set TimeoutToResetVdsInSeconds=30
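The values above could be applied in one pass with a small script like the following. It is a sketch, not something posted in the thread: key names and values are copied from the list, `SpmCommandFailOverRetries` is left out because its new value is missing, and the last step restarts the engine because engine-config changes only take effect after `ovirt-engine` is restarted (`service ovirt-engine restart` assumes an EL6-style init system). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply the engine-config values from the list above.
# SpmCommandFailOverRetries is omitted because its new value is not given.
# Key names are copied exactly as written in the message.

SETTINGS="
FenceQuietTimeBetweenOperationsInSec=60
StorageDomainFalureTimeoutInMinutes=1
SPMFailOverAttempts=1
NumberOfFailedRunsOnVds=1
vdsTimeout=30
VDSAttemptsToResetCount=1
TimeoutToResetVdsInSeconds=30
"

# Print one engine-config command per setting (dry run).
print_commands() {
  for kv in $SETTINGS; do
    printf 'engine-config --set %s\n' "$kv"
  done
}

# Show each old value, set the new one, then restart the engine.
apply_settings() {
  for kv in $SETTINGS; do
    engine-config --get "${kv%%=*}"
    engine-config --set "$kv" || return 1
  done
  # engine-config changes are only picked up after an engine restart.
  service ovirt-engine restart
}

case "${1:-}" in
  --dry-run) print_commands ;;
  --apply) apply_settings ;;
esac
```

`--dry-run` only prints the commands; `--apply` runs them on the engine machine.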
 
  The result is that when the VM is not running on the SPM, it migrates
  before going into pause mode.
  But when we tried it with the VM running on the SPM, it goes into paused
  mode (for safety reasons, I know ;-) ) and stays there until the host gets
  MANUALLY fenced by rebooting it. So now my question is... How can I make the
  hypervisor fence (i.e. reboot, so the VM is moved) quicker?
 
  Kind regards,
 
  Koen
 
 
  2014-04-04 16:28 GMT+02:00 Koen Vanoppen  vanoppen.k...@gmail.com  :
 
 
 
 
 
  Yes, that's true. But I was driving... So I'm just forwarding it then :-).
  I have already adjusted the timeout. It was set to 5 min before it would
  time out. It's now at 2 min.
  On Apr 4, 2014 4:14 PM, David Van Zeebroeck 
  david.van.zeebro...@brusselsairport.be  wrote:
 
 
 
 
 
 
 
 
  I got them too, you know.
 
  But normally the fencing should have worked, from what I read.
 
  So something went wrong somewhere, by the looks of it.
 
 
 
  From: Koen Vanoppen [mailto: vanoppen.k...@gmail.com ]
  Sent: Friday, 4 April 2014 16:07
  To: David Van Zeebroeck
  Subject: Fwd: Re: [Users] HA
 
  -- Forwarded message --
  From: Michal Skrivanek michal.skriva...@redhat.com
  Date: Apr 4, 2014 3:39 PM
  Subject: Re: [Users] HA
  To: Koen Vanoppen vanoppen.k...@gmail.com
  Cc: ovirt-users Users users@ovirt.org
 
  On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
 
  Do you have power management configured?
 
  Was the failed host fenced/rebooted?
 
  On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen vanoppen.k...@gmail.com
  wrote:
 
 
  So... is a fully automatic migration of the VM to another hypervisor
  possible when the storage connection fails?
 
  How can we make this happen? At the moment, when we tested this
  situation, the VMs stayed in the paused state.
 
  Test situation:
 
  * Unplug the 2 fibre cables from the hypervisor
  * VMs go into the paused state
  * VMs stayed in the paused state until the failure was solved
 
  As said before, it's not safe, hence we (try to) not migrate them.
 
  They only get paused when they actually access the storage, which may not
  always be the case. I.e. the storage connection is severed, the host is
  deemed NonOperational and VMs are getting migrated from it; some of them
  will succeed if they didn't access that bad storage … the paused VMs will
  remain (mostly, it can still 

Re: [ovirt-users] [Users] HA

2014-04-11 Thread Michal Skrivanek

On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:

 Michal, is there a vdsm hook for a VM moved to pause?
 If so, you could send KILL to it, and the engine will identify that the VM
 was killed and is HA, so it will be restarted. There is then no need to
 reboot the host; it will stay non-operational until the storage is fixed.

You have to differentiate: if only the VMs are paused, yes, you can do
anything (including changing the error reporting policy so that the VM is
not paused). But if the host becomes non-operational, it simply doesn't
work: vdsm got stuck somewhere (often while getting block device stats).
A proper power management configuration should fence it.
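The "err reporting policy" Michal mentions corresponds to the `error_policy` attribute of a disk's `<driver>` element in the libvirt domain XML. A minimal sketch of a `before_vm_start` hook that switches it from `stop` (pause the VM on an I/O error) to `report` (hand the error to the guest), assuming vdsm generates `error_policy="stop"` for its disks, passes the XML path in `_hook_domxml` and starts the VM from the edited file; the hook file name and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical hook file: /usr/libexec/vdsm/hooks/before_vm_start/50_disk_error_report
# Assumes vdsm sets _hook_domxml to the domain XML file and starts the VM
# from the edited file. error_policy="stop" pauses the VM on a disk I/O
# error; "report" passes the error on to the guest instead.

# Rewrite error_policy="stop" (single or double quotes) to "report" in file $1.
set_error_policy_report() {
  sed "s/error_policy=\([\"']\)stop\1/error_policy=\1report\1/g" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Only act when vdsm runs this file as a hook.
if [ -n "${_hook_domxml:-}" ]; then
  set_error_policy_report "$_hook_domxml"
fi
```

The trade-off: the guest now sees the I/O errors itself and may remount its file systems read-only or lose writes, which is what the pause protects against. It also does nothing for a host that has gone non-operational.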

Thanks,
michal


Re: [ovirt-users] [Users] HA

2014-04-11 Thread Michal Skrivanek

On 11 Apr 2014, at 14:47, Koen Vanoppen wrote:

 The power management is configured correctly. And as long as the host that
 loses its storage isn't the SPM, there is no problem.

ah, I see

 If I can make it so that, when the VM is paused, it gets switched off and
 (the HA way) restarted, I'm perfectly happy :-).

I'm not entirely sure that the after_vm_pause() hook gets invoked in this case.
It was not intended for involuntary pauses… but give it a try! :)
Otherwise… well, you can always do a periodic query… not very effective, though.
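The periodic query could be a small cron job on each host, sketched below under assumptions: `virsh -r` works read-only, `virsh list --name --state-paused` and `virsh domstate --reason` are available, and libvirt keeps the qemu PID in `/var/run/libvirt/qemu/<name>.pid`. The script path in the comment and the function names are hypothetical. `--report` only prints VMs paused because of an I/O error; `--kill` sends them SIGKILL.

```shell
#!/bin/sh
# Hypothetical cron job, e.g. in /etc/crontab:
#   * * * * * root /usr/local/sbin/kill-eio-paused-vms --kill
# Assumes `virsh -r` works read-only and libvirt writes the qemu PID to
# $QEMU_PID_DIR/<domain name>.pid.

QEMU_PID_DIR=${QEMU_PID_DIR:-/var/run/libvirt/qemu}

# Succeed only for a pause caused by a storage I/O error.
is_io_error_pause() {
  case "$1" in
    "paused (I/O error)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk the paused domains; report, or SIGKILL, those paused on I/O errors.
check_paused_vms() {
  mode=$1
  virsh -r list --name --state-paused 2>/dev/null | while read -r name; do
    [ -n "$name" ] || continue
    # </dev/null keeps virsh from reading the loop's input.
    state=$(virsh -r domstate --reason "$name" </dev/null 2>/dev/null)
    is_io_error_pause "$state" || continue
    if [ "$mode" = "--kill" ]; then
      pidfile="$QEMU_PID_DIR/$name.pid"
      [ -r "$pidfile" ] && kill -KILL "$(cat "$pidfile")"
    else
      echo "$name: $state"
    fi
  done
}

case "${1:-}" in
  --report|--kill) check_paused_vms "$1" ;;
esac
```

Polling adds up to one interval of delay (a minute with cron) and runs `virsh` for every paused VM on every pass, which is part of why it is not very effective.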

Thanks,
michal

 
 Kind regards,
 
 