Re: [Users] two node ovirt cluster with HA

2014-01-30 Thread Eli Mesika


- Original Message -
 From: Tareq Alayan tala...@redhat.com
 To: d...@redhat.com, Karli Sjöberg karli.sjob...@slu.se
 Cc: users@ovirt.org
 Sent: Monday, January 27, 2014 2:43:29 PM
 Subject: Re: [Users] two node ovirt cluster with HA
 
 Hi,
 
 Power management makes use of special *dedicated* hardware in order to
 restart hosts independently of host OS. The engine connects to a power
 management device using a *dedicated* network IP address.
 The engine is capable of rebooting hosts that have entered a
 non-operational or non-responsive state,

Non-operational is related to storage issues, so the host will not be restarted
by PM in that case.

 The abilities provided by all power management devices are: check
 status, start, stop and recycle (restart)...

Only status, start and stop. Restart is implemented as a sequence: stop, wait
for status off, start, wait for status on.
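
The restart sequence described above can be sketched roughly as follows. This is
only an illustration: FakeAgent, wait_for_status, the retry count and the delay
are hypothetical names and values, not oVirt code or a real fence agent API.

```python
import time


class FakeAgent:
    """Hypothetical stand-in for a power management device (not an oVirt API)."""

    def __init__(self):
        self.power = "on"
        self.calls = []

    def status(self):
        self.calls.append("status")
        return self.power

    def start(self):
        self.calls.append("start")
        self.power = "on"

    def stop(self):
        self.calls.append("stop")
        self.power = "off"


def wait_for_status(agent, wanted, retries=5, delay=0.0):
    """Poll the agent until it reports the wanted power state."""
    for _ in range(retries):
        if agent.status() == wanted:
            return True
        time.sleep(delay)
    return False


def restart(agent):
    """Restart built only from the stop, start and status primitives."""
    agent.stop()
    if not wait_for_status(agent, "off"):
        return False  # host never reported off, so do not claim a restart
    agent.start()
    return wait_for_status(agent, "on")
```

The point of waiting for "off" before calling start is that a restart only
counts as confirmed once the device itself has reported both states.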

 
 In the case of non-responsive host: all of the VMs that are currently
 running on that host can also become non-responsive. However, the
 non-responsive host keeps locking the VM hard disk for all VMs it is
 running. Attempting to start a VM on a different host and assign the
 second host write privileges for the virtual machine hard disk image can
 cause data corruption.
 Rebooting allows the engine to assume that the lock on a VM hard disk
 image has been released.
 The engine can know for sure that the problematic host has been rebooted
 via the power management device and then it can start a VM from the
 problematic host on another host without risking data corruption.
 Important note: A virtual machine that has been marked highly-available
 can not be safely started on a different host without the certainty that
 doing so will not cause data corruption.
 
 N-joy,
 
 --Tareq
 
 
 

Re: [Users] two node ovirt cluster with HA

2014-01-29 Thread Eli Mesika


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: d...@redhat.com
 Cc: Tareq Alayan tala...@redhat.com, Eli Mesika emes...@redhat.com, 
 Karli Sjöberg karli.sjob...@slu.se,
 users@ovirt.org
 Sent: Tuesday, January 28, 2014 3:12:46 PM
 Subject: Re: [Users] two node ovirt cluster with HA
 
 On Tue, Jan 28, 2014 at 12:02 AM, Dafna Ron d...@redhat.com wrote:
 
  Andrew,
  Once this discussion is finished, and if what you would like done is not in the
  current implementation, can you please open a bug/feature request for it?
 
 
 Sure - I've opened a RFE here based on the current discussions
 https://bugzilla.redhat.com/show_bug.cgi?id=1058737 but I'm not sure which
 category it should be under.

I have assigned it to infra, thanks.
IMHO we should handle only the first scenario reported in this BZ.

 
 Cheers,
 Andrew.
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] two node ovirt cluster with HA

2014-01-29 Thread Tareq Alayan

Adding Eli.


On 01/27/2014 02:50 PM, Andrew Lau wrote:

Hi,

I think he was asking what if the power management device reported 
that the host was powered off. Then VMs should be brought back up as 
being off would essentially be the same as running a power cycle/reboot?


Another example I'm seeing is what happens if the whole host loses
power and its power management device then becomes unavailable (i.e.
not reachable). Then you're stuck in the case where it requires manual
intervention.


I would be interested to potentially see something like a timeout on
those problematic VMs (e.g. if nothing was read or written after x amount
of time), after which you could consider the host as offline? I guess
that adds a lot of risk, though.




Re: [Users] two node ovirt cluster with HA

2014-01-29 Thread Tareq Alayan

Hi,

Power management makes use of special *dedicated* hardware in order to
restart hosts independently of the host OS. The engine connects to a power
management device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a
non-operational or non-responsive state.
The abilities provided by all power management devices are: check
status, start, stop and recycle (restart)...


In the case of non-responsive host: all of the VMs that are currently 
running on that host can also become non-responsive. However, the 
non-responsive host keeps locking the VM hard disk for all VMs it is 
running. Attempting to start a VM on a different host and assign the 
second host write privileges for the virtual machine hard disk image can 
cause data corruption.
Rebooting allows the engine to assume that the lock on a VM hard disk 
image has been released.
The engine can know for sure that the problematic host has been rebooted 
via the power management device and then it can start a VM from the 
problematic host on another host without risking data corruption.
Important note: A virtual machine that has been marked highly-available 
can not be safely started on a different host without the certainty that 
doing so will not cause data corruption.
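
As a rough illustration of the rule above (write access to a disk image moves to
a second host only after the first host is known to have been rebooted), here is
a minimal sketch. DiskImage and start_vm_elsewhere are hypothetical names used
for the example only; they do not correspond to engine or VDSM code.

```python
class DiskImage:
    """Hypothetical model of a VM disk image that one host may write to."""

    def __init__(self, writer):
        self.writer = writer  # host currently holding write access


def start_vm_elsewhere(image, new_host, fence_confirmed):
    """Hand write access to new_host only once the old host was rebooted."""
    if not fence_confirmed:
        # The old host may still be running the VM and writing to the image.
        raise RuntimeError("old host may still hold the lock; refusing to start")
    image.writer = new_host
    return image.writer
```

Called with fence_confirmed=False the function refuses and leaves the image
untouched, which mirrors why the VMs stay in an unknown state without power
management.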


N-joy,

--Tareq




Re: [Users] two node ovirt cluster with HA

2014-01-28 Thread Karli Sjöberg


Sent from my iPhone

 On 27 Jan 2014, at 16:40, Eli Mesika emes...@redhat.com wrote:
 
 
 
 - Original Message -
 From: Tareq Alayan tala...@redhat.com
 To: Andrew Lau and...@andrewklau.com, Eli Mesika emes...@redhat.com
 Cc: d...@redhat.com, Karli Sjöberg karli.sjob...@slu.se, users@ovirt.org
 Sent: Monday, January 27, 2014 2:59:02 PM
 Subject: Re: [Users] two node ovirt cluster with HA
 
 Adding Eli.
 
 I just want to summarize the requirement as I understand it:
 
 In the case that a host that is running HA VMs and has PM configured is
 turned off manually:
 
 1) The non-responsive treatment should be modified to check the host status via
 the PM agent.
 2) If the host is off, HA VMs will attempt to run on another host ASAP.
 3) The host status should be set to DOWN.
 4) No attempt to restart vdsm (soft fencing) or restart the host (hard
 fencing) will be made.
 
 Is the above correct? If so, an RFE on that can be opened.

Spot on, that's exactly what I was trying to say! I'd very much like to see an 
RFE for that.

/K
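
The four steps summarized above could look roughly like this. It is a sketch of
the proposal only, not of any existing engine behavior: handle_non_responsive,
the host dictionary and the action strings are hypothetical, and pm_status is
assumed to be "on", "off", or None when the PM device cannot be reached.

```python
def handle_non_responsive(host, pm_status):
    """Sketch of the proposed treatment for a host that stopped answering.

    host      -- dict with at least a "status" key (hypothetical model)
    pm_status -- "on", "off", or None if the PM agent gave no answer
    """
    actions = []
    if pm_status == "off":
        # Step 2: the host is known to be off, so HA VMs can run elsewhere.
        actions.append("run HA VMs on another host")
        # Step 3: record the host as down.
        host["status"] = "DOWN"
        # Step 4: deliberately no soft fencing and no hard fencing here.
    else:
        # PM says "on" or is unreachable: keep the existing treatment.
        actions.append("existing non-responsive treatment")
    return actions
```

For example, handle_non_responsive({"status": "Up"}, "off") sets the status to
DOWN and returns only the HA restart action.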

 
 
 

Re: [Users] two node ovirt cluster with HA

2014-01-28 Thread Eli Mesika


- Original Message -
 From: Jaison peter urotr...@gmail.com
 To: Eli Mesika emes...@redhat.com
 Cc: users@ovirt.org, Tareq Alayan tala...@redhat.com
 Sent: Tuesday, January 28, 2014 7:33:35 AM
 Subject: Re: [Users] two node ovirt cluster with HA
 
 Thank you all for your valuable feedback .
 
 Can you please specify some of the supported fencing devices in ovirt ?

For oVirt 3.4 :

apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti
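
If you want to check a device type against that list in a script, a small
sketch could be the following. Only the agent names come from the list above;
the constant and function names are made up for the example.

```python
# Fence agent types listed above for oVirt 3.4.
SUPPORTED_FENCE_AGENTS_3_4 = frozenset(
    "apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,"
    "ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti".split(",")
)


def is_supported(agent_type):
    """Return True if agent_type appears in the 3.4 list (case-insensitive)."""
    return agent_type.strip().lower() in SUPPORTED_FENCE_AGENTS_3_4
```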

 
 

Re: [Users] two node ovirt cluster with HA

2014-01-28 Thread Jaison peter
Thanks !



Re: [Users] two node ovirt cluster with HA

2014-01-28 Thread Andrew Lau
On Tue, Jan 28, 2014 at 12:02 AM, Dafna Ron d...@redhat.com wrote:

 Andrew,
 Once this discussion is finished, and if what you would like done is not in the
 current implementation, can you please open a bug/feature request for it?


Sure - I've opened a RFE here based on the current discussions
https://bugzilla.redhat.com/show_bug.cgi?id=1058737 but I'm not sure which
category it should be under.

Cheers,
Andrew.


Re: [Users] two node ovirt cluster with HA

2014-01-28 Thread Dafna Ron

On 01/28/2014 01:12 PM, Andrew Lau wrote:
On Tue, Jan 28, 2014 at 12:02 AM, Dafna Ron d...@redhat.com wrote:


Andrew,
Once this discussion is finished, and if what you would like done is not
in the current implementation, can you please open a bug/feature
request for it?


Sure - I've opened a RFE here based on the current discussions 
https://bugzilla.redhat.com/show_bug.cgi?id=1058737 but I'm not sure 
which category it should be under.


Cheers,
Andrew.


Thanks Andrew! I really appreciate it :)



--
Dafna Ron


Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Dafna Ron

Powering off the host will never trigger VM migration.
As far as the engine is concerned, it has just lost its connection to the host
and has no way of telling whether the host is down or a router is down.
Since VMs can continue running on the host even if the engine has no access
to it, starting the VMs on the second host can cause split brain and
data corruption.

The engine learns what is going on by sending health check
queries to VDSM.
Power management will try to reboot a host when the health checks to
VDSM go unanswered.
So... if the engine gets no reply and has no way of rebooting the host, the
host status is changed to Non-Responsive and the VMs are marked
unknown, because the engine has no way of knowing what is happening with
them.
Since a reboot of the host kills the VMs running on it, this will
never cause any VM migration. However, together with the High-Availability VM
feature, some of the VMs can be restarted on the
second host after the host reboot (and only if Power Management
was confirmed as successful).
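
A rough sketch of that decision, under stated assumptions: the function name is
hypothetical, and the "Rebooting" and "Down" labels in the successful-fence
branch are illustrative placeholders rather than confirmed engine status names.
Only "Non-Responsive" and "Unknown" are taken from the description above.

```python
def on_health_check_failure(pm_configured, fence_succeeded):
    """Return (host_status, vm_status, may_restart_ha_vms).

    pm_configured   -- whether the engine has any way to reboot the host
    fence_succeeded -- whether the power management reboot was confirmed
    """
    if pm_configured and fence_succeeded:
        # The reboot killed the VMs, so HA VMs may be started elsewhere.
        return ("Rebooting", "Down", True)  # illustrative labels
    # No reply and no confirmed reboot: the engine cannot know the VM state.
    return ("Non-Responsive", "Unknown", False)
```

Note that nothing in this sketch is a migration: the only path that brings VMs
back is a restart after a confirmed reboot.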


VM migration is only triggered when:
1. The cluster configuration states that the VM should be migrated in case
of failure.
2. The engine has access to the host, so the failure is on the storage side
and not the host side.
3. The VMs are not actively writing (although there might be a new RFE
for it).
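
Put as a single predicate, the three conditions above read roughly like this.
The function and parameter names are hypothetical and only restate the list; it
is not how the engine is implemented.

```python
def migration_triggered(cluster_migrates_on_failure,
                        engine_can_reach_host,
                        vm_actively_writing):
    """True only when all three conditions from the list hold."""
    return (
        cluster_migrates_on_failure    # 1. cluster policy asks for migration
        and engine_can_reach_host      # 2. failure is on the storage side
        and not vm_actively_writing    # 3. VM is not actively writing
    )
```

A powered-off host fails condition 2, so
migration_triggered(True, False, False) is False, which matches what was
observed in the original test.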


hope this clears things up

Dafna



On 01/27/2014 10:11 AM, Andrew Lau wrote:


Hi,

Have you got power management enabled?

That's the fencing feature required for the engine to ensure that the 
host is actually offline. It won't resume any other VMs to prevent 
potential VM corruption (eg. VM running on multiple hosts).


Andrew.

On Jan 27, 2014 5:12 PM, Jaison peter urotr...@gmail.com 
mailto:urotr...@gmail.com wrote:


Hi all,

I was setting up a two-node oVirt cluster with the oVirt engine on a
separate node. I completed the configuration and tested VM live
migrations without any issues. Then, to check cluster HA, I
powered down one host and expected the VMs running on that host to be
migrated to the other one. But nothing happened: the engine detected the
host as unreachable and marked it as non-operational, and the VMs that ran on
that host went to 'unknown' state. Is it not possible to set up
a fully HA oVirt cluster with two nodes? Or is it a problem with my
configuration? Please advise.

Thanks & Regards

Alex

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users






--
Dafna Ron


Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Karli Sjöberg
On Mon, 2014-01-27 at 11:11 +, Dafna Ron wrote:
 Powering off the host will never trigger vm migration.
 As far as engine is concerned it just lost connection to the host, but 
 has no way of telling if the host is down or if a router is down.

Can't it at least check with power management whether the Host status
is down first?

I mean, if the network is down there will be no response from either PM
or Host. But if PM is up and can tell you that the Host is down, that
sounds rather clear cut to me...

Seems to me the VMs would be restarted sooner if the flow was altered
to first check with PM whether it's a network or Host issue and, if it's
a Host issue, immediately restart the VMs on another Host, instead of
waiting for a potentially problematic Host to boot up eventually.

/K




-- 

Best regards

---
Karli Sjöberg
Swedish University of Agricultural Sciences Box 7079 (Visiting Address
Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjob...@slu.se


Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Dafna Ron

I am adding Tareq for the Power Management implementation.

Dafna


--
Dafna Ron


Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Andrew Lau
Hi,

I think he was asking what happens if the power management device
reports that the host is powered off. Shouldn't the VMs then be brought
back up, since a host that is off is essentially in the same state as
one that has just been power cycled/rebooted?

Another example I'm seeing: if the whole host loses power and its power
management device then becomes unavailable (i.e. not reachable), you're
stuck in a case that requires manual intervention.

I would be interested to potentially see something like a timeout on
those problematic VMs (e.g. if nothing was read or written after x
amount of time), after which you could consider the host offline? I
guess that adds a lot of risk, though.



Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Dafna Ron

Andrew,
Once this discussion is finished, if what you would like done is not in
the current implementation, can you please open a bug/feature request
for it?


Thanks,

Dafna


Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Eli Mesika


- Original Message -
 From: Tareq Alayan tala...@redhat.com
 To: Andrew Lau and...@andrewklau.com, Eli Mesika emes...@redhat.com
 Cc: d...@redhat.com, Karli Sjöberg karli.sjob...@slu.se, users@ovirt.org
 Sent: Monday, January 27, 2014 2:59:02 PM
 Subject: Re: [Users] two node ovirt cluster with HA
 
 Adding Eli.

I just want to summarize the requirement as I understand it:

In the case that a Host that is running HA VMs and has PM configured is
turned off manually:

1) The non-responsive treatment should be modified to check Host status
via the PM agent
2) If the Host is off, HA VMs will attempt to run on another host ASAP
3) The host status should be set to DOWN
4) No attempt to restart vdsm (soft fencing) or restart the host (hard
fencing) will be done

Is the above correct? If so, an RFE for it can be opened.
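
The four points above could be sketched roughly as follows, in Python.
This is an illustration of the proposal only, not existing engine
behaviour: PowerStatus and handle_non_responsive() are hypothetical
names, the returned strings are just labels, and the "host still
powered on" branch (soft fencing first, then hard fencing) is an
assumption about the current flow rather than something spelled out in
this thread.

```python
from enum import Enum
from typing import List, Optional


class PowerStatus(Enum):
    ON = "on"
    OFF = "off"


def handle_non_responsive(pm_status: Optional[PowerStatus]) -> List[str]:
    """Return the ordered steps for a host that stopped answering.

    pm_status is what the PM agent reports for the host, or None when
    the PM device itself cannot be reached.
    """
    if pm_status is PowerStatus.OFF:
        # Points 2-4: the host is known to be off, so no fencing is needed.
        return [
            "set host status to DOWN",
            "start HA VMs on another host",
        ]
    if pm_status is PowerStatus.ON:
        # Assumed current flow: try the cheap fix, then reboot via PM.
        return [
            "soft fencing: restart vdsm",
            "hard fencing: restart the host via PM",
            "start HA VMs on another host once the restart is confirmed",
        ]
    # PM unreachable: nothing proves the host is down.
    return ["keep host Non-Responsive", "wait for manual intervention"]
```

The last branch is the case where the host and its PM device lose power
together: the engine cannot tell that apart from a network outage, so
it cannot safely restart the HA VMs.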

 
 

Re: [Users] two node ovirt cluster with HA

2014-01-27 Thread Jaison peter
Thank you all for your valuable feedback.

Can you please specify some of the supported fencing devices in oVirt?

