Re: [ovirt-users] Decrease downtime for HA

2018-04-25 Thread Eli Mesika
On Mon, Apr 23, 2018 at 8:06 PM, Michal Skrivanek <
michal.skriva...@redhat.com> wrote:

>
>
> On 23 Apr 2018, at 10:52, Daniel Menzel 
> wrote:
>
> Hi Michal,
>
> in your last mail you wrote that the values can be turned down - how can
> this be done?
AFAIK, there is no point in changing the fencing vdc_options values in that
case (assuming no kdump is configured here...).

The fencing mechanism puts the host into the "connecting" state for a grace
period that depends on the number of VMs it is running and on whether it
serves as SPM.
While the host is non-responsive, we first try a soft fence (restarting VDSM
via SSH), which also takes time.
After that, if the soft fence fails, the host is rebooted via the fencing
script, and how long that takes depends entirely on the host.
If you have something to look at, it is your host's reboot time: the faster
the host reboots, the shorter the whole process.
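The grace-period logic described above can be sketched as a small calculation. This is an illustrative model only: the option names in the comments (TimeoutToResetVdsInSeconds, DelayResetPerVmInSeconds, DelayResetForSpmInSeconds) and the default values are assumptions to verify against `engine-config -l` on your installation.

```python
# Hedged sketch of how the engine's fencing grace period grows with load.
# Option names and defaults below are assumptions -- check them with
# `engine-config -g <option>` on your engine.

def fencing_grace_period(num_running_vms: int, is_spm: bool,
                         base_timeout: float = 60.0,   # TimeoutToResetVdsInSeconds (assumed)
                         delay_per_vm: float = 0.5,    # DelayResetPerVmInSeconds (assumed)
                         delay_for_spm: float = 20.0   # DelayResetForSpmInSeconds (assumed)
                         ) -> float:
    """Seconds a non-responsive host waits in 'connecting' before hard fencing."""
    grace = base_timeout + num_running_vms * delay_per_vm
    if is_spm:
        grace += delay_for_spm
    return grace

# An idle host is fenced sooner than a loaded SPM host:
print(fencing_grace_period(0, False))   # 60.0
print(fencing_grace_period(40, True))   # 100.0
```

The point of the model: the wait scales with the number of running VMs and the SPM role, so two hosts in the same cluster can see noticeably different failover times.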

Regards

Eli

​

Re: [ovirt-users] Decrease downtime for HA

2018-04-23 Thread Michal Skrivanek


> On 23 Apr 2018, at 10:52, Daniel Menzel  
> wrote:
> 
> Hi Michal,
> 
> in your last mail you wrote that the values can be turned down - how can 
> this be done?
> 
> 

This is not something we change very often, as it decreases the system's 
tolerance to short network glitches.
You'd have to take a look at vdc_options and play with some of those 
parameters. Martin or Eli may have suggestions; otherwise you'd have to read 
the source code and experiment.
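As a hedged illustration of what "play with vdc_options" looks like in practice: `engine-config` is the supported tool for reading and changing these values. The specific option names below are assumptions; verify them first with `engine-config -l` on your version.

```shell
# List all configurable options and look for fencing/timeout-related keys.
engine-config -l | grep -i -e fence -e timeout -e reset

# Inspect current values (option names are assumptions -- verify with -l first).
engine-config -g TimeoutToResetVdsInSeconds
engine-config -g DelayResetPerVmInSeconds

# Lower a value, then restart the engine so the change takes effect.
engine-config -s TimeoutToResetVdsInSeconds=30
systemctl restart ovirt-engine
```

As Michal notes, any value you lower here trades failover speed against tolerance for short network glitches, so test on a non-production cluster first.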

Re: [ovirt-users] Decrease downtime for HA

2018-04-23 Thread Daniel Menzel

Hi Michal,

in your last mail you wrote that the values can be turned down - how 
can this be done?


Best
Daniel




Re: [ovirt-users] Decrease downtime for HA

2018-04-12 Thread Michal Skrivanek


> On 12 Apr 2018, at 13:13, Daniel Menzel  
> wrote:
> 
> Hi there,
> 
> does anyone have an idea how to decrease a virtual machine's downtime?
> 
> Best
> Daniel
> 
> On 06.04.2018 13:34, Daniel Menzel wrote:
>> Hi Michal,
>> 
>> 

Hi Daniel,
adding Martin to review fencing behavior
>> (sorry for misspelling your name in my first mail).
>> 
>> 

that’s not the reason I’m replying late!:-))

>> The settings for the VMs are the following (oVirt 4.2):
>> 
>> HA checkbox enabled of course
>> "Target Storage Domain for VM Lease" -> left empty

if you need faster reactions then try to use VM Leases as well, it won’t make a 
difference in this case but will help in case of network issues. E.g. if you 
use iSCSI and the storage connection breaks while host connection still works 
it would restart the VM in about 80s; otherwise it would take >5 mins. 
>> "Resume Behavior" -> AUTO_RESUME
>> Priority for Migration -> High
>> "Watchdog Model" -> No-Watchdog
>> For testing we did not kill any VM but the host. So basically we simulated 
>> an instantaneous crash by manually turning the machine off via 
>> IPMI-Interface (not via operating system!) and ping the guest(s). What 
>> happens then?
>> 
>> 2-3 seconds after we press the host's shutdown button we lose ping 
>> contact to the VM(s).
>> After another 20s oVirt changes the host's status to "connecting", the VM's 
>> status is set to a question mark.
>> After ~1:30 the host is flagged to "non responsive”

that sounds about right. Now fencing action should have been initiated, if you 
can share the engine logs we can confirm that. IIRC we first try soft fencing - 
try to ssh to that host, that might take some time to time out I guess. Martin?
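As an illustrative sketch of what soft fencing amounts to (the exact command the engine runs may differ; `vdsmd` is assumed here as the VDSM service name on the host):

```shell
# Soft fencing is roughly equivalent to restarting VDSM over SSH --
# a sketch, not the engine's exact command line.
ssh root@failed-host 'systemctl restart vdsmd'

# If the daemon comes back, the host recovers without a power cycle.
# Only if this fails does the engine proceed to reboot the host via its
# fence agent (iLO/iDRAC/IPMI), which is where most of the remaining
# downtime comes from.
```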
>> After ~2:10 the host's reboot is initiated by oVirt, 5-10s later the guest 
>> is back online.
>> So, there seems to be one mistake I made in the first mail: The downtime is 
>> "only" 2.5min. But still I think this time can be decreased as for some 
>> services it is still quite a long time.
>> 
>> 

these values can be tuned down, but then you may be more susceptible to fencing 
power cycling a host in case of shorter network outages. It may be ok…depending 
on your requirements.
>> Best
>> Daniel
>> 
>> On 06.04.2018 12:49, Michal Skrivanek wrote:
 On 6 Apr 2018, at 12:45, Daniel Menzel  
  wrote:
 
 Hi Michael,
 thanks for your mail. Sorry, I forgot to write that. Yes, we have power 
 management and fencing enabled on all hosts. We also tested this and found 
 out that it works perfectly. So this cannot be the reason I guess.
>>> Hi Daniel,
>>> ok, then it’s worth looking into details. Can you describe in more detail 
>>> what happens? What exact settings you’re using for such VM? Are you killing 
>>> the HE VM or other VMs or both? Would be good to narrow it down a bit and 
>>> then review the exact flow
>>> 
>>> Thanks,
>>> michal
>>> 
 Daniel
 
 
 
 On 06.04.2018 11:11, Michal Skrivanek wrote:
>> On 4 Apr 2018, at 15:36, Daniel Menzel  
>>  wrote:
>> 
>> Hello,
>> 
>> we're successfully using a setup with 4 Nodes and a replicated Gluster 
>> for storage. The engine is self hosted. What we're dealing with at the 
>> moment is the high availability: If a node fails (for example simulated 
>> by a forced power loss) the engine comes back up online within ~2min. 
>> But guests (having the HA option enabled) come back online only after a 
>> very long grace time of ~5min. As we have a reliable network (40 GbE) 
>> and reliable servers I think that the default grace times are way too 
>> high for us - is there any possibility to change those values?
> And do you have Power Management(iLO, iDRAC,etc) configured for your 
> hosts? Otherwise we have to resort to relatively long timeouts to make 
> sure the host is really dead
> Thanks,
> michal
>> Thanks in advance!
>> Daniel
>> 


Re: [ovirt-users] Decrease downtime for HA

2018-04-12 Thread Daniel Menzel

Hi there,

does anyone have an idea how to decrease a virtual machine's downtime?

Best
Daniel




Re: [ovirt-users] Decrease downtime for HA

2018-04-06 Thread Daniel Menzel

Hi Michal,

(sorry for misspelling your name in my first mail).

The settings for the VMs are the following (oVirt 4.2):

1. HA checkbox enabled of course
2. "Target Storage Domain for VM Lease" -> left empty
3. "Resume Behavior" -> AUTO_RESUME
4. Priority for Migration -> High
5. "Watchdog Model" -> No-Watchdog

For testing we did not kill any VM but the host. So basically we 
simulated an instantaneous crash by manually turning the machine off via 
IPMI-Interface (not via operating system!) and ping the guest(s). What 
happens then?


1. 2-3 seconds after we press the host's shutdown button we lose
   ping contact to the VM(s).
2. After another 20s oVirt changes the host's status to "connecting",
   the VM's status is set to a question mark.
3. After ~1:30 the host is flagged to "non responsive"
4. After ~2:10 the host's reboot is initiated by oVirt, 5-10s later the
   guest is back online.

So, there seems to be one mistake I made in the first mail: The downtime 
is "only" 2.5min. But still I think this time can be decreased as for 
some services it is still quite a long time.
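Adding up the observed phases supports the corrected figure; the per-phase durations below are read off the timeline above (step 2 at ~0:23, step 3 at ~1:30, step 4 at ~2:10, guest back 5-10s later):

```python
# Reconstruct the observed failover timeline from the steps above (seconds).
phases = {
    "ping lost after host power-off": 3,    # step 1
    "host marked 'connecting'": 20,         # step 2 (t ~= 0:23)
    "host marked 'non responsive'": 67,     # step 3 (t ~= 1:30)
    "reboot initiated by oVirt": 40,        # step 4 (t ~= 2:10)
    "guest back online": 8,                 # ~5-10s later
}
total = sum(phases.values())
print(f"total downtime: {total}s (~{total / 60:.1f} min)")
# total downtime: 138s (~2.3 min)
```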


Best
Daniel




Re: [ovirt-users] Decrease downtime for HA

2018-04-06 Thread Michal Skrivanek


> On 6 Apr 2018, at 12:45, Daniel Menzel  
> wrote:
> 
> Hi Michael,
> thanks for your mail. Sorry, I forgot to write that. Yes, we have power 
> management and fencing enabled on all hosts. We also tested this and found 
> out that it works perfectly. So this cannot be the reason I guess.

Hi Daniel,
ok, then it’s worth looking into details. Can you describe in more detail what 
happens? What exact settings you’re using for such VM? Are you killing the HE 
VM or other VMs or both? Would be good to narrow it down a bit and then review 
the exact flow

Thanks,
michal



Re: [ovirt-users] Decrease downtime for HA

2018-04-06 Thread Daniel Menzel

Hi Michael,
thanks for your mail. Sorry, I forgot to write that. Yes, we have power 
management and fencing enabled on all hosts. We also tested this and 
found out that it works perfectly. So this cannot be the reason I guess.


Daniel



On 06.04.2018 11:11, Michal Skrivanek wrote:




On 4 Apr 2018, at 15:36, Daniel Menzel  wrote:

Hello,

we're successfully using a setup with 4 Nodes and a replicated Gluster for 
storage. The engine is self hosted. What we're dealing with at the moment is 
the high availability: If a node fails (for example simulated by a forced power 
loss) the engine comes back up online withing ~2min. But guests (having the HA 
option enabled) come back online only after a very long grace time of ~5min. As 
we have a reliable network (40 GbE) and reliable servers I think that the 
default grace times are way too high for us - is there any possibility to 
change those values?


And do you have Power Management(iLO, iDRAC,etc) configured for your hosts? 
Otherwise we have to resort to relatively long timeouts to make sure the host 
is really dead

Thanks,
michal


Thanks in advance!
Daniel

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users





___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Decrease downtime for HA

2018-04-06 Thread Michal Skrivanek


> On 4 Apr 2018, at 15:36, Daniel Menzel  
> wrote:
> 
> Hello,
> 
> we're successfully using a setup with 4 Nodes and a replicated Gluster for 
> storage. The engine is self hosted. What we're dealing with at the moment is 
> the high availability: If a node fails (for example simulated by a forced 
> power loss) the engine comes back up online within ~2min. But guests (having 
> the HA option enabled) come back online only after a very long grace time of 
> ~5min. As we have a reliable network (40 GbE) and reliable servers I think 
> that the default grace times are way too high for us - is there any 
> possibility to change those values?

And do you have Power Management(iLO, iDRAC,etc) configured for your hosts? 
Otherwise we have to resort to relatively long timeouts to make sure the host 
is really dead

Thanks,
michal
> 
> Thanks in advance!
> Daniel

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users