Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-27 Thread Piotr Kliczewski
Looking at the logs, I can see that the connection was lost at
2015-01-26 09:24:43,213, followed by a good number of reconnection
attempts which ended with a timeout or 'no route to host'.
The connection was recovered at 2015-01-26 09:28:56,292.

vdsm.log does not contain the connection loss above (it starts at
2015-01-26 10:01:02,208).

It was lost again at 2015-01-26 11:54:58,741 and it was recovered at
2015-01-26 12:01:47,752.
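
For reference, the two outage windows above can be turned into durations with a few lines of Python. This is only an illustrative sketch: the timestamps are copied from this message, while the helper name `outage_seconds` is made up for the example.

```python
from datetime import datetime

# Timestamp layout used in the log timestamps quoted in this thread.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def outage_seconds(lost: str, recovered: str) -> float:
    """Return the seconds between a connection loss and its recovery."""
    start = datetime.strptime(lost, LOG_TS_FORMAT)
    end = datetime.strptime(recovered, LOG_TS_FORMAT)
    return (end - start).total_seconds()


# (lost, recovered) pairs copied from the message above.
OUTAGES = [
    ("2015-01-26 09:24:43,213", "2015-01-26 09:28:56,292"),
    ("2015-01-26 11:54:58,741", "2015-01-26 12:01:47,752"),
]

if __name__ == "__main__":
    for lost, recovered in OUTAGES:
        print(f"{lost} -> {recovered}: {outage_seconds(lost, recovered):.3f} s")
```

The first outage works out to roughly 4 minutes 13 seconds and the second to roughly 6 minutes 49 seconds.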

I checked the vdsm logs and I can see a really weird lack of logs:

JsonRpc (StompReactor)::DEBUG::2015-01-26 11:52:35,893::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message StompF
MainThread::INFO::2015-01-26 12:01:45,183::vdsm::131::vds::(run) (PID: 7021) I am the actual vdsm 4.16.10-8.gitc937927.el6 love005.ovt.visionamics.com (2.6.32-504.3.3.el6.x86_64)
MainThread::DEBUG::2015-01-26 12:01:45,184::resourceManager::421::Storage.ResourceManager::(registerNamespace) Registering namespace 'Storage'

Nothing was logged between 11:52:35 and 12:01:45, which covers the period
with no connection from the engine's perspective.
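
A gap like the one in the excerpt above can also be located mechanically. The following is a rough sketch, not part of vdsm or oVirt: it assumes every log entry carries a timestamp of the form `YYYY-MM-DD HH:MM:SS,mmm` on a single line, and the function name `find_log_gaps` and the one-minute default threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches timestamps such as "2015-01-26 11:52:35,893" anywhere in a line.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def find_log_gaps(
    lines: Iterable[str],
    min_gap: timedelta = timedelta(minutes=1),
) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where logging paused for min_gap or longer."""
    gaps = []
    previous = None
    for line in lines:
        match = TS_PATTERN.search(line)
        if match is None:
            # Continuation lines (e.g. tracebacks) carry no timestamp; skip them.
            continue
        current = datetime.strptime(match.group(0), TS_FORMAT)
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Passing an open file object for the log (for example `find_log_gaps(open("/var/log/vdsm/vdsm.log"))`) would report each silent period as a pair of datetimes. Run against the three entries quoted above, it reports a single gap from 11:52:35,893 to 12:01:45,183.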

Usually when there are connectivity issues we see timeouts in the logs,
but here there are 'no route to host' errors as well,
which suggests networking issues.

@Dan - Do you know what caused the lack of logs in vdsm?
@ILanit - What vdsm version do you use?

Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-27 Thread Rob Abshear
Yeah.  There would have been a lot of connection issues because I was doing
a lot of testing and reconfiguring.  The only part that's really
applicable to this issue is the period you mentioned from 11:54 to 12:01.
I did use the service vdsmd status command after the host came back up and
the service was not running.  I started the service manually and it came up
without error and the node came back online.  Is it normal operation for
the host to automatically recover if it can, including starting vdsmd?  One
of my colleagues thinks that perhaps we are experiencing normal
operation.  But I can't imagine that the host wouldn't come back completely
if it's able.

Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-27 Thread Piotr Kliczewski
Sure. Can you please send me the logs?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-27 Thread Eli Mesika


- Original Message -
 From: ILanit Stein ist...@redhat.com
 To: Artyom Lukianov aluki...@redhat.com, Eli Mesika emes...@redhat.com
 Cc: users@ovirt.org, rabsh...@citytwist.net
 Sent: Tuesday, January 27, 2015 5:19:12 PM
 Subject: Fwd: [ovirt-users] Host remains Non-Responsive after reboot
 
 
 Hi Guys,
 
 Can you please look into this?

Hi,
From the logs I can clearly see that the host was turned on at 2015-01-26
11:56:51,191.
However, there is a STOMP exception at 2015-01-26 11:56:53,544 and a
connection timeout at 2015-01-26 11:56:53,553 that might be related.

Piotr, can you please have a look?


 
 Thanks,
 Ilanit.
 - Forwarded Message -
 From: Rob Abshear rabsh...@citytwist.net
 To: ILanit Stein ist...@redhat.com
 Sent: Tuesday, January 27, 2015 3:05:56 PM
 Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
 
 Here are the logs you requested.  The shutdown of the node was at 11:53
 and vdsmd was manually restarted at 12:01 to get the node back online.
 


Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-26 Thread Rob Abshear
I have done a bit more investigating on this matter.  If I restart the node
from within oVirt using the power management option restart, then the
node restarts and vdsmd DOES NOT start.  If I go into the DRAC and issue
the command to power cycle the machine, then the machine restarts and vdsmd
DOES start.  I can run the following command from another node in the
cluster:
fence_drac5 -a 192.168.200.105 -l root -p password -x -o reboot
and the node restarts and vdsmd DOES start.



Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-26 Thread ILanit Stein
It might be a bug.
Would you please attach the logs I mentioned below,
which can give more details on the failure?
Adding Eli, who may want to give some input on this issue.

Thanks,
Ilanit.



Re: [ovirt-users] Host remains Non-Responsive after reboot

2015-01-24 Thread ILanit Stein
Hi Rob,

Thanks for this report.

Would you please provide these logs, covering the time frame in which the host failure occurred:
1. oVirt Engine: /var/log/ovirt-engine/engine.log
2. host: /var/log/vdsm/vdsm.log
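
One possible way to cut both logs down to the relevant time frame is sketched below. It is only an illustration under stated assumptions: each log entry is assumed to contain a timestamp of the form `YYYY-MM-DD HH:MM:SS,mmm`, and the helper name `lines_in_window` is invented for this example rather than taken from any oVirt tool.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator

# Assumed timestamp layout, e.g. "2015-01-26 11:56:51,191".
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def lines_in_window(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield the log lines whose timestamp lies between start and end (inclusive).

    Lines without a timestamp (such as traceback continuations) are kept
    when the most recent timestamped line was inside the window.
    """
    window_start = datetime.strptime(start, TS_FORMAT)
    window_end = datetime.strptime(end, TS_FORMAT)
    keep = False
    for line in lines:
        match = TS_PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(0), TS_FORMAT)
            keep = window_start <= stamp <= window_end
        if keep:
            yield line
```

The function accepts any iterable of lines, so an open file object for either of the two logs listed above can be passed in directly, together with the start and end of the failure window.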

If it is reproducible, please add this info as well.

You can also check the vdsm service status on the host, while the host is
reported as Non Responsive, by running 'service vdsmd status' on the host.
There might be some problem that prevented the vdsm service from coming
up on the host.
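
If the check needs to be scripted, the exit code of the status command can be interpreted as in the sketch below. It assumes the vdsmd init script follows the LSB convention for `status` exit codes, which is not confirmed anywhere in this thread; the function names are made up for illustration. The code avoids Python 3-only syntax because EL6 hosts ship Python 2.6.

```python
import subprocess

# Exit codes defined by the LSB for an init script's "status" action.
# Assumption: the vdsmd init script follows this convention.
LSB_STATUS = {
    0: "running",
    1: "dead, but a PID file exists",
    2: "dead, but a lock file exists",
    3: "not running",
    4: "status unknown",
}


def describe_status(returncode):
    """Translate a 'status' exit code into a short description."""
    return LSB_STATUS.get(returncode, "unrecognized exit code %d" % returncode)


def vdsmd_status():
    """Run 'service vdsmd status' on the local host and describe the result."""
    returncode = subprocess.call(["service", "vdsmd", "status"])
    return describe_status(returncode)
```

Calling `vdsmd_status()` on the affected host right after the reboot would show whether the service is simply stopped or died after starting.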

Ilanit. 



[ovirt-users] Host remains Non-Responsive after reboot

2015-01-23 Thread Rob Abshear
I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in the cluster.
Each host has a drac5 and it is configured and working. I am trying to simulate
a node failure. I am running one HA VM on one of the hosts for testing. I
simulate the failure by powering off the host with the VM running.

Here is what is happening.
 * Host is powered off
 * ~4 minutes pass and the host is recognized as not responding
 * Automatic fence runs and the VM migrates. Another host in the node is chosen
   as a proxy to execute Status command on the host.
 * Same host is chosen as proxy to execute Start command on the host.
 * Same host is chosen as proxy to execute Status command on the host.
 * The host DOES physically start.
 * The host never shows status of UP.
 * I select “confirm host has been rebooted” and I see a manual fence start.
 * Host stays non-responsive.
 * I put the host in maintenance and then activate it.
 * Host still non-responsive
 * I put the host in maintenance and do a reinstall
 * Reinstall finishes and host becomes UP

So, everything seems to go fine with the HA functionality, but the host never
recovers without being reinstalled. Please let me know which logs you need to
look at to help me out with this.
Thanks
