Re: [ovirt-users] Engine HA-Issues

2017-07-17 Thread Sven Achtelik
Hi Kasturi,

Thank you for pointing me in the right direction. It turned out that I had
removed an old DNS server. My nodes were still set to use that old DNS server
and thus lost the ability to resolve the engine's name.
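
For anyone hitting the same symptom, here is a minimal Python sketch of how
one might spot a node that still points at a retired resolver. It only parses
resolv.conf-style text; the helper names and the example addresses are
hypothetical and not taken from this setup.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def stale_nameservers(resolv_conf_text, retired):
    """Return configured nameservers that appear in the retired list."""
    retired = set(retired)
    return [s for s in nameservers(resolv_conf_text) if s in retired]
```

On a node this could be fed the contents of /etc/resolv.conf together with
the addresses of the DNS servers that have been decommissioned; a non-empty
result means the node still depends on a server that no longer exists.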

Thanks,
Sven

From: Kasturi Narra [mailto:kna...@redhat.com]
Sent: Monday, 17 July 2017 08:10
To: Sven Achtelik <sven.achte...@eps.aero>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Engine HA-Issues

Hi,

Can you please check the following? One of these could be the reason why the
HE VM restarts every minute.


Check the error or the engine health state. If it has to do with the
liveliness check, then it is most likely an issue connecting to the engine.

- Check whether the engine FQDN is reachable from all hosts.

- curl -v http://<engine-fqdn>/ovirt-engine/services/health
  - does this return OK?

- Access the HE console and check whether ovirt-engine is running.

- Check /var/log/ovirt-engine/server.log or /var/log/ovirt-engine/engine.log
  for errors when starting ovirt-engine.



Thanks

kasturi


On Fri, Jul 14, 2017 at 10:28 PM, Sven Achtelik
<sven.achte...@eps.aero> wrote:
Hi All,

After running solidly for several months, my ovirt-engine started rebooting on
several hosts. I've looked at hosted-engine --vm-status and it shows that
the engine is up on one host but not reachable. At the same time I can access
the GUI and everything is working fine. After some time the engine shuts
down and all hosts try to start the engine until one of them wins, or at
least that is how it looks. Any clues on where to look to find the issue with
the liveliness check?



--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node01
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 3eb33843
local_conf_timestamp   : 17128
Host timestamp         : 17113
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=17113 (Fri Jul 14 11:50:23 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=17128 (Fri Jul 14 11:50:38 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node02.mgmt.lan
Host ID                : 2
Engine status          : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 2a8c86cc
local_conf_timestamp   : 523182
Host timestamp         : 523167
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=523167 (Fri Jul 14 11:50:25 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=523182 (Fri Jul 14 11:50:40 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node03.mgmt.lan
Host ID                : 3
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : f8490d79
local_conf_timestamp   : 527698
Host timestamp         : 527683
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=527683 (Fri Jul 14 11:50:33 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=527698 (Fri Jul 14 11:50:47 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


Re: [ovirt-users] Engine HA-Issues

2017-07-17 Thread Kasturi Narra
Hi,

Can you please check the following? One of these could be the reason
why the HE VM restarts every minute.

Check the error or the engine health state. If it has to do with the
liveliness check, then it is most likely an issue connecting to the engine.

- Check whether the engine FQDN is reachable from all hosts.

- curl -v http://<engine-fqdn>/ovirt-engine/services/health
  - does this return OK?

- Access the HE console and check whether ovirt-engine is running.

- Check /var/log/ovirt-engine/server.log or
  /var/log/ovirt-engine/engine.log for errors when starting ovirt-engine.
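
The first two checks above can be sketched in Python so they are easy to
repeat on every host. This is only an illustration: the function names are
hypothetical, the resolver and HTTP fetch are passed in as parameters so the
logic can be tried without a live engine, and treating HTTP 200 as "healthy"
is an assumption rather than documented behaviour.

```python
import socket
import urllib.error
import urllib.request


def health_url(engine_fqdn):
    """Build the health page URL used in the curl check."""
    return "http://%s/ovirt-engine/services/health" % engine_fqdn


def _http_get(url, timeout=5):
    """Fetch a URL and return (status code, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", "replace")


def check_engine(engine_fqdn, resolve=socket.getaddrinfo, fetch=_http_get):
    """Return (ok, message) for name resolution plus the health page."""
    # Step 1: can this host resolve the engine FQDN at all?
    try:
        resolve(engine_fqdn, 80)
    except socket.gaierror as exc:
        return False, "cannot resolve %s: %s" % (engine_fqdn, exc)
    # Step 2: does the health page answer?
    try:
        status, body = fetch(health_url(engine_fqdn))
    except (urllib.error.URLError, OSError) as exc:
        return False, "health page unreachable: %s" % exc
    if status != 200:
        return False, "health page returned HTTP %d" % status
    return True, body.strip()
```

Calling check_engine() with the real engine FQDN on each host separates a
name resolution failure from an engine that resolves but does not answer.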


Thanks

kasturi


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Engine HA-Issues

2017-07-14 Thread Sven Achtelik
Hi All,

After running solidly for several months, my ovirt-engine started rebooting on
several hosts. I've looked at hosted-engine --vm-status and it shows that
the engine is up on one host but not reachable. At the same time I can access
the GUI and everything is working fine. After some time the engine shuts
down and all hosts try to start the engine until one of them wins, or at
least that is how it looks. Any clues on where to look to find the issue with
the liveliness check?



--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node01
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 3eb33843
local_conf_timestamp   : 17128
Host timestamp         : 17113
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=17113 (Fri Jul 14 11:50:23 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=17128 (Fri Jul 14 11:50:38 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node02.mgmt.lan
Host ID                : 2
Engine status          : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 2a8c86cc
local_conf_timestamp   : 523182
Host timestamp         : 523167
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=523167 (Fri Jul 14 11:50:25 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=523182 (Fri Jul 14 11:50:40 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt-node03.mgmt.lan
Host ID                : 3
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : f8490d79
local_conf_timestamp   : 527698
Host timestamp         : 527683
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=527683 (Fri Jul 14 11:50:33 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=527698 (Fri Jul 14 11:50:47 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False
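
The status output above can also be read programmatically. The following is a
minimal Python sketch, not part of hosted-engine itself, that pulls the
"Hostname" and "Engine status" fields out of the text and lists hosts where
the VM is up but the health is not good. The function names are hypothetical,
the sample is trimmed from the output above, and "good" as the healthy value
is an assumption.

```python
import json
import re

HOSTNAME_RE = re.compile(r"^Hostname\s*:\s*(\S+)")
ENGINE_STATUS_RE = re.compile(r"^Engine status\s*:\s*(\{.*)$")

# Trimmed sample; the JSON is wrapped the way the mail client wrapped it.
SAMPLE = """\
Hostname : ovirt-node01
Engine status : {"reason": "vm not running on this host",
"health": "bad", "vm": "down", "detail": "unknown"}
Hostname : ovirt-node02.mgmt.lan
Engine status : {"reason": "failed liveliness check",
"health": "bad", "vm": "up", "detail": "up"}
"""


def parse_vm_status(text):
    """Return {hostname: engine status dict} from --vm-status style text."""
    statuses = {}
    hostname = None
    lines = iter(text.splitlines())
    for raw in lines:
        line = raw.strip()
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
            continue
        match = ENGINE_STATUS_RE.match(line)
        if match and hostname is not None:
            blob = match.group(1)
            # Join wrapped continuation lines until the JSON object closes.
            while not blob.endswith("}"):
                extra = next(lines, None)
                if extra is None:
                    break
                blob += " " + extra.strip()
            statuses[hostname] = json.loads(blob)
    return statuses


def hosts_failing_liveliness(statuses):
    """List hosts where the engine VM is up but its health is not good."""
    return [host for host, status in statuses.items()
            if status.get("vm") == "up" and status.get("health") != "good"]
```

For the sample, hosts_failing_liveliness(parse_vm_status(SAMPLE)) points at
ovirt-node02.mgmt.lan, the host reporting "failed liveliness check".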

--
Thank you,
Sven