[ovirt-users] Hosted-engine is down and cannot be restarted

2019-03-19 Thread ada per
Hello everyone, 

For some unknown reason the hosted engine went down and I cannot restart it.
I tried restarting it manually without success. Can you please advise?

The engine status is the same on all nodes, as shown below.
--== Host nodex. (id: 6) status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : nodex
Host ID                            : 6
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 323a9f45
local_conf_timestamp               : 2648874
Host timestamp                     : 2648874
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2648874 (Tue Mar 19 12:25:44 2019)
    host-id=6
    score=3400
    vm_conf_refresh_time=2648874 (Tue Mar 19 12:25:44 2019)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False
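
The "Engine status" field in the status output above carries a JSON object, so the VM state and health can be checked programmatically when comparing several hosts. The following is only a minimal sketch: it assumes the status text has already been captured into a string and that the JSON value sits on a single line. The function name parse_engine_status and the abridged SAMPLE text are illustrative and are not part of any oVirt tool.

```python
import json

# Abridged sample modelled on the status output above (assumption: the
# JSON value is on one line, not wrapped as in the email).
SAMPLE = """\
Hostname                : nodex
Host ID                 : 6
Engine status           : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                   : 3400
"""


def parse_engine_status(text):
    """Return the decoded 'Engine status' JSON object, or None if absent."""
    for line in text.splitlines():
        # Split at the first colon only; the JSON value contains more colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value.strip())
    return None


status = parse_engine_status(SAMPLE)
print(status["vm"], status["health"])
```

Here "vm" is "down_unexpected" and "health" is "bad", which matches the symptom described: the agent sees the engine VM as down rather than as up but unreachable.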

When I run the command
root@node5# hosted-engine --vm-shutdown
I get the response:
Command VM.shutdown with args {'delay': '120', 'message': 'VM is shutting down!', 'vmID': 'a492d2eb-1dfd-470d-a141-3e55d2189275'} failed:(code=1, message=Virtual machine does not exist)

But when I run hosted-engine --vm-start, I get the response:
VM exists and is down, cleaning up and restarting



Below is the output of journalctl -u ovirt-ha-agent:

Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Unhandled monitoring loop exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 430, in start_monitoring
    self._monitoring_loop()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 449, in _monitoring_loop
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 737, in collect_stats
    all_stats = self._broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
    self._send_output(message_body)

[ovirt-users] Hosted Engine goes down while putting gluster node into maintenance mode.

2018-12-03 Thread Abhishek Sahni
Hello Team,

We are running a 3-way replica HC Gluster setup, configured during
the initial deployment from the Cockpit console using Ansible.

NODE1
  - /dev/sda   (OS)
  - /dev/sdb   ( Gluster Bricks )
   * /gluster_bricks/engine/engine/
   * /gluster_bricks/data/data/
   * /gluster_bricks/vmstore/vmstore/

NODE2 and NODE3 with a similar setup.

Hosted engine was running on node2.

- While moving NODE1 to maintenance mode, with the gluster service being
stopped as the prompt offers, the hosted engine instantly went down.

- I started the gluster service again on node1 and started the hosted
engine again. The hosted engine starts properly, but it crashes again and
again within a fraction of a second after a successful start, apparently
because the HE itself stops glusterd on node1. I am not sure about this,
but I cross-verified it by checking the glusterd status.

*Is it possible to clear the pending tasks, or to prevent the HE from
stopping glusterd on node1?*

*Or can we start the HE using another gluster node?*

https://paste.fedoraproject.org/paste/Qu2tSHuF-~G4GjGmstV6mg


-- 

ABHISHEK SAHNI


IISER Bhopal
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7ETASYIKXRAGYZRBZIS6G743UHPKGCNA/


Re: [ovirt-users] Hosted Engine is down and won't start

2017-11-13 Thread Simone Tiraboschi
On Fri, Nov 10, 2017 at 9:42 AM, Kasturi Narra  wrote:

> Hello Logan,
>
> One reason the liveliness check fails is that the host cannot ping your
> hosted engine VM. You can try connecting to the HE VM using remote-viewer
> vnc://hypervisor-ip:5900. From the hosted-engine --vm-status output, it
> looks like the HE VM is up and running fine.
>
>
Hi,
just a small addition:
hosted-engine can be deployed with either VNC or SPICE as the graphical
console protocol, so you have to adjust the remote-viewer command according
to which one you are using.
Also, the TCP port is not always 5900; it depends on the order in which the
VMs were started.

To get the actual VNC port number you could use:
. /etc/ovirt-hosted-engine/hosted-engine.conf
vdsm-client VM getInfo vmID=$vmid | jq -r '.devices[] | select(.device | contains("vnc")).port'

An alternative is to use the serial console with:
hosted-engine --console
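
For readers who do not have jq installed, the filter in the VNC port command above can be approximated in Python. This is only a sketch: SAMPLE is a hypothetical, heavily abridged stand-in for the JSON printed by vdsm-client VM getInfo, and graphics_ports is an illustrative name, not part of vdsm. Unlike the jq filter, it skips devices that have no "device" or "port" key instead of failing or printing null.

```python
import json

# Hypothetical, abridged stand-in for the JSON printed by
# `vdsm-client VM getInfo`; real output has many more fields.
SAMPLE = json.dumps({
    "devices": [
        {"device": "disk"},
        {"device": "vnc", "port": "5900"},
    ]
})


def graphics_ports(vm_info_json, protocol="vnc"):
    """Return the port of every device whose name contains `protocol`."""
    info = json.loads(vm_info_json)
    return [
        dev["port"]
        for dev in info.get("devices", [])
        if protocol in dev.get("device", "") and "port" in dev
    ]


print(graphics_ports(SAMPLE))
```

Passing protocol="spice" would select a SPICE graphics device instead, for deployments that chose SPICE as the console protocol.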


>
>- Please check the internal DNS settings, such as resolv.conf.
>- Check whether the virtual host name or IP address cannot be resolved.
>
> Thanks
> kasturi
>
>
> On Fri, Nov 10, 2017 at 12:56 PM, Logan Kuhn 
> wrote:
>
>> We lost the backend storage that hosts our self hosted engine tonight.
>> We've recovered it and there was no data corruption on the volume
>> containing the HE disk.  However, when we try to start the HE it doesn't
>> give an error, but it also doesn't start.
>>
>> The VM isn't pingable and the liveliness check always fails.
>>
>>  [root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
>> Hostname   : ovirttest1.wolfram.com
>> Host ID: 1
>> Engine status  : {"reason": "failed liveliness
>> check", "health": "bad", "vm": "up", "detail": "up"}
>> Score  : 3400
>> stopped: False
>> Local maintenance  : False
>> crc32  : 2c2f3ec9
>> local_conf_timestamp   : 18980042
>> Host timestamp : 18980039
>> Extra metadata (valid at timestamp):
>>metadata_parse_version=1
>>metadata_feature_version=1
>>timestamp=18980039 (Fri Nov 10 01:17:59 2017)
>>host-id=1
>>score=3400
>>vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
>>conf_on_shared_storage=True
>>maintenance=False
>>state=GlobalMaintenance
>>stopped=False
>>
>> The environment is in Global Maintenance so that we can isolate it to
>> starting on a specific host to eliminate as many variables as possible.
>> I've attached the agent and broker logs
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine is down

2017-11-13 Thread Martin Sivak
Hi,

Following up on my answer: this is the bug you opened to track this issue, right?

https://bugzilla.redhat.com/show_bug.cgi?id=1511788

You said in comment #2 of that bug that all is well now. Should we
close the bug then?

Best regards

Martin Sivak

On Fri, Nov 10, 2017 at 8:22 AM, Logan Kuhn  wrote:
> We lost the backend storage that hosts our self hosted engine tonight.
> We've recovered it and there was no data corruption on the volume containing
> the HE disk.  However, when we try to start the HE it doesn't give an error,
> but it also doesn't start.
>
> The VM isn't pingable and the liveliness check always fails.
>
>  [root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
> Hostname   : ovirttest1.wolfram.com
> Host ID: 1
> Engine status  : {"reason": "failed liveliness check",
> "health": "bad", "vm": "up", "detail": "up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : 2c2f3ec9
> local_conf_timestamp   : 18980042
> Host timestamp : 18980039
> Extra metadata (valid at timestamp):
>metadata_parse_version=1
>metadata_feature_version=1
>timestamp=18980039 (Fri Nov 10 01:17:59 2017)
>host-id=1
>score=3400
>vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
>conf_on_shared_storage=True
>maintenance=False
>state=GlobalMaintenance
>stopped=False
>
> The environment is in Global Maintenance so that we can isolate it to
> starting on a specific host to eliminate as many variables as possible.
> I've attached the agent and broker logs
>
> Regards,
> Logan Kuhn


Re: [ovirt-users] Hosted Engine is down

2017-11-13 Thread Kasturi Narra
Hi Logan,

When I look at the hosted-engine --vm-status output, I see that the VM is up
but its health is bad. Can you try connecting to the VM with remote-viewer,
using the command below?

remote-viewer vnc://ovirttest1.wolfram.com:5900

Thanks
kasturi

On Fri, Nov 10, 2017 at 12:52 PM, Logan Kuhn  wrote:

> We lost the backend storage that hosts our self hosted engine tonight.
> We've recovered it and there was no data corruption on the volume
> containing the HE disk.  However, when we try to start the HE it doesn't
> give an error, but it also doesn't start.
>
> The VM isn't pingable and the liveliness check always fails.
>
>  [root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
> Hostname   : ovirttest1.wolfram.com
> Host ID: 1
> Engine status  : {"reason": "failed liveliness check",
> "health": "bad", "vm": "up", "detail": "up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : 2c2f3ec9
> local_conf_timestamp   : 18980042
> Host timestamp : 18980039
> Extra metadata (valid at timestamp):
>metadata_parse_version=1
>metadata_feature_version=1
>timestamp=18980039 (Fri Nov 10 01:17:59 2017)
>host-id=1
>score=3400
>vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
>conf_on_shared_storage=True
>maintenance=False
>state=GlobalMaintenance
>stopped=False
>
> The environment is in Global Maintenance so that we can isolate it to
> starting on a specific host to eliminate as many variables as possible.
> I've attached the agent and broker logs
>
> Regards,
> Logan Kuhn


Re: [ovirt-users] Hosted Engine is down and won't start

2017-11-10 Thread Kasturi Narra
Hello Logan,

   One reason the liveliness check fails is that the host cannot ping your
hosted engine VM. You can try connecting to the HE VM using remote-viewer
vnc://hypervisor-ip:5900. From the hosted-engine --vm-status output, it
looks like the HE VM is up and running fine.


   - Please check the internal DNS settings, such as resolv.conf.
   - Check whether the virtual host name or IP address cannot be resolved.
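
The name-resolution check suggested above can be scripted on the host. The following is a minimal sketch using only the Python standard library. The function name can_resolve is illustrative, and "localhost" is a placeholder to be replaced with the engine VM's own host name. The resolver parameter exists only so that the function can be exercised without touching real DNS.

```python
import socket


def can_resolve(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves to at least one address."""
    try:
        return len(resolver(name, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example a missing
        # DNS record or an unreachable server listed in resolv.conf).
        return False


# Placeholder name: substitute the engine VM's fully qualified host name.
print(can_resolve("localhost"))
```

A False result for the engine's host name on the hypervisor would point to the DNS settings rather than to the VM itself.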

Thanks
kasturi


On Fri, Nov 10, 2017 at 12:56 PM, Logan Kuhn 
wrote:

> We lost the backend storage that hosts our self hosted engine tonight.
> We've recovered it and there was no data corruption on the volume
> containing the HE disk.  However, when we try to start the HE it doesn't
> give an error, but it also doesn't start.
>
> The VM isn't pingable and the liveliness check always fails.
>
>  [root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
> Hostname   : ovirttest1.wolfram.com
> Host ID: 1
> Engine status  : {"reason": "failed liveliness check",
> "health": "bad", "vm": "up", "detail": "up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : 2c2f3ec9
> local_conf_timestamp   : 18980042
> Host timestamp : 18980039
> Extra metadata (valid at timestamp):
>metadata_parse_version=1
>metadata_feature_version=1
>timestamp=18980039 (Fri Nov 10 01:17:59 2017)
>host-id=1
>score=3400
>vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
>conf_on_shared_storage=True
>maintenance=False
>state=GlobalMaintenance
>stopped=False
>
> The environment is in Global Maintenance so that we can isolate it to
> starting on a specific host to eliminate as many variables as possible.
> I've attached the agent and broker logs


Re: [ovirt-users] hosted engine is down

2016-05-01 Thread Yedidyah Bar David
On Fri, Apr 22, 2016 at 10:31 AM, Budur Nagaraju  wrote:
> Hi,
>
> I have configured hosted engine with two hosts. One of them is down and
> I am unable to make it active.
>
> Is there any way to fix the issue? I have restarted ha-agent and
> ha-broker, but no luck.

If still not solved, please check/post relevant logs. Thanks.

>
> Thanks,
> Nagaraju



-- 
Didi