[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-19 Thread Strahil
>> >> 1.2 All bricks healed (gluster volume heal data info summary) and no 
>> >> split-brain
>> >
>> >  
>> >  
>> > gluster volume heal data info
>> >  
>> > Brick node-msk-gluster203:/opt/gluster/data
>> > Status: Connected
>> > Number of entries: 0
>> >  
>> > Brick node-msk-gluster205:/opt/gluster/data
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > Status: Connected
>> > Number of entries: 7
>> >  
>> > Brick node-msk-gluster201:/opt/gluster/data
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > Status: Connected
>> > Number of entries: 7
>> >  
>>
>> Data needs healing.
>> Run: cluster volume heal data full
>
> This does not work.

Yeah, that's because my phone autocorrects 'gluster' to 'cluster'.

Usually the gluster daemons detect the need for healing on their own, but with 'gluster 
volume heal data full && sleep 5 && gluster volume heal data info summary && sleep 5 && 
gluster volume heal data info summary' you can force a sync and then check the result.
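For example, something along these lines (a rough sketch - adjust the volume name if yours differs):

gluster volume heal data full
sleep 5
gluster volume heal data info summary
sleep 5
gluster volume heal data info summary
# entries that never drop to 0, or anything listed here, need manual attention:
gluster volume heal data info split-brain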
Let's see what happens with DNS.

Best Regards,
Strahil Nikolov
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BR4Y4X5AGRUWGYOSKNQPRR6XHCOMQXZG/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-19 Thread Николаев Алексей
Thx for your help, Strahil! Hmmm, I see DNS resolution failed in hostname without FQDN. I'll try to fix it.

19.03.2019, 09:43, "Strahil":
> Hi Alexei,
>
> >> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain
> >
> > gluster volume heal data info
> >
> > Brick node-msk-gluster203:/opt/gluster/data
> > Status: Connected
> > Number of entries: 0
> >
> > Brick node-msk-gluster205:/opt/gluster/data
> > 78043-0943-48f8-a4fe-9b23e2ba3404
> > 7-1746-471b-a49d-8d824db9fd72
> > 8-4370-46ce-b976-ac22d2f680ee
> > 9142-7843fd260c70
> > Status: Connected
> > Number of entries: 7
> >
> > Brick node-msk-gluster201:/opt/gluster/data
> > 78043-0943-48f8-a4fe-9b23e2ba3404
> > 7-1746-471b-a49d-8d824db9fd72
> > 8-4370-46ce-b976-ac22d2f680ee
> > 9142-7843fd260c70
> > Status: Connected
> > Number of entries: 7
>
> Data needs healing.
> Run: cluster volume heal data full

This does not work.

> If it still doesn't heal (check in 5 min), go to
> /rhev/data-center/mnt/glusterSD/msk-gluster-facility._data
> and run 'find . -exec stat {} \;' without the quotes.

Done. https://yadi.sk/i/nXu0RV646YpD6Q

> As I have understood you, the oVirt Hosted Engine is running and can be started on all nodes except one.

Ovirt Hosted Engine works and can be run on all nodes with no exceptions.
Hosted Engine volume /rhev/data-center/mnt/glusterSD/msk-gluster-facility._engine can be mounted by all nodes without problems.

> >> 2. Go to the problematic host and check the mount point is there
> >
> > No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
> > If I create a mount point manually, it is deleted after the node is activated.
> >
> > Other nodes can mount this volume without problems. Only this node has connection problems after the update.
> >
> > Here is a part of the log at the time of activation of the node:
> >
> > vdsm log
> >
> > 2019-03-18 16:46:00,548+0300 INFO  (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
> > 2019-03-18 16:46:00,549+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> > 2019-03-18 16:46:00,581+0300 INFO  (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], options=None) from=:::10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> > 2019-03-18 16:46:00,621+0300 INFO  (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' (storageServer:167)
> > 2019-03-18 16:46:00,622+0300 INFO  (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data mode: None (fileUtils:197)
> > 2019-03-18 16:46:00,622+0300 WARN  (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
>
> This seems very strange. As you have hidden the hostname, I'm not sure which one this is.
> Check that DNS can be resolved from all hosts and that the hostname of this host is resolvable.

Name resolution works without problems.

dig msk-gluster-facility.

;; ANSWER SECTION:
msk-gluster-facility.. 1786 IN A    10.77.253.205 # <-- node-msk-gluster205.
msk-gluster-facility.. 1786 IN A    10.77.253.201 # <-- node-msk-gluster201.
msk-gluster-facility.. 1786 IN A    10.77.253.203 # <-- node-msk-gluster203.

;; Query time: 5 msec
;; SERVER: 10.77.16.155#53(10.77.16.155)
;; WHEN: Tue Mar 19 14:55:10 MSK 2019
;; MSG SIZE  rcvd: 110

> Also check if it is in the peer list.

msk-gluster-facility. is just an A type record in dns. It is used on a webUI for mounting gluster volumes and gluster storage HA.

> Try to manually mount the gluster volume:
> mount -t glusterfs msk-gluster-facility.:/data /mnt

Well, the mount works from hypervisor node77-202.
And does not work with the hypervisor node77-204 (problematic node).

node77-204 /var/log/glusterfs/mnt.log

[2019-03-19 12:15:11.106226] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.15 (args: /usr/sbin/glusterfs --volfile-server=msk-gluster-facility. --volfile-id=/data /mnt)
[2019-03-19 12:15:11.109577] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-03-19 12:15:11.129652] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-19 12:15:11.135384] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-03-19

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-19 Thread Strahil
Hi Alexei,

>> 1.2 All bricks healed (gluster volume heal data info summary) and no 
>> split-brain
>
>  
>  
> gluster volume heal data info
>  
> Brick node-msk-gluster203:/opt/gluster/data
> Status: Connected
> Number of entries: 0
>  
> Brick node-msk-gluster205:/opt/gluster/data
> 
> 
> 
> 
> 
> 
> 
> Status: Connected
> Number of entries: 7
>  
> Brick node-msk-gluster201:/opt/gluster/data
> 
> 
> 
> 
> 
> 
> 
> Status: Connected
> Number of entries: 7
>  

Data needs healing.
Run: cluster volume heal data full
If it still doesn't heal (check in 5 min), go to 
/rhev/data-center/mnt/glusterSD/msk-gluster-facility._data
and run 'find . -exec stat {} \;' (without the quotes).
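Spelled out, that check looks roughly like this (a sketch - the mount path is the one vdsm uses on your hosts):

cd /rhev/data-center/mnt/glusterSD/msk-gluster-facility._data
# stat-ing every file through the FUSE mount forces a lookup on all bricks,
# which nudges the self-heal of any pending entries
find . -exec stat {} \; > /dev/null
gluster volume heal data info summary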

As I have understood you, the oVirt Hosted Engine is running and can be started on 
all nodes except one.


>>  
>> 2. Go to the problematic host and check the mount point is there
>
>  
>  
> No mount point on problematic node 
> /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
> If I create a mount point manually, it is deleted after the node is activated.
>  
> Other nodes can mount this volume without problems. Only this node has 
> connection problems after the update.
>  
> Here is a part of the log at the time of activation of the node:
>  
> vdsm log
>  
> 2019-03-18 16:46:00,548+0300 INFO  (jsonrpc/5) [vds] Setting Hosted Engine HA 
> local maintenance to False (API:1630)
> 2019-03-18 16:46:00,549+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC 
> call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> 2019-03-18 16:46:00,581+0300 INFO  (jsonrpc/7) [vdsm.api] START 
> connectStorageServer(domType=7, 
> spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': 
> u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': 
> u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', 
> u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], 
> options=None) from=:::10.77.253.210,56630, flow_id=81524ed, 
> task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> 2019-03-18 16:46:00,621+0300 INFO  (jsonrpc/7) 
> [storage.StorageServer.MountConnection] Creating directory 
> u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' 
> (storageServer:167)
> 2019-03-18 16:46:00,622+0300 INFO  (jsonrpc/7) [storage.fileUtils] Creating 
> directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data 
> mode: None (fileUtils:197)
> 2019-03-18 16:46:00,622+0300 WARN  (jsonrpc/7) 
> [storage.StorageServer.MountConnection] gluster server 
> u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 
> 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate 
> servers (storageServer:317)


This seems very strange. As you have hidden the hostname, I'm not sure which one 
this is.
Check that DNS can be resolved from all hosts and that the hostname of this host is 
resolvable.
Also check if it is in the peer list.
Try to manually mount the gluster volume:
mount -t glusterfs msk-gluster-facility.:/data /mnt
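Roughly like this, as a sketch (using a throwaway mount point so nothing collides with vdsm's own mounts):

mkdir -p /mnt/glustertest
mount -t glusterfs msk-gluster-facility.:/data /mnt/glustertest
ls /mnt/glustertest
umount /mnt/glustertest
# if the mount fails, the client log usually says why
# (glusterfs names it after the mount point):
tail -n 50 /var/log/glusterfs/mnt-glustertest.log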

Is this a second FQDN/IP of this server?
If so, gluster accepts that via 'gluster peer probe IP2'.


>> 2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed

>> 2.2. Check the OVF_STORE from the logs that it exists
>
>  
> How can i do this?

Go to /rhev/data-center/mnt/glusterSD/host_engine and use find inside the 
domain UUID for files that are not owned by vdsm:kvm.
I usually run 'chown -R vdsm:kvm 823xx---zzz' and it will fix any 
misconfiguration.
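As a sketch (the <host>_engine directory name and DOMAIN-UUID below are placeholders - use the path of your own engine storage domain mount):

cd /rhev/data-center/mnt/glusterSD/<host>_engine/DOMAIN-UUID
# list anything in the domain tree that is not owned by vdsm:kvm
find . \( ! -user vdsm -o ! -group kvm \) -ls
# and fix it in one go if something shows up
chown -R vdsm:kvm .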

Best Regards,
Strahil Nikolov
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OFDI4CH3REYGWAD7V36K4SW64MALACAV/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-18 Thread Николаев Алексей
Thx for answer!

18.03.2019, 14:52, "Strahil Nikolov":
> Hi Alexei,
>
> In order to debug it check the following:
>
> 1. Check gluster:
> 1.1 All bricks up ?

All peers up. Gluster version is 3.12.15

[root@node-msk-gluster203 ~]# gluster peer status
Number of Peers: 2

Hostname: node-msk-gluster205.
Uuid: 188d8444-3246-4696-a0a7-2872e0a01067
State: Peer in Cluster (Connected)

Hostname: node-msk-gluster201.
Uuid: 919b0a60-b9b7-4091-a60a-51d43b995285
State: Peer in Cluster (Connected)

All bricks on all gluster servers are UP.

Volume Name: data
Type: Replicate
Volume ID: 8fb43ba3-b2e9-4e33-b4c3-b0b03cd8cba3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-msk-gluster203.:/opt/gluster/data
Brick2: node-msk-gluster205.:/opt/gluster/data
Brick3: node-msk-gluster201.:/opt/gluster/data (arbiter)

Volume Name: engine
Type: Replicate
Volume ID: 5dda8427-c69b-4b96-bcd6-eff3be2e0b5c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-msk-gluster205.:/opt/gluster/engine
Brick2: node-msk-gluster203.:/opt/gluster/engine
Brick3: node-msk-gluster201.:/opt/gluster/engine (arbiter)

> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain

gluster volume heal data info

Brick node-msk-gluster203:/opt/gluster/data
Status: Connected
Number of entries: 0

Brick node-msk-gluster205:/opt/gluster/data
Status: Connected
Number of entries: 7

Brick node-msk-gluster201:/opt/gluster/data
Status: Connected
Number of entries: 7

gluster volume heal engine info

Brick node-msk-gluster205.:/opt/gluster/engine
Status: Connected
Number of entries: 0

Brick node-msk-gluster203.:/opt/gluster/engine
Status: Connected
Number of entries: 0

Brick node-msk-gluster201.:/opt/gluster/engine
Status: Connected
Number of entries: 0

> 2. Go to the problematic host and check the mount point is there

No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
If I create a mount point manually, it is deleted after the node is activated.

Other nodes can mount this volume without problems. Only this node has connection problems after the update.

Here is a part of the log at the time of activation of the node:

vdsm log

2019-03-18 16:46:00,548+0300 INFO  (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
2019-03-18 16:46:00,549+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
2019-03-18 16:46:00,581+0300 INFO  (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], options=None) from=:::10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
2019-03-18 16:46:00,621+0300 INFO  (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' (storageServer:167)
2019-03-18 16:46:00,622+0300 INFO  (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data mode: None (fileUtils:197)
2019-03-18 16:46:00,622+0300 WARN  (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
2019-03-18 16:46:00,622+0300 INFO  (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.:/data at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data (mount:204)
2019-03-18 16:46:00,809+0300 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2415)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2412, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 179, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 171, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in 
    **kwargs)
  File "", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-72797.scope.\nMount failed. Please check the log file for more details.\n')

> 2.1. Check permissions (should be vdsm:kvm) and

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-18 Thread Strahil Nikolov
Hi Alexei,

In order to debug it check the following:

1. Check gluster:
1.1 All bricks up?
1.2 All bricks healed (gluster volume heal data info summary) and no split-brain

2. Go to the problematic host and check the mount point is there
2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
2.2. Check the OVF_STORE from the logs that it exists
2.3. Check that vdsm can extract the file:
sudo -u vdsm tar -tvf /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data/DOMAIN-UUID/Volume-UUID/Image-ID

3. Configure a virsh alias, as it's quite helpful:
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

4. If the VM is running - go to the host and get the xml:
virsh dumpxml HostedEngine > /root/HostedEngine.xml
4.1. Get the Network:
virsh net-dumpxml vdsm-ovirtmgmt > /root/vdsm-ovirtmgmt.xml
4.2 If not, here is mine:
[root@ovirt1 ~]# virsh net-dumpxml vdsm-ovirtmgmt

  vdsm-ovirtmgmt
  7ae538ce-d419-4dae-93b8-3a4d27700227
  
  


The UUID is not important, as my first recovery was with a different one.
5. If your Hosted Engine is down:
5.1 Remove the VM (if it exists anywhere) on all nodes:
virsh undefine HostedEngine
5.2 Verify that the nodes are in global maintenance:
hosted-engine --vm-status
5.3 Define the Engine on only 1 machine:
virsh define HostedEngine.xml
virsh net-define vdsm-ovirtmgmt.xml
virsh start HostedEngine

Note: if it complains about the storage - there is no link in 
/var/run/vdsm/storage/DOMAIN-UUID/Volume-UUID to your Volume-UUID. Here is how 
mine looks:

[root@ovirt1 808423f9-8a5c-40cd-bc9f-2568c85b8c74]# ll /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74
total 24
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:42 2c74697a-8bd9-4472-8a98-bf624f3462d5 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/2c74697a-8bd9-4472-8a98-bf624f3462d5
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:45 3ec27d6d-921c-4348-b799-f50543b6f919 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/3ec27d6d-921c-4348-b799-f50543b6f919
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 08:28 441abdc8-6cb1-49a4-903f-a1ec0ed88429 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 21:15 8ec7a465-151e-4ac3-92a7-965ecf854501 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/8ec7a465-151e-4ac3-92a7-965ecf854501
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 08:28 94ade632-6ecc-4901-8cec-8e39f3d69cb0 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:42 fe62a281-51e9-4b23-87b3-2deb52357304 -> 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/fe62a281-51e9-4b23-87b3-2deb52357304


Once you create your link , start it again.
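For reference, recreating such a link by hand looks roughly like this (a sketch using the UUIDs from my listing above as placeholders - take the domain and image UUIDs your own vm.conf/HostedEngine.xml references):

DOMAIN=808423f9-8a5c-40cd-bc9f-2568c85b8c74
IMAGE=8ec7a465-151e-4ac3-92a7-965ecf854501
mkdir -p /var/run/vdsm/storage/$DOMAIN
ln -s /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/$DOMAIN/images/$IMAGE \
      /var/run/vdsm/storage/$DOMAIN/$IMAGE
chown -h vdsm:kvm /var/run/vdsm/storage/$DOMAIN/$IMAGE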
6. Wait till the OVF is fixed (it takes longer than the setting in the engine :) )
Good Luck!
Best Regards,
Strahil Nikolov


On Monday, 18 March 2019, 12:57:30 GMT+2, Николаев Алексей wrote:
 
 Hi all! I have a very similar problem after updating one of the two nodes to 
version 4.3.1. This node77-02 lost connection to gluster volume named DATA, but 
not to volume with hosted engine.  node77-02 /var/log/messages Mar 18 13:40:00 
node77-02 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed 
scanning for OVF_STORE due to Command Volume.getInfo with args 
{'storagepoolID': '----', 'storagedomainID': 
'2ee71105-1810-46eb-9388-cc6caccf9fac', 'volumeID': 
u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11', 'imageID': 
u'43b75b50-cad4-411f-8f51-2e99e52f4c77'} failed:#012(code=201, message=Volume 
does not exist: (u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11',))Mar 18 13:40:00 
node77-02 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable 
to identify the OVF_STORE volume, falling back to initial vm.conf. Please 
ensure you already added your first data domain for regular VMs HostedEngine VM 
works fine on all nodes. But node77-02 failed with error in webUI: 
ConnectStoragePoolVDS failed: Cannot find master domain: 
u'spUUID=5a5cca91-01f8-01af-0297-025f, 
msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1' node77-02 vdsm.log 2019-03-18 
13:51:46,287+0300 WARN  (jsonrpc/7) [storage.StorageServer.MountConnection] 
gluster server u'msk-gluster-facility.' is not in bricks 
['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly 
mounting duplicate servers (storageServer:317)2019-03-18 13:51:46,287+0300 INFO 
 (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.ipt.fsin.uis:/data 
at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data 
(mount:204)2019-03-18 

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-18 Thread Николаев Алексей
Hi all! I have a very similar problem after updating one of the two nodes to version 4.3.1. This node77-02 lost connection to gluster volume named DATA, but not to volume with hosted engine.

node77-02 /var/log/messages

Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '----', 'storagedomainID': '2ee71105-1810-46eb-9388-cc6caccf9fac', 'volumeID': u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11', 'imageID': u'43b75b50-cad4-411f-8f51-2e99e52f4c77'} failed:#012(code=201, message=Volume does not exist: (u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11',))
Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

HostedEngine VM works fine on all nodes. But node77-02 failed with error in webUI:

ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5a5cca91-01f8-01af-0297-025f, msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1'

node77-02 vdsm.log

2019-03-18 13:51:46,287+0300 WARN  (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
2019-03-18 13:51:46,287+0300 INFO  (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.ipt.fsin.uis:/data at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data (mount:204)
2019-03-18 13:51:46,474+0300 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2415)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2412, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 179, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 171, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in 
    **kwargs)
  File "", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-10121.scope.\nMount failed. Please check the log file for more details.\n')

--

2019-03-18 13:51:46,830+0300 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='fe81642e-2421-4169-a08b-51467e8f01fe') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "", line 2, in connectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1035, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1097, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 700, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1274, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1495, in setMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=5a5cca91-01f8-01af-0297-025f, msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1'

What is the best practice to recover from this problem?

15.03.2019, 13:47, "Strahil Nikolov":
> On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov wrote:
>> Ok, I have managed to recover again and no issues are detected this time. I guess this case is quite rare and nobody has experienced that.
>
> >Hi,
> >can you please explain how you fixed it?
>
> I have set again to global maintenance, defined the HostedEngine from the old xml (taken from old vdsm log), defined the network and powered it off.
> Set the OVF update period to 5 min, but it took several hours until the OVF_STORE were updated. Once this happened I restarted the ovirt-ha-agent ovirt-ha-broker on both nodes. Then I powered off the HostedEngine and undefined it from ovirt1.
> In order to

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-17 Thread Strahil
Hi Simone,

I  have noticed that my Engine's root disk is 'vda' just in standalone KVM.
I have the feeling that was not the case before. 

Can someone check a default engine and post the output of lsblk ?

Thanks in advance.

Best Regards,
Strahil Nikolov

On Mar 15, 2019 12:46, Strahil Nikolov wrote:
>
>
> On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov  wrote:
>>
>> Ok,
>>
>> I have managed to recover again and no issues are detected this time.
>> I guess this case is quite rare and nobody has experienced that.
>
>
> >Hi,
> >can you please explain how you fixed it?
>
> I have set again to global maintenance, defined the HostedEngine from the old 
> xml (taken from old vdsm log) , defined the network and powered it off.
> Set the OVF update period to 5 min , but it took several hours until the 
> OVF_STORE were updated. Once this happened I restarted the ovirt-ha-agent 
> ovirt-ha-broker on both nodes.Then I powered off the HostedEngine and 
> undefined it from ovirt1.
>
> then I set the maintenance to 'none' and the VM powered on ovirt1.
> In order to test a failure, I removed the global maintenance and powered off 
> the HostedEngine from itself (via ssh). It was brought back to the other node.
>
> In order to test failure of ovirt2, I set ovirt1 in local maintenance and 
> removed it (mode 'none') and again shutdown the VM via ssh and it started 
> again to ovirt1.
>
> It seems to be working, as I have later shut down the Engine several times 
> and it managed to start without issues. 
>
> I'm not sure this is related, but I had detected that ovirt2 was out-of-sync 
> of the vdsm-ovirtmgmt network , but it got fixed easily via the UI.
>
>
>
> Best Regards,
> Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FCMFOMJYSIMXLZKGPY3DDTJIM6IWWNXJ/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-15 Thread Strahil Nikolov

On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov  wrote:

Ok,

I have managed to recover again and no issues are detected this time.
I guess this case is quite rare and nobody has experienced that.

>Hi,
>can you please explain how you fixed it?
I have set again to global maintenance, defined the HostedEngine from the old 
xml (taken from an old vdsm log), defined the network and powered it off.
Set the OVF update period to 5 min, but it took several hours until the OVF_STORE 
were updated. Once this happened I restarted the ovirt-ha-agent and ovirt-ha-broker 
on both nodes. Then I powered off the HostedEngine and undefined it from ovirt1.

Then I set the maintenance to 'none' and the VM powered on ovirt1.
In order to test a failure, I removed the global maintenance and powered off 
the HostedEngine from itself (via ssh). It was brought back to the other node.
In order to test failure of ovirt2, I set ovirt1 in local maintenance and 
removed it (mode 'none') and again shut down the VM via ssh; it started again 
on ovirt1.
It seems to be working, as I have later shut down the Engine several times and 
it managed to start without issues. 
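For reference, the maintenance switches used in these tests are the standard hosted-engine ones (a quick sketch):

hosted-engine --set-maintenance --mode=global   # freeze HA before manual recovery work
hosted-engine --set-maintenance --mode=none     # hand control back to the HA agents
hosted-engine --set-maintenance --mode=local    # drain only the host you run it on
hosted-engine --vm-status                       # watch where the engine VM ends up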

I'm not sure this is related, but I had detected that ovirt2 was out-of-sync of 
the vdsm-ovirtmgmt network , but it got fixed easily via the UI.



Best Regards,
Strahil Nikolov
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3B7OQUA733ETUA66TB7HF5Y24BLSI4XO/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-15 Thread Simone Tiraboschi
On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov 
wrote:

> Ok,
>
> I have managed to recover again and no issues are detected this time.
> I guess this case is quite rare and nobody has experienced that.
>

Hi,
can you please explain how you fixed it?


>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 13 March 2019, 13:03:38 GMT+2, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>
> Dear Simone,
>
> it seems that there is some kind of problem, as the OVF got updated with
> wrong configuration:
> [root@ovirt2 ~]# ls -l
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
> total 66591
> -rw-rw. 1 vdsm kvm   30720 Mar 12 08:06
> c3309fc0-8707-4de1-903d-8d4bbb024f81
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06
> c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
>
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
> total 66591
> -rw-rw. 1 vdsm kvm   30720 Mar 13 11:07
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 13 11:07
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta
>
> Starting the hosted-engine fails with:
>
> 2019-03-13 12:48:21,237+0200 ERROR (vm/8474ae07) [virt.vm]
> (vmId='8474ae07-f172-4a20-b516-375c73903df7') The vm start process failed
> (vm:937)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in
> _startUnderlyingVm
> self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2852, in
> _run
> dom = self._connection.defineXML(self._domain.xml)
>   File
> "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line
> 131, in wrapper
> ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line
> 94, in wrapper
> return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3743, in
> defineXML
> if ret is None:raise libvirtError('virDomainDefineXML() failed',
> conn=self)
> libvirtError: XML error: No PCI buses available
>
> Best Regards,
> Strahil Nikolov
>
>
> On Tuesday, 12 March 2019, 14:14:26 GMT+2, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>
> Dear Simone,
>
> it should be 60 min , but I have checked several hours after that and it
> didn't update it.
>
> [root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
> OvfUpdateIntervalInMinutes: 60 version: general
>
> How can i make a backup of the VM config , as you have noticed the local
> copy in /var/run/ovirt-hosted-engine-ha/vm.conf won't work ?
>
> I will keep the HostedEngine's xml - so I can redefine if needed.
>
> Best Regards,
> Strahil Nikolov
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NYOEDUYIIV3TYU6HWFHFNKHA45ZV2WFD/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-15 Thread Strahil Nikolov
Ok,

I have managed to recover again and no issues are detected this time.
I guess this case is quite rare and nobody has experienced that.

Best Regards,
Strahil Nikolov

On Wednesday, 13 March 2019, 13:03:38 GMT+2, Strahil Nikolov wrote:
 
Dear Simone,

it seems that there is some kind of problem, as the OVF got updated with wrong 
configuration:

[root@ovirt2 ~]# ls -l 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 
c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 13 11:07 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Starting the hosted-engine fails with:
2019-03-13 12:48:21,237+0200 ERROR (vm/8474ae07) [virt.vm] 
(vmId='8474ae07-f172-4a20-b516-375c73903df7') The vm start process failed 
(vm:937)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in 
_startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2852, in _run
    dom = self._connection.defineXML(self._domain.xml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", 
line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in 
wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3743, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: No PCI buses available

Best Regards,
Strahil Nikolov


On Tuesday, 12 March 2019, 14:14:26 GMT+2, Strahil Nikolov wrote:
 
Dear Simone,

it should be 60 min, but I have checked several hours after that and it didn't 
update it.

[root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
OvfUpdateIntervalInMinutes: 60 version: general

How can I make a backup of the VM config, as you have noticed the local copy 
in /var/run/ovirt-hosted-engine-ha/vm.conf won't work?
I will keep the HostedEngine's xml - so I can redefine if needed.

Best Regards,
Strahil Nikolov
  
  
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZRPIBZKOD533HODP6VER726XWGQEZXM7/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-13 Thread Strahil Nikolov
Dear Simone,

it seems that there is some kind of problem, as the OVF got updated with wrong 
configuration:

[root@ovirt2 ~]# ls -l 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 
c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 13 11:07 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Starting the hosted-engine fails with:
2019-03-13 12:48:21,237+0200 ERROR (vm/8474ae07) [virt.vm] 
(vmId='8474ae07-f172-4a20-b516-375c73903df7') The vm start process failed 
(vm:937)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in 
_startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2852, in _run
    dom = self._connection.defineXML(self._domain.xml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", 
line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in 
wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3743, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: No PCI buses available

Best Regards,
Strahil Nikolov


On Tuesday, 12 March 2019, 14:14:26 GMT+2, Strahil Nikolov wrote:
 
Dear Simone,

it should be 60 min, but I have checked several hours after that and it didn't 
update it.

[root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
OvfUpdateIntervalInMinutes: 60 version: general

How can I make a backup of the VM config, as you have noticed the local copy 
in /var/run/ovirt-hosted-engine-ha/vm.conf won't work?
I will keep the HostedEngine's xml - so I can redefine if needed.

Best Regards,
Strahil Nikolov
  
  
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3XPJXJ4I4LVDDV47BTSXA4FQE3OM5T5J/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-12 Thread Strahil Nikolov
Dear Simone,

it should be 60 min, but I have checked several hours after that and it didn't 
update it.

[root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
OvfUpdateIntervalInMinutes: 60 version: general

How can I make a backup of the VM config, as you have noticed the local copy 
in /var/run/ovirt-hosted-engine-ha/vm.conf won't work?
I will keep the HostedEngine's xml - so I can redefine if needed.
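What I have in mind is something simple like this (just a sketch, not an official backup procedure):

# dump the current HostedEngine definition from libvirt
virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf \
      dumpxml HostedEngine > /root/HostedEngine-backup.xml
# and keep a copy of the locally cached vm.conf as well
cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/vm.conf.backup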
Best Regards,
Strahil Nikolov
  
  
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GHH4TKNKFWSBKJVX6UHIVB6R4EKS54EH/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-12 Thread Simone Tiraboschi
On Tue, Mar 12, 2019 at 9:48 AM Strahil Nikolov 
wrote:

> Latest update - the system is back and running normally.
> After a day (or maybe a little more), the OVF is OK:
>

Normally it should try every 60 minutes.
Can you please execute
engine-config -g OvfUpdateIntervalInMinutes
on your engine VM and check the result? It should be 60 minutes by default.
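For example (a sketch; run on the engine VM - as far as I remember a changed value only takes effect after restarting ovirt-engine):

engine-config -g OvfUpdateIntervalInMinutes     # show the current interval
engine-config -s OvfUpdateIntervalInMinutes=5   # e.g. shorten it temporarily while debugging
systemctl restart ovirt-engine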


>
> [root@ovirt1 ~]# ls -l
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
> total 66591
> -rw-rw. 1 vdsm kvm   30720 Mar 12 08:06
> c3309fc0-8707-4de1-903d-8d4bbb024f81
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06
> c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
>
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
> total 66591
> -rw-rw. 1 vdsm kvm   30720 Mar 12 08:06
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta
>
>
> Once it's got fixed, I have managed to start the hosted-engine properly (I
> have rebooted the whole cluster just to be on the safe side):
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host ovirt1.localdomain (id: 1) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : True
> Hostname   : ovirt1.localdomain
> Host ID: 1
> Engine status  : {"health": "good", "vm": "up",
> "detail": "Up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : 8ec26591
> local_conf_timestamp   : 49704
> Host timestamp : 49704
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=49704 (Tue Mar 12 10:47:43 2019)
> host-id=1
> score=3400
> vm_conf_refresh_time=49704 (Tue Mar 12 10:47:43 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineUp
> stopped=False
>
>
> --== Host ovirt2.localdomain (id: 2) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : True
> Hostname   : ovirt2.localdomain
> Host ID: 2
> Engine status  : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : f9f39dcd
> local_conf_timestamp   : 14458
> Host timestamp : 14458
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=14458 (Tue Mar 12 10:47:41 2019)
> host-id=2
> score=3400
> vm_conf_refresh_time=14458 (Tue Mar 12 10:47:41 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineDown
> stopped=False
>
>
>
> Best Regards,
> Strahil Nikolov
>
> On Sunday, 10 March 2019, 5:05:33 GMT+2, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>
> Hello again,
>
> Latest update: the engine is up and running (or at least the login portal).
>
> [root@ovirt1 ~]# hosted-engine --check-liveliness
> Hosted Engine is up!
>
> I have found online the xml for the network:
>
> [root@ovirt1 ~]# cat ovirtmgmt_net.xml
> 
>   vdsm-ovirtmgmt
>   
>   
>   
>
> Sadly, I had to create a symbolic link to the main disk in
> /var/run/vdsm/storage , as it was missing.
>
> So, what's next.
>
> Issues up to now:
> 2 OVF - 0 bytes
> Problem with local copy of the HostedEngine config - used xml from an old
> vdsm log
> Missing vdsm-ovirtmgmt definition
> No link for the main raw disk in /var/run/vdsm/storage .
>
> Can you hint me how to recover the 2 OVF tars now ?
>
> Best Regards,
> Strahil Nikolov
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A23KL3SPOAMCJKGUQZUX65IF46AYITK7/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-12 Thread Strahil Nikolov
Latest update - the system is back and running normally.
After a day (or maybe a little more), the OVF is OK:
[root@ovirt1 ~]# ls -l 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 
c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 
9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta


Once it got fixed, I managed to start the hosted-engine properly (I have 
rebooted the whole cluster just to be on the safe side):
[root@ovirt1 ~]# hosted-engine --vm-status


--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.localdomain
Host ID    : 1
Engine status  : {"health": "good", "vm": "up", "detail": 
"Up"}
Score  : 3400
stopped    : False
Local maintenance  : False
crc32  : 8ec26591
local_conf_timestamp   : 49704
Host timestamp : 49704
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=49704 (Tue Mar 12 10:47:43 2019)
    host-id=1
    score=3400
    vm_conf_refresh_time=49704 (Tue Mar 12 10:47:43 2019)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt2.localdomain
Host ID    : 2
Engine status  : {"reason": "vm not running on this host", 
"health": "bad", "vm": "down", "detail": "unknown"}
Score  : 3400
stopped    : False
Local maintenance  : False
crc32  : f9f39dcd
local_conf_timestamp   : 14458
Host timestamp : 14458
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=14458 (Tue Mar 12 10:47:41 2019)
    host-id=2
    score=3400
    vm_conf_refresh_time=14458 (Tue Mar 12 10:47:41 2019)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False



Best Regards,
Strahil Nikolov

On Sunday, 10 March 2019, 5:05:33 GMT+2, Strahil Nikolov wrote:
 
Hello again,

Latest update: the engine is up and running (or at least the login portal).

[root@ovirt1 ~]# hosted-engine --check-liveliness
Hosted Engine is up!

I have found online the xml for the network:

[root@ovirt1 ~]# cat ovirtmgmt_net.xml
  vdsm-ovirtmgmt

Sadly, I had to create a symbolic link to the main disk in 
/var/run/vdsm/storage, as it was missing.

So, what's next.

Issues up to now:
2 OVF - 0 bytes
Problem with local copy of the HostedEngine config - used xml from an old vdsm log
Missing vdsm-ovirtmgmt definition
No link for the main raw disk in /var/run/vdsm/storage

Can you hint me how to recover the 2 OVF tars now?

Best Regards,
Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ODFPHVS5LYY6JWFWKWR3PBYTF3QSDKGV/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-09 Thread Strahil Nikolov
Hello again,

Latest update: the engine is up and running (or at least the login portal).

[root@ovirt1 ~]# hosted-engine --check-liveliness
Hosted Engine is up!

I have found online the xml for the network:

[root@ovirt1 ~]# cat ovirtmgmt_net.xml
  vdsm-ovirtmgmt

Sadly, I had to create a symbolic link to the main disk in 
/var/run/vdsm/storage, as it was missing.

So, what's next.

Issues up to now:
2 OVF - 0 bytes
Problem with local copy of the HostedEngine config - used xml from an old vdsm log
Missing vdsm-ovirtmgmt definition
No link for the main raw disk in /var/run/vdsm/storage

Can you hint me how to recover the 2 OVF tars now?

Best Regards,
Strahil Nikolov
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/57QC4DEXCVF6AEIDFDLDBYSPZQIYJGOR/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-09 Thread Strahil Nikolov
 Hi Simone,
and thanks for your help.
So far I found out that there is some problem with the local copy of the 
HostedEngine config (see attached part of vdsm.log).
I have found out an older xml configuration (in an old vdsm.log) and defining 
the VM works, but powering it on reports:
[root@ovirt1 ~]# virsh define hosted-engine.xml
Domain HostedEngine defined from hosted-engine.xml

[root@ovirt1 ~]# virsh list --all
 Id    Name                           State
 -     HostedEngine                   shut off

[root@ovirt1 ~]# virsh start HostedEngine
error: Failed to start domain HostedEngine
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

[root@ovirt1 ~]# virsh net-list --all
 Name                 State      Autostart     Persistent
 ;vdsmdummy;          active     no            no
 default              inactive   no            yes

[root@ovirt1 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.                   no
ovirtmgmt               8000.bc5ff467f5b3       no              enp2s0
[root@ovirt1 ~]# ip a s
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp2s0:  mtu 9000 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
3: ovs-system:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:78:c7:2d:32:f9 brd ff:ff:ff:ff:ff:ff
4: br-int:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:36:dd:63:dc:48 brd ff:ff:ff:ff:ff:ff
20: ovirtmgmt:  mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.90/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.243/24 brd 192.168.1.255 scope global secondary ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::be5f:f4ff:fe67:f5b3/64 scope link 
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:36:8d:b7:64:bd brd ff:ff:ff:ff:ff:ff

192.168.1.243/24 is one of the IPs in ctdb..

So, now comes the question - is there an xml in the logs that defines the network?
My hope is to power up the HostedEngine properly and hope that it will push all 
the configurations to the right places ... maybe this is way too optimistic.
At least I have learned a lot about oVirt.
Best Regards,
Strahil Nikolov


On Thursday, 7 March 2019, 17:55:12 GMT+2, Simone Tiraboschi wrote:
 
 

On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov  wrote:

 

  
>The OVF_STORE volume is going to get periodically recreated by the engine so 
>at least you need a running engine.
>In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
>MainThread::INFO::2019-03-06 
>06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, 
>volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81>MainThread::INFO::2019-03-06 
>06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, 
>volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>Can you please check if you have at least the second copy?
Second Copy is empty too:

[root@ovirt1 ~]# ll 
/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw. 1 vdsm kvm   0 Mar  4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 
c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar  4 05:24 
c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
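For reference, this is the kind of check I am doing on both copies (a sketch; the domain and image/volume UUIDs are the ones from my setup above):

DOM=/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74
ls -l $DOM/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429 \
      $DOM/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
# a healthy OVF_STORE volume should be a non-empty tar that vdsm can read
sudo -u vdsm tar -tvf $DOM/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0/9460fc4b-54f3-48e3-b7b6-da962321ecf4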



>And even in the case you lost both, we are storing on the shared storage the 
>initial vm.conf:>MainThread::ERROR::2019-03-06 
>>06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::>(_get_vm_conf_content_from_ovf_store)
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial 
>vm.conf

>Can you please check what do you have in 
>/var/run/ovirt-hosted-engine-ha/vm.conf ?

It exists and has the following:
[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar  7 15:37:26 2019

vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, 

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-08 Thread Simone Tiraboschi
On Fri, Mar 8, 2019 at 12:49 PM Strahil Nikolov 
wrote:

> Hi Simone,
>
> sadly it seems that starting the engine from an alternative config is not
> working.
> Virsh reports that the VM is defined , but shut down and the dumpxml
> doesn't show any disks - maybe this is normal for oVirt (I have never
> checked a running VM).
>

No, it's not:

devices={index:0,iface:virtio,format:raw,poolID:-
---,volumeID:a9ab832f-c4f2-4b9b-
9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-
965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-
2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-
9d99-6393fd995979,address:{bus:0x00, slot:0x06,

the disk was definitely there in vm.conf but then the VM ignores it.
I'd suggest double-checking vdsm.log for errors or something like that.
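Something as simple as this should surface the relevant lines (a sketch; default log locations):

grep -E "ERROR|WARN|Traceback" /var/log/vdsm/vdsm.log | tail -n 50
# the HA agent/broker logs are worth a look as well
tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log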


>
> Strangely , both OVF have been wiped out at almost the same time.
>
> I'm attaching some console output and gluster logs. In the gluster stuff I
> can see:
> glusterd.log
>
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 47 times between [2019-03-04 05:
> 17:15.810686] and [2019-03-04 05:19:08.576724]
> [2019-03-04 05:19:16.147795] I [MSGID: 106488]
> [glusterd-handler.c:1558:__glusterd_handle_cli_get_volume] 0-management:
> Received get vol req
> [2019-03-04 05:19:16.149524] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-03-04 05:19:42.728693] E [MSGID: 106419]
> [glusterd-utils.c:6943:glusterd_add_inode_size_to_dict] 0-management: could
> not find (null) to getinode size f
> or systemd-1 (autofs): (null) package missing?
> [2019-03-04 05:20:54.236659] I [MSGID: 106499]
> [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume
> data
> [2019-03-04 05:20:54.245844] I [MSGID: 106499]
> [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume
> engine
>
>
> and the log of the mountpoint:
>
> [2019-03-04 05:19:35.381378] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-03-04 05:19:37.294931] I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 0-engine-replicate-0: selecting
> local read_child engine-client-0
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-engine-replicate-0: selecting local read_child engine-client-0" repeated
> 7 times
>  between [2019-03-04 05:19:37.294931] and [2019-03-04 05:21:26.171701]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 865 times between [2019-03-04 05
> :19:35.381378] and [2019-03-04 05:21:26.233004]
> [2019-03-04 05:21:35.699082] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-03-04 05:21:38.671811] I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 0-engine-replicate-0: selecting
> local read_child engine-client-0
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-engine-replicate-0: selecting local read_child engine-client-0" repeated
> 7 times
>  between [2019-03-04 05:21:38.671811] and [2019-03-04 05:23:31.654205]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 889 times between [2019-03-04 05
> :21:35.699082] and [2019-03-04 05:23:32.613797]
>
>
Adding also Sahina here.


>
>
> Best Regards,
> Strahil Nikolov
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FLPFUNGH3FKQPXNRH3JM2JDXU6S4M2XJ/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-07 Thread Simone Tiraboschi
On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov 
wrote:

>
>
>
> >The OVF_STORE volume is going to get periodically recreated by the engine
> so at least you need a running engine.
>
> >In order to avoid this kind of issue we have two OVF_STORE disks, in your
> case:
>
> >MainThread::INFO::2019-03-06
> 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429,
> volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
> >MainThread::INFO::2019-03-06
> 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0,
> volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>
> >Can you please check if you have at least the second copy?
>
> Second Copy is empty too:
> [root@ovirt1 ~]# ll
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
> total 66561
> -rw-rw. 1 vdsm kvm   0 Mar  4 05:23
> c3309fc0-8707-4de1-903d-8d4bbb024f81
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar  4 05:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
>
>
>
> >And even in the case you lost both, we are storing on the shared storage
> the initial vm.conf:
> >MainThread::ERROR::2019-03-06
> >06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store)
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
> vm.conf
>
> >Can you please check what you have
> in /var/run/ovirt-hosted-engine-ha/vm.conf ?
>
> It exists and has the following:
>
> [root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
> # Editing the hosted engine VM is only possible via the manager UI\API
> # This file was generated at Thu Mar  7 15:37:26 2019
>
> vmId=8474ae07-f172-4a20-b516-375c73903df7
> memSize=4096
> display=vnc
> devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
> type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
> devices={index:0,iface:virtio,format:raw,poolID:----,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00,
> slot:0x06, domain:0x, type:pci,
> function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
> devices={device:scsi,model:virtio-scsi,type:controller}
> devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00,
> slot:0x03, domain:0x, type:pci,
> function:0x0},device:bridge,type:interface}
> devices={device:console,type:console}
> devices={device:vga,alias:video0,type:video}
> devices={device:vnc,type:graphics}
> vmName=HostedEngine
>
> spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
> smp=1
> maxVCpus=8
> cpuType=Opteron_G5
> emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(',
> ')|first
> devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}
>

You should be able to copy it to /root/myvm.conf and start the engine
VM with:
hosted-engine --vm-start --vm-conf=/root/myvm.conf
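
A minimal sketch of that recovery path, assuming the fallback vm.conf under /var/run is still intact (as shown above):

cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/myvm.conf
hosted-engine --vm-start --vm-conf=/root/myvm.conf
hosted-engine --vm-status

The final --vm-status call is only there to confirm that the VM actually came up.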


>
>
>
> Also, I think this happened when I was upgrading ovirt1 (last in the
> gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted, because I
> forgot to enable the global maintenance.
>
>
> >Sorry, I don't understand
> >Can you please explain what happened?
>
> I have updated the engine first -> All OK, next was the arbiter -> again
> no issues with it.
> Next was the empty host -> ovirt2 and everything went OK.
> After that I migrated the engine to ovirt2, and tried to update ovirt1.
> The web showed that the installation failed, but using "yum update" was
> working.
> During the update via yum of ovirt1 -> the engine app crashed and
> restarted on ovirt2.
> After the reboot of ovirt1 I have noticed the error about pinging the
> gateway, thus I stopped the engine and stopped the following services on
> both hosts (global maintenance):
> ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock
>
> Next was a reinitialization of the sanlock space via 'sanlock direct -s'.
> In the end I have managed to power on the hosted-engine and it was running
> for a while.
>
> As the errors did not stop - I have decided to shutdown everything, then
> power it up , heal gluster and check what will happen.
>
> Currently I'm not able to power up the engine:
>
>
> [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>

Please notice that in 

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-07 Thread Strahil Nikolov
 

  
>The OVF_STORE volume is going to get periodically recreated by the engine so
>at least you need a running engine.
>In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
>MainThread::INFO::2019-03-06
>06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429,
>volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
>MainThread::INFO::2019-03-06
>06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0,
>volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>Can you please check if you have at least the second copy?
Second copy is empty too:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw. 1 vdsm kvm   0 Mar  4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar  4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta



>And even in the case you lost both, we are storing on the shared storage the
>initial vm.conf:
>MainThread::ERROR::2019-03-06
>06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store)
>Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
>vm.conf

>Can you please check what you have in
>/var/run/ovirt-hosted-engine-ha/vm.conf ?

It exists and has the following:
[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar  7 15:37:26 2019

vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:----,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00, slot:0x06, domain:0x, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00, slot:0x03, domain:0x, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,type:console}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=1
maxVCpus=8
cpuType=Opteron_G5
emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(', ')|first
devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}




Also, I think this happened when I was upgrading ovirt1 (last in the gluster
cluster) from 4.3.0 to 4.3.1. The engine got restarted, because I forgot to
enable the global maintenance.

>Sorry, I don't understand.
>Can you please explain what happened?

I updated the engine first -> all OK; next was the arbiter -> again no issues with it.
Next was the empty host -> ovirt2, and everything went OK.
After that I migrated the engine to ovirt2 and tried to update ovirt1.
The web UI showed that the installation failed, but "yum update" was working.
During the update of ovirt1 via yum, the engine app crashed and restarted on ovirt2.
After the reboot of ovirt1 I noticed the error about pinging the gateway, so I
stopped the engine and stopped the following services on both hosts (global maintenance):
ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock

Next was a reinitialization of the sanlock space via 'sanlock direct -s'.
In the end I managed to power on the hosted-engine and it was running for
a while.
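
A minimal sketch of that stop sequence on each host, assuming the systemd unit names listed above and that global maintenance is set via the hosted-engine CLI (the exact sanlock re-initialization call is abbreviated above and not repeated here):

hosted-engine --set-maintenance --mode=global
systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock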
As the errors did not stop, I decided to shut down everything, then power it up,
heal gluster and check what will happen.
Currently I'm not able to power up the engine:

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status


!! Cluster is in GLOBAL MAINTENANCE mode !!



--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.localdomain
Host ID    : 1
Engine status  : {"reason": "vm not running on this host", 
"health": "bad", "vm": "down", "detail": "unknown"}
Score 

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-07 Thread Simone Tiraboschi
On Thu, Mar 7, 2019 at 9:19 AM Strahil Nikolov 
wrote:

> Hi Simone,
>
> I think I found the problem - ovirt-ha cannot extract the file containing
> the needed data .
> In my case it is completely empty:
>
>
> [root@ovirt1 ~]# ll
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
> total 66561
> -rw-rw. 1 vdsm kvm   0 Mar  4 05:21
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar  4 05:22
> 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta
>
>
> Any hint how to recreate that ? Maybe wipe and restart the ovirt-ha-broker
> and agent ?
>

The OVF_STORE volume is going to get periodically recreated by the engine
so at least you need a running engine.

In order to avoid this kind of issue we have two OVF_STORE disks, in your
case:

MainThread::INFO::2019-03-06
06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429,
volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
MainThread::INFO::2019-03-06
06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0,
volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4

Can you please check if you have at least the second copy?

And even in the case you lost both, we are storing on the shared storage
the initial vm.conf:
MainThread::ERROR::2019-03-06
06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store)
Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
vm.conf

Can you please check what you have
in /var/run/ovirt-hosted-engine-ha/vm.conf ?


>
> Also, I think this happened when I was upgrading ovirt1 (last in the
> gluster cluster) from 4.3.0 to 4.3.1 . The engine got restarted , because I
> forgot to enable the global maintenance.
>

Sorry, I don't understand.
Can you please explain what happened?



>
>
>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, March 6, 2019 at 16:57:30 GMT+2, Simone Tiraboschi <
> stira...@redhat.com> wrote:
>
>
>
>
> On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov 
> wrote:
>
> Hi Simone,
>
> thanks for your reply.
>
> >Are you really sure that the issue was on the ping?
> >on storage errors the broker restarts itself and while the broker is
> >restarting the agent cannot ask the broker to trigger the gateway monitor
> >(the ping one) and so that error message.
>
> It seemed so at that moment, but I'm not so sure right now :)
>
> >Which kind of storage are you using?
> >can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> I'm using glusterfs v5 from ovirt 4.3.1 with FUSE mount.
> Please , have a look in the attached logs.
>
>
> Nothing seems that strange there but that error.
> Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
> you have to set level=DEBUG in [logger_root] section
> in /etc/ovirt-hosted-engine-ha/agent-log.conf
> and /etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two
> services.
>
>
>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, March 6, 2019 at 9:53:20 GMT+2, Simone Tiraboschi <
> stira...@redhat.com> wrote:
>
>
>
>
> On Wed, Mar 6, 2019 at 6:13 AM Strahil  wrote:
>
> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was
> complaining that it couldn't ping the gateway.
>
>
> Are you really sure that the issue was on the ping?
> on storage errors the broker restarts itself and while the broker is
> restarting the agent cannot ask the broker to trigger the gateway monitor
> (the ping one) and so that error message.
>
>
> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker,
> vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly as now I receive:
>
> ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
> vm.conf
>
> Any hints how to fix this ? Of course a redeploy is possible, but I prefer
> to recover from that.
>
>
> Which kind of storage are you using?
> can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
>
> Best Regards,
> Strahil Nikolov

[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-07 Thread Strahil Nikolov
Hi Simone,

I think I found the problem - ovirt-ha cannot extract the file containing the
needed data. In my case it is completely empty:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
total 66561
-rw-rw. 1 vdsm kvm       0 Mar  4 05:21 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm     435 Mar  4 05:22 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Any hint how to recreate that? Maybe wipe and restart the ovirt-ha-broker and
agent?

Also, I think this happened when I was upgrading ovirt1 (last in the gluster
cluster) from 4.3.0 to 4.3.1. The engine got restarted, because I forgot to
enable the global maintenance.


Best Regards,
Strahil Nikolov

On Wednesday, March 6, 2019 at 16:57:30 GMT+2, Simone Tiraboschi wrote:
 
 

On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov  wrote:

Hi Simone,

thanks for your reply.

>Are you really sure that the issue was on the ping?
>on storage errors the broker restarts itself and while the broker is
>restarting the agent cannot ask the broker to trigger the gateway monitor
>(the ping one) and so that error message.

It seemed so at that moment, but I'm not so sure right now :)

>Which kind of storage are you using?
>can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?

I'm using glusterfs v5 from ovirt 4.3.1 with FUSE mount.
Please have a look in the attached logs.

Nothing seems that strange there but that error.
Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
You have to set level=DEBUG in the [logger_root] section in
/etc/ovirt-hosted-engine-ha/agent-log.conf and
/etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two services.

Best Regards,
Strahil Nikolov

On Wednesday, March 6, 2019 at 9:53:20 GMT+2, Simone Tiraboschi wrote:
 
 

On Wed, Mar 6, 2019 at 6:13 AM Strahil  wrote:


Hi guys,

After updating to 4.3.1 I had an issue where the ovirt-ha-broker was 
complaining that it couldn't ping the gateway.



Are you really sure that the issue was on the ping?
On storage errors the broker restarts itself, and while the broker is restarting
the agent cannot ask the broker to trigger the gateway monitor (the ping one),
hence that error message.

As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, 
supervdsmd and sanlock on the nodes and reinitialized the lockspace.

I guess I didn't do it properly as now I receive:

ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed 
extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf

Any hints how to fix this ? Of course a redeploy is possible, but I prefer to 
recover from that.


Which kind of storage are you using?
Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?

Best Regards,
Strahil Nikolov
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QGBP6GYMCMEMI7GM2RB5OQOWMMNILDX5/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-06 Thread Simone Tiraboschi
On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov 
wrote:

> Hi Simone,
>
> thanks for your reply.
>
> >Are you really sure that the issue was on the ping?
> >on storage errors the broker restarts itself and while the broker is
> >restarting the agent cannot ask the broker to trigger the gateway monitor
> >(the ping one) and so that error message.
>
> It seemed so at that moment, but I'm not so sure right now :)
>
> >Which kind of storage are you using?
> >can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> I'm using glusterfs v5 from ovirt 4.3.1 with FUSE mount.
> Please , have a look in the attached logs.
>

Nothing seems that strange there but that error.
Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
You have to set level=DEBUG in the [logger_root] section
in /etc/ovirt-hosted-engine-ha/agent-log.conf
and /etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two
services.
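
A minimal sketch of that change, assuming the stock layout of the two log config files (only the level line inside the existing [logger_root] section is edited, everything else stays as is):

# in /etc/ovirt-hosted-engine-ha/agent-log.conf and
# /etc/ovirt-hosted-engine-ha/broker-log.conf:
[logger_root]
level=DEBUG

# then restart the two services:
systemctl restart ovirt-ha-agent ovirt-ha-broker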


>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, March 6, 2019 at 9:53:20 GMT+2, Simone Tiraboschi <
> stira...@redhat.com> wrote:
>
>
>
>
> On Wed, Mar 6, 2019 at 6:13 AM Strahil  wrote:
>
> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was
> complaining that it couldn't ping the gateway.
>
>
> Are you really sure that the issue was on the ping?
> on storage errors the broker restarts itself and while the broker is
> restarting the agent cannot ask the broker to trigger the gateway monitor
> (the ping one) and so that error message.
>
>
> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker,
> vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly as now I receive:
>
> ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
> vm.conf
>
> Any hints how to fix this ? Of course a redeploy is possible, but I prefer
> to recover from that.
>
>
> Which kind of storage are you using?
> can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
>
> Best Regards,
> Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/S3YOLMXMNXPT4B32Y4CYPNQRQXWA2UO3/


[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent

2019-03-05 Thread Simone Tiraboschi
On Wed, Mar 6, 2019 at 6:13 AM Strahil  wrote:

> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was
> complaining that it couldn't ping the gateway.
>

Are you really sure that the issue was on the ping?
On storage errors the broker restarts itself, and while the broker is
restarting the agent cannot ask the broker to trigger the gateway monitor
(the ping one), hence that error message.


> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker,
> vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly as now I receive:
>
> ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
> vm.conf
>
> Any hints how to fix this ? Of course a redeploy is possible, but I prefer
> to recover from that.
>

Which kind of storage are you using?
Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
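
For example (a minimal sketch, assuming the default log locations used by the HA services):

grep -iE 'error|warn' /var/log/ovirt-hosted-engine-ha/broker.log | tail -n 50
grep -iE 'error|warn' /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 50

That usually shows whether the broker is restarting because of a storage error rather than the ping monitor itself.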


> Best Regards,
> Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BNV7AVUBLOV2UDVBTYN23ZEZ2Q4TJYHV/