[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-09 Thread souvaliotimaria
The output of the getfattr command on the nodes was the following:

Node1:
[root@ov-no1 ~]# getfattr -d -m . -e hex 
/gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: 
gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0394
trusted.afr.engine-client-2=0x
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x015fec62872f5849585fec62872f5849585d791c1a00ba286e
trusted.glusterfs.shard.block-size=0x0400
trusted.glusterfs.shard.file-size=0x00190092040b


Node2:
[root@ov-no2 ~]#  getfattr -d -m . -e hex 
/gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: 
gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x
trusted.afr.engine-client-0=0x043a
trusted.afr.engine-client-2=0x
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x015fec62872f5849585fec62872f5849585d791c1a00ba286e
trusted.glusterfs.shard.block-size=0x0400
trusted.glusterfs.shard.file-size=0x00190092040b


Node3:
[root@ov-no3 ~]#  getfattr -d -m . -e hex 
/gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: 
gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x
trusted.afr.engine-client-0=0x0444
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x015fec62872f5849585fec62872f5849585d791c1a00ba286e
trusted.glusterfs.shard.block-size=0x0400
trusted.glusterfs.shard.file-size=0x00190092040b
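
If I am reading these attributes correctly (an interpretation on my part, not
something I have verified): each trusted.afr.engine-client-N attribute holds the
pending data/metadata/entry counters that a brick records against brick N. Node2
and Node3 both show a non-zero trusted.afr.engine-client-0, while Node1 only
shows a non-zero trusted.afr.dirty, so it looks like bricks 2 and 3 blame brick
0 and the copy on Node1 is the stale one. Just those counters can be re-checked
on each node with something like the following (a sketch using the same file
path as above):

getfattr -d -m trusted.afr -e hex \
 /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7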
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PUVBESAIZEJ7URDMDQ7LDUPNS6YDBVAS/


[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-08 Thread souvaliotimaria
Thank you for your reply.
I'm trying that right now and I see it triggered the self-healing process. 
I will come back with an update.
Best regards.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WKW4RAVHVOZN6CZVK2TOC7727DHLKWRZ/


[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-08 Thread souvaliotimaria
Thank you. 
I have tried that, but it didn't work because the system reports that the file is
not in split-brain.
I have also tried a force heal and a full heal, and still nothing. I always end up
with the entry stuck in the unsynched state.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W5AJ4PKEK36NZEIAPTX3UQD6P7EZM7EL/


[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-04 Thread souvaliotimaria
Hello again, 
I've tried to heal the brick with latest-mtime, but I get the following:

gluster volume heal engine split-brain latest-mtime 
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Healing 
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
 failed: File not in split-brain.
Volume heal failed.

Should I try the solution described in this thread, where the conflicting entry is
manually removed in order to trigger the heal operations?
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/RPYIMSQCBYVQ654HYGBN5NCPRVCGRRYB/#H6EBSPL5XRLBUVZBE7DGSY25YFPIR2KY
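
As far as I understand it, the procedure described in that thread is roughly the
following. This is only a sketch of my understanding, to be run on the one brick
that holds the stale copy, with a backup of the file taken first and the other
two bricks healthy:

# on the node whose brick holds the stale copy only
BRICK=/gluster_bricks/engine/engine
F=$BRICK/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
cp --preserve=all "$F" /root/a48555f4.bak   # keep a copy before removing anything
getfattr -n trusted.gfid -e hex "$F"        # note the gfid; its hard link lives under
                                            # $BRICK/.glusterfs/<1st byte>/<2nd byte>/<gfid as uuid>
rm "$F"                                     # remove the stale copy ...
# ... and the matching .glusterfs/<xx>/<yy>/<gfid> hard link, then re-trigger:
gluster volume heal engine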
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CCRNM7N3FSUYXDHFP2XDMGAMKSHBMJQQ/


[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-04 Thread souvaliotimaria
I tried only the simple heal because I wasn't sure whether I'd mess up the gluster
more than it already is.
I will try latest-mtime in a couple of hours, because this is a production system
and I have to do it after office hours. I will come back with an update.
Thank you very much for your help!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YHI63SPSJG6MNAI6737LZXS5ZG5UPXAG/


[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-03 Thread souvaliotimaria
Hello,

Thank you very much for your reply.

I get the following from the below gluster commands:

[root@ov-no1 ~]# gluster volume heal engine info split-brain
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries in split-brain: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries in split-brain: 0

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries in split-brain: 0


[root@ov-no1 ~]# gluster volume heal engine info summary
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0


[root@ov-no1 ~]# gluster volume info
Volume Name: data
Type: Replicate
Volume ID: 6c7bb2e4-ed35-4826-81f6-34fcd2d0a984
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ov-no1.ariadne-t.local:/gluster_bricks/data/data
Brick2: ov-no2.ariadne-t.local:/gluster_bricks/data/data
Brick3: ov-no3.ariadne-t.local:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable

Volume Name: engine
Type: Replicate
Volume ID: 7173c827-309f-4e84-a0da-6b2b8eb50264
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Brick2: ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
Brick3: ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: 

[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-03-01 Thread souvaliotimaria
Hello again, 

I am back with a brief description of the situation I am in, and questions 
about the recovery. 

oVirt environment: 4.3.5.2 Hyperconverged
GlusterFS: Replica 2 + Arbiter 1
GlusterFS volumes: data, engine, vmstore

The current situation is the following:

- The Cluster is in Global Maintenance.

- The engine volume is up, with the comment (in the Web GUI): Up, unsynched
entries, needs healing.

- The HostedEngine VM is paused due to a storage I/O error (Web GUI), while the
output of the virsh list --all command shows that the HostedEngine is running.

I tried to issue the gluster heal command (gluster volume heal engine) but 
nothing changed.

I have the following questions:

1. Should I restart the glusterd service? Where from? Is it enough if glusterd is
restarted on one host, or should it be restarted on the other two as well?

2. Should the node that was NonResponsive and came back be rebooted, or not? It
seems alright now and in good health.

3. Should the HostedEngine be restored with engine-backup or is it not 
necessary?

4. Could the loss of the DNS server for the oVirt hosts lead to an unresponsive 
host?
The nsswitch.conf file on the oVirt hosts and the engine has DNS defined as:
hosts:  files dns myhostname

5. How can we recover/rectify the situation above?

Thanks for your help,
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GO6S6GXRJWYZN5NZ5IFTNQ6SGNEB75WQ/


[ovirt-users] Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

2021-02-27 Thread souvaliotimaria

Hello everyone,

Any help would be greatly appreciated in the following problem.

In my lab, the day before yesterday, we had power issues: a UPS went off-line,
followed by a power outage of the NFS/DNS server I have set up to serve oVirt
with ISOs and to act as a DNS server (our other DNS servers run as VMs within
the oVirt environment). We also found a broadcast storm on the switch that the
oVirt nodes are connected to (due to a faulty NIC on the aforementioned UPS),
and later on we had to re-establish several of the virtual connections as well.
The above led to one of the hosts becoming NonResponsive, two machines becoming
unresponsive and three VMs shutting down.

The oVirt environment, version 4.3.5.2, is a replica 2 + arbiter 1 environment 
and runs GlusterFS with the recommended volumes of data, engine and vmstore.

So far, whenever there was some kind of problem, oVirt was usually able to
resolve it on its own.

This time, however, after we recovered from the above state, the data and
vmstore volumes healed successfully, but the engine volume got stuck in the
healing process (Up, unsynched entries, needs healing). From the web GUI I see
that the VM HostedEngine is paused due to a storage I/O error, while the output
of the virsh list --all command shows that the HostedEngine is running. How is
that happening?

I tried to manually trigger the healing process for the volume with
gluster volume heal engine, but nothing changed.

The command 
gluster volume heal engine info 
shows the following 

[root@ov-no3 ~]# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
 
Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
 
Status: Connected
Number of entries: 1

This morning I came upon this Reddit post
https://www.reddit.com/r/gluster/comments/fl3yb7/entries_stuck_in_heal_pending/
where it seems that after a graceful reboot of one of the oVirt hosts, the
gluster came back online once it had completed the appropriate healing
processes. The thing is, from what I have read, when there are unsynched entries
in the gluster, a host cannot be put into maintenance mode so that it can be
rebooted; is that correct?

Should I try to restart the glusterd service?

Could someone tell me what I should do?

Thank you all for your time and help,
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BSOF7BXAMVJ4IMYUEB3OBU4T64FGYA2J/


[ovirt-users] Re: QEMU error qemuDomainAgentAvailable in /var/log/messages

2021-01-08 Thread souvaliotimaria
Thank you very much for your answer.

The service is up and running on the engine, but it is only loaded on the nodes.
Should I start it (and enable it as well?) on the nodes too?

There is one VM on which I have not installed the guest agent, and it is running
on the arbiter host.
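
Just so it's written down somewhere, installing and enabling the agent inside
that VM should be something along these lines (a sketch only, assuming a
CentOS/RHEL guest; package and service names may differ on other distributions):

yum install qemu-guest-agent
systemctl enable --now qemu-guest-agent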

Also, about the snapshots and backups, do you mean the built-in oVirt
capabilities or an external backup/snapshot program as well?

Thank you again 
Maria 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SMY6FQ5MTBC37ZK47SQFYLARGVFPIZNN/


[ovirt-users] QEMU error qemuDomainAgentAvailable in /var/log/messages

2021-01-07 Thread souvaliotimaria
Hello everyone and a happy new year!

I have a question which might be silly, but I am stumped. I keep getting the
following error in my /var/log/messages:

Jan 5 12:20:30 ovno3 libvirtd: 2021-01-05 10:20:30.481+: 5283: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU guest agent 
is not connected

This entry appears only on the arbiter node and recurs every 5 minutes.

I have GlusterFS on the oVirt environment (a production environment) and it
serves several vital services. The VMs are running OK and I haven't noticed any
discrepancies. Almost a week ago there was a disconnection on the gluster
storage, but since then everything has worked as expected.

Does anyone know what this error is and whether there is a guide or something to
fix it? I have no idea what to search for, or where.

Thank you all very much for your time!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RGYORYFEGLDMKP6PBJW537CGJFGO3FTR/


[ovirt-users] Re: Best Practice? Affinity Rules Enforcement Manager or High Availability?

2020-12-30 Thread souvaliotimaria
Thank you very much for your reply.
I will check this out immediately.

Best regards and merry holidays,
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJKVRBOVQXDJ4ON5GZPDRKPGZQCFCF6J/


[ovirt-users] Best Practice? Affinity Rules Enforcement Manager or High Availability?

2020-12-30 Thread souvaliotimaria
Hello everyone,

Not sure if I should ask this here as it seems to be a pretty obvious question 
but here it is.

What is the best solution for making your VMs able to automatically boot up on
another working host when something goes wrong (gluster problem, non-responsive
host, etc.)? Would you enable the Affinity Manager and enforce some policies, or
would you set the VMs you want as Highly Available?

Thank you very much for your time!

Best regards, 
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7JAHYLIWGSLRIOUMXWTH5Q6BRFD5WPD4/


[ovirt-users] Re: VM HostedEngine is down with error

2020-09-04 Thread souvaliotimaria
Hello, 

This is what I could gather from the gluster logs around the time frame of the 
HE shutdown.

NODE1:
[root@ov-no1 glusterfs]# more 
bricks/gluster_bricks-vmstore-vmstore.log-20200830 |egrep "( W | E )"|more
[2020-08-27 15:35:03.090477] W [glusterfsd.c:1570:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7fa6e04a3dd5] 
-->/usr/sbin/glusterfsd(glus
terfs_sigwaiter+0xe5) [0x55a40138d1b5] 
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a40138d01b] ) 0-: received 
signum (15), shutting down
[2020-08-27 15:35:14.926794] E [MSGID: 100018] 
[glusterfsd.c:2333:glusterfs_pidfile_update] 0-glusterfsd: pidfile 
/var/run/gluster/vols/vmstore/ov-no
1.ariadne-t.local-gluster_bricks-vmstore-vmstore.pid lock failed [Resource 
temporarily unavailable]


[root@ov-no1 glusterfs]# more bricks/gluster_bricks-data-data.log-20200830 
|egrep "( W | E )"|more
[2020-08-27 15:35:01.087875] W [glusterfsd.c:1570:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7fc3cbf69dd5] 
-->/usr/sbin/glusterfsd(glus
terfs_sigwaiter+0xe5) [0x555e313711b5] 
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x555e3137101b] ) 0-: received 
signum (15), shutting down
[2020-08-27 15:35:14.890471] E [MSGID: 100018] 
[glusterfsd.c:2333:glusterfs_pidfile_update] 0-glusterfsd: pidfile 
/var/run/gluster/vols/data/ov-no1.a
riadne-t.local-gluster_bricks-data-data.pid lock failed [Resource temporarily 
unavailable]


[root@ov-no1 glusterfs]# more bricks/gluster_bricks-engine-engine.log-20200830 
|egrep "( W | E )"|more
[2020-08-27 15:35:02.088732] W [glusterfsd.c:1570:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7f70b99cbdd5] 
-->/usr/sbin/glusterfsd(glus
terfs_sigwaiter+0xe5) [0x55ebd132b1b5] 
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55ebd132b01b] ) 0-: received 
signum (15), shutting down
[2020-08-27 15:35:14.907603] E [MSGID: 100018] 
[glusterfsd.c:2333:glusterfs_pidfile_update] 0-glusterfsd: pidfile 
/var/run/gluster/vols/engine/ov-no1
.ariadne-t.local-gluster_bricks-engine-engine.pid lock failed [Resource 
temporarily unavailable]


[root@ov-no1 glusterfs]# more bricks/gluster_bricks-vmstore-vmstore.log |egrep 
"( W | E )"|more
[nothing in the output]

[root@ov-no1 glusterfs]# more bricks/gluster_bricks-data-data.log |egrep "( W | 
E )"|more
[nothing in the output]

[root@ov-no1 glusterfs]# more bricks/gluster_bricks-engine-engine.log |egrep "( 
W | E )"|more
[nothing in the output]


[root@ov-no1 glusterfs]# more cmd_history.log | egrep "(WARN|error|fail)" |more
[2020-09-01 02:00:38.685251]  : volume geo-replication status : FAILED : Commit 
failed on ov-no2.ariadne-t.local. Please check log file for details.
Commit failed on ov-no3.ariadne-t.local. Please check log file for details.
[2020-09-01 03:02:39.094984]  : volume geo-replication status : FAILED : Commit 
failed on ov-no2.ariadne-t.local. Please check log file for details.
Commit failed on ov-no3.ariadne-t.local. Please check log file for details.
[2020-09-01 11:18:32.510224]  : volume geo-replication status : FAILED : Commit 
failed on ov-no2.ariadne-t.local. Please check log file for details.
Commit failed on ov-no3.ariadne-t.local. Please check log file for details.
[2020-09-01 14:24:33.778942]  : volume geo-replication status : FAILED : Commit 
failed on ov-no2.ariadne-t.local. Please check log file for details.
Commit failed on ov-no3.ariadne-t.local. Please check log file for details.




[root@ov-no1 glusterfs]# cat glusterd.log | egrep "( W | E )" |more
[2020-09-01 07:00:31.326169] E [glusterd-op-sm.c:8132:glusterd_op_sm] 
(-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x23a1e) [0x7f23d8ac8a1e]
 -->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x1c1be) [0x7f23d8ac11be] 
-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x4306f) [0x7f23
d8ae806f] ) 0-management: Unable to get transaction opinfo for transaction ID 
:435d3780-aa0c-4a64-bc28-56ae394159d0
[2020-09-01 08:02:31.551563] E [glusterd-op-sm.c:8132:glusterd_op_sm] 
(-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x23a1e) [0x7f23d8ac8a1e]
 -->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x1c1be) [0x7f23d8ac11be] 
-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x4306f) [0x7f23
d8ae806f] ) 0-management: Unable to get transaction opinfo for transaction ID 
:930a8a08-1044-41cf-b921-913b982e0c72
[2020-09-01 09:04:31.786157] E [glusterd-op-sm.c:8132:glusterd_op_sm] 
(-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x23a1e) [0x7f23d8ac8a1e]
 -->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x1c1be) [0x7f23d8ac11be] 
-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x4306f) [0x7f23
d8ae806f] ) 0-management: Unable to get transaction opinfo for transaction ID 
:9942b579-5240-4fee-bb4c-78b9a1c98da8
[2020-09-01 10:06:32.014362] E [glusterd-op-sm.c:8132:glusterd_op_sm] 
(-->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x23a1e) [0x7f23d8ac8a1e]
 -->/usr/lib64/glusterfs/6.4/xlator/mgmt/glusterd.so(+0x1c1be) [0x7f23d8ac11be] 

[ovirt-users] Re: VM HostedEngine is down with error

2020-09-03 Thread souvaliotimaria
Thank you very much for your reply. 

I checked NTP and realized the service wasn't working properly on two of the
three nodes, but despite that the clocks seemed to have the correct time (both
date and hwclock). I switched to chronyd and stopped the ntpd service, and now
the servers' clocks seem to be synchronized.
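
For reference, the switch on each node was roughly the following (a sketch of
the commands, not a verbatim transcript):

systemctl stop ntpd
systemctl disable ntpd
systemctl enable --now chronyd
chronyc tracking   # confirm the clock is actually synchronised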

The BIOS time differs from the system time. Does this affect the overall
behaviour or performance?

This is what I could gather from the logs:

Node1:

[root@ov-no1 ~]# more /var/log/messages |egrep "(WARN|error)"|more
Aug 27 17:53:08 ov-no1 libvirtd: 2020-08-27 14:53:08.947+: 5613: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
Aug 27 17:58:08 ov-no1 libvirtd: 2020-08-27 14:58:08.943+: 5613: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
Aug 27 18:03:08 ov-no1 libvirtd: 2020-08-27 15:03:08.937+: 5614: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
Aug 27 18:08:08 ov-no1 libvirtd: 2020-08-27 15:08:08.951+: 5617: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
Aug 27 18:13:08 ov-no1 libvirtd: 2020-08-27 15:13:08.951+: 5616: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
Aug 27 18:18:08 ov-no1 libvirtd: 2020-08-27 15:18:08.942+: 5618: error : 
qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU gues
t agent is not connected
.
.
[Around that time Node 3, which is the arbiter node, was placed in local
Maintenance mode. It was shut down for maintenance and when we booted it up again
all seemed right. We removed it from maintenance mode and, when the healing
processes finished, Node 1 became NonResponsive. Long story short, the VDSM
agent sent Node1 a restart command. Node1 rebooted, the HostedEngine came up on
Node1 and the rest of the VMs that were hosted by Node1 had to be manually
brought up. Since then everything seemed to be working as it should. The
HostedEngine VM shut down for no apparent reason 5 days later, making us believe
there was no connection between the two incidents.]

.
.
.
.
.
Sep  1 05:53:30 ov-no1 vdsm[6706]: WARN Worker blocked:  timeout=60, 
duration=60.00 at 0x7f1ed7381dd0> t
ask#=76268 at 0x7f1ebc0797d0>, traceback:#012File: 
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap#012  
self.__bootstrap_inner()#012Fil
e: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner#012  
self.run()#012File: "/usr/lib64/python2.7/threading.py", line 765, in run
#012  self.__target(*self.__args, **self.__kwargs)#012File: 
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in 
run#012  ret =
 func(*args, **kwargs)#012File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run#012  
self._execute_task()#012File: "/usr/lib/p
ython2.7/site-packages/vdsm/executor.py", line 315, in _execute_task#012  
task()#012File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 3
91, in __call__#012  self._callable()#012File: 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in 
__call__#012  self._handler(sel
f._ctx, self._req)#012File: 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in 
_serveRequest#012  response = self._handle_request
(req, ctx)#012File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", 
line 345, in _handle_request#012  res = method(**params)#012File: "/usr
/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in 
_dynamicMethod#012  result = fn(*methodArgs)#012File: 
"/usr/lib/python2.7/site-package
s/vdsm/gluster/apiwrapper.py", line 237, in geoRepSessionList#012  
remoteUserName)#012File: 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", l
ine 93, in wrapper#012  rv = func(*args, **kwargs)#012File: 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 551, in 
volumeGeoRepSessionL
ist#012  remoteUserName,#012File: 
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in 
__call__#012  return callMethod()#012File:
 "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in 
#012  **kwargs)#012File: "", line 2, in glusterVolumeGeoRep
Status#012File: "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, 
in _callmethod#012  kind, result = conn.recv()

Sep  1 05:54:30 ov-no1 vdsm[6706]: WARN Worker blocked:  timeout=60, 
duration=120.00 at 0x7f1ed7381dd0> 
task#=76268 at 0x7f1ebc0797d0>, traceback:#012File: 
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap#012  
self.__bootstrap_inner()#012Fi
le: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner#012  
self.run()#012File: "/usr/lib64/python2.7/threading.py", line 765, in ru
n#012  self.__target(*self.__args, **self.__kwargs)#012File: 

[ovirt-users] VM HostedEngine is down with error

2020-09-01 Thread souvaliotimaria
Hello everyone, 

I have a replica 2 + arbiter installation and this morning the Hosted Engine
gave the following error on the UI and resumed on a different node (node3) than
the one it was originally running on (node1). (The original node has more memory
than the one the HE ended up on, but the latter had a better memory usage
percentage at the time.) Also, the only way I discovered that the migration had
happened and that there was an Error in Events was because I logged in to the
oVirt web interface for a routine inspection. Besides that, everything was
working properly and still is.

The error that popped is the following:

VM HostedEngine is down with error. Exit message: internal error: qemu 
unexpectedly closed the monitor: 
2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus should 
be described in NUMA config, ability to start up with partial NUMA mappings is 
obsoleted and will be removed in future
2020-09-01T06:49:20.927274Z qemu-kvm: -device 
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on:
 Failed to get "write" lock
Is another process using the image?.

From what I could gather, this concerns the following snippet from
HostedEngine.xml, and it is the virtio disk of the Hosted Engine:

[disk element from HostedEngine.xml; the XML tags were stripped by the list
archive, leaving only the disk serial d5de54b6-9f8e-4fba-819b-ebf6780757d2]

I've tried looking into the logs and the sar output, but I couldn't find
anything to relate to the above errors or to determine why this happened. Is
this a Gluster or a QEMU problem?

The Hosted Engine had been manually migrated to node1 five days before.

Is there a standard practice I could follow to determine what happened and 
secure my system?

Thank you very much for your time, 
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HBU4P4E5ECOA6BNNFVLK2Y44ZX5UHYYE/


[ovirt-users] Re: Upgrade Memory of oVirt Nodes

2020-08-19 Thread souvaliotimaria
Hello again, 

Hope everyone's ok.

I'm really sorry for being so late, but I came back with the update.

We wanted to upgrade the memory because we plan on deploying several VMs on the 
platform, mostly for services, and thought that now was a better time to do the 
upgrade than later on.

I manually migrated some of the VMs onto the other nodes, trying to keep the
physical memory usage of each node roughly equal. The memory upgrade of the
nodes went through with no problem (from 32GB to 72GB). The VMs that were active
at the time (8 VMs including the HE) didn't show any kind of downtime, slow-down
or other issue. The memory usage of the two on-line nodes at the time was around
75-80%, with the HE consuming the most.

The upgrade happened right after I had to replace a failed SAS HDD (hot-plug)
which held the mirror of the HDD with the oVirt node OS. Everything went as my
team and I had hoped, with no problems on either side.

As for the storage: the deployment we have is GlusterFS with one node as
arbiter. When the node rejoined the others, it took around 8-10 minutes for the
healing operations to complete, and since then everything has been going
perfectly well.

Thank you all for the time you took to respond to me and the valuable 
information you shared. It means a lot. 

Best regards, 
Maria
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YVZYK3OPJBUPPH7IY5GXFVQ4RORONGGM/


[ovirt-users] Upgrade Memory of oVirt Nodes

2020-05-19 Thread souvaliotimaria
Hello everyone, 

I have an oVirt 4.3.2.5 hyperconverged 3-node production environment and we want
to add some RAM to it.

Can I upgrade the RAM without my users noticing any disruptions and keep the 
VMs running?

The way I thought I should do it was to migrate any running VMs to the other
nodes, then put one node in maintenance mode, shut it down, install the new
memory, bring it back up, remove it from maintenance mode, see how the
installation reacts, and repeat for the other two nodes. Is this correct, or
should I follow another way?
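
Between nodes, my plan would be to check that all heal operations have completed
before taking the next host down, with something like the following (a sketch
using the volume names of this setup):

for v in engine data vmstore; do
    gluster volume heal "$v" info summary
done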

Will there be a problem during the period when the nodes are not identical in
their resources?

Thank you for your time,
Souvalioti Maria 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/F4E6DMLL23QU6KGMUVUNRGNR3IUYCT5W/


[ovirt-users] Messed up 4.2.3.1 installation - SSL handshake ERROR

2019-09-25 Thread souvaliotimaria
Hello, everyone!
So, I have an experimental installation of oVirt 4.2.3.1, with 3 nodes and
glustered storage. Recently I deployed a new installation with oVirt 4.3.5.2, 3
nodes and glustered storage here as well. The thing is, in my enthusiasm I
thought "hey! what if I can import the experimental nodes as hosts into the new
installation, in a new cluster, and see what happens? Will the 4.3.5.2 engine
see them? Probably yeah. But will it see the VMs I have there?"
And so I imported the experimental nodes, without detaching them from their
hosted engine. I could see the only VM that was active at the moment, but not
the suspended ones, and of course I could not see the 4.2.3.1 HE VM.
I have since removed the hosts from the new installation and tried reconnecting
the old engine and its nodes. Passwordless ssh works just fine, but the problem
persists.
hosted-engine --vm-status reports stale-data on node 2 and node 3 
The thing is, I know I messed up the experimental installation (and I blame only
my curiosity): the SSL handshake is no longer feasible and I can't remove the
hosts from the initial Cluster to import them again. Basically everything is
either stuck in the process of activating without ever managing to, or down, or
Non Responsive.
I would like to find a way around this, as I have seen in other posts on the
oVirt list that the SSL handshake error appears in some other cases as well, and
I would like to have the know-how in case a situation like this occurs in
production in the future.
Is it possible to re-deploy the engine on the nodes without losing the glustered
space or the existing VMs? Can the HE be destroyed and then deployed from
scratch? What about the glustered space and the VMs' space? Will the VMs just
take up space without me being able either to bring them up or to destroy them?
I know I'm asking a lot and it was my fault to begin with but I am really 
curious if we can see this through.
Thanks in advance
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FTJGN7TEE2Y4OEAN5IXMZT6QSBWZML2V/


[ovirt-users] VMs inaccessible from Ubuntu/Debian-based OS

2019-08-30 Thread souvaliotimaria
Hello all, 

I've been having an issue for the past couple of days. I have tried everything I
could find to solve it, but with no success.

I have a three-node oVirt installation, glustered and hyperconverged, and I have
a few VMs there. The installation is for experimental purposes before we merge
it into our DC. Anyway, although I can connect to the VMs' consoles from Fedora
and Windows, I can't from Ubuntu.

I have tried installing browser-spice-plugin, which is the package corresponding
to spice-xpi, and got no results.

I purged browser-spice-plugin and then installed the spice-client package,
downloaded spice-xpi from Fedora (FC19) (as instructed in
https://www.ovirt.org/develop/infra/testing/spice.html), copied libnsISpicec.so
to /usr/lib/mozilla/plugins and made sure that xserver-xorg-video-qxl and
spice-vdagent were installed and at their latest version, and still nothing.

No matter what I have tried, I can't gain access to the console. Whatever
browser I use, the message I get is "Unable to connect to the graphic server
/tmp/mozilla_msouval0/console-1.vv".
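
One thing I have not tried yet (a sketch only, assuming the virt-viewer package
is available on Debian/Ubuntu) is to skip the browser plugin completely and open
the downloaded .vv file directly with remote-viewer:

apt-get install virt-viewer
remote-viewer /tmp/mozilla_msouval0/console-1.vv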

I have checked the logs and couldn't find anything useful; maybe I'm not
checking the right logs?
I ran tcpdump on both the node hosting the VM and the Ubuntu machine I'm using;
I captured a few packets (6) on both sides, but on the Ubuntu side there were 15
packets received by the filter.

Could you please guide me towards the right way to solve this? Could this be an
NTP problem?

ovirt node version: 4.2.3.1
Workstation: Jessie 

(I thought maybe Jessie was too old, so I ran the same steps from Ubuntu 14.04
and Ubuntu 16.04, but the problem still remains.)

Thank you in advance for any help.
Maria
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4VMMYEJNSHGUHNRDA6AM6TNLBL4JLL6O/


[ovirt-users] Re: Hosted Engine Abruptly Stopped Responding - Unexpected Shutdown

2019-06-13 Thread souvaliotimaria
Hello and thank you very much for your reply.

I'm terribly sorry for being so late to respond. 

I thought the same, that dropping the cache was more of a workaround than a real
solution, but truthfully I was stuck and couldn't think of anything beyond how
much I need to upgrade the memory on the nodes. I have tried to find info about
other oVirt virtualization set-ups and the amount of memory allocated, so I can
get an idea of what my set-up needs. The only thing I found was that one admin
had set oVirt up with 128GB and still needed more because of the growing needs
of the system and its users, and was about to upgrade the memory too. I'm just
worried that oVirt is very memory-consuming and that no matter how much I "feed"
it, it will still ask for more. Also, I'm worried that there are one, two or
even more configuration tweaks that I am still missing and that would be able to
solve the memory problem.

Anyway, KSM is enabled. Sar shows that when a Windows 10 VM is active as well
(alongside the Hosted Engine, of course, and two Linux VMs - 1 CentOS, 1
Debian), the committed memory on the host it runs on (together with the Debian
VM) is around 89%, and it has reached up to 98%.

You are correct about the monitoring system too. I have set up a PRTG
environment and there is Nagios running, but they can't see oVirt yet. I will
set them up properly over the next few days.

I haven't made any changes to my tuned profile; it's the default from oVirt.
Specifically, the active profile is set to virtual-host.

Again, I'm very sorry it took me so long to reply, and thank you very much for
your response.

Best Regards,
Maria Souvalioti
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/G4YELWF5L4AKUT3OH4C4QJHHEEJPCI3G/


[ovirt-users] Hosted Engine Abruptly Stopped Responding - Unexpected Shutdown

2019-06-06 Thread souvaliotimaria
Hello, 

I came upon a problem last month that I figured would be good to discuss here.
I'm sorry I didn't post earlier, but time slipped away from me.

I have set up a glustered, hyperconverged oVirt environment for experimental
use, as a means to see its behaviour and get used to its management and
performance before setting it up as a production environment for use in our
organization. The environment has been up and running since October 2018. The
three nodes are HP ProLiant DL380 G7 and have the following characteristics:

Mem: 22GB
CPU: 2x Hexa Core - Intel Xeon Hexa Core E56xx
HDD: 5x 300GB
Network: BCM5709C with dual-port Gigabit
OS: Linux RedHat 7.5.1804 (Core 3.10.0-862.3.2.el7.x86_64 x86_64) - oVirt Node
4.2.3.1

As I was working on the environment, the engine stopped working.
Not long before the HE stopped, I was in the web interface managing my VMs when
the browser froze; the HE was also not responding to ICMP requests.

The first thing I did was to connect via ssh to all the nodes and run the command
#hosted-engine --vm-status
which showed that the HE was down on nodes 1 and 2 and up on the 3rd node.

After executing
#virsh -r list
the VM list that was shown contained two of the VMs I had previously created and
which were up; the HE was nowhere to be seen.

I tried to restart the HE with the
#hosted-engine --vm-start
but it didn't work.

I then put all the nodes in maintenance mode with the command
#hosted-engine --set-maintenance --mode=global
(I guess I should have done that earlier) and re-ran
#hosted-engine --vm-start
which had the same result as it did previously.

After checking the mails the system sent to the root user, I saw there were
several mails on the 3rd node (where the HE had been), informing about the HE's
state. The messages alternated between EngineDown-EngineStart,
EngineStart-EngineStarting, EngineStarting-EngineMaybeAway,
EngineMaybeAway-EngineUnexpectedlyDown, EngineUnexpectedlyDown-EngineDown,
EngineDown-EngineStart and so forth.

I continued by searching the following logs in all nodes :
/var/log/libvirt/qemu/HostedEngine.log
/var/log/libvirt/qemu/win10.log
/var/log/libvirt/qemu/DNStest.log
/var/log/vdsm/vdsm.log
/var/log/ovirt-hosted-engine-ha/agent.log

After that I spotted an error that had started appearing almost a month earlier
on node #2:
ERROR Internal server error Traceback (most recent call last): File 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in 
_handle_request res = method(**params) File 
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in 
_dynamicMethod result = fn(*methodArgs) File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 85, in 
logicalVolumeList return self._gluster.logicalVolumeList() File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper rv 
= func(*args, **kwargs) File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 808, in 
logicalVolumeList status = self.svdsmProxy.glusterLogicalVolumeList() File 
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in 
__call__ return callMethod() File 
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 52, in 
 getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, 
AttributeError: 'AutoProxy[instance]' object has no attribute 
'glusterLogicalVolumeList'


The outputs of the following commands were also checked, as a way to see whether
there was a mandatory process missing/killed, a memory problem, or even a disk
space shortage that led to the sudden death of a process:
#ps -A
#top
#free -h
#df -hT

Finally, after some time delving into the logs, the output of
#journalctl --dmesg
showed the following message:
"Out of memory: Kill process 5422 (qemu-kvm) score 514 or sacrifice child.
Killed process 5422 (qemu-kvm) total-vm:17526548kB, anon-rss:9310396kB,
file-rss:2336kB, shmem-rss:12kB"
after which the ovirtmgmt network stopped responding.

I tried to restart vhostmd by executing
#/etc/rc.d/init.d/vhostmd start
but it didn't work.

Finally, I decided to run the HE restart command on the other nodes as well
(I'd figured that since the HE was last running on node #3, that's where I
should try to restart it). So, I ran
#hosted-engine --vm-start
and the output was
"Command VM.getStats with args {'vmID':'...<the HE's ID>'} failed:
(code=1,message=Virtual machine does not exist: {'vmID':'...<the HE's
ID>'})"
I then ran the command again and the output was
"VM exists and its status is Powering Up."

After that I executed 
#virsh -r list
and the output was the following:
Id Name   State

2  HostedEngine  running

After the HE's restart, two mails came that stated:
ReinitializeFSM-EngineStarting and EngineStarting-EngineUp

After that, and after checking that we had access to the web interface again, we
executed
hosted-engine --set-maintenance --mode=none
to get out of the global maintenance mode.