We ran into this issue as well when trying to install oVirt Hyperconverged.

The root issue is that kmod-kvdo in CentOS 8 (and probably upstream) is built for one specific kernel, and if you don't run that kernel the module is not found. This is a major issue: even if you match the kernel version, your volume will fail as soon as the kernel is updated, because a kmod-kvdo RPM would have to be built for that new kernel. The package doesn't even declare an RPM dependency on the kernel version it works with.
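
A quick way to see whether a host is affected is to look for a kvdo module file under the module directory of the running kernel. This is only a minimal sketch: has_module is a made-up helper name, and it assumes modules live under /lib/modules/<kernel release>, as in the modprobe error quoted further down.

```shell
#!/bin/sh
# has_module MODULES_ROOT KERNEL_RELEASE MODULE_NAME
# Succeeds if a file named MODULE_NAME.ko (optionally compressed, e.g. .ko.xz)
# exists anywhere below MODULES_ROOT/KERNEL_RELEASE.
has_module() {
    dir="$1/$2"
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -name "$3.ko*" -print 2>/dev/null | head -n 1)" ]
}

# Check the running kernel.
if has_module /lib/modules "$(uname -r)" kvdo; then
    echo "kvdo module found for $(uname -r)"
else
    echo "kvdo module missing for $(uname -r)"
fi
```

If this reports the module as missing right after a kernel update, the installed kmod-kvdo package was most likely built for the previous kernel.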

I cloned the git repo from https://github.com/dm-vdo/kvdo and built an RPM from it. That package uses DKMS, so it builds a module for the running kernel, and when the kernel is updated a new module is built for that version. It has worked like a charm every time, but I haven't yet tried to run the hyperconverged wizard again.
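
With DKMS in place, one way to confirm that a module really was built for the running kernel is to filter the output of `dkms status`. Treat this as a sketch under assumptions: dkms_built_for is a hypothetical helper, and it assumes each status line starts with the module name, lists the kernel release as a comma-separated field, and ends in ": installed". That matches common DKMS versions but may differ on yours.

```shell
#!/bin/sh
# dkms_built_for MODULE KERNEL_RELEASE
# Reads `dkms status` text on stdin and succeeds if some line for MODULE
# names KERNEL_RELEASE as a field and is marked as installed.
dkms_built_for() {
    awk -v m="$1" -v k="$2" '
        index($0, m) == 1 && index($0, ", " k ",") && /: installed/ { found = 1 }
        END { exit !found }'
}

# Only run the real check when dkms is available on this host.
if command -v dkms >/dev/null 2>&1; then
    if dkms status | dkms_built_for kvdo "$(uname -r)"; then
        echo "kvdo is built for $(uname -r)"
    else
        echo "kvdo is NOT built for $(uname -r)"
    fi
fi
```

Matching the kernel release as a whole field avoids a false positive where, for example, 4.18.0-240 would otherwise match a line for 4.18.0-240.1.1.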


On 2021-01-11 at 21:32, Charles Lam wrote:
Dear Strahil and Ritesh,

Thank you both.  I am back where I started with:

"One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing 
any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please 
execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal 

Regarding my most recent issue:

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: 
kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

Per Strahil's note, I checked for kvdo:

[r...@host1.tld.com conf.d]# rpm -qa | grep vdo
[r...@host1.tld.com conf.d]#

[r...@host2.tld.com conf.d]# rpm -qa | grep vdo
[r...@host2.tld.com conf.d]#

[r...@host3.tld.com ~]# rpm -qa | grep vdo
[r...@host3.tld.com ~]#

I found 
 which pointed to https://bugs.centos.org/view.php?id=17928.  As suggested on 
the CentOS bug tracker I attempted to manually install


but there was a dependency that kernel-core be newer than what I had 
installed, so I manually upgraded kernel-core to 
kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to


and installed vdo-support-  Upon clean-up and 
redeploy I am now back at Gluster deploy failing at

TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: 
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", 
"end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], 
"stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": 
"2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or 
more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", 
"end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], 
"stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
fmov1n1.sn.dtcorp.com      : ok=70   changed=29   unreachable=0    failed=1    
skipped=188  rescued=0    ignored=1
fmov1n2.sn.dtcorp.com      : ok=68   changed=27   unreachable=0    failed=0    
skipped=163  rescued=0    ignored=1
fmov1n3.sn.dtcorp.com      : ok=68   changed=27   unreachable=0    failed=0    
skipped=163  rescued=0    ignored=1
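
To pick the failing hosts out of a long deployment log, the PLAY RECAP lines can be filtered on their failed= counter. The snippet below is a rough sketch; failed_hosts is a hypothetical helper name, and it assumes the host name is the first field on the same line as the failed= counter, as in the recap above.

```shell
#!/bin/sh
# failed_hosts
# Reads Ansible recap text on stdin and prints the first field (the host)
# of every line whose failed= counter is greater than zero.
failed_hosts() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^failed=[0-9]+$/) {
                split($i, kv, "=")
                if (kv[2] > 0) print $1
            }
    }'
}

# Example: scan the deployment log if it exists on this host.
log=/var/log/cockpit/ovirt-dashboard/gluster-deployment.log
if [ -r "$log" ]; then
    failed_hosts < "$log"
fi
```

Applied to the recap above, this would list only fmov1n1.sn.dtcorp.com, the host on which the heal command was executed.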

Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more 

I doubled back to Strahil's recommendation to restart Gluster and enable 
granular-entry-heal.  This fails, example:

[root@host1 ~]# gluster volume heal data granular-entry-heal enable
One or more bricks could be down. Please execute the command again after 
bringing all bricks online and finishing any pending heals
Volume heal failed.
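
Since the message asks for pending heals to finish first, it can help to count them before retrying. The following is a minimal sketch: pending_heals is a made-up helper, and it assumes `gluster volume heal <volume> info` prints one "Number of entries: N" line per brick. A non-numeric value, which may appear for an unreachable brick, is counted as zero here.

```shell
#!/bin/sh
# pending_heals
# Reads `gluster volume heal VOLUME info` text on stdin and prints the sum
# of all "Number of entries:" counters.
pending_heals() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Report pending heals for the three volumes used in this deployment.
if command -v gluster >/dev/null 2>&1; then
    for vol in engine data vmstore; do
        echo "$vol: $(gluster volume heal "$vol" info | pending_heals) entries pending"
    done
fi
```

A result of 0 for every volume would suggest that pending heals are not what is blocking granular-entry-heal, which leaves brick connectivity as the thing to look at.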

I have followed Ritesh's suggestion:

[root@host1~]# ansible-playbook 
 -i /etc/ansible/hc_wizard_inventory.yml

which appeared to execute successfully:

fmov1n1.sn.dtcorp.com      : ok=11   changed=2    unreachable=0    failed=0    
skipped=2    rescued=0    ignored=0
fmov1n2.sn.dtcorp.com      : ok=9    changed=1    unreachable=0    failed=0    
skipped=1    rescued=0    ignored=0
fmov1n3.sn.dtcorp.com      : ok=9    changed=1    unreachable=0    failed=0    
skipped=1    rescued=0    ignored=0

Here is the info Strahil requested when I first reported this issue on December 
18th, re-run today, January 11:

[root@host1 ~]# gluster pool list
UUID                                    Hostname                State
4964020a-9632-43eb-9468-798920e98559    host2.domain.com   Connected
f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b    host3.domain.com   Connected
6ba94e82-579c-4ae2-b3c5-bef339c6f795    localhost               Connected
[root@host1 ~]# gluster volume list
[root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster 
volume info $i; echo 
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick host1.domain.com:/gluster_bricks
/data/data                                  49153     0          Y       406272
Brick host2.domain.com:/gluster_bricks
/data/data                                  49153     0          Y       360300
Brick host3.domain.com:/gluster_bricks
/data/data                                  49153     0          Y       360082
Self-heal Daemon on localhost               N/A       N/A        Y       413227
Self-heal Daemon on host2.domain.com   N/A       N/A        Y       360223
Self-heal Daemon on host3.domain.com   N/A       N/A        Y       360003

Task Status of Volume data
There are no active volume tasks

Volume Name: data
Type: Replicate
Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: host1.domain.com:/gluster_bricks/data/data
Brick2: host2.domain.com:/gluster_bricks/data/data
Brick3: host3.domain.com:/gluster_bricks/data/data
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick host1.domain.com:/gluster_bricks
/engine/engine                              49152     0          Y       404563
Brick host2.domain.com:/gluster_bricks
/engine/engine                              49152     0          Y       360202
Brick host3.domain.com:/gluster_bricks
/engine/engine                              49152     0          Y       359982
Self-heal Daemon on localhost               N/A       N/A        Y       413227
Self-heal Daemon on host3.domain.com   N/A       N/A        Y       360003
Self-heal Daemon on host2.domain.com   N/A       N/A        Y       360223

Task Status of Volume engine
There are no active volume tasks

Volume Name: engine
Type: Replicate
Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: host1.domain.com:/gluster_bricks/engine/engine
Brick2: host2.domain.com:/gluster_bricks/engine/engine
Brick3: host3.domain.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick host1.domain.com:/gluster_bricks
/vmstore/vmstore                            49154     0          Y       407952
Brick host2.domain.com:/gluster_bricks
/vmstore/vmstore                            49154     0          Y       360389
Brick host3.domain.com:/gluster_bricks
/vmstore/vmstore                            49154     0          Y       360176
Self-heal Daemon on localhost               N/A       N/A        Y       413227
Self-heal Daemon on host2.domain.com   N/A       N/A        Y       360223
Self-heal Daemon on host3.domain.com   N/A       N/A        Y       360003

Task Status of Volume vmstore
There are no active volume tasks

Volume Name: vmstore
Type: Replicate
Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
[root@host1 ~]#
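
The status output above can also be checked mechanically rather than by eye. This is a sketch only: count_offline is a hypothetical helper, and it assumes the layout shown above, where the Online flag (Y or N) is the second-to-last column of each brick or self-heal daemon row.

```shell
#!/bin/sh
# count_offline
# Reads `gluster volume status VOLUME` text on stdin and prints how many
# process rows have "N" in the Online column (second-to-last field).
count_offline() {
    awk 'NF >= 4 && $(NF-1) == "N" { n++ } END { print n + 0 }'
}

# Report offline processes for every volume known to this node.
if command -v gluster >/dev/null 2>&1; then
    for vol in $(gluster volume list); do
        echo "$vol: $(gluster volume status "$vol" | count_offline) offline"
    done
fi
```

For the output shown above every row reads Y, so the count would be 0 for all three volumes even though the heal command reports that bricks could be down.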

Again, further suggestions for troubleshooting are VERY much appreciated!

Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
List Archives: 