Dear Strahil and Ritesh,

Thank you both. I am back where I started with:
"One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."] Regarding my most recent issue: "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n" Per Strahil's note, I checked for kvdo: [r...@host1.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [r...@host1.tld.com conf.d]# [r...@host2.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [r...@host2.tld.com conf.d]# [r...@host3.tld.com ~]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [r...@host3.tld.com ~]# I found https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-creating-vdo-kernel-module-kvdo-not-installed which pointed to https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS bug tracker I attempted to manually install vdo-support-6.2.4.14-14.el8.x86_64 vdo-6.2.4.14-14.el8.x86_64 kmod-kvdo-6.2.3.91-73.el8.x86_64 but there was a dependency that kernel-core be greater than what I was installed, so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to vdo-6.2.4.14-14.el8.x86_64.rpm kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and redeploy I am now back at Gluster deploy failing at TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. 
Upon clean-up and redeploy, I am now back at the Gluster deploy failing at:

TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67

failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
fmov1n1.sn.dtcorp.com      : ok=70   changed=29   unreachable=0   failed=1   skipped=188   rescued=0   ignored=1
fmov1n2.sn.dtcorp.com      : ok=68   changed=27   unreachable=0   failed=0   skipped=163   rescued=0   ignored=1
fmov1n3.sn.dtcorp.com      : ok=68   changed=27   unreachable=0   failed=0   skipped=163   rescued=0   ignored=1

Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more informations.

I doubled back to Strahil's recommendation to restart Gluster and enable granular-entry-heal. This fails, for example:

[root@host1 ~]# gluster volume heal data granular-entry-heal enable
One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals
Volume heal failed.
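One thing I have not yet tried is setting the option directly instead of going through the heal CLI. If the underlying option is cluster.granular-entry-heal (I am not certain that it is, or that newer Gluster releases still allow setting it this way rather than insisting on the heal command), something like this might show whether the failure is in the CLI path or in the volumes themselves:

# Hypothetical alternative: set the option per volume and read it back
for vol in engine data vmstore; do
    gluster volume set $vol cluster.granular-entry-heal enable
    gluster volume get $vol cluster.granular-entry-heal
done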
I have followed Ritesh's suggestion:

[root@host1 ~]# ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml

which appeared to execute successfully:

PLAY RECAP **********************************************************************************************************
fmov1n1.sn.dtcorp.com      : ok=11   changed=2   unreachable=0   failed=0   skipped=2   rescued=0   ignored=0
fmov1n2.sn.dtcorp.com      : ok=9    changed=1   unreachable=0   failed=0   skipped=1   rescued=0   ignored=0
fmov1n3.sn.dtcorp.com      : ok=9    changed=1   unreachable=0   failed=0   skipped=1   rescued=0   ignored=0

Here is the info Strahil requested when I first reported this issue on December 18th, re-run today, January 11:

[root@host1 ~]# gluster pool list
UUID                                    Hostname            State
4964020a-9632-43eb-9468-798920e98559    host2.domain.com    Connected
f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b    host3.domain.com    Connected
6ba94e82-579c-4ae2-b3c5-bef339c6f795    localhost           Connected

[root@host1 ~]# gluster volume list
data
engine
vmstore

[root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################"; done

Status of volume: data
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/data/data          49153     0          Y       406272
Brick host2.domain.com:/gluster_bricks/data/data          49153     0          Y       360300
Brick host3.domain.com:/gluster_bricks/data/data          49153     0          Y       360082
Self-heal Daemon on localhost                             N/A       N/A        Y       413227
Self-heal Daemon on host2.domain.com                      N/A       N/A        Y       360223
Self-heal Daemon on host3.domain.com                      N/A       N/A        Y       360003

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

Volume Name: data
Type: Replicate
Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/data/data
Brick2: host2.domain.com:/gluster_bricks/data/data
Brick3: host3.domain.com:/gluster_bricks/data/data
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################

Status of volume: engine
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/engine/engine      49152     0          Y       404563
Brick host2.domain.com:/gluster_bricks/engine/engine      49152     0          Y       360202
Brick host3.domain.com:/gluster_bricks/engine/engine      49152     0          Y       359982
Self-heal Daemon on localhost                             N/A       N/A        Y       413227
Self-heal Daemon on host3.domain.com                      N/A       N/A        Y       360003
Self-heal Daemon on host2.domain.com                      N/A       N/A        Y       360223

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

Volume Name: engine
Type: Replicate
Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/engine/engine
Brick2: host2.domain.com:/gluster_bricks/engine/engine
Brick3: host3.domain.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################

Status of volume: vmstore
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/vmstore/vmstore    49154     0          Y       407952
Brick host2.domain.com:/gluster_bricks/vmstore/vmstore    49154     0          Y       360389
Brick host3.domain.com:/gluster_bricks/vmstore/vmstore    49154     0          Y       360176
Self-heal Daemon on localhost                             N/A       N/A        Y       413227
Self-heal Daemon on host2.domain.com                      N/A       N/A        Y       360223
Self-heal Daemon on host3.domain.com                      N/A       N/A        Y       360003

Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks

Volume Name: vmstore
Type: Replicate
Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################
[root@host1 ~]#

Again, further suggestions for troubleshooting are VERY much appreciated!
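In the meantime, here is what I plan to capture next from each node, in case it points at anything. The log file names are from memory, so the exact paths may differ slightly on oVirt Node:

# Per-volume heal state as each node sees it
for vol in engine data vmstore; do
    gluster volume heal $vol info summary
done

# Peer view from this node
gluster peer status

# Recent self-heal daemon and heal-CLI activity
tail -n 50 /var/log/glusterfs/glustershd.log
tail -n 50 /var/log/glusterfs/glfsheal-engine.log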
Respectfully,
Charles