I would recommend also asking about this on the glusterfs users mailing list.

On Wed, Jul 7, 2021 at 9:09 PM Jiří Sléžka <[email protected]> wrote:
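For anyone hitting the same symptom: the posix health-check periodically probes the brick filesystem, and when a probe fails the brick process takes itself down (the "health-check failed, going down" / "still alive! -> SIGTERM" pair in the logs below). A few diagnostic and recovery commands, as a sketch only — the volume name `vms` is taken from the report, and the tunable name should be checked against the GlusterFS version in use:

```shell
# Check which bricks are currently online (Online/PID columns).
gluster volume status vms

# The posix health-check interval is tunable (in seconds; 0 disables
# the check entirely, which only hides the underlying I/O stall).
gluster volume get vms storage.health-check-interval
gluster volume set vms storage.health-check-interval 30

# Bring a downed brick back up and watch the self-heal queue drain.
gluster volume start vms force
gluster volume heal vms info
```

Correlating the failure timestamps with `dmesg`, SMART data, and storage latency on the affected host is usually the next step, since the health-check is typically reacting to a slow or stalled brick filesystem rather than failing on its own.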
> Hello,
>
> I have a 3 node HCI cluster with oVirt 4.4.6 and CentOS 8.
>
> From time to time (I believe) a random brick on a random host goes down
> because of the health-check. It looks like this:
>
> [root@ovirt-hci02 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408184] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408407] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.518971] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.519200] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
>
> and on another host:
>
> [root@ovirt-hci01 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983327] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983728] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: still alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769129] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769819] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
>
> I cannot link these errors to any storage/fs issue (in dmesg or
> /var/log/messages); the brick devices look healthy (smartd).
>
> I can force start a brick with
>
> gluster volume start vms|engine force
>
> and after some healing everything works fine for a few days.
>
> Did anybody observe this behavior?
>
> The vms volume has this structure (two bricks per host, each a separate
> JBOD SSD disk); the engine volume has one brick on each host...
>
> gluster volume info vms
>
> Volume Name: vms
> Type: Distributed-Replicate
> Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.4.11:/gluster_bricks/vms/vms
> Brick2: 10.0.4.13:/gluster_bricks/vms/vms
> Brick3: 10.0.4.12:/gluster_bricks/vms/vms
> Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2
> Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2
> Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2
> Options Reconfigured:
> cluster.granular-entry-heal: enable
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: on
> performance.client-io-threads: off
>
> Cheers,
>
> Jiri
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/BPXG53NG34QKCABYJ35UYIWPNNWTKXW4/

--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
[email protected] <https://www.redhat.com/>

*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

