I would recommend also asking about this on the glusterfs-users mailing list.
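In the meantime it may be worth looking at the posix health-check tunables on the affected volumes. This is only a hedged sketch: the option names below (`storage.health-check-interval`, `storage.health-check-timeout`) exist in recent GlusterFS releases, but verify them against the release you are actually running, and treat a longer timeout as a workaround for transient disk stalls, not a fix.

```shell
# Inspect the current health-check settings (option names assume a
# recent GlusterFS release; cross-check with `gluster volume get vms all`).
gluster volume get vms storage.health-check-interval
gluster volume get vms storage.health-check-timeout

# If the SSDs only stall transiently (firmware GC, controller resets),
# raising the timeout may stop the brick process from being killed on a
# short hiccup -- workaround only, uncomment to apply:
# gluster volume set vms storage.health-check-timeout 30
```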

On Wed, Jul 7, 2021 at 21:09 Jiří Sléžka <[email protected]>
wrote:

> Hello,
>
> I have 3 node HCI cluster with oVirt 4.4.6 and CentOS8.
>
> From time to time (I believe) a random brick on a random host goes down
> because of a failed health-check. It looks like this:
>
> [root@ovirt-hci02 ~]# grep "posix_health_check"
> /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07
> 07:13:37.408184] M [MSGID: 113075]
> [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix:
> health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07
> 07:13:37.408407] M [MSGID: 113075]
> [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still
> alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07
> 16:11:14.518971] M [MSGID: 113075]
> [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix:
> health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07
> 16:11:14.519200] M [MSGID: 113075]
> [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still
> alive! -> SIGTERM
>
> and on another host:
>
> [root@ovirt-hci01 ~]# grep "posix_health_check"
> /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05
> 13:15:51.983327] M [MSGID: 113075]
> [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix:
> health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05
> 13:15:51.983728] M [MSGID: 113075]
> [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix:
> still alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05
> 01:53:35.769129] M [MSGID: 113075]
> [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix:
> health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05
> 01:53:35.769819] M [MSGID: 113075]
> [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still
> alive! -> SIGTERM
>
> I cannot link these errors to any storage/fs issue (in dmesg or
> /var/log/messages), and the brick devices look healthy (smartd).
>
> I can force-start the brick with
>
> gluster volume start vms|engine force
>
> and after some healing everything works fine again for a few days.
>
> Has anybody else observed this behavior?
>
> The vms volume has this structure (two bricks per host, each a separate
> JBOD SSD disk); the engine volume has one brick on each host...
>
> gluster volume info vms
>
> Volume Name: vms
> Type: Distributed-Replicate
> Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.4.11:/gluster_bricks/vms/vms
> Brick2: 10.0.4.13:/gluster_bricks/vms/vms
> Brick3: 10.0.4.12:/gluster_bricks/vms/vms
> Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2
> Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2
> Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2
> Options Reconfigured:
> cluster.granular-entry-heal: enable
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: on
> performance.client-io-threads: off
>
> Cheers,
>
> Jiri
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/[email protected]/message/BPXG53NG34QKCABYJ35UYIWPNNWTKXW4/
>
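Extending the grep from the quoted message, a small helper can make recurring offenders stand out by counting health-check failures per brick log. A minimal sketch, assuming the default oVirt HCI log layout under /var/log/glusterfs/bricks/ (adjust the path if yours differs):

```shell
#!/bin/sh
# Count posix health-check failures (the MSGID 113075 lines quoted above)
# in a single brick log file.
count_hc_failures() {
    grep -c 'health-check failed' "$1"
}

# Print "<count> <logfile>" for every brick log that has at least one hit,
# so bricks that fail repeatedly are easy to spot.
for log in /var/log/glusterfs/bricks/*.log; do
    [ -e "$log" ] || continue
    n=$(count_hc_failures "$log") || true
    [ "$n" -gt 0 ] && echo "$n $log"
done
```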


-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA <https://www.redhat.com/>

[email protected]

*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*