Re: [Gluster-users] split-brain errors under heavy load when one brick down

2019-09-18 Thread Erik Jacobson
Thank you for replying!

> Okay so 0-cm_shared-replicate-1 means these 3 bricks:
>
> Brick4: 172.23.0.6:/data/brick_cm_shared
> Brick5: 172.23.0.7:/data/brick_cm_shared
> Brick6: 172.23.0.8:/data/brick_cm_shared

The above is correct.

> Were there any pending self-heals for this volume? Is it
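
For readers following the thread, a minimal sketch of how pending self-heals on a replica set such as 0-cm_shared-replicate-1 can be checked with the standard gluster CLI; the volume name cm_shared is taken from the log prefix quoted elsewhere in this thread:

    # List entries with pending heals on the volume (run on any trusted-pool node).
    gluster volume heal cm_shared info

    # Show only the entries gluster has flagged as split-brain.
    gluster volume heal cm_shared info split-brain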

Re: [Gluster-users] split-brain errors under heavy load when one brick down

2019-09-17 Thread Ravishankar N
On 16/09/19 7:34 pm, Erik Jacobson wrote:

> Example errors:
>
> ex1
> [2019-09-06 18:26:42.665050] E [MSGID: 108008] [afr-read-txn.c:123:afr_read_txn_refresh_done] 0-cm_shared-replicate-1: Failing ACCESS on gfid ee3f5646-9368-4151-92a3-5b8e7db1fbf9: split-brain observed. [Input/output error]

Okay
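
For reference, a gfid reported this way can be passed directly to the heal command's split-brain resolution modes; a minimal sketch, assuming the latest-mtime policy is the right choice here (bigger-file and source-brick are the other policies):

    # Resolve the reported entry in favour of the replica with the newest mtime.
    # The gfid:<GFID> form may be used in place of a path from the volume root.
    gluster volume heal cm_shared split-brain latest-mtime \
        gfid:ee3f5646-9368-4151-92a3-5b8e7db1fbf9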

[Gluster-users] split-brain errors under heavy load when one brick down

2019-09-16 Thread Erik Jacobson
Hello all. I'm new to the list but not to gluster. We are using gluster to serve NFS boot on a top500 cluster. It is a 3x9 Distributed-Replicate volume. We are having a problem: when one server in a subvolume goes down, we get random missing files and split-brain errors in the nfs.log file. We
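
A minimal sketch of how the volume layout and brick health described above can be confirmed, assuming the volume is named cm_shared as the log messages later in this thread suggest:

    # Show the distribute/replica shape ("Number of Bricks: <distribute> x <replica> = <total>").
    gluster volume info cm_shared

    # Show which brick processes are online; a downed server's bricks show as offline here.
    gluster volume status cm_shared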