Hi Gluster Community

We run a two-node Proxmox VE (PVE) cluster. Each node has 4 HDDs, on top of 
which we run a replicated GlusterFS volume so that we can live-migrate VMs.

A few days ago, some VM disk files on the GlusterFS volume ended up in a 
split-brain condition. We were able to save the corresponding log files and 
resolve the split-brain, but we don't know how it happened. The GlusterFS log 
files are included below.

Maybe one of you can tell us what caused the problem.

Here is the network setup of the PVE cluster:

192.168.231.0/24 --> Server LAN (PVE GUI reachable on port 8006)
10.10.11.0/24    --> Cluster HA LAN
10.10.12.0/24    --> GlusterFS storage LAN

GlusterFS storage LAN addresses:
.) PVEServer1 - 10.10.12.31
.) PVEServer2 - 10.10.12.32
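
The addressing plan above can be written down as a small Python sketch. It 
assumes only the subnets and addresses listed in this mail; the names NETWORKS 
and network_of are illustrative and not part of any PVE or Gluster tooling.

```python
import ipaddress

# Subnets exactly as listed in the network setup above.
NETWORKS = {
    "server_lan": ipaddress.ip_network("192.168.231.0/24"),
    "cluster_ha_lan": ipaddress.ip_network("10.10.11.0/24"),
    "gluster_storage_lan": ipaddress.ip_network("10.10.12.0/24"),
}


def network_of(address):
    """Return the name of the listed subnet containing address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return None


# Both Gluster peers should resolve to the storage LAN.
for host, addr in (("PVEServer1", "10.10.12.31"), ("PVEServer2", "10.10.12.32")):
    print(host, addr, network_of(addr))
```

This is only a sanity check that the addresses appearing in the logs 
(10.10.12.31 and 10.10.12.32) belong to the storage LAN and not to the HA or 
server LAN.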

This is what we see in the mnt-pve-GlusterVol01.log file:
Server 1:
[2019-05-13 04:25:01.509716] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 
0-glusterfsd: Fetching the volume file from server...

[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv 
on 10.10.12.31:24007 failed (No data available)

[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 
0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data 
available)

[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 
0-glusterfsd-mgmt: Exhausted all volfile servers

[2019-05-13 09:47:50.926948] W [glusterfsd.c:1327:cleanup_and_exit] 
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7fe58a1eb494] 
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55a8728115e5] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55a872811444] ) 0-: received 
signum (15), shutting down

[2019-05-13 09:47:50.926977] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting 
'/mnt/pve/GlusterVol01'.

[2019-05-13 09:47:50.950381] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: 
unmounting /mnt/pve/GlusterVol01

[2019-05-13 09:49:43.823117] I [MSGID: 100030] [glusterfsd.c:2454:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: 
/usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 
/mnt/pve/GlusterVol01)

[2019-05-13 09:49:43.828117] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1

[2019-05-13 09:49:43.869885] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 
0-vol0-replicate-0: quorum-type none overriding quorum-count 1

[2019-05-13 09:49:43.871644] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 2

[2019-05-13 09:49:43.880208] I [MSGID: 114020] [client.c:2356:notify] 
0-vol0-client-0: parent translators are ready, attempting connect on transport

[2019-05-13 09:49:43.880609] I [MSGID: 114020] [client.c:2356:notify] 
0-vol0-client-1: parent translators are ready, attempting connect on transport

[2019-05-13 09:49:43.880816] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 
0-vol0-client-0: changing port to 49155 (from 0)

Final graph:
+------------------------------------------------------------------------------+
  1: volume vol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 5
  4:     option remote-host pvetau01-storage
  5:     option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401
  9:     option password cef1b5f5-b16c-4a3c-b49f-f814901a3252
 10:     option filter-O_DIRECT enable
 11:     option send-gids true
 12: end-volume
 13:
 14: volume vol0-client-1
 15:     type protocol/client
 16:     option ping-timeout 5
 17:     option remote-host pvetau02-storage
 18:     option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0
 19:     option transport-type socket
 20:     option transport.address-family inet
 21:     option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401
 22:     option password cef1b5f5-b16c-4a3c-b49f-f814901a3252
 23:     option filter-O_DIRECT enable
 24:     option send-gids true
 25: end-volume
 26:
 27: volume vol0-replicate-0
 28:     type cluster/replicate
 29:     option eager-lock enable
 30:     option quorum-count 1
 31:     subvolumes vol0-client-0 vol0-client-1
 32: end-volume
 33:
 34: volume vol0-dht
 35:     type cluster/distribute
 36:     option lock-migration off
 37:     subvolumes vol0-replicate-0
 38: end-volume
 39:
 40: volume vol0-write-behind
 41:     type performance/write-behind
 42:     subvolumes vol0-dht
 43: end-volume
 44:
 45: volume vol0-readdir-ahead
 46:     type performance/readdir-ahead
 47:     subvolumes vol0-write-behind
 48: end-volume
 49:
 50: volume vol0-open-behind
 51:     type performance/open-behind
 52:     subvolumes vol0-readdir-ahead
 53: end-volume
 54:
 55: volume vol0
 56:     type debug/io-stats
 57:     option log-level INFO
 58:     option latency-measurement off
 59:     option count-fop-hits off
 60:     subvolumes vol0-open-behind
 61: end-volume
 62:
 63: volume meta-autoload
 64:     type meta
 65:     subvolumes vol0
 66: end-volume
 67:
+------------------------------------------------------------------------------+

[2019-05-13 09:49:43.881243] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 09:49:43.881434] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 
0-vol0-client-1: changing port to 49154 (from 0)

[2019-05-13 09:49:43.881906] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 09:49:43.882213] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to 
vol0-client-1, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 09:49:43.882222] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 09:49:43.882249] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 
0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online.

[2019-05-13 09:49:43.882360] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk 
version = 1

[2019-05-13 09:49:43.886625] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to 
vol0-client-0, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 09:49:43.886633] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 09:49:43.890995] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk 
version = 1

[2019-05-13 09:49:43.891049] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26

[2019-05-13 09:49:43.891067] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: 
switched to graph 0

[2019-05-13 09:49:43.891625] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-0

[2019-05-13 10:20:38.998246] C 
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-1: server 
10.10.12.32:49154 has not responded in the last 5 seconds, disconnecting.

[2019-05-13 10:20:38.998657] E [rpc-clnt.c:365:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] 
))))) 0-vol0-client-1: forced unwinding frame type(GlusterFS 3.3) 
op(LOOKUP(27)) called at 2019-05-13 10:20:33.237111 (xid=0x492)

[2019-05-13 10:20:38.998681] W [MSGID: 114031] 
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-1: remote operation 
failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is 
not connected]

[2019-05-13 10:20:38.998829] E [rpc-clnt.c:365:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] 
))))) 0-vol0-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called 
at 2019-05-13 10:20:33.237115 (xid=0x493)

[2019-05-13 10:20:38.998843] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 
0-vol0-client-1: socket disconnected

[2019-05-13 10:20:38.998854] I [MSGID: 114018] 
[client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from 
vol0-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available

[2019-05-13 10:20:43.355917] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-0

[2019-05-13 10:21:20.850030] E [socket.c:2309:socket_connect_finish] 
0-vol0-client-1: connection to 10.10.12.32:24007 failed (No route to host)

[2019-05-13 10:22:07.026615] E [MSGID: 114058] 
[client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-1: failed to 
get the port number for remote subvolume. Please run 'gluster volume status' on 
server to see if brick process is running.

[2019-05-13 10:22:07.026663] I [MSGID: 114018] 
[client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from 
vol0-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available

[2019-05-13 10:22:10.010421] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 
0-vol0-client-1: changing port to 49154 (from 0)

[2019-05-13 10:22:10.011105] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 10:22:10.011558] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to 
vol0-client-1, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 10:22:10.011609] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 10:22:10.011622] I [MSGID: 114042] 
[client-handshake.c:1054:client_post_handshake] 0-vol0-client-1: 2 fds open - 
Delaying child_up until they are re-opened

[2019-05-13 10:22:10.032258] I [MSGID: 114041] 
[client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-1: last fd 
open'd/lock-self-heal'd - notifying CHILD-UP

[2019-05-13 10:22:10.032492] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk 
version = 1

[2019-05-13 10:22:13.790586] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-0

[2019-05-13 11:12:57.300347] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 11:12:57.305284] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)

[2019-05-13 11:12:57.305712] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 11:12:57.306277] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 11:12:57.306938] I [MSGID: 114024] 
[client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-0: 
/images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying 
duplicate remote fd set.

[2019-05-13 11:12:57.306973] I [MSGID: 114024] 
[client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-1: 
/images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying 
duplicate remote fd set.

[2019-05-13 11:12:57.310052] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 2698: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f69d1cba184 (Input/output error)

[2019-05-13 11:12:57.310137] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 2697: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f69d1cba184 (Input/output error)

[2019-05-13 11:12:57.311543] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 2699: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f69d1cba184 (Input/output error)

The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 
0-vol0-replicate-0: Failing FGETXATTR on gfid 
5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output 
error]" repeated 2 times between [2019-05-13 11:12:57.305712] and [2019-05-13 
11:12:57.310816]

The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 
0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between 
[2019-05-13 11:12:57.306277] and [2019-05-13 11:12:57.311184]

The message "W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 
0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for 
gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)" repeated 6 
times between [2019-05-13 11:12:57.305284] and [2019-05-13 11:12:57.311274]

The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 
0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: 
split-brain observed. [Input/output error]" repeated 5 times between 
[2019-05-13 11:12:57.300347] and [2019-05-13 11:12:57.311531]



Server 2:
[2019-05-13 04:25:01.338790] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 
0-glusterfsd: Fetching the volume file from server...

[2019-05-13 09:47:59.443328] E [socket.c:2309:socket_connect_finish] 
0-glusterfs: connection to 10.10.12.31:24007 failed (Connection refused)

[2019-05-13 09:48:17.426580] C 
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-0: server 
10.10.12.31:49155 has not responded in the last 5 seconds, disconnecting.

[2019-05-13 09:48:17.426872] E [rpc-clnt.c:365:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] 
))))) 0-vol0-client-0: forced unwinding frame type(GlusterFS 3.3) 
op(LOOKUP(27)) called at 2019-05-13 09:48:12.180579 (xid=0x5663a4)

[2019-05-13 09:48:17.426899] W [MSGID: 114031] 
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-0: remote operation 
failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is 
not connected]

[2019-05-13 09:48:17.427056] E [rpc-clnt.c:365:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9]
 (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] 
))))) 0-vol0-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called 
at 2019-05-13 09:48:12.180591 (xid=0x5663a5)

[2019-05-13 09:48:17.427067] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 
0-vol0-client-0: socket disconnected

[2019-05-13 09:48:17.427077] I [MSGID: 114018] 
[client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from 
vol0-client-0. Client process will keep trying to connect to glusterd until 
brick's port is available

[2019-05-13 09:48:21.479100] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-1

[2019-05-13 09:48:59.219302] E [socket.c:2309:socket_connect_finish] 
0-vol0-client-0: connection to 10.10.12.31:24007 failed (No route to host)

[2019-05-13 09:49:41.468469] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing

[2019-05-13 09:49:42.505174] E [MSGID: 114058] 
[client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-0: failed to 
get the port number for remote subvolume. Please run 'gluster volume status' on 
server to see if brick process is running.

[2019-05-13 09:49:42.505225] I [MSGID: 114018] 
[client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from 
vol0-client-0. Client process will keep trying to connect to glusterd until 
brick's port is available

[2019-05-13 09:49:45.442003] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 
0-vol0-client-0: changing port to 49155 (from 0)

[2019-05-13 09:49:45.442523] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 09:49:45.442802] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to 
vol0-client-0, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 09:49:45.442812] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 09:49:45.442820] I [MSGID: 114042] 
[client-handshake.c:1054:client_post_handshake] 0-vol0-client-0: 2 fds open - 
Delaying child_up until they are re-opened

[2019-05-13 09:49:45.443244] I [MSGID: 114041] 
[client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-0: last fd 
open'd/lock-self-heal'd - notifying CHILD-UP

[2019-05-13 09:49:45.443353] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk 
version = 1

[2019-05-13 09:49:49.622255] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-1

[2019-05-13 10:20:06.060045] W [glusterfsd.c:1327:cleanup_and_exit] 
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7efebc254494] 
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55dba7a3b5e5] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55dba7a3b444] ) 0-: received 
signum (15), shutting down

[2019-05-13 10:20:06.068969] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting 
'/mnt/pve/GlusterVol01'.

[2019-05-13 10:20:06.103235] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: 
unmounting /mnt/pve/GlusterVol01

[2019-05-13 10:22:08.842734] I [MSGID: 100030] [glusterfsd.c:2454:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: 
/usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 
/mnt/pve/GlusterVol01)

[2019-05-13 10:22:08.853935] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1

[2019-05-13 10:22:08.944855] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 
0-vol0-replicate-0: quorum-type none overriding quorum-count 1

[2019-05-13 10:22:08.946502] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 2

[2019-05-13 10:22:08.972020] I [MSGID: 114020] [client.c:2356:notify] 
0-vol0-client-0: parent translators are ready, attempting connect on transport

[2019-05-13 10:22:08.972395] I [MSGID: 114020] [client.c:2356:notify] 
0-vol0-client-1: parent translators are ready, attempting connect on transport


[2019-05-13 10:22:08.972832] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 
0-vol0-client-0: changing port to 49155 (from 0)

[2019-05-13 10:22:08.973142] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 10:22:08.973231] I [MSGID: 114057] 
[client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2019-05-13 10:22:08.973544] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to 
vol0-client-1, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 10:22:08.973544] I [MSGID: 114046] 
[client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to 
vol0-client-0, attached to remote volume 
'/var/lib/glusterfs/data01/brick1/vol0'.

[2019-05-13 10:22:08.973566] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 10:22:08.973567] I [MSGID: 114047] 
[client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and 
Client lk-version numbers are not same, reopening the fds

[2019-05-13 10:22:08.973616] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 
0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online.

[2019-05-13 10:22:08.973639] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk 
version = 1

[2019-05-13 10:22:08.977940] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk 
version = 1

[2019-05-13 10:22:08.978055] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26

[2019-05-13 10:22:08.978075] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: 
switched to graph 0

[2019-05-13 10:22:08.978603] I [MSGID: 108031] 
[afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local 
read_child vol0-client-1

[2019-05-13 10:53:46.573894] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.573992] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)

[2019-05-13 10:53:46.574253] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.574949] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 10:53:46.575526] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1380: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:53:46.577820] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1381: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:53:46.596838] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.597759] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain)

[2019-05-13 10:53:46.598916] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 
0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between 
[2019-05-13 10:53:46.574949] and [2019-05-13 10:53:46.599257]

[2019-05-13 10:53:46.599525] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain)

[2019-05-13 10:53:46.599797] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.599825] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1389: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:53:46.599876] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain)

[2019-05-13 10:53:46.600149] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.600193] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain)

[2019-05-13 10:53:46.600417] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.600775] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 10:53:46.601071] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain)

[2019-05-13 10:53:46.601537] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.601577] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1390: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:53:46.619830] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.620701] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. 
(Possible split-brain)

[2019-05-13 10:53:46.621098] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.621455] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 10:53:46.621732] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. 
(Possible split-brain)


[2019-05-13 10:53:46.623509] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error]


[2019-05-13 10:53:46.624891] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.625212] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 10:53:46.625314] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. 
(Possible split-brain)

[2019-05-13 10:53:46.625721] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error]

[2019-05-13 10:53:46.625754] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1399: READ => -1 gfid=79423c92-0338-4dc9-bafc-091172e8d845 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:53:46.576286] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]


[2019-05-13 10:56:28.176786] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:56:28.177684] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)

[2019-05-13 10:56:28.178782] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:56:28.179128] W [MSGID: 108027] 
[afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for 
(null)

[2019-05-13 10:56:28.180634] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1533: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:56:28.179439] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)

[2019-05-13 10:56:28.180620] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:59:25.278595] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ 
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]

[2019-05-13 10:59:25.279517] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)

[2019-05-13 10:59:25.280605] E [MSGID: 108008] 
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing 
FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. 
[Input/output error]


[2019-05-13 10:59:25.281649] W [fuse-bridge.c:2228:fuse_readv_cbk] 
0-glusterfs-fuse: 1685: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 
fd=0x7f649c00e06c (Input/output error)

[2019-05-13 10:59:25.281250] W [MSGID: 108008] 
[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 
found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)
-------------------------------------------------
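
The affected files can be listed from either log by collecting the gfids that 
appear in "split-brain observed" messages. The following is a minimal Python 
sketch, assuming the log text is available as a string; the name 
split_brain_gfids is illustrative. It collapses whitespace first, because the 
log lines in this mail are wrapped.

```python
import re
from collections import Counter

# Matches AFR messages such as
# "Failing READ on gfid <uuid>: split-brain observed."
SPLIT_BRAIN = re.compile(
    r"Failing (?P<fop>[A-Z]+) on gfid (?P<gfid>[0-9a-f-]{36}): "
    r"split-brain observed"
)


def split_brain_gfids(log_text):
    """Count matching log messages per gfid (not the number of failed fops)."""
    flat = " ".join(log_text.split())  # undo the line wrapping
    return Counter(m.group("gfid") for m in SPLIT_BRAIN.finditer(flat))


sample = """
[2019-05-13 10:53:46.573894] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ
on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed.
[Input/output error]

[2019-05-13 10:53:46.596838] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ
on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed.
[Input/output error]
"""

for gfid, hits in split_brain_gfids(sample).items():
    print(gfid, hits)
```

Summary lines of the form 'The message "..." repeated N times' also match and 
are counted once each, so the counts only indicate which gfids are affected, 
not how often an operation failed.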


What we can't explain is why server 1 does the following:

[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv 
on 10.10.12.31:24007 failed (No data available)

[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 
0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data 
available)

[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 
0-glusterfsd-mgmt: Exhausted all volfile servers


After that, the volume is unmounted and then re-mounted using a different port.
Server 2 subsequently behaves in exactly the same way, which results in a 
split-brain condition for the disk files of the VMs.
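
To put numbers on this, the time during which a client had lost the other 
node's brick can be read from the ping-timeout and CHILD-UP messages in the 
logs. The following is a minimal Python sketch, assuming the client 
translators are named vol0-client-N as in the logs above; outage_windows and 
the DOWN/UP constants are illustrative names, and the sample is copied from 
the Server 2 log.

```python
import re
from datetime import datetime

DOWN = "has not responded"   # ping timer expired, client disconnects
UP = "notifying CHILD-UP"    # fds re-opened, brick usable again

EVENT = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] [A-Z] "  # timestamp, level
    r"(?:\[MSGID: \d+\] )?\[[^\]]+\] "                          # MSGID, source
    r"0-(vol0-client-\d+): "                                    # client name
    r"[^\[]*?(" + re.escape(DOWN) + "|" + re.escape(UP) + ")"   # event keyword
)


def outage_windows(log_text):
    """Return (client, down_time, up_time) tuples found in one client log."""
    flat = " ".join(log_text.split())  # undo the line wrapping
    down_since = {}
    windows = []
    for ts, client, what in EVENT.findall(flat):
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
        if what == DOWN:
            down_since.setdefault(client, when)  # keep the first timeout
        elif client in down_since:
            windows.append((client, down_since.pop(client), when))
    return windows


sample = """
[2019-05-13 09:48:17.426580] C
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-0: server
10.10.12.31:49155 has not responded in the last 5 seconds, disconnecting.

[2019-05-13 09:49:45.443244] I [MSGID: 114041]
[client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-0: last fd
open'd/lock-self-heal'd - notifying CHILD-UP
"""

for client, start, end in outage_windows(sample):
    print(client, start.time(), end.time(), (end - start).total_seconds())
```

Running the same function over the Server 1 log should give the corresponding 
window for vol0-client-1. The sketch only reports when a client could not reach 
a brick; it does not show which files were written during that time.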

We would be grateful if someone could explain this behavior to us.

BR
René