[Gluster-users] Hosted VM Pause when one node of gluster goes down

2018-04-23 Thread rwecker
Hi, 

I have a 3-node hyper-converged cluster running GlusterFS with replica 3 arbiter 1 
volumes. When I shut down one node I am having problems with high-load VMs 
pausing due to a storage error. What areas should I look into to get this to 
work? 
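
A few hedged places to look: with replica 3 arbiter 1 and quorum enabled, losing one node should keep the volume writable, so the pause is often the client-side ping timeout (42 seconds by default) outlasting the hypervisor's I/O timeout. Assuming the volume is the VMData one from the later threads, the relevant settings can be inspected and, if appropriate for your network, tuned:

gluster volume get VMData network.ping-timeout
gluster volume get VMData cluster.quorum-type
gluster volume get VMData cluster.server-quorum-type

# Lowering ping-timeout shortens the stall when a node dies, at the cost of
# more aggressive disconnects on a flaky network; 10s is only an example value.
gluster volume set VMData network.ping-timeout 10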



Russell Wecker 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Stale File handle

2018-02-20 Thread rwecker
Hello, 

I have a file in my gluster volume that gives the following when I try to access 
it with ls: 

ls: cannot access 37019600-c34e-4d10-8829-ac08cb141f19.meta: Stale file handle 
37019600-c34e-4d10-8829-ac08cb141f19 37019600-c34e-4d10-8829-ac08cb141f19.lease 
37019600-c34e-4d10-8829-ac08cb141f19.meta 

When I look at gluster volume heal VMData info, I get the following: 

Brick found1.ssd.org:/data/brick1/data 
/8d4b29ee-16c9-4bb9-a7db-937c9da6805d/images/a01b2883-73b1-4d4f-a338-44afcfce57a6/37019600-c34e-4d10-8829-ac08cb141f19
 
Status: Connected 
Number of entries: 1 

Brick found2.ssd.org:/data/brick1/data 
Status: Connected 
Number of entries: 0 

Brick found3.ssd.org:/data/brick1/arbit 
/8d4b29ee-16c9-4bb9-a7db-937c9da6805d/images/a01b2883-73b1-4d4f-a338-44afcfce57a6/37019600-c34e-4d10-8829-ac08cb141f19
 
Status: Connected 
Number of entries: 1 

The entries never sync! If I stop the volume and examine the file on the brick 
itself, the file is fine, but I cannot get gluster to accept the on-disk copy as 
authoritative. Any ideas how to fix this? 

Using gluster 3.12.6 
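
A hedged first step: the heal entries sit on found1 and the arbiter but not on found2, so it is worth comparing the entry's gfid across the bricks, since a gfid mismatch between copies is a common cause of ESTALE on the mount (brick path taken from the heal info output above; found3's brick root is /data/brick1/arbit):

getfattr -n trusted.gfid -e hex /data/brick1/data/8d4b29ee-16c9-4bb9-a7db-937c9da6805d/images/a01b2883-73b1-4d4f-a338-44afcfce57a6/37019600-c34e-4d10-8829-ac08cb141f19

# Run on each node's brick and compare; then nudge the self-heal daemon:
gluster volume heal VMData
gluster volume heal VMData info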

Thanks 

Russell Wecker 
IT Director 
Southern Asia-Pacific Division 
San Miguel II 
Bypass Silang, Silang, Cavite 
4118 Philippines 
Phone: +63 46 414 4000 x 5210 
Cell: +63 917 595 6395 
URL: http://ssd.adventist.asia/ 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Split brain

2018-02-19 Thread rwecker
Hi, 

I am having a problem with a split-brain issue that does not seem to match up 
with the documentation on how to solve it. 

gluster volume heal VMData2 info gives: 


Brick found2.ssd.org:/data/brick6/data 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc.meta
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ce2831a4-3e82-4bf8-bb68-82afa0c401fe
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images - Is in split-brain 

/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ee8e0fd3-cf68-4f2a-9d0d-f7ee665cc8c3
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/a4bea667-c65d-4085-9b90-896bf7fc55ff
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/4981580d-628f-4266-8e7e-f7a0bcae2dbb
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/2ee209b4-3e21-47fc-8342-033a37605d65
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md/xleases 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/28d32853-5c89-4c5f-8fb7-76fe0f88ff1f
 
Status: Connected 
Number of entries: 12 

Brick found3.ssd.org:/data/brick6/data 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc.meta
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ce2831a4-3e82-4bf8-bb68-82afa0c401fe
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images - Is in split-brain 

/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ee8e0fd3-cf68-4f2a-9d0d-f7ee665cc8c3
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/a4bea667-c65d-4085-9b90-896bf7fc55ff
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/4981580d-628f-4266-8e7e-f7a0bcae2dbb
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/2ee209b4-3e21-47fc-8342-033a37605d65
 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md/xleases 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/28d32853-5c89-4c5f-8fb7-76fe0f88ff1f
 
Status: Connected 
Number of entries: 12 

gluster volume heal VMData2 info split-brain gives: 

Brick found2.ssd.org:/data/brick6/data 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images 
Status: Connected 
Number of entries in split-brain: 1 

Brick found3.ssd.org:/data/brick6/data 
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images 
Status: Connected 
Number of entries in split-brain: 1 

On found3, running getfattr -d -m . hex /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images gives the following (the command as typed omits the -e before hex, which is why getfattr complains about a file named "hex" and prints the values base64-encoded with a 0s prefix rather than in hex): 

getfattr: hex: No such file or directory 
getfattr: Removing leading '/' from absolute path names 
# file: data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images 
security.selinux="system_u:object_r:default_t:s0" 
trusted.afr.VMData2-client-6=0sAQAG 
trusted.gfid=0sK8ZFxmThRxeq7pYw7QTOCw== 
trusted.glusterfs.dht=0sAQBVqQ== 

On found2, the same command gives: 

getfattr: hex: No such file or directory 
getfattr: Removing leading '/' from absolute path names 
# file: data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images 
security.selinux="system_u:object_r:default_t:s0" 
trusted.afr.VMData2-client-6=0sAQAG 
trusted.afr.dirty=0s 
trusted.gfid=0sK8ZFxmThRxeq7pYw7QTOCw== 
trusted.glusterfs.dht=0sAQBVqQ== 

The only difference is the trusted.afr.dirty attribute, which is present on 
found2 but not on found3. 
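
As an aside, the invocation above was presumably meant to carry an -e flag; with it, getfattr stops complaining about a file named "hex" and prints the changelog values in the 0x hex form the split-brain documentation uses:

getfattr -d -m . -e hex /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images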

Any help would be appreciated. 
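
For reference, a sketch of the two documented resolution routes. Treating found2 as the source below is purely illustrative (verify which copy of the directory is good first), and the CLI route has historically refused directories in entry split-brain, in which case the manual route applies:

# Route 1: CLI-based, naming the brick whose copy you trust as the source:
gluster volume heal VMData2 split-brain source-brick found2.ssd.org:/data/brick6/data /08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images

# Route 2: manual. On the brick chosen as the sink, zero the pending afr
# changelog for the directory so heal has an unambiguous direction, then heal.
# Which client index (here client-6) maps to which brick follows the brick
# order shown by "gluster volume info VMData2".
setfattr -n trusted.afr.VMData2-client-6 -v 0x000000000000000000000000 /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
gluster volume heal VMData2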




Russell Wecker 
IT Director 
Southern Asia-Pacific Division 
San Miguel II 
Bypass Silang, Silang, Cavite 
4118 Philippines 
Phone: +63 46 414 4000 x 5210 
Cell: +63 917 595 6395 
URL: http://ssd.adventist.asia/ 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrade from 3.8.15 to 3.12.5

2018-02-19 Thread rwecker
Thanks, that fixed both issues. 

Russell Wecker 
IT Director 
Southern Asia Pacific Division 


From: "Atin Mukherjee" <amukh...@redhat.com> 
To: "rwecker" <rwec...@ssd.org> 
Cc: "gluster-users" <gluster-users@gluster.org> 
Sent: Monday, February 19, 2018 4:51:56 PM 
Subject: Re: [Gluster-users] Upgrade from 3.8.15 to 3.12.5 

I believe the peer rejected issue is something we recently identified; it has 
been fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1544637 and the 
fix is available in 3.12.6. I'd request you to upgrade to the latest version in 
the 3.12 series. 
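
For readers who cannot move to 3.12.6 right away, a commonly used workaround for a single node stuck in Peer Rejected is sketched below. It wipes and resyncs the node's local glusterd configuration, so take a backup first and run it only on the rejected node (found1.ssd.org below is just one of the healthy peers from this thread):

systemctl stop glusterd
cp -a /var/lib/glusterd /var/lib/glusterd.bak
# Remove everything under /var/lib/glusterd except glusterd.info, which holds
# the node's UUID:
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd
gluster peer probe found1.ssd.org
systemctl restart glusterd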

On Mon, Feb 19, 2018 at 12:27 PM, <rwec...@ssd.org> wrote: 



Hi, 

I have a 3-node cluster (Found1, Found2, Found3) which I wanted to upgrade. I 
upgraded one node from 3.8.15 to 3.12.5 and now I am having multiple problems 
with the install. The two nodes not yet upgraded (Found1, Found2) are still 
working fine, but the upgraded one shows Peer Rejected (Connected) when peer 
status is run, and it also has multiple bricks reporting "Transport endpoint is 
not connected"; some bricks seem to work and some do not. 

Any help would be appreciated. 

Thanks 


[glusterd.log snipped; the full log appears in the original message below]

[Gluster-users] Upgrade from 3.8.15 to 3.12.5

2018-02-18 Thread rwecker
Hi, 

I have a 3-node cluster (Found1, Found2, Found3) which I wanted to upgrade. I 
upgraded one node from 3.8.15 to 3.12.5 and now I am having multiple problems 
with the install. The two nodes not yet upgraded (Found1, Found2) are still 
working fine, but the upgraded one shows Peer Rejected (Connected) when peer 
status is run, and it also has multiple bricks reporting "Transport endpoint is 
not connected"; some bricks seem to work and some do not. 

Any help would be appreciated. 

Thanks 


Here are the log files: 


glusterd.log 
[2018-02-19 05:32:38.589150] I [MSGID: 106478] [glusterd.c:1423:init] 
0-management: Maximum allowed open file descriptors set to 65536 
[2018-02-19 05:32:38.589237] I [MSGID: 106479] [glusterd.c:1481:init] 
0-management: Using /var/lib/glusterd as working directory 
[2018-02-19 05:32:38.589264] I [MSGID: 106479] [glusterd.c:1486:init] 
0-management: Using /var/run/gluster as pid file working directory 
[2018-02-19 05:32:38.609833] W [MSGID: 103071] 
[rdma.c:4630:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel 
creation failed [No such device] 
[2018-02-19 05:32:38.609892] W [MSGID: 103055] [rdma.c:4939:init] 
0-rdma.management: Failed to initialize IB Device 
[2018-02-19 05:32:38.609919] W [rpc-transport.c:350:rpc_transport_load] 
0-rpc-transport: 'rdma' initialization failed 
[2018-02-19 05:32:38.610149] W [rpcsvc.c:1682:rpcsvc_create_listener] 
0-rpc-service: cannot create listener, initing the transport failed 
[2018-02-19 05:32:38.610178] E [MSGID: 106243] [glusterd.c:1769:init] 
0-management: creation of 1 listeners failed, continuing with succeeded 
transport 
[2018-02-19 05:32:49.737152] I [MSGID: 106513] 
[glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved 
op-version: 30712 
[2018-02-19 05:32:50.248992] I [MSGID: 106498] 
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: 
connect returned 0 
[2018-02-19 05:32:50.249097] I [MSGID: 106498] 
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: 
connect returned 0 
[2018-02-19 05:32:50.249161] W [MSGID: 106062] 
[glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: 
Failed to get tcp-user-timeout 
[2018-02-19 05:32:50.249206] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600 
[2018-02-19 05:32:50.249327] W [MSGID: 101002] [options.c:995:xl_opt_validate] 
0-management: option 'address-family' is deprecated, preferred is 
'transport.address-family', continuing with correction 
[2018-02-19 05:32:50.254789] W [MSGID: 106062] 
[glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: 
Failed to get tcp-user-timeout 
[2018-02-19 05:32:50.254831] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600 
[2018-02-19 05:32:50.254908] W [MSGID: 101002] [options.c:995:xl_opt_validate] 
0-management: option 'address-family' is deprecated, preferred is 
'transport.address-family', continuing with correction 
[2018-02-19 05:32:50.258683] I [MSGID: 106544] 
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: 
de955a28-c230-4ada-98ba-a8f404ee8827 
Final graph: 
+--+
 
1: volume management 
2: type mgmt/glusterd 
3: option rpc-auth.auth-glusterfs on 
4: option rpc-auth.auth-unix on 
5: option rpc-auth.auth-null on 
6: option rpc-auth-allow-insecure on 
7: option transport.listen-backlog 10 
8: option event-threads 1 
9: option ping-timeout 0 
10: option transport.socket.read-fail-log off 
11: option transport.socket.keepalive-interval 2 
12: option transport.socket.keepalive-time 10 
13: option transport-type rdma 
14: option working-directory /var/lib/glusterd 
15: end-volume 
16: 
+--+
 
[2018-02-19 05:32:50.259384] I [MSGID: 101190] 
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1 
[2018-02-19 05:32:50.284115] I [MSGID: 106163] 
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: 
using the op-version 30712 
[2018-02-19 05:32:50.285320] I [MSGID: 106493] 
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT 
from uuid: a23fa00c-4c7c-436d-9d04-0c16941c, host: found2.ssd.org, port: 0 
[2018-02-19 05:32:50.286561] I [MSGID: 106493] 
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT 
from uuid: b9fb5e3b-b638-4495-afee-36b465aea4e7, host: found1.ssd.org, port: 0 
[2018-02-19 05:32:50.296816] I [MSGID: 106490] 
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: 
Received probe from uuid: a23fa00c-4c7c-436d-9d04-0c16941c 
[2018-02-19 05:32:50.298392] E [MSGID: 106010] 
[glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of 
Cksums VMData differ. local cksum = 1127272657, remote cksum = 3816303263 on 
peer found2.ssd.org 
[2018-02-19
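
The last complete line above is the telltale one: the upgraded node computed a different checksum for VMData's volume definition than found2 did, which is typically what drives a peer into the Rejected state. A hedged way to confirm is to compare the stored checksum across the three nodes (the path below is glusterd's standard working directory):

cat /var/lib/glusterd/vols/VMData/cksum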