[Gluster-users] Hosted VM Pause when one node of gluster goes down
Hi,

I have a 3-node hyper-converged cluster running GlusterFS with replica 3 arbiter 1 volumes. When I shut down one node, I am having problems with high-load VMs pausing due to a storage error. What areas should I look into to get this to work?

Russell Wecker

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
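Not part of the original message, but for this kind of setup the options below are usually the first things to check when VMs pause during a node outage. A hedged sketch only; <VOLNAME> is a placeholder for the affected volume.

```
# Sketch: options that commonly affect VM pauses during a node outage
# on a replica 3 arbiter 1 volume. <VOLNAME> is a placeholder.

# How long clients wait before declaring a brick dead; the default (42s)
# is often longer than a guest's I/O timeout, so VMs pause before failover.
gluster volume get <VOLNAME> network.ping-timeout

# Quorum settings: with an arbiter, client quorum should normally be
# 'auto' so writes continue with 2 of 3 bricks up.
gluster volume get <VOLNAME> cluster.quorum-type
gluster volume get <VOLNAME> cluster.server-quorum-type

# The virt group applies the recommended option set for VM image
# storage (sharding, quorum, timeouts) in one step.
gluster volume set <VOLNAME> group virt
```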
[Gluster-users] Stale File handle
Hello,

I have a file in my gluster volume that shows the following when I try to access it (ls):

ls: cannot access 37019600-c34e-4d10-8829-ac08cb141f19.meta: Stale file handle
37019600-c34e-4d10-8829-ac08cb141f19
37019600-c34e-4d10-8829-ac08cb141f19.lease
37019600-c34e-4d10-8829-ac08cb141f19.meta

When I look at "gluster volume heal VMData info" I get the following:

Brick found1.ssd.org:/data/brick1/data
/8d4b29ee-16c9-4bb9-a7db-937c9da6805d/images/a01b2883-73b1-4d4f-a338-44afcfce57a6/37019600-c34e-4d10-8829-ac08cb141f19
Status: Connected
Number of entries: 1

Brick found2.ssd.org:/data/brick1/data
Status: Connected
Number of entries: 0

Brick found3.ssd.org:/data/brick1/arbit
/8d4b29ee-16c9-4bb9-a7db-937c9da6805d/images/a01b2883-73b1-4d4f-a338-44afcfce57a6/37019600-c34e-4d10-8829-ac08cb141f19
Status: Connected
Number of entries: 1

The entries never sync! If I stop the volume and examine the file on the brick itself, the file is fine, but I cannot get gluster to accept the file on disk as authoritative. Any ideas how to fix this?

Using gluster 3.12.6.

Thanks,

Russell Wecker
IT Director
Southern Asia-Pacific Division
San Miguel II Bypass Silang, Silang, Cavite 4118, Philippines
Phone: +63 46 414 4000 x 5210
Cell: +63 917 595 6395
URL: http://ssd.adventist.asia/
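Not a fix from the thread, but useful context when chasing stale handles: each file on a brick also exists as a hard link under the brick's hidden .glusterfs directory, keyed by its gfid, and a mismatch between the two is a common cause of "Stale file handle". As an illustrative sketch (the helper name is mine; the gfid is the one from the ls output above and the brick root is from the heal info), the brick-side gfid path can be computed like this:

```python
import os

def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the .glusterfs hard-link path for a gfid on a brick.

    Gluster stores each file a second time as a hard link under
    .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>.
    """
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

# gfid taken from the ls output above; brick root from the heal info
print(gfid_backend_path("/data/brick1/data",
                        "37019600-c34e-4d10-8829-ac08cb141f19"))
# -> /data/brick1/data/.glusterfs/37/01/37019600-c34e-4d10-8829-ac08cb141f19
```

Comparing the inode of that path against the named file on each brick (ls -i) shows whether the gfid link and the file have diverged on the brick that refuses to heal.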
[Gluster-users] Split brain
Hi,

I am having a problem with a split-brain issue that does not seem to match up with the documentation on how to solve it.

"gluster volume heal VMData2 info" gives:

Brick found2.ssd.org:/data/brick6/data
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc.meta
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ce2831a4-3e82-4bf8-bb68-82afa0c401fe
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images - Is in split-brain
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ee8e0fd3-cf68-4f2a-9d0d-f7ee665cc8c3
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/a4bea667-c65d-4085-9b90-896bf7fc55ff
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/4981580d-628f-4266-8e7e-f7a0bcae2dbb
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/2ee209b4-3e21-47fc-8342-033a37605d65
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md/xleases
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/28d32853-5c89-4c5f-8fb7-76fe0f88ff1f
Status: Connected
Number of entries: 12

Brick found3.ssd.org:/data/brick6/data
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45/82a7027b-321c-4bd9-8afc-2a12cfa23bfc.meta
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/d1164a9b-ba63-46c4-a9ec-76ea4a7a2c45
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ce2831a4-3e82-4bf8-bb68-82afa0c401fe
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images - Is in split-brain
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/ee8e0fd3-cf68-4f2a-9d0d-f7ee665cc8c3
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/a4bea667-c65d-4085-9b90-896bf7fc55ff
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/4981580d-628f-4266-8e7e-f7a0bcae2dbb
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/2ee209b4-3e21-47fc-8342-033a37605d65
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md/xleases
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/dom_md
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images/28d32853-5c89-4c5f-8fb7-76fe0f88ff1f
Status: Connected
Number of entries: 12

"gluster volume heal VMData2 info split-brain" gives:

Brick found2.ssd.org:/data/brick6/data
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
Status: Connected
Number of entries in split-brain: 1

Brick found3.ssd.org:/data/brick6/data
/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
Status: Connected
Number of entries in split-brain: 1

On found3, running "getfattr -d -m . hex /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images" gives:

getfattr: hex: No such file or directory
getfattr: data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images: No such file or directory

[root@found3 ~]# getfattr -d -m . hex /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
getfattr: hex: No such file or directory
getfattr: Removing leading '/' from absolute path names
# file: data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
security.selinux="system_u:object_r:default_t:s0"
trusted.afr.VMData2-client-6=0sAQAG
trusted.gfid=0sK8ZFxmThRxeq7pYw7QTOCw==
trusted.glusterfs.dht=0sAQBVqQ==

On found2, running "getfattr -d -m . hex /data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images" gives:

getfattr: hex: No such file or directory
getfattr: Removing leading '/' from absolute path names
# file: data/brick6/data/08aa5fc4-c9ba-4fcf-af57-72450b875d1a/images
security.selinux="system_u:object_r:default_t:s0"
trusted.afr.VMData2-client-6=0sAQAG
trusted.afr.dirty=0s
trusted.gfid=0sK8ZFxmThRxeq7pYw7QTOCw==
trusted.glusterfs.dht=0sAQBVqQ==

The only difference is the trusted.afr.dirty item on found2, which is not present on found3. Any help would be appreciated.
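For reference when reading the output above: the trusted.afr.* values getfattr prints with a "0s" prefix are base64, and the ones shown here look truncated by the list archive, so they cannot be decoded as-is. As an illustrative sketch (the function name is mine, and the sample value is a hypothetical full-length one, not taken from this thread), a complete 12-byte AFR xattr decodes into three pending-operation counters:

```python
import base64
import struct

def decode_afr_xattr(value: str) -> dict:
    """Decode a trusted.afr.* xattr value as printed by getfattr.

    getfattr prefixes base64 output with "0s". The AFR xattr holds three
    32-bit big-endian counters: pending data, metadata, and entry
    operations that this brick believes the other brick still owes.
    """
    if not value.startswith("0s"):
        raise ValueError("expected a base64 ('0s') getfattr value")
    raw = base64.b64decode(value[2:])
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Hypothetical full-length value: 0 pending data ops, 0 metadata ops,
# 6 pending entry ops (a non-zero entry count on a directory is what an
# entry split-brain like the one on /images would look like).
print(decode_afr_xattr("0sAAAAAAAAAAAAAAAG"))
# -> {'data': 0, 'metadata': 0, 'entry': 6}
```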
Russell Wecker
Re: [Gluster-users] Upgrade from 3.8.15 to 3.12.5
Thanks. That fixed both issues.

Russell Wecker
IT Director
Southern Asia Pacific Division

From: "Atin Mukherjee" <amukh...@redhat.com>
To: "rwecker" <rwec...@ssd.org>
Cc: "gluster-users" <gluster-users@gluster.org>
Sent: Monday, February 19, 2018 4:51:56 PM
Subject: Re: [Gluster-users] Upgrade from 3.8.15 to 3.12.5

I believe the peer rejected issue is something we recently identified and has been fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1544637 and is available in 3.12.6. I'd request you to upgrade to the latest version in the 3.12 series.

On Mon, Feb 19, 2018 at 12:27 PM, <rwec...@ssd.org> wrote:

Hi, I have a 3-node cluster (Found1, Found2, Found3) which I wanted to upgrade. I upgraded one node from 3.8.15 to 3.12.5 and now I am having multiple problems with the install. The two nodes not yet upgraded (Found1, Found2) are still working fine, but the upgraded one shows "Peer Rejected (Connected)" when peer status is run, and it also has multiple bricks reporting "Transport endpoint is not connected"; some bricks seem to work, some do not. Any help would be appreciated.
Thanks. Here are the log files.

glusterd.log:

[2018-02-19 05:32:38.589150] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2018-02-19 05:32:38.589237] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2018-02-19 05:32:38.589264] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2018-02-19 05:32:38.609833] W [MSGID: 103071] [rdma.c:4630:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2018-02-19 05:32:38.609892] W [MSGID: 103055] [rdma.c:4939:init] 0-rdma.management: Failed to initialize IB Device
[2018-02-19 05:32:38.609919] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2018-02-19 05:32:38.610149] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2018-02-19 05:32:38.610178] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2018-02-19 05:32:49.737152] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2018-02-19 05:32:50.248992] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-02-19 05:32:50.249097] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-02-19 05:32:50.249161] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-02-19 05:32:50.249206] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-02-19 05:32:50.249327] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-02-19 05:32:50.254789] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-02-19 05:32:50.254831] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-02-19 05:32:50.254908] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-02-19 05:32:50.258683] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: de955a28-c230-4ada-98ba-a8f404ee8827
Final graph:
+--+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+--+
[2018-02-19 05:32:50.259384] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-02-19 05:32:50.284115] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30712
[2018-02-19 05:32:50.285320] I [MSGID: 106493
[Gluster-users] Upgrade from 3.8.15 to 3.12.5
Hi,

I have a 3-node cluster (Found1, Found2, Found3) which I wanted to upgrade. I upgraded one node from 3.8.15 to 3.12.5 and now I am having multiple problems with the install. The two nodes not yet upgraded (Found1, Found2) are still working fine, but the upgraded one shows "Peer Rejected (Connected)" when peer status is run, and it also has multiple bricks reporting "Transport endpoint is not connected"; some bricks seem to work, some do not. Any help would be appreciated.

Thanks. Here are the log files.

glusterd.log:

[2018-02-19 05:32:38.589150] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2018-02-19 05:32:38.589237] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2018-02-19 05:32:38.589264] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2018-02-19 05:32:38.609833] W [MSGID: 103071] [rdma.c:4630:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2018-02-19 05:32:38.609892] W [MSGID: 103055] [rdma.c:4939:init] 0-rdma.management: Failed to initialize IB Device
[2018-02-19 05:32:38.609919] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2018-02-19 05:32:38.610149] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2018-02-19 05:32:38.610178] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2018-02-19 05:32:49.737152] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2018-02-19 05:32:50.248992] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-02-19 05:32:50.249097] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-02-19 05:32:50.249161] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-02-19 05:32:50.249206] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-02-19 05:32:50.249327] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-02-19 05:32:50.254789] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-02-19 05:32:50.254831] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-02-19 05:32:50.254908] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-02-19 05:32:50.258683] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: de955a28-c230-4ada-98ba-a8f404ee8827
Final graph:
+--+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+--+
[2018-02-19 05:32:50.259384] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-02-19 05:32:50.284115] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30712
[2018-02-19 05:32:50.285320] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: a23fa00c-4c7c-436d-9d04-0c16941c, host: found2.ssd.org, port: 0
[2018-02-19 05:32:50.286561] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b9fb5e3b-b638-4495-afee-36b465aea4e7, host: found1.ssd.org, port: 0
[2018-02-19 05:32:50.296816] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: a23fa00c-4c7c-436d-9d04-0c16941c
[2018-02-19 05:32:50.298392] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of Cksums VMData differ. local cksum = 1127272657, remote cksum = 3816303263 on peer found2.ssd.org
[2018-02-19
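For readers hitting the same "Version of Cksums ... differ" error independently of the bug fixed in 3.12.6: it means the rejected node's copy of the volume configuration under /var/lib/glusterd no longer matches its peers'. The recovery procedure from the Gluster administration docs is roughly the following; a hedged sketch only, run on the rejected node, and back up /var/lib/glusterd first since it deletes the local volume configuration.

```
# Sketch of the documented "Peer Rejected" recovery (rejected node only).
systemctl stop glusterd
cd /var/lib/glusterd
# keep glusterd.info (this node's UUID); clear the stale volume config
find . -mindepth 1 ! -name 'glusterd.info' -delete
systemctl start glusterd
gluster peer probe found1.ssd.org   # any healthy peer
systemctl restart glusterd          # pick up the re-synced volume config
```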