[Gluster-users] set larger field width for status command
All,

A quick question: how can I get the "Gluster process" field to be wider when doing a "gluster volume status" command? It word-wraps that field, so I end up with 2 lines for some bricks and 1 for others, depending on the length of the path to the brick or hostname...

Brian Andrus

Community Meeting Calendar:
Schedule - Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
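As far as I know, there is no CLI option for this: the column widths are fixed in the cli binary. Two workarounds are to ask for machine-readable output with "gluster volume status --xml" and format it yourself, or to re-join the wrapped lines after the fact. A sketch of the latter (the sample text stands in for real "gluster volume status" output, and it assumes continuation lines start with whitespace):

```shell
# Sample of a wrapped status row: the long brick path forced the
# remaining columns onto an indented second line.
status_sample='Brick host1:/very/long/brick/path
   49152     0          Y       12345'

# Glue indented continuation lines back onto the previous line,
# then squeeze the spacing down to single spaces.
joined=$(printf '%s\n' "$status_sample" | awk '
  /^[[:space:]]/ { prev = prev " " $0; next }  # continuation line
  { if (prev != "") print prev; prev = $0 }
  END { if (prev != "") print prev }' | tr -s " ")
printf '%s\n' "$joined"
```

Piping the real command through the same awk/tr filter gives one (long) line per brick instead of a wrapped pair.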
Re: [Gluster-users] Volume Creation - Best Practices
You can do that, but you could run into issues with the 'shared' remaining space: any one of the volumes could eat up the space you planned on using for another volume. Not a huge issue, but it could bite you.

I prefer to use ZFS for the flexibility. I create a RAIDZ pool and then separate ZFS filesystems within it for each brick. I can reserve a specific amount of space in the pool for each brick, and that can be modified as well. It is easy to grow, too. Plus, configured right, ZFS parallelizes I/O across all the disks, so you get a speedup in performance.

Brian Andrus

On 8/24/2018 11:45 AM, Mark Connor wrote:
> Wondering if there is a best practice for volume creation. I don't see this information in the documentation. For example: I have a 10-node distribute-replicate setup with one large XFS filesystem mounted on each node. Is it OK for me to have just one XFS filesystem mounted and use subdirectories for my bricks for multiple volume creation? So I could have, let's say, 10 different volumes, each using a brick that is a subdirectory on my single XFS filesystem on each node? In other words, multiple bricks on one XFS filesystem per node? I create volumes on the fly, and creating new filesystems for each node would be too much work. Your thoughts?
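A minimal sketch of that layout (pool name, disk names, and sizes are all examples, not from the original setup):

```shell
# Create a RAIDZ pool from four disks (names are examples).
zpool create tank raidz sdb sdc sdd sde

# One ZFS filesystem per brick, each with a guaranteed space
# reservation carved out of the shared pool.
zfs create -o reservation=2T tank/brick-vol1
zfs create -o reservation=1T tank/brick-vol2

# Reservations can be changed later as needs shift.
zfs set reservation=3T tank/brick-vol1
```

The reservation is what prevents the "shared remaining space" problem described above: each brick's filesystem is guaranteed its slice of the pool no matter what the others consume.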
Re: [Gluster-users] issue with self-heal
Your message means something (usually glusterfsd) is not running quite right, or at all, on one of the servers. If you can tell which it is, you need to stop/restart glusterd and glusterfsd on it. Note: sometimes just stopping them doesn't really stop them. You need to do a "killall -9" for glusterd, glusterfsd, and anything else with "gluster" in the name. Then just start glusterd and glusterfsd. Once they are up, you should be able to do the heal.

If you can't tell which it is and are able to take gluster offline for users for a moment, do that process on all your brick servers.

Brian Andrus

On 7/13/2018 10:55 AM, hsafe wrote:
> Hello Gluster community,
> After several hundred GB of data writes (small images, 100k-1M) into a 2x replicated glusterfs setup, I am facing an issue with the healing process. Earlier, the heal info returned the bricks and nodes and the fact that there were no failed heals; but now it gets to the state with the below message:
>
> # gluster volume heal gv1 info healed
> Gathering list of heal failed entries on volume gv1 has been unsuccessful on bricks that are down. Please check if all brick processes are running.
>
> Issuing the heal info command gives a long list of gfid info that takes about an hour to complete. The file data, being images, would not change, and it is primarily served from an 8-server mount via native glusterfs.
> Here is some insight on the status of the gluster. But how can I effectively do a successful heal on the storage? The last times I tried, it sent the servers sideways and unresponsive.
>
> # gluster volume info
> Volume Name: gv1
> Type: Replicate
> Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: IMG-01:/images/storage/brick1
> Brick2: IMG-02:/images/storage/brick1
> Options Reconfigured:
> performance.md-cache-timeout: 128
> cluster.background-self-heal-count: 32
> server.statedump-path: /tmp
> performance.readdir-ahead: on
> nfs.disable: true
> network.inode-lru-limit: 5
> features.bitrot: off
> features.scrub: Inactive
> performance.cache-max-file-size: 16MB
> client.event-threads: 8
> cluster.eager-lock: on
>
> Appreciate your help. Thanks
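The restart sequence described in the reply can be sketched as follows (run on the affected server; the systemctl service names assume a systemd-based distribution, which is an assumption):

```shell
# Stop the daemon; a plain stop sometimes leaves processes behind,
# so follow up with killall -9 as described above.
systemctl stop glusterd
killall -9 glusterd glusterfsd glusterfs   # anything with "gluster"

# Start glusterd again; it respawns the brick (glusterfsd) processes.
systemctl start glusterd

# Once everything is back up, kick off the heal.
gluster volume heal gv1
```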
[Gluster-users] transport endpoint not connected and sudden unmount
All,

I have a gluster filesystem (glusterfs-4.0.2-1, Type: Distributed-Replicate, Number of Bricks: 5 x 3 = 15).

I have one directory that is used for slurm statefiles, which seems to get out of sync fairly often. There are particular files that end up never healing. Since the files are ephemeral, I'm OK with losing them (for now). Following some advice, I deleted UUID files that were in /GLUSTER/brick1/.glusterfs/indices/xattrop/

This makes "gluster volume heal GDATA statistics heal-count" show no issues; however, the issue is still there. Even though nothing shows up with "gluster volume heal GDATA info", there are some files/directories that, if I try to access them at all, give me "Transport endpoint is not connected". There is even a directory which is empty, but if I try to 'rmdir' it, I get "rmdir: failed to remove '/DATA/slurmstate.old/slurm/': Software caused connection abort" and the mount goes bad. I have to umount/mount it to get it back.

There is a bit of info in the log file that has to do with the crash, which is attached. How do I clean this up? And what is the 'proper' way to handle a file that will not heal, even in a 3-way replicate?

Brian Andrus

[2018-06-27 14:16:00.075738] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-12: Connected to GDATA-client-12, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.075755] I [MSGID: 108005] [afr-common.c:5081:__afr_handle_child_up_event] 0-GDATA-replicate-4: Subvolume 'GDATA-client-12' came back up; going online.
[2018-06-27 14:16:00.076274] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-14: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.076468] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-14: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.076582] I [rpc-clnt.c:2071:rpc_clnt_reconfig] 0-GDATA-client-14: changing port to 49152 (from 0)
[2018-06-27 14:16:00.076772] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-13: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.076922] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-13: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.077407] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-13: Connected to GDATA-client-13, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.077422] I [MSGID: 108002] [afr-common.c:5378:afr_notify] 0-GDATA-replicate-4: Client-quorum is met
[2018-06-27 14:16:00.079479] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-14: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.079723] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-GDATA-client-14: error returned while attempting to connect to host:(null), port:0
[2018-06-27 14:16:00.080249] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-14: Connected to GDATA-client-14, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.081176] I [fuse-bridge.c:4234:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-06-27 14:16:00.081196] I [fuse-bridge.c:4864:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-06-27 14:16:00.088870] I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-GDATA-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-06-27 14:16:03.675890] W [MSGID: 108027] [afr-common.c:2255:afr_attempt_readsubvol_set] 0-GDATA-replicate-1: no read subvols for /slurmstate.old/slurm
[2018-06-27 14:16:03.675921] I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-GDATA-dht: Found anomalies in /slurmstate.old/slurm (gfid = ----). Holes=1 overlaps=0
[2018-06-27 14:16:03.675936] W [MSGID: 109005] [dht-selfheal.c:2303:dht_selfheal_directory] 0-GDATA-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /slurmstate.old/slurm, gfid = 8ed6a9e9-2820-40bd-8d9d-77b7f79c7748
[2018-06-27 14:16:03.679061] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-GDATA-replicate-2: performing entry selfheal on 8ed6a9e9-2820-40bd-8d9d-77b7f79c7748
[2018-06-27 14:16:03.681899] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-GDATA-replicate-2: expunging file 8ed6a9e9-2820-40bd-8d9d-77b7f79c7748/heartbeat (----) on GDATA-client-6
[2018-06-27 14:16:03.683080] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-GDATA-client-4: remote operation failed. Path: /slurmstate.old/slurm/qos_usage (848b3d5e-3492-4343-a1b2-a86cc975b3c2) [No data available]
[2018-06-27 14:16:03.683624] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-GDATA-client-4: remote operation failed. Path: (null) (000
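For reference, the heal commands involved in the steps above (volume name GDATA as in the post; the "full" heal is an extra step beyond what the poster described, useful after hand-deleting entries under .glusterfs/indices/xattrop/):

```shell
# List entries still pending heal, per brick.
gluster volume heal GDATA info

# Summary counts only (this is what showed "no issues" above).
gluster volume heal GDATA statistics heal-count

# Crawl the whole volume and re-index anything the index no longer
# tracks -- relevant when the xattrop index has been edited by hand.
gluster volume heal GDATA full
```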
[Gluster-users] clean up of unclean files
All,

I have a 5x3 Distributed-Replicate filesystem that has a few entries that do not clean up when being healed. I had tracked down what they were, and since they were really just temp/expendable files, I moved the directory and recreated what was needed.

Now those files in the recreated directory cannot be deleted, and they show up in the "gluster volume heal info" output and never go away. Examples below:

Brick brick5.internal:/GLUSTER/brick1
 - Is in split-brain

Status: Connected
Number of entries: 1

Brick brick6.internal:/GLUSTER/brick1
/resv_state
 - Is in split-brain

/node_state
/job_state.old
/node_state.old
Status: Connected
Number of entries: 5

So, how do I clean those up so they aren't showing up anywhere at all?

Brian Andrus
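One way to clear entries like these is the CLI's split-brain resolution. A sketch (the volume name GDATA is assumed, not stated in this post; the paths and brick come from the output above, and which policy to use depends on which copy should win):

```shell
# Keep the copy with the newest modification time for a given path.
gluster volume heal GDATA split-brain latest-mtime /resv_state

# Or explicitly choose the brick whose copy should win.
gluster volume heal GDATA split-brain source-brick \
    brick5.internal:/GLUSTER/brick1 /node_state

# Verify the entries are gone.
gluster volume heal GDATA info split-brain
```

Since these files are expendable, either policy works; the point is that resolving the split-brain is what lets the entries drain out of heal info.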
[Gluster-users] gluster volume create failed: Host is not in 'Peer in Cluster' state
All,

Running glusterfs-4.0.2-1 on CentOS 7.5.1804.

I have 10 servers running in a pool. All show as connected when I do "gluster peer status" and "gluster pool list". There is 1 volume running that is distributed on servers 1-5.

I try using a brick on server7 and it always gives me:

volume create: GDATA: failed: Host server7 is not in 'Peer in Cluster' state

Now that happens even ON server7 with:

gluster volume create GDATA transport tcp server7:/GLUSTER/brick1

I have detached and re-probed the server. It seems all happy, but it will NOT allow any sort of volume to be created on it. Any ideas out there?

Brian Andrus
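A few things worth checking in a case like this (a sketch; nothing here is specific to the setup beyond the names already shown):

```shell
# Peer state as each side sees it -- run on an existing pool member
# AND on server7; every entry must read "Peer in Cluster".
gluster peer status

# glusterd's on-disk view of its peers; a stale or mismatched entry
# here (e.g. IP vs. hostname) can leave one direction wedged even
# after a detach/re-probe.
cat /var/lib/glusterd/peers/*

# Restarting glusterd on the affected node makes it re-read that state.
systemctl restart glusterd
```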
[Gluster-users] replicate a distributed volume
All,

With Gluster 4.0.2, is it possible to take an existing distributed volume and turn it into a distributed-replicate by adding servers/bricks?

It seems this should be possible, but I don't know that anything has been done to get it there.

Brian Andrus
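For what it's worth, the usual mechanism for this is "add-brick" with a new replica count. A sketch (volume name, brick paths, and counts are examples; one new brick is needed per existing brick):

```shell
# Turn a 5-brick distributed volume into 5 x 2 distributed-replicate
# by raising the replica count while adding 5 matching bricks.
gluster volume add-brick GDATA replica 2 \
    server6:/GLUSTER/brick1 server7:/GLUSTER/brick1 \
    server8:/GLUSTER/brick1 server9:/GLUSTER/brick1 \
    server10:/GLUSTER/brick1

# Then populate the new replicas.
gluster volume heal GDATA full
```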
[Gluster-users] Upgrade OS on Server node
All,

I have a Distributed-Replicate volume served by 10 servers (Number of Bricks: 5 x 2 = 10). They are currently running CentOS 6, and I want to upgrade them to CentOS 7.

I know there are several ways I could go about it, but I was wondering if there is a best practice that alleviates down/rebuild time.

All the best,
Brian Andrus
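One common pattern, since each replica pair can tolerate one member being down, is a rolling rebuild, one server at a time. A sketch of the per-server steps (paths and volume name are examples; it assumes the brick filesystem and /var/lib/glusterd survive the reinstall or are restored afterwards):

```shell
# On the server being upgraded ("service glusterd stop" on CentOS 6):
service glusterd stop
killall glusterfsd glusterfs 2>/dev/null

# Back up glusterd's state; reinstall the OS WITHOUT touching the
# brick filesystem, then restore this directory before starting gluster.
tar czf /root/glusterd-state.tgz /var/lib/glusterd

# ... reinstall CentOS 7, install the same gluster version, restore ...

# After the node is back and glusterd is running, let it resync, and
# wait for the pending-heal count to reach zero before the next server.
gluster volume heal GDATA statistics heal-count
```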