[Gluster-users] set larger field width for status command

2020-03-02 Thread Brian Andrus

All,

A quick question:

how can I get the "Gluster process" field to be larger when doing a 
"gluster volume status" command?


It word-wraps that field so I end up with 2 lines for some bricks and 1 
for others depending on the length of the path to the brick or hostname...


Brian Andrus





Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Volume Creation - Best Practices

2018-08-24 Thread Brian Andrus
You can do that, but you could run into issues with the 'shared' 
remaining space. Any one of the volumes could eat up the space you 
planned on using in another volume. Not a huge issue, but could bite you.


I prefer to use ZFS for the flexibility. I create a RAIDZ pool and then 
separate zfs filesystems within that for each brick. I can reserve a 
specific amount of space in the pool for each brick and that can be 
modified as well.


It is easy to grow it too. Plus, configured right, zfs does parallel 
across all the disks, so you get speedup in performance.


Brian Andrus

On 8/24/2018 11:45 AM, Mark Connor wrote:
Wondering if there is a best practice for volume creation. I don't see 
this information in the documentation. For example.
I have a 10 node distribute-replicate setup with one large xfs 
filesystem mounted on each node.


Is it OK for me to have just one xfs filesystem mounted and use 
subdirectories for my bricks for multiple volume creation?
So I could have, lets say 10 different volumes but each using a 
brick as subdir on my single xfs filesystem on each node?

In other words multiple bricks on one xfs filesystem per node?
I create volumes on the fly and creating new filesystems for each node 
would be too much work.


Your thoughts?



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] issue with self-heal

2018-07-13 Thread Brian Andrus
You message means something (usually glusterfsd) is not running quite 
right or at all on one of the servers.


If you can tell which it is, you need to stop/restart glusterd and 
glusterfsd. Note: sometimes just stopping them doesn't really stop them. 
You need to do a killall -9  for glusterd, glusterfsd and anything else 
with "gluster"


Then just start glusterd and glusterfsd. Once they are up you should be 
able to do the heal.


If you can't tell which it is and are able to take gluster offline for 
users for a moment, do that process to all your brick servers.


Brian Andrus


On 7/13/2018 10:55 AM, hsafe wrote:


Hello Gluster community,

After several hundred GB of data writes (small image  100k  1M) 
into a replicated 2x glusterfs servers , I am facing issue with 
healing process. Earlier the heal info returned the bricks and nodes 
and the fact that there are no failed heal; but now it gets to the 
state with below message:


*# gluster volume heal gv1 info healed*

*Gathering list of heal failed entries on volume gv1 has been 
unsuccessful on bricks that are down. Please check if all brick 
processes are running.*


issuing the heal info command gives a log list of gfid info that takes 
like an hour to complete. The file data being images would not change 
and primarily served from 8x server mount native glusterfs.


Here is some insight on the status of the gluster, but how can I 
effectively do a successful heal on the storages cause last times 
trying to do that send the servers southway and irresponsive


*# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
performance.md-cache-timeout: 128
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on*

Appreciate your help.Thanks



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] transport endpoint not connected and sudden unmount

2018-06-27 Thread Brian Andrus

All,

I have a gluster filesystem (glusterfs-4.0.2-1, Type: 
Distributed-Replicate, Number of Bricks: 5 x 3 = 15)


I have one directory that is used for slurm statefiles, which seems to 
get out of sync fairly often. There are particular files that end up 
never healing.


Since the files are ephemeral, I'm ok with losing them (for now). 
Following some advice, I deleted UUID files that were in 
/GLUSTER/brick1/.glusterfs/indices/xattrop/


This makes gluster volume heal GDATA statistics heal-count show no 
issues, however the issue is still there. Even though nothing is showing 
up with gluster volume heal GDATA info, there are some files/directories 
that, if I try to access them at all, I get "Transport endpoint is not 
connected"
There is even a directory, which is empty but if I try to 'rmdir' it, I 
get "rmdir: failed to remove ‘/DATA/slurmstate.old/slurm/’: Software 
caused connection abort" and the mount goes bad. I have to umount/mount 
it to get it back.


There is a bit of info in the log file that has to do with the crash 
which is attached.


How do I clean this up? And what is the 'proper' way to handle when you 
have a file that will not heal even in a 3-way replicate?


Brian Andrus

[2018-06-27 14:16:00.075738] I [MSGID: 114046] 
[client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-12: Connected to 
GDATA-client-12, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.075755] I [MSGID: 108005] 
[afr-common.c:5081:__afr_handle_child_up_event] 0-GDATA-replicate-4: Subvolume 
'GDATA-client-12' came back up; going online.
[2018-06-27 14:16:00.076274] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-14: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.076468] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-14: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.076582] I [rpc-clnt.c:2071:rpc_clnt_reconfig] 
0-GDATA-client-14: changing port to 49152 (from 0)
[2018-06-27 14:16:00.076772] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-13: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.076922] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-13: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.077407] I [MSGID: 114046] 
[client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-13: Connected to 
GDATA-client-13, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.077422] I [MSGID: 108002] [afr-common.c:5378:afr_notify] 
0-GDATA-replicate-4: Client-quorum is met
[2018-06-27 14:16:00.079479] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-14: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.079723] W [rpc-clnt.c:1739:rpc_clnt_submit] 
0-GDATA-client-14: error returned while attempting to connect to host:(null), 
port:0
[2018-06-27 14:16:00.080249] I [MSGID: 114046] 
[client-handshake.c:1176:client_setvolume_cbk] 0-GDATA-client-14: Connected to 
GDATA-client-14, attached to remote volume '/GLUSTER/brick1'.
[2018-06-27 14:16:00.081176] I [fuse-bridge.c:4234:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-06-27 14:16:00.081196] I [fuse-bridge.c:4864:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2018-06-27 14:16:00.088870] I [MSGID: 109005] 
[dht-selfheal.c:2328:dht_selfheal_directory] 0-GDATA-dht: Directory selfheal 
failed: Unable to form layout for directory /
[2018-06-27 14:16:03.675890] W [MSGID: 108027] 
[afr-common.c:2255:afr_attempt_readsubvol_set] 0-GDATA-replicate-1: no read 
subvols for /slurmstate.old/slurm
[2018-06-27 14:16:03.675921] I [MSGID: 109063] 
[dht-layout.c:693:dht_layout_normalize] 0-GDATA-dht: Found anomalies in 
/slurmstate.old/slurm (gfid = ----). Holes=1 
overlaps=0
[2018-06-27 14:16:03.675936] W [MSGID: 109005] 
[dht-selfheal.c:2303:dht_selfheal_directory] 0-GDATA-dht: Directory selfheal 
failed: 1 subvolumes down.Not fixing. path = /slurmstate.old/slurm, gfid = 
8ed6a9e9-2820-40bd-8d9d-77b7f79c774
8
[2018-06-27 14:16:03.679061] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-GDATA-replicate-2: 
performing entry selfheal on 8ed6a9e9-2820-40bd-8d9d-77b7f79c7748
[2018-06-27 14:16:03.681899] W [MSGID: 108015] 
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-GDATA-replicate-2: 
expunging file 8ed6a9e9-2820-40bd-8d9d-77b7f79c7748/heartbeat 
(----) on GDATA-cli
ent-6
[2018-06-27 14:16:03.683080] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-GDATA-client-4: remote 
operation failed. Path: /slurmstate.old/slurm/qos_usage 
(848b3d5e-3492-4343-a1b2-a86cc975b3c2) [No data available
]
[2018-06-27 14:16:03.683624] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-GDATA-client-4: remote 
operation failed. Path: (null) (000

[Gluster-users] clean up of unclean files

2018-06-13 Thread Brian Andrus

All,

I have a 5x3 Distributed-Replicate filesystem that has a few entries 
that do not clean up when being healed.


I had tracked down what they were and since they were really just 
temp/expendable files, I moved the directory and recreated what was needed.


Now those files in the recreated directory cannot be deleted and they 
show up in the gluster volume heal  info output and never go away.


Examples below:

*/Brick brick5.internal:/GLUSTER/brick1/**/
/**/ - Is in split-brain/**/
/**/
/**/Status: Connected/**/
/**/Number of entries: 1/**/
/**/
/**/Brick brick6.internal:/GLUSTER/brick1/**/
/**//resv_state/**/
/**/ - Is in split-brain/**/
/**/
/**//node_state/**/
/**//job_state.old/**/
/**//node_state.old/**/
/**/Status: Connected/**/
/**/Number of entries: 5/**/
/*


So, how do I clean those up so they aren't showing up anywhere at all?

Brian Andrus

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] gluster volume create failed: Host is not in 'Peer in Cluster' state

2018-05-22 Thread Brian Andrus

All,

Running glusterfs-4.0.2-1 on CentOS 7.5.1804

I have 10 servers running in a pool. All show as connected when I do 
gluster peer status and gluster pool list.


There is 1 volume running that is distributed on servers 1-5.

I try using a brick in server7 and it always gives me:
/volume create: GDATA: failed: Host server7 is not in 'Peer in Cluster' 
state/


Now that is even ON server7 with:
/gluster volume create GDATA transport tcp server7:/GLUSTER/brick1/

I have detached and re-probed the server. It seems all happy, but it 
will NOT allow any sort of volume to be created on it.


Any ideas out there?

Brian Andrus

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] replicate a distributed volume

2018-05-22 Thread Brian Andrus

All,

With Gluster 4.0.2, is it possible to take an existing distributed 
volume and turn it into a distributed-replicate by adding servers/bricks?


It seems this should be possible, but I don't know that anything has 
been done to get it there.


Brian Andrus


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Upgrade OS on Server node

2018-05-01 Thread Brian Andrus

All,

I have a Distributed-Replicate volume served by 10 servers (Number of 
Bricks: 5 x 2 = 10). They are currently running CentOS 6 and I want to 
upgrade them to CentOS 7.


I know there are several ways I could go about it, but I was wondering 
if there is a best-practice that alleviates down/rebuild time.


All the best,

Brian Andrus


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users