Re: [Gluster-users] remove-brick failure on distributed with 5.6

2019-05-24 Thread Nithya Balachandran
Hi Brandon,

Please send the following:

1. the gluster volume info
2. Information about which brick was removed
3. The rebalance log file for all nodes hosting removed bricks.
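For example (a sketch; the volume name volbackups is taken from the
rebalance log filename below, and the brick path is a placeholder):

# gluster volume info volbackups
# gluster volume remove-brick volbackups <hostname>:<brick-path> status
# grep " E \[" /var/log/glusterfs/volbackups-rebalance.log

The E-severity lines around the failure, together with the full
rebalance log from each node hosting a removed brick, are the most
useful parts.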

Regards,
Nithya


On Fri, 24 May 2019 at 19:33, Ravishankar N  wrote:

> Adding a few DHT folks for some possible suggestions.
>
> -Ravi
> On 23/05/19 11:15 PM, bran...@thinkhuge.net wrote:
>
> Does anyone know what should be done when a glusterfs v5.6 "gluster volume
> remove-brick" operation fails?  I'm trying to remove 1 of 8 smaller
> distributed nodes so it can be replaced with a larger node.
>
>
>
> The "gluster volume remove-brick ... status" command reports status failed
> and failures = "3"
>
>
>
> cat /var/log/glusterfs/volbackups-rebalance.log
>
> ...
>
> [2019-05-23 16:43:37.442283] I [MSGID: 109028]
> [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is
> failed. Time taken is 545.00 secs
>
>
>
> All servers are confirmed to be communicating properly, are updated, and were
> freshly rebooted. I have retried the remove-brick a few times, and it fails
> each time.
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] remove-brick failure on distributed with 5.6

2019-05-24 Thread Ravishankar N

Adding a few DHT folks for some possible suggestions.

-Ravi

On 23/05/19 11:15 PM, bran...@thinkhuge.net wrote:


Does anyone know what should be done when a glusterfs v5.6 "gluster 
volume remove-brick" operation fails?  I'm trying to remove 1 of 8 
smaller distributed nodes so it can be replaced with a larger node.


The "gluster volume remove-brick ... status" command reports status 
failed and failures = "3"


cat /var/log/glusterfs/volbackups-rebalance.log

...

[2019-05-23 16:43:37.442283] I [MSGID: 109028] 
[dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: 
Rebalance is failed. Time taken is 545.00 secs


All servers are confirmed to be communicating properly, are updated, and 
were freshly rebooted. I have retried the remove-brick a few times, and 
it fails each time.



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-24 Thread Ravishankar N


On 23/05/19 2:40 AM, Alan Orth wrote:

Dear list,

I seem to have gotten into a tricky situation. Today I brought up a 
shiny new server with new disk arrays and attempted to replace one 
brick of a replica 2 distribute/replicate volume on an older server 
using the `replace-brick` command:


# gluster volume replace-brick homes wingu0:/mnt/gluster/homes 
wingu06:/data/glusterfs/sdb/homes commit force


The command was successful and I see the new brick in the output of 
`gluster volume info`. The problem is that Gluster doesn't seem to be 
migrating the data,


`replace-brick` heals (rather than migrates) the data. In your 
case, the data should have been healed from Brick-4 onto the replaced 
Brick-3. Are there any errors in the self-heal daemon logs on Brick-4's 
node? Does Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a 
bit out of date; the replace-brick command internally performs all the 
setfattr steps mentioned there.
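For instance, a quick way to check both of those things (a sketch; per 
your volume info, Brick-4 is wingu05:/data/glusterfs/sdb/homes, and the 
self-heal daemon log is usually /var/log/glusterfs/glustershd.log):

# getfattr -d -m . -e hex /data/glusterfs/sdb/homes       (on wingu05)
# grep -E " [EW] \[" /var/log/glusterfs/glustershd.log    (on wingu05)

Any non-zero trusted.afr.homes-client-* value in the getfattr output 
would indicate pending heals blaming the corresponding brick.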


-Ravi


and now the original brick that I replaced is no longer part of the 
volume (and a few terabytes of data are just sitting on the old brick):


# gluster volume info homes | grep -E "Brick[0-9]:"
Brick1: wingu4:/mnt/gluster/homes
Brick2: wingu3:/mnt/gluster/homes
Brick3: wingu06:/data/glusterfs/sdb/homes
Brick4: wingu05:/data/glusterfs/sdb/homes
Brick5: wingu05:/data/glusterfs/sdc/homes
Brick6: wingu06:/data/glusterfs/sdc/homes

I see the Gluster docs have a more complicated procedure for replacing 
bricks that involves getfattr/setfattr¹. How can I tell Gluster about 
the old brick? I see that I have a backup of the old volfile thanks to 
yum's rpmsave function if that helps.


We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can 
give.


¹ 
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick


--
Alan Orth
alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] add-brick: failed: Commit failed

2019-05-24 Thread Ravishankar N

Hi David,

On 23/05/19 3:54 AM, David Cunningham wrote:

Hi Ravi,

Please see the log attached.
When I grep -E "Connected to |disconnected from" 
gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1". 
It looks like this temporary mount is not able to connect to the 2nd 
brick, which is why the lookup is failing due to lack of quorum.
The output of "gluster volume status" is as follows. Should there be 
something listening on gfs3? I'm not sure whether its TCP Port and Pid 
showing as N/A is a symptom or a cause. Thank you.


# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A


Can you see if the following steps help?

1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 
0x00010001 /nodirectwritedata/gluster/gvol0` on *both* 
gfs1 and gfs2.


2. 'gluster volume start gvol0 force`

3. Check if Brick-3 now comes online with a valid TCP port and PID. If 
it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 
to see why.
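Putting steps 2 and 3 together (a sketch; the brick log filename on gfs3 
is assumed to follow the usual convention of the brick path with slashes 
replaced by dashes):

# gluster volume start gvol0 force
# gluster volume status gvol0
# less /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log    (on gfs3)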


Thanks,

Ravi



Self-heal Daemon on localhost               N/A       N/A        Y       19853
Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
Self-heal Daemon on gfs2                    N/A       N/A        Y       17614

Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks


On Wed, 22 May 2019 at 18:06, Ravishankar N wrote:


If you are trying this again, please run `gluster volume set $volname
client-log-level DEBUG` before attempting the add-brick and attach
the gvol0-add-brick-mount.log here. After that, you can change the
client-log-level back to INFO.
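Concretely, that sequence might look like this (a sketch, using the full
diagnostics.client-log-level option key and the add-brick command from
earlier in this thread):

# gluster volume set gvol0 diagnostics.client-log-level DEBUG
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# gluster volume set gvol0 diagnostics.client-log-level INFO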

-Ravi

On 22/05/19 11:32 AM, Ravishankar N wrote:



On 22/05/19 11:23 AM, David Cunningham wrote:

Hi Ravi,

I'd already done exactly that before, where step 3 was a simple
'rm -rf /nodirectwritedata/gluster/gvol0'. Do you have another
suggestion for what the cleanup or reformat should be?

`rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me,
David. Basically, '/nodirectwritedata/gluster/gvol0' must be
empty and must not have any extended attributes set on it. Why
fuse_first_lookup() is failing is a bit of a mystery to me at
this point. :-(
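One way to double-check that the directory really is clean before
the next attempt (a sketch, run on gfs3):

# ls -la /nodirectwritedata/gluster/gvol0
# getfattr -d -m . -e hex /nodirectwritedata/gluster/gvol0

The directory should be empty and the getfattr command should print
nothing (no trusted.* attributes).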
Regards,
Ravi


Thank you.


On Wed, 22 May 2019 at 13:56, Ravishankar N
<ravishan...@redhat.com> wrote:

Hmm, so the volume info seems to indicate that the add-brick
was successful but the gfid xattr is missing on the new
brick (as are the actual files, barring the .glusterfs
folder, according to your previous mail).

Do you want to try removing and adding it again?

1. `gluster volume remove-brick gvol0 replica 2
gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1

2. Check that `gluster volume info` now shows a 1x2 volume on all
nodes and that `gluster peer status` shows all peers as connected.

3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on
gfs3.

4. `gluster volume add-brick gvol0 replica 3 arbiter 1
gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.

5. Check that the files are getting healed on to the new brick.
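For step 5, heal progress can be watched with something like (a sketch):

# gluster volume heal gvol0 info
# gluster volume heal gvol0 statistics heal-count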

Thanks,
Ravi
On 22/05/19 6:50 AM, David Cunningham wrote:

Hi Ravi,

Certainly. On the existing two nodes:

gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x
trusted.afr.gvol0-client-2=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x
trusted.afr.gvol0-client-0=0x
trusted.afr.gvol0-client-2=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

On the new node:

gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names