Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-16 Thread Ravishankar N
Okay so for all files and dirs, node 2 seems to be the bad copy. Try the 
following:


1. On both node 1 and node 3, set the afr xattr for dir10:
setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001 
/data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10


2. Fuse mount the volume temporarily in some location and, from that 
mount point, run `find . | xargs stat >/dev/null`


3. Run `gluster volume heal $volname`
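
Strung together, the three steps look roughly like the sketch below. The
volume name, mount point and the use of localhost as the mount server are my
assumptions, so adjust them to your setup; the setfattr value is exactly the
one from step 1:

VOLNAME=myvol-private   # assumption: guessed from the brick path above
MNT=/mnt/healtmp        # any unused directory

# Step 1 -- on node 1 and node 3 only (not on node 2), against the brick path:
setfattr -n trusted.afr.myvol-pro-client-1 -v <value from step 1> \
  /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10

# Step 2 -- temporary fuse mount plus a full stat to trigger lookups:
mkdir -p "$MNT"
mount -t glusterfs localhost:/"$VOLNAME" "$MNT"
(cd "$MNT" && find . | xargs stat >/dev/null)

# Step 3 -- kick off the self-heal daemon:
gluster volume heal "$VOLNAME"

# Clean up the temporary mount afterwards:
umount "$MNT"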

HTH,
Ravi

On 11/16/2018 09:07 PM, mabi wrote:

And finally here is the output of a getfattr from both files from the 3 nodes:

FILE 1: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579


FILE 2: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Gluster snapshot & geo-replication

2018-11-16 Thread Marcus Pedersén
Hi all,

I am using CentOS 7 and Gluster version 4.1.3


I am using thin LVM and create snapshots once a day, of course deleting the 
oldest ones after a while.

Creating a snap fails every now and then, with one of the following errors:

Error : Request timed out

or

failed: Brick ops failed on urd-gds-002. changelog notify failed

(The server name in the error is a different host in the gluster cluster each time.)


I have discovered that the log for snaps grows large, seemingly endlessly.

The log:

/var/log/glusterfs/snaps/urd-gds-volume/snapd.log

It is now 21G in size and continues to grow.

I removed the file about 2 weeks ago, and it was already about the same size then.

Is this the way it should be?

See a part of the log below.
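
Until the cause of those failing lookups is found, one possible stop-gap for
the disk usage is a logrotate rule with copytruncate, so the file is trimmed
without restarting snapd. This is only a sketch: the drop-in name, size limit
and rotation count are arbitrary choices of mine, and it is only needed if the
stock glusterfs logrotate configuration does not already cover this path:

cat > /etc/logrotate.d/glusterfs-snapd <<'EOF'
/var/log/glusterfs/snaps/*/snapd.log {
    size 100M
    rotate 4
    compress
    missingok
    copytruncate
}
EOF

# dry-run to verify the new rule is parsed
logrotate -d /etc/logrotate.d/glusterfs-snapd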




Secondly, I have stopped the geo-replication, as I never managed to make it 
work.

Even though it is stopped, if you try to pause geo-replication you still get 
the response:

Geo-replication paused successfully.

Should there be an error instead?


Resuming gives an error:

geo-replication command failed
Geo-replication session between urd-gds-volume and 
geouser@urd-gds-geo-001::urd-gds-volume is not Paused.
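
A defensive check before pausing or resuming can at least avoid acting on a
stopped session. A sketch using the session names from this message; the grep
on the status column is just my illustration:

# show the current state of the session
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume status

# only pause when the session is actually running
if gluster volume geo-replication urd-gds-volume \
       geouser@urd-gds-geo-001::urd-gds-volume status | grep -Eq 'Active|Passive'
then
    gluster volume geo-replication urd-gds-volume \
        geouser@urd-gds-geo-001::urd-gds-volume pause
fi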


This is related to bug 1547446

https://bugzilla.redhat.com/show_bug.cgi?id=1547446

The fix should be present from 4.0 onwards.

Should I report this in the same bug?


Thanks a lot!


Best regards

Marcus Pedersén


/var/log/glusterfs/snaps/urd-gds-volume/snapd.log:

[2018-11-13 18:51:16.498206] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: 
Invalid argument
[2018-11-13 18:51:16.498752] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.502120] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.589263] I [addr.c:55:compare_addr_and_update] 
0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.118"
[2018-11-13 18:51:16.589324] I [MSGID: 115029] 
[server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted 
client from 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
 (version: 3.13.1)
[2018-11-13 18:51:16.593003] E [server-handshake.c:385:server_first_lookup] 
0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:51:16.593177] E [server-handshake.c:342:do_path_lookup] 
0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission 
denied
[2018-11-13 18:51:16.593206] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/home) failed: 
Invalid argument
[2018-11-13 18:51:16.593678] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
[2018-11-13 18:51:16.597201] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
[root@urd-gds-001 ~]# tail -n 100 
/var/log/glusterfs/snaps/urd-gds-volume/snapd.log
[2018-11-13 18:52:09.782058] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.785473] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.821147] I [addr.c:55:compare_addr_and_update] 
0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.115"
[2018-11-13 18:52:09.821233] I [MSGID: 115029] 
[server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted 
client from 
iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666
 (version: 3.13.1)
[2018-11-13 18:52:09.825173] E [server-handshake.c:385:server_first_lookup] 
0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:52:09.825397] E [server-handshake.c:342:do_path_lookup] 
0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission 
denied
[2018-11-13 18:52:09.825450] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: 
Invalid argument
[2018-11-13 18:52:09.825917] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666
[2018-11-13 18:52:09.829403] I [MSGID: 

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-16 Thread mabi
‐‐‐ Original Message ‐‐‐
On Friday, November 16, 2018 5:14 AM, Ravishankar N wrote:

> Okay, as asked in the previous mail, please share the getfattr output
> from all bricks for these 2 files. I think once we have this, we can try
> either 'adjusting' the gfid and symlinks on node 2 for dir11 and
> oc_dir or see if we can set afr xattrs on dir10 for self-heal to purge
> everything under it on node 2 and recreate it using the other 2 nodes.

And finally here is the output of a getfattr from both files from the 3 nodes:

FILE 1: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579


FILE 2: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579
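
For reference, output of this shape comes from running getfattr directly
against the file's path on each brick, along the lines of the command below
(the brick prefix is inferred from the setfattr example earlier in the thread
and may differ on your nodes):

getfattr -d -m . -e hex \
  /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey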

Thanks again in advance for your answer.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Is it recommended for Glustereventsd be running on all nodes?

2018-11-16 Thread Jeevan Patnaik
Hi,

I get alerts from all nodes for PEER_DISCONNECT, i.e. the webhook config is
reflected on all nodes.

How can I avoid this if we do run glustereventsd on all nodes?
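
For what it is worth: webhooks registered with gluster-eventsapi are synced to
all peers, and an event like PEER_DISCONNECT appears to be raised by every
node that observes the disconnect, so receiving it from several nodes is
expected. The options seem to be either de-duplicating on the receiver side or
running the daemon on fewer nodes; a rough sketch of the latter, with the
trade-off noted in the comments:

# see which peers run glustereventsd and which webhooks are registered
gluster-eventsapi status

# on the nodes that should not post events (trade-off: events generated
# only on such a node, e.g. for its own bricks, will then be missed)
systemctl stop glustereventsd
systemctl disable glustereventsd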

On Thu, Nov 15, 2018, 7:42 PM Jeevan Patnaik wrote:
> Hi,
>
> And Gluster version is 3.12.5.
>
> Regards,
> Jeevan.
>
> On Thu, Nov 15, 2018, 7:40 PM Jeevan Patnaik wrote:
>> Hi All,
>>
>> I have implemented a webhook and attached to glustereventsd to listen to
>> events and to send alerts on critical events.
>>
>> So, I categorized events manually into critical, informational and warning.
>>
>> We are only interested in events that can cause issues for end users, like
>> BRICK_DISCONNECTED (reducing redundancy of the volume), QUORUM_LOST
>> (possible downtime of a subvolume), QUOTA_CROSSES_SOFTLIMIT, AFR_SPLIT_BRAIN,
>> etc., and have not included other events that result while someone does
>> admin tasks like PEER_ATTACH.
>>
>> And I see that at least some events, like PEER_ATTACH, are local to the node
>> and do not appear from the other gluster nodes.
>>
>> My idea is to run the glustereventsd service only on a gluster admin node, to
>> avoid possible load on the storage serving nodes due to traffic caused by
>> webhook events.
>>
>> So, my question is: are there any events local to a node that would be
>> missed on the admin node but are fatal to end users, assuming that the
>> admin node will always be running?
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users