Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Okay, so for all files and dirs, node 2 seems to be the bad copy. Try the following:

1. On both node 1 and node 3, set the afr xattr for dir10:
   setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001 /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10
2. FUSE-mount the volume temporarily in some location and, from that mount point, run `find . | xargs stat > /dev/null`.
3. Run `gluster volume heal $volname`.

HTH,
Ravi

On 11/16/2018 09:07 PM, mabi wrote:
> And finally, here is the getfattr output for both files from the 3 nodes:
>
> FILE 1: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey
>
> NODE 1:
> trusted.afr.dirty=0x
> trusted.afr.myvol-pro-client-1=0x00020001
> trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
>
> NODE 2:
> trusted.afr.dirty=0x
> trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
> trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579
>
> NODE 3:
> trusted.afr.dirty=0x
> trusted.afr.myvol-pro-client-1=0x00020001
> trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
>
> FILE 2: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey
>
> NODE 1:
> trusted.afr.dirty=0x
> trusted.afr.myvol-pro-client-1=0x00020001
> trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579
>
> NODE 2:
> trusted.afr.dirty=0x
> trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
> trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579
>
> NODE 3:
> trusted.afr.dirty=0x
> trusted.afr.myvol-pro-client-1=0x00020001
> trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
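Step 2 above works because a stat() through a client mount forces a lookup on every entry, which lets AFR notice the mismatching copies and queue them for self-heal. A hedged sketch of that sweep follows; the mount point is a placeholder and the demo tree only stands in for the real volume contents. Using `find -print0` with `xargs -0` keeps file names containing spaces or newlines from breaking the pipeline:

```shell
# Placeholder mount point -- substitute wherever the volume was FUSE-mounted.
MNT=${MNT:-/tmp/demo-mount}
mkdir -p "$MNT/dir with space"        # demo tree; a real mount is already populated
touch "$MNT/dir with space/fileKey"
# stat every entry; the resulting lookups let self-heal notice bad copies
find "$MNT" -print0 | xargs -0 stat > /dev/null && echo "lookup sweep done"
```

On the real mount the same pattern is simply `cd /the/mount && find . -print0 | xargs -0 stat > /dev/null`, after which `gluster volume heal $volname` can process the queued entries.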
[Gluster-users] Gluster snapshot & geo-replication
Hi all,

I am using CentOS 7 and Gluster version 4.1.3. I am using thin LVM and create snapshots once a day, of course deleting the oldest ones after a while. Creating a snap fails every now and then with the following different errors (the server name is a different host in the gluster cluster each time):

Error : Request timed out or failed: Brick ops failed on urd-gds-002.
changelog notify failed

I have discovered that the log for snaps grows large, endlessly? The log /var/log/glusterfs/snaps/urd-gds-volume/snapd.log is now 21G in size and continues to grow. I removed the file about 2 weeks ago and it was about the same size then. Is this the way it should be? See a part of the log below.

Second of all, I have stopped geo-replication as I never managed to make it work. Even when it is stopped and you try to pause geo-replication, you still get the response: "Geo-replication paused successfully." Should there be an error instead? Resuming gives an error:

geo-replication command failed
Geo-replication session between urd-gds-volume and geouser@urd-gds-geo-001::urd-gds-volume is not Paused.

This is related to bug 1547446 (https://bugzilla.redhat.com/show_bug.cgi?id=1547446); the fix should be present from 4.0 onwards. Should I report this in the same bug?

Thanks a lot!
Best regards
Marcus Pedersén

/var/log/glusterfs/snaps/urd-gds-volume/snapd.log:

[2018-11-13 18:51:16.498206] E [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: Invalid argument
[2018-11-13 18:51:16.498752] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.502120] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.589263] I [addr.c:55:compare_addr_and_update] 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.118"
[2018-11-13 18:51:16.589324] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted client from iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735 (version: 3.13.1)
[2018-11-13 18:51:16.593003] E [server-handshake.c:385:server_first_lookup] 0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:51:16.593177] E [server-handshake.c:342:do_path_lookup] 0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission denied
[2018-11-13 18:51:16.593206] E [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first lookup on subdir (/interbull/home) failed: Invalid argument
[2018-11-13 18:51:16.593678] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
[2018-11-13 18:51:16.597201] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735

[root@urd-gds-001 ~]# tail -n 100 /var/log/glusterfs/snaps/urd-gds-volume/snapd.log
[2018-11-13 18:52:09.782058] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.785473] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.821147] I [addr.c:55:compare_addr_and_update] 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.115"
[2018-11-13 18:52:09.821233] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted client from iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666 (version: 3.13.1)
[2018-11-13 18:52:09.825173] E [server-handshake.c:385:server_first_lookup] 0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:52:09.825397] E [server-handshake.c:342:do_path_lookup] 0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission denied
[2018-11-13 18:52:09.825450] E [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: Invalid argument
[2018-11-13 18:52:09.825917] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666
[2018-11-13 18:52:09.829403] I [MSGID:
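A 21G snapd.log that regrows within two weeks suggests nothing is rotating it. Independent of finding the root cause of the repeated handshake errors, disk usage can at least be capped with logrotate; the stanza below is only a generic sketch (the glob, retention, and use of copytruncate are my assumptions, not an official Gluster recommendation):

```
/var/log/glusterfs/snaps/*/snapd.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    # snapd may keep the log open, so truncate in place instead of moving it
    copytruncate
}
```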
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐
On Friday, November 16, 2018 5:14 AM, Ravishankar N wrote:

> Okay, as asked in the previous mail, please share the getfattr output
> from all bricks for these 2 files. I think once we have this, we can try
> either 'adjusting' the gfid and symlinks on node 2 for dir11 and
> oc_dir, or see if we can set afr xattrs on dir10 for self-heal to purge
> everything under it on node 2 and recreate it using the other 2 nodes.

And finally, here is the getfattr output for both files from the 3 nodes:

FILE 1: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579

FILE 2: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

Thanks again in advance for your answer.
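The trusted.gfid2path.* values above are hex-encoded "<parent-dir-gfid>/<basename>" strings, so decoding them shows at a glance that node 2's copy hangs off a different parent directory than the copies on nodes 1 and 3. A small sketch using xxd (assuming it is installed) on FILE 1's value from node 1:

```shell
# Hex payload of trusted.gfid2path.9a863b050c1975ed on NODE 1 (FILE 1).
hex=32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
# xxd -r -p turns the plain hex dump back into raw bytes: "<parent-gfid>/<basename>"
echo "$hex" | xxd -r -p; echo
# prints 25e2616b-4fb6-4b2a-8945-1afc956f19/fileKey
```

Decoding node 2's value the same way yields a different parent gfid, consistent with node 2 holding the bad copy.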
Re: [Gluster-users] Is it recommended for Glustereventsd be running on all nodes?
Hi,

I get alerts from all nodes for PEER_DISCONNECT, i.e., the webhook config is reflected on all nodes. How can this be avoided if we do run glustereventsd on all nodes?

On Thu, Nov 15, 2018, 7:42 PM Jeevan Patnaik wrote:

> Hi,
>
> And the Gluster version is 3.12.5.
>
> Regards,
> Jeevan.
>
> On Thu, Nov 15, 2018, 7:40 PM Jeevan Patnaik wrote:
>
>> Hi All,
>>
>> I have implemented a webhook and attached it to glustereventsd to listen to
>> events and send alerts on critical events.
>>
>> So, I categorized events manually as critical, informational and warning.
>>
>> We are interested only in events that can cause issues for end users, like
>> BRICK_DISCONNECTED (reducing redundancy of the volume), QUORUM_LOST
>> (possible downtime of a subvolume), QUOTA_CROSSES_SOFTLIMIT, AFR_SPLIT_BRAIN,
>> etc., and have not included any other events that result from someone
>> doing admin tasks, like PEER_ATTACH.
>>
>> And I see that at least some events, like PEER_ATTACH, are local to the node
>> and don't appear from other gluster nodes.
>>
>> My idea is to run the glustereventsd service only on a gluster admin node, to
>> avoid possible load on the storage-serving nodes due to traffic caused by
>> webhook events.
>>
>> So, my question is: are there any events local to a node that would be
>> missed on the admin node but are fatal to end users, assuming that the admin
>> node will always be running?
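The manual critical/informational split described above can also live in the webhook itself, so that even if every node posts events, only the user-impacting ones raise alerts. A hedged shell sketch, with the event list taken from this mail; classify_event is a hypothetical helper, not part of glustereventsd:

```shell
# Hypothetical helper: map a glustereventsd event name to an alert level.
classify_event() {
  case "$1" in
    BRICK_DISCONNECTED|QUORUM_LOST|AFR_SPLIT_BRAIN|QUOTA_CROSSES_SOFTLIMIT)
      echo critical ;;      # user-impacting: redundancy, availability, data
    PEER_ATTACH|PEER_DETACH|PEER_CONNECT|PEER_DISCONNECT)
      echo info ;;          # admin-task noise, reported by every node
    *)
      echo unclassified ;;
  esac
}

classify_event QUORUM_LOST       # -> critical
classify_event PEER_DISCONNECT   # -> info
```

With a filter like this in the webhook receiver, glustereventsd can stay enabled on all nodes (so node-local events are not missed) while PEER_* chatter from every peer is logged or deduplicated instead of paging anyone.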