Re: [Gluster-users] gfid entries failing to heal

2022-03-25 Thread Collin Strassburger
Thank you for the suggestion!

I mounted the volume on a new mount point, retrieved the gfid of another file,
and was able to obtain the file information through Method 2.

However, when I use the gfids listed in heal info, I get a “transport endpoint is
not connected” error, despite the bricks showing as “Connected” in heal info (and
volume status).
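
For anyone else hitting this, a rough sketch of resolving one of those gfids directly on a brick
instead of through the client mount (the brick path below is the one from this volume; the gfid is
a placeholder, and this only works for regular files, since directories are symlinks under .glusterfs):

# run on hydra1/hydra2; split the gfid as <first 2 chars>/<next 2 chars>/<full gfid>
find /data/glusterfs/PBS/NonMountDir \
  -samefile /data/glusterfs/PBS/NonMountDir/.glusterfs/<aa>/<bb>/<full-gfid> \
  -not -path "*/.glusterfs/*"

A hard-link count of 1 on that .glusterfs entry (stat) also hints that the named file no longer
exists on that brick.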

Thanks,
Collin

From: Strahil Nikolov 
Sent: Friday, March 25, 2022 1:47 PM
To: Collin Strassburger ; gluster-users@gluster.org
Subject: Re: [Gluster-users] gfid entries failing to heal


To find the path for a gfid, you can use
https://docs.gluster.org/en/main/Troubleshooting/gfid-to-path/

Usually, I prefer to mount and then use Method 2 to retrieve the path.

Then, you can getfattr the file/dir to get a clue.
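
For reference, Method 2 from that page boils down to roughly the following; the mount point name is
arbitrary and the gfid is a placeholder, while the host/volume are the ones from this thread:

# mount with aux-gfid-mount so gfids become addressable under a virtual .gfid/ directory
mount -t glusterfs -o aux-gfid-mount hydra1:/hydra_pbs_vol /mnt/gfid-resolve
# ask the client to translate a gfid into per-brick paths
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/gfid-resolve/.gfid/<gfid-from-heal-info>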

Best Regards,
Strahil Nikolov
On Fri, Mar 25, 2022 at 18:51, Collin Strassburger
<cstrassbur...@bihrle.com> wrote:

Hello,

I am having a problem with a replica 3 volume.

When I run: gluster volume heal hydra_pbs_vol info
It returns:
Brick hydra1:/data/glusterfs/PBS/NonMountDir

Status: Connected
Number of entries: 1

Brick hydra2:/data/glusterfs/PBS/NonMountDir




Status: Connected
Number of entries: 4

Brick viz1:/data/glusterfs/PBS/NonMountDir
Status: Connected
Number of entries: 0

The items have been present for some time and do not appear to be healing.

As shown above, the items are not labeled as split-brain and they do not have 
path information to do a manual delete-and-heal.



~~
Content of /var/log/glusterfs/glfsheal-hydra_pbs_vol.log is attached
~~
Info:
gluster volume info hydra_pbs_vol
Volume Name: hydra_pbs_vol
Type: Replicate
Volume ID: efb30804-1c08-4ef6-a579-a2f77d5049e0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: hydra1:/data/glusterfs/PBS/NonMountDir
Brick2: hydra2:/data/glusterfs/PBS/NonMountDir
Brick3: viz1:/data/glusterfs/PBS/NonMountDir
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
features.bitrot: on
features.scrub: Active

~~
Status:
Status of volume: hydra_pbs_vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hydra1:/data/glusterfs/PBS/NonMountDi
r                                           49152     0          Y       1518
Brick hydra2:/data/glusterfs/PBS/NonMountDi
r                                           49152     0          Y       1438
Brick viz1:/data/glusterfs/PBS/NonMountDir  49152     0          Y       2991
Self-heal Daemon on localhost               N/A       N/A        Y       1942
Bitrot Daemon on localhost                  N/A       N/A        Y       1563
Scrubber Daemon on localhost                N/A       N/A        Y       1738
Self-heal Daemon on viz1                    N/A       N/A        Y       3491
Bitrot Daemon on viz1                       N/A       N/A        Y       3203
Scrubber Daemon on viz1                     N/A       N/A        Y       3261
Self-heal Daemon on hydra2                  N/A       N/A        Y       1843
Bitrot Daemon on hydra2                     N/A       N/A        Y       1475
Scrubber Daemon on hydra2                   N/A       N/A        Y       1651

Task Status of Volume hydra_pbs_vol
------------------------------------------------------------------------------
There are no active volume tasks
~~

How can I resolve these entries/issues?

Thanks,
Collin Strassburger (he/him)




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gfid entries failing to heal

2022-03-25 Thread Collin Strassburger
Hello,

I am having a problem with a replica 3 volume.

When I run: gluster volume heal hydra_pbs_vol info
It returns:
Brick hydra1:/data/glusterfs/PBS/NonMountDir

Status: Connected
Number of entries: 1

Brick hydra2:/data/glusterfs/PBS/NonMountDir




Status: Connected
Number of entries: 4

Brick viz1:/data/glusterfs/PBS/NonMountDir
Status: Connected
Number of entries: 0

The items have been present for some time and do not appear to be healing.
As shown above, the items are not labeled as split-brain and they do not have 
path information to do a manual delete-and-heal.
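
(For context, one way to get a clue about such entries even without a path, sketched here with a
placeholder gfid, is to dump the xattrs of the gfid hard link under .glusterfs on each brick and
compare the trusted.afr.* pending counters:

getfattr -d -m . -e hex /data/glusterfs/PBS/NonMountDir/.glusterfs/<aa>/<bb>/<full-gfid>
# non-zero trusted.afr.hydra_pbs_vol-client-* values mean heals are still pending against that brick
)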

~~
Content of /var/log/glusterfs/glfsheal-hydra_pbs_vol.log is attached
~~
Info:
gluster volume info hydra_pbs_vol
Volume Name: hydra_pbs_vol
Type: Replicate
Volume ID: efb30804-1c08-4ef6-a579-a2f77d5049e0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: hydra1:/data/glusterfs/PBS/NonMountDir
Brick2: hydra2:/data/glusterfs/PBS/NonMountDir
Brick3: viz1:/data/glusterfs/PBS/NonMountDir
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
features.bitrot: on
features.scrub: Active

~~
Status:
Status of volume: hydra_pbs_vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hydra1:/data/glusterfs/PBS/NonMountDi
r                                           49152     0          Y       1518
Brick hydra2:/data/glusterfs/PBS/NonMountDi
r                                           49152     0          Y       1438
Brick viz1:/data/glusterfs/PBS/NonMountDir  49152     0          Y       2991
Self-heal Daemon on localhost               N/A       N/A        Y       1942
Bitrot Daemon on localhost                  N/A       N/A        Y       1563
Scrubber Daemon on localhost                N/A       N/A        Y       1738
Self-heal Daemon on viz1                    N/A       N/A        Y       3491
Bitrot Daemon on viz1                       N/A       N/A        Y       3203
Scrubber Daemon on viz1                     N/A       N/A        Y       3261
Self-heal Daemon on hydra2                  N/A       N/A        Y       1843
Bitrot Daemon on hydra2                     N/A       N/A        Y       1475
Scrubber Daemon on hydra2                   N/A       N/A        Y       1651

Task Status of Volume hydra_pbs_vol
------------------------------------------------------------------------------
There are no active volume tasks
~~


How can I resolve these entries/issues?


Thanks,
Collin Strassburger (he/him)


glfsheal_Excerpt.log
Description: glfsheal_Excerpt.log




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"

2022-03-25 Thread Olaf Buitelaar
Hi Peter,

I see your raid array is rebuilding; could it be that your xfs needs a repair,
using xfs_repair?
Did you try running gluster v start hdd force?
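
Roughly, those two suggestions amount to something like this (device and mount point are placeholders
guessed from the brick path storage2:/data/glusterfs/hdd/brick3/brick; xfs_repair must run on an
unmounted filesystem):

# on storage2, with the brick process for brick3 already down
umount /data/glusterfs/hdd/brick3          # hypothetical mount point of the brick filesystem
xfs_repair /dev/mapper/hdd_brick3          # hypothetical block device backing that brick
mount /data/glusterfs/hdd/brick3
gluster volume start hdd force             # restarts any brick processes that are not running
gluster volume status hdd                  # the brick should now show Online = Y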

Kind regards,

Olaf


On Thu, 24 Mar 2022 at 15:54, Peter Schmidt <
peterschmidt18...@yandex.com> wrote:

> Hello everyone,
>
> I'm running an oVirt cluster on top of a distributed-replicate gluster
> volume and one of the bricks cannot be mounted anymore from my oVirt hosts.
> This morning I also noticed a stack trace and a spike in TCP connections on
> one of the three gluster nodes (storage2), which I have attached at the end
> of this mail. Only this particular brick on storage2 seems to be causing
> trouble:
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
>
> I don't know what's causing this or how to resolve this issue. I would
> appreciate it if someone could take a look at my logs and point me in the
> right direction. If any additional logs are required, please let me know.
> Thank you in advance!
>
> Operating system on all hosts: CentOS 7.9.2009
> oVirt version: 4.3.10.4-1
> Gluster versions:
> - storage1: 6.10-1
> - storage2: 6.7-1
> - storage3: 6.7-1
>
> 
> # brick is not connected/mounted on the oVirt hosts
>
> [xlator.protocol.client.hdd-client-7.priv]
> fd.0.remote_fd = -1
> -- = --
> granted-posix-lock[0] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
> l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
> l_start = 101, l_len = 1
> -- = --
> connected = 0
> total_bytes_read = 11383136800
> ping_timeout = 10
> total_bytes_written = 16699851552
> ping_msgs_sent = 1
> msgs_sent = 2
>
> 
> # mount log from one of the oVirt hosts
> # the IP 172.22.102.142 corresponds to my gluster node "storage2"
> # the port 49154 corresponds to the brick
> storage2:/data/glusterfs/hdd/brick3/brick
>
> [2022-03-24 10:59:28.138178] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk]
> 0-hdd-client-7: socket disconnected
> [2022-03-24 10:59:38.142698] I [rpc-clnt.c:2028:rpc_clnt_reconfig]
> 0-hdd-client-7: changing port to 49154 (from 0)
> The message "I [MSGID: 114018] [client.c:2331:client_rpc_notify]
> 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep
> trying to connect to glusterd until brick's port is available" repeated 4
> times between [2022-03-24 10:58:04.114741] and [2022-03-24 10:59:28.137380]
> The message "W [MSGID: 114032]
> [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received
> RPC status error [Transport endpoint is not connected]" repeated 4 times
> between [2022-03-24 10:58:04.115169] and [2022-03-24 10:59:28.138052]
> [2022-03-24 10:59:49.143217] C
> [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server
> 172.22.102.142:49154 has not responded in the
> last 10 seconds, disconnecting.
> [2022-03-24 10:59:49.143838] I [MSGID: 114018]
> [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from
> hdd-client-7. Client process will keep trying to connect to glusterd until
> brick's port is available
> [2022-03-24 10:59:49.144540] E [rpc-clnt.c:346:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (-->
> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (-->
> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (-->
> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ) 0-hdd-client-7: forced
> unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24
> 10:59:38.145208 (xid=0x861)
> [2022-03-24 10:59:49.144557] W [MSGID: 114032]
> [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received
> RPC status error [Transport endpoint is not connected]
> [2022-03-24 10:59:49.144653] E [rpc-clnt.c:346:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (-->
> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (-->
> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (-->
> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ) 0-hdd-client-7: forced
> unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24
> 10:59:38.145218 (xid=0x862)
> [2022-03-24 10:59:49.144665] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk]
> 0-hdd-client-7: socket disconnected
>
> 
> # netcat/telnet to the brick's port of storage2 are working
>
> [root@storage1 ~]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> [root@storage3 ~]# netcat -z -v 172.22.102.142 49154
> Connection to