Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-18 Thread David Cunningham
Hi Strahil, Thank you for that, and for the point that you can't 'cd' to the .gfid directory. I think the customer is going to live with the higher CPU usage as it's still well within acceptable limits, and other things demand our time. Thanks again for your input! On Fri, 12 Jun 2020 at 16:06,

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-12 Thread Strahil Nikolov
Hello David, The .gfid directory is there but you cannot traverse (cd) into it - you need to specify the path just like in the example. I had some cases where 'Transport endpoint is not connected' was received, but usually this is due to a missing gfid. About the meetings, one of the topics is

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-11 Thread David Cunningham
Hi Strahil, Is there a trick to getting the .gfid directory to appear besides adding "-o aux-gfid-mount" to the mount? I mounted it using "mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid directory under /mnt/glusterfs. I haven't tried joining a gluster

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-11 Thread Strahil Nikolov
You can try the path of a file based on gfid (method 2) via: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ The gfids from the strace should be there, but if the file was renamed/deleted - it is normal for them to be missing. Have you joined the last gluster meeting to discuss the

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-10 Thread David Cunningham
Hi Strahil, Thanks for that. I did search for a file with the gfid in the name, on both the master nodes and geo-replication slave, but none of them had such a file. I guess maybe by the time I looked the file had been deleted? Either that or something is more seriously wrong with invalid gfids.
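When searching a brick for "a file with the gfid in the name", it may help to know where such an entry would normally live: on each brick, GlusterFS keeps gfid entries under `.glusterfs/<first two hex chars>/<next two hex chars>/<full gfid>` (a hard link for regular files, a symlink for directories). The following is only a minimal sketch of that mapping. The brick root `/data/brick1/gvol0` is a hypothetical path used for illustration, the helper name `gfid_backend_path` is made up here, and the gfid is the one from the "Stale file handle" log line quoted elsewhere in this thread.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Return the expected location of a gfid entry under a brick's .glusterfs directory.

    brick_root is an assumption supplied by the caller; this function only
    builds the path and does not check whether it exists.
    """
    # uuid.UUID validates the string and str() normalizes it to
    # lowercase, hyphenated form.
    canonical = str(uuid.UUID(gfid))
    return (
        PurePosixPath(brick_root)
        / ".glusterfs"
        / canonical[:2]
        / canonical[2:4]
        / canonical
    )


if __name__ == "__main__":
    # Hypothetical brick root; gfid taken from the slave log line in this thread.
    print(gfid_backend_path("/data/brick1/gvol0",
                            "d4265a0c-d881-48d8-8ca1-0920ab5ae9ba"))
```

If nothing exists at the resulting path on any brick, that would be consistent with the file having been deleted or renamed by the time of the search.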

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-10 Thread Strahil Nikolov
Hey David, Sadly I just have a feeling that on any brick there is a gfid mismatch, but I could be wrong. As you have the gfid list, please check on all bricks (both master and slave) that the file exists (not the one in .glusterfs, but the real one) and it has the same gfid. You can

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread David Cunningham
Hi Strahil, Thank you for that. Do you know if these "Stale file handle" errors on the geo-replication slave could be related? [2020-06-10 01:02:32.268989] E [MSGID: 109040] [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread Strahil Nikolov
Hey David, Can you check the CPU usage in sar on the rest of the cluster (going backwards from the day you found the high CPU usage), so we can know if this behaviour was observed on other nodes. Maybe that behaviour was "normal" for the push node (which could be another one). As

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-08 Thread David Cunningham
Hi Sankarshan, Thanks for that. So what should we look for to figure out what this process is doing? In /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see something like the following logged regularly: [[2020-06-09 02:01:19.670595] D [master(worker

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-07 Thread sankarshan
Reading through the thread it occurs to me that it would be a stronger approach to understand the workload (a general description of the application) and in terms of the releases of GlusterFS running, assess if there are new issues to be addressed or if existing sets of patches tend to work.

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-07 Thread David Cunningham
Hi Strahil, The CPU is still quite high, with "top" regularly showing 100% CPU usage by that process. However it's not clear whether this is really a problem, or if it's just normal geo-replication activity. While CPU usage was not previously as high on this server, it's not clear whether

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-06 Thread Strahil Nikolov
Hey David, can you check the old logs for gfid mismatch and get a list of files that were causing the high CPU. Maybe they are related somehow (maybe created by the same software, same client version or something else), which could help with that. Also take a look in

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-05 Thread David Cunningham
Hi Sunny and Strahil, Thanks again for your responses. We don't have a lot of renaming activity - maybe some, but not a lot. We do have files which are open for writing for quite a while - they're call recordings being written as the call happens. We've installed GlusterFS using the Ubuntu

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Strahil Nikolov
Hi David, in which log do you see the entries? I think I got an explanation why you see the process only on one of the master nodes - the geo-rep session is established from only 1 master node /I hope someone corrects me if I'm wrong/ to one slave node. Thus it will be natural to see the high

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Sunny Kumar
Hi David, You haven't answered my previous question regarding the type of your workload. --- You can use the below command to enable debug log. `gluster vol geo-rep <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> config log-level DEBUG` and after capturing log again switch back to info mode: `gluster vol geo-rep <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> config log-level

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-01 Thread David Cunningham
Hi Strahil and Sunny, Thank you for the replies. I checked the gfid on the master and slaves and they are the same. After moving the file away and back again it doesn't seem to be having the issue with that file any more. We are still getting higher CPU usage on one of the master nodes than the

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-30 Thread Sunny Kumar
Hi David, Looks like you are running a workload that involves lots of renames, and geo-rep is trying to handle those. You can try the patches below, which will give you performance benefits. [1]. https://review.gluster.org/#/c/glusterfs/+/23570/ [2]. https://review.gluster.org/#/c/glusterfs/+/23459/

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-30 Thread Strahil Nikolov
Hey David, for me a gfid mismatch means that the file was replaced/recreated - just like vim in Linux does (and it is expected for config files). Have you checked the gfid of the file on both source and destination: do they really match, or are they different? What happens when you

[Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-29 Thread David Cunningham
Hello, We're having an issue with a geo-replication process with unusually high CPU use and giving "Entry not present on master. Fixing gfid mismatch in slave" errors. Can anyone help on this? We have 3 GlusterFS replica nodes (we'll call the master), which also push data to a remote server