Hi Strahil,
Thank you for that, and for pointing out that you can't 'cd' into the .gfid directory.
I think the customer is going to live with the higher CPU usage as it's
still well within acceptable limits, and other things demand our time.
Thanks again for your input!
On Fri, 12 Jun 2020 at 16:06,
Hello David,
The .gfid directory is there but you cannot traverse (cd) into it - you need
to specify the path just like in the example. I had some cases where the
'transport endpoint is not connected' error was received, but usually this is
due to a missing gfid.
About the meetings, one of the topics is
Hi Strahil,
Is there a trick to getting the .gfid directory to appear besides adding
"-o aux-gfid-mount" to the mount? I mounted it using "mount -t glusterfs -o
aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid directory
under /mnt/glusterfs.
I haven't tried joining a gluster
You can try the path of a file based on gfid (method 2) via:
https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
The gfids from the strace should be there, but if the file was renamed/deleted
- it is normal for it to be missing.
Have you joined the last gluster meeting to discuss the
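For reference, method 2 from that page resolves a gfid through an aux-gfid mount; a sketch using the hostname, volume and gfid that appear elsewhere in this thread (not verified against the cluster here):

```shell
# Mount the volume with the aux gfid option (host/volume as used in this thread)
mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs

# .gfid is not listable, but addressing a gfid under it directly works;
# this asks glusterfs for the real path behind the gfid
getfattr -n trusted.glusterfs.pathinfo -e text \
    /mnt/glusterfs/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba
```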
Hi Strahil,
Thanks for that. I did search for a file with the gfid in the name, on both
the master nodes and geo-replication slave, but none of them had such a
file. I guess maybe by the time I looked the file had been deleted? Either
that or something is more seriously wrong with invalid gfids.
Hey David,
Sadly I just have a feeling that on any brick there is a gfid mismatch, but I
could be wrong.
As you have the gfid list, please check on all bricks (both master and
slave) that the file exists (not the hard link under .glusterfs, but the real
one) and that it has the same gfid.
You can
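A minimal sketch of that check, assuming a hypothetical brick path (take the real one from `gluster volume info gvol0`):

```shell
# Read the gfid xattr of the real file on a brick; run this on every
# master and slave brick and compare the hex values across nodes
getfattr -n trusted.gfid -e hex /bricks/brick1/gvol0/path/to/file
```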
Hi Strahil,
Thank you for that. Do you know if these "Stale file handle" errors on the
geo-replication slave could be related?
[2020-06-10 01:02:32.268989] E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to
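One thing worth checking for that gfid: every file has a hard link under the brick's .glusterfs directory, keyed by the first two hex byte pairs of the gfid. A sketch with a hypothetical brick root (substitute yours from `gluster volume info`):

```shell
# Hypothetical brick root; substitute the real path for your volume
brick=/bricks/brick1/gvol0
gfid=d4265a0c-d881-48d8-8ca1-0920ab5ae9ba

# GlusterFS keeps a hard link for every file at
# .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
path="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
echo "$path"
# On each brick, verify the link exists:  ls -l "$path"
```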
Hey David,
Can you check the CPU usage in sar on the rest of the cluster (going
backwards from the day you found the high CPU usage), so we can know if this
behaviour was observed on other nodes.
Maybe that behaviour was "normal" for the push node (which could be another
one).
As
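A sketch of that check, assuming sysstat's default binary-log location on Ubuntu (the path and retention vary by distro):

```shell
# Walk the retained daily sar files and print a short CPU summary for each;
# on Ubuntu the files live under /var/log/sysstat, on RHEL under /var/log/sa
for f in /var/log/sysstat/sa??; do
    echo "== $f =="
    sar -u -f "$f" | tail -n 3
done
```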
Hi Sankarshan,
Thanks for that. So what should we look for to figure out what this process
is doing? In
/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
something like the following logged regularly:
[2020-06-09 02:01:19.670595] D [master(worker
Reading through the thread, it occurs to me that a stronger approach would be
to understand the workload (a general description of the application) and,
given the GlusterFS releases in use, assess whether there are new issues to be
addressed or whether the existing sets of patches tend to work.
Hi Strahil,
The CPU is still quite high, with "top" regularly showing 100% CPU usage by
that process. However it's not clear whether this is really a problem, or
if it's just normal geo-replication activity. While CPU usage was not
previously as high on this server, it's not clear whether
Hey David,
can you check the old logs for gfid mismatch and get a list of files that
were causing the high CPU.
Maybe they are related somehow (maybe created by the same software, same
client version or something else), which could help with that.
Also take a look in
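A sketch of pulling the distinct gfids out of those log entries (log path as quoted earlier in the thread; the exact message text may differ by version):

```shell
# Collect the unique gfids from "gfid mismatch" lines in the geo-rep log
log=/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
grep -i 'gfid mismatch' "$log" 2>/dev/null \
  | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
  | sort -u
```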
Hi Sunny and Strahil,
Thanks again for your responses. We don't have a lot of renaming activity -
maybe some, but not a lot. We do have files which are open for writing for
quite a while - they're call recordings being written as the call happens.
We've installed GlusterFS using the Ubuntu
Hi David,
In which log do you see the entries?
I think I have an explanation for why you see the process on only one of the
master nodes - the geo-rep session is established from only one master node /I
hope someone corrects me if I'm wrong/ to one slave node. Thus it is natural
to see the high
Hi David,
You haven't answered my previous question regarding the type of your workload.
---
You can use the command below to enable debug logging:
`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level DEBUG`
and after capturing the logs, switch back to info mode:
`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level INFO`
Hi Strahil and Sunny,
Thank you for the replies. I checked the gfid on the master and slaves and
they are the same. After moving the file away and back again it doesn't
seem to be having the issue with that file any more.
We are still getting higher CPU usage on one of the master nodes than the
Hi David,
Looks like you are running a workload that involves lots of renames, and
geo-rep is trying to handle those. You can try the patches below, which
should give you performance benefits.
[1]. https://review.gluster.org/#/c/glusterfs/+/23570/
[2]. https://review.gluster.org/#/c/glusterfs/+/23459/
Hey David,
for me a gfid mismatch means that the file was replaced/recreated - just
like vim on Linux does (and that is expected for a config file).
Have you checked the gfid of the file on both source and destination - do
they really match, or are they different?
What happens when you
Hello,
We're having an issue with a geo-replication process with unusually high
CPU use and giving "Entry not present on master. Fixing gfid mismatch in
slave" errors. Can anyone help on this?
We have 3 GlusterFS replica nodes (we'll call the master), which also push
data to a remote server
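As a first diagnostic, the session state can be checked with the geo-replication status command; a sketch assuming the volume and slave names that appear later in this thread (gvol0, nvfs10):

```shell
# Show per-worker state (Active/Passive/Faulty), crawl status and sync counters
gluster volume geo-replication gvol0 nvfs10::gvol0 status detail
```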