Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread David Cunningham
Hi Strahil,

Thank you for that. Do you know if these "Stale file handle" errors on the
geo-replication slave could be related?

[2020-06-10 01:02:32.268989] E [MSGID: 109040] [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the file on gvol0-dht [Stale file handle]
[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434237: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434251: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434264: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434277: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434290: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434303: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434316: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434329: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434342: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.891835] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 7434403: STAT() /.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
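
All of the warnings above are for a single GFID and recur roughly every 60 ms. A minimal Python sketch for checking whether that pattern holds across a whole log is below. It assumes each entry sits on one line, as in the log file itself rather than as wrapped in this email, and stale_stat_summary is a hypothetical helper, not part of Gluster.

```python
import re
from collections import defaultdict
from datetime import datetime

# Single-line form of the fuse-bridge warnings quoted above (assumed format):
# [ts] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: <id>: STAT() /.gfid/<gfid> => -1 (Stale file handle)
STALE_STAT = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] W "
    r"\[fuse-bridge\.c:\d+:fuse_attr_cbk\] \S+ \d+: STAT\(\) "
    r"/\.gfid/(?P<gfid>[0-9a-f-]{36}) => -1 \(Stale file handle\)"
)


def stale_stat_summary(lines):
    """Group stale-handle STAT failures by GFID.

    Returns {gfid: (failure_count, mean_seconds_between_failures)}.
    """
    stamps = defaultdict(list)
    for line in lines:
        match = STALE_STAT.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            stamps[match.group("gfid")].append(ts)

    summary = {}
    for gfid, times in stamps.items():
        # Interval between each failure and the next one for the same GFID.
        gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
        mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
        summary[gfid] = (len(times), mean_gap)
    return summary
```

Passing an open file handle for the client log that contains these warnings (the file name depends on the mount) returns, per GFID, the number of failed STAT() calls and the mean interval between them. A high count with a gap of a few tens of milliseconds would suggest the client keeps retrying the same lookup rather than hitting many different files.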



On Tue, 9 Jun 2020 at 16:31, Strahil Nikolov  wrote:

> Hey David,
>
> Can you check the CPU usage in sar on the rest of the cluster (going
> backwards from the day you found the high CPU usage), so we can tell
> whether this behaviour was also observed on other nodes?
>
> Maybe that behaviour was "normal" for the push node (which could be
> another one).
>
> As this script is Python, I guess you could add some debug print
> statements to it.
>
> Best Regards,
> Strahil Nikolov
>
> On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham <dcunning...@voisonics.com> wrote:
> >Hi Sankarshan,
> >
> >Thanks for that. So what should we look for to figure out what this
> >process is doing? In
> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
> >something like the following logged regularly:
> >
> >[2020-06-09 02:01:19.670595] D [master(worker /nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process] _GMaster: processing changes batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040', '/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
> >[2020-06-09 02:01:19.674927] D [master(worker /nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
> >[2020-06-09 02:01:19.683098] D [master(worker /nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
> >[2020-06-09 02:01:19.695125] D [master(worker /nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17', '.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77', '.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0', '.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
> >[2020-06-09 02:01:19.695344] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
> >[2020-06-09 02:01:19.695508] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
> >[2020-06-09 02:01:19.695638] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
> >[2020-06-09 02:01:19.695759] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
> >[2020-06-09 02:01:19.695883] D [master(worker /nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change
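
One way to start answering that question is to tally what the debug entries above report. The Python sketch below is a rough aid, not a Gluster tool: summarize_gsyncd is a hypothetical name, it expects one entry per line as in gsyncd.log itself, and it assumes the numeric suffix of each CHANGELOG.<n> file is a Unix timestamp (the two suffixes above are 15 seconds apart, and 1591668040 decodes to 2020-06-09 02:00:40 UTC).

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Patterns for the gsyncd DEBUG/INFO messages shown above (one entry per line assumed).
CANDIDATE = re.compile(r"candidate\s+for\s+syncing\s+file=\.gfid/([0-9a-f-]{36})")
SYNCED = re.compile(r"_GMaster: synced\s+file=\.gfid/([0-9a-f-]{36})")
SYNC_TIME = re.compile(r"Sync Time Taken\s+duration=([\d.]+)\s+num_files=(\d+)")
CHANGELOG = re.compile(r"processing change\s+changelog=\S*/CHANGELOG\.(\d+)")


def summarize_gsyncd(lines):
    """Tally per-GFID sync activity, rsync totals and processed changelogs."""
    candidates, synced = Counter(), Counter()
    rsync_seconds, rsync_files = 0.0, 0
    changelogs = []
    for line in lines:
        m = CANDIDATE.search(line)
        if m:
            candidates[m.group(1)] += 1
        m = SYNCED.search(line)
        if m:
            synced[m.group(1)] += 1
        m = SYNC_TIME.search(line)
        if m:
            rsync_seconds += float(m.group(1))
            rsync_files += int(m.group(2))
        m = CHANGELOG.search(line)
        if m:
            # Assumption: the CHANGELOG suffix is seconds since the Unix epoch.
            changelogs.append(datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc))
    return {
        "candidates": candidates,
        "synced": synced,
        "rsync_seconds": rsync_seconds,
        "rsync_files": rsync_files,
        "changelogs": changelogs,
    }
```

For example, iterate over open("/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log") and print summary["candidates"].most_common(10). Applied to the longer copy of this excerpt quoted in Strahil's 9 June message, it would count three of the four GFIDs twice, because they appear in both changelogs; that is also why the single rsync job reports num_files=7 for only four distinct files. GFIDs that keep reappearing in consecutive changelogs are presumably being modified continuously on the master.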

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread Strahil Nikolov
Hey David,

Can you check the CPU usage in sar on the rest of the cluster (going
backwards from the day you found the high CPU usage), so we can tell whether
this behaviour was also observed on other nodes?

Maybe that behaviour was "normal" for the push node (which could be another
one).
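
A rough Python sketch of that comparison, under these assumptions: sysstat is installed on every node, it keeps daily files named saDD in /var/log/sa (on Debian/Ubuntu the directory is /var/log/sysstat), and sar -u -f FILE ends its output with an "Average:" row whose last column is %idle. The function names are hypothetical.

```python
import os
import subprocess
from datetime import date, timedelta
from pathlib import Path

# Assumption: change to Path("/var/log/sysstat") on Debian/Ubuntu.
SA_DIR = Path("/var/log/sa")


def sa_files_going_back(start, days):
    """Daily sysstat files for `start` and the `days` days before it, newest first.

    In the default layout the files are named by day of month only, so they are
    overwritten monthly and only the retained ones will actually exist.
    """
    return [SA_DIR / f"sa{(start - timedelta(days=i)).day:02d}" for i in range(days + 1)]


def average_cpu_busy(sar_output):
    """Return 100 - %idle from the 'Average:' row of `sar -u` output, or None."""
    for line in sar_output.splitlines():
        fields = line.split()
        if len(fields) > 2 and fields[0] == "Average:" and fields[1] == "all":
            return 100.0 - float(fields[-1])
    return None


def cpu_busy_history(start, days):
    """Run `sar -u -f` on each existing daily file; map file name -> average % busy."""
    history = {}
    env = {**os.environ, "LC_ALL": "C"}  # force '.' decimals and English labels
    for path in sa_files_going_back(start, days):
        if not path.exists():
            continue
        result = subprocess.run(
            ["sar", "-u", "-f", str(path)],
            capture_output=True, text=True, env=env, check=False,
        )
        history[path.name] = average_cpu_busy(result.stdout)
    return history
```

Running cpu_busy_history(date.today(), 7) on each node gives the average CPU busy percentage for each of the last eight days whose files still exist, which should make it easy to see whether the push node stands out.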

As this script is Python, I guess you could add some debug print statements
to it.
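
Rather than scattering print statements, a decorator can report how often a function runs and how long each call takes. A minimal sketch using only the standard library; trace, the logger name and process_batch are hypothetical, and which gsyncd function to wrap, and where its log output ends up, depend on the installed gsyncd version and its logging configuration.

```python
import functools
import logging
import time

# Hypothetical logger name; inside gsyncd, output goes wherever its logging is configured to go.
log = logging.getLogger("gsyncd.trace")


def trace(func):
    """Log every call to `func` with a running call count and its wall-clock duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            log.debug("%s call #%d took %.4fs",
                      func.__qualname__, wrapper.calls, time.monotonic() - start)
    wrapper.calls = 0
    return wrapper


@trace
def process_batch(items):  # hypothetical stand-in for the function under investigation
    return len(items)
```

Because the decorator only wraps the call, it can be added and removed without touching the function body, and the DEBUG messages stay silent unless that logger's level is lowered.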

Best Regards,
Strahil Nikolov

On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham wrote:
>Hi Sankarshan,
>
>Thanks for that. So what should we look for to figure out what this
>process is doing? In
>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
>something like the following logged regularly:
>
>
>[2020-06-09 02:01:19.670595] D [master(worker /nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process] _GMaster: processing changes batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040', '/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
>[2020-06-09 02:01:19.674927] D [master(worker /nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
>[2020-06-09 02:01:19.683098] D [master(worker /nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
>[2020-06-09 02:01:19.695125] D [master(worker /nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17', '.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77', '.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0', '.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>[2020-06-09 02:01:19.695344] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:19.695508] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:19.695638] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>[2020-06-09 02:01:19.695759] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:19.695883] D [master(worker /nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055
>[2020-06-09 02:01:19.696170] D [master(worker /nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
>[2020-06-09 02:01:19.714097] D [master(worker /nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17', '.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77', '.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>[2020-06-09 02:01:19.714286] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:19.714433] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:19.714577] D [master(worker /nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.179656] D [resource(worker /nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files: .gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17, .gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77, .gfid/779cd2b3-1571-446a-8903-48d6183d3dd0, .gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435, .gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17, .gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77, .gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.738632] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken duration=0.5588 num_files=7 job=2   return_code=0
>[2020-06-09 02:01:20.739650] D [master(worker /nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:20.740041] D [master(worker /nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:20.740200] D [master(worker /nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>[2020-06-09 02:01:20.740343] D [master(worker /nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.740482] D [master(worker