[Gluster-users] Routing glusterfs from IB network to Ethernet

2020-06-02 Thread Arman Khalatyan
Hello everybody,
I am testing an environment where my GlusterFS volume is replicated across
3 JBODs with IB (InfiniBand) interfaces.
Is there any possibility of routing this to the 10G Ethernet network for
further use in the oVirt environment?
I googled for a "gluster router" but found nothing.
Does Gluster deal somehow with multi-homed network installations?
In the architecture documentation (the last figure)
https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/
it is unclear how Gluster will select the right routes if you have TCP/IP
and IB together.
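
For reference, Gluster itself can expose one volume over both transports;
whether oVirt copes with that is a separate question. A minimal sketch,
assuming hypothetical hosts ib-node1..3 whose names resolve to the IB
addresses on the storage side, and a volume called gvol-ib:

  # create a replica-3 volume that accepts both TCP and RDMA clients
  gluster volume create gvol-ib replica 3 transport tcp,rdma \
      ib-node1:/bricks/brick1 ib-node2:/bricks/brick1 ib-node3:/bricks/brick1
  gluster volume start gvol-ib

  # a 10G Ethernet client (e.g. an oVirt host) mounts over plain TCP
  mount -t glusterfs ib-node1:/gvol-ib /mnt/gvol-ib
  # an IB-attached client can request the RDMA transport explicitly
  mount -t glusterfs -o transport=rdma ib-node1:/gvol-ib /mnt/gvol-ib

Since every Gluster client talks directly to all brick hosts, the usual way
to split traffic between two networks is name resolution rather than routing:
let the brick hostnames resolve to the IB/IPoIB addresses on the storage
nodes and to the 10G addresses on Ethernet-only clients (split-horizon DNS or
/etc/hosts). Note also that the RDMA transport is deprecated in newer
GlusterFS releases, so plain TCP over IPoIB may be the safer long-term choice.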

Thank you beforehand,
Arman.





Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Strahil Nikolov
Hi David,

in which log do you see the entries?

I think I have an explanation for why you see the process on only one of the master
nodes - the geo-rep session is established from only 1 master node /I hope
someone corrects me if I'm wrong/ to one slave node. Thus it is natural to
see the high CPU usage on only 1 master node in your situation.
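
A quick way to confirm which master node carries the active worker - a sketch
using the session names that appear further down in this thread (gvol0 as the
master volume, nvfs10::gvol0 as the slave) - is the geo-rep status output,
which marks one brick per replica set as Active and the rest as Passive:

  gluster volume geo-replication gvol0 nvfs10::gvol0 status detail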

Do you see anything else in /var/log/glusterfs/geo-replication/
(master nodes) or in /var/log/glusterfs/geo-replication-slaves/ (slaves) that
could hint at the exact issue? I have a vague feeling that that Python script
is constantly looping over some data, causing the CPU hog.
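
A rough way to check both ideas from the shell - only a sketch, assuming the
default log locations above and that the busy process is the gsyncd.py worker
quoted further down:

  # scan the geo-rep logs on masters and slaves for warnings/errors
  grep -riE "warning|error|traceback" /var/log/glusterfs/geo-replication/ | tail -n 50
  grep -riE "warning|error|traceback" /var/log/glusterfs/geo-replication-slaves/ | tail -n 50

  # on the busy master node, summarise what the worker is doing at syscall level
  strace -c -f -p "$(pgrep -of 'gsyncd.py worker')"   # Ctrl-C after ~30s prints the summary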

Sadly, I can't find instructions for increasing the log level of the geo-rep
log.


Best  Regards,
Strahil  Nikolov
 

On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham 
 wrote:
>Hi Strahil and Sunny,
>
>Thank you for the replies. I checked the gfid on the master and slaves
>and
>they are the same. After moving the file away and back again it doesn't
>seem to be having the issue with that file any more.
>
>We are still getting higher CPU usage on one of the master nodes than
>the
>others. It logs this every few seconds:
>
>[2020-06-02 03:10:15.637815] I [master(worker
>/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>Taken
>   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
>UNL=0
>[2020-06-02 03:10:15.638010] I [master(worker
>/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
>Time TakenSETA=0  SETX=0 
>meta_duration=0.data_duration=12.7878
>   DATA=4  XATT=0
>[2020-06-02 03:10:15.638286] I [master(worker
>/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
>Completed
>changelog_end=1591067378entry_stime=(1591067167, 0)
>changelog_start=1591067364  stime=(1591067377, 0)  
>duration=12.8068
> num_changelogs=2mode=live_changelog
>[2020-06-02 03:10:20.658601] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> stime=(1591067377, 0)
>[2020-06-02 03:10:34.21799] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> duration=0.3826 num_files=8 job=1   return_code=0
>[2020-06-02 03:10:46.440535] I [master(worker
>/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>Taken
>   MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2   duration=0.1314
>UNL=1
>[2020-06-02 03:10:46.440809] I [master(worker
>/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
>Time TakenSETA=0  SETX=0 
>meta_duration=0.data_duration=13.0171
>   DATA=14 XATT=0
>[2020-06-02 03:10:46.441205] I [master(worker
>/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
>Completed
>changelog_end=1591067420entry_stime=(1591067419, 0)
>changelog_start=1591067392  stime=(1591067419, 0)  
>duration=13.0322
> num_changelogs=3mode=live_changelog
>[2020-06-02 03:10:51.460925] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> stime=(1591067419, 0)
>
>[2020-06-02 03:11:04.448913] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
>duration=0.3466 num_files=3 job=1   return_code=0
>
>Whereas the other master nodes only log this:
>
>[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] : Using
>session config file
>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>
>Can anyone help with what might cause the high CPU usage on one master
>node? The process is this one, and is using 70-100% of CPU:
>
>python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
>worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
>/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
>b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
>cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
>--resource-remote nvfs30 --resource-remote-id
>1e698ccd-aeec-4ec4-96fe-383da8fc3b78
>
>Thank you in advance!
>
>
>
>
>On Sat, 30 May 2020 at 20:20, Strahil Nikolov 
>wrote:
>
>> Hey David,
>>
>> For me a gfid mismatch means that the file was replaced/recreated -
>> just like vim on Linux does (and that is expected for a config file).
>>
>> Have you checked the gfid of the file on both source and destination -
>> do they really match, or are they different?
>>
>> What happens when you move the file away from the slave - does it fix
>> the issue?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hello,
>> >
>> >We're having an issue with a geo-replication process with unusually high
>> >CPU use and giving "Entry not present on master. Fixing gfid mismatch in
>> >slave" 

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Sunny Kumar
Hi David,

You haven't answered my previous question regarding the type of your workload.
---
You can use the command below to enable debug logging.

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level DEBUG`

and after capturing the log, switch back to info mode:

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level INFO`

Please share the debug log and the geo-rep config to debug further;
for the config:

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config`
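
With the names used earlier in this thread (gvol0 as the master volume and
nvfs10::gvol0 as the slave), that would presumably look like:

  gluster vol geo-rep gvol0 nvfs10::gvol0 config log-level DEBUG
  # ... reproduce the high-CPU period, collect the logs, then revert and dump the config
  gluster vol geo-rep gvol0 nvfs10::gvol0 config log-level INFO
  gluster vol geo-rep gvol0 nvfs10::gvol0 config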

/sunny

