Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Strahil Nikolov
Usually only one node acts as the "master" at a time, but when you power off one of the 2
nodes the geo-replication should handle that and the second node should take over the job.

How long did you wait after gluster01 was rebooted?
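
A quick way to watch that handover (just a sketch, using the volume and slave names from your mails) is the session status:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE status detail

The STATUS column should show the surviving node's worker going from Passive to Active; a worker stuck in Faulty means the takeover did not happen.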


Best Regards,
Strahil Nikolov






On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes wrote:





I was able to solve the issue restarting all servers.

Now I have another issue!

I just powered off the gluster01 server and then the geo-replication entered a Faulty status.
I tried to stop and start the gluster geo-replication like that:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
Peer gluster01.home.local, which is a part of DATA volume, is down. Please bring up the peer and retry.
geo-replication command failed

How can I have geo-replication with 2 masters and 1 slave?
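
Maybe the session just needs to be forced - something like this (a guess based on the geo-replication CLI; "force" is supposed to let the commands run even while a peer is down):

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE stop force
gluster volume geo-replication DATA root@gluster03::DATA-SLAVE start force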

Thanks


---
Gilberto Nunes Ferreira







On Mon, Oct 26, 2020 at 5:23 PM, Gilberto Nunes wrote:
> Hi there...
> 
> I'd created a 2-node gluster volume plus another gluster server acting as a backup
> server, using geo-replication.
> So on gluster01 I issued the commands:
> 
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data 
> gluster02:/DATA/master01-data/
> 
> Then in gluster03 server:
> 
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> 
> I'd set up passwordless SSH sessions between these 3 servers.
> 
> Then I'd used this script
> 
> https://github.com/gilbertoferreira/georepsetup
> 
> like this
> 
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> georepsetup: error: too few arguments
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
>   from cryptography.hazmat.backends import default_backend
> Geo-replication session will be established between DATA and gluster03::DATA-SLAVE
> Root password of gluster03 is required to complete the setup. NOTE: Password will not be stored.
> root@gluster03's password:
> [    OK] gluster03 is Reachable(Port 22)
> [    OK] SSH Connection established root@gluster03
> [    OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> [    OK] Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
> [    OK] common_secret.pem.pub file copied to gluster03
> [    OK] Master SSH Keys copied to all Up Slave nodes
> [    OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
> [    OK] Geo-replication Session Established
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I get Faulty 
> status in gluster01
> 
> There's the log
> 
> 
> [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}]
> [2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2020-10-26 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4873}]
> [2020-10-26 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume locally...
> [2020-10-26 20:16:44.170661] E [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value
> [2020-10-26 20:16:44.171281] I [resource(worker /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1739}]
> [2020-10-26 20:16:44.171772] I [subcmds(worker /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
> [2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
> [2020-10-26 20:16:46.201798] I [resource(worker /DATA/master01-data):1292:service_loop] GLUSTER: Register time [{time=1603743406}]
> [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]

Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread Strahil Nikolov
You need to fix that "reject" issue before trying anything else.
Have you tried to "detach" the arbiter and then "probe" it again?

I have no idea what you did to reach that state - can you provide the details?
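
Roughly, that would be (a sketch; the hostname is the one from your status output, run from one of the two healthy nodes - add "force" to the detach if it is refused):

gluster peer detach arbiternode.domain.tld
gluster peer probe arbiternode.domain.tld
gluster peer status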

Best Regards,
Strahil Nikolov






On Monday, 26 October 2020 at 20:38:38 GMT+2, mabi wrote:





Ok I see I won't go down that path of disabling quota.

I could now remove the arbiter brick of my volume which has the quota issue, so
it is now a simple 2-node replica with 1 brick per node.

Now I would like to add the brick back but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in 
Cluster' state

In fact I checked and the arbiter node is still rejected as you can see here:

State: Peer Rejected (Connected)

On the arbiter node, in the glusterd.log file, I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume woelkli-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still
complains about that checksum of the quota configuration. I also tried to
restart glusterd on my arbiter node but it does not help. The peer is still
rejected.
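
For what it's worth, the checksum glusterd is comparing seems to be stored on disk, so I guess it can be compared directly on each node (paths assumed from the default /var/lib/glusterd layout):

cat /var/lib/glusterd/vols/myvol-private/quota.cksum
cksum /var/lib/glusterd/vols/myvol-private/quota.conf

On the arbiter these presumably correspond to the "local cksum = 0" side of the mismatch, while node1/node2 show 66908910.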

What should I do at this stage?


‐‐‐ Original Message ‐‐‐

On Monday, October 26, 2020 6:06 PM, Strahil Nikolov  
wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and reenabling quota 
> on a volume with millions of files will take a lot of time and lots of IOPS. 
> I would leave it as a last resort. 
>
> Also, the following script was mentioned on the list and might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato diego.zucc...@unibo.it wrote:
>
> On 26/10/20 15:09, mabi wrote:
>
> > Right, seen liked that this sounds reasonable. Do you actually remember the 
> > exact command you ran in order to remove the brick? I was thinking this 
> > should be it:
> > gluster volume remove-brick   force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)
>
> # gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
> # gluster peer detach str957-biostq
> # gluster peer probe str957-biostq
> # gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching
> arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> > > that uses an iSCSI disk. More than 80% continuous load on both CPUs and 
> > > RAM.
> > > That's quite long I must say and I am in the same case as you, my arbiter 
> > > is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).
>
> -
>
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Gluster monitoring

2020-10-26 Thread Mahdi Adnan
Hello


 How do you keep track of the health status of your Gluster volumes? When a
brick goes down (crash, failure, shutdown), a node fails, there is a peering
issue, or healing is ongoing?

Gluster Tendrl is complex and sometimes broken, the Prometheus exporter is
still lacking, and gstatus is basic.

Currently, to monitor a Gluster volume, a custom script has to be used to
gather whatever info is needed for monitoring, or a combination of the
mentioned tools.
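
What I have in mind is something as small as this (just a sketch; the volume name is a placeholder and it only relies on the plain CLI output of "heal info" and "peer status"):

#!/bin/bash
VOL=myvol
# flag bricks with pending heal entries
gluster volume heal "$VOL" info | awk '/^Brick /{b=$2} /^Number of entries:/{n=$NF; if (n != "0" && n != "-") print "heal pending on " b ": " n}'
# flag peers that are not "Peer in Cluster (Connected)"
gluster peer status | grep '^State:' | grep -v 'Peer in Cluster (Connected)' && echo "peer problem detected"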

Can Gluster have something similar to Ceph and display the health of the
entire cluster? I know Ceph uses its “Monitors” to keep track of
everything going on inside the cluster, but Gluster should also have a way to
keep track of the cluster’s health.

How’s the community experience with Gluster monitoring? How are you
managing and tracking alerts and issues? Any recommendations?

Thank you.

-- 
Respectfully
Mahdi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Gilberto Nunes
Well, I did not reboot the host. I shut down the host. Then after 15 min I gave
up.
I don't know why that happened.
I will try it later.

---
Gilberto Nunes Ferreira








On Mon, Oct 26, 2020 at 9:31 PM, Strahil Nikolov wrote:

> Usually there is always only 1 "master" , but when you power off one of
> the 2 nodes - the geo rep should handle that and the second node should
> take the job.
>
> How long did you wait after gluster1 has been rebooted ?
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:
>
>
>
>
>
> I was able to solve the issue restarting all servers.
>
> Now I have another issue!
>
> I just powered off the gluster01 server and then the geo-replication
> entered in faulty status.
> I tried to stop and start the gluster geo-replication like that:
>
> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
>  Peer gluster01.home.local, which is a part of DATA volume, is down. Please
> bring up the peer and retry. geo-replication command failed
> How can I have geo-replication with 2 master and 1 slave?
>
> Thanks
>
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
>
> On Mon, Oct 26, 2020 at 5:23 PM, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:
> > Hi there...
> >
> > I'd created a 2 gluster vol and another 1 gluster server acting as a
> backup server, using geo-replication.
> > So in gluster01 I'd issued the command:
> >
> > gluster peer probe gluster02;gluster peer probe gluster03
> > gluster vol create DATA replica 2 gluster01:/DATA/master01-data
> gluster02:/DATA/master01-data/
> >
> > Then in gluster03 server:
> >
> > gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> >
> > I'd setted the ssh powerless session between this 3 servers.
> >
> > Then I'd used this script
> >
> > https://github.com/gilbertoferreira/georepsetup
> >
> > like this
> >
> > georepsetup
>
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supported by the
> Python core team. Support for it is now deprecated in cryptography, and
> will be removed in a future release.  from cryptography.hazmat.backends
> import default_backend usage: georepsetup [-h] [--force] [--no-color]
> MASTERVOL SLAVE SLAVEVOL georepsetup: error: too few arguments gluster01:~#
> georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supported by the
> Python core team. Support for it is now deprecated in cryptography, and
> will be removed in a future release.  from cryptography.hazmat.backends
> import default_backend Geo-replication session will be established between
> DATA and gluster03::DATA-SLAVE Root password of gluster03 is required to
> complete the setup. NOTE: Password will not be stored. root@gluster03's
> password:  [OK] gluster03 is Reachable(Port 22) [OK] SSH Connection
> established root@gluster03 [OK] Master Volume and Slave Volume are
> compatible (Version: 8.2) [OK] Common secret pub file present at
> /var/lib/glusterd/geo-replication/common_secret.pem.pub [OK]
> common_secret.pem.pub file copied to gluster03 [OK] Master SSH Keys
> copied to all Up Slave nodes [OK] Updated Master SSH Keys to all Up
> Slave nodes authorized_keys file [OK] Geo-replication Session
> Established
> > Then I reboot the 3 servers...
> > After a while everything works ok, but after a few minutes, I get Faulty
> status in gluster01
> >
> > There's the log
> >
> >
> > [2020-10-26 20:16:41.362584] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change [{status=Initializing...}] [2020-10-26 20:16:41.362937] I
> [monitor(monitor):160:monitor] Monitor: starting gsyncd worker
> [{brick=/DATA/master01-data}, {slave_node=gluster03}] [2020-10-26
> 20:16:41.508884] I [resource(worker
> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection
> between master and slave... [2020-10-26 20:16:42.996678] I [resource(worker
> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between
> master and slave established. [{duration=1.4873}] [2020-10-26
> 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect]
> GLUSTER: Mounting gluster volume locally... [2020-10-26 20:16:44.170661] E
> [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed
> to get the xattr value [2020-10-26 20:16:44.171281] I [resource(worker
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
> [{duration=1.1739}] [2020-10-26 20:16:44.171772] I [subcmds(worker
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
> Acknowledging back to monitor [2020-10-26 20:16:46.200603] I [master(worker
> /DATA/master01-data):1645:register] _GMaster: Working dir
> 

Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
Ok I see I won't go down that path of disabling quota.

I could now remove the arbiter brick of my volume which has the quota issue, so
it is now a simple 2-node replica with 1 brick per node.

Now I would like to add the brick back but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in 
Cluster' state

In fact I checked and the arbiter node is still rejected as you can see here:

State: Peer Rejected (Connected)

On the arbiter node, in the glusterd.log file, I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume woelkli-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still
complains about that checksum of the quota configuration. I also tried to
restart glusterd on my arbiter node but it does not help. The peer is still
rejected.

What should I do at this stage?


‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 6:06 PM, Strahil Nikolov  
wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and reenabling quota 
> on a volume with millions of files will take a lot of time and lots of IOPS. 
> I would leave it as a last resort. 
>
> Also, the following script was mentioned on the list and might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato diego.zucc...@unibo.it wrote:
>
> On 26/10/20 15:09, mabi wrote:
>
> > Right, seen liked that this sounds reasonable. Do you actually remember the 
> > exact command you ran in order to remove the brick? I was thinking this 
> > should be it:
> > gluster volume remove-brick   force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)
>
> # gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force
> # gluster peer detach str957-biostq
> # gluster peer probe str957-biostq
> # gluster volume add-brick BigVol replica 3 arbiter 1 str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching
> arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> > > that uses an iSCSI disk. More than 80% continuous load on both CPUs and 
> > > RAM.
> > > That's quite long I must say and I am in the same case as you, my arbiter 
> > > is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).
>
> -
>
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread Strahil Nikolov
Detaching the arbiter is pointless...
Quota is an extended file attribute, and thus disabling and reenabling quota on 
a volume with millions of files will take a lot of time and lots of IOPS. I 
would leave it as a last resort. 
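
Just for reference, the "last resort" is nothing more than the volume-level quota switches (a sketch - and keep in mind the limits would have to be defined again after re-enabling):

gluster volume quota VOLNAME disable
gluster volume quota VOLNAME enable
gluster volume quota VOLNAME limit-usage /some/dir 1TB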

Also, the following script was mentioned on the list and might help you:
https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py


You can take a look in the mailing list for usage and more details.

Best Regards,
Strahil Nikolov






On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato wrote:





On 26/10/20 15:09, mabi wrote:

> Right, seen liked that this sounds reasonable. Do you actually remember the 
> exact command you ran in order to remove the brick? I was thinking this 
> should be it:
> gluster volume remove-brick   force
> but should I use "force" or "start"?
Memory does not serve me well (there are 28 disks, not 26!), but bash
history does :)
# gluster volume remove-brick BigVol replica 2
str957-biostq:/srv/arbiters/{00..27}/BigVol force
# gluster peer detach str957-biostq
# gluster peer probe str957-biostq
# gluster volume add-brick BigVol replica 3 arbiter 1
str957-biostq:/srv/arbiters/{00..27}/BigVol

You obviously have to wait for remove-brick to complete before detaching
arbiter.

>> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
>> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.
> That's quite long I must say and I am in the same case as you, my arbiter is 
> a VM.
Give all the CPU and RAM you can. Less than 8GB RAM is asking for
troubles (in my case).

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Gilberto Nunes
I was able to solve the issue restarting all servers.

Now I have another issue!

I just powered off the gluster01 server and then the geo-replication
entered a Faulty status.
I tried to stop and start the gluster geo-replication like that:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
Peer gluster01.home.local, which is a part of DATA volume, is down. Please
bring up the peer and retry.
geo-replication command failed

How can I have geo-replication with 2 masters and 1 slave?

Thanks


---
Gilberto Nunes Ferreira






On Mon, Oct 26, 2020 at 5:23 PM, Gilberto Nunes <gilberto.nune...@gmail.com> wrote:

> Hi there...
>
> I'd created a 2 gluster vol and another 1 gluster server acting as a
> backup server, using geo-replication.
> So in gluster01 I'd issued the command:
>
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data
> gluster02:/DATA/master01-data/
>
> Then in gluster03 server:
>
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>
> I'd setted the ssh powerless session between this 3 servers.
>
> Then I'd used this script
>
> https://github.com/gilbertoferreira/georepsetup
>
> like this
>
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in
> cryptography, and will be removed in a future release.
>  from cryptography.hazmat.backends import default_backend
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> georepsetup: error: too few arguments
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in
> cryptography, and will be removed in a future release.
>  from cryptography.hazmat.backends import default_backend
> Geo-replication session will be established between DATA and
> gluster03::DATA-SLAVE
> Root password of gluster03 is required to complete the setup. NOTE:
> Password will not be stored.
>
> root@gluster03's password:
> [OK] gluster03 is Reachable(Port 22)
> [OK] SSH Connection established root@gluster03
> [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> [OK] Common secret pub file present at
> /var/lib/glusterd/geo-replication/common_secret.pem.pub
> [OK] common_secret.pem.pub file copied to gluster03
> [OK] Master SSH Keys copied to all Up Slave nodes
> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys
> file
> [OK] Geo-replication Session Established
>
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I get Faulty
> status in gluster01
>
> There's the log
>
>
> [2020-10-26 20:16:41.362584] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change [{status=Initializing...}]
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor:
> starting gsyncd worker [{brick=/DATA/master01-data},
> {slave_node=gluster03}]
> [2020-10-26 20:16:41.508884] I [resource(worker
> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection
> between master and slave.
> ..
> [2020-10-26 20:16:42.996678] I [resource(worker
> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between
> master and slave established.
> [{duration=1.4873}]
> [2020-10-26 20:16:42.997121] I [resource(worker
> /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume
> locally...
> [2020-10-26 20:16:44.170661] E [syncdutils(worker
> /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr
> value
> [2020-10-26 20:16:44.171281] I [resource(worker
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
> [{duration=1.1739}]
> [2020-10-26 20:16:44.171772] I [subcmds(worker
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
> Acknowledging back to monitor
> [2020-10-26 20:16:46.200603] I [master(worker
> /DATA/master01-data):1645:register] _GMaster: Working dir
> [{path=/var/lib/misc/gluster/gsyncd/DATA_glu
> ster03_DATA-SLAVE/DATA-master01-data}]
> [2020-10-26 20:16:46.201798] I [resource(worker
> /DATA/master01-data):1292:service_loop] GLUSTER: Register time
> [{time=1603743406}]
> [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker
> /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change
> [{status=Active}]
> [2020-10-26 20:16:46.395112] I [gsyncdstatus(worker
> /DATA/master01-data):253:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change [{status=His
> tory Crawl}]
> [2020-10-26 20:16:46.396491] I [master(worker
> /DATA/master01-data):1559:crawl] _GMaster: starting history crawl
> [{turns=1}, {stime=(1603742506, 0)},
> {etime=1603743406}, 

[Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Gilberto Nunes
Hi there...

I'd created a 2-node gluster volume plus another gluster server acting as a backup
server, using geo-replication.
So on gluster01 I issued the commands:

gluster peer probe gluster02;gluster peer probe gluster03
gluster vol create DATA replica 2 gluster01:/DATA/master01-data
gluster02:/DATA/master01-data/

Then in gluster03 server:

gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/

I'd set up passwordless SSH sessions between these 3 servers.

Then I used this script

https://github.com/gilbertoferreira/georepsetup

like this

georepsetup
/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
georepsetup: error: too few arguments
gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
Geo-replication session will be established between DATA and
gluster03::DATA-SLAVE
Root password of gluster03 is required to complete the setup. NOTE:
Password will not be stored.

root@gluster03's password:
[OK] gluster03 is Reachable(Port 22)
[OK] SSH Connection established root@gluster03
[OK] Master Volume and Slave Volume are compatible (Version: 8.2)
[OK] Common secret pub file present at
/var/lib/glusterd/geo-replication/common_secret.pem.pub
[OK] common_secret.pem.pub file copied to gluster03
[OK] Master SSH Keys copied to all Up Slave nodes
[OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file
[OK] Geo-replication Session Established

Then I reboot the 3 servers...
After a while everything works ok, but after a few minutes, I get Faulty
status in gluster01

There's the log


[2020-10-26 20:16:41.362584] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
Change [{status=Initializing...}]
[2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor:
starting gsyncd worker [{brick=/DATA/master01-data},
{slave_node=gluster03}]
[2020-10-26 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2020-10-26 20:16:42.996678] I [resource(worker
/DATA/master01-data):1436:connect_remote] SSH: SSH connection between
master and slave established.
[{duration=1.4873}]
[2020-10-26 20:16:42.997121] I [resource(worker
/DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2020-10-26 20:16:44.170661] E [syncdutils(worker
/DATA/master01-data):110:gf_mount_ready] : failed to get the xattr
value
[2020-10-26 20:16:44.171281] I [resource(worker
/DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.1739}]
[2020-10-26 20:16:44.171772] I [subcmds(worker
/DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
Acknowledging back to monitor
[2020-10-26 20:16:46.200603] I [master(worker /DATA/master01-data):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
[2020-10-26 20:16:46.201798] I [resource(worker
/DATA/master01-data):1292:service_loop] GLUSTER: Register time
[{time=1603743406}]
[2020-10-26 20:16:46.226415] I [gsyncdstatus(worker
/DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2020-10-26 20:16:46.395112] I [gsyncdstatus(worker /DATA/master01-data):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2020-10-26 20:16:46.396491] I [master(worker
/DATA/master01-data):1559:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1603742506, 0)},
{etime=1603743406}, {entry_stime=(1603743226, 0)}]
[2020-10-26 20:16:46.399292] E [resource(worker /DATA/master01-data):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Sucesso}]
[2020-10-26 20:16:47.177205] I [monitor(monitor):228:monitor] Monitor:
worker died in startup phase [{brick=/DATA/master01-data}]
[2020-10-26 20:16:47.184525] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
Change [{status=Faulty}]


Any advice will be welcome.

Thanks

---
Gilberto Nunes Ferreira




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 3:39 PM, Diego Zuccato  
wrote:

> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)

Yes, I also too often rely on history ;)

> gluster volume remove-brick BigVol replica 2 
> str957-biostq:/srv/arbiters/{00..27}/BigVol force

Thanks for the info, it looks like I was missing the "replica 2" part of the command.

> gluster peer detach str957-biostq
> gluster peer probe str957-biostq

Do I really need to do a detach and re-probe the arbiter node? I would like to
avoid that because I have two other volumes with even more files... so that
would mean that I have to remove the arbiter brick of the two other volumes
too...

> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).

I have added an extra 4 GB of RAM just in case.




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 2:56 PM, Diego Zuccato  
wrote:

> The volume is built by 26 10TB disks w/ genetic data. I currently don't
> have exact numbers, but it's still at the beginning, so there are a bit
> less than 10TB actually used.
> But you're only removing the arbiters, you always have two copies of
> your files. The worst that can happen is a split brain condition
> (avoidable by requiring a 2-nodes quorum, in that case the worst is that
> the volume goes readonly).

Right, seen like that this sounds reasonable. Do you actually remember the
exact command you ran in order to remove the brick? I was thinking this should
be it:

gluster volume remove-brick   force

but should I use "force" or "start"?

> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

That's quite long I must say and I am in the same case as you, my arbiter is a 
VM.




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
On Monday, October 26, 2020 11:34 AM, Diego Zuccato  
wrote:

> IIRC it's the same issue I had some time ago.
> I solved it by "degrading" the volume to replica 2, then cleared the
> arbiter bricks and upgraded again to replica 3 arbiter 1.

Thanks Diego for pointing out this workaround. How much data do you have on
that volume in terms of TB and files? Because I have around 3TB of data in 10
million files. So I am a bit worried about taking such drastic measures.

How bad was the load on your volume after re-adding the arbiter brick? And how
long did it take to sync/heal?

Would another workaround such as turning off quotas on that problematic volume 
work? That sounds much less scary but I don't know if that would work...




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread Diego Zuccato
On 26/10/20 15:09, mabi wrote:

> Right, seen liked that this sounds reasonable. Do you actually remember the 
> exact command you ran in order to remove the brick? I was thinking this 
> should be it:
> gluster volume remove-brick   force
> but should I use "force" or "start"?
Memory does not serve me well (there are 28 disks, not 26!), but bash
history does :)
# gluster volume remove-brick BigVol replica 2
str957-biostq:/srv/arbiters/{00..27}/BigVol force
# gluster peer detach str957-biostq
# gluster peer probe str957-biostq
# gluster volume add-brick BigVol replica 3 arbiter 1
str957-biostq:/srv/arbiters/{00..27}/BigVol

You obviously have to wait for remove-brick to complete before detaching
arbiter.
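
After the add-brick, the heal queue can be watched until it drains (a sketch):
# gluster volume heal BigVol info summary
# gluster volume heal BigVol statistics heal-count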

>> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
>> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.
> That's quite long I must say and I am in the same case as you, my arbiter is 
> a VM.
Give all the CPU and RAM you can. Less than 8GB RAM is asking for
troubles (in my case).

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread Diego Zuccato
On 26/10/20 14:46, mabi wrote:

>> I solved it by "degrading" the volume to replica 2, then cleared the
>> arbiter bricks and upgraded again to replica 3 arbiter 1.
> Thanks Diego for pointing out this workaround. How much data do you have on 
> that volume in terms of TB and files? Because I have around 3TB of data in 10 
> million files. So I am a bit worried of taking such drastic measures.
The volume is built from 26 10TB disks holding genetic data. I currently don't
have exact numbers, but it's still at the beginning, so there are a bit
less than 10TB actually used.
But you're only removing the arbiters, you always have two copies of
your files. The worst that can happen is a split brain condition
(avoidable by requiring a 2-nodes quorum, in that case the worst is that
the volume goes readonly).

> How bad was the load after on your volume when re-adding the arbiter brick? 
> and how long did it take to sync/heal?
IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

> Would another workaround such as turning off quotas on that problematic 
> volume work? That sounds much less scary but I don't know if that would 
> work...
I don't know, sorry.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterfs as database store

2020-10-26 Thread Alex K
Hi Strahil,

Thanks for your feedback.
I had already received your feedback, which seems to be very useful.
You had pointed at the /var/lib/glusterd/groups/db-workload profile, which
includes recommended gluster volume settings for such workloads (including
direct I/O).
I will be testing this setup, though I expect no issues apart from slower
performance than a native setup.
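
If I understood that correctly, applying the profile should just be (a sketch, with a placeholder volume name; the group file simply bundles the individual volume options):

gluster volume set myvol group db-workload
gluster volume get myvol all

The second command is only there to inspect which options the group actually changed.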

On Sun, Oct 25, 2020 at 9:45 PM Strahil Nikolov 
wrote:

> Hey Alex,
>
> sorry for the late reply - seems you went to the SPAM dir.
>
> I think that a DB with direct I/O won't have any issues with Gluster. As a
> second thought, DBs know their data file names, so even 1 file per table
> will work quite OK.
>

> But you will need a lot of testing before putting something into
> production.
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Monday, 12 October 2020 at 21:10:03 GMT+3, Alex K <rightkickt...@gmail.com> wrote:
>
>
>
>
>
>
>
> On Mon, Oct 12, 2020, 19:24 Strahil Nikolov  wrote:
> > Hi Alex,
> >
> > I can share that oVirt is using Gluster as a HCI solution and many
> people are hosting DBs in their Virtual Machines. Yet, oVirt bypasses any
> file system caches and uses Direct I/O in order to ensure consistency.
> >
> > As you will be using pacemaker, drbd is a viable solution that can be
> controlled easily.
> Thank you Strahil. I am using ovirt with glusterfs successfully for the
> last 5 years and I'm very happy about it. Though the vms gluster volume has
> sharding enabled by default and I suspect this is different if you run DB
> directly on top glusterfs. I assume there are optimizations one could apply
> at gluster volumes (use direct io?, small file workload optimizations, etc)
> and was hoping that there were success stories of DBs on top glusterfs.  I
> might go with drbd as the latest version is much more scalable and
> simplified.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >
> >
> >
> >
> >
> >
> > On Monday, 12 October 2020 at 12:12:18 GMT+3, Alex K <rightkickt...@gmail.com> wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Oct 12, 2020 at 9:47 AM Diego Zuccato 
> wrote:
> >> On 10/10/20 16:53, Alex K wrote:
> >>
> >>> Reading from the docs i see that this is not recommended?
> >> IIUC the risk of having partially-unsynced data is is too high.
> >> DB replication is not easy to configure because it's hard to do well,
> >> even active/passive.
> >> But I can tell you that a 3-node mariadb (galera) cluster is not hard to
> >> setup. Just follow one of the tutorials. It's nearly as easy as setting
> >> up a replica3 gluster volume :)
> >> And "guarantees" consinstency in the DB data.
> > I see. Since I will not have only mariadb, then I have to setup the same
> replication for postgresql and later influxdb, which adds into the
> complexity.
> > For cluster management I will be using pacemaker/corosync.
> >
> > Thanx for your feedback
> >
> >>
> >> --
> >> Diego Zuccato
> >> DIFA - Dip. di Fisica e Astronomia
> >> Servizi Informatici
> >> Alma Mater Studiorum - Università di Bologna
> >> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >> tel.: +39 051 20 95786
> >>
> > 
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] missing files on FUSE mount

2020-10-26 Thread Martín Lorenzo
Hi Strahil, thanks for your reply.
I had one node with 13 clients and the rest with 14. I've just restarted the
services on that node; now I have 14, let's see what happens.
Regarding the samba repos, I wasn't aware of that, I was using the CentOS main
repo. I'll check them out.
Best Regards,
Martin


On Tue, Oct 20, 2020 at 3:19 PM Strahil Nikolov 
wrote:

> Do you have the same ammount of clients connected to each brick ?
>
> I guess something like this can show it:
>
> gluster volume status VOL clients
> gluster volume status VOL client-list
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Tuesday, 20 October 2020 at 15:41:45 GMT+3, Martín Lorenzo <mlore...@gmail.com> wrote:
>
>
>
>
>
> Hi, I have the following problem, I have a distributed replicated cluster
> set up with samba and CTDB, over fuse mount points
> I am having inconsistencies across the FUSE mounts, users report that
> files are disappearing after being copied/moved. I take a look at the mount
> points on each node, and they don't display the same data
>
>  faulty mount point
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file or
> directory
> ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file or
> directory
> total 633723
> drwxr-xr-x. 5 arribagente PN  4096 Oct 19 10:52 COMERCIAL AG martes 20
> de octubre
> -rw-r--r--. 1 arribagente PN 648927236 Jun  3 07:16 PANEO FACHADA PALACIO
> LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -?? ? ?   ?  ?? PANEO NIÑOS ESCUELAS
> CON TAPABOCAS.mpg
> -?? ? ?   ?  ?? PANEO VUELTA A CLASES
> CON TAPABOCAS.mpg
>
>
> ###healthy mount point###
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435596
> drwxr-xr-x. 5 arribagente PN   4096 Oct 19 10:52 COMERCIAL AG martes
> 20 de octubre
> -rw-r--r--. 1 arribagente PN  648927236 Jun  3 07:16 PANEO FACHADA PALACIO
> LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN 2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS
> CON TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN  784701444 Sep  4 07:23 PANEO VUELTA A CLASES
> CON TAPABOCAS.mpg
>
>  - So far the only way to solve this is to create a directory in the
> healthy mount point, on the same path:
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola
>
> - When you refresh the other mount point, the issue is resolved:
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435600
> drwxr-xr-x. 5 arribagente PN 4096 Oct 19 10:52 COMERCIAL AG martes
> 20 de octubre
> drwxr-xr-x. 2 rootroot   4096 Oct 20 08:45 hola
> -rw-r--r--. 1 arribagente PN648927236 Jun  3 07:16 PANEO FACHADA
> PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS
> ESCUELAS CON TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN784701444 Sep  4 07:23 PANEO VUELTA A
> CLASES CON TAPABOCAS.mpg
>
> Interestingly, the error occurs on the mount point where the files were
> copied. They don't show up as pending heal entries. I have around 15 people
> using them over samba, I think I'm having this issue reported every two
> days.
>
> I have an older cluster with similar issues, a different gluster version,
> but a very similar topology (4 bricks, initially two bricks then expanded).
> Please note, the bricks aren't the same size (but their replicas are), so
> my other suspicion is that rebalancing has something to do with it.
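>
> Two things probably worth capturing the next time it happens (a sketch; the volume name is the one shown in the cluster details below):
>
> gluster volume rebalance tapeless status
> gluster volume status tapeless client-list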
>
> I'm trying to reproduce it over a small virtualized cluster, so far no
> results.
>
> Here are the cluster details
> four nodes, replica 2, plus one arbiter hosting 2 bricks
>
> I have 2 bricks with ~20 TB capacity and the other pair is ~48TB
> Volume Name: tapeless
> Type: Distributed-Replicate
> Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
> Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
> Brick3: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick
> (arbiter)
> Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
> Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
> Brick6: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick
> (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> features.quota: on
> features.inode-quota: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.cache-samba-metadata: on
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 20
> performance.nl-cache: on
> performance.nl-cache-timeout: 600
> 

Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
Dear all,

Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to 7.8, but
now, 1 week after the upgrade, I have rebooted my third node (arbiter
node) and unfortunately the bricks do not want to come up on that node. I get
the same error message as before:

[2020-10-26 06:21:59.726705] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote 
cksum = 66908910 on peer node2.domain
[2020-10-26 06:21:59.726871] I [MSGID: 106493] 
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded 
to node2.domain (0), ret: 0, op_ret: -1
[2020-10-26 06:21:59.728164] I [MSGID: 106490] 
[glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd: 
Received probe from uuid: 5f4ccbf4-33f6-4298-8b31-213553223349
[2020-10-26 06:21:59.728969] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote 
cksum = 66908910 on peer node1.domain
[2020-10-26 06:21:59.729099] I [MSGID: 106493] 
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded 
to node1.domain (0), ret: 0, op_ret: -1

Can someone please advise what I need to do in order to have my arbiter node up 
and running again as soon as possible?

Thank you very much in advance for your help.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Monday, September 7, 2020 5:49 AM, Sanju Rakonde  wrote:

> Hi,
>
> issue https://github.com/gluster/glusterfs/issues/1332 is fixed now with 
> https://github.com/gluster/glusterfs/commit/865cca1190e233381f975ff36118f46e29477dcf.
>
> It will be backported to release-7 and release-8 branches soon.
>
> On Mon, Sep 7, 2020 at 1:14 AM Strahil Nikolov  wrote:
>
>> Your e-mail got in the spam...
>>
>> If you haven't fixed the issue, check Hari's topic about quota issues (based 
>> on the error message you provided) : 
>> https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a
>>
>> Most probably there is a quota issue and you need to fix it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sunday, 23 August 2020 at 11:05:27 GMT+3, mabi wrote:
>>
>> Hello,
>>
>> So to be precise I am exactly having the following issue:
>>
>> https://github.com/gluster/glusterfs/issues/1332
>>
>> I could not wait any longer to find some workarounds or quick fixes, so I
>> decided to downgrade my rejected node from 7.7 back to 6.9, which worked.
>>
>> I would be really glad if someone could fix this issue or provide me a 
>> workaround which works because version 6 of GlusterFS is not supported 
>> anymore so I would really like to move on to the stable version 7.
>>
>> Thank you very much in advance.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>>
>> On Saturday, August 22, 2020 7:53 PM, mabi  wrote:
>>
>>> Hello,
>>>
>>> I just started an upgrade of my 3-node replica (incl. arbiter) of GlusterFS
>>> from 6.9 to 7.7, but unfortunately after upgrading the first node, that node
>>> gets rejected due to the following error:
>>>
>>> [2020-08-22 17:43:00.240990] E [MSGID: 106012] 
>>> [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums 
>>> of quota configuration of volume myvolume differ. local cksum = 3013120651, 
>>> remote cksum = 0 on peer myfirstnode.domain.tld
>>>
>>> So glusterd process is running but not glusterfsd.
>>>
>>> I am exactly in the same issue as described here:
>>>
>>> https://www.gitmemory.com/Adam2Marsh
>>>
>>> But I do not see any solutions or workaround. So now I am stuck with a 
>>> degraded GlusterFS cluster.
>>>
>>> Could someone please advise me as soon as possible on what I should do? Is 
>>> there maybe any workarounds?
>>>
>>> Thank you very much in advance for your response.
>>>
>>> Best regards,
>>> Mabi
>>
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Thanks,
> Sanju



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread Diego Zuccato
On 26/10/20 07:40, mabi wrote:

> Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to
> 7.8 but now, 1 week later after the upgrade, I have rebooted my third
> node (arbiter node) and unfortunately the bricks do not want to come up
> on that node. I get the same following error message:
IIRC it's the same issue I had some time ago.
I solved it by "degrading" the volume to replica 2, then cleared the
arbiter bricks and upgraded again to replica 3 arbiter 1.
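
In outline that was (a sketch with placeholder names; adjust volume, host and brick paths to your layout):

gluster volume remove-brick VOLNAME replica 2 arbiterhost:/path/to/arbiter-brick force
(then wipe the old arbiter brick directory, including its .glusterfs subdirectory, before re-adding it)
gluster volume add-brick VOLNAME replica 3 arbiter 1 arbiterhost:/path/to/arbiter-brick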

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users