Re: [Gluster-users] Failed to get quota limits

2018-02-13 Thread mabi
mits for a27818fe-0248-40fe-bb23-d43d61010478
[2018-02-13 08:16:14.082067] E 
[cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
quota limits for daf97388-bcec-4cc0-a8ef-5b93f05b30f6
[2018-02-13 08:16:14.086929] E 
[cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
quota limits for 3c768b36-2625-4509-87ef-fe5214cb9b01
[2018-02-13 08:16:14.087905] E 
[cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
quota limits for f8cf47d4-4f54-43c5-ab0d-75b45b4677a3
[2018-02-13 08:16:14.089788] E 
[cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
quota limits for b4c81a39-2152-45c5-95d3-b796d88226fe
[2018-02-13 08:16:14.092919] E 
[cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
quota limits for 16ac4cde-a5d4-451f-adcc-422a542fea24
[2018-02-13 08:16:14.092980] I [input.c:31:cli_batch] 0-: Exiting with: 0

*** /var/log/glusterfs/bricks/data-myvolume-brick.log ***

[2018-02-13 08:16:13.948065] I [addr.c:182:gf_auth] 0-/data/myvolume/brick: 
allowed = "*", received addr = "127.0.0.1"
[2018-02-13 08:16:13.948105] I [login.c:76:gf_auth] 0-auth/login: allowed user 
names: bea3e634-e174-4bb3-a1d6-25b09d03b536
[2018-02-13 08:16:13.948125] I [MSGID: 115029] 
[server-handshake.c:695:server_setvolume] 0-myvolume-server: accepted client 
from gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0 (version: 
3.10.7)
[2018-02-13 08:16:14.022257] I [MSGID: 115036] [server.c:559:server_rpc_notify] 
0-myvolume-server: disconnecting connection from 
gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0
[2018-02-13 08:16:14.022465] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 
0-myvolume-server: Shutting down connection 
gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0

 Original Message 
On February 13, 2018 12:47 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> Hi,
>
> Can you provide more information like, the volume configuration, quota.conf 
> file and the log files.
>
> On Sat, Feb 10, 2018 at 1:05 AM, mabi <m...@protonmail.ch> wrote:
>> Hello,
>>
>> I am running GlusterFS 3.10.7 and just noticed by doing a "gluster volume 
>> quota  list" that my quotas on that volume are broken. The command 
>> returns no output and no errors but by looking in /var/log/glusterfs.cli I 
>> found the following errors:
>>
>> [2018-02-09 19:31:24.242324] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for 3df709ee-641d-46a2-bd61-889583e3033c
>> [2018-02-09 19:31:24.249790] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for a27818fe-0248-40fe-bb23-d43d61010478
>> [2018-02-09 19:31:24.252378] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for daf97388-bcec-4cc0-a8ef-5b93f05b30f6
>> [2018-02-09 19:31:24.256775] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for 3c768b36-2625-4509-87ef-fe5214cb9b01
>> [2018-02-09 19:31:24.257434] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for f8cf47d4-4f54-43c5-ab0d-75b45b4677a3
>> [2018-02-09 19:31:24.259126] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for b4c81a39-2152-45c5-95d3-b796d88226fe
>> [2018-02-09 19:31:24.261664] E 
>> [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get 
>> quota limits for 16ac4cde-a5d4-451f-adcc-422a542fea24
>> [2018-02-09 19:31:24.261719] I [input.c:31:cli_batch] 0-: Exiting with: 0
>>
>> How can I fix my quota on that volume again? I had around 30 quotas set on 
>> different directories of that volume.
>>
>> Thanks in advance.
>>
>> Regards,
>> M.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Regards,
> Hari Gowtham.___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Failed to get quota limits

2018-02-13 Thread mabi
Thank you for your answer. This problem seems to have started last week, so
should I also send you the same log files for last week? I think logrotate
rotates them on a weekly basis.

The only two quota commands we use are the following:

gluster volume quota myvolume limit-usage /directory 10GB
gluster volume quota myvolume list

basically to set a new quota or to list the current quotas. The quota list was
working in the past, yes, but we already had a similar issue where the quotas
disappeared in August 2017:

http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html

In the meantime the only thing we did was to upgrade from 3.8 to 3.10.

There are actually no errors to be seen when using any gluster commands. The
"quota myvolume list" command simply returns nothing.

In order to look up the directories, should I run a "stat" on them? And if yes,
should I do that on a client through the FUSE mount?
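In case it helps, this is the kind of lookup I had in mind, assuming the volume
is mounted on the client under /mnt/myvolume and /directory is one of the
directories with a quota (both names are only examples):

# on the client, through the FUSE mount, trigger a lookup of one quota directory
stat /mnt/myvolume/directory
# or, for a list of known quota directories kept one per line in a file
while read -r dir; do stat "/mnt/myvolume${dir}"; done < quota-dirs.txt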
​

 Original Message 
 On February 13, 2018 10:58 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> The logs provided are from the 11th; you had seen the issue a while before
> that.
>
> The logs help us to know if something has actually gone wrong.
> Once something goes wrong the output might get affected, and I need to know
> what went wrong. This means I need the logs from the beginning.
>
> And I need to know a few more things:
> Was the quota list command working as expected at the beginning?
> If yes, what commands were issued before you noticed this problem?
> Is there any other error that you see other than this one?
>
> And can you try looking up the directories the limits are set on and
> check if that fixes the error?
>
>> Original Message 
>> On February 13, 2018 10:44 AM, mabi m...@protonmail.ch wrote:
>>>Hi Hari,
>>>Sure no problem, I will send you in a minute another mail where you can 
>>>download all the relevant log files including the quota.conf binary file. 
>>>Let me know if you need anything else. In the mean time here below is the 
>>>output of a volume status.
>>>Best regards,
>>> M.
>>>Status of volume: myvolume
>>> Gluster process TCP Port  RDMA Port  Online  Pid
>>>Brick gfs1a.domain.local:/data/myvolume
>>> /brick  49153 0  Y   3214
>>> Brick gfs1b.domain.local:/data/myvolume
>>> /brick  49154 0  Y   3256
>>> Brick gfs1c.domain.local:/srv/glusterf
>>> s/myvolume/brick 49153 0  Y   515
>>> Self-heal Daemon on localhost   N/A   N/AY   
>>> 3186
>>> Quota Daemon on localhost   N/A   N/AY   
>>> 3195
>>> Self-heal Daemon on gfs1b.domain.local N/A   N/AY   3217
>>> Quota Daemon on gfs1b.domain.local N/A   N/AY   3229
>>> Self-heal Daemon on gfs1c.domain.local N/A   N/AY   486
>>> Quota Daemon on gfs1c.domain.local N/A   N/AY   495
>>>Task Status of Volume myvolume
>>>There are no active volume tasks
>>> Original Message 
>>> On February 13, 2018 10:09 AM, Hari Gowtham hgowt...@redhat.com wrote:
>>>>Hi,
>>>> A part of the log won't be enough to debug the issue.
>>>> Need the whole log messages till date.
>>>> You can send it as attachments.
>>>> Yes the quota.conf is a binary file.
>>>> And I need the volume status output too.
>>>> On Tue, Feb 13, 2018 at 1:56 PM, mabi m...@protonmail.ch wrote:
>>>>>Hi Hari,
>>>>> Sorry for not providing you more details from the start. Here below you 
>>>>> will
>>>>> find all the relevant log entries and info. Regarding the quota.conf file 
>>>>> I
>>>>> have found one for my volume but it is a binary file. Is it supposed to be
>>>>> binary or text?
>>>>> Regards,
>>>>> M.
>>>>> *** gluster volume info myvolume ***
>>>>> Volume Name: myvolume
>>>>> Type: Replicate
>>>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gfs1a.domain.local:/data/myvolume/brick
>>>>> Brick2: gfs1b.domain.local:/data/myvolume/brick
>>>>> Brick3: gfs1c.domain.local:/srv/

Re: [Gluster-users] Failed to get quota limits

2018-02-13 Thread mabi
I tried to set the limits as you suggested by running the following command.

$ sudo gluster volume quota myvolume limit-usage /directory 200GB
volume quota : success

but then when I list the quotas there is still nothing, so nothing really 
happened.

I also tried to run stat on all directories which have a quota but nothing 
happened either.

I will send you all the other log files tomorrow, as requested.
​

 Original Message 
 On February 13, 2018 12:20 PM, Hari Gowtham <hgowt...@redhat.com> wrote:

>Were you able to set new limits after seeing this error?
>
> On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com wrote:
>>Yes, I need the log files in that duration, the log rotated file after
>> hitting the
>> issue aren't necessary, but the ones before hitting the issues are needed
>> (not just when you hit it, the ones even before you hit it).
>>Yes, you have to do a stat from the client through fuse mount.
>>On Tue, Feb 13, 2018 at 3:56 PM, mabi m...@protonmail.ch wrote:
>>>Thank you for your answer. This problem seem to have started since last 
>>>week, so should I also send you the same log files but for last week? I 
>>>think logrotate rotates them on a weekly basis.
>>>The only two quota commands we use are the following:
>>>gluster volume quota myvolume limit-usage /directory 10GB
>>> gluster volume quota myvolume list
>>>basically to set a new quota or to list the current quotas. The quota list 
>>>was working in the past yes but we already had a similar issue where the 
>>>quotas disappeared last August 2017:
>>>http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html
>>>In the mean time the only thing we did is to upgrade from 3.8 to 3.10.
>>>There are actually no errors to be seen using any gluster commands. The 
>>>"quota myvolume list" returns simply nothing.
>>>In order to lookup the directories should I run a "stat" on them? and if yes 
>>>should I do that on a client through the fuse mount?
>>> Original Message 
>>> On February 13, 2018 10:58 AM, Hari Gowtham hgowt...@redhat.com wrote:
>>>>The log provided are from 11th, you have seen the issue a while before
>>>> that itself.
>>>>The logs help us to know if something has actually went wrong.
>>>> once something goes wrong the output might get affected and i need to know 
>>>> what
>>>> went wrong. Which means i need the log from the beginning.
>>>>and i need to know a few more things,
>>>> Was the quota list command was working as expected at the beginning?
>>>> If yes, what were the commands issued, before you noticed this problem.
>>>> Is there any other error that you see other than this?
>>>>And can you try looking up the directories the limits are set on and
>>>> check if that fixes the error?
>>>>> Original Message 
>>>>> On February 13, 2018 10:44 AM, mabi m...@protonmail.ch wrote:
>>>>>>Hi Hari,
>>>>>> Sure no problem, I will send you in a minute another mail where you can 
>>>>>> download all the relevant log files including the quota.conf binary 
>>>>>> file. Let me know if you need anything else. In the mean time here below 
>>>>>> is the output of a volume status.
>>>>>> Best regards,
>>>>>> M.
>>>>>> Status of volume: myvolume
>>>>>> Gluster process TCP Port  RDMA Port  Online  
>>>>>> Pid
>>>>>> Brick gfs1a.domain.local:/data/myvolume
>>>>>> /brick  49153 0  Y   3214
>>>>>> Brick gfs1b.domain.local:/data/myvolume
>>>>>> /brick  49154 0  Y   3256
>>>>>> Brick gfs1c.domain.local:/srv/glusterf
>>>>>> s/myvolume/brick 49153 0  Y   515
>>>>>> Self-heal Daemon on localhost   N/A   N/AY   
>>>>>> 3186
>>>>>> Quota Daemon on localhost   N/A   N/AY   
>>>>>> 3195
>>>>>> Self-heal Daemon on gfs1b.domain.local N/A   N/AY   3217
>>>>>> Quota Daemon on gfs1b.domain.local N/A   N/AY   3229
>>>>>> Self-heal Daemon on gfs1c.domain.local N/A   N/AY   486
>>>>>> Quota Daemon on gfs1c.domain.l

Re: [Gluster-users] Failed to get quota limits

2018-02-24 Thread mabi
Dear Hari,

Thank you for getting back to me after having analysed the problem.

As you suggested I tried to run "gluster volume quota  list " for all of my
directories which have a quota and found out that one directory quota was
missing (stale), as you can see below:

$ gluster volume quota myvolume list /demo.domain.tld
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/demo.domain.tld                                N/A         N/A     8.0MB        N/A                  N/A                  N/A

So as you suggested I added the quota on that directory again, and now the
"list" finally works again and shows the quotas for every directory as I defined
them. That did the trick!
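For anyone else hitting this, the check and fix that worked here boil down to
the following two commands (the 10GB size is only an example value):

# querying the stale path explicitly shows only N/A values, as above
gluster volume quota myvolume list /demo.domain.tld
# re-adding a limit on that path rewrites the entry and makes the plain "list" work again
gluster volume quota myvolume limit-usage /demo.domain.tld 10GB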

Now, do you know if this bug is already corrected in a newer release of
GlusterFS? If not, do you know when it will be fixed?

Again many thanks for your help here!

Best regards,
M.

‐‐‐ Original Message ‐‐‐

On February 23, 2018 7:45 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> ​​
> 
> Hi,
> 
> There is a bug in 3.10 which doesn't allow the quota list command to
> 
> output, if the last entry on the conf file is a stale entry.
> 
> The workaround for this is to remove the stale entry at the end. (If
> 
> the last two entries are stale then both have to be removed and so on
> 
> until the last entry on the conf file is a valid entry).
> 
> This can be avoided by adding a new limit. As the new limit you added
> 
> didn't work there is another way to check this.
> 
> Try quota list command with a specific limit mentioned in the command.
> 
> gluster volume quota  list 
> 
> Make sure this path and the limit are set.
> 
> If this works then you need to clean up the last stale entry.
> 
> If this doesn't work we need to look further.
> 
> Thanks Sanoj for the guidance.
> 
> On Wed, Feb 14, 2018 at 1:36 AM, mabi m...@protonmail.ch wrote:
> 
> > I tried to set the limits as you suggest by running the following command.
> > 
> > $ sudo gluster volume quota myvolume limit-usage /directory 200GB
> > 
> > volume quota : success
> > 
> > but then when I list the quotas there is still nothing, so nothing really 
> > happened.
> > 
> > I also tried to run stat on all directories which have a quota but nothing 
> > happened either.
> > 
> > I will send you tomorrow all the other logfiles as requested.
> > 
> > -------- Original Message 
> > 
> > On February 13, 2018 12:20 PM, Hari Gowtham hgowt...@redhat.com wrote:
> > 
> > > Were you able to set new limits after seeing this error?
> > > 
> > > On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com wrote:
> > > 
> > > > Yes, I need the log files in that duration, the log rotated file after
> > > > 
> > > > hitting the
> > > > 
> > > > issue aren't necessary, but the ones before hitting the issues are 
> > > > needed
> > > > 
> > > > (not just when you hit it, the ones even before you hit it).
> > > > 
> > > > Yes, you have to do a stat from the client through fuse mount.
> > > > 
> > > > On Tue, Feb 13, 2018 at 3:56 PM, mabi m...@protonmail.ch wrote:
> > > > 
> > > > > Thank you for your answer. This problem seem to have started since 
> > > > > last week, so should I also send you the same log files but for last 
> > > > > week? I think logrotate rotates them on a weekly basis.
> > > > > 
> > > > > The only two quota commands we use are the following:
> > > > > 
> > > > > gluster volume quota myvolume limit-usage /directory 10GB
> > > > > 
> > > > > gluster volume quota myvolume list
> > > > > 
> > > > > basically to set a new quota or to list the current quotas. The quota 
> > > > > list was working in the past yes but we already had a similar issue 
> > > > > where the quotas disappeared last August 2017:
> > > > > 
> > > > > http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html
> > > > > 
> > > > > In the mean time the only thing we did is to upgrade from 3.8 to 3.10.
> > > > > 
> > > > > There are actually no errors to be seen using any gluster commands. 
> > > > > The "quota myvolume list" returns simply n

Re: [Gluster-users] glustereventsd not being stopped by systemd script

2018-07-30 Thread mabi
Hi Aravinda,

Thanks for the info, somehow I wasn't aware of this new service. Now it's
clear and I have updated my documentation.
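For my documentation the extra step boils down to something like this (as you
point out below, glustereventsd is managed independently of glusterd):

systemctl stop glustereventsd
# verify that no eventsd processes are left behind
pgrep -a -f glustereventsd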

Best regards,
M.

‐‐‐ Original Message ‐‐‐
On July 30, 2018 5:59 AM, Aravinda Vishwanathapura Krishna Murthy 
 wrote:

> On Mon, Jul 30, 2018 at 1:03 AM mabi  wrote:
>
>> Hi,
>>
>> I just noticed that when I run a "systemctl stop glusterfs" on Debian 9 the 
>> following glustereventsd processes are still running:
>>
>> root  2471 1  0 22:03 ?00:00:00 python 
>> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root  2489  2471  0 22:03 ?00:00:00 python 
>> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>>
>> Isn't the glusterfs systemd command also supposed to stop these?
>
> glustereventsd is a separate process which can be managed independent of 
> glusterd. "systemctl stop glustereventsd" will stop the eventsd service.
>
>> I ran into this while upgrading from 3.12.9 to 3.12.12 and I thought I would 
>> mention it in case it has been forgotten.
>>
>> Best regards,
>> M.
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> regards
> Aravinda VK___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glustereventsd not being stopped by systemd script

2018-07-29 Thread mabi
Hi,

I just noticed that when I run a "systemctl stop glusterfs" on Debian 9 the 
following glustereventsd processes are still running:

root      2471     1  0 22:03 ?        00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root      2489  2471  0 22:03 ?        00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid

Isn't the glusterfs systemd command also supposed to stop these?

I ran into this while upgrading from 3.12.9 to 3.12.12 and I thought I would 
mention it in case it has been forgotten.

Best regards,
M.



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-08-09 Thread mabi
Hello,

I recently upgraded my GlusterFS replica 2+1 (arbiter) to version 3.12.12 and
now I see a weird behaviour on my client (using a FUSE mount) where I have
processes (PHP 5.6 FPM) trying to access a specific directory, after which the
process blocks. I can't kill the process either, not even with kill -9. I need
to reboot the machine in order to get rid of these blocked processes.
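For completeness, this is roughly how I spot the stuck processes, a generic
sketch not specific to GlusterFS:

# list processes in uninterruptible sleep (state D) and the kernel function they are waiting in
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'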

This directory has one particularity compared to the other directories: it has
reached its quota soft-limit, as you can see here in the output of gluster
volume quota list:

                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/directory                                  100.0GB  80%(80.0GB)    90.5GB     9.5GB                  Yes                   No

That does not mean that it is the quota's fault, but it might be a hint where
to start looking... And by the way, can someone explain to me what the
soft-limit does? Or does it not do anything special?

Here is the Linux kernel stack of a blocked process on that directory, which 
happened with a simple "ls -la":

[Thu Aug  9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug  9 14:21:07 2018]   Not tainted 3.16.0-4-amd64 #1
[Thu Aug  9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Thu Aug  9 14:21:07 2018] ls  D 88017ef93200 0  2272   
2268 0x0004
[Thu Aug  9 14:21:07 2018]  88017653f490 0286 00013200 
880174d7bfd8
[Thu Aug  9 14:21:07 2018]  00013200 88017653f490 8800eeb3d5f0 
8800fefac800
[Thu Aug  9 14:21:07 2018]  880174d7bbe0 8800eeb3d6d0 8800fefac800 
8800ffe1e1c0
[Thu Aug  9 14:21:07 2018] Call Trace:
[Thu Aug  9 14:21:07 2018]  [] ? 
__fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug  9 14:21:07 2018]  [] ? 
prepare_to_wait_event+0xf0/0xf0
[Thu Aug  9 14:21:07 2018]  [] ? 
fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug  9 14:21:07 2018]  [] ? lookup_fast+0x25e/0x2b0
[Thu Aug  9 14:21:07 2018]  [] ? path_lookupat+0x155/0x780
[Thu Aug  9 14:21:07 2018]  [] ? kmem_cache_alloc+0x75/0x480
[Thu Aug  9 14:21:07 2018]  [] ? fuse_getxattr+0xe9/0x150 
[fuse]
[Thu Aug  9 14:21:07 2018]  [] ? filename_lookup+0x26/0xc0
[Thu Aug  9 14:21:07 2018]  [] ? user_path_at_empty+0x54/0x90
[Thu Aug  9 14:21:07 2018]  [] ? kmem_cache_free+0xd8/0x210
[Thu Aug  9 14:21:07 2018]  [] ? user_path_at_empty+0x5f/0x90
[Thu Aug  9 14:21:07 2018]  [] ? vfs_fstatat+0x46/0x90
[Thu Aug  9 14:21:07 2018]  [] ? SYSC_newlstat+0x1d/0x40
[Thu Aug  9 14:21:07 2018]  [] ? SyS_lgetxattr+0x58/0x80
[Thu Aug  9 14:21:07 2018]  [] ? 
system_call_fast_compare_end+0x10/0x15


My 3 gluster nodes are all Debian 9 and my client Debian 8.

Let me know if you need more information.

Best regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-08-09 Thread mabi
Hi Nithya,

Thanks for the fast answer. Here the additional info:

1. gluster volume info

Volume Name: myvol-private
Type: Replicate
Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1a:/data/myvol-private/brick
Brick2: gfs1b:/data/myvol-private/brick
Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
Options Reconfigured:
features.default-soft-limit: 95%
transport.address-family: inet
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 192.168.100.92

2. Sorry, I have no clue how to take a "statedump" of a process on Linux. Which
command should I use for that? And which process would you like, the blocked
process (for example "ls")?

Regards,
M.

‐‐‐ Original Message ‐‐‐
On August 9, 2018 3:10 PM, Nithya Balachandran  wrote:

> Hi,
>
> Please provide the following:
>
> - gluster volume info
> - statedump of the fuse process when it hangs
>
> Thanks,
> Nithya
>
> On 9 August 2018 at 18:24, mabi  wrote:
>
>> Hello,
>>
>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12 
>> and now I see a weird behaviour on my client (using FUSE mount) where I have 
>> processes (PHP 5.6 FPM) trying to access a specific directory and then the 
>> process blocks. I can't kill the process either, not even with kill -9. I 
>> need to reboot the machine in order to get rid of these blocked processes.
>>
>> This directory has one particularity compared to the other directories it is 
>> that it has reached it's quota soft-limit as you can see here in the output 
>> of gluster volume quota list:
>>
>>   Path   Hard-limit  Soft-limit  Used  
>> Available  Soft-limit exceeded? Hard-limit exceeded?
>> ---
>> /directory  100.0GB 80%(80.0GB)   90.5GB   9.5GB 
>> Yes   No
>>
>> That does not mean that it is the quota's fault but it might be a hint where 
>> to start looking for... And by the way can someone explain me what the 
>> soft-limit does? or does it not do anything special?
>>
>> Here is an the linux stack of a blocking process on that directory which 
>> happened with a simple "ls -la":
>>
>> [Thu Aug  9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 
>> seconds.
>> [Thu Aug  9 14:21:07 2018]   Not tainted 3.16.0-4-amd64 #1
>> [Thu Aug  9 14:21:07 2018] "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Aug  9 14:21:07 2018] ls  D 88017ef93200 0  2272   
>> 2268 0x0004
>> [Thu Aug  9 14:21:07 2018]  88017653f490 0286 
>> 00013200 880174d7bfd8
>> [Thu Aug  9 14:21:07 2018]  00013200 88017653f490 
>> 8800eeb3d5f0 8800fefac800
>> [Thu Aug  9 14:21:07 2018]  880174d7bbe0 8800eeb3d6d0 
>> 8800fefac800 8800ffe1e1c0
>> [Thu Aug  9 14:21:07 2018] Call Trace:
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> __fuse_request_send+0xbd/0x270 [fuse]
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> prepare_to_wait_event+0xf0/0xf0
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> fuse_dentry_revalidate+0x181/0x300 [fuse]
>> [Thu Aug  9 14:21:07 2018]  [] ? lookup_fast+0x25e/0x2b0
>> [Thu Aug  9 14:21:07 2018]  [] ? path_lookupat+0x155/0x780
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> kmem_cache_alloc+0x75/0x480
>> [Thu Aug  9 14:21:07 2018]  [] ? fuse_getxattr+0xe9/0x150 
>> [fuse]
>> [Thu Aug  9 14:21:07 2018]  [] ? filename_lookup+0x26/0xc0
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> user_path_at_empty+0x54/0x90
>> [Thu Aug  9 14:21:07 2018]  [] ? kmem_cache_free+0xd8/0x210
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> user_path_at_empty+0x5f/0x90
>> [Thu Aug  9 14:21:07 2018]  [] ? vfs_fstatat+0x46/0x90
>> [Thu Aug  9 14:21:07 2018]  [] ? SYSC_newlstat+0x1d/0x40
>> [Thu Aug  9 14:21:07 2018]  [] ? SyS_lgetxattr+0x58/0x80
>> [Thu Aug  9 14:21:07 2018]  [] ? 
>> system_call_fast_compare_end+0x10/0x15
>>
>> My 3 gluster nodes are all Debian 9 and my client Debian 8.
>>
>> Let me know if you need more information.
>>
>> Best regards,
>> Mabi
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-08-09 Thread mabi
Thanks for the documentation. On my client using FUSE mount I found the PID by 
using ps (output below):

root       456     1  4 14:17 ?        00:05:15 /usr/sbin/glusterfs --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private

Then I ran the following command:

sudo kill -USR1 456

but now I can't find where the files are stored. Are these supposed to be 
stored on the client directly? I checked /var/run/gluster and /var/log/gluster 
but could not see anything and /var/log/gluster does not even exist on the 
client.
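To summarise the procedure so far, this is what I ran (the PID and volume name
are from my setup above; the dump location is only my assumption):

# find the glusterfs FUSE client process
ps -ef | grep '[g]lusterfs.*myvol-private'
# ask it to dump its state
sudo kill -USR1 456
# the dump should normally appear under /var/run/gluster on the client,
# typically named glusterdump.<pid>.dump.<timestamp>
ls -l /var/run/gluster/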

‐‐‐ Original Message ‐‐‐
On August 9, 2018 3:59 PM, Raghavendra Gowdappa  wrote:

> On Thu, Aug 9, 2018 at 6:47 PM, mabi  wrote:
>
>> Hi Nithya,
>>
>> Thanks for the fast answer. Here the additional info:
>>
>> 1. gluster volume info
>>
>> Volume Name: myvol-private
>> Type: Replicate
>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfs1a:/data/myvol-private/brick
>> Brick2: gfs1b:/data/myvol-private/brick
>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
>> Options Reconfigured:
>> features.default-soft-limit: 95%
>> transport.address-family: inet
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> nfs.disable: on
>> performance.readdir-ahead: on
>> client.event-threads: 4
>> server.event-threads: 4
>> auth.allow: 192.168.100.92
>>
>> 2. Sorry I have no clue how to take a "statedump" of a process on Linux. 
>> Which command should I use for that? and which process would you like, the 
>> blocked process (for example "ls")?
>
> Statedumps are gluster specific. Please refer to 
> https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
>
>> Regards,
>> M.
>>
>> ‐‐‐ Original Message ‐‐‐
>> On August 9, 2018 3:10 PM, Nithya Balachandran  wrote:
>>
>>> Hi,
>>>
>>> Please provide the following:
>>>
>>> - gluster volume info
>>> - statedump of the fuse process when it hangs
>>>
>>> Thanks,
>>> Nithya
>>>
>>> On 9 August 2018 at 18:24, mabi  wrote:
>>>
>>>> Hello,
>>>>
>>>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12 
>>>> and now I see a weird behaviour on my client (using FUSE mount) where I 
>>>> have processes (PHP 5.6 FPM) trying to access a specific directory and 
>>>> then the process blocks. I can't kill the process either, not even with 
>>>> kill -9. I need to reboot the machine in order to get rid of these blocked 
>>>> processes.
>>>>
>>>> This directory has one particularity compared to the other directories it 
>>>> is that it has reached it's quota soft-limit as you can see here in the 
>>>> output of gluster volume quota list:
>>>>
>>>>   Path   Hard-limit  Soft-limit  Used  
>>>> Available  Soft-limit exceeded? Hard-limit exceeded?
>>>> ---
>>>> /directory  100.0GB 80%(80.0GB)   90.5GB   
>>>> 9.5GB Yes   No
>>>>
>>>> That does not mean that it is the quota's fault but it might be a hint 
>>>> where to start looking for... And by the way can someone explain me what 
>>>> the soft-limit does? or does it not do anything special?
>>>>
>>>> Here is an the linux stack of a blocking process on that directory which 
>>>> happened with a simple "ls -la":
>>>>
>>>> [Thu Aug  9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 
>>>> seconds.
>>>> [Thu Aug  9 14:21:07 2018]   Not tainted 3.16.0-4-amd64 #1
>>>> [Thu Aug  9 14:21:07 2018] "echo 0 > 
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Thu Aug  9 14:21:07 2018] ls  D 88017ef93200 0  2272  
>>>>  2268 0x0004
>>>> [Thu Aug  9 14:21:07 2018]  88017653f490 0286 
>>>> 00013200 880174d7bfd8
>>>> [Thu Aug  9 14:21:07 2018]  00013200 88017653f490 
>>>> 8800eeb3d5f0 8800fefac800
>>>> [Thu Aug  9 14:21:07 2018]  880174d7bbe0 8800

Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-08-14 Thread mabi
As you mentioned after creating the /var/run/gluster directory I got a 
statedump file in there.

As a workaround I have now removed the quota for this specific directory. As
it is a production server I currently cannot "play" with it by adding the quota
back and reproducing the problem, because that would require me to reboot the
server with downtime...

But I can confirm that by removing the quota from that directory, the problem 
is gone (no more blocking processes such as "ls") so there must be an issue or 
bug with the quota part of gluster.
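For reference, the workaround itself was a single command (the directory name
is a placeholder, as above):

gluster volume quota myvol-private remove /directory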

‐‐‐ Original Message ‐‐‐
On August 10, 2018 4:19 PM, Nithya Balachandran  wrote:

> On 9 August 2018 at 19:54, mabi  wrote:
>
>> Thanks for the documentation. On my client using FUSE mount I found the PID 
>> by using ps (output below):
>>
>> root   456 1  4 14:17 ?00:05:15 /usr/sbin/glusterfs 
>> --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
>>
>> Then I ran the following command
>>
>> sudo kill -USR1 456
>>
>> but now I can't find where the files are stored. Are these supposed to be 
>> stored on the client directly? I checked /var/run/gluster and 
>> /var/log/gluster but could not see anything and /var/log/gluster does not 
>> even exist on the client.
>
> They are usually created in /var/run/gluster. You will need to create the 
> directory on the client if it does not exist.
>
>> ‐‐‐ Original Message ‐‐‐
>> On August 9, 2018 3:59 PM, Raghavendra Gowdappa  wrote:
>>
>>> On Thu, Aug 9, 2018 at 6:47 PM, mabi  wrote:
>>>
>>>> Hi Nithya,
>>>>
>>>> Thanks for the fast answer. Here the additional info:
>>>>
>>>> 1. gluster volume info
>>>>
>>>> Volume Name: myvol-private
>>>> Type: Replicate
>>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gfs1a:/data/myvol-private/brick
>>>> Brick2: gfs1b:/data/myvol-private/brick
>>>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
>>>> Options Reconfigured:
>>>> features.default-soft-limit: 95%
>>>> transport.address-family: inet
>>>> features.quota-deem-statfs: on
>>>> features.inode-quota: on
>>>> features.quota: on
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> client.event-threads: 4
>>>> server.event-threads: 4
>>>> auth.allow: 192.168.100.92
>>>>
>>>> 2. Sorry I have no clue how to take a "statedump" of a process on Linux. 
>>>> Which command should I use for that? and which process would you like, the 
>>>> blocked process (for example "ls")?
>>>
>>> Statedumps are gluster specific. Please refer to 
>>> https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
>>>
>>>> Regards,
>>>> M.
>>>>
>>>> ‐‐‐ Original Message ‐‐‐
>>>> On August 9, 2018 3:10 PM, Nithya Balachandran  wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Please provide the following:
>>>>>
>>>>> - gluster volume info
>>>>> - statedump of the fuse process when it hangs
>>>>>
>>>>> Thanks,
>>>>> Nithya
>>>>>
>>>>> On 9 August 2018 at 18:24, mabi  wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version 
>>>>>> 3.12.12 and now I see a weird behaviour on my client (using FUSE mount) 
>>>>>> where I have processes (PHP 5.6 FPM) trying to access a specific 
>>>>>> directory and then the process blocks. I can't kill the process either, 
>>>>>> not even with kill -9. I need to reboot the machine in order to get rid 
>>>>>> of these blocked processes.
>>>>>>
>>>>>> This directory has one particularity compared to the other directories 
>>>>>> it is that it has reached it's quota soft-limit as you can see here in 
>>>>>> the output of gluster volume quota list:
>>>>>>
>>>>>>   Path   Hard-limit  Soft-limit  
>>>>>> Used  Available  Soft-limi

Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-08-14 Thread mabi
Bad news: the blocked process happened again, this time with another directory
of another user which is NOT over its quota but which also has quota enabled.

The symptoms on the Linux side are the same:

[Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 120 
seconds.
[Tue Aug 14 15:30:33 2018]   Not tainted 3.16.0-4-amd64 #1
[Tue Aug 14 15:30:33 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Tue Aug 14 15:30:33 2018] php5-fpmD 8801fea13200 0 14773
729 0x
[Tue Aug 14 15:30:33 2018]  880100bbe0d0 0282 00013200 
880129bcffd8
[Tue Aug 14 15:30:33 2018]  00013200 880100bbe0d0 880153ed0d68 
880129bcfee0
[Tue Aug 14 15:30:33 2018]  880153ed0d6c 880100bbe0d0  
880153ed0d70
[Tue Aug 14 15:30:33 2018] Call Trace:
[Tue Aug 14 15:30:33 2018]  [] ? 
schedule_preempt_disabled+0x25/0x70
[Tue Aug 14 15:30:33 2018]  [] ? 
__mutex_lock_slowpath+0xd3/0x1d0
[Tue Aug 14 15:30:33 2018]  [] ? write_inode_now+0x93/0xc0
[Tue Aug 14 15:30:33 2018]  [] ? mutex_lock+0x1b/0x2a
[Tue Aug 14 15:30:33 2018]  [] ? fuse_flush+0x8f/0x1e0 [fuse]
[Tue Aug 14 15:30:33 2018]  [] ? vfs_read+0x93/0x170
[Tue Aug 14 15:30:33 2018]  [] ? filp_close+0x2a/0x70
[Tue Aug 14 15:30:33 2018]  [] ? SyS_close+0x1f/0x50
[Tue Aug 14 15:30:33 2018]  [] ? 
system_call_fast_compare_end+0x10/0x15

And if I check this process, it has state "D", which is "uninterruptible
sleep".

Now I also managed to take a statedump file as recommended, but in its content,
under "[io-cache.inode]", I see "path=" entries which I would need to remove as
they contain filenames, for privacy reasons. Can I remove every "path=" line and
still send you the statedump file for analysis?

Thank you.
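In case such scrubbing is acceptable, something along these lines is what I
would run before sending the file (the statedump file name is only an example):

sed -i.bak 's|^path=.*|path=REMOVED_FOR_PRIVACY|' glusterdump.456.dump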

‐‐‐ Original Message ‐‐‐
On August 14, 2018 10:48 AM, Nithya Balachandran  wrote:

> Thanks for letting us know. Sanoj, can you take a look at this?
>
> Thanks.
> Nithya
>
> On 14 August 2018 at 13:58, mabi  wrote:
>
>> As you mentioned after creating the /var/run/gluster directory I got a 
>> statedump file in there.
>>
>> As a workaround I have now removed the quota for this specific directory and 
>> as it is a production server I can currently not "play" with it by adding 
>> the quota back and having the same problem as it requires me to reboot the 
>> server with downtime...
>>
>> But I can confirm that by removing the quota from that directory, the 
>> problem is gone (no more blocking processes such as "ls") so there must be 
>> an issue or bug with the quota part of gluster.
>>
>> ‐‐‐ Original Message ‐‐‐
>> On August 10, 2018 4:19 PM, Nithya Balachandran  wrote:
>>
>>> On 9 August 2018 at 19:54, mabi  wrote:
>>>
>>>> Thanks for the documentation. On my client using FUSE mount I found the 
>>>> PID by using ps (output below):
>>>>
>>>> root   456 1  4 14:17 ?00:05:15 /usr/sbin/glusterfs 
>>>> --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
>>>>
>>>> Then I ran the following command
>>>>
>>>> sudo kill -USR1 456
>>>>
>>>> but now I can't find where the files are stored. Are these supposed to be 
>>>> stored on the client directly? I checked /var/run/gluster and 
>>>> /var/log/gluster but could not see anything and /var/log/gluster does not 
>>>> even exist on the client.
>>>
>>> They are usually created in /var/run/gluster. You will need to create the 
>>> directory on the client if it does not exist.
>>>
>>>> ‐‐‐ Original Message ‐‐‐
>>>> On August 9, 2018 3:59 PM, Raghavendra Gowdappa  
>>>> wrote:
>>>>
>>>>> On Thu, Aug 9, 2018 at 6:47 PM, mabi  wrote:
>>>>>
>>>>>> Hi Nithya,
>>>>>>
>>>>>> Thanks for the fast answer. Here the additional info:
>>>>>>
>>>>>> 1. gluster volume info
>>>>>>
>>>>>> Volume Name: myvol-private
>>>>>> Type: Replicate
>>>>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gfs1a:/data/myvol-private/brick
>>>>>> Brick2: gfs1b:/data/myvol-private/brick
>>>>>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arb

Re: [Gluster-users] Possibly missing two steps in upgrade to 4.1 guide

2018-08-21 Thread mabi
Oops missed that part at the bottom, thanks Hu Bert!

Now the only thing missing from the upgrade guide is what to do about the 
glustereventsd service during the upgrade.


‐‐‐ Original Message ‐‐‐
On August 21, 2018 4:11 PM, Hu Bert  wrote:

> I think point 2 is already covered by the guide; see: "Upgrade
> procedure for clients"
>
> Following are the steps to upgrade clients to the 4.1.x version,
>
> NOTE: x is the minor release number for the release
>
> > > > Unmount all glusterfs mount points on the client
>
> Stop all applications that access the volumes via gfapi (qemu, etc.)
> Install Gluster 4.1
>
> > > > Mount all gluster shares
>
> Start any applications that were stopped previously in step (2)
>
> 2018-08-21 15:33 GMT+02:00 mabi m...@protonmail.ch:
>
> > Hello,
> > I just upgraded from 4.0.2 to 4.1.2 using the official documentation:
> > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
> > I noticed that this documentation might be missing the following two 
> > additional steps:
> >
> > 1.  restart the glustereventsd service
> > 2.  umount and mount again gluster fuse mounts on clients after upgrading 
> > the clients (if using glusterfs fuse mounts of course)
> >
> > Best regards,
> > M.
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Possibly missing two steps in upgrade to 4.1 guide

2018-08-21 Thread mabi
Funny, I also use stretch, but when going from 4.0.2 to 4.1.2 glustereventsd
did not get restarted automatically, so I restarted it manually after having
finished the upgrade.

‐‐‐ Original Message ‐‐‐
On August 21, 2018 4:20 PM, Hu Bert  wrote:

> today i tested an upgrade 3.12.12 -> 4.1.2, and the glustereventsd
> service was restarted. We use debian stretch; maybe it depends on the
> operating system?
>
> 2018-08-21 16:17 GMT+02:00 mabi m...@protonmail.ch:
>
> > Oops missed that part at the bottom, thanks Hu Bert!
> > Now the only thing missing from the upgrade guide is what to do about the 
> > glustereventsd service during the upgrade.
> > ‐‐‐ Original Message ‐‐‐
> > On August 21, 2018 4:11 PM, Hu Bert revi...@googlemail.com wrote:
> >
> > > I think point 2 is already covered by the guide; see: "Upgrade
> > > procedure for clients"
> > > Following are the steps to upgrade clients to the 4.1.x version,
> > > NOTE: x is the minor release number for the release
> > >
> > > > > > Unmount all glusterfs mount points on the client
> > >
> > > Stop all applications that access the volumes via gfapi (qemu, etc.)
> > > Install Gluster 4.1
> > >
> > > > > > Mount all gluster shares
> > >
> > > Start any applications that were stopped previously in step (2)
> > > 2018-08-21 15:33 GMT+02:00 mabi m...@protonmail.ch:
> > >
> > > > Hello,
> > > > I just upgraded from 4.0.2 to 4.1.2 using the official documentation:
> > > > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
> > > > I noticed that this documentation might be missing the following two 
> > > > additional steps:
> > > >
> > > > 1.  restart the glustereventsd service
> > > > 2.  umount and mount again gluster fuse mounts on clients after 
> > > > upgrading the clients (if using glusterfs fuse mounts of course)
> > > >
> > > > Best regards,
> > > > M.
> > > > Gluster-users mailing list
> > > > Gluster-users@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Possibly missing two steps in upgrade to 4.1 guide

2018-08-21 Thread mabi
Hello,

I just upgraded from 4.0.2 to 4.1.2 using the official  documentation:

https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/

I noticed that this documentation might be missing the following two additional
steps (a rough command sketch follows below):

1) restart the glustereventsd service
2) umount and mount again the gluster FUSE mounts on clients after upgrading the
clients (if using glusterfs FUSE mounts of course)
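A minimal sketch of those two steps, assuming a volume named myvolume served by
gfs1a and mounted on /mnt/myvolume (all three names are examples):

# 1) restart the events daemon once the packages are upgraded
systemctl restart glustereventsd
# 2) on each client, remount the FUSE mount so the upgraded client code is used
umount /mnt/myvolume
mount -t glusterfs gfs1a:/myvolume /mnt/myvolume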

Best regards,
M.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2018-07-19 Thread mabi
Hi Amar,

Just wanted to say that I think the quota feature in GlusterFS is really
useful. In my case I use it on one volume where I have many cloud installations
(mostly files) for different people, and each of these needs a different quota
set on a specific directory. The GlusterFS quota allows me to manage that
nicely, which would not be possible in the application directly. It would
really be an overhead for me to have, for example, one volume per installation
just to set a maximum size like that.
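As a concrete illustration of that use case, the per-directory limits look
roughly like this (volume and directory names are made up):

gluster volume quota myvolume limit-usage /customerA 10GB
gluster volume quota myvolume limit-usage /customerB 50GB
gluster volume quota myvolume list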

I hope that this feature can continue to exist.

Best regards,
M.

‐‐‐ Original Message ‐‐‐
On July 19, 2018 8:56 AM, Amar Tumballi  wrote:

> Hi all,
>
> Over last 12 years of Gluster, we have developed many features, and continue 
> to support most of it till now. But along the way, we have figured out better 
> methods of doing things. Also we are not actively maintaining some of these 
> features.
>
> We are now thinking of cleaning up some of these ‘unsupported’ features, and 
> mark them as ‘SunSet’ (i.e., would be totally taken out of codebase in 
> following releases) in next upcoming release, v5.0. The release notes will 
> provide options for smoothly migrating to the supported configurations.
>
> If you are using any of these features, do let us know, so that we can help 
> you with ‘migration’.. Also, we are happy to guide new developers to work on 
> those components which are not actively being maintained by current set of 
> developers.
>
> List of features hitting sunset:
>
> ‘cluster/stripe’ translator:
>
> This translator was developed very early in the evolution of GlusterFS, and 
> addressed one of the very common question of Distributed FS, which is “What 
> happens if one of my file is bigger than the available brick. Say, I have 2 
> TB hard drive, exported in glusterfs, my file is 3 TB”. While it solved the 
> purpose, it was very hard to handle failure scenarios, and give a real good 
> experience to our users with this feature. Over the time, Gluster solved the 
> problem with it’s ‘Shard’ feature, which solves the problem in much better 
> way, and provides much better solution with existing well supported stack. 
> Hence the proposal for Deprecation.
>
> If you are using this feature, then do write to us, as it needs a proper 
> migration from existing volume to a new full supported volume type before you 
> upgrade.
>
> ‘storage/bd’ translator:
>
> This feature got into the code base 5 years back with this 
> [patch](http://review.gluster.org/4809)[1]. Plan was to use a block device 
> directly as a brick, which would help to handle disk-image storage much 
> easily in glusterfs.
>
> As the feature is not getting more contribution, and we are not seeing any 
> user traction on this, would like to propose for Deprecation.
>
> If you are using the feature, plan to move to a supported gluster volume 
> configuration, and have your setup ‘supported’ before upgrading to your new 
> gluster version.
>
> ‘RDMA’ transport support:
>
> Gluster started supporting RDMA while ib-verbs was still new, and very 
> high-end infra around that time were using Infiniband. Engineers did work 
> with Mellanox, and got the technology into GlusterFS for better data 
> migration, data copy. While current day kernels support very good speed with 
> IPoIB module itself, and there are no more bandwidth for experts in these 
> area to maintain the feature, we recommend migrating over to TCP (IP based) 
> network for your volume.
>
> If you are successfully using RDMA transport, do get in touch with us to 
> prioritize the migration plan for your volume. Plan is to work on this after 
> the release, so by version 6.0, we will have a cleaner transport code, which 
> just needs to support one type.
>
> ‘Tiering’ feature
>
> Gluster’s tiering feature which was planned to be providing an option to keep 
> your ‘hot’ data in different location than your cold data, so one can get 
> better performance. While we saw some users for the feature, it needs much 
> more attention to be completely bug free. At the time, we are not having any 
> active maintainers for the feature, and hence suggesting to take it out of 
> the ‘supported’ tag.
>
> If you are willing to take it up, and maintain it, do let us know, and we are 
> happy to assist you.
>
> If you are already using tiering feature, before upgrading, make sure to do 
> gluster volume tier detach all the bricks before upgrading to next release. 
> Also, we recommend you to use features like dmcache on your LVM setup to get 
> best performance from bricks.
>
> ‘Quota’
>
> This is a call out for ‘Quota’ feature, to let you all know that it will be 
> ‘no new development’ state. While this feature is ‘actively’ in use by many 
> people, the challenges we have in accounting mechanisms involved, has made it 
> hard to achieve good performance with the feature. Also, the amount of 
> extended attribute get/set operations while using the 

Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota

2018-09-02 Thread mabi
Hello,

I wanted to report that this morning I had a similar issue on another server,
where a few PHP-FPM processes got blocked on a different GlusterFS volume
mounted through a FUSE mount. This GlusterFS volume has no quota enabled, so it
might not be quota related after all.

Here is the Linux kernel stack trace:

[Sun Sep  2 06:47:47 2018] INFO: task php5-fpm:25880 blocked for more than 120 
seconds.
[Sun Sep  2 06:47:47 2018]   Not tainted 3.16.0-4-amd64 #1
[Sun Sep  2 06:47:47 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Sun Sep  2 06:47:47 2018] php5-fpmD 88017ee12f40 0 25880  
1 0x0004
[Sun Sep  2 06:47:47 2018]  880101688b60 0282 00012f40 
880059ca3fd8
[Sun Sep  2 06:47:47 2018]  00012f40 880101688b60 8801093b51b0 
8801067ec800
[Sun Sep  2 06:47:47 2018]  880059ca3cc0 8801093b5290 8801093b51b0 
880059ca3e80
[Sun Sep  2 06:47:47 2018] Call Trace:
[Sun Sep  2 06:47:47 2018]  [] ? 
__fuse_request_send+0xbd/0x270 [fuse]
[Sun Sep  2 06:47:47 2018]  [] ? 
prepare_to_wait_event+0xf0/0xf0
[Sun Sep  2 06:47:47 2018]  [] ? fuse_send_write+0xd0/0x100 
[fuse]
[Sun Sep  2 06:47:47 2018]  [] ? 
fuse_perform_write+0x26f/0x4b0 [fuse]
[Sun Sep  2 06:47:47 2018]  [] ? 
fuse_file_write_iter+0x1dd/0x2b0 [fuse]
[Sun Sep  2 06:47:47 2018]  [] ? new_sync_write+0x74/0xa0
[Sun Sep  2 06:47:47 2018]  [] ? vfs_write+0xb2/0x1f0
[Sun Sep  2 06:47:47 2018]  [] ? vfs_read+0xed/0x170
[Sun Sep  2 06:47:47 2018]  [] ? SyS_write+0x42/0xa0
[Sun Sep  2 06:47:47 2018]  [] ? SyS_lseek+0x7e/0xa0
[Sun Sep  2 06:47:47 2018]  [] ? 
system_call_fast_compare_end+0x10/0x15

Did anyone already have time to have a look at the statedump file I sent around 
3 weeks ago?

I never saw this type of problem in the past; it started to appear after I
upgraded to GlusterFS 3.12.12.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On August 15, 2018 9:21 AM, mabi  wrote:

> Great, you will then find attached here the statedump of the client using the 
> FUSE glusterfs mount right after two processes have blocked.
>
> Two notes here regarding the "path=" in this statedump file:
> - I have renamed all the "path=" which has the problematic directory as 
> "path=PROBLEMATIC_DIRECTORY_HERE
> - All the other "path=" I have renamed them to "path=REMOVED_FOR_PRIVACY".
>
> Note also that funnily enough the number of "path=" for that problematic 
> directory sums up to exactly 5000 entries. Coincidence or hint to the problem 
> maybe?
>
> ‐‐‐ Original Message ‐‐‐
> On August 15, 2018 5:21 AM, Raghavendra Gowdappa  wrote:
>
>> On Tue, Aug 14, 2018 at 7:23 PM, mabi  wrote:
>>
>>> Bad news: the process blocked happened again this time with another 
>>> directory of another user which is NOT over his quota but which also has 
>>> quota enabled.
>>>
>>> The symptoms on the Linux side are the same:
>>>
>>> [Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 
>>> 120 seconds.
>>> [Tue Aug 14 15:30:33 2018]   Not tainted 3.16.0-4-amd64 #1
>>> [Tue Aug 14 15:30:33 2018] "echo 0 > 
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Tue Aug 14 15:30:33 2018] php5-fpmD 8801fea13200 0 14773   
>>>  729 0x
>>> [Tue Aug 14 15:30:33 2018]  880100bbe0d0 0282 
>>> 00013200 880129bcffd8
>>> [Tue Aug 14 15:30:33 2018]  00013200 880100bbe0d0 
>>> 880153ed0d68 880129bcfee0
>>> [Tue Aug 14 15:30:33 2018]  880153ed0d6c 880100bbe0d0 
>>>  880153ed0d70
>>> [Tue Aug 14 15:30:33 2018] Call Trace:
>>> [Tue Aug 14 15:30:33 2018]  [] ? 
>>> schedule_preempt_disabled+0x25/0x70
>>> [Tue Aug 14 15:30:33 2018]  [] ? 
>>> __mutex_lock_slowpath+0xd3/0x1d0
>>> [Tue Aug 14 15:30:33 2018]  [] ? write_inode_now+0x93/0xc0
>>> [Tue Aug 14 15:30:33 2018]  [] ? mutex_lock+0x1b/0x2a
>>> [Tue Aug 14 15:30:33 2018]  [] ? fuse_flush+0x8f/0x1e0 
>>> [fuse]
>>> [Tue Aug 14 15:30:33 2018]  [] ? vfs_read+0x93/0x170
>>> [Tue Aug 14 15:30:33 2018]  [] ? filp_close+0x2a/0x70
>>> [Tue Aug 14 15:30:33 2018]  [] ? SyS_close+0x1f/0x50
>>> [Tue Aug 14 15:30:33 2018]  [] ? 
>>> system_call_fast_compare_end+0x10/0x15
>>>
>>> and if I check this process it has state "D" which is "D = uninterruptible 
>>> sleep".
>>>
>>> Now I also managed to take a statedump file as recommended but I see in its 
>>> content under the "[io-cach

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-05 Thread mabi
Dear Ravi,

Thank you for your mail and info. 

This is great news. If these patches can make it into 3.12.12 I will upgrade
ASAP. Could anyone confirm in case these patches do not make it into 3.12.12?
In that case I would rather wait for the next release. I was already told on
this list that 3.12.9 should have fixed this issue, but unfortunately it
didn't.

Best regards,
Mabi​​

‐‐‐ Original Message ‐‐‐

On July 4, 2018 5:41 PM, Ravishankar N  wrote:

> ​​
> 
> Hi mabi, there are a couple of AFR patches  from master that I'm
> 
> currently back porting to the 3.12 branch:
> 
> afr: heal gfids when file is not present on all bricks
> 
> afr: don't update readables if inode refresh failed on all children
> 
> afr: fix bug-1363721.t failure
> 
> afr: add quorum checks in pre-op
> 
> afr: don't treat all cases all bricks being blamed as split-brain
> 
> afr: capture the correct errno in post-op quorum check
> 
> afr: add quorum checks in post-op
> 
> Many of these help make the transaction code more robust by fixing
> 
> various corner cases. It would be great if you can wait for the next
> 
> 3.12 minor release (3.12.12 ?) and upgrade to that build and see if the
> 
> issues go away.
> 
> Note: CC'ing Karthik and Jiffin for their help in reviewing and merging
> 
> the backports for the above patches.
> 
> Thanks,
> 
> Ravi
> 
> On 07/04/2018 06:51 PM, mabi wrote:
> 
> > Hello,
> > 
> > I just wanted to let you know that last week I have upgraded my two replica 
> > nodes from Debian 8 to Debian 9 so now all my 3 nodes (including aribter) 
> > are running Debian 9 with a Linux 4 kernel.
> > 
> > Unfortunately I still have the exact same issue. Another detail I might 
> > have not mentioned yet is that I have quotas enabled on this volume, I 
> > don't really know if that is relevant but who knows...
> > 
> > As a reminder here is what happens on the client side which has the volume 
> > mounted via FUSE (take earlier today from the 
> > /var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this 
> > specific case it's only one single file who had this issue.
> > 
> > [2018-07-04 08:23:49.314252] E [MSGID: 109089] 
> > [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
> > failed to open the fd (0x7fccb00a5120, flags=010) on file 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  @ myvol-replicate-0 [Input/output error]
> > 
> > [2018-07-04 08:23:49.328712] W [MSGID: 108027] 
> > [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
> > subvols for 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> > 
> > [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
> > 0-glusterfs-fuse: 55916791: TRUNCATE() 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  => -1 (Input/output error)
> > 
> > Best regards,
> > 
> > M.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On June 22, 2018 4:44 PM, mabi m...@protonmail.ch wrote:
> > 
> > > Hi,
> > > 
> > > Now that this issue has happened a few times I noticed a few things which 
> > > might be helpful for debugging:
> > > 
> > > -   This problem happens when files are uploaded via a cloud app called 
> > > Nextcloud where the files are encrypted by the app itself on the server 
> > > side (PHP code) but only rarely and randomly.
> > > 
> > > -   It does not seem to happen with Nextcloud installation which does not 
> > > have server side encryption enabled.
> > > 
> > > -   When this happens both first and second node of the replica have 120k 
> > > of context switches and 25k interrupts, the arbiter node 30k context 
> > > switches/20k interrupts. No nodes are overloaded, there is no io/wait and 
> > > no network issues or disconnections.
> > > 
> > > -   All of the problematic files to heal have spaces in one of their 
> > > sub-directories (might be totally irrelevant).
> > > 
> > > If that's of any use my two replica nodes are Debian 8 physical 
> > > servers with ZFS as file system for the bricks and the arbiter is a 
> > > Debian 9 virtual machine with XFS as file system for the brick. To mount 
> > > the volume I use a glusterfs fuse mount on the web server which has 
> > > Nextcloud running.
> &g

Re: [Gluster-users] Release 3.12.12: Scheduled for the 11th of July

2018-07-12 Thread mabi
Hi Jiffin,

Thank you very much for confirming. I will now find a maintenance window and 
upgrade GlusterFS. I will post back on this thread in case I still see any 
issues but hopefully it all goes well :-)

Cheers,
M.

‐‐‐ Original Message ‐‐‐
On July 11, 2018 4:10 PM, Jiffin Tony Thottan  wrote:

> Hi Mabi,
>
> I have checked with afr maintainer, all of the required changes is merged in 
> 3.12.
>
> Hence moving forward with 3.12.12 release
>
> Regards,
>
> Jiffin
>
> On Monday 09 July 2018 01:04 PM, mabi wrote:
>
>> Hi Jiffin,
>>
>> Based on the issues I am encountering on a nearly daily basis (See "New 
>> 3.12.7 possible split-brain on replica 3" thread in this ML) since now 
>> already 2-3 months I would be really glad if the required fixes as mentioned 
>> by Ravi could make it into the 3.12.12 release. Ravi mentioned the following:
>>
>> afr: heal gfids when file is not present on all bricks
>> afr: don't update readables if inode refresh failed on all children
>> afr: fix bug-1363721.t failure
>> afr: add quorum checks in pre-op
>> afr: don't treat all cases all bricks being blamed as split-brain
>> afr: capture the correct errno in post-op quorum check
>> afr: add quorum checks in post-op
>>
>> Right now I only see the first one pending in the review dashboard. It would 
>> be great if all of them could make it into this release.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On July 9, 2018 7:18 AM, Jiffin Tony Thottan 
>> [](mailto:jthot...@redhat.com) wrote:
>>
>>> Hi,
>>>
>>> It's time to prepare the 3.12.12 release, which falls on the 10th of
>>> each month, and hence would be 11-07-2018 this time around.
>>>
>>> This mail is to call out the following,
>>>
>>> 1) Are there any pending *blocker* bugs that need to be tracked for
>>> 3.12.12? If so mark them against the provided tracker [1] as blockers
>>> for the release, or at the very least post them as a response to this
>>> mail
>>>
>>> 2) Pending reviews in the 3.12 dashboard will be part of the release,
>>> *iff* they pass regressions and have the review votes, so use the
>>> dashboard [2] to check on the status of your patches to 3.12 and get
>>> these going
>>>
>>> Thanks,
>>> Jiffin
>>>
>>> [1] Release bug tracker:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.12
>>>
>>> [2] 3.12 review dashboard:
>>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Release 3.12.12: Scheduled for the 11th of July

2018-07-09 Thread mabi
Hi Jiffin,

Based on the issues I have been encountering on a nearly daily basis for 2-3 
months now (see the "New 3.12.7 possible split-brain on replica 3" thread in 
this ML), I would be really glad if the required fixes mentioned by Ravi could 
make it into the 3.12.12 release. Ravi mentioned the following:

afr: heal gfids when file is not present on all bricks
afr: don't update readables if inode refresh failed on all children
afr: fix bug-1363721.t failure
afr: add quorum checks in pre-op
afr: don't treat all cases all bricks being blamed as split-brain
afr: capture the correct errno in post-op quorum check
afr: add quorum checks in post-op

Right now I only see the first one pending in the review dashboard. It would be 
great if all of them could make it into this release.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On July 9, 2018 7:18 AM, Jiffin Tony Thottan  wrote:

> Hi,
>
> It's time to prepare the 3.12.12 release, which falls on the 10th of
> each month, and hence would be 11-07-2018 this time around.
>
> This mail is to call out the following,
>
> 1) Are there any pending *blocker* bugs that need to be tracked for
> 3.12.12? If so mark them against the provided tracker [1] as blockers
> for the release, or at the very least post them as a response to this
> mail
>
> 2) Pending reviews in the 3.12 dashboard will be part of the release,
> *iff* they pass regressions and have the review votes, so use the
> dashboard [2] to check on the status of your patches to 3.12 and get
> these going
>
> Thanks,
> Jiffin
>
> [1] Release bug tracker:
> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.12
>
> [2] 3.12 review dashboard:
> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-04 Thread mabi
Hello,

I just wanted to let you know that last week I upgraded my two replica nodes 
from Debian 8 to Debian 9, so now all my 3 nodes (including the arbiter) are 
running Debian 9 with a Linux 4 kernel.

Unfortunately I still have the exact same issue. Another detail I might not 
have mentioned yet is that I have quotas enabled on this volume; I don't really 
know if that is relevant, but who knows...

As a reminder, here is what happens on the client side which has the volume 
mounted via FUSE (taken earlier today from the 
/var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this 
specific case it's only one single file that had this issue.

[2018-07-04 08:23:49.314252] E [MSGID: 109089] 
[dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
failed to open the fd (0x7fccb00a5120, flags=010) on file 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 @ myvol-replicate-0 [Input/output error]
[2018-07-04 08:23:49.328712] W [MSGID: 108027] 
[afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
subvols for 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
0-glusterfs-fuse: 55916791: TRUNCATE() 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 => -1 (Input/output error) 

Best regards,
M.

‐‐‐ Original Message ‐‐‐

On June 22, 2018 4:44 PM, mabi  wrote:

> ​​
> 
> Hi,
> 
> Now that this issue has happened a few times I noticed a few things which 
> might be helpful for debugging:
> 
> -   This problem happens when files are uploaded via a cloud app called 
> Nextcloud where the files are encrypted by the app itself on the server side 
> (PHP code) but only rarely and randomly.
> -   It does not seem to happen with Nextcloud installation which does not 
> have server side encryption enabled.
> -   When this happens both first and second node of the replica have 120k of 
> context switches and 25k interrupts, the arbiter node 30k context 
> switches/20k interrupts. No nodes are overloaded, there is no io/wait and no 
> network issues or disconnections.
> -   All of the problematic files to heal have spaces in one of their 
> sub-directories (might be totally irrelevant).
> 
> If that's of any use my two replica nodes are Debian 8 physical servers 
> with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual 
> machine with XFS as file system for the brick. To mount the volume I use a 
> glusterfs fuse mount on the web server which has Nextcloud running.
> 
> Regards,
> 
> M.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote:
> 
> 
> > Thanks Ravi. Let me know when you have time to have a look. It sort of 
> > happens around once or twice per week but today it was 24 files in one go 
> > which are unsynched and where I need to manually reset the xattrs on the 
> > arbiter node.
> > 
> > By the way on this volume I use quotas which I set on specifc directories, 
> > I don't know if this is relevant or not but thought I would just mention.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote:
> > 
> > > On 05/23/2018 12:47 PM, mabi wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I just wanted to ask if you had time to look into this bug I am 
> > > > encountering and if there is anything else I can do?
> > > > 
> > > > For now in order to get rid of these 3 unsynched files shall I do the 
> > > > same method that was suggested to me in this thread?
> > > 
> > > Sorry Mabi,  I haven't had a chance to dig deeper into this. The
> > > 
> > > workaround of resetting xattrs should be fine though.
> > > 
> > > Thanks,
> > > 
> > > Ravi
> > > 
> > > > Thanks,
> > > > 
> > > > Mabi
> > > > 
> > > > ‐‐‐ Original Message ‐‐‐
> > > > 
> > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:
> > > > 
> > > > > Hi Ravi,
> > > > > 
> > > > > Please find below the answers to your questions
> > > > > 
> > > > > 1.  I have never touched the cluster.quorum-type option. Currently it 
> > > > > is set as following for this volume:
> > > > > 
> > > > > Option Value
> > > > >

Re: [Gluster-users] Failed to get quota limits

2018-02-27 Thread mabi
Hi,

Thanks for the link to the bug. We should hopefully be moving to 3.12 soon, so 
I guess this bug is also fixed there.

Best regards,
M.
​

‐‐‐ Original Message ‐‐‐

On February 27, 2018 9:38 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> ​​
> 
> Hi Mabi,
> 
> The bug is fixed from 3.11. For 3.10 it is yet to be backported and
> 
> made available.
> 
> The bug is https://bugzilla.redhat.com/show_bug.cgi?id=1418259.
> 
> On Sat, Feb 24, 2018 at 4:05 PM, mabi m...@protonmail.ch wrote:
> 
> > Dear Hari,
> > 
> > Thank you for getting back to me after having analysed the problem.
> > 
> > As you said I tried to run "gluster volume quota  list " for 
> > all of my directories which have a quota and found out that there was one 
> > directory quota which was missing (stale) as you can see below:
> > 
> > $ gluster volume quota myvolume list /demo.domain.tld
> > 
> > Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit 
> > exceeded?
> > 
> > 
> > --
> > 
> > /demo.domain.tld N/A N/A 8.0MB N/A N/A N/A
> > 
> > So as you suggested I added the quota again on that directory and now the 
> > "list" finally works again and shows the quotas for every directory as I 
> > defined them. That did the trick!
> > 
> > Now do you know if this bug is already corrected in a new release of 
> > GlusterFS? if not do you know when it will be fixed?
> > 
> > Again many thanks for your help here!
> > 
> > Best regards,
> > 
> > M.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On February 23, 2018 7:45 AM, Hari Gowtham hgowt...@redhat.com wrote:
> > 
> > > Hi,
> > > 
> > > There is a bug in 3.10 which doesn't allow the quota list command to
> > > 
> > > output, if the last entry on the conf file is a stale entry.
> > > 
> > > The workaround for this is to remove the stale entry at the end. (If
> > > 
> > > the last two entries are stale then both have to be removed and so on
> > > 
> > > until the last entry on the conf file is a valid entry).
> > > 
> > > This can be avoided by adding a new limit. As the new limit you added
> > > 
> > > didn't work there is another way to check this.
> > > 
> > > Try quota list command with a specific limit mentioned in the command.
> > > 
> > > gluster volume quota  list 
> > > 
> > > Make sure this path and the limit are set.
> > > 
> > > If this works then you need to clean up the last stale entry.
> > > 
> > > If this doesn't work we need to look further.
> > > 
> > > Thanks Sanoj for the guidance.
> > > 
> > > On Wed, Feb 14, 2018 at 1:36 AM, mabi m...@protonmail.ch wrote:
> > > 
> > > > I tried to set the limits as you suggest by running the following 
> > > > command.
> > > > 
> > > > $ sudo gluster volume quota myvolume limit-usage /directory 200GB
> > > > 
> > > > volume quota : success
> > > > 
> > > > but then when I list the quotas there is still nothing, so nothing 
> > > > really happened.
> > > > 
> > > > I also tried to run stat on all directories which have a quota but 
> > > > nothing happened either.
> > > > 
> > > > I will send you tomorrow all the other logfiles as requested.
> > > > 
> > > > -------- Original Message --------
> > > > 
> > > > On February 13, 2018 12:20 PM, Hari Gowtham hgowt...@redhat.com wrote:
> > > > 
> > > > > Were you able to set new limits after seeing this error?
> > > > > 
> > > > > On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com 
> > > > > wrote:
> > > > > 
> > > > > > Yes, I need the log files in that duration, the log rotated file 
> > > > > > after
> > > > > > 
> > > > > > hitting the
> > > > > > 
> > > > > > issue aren't necessary, but the ones before hitting the issues are 
> > > > > > needed
> > > > > > 
> > > > > > (not just when you hit it, the ones even before you hit it).
> > > > > > 
> > > > > > Yes, you have to do

[Gluster-users] Can't stop volume using gluster volume stop

2018-04-06 Thread mabi
Hello,

I can't stop one of my GlusterFS 3.12.7 3-way replica volumes using the 
standard gluster volume stop command, as you can see below:

$ sudo gluster volume stop myvolume
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) 
y
volume stop: myvolume: failed: geo-replication Unable to get the status of 
active geo-replication session for the volume 'myvolume'.
 Please check the log file for more info.

In the past I had geo-replication running on that volume, but because it did 
not perform well with millions of files I decided to delete it. Somehow it 
looks like it has not been totally or correctly deleted, otherwise the volume 
stop command above should have worked. Nevertheless I can't find any trace of 
the geo-replication still being configured, as you can see below with a 
geo-replication status command:

$ sudo gluster volume geo-replication myvolume geo.domain.tld::myvolume-geo 
status detail
No active geo-replication sessions between myvolume and 
geo.domain.tld::myvolume-geo
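
For reference, the session was removed back then with the usual geo-replication 
stop/delete commands, roughly the following (quoting from memory, so the exact 
options might have differed):

$ sudo gluster volume geo-replication myvolume geo.domain.tld::myvolume-geo stop
$ sudo gluster volume geo-replication myvolume geo.domain.tld::myvolume-geo delete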
​​
Any ideas how I can fix that?

Best regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Release 3.12.8: Scheduled for the 12th of April

2018-04-11 Thread mabi
Dear Jiffin,

Would it be possible to have the following backported to 3.12:

https://bugzilla.redhat.com/show_bug.cgi?id=1482064

See my mail with subject "New 3.12.7 possible split-brain on replica 3" on the 
list earlier this week for more details.

Thank you very much.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On April 11, 2018 5:16 AM, Jiffin Tony Thottan <jthot...@redhat.com> wrote:

> Hi,
>
> It's time to prepare the 3.12.8 release, which falls on the 10th of
> each month, and hence would be 12-04-2018 this time around.
>
> This mail is to call out the following,
>
> 1) Are there any pending *blocker* bugs that need to be tracked for
> 3.12.7? If so mark them against the provided tracker [1] as blockers
> for the release, or at the very least post them as a response to this
> mail
>
> 2) Pending reviews in the 3.12 dashboard will be part of the release,
> *iff* they pass regressions and have the review votes, so use the
> dashboard [2] to check on the status of your patches to 3.12 and get
> these going
>
> 3) I have made checks on what went into 3.10 post 3.12 release and if
> these fixes are already included in 3.12 branch, then status on this is 
> *green*
> as all fixes ported to 3.10, are ported to 3.12 as well.
>
> @Mlind
>
> IMO https://review.gluster.org/19659 is like a minor feature to me. Can you 
> please provide a justification for why it needs to be included in the 3.12 
> stable release?
>
> And please rebase the change as well
>
> @Raghavendra
>
> The smoke failed for https://review.gluster.org/#/c/19818/. Can you please 
> check the same?
>
> Thanks,
> Jiffin
>
> [1] Release bug tracker:
> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.8
>
> [2] 3.12 review dashboard:
> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Release 3.12.8: Scheduled for the 12th of April

2018-04-11 Thread mabi
Thank you Ravi for your comments. I do understand that it might not be very 
wise to risk any mistakes by rushing this fix into 3.12.8. In that case I will 
be more patient and wait for 3.12.9 next month.

‐‐‐ Original Message ‐‐‐
On April 11, 2018 5:09 PM, Ravishankar N <ravishan...@redhat.com> wrote:

> Mabi,
>
> It looks like one of the patches is not a straight forward cherry-pick to the 
> 3.12 branch. Even though the conflict might be easy to resolve, I don't think 
> it is a good idea to hurry it for tomorrow. We will definitely have it ready 
> by the next minor release (or if by chance the release is delayed and the 
> back port is reviewed and merged before that). Hope that is acceptable.
>
> -Ravi
>
> On 04/11/2018 01:11 PM, mabi wrote:
>
>> Dear Jiffin,
>>
>> Would it be possible to have the following backported to 3.12:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1482064
>>
>> See my mail with subject "New 3.12.7 possible split-brain on replica 3" on 
>> the list earlier this week for more details.
>>
>> Thank you very much.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On April 11, 2018 5:16 AM, Jiffin Tony Thottan 
>> [<jthot...@redhat.com>](mailto:jthot...@redhat.com) wrote:
>>
>>> Hi,
>>>
>>> It's time to prepare the 3.12.8 release, which falls on the 10th of
>>> each month, and hence would be 12-04-2018 this time around.
>>>
>>> This mail is to call out the following,
>>>
>>> 1) Are there any pending *blocker* bugs that need to be tracked for
>>> 3.12.7? If so mark them against the provided tracker [1] as blockers
>>> for the release, or at the very least post them as a response to this
>>> mail
>>>
>>> 2) Pending reviews in the 3.12 dashboard will be part of the release,
>>> *iff* they pass regressions and have the review votes, so use the
>>> dashboard [2] to check on the status of your patches to 3.12 and get
>>> these going
>>>
>>> 3) I have made checks on what went into 3.10 post 3.12 release and if
>>> these fixes are already included in 3.12 branch, then status on this is 
>>> *green*
>>> as all fixes ported to 3.10, are ported to 3.12 as well.
>>>
>>> @Mlind
>>>
>>> IMO https://review.gluster.org/19659 is like a minor feature to me. Can you 
>>> please provide a justification for why it needs to be included in the 3.12 
>>> stable release?
>>>
>>> And please rebase the change as well
>>>
>>> @Raghavendra
>>>
>>> The smoke failed for https://review.gluster.org/#/c/19818/. Can you please 
>>> check the same?
>>>
>>> Thanks,
>>> Jiffin
>>>
>>> [1] Release bug tracker:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.8
>>>
>>> [2] 3.12 review dashboard:
>>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>>
>> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Hello,

Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
cluster to 3.12.7 and this morning I got a warning that 9 files on one of my 
volumes are not synced. Indeed, checking that volume with a "volume heal info" 
shows that the third node (the arbiter node) has 9 files to be healed but they 
are not being healed automatically.
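
For reference, by "volume heal info" I mean roughly the following command, run 
with my volume name:

$ sudo gluster volume heal myvol-private info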

All nodes were always online and there was no network interruption so I am 
wondering if this might not really be a split-brain issue but something else.

I found some interesting log entries on the client log file 
(/var/log/glusterfs/myvol-private.log) which I have included below in this 
mail. It looks like some renaming has gone wrong because a directory is not 
empty.

For your information I have upgraded my GlusterFS in offline mode and the 
upgrade went smoothly.

What can I do to fix that issue?

Best regards,
Mabi


[2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
0-myvol-private-dht: renaming 
/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip 
(hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => 
/dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip
 (hash=myvol-private-replicate-0/cache=)
[2018-04-09 06:58:53.692440] W [MSGID: 114031] 
[client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
operation failed [Directory not empty]
[2018-04-09 06:58:53.714129] W [MSGID: 114031] 
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote 
operation failed. Path:  
(13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.714161] W [MSGID: 114031] 
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote 
operation failed. Path:  
(13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.715638] W [MSGID: 114031] 
[client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
operation failed [Directory not empty]
[2018-04-09 06:58:53.750372] I [MSGID: 108026] 
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
0-myvol-private-replicate-0: performing metadata selfheal on 
1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.757677] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. 
sources=[2]  sinks=0 1 
[2018-04-09 06:58:53.775939] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: 
performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.776237] I [MSGID: 108026] 
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
0-myvol-private-replicate-0: performing metadata selfheal on 
13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.781762] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. 
sources=[2]  sinks=0 1 
[2018-04-09 06:58:53.796950] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2]  
sinks=0 1 
[2018-04-09 06:58:53.812682] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: 
performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.879382] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
observed. [Input/output error]
[2018-04-09 06:58:53.881514] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
observed. [Input/output error]
[2018-04-09 06:58:53.890073] W [MSGID: 108027] 
[afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read 
subvols for (null)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Here would be also the corresponding log entries on a gluster node brick log 
file:

[2018-04-09 06:58:47.363536] W [MSGID: 113093] 
[posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
removing gfid2path xattr failed on 
/data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
 key = trusted.gfid2path.2529bb66b56be110 [No data available]
[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE
 [No such file or directory]
​
Hope that helps to find out the issue.

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:37 AM, mabi <m...@protonmail.ch> wrote:

> ​​
> 
> Hello,
> 
> Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
> cluster to 3.12.7 and this morning I got a warning that 9 files on one of my 
> volumes are not synced. Indeed, checking that volume with a "volume heal 
> info" shows that the third node (the arbiter node) has 9 files to be healed 
> but they are not being healed automatically.
> 
> All nodes were always online and there was no network interruption so I am 
> wondering if this might not really be a split-brain issue but something else.
> 
> I found some interesting log entries on the client log file 
> (/var/log/glusterfs/myvol-private.log) which I have included below in this 
> mail. It looks like some renaming has gone wrong because a directory is not 
> empty.
> 
> For your information I have upgraded my GlusterFS in offline mode and the 
> upgrade went smoothly.
> 
> What can I do to fix that issue?
> 
> Best regards,
> 
> Mabi
> 
> [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
> 0-myvol-private-dht: renaming 
> /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip 
> (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => 
> /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip
>  (hash=myvol-private-replicate-0/cache=)
> 
> [2018-04-09 06:58:53.692440] W [MSGID: 114031] 
> [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
> operation failed [Directory not empty]
> 
> [2018-04-09 06:58:53.714129] W [MSGID: 114031] 
> [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: 
> remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c 
> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> 
> [2018-04-09 06:58:53.714161] W [MSGID: 114031] 
> [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: 
> remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c 
> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> 
> [2018-04-09 06:58:53.715638] W [MSGID: 114031] 
> [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
> operation failed [Directory not empty]
> 
> [2018-04-09 06:58:53.750372] I [MSGID: 108026] 
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
> 0-myvol-private-replicate-0: performing metadata selfheal on 
> 1cc6facf-eca5-481c-a905-7a39faa25156
> 
> [2018-04-09 06:58:53.757677] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. 
> sources=[2] sinks=0 1
> 
> [2018-04-09 06:58:53.775939] I [MSGID: 108026] 
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 
> 0-myvol-private-replicate-0: performing entry selfheal on 
> 1cc6facf-eca5-481c-a905-7a39faa25156
> 
> [2018-04-09 06:58:53.776237] I [MSGID: 108026] 
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
> 0-myvol-private-replicate-0: performing metadata selfheal on 
> 13880e8c-13da-442f-8180-fa40b6f5327c
> 
> [2018-04-09 06:58:53.781762] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. 
> sources=[2] sinks=0 1
> 
> [2018-04-09 06:58:53.796950] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] 
> sinks=0 1
> 
> [2018-04-09 06:58:53.812682] I [MSGID: 108026] 
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 
> 0-myvol-private-replicate-0: performing entry selfheal on 
> 13880e8c-13da-442f-8180-fa40b6f5327c
> 
> [2018-04-09 06:58:53.879382] E [MSGID: 108008] 
> [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
> Failing READ on gfid a4c

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
As was suggested to me in the past on this mailing list, I now ran a stat and 
getfattr on one of the problematic files on all nodes, and at the end a stat on 
the fuse mount directly. The output is below:
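
For reference, on each node this was essentially a plain stat plus a getfattr 
on the brick path of the file, roughly as follows (path shortened here; on the 
arbiter the brick lives under /srv/glusterfs instead of /data):

$ stat /data/myvol-private/brick/.../problematicfile
$ sudo getfattr -d -m . /data/myvol-private/brick/.../problematicfile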

NODE1:

STAT:
  File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE2:

STAT:
  File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE3:
STAT:
  File: 
/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
  Size: 0   Blocks: 8  IO Block: 4096   regular empty file
Device: ca11h/51729dInode: 404058268   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
 Birth: -

GETFATTR:
trusted.afr.dirty=0s
trusted.afr.myvol-private-client-0=0sAQAA
trusted.afr.myvol-private-client-1=0sAQAA
trusted.bit-rot.version=0sAgBavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

CLIENT GLUSTER MOUNT:
STAT:
  File: 
'/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  Size: 0   Blocks: 0  IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371  Links: 1
Access: (0644/-rw-r--r--)  Uid: (20909/nch20909)   Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -
​​

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:49 AM, mabi <m...@protonmail.ch> wrote:

> ​​
> 
> Here would be also the corresponding log entries on a gluster node brick log 
> file:
> 
> [2018-04-09 06:58:47.363536] W [MSGID: 113093] 
> [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
> removing gfid2path xattr failed on 
> /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
>  key = trusted.gfid2path.2529bb66b56be110 [No data available]
> 
> [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 
> 0-myvol-private-posix: opendir failed on 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE
>  [No such file or directory]
> 
> Hope that helps to find out the issue.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote:
> 
> > Hello,
> > 
> > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
> > cluster to 3.12.7 and this morning I got a warning that 9 files on one of 
> > my volumes are not synced. Indeed, checking that volume with a "volume heal 
> > info" shows that the third node (the arbiter node) has 9 files to be 
> > healed but they are not being healed automatically.
> > 
> > All nodes were always online and there was no network interruption so I am 
> > wondering if this might not really be a split-brain issue but something 
> > else.
> > 
> > I found some interesting log entr

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks again, that worked and I now have no more unsynced files.

You mentioned that this bug has been fixed in 3.13, would it be possible to 
backport it to 3.12? I am asking because 3.13 is not a long-term release and as 
such I would not like to have to upgrade to 3.13.


‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:46 PM, Ravishankar N <ravishan...@redhat.com> wrote:

> ​​
> 
> On 04/09/2018 05:09 PM, mabi wrote:
> 
> > Thanks Ravi for your answer.
> > 
> > Stupid question but how do I delete the trusted.afr xattrs on this brick?
> > 
> > And when you say "this brick", do you mean the brick on the arbiter node 
> > (node 3 in my case)?
> 
> Sorry I should have been clearer. Yes the brick on the 3rd node.
> 
> `setfattr -x trusted.afr.myvol-private-client-0 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
> 
> `setfattr -x trusted.afr.myvol-private-client-1 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
> 
> After doing this for all files, run 'gluster volume heal `.
> 
> HTH,
> 
> Ravi
> 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote:
> > 
> > > On 04/09/2018 04:36 PM, mabi wrote:
> > > 
> > > > As was suggested to me in the past on this mailing list, I now ran a stat 
> > > > and getfattr on one of the problematic files on all nodes and at the 
> > > > end a stat on the fuse mount directly. The output is below:
> > > > 
> > > > NODE1:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > > > 
> > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > > > 
> > > > Device: 23h/35d Inode: 6822549 Links: 2
> > > > 
> > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > > > 
> > > > Access: 2018-04-09 08:58:54.311556621 +0200
> > > > 
> > > > Modify: 2018-04-09 08:58:54.311556621 +0200
> > > > 
> > > > Change: 2018-04-09 08:58:54.423555611 +0200
> > > > 
> > > > Birth: -
> > > > 
> > > > GETFATTR:
> > > > 
> > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > > > 
> > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > > > 
> > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > > > 
> > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > > > 
> > > > NODE2:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > > > 
> > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > > > 
> > > > Device: 24h/36d Inode: 6825876 Links: 2
> > > > 
> > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > > > 
> > > > Access: 2018-04-09 08:58:54.311775605 +0200
> > > > 
> > > > Modify: 2018-04-09 08:58:54.311775605 +0200
> > > > 
> > > > Change: 2018-04-09 08:58:54.423774007 +0200
> > > > 
> > > > Birth: -
> > > > 
> > > > GETFATTR:
> > > > 
> > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > > > 
> > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > > > 
> > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > > > 
> > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > > > 
> > > > NODE3:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
> > > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks Ravi for your answer.

Stupid question but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 
3 in my case)?
​​

‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:24 PM, Ravishankar N <ravishan...@redhat.com> wrote:

> ​​
> 
> On 04/09/2018 04:36 PM, mabi wrote:
> 
> > As was suggested to me in the past on this mailing list, I now ran a stat and 
> > getfattr on one of the problematic files on all nodes and at the end a stat 
> > on the fuse mount directly. The output is below:
> > 
> > NODE1:
> > 
> > STAT:
> > 
> > File: 
> > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > 
> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > 
> > Device: 23h/35d Inode: 6822549 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Modify: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Change: 2018-04-09 08:58:54.423555611 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > NODE2:
> > 
> > STAT:
> > 
> > File: 
> > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > 
> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > 
> > Device: 24h/36d Inode: 6825876 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-09 08:58:54.311775605 +0200
> > 
> > Modify: 2018-04-09 08:58:54.311775605 +0200
> > 
> > Change: 2018-04-09 08:58:54.423774007 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > NODE3:
> > 
> > STAT:
> > 
> > File: 
> > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
> > 
> > Size: 0 Blocks: 8 IO Block: 4096 regular empty file
> > 
> > Device: ca11h/51729d Inode: 404058268 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-05 16:16:55.292341447 +0200
> > 
> > Modify: 2018-04-05 16:16:55.292341447 +0200
> > 
> > Change: 2018-04-09 08:58:54.428074177 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.afr.dirty=0s
> > 
> > trusted.afr.myvol-private-client-0=0sAQAA
> > 
> > trusted.afr.myvol-private-client-1=0sAQAA
> 
> Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
> 
> by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
> 
> launch heal, that should fix it. But the file seems to have no content
> 
> on both data bricks as well, so you might want to check if that was
> 
> expected.
> 
> -Ravi
> 
> > trusted.bit-rot.version=0sAgBavUW2AAGBaA==
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > CLIENT GLUSTER MOUNT:
> > 
> > STAT:
> > 
> > File: 
> > '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
> > 
> > Size: 0 Blocks: 0 IO Block: 131072 regular empty file
> > 
> > Device: 1eh/30d Inode: 13600685574951729371 Links: 1
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)
> > 
> > Access: 2

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks for clarifying that point. Does this mean that the fix for this bug will 
get backported to the next 3.12 release?
​​

‐‐‐ Original Message ‐‐‐

On April 9, 2018 2:31 PM, Ravishankar N <ravishan...@redhat.com> wrote:

> ​​
> 
> On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:
> 
> > 09.04.2018 16:18, Ravishankar N пишет:
> > 
> > > On 04/09/2018 05:40 PM, mabi wrote:
> > > 
> > > > Again thanks that worked and I have now no more unsynched files.
> > > > 
> > > > You mentioned that this bug has been fixed in 3.13, would it be
> > > > 
> > > > possible to backport it to 3.12? I am asking because 3.13 is not a
> > > > 
> > > > long-term release and as such I would not like to have to upgrade to
> > > > 
> > > > 3.13.
> > > 
> > > I don't think there will be another 3.12 release.
> > 
> > Why not? It is LTS, right?
> 
> My bad. Just checked  the schedule [1], and you are right. It is LTM.
> 
> [1] https://www.gluster.org/release-schedule/
> 
> > Gluster-users mailing list
> > 
> > Gluster-users@gluster.org
> > 
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> Gluster-users mailing list
> 
> Gluster-users@gluster.org
> 
> http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-06-22 Thread mabi
Hi,

Now that this issue has happened a few times I noticed a few things which might 
be helpful for debugging:

- This problem happens, though only rarely and randomly, when files are uploaded 
via a cloud app called Nextcloud, where the files are encrypted by the app 
itself on the server side (PHP code).
- It does not seem to happen with a Nextcloud installation which does not have 
server-side encryption enabled.
- When this happens, both the first and second node of the replica have 120k 
context switches and 25k interrupts, and the arbiter node 30k context 
switches/20k interrupts. No node is overloaded, there is no io/wait and there 
are no network issues or disconnections.
- All of the problematic files to heal have spaces in one of their 
sub-directories (might be totally irrelevant).

If that's of any use, my two replica nodes are Debian 8 physical servers with 
ZFS as the file system for the bricks, and the arbiter is a Debian 9 virtual 
machine with XFS as the file system for the brick. To mount the volume I use a 
glusterfs fuse mount on the web server which has Nextcloud running.
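
The mount itself is nothing special, essentially something along these lines 
(the server hostname here is just a placeholder):

$ sudo mount -t glusterfs gfs-node1:/myvol-private /mnt/myvol-private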

Regards,
M.​​

‐‐‐ Original Message ‐‐‐

On May 25, 2018 5:55 PM, mabi  wrote:

> ​​
> 
> Thanks Ravi. Let me know when you have time to have a look. It sort of 
> happens around once or twice per week but today it was 24 files in one go 
> which are unsynched and where I need to manually reset the xattrs on the 
> arbiter node.
> 
> By the way on this volume I use quotas which I set on specifc directories, I 
> don't know if this is relevant or not but thought I would just mention.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote:
> 
> > On 05/23/2018 12:47 PM, mabi wrote:
> > 
> > > Hello,
> > > 
> > > I just wanted to ask if you had time to look into this bug I am 
> > > encountering and if there is anything else I can do?
> > > 
> > > For now in order to get rid of these 3 unsynched files shall I do the 
> > > same method that was suggested to me in this thread?
> > 
> > Sorry Mabi,  I haven't had a chance to dig deeper into this. The
> > 
> > workaround of resetting xattrs should be fine though.
> > 
> > Thanks,
> > 
> > Ravi
> > 
> > > Thanks,
> > > 
> > > Mabi
> > > 
> > > ‐‐‐ Original Message ‐‐‐
> > > 
> > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:
> > > 
> > > > Hi Ravi,
> > > > 
> > > > Please find below the answers to your questions
> > > > 
> > > > 1.  I have never touched the cluster.quorum-type option. Currently it 
> > > > is set as following for this volume:
> > > > 
> > > > Option Value
> > > > 
> > > > 
> > > > cluster.quorum-type none
> > > > 
> > > > 2.  The .shareKey files are not supposed to be empty. They should be 
> > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in 
> > > > a position to say why it is in this specific case only 0 bytes and if 
> > > > it is the fault of the software (Nextcloud) or GlusterFS. I can just 
> > > > say here that I have another file server which is a simple NFS server 
> > > > with another Nextcloud installation and there I never saw any 0 bytes 
> > > > .shareKey files being created.
> > > > 
> > > > 3.  It seems to be quite random and I am not the person who uses the 
> > > > Nextcloud software so I can't say what it was doing at that specific 
> > > > time but I guess uploading files or moving files around. Basically I 
> > > > use GlusterFS to store the files/data of the Nextcloud web application 
> > > > where I have it mounted using a fuse mount (mount -t glusterfs).
> > > > 
> > > > 
> > > > Regarding the logs I have attached the mount log file from the client 
> > > > and below are the relevant log entries from the brick log file of all 3 
> > > > nodes. Let me know if you need any other log files. Also if you know 
> > > > any "log file sanitizer tool" which can replace sensitive file names 
> > > > with random file names in log files, I would like to use it, as right 
> > > > now I have to do that manually.
> > > > 
> > > > NODE 1 brick log:
> > > > 
> > > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] 
> > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on 
> > > > /data/myvol-private/brick/cloud/data/ad

[Gluster-users] GlusterFS 4.1.x deb packages missing for Debian 8 (jessie)

2018-10-19 Thread mabi
Hello,

I just upgraded all my Debian 9 (stretch) GlusterFS servers from 3.12.14 to 
4.1.5, but unfortunately my GlusterFS clients are all Debian 8 (jessie) machines 
and there is not a single GlusterFS 4.1.x package available for Debian 8, as I 
found out here:

https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/

May I kindly ask the GlusterFS packaging team or the person responsible for 
this task to please also provide the packages for Debian 8?

Right now I am running GlusterFS 4.1.5 servers with GlusterFS 3.12.14 clients 
(FUSE mount). Could this create any problems, or is this unsafe? I did not 
upgrade the op-version on the servers yet.
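
For the record, once the clients are upgraded too, the op-version could be 
checked and bumped with something like the following (the exact number to use 
for 4.1.x would have to be looked up first):

$ sudo gluster volume get all cluster.op-version
$ sudo gluster volume set all cluster.op-version 40100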

Thank you very much in advance.

Best regards,
Mabi







___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 4.1.x deb packages missing for Debian 8 (jessie)

2018-10-24 Thread mabi
Anyone?

I would really like to be able to install GlusterFS 4.1.x on Debian 8 (jessie). 
This version of Debian is still widely in use and IMHO there should be a 
GlusterFS package for it.

Many thanks in advance for your consideration.


‐‐‐ Original Message ‐‐‐
On Friday, October 19, 2018 10:58 PM, mabi  wrote:

> Hello,
>
> I just upgraded all my Debian 9 (stretch) GlusterFS servers from 3.12.14 to 
> 4.1.5, but unfortunately my GlusterFS clients are all Debian 8 (jessie) 
> machines and there is not a single GlusterFS 4.1.x package available for Debian 
> 8, as I found out here:
>
> https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/
>
> May I kindly ask the GlusterFS packaging team or the person responsible for 
> this task to please also provide the packages for Debian 8?
>
> Right now I am running GlusterFS 4.1.5 servers with GlusterFS 3.12.14 
> clients (FUSE mount). Could this create any problems, or is this unsafe? I did 
> not upgrade the op-version on the servers yet.
>
> Thank you very much in advance.
>
> Best regards,
> Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Who is the package maintainer for GlusterFS 4.1?

2018-10-29 Thread mabi
Hello,

I would like to know how I can contact the package maintainer for the GluserFS 
4.1.x packages?

I have noticed that Debian 8 (jessie) is missing here:

https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/

Thank you very much in advance.

Best regards,
Mabi




___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Who is the package maintainer for GlusterFS 4.1?

2018-10-29 Thread mabi
Hi Kaleb

Thank you for your answer. Now I understand why there are no packages for 
Debian 8.

Nevertheless I just need the client packages for my Debian 8 clients 
(glusterfs-client + glusterfs-common), not the server package. I guess that the 
client packages do not require golang.

If this is the case, would it be possible to have just the GlusterFS 4.1 client 
packages available for Debian 8?

Best regards,
M.

‐‐‐ Original Message ‐‐‐
On Monday, October 29, 2018 1:44 PM, Kaleb S. KEITHLEY  
wrote:

> On 10/29/18 6:31 AM, mabi wrote:
>
> > Hello,
> > I would like to know how I can contact the package maintainer for the 
> > GluserFS 4.1.x packages?
> > I have noticed that Debian 8 (jessie) is missing here:
> > https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/
> > Thank you very much in advance.
>
> Community GlusterFS packages are built by multiple volunteers.
>
> GlusterFS 4.0, 4.1, and 5.0 packages aren't missing; they have never
> been built for Debian 8 jessie. One reason is that jessie doesn't have a
> new enough golang compiler (even in backports) to build glusterd2.
>
> If you want to build packages without glusterd2 for jessie the packaging
> files are at https://github.com/gluster/glusterfs-debian.
>
> The distributions that packages are built for are listed at
> https://docs.gluster.org/en/latest/Install-Guide/Community_Packages/
> History for this page is in github at
> https://github.com/gluster/glusterdocs/blob/master/docs/Install-Guide/Community_Packages.md
>
> ---
>
> Kaleb
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-01 Thread mabi
Is it possible that the problem I am encountering, described below, is caused by 
the following bug introduced in 4.1.5:

Bug 1637953 - data-self-heal in arbiter volume results in stale locks.
https://bugzilla.redhat.com/show_bug.cgi?id=1637953

Regards,
M.

‐‐‐ Original Message ‐‐‐
On Wednesday, October 31, 2018 11:13 AM, mabi  wrote:

> Hello,
>
> I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and 
> currently have a volume with around 27174 files which are not being healed. 
> The "volume heal info" command shows the same 27k files under the first node 
> and the second node but there is nothing under the 3rd node (arbiter).
>
> I already tried running a "volume heal" but none of the files got healed.
>
> In the glfsheal log file for that particular volume the only error I see is a 
> few of these entries:
>
> [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 
> 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 
> 1800 for 127.0.1.1:49152
>
> and then a few of these warnings:
>
> [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a)
>  [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) 
> [0x7f2a798e8a84] 
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) 
> [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]
>
> the glustershd.log file shows the following:
>
> [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 
> 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 
> 1800 for 127.0.1.1:49152
> [2018-10-31 10:10:52.502502] E [MSGID: 114031] 
> [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: 
> remote operation failed [Transport endpoint is not connected]
>
> any idea what could be wrong here?
>
> Regards,
> Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] quota: error returned while attempting to connect to host:(null), port:0

2018-11-01 Thread mabi
I also noticed the following error in the glusterd.log file today related to 
quota:

[2018-11-01 13:32:06.159865] E [MSGID: 101042] [compat.c:597:gf_umount_lazy] 
0-management: Lazy unmount of /var/run/gluster/myvol-private_quota_limit/
[2018-11-01 13:32:06.160251] E [MSGID: 106363] 
[glusterd-utils.c:12556:glusterd_remove_auxiliary_mount] 0-management: umount 
on /var/run/gluster/myvol-private_quota_limit/ failed, reason : Success

Something must be wrong with the quotas?


‐‐‐ Original Message ‐‐‐
On Tuesday, October 30, 2018 6:24 PM, mabi  wrote:

> Hello,
>
> Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I 
> see quite a lot of the following error message in the brick log file for one 
> of my volumes where I have quota enabled:
>
> [2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
> 0-myvol-private-quota: error returned while attempting to connect to 
> host:(null), port:0
>
> Is this a bug? should I file a bug report? or does anyone know what is wrong 
> here maybe with my system?
>
> Best regards,
> Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] quota: error returned while attempting to connect to host:(null), port:0

2018-10-30 Thread mabi
Hello,

Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I see 
quite a lot of the following error message in the brick log file for one of my 
volumes where I have quota enabled:

[2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-myvol-private-quota: error returned while attempting to connect to 
host:(null), port:0

Is this a bug? should I file a bug report? or does anyone know what is wrong 
here maybe with my system?

Best regards,
Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] quota: error returned while attempting to connect to host:(null), port:0

2018-10-31 Thread mabi
I also noticed in the quotad.log file a lot of the following error messages:

The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 
'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]" repeated 
107 times between [2018-10-31 08:00:27.718645] and [2018-10-31 08:02:04.476875]
The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 
'volume-uuid' is not sent on wire [Invalid argument]" repeated 107 times 
between [2018-10-31 08:00:27.718696] and [2018-10-31 08:02:04.476876]
[2018-10-31 08:02:14.629667] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 
0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid 
argument]
[2018-10-31 08:02:14.629746] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 
0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]

Maybe this is related...


‐‐‐ Original Message ‐‐‐
On Tuesday, October 30, 2018 6:24 PM, mabi  wrote:

> Hello,
>
> Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I 
> see quite a lot of the following error message in the brick log file for one 
> of my volumes where I have quota enabled:
>
> [2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
> 0-myvol-private-quota: error returned while attempting to connect to 
> host:(null), port:0
>
> Is this a bug? should I file a bug report? or does anyone know what is wrong 
> here maybe with my system?
>
> Best regards,
> Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-10-31 Thread mabi
Hello,

I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and 
currently have a volume with around 27174 files which are not being healed. The 
"volume heal info" command shows the same 27k files under the first node and 
the second node but there is nothing under the 3rd node (arbiter).

I already tried running a "volume heal" but none of the files got healed.
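
Concretely, that was along the lines of the following (volume name as in the 
logs below):

$ sudo gluster volume heal myvol-private
$ sudo gluster volume heal myvol-private info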

In the glfsheal log file for that particular volume the only errors I see are a 
few entries like these:

[2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 
0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 
for 127.0.1.1:49152

and then a few of these warnings:

[2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a)
 [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) 
[0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) 
[0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]

the glustershd.log file shows the following:

[2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 
0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 
for 127.0.1.1:49152
[2018-10-31 10:10:52.502502] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: 
remote operation failed [Transport endpoint is not connected]

any idea what could be wrong here?

Regards,
Mabi

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-03 Thread mabi
Ravi, I actually restarted glustershd yesterday evening by unmounting my volume on 
the clients, stopping and starting the volume on the cluster and re-mounting it on 
the clients. That cleared around 1,500 files from the "volume heal info" output, so 
I am now down to roughly 25k more files to heal. While restarting the volume I saw 
the following log entries in the brick log file:

[2018-11-02 17:51:07.078738] W [inodelk.c:610:pl_inodelk_log_cleanup] 
0-myvol-private-server: releasing lock on da4f31fb-ac53-4d78-a633-f0046ac3ebcc 
held by {client=0x7fd48400c160, pid=-6 lk-owner=b0d405e0167f}
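
For reference, the restart sequence mentioned above was roughly the following 
(mount point and server name are just examples):

# on each client
umount /mnt/myvol-private
# on one of the cluster nodes
gluster volume stop myvol-private
gluster volume start myvol-private
# on each client again
mount -t glusterfs node1:/myvol-private /mnt/myvol-private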


What also bothers me is that if I run a manual "volume heal" nothing happens 
except the following log entry in glusterd log:

[2018-11-03 06:32:16.033214] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: 
error returned while attempting to connect to host:(null), port:0

That does not seem normal... what do you think?


‐‐‐ Original Message ‐‐‐
On Saturday, November 3, 2018 1:31 AM, Ravishankar N  
wrote:

> Mabi,
>
> If bug 1637953 is what you are experiencing, then you need to follow the
> workarounds mentioned in
> https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html.
> Can you see if this works?
>
> -Ravi
>
> On 11/02/2018 11:40 PM, mabi wrote:
>
> > I tried again to manually run a heal by using the "gluster volume heal" 
> > command because still not files have been healed and noticed the following 
> > warning in the glusterd.log file:
> > [2018-11-02 18:04:19.454702] I [MSGID: 106533] 
> > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: 
> > Received heal vol req for volume myvol-private
> > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
> > 0-glustershd: error returned while attempting to connect to host:(null), 
> > port:0
> > It looks like the glustershd can't connect to "host:(null)", could that be 
> > the reason why there is no healing taking place? if yes why do I see here 
> > "host:(null)"? and what needs fixing?
> > This seeem to have happened since I upgraded from 3.12.14 to 4.1.5.
> > I really would appreciate some help here, I suspect being an issue with 
> > GlusterFS 4.1.5.
> > Thank you in advance for any feedback.
> > ‐‐‐ Original Message ‐‐‐
> > On Wednesday, October 31, 2018 11:13 AM, mabi m...@protonmail.ch wrote:
> >
> > > Hello,
> > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and 
> > > currently have a volume with around 27174 files which are not being 
> > > healed. The "volume heal info" command shows the same 27k files under the 
> > > first node and the second node but there is nothing under the 3rd node 
> > > (arbiter).
> > > I already tried running a "volume heal" but none of the files got healed.
> > > In the glfsheal log file for that particular volume the only error I see 
> > > is a few of these entries:
> > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 
> > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> > > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 
> > > 1800 for 127.0.1.1:49152
> > > and then a few of these warnings:
> > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] 
> > > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a)
> > >  [0x7f2a6dff434a] 
> > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] 
> > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) 
> > > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]
> > > the glustershd.log file shows the following:
> > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 
> > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> > > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout 
> > > = 1800 for 127.0.1.1:49152
> > > [2018-10-31 10:10:52.502502] E [MSGID: 114031] 
> > > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 
> > > 0-myvol-private-client-0: remote operation failed [Transport endpoint is 
> > > not connected]
> > > any idea what could be wrong here?
> > > Regards,
> > > Mabi
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-03 Thread mabi
Ravi (or anyone else who can help), I now have even more files which are 
pending for healing. Here is the output of a "volume heal info summary":

Brick node1:/data/myvol-private/brick
Status: Connected
Total Number of entries: 49845
Number of entries in heal pending: 49845
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node2:/data/myvol-private/brick
Status: Connected
Total Number of entries: 26644
Number of entries in heal pending: 26644
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node3:/srv/glusterfs/myvol-private/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Should I try to set the "cluster.data-self-heal" parameter of that volume to 
"off" as mentioned in the bug?

And by doing that, does it mean that my files pending heal are in danger of 
being lost?

Also is it dangerous to leave "cluster.data-self-heal" to off?
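
If that is the way to go, I assume the command would be something like the 
following (volume name as an example):

gluster volume set myvol-private cluster.data-self-heal off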



‐‐‐ Original Message ‐‐‐
On Saturday, November 3, 2018 1:31 AM, Ravishankar N  
wrote:

> Mabi,
>
> If bug 1637953 is what you are experiencing, then you need to follow the
> workarounds mentioned in
> https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html.
> Can you see if this works?
>
> -Ravi
>
> On 11/02/2018 11:40 PM, mabi wrote:
>
> > I tried again to manually run a heal by using the "gluster volume heal" 
> > command because still not files have been healed and noticed the following 
> > warning in the glusterd.log file:
> > [2018-11-02 18:04:19.454702] I [MSGID: 106533] 
> > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: 
> > Received heal vol req for volume myvol-private
> > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
> > 0-glustershd: error returned while attempting to connect to host:(null), 
> > port:0
> > It looks like the glustershd can't connect to "host:(null)", could that be 
> > the reason why there is no healing taking place? if yes why do I see here 
> > "host:(null)"? and what needs fixing?
> > This seeem to have happened since I upgraded from 3.12.14 to 4.1.5.
> > I really would appreciate some help here, I suspect being an issue with 
> > GlusterFS 4.1.5.
> > Thank you in advance for any feedback.
> > ‐‐‐ Original Message ‐‐‐
> > On Wednesday, October 31, 2018 11:13 AM, mabi m...@protonmail.ch wrote:
> >
> > > Hello,
> > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and 
> > > currently have a volume with around 27174 files which are not being 
> > > healed. The "volume heal info" command shows the same 27k files under the 
> > > first node and the second node but there is nothing under the 3rd node 
> > > (arbiter).
> > > I already tried running a "volume heal" but none of the files got healed.
> > > In the glfsheal log file for that particular volume the only error I see 
> > > is a few of these entries:
> > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 
> > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> > > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 
> > > 1800 for 127.0.1.1:49152
> > > and then a few of these warnings:
> > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] 
> > > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a)
> > >  [0x7f2a6dff434a] 
> > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] 
> > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) 
> > > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]
> > > the glustershd.log file shows the following:
> > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 
> > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> > > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout 
> > > = 1800 for 127.0.1.1:49152
> > > [2018-10-31 10:10:52.502502] E [MSGID: 114031] 
> > > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 
> > > 0-myvol-private-client-0: remote operation failed [Transport endpoint is 
> > > not connected]
> > > any idea what could be wrong here?
> > > Regards,
> > > Mabi
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-05 Thread mabi
Ravi, I did not yet modify the cluster.data-self-heal parameter to off because in 
the meantime node2 of my cluster had a memory shortage (this node has 32 GB of RAM), 
so I had to reboot it. After that reboot all locks got released and there are no 
more files left to heal on that volume. So the reboot of node2 did the trick (but 
this still seems to be a bug).

Now on another volume of this same cluster I have a total of 8 entries (4 of them 
directories) unsynced, listed under node1 and node3 (arbiter), as you can see below:

Brick node1:/data/myvol-pro/brick
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
Status: Connected
Number of entries: 4

Brick node2:/data/myvol-pro/brick
Status: Connected
Number of entries: 0

Brick node3:/srv/glusterfs/myvol-pro/brick
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
Status: Connected
Number of entries: 4

If I check the "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" directory 
with an "ls -l" on the client (gluster fuse mount) I get the following garbage:

drwxr-xr-x  4 www-data www-data 4096 Nov  5 14:19 .
drwxr-xr-x 31 www-data www-data 4096 Nov  5 14:23 ..
d?  ? ??   ?? le_dir

I checked on the nodes and indeed node1 and node3 have the same directory from 
the time 14:19 but node2 has a directory from the time 14:12.

Again here the self-heal daemon doesn't seem to be doing anything... What do you 
recommend I do in order to heal these unsynced files?



‐‐‐ Original Message ‐‐‐
On Monday, November 5, 2018 2:42 AM, Ravishankar N  
wrote:

>
>
> On 11/03/2018 04:13 PM, mabi wrote:
>
> > Ravi (or anyone else who can help), I now have even more files which are 
> > pending for healing.
>
> If the count is increasing, there is likely a network (disconnect)
> problem between the gluster clients and the bricks that needs fixing.
>
> > Here is the output of a "volume heal info summary":
> > Brick node1:/data/myvol-private/brick
> > Status: Connected
> > Total Number of entries: 49845
> > Number of entries in heal pending: 49845
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> > Brick node2:/data/myvol-private/brick
> > Status: Connected
> > Total Number of entries: 26644
> > Number of entries in heal pending: 26644
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> > Brick node3:/srv/glusterfs/myvol-private/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> > Should I try to set the "cluster.data-self-heal" parameter of that volume 
> > to "off" as mentioned in the bug?
>
> Yes, as  mentioned in the workaround in the thread that I shared.
>
> > And by doing that, does it mean that my files pending heal are in danger of 
> > being lost?
>
> No.
>
> > Also is it dangerous to leave "cluster.data-self-heal" to off?
>
> No. This is only disabling client side data healing. Self-heal daemon
> would still heal the files.
> -Ravi
>
> > ‐‐‐ Original Message ‐‐‐
> > On Saturday, November 3, 2018 1:31 AM, Ravishankar N ravishan...@redhat.com 
> > wrote:
> >
> > > Mabi,
> > > If bug 1637953 is what you are experiencing, then you need to follow the
> > > workarounds mentioned in
> > > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html.
> > > Can you see if this works?
> > > -Ravi
> > > On 11/02/2018 11:40 PM, mabi wrote:
> > >
> > > > I tried again to manually run a heal by using the "gluster volume heal" 
> > > > command because still not files have been healed and noticed the 
> > > > following warning in the glusterd.log file:
> > > > [2018-11-02 18:04:19.454702] I [MSGID: 106533] 
> > > > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 
> > > > 0-management: Received heal vol req for volume myvol-private
> > > > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 
> > > > 0-glustershd: error returned while attempting to connect to 
> > > > host:(null), port:0
> > > > It looks like the glustershd can't connect to "host:(null)", could that 
> > > > be the reason why there is no healing taking place? if yes why do I see 
> > > > here "host:(null)"? a

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-02 Thread mabi
I tried again to manually run a heal by using the "gluster volume heal" command 
because still no files have been healed, and noticed the following warning in 
the glusterd.log file:

[2018-11-02 18:04:19.454702] I [MSGID: 106533] 
[glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: 
Received heal vol req for volume myvol-private
[2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: 
error returned while attempting to connect to host:(null), port:0

It looks like glustershd can't connect to "host:(null)". Could that be the 
reason why there is no healing taking place? If yes, why do I see "host:(null)" 
here, and what needs fixing?

This seems to have happened since I upgraded from 3.12.14 to 4.1.5.

I would really appreciate some help here; I suspect this is an issue with 
GlusterFS 4.1.5.

Thank you in advance for any feedback.


‐‐‐ Original Message ‐‐‐
On Wednesday, October 31, 2018 11:13 AM, mabi  wrote:

> Hello,
>
> I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and 
> currently have a volume with around 27174 files which are not being healed. 
> The "volume heal info" command shows the same 27k files under the first node 
> and the second node but there is nothing under the 3rd node (arbiter).
>
> I already tried running a "volume heal" but none of the files got healed.
>
> In the glfsheal log file for that particular volume the only error I see is a 
> few of these entries:
>
> [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 
> 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 
> 1800 for 127.0.1.1:49152
>
> and then a few of these warnings:
>
> [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a)
>  [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) 
> [0x7f2a798e8a84] 
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) 
> [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]
>
> the glustershd.log file shows the following:
>
> [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 
> 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 
> 1800 for 127.0.1.1:49152
> [2018-10-31 10:10:52.502502] E [MSGID: 114031] 
> [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: 
> remote operation failed [Transport endpoint is not connected]
>
> any idea what could be wrong here?
>
> Regards,
> Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-08 Thread mabi
‐‐‐ Original Message ‐‐‐
On Thursday, November 8, 2018 11:05 AM, Ravishankar N  
wrote:

> It is not a split-brain. Nodes 1 and 3 have xattrs indicating a pending
> entry heal on node2 , so heal must have happened ideally. Can you check
> a few things?

> -   Is there any disconnects between each of the shds and the brick
> processes (check via statedump or look for disconnect messages in
> glustershd.log). Does restarting shd via a `volume start force` solve
> the problem?

Yes, there was one disconnect at 14:21 (UTC 13:21) because node2 ran out of 
memory (although it has 32 GB of RAM) and I had to reboot it. Here are the 
relevant log entries taken from glustershd.log on node1:

[2018-11-05 13:21:16.284239] C 
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-myvol-pro-client-1: server 
192.168.10.33:49154 has not responded in the last 42 seconds, disconnecting.
[2018-11-05 13:21:16.284385] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-myvol-pro-client-1: disconnected from 
myvol-pro-client-1. Client process will keep trying to connect to glusterd 
until brick's port is available
[2018-11-05 13:21:16.284889] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 
0-myvol-pro-client-1: socket disconnected

I also just ran a "volume start force" and saw that the glustershd processes 
got restarted on all 3 nodes, but that did not trigger any healing. There are 
still the same number of files/dirs pending heal...

> -   Is the symlink pointing to oc_dir present inside .glusterfs/25/e2 in
> all 3 bricks?

Yes, they are present for node1 and node3, but on node2 there is no such symlink...
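
For reference, the check was along these lines on each node, using the gfid from 
node 1/node 3 and the brick path as on node 1:

ls -l /data/myvol-pro/brick/.glusterfs/25/e2/25e2616b-4fb6-4b2a-8945-1afc956fff19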

I hope that helps to debug the issue further; please let me know if you need 
more info.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-07 Thread mabi
Dear Ravi,

Thank you for your answer. I will start by sending you below the getfattr output 
of the first entry which does not get healed (it is in fact a directory). It is 
the following path/dir from the output of one of my previous mails: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir

# NODE 1
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00030003
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.glusterfs.dht=0x0001

# NODE 2
trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a
trusted.glusterfs.dht=0x0001

# NODE 3 (arbiter)
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00030003
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.glusterfs.dht=0x0001

Notice here that node 2 does not seem to have any AFR attributes, which must be 
problematic. Also, that specific directory on my node 2 has the oldest timestamp 
(14:12), whereas the same directory on nodes 1 and 3 has a timestamp of 14:19.
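
For reference, extended attributes like the ones above can be gathered with a 
command along these lines on each brick (run as root; brick path as on node 1):

getfattr -d -m . -e hex /data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir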

I did run "volume heal myvol-pro" and on the console it shows:

Launching heal operation to perform index self heal on volume myvol-pro has 
been successful
Use heal info commands to check status.

but then in the glustershd.log file of both 3 nodes there has been nothing new 
logged.

The log file cmd_history.log shows:
[2018-11-08 07:20:24.481603]  : volume heal myvol-pro : SUCCESS

and glusterd.log:
[2018-11-08 07:20:24.474032] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: 
error returned while attempting to connect to host:(null), port:0

That's it... To me it looks like a split-brain, but GlusterFS does not report it 
as a split-brain, and no self-heal happens on it either.

What do you think?

Regards,
M.

‐‐‐ Original Message ‐‐‐
On Thursday, November 8, 2018 5:00 AM, Ravishankar N  
wrote:

> Can you share the getfattr output of all 4 entries from all 3 bricks?
>
> Also, can you tailf glustershd.log on all nodes and see if anything is
> logged for these entries when you run 'gluster volume heal $volname'?
>
> Regards,
>
> Ravi
>
> On 11/07/2018 01:22 PM, mabi wrote:
>
> > To my eyes this specific case looks like a split-brain scenario but the 
> > output of "volume info split-brain" does not show any files. Should I still 
> > use the process for split-brain files as documented in the glusterfs 
> > documentation? or what do you recommend here?
> > ‐‐‐ Original Message ‐‐‐
> > On Monday, November 5, 2018 4:36 PM, mabi m...@protonmail.ch wrote:
> >
> > > Ravi, I did not yet modify the cluster.data-self-heal parameter to off 
> > > because in the mean time node2 of my cluster had a memory shortage (this 
> > > node has 32 GB of RAM) and as such I had to reboot it. After that reboot 
> > > all locks got released and there are no more files left to heal on that 
> > > volume. So the reboot of node2 did the trick (but this still seems to be 
> > > a bug).
> > > Now on another volume of this same cluster I have a total of 8 files (4 
> > > of them being directories) unsynced from node1 and node3 (arbiter) as you 
> > > can see below:
> > > Brick node1:/data/myvol-pro/brick
> > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
> > > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
> > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
> > > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
> > > Status: Connected
> > > Number of entries: 4
> > > Brick node2:/data/myvol-pro/brick
> > > Status: Connected
> > > Number of entries: 0
> > > Brick node3:/srv/glusterfs/myvol-pro/brick
> > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
> > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
> > > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
> > > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
> > > Status: Connected
> > > Number of entries: 4
> > > If I check the 
> > > "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" with an "ls 
> > > -l" directory on the client (gluster fuse mount) I get the following 
> > > garbage:
> > > drwxr-xr-x 4 www-data www-data 4096 Nov 5 14:19 .
> > > drwxr-xr-x 31 www-data www-data 4096 Nov 5 14:23 ..
> > > d? ? ? ? ? ? le_dir
> > > I checked on the nodes and indeed node1 and node3 have the same directory 
> > > from the time 14:19 but node2 has a directory from the time 14:12.
> > > Again here the self-heal daemon doesn't seem to be doing anything... What 
> > >

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-12 Thread mabi
‐‐‐ Original Message ‐‐‐
On Friday, November 9, 2018 2:11 AM, Ravishankar N  
wrote:

> Please re-create the symlink on node 2 to match how it is in the other
> nodes and launch heal again. Check if this is the case for other entries
> too.
> -Ravi

I can't create the missing symlink on node2 because the target 
(../../70/c8/70c894ca-422b-4bce-acf1-5cfb4669abbd/oc_dir) of that link does not 
exist. So basically the symlink and the target of that symlink are missing.

Or shall I create a symlink to a non-existing target?

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-12 Thread mabi
‐‐‐ Original Message ‐‐‐
On Friday, November 9, 2018 2:11 AM, Ravishankar N  
wrote:

> Please re-create the symlink on node 2 to match how it is in the other
> nodes and launch heal again. Check if this is the case for other entries
> too.
> -Ravi

Please ignore my previous mail; I was looking on node2 for a symlink with the 
GFID from node1/node3, whereas of course I should have been looking with node2's 
own GFID. I have now found the symlink on node2 pointing to that problematic 
directory and it looks like this:

node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
node2# ls -la | grep d9ac19
lrwxrwxrwx 1 root root  66 Nov  5 14:12 
d9ac192c-e85e-4402-af10-5551f587ed9a -> 
../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir

When you say "re-create the symlink", do you mean I should delete the current 
symlink on node2 (d9ac192c-e85e-4402-af10-5551f587ed9a) and re-create it with 
the GFID which is used on my node 1 and node 3 like this?

node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
node2# rm d9ac192c-e85e-4402-af10-5551f587ed9a
node2# cd /data/myvol-pro/brick/.glusterfs/25/e2
node2# ln -s ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir 
25e2616b-4fb6-4b2a-8945-1afc956fff19

Just want to make sure I understood you correctly before doing that. Could you 
please let me know if this is correct?

Thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Directory selfheal failed: Unable to form layout for directory on 4.1.5 fuse client

2018-11-13 Thread mabi
Hi,

I just wanted to report that since I upgraded my GlusterFS client from 3.12.14 
to 4.1.5 on a Debian 9 client which uses a FUSE mount, I see a lot of these 
entries for many different directories in the mount log file on the client:

[2018-11-13 21:28:34.626351] I [MSGID: 109005] 
[dht-selfheal.c:2342:dht_selfheal_directory] 0-myvol-pro-dht: Directory 
selfheal failed: Unable to form layout for directory /data/dir1/dir2

Never saw these info messages in the past. My server is a 3 node replica with 
arbiter running 4.1.5 on Debian 9.

It looks like what I am seeing is this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1567100

Is it possible that the fix for this bug has not yet made it into a release? Or 
is it maybe a regression?

Regards,
Mabi


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-15 Thread mabi
‐‐‐ Original Message ‐‐‐
On Thursday, November 15, 2018 1:41 PM, Ravishankar N  
wrote:

> Thanks, noted. One more query. Are there files inside each of these
> directories? Or is it just empty directories?

You will find below the content of each of these 3 directories, taken from the 
brick on node 1:

i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10

drwxr-xr-x  4 www-data www-data  4 Nov  5 14:19 .
drwxr-xr-x 31 www-data www-data 31 Nov  5 14:23 ..
drwxr-xr-x  3 www-data www-data  3 Nov  5 14:19 dir11
drwxr-xr-x  3 www-data www-data  3 Nov  5 14:19 another_dir

ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/
drwxr-xr-x 3 www-data www-data 3 Nov  5 14:19 .
drwxr-xr-x 4 www-data www-data 4 Nov  5 14:19 ..
drwxr-xr-x 2 www-data www-data 4 Nov  5 14:19 oc_dir

iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir

drwxr-xr-x 2 www-data www-data   4 Nov  5 14:19 .
drwxr-xr-x 3 www-data www-data   3 Nov  5 14:19 ..
-rw-r--r-- 2 www-data www-data  32 Nov  5 14:19 fileKey
-rw-r--r-- 2 www-data www-data 512 Nov  5 14:19 username.shareKey

So as you can see from the output above, only the "oc_dir" directory has two 
files inside it.


> symlinks are only for dirs. For files, they would be hard links to the
> actual files. So if stat
> ../brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf gives you
> a file, then you can use find -samefile to get the other hardlinks like so:
> #cd /brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf
> #find /brick -samefile aae4098a-1a71-4155-9cc9-e564b89957cf
>
> If it is a hardlink, then you can do a getfattr on
> /brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf itself.
> -Ravi

Thank you for explaining this important part. So yes, with your help I could 
find the filenames associated with these 2 GFIDs, and guess what? They are the 2 
files listed in the output of the "oc_dir" directory above. Have a look at this:

# find /data/myvol-pro/brick -samefile aae4098a-1a71-4155-9cc9-e564b89957cf
/data/myvol-pro/brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf
/data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey

# find /data/myvol-pro/brick -samefile 3c92459b-8fa1-4669-9a3d-b38b8d41c360
/data/myvol-pro/brick/.glusterfs/3c/92/3c92459b-8fa1-4669-9a3d-b38b8d41c360
/data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey

I hope that helps to debug this further; let me know if you need anything else.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-06 Thread mabi
To my eyes this specific case looks like a split-brain scenario, but the output 
of "volume heal info split-brain" does not show any files. Should I still use the 
process for split-brain files as documented in the glusterfs documentation? Or 
what do you recommend here?


‐‐‐ Original Message ‐‐‐
On Monday, November 5, 2018 4:36 PM, mabi  wrote:

> Ravi, I did not yet modify the cluster.data-self-heal parameter to off 
> because in the mean time node2 of my cluster had a memory shortage (this node 
> has 32 GB of RAM) and as such I had to reboot it. After that reboot all locks 
> got released and there are no more files left to heal on that volume. So the 
> reboot of node2 did the trick (but this still seems to be a bug).
>
> Now on another volume of this same cluster I have a total of 8 files (4 of 
> them being directories) unsynced from node1 and node3 (arbiter) as you can 
> see below:
>
> Brick node1:/data/myvol-pro/brick
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
> gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
> gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
> Status: Connected
> Number of entries: 4
>
> Brick node2:/data/myvol-pro/brick
> Status: Connected
> Number of entries: 0
>
> Brick node3:/srv/glusterfs/myvol-pro/brick
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir
> gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
> gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
> Status: Connected
> Number of entries: 4
>
> If I check the "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" 
> with an "ls -l" directory on the client (gluster fuse mount) I get the 
> following garbage:
>
> drwxr-xr-x 4 www-data www-data 4096 Nov 5 14:19 .
> drwxr-xr-x 31 www-data www-data 4096 Nov 5 14:23 ..
> d? ? ? ? ? ? le_dir
>
> I checked on the nodes and indeed node1 and node3 have the same directory 
> from the time 14:19 but node2 has a directory from the time 14:12.
>
> Again here the self-heal daemon doesn't seem to be doing anything... What do 
> you recommend me to do in order to heal these unsynced files?
>
> ‐‐‐ Original Message ‐‐‐
> On Monday, November 5, 2018 2:42 AM, Ravishankar N ravishan...@redhat.com 
> wrote:
>
> > On 11/03/2018 04:13 PM, mabi wrote:
> >
> > > Ravi (or anyone else who can help), I now have even more files which are 
> > > pending for healing.
> >
> > If the count is increasing, there is likely a network (disconnect)
> > problem between the gluster clients and the bricks that needs fixing.
> >
> > > Here is the output of a "volume heal info summary":
> > > Brick node1:/data/myvol-private/brick
> > > Status: Connected
> > > Total Number of entries: 49845
> > > Number of entries in heal pending: 49845
> > > Number of entries in split-brain: 0
> > > Number of entries possibly healing: 0
> > > Brick node2:/data/myvol-private/brick
> > > Status: Connected
> > > Total Number of entries: 26644
> > > Number of entries in heal pending: 26644
> > > Number of entries in split-brain: 0
> > > Number of entries possibly healing: 0
> > > Brick node3:/srv/glusterfs/myvol-private/brick
> > > Status: Connected
> > > Total Number of entries: 0
> > > Number of entries in heal pending: 0
> > > Number of entries in split-brain: 0
> > > Number of entries possibly healing: 0
> > > Should I try to set the "cluster.data-self-heal" parameter of that volume 
> > > to "off" as mentioned in the bug?
> >
> > Yes, as  mentioned in the workaround in the thread that I shared.
> >
> > > And by doing that, does it mean that my files pending heal are in danger 
> > > of being lost?
> >
> > No.
> >
> > > Also is it dangerous to leave "cluster.data-self-heal" to off?
> >
> > No. This is only disabling client side data healing. Self-heal daemon
> > would still heal the files.
> > -Ravi
> >
> > > ‐‐‐ Original Message ‐‐‐
> > > On Saturday, November 3, 2018 1:31 AM, Ravishankar N 
> > > ravishan...@redhat.com wrote:
> > >
> > > > Mabi,
> > > > If bug 1637953 is what you are experiencing, then you need to follow the
> > > > workarounds mentioned in
> > > > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html.
> > > > Can you see if this works?
> > > > -Ravi
> >

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-14 Thread mabi
‐‐‐ Original Message ‐‐‐
On Wednesday, November 14, 2018 5:34 AM, Ravishankar N  
wrote:

> I thought it was missing which is why I asked you to create it.  The
> trusted.gfid xattr for any given file or directory must be same in all 3
> bricks.  But it looks like that isn't the case. Are the gfids and the
> symlinks for all the dirs leading to the parent dir of oc_dir same on
> all nodes? (i.e evey directory in
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/)?

I now checked the GFIDs of all directories leading back down to the parent dir 
(13 directories in total) and for node 1 and node 3 the GFIDs of all underlying 
directories match each other. On node 2 they are also all the same except for the 
two highest directories (".../dir11" and ".../dir11/oc_dir"). It's exactly these 
two directories which are also listed in the "volume heal info" output under 
node 1 and node 2 and which do not get healed.

For your reference I have pasted below the GFIDs for all underlying directories 
up to the parent directory and for all 3 nodes. I start at the top with the 
highest directory and at the bottom of the list is the parent directory (/data).

# NODE 1

trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 # 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd # 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 # ...
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01 # ...
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb # /data/dir1/dir2
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4 # /data/dir1
trusted.gfid=0x2683990126724adbb6416b911180e62b # /data

# NODE 2

trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a
trusted.gfid=0x10ec1eb1c8544ff2a36c325681713093
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4
trusted.gfid=0x2683990126724adbb6416b911180e62b

# NODE 3

trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4
trusted.gfid=0x2683990126724adbb6416b911180e62b
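
A small sketch of how such a listing can be produced on a brick, walking up from 
the deepest directory to the brick root (brick path as on node 1; run as root):

d=/data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
while [ "$d" != "/data/myvol-pro/brick" ]; do
    # print only the trusted.gfid line for each directory on the way down
    getfattr -n trusted.gfid -e hex "$d" 2>/dev/null | grep trusted.gfid=
    d=$(dirname "$d")
done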


> Let us see if the parents' gfids are the same before deleting anything.
> Is the heal info still showing 4 entries? Please also share the getfattr
> output of the the parent directory (i.e. dir11) .

Yes, the heal info still shows the 4 entries, but on node 1 the directory names 
are no longer shown, only the GFIDs. This is the actual output of a "volume heal 
info":

Brick node1:/data/myvol-pro/brick




Status: Connected
Number of entries: 4

Brick node2:/data/myvol-pro/brick
Status: Connected
Number of entries: 0

Brick node3:/srv/glusterfs/myvol-pro/brick
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir


Status: Connected
Number of entries: 4

What are the next steps in order to fix that?
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-15 Thread mabi
‐‐‐ Original Message ‐‐‐
On Thursday, November 15, 2018 5:57 AM, Ravishankar N  
wrote:

> 1.Could you provide the getfattr output of the following 3 dirs from all
> 3 nodes?
> i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10
> ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/
> iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir

Sure, you will find below the getfattr output of all 3 directories from all 3 
nodes.

i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10

# NODE 1
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.glusterfs.dht=0x0001

# NODE 2
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.glusterfs.dht=0x0001

# NODE 3
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.glusterfs.dht=0x0001

ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/

# NODE 1
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00040003
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd
trusted.glusterfs.dht=0x0001

# NODE 2
trusted.gfid=0x10ec1eb1c8544ff2a36c325681713093
trusted.glusterfs.dht=0x0001

# NODE 3
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00040003
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd
trusted.glusterfs.dht=0x0001

iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir

# NODE 1
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00030003
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.glusterfs.dht=0x0001

# NODE 2
trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a
trusted.glusterfs.dht=0x0001

# NODE 3
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00030003
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.glusterfs.dht=0x0001


> 2. Do you know the file (or directory) names corresponding to the other
> 2 gfids  in heal info output, i.e
> gfid:aae4098a-1a71-4155-9cc9-e564b89957cf
> gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360
> Please share the getfattr output of them as well.

Unfortunately no. I tried the trick of mounting the volume with the mount 
option "aux-gfid-mount" in order to find the filename corresponding to the GFID 
and then using the following getfattr command:

getfattr -n trusted.glusterfs.pathinfo -e text 
/mnt/g/.gfid/aae4098a-1a71-4155-9cc9-e564b89957cf

this gave me the following output:

trusted.glusterfs.pathinfo="( 
( 

 
))"

Then if I check ".../brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf" 
on node 1 or node 3, it does not have any symlink to a file. Or am I maybe looking 
in the wrong place, or is there another trick to find the GFID->filename mapping?
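
For reference, the aux-gfid-mount approach mentioned above was along these lines 
(server name and mount point are examples):

mount -t glusterfs -o aux-gfid-mount node1:/myvol-pro /mnt/g
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/g/.gfid/aae4098a-1a71-4155-9cc9-e564b89957cf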

Regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-16 Thread mabi
‐‐‐ Original Message ‐‐‐
On Friday, November 16, 2018 5:14 AM, Ravishankar N  
wrote:

> Okay, as asked in the previous mail, please share the getfattr output
> from all bricks for these 2 files. I think once we have this, we can try
> either 'adjusting' the the gfid and symlinks on node 2 for dir11 and
> oc_dir or see if we can set afr xattrs on dir10 for self-heal to purge
> everything under it on node 2 and recreate it using the other 2 nodes.

And finally here is the output of a getfattr from both files from the 3 nodes:

FILE 1: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579


FILE 2: 
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey

NODE 1:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

NODE 2:
trusted.afr.dirty=0x
trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579

NODE 3:
trusted.afr.dirty=0x
trusted.afr.myvol-pro-client-1=0x00020001
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

Thanks again in advance for your answer.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] S3-compatible object storage on top of GlusterFS volume

2018-12-14 Thread mabi
Hello,

First of all I was wondering if GlusterFS natively implements S3-compatible 
object storage, or if this is planned for the near future? I did not find 
anything in the documentation so I assume that this is not the case.

As an alternative I was thinking of using Zenko CloudServer 
(https://github.com/scality/cloudserver), which is an S3-compatible object store 
implementation, on top of a volume of my already existing GlusterFS cluster. 
Does anyone have experience with this scenario? If yes, I would be interested to 
know how well it works and what software is recommended for this use case.

Best regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] op-version compatibility with older clients

2018-11-21 Thread mabi
Hello,

I would like to know the following: if I increase the op-version of all my 
GlusterFS volumes from its current value of 31202 to 40100 by using the following 
command:

gluster volume set all cluster.op-version 40100

will my clients using the GlusterFS 3.12 libgfapi client and FUSE mounts still be 
able to connect to my server and work correctly?

I am running 4.1.5 on my GlusterFS servers and I am asking because I still have 
a few clients on 3.12.14 which will need to stay on that version for a while 
longer.
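
For reference, I assume the relevant commands to check before and after bumping 
would be along these lines:

gluster volume get all cluster.op-version      # current cluster op-version
gluster volume get all cluster.max-op-version  # highest op-version the installed version supports
gluster volume set all cluster.op-version 40100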

Regards,
Mabi








___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-17 Thread mabi
Thank you Ravi for your answer. I have now set the afr xattr as you suggested 
and I am running the "find . | xargs -d '\n' stat" on my gluster fuse mount for 
this volume.

This volume has around 3 million files and directories, so I suppose it will 
take a long time to finish. Do I really need to run this find over the whole 
volume starting from its root?

Note that I added the "-d '\n'" option in xargs in order to deal with filenames 
which have spaces inside.
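
For reference, a minimal sketch of the whole sequence, assuming a temporary fuse 
mount at /mnt/glustertmp and the volume name myvol-pro; "-print0 | xargs -0" is 
an alternative to "-d '\n'" that also copes with newlines in file names:

mount -t glusterfs node1:/myvol-pro /mnt/glustertmp
cd /mnt/glustertmp
# stat every entry so that lookups are triggered on the whole tree
find . -print0 | xargs -0 stat > /dev/null
gluster volume heal myvol-pro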


‐‐‐ Original Message ‐‐‐
On Saturday, November 17, 2018 6:04 AM, Ravishankar N  
wrote:

> Okay so for all files and dirs, node 2 seems to be the bad copy. Try the
> following:
>
> 1.  On both node 1 and node3, set theafr xattr for dir10:
> setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001
> 
> /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10
>
> 2.  Fuse mount the volume temporarily in some location and from that
> mount point, do a `find .|xargs stat >/dev/null`
>
>
> 3. Run`gluster volume heal $volname`
>
> HTH,
> Ravi
>
> On 11/16/2018 09:07 PM, mabi wrote:
>
> > And finally here is the output of a getfattr from both files from the 3 
> > nodes:
> > FILE 1: 
> > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
> > trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> > FILE 2: 
> > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
> > trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica

2018-11-17 Thread mabi
Good news: the stat over all files of my volume finished after running for over 
6 hours and the 4 entries (actually 2 directories and 2 files) are now finally 
all healed. I checked the 3 bricks and all have the correct data. On node 1 I 
also saw 4 healing log entries in the glustershd.log file. I did not even need to 
manually run a "volume heal"; it healed automatically.

Now, I would really like to avoid this situation in the future; it's a pain for 
me and maybe also for you guys helping me ;-) Is this a bug or am I doing 
something wrong? How can I avoid this type of manual fixing in the future?

Again a big thank you Ravi for your patience helping me out with this issue.

‐‐‐ Original Message ‐‐‐
On Saturday, November 17, 2018 6:04 AM, Ravishankar N  
wrote:

> Okay so for all files and dirs, node 2 seems to be the bad copy. Try the
> following:
>
> 1.  On both node 1 and node3, set theafr xattr for dir10:
> setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001
> 
> /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10
>
> 2.  Fuse mount the volume temporarily in some location and from that
> mount point, do a `find .|xargs stat >/dev/null`
>
>
> 3. Run`gluster volume heal $volname`
>
> HTH,
> Ravi
>
> On 11/16/2018 09:07 PM, mabi wrote:
>
> > And finally here is the output of a getfattr from both files from the 3 
> > nodes:
> > FILE 1: 
> > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
> > trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> > FILE 2: 
> > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
> > trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Max length for filename

2019-01-28 Thread mabi
Hello,

I saw this warning today in my fuse mount client log file:

[2019-01-28 06:01:25.091232] W [fuse-bridge.c:565:fuse_entry_cbk] 
0-glusterfs-fuse: 530594537: LOOKUP() 
/data/somedir0/files/-somdir1/dir2/dir3/some super long 
filename….mp3.TransferId1924513788.part => -1 (File name too long)

and was wondering: what is the maximum length for a filename on GlusterFS?

I am using GlusterFS 4.1.6.
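
My understanding is that GlusterFS itself does not impose the limit here; it 
comes from the bricks' underlying filesystem (typically 255 bytes per file name 
on ext4 or XFS). Assuming the brick lives at /path/to/brick (placeholder path), 
it can be checked like this:

getconf NAME_MAX /path/to/brick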

Regards,
Mabi





___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] quotad error log warnings repeated

2019-02-06 Thread mabi
Hello,

I am running a 3 node (with arbiter) GlusterFS 4.1.6 cluster with one 
replicated volume where I have quotas enabled.

Now I checked my quotad.log file on one of the nodes and can see many of these 
warning messages, repeated constantly:

The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 
'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]" repeated 
224 times between [2019-02-07 07:28:15.291923] and [2019-02-07 07:30:02.625004]
The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 
'volume-uuid' is not sent on wire [Invalid argument]" repeated 224 times 
between [2019-02-07 07:28:15.291949] and [2019-02-07 07:30:02.625004]
[2019-02-07 07:30:07.747135] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 
0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid 
argument]
[2019-02-07 07:30:07.747164] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 
0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]

I can re-trigger these warning messages on demand for example by running

$ gluster volume quota myvolume list

Does anyone know if this is bad? Is it a bug? And what can I do about it?

Best regards,
Mabi

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS 4.1.9 Debian stretch packages missing

2019-06-23 Thread mabi
Hello,

I would like to upgrade my GlusterFS 4.1.8 cluster to 4.1.9 on my Debian 
stretch nodes. Unfortunately the packages are missing as you can see here:

https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.9/Debian/stretch/amd64/apt/

As far as I know GlusterFS 4.1 is not yet EOL so I don't understand why the 
packages are missing... Maybe an error?

Could please someone check?

Thank you very much in advance.

Best,
M.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS FUSE client on BSD

2019-07-03 Thread mabi
Hello,

Is there a way to mount a GlusterFS volume using FUSE on a BSD machine such as 
OpenBSD?

If not, what is the alternative, I guess NFS?
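
If NFS is the route, my assumption is that a mount along these lines would work 
from OpenBSD, provided an NFS server (gNFS or NFS-Ganesha) is enabled on the 
gluster nodes (server and volume name are examples; -T forces TCP, -3 NFSv3):

mount_nfs -T -3 node1:/myvolume /mnt/gluster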

Regards,
M.





___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] writing to fuse device failed: No such file or directory

2020-03-02 Thread mabi
Hello,

On the FUSE clients of my GlusterFS 5.11 two-node replica + arbiter cluster I see 
quite a lot of the following error message repeatedly:
[2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe]
 (--> 
/usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a]
 (--> 
/usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33]
 (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> 
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 
0-glusterfs-fuse: writing to fuse device failed: No such file or directory

Both the server and clients are Debian 9.

What exactly does this error message mean? And is it normal? Or what should I do 
to fix it?

Regards,
Mabi









Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] writing to fuse device failed: No such file or directory

2020-03-02 Thread mabi
‐‐‐ Original Message ‐‐‐
On Tuesday, March 3, 2020 6:11 AM, Hari Gowtham  wrote:

> I checked on the backport and found that this patch hasn't yet been 
> backported to any of the release branches.
> If this is the fix, it would be great to have them backported for the next 
> release.

Thanks to everyone who responded to my post. Now I wanted to ask whether the fix
for this bug will also be backported to GlusterFS 5, and if yes, will it be
available in the next GlusterFS version, 5.13?



Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 5.11

2019-12-27 Thread mabi
Dear Hari,

Nearly 10 days after your announcement, the 5.11 Debian stretch packages are
unfortunately still missing:

https://download.gluster.org/pub/gluster/glusterfs/5/5.11/Debian/stretch/amd64/apt/pool/main/g/glusterfs/

Do you know when they will be available? Or has this maybe been forgotten?

Thank you very much in advance.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Wednesday, December 18, 2019 4:56 AM, Hari Gowtham  
wrote:

> Hi,
>
> The Gluster community is pleased to announce the release of Gluster
> 5.11 (packages available at [1]).
>
> Release notes for the release can be found at [2].
>
> Major changes, features and limitations addressed in this release:
> None
>
> Thanks,
> Gluster community
>
> [1] Packages for 5.11:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.11/
>
> [2] Release notes for 5.11:
> https://docs.gluster.org/en/latest/release-notes/5.11/
>
> --
> Regards,
> Hari Gowtham.

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 5.11

2019-12-27 Thread mabi
Thank you very much for your fast response and for adding the missing Debian 
packages.

‐‐‐ Original Message ‐‐‐
On Friday, December 27, 2019 10:36 AM, Shwetha Acharya  
wrote:

> Hi Mabi,
>
> Glusterfs 5.11 Debian amd64 stretch packages are now available.
>
> Regards,
> Shwetha
>
> On Fri, Dec 27, 2019 at 1:37 PM mabi  wrote:
>
>> Dear Hari,
>>
>> Nearly 10 days after your announcement unfortunately the 5.11 Debian stretch 
>> packages are still missing:
>>
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.11/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>>
>> Do you know when they will be available? or has this maybe been forgotten?
>>
>> Thank you very much in advance.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Wednesday, December 18, 2019 4:56 AM, Hari Gowtham  
>> wrote:
>>
>>> Hi,
>>>
>>> The Gluster community is pleased to announce the release of Gluster
>>> 5.11 (packages available at [1]).
>>>
>>> Release notes for the release can be found at [2].
>>>
>>> Major changes, features and limitations addressed in this release:
>>> None
>>>
>>> Thanks,
>>> Gluster community
>>>
>>> [1] Packages for 5.11:
>>> https://download.gluster.org/pub/gluster/glusterfs/5/5.11/
>>>
>>> [2] Release notes for 5.11:
>>> https://docs.gluster.org/en/latest/release-notes/5.11/
>>>
>>> --
>>> Regards,
>>> Hari Gowtham.
>>
>> 
>>
>> Community Meeting Calendar:
>>
>> APAC Schedule -
>> Every 2nd and 4th Tuesday at 11:30 AM IST
>> Bridge: https://bluejeans.com/441850968
>>
>> NA/EMEA Schedule -
>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] writing to fuse device failed: No such file or directory

2020-05-04 Thread mabi
Hello,

Now that GlusterFS 5.13 has been released, could someone let me know if this 
issue (see mail below) has been fixed in 5.13?

Thanks and regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Monday, March 2, 2020 3:17 PM, mabi  wrote:

> Hello,
>
> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see 
> quite a lot of the following error message repeatedly:
>
> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> 
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe]
>  (--> 
> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a]
>  (--> 
> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33]
>  (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> 
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 
> 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>
> Both the server and clients are Debian 9.
>
> What exactly does this error message mean? And is it normal? or what should I 
> do to fix that?
>
> Regards,
> Mabi






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] writing to fuse device failed: No such file or directory

2020-05-05 Thread mabi
Dear Hari,

Thank you for your answer.

A few months ago, when I initially reported this issue, I was told that the fix
would be backported to 5.x; at that time 5.x was not EOL.

So I guess I should upgrade to 7, but reading this list it seems that version 7
has a few other open issues. Is it safe to use version 7 in production, or
should I rather use version 6?

And is it possible to upgrade from 5.11 directly to 7.5?

Regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Tuesday, May 5, 2020 1:40 PM, Hari Gowtham  wrote:

> Hi,
>
> I don't see the above mentioned fix to be backported to any branch.
> I have just cherry picked them for the release-6 and 7.
> Release-5 has reached EOL and so, it won't have the fix.
> Note: release 6 will have one more release and will be EOLed as well.
> Release-8 is being worked on and it will have the fix as a part of the way 
> it's branched.
> Once it gets merged, it should be available in the release-6 and 7. but I do 
> recommend switching from
> the older branches to the newer ones (at least release-7 in this case).
>
> https://review.gluster.org/#/q/change:I510158843e4b1d482bdc496c2e97b1860dc1ba93
>
> On Tue, May 5, 2020 at 11:52 AM mabi  wrote:
>
>> Dear Artem,
>>
>> Thank you for your answer. If you still see these errors messages with 
>> GlusterFS 5.13 I suppose then that this bug fix has not been backported to 
>> 5.x.
>>
>> Could someone of the dev team please confirm? It was said on this list that 
>> this bug fix would be back ported to 5.x, so I am a bit surprised.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Monday, May 4, 2020 9:57 PM, Artem Russakovskii  
>> wrote:
>>
>>> I'm on 5.13, and these are the only error messages I'm still seeing (after 
>>> downgrading from the failed v7 update):
>>>
>>> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] 
>>> (--> 
>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] 
>>> (--> 
>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] 
>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: 
>>> writing to fuse device failed: No such file or directory
>>> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] 
>>> (--> 
>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] 
>>> (--> 
>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] 
>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: 
>>> writing to fuse device failed: No such file or directory
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, [Android Police](http://www.androidpolice.com), [APK 
>>> Mirror](http://www.apkmirror.com/), Illogical Robot LLC
>>> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>>>
>>> On Mon, May 4, 2020 at 5:46 AM mabi  wrote:
>>>
>>>> Hello,
>>>>
>>>> Now that GlusterFS 5.13 has been released, could someone let me know if 
>>>> this issue (see mail below) has been fixed in 5.13?
>>>>
>>>> Thanks and regards,
>>>> Mabi
>>>>
>>>> ‐‐‐ Original Message ‐‐‐
>>>> On Monday, March 2, 2020 3:17 PM, mabi  wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see 
>>>>> quite a lot of the following error message repeatedly:
>>>>>
>>>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>>>> (--> 
>>>>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe]
>>>>>  (--> 
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a]
>>>>>  (--> 
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33]
>>>>>  (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> 
>>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 
>

Re: [Gluster-users] writing to fuse device failed: No such file or directory

2020-05-05 Thread mabi
Dear Artem,

Thank you for your answer. If you still see these error messages with
GlusterFS 5.13, then I suppose that this bug fix has not been backported to 5.x.

Could someone from the dev team please confirm? It was said on this list that
this bug fix would be backported to 5.x, so I am a bit surprised.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Monday, May 4, 2020 9:57 PM, Artem Russakovskii  wrote:

> I'm on 5.13, and these are the only error messages I'm still seeing (after 
> downgrading from the failed v7 update):
>
> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> 
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] (--> 
> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] (--> 
> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] (--> 
> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: writing 
> to fuse device failed: No such file or directory
> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> 
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] (--> 
> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] (--> 
> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] (--> 
> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: writing 
> to fuse device failed: No such file or directory
>
> Sincerely,
> Artem
>
> --
> Founder, [Android Police](http://www.androidpolice.com), [APK 
> Mirror](http://www.apkmirror.com/), Illogical Robot LLC
> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>
> On Mon, May 4, 2020 at 5:46 AM mabi  wrote:
>
>> Hello,
>>
>> Now that GlusterFS 5.13 has been released, could someone let me know if this 
>> issue (see mail below) has been fixed in 5.13?
>>
>> Thanks and regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Monday, March 2, 2020 3:17 PM, mabi  wrote:
>>
>>> Hello,
>>>
>>> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see 
>>> quite a lot of the following error message repeatedly:
>>>
>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>> (--> 
>>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe]
>>>  (--> 
>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a]
>>>  (--> 
>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33]
>>>  (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> 
>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 
>>> 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>
>>> Both the server and clients are Debian 9.
>>>
>>> What exactly does this error message mean? And is it normal? or what should 
>>> I do to fix that?
>>>
>>> Regards,
>>> Mabi
>>
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] writing to fuse device failed: No such file or directory

2020-05-06 Thread mabi
Hi everyone,

So because upgrading introduces additional problems, does this mean I should
stick with 5.x even if it is EOL?

Or what is a "safe" version to upgrade to?

Regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Wednesday, May 6, 2020 2:44 AM, Artem Russakovskii  
wrote:

> Hi Hari,
>
> Hmm, given how poorly our migration from 5.13 to 7.5 went, I am not sure how 
> I'd move forward with what you suggested at this point.
>
> Sincerely,
> Artem
>
> --
> Founder, [Android Police](http://www.androidpolice.com), [APK 
> Mirror](http://www.apkmirror.com/), Illogical Robot LLC
> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>
> On Tue, May 5, 2020 at 4:41 AM Hari Gowtham  wrote:
>
>> Hi,
>>
>> I don't see the above mentioned fix to be backported to any branch.
>> I have just cherry picked them for the release-6 and 7.
>> Release-5 has reached EOL and so, it won't have the fix.
>> Note: release 6 will have one more release and will be EOLed as well.
>> Release-8 is being worked on and it will have the fix as a part of the way 
>> it's branched.
>> Once it gets merged, it should be available in the release-6 and 7. but I do 
>> recommend switching from
>> the older branches to the newer ones (at least release-7 in this case).
>>
>> https://review.gluster.org/#/q/change:I510158843e4b1d482bdc496c2e97b1860dc1ba93
>>
>> On Tue, May 5, 2020 at 11:52 AM mabi  wrote:
>>
>>> Dear Artem,
>>>
>>> Thank you for your answer. If you still see these errors messages with 
>>> GlusterFS 5.13 I suppose then that this bug fix has not been backported to 
>>> 5.x.
>>>
>>> Could someone of the dev team please confirm? It was said on this list that 
>>> this bug fix would be back ported to 5.x, so I am a bit surprised.
>>>
>>> Best regards,
>>> Mabi
>>>
>>> ‐‐‐ Original Message ‐‐‐
>>> On Monday, May 4, 2020 9:57 PM, Artem Russakovskii  
>>> wrote:
>>>
>>>> I'm on 5.13, and these are the only error messages I'm still seeing (after 
>>>> downgrading from the failed v7 update):
>>>>
>>>> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] 
>>>> (--> 
>>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] 
>>>> (--> 
>>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] 
>>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
>>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: 
>>>> writing to fuse device failed: No such file or directory
>>>> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] 
>>>> (--> 
>>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] 
>>>> (--> 
>>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] 
>>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> 
>>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: 
>>>> writing to fuse device failed: No such file or directory
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>> --
>>>> Founder, [Android Police](http://www.androidpolice.com), [APK 
>>>> Mirror](http://www.apkmirror.com/), Illogical Robot LLC
>>>> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>>>>
>>>> On Mon, May 4, 2020 at 5:46 AM mabi  wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Now that GlusterFS 5.13 has been released, could someone let me know if 
>>>>> this issue (see mail below) has been fixed in 5.13?
>>>>>
>>>>> Thanks and regards,
>>>>> Mabi
>>>>>
>>>>> ‐‐‐ Original Message ‐‐‐
>>>>> On Monday, March 2, 2020 3:17 PM, mabi  wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see 
>>>>>> quite a lot of the following error message repeatedly:
>>>>>>
>>>>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] 
>>>>>> (--> 
>

[Gluster-users] glustershd: EBADFD [File descriptor in bad state]

2020-10-09 Thread mabi
Hello,

I have a GlusterFS 6.9 cluster with two nodes and one arbiter node with a
replica volume, and currently there are two files and two directories stuck
waiting to be self-healed.

Nodes 1 and 3 (arbiter) have the files and directories on the brick, but node 2
does not have them.

Node1 glustershd log file shows the following warning message:

[2020-10-09 14:18:54.006707] I [MSGID: 108026] 
[afr-self-heal-entry.c:898:afr_selfheal_entry_do] 0-myvol-replicate-0: 
performing entry selfheal on 4d520c69-2b18-4601-bad5-3c16c29188c1
[2020-10-09 14:18:54.007064] W [MSGID: 114061] 
[client-common.c:2968:client_pre_readdir_v2] 0-myvol-client-1:  
(4d520c69-2b18-4601-bad5-3c16c29188c1) remote_fd is -1. EBADFD [File descriptor 
in bad state]

The FUSE mount client log file shows the following error message:

[2020-10-09 14:15:51.115856] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f9d0a0663bc]
 (--> 
/usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7bba)[0x7f9d07743bba]
 (--> 
/usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7d23)[0x7f9d07743d23]
 (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f9d092bd4a4] (--> 
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9d08b17d0f] ) 
0-glusterfs-fuse: writing to fuse device failed: No such file or directory

I have no clue how this could have happened, but as the GlusterFS self-heal
daemon does not seem to be able to heal the two files and directories itself, I
would like to know what I can do here to fix this.
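
For reference, this is roughly what I have been using to check and trigger the
heal (volume name is a placeholder; these are just the standard heal commands):

$ gluster volume heal myvol info            # list entries pending heal per brick
$ gluster volume heal myvol info summary    # per-brick counters only
$ gluster volume heal myvol                 # trigger an index heal
$ gluster volume heal myvol full            # or force a full heal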

Thank you in advance for your help.

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glustershd: EBADFD [File descriptor in bad state]

2020-10-09 Thread mabi
Just wanted to mention that 3 hours later the self-heal daemon managed to heal
the files. I don't understand why it took 3 hours, but at least the two affected
directories and files are now available on all nodes again.


‐‐‐ Original Message ‐‐‐
On Friday, October 9, 2020 4:30 PM, mabi  wrote:

> Hello,
>
> I have a GlusterFS 6.9 cluster with two nodes and one arbitrer node with a 
> replica volume and currently there are two files and two directories stuck to 
> be self-healed.
>
> Node 1 and 3 (arbitrer) have the files and directories on the brick but node 
> 2 does not have the files and directories.
>
> Node1 glustershd log file shows the following warning message:
>
> [2020-10-09 14:18:54.006707] I [MSGID: 108026] 
> [afr-self-heal-entry.c:898:afr_selfheal_entry_do] 0-myvol-replicate-0: 
> performing entry selfheal on 4d520c69-2b18-4601-bad5-3c16c29188c1
> [2020-10-09 14:18:54.007064] W [MSGID: 114061] 
> [client-common.c:2968:client_pre_readdir_v2] 0-myvol-client-1: 
> (4d520c69-2b18-4601-bad5-3c16c29188c1) remote_fd is -1. EBADFD [File 
> descriptor in bad state]
>
> The FUSE mount client log file show the following error message:
>
> [2020-10-09 14:15:51.115856] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> 
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f9d0a0663bc]
>  (--> 
> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7bba)[0x7f9d07743bba]
>  (--> 
> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7d23)[0x7f9d07743d23]
>  (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f9d092bd4a4] (--> 
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9d08b17d0f] ) 
> 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>
> I have no clue how this could have happened but as the GlusterFS self-heal 
> daemon does not seem to be able to heal the two files and directories itself, 
> I would like to know what I can do here to fix this?
>
> Thank you in advance for your help.
>
> Best regards,
> Mabi






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-08-23 Thread mabi
Hello,

So to be precise, I am having exactly the following issue:

https://github.com/gluster/glusterfs/issues/1332

I could not wait any longer for a workaround or quick fix, so I decided to
downgrade my rejected node from 7.7 back to 6.9, which worked.

I would be really glad if someone could fix this issue or provide me with a
working workaround, because version 6 of GlusterFS is not supported anymore and
I would really like to move on to the stable version 7.

Thank you very much in advance.

Best regards,
Mabi


‐‐‐ Original Message ‐‐‐
On Saturday, August 22, 2020 7:53 PM, mabi  wrote:

> Hello,
>
> I just started an upgrade of my 3 nodes replica (incl arbiter) of GlusterFS 
> from 6.9 to 7.7 but unfortunately after upgrading the first node, that node 
> gets rejected due to the following error:
>
> [2020-08-22 17:43:00.240990] E [MSGID: 106012] 
> [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums 
> of quota configuration of volume myvolume differ. local cksum = 3013120651, 
> remote cksum = 0 on peer myfirstnode.domain.tld
>
> So glusterd process is running but not glusterfsd.
>
> I am exactly in the same issue as described here:
>
> https://www.gitmemory.com/Adam2Marsh
>
> But I do not see any solutions or workaround. So now I am stuck with a 
> degraded GlusterFS cluster.
>
> Could someone please advise me as soon as possible on what I should do? Is 
> there maybe any workarounds?
>
> Thank you very much in advance for your response.
>
> Best regards,
> Mabi






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-08-22 Thread mabi
Hello,

I just started an upgrade of my 3-node replica (incl. arbiter) GlusterFS
cluster from 6.9 to 7.7, but unfortunately after upgrading the first node, that
node gets rejected due to the following error:

[2020-08-22 17:43:00.240990] E [MSGID: 106012] 
[glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvolume differ. local cksum = 3013120651, remote 
 cksum = 0 on peer myfirstnode.domain.tld

So glusterd process is running but not glusterfsd.

I am exactly in the same issue as described here:

https://www.gitmemory.com/Adam2Marsh

But I do not see any solutions or workaround. So now I am stuck with a degraded 
GlusterFS cluster.
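
For what it's worth, the checksum glusterd compares seems to be kept in
per-volume files on each node, so the mismatch can at least be inspected with
something like this on every node (paths are the defaults of a package install,
the volume name is a placeholder and the checksum itself is computed internally
by glusterd):

$ ls -l /var/lib/glusterd/vols/myvolume/quota.conf
$ cat /var/lib/glusterd/vols/myvolume/quota.cksum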

Could someone please advise me as soon as possible on what I should do? Is 
there maybe any workarounds?

Thank you very much in advance for your response.

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-08-24 Thread mabi
Dear Nikhil,

Thank you for your answer. So does this mean that all my FUSE clients where I
have the volume mounted will not lose their connection at any time during the
whole upgrade procedure of all 3 nodes?

I am asking because, if I understand correctly, there will be an overlap of
time where more than one node will not be running the glusterfsd (brick)
process; does this mean that quorum is lost and my FUSE clients will then lose
connection to the volume?
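
For my own notes, my reading of the online upgrade procedure is to upgrade
strictly one node at a time and only continue once that node's bricks are back
and healing has finished, roughly like this per node (volume name is a
placeholder, please correct me if my understanding is wrong):

$ gluster --version                   # confirm the upgraded version on this node
$ gluster volume status myvolume      # all bricks of this node online again
$ gluster volume heal myvolume info   # wait for zero pending entries before the next node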

I just want to be sure that there will not be any downtime.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Monday, August 24, 2020 11:14 AM, Nikhil Ladha  wrote:

> Hello Mabi
>
> You don't need to follow the offline upgrade procedure. Please do follow the 
> online upgrade procedure only. Upgrade the nodes one by one, you will notice 
> the `Peer Rejected` state, after upgrading one node or so, but once all the 
> nodes are upgraded it will be back to `Peer in Cluster(Connected)`. Also, if 
> any of the shd's are not online you can try restarting that node to fix that. 
> I have tried this on my own setup so I am pretty sure, it should work for you 
> as well.
> This is the workaround for the time being so that you are able to upgrade, we 
> are working on the issue to come up with a fix for it ASAP.
>
> And, yes if you face any issues even after upgrading all the nodes to 7.7, 
> you will be able to downgrade in back to 6.9, which I think you have already 
> tried and it works as per your previous mail.
>
> Regards
> Nikhil Ladha



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 3:39 PM, Diego Zuccato  
wrote:

> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)

Yes, I also too often rely on history ;)

> gluster volume remove-brick BigVol replica 2 
> str957-biostq:/srv/arbiters/{00..27}/BigVol force

Thanks for the info; it looks like I was missing the "replica 2" part of the
command.

> gluster peer detach str957-biostq
> gluster peer probe str957-biostq

Do I really need to do a detach and re-probe of the arbiter node? I would like
to avoid that because I have two other volumes with even more files... so that
would mean that I have to remove the arbiter brick of the two other volumes
too...

> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).

I have added an extra 4 GB of RAM just in case.




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
OK, I see, I won't go down the path of disabling quota.

I could now remove the arbiter brick of my volume which has the quota issue, so
it is now a simple 2-node replica with 1 brick per node.

Now I would like to add the brick back but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in 
Cluster' state

In fact I checked and the arbiter node is still rejected as you can see here:

State: Peer Rejected (Connected)

On the arbiter node glusted.log file I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume woelkli-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still complains
about the checksum of the quota configuration. I also tried to restart glusterd
on my arbiter node but it does not help. The peer is still rejected.

What should I do at this stage?
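
For the record, the generic recovery for a rejected peer that I found in the
Gluster troubleshooting documentation looks roughly like this on the rejected
node; given the quota checksum problem above I am hesitant to run it blindly,
so please treat it only as a sketch (the host name is a placeholder):

$ systemctl stop glusterd
$ cd /var/lib/glusterd && ls | grep -v glusterd.info | xargs rm -rf   # keep only glusterd.info (node UUID)
$ systemctl start glusterd
$ gluster peer probe node1.domain.tld
$ systemctl restart glusterd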


‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 6:06 PM, Strahil Nikolov  
wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and reenabling quota 
> on a volume with millions of files will take a lot of time and lots of IOPS. 
> I would leave it as a last resort. 
>
> Also, it was mentioned in the list about the following script that might help 
> you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato
> diego.zucc...@unibo.it wrote:
>
> On 26/10/20 15:09, mabi wrote:
>
> > Right, seen liked that this sounds reasonable. Do you actually remember the 
> > exact command you ran in order to remove the brick? I was thinking this 
> > should be it:
> > gluster volume remove-brick   force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)
>
> gluster volume remove-brick BigVol replica 2
>
> =
>
> str957-biostq:/srv/arbiters/{00..27}/BigVol force
>
> gluster peer detach str957-biostq
>
> ==
>
> gluster peer probe str957-biostq
>
> =
>
> gluster volume add-brick BigVol replica 3 arbiter 1
>
> 
>
> str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching
> arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> > > that uses an iSCSI disk. More than 80% continuous load on both CPUs and 
> > > RAM.
> > > That's quite long I must say and I am in the same case as you, my arbiter 
> > > is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).
>
> -
>
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-27 Thread mabi
First, to answer your question about how this happened: I ran into the issue
simply by rebooting my arbiter node yesterday morning in order to do some
maintenance, which I do on a regular basis and which was never a problem before
GlusterFS 7.8.

I have now removed the arbiter brick from all of my volumes (I have 3 volumes 
and only one volume uses quota). So I was then able to do a "detach" and then a 
"probe" of my arbiter node.

So far so good, so I decided to add back an arbiter brick to one of my smallest
volumes, which does not have quota, but I get the following error message:

$ gluster volume add-brick othervol replica 3 arbiter 1 
arbiternode.domain.tld:/srv/glusterfs/othervol/brick

volume add-brick: failed: Commit failed on arbiternode.domain.tld. Please check 
log file for details.

Checking the glusterd.log file of the arbiter node shows the following:

[2020-10-27 06:25:36.011955] I [MSGID: 106578] 
[glusterd-brick-ops.c:1024:glusterd_op_perform_add_bricks] 0-management: 
replica-count is set 3
[2020-10-27 06:25:36.011988] I [MSGID: 106578] 
[glusterd-brick-ops.c:1029:glusterd_op_perform_add_bricks] 0-management: 
arbiter-count is set 1
[2020-10-27 06:25:36.012017] I [MSGID: 106578] 
[glusterd-brick-ops.c:1033:glusterd_op_perform_add_bricks] 0-management: type 
is set 0, need to change it
[2020-10-27 06:25:36.093551] E [MSGID: 106053] 
[glusterd-utils.c:13790:glusterd_handle_replicate_brick_ops] 0-management: 
Failed to set extended attribute trusted.add-brick : Transport endpoint is not 
connected [Transport endpoint is not connected]
[2020-10-27 06:25:36.104897] E [MSGID: 101042] [compat.c:605:gf_umount_lazy] 
0-management: Lazy unmount of /tmp/mntQQVzyD [Transport endpoint is not 
connected]
[2020-10-27 06:25:36.104973] E [MSGID: 106073] 
[glusterd-brick-ops.c:2051:glusterd_op_add_brick] 0-glusterd: Unable to add 
bricks
[2020-10-27 06:25:36.105001] E [MSGID: 106122] 
[glusterd-mgmt.c:317:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit 
failed.
[2020-10-27 06:25:36.105023] E [MSGID: 106122] 
[glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit 
failed on operation Add brick

After that I tried to restart the glusterd service on my arbiter node, and now
it is again rejected by the other nodes with exactly the same error message as
yesterday regarding the quota checksum being different, as you can see here:

[2020-10-27 06:30:21.729577] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node2.domain.tld
[2020-10-27 06:30:21.731966] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote  
cksum = 66908910 on peer node1.domain.tld

This is really weird because at this stage I have not even tried yet to add the
arbiter brick for my volume which has quota enabled...

After detaching the arbiter node, am I supposed to delete something on the 
arbiter node?

Something is really wrong here and I am stuck in a loop somehow... any help 
would be greatly appreciated.


‐‐‐ Original Message ‐‐‐
On Tuesday, October 27, 2020 1:26 AM, Strahil Nikolov  
wrote:

> You need to fix that "reject" issue before trying anything else.
> Have you tried to "detach" the arbiter and then "probe" it again ?
>
> I have no idea what you did to reach that state - can you provide the details 
> ?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 20:38:38 GMT+2, mabi
> m...@protonmail.ch wrote:
>
> Ok I see I won't go down that path of disabling quota.
>
> I could now remove the arbiter brick of my volume which has the quota issue 
> so it is now a simple 2 nodes replica with 1 brick per node.
>
> Now I would like to add the brick back but I get the following error:
>
> volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in 
> Cluster' state
>
> In fact I checked and the arbiter node is still rejected as you can see here:
>
> State: Peer Rejected (Connected)
>
> On the arbiter node glusted.log file I see the following errors:
>
> [2020-10-26 18:35:05.605124] E [MSGID: 106012] 
> [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums 
> of quota configuration of volume woelkli-private differ. local cksum = 0, 
> remote  cksum = 66908910 on peer node1.domain.tld
> [2020-10-26 18:35:05.617009] E [MSGID: 106012] 
> [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums 
> of quota configuration of volume myvol-private differ. local cksum = 0, 
> remote  cksum = 66908910 on peer node2.domain.tld
>
> So although I have removed the arbiter brick from my v

Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
Dear all,

Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to 7.8, but
now, 1 week after the upgrade, I have rebooted my third node (arbiter node) and
unfortunately the bricks do not want to come up on that node. I get the same
error message as before:

[2020-10-26 06:21:59.726705] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote 
cksum = 66908910 on peer node2.domain
[2020-10-26 06:21:59.726871] I [MSGID: 106493] 
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded 
to node2.domain (0), ret: 0, op_ret: -1
[2020-10-26 06:21:59.728164] I [MSGID: 106490] 
[glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd: 
Received probe from uuid: 5f4ccbf4-33f6-4298-8b31-213553223349
[2020-10-26 06:21:59.728969] E [MSGID: 106012] 
[glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of 
quota configuration of volume myvol-private differ. local cksum = 0, remote 
cksum = 66908910 on peer node1.domain
[2020-10-26 06:21:59.729099] I [MSGID: 106493] 
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded 
to node1.domain (0), ret: 0, op_ret: -1

Can someone please advise what I need to do in order to have my arbiter node up 
and running again as soon as possible?

Thank you very much in advance for your help.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Monday, September 7, 2020 5:49 AM, Sanju Rakonde  wrote:

> Hi,
>
> issue https://github.com/gluster/glusterfs/issues/1332 is fixed now with 
> https://github.com/gluster/glusterfs/commit/865cca1190e233381f975ff36118f46e29477dcf.
>
> It will be backported to release-7 and release-8 branches soon.
>
> On Mon, Sep 7, 2020 at 1:14 AM Strahil Nikolov  wrote:
>
>> Your e-mail got in the spam...
>>
>> If you haven't fixed the issue, check Hari's topic about quota issues (based 
>> on the error message you provided) : 
>> https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a
>>
>> Most probably there is a quota issue and you need to fix it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sunday, 23 August 2020 at 11:05:27 GMT+3, mabi
>> wrote:
>>
>> Hello,
>>
>> So to be precise I am exactly having the following issue:
>>
>> https://github.com/gluster/glusterfs/issues/1332
>>
>> I could not wait any longer to find some workarounds or quick fixes so I 
>> decided to downgrade my rejected from 7.7 back to 6.9 which worked.
>>
>> I would be really glad if someone could fix this issue or provide me a 
>> workaround which works because version 6 of GlusterFS is not supported 
>> anymore so I would really like to move on to the stable version 7.
>>
>> Thank you very much in advance.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>>
>> On Saturday, August 22, 2020 7:53 PM, mabi  wrote:
>>
>>> Hello,
>>>
>>> I just started an upgrade of my 3 nodes replica (incl arbiter) of GlusterFS 
>>> from 6.9 to 7.7 but unfortunately after upgrading the first node, that node 
>>> gets rejected due to the following error:
>>>
>>> [2020-08-22 17:43:00.240990] E [MSGID: 106012] 
>>> [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums 
>>> of quota configuration of volume myvolume differ. local cksum = 3013120651, 
>>> remote cksum = 0 on peer myfirstnode.domain.tld
>>>
>>> So glusterd process is running but not glusterfsd.
>>>
>>> I am exactly in the same issue as described here:
>>>
>>> https://www.gitmemory.com/Adam2Marsh
>>>
>>> But I do not see any solutions or workaround. So now I am stuck with a 
>>> degraded GlusterFS cluster.
>>>
>>> Could someone please advise me as soon as possible on what I should do? Is 
>>> there maybe any workarounds?
>>>
>>> Thank you very much in advance for your response.
>>>
>>> Best regards,
>>> Mabi
>>
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Thanks,
> Sanju



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
On Monday, October 26, 2020 11:34 AM, Diego Zuccato  
wrote:

> IIRC it's the same issue I had some time ago.
> I solved it by "degrading" the volume to replica 2, then cleared the
> arbiter bricks and upgraded again to replica 3 arbiter 1.

Thanks Diego for pointing out this workaround. How much data do you have on
that volume in terms of TB and files? Because I have around 3TB of data in 10
million files, I am a bit worried about taking such drastic measures.

How bad was the load on your volume afterwards, when re-adding the arbiter
brick? And how long did it take to sync/heal?

Would another workaround such as turning off quotas on that problematic volume 
work? That sounds much less scary but I don't know if that would work...




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

2020-10-26 Thread mabi
‐‐‐ Original Message ‐‐‐
On Monday, October 26, 2020 2:56 PM, Diego Zuccato  
wrote:

> The volume is built by 26 10TB disks w/ genetic data. I currently don't
> have exact numbers, but it's still at the beginning, so there are a bit
> less than 10TB actually used.
> But you're only removing the arbiters, you always have two copies of
> your files. The worst that can happen is a split brain condition
> (avoidable by requiring a 2-nodes quorum, in that case the worst is that
> the volume goes readonly).

Right, seen like that this sounds reasonable. Do you actually remember the
exact command you ran in order to remove the brick? I was thinking this should
be it:

gluster volume remove-brick   force

but should I use "force" or "start"?

> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

That's quite long I must say, and I am in the same situation as you: my arbiter
is a VM.




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Slow writes on replica+arbiter after upgrade to 7.8 (issue on github)

2020-11-06 Thread mabi
Hello,

I just wanted to give you a heads-up that I have now submitted an issue on
GitHub with the required details, as I suspect this behavior may be related to
a bug introduced in version 7, since I did not have this problem with version 6:

https://github.com/gluster/glusterfs/issues/1764

Thank you in advance for your help.

Regards,
Mabi





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
Hello,

I have a 3-node replica (including arbiter) GlusterFS 7.8 setup with 3 volumes,
and the two data nodes (not the arbiter) seem to have a high load due to the
glusterfsd brick process taking all CPU resources (12 cores).

Checking these two servers with the iostat command shows that the disks are not
so busy and that they are mostly doing write activity. On the FUSE clients there
is not much activity either, so I was wondering how to find out or explain why
GlusterFS is currently generating such a high load on these two servers (the
arbiter does not show any high load). There are no files currently healing
either. This volume is the only volume which has quota enabled, if that might be
a hint. So does anyone know how to see why GlusterFS is so busy on a specific
volume?
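
The only Gluster-side instrumentation I know of is the built-in profiler and
the "top" subcommand, which I plan to try next, roughly like this (volume name
is a placeholder):

$ gluster volume profile myvolume start
$ gluster volume profile myvolume info            # per-brick FOP counts and latencies
$ gluster volume top myvolume write list-cnt 10   # busiest files for writes
$ gluster volume profile myvolume stop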

Here is a sample "vmstat 60" of one of the nodes:

onadmin@gfs1b:~$ vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free   buff  cache   si   so   bi    bo    in     cs us sy id wa st
 9  2      0 22296776  32004 260284    0    0   33   301   153     39  2 60 36  2  0
13  0      0 22244540  32048 260456    0    0  343  2798 10898 367652  2 80 16  1  0
18  0      0 22215740  32056 260672    0    0  308  2524  9892 334537  2 83 14  1  0
18  0      0 22179348  32084 260828    0    0  169  2038  8703 250351  1 88 10  0  0

I already tried rebooting but that did not help and there is nothing special in 
the log files either.

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
Below is the top output of running "top -bHd d" on one of the nodes; maybe that
can help to see what the glusterfsd process is doing?

  PID USER  PR  NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND
 4375 root  20   0 2856784 120492   8360 D 61.1  0.4 117:09.29 glfs_iotwr001
 4385 root  20   0 2856784 120492   8360 R 61.1  0.4 117:12.92 glfs_iotwr003
 4387 root  20   0 2856784 120492   8360 R 61.1  0.4 117:32.19 glfs_iotwr005
 4388 root  20   0 2856784 120492   8360 R 61.1  0.4 117:28.87 glfs_iotwr006
 4391 root  20   0 2856784 120492   8360 D 61.1  0.4 117:20.71 glfs_iotwr008
 4395 root  20   0 2856784 120492   8360 D 61.1  0.4 117:17.22 glfs_iotwr009
 4405 root  20   0 2856784 120492   8360 R 61.1  0.4 117:19.52 glfs_iotwr00d
 4406 root  20   0 2856784 120492   8360 R 61.1  0.4 117:29.51 glfs_iotwr00e
 4366 root  20   0 2856784 120492   8360 D 55.6  0.4 117:27.58 glfs_iotwr000
 4386 root  20   0 2856784 120492   8360 D 55.6  0.4 117:22.77 glfs_iotwr004
 4390 root  20   0 2856784 120492   8360 D 55.6  0.4 117:26.49 glfs_iotwr007
 4396 root  20   0 2856784 120492   8360 R 55.6  0.4 117:23.68 glfs_iotwr00a
 4376 root  20   0 2856784 120492   8360 D 50.0  0.4 117:36.17 glfs_iotwr002
 4397 root  20   0 2856784 120492   8360 D 50.0  0.4 117:11.09 glfs_iotwr00b
 4403 root  20   0 2856784 120492   8360 R 50.0  0.4 117:26.34 glfs_iotwr00c
 4408 root  20   0 2856784 120492   8360 D 50.0  0.4 117:27.47 glfs_iotwr00f
 9814 root  20   0 2043684  75208   8424 D 22.2  0.2  50:15.20 glfs_iotwr003
28131 root  20   0 2043684  75208   8424 R 22.2  0.2  50:07.46 glfs_iotwr004
 2208 root  20   0 2043684  75208   8424 R 22.2  0.2  49:32.70 glfs_iotwr008
 2372 root  20   0 2043684  75208   8424 R 22.2  0.2  49:52.60 glfs_iotwr009
 2375 root  20   0 2043684  75208   8424 D 22.2  0.2  49:54.08 glfs_iotwr00c
  767 root  39  19   0  0  0 R 16.7  0.0  67:50.83 dbuf_evict
 4132 onadmin   20   0   45292   4184   3176 R 16.7  0.0   0:00.04 top
28484 root  20   0 2043684  75208   8424 R 11.1  0.2  49:41.34 glfs_iotwr005
 2376 root  20   0 2043684  75208   8424 R 11.1  0.2  49:49.49 glfs_iotwr00d
 2719 root  20   0 2043684  75208   8424 R 11.1  0.2  49:58.61 glfs_iotwr00e
 4384 root  20   0 2856784 120492   8360 S  5.6  0.4   4:01.27 glfs_rpcrqhnd
 3842 root  20   0 2043684  75208   8424 S  5.6  0.2   0:30.12 glfs_epoll001
1 root  20   0   57696   7340   5248 S  0.0  0.0   0:03.59 systemd
2 root  20   0   0  0  0 S  0.0  0.0   0:09.57 kthreadd
3 root  20   0   0  0  0 S  0.0  0.0   0:00.16 ksoftirqd/0
5 root   0 -20   0  0  0 S  0.0  0.0   0:00.00 kworker/0:0H
7 root  20   0   0  0  0 S  0.0  0.0   0:07.36 rcu_sched
8 root  20   0   0  0  0 S  0.0  0.0   0:00.00 rcu_bh
9 root  rt   0   0  0  0 S  0.0  0.0   0:00.03 migration/0
   10 root   0 -20   0  0  0 S  0.0  0.0   0:00.00 lru-add-drain
   11 root  rt   0   0  0  0 S  0.0  0.0   0:00.01 watchdog/0
   12 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/0
   13 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/1

Any clues anyone?

The load is really high now, around 20, on the two nodes...


‐‐‐ Original Message ‐‐‐
On Thursday, November 5, 2020 11:50 AM, mabi  wrote:

> Hello,
>
> I have a 3 node replica including arbiter GlusterFS 7.8 server with 3 volumes 
> and the two nodes (not arbiter) seem to have a high load due to the 
> glusterfsd brick process taking all CPU resources (12 cores).
>
> Checking these two servers with iostat command shows that the disks are not 
> so busy and that they are mostly doing writes activity. On the FUSE clients 
> there is not so much activity so I was wondering how to find out or explain 
> why GlusterFS is currently generating such a high load on these two servers 
> (the arbiter does not show any high load). There are no files currently 
> healing either. This volume is the only volume which has the quota enabled if 
> this might be a hint. So does anyone know how to see why GlusterFS is so busy 
> on a specific volume?
>
> Here is a sample "vmstat 60" of one of the nodes:
>
> onadmin@gfs1b:~$ vmstat 60
> procs ---memory-- ---swap-- -io -system-- 
> --cpu-
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 9 2 0 22296776 32004 260284 0 0 33 301 153 39 2 60 36 2 0
> 13 0 0 22244540 32048 260456 0 0 343 2798 10898 367652 2 80 16 1 0
> 18 0 0 22215740 32056 260672 0 0 308 2524 9892 334537 2 83 14 1 0
> 18 0 0 22179348 32084 260828 0 0 169 2038 8703 250351 1 88 10 0 0
>
> I already tried rebooting but that did not help and there is nothing special 
> in the log files either.

Re: [Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
‐‐‐ Original Message ‐‐‐
On Thursday, November 5, 2020 3:28 PM, Yaniv Kaul  wrote:

> Waiting for IO, just like the rest of those in D state.
> You may have a slow storage subsystem. How many cores do you have, btw?
> Y.

Strange because "iostat -xtcm 5" does not show that the disks are 100% used, 
I've pasted below a sample output of "iostat -xtcm".

Both nodes have 1 CPU, an Intel Xeon E5-2620 v3 @ 2.40GHz, which has 12 cores.

11/05/2020 03:37:25 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.93 0.00 84.81 0.03 0.00 14.22

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.60 23.80 0.00 0.20 17.31 0.03 1.31 45.33 0.20 1.25 3.04
sdc 0.00 0.00 0.60 24.80 0.00 0.24 19.15 0.03 1.04 40.00 0.10 1.01 2.56
sdg 0.00 0.00 0.60 23.00 0.00 0.22 19.05 0.03 1.39 45.33 0.24 1.25 2.96
sdf 0.00 0.00 0.60 25.00 0.00 0.23 18.25 0.03 1.16 41.33 0.19 1.06 2.72
sdd 0.00 0.00 0.60 24.60 0.00 0.19 15.43 0.02 0.86 32.00 0.10 0.83 2.08
sdh 0.00 0.00 0.40 25.00 0.00 0.22 17.64 0.03 1.10 58.00 0.19 1.01 2.56
sdi 0.00 0.00 0.40 25.80 0.00 0.23 17.71 0.03 1.01 60.00 0.09 0.98 2.56
sdj 0.00 0.00 0.60 24.00 0.00 0.19 15.67 0.02 0.91 32.00 0.13 0.85 2.08
sde 0.00 0.00 0.60 26.60 0.00 0.20 15.12 0.03 1.00 36.00 0.21 0.91 2.48
sdk 0.00 0.00 0.60 25.20 0.00 0.20 16.12 0.02 0.78 29.33 0.10 0.74 1.92
sdl 0.00 0.00 0.60 25.00 0.00 0.22 17.56 0.02 0.94 37.33 0.06 0.94 2.40
sdb 0.00 0.00 0.60 15.40 0.00 0.21 27.80 0.03 2.15 42.67 0.57 1.95 3.12



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Geo replication procedure for DR

2023-06-05 Thread mabi
Hello,

I was reading the geo replication documentation here:

https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/

and I was wondering how it works in case of disaster recovery, when the primary
cluster is down and the secondary site with the volume needs to be used?

What is the procedure here to make the secondary volume on the secondary site 
available for read/write?

And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How to find out data alignment for LVM thin volume brick

2023-06-05 Thread mabi
Hello,

I am preparing a brick as an LVM thin volume for a test slave node using this
documentation:

https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/

but I am confused regarding the right "--dataalignment" option to be used for 
pvcreate. The documentation mentions the following under point 1:

"Create a physical volume(PV) by using the pvcreate command. For example:

pvcreate --dataalignment 128K /dev/sdb

Here, /dev/sdb is a storage device. Use the correct dataalignment option based 
on your device.

Note: The device name and the alignment value will vary based on the device 
you are using."

As a test disk for this brick I have an external USB 500GB SSD from Samsung,
the PSSD T7 (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/),
but my question is: where do I find the information on which alignment value I
need to use for this specific disk?

Best regards,
Mabi 




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo replication procedure for DR

2023-06-07 Thread mabi
Dear Strahil,

Thank you for the detailed command. So once you want to switch all traffic to
the DR site in case of disaster, one should first disable the read-only setting
on the secondary volume on the slave site.
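
So if I understand it correctly, promoting the secondary site boils down to
something like the following (volume and host names are placeholders; the full
name of the option seems to be features.read-only, and stopping the geo-rep
session only applies if the primary is still reachable):

$ gluster volume geo-replication primaryvol secondaryhost::secondaryvol stop   # on the primary, if still up
$ gluster volume set secondaryvol features.read-only off                       # on the secondary site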

What happens afterwards, when the master site is back online? What's the
procedure there? I had the following question in my previous mail in this
regard:

"And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?"

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:52 AM, Strahil Nikolov 
 wrote:

> It's just a setting on the target volume:
>
> gluster volume set  read-only OFF
>
> Best Regards,
> Strahil Nikolov
>
>> On Mon, Jun 5, 2023 at 22:30, mabi
>>  wrote:
>> Hello,
>>
>> I was reading the geo replication documentation here:
>>
>> https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/
>>
>> and I was wondering how it works when in case of disaster recovery when the 
>> primary cluster is down and the the secondary site with the volume needs to 
>> be used?
>>
>> What is the procedure here to make the secondary volume on the secondary 
>> site available for read/write?
>>
>> And once the primary site is back online how do you copy back or sync all 
>> data changes done on the secondary volume on the secondary site back to the 
>> primary volume on the primary site?
>>
>> Best regards,
>> Mabi
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How to find out data alignment for LVM thin volume brick

2023-06-07 Thread mabi
Dear Strahil,

Thank you very much for pointing me to the RedHat documentation. I wasn't aware 
of it and it is much more detailed. I will have to read it carefully.

Now, as I have a single disk (no RAID), based on that documentation I
understand that I should use a data alignment value of 256KB.
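
Just to write down what I plan to run, the full brick preparation sequence from
that documentation would then look roughly like this for my test disk (device
name, volume group name and sizes are placeholders, not my real values):

$ pvcreate --dataalignment 256K /dev/sdb
$ vgcreate vg_bricks /dev/sdb
$ lvcreate --thin vg_bricks/thinpool --size 400G --chunksize 256K --poolmetadatasize 3G --zero n
$ lvcreate --thin vg_bricks/thinpool --virtualsize 400G --name brick1
$ mkfs.xfs -f -i size=512 /dev/vg_bricks/brick1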

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:56 AM, Strahil Nikolov 
 wrote:

> Have you checked this page: 
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/brick_configuration
>  ?
>
> The alignment depends on the HW raid stripe unit size.
>
> Best Regards,
> Strahil Nikolov
>
>> On Tue, Jun 6, 2023 at 2:35, mabi
>>  wrote:
>> Hello,
>>
>> I am preparing a brick as LVM thin volume for a test slave node using this 
>> documentation:
>>
>> https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/
>>
>> but I am confused regarding the right "--dataalignment" option to be used 
>> for pvcreate. The documentation mentions the following under point 1:
>>
>> "Create a physical volume(PV) by using the pvcreate command. For example:
>>
>> pvcreate --dataalignment 128K /dev/sdb
>>
>> Here, /dev/sdb is a storage device. Use the correct dataalignment option 
>> based on your device.
>>
>> Note: The device name and the alignment value will vary based on the device 
>> you are using."
>>
>> As test disk for this brick I have an external USB 500GB SSD disk from 
>> Samsung PSSD T7 
>> (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/) but my 
>> question is where do I find the information on which alignment value I need 
>> to use for this specific disk?
>>
>> Best regards,
>> Mabi
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

