Re: [Gluster-users] Failed to get quota limits
mits for a27818fe-0248-40fe-bb23-d43d61010478
[2018-02-13 08:16:14.082067] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for daf97388-bcec-4cc0-a8ef-5b93f05b30f6
[2018-02-13 08:16:14.086929] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for 3c768b36-2625-4509-87ef-fe5214cb9b01
[2018-02-13 08:16:14.087905] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for f8cf47d4-4f54-43c5-ab0d-75b45b4677a3
[2018-02-13 08:16:14.089788] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for b4c81a39-2152-45c5-95d3-b796d88226fe
[2018-02-13 08:16:14.092919] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for 16ac4cde-a5d4-451f-adcc-422a542fea24
[2018-02-13 08:16:14.092980] I [input.c:31:cli_batch] 0-: Exiting with: 0

*** /var/log/glusterfs/bricks/data-myvolume-brick.log ***

[2018-02-13 08:16:13.948065] I [addr.c:182:gf_auth] 0-/data/myvolume/brick: allowed = "*", received addr = "127.0.0.1"
[2018-02-13 08:16:13.948105] I [login.c:76:gf_auth] 0-auth/login: allowed user names: bea3e634-e174-4bb3-a1d6-25b09d03b536
[2018-02-13 08:16:13.948125] I [MSGID: 115029] [server-handshake.c:695:server_setvolume] 0-myvolume-server: accepted client from gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0 (version: 3.10.7)
[2018-02-13 08:16:14.022257] I [MSGID: 115036] [server.c:559:server_rpc_notify] 0-myvolume-server: disconnecting connection from gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0
[2018-02-13 08:16:14.022465] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-myvolume-server: Shutting down connection gfs1a-14348-2018/02/13-08:16:09:933625-myvolume-client-0-0-0

Original Message
On February 13, 2018 12:47 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> Hi,
>
> Can you provide more information like, the volume configuration, quota.conf
> file and the log
files.
>
> On Sat, Feb 10, 2018 at 1:05 AM, mabi <m...@protonmail.ch> wrote:
>> Hello,
>>
>> I am running GlusterFS 3.10.7 and just noticed by doing a "gluster volume
>> quota list" that my quotas on that volume are broken. The command
>> returns no output and no errors but by looking in /var/log/glusterfs/cli.log I
>> found the following errors:
>>
>> [2018-02-09 19:31:24.242324] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for 3df709ee-641d-46a2-bd61-889583e3033c
>> [2018-02-09 19:31:24.249790] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for a27818fe-0248-40fe-bb23-d43d61010478
>> [2018-02-09 19:31:24.252378] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for daf97388-bcec-4cc0-a8ef-5b93f05b30f6
>> [2018-02-09 19:31:24.256775] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for 3c768b36-2625-4509-87ef-fe5214cb9b01
>> [2018-02-09 19:31:24.257434] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for f8cf47d4-4f54-43c5-ab0d-75b45b4677a3
>> [2018-02-09 19:31:24.259126] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for b4c81a39-2152-45c5-95d3-b796d88226fe
>> [2018-02-09 19:31:24.261664] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] 0-cli: Failed to get quota limits for 16ac4cde-a5d4-451f-adcc-422a542fea24
>> [2018-02-09 19:31:24.261719] I [input.c:31:cli_batch] 0-: Exiting with: 0
>>
>> How can I fix my quota on that volume again? I had around 30 quotas set on
>> different directories of that volume.
>>
>> Thanks in advance.
>>
>> Regards,
>> M.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Regards,
> Hari Gowtham.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
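[Editor's note: the "Failed to get quota limits for <gfid>" lines above each carry the GFID of a failing limit. A small sketch for pulling those GFIDs out of cli.log; the regex is written against the 3.10-era log lines quoted in this thread, and the helper name is ours, not a gluster tool.]

```python
import re

# Matches the cli.log error lines quoted in this thread (GlusterFS 3.10).
PATTERN = re.compile(
    r"\[[0-9: .-]+\] E \[cli-cmd-volume\.c:\d+:cli_cmd_quota_handle_list_all\] "
    r"0-cli: Failed to get quota limits for (?P<gfid>[0-9a-f-]{36})"
)

def failing_gfids(lines):
    """Return the GFIDs that `gluster volume quota <vol> list` failed on."""
    return [m.group("gfid") for line in lines if (m := PATTERN.search(line))]

# Two lines taken verbatim from the thread: one error, one info line.
sample = [
    '[2018-02-09 19:31:24.242324] E [cli-cmd-volume.c:1674:cli_cmd_quota_handle_list_all] '
    '0-cli: Failed to get quota limits for 3df709ee-641d-46a2-bd61-889583e3033c',
    '[2018-02-09 19:31:24.261719] I [input.c:31:cli_batch] 0-: Exiting with: 0',
]
print(failing_gfids(sample))  # -> ['3df709ee-641d-46a2-bd61-889583e3033c']
```

Feeding the whole cli.log through this gives the list of limit entries worth cross-checking against quota.conf.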
Re: [Gluster-users] Failed to get quota limits
Thank you for your answer. This problem seems to have started last week, so should I also send you the same log files for last week? I think logrotate rotates them on a weekly basis.

The only two quota commands we use are the following:

gluster volume quota myvolume limit-usage /directory 10GB
gluster volume quota myvolume list

basically to set a new quota or to list the current quotas. The quota list was working in the past, yes, but we already had a similar issue where the quotas disappeared, in August 2017:

http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html

In the meantime the only thing we did was upgrade from 3.8 to 3.10.

There are actually no errors to be seen using any gluster commands. The "quota myvolume list" simply returns nothing.

In order to look up the directories, should I run a "stat" on them? And if yes, should I do that on a client through the fuse mount?

Original Message
On February 13, 2018 10:58 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> The logs provided are from the 11th; you had seen the issue a while before
> that itself.
>
> The logs help us to know if something has actually gone wrong.
> Once something goes wrong the output might get affected, and I need to know
> what went wrong. Which means I need the logs from the beginning.
>
> And I need to know a few more things:
> Was the quota list command working as expected at the beginning?
> If yes, what were the commands issued before you noticed this problem?
> Is there any other error that you see other than this?
>
> And can you try looking up the directories the limits are set on and
> check if that fixes the error?
>
>> Original Message
>> On February 13, 2018 10:44 AM, mabi m...@protonmail.ch wrote:
>>> Hi Hari,
>>> Sure no problem, I will send you in a minute another mail where you can
>>> download all the relevant log files including the quota.conf binary file.
>>> Let me know if you need anything else. In the meantime here below is the
>>> output of a volume status.
>>> Best regards,
>>> M.
>>>
>>> Status of volume: myvolume
>>> Gluster process                                TCP Port  RDMA Port  Online  Pid
>>> Brick gfs1a.domain.local:/data/myvolume/brick     49153      0        Y    3214
>>> Brick gfs1b.domain.local:/data/myvolume/brick     49154      0        Y    3256
>>> Brick gfs1c.domain.local:/srv/glusterfs/myvolume/brick  49153  0      Y     515
>>> Self-heal Daemon on localhost                      N/A      N/A       Y    3186
>>> Quota Daemon on localhost                          N/A      N/A       Y    3195
>>> Self-heal Daemon on gfs1b.domain.local             N/A      N/A       Y    3217
>>> Quota Daemon on gfs1b.domain.local                 N/A      N/A       Y    3229
>>> Self-heal Daemon on gfs1c.domain.local             N/A      N/A       Y     486
>>> Quota Daemon on gfs1c.domain.local                 N/A      N/A       Y     495
>>>
>>> Task Status of Volume myvolume
>>> There are no active volume tasks
>>>
>>> Original Message
>>> On February 13, 2018 10:09 AM, Hari Gowtham hgowt...@redhat.com wrote:
>>>> Hi,
>>>> A part of the log won't be enough to debug the issue.
>>>> Need the whole log messages till date.
>>>> You can send them as attachments.
>>>> Yes, the quota.conf is a binary file.
>>>> And I need the volume status output too.
>>>> On Tue, Feb 13, 2018 at 1:56 PM, mabi m...@protonmail.ch wrote:
>>>>> Hi Hari,
>>>>> Sorry for not providing you more details from the start. Here below you will
>>>>> find all the relevant log entries and info. Regarding the quota.conf file I
>>>>> have found one for my volume but it is a binary file. Is it supposed to be
>>>>> binary or text?
>>>>> Regards,
>>>>> M.
>>>>>
>>>>> *** gluster volume info myvolume ***
>>>>> Volume Name: myvolume
>>>>> Type: Replicate
>>>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gfs1a.domain.local:/data/myvolume/brick
>>>>> Brick2: gfs1b.domain.local:/data/myvolume/brick
>>>>> Brick3: gfs1c.domain.local:/srv/
Re: [Gluster-users] Failed to get quota limits
I tried to set the limits as you suggested by running the following command:

$ sudo gluster volume quota myvolume limit-usage /directory 200GB
volume quota : success

but then when I list the quotas there is still nothing, so nothing really happened. I also tried to run stat on all directories which have a quota, but nothing happened either. I will send you tomorrow all the other log files as requested.

Original Message
On February 13, 2018 12:20 PM, Hari Gowtham <hgowt...@redhat.com> wrote:

> Were you able to set new limits after seeing this error?
>
> On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com wrote:
>> Yes, I need the log files in that duration; the log files rotated after
>> hitting the issue aren't necessary, but the ones before hitting the issue
>> are needed (not just from when you hit it, the ones from even before you hit it).
>> Yes, you have to do a stat from the client through the fuse mount.
>> On Tue, Feb 13, 2018 at 3:56 PM, mabi m...@protonmail.ch wrote:
>>> Thank you for your answer. This problem seems to have started last
>>> week, so should I also send you the same log files but for last week? I
>>> think logrotate rotates them on a weekly basis.
>>> The only two quota commands we use are the following:
>>> gluster volume quota myvolume limit-usage /directory 10GB
>>> gluster volume quota myvolume list
>>> basically to set a new quota or to list the current quotas. The quota list
>>> was working in the past, yes, but we already had a similar issue where the
>>> quotas disappeared, in August 2017:
>>> http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html
>>> In the meantime the only thing we did was upgrade from 3.8 to 3.10.
>>> There are actually no errors to be seen using any gluster commands. The
>>> "quota myvolume list" simply returns nothing.
>>> In order to look up the directories, should I run a "stat" on them? And if yes,
>>> should I do that on a client through the fuse mount?
>>> Original Message
>>> On February 13, 2018 10:58 AM, Hari Gowtham hgowt...@redhat.com wrote:
>>>> The logs provided are from the 11th; you had seen the issue a while before
>>>> that itself.
>>>> The logs help us to know if something has actually gone wrong.
>>>> Once something goes wrong the output might get affected, and I need to know
>>>> what went wrong. Which means I need the logs from the beginning.
>>>> And I need to know a few more things:
>>>> Was the quota list command working as expected at the beginning?
>>>> If yes, what were the commands issued before you noticed this problem?
>>>> Is there any other error that you see other than this?
>>>> And can you try looking up the directories the limits are set on and
>>>> check if that fixes the error?
>>>>> Original Message
>>>>> On February 13, 2018 10:44 AM, mabi m...@protonmail.ch wrote:
>>>>>> Hi Hari,
>>>>>> Sure no problem, I will send you in a minute another mail where you can
>>>>>> download all the relevant log files including the quota.conf binary
>>>>>> file. Let me know if you need anything else. In the meantime here below
>>>>>> is the output of a volume status.
>>>>>> Best regards,
>>>>>> M.
>>>>>>
>>>>>> Status of volume: myvolume
>>>>>> Gluster process                                TCP Port  RDMA Port  Online  Pid
>>>>>> Brick gfs1a.domain.local:/data/myvolume/brick     49153      0        Y    3214
>>>>>> Brick gfs1b.domain.local:/data/myvolume/brick     49154      0        Y    3256
>>>>>> Brick gfs1c.domain.local:/srv/glusterfs/myvolume/brick  49153  0      Y     515
>>>>>> Self-heal Daemon on localhost                      N/A      N/A       Y    3186
>>>>>> Quota Daemon on localhost                          N/A      N/A       Y    3195
>>>>>> Self-heal Daemon on gfs1b.domain.local             N/A      N/A       Y    3217
>>>>>> Quota Daemon on gfs1b.domain.local                 N/A      N/A       Y    3229
>>>>>> Self-heal Daemon on gfs1c.domain.local             N/A      N/A       Y     486
>>>>>> Quota Daemon on gfs1c.domain.l
Re: [Gluster-users] Failed to get quota limits
Dear Hari,

Thank you for getting back to me after having analysed the problem.

As you suggested, I ran "gluster volume quota myvolume list <path>" for each of my directories which have a quota, and found out that there was one directory quota which was missing (stale), as you can see below:

$ gluster volume quota myvolume list /demo.domain.tld
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---
/demo.domain.tld                            N/A         N/A        8.0MB       N/A            N/A                  N/A

So, as you suggested, I added the quota on that directory again and now "list" finally works again and shows the quotas for every directory as I defined them. That did the trick!

Now, do you know if this bug is already corrected in a newer release of GlusterFS? If not, do you know when it will be fixed?

Again many thanks for your help here!

Best regards,
M.

‐‐‐ Original Message ‐‐‐
On February 23, 2018 7:45 AM, Hari Gowtham <hgowt...@redhat.com> wrote:

> Hi,
>
> There is a bug in 3.10 which doesn't allow the quota list command to
> output, if the last entry in the conf file is a stale entry.
>
> The workaround for this is to remove the stale entry at the end. (If
> the last two entries are stale then both have to be removed, and so on,
> until the last entry in the conf file is a valid entry.)
>
> This can be avoided by adding a new limit. As the new limit you added
> didn't work, there is another way to check this.
>
> Try the quota list command with a specific limit mentioned in the command:
>
> gluster volume quota <volname> list <path>
>
> Make sure this path and the limit are set.
> If this works then you need to clean up the last stale entry.
> If this doesn't work we need to look further.
>
> Thanks Sanoj for the guidance.
>
> On Wed, Feb 14, 2018 at 1:36 AM, mabi m...@protonmail.ch wrote:
>
>> I tried to set the limits as you suggest by running the following command.
>>
>> $ sudo gluster volume quota myvolume limit-usage /directory 200GB
>> volume quota : success
>>
>> but then when I list the quotas there is still nothing, so nothing really
>> happened.
>>
>> I also tried to run stat on all directories which have a quota but nothing
>> happened either.
>>
>> I will send you tomorrow all the other logfiles as requested.
>>
>> Original Message
>> On February 13, 2018 12:20 PM, Hari Gowtham hgowt...@redhat.com wrote:
>>
>>> Were you able to set new limits after seeing this error?
>>>
>>> On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com wrote:
>>>
>>>> Yes, I need the log files in that duration; the log files rotated after
>>>> hitting the issue aren't necessary, but the ones before hitting the issue
>>>> are needed (not just from when you hit it, the ones from even before you hit it).
>>>>
>>>> Yes, you have to do a stat from the client through the fuse mount.
>>>>
>>>> On Tue, Feb 13, 2018 at 3:56 PM, mabi m...@protonmail.ch wrote:
>>>>
>>>>> Thank you for your answer. This problem seems to have started last
>>>>> week, so should I also send you the same log files but for last
>>>>> week? I think logrotate rotates them on a weekly basis.
>>>>>
>>>>> The only two quota commands we use are the following:
>>>>>
>>>>> gluster volume quota myvolume limit-usage /directory 10GB
>>>>> gluster volume quota myvolume list
>>>>>
>>>>> basically to set a new quota or to list the current quotas. The quota
>>>>> list was working in the past, yes, but we already had a similar issue
>>>>> where the quotas disappeared, in August 2017:
>>>>>
>>>>> http://lists.gluster.org/pipermail/gluster-users/2017-August/031946.html
>>>>>
>>>>> In the meantime the only thing we did was upgrade from 3.8 to 3.10.
>>>>>
>>>>> There are actually no errors to be seen using any gluster commands.
>>>>> The "quota myvolume list" returns simply n
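[Editor's note: the workaround above means inspecting the trailing records of the binary quota.conf. A hedged sketch of how such a file can be walked, assuming the v1.2 on-disk layout reported for GlusterFS 3.7+ (a one-line text header followed by fixed 17-byte records: a 16-byte GFID plus a 1-byte limit type); verify the header of your own file before relying on this. The sample bytes here are synthetic, built from GFIDs quoted earlier in the archive.]

```python
import io
import uuid

# Assumed v1.2 layout (GlusterFS >= 3.7); not authoritative.
HEADER = b"GlusterFS Quota conf | version: v1.2\n"
RECORD_SIZE = 17  # 16-byte GFID + 1 type byte (1 = usage, 2 = object)

def read_quota_conf(stream):
    """Yield (gfid_string, type_byte) tuples from a quota.conf-like stream."""
    header = stream.read(len(HEADER))
    if header != HEADER:
        raise ValueError("unexpected quota.conf header: %r" % header)
    while True:
        rec = stream.read(RECORD_SIZE)
        if len(rec) < RECORD_SIZE:
            break
        yield str(uuid.UUID(bytes=rec[:16])), rec[16]

# Synthetic file with two usage-limit records; per the bug described above,
# the LAST record is the one to check for staleness.
gfids = [uuid.UUID("3df709ee-641d-46a2-bd61-889583e3033c"),
         uuid.UUID("16ac4cde-a5d4-451f-adcc-422a542fea24")]
blob = HEADER + b"".join(g.bytes + b"\x01" for g in gfids)
entries = list(read_quota_conf(io.BytesIO(blob)))
print(entries[-1])  # the trailing entry the 3.10 list bug trips over
```

Each GFID printed can then be resolved against the brick's .glusterfs tree to see whether the directory still exists.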
Re: [Gluster-users] glustereventsd not being stopped by systemd script
Hi Aravinda,

Thanks for the info; somehow I wasn't aware of this new service. Now it's clear, and I have updated my documentation.

Best regards,
M.

‐‐‐ Original Message ‐‐‐
On July 30, 2018 5:59 AM, Aravinda Vishwanathapura Krishna Murthy wrote:

> On Mon, Jul 30, 2018 at 1:03 AM mabi wrote:
>
>> Hi,
>>
>> I just noticed that when I run a "systemctl stop glusterfs" on Debian 9 the
>> following glustereventsd processes are still running:
>>
>> root 2471    1 0 22:03 ? 00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root 2489 2471 0 22:03 ? 00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>>
>> Isn't the glusterfs systemd command also supposed to stop these?
>
> glustereventsd is a separate process which can be managed independently of
> glusterd. "systemctl stop glustereventsd" will stop the eventsd service.
>
>> I ran into this while upgrading from 3.12.9 to 3.12.12 and I thought I would
>> mention it in case it has been forgotten.
>>
>> Best regards,
>> M.
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> regards
> Aravinda VK

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] glustereventsd not being stopped by systemd script
Hi,

I just noticed that when I run a "systemctl stop glusterfs" on Debian 9 the following glustereventsd processes are still running:

root 2471    1 0 22:03 ? 00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root 2489 2471 0 22:03 ? 00:00:00 python /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid

Isn't the glusterfs systemd command also supposed to stop these?

I ran into this while upgrading from 3.12.9 to 3.12.12 and I thought I would mention it in case it has been forgotten.

Best regards,
M.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
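[Editor's note: the check above (a `ps` listing for leftover glustereventsd processes after stopping the unit) can be scripted. A small portable stand-in for `pgrep -f` that scans /proc command lines; the helper name is ours, nothing gluster-specific.]

```python
import os

def pids_matching(substring):
    """Return PIDs whose /proc/<pid>/cmdline contains `substring`
    (Linux-only; a rough equivalent of `pgrep -f`)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited between listdir() and open()
        if substring in cmdline:
            pids.append(int(entry))
    return pids

# Needle built by concatenation so this script's own cmdline never matches.
# After `systemctl stop glustereventsd`, this should print an empty list.
print(pids_matching("gluster" + "eventsd"))
```

An empty result after stopping both the glusterd and glustereventsd units confirms nothing was left behind.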
[Gluster-users] blocking process on FUSE mount in directory which is using quota
Hello,

I recently upgraded my GlusterFS replica 2+1 (arbiter) to version 3.12.12, and now I see a weird behaviour on my client (using a FUSE mount) where processes (PHP 5.6 FPM) trying to access a specific directory block. I can't kill the blocked processes either, not even with kill -9; I need to reboot the machine in order to get rid of them.

This directory has one particularity compared to the other directories: it has reached its quota soft-limit, as you can see here in the output of gluster volume quota list:

                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---
/directory                                100.0GB  80%(80.0GB)    90.5GB      9.5GB            Yes                   No

That does not mean that it is the quota's fault, but it might be a hint where to start looking... And by the way, can someone explain to me what the soft-limit does? Or does it not do anything special?

Here is the Linux stack of a blocked process on that directory, which happened with a simple "ls -la":

[Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
[Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 9 14:21:07 2018] ls D 88017ef93200 0 2272 2268 0x0004
[Thu Aug 9 14:21:07 2018] 88017653f490 0286 00013200 880174d7bfd8
[Thu Aug 9 14:21:07 2018] 00013200 88017653f490 8800eeb3d5f0 8800fefac800
[Thu Aug 9 14:21:07 2018] 880174d7bbe0 8800eeb3d6d0 8800fefac800 8800ffe1e1c0
[Thu Aug 9 14:21:07 2018] Call Trace:
[Thu Aug 9 14:21:07 2018] [] ? __fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug 9 14:21:07 2018] [] ? prepare_to_wait_event+0xf0/0xf0
[Thu Aug 9 14:21:07 2018] [] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug 9 14:21:07 2018] [] ? lookup_fast+0x25e/0x2b0
[Thu Aug 9 14:21:07 2018] [] ? path_lookupat+0x155/0x780
[Thu Aug 9 14:21:07 2018] [] ? kmem_cache_alloc+0x75/0x480
[Thu Aug 9 14:21:07 2018] [] ? fuse_getxattr+0xe9/0x150 [fuse]
[Thu Aug 9 14:21:07 2018] [] ? filename_lookup+0x26/0xc0
[Thu Aug 9 14:21:07 2018] [] ? user_path_at_empty+0x54/0x90
[Thu Aug 9 14:21:07 2018] [] ? kmem_cache_free+0xd8/0x210
[Thu Aug 9 14:21:07 2018] [] ? user_path_at_empty+0x5f/0x90
[Thu Aug 9 14:21:07 2018] [] ? vfs_fstatat+0x46/0x90
[Thu Aug 9 14:21:07 2018] [] ? SYSC_newlstat+0x1d/0x40
[Thu Aug 9 14:21:07 2018] [] ? SyS_lgetxattr+0x58/0x80
[Thu Aug 9 14:21:07 2018] [] ? system_call_fast_compare_end+0x10/0x15

My 3 gluster nodes are all Debian 9 and my client is Debian 8.

Let me know if you need more information.

Best regards,
Mabi

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
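[Editor's note on the soft-limit question above: in gluster quotas the soft limit is an alert threshold (crossing it triggers warning logging), while only the hard limit actually blocks writes. The "exceeded?" columns in the quota list output are simple arithmetic on the row's numbers; a sketch recomputing the /directory row (helper name is ours, not a gluster API):]

```python
def quota_flags(hard_gb, soft_pct, used_gb):
    """Recompute the Available and 'exceeded?' columns of
    `gluster volume quota list` from a row's raw numbers.
    soft_pct is the soft limit as a percentage of the hard limit."""
    soft_gb = hard_gb * soft_pct / 100.0
    available_gb = hard_gb - used_gb
    return available_gb, used_gb >= soft_gb, used_gb >= hard_gb

# The /directory row from this thread: 100.0GB hard, 80% soft, 90.5GB used.
avail, soft_exceeded, hard_exceeded = quota_flags(100.0, 80, 90.5)
print(avail, soft_exceeded, hard_exceeded)  # -> 9.5 True False
```

This matches the listed row: 9.5GB available, soft limit exceeded, hard limit not.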
Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota
Hi Nithya,

Thanks for the fast answer. Here is the additional info:

1. gluster volume info

Volume Name: myvol-private
Type: Replicate
Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1a:/data/myvol-private/brick
Brick2: gfs1b:/data/myvol-private/brick
Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
Options Reconfigured:
features.default-soft-limit: 95%
transport.address-family: inet
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 192.168.100.92

2. Sorry, I have no clue how to take a "statedump" of a process on Linux. Which command should I use for that? And which process would you like, the blocked process (for example "ls")?

Regards,
M.

‐‐‐ Original Message ‐‐‐
On August 9, 2018 3:10 PM, Nithya Balachandran wrote:

> Hi,
>
> Please provide the following:
>
> - gluster volume info
> - statedump of the fuse process when it hangs
>
> Thanks,
> Nithya
>
> On 9 August 2018 at 18:24, mabi wrote:
>
>> Hello,
>>
>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12
>> and now I see a weird behaviour on my client (using FUSE mount) where I have
>> processes (PHP 5.6 FPM) trying to access a specific directory and then the
>> process blocks. I can't kill the process either, not even with kill -9. I
>> need to reboot the machine in order to get rid of these blocked processes.
>>
>> This directory has one particularity compared to the other directories it is
>> that it has reached it's quota soft-limit as you can see here in the output
>> of gluster volume quota list:
>>
>> Path Hard-limit Soft-limit Used
>> Available Soft-limit exceeded? Hard-limit exceeded?
>> ---
>> /directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
>>
>> That does not mean that it is the quota's fault but it might be a hint where
>> to start looking for... And by the way can someone explain me what the
>> soft-limit does? or does it not do anything special?
>>
>> Here is an the linux stack of a blocking process on that directory which
>> happened with a simple "ls -la":
>>
>> [Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
>> [Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
>> [Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Aug 9 14:21:07 2018] ls D 88017ef93200 0 2272 2268 0x0004
>> [Thu Aug 9 14:21:07 2018] 88017653f490 0286 00013200 880174d7bfd8
>> [Thu Aug 9 14:21:07 2018] 00013200 88017653f490 8800eeb3d5f0 8800fefac800
>> [Thu Aug 9 14:21:07 2018] 880174d7bbe0 8800eeb3d6d0 8800fefac800 8800ffe1e1c0
>> [Thu Aug 9 14:21:07 2018] Call Trace:
>> [Thu Aug 9 14:21:07 2018] [] ? __fuse_request_send+0xbd/0x270 [fuse]
>> [Thu Aug 9 14:21:07 2018] [] ? prepare_to_wait_event+0xf0/0xf0
>> [Thu Aug 9 14:21:07 2018] [] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
>> [Thu Aug 9 14:21:07 2018] [] ? lookup_fast+0x25e/0x2b0
>> [Thu Aug 9 14:21:07 2018] [] ? path_lookupat+0x155/0x780
>> [Thu Aug 9 14:21:07 2018] [] ? kmem_cache_alloc+0x75/0x480
>> [Thu Aug 9 14:21:07 2018] [] ? fuse_getxattr+0xe9/0x150 [fuse]
>> [Thu Aug 9 14:21:07 2018] [] ? filename_lookup+0x26/0xc0
>> [Thu Aug 9 14:21:07 2018] [] ? user_path_at_empty+0x54/0x90
>> [Thu Aug 9 14:21:07 2018] [] ? kmem_cache_free+0xd8/0x210
>> [Thu Aug 9 14:21:07 2018] [] ? user_path_at_empty+0x5f/0x90
>> [Thu Aug 9 14:21:07 2018] [] ? vfs_fstatat+0x46/0x90
>> [Thu Aug 9 14:21:07 2018] [] ? SYSC_newlstat+0x1d/0x40
>> [Thu Aug 9 14:21:07 2018] [] ? SyS_lgetxattr+0x58/0x80
>> [Thu Aug 9 14:21:07 2018] [] ? system_call_fast_compare_end+0x10/0x15
>>
>> My 3 gluster nodes are all Debian 9 and my client Debian 8.
>>
>> Let me know if you need more information.
>>
>> Best regards,
>> Mabi
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota
Thanks for the documentation. On my client using the FUSE mount I found the PID by using ps (output below):

root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private

Then I ran the following command:

sudo kill -USR1 456

but now I can't find where the files are stored. Are these supposed to be stored on the client directly? I checked /var/run/gluster and /var/log/gluster but could not see anything, and /var/log/gluster does not even exist on the client.

‐‐‐ Original Message ‐‐‐
On August 9, 2018 3:59 PM, Raghavendra Gowdappa wrote:

> On Thu, Aug 9, 2018 at 6:47 PM, mabi wrote:
>
>> Hi Nithya,
>>
>> Thanks for the fast answer. Here the additional info:
>>
>> 1. gluster volume info
>>
>> Volume Name: myvol-private
>> Type: Replicate
>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfs1a:/data/myvol-private/brick
>> Brick2: gfs1b:/data/myvol-private/brick
>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
>> Options Reconfigured:
>> features.default-soft-limit: 95%
>> transport.address-family: inet
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> nfs.disable: on
>> performance.readdir-ahead: on
>> client.event-threads: 4
>> server.event-threads: 4
>> auth.allow: 192.168.100.92
>>
>> 2. Sorry I have no clue how to take a "statedump" of a process on Linux.
>> Which command should I use for that? and which process would you like, the
>> blocked process (for example "ls")?
>
> Statedumps are gluster specific. Please refer to
> https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
>
>> Regards,
>> M.
>>
>> ‐‐‐ Original Message ‐‐‐
>> On August 9, 2018 3:10 PM, Nithya Balachandran wrote:
>>
>>> Hi,
>>>
>>> Please provide the following:
>>>
>>> - gluster volume info
>>> - statedump of the fuse process when it hangs
>>>
>>> Thanks,
>>> Nithya
>>>
>>> On 9 August 2018 at 18:24, mabi wrote:
>>>
>>>> Hello,
>>>>
>>>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12
>>>> and now I see a weird behaviour on my client (using FUSE mount) where I
>>>> have processes (PHP 5.6 FPM) trying to access a specific directory and
>>>> then the process blocks. I can't kill the process either, not even with
>>>> kill -9. I need to reboot the machine in order to get rid of these blocked
>>>> processes.
>>>>
>>>> This directory has one particularity compared to the other directories it
>>>> is that it has reached it's quota soft-limit as you can see here in the
>>>> output of gluster volume quota list:
>>>>
>>>> Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
>>>> ---
>>>> /directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
>>>>
>>>> That does not mean that it is the quota's fault but it might be a hint
>>>> where to start looking for... And by the way can someone explain me what
>>>> the soft-limit does? or does it not do anything special?
>>>>
>>>> Here is an the linux stack of a blocking process on that directory which
>>>> happened with a simple "ls -la":
>>>>
>>>> [Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
>>>> [Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
>>>> [Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Thu Aug 9 14:21:07 2018] ls D 88017ef93200 0 2272 2268 0x0004
>>>> [Thu Aug 9 14:21:07 2018] 88017653f490 0286 00013200 880174d7bfd8
>>>> [Thu Aug 9 14:21:07 2018] 00013200 88017653f490 8800eeb3d5f0 8800fefac800
>>>> [Thu Aug 9 14:21:07 2018] 880174d7bbe0 8800
Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota
As you mentioned, after creating the /var/run/gluster directory I got a statedump file in there.

As a workaround I have now removed the quota for this specific directory, and as it is a production server I currently can't "play" with it by adding the quota back and reproducing the problem, since that requires me to reboot the server with downtime... But I can confirm that by removing the quota from that directory the problem is gone (no more blocked processes such as "ls"), so there must be an issue or bug with the quota part of gluster.

‐‐‐ Original Message ‐‐‐
On August 10, 2018 4:19 PM, Nithya Balachandran wrote:

> On 9 August 2018 at 19:54, mabi wrote:
>
>> Thanks for the documentation. On my client using FUSE mount I found the PID
>> by using ps (output below):
>>
>> root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs
>> --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
>>
>> Then I ran the following command
>>
>> sudo kill -USR1 456
>>
>> but now I can't find where the files are stored. Are these supposed to be
>> stored on the client directly? I checked /var/run/gluster and
>> /var/log/gluster but could not see anything and /var/log/gluster does not
>> even exist on the client.
>
> They are usually created in /var/run/gluster. You will need to create the
> directory on the client if it does not exist.
>
>> ‐‐‐ Original Message ‐‐‐
>> On August 9, 2018 3:59 PM, Raghavendra Gowdappa wrote:
>>
>>> On Thu, Aug 9, 2018 at 6:47 PM, mabi wrote:
>>>
>>>> Hi Nithya,
>>>>
>>>> Thanks for the fast answer. Here the additional info:
>>>>
>>>> 1.
gluster volume info >>>> >>>> Volume Name: myvol-private >>>> Type: Replicate >>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x (2 + 1) = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: gfs1a:/data/myvol-private/brick >>>> Brick2: gfs1b:/data/myvol-private/brick >>>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter) >>>> Options Reconfigured: >>>> features.default-soft-limit: 95% >>>> transport.address-family: inet >>>> features.quota-deem-statfs: on >>>> features.inode-quota: on >>>> features.quota: on >>>> nfs.disable: on >>>> performance.readdir-ahead: on >>>> client.event-threads: 4 >>>> server.event-threads: 4 >>>> auth.allow: 192.168.100.92 >>>> >>>> 2. Sorry I have no clue how to take a "statedump" of a process on Linux. >>>> Which command should I use for that? and which process would you like, the >>>> blocked process (for example "ls")? >>> >>> Statedumps are gluster specific. Please refer to >>> https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions. >>> >>>> Regards, >>>> M. >>>> >>>> ‐‐‐ Original Message ‐‐‐ >>>> On August 9, 2018 3:10 PM, Nithya Balachandran wrote: >>>> >>>>> Hi, >>>>> >>>>> Please provide the following: >>>>> >>>>> - gluster volume info >>>>> - statedump of the fuse process when it hangs >>>>> >>>>> Thanks, >>>>> Nithya >>>>> >>>>> On 9 August 2018 at 18:24, mabi wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I recently upgraded my GlusterFS replica 2+1 (aribter) to version >>>>>> 3.12.12 and now I see a weird behaviour on my client (using FUSE mount) >>>>>> where I have processes (PHP 5.6 FPM) trying to access a specific >>>>>> directory and then the process blocks. I can't kill the process either, >>>>>> not even with kill -9. I need to reboot the machine in order to get rid >>>>>> of these blocked processes. 
>>>>>> >>>>>> This directory has one particularity compared to the other directories >>>>>> it is that it has reached it's quota soft-limit as you can see here in >>>>>> the output of gluster volume quota list: >>>>>> >>>>>> Path Hard-limit Soft-limit >>>>>> Used Available Soft-limi
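The statedump procedure discussed in this thread (find the FUSE client PID, make sure /var/run/gluster exists, send SIGUSR1) can be sketched in a few shell steps. The mount name `myvol-private` and the dump directory come from the messages above; the `find_client_pid` helper is purely illustrative, not a gluster tool, and the demonstration at the end uses a placeholder `sleep` process instead of a real glusterfs client.

```shell
# Illustrative helper: first PID whose full command line matches a pattern
find_client_pid() {
    pgrep -f "$1" | head -n 1
}

# On a real client, per this thread, the steps would be roughly (as root):
#   PID=$(find_client_pid 'glusterfs.*myvol-private')
#   mkdir -p /var/run/gluster        # statedumps land here; create it if missing
#   kill -USR1 "$PID"                # ask the glusterfs process to write a statedump
#   ls /var/run/gluster/glusterdump.*
#
# Demonstration of the PID lookup against a harmless placeholder process:
sleep 300 &
PID=$(find_client_pid 'sleep 300')
[ -n "$PID" ] && echo "found client pid"
kill "$PID"
```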
Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota
Bad news: the blocked process problem happened again, this time with another directory of another user which is NOT over its quota but which also has quota enabled. The symptoms on the Linux side are the same:

[Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 120 seconds.
[Tue Aug 14 15:30:33 2018] Not tainted 3.16.0-4-amd64 #1
[Tue Aug 14 15:30:33 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Aug 14 15:30:33 2018] php5-fpmD 8801fea13200 0 14773 729 0x
[Tue Aug 14 15:30:33 2018] 880100bbe0d0 0282 00013200 880129bcffd8
[Tue Aug 14 15:30:33 2018] 00013200 880100bbe0d0 880153ed0d68 880129bcfee0
[Tue Aug 14 15:30:33 2018] 880153ed0d6c 880100bbe0d0 880153ed0d70
[Tue Aug 14 15:30:33 2018] Call Trace:
[Tue Aug 14 15:30:33 2018] [] ? schedule_preempt_disabled+0x25/0x70
[Tue Aug 14 15:30:33 2018] [] ? __mutex_lock_slowpath+0xd3/0x1d0
[Tue Aug 14 15:30:33 2018] [] ? write_inode_now+0x93/0xc0
[Tue Aug 14 15:30:33 2018] [] ? mutex_lock+0x1b/0x2a
[Tue Aug 14 15:30:33 2018] [] ? fuse_flush+0x8f/0x1e0 [fuse]
[Tue Aug 14 15:30:33 2018] [] ? vfs_read+0x93/0x170
[Tue Aug 14 15:30:33 2018] [] ? filp_close+0x2a/0x70
[Tue Aug 14 15:30:33 2018] [] ? SyS_close+0x1f/0x50
[Tue Aug 14 15:30:33 2018] [] ? system_call_fast_compare_end+0x10/0x15

and if I check this process it has state "D", which is "D = uninterruptible sleep". Now I also managed to take a statedump file as recommended, but in its content under "[io-cache.inode]" I see "path=" entries which I would need to remove as they contain filenames, for privacy reasons. Can I remove every "path=" line and still send you the statedump file for analysis? Thank you. ‐‐‐ Original Message ‐‐‐ On August 14, 2018 10:48 AM, Nithya Balachandran wrote: > Thanks for letting us know. Sanoj, can you take a look at this? > > Thanks. > Nithya > > On 14 August 2018 at 13:58, mabi wrote: > >> As you mentioned after creating the /var/run/gluster directory I got a >> statedump file in there. 
>> >> As a workaround I have now removed the quota for this specific directory and >> as it is a production server I can currently not "play" with it by adding >> the quota back and having the same problem as it requires me to reboot the >> server with downtime... >> >> But I can confirm that by removing the quota from that directory, the >> problem is gone (no more blocking processes such as "ls") so there must be >> an issue or bug with the quota part of gluster. >> >> ‐‐‐ Original Message ‐‐‐ >> On August 10, 2018 4:19 PM, Nithya Balachandran wrote: >> >>> On 9 August 2018 at 19:54, mabi wrote: >>> >>>> Thanks for the documentation. On my client using FUSE mount I found the >>>> PID by using ps (output below): >>>> >>>> root 456 1 4 14:17 ?00:05:15 /usr/sbin/glusterfs >>>> --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private >>>> >>>> Then I ran the following command >>>> >>>> sudo kill -USR1 456 >>>> >>>> but now I can't find where the files are stored. Are these supposed to be >>>> stored on the client directly? I checked /var/run/gluster and >>>> /var/log/gluster but could not see anything and /var/log/gluster does not >>>> even exist on the client. >>> >>> They are usually created in /var/run/gluster. You will need to create the >>> directory on the client if it does not exist. >>> >>>> ‐‐‐ Original Message ‐‐‐ >>>> On August 9, 2018 3:59 PM, Raghavendra Gowdappa >>>> wrote: >>>> >>>>> On Thu, Aug 9, 2018 at 6:47 PM, mabi wrote: >>>>> >>>>>> Hi Nithya, >>>>>> >>>>>> Thanks for the fast answer. Here the additional info: >>>>>> >>>>>> 1. 
gluster volume info >>>>>> >>>>>> Volume Name: myvol-private >>>>>> Type: Replicate >>>>>> Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5 >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 1 x (2 + 1) = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: gfs1a:/data/myvol-private/brick >>>>>> Brick2: gfs1b:/data/myvol-private/brick >>>>>> Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arb
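The privacy concern raised in this thread (filename-bearing "path=" lines in the statedump) can be handled with a one-line sed rewrite before sharing the file. The sketch below uses an invented mock dump fragment for illustration; note that `sed -i` as written assumes GNU sed (BSD sed needs `-i ''`).

```shell
# Build a tiny mock statedump fragment (contents invented for illustration)
cat > /tmp/statedump.sample <<'EOF'
[io-cache.inode]
path=/data/private/user1/secret-file.txt
size=4096
path=/data/private/user2/another-file.txt
EOF

# Replace every path= line with a placeholder, leaving the rest of the dump intact
sed -i 's|^path=.*|path=REMOVED_FOR_PRIVACY|' /tmp/statedump.sample

# Count how many lines were redacted
grep -c 'REMOVED_FOR_PRIVACY' /tmp/statedump.sample   # prints 2
```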
Re: [Gluster-users] Possibly missing two steps in upgrade to 4.1 guide
Oops missed that part at the bottom, thanks Hu Bert! Now the only thing missing from the upgrade guide is what to do about the glustereventsd service during the upgrade. ‐‐‐ Original Message ‐‐‐ On August 21, 2018 4:11 PM, Hu Bert wrote: > I think point 2 is already covered by the guide; see: "Upgrade > procedure for clients" > > Following are the steps to upgrade clients to the 4.1.x version, > > NOTE: x is the minor release number for the release > > > > > Unmount all glusterfs mount points on the client > > Stop all applications that access the volumes via gfapi (qemu, etc.) > Install Gluster 4.1 > > > > > Mount all gluster shares > > Start any applications that were stopped previously in step (2) > > 2018-08-21 15:33 GMT+02:00 mabi m...@protonmail.ch: > > > Hello, > > I just upgraded from 4.0.2 to 4.1.2 using the official documentation: > > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ > > I noticed that this documentation might be missing the following two > > additional steps: > > > > 1. restart the glustereventsd service > > 2. umount and mount again gluster fuse mounts on clients after upgrading > > the clients (if using glusterfs fuse mounts of course) > > > > Best regards, > > M. > > > > Gluster-users mailing list > > Gluster-users@gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Possibly missing two steps in upgrade to 4.1 guide
Funny I also use stretch but from 4.0.2 to 4.1.2 the glustereventsd did not get restarted automatically so I restarted it manually after having finished the upgrade. ‐‐‐ Original Message ‐‐‐ On August 21, 2018 4:20 PM, Hu Bert wrote: > today i tested an upgrade 3.12.12 -> 4.1.2, and the glustereventsd > service was restarted. We use debian stretch; maybe it depends on the > operating system? > > 2018-08-21 16:17 GMT+02:00 mabi m...@protonmail.ch: > > > Oops missed that part at the bottom, thanks Hu Bert! > > Now the only thing missing from the upgrade guide is what to do about the > > glustereventsd service during the upgrade. > > ‐‐‐ Original Message ‐‐‐ > > On August 21, 2018 4:11 PM, Hu Bert revi...@googlemail.com wrote: > > > > > I think point 2 is already covered by the guide; see: "Upgrade > > > procedure for clients" > > > Following are the steps to upgrade clients to the 4.1.x version, > > > NOTE: x is the minor release number for the release > > > > > > > > > Unmount all glusterfs mount points on the client > > > > > > Stop all applications that access the volumes via gfapi (qemu, etc.) > > > Install Gluster 4.1 > > > > > > > > > Mount all gluster shares > > > > > > Start any applications that were stopped previously in step (2) > > > 2018-08-21 15:33 GMT+02:00 mabi m...@protonmail.ch: > > > > > > > Hello, > > > > I just upgraded from 4.0.2 to 4.1.2 using the official documentation: > > > > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ > > > > I noticed that this documentation might be missing the following two > > > > additional steps: > > > > > > > > 1. restart the glustereventsd service > > > > 2. umount and mount again gluster fuse mounts on clients after > > > > upgrading the clients (if using glusterfs fuse mounts of course) > > > > > > > > Best regards, > > > > M. 
> > > > Gluster-users mailing list > > > > Gluster-users@gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Possibly missing two steps in upgrade to 4.1 guide
Hello, I just upgraded from 4.0.2 to 4.1.2 using the official documentation: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ I noticed that this documentation might be missing the following two additional steps: 1) restart the glustereventsd service 2) umount and mount again gluster fuse mounts on clients after upgrading the clients (if using glusterfs fuse mounts of course) Best regards, M.
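The client-side steps from the upgrade guide, plus the remount step proposed in this thread, can be collected into a small runbook. Everything below is a sketch: the mount point, volume name, server hostname, and Debian package name are assumptions taken from this thread, not from the guide itself.

```shell
# Write the runbook to a file so it can be reviewed before running (as root)
cat > /tmp/client-upgrade.sh <<'EOF'
#!/bin/sh
set -e
# 1. Unmount all glusterfs fuse mounts on the client
umount /mnt/myvol-private
# 2. Upgrade the client packages (Debian example; the package name may differ)
apt-get install -y glusterfs-client
# 3. Remount the volume
mount -t glusterfs gfs1a:/myvol-private /mnt/myvol-private
# 4. On the servers, restart glustereventsd if the package did not restart it:
#    systemctl restart glustereventsd
EOF

# Syntax-check the runbook without executing it
sh -n /tmp/client-upgrade.sh && echo "runbook parses"
```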
Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0
Hi Amar, Just wanted to say that I think the quota feature in GlusterFS is really useful. In my case I use it on one volume where I have many cloud installations (mostly files) for different people, and all of these need a different quota set on a specific directory. The GlusterFS quota allows me to manage that nicely, which would not be possible in the application directly. It would really be an overhead for me to, for example, have one volume per installation just to set the maximum size like that. I hope that this feature can continue to exist. Best regards, M. ‐‐‐ Original Message ‐‐‐ On July 19, 2018 8:56 AM, Amar Tumballi wrote: > Hi all, > > Over the last 12 years of Gluster, we have developed many features, and continue > to support most of them till now. But along the way, we have figured out better > methods of doing things. Also we are not actively maintaining some of these > features. > > We are now thinking of cleaning up some of these ‘unsupported’ features, and > marking them as ‘SunSet’ (i.e., to be totally taken out of the codebase in > following releases) in the next upcoming release, v5.0. The release notes will > provide options for smoothly migrating to the supported configurations. > > If you are using any of these features, do let us know, so that we can help > you with ‘migration’. Also, we are happy to guide new developers to work on > those components which are not actively being maintained by the current set of > developers. > > List of features hitting sunset: > > ‘cluster/stripe’ translator: > > This translator was developed very early in the evolution of GlusterFS, and > addressed one of the very common questions of Distributed FS, which is “What > happens if one of my files is bigger than the available brick. Say, I have a 2 > TB hard drive, exported in glusterfs, and my file is 3 TB”. While it solved the > purpose, it was very hard to handle failure scenarios and give a really good > experience to our users with this feature. 
Over time, Gluster solved the > problem with its ‘Shard’ feature, which solves the problem in a much better > way and provides a much better solution on the existing, well-supported stack. > Hence the proposal for Deprecation. > > If you are using this feature, then do write to us, as it needs a proper > migration from the existing volume to a new, fully supported volume type before you > upgrade. > > ‘storage/bd’ translator: > > This feature got into the code base 5 years back with this > [patch](http://review.gluster.org/4809)[1]. The plan was to use a block device > directly as a brick, which would help handle disk-image storage much more > easily in glusterfs. > > As the feature is not getting more contributions, and we are not seeing any > user traction on it, we would like to propose it for Deprecation. > > If you are using the feature, plan to move to a supported gluster volume > configuration, and have your setup ‘supported’ before upgrading to your new > gluster version. > > ‘RDMA’ transport support: > > Gluster started supporting RDMA while ib-verbs was still new, and very > high-end infra around that time was using Infiniband. Engineers did work > with Mellanox, and got the technology into GlusterFS for better data > migration and data copy. Since current day kernels support very good speed with > the IPoIB module itself, and there is no more bandwidth for experts in this > area to maintain the feature, we recommend migrating over to a TCP (IP based) > network for your volume. > > If you are successfully using RDMA transport, do get in touch with us to > prioritize the migration plan for your volume. The plan is to work on this after > the release, so by version 6.0, we will have cleaner transport code, which > just needs to support one type. > > ‘Tiering’ feature > > Gluster’s tiering feature was planned to provide an option to keep > your ‘hot’ data in a different location than your cold data, so one can get > better performance. 
While we saw some users for the feature, it needs much > more attention to be completely bug-free. At this time, we do not have any > active maintainers for the feature, and hence suggest taking it out of > the ‘supported’ tag. > > If you are willing to take it up and maintain it, do let us know, and we are > happy to assist you. > > If you are already using the tiering feature, make sure to do > gluster volume tier detach for all the bricks before upgrading to the next release. > Also, we recommend using features like dm-cache on your LVM setup to get the > best performance from bricks. > > ‘Quota’ > > This is a call-out for the ‘Quota’ feature, to let you all know that it will be in a > ‘no new development’ state. While this feature is ‘actively’ in use by many > people, the challenges in the accounting mechanisms involved have made it > hard to achieve good performance with the feature. Also, the amount of > extended attribute get/set operations while using the
Re: [Gluster-users] blocking process on FUSE mount in directory which is using quota
Hello, I wanted to report that I had a similar issue this morning on another server where a few PHP-FPM processes got blocked on a different GlusterFS volume mounted through a FUSE mount. This GlusterFS volume has no quota enabled, so it might not be quota related after all. Here is the Linux kernel stack trace:

[Sun Sep 2 06:47:47 2018] INFO: task php5-fpm:25880 blocked for more than 120 seconds.
[Sun Sep 2 06:47:47 2018] Not tainted 3.16.0-4-amd64 #1
[Sun Sep 2 06:47:47 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Sep 2 06:47:47 2018] php5-fpmD 88017ee12f40 0 25880 1 0x0004
[Sun Sep 2 06:47:47 2018] 880101688b60 0282 00012f40 880059ca3fd8
[Sun Sep 2 06:47:47 2018] 00012f40 880101688b60 8801093b51b0 8801067ec800
[Sun Sep 2 06:47:47 2018] 880059ca3cc0 8801093b5290 8801093b51b0 880059ca3e80
[Sun Sep 2 06:47:47 2018] Call Trace:
[Sun Sep 2 06:47:47 2018] [] ? __fuse_request_send+0xbd/0x270 [fuse]
[Sun Sep 2 06:47:47 2018] [] ? prepare_to_wait_event+0xf0/0xf0
[Sun Sep 2 06:47:47 2018] [] ? fuse_send_write+0xd0/0x100 [fuse]
[Sun Sep 2 06:47:47 2018] [] ? fuse_perform_write+0x26f/0x4b0 [fuse]
[Sun Sep 2 06:47:47 2018] [] ? fuse_file_write_iter+0x1dd/0x2b0 [fuse]
[Sun Sep 2 06:47:47 2018] [] ? new_sync_write+0x74/0xa0
[Sun Sep 2 06:47:47 2018] [] ? vfs_write+0xb2/0x1f0
[Sun Sep 2 06:47:47 2018] [] ? vfs_read+0xed/0x170
[Sun Sep 2 06:47:47 2018] [] ? SyS_write+0x42/0xa0
[Sun Sep 2 06:47:47 2018] [] ? SyS_lseek+0x7e/0xa0
[Sun Sep 2 06:47:47 2018] [] ? system_call_fast_compare_end+0x10/0x15

Did anyone already have time to have a look at the statedump file I sent around 3 weeks ago? I never saw this type of problem in the past, and it started to appear since I upgraded to GlusterFS 3.12.12. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On August 15, 2018 9:21 AM, mabi wrote: > Great, you will then find attached here the statedump of the client using the > FUSE glusterfs mount right after two processes have blocked. 
> > Two notes here regarding the "path=" in this statedump file: > - I have renamed all the "path=" which has the problematic directory as > "path=PROBLEMATIC_DIRECTORY_HERE > - All the other "path=" I have renamed them to "path=REMOVED_FOR_PRIVACY". > > Note also that funnily enough the number of "path=" for that problematic > directory sums up to exactly 5000 entries. Coincidence or hint to the problem > maybe? > > ‐‐‐ Original Message ‐‐‐ > On August 15, 2018 5:21 AM, Raghavendra Gowdappa wrote: > >> On Tue, Aug 14, 2018 at 7:23 PM, mabi wrote: >> >>> Bad news: the process blocked happened again this time with another >>> directory of another user which is NOT over his quota but which also has >>> quota enabled. >>> >>> The symptoms on the Linux side are the same: >>> >>> [Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than >>> 120 seconds. >>> [Tue Aug 14 15:30:33 2018] Not tainted 3.16.0-4-amd64 #1 >>> [Tue Aug 14 15:30:33 2018] "echo 0 > >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> [Tue Aug 14 15:30:33 2018] php5-fpmD 8801fea13200 0 14773 >>> 729 0x >>> [Tue Aug 14 15:30:33 2018] 880100bbe0d0 0282 >>> 00013200 880129bcffd8 >>> [Tue Aug 14 15:30:33 2018] 00013200 880100bbe0d0 >>> 880153ed0d68 880129bcfee0 >>> [Tue Aug 14 15:30:33 2018] 880153ed0d6c 880100bbe0d0 >>> 880153ed0d70 >>> [Tue Aug 14 15:30:33 2018] Call Trace: >>> [Tue Aug 14 15:30:33 2018] [] ? >>> schedule_preempt_disabled+0x25/0x70 >>> [Tue Aug 14 15:30:33 2018] [] ? >>> __mutex_lock_slowpath+0xd3/0x1d0 >>> [Tue Aug 14 15:30:33 2018] [] ? write_inode_now+0x93/0xc0 >>> [Tue Aug 14 15:30:33 2018] [] ? mutex_lock+0x1b/0x2a >>> [Tue Aug 14 15:30:33 2018] [] ? fuse_flush+0x8f/0x1e0 >>> [fuse] >>> [Tue Aug 14 15:30:33 2018] [] ? vfs_read+0x93/0x170 >>> [Tue Aug 14 15:30:33 2018] [] ? filp_close+0x2a/0x70 >>> [Tue Aug 14 15:30:33 2018] [] ? SyS_close+0x1f/0x50 >>> [Tue Aug 14 15:30:33 2018] [] ? 
>>> system_call_fast_compare_end+0x10/0x15 >>> >>> and if I check this process it has state "D" which is "D = uninterruptible >>> sleep". >>> >>> Now I also managed to take a statedump file as recommended but I see in its >>> content under the "[io-cach
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Dear Ravi, Thank you for your mail and info. This is great news if these patches can make it into 3.12.12. I will then upgrade asap. Could anyone confirm in case these patches do not make it into 3.12.12? Because I would then rather wait for the next release. I was already told on this list that 3.12.9 should have fixed this issue but unfortunately it didn't. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On July 4, 2018 5:41 PM, Ravishankar N wrote: > > > Hi mabi, there are a couple of AFR patches from master that I'm > > currently back porting to the 3.12 branch: > > afr: heal gfids when file is not present on all bricks > > afr: don't update readables if inode refresh failed on all children > > afr: fix bug-1363721.t failure > > afr: add quorum checks in pre-op > > afr: don't treat all cases all bricks being blamed as split-brain > > afr: capture the correct errno in post-op quorum check > > afr: add quorum checks in post-op > > Many of these help make the transaction code more robust by fixing > > various corner cases. It would be great if you can wait for the next > > 3.12 minor release (3.12.12 ?) and upgrade to that build and see if the > > issues go away. > > Note: CC'ing Karthik and Jiffin for their help in reviewing and merging > > the backports for the above patches. > > Thanks, > > Ravi > > On 07/04/2018 06:51 PM, mabi wrote: > > > Hello, > > > > I just wanted to let you know that last week I have upgraded my two replica > > nodes from Debian 8 to Debian 9 so now all my 3 nodes (including aribter) > > are running Debian 9 with a Linux 4 kernel. > > > > Unfortunately I still have the exact same issue. Another detail I might > > have not mentioned yet is that I have quotas enabled on this volume, I > > don't really know if that is relevant but who knows... > > > > As a reminder here is what happens on the client side which has the volume > > mounted via FUSE (take earlier today from the > > /var/log/glusterfs/mnt-myvol-private.log logfile). 
Note here that in this > > specific case it's only one single file who had this issue. > > > > [2018-07-04 08:23:49.314252] E [MSGID: 109089] > > [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: > > failed to open the fd (0x7fccb00a5120, flags=010) on file > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > @ myvol-replicate-0 [Input/output error] > > > > [2018-07-04 08:23:49.328712] W [MSGID: 108027] > > [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read > > subvols for > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > > > [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] > > 0-glusterfs-fuse: 55916791: TRUNCATE() > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > => -1 (Input/output error) > > > > Best regards, > > > > M. > > > > ‐‐‐ Original Message ‐‐‐ > > > > On June 22, 2018 4:44 PM, mabi m...@protonmail.ch wrote: > > > > > Hi, > > > > > > Now that this issue has happened a few times I noticed a few things which > > > might be helpful for debugging: > > > > > > - This problem happens when files are uploaded via a cloud app called > > > Nextcloud where the files are encrypted by the app itself on the server > > > side (PHP code) but only rarely and randomly. > > > > > > - It does not seem to happen with Nextcloud installation which does not > > > have server side encryption enabled. > > > > > > - When this happens both first and second node of the replica have 120k > > > of context switches and 25k interrupts, the arbiter node 30k context > > > switches/20k interrupts. No nodes are overloaded, there is no io/wait and > > > no network issues or disconnections. > > > > > > - All of the problematic files to heal have spaces in one of their > > > sub-directories (might be totally irrelevant). 
> > > > > > If that's of any use my two replica nodes are Debian 8 physical > > > servers with ZFS as file system for the bricks and the arbiter is a > > > Debian 9 virtual machine with XFS as file system for the brick. To mount > > > the volume I use a glusterfs fuse mount on the web server which has > > > Nextcloud running. > &g
Re: [Gluster-users] Release 3.12.12: Scheduled for the 11th of July
Hi Jiffin, Thank you very much for confirming. I will now find a maintenance window and upgrade GlusterFS. I will post back on this thread in case I still see any issues but hopefully it all goes well :-) Cheers, M. ‐‐‐ Original Message ‐‐‐ On July 11, 2018 4:10 PM, Jiffin Tony Thottan wrote: > Hi Mabi, > > I have checked with afr maintainer, all of the required changes is merged in > 3.12. > > Hence moving forward with 3.12.12 release > > Regards, > > Jiffin > > On Monday 09 July 2018 01:04 PM, mabi wrote: > >> Hi Jiffin, >> >> Based on the issues I am encountering on a nearly daily basis (See "New >> 3.12.7 possible split-brain on replica 3" thread in this ML) since now >> already 2-3 months I would be really glad if the required fixes as mentioned >> by Ravi could make it into the 3.12.12 release. Ravi mentioned the following: >> >> afr: heal gfids when file is not present on all bricks >> afr: don't update readables if inode refresh failed on all children >> afr: fix bug-1363721.t failure >> afr: add quorum checks in pre-op >> afr: don't treat all cases all bricks being blamed as split-brain >> afr: capture the correct errno in post-op quorum check >> afr: add quorum checks in post-op >> >> Right now I only see the first one pending in the review dashboard. It would >> be great if all of them could make it into this release. >> >> Best regards, >> Mabi >> >> ‐‐‐ Original Message ‐‐‐ >> On July 9, 2018 7:18 AM, Jiffin Tony Thottan >> [](mailto:jthot...@redhat.com) wrote: >> >>> Hi, >>> >>> It's time to prepare the 3.12.12 release, which falls on the 10th of >>> each month, and hence would be 11-07-2018 this time around. >>> >>> This mail is to call out the following, >>> >>> 1) Are there any pending *blocker* bugs that need to be tracked for >>> 3.12.12? 
If so mark them against the provided tracker [1] as blockers >>> for the release, or at the very least post them as a response to this >>> mail >>> >>> 2) Pending reviews in the 3.12 dashboard will be part of the release, >>> *iff* they pass regressions and have the review votes, so use the >>> dashboard [2] to check on the status of your patches to 3.12 and get >>> these going >>> >>> Thanks, >>> Jiffin >>> >>> [1] Release bug tracker: >>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.12 >>> >>> [2] 3.12 review dashboard: >>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
Re: [Gluster-users] Release 3.12.12: Scheduled for the 11th of July
Hi Jiffin, Based on the issues I am encountering on a nearly daily basis (See "New 3.12.7 possible split-brain on replica 3" thread in this ML) since now already 2-3 months I would be really glad if the required fixes as mentioned by Ravi could make it into the 3.12.12 release. Ravi mentioned the following: afr: heal gfids when file is not present on all bricks afr: don't update readables if inode refresh failed on all children afr: fix bug-1363721.t failure afr: add quorum checks in pre-op afr: don't treat all cases all bricks being blamed as split-brain afr: capture the correct errno in post-op quorum check afr: add quorum checks in post-op Right now I only see the first one pending in the review dashboard. It would be great if all of them could make it into this release. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On July 9, 2018 7:18 AM, Jiffin Tony Thottan wrote: > Hi, > > It's time to prepare the 3.12.12 release, which falls on the 10th of > each month, and hence would be 11-07-2018 this time around. > > This mail is to call out the following, > > 1) Are there any pending *blocker* bugs that need to be tracked for > 3.12.12? If so mark them against the provided tracker [1] as blockers > for the release, or at the very least post them as a response to this > mail > > 2) Pending reviews in the 3.12 dashboard will be part of the release, > *iff* they pass regressions and have the review votes, so use the > dashboard [2] to check on the status of your patches to 3.12 and get > these going > > Thanks, > Jiffin > > [1] Release bug tracker: > https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.12 > > [2] 3.12 review dashboard: > https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hello, I just wanted to let you know that last week I upgraded my two replica nodes from Debian 8 to Debian 9, so now all my 3 nodes (including the arbiter) are running Debian 9 with a Linux 4 kernel. Unfortunately I still have the exact same issue. Another detail I might not have mentioned yet is that I have quotas enabled on this volume; I don't really know if that is relevant, but who knows... As a reminder, here is what happens on the client side which has the volume mounted via FUSE (taken earlier today from the /var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this specific case it's only one single file that had this issue.

[2018-07-04 08:23:49.314252] E [MSGID: 109089] [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: failed to open the fd (0x7fccb00a5120, flags=010) on file /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey @ myvol-replicate-0 [Input/output error]
[2018-07-04 08:23:49.328712] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 55916791: TRUNCATE() /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey => -1 (Input/output error)

Best regards, M. ‐‐‐ Original Message ‐‐‐ On June 22, 2018 4:44 PM, mabi wrote: > > > Hi, > > Now that this issue has happened a few times I noticed a few things which > might be helpful for debugging: > > - This problem happens when files are uploaded via a cloud app called > Nextcloud where the files are encrypted by the app itself on the server side > (PHP code) but only rarely and randomly. > - It does not seem to happen with Nextcloud installation which does not > have server side encryption enabled. 
> - When this happens both first and second node of the replica have 120k of > context switches and 25k interrupts, the arbiter node 30k context > switches/20k interrupts. No nodes are overloaded, there is no io/wait and no > network issues or disconnections. > - All of the problematic files to heal have spaces in one of their > sub-directories (might be totally irrelevant). > > If that's of any use my two replica nodes are Debian 8 physical servers > with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual > machine with XFS as file system for the brick. To mount the volume I use a > glusterfs fuse mount on the web server which has Nextcloud running. > > Regards, > > M. > > ‐‐‐ Original Message ‐‐‐ > > On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote: > > > > Thanks Ravi. Let me know when you have time to have a look. It sort of > > happens around once or twice per week but today it was 24 files in one go > > which are unsynched and where I need to manually reset the xattrs on the > > arbiter node. > > > > By the way on this volume I use quotas which I set on specifc directories, > > I don't know if this is relevant or not but thought I would just mention. > > > > ‐‐‐ Original Message ‐‐‐ > > > > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote: > > > > > On 05/23/2018 12:47 PM, mabi wrote: > > > > > > > Hello, > > > > > > > > I just wanted to ask if you had time to look into this bug I am > > > > encountering and if there is anything else I can do? > > > > > > > > For now in order to get rid of these 3 unsynched files shall I do the > > > > same method that was suggested to me in this thread? > > > > > > Sorry Mabi, I haven't had a chance to dig deeper into this. The > > > > > > workaround of resetting xattrs should be fine though. 
> > > > > > Thanks, > > > > > > Ravi > > > > > > > Thanks, > > > > > > > > Mabi > > > > > > > > ‐‐‐ Original Message ‐‐‐ > > > > > > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: > > > > > > > > > Hi Ravi, > > > > > > > > > > Please fine below the answers to your questions > > > > > > > > > > 1. I have never touched the cluster.quorum-type option. Currently it > > > > > is set as following for this volume: > > > > > > > > > > Option Value > > > > >
Re: [Gluster-users] Failed to get quota limits
Hi, Thanks for the link to the bug. We should be hopefully moving soon onto 3.12 so I guess this bug is also fixed there. Best regards, M. ‐‐‐ Original Message ‐‐‐ On February 27, 2018 9:38 AM, Hari Gowtham <hgowt...@redhat.com> wrote: > > > Hi Mabi, > > The bugs is fixed from 3.11. For 3.10 it is yet to be backported and > > made available. > > The bug is https://bugzilla.redhat.com/show_bug.cgi?id=1418259. > > On Sat, Feb 24, 2018 at 4:05 PM, mabi m...@protonmail.ch wrote: > > > Dear Hari, > > > > Thank you for getting back to me after having analysed the problem. > > > > As you said I tried to run "gluster volume quota list " for > > all of my directories which have a quota and found out that there was one > > directory quota which was missing (stale) as you can see below: > > > > $ gluster volume quota myvolume list /demo.domain.tld > > > > Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit > > exceeded? > > > > > > -- > > > > /demo.domain.tld N/A N/A 8.0MB N/A N/A N/A > > > > So as you suggest I added again the quota on that directory and now the > > "list" finally works again and show the quotas for every directories as I > > defined them. That did the trick! > > > > Now do you know if this bug is already corrected in a new release of > > GlusterFS? if not do you know when it will be fixed? > > > > Again many thanks for your help here! > > > > Best regards, > > > > M. > > > > ‐‐‐ Original Message ‐‐‐ > > > > On February 23, 2018 7:45 AM, Hari Gowtham hgowt...@redhat.com wrote: > > > > > Hi, > > > > > > There is a bug in 3.10 which doesn't allow the quota list command to > > > > > > output, if the last entry on the conf file is a stale entry. > > > > > > The workaround for this is to remove the stale entry at the end. (If > > > > > > the last two entries are stale then both have to be removed and so on > > > > > > until the last entry on the conf file is a valid entry). > > > > > > This can be avoided by adding a new limit. 
As the new limit you added > > > > > > didn't work there is another way to check this. > > > > > > Try quota list command with a specific limit mentioned in the command. > > > > > > gluster volume quota list > > > > > > Make sure this path and the limit are set. > > > > > > If this works then you need to clean up the last stale entry. > > > > > > If this doesn't work we need to look further. > > > > > > Thanks Sanoj for the guidance. > > > > > > On Wed, Feb 14, 2018 at 1:36 AM, mabi m...@protonmail.ch wrote: > > > > > > > I tried to set the limits as you suggest by running the following > > > > command. > > > > > > > > $ sudo gluster volume quota myvolume limit-usage /directory 200GB > > > > > > > > volume quota : success > > > > > > > > but then when I list the quotas there is still nothing, so nothing > > > > really happened. > > > > > > > > I also tried to run stat on all directories which have a quota but > > > > nothing happened either. > > > > > > > > I will send you tomorrow all the other logfiles as requested. > > > > > > > > \-\-\-\-\-\-\-\- Original Message > > > > > > > > On February 13, 2018 12:20 PM, Hari Gowtham hgowt...@redhat.com wrote: > > > > > > > > > Were you able to set new limits after seeing this error? > > > > > > > > > > On Tue, Feb 13, 2018 at 4:19 PM, Hari Gowtham hgowt...@redhat.com > > > > > wrote: > > > > > > > > > > > Yes, I need the log files in that duration, the log rotated file > > > > > > after > > > > > > > > > > > > hitting the > > > > > > > > > > > > issue aren't necessary, but the ones before hitting the issues are > > > > > > needed > > > > > > > > > > > > (not just when you hit it, the ones even before you hit it). > > > > > > > > > > > > Yes, you have to do
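The per-path check Hari describes can be scripted as a dry run. A minimal sketch, assuming the volume is called myvolume and the paths below stand in for your actual quota'd directories — a stale entry is one where a limit was set but "list" prints nothing (or N/A) for it, and re-adding the limit refreshes it:

```shell
VOL=myvolume
# Print one "quota list" command per configured path; review the output
# and pipe it to sh to actually run the checks.
quota_check_cmds() {
    for p in "$@"; do
        echo "gluster volume quota $VOL list $p"
    done
}
quota_check_cmds /demo.domain.tld /userdata
# Once the stale path is found, re-adding its limit refreshes the entry:
# gluster volume quota $VOL limit-usage /demo.domain.tld 10GB
```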
[Gluster-users] Can't stop volume using gluster volume stop
Hello, I can't stop one of my GlusterFS 3.12.7 3-way replica volumes using the standard gluster volume stop command, as you can see below: $ sudo gluster volume stop myvolume Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: myvolume: failed: geo-replication Unable to get the status of active geo-replication session for the volume 'myvolume'. Please check the log file for more info. In the past I had geo-replication running on that volume but because it did not perform well with millions of files I decided to delete it. Somehow it looks like it has not been completely or correctly deleted, else the volume stop command above should have worked. Nevertheless I can't find any traces of the geo-replication still being configured, as you can see below with a geo-replication status command: $ sudo gluster volume geo-replication myvolume geo.domain.tld::myvolume-geo status detail No active geo-replication sessions between myvolume and geo.domain.tld::myvolume-geo Any ideas how I can fix that? Best regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
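A hedged checklist for hunting down leftover geo-replication state in a case like this (the glusterd working-directory path below is the usual default location, and VOL/SLAVE are placeholders for the names in the mail; this is a sketch, not a confirmed fix):

```shell
VOL=myvolume
SLAVE=geo.domain.tld::myvolume-geo
# 1) Leftover session state usually lives under glusterd's working
#    directory; check this on every node of the cluster:
[ -d /var/lib/glusterd/geo-replication ] && ls -l /var/lib/glusterd/geo-replication/
# 2) If a session directory is still present, deleting the half-removed
#    session explicitly should let "volume stop" work again:
if command -v gluster >/dev/null; then
    gluster volume geo-replication "$VOL" "$SLAVE" delete
fi
```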
Re: [Gluster-users] Release 3.12.8: Scheduled for the 12th of April
Dear Jiffin, Would it be possible to have the following backported to 3.12: https://bugzilla.redhat.com/show_bug.cgi?id=1482064 See my mail with subject "New 3.12.7 possible split-brain on replica 3" on the list earlier this week for more details. Thank you very much. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On April 11, 2018 5:16 AM, Jiffin Tony Thottan <jthot...@redhat.com> wrote: > Hi, > > It's time to prepare the 3.12.8 release, which falls on the 10th of > each month, and hence would be 12-04-2018 this time around. > > This mail is to call out the following, > > 1) Are there any pending *blocker* bugs that need to be tracked for > 3.12.7? If so mark them against the provided tracker [1] as blockers > for the release, or at the very least post them as a response to this > mail > > 2) Pending reviews in the 3.12 dashboard will be part of the release, > *iff* they pass regressions and have the review votes, so use the > dashboard [2] to check on the status of your patches to 3.12 and get > these going > > 3) I have made checks on what went into 3.10 post 3.12 release and if > these fixes are already included in 3.12 branch, then status on this is > *green* > as all fixes ported to 3.10, are ported to 3.12 as well. > > @Mlind > > IMO https://review.gluster.org/19659 is like a minor feature to me. Can you > please provide a justification for why it needs to be included in the 3.12 stable > release? > > And please rebase the change as well > > @Raghavendra > > The smoke failed for https://review.gluster.org/#/c/19818/. Can you please check > the same? > > Thanks, > Jiffin > > [1] Release bug tracker: > https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.8 > > [2] 3.12 review dashboard: > https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
Re: [Gluster-users] Release 3.12.8: Scheduled for the 12th of April
Thank you Ravi for your comments. I do understand that it might not be very wise to risk any mistakes by rushing this fix into 3.12.8. In that case I will be more patient and wait for 3.12.9 next month. ‐‐‐ Original Message ‐‐‐ On April 11, 2018 5:09 PM, Ravishankar N <ravishan...@redhat.com> wrote: > Mabi, > > It looks like one of the patches is not a straight forward cherry-pick to the > 3.12 branch. Even though the conflict might be easy to resolve, I don't think > it is a good idea to hurry it for tomorrow. We will definitely have it ready > by the next minor release (or if by chance the release is delayed and the > back port is reviewed and merged before that). Hope that is acceptable. > > -Ravi > > On 04/11/2018 01:11 PM, mabi wrote: > >> Dear Jiffin, >> >> Would it be possible to have the following backported to 3.12: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1482064 >> >> See my mail with subject "New 3.12.7 possible split-brain on replica 3" on >> the list earlier this week for more details. >> >> Thank you very much. >> >> Best regards, >> Mabi >> >> ‐‐‐ Original Message ‐‐‐ >> On April 11, 2018 5:16 AM, Jiffin Tony Thottan >> [<jthot...@redhat.com>](mailto:jthot...@redhat.com) wrote: >> >>> Hi, >>> >>> It's time to prepare the 3.12.8 release, which falls on the 10th of >>> each month, and hence would be 12-04-2018 this time around. >>> >>> This mail is to call out the following, >>> >>> 1) Are there any pending *blocker* bugs that need to be tracked for >>> 3.12.7? 
If so mark them against the provided tracker [1] as blockers >>> for the release, or at the very least post them as a response to this >>> mail >>> >>> 2) Pending reviews in the 3.12 dashboard will be part of the release, >>> *iff* they pass regressions and have the review votes, so use the >>> dashboard [2] to check on the status of your patches to 3.12 and get >>> these going >>> >>> 3) I have made checks on what went into 3.10 post 3.12 release and if >>> these fixes are already included in 3.12 branch, then status on this is >>> *green* >>> as all fixes ported to 3.10, are ported to 3.12 as well. >>> >>> @Mlind >>> >>> IMO https://review.gluster.org/19659 is like a minor feature to me. Can >>> please provide a justification for why it need to include in 3.12 stable >>> release? >>> >>> And please rebase the change as well >>> >>> @Raghavendra >>> >>> The smoke failed for https://review.gluster.org/#/c/19818/. Can please >>> check the same? >>> >>> Thanks, >>> Jiffin >>> >>> [1] Release bug tracker: >>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.8 >>> >>> [2] 3.12 review dashboard: >>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard >> >> ___ >> Gluster-users mailing list >> Gluster-users@gluster.org >> >> http://lists.gluster.org/mailman/listinfo/gluster-users___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] New 3.12.7 possible split-brain on replica 3
Hello, Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) cluster to 3.12.7 and this morning I got a warning that 9 files on one of my volumes are not synced. Indeed, checking that volume with a "volume heal info" shows that the third node (the arbiter node) has 9 files to be healed which are not being healed automatically. All nodes were always online and there was no network interruption so I am wondering if this might not really be a split-brain issue but something else. I found some interesting log entries in the client log file (/var/log/glusterfs/myvol-private.log) which I have included below in this mail. It looks like some renaming has gone wrong because a directory is not empty. For your information I have upgraded my GlusterFS in offline mode and the upgrade went smoothly. What can I do to fix that issue? Best regards, Mabi [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-myvol-private-dht: renaming /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip (hash=myvol-private-replicate-0/cache=) [2018-04-09 06:58:53.692440] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty] [2018-04-09 06:58:53.714129] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote operation failed. Path: (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] [2018-04-09 06:58:53.714161] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote operation failed.
Path: (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] [2018-04-09 06:58:53.715638] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty] [2018-04-09 06:58:53.750372] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156 [2018-04-09 06:58:53.757677] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1 [2018-04-09 06:58:53.775939] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156 [2018-04-09 06:58:53.776237] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c [2018-04-09 06:58:53.781762] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. sources=[2] sinks=0 1 [2018-04-09 06:58:53.796950] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1 [2018-04-09 06:58:53.812682] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c [2018-04-09 06:58:53.879382] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. 
[Input/output error] [2018-04-09 06:58:53.881514] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error] [2018-04-09 06:58:53.890073] W [MSGID: 108027] [afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for (null)
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Here would be also the corresponding log entries on a gluster node brick log file: [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available] [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory] Hope that helps to find out the issue. ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:37 AM, mabi <m...@protonmail.ch> wrote: > > > Hello, > > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) > cluster to 3.12.7 and this morning I got a warning that 9 files on one of my > volumes are not synced. Ineeded checking that volume with a "volume heal > info" shows that the third node (the arbitrer node) has 9 files to be healed > but are not being healed automatically. > > All nodes were always online and there was no network interruption so I am > wondering if this might not really be a split-brain issue but something else. > > I found some interesting log entries on the client log file > (/var/log/glusterfs/myvol-private.log) which I have included below in this > mail. It looks like some renaming has gone wrong because a directory is not > empty. > > For your information I have upgraded my GlusterFS in offline mode and the > upgrade went smoothly. > > What can I do to fix that issue? 
> > Best regards, > > Mabi > > [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] > 0-myvol-private-dht: renaming > /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip > (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => > /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip > (hash=myvol-private-replicate-0/cache=) > > [2018-04-09 06:58:53.692440] W [MSGID: 114031] > [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote > operation failed [Directory not empty] > > [2018-04-09 06:58:53.714129] W [MSGID: 114031] > [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: > remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c > (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] > > [2018-04-09 06:58:53.714161] W [MSGID: 114031] > [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: > remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c > (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] > > [2018-04-09 06:58:53.715638] W [MSGID: 114031] > [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote > operation failed [Directory not empty] > > [2018-04-09 06:58:53.750372] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-myvol-private-replicate-0: performing metadata selfheal on > 1cc6facf-eca5-481c-a905-7a39faa25156 > > [2018-04-09 06:58:53.757677] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. 
> sources=[2] sinks=0 1 > > [2018-04-09 06:58:53.775939] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] > 0-myvol-private-replicate-0: performing entry selfheal on > 1cc6facf-eca5-481c-a905-7a39faa25156 > > [2018-04-09 06:58:53.776237] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-myvol-private-replicate-0: performing metadata selfheal on > 13880e8c-13da-442f-8180-fa40b6f5327c > > [2018-04-09 06:58:53.781762] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. > sources=[2] sinks=0 1 > > [2018-04-09 06:58:53.796950] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] > sinks=0 1 > > [2018-04-09 06:58:53.812682] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] > 0-myvol-private-replicate-0: performing entry selfheal on > 13880e8c-13da-442f-8180-fa40b6f5327c > > [2018-04-09 06:58:53.879382] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: > Failing READ on gfid a4c
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
As I was suggested in the past by this mailing list a now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below: NODE1: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 6822549 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE2: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 6825876 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311775605 +0200 Modify: 2018-04-09 08:58:54.311775605 +0200 Change: 2018-04-09 08:58:54.423774007 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE3: STAT: File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729dInode: 404058268 Links: 2 Access: (0644/-rw-r--r--) Uid: 
(20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-05 16:16:55.292341447 +0200 Modify: 2018-04-05 16:16:55.292341447 +0200 Change: 2018-04-09 08:58:54.428074177 +0200 Birth: - GETFATTR: trusted.afr.dirty=0s trusted.afr.myvol-private-client-0=0sAQAA trusted.afr.myvol-private-client-1=0sAQAA trusted.bit-rot.version=0sAgBavUW2AAGBaA== trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== CLIENT GLUSTER MOUNT: STAT: File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 1eh/30d Inode: 13600685574951729371 Links: 1 Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:49 AM, mabi <m...@protonmail.ch> wrote: > > > Here would be also the corresponding log entries on a gluster node brick log > file: > > [2018-04-09 06:58:47.363536] W [MSGID: 113093] > [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: > removing gfid2path xattr failed on > /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: > key = trusted.gfid2path.2529bb66b56be110 [No data available] > > [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] > 0-myvol-private-posix: opendir failed on > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE > [No such file or directory] > > Hope that helps to find out the issue. 
> > ‐‐‐ Original Message ‐‐‐ > > On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote: > > > Hello, > > > > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) > > cluster to 3.12.7 and this morning I got a warning that 9 files on one of > > my volumes are not synced. Ineeded checking that volume with a "volume heal > > info" shows that the third node (the arbitrer node) has 9 files to be > > healed but are not being healed automatically. > > > > All nodes were always online and there was no network interruption so I am > > wondering if this might not really be a split-brain issue but something > > else. > > > > I found some interesting log entr
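For reference, the trusted.afr.* values that getfattr prints with a "0s" prefix are base64-encoded AFR changelogs: three big-endian 32-bit counters for pending data, metadata and entry operations, where a non-zero counter means the brick blames the named peer. A small decoder sketch (the sample value below is illustrative; the values quoted in the mail above appear truncated by the archive):

```shell
# Decode a "0s"-prefixed getfattr value into a hex string of the raw
# 12-byte AFR changelog (data/metadata/entry pending counters).
decode_afr() {
    printf '%s' "${1#0s}" | base64 -d | od -An -tx1 | tr -d ' \n'
}
# First 4 bytes = 1 -> one pending data operation:
decode_afr 0sAAAAAQAAAAAAAAAA   # -> 000000010000000000000000
```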
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:46 PM, Ravishankar N <ravishan...@redhat.com> wrote: > > > On 04/09/2018 05:09 PM, mabi wrote: > > > Thanks Ravi for your answer. > > > > Stupid question but how do I delete the trusted.afr xattrs on this brick? > > > > And when you say "this brick", do you mean the brick on the arbitrer node > > (node 3 in my case)? > > Sorry I should have been clearer. Yes the brick on the 3rd node. > > `setfattr -x trusted.afr.myvol-private-client-0 > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` > > `setfattr -x trusted.afr.myvol-private-client-1 > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` > > After doing this for all files, run 'gluster volume heal `. > > HTH, > > Ravi > > > ‐‐‐ Original Message ‐‐‐ > > > > On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote: > > > > > On 04/09/2018 04:36 PM, mabi wrote: > > > > > > > As I was suggested in the past by this mailing list a now ran a stat > > > > and getfattr on one of the problematic files on all nodes and at the > > > > end a stat on the fuse mount directly. 
The output is below: > > > > > > > > NODE1: > > > > > > > > STAT: > > > > > > > > File: > > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > > > > > Device: 23h/35d Inode: 6822549 Links: 2 > > > > > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > > > > > Access: 2018-04-09 08:58:54.311556621 +0200 > > > > > > > > Modify: 2018-04-09 08:58:54.311556621 +0200 > > > > > > > > Change: 2018-04-09 08:58:54.423555611 +0200 > > > > > > > > Birth: - > > > > > > > > GETFATTR: > > > > > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > > > > > NODE2: > > > > > > > > STAT: > > > > > > > > File: > > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > > > > > Device: 24h/36d Inode: 6825876 Links: 2 > > > > > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > > > > > Access: 2018-04-09 08:58:54.311775605 +0200 > > > > > > > > Modify: 2018-04-09 08:58:54.311775605 +0200 > > > > > > > > Change: 2018-04-09 08:58:54.423774007 +0200 > > > > > > > > Birth: - > > > > > > > > GETFATTR: > > > > > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > > > > > 
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > > > > > NODE3: > > > > > > > > STAT: > > > > > > > > File: > > > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile > > > > > > > &g
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thanks Ravi for your answer. Stupid question but how do I delete the trusted.afr xattrs on this brick? And when you say "this brick", do you mean the brick on the arbitrer node (node 3 in my case)? ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:24 PM, Ravishankar N <ravishan...@redhat.com> wrote: > > > On 04/09/2018 04:36 PM, mabi wrote: > > > As I was suggested in the past by this mailing list a now ran a stat and > > getfattr on one of the problematic files on all nodes and at the end a stat > > on the fuse mount directly. The output is below: > > > > NODE1: > > > > STAT: > > > > File: > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > Device: 23h/35d Inode: 6822549 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-09 08:58:54.311556621 +0200 > > > > Modify: 2018-04-09 08:58:54.311556621 +0200 > > > > Change: 2018-04-09 08:58:54.423555611 +0200 > > > > Birth: - > > > > GETFATTR: > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > NODE2: > > > > STAT: > > > > File: > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > Device: 24h/36d Inode: 6825876 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-09 08:58:54.311775605 +0200 > > > > Modify: 2018-04-09 08:58:54.311775605 +0200 > > > > Change: 2018-04-09 08:58:54.423774007 +0200 > > > > Birth: - > > > > GETFATTR: > > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > NODE3: > > > > STAT: > > > > File: > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile > > > > Size: 0 Blocks: 8 IO Block: 4096 regular empty file > > > > Device: ca11h/51729d Inode: 404058268 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-05 16:16:55.292341447 +0200 > > > > Modify: 2018-04-05 16:16:55.292341447 +0200 > > > > Change: 2018-04-09 08:58:54.428074177 +0200 > > > > Birth: - > > > > GETFATTR: > > > > trusted.afr.dirty=0s > > > > trusted.afr.myvol-private-client-0=0sAQAA > > > > trusted.afr.myvol-private-client-1=0sAQAA > > Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed > > by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and > > launch heal, that should fix it. But the file seems to have no content > > on both data bricks as well, so you might want to check if that was > > expected. 
> > -Ravi > > > trusted.bit-rot.version=0sAgBavUW2AAGBaA== > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > CLIENT GLUSTER MOUNT: > > > > STAT: > > > > File: > > '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' > > > > Size: 0 Blocks: 0 IO Block: 131072 regular empty file > > > > Device: 1eh/30d Inode: 13600685574951729371 Links: 1 > > > > Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) > > > > Access: 2
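Ravi's per-file workaround above can be turned into a dry-run script. A sketch, assuming the arbiter's brick path and the file list shown here are placeholders — review the printed commands, then pipe them to sh as root on the arbiter node:

```shell
BRICK=/srv/glusterfs/myvol-private/brick
# Print the setfattr commands that drop both blame xattrs for each
# problematic file, followed by the heal trigger.
afr_reset_cmds() {
    for f in "$@"; do
        echo "setfattr -x trusted.afr.myvol-private-client-0 $BRICK/$f"
        echo "setfattr -x trusted.afr.myvol-private-client-1 $BRICK/$f"
    done
    echo "gluster volume heal myvol-private"
}
afr_reset_cmds "dir1/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile"
```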
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thanks for clarifying that point. Does this mean that the fix for this bug will get backported to the next 3.12 release? ‐‐‐ Original Message ‐‐‐ On April 9, 2018 2:31 PM, Ravishankar N <ravishan...@redhat.com> wrote: > > > On 04/09/2018 05:54 PM, Dmitry Melekhov wrote: > > > On 09.04.2018 16:18, Ravishankar N wrote: > > > > > On 04/09/2018 05:40 PM, mabi wrote: > > > > > > > Again thanks that worked and I have now no more unsynched files. > > > > > > > > You mentioned that this bug has been fixed in 3.13, would it be > > > > > > > > possible to backport it to 3.12? I am asking because 3.13 is not a > > > > > > > > long-term release and as such I would not like to have to upgrade to > > > > > > > > 3.13. > > > > > > I don't think there will be another 3.12 release. > > > > Why not? It is LTS, right? > > My bad. Just checked the schedule [1], and you are right. It is LTM. > > [1] https://www.gluster.org/release-schedule/ > > > Gluster-users mailing list > > > > Gluster-users@gluster.org > > > > http://lists.gluster.org/mailman/listinfo/gluster-users > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hi, Now that this issue has happened a few times I noticed a few things which might be helpful for debugging: - This problem happens when files are uploaded via a cloud app called Nextcloud where the files are encrypted by the app itself on the server side (PHP code) but only rarely and randomly. - It does not seem to happen with Nextcloud installation which does not have server side encryption enabled. - When this happens both first and second node of the replica have 120k of context switches and 25k interrupts, the arbiter node 30k context switches/20k interrupts. No nodes are overloaded, there is no io/wait and no network issues or disconnections. - All of the problematic files to heal have spaces in one of their sub-directories (might be totally irrelevant). If that's of any use my two replica nodes are Debian 8 physical servers with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual machine with XFS as file system for the brick. To mount the volume I use a glusterfs fuse mount on the web server which has Nextcloud running. Regards, M. ‐‐‐ Original Message ‐‐‐ On May 25, 2018 5:55 PM, mabi wrote: > > > Thanks Ravi. Let me know when you have time to have a look. It sort of > happens around once or twice per week but today it was 24 files in one go > which are unsynched and where I need to manually reset the xattrs on the > arbiter node. > > By the way on this volume I use quotas which I set on specifc directories, I > don't know if this is relevant or not but thought I would just mention. > > ‐‐‐ Original Message ‐‐‐ > > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote: > > > On 05/23/2018 12:47 PM, mabi wrote: > > > > > Hello, > > > > > > I just wanted to ask if you had time to look into this bug I am > > > encountering and if there is anything else I can do? > > > > > > For now in order to get rid of these 3 unsynched files shall I do the > > > same method that was suggested to me in this thread? 
> > > > Sorry Mabi, I haven't had a chance to dig deeper into this. The > > > > workaround of resetting xattrs should be fine though. > > > > Thanks, > > > > Ravi > > > > > Thanks, > > > > > > Mabi > > > > > > ‐‐‐ Original Message ‐‐‐ > > > > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: > > > > > > > Hi Ravi, > > > > > > > > Please find below the answers to your questions: > > > > > > > > 1. I have never touched the cluster.quorum-type option. Currently it > > > > is set as follows for this volume: > > > > > > > > Option Value > > > > > > > > > > > > cluster.quorum-type none > > > > > > > > 2. The .shareKey files are not supposed to be empty. They should be > > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in > > > > a position to say why it is in this specific case only 0 bytes and if > > > > it is the fault of the software (Nextcloud) or GlusterFS. I can just > > > > say here that I have another file server which is a simple NFS server > > > > with another Nextcloud installation and there I never saw any 0 bytes > > > > .shareKey files being created. > > > > > > > > 3. It seems to be quite random and I am not the person who uses the > > > > Nextcloud software so I can't say what it was doing at that specific > > > > time but I guess uploading files or moving files around. Basically I > > > > use GlusterFS to store the files/data of the Nextcloud web application > > > > where I have it mounted using a fuse mount (mount -t glusterfs). > > > > > > > > > > > > Regarding the logs I have attached the mount log file from the client > > > > and below are the relevant log entries from the brick log file of all 3 > > > > nodes. Let me know if you need any other log files. Also if you know > > > > any "log file sanitizer tool" which can replace sensitive file names > > > > with random file names in log files, I would like to use it, as right > > > > now I have to do that manually. 
> > > > > > > > NODE 1 brick log: > > > > > > > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] > > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on > > > > /data/myvol-private/brick/cloud/data/ad
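One of the observations above, that every entry pending heal has a space in one of its sub-directories, can be checked mechanically against the path list printed by `gluster volume heal <volname> info`. The sketch below is plain text filtering over captured output, not a gluster feature; the sample paths and the /tmp file name are invented for illustration:

```shell
#!/bin/bash
# Hypothetical capture of the path entries printed by
# "gluster volume heal <vol> info" (real output also contains
# <gfid:...> entries and Brick:/Status: lines).
cat > /tmp/heal-paths.txt <<'EOF'
/data/john doe/files/report.pdf
/data/alice/files/notes.txt
/data/bob/my project/oc_dir
EOF

# Keep only entries whose directory part (not the final file name)
# contains a space.
grep -E '/[^/]* [^/]*/' /tmp/heal-paths.txt
```

If every pending entry survives the filter, the "spaces in a sub-directory" pattern holds for that batch and is worth reporting alongside the bug.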
[Gluster-users] GlusterFS 4.1.x deb packages missing for Debian 8 (jessie)
Hello, I just upgraded all my Debian 9 (stretch) GlusterFS servers from 3.12.14 to 4.1.5 but unfortunately my GlusterFS clients are all Debian 8 (jessie) machines and there is no GlusterFS 4.1.x package available for Debian 8 as I found out here: https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/ May I kindly ask the GlusterFS packaging team or the person responsible for this task to please also provide the packages for Debian 8? Right now I am running GlusterFS 4.1.5 servers with GlusterFS 3.12.14 clients (FUSE mount). Could this create any problems, or is it unsafe? I did not upgrade the op-version on the server yet. Thank you very much in advance. Best regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] GlusterFS 4.1.x deb packages missing for Debian 8 (jessie)
Anyone? I would really like to be able to install GlusterFS 4.1.x on Debian 8 (jessie). This version of Debian is still widely in use and IMHO there should be a GlusterFS package for it. Many thanks in advance for your consideration. ‐‐‐ Original Message ‐‐‐ On Friday, October 19, 2018 10:58 PM, mabi wrote: > Hello, > > I just upgraded all my Debian 9 (stretch) GlusterFS servers from 3.12.14 to > 4.1.5 but unfortunately my GlusterFS clients are all Debian 8 (jessie) > machines and there is no GlusterFS 4.1.x package available for Debian > 8 as I found out here: > > https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/ > > May I kindly ask the GlusterFS packaging team or the person responsible for > this task to please also provide the packages for Debian 8? > > Right now I am running GlusterFS 4.1.5 servers with GlusterFS 3.12.14 > clients (FUSE mount). Could this create any problems, or is it unsafe? I did > not upgrade the op-version on the server yet. > > Thank you very much in advance. > > Best regards, > Mabi
[Gluster-users] Who is the package maintainer for GlusterFS 4.1?
Hello, I would like to know how I can contact the package maintainer for the GlusterFS 4.1.x packages? I have noticed that Debian 8 (jessie) is missing here: https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/ Thank you very much in advance. Best regards, Mabi
Re: [Gluster-users] Who is the package maintainer for GlusterFS 4.1?
Hi Kaleb, Thank you for your answer. Now I understand why there are no packages for Debian 8. Nevertheless I just need the client package for Debian 8 clients (glusterfs-client + glusterfs-common packages), not the server package. I guess that the client packages do not require golang. If this is the case would it be possible to have just the glusterfs 4.1 client package available for Debian 8? Best regards, M. ‐‐‐ Original Message ‐‐‐ On Monday, October 29, 2018 1:44 PM, Kaleb S. KEITHLEY wrote: > On 10/29/18 6:31 AM, mabi wrote: > > > Hello, > > I would like to know how I can contact the package maintainer for the > > GlusterFS 4.1.x packages? > > I have noticed that Debian 8 (jessie) is missing here: > > https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.5/Debian/ > > Thank you very much in advance. > > Community GlusterFS packages are built by multiple volunteers. > > GlusterFS 4.0, 4.1, and 5.0 packages aren't missing; they have never > been built for Debian 8 jessie. One reason is that jessie doesn't have a > new enough golang compiler (even in backports) to build glusterd2. > > If you want to build packages without glusterd2 for jessie the packaging > files are at https://github.com/gluster/glusterfs-debian. > > The distributions that packages are built for are listed at > https://docs.gluster.org/en/latest/Install-Guide/Community_Packages/ > History for this page is in github at > https://github.com/gluster/glusterdocs/blob/master/docs/Install-Guide/Community_Packages.md > > --- > > Kaleb > > Gluster-users mailing list > Gluster-users@gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Is it possible that the problem I describe below is caused by the following bug, introduced in 4.1.5: Bug 1637953 - data-self-heal in arbiter volume results in stale locks. https://bugzilla.redhat.com/show_bug.cgi?id=1637953 Regards, M. ‐‐‐ Original Message ‐‐‐ On Wednesday, October 31, 2018 11:13 AM, mabi wrote: > Hello, > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and > currently have a volume with around 27174 files which are not being healed. > The "volume heal info" command shows the same 27k files under the first node > and the second node but there is nothing under the 3rd node (arbiter). > > I already tried running a "volume heal" but none of the files got healed. > > In the glfsheal log file for that particular volume the only error I see is a > few of these entries: > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = > 1800 for 127.0.1.1:49152 > > and then a few of these warnings: > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) > [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) > [0x7f2a798e8a84] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > the glustershd.log file shows the following: > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = > 1800 for 127.0.1.1:49152 > [2018-10-31 10:10:52.502502] E [MSGID: 114031] > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: > remote operation failed [Transport endpoint is not connected] > > any idea what could be wrong here? 
> > Regards, > Mabi
Re: [Gluster-users] quota: error returned while attempting to connect to host:(null), port:0
I also noticed the following error in the glusterd.log file today related to quota: [2018-11-01 13:32:06.159865] E [MSGID: 101042] [compat.c:597:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/myvol-private_quota_limit/ [2018-11-01 13:32:06.160251] E [MSGID: 106363] [glusterd-utils.c:12556:glusterd_remove_auxiliary_mount] 0-management: umount on /var/run/gluster/myvol-private_quota_limit/ failed, reason : Success Something must be wrong with the quotas? ‐‐‐ Original Message ‐‐‐ On Tuesday, October 30, 2018 6:24 PM, mabi wrote: > Hello, > > Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I > see quite a lot of the following error message in the brick log file for one > of my volumes where I have quota enabled: > > [2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-myvol-private-quota: error returned while attempting to connect to > host:(null), port:0 > > Is this a bug? Should I file a bug report? Or does anyone know what is wrong > here, maybe with my system? > > Best regards, > Mabi
[Gluster-users] quota: error returned while attempting to connect to host:(null), port:0
Hello, Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I see quite a lot of the following error message in the brick log file for one of my volumes where I have quota enabled: [2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-myvol-private-quota: error returned while attempting to connect to host:(null), port:0 Is this a bug? Should I file a bug report? Or does anyone know what is wrong here, maybe with my system? Best regards, Mabi
Re: [Gluster-users] quota: error returned while attempting to connect to host:(null), port:0
I also noticed in the quotad.log file a lot of the following error messages: The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]" repeated 107 times between [2018-10-31 08:00:27.718645] and [2018-10-31 08:02:04.476875] The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]" repeated 107 times between [2018-10-31 08:00:27.718696] and [2018-10-31 08:02:04.476876] [2018-10-31 08:02:14.629667] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument] [2018-10-31 08:02:14.629746] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument] Maybe this is related... ‐‐‐ Original Message ‐‐‐ On Tuesday, October 30, 2018 6:24 PM, mabi wrote: > Hello, > > Since I upgraded my 3-node (with arbiter) GlusterFS from 3.12.14 to 4.1.5 I > see quite a lot of the following error message in the brick log file for one > of my volumes where I have quota enabled: > > [2018-10-21 05:03:25.158311] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-myvol-private-quota: error returned while attempting to connect to > host:(null), port:0 > > Is this a bug? Should I file a bug report? Or does anyone know what is wrong > here, maybe with my system? > > Best regards, > Mabi
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Hello, I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and currently have a volume with around 27174 files which are not being healed. The "volume heal info" command shows the same 27k files under the first node and the second node but there is nothing under the 3rd node (arbiter). I already tried running a "volume heal" but none of the files got healed. In the glfsheal log file for that particular volume the only error I see is a few of these entries: [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 for 127.0.1.1:49152 and then a few of these warnings: [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] the glustershd.log file shows the following: [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 for 127.0.1.1:49152 [2018-10-31 10:10:52.502502] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: remote operation failed [Transport endpoint is not connected] any idea what could be wrong here? Regards, Mabi
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Ravi, I actually restarted glustershd by unmounting my volume on the clients, stopping and starting the volume on the cluster and re-mounting it on the clients yesterday evening and it managed to get around 1,500 files cleared from the "volume heal info" output. So I am down now to around ~25k more files to heal. While restarting the volume I saw the following log entries in the brick log file: [2018-11-02 17:51:07.078738] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-myvol-private-server: releasing lock on da4f31fb-ac53-4d78-a633-f0046ac3ebcc held by {client=0x7fd48400c160, pid=-6 lk-owner=b0d405e0167f} What also bothers me is that if I run a manual "volume heal" nothing happens except the following log entry in glusterd log: [2018-11-03 06:32:16.033214] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0 That does not seem normal... what do you think? ‐‐‐ Original Message ‐‐‐ On Saturday, November 3, 2018 1:31 AM, Ravishankar N wrote: > Mabi, > > If bug 1637953 is what you are experiencing, then you need to follow the > workarounds mentioned in > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. > Can you see if this works? > > -Ravi > > On 11/02/2018 11:40 PM, mabi wrote: > > > I tried again to manually run a heal by using the "gluster volume heal" > > command because still no files have been healed and noticed the following > > warning in the glusterd.log file: > > [2018-11-02 18:04:19.454702] I [MSGID: 106533] > > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: > > Received heal vol req for volume myvol-private > > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] > > 0-glustershd: error returned while attempting to connect to host:(null), > > port:0 > > It looks like the glustershd can't connect to "host:(null)", could that be > > the reason why there is no healing taking place? if yes why do I see here > > "host:(null)"? and what needs fixing? > > This seems to have happened since I upgraded from 3.12.14 to 4.1.5. > > I really would appreciate some help here, I suspect an issue with > > GlusterFS 4.1.5. > > Thank you in advance for any feedback. > > ‐‐‐ Original Message ‐‐‐ > > On Wednesday, October 31, 2018 11:13 AM, mabi m...@protonmail.ch wrote: > > > > > Hello, > > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and > > > currently have a volume with around 27174 files which are not being > > > healed. The "volume heal info" command shows the same 27k files under the > > > first node and the second node but there is nothing under the 3rd node > > > (arbiter). > > > I already tried running a "volume heal" but none of the files got healed. > > > In the glfsheal log file for that particular volume the only error I see > > > is a few of these entries: > > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] > > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > > > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = > > > 1800 for 127.0.1.1:49152 > > > and then a few of these warnings: > > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] > > > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) > > > [0x7f2a6dff434a] > > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] > > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) > > > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > > the glustershd.log file shows the following: > > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] > > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > > > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. 
timeout > > > = 1800 for 127.0.1.1:49152 > > > [2018-10-31 10:10:52.502502] E [MSGID: 114031] > > > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] > > > 0-myvol-private-client-0: remote operation failed [Transport endpoint is > > > not connected] > > > any idea what could be wrong here? > > > Regards, > > > Mabi > > > > Gluster-users mailing list > > Gluster-users@gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Ravi (or anyone else who can help), I now have even more files which are pending for healing. Here is the output of a "volume heal info summary": Brick node1:/data/myvol-private/brick Status: Connected Total Number of entries: 49845 Number of entries in heal pending: 49845 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick node2:/data/myvol-private/brick Status: Connected Total Number of entries: 26644 Number of entries in heal pending: 26644 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick node3:/srv/glusterfs/myvol-private/brick Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Should I try to set the "cluster.data-self-heal" parameter of that volume to "off" as mentioned in the bug? And by doing that, does it mean that my files pending heal are in danger of being lost? Also is it dangerous to leave "cluster.data-self-heal" to off? ‐‐‐ Original Message ‐‐‐ On Saturday, November 3, 2018 1:31 AM, Ravishankar N wrote: > Mabi, > > If bug 1637953 is what you are experiencing, then you need to follow the > workarounds mentioned in > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. > Can you see if this works? 
> > -Ravi > > On 11/02/2018 11:40 PM, mabi wrote: > > > I tried again to manually run a heal by using the "gluster volume heal" > > command because still no files have been healed and noticed the following > > warning in the glusterd.log file: > > [2018-11-02 18:04:19.454702] I [MSGID: 106533] > > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: > > Received heal vol req for volume myvol-private > > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] > > 0-glustershd: error returned while attempting to connect to host:(null), > > port:0 > > It looks like the glustershd can't connect to "host:(null)", could that be > > the reason why there is no healing taking place? if yes why do I see here > > "host:(null)"? and what needs fixing? > > This seems to have happened since I upgraded from 3.12.14 to 4.1.5. > > I really would appreciate some help here, I suspect an issue with > > GlusterFS 4.1.5. > > Thank you in advance for any feedback. > > ‐‐‐ Original Message ‐‐‐ > > On Wednesday, October 31, 2018 11:13 AM, mabi m...@protonmail.ch wrote: > > > > > Hello, > > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and > > > currently have a volume with around 27174 files which are not being > > > healed. The "volume heal info" command shows the same 27k files under the > > > first node and the second node but there is nothing under the 3rd node > > > (arbiter). > > > I already tried running a "volume heal" but none of the files got healed. > > > In the glfsheal log file for that particular volume the only error I see > > > is a few of these entries: > > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] > > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > > > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. 
timeout = > > > 1800 for 127.0.1.1:49152 > > > and then a few of these warnings: > > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] > > > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) > > > [0x7f2a6dff434a] > > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] > > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) > > > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > > the glustershd.log file shows the following: > > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] > > > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > > > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout > > > = 1800 for 127.0.1.1:49152 > > > [2018-10-31 10:10:52.502502] E [MSGID: 114031] > > > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] > > > 0-myvol-private-client-0: remote operation failed [Transport endpoint is > > > not connected] > > > any idea what could be wrong here? > > > Regards, > > > Mabi > > > > Gluster-users mailing list > > Gluster-users@gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users
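The multi-line `heal info summary` output quoted above can be condensed into one line per brick with a little awk. This is just text processing over captured output, not a gluster feature; the counts below are the ones reported in this thread:

```shell
#!/bin/bash
# "gluster volume heal <vol> info summary" output, as quoted in this thread.
summary='Brick node1:/data/myvol-private/brick
Status: Connected
Total Number of entries: 49845
Number of entries in heal pending: 49845
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node2:/data/myvol-private/brick
Status: Connected
Total Number of entries: 26644
Number of entries in heal pending: 26644
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node3:/srv/glusterfs/myvol-private/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0'

# One line per brick plus a grand total of entries pending heal.
printf '%s\n' "$summary" | awk '
  /^Brick/           { brick = $2 }
  /in heal pending:/ { print brick, "pending=" $NF; total += $NF }
  END                { print "total pending:", total }'
```

A total that keeps growing between runs, or counts that differ between the two data bricks as they do here, is the kind of signal Ravishankar points at elsewhere in this thread (likely client-brick disconnects).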
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Ravi, I did not yet modify the cluster.data-self-heal parameter to off because in the mean time node2 of my cluster had a memory shortage (this node has 32 GB of RAM) and as such I had to reboot it. After that reboot all locks got released and there are no more files left to heal on that volume. So the reboot of node2 did the trick (but this still seems to be a bug). Now on another volume of this same cluster I have a total of 8 files (4 of them being directories) unsynced from node1 and node3 (arbiter) as you can see below: Brick node1:/data/myvol-pro/brick /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir Status: Connected Number of entries: 4 Brick node2:/data/myvol-pro/brick Status: Connected Number of entries: 0 Brick node3:/srv/glusterfs/myvol-pro/brick /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir Status: Connected Number of entries: 4 If I check the "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" directory with an "ls -l" on the client (gluster fuse mount) I get the following garbage: drwxr-xr-x 4 www-data www-data 4096 Nov 5 14:19 . drwxr-xr-x 31 www-data www-data 4096 Nov 5 14:23 .. d? ? ?? ?? le_dir I checked on the nodes and indeed node1 and node3 have the same directory from the time 14:19 but node2 has a directory from the time 14:12. Again here the self-heal daemon doesn't seem to be doing anything... What do you recommend I do in order to heal these unsynced files? ‐‐‐ Original Message ‐‐‐ On Monday, November 5, 2018 2:42 AM, Ravishankar N wrote: > > > On 11/03/2018 04:13 PM, mabi wrote: > > > Ravi (or anyone else who can help), I now have even more files which are > > pending for healing. > > If the count is increasing, there is likely a network (disconnect) > problem between the gluster clients and the bricks that needs fixing. 
> > > Here is the output of a "volume heal info summary": > > Brick node1:/data/myvol-private/brick > > Status: Connected > > Total Number of entries: 49845 > > Number of entries in heal pending: 49845 > > Number of entries in split-brain: 0 > > Number of entries possibly healing: 0 > > Brick node2:/data/myvol-private/brick > > Status: Connected > > Total Number of entries: 26644 > > Number of entries in heal pending: 26644 > > Number of entries in split-brain: 0 > > Number of entries possibly healing: 0 > > Brick node3:/srv/glusterfs/myvol-private/brick > > Status: Connected > > Total Number of entries: 0 > > Number of entries in heal pending: 0 > > Number of entries in split-brain: 0 > > Number of entries possibly healing: 0 > > Should I try to set the "cluster.data-self-heal" parameter of that volume > > to "off" as mentioned in the bug? > > Yes, as mentioned in the workaround in the thread that I shared. > > > And by doing that, does it mean that my files pending heal are in danger of > > being lost? > > No. > > > Also is it dangerous to leave "cluster.data-self-heal" to off? > > No. This is only disabling client side data healing. Self-heal daemon > would still heal the files. > -Ravi > > > ‐‐‐ Original Message ‐‐‐ > > On Saturday, November 3, 2018 1:31 AM, Ravishankar N ravishan...@redhat.com > > wrote: > > > > > Mabi, > > > If bug 1637953 is what you are experiencing, then you need to follow the > > > workarounds mentioned in > > > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. > > > Can you see if this works? 
> > > -Ravi > > > On 11/02/2018 11:40 PM, mabi wrote: > > > > > > > I tried again to manually run a heal by using the "gluster volume heal" > > > > command because still not files have been healed and noticed the > > > > following warning in the glusterd.log file: > > > > [2018-11-02 18:04:19.454702] I [MSGID: 106533] > > > > [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] > > > > 0-management: Received heal vol req for volume myvol-private > > > > [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] > > > > 0-glustershd: error returned while attempting to connect to > > > > host:(null), port:0 > > > > It looks like the glustershd can't connect to "host:(null)", could that > > > > be the reason why there is no healing taking place? if yes why do I see > > > > here "host:(null)"? a
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
I tried again to manually run a heal by using the "gluster volume heal" command because still no files have been healed and noticed the following warning in the glusterd.log file: [2018-11-02 18:04:19.454702] I [MSGID: 106533] [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume myvol-private [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0 It looks like the glustershd can't connect to "host:(null)", could that be the reason why there is no healing taking place? if yes why do I see here "host:(null)"? and what needs fixing? This seems to have happened since I upgraded from 3.12.14 to 4.1.5. I really would appreciate some help here, I suspect an issue with GlusterFS 4.1.5. Thank you in advance for any feedback. ‐‐‐ Original Message ‐‐‐ On Wednesday, October 31, 2018 11:13 AM, mabi wrote: > Hello, > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and > currently have a volume with around 27174 files which are not being healed. > The "volume heal info" command shows the same 27k files under the first node > and the second node but there is nothing under the 3rd node (arbiter). > > I already tried running a "volume heal" but none of the files got healed. > > In the glfsheal log file for that particular volume the only error I see is a > few of these entries: > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. 
timeout = > 1800 for 127.0.1.1:49152 > > and then a few of these warnings: > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) > [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) > [0x7f2a798e8a84] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) > [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > the glustershd.log file shows the following: > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] > 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) > op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = > 1800 for 127.0.1.1:49152 > [2018-10-31 10:10:52.502502] E [MSGID: 114031] > [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: > remote operation failed [Transport endpoint is not connected] > > any idea what could be wrong here? > > Regards, > Mabi
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Thursday, November 8, 2018 11:05 AM, Ravishankar N wrote: > It is not a split-brain. Nodes 1 and 3 have xattrs indicating a pending > entry heal on node2 , so heal must have happened ideally. Can you check > a few things? > - Is there any disconnects between each of the shds and the brick > processes (check via statedump or look for disconnect messages in > glustershd.log). Does restarting shd via a `volume start force` solve > the problem? Yes there is one disconnect at 14:21 (UTC 13:21) because node2 ran out of memory (although it has 32 GB of RAM) and I had to reboot it. Here are the relevant log entries taken from glustershd.log on node1: [2018-11-05 13:21:16.284239] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-myvol-pro-client-1: server 192.168.10.33:49154 has not responded in the last 42 seconds, disconnecting. [2018-11-05 13:21:16.284385] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-myvol-pro-client-1: disconnected from myvol-pro-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2018-11-05 13:21:16.284889] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-myvol-pro-client-1: socket disconnected I also just ran a "volume start force" and saw that the glustershd processes got restarted on all 3 nodes but that did not trigger any healing. There are still the same amount of files/dirs pending heal... > - Is the symlink pointing to oc_dir present inside .glusterfs/25/e2 in > all 3 bricks? Yes for node1 and node3, but on node2 there is no such symlink... I hope that helps to debug the issue further, else please let me know if you need more info
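Regarding the .glusterfs/25/e2 symlink check above: every entry on a brick also has a gfid-indexed link under .glusterfs/<first 2 hex chars>/<next 2>/<canonical gfid> (a hard link for regular files, a symlink into the parent's gfid directory for directories). The brick-internal path can be derived from the raw trusted.gfid hex that getfattr prints; a small sketch using the oc_dir gfid from this thread:

```shell
#!/bin/bash
# Turn a raw trusted.gfid value (as printed by "getfattr -e hex") into the
# brick-internal .glusterfs path where gluster keeps the gfid link.
gfid_path() {
  local hex=${1#0x}
  # Canonical UUID form: 8-4-4-4-12 hex digits.
  local gfid="${hex:0:8}-${hex:8:4}-${hex:12:4}-${hex:16:4}-${hex:20:12}"
  echo ".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
}

gfid_path 0x25e2616b4fb64b2a89451afc956fff19
# -> .glusterfs/25/e2/25e2616b-4fb6-4b2a-8945-1afc956fff19
```

This matches the .glusterfs/25/e2 location asked about above. Note that node2's differing gfid for the same directory (0xd9ac192c...) maps to an entirely different path, consistent with the bricks disagreeing about the entry.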
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Dear Ravi, Thank you for your answer. I will start by first sending you below the getfattr output from the first entry which does not get healed (it is in fact a directory). It is the following path/dir from the output of one of my previous mails: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir # NODE 1 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00030003 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 trusted.glusterfs.dht=0x0001 # NODE 2 trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a trusted.glusterfs.dht=0x0001 # NODE 3 (arbiter) trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00030003 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 trusted.glusterfs.dht=0x0001 Notice here that node 2 does not seem to have any AFR attributes, which must be problematic. Also, that specific directory on my node 2 has the oldest timestamp (14:12), whereas that same directory on nodes 1 and 3 has 14:19 as its timestamp. I did run "volume heal myvol-pro" and on the console it shows: Launching heal operation to perform index self heal on volume myvol-pro has been successful Use heal info commands to check status. but then nothing new has been logged in the glustershd.log file on any of the 3 nodes. The log file cmd_history.log shows: [2018-11-08 07:20:24.481603] : volume heal myvol-pro : SUCCESS and glusterd.log: [2018-11-08 07:20:24.474032] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0 That's it... To me it looks like a split-brain, but GlusterFS does not report it as split-brain, and no self-heal happens on it either. What do you think? Regards, M. ‐‐‐ Original Message ‐‐‐ On Thursday, November 8, 2018 5:00 AM, Ravishankar N wrote: > Can you share the getfattr output of all 4 entries from all 3 bricks? > > Also, can you tailf glustershd.log on all nodes and see if anything is > logged for these entries when you run 'gluster volume heal $volname'? 
> > Regards, > > Ravi > > On 11/07/2018 01:22 PM, mabi wrote: > > > To my eyes this specific case looks like a split-brain scenario but the > > output of "volume info split-brain" does not show any files. Should I still > > use the process for split-brain files as documented in the glusterfs > > documentation? or what do you recommend here? > > ‐‐‐ Original Message ‐‐‐ > > On Monday, November 5, 2018 4:36 PM, mabi m...@protonmail.ch wrote: > > > > > Ravi, I did not yet modify the cluster.data-self-heal parameter to off > > > because in the mean time node2 of my cluster had a memory shortage (this > > > node has 32 GB of RAM) and as such I had to reboot it. After that reboot > > > all locks got released and there are no more files left to heal on that > > > volume. So the reboot of node2 did the trick (but this still seems to be > > > a bug). > > > Now on another volume of this same cluster I have a total of 8 files (4 > > > of them being directories) unsynced from node1 and node3 (arbiter) as you > > > can see below: > > > Brick node1:/data/myvol-pro/brick > > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir > > > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360 > > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir > > > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf > > > Status: Connected > > > Number of entries: 4 > > > Brick node2:/data/myvol-pro/brick > > > Status: Connected > > > Number of entries: 0 > > > Brick node3:/srv/glusterfs/myvol-pro/brick > > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir > > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir > > > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf > > > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360 > > > Status: Connected > > > Number of entries: 4 > > > If I check the > > > "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" with an "ls > > > -l" directory on the client (gluster fuse mount) I get the following > > > garbage: > > > 
drwxr-xr-x 4 www-data www-data 4096 Nov 5 14:19 . > > > drwxr-xr-x 31 www-data www-data 4096 Nov 5 14:23 .. > > > d? ? ? ? ? ? le_dir > > > I checked on the nodes and indeed node1 and node3 have the same directory > > > from the time 14:19 but node2 has a directory from the time 14:12. > > > Again here the self-heal daemon doesn't seem to be doing anything... What > > >
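For reference, the trusted.afr.* values quoted in this thread are AFR changelog xattrs: as I understand the on-disk format, 12 bytes holding three big-endian 32-bit pending-operation counters (data, metadata, entry). The archive truncated the hex values above, so this illustrative decoder uses a made-up full value:

```python
import struct

def parse_afr_xattr(hex_value: str):
    """Split a trusted.afr.<client> changelog xattr (12 bytes) into its
    three big-endian 32-bit pending counters: (data, metadata, entry)."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    if len(raw) != 12:
        raise ValueError("AFR changelog xattrs are 12 bytes long")
    return struct.unpack(">III", raw)

# Hypothetical full value: 3 pending entry operations against the peer.
print(parse_afr_xattr("0x000000000000000000000003"))  # (0, 0, 3)
```

A non-zero entry counter on nodes 1 and 3 pointing at client-1 would match the "pending entry heal on node2" reading given earlier in the thread.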
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Friday, November 9, 2018 2:11 AM, Ravishankar N wrote: > Please re-create the symlink on node 2 to match how it is in the other > nodes and launch heal again. Check if this is the case for other entries > too. > -Ravi I can't create the missing symlink on node2 because the target (../../70/c8/70c894ca-422b-4bce-acf1-5cfb4669abbd/oc_dir) of that link does not exist. So basically the symlink and the target of that symlink are missing. Or shall I create a symlink to a non-existing target? ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Friday, November 9, 2018 2:11 AM, Ravishankar N wrote: > Please re-create the symlink on node 2 to match how it is in the other > nodes and launch heal again. Check if this is the case for other entries > too. > -Ravi Please ignore my previous mail, I was looking for a symlink with the GFID of node1 or node 3 on my node2 whereas I should have been looking with the GFID of node2 of course. I have now found the symlink on node2 pointing to that problematic directory and it looks like this: node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac node2# ls -la | grep d9ac19 lrwxrwxrwx 1 root root 66 Nov 5 14:12 d9ac192c-e85e-4402-af10-5551f587ed9a -> ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir When you say "re-create the symlink", do you mean I should delete the current symlink on node2 (d9ac192c-e85e-4402-af10-5551f587ed9a) and re-create it with the GFID which is used on my node 1 and node 3 like this? node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac node2# rm d9ac192c-e85e-4402-af10-5551f587ed9a node2# cd /data/myvol-pro/brick/.glusterfs/25/e2 node2# ln -s ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir 25e2616b-4fb6-4b2a-8945-1afc956fff19 Just want to make sure I understood you correctly before doing that. Could you please let me know if this is correct? Thanks ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Directory selfheal failed: Unable to form layout for directory on 4.1.5 fuse client
Hi, I just wanted to report that since I upgraded my GlusterFS client from 3.12.14 to 4.1.5 on a Debian 9 client which uses a FUSE mount, I see a lot of these entries for many different directories in the mnt log file on the client: [2018-11-13 21:28:34.626351] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-myvol-pro-dht: Directory selfheal failed: Unable to form layout for directory /data/dir1/dir2 I never saw these info messages in the past. My server is a 3-node replica with arbiter running 4.1.5 on Debian 9. It looks like what I am seeing is this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1567100 Is it possible that the fix for this bug has not yet made it into a release? Or is it maybe a regression? Regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Thursday, November 15, 2018 1:41 PM, Ravishankar N wrote: > Thanks, noted. One more query. Are there files inside each of these > directories? Or is it just empty directories? You will find below the content of each of these 3 directories taken from the brick on node 1: i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10 drwxr-xr-x 4 www-data www-data 4 Nov 5 14:19 . drwxr-xr-x 31 www-data www-data 31 Nov 5 14:23 .. drwxr-xr-x 3 www-data www-data 3 Nov 5 14:19 dir11 drwxr-xr-x 3 www-data www-data 3 Nov 5 14:19 another_dir ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/ drwxr-xr-x 3 www-data www-data 3 Nov 5 14:19 . drwxr-xr-x 4 www-data www-data 4 Nov 5 14:19 .. drwxr-xr-x 2 www-data www-data 4 Nov 5 14:19 oc_dir iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir drwxr-xr-x 2 www-data www-data 4 Nov 5 14:19 . drwxr-xr-x 3 www-data www-data 3 Nov 5 14:19 .. -rw-r--r-- 2 www-data www-data 32 Nov 5 14:19 fileKey -rw-r--r-- 2 www-data www-data 512 Nov 5 14:19 username.shareKey So as you can see from the output above, only the "oc_dir" directory has two files inside. > symlinks are only for dirs. For files, they would be hard links to the > actual files. So if stat > ../brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf gives you > a file, then you can use find -samefile to get the other hardlinks like so: > #cd /brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf > #find /brick -samefile aae4098a-1a71-4155-9cc9-e564b89957cf > > If it is a hardlink, then you can do a getfattr on > /brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf itself. > -Ravi Thank you for explaining this important part. So yes, with your help I could find the filenames associated with these 2 GFIDs and guess what? They are the 2 files listed in the "oc_dir" output above. 
Have a look at this: # find /data/myvol-pro/brick -samefile aae4098a-1a71-4155-9cc9-e564b89957cf /data/myvol-pro/brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf /data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey # find /data/myvol-pro/brick -samefile 3c92459b-8fa1-4669-9a3d-b38b8d41c360 /data/myvol-pro/brick/.glusterfs/3c/92/3c92459b-8fa1-4669-9a3d-b38b8d41c360 /data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey I hope that helps the debug further else let me know if you need anything else. ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
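The -samefile lookup works because, for a regular file, the brick's .glusterfs/<xx>/<yy>/<gfid> entry is a hard link to the file itself, i.e. both paths share one inode. A throwaway sketch of that layout (temporary files only, no Gluster involved; the GFID name is just copied from the thread for illustration):

```python
import os
import tempfile

# Mimic a brick's .glusterfs hard-link layout: the GFID entry and the
# real path share one inode, which is what makes `find -samefile` work.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, ".glusterfs/aa/e4"))
os.makedirs(os.path.join(tmp, "data/oc_dir"))
real = os.path.join(tmp, "data/oc_dir/fileKey")
with open(real, "w") as f:
    f.write("key material\n")
# On a real brick Gluster creates this link itself (name is the GFID).
gfid_entry = os.path.join(tmp, ".glusterfs/aa/e4",
                          "aae4098a-1a71-4155-9cc9-e564b89957cf")
os.link(real, gfid_entry)
print(os.path.samefile(real, gfid_entry))  # True
print(os.stat(real).st_nlink)              # 2
```

The link count of 2 is also why the `-rw-r--r-- 2 ...` listings above show 2 in the links column for fileKey and username.shareKey.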
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
To my eyes this specific case looks like a split-brain scenario but the output of "volume info split-brain" does not show any files. Should I still use the process for split-brain files as documented in the glusterfs documentation? or what do you recommend here? ‐‐‐ Original Message ‐‐‐ On Monday, November 5, 2018 4:36 PM, mabi wrote: > Ravi, I did not yet modify the cluster.data-self-heal parameter to off > because in the mean time node2 of my cluster had a memory shortage (this node > has 32 GB of RAM) and as such I had to reboot it. After that reboot all locks > got released and there are no more files left to heal on that volume. So the > reboot of node2 did the trick (but this still seems to be a bug). > > Now on another volume of this same cluster I have a total of 8 files (4 of > them being directories) unsynced from node1 and node3 (arbiter) as you can > see below: > > Brick node1:/data/myvol-pro/brick > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360 > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf > Status: Connected > Number of entries: 4 > > Brick node2:/data/myvol-pro/brick > Status: Connected > Number of entries: 0 > > Brick node3:/srv/glusterfs/myvol-pro/brick > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/le_dir > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360 > Status: Connected > Number of entries: 4 > > If I check the "/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/" > with an "ls -l" directory on the client (gluster fuse mount) I get the > following garbage: > > drwxr-xr-x 4 www-data www-data 4096 Nov 5 14:19 . > drwxr-xr-x 31 www-data www-data 4096 Nov 5 14:23 .. > d? ? ? ? ? ? 
le_dir > > I checked on the nodes and indeed node1 and node3 have the same directory > from the time 14:19 but node2 has a directory from the time 14:12. > > Again here the self-heal daemon doesn't seem to be doing anything... What do > you recommend me to do in order to heal these unsynced files? > > ‐‐‐ Original Message ‐‐‐ > On Monday, November 5, 2018 2:42 AM, Ravishankar N ravishan...@redhat.com > wrote: > > > On 11/03/2018 04:13 PM, mabi wrote: > > > > > Ravi (or anyone else who can help), I now have even more files which are > > > pending for healing. > > > > If the count is increasing, there is likely a network (disconnect) > > problem between the gluster clients and the bricks that needs fixing. > > > > > Here is the output of a "volume heal info summary": > > > Brick node1:/data/myvol-private/brick > > > Status: Connected > > > Total Number of entries: 49845 > > > Number of entries in heal pending: 49845 > > > Number of entries in split-brain: 0 > > > Number of entries possibly healing: 0 > > > Brick node2:/data/myvol-private/brick > > > Status: Connected > > > Total Number of entries: 26644 > > > Number of entries in heal pending: 26644 > > > Number of entries in split-brain: 0 > > > Number of entries possibly healing: 0 > > > Brick node3:/srv/glusterfs/myvol-private/brick > > > Status: Connected > > > Total Number of entries: 0 > > > Number of entries in heal pending: 0 > > > Number of entries in split-brain: 0 > > > Number of entries possibly healing: 0 > > > Should I try to set the "cluster.data-self-heal" parameter of that volume > > > to "off" as mentioned in the bug? > > > > Yes, as mentioned in the workaround in the thread that I shared. > > > > > And by doing that, does it mean that my files pending heal are in danger > > > of being lost? > > > > No. > > > > > Also is it dangerous to leave "cluster.data-self-heal" to off? > > > > No. This is only disabling client side data healing. Self-heal daemon > > would still heal the files. 
> > -Ravi > > > > > ‐‐‐ Original Message ‐‐‐ > > > On Saturday, November 3, 2018 1:31 AM, Ravishankar N > > > ravishan...@redhat.com wrote: > > > > > > > Mabi, > > > > If bug 1637953 is what you are experiencing, then you need to follow the > > > > workarounds mentioned in > > > > https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. > > > > Can you see if this works? > > > > -Ravi > >
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Wednesday, November 14, 2018 5:34 AM, Ravishankar N wrote: > I thought it was missing which is why I asked you to create it. The > trusted.gfid xattr for any given file or directory must be the same in all 3 > bricks. But it looks like that isn't the case. Are the gfids and the > symlinks for all the dirs leading to the parent dir of oc_dir same on > all nodes? (i.e. every directory in > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/)? I now checked the GFIDs of all directories leading back down to the parent dir (13 directories in total) and for node 1 and node 3 the GFIDs of all underlying directories match each other. On node 2 they are also all the same except for the two highest directories (".../dir11" and ".../dir11/oc_dir"). It's exactly these two directories which are also listed in the "volume heal info" output under node 1 and node 3 and which do not get healed. For your reference I have pasted below the GFIDs for all underlying directories up to the parent directory and for all 3 nodes. I start at the top with the highest directory and at the bottom of the list is the parent directory (/data). # NODE 1 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 # /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd # /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11 trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 # ... trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82 trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4 trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94 trusted.gfid=0xf120657977274247900db4e9cc8129dd trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9 trusted.gfid=0x2174086880fc4fd19b187d1384300add trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01 # ... 
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb # /data/dir1/dir2 trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4 # /data/dir1 trusted.gfid=0x2683990126724adbb6416b911180e62b # /data # NODE 2 trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a trusted.gfid=0x10ec1eb1c8544ff2a36c325681713093 trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82 trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4 trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94 trusted.gfid=0xf120657977274247900db4e9cc8129dd trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9 trusted.gfid=0x2174086880fc4fd19b187d1384300add trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01 trusted.gfid=0xa7d78519db61459399e01fad2badf3fb trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4 trusted.gfid=0x2683990126724adbb6416b911180e62b # NODE 3 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82 trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4 trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94 trusted.gfid=0xf120657977274247900db4e9cc8129dd trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9 trusted.gfid=0x2174086880fc4fd19b187d1384300add trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01 trusted.gfid=0xa7d78519db61459399e01fad2badf3fb trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4 trusted.gfid=0x2683990126724adbb6416b911180e62b > Let us see if the parents' gfids are the same before deleting anything. > Is the heal info still showing 4 entries? Please also share the getfattr > output of the the parent directory (i.e. dir11) . Yes, the heal info still shows the 4 entries but on node 1 the directory name is not shown anymore but just the GFID. 
This is the actual output of a "volume heal info": Brick node1:/data/myvol-pro/brick Status: Connected Number of entries: 4 Brick node2:/data/myvol-pro/brick Status: Connected Number of entries: 0 Brick node3:/srv/glusterfs/myvol-pro/brick /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11 /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir Status: Connected Number of entries: 4 What are the next steps in order to fix that? ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
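Comparing those raw trusted.gfid values against .glusterfs paths by eye is error-prone. An illustrative helper (not a Gluster tool) that converts a gfid xattr value into its canonical UUID form and the backend path, based on Gluster's convention of using the first two byte pairs of the GFID as two directory levels under .glusterfs:

```python
import uuid

def gfid_backend_path(hex_value: str) -> str:
    """Map a trusted.gfid xattr value to its .glusterfs entry; Gluster
    uses the first two byte pairs of the GFID as two directory levels."""
    gfid = str(uuid.UUID(hex_value.removeprefix("0x")))
    return f".glusterfs/{gfid[0:2]}/{gfid[2:4]}/{gfid}"

# Node 2's divergent GFID for oc_dir, from the output above:
print(gfid_backend_path("0xd9ac192ce85e4402af105551f587ed9a"))
# .glusterfs/d9/ac/d9ac192c-e85e-4402-af10-5551f587ed9a
```

That result matches the /data/myvol-pro/brick/.glusterfs/d9/ac symlink found on node 2 earlier in this thread.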
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Thursday, November 15, 2018 5:57 AM, Ravishankar N wrote: > 1.Could you provide the getfattr output of the following 3 dirs from all > 3 nodes? > i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10 > ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/ > iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir Sure, you will find below the getfattr output of all 3 directories from all 3 nodes. i)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10 # NODE 1 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 trusted.glusterfs.dht=0x0001 # NODE 2 trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 trusted.glusterfs.dht=0x0001 # NODE 3 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269 trusted.glusterfs.dht=0x0001 ii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/ # NODE 1 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00040003 trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd trusted.glusterfs.dht=0x0001 # NODE 2 trusted.gfid=0x10ec1eb1c8544ff2a36c325681713093 trusted.glusterfs.dht=0x0001 # NODE 3 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00040003 trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd trusted.glusterfs.dht=0x0001 iii)/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir # NODE 1 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00030003 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 trusted.glusterfs.dht=0x0001 # NODE 2 trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a trusted.glusterfs.dht=0x0001 # NODE 3 trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00030003 trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19 trusted.glusterfs.dht=0x0001 > 2. 
Do you know the file (or directory) names corresponding to the other > 2 gfids in heal info output, i.e > gfid:aae4098a-1a71-4155-9cc9-e564b89957cf > gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360 > Please share the getfattr output of them as well. Unfortunately no. I tried the trick of mounting the volume with the mount option "aux-gfid-mount" in order to find the filename corresponding to the GFID and then using the following getfattr command: getfattr -n trusted.glusterfs.pathinfo -e text /mnt/g/.gfid/aae4098a-1a71-4155-9cc9-e564b89957cf This gave me the following output: trusted.glusterfs.pathinfo="( ( ))" Then if I check the ".../brick/.glusterfs/aa/e4/aae4098a-1a71-4155-9cc9-e564b89957cf" on node 1 or node 3, it does not have any symlink to a file. Or am I maybe looking in the wrong place, or is there another trick to map a GFID to its filename? Regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
‐‐‐ Original Message ‐‐‐ On Friday, November 16, 2018 5:14 AM, Ravishankar N wrote: > Okay, as asked in the previous mail, please share the getfattr output > from all bricks for these 2 files. I think once we have this, we can try > either 'adjusting' the the gfid and symlinks on node 2 for dir11 and > oc_dir or see if we can set afr xattrs on dir10 for self-heal to purge > everything under it on node 2 and recreate it using the other 2 nodes. And finally here is the output of a getfattr from both files from the 3 nodes: FILE 1: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey NODE 1: trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00020001 trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579 NODE 2: trusted.afr.dirty=0x trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8 trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579 NODE 3: trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00020001 trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579 FILE 2: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey NODE 1: trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00020001 trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360 trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579 NODE 2: trusted.afr.dirty=0x trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579 NODE 3: trusted.afr.dirty=0x trusted.afr.myvol-pro-client-1=0x00020001 
trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360 trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579 Thanks again in advance for your answer. ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
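The trusted.gfid2path.* values above appear to be plain hex-encoded ASCII of the form <parent-gfid>/<basename>, so they can be decoded directly; decoding node 2's fileKey entry shows it recording the divergent d9ac… parent GFID:

```python
def decode_gfid2path(hex_value: str) -> str:
    """trusted.gfid2path.* stores '<parent-gfid>/<basename>' as raw bytes."""
    return bytes.fromhex(hex_value.removeprefix("0x")).decode("utf-8")

# fileKey as recorded on node 2 (value copied from the getfattr output):
print(decode_gfid2path(
    "0x64396163313932632d653835652d343430322d616631302d"
    "3535353166353837656439612f66696c654b6579"))
# d9ac192c-e85e-4402-af10-5551f587ed9a/fileKey
```

The same decode on the node 1/node 3 values yields the 25e2… parent GFID, consistent with the split described in the thread.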
[Gluster-users] S3-compatible object storage on top of GlusterFS volume
Hello, First of all I was wondering if GlusterFS natively implements S3-compatible object storage or if this is planned for the near future? I did not find anything in the documentation, so I assume that this is not the case. As an alternative I was thinking of using Zenko CloudServer (https://github.com/scality/cloudserver), an S3-compatible object store implementation, on top of a volume of my already existing GlusterFS cluster. Does anyone have experience with this scenario? If yes, I would be interested to know how well it works and what software is recommended for this use case. Best regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] op-version compatibility with older clients
Hello, I would like to know: if I increase the op-version of all my GlusterFS volumes from its current version 31202 to 40100 by using the following command: gluster volume set all op-version 40100 will my clients using the GlusterFS 3.12 libgfapi and FUSE mounts still be able to connect to my server and work correctly? I am running 4.1.5 on my GlusterFS server, and I am asking because I still have a few clients on 3.12.14 which will need to stay longer on 3.12.14. Regards, Mabi ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
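For what it's worth, Gluster's op-version integers encode releases as roughly X*10000 + Y*100 + Z for version X.Y.Z (so 31202 reads as 3.12.2); a tiny sanity-check helper under that assumption, not part of the Gluster CLI:

```python
def decode_op_version(op: int) -> str:
    """Interpret a GlusterFS op-version integer as X.Y.Z, assuming the
    X*10000 + Y*100 + Z encoding (e.g. 31202 -> 3.12.2)."""
    major, rest = divmod(op, 10000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"

print(decode_op_version(31202))  # 3.12.2
print(decode_op_version(40100))  # 4.1.0
```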
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Thank you Ravi for your answer. I have now set the afr xattr as you suggested and I am running the "find . | xargs -d '\n' stat" on my gluster fuse mount for this volume. This volume has around 3 million of files and directories so it will take a long time to finish I suppose. Do I really need to run this find over the whole volume starting from its root? Note that I added the "-d '\n'" option in xargs in order to deal with filenames which have spaces inside. ‐‐‐ Original Message ‐‐‐ On Saturday, November 17, 2018 6:04 AM, Ravishankar N wrote: > Okay so for all files and dirs, node 2 seems to be the bad copy. Try the > following: > > 1. On both node 1 and node3, set theafr xattr for dir10: > setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001 > > /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10 > > 2. Fuse mount the volume temporarily in some location and from that > mount point, do a `find .|xargs stat >/dev/null` > > > 3. Run`gluster volume heal $volname` > > HTH, > Ravi > > On 11/16/2018 09:07 PM, mabi wrote: > > > And finally here is the output of a getfattr from both files from the 3 > > nodes: > > FILE 1: > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey > > NODE 1: > > trusted.afr.dirty=0x > > trusted.afr.myvol-pro-client-1=0x00020001 > > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf > > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579 > > NODE 2: > > trusted.afr.dirty=0x > > trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8 > > trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579 > > NODE 3: > > trusted.afr.dirty=0x > > trusted.afr.myvol-pro-client-1=0x00020001 > > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf > > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579 > > FILE 
2: > > /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey > > NODE 1: > > trusted.afr.dirty=0x > > trusted.afr.myvol-pro-client-1=0x00020001 > > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360 > > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579 > > NODE 2: > > trusted.afr.dirty=0x > > trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace > > trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579 > > NODE 3: > > trusted.afr.dirty=0x > > trusted.afr.myvol-pro-client-1=0x00020001 > > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360 > > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579 ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
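The -d '\n' detail above is easy to verify without touching the volume: with it, each line that find prints reaches stat as a single argument, even when the filename contains spaces. A throwaway check (assumes GNU xargs and stat are available, temporary files only):

```python
import os
import subprocess
import tempfile

# Show that `xargs -d '\n'` keeps filenames with spaces intact:
# without it, xargs would split "file with spaces" into 3 arguments
# and stat would fail on the fragments.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "file with spaces"), "w").close()
open(os.path.join(tmp, "plain"), "w").close()
out = subprocess.run(
    f"find {tmp} -type f | xargs -d '\\n' stat -c '%n'",
    shell=True, capture_output=True, text=True, check=True)
print(len(out.stdout.splitlines()))  # 2: both names survived intact
```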
Re: [Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Good news, the stat on all files of my volume finished after running for over 6 hours and the 4 files (actually 2 directories and 2 files) are now finally all healed. I checked the 3 bricks and all have the correct data. On node 1 I also saw 4 healing log entries in glustershd.log log file. I did not even need to manually run a "volume heal" as it healed automatically. Now, I would really like to avoid this situation in the future, it's a pain for me and maybe also for you guys helping me ;-) Is this a bug or am I doing something wrong? How can I avoid this type of manual fixing in the future? Again a big thank you Ravi for your patience helping me out with this issue. ‐‐‐ Original Message ‐‐‐ On Saturday, November 17, 2018 6:04 AM, Ravishankar N wrote: > Okay so for all files and dirs, node 2 seems to be the bad copy. Try the > following: > > 1. On both node 1 and node3, set theafr xattr for dir10: > setfattr -n trusted.afr.myvol-pro-client-1 -v 0x00010001 > > /data/myvol-private/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10 > > 2. Fuse mount the volume temporarily in some location and from that > mount point, do a `find .|xargs stat >/dev/null` > > > 3. 
Run `gluster volume heal $volname`
>
> HTH,
> Ravi
>
> On 11/16/2018 09:07 PM, mabi wrote:
> > And finally here is the output of a getfattr from both files from the 3 nodes:
> >
> > FILE 1: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/fileKey
> >
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> >
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0x48ccb52b788f4361b33fad43157b8ea8
> > trusted.gfid2path.32a8dc56983f7b8f=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579
> >
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0xaae4098a1a7141559cc9e564b89957cf
> > trusted.gfid2path.9a863b050c1975ed=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f66696c654b6579
> >
> > FILE 2: /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir/username.shareKey
> >
> > NODE 1:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579
> >
> > NODE 2:
> > trusted.afr.dirty=0x
> > trusted.gfid=0xae880a4f19824bc6a3baabe2e3c62ace
> > trusted.gfid2path.0c0f97b97351b4af=0x64396163313932632d653835652d343430322d616631302d3535353166353837656439612f6a6d406d616765726c2e63682e73686172654b6579
> >
> > NODE 3:
> > trusted.afr.dirty=0x
> > trusted.afr.myvol-pro-client-1=0x00020001
> > trusted.gfid=0x3c92459b8fa146699a3db38b8d41c360
> > trusted.gfid2path.510dd4750ef350f9=0x32356532363136622d346662362d346232612d383934352d316166633935366631392f6a6d406d616765726c2e63682e73686172654b6579

___ Gluster-users mailing list Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
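The trusted.gfid2path.* values in the getfattr dump above are hex-encoded "<parent-gfid>/<basename>" strings, so decoding them shows which directory entry each brick thinks the file belongs to, which is useful when the bricks disagree. A minimal sketch, assuming bash (or coreutils printf, which also understands \x escapes) and GNU sed; the sample value is copied verbatim from NODE 2's FILE 1 output above:

```shell
# Decode a hex-encoded trusted.gfid2path xattr value into readable ASCII.
hexdecode() {
    # Prefix every hex byte with \x, then let printf expand the escapes.
    printf "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# Value taken from NODE 2, FILE 1 in the getfattr output (leading 0x dropped):
val=64396163313932632d653835652d343430322d616631302d3535353166353837656439612f66696c654b6579
hexdecode "$val"
# → d9ac192c-e85e-4402-af10-5551f587ed9a/fileKey
```

Decoding the NODE 1/3 values the same way yields a different parent gfid, which matches the mismatching trusted.gfid values in the dump: the bricks disagree about which directory entry the file hangs off.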
[Gluster-users] Max length for filename
Hello,

I saw this warning today in my FUSE mount client log file:

[2019-01-28 06:01:25.091232] W [fuse-bridge.c:565:fuse_entry_cbk] 0-glusterfs-fuse: 530594537: LOOKUP() /data/somedir0/files/-somdir1/dir2/dir3/some super long filename….mp3.TransferId1924513788.part => -1 (File name too long)

and I was wondering: what is the maximum length for a filename on GlusterFS? I am using GlusterFS 4.1.6.

Regards,
Mabi
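As far as I can tell, GlusterFS does not impose a filename limit of its own below what the brick filesystem allows: the effective maximum is the brick filesystem's NAME_MAX, which is 255 bytes (bytes, not characters, so multi-byte UTF-8 names hit the limit sooner) on ext4 and XFS. A quick way to check the limit for any given path:

```shell
# Maximum filename length (in bytes) accepted by the filesystem backing /tmp.
# On ext4/XFS/tmpfs this reports 255.
getconf NAME_MAX /tmp

# A 255-byte name is accepted; one byte more would fail with ENAMETOOLONG.
long255=$(printf 'a%.0s' {1..255})   # bash brace expansion: 255 'a' characters
touch "/tmp/$long255" && echo "255-byte name OK"
rm -f "/tmp/$long255"
```

Running the same `getconf NAME_MAX` against the brick mount point on the servers shows the limit the FUSE client will ultimately hit.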
[Gluster-users] quotad error log warnings repeated
Hello,

I am running a 3-node (with arbiter) GlusterFS 4.1.6 cluster with one replicated volume where I have quotas enabled. Checking the quotad.log file on one of the nodes, I see a lot of these warning messages, repeated many times:

The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]" repeated 224 times between [2019-02-07 07:28:15.291923] and [2019-02-07 07:30:02.625004]
The message "W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]" repeated 224 times between [2019-02-07 07:28:15.291949] and [2019-02-07 07:30:02.625004]
[2019-02-07 07:30:07.747135] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]
[2019-02-07 07:30:07.747164] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]

I can re-trigger these warning messages on demand, for example by running:

$ gluster volume quota myvolume list

Does anyone know if this is bad? Is it a bug? And what can I do about it?

Best regards,
Mabi
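To see which dict keys trigger the warning most often, the per-key occurrences can be counted straight out of the log. A small sketch; the sample log is inlined here for illustration (built from the lines quoted above, the third entry is hypothetical), and in practice you would point awk at the real /var/log/glusterfs/quotad.log instead:

```shell
# Count dict_to_xdr "not sent on wire" warnings per key in a quotad log.
cat > /tmp/quotad-sample.log <<'EOF'
[2019-02-07 07:30:07.747135] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'trusted.glusterfs.quota.size' is not sent on wire [Invalid argument]
[2019-02-07 07:30:07.747164] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]
[2019-02-07 07:30:09.100001] W [MSGID: 101016] [glusterfs3.h:743:dict_to_xdr] 0-dict: key 'volume-uuid' is not sent on wire [Invalid argument]
EOF

# Split on single quotes so $2 is the key name, then tally per key.
awk -F"'" '/not sent on wire/ {count[$2]++} END {for (k in count) print count[k], k}' \
    /tmp/quotad-sample.log
```

This makes it easy to confirm whether every `gluster volume quota ... list` run adds a fixed number of warnings per key, which is useful detail to include in a bug report.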
[Gluster-users] GlusterFS 4.1.9 Debian stretch packages missing
Hello,

I would like to upgrade my GlusterFS 4.1.8 cluster to 4.1.9 on my Debian stretch nodes. Unfortunately the packages are missing, as you can see here:

https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.9/Debian/stretch/amd64/apt/

As far as I know GlusterFS 4.1 is not yet EOL, so I don't understand why the packages are missing... Maybe an error? Could someone please check?

Thank you very much in advance.

Best,
M.
[Gluster-users] GlusterFS FUSE client on BSD
Hello,

Is there a way to mount a GlusterFS volume using FUSE on a BSD machine such as OpenBSD? If not, what is the alternative? NFS, I guess?

Regards,
M.
[Gluster-users] writing to fuse device failed: No such file or directory
Hello,

On the FUSE clients of my GlusterFS 5.11 two-node replica + arbiter setup I see quite a lot of the following error message, repeatedly:

[2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory

Both the server and clients are Debian 9.

What exactly does this error message mean? Is it normal? And what should I do to fix it?

Regards,
Mabi

Community Meeting Calendar: Schedule - Every Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] writing to fuse device failed: No such file or directory
‐‐‐ Original Message ‐‐‐ On Tuesday, March 3, 2020 6:11 AM, Hari Gowtham wrote:

> I checked on the backport and found that this patch hasn't yet been backported to any of the release branches.
> If this is the fix, it would be great to have them backported for the next release.

Thanks to everyone who responded to my post. Now I wanted to ask if the fix to this bug will also be backported to GlusterFS 5? And if yes, will it be available in the next GlusterFS version 5.13?
Re: [Gluster-users] Announcing Gluster release 5.11
Dear Hari, Nearly 10 days after your announcement unfortunately the 5.11 Debian stretch packages are still missing: https://download.gluster.org/pub/gluster/glusterfs/5/5.11/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ Do you know when they will be available? or has this maybe been forgotten? Thank you very much in advance. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On Wednesday, December 18, 2019 4:56 AM, Hari Gowtham wrote: > Hi, > > The Gluster community is pleased to announce the release of Gluster > 5.11 (packages available at [1]). > > Release notes for the release can be found at [2]. > > Major changes, features and limitations addressed in this release: > None > > Thanks, > Gluster community > > [1] Packages for 5.11: > https://download.gluster.org/pub/gluster/glusterfs/5/5.11/ > > [2] Release notes for 5.11: > https://docs.gluster.org/en/latest/release-notes/5.11/ > > -- > Regards, > Hari Gowtham. Community Meeting Calendar: APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST Bridge: https://bluejeans.com/441850968 NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Announcing Gluster release 5.11
Thank you very much for your fast response and for adding the missing Debian packages. ‐‐‐ Original Message ‐‐‐ On Friday, December 27, 2019 10:36 AM, Shwetha Acharya wrote: > Hi Mabi, > > Glusterfs 5.11 Debian amd64 stretch packages are now available. > > Regards, > Shwetha > > On Fri, Dec 27, 2019 at 1:37 PM mabi wrote: > >> Dear Hari, >> >> Nearly 10 days after your announcement unfortunately the 5.11 Debian stretch >> packages are still missing: >> >> https://download.gluster.org/pub/gluster/glusterfs/5/5.11/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >> >> Do you know when they will be available? or has this maybe been forgotten? >> >> Thank you very much in advance. >> >> Best regards, >> Mabi >> >> ‐‐‐ Original Message ‐‐‐ >> On Wednesday, December 18, 2019 4:56 AM, Hari Gowtham >> wrote: >> >>> Hi, >>> >>> The Gluster community is pleased to announce the release of Gluster >>> 5.11 (packages available at [1]). >>> >>> Release notes for the release can be found at [2]. >>> >>> Major changes, features and limitations addressed in this release: >>> None >>> >>> Thanks, >>> Gluster community >>> >>> [1] Packages for 5.11: >>> https://download.gluster.org/pub/gluster/glusterfs/5/5.11/ >>> >>> [2] Release notes for 5.11: >>> https://docs.gluster.org/en/latest/release-notes/5.11/ >>> >>> -- >>> Regards, >>> Hari Gowtham. 
Re: [Gluster-users] writing to fuse device failed: No such file or directory
Hello,

Now that GlusterFS 5.13 has been released, could someone let me know if this issue (see mail below) has been fixed in 5.13?

Thanks and regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Monday, March 2, 2020 3:17 PM, mabi wrote:

> Hello,
>
> On the FUSE clients of my GlusterFS 5.11 two-node replica + arbiter setup I see quite a lot of the following error message, repeatedly:
>
> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>
> Both the server and clients are Debian 9.
>
> What exactly does this error message mean? And is it normal? or what should I do to fix that?
>
> Regards,
> Mabi
Re: [Gluster-users] writing to fuse device failed: No such file or directory
Dear Hari,

Thank you for your answer. A few months ago when I first reported this issue, I was told that the fix would be backported to 5.x; at that time 5.x was not EOL. So I guess I should upgrade to 7, but reading this list it seems that version 7 has a few other open issues. Is it safe to use version 7 in production, or should I rather use version 6? And is it possible to upgrade from 5.11 directly to 7.5?

Regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Tuesday, May 5, 2020 1:40 PM, Hari Gowtham wrote:

> Hi,
>
> I don't see the above mentioned fix to be backported to any branch.
> I have just cherry-picked them for release-6 and 7.
> Release-5 has reached EOL and so it won't have the fix.
> Note: release-6 will have one more release and will be EOLed as well.
> Release-8 is being worked on and it will have the fix as part of the way it's branched.
> Once it gets merged, it should be available in release-6 and 7, but I do recommend switching from the older branches to the newer ones (at least release-7 in this case).
>
> https://review.gluster.org/#/q/change:I510158843e4b1d482bdc496c2e97b1860dc1ba93
>
> On Tue, May 5, 2020 at 11:52 AM mabi wrote:
>
>> Dear Artem,
>>
>> Thank you for your answer. If you still see these error messages with GlusterFS 5.13, I suppose then that this bug fix has not been backported to 5.x.
>>
>> Could someone of the dev team please confirm? It was said on this list that this bug fix would be backported to 5.x, so I am a bit surprised.
>> >> Best regards, >> Mabi >> >> ‐‐‐ Original Message ‐‐‐ >> On Monday, May 4, 2020 9:57 PM, Artem Russakovskii >> wrote: >> >>> I'm on 5.13, and these are the only error messages I'm still seeing (after >>> downgrading from the failed v7 update): >>> >>> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] >>> (--> >>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] >>> (--> >>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> >>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: >>> writing to fuse device failed: No such file or directory >>> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] >>> (--> >>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] >>> (--> >>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> >>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: >>> writing to fuse device failed: No such file or directory >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, [Android Police](http://www.androidpolice.com), [APK >>> Mirror](http://www.apkmirror.com/), Illogical Robot LLC >>> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR) >>> >>> On Mon, May 4, 2020 at 5:46 AM mabi wrote: >>> >>>> Hello, >>>> >>>> Now that GlusterFS 5.13 has been released, could someone let me know if >>>> this issue (see mail below) has been fixed in 5.13? 
>>>> >>>> Thanks and regards, >>>> Mabi >>>> >>>> ‐‐‐ Original Message ‐‐‐ >>>> On Monday, March 2, 2020 3:17 PM, mabi wrote: >>>> >>>>> Hello, >>>>> >>>>> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see >>>>> quite a lot of the following error message repeatedly: >>>>> >>>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>>>> (--> >>>>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe] >>>>> (--> >>>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a] >>>>> (--> >>>>> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33] >>>>> (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> >>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) >
Re: [Gluster-users] writing to fuse device failed: No such file or directory
Dear Artem,

Thank you for your answer. If you still see these error messages with GlusterFS 5.13, I suppose then that this bug fix has not been backported to 5.x.

Could someone of the dev team please confirm? It was said on this list that this bug fix would be backported to 5.x, so I am a bit surprised.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Monday, May 4, 2020 9:57 PM, Artem Russakovskii wrote:

> I'm on 5.13, and these are the only error messages I'm still seeing (after downgrading from the failed v7 update):
>
> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>
> Sincerely,
> Artem
>
> --
> Founder, [Android Police](http://www.androidpolice.com), [APK Mirror](http://www.apkmirror.com/), Illogical Robot LLC
> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>
> On Mon, May 4, 2020 at 5:46 AM mabi wrote:
>
>> Hello,
>>
>> Now that GlusterFS 5.13 has been released, could someone let me know if this issue (see mail below) has been fixed in 5.13?
>>
>> Thanks and regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Monday, March 2, 2020 3:17 PM, mabi wrote:
>>
>>> Hello,
>>>
>>> On the FUSE clients of my GlusterFS 5.11 two-node replica + arbiter setup I see quite a lot of the following error message, repeatedly:
>>>
>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f93d5c13cfe] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x789a)[0x7f93d331989a] (--> /usr/lib/x86_64-linux-gnu/glusterfs/5.11/xlator/mount/fuse.so(+0x7c33)[0x7f93d3319c33] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f93d4e8f4a4] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f93d46ead0f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>
>>> Both the server and clients are Debian 9.
>>>
>>> What exactly does this error message mean? And is it normal? or what should I do to fix that?
>>>
>>> Regards,
>>> Mabi
Re: [Gluster-users] writing to fuse device failed: No such file or directory
Hi everyone,

So because upgrading introduces additional problems, does this mean I should stick with 5.x even if it is EOL? Or what is a "safe" version to upgrade to?

Regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Wednesday, May 6, 2020 2:44 AM, Artem Russakovskii wrote:

> Hi Hari,
>
> Hmm, given how poorly our migration from 5.13 to 7.5 went, I am not sure how I'd move forward with what you suggested at this point.
>
> Sincerely,
> Artem
>
> --
> Founder, [Android Police](http://www.androidpolice.com), [APK Mirror](http://www.apkmirror.com/), Illogical Robot LLC
> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR)
>
> On Tue, May 5, 2020 at 4:41 AM Hari Gowtham wrote:
>
>> Hi,
>>
>> I don't see the above mentioned fix to be backported to any branch.
>> I have just cherry-picked them for release-6 and 7.
>> Release-5 has reached EOL and so it won't have the fix.
>> Note: release-6 will have one more release and will be EOLed as well.
>> Release-8 is being worked on and it will have the fix as part of the way it's branched.
>> Once it gets merged, it should be available in release-6 and 7, but I do recommend switching from the older branches to the newer ones (at least release-7 in this case).
>>
>> https://review.gluster.org/#/q/change:I510158843e4b1d482bdc496c2e97b1860dc1ba93
>>
>> On Tue, May 5, 2020 at 11:52 AM mabi wrote:
>>
>>> Dear Artem,
>>>
>>> Thank you for your answer. If you still see these error messages with GlusterFS 5.13, I suppose then that this bug fix has not been backported to 5.x.
>>>
>>> Could someone of the dev team please confirm? It was said on this list that this bug fix would be backported to 5.x, so I am a bit surprised.
>>> >>> Best regards, >>> Mabi >>> >>> ‐‐‐ Original Message ‐‐‐ >>> On Monday, May 4, 2020 9:57 PM, Artem Russakovskii >>> wrote: >>> >>>> I'm on 5.13, and these are the only error messages I'm still seeing (after >>>> downgrading from the failed v7 update): >>>> >>>> [2020-05-04 19:56:29.391121] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] >>>> (--> >>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] >>>> (--> >>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] >>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> >>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: >>>> writing to fuse device failed: No such file or directory >>>> [2020-05-04 19:56:29.400541] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7f0f9a5f324d] >>>> (--> >>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7f0f969d649a] >>>> (--> >>>> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7f0f969d67bb] >>>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7f0f99b434f9] (--> >>>> /lib64/libc.so.6(clone+0x3f)[0x7f0f9987bf2f] ) 0-glusterfs-fuse: >>>> writing to fuse device failed: No such file or directory >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, [Android Police](http://www.androidpolice.com), [APK >>>> Mirror](http://www.apkmirror.com/), Illogical Robot LLC >>>> [beerpla.net](http://beerpla.net/) | [@ArtemR](http://twitter.com/ArtemR) >>>> >>>> On Mon, May 4, 2020 at 5:46 AM mabi wrote: >>>> >>>>> Hello, >>>>> >>>>> Now that GlusterFS 5.13 has been released, could someone let me know if >>>>> this issue (see mail below) has been fixed in 5.13? 
>>>>> >>>>> Thanks and regards, >>>>> Mabi >>>>> >>>>> ‐‐‐ Original Message ‐‐‐ >>>>> On Monday, March 2, 2020 3:17 PM, mabi wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> On the FUSE clients of my GlusterFS 5.11 two-node replica+arbitrer I see >>>>>> quite a lot of the following error message repeatedly: >>>>>> >>>>>> [2020-03-02 14:12:40.297690] E [fuse-bridge.c:219:check_and_dump_fuse_W] >>>>>> (--> >
[Gluster-users] glustershd: EBADFD [File descriptor in bad state]
Hello,

I have a GlusterFS 6.9 cluster with two nodes and one arbiter node with a replica volume, and currently there are two files and two directories stuck waiting to be self-healed.

Nodes 1 and 3 (arbiter) have the files and directories on the brick, but node 2 does not.

Node 1's glustershd log file shows the following warning message:

[2020-10-09 14:18:54.006707] I [MSGID: 108026] [afr-self-heal-entry.c:898:afr_selfheal_entry_do] 0-myvol-replicate-0: performing entry selfheal on 4d520c69-2b18-4601-bad5-3c16c29188c1
[2020-10-09 14:18:54.007064] W [MSGID: 114061] [client-common.c:2968:client_pre_readdir_v2] 0-myvol-client-1: (4d520c69-2b18-4601-bad5-3c16c29188c1) remote_fd is -1. EBADFD [File descriptor in bad state]

The FUSE mount client log file shows the following error message:

[2020-10-09 14:15:51.115856] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f9d0a0663bc] (--> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7bba)[0x7f9d07743bba] (--> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7d23)[0x7f9d07743d23] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f9d092bd4a4] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9d08b17d0f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory

I have no clue how this could have happened, but as the GlusterFS self-heal daemon does not seem to be able to heal the two files and directories itself, I would like to know what I can do here to fix this.

Thank you in advance for your help.

Best regards,
Mabi
Re: [Gluster-users] glustershd: EBADFD [File descriptor in bad state]
Just wanted to mention that 3 hours later the self-heal daemon managed to heal the files. I don't understand why it took 3 hours, but at least the affected two directories and files are now available on all nodes again.

‐‐‐ Original Message ‐‐‐ On Friday, October 9, 2020 4:30 PM, mabi wrote:

> Hello,
>
> I have a GlusterFS 6.9 cluster with two nodes and one arbiter node with a replica volume, and currently there are two files and two directories stuck waiting to be self-healed.
>
> Nodes 1 and 3 (arbiter) have the files and directories on the brick, but node 2 does not.
>
> Node 1's glustershd log file shows the following warning message:
>
> [2020-10-09 14:18:54.006707] I [MSGID: 108026] [afr-self-heal-entry.c:898:afr_selfheal_entry_do] 0-myvol-replicate-0: performing entry selfheal on 4d520c69-2b18-4601-bad5-3c16c29188c1
> [2020-10-09 14:18:54.007064] W [MSGID: 114061] [client-common.c:2968:client_pre_readdir_v2] 0-myvol-client-1: (4d520c69-2b18-4601-bad5-3c16c29188c1) remote_fd is -1. EBADFD [File descriptor in bad state]
>
> The FUSE mount client log file shows the following error message:
>
> [2020-10-09 14:15:51.115856] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f9d0a0663bc] (--> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7bba)[0x7f9d07743bba] (--> /usr/lib/x86_64-linux-gnu/glusterfs/6.9/xlator/mount/fuse.so(+0x7d23)[0x7f9d07743d23] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f9d092bd4a4] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9d08b17d0f] ) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>
> I have no clue how this could have happened, but as the GlusterFS self-heal daemon does not seem to be able to heal the two files and directories itself, I would like to know what I can do here to fix this.
>
> Thank you in advance for your help.
>
> Best regards,
> Mabi
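While waiting for a heal like this to drain, the pending entry count can be watched with `gluster volume heal <volname> info`. A small sketch that totals the per-brick "Number of entries:" lines from that command's output; the sample output inlined here is hypothetical (shaped like the usual heal-info format, with made-up brick paths), and in practice you would pipe the real command into the same awk:

```shell
# Sum pending self-heal entries across all bricks from heal-info output.
# Real usage would be: gluster volume heal myvolume info | awk '...'
heal_info_sample='Brick node1:/data/myvolume/brick
Number of entries: 2
Brick node2:/data/myvolume/brick
Number of entries: 0
Brick arbiter:/srv/arbiter/myvolume/brick
Number of entries: 2'

# "Number of entries: N" -> field 4 is the count; add them all up.
echo "$heal_info_sample" | awk '/^Number of entries:/ {total += $4} END {print total}'
# → 4
```

Wrapping this in a `watch` or a sleep loop makes it easy to see whether the count is actually shrinking or stuck.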
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Hello,

To be precise, I am hitting exactly the following issue: https://github.com/gluster/glusterfs/issues/1332

I could not wait any longer for workarounds or quick fixes, so I decided to downgrade my rejected node from 7.7 back to 6.9, which worked. I would be really glad if someone could fix this issue or provide a working workaround, because version 6 of GlusterFS is not supported anymore, so I would really like to move on to the stable version 7.

Thank you very much in advance.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Saturday, August 22, 2020 7:53 PM, mabi wrote:

> Hello,
>
> I just started an upgrade of my 3-node replica (incl. arbiter) GlusterFS from 6.9 to 7.7, but unfortunately after upgrading the first node, that node gets rejected due to the following error:
>
> [2020-08-22 17:43:00.240990] E [MSGID: 106012] [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3013120651, remote cksum = 0 on peer myfirstnode.domain.tld
>
> So the glusterd process is running but not glusterfsd.
>
> I am hitting exactly the issue described here:
>
> https://www.gitmemory.com/Adam2Marsh
>
> But I do not see any solutions or workarounds. So now I am stuck with a degraded GlusterFS cluster.
>
> Could someone please advise me as soon as possible on what I should do? Are there maybe any workarounds?
>
> Thank you very much in advance for your response.
>
> Best regards,
> Mabi
[Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Hello,

I just started an upgrade of my 3-node replica (incl. arbiter) GlusterFS from 6.9 to 7.7, but unfortunately after upgrading the first node, that node gets rejected due to the following error:

[2020-08-22 17:43:00.240990] E [MSGID: 106012] [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3013120651, remote cksum = 0 on peer myfirstnode.domain.tld

So the glusterd process is running but not glusterfsd.

I am hitting exactly the issue described here:

https://www.gitmemory.com/Adam2Marsh

But I do not see any solutions or workarounds. So now I am stuck with a degraded GlusterFS cluster.

Could someone please advise me as soon as possible on what I should do? Are there maybe any workarounds?

Thank you very much in advance for your response.

Best regards,
Mabi
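For a peer that stays rejected, the Gluster troubleshooting docs describe a recovery sequence: on the rejected node, clear /var/lib/glusterd while preserving glusterd.info (the node's identity), restart glusterd, and re-probe from a healthy peer. This is a sketch only, parameterized over the state directory so it can be dry-run safely against a scratch directory; the hostnames in the comments are assumptions, and for the specific quota-cksum mismatch above, double-check the referenced issue before wiping anything:

```shell
# Sketch of the "peer rejected" recovery step: clear glusterd state on the
# rejected node but keep its identity file. The real directory is
# /var/lib/glusterd; here it is a parameter so the sketch is dry-runnable.
reset_glusterd_state() {
    state_dir=$1
    # Delete everything under the state dir except glusterd.info (GNU find).
    find "$state_dir" -mindepth 1 ! -name glusterd.info -delete
    # On the real node you would then run (hostnames are hypothetical):
    #   systemctl restart glusterd
    #   gluster peer probe node1.domain.tld   # executed from a healthy node
    #   systemctl restart glusterd            # once more, per the docs
}

# Dry run against a scratch directory standing in for /var/lib/glusterd:
mkdir -p /tmp/fake-glusterd/vols/myvolume
echo "UUID=deadbeef" > /tmp/fake-glusterd/glusterd.info
echo "cksum" > /tmp/fake-glusterd/vols/myvolume/quota.cksum
reset_glusterd_state /tmp/fake-glusterd
ls /tmp/fake-glusterd   # only glusterd.info remains
```

The volume definitions are then re-synced from the healthy peers during the probe, which is why only glusterd.info needs to survive.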
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Dear Nikhil,

Thank you for your answer. So does this mean that all my FUSE clients where I have the volume mounted will not lose their connection at any time during the whole upgrade procedure of all 3 nodes?

I am asking because, if I understand correctly, there will be an overlap in time where more than one node is not running the glusterfsd (brick) process, which means that quorum is lost and my FUSE clients will lose connection to the volume. I just want to be sure that there will not be any downtime.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Monday, August 24, 2020 11:14 AM, Nikhil Ladha wrote:

> Hello Mabi
>
> You don't need to follow the offline upgrade procedure. Please do follow the online upgrade procedure only. Upgrade the nodes one by one; you will notice the `Peer Rejected` state after upgrading one node or so, but once all the nodes are upgraded it will be back to `Peer in Cluster (Connected)`. Also, if any of the shd's are not online you can try restarting that node to fix that.
> I have tried this on my own setup so I am pretty sure it should work for you as well.
> This is the workaround for the time being so that you are able to upgrade; we are working on the issue to come up with a fix for it ASAP.
>
> And, yes, if you face any issues even after upgrading all the nodes to 7.7, you will be able to downgrade back to 6.9, which I think you have already tried and it works as per your previous mail.
>
> Regards
> Nikhil Ladha
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
‐‐‐ Original Message ‐‐‐ On Monday, October 26, 2020 3:39 PM, Diego Zuccato wrote:

> Memory does not serve me well (there are 28 disks, not 26!), but bash history does :)

Yes, I also rely too often on history ;)

> gluster volume remove-brick BigVol replica 2 str957-biostq:/srv/arbiters/{00..27}/BigVol force

Thanks for the info. It looks like I was missing the "replica 2" in the command.

> gluster peer detach str957-biostq
> gluster peer probe str957-biostq

Do I really need to detach and re-probe the arbiter node? I would like to avoid that, because I have two other volumes with even more files... so that would mean I have to remove the arbiter brick of the two other volumes too.

> Give all the CPU and RAM you can. Less than 8GB RAM is asking for troubles (in my case).

I have added an extra 4 GB of RAM just in case.
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Ok, I see, I won't go down the path of disabling quota.

I could now remove the arbiter brick of my volume which has the quota issue, so it is now a simple 2-node replica with 1 brick per node. Now I would like to add the brick back, but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in Cluster' state

In fact I checked, and the arbiter node is still rejected, as you can see here:

State: Peer Rejected (Connected)

In the arbiter node's glusterd.log file I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume woelkli-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain.tld

So although I have removed the arbiter brick from my volume, it still complains about the checksum of the quota configuration. I also tried restarting glusterd on my arbiter node, but it does not help; the peer is still rejected.

What should I do at this stage?

‐‐‐ Original Message ‐‐‐ On Monday, October 26, 2020 6:06 PM, Strahil Nikolov wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and re-enabling quota on a volume with millions of files will take a lot of time and lots of IOPS. I would leave it as a last resort.
>
> Also, the following script was mentioned on the list and might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 16:40:06 GMT+2, Diego Zuccato
> diego.zucc...@unibo.it wrote:
>
> On 26/10/20 15:09, mabi wrote:
>
> > Right, seen like that this sounds reasonable. Do you actually remember the
> > exact command you ran in order to remove the brick? I was thinking this
> > should be it:
> > gluster volume remove-brick force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)
>
> gluster volume remove-brick BigVol replica 2
> str957-biostq:/srv/arbiters/{00..27}/BigVol force
> gluster peer detach str957-biostq
> gluster peer probe str957-biostq
> gluster volume add-brick BigVol replica 3 arbiter 1
> str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for remove-brick to complete before detaching
> the arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> > > that uses an iSCSI disk. More than 80% continuous load on both CPUs and
> > > RAM.
> >
> > That's quite long I must say and I am in the same case as you, my arbiter
> > is a VM.
>
> Give all the CPU and RAM you can. Less than 8GB RAM is asking for
> troubles (in my case).
>
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
First, to answer your question how this happened: I reached this issue simply by rebooting my arbiter node yesterday morning in order to do some maintenance, which I do on a regular basis and which was never a problem before GlusterFS 7.8.

I have now removed the arbiter brick from all of my volumes (I have 3 volumes and only one volume uses quota). So I was then able to do a "detach" and then a "probe" of my arbiter node. So far so good, so I decided to add back an arbiter brick to one of my smallest volumes which does not have quota, but I get the following error message:

$ gluster volume add-brick othervol replica 3 arbiter 1 arbiternode.domain.tld:/srv/glusterfs/othervol/brick
volume add-brick: failed: Commit failed on arbiternode.domain.tld. Please check log file for details.

Checking the glusterd.log file of the arbiter node shows the following:

[2020-10-27 06:25:36.011955] I [MSGID: 106578] [glusterd-brick-ops.c:1024:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
[2020-10-27 06:25:36.011988] I [MSGID: 106578] [glusterd-brick-ops.c:1029:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
[2020-10-27 06:25:36.012017] I [MSGID: 106578] [glusterd-brick-ops.c:1033:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2020-10-27 06:25:36.093551] E [MSGID: 106053] [glusterd-utils.c:13790:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2020-10-27 06:25:36.104897] E [MSGID: 101042] [compat.c:605:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntQQVzyD [Transport endpoint is not connected]
[2020-10-27 06:25:36.104973] E [MSGID: 106073] [glusterd-brick-ops.c:2051:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2020-10-27 06:25:36.105001] E [MSGID: 106122] [glusterd-mgmt.c:317:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2020-10-27 06:25:36.105023] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick

After that I tried to restart the glusterd service on my arbiter node, and now it is again rejected by the other nodes with exactly the same error message as yesterday regarding the quota checksum being different, as you can see here:

[2020-10-27 06:30:21.729577] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain.tld
[2020-10-27 06:30:21.731966] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain.tld

This is really weird, because at this stage I did not even try yet to add the brick back to the arbiter node for the volume which has quota enabled... After detaching the arbiter node, am I supposed to delete something on the arbiter node? Something is really wrong here and I am stuck in a loop somehow... any help would be greatly appreciated.

‐‐‐ Original Message ‐‐‐ On Tuesday, October 27, 2020 1:26 AM, Strahil Nikolov wrote:

> You need to fix that "reject" issue before trying anything else.
> Have you tried to "detach" the arbiter and then "probe" it again ?
>
> I have no idea what you did to reach that state - can you provide the details?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020 at 20:38:38 GMT+2, mabi
> m...@protonmail.ch wrote:
>
> Ok I see I won't go down that path of disabling quota.
>
> I could now remove the arbiter brick of my volume which has the quota issue
> so it is now a simple 2 nodes replica with 1 brick per node.
>
> Now I would like to add the brick back but I get the following error:
>
> volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in
> Cluster' state
>
> In fact I checked and the arbiter node is still rejected as you can see here:
>
> State: Peer Rejected (Connected)
>
> In the arbiter node glusterd.log file I see the following errors:
>
> [2020-10-26 18:35:05.605124] E [MSGID: 106012]
> [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums
> of quota configuration of volume woelkli-private differ. local cksum = 0,
> remote cksum = 66908910 on peer node1.domain.tld
> [2020-10-26 18:35:05.617009] E [MSGID: 106012]
> [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums
> of quota configuration of volume myvol-private differ. local cksum = 0,
> remote cksum = 66908910 on peer node2.domain.tld
>
> So although I have removed the arbiter brick from my v
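For the record, when a peer stays in Rejected state like this, the generic recovery documented in the Gluster troubleshooting guide is to wipe the rejected node's cached volume metadata (everything under /var/lib/glusterd except glusterd.info) so that it re-syncs the volume definitions, quota.conf checksum included, from a healthy peer. A sketch using the hostnames from this thread; the real commands are only shown in comments, and the scratch-directory part at the end merely rehearses the find expression before it is run against the real state directory:

```shell
# On the rejected node (arbiternode.domain.tld in this thread) -- this wipes
# local volume metadata, so only do it while the other peers hold good copies:
#
#   systemctl stop glusterd
#   find /var/lib/glusterd -mindepth 1 -maxdepth 1 \
#        ! -name glusterd.info -exec rm -rf {} +
#   systemctl start glusterd
#   # from a healthy node: gluster peer probe arbiternode.domain.tld
#   # then restart glusterd on the rejected node once more and re-check
#   # "gluster peer status"
#
# Dry run of the find expression on a scratch directory, to confirm it keeps
# only glusterd.info:
state=$(mktemp -d)
mkdir -p "$state/vols/myvol-private" "$state/peers"
touch "$state/glusterd.info"
find "$state" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
ls "$state"
```

The same find expression can then be pointed at /var/lib/glusterd once the dry run behaves as expected.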
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
Dear all,

Thanks to this fix I could successfully upgrade from GlusterFS 6.9 to 7.8, but now, one week after the upgrade, I have rebooted my third node (the arbiter node) and unfortunately the bricks do not come up on that node. I get the same error message as before:

[2020-10-26 06:21:59.726705] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node2.domain
[2020-10-26 06:21:59.726871] I [MSGID: 106493] [glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to node2.domain (0), ret: 0, op_ret: -1
[2020-10-26 06:21:59.728164] I [MSGID: 106490] [glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 5f4ccbf4-33f6-4298-8b31-213553223349
[2020-10-26 06:21:59.728969] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote cksum = 66908910 on peer node1.domain
[2020-10-26 06:21:59.729099] I [MSGID: 106493] [glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to node1.domain (0), ret: 0, op_ret: -1

Can someone please advise what I need to do in order to have my arbiter node up and running again as soon as possible?

Thank you very much in advance for your help.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐ On Monday, September 7, 2020 5:49 AM, Sanju Rakonde wrote:

> Hi,
>
> issue https://github.com/gluster/glusterfs/issues/1332 is fixed now with
> https://github.com/gluster/glusterfs/commit/865cca1190e233381f975ff36118f46e29477dcf.
>
> It will be backported to release-7 and release-8 branches soon.
>
> On Mon, Sep 7, 2020 at 1:14 AM Strahil Nikolov wrote:
>
>> Your e-mail got in the spam...
>>
>> If you haven't fixed the issue, check Hari's topic about quota issues (based
>> on the error message you provided):
>> https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a
>>
>> Most probably there is a quota issue and you need to fix it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sunday, 23 August 2020 at 11:05:27 GMT+3, mabi wrote:
>>
>> Hello,
>>
>> So to be precise I am exactly having the following issue:
>>
>> https://github.com/gluster/glusterfs/issues/1332
>>
>> I could not wait any longer to find some workarounds or quick fixes, so I
>> decided to downgrade my rejected node from 7.7 back to 6.9, which worked.
>>
>> I would be really glad if someone could fix this issue or provide me a
>> workaround which works, because version 6 of GlusterFS is not supported
>> anymore, so I would really like to move on to the stable version 7.
>>
>> Thank you very much in advance.
>>
>> Best regards,
>> Mabi
>>
>> ‐‐‐ Original Message ‐‐‐
>>
>> On Saturday, August 22, 2020 7:53 PM, mabi wrote:
>>
>>> Hello,
>>>
>>> I just started an upgrade of my 3-node replica (incl. arbiter) GlusterFS
>>> from 6.9 to 7.7, but unfortunately after upgrading the first node, that
>>> node gets rejected due to the following error:
>>>
>>> [2020-08-22 17:43:00.240990] E [MSGID: 106012]
>>> [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums
>>> of quota configuration of volume myvolume differ. local cksum = 3013120651,
>>> remote cksum = 0 on peer myfirstnode.domain.tld
>>>
>>> So the glusterd process is running but not glusterfsd.
>>>
>>> I am exactly in the same issue as described here:
>>>
>>> https://www.gitmemory.com/Adam2Marsh
>>>
>>> But I do not see any solutions or workarounds, so now I am stuck with a
>>> degraded GlusterFS cluster.
>>>
>>> Could someone please advise me as soon as possible on what I should do? Is
>>> there maybe any workaround?
>>>
>>> Thank you very much in advance for your response.
>>>
>>> Best regards,
>>> Mabi

> --
> Thanks,
> Sanju
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On Monday, October 26, 2020 11:34 AM, Diego Zuccato wrote:

> IIRC it's the same issue I had some time ago.
> I solved it by "degrading" the volume to replica 2, then cleared the
> arbiter bricks and upgraded again to replica 3 arbiter 1.

Thanks Diego for pointing out this workaround. How much data do you have on that volume in terms of TB and files? I have around 3 TB of data in 10 million files, so I am a bit worried about taking such drastic measures. How bad was the load on your volume when re-adding the arbiter brick, and how long did it take to sync/heal?

Would another workaround, such as turning off quota on that problematic volume, also work? That sounds much less scary, but I don't know whether it would help...
Re: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
‐‐‐ Original Message ‐‐‐ On Monday, October 26, 2020 2:56 PM, Diego Zuccato wrote:

> The volume is built by 26 10TB disks w/ genetic data. I currently don't
> have exact numbers, but it's still at the beginning, so there are a bit
> less than 10TB actually used.
> But you're only removing the arbiters, you always have two copies of
> your files. The worst that can happen is a split brain condition
> (avoidable by requiring a 2-nodes quorum, in that case the worst is that
> the volume goes readonly).

Right, seen like that this sounds reasonable. Do you actually remember the exact command you ran in order to remove the brick? I was thinking this should be it:

gluster volume remove-brick force

but should I use "force" or "start"?

> IIRC it took about 3 days, but the arbiters are on a VM (8CPU, 8GB RAM)
> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

That's quite long I must say, and I am in the same case as you: my arbiter is a VM.
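Diego's full sequence appears piecemeal across this thread; condensed into one place it would look roughly like the sketch below. The block only prints the commands rather than running them (they need a live cluster); volume and host names are his, and a single arbiter brick path stands in for his full {00..27} set:

```shell
# Dry-run sketch of the replica-2 downgrade/upgrade cycle discussed above.
VOL=BigVol
ARB=str957-biostq
BRICK="$ARB:/srv/arbiters/00/$VOL"
STEP1="gluster volume remove-brick $VOL replica 2 $BRICK force"
STEP2="gluster peer detach $ARB"
STEP3="gluster peer probe $ARB"
STEP4="gluster volume add-brick $VOL replica 3 arbiter 1 $BRICK"
# remove-brick must have completed, and the old arbiter brick directories
# must be emptied, before the detach/probe/add-brick steps.
printf '%s\n' "$STEP1" "$STEP2" "$STEP3" "$STEP4"
```

After the add-brick, self-heal repopulates the arbiter bricks (the multi-day, high-load phase Diego describes).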
[Gluster-users] Slow writes on replica+arbiter after upgrade to 7.8 (issue on github)
Hello,

Just a heads-up that I have now submitted an issue on GitHub with the required details, as I suspect this behaviour may be related to a bug introduced in version 7, since I did not have this problem with version 6:

https://github.com/gluster/glusterfs/issues/1764

Thank you in advance for your help.

Regards,
Mabi
[Gluster-users] How to find out what GlusterFS is doing
Hello,

I have a 3-node replica (including arbiter) GlusterFS 7.8 setup with 3 volumes, and the two data nodes (not the arbiter) are under high load because the glusterfsd brick process is taking all CPU resources (12 cores).

Checking these two servers with the iostat command shows that the disks are not very busy and are mostly doing write activity. On the FUSE clients there is not much activity either, so I was wondering how to find out why GlusterFS is currently generating such a high load on these two servers (the arbiter does not show any high load). There are no files currently healing either. This volume is the only one with quota enabled, if that might be a hint. So does anyone know how to see why GlusterFS is so busy on a specific volume?

Here is a sample "vmstat 60" of one of the nodes:

onadmin@gfs1b:~$ vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free    buff  cache   si   so    bi    bo    in     cs us sy id wa st
 9  2      0 22296776  32004 260284    0    0    33   301   153     39  2 60 36  2  0
13  0      0 22244540  32048 260456    0    0   343  2798 10898 367652  2 80 16  1  0
18  0      0 22215740  32056 260672    0    0   308  2524  9892 334537  2 83 14  1  0
18  0      0 22179348  32084 260828    0    0   169  2038  8703 250351  1 88 10  0  0

I already tried rebooting, but that did not help, and there is nothing special in the log files either.

Best regards,
Mabi
Re: [Gluster-users] How to find out what GlusterFS is doing
Below is the output of running "top -bHd d" on one of the nodes; maybe that can help to see what the glusterfsd process is doing?

  PID USER     PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
 4375 root     20   0 2856784 120492  8360 D 61.1  0.4 117:09.29 glfs_iotwr001
 4385 root     20   0 2856784 120492  8360 R 61.1  0.4 117:12.92 glfs_iotwr003
 4387 root     20   0 2856784 120492  8360 R 61.1  0.4 117:32.19 glfs_iotwr005
 4388 root     20   0 2856784 120492  8360 R 61.1  0.4 117:28.87 glfs_iotwr006
 4391 root     20   0 2856784 120492  8360 D 61.1  0.4 117:20.71 glfs_iotwr008
 4395 root     20   0 2856784 120492  8360 D 61.1  0.4 117:17.22 glfs_iotwr009
 4405 root     20   0 2856784 120492  8360 R 61.1  0.4 117:19.52 glfs_iotwr00d
 4406 root     20   0 2856784 120492  8360 R 61.1  0.4 117:29.51 glfs_iotwr00e
 4366 root     20   0 2856784 120492  8360 D 55.6  0.4 117:27.58 glfs_iotwr000
 4386 root     20   0 2856784 120492  8360 D 55.6  0.4 117:22.77 glfs_iotwr004
 4390 root     20   0 2856784 120492  8360 D 55.6  0.4 117:26.49 glfs_iotwr007
 4396 root     20   0 2856784 120492  8360 R 55.6  0.4 117:23.68 glfs_iotwr00a
 4376 root     20   0 2856784 120492  8360 D 50.0  0.4 117:36.17 glfs_iotwr002
 4397 root     20   0 2856784 120492  8360 D 50.0  0.4 117:11.09 glfs_iotwr00b
 4403 root     20   0 2856784 120492  8360 R 50.0  0.4 117:26.34 glfs_iotwr00c
 4408 root     20   0 2856784 120492  8360 D 50.0  0.4 117:27.47 glfs_iotwr00f
 9814 root     20   0 2043684  75208  8424 D 22.2  0.2  50:15.20 glfs_iotwr003
28131 root     20   0 2043684  75208  8424 R 22.2  0.2  50:07.46 glfs_iotwr004
 2208 root     20   0 2043684  75208  8424 R 22.2  0.2  49:32.70 glfs_iotwr008
 2372 root     20   0 2043684  75208  8424 R 22.2  0.2  49:52.60 glfs_iotwr009
 2375 root     20   0 2043684  75208  8424 D 22.2  0.2  49:54.08 glfs_iotwr00c
  767 root     39  19       0      0     0 R 16.7  0.0  67:50.83 dbuf_evict
 4132 onadmin  20   0   45292   4184  3176 R 16.7  0.0   0:00.04 top
28484 root     20   0 2043684  75208  8424 R 11.1  0.2  49:41.34 glfs_iotwr005
 2376 root     20   0 2043684  75208  8424 R 11.1  0.2  49:49.49 glfs_iotwr00d
 2719 root     20   0 2043684  75208  8424 R 11.1  0.2  49:58.61 glfs_iotwr00e
 4384 root     20   0 2856784 120492  8360 S  5.6  0.4   4:01.27 glfs_rpcrqhnd
 3842 root     20   0 2043684  75208  8424 S  5.6  0.2   0:30.12 glfs_epoll001
    1 root     20   0   57696   7340  5248 S  0.0  0.0   0:03.59 systemd
    2 root     20   0       0      0     0 S  0.0  0.0   0:09.57 kthreadd
    3 root     20   0       0      0     0 S  0.0  0.0   0:00.16 ksoftirqd/0
    5 root      0 -20       0      0     0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root     20   0       0      0     0 S  0.0  0.0   0:07.36 rcu_sched
    8 root     20   0       0      0     0 S  0.0  0.0   0:00.00 rcu_bh
    9 root     rt   0       0      0     0 S  0.0  0.0   0:00.03 migration/0
   10 root      0 -20       0      0     0 S  0.0  0.0   0:00.00 lru-add-drain
   11 root     rt   0       0      0     0 S  0.0  0.0   0:00.01 watchdog/0
   12 root     20   0       0      0     0 S  0.0  0.0   0:00.00 cpuhp/0
   13 root     20   0       0      0     0 S  0.0  0.0   0:00.00 cpuhp/1

Any clues anyone? The load is really high, around 20 now, on the two nodes...

‐‐‐ Original Message ‐‐‐ On Thursday, November 5, 2020 11:50 AM, mabi wrote:

> Hello,
>
> I have a 3 node replica including arbiter GlusterFS 7.8 server with 3 volumes
> and the two nodes (not arbiter) seem to have a high load due to the
> glusterfsd brick process taking all CPU resources (12 cores).
>
> Checking these two servers with iostat command shows that the disks are not
> so busy and that they are mostly doing writes activity. On the FUSE clients
> there is not so much activity so I was wondering how to find out or explain
> why GlusterFS is currently generating such a high load on these two servers
> (the arbiter does not show any high load). There are no files currently
> healing either. This volume is the only volume which has the quota enabled if
> this might be a hint. So does anyone know how to see why GlusterFS is so busy
> on a specific volume?
>
> Here is a sample "vmstat 60" of one of the nodes:
>
> onadmin@gfs1b:~$ vmstat 60
> procs ---memory--- ---swap-- ----io---- -system-- ----cpu----
>  r  b swpd     free    buff  cache si so  bi   bo    in     cs us sy id wa st
>  9  2    0 22296776  32004 260284  0  0  33  301   153     39  2 60 36  2  0
> 13  0    0 22244540  32048 260456  0  0 343 2798 10898 367652  2 80 16  1  0
> 18  0    0 22215740  32056 260672  0  0 308 2524  8892 334537  2 83 14  1  0
> 18  0    0 22179348  32084 260828  0  0 169 2038  8703 250351  1 88 10  0  0
>
> I already tried rebooting but that did not help and there is nothing special
> in the log files either.
Re: [Gluster-users] How to find out what GlusterFS is doing
‐‐‐ Original Message ‐‐‐ On Thursday, November 5, 2020 3:28 PM, Yaniv Kaul wrote:

> Waiting for IO, just like the rest of those in D state.
> You may have a slow storage subsystem. How many cores do you have, btw?
> Y.

Strange, because "iostat -xtcm 5" does not show the disks as 100% utilized; I have pasted a sample output below. Both nodes have 1 CPU, an Intel Xeon E5-2620 v3 @ 2.40GHz, which has 12 cores.

11/05/2020 03:37:25 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    0.00   84.81    0.03    0.00   14.22

Device: rrqm/s wrqm/s  r/s   w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.00   0.00 0.60 23.80   0.00  0.20    17.31     0.03  1.31   45.33    0.20  1.25  3.04
sdc       0.00   0.00 0.60 24.80   0.00  0.24    19.15     0.03  1.04   40.00    0.10  1.01  2.56
sdg       0.00   0.00 0.60 23.00   0.00  0.22    19.05     0.03  1.39   45.33    0.24  1.25  2.96
sdf       0.00   0.00 0.60 25.00   0.00  0.23    18.25     0.03  1.16   41.33    0.19  1.06  2.72
sdd       0.00   0.00 0.60 24.60   0.00  0.19    15.43     0.02  0.86   32.00    0.10  0.83  2.08
sdh       0.00   0.00 0.40 25.00   0.00  0.22    17.64     0.03  1.10   58.00    0.19  1.01  2.56
sdi       0.00   0.00 0.40 25.80   0.00  0.23    17.71     0.03  1.01   60.00    0.09  0.98  2.56
sdj       0.00   0.00 0.60 24.00   0.00  0.19    15.67     0.02  0.91   32.00    0.13  0.85  2.08
sde       0.00   0.00 0.60 26.60   0.00  0.20    15.12     0.03  1.00   36.00    0.21  0.91  2.48
sdk       0.00   0.00 0.60 25.20   0.00  0.20    16.12     0.02  0.78   29.33    0.10  0.74  1.92
sdl       0.00   0.00 0.60 25.00   0.00  0.22    17.56     0.02  0.94   37.33    0.06  0.94  2.40
sdb       0.00   0.00 0.60 15.40   0.00  0.21    27.80     0.03  2.15   42.67    0.57  1.95  3.12
[Gluster-users] Geo replication procedure for DR
Hello,

I was reading the geo-replication documentation here:

https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/

and I was wondering how it works for disaster recovery, when the primary cluster is down and the volume on the secondary site needs to be used. What is the procedure to make the secondary volume on the secondary site available for read/write?

And once the primary site is back online, how do you copy back or sync all data changes done on the secondary volume on the secondary site back to the primary volume on the primary site?

Best regards,
Mabi
[Gluster-users] How to find out data alignment for LVM thin volume brick
Hello,

I am preparing a brick as an LVM thin volume for a test slave node using this documentation:

https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/

but I am confused about the right "--dataalignment" option to use for pvcreate. The documentation mentions the following under point 1:

"Create a physical volume(PV) by using the pvcreate command. For example:

pvcreate --dataalignment 128K /dev/sdb

Here, /dev/sdb is a storage device. Use the correct dataalignment option based on your device.

Note: The device name and the alignment value will vary based on the device you are using."

As test disk for this brick I have an external 500GB USB SSD, a Samsung Portable SSD T7 (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/), but my question is: where do I find out which alignment value I need to use for this specific disk?

Best regards,
Mabi
Re: [Gluster-users] Geo replication procedure for DR
Dear Strahil,

Thank you for the detailed command. So when you want to switch all traffic to the DR site in case of disaster, one should first disable the read-only setting on the secondary volume on the slave site.

What happens when the master site is back online? What is the procedure there? I had the following question in my previous mail in this regard:

"And once the primary site is back online how do you copy back or sync all data changes done on the secondary volume on the secondary site back to the primary volume on the primary site?"

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:52 AM, Strahil Nikolov wrote:

> It's just a setting on the target volume:
>
> gluster volume set read-only OFF
>
> Best Regards,
> Strahil Nikolov
>
>> On Mon, Jun 5, 2023 at 22:30, mabi wrote:
>>
>> Hello,
>>
>> I was reading the geo replication documentation here:
>>
>> https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/
>>
>> and I was wondering how it works when in case of disaster recovery when the
>> primary cluster is down and the secondary site with the volume needs to
>> be used?
>>
>> What is the procedure here to make the secondary volume on the secondary
>> site available for read/write?
>>
>> And once the primary site is back online how do you copy back or sync all
>> data changes done on the secondary volume on the secondary site back to the
>> primary volume on the primary site?
>>
>> Best regards,
>> Mabi
Re: [Gluster-users] How to find out data alignment for LVM thin volume brick
Dear Strahil,

Thank you very much for pointing me to the Red Hat documentation. I wasn't aware of it and it is much more detailed; I will have to read it carefully. As I have a single disk (no RAID), based on that documentation I understand that I should use a data alignment value of 256K.

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:56 AM, Strahil Nikolov wrote:

> Have you checked this page:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/brick_configuration
> ?
>
> The alignment depends on the HW raid stripe unit size.
>
> Best Regards,
> Strahil Nikolov
>
>> On Tue, Jun 6, 2023 at 2:35, mabi wrote:
>>
>> Hello,
>>
>> I am preparing a brick as LVM thin volume for a test slave node using this
>> documentation:
>>
>> https://docs.gluster.org/en/main/Administrator-Guide/formatting-and-mounting-bricks/
>>
>> but I am confused regarding the right "--dataalignment" option to be used
>> for pvcreate. The documentation mentions the following under point 1:
>>
>> "Create a physical volume(PV) by using the pvcreate command. For example:
>>
>> pvcreate --dataalignment 128K /dev/sdb
>>
>> Here, /dev/sdb is a storage device. Use the correct dataalignment option
>> based on your device.
>>
>> Note: The device name and the alignment value will vary based on the device
>> you are using."
>>
>> As test disk for this brick I have an external USB 500GB SSD disk from
>> Samsung PSSD T7
>> (https://semiconductor.samsung.com/consumer-storage/portable-ssd/t7/) but my
>> question is where do I find the information on which alignment value I need
>> to use for this specific disk?
>>
>> Best regards,
>> Mabi
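The arithmetic behind that choice, per the Red Hat brick-configuration chapter referenced above: on hardware RAID, --dataalignment should equal the full stripe, i.e. stripe unit size multiplied by the number of data disks, while on JBOD or a single disk there is no stripe to match and the guide simply recommends 256K. A small sketch (the RAID numbers are illustrative, not from this thread):

```shell
# RAID case: align the PV to the full stripe width.
stripe_unit_kib=128   # example hardware RAID stripe unit size
data_disks=10         # e.g. RAID 6 over 12 disks leaves 10 data disks
full_stripe_kib=$((stripe_unit_kib * data_disks))
echo "RAID:        pvcreate --dataalignment ${full_stripe_kib}K /dev/sdX"
# JBOD / single disk (this thread's Samsung T7 case): no stripe to match.
echo "JBOD/single: pvcreate --dataalignment 256K /dev/sdX"
```

The same full-stripe figure also feeds the thin pool's --chunksize and the XFS stripe options later in that guide, so it is worth computing once and reusing.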