[Gluster-users] Gfid mismatch detected - but no split brain - how to solve?
Hi guys, I'm seeing "Gfid mismatch detected" in the logs, but no split-brain is indicated (4-way replica). Heal info output:

Brick swir-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 22
Number of entries in heal pending: 22
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick whale-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 22
Number of entries in heal pending: 22
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick rider-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick dzien:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 10
Number of entries in heal pending: 10
Number of entries in split-brain: 0
Number of entries possibly healing: 0

On swir-ring8:
...
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /lock_file>, 37b2456f-5216-4679-ac5c-4908b24f895a on USER-HOME-client-15 and ba8f87ed-9bf3-404e-8d67-2631923e1645 on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:47:49.034935] and [2020-05-29 21:47:49.079480]
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /t>, d7a4ed01-139b-4df3-8070-31bd620a6f15 on USER-HOME-client-15 and d794b6ba-2a1d-4043-bb31-b98b22692763 on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:47:49.126173] and [2020-05-29 21:47:49.155432]
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /Tables.docx>, 344febd8-c89c-4bf3-8ad8-6494c2189c43 on USER-HOME-client-15 and 48d5b12b-03f4-46bf-bed1-9f8f88815615 on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:47:49.194061] and [2020-05-29 21:47:49.239896]
The message "E [MSGID: 108008] [afr-self-heal-entry.c:257:afr_selfheal_detect_gfid_and_type_mismatch] 0-USER-HOME-replicate-0: Skipping conservative merge on the file." repeated 8 times between [2020-05-29 21:47:49.037812] and [2020-05-29 21:47:49.240423]
...

On whale-ring8:
...
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /pcs>, a83d0e5f-ef3a-40ab-be7b-784538d150be on USER-HOME-client-15 and 89af3d31-81fa-4242-b8f7-0f49fd5fe57b on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:45:46.152052] and [2020-05-29 21:45:46.422393]
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /history_database>, 81ebb0d5-264a-4eba-984a-e18673b43826 on USER-HOME-client-15 and 2498a303-8937-43c3-939e-5e1d786b07fa on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:45:46.167704] and [2020-05-29 21:45:46.437702]
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /client-state>, fe86c057-c74d-417f-9c2c-6e6eb9778851 on USER-HOME-client-15 and a66f2714-c2a0-4bdc-8786-ad5b93e0e988 on USER-HOME-client-13." repeated 2 times between [2020-05-29 21:45:46.144242] and [2020-05-29 21:45:46.442526]
The message "E [MSGID: 108008] [afr-self-heal-common.c:384:afr_gfid_split_brain_source] 0-USER-HOME-replicate-0: Gfid mismatch detected for /history_database.1>, 9826d8ad-fecc-4dd7-bc1f-87d0eff23d73 on USER-HOME-client-15 and 81ebb0d5-264a-4eba-984a-e18673b43826 on USER-HOME-client-13." repeated 3 times between [2020-05-29 21:45:46.162016] and [2020-05-29 21:45:46.476935]
...

On rider-ring8:
...
[2020-05-29 21:46:53.122929] E [MSGID: 114031] [client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk] 0-QEMU_VMs-client-3: remote operation failed. Path: (6f01098f-e8db-4f63-a661-86b4d02d937f) [Permission denied]
[2020-05-29 21:46:53.124148] E [MSGID: 114031] [client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk] 0-QEMU_VMs-client-4: remote operation failed. Path: (6f01098f-e8db-4f63-a661-86b4d02d937f) [Permission denied]
[2020-05-29 21:46:53.133566] I [MSGID: 108026] [afr-self-heal-entry.c:898:afr_selfheal_entry_do] 0-QEMU_VMs-replicate-0: performing entry selfheal on e0121f76-2452-44dc-b1a6-82b46cc9ec79
[2020-05-29 21:46:53.145991] E [MSGID: 114031] [client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk] 0-QEMU_VMs-client-3: remote operation failed. Path: (3f0239ac-e027-4a0c-b271-431e76ad97b1) [Permission denied]
[2020-05-29 21:46:53.147110] E [MSGID: 114031] [client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk] 0-QEMU_VMs-client-4: remote operation failed. Path: (3f0239ac-e027-4a0c-b271-431e76ad97b1) [Permission denied]
...

I'm 100% certain the most recent data is on rider-ring8. Could any expert advise?
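For anyone triaging something similar: the condition behind these messages is simply that the same directory entry carries a different gfid on different bricks, which blocks the conservative merge. A minimal sketch of that check (hypothetical brick names and shortened gfids, not the actual AFR code):

```python
def find_gfid_mismatches(brick_views):
    """brick_views: dict of brick -> {entry_name: gfid}.

    Returns entries whose gfid differs between bricks -- the same
    condition afr_selfheal_detect_gfid_and_type_mismatch complains
    about. An entry missing from a brick is merely heal-pending,
    not a mismatch, so it is not flagged here.
    """
    mismatches = {}
    names = set()
    for view in brick_views.values():
        names.update(view)
    for name in sorted(names):
        gfids = {brick: view[name]
                 for brick, view in brick_views.items() if name in view}
        if len(set(gfids.values())) > 1:
            mismatches[name] = gfids
    return mismatches
```

Depending on the gluster release, such entries can be resolved with the CLI (`gluster volume heal <VOL> split-brain source-brick <good-brick> <file>`) or by removing the bad copies and their `.glusterfs` gfid hard links on the other bricks and re-triggering heal; check the split-brain resolution documentation for your version before acting.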
[Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave
Hello,

We're having an issue with a geo-replication process that shows unusually high CPU use and logs "Entry not present on master. Fixing gfid mismatch in slave" errors. Can anyone help with this?

We have 3 GlusterFS replica nodes (which we'll call the master), which also push data to a remote server (the slave) using geo-replication. This has been running fine for a couple of months, but yesterday one of the master nodes started showing unusually high CPU use. It's this process:

root@cafs30:/var/log/glusterfs# ps aux | grep 32048
root 32048 68.7 0.6 1843140 845756 ? Rl 02:51 493:51 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1 --resource-remote nvfs30 --resource-remote-id 1e698ccd-aeec-4ec4-96fe-383da8fc3b78

Here's what is being logged in /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:

[2020-05-29 21:57:18.843524] I [master(worker /nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time stime=(1590789408, 0)
[2020-05-29 21:57:30.626172] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entryretry_count=1 entry=({u'uid': 108, u'gfid': u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.627893] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entryretry_count=1 entry=({u'uid': 108, u'gfid': u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.629532] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entryretry_count=1 entry=({u'uid': 108, u'gfid': u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7-directory.xml', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.659123] I [master(worker /nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
[2020-05-29 21:57:30.659343] I [master(worker /nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry original entries. count = 1
[2020-05-29 21:57:30.725810] I [master(worker /nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster: Sucessfully fixed all entry ops with gfid mismatch
[2020-05-29 21:57:31.747319] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken duration=0.7409 num_files=18job=1 return_code=0

We've verified that the files (like polycom_00a859f7.cfg) referred to in the errors do exist on the master nodes and on the slave.

We found this bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=1642865
However that fix went into 5.1, and we're running 5.12 on the master nodes and slave, so it should already be included. A couple of GlusterFS clients connected to the master nodes are running 5.13.

Would anyone have any suggestions? Thank you in advance.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
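For anyone who wants to list the affected paths without eyeballing the log: the `entry=(...)` payload in these lines is a Python tuple repr, so it can be parsed back directly. A rough sketch, assuming the log format shown above (the regex and function are ours, not part of gsyncd):

```python
import ast
import re

# The gsyncd lines above end with "entry=(<dict>, <errno>, <dict>)".
# ast.literal_eval can read that tuple back, giving us the entry path
# and the conflicting master/slave gfids.
LOG_RE = re.compile(r"entry=(\(.*\))\s*$")

def parse_gfid_mismatch(line):
    m = LOG_RE.search(line)
    if not m:
        return None  # not a gfid-mismatch entry line
    entry, _errno, detail = ast.literal_eval(m.group(1))
    return {
        "entry": entry["entry"],
        "master_gfid": entry["gfid"],
        "slave_gfid": detail["slave_gfid"],
    }
```

Running every mismatch line through this gives a list of paths whose gfids can then be compared on master and slave with `getfattr -n trusted.gfid`.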
Re: [Gluster-users] File system very slow
I am using nfs mount of gluster volume to get better performance

On Wed, May 27, 2020 at 10:02 AM wrote:

> - # gluster --version
> glusterfs 7.5
>
> - # gluster volume status atlassian
> Status of volume: atlassian
> Gluster process                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick node1:/data/atlassian/gluster    49152     0          Y       1791
> Brick node2:/data/atlassian/gluster    49152     0          Y       1773
> Self-heal Daemon on localhost          N/A       N/A        Y       1807
> Self-heal Daemon on node1.example.net  N/A       N/A        Y       1778
>
> Task Status of Volume atlassian
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> - # attached pre-du and during-du log from server
>
> - I do not have a remote client. When I tried to run these,
> 'gluster volume profile your-volume start' says already started, since I am
> on the server.
> # setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /mnt runs but
> no output in /tmp/io-stats-pre.txt
>
> - # gluster volume heal atlassian info
> Brick node1:/data/atlassian/gluster
> Status: Connected
> Number of entries: 0
>
> Brick node2:/data/atlassian/gluster
> Status: Connected
> Number of entries: 0
>
> Let me know if you need anything else. Appreciate your help
>
> On Wed, May 27, 2020 at 3:27 AM Karthik Subrahmanya wrote:
>
>> Hi,
>>
>> Please provide the following information to understand the setup and
>> debug this further:
>> - Which version of gluster you are using?
>> - 'gluster volume status atlassian' to confirm both bricks and shds are
>> up or not
>> - Complete output of 'gluster volume profile atlassian info' before
>> running 'du' and during 'du'. Redirect this output to separate files and
>> attach them here
>> - Get the client side profile as well by following
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
>> - 'gluster volume heal atlassian info' to check whether there are any
>> pending heals and client side heal is contributing to this
>>
>> Regards,
>> Karthik
>>
>> On Wed, May 27, 2020 at 1:06 AM wrote:
>>
>>> I had a parsing error. It is Volume Name: atlassian
>>>
>>> On Tue, May 26, 2020 at 3:12 PM wrote:

# gluster volume info

Volume Name: myvol
Type: Replicate
Volume ID: cbdef65c-79ea-496e-b777-b6a2981b29cf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node1:/data/foo/gluster
Brick2: node2:/data/foo/gluster
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
performance.stat-prefetch: on
network.inode-lru-limit: 16384
performance.md-cache-timeout: 1
performance.cache-invalidation: false
performance.cache-samba-metadata: false
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.io-thread-count: 16
performance.cache-refresh-timeout: 5
performance.write-behind-window-size: 5MB
performance.cache-size: 1GB
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

On Tue, May 26, 2020 at 3:06 PM Sunil Kumar Heggodu Gopala Acharya <shegg...@redhat.com> wrote:

> Hi,
>
> Please share the gluster volume information.
>
> # gluster vol info
>
> Regards,
> Sunil kumar Acharya
>
> On Wed, May 27, 2020 at 12:30 AM wrote:
>
>> I made the following changes for small file performance as suggested by
>> http://blog.gluster.org/gluster-tiering-and-small-file-performance/
>>
>> I am still seeing du -sh /data/shared taking 39 minutes.
>>
>> Any other tuning I can do? Most of my files are 15K. Here is a sample
>> of small files, with size and number of occurrences:
>>
>> FileSize  # of occurrences
>> 1.1K  1122
>> 1.1M  1040
>> 1.2K  1281
>> 1.2M  1357
>> 1.3K  1149
>> 1.3M  1098
>> 1.4K  1119
>> 1.5K  1189
>> 1.6K  1036
>> 1.7K  1169
>> 11K   2157
>> 12K   2398
>> 13K   2402
>> 14K   2406
>> 15K   2426
>> 16K   2386
>> 17K   1986
>> 18K   2037
>> 19K   1829
>> 2.0K  1027
>> 2.1K  1048
>> 2.4K  1013
>> 20K   1585
>> 21K   1713
>> 22K   1590
>> 23K   1371
>> 24K   1428
>> 25K   1444
>> 26K   1391
>> 27K   1217
>> 28K   1485
>> 29K   1282
>> 30K   1303
>> 31K   1275
>> 32K   1296
>> 33K   1058
>> 36K   1023
>> 37K   1107
>> 39K   1092
>> 41K   1034
>> 42K   1187
>> 46K   1030
>>
>> On Mon, May 25, 2020 at 5:30 PM
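As a rough sanity check on why `du` is slow with a histogram like the one above: almost all of the cost is per-file metadata round trips (one networked stat() per file on the replicated volume) rather than data. A small sketch to total it up (the helper names are ours, and the histogram passed in would be the table above):

```python
def parse_size(s):
    """Convert a human size like '15K' or '1.1M' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2}
    return int(float(s[:-1]) * units[s[-1]])

def summarize(histogram):
    """histogram: dict of size-string -> number of files of that size.

    Returns (total_files, total_bytes). total_files is the number of
    stat() round trips a recursive du must issue; total_bytes shows how
    little actual data that represents.
    """
    total_files = sum(histogram.values())
    total_bytes = sum(parse_size(sz) * n for sz, n in histogram.items())
    return total_files, total_bytes
```

Feeding in the full table makes it easy to compare file count (latency-bound) against total bytes (bandwidth-bound) when deciding which tuning knobs matter.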
Re: [Gluster-users] Failing to nfs mount gluster volume
I am running gluster 7.5 on CentOS 7, which does not have gnfs compiled in. I had to build it from source with ./configure --enable-gnfs --without-libtirpc [1], and then I could nfs mount the gluster volume.

[1] https://docs.gluster.org/en/latest/Developer-guide/Building-GlusterFS/

On Fri, May 29, 2020 at 12:42 PM wrote:

> Turned off nfs-server service and now I am getting a different error
> message
>
> [root@node1 ~]# mount -vv -t nfs -o vers=3,mountproto=tcp 192.168.1.121:/gv0 /nfs_mount/
> mount.nfs: timeout set for Fri May 29 16:43:30 2020
> mount.nfs: trying text-based options 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: trying text-based options 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: trying text-based options 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: requested NFS version or transport protocol is not supported
>
> [root@node1 ~]# gluster volume info gv0
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.121:/data/brick1/gv0
> Brick2: 192.168.1.122:/data/brick2/gv0
> Options Reconfigured:
> nfs.register-with-portmap: on
> performance.client-io-threads: off
> nfs.disable: off
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.volume-access: read-write
> nfs.mount-udp: off
>
> On Thu, May 28, 2020 at 6:48 PM wrote:
>
>> [root@node1 ~]# uname -a
>> Linux node1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root@node1 ~]# gluster --version
>> glusterfs 7.5
>> ...
>>
>> [root@node1 ~]# gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.1.121:/data/brick1/gv0
>> Brick2: 192.168.1.122:/data/brick2/gv0
>> Options Reconfigured:
>> transport.address-family: inet
>> storage.fips-mode-rchecksum: on
>> nfs.disable: off
>> performance.client-io-threads: off
>>
>> [root@node1 ~]# mount -v -t nfs -o vers=3 192.168.1.121:/gv0 /nfs_mount
>> mount.nfs: timeout set for Thu May 28 22:48:39 2020
>> mount.nfs: trying text-based options 'vers=3,addr=192.168.1.121'
>> mount.nfs: prog 13, trying vers=3, prot=6
>> mount.nfs: trying 192.168.1.121 prog 13 vers 3 prot TCP port 2049
>> mount.nfs: prog 15, trying vers=3, prot=17
>> mount.nfs: trying 192.168.1.121 prog 15 vers 3 prot UDP port 20048
>> mount.nfs: mount(2): No such file or directory
>> mount.nfs: mounting 192.168.1.121:/gv0 failed, reason given by server:
>> No such file or directory
>>
>> --
>> Asif Iqbal
>> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>
> --
> Asif Iqbal
> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
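Reading the two failure modes in this thread side by side: "RPC: Program not registered" means nothing has registered the NFS/MOUNT programs with rpcbind (gnfs is not running, as the build note above explains), while the earlier "No such file or directory" came from a server that answered the mount call but had no such export (typically the kernel NFS server responding instead of gnfs). A toy triage of the two messages, just to summarize the thread (the mapping is our own, not exhaustive):

```python
def diagnose(mount_nfs_output):
    """Classify verbose mount.nfs output by the failure modes seen here."""
    if "RPC: Program not registered" in mount_nfs_output:
        # Neither prog 13 (NFS) nor prog 15 (MOUNT) is in the portmap:
        # no NFS server, gnfs included, is registered with rpcbind.
        return "no NFS server registered with portmapper (gnfs not running?)"
    if "reason given by server: No such file or directory" in mount_nfs_output:
        # A mountd answered but refused the export path: likely the
        # kernel nfs-server is answering instead of gnfs.
        return "server answered but export unknown (kernel NFS in the way?)"
    return "unknown failure"
```

`rpcinfo -p <server>` is the usual way to confirm which case applies before and after stopping the kernel nfs-server.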
[Gluster-users] GlusterFS saturates server disk IO due to write brick temporary file to ".glusterfs" directory
Hi,

I have GridFTP + a network speedup solution + GlusterFS as the file system component in a disk-to-disk data transfer scenario. For glusterfs, I start by creating bricks inside the /dev/sda1 filesystem.

During a file transfer (12 GB), glusterfs seems to write a temporary file into the .glusterfs directory at a very high rate, and the disk IO of the server host is essentially at 100%. This makes file transfer performance unpredictable, as the transfer tool (i.e., GridFTP) sometimes waits and sends nothing because the glusterfs server's disk is busy.

As a test, I changed the brick location to tmpfs (in memory), and the disk I/O was no longer saturated. This makes sense, because the file writing then happens in memory, but it is not a disk-to-disk transfer anymore, so it can't be an effective workaround in my case.

I wonder if there is any way to stop glusterfs from writing temporary files into the .glusterfs directory during the transfer? Based on my reading so far, I understand that this is a glusterfs design decision for recovery purposes; it's just that in my case the write rate is too high, saturates the disk IO, and makes my disk-to-disk performance unstable.

My glusterfs version is: glusterfs 3.7.6 built on Dec 25 2015 20:50:46. I am glad to post more volume or other setup details if necessary.

Thanks,
Qing
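For context on what lives in `.glusterfs` (and why it can't simply be turned off): for every file on a brick, glusterfs keeps a hard link named after the file's gfid under `.glusterfs/<first two hex chars>/<next two>/<full gfid>`. A hard link shares the inode with the user-visible file, so data writes that appear to hit `.glusterfs` are the file's own data, not an extra temporary copy. A small sketch of the path scheme (brick path and gfid below are made up):

```python
import os

def gfid_backlink(brick_root, gfid):
    """Path of the gfid hard link glusterfs keeps for a file on a brick.

    Layout: <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>.
    """
    return os.path.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)
```

Whether the IO pattern in 3.7.6 can be tuned is a separate question; this only explains why deleting or disabling `.glusterfs` itself is not an option.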
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
Correct, every brick is a separate xfs-formatted disk attached to the machine. There are two disks per machine; the ones mounted in /data2 are the newer ones.

Thanks for the reassurance, that means we can take as long as necessary to diagnose this. Let me know if there's more data I can provide. lsblk and stat -f outputs follow:

$ ssh imagegluster1 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   9.3G  0 disk
└─sda1   8:1    0   9.3G  0 part /
sdb      8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc      8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster2 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   9.3G  0 disk
└─sda1   8:1    0   9.3G  0 part /
sdb      8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc      8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster3 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   9.3G  0 disk
└─sda1   8:1    0   9.3G  0 part /
sdb      8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc      8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster1 "stat -f /data; stat -f /data2"
  File: "/data"
    ID: 821 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307548 Free: 111493566 Available: 111493566
Inodes: Total: 468843968 Free: 459695286
  File: "/data2"
    ID: 811 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307553 Free: 10486 Available: 10486
Inodes: Total: 468844032 Free: 459769261

$ ssh imagegluster2 "stat -f /data; stat -f /data2"
  File: "/data"
    ID: 821 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307548 Free: 111489680 Available: 111489680
Inodes: Total: 468843968 Free: 459695437
  File: "/data2"
    ID: 811 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307553 Free: 10492 Available: 10492
Inodes: Total: 468844032 Free: 459769261

$ ssh imagegluster3 "stat -f /data; stat -f /data2"
  File: "/data"
    ID: 821 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307548 Free: 111495441 Available: 111495441
Inodes: Total: 468843968 Free: 459695437
  File: "/data2"
    ID: 811 Namelen: 255 Type: xfs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 234307553 Free: 10505 Available: 10505
Inodes: Total: 468844032 Free: 459769261

On Fri, May 29, 2020 at 1:10 PM Sanju Rakonde wrote:
>
> Hi Petr,
>
> it's absolutely safe to use this volume. You will not see any problems even
> if the actual used size is greater than the reported total size of the
> volume, and it is safe to upgrade as well.
>
> Can you please share the output of the following:
> 1. lsblk output from all the 3 nodes in the cluster
> 2. stat -f for all the bricks
>
> I hope all the bricks are having a separate filesystem, and it's not shared
> between any two bricks. Am I correct?
>
> On Fri, May 29, 2020 at 4:25 PM Petr Certik wrote:
>>
>> Thanks!
>>
>> One more question -- I don't really mind having the wrong size
>> reported by df, but I'm worried whether it is safe to use the volume.
>> Will it be okay if I write to it? For example, once the actual used
>> size is greater than the reported total size of the volume, should I
>> expect problems? And is it safe to upgrade glusterfs when the volume
>> is in this state?
>>
>> Cheers,
>> Petr
>>
>> On Fri, May 29, 2020 at 11:37 AM Sanju Rakonde wrote:
>> >
>> > Nope, for now. I will update you if we figure out any other workaround.
>> >
>> > Thanks for your help!
>> >
>> > On Fri, May 29, 2020 at 2:50 PM Petr Certik wrote:
>> >>
>> >> I'm afraid I don't have the resources to try and reproduce from the
>> >> beginning. Is there anything else I can do to get you more
>> >> information?
>> >>
>> >> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde wrote:
>> >> >
>> >> > The issue is not with glusterd restart. We need to reproduce from
>> >> > beginning and add-bricks to check df -h values.
>> >> >
>> >> > I suggest not to try on the production environment. if you have any
>> >> > other machines, please let me know.
>> >> >
>> >> > On Fri, May 29, 2020 at 1:37 PM Petr Certik wrote:
>> >> >>
>> >> >> If you mean the issue during node restart, then yes, I think I could
>> >> >> reproduce that with a custom build. It's a production system, though,
>> >> >> so I'll need to be extremely careful.
>> >> >>
>> >> >> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
>> >> >> custom glusterd binary based off of that version?
>> >> >>
>> >> >> Cheers,
>> >> >> Petr
>> >> >>
>> >> >> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde wrote:
>> >> >> >
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
Hi Petr, it's absolutely safe to use this volume. you will not see any problems even if the actual used size is greater than the reported total size of the volume and it is safe to upgrade as well. Can you please share the output of the following: l1. sblk output from all the 3 nodes in the cluster 2. stat -f for all the bricks I hope all the bricks are having a separate filesystem, and it's not shared between any two bricks. Am I correct? On Fri, May 29, 2020 at 4:25 PM Petr Certik wrote: > Thanks! > > One more question -- I don't really mind having the wrong size > reported by df, but I'm worried whether it is safe to use the volume. > Will it be okay if I write to it? For example, once the actual used > size is greater than the reported total size of the volume, should I > expect problems? And is it safe to upgrade glusterfs when the volume > is in this state? > > Cheers, > Petr > > On Fri, May 29, 2020 at 11:37 AM Sanju Rakonde > wrote: > > > > Nope, for now. I will update you if we figure out any other workaround. > > > > Thanks for your help! > > > > On Fri, May 29, 2020 at 2:50 PM Petr Certik wrote: > >> > >> I'm afraid I don't have the resources to try and reproduce from the > >> beginning. Is there anything else I can do to get you more > >> information? > >> > >> > >> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde > wrote: > >> > > >> > The issue is not with glusterd restart. We need to reproduce from > beginning and add-bricks to check df -h values. > >> > > >> > I suggest not to try on the production environment. if you have any > other machines, please let me know. > >> > > >> > On Fri, May 29, 2020 at 1:37 PM Petr Certik wrote: > >> >> > >> >> If you mean the issue during node restart, then yes, I think I could > >> >> reproduce that with a custom build. It's a production system, though, > >> >> so I'll need to be extremely careful. 
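The check behind Sanju's request can be scripted: every brick should sit on its own filesystem, i.e. no two brick paths may report the same filesystem ID. A minimal sketch, assuming GNU coreutils `stat`; the demo paths at the bottom are placeholders for real brick mount points:

```shell
#!/bin/sh
# Sketch: report any paths that share a filesystem. Each brick reporting a
# distinct fsid is the healthy state; shared fsids mean two bricks live on
# one filesystem. GNU stat is assumed (%i = filesystem ID in hex).
check_bricks() {
  for p in "$@"; do
    printf '%s %s\n' "$(stat -f -c %i "$p")" "$p"
  done | awk '{ paths[$1] = paths[$1] " " $2; n[$1]++ }
              END { for (id in n) if (n[id] > 1)
                      print "bricks sharing fsid " id ":" paths[id] }'
}
# No output means every path is on a distinct filesystem:
check_bricks / /tmp
```

Run with the actual brick directories on each node in place of `/ /tmp`.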
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
Thanks!

One more question -- I don't really mind having the wrong size reported by df, but I'm worried whether it is safe to use the volume. Will it be okay if I write to it? For example, once the actual used size is greater than the reported total size of the volume, should I expect problems? And is it safe to upgrade glusterfs when the volume is in this state?

Cheers,
Petr
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
Nope, for now. I will update you if we figure out any other workaround.

Thanks for your help!
Re: [Gluster-users] The dht-layout interval is missing
On Fri, May 29, 2020 at 1:28 PM jifeng-call <17607319...@163.com> wrote:
> Hi All,
> I have 6 servers that form a glusterfs 2x3 distributed replication volume,
> the details are as follows:
>
> [root@node1 ~]# gluster volume info
> Volume Name: ksvd_vol
> Type: Distributed-Replicate
> Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: node1:/export/kylin/ksvd
> Brick2: node2:/export/kylin/ksvd
> Brick3: node3:/export/kylin/ksvd
> Brick4: node4:/export/kylin/ksvd
> Brick5: node5:/export/kylin/ksvd
> Brick6: node6:/export/kylin/ksvd
> Options Reconfigured:
> diagnostics.client-log-level: DEBUG
> diagnostics.brick-log-level: DEBUG
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> cluster.quorum-type: auto
>
> All 6 servers are restarted at once; during startup each server
> automatically mounts the ksvd_vol volume to the /home/kylin-ksvd directory:
>
> [root@node1 ~]# df -h | grep ksvd_vol
> node1:/ksvd_vol  4.5T  362G  4.1T  9%  /home/kylin-ksvd
>
> Creating a file in the glusterfs mount directory of node3 fails; the log
> shows the errors below (files created from the other servers are normal):

Possibly the layout setxattr failed, which should not have happened given it is a replicated volume. You can check the logs on the client side and the brick side for the parent directory for more information. Post remount, a fresh lookup is triggered, which heals the previously missing layout, and the create then goes through.

> [2020-05-29 05:55:03.065656] E [MSGID: 109011] [dht-common.c:8683:dht_create] 0-ksvd_vol-dht: no subvolume in layout for path=/.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512
> [2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk] 0-glusterfs-fuse: 2454790: /.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)
> [2020-05-29 05:55:03.680303] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> [2020-05-29 05:55:04.687456] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
> [2020-05-29 05:55:04.688612] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> [2020-05-29 05:55:05.694446] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> [2020-05-29 05:55:05.830555] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
> [2020-05-29 05:55:06.700423] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> [2020-05-29 05:55:07.706536] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> [2020-05-29 05:55:07.833049] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
> [2020-05-29 05:55:08.712128] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
> The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2 times between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]
>
> The pathinfo information in the /home/kylin-ksvd directory is displayed
> as follows:
>
> [root@node3 kylin-ksvd]# getfattr -d -m . -n trusted.glusterfs.pathinfo /home/kylin-ksvd/
> getfattr: Removing leading '/' from absolute path names
> # file: home/kylin-ksvd/
> trusted.glusterfs.pathinfo="(( ( ) ( )) (ksvd_vol-dht-layout (ksvd_vol-replicate-0 0 0) (ksvd_vol-replicate-1 3539976838 4294967295)))"
>
> It can be seen from the above information that ksvd_vol-dht-layout is
> missing the interval 0 to 3539976837
>
> After umount, remounting returned to normal ... What is the reason?
>
> Best regards
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
I'm afraid I don't have the resources to try and reproduce from the beginning. Is there anything else I can do to get you more information?
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
The issue is not with glusterd restart. We need to reproduce from the beginning and add bricks to check the df -h values.

I suggest not trying this on the production environment. If you have any other machines, please let me know.
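The May 27 workaround from this thread (restart glusterd on all nodes, then issue any volume set so the volfiles are rewritten) can be staged as a dry run before touching a production cluster. A hedged sketch; the volume name gv0 comes from this thread, and note the original message elided the volume name from the set command:

```shell
#!/bin/sh
# Sketch of the workaround posted in this thread. DRY_RUN=1 (the default)
# only prints the commands so they can be reviewed first; set DRY_RUN=0 to
# actually run them. VOLNAME=gv0 is this thread's volume, an assumption
# for any other cluster.
VOLNAME="${VOLNAME:-gv0}"
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}
# 1. Restart the glusterd service (repeat on every node):
run systemctl restart glusterd
# 2. Any volume set rewrites the volfiles, recomputing shared-brick-count:
run gluster volume set "$VOLNAME" min-free-disk 11%
```

With DRY_RUN=1 this only echoes the two commands, which matches how carefully Petr wants to treat the production system.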
[Gluster-users] The dht-layout interval is missing
Hi All,

I have 6 servers that form a glusterfs 2x3 distributed replication volume, the details are as follows:

[root@node1 ~]# gluster volume info
Volume Name: abcd_vol
Type: Distributed-Replicate
Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/export/test/abcd
Brick2: node2:/export/test/abcd
Brick3: node3:/export/test/abcd
Brick4: node4:/export/test/abcd
Brick5: node5:/export/test/abcd
Brick6: node6:/export/test/abcd
Options Reconfigured:
diagnostics.client-log-level: DEBUG
diagnostics.brick-log-level: DEBUG
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto

All 6 servers are restarted at once; during startup each server automatically mounts the abcd_vol volume to the /home/test-abcd directory:

[root@node1 ~]# df -h | grep abcd_vol
node1:/abcd_vol  4.5T  362G  4.1T  9%  /home/test-abcd

Creating a file in the glusterfs mount directory of node3 fails; the log shows the errors below (files created from the other servers are normal):

[2020-05-29 05:55:03.065656] E [MSGID: 109011] [dht-common.c:8683:dht_create] 0-abcd_vol-dht: no subvolume in layout for path=/.abcd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512
[2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk] 0-glusterfs-fuse: 2454790: /.abcd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)
[2020-05-29 05:55:03.680303] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:04.687456] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:04.688612] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:05.694446] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:05.830555] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:06.700423] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:07.706536] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:07.833049] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:08.712128] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2 times between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]

The pathinfo information in the /home/test-abcd directory is displayed as follows:

[root@node3 test-abcd]# getfattr -d -m . -n trusted.glusterfs.pathinfo /home/test-abcd/
getfattr: Removing leading '/' from absolute path names
# file: home/test-abcd/
trusted.glusterfs.pathinfo="(( ( ) ( )) (abcd_vol-dht-layout (abcd_vol-replicate-0 0 0) (abcd_vol-replicate-1 3539976838 4294967295)))"

It can be seen from the above information that abcd_vol-dht-layout is missing the interval 0 to 3539976837.

After umount, remounting returned to normal ... What is the reason?

Best regards
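The gap the poster identified can be checked mechanically: a healthy DHT layout partitions the full 32-bit hash space 0..4294967295 across the subvolumes with no holes. A small sketch that feeds in the two ranges from the pathinfo output above, treating replicate-0's "0 0" as an unassigned range, as the post does:

```shell
#!/bin/sh
# Sketch: verify that a set of dht layout ranges covers the whole 32-bit
# hash space. Input lines are "name start end"; the sample values below are
# the two ranges from the pathinfo output above.
check_layout() {
  sort -k2,2n | awk '
    $2 == 0 && $3 == 0 { print "unassigned range on " $1; next }
    $2 > expected      { print "gap: " expected " to " $2 - 1 }
    $3 + 1 > expected  { expected = $3 + 1 }
    END { if (expected < 4294967296)
            print "gap: " expected " to 4294967295" }'
}
printf '%s\n' \
  'abcd_vol-replicate-0 0 0' \
  'abcd_vol-replicate-1 3539976838 4294967295' | check_layout
```

Run on the values above, this reports the same missing interval 0 to 3539976837 that the poster derived by hand; hash values such as 1738969696 and 491618322 from the warnings fall inside that hole, which is why dht_layout_search finds no subvolume for them.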
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
If you mean the issue during node restart, then yes, I think I could
reproduce that with a custom build. It's a production system, though, so
I'll need to be extremely careful. We're using Debian glusterfs-server
7.3-1 amd64; can you provide a custom glusterd binary based on that
version?

Cheers,
Petr

On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde wrote:
>
> Surprising! Will you be able to reproduce the issue and share the logs
> if I provide a custom build with more logs?
>
> On Thu, May 28, 2020 at 1:35 PM Petr Certik wrote:
>>
>> Thanks for your help! Much appreciated.
>>
>> The fsid is the same for all bricks:
>>
>> imagegluster1:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>>
>> imagegluster2:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>>
>> imagegluster3:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>>
>> I already tried restarting the glusterd nodes, with no effect, but
>> that was before the client version upgrades.
>>
>> Running the "volume set" command did not seem to work either; the
>> shared-brick-counts are still the same (2).
>>
>> However, when restarting a node, I do get an error and a few warnings
>> in the log: https://pastebin.com/tqq1FCwZ
>>
>> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde wrote:
>> >
>> > The shared-brick-count value indicates the number of bricks sharing
>> > a filesystem. In your case, it should be one, as all the bricks are
>> > on different mount points. Can you please share the values of
>> > brick-fsid?
>> >
>> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
>> >
>> > I tried reproducing this issue in Fedora VMs but couldn't hit it.
>> > We are seeing this issue on and off but are unable to reproduce it
>> > in-house. If you see any error messages in glusterd.log, please
>> > share that log too.
>> >
>> > Workaround to get out of this situation:
>> > 1. Restart the glusterd service on all nodes:
>> > # systemctl restart glusterd
>> >
>> > 2. Run a volume set command to update the vol file:
>> > # gluster v set min-free-disk 11%
>> >
>> > On Wed, May 27, 2020 at 5:24 PM Petr Certik wrote:
>> >>
>> >> As far as I remember, there was no version update on the server.
>> >> It was definitely installed as version 7.
>> >>
>> >> Shared bricks:
>> >>
>> >> Server 1:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:    option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:    option shared-brick-count 0
>> >>
>> >> Server 2:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:    option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:    option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:    option shared-brick-count 0
>> >>
>> >> Server 3:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:
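As background on the shared-brick-count value discussed above: it is
intended to count how many bricks of a volume live on the same local
filesystem, so bricks on separate mount points should each report 1.
The expected value can be derived by grouping brick paths by their
filesystem device ID. A minimal Python sketch (my own illustration, not
Gluster code; the paths in the usage comment are hypothetical):

```python
import os
from collections import Counter

def expected_shared_brick_counts(brick_paths):
    """Return, for each brick path, how many of the given bricks live
    on the same local filesystem (same st_dev).

    Bricks on separate mount points should each come out as 1; a value
    of 2 for bricks on different mounts would indicate the stale
    shared-brick-count problem discussed in this thread.
    """
    dev_ids = {path: os.stat(path).st_dev for path in brick_paths}
    per_dev = Counter(dev_ids.values())
    return {path: per_dev[dev] for path, dev in dev_ids.items()}

# Hypothetical usage on a brick host (paths are illustrative):
# expected_shared_brick_counts(["/data/brick", "/data2/brick"])
```

Comparing that result against the `shared-brick-count` values in the
.vol files would show whether glusterd's view matches the actual mount
layout.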
[Gluster-users] The dht-layout interval is missing
Hi All,

I have 6 servers that form a GlusterFS 2 x 3 distributed-replicated
volume; the details are as follows:

[root@node1 ~]# gluster volume info

Volume Name: ksvd_vol
Type: Distributed-Replicate
Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/export/kylin/ksvd
Brick2: node2:/export/kylin/ksvd
Brick3: node3:/export/kylin/ksvd
Brick4: node4:/export/kylin/ksvd
Brick5: node5:/export/kylin/ksvd
Brick6: node6:/export/kylin/ksvd
Options Reconfigured:
diagnostics.client-log-level: DEBUG
diagnostics.brick-log-level: DEBUG
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto

All 6 servers were restarted at the same time; during startup each
server automatically mounts the ksvd_vol volume on the /home/kylin-ksvd
directory:

[root@node1 ~]# df -h | grep ksvd_vol
node1:/ksvd_vol  4.5T  362G  4.1T   9%  /home/kylin-ksvd

Creating a file in the GlusterFS mount directory on node3 fails; the
log shows the errors below (files created from the other servers are
fine):

[2020-05-29 05:55:03.065656] E [MSGID: 109011] [dht-common.c:8683:dht_create] 0-ksvd_vol-dht: no subvolume in layout for path=/.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512
[2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk] 0-glusterfs-fuse: 2454790: /.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)
[2020-05-29 05:55:03.680303] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:04.687456] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:04.688612] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:05.694446] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:05.830555] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:06.700423] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:07.706536] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
[2020-05-29 05:55:07.833049] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 491618322
[2020-05-29 05:55:08.712128] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2 times between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]

The pathinfo information for the /home/kylin-ksvd directory is as
follows:

[root@node3 kylin-ksvd]# getfattr -d -m . -n trusted.glusterfs.pathinfo /home/kylin-ksvd/
getfattr: Removing leading '/' from absolute path names
# file: home/kylin-ksvd/
trusted.glusterfs.pathinfo="(( ( ) ( )) (ksvd_vol-dht-layout (ksvd_vol-replicate-0 0 0) (ksvd_vol-replicate-1 3539976838 4294967295)))"

As can be seen from the above, the ksvd_vol-dht layout is missing the
interval 0 to 3539976837. After an umount, remounting returned things
to normal.

What is the reason?

Best regards

Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
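The unassigned interval reported above can be computed mechanically
from the pathinfo string. Below is a minimal Python sketch (not part of
the original thread; the parsing regex and the treatment of a `(0 0)`
range as "no range assigned" are my assumptions) that extracts each
subvolume's hash range and reports the holes in the 32-bit DHT hash
ring:

```python
import re

RING_MAX = 0xFFFFFFFF  # DHT hashes each file name into a 32-bit ring

def find_layout_gaps(pathinfo):
    """Pull (subvol, start, end) triples out of a pathinfo layout
    string and return the intervals of the hash ring that no subvolume
    covers. A (0 0) range is taken to mean the subvolume was assigned
    no range at all."""
    ranges = []
    for _name, start, end in re.findall(r"\(([\w-]+) (\d+) (\d+)\)", pathinfo):
        start, end = int(start), int(end)
        if (start, end) != (0, 0):
            ranges.append((start, end))
    gaps, cursor = [], 0
    for start, end in sorted(ranges):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= RING_MAX:
        gaps.append((cursor, RING_MAX))
    return gaps

info = ('(ksvd_vol-dht-layout (ksvd_vol-replicate-0 0 0) '
        '(ksvd_vol-replicate-1 3539976838 4294967295))')
print(find_layout_gaps(info))  # [(0, 3539976837)]
```

Run against the pathinfo from the post, this reports the single gap
(0, 3539976837). Both failing hashes from the log, 491618322 and
1738969696, fall inside that hole, which is consistent with the "no
subvolume for hash" warnings and the create failures.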
Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume
Surprising! Will you be able to reproduce the issue and share the logs
if I provide a custom build with more logs?

On Thu, May 28, 2020 at 1:35 PM Petr Certik wrote:
> Thanks for your help! Much appreciated.
>
> The fsid is the same for all bricks:
>
> imagegluster1:
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>
> imagegluster2:
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>
> imagegluster3:
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>
> I already tried restarting the glusterd nodes, with no effect, but
> that was before the client version upgrades.
>
> Running the "volume set" command did not seem to work either; the
> shared-brick-counts are still the same (2).
>
> However, when restarting a node, I do get an error and a few warnings
> in the log: https://pastebin.com/tqq1FCwZ
>
> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde wrote:
> >
> > The shared-brick-count value indicates the number of bricks sharing
> > a filesystem. In your case, it should be one, as all the bricks are
> > on different mount points. Can you please share the values of
> > brick-fsid?
> >
> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
> >
> > I tried reproducing this issue in Fedora VMs but couldn't hit it.
> > We are seeing this issue on and off but are unable to reproduce it
> > in-house. If you see any error messages in glusterd.log, please
> > share that log too.
> >
> > Workaround to get out of this situation:
> > 1. Restart the glusterd service on all nodes:
> > # systemctl restart glusterd
> >
> > 2. Run a volume set command to update the vol file:
> > # gluster v set min-free-disk 11%
> >
> > On Wed, May 27, 2020 at 5:24 PM Petr Certik wrote:
> >>
> >> As far as I remember, there was no version update on the server.
> >> It was definitely installed as version 7.
> >>
> >> Shared bricks:
> >>
> >> Server 1:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:    option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:    option shared-brick-count 0
> >>
> >> Server 2:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:    option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:    option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:    option shared-brick-count 0
> >>
> >> Server 3:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:    option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:    option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:    option shared-brick-count 2
> >>
> >> On Wed, May 27, 2020 at 1:36 PM