[Gluster-users] Gfid mismatch detected - but no split brain - how to solve?

2020-05-29 Thread lejeczek
hi Guys

I'm seeing "Gfid mismatch detected" in the logs but no split
brain indicated (4-way replica)

Brick
swir-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 22
Number of entries in heal pending: 22
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick
whale-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 22
Number of entries in heal pending: 22
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick
rider-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick dzien:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME
Status: Connected
Total Number of entries: 10
Number of entries in heal pending: 10
Number of entries in split-brain: 0
Number of entries possibly healing: 0
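
The summary above looks like the output of "gluster volume heal <VOL> info summary". To see which paths or GFIDs are actually stuck, the plain heal info variants can be listed as well; a minimal sketch, assuming the volume is named USER-HOME as the log messages below suggest:

gluster volume heal USER-HOME info summary
# list the individual entries pending heal on each brick
gluster volume heal USER-HOME info
# split-brain view only (expected to be empty, matching the counters above)
gluster volume heal USER-HOME info split-brain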

On swir-ring8:
...
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/lock_file>,
37b2456f-5216-4679-ac5c-4908b24f895a on USER-HOME-client-15
and ba8f87ed-9bf3-404e-8d67-2631923e1645 on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:47:49.034935] and [2020-05-29 21:47:49.079480]
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/t>,
d7a4ed01-139b-4df3-8070-31bd620a6f15 on USER-HOME-client-15
and d794b6ba-2a1d-4043-bb31-b98b22692763 on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:47:49.126173] and [2020-05-29 21:47:49.155432]
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/Tables.docx>,
344febd8-c89c-4bf3-8ad8-6494c2189c43 on USER-HOME-client-15
and 48d5b12b-03f4-46bf-bed1-9f8f88815615 on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:47:49.194061] and [2020-05-29 21:47:49.239896]
The message "E [MSGID: 108008]
[afr-self-heal-entry.c:257:afr_selfheal_detect_gfid_and_type_mismatch]
0-USER-HOME-replicate-0: Skipping conservative merge on the
file." repeated 8 times between [2020-05-29 21:47:49.037812]
and [2020-05-29 21:47:49.240423]
...

On whale-ring8:
...
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/pcs>,
a83d0e5f-ef3a-40ab-be7b-784538d150be on USER-HOME-client-15
and 89af3d31-81fa-4242-b8f7-0f49fd5fe57b on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:45:46.152052] and [2020-05-29 21:45:46.422393]
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/history_database>,
81ebb0d5-264a-4eba-984a-e18673b43826 on USER-HOME-client-15
and 2498a303-8937-43c3-939e-5e1d786b07fa on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:45:46.167704] and [2020-05-29 21:45:46.437702]
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/client-state>,
fe86c057-c74d-417f-9c2c-6e6eb9778851 on USER-HOME-client-15
and a66f2714-c2a0-4bdc-8786-ad5b93e0e988 on
USER-HOME-client-13." repeated 2 times between [2020-05-29
21:45:46.144242] and [2020-05-29 21:45:46.442526]
The message "E [MSGID: 108008]
[afr-self-heal-common.c:384:afr_gfid_split_brain_source]
0-USER-HOME-replicate-0: Gfid mismatch detected for
/history_database.1>,
9826d8ad-fecc-4dd7-bc1f-87d0eff23d73 on USER-HOME-client-15
and 81ebb0d5-264a-4eba-984a-e18673b43826 on
USER-HOME-client-13." repeated 3 times between [2020-05-29
21:45:46.162016] and [2020-05-29 21:45:46.476935]
...

On rider-ring8:
...
[2020-05-29 21:46:53.122929] E [MSGID: 114031]
[client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk]
0-QEMU_VMs-client-3: remote operation failed. Path:

(6f01098f-e8db-4f63-a661-86b4d02d937f) [Permission denied]
[2020-05-29 21:46:53.124148] E [MSGID: 114031]
[client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk]
0-QEMU_VMs-client-4: remote operation failed. Path:

(6f01098f-e8db-4f63-a661-86b4d02d937f) [Permission denied]
[2020-05-29 21:46:53.133566] I [MSGID: 108026]
[afr-self-heal-entry.c:898:afr_selfheal_entry_do]
0-QEMU_VMs-replicate-0: performing entry selfheal on
e0121f76-2452-44dc-b1a6-82b46cc9ec79
[2020-05-29 21:46:53.145991] E [MSGID: 114031]
[client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk]
0-QEMU_VMs-client-3: remote operation failed. Path:

(3f0239ac-e027-4a0c-b271-431e76ad97b1) [Permission denied]
[2020-05-29 21:46:53.147110] E [MSGID: 114031]
[client-rpc-fops_v2.c:1548:client4_0_xattrop_cbk]
0-QEMU_VMs-client-4: remote operation failed. Path:

(3f0239ac-e027-4a0c-b271-431e76ad97b1) [Permission denied]

The most recent data, I'm 100% certain, is on rider-ring8.
Could any expert advise how to resolve this?
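
A hedged sketch of how GFID mismatches like these are usually resolved with the heal CLI on recent releases; the file path below is a placeholder (the real paths were truncated in the log excerpt), and rider-ring8 is used as the source only because the poster trusts its data. This is not a confirmed fix for this particular volume:

# let AFR pick the source by policy, per affected entry:
gluster volume heal USER-HOME split-brain latest-mtime /path/inside/volume/lock_file
# or name the brick holding the good copy explicitly:
gluster volume heal USER-HOME split-brain source-brick \
    rider-ring8:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER.USER-HOME \
    /path/inside/volume/lock_file
# then re-check:
gluster volume heal USER-HOME info summary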

[Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-29 Thread David Cunningham
Hello,

We're having an issue with a geo-replication process that shows unusually high
CPU use and logs "Entry not present on master. Fixing gfid mismatch in
slave" errors. Can anyone help with this?

We have 3 GlusterFS replica nodes (we'll call the master), which also push
data to a remote server (slave) using geo-replication. This has been
running fine for a couple of months, but yesterday one of the master nodes
started having unusually high CPU use. It's this process:

root@cafs30:/var/log/glusterfs# ps aux | grep 32048
root 32048 68.7  0.6 1843140 845756 ?  Rl   02:51 493:51 python2
/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker
gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
--resource-remote nvfs30 --resource-remote-id
1e698ccd-aeec-4ec4-96fe-383da8fc3b78

Here's what is being logged in
/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:

[2020-05-29 21:57:18.843524] I [master(worker
/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
 stime=(1590789408, 0)
[2020-05-29 21:57:30.626172] I [master(worker
/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204,
u'entry': u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7.cfg',
u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
u'slave_name': None, u'slave_gfid':
u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False, u'dst':
False})
[2020-05-29 21:57:30.627893] I [master(worker
/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204,
u'entry':
u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7.cfg',
u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
u'slave_name': None, u'slave_gfid':
u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False, u'dst':
False})
[2020-05-29 21:57:30.629532] I [master(worker
/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204,
u'entry':
u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7-directory.xml',
u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
u'slave_name': None, u'slave_gfid':
u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False, u'dst':
False})
[2020-05-29 21:57:30.659123] I [master(worker
/nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster:
Sucessfully fixed entry ops with gfid mismatch retry_count=1
[2020-05-29 21:57:30.659343] I [master(worker
/nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry
original entries. count = 1
[2020-05-29 21:57:30.725810] I [master(worker
/nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster:
Sucessfully fixed all entry ops with gfid mismatch
[2020-05-29 21:57:31.747319] I [master(worker
/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
duration=0.7409 num_files=18job=1   return_code=0

We've verified that the files like polycom_00a859f7.cfg referred to in
the error do exist on the master nodes and slave.
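
Since the issue is specifically a GFID mismatch, it may be worth comparing the GFIDs rather than just the existence of the files. A hedged check, reading the trusted.gfid xattr directly off the brick backends (the master brick root comes from the ps output above; the slave brick root and the file's location inside the volume are placeholders):

# on a master node:
getfattr -n trusted.gfid -e hex \
    /nodirectwritedata/gluster/gvol0/path/to/polycom_00a859f7.cfg
# on the slave node, against its brick:
getfattr -n trusted.gfid -e hex /slave/brick/root/path/to/polycom_00a859f7.cfg

If the two hex values differ, the "gfid_mismatch: True" entries in gsyncd.log are accurate for those files.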

We found this bug fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1642865

However that fix went in 5.1, and we're running 5.12 on the master nodes
and slave. A couple of GlusterFS clients connected to the master nodes are
running 5.13.

Would anyone have any suggestions? Thank you in advance.

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
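
For anyone else hitting this, the state of the session and the worker's configuration can be checked with the standard geo-replication CLI; a hedged sketch using the volume and slave names from the ps output above. Stopping and starting the session is only a generic way to break a worker out of a retry loop, not a confirmed fix for this report:

gluster volume geo-replication gvol0 nvfs10::gvol0 status detail
gluster volume geo-replication gvol0 nvfs10::gvol0 config
# as a last resort, bounce the session:
gluster volume geo-replication gvol0 nvfs10::gvol0 stop
gluster volume geo-replication gvol0 nvfs10::gvol0 start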




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] File system very slow

2020-05-29 Thread vadud3
I am using an NFS mount of the gluster volume to get better performance.


On Wed, May 27, 2020 at 10:02 AM  wrote:

> - # gluster --version
> glusterfs 7.5
>
> - # gluster volume status atlassian
> Status of volume: atlassian
> Gluster process TCP Port  RDMA Port  Online
>  Pid
>
> --
> Brick node1:/data/atlassian/gluster  49152 0  Y   1791
> Brick node2:/data/atlassian/gluster  49152 0  Y   1773
> Self-heal Daemon on localhost   N/A   N/AY
> 1807
> Self-heal Daemon on node1.example.c
> example.net   N/A   N/AY
> 1778
>
> Task Status of Volume atlassian
>
> --
> There are no active volume tasks
>
> - # attached pre-du and during-du log from server
>
>
> - I do not have a remote client. When I tried to run these,
> 'gluster volume profile your-volume start' says it is already started, since
> I am on the server
> # setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /mnt runs but
> no output in /tmp/io-stats-pre.txt
>
>
> - # gluster volume heal atlassian info
> Brick node1:/data/atlassian/gluster
> Status: Connected
> Number of entries: 0
>
> Brick node2:/data/atlassian/gluster
> Status: Connected
> Number of entries: 0
>
> Let me know if you need anything else. Appreciate your help
>
>
> On Wed, May 27, 2020 at 3:27 AM Karthik Subrahmanya 
> wrote:
>
>> Hi,
>>
>> Please provide the following information to understand the setup and
>> debug this further:
>> - Which version of gluster you are using?
>> - 'gluster volume status atlassian' to confirm both bricks and shds are
>> up or not
>> - Complete output of 'gluster volume profile atlassian info' before
>> running 'du' and during 'du'. Redirect this output to separate files and
>> attach them here
>> - Get the client side profile as well by following
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
>> - 'gluster volume heal atlassian info' to check whether there are any
>> pending heals and client side heal is contributing to this
>>
>> Regards,
>> Karthik
>>
>> On Wed, May 27, 2020 at 1:06 AM  wrote:
>>
>>> I had a parsing error. It is Volume Name: atlassian
>>>
>>> On Tue, May 26, 2020 at 3:12 PM  wrote:
>>>
 # gluster volume info

 Volume Name: myvol
 Type: Replicate
 Volume ID: cbdef65c-79ea-496e-b777-b6a2981b29cf
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: node1:/data/foo/gluster
 Brick2: node2:/data/foo/gluster
 Options Reconfigured:
 client.event-threads: 4
 server.event-threads: 4
 performance.stat-prefetch: on
 network.inode-lru-limit: 16384
 performance.md-cache-timeout: 1
 performance.cache-invalidation: false
 performance.cache-samba-metadata: false
 features.cache-invalidation-timeout: 600
 features.cache-invalidation: on
 performance.io-thread-count: 16
 performance.cache-refresh-timeout: 5
 performance.write-behind-window-size: 5MB
 performance.cache-size: 1GB
 transport.address-family: inet
 storage.fips-mode-rchecksum: on
 nfs.disable: on
 performance.client-io-threads: off
 diagnostics.latency-measurement: on
 diagnostics.count-fop-hits: on

 On Tue, May 26, 2020 at 3:06 PM Sunil Kumar Heggodu Gopala Acharya <
 shegg...@redhat.com> wrote:

> Hi,
>
> Please share the gluster volume information.
>
> # gluster vol info
>
>
> Regards,
>
> Sunil kumar Acharya
>
>
> On Wed, May 27, 2020 at 12:30 AM  wrote:
>
>> I made the following changes for small file performance as suggested
>> by
>> http://blog.gluster.org/gluster-tiering-and-small-file-performance/
>>
>> I am still seeing du -sh /data/shared taking 39 minutes.
>>
>> Any other tuning I can do. Most of my files are 15K. Here is sample
>> of small files with size and number of occurrences
>>
>> File size   # of occurrences
>>
>> 1.1K 1122
>> 1.1M 1040
>> 1.2K 1281
>> 1.2M 1357
>> 1.3K 1149
>> 1.3M 1098
>> 1.4K 1119
>> 1.5K 1189
>> 1.6K 1036
>> 1.7K 1169
>> 11K 2157
>> 12K 2398
>> 13K 2402
>> 14K 2406
>> *15K 2426*
>> 16K 2386
>> 17K 1986
>> 18K 2037
>> 19K 1829
>> 2.0K 1027
>> 2.1K 1048
>> 2.4K 1013
>> 20K 1585
>> 21K 1713
>> 22K 1590
>> 23K 1371
>> 24K 1428
>> 25K 1444
>> 26K 1391
>> 27K 1217
>> 28K 1485
>> 29K 1282
>> 30K 1303
>> 31K 1275
>> 32K 1296
>> 33K 1058
>> 36K 1023
>> 37K 1107
>> 39K 1092
>> 41K 1034
>> 42K 1187
>> 46K 1030
>>
>>
>>
>>
>>
>> On Mon, May 25, 2020 at 5:30 PM  
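
To capture the before/during profiles Karthik asked for earlier in this thread, a minimal sequence along these lines should work (volume name and mount path are taken from the thread; output file names are arbitrary):

gluster volume profile atlassian start
gluster volume profile atlassian info > /tmp/profile-pre-du.txt
du -sh /data/shared &
gluster volume profile atlassian info > /tmp/profile-during-du.txt   # while du is still running
# note: on newer releases the client-side io-stats dump requested with
# "setfattr -n trusted.io-stats-dump ..." may be written under /var/run/gluster/
# rather than at the literal path given, which could explain the empty
# /tmp/io-stats-pre.txt (an assumption worth checking)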

Re: [Gluster-users] Failing to nfs mount gluster volume

2020-05-29 Thread vadud3
I am running gluster 7.5 on CentOS 7, which does not have gnfs compiled in.

I had to build it from source with ./configure --enable-gnfs
--without-libtirpc [1], and then I could NFS-mount the gluster volume.

[1] https://docs.gluster.org/en/latest/Developer-guide/Building-GlusterFS/
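
For reference, a minimal sketch of that build, assuming a stock GlusterFS 7.x source tree and the standard autotools steps described at [1] (exact prerequisites vary by distribution):

./autogen.sh
./configure --enable-gnfs --without-libtirpc
make -j"$(nproc)"
make install
# gnfs must also be enabled per volume before an NFSv3 mount will work:
gluster volume set gv0 nfs.disable off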

On Fri, May 29, 2020 at 12:42 PM  wrote:

> Turned off nfs-server service and now I am getting a different error
> message
>
> [root@node1 ~]# mount -vv -t nfs -o vers=3,mountproto=tcp 192.168.1.121:/gv0
> /nfs_mount/
> mount.nfs: timeout set for Fri May 29 16:43:30 2020
> mount.nfs: trying text-based options
> 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: trying text-based options
> 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: trying text-based options
> 'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: portmap query retrying: RPC: Program not registered
> mount.nfs: prog 13, trying vers=3, prot=17
> mount.nfs: portmap query failed: RPC: Program not registered
> mount.nfs: requested NFS version or transport protocol is not supported
>
> [root@node1 ~]# gluster volume info gv0
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.121:/data/brick1/gv0
> Brick2: 192.168.1.122:/data/brick2/gv0
> Options Reconfigured:
> nfs.register-with-portmap: on
> performance.client-io-threads: off
> nfs.disable: off
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.volume-access: read-write
> nfs.mount-udp: off
>
>
> On Thu, May 28, 2020 at 6:48 PM  wrote:
>
>> [root@node1 ~]# uname -a
>> Linux node1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root@node1 ~]# gluster --version
>> glusterfs 7.5
>> ...
>>
>> [root@node1 ~]# gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.1.121:/data/brick1/gv0
>> Brick2: 192.168.1.122:/data/brick2/gv0
>> Options Reconfigured:
>> transport.address-family: inet
>> storage.fips-mode-rchecksum: on
>> nfs.disable: off
>> performance.client-io-threads: off
>>
>>
>> [root@node1 ~]# mount -v -t nfs -o vers=3 192.168.1.121:/gv0 /nfs_mount
>> mount.nfs: timeout set for Thu May 28 22:48:39 2020
>> mount.nfs: trying text-based options 'vers=3,addr=192.168.1.121'
>> mount.nfs: prog 13, trying vers=3, prot=6
>> mount.nfs: trying 192.168.1.121 prog 13 vers 3 prot TCP port 2049
>> mount.nfs: prog 15, trying vers=3, prot=17
>> mount.nfs: trying 192.168.1.121 prog 15 vers 3 prot UDP port 20048
>> mount.nfs: mount(2): No such file or directory
>> mount.nfs: mounting 192.168.1.121:/gv0 failed, reason given by server:
>> No such file or directory
>>
>>
>>
>> --
>> Asif Iqbal
>> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>>
>>
>
> --
> Asif Iqbal
> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>
>

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS saturates server disk IO due to write brick temporary file to ".glusterfs" directory

2020-05-29 Thread Qing Wang
Hi,

I have GridFTP + a network speedup solution in network + GlusterFS as a
file system component in a disk-to-disk data transferring scenario. For
glusterfs, I start with creating bricks inside /dev/sda1 filesystem. During
the file transfer (12 GB), it seems that glusterfs tries to write a
temporary file into the .glusterfs directory at a very high rate and
essentially the disk IO rate of the server host is ~100%. This makes the
file transfer performance unpredictable, as the file transfer (i.e., GridFTP)
would sometimes wait and send nothing because the glusterfs server's disk is
busy. As a test, I changed the brick location to tmpfs (in memory) and then
the disk I/O is not saturated; this makes sense b/c the file writing
would happen in memory in this case, but this is not a disk-to-disk transfer
anymore, so it can't be an effective workaround in my case.

I wonder if there is any way I can stop glusterfs from writing temporary
files into the .glusterfs directory during the file transfer? Based on my
reading so far, I understand that this is a glusterfs design decision for
some recovering purpose, just in my case the writing rate is too fast and
saturates the disk IO and makes my disk-to-disk performance unstable.

My glusterfs version is: glusterfs 3.7.6 built on Dec 25 2015 20:50:46. I
am glad to post more volume or other setup details if necessary.

Thanks,
Qing
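
One hedged way to narrow down which file operations are driving the brick-side IO, using only stock gluster tooling (the volume name and brick path are placeholders, since the volume layout wasn't posted):

gluster volume profile YOUR_VOL start
# reproduce the transfer, then:
gluster volume profile YOUR_VOL info
# heaviest writers seen by a given brick:
gluster volume top YOUR_VOL write brick SERVER:/path/to/brick list-cnt 10
# confirm device-level saturation on the brick host:
iostat -x 1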




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Failing to nfs mount gluster volume

2020-05-29 Thread vadud3
Turned off nfs-server service and now I am getting a different error message

[root@node1 ~]# mount -vv -t nfs -o vers=3,mountproto=tcp 192.168.1.121:/gv0
/nfs_mount/
mount.nfs: timeout set for Fri May 29 16:43:30 2020
mount.nfs: trying text-based options
'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
mount.nfs: prog 13, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 13, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options
'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
mount.nfs: prog 13, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 13, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options
'vers=3,mountproto=tcp,addr=192.168.1.121,mountaddr=192.168.1.121'
mount.nfs: prog 13, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 13, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported
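
"RPC: Program not registered" means the NFS/MOUNT programs are not registered with rpcbind on the server, so a quick sanity check before retrying the mount could be (a hedged sketch, run against 192.168.1.121):

# on the server: is the gluster NFS (gnfs) process actually running?
ps aux | grep -i "gluster.*nfs"
# which RPC programs are registered, and what does the server export?
rpcinfo -p 192.168.1.121
showmount -e 192.168.1.121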

[root@node1 ~]# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.121:/data/brick1/gv0
Brick2: 192.168.1.122:/data/brick2/gv0
Options Reconfigured:
nfs.register-with-portmap: on
performance.client-io-threads: off
nfs.disable: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.volume-access: read-write
nfs.mount-udp: off


On Thu, May 28, 2020 at 6:48 PM  wrote:

> [root@node1 ~]# uname -a
> Linux node1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
> x86_64 x86_64 x86_64 GNU/Linux
>
> [root@node1 ~]# gluster --version
> glusterfs 7.5
> ...
>
> [root@node1 ~]# gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: f33246a4-2e9a-4958-8aff-4cee815703bc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.121:/data/brick1/gv0
> Brick2: 192.168.1.122:/data/brick2/gv0
> Options Reconfigured:
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: off
> performance.client-io-threads: off
>
>
> [root@node1 ~]# mount -v -t nfs -o vers=3 192.168.1.121:/gv0 /nfs_mount
> mount.nfs: timeout set for Thu May 28 22:48:39 2020
> mount.nfs: trying text-based options 'vers=3,addr=192.168.1.121'
> mount.nfs: prog 13, trying vers=3, prot=6
> mount.nfs: trying 192.168.1.121 prog 13 vers 3 prot TCP port 2049
> mount.nfs: prog 15, trying vers=3, prot=17
> mount.nfs: trying 192.168.1.121 prog 15 vers 3 prot UDP port 20048
> mount.nfs: mount(2): No such file or directory
> mount.nfs: mounting 192.168.1.121:/gv0 failed, reason given by server: No
> such file or directory
>
>
>
> --
> Asif Iqbal
> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>
>

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Petr Certik
Correct, every brick is a separate xfs-formatted disk attached to the
machine. There are two disks per machine, the ones mounted in `/data2`
are the newer ones.

Thanks for the reassurance, that means we can take as long as
necessary to diagnose this. Let me know if there's more data I can
provide. lsblk and stat -f outputs follow:

$ ssh imagegluster1 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00   9.3G  0 disk
└─sda1   8:10   9.3G  0 part /
sdb  8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc  8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster2 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00   9.3G  0 disk
└─sda1   8:10   9.3G  0 part /
sdb  8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc  8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster3 "lsblk"
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00   9.3G  0 disk
└─sda1   8:10   9.3G  0 part /
sdb  8:16   0 894.3G  0 disk
└─sdb1   8:17   0 894.3G  0 part /data2
sdc  8:32   0 894.3G  0 disk
└─sdc1   8:33   0 894.3G  0 part /data

$ ssh imagegluster1 "stat -f /data; stat -f /data2"
  File: "/data"
ID: 821 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307548  Free: 111493566  Available: 111493566
Inodes: Total: 468843968  Free: 459695286
  File: "/data2"
ID: 811 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307553  Free: 10486  Available: 10486
Inodes: Total: 468844032  Free: 459769261

$ ssh imagegluster2 "stat -f /data; stat -f /data2"
  File: "/data"
ID: 821 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307548  Free: 111489680  Available: 111489680
Inodes: Total: 468843968  Free: 459695437
  File: "/data2"
ID: 811 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307553  Free: 10492  Available: 10492
Inodes: Total: 468844032  Free: 459769261

$ ssh imagegluster3 "stat -f /data; stat -f /data2"
  File: "/data"
ID: 821 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307548  Free: 111495441  Available: 111495441
Inodes: Total: 468843968  Free: 459695437
  File: "/data2"
ID: 811 Namelen: 255 Type: xfs
Block size: 4096   Fundamental block size: 4096
Blocks: Total: 234307553  Free: 10505  Available: 10505
Inodes: Total: 468844032  Free: 459769261

On Fri, May 29, 2020 at 1:10 PM Sanju Rakonde  wrote:
>
> Hi Petr,
>
> it's absolutely safe to use this volume. you will not see any problems even 
> if the actual used size is greater than the reported total size of the volume 
> and it is safe to upgrade as well.
>
> Can you please share the output of the following:
> 1. lsblk output from all the 3 nodes in the cluster
> 2. stat -f  for all the bricks
>
> I hope all the bricks are having a separate filesystem, and it's not shared 
> between any two bricks. Am I correct?
>
> On Fri, May 29, 2020 at 4:25 PM Petr Certik  wrote:
>>
>> Thanks!
>>
>> One more question -- I don't really mind having the wrong size
>> reported by df, but I'm worried whether it is safe to use the volume.
>> Will it be okay if I write to it? For example, once the actual used
>> size is greater than the reported total size of the volume, should I
>> expect problems? And is it safe to upgrade glusterfs when the volume
>> is in this state?
>>
>> Cheers,
>> Petr
>>
>> On Fri, May 29, 2020 at 11:37 AM Sanju Rakonde  wrote:
>> >
>> > Nope, for now. I will update you if we figure out any other workaround.
>> >
>> > Thanks for your help!
>> >
>> > On Fri, May 29, 2020 at 2:50 PM Petr Certik  wrote:
>> >>
>> >> I'm afraid I don't have the resources to try and reproduce from the
>> >> beginning. Is there anything else I can do to get you more
>> >> information?
>> >>
>> >>
>> >> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde  
>> >> wrote:
>> >> >
>> >> > The issue is not with glusterd restart. We need to reproduce from 
>> >> > beginning and add-bricks to check df -h values.
>> >> >
>> >> > I suggest not to try on the production environment. if you have any 
>> >> > other machines, please let me know.
>> >> >
>> >> > On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:
>> >> >>
>> >> >> If you mean the issue during node restart, then yes, I think I could
>> >> >> reproduce that with a custom build. It's a production system, though,
>> >> >> so I'll need to be extremely careful.
>> >> >>
>> >> >> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
>> >> >> custom glusterd binary based off of that version?
>> >> >>
>> >> >> Cheers,
>> >> >> Petr
>> >> >>
>> >> >> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde  
>> >> >> wrote:
>> >> >> >
>> >> >> > 

Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Sanju Rakonde
Hi Petr,

it's absolutely safe to use this volume. you will not see any problems even
if the actual used size is greater than the reported total size of the
volume and it is safe to upgrade as well.

Can you please share the output of the following:
1. lsblk output from all the 3 nodes in the cluster
2. stat -f  for all the bricks

I hope all the bricks are having a separate filesystem, and it's not shared
between any two bricks. Am I correct?

On Fri, May 29, 2020 at 4:25 PM Petr Certik  wrote:

> Thanks!
>
> One more question -- I don't really mind having the wrong size
> reported by df, but I'm worried whether it is safe to use the volume.
> Will it be okay if I write to it? For example, once the actual used
> size is greater than the reported total size of the volume, should I
> expect problems? And is it safe to upgrade glusterfs when the volume
> is in this state?
>
> Cheers,
> Petr
>
> On Fri, May 29, 2020 at 11:37 AM Sanju Rakonde 
> wrote:
> >
> > Nope, for now. I will update you if we figure out any other workaround.
> >
> > Thanks for your help!
> >
> > On Fri, May 29, 2020 at 2:50 PM Petr Certik  wrote:
> >>
> >> I'm afraid I don't have the resources to try and reproduce from the
> >> beginning. Is there anything else I can do to get you more
> >> information?
> >>
> >>
> >> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde 
> wrote:
> >> >
> >> > The issue is not with glusterd restart. We need to reproduce from
> beginning and add-bricks to check df -h values.
> >> >
> >> > I suggest not to try on the production environment. if you have any
> other machines, please let me know.
> >> >
> >> > On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:
> >> >>
> >> >> If you mean the issue during node restart, then yes, I think I could
> >> >> reproduce that with a custom build. It's a production system, though,
> >> >> so I'll need to be extremely careful.
> >> >>
> >> >> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
> >> >> custom glusterd binary based off of that version?
> >> >>
> >> >> Cheers,
> >> >> Petr
> >> >>
> >> >> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde 
> wrote:
> >> >> >
> >> >> > Surprising! Will you be able to reproduce the issue and share the
> logs if I provide a custom build with more logs?
> >> >> >
> >> >> > On Thu, May 28, 2020 at 1:35 PM Petr Certik 
> wrote:
> >> >> >>
> >> >> >> Thanks for your help! Much appreciated.
> >> >> >>
> >> >> >> The fsid is the same for all bricks:
> >> >> >>
> >> >> >> imagegluster1:
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >> >> >>
> >> >> >> imagegluster2:
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >> >> >>
> >> >> >> imagegluster3:
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
> >> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
> >> >> >>
> >> >> >>
> >> >> >> I already did try restarting the glusterd nodes with no effect,
> but
> >> >> >> that was before the upgrades of client versions.
> >> >> >>
> >> >> >> Running the "volume set" command did not seem to work either, the
> >> >> >> shared-brick-counts are still the same (2).
> >> >> >>
> >> >> >> However, when restarting a node, I do get an error and a few
> warnings
> >> >> >> in the log: https://pastebin.com/tqq1FCwZ
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde <
> srako...@redhat.com> wrote:
> >> >> >> >
> >> >> >> > The shared-brick-count value indicates the number of bricks
> sharing a file-system. In your case, it should be one, as all the bricks
> are from different mount points. Can you please share the values of
> brick-fsid?
> >> >> >> >
> >> >> 

Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Petr Certik
Thanks!

One more question -- I don't really mind having the wrong size
reported by df, but I'm worried whether it is safe to use the volume.
Will it be okay if I write to it? For example, once the actual used
size is greater than the reported total size of the volume, should I
expect problems? And is it safe to upgrade glusterfs when the volume
is in this state?

Cheers,
Petr

On Fri, May 29, 2020 at 11:37 AM Sanju Rakonde  wrote:
>
> Nope, for now. I will update you if we figure out any other workaround.
>
> Thanks for your help!
>
> On Fri, May 29, 2020 at 2:50 PM Petr Certik  wrote:
>>
>> I'm afraid I don't have the resources to try and reproduce from the
>> beginning. Is there anything else I can do to get you more
>> information?
>>
>>
>> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde  wrote:
>> >
>> > The issue is not with glusterd restart. We need to reproduce from 
>> > beginning and add-bricks to check df -h values.
>> >
>> > I suggest not to try on the production environment. if you have any other 
>> > machines, please let me know.
>> >
>> > On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:
>> >>
>> >> If you mean the issue during node restart, then yes, I think I could
>> >> reproduce that with a custom build. It's a production system, though,
>> >> so I'll need to be extremely careful.
>> >>
>> >> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
>> >> custom glusterd binary based off of that version?
>> >>
>> >> Cheers,
>> >> Petr
>> >>
>> >> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde  wrote:
>> >> >
>> >> > Surprising! Will you be able to reproduce the issue and share the logs 
>> >> > if I provide a custom build with more logs?
>> >> >
>> >> > On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:
>> >> >>
>> >> >> Thanks for your help! Much appreciated.
>> >> >>
>> >> >> The fsid is the same for all bricks:
>> >> >>
>> >> >> imagegluster1:
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>> >> >>
>> >> >> imagegluster2:
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>> >> >>
>> >> >> imagegluster3:
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
>> >> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>> >> >>
>> >> >>
>> >> >> I already did try restarting the glusterd nodes with no effect, but
>> >> >> that was before the upgrades of client versions.
>> >> >>
>> >> >> Running the "volume set" command did not seem to work either, the
>> >> >> shared-brick-counts are still the same (2).
>> >> >>
>> >> >> However, when restarting a node, I do get an error and a few warnings
>> >> >> in the log: https://pastebin.com/tqq1FCwZ
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde  
>> >> >> wrote:
>> >> >> >
>> >> >> > The shared-brick-count value indicates the number of bricks sharing 
>> >> >> > a file-system. In your case, it should be one, as all the bricks are 
>> >> >> > from different mount points. Can you please share the values of 
>> >> >> > brick-fsid?
>> >> >> >
>> >> >> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
>> >> >> >
>> >> >> > I tried reproducing this issue in fedora vm's but couldn't hit this. 
>> >> >> > we are seeing this issue on and off but are unable to reproduce 
>> >> >> > in-house. If you see any error messages in glusterd.log please share 
>> >> >> > the log too.
>> >> >> >
>> >> >> > Work-around to come out from this situation:
>> >> >> > 1. Restarting the glusterd service on all nodes:
>> >> >> > # systemctl restart glusterd
>> >> >> >
>> >> >> > 2. Run set volume command to update vol file:
>> >> >> > # gluster v set  min-free-disk 11%
>> >> >> >
>> >> >> > On Wed, May 27, 2020 at 5:24 PM Petr Certik  wrote:
>> >> >> >>
>> >> >> 

Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Sanju Rakonde
Nope, for now. I will update you if we figure out any other workaround.

Thanks for your help!

On Fri, May 29, 2020 at 2:50 PM Petr Certik  wrote:

> I'm afraid I don't have the resources to try and reproduce from the
> beginning. Is there anything else I can do to get you more
> information?
>
>
> On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde 
> wrote:
> >
> > The issue is not with glusterd restart. We need to reproduce from
> beginning and add-bricks to check df -h values.
> >
> > I suggest not to try on the production environment. if you have any
> other machines, please let me know.
> >
> > On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:
> >>
> >> If you mean the issue during node restart, then yes, I think I could
> >> reproduce that with a custom build. It's a production system, though,
> >> so I'll need to be extremely careful.
> >>
> >> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
> >> custom glusterd binary based off of that version?
> >>
> >> Cheers,
> >> Petr
> >>
> >> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde 
> wrote:
> >> >
> >> > Surprising! Will you be able to reproduce the issue and share the
> logs if I provide a custom build with more logs?
> >> >
> >> > On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:
> >> >>
> >> >> Thanks for your help! Much appreciated.
> >> >>
> >> >> The fsid is the same for all bricks:
> >> >>
> >> >> imagegluster1:
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >> >>
> >> >> imagegluster2:
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >> >>
> >> >> imagegluster3:
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
> >> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
> >> >>
> >> >>
> >> >> I already did try restarting the glusterd nodes with no effect, but
> >> >> that was before the upgrades of client versions.
> >> >>
> >> >> Running the "volume set" command did not seem to work either, the
> >> >> shared-brick-counts are still the same (2).
> >> >>
> >> >> However, when restarting a node, I do get an error and a few warnings
> >> >> in the log: https://pastebin.com/tqq1FCwZ
> >> >>
> >> >>
> >> >>
> >> >> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde 
> wrote:
> >> >> >
> >> >> > The shared-brick-count value indicates the number of bricks
> sharing a file-system. In your case, it should be one, as all the bricks
> are from different mount points. Can you please share the values of
> brick-fsid?
> >> >> >
> >> >> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
> >> >> >
> >> >> > I tried reproducing this issue in fedora vm's but couldn't hit
> this. we are seeing this issue on and off but are unable to reproduce
> in-house. If you see any error messages in glusterd.log please share the
> log too.
> >> >> >
> >> >> > Work-around to come out from this situation:
> >> >> > 1. Restarting the glusterd service on all nodes:
> >> >> > # systemctl restart glusterd
> >> >> >
> >> >> > 2. Run set volume command to update vol file:
> >> >> > # gluster v set  min-free-disk 11%
> >> >> >
> >> >> > On Wed, May 27, 2020 at 5:24 PM Petr Certik 
> wrote:
> >> >> >>
> >> >> >> As far as I remember, there was no version update on the server.
> It
> >> >> >> was definitely installed as version 7.
> >> >> >>
> >> >> >> Shared bricks:
> >> >> >>
> >> >> >> Server 1:
> >> >> >>
> >> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> >> >> option shared-brick-count 2
> >> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:
> option
> >> >> >> shared-brick-count 2
> >> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> >> >> option shared-brick-count 0
> >> >> >> 

Re: [Gluster-users] The dht-layout interval is missing

2020-05-29 Thread Susant Palai
On Fri, May 29, 2020 at 1:28 PM jifeng-call <17607319...@163.com> wrote:

> Hi All,
> I have 6 servers that form a glusterfs 2x3 distributed replication volume,
> the details are as follows:
>
> [root@node1 ~]# gluster volume info
> Volume Name: ksvd_vol
> Type: Distributed-Replicate
> Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: node1:/export/kylin/ksvd
> Brick2: node2:/export/kylin/ksvd
> Brick3: node3:/export/kylin/ksvd
> Brick4: node4:/export/kylin/ksvd
> Brick5: node5:/export/kylin/ksvd
> Brick6: node6:/export/kylin/ksvd
> Options Reconfigured:
> diagnostics.client-log-level: DEBUG
> diagnostics.brick-log-level: DEBUG
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> cluster.quorum-type: auto
>
> Restart 6 servers at a time, the server automatically mounts the ksvd_vol
> volume to the /home/kylin-ksvd directory during the startup process:
>
> [root@node1 ~]# df -h | grep ksvd_vol
> node1:/ksvd_vol    4.5T  362G  4.1T    9% /home/kylin-ksvd
>
> Failed to create a file in the glusterfs mount directory of node3, the log
> shows the error as follows (the files created by other servers are normal):
>
Possibly the layout setxattr failed, which should not have happened given that
it is a replicated volume. You can check the logs on the client side and the
brick side for the parent directory for more information.
Post remount, a fresh lookup is triggered, which heals the previously missing
layout, and the create then goes through.
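
A hedged way to inspect the on-disk layout Susant refers to is to read the dht xattr of the affected directory on every brick and compare the ranges (the brick root comes from the volume info above; the directory component is a placeholder):

# run on each node, against its own brick:
getfattr -n trusted.glusterfs.dht -e hex /export/kylin/ksvd/path/to/parent-dir
# the client's merged view can also be dumped through the mount, as the
# original poster already did further down:
getfattr -n trusted.glusterfs.pathinfo /home/kylin-ksvd/path/to/parent-dir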


> [2020-05-29 05:55:03.065656] E [MSGID: 109011]
> [dht-common.c:8683:dht_create] 0-ksvd_vol-dht: no subvolume in layout for
> path=/.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512
> [2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk]
> 0-glusterfs-fuse: 2454790:
> /.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)
> [2020-05-29 05:55:03.680303] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> [2020-05-29 05:55:04.687456] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 491618322
> [2020-05-29 05:55:04.688612] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> [2020-05-29 05:55:05.694446] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> [2020-05-29 05:55:05.830555] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 491618322
> [2020-05-29 05:55:06.700423] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> [2020-05-29 05:55:07.706536] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> [2020-05-29 05:55:07.833049] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 491618322
> [2020-05-29 05:55:08.712128] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash
> (value) = 1738969696
> The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search]
> 0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2
> times between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]
>
> The pathinfo information in the / home / kylin-ksvd directory is displayed
> as follows:
>
> [root@node3 kylin-ksvd]# getfattr -d -m .  -n trusted.glusterfs.pathinfo
> /home/kylin-ksvd/
> getfattr: Removing leading '/' from absolute path names
> # file: home/kylin-ksvd/
> trusted.glusterfs.pathinfo="((
> (
> 
> 
> )
> (
> 
> 
> ))
> (ksvd_vol-dht-layout (ksvd_vol-replicate-0 0 0) (ksvd_vol-replicate-1
> 3539976838 4294967295)))"
>
> It can be seen from the above information that ksvd_vol-dht-layout is
> missing the interval 0 to 3539976837
>
> After umount, remounting returned to normal ... What is the reason?
>
> Best regards
>
>
>
>
>
>
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Petr Certik
I'm afraid I don't have the resources to try and reproduce from the
beginning. Is there anything else I can do to get you more
information?


On Fri, May 29, 2020 at 11:08 AM Sanju Rakonde  wrote:
>
> The issue is not with glusterd restart. We need to reproduce from beginning 
> and add-bricks to check df -h values.
>
> I suggest not to try on the production environment. if you have any other 
> machines, please let me know.
>
> On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:
>>
>> If you mean the issue during node restart, then yes, I think I could
>> reproduce that with a custom build. It's a production system, though,
>> so I'll need to be extremely careful.
>>
>> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
>> custom glusterd binary based off of that version?
>>
>> Cheers,
>> Petr
>>
>> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde  wrote:
>> >
>> > Surprising! Will you be able to reproduce the issue and share the logs if 
>> > I provide a custom build with more logs?
>> >
>> > On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:
>> >>
>> >> Thanks for your help! Much appreciated.
>> >>
>> >> The fsid is the same for all bricks:
>> >>
>> >> imagegluster1:
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>> >>
>> >> imagegluster2:
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>> >>
>> >> imagegluster3:
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
>> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>> >>
>> >>
>> >> I already did try restarting the glusterd nodes with no effect, but
>> >> that was before the upgrades of client versions.
>> >>
>> >> Running the "volume set" command did not seem to work either, the
>> >> shared-brick-counts are still the same (2).
>> >>
>> >> However, when restarting a node, I do get an error and a few warnings
>> >> in the log: https://pastebin.com/tqq1FCwZ
>> >>
>> >>
>> >>
>> >> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde  wrote:
>> >> >
>> >> > The shared-brick-count value indicates the number of bricks sharing a 
>> >> > file-system. In your case, it should be one, as all the bricks are from 
>> >> > different mount points. Can you please share the values of brick-fsid?
>> >> >
>> >> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
>> >> >
>> >> > I tried reproducing this issue in fedora vm's but couldn't hit this. we 
>> >> > are seeing this issue on and off but are unable to reproduce in-house. 
>> >> > If you see any error messages in glusterd.log please share the log too.
>> >> >
>> >> > Work-around to come out from this situation:
>> >> > 1. Restarting the glusterd service on all nodes:
>> >> > # systemctl restart glusterd
>> >> >
>> >> > 2. Run set volume command to update vol file:
>> >> > # gluster v set  min-free-disk 11%
>> >> >
>> >> > On Wed, May 27, 2020 at 5:24 PM Petr Certik  wrote:
>> >> >>
>> >> >> As far as I remember, there was no version update on the server. It
>> >> >> was definitely installed as version 7.
>> >> >>
>> >> >> Shared bricks:
>> >> >>
>> >> >> Server 1:
>> >> >>
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
>> >> >> option shared-brick-count 2
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
>> >> >> shared-brick-count 2
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
>> >> >> option shared-brick-count 0
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
>> >> >> shared-brick-count 0
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
>> >> >> option shared-brick-count 0
>> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
>> >> >> shared-brick-count 0
>> >> >>
>> >> >> Server 2:
>> >> >>
>> >> >> 

Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Sanju Rakonde
The issue is not with the glusterd restart. We need to reproduce from the
beginning and add bricks to check the df -h values.

I suggest not trying this on the production environment. If you have any other
machines, please let me know.
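
A minimal reproduction sketch along those lines, assuming three scratch machines and mirroring the add-brick sequence from this thread (host names and brick paths are placeholders):

gluster volume create testvol replica 3 n1:/bricks/a/brick n2:/bricks/a/brick n3:/bricks/a/brick
gluster volume start testvol
mount -t glusterfs n1:/testvol /mnt/testvol
df -h /mnt/testvol        # note the size reported with one replica set
gluster volume add-brick testvol n1:/bricks/b/brick n2:/bricks/b/brick n3:/bricks/b/brick
df -h /mnt/testvol        # compare with the size reported before add-brick
grep -r shared-brick-count /var/lib/glusterd/vols/testvol/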

On Fri, May 29, 2020 at 1:37 PM Petr Certik  wrote:

> If you mean the issue during node restart, then yes, I think I could
> reproduce that with a custom build. It's a production system, though,
> so I'll need to be extremely careful.
>
> We're using debian glusterfs-server 7.3-1 amd64, can you provide a
> custom glusterd binary based off of that version?
>
> Cheers,
> Petr
>
> On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde  wrote:
> >
> > Surprising! Will you be able to reproduce the issue and share the logs
> if I provide a custom build with more logs?
> >
> > On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:
> >>
> >> Thanks for your help! Much appreciated.
> >>
> >> The fsid is the same for all bricks:
> >>
> >> imagegluster1:
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >>
> >> imagegluster2:
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
> >>
> >> imagegluster3:
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> >> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
> >>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
> >>
> >>
> >> I already did try restarting the glusterd nodes with no effect, but
> >> that was before the upgrades of client versions.
> >>
> >> Running the "volume set" command did not seem to work either, the
> >> shared-brick-counts are still the same (2).
> >>
> >> However, when restarting a node, I do get an error and a few warnings
> >> in the log: https://pastebin.com/tqq1FCwZ
> >>
> >>
> >>
> >> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde 
> wrote:
> >> >
> >> > The shared-brick-count value indicates the number of bricks sharing a
> file-system. In your case, it should be one, as all the bricks are from
> different mount points. Can you please share the values of brick-fsid?
> >> >
> >> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
> >> >
> >> > I tried reproducing this issue in Fedora VMs but couldn't hit it.
> We are seeing this issue on and off but are unable to reproduce it in-house.
> If you see any error messages in glusterd.log, please share the log too.
> >> >
> >> > Work-around to come out from this situation:
> >> > 1. Restarting the glusterd service on all nodes:
> >> > # systemctl restart glusterd
> >> >
> >> > 2. Run set volume command to update vol file:
> >> > # gluster v set  min-free-disk 11%
> >> >
> >> > On Wed, May 27, 2020 at 5:24 PM Petr Certik  wrote:
> >> >>
> >> >> As far as I remember, there was no version update on the server. It
> >> >> was definitely installed as version 7.
> >> >>
> >> >> Shared bricks:
> >> >>
> >> >> Server 1:
> >> >>
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> >> option shared-brick-count 2
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:
> option
> >> >> shared-brick-count 2
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> >> option shared-brick-count 0
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:
> option
> >> >> shared-brick-count 0
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
> >> >> option shared-brick-count 0
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:
> option
> >> >> shared-brick-count 0
> >> >>
> >> >> Server 2:
> >> >>
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> >> option shared-brick-count 0
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:
> option
> >> >> shared-brick-count 0
> >> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> >> option shared-brick-count 2
> >> >> 

[Gluster-users] The dht-layout interval is missing

2020-05-29 Thread jifeng-call
Hi All,




I have 6 servers that form a glusterfs 2x3 distributed-replicate volume; the
details are as follows:




[root@node1 ~]# gluster volume info

Volume Name: abcd_vol

Type: Distributed-Replicate

Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c

Status: Started

Snapshot Count: 0

Number of Bricks: 2 x 3 = 6

Transport-type: tcp

Bricks:

Brick1: node1:/export/test/abcd

Brick2: node2:/export/test/abcd

Brick3: node3:/export/test/abcd

Brick4: node4:/export/test/abcd

Brick5: node5:/export/test/abcd

Brick6: node6:/export/test/abcd

Options Reconfigured:

diagnostics.client-log-level: DEBUG

diagnostics.brick-log-level: DEBUG

performance.client-io-threads: off

nfs.disable: on

transport.address-family: inet

cluster.quorum-type: auto




All 6 servers were restarted at the same time; during startup each server
automatically mounts the abcd_vol volume to the /home/test-abcd directory:

[root@node1 ~]# df -h | grep abcd_vol
node1:/abcd_vol    4.5T  362G  4.1T   9% /home/test-abcd
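
A typical way to get such an automatic mount at boot is an /etc/fstab entry
along the following lines (this is an assumption about how the nodes are set
up, not something stated in the post):

node1:/abcd_vol  /home/test-abcd  glusterfs  defaults,_netdev  0  0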




Creating a file in the glusterfs mount directory on node3 fails, and the log
shows the errors below (files created from the other servers are normal):




[2020-05-29 05:55:03.065656] E [MSGID: 109011] [dht-common.c:8683:dht_create] 
0-abcd_vol-dht: no subvolume in layout for 
path=/.abcd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512

[2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk] 
0-glusterfs-fuse: 2454790: 
/.abcd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)

[2020-05-29 05:55:03.680303] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

[2020-05-29 05:55:04.687456] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 491618322

[2020-05-29 05:55:04.688612] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

[2020-05-29 05:55:05.694446] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

[2020-05-29 05:55:05.830555] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 491618322

[2020-05-29 05:55:06.700423] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

[2020-05-29 05:55:07.706536] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

[2020-05-29 05:55:07.833049] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 491618322

[2020-05-29 05:55:08.712128] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-abcd_vol-dht: no subvolume for hash 
(value) = 1738969696

The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 
0-abcd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2 times 
between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]




The pathinfo information for the /home/test-abcd directory is displayed as
follows:

[root@node3 test-abcd]# getfattr -d -m . -n trusted.glusterfs.pathinfo /home/test-abcd/
getfattr: Removing leading '/' from absolute path names
# file: home/test-abcd/
trusted.glusterfs.pathinfo="(( ( ... ) ( ... )) (abcd_vol-dht-layout
(abcd_vol-replicate-0 0 0) (abcd_vol-replicate-1 3539976838 4294967295)))"




It can be seen from the above information that abcd_vol-dht-layout is missing
the interval 0 to 3539976837: abcd_vol-replicate-0 carries an empty range (0 0),
so any name whose hash falls in that interval (such as 1738969696 or 491618322
in the log above) has no subvolume to map to, which matches the "no subvolume
in layout" errors.
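
One way to cross-check whether the layout stored on the bricks is complete
(as opposed to the client's in-memory copy) is to read the DHT layout xattr
directly on each brick server, for example:

getfattr -n trusted.glusterfs.dht -e hex /export/test/abcd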




After unmounting and remounting, everything returned to normal. What is the reason?




Best regards





Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Petr Certik
If you mean the issue during node restart, then yes, I think I could
reproduce that with a custom build. It's a production system, though,
so I'll need to be extremely careful.

We're using debian glusterfs-server 7.3-1 amd64, can you provide a
custom glusterd binary based off of that version?

Cheers,
Petr

On Fri, May 29, 2020 at 9:09 AM Sanju Rakonde  wrote:
>
> Surprising! Will you be able to reproduce the issue and share the logs if I 
> provide a custom build with more logs?
>
> On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:
>>
>> Thanks for your help! Much appreciated.
>>
>> The fsid is the same for all bricks:
>>
>> imagegluster1:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>>
>> imagegluster2:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>>
>> imagegluster3:
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
>> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>>
>>
>> I already did try restarting the glusterd nodes with no effect, but
>> that was before the upgrades of client versions.
>>
>> Running the "volume set" command did not seem to work either, the
>> shared-brick-counts are still the same (2).
>>
>> However, when restarting a node, I do get an error and a few warnings
>> in the log: https://pastebin.com/tqq1FCwZ
>>
>>
>>
>> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde  wrote:
>> >
>> > The shared-brick-count value indicates the number of bricks sharing a 
>> > file-system. In your case, it should be one, as all the bricks are from 
>> > different mount points. Can you please share the values of brick-fsid?
>> >
>> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
>> >
> >> > I tried reproducing this issue in Fedora VMs but couldn't hit it. We
> >> > are seeing this issue on and off but are unable to reproduce it in-house. If
> >> > you see any error messages in glusterd.log, please share the log too.
>> >
>> > Work-around to come out from this situation:
>> > 1. Restarting the glusterd service on all nodes:
>> > # systemctl restart glusterd
>> >
>> > 2. Run set volume command to update vol file:
>> > # gluster v set  min-free-disk 11%
>> >
>> > On Wed, May 27, 2020 at 5:24 PM Petr Certik  wrote:
>> >>
>> >> As far as I remember, there was no version update on the server. It
>> >> was definitely installed as version 7.
>> >>
>> >> Shared bricks:
>> >>
>> >> Server 1:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
>> >> option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
>> >> shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
>> >> option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
>> >> shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
>> >> option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
>> >> shared-brick-count 0
>> >>
>> >> Server 2:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
>> >> option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
>> >> shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
>> >> option shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
>> >> shared-brick-count 2
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
>> >> option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
>> >> shared-brick-count 0
>> >>
>> >> Server 3:
>> >>
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
>> >> option shared-brick-count 0
>> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:  

[Gluster-users] The dht-layout interval is missing

2020-05-29 Thread jifeng-call
Hi All,

I have 6 servers that form a glusterfs 2x3 distributed-replicate volume; the
details are as follows:


[root@node1 ~]# gluster volume info
Volume Name: ksvd_vol
Type: Distributed-Replicate
Volume ID: c9848daa-b06f-4f82-a2f8-1b425b8e869c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/export/kylin/ksvd
Brick2: node2:/export/kylin/ksvd
Brick3: node3:/export/kylin/ksvd
Brick4: node4:/export/kylin/ksvd
Brick5: node5:/export/kylin/ksvd
Brick6: node6:/export/kylin/ksvd
Options Reconfigured:
diagnostics.client-log-level: DEBUG
diagnostics.brick-log-level: DEBUG
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto


All 6 servers were restarted at the same time; during startup each server
automatically mounts the ksvd_vol volume to the /home/kylin-ksvd directory:

[root@node1 ~]# df -h | grep ksvd_vol
node1:/ksvd_vol    4.5T  362G  4.1T   9% /home/kylin-ksvd


Creating a file in the glusterfs mount directory on node3 fails, and the log
shows the errors below (files created from the other servers are normal):


[2020-05-29 05:55:03.065656] E [MSGID: 109011] [dht-common.c:8683:dht_create] 
0-ksvd_vol-dht: no subvolume in layout for 
path=/.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512
[2020-05-29 05:55:03.065719] W [fuse-bridge.c:2122:fuse_create_cbk] 
0-glusterfs-fuse: 2454790: 
/.ksvd.time.405fa8a6-575d-4474-a97f-a107cbf1c673.18512 => -1 (Input/output error)
[2020-05-29 05:55:03.680303] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
[2020-05-29 05:55:04.687456] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 491618322
[2020-05-29 05:55:04.688612] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
[2020-05-29 05:55:05.694446] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
[2020-05-29 05:55:05.830555] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 491618322
[2020-05-29 05:55:06.700423] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
[2020-05-29 05:55:07.706536] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
[2020-05-29 05:55:07.833049] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 491618322
[2020-05-29 05:55:08.712128] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-ksvd_vol-dht: no subvolume for hash 
(value) = 1738969696
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 
0-ksvd_vol-dht: no subvolume for hash (value) = 1738969696" repeated 2 times 
between [2020-05-29 05:55:08.712128] and [2020-05-29 05:55:09.718541]


The pathinfo information for the /home/kylin-ksvd directory is displayed as
follows:

[root@node3 kylin-ksvd]# getfattr -d -m . -n trusted.glusterfs.pathinfo /home/kylin-ksvd/
getfattr: Removing leading '/' from absolute path names
# file: home/kylin-ksvd/
trusted.glusterfs.pathinfo="(( ( ... ) ( ... )) (ksvd_vol-dht-layout
(ksvd_vol-replicate-0 0 0) (ksvd_vol-replicate-1 3539976838 4294967295)))"


It can be seen from the above information that ksvd_vol-dht-layout is missing
the interval 0 to 3539976837: ksvd_vol-replicate-0 carries an empty range (0 0),
so any name whose hash falls in that interval (such as 1738969696 or 491618322
in the log above) has no subvolume to map to, which matches the "no subvolume
in layout" errors.


After unmounting and remounting, everything returned to normal. What is the reason?
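
For reference, the remount that restored normal behaviour would have been
essentially the following (mount point and volume name as above; which server
is used in the volfile spec is an assumption):

umount /home/kylin-ksvd
mount -t glusterfs node3:/ksvd_vol /home/kylin-ksvd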


Best regards







Re: [Gluster-users] df shows wrong mount size, after adding bricks to volume

2020-05-29 Thread Sanju Rakonde
Surprising! Will you be able to reproduce the issue and share the logs if I
provide a custom build with more logs?

On Thu, May 28, 2020 at 1:35 PM Petr Certik  wrote:

> Thanks for your help! Much appreciated.
>
> The fsid is the same for all bricks:
>
> imagegluster1:
>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>
> imagegluster2:
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=0
>
> imagegluster3:
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster1:-data-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data2-brick:brick-fsid=0
> /var/lib/glusterd/vols/gv0/bricks/imagegluster2:-data-brick:brick-fsid=0
>
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data2-brick:brick-fsid=2065
> /var/lib/glusterd/vols/gv0/bricks/imagegluster3:-data-brick:brick-fsid=2065
>
>
> I already did try restarting the glusterd nodes with no effect, but
> that was before the upgrades of client versions.
>
> Running the "volume set" command did not seem to work either, the
> shared-brick-counts are still the same (2).
>
> However, when restarting a node, I do get an error and a few warnings
> in the log: https://pastebin.com/tqq1FCwZ
>
>
>
> On Wed, May 27, 2020 at 3:14 PM Sanju Rakonde  wrote:
> >
> > The shared-brick-count value indicates the number of bricks sharing a
> file-system. In your case, it should be one, as all the bricks are from
> different mount points. Can you please share the values of brick-fsid?
> >
> > grep "brick-fsid" /var/lib/glusterd/vols//bricks/
> >
> > I tried reproducing this issue in Fedora VMs but couldn't hit it. We
> are seeing this issue on and off but are unable to reproduce it in-house. If
> you see any error messages in glusterd.log, please share the log too.
> >
> > Work-around to come out from this situation:
> > 1. Restarting the glusterd service on all nodes:
> > # systemctl restart glusterd
> >
> > 2. Run set volume command to update vol file:
> > # gluster v set  min-free-disk 11%
> >
> > On Wed, May 27, 2020 at 5:24 PM Petr Certik  wrote:
> >>
> >> As far as I remember, there was no version update on the server. It
> >> was definitely installed as version 7.
> >>
> >> Shared bricks:
> >>
> >> Server 1:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
> >> shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
> >> shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
> >> shared-brick-count 0
> >>
> >> Server 2:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
> >> shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
> >> shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
> >> shared-brick-count 0
> >>
> >> Server 3:
> >>
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster1.data-brick.vol:option
> >> shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data2-brick.vol:
> >> option shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster2.data-brick.vol:option
> >> shared-brick-count 0
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data2-brick.vol:
> >> option shared-brick-count 2
> >> /var/lib/glusterd/vols/gv0/gv0.imagegluster3.data-brick.vol:option
> >> shared-brick-count 2
> >>
> >> On Wed, May 27, 2020 at 1:36 PM