Re: [Gluster-users] Error in Installing Glusterfs-4.1.6 from tar
Can I safely ignore the warning messages quoted in the mail below and continue with the installation? On Thu, Dec 27, 2018 at 5:11 PM Amudhan P wrote: > Thanks, Ravishankar, it worked. > Also, I am getting the following warning messages when running `make`; is it > safe to ignore them? > > dht-layout.c: In function ‘dht_layout_new’: > dht-layout.c:51:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (layout->ref, 1); > ^ > dht-layout.c:51:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > CC dht-helper.lo > > > CC ec.lo > ec.c: In function ‘ec_statistics_init’: > ec.c:637:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.hits, 0); > ^ > ec.c:637:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:638:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.misses, 0); > ^ > ec.c:638:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:639:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.updates, 0); > ^ > ec.c:639:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:640:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.invals, 0); > ^ > ec.c:640:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:641:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.evicts, 0); > ^ > ec.c:641:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:642:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.allocs, 0); > ^ > ec.c:642:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > ec.c:643:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT(ec->stats.stripe_cache.errors, 0); > ^ > ec.c:643:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > CC ec-data.lo > > > CCLD posix.la > .libs/posix-inode-fd-ops.o: In function `posix_do_chmod': > /home/qubevaultadmin/gluster-tar/glusterfs-4.1.6/xlators/storage/posix/src/posix-inode-fd-ops.c:203: > warning: lchmod is not implemented and will always fail > make[5]: Nothing to be done for 'all-am'. 
> > > CC client-handshake.lo > client-handshake.c: In function ‘clnt_fd_lk_local_create’: > client-handshake.c:150:9: warning: dereferencing type-punned pointer will > break strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (local->ref, 1); > ^ > client-handshake.c:150:9: warning: dereferencing type-punned pointer will > break strict-aliasing rules [-Wstrict-aliasing] > CC client-callback.lo > > CC readdir-ahead.lo > readdir-ahead.c: In function ‘init’: > readdir-ahead.c:637:9: warning: dereferencing type-punned pointer will > break strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (priv->rda_cache_size, 0); > ^ > readdir-ahead.c:637:9: warning: dereferencing type-punned pointer will > break strict-aliasing rules [-Wstrict-aliasing] > CCLD readdir-ahead.la > > Making all in src > CC md-cache.lo > md-cache.c: In function ‘mdc_init’: > md-cache.c:3431:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (conf->mdc_counter.stat_hit, 0); > ^ > md-cache.c:3431:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > md-cache.c:3432:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (conf->mdc_counter.stat_miss, 0); > ^ > md-cache.c:3432:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > md-cache.c:3433:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > GF_ATOMIC_INIT (conf->mdc_counter.xattr_hit, 0); > ^ > md-cache.c:3433:9: warning: dereferencing type-punned pointer will break > strict-aliasing rules [-Wstrict-aliasing] > md-cache.c:3434:9: warning: derefere
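For context on the question above: these -Wstrict-aliasing messages are warnings, not errors, so `make` still completes; the build only aborts on actual errors. A minimal sketch for confirming that the build succeeded while keeping the warnings around for later review (the path and job count are illustrative assumptions, not taken from the mails above):

cd ~/gluster-tar/glusterfs-4.1.6
# Capture the full compile output while still watching it scroll by.
make -j4 2>&1 | tee build.log
# make's exit status is in PIPESTATUS[0]; plain $? would report tee's status.
# 0 means the build succeeded even though warnings were printed.
echo "make exit status: ${PIPESTATUS[0]}"
# Count the strict-aliasing warnings for later reference.
grep -c 'Wstrict-aliasing' build.log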
Re: [Gluster-users] java application crashes while reading a zip file
On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev wrote: > Still no JVM crashes. Is it possible that running glusterfs with > performance options turned off for a couple of days cleared out the "stale > metadata issue"? > Restarting these options would have cleared the existing cache, and hence the previous stale metadata would have been cleared. Hitting stale metadata again depends on races; that might be the reason you are still not seeing the issue. Can you try enabling all perf xlators (the default configuration)? > > On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev > wrote: > >> The software ran with all of the options turned off over the weekend >> without any problems. >> I will try to collect the debug info for you. I have re-enabled the >> three options, but have yet to see the problem reoccur. >> >> >> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa >> wrote: >> >>> Thanks Dmitry. Can you provide the following debug info I asked earlier: >>> >>> * strace -ff -v ... of java application >>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while >>> mounting). >>> >>> regards, >>> Raghavendra >>> >>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev >>> wrote: >>> These 3 options seem to trigger both (reading zip file and renaming files) problems. Options Reconfigured: performance.io-cache: off performance.stat-prefetch: off performance.quick-read: off performance.parallel-readdir: off *performance.readdir-ahead: on* *performance.write-behind: on* *performance.read-ahead: on* performance.client-io-threads: off nfs.disable: on transport.address-family: inet On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev wrote: > Turning a single option on at a time still worked fine. I will keep > trying. > > We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log > messages. Do you suppose these issues are triggered by the new > environment, > or did they not exist in 4.1.5? > > [root@node1 ~]# glusterfs --version > glusterfs 4.1.5 > > On AWS using > [root@node1 ~]# hostnamectl >Static hostname: node1 > Icon name: computer-vm >Chassis: vm > Machine ID: b30d0f2110ac3807b210c19ede3ce88f >Boot ID: 52bb159a0aa94043a40e7c7651967bd9 > Virtualization: kvm > Operating System: CentOS Linux 7 (Core) >CPE OS Name: cpe:/o:centos:centos:7 > Kernel: Linux 3.10.0-862.3.2.el7.x86_64 > Architecture: x86-64 > > > > > On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa < > rgowd...@redhat.com> wrote: > >> >> >> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev >> wrote: >> >>> Ok. I will try different options. >>> >>> This system is scheduled to go into production soon. What version >>> would you recommend to roll back to? >>> >> >> These are long-standing issues. So, rolling back may not make these >> issues go away. Instead, if you think performance is agreeable to you, >> please keep these xlators off in production. >> >> >>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >>> rgowd...@redhat.com> wrote: >>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev wrote: > Raghavendra, > > Thanks for the suggestion. 
> > > I am using > > [root@jl-fanexoss1p glusterfs]# gluster --version > glusterfs 5.0 > > On > [root@jl-fanexoss1p glusterfs]# hostnamectl > Icon name: computer-vm >Chassis: vm > Machine ID: e44b8478ef7a467d98363614f4e50535 >Boot ID: eed98992fdda4c88bdd459a89101766b > Virtualization: vmware > Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server > Kernel: Linux 3.10.0-862.14.4.el7.x86_64 > Architecture: x86-64 > > > I have configured the following options > > [root@jl-fanexoss1p glusterfs]# gluster volume info > Volume Name: gv0 > Type: Replicate > Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 > Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 > Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 > Options Reconfigured: > performance.io-cache: off > performance.stat-prefetch: off > performance.quick-read: off > performance.parallel-readdir: off > perfo
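As a side note to Raghavendra's request to test with the default configuration: the per-volume overrides can be dropped with `gluster volume reset`. A minimal sketch, using the volume name gv0 from this thread (resetting the options one by one rather than all at once is just a cautious assumption on my part):

# Reset each performance xlator option back to its default on volume gv0.
for opt in performance.quick-read performance.io-cache performance.stat-prefetch \
           performance.read-ahead performance.write-behind performance.readdir-ahead \
           performance.parallel-readdir; do
    gluster volume reset gv0 "$opt"
done
# "gluster volume reset gv0" with no option name resets all reconfigured
# options in one go.
gluster volume info gv0   # verify the options are gone from "Options Reconfigured"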
Re: [Gluster-users] java application crashes while reading a zip file
Still no JVM crashes. Is it possible that running glusterfs with performance options turned off for a couple of days cleared out the "stale metadata issue"? On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev wrote: > The software ran with all of the options turned off over the weekend > without any problems. > I will try to collect the debug info for you. I have re-enabled the > three options, but have yet to see the problem reoccur. > > > On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa > wrote: > >> Thanks Dmitry. Can you provide the following debug info I asked earlier: >> >> * strace -ff -v ... of java application >> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while >> mounting). >> >> regards, >> Raghavendra >> >> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev >> wrote: >> >>> These 3 options seem to trigger both (reading zip file and renaming >>> files) problems. >>> >>> Options Reconfigured: >>> performance.io-cache: off >>> performance.stat-prefetch: off >>> performance.quick-read: off >>> performance.parallel-readdir: off >>> *performance.readdir-ahead: on* >>> *performance.write-behind: on* >>> *performance.read-ahead: on* >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev >>> wrote: >>> Turning a single option on at a time still worked fine. I will keep trying. We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log messages. Do you suppose these issues are triggered by the new environment, or did they not exist in 4.1.5? [root@node1 ~]# glusterfs --version glusterfs 4.1.5 On AWS using [root@node1 ~]# hostnamectl Static hostname: node1 Icon name: computer-vm Chassis: vm Machine ID: b30d0f2110ac3807b210c19ede3ce88f Boot ID: 52bb159a0aa94043a40e7c7651967bd9 Virtualization: kvm Operating System: CentOS Linux 7 (Core) CPE OS Name: cpe:/o:centos:centos:7 Kernel: Linux 3.10.0-862.3.2.el7.x86_64 Architecture: x86-64 On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa < rgowd...@redhat.com> wrote: > > > On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev > wrote: > >> Ok. I will try different options. >> >> This system is scheduled to go into production soon. What version >> would you recommend to roll back to? >> > > These are long-standing issues. So, rolling back may not make these > issues go away. Instead, if you think performance is agreeable to you, > please keep these xlators off in production. > > >> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >> rgowd...@redhat.com> wrote: >> >>> >>> >>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev >>> wrote: >>> Raghavendra, Thanks for the suggestion. 
I am using [root@jl-fanexoss1p glusterfs]# gluster --version glusterfs 5.0 On [root@jl-fanexoss1p glusterfs]# hostnamectl Icon name: computer-vm Chassis: vm Machine ID: e44b8478ef7a467d98363614f4e50535 Boot ID: eed98992fdda4c88bdd459a89101766b Virtualization: vmware Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server Kernel: Linux 3.10.0-862.14.4.el7.x86_64 Architecture: x86-64 I have configured the following options [root@jl-fanexoss1p glusterfs]# gluster volume info Volume Name: gv0 Type: Replicate Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 Options Reconfigured: performance.io-cache: off performance.stat-prefetch: off performance.quick-read: off performance.parallel-readdir: off performance.readdir-ahead: off performance.write-behind: off performance.read-ahead: off performance.client-io-threads: off nfs.disable: on transport.address-family: inet I don't know if it is related, but I am seeing a lot of [2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address] [2018-12-27 20:19:47.735190]
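For reference, the debug info Raghavendra asked for earlier in this thread could be gathered roughly as follows; the mount point, volfile server, and application command line are illustrative assumptions, not taken from the mails:

# 1. Trace all system calls of the java application; -ff writes one
#    output file per process/thread (/tmp/java-trace.<pid>).
strace -ff -v -o /tmp/java-trace java -jar /path/to/app.jar

# 2. Mount the volume with a FUSE traffic dump (--dump-fuse, as suggested
#    above), then reproduce the problem against /mnt/gv0.
glusterfs --volfile-server=jl-fanexoss1p.cspire.net --volfile-id=gv0 \
    --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0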
[Gluster-users] Multiple versions on the same machine, errors on glusterd startup
Hi, I'm trying to host several GlusterFS versions on the same machine (3.8.8, 4.1.6 and 5.2), not to be run together, of course. I built them with the following procedure (examples with 3.8.8): git clone https://github.com/gluster/glusterfs . git checkout v3.8.8 ./autogen.sh ./configure --program-suffix="-3.8.8" make sudo make install sudo cp -a extras/systemd/glusterd.service /etc/systemd/system/glusterd-3.8.8.service sudo systemctl load glusterd-3.8.8 I had to edit the service for it to execute the right version of glusterd: ExecStart=/usr/local/sbin/glusterd*-3.8.8* -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS And I had to create symlinks for glusterfsd: cd /usr/local/sbin ln -s glusterd-3.8.8 glusterfsd-3.8.8 I also ran ldconfig for good measure... sudo ldconfig When I run glusterd in the foreground (not even with systemd) I'm left with some errors and the process exits (errors emphasized): user@host0:~/glusterfs-3.8.8 on e5f3a990c [!☡]# sudo glusterd-3.8.8 --debug [2019-01-01 16:23:37.120684] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-glusterd-3.8.8: Started running glusterd-3.8.8 version 3.8.8 (args: glusterd-3.8.8 --debug) [2019-01-01 16:23:37.120765] D [logging.c:1791:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2019-01-01 16:23:37.121187] D [MSGID: 0] [glusterfsd.c:660:get_volfp] 0-glusterfsd: loading volume file /usr/local/etc/glusterfs/glusterd.vol [2019-01-01 16:23:37.137003] I [MSGID: 106478] [glusterd.c:1379:init] 0-management: Maximum allowed open file descriptors set to 65536 [2019-01-01 16:23:37.137064] I [MSGID: 106479] [glusterd.c:1428:init] 0-management: Using /var/lib/glusterd as working directory [2019-01-01 16:23:37.137262] D [MSGID: 0] [glusterd.c:406:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 128 [2019-01-01 16:23:37.137683] D [rpcsvc.c:2316:rpcsvc_init] 0-rpc-service: RPC service inited. 
[2019-01-01 16:23:37.137723] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0 [2019-01-01 16:23:37.137798] D [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/local/lib/glusterfs/3.8.8/rpc-transport/socket.so [2019-01-01 16:23:37.151778] D [socket.c:3938:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0 [2019-01-01 16:23:37.151823] D [socket.c:4021:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled [2019-01-01 16:23:37.151862] D [socket.c:4024:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled [2019-01-01 16:23:37.151890] D [socket.c:4041:socket_init] 0-socket.management: using system polling thread [2019-01-01 16:23:37.151927] D [name.c:584:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet [2019-01-01 16:23:37.152173] D [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/local/lib/glusterfs/3.8.8/rpc-transport/rdma.so [2019-01-01 16:23:37.155510] D [rpc-transport.c:321:rpc_transport_load] 0-rpc-transport: dlsym (gf_rpc_transport_reconfigure) on /usr/local/lib/glusterfs/3.8.8/rpc-transport/rdma.so: undefined symbol: reconfigure [2019-01-01 16:23:37.155830] W [MSGID: 103071] [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device] [2019-01-01 16:23:37.155884] W [MSGID: 103055] [rdma.c:4896:init] 0-rdma.management: Failed to initialize IB Device [2019-01-01 16:23:37.155920] W [rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2019-01-01 16:23:37.156224] W [rpcsvc.c:1638:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed *[2019-01-01 16:23:37.156258] E [MSGID: 106243] [glusterd.c:1652:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport* [2019-01-01 16:23:37.156300] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0 [2019-01-01 16:23:37.156332] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0 [2019-01-01 16:23:37.156356] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0 [2019-01-01 16:23:37.156384] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0 [2019-01-01 16:23:37.156414] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0 [2019-01-01 16:23:37.156438] D [rpcsvc.c:1866:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0 [2019-01-01 16:23:37.1564
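One possible cause of failures like the above is that all three builds install into the same /usr/local prefix, so they share /usr/local/lib and the xlator directories and can load each other's libraries. A hedged alternative is to give each version a fully isolated prefix; a minimal sketch (the /opt paths are my assumption, not from the original mail):

# Build each version into its own prefix so binaries, xlators, and
# libglusterfs never mix across versions.
git clone https://github.com/gluster/glusterfs glusterfs-3.8.8
cd glusterfs-3.8.8
git checkout v3.8.8
./autogen.sh
./configure --prefix=/opt/glusterfs-3.8.8
make
sudo make install
# Run in the foreground from its own prefix; no --program-suffix or
# symlinks are needed, since nothing else lives under this prefix.
sudo /opt/glusterfs-3.8.8/sbin/glusterd --debug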
Re: [Gluster-users] [Stale file handle] in shard volume
Hi Nithya, Thank you for your reply. The VMs using the gluster volumes keep getting paused/stopped on errors like these; [2019-01-02 02:33:44.469132] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 101487 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle] [2019-01-02 02:33:44.563288] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 101488 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle] What I'm trying to find out is whether I can purge all gluster volumes of all possible stale file handles (and hopefully find a method to prevent this in the future), so the VMs can run stably again. For this I need to know when the "shard_common_lookup_shards_cbk" function considers a file stale. The statement "Stale file handle errors show up when a file with a specified gfid is not found." doesn't seem to cover it all; as I've shown in earlier mails, the shard file and the glusterfs/xx/xx/uuid file both exist, and have the same inode. If the criteria I'm using aren't correct, could you please tell me which criteria I should use to determine whether a file is stale or not? These criteria are just based on observations I made while moving the stale files manually. After removing them I was able to start the VM again... until some time later it hung on another stale shard file, unfortunately. Thanks Olaf Op wo 2 jan. 2019 om 14:20 schreef Nithya Balachandran : > > > On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar > wrote: > >> Dear All, >> >> till now a selected group of VMs still seems to produce new stale files >> and gets paused due to this. >> I've not updated gluster recently; however, I did change the op version >> from 31200 to 31202 about a week before this issue arose. >> Looking at the .shard directory, I've 100,000+ files sharing the same >> characteristics as the stale files found till now: >> they all have the sticky bit set (file permissions -T), are >> 0 KB in size, and have the trusted.glusterfs.dht.linkto attribute. >> > > These are internal files used by gluster and do not necessarily mean they > are stale. They "point" to data files which may be on different bricks > (same name, gfid etc. but no linkto xattr and no T permissions). > > >> These files range from long ago (the beginning of the year) till now, which >> makes me suspect this was lying dormant for some time and somehow >> recently surfaced. >> Other sub-volumes also contain 0 KB files in the .shard >> directory, but those don't have the sticky bit and the linkto attribute. >> >> Does anybody else experience this issue? Could this be a bug or an >> environmental issue? >> > These are most likely valid files - please do not delete them without > double-checking. > > Stale file handle errors show up when a file with a specified gfid is not > found. You will need to debug the files for which you see this error by > checking the bricks to see if they actually exist. > >> >> Also, I wonder if there is any tool or gluster command to clean all stale >> file handles. >> Otherwise I'm planning to write a simple bash script that iterates over >> the .shard dir, checks each file for the above-mentioned criteria, and >> (re)moves the file and the corresponding .glusterfs file. >> If there are other criteria needed to identify a stale file handle, I >> would like to hear them. >> If this is a viable and safe operation to do, of course. 
>> >> Thanks Olaf >> >> >> >> Op do 20 dec. 2018 om 13:43 schreef Olaf Buitelaar < >> olaf.buitel...@gmail.com>: >> >>> Dear All, >>> >>> I figured it out; it appeared to be the exact same issue as described >>> here; >>> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html >>> Another subvolume also had the shard files, only they were all 0 bytes and had >>> the dht.linkto attribute >>> >>> for reference; >>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex >>> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >>> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >>> >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >>> >>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030 >>> >>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100 >>> >>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex >>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >>> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >>> >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >>> >>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d65386
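A read-only first step for the script Olaf describes could simply list the candidates without touching anything. A minimal sketch, assuming GNU find and getfattr; the brick path is a placeholder, and per Nithya's reply in this thread such files can be perfectly valid DHT linkto files, so this only reports and never removes:

# List .shard files matching the described criteria: sticky bit set,
# 0 bytes, and carrying the trusted.glusterfs.dht.linkto xattr.
cd /path/to/brick/.shard
find . -maxdepth 1 -type f -size 0 -perm -1000 | while read -r f; do
    if getfattr --absolute-names -n trusted.glusterfs.dht.linkto "$f" >/dev/null 2>&1; then
        echo "candidate: $f"
    fi
done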
Re: [Gluster-users] [Stale file handle] in shard volume
On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar wrote: > Dear All, > > till now a selected group of VMs still seems to produce new stale files > and gets paused due to this. > I've not updated gluster recently; however, I did change the op version > from 31200 to 31202 about a week before this issue arose. > Looking at the .shard directory, I've 100,000+ files sharing the same > characteristics as the stale files found till now: > they all have the sticky bit set (file permissions -T), are > 0 KB in size, and have the trusted.glusterfs.dht.linkto attribute. > These are internal files used by gluster and do not necessarily mean they are stale. They "point" to data files which may be on different bricks (same name, gfid etc. but no linkto xattr and no T permissions). > These files range from long ago (the beginning of the year) till now, which > makes me suspect this was lying dormant for some time and somehow > recently surfaced. > Other sub-volumes also contain 0 KB files in the .shard > directory, but those don't have the sticky bit and the linkto attribute. > > Does anybody else experience this issue? Could this be a bug or an > environmental issue? > These are most likely valid files - please do not delete them without double-checking. Stale file handle errors show up when a file with a specified gfid is not found. You will need to debug the files for which you see this error by checking the bricks to see if they actually exist. > > Also, I wonder if there is any tool or gluster command to clean all stale > file handles. > Otherwise I'm planning to write a simple bash script that iterates over > the .shard dir, checks each file for the above-mentioned criteria, and > (re)moves the file and the corresponding .glusterfs file. > If there are other criteria needed to identify a stale file handle, I > would like to hear them. > If this is a viable and safe operation to do, of course. > > Thanks Olaf > > > > Op do 20 dec. 2018 om 13:43 schreef Olaf Buitelaar < > olaf.buitel...@gmail.com>: > >> Dear All, >> >> I figured it out; it appeared to be the exact same issue as described >> here; >> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html >> Another subvolume also had the shard files, only they were all 0 bytes and had >> the dht.linkto attribute >> >> for reference; >> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex >> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >> >> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >> >> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030 >> >> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100 >> >> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . 
-e hex >> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> >> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >> >> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030 >> >> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100 >> >> [root@lease-04 ovirt-backbone-2]# stat >> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> File: ‘.glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d’ >> Size: 0 Blocks: 0 IO Block: 4096 regular empty >> file >> Device: fd01h/64769d Inode: 1918631406 Links: 2 >> Access: (1000/-T) Uid: (0/root) Gid: (0/root) >> Context: system_u:object_r:etc_runtime_t:s0 >> Access: 2018-12-17 21:43:36.405735296 + >> Modify: 2018-12-17 21:43:36.405735296 + >> Change: 2018-12-17 21:43:36.405735296 + >> Birth: - >> >> Removing the shard file and glusterfs file from each node resolved the >> issue. >> >> I also found this thread; >> https://lists.gluster.org/pipermail/gluster-users/2018-December/035460.html >> Maybe he suffers from the same issue. >> >> Best Olaf >> >> >> Op wo 19 dec. 2018 om 21:56 schreef Olaf Buitelaar < >> olaf.buitel...@gmail.com>: >> >>> Dear All, >>> >>> It appears I have stale file handles in one of the volumes, on 2 files. These >>> files are qemu images (1 raw and 1 qcow2). >>> I'll just focus on 1 file since the situation on the other seems the >>> same. >>> >>> The VM gets paused more or less directly after being booted, with error; >>> [2018-12-18 14:05:05.275713] E [MSGID: 133010] >>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard: >>> Lookup on shard 51500 failed
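Nithya's suggestion of checking whether the file actually exists on the bricks can be scripted. A minimal sketch, using the gfid-to-path layout visible above (.glusterfs/<first two hex chars>/<next two>/<full gfid>); the host list and brick path are placeholders, not taken from this thread's volume definition:

# Check whether a given gfid's .glusterfs entry exists on each brick.
GFID="298147e4-9f97-48b2-baf1-c8fff897244d"
REL=".glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"   # -> .glusterfs/29/81/<gfid>
for host in lease-04 lease-05; do
    echo "== $host =="
    ssh "$host" "stat /path/to/brick/$REL 2>/dev/null || echo 'not present on this brick'"
done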
Re: [Gluster-users] [Stale file handle] in shard volume
Dear All, The bash script I'm planning to run can be found here; https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986 It would be nice to receive some feedback from the community before I actually run the clean-up of all stale file handles. Thanks Olaf Op zo 30 dec. 2018 om 20:56 schreef Olaf Buitelaar : > Dear All, > > till now a selected group of VMs still seems to produce new stale files > and gets paused due to this. > I've not updated gluster recently; however, I did change the op version > from 31200 to 31202 about a week before this issue arose. > Looking at the .shard directory, I've 100,000+ files sharing the same > characteristics as the stale files found till now: > they all have the sticky bit set (file permissions -T), are > 0 KB in size, and have the trusted.glusterfs.dht.linkto attribute. > These files range from long ago (the beginning of the year) till now, which > makes me suspect this was lying dormant for some time and somehow > recently surfaced. > Other sub-volumes also contain 0 KB files in the .shard > directory, but those don't have the sticky bit and the linkto attribute. > > Does anybody else experience this issue? Could this be a bug or an > environmental issue? > > Also, I wonder if there is any tool or gluster command to clean all stale > file handles. > Otherwise I'm planning to write a simple bash script that iterates over > the .shard dir, checks each file for the above-mentioned criteria, and > (re)moves the file and the corresponding .glusterfs file. > If there are other criteria needed to identify a stale file handle, I > would like to hear them. > If this is a viable and safe operation to do, of course. > > Thanks Olaf > > > > Op do 20 dec. 2018 om 13:43 schreef Olaf Buitelaar < > olaf.buitel...@gmail.com>: > >> Dear All, >> >> I figured it out; it appeared to be the exact same issue as described >> here; >> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html >> Another subvolume also had the shard files, only they were all 0 bytes and had >> the dht.linkto attribute >> >> for reference; >> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex >> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 >> >> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >> >> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030 >> >> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100 >> >> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . 
-e hex >> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> >> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 >> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d >> >> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030 >> >> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100 >> >> [root@lease-04 ovirt-backbone-2]# stat >> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d >> File: ‘.glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d’ >> Size: 0 Blocks: 0 IO Block: 4096 regular empty >> file >> Device: fd01h/64769d Inode: 1918631406 Links: 2 >> Access: (1000/-T) Uid: (0/root) Gid: (0/root) >> Context: system_u:object_r:etc_runtime_t:s0 >> Access: 2018-12-17 21:43:36.405735296 + >> Modify: 2018-12-17 21:43:36.405735296 + >> Change: 2018-12-17 21:43:36.405735296 + >> Birth: - >> >> Removing the shard file and glusterfs file from each node resolved the >> issue. >> >> I also found this thread; >> https://lists.gluster.org/pipermail/gluster-users/2018-December/035460.html >> Maybe he suffers from the same issue. >> >> Best Olaf >> >> >> Op wo 19 dec. 2018 om 21:56 schreef Olaf Buitelaar < >> olaf.buitel...@gmail.com>: >> >>> Dear All, >>> >>> It appears I have stale file handles in one of the volumes, on 2 files. These >>> files are qemu images (1 raw and 1 qcow2). >>> I'll just focus on 1 file since the situation on the other seems the >>> same. >>> >>> The VM gets paused more or less directly after being booted, with error; >>> [2018-12-18 14:05:05.275713] E [MSGID: 133010] >>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard: >>> Lookup on shard 51500 failed. Base file gfid = >>> f28cabcb-d169-41fc-a633-9bef4c4a8e40 [Stale file handle] >>> >>> investigating the shard; >>> >>> #on the arbiter node: >>> >>> [root@lease-05 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string >>> /mnt
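Regarding the request for feedback on the clean-up script: whatever the final logic, a dry-run mode that only prints what would be removed seems prudent before letting it loose on 100,000+ files. A generic sketch of the pattern (placeholder paths; this is not the gist's actual code):

# Dry-run wrapper: print the removal instead of executing it.
DRYRUN=1
remove_stale() {
    local shard_file="$1" gfid_file="$2"
    if [ "$DRYRUN" -eq 1 ]; then
        echo "would remove: $shard_file and $gfid_file"
    else
        rm -- "$shard_file" "$gfid_file"
    fi
}
# Example invocation with placeholder paths from this thread:
remove_stale ".shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500" \
             ".glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d"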
Re: [Gluster-users] On making ctime generator enabled by default in stack
On Mon, Nov 12, 2018 at 10:48 AM Amar Tumballi wrote: > > > On Mon, Nov 12, 2018 at 10:39 AM Vijay Bellur wrote: > >> >> >> On Sun, Nov 11, 2018 at 8:25 PM Raghavendra Gowdappa >> wrote: >> >>> >>> >>> On Sun, Nov 11, 2018 at 11:41 PM Vijay Bellur >>> wrote: >>> On Mon, Nov 5, 2018 at 8:31 PM Raghavendra Gowdappa < rgowd...@redhat.com> wrote: > > > On Tue, Nov 6, 2018 at 9:58 AM Vijay Bellur > wrote: > >> >> >> On Mon, Nov 5, 2018 at 7:56 PM Raghavendra Gowdappa < >> rgowd...@redhat.com> wrote: >> >>> All, >>> >>> There is a patch [1] from Kotresh which makes the ctime generator >>> default in the stack. Currently the ctime generator is recommended only >>> for >>> use cases where ctime is important (like for Elasticsearch). However, a >>> reliable (c)(m)time can fix many consistency issues within the glusterfs >>> stack >>> too. These are issues with caching layers having stale (meta)data >>> [2][3][4]. Basically, just like applications, components within the glusterfs >>> stack need a timestamp to find out which among racing ops (like write, >>> stat, >>> etc.) has the latest (meta)data. >>> >>> Also note that a consistent (c)(m)time is not an optional feature, >>> but instead forms the core of the infrastructure. So, I am proposing to >>> merge this patch. If you have any objections, please voice them before Nov >>> 13, >>> 2018 (a week from today). >>> >>> As to the existing known issues/limitations with the ctime generator, my >>> conversations with Kotresh revealed the following: >>> * Potential performance degradation (we don't yet have data to >>> conclusively prove it; preliminary basic tests from Kotresh didn't >>> indicate >>> a significant perf drop). >>> >> >> Do we have this data captured somewhere? If not, would it be possible >> to share that data here? >> > > I misquoted Kotresh. He had measured the impact of gfid2path and said both > features might have a similar impact, as the major perf cost is related to storing > xattrs on the backend fs. I am in the process of getting a fresh set of > numbers. Will post those numbers when available. > > I observe that the patch under discussion has been merged now [1]. A quick search did not yield any performance data. Do we have the performance numbers posted somewhere? >>> >>> No. Perf benchmarking is a task pending on me. >>> >> >> When can we expect this task to be complete? >> >> In any case, I don't think it is ideal for us to merge a patch without >> completing our due diligence on it. How do we want to handle this scenario >> since the patch is already merged? >> >> We could: >> >> 1. Revert the patch now >> 2. Review the performance data and revert the patch if performance >> characterization indicates a significant dip. It would be preferable to >> complete this activity before we branch off for the next release. >> > > I am for option 2. Considering the branch-out for the next release is another > two months away, and no one is expected to use the 'release' off a master branch > yet, it makes sense to give that buffer time to get this activity completed. > It's unlikely I'll have time to carry out the perf benchmark. Hence I've posted a revert here: https://review.gluster.org/#/c/glusterfs/+/21975/ > Regards, > Amar > > 3. Think of some other option? >> >> Thanks, >> Vijay > > -- > Amar Tumballi (amarts)
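For anyone wanting to benchmark the feature themselves, it is exposed as a per-volume option in releases that carry it. A minimal sketch; the option name features.ctime is my assumption of the current spelling and should be verified against `gluster volume set help` for your release:

# Toggle the consistent-time (ctime) feature on a test volume and compare.
gluster volume set testvol features.ctime on
# ... run the workload, measure ...
gluster volume set testvol features.ctime off
# Inspect the current value:
gluster volume get testvol features.ctime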