Re: [Gluster-users] Error in Installing Glusterfs-4.1.6 from tar

2019-01-02 Thread Amudhan P
Can I skip the warning messages in the mail below and continue with the
installation?
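For instance, would it be acceptable to simply rebuild with strict-aliasing
optimizations disabled, something along these lines (just a sketch; the
"-g -O2" baseline is an assumption about the default flags)?

./configure CFLAGS="-g -O2 -fno-strict-aliasing"
make
echo $?          # 0 would mean the build itself succeeded despite the warnings
sudo make install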

On Thu, Dec 27, 2018 at 5:11 PM Amudhan P  wrote:

> Thanks, Ravishankar, it worked.
> Also, I am getting the following warning messages when running `make`; is it
> safe to skip them?
>
> dht-layout.c: In function ‘dht_layout_new’:
> dht-layout.c:51:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (layout->ref, 1);
>  ^
> dht-layout.c:51:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>   CC   dht-helper.lo
>
>
>   CC   ec.lo
> ec.c: In function ‘ec_statistics_init’:
> ec.c:637:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.hits, 0);
>  ^
> ec.c:637:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:638:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.misses, 0);
>  ^
> ec.c:638:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:639:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.updates, 0);
>  ^
> ec.c:639:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:640:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.invals, 0);
>  ^
> ec.c:640:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:641:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.evicts, 0);
>  ^
> ec.c:641:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:642:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.allocs, 0);
>  ^
> ec.c:642:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> ec.c:643:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT(ec->stats.stripe_cache.errors, 0);
>  ^
> ec.c:643:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>   CC   ec-data.lo
>
>
>   CCLD posix.la
> .libs/posix-inode-fd-ops.o: In function `posix_do_chmod':
> /home/qubevaultadmin/gluster-tar/glusterfs-4.1.6/xlators/storage/posix/src/posix-inode-fd-ops.c:203:
> warning: lchmod is not implemented and will always fail
> make[5]: Nothing to be done for 'all-am'.
>
>
>  CC   client-handshake.lo
> client-handshake.c: In function ‘clnt_fd_lk_local_create’:
> client-handshake.c:150:9: warning: dereferencing type-punned pointer will
> break strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (local->ref, 1);
>  ^
> client-handshake.c:150:9: warning: dereferencing type-punned pointer will
> break strict-aliasing rules [-Wstrict-aliasing]
>   CC   client-callback.lo
>
>   CC   readdir-ahead.lo
> readdir-ahead.c: In function ‘init’:
> readdir-ahead.c:637:9: warning: dereferencing type-punned pointer will
> break strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (priv->rda_cache_size, 0);
>  ^
> readdir-ahead.c:637:9: warning: dereferencing type-punned pointer will
> break strict-aliasing rules [-Wstrict-aliasing]
>   CCLD readdir-ahead.la
>
> Making all in src
>   CC   md-cache.lo
> md-cache.c: In function ‘mdc_init’:
> md-cache.c:3431:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (conf->mdc_counter.stat_hit, 0);
>  ^
> md-cache.c:3431:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> md-cache.c:3432:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (conf->mdc_counter.stat_miss, 0);
>  ^
> md-cache.c:3432:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> md-cache.c:3433:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
>  GF_ATOMIC_INIT (conf->mdc_counter.xattr_hit, 0);
>  ^
> md-cache.c:3433:9: warning: dereferencing type-punned pointer will break
> strict-aliasing rules [-Wstrict-aliasing]
> md-cache.c:3434:9: warning: derefere

Re: [Gluster-users] java application crashes while reading a zip file

2019-01-02 Thread Raghavendra Gowdappa
On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev  wrote:

> Still no JVM crashes.  Is it possible that running glusterfs with the
> performance options turned off for a couple of days cleared out the "stale
> metadata issue"?
>

Restarting with these options off would've cleared the existing cache, and
hence the previously cached stale metadata. Hitting stale metadata again
depends on races, which might be why you are still not seeing the issue. Can
you try enabling all the perf xlators again (the default configuration)?
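For example, something along these lines would put the reconfigured
performance options back to their defaults (a sketch; the volume name gv0 is
taken from the volume info later in this thread, adjust as needed):

# reset each reconfigured performance option back to its default value
gluster volume reset gv0 performance.quick-read
gluster volume reset gv0 performance.io-cache
gluster volume reset gv0 performance.stat-prefetch
gluster volume reset gv0 performance.parallel-readdir
gluster volume reset gv0 performance.readdir-ahead
gluster volume reset gv0 performance.write-behind
gluster volume reset gv0 performance.read-ahead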


>
> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev 
> wrote:
>
>> The software ran with all of the options turned off over the weekend
>> without any problems.
>> I will try to collect the debug info for you.  I have re-enabled the
>> three options, but have yet to see the problem reoccur.
>>
>>
>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa 
>> wrote:
>>
>>> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>>>
>>> * strace -ff -v ... of java application
>>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
>>> mounting).
>>>
>>> regards,
>>> Raghavendra
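To be concrete, something like the following would capture both (a sketch; the
application jar, output paths and mount point below are placeholders):

strace -ff -v -o /tmp/app.strace java -jar <your-application>.jar
# and mount the volume with FUSE traffic dumping enabled:
glusterfs --volfile-server=<server> --volfile-id=<volume> \
    --dump-fuse=/tmp/fuse-traffic.dump /mnt/<mountpoint>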
>>>
>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
>>> wrote:
>>>
 These 3 options seem to trigger both (reading zip file and renaming
 files) problems.

 Options Reconfigured:
 performance.io-cache: off
 performance.stat-prefetch: off
 performance.quick-read: off
 performance.parallel-readdir: off
 *performance.readdir-ahead: on*
 *performance.write-behind: on*
 *performance.read-ahead: on*
 performance.client-io-threads: off
 nfs.disable: on
 transport.address-family: inet


 On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
 wrote:

> Turning a single option on at a time still worked fine.  I will keep
> trying.
>
> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
> messages.  Do you suppose these issues are triggered by the new
> environment, or did they not exist in 4.1.5?
>
> [root@node1 ~]# glusterfs --version
> glusterfs 4.1.5
>
> On AWS using
> [root@node1 ~]# hostnamectl
>Static hostname: node1
>  Icon name: computer-vm
>Chassis: vm
> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>Boot ID: 52bb159a0aa94043a40e7c7651967bd9
> Virtualization: kvm
>   Operating System: CentOS Linux 7 (Core)
>CPE OS Name: cpe:/o:centos:centos:7
> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>   Architecture: x86-64
>
>
>
>
> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
> rgowd...@redhat.com> wrote:
>
>>
>>
>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
>> wrote:
>>
>>> Ok. I will try different options.
>>>
>>> This system is scheduled to go into production soon.  What version
>>> would you recommend to roll back to?
>>>
>>
>> These are long standing issues. So, rolling back may not make these
>> issues go away. Instead if you think performance is agreeable to you,
>> please keep these xlators off in production.
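For example (a sketch; gv0 is the volume name from the volume info quoted
below):

gluster volume set gv0 performance.quick-read off
gluster volume set gv0 performance.io-cache off
gluster volume set gv0 performance.stat-prefetch off
gluster volume set gv0 performance.parallel-readdir off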
>>
>>
>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>


 On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
 wrote:

> Raghavendra,
>
> Thanks for the suggestion.
>
>
> I am using
>
> [root@jl-fanexoss1p glusterfs]# gluster --version
> glusterfs 5.0
>
> On
> [root@jl-fanexoss1p glusterfs]# hostnamectl
>  Icon name: computer-vm
>Chassis: vm
> Machine ID: e44b8478ef7a467d98363614f4e50535
>Boot ID: eed98992fdda4c88bdd459a89101766b
> Virtualization: vmware
>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>   Architecture: x86-64
>
>
> I have configured the following options
>
> [root@jl-fanexoss1p glusterfs]# gluster volume info
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
> Options Reconfigured:
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.parallel-readdir: off
> perfo

Re: [Gluster-users] java application crashes while reading a zip file

2019-01-02 Thread Dmitry Isakbayev
Still no JVM crashes.  Is it possible that running glusterfs with the
performance options turned off for a couple of days cleared out the "stale
metadata issue"?


On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev  wrote:

> The software ran with all of the options turned off over the weekend
> without any problems.
> I will try to collect the debug info for you.  I have re-enabled the
> three options, but have yet to see the problem reoccur.
>
>
> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa 
> wrote:
>
>> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>>
>> * strace -ff -v ... of java application
>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
>> mounting).
>>
>> regards,
>> Raghavendra
>>
>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
>> wrote:
>>
>>> These 3 options seem to trigger both (reading zip file and renaming
>>> files) problems.
>>>
>>> Options Reconfigured:
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.quick-read: off
>>> performance.parallel-readdir: off
>>> *performance.readdir-ahead: on*
>>> *performance.write-behind: on*
>>> *performance.read-ahead: on*
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>>
>>>
>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
>>> wrote:
>>>
 Turning a single option on at a time still worked fine.  I will keep
 trying.

 We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
 messages.  Do you suppose these issues are triggered by the new environment,
 or did they not exist in 4.1.5?

 [root@node1 ~]# glusterfs --version
 glusterfs 4.1.5

 On AWS using
 [root@node1 ~]# hostnamectl
Static hostname: node1
  Icon name: computer-vm
Chassis: vm
 Machine ID: b30d0f2110ac3807b210c19ede3ce88f
Boot ID: 52bb159a0aa94043a40e7c7651967bd9
 Virtualization: kvm
   Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
 Kernel: Linux 3.10.0-862.3.2.el7.x86_64
   Architecture: x86-64




 On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
 rgowd...@redhat.com> wrote:

>
>
> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
> wrote:
>
>> Ok. I will try different options.
>>
>> This system is scheduled to go into production soon.  What version
>> would you recommend to roll back to?
>>
>
> These are long standing issues. So, rolling back may not make these
> issues go away. Instead if you think performance is agreeable to you,
> please keep these xlators off in production.
>
>
>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
>>> wrote:
>>>
 Raghavendra,

 Thanks for the suggestion.


 I am using

 [root@jl-fanexoss1p glusterfs]# gluster --version
 glusterfs 5.0

 On
 [root@jl-fanexoss1p glusterfs]# hostnamectl
  Icon name: computer-vm
Chassis: vm
 Machine ID: e44b8478ef7a467d98363614f4e50535
Boot ID: eed98992fdda4c88bdd459a89101766b
 Virtualization: vmware
   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
 Kernel: Linux 3.10.0-862.14.4.el7.x86_64
   Architecture: x86-64


 I have configured the following options

 [root@jl-fanexoss1p glusterfs]# gluster volume info
 Volume Name: gv0
 Type: Replicate
 Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 1 x 3 = 3
 Transport-type: tcp
 Bricks:
 Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
 Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
 Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
 Options Reconfigured:
 performance.io-cache: off
 performance.stat-prefetch: off
 performance.quick-read: off
 performance.parallel-readdir: off
 performance.readdir-ahead: off
 performance.write-behind: off
 performance.read-ahead: off
 performance.client-io-threads: off
 nfs.disable: on
 transport.address-family: inet

 I don't know if it is related, but I am seeing a lot of
 [2018-12-27 20:19:23.776080] W [MSGID: 114031]
 [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
 operation failed [No such device or address]
 [2018-12-27 20:19:47.735190] 

[Gluster-users] Multiple versions on the same machine, errors on glusterd startup

2019-01-02 Thread Raphaël Yancey

Hi,

I'm trying to host several GlusterFS versions on the same machine
(3.8.8, 4.1.6 and 5.2), not to be run together, of course.


I built them with the following procedure (examples with 3.8.8):


git clone https://github.com/gluster/glusterfs .
git checkout v3.8.8
./autogen.sh
./configure --program-suffix="-3.8.8"
make
sudo make install
sudo cp -a extras/systemd/glusterd.service 
/etc/systemd/system/glusterd-3.8.8.service

sudo systemctl load glusterd-3.8.8

I had to edit the service for it to execute the right version of glusterd:

ExecStart=/usr/local/sbin/glusterd*-3.8.8* -p /var/run/glusterd.pid  
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS


And I had to create symlinks for glusterd:


cd /usr/local/sbin
ln -s glusterd-3.8.8 glusterfsd-3.8.8

I also ran ldconfig for good measure...


sudo ldconfig
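An alternative (sketch only, not what I did above) would be to give every
version its own prefix, so the xlators and shared libraries of the different
builds cannot collide:

./configure --prefix=/opt/glusterfs-3.8.8
make
sudo make install
# glusterd-3.8.8.service would then use something like:
# ExecStart=/opt/glusterfs-3.8.8/sbin/glusterd -p /var/run/glusterd-3.8.8.pid ...
# (the dynamic loader may still need /opt/glusterfs-3.8.8/lib in ld.so.conf.d
#  or LD_LIBRARY_PATH, depending on how the binaries were linked)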


When I run glusterd in the foreground (not even with systemd) I'm left 
with some errors and the process exits (errors emphasized):


user@host0:~/glusterfs-3.8.8 on e5f3a990c [!☡]# sudo glusterd-3.8.8 
--debug
[2019-01-01 16:23:37.120684] I [MSGID: 100030] 
[glusterfsd.c:2454:main] 0-glusterd-3.8.8: Started running 
glusterd-3.8.8 version 3.8.8 (args: glusterd-3.8.8 --debug)
[2019-01-01 16:23:37.120765] D 
[logging.c:1791:__gf_log_inject_timer_event] 0-logging-infra: Starting 
timer now. Timeout = 120, current buf size = 5
[2019-01-01 16:23:37.121187] D [MSGID: 0] [glusterfsd.c:660:get_volfp] 
0-glusterfsd: loading volume file /usr/local/etc/glusterfs/glusterd.vol
[2019-01-01 16:23:37.137003] I [MSGID: 106478] [glusterd.c:1379:init] 
0-management: Maximum allowed open file descriptors set to 65536
[2019-01-01 16:23:37.137064] I [MSGID: 106479] [glusterd.c:1428:init] 
0-management: Using /var/lib/glusterd as working directory
[2019-01-01 16:23:37.137262] D [MSGID: 0] 
[glusterd.c:406:glusterd_rpcsvc_options_build] 0-glusterd: 
listen-backlog value: 128
[2019-01-01 16:23:37.137683] D [rpcsvc.c:2316:rpcsvc_init] 
0-rpc-service: RPC service inited.
[2019-01-01 16:23:37.137723] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 
1, Port: 0
[2019-01-01 16:23:37.137798] D 
[rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: attempt to 
load file /usr/local/lib/glusterfs/3.8.8/rpc-transport/socket.so
[2019-01-01 16:23:37.151778] D [socket.c:3938:socket_init] 
0-socket.management: Configued transport.tcp-user-timeout=0
[2019-01-01 16:23:37.151823] D [socket.c:4021:socket_init] 
0-socket.management: SSL support on the I/O path is NOT enabled
[2019-01-01 16:23:37.151862] D [socket.c:4024:socket_init] 
0-socket.management: SSL support for glusterd is NOT enabled
[2019-01-01 16:23:37.151890] D [socket.c:4041:socket_init] 
0-socket.management: using system polling thread
[2019-01-01 16:23:37.151927] D [name.c:584:server_fill_address_family] 
0-socket.management: option address-family not specified, defaulting 
to inet
[2019-01-01 16:23:37.152173] D 
[rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: attempt to 
load file /usr/local/lib/glusterfs/3.8.8/rpc-transport/rdma.so
[2019-01-01 16:23:37.155510] D 
[rpc-transport.c:321:rpc_transport_load] 0-rpc-transport: dlsym 
(gf_rpc_transport_reconfigure) on 
/usr/local/lib/glusterfs/3.8.8/rpc-transport/rdma.so: undefined 
symbol: reconfigure
[2019-01-01 16:23:37.155830] W [MSGID: 103071] 
[rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event 
channel creation failed [No such device]
[2019-01-01 16:23:37.155884] W [MSGID: 103055] [rdma.c:4896:init] 
0-rdma.management: Failed to initialize IB Device
[2019-01-01 16:23:37.155920] W 
[rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma' 
initialization failed
[2019-01-01 16:23:37.156224] W [rpcsvc.c:1638:rpcsvc_create_listener] 
0-rpc-service: cannot create listener, initing the transport failed
*[2019-01-01 16:23:37.156258] E [MSGID: 106243] [glusterd.c:1652:init] 
0-management: creation of 1 listeners failed, continuing with 
succeeded transport*
[2019-01-01 16:23:37.156300] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: GlusterD svc peer, Num: 
1238437, Ver: 2, Port: 0
[2019-01-01 16:23:37.156332] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: GlusterD svc cli read-only, 
Num: 1238463, Ver: 2, Port: 0
[2019-01-01 16:23:37.156356] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: GlusterD svc mgmt, Num: 
1238433, Ver: 2, Port: 0
[2019-01-01 16:23:37.156384] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 
1238433, Ver: 3, Port: 0
[2019-01-01 16:23:37.156414] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, 
Ver: 1, Port: 0
[2019-01-01 16:23:37.156438] D [rpcsvc.c:1866:rpcsvc_program_register] 
0-rpc-service: New program registered: Gluster Handshake, Num: 
14398633, Ver: 2, Port: 0
[2019-01-01 16:23:37.1564

Re: [Gluster-users] [Stale file handle] in shard volume

2019-01-02 Thread Olaf Buitelaar
Hi Nithya,

Thank you for your reply.

The VMs using the gluster volumes keep getting paused/stopped on
errors like these:
[2019-01-02 02:33:44.469132] E [MSGID: 133010]
[shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
shard 101487 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
[Stale file handle]
[2019-01-02 02:33:44.563288] E [MSGID: 133010]
[shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
shard 101488 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
[Stale file handle]


What I'm trying to find out is whether I can purge all gluster volumes of all
possible stale file handles (and hopefully find a method to prevent this in
the future), so the VMs can run stably again.
For this I need to know when the "shard_common_lookup_shards_cbk" function
considers a file stale.
The statement "Stale file handle errors show up when a file with a
specified gfid is not found" doesn't seem to cover it all: as I've shown
in earlier mails, the shard file and the .glusterfs/xx/xx/uuid file both
exist and have the same inode.
If the criteria I'm using aren't correct, could you please tell me which
criteria I should use to determine whether a file is stale or not?
These criteria are just based on observations I made while moving the stale
files manually. After removing them I was able to start the VM again... until,
unfortunately, some time later it hangs on another stale shard file.
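For reference, the check I had in mind looks roughly like this (a sketch; the
brick path is a placeholder, and this only lists candidates, it does not
remove anything):

BRICK=/path/to/brick            # adjust per brick
# 0-byte files in .shard with only the sticky bit set (mode ---------T)
# that carry the dht linkto xattr
find "$BRICK/.shard" -type f -size 0 -perm 1000 \
  -exec getfattr --absolute-names -n trusted.glusterfs.dht.linkto {} \; 2>/dev/null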

Thanks Olaf

On Wed, 2 Jan 2019 at 14:20, Nithya Balachandran wrote:

>
>
> On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar 
> wrote:
>
>> Dear All,
>>
>> Till now a selected group of VMs still seems to produce new stale files
>> and gets paused due to this.
>> I've not updated gluster recently; however, I did change the op version
>> from 31200 to 31202 about a week before this issue arose.
>> Looking at the .shard directory, I've found 100,000+ files so far sharing
>> the same characteristics as a stale file:
>> they all have the sticky bit set (file permissions ---------T), are
>> 0 kb in size, and have the trusted.glusterfs.dht.linkto attribute.
>>
>
> These are internal files used by gluster and do not necessarily mean they
> are stale. They "point" to data files which may be on different bricks
> (same name, gfid etc but no linkto xattr and no T permissions).
>
>
>> These files range from long ago (the beginning of the year) till now, which
>> makes me suspect this was lying dormant for some time and somehow
>> recently surfaced.
>> Checking other sub-volumes, they also contain 0 kb files in the .shard
>> directory, but those don't have the sticky bit or the linkto attribute.
>>
>> Does anybody else experience this issue? Could this be a bug or an
>> environmental issue?
>>
> These are most likely valid files- please do not delete them without
> double-checking.
>
> Stale file handle errors show up when a file with a specified gfid is not
> found. You will need to debug the files for which you see this error by
> checking the bricks to see if they actually exist.
>
>>
>> Also I wonder if there is any tool or gluster command to clean all stale
>> file handles?
>> Otherwise I'm planning to make a simple bash script, which iterates over
>> the .shard dir, checks each file for the above-mentioned criteria, and
>> (re)moves the file and the corresponding .glusterfs file.
>> If there are other criteria needed to identify a stale file handle, I
>> would like to hear that.
>> If this is a viable and safe operation to do, of course.
>>
>> Thanks Olaf
>>
>>
>>
>> On Thu, 20 Dec 2018 at 13:43, Olaf Buitelaar <
>> olaf.buitel...@gmail.com> wrote:
>>
>>> Dear All,
>>>
>>> I figured it out, it appeared to be the exact same issue as described
>>> here;
>>> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
>>> Another subvolume also had the shard files, only they were all 0 bytes and
>>> had the dht.linkto attribute.
>>>
>>> for reference;
>>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>>> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>>> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>>>
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>>
>>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030
>>>
>>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100
>>>
>>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>>
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>>
>>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d65386

Re: [Gluster-users] [Stale file handle] in shard volume

2019-01-02 Thread Nithya Balachandran
On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar 
wrote:

> Dear All,
>
> Till now a selected group of VMs still seems to produce new stale files
> and gets paused due to this.
> I've not updated gluster recently; however, I did change the op version
> from 31200 to 31202 about a week before this issue arose.
> Looking at the .shard directory, I've found 100,000+ files so far sharing
> the same characteristics as a stale file:
> they all have the sticky bit set (file permissions ---------T), are
> 0 kb in size, and have the trusted.glusterfs.dht.linkto attribute.
>

These are internal files used by gluster and do not necessarily mean they
are stale. They "point" to data files which may be on different bricks
(same name, gfid etc but no linkto xattr and no T permissions).


> These files range from long ago (the beginning of the year) till now, which
> makes me suspect this was lying dormant for some time and somehow
> recently surfaced.
> Checking other sub-volumes, they also contain 0 kb files in the .shard
> directory, but those don't have the sticky bit or the linkto attribute.
>
> Does anybody else experience this issue? Could this be a bug or an
> environmental issue?
>
These are most likely valid files- please do not delete them without
double-checking.

Stale file handle errors show up when a file with a specified gfid is not
found. You will need to debug the files for which you see this error by
checking the bricks to see if they actually exist.
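For example, for the gfid in the error messages earlier in this thread,
something like this on each brick would show whether the file actually exists
(a sketch; the brick path is a placeholder):

GFID=a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
BRICK=/path/to/brick            # run on every brick of the subvolume
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
getfattr -d -m . -e hex "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"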

>
> Also I wonder if there is any tool or gluster command to clean all stale
> file handles?
> Otherwise I'm planning to make a simple bash script, which iterates over
> the .shard dir, checks each file for the above-mentioned criteria, and
> (re)moves the file and the corresponding .glusterfs file.
> If there are other criteria needed to identify a stale file handle, I
> would like to hear that.
> If this is a viable and safe operation to do, of course.
>
> Thanks Olaf
>
>
>
> On Thu, 20 Dec 2018 at 13:43, Olaf Buitelaar <
> olaf.buitel...@gmail.com> wrote:
>
>> Dear All,
>>
>> I figured it out, it appeared to be the exact same issue as described
>> here;
>> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
>> Another subvolume also had the shard files, only they were all 0 bytes and
>> had the dht.linkto attribute.
>>
>> for reference;
>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>
>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030
>>
>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100
>>
>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>
>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030
>>
>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100
>>
>> [root@lease-04 ovirt-backbone-2]# stat
>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>   File: ‘.glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d’
>>   Size: 0   Blocks: 0  IO Block: 4096   regular empty
>> file
>> Device: fd01h/64769d   Inode: 1918631406  Links: 2
>> Access: (1000/---------T)  Uid: (0/root)   Gid: (0/root)
>> Context: system_u:object_r:etc_runtime_t:s0
>> Access: 2018-12-17 21:43:36.405735296 +
>> Modify: 2018-12-17 21:43:36.405735296 +
>> Change: 2018-12-17 21:43:36.405735296 +
>>  Birth: -
>>
>> removing the shard file and glusterfs file from each node resolved the
>> issue.
>>
>> I also found this thread;
>> https://lists.gluster.org/pipermail/gluster-users/2018-December/035460.html
>> Maybe he suffers from the same issue.
>>
>> Best Olaf
>>
>>
>> On Wed, 19 Dec 2018 at 21:56, Olaf Buitelaar <
>> olaf.buitel...@gmail.com> wrote:
>>
>>> Dear All,
>>>
>>> It appears I've got stale file handles on 2 files in one of the volumes. These
>>> files are qemu images (1 raw and 1 qcow2).
>>> I'll just focus on 1 file since the situation on the other seems the
>>> same.
>>>
>>> The VM gets paused more or less directly after being booted, with this error:
>>> [2018-12-18 14:05:05.275713] E [MSGID: 133010]
>>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard:
>>> Lookup on shard 51500 failed

Re: [Gluster-users] [Stale file handle] in shard volume

2019-01-02 Thread Olaf Buitelaar
Dear All,

The bash script I'm planning to run can be found here:
https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986
It would be nice to receive some feedback from the community before I
actually run the clean-up of all stale file handles.

Thanks Olaf

On Sun, 30 Dec 2018 at 20:56, Olaf Buitelaar wrote:

> Dear All,
>
> Till now a selected group of VMs still seems to produce new stale files
> and gets paused due to this.
> I've not updated gluster recently; however, I did change the op version
> from 31200 to 31202 about a week before this issue arose.
> Looking at the .shard directory, I've found 100,000+ files so far sharing
> the same characteristics as a stale file:
> they all have the sticky bit set (file permissions ---------T), are
> 0 kb in size, and have the trusted.glusterfs.dht.linkto attribute.
> These files range from long ago (the beginning of the year) till now, which
> makes me suspect this was lying dormant for some time and somehow
> recently surfaced.
> Checking other sub-volumes, they also contain 0 kb files in the .shard
> directory, but those don't have the sticky bit or the linkto attribute.
>
> Does anybody else experience this issue? Could this be a bug or an
> environmental issue?
>
> Also I wonder if there is any tool or gluster command to clean all stale
> file handles?
> Otherwise I'm planning to make a simple bash script, which iterates over
> the .shard dir, checks each file for the above-mentioned criteria, and
> (re)moves the file and the corresponding .glusterfs file.
> If there are other criteria needed to identify a stale file handle, I
> would like to hear that.
> If this is a viable and safe operation to do, of course.
>
> Thanks Olaf
>
>
>
> On Thu, 20 Dec 2018 at 13:43, Olaf Buitelaar <
> olaf.buitel...@gmail.com> wrote:
>
>> Dear All,
>>
>> I figured it out, it appeared to be the exact same issue as described
>> here;
>> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
>> Another subvolume also had the shard files, only they were all 0 bytes and
>> had the dht.linkto attribute.
>>
>> for reference;
>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>> .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>> # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>
>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030
>>
>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100
>>
>> [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>> # file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
>>
>> trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030
>>
>> trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100
>>
>> [root@lease-04 ovirt-backbone-2]# stat
>> .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
>>   File: ‘.glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d’
>>   Size: 0   Blocks: 0  IO Block: 4096   regular empty
>> file
>> Device: fd01h/64769d   Inode: 1918631406  Links: 2
>> Access: (1000/---------T)  Uid: (0/root)   Gid: (0/root)
>> Context: system_u:object_r:etc_runtime_t:s0
>> Access: 2018-12-17 21:43:36.405735296 +
>> Modify: 2018-12-17 21:43:36.405735296 +
>> Change: 2018-12-17 21:43:36.405735296 +
>>  Birth: -
>>
>> removing the shard file and glusterfs file from each node resolved the
>> issue.
>>
>> I also found this thread;
>> https://lists.gluster.org/pipermail/gluster-users/2018-December/035460.html
>> Maybe he suffers from the same issue.
>>
>> Best Olaf
>>
>>
>> On Wed, 19 Dec 2018 at 21:56, Olaf Buitelaar <
>> olaf.buitel...@gmail.com> wrote:
>>
>>> Dear All,
>>>
>>> It appears I've got stale file handles on 2 files in one of the volumes. These
>>> files are qemu images (1 raw and 1 qcow2).
>>> I'll just focus on 1 file since the situation on the other seems the
>>> same.
>>>
>>> The VM gets paused more or less directly after being booted, with this error:
>>> [2018-12-18 14:05:05.275713] E [MSGID: 133010]
>>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard:
>>> Lookup on shard 51500 failed. Base file gfid =
>>> f28cabcb-d169-41fc-a633-9bef4c4a8e40 [Stale file handle]
>>>
>>> investigating the shard;
>>>
>>> #on the arbiter node:
>>>
>>> [root@lease-05 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string
>>> /mnt

Re: [Gluster-users] On making ctime generator enabled by default in stack

2019-01-02 Thread Raghavendra Gowdappa
On Mon, Nov 12, 2018 at 10:48 AM Amar Tumballi  wrote:

>
>
> On Mon, Nov 12, 2018 at 10:39 AM Vijay Bellur  wrote:
>
>>
>>
>> On Sun, Nov 11, 2018 at 8:25 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Sun, Nov 11, 2018 at 11:41 PM Vijay Bellur 
>>> wrote:
>>>


 On Mon, Nov 5, 2018 at 8:31 PM Raghavendra Gowdappa <
 rgowd...@redhat.com> wrote:

>
>
> On Tue, Nov 6, 2018 at 9:58 AM Vijay Bellur 
> wrote:
>
>>
>>
>> On Mon, Nov 5, 2018 at 7:56 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> All,
>>>
>>> There is a patch [1] from Kotresh which makes the ctime generator the
>>> default in the stack. Currently the ctime generator is recommended only for
>>> use cases where ctime is important (like for Elasticsearch). However, a
>>> reliable (c)(m)time can fix many consistency issues within the glusterfs
>>> stack too. These are issues with caching layers having stale (meta)data
>>> [2][3][4]. Basically, just like applications, components within the
>>> glusterfs stack too need a timestamp to find out which among racing ops
>>> (like write, stat, etc.) has the latest (meta)data.
>>>
>>> Also note that a consistent (c)(m)time is not an optional feature,
>>> but instead forms the core of the infrastructure. So, I am proposing to
>>> merge this patch. If you've any objections, please voice them before Nov 13,
>>> 2018 (a week from today).
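For reference, the feature is toggled per volume; treat the option name below
as a sketch, as it may differ between releases:

gluster volume set <VOLNAME> features.ctime on     # enable the consistent (c)(m)time feature
gluster volume set <VOLNAME> features.ctime off    # disable it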
>>>
>>> As to the existing known issues/limitations with the ctime generator, my
>>> conversations with Kotresh revealed the following:
>>> * Potential performance degradation (we don't yet have data to
>>> conclusively prove it, preliminary basic tests from Kotresh didn't 
>>> indicate
>>> a significant perf drop).
>>>
>>
>> Do we have this data captured somewhere? If not, would it be possible
>> to share that data here?
>>
>
> I misquoted Kotresh. He had measured the impact of gfid2path and said both
> features might have a similar impact, as the major perf cost is related to
> storing xattrs on the backend fs. I am in the process of getting a fresh set of
> numbers. Will post those numbers when available.
>
>

 I observe that the patch under discussion has been merged now [1]. A
 quick search did not yield me any performance data. Do we have the
 performance numbers posted somewhere?

>>>
>>> No. Perf benchmarking is a task pending on me.
>>>
>>
>> When can we expect this task to be complete?
>>
>> In any case, I don't think it is ideal for us to merge a patch without
>> completing our due diligence on it. How do we want to handle this scenario
>> since the patch is already merged?
>>
>> We could:
>>
>> 1. Revert the patch now
>> 2. Review the performance data and revert the patch if performance
>> characterization indicates a significant dip. It would be preferable to
>> complete this activity before we branch off for the next release.
>>
>
> I am for option 2. Considering the branch-out for the next release is another
> 2 months away, and no one is expected to use a 'release' off the master branch
> yet, it makes sense to give that buffer time to get this activity completed.
>

It's unlikely I'll have time to carry out the perf benchmark. Hence I've
posted a revert here: https://review.gluster.org/#/c/glusterfs/+/21975/


> Regards,
> Amar
>
> 3. Think of some other option?
>>
>> Thanks,
>> Vijay
>>
>>
>
>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users