Re: [Gluster-users] Cannot list `.snaps/` directory

2018-07-19 Thread Mohammed Rafi K C



On 07/19/2018 06:35 PM, Riccardo Murri wrote:
> Hello Rafi,
>
> mounting as a regular volume works fine:
>
> ubuntu@slurm-master-001:/var/log/glusterfs$ sudo mount -t glusterfs
> glusterfs-server-001:/snaps/test_GMT-2018.07.18-10.02.05/glusterfs
> /mnt
>
> ubuntu@slurm-master-001:/var/log/glusterfs$ ls /mnt/
> active  filesystem  homes  jobdaemon  opt  share
>
> ubuntu@slurm-master-001:/var/log/glusterfs$ ls /mnt/opt
> Anaconda2-5.1.0-Linux-x86_64  bin  lmod  modulefiles
>
> Any hint why it should not work via USS?
Let's figure it out :).

Can you enable debug logs for both the client and the brick? Also, can you
take a statedump of the client process and the brick process?
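
For example, something along these lines should do it (replace <volname>
with your parent volume name; this is just a sketch):

# raise the log level to DEBUG on the client and brick sides
gluster volume set <volname> diagnostics.client-log-level DEBUG
gluster volume set <volname> diagnostics.brick-log-level DEBUG

# statedump of the brick processes (written under /var/run/gluster by default)
gluster volume statedump <volname>

# statedump of the fuse client: send SIGUSR1 to the mount process
kill -USR1 <pid-of-the-glusterfs-client-process>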

Regards
Rafi KC


>
> Thanks,
> R

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Snapshot with 4.1.1

2018-07-12 Thread Mohammed Rafi K C
Hi Stefan,

This is not specific to the Gluster 4 series; it is a design choice, and
the behavior is the same in the 3.x series. When you restore a volume, we
cannot reuse the old brick paths, because at restore time we cannot be
sure those paths are still valid. Let me explain with an example.

Say you have a 1x3 volume and you take a snapshot, snap1. Later you decide
to remove a brick and make the volume 1x2. It is quite possible that the
removed brick has been deleted and/or its mount point has been reused for
some other purpose. When you restore snap1, it will have the 1x3 volume
configuration; if we reused the original paths, the third path might no
longer be under Gluster's control.

To avoid this scenario, a snapshot restore uses the snapshot's own bricks,
i.e. the volume's bricks are made to point to the snapshot bricks.
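
As a rough illustration of that scenario (host names, brick paths and the
volume name here are made up):

# 1x3 volume plus a snapshot
gluster volume create vol1 replica 3 host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/b3
gluster snapshot create snap1 vol1

# later the volume is shrunk to 1x2; host3's brick may get wiped or reused
gluster volume remove-brick vol1 replica 2 host3:/bricks/b3 force

# restoring the snapshot brings back the 1x3 configuration, so the volume's
# bricks are pointed at the snapshot bricks under /run/gluster/snaps/...
# (the actual snapshot name will have a GMT timestamp appended)
gluster volume stop vol1
gluster snapshot restore snap1_GMT-<timestamp>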


Regards

Rafi KC


On 07/12/2018 01:47 PM, Stefan Kania wrote:
>
> No one uses gluster 4.1.1 with snapshots?
>
>
> Am 10.07.18 um 14:11 schrieb Stefan Kania:
>> Hello,
>>
>> I just installed Gluster Version 4.1.1 from, the gluster.org repository.
>> I tested the snapshot function and now I'm having the following problem:
>>
>> When I do a "gluster volume info" BEFORE the snapshot I get:
>> --
>>
>> root@sambabuch-c2:~# gluster snapshot create snap1 gv1
>>
>> root@sambabuch-c2:~# gluster v info
>>
>> Volume Name: gv1
>> Type: Replicate
>> Volume ID: 43dcb41c-4893-4bbe-98fd-01f47810ee89
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: knoten-1:/gluster/brick
>> Brick2: knoten-2:/gluster/brick
>> Options Reconfigured:
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>> --
>> The paths in "Brick1:" and "Brick2:" are OK.
>> Then I revert to the snapshot with:
>> -
>>
>> root@sambabuch-c1:~# gluster volume stop gv1
>>
>> root@sambabuch-c1:~# gluster snapshot restore snap1_GMT-2018.07.10-11.36.32
>>
>> root@sambabuch-c1:~# gluster v info
>>
>> Volume Name: gv1
>> Type: Replicate
>> Volume ID: 43dcb41c-4893-4bbe-98fd-01f47810ee89
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1:
>> knoten-1:/run/gluster/snaps/ff8f9838466a4566a18fb74be82ed56d/brick1/brick
>> Brick2:
>> knoten-2:/run/gluster/snaps/ff8f9838466a4566a18fb74be82ed56d/brick2/brick
>> Options Reconfigured:
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>> features.quota: off
>> features.inode-quota: off
>> features.quota-deem-statfs: off
>> -
>> As you can see, the path for "Brick1: " and "Brick2: " has changed to
>> the path where the snapshot is located. How do I get the original path
>> back?
>> I have never seen this with a Gluster version 3.x before.
>>
>> I use Debian 9. I created an LVM2 setup to take snapshots. The snapshots
>> are working: as long as I only mount a snapshot and copy some files out
>> of it to the volume, everything is fine. Only when I revert a snapshot do
>> I have the problem. Is there a new step I missed? Or how can I get my
>> original path back?
>>
>> Thanks
>>
>> Stefan
>>
>>
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> -- 
> Stefan Kania
> Landweg 13
> 25693 St. Michaelisdonn
>
>
> Signieren jeder E-Mail hilft Spam zu reduzieren. Signieren Sie ihre E-Mail. 
> Weiter Informationen unter http://www.gnupg.org
>
> Mein Schlüssel liegt auf
>
> hkp://subkeys.pgp.net
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Rafi KC attending DevConf and FOSDEM

2018-01-26 Thread Mohammed Rafi K C
Hi All,

I'm attending both DevConf (25-28) and FOSDEM (3-4). If any of you are
attending these conferences and would like to talk about Gluster, please
feel free to ping me on IRC (nick rafi on Freenode) or message me at
+436649795838


Regards

Rafi KC

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 回复: glusterfs segmentation fault in rdma mode

2017-11-09 Thread Mohammed Rafi K C
Hi,

For the segfault problem:

Can you please give us more information, such as the core dump Ben
suggested and/or the log files? A way to reproduce it would also help.

For the problem with directory creation:

It looks like client A has a problem connecting to the hashed
subvolume.

Do you have a reproducer or more logs?

Regards

Rafi KC


On 11/06/2017 11:51 AM, acfreeman wrote:
> Hi ,all
>
>  We found a strange problem. Some clients worked normally while some
> clients couldn't access special files. For example, Client A couldn't
> create the directory xxx, but Client B could. However, if Client B
> created the directory, Client A could access it and even delete it.
> But Client A still couldn't create the same directory later. If I
> changed the directory name, Client A worked without problems. It
> seemed that there were some problems with particular bricks on particular
> clients. But all the bricks were online.
>
> I saw this in the logs in the GlusterFS client after creating
> directory failure:
> [2017-11-06 11:55:18.420610] W [MSGID: 109011]
> [dht-layout.c:186:dht_layout_search] 0-data-dht: no subvolume for hash
> (value) = 4148753024
> [2017-11-06 11:55:18.457744] W [fuse-bridge.c:521:fuse_entry_cbk]
> 0-glusterfs-fuse: 488: MKDIR() /xxx => -1 (Input/output error)
> The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search]
> 0-data-dht: no subvolume for hash (value) = 4148753024" repeated 3
> times between [2017-11-06 11:55:18.420610] and [2017-11-06
> 11:55:18.457731]
>
>
> -- 原始邮件 --
> *发件人:* "Ben Turner";;
> *发送时间:* 2017年11月5日(星期天) 凌晨3:00
> *收件人:* "acfreeman"<21291...@qq.com>;
> *抄送:* "gluster-users";
> *主题:* Re: [Gluster-users] glusterfs segmentation fault in rdma mode
>
> This looks like there could be some problem requesting / leaking
> / whatever memory, but without looking at the core it's tough to tell
> for sure.   Note:
>
> /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]
>
> Can you open up a bugzilla and get us the core file to review?
>
> -b
>
> - Original Message -
> > From: "自由人" <21291...@qq.com>
> > To: "gluster-users" 
> > Sent: Saturday, November 4, 2017 5:27:50 AM
> > Subject: [Gluster-users] glusterfs segmentation fault in rdma mode
> >
> >
> >
> > Hi, All,
> >
> >
> >
> >
> > I used InfiniBand to connect all GlusterFS nodes and the clients. Previously
> > I ran IP over IB and everything was OK. Now I used the rdma transport mode
> > instead and then ran the traffic. After a while, the glusterfs process
> > exited because of a segmentation fault.
> >
> >
> >
> >
> > Here were the messages when I saw segmentation fault:
> >
> > pending frames:
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(1) op(WRITE)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > frame : type(0) op(0)
> >
> > patchset: git://git.gluster.org/glusterfs.git
> >
> > signal received: 11
> >
> > time of crash:
> >
> > 2017-11-01 11:11:23
> >
> > configuration details:
> >
> > argp 1
> >
> > backtrace 1
> >
> > dlfcn 1
> >
> > libpthread 1
> >
> > llistxattr 1
> >
> > setfsid 1
> >
> > spinlock 1
> >
> > epoll.h 1
> >
> > xattr.h 1
> >
> > st_atim.tv_nsec 1
> >
> > package-string: glusterfs 3.11.0
> >
> > /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]
> >
> > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f95bc557834]
> >
> > /lib64/libc.so.6(+0x32510)[0x7f95bace2510]
> >
> > The client OS was CentOS 7.3. The server OS was CentOS 6.5. The GlusterFS
> > version was 3.11.0 on both clients and servers. The InfiniBand card was
> > Mellanox. The Mellanox IB driver version was v4.1-1.0.2 (27 Jun 2017) on
> > both clients and servers.
> >
> >
> > Is the rdma code stable for GlusterFS? Do I need to upgrade the IB driver
> > or apply a patch?
> >
> > Thanks!
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list

Re: [Gluster-users] Enabling Halo sets volume RO

2017-11-08 Thread Mohammed Rafi K C
I think the problem here is that the default quorum settings are kicking
in. To get rid of this, you can set the quorum type to "fixed" with a
count of 2, or you can disable quorum altogether.
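
For example (using your volume name gv0; treat this as a sketch):

gluster volume set gv0 cluster.quorum-type fixed
gluster volume set gv0 cluster.quorum-count 2

# or, to disable client quorum entirely
gluster volume set gv0 cluster.quorum-type none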


Regards

Rafi KC


On 11/08/2017 04:03 AM, Jon Cope wrote:
> Hi all,
>
> I'm taking a stab at deploying a storage cluster to explore the Halo
> AFR feature and running into some trouble.  In GCE, I have 4
> instances, each with one 10gb brick.  2 instances are in the US and
> the other 2 are in Asia (with the hope that it will drive up latency
> sufficiently).  The bricks make up a Replica-4 volume.  Before I
> enable halo, I can mount to volume and r/w files.
>
> The issue is when I set the `cluster.halo-enabled yes`, I can no
> longer write to the volume:
>
> [root@jcope-rhs-g2fn vol]# touch /mnt/vol/test1
> touch: setting times of ‘test1’: Read-only file system
>
> This can be fixed by turning halo off again.  While halo is enabled
> and writes return the above message, the mount still shows it to be r/w:
>
> [root@jcope-rhs-g2fn vol]# mount
> gce-node1:gv0 on /mnt/vol type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
> Thanks in advace,
> -Jon
>
>
> Setup info
> CentOS Linux release 7.4.1708 (Core)
> 4 GCE Instances (2 US, 2 Asia)
> 1 10gb Brick/Instance
> replica 4 volume
>
> Packages:
>
> glusterfs-client-xlators-3.12.1-2.el7.x86_64
> glusterfs-cli-3.12.1-2.el7.x86_64
> python2-gluster-3.12.1-2.el7.x86_64
> glusterfs-3.12.1-2.el7.x86_64
> glusterfs-api-3.12.1-2.el7.x86_64
> glusterfs-fuse-3.12.1-2.el7.x86_64
> glusterfs-server-3.12.1-2.el7.x86_64
> glusterfs-libs-3.12.1-2.el7.x86_64
> glusterfs-geo-replication-3.12.1-2.el7.x86_64
>
>
>  
> Logs, beginning when halo is enabled:
>
> [2017-11-07 22:20:15.029298] W [MSGID: 101095]
> [xlator.c:213:xlator_dynload] 0-xlator:
> /usr/lib64/glusterfs/3.12.1/xlator/nfs/server.so: cannot open shared
> object file: No such file or directory
> [2017-11-07 22:20:15.204241] W [MSGID: 101095]
> [xlator.c:162:xlator_volopt_dynload] 0-xlator:
> /usr/lib64/glusterfs/3.12.1/xlator/nfs/server.so: cannot open shared
> object file: No such file or directory
> [2017-11-07 22:20:15.232176] I [MSGID: 106600]
> [glusterd-nfs-svc.c:163:glusterd_nfssvc_reconfigure] 0-management:
> nfs/server.so xlator is not installed
> [2017-11-07 22:20:15.235481] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
> already stopped
> [2017-11-07 22:20:15.235512] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
> service is stopped
> [2017-11-07 22:20:15.235572] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd
> already stopped
> [2017-11-07 22:20:15.235585] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service
> is stopped
> [2017-11-07 22:20:15.235638] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub
> already stopped
> [2017-11-07 22:20:15.235650] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub
> service is stopped
> [2017-11-07 22:20:15.250297] I [run.c:190:runner_log]
> (-->/usr/lib64/glusterfs/3.12.1/xlator/mgmt/glusterd.so(+0xde17a)
> [0x7fc23442117a]
> -->/usr/lib64/glusterfs/3.12.1/xlator/mgmt/glusterd.so(+0xddc3d)
> [0x7fc234420c3d] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fc23f915da5] ) 0-management: Ran script: /var/lib
> /glusterd/hooks/1/set/post/S30samba-set.sh --volname=gv0 -o
> cluster.halo-enabled=yes --gd-workdir=/var/lib/glusterd
> [2017-11-07 22:20:15.255777] I [run.c:190:runner_log]
> (-->/usr/lib64/glusterfs/3.12.1/xlator/mgmt/glusterd.so(+0xde17a)
> [0x7fc23442117a]
> -->/usr/lib64/glusterfs/3.12.1/xlator/mgmt/glusterd.so(+0xddc3d)
> [0x7fc234420c3d] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fc23f915da5] ) 0-management: Ran script: /var/lib
> /glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh
> --volname=gv0 -o cluster.halo-enabled=yes --gd-workdir=/var/lib/glusterd
> [2017-11-07 22:20:47.420098] W [MSGID: 101095]
> [xlator.c:213:xlator_dynload] 0-xlator:
> /usr/lib64/glusterfs/3.12.1/xlator/nfs/server.so: cannot open shared
> object file: No such file or directory
> [2017-11-07 22:20:47.595960] W [MSGID: 101095]
> [xlator.c:162:xlator_volopt_dynload] 0-xlator:
> /usr/lib64/glusterfs/3.12.1/xlator/nfs/server.so: cannot open shared
> object file: No such file or directory
> [2017-11-07 22:20:47.631833] I [MSGID: 106600]
> [glusterd-nfs-svc.c:163:glusterd_nfssvc_reconfigure] 0-management:
> nfs/server.so xlator is not installed
> [2017-11-07 22:20:47.635109] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
> already stopped
> [2017-11-07 22:20:47.635136] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
> service is stopped
> [2017-11-07 22:20:47.635201] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 

Re: [Gluster-users] problems running a vol over IPoIB, and qemu off it?

2017-10-23 Thread Mohammed Rafi K C
The backtrace you have provided suggests that the issue could be in the
Mellanox driver, though the question is still valid for users of IPoIB.


Regards

Rafi KC


On 10/23/2017 09:29 PM, lejeczek wrote:
> hi people
>
> I wonder if anybody experience any problems with vols in replica mode
> that run across IPoIB links and libvirt stores qcow image on such a
> volume?
>
> I wonder if maybe devel could confirm it should just work, and then
> hardware/Infiniband I should blame.
>
> I have a direct IPoIB link between two hosts, gluster replica volume,
> libvirt store disk images there.
>
> I start a guest on hostA and I get below on hostB(which is IB subnet
> manager):
>
> [Mon Oct 23 16:43:32 2017] Workqueue: ipoib_wq ipoib_cm_tx_start
> [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  8010 553c90b1
> 880c1c6eb818 816a3db1
> [Mon Oct 23 16:43:32 2017]  880c1c6eb8a8 81188810
>  88042ffdb000
> [Mon Oct 23 16:43:32 2017]  0004 8010
> 880c1c6eb8a8 553c90b1
> [Mon Oct 23 16:43:32 2017] Call Trace:
> [Mon Oct 23 16:43:32 2017]  [] dump_stack+0x19/0x1b
> [Mon Oct 23 16:43:32 2017]  []
> warn_alloc_failed+0x110/0x180
> [Mon Oct 23 16:43:32 2017]  []
> __alloc_pages_slowpath+0x6b6/0x724
> [Mon Oct 23 16:43:32 2017]  []
> __alloc_pages_nodemask+0x405/0x420
> [Mon Oct 23 16:43:32 2017]  []
> dma_generic_alloc_coherent+0x8f/0x140
> [Mon Oct 23 16:43:32 2017]  []
> gart_alloc_coherent+0x2d/0x40
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_buf_direct_alloc.isra.6+0xd3/0x1a0 [mlx4_core]
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_buf_alloc+0x1cb/0x240 [mlx4_core]
> [Mon Oct 23 16:43:32 2017]  []
> create_qp_common.isra.31+0x62e/0x10d0 [mlx4_ib]
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_ib_create_qp+0x14e/0x480 [mlx4_ib]
> [Mon Oct 23 16:43:32 2017]  [] ?
> ipoib_cm_tx_init+0x5c/0x400 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> ib_create_qp+0x7a/0x2f0 [ib_core]
> [Mon Oct 23 16:43:32 2017]  []
> ipoib_cm_tx_init+0x103/0x400 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> ipoib_cm_tx_start+0x268/0x3f0 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> process_one_work+0x17a/0x440
> [Mon Oct 23 16:43:32 2017]  []
> worker_thread+0x126/0x3c0
> [Mon Oct 23 16:43:32 2017]  [] ?
> manage_workers.isra.24+0x2a0/0x2a0
> [Mon Oct 23 16:43:32 2017]  [] kthread+0xcf/0xe0
> [Mon Oct 23 16:43:32 2017]  [] ?
> insert_kthread_work+0x40/0x40
> [Mon Oct 23 16:43:32 2017]  [] ret_from_fork+0x58/0x90
> [Mon Oct 23 16:43:32 2017]  [] ?
> insert_kthread_work+0x40/0x40
> [Mon Oct 23 16:43:32 2017] Mem-Info:
> [Mon Oct 23 16:43:32 2017] active_anon:2389656 inactive_anon:17792
> isolated_anon:0
>  active_file:14294829 inactive_file:14609973 isolated_file:0
>  unevictable:24185 dirty:11846 writeback:9907 unstable:0
>  slab_reclaimable:1024309 slab_unreclaimable:127961
>  mapped:74895 shmem:28096 pagetables:30088 bounce:0
>  free:142329 free_pcp:249 free_cma:0
> [Mon Oct 23 16:43:32 2017] Node 0 DMA free:15320kB min:24kB low:28kB
> high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
>
>
> To clarify - other volumes which use that IPoIB link do not seem to
> case that, or any other problem.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Release 3.12.2 : Scheduled for the 10th of October

2017-10-12 Thread Mohammed Rafi K C
Hi Jiffin/Shyam,


Snapshot volumes are broken in 3.12. We just got the bug report, and I
have sent a patch [1]. Let me know your thoughts.


[1] :  https://review.gluster.org/18506

On 10/12/2017 12:32 PM, Jiffin Tony Thottan wrote:
> Hi,
>
> I am planning to do 3.12.2 release today 11:00 pm IST (5:30 pm GMT).
>
> Following bugs is removed from tracker list
>
> Bug 1493422 - AFR : [RFE] Improvements needed in "gluster volume heal
> info" commands -- feature request will be target for 3.13
>
> Bug 1497989 - Gluster 3.12.1 Packages require manual systemctl daemon
> reload after install -- "-1" from Kaleb, no progress from Oct 4th,
>
> will be tracked as part of 3.12.3
>
> Regards,
>
> Jiffin
>
>
>
>
> On 06/10/17 12:36, Jiffin Tony Thottan wrote:
>> Hi,
>>
>> It's time to prepare the 3.12.2 release, which falls on the 10th of
>> each month, and hence would be 10-10-2017 this time around.
>>
>> This mail is to call out the following,
>>
>> 1) Are there any pending *blocker* bugs that need to be tracked for
>> 3.12.2? If so mark them against the provided tracker [1] as blockers
>> for the release, or at the very least post them as a response to this
>> mail
>>
>> 2) Pending reviews in the 3.12 dashboard will be part of the release,
>> *iff* they pass regressions and have the review votes, so use the
>> dashboard [2] to check on the status of your patches to 3.12 and get
>> these going
>>
>> 3) I have made checks on what went into 3.10 post 3.12 release and if
>> these fixes are already included in 3.12 branch, then status on this
>> is *green*
>> as all fixes ported to 3.10, are ported to 3.12 as well.
>>
>> Thanks,
>> Jiffin
>>
>> [1] Release bug tracker:
>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.12.2
>>
>> [2] 3.10 review dashboard:
>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-12-dashboard
>>
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster volume + lvm : recommendation or neccessity ?

2017-10-11 Thread Mohammed Rafi K C

Volumes are an aggregation of bricks, so I would treat the brick as the
unit here rather than the volume. Taking the constraints from the blog
post [1]:

* Every brick should be carved out of its own thinly provisioned logical
volume (LV). In other words, no two bricks should share a common LV. More
details about thin provisioning and thin-provisioned snapshots can be
found in that post.
* Each thinly provisioned LV should be used only for forming a brick.
* The thin pool from which the thin LVs are created should have sufficient
space, including sufficient space for the pool metadata.

You can refer to the blog post here [1]; a rough command sketch follows below.

[1] : http://rajesh-joseph.blogspot.in/p/gluster-volume-snapshot-howto.html
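
As a rough illustration of those constraints (device, VG, pool and mount
names are just placeholders; sizes depend on your disks):

pvcreate /dev/sdb
vgcreate vg_bricks /dev/sdb

# one thin pool, with room for data and for the pool metadata
lvcreate --size=35T --type=thin-pool --poolmetadatasize 16G -n tp_bricks vg_bricks

# one thin LV per brick; no two bricks share an LV
lvcreate -V 10T --thinpool vg_bricks/tp_bricks -n brick1_lv
mkfs.xfs -i size=512 /dev/vg_bricks/brick1_lv
mkdir -p /bricks/brick1
mount /dev/vg_bricks/brick1_lv /bricks/brick1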

Regards
Rafi KC


On 10/11/2017 01:23 PM, ML wrote:
> Thanks Rafi, that's understood now :)
>
> I'm considering to deploy gluster on a 4 x 40 TB  bricks, do you think
> it would better to make 1 LVM partition for each Volume I need or to
> make one Big LVM partition and start multiple volumes on it ?
>
> We'll store mostly big files (videos) on this environement.
>
>
>
>
> Le 11/10/2017 à 09:34, Mohammed Rafi K C a écrit :
>>
>> On 10/11/2017 12:20 PM, ML wrote:
>>> Hi everyone,
>>>
>>> I've read on the gluster & redhat documentation, that it seems
>>> recommended to use XFS over LVM before creating & using gluster
>>> volumes.
>>>
>>> Sources :
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Formatting_and_Mounting_Bricks.html
>>>
>>>
>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/
>>>
>>>
>>>
>>> My point is : do we really need LVM ?
>> This recommendations was added after gluster-snapshot. Gluster snapshot
>> relays on LVM snapshot. So if you start with out lvm, in future if you
>> want to use snapshot then it would be difficult, hence the
>> recommendation to use xfs on top of lvm.
>>
>>
>> Regards
>> Rafi KC
>>
>>> For example , on a dedicated server with disks & partitions that will
>>> not change of size, it doesn't seems necessary to use LVM.
>>>
>>> I can't understand clearly wich partitioning strategy would be the
>>> best for "static size" hard drives :
>>>
>>> 1 LVM+XFS partition = multiple gluster volumes
>>> or 1 LVM+XFS partition = 1 gluster volume per LVM+XFS partition
>>> or 1 XFS partition = multiple gluster volumes
>>> or 1 XFS partition = 1 gluster volume per XFS partition
>>>
>>> What do you use on your servers ?
>>>
>>> Thanks for your help! :)
>>>
>>> Quentin
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster volume + lvm : recommendation or neccessity ?

2017-10-11 Thread Mohammed Rafi K C


On 10/11/2017 12:20 PM, ML wrote:
> Hi everyone,
>
> I've read on the gluster & redhat documentation, that it seems
> recommended to use XFS over LVM before creating & using gluster volumes.
>
> Sources :
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Formatting_and_Mounting_Bricks.html
>
> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/
>
>
> My point is : do we really need LVM ?

This recommendation was added after Gluster snapshots were introduced.
Gluster snapshots rely on LVM snapshots, so if you start without LVM and
later want to use snapshots, it would be difficult; hence the
recommendation to use XFS on top of LVM.


Regards
Rafi KC

> For example , on a dedicated server with disks & partitions that will
> not change of size, it doesn't seems necessary to use LVM.
>
> I can't understand clearly wich partitioning strategy would be the
> best for "static size" hard drives :
>
> 1 LVM+XFS partition = multiple gluster volumes
> or 1 LVM+XFS partition = 1 gluster volume per LVM+XFS partition
> or 1 XFS partition = multiple gluster volumes
> or 1 XFS partition = 1 gluster volume per XFS partition
>
> What do you use on your servers ?
>
> Thanks for your help! :)
>
> Quentin
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] georeplication sync deamon

2017-09-20 Thread Mohammed Rafi K C


On 09/17/2017 11:38 AM, atris adam wrote:
> hi all,
>
> I want to know some more detail about glusterfs georeplication, more
> about syncdeamon, if 'file A' was mirorred in slave volume , a change
> happen to 'file A', then how the syncdeamon act?

Changes to a file are captured in the changelog (it is not full data
logging, just an indication that the file needs to be synced). At regular
intervals the sync daemon scans the changelog files and rsyncs the
affected files from the master mount to the slave mount.
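
If you want to poke at this yourself, something like the following shows
the moving parts (volume, host and brick names are placeholders):

# changelog files recorded on a brick (rolled over every ~15 seconds by default)
ls /bricks/brick1/.glusterfs/changelogs/

# status and configuration of the geo-replication session
gluster volume geo-replication mastervol slavehost::slavevol status
gluster volume geo-replication mastervol slavehost::slavevol config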

Regards
Rafi KC


>
> 1. transfer the whole 'file A' to slave 
> 2. transfer the changes of file A to slave
>
>
> thx lot
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Are there any news on zfs snapshots support?

2017-08-17 Thread Mohammed Rafi K C
Yes, it is in the review phase. Sriram implemented the ZFS snapshot
support, Mark implemented the btrfs support, and Mark merged the two
patches [1]. Adding them in CC.

Reviews are most welcome.


[1] : https://review.gluster.org/#/c/17865/

Regards

Rafi KC

On 08/17/2017 05:22 PM, Arman Khalatyan wrote:
> Hi,
> Somewhere I read that zfs snapshots are under the  development.
> are there any news?
> Thanks,
> Arman.
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] stuck heal process

2017-07-27 Thread Mohammed Rafi K C
Can you share the self-heal daemon logs under /var/log/glusterfs/?


Rafi KC


On 07/25/2017 06:05 PM, Megan . wrote:
>  Good Morning!
>
> We are running RedHat 7.3 with
> glusterfs-server-3.8.4-18.el7rhgs.x86_64.  Not sure if your able to
> help with this version or not.
>
> I have a 5 node setup with 1 node having no storage and only acting as
> a quorum node. We have a mis of direct attached storage and iscsi SAN
> storage.
>
> We have distributed replica volumes created across all 4 nodes.
>
> At some point last week one of the nodes lost its connection to the
> SAN.  We had a ton of activity on one of the volumes during the past
> week.  Yesterday we realized we had an issue and bounced the node,
> fixed the connections, and brought gluster back online on that node.
> Now I have a volume with 17k heals that just aren't going anywhere.
> Is there anything I can do to recover from this?
>
>
> Starting time of crawl: Tue Jul 25 00:46:34 2017
> Ending time of crawl: Tue Jul 25 00:48:07 2017
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 17742
>
> I did a gluster v heal $volume  and  gluster v heal $volume full
> yesterday, but nothing has changed.
>
> Any help is greatly appreciated.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS Fuse client hangs when coping large files

2017-07-27 Thread Mohammed Rafi K C
I think the mount logs will help here. Can you share the mount logs under
/var/log/glusterfs/?


Rafi KC


On 07/26/2017 04:17 PM, Felipe Pina wrote:
>
> Hello,
>
> I’m having some weird problems while copying large files (> 1GB) to a
> GlusterFS through a Fuse client.
>
> When the copy is done using the cp command everything is fine, but if
> I use a Java program, the GlusterFS Fuse hangs.
>
> The kern.log shows timeout for the Java program and the Fuse client.
>
> Anyone have experienced this behavior?
>
> Test environment:
> Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0–83-generic x86_64)
> OpenJDK Runtime Environment (build
> 1.8.0_131–8u131-b11–0ubuntu1.16.04.2-b11)
> GlusterFS 3.7.6
>
>
>
> -- 
> Felipe Pina
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS snapshot filesize how do save?

2017-07-27 Thread Mohammed Rafi K C
You have to manage your thin LV in such a way that it has enough space for
both the snapshot bricks and the volume bricks.
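
For example, to keep an eye on the pool and grow it before it fills up
(VG and pool names are placeholders):

# Data% and Meta% columns show thin pool usage
lvs vg_cluster

# extend the pool data and metadata when needed
lvextend -L +100G vg_cluster/tp_cluster
lvextend --poolmetadatasize +1G vg_cluster/tp_cluster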


Regards

Rafi KC


On 07/13/2017 07:25 AM, 최두일 wrote:
>
> gluster snapshot create => mount brick =>
>
> many snapshot create => many mount brick => how to disk capacity
> management?
>
> if gluster volume => 1gb
>
> => snapshot *5 = 5gb?
>
> please ㅠㅠ
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.11 on ubuntu 16.04 not working

2017-07-27 Thread Mohammed Rafi K C
Can you share the mount logs (/var/log/glusterfs/.log)? Also, can you
check on the brick path whether the file was created on the backend?

Regards

Rafi KC


On 07/07/2017 09:46 PM, Christiane Baier wrote:
> Hi There,
>
> we have a problem with a fresh installation of gluster 3.11 on a
> ubuntu 16.04 server.
>
> we performed the installation exactly as described on
> the gluster.org website.
>
> 
>
> in  fstab is:
> /dev/sdb1 /gluster xfs defaults 0 0
> knoten5:/gv0 /glusterfs glusterfs defaults,_netdev,acl,selinux 0 0
>
> after reboot /gluster is mounted
> /glusterfs not
> with mount -a mounting is possible.
>
> gluster peer status shows
> Number of Peers: 1
>
> Hostname: knoten5
> Uuid: 996c9b7b-9913-4f0f-a0e2-387fbd970129
> State: Peer in Cluster (Connected)
>
> Network connectivity is okay
>
> gluster volume info shows
> Volume Name: gv0
> Type: Replicate
> Volume ID: 0e049b18-9fb7-4554-a4b7-b7413753af3a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: knoten4:/gluster/brick
> Brick2: knoten5:/gluster/brick
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> glusterfs-server is up an running.
>
> But if I put a file in /glusterfs it won't show up on the other system?
> What is wrong.
>
> By the way, we have the same problems on a ubuntu 14.04 server with an
> older gluster (since two days).
>
> Kind Regards
> Chris
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] NFS Ganesha

2017-07-27 Thread Mohammed Rafi K C
+Jiffin


Regards

Rafi KC


On 07/06/2017 10:04 PM, Anthony Valentine wrote:
>
> Hello!
>
>  
>
> I am attempting to setup a Gluster install using Ganesha for NFS using
> the guide found here
> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>  
>
>
>  
>
> The Gluster portion is working fine, however when I try to setup
> Ganesha I have a problem.  The guide says to run ‘gluster nfs-ganesha
> enable’ however when I do, I get the following error:
>
>  
>
> [root@glustertest1 ~]# gluster nfs-ganesha enable
>
> unrecognized word: nfs-ganesha (position 0)
>
>  
>
> Has this command changed?  If so, what is the new command?  If not,
> why would I be getting this error? 
>
>  
>
> Is there a more recent guide that I should be following? 
>
>  
>
>  
>
> Thank you in advance!
>
>  
>
> Anthony Valentine
>
>  
>
>  
>
>  
>
>  
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Mess code when encryption

2017-07-27 Thread Mohammed Rafi K C
It looks like a bug to me. Can you file an issue on GitHub?

Regards

Rafi KC


On 07/05/2017 03:30 PM, Liu, Dan wrote:
>
> Hi everyone,
>
>  
>
> I have a question about using encryption in Gluster FS.
>
>  
>
> 1.Created a file (file size is smaller than 1k) in the volume’s mount
> point.
>
> 2.Read the file, finally got a mess code.
>
> I found that the content I got is from cache and no decryption is
> operated on the file content, so mess code returns.
>
> If I set the following property to off, then everything is OK.
>
> performance.quick-read
>
> performance.write-behind
>
> performance.open-behind
>
>  
>
> My question is :
>
>Is above metioned phenomenon right?
>
>Does I have wrong configurations?
>
>  
>
> OS : CentOS 7.1
>
> Gluster FS:  3.10.3
>
>  
>
> Configuration
>
>According quick start of official website, set features.encryption
> to on and encryption.master-key, then start and mount the volume.
>
>  
>
> Looking forward to your answers. Thanks.
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-27 Thread Mohammed Rafi K C
The current sharding implementation has very limited use cases, such as
VM stores, where only a single client accesses a given sharded file.
Krutika would be the right person to answer your questions.
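
For reference, a typical VM-store style sharding setup looks roughly like
this (the volume name and shard size are only examples):

gluster volume set vmvol features.shard on
gluster volume set vmvol features.shard-block-size 64MB

# note: only files created after enabling sharding get sharded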


Regards

Rafi KC


On 06/30/2017 04:28 PM, gen...@gencgiyen.com wrote:
>
> Hi,
>
>  
>
> I have an 2 nodes with 20 bricks in total (10+10).
>
>  
>
> First test:
>
>  
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>  
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from internet and directly to the gluster:
> 250-300mb/s
>
>  
>
> Now same test without Stripe but with sharding. This results are same
> when I set shard size 4MB or 32MB. (Again 2x Replica here)
>
>  
>
> Dd performance: 70mb/s
>
> Download directly to the gluster performance : 60mb/s
>
>  
>
> Now, If we do this test twice at the same time (two dd or two
> doewnload at the same time) it goes below 25/mb each or slower.
>
>  
>
> I thought sharding is at least equal or a little slower (maybe?) but
> these results are terribly slow.
>
>  
>
> I tried tuning (cache, window-size etc..). Nothing helps.
>
>  
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs”
> and 4TB each.
>
>  
>
> Is there any tweak/tuning out there to make it fast?
>
>  
>
> Or is this an expected behavior? If its, It is unacceptable. So slow.
> I cannot use this on production as it is terribly slow.
>
>  
>
> The reason I use shard instead of stripe is that I would like to
> eliminate files that are bigger than the brick size.
>
>  
>
> Thanks,
>
> Gencer.
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] AUTH-ALLOW / AUTH-REJECT

2017-07-27 Thread Mohammed Rafi K C
Can you describe the setup and the tests you ran? Is it consistently
reproducible?


You can also file a bug at https://bugzilla.redhat.com/


On 06/29/2017 02:46 PM, Alexandre Blanca wrote:
> Hi,
>
> I want to manage access on dispersed volume.
> When I use *gluster volume set test_volume auth.allow/IP_ADDRESS
> /*//it works but with /*HOSTNAME*/* ***the filter doesn't apply...
> Any idea to solve my problem?
>
> *glusterfs --version*
> 3.7
>
> Have a nice day,
>
> Alex
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Memory Leakage in Gluster 3.10.2-1

2017-07-27 Thread Mohammed Rafi K C
Are you still facing the problem? If so, can you please provide the
workload details, the cmd_log_history file, log files, etc.?
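
For the glusterd memory growth in particular, statedumps taken a few hours
apart would help a lot. A rough way to collect them:

# dump glusterd's in-memory state; files land under /var/run/gluster by default
kill -USR1 $(pidof glusterd)
ls /var/run/gluster/glusterdump.*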


Regards

Rafi KC


On 06/23/2017 02:06 PM, shridhar s n wrote:
> Hi All,
>
> We are using GlusterFS 3.10.2 (upgraded from 3.7.0 last week) on
> CentOS 7.x .
>
> We continue to see memory utilization going up roughly every 3 days. The
> memory utilization of the server daemon (glusterd) keeps
> increasing. In about 30+ hours the memory utilization of the glusterd
> service alone reaches 70% of the available memory. Since we have alarms
> for this threshold, we get notified, and the only way to stop it so far is
> to restart glusterd.
>
> The GlusterFS is configured in the two server nodes with replica option.
>
> Kindly let us know how to fix this memory leakage.
>
> Thanks in advance,
>
> Shridhar S N
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Interesting split-brain...

2017-06-14 Thread Mohammed Rafi K C
Can you please explain how you ended up in this scenario? I think that
will help us understand such scenarios better, and why Gluster recommends
replica 3 or arbiter volumes.
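
For context, the recommended arbiter configuration is created along these
lines (volume, host and brick names are placeholders):

gluster volume create data01 replica 3 arbiter 1 \
    node1:/mnt/DATA/data node2:/mnt/DATA/data node3:/mnt/ARBITER/data

The third (arbiter) brick stores only file names and metadata, so it breaks
the tie between the two data bricks with very little extra disk.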

Regards

Rafi KC


On 06/15/2017 10:46 AM, Karthik Subrahmanya wrote:
> Hi Ludwig,
>
> There is no way to resolve gfid split-brains with type mismatch. You
> have to do it manually by following the steps in [1].
> In case of type mismatch it is recommended to resolve it manually. But
> for only gfid mismatch in 3.11 we have a way to
> resolve it by using the *favorite-child-policy*.
> Since the file is not important, you can go with deleting that.
>
> [1]
> https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
>
> HTH,
> Karthik
>
> On Thu, Jun 15, 2017 at 8:23 AM, Ludwig Gamache  > wrote:
>
> I am new to gluster but already like it. I did a maintenance last
> week where I shutdown both nodes (one after each others). I had
> many files that needed to be healed after that. Everything worked
> well, except for 1 file. It is in split-brain, with 2 different
> GFID. I read the documentation but it only covers the cases where
> the GFID is the same on both bricks. BTW, I am running Gluster 3.10.
>
> Here are some details...
>
> [root@NAS-01 .glusterfs]# gluster volume heal data01 info
>
> Brick 192.168.186.11:/mnt/DATA/data
>
> /abc/.zsh_history 
>
> /abc - Is in split-brain
>
>
> Status: Connected
>
> Number of entries: 2
>
>
> Brick 192.168.186.12:/mnt/DATA/data
>
> /abc - Is in split-brain
>
>
> /abc/.zsh_history 
>
> Status: Connected
>
> Number of entries: 2
>
>
> On brick 1:
>
> [root@NAS-01 abc]# ls -lart
>
> total 75
>
> drwxr-xr-x.  2 root  root  2 Jun  8 13:26 .zsh_history
>
> drwxr-xr-x.  3 12078 root  3 Jun 12 11:36 .
>
> drwxrwxrwt. 17 root  root 17 Jun 12 12:20 ..
>
>
> On brick 2:
>
> [root@DC-MTL-NAS-02 abc]# ls -lart
>
> total 66
>
> -rw-rw-r--.  2 12078 12078 1085 Jun 12 04:42 .zsh_history
>
> drwxr-xr-x.  2 12078 root 3 Jun 12 10:36 .
>
> drwxrwxrwt. 17 root  root17 Jun 12 11:20 ..
>
>
> Notice that on one brick, it is a file and on the other one it is
> a directory.
>
> On brick 1:
>
> [root@NAS-01 abc]# getfattr -d -m . -e hex
> /mnt/DATA/data/abc/.zsh_history
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/DATA/data/abc/.zsh_history
>
> 
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>
> trusted.afr.data01-client-0=0x
>
> trusted.afr.data01-client-1=0x0002
>
> trusted.gfid=0xdee43407139d41f091d13e106a51f262
>
> trusted.glusterfs.dht=0x0001
>
>
> On brick 2:
>
> root@NAS-02 abc]# getfattr -d -m . -e hex
> /mnt/DATA/data/abc/.zsh_history 
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/DATA/data/abc/.zsh_history
>
> 
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>
> trusted.afr.data01-client-0=0x00170002
>
> trusted.afr.data01-client-1=0x
>
> trusted.bit-rot.version=0x060059397acd0005dadd
>
> trusted.gfid=0xa70ae9af887a4a37875f5c7c81ebc803
>
>
> Any recommendation on how to recover from that? BTW, the file is
> not important and I could easily get rid of it without impact. So,
> if this is an easy solution...
>
> Regards,
>
> -- 
> Ludwig Gamache
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.11 Retrospective

2017-06-05 Thread Mohammed Rafi K C
Great, it works perfectly. This is a good start.


Rafi KC


On 06/05/2017 01:44 PM, Amye Scavarda wrote:
> Resolved! 
> Try it again.
> - amye 
>
> On Mon, Jun 5, 2017 at 5:07 PM, Mohammed Rafi K C <rkavu...@redhat.com
> <mailto:rkavu...@redhat.com>> wrote:
>
> Hi Amye,
>
> The form is not accessible, it says
>
> Feedback for Gluster 3.11 release
>
> The form Feedback for Gluster 3.11 release is no longer accepting
> responses.
> Try contacting the owner of the form if you think this is a mistake.
>
>
> Regards
> Rafi KC
>
>
>
> On 06/05/2017 01:25 PM, Amye Scavarda wrote:
>> We're doing something new now with releases, running a
>> retrospective on what things we as a community should stop, what
>> we should start, and what we should continue.
>>
>> With last week's release, here's our quick form for 3.11
>> https://goo.gl/forms/OkhNZDFspYqdN00g2
>> <https://goo.gl/forms/OkhNZDFspYqdN00g2>
>>
>> We'll keep this open until June 15th to give everyone time to
>> give us feedback. Thanks!
>>
>> -- 
>> Amye Scavarda | a...@redhat.com <mailto:a...@redhat.com> |
>> Gluster Community Lead
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>
> -- 
> Amye Scavarda | a...@redhat.com <mailto:a...@redhat.com> | Gluster
> Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.11 Retrospective

2017-06-05 Thread Mohammed Rafi K C
Hi Amye,

The form is not accessible, it says

Feedback for Gluster 3.11 release

The form Feedback for Gluster 3.11 release is no longer accepting responses.
Try contacting the owner of the form if you think this is a mistake.


Regards
Rafi KC


On 06/05/2017 01:25 PM, Amye Scavarda wrote:
> We're doing something new now with releases, running a retrospective
> on what things we as a community should stop, what we should start,
> and what we should continue.
>
> With last week's release, here's our quick form for 3.11
> https://goo.gl/forms/OkhNZDFspYqdN00g2
>
> We'll keep this open until June 15th to give everyone time to give us
> feedback. Thanks!
>
> -- 
> Amye Scavarda | a...@redhat.com  | Gluster
> Community Lead
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Snapshot auto-delete unmount problem

2017-05-31 Thread Mohammed Rafi K C
Can you give us more logs on this issue? Also, did somebody by any chance
unmount the LVs?


Regards

Rafi KC


On 05/31/2017 03:00 PM, Gary Lloyd wrote:
> Hi I am having a problem deleting snapshots, gluster is failing to
> unmount them. I am running centos 7.3 with gluster-3.10.2-1
>
> here is some log output:
>
> [2017-05-31 09:21:39.961371] W [MSGID: 106057]
> [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find]
> 0-management: Snap volume
> 331ec972f90d494d8a86dd4f69d718b7.glust01-li.run-gluster-snaps-331ec972f90d494d8a86dd4f69d718b7-brick1-b
> not found [Invalid argument]
> [2017-05-31 09:21:51.520811] W [MSGID: 106112]
> [glusterd-snapshot.c:8128:glusterd_handle_snap_limit] 0-management:
> Soft-limit (value = 27) of volume shares1 is reached. Deleting
> snapshot Snap_GMT-2017.05.31-09.20.04.
> [2017-05-31 09:21:51.531729] E [MSGID: 106095]
> [glusterd-snapshot-utils.c:3359:glusterd_umount] 0-management:
> umounting /run/gluster/snaps/4f980da64dec424ba0b48d6d36c4c54e/brick1
> failed (No such file or directory) [No such file or directory]
> [2017-05-31 09:22:00.540373] E [MSGID: 106038]
> [glusterd-snapshot.c:2895:glusterd_do_lvm_snapshot_remove]
> 0-management: umount failed for path
> /run/gluster/snaps/4f980da64dec424ba0b48d6d36c4c54e/brick1 (brick:
> /run/gluster/snaps/4f980da64dec424ba0b48d6d36c4c54e/brick1/b): No such
> file or directory.
> [2017-05-31 09:22:02.442048] W [MSGID: 106033]
> [glusterd-snapshot.c:3094:glusterd_lvm_snapshot_remove] 0-management:
> Failed to rmdir: /run/gluster/snaps/4f980da64dec424ba0b48d6d36c4c54e/,
> err: Directory not empty. More than one glusterd running on this node.
> [Directory not empty]
> [2017-05-31 09:22:02.443336] W [MSGID: 106039]
> [glusterd-snapshot-utils.c:55:glusterd_snapobject_delete]
> 0-management: Failed destroying lockof snap Snap_GMT-2017.05.31-09.20.04
> [2017-05-31 09:22:02.444038] I [MSGID: 106144]
> [glusterd-pmap.c:377:pmap_registry_remove] 0-pmap: removing brick
> /run/gluster/snaps/4f980da64dec424ba0b48d6d36c4c54e/brick1/b on port 49157
>
>
>
> Can anyone help ?
>
> Thanks
>
>
> /Gary Lloyd/
> 
> I.T. Systems:Keele University
> Finance & IT Directorate
> Keele:Staffs:IC1 Building:ST5 5NB:UK
> 
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] how to restore snapshot LV's

2017-05-29 Thread Mohammed Rafi K C
Did you mount the snapshot bricks after you reconfigured the VGs?
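
For example, based on the names from your earlier output (treat these as
placeholders and adjust to your snapshot LVs):

# activate the snapshot LV, ignoring the activation-skip flag
lvchange -ay -K vg_cluster/bf93dc34233646128f0c5f84c3ac1f83_0

# mount it back on the expected snapshot brick path
mount /dev/vg_cluster/bf93dc34233646128f0c5f84c3ac1f83_0 \
    /run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick1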


Regards

Rafi KC


On 05/29/2017 01:08 PM, WoongHee Han wrote:
> Right, I had reconfigured the VG on one node, activated the brick
> path, and then restored the snapshot.
>
>
> 2017-05-29 15:54 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com
> <mailto:rkavu...@redhat.com>>:
>
>
> On 05/27/2017 09:22 AM, WoongHee Han wrote:
>> Ih, i'm sorry for my late reply
>>
>> I've tried to solve it using your answer. It worked as well
>> thanks. it means the snapshot was activated.
>> and then i was restore the snapshot.
>>
>> but, after i restored the snapshot ,there was nothing  in the
>> volume(like files) 
>> can't it recover automatically?
> I remember you were saying that you had reconfigured the vg's. Did
> you had mount for the snapshot brick path active ?
>
> Rafi KC
>
>
>
>>
>> Thank you agin for your answer.
>>
>>
>> Best regards
>>
>>
>>
>> 2017-05-19 15:34 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com
>> <mailto:rkavu...@redhat.com>>:
>>
>> I do not know how you ended up in this state. This usually
>> happens when there is a commit failure. To recover from this
>> state you can change the value of "status" from
>>
>> the path /var/lib/glusterd/snaps///info .
>> From this file change the status to 0 in nodes where the
>> values are one. Then restart glusterd on those node where we
>> changed manually.
>>
>> Then try to activate it.
>>
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> On 05/18/2017 09:38 AM, Pranith Kumar Karampuri wrote:
>>> +Rafi, +Raghavendra Bhat
>>>
>>> On Tue, May 16, 2017 at 11:55 AM, WoongHee Han
>>> <polishe...@gmail.com <mailto:polishe...@gmail.com>> wrote:
>>>
>>> Hi, all!
>>>
>>> I erased the VG having snapshot LV related to gluster
>>> volumes
>>> and then, I tried to restore volume;
>>>
>>> 1. vgcreate vg_cluster /dev/sdb
>>> 2. lvcreate --size=10G --type=thin-pool -n tp_cluster
>>> vg_cluster
>>> 3. lvcreate -V 5G --thinpool vg_cluster/tp_cluster -n
>>> test_vol vg_cluster
>>> 4. gluster v stop test_vol
>>> 5. getfattr -n trusted.glusterfs.volume-id
>>> /volume/test_vol ( in other node)
>>> 6. setfattr -n trusted.glusterfs.volume-id -v
>>>  0sKtUJWIIpTeKWZx+S5PyXtQ== /volume/test_vol (already
>>> mounted)
>>> 7. gluster v start test_vol
>>> 8. restart glusterd
>>> 9. lvcreate -s vg_cluster/test_vol --setactivationskip=n
>>> --name 6564c50651484d09a36b912962c573df_0
>>> 10. lvcreate -s vg_cluster/test_vol
>>> --setactivationskip=n
>>> --name ee8c32a1941e4aba91feab21fbcb3c6c_0
>>> 11. lvcreate -s vg_cluster/test_vol
>>> --setactivationskip=n
>>> --name bf93dc34233646128f0c5f84c3ac1f83_0 
>>> 12. reboot
>>>
>>> It works, but bricks for snapshot is not working.
>>>
>>> 
>>> --
>>> ~]# glsuter snpshot status
>>> Brick Path:  
>>> 
>>> 192.225.3.35:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick1
>>> Volume Group  :   vg_cluster
>>> Brick Running :   No
>>> Brick PID :   N/A
>>> Data Percentage   :   0.22
>>> LV Size   :   5.00g
>>>
>>>
>>> Brick Path:  
>>> 
>>> 192.225.3.36:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick2
>>> Volume Group  :   vg_cluster
>>> Brick Running :   No
>>> Brick PID :   N/A
>>> Data Percentage   :   0.22
>>> LV Size   :   5.00g
>>>
>>>
>>>

Re: [Gluster-users] how to restore snapshot LV's

2017-05-29 Thread Mohammed Rafi K C

On 05/27/2017 09:22 AM, WoongHee Han wrote:
> Hi, I'm sorry for my late reply.
>
> I've tried to solve it using your answer. It worked well, thanks; it
> means the snapshot was activated.
> Then I restored the snapshot.
>
> But after I restored the snapshot, there was nothing (no files) in the
> volume.
> Can't it recover automatically?
I remember you saying that you had reconfigured the VGs. Did you have
the mounts for the snapshot brick paths active?

Rafi KC


>
> Thank you agin for your answer.
>
>
> Best regards
>
>
>
> 2017-05-19 15:34 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com
> <mailto:rkavu...@redhat.com>>:
>
> I do not know how you ended up in this state. This usually happens
> when there is a commit failure. To recover from this state you can
> change the value of "status" from
>
> the path /var/lib/glusterd/snaps///info . From
> this file change the status to 0 in nodes where the values are
> one. Then restart glusterd on those node where we changed manually.
>
> Then try to activate it.
>
>
> Regards
>
> Rafi KC
>
>
> On 05/18/2017 09:38 AM, Pranith Kumar Karampuri wrote:
>> +Rafi, +Raghavendra Bhat
>>
>> On Tue, May 16, 2017 at 11:55 AM, WoongHee Han
>> <polishe...@gmail.com <mailto:polishe...@gmail.com>> wrote:
>>
>> Hi, all!
>>
>> I erased the VG having snapshot LV related to gluster volumes
>> and then, I tried to restore volume;
>>
>> 1. vgcreate vg_cluster /dev/sdb
>> 2. lvcreate --size=10G --type=thin-pool -n tp_cluster vg_cluster
>> 3. lvcreate -V 5G --thinpool vg_cluster/tp_cluster -n
>> test_vol vg_cluster
>> 4. gluster v stop test_vol
>> 5. getfattr -n trusted.glusterfs.volume-id /volume/test_vol (
>> in other node)
>> 6. setfattr -n trusted.glusterfs.volume-id -v
>>  0sKtUJWIIpTeKWZx+S5PyXtQ== /volume/test_vol (already mounted)
>> 7. gluster v start test_vol
>> 8. restart glusterd
>> 9. lvcreate -s vg_cluster/test_vol --setactivationskip=n
>> --name 6564c50651484d09a36b912962c573df_0
>> 10. lvcreate -s vg_cluster/test_vol --setactivationskip=n
>> --name ee8c32a1941e4aba91feab21fbcb3c6c_0
>> 11. lvcreate -s vg_cluster/test_vol --setactivationskip=n
>> --name bf93dc34233646128f0c5f84c3ac1f83_0 
>> 12. reboot
>>
>> It works, but bricks for snapshot is not working.
>>
>> 
>> --
>> ~]# glsuter snpshot status
>> Brick Path:  
>> 
>> 192.225.3.35:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick1
>> Volume Group  :   vg_cluster
>> Brick Running :   No
>> Brick PID :   N/A
>> Data Percentage   :   0.22
>> LV Size   :   5.00g
>>
>>
>> Brick Path:  
>> 
>> 192.225.3.36:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick2
>> Volume Group  :   vg_cluster
>> Brick Running :   No
>> Brick PID :   N/A
>> Data Percentage   :   0.22
>> LV Size   :   5.00g
>>
>>
>> Brick Path:  
>> 
>> 192.225.3.37:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick3
>> Volume Group  :   vg_cluster
>> Brick Running :   No
>> Brick PID :   N/A
>> Data Percentage   :   0.22
>> LV Size   :   5.00g
>>
>>
>> Brick Path:  
>> 
>> 192.225.3.38:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick4
>> Volume Group  :   vg_cluster
>> Brick Running :   Tes
>> Brick PID :   N/A
>> Data Percentage   :   0.22
>> LV Size   :   5.00g
>>
>> ~]# gluster snapshot deactivate t3_GMT-2017.05.15-08.01.37
>> Deactivating snap will make its data inaccessible. Do you
>> want to continue? (y/n) y
>> snapshot deactivate: failed: Pre Validation failed on
>> 192.225.3.36. Snapshot t3_GMT-2017.05.15-08.01.37 is already
>> deactivated.
>> Snapshot command failed
>>
>> ~]# glu

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mohammed Rafi K C


On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each have 5TB and it's
> hosting large qcow2 VM disks (I know it's reliable but it's
> not important data)
>
> I started with 5 bricks and then added another one, started the re
> balance process, everything went well, but now im looking at the
> bricks free space and i found one brick is around 82% while others
> ranging from 20% to 60%.
>
> The brick with highest utilization is hosting more qcow2 disk than
> other bricks, and whenever i start re balance it just complete in 0
> seconds and without moving any data.
>

What is the average file size in your cluster, and roughly how many files
are there?


> What will happen with the brick became full ?
>
Once a brick's usage goes beyond 90%, new files won't be created on that
brick, but existing files can still grow.


> Can i move data manually from one brick to the other ?
>

No, it is not recommended. Even though Gluster will try to locate the
file, things may break.


> Why re balance not distributing data evenly on all bricks ?
>

Rebalance works based on the layout, so we need to see how the layout
ranges are distributed. If one of your bricks has a higher capacity, it
will be assigned a larger layout range.
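
To see how the layout ranges ended up, you can compare the dht xattr of the
same directory on each brick, and check the rebalance status (brick path
taken from your volume info; the directory is just an example):

# run on every node, for one directory inside the brick
getfattr -n trusted.glusterfs.dht -e hex /vols/ctvvols/<some-directory>

# overall rebalance picture
gluster volume rebalance ctvvols status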

>
> Nodes runing CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
>
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> re balance log:
>
>
> [2017-05-23 14:45:12.637671] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
> took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> took 0.00 secs
> [2017-05-23 14:45:12.653291] I
> [dht-rebalance.c:3838:gf_defrag_start_crawl] 0-DHT: crawling
> file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
> [2017-05-23 

Re: [Gluster-users] GlusterFS+heketi+Kubernetes snapshots fail

2017-05-19 Thread Mohammed Rafi K C


On 05/18/2017 09:13 PM, Chris Jones wrote:
> On 5/18/2017 1:53 AM, Mohammed Rafi K C wrote:
>> On 05/18/2017 10:04 AM, Pranith Kumar Karampuri wrote:
>>> +Snapshot maintainer. I think he is away for a week or so. You may
>>> have to wait a bit more.
>>>
>>> On Wed, May 10, 2017 at 2:39 AM, Chris Jones <ch...@cjones.org
>>> <mailto:ch...@cjones.org>> wrote:
>>>
>>> Hi All,
>>>
>>> This was discussed briefly on IRC, but got no resolution. I have
>>> a Kubernetes cluster running heketi and GlusterFS 3.10.1. When I
>>> try to create a snapshot, I get:
>>>
>>> snapshot create: failed: Commit failed on localhost. Please
>>> check log file for details.
>>>
>>> glusterd log: http://termbin.com/r8s3
>>>
>>
>> I'm not able to open the url. Could you please paste it in a
>> different domain. ?
>
> glusterd.log:
> https://gist.github.com/cjyar/aa5dc8bc893d2439823fa11f2373428f
>
> brick log: https://gist.github.com/cjyar/10b2194a4413c6338da0776860a94401
>
> lvs output: https://gist.github.com/cjyar/87cfef8d403ed321bd96798790828d42
It looks like the data part of your VG (the thin pool) is 100% full; that
might be the reason why it failed.
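
A quick way to confirm (assuming a thin-provisioned LVM setup, which
snapshots require):

    # Data% close to 100 on the thin pool means there is no room left for
    # snapshot data
    lvs -o vg_name,lv_name,lv_size,data_percent,metadata_percent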

Rafi KC


>
> "gluster snapshot config" output:
> https://gist.github.com/cjyar/0798a2ba8790f26f7d745f2a67abe5b1
>
>>
>>> brick log: http://termbin.com/l0ya
>>>
>>> lvs output: http://termbin.com/bwug
>>>
>>> "gluster snapshot config" output: http://termbin.com/4t1k
>>>
>>> As you can see, there's not a lot of helpful output in the log
>>> files. I'd be grateful if somebody could help me interpret
>>> what's there.
>>>
>>> Chris
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>
>>>
>>>
>>>
>>> -- 
>>> Pranith
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] how to restore snapshot LV's

2017-05-19 Thread Mohammed Rafi K C
I do not know how you ended up in this state. This usually happens when
there is a commit failure. To recover from this state, you can change the
value of "status" in

the path /var/lib/glusterd/snaps///info . In this file,
change the status to 0 on the nodes where the value is one, then
restart glusterd on the nodes where you changed it manually.

Then try to activate it.
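
A rough sketch of the steps (the exact layout under /var/lib/glusterd/snaps/
can differ between versions, so please verify the path on your nodes before
editing anything):

    # on each node where the snapshot is wrongly marked as started
    grep status /var/lib/glusterd/snaps/<snapname>/info   # hypothetical path
    # edit the file so that the status value is 0, then restart glusterd
    systemctl restart glusterd
    # finally, from any one node
    gluster snapshot activate <snapname>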


Regards

Rafi KC


On 05/18/2017 09:38 AM, Pranith Kumar Karampuri wrote:
> +Rafi, +Raghavendra Bhat
>
> On Tue, May 16, 2017 at 11:55 AM, WoongHee Han  > wrote:
>
> Hi, all!
>
> I erased the VG having snapshot LV related to gluster volumes
> and then, I tried to restore volume;
>
> 1. vgcreate vg_cluster /dev/sdb
> 2. lvcreate --size=10G --type=thin-pool -n tp_cluster vg_cluster
> 3. lvcreate -V 5G --thinpool vg_cluster/tp_cluster -n test_vol
> vg_cluster
> 4. gluster v stop test_vol
> 5. getfattr -n trusted.glusterfs.volume-id /volume/test_vol ( in
> other node)
> 6. setfattr -n trusted.glusterfs.volume-id -v
>  0sKtUJWIIpTeKWZx+S5PyXtQ== /volume/test_vol (already mounted)
> 7. gluster v start test_vol
> 8. restart glusterd
> 9. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name 6564c50651484d09a36b912962c573df_0
> 10. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name ee8c32a1941e4aba91feab21fbcb3c6c_0
> 11. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name bf93dc34233646128f0c5f84c3ac1f83_0 
> 12. reboot
>
> It works, but bricks for snapshot is not working.
>
> 
> --
> ~]# gluster snapshot status
> Brick Path:  
> 
> 192.225.3.35:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick1
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:  
> 
> 192.225.3.36:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick2
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:  
> 
> 192.225.3.37:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick3
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:  
> 
> 192.225.3.38:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick4
> Volume Group  :   vg_cluster
> Brick Running :   Tes
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
> ~]# gluster snapshot deactivate t3_GMT-2017.05.15-08.01.37
> Deactivating snap will make its data inaccessible. Do you want to
> continue? (y/n) y
> snapshot deactivate: failed: Pre Validation failed on
> 192.225.3.36. Snapshot t3_GMT-2017.05.15-08.01.37 is already
> deactivated.
> Snapshot command failed
>
> ~]# gluster snapshot activate t3_GMT-2017.05.15-08.01.37
> snapshot activate: failed: Snapshot t3_GMT-2017.05.15-08.01.37 is
> already activated
>
> 
> --
>
>
> how to  restore snapshot LV's ?
>
> my nodes consist of four nodes and  distributed, replicated (2x2)
>
>
> thank you.
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
>
>
>
>
> -- 
> Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS+heketi+Kubernetes snapshots fail

2017-05-18 Thread Mohammed Rafi K C


On 05/18/2017 10:04 AM, Pranith Kumar Karampuri wrote:
> +Snapshot maintainer. I think he is away for a week or so. You may
> have to wait a bit more.
>
> On Wed, May 10, 2017 at 2:39 AM, Chris Jones  > wrote:
>
> Hi All,
>
> This was discussed briefly on IRC, but got no resolution. I have a
> Kubernetes cluster running heketi and GlusterFS 3.10.1. When I try
> to create a snapshot, I get:
>
> snapshot create: failed: Commit failed on localhost. Please check
> log file for details.
>
> glusterd log: http://termbin.com/r8s3
>

I'm not able to open the url. Could you please paste it in a different
domain. ?

> brick log: http://termbin.com/l0ya
>
> lvs output: http://termbin.com/bwug
>
> "gluster snapshot config" output: http://termbin.com/4t1k
>
> As you can see, there's not a lot of helpful output in the log
> files. I'd be grateful if somebody could help me interpret what's
> there.
>
> Chris
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
>
>
>
>
> -- 
> Pranith
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Meeting minutes for Gluster Bug Triage Meeting held on 9th May 2017

2017-05-09 Thread Mohammed Rafi K C
Hi All,

The meeting minutes and logs for this week's meeting are available at
the links below.
Minutes:
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-09/weely_gluster_bug_triage.2017-05-09-12.01.html
Minutes (text):
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-09/weely_gluster_bug_triage.2017-05-09-12.01.txt
Log:
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-09/weely_gluster_bug_triage.2017-05-09-12.01.log.html

We had a very lively meeting this time, with good participation.
Hope next week's meeting is the same. The next meeting is, as
always, at 1200 UTC next Tuesday in #gluster-meeting. See you all
there, and thank you for attending today's meeting.
**
Regards!
Rafi KC


==
#gluster-meeting: Weely Gluster Bug Triage
==


Meeting summary
---
* Roll Call  (rafi, 12:02:16)

* Next weeks host  (rafi, 12:05:26)
  * ACTION: hgowtham will host the next gluster bug triage meeting
(rafi, 12:09:54)
  * Agenda is at
https://github.com/gluster/glusterfs/wiki/Bug-Triage-Meeting  (rafi,
12:10:30)

* group triage  (rafi, 12:10:49)
  * LINK: http://bit.ly/gluster-bugs-to-triage   (rafi, 12:10:54)
  * LINK:
https://gluster.readthedocs.io/en/latest/Contributors-Guide/Bug-Triage/
(rafi, 12:11:03)

* open floor  (rafi, 12:29:20)
  * ACTION: amarts and Shyam are working on triage guidelines for GitHub
issues  (rafi, 12:31:14)
  * ACTION: jiffin needs to send the changes to check-bugs.py also
(rafi, 12:31:59)

Meeting ended at 12:33:35 UTC.




Action Items

* hgowtham will host the next gluster bug triage meeting
* amarts and Shyam are working on triage guidelines for GitHub issues
* jiffin needs to send the changes to check-bugs.py also




Action Items, by person
---
* amarts
  * amarts and Shyam are working on triage guidelines for GitHub issues
* hgowtham
  * hgowtham will host the next gluster bug triage meeting
* **UNASSIGNED**
  * jiffin needs to send the changes to check-bugs.py also




People Present (lines said)
---
* rafi (35)
* hgowtham (8)
* amarts (6)
* kkeithley (6)
* ndevos (4)
* zodbot (3)
* ankitr (2)
* ashiq (1)


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] libgfapi access to snapshot volume

2017-05-04 Thread Mohammed Rafi K C
Hi Ram,


You can access a snapshot through libgfapi; it is just that the volname
will become something like /snaps// . I can give you
some example programs if you have any trouble doing so.


Or you can use the USS feature to access the snapshot through the main
volume via libgfapi (it also uses the above method internally).


Regards

Rafi KC


On 05/04/2017 06:42 PM, Ankireddypalle Reddy wrote:
>
> Hi,
>
>Can glusterfs snapshot volume be accessed through libgfapi.
>
>  
>
> Thanks and Regards,
>
> Ram 
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material
> for the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank
> you."
> **
>
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] lost one replica after upgrading glusterfs from 3.7 to 3.10, please help

2017-04-28 Thread Mohammed Rafi K C
Can you share the glusterd logs from the three nodes ?


Rafi KC


On 04/28/2017 02:34 PM, Seva Gluschenko wrote:
> Dear Community,
>
>
> I call for your wisdom, as it appears that googling for keywords doesn't help 
> much.
>
> I have a glusterfs volume with replica count 2, and I tried to perform the 
> online upgrade procedure described in the docs 
> (http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/). It 
> all went almost fine when I'd done with the first replica, the only problem 
> was the self-heal procedure that refused to complete until I commented out 
> all IPv6 entries in the /etc/hosts.
>
> So far, being sure that it all should work on the 2nd replica pretty the same 
> as it was on the 1st one, I had proceeded with the upgrade on the replica 2. 
> All of a sudden, it told me that it doesn't see the first replica at all. The 
> state before upgrade was:
>
> sst2# gluster volume status
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick sst0:/var/glusterfs   49152 0  Y   3482 
> Brick sst2:/var/glusterfs   49152 0  Y   29863
> NFS Server on localhost   2049  0  Y   25175
> Self-heal Daemon on localhostN/A   N/AY   25283
> NFS Server on sst0  N/A   N/AN   N/A  
> Self-heal Daemon on sst0N/A   N/AY   4827 
> NFS Server on sst1  N/A   N/AN   N/A  
> Self-heal Daemon on sst1N/A   N/AY   15009
>  
> Task Status of Volume gv0
> --
> There are no active volume tasks
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Peer in Cluster (Connected)
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> sst2# gluster volume heal gv0 info
> Brick sst0:/var/glusterfs
> Number of entries: 0
>
> Brick sst2:/var/glusterfs
> Number of entries: 0
>
>
> After upgrade, it looked like this:
>
> sst2# gluster volume status
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick sst2:/var/glusterfs   N/A   N/AN   N/A  
> NFS Server on localhost N/A   N/AN   N/A  
> NFS Server on localhost N/A   N/AN   N/A  
>  
> Task Status of Volume gv0
> --
> There are no active volume tasks
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Peer Rejected (Connected)
>
>
> My biggest fault probably, at that point I googled and found this article 
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>  -- and followed its advice, removing at sst2 all the /var/lib/glusterd 
> contents except the glusterd.info file. As the result, the node, predictably, 
> lost all information about the volume.
>
> sst2# gluster volume status
> No volumes present
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Accepted peer request (Connected)
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Accepted peer request (Connected)
>
> Okay, I thought, this is might be a high time to re-add the brick. Not that 
> easy, Jack:
>
> sst0# gluster volume add-brick gv0 replica 2 'sst2:/var/glusterfs'
> volume add-brick: failed: Operation failed
>
> The reason appeared to be natural: sst0 still knows that there was the 
> replica on sst2. What should I do then? At this point, I tried to recover the 
> volume information on sst2 by putting it offline and copying all the volume 
> info from the sst0. Of course it wasn't enough to just copy as is, I modified 
> /var/lib/glusterd/vols/gv0/sst*\:-var-glusterfs, setting listen-port=0 for 
> the remote brick (sst0) and listen-port=49152 for the local brick (sst2). It 
> didn't help much, unfortunately. The final state I've reached is as follows:
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Sent and Received peer request (Connected)
>
> sst2# gluster volume info
>  
> 

Re: [Gluster-users] [Gluster-devel] Finding size of volume

2017-04-26 Thread Mohammed Rafi K C
I assume that you want to get the size from a client machine, rather
than nodes from trusted storage pools. If so, you can use gfapi to do a
fstat and can get the size of the volume.


Regards

Rafi KC


On 04/26/2017 02:17 PM, Nux! wrote:
> Hello,
>
> Is there a way with gluster tools to show size of a volume?
> I want to avoid mounting volumes and running df.
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Fw: Deletion of old CHANGELOG files in .glusterfs/changelogs

2017-04-04 Thread Mohammed Rafi K C


On 04/04/2017 10:53 PM, mabi wrote:
> Anyone?
>
>
>
>>  Original Message 
>> Subject: Deletion of old CHANGELOG files in .glusterfs/changelogs
>> Local Time: March 31, 2017 11:22 PM
>> UTC Time: March 31, 2017 9:22 PM
>> From: m...@protonmail.ch
>> To: Gluster Users 
>>
>> Hi,
>>
>> I am using geo-replication since now over a year on my 3.7.20
>> GlusterFS volumes and noticed that the CHANGELOG. in the
>> .glusterfs/changelogs directory of a brick never get deleted. I have
>> for example over 120k files in one of these directories and it is
>> growing constantly.
>>
>> So my question, does GlusterFS have any mechanism to automatically
>> delete old and processed CHANGELOG files? If not is it safe to delete
>> them manually?

I will try to answer the question. I'm not an expert in geo-replication,
so I could be wrong here. I think GlusterFS won't delete the changelogs
automatically, the reason being that geo-replication is not the author of
the changelogs; it is just a consumer, and any other application could use
the changelogs.

You can safely delete **all processed** changelogs from the actual changelogs
directory and the geo-replication working directory. You can look at the
stime value set as an extended attribute on the brick root to see the time
up to which geo-replication last synced.
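
A hedged example of how to look at those attributes (the brick path below is
hypothetical; run it against a brick root on the master):

    # dump all extended attributes on the brick root; the *.stime entry
    # shows the timestamp up to which geo-replication has synced
    getfattr -d -m . -e hex /bricks/brick1/data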

Adding Kotresh, and Aravinda .

Regards
Rafi KC


>>
>>
>>
>> Regards,
>> Mabi
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster Community Meeting 2017-03-15

2017-03-16 Thread Mohammed Rafi K C
Hi All,

We had only a few people attending; though the number was small, it was an
active (and productive) meeting.
Thank you everyone for your attendance.

The highlight of the meeting was the start of a new demo series on Gluster
related projects.
Prasanna did awesome work making it an interesting presentation on block
storage.

We also discussed nigleb's suggestion to abandon the old patches; we decided
to discuss this further on gluster-devel.

Apart from that, we discussed the snapshot implementation for btrfs, which is
under development by major.

You can visit 
https://github.com/gluster/glusterfs/wiki/Community-meeting-2017-03-15 to get 
full details.

Our next meeting is scheduled for 1500UTC on March 29, 2017.
The meeting pad is available for your updates and topics of discussion at
https://bit.ly/gluster-community-meetings

See you all later.

~Rafi KC


==
#gluster-meeting: Gluster Community Meeting 2017-03-15
==


Meeting started by kshlm at 15:01:59 UTC. The full logs are available at
https://meetbot.fedoraproject.org/gluster-meeting/2017-03-15/gluster_community_meeting_2017-03-15.2017-03-15-15.01.log.html
.



Meeting summary
---
* Rollcall  (kshlm, 15:07:59)

* Old pending reviews  (kshlm, 15:11:17)
  * LINK: https://review.gluster.org/#/q/age:3+months   (major,
15:29:15)
  * LINK:
https://review.gluster.org/#/q/status:open+before:2016-12-15+Code-Review-2
(pkalever, 15:34:36)
  * ACTION: rafi will start the discussion on abandoning old reviews on
gluster-devel  (kshlm, 15:34:38)
  * AGREED: Old reviews need to be abandoned.  (kshlm, 15:35:10)

* Gluster-Block demo  (kshlm, 15:37:00)

* Action Item  (rafi, 16:15:45)

* Open floor  (rafi, 16:18:59)

Meeting ended at 16:32:54 UTC.




Action Items

* rafi will start the discussion on abandoning old reviews on
  gluster-devel




Action Items, by person
---
* rafi
  * rafi will start the discussion on abandoning old reviews on
gluster-devel
* **UNASSIGNED**
  * (none)




People Present (lines said)
---
* rafi (52)
* kshlm (52)
* ndevos (33)
* vbellur (26)
* major (21)
* pkalever (11)
* zodbot (9)
* amye (7)
* kkeithley (3)
* BatS9_ (2)


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How understand some code execute client side or server side?

2017-03-09 Thread Mohammed Rafi K C


On 03/10/2017 10:47 AM, Tahereh Fattahi wrote:
> Thank you very much, it is very helpful.
> I see the client graph also in /var/log/glusterfs/mnt-glusterfs.log
> when mount the file system.

Yes, you are looking in the right place. The fuse mount process logs the
graph if the log level is INFO.

> I think there is a tree structure between xlator (I had seen something
> in code like child and parent of each xlator), so just some of them
> are the point of connecting to server. I think xlator with type
> protocol/client is responsible for send request and get response from
> server.

> am I correct?

Indeed, you are a quick learner. The translator with type protocol/client
is the last node in the client graph, and it connects to the protocol/server
translator loaded on the server. protocol/server is the starting node on the
server side.


Regards
Rafi KC

>
> On Thu, Mar 9, 2017 at 8:38 PM, Mohammed Rafi K C <rkavu...@redhat.com
> <mailto:rkavu...@redhat.com>> wrote:
>
> GlusterFS has mainly four daemons, ie glusterfs (generally client
> process), glusterfsd (generally brick process), glusterd
> (management daemon) and gluster (cli).
>
> Except cli (cli/src) all of them are basically the same binary
> symlinked to different name. So what makes them different is
> graphs, ie each daemons loads a graph and based on the graph it
> does it's job.
>
>
> Nodes of each graph are called xlators. So to figure out what are
> the xlators loaded in client side graph. You can see a client
> graph
> /var/lib/glusterd/vols//trusted-.-fuse.vol
>
> Once you figured out the xlators in client graph and their type,
> you can go to the source code, xlatos//.
>
>
> Please note that, if an xlator loaded on client graph it doesn't
> mean that it will only run in client side. The same xlator can
> also run in server if we load a graph with that xlator loaded.
>
>
> Let me know if this is not helping you to understand
>
>
> Regards
>
> Rafi KC
>
>
> So glusterd and cli codes are always ran on servers.
>
> On 03/09/2017 08:28 PM, Tahereh Fattahi wrote:
>> Hi
>> Is there any way to understand that some code is running client
>> side or server side (from source code and its directories)?
>> Is it possible for some code to execute both client and server side?
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How understand some code execute client side or server side?

2017-03-09 Thread Mohammed Rafi K C
GlusterFS has mainly four daemons, i.e. glusterfs (generally the client
process), glusterfsd (generally the brick process), glusterd (the management
daemon) and gluster (the CLI).

Except for the CLI (cli/src), all of them are basically the same binary
symlinked to different names. So what makes them different is the graphs:
each daemon loads a graph and does its job based on that graph.


The nodes of each graph are called xlators. To figure out which xlators are
loaded in the client-side graph, you can look at a client graph at
/var/lib/glusterd/vols//trusted-.-fuse.vol

Once you have figured out the xlators in the client graph and their types,
you can go to the source code under xlators//.
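
For example, a small sketch (the volume name "gv0" here is hypothetical; the
volfile name follows the pattern above):

    # list the translator types loaded in the client graph
    grep 'type ' /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol
    # most "type <category>/<name>" entries map to a directory under
    # xlators/<category>/ in the source tree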


Please note that if an xlator is loaded in the client graph, it doesn't mean
that it will only run on the client side. The same xlator can also run on
the server if we load a graph with that xlator in it.


Let me know if this is not helping you to understand


Regards

Rafi KC


So the glusterd and cli code is always run on servers.

On 03/09/2017 08:28 PM, Tahereh Fattahi wrote:
> Hi
> Is there any way to understand that some code is running client side
> or server side (from source code and its directories)?
> Is it possible for some code to execute both client and server side?
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2017-03-09 Thread Mohammed Rafi K C
I'm sorry that you had to downgrade. We will work on it and hopefully
will see you soon on 3.8 ;).


Just one question: does your workload include a lot of deletes, of either
files or directories? We just want to see if the delayed deletes (the
janitor thread) are causing any issue.


Regards

Rafi KC


On 03/09/2017 01:53 PM, Amar Tumballi wrote:
>
> - Original Message -
>> From: "Micha Ober" <mich...@gmail.com>
>>
>> ​Just to let you know: I have reverted back to glusterfs 3.4.2 and everything
>> is working again. No more disconnects, no more errors in the kernel log. So
>> there *has* to be some kind of regression in the newer versions​. Sadly, I
>> guess, it will be hard to find.
>>
> Thanks for the update Micha. This helps to corner the issue a little at least.
>
> Regards,
> Amar
>
>
>> 2016-12-20 13:31 GMT+01:00 Micha Ober < mich...@gmail.com > :
>>
>>
>>
>> Hi Rafi,
>>
>> here are the log files:
>>
>> NFS: http://paste.ubuntu.com/23658653/
>> Brick: http://paste.ubuntu.com/23658656/
>>
>> The brick log is of the brick which has caused the last disconnect at
>> 2016-12-20 06:46:36 (0-gv0-client-7).
>>
>> For completeness, here is also dmesg output:
>> http://paste.ubuntu.com/23658691/
>>
>> Regards,
>> Micha
>>
>> 2016-12-19 7:28 GMT+01:00 Mohammed Rafi K C < rkavu...@redhat.com > :
>>
>>
>>
>>
>>
>> Hi Micha,
>>
>> Sorry for the late reply. I was busy with some other things.
>>
>> If you have still the setup available Can you enable TRACE log level [1],[2]
>> and see if you could find any log entries when the network start
>> disconnecting. Basically I'm trying to find out any disconnection had
>> occurred other than ping timer expire issue.
>>
>>
>>
>>
>>
>>
>>
>> [1] : gluster volume set <volname> diagnostics.brick-log-level TRACE
>>
>> [2] : gluster volume set <volname> diagnostics.client-log-level TRACE
>>
>>
>>
>>
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/08/2016 07:59 PM, Atin Mukherjee wrote:
>>
>>
>>
>>
>>
>> On Thu, Dec 8, 2016 at 4:37 PM, Micha Ober < mich...@gmail.com > wrote:
>>
>>
>>
>> Hi Rafi,
>>
>> thank you for your support. It is greatly appreciated.
>>
>> Just some more thoughts from my side:
>>
>> There have been no reports from other users in *this* thread until now, but I
>> have found at least one user with a very simiar problem in an older thread:
>>
>> https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html
>>
>> He is also reporting disconnects with no apparent reasons, althogh his setup
>> is a bit more complicated, also involving a firewall. In our setup, all
>> servers/clients are connected via 1 GbE with no firewall or anything that
>> might block/throttle traffic. Also, we are using exactly the same software
>> versions on all nodes.
>>
>>
>> I can also find some reports in the bugtracker when searching for
>> "rpc_client_ping_timer_expired" and "rpc_clnt_ping_timer_expired" (looks
>> like spelling changed during versions).
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
>>
>> Just FYI, this is a different issue, here GlusterD fails to handle the volume
>> of incoming requests on time since MT-epoll is not enabled here.
>>
>>
>>
>>
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
>>
>> But both reports involve large traffic/load on the bricks/disks, which is not
>> the case for out setup.
>> To give a ballpark figure: Over three days, 30 GiB were written. And the data
>> was not written at once, but continuously over the whole time.
>>
>>
>> Just to be sure, I have checked the logfiles of one of the other clusters
>> right now, which are sitting in the same building, in the same rack, even on
>> the same switch, running the same jobs, but with glusterfs 3.4.2 and I can
>> see no disconnects in the logfiles. So I can definitely rule out our
>> infrastructure as problem.
>>
>> Regards,
>> Micha
>>
>>
>>
>> Am 07.12.2016 um 18:08 schrieb Mohammed Rafi K C:
>>
>>
>>
>>
>> Hi Micha,
>>
>> This is great. I will provide you one debug build which has two fixes which I
>> possible suspect for a frequent disconnect issue, though I don't have much
>> data to validate my theory. So I will take one more day to dig in to that.

Re: [Gluster-users] [ovirt-users] Hot to force glusterfs to use RDMA?

2017-03-06 Thread Mohammed Rafi K C
I will see what we can do from gluster side to fix this. I will get back
to you .


Regards

Rafi KC


On 03/06/2017 05:14 PM, Denis Chaplygin wrote:
> Hello!
>
> On Fri, Mar 3, 2017 at 12:18 PM, Arman Khalatyan  > wrote:
>
> I think there are some bug in the vdsmd checks;
>
> OSError: [Errno 2] Mount of `10.10.10.44:/GluReplica` at
> `/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica` does not
> exist
>
>  
>
>
> 10.10.10.44:/GluReplica.rdma   3770662912 407818240 3362844672 
> 11% /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica
>
>
> I suppose, that vdsm is not able to handle that .rdma suffix on volume
> path. Could you please file a bug for that issue to track it?

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Mohammed Rafi K C
Hi Arman,


On 03/03/2017 12:27 PM, Arman Khalatyan wrote:
> Dear Deepak, thank you for the hints, which gluster are you using?
> As you can see from my previous email that the RDMA connection tested
> with qperf. It is working as expected. In my case the clients are
> servers as well, they are hosts for the ovirt. Disabling selinux is
> nor recommended by ovirt, but i will give a try.

Gluster uses IPoIB, as mentioned by Deepak. So qperf with default options
may not be a good choice to test IPoIB, because it will fall back to any
link available between the mentioned server and client. You can force
this behavior; please refer to the link [1].
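
For reference, a hedged sketch of what I mean (the address below is just a
placeholder for your server's IPoIB address):

    # TCP tests over IPoIB: point qperf at the server's IPoIB address
    qperf 192.168.100.1 tcp_bw tcp_lat
    # native RDMA tests (RC transport), to confirm the verbs path itself
    qperf 192.168.100.1 rc_bw rc_lat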

In addition to that, can you please provide your gluster version, glusterd
logs and brick logs? Since it complains about the absence of the device, it
is most likely a setup issue. Otherwise it could have been a permission-denied
error; I'm not completely ruling out the possibility of SELinux preventing
the creation of the IB channel. We had this issue in RHEL, which is fixed
in 7.2 [2].


[1] :
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Testing_an_RDMA_network_after_IPoIB_is_configured.html
[2] : https://bugzilla.redhat.com/show_bug.cgi?id=1386620

Regards
Rafi KC


>
> Am 03.03.2017 7:50 vorm. schrieb "Deepak Naidu"  >:
>
> I have been testing glusterfs over RDMA & below is the command I
> use. Reading up the logs, it looks like your IB(InfiniBand) device
> is not being initialized. I am not sure if u have an issue on the
> client IB or the storage server IB. Also have you configured ur IB
> devices correctly. I am using IPoIB.
>
> Can you check your firewall, disable selinux, I think, you might
> have checked it already ?
>
>  
>
> *mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0*
>
>  
>
>  
>
> · *The below error seems if you have issue starting your
> volume. I had issue, when my transport was set to tcp,rdma. I had
> to force start my volume. If I had set it only to tcp on the
> volume, the volume would start easily.*
>
>  
>
> [2017-03-02 11:49:47.829391] E [MSGID: 114022]
> [client.c:2530:client_init_rpc] 0-GluReplica-client-2: failed to
> initialize RPC
> [2017-03-02 11:49:47.829413] E [MSGID: 101019]
> [xlator.c:433:xlator_init] 0-GluReplica-client-2: Initialization
> of volume 'GluReplica-client-2' failed, review your volfile again
> [2017-03-02 11:49:47.829425] E [MSGID: 101066]
> [graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2:
> initializing translator failed
> [2017-03-02 11:49:47.829436] E [MSGID: 101176]
> [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
>
>  
>
> · *The below error seems if you have issue with IB device.
> If not configured properly.*
>
>  
>
> [2017-03-02 11:49:47.828996] W [MSGID: 103071]
> [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm
> event channel creation failed [No such device]
> [2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init]
> 0-GluReplica-client-2: Failed to initialize IB Device
> [2017-03-02 11:49:47.829080] W
> [rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma'
> initialization failed
>
>  
>
>  
>
> --
>
> Deepak
>
>  
>
>  
>
> *From:*gluster-users-boun...@gluster.org
> 
> [mailto:gluster-users-boun...@gluster.org
> ] *On Behalf Of *Sahina Bose
> *Sent:* Thursday, March 02, 2017 10:26 PM
> *To:* Arman Khalatyan; gluster-users@gluster.org
> ; Rafi Kavungal Chundattu Parambil
> *Cc:* users
> *Subject:* Re: [Gluster-users] [ovirt-users] Hot to force
> glusterfs to use RDMA?
>
>  
>
> [Adding gluster users to help with error]
>
> [2017-03-02 11:49:47.828996] W [MSGID: 103071]
> [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm
> event channel creation failed [No such device]
>
>  
>
> On Thu, Mar 2, 2017 at 5:36 PM, Arman Khalatyan  > wrote:
>
> BTW RDMA is working as expected:
> root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
> tcp_bw:
> bw  =  475 MB/sec
> tcp_lat:
> latency  =  52.8 us
> [root@clei26 ~]#
>
> thank you beforehand.
>
> Arman.
>
>  
>
> On Thu, Mar 2, 2017 at 12:54 PM, Arman Khalatyan
> > wrote:
>
> just for reference:
>  gluster volume info
>  
> Volume Name: GluReplica
> Type: Replicate
> Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp,rdma
> 

Re: [Gluster-users] detecting replication issues

2017-02-24 Thread Mohammed Rafi K C
Hi Joseph,

I think there is a gap in understanding your problem. Let me try to give a
clearer picture of this.

First, a couple of clarification points:

1) The client graph is an internally generated configuration file based on
your volume; you don't need to create or edit your own. If you want a 3-way
replicated volume, you have to specify that when you create the volume.

2) When you mount a gluster volume, you don't need to provide any client
graph; you just need to give the server hostname and volname. The client
automatically fetches the graph and starts working on it (so it does the
replication based on the graph generated by the gluster management daemon).


Now let me briefly describe the procedure for creating a 3-way
replicated volume

1) gluster volume create <volname> replica 3 <host1>:/<brick1>
<host2>:/<brick2> <host3>:/<brick3>

 Note: if you give 3 more bricks, it will create a 2-way distributed,
3-way replicated volume (you can increase the distribution by adding
bricks in multiples of 3)

 This step will automatically create the configuration file in
/var/lib/glusterd/vols/<volname>/trusted-<volname>.tcp-fuse.vol

2) Now start the volume using gluster volume start <volname>

3) Fuse mount the volume on the client machine using the command mount -t
glusterfs <hostname>:/<volname> /<mount-point>

This will automatically fetch the configuration file and do the
replication. You don't need to do anything else.
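
As a concrete sketch with hypothetical names (volume gv0, servers
server1..server3, brick path /bricks/gv0, client mount point /mnt/gv0):

    gluster volume create gv0 replica 3 server1:/bricks/gv0 \
        server2:/bricks/gv0 server3:/bricks/gv0
    gluster volume start gv0
    # on the client
    mount -t glusterfs server1:/gv0 /mnt/gv0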


Let me know if this helps.


Regards

Rafi KC


On 02/24/2017 05:13 PM, Joseph Lorenzini wrote:
> HI Mohammed,
>
> Its not a bug per se, its a configuration and documentation issue. I
> searched the gluster documentation pretty thoroughly and I did not
> find anything that discussed the 1) client's call graph and 2) how to
> specifically configure a native glusterfs client to properly specify
> that call graph so that replication will happen across multiple
> bricks. If its there, then there's a pretty severe organization issue
> in the documentation (I am pretty sure I ended up reading almost every
> page actually).
>
> As a result, because I was a new to gluster, my initial set up really
> confused me. I would follow the instructions as documented in official
> gluster docs (execute the mount command), write data on the
> mount...and then only see it replicated to a single brick. It was only
> after much furious googling did I manage to figure out that that 1) i
> needed a client configuration file which should be specified in
> /etc/fstab and 2) that configuration block mentioned above was the key.
>
> I am actually planning on submitting a PR to the documentation to
> cover all this. To be clear, I am sure this is obvious to a seasoned
> gluster user -- but it is not at all obvious to someone who is new to
> gluster such as myself.
>
> So I am an operations engineer. I like reproducible deployments and I
> like monitoring to alert me when something is wrong. Due to human
> error or a bug in our deployment code, its possible that something
> like not setting the client call graph properly could happen. I wanted
> a way to detect this problem so that if it does happen, it can be
> remediated immediately.
>
> Your suggestion sounds promising. I shall definitely look into that.
> Though that might be a useful information to surface up in a CLI
> command in a future gluster release IMHO.
>
> Joe
>
>
>
> On Thu, Feb 23, 2017 at 11:51 PM, Mohammed Rafi K C
> <rkavu...@redhat.com <mailto:rkavu...@redhat.com>> wrote:
>
>
>
> On 02/23/2017 11:12 PM, Joseph Lorenzini wrote:
>> Hi all,
>>
>> I have a simple replicated volume with a replica count of 3. To
>> ensure any file changes (create/delete/modify) are replicated to
>> all bricks, I have this setting in my client configuration.
>>
>>  volume gv0-replicate-0
>> type cluster/replicate
>> subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>> end-volume
>>
>> And that works as expected. My question is how one could detect
>> if this was not happening which could poise a severe problem with
>> data consistency and replication. For example, those settings
>> could be omitted from the client config and then the client will
>> only write data to one brick and all kinds of terrible things
>> will start happening. I have not found a way the gluster volume
>> cli to detect when that kind of problem is occurring. For example
>> gluster volume heal  info does not detect this problem. 
>>
>> Is there any programmatic way to detect when this problem is
>> occurring?
>>
>
> I couldn't understand how you will end up in this situation. There
> is only one possibility (assuming there is no bug :) ), ie you
> changed the client graph in a way that there is only one subvolume
>

Re: [Gluster-users] volume start: data0: failed: Commit failed on localhost.

2017-02-24 Thread Mohammed Rafi K C
It looks like it has ended up in a split-brain kind of situation. To find
the root cause, we need to get the logs from the first failure of volume
start or volume stop.

Or to work around it, you can do a volume start force.
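
For your volume (data0, taken from your status output) that would be
something like:

    gluster volume start data0 force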


Regards

Rafi KC


On 02/24/2017 01:36 PM, Deepak Naidu wrote:
>
> I keep on getting this error when my config.transport is set to both
> tcp,rdma. The volume doesn’t start. I get the below error during
> volume start.
>
>  
>
> To get around this, I end up delete the volume, then configure either
> only rdma or tcp. May be I am missing something, just trying to get
> the volume up.
>
>  
>
> root@hostname:~# gluster volume start data0
>
> volume start: data0: failed: Commit failed on localhost. Please check
> log file for details.
>
> root@hostname:~#
>
>  
>
> root@ hostname:~# gluster volume status data0
>
> Staging failed on storageN2. Error: Volume data0 is not started
>
> root@ hostname:~
>
>  
>
> =
>
> [2017-02-24 08:00:29.923516] I [MSGID: 106499]
> [glusterd-handler.c:4349:__glusterd_handle_status_volume]
> 0-management: Received status volume req for volume data0
>
> [2017-02-24 08:00:29.926140] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed
> on storageN2. Error: Volume data0 is not started
>
> [2017-02-24 08:00:33.770505] I [MSGID: 106499]
> [glusterd-handler.c:4349:__glusterd_handle_status_volume]
> 0-management: Received status volume req for volume data0
>
> [2017-02-24 08:00:33.772824] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed
> on storageN2. Error: Volume data0 is not started
>
> =
>
> [2017-02-24 08:01:36.305165] E [MSGID: 106537]
> [glusterd-volume-ops.c:1660:glusterd_op_stage_start_volume]
> 0-management: Volume data0 already started
>
> [2017-02-24 08:01:36.305191] W [MSGID: 106122]
> [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume
> start prevalidation failed.
>
> [2017-02-24 08:01:36.305198] E [MSGID: 106122]
> [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management: Pre
> Validation failed for operation Start on local node
>
> [2017-02-24 08:01:36.305205] E [MSGID: 106122]
> [glusterd-mgmt.c:2009:glusterd_mgmt_v3_initiate_all_phases]
> 0-management: Pre Validation Failed
>
>  
>
>  
>
> --
>
> Deepak
>
>  
>
> 
> This email message is for the sole use of the intended recipient(s)
> and may contain confidential information.  Any unauthorized review,
> use, disclosure or distribution is prohibited.  If you are not the
> intended recipient, please contact the sender by reply email and
> destroy all copies of the original message.
> 
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] self heal failed, on /

2017-02-23 Thread Mohammed Rafi K C


On 02/24/2017 11:47 AM, max.degr...@kpn.com wrote:
>
> The version on the server of this specific mount is 3.7.11. The client
> is running version 3.4.2.
>

It is always better to have everything on one version, all clients and
all servers. In this case there is a huge gap between the versions, 3.7
and 3.4.

An additional point: the code running on 3.4 is replica (AFR) v1 code and
on 3.7 it is v2, meaning there is a huge difference in the logic of
replication/healing. So I recommend keeping all the gluster instances on
the same version.


~Rafi


>  
>
> There is more to that. This client is actually mounting to volumes
> where the other server is running 3.4.2 as well. What’s your advice,
> update that other server to 3.7.11 (or higher) first? Of start with
> the client update?
>
>  
>
> *Van:*Mohammed Rafi K C [mailto:rkavu...@redhat.com]
> *Verzonden:* vrijdag 24 februari 2017 07:02
> *Aan:* Graaf, Max de; gluster-users@gluster.org
> *Onderwerp:* Re: [Gluster-users] self heal failed, on /
>
>  
>
>  
>
>  
>
> On 02/23/2017 12:18 PM, max.degr...@kpn.com
> <mailto:max.degr...@kpn.com> wrote:
>
> Hi,
>
>  
>
> We have a 4 node glusterfs setup that seems to be running without
> any problems. We can’t find any problems with replication or whatever.
>
>  
>
> We also have 4 machines running the glusterfs client. On all 4
> machines we see the following error in the logs at random moments:
>
>  
>
> [2017-02-23 00:04:33.168778] I
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  is successfully
> completed,   metadata self heal from source aab-client-0 to
> aab-client-1,  aab-client-2,  aab-client-3,  metadata - Pending
> matrix:  [ [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] ], on /
>
> [2017-02-23 00:09:34.431089] E
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  failed,   on /
>
> [2017-02-23 00:14:34.948975] I
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  is successfully
> completed,   metadata self heal from source aab-client-0 to
> aab-client-1,  aab-client-2,  aab-client-3,  metadata - Pending
> matrix:  [ [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] ], on /
>
>  
>
> The content within the glusterfs filesystems is rather static with
> only minor changes on it. This “self heal  failed” is printed
> randomly in the logs on the glusterfs client. It’s printed even at
> moment where nothing has changed within the glusterfs filesystem.
> When it is printed, its never on multiple servers at the same
> time. What we also don’t understand : the error indicates self
> heal failed on root “/”. In the root of this glusterfs mount there
> only 2 folders and no files are ever written at the root level.
>
>  
>
> Any thoughts?
>
>
> From the logs, It looks like an older version of gluster , probably
> 3.5 . Please confirm your glusterfs version. The version is pretty old
> and it may be moved End of Life. And this is AFR v1 , where the latest
> stable version runs with AFRv2.
>
> So I would suggest you to upgrade to a later version may be 3.8 .
>
> If you still want to go with this version, I can give it a try. Let me
> know the version, volume info and volume status. Still I will suggest
> to upgrade ;)
>
>
> Regards
> Rafi KC
>
>
>
>
>  
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>  
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Problems restarting gluster

2017-02-23 Thread Mohammed Rafi K C


On 02/23/2017 03:54 PM, xina towner wrote:
> Hi,
>
> we are using glusterfs with replica 2, we have 16 server nodes and
> around 40 client nodes.
>
> It happens sometimes that they lose connectivity and when we restart
> the node so it can come online again the server kicks us from the
> server and we are unable to login using ssh but the server responds to
> ICMP messages.
>
> I've google a little bit but I'm unable to find any reason why this is
> happening or any fix or workaround. Have you any experienced this
> situation also?
>
> I've found this message:
> http://lists.gluster.org/pipermail/gluster-users.old/2015-February/020635.html
>
> But I can't find any answer.

Offhand, I can't think of any possible dependencies between the glusterfsd
process and other userland processes. Maybe somebody else can help you.

But I would certainly be happy to look into the cause of losing the
connection between the clients and the bricks. If that helps, please get me
some logs and other information such as the volume info, volume status, and
version.

Regards
Rafi KC

>
> -- 
> Thanks,
>
> Rubén
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] self heal failed, on /

2017-02-23 Thread Mohammed Rafi K C


On 02/23/2017 12:18 PM, max.degr...@kpn.com wrote:
>
> Hi,
>
>  
>
> We have a 4 node glusterfs setup that seems to be running without any
> problems. We can’t find any problems with replication or whatever.
>
>  
>
> We also have 4 machines running the glusterfs client. On all 4
> machines we see the following error in the logs at random moments:
>
>  
>
> [2017-02-23 00:04:33.168778] I
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  is successfully completed,  
> metadata self heal from source aab-client-0 to aab-client-1, 
> aab-client-2,  aab-client-3,  metadata - Pending matrix:  [ [ 0 0 0 0
> ] [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] ], on /
>
> [2017-02-23 00:09:34.431089] E
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  failed,   on /
>
> [2017-02-23 00:14:34.948975] I
> [afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
> 0-aab-replicate-0:  metadata self heal  is successfully completed,  
> metadata self heal from source aab-client-0 to aab-client-1, 
> aab-client-2,  aab-client-3,  metadata - Pending matrix:  [ [ 0 0 0 0
> ] [ 0 0 0 0 ] [ 0 0 0 0 ] [ 0 0 0 0 ] ], on /
>
>  
>
> The content within the glusterfs filesystems is rather static with
> only minor changes on it. This “self heal  failed” is printed randomly
> in the logs on the glusterfs client. It’s printed even at moment where
> nothing has changed within the glusterfs filesystem. When it is
> printed, its never on multiple servers at the same time. What we also
> don’t understand : the error indicates self heal failed on root “/”.
> In the root of this glusterfs mount there only 2 folders and no files
> are ever written at the root level.
>
>  
>
> Any thoughts?
>

From the logs, it looks like an older version of gluster, probably 3.5.
Please confirm your glusterfs version. The version is pretty old and it
may have reached End of Life. Also, this is AFR v1, whereas the latest
stable version runs with AFR v2.

So I would suggest you upgrade to a later version, maybe 3.8.

If you still want to go with this version, I can give it a try. Let me
know the version, volume info and volume status. Still, I will suggest
upgrading ;)


Regards
Rafi KC



>  
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster Charm now supports ZFS bricks

2017-02-23 Thread Mohammed Rafi K C
Great effort. Kudos to the team.


Regards

Rafi KC


On 02/23/2017 07:12 PM, chris holcombe wrote:
> Hey Gluster Community!
>
> I wanted to announce that I have built support for ZFS bricks into the
> Gluster charm: https://github.com/cholcombe973/gluster-charm.  If anyone
> wants to give it a spin and provide feedback I would be overjoyed :).
>
> Thanks,
> Chris
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] detecting replication issues

2017-02-23 Thread Mohammed Rafi K C


On 02/23/2017 11:12 PM, Joseph Lorenzini wrote:
> Hi all,
>
> I have a simple replicated volume with a replica count of 3. To ensure
> any file changes (create/delete/modify) are replicated to all bricks,
> I have this setting in my client configuration.
>
>  volume gv0-replicate-0
> type cluster/replicate
> subvolumes gv0-client-0 gv0-client-1 gv0-client-2
> end-volume
>
> And that works as expected. My question is how one could detect if
> this was not happening which could poise a severe problem with data
> consistency and replication. For example, those settings could be
> omitted from the client config and then the client will only write
> data to one brick and all kinds of terrible things will start
> happening. I have not found a way the gluster volume cli to detect
> when that kind of problem is occurring. For example gluster volume
> heal  info does not detect this problem. 
>
> Is there any programmatic way to detect when this problem is occurring?
>

I couldn't understand how you would end up in this situation. There is
only one possibility (assuming there is no bug :) ), i.e. you changed the
client graph in a way that leaves only one subvolume under the replicate
translator.

The simple way to check that is through an xlator called meta, which
provides metadata information through the mount point, similar to the
Linux proc file system. So you can check the active graph through meta and
see the number of subvolumes of the replicate xlator.

For example, the directory
<mount-point>/.meta/graphs/active/<volname>-replicate-0/subvolumes will have
an entry for each replica client, so in your case you should see 3
directories.
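
A quick sketch, assuming the volume is named gv0 and mounted at /mnt/gv0
(adjust to your own names):

    ls /mnt/gv0/.meta/graphs/active/gv0-replicate-0/subvolumes
    # a correctly configured 3-way replicated volume should list three
    # client subvolumes here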


Let me know if this helps.

Regards
Rafi KC


> Thanks,
> Joe
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS throughput inconsistent

2017-02-22 Thread Mohammed Rafi K C
The only log available for a client process is located under
/var/log/glusterfs/ (the client log file is named after the mount path).
You can also see if there is anything in the brick logs.


Regards

Rafi KC


On 02/23/2017 02:06 AM, Deepak Naidu wrote:
>
> Hello,
>
>  
>
> I have GlusterFS 3.8.8. I am using IB RDMA. I have noticed during
> Write or Read the throughput doesn’t seem consistent for same
> workload(fio command). Sometimes I get higher throughput sometimes it
> quickly goes into half, then stays there.
>
>  
>
> I cannot predict a consistent behavior  every time when I run the same
> workload. The time to complete varies. Is there any log file or
> something I can look into, to understand this behavior. I am single
> client(fuse) running 32 thread, 1mb block size, creating 200GB or
> reading 200GB files randomly with directIO.
>
>  
>
> --
>
> Deepak
>
> 
> This email message is for the sole use of the intended recipient(s)
> and may contain confidential information.  Any unauthorized review,
> use, disclosure or distribution is prohibited.  If you are not the
> intended recipient, please contact the sender by reply email and
> destroy all copies of the original message.
> 
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File operation failure on simple distributed volume

2017-02-16 Thread Mohammed Rafi K C
Hi Yonex

Recently Poornima fixed a corruption issue with upcall, which seems
unlikely to be the cause of this issue given that you are running fuse
clients. Even then, I would like to give you a debug build that includes
the fix [1] and adds additional logs.

Will you be able to run the debug build ?


[1] : https://review.gluster.org/#/c/16613/

Regards

Rafi KC


On 02/16/2017 09:13 PM, yonex wrote:
> Hi Rafi,
>
> I'm still on this issue. But reproduction has not yet been achieved
> outside of production. In production environment, I have made
> applications stop writing data to glusterfs volume. Only read
> operations are going.
>
> P.S. It seems that I have corrupted the email thread..;-(
> http://lists.gluster.org/pipermail/gluster-users/2017-January/029679.html
>
> 2017-02-14 17:19 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>> Hi Yonex,
>>
>> Are you still hitting this issue ?
>>
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> On 01/16/2017 10:36 AM, yonex wrote:
>>
>> Hi
>>
> I noticed that there is a high throughput degradation while attaching the
> gdb script to a glusterfs client process. Write speed becomes 2% or less. It
> cannot be kept running like that in production.
>>
>> Could you provide the custom build that you mentioned before? I am going to
>> keep trying to reproduce the problem outside of the production environment.
>>
>> Regards
>>
>> 2017年1月8日 21:54、Mohammed Rafi K C <rkavu...@redhat.com>:
>>
>> Is there any update on this ?
>>
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/24/2016 03:53 PM, yonex wrote:
>>
>> Rafi,
>>
>>
>> Thanks again. I will try that and get back to you.
>>
>>
>> Regards.
>>
>>
>>
>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>
>> Hi Yonex,
>>
>>
>> As we discussed in irc #gluster-devel , I have attached the gdb script
>>
>> along with this mail.
>>
>>
>> Procedure to run the gdb script.
>>
>>
>> 1) Install gdb,
>>
>>
>> 2) Download and install gluster debuginfo for your machine . packages
>>
>> location --- > https://cbs.centos.org/koji/buildinfo?buildID=12757
>>
>>
>> 3) find the process id and attach gdb to the process using the command
>>
>> gdb attach <pid> -x <gdb-script>
>>
>>
>> 4) Continue running the script till you hit the problem
>>
>>
>> 5) Stop the gdb
>>
>>
>> 6) You will see a file called mylog.txt in the location where you ran
>>
>> the gdb
>>
>>
>>
>> Please keep an eye on the attached process. If you have any doubt please
>>
>> feel free to revert me.
>>
>>
>> Regards
>>
>>
>> Rafi KC
>>
>>
>>
>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>
>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>
>> Client 0-glusterfs01-client-2 has disconnected from bricks around
>>
>> 2016-12-15 11:21:17.854249 . Can you look and/or paste the brick logs
>>
>> around the time.
>>
>> You can find the brick name and hostname for 0-glusterfs01-client-2 from
>>
>> client graph.
>>
>>
>> Rafi
>>
>>
>> Are you there in any of gluster irc channel, if so Have you got a
>>
>> nickname that I can search.
>>
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> On 12/19/2016 04:28 PM, yonex wrote:
>>
>> Rafi,
>>
>>
>> OK. Thanks for your guide. I found the debug log and pasted lines around
>> that.
>>
>> http://pastebin.com/vhHR6PQN
>>
>>
>> Regards
>>
>>
>>
>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>
>> On 12/16/2016 09:10 PM, yonex wrote:
>>
>> Rafi,
>>
>>
>> Thanks, the .meta feature I didn't know is very nice. I finally have
>>
>> captured debug logs from a client and bricks.
>>
>>
>> A mount log:
>>
>> - http://pastebin.com/Tjy7wGGj
>>
>>
>> FYI rickdom126 is my client's hostname.
>>
>>
>> Brick logs around that time:
>>
>> - Brick1: http://pastebin.com/qzbVRSF3
>>
>> - Brick2: http://pastebin.com/j3yMNhP3
>>
>> - Brick3: http://pastebin.com/m81mVj6L
>>
>> - Brick4: http://pastebin.com/JDAbChf6
>>
>> - Brick5: http://pastebin.com/7saP6rsm
>>
>>
>> However I could not find any

Re: [Gluster-users] RDMA transport problems in GLUSTER on host with MIC

2017-01-20 Thread Mohammed Rafi K C
One thing to note here is that the RDMA transport uses SRQ (shared receive
queues), which I see reported as disabled ("Using SRQ : OFF") in both outputs.
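
A quick way to see whether the HCA advertises SRQ support at all (a sketch;
requires the ibverbs utilities):

    ibv_devinfo -v | grep -i srq    # a non-zero max_srq means the device supports shared receive queues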


Regards
Rafi KC

On 01/20/2017 05:05 PM, Anoop C S wrote:
> On Fri, 2017-01-20 at 11:53 +0100, Fedele Stabile wrote:
>> Thank you for your help, 
>> I will answer to your questions:
>>
>> Il giorno ven, 20/01/2017 alle 12.58 +0530, Anoop C S ha scritto:
>>> On Wed, 2017-01-18 at 12:56 +0100, Fedele Stabile wrote:
 Hi,
 it happens that RDMA gluster transport does not works anymore
 after I have configured ibscif virtual connector for Infiniband in
 a
 server with a XeonPHI coprocessor.

 I have CentOS 6.6 and GLUSTER 3.8.5, OFED 3.12-1 MPSS 3.5.2 and I
 have
 followed the installation instructions of MPSS_Users_Guide
 (Revision
 3.5) that suggested to remove
 compat-rdma-devel and compat-rdma packages.

>>> It would help if you could somehow clearly understand the reason for
>>> removing those packages. May be
>>> they are critical and not intended to be removed. Please ask for help
>>> from OFED.
>>>
>> Files of ackages compat-rdma-devel and compat-rdma are substituted by
>> others from MPSS package that contains all the Software Stack for
>> server and MIC card including ofed drivers.
>>
 I have noticed that running the command:
 ib_send_bw
 gives the following error:

 # ib_send_bw

 
 * Waiting for client to connect... *
 
 -
 
 --
 Send BW Test
  Dual-port   : OFFDevice : scif0
  Number of qps   : 1Transport type : IW
  Connection type : RCUsing SRQ  : OFF
  RX depth: 512
  CQ Moderation   : 100
  Mtu : 2048[B]
  Link type   : Ethernet
  Gid index   : 0
  Max inline data : 0[B]
  rdma_cm QPs : OFF
  Data ex. method : Ethernet
 -
 
 --
  local address: LID 0x3e8 QPN 0x0003 PSN 0x123123
  GID: 76:121:186:102:03:119:00:00:00:00:00:00:00:00:00:00
 ethernet_read_keys: Couldn't read remote address
  Unable to read to socket/rdam_cm
 Failed to exchange data between server and clients

>>> The above error have nothing to do with GlusterFS. Can you please
>>> give more context on what failed
>>> for you while trying out GlusterFS with RDMA transport?
>> In glusterd.vol.log when I start glusterd I see:
>> [rdma.c:4837:gf_rdma_listen] 0-rdma.management: rdma option set failed
>> [Funzione non implementata]
> Sorry, my mistake. It is clear that glusterfs-rdma is installed. I 
> incorrectly interpreted the log
> entry.
>
> rdma_set_option() failed here with ENOSYS which means that some functionality 
> is not present with
> the current setup (I suspect RDMA_OPTION_ID_REUSEADDR). Need to look more on 
> what could be the
> reason. I will get back to you after some more debugging into the code.
>
>> But RDMA is correctly working on qib0 device as you can see below:
 Instead using the output of the command

 ib_send_bw -d qib0

 gives correct results:

 # ib_send_bw -d qib0

 
 * Waiting for client to connect... *
 
 -
 
 --
 Send BW Test
  Dual-port   : OFFDevice : qib0
  Number of qps   : 1Transport type : IB
  Connection type : RCUsing SRQ  : OFF
  RX depth: 512
  CQ Moderation   : 100
  Mtu : 2048[B]
  Link type   : IB
  Max inline data : 0[B]
  rdma_cm QPs : OFF
  Data ex. method : Ethernet
 -
 
 --
  local address: LID 0x0a QPN 0x0169 PSN 0xe0b768
  remote address: LID 0x20 QPN 0x28b280 PSN 0xc3008c
 -
 
 --
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
 65536      1000           0.00               2160.87               0.034574
 -
 
 --
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___

Re: [Gluster-users] WRITE => -1 (Input/output error)

2017-01-13 Thread Mohammed Rafi K C
The writes might have failed with EIO. Were there any network failures
between the client and the bricks around that time?


Rafi KC


On 01/12/2017 07:50 PM, Stephen Martin wrote:
> Hi looking for help with an error I’m getting.
>
> I’m new to Gluster but have gone though most of the getting started and 
> overview documentation. 
>
> I have been given a production server that uses Gluster and although its up 
> and running and fuctioning I see lots of errors in the logs
>
> [2017-01-12 13:19:37.326562] W [fuse-bridge.c:2167:fuse_writev_cbk] 
> 0-glusterfs-fuse: 5852034: WRITE => -1 (Input/output error) 
>
> I don't really know what action caused it, I get lots of these logged how can 
> I start a diagnostic?
>
> Thanks
>
> .
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] File operation failure on simple distributed volume

2017-01-08 Thread Mohammed Rafi K C
Is there any update on this ?


Regards

Rafi KC

On 12/24/2016 03:53 PM, yonex wrote:
> Rafi,
>
> Thanks again. I will try that and get back to you.
>
> Regards.
>
>
> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>> Hi Yonex,
>>
>> As we discussed in irc #gluster-devel , I have attached the gdb script
>> along with this mail.
>>
>> Procedure to run the gdb script.
>>
>> 1) Install gdb,
>>
>> 2) Download and install gluster debuginfo for your machine . packages
>> location --- > https://cbs.centos.org/koji/buildinfo?buildID=12757
>>
>> 3) find the process id and attach gdb to the process using the command
>> gdb attach <pid> -x <gdb-script>
>>
>> 4) Continue running the script till you hit the problem
>>
>> 5) Stop the gdb
>>
>> 6) You will see a file called mylog.txt in the location where you ran
>> the gdb
>>
>>
>> Please keep an eye on the attached process. If you have any doubt please
>> feel free to revert me.
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>> Client 0-glusterfs01-client-2 has disconnected from bricks around
>>>> 2016-12-15 11:21:17.854249 . Can you look and/or paste the brick logs
>>>> around the time.
>>> You can find the brick name and hostname for 0-glusterfs01-client-2 from
>>> client graph.
>>>
>>> Rafi
>>>
>>>> Are you there in any of gluster irc channel, if so Have you got a
>>>> nickname that I can search.
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>>> Rafi,
>>>>>
>>>>> OK. Thanks for your guide. I found the debug log and pasted lines around 
>>>>> that.
>>>>> http://pastebin.com/vhHR6PQN
>>>>>
>>>>> Regards
>>>>>
>>>>>
>>>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>>>> Rafi,
>>>>>>>
>>>>>>> Thanks, the .meta feature I didn't know is very nice. I finally have
>>>>>>> captured debug logs from a client and bricks.
>>>>>>>
>>>>>>> A mount log:
>>>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>>>
>>>>>>> FYI rickdom126 is my client's hostname.
>>>>>>>
>>>>>>> Brick logs around that time:
>>>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>>>
>>>>>>> However I could not find any message like "EOF on socket". I hope
>>>>>>> there is any helpful information in the logs above.
>>>>>> Indeed. I understand that the connections are in disconnected state. But
>>>>>> what particularly I'm looking for is the cause of the disconnect, Can
>>>>>> you paste the debug logs when it start disconnects, and around that. You
>>>>>> may see a debug logs that says "disconnecting now".
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Rafi KC
>>>>>>
>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>>>> Hi Rafi,
>>>>>>>>>
>>>>>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>>>>>> will try that. However, so I want to avoid redundant debug outputs if
>>>>>>>>> possible, is there a way to enable debug log only on specific client
>>>>>>>>> nodes?
>>>>>>>> if you are using fuse mount, there is proc kind of feature called .meta

Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-23 Thread Mohammed Rafi K C
Hi Yonex,

As we discussed in irc #gluster-devel , I have attached the gdb script
along with this mail.

Procedure to run the gdb script.

1) Install gdb,

2) Download and install gluster debuginfo for your machine . packages
location --- > https://cbs.centos.org/koji/buildinfo?buildID=12757

3) Find the process id and attach gdb to the process using the command
gdb attach <pid> -x <gdb-script>

4) Continue running the script till you hit the problem

5) Stop the gdb

6) You will see a file called mylog.txt in the location where you ran
the gdb
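
A rough sketch of the procedure, assuming the glusterfs client PID is 12345
and the script was saved as gdb-script.txt (both placeholders):

    yum install gdb                      # step 1
    # step 2: install the matching glusterfs debuginfo package from the link above
    gdb -p 12345 -x gdb-script.txt       # step 3: attach and load the script
    # step 4: type "continue" inside gdb and leave it running until the problem reproduces
    # step 5: quit gdb
    cat mylog.txt                        # step 6: written in the directory where gdb was started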


Please keep an eye on the attached process. If you have any doubt please
feel free to revert me.

Regards

Rafi KC


On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>
> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>> Client 0-glusterfs01-client-2 has disconnected from bricks around
>> 2016-12-15 11:21:17.854249 . Can you look and/or paste the brick logs
>> around the time.
> You can find the brick name and hostname for 0-glusterfs01-client-2 from
> client graph.
>
> Rafi
>
>> Are you there in any of gluster irc channel, if so Have you got a
>> nickname that I can search.
>>
>> Regards
>> Rafi KC
>>
>> On 12/19/2016 04:28 PM, yonex wrote:
>>> Rafi,
>>>
>>> OK. Thanks for your guide. I found the debug log and pasted lines around 
>>> that.
>>> http://pastebin.com/vhHR6PQN
>>>
>>> Regards
>>>
>>>
>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>> Rafi,
>>>>>
>>>>> Thanks, the .meta feature I didn't know is very nice. I finally have
>>>>> captured debug logs from a client and bricks.
>>>>>
>>>>> A mount log:
>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>
>>>>> FYI rickdom126 is my client's hostname.
>>>>>
>>>>> Brick logs around that time:
>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>
>>>>> However I could not find any message like "EOF on socket". I hope
>>>>> there is any helpful information in the logs above.
>>>> Indeed. I understand that the connections are in disconnected state. But
>>>> what particularly I'm looking for is the cause of the disconnect, Can
>>>> you paste the debug logs when it start disconnects, and around that. You
>>>> may see a debug logs that says "disconnecting now".
>>>>
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>> Hi Rafi,
>>>>>>>
>>>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>>>> will try that. However, so I want to avoid redundant debug outputs if
>>>>>>> possible, is there a way to enable debug log only on specific client
>>>>>>> nodes?
>>>>>> if you are using fuse mount, there is proc kind of feature called .meta
>>>>>> . You can set log level through that for a particular client [1] . But I
>>>>>> also want log from bricks because I suspect bricks process for
>>>>>> initiating the disconnects.
>>>>>>
>>>>>>
>>>>>> [1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Yonex
>>>>>>>
>>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>>>> Hi Yonex,
>>>>>>>>
>>>>>>>> Is this consistently reproducible ? if so, Can you enable debug log [1]
>>>>>>>> and check for any message similar to [2]. Basically you can even search
>>>>>>>> for "EOF on socket".
>>>>>>>>
>>>>>>>> You can set your log level back to default (INFO) af

Re: [Gluster-users] DHT DHTLINKFILE location

2016-12-22 Thread Mohammed Rafi K C
If you are sure that the link file has been created, then it will be in
the hashed subvolume only. Just do a find for the file on the brick
backend and see.
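
A rough way to locate it on the backend (brick path and file name are
placeholders):

    # run on each server; a DHT link file shows up with ---------T permissions
    find /bricks/brick1 -name myfile -ls
    # its linkto xattr names the subvolume that holds the real file
    getfattr -n trusted.glusterfs.dht.linkto /bricks/brick1/path/to/myfile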


Regards

Rafi KC

On 12/23/2016 08:26 AM, 李立 wrote:
> In glusterfs 3.8, glusterfs creates a DHTLINKFILE  file in hash volume
> when the volume have no  size or inode over the limit.But I cann't
> find the DHTLINKFILE  to indicate real volume .
> Thanks!
>
> --
> *
> *
> *
> *
> *
> --*
> LiLi
>  
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Heal command stopped

2016-12-22 Thread Mohammed Rafi K C
Hi Miloš Čučulović

Can you please give us the gluster volume info output and the log files for
the bricks, glusterd and the self-heal daemon?
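
For example (a sketch; "storage" is the volume name from your mail, and the
log locations are the usual defaults on each server):

    gluster volume info storage
    gluster volume status storage
    # glusterd log:         /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (or glusterd.log)
    # self-heal daemon log: /var/log/glusterfs/glustershd.log
    # brick logs:           /var/log/glusterfs/bricks/*.log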


Regards

Rafi KC


On 12/22/2016 03:56 PM, Miloš Čučulović - MDPI wrote:
> I recently added a new replica server and have now:
> Number of Bricks: 1 x 2 = 2
>
> The heal was launched automatically and was working until yesterday
> (copied 5.5TB of files from total of 6.2TB). Now, the copy seems
> stopped, I do not see any file change on the new replica brick server.
>
> When trying to add a new file to the volume and checking the physical
> files on the replica brick, the file is not there.
>
> When I try to run a full heal with the command:
> sudo gluster volume heal storage full
>
> I am getting:
>
> Launching heal operation to perform full self heal on volume storage
> has been unsuccessful on bricks that are down. Please check if all
> brick processes are running.
>
> My storage info shows both bricks there.
>
> Any idea?
>
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfs client report failed to get endpoints glusterfs

2016-12-20 Thread Mohammed Rafi K C
+Humble and Ashiq

Can you please provide gluster logs and gluster volume info?

Thanks
Rafi KC
On 12/20/2016 07:50 PM, likun wrote:
>
> Hi, everyone.
>
>  
>
> I use kubernetes to run gluster3.8.5 offical image. Kubernetes version
> is 1.4.6, os is coreos 1185.5.0.
>
>  
>
> Now, all my glusterfs client have following error logs every 3 minutes:
>
>  
>
> Dec 20 18:45:07 ad08.pek.prod.com kubelet-wrapper[2393]: E1220
> 10:45:07.8285302393 glusterfs.go:109] glusterfs: failed to get
> endpoints glusterfs[an empty namespace may not be set when a resource
> name is provided]
>
> Dec 20 18:45:07 ad08.pek.prod.com kubelet-wrapper[2393]: E1220
> 10:45:07.8285812393 reconciler.go:432] Could not construct volume
> information: MountVolume.NewMounter failed for volume
> "kubernetes.io/glusterfs/87a9582e-c38a-11e6-bd8d-1866dae7138c-glusterfs"
> (spec.Name: "glusterfs") pod "87a9582e-c38a-11e6-bd8d-1866dae7138c"
> (UID: "87a9582e-c38a-11e6-bd8d-1866dae7138c") with: an empty namespace
> may not be set when a resource name is provided
>
> Dec 20 18:45:07 ad08.pek.prod.com kubelet-wrapper[2393]: E1220
> 10:45:07.8286732393 glusterfs.go:109] glusterfs: failed to get
> endpoints glusterfs[an empty namespace may not be set when a resource
> name is provided]
>
> Dec 20 18:45:07 ad08.pek.prod.com kubelet-wrapper[2393]: E1220
> 10:45:07.8286862393 reconciler.go:432] Could not construct volume
> information: MountVolume.NewMounter failed for volume
> "kubernetes.io/glusterfs/9efb638d-c203-11e6-8607-1866dae7138c-glusterfs"
> (spec.Name: "glusterfs") pod "9efb638d-c203-11e6-8607-1866dae7138c"
> (UID: "9efb638d-c203-11e6-8607-1866dae7138c") with: an empty namespace
> may not be set when a resource name is provided
>
>  
>
> It seems I can visit the glusterfs normally. Anyone can give some hints?
>
>  
>
>  
>
> Likun
>
>  
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-12-19 Thread Mohammed Rafi K C
Hi Micha,

Can you please also check whether there are any error messages in dmesg?
Basically I'm trying to see whether you're hitting the issue described in
https://bugzilla.kernel.org/show_bug.cgi?id=73831 .
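
For example, a quick check for the XFS allocation-deadlock message that bug
describes (a sketch):

    dmesg | grep -i "possible memory allocation deadlock"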


Regards

Rafi KC


On 12/19/2016 11:58 AM, Mohammed Rafi K C wrote:
>
> Hi Micha,
>
> Sorry for the late reply. I was busy with some other things.
>
> If you have still the setup available Can you enable TRACE log level
> [1],[2] and see if you could find any log entries when the network
> start disconnecting. Basically I'm trying to find out any
> disconnection had occurred other than ping timer expire issue.
>
>
>
> [1] : gluster volume  diagnostics.brick-log-level TRACE
>
> [2] : gluster volume  diagnostics.client-log-level TRACE
>
>
> Regards
>
> Rafi KC
>
>
> On 12/08/2016 07:59 PM, Atin Mukherjee wrote:
>>
>>
>> On Thu, Dec 8, 2016 at 4:37 PM, Micha Ober <mich...@gmail.com
>> <mailto:mich...@gmail.com>> wrote:
>>
>> Hi Rafi,
>>
>> thank you for your support. It is greatly appreciated.
>>
>> Just some more thoughts from my side:
>>
>> There have been no reports from other  users in *this* thread
>> until now, but I have found at least one user with a very simiar
>> problem in an older thread:
>>
>> https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html
>> 
>> <https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html>
>>
>> He is also reporting disconnects  with no apparent reasons,
>> althogh his setup is a bit more complicated, also involving a
>> firewall. In our setup, all servers/clients are connected via 1
>> GbE with no firewall or anything that might block/throttle
>> traffic. Also, we are using exactly the same software versions on
>> all nodes.
>>
>>
>> I can also find some reports in the bugtracker when searching for
>> "rpc_client_ping_timer_expired" and "rpc_clnt_ping_timer_expired"
>> (looks like spelling changed during versions).
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
>> <https://bugzilla.redhat.com/show_bug.cgi?id=1096729>
>>
>>
>> Just FYI, this is a different issue, here GlusterD fails to handle
>> the volume of incoming requests on time since MT-epoll is not enabled
>> here.
>>  
>>
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
>> <https://bugzilla.redhat.com/show_bug.cgi?id=1370683>
>>
>> But both reports involve large traffic/load on the bricks/disks,
>> which is not the case for out setup.
>> To give a ballpark figure: Over three days, 30 GiB were written.
>> And the data was not written at once, but continuously over the
>> whole time.
>>
>>
>>     Just to be sure, I have checked the logfiles of one of the other
>> clusters right now, which are sitting in the same building, in
>> the same rack, even on the same switch, running the same jobs,
>> but with glusterfs 3.4.2 and I can see no disconnects in the
>> logfiles. So I can definitely rule out our infrastructure as problem.
>>
>> Regards,
>> Micha
>>
>>
>>
>> Am 07.12.2016 um 18:08 schrieb Mohammed Rafi K C:
>>>
>>> Hi Micha,
>>>
>>> This is great. I will provide you one debug build which has two
>>> fixes which I possible suspect for a frequent disconnect issue,
>>> though I don't have much data to validate my theory. So I will
>>> take one more day to dig in to that.
>>>
>>> Thanks for your support, and opensource++ 
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 12/07/2016 05:02 AM, Micha Ober wrote:
>>>> Hi,
>>>>
>>>> thank you for your answer and even more for the question!
>>>> Until now, I was using FUSE. Today I changed all mounts to NFS
>>>> using the same 3.7.17 version.
>>>>
>>>> But: The problem is still the same. Now, the NFS logfile
>>>> contains lines like these:
>>>>
>>>> [2016-12-06 15:12:29.006325] C
>>>> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
>>>> 0-gv0-client-7: server X.X.18.62:49153 has not responded in the
>>>> last 42 seconds, disconnecting.
>>>>
>>>> Interestingly enough,  the IP address X.X.18.62 is the same
>>>

Re: [Gluster-users] Broken Pipe - Intermittent

2016-12-19 Thread Mohammed Rafi K C
It might be because of fragmentation of the extent map in the XFS
filesystem; see https://bugzilla.kernel.org/show_bug.cgi?id=73831 .
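
One way to gauge this is to count the extents of the affected image files on
the brick (a sketch; the path is a placeholder):

    xfs_bmap /bricks/brick4/path/to/image.qcow2 | wc -l
    # a very large number of extents points to heavy fragmentation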


Regards

Rafi KC


On 12/19/2016 01:22 AM, Gustave Dahl wrote:
>
> I have been running gluster as a storage backend to OpenNebula for
> about a year and it has been running great. I have had an intermittent
> problem that has gotten worse over the last couple of days and I could
> use some help.
>
>  
>
> Setup
>
> =
>
> Gluster: 3.7.11
>
> Hyper Converged Setup - Gluster with KVM’s on the same machines with
> Gluster in a Slice on each server.
>
>  
>
> Four Servers - Each with 4 Bricks
>
>  
>
> Type: Distributed-Replicate
>
> Number of Bricks: 4 x 3 = 12
>
>  
>
> Bricks are 1TB SSD's
>
>  
>
> Gluster Status:  http://pastebin.com/Nux7VB4b
>
> Gluster Info:  http://pastebin.com/G5qR0kZq
>
>  
>
> Gluster is supporting qcow2 images that the KVM’s are using.  Image
> Sizes:  10GB up to 300GB images.
>
>  
>
> The volume is mounted on each node with glusterfs as a shared file
> system.  The KVM's using the images are using libgfapi ( i.e.
> file=gluster://shchst01:24007/shchst01/d8fcfdb97bc462aca502d5fe703afc66 )
>
>  
>
> Issue
>
> ==
>
> This setup has been running well, with the exception of this
> intermittent problem.  This only happens on one node.  It has happened
> on other bricks (all on the same node) but more freqently on Node 2:
> Brick 4
>
>  
>
> It starts here:  http://pastebin.com/YgeJ5VA9
>
>  
>
> Dec 18 02:08:54 shchhv02 kernel: XFS: possible memory allocation
> deadlock in kmem_alloc (mode:0x250)
>
>  
>
> This continues until:
>
>  
>
> Dec 18 02:11:10 shchhv02 storage-shchst01[14728]: [2016-12-18
> 08:11:10.428138] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
> 4-shchst01-client-11: server xxx.xx.xx.11:49155 has not responded in
> the last 42 seconds, disconnecting.
>
>  
>
> storage log:  http://pastebin.com/vxCdRnEg
>
>  
>
> [2016-12-18 08:11:10.435927] E [MSGID: 114031]
> [client-rpc-fops.c:2886:client3_3_opendir_cbk] 4-shchst01-client-11:
> remote operation failed. Path: /
> (----0001) [Transport endpoint is not
> connected]
>
> [2016-12-18 08:11:10.436240] E [rpc-clnt.c:362:saved_frames_unwind]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f06efbaeae2]
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f06ef97990e]
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f06ef979a1e]
> (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f06ef97b40a]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f06ef97bc38] )
> 4-shchst01-client-11: forced unwinding frame type(GF-DUMP) op(NULL(2))
> called at 2016-12-18 08:10:28.424311 (xid=0x36883d)
>
> [2016-12-18 08:11:10.436255] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk]
> 4-shchst01-client-11: socket disconnected
>
> [2016-12-18 08:11:10.436369] E [rpc-clnt.c:362:saved_frames_unwind]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f06efbaeae2]
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f06ef97990e]
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f06ef979a1e]
> (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f06ef97b40a]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f06ef97bc38] )
> 4-shchst01-client-11: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2016-12-18 08:10:38.370507 (xid=0x36883e)
>
> [2016-12-18 08:11:10.436388] W [MSGID: 114031]
> [client-rpc-fops.c:2974:client3_3_lookup_cbk] 4-shchst01-client-11:
> remote operation failed. Path: /
> (----0001) [Transport endpoint is not
> connected]
>
> [2016-12-18 08:11:10.436488] I [MSGID: 114018]
> [client.c:2030:client_rpc_notify] 4-shchst01-client-11: disconnected
> from shchst01-client-11. Client process will keep trying to connect to
> glusterd until brick's port is available
>
> The message "W [MSGID: 114031]
> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 4-shchst01-client-11:
> remote operation failed [Transport endpoint is not connected]"
> repeated 3 times between [2016-12-18 08:11:10.432640] and [2016-12-18
> 08:11:10.433530]
>
> The message "W [MSGID: 114031]
> [client-rpc-fops.c:2669:client3_3_readdirp_cbk] 4-shchst01-client-11:
> remote operation failed [Transport endpoint is not connected]"
> repeated 15 times between [2016-12-18 08:11:10.428844] and [2016-12-18
> 08:11:10.435727]
>
> The message "W [MSGID: 114061]
> [client-rpc-fops.c:4560:client3_3_fstat] 4-shchst01-client-11: 
> (----0001) remote_fd is -1. EBADFD [File
> descriptor in bad state]" repeated 11 times between [2016-12-18
> 08:11:10.433598] and [2016-12-18 08:11:10.435742]
>
>  
>
> brick 4 log:  http://pastebin.com/kQcNyGk2
>
>  
>
> [2016-12-18 08:08:33.000483] I [dict.c:473:dict_get]
> (-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac)
> [0x7f8504feccbc]
> -->/usr/lib64/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)

Re: [Gluster-users] files disappearing and re-appearing

2016-12-19 Thread Mohammed Rafi K C
Hi Riccardo,

I'm sorry that your issue didn't get discussed here on the mailing
list in time. Sometimes community members are busy with other
issues.

I only got to know about this problem from a different thread that
I'm part of, where you had mentioned this case.

If the problem is still bugging you, or if you have any previous
logs that you can share with me, that will help with further analysis.


Some inline questions:


On 11/17/2016 07:22 PM, Riccardo Murri wrote:
> Hello,
>
> we are trying out GlusterFS as the working filesystem for a compute cluster; 
> the cluster is comprised of 57 compute nodes (55 cores each), acting as 
> GlusterFS clients, and 25 data server nodes (8 cores each), serving 
> 1 large GlusterFS brick each.
>
> We currently have noticed a couple of issues:
>
> 1) When compute jobs run, the `glusterfs` client process on the compute nodes
> goes up to 100% CPU, and filesystem operations start to slow down a lot.  
> Since there are many CPUs available, is it possible to make it use, e.g., 
> 4 CPUs instead of one to make it more responsive?

Can you briefly describe your compute jobs and workloads, so that we can see
what operations are happening on the cluster?


>
> 2) In addition (but possibly related to 1) we have an issue with files 
> disappearing and re-appearing: from a compute process we test for the 
> existence
> of a file and e.g. `test -e /glusterfs/file.txt` fails.  Then we test from
> a different process or shell and the file is there.  As far as I can see,
> the servers are basically idle, and none of the peers is disconnected.
>
> We are running GlusterFS 3.7.17 on Ubuntu 16.04, installed from the Launchpad 
> PPA.
> (Details below for the interested.)
>
> Can you give any hint about what's going on?
Is there any rebalance happening? Tell me more about any ongoing
operations (internal operations like rebalance, self-heal, etc., or client
operations).

Some insight into your volume configuration will also help: volume
info and volume status output.
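
For example (a sketch; substitute your volume name):

    gluster volume info <volname>
    gluster volume status <volname>
    gluster volume rebalance <volname> status
    gluster volume heal <volname> info      # pending self-heal entries (replicated volumes only)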


Regards
Rafi KC


>
> Thanks,
> Riccardo
>
>
> Installation details:
>
> ubuntu@master001:~$ pdsh -a 'glusterfs --version | fgrep built' | dshbak -c
> 
> data[001-025],master001,worker[001-027,029-045,047,049-050,052-058,060,062-063]
> 
> glusterfs 3.7.17 built on Nov  4 2016 13:39:51
> ubuntu@master001:~$ dpkg -S $(which glusterfs)
> glusterfs-client: /usr/sbin/glusterfs
> ubuntu@master001:~$ apt-cache policy glusterfs-client 
> glusterfs-client:
>   Installed: 3.7.17-ubuntu1~xenial5
>   Candidate: 3.7.17-ubuntu1~xenial5
>   Version table:
>  *** 3.7.17-ubuntu1~xenial5 500
> 500 http://ppa.launchpad.net/gluster/glusterfs-3.7/ubuntu xenial/main 
> amd64 Packages
> 100 /var/lib/dpkg/status
>  3.7.6-1ubuntu1 500
> 500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/universe 
> amd64 Packages
> 500 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-19 Thread Mohammed Rafi K C


On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
> Client 0-glusterfs01-client-2 has disconnected from bricks around
> 2016-12-15 11:21:17.854249 . Can you look and/or paste the brick logs
> around the time.
You can find the brick name and hostname for 0-glusterfs01-client-2 in the
client graph.

Rafi

>
> Are you there in any of gluster irc channel, if so Have you got a
> nickname that I can search.
>
> Regards
> Rafi KC
>
> On 12/19/2016 04:28 PM, yonex wrote:
>> Rafi,
>>
>> OK. Thanks for your guide. I found the debug log and pasted lines around 
>> that.
>> http://pastebin.com/vhHR6PQN
>>
>> Regards
>>
>>
>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>> Rafi,
>>>>
>>>> Thanks, the .meta feature I didn't know is very nice. I finally have
>>>> captured debug logs from a client and bricks.
>>>>
>>>> A mount log:
>>>> - http://pastebin.com/Tjy7wGGj
>>>>
>>>> FYI rickdom126 is my client's hostname.
>>>>
>>>> Brick logs around that time:
>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>
>>>> However I could not find any message like "EOF on socket". I hope
>>>> there is any helpful information in the logs above.
>>> Indeed. I understand that the connections are in disconnected state. But
>>> what particularly I'm looking for is the cause of the disconnect, Can
>>> you paste the debug logs when it start disconnects, and around that. You
>>> may see a debug logs that says "disconnecting now".
>>>
>>>
>>> Regards
>>> Rafi KC
>>>
>>>
>>>> Regards.
>>>>
>>>>
>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>> Hi Rafi,
>>>>>>
>>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>>> will try that. However, so I want to avoid redundant debug outputs if
>>>>>> possible, is there a way to enable debug log only on specific client
>>>>>> nodes?
>>>>> if you are using fuse mount, there is proc kind of feature called .meta
>>>>> . You can set log level through that for a particular client [1] . But I
>>>>> also want log from bricks because I suspect bricks process for
>>>>> initiating the disconnects.
>>>>>
>>>>>
>>>>> [1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Yonex
>>>>>>
>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>>> Hi Yonex,
>>>>>>>
>>>>>>> Is this consistently reproducible ? if so, Can you enable debug log [1]
>>>>>>> and check for any message similar to [2]. Basically you can even search
>>>>>>> for "EOF on socket".
>>>>>>>
>>>>>>> You can set your log level back to default (INFO) after capturing for
>>>>>>> some time.
>>>>>>>
>>>>>>>
>>>>>>> [1] : gluster volume set  diagnostics.brick-log-level DEBUG and
>>>>>>> gluster volume set  diagnostics.client-log-level DEBUG
>>>>>>>
>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Rafi KC
>>>>>>>
>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> When my application moves a file from it's local disk to FUSE-mounted
>>>>>>>> GlusterFS volume, the client outputs many warnings and errors not
>>>>>>>> always but occasionally. The volume is a simple distributed volume.
>>>>>>>>

Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-19 Thread Mohammed Rafi K C
Client 0-glusterfs01-client-2 disconnected from its brick around
2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs
from around that time?

Are you on any of the Gluster IRC channels? If so, do you have a
nickname that I can search for?

Regards
Rafi KC

On 12/19/2016 04:28 PM, yonex wrote:
> Rafi,
>
> OK. Thanks for your guide. I found the debug log and pasted lines around that.
> http://pastebin.com/vhHR6PQN
>
> Regards
>
>
> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>
>> On 12/16/2016 09:10 PM, yonex wrote:
>>> Rafi,
>>>
>>> Thanks, the .meta feature I didn't know is very nice. I finally have
>>> captured debug logs from a client and bricks.
>>>
>>> A mount log:
>>> - http://pastebin.com/Tjy7wGGj
>>>
>>> FYI rickdom126 is my client's hostname.
>>>
>>> Brick logs around that time:
>>> - Brick1: http://pastebin.com/qzbVRSF3
>>> - Brick2: http://pastebin.com/j3yMNhP3
>>> - Brick3: http://pastebin.com/m81mVj6L
>>> - Brick4: http://pastebin.com/JDAbChf6
>>> - Brick5: http://pastebin.com/7saP6rsm
>>>
>>> However I could not find any message like "EOF on socket". I hope
>>> there is any helpful information in the logs above.
>> Indeed. I understand that the connections are in disconnected state. But
>> what particularly I'm looking for is the cause of the disconnect, Can
>> you paste the debug logs when it start disconnects, and around that. You
>> may see a debug logs that says "disconnecting now".
>>
>>
>> Regards
>> Rafi KC
>>
>>
>>> Regards.
>>>
>>>
>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>> Hi Rafi,
>>>>>
>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>> will try that. However, so I want to avoid redundant debug outputs if
>>>>> possible, is there a way to enable debug log only on specific client
>>>>> nodes?
>>>> if you are using fuse mount, there is proc kind of feature called .meta
>>>> . You can set log level through that for a particular client [1] . But I
>>>> also want log from bricks because I suspect bricks process for
>>>> initiating the disconnects.
>>>>
>>>>
>>>> [1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>
>>>>> Regards
>>>>>
>>>>> Yonex
>>>>>
>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>>>> Hi Yonex,
>>>>>>
>>>>>> Is this consistently reproducible ? if so, Can you enable debug log [1]
>>>>>> and check for any message similar to [2]. Basically you can even search
>>>>>> for "EOF on socket".
>>>>>>
>>>>>> You can set your log level back to default (INFO) after capturing for
>>>>>> some time.
>>>>>>
>>>>>>
>>>>>> [1] : gluster volume set  diagnostics.brick-log-level DEBUG and
>>>>>> gluster volume set  diagnostics.client-log-level DEBUG
>>>>>>
>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Rafi KC
>>>>>>
>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> When my application moves a file from it's local disk to FUSE-mounted
>>>>>>> GlusterFS volume, the client outputs many warnings and errors not
>>>>>>> always but occasionally. The volume is a simple distributed volume.
>>>>>>>
>>>>>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>>>>>
>>>>>>> It seems to come from something like a network disconnection
>>>>>>> ("Transport endpoint is not connected") at a glance, but other
>>>>>>> networking applications on the same machine don't observe such a
>>>>>>> thing. So I guess there may be a problem somewhere in GlusterFS stack.
>>>>>>>

Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-12-18 Thread Mohammed Rafi K C
Hi Micha,

Sorry for the late reply. I was busy with some other things.

If you still have the setup available, can you enable the TRACE log level
[1],[2] and see whether you can find any log entries when the network starts
disconnecting? Basically I'm trying to find out whether any disconnect
occurred for a reason other than the ping timer expiring.



[1] : gluster volume set <volname> diagnostics.brick-log-level TRACE

[2] : gluster volume set <volname> diagnostics.client-log-level TRACE
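
Once you have captured enough, the levels can be set back to the default, for
example (a sketch):

    gluster volume set <volname> diagnostics.brick-log-level INFO
    gluster volume set <volname> diagnostics.client-log-level INFO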


Regards

Rafi KC


On 12/08/2016 07:59 PM, Atin Mukherjee wrote:
>
>
> On Thu, Dec 8, 2016 at 4:37 PM, Micha Ober <mich...@gmail.com
> <mailto:mich...@gmail.com>> wrote:
>
> Hi Rafi,
>
> thank you for your support. It is greatly appreciated.
>
> Just some more thoughts from my side:
>
> There have been no reports from other  users in *this* thread
> until now, but I have found at least one user with a very simiar
> problem in an older thread:
>
> https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html
> 
> <https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html>
>
> He is also reporting disconnects  with no apparent reasons,
> althogh his setup is a bit more complicated, also involving a
> firewall. In our setup, all servers/clients are connected via 1
> GbE with no firewall or anything that might block/throttle
> traffic. Also, we are using exactly the same software versions on
> all nodes.
>
>
> I can also find some reports in the bugtracker when searching for
> "rpc_client_ping_timer_expired" and "rpc_clnt_ping_timer_expired"
> (looks like spelling changed during versions).
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> <https://bugzilla.redhat.com/show_bug.cgi?id=1096729>
>
>
> Just FYI, this is a different issue, here GlusterD fails to handle the
> volume of incoming requests on time since MT-epoll is not enabled here.
>  
>
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
> <https://bugzilla.redhat.com/show_bug.cgi?id=1370683>
>
> But both reports involve large traffic/load on the bricks/disks,
> which is not the case for out setup.
> To give a ballpark figure: Over three days, 30 GiB were written.
> And the data was not written at once, but continuously over the
> whole time.
>
>
> Just to be sure, I have checked the logfiles of one of the other
> clusters right now, which are sitting in the same building, in the
> same rack, even on the same switch, running the same jobs, but
> with glusterfs 3.4.2 and I can see no disconnects in the logfiles.
> So I can definitely rule out our infrastructure as problem.
>
> Regards,
> Micha
>
>
>
> Am 07.12.2016 um 18:08 schrieb Mohammed Rafi K C:
>>
>> Hi Micha,
>>
>> This is great. I will provide you one debug build which has two
>> fixes which I possible suspect for a frequent disconnect issue,
>> though I don't have much data to validate my theory. So I will
>> take one more day to dig in to that.
>>
>> Thanks for your support, and opensource++ 
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/07/2016 05:02 AM, Micha Ober wrote:
>>> Hi,
>>>
>>> thank you for your answer and even more for the question!
>>> Until now, I was using FUSE. Today I changed all mounts to NFS
>>> using the same 3.7.17 version.
>>>
>>> But: The problem is still the same. Now, the NFS logfile
>>> contains lines like these:
>>>
>>> [2016-12-06 15:12:29.006325] C
>>> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
>>> 0-gv0-client-7: server X.X.18.62:49153 has not responded in the
>>> last 42 seconds, disconnecting.
>>>
>>> Interestingly enough,  the IP address X.X.18.62 is the same
>>> machine! As I wrote earlier, each node serves both as a server
>>> and a client, as each node contributes bricks to the volume.
>>> Every server is connecting to itself via its hostname. For
>>> example, the fstab on the node "giant2" looks like:
>>>
>>> #giant2:/gv0/shared_dataglusterfs   defaults,noauto
>>> 0   0
>>> #giant2:/gv2/shared_slurm   glusterfs   defaults,noauto
>>> 0   0
>>>
>>> giant2:/gv0 /shared_datanfs
>>> defaults,_netdev,vers=3 0   0
>>> giant2:/gv2 /shared_slurm   nfs
>>>

Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-18 Thread Mohammed Rafi K C


On 12/16/2016 09:10 PM, yonex wrote:
> Rafi,
>
> Thanks, the .meta feature I didn't know is very nice. I finally have
> captured debug logs from a client and bricks.
>
> A mount log:
> - http://pastebin.com/Tjy7wGGj
>
> FYI rickdom126 is my client's hostname.
>
> Brick logs around that time:
> - Brick1: http://pastebin.com/qzbVRSF3
> - Brick2: http://pastebin.com/j3yMNhP3
> - Brick3: http://pastebin.com/m81mVj6L
> - Brick4: http://pastebin.com/JDAbChf6
> - Brick5: http://pastebin.com/7saP6rsm
>
> However I could not find any message like "EOF on socket". I hope
> there is any helpful information in the logs above.

Indeed. I understand that the connections are in a disconnected state. But
what I'm particularly looking for is the cause of the disconnect. Can
you paste the debug logs from when it starts disconnecting, and around that
time? You may see a debug log that says "disconnecting now".
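
For example, something along these lines on the client (the log file name is
a placeholder; it is derived from the mount path):

    grep -n "disconnecting now" /var/log/glusterfs/glusterfs01.log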


Regards
Rafi KC


>
> Regards.
>
>
> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>
>> On 12/13/2016 09:56 PM, yonex wrote:
>>> Hi Rafi,
>>>
>>> Thanks for your response. OK, I think it is possible to capture debug
>>> logs, since the error seems to be reproduced a few times per day. I
>>> will try that. However, so I want to avoid redundant debug outputs if
>>> possible, is there a way to enable debug log only on specific client
>>> nodes?
>> if you are using fuse mount, there is proc kind of feature called .meta
>> . You can set log level through that for a particular client [1] . But I
>> also want log from bricks because I suspect bricks process for
>> initiating the disconnects.
>>
>>
>> [1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>
>>> Regards
>>>
>>> Yonex
>>>
>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>>>> Hi Yonex,
>>>>
>>>> Is this consistently reproducible ? if so, Can you enable debug log [1]
>>>> and check for any message similar to [2]. Basically you can even search
>>>> for "EOF on socket".
>>>>
>>>> You can set your log level back to default (INFO) after capturing for
>>>> some time.
>>>>
>>>>
>>>> [1] : gluster volume set  diagnostics.brick-log-level DEBUG and
>>>> gluster volume set  diagnostics.client-log-level DEBUG
>>>>
>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>
>>>>
>>>> Regards
>>>>
>>>> Rafi KC
>>>>
>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>> Hi,
>>>>>
>>>>> When my application moves a file from it's local disk to FUSE-mounted
>>>>> GlusterFS volume, the client outputs many warnings and errors not
>>>>> always but occasionally. The volume is a simple distributed volume.
>>>>>
>>>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>>>
>>>>> It seems to come from something like a network disconnection
>>>>> ("Transport endpoint is not connected") at a glance, but other
>>>>> networking applications on the same machine don't observe such a
>>>>> thing. So I guess there may be a problem somewhere in GlusterFS stack.
>>>>>
>>>>> It ended in failing to rename a file, logging PHP Warning like below:
>>>>>
>>>>> PHP Warning:  rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>>> to open stream: Input/output error in [snipped].php on line 278
>>>>> PHP Warning:
>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>> Input/output error in [snipped].php on line 278
>>>>>
>>>>> Conditions:
>>>>>
>>>>> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>> - Server machines' OS: CentOS 6.
>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>> - The number of connected FUSE clients is 260.
>>>>> - No firewall between connected machines.
>>>>> - Neither remounting volumes nor rebooting client machines take effect.
>>>>> - It is caused by not only rename() but also copy() and filesize() 
>>>>> operation.
>>>>> - No outputs in brick logs when it happens.
>>>>>
>>>>> Any ideas? I'd appreciate any help.
>>>>>
>>>>> Regards.
>>>>> ___
>>>>> Gluster-users mailing list
>>>>> Gluster-users@gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-13 Thread Mohammed Rafi K C


On 12/13/2016 09:56 PM, yonex wrote:
> Hi Rafi,
>
> Thanks for your response. OK, I think it is possible to capture debug
> logs, since the error seems to be reproduced a few times per day. I
> will try that. However, so I want to avoid redundant debug outputs if
> possible, is there a way to enable debug log only on specific client
> nodes?

If you are using a FUSE mount, there is a proc-like feature called .meta.
You can set the log level through it for a particular client [1]. But I
also want logs from the bricks, because I suspect the brick processes are
initiating the disconnects.


[1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel

>
> Regards
>
> Yonex
>
> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavu...@redhat.com>:
>> Hi Yonex,
>>
>> Is this consistently reproducible ? if so, Can you enable debug log [1]
>> and check for any message similar to [2]. Basically you can even search
>> for "EOF on socket".
>>
>> You can set your log level back to default (INFO) after capturing for
>> some time.
>>
>>
>> [1] : gluster volume set  diagnostics.brick-log-level DEBUG and
>> gluster volume set  diagnostics.client-log-level DEBUG
>>
>> [2] : http://pastebin.com/xn8QHXWa
>>
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/12/2016 09:35 PM, yonex wrote:
>>> Hi,
>>>
>>> When my application moves a file from it's local disk to FUSE-mounted
>>> GlusterFS volume, the client outputs many warnings and errors not
>>> always but occasionally. The volume is a simple distributed volume.
>>>
>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>
>>> It seems to come from something like a network disconnection
>>> ("Transport endpoint is not connected") at a glance, but other
>>> networking applications on the same machine don't observe such a
>>> thing. So I guess there may be a problem somewhere in GlusterFS stack.
>>>
>>> It ended in failing to rename a file, logging PHP Warning like below:
>>>
>>> PHP Warning:  rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>> to open stream: Input/output error in [snipped].php on line 278
>>> PHP Warning:
>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>> Input/output error in [snipped].php on line 278
>>>
>>> Conditions:
>>>
>>> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>> - Server machines' OS: CentOS 6.
>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>> - The number of connected FUSE clients is 260.
>>> - No firewall between connected machines.
>>> - Neither remounting volumes nor rebooting client machines take effect.
>>> - It is caused by not only rename() but also copy() and filesize() 
>>> operation.
>>> - No outputs in brick logs when it happens.
>>>
>>> Any ideas? I'd appreciate any help.
>>>
>>> Regards.
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] File operation failure on simple distributed volume

2016-12-13 Thread Mohammed Rafi K C
Hi Yonex,

Is this consistently reproducible? If so, can you enable the debug log [1]
and check for any message similar to [2]? Basically, you can even just search
for "EOF on socket".

You can set your log level back to default (INFO) after capturing for
some time.


[1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG and
gluster volume set <volname> diagnostics.client-log-level DEBUG

[2] : http://pastebin.com/xn8QHXWa
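
A minimal sketch of [1] plus the search, assuming the volume is named
glusterfs01 (taken from the client translator names in this thread; adjust to
your setup) and the default log locations:

    gluster volume set glusterfs01 diagnostics.brick-log-level DEBUG
    gluster volume set glusterfs01 diagnostics.client-log-level DEBUG
    grep -n "EOF on socket" /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log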


Regards

Rafi KC

On 12/12/2016 09:35 PM, yonex wrote:
> Hi,
>
> When my application moves a file from it's local disk to FUSE-mounted
> GlusterFS volume, the client outputs many warnings and errors not
> always but occasionally. The volume is a simple distributed volume.
>
> A sample of logs pasted: http://pastebin.com/axkTCRJX
>
> It seems to come from something like a network disconnection
> ("Transport endpoint is not connected") at a glance, but other
> networking applications on the same machine don't observe such a
> thing. So I guess there may be a problem somewhere in GlusterFS stack.
>
> It ended in failing to rename a file, logging PHP Warning like below:
>
> PHP Warning:  rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
> to open stream: Input/output error in [snipped].php on line 278
> PHP Warning:
> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
> Input/output error in [snipped].php on line 278
>
> Conditions:
>
> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
> - Client machines' OS: Scientific Linux 6 or CentOS 6.
> - Server machines' OS: CentOS 6.
> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
> - The number of connected FUSE clients is 260.
> - No firewall between connected machines.
> - Neither remounting volumes nor rebooting client machines take effect.
> - It is caused by not only rename() but also copy() and filesize() operation.
> - No outputs in brick logs when it happens.
>
> Any ideas? I'd appreciate any help.
>
> Regards.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-12-07 Thread Mohammed Rafi K C
Hi Micha,

This is great. I will provide you with a debug build that has two fixes
which I suspect as possible causes of the frequent disconnect issue, though
I don't have much data to validate my theory. So I will take one more day
to dig into that.

Thanks for your support, and opensource++ 

Regards

Rafi KC

On 12/07/2016 05:02 AM, Micha Ober wrote:
> Hi,
>
> thank you for your answer and even more for the question!
> Until now, I was using FUSE. Today I changed all mounts to NFS using
> the same 3.7.17 version.
>
> But: The problem is still the same. Now, the NFS logfile contains
> lines like these:
>
> [2016-12-06 15:12:29.006325] C
> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-7:
> server X.X.18.62:49153 has not responded in the last 42 seconds,
> disconnecting.
>
> Interestingly enough,  the IP address X.X.18.62 is the same machine!
> As I wrote earlier, each node serves both as a server and a client, as
> each node contributes bricks to the volume. Every server is connecting
> to itself via its hostname. For example, the fstab on the node
> "giant2" looks like:
>
> #giant2:/gv0/shared_dataglusterfs   defaults,noauto 0   0
> #giant2:/gv2/shared_slurm   glusterfs   defaults,noauto 0   0
>
> giant2:/gv0 /shared_datanfs
> defaults,_netdev,vers=3 0   0
> giant2:/gv2 /shared_slurm   nfs
> defaults,_netdev,vers=3 0   0
>
> So I understand the disconnects even less.
>
> I don't know if it's possible to create a dummy cluster which exposes
> the same behaviour, because the disconnects only happen when there are
> compute jobs running on those nodes - and they are GPU compute jobs,
> so that's something which cannot be easily emulated in a VM.
>
> As we have more clusters (which are running fine with an ancient 3.4
> version :-)) and we are currently not dependent on this particular
> cluster (which may stay like this for this month, I think) I should be
> able to deploy the debug build on the "real" cluster, if you can
> provide a debug build.
>
> Regards and thanks,
> Micha
>
>
>
> Am 06.12.2016 um 08:15 schrieb Mohammed Rafi K C:
>>
>>
>>
>> On 12/03/2016 12:56 AM, Micha Ober wrote:
>>> ** Update: ** I have downgraded from 3.8.6 to 3.7.17 now, but the
>>> problem still exists.
>>>
>>> Client log: http://paste.ubuntu.com/23569065/
>>> Brick log: http://paste.ubuntu.com/23569067/
>>>
>>> Please note that each server has two bricks.
>>> Whereas, according to the logs, one brick loses the connection to
>>> all other hosts:
>>> [2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)
>>>
>>> The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
>>> As I said, the network connection is fine and the disks are idle.
>>> The CPU always has 2 free cores.
>>>
>>> It looks like I have to downgrade to 3.4 now in order for the disconnects 
>>> to stop.
>>
>> Hi Micha,
>>
>> Thanks for the update and sorry for what happened with gluster higher
>> versions. I can understand the need for downgrade as it is a
>> production setup.
>>
>> Can you tell me the clients used here ? whether it is a
>> fuse,nfs,nfs-ganesha, smb or libgfapi ?
>>
>> Since I'm not able to reproduce the issue (I have been trying from
>> last 3days) and the logs are not much helpful here (we don't have
>> much logs in socket layer), Could you please create a dummy cluster
>> and try to reproduce the issue? If then we can play with that volume
>> and I could provide some debug build which we can use for further
>> debugging?
>>
>> If you don't have bandwidth for this, please leave it ;).
>>
>> Regards
>> Rafi KC
>>
>>> - Micha
>>>
>>> Am 30.11.2016 um 06:57 schrieb Mohammed Rafi K C:
>>>>
>>>> Hi Micha,
>>>>
>>>> I have changed the thread and subject

Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-12-05 Thread Mohammed Rafi K C


On 12/03/2016 12:56 AM, Micha Ober wrote:
> ** Update: ** I have downgraded from 3.8.6 to 3.7.17 now, but the
> problem still exists.
>
> Client log: http://paste.ubuntu.com/23569065/
> Brick log: http://paste.ubuntu.com/23569067/
>
> Please note that each server has two bricks.
> Whereas, according to the logs, one brick loses the connection to all
> other hosts:
> [2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: 
> writev on X.X.X.219:49121 failed (Broken pipe)
> [2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: 
> writev on X.X.X.62:49118 failed (Broken pipe)
> [2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: 
> writev on X.X.X.107:49121 failed (Broken pipe)
> [2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: 
> writev on X.X.X.206:49120 failed (Broken pipe)
> [2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: 
> writev on X.X.X.58:49121 failed (Broken pipe)
>
> The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
> As I said, the network connection is fine and the disks are idle.
> The CPU always has 2 free cores.
>
> It looks like I have to downgrade to 3.4 now in order for the disconnects to 
> stop.

Hi Micha,

Thanks for the update, and sorry for the trouble with the newer gluster
versions. I can understand the need to downgrade as it is a production
setup.

Can you tell me which clients are used here? Is it fuse, nfs,
nfs-ganesha, smb or libgfapi?

Since I'm not able to reproduce the issue (I have been trying for the
last 3 days) and the logs are not very helpful here (we don't have many
logs in the socket layer), could you please create a dummy cluster and
try to reproduce the issue? If you can, we can play with that volume and
I could provide a debug build which we can use for further debugging.

If you don't have bandwidth for this, please leave it ;).
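
If you do find the time, a minimal throwaway setup could look something
like the sketch below. The hostnames, brick paths and mount point are
only placeholders; add "force" to the create command if the bricks sit
on the root filesystem:

  # on node1, with glusterd already running on node2
  gluster peer probe node2
  gluster volume create testvol replica 2 node1:/bricks/b1 node2:/bricks/b2
  gluster volume start testvol
  mount -t glusterfs node1:/testvol /mnt/testvol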

Regards
Rafi KC

>
> - Micha
>
> Am 30.11.2016 um 06:57 schrieb Mohammed Rafi K C:
>>
>> Hi Micha,
>>
>> I have changed the thread and subject so that your original thread
>> remain same for your query. Let's try to fix the problem what you
>> observed with 3.8.4, So I have started a new thread to discuss the
>> frequent disconnect problem.
>>
>> *If any one else has experienced the same problem, please respond to
>> the mail.*
>>
>> It would be very helpful if you could give us some more logs from
>> clients and bricks.  Also any reproducible steps will surely help to
>> chase the problem further.
>>
>> Regards
>>
>> Rafi KC
>>
>> On 11/30/2016 04:44 AM, Micha Ober wrote:
>>> I had opened another thread on this mailing list (Subject: "After
>>> upgrade from 3.4.2 to 3.8.5 - High CPU usage resulting in
>>> disconnects and split-brain").
>>>
>>> The title may be a bit misleading now, as I am no longer observing
>>> high CPU usage after upgrading to 3.8.6, but the disconnects are
>>> still happening and the number of files in split-brain is growing.
>>>
>>> Setup: 6 compute nodes, each serving as a glusterfs server and
>>> client, Ubuntu 14.04, two bricks per node, distribute-replicate
>>>
>>> I have two gluster volumes set up (one for scratch data, one for the
>>> slurm scheduler). Only the scratch data volume shows critical errors
>>> "[...] has not responded in the last 42 seconds, disconnecting.". So
>>> I can rule out network problems, the gigabit link between the nodes
>>> is not saturated at all. The disks are almost idle (<10%).
>>>
>>> I have glusterfs 3.4.2 on Ubuntu 12.04 on a another compute cluster,
>>> running fine since it was deployed.
>>> I had glusterfs 3.4.2 on Ubuntu 14.04 on this cluster, running fine
>>> for almost a year.
>>>
>>> After upgrading to 3.8.5, the problems (as described) started. I
>>> would like to use some of the new features of the newer versions
>>> (like bitrot), but the users can't run their compute jobs right now
>>> because the result files are garbled.
>>>
>>> There also seems to be a bug report with a smiliar problem: (but no
>>> progress)
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
>>>
>>> For me, ALL servers are affected (not isolated to one or two servers)
>>>
>>> I also see messages like "INFO: task gpu_graphene_bv:4476 blocked
>>> for more than 120 seconds." in the syslog.
>>>
>>> For completeness (gv0 is the scratch volume, gv2 the slurm volume):
>>>
>>

[Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-11-29 Thread Mohammed Rafi K C
Hi Micha,

I have changed the thread and subject so that your original thread
remains intact for your query. Let's try to fix the problem that you
observed with 3.8.4, so I have started a new thread to discuss the
frequent disconnect problem.

*If anyone else has experienced the same problem, please respond to this
mail.*

It would be very helpful if you could give us some more logs from the
clients and bricks. Also, any steps to reproduce will surely help us
chase the problem further.

Regards

Rafi KC

On 11/30/2016 04:44 AM, Micha Ober wrote:
> I had opened another thread on this mailing list (Subject: "After
> upgrade from 3.4.2 to 3.8.5 - High CPU usage resulting in disconnects
> and split-brain").
>
> The title may be a bit misleading now, as I am no longer observing
> high CPU usage after upgrading to 3.8.6, but the disconnects are still
> happening and the number of files in split-brain is growing.
>
> Setup: 6 compute nodes, each serving as a glusterfs server and client,
> Ubuntu 14.04, two bricks per node, distribute-replicate
>
> I have two gluster volumes set up (one for scratch data, one for the
> slurm scheduler). Only the scratch data volume shows critical errors
> "[...] has not responded in the last 42 seconds, disconnecting.". So I
> can rule out network problems, the gigabit link between the nodes is
> not saturated at all. The disks are almost idle (<10%).
>
> I have glusterfs 3.4.2 on Ubuntu 12.04 on a another compute cluster,
> running fine since it was deployed.
> I had glusterfs 3.4.2 on Ubuntu 14.04 on this cluster, running fine
> for almost a year.
>
> After upgrading to 3.8.5, the problems (as described) started. I would
> like to use some of the new features of the newer versions (like
> bitrot), but the users can't run their compute jobs right now because
> the result files are garbled.
>
> There also seems to be a bug report with a smiliar problem: (but no
> progress)
> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
>
> For me, ALL servers are affected (not isolated to one or two servers)
>
> I also see messages like "INFO: task gpu_graphene_bv:4476 blocked for
> more than 120 seconds." in the syslog.
>
> For completeness (gv0 is the scratch volume, gv2 the slurm volume):
>
> [root@giant2: ~]# gluster v info
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 993ec7c9-e4bc-44d0-b7c4-2d977e622e86
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6 x 2 = 12
> Transport-type: tcp
> Bricks:
> Brick1: giant1:/gluster/sdc/gv0
> Brick2: giant2:/gluster/sdc/gv0
> Brick3: giant3:/gluster/sdc/gv0
> Brick4: giant4:/gluster/sdc/gv0
> Brick5: giant5:/gluster/sdc/gv0
> Brick6: giant6:/gluster/sdc/gv0
> Brick7: giant1:/gluster/sdd/gv0
> Brick8: giant2:/gluster/sdd/gv0
> Brick9: giant3:/gluster/sdd/gv0
> Brick10: giant4:/gluster/sdd/gv0
> Brick11: giant5:/gluster/sdd/gv0
> Brick12: giant6:/gluster/sdd/gv0
> Options Reconfigured:
> auth.allow: X.X.X.*,127.0.0.1
> nfs.disable: on
>
> Volume Name: gv2
> Type: Replicate
> Volume ID: 30c78928-5f2c-4671-becc-8deaee1a7a8d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: giant1:/gluster/sdd/gv2
> Brick2: giant2:/gluster/sdd/gv2
> Options Reconfigured:
> auth.allow: X.X.X.*,127.0.0.1
> cluster.granular-entry-heal: on
> cluster.locking-scheme: granular
> nfs.disable: on
>
>
> 2016-11-30 0:10 GMT+01:00 Micha Ober  >:
>
> There also seems to be a bug report with a smiliar problem: (but
> no progress)
> https://bugzilla.redhat.com/show_bug.cgi?id=1370683
> 
>
> For me, ALL servers are affected (not isolated to one or two servers)
>
> I also see messages like "INFO: task gpu_graphene_bv:4476 blocked
> for more than 120 seconds." in the syslog.
>
> For completeness (gv0 is the scratch volume, gv2 the slurm volume):
>
> [root@giant2: ~]# gluster v info
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 993ec7c9-e4bc-44d0-b7c4-2d977e622e86
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6 x 2 = 12
> Transport-type: tcp
> Bricks:
> Brick1: giant1:/gluster/sdc/gv0
> Brick2: giant2:/gluster/sdc/gv0
> Brick3: giant3:/gluster/sdc/gv0
> Brick4: giant4:/gluster/sdc/gv0
> Brick5: giant5:/gluster/sdc/gv0
> Brick6: giant6:/gluster/sdc/gv0
> Brick7: giant1:/gluster/sdd/gv0
> Brick8: giant2:/gluster/sdd/gv0
> Brick9: giant3:/gluster/sdd/gv0
> Brick10: giant4:/gluster/sdd/gv0
> Brick11: giant5:/gluster/sdd/gv0
> Brick12: giant6:/gluster/sdd/gv0
> Options Reconfigured:
> auth.allow: X.X.X.*,127.0.0.1
> nfs.disable: on
>
> Volume Name: gv2
> Type: Replicate
> Volume ID: 30c78928-5f2c-4671-becc-8deaee1a7a8d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 

Re: [Gluster-users] Volume ping-timeout parameter and client side mount timeouts

2016-11-15 Thread Mohammed Rafi K C
If I understand the query correctly, the problem is that gluster takes
more than 20 seconds to time out even though the brick had been offline
for more than 35s. With that assumption, I have some questions.

How did you conclude that the timer expired only after 35s? From the log
file? If so, note that glusterfs waits for some time and flushes log
messages out in batches (I am not sure how long), so the timestamps in
the logs may not be accurate.

If you have already confirmed, using wireshark or a similar tool, that it
takes more than 20 seconds to disconnect the socket, then there could be
something else which we need to look into.

Can you confirm that with wireshark or a similar tool, if not already done?
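
For example, something along these lines on the client, while you
reproduce the outage, should settle it. The interface and address are
placeholders; 49152 is simply the brick port from your log:

  tcpdump -i <interface> -n host <brick-ip> and port 49152 -w ping-timeout.pcap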


Rafi KC


On 11/14/2016 09:13 PM, Martin Schlegel wrote:
> Hello Gluster Community
>
> We have 2x brick nodes running with replication for a volume gv0 for which 
> set a
> "gluster volume set gv0 ping-timeout 20".
>
> In our tests it seemed there is unknown delay with this ping-timeout - we see 
> it
> timing out much later after about 35 seconds and not at around 20 seconds (see
> test below).
>
> Our distributed database cluster is using Gluster as a secondary file system 
> for
> backups etc. - it's Pacemaker cluster manager needs to know how long to wait
> before giving up on the glusterfs mounted file system to become available 
> again
> or when to failover to another node.
>
> 1. When do we know when to give up waiting on the glusterfs mount point to
> become accessible again following an outage on the brick server this client 
> was
> connected to ?
> 2. Is there a timeout / interval setting on the client side that we could
> reduce, so that it more quickly tries to switch the mount point to a 
> different,
> available brick server ?
>
>
> Regards,
> Martin Schlegel
>
> __
>
> Here is how we tested this:
>
> As a test we blocked the entire network on one of these brick nodes:
> root@glusterfs-brick-node1 $ date;iptables -A INPUT -i bond0 -j DROP ; 
> iptables
> -A OUTPUT -o bond0 -j DROP
> Mon Nov 14 08:26:55 UTC 2016
>
> From the syslog on the glusterfs-client-node
> Nov 14 08:27:30 glusterfs-client-node1 pgshared1[26783]: [2016-11-14
> 08:27:30.275694] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
> 0-gv0-client-0: server glusterfs-brick-node1:49152 has not responded in the 
> last
> 20 seconds, disconnecting.
>
> <--- This last message "has not responded in the last 20 seconds" is confusing
> to me, because the brick node was clearly blocked for 35 seconds already ! Is
> there some client-side check interval that can be reduced ?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Feature Request: Lock Volume Settings

2016-11-13 Thread Mohammed Rafi K C
I think it is worth implementing a lock option.

+1


Rafi KC


On 11/14/2016 06:12 AM, David Gossage wrote:
> On Sun, Nov 13, 2016 at 6:35 PM, Lindsay Mathieson
> > wrote:
>
> As discussed recently, it is way to easy to make destructive changes
> to a volume,e.g change shard size. This can corrupt the data with no
> warnings and its all to easy to make a typo or access the wrong volume
> when doing 3am maintenance ...
>
> So I'd like to suggest something like the following:
>
>   gluster volume lock 
>
> Setting this would fail all:
> - setting changes
> - add bricks
> - remove bricks
> - delete volume
>
>   gluster volume unlock 
>
> would allow all changes to be made.
>
> Just a thought, open to alternate suggestions.
>
> Thanks
>
> +
> sounds handy 
>
> --
> Lindsay
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Feedback on DHT option "cluster.readdir-optimize"

2016-11-10 Thread Mohammed Rafi K C


On 11/10/2016 09:05 PM, Raghavendra G wrote:
>
>
> On Thu, Nov 10, 2016 at 8:57 PM, Vijay Bellur  > wrote:
>
> On Thu, Nov 10, 2016 at 3:17 AM, Nithya Balachandran
> > wrote:
> >
> >
> > On 8 November 2016 at 20:21, Kyle Johnson  > wrote:
> >>
> >> Hey there,
> >>
> >> We have a number of processes which daily walk our entire
> directory tree
> >> and perform operations on the found files.
> >>
> >> Pre-gluster, this processes was able to complete within 24 hours of
> >> starting.  After outgrowing that single server and moving to a
> gluster setup
> >> (two bricks, two servers, distribute, 10gig uplink), the
> processes became
> >> unusable.
> >>
> >> After turning this option on, we were back to normal run times,
> with the
> >> process completing within 24 hours.
> >>
> >> Our data is heavy nested in a large number of subfolders under
> /media/ftp.
> >
> >
> > Thanks for getting back to us - this is very good information.
> Can you
> > provide a few more details?
> >
> > How deep is your directory tree and roughly how many directories
> do you have
> > at each level?
> > Are all your files in the lowest level dirs or do they exist on
> several
> > levels?
> > Would you be willing to provide the gluster volume info output
> for this
> > volume?
> >>
>
>
> I have had performance improvement with this option when the first
> level below the root consisted several thousands of directories
> without any files. IIRC, I was testing this in a 16 x 2 setup.
>
>
> Yes Vijay. I remember you mentioning it. This option is expected to
> only boost readdir performance on a directory containing
> subdirectories. For files it has no effect.
>
> On a similar note, I think we can also skip linkto files in readdirp 
> (on brick) as dht_readdirp picks the dentry from subvol containing
> data-file.

doing so will break tier_readdirp.

Rafi KC

>
>
> Regards,
> Vijay
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
>
>
>
>
> -- 
> Raghavendra G
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] getting "Transport endpoint is not connected" in glusterfs mount log file.

2016-11-10 Thread Mohammed Rafi K C
Hi Abhishek,

Could you please check whether your bricks are healthy or not? You can
run a gluster volume status, or you can look into the logs. If the bricks
are not running, could you please attach the brick logs from
/var/log/glusterfs/bricks/ .
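
For example (the volume name and brick path are placeholders; brick log
file names are derived from the brick path):

  gluster volume status <volname>
  ls /var/log/glusterfs/bricks/
  tail -n 100 /var/log/glusterfs/bricks/<brick-path>.log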


Rafi KC


On 11/11/2016 10:20 AM, ABHISHEK PALIWAL wrote:
> Hi,
>
> Its an urgent case.
>
> Atleast provide your views on this
>
> On Wed, Nov 9, 2016 at 11:08 AM, ABHISHEK PALIWAL
> > wrote:
>
> Hi,
>
> We could see that sync is getting failed to sync the GlusterFS
> bricks due to error trace "Transport endpoint is not connected "
>
> [2016-10-31 04:06:03.627395] E [MSGID: 114031]
> [client-rpc-fops.c:1673:client3_3_finodelk_cbk]
> 0-c_glusterfs-client-9: remote operation failed [Transport
> endpoint is not connected]
> [2016-10-31 04:06:03.628381] I
> [socket.c:3308:socket_submit_request] 0-c_glusterfs-client-9: not
> connected (priv->connected = 0)
> [2016-10-31 04:06:03.628432] W [rpc-clnt.c:1586:rpc_clnt_submit]
> 0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f5f
> Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport
> (c_glusterfs-client-9)
> [2016-10-31 04:06:03.628466] E [MSGID: 114031]
> [client-rpc-fops.c:1673:client3_3_finodelk_cbk]
> 0-c_glusterfs-client-9: remote operation failed [Transport
> endpoint is not connected]
> [2016-10-31 04:06:03.628475] I [MSGID: 108019]
> [afr-lk-common.c:1086:afr_lock_blocking]
> 0-c_glusterfs-replicate-0: unable to lock on even one child
> [2016-10-31 04:06:03.628539] I [MSGID: 108019]
> [afr-transaction.c:1224:afr_post_blocking_inodelk_cbk]
> 0-c_glusterfs-replicate-0: Blocking inodelks failed.
> [2016-10-31 04:06:03.628630] W [fuse-bridge.c:1282:fuse_err_cbk]
> 0-glusterfs-fuse: 20790: FLUSH() ERR => -1 (Transport endpoint is
> not connected)
> [2016-10-31 04:06:03.629149] E
> [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58]
> (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90]
> (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10]
> (-->
> 
> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
> (-->
> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808]
> ) 0-c_glusterfs-client-9: forced unwinding frame
> type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-10-31
> 04:06:03.624346 (xid=0x7f5a)
> [2016-10-31 04:06:03.629183] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 0-c_glusterfs-client-9: changing port to 49391 (from 0)
> [2016-10-31 04:06:03.629210] W [MSGID: 114031]
> [client-rpc-fops.c:2971:client3_3_lookup_cbk]
> 0-c_glusterfs-client-9: remote operation failed. Path:
> /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews
> (b0e5a94e-a432-4dce-b86f-a551555780a2) [Transport endpoint is not
> connected]
>
>
> Could you please tell us the reason why we are getting these trace
> and how to resolve this.
>
> Logs are attached here please share your analysis.
>
> Thanks in advanced
>
> -- 
> Regards
> Abhishek Paliwal
>
>
>
>
> -- 
>
>
>
>
> Regards
> Abhishek Paliwal
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Problem with add-brick

2016-09-30 Thread Mohammed Rafi K C
It seems like an actual bug; if you can file a bug in Bugzilla, that
would be great.


At the moment I don't see a workaround for this issue, so until the next
update with a fix is available, you could use either an rdma-only or a
tcp-only volume.

Let me know whether this is acceptable; if so, I can give you the steps
to change the transport of an existing volume.
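
Roughly, the idea would be something like the sketch below. The volume
name is a placeholder, and if I remember correctly the config.transport
volume option is what drives this, so please wait for the exact steps
before running it on your setup:

  gluster volume stop <volname>
  gluster volume set <volname> config.transport tcp    # or: rdma
  gluster volume start <volname>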


Regards

Rafi KC


On 09/30/2016 10:35 AM, Mohammed Rafi K C wrote:
>
>
>
> On 09/30/2016 02:35 AM, Dennis Michael wrote:
>>
>> Are there any workarounds to this?  RDMA is configured on my servers.
>
>
> By this, I assume your rdma setup/configuration over IPoIB is working
> fine.
>
> Can you tell us what machine you are using and whether SELinux is
> configured on the machine or not.
>
> Also I couldn't see any logs attached here.
>
> Rafi KC
>
>
>>
>> Dennis
>>
>> On Thu, Sep 29, 2016 at 7:19 AM, Atin Mukherjee <amukh...@redhat.com
>> <mailto:amukh...@redhat.com>> wrote:
>>
>> Dennis,
>>
>> Thanks for sharing the logs.
>>
>> It seems like a volume configured created with tcp,rdma transport
>> fails to start (atleast in my local set up). The issue here is
>> although the brick process comes up, but glusterd receives a non
>> zero ret code from the runner interface which spawns the brick
>> process(es).
>>
>> Raghavendra Talur/Rafi,
>>
>> Is this an intended behaviour if rdma device is not configured?
>> Please chime in with your thoughts
>>
>>
>> On Wed, Sep 28, 2016 at 10:22 AM, Atin Mukherjee
>> <amukh...@redhat.com> wrote:
>>
>> Dennis,
>>
>> It seems like that add-brick has definitely failed and the
>> entry is not committed into glusterd store. volume status and
>> volume info commands are referring the in-memory data for fs4
>> (which exist) but post a restart they are no longer
>> available. Could you run glusterd with debug log enabled
>> (systemctl stop glusterd; glusterd -LDEBUG) and provide us
>> cmd_history.log, glusterd log along with fs4 brick log files
>> to further analyze the issue? Regarding the missing RDMA
>> ports for fs2, fs3 brick can you cross check if
>> glusterfs-rdma package is installed on both the nodes?
>>
>> On Wed, Sep 28, 2016 at 7:14 AM, Ravishankar N
>> <ravishan...@redhat.com> wrote:
>>
>> On 09/27/2016 10:29 PM, Dennis Michael wrote:
>>>
>>>
>>> [root@fs4 bricks]# gluster volume info
>>>  
>>> Volume Name: cees-data
>>> Type: Distribute
>>> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>>> Status: Started
>>> Number of Bricks: 4
>>> Transport-type: tcp,rdma
>>> Bricks:
>>> Brick1: fs1:/data/brick
>>> Brick2: fs2:/data/brick
>>> Brick3: fs3:/data/brick
>>> Brick4: fs4:/data/brick
>>> Options Reconfigured:
>>> features.quota-deem-statfs: on
>>> features.inode-quota: on
>>> features.quota: on
>>> performance.readdir-ahead: on
>>> [root@fs4 bricks]# gluster volume status
>>> Status of volume: cees-data
>>> Gluster process TCP Port
>>>  RDMA Port  Online  Pid
>>> 
>>> --
>>> Brick fs1:/data/brick   49152
>>> 49153  Y   1878 
>>> Brick fs2:/data/brick   49152 0
>>>  Y   1707 
>>> Brick fs3:/data/brick   49152 0
>>>  Y   4696 
>>> Brick fs4:/data/brick   N/A  
>>> N/AN   N/A  
>>> NFS Server on localhost 2049  0
>>>  Y   13808
>>> Quota Daemon on localhost   N/A  
>>> N/AY   13813
>>> NFS Server on fs1   2049  0
>>>  Y   6722 
>>> Quota Daemon on fs1

Re: [Gluster-users] Problem with add-brick

2016-09-30 Thread Mohammed Rafi K C


On 09/30/2016 02:35 AM, Dennis Michael wrote:
>
> Are there any workarounds to this?  RDMA is configured on my servers.


By this, I assume your rdma setup/configuration over IPoIB is working fine.

Can you tell us what machine you are using and whether SELinux is
configured on it?

Also, I couldn't see any logs attached here.

Rafi KC


>
> Dennis
>
> On Thu, Sep 29, 2016 at 7:19 AM, Atin Mukherjee  > wrote:
>
> Dennis,
>
> Thanks for sharing the logs.
>
> It seems like a volume configured created with tcp,rdma transport
> fails to start (atleast in my local set up). The issue here is
> although the brick process comes up, but glusterd receives a non
> zero ret code from the runner interface which spawns the brick
> process(es).
>
> Raghavendra Talur/Rafi,
>
> Is this an intended behaviour if rdma device is not configured?
> Please chime in with your thoughts
>
>
> On Wed, Sep 28, 2016 at 10:22 AM, Atin Mukherjee
> > wrote:
>
> Dennis,
>
> It seems like that add-brick has definitely failed and the
> entry is not committed into glusterd store. volume status and
> volume info commands are referring the in-memory data for fs4
> (which exist) but post a restart they are no longer available.
> Could you run glusterd with debug log enabled (systemctl stop
> glusterd; glusterd -LDEBUG) and provide us cmd_history.log,
> glusterd log along with fs4 brick log files to further analyze
> the issue? Regarding the missing RDMA ports for fs2, fs3 brick
> can you cross check if glusterfs-rdma package is installed on
> both the nodes?
>
> On Wed, Sep 28, 2016 at 7:14 AM, Ravishankar N
> > wrote:
>
> On 09/27/2016 10:29 PM, Dennis Michael wrote:
>>
>>
>> [root@fs4 bricks]# gluster volume info
>>  
>> Volume Name: cees-data
>> Type: Distribute
>> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>> Status: Started
>> Number of Bricks: 4
>> Transport-type: tcp,rdma
>> Bricks:
>> Brick1: fs1:/data/brick
>> Brick2: fs2:/data/brick
>> Brick3: fs3:/data/brick
>> Brick4: fs4:/data/brick
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> performance.readdir-ahead: on
>> [root@fs4 bricks]# gluster volume status
>> Status of volume: cees-data
>> Gluster process TCP Port
>>  RDMA Port  Online  Pid
>> 
>> --
>> Brick fs1:/data/brick   49152
>> 49153  Y   1878 
>> Brick fs2:/data/brick   49152 0  
>>Y   1707 
>> Brick fs3:/data/brick   49152 0  
>>Y   4696 
>> Brick fs4:/data/brick   N/A   N/A
>>N   N/A  
>> NFS Server on localhost 2049  0  
>>Y   13808
>> Quota Daemon on localhost   N/A   N/A
>>Y   13813
>> NFS Server on fs1   2049  0  
>>Y   6722 
>> Quota Daemon on fs1 N/A   N/A
>>Y   6730 
>> NFS Server on fs3   2049  0  
>>Y   12553
>> Quota Daemon on fs3 N/A   N/A
>>Y   12561
>> NFS Server on fs2   2049  0  
>>Y   11702
>> Quota Daemon on fs2 N/A   N/A
>>Y   11710
>>  
>> Task Status of Volume cees-data
>> 
>> --
>> There are no active volume tasks
>>  
>> [root@fs4 bricks]# ps auxww | grep gluster
>> root 13791  0.0  0.0 701472 19768 ?Ssl  09:06
>>   0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid
>> --log-level INFO
>> root 13808  0.0  0.0 560236 41420 ?Ssl  09:07
>>   0:00 /usr/sbin/glusterfs -s localhost --volfile-id
>> gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l
>> 

Re: [Gluster-users] quota problem on large cluster

2016-09-08 Thread Mohammed Rafi K C


On 09/07/2016 04:18 PM, Fedele Stabile wrote:
> Hello,
> I have a gluster volume made of 64 bricks: 
> volume type is distribute  and transport is rdma
> I have distributed bricks in a cluster of 32 server with 2 bricks each.
> This volume is mounted on all the 32 servers and also on another server
> connected to the cluster.

Great. How long have you been using this configuration, and how does the
rdma performance hold up?


> Now I want to implement quota on this gluster volume:
> all seems ok and I can define quota-limits and make tests but when I
> mount the volume on the 33 server the mount operation hangs and also
> volume is not accessible on all servers.
> All servers are Centos 6.6 and gluster 3.7.8
> Can you help me to understand what is wrong?
Can you give me the mount logs from the client, along with the quotad and
brick logs?
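
For reference, these are the usual default locations; the exact file
names are derived from the mount point and brick paths, so treat them as
placeholders:

  /var/log/glusterfs/<mount-point-directory-name>.log   # client mount log
  /var/log/glusterfs/quotad.log                          # quota daemon log
  /var/log/glusterfs/bricks/<brick-path>.log             # brick logs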

Regards
Rafi KC


>
> Fedele Stabile
>  
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Tiering and sharding for VM workload

2016-09-06 Thread Mohammed Rafi K C
Yes, you are correct. On a sharded volume, the hot/cold classification
would be based on the sharded chunks.

I want to stress the point Krutika mentioned in her mail: we haven't
tested this use case in depth.


Regards
Rafi KC

On 09/06/2016 06:38 PM, Krutika Dhananjay wrote:
> Theoretically whatever you said is correct (at least from shard's
> perspective).
> Adding Rafi who's worked on tiering to know if he thinks otherwise.
>
> It must be mentioned that sharding + tiering hasn't been tested as
> such till now by us at least.
>
> Did you try it? If so, what was your experience?
>
> -Krutika 
>
> On Tue, Sep 6, 2016 at 5:59 PM, Gandalf Corvotempesta
>  > wrote:
>
> Anybody?
>
>
> Il 05 set 2016 22:19, "Gandalf Corvotempesta"
>  > ha scritto:
>
> Is tiering with sharding usefull with VM workload?
> Let's assume a storage with tiering and sharding enabled, used for
> hosting VM images.
> Each shard is subject to tiering, thus the most frequent part
> of the
> VM would be cached on the SSD, allowing better performance.
>
> Is this correct?
>
> To put it simple, very simple, let's assume a webserver VM,
> with the
> following directory structure:
>
> /home/user1/public_html
> /home/user2/public_html
>
> both are stored on 2 different shards (i'm semplyfing).
> /home/user1/public_html has much more visits than user2.
>
> Would that shard cached on hot tier allowing faster access by
> the webserver?
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
>
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: Very slow performance when enabling tierd storage with SSD

2016-09-03 Thread Mohammed Rafi K C


On 09/03/2016 09:34 PM, Benjamin Kingston wrote:
> Hello Rafi,
>
> Thanks for the reply please see my answers below
>
> On Sat, Sep 3, 2016 at 12:07 AM, Mohammed Rafi K C
> <rkavu...@redhat.com <mailto:rkavu...@redhat.com>> wrote:
>
> Hi Benjamin,
>
>
> Can you tell us more about your work-load like the file size
>
> Files are a range from 10GB test file generated from /dev/urandom,
> many 100MB folder separated files to smaller images and text files
> This is a lab so typically there is no load, in the case of my testing
> this issue the only file being accessed was the 10GB test file. 
>
> , size of both hot and cold storage
>
> Hot storage is a 315GB portion of a 512GB SSD
> Cold storage is a replica made up of 3 bricks on two nodes. totaling 17TB
>
> how the files are created (after attaching the tier or before
> attaching the tier)
>
> I've experienced performance issues with files created before hot
> attachment, 10GB test file, and copying files to the volume after
> attachment (100MB files)

Files created before attaching the hot tier will stay on the cold bricks
until they get heated and migrated completely. During this time interval
you won't get the benefit of the hot storage, since the files are still
served from the cold bricks.


For small files (on the order of KBs), we are seeing some performance
issues, mostly with EC (disperse) volumes.

Follow-up questions:
What is the volume type, and which version of glusterfs are you running?

Does a reread give equal performance once it hits the server?

Have you renamed any of those files?
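
For reference, the following will show that information (the volume name
is a placeholder):

  gluster volume info <volname>
  glusterfs --version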

Regards
Rafi KC


> , how long have you been using the tier volume, etc.
>
> At the moment I am not using the hot tier after discovering the
> performance loss. I originally turned on after learning of the feature
> a month or so ago. I feel like I may have actually had this issue
> since then because I have been troubleshooting a network throughalput
> issue since then that was contributing to a 4MB/s write to the volume.
> I recently resolved that issue and the 4MB/s write was still observed
> until I removed the hot tier.
>
> Unfortunate timing on both issues.
>
>
> Regards
>
> Rafi KC
>
>
>
>
> On 09/03/2016 09:16 AM, Benjamin Kingston wrote:
>> Hello all,
>>
>> I've discovered an issue in my lab that went unnoticed until
>> recently, or just came about with the latest Centos release.
>>
>> When the SSD hot tier is enabled read from the volume is 2MB/s,
>> after detaching AND committing, read of the same file is at
>> 150MB/s to /dev/null
>>
>> If I copy the file to the hot bricks directly the write is
>> 150MB/s and read is 500MB/s on the first read, and then 4GB/s on
>> the subsequent reads (filesystem RAM caching) to /dev/null
>>
>> Just enabling tier storage takes the performance to ~10-20IOPS
>> and 2-10MB/s even for local node mounted volume.
>>
>> I don't see any major log issues and I detached and did a
>> fix-layout, but it persists when re-enabling the tier.
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
>> http://www.gluster.org/mailman/listinfo/gluster-users
>> <http://www.gluster.org/mailman/listinfo/gluster-users>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: Very slow performance when enabling tierd storage with SSD

2016-09-03 Thread Mohammed Rafi K C
Hi Benjamin,


Can you tell us more about your workload: the file sizes, the size of
both the hot and the cold storage, how the files are created (before or
after attaching the tier), how long you have been using the tiered
volume, etc.


Regards

Rafi KC




On 09/03/2016 09:16 AM, Benjamin Kingston wrote:
> Hello all,
>
> I've discovered an issue in my lab that went unnoticed until recently,
> or just came about with the latest Centos release.
>
> When the SSD hot tier is enabled read from the volume is 2MB/s, after
> detaching AND committing, read of the same file is at 150MB/s to /dev/null
>
> If I copy the file to the hot bricks directly the write is 150MB/s and
> read is 500MB/s on the first read, and then 4GB/s on the subsequent
> reads (filesystem RAM caching) to /dev/null
>
> Just enabling tier storage takes the performance to ~10-20IOPS and
> 2-10MB/s even for local node mounted volume.
>
> I don't see any major log issues and I detached and did a fix-layout,
> but it persists when re-enabling the tier.
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] CFP for Gluster Developer Summit

2016-08-19 Thread Mohammed Rafi K C
Hi All,


Here is my proposal

Title : Debugging a live gluster file system using the .meta directory

Theme : Troubleshooting

Meta is a client-side xlator which provides an interface, similar to the
Linux procfs, for GlusterFS runtime state and configuration.

The contents are exposed through a virtual hidden directory called .meta
at the root of the GlusterFS mount.


Planning to cover the following topics:

* Current state of the meta xlator

* Information that can be fetched through the .meta directory

* Debugging with the .meta directory (for both developers and users)

* Enhancements planned for the meta xlator

* Other troubleshooting options like statedump, io-stats, etc.
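
As a quick illustration of the kind of information involved, on any FUSE
mount of the volume you can do something like the following. The mount
path and volume name are placeholders, and the exact .meta entries vary
between releases:

  ls /mnt/glustervol/.meta
  cat /mnt/glustervol/.meta/version
  ls /mnt/glustervol/.meta/graphs/active
  gluster volume statedump <volname>    # statedump, for comparison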


Regards

Rafi KC

On 08/13/2016 01:18 AM, Vijay Bellur wrote:
> Hey All,
>
> Gluster Developer Summit 2016 is fast approaching [1] on us. We are
> looking to have talks and discussions related to the following themes
> in the summit:
>
> 1. Gluster.Next - focusing on features shaping the future of Gluster
>
> 2. Experience - Description of real world experience and feedback from:
>a> Devops and Users deploying Gluster in production
>b> Developers integrating Gluster with other
> ecosystems
>
> 3. Use cases  - focusing on key use cases that drive Gluster.today and
> Gluster.Next
>
> 4. Stability & Performance - focusing on current improvements to
> reduce our technical debt backlog
>
> 5. Process & infrastructure  - focusing on improving current workflow,
> infrastructure to make life easier for all of us!
>
> If you have a talk/discussion proposal that can be part of these
> themes, please send out your proposal(s) by replying to this thread.
> Please clearly mention the theme for which your proposal is relevant
> when you do so. We will be ending the CFP by 12 midnight PDT on August
> 31st, 2016.
>
> If you have other topics that do not fit in the themes listed, please
> feel free to propose and we might be able to accommodate some of them
> as lightening talks or something similar.
>
> Please do reach out to me or Amye if you have any questions.
>
> Thanks!
> Vijay
>
> [1] https://www.gluster.org/events/summit2016/
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Minutes : Gluster Community meeting (Wednesday 17th Aug 2016)

2016-08-17 Thread Mohammed Rafi K C
Hi All,

The meeting minutes and logs for this week's meeting are available at
the links below.
Minutes: https://meetbot.fedoraproject.org/gluster-meeting/2016-08-17/weekly_community_meeting_17-aug-2016.2016-08-17-12.00.html
Minutes (Text): https://meetbot-raw.fedoraproject.org/gluster-meeting/2016-08-17/weekly_community_meeting_17-aug-2016.2016-08-17-12.00.txt
Log: https://meetbot.fedoraproject.org/gluster-meeting/2016-08-17/weekly_community_meeting_17-aug-2016.2016-08-17-12.00.log.html

We had a very lively meeting this time, with good participation.
Hope next week's meeting is the same. The next meeting is as
always at 1200 UTC next Wednesday in #gluster-meeting. See you all
there, and thank you for attending today's meeting.

*Please note that we have decided to screen the remaining 3.6
bugs as that release has reached EOL. This will take place next Tuesday
as part of the bug triage meeting. If you are a maintainer, please ensure
your presence. Looking forward to seeing everyone for this bug screening.*
Regards!
Rafi KC


Meeting summary
---
* Roll call  (rafi, 12:01:25)
  * The agenda is available at
https://public.pad.fsfe.org/p/gluster-community-meetings  (rafi,
12:02:22)
  * LINK: https://public.pad.fsfe.org/p/gluster-community-meetings
(rafi, 12:02:34)

* Next weeks meeting host  (rafi, 12:05:33)
  * kshlm will be hosting next week community meeting  (rafi, 12:07:49)

* GlusterFS-4.0  (rafi, 12:08:08)

* GlusterFS-3.9  (rafi, 12:15:19)

* GlusterFS-3.8  (rafi, 12:18:10)
  * GlusterFS-3.8.3 is scheduled in first week of Sept  (rafi, 12:21:44)

* GlusterFS-3.7  (rafi, 12:22:55)
  * ACTION: kshlm will send out a reminder for 3.7.15 time lines  (rafi,
12:25:23)

* GlusterFS-3.6  (rafi, 12:26:42)
  * LINK:
http://www.gluster.org/pipermail/maintainers/2016-August/001227.html
(ndevos, 12:30:03)
  * 84 bugs for 3.6 still need to be screened  (ndevos, 12:32:55)

* Infra  (rafi, 12:35:06)

* NFS ganesha  (rafi, 12:36:19)

* Gluster samba  (rafi, 12:41:56)

* Last weeks AIs  (rafi, 12:47:55)

* kshlm to setup a time to go through the 3.6 buglist one last time
  (everyone should attend).  (rafi, 12:48:22)
  * ACTION: kshlm to send reminder to go through the 3.6 buglist one
last time (everyone should attend).  (rafi, 12:50:23)

* open floor  (rafi, 12:50:50)

* Glusto - libraries have been ported by the QE Automation Team and just
  need your +1s on Glusto to begin configuring upstream and make
  available.  (rafi, 12:51:05)

* * umbrella versioning for glusterfs in bugzilla (i.e. 3.9, not 3.9.0,
  3.9.1, etc.  starting with 3.9 release)  (rafi, 12:52:03)
  * ACTION: kkeithley will send more information to gluster ML's about
changing the bugzilla versioning  to umbrella  (rafi, 12:55:32)

Meeting ended at 13:00:12 UTC.




Action Items

* kshlm will send out a reminder for 3.7.15 time lines
* kshlm to send reminder to go through the 3.6 buglist one last time
  (everyone should attend).
* kkeithley will send more information to gluster ML's about changing
  the bugzilla versioning  to umbrella




Action Items, by person
---
* kkeithley
  * kkeithley will send more information to gluster ML's about changing
the bugzilla versioning  to umbrella
* kshlm
  * kshlm will send out a reminder for 3.7.15 time lines
  * kshlm to send reminder to go through the 3.6 buglist one last time
(everyone should attend).
* **UNASSIGNED**
  * (none)




People Present (lines said)
---
* rafi (110)
* kshlm (31)
* kkeithley (30)
* ndevos (27)
* ira (18)
* post-factum (10)
* zodbot (6)
* skoduri (4)
* glusterbot (4)
* aravindavk (3)
* ankitraj (1)
* msvbhat (1)
* kotreshhr (1)
* ira_ (1)




Generated by `MeetBot`_ 0.1.4

.. _`MeetBot`: http://wiki.debian.org/MeetBot



On 08/17/2016 02:35 PM, Mohammed Rafi K C wrote:
>
>
> Hi all,
>
> The weekly Gluster community meeting is about to take place in three
> hour from now.
>
> Meeting details:
> - location: #gluster-meeting on Freenode IRC
> ( https://webchat.freenode.net/?channels=gluster-meeting
> <https://webchat.freenode.net/?channels=gluster-meeting> )
> - date: every Wednesday
> - time: 12:00 UTC
> (in your terminal, run: date -d "12:00 UTC")
> - agenda: *https://public.pad.fsfe.org/p/gluster-community-meetings**
> 
> * Currently the following items are listed:
> * *GlusterFS 4.0
> * **GlusterFS 3.9*
> * *GlusterFS 3.8*
> * *GlusterFS 3.7
> * **GlusterFS 3.6
> *** Related projects**
> * **Last weeks AIs
> * **Open Floor**
>
> *If you have any topic that need to be discussed, please add to the
> Open Floor section as a sub topic.
> *
> *Appreciate your participation.
>
> Regards,
> Rafi KC

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] REMINDER: Gluster Community meeting (Wednesday 17th Aug 2016)

2016-08-17 Thread Mohammed Rafi K C


Hi all,

The weekly Gluster community meeting is about to take place three hours
from now.

Meeting details:
- location: #gluster-meeting on Freenode IRC
( https://webchat.freenode.net/?channels=gluster-meeting
 )
- date: every Wednesday
- time: 12:00 UTC
(in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-community-meetings

Currently the following items are listed:
* GlusterFS 4.0
* GlusterFS 3.9
* GlusterFS 3.8
* GlusterFS 3.7
* GlusterFS 3.6
* Related projects
* Last week's AIs
* Open Floor

If you have any topic that needs to be discussed, please add it to the
Open Floor section as a sub-topic.

Appreciate your participation.

Regards,
Rafi KC
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Talks and topics that need presenters (WAS: CFP for Gluster Developer Summit)

2016-08-13 Thread Mohammed Rafi K C


On 08/13/2016 01:29 PM, Niels de Vos wrote:
> In addition to Vijays request to submit talks, I would like to see some
> very specific topics presented/demo'd. Anyone attending the Summit and
> willing to take these on is very much encouraged to do so. To do so,
> reply to this (or Vijays) email with your name and a description of the
> topic.
>
> If others would like to see other topics, please add them to the list.
>
> Many thanks,
> Niels
>
>
> Practical Glusto example
>  - show how to install Glusto and dependencies
>  - write a simple new test-case from scratch (copy/paste example?)
>  - run the new test-case (in the development environment?)
>
> Debugging (large) production deployments
>  - tools that can be used for debugging on non-development systems
>  - filtering logs and other data to identify problems
>  - coming up with the root cause of the problem
>  - reporting a useful bug so that developers can fix it
>
> Making troubleshooting easier
>  - statedumps, how code tracks allocations, how to read the dumps
>  - io-stats, meta and other xlators
>  - useful, actionable log messages

I was planning to give a talk on the .meta directory and how it can be
used to debug a live file system. I could also talk about statedump,
io-stats, etc.

Regards
Rafi KC


>
> Long-Term-Maintenance, Short-Term-Maintanance, releases and backports
>  - explanation of the new release schedule
>  - when/how are releases made, branching, stability phase etc...
>  - the kind of backports that are acceptible and safe for minor updates
>
> Documentation update
>  - new Documentation Tooling based on ASCIIbinder
>  - how to migrate different documentation sites for many of the projects
>  - best location to report issues, submit fixes etc...
>
>
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Tiered Volumes and Backups

2016-08-07 Thread Mohammed Rafi K C


On 08/07/2016 08:41 AM, Lindsay Mathieson wrote:
> On 7/08/2016 12:34 PM, Lindsay Mathieson wrote:
>> One of the things done every week here is a full image backup of all
>> the VM's (take about 20 hours).

Do you use the gluster snapshot feature here? If so, it won't result in
promotions.

>> I was wondering what the implications are for the tier as every shard
>> will be read sequentially. Will I be seeing mass promotions and
>> demotions into the Tier? this would not be desirable. 
>
> I'm guessing cluster.tier-promote-frequency and
> cluster.read-freq-threshold would be the relevant settings to control
> this?

Yes. You can control the behavior using those options. We are also
planning to add a pause/stop option for the migration daemon so that you
can temporarily suspend migration.
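
For reference, these are per-volume options, set roughly as below. The
volume name and values are placeholders; please check the defaults on
your release:

  gluster volume set <volname> cluster.tier-promote-frequency <seconds>
  gluster volume set <volname> cluster.read-freq-threshold <count>
  gluster volume set <volname> cluster.write-freq-threshold <count>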

Regards
Rafi KC
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] managing slow drives in cluster

2016-08-03 Thread Mohammed Rafi K C


On 08/02/2016 08:43 PM, Vijay Bellur wrote:
> On 08/01/2016 01:29 AM, Mohammed Rafi K C wrote:
>>
>>
>> On 07/30/2016 10:53 PM, Jay Berkenbilt wrote:
>>> We're using glusterfs in Amazon EC2 and observing certain behavior
>>> involving EBS volumes. The basic situation is that, in some cases,
>>> clients can write data to the file system at a rate such that the
>>> gluster daemon on one or more of the nodes may block in disk wait for
>>> longer than 42 seconds, causing gluster to decide that the brick is
>>> down. In fact, it's not down, it's just slow. I believe it is possible
>>> by looking at certain system data to tell the difference from the
>>> system
>>> with the drive on it between down and working through its queue.
>>>
>>> We are attempting a two-pronged approach to solving this problem:
>>>
>>> 1. We would like to figure out how to tune the system, including either
>>> or both of adjusting kernel parameters or glusterd, to try to avoid
>>> getting the system into the state of having so much data to flush
>>> out to
>>> disk that it blocks in disk wait for such a long time.
>>> 2. We would like to see if we can make gluster more intelligent about
>>> responding to the pings so that the client side is still getting a
>>> response when the remote side is just behind and not down. Though I do
>>> understand that, in some high performance environments, one may want to
>>> consider a disk that's not keeping up to have failed, so this may have
>>> to be a tunable parameter.
>>>
>>> We have a small team that has been working on this problem for a couple
>>> of weeks. I just joined the team on Friday. I am new to gluster, but I
>>> am not at all new to low-level system programming, Linux
>>> administration,
>>> etc. I'm very much open to the possibility of digging into the gluster
>>> code and supplying patches
>>
>> Welcome to Gluster. It is great to see a lot of ideas within days :).
>>
>>
>>>  if we can find a way to adjust the behavior
>>> of gluster to make it behave better under these conditions.
>>>
>>> So, here are my questions:
>>>
>>> * Does anyone have experience with this type of issue who can offer any
>>> suggestions on kernel parameters or gluster configurations we could
>>> play
>>> with? We have several kernel parameters in mind and are starting to
>>> measure their affect.
>>> * Does anyone have any background on how we might be able to tell that
>>> the system is getting itself into this state? Again, we have some ideas
>>> on this already, mostly by using sysstat to monitor stuff, though
>>> ultimately if we find a reliable way to do it, we'd probably code it
>>> directly by looking at the relevant stuff in /proc from our own code. I
>>> don't have the details with me right now.
>>> * Can someone provide any pointers to where in the gluster code the
>>> ping
>>> logic is handled and/or how one might go about making it a little
>>> smarter?
>>
>> One of the user had similar problems where ping packets are queued on
>> waiting list because of a huge traffic. I have a patch which try to
>> solve the issue http://review.gluster.org/#/c/11935/ . Which is under
>> review and might need some more work, but I guess it is worth trying
>>
>
>
> Would it be possible to rebase this patch against the latest master? I
> am interested to see if we still see the pre-commit regression failures.

I will do that shortly.

Rafi KC

>
> Thanks!
> Vijay
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] managing slow drives in cluster

2016-08-03 Thread Mohammed Rafi K C


On 08/02/2016 08:22 PM, Jay Berkenbilt wrote:
> So we managed to work around the behavior by setting
>
> sysctl -w vm.dirty_bytes=5000
> sysctl -w vm.dirty_background_bytes=2500
>
> In our environment with our specific load testing, this prevents the
> disk flush from taking longer than gluster's timeout and avoids the
> whole problem with gluster timing out. We haven't finished our
> performance testing, but initial results suggest that it is no worse
> than the performance we had with our previous home-grown solution. In
> our previous home grown solution, we had a fuse layer that was calling
> fsync() on every megabyte written as soon as there were 10 megabytes
> worth of requests in the queue, which was effectively emulating in user
> code what these kernel parameters do but with even smaller numbers.

Great to know you managed to work around it.
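
As a side note, if you want that tuning to survive reboots, something
like this should do it. The file name is arbitrary and the values are
placeholders for the ones you settled on:

  # /etc/sysctl.d/90-dirty-bytes.conf
  vm.dirty_bytes = <value>
  vm.dirty_background_bytes = <value>

  # then reload all sysctl configuration files
  sysctl --system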

>
> Thanks for the note below about the potential patch. I applied this to
> 3.8.1 with the fix based on the code review comment and have that in my
> back pocket in case we need it, but we're going to try with just the
> kernel tuning for now. These parameters are decent for us anyway
> because, for other reasons based on the nature of our application and
> certain customer requirements, we want to keep the amount of dirty data
> really low.
>
> It looks like the code review has been idle for some time. Any reason?
> It looks like a simple and relatively obvious change (not to take
> anything away from it at all, and I really appreciate the pointer). Is
> there anything potentially unsafe about it? Like are there some cases
> where not always appending to the queue could cause damage to data if
> the test wasn't exactly right or wasn't doing exactly what it was
> expecting? If I were to run our load test against the patch, it wouldn't
> catch anything like that because we don't actually look at the content
> of the data written in our load test.

I believe it will not result in any kind of data loss, but we can have
more discussion and review in this area. I got reviews from Shyam, and
will continue the discussion in Gerrit as patch comments.

We would be very happy to have your suggestions for the patch or for any
other solution.

> In any case, if the kernel tuning
> doesn't completely solve the problem for us, I may pull this out and do
> some more rigorous testing against it. If I do, I can comment on the
> code change.

Great. I will rebase the patch so that, if needed, you can cleanly apply
it to the master code.

>
> For now, unless I post otherwise, we're considering our specific problem
> to be resolved, though I believe there remains a potential weakness in
> gluster's ability to report that it is still up in the case of a slower
> disk write speed on one of the nodes.
>
> --Jay
>
> On 08/01/2016 01:29 AM, Mohammed Rafi K C wrote:
>> On 07/30/2016 10:53 PM, Jay Berkenbilt wrote:
>>> We're using glusterfs in Amazon EC2 and observing certain behavior
>>> involving EBS volumes. The basic situation is that, in some cases,
>>> clients can write data to the file system at a rate such that the
>>> gluster daemon on one or more of the nodes may block in disk wait for
>>> longer than 42 seconds, causing gluster to decide that the brick is
>>> down. In fact, it's not down, it's just slow. I believe it is possible
>>> by looking at certain system data to tell the difference from the system
>>> with the drive on it between down and working through its queue.
>>>
>>> We are attempting a two-pronged approach to solving this problem:
>>>
>>> 1. We would like to figure out how to tune the system, including either
>>> or both of adjusting kernel parameters or glusterd, to try to avoid
>>> getting the system into the state of having so much data to flush out to
>>> disk that it blocks in disk wait for such a long time.
>>> 2. We would like to see if we can make gluster more intelligent about
>>> responding to the pings so that the client side is still getting a
>>> response when the remote side is just behind and not down. Though I do
>>> understand that, in some high performance environments, one may want to
>>> consider a disk that's not keeping up to have failed, so this may have
>>> to be a tunable parameter.
>>>
>>> We have a small team that has been working on this problem for a couple
>>> of weeks. I just joined the team on Friday. I am new to gluster, but I
>>> am not at all new to low-level system programming, Linux administration,
>>> etc. I'm very much open to the possibility of digging into the gluster
>>> code and supplying patches
>> Welcome 

Re: [Gluster-users] managing slow drives in cluster

2016-07-31 Thread Mohammed Rafi K C


On 07/30/2016 10:53 PM, Jay Berkenbilt wrote:
> We're using glusterfs in Amazon EC2 and observing certain behavior
> involving EBS volumes. The basic situation is that, in some cases,
> clients can write data to the file system at a rate such that the
> gluster daemon on one or more of the nodes may block in disk wait for
> longer than 42 seconds, causing gluster to decide that the brick is
> down. In fact, it's not down, it's just slow. I believe it is possible
> by looking at certain system data to tell the difference from the system
> with the drive on it between down and working through its queue.
>
> We are attempting a two-pronged approach to solving this problem:
>
> 1. We would like to figure out how to tune the system, including either
> or both of adjusting kernel parameters or glusterd, to try to avoid
> getting the system into the state of having so much data to flush out to
> disk that it blocks in disk wait for such a long time.
> 2. We would like to see if we can make gluster more intelligent about
> responding to the pings so that the client side is still getting a
> response when the remote side is just behind and not down. Though I do
> understand that, in some high performance environments, one may want to
> consider a disk that's not keeping up to have failed, so this may have
> to be a tunable parameter.
>
> We have a small team that has been working on this problem for a couple
> of weeks. I just joined the team on Friday. I am new to gluster, but I
> am not at all new to low-level system programming, Linux administration,
> etc. I'm very much open to the possibility of digging into the gluster
> code and supplying patches

Welcome to Gluster. It is great to see a lot of ideas within days :).


>  if we can find a way to adjust the behavior
> of gluster to make it behave better under these conditions.
>
> So, here are my questions:
>
> * Does anyone have experience with this type of issue who can offer any
> suggestions on kernel parameters or gluster configurations we could play
> with? We have several kernel parameters in mind and are starting to
> measure their affect.
> * Does anyone have any background on how we might be able to tell that
> the system is getting itself into this state? Again, we have some ideas
> on this already, mostly by using sysstat to monitor stuff, though
> ultimately if we find a reliable way to do it, we'd probably code it
> directly by looking at the relevant stuff in /proc from our own code. I
> don't have the details with me right now.
> * Can someone provide any pointers to where in the gluster code the ping
> logic is handled and/or how one might go about making it a little smarter?

One of our users had a similar problem where ping packets were queued on
the waiting list because of heavy traffic. I have a patch that tries to
solve the issue: http://review.gluster.org/#/c/11935/ . It is still under
review and might need some more work, but I think it is worth trying.

If you are interested, you can try it out and let me know whether it
solves the issue or not. What the patch does is treat PING packets as the
highest-priority packets and add them to the beginning of the ioq list
(the list of packets to be sent over the wire).
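
As an interim mitigation while you tune the disks, you can also raise the
ping timeout itself; the 42 seconds you are hitting is simply the default
of the network.ping-timeout volume option. A rough example (volume name
is a placeholder, and the trade-off is that genuinely failed bricks will
also take longer to be declared down):

    # raise the client<->brick ping timeout from the default 42s to 120s
    gluster volume set <volname> network.ping-timeout 120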


I might have missed some important points from the long mail ;). I'm
sorry, I was too lazy to read it completely :).

Regards
Rafi KC

> * Does my description of what we're dealing with suggest that we're just
> missing something obvious? I jokingly asked the team whether they had
> remembered to run glusterd with the --make-it-fast flag, but sometimes
> there are solutions almost like that that we just overlook.
>
> For what it's worth, we're running gluster 3.8 on CentOS 7 in EC2. We
> see the problem the most strongly when using general purpose (gp2) EBS
> volumes on higher performance but non-EBS optimized volumes where it's
> pretty easy to overload the disk with traffic over the network. We can
> mostly mitigate this by using provisioned I/O volumes or EBS optimized
> volumes on slower instances where the disk outperforms what we can throw
> at it over the network. Yet at our scale, switching to EBS optimization
> would cost hundreds of thousands of dollars a year, and running slower
> instances has obvious drawbacks. In the absence of a "real" solution, we
> will probably end up trying to modify our software to throttle writes to
> disk, but having to modify our software to keep from flooding the file
> system seems like a really sad thing to have to do.
>
> Thanks in advance for any pointers!
>
> --Jay
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Clarification about remove-brick

2016-07-31 Thread Mohammed Rafi K C


On 07/29/2016 11:08 PM, Richard Klein (RSI) wrote:
>
> Thank you very much for the information.  I have read the
> documentation link you have listed below and just wanted confirmation
> about the remove-brick process.  I did not see any documentation about
> the ability to use the remove-brick stop command so that is good to know.
>

I'm not sure how much testing has been done in that area, but we do have
the remove-brick stop command, and theoretically it should work. If you
run a rebalance after remove-brick stop, the cluster will be load
balanced again even if some files were already migrated as part of the
remove-brick operation.
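
For reference, the rough sequence would be something like this (volume
name and brick path are placeholders):

    # abort an in-progress remove-brick and keep the brick in the volume
    gluster volume remove-brick <volname> <host>:/bricks/b1 stop
    # then rebalance so data is spread evenly across all bricks again
    gluster volume rebalance <volname> start
    gluster volume rebalance <volname> status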


>  
>
> I have a couple of related follow up questions as well:
>
>  
>
> 1)  Let’s say a brick fails (hard drive goes bad).  I want to
> replace the bad drive with a new one and then get it back operational
> in the cluster.  In this case I don’t have spare brick available so I
> can’t do the replace-brick.  The only procedure I have found is a blog
> and bug report for 3.4 about this issue with a work around. Here is
> the link to the blog:
> https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/.  Here
> is the link to the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=991084.  My apologies if
> this has been addressed before but I have searched and can’t find a
> solution where you just replace the bad drive with a good one and
> recover.  Any instructions about this process would be appreciated.
>

If I understand you correctly, you want to replace a drive and recreate
the brick with the same name. If that is the case, patch
http://review.gluster.org/#/c/12250/ will solve the issue; the patch is
still under review. CCing the replace-brick experts to give you more
info regarding your replace queries.
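
Until that lands, if you can bring the new drive up under a different
brick path, the usual route is a replace-brick followed by a heal. A
rough sketch, with placeholder host and paths, so please test it on a
non-critical volume first:

    # swap the dead brick for a freshly formatted one at a new path
    gluster volume replace-brick <volname> server1:/bricks/old \
        server1:/bricks/new commit force
    # then let the replica repopulate the new brick
    gluster volume heal <volname> full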


> 2)  Does the rebalance process lock data files in Gluster?  We are
> using Gluster as primary storage in Cloudstack 4.7. If we shrink or
> expand the Gluster volume we do a rebalance of the layout and data. 
> However, after this process we have several VMs which have disk
> volumes in a read only state like there were disk problems.  Once we
> power cycle the VM all is well and no data loss occurred but there
> seems to be a correlation between the rebalance and the errors.  I am
> wondering if the rebalance process locks the data somehow and makes it
> unavailable to the VM.
>

The rebalance process does take locks as part of healing, but I'm not
sure whether that would cause VMs to go into a read-only state. The
locks are taken on directories for healing and on files during
migration, but only for a specific offset, so neither should keep an
entire file read-only for a long time.

Can you describe your problem in more detail? I will try to see if there
is anything else causing the issue.
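
When it happens again, the rebalance status output and the rebalance log
from around the same time would help (volume name is a placeholder; the
log is written on each node that migrates data):

    gluster volume rebalance <volname> status
    # per-node rebalance activity, useful to correlate with the VM errors
    less /var/log/glusterfs/<volname>-rebalance.log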


Regards
Rafi KC

>  
>
> Thanks again for the response.
>
>  
>
> Richard Klein
>
> RSI
>
>  
>
>  
>
> *From:*Mohammed Rafi K C [mailto:rkavu...@redhat.com]
> *Sent:* Friday, July 29, 2016 1:38 AM
> *To:* Lenovo Lastname; Richard Klein (RSI); gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Clarification about remove-brick
>
>  
>
> You can also see the documentation here
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#shrinking-volumes
>
>  
>
> Rafi KC
>
> On 07/29/2016 11:39 AM, Mohammed Rafi K C wrote:
>
>  
>
> I will summarize the procedure for removing a brick with description.
>
>  
>
> 1) Start a remove-brick operation using the gluster volume remove-brick
> command. This command will mark the mentioned brick as a
> decommissioned brick. Also, this will kick a process that will
> start migrating data from the decommissioned brick to the other
> bricks.
>
> 2) Once the migration is finished you can safely do a remove-brick
> commit.
>
> 3) Or if you wish to stop the process and reset the decommissioned
> brick, you can do remove-brick stop. This will not migrate the
> data back to the decommissioned brick. It will stay in the other
> bricks and the data will be still accessible, if you want to have
> proper load balancing after this, you can start rebalance process.
>
> 4) If you wish to do an instant remove brick you can use force
> option, which will not migrate data, hence your whole data in the
> removed brick will be lost from mount point.
>
> On 07/29/2016 01:25 AM, Lenovo Lastname wrote:
>
> I'm using 3.7.11, this command works with me,
>
>  
>
> !remove-brick
> [root@node2 ~]# gluster volume remove-brick v1 replica 2
> 192.168.3.73:/gfs/b1/v1 force
> Removing brick(s) can result in data loss. Do 

Re: [Gluster-users] Clarification about remove-brick

2016-07-29 Thread Mohammed Rafi K C
You can also see the documentation here
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#shrinking-volumes


Rafi KC

On 07/29/2016 11:39 AM, Mohammed Rafi K C wrote:
>
>
> I will summarize the procedure for removing a brick with description.
>
>
> 1) Start a remove-brick operation using the gluster volume remove-brick
> command. This command will mark the mentioned brick as a
> decommissioned brick. Also, this will kick a process that will start
> migrating data from the decommissioned brick to the other bricks.
>
> 2) Once the migration is finished you can safely do a remove-brick commit.
>
> 3) Or if you wish to stop the process and reset the decommissioned
> brick, you can do remove-brick stop. This will not migrate the data
> back to the decommissioned brick. It will stay in the other bricks and
> the data will be still accessible, if you want to have proper load
> balancing after this, you can start rebalance process.
>
> 4) If you wish to do an instant remove brick you can use force option,
> which will not migrate data, hence your whole data in the removed
> brick will be lost from mount point.
>
>
> On 07/29/2016 01:25 AM, Lenovo Lastname wrote:
>> I'm using 3.7.11, this command works with me,
>>
>> !remove-brick
>> [root@node2 ~]# gluster volume remove-brick v1 replica 2
>> 192.168.3.73:/gfs/b1/v1 force
>> Removing brick(s) can result in data loss. Do you want to Continue?
>> (y/n) y
>> volume remove-brick commit force: success
>>
>> Don't know about the commit thingy...
>>
>>
>> On Thursday, July 28, 2016 3:47 PM, Richard Klein (RSI)
>> <rkl...@rsitex.com> wrote:
>>
>>
>> We are using Gluster 3.7.6 in a replica 2 distributed-replicate
>> configuration.  I am wondering when we do a remove-brick with just
>> one brick pair will the data be moved off the bricks once the status
>> show complete and then you do the commit?Also, if you start a
>> remove-brick process can you stop it?  Is there an abort or stop
>> command or do you just don’t do the commit?
>>  
>> Any help would be appreciated.
>>  
>> Richard Klein
>> RSI
>>  
>>  
>>  
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Clarification about remove-brick

2016-07-29 Thread Mohammed Rafi K C

I will summarize the procedure for removing a brick with description.


1) Start a remove-brick operation using the gluster volume remove-brick
command. This command will mark the mentioned brick as a decommissioned
brick. It also kicks off a process that starts migrating data from the
decommissioned brick to the other bricks.

2) Once the migration is finished you can safely do a remove-brick commit.

3) Or, if you wish to stop the process and reset the decommissioned
brick, you can do remove-brick stop. This will not migrate the data back
to the decommissioned brick; it will stay on the other bricks and remain
accessible. If you want proper load balancing after this, you can start
a rebalance.

4) If you wish to do an instant remove-brick you can use the force
option, which will not migrate data, so all the data on the removed
brick will be lost from the mount point (example commands below).
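
The whole sequence then looks roughly like this (volume name and brick
are placeholders; on a replicated volume you also pass the reduced
replica count, as in the command quoted below):

    # 1) start draining the brick
    gluster volume remove-brick <volname> <host>:/bricks/b1 start
    # watch until the status shows "completed"
    gluster volume remove-brick <volname> <host>:/bricks/b1 status
    # 2) then drop the brick from the volume
    gluster volume remove-brick <volname> <host>:/bricks/b1 commit
    # 3) or abort and keep the brick
    gluster volume remove-brick <volname> <host>:/bricks/b1 stop
    # 4) or remove it immediately without migrating any data
    gluster volume remove-brick <volname> <host>:/bricks/b1 force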


On 07/29/2016 01:25 AM, Lenovo Lastname wrote:
> I'm using 3.7.11, this command works with me,
>
> !remove-brick
> [root@node2 ~]# gluster volume remove-brick v1 replica 2
> 192.168.3.73:/gfs/b1/v1 force
> Removing brick(s) can result in data loss. Do you want to Continue?
> (y/n) y
> volume remove-brick commit force: success
>
> Don't know about the commit thingy...
>
>
> On Thursday, July 28, 2016 3:47 PM, Richard Klein (RSI)
>  wrote:
>
>
> We are using Gluster 3.7.6 in a replica 2 distributed-replicate
> configuration.  I am wondering when we do a remove-brick with just one
> brick pair will the data be moved off the bricks once the status show
> complete and then you do the commit?Also, if you start a
> remove-brick process can you stop it?  Is there an abort or stop
> command or do you just don’t do the commit?
>  
> Any help would be appreciated.
>  
> Richard Klein
> RSI
>  
>  
>  
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Need a way to display and flush gluster cache ?

2016-07-28 Thread Mohammed Rafi K C


On 07/28/2016 07:56 PM, Niels de Vos wrote:
> On Thu, Jul 28, 2016 at 05:58:15PM +0530, Mohammed Rafi K C wrote:
>>
>> On 07/27/2016 04:33 PM, Raghavendra G wrote:
>>>
>>> On Wed, Jul 27, 2016 at 10:29 AM, Mohammed Rafi K C
>>> <rkavu...@redhat.com <mailto:rkavu...@redhat.com>> wrote:
>>>
>>> Thanks for your feedback.
>>>
>>> In fact, the meta xlator is loaded only on FUSE mounts. Is there any
>>> particular reason not to use the meta-autoload xlator for the NFS
>>> server and libgfapi?
>>>
>>>
>>> I think its because of lack of resources. I am not aware of any
>>> technical reason for not using on NFSv3 server and gfapi.
>> Cool. I will try to see how we can implement the meta-autoload feature
>> for nfs-server and libgfapi. Once we have the feature in place, I will
>> implement the cache memory display/flush feature using the meta xlator.
> In case you plan to have this ready in a month (before the end of
> August), you should propose it as a 3.9 feature. Click the "Edit this
> page on GitHub" link on the bottom of
> https://www.gluster.org/community/roadmap/3.9/ :)
I will do an assessment and see if I can spend some time on this for the
3.9 release. If so, I will add it to the 3.9 feature page.

Regards
Rafi KC

>
> Thanks,
> Niels
>
>
>> Thanks for your valuable feedback.
>> Rafi KC
>>
>>>  
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 07/26/2016 04:05 PM, Niels de Vos wrote:
>>>> On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
>>>>>     On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai <p...@redhat.com> 
>>>>> <mailto:p...@redhat.com> wrote:
>>>>>> +1 to option (2) which similar to echoing into 
>>>>>> /proc/sys/vm/drop_caches
>>>>>>
>>>>>>  -Prashanth Pai
>>>>>>
>>>>>> - Original Message -
>>>>>>> From: "Mohammed Rafi K C" <rkavu...@redhat.com> 
>>>>>>> <mailto:rkavu...@redhat.com>
>>>>>>> To: "gluster-users" <gluster-users@gluster.org> 
>>>>>>> <mailto:gluster-users@gluster.org>, "Gluster Devel" 
>>>>>>> <gluster-de...@gluster.org> <mailto:gluster-de...@gluster.org>
>>>>>>> Sent: Tuesday, 26 July, 2016 10:44:15 AM
>>>>>>> Subject: [Gluster-devel] Need a way to display and flush gluster 
>>>>>>> cache ?
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Gluster stack has it's own caching mechanism , mostly on client 
>>>>>>> side.
>>>>>>> But there is no concrete method to see how much memory are 
>>>>>>> consuming by
>>>>>>> gluster for caching and if needed there is no way to flush the 
>>>>>>> cache memory.
>>>>>>>
>>>>>>> So my first question is, Do we require to implement this two 
>>>>>>> features
>>>>>>> for gluster cache?
>>>>>>>
>>>>>>>
>>>>>>> If so I would like to discuss some of our thoughts towards it.
>>>>>>>
>>>>>>> (If you are not interested in implementation discussion, you can 
>>>>>>> skip
>>>>>>> this part :)
>>>>>>>
>>>>>>> 1) Implement a virtual xattr on root, and on doing setxattr, flush 
>>>>>>> all
>>>>>>> the cache, and for getxattr we can print the aggregated cache size.
>>>>>>>
>>>>>>> 2) Currently in gluster native client support .meta virtual 
>>>>>>> directory to
>>>>>>> get meta data information as analogues to proc. we can implement a
>>>>>>> virtual file inside the .meta directory to read  the cache size. 
>>>>>>> Also we
>>>>>>> can flush the cache using a special write into the file, (similar to
>>>>>>> echoing into proc file) . This approach may be difficult to 
>>>>>>> implement in
>>>>>>> other clients.
>>>>> +1 for making use of the meta-xlator. We should be mak

Re: [Gluster-users] [Gluster-devel] Need a way to display and flush gluster cache ?

2016-07-28 Thread Mohammed Rafi K C


On 07/27/2016 04:33 PM, Raghavendra G wrote:
>
>
> On Wed, Jul 27, 2016 at 10:29 AM, Mohammed Rafi K C
> <rkavu...@redhat.com <mailto:rkavu...@redhat.com>> wrote:
>
> Thanks for your feedback.
>
> In fact, the meta xlator is loaded only on FUSE mounts. Is there any
> particular reason not to use the meta-autoload xlator for the NFS
> server and libgfapi?
>
>
> I think its because of lack of resources. I am not aware of any
> technical reason for not using on NFSv3 server and gfapi.

Cool. I will try to see how we can implement the meta-autoload feature
for nfs-server and libgfapi. Once we have the feature in place, I will
implement the cache memory display/flush feature using the meta xlator.
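
To make the idea a bit more concrete: on a FUSE mount the meta xlator
already exposes a virtual tree under .meta (entries such as graphs and
version), and the cache interface could hang off the same tree. The
cache paths below are invented purely for illustration; nothing like
them exists yet:

    # what the meta xlator exposes today on a FUSE mount
    ls /mnt/.meta/
    cat /mnt/.meta/version
    # hypothetical cache interface (names made up for this sketch)
    cat /mnt/.meta/cache/size        # would print aggregated cache usage
    echo 1 > /mnt/.meta/cache/flush  # would drop the client-side caches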

Thanks for your valuable feedback.
Rafi KC

>  
>
> Regards
>
> Rafi KC
>
> On 07/26/2016 04:05 PM, Niels de Vos wrote:
>> On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
>>> On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai <p...@redhat.com> 
>>> <mailto:p...@redhat.com> wrote:
>>>> +1 to option (2) which similar to echoing into /proc/sys/vm/drop_caches
>>>>
>>>>  -Prashanth Pai
>>>>
>>>> - Original Message -
>>>>> From: "Mohammed Rafi K C" <rkavu...@redhat.com> 
>>>>> <mailto:rkavu...@redhat.com>
>>>>> To: "gluster-users" <gluster-users@gluster.org> 
>>>>> <mailto:gluster-users@gluster.org>, "Gluster Devel" 
>>>>> <gluster-de...@gluster.org> <mailto:gluster-de...@gluster.org>
>>>>> Sent: Tuesday, 26 July, 2016 10:44:15 AM
>>>>> Subject: [Gluster-devel] Need a way to display and flush gluster 
>>>>> cache ?
>>>>>
>>>>> Hi,
>>>>>
>>>>> Gluster stack has it's own caching mechanism , mostly on client side.
>>>>> But there is no concrete method to see how much memory are consuming 
>>>>> by
>>>>> gluster for caching and if needed there is no way to flush the cache 
>>>>> memory.
>>>>>
>>>>> So my first question is, Do we require to implement this two features
>>>>> for gluster cache?
>>>>>
>>>>>
>>>>> If so I would like to discuss some of our thoughts towards it.
>>>>>
>>>>> (If you are not interested in implementation discussion, you can skip
>>>>> this part :)
>>>>>
>>>>> 1) Implement a virtual xattr on root, and on doing setxattr, flush all
>>>>> the cache, and for getxattr we can print the aggregated cache size.
>>>>>
>>>>> 2) Currently in gluster native client support .meta virtual directory 
>>>>> to
>>>>> get meta data information as analogues to proc. we can implement a
>>>>> virtual file inside the .meta directory to read  the cache size. Also 
>>>>> we
>>>>> can flush the cache using a special write into the file, (similar to
>>>>> echoing into proc file) . This approach may be difficult to implement 
>>>>> in
>>>>> other clients.
>>> +1 for making use of the meta-xlator. We should be making more use of 
>>> it.
>> Indeed, this would be nice. Maybe this can also expose the memory
>> allocations like /proc/slabinfo.
>>
>> The io-stats xlator can dump some statistics to
>> /var/log/glusterfs/samples/ and /var/lib/glusterd/stats/ . That seems to
>> be acceptible too, and allows to get statistics from server-side
>> processes without involving any clients.
>>
>> HTH,
>> Niels
>>
>>
>>>>> 3) A cli command to display and flush the data with ip and port as an
>>>>> argument. GlusterD need to send the op to client from the connected
>>>>> client list. But this approach would be difficult to implement for
>>>>> libgfapi based clients. For me, it doesn't seems to be a good option.
>>>>>
>>>>> Your suggestions and comments are most welcome.
>>>>>
>>>>> Thanks to Talur and Poornima for their suggestions.
>>>>>
>>>>> Regards
>>>>>
>>>>> Rafi KC
>>>>>
>>>>> 

Re: [Gluster-users] [Gluster-devel] Need a way to display and flush gluster cache ?

2016-07-26 Thread Mohammed Rafi K C
Thanks for your feedback.

In fact, the meta xlator is loaded only on FUSE mounts. Is there any
particular reason not to use the meta-autoload xlator for the NFS server
and libgfapi?

Regards

Rafi KC

On 07/26/2016 04:05 PM, Niels de Vos wrote:
> On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
>> On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai <p...@redhat.com> wrote:
>>> +1 to option (2) which similar to echoing into /proc/sys/vm/drop_caches
>>>
>>>  -Prashanth Pai
>>>
>>> - Original Message -
>>>> From: "Mohammed Rafi K C" <rkavu...@redhat.com>
>>>> To: "gluster-users" <gluster-users@gluster.org>, "Gluster Devel" 
>>>> <gluster-de...@gluster.org>
>>>> Sent: Tuesday, 26 July, 2016 10:44:15 AM
>>>> Subject: [Gluster-devel] Need a way to display and flush gluster cache ?
>>>>
>>>> Hi,
>>>>
>>>> Gluster stack has it's own caching mechanism , mostly on client side.
>>>> But there is no concrete method to see how much memory are consuming by
>>>> gluster for caching and if needed there is no way to flush the cache 
>>>> memory.
>>>>
>>>> So my first question is, Do we require to implement this two features
>>>> for gluster cache?
>>>>
>>>>
>>>> If so I would like to discuss some of our thoughts towards it.
>>>>
>>>> (If you are not interested in implementation discussion, you can skip
>>>> this part :)
>>>>
>>>> 1) Implement a virtual xattr on root, and on doing setxattr, flush all
>>>> the cache, and for getxattr we can print the aggregated cache size.
>>>>
>>>> 2) Currently in gluster native client support .meta virtual directory to
>>>> get meta data information as analogues to proc. we can implement a
>>>> virtual file inside the .meta directory to read  the cache size. Also we
>>>> can flush the cache using a special write into the file, (similar to
>>>> echoing into proc file) . This approach may be difficult to implement in
>>>> other clients.
>> +1 for making use of the meta-xlator. We should be making more use of it.
> Indeed, this would be nice. Maybe this can also expose the memory
> allocations like /proc/slabinfo.
>
> The io-stats xlator can dump some statistics to
> /var/log/glusterfs/samples/ and /var/lib/glusterd/stats/ . That seems to
> be acceptible too, and allows to get statistics from server-side
> processes without involving any clients.
>
> HTH,
> Niels
>
>
>>>> 3) A cli command to display and flush the data with ip and port as an
>>>> argument. GlusterD need to send the op to client from the connected
>>>> client list. But this approach would be difficult to implement for
>>>> libgfapi based clients. For me, it doesn't seems to be a good option.
>>>>
>>>> Your suggestions and comments are most welcome.
>>>>
>>>> Thanks to Talur and Poornima for their suggestions.
>>>>
>>>> Regards
>>>>
>>>> Rafi KC
>>>>
>>>> ___
>>>> Gluster-devel mailing list
>>>> gluster-de...@gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>> ___
>>> Gluster-devel mailing list
>>> gluster-de...@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> ___
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Is there a way to manage data location manually?

2016-07-05 Thread Mohammed Rafi K C


On 07/01/2016 07:11 AM, Joe Julian wrote:
> Isn't that what tiering is for?

Yes, I believe tiering would be a great use case for such scenarios.
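
Roughly, you would attach the faster bricks as a hot tier on top of the
existing distributed volume and let the tier daemon promote recently
used files and demote stale ones. A sketch with placeholder names (add
"replica N" before the bricks only if you want a replicated hot tier):

    # attach the fast storage as a hot tier
    gluster volume attach-tier <volname> <fast1>:/ssd/brick1 <fast2>:/ssd/brick2
    # later, to take it out again (data is migrated back to the cold tier)
    gluster volume detach-tier <volname> start
    gluster volume detach-tier <volname> commit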

Regards
Rafi KC

>
> On June 30, 2016 4:54:42 PM PDT, Serg Gulko  wrote:
>
> Hello! 
>
> We are running purely distributed(no replication) gluster storage. 
> Is there a way to "bind" files to certain brick? Reason why I need
> it is very simple - I prefer to keep most recent data on more
> faster storage pods and offload stale files into slower pods(read
> - less expensive). 
>
> I tried to copy files behind gluster back into target bricks
> directory but expectantly failed. 
>
> Serg
>
> 
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add hot tier brick on tiered volume

2016-06-16 Thread Mohammed Rafi K C
Hi Vincent,

We are working on supporting add-brick and remove-brick on a tiered
volume. One patch in this direction is up in the master branch [1], and
it is still under review.

Until then, the workaround to scale a tiered volume is to detach the
tier, scale the cold tier, and then reattach the hot tier with the
additional bricks included.
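
In commands, the workaround looks roughly like this (volume and brick
names are placeholders; detaching first migrates the hot-tier data back
to the cold tier, and the exact sub-commands can vary a little between
3.7 minor releases):

    # 1) drain and detach the current hot tier
    gluster volume detach-tier <volname> start
    gluster volume detach-tier <volname> status
    gluster volume detach-tier <volname> commit
    # 2) scale the now untiered volume as usual
    gluster volume add-brick <volname> <newnode>:/bricks/cold3
    gluster volume rebalance <volname> start
    # 3) reattach the hot tier, including the extra SSD bricks
    gluster volume attach-tier <volname> <n1>:/ssd/b1 <n2>:/ssd/b2 <n3>:/ssd/b3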

Let me know if you have further queries

[1] : http://review.gluster.org/13365

Regards
Rafi KC

On 06/16/2016 01:51 PM, Vincent Miszczak wrote:
>
> Hello,
>
>
> Playing with Gluster 3.7(lastest), I would like to be able to add
> bricks to a hot tier.
>
>
> Looks like not possible for now :
>
> /volume attach-tier: failed: Volume vol is already a tier./
>
> /
> /
>
> I've seen similar request here :
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1229237
>
> It's 3.1 and 2015
>
>
> Anyone working on this must have feature  (at least to me) ?
>
> You know, data is growing, SSD are cheap and you want a bunch of them.
>
>
> Vincent
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Many logs (errors?) on client → memory problem

2016-06-10 Thread Mohammed Rafi K C


On 06/10/2016 02:41 PM, Yannick Perret wrote:
> I get no feedback on that but I think I found the problem:
> the glusterfs client grows on memory until no memory available and
> them it crashes.

If you can take a statedump of the client (kill -SIGUSR1 $client_pid) and
send it across, I can take a look to see where it consumes so much
memory. Since you said it is not reproducible on the up-to-date Debian
system, if it is not important to you, that is fine with me.
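
Roughly, on the old client (the mount point and PID are examples; the
dump normally lands under /var/run/gluster as
glusterdump.<pid>.dump.<timestamp>):

    # find the glusterfs client process backing the mount
    ps ax | grep '[g]lusterfs.*futur-home'
    # ask it to write a statedump
    kill -USR1 <client_pid>
    ls /var/run/gluster/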


>
> I performed the same operations on an other machine without being able
> to reproduce the problem.
> The machine with the problem is an old machine (debian, 3.2.50 kernel,
> 32bit), whereas the other machine is an up-to-date debian 64bit.
>
> To give some stats the glusterfs on the client starts with less than
> 810220 of resident size and finished with 3055336 (3Go!) when it
> crashes again. The volume was mounted only on this machine, used by
> only one process (a 'cp -Rp').
>
> Running the same from a recent machine gives far more stable memory
> usage (43364 of resident size and few and small increasing).
> Of course I'm using the same glusterfs version (compiled from sources
> on both machines).
>
> As I can't upgrade this old machine due to version compatibility with
> old softs − at least until we replace these old softs − I will so use
> a NFS mountpoint from the gluster servers.
>
> Whatever I still get on the recent machine very verbose logs for each
> directory creation:
> [2016-06-10 08:35:12.965438] I
> [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
> 0-HOME-LIRIS-dht: chunk size = 0x / 2064114 = 0x820
> [2016-06-10 08:35:12.965473] I
> [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
> 0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
> HOME-LIRIS-replicate-0
> [2016-06-10 08:35:12.966987] I [MSGID: 109036]
> [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
> 0-HOME-LIRIS-dht: Setting layout of /log_apache_error with
> [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
> 4294967295 ],

This is an INFO-level message describing the layout being set on a
directory. The gluster FUSE client prints it whenever it sets the layout
on a directory, and these messages can be safely ignored.
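
If the sheer volume of these messages is a nuisance, the option that
controls the mount log file is diagnostics.client-log-level (if I
remember right, the client-sys-log-level option you set only affects
what is forwarded to syslog). For example, on your volume:

    # reduce verbosity of the fuse mount log (default is INFO)
    gluster volume set HOME-LIRIS diagnostics.client-log-level WARNING
    # and to go back to the default later
    gluster volume reset HOME-LIRIS diagnostics.client-log-level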


>
> I switched clients to WARNING log level (gluster volume set HOME-LIRIS
> diagnostics.client-sys-log-level WARNING) which is fine for me.
> But maybe WARNING should be the default log level, at least for
> clients, no? In production getting 3 lines per created directory is
> useless, and anyone who wants to analyze a problem will switch to INFO
> or DEBUG.

I have seen many users panic over this message. I agree, we have to do
something about these log entries.

>
> Regards,
> --
> Y.
>
>
>
> Le 08/06/2016 17:35, Yannick Perret a écrit :
>> Hello,
>>
>> I have a replica 2 volume managed on 2 identical server, using 3.6.7
>> version of gluster. Here is the volume info:
>> Volume Name: HOME-LIRIS
>> Type: Replicate
>> Volume ID: 47b4b856-371b-4b8c-8baa-2b7c32d7bb23
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: sto1.mydomain:/glusterfs/home-liris/data
>> Brick2: sto2.mydomain:/glusterfs/home-liris/data
>>
>> It is mounted on a (single) client with mount -t glusterfs
>> sto1.mydomain:/HOME-LIRIS /futur-home/
>>
>> I started to copy a directory (~550Go, ~660 directories with many
>> files) into it. Copy was done using 'cp -Rp'.
>>
>> It seems to work fine but I get *many* log entries in the
>> corresponding mountpoint logs:
>> [2016-06-07 14:01:27.587300] I
>> [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
>> 0-HOME-LIRIS-dht: chunk size = 0x / 2064114 = 0x820
>> [2016-06-07 14:01:27.587338] I
>> [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
>> 0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
>> HOME-LIRIS-replicate-0
>> [2016-06-07 14:01:27.588436] I [MSGID: 109036]
>> [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
>> 0-HOME-LIRIS-dht: Setting layout of /olfamine with [Subvol_name:
>> HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
>>
>> This is repeated for many files (124088 exactly). Is it normal? If
>> yes I use default settings on the client so I find it a little bit
>> verbose. If no can someone tell me what is the problem here?
>>
>> Moreover at the end of the log file I have:
>> [2016-06-08 04:42:58.210617] A [MSGID: 0]
>> [mem-pool.c:110:__gf_calloc] : no memory available for size (14651)
>> [call stack follows]
>> [2016-06-08 04:42:58.219060] A [MSGID: 0]
>> [mem-pool.c:134:__gf_malloc] : no memory available for size (21026)
>> [call stack follows]
>> pending frames:
>> frame : type(1) op(CREATE)
>> frame : type(1) op(CREATE)
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.com/glusterfs.git
>> signal received: 11
>> time of crash:
>> 2016-06-08 04:42:58
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 

[Gluster-users] Weekly Community Meeting - 01/June/2016

2016-06-06 Thread Mohammed Rafi K C
The meeting minutes for this weeks meeting are available at

Minutes:
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-01/weekly_community_meeting_01june2016.2016-06-01-12.00.html
Minutes (text) :
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-01/weekly_community_meeting_01june2016.2016-06-01-12.00.txt
Log:
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-01/weekly_community_meeting_01june2016.2016-06-01-12.00.log.html

Meeting summary
---
* Rollcall  (rafi1, 12:00:58)

* Next weeks meeting host  (rafi1, 12:04:35)

* GlusterFS 4.0  (rafi1, 12:07:30)

* GlusterFS 3.8  (rafi1, 12:11:47)
  * HELP: needed to review the backport patches  (rafi1, 12:13:48)
  * LINK:

http://review.gluster.org/#/q/status:open+project:glusterfs+branch:release-3.8
(rafi1, 12:13:55)
  * requesting to feature owners to give their release notes here in
https://public.pad.fsfe.org/p/glusterfs-3.8-release-notes  (rafi1,
12:16:11)
  * LINK:

https://bugzilla.redhat.com/showdependencytree.cgi?id=glusterfs-3.8.0_resolved=1
(rafi1, 12:19:03)
  * LINK:

https://meetbot.fedoraproject.org/gluster-meeting/2016-05-18/weekly_community_meeting_18may2016.2016-05-18-12.06.log.html
(post-factum, 12:20:01)
  * LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1337130   (rafi1,
12:24:07)

* GlusterFS 3.7  (rafi1, 12:26:13)
  * tentative date for 3.7.12 is on June 9th  (rafi1, 12:30:11)

* GlusterFS 3.6  (rafi1, 12:30:25)
  * LINK:
http://www.gluster.org/pipermail/gluster-devel/2016-May/049677.html
(anoopcs, 12:30:38)
  * targeted release for next 3.6 version ie 3.6.10 will be around 26 of
this month  (rafi1, 12:37:42)

* GlusterFS 3.5  (rafi1, 12:38:04)

* NFS Ganesha  (rafi1, 12:39:32)

* Samba  (rafi1, 12:41:11)

* samba  (rafi1, 12:45:16)

* AI from last week  (rafi1, 12:48:48)

* kshlm/csim/nigelb to set up faux/pseudo user email for gerrit,
  bugzilla, github  (rafi1, 12:49:17)
  * ACTION: kshlm/csim to set up faux/pseudo user email for gerrit,
bugzilla, github  (rafi1, 12:52:27)

* aravinda/amye to check on some blog posts being distorted on
  blog.gluster.org, josferna's post in particular  (rafi1, 12:52:48)
  * ACTION: aravindavk/amye to check on some blog posts being distorted
on blog.gluster.org, josferna's post in particular  (rafi1,
12:54:47)

* hagarth to announce release strategy after getting inputs from amye
  (rafi1, 12:55:19)
  * LINK:
http://www.gluster.org/pipermail/gluster-devel/2016-May/049677.html
(rafi1, 12:55:44)

* kshlm to check with reported of 3.6 leaks on backport need  (rafi1,
  12:56:49)
  * ACTION: kshlm to check with reported of 3.6 leaks on backport need
(rafi1, 12:57:40)

* rastar to look at 3.6 builds failures on BSD  (rafi1, 12:57:51)
  * ACTION: rastar to look at 3.6 builds failures on BSD  (rafi1,
12:58:25)

* Open floor  (rafi1, 12:58:48)

Meeting ended at 13:01:04 UTC.




Action Items

* kshlm/csim to set up faux/pseudo user email for gerrit, bugzilla,
  github
* aravindavk/amye to check on some blog posts being distorted on
  blog.gluster.org, josferna's post in particular
* kshlm to check with reported of 3.6 leaks on backport need
* rastar to look at 3.6 builds failures on BSD




Action Items, by person
---
* **UNASSIGNED**
  * kshlm/csim to set up faux/pseudo user email for gerrit, bugzilla,
github
  * aravindavk/amye to check on some blog posts being distorted on
blog.gluster.org, josferna's post in particular
  * kshlm to check with reported of 3.6 leaks on backport need
  * rastar to look at 3.6 builds failures on BSD




People Present (lines said)
---
* rafi1 (96)
* jiffin (26)
* post-factum (18)
* kkeithley (6)
* anoopcs (4)
* olia (4)
* zodbot (3)
* nigelb (1)
* karthik___ (1)
* glusterbot (1)
* overclk (1)
* partner (1)
* rjoseph (1)





___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] [Gluster-devel] REMINDER: Weekly Gluster Community meeting starts in ~15mnts

2016-06-01 Thread Mohammed Rafi K C
Hi all,

The weekly Gluster community meeting is starting in 1 hour at 12:00 UTC.
The current agenda for the meeting is below. Add any further topics to
the agenda at https://public.pad.fsfe.org/p/gluster-community-meetings

Meeting details:
- location: #gluster-meeting on Freenode IRC
- date: every Wednesday
- time: 8:00 EDT, 12:00 UTC, 13:00 CET, 17:30 IST
(in your terminal, run: date -d "12:00 UTC")

Current Agenda:
 * Roll Call
 * AIs from last meeting
 * GlusterFS 3.7
 * GlusterFS 3.6
 * GlusterFS 3.5
 * GlusterFS 3.8
 * GlusterFS 3.9
 * GlusterFS 4.0
 * Open Floor

See you there,
Rafi



___
Gluster-devel mailing list
gluster-de...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] SSD tier experimenting

2016-05-05 Thread Mohammed Rafi K C


On 05/05/2016 05:11 PM, Sergei Hanus wrote:
> Dan, looks like you are right - after some time I see performance
> around equal for both aforementioned volume.
You mean Rafi ;)

>
> Do you have any idea, how long should it take to normalize? Or, maybe,
> some commands or hints to monitor the process?

It depends on the data size and your volume tree (the structure of files
and directories). In the latest 3.7 release, we set an xattr on the root
of the bricks.
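
If you want to keep an eye on it, the promotion/demotion counters and
the tier daemon log are the easiest things to watch (volume name is a
placeholder; the exact status syntax and log path may differ slightly
between 3.7 minor releases):

    # per-node promoted/demoted file counters
    gluster volume tier <volname> status
    # tier daemon activity on each server
    less /var/log/glusterfs/<volname>-tier.log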

>
> And, another question - when I modify some volume option (like, enable
> io-cache) - again, should I wait for some time or it take effect
> immediately? 
> Do I need to restart volume, or these settings are applied on-the-fly?

Yes, it is applied on the fly. You don't need to do anything else to
apply a normal volume set option.
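
For example, this takes effect on the live client graph without a
remount (volume name is a placeholder):

    gluster volume set <volname> performance.io-cache on
    # the change shows up immediately under "Options Reconfigured"
    gluster volume info <volname>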


>
>  Thank you.
>
> Sergei.
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] SSD tier experimenting

2016-05-05 Thread Mohammed Rafi K C


On 05/05/2016 03:59 PM, Sergei Hanus wrote:
> Hi, Dan, Mohammed.
> Sorry for the long pause, was busy with another project.
>
> I was able to test the situation again - and looks like you are right.
> When I create non-tiered volume, put load on it and then attach tier -
> performance gets worse.
> Speaking numbers:
>
> I use fio for testing, fio --direct=1 --rw=randrw --norandommap
> --randrepeat=0 --ioengine=sync --bs=4k --rwmixread=70 --iodepth=4
> --numjobs=8 --runtime=20 --group_reporting --size=2G --name=randwrite
> --refill_buffers
>
> I get around 140 iops with non-tiered disk, and around 70 iops - when
> adding tier.

As I mentioned in the previous mail, there are a couple of
metadata-related things that tiering does the first time after
attaching. Eventually you should get close to that ideal 900 iops.



>
>
> When I create tiered volume from the very beginning, the same load
> profile gives performance around 900 iops
>
> Also, I'm planning to experiment with io-cache parameter in the
> future, to see how it changes the behavior. If you are interested - I
> could share results.

Great, I would be very much interested to see this. Other community
users could benefit from it as well.

>
> Sergei.
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

