Re: [Gluster-users] multi petabyte gluster dispersed for archival?

2020-02-13 Thread Serkan Çoban
I was using EC configuration 16+4 with 40 servers, each server has
68x10TB JBOD disks.
Glusterfs was mounted on 1000 hadoop datanodes; we were using glusterfs
as a hadoop archive.
When a disk failed we did not lose write speed, but read speed slowed
down too much. I used glusterfs at the edge and it served its purpose.
Beware that metadata operations will be much slower as the EC size increases.

Now HDFS also has EC, so we replaced glusterfs with HDFS.
Our servers had 2x12-core CPUs and 256GB RAM each, with 2x10G bonded
interfaces. CPU is used heavily during reconstruction.

On Thu, Feb 13, 2020 at 8:44 PM Douglas Duckworth
 wrote:
>
> Replication would be better, yes, but HA isn't a hard requirement, whereas the
> most likely cause of losing a brick would be power.  In that case we could stop
> the entire file system and then bring the brick back up, should users complain
> about poor I/O performance.
>
> Could you share more about your configuration at that time?  What CPUs were 
> you running on bricks, number of spindles per brick, etc?
>
> --
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: d...@med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
>
> 
> From: Serkan Çoban 
> Sent: Thursday, February 13, 2020 12:38 PM
> To: Douglas Duckworth 
> Cc: gluster-users@gluster.org 
> Subject: [EXTERNAL] Re: [Gluster-users] multi petabyte gluster dispersed for 
> archival?
>
> Do not use EC with small files. You cannot tolerate losing a 300TB
> brick; reconstruction will take ages. When I was using glusterfs the
> reconstruction speed of EC was 10-15MB/sec. If you do not lose bricks
> you will be OK.
>
> On Thu, Feb 13, 2020 at 7:38 PM Douglas Duckworth
>  wrote:
> >
> > Hello
> >
> > I am thinking of building a Gluster file system for archival data.  
> > Initially it will start as 6 brick dispersed volume then expand to 
> > distributed dispersed as we increase capacity.
> >
> > Since metadata in Gluster isn't centralized it will eventually not perform 
> > well at scale.  So I am wondering if anyone can help identify that point?  
> > Ceph can scale to extremely high levels though the complexity required for 
> > management seems much greater than Gluster.
> >
> > The first six bricks would be a little over 2PB of raw space.  Each server 
> > will have 24 7200 RPM NL-SAS drives sans RAID.  I estimate we would max out 
> > at about 100 million files within these first six servers, though that can 
> > be reduced by having users tar their small files before writing to Gluster. 
> >   I/O patterns would be sequential upon initial copy with very infrequent 
> > reads thereafter.  Given the demands of erasure coding, especially if we 
> > lose a brick, the CPUs will be high thread count AMD Rome.  The back-end 
> > network would be EDR Infiniband, so I will mount via RDMA, while all bricks 
> > will be leaf local.
> >
> > Given these variables can anyone say whether Gluster would be able to 
> > operate at this level of metadata and continue to scale?  If so where could 
> > it break, 4PB, 12PB, with that being defined as I/O, with all bricks still 
> > online, breaking down dramatically?
> >
> > Thank you!
> > Doug
> >
> > --
> > Thanks,
> >
> > Douglas Duckworth, MSc, LFCS
> > HPC System Administrator
> > Scientific Computing Unit
> > Weill Cornell Medicine
> > E: d...@med.cornell.edu
> > O: 212-746-6305
> > F: 212-746-8690
> >
> > 
> >
> > Community Meeting Calendar:
> >
> > APAC Schedule -
> > Every 2nd and 4th Tuesday at 11:30 AM IST
> > Bridge: https://bluejeans.com/441850968
> >
> > NA/EMEA Schedule -
> > Every 1st and 3rd Tuesday at 01:00 PM EDT
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] multi petabyte gluster dispersed for archival?

2020-02-13 Thread Serkan Çoban
Do not use EC with small files. You cannot tolerate losing a 300TB
brick; reconstruction will take ages. When I was using glusterfs the
reconstruction speed of EC was 10-15MB/sec. If you do not lose bricks
you will be OK.

On Thu, Feb 13, 2020 at 7:38 PM Douglas Duckworth
 wrote:
>
> Hello
>
> I am thinking of building a Gluster file system for archival data.  Initially 
> it will start as 6 brick dispersed volume then expand to distributed 
> dispersed as we increase capacity.
>
> Since metadata in Gluster isn't centralized it will eventually not perform 
> well at scale.  So I am wondering if anyone can help identify that point?  
> Ceph can scale to extremely high levels though the complexity required for 
> management seems much greater than Gluster.
>
> The first six bricks would be a little over 2PB of raw space.  Each server 
> will have 24 7200 RPM NL-SAS drives sans RAID.  I estimate we would max out 
> at about 100 million files within these first six servers, though that can be 
> reduced by having users tar their small files before writing to Gluster.   
> I/O patterns would be sequential upon initial copy with very infrequent reads 
> thereafter.  Given the demands of erasure coding, especially if we lose a 
> brick, the CPUs will be high thread count AMD Rome.  The back-end network 
> would be EDR Infiniband, so I will mount via RDMA, while all bricks will be 
> leaf local.
>
> Given these variables can anyone say whether Gluster would be able to operate 
> at this level of metadata and continue to scale?  If so where could it break, 
> 4PB, 12PB, with that being defined as I/O, with all bricks still online, 
> breaking down dramatically?
>
> Thank you!
> Doug
>
> --
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: d...@med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
>
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Max length for filename

2019-01-28 Thread Serkan Çoban
Filename max is 255 bytes, path name max is 4096 bytes.

On Mon, Jan 28, 2019 at 11:33 AM mabi  wrote:
>
> Hello,
>
> I saw this warning today in my fuse mount client log file:
>
> [2019-01-28 06:01:25.091232] W [fuse-bridge.c:565:fuse_entry_cbk] 
> 0-glusterfs-fuse: 530594537: LOOKUP() 
> /data/somedir0/files/-somdir1/dir2/dir3/some super long 
> filename….mp3.TransferId1924513788.part => -1 (File name too long)
>
> and was actually wondering on GlusterFS what is the maximum length for a 
> filename?
>
> I am using GlusterFS 4.1.6.
>
> Regards,
> Mabi
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] usage of harddisks: each hdd a brick? raid?

2019-01-09 Thread Serkan Çoban
We are also using 10TB disks; a heal takes 7-8 days.
You can play with the "cluster.shd-max-threads" setting. Its default is 1, I
think. I am using it with 4.
Below you can find more info:
https://access.redhat.com/solutions/882233
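
For reference, a rough sketch of how that could look for the "shared"
volume from this thread (4 is just the value I use, not a general
recommendation):

  # check the current value (default should be 1)
  gluster volume get shared cluster.shd-max-threads
  # allow more parallel heals per self-heal daemon
  gluster volume set shared cluster.shd-max-threads 4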

On Thu, Jan 10, 2019 at 9:53 AM Hu Bert  wrote:
>
> Hi Mike,
>
> > We have similar setup, and I do not test restoring...
> > How many volumes do you have - one volume on one (*3) disk 10 TB in size
> >   - then 4 volumes?
>
> Testing could be quite easy: reset-brick start, then delete
> partition/fs/etc., reset-brick commit force - and then watch.
>
> We only have 1 big volume over all bricks. Details:
>
> Volume Name: shared
> Type: Distributed-Replicate
> Number of Bricks: 4 x 3 = 12
> Brick1: gluster11:/gluster/bricksda1/shared
> Brick2: gluster12:/gluster/bricksda1/shared
> Brick3: gluster13:/gluster/bricksda1/shared
> Brick4: gluster11:/gluster/bricksdb1/shared
> Brick5: gluster12:/gluster/bricksdb1/shared
> Brick6: gluster13:/gluster/bricksdb1/shared
> Brick7: gluster11:/gluster/bricksdc1/shared
> Brick8: gluster12:/gluster/bricksdc1/shared
> Brick9: gluster13:/gluster/bricksdc1/shared
> Brick10: gluster11:/gluster/bricksdd1/shared
> Brick11: gluster12:/gluster/bricksdd1_new/shared
> Brick12: gluster13:/gluster/bricksdd1_new/shared
>
> Didn't think about creating more volumes (in order to split data),
> e.g. 4 volumes with 3*10TB each, or 2 volumes with 6*10TB each.
>
> Just curious: after splitting into 2 or more volumes - would that make
> the volume with the healthy/non-restoring disks better accessible? And
> only the volume with the once faulty and now restoring disk would be
> in a "bad mood"?
>
> > > Any opinions on that? Maybe it would be better to use more servers and
> > > smaller disks, but this isn't possible at the moment.
> > Also interested. We can swap SSDs to HDDs for RAID10, but is it worthless?
>
> Yeah, would be interested in how the glusterfs professionals deal
> with faulty disks, especially when these are as big as our ones.
>
>
> Thx
> Hubert
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Can glusterd be restarted running on all nodes at once while clients are mounted?

2018-11-25 Thread Serkan Çoban
2500-3000 disks per cluster is the maximum usable limit; after that almost
nothing works.
We are using a 2700-disk cluster for cold storage with EC.
Be careful with heal operations: I see roughly 1 week of heal time per 8TB...
On Sun, Nov 25, 2018 at 6:16 PM Andreas Davour  wrote:
>
> On Sun, 25 Nov 2018, Jeevan Patnaik wrote:
>
> > Hi Andreas,
> >
> > Before rebooting, I have tried some performance tuning in order to prevent
> > timeout errors. As we have  sufficient  RAM and cpu power,  I have
> > increased transport.listen-backlog in Kernel and syn_backlog and
> > max-connections in Kernel. So, I expected that it won't cause a problem.
> > Also the NFS clients are mounted but not being used. And all the nodes are
> > in same network.
> >
> > My assumption was that some slowness in the beginning can be seen, which
> > will be resolved automatically.
> >
> > Is it still a bad idea to have 72 nodes and start them all at once?
>
> The reason for my question is that we have a cluster of 16 nodes, and
> see excessive metadata ops slowdowns, and it seems to be at least
> partially because of the size of the cluster.
>
> /andreas
>
> --
> "economics is a pseudoscience; the astrology of our time"
> Kim Stanley Robinson
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] sharding in glusterfs

2018-09-17 Thread Serkan Çoban
Did you try a disperse volume? I think it may work for your workload.
We are using disperse volumes for archive workloads with 2GB files and
I did not encounter any problems.
On Mon, Sep 17, 2018 at 1:43 AM Ashayam Gupta
 wrote:
>
> Hi All,
>
> We are currently using glusterfs for storing large files with write-once and 
> multiple concurrent reads, and were interested in understanding one of the 
> features of glusterfs called sharding for our use case.
>
> So far from the talk given by the developer 
> [https://www.youtube.com/watch?v=aAlLy9k65Gw] and the git issue 
> [https://github.com/gluster/glusterfs/issues/290] , we know that it was 
> developed for large VM images as use case and the second link does talk about 
> a more general purpose usage , but we are not clear if there are some issues 
> if used for non-VM image large files [which is the use case for us].
>
> Therefore it would be helpful if we can have some pointers or more 
> information about the more general use-case scenario for sharding and any 
> shortcomings if any , in case we use it for our scenario which is non-VM 
> large files with write-once and multiple concurrent reads.Also it would be 
> very helpful if you can suggest the best approach/settings for our use case 
> scenario.
>
> Thanks
> Ashayam Gupta
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Previously replaced brick not coming up after reboot

2018-08-16 Thread Serkan Çoban
What is your gluster version? There was a bug in 3.10 where, when you reboot
a node, some bricks may not come online; it was fixed in later versions.
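
If it is that problem, a possible workaround (using the volume name
"shared" from your earlier mails) is to check which bricks are offline
and force-start the volume, which only starts the missing brick
processes:

  # bricks showing "N" in the Online column are down
  gluster volume status shared
  # start the missing brick process; already-running bricks are untouched
  gluster volume start shared force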

On 8/16/18, Hu Bert  wrote:
> Hi there,
>
> Twice I had to replace a brick on 2 different servers; the replace went
> fine, and the heal took very long but finally finished. From time to time you
> have to reboot the server (kernel upgrades), and I've noticed that the
> replaced brick doesn't come up after the reboot. Status after reboot:
>
> gluster volume status
> Status of volume: shared
> Gluster process TCP Port  RDMA Port  Online
> Pid
> --
> Brick gluster11:/gluster/bricksda1/shared   49164 0  Y
> 6425
> Brick gluster12:/gluster/bricksda1/shared   49152 0  Y
> 2078
> Brick gluster13:/gluster/bricksda1/shared   49152 0  Y
> 2478
> Brick gluster11:/gluster/bricksdb1/shared   49165 0  Y
> 6452
> Brick gluster12:/gluster/bricksdb1/shared   49153 0  Y
> 2084
> Brick gluster13:/gluster/bricksdb1/shared   49153 0  Y
> 2497
> Brick gluster11:/gluster/bricksdc1/shared   49166 0  Y
> 6479
> Brick gluster12:/gluster/bricksdc1/shared   49154 0  Y
> 2090
> Brick gluster13:/gluster/bricksdc1/shared   49154 0  Y
> 2485
> Brick gluster11:/gluster/bricksdd1/shared   49168 0  Y
> 7897
> Brick gluster12:/gluster/bricksdd1_new/shared  49157 0  Y
> 7632
> Brick gluster13:/gluster/bricksdd1_new/shared  N/A   N/AN
>  N/A
> Self-heal Daemon on localhost   N/A   N/AY
> 25483
> Self-heal Daemon on gluster13   N/A   N/AY
> 2463
> Self-heal Daemon on gluster12   N/A   N/AY
> 17619
>
> Task Status of Volume shared
> --
> There are no active volume tasks
>
> Here gluster13:/gluster/bricksdd1_new/shared is not up. Related log
> message after reboot in glusterd.log:
>
> [2018-08-16 05:22:52.986757] W [socket.c:593:__socket_rwv]
> 0-management: readv on
> /var/run/gluster/02d086b75bfc97f2cce96fe47e26dcf3.socket failed (No
> data available)
> [2018-08-16 05:22:52.987648] I [MSGID: 106005]
> [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
> Brick gluster13:/gluster/bricksdd1_new/shared has disconnected from
> glusterd.
> [2018-08-16 05:22:52.987908] E [rpc-clnt.c:350:saved_frames_unwind]
> (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7fdbaa398b8e]
> (--> /usr/lib/x86_64-
> linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fdbaa15f111]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdbaa15f23e]
> (--> /usr/lib/x86_64-linu
> x-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fdbaa1608d1]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7fdbaa1613f8]
> ) 0-management: force
> d unwinding frame type(brick operations) op(--(4)) called at
> 2018-08-16 05:22:52.941332 (xid=0x2)
> [2018-08-16 05:22:52.988058] W [dict.c:426:dict_set]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.12/xlator/mgmt/glusterd.so(+0xd1e59)
> [0x7fdba4f9ce59]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_int32+0x2b)
> [0x7fdbaa39122b]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set+0xd3)
> [0x7fdbaa38fa13] ) 0-dict: !this || !value for key=index [I
> nvalid argument]
> [2018-08-16 05:22:52.988092] E [MSGID: 106060]
> [glusterd-syncop.c:1014:gd_syncop_mgmt_brick_op] 0-management: Error
> setting index on brick status rsp dict
>
> This problem could be related to my previous mail. After executing
> "gluster volume start shared force" the brick comes up, resulting in
> healing the brick (and in high load, too). Is there any possibility to
> track down why this happens and how to ensure that the brick comes up
> at boot?
>
>
> Best regards
> Hubert
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Expanding a distributed disperse volume: some questions about the action plan

2018-06-07 Thread Serkan Çoban
>in order to copy the data from the old volume to the new one, I need a third 
>machine that can mount both the volumes; it's possible? if yes, which gluster 
>client version should I use/install on the "bridge" machine?
Yes, it is possible; you need to install the old client version on the bridge
server, since old clients can talk to new servers.

Why not just extend the old cluster & rebalance, and then upgrade to the
latest version of glusterfs?
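
A rough sketch of that alternative, assuming the new bricks follow the
same layout as the old ones (server and brick names below are made up;
bricks must be added in multiples of the disperse set size, here
4+2 = 6):

  gluster volume add-brick tier2 \
    s04:/gluster/brick01/data s04:/gluster/brick02/data \
    s05:/gluster/brick01/data s05:/gluster/brick02/data \
    s06:/gluster/brick01/data s06:/gluster/brick02/data
  gluster volume rebalance tier2 start
  gluster volume rebalance tier2 status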

On Thu, Jun 7, 2018 at 3:18 PM, Mauro Tridici  wrote:
> Dear Users,
>
> just one year ago, we implemented a tier2 storage based on GlusterFS
> v.3.10.5 and we can say that it works like a charm.
> The GlusterFS volume has been created using 3 physical servers (each server
> contains 12 disks, 1 brick per disk).
>
> Below, you can find some information about the gluster volume type
>
> Volume Name: tier2
> Type: Distributed-Disperse
> Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6 x (4 + 2) = 36
> Transport-type: tcp
>
> Now, it's time to expand this volume so we bought 3 new servers similar to
> the 3 existing servers. Since we would like to upgrade the Gluster version
> to last available one, we would like to know if the following action plan is
> correct and safe:
>
> 1) creation of a new gluster distributed disperse volume (6x(4+2)) using the
> 3 new servers and the new gluster version;
> 2) copy of all the data saved on the old gluster volume to the new one;
> 3) complete erasure of three old servers operating system;
> 4) installation of the last available version of the operating system and
> glusterfs software stack on the three old servers;
> 5) expansion of the volume (created at point nr.1) adding the new bricks
> (created at point nr.4)
>
> If our action plan can be considered "validated", I have an other question:
>
> - in order to copy the data from the old volume to the new one, I need a
> third machine that can mount both the volumes; it's possible? if yes, which
> gluster client version should I use/install on the "bridge" machine?
>
> Thank you very much for your attention.
> Regards,
> Mauro
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] cluster of 3 nodes and san

2018-04-27 Thread Serkan Çoban
>but the Doubt is if I can use glusterfs with a san connected by FC?
Yes, just format the volumes with XFS and you are ready to go.
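
A minimal sketch of preparing such a LUN as a brick (device path and
mount point are placeholders; -i size=512 is the inode size commonly
documented for gluster bricks):

  mkfs.xfs -i size=512 /dev/mapper/san_lun1
  mkdir -p /gluster/brick1
  mount /dev/mapper/san_lun1 /gluster/brick1
  echo '/dev/mapper/san_lun1 /gluster/brick1 xfs defaults 0 0' >> /etc/fstab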


For a replica in a different DC, be careful about latency. What is the
connection between the DCs?
It can be doable if the latency is low.

On Fri, Apr 27, 2018 at 4:02 PM, Ricky Gutierrez  wrote:
> Hi, any advice?
>
> El mié., 25 abr. 2018 19:56, Ricky Gutierrez 
> escribió:
>>
>> Hi list, I need a little help, I currently have a cluster with vmware
>> and 3 nodes, I have a storage (Dell powervault) connected by FC in
>> redundancy, and I'm thinking of migrating it to proxmox since the
>> maintenance costs are very expensive, but the Doubt is if I can use
>> glusterfs with a san connected by FC? , It is advisable? , I add
>> another data, that in another site I have another cluster with proxmox
>> with another storage connected by FC (Hp eva 6000), and I am also
>> considering using glusterfs, the idea is to be able to replicate one
>> cluster to another and have a replica, but not If I can do the job
>> with glusterfs?
>>
>> I wait for your comments!
>>
>>
>> --
>> rickygm
>>
>> http://gnuforever.homelinux.com
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Fwd: lstat & readlink calls during glusterfsd process startup

2018-04-16 Thread Serkan Çoban
This is an example from one of the glusterfsd processes, strace -f -c
-p pid_of_glusterfsd

% time   seconds   usecs/call     calls    errors  syscall
    68      36.2          2131     17002      4758  futex
    13       7             5783      1206            epoll_wait
    11       5.4         360545        15            select
...

-- Forwarded message --
From: Serkan Çoban <cobanser...@gmail.com>
Date: Mon, Apr 16, 2018 at 9:20 AM
Subject: lstat & readlink calls during glusterfsd process startup
To: Gluster Users <gluster-users@gluster.org>


Hi all,

I am on gluster 3.10.5 with one EC volume 16+4.
One of the machines went down the previous night; I just fixed it and powered it on.
When the glusterfsd processes started they consumed all CPU on the server.
strace shows every process walks over its brick directory and does
lstat & readlink calls.
Each brick directory is 8TB, 60% full. I waited 24 hours for it to
finish but it did not.
I stopped glusterd and restarted it but the same thing happens again. Why
do the glusterfsd processes traverse the brick directory on startup? Is it
related to self heal?

This happened once before and I somehow prevented it with a
glusterd stop or some other way I cannot remember right now.

Any thoughts on how to solve this issue?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] lstat & readlink calls during glusterfsd process startup

2018-04-16 Thread Serkan Çoban
Hi all,

I am on gluster 3.10.5 with one EC volume 16+4.
One of the machines went down the previous night; I just fixed it and powered it on.
When the glusterfsd processes started they consumed all CPU on the server.
strace shows every process walks over its brick directory and does
lstat & readlink calls.
Each brick directory is 8TB, 60% full. I waited 24 hours for it to
finish but it did not.
I stopped glusterd and restarted it but the same thing happens again. Why
do the glusterfsd processes traverse the brick directory on startup? Is it
related to self heal?

This happened once before and I somehow prevented it with a
glusterd stop or some other way I cannot remember right now.

Any thoughts on how to solve this issue?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Tune and optimize dispersed cluster

2018-04-03 Thread Serkan Çoban
>Is there a way to tune, optimize a dispersed cluster to make it
run better with small read/writes?

You should not run disperse volumes with small IO.
2+1 disperse gains only 25% over replica 3 with arbiter...

On Tue, Apr 3, 2018 at 10:25 AM, Marcus Pedersén  wrote:
> Hi all,
> I have setup a dispersed cluster (2+1), version 3.12.
> Given the way our users run, I guessed that we would get the penalties
> of a dispersed cluster, and I was right.
> A calculation that usually takes about 48 hours (on a replicated cluster),
> now took about 60 hours.
> There is alot of "small" reads/writes going on in these programs.
>
> Is there a way to tune, optimize a dispersed cluster to make it
> run better with small read/writes?
>
> Many thanks in advance!
>
> Best regards
> Marcus Pedersén
>
>
> --
> **
> * Marcus Pedersén*
> * System administrator   *
> **
> * Interbull Centre   *
> *    *
> * Department of Animal Breeding & Genetics — SLU *
> * Box 7023, SE-750 07*
> * Uppsala, Sweden*
> **
> * Visiting address:  *
> * Room 55614, Ulls väg 26, Ultuna*
> * Uppsala*
> * Sweden *
> **
> * Tel: +46-(0)18-67 1962 *
> **
> **
> * ISO 9001 Bureau Veritas No SE004561-1  *
> **
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster performance / Dell Idrac enterprise conflict

2018-02-26 Thread Serkan Çoban
I don't think it is related to the iDRAC itself; either some configuration
is wrong or there is some hardware error.
Did you check the battery of the RAID controller? Do you use the disks in
JBOD mode or RAID mode?

On Mon, Feb 26, 2018 at 6:12 PM, Ryan Wilkinson <ryanw...@gmail.com> wrote:
> Thanks for the suggestion.  I tried both of these with no difference in
> performance. I have tried several other Dell hosts with iDRAC Enterprise and
> am getting the same results.  I also tried a new Dell T130 with iDRAC Express
> and was getting over 700 MB/s.  Have any other users had this issue with iDRAC
> Enterprise??
>
>
> On Thu, Feb 22, 2018 at 12:16 AM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
>>
>> Did you check the BIOS/Power settings? They should be set for high
>> performance.
>> Also you can try to boot "intel_idle.max_cstate=0" kernel command line
>> option to be sure CPUs not entering power saving states.
>>
>> On Thu, Feb 22, 2018 at 9:59 AM, Ryan Wilkinson <ryanw...@gmail.com>
>> wrote:
>> >
>> >
>> > I have a 3 host gluster replicated cluster that is providing storage for
>> > our
>> > RHEV environment.  We've been having issues with inconsistent
>> > performance
>> > from the VMs depending on which Hypervisor they are running on.  I've
>> > confirmed throughput to be ~9Gb/s to each of the storage hosts from the
> > hypervisors.  I'm getting ~300MB/s disk read speed when our test vm is
>> > on
>> > the slow Hypervisors and over 500 on the faster ones.  The performance
>> > doesn't seem to be affected much by the cpu, memory that are in the
>> > hypervisors.  I have tried a couple of really old boxes and got over 500
>> > MB/s.  The common thread seems to be that the poorly perfoming hosts all
>> > have Dell's Idrac 7 Enterprise.  I have one Hypervisor that has Idrac 7
>> > express and it performs well.  We've compared system packages and
>> > versions
>> > til we're blue in the face and have been struggling with this for a
>> > couple
>> > months but that seems to be the only common denominator.  I've tried on
>> > one
>> > of those Idrac 7 hosts to disable the nic, virtual drive, etc, etc. but
>> > no
>> > change in performance.  In addition, I tried 5 new hosts and all are
>> > complying to the Idrac enterprise theory.  Anyone else had this issue?!
>> >
>> >
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] NFS Ganesha HA w/ GlusterFS

2018-02-25 Thread Serkan Çoban
I would like to see the steps for reference; can you provide a link or
just post them on the mailing list?

On Mon, Feb 26, 2018 at 4:29 AM, TomK  wrote:
> Hey Guy's,
>
> A success story instead of a question.
>
> With your help, I managed to get the HA component working with HAPROXY and
> keepalived to build a fairly resilient NFS v4 VM cluster.  ( Used Gluster,
> NFS Ganesha v2.60, HAPROXY, keepalived w/ selinux enabled )
>
> If someone needs it or it could help your work, please PM me for the written-up
> post, or I could just post it here if the list allows it.
>
> Cheers,
> Tom
>
>
>
> On 2/19/2018 12:25 PM, TomK wrote:
>>
>> On 2/19/2018 12:09 PM, Kaleb S. KEITHLEY wrote:
>> Sounds good and no problem at all.  Will look out for this update in the
>> future.  In the meantime, three's a few things I'll try including your
>> suggestion.
>>
>> Was looking for a sense of direction with the projects and now you've
>> given that.  Ty.  Appreciated!
>>
>> Cheers,
>> Tom
>>
>>
>>> On 02/19/2018 11:37 AM, TomK wrote:

 On 2/19/2018 10:55 AM, Kaleb S. KEITHLEY wrote:
 Yep, I noticed a couple of pages including this for 'storhaug
 configuration' off google.  Adding 'mailing list' to the search didn't
 help a lot:

 https://sourceforge.net/p/nfs-ganesha/mailman/message/35929089/

 https://www.spinics.net/lists/gluster-users/msg33018.html

 Hence the ask here.  storhaug feels like it's not moving with any sort
 of update now.

 Any plans to move back to the previous NFS Ganesha HA model with
 upcoming GlusterFS versions as a result?
>>>
>>>
>>> No.
>>>
>>> (re)writing or finishing storhaug has been on my plate ever since the
>>> guy who was supposed to do it didn't.
>>>
>>> I have lots of other stuff to do too. All I can say is it'll get done
>>> when it gets done.
>>>
>>>

 In the meantime I'll look to cobble up the GlusterFS 3.10 packages and
 try with those per your suggestion.

 What's your thoughts on using HAPROXY / keepalived w/ NFS Ganesha and
 GlusterFS?  Anyone tried this sort of combination?  I want to avoid the
 situation where I have to remount clients as a result of a node failing.
   In other words, avoid this situation:

 [root@yes01 ~]# cd /n
 -bash: cd: /n: Stale file handle
 [root@yes01 ~]#

 Cheers,
 Tom

> On 02/19/2018 10:24 AM, TomK wrote:
>>
>> On 2/19/2018 2:39 AM, TomK wrote:
>> + gluster users as well.  Just read another post on the mailing lists
>> about a similar ask from Nov which didn't really have a clear answer.
>
>
> That's funny because I've answered questions like this several times.
>
> Gluster+Ganesha+Pacemaker-based HA is available up to GlusterFS 3.10.x.
>
> If you need HA, that is one "out of the box" option.
>
> There's support for using CTDB in Samba for Ganesha HA, and people have
> used it successfully with Gluster+Ganesha.
>
>>
>> Perhaps there's a way to get NFSv4 work with GlusterFS without NFS
>> Ganesha then?
>
>
> Not that I'm aware of.
>
>>
>> Cheers,
>> Tom
>>
>>> Hey All,
>>>
>>> I've setup GlusterFS on two virtuals and enabled NFS Ganesha on each
>>> node.  ATM the configs are identical between the two NFS Ganesha
>>> hosts. (Probably shouldn't be but I'm just testing things out.)
>>>
>>> I need HA capability and notice these instructions here:
>>>
>>>
>>> http://aravindavkgluster.readthedocs.io/en/latest/Administrator%20Guide/Configuring%20HA%20NFS%20Server/
>>>
>>>
>>>
>>> However I don't have package glusterfs-ganesha available on this
>>> CentOS Linux release 7.4.1708 (Core) and the maintainer's of CentOS 7
>>> haven't uploaded some of the 2.5.x packages yet so I can't use that
>>> version.
>>>
>>> glusterfs-api-3.12.6-1.el7.x86_64
>>> glusterfs-libs-3.12.6-1.el7.x86_64
>>> glusterfs-3.12.6-1.el7.x86_64
>>> glusterfs-fuse-3.12.6-1.el7.x86_64
>>> glusterfs-server-3.12.6-1.el7.x86_64
>>> python2-glusterfs-api-1.1-1.el7.noarch
>>> glusterfs-client-xlators-3.12.6-1.el7.x86_64
>>> glusterfs-cli-3.12.6-1.el7.x86_64
>>>
>>> nfs-ganesha-xfs-2.3.2-1.el7.x86_64
>>> nfs-ganesha-vfs-2.3.2-1.el7.x86_64
>>> nfs-ganesha-2.3.2-1.el7.x86_64
>>> nfs-ganesha-gluster-2.3.2-1.el7.x86_64
>>>
>>> The only high availability packages are the following but they don't
>>> come with any instructions that I can find:
>>>
>>> storhaug.noarch : High-Availability Add-on for NFS-Ganesha and Samba
>>> storhaug-nfs.noarch : storhaug NFS-Ganesha module
>>>
>>> Given that I'm missing that one package above, will configuring using
>>> ganesha-ha.conf still work?  Or should I be looking at another option
>>> alltogether?
>>>
>>> Appreciate any help.  Ty!
>>>
>>
>>
>



Re: [Gluster-users] Gluster performance / Dell Idrac enterprise conflict

2018-02-21 Thread Serkan Çoban
Did you check the BIOS/Power settings? They should be set for high performance.
Also you can try to boot with the "intel_idle.max_cstate=0" kernel command line
option to be sure the CPUs are not entering power-saving states.
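
A sketch of making that setting permanent, assuming a grub2-based system
such as CentOS/RHEL 7 (adjust paths for your distribution):

  # append intel_idle.max_cstate=0 to GRUB_CMDLINE_LINUX in /etc/default/grub
  grub2-mkconfig -o /boot/grub2/grub.cfg
  reboot
  # verify after boot
  cat /proc/cmdline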

On Thu, Feb 22, 2018 at 9:59 AM, Ryan Wilkinson  wrote:
>
>
> I have a 3 host gluster replicated cluster that is providing storage for our
> RHEV environment.  We've been having issues with inconsistent performance
> from the VMs depending on which Hypervisor they are running on.  I've
> confirmed throughput to be ~9Gb/s to each of the storage hosts from the
> hypervisors.  I'm getting ~300MB/s disk read speed when our test vm is on
> the slow Hypervisors and over 500 on the faster ones.  The performance
> doesn't seem to be affected much by the cpu, memory that are in the
> hypervisors.  I have tried a couple of really old boxes and got over 500
> MB/s.  The common thread seems to be that the poorly performing hosts all
> have Dell's Idrac 7 Enterprise.  I have one Hypervisor that has Idrac 7
> express and it performs well.  We've compared system packages and versions
> til we're blue in the face and have been struggling with this for a couple
> months but that seems to be the only common denominator.  I've tried on one
> of those Idrac 7 hosts to disable the nic, virtual drive, etc, etc. but no
> change in performance.  In addition, I tried 5 new hosts and all are
> complying to the Idrac enterprise theory.  Anyone else had this issue?!
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Are there any issues connecting a Gluster client v3.7.6 to a Gluster server v3.5.6?

2018-02-16 Thread Serkan Çoban
Old clients can talk to a newer server, but it is not recommended to use
newer clients with an older server.

On Fri, Feb 16, 2018 at 10:42 PM, Maya Estalilla
 wrote:
> We have been running several Gluster servers using version 3.5.6 for some
> time now without issue. We also have several clients running version 3.5.2
> without issue. But as soon as we added more clients running version 3.7.6,
> we've been running into problems where one directory with hundreds of
> thousands of images just hangs and is unresponsive. Are there any
> incompatibility issues in this scenario?
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] strange hostname issue on volume create command with famous Peer in Cluster state error message

2018-02-06 Thread Serkan Çoban
Did you do gluster peer probe? Check out the documentation:
http://docs.gluster.org/en/latest/Administrator%20Guide/Storage%20Pools/
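
A minimal sketch using the hostnames from your mail; all peers must show
"Peer in Cluster (Connected)" in peer status before the volume create
will succeed:

  # run on pri.ostechnix.lan
  gluster peer probe sec.ostechnix.lan
  gluster peer probe third.ostechnix.lan
  gluster peer status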

On Tue, Feb 6, 2018 at 5:01 PM, Ercan Aydoğan  wrote:
> Hello,
>
> i installed glusterfs 3.11.3  version 3 nodes ubuntu 16.04 machine. All
> machines have same /etc/hosts.
>
> node1 hostname
> pri.ostechnix.lan
>
> node2 hostname
> sec.ostechnix.lan
>
> node2 hostname
> third.ostechnix.lan
>
>
> 51.15.77.14 pri.ostechnix.lan pri
> 51.15.90.60  sec.ostechnix.lan sec
> 163.172.151.120  third.ostechnix.lan   third
>
> volume create command is
>
> root@pri:/var/log/glusterfs# gluster volume create myvol1 replica 2
> transport tcp pri.ostechnix.lan:/gluster/brick1/mpoint1
> sec.ostechnix.lan:/gluster/brick1/mpoint1 force
> Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to
> avoid this. See:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
> Do you still want to continue?
>  (y/n) y
> volume create: myvol1: failed: Host pri.ostechnix.lan is not in 'Peer in
> Cluster' state
>
> node 1 glusterd.log is here
>
> root@pri:/var/log/glusterfs# cat glusterd.log
> [2018-02-06 13:28:37.638373] W [glusterfsd.c:1331:cleanup_and_exit]
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f0232faa6ba]
> -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55e17938a8c5]
> -->/usr/sbin/glusterd(cleanup_and_exit+0x54) [0x55e17938a6e4] ) 0-: received
> signum (15), shutting down
> [2018-02-06 13:29:41.260479] I [MSGID: 100030] [glusterfsd.c:2476:main]
> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.11.3
> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
> [2018-02-06 13:29:41.284367] I [MSGID: 106478] [glusterd.c:1422:init]
> 0-management: Maximum allowed open file descriptors set to 65536
> [2018-02-06 13:29:41.284462] I [MSGID: 106479] [glusterd.c:1469:init]
> 0-management: Using /var/lib/glusterd as working directory
> [2018-02-06 13:29:41.300804] W [MSGID: 103071]
> [rdma.c:4591:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
> channel creation failed [No such device]
> [2018-02-06 13:29:41.300969] W [MSGID: 103055] [rdma.c:4898:init]
> 0-rdma.management: Failed to initialize IB Device
> [2018-02-06 13:29:41.301098] W [rpc-transport.c:350:rpc_transport_load]
> 0-rpc-transport: 'rdma' initialization failed
> [2018-02-06 13:29:41.301190] W [rpcsvc.c:1660:rpcsvc_create_listener]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2018-02-06 13:29:41.301214] E [MSGID: 106243] [glusterd.c:1693:init]
> 0-management: creation of 1 listeners failed, continuing with succeeded
> transport
> [2018-02-06 13:29:44.621889] E [MSGID: 101032]
> [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
> /var/lib/glusterd/glusterd.info. [No such file or directory]
> [2018-02-06 13:29:44.621967] E [MSGID: 101032]
> [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
> /var/lib/glusterd/glusterd.info. [No such file or directory]
> [2018-02-06 13:29:44.621971] I [MSGID: 106514]
> [glusterd-store.c:2215:glusterd_restore_op_version] 0-management: Detected
> new install. Setting op-version to maximum : 31100
> [2018-02-06 13:29:44.625749] I [MSGID: 106194]
> [glusterd-store.c:3772:glusterd_store_retrieve_missed_snaps_list]
> 0-management: No missed snaps list.
> Final graph:
> +--+
>   1: volume management
>   2: type mgmt/glusterd
>   3: option rpc-auth.auth-glusterfs on
>   4: option rpc-auth.auth-unix on
>   5: option rpc-auth.auth-null on
>   6: option rpc-auth-allow-insecure on
>   7: option transport.socket.listen-backlog 128
>   8: option event-threads 1
>   9: option ping-timeout 0
>  10: option transport.socket.read-fail-log off
>  11: option transport.socket.keepalive-interval 2
>  12: option transport.socket.keepalive-time 10
>  13: option transport-type rdma
>  14: option working-directory /var/lib/glusterd
>  15: end-volume
>  16:
> +--+
> [2018-02-06 13:29:44.628451] I [MSGID: 101190]
> [event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2018-02-06 13:46:38.530154] I [MSGID: 106487]
> [glusterd-handler.c:1484:__glusterd_handle_cli_list_friends] 0-glusterd:
> Received cli list req
> [2018-02-06 13:47:05.745357] I [MSGID: 106487]
> [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd: Received
> CLI probe req sec.ostechnix.lan 24007
> [2018-02-06 13:47:05.746465] I [MSGID: 106129]
> [glusterd-handler.c:3623:glusterd_probe_begin] 0-glusterd: Unable to find
> peerinfo for host: sec.ostechnix.lan (24007)
> [2018-02-06 13:47:05.751131] W [MSGID: 106062]
> [glusterd-handler.c:3399:glusterd_transport_inet_options_build] 0-glusterd:
> Failed 

Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?

2018-02-02 Thread Serkan Çoban
> Node             Rebalanced-files      size      scanned   failures   skipped      status      run time
> server4                     35096    23.4GB       899491          0    229064   in progress    16:49:18
> server3                     27031    18.0GB       701759          8    182592   in progress    16:49:27
> server8                         0    0Bytes       327602          0       805   in progress    16:49:18
> server6                     35672    23.9GB      1028469          0    240810   in progress    16:49:17
> server7                         1   45Bytes           53          0         0     completed     0:03:53
> Estimated time left for rebalance to complete :   359739:51:24
> volume rebalance: home: success
>
>
> Thanks,
>
>
> A.
>
>
>
> On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
>> What is server4? You just mentioned server1 and server2 previously.
>> Can you post the output of gluster v status volname
>>
>> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe <alessandro@meteo.be> 
>> wrote:
>> > Hi,
>> >
>> >
>> > Thanks. However "gluster v heal volname full" returned the following error
>> > message
>> > Commit failed on server4. Please check log file for details.
>> >
>> > I have checked the log files in /var/log/glusterfs on server4 (by grepping
>> > heal), but did not get any match. What should I be looking for and in
>> > which
>> > log file, please ?
>> >
>> > Note that there is currently a rebalance process running on the volume.
>> >
>> >
>> > Many thanks,
>> >
>> >
>> > A.
>> >
>> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
>> >> You do not need to reset the brick if the brick path does not change.
>> >> Replace the brick, format and mount it, then run gluster v start volname
>> >> force. To start self heal just run gluster v heal volname full.
>> >>
>> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe <alessandro@meteo.be>
>> >
>> > wrote:
>> >> > Hi,
>> >> >
>> >> >
>> >> > My volume home is configured in replicate mode (version 3.12.4) with
>> >> > the
>> >> > bricks server1:/data/gluster/brick1
>> >> > server2:/data/gluster/brick1
>> >> >
>> >> > server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon
>> >> > for
>> >> > that brick on server2, umounted it, reformatted it, remounted it and did
>> >> > a>
>> >> >
>> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
>> >> >> server2:/data/gluster/brick1 commit force>
>> >> >
>> >> > I was expecting that the self-heal daemon would start copying data from
>> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
>> >> > server2:/data/gluster/brick1, which it only did for directories, but
>> >> > not
>> >> > for files.
>> >> >
>> >> > For the moment, I launched on the fuse mount point
>> >> >
>> >> >> find . | xargs stat
>> >> >
>> >> > but crawling the whole volume (100 TB) to trigger self-healing of a
>> >> > single
>> >> > brick of 7.4 TB is inefficient.
>> >> >
>> >> > Is there any trick to only self-heal a single brick, either by setting
>> >> > some attributes to its top directory, for example ?
>> >> >
>> >> >
>> >> > Many thanks,
>> >> >
>> >> >
>> >> > Alessandro
>> >> >
>> >> >
>> >> > ___
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org
>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> > --
>> >
>> >  Dr. Ir. Alessandro Ipe
>> >  Department of Observations Tel. +32 2 373 06 31
>> >  Remote Sensing from Space
>> >  Royal Meteorological Institute
>> >  Avenue Circulaire 3Email:
>> >  B-1180 BrusselsBelgium alessandro@meteo.be
>> >  Web: http://gerb.oma.be
>
>
> --
>
>  Dr. Ir. Alessandro Ipe
>  Department of Observations Tel. +32 2 373 06 31
>  Remote Sensing from Space
>  Royal Meteorological Institute
>  Avenue Circulaire 3Email:
>  B-1180 BrusselsBelgium alessandro@meteo.be
>  Web: http://gerb.oma.be
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?

2018-02-01 Thread Serkan Çoban
You do not need to reset the brick if the brick path does not change. Replace
the brick, format and mount it, then run gluster v start volname force.
To start self heal just run gluster v heal volname full.
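
Put together for your "home" volume, a rough sketch of the whole
sequence (run after the new filesystem is mounted again on server2):

  gluster volume start home force   # brings the replaced brick online
  gluster volume heal home full     # triggers a full self-heal
  gluster volume heal home info     # watch the pending entries shrink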

On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe  wrote:
> Hi,
>
>
> My volume home is configured in replicate mode (version 3.12.4) with the 
> bricks
> server1:/data/gluster/brick1
> server2:/data/gluster/brick1
>
> server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon for 
> that brick on server2, umounted it, reformatted it, remounted it and did a
>> gluster volume reset-brick home server2:/data/gluster/brick1 
>> server2:/data/gluster/brick1 commit force
>
> I was expecting that the self-heal daemon would start copying data from 
> server1:/data/gluster/brick1
> (about 7.4 TB) to the empty server2:/data/gluster/brick1, which it only did 
> for directories, but not for files.
>
> For the moment, I launched on the fuse mount point
>> find . | xargs stat
> but crawling the whole volume (100 TB) to trigger self-healing of a single 
> brick of 7.4 TB is inefficient.
>
> Is there any trick to only self-heal a single brick, either by setting some 
> attributes to its top directory, for example ?
>
>
> Many thanks,
>
>
> Alessandro
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfs development library

2018-01-15 Thread Serkan Çoban
You should try libgfapi: https://libgfapi-python.readthedocs.io/en/latest/

On Mon, Jan 15, 2018 at 9:01 PM, Marcin Dulak  wrote:
> Maybe consider extending the functionality of
> http://docs.ansible.com/ansible/latest/gluster_volume_module.html?
>
> Best regards,
>
> Marcin
>
> On Mon, Jan 15, 2018 at 11:53 AM, 陈曦  wrote:
>>
>> I want to write a python script and visual interface to manage glusterfs,
>> such as creating and deleting volumes. This would make it easier to manage
>> glusterfs.
>> But for now, I execute the glusterfs command using python's subprocess.Popen
>> function, such as subprocess.Popen(GLUSTER_CMD,
>> shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).
>> But this does not feel like a good program, because it has a serious
>> dependence on the shell.
>> Is there a python or c library/function to execute the glusterfs
>> commands? Then I could develop a better-performing, more lightweight script
>> to achieve this.
>> thanks everyone.
>>
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS healing questions

2017-11-09 Thread Serkan Çoban
Hi,

You can set disperse.shd-max-threads to 2 or 4 in order to make heal
faster. This makes my heal times 2-3x faster.
Also you can play with disperse.self-heal-window-size to read more
bytes at one time, but I did not test it.
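
As a sketch, with "volname" as a placeholder (shd-max-threads 4 is what
I use; the self-heal-window-size value is only an example since, as
noted, I did not test it):

  gluster volume set volname disperse.shd-max-threads 4
  gluster volume set volname disperse.self-heal-window-size 2
  gluster volume get volname disperse.shd-max-threads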

On Thu, Nov 9, 2017 at 4:47 PM, Xavi Hernandez  wrote:
> Hi Rolf,
>
> answers follow inline...
>
> On Thu, Nov 9, 2017 at 3:20 PM, Rolf Larsen  wrote:
>>
>> Hi,
>>
>> We ran a test on GlusterFS 3.12.1 with erasurecoded volumes 8+2 with 10
>> bricks (default config,tested with 100gb, 200gb, 400gb bricksizes,10gbit
>> nics)
>>
>> 1.
>> Tests show that healing takes about double the time on healing 200gb vs
>> 100, and abit under the double on 400gb vs 200gb bricksizes. Is this
>> expected behaviour? In light of this would make 6,4 tb bricksizes use ~ 377
>> hours to heal.
>>
>> 100gb brick heal: 18 hours (8+2)
>> 200gb brick heal: 37 hours (8+2) +205%
>> 400gb brick heal: 59 hours (8+2) +159%
>>
>> Each 100gb is filled with 8 x 10mb files (200gb is 2x and 400gb is 4x)
>
>
> If I understand it correctly, you are storing 80.000 files of 10 MB each
> when you are using 100GB bricks, but you double this value for 200GB bricks
> (160.000 files of 10MB each). And for 400GB bricks you create 320.000 files.
> Have I understood it correctly ?
>
> If this is true, it's normal that twice the space requires approximately
> twice the heal time. The healing time depends on the contents of the brick,
> not brick size. The same amount of files should take the same healing time,
> whatever the brick size is.
>
>>
>>
>> 2.
>> Are there any possibility to show the progress of a heal? As per now we
>> run gluster volume heal volume info, but this exit's when a brick is done
>> healing and when we run heal info again the command contiunes showing gfid's
>> until the brick is done again. This gives quite a bad picture of the status
>> of a heal.
>
>
> The output of 'gluster volume heal  info' shows the list of files
> pending to be healed on each brick. The heal is complete when the list is
> empty. A faster alternative if you don't want to see the whole list of files
> is to use 'gluster volume heal  statistics heal-count'. This will
> only show the number of pending files on each brick.
>
> I don't know any other way to track progress of self-heal.
>
>>
>>
>> 3.
>> What kind of config tweaks is recommended for these kind of EC volumes?
>
>
> I usually use the following values (specific only for ec):
>
> client.event-threads 4
> server.event-threads 4
> performance.client-io-threads on
>
> Regards,
>
> Xavi
>
>
>
>>
>>
>>
>> $ gluster volume info
>> Volume Name: test-ec-100g
>> Type: Disperse
>> Volume ID: 0254281d-2f6e-4ac4-a773-2b8e0eb8ab27
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (8 + 2) = 10
>> Transport-type: tcp
>> Bricks:
>> Brick1: dn-304:/mnt/test-ec-100/brick
>> Brick2: dn-305:/mnt/test-ec-100/brick
>> Brick3: dn-306:/mnt/test-ec-100/brick
>> Brick4: dn-307:/mnt/test-ec-100/brick
>> Brick5: dn-308:/mnt/test-ec-100/brick
>> Brick6: dn-309:/mnt/test-ec-100/brick
>> Brick7: dn-310:/mnt/test-ec-100/brick
>> Brick8: dn-311:/mnt/test-ec-2/brick
>> Brick9: dn-312:/mnt/test-ec-100/brick
>> Brick10: dn-313:/mnt/test-ec-100/brick
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>>
>> Volume Name: test-ec-200
>> Type: Disperse
>> Volume ID: 2ce23e32-7086-49c5-bf0c-7612fd7b3d5d
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (8 + 2) = 10
>> Transport-type: tcp
>> Bricks:
>> Brick1: dn-304:/mnt/test-ec-200/brick
>> Brick2: dn-305:/mnt/test-ec-200/brick
>> Brick3: dn-306:/mnt/test-ec-200/brick
>> Brick4: dn-307:/mnt/test-ec-200/brick
>> Brick5: dn-308:/mnt/test-ec-200/brick
>> Brick6: dn-309:/mnt/test-ec-200/brick
>> Brick7: dn-310:/mnt/test-ec-200/brick
>> Brick8: dn-311:/mnt/test-ec-200_2/brick
>> Brick9: dn-312:/mnt/test-ec-200/brick
>> Brick10: dn-313:/mnt/test-ec-200/brick
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>>
>> Volume Name: test-ec-400
>> Type: Disperse
>> Volume ID: fe00713a-7099-404d-ba52-46c6b4b6ecc0
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (8 + 2) = 10
>> Transport-type: tcp
>> Bricks:
>> Brick1: dn-304:/mnt/test-ec-400/brick
>> Brick2: dn-305:/mnt/test-ec-400/brick
>> Brick3: dn-306:/mnt/test-ec-400/brick
>> Brick4: dn-307:/mnt/test-ec-400/brick
>> Brick5: dn-308:/mnt/test-ec-400/brick
>> Brick6: dn-309:/mnt/test-ec-400/brick
>> Brick7: dn-310:/mnt/test-ec-400/brick
>> Brick8: dn-311:/mnt/test-ec-400_2/brick
>> Brick9: dn-312:/mnt/test-ec-400/brick
>> Brick10: dn-313:/mnt/test-ec-400/brick
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>>
>> --
>>
>> Regards
>> Rolf Arne Larsen
>> Ops Engineer
>> r...@jottacloud.com
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>

Re: [Gluster-users] Gluster Scale Limitations

2017-10-30 Thread Serkan Çoban
Hi. Beyond ~2500 bricks it takes too much time for bricks to come online
after a reboot, so I think ~2500 bricks is an upper limit per cluster.
I have two 40-node/19PiB clusters. They have only one big EC volume each
and are used for backup/archive purposes.

On Tue, Oct 31, 2017 at 12:51 AM, Mayur Dewaikar
 wrote:
> Hi all,
>
> Are there any scale limitations in terms of how many nodes can be in a
> single Gluster Cluster or how much storage capacity can be managed in a
> single cluster? What are some of the large deployments out there that you
> know of?
>
>
>
> Thanks,
>
> Mayur
>
>
>
>
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor gluster performance on large files.

2017-10-30 Thread Serkan Çoban
>Can you please turn OFF client-io-threads as we have seen degradation of
performance with io-threads ON on sequential read/writes, random
read/writes.
May I ask in which version this degradation happened? I tested 3.10 vs 3.12
performance a while ago and saw a 2-3x performance loss with 3.12. Is it
because of client-io-threads?
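
For anyone following along, toggling that option is a one-liner (volume
name is a placeholder):

  gluster volume set volname performance.client-io-threads off
  gluster volume get volname performance.client-io-threads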

On Mon, Oct 30, 2017 at 1:44 PM, Karan Sandha  wrote:

> Hi Brandon,
>
> Can you please turn OFF client-io-threads as we have seen degradation of
> performance with io-threads ON on sequential read/writes, random
> read/writes. Server event threads is 1 and client event threads are 2 by
> default.
>
> Thanks & Regards
>
> On Fri, Oct 27, 2017 at 12:17 PM, Brandon Bates 
> wrote:
>
>> Hi gluster users,
>> I've spent several months trying to get any kind of high performance out
>> of gluster.  The current XFS/samba array is used for video editing and
>> 300-400MB/s for at least 4 clients is minimum (currently a single windows
>> client gets at least 700/700 for a single client over samba, peaking to 950
>> at times using blackmagic speed test).  Gluster has been getting me as low
>> as 200MB/s when the server can do well over 1000MB/s.  I have really
>> been counting on / touting Gluster as being the way of the future for us
>> .  However I can't justify cutting our performance to a mere 13% of
>> non-gluster speeds.  I've started to reach a give up point and really
>> need some help/hope otherwise I'll just have to migrate the data from
>> server 1 to server 2 just like I've been doing for the last decade. :(
>>
>> If anyone can please help me understand where I might be going wrong it
>> would be absolutely wonderful!
>>
>> Server 1:
>> Single E5-1620 v2
>> Ubuntu 14.04
>> glusterfs 3.10.5
>> 16GB Ram
>> 24 drive array on LSI raid
>> Sustained >1.5GB/s to XFS (77TB)
>>
>> Server 2:
>> Single E5-2620 v3
>> Ubuntu 16.04
>> glusterfs 3.10.5
>> 32GB Ram
>> 36 drive array on LSI raid
>> Sustained >2.5GB/s to XFS (164TB)
>>
>> Speed tests are done with local with single thread (dd) or 4 threads
>> (iozone) using my standard 64k io size to 20G or 5G files (20G for local
>> drives, 5G for gluster) files.
>>
>> Servers have Intel X520-DA2 dual port 10Gbit NICS bonded together with
>> 802.3ad LAG to a Quanta LB6-M switch.  Iperf throughput numbers are single
>> stream >9000Mbit/s
>>
>> Here is my current gluster performance:
>>
>> Single brick on server 1 (server 2 was similar):
>> Fuse mount:
>> 1000MB/s write
>> 325MB/s read
>>
>> Distributed only servers 1+2:
>> Fuse mount on server 1:
>> 900MB/s write iozone 4 streams
>> 320MB/s read iozone 4 streams
>> single stream read 91MB/s @64K, 141MB/s @1M
>> simultaneous iozone 4 stream 5G files
>> Server 1: 1200MB/s write, 200MB/s read
>> Server 2: 950MB/s write, 310MB/s read
>>
>> I did some earlier single brick tests with samba VFS and 3 workstations
>> and got up to 750MB/s write and 800MB/s read aggregate but that's still not
>> good.
>>
>> These are the only volume settings tweaks I have made (after much single
>> box testing to find what actually made a difference):
>> performance.cache-size 1GB   (Default 23MB)
>> performance.client-io-threads on
>> performance.io-thread-count 64
>> performance.read-ahead-page-count   16
>> performance.stat-prefetch on
>> server.event-threads 8 (default?)
>> client.event-threads 8
>>
>> Any help given is appreciated!
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
>
> KARAN SANDHA
>
> QUALITY ENGINEER
>
> Red Hat Bangalore 
>
> ksan...@redhat.com  M: 9888009555  IM: Karan on @irc
>
> TRIED. TESTED. TRUSTED.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gluster status

2017-10-13 Thread Serkan Çoban
Which disks/bricks are down: gluster v status vol1 | grep " N "
Ongoing heals: gluster v heal vol1 info | grep "Number of entries" | grep
-v "Number of entries: 0"


On Fri, Oct 13, 2017 at 12:59 AM, Gandalf Corvotempesta
 wrote:
> How can I show the current state of a gluster cluster: status,
> replicas down, what is going on, and so on?
>
> Something like /proc/mdstat for RAID, where I can see which disks are
> down and whether the RAID is rebuilding or checking.
>
> Anything similar in gluster?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] EC 1+2

2017-09-23 Thread Serkan Çoban
In an m+n dispersed volume, m is the data count and n is the redundancy count; m should be a power of 2.
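A minimal sketch, with made-up server and brick names, of creating a 4+2
dispersed volume and growing it later (expansion has to add a whole new
4+2 set at once):

    gluster volume create vol0 disperse-data 4 redundancy 2 \
        server{1..6}:/bricks/b1/vol0
    # later: add another full 4+2 subvolume (six more bricks)
    gluster volume add-brick vol0 server{7..12}:/bricks/b1/vol0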

On Sat, Sep 23, 2017 at 8:00 PM, Gandalf Corvotempesta
 wrote:
> Already read that.
> Seems that I have to use a multiple of 512, so 512*(3-2) is 512.
>
> Seems fine
>
> Il 23 set 2017 5:00 PM, "Dmitri Chebotarov" <4dim...@gmail.com> ha scritto:
>>
>> Hi
>>
>> Take a look at this link (under “Optimal volumes”), for Erasure Coded
>> volume optimal configuration
>>
>> http://docs.gluster.org/Administrator%20Guide/Setting%20Up%20Volumes/
>>
>> On Sat, Sep 23, 2017 at 10:01 Gandalf Corvotempesta
>>  wrote:
>>>
>>> Is it possible to create a dispersed volume 1+2? (Almost the same as
>>> replica 3, similar to RAID-6.)
>>>
>>> If yes, how many servers do I have to add in the future to expand the
>>> storage? 1 or 3?
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] how to calculate the ideal value for client.event-threads, server.event-threads and performance.io-thread-count?

2017-09-20 Thread Serkan Çoban
Defaults should be fine at your size. In big clusters I usually set the
event-threads options to 4.
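For reference, these are plain volume options; a hedged sketch with a
placeholder volume name and example values only:

    gluster volume set VOLNAME client.event-threads 4
    gluster volume set VOLNAME server.event-threads 4
    gluster volume set VOLNAME performance.io-thread-count 32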

On Mon, Sep 18, 2017 at 10:39 PM, Mauro Tridici  wrote:
>
> Dear All,
>
> I just implemented a (6x(4+2)) DISTRIBUTED DISPERSED gluster (v.3.10) volume
> based on the following hardware:
>
> - 3 gluster servers (each server with 2 CPU 10 cores, 64GB RAM, 12 hard disk
> SAS 12Gb/s, 10GbE storage network)
>
> Is there a way to detect the ideal value for client.event-threads,
> server.event-threads and performance.io-thread-count?
>
> Thank you in advance,
> Mauro Tridici
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] how many hosts could be down in a 12x(4+2) distributed dispersed volume?

2017-09-20 Thread Serkan Çoban
If you add bricks to the existing volume, only one host per three-host
group can be down. If you recreate the volume with one brick of each
subvolume on every host, then any two random hosts can be tolerated.
Assume s1,s2,s3 are the current servers and you add s4,s5,s6 and extend
the volume. If any two servers in the same group go down, you lose data.
If two random hosts fail, the chance that both land in a given group is
C(3,2)/C(6,2) = 20% per group, i.e. 40% overall.
If you instead recreate the volume across s1-s6 with one brick of each
subvolume on every host, any two random servers can go down; the
probability of losing data from two random host failures is 0%.
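The 40% figure can be checked by brute force; a small bash sketch assuming
hosts 0-2 form the first group and hosts 3-5 the second:

    lost=0; total=0
    for ((i=0; i<6; i++)); do
        for ((j=i+1; j<6; j++)); do
            total=$((total+1))
            # data is lost only when both failed hosts sit in the same group
            if (( i/3 == j/3 )); then lost=$((lost+1)); fi
        done
    done
    echo "$lost of $total two-host failures lose data"   # prints: 6 of 15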

On Mon, Sep 18, 2017 at 10:39 PM, Mauro Tridici  wrote:
> Dear All,
>
> I just implemented a (6x(4+2)) DISTRIBUTED DISPERSED gluster (v.3.10) volume 
> based on the following hardware:
>
> - 3 gluster servers (each server with 2 CPU 10 cores, 64GB RAM, 12 hard disk 
> SAS 12Gb/s, 10GbE storage network)
>
> Now, we need to add 3 new servers with the same hardware configuration 
> respecting the current volume topology.
> If I'm right, we will obtain a DISTRIBUTED DISPERSED gluster volume with 12
> subvolumes, each subvolume containing (4+2) bricks, that is a [12x(4+2)]
> volume.
>
> My question is: in the current volume configuration, only 2 bricks per
> subvolume, or one host, could be down without losing data. What will happen
> in the next configuration? How many hosts could be down without losing data?
>
> Thank you very much.
> Mauro Tridici
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-14 Thread Serkan Çoban
I have the 100% CPU usage issue when I restart a glusterd instance, and
I do not have the null-client errors in my logs.
The issue was related to the number of bricks/servers, so I decreased the
brick count in the volume. That resolved the problem.
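If anyone wants to check how heavy their own volumes are before trying the
same, a quick, hedged one-liner is:

    gluster volume info | grep -E "^(Volume Name|Number of Bricks)"

which prints the brick count next to each volume name.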


On Thu, Sep 14, 2017 at 9:02 AM, Sam McLeod  wrote:
> Hi Serkan,
>
> I was wondering if you resolved your issue with the high CPU usage and hang 
> after starting gluster?
>
> I'm setting up a 3 server (replica 3, arbiter 1), 300 volume, Gluster 3.12 
> cluster on CentOS 7 and am having what looks to be exactly the same issue as 
> you.
>
> With no volumes created CPU usage / load is normal, but after creating all 
> the volumes even with no data CPU and RAM usage keeps creeping up and the 
> logs are filling up with:
>
> [2017-09-14 05:47:45.447772] E [client_t.c:324:gf_client_ref] 
> (-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
> -->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] 
> -->/lib64/libglusterfs.so.0(gf_client_ref+0x179) [0x7fe3f2cb2e59] ) 
> 0-client_t: null client [Invalid argument]
> [2017-09-14 05:47:45.486593] E [client_t.c:324:gf_client_ref] 
> (-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
> -->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] --
>
> etc...
>
> It's not an overly helpful error message as although it says a null client 
> gave an invalid argument, it doesn't state which client and what the argument 
> was.
>
> I've tried strace and valgrind on glusterd as well as starting glusterd with 
> --debug to no avail.
>
> --
> Sam McLeod
> @s_mcleod
> https://smcleod.net
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 3.10.5 vs 3.12.0 huge performance loss

2017-09-12 Thread Serkan Çoban
Sorry, I already destroyed the 3.12.0 setup and the servers are in production with 3.10.5.

On Tue, Sep 12, 2017 at 12:46 PM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
> Serkan,
> Will it be possible to provide gluster volume profile <volname> info
> output with 3.10.5 vs 3.12.0? That should give us clues about what could be
> happening.
>
> On Tue, Sep 12, 2017 at 1:51 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Hi,
>> Servers are in production with 3.10.5, so I cannot provide 3.12
>> related information anymore.
>> Thanks for help, sorry for inconvenience.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.10.5 vs 3.12.0 huge performance loss

2017-09-12 Thread Serkan Çoban
Hi,
Servers are in production with 3.10.5, so I cannot provide 3.12-related
information anymore.
Thanks for the help, and sorry for the inconvenience.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Can I use 3.7.11 server with 3.10.5 client?

2017-09-08 Thread Serkan Çoban
Any suggestions?

On Thu, Sep 7, 2017 at 4:35 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi,
>
> Is it safe to use 3.10.5 client with 3.7.11 server with read-only data
> move operation?
> Client will have 3.10.5 glusterfs-client packages. It will mount one
> volume from 3.7.11 cluster and one from 3.10.5 cluster. I will read
> from 3.7.11 and write to 3.10.5.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Can I use 3.7.11 server with 3.10.5 client?

2017-09-07 Thread Serkan Çoban
Hi,

Is it safe to use a 3.10.5 client with a 3.7.11 server for a read-only data
move operation?
The client will have the 3.10.5 glusterfs-client packages. It will mount one
volume from the 3.7.11 cluster and one from the 3.10.5 cluster. I will read
from 3.7.11 and write to 3.10.5.
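A hedged sketch of what that move could look like, with made-up server,
volume and mount-point names:

    mount -t glusterfs -o ro old-node:/vol-old /mnt/old    # 3.7.11 cluster, source
    mount -t glusterfs new-node:/vol-new /mnt/new          # 3.10.5 cluster, destination
    rsync -a /mnt/old/ /mnt/new/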
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 3.10.5 vs 3.12.0 huge performance loss

2017-09-06 Thread Serkan Çoban
It is a sequential-write workload with a 2GB file size. The same behavior
was observed with 3.11.3 too.
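For context, a minimal sketch of what one client in this ingestion test
does (mount point and file names are placeholders; 5 writers of 2GB each):

    for i in $(seq 1 5); do
        dd if=/dev/zero of=/mnt/vol0/ingest_${HOSTNAME}_$i bs=1M count=2048 conv=fsync &
    done
    wait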

On Thu, Sep 7, 2017 at 12:43 AM, Shyam Ranganathan <srang...@redhat.com> wrote:
> On 09/06/2017 05:48 AM, Serkan Çoban wrote:
>>
>> Hi,
>>
>> Just do some ingestion tests to 40 node 16+4EC 19PB single volume.
>> 100 clients are writing each has 5 threads total 500 threads.
>> With 3.10.5 each server has 800MB/s network traffic, cluster total is
>> 32GB/s
>> With 3.12.0 each server has 200MB/s network traffic, cluster total is
>> 8GB/s
>> I did not change any volume options in both configs.
>
>
> I just performed some *basic* IOZone tests on a 6 x (4+2) disperse volume
> and compared this against 3.10.5 and 3.12.0. The tests are no where near
> your capacity, but I do not see anything alarming in the results. (4 server,
> 4 clients, 4 worker thread per client)
>
> I do notice a 6% drop in Sequential and random write performance, and gains
> in the sequential and random reads.
>
> I need to improve the test to do larger files and for a longer duration,
> hence not reporting any numbers as yet.
>
> Tests were against 3.10.5 and then a down server upgrade to 3.12.0 and
> remounting on the clients (after the versions were upgraded there as well).
>
> I guess your test can be characterized as a sequential write workload
> (ingestion of data). What is the average file size being ingested? I can
> mimic something equivalent to that to look at this further.
>
> I would like to ensure there are no evident performance regressions as you
> report.
>
> Shyam
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] 3.10.5 vs 3.12.0 huge performance loss

2017-09-06 Thread Serkan Çoban
Hi,

Just did some ingestion tests on a 40-node, 16+4 EC, 19PB single volume.
100 clients are writing, each with 5 threads, 500 threads in total.
With 3.10.5 each server has 800MB/s network traffic, cluster total is 32GB/s.
With 3.12.0 each server has 200MB/s network traffic, cluster total is 8GB/s.
I did not change any volume options in either config.

Any thoughts?
Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-05 Thread Serkan Çoban
OK, I am going for 2x40-server clusters then. Thanks for the help.

On Tue, Sep 5, 2017 at 4:57 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>
> On Tue, Sep 5, 2017 at 6:13 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Some corrections to the previous mails. The problem does not happen
>> when no volumes are created.
>> It happens when volumes are created but stopped, and also when volumes
>> are started.
>> Below are 5 stack traces taken at 10-minute intervals with the volumes
>> in the stopped state.
>
>
> As I mentioned earlier, this is technically not a *hang*. Due to the costly
> handshaking operations for too many bricks from too many nodes, glusterd
> takes quite a long time to finish the handshake.
>
>>
>>
>> --1--
>> Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
>> #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
>> #1  0x7f4146312d57 in gf_timer_proc () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
>> #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
>> #1  0x0040643b in glusterfs_sigwaiter ()
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
>> #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
>> #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
>> #2  0x7f414632d8fb in pool_sweeper () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
>> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f414633fafc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x7f414634d9f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
>> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f414633fafc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x7f414634d9f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
>> #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f413befb99b in hooks_worker () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
>> #0  0x7f41462fd43d in dict_lookup_common () from
>> /usr/lib64/libglusterfs.so.0
>> #1  0x7f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
>> #2  0x7f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> #3  0x7f414630024c in dict_set_str () from
>> /usr/lib64/libglusterfs.so.0
>> #4  0x7f413be75f29 in glusterd_add_volume_to_dict () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #5  0x7f413be7647c in glusterd_add_volumes_to_export_dict () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #6  0x7f413be8cedf in glusterd_rpc_friend_add () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #7  0x7f413be4d8f7 in glusterd_ac_friend_add () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #8  0x7f413be4bbb9 in glusterd_friend_sm () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #9  0x7f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk ()
>> from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #10 0x7f413be8d3ee in glusterd_big_locked_cbk () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #11 0x7f41460cfad5 in rpc_clnt_handle_reply () from
>> /usr/lib64/libgfrpc.so.0
>> #12 0x7f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
>> #13 0x7f41460cbd68 in rpc_transport_not

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-05 Thread Serkan Çoban
  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f413befb99b in hooks_worker () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
#0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x7f41462fd44a in dict_lookup_common () from
/usr/lib64/libglusterfs.so.0
#2  0x7f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#3  0x7f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4  0x7f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5  0x7f413bf357fd in gd_add_brick_snap_details_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f413be760df in glusterd_add_volume_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f413be7647c in glusterd_add_volumes_to_export_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f413be8cedf in glusterd_rpc_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f413be4d8f7 in glusterd_ac_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f413be4bbb9 in glusterd_friend_sm () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk ()
from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x7f413be8d3ee in glusterd_big_locked_cbk () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x7f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x7f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x7f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x7f413ae8dccd in socket_event_poll_in () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x7f413ae8effe in socket_event_handler () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x7f4146362806 in event_dispatch_epoll_worker () from
/usr/lib64/libglusterfs.so.0
#19 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
#0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x7f41463622d5 in event_dispatch_epoll () from
/usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()

On Mon, Sep 4, 2017 at 5:50 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
> On Mon, 4 Sep 2017 at 20:04, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> I have been using a 60 server 1560 brick 3.7.11 cluster without
>> problems for 1 years. I did not see this problem with it.
>> Note that this problem does not happen when I install packages & start
>> glusterd & peer probe and create the volumes. But after glusterd
>> restart.
>>
>> Also note that this still happens without any volumes. So it is not
>> related with brick count I think...
>
>
> The backtrace you shared earlier involves code path where all brick details
> are synced up. So I'd be really interested to see the backtrace of this when
> there are no volumes associated.
>
>>
>>
>> On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee <amukh...@redhat.com>
>> wrote:
>> >
>> >
>> > On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban <cobanser...@gmail.com>
>> > wrote:
>> >>
>> >> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
>> >> Tried both, result is same, but the logs/stacks are from stopping and
>> >> starting glusterd only on one server while others are running.
>> >>
>> >> >2. Are you sure that pstack output was always constantly pointing on
>> >> > strcmp being stuck?
>> >> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
>> >> is from first 5-10 minutes. I will capture stack traces with 10
>> >> minutes waits and send them to you tomorrow. Also with 40 servers It
>> >> stays that way for 5 minutes and then returns to normal.
>> >>
>> >> >3. Are you absolutely sure even after few hours glusterd is stuck at
>> >> > the
>> >> > same point?
>> >> It goes to normal state after 70-80 minutes and I can run cluster
>> >> commands after that. I will check this again to be sure..
>> >
>> >
>> > So this is scalability issue you're hitting with current glusterd's
>> > design.
>> > As I mentioned earlier, peer handshaking can be a really costly
>> > operations
>&g

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Serkan Çoban
I have been using a 60-server, 1560-brick 3.7.11 cluster without
problems for 1 year. I did not see this problem with it.
Note that this problem does not happen when I install packages, start
glusterd, peer probe and create the volumes; it only happens after a
glusterd restart.

Also note that this still happens without any volumes, so I don't think it
is related to brick count...

On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>
> On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
>> Tried both, result is same, but the logs/stacks are from stopping and
>> starting glusterd only on one server while others are running.
>>
>> >2. Are you sure that pstack output was always constantly pointing on
>> > strcmp being stuck?
>> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
>> is from first 5-10 minutes. I will capture stack traces with 10
>> minutes waits and send them to you tomorrow. Also with 40 servers It
>> stays that way for 5 minutes and then returns to normal.
>>
>> >3. Are you absolutely sure even after few hours glusterd is stuck at the
>> > same point?
>> It goes to normal state after 70-80 minutes and I can run cluster
>> commands after that. I will check this again to be sure..
>
>
> So this is scalability issue you're hitting with current glusterd's design.
> As I mentioned earlier, peer handshaking can be a really costly operations
> based on you scale the cluster and hence you might experience a huge delay
> in the node bringing up all the services and be operational.
>
>>
>> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee <amukh...@redhat.com>
>> wrote:
>> >
>> >
>> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire <mchan...@redhat.com>
>> > wrote:
>> >>
>> >> Serkan,
>> >> I have gone through other mails in the mail thread as well but
>> >> responding
>> >> to this one specifically.
>> >>
>> >> Is this a source install or an RPM install ?
>> >> If this is an RPM install, could you please install the
>> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
>> >>
>> >> If this is a source install, then you'll need to configure the build
>> >> with
>> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
>> >>
>> >> Having the debuginfo package or a debug build helps to resolve the
>> >> function names and/or line numbers.
>> >> --
>> >> Milind
>> >>
>> >>
>> >>
>> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanser...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
>> >>> seconds between each trace.
>> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >>>
>> >>> Content of the first stack trace is here:
>> >>>
>> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> >>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> >>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> >>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> >>> #1  0x0040643b in glusterfs_sigwaiter ()
>> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> >>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> >>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> >>> #2  0x00303f8528fb in pool_sweeper () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> >>> /lib64/libpthread.so.0
>> >&

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Serkan Çoban
>1. On 80 nodes cluster, did you reboot only one node or multiple ones?
Tried both; the result is the same, but the logs/stacks are from stopping and
starting glusterd on only one server while the others were running.

>2. Are you sure that pstack output was always constantly pointing on strcmp
>being stuck?
It stays in a 100% CPU state for 70-80 minutes; the stacks I sent are from
the first 5-10 minutes. I will capture stack traces at 10-minute intervals
and send them to you tomorrow. Also, with 40 servers it stays that way for
5 minutes and then returns to normal.

>3. Are you absolutely sure even after few hours glusterd is stuck at the same
>point?
It goes back to a normal state after 70-80 minutes and I can run cluster
commands after that. I will check this again to be sure.

On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>
> On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire <mchan...@redhat.com> wrote:
>>
>> Serkan,
>> I have gone through other mails in the mail thread as well but responding
>> to this one specifically.
>>
>> Is this a source install or an RPM install ?
>> If this is an RPM install, could you please install the
>> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
>>
>> If this is a source install, then you'll need to configure the build with
>> --enable-debug and reinstall and retry capturing the gdb backtrace.
>>
>> Having the debuginfo package or a debug build helps to resolve the
>> function names and/or line numbers.
>> --
>> Milind
>>
>>
>>
>> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>>>
>>> Here you can find 10 stack trace samples from glusterd. I wait 10
>>> seconds between each trace.
>>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>>
>>> Content of the first stack trace is here:
>>>
>>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>>> #1  0x0040643b in glusterfs_sigwaiter ()
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>>> #2  0x00303f8528fb in pool_sweeper () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x00303f864afc in syncenv_task () from
>>> /usr/lib64/libglusterfs.so.0
>>> #2  0x00303f8729f0 in syncenv_processor () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x00303f864afc in syncenv_task () from
>>> /usr/lib64/libglusterfs.so.0
>>> #2  0x00303f8729f0 in syncenv_processor () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x7f7a898a099b in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>>> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>>> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>>

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Serkan Çoban
I usually change event-threads to 4, but those logs are from a default
installation.
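Regarding the hook scripts discussed below: assuming the default glusterd
working directory, a hedged way to see what would run on volume events is

    ls -l /var/lib/glusterd/hooks/1/*/pre/ /var/lib/glusterd/hooks/1/*/post/

which lists the pre/post scripts for create, start, stop, set, and so on.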

On Sun, Sep 3, 2017 at 9:52 PM, Ben Turner <btur...@redhat.com> wrote:
> - Original Message -
>> From: "Ben Turner" <btur...@redhat.com>
>> To: "Serkan Çoban" <cobanser...@gmail.com>
>> Cc: "Gluster Users" <gluster-users@gluster.org>
>> Sent: Sunday, September 3, 2017 2:30:31 PM
>> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
>>
>> ----- Original Message -
>> > From: "Milind Changire" <mchan...@redhat.com>
>> > To: "Serkan Çoban" <cobanser...@gmail.com>
>> > Cc: "Gluster Users" <gluster-users@gluster.org>
>> > Sent: Saturday, September 2, 2017 11:44:40 PM
>> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
>> >
>> > No worries Serkan,
>> > You can continue to use your 40 node clusters.
>> >
>> > The backtrace has resolved the function names and it should be sufficient
>> > to
>> > debug the issue.
>> > Thanks for letting us know.
>> >
>> > We'll post on this thread again to notify you about the findings.
>>
>> One of the things I find interesting is seeing:
>>
>>  #1  0x7f928450099b in hooks_worker () from
>>
>> The "hooks" scripts are usually shell scripts that get run when volumes are
>> started / stopped / etc.  It may be worth looking into what hooks scripts
>> are getting run at shutdown and think about how one of them could hang up
>> the system.  This may be a red herring but I don't see much else going on in
>> the stack trace that I looked at.  The thread with the deepest stack is the
>> hooks worker one, all of the other look to be in some sort of wait / sleep /
>> listen state.
>
> Sorry the hooks call doesn't have the deepest stack, I didn't see the other 
> thread below it.
>
> In the logs I see:
>
> [2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler] 
> 0-transport: EPOLLERR - disconnecting now
>
> You mentioned changing event threads?  Event threads controls the number of
> epoll listener threads; what did you change it to?  IIRC 2 is the default
> value.  This may be some sort of race condition?  Just my $0.02.
>
> -b
>
>>
>> -b
>>
>> >
>> >
>> >
>> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
>> > wrote:
>> >
>> >
>> > Hi Milind,
>> >
>> > Anything new about the issue? Can you able to find the problem,
>> > anything else you need?
>> > I will continue with two clusters each 40 servers, so I will not be
>> > able to provide any further info for 80 servers.
>> >
>> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
>> > wrote:
>> > > Hi,
>> > > You can find pstack sampes here:
>> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
>> > >
>> > > Here is the first one:
>> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
>> > > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
>> > > #1 0x00310fe37d57 in gf_timer_proc () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
>> > > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
>> > > #1 0x0040643b in glusterfs_sigwaiter ()
>> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
>> > > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
>> > > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
>> > > #2 0x00310fe528fb in pool_sweeper () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
>> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > > /lib64/libpthread.so.0
>> > > #1 0x00310fe64afc in syncenv_task () from
>> > > /usr/l

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-02 Thread Serkan Çoban
Hi Milind,

Anything new about the issue? Were you able to find the problem? Is there
anything else you need?
I will continue with two clusters of 40 servers each, so I will not be
able to provide any further info for 80 servers.

On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi,
> You can find pstack sampes here:
> https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
>
> Here is the first one:
> Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
> #2  0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00310fe729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00310fe729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f928450099b in hooks_worker () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> #0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00310fe2244a in dict_lookup_common () from
> /usr/lib64/libglusterfs.so.0
> #2  0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> #3  0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f928447b0df in glusterd_add_volume_to_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f9284491edf in glusterd_rpc_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f92844528f7 in glusterd_ac_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f9284450bb9 in glusterd_friend_sm () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk ()
> from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f92844923ee in glusterd_big_locked_cbk () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00311020fad5 in rpc_clnt_handle_reply () from 
> /usr/lib64/libgfrpc.so.0
> #14 0x003110210c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00311020bd68 in rpc_transport_notify () from 
> /usr/lib64/libgfrpc.so.0
> #16 0x7f9283492ccd in socket_event_poll_in () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f9283493ffe in socket_event_handler () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00310fe87806 in event_dispatch_epoll_worker () from
> /usr/lib64/libglusterfs.so.0
> #19 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003d998e8bbd 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-01 Thread Serkan Çoban
nfigure the build with
> --enable-debug and reinstall and retry capturing the gdb backtrace.
>
> Having the debuginfo package or a debug build helps to resolve the function
> names and/or line numbers.
> --
> Milind
>
>
>
> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
>>
>> Here you can find 10 stack trace samples from glusterd. I wait 10
>> seconds between each trace.
>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>
>> Content of the first stack trace is here:
>>
>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> #1  0x0040643b in glusterfs_sigwaiter ()
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> #2  0x00303f8528fb in pool_sweeper () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x00303f864afc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x00303f8729f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x00303f864afc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x00303f8729f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f7a898a099b in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> #4  0x00303f82524c in dict_set_str () from
>> /usr/lib64/libglusterfs.so.0
>> #5  0x7f7a898da7fd in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #6  0x7f7a8981b0df in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #7  0x7f7a8981b47c in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #8  0x7f7a89831edf in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #9  0x7f7a897f28f7 in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #10 0x7f7a897f0bb9 in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #11 0x7f7a8984c89a in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #12 0x7f7a898323ee in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
>> /usr/lib64/libgfrpc.so.0
>> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
>> #15 0x00303f40bd68 in rpc_transport_notify () from
>> /usr/lib64/libgfrpc.so.0
>> #16 0x7f7a88a6fccd in ?? () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #17 0x7f7a88a70ffe in ?? () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #18 0x00303f887806 in ?? () from /us

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-31 Thread Serkan Çoban
Hi Gaurav,

Is there any progress on the issue?

On Tue, Aug 29, 2017 at 1:57 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> glusterd returned to normal, here is the logs:
> https://www.dropbox.com/s/41jx2zn3uizvr53/80servers_glusterd_normal_status.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 1:47 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>> Here is the logs after stopping all three volumes and restarting
>> glusterd in all nodes. I waited 70 minutes after glusterd restart but
>> it is still consuming %100 CPU.
>> https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0
>>
>>
>> On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav <gya...@redhat.com> wrote:
>>>
>>> I believe logs you have shared logs which consist of create volume followed
>>> by starting the volume.
>>> However, you have mentioned that when a node from 80 server cluster gets
>>> rebooted, glusterd process hangs.
>>>
>>> Could you please provide the logs which led glusterd to hang for all the
>>> cases along with gusterd process utilization.
>>>
>>>
>>> Thanks
>>> Gaurav
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>>>
>>>> Here is the requested logs:
>>>>
>>>> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0
>>>>
>>>>
>>>> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav <gya...@redhat.com> wrote:
>>>> > Till now I haven't found anything significant.
>>>> >
>>>> > Can you send me gluster logs along with command-history-logs for these
>>>> > scenarios:
>>>> >  Scenario1 : 20 servers
>>>> >  Scenario2 : 40 servers
>>>> >  Scenario3:  80 Servers
>>>> >
>>>> >
>>>> > Thanks
>>>> > Gaurav
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban <cobanser...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi Gaurav,
>>>> >> Any progress about the problem?
>>>> >>
>>>> >> On Thursday, August 24, 2017, Serkan Çoban <cobanser...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Thank you Gaurav,
>>>> >>> Here is more findings:
>>>> >>> Problem does not happen using only 20 servers each has 68 bricks.
>>>> >>> (peer probe only 20 servers)
>>>> >>> If we use 40 servers with single volume, glusterd cpu %100 state
>>>> >>> continues for 5 minutes and it goes to normal state.
>>>> >>> with 80 servers we have no working state yet...
>>>> >>>
>>>> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com>
>>>> >>> wrote:
>>>> >>> >
>>>> >>> > I am working on it and will share my findings as soon as possible.
>>>> >>> >
>>>> >>> >
>>>> >>> > Thanks
>>>> >>> > Gaurav
>>>> >>> >
>>>> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
>>>> >>> > <cobanser...@gmail.com>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>>>> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>>>> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>>>> >>> >> Only way to a healthy state is destroy gluster config/rpms,
>>>> >>> >> reinstall
>>>> >>> >> and recreate volumes.
>>>> >>> >>
>>>> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
>>>> >>> >> <cobanser...@gmail.com>
>>>> >>> >> wrote:
>>>> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>>>> >>> >> > seconds between each trace.
>>>> >>> >> >
>>>> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_p

Re: [Gluster-users] glfsheal-v0.log Too many open files

2017-08-30 Thread Serkan Çoban
Hi,
Any clues on where I can change the open-file limit for the process that
writes glfsheal-v0.log?
Which process writes to this file? Is it glusterd, glusterfsd, or
another process?
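One hedged workaround, assuming glfsheal is spawned by the gluster CLI and
therefore inherits the limits of the invoking shell: raise the shell's
open-file limit before running the query, e.g.

    ulimit -n 65536
    gluster volume heal v0 info

The 65536 value is only an example.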

On Tue, Aug 29, 2017 at 3:02 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Sorry, I send the mail to devel group by mistake..
> Any help about the below issue?
>
> On Tue, Aug 29, 2017 at 3:00 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>> Hi,
>>
>> When I run gluster v heal v0 info, it gives "v0: Not able to fetch
>> volfile from glusterd" error message.
>> I see too many open files errors in glfsheal-v0.log file. How can I
>> increase open file limit for glfsheal?
>> I already increased nfile limit in /etc/init.d/glusterd and
>> /etc/init.d/glusterfsd but it did not help.
>>
>> Any suggestions?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glfsheal-v0.log Too many open files

2017-08-29 Thread Serkan Çoban
Sorry, I sent the mail to the devel group by mistake.
Any help with the issue below?

On Tue, Aug 29, 2017 at 3:00 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi,
>
> When I run gluster v heal v0 info, it gives "v0: Not able to fetch
> volfile from glusterd" error message.
> I see too many open files errors in glfsheal-v0.log file. How can I
> increase open file limit for glfsheal?
> I already increased nfile limit in /etc/init.d/glusterd and
> /etc/init.d/glusterfsd but it did not help.
>
> Any suggestions?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
glusterd returned to normal; here are the logs:
https://www.dropbox.com/s/41jx2zn3uizvr53/80servers_glusterd_normal_status.zip?dl=0


On Tue, Aug 29, 2017 at 1:47 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Here is the logs after stopping all three volumes and restarting
> glusterd in all nodes. I waited 70 minutes after glusterd restart but
> it is still consuming %100 CPU.
> https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav <gya...@redhat.com> wrote:
>>
>> I believe logs you have shared logs which consist of create volume followed
>> by starting the volume.
>> However, you have mentioned that when a node from 80 server cluster gets
>> rebooted, glusterd process hangs.
>>
>> Could you please provide the logs which led glusterd to hang for all the
>> cases along with gusterd process utilization.
>>
>>
>> Thanks
>> Gaurav
>>
>>
>>
>>
>>
>>
>> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>>
>>> Here is the requested logs:
>>>
>>> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0
>>>
>>>
>>> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav <gya...@redhat.com> wrote:
>>> > Till now I haven't found anything significant.
>>> >
>>> > Can you send me gluster logs along with command-history-logs for these
>>> > scenarios:
>>> >  Scenario1 : 20 servers
>>> >  Scenario2 : 40 servers
>>> >  Scenario3:  80 Servers
>>> >
>>> >
>>> > Thanks
>>> > Gaurav
>>> >
>>> >
>>> >
>>> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban <cobanser...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Gaurav,
>>> >> Any progress about the problem?
>>> >>
>>> >> On Thursday, August 24, 2017, Serkan Çoban <cobanser...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Thank you Gaurav,
>>> >>> Here is more findings:
>>> >>> Problem does not happen using only 20 servers each has 68 bricks.
>>> >>> (peer probe only 20 servers)
>>> >>> If we use 40 servers with single volume, glusterd cpu %100 state
>>> >>> continues for 5 minutes and it goes to normal state.
>>> >>> with 80 servers we have no working state yet...
>>> >>>
>>> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com>
>>> >>> wrote:
>>> >>> >
>>> >>> > I am working on it and will share my findings as soon as possible.
>>> >>> >
>>> >>> >
>>> >>> > Thanks
>>> >>> > Gaurav
>>> >>> >
>>> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
>>> >>> > <cobanser...@gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>>> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>>> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>>> >>> >> Only way to a healthy state is destroy gluster config/rpms,
>>> >>> >> reinstall
>>> >>> >> and recreate volumes.
>>> >>> >>
>>> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
>>> >>> >> <cobanser...@gmail.com>
>>> >>> >> wrote:
>>> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>>> >>> >> > seconds between each trace.
>>> >>> >> >
>>> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>> >>> >> >
>>> >>> >> > Content of the first stack trace is here:
>>> >>> >> >
>>> >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> >>> >> > #0  0x003aa5c0f00d in nanosleep () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
Here are the logs after stopping all three volumes and restarting
glusterd on all nodes. I waited 70 minutes after the glusterd restart but
it is still consuming 100% CPU.
https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0


On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav <gya...@redhat.com> wrote:
>
> I believe the logs you have shared consist of a volume create followed
> by starting the volume.
> However, you have mentioned that when a node from the 80-server cluster gets
> rebooted, the glusterd process hangs.
>
> Could you please provide the logs that led glusterd to hang for all the
> cases, along with the glusterd process utilization.
>
>
> Thanks
> Gaurav
>
>
>
>
>
>
> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Here is the requested logs:
>>
>> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0
>>
>>
>> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav <gya...@redhat.com> wrote:
>> > Till now I haven't found anything significant.
>> >
>> > Can you send me gluster logs along with command-history-logs for these
>> > scenarios:
>> >  Scenario1 : 20 servers
>> >  Scenario2 : 40 servers
>> >  Scenario3:  80 Servers
>> >
>> >
>> > Thanks
>> > Gaurav
>> >
>> >
>> >
>> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban <cobanser...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Gaurav,
>> >> Any progress about the problem?
>> >>
>> >> On Thursday, August 24, 2017, Serkan Çoban <cobanser...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Thank you Gaurav,
>> >>> Here is more findings:
>> >>> Problem does not happen using only 20 servers each has 68 bricks.
>> >>> (peer probe only 20 servers)
>> >>> If we use 40 servers with single volume, glusterd cpu %100 state
>> >>> continues for 5 minutes and it goes to normal state.
>> >>> with 80 servers we have no working state yet...
>> >>>
>> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com>
>> >>> wrote:
>> >>> >
>> >>> > I am working on it and will share my findings as soon as possible.
>> >>> >
>> >>> >
>> >>> > Thanks
>> >>> > Gaurav
>> >>> >
>> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
>> >>> > <cobanser...@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>> >>> >> Only way to a healthy state is destroy gluster config/rpms,
>> >>> >> reinstall
>> >>> >> and recreate volumes.
>> >>> >>
>> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
>> >>> >> <cobanser...@gmail.com>
>> >>> >> wrote:
>> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>> >>> >> > seconds between each trace.
>> >>> >> >
>> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >>> >> >
>> >>> >> > Content of the first stack trace is here:
>> >>> >> >
>> >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> >>> >> > #0  0x003aa5c0f00d in nanosleep () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>> >>&g

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
Here are the requested logs:
https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0


On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav <gya...@redhat.com> wrote:
> Till now I haven't found anything significant.
>
> Can you send me gluster logs along with command-history-logs for these
> scenarios:
>  Scenario1 : 20 servers
>  Scenario2 : 40 servers
>  Scenario3:  80 Servers
>
>
> Thanks
> Gaurav
>
>
>
> On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
>>
>> Hi Gaurav,
>> Any progress about the problem?
>>
>> On Thursday, August 24, 2017, Serkan Çoban <cobanser...@gmail.com> wrote:
>>>
>>> Thank you Gaurav,
>>> Here is more findings:
>>> Problem does not happen using only 20 servers each has 68 bricks.
>>> (peer probe only 20 servers)
>>> If we use 40 servers with single volume, glusterd cpu %100 state
>>> continues for 5 minutes and it goes to normal state.
>>> with 80 servers we have no working state yet...
>>>
>>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com> wrote:
>>> >
>>> > I am working on it and will share my findings as soon as possible.
>>> >
>>> >
>>> > Thanks
>>> > Gaurav
>>> >
>>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban <cobanser...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>>> >> Only way to a healthy state is destroy gluster config/rpms, reinstall
>>> >> and recreate volumes.
>>> >>
>>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban <cobanser...@gmail.com>
>>> >> wrote:
>>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>>> >> > seconds between each trace.
>>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>> >> >
>>> >> > Content of the first stack trace is here:
>>> >> >
>>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>>> >> > #2  0x00303f8528fb in pool_sweeper () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> >> > from
>>> >> > /lib64/libpthread.so.0
>>> >> > #1  0x00303f864afc in syncenv_task () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> >> > from
>>> >> > /lib64/libpthread.so.0
>>> >> > #1  0x00303f864a

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-27 Thread Serkan Çoban
Hi Gaurav,
Any progress on the problem?

On Thursday, August 24, 2017, Serkan Çoban <cobanser...@gmail.com> wrote:

> Thank you Gaurav,
> Here is more findings:
> Problem does not happen using only 20 servers each has 68 bricks.
> (peer probe only 20 servers)
> If we use 40 servers with single volume, glusterd cpu %100 state
> continues for 5 minutes and it goes to normal state.
> with 80 servers we have no working state yet...
>
> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com
> <javascript:;>> wrote:
> >
> > I am working on it and will share my findings as soon as possible.
> >
> >
> > Thanks
> > Gaurav
> >
> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban <cobanser...@gmail.com
> <javascript:;>> wrote:
> >>
> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
> >> Only way to a healthy state is destroy gluster config/rpms, reinstall
> >> and recreate volumes.
> >>
> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban <cobanser...@gmail.com
> <javascript:;>>
> >> wrote:
> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
> >> > seconds between each trace.
> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> >
> >> > Content of the first stack trace is here:
> >> >
> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >> > #1  0x0040643b in glusterfs_sigwaiter ()
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> >> > #2  0x00303f8528fb in pool_sweeper () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x00303f864afc in syncenv_task () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x00303f864afc in syncenv_task () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x7f7a898a099b in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> >> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> >> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #3  0x00303f8245f5 in dict_set () fro

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-24 Thread Serkan Çoban
Thank you Gaurav,
Here are more findings:
The problem does not happen when using only 20 servers, each with 68 bricks
(peer probe only 20 servers).
With 40 servers and a single volume, glusterd stays at 100% CPU for about 5
minutes and then returns to a normal state.
With 80 servers we have no working state yet...
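
(For anyone reproducing this, a minimal way to grow the pool in batches while
watching glusterd; the hostnames and batch size below are only placeholders:)

# probe the next batch of 20 peers, then check glusterd before adding more
for h in server{21..40}; do gluster peer probe "$h"; done
gluster peer status | grep -c 'Peer in Cluster'
top -b -n 1 -p "$(pidof glusterd)" | tail -n 2   # glusterd CPU usage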

On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav <gya...@redhat.com> wrote:
>
> I am working on it and will share my findings as soon as possible.
>
>
> Thanks
> Gaurav
>
> On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>> Only way to a healthy state is destroy gluster config/rpms, reinstall
>> and recreate volumes.
>>
>> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>> > Here you can find 10 stack trace samples from glusterd. I wait 10
>> > seconds between each trace.
>> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >
>> > Content of the first stack trace is here:
>> >
>> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> > #1  0x0040643b in glusterfs_sigwaiter ()
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> > #2  0x00303f8528fb in pool_sweeper () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x00303f864afc in syncenv_task () from
>> > /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f8729f0 in syncenv_processor () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x00303f864afc in syncenv_task () from
>> > /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f8729f0 in syncenv_processor () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x7f7a898a099b in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>> > #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> > #4  0x00303f82524c in dict_set_str () from
>> > /usr/lib64/libglusterfs.so.0
>> > #5  0x7f7a898da7fd in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #6  0x7f7a8981b0df in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #7  0x7f7a8981b47c in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #8  0x7f7a89831edf in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/m

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-24 Thread Serkan Çoban
Restarting glusterd causes the same thing. I tried 3.12.rc0, 3.10.5,
3.8.15, and 3.7.20; all show the same behavior.
My OS is CentOS 6.9; I also tried CentOS 6.8 and the problem remains...
The only way back to a healthy state is to destroy the gluster config/rpms,
reinstall, and recreate the volumes.

On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from 
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from 
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/li

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-23 Thread Serkan Çoban
Here you can find 10 stack trace samples from glusterd, taken 10 seconds
apart:
https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
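
(For reference, a minimal way to capture such samples, assuming pstack is
installed and glusterd is running:)

# take 10 pstack samples of glusterd, 10 seconds apart
pid=$(pidof glusterd)
for i in $(seq 1 10); do
    pstack "$pid" > "glusterd_pstack_$i.txt"
    sleep 10
done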

Content of the first stack trace is here:

Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
#0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
#0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
#1  0x0040643b in glusterfs_sigwaiter ()
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
#0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
#1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
#2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
#0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
#0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
#0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f7a898a099b in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
#0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
#3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5  0x7f7a898da7fd in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f7a8981b0df in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f7a8981b47c in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f7a89831edf in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f7a897f28f7 in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f7a897f0bb9 in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f7a8984c89a in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x7f7a898323ee in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x00303f40fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00303f40bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x7f7a88a6fccd in ?? () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x7f7a88a70ffe in ?? () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
#19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
#0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()

On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
> Could you be able to provide the pstack dump of the glusterd process?
>
> On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee <amukh...@redhat.com> wrote:
>>
>> Not yet. Gaurav will be taking a look at it tomorrow.
>>
>> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban <cobanser...@gmail.com> wrote:
>>>
>>> Hi Atin,
>>>
>>> Do you have time to check the logs?
>>>
>>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban <cobanser...@gmail.com>
>>> wrote:
>>> > Same thing happens with 3.12.rc0. This time perf top 

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-23 Thread Serkan Çoban
Hi Atin,

Do you have time to check the logs?

On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Same thing happens with 3.12.rc0. This time perf top shows hanging in
> libglusterfs.so and below is the glusterd logs, which are different
> from 3.10.
> With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
> brick processes come online and system starts to answer commands like
> "gluster peer status"..
>
> [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] --

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-23 Thread Serkan Çoban
] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154649] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154705] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154774] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154852] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154903] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154995] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.155052] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.155141] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:27.074052] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:27.077034] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]

On Tue, Aug 22, 2017 at 7:00 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> I reboot multiple times, also I destroyed the gluster configuration
> and recreate multiple times. The behavior is same.
>
> On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>> My guess is there is a corruption in vol list or peer list which has lead
>> glusterd to get into a infinite loop of traversing a peer/volume list and
>> CPU to hog up. Again this is a guess and I've not got a chance to take a
>> detail look at the logs and the strace output.
>>
>> I believe if you get to reboot the node again the problem will disappear.
>>
>> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanser...@gmail.com> wrote:
>>>
>>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
>>> glusterd %100 cpu usage
>>> Hope this helps...
>>>
>>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanser...@gmail.com>
>>> wrote:
>>> > Hi there,
>>> >
>>> > I have a strange problem.
>>> > Gluster version in 3.10.5, I am testing new servers. Gluster
>>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
>>> > I can successfully create the clu

Re: [Gluster-users] Brick count limit in a volume

2017-08-22 Thread Serkan Çoban
This is the command line output:
Total brick list is larger than a request. Can take (brick_count )
Usage: volume create  [stripe ] [replica ] 

I am testing whether a big single volume will work for us. For now I am
continuing testing with three volumes of 13PB each...
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Brick count limit in a volume

2017-08-22 Thread Serkan Çoban
Hi, I think this is the line limiting brick count:
https://github.com/gluster/glusterfs/blob/c136024613c697fec87aaff3a070862b92c57977/cli/src/cli-cmd-parser.c#L84

Can the Gluster devs increase this limit? Should I open a GitHub issue?

On Mon, Aug 21, 2017 at 7:01 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi,
> Gluster version is 3.10.5. I am trying to create a 5500 brick volume,
> but getting an error stating that  bricks is the limit. Is this a
> known limit? Can I change this with an option?
>
> Thanks,
> Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-22 Thread Serkan Çoban
I rebooted multiple times, and I also destroyed the gluster configuration
and recreated it multiple times. The behavior is the same.

On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
> My guess is there is a corruption in vol list or peer list which has lead
> glusterd to get into a infinite loop of traversing a peer/volume list and
> CPU to hog up. Again this is a guess and I've not got a chance to take a
> detail look at the logs and the strace output.
>
> I believe if you get to reboot the node again the problem will disappear.
>
> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
>> glusterd %100 cpu usage
>> Hope this helps...
>>
>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>> > Hi there,
>> >
>> > I have a strange problem.
>> > Gluster version in 3.10.5, I am testing new servers. Gluster
>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
>> > I can successfully create the cluster and volumes without any
>> > problems. I write data to cluster from 100 clients for 12 hours again
>> > no problem. But when I try to reboot a node, glusterd process hangs on
>> > %100 CPU usage and seems to do nothing, no brick processes come
>> > online. You can find strace of glusterd process for 1 minutes here:
>> >
>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>> >
>> > Here is the glusterd logs:
>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>> >
>> >
>> > By the way, reboot of one server completes without problem if I reboot
>> > the servers before creating any volumes.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> - Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-22 Thread Serkan Çoban
In addition, perf top shows about 80% of the time spent in libc-2.12.so
__strcmp_sse42 while glusterd is at 100% CPU usage.
Hope this helps...
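
(Something like the following can reproduce that observation, assuming perf
is installed:)

# sample only the glusterd process while it is spinning at 100% CPU
perf top -p "$(pidof glusterd)"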

On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi there,
>
> I have a strange problem.
> Gluster version in 3.10.5, I am testing new servers. Gluster
> configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
> I can successfully create the cluster and volumes without any
> problems. I write data to cluster from 100 clients for 12 hours again
> no problem. But when I try to reboot a node, glusterd process hangs on
> %100 CPU usage and seems to do nothing, no brick processes come
> online. You can find strace of glusterd process for 1 minutes here:
>
> https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>
> Here is the glusterd logs:
> https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>
>
> By the way, reboot of one server completes without problem if I reboot
> the servers before creating any volumes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Glusterd process hangs on reboot

2017-08-22 Thread Serkan Çoban
Hi there,

I have a strange problem.
The Gluster version is 3.10.5 and I am testing new servers. The Gluster
configuration is 16+4 EC; I have three volumes, each with 1600 bricks.
I can successfully create the cluster and volumes without any problems,
and I can write data to the cluster from 100 clients for 12 hours, again
with no problem. But when I reboot a node, the glusterd process hangs at
100% CPU usage and seems to do nothing; no brick processes come online.
You can find a one-minute strace of the glusterd process here:

https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
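
(A capture like that can be taken with something along these lines; the
output file name is just an example:)

# follow all glusterd threads for 60 seconds, with timestamps on each syscall
timeout 60 strace -f -tt -p "$(pidof glusterd)" -o gluster_strace.out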

Here are the glusterd logs:
https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0


By the way, rebooting a server completes without problems if I reboot
before creating any volumes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Brick count limit in a volume

2017-08-21 Thread Serkan Çoban
Hi,
Gluster version is 3.10.5. I am trying to create a 5500-brick volume,
but I am getting an error stating that  bricks is the limit. Is this a
known limit? Can I change it with an option?

Thanks,
Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] 3.10.4 packages are missing

2017-08-07 Thread Serkan Çoban
Hi,

I cannot find the gluster 3.10.4 packages in the CentOS repos. The 3.11
release is also missing. Can anyone fix this, please?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Serkan Çoban
Did you test healing by increasing disperse.shd-max-threads?
What are your heal times per brick now?

On Fri, Jun 30, 2017 at 8:01 PM, Alastair Neil <ajneil.t...@gmail.com> wrote:
> We are using 3.10 and have a 7 PB cluster.  We decided against 16+3 as the
> rebuild time are bottlenecked by matrix operations which scale as the square
> of the number of data stripes.  There are some savings because of larger
> data chunks but we ended up using 8+3 and heal times are about half compared
> to 16+3.
>
> -Alastair
>
> On 30 June 2017 at 02:22, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> >Thanks for the reply. We will mainly use this for archival - near-cold
>> > storage.
>> Archival usage is good for EC
>>
>> >Anything, from your experience, to keep in mind while planning large
>> > installations?
>> I am using 3.7.11 and only problem is slow rebuild time when a disk
>> fails. It takes 8 days to heal a 8TB disk.(This might be related with
>> my EC configuration 16+4)
>> 3.9+ versions has some improvements about this but I cannot test them
>> yet...
>>
>> On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak <jkieb...@gmail.com> wrote:
>> > Thanks for the reply. We will mainly use this for archival - near-cold
>> > storage.
>> >
>> >
>> > Anything, from your experience, to keep in mind while planning large
>> > installations?
>> >
>> >
>> > Sent from my Verizon, Samsung Galaxy smartphone
>> >
>> >  Original message 
>> > From: Serkan Çoban <cobanser...@gmail.com>
>> > Date: 6/29/17 4:39 AM (GMT-05:00)
>> > To: Jason Kiebzak <jkieb...@gmail.com>
>> > Cc: Gluster Users <gluster-users@gluster.org>
>> > Subject: Re: [Gluster-users] Multi petabyte gluster
>> >
>> > I am currently using 10PB single volume without problems. 40PB is on
>> > the way. EC is working fine.
>> > You need to plan ahead with large installations like this. Do complete
>> > workload tests and make sure your use case is suitable for EC.
>> >
>> >
>> > On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak <jkieb...@gmail.com>
>> > wrote:
>> >> Has anyone scaled to a multi petabyte gluster setup? How well does
>> >> erasure
>> >> code do with such a large setup?
>> >>
>> >> Thanks
>> >>
>> >> ___
>> >> Gluster-users mailing list
>> >> Gluster-users@gluster.org
>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Serkan Çoban
>Thanks for the reply. We will mainly use this for archival - near-cold storage.
Archival usage is a good fit for EC.

>Anything, from your experience, to keep in mind while planning large 
>installations?
I am using 3.7.11 and the only problem is the slow rebuild time when a disk
fails. It takes 8 days to heal an 8TB disk (this might be related to my
16+4 EC configuration).
The 3.9+ versions have some improvements in this area, but I cannot test
them yet...
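
(As a rough sanity check, 8TB in 8 days works out to roughly 12MB/s of
healed data per brick:)

# 8 TB healed in 8 days -> average heal throughput in MB/s
echo "scale=1; 8 * 1024 * 1024 / (8 * 24 * 3600)" | bc   # ~12.1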

On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak <jkieb...@gmail.com> wrote:
> Thanks for the reply. We will mainly use this for archival - near-cold
> storage.
>
>
> Anything, from your experience, to keep in mind while planning large
> installations?
>
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
> ---- Original message 
> From: Serkan Çoban <cobanser...@gmail.com>
> Date: 6/29/17 4:39 AM (GMT-05:00)
> To: Jason Kiebzak <jkieb...@gmail.com>
> Cc: Gluster Users <gluster-users@gluster.org>
> Subject: Re: [Gluster-users] Multi petabyte gluster
>
> I am currently using 10PB single volume without problems. 40PB is on
> the way. EC is working fine.
> You need to plan ahead with large installations like this. Do complete
> workload tests and make sure your use case is suitable for EC.
>
>
> On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak <jkieb...@gmail.com> wrote:
>> Has anyone scaled to a multi petabyte gluster setup? How well does erasure
>> code do with such a large setup?
>>
>> Thanks
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-29 Thread Serkan Çoban
I am currently using a 10PB single volume without problems; 40PB is on
the way. EC is working fine.
You need to plan ahead with installations this large. Run complete
workload tests and make sure your use case is suitable for EC.


On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak  wrote:
> Has anyone scaled to a multi petabyte gluster setup? How well does erasure
> code do with such a large setup?
>
> Thanks
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Heal operation detail of EC volumes

2017-06-01 Thread Serkan Çoban
>Is it possible that this matches your observations ?
Yes, that matches what I see. So 19 files are being healed in parallel by
19 SHD processes; I thought only one file was being healed at a time.
Then what is the meaning of the disperse.shd-max-threads parameter? If I
set it to 2, will each SHD heal two files at the same time?
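
(For context, this is the knob in question; a minimal sketch, assuming a
volume named myvol and gluster >= 3.9:)

gluster volume get myvol disperse.shd-max-threads    # show the current value
gluster volume set myvol disperse.shd-max-threads 2  # raise the parallel heal limit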

>How many IOPS can handle your bricks ?
The bricks are 7200 RPM NL-SAS disks, 70-80 random IOPS max. But the write
pattern looks sequential: 30-40MB bulk writes every 4-5 seconds.
This is what iostat shows.
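
(That per-disk pattern is visible with, for example:)

iostat -xm 5   # extended per-device stats in MB, refreshed every 5 seconds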

>Do you have a test environment where we could check all this ?
Not currently, but I will in 4-5 weeks. New servers are arriving; I will
add this test to my notes.

> There's a feature to allow to configure the self-heal block size to optimize 
> these cases. The feature is available on 3.11.
I did not see this in the 3.11 release notes; what parameter name should I look for?
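
(For reference, a small sketch of the traffic arithmetic from Xavi's reply
quoted below, assuming 19 healthy SHDs each healing one file in a 16+4 set:)

healthy=19                              # healthy servers, one SHD heal each
frags=16                                # data fragments read per heal
total=$(( healthy * frags ))            # 304 fragment reads in flight
remote=$(( total - total / healthy ))   # 288 of them cross the network
per_server=$(( remote / healthy ))      # ~15 reads sent and answered by each healthy server
echo "$total $remote $per_server"
# each healthy server also sends one reconstructed fragment to the damaged
# brick, so inbound and outbound traffic are nearly symmetric everywhere
# except on the damaged brick, which mostly receives (~19 fragments of writes)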



On Thu, Jun 1, 2017 at 10:30 AM, Xavier Hernandez <xhernan...@datalab.es> wrote:
> Hi Serkan,
>
> On 30/05/17 10:22, Serkan Çoban wrote:
>>
>> Ok I understand that heal operation takes place on server side. In
>> this case I should see X KB
>>  out network traffic from 16 servers and 16X KB input traffic to the
>> failed brick server right? So that process will get 16 chunks
>> recalculate our chunk and write it to disk.
>
>
> That should be the normal operation for a single heal.
>
>> The problem is I am not seeing such kind of traffic on servers. In my
>> configuration (16+4 EC) I see 20 servers are all have 7-8MB outbound
>> traffic and none of them has more than 10MB incoming traffic.
>> Only heal operation is happening on cluster right now, no client/other
>> traffic. I see constant 7-8MB write to healing brick disk. So where is
>> the missing traffic?
>
>
> Not sure about your configuration, but probably you are seeing the result of
> having the SHD of each server doing heals. That would explain the network
> traffic you have.
>
> Suppose that all SHD but the one on the damaged brick are working. In this
> case 19 servers will peek 16 fragments each. This gives 19 * 16 = 304
> fragments to be requested. EC balances the reads among all available
> servers, and there's a chance (1/19) that a fragment is local to the server
> asking it. So we'll need a total of 304 - 304 / 19 = 288 network requests,
> 288 / 19 = 15.2 sent by each server.
>
> If we have a total of 288 requests, it means that each server will answer
> 288 / 19 = 15.2 requests. The net effect of all this is that each healthy
> server is sending 15.2*X bytes of data and each server is receiving 15.2*X
> bytes of data.
>
> Now we need to account for the writes to the damaged brick. We have 19
> simultaneous heals. This means that the damaged brick will receive 19*X
> bytes of data, and each healthy server will send X additional bytes of data.
>
> So:
>
> A healthy server receives 15.2*X bytes of data
> A healthy server sends 16.2*X bytes of data
> A damaged server receives 19*X bytes of data
> A damaged server sends few bytes of data (communication and synchronization
> overhead basically)
>
> As you can see, in this configuration each server has almost the same amount
> of inbound and outbound traffic. Only big difference is the damaged brick,
> that should receive a little more of traffic, but it should send much less.
>
> Is it possible that this matches your observations ?
>
> There's one more thing to consider here, and it's the apparent low
> throughput of self-heal. One possible thing to check is the small size and
> random behavior of the requests.
>
> Assuming that each request has a size of ~128 / 16 = 8KB, at a rate of ~8
> MB/s the servers are processing ~1000 IOPS. Since requests are going to 19
> different files, even if each file is accessed sequentially, the real effect
> will be like random access (some read-ahead on the filesystem can improve
> reads a bit, but writes won't benefit so much).
>
> How many IOPS can handle your bricks ?
>
> Do you have a test environment where we could check all this ? if possible
> it would be interesting to have only a single SHD (kill all SHD from all
> servers but one). In this situation, without client accesses, we should see
> the 16/1 ratio of reads vs writes on the network. We should also see a
> similar of even a little better speed because all reads and writes will be
> sequential, optimizing available IOPS.
>
> There's a feature to allow to configure the self-heal block size to optimize
> these cases. The feature is available on 3.11.
>
> Best regards,
>
> Xavi
>
>
>>
>> On Tue, May 30, 2017 at 10:25 AM, Ashish Pandey <aspan...@redhat.com>
>> wrote:
>>>
>

Re: [Gluster-users] Heal operation detail of EC volumes

2017-05-30 Thread Serkan Çoban
OK, I understand that the heal operation takes place on the server side. In
that case I should see X KB of outbound network traffic from each of the 16
servers and 16X KB of inbound traffic to the failed brick's server, right?
That process will get 16 chunks, recalculate its own chunk, and write it to
disk.
The problem is that I am not seeing that kind of traffic on the servers. In
my configuration (16+4 EC) all 20 servers have 7-8MB of outbound traffic and
none of them has more than 10MB of incoming traffic.
Only the heal operation is happening on the cluster right now, no client or
other traffic. I see a constant 7-8MB write to the healing brick's disk. So
where is the missing traffic?

On Tue, May 30, 2017 at 10:25 AM, Ashish Pandey <aspan...@redhat.com> wrote:
>
> When we say client side heal or server side heal, we basically talking about
> the side which "triggers" heal of a file.
>
> 1 - server side heal - shd scans indices and triggers heal
>
> 2 - client side heal - a fop finds that file needs heal and it triggers heal
> for that file.
>
> Now, what happens when heal gets triggered.
> In both  the cases following functions takes part -
>
> ec_heal => ec_heal_throttle=>ec_launch_heal
>
> Now ec_launch_heal just creates heal tasks (with ec_synctask_heal_wrap which
> calls ec_heal_do ) and put it into a queue.
> This happens on server and "syncenv" infrastructure which is nothing but a
> set of workers pick these tasks and execute it. That is when actual
> read/write for
> heal happens.
>
>
> 
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Ashish Pandey" <aspan...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Monday, May 29, 2017 6:44:50 PM
> Subject: Re: [Gluster-users] Heal operation detail of EC volumes
>
>
>>>Healing could be triggered by client side (access of file) or server side
>>> (shd).
>>>However, in both the cases actual heal starts from "ec_heal_do" function.
> If I do a recursive getfattr operation from clients, then all heal
> operation is done on clients right? Client read the chunks, calculate
> and write the missing chunk.
> And If I don't access files from client then SHD daemons will start
> heal and read,calculate,write the missing chunks right?
>
> In first case EC calculations takes places in client fuse process, in
> second case EC calculations will be made in SHD process right?
> Does brick process has any role in EC calculations?
>
> On Mon, May 29, 2017 at 3:32 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>>
>>
>> 
>> From: "Serkan Çoban" <cobanser...@gmail.com>
>> To: "Gluster Users" <gluster-users@gluster.org>
>> Sent: Monday, May 29, 2017 5:13:06 PM
>> Subject: [Gluster-users] Heal operation detail of EC volumes
>>
>> Hi,
>>
>> When a brick fails in EC, What is the healing read/write data path?
>> Which processes do the operations?
>>
>> Healing could be triggered by client side (access of file) or server side
>> (shd).
>> However, in both the cases actual heal starts from "ec_heal_do" function.
>>
>>
>> Assume a 2GB file is being healed in 16+4 EC configuration. I was
>> thinking that SHD deamon on failed brick host will read 2GB from
>> network and reconstruct its 100MB chunk and write it on to brick. Is
>> this right?
>>
>> You are correct about read/write.
>> The only point is that, SHD deamon on one of the good brick will pick the
>> index entry and heal it.
>> SHD deamon scans the .glusterfs/index directory and heals the entries. If
>> the brick went down while IO was going on, index will be present on killed
>> brick also.
>> However, if a brick was down and then you started writing on a file then
>> in
>> this case index entry would not be present on killed brick.
>> So even after brick will be  UP, sdh on that brick will not be able to
>> find
>> it out this index. However, other bricks would have entries and shd on
>> that
>> brick will heal it.
>>
>> Note: I am considering each brick on different node.
>>
>> Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Heal operation detail of EC volumes

2017-05-29 Thread Serkan Çoban
>>Healing could be triggered by client side (access of file) or server side 
>>(shd).
>>However, in both the cases actual heal starts from "ec_heal_do" function.
If I do a recursive getfattr operation from the clients, then all the heal
work is done on the clients, right? The client reads the chunks, then
calculates and writes the missing chunk.
And if I don't access the files from a client, then the SHD daemons will
start the heal and read, calculate, and write the missing chunks, right?

In the first case the EC calculations take place in the client fuse
process, and in the second case they are done in the SHD process, right?
Does the brick process have any role in the EC calculations?
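
(A recursive client-side traversal of the sort mentioned above can be as
simple as the following, run from a fuse mount; the mount path is just an
example:)

# stat every file through the mount; the lookups let the client notice
# files that need heal
find /mnt/glustervol -type f -exec stat {} + > /dev/null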

On Mon, May 29, 2017 at 3:32 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>
>
> ____
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Gluster Users" <gluster-users@gluster.org>
> Sent: Monday, May 29, 2017 5:13:06 PM
> Subject: [Gluster-users] Heal operation detail of EC volumes
>
> Hi,
>
> When a brick fails in EC, What is the healing read/write data path?
> Which processes do the operations?
>
> Healing could be triggered by client side (access of file) or server side
> (shd).
> However, in both the cases actual heal starts from "ec_heal_do" function.
>
>
> Assume a 2GB file is being healed in 16+4 EC configuration. I was
> thinking that SHD deamon on failed brick host will read 2GB from
> network and reconstruct its 100MB chunk and write it on to brick. Is
> this right?
>
> You are correct about read/write.
> The only point is that, SHD deamon on one of the good brick will pick the
> index entry and heal it.
> SHD deamon scans the .glusterfs/index directory and heals the entries. If
> the brick went down while IO was going on, index will be present on killed
> brick also.
> However, if a brick was down and then you started writing on a file then in
> this case index entry would not be present on killed brick.
> So even after brick will be  UP, sdh on that brick will not be able to find
> it out this index. However, other bricks would have entries and shd on that
> brick will heal it.
>
> Note: I am considering each brick on different node.
>
> Ashish
>
>
>
>
>
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Heal operation detail of EC volumes

2017-05-29 Thread Serkan Çoban
Hi,

When a brick fails in EC, what is the healing read/write data path?
Which processes do the operations?

Assume a 2GB file is being healed in a 16+4 EC configuration. I was
thinking that the SHD daemon on the failed brick's host will read 2GB from
the network, reconstruct its ~100MB chunk, and write it onto the brick. Is
this right?
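
(For the 16+4 example above, the per-brick fragment is 2GB/16, i.e. closer
to 128MB than 100MB, and a full reconstruction reads 16 such fragments:)

# 2 GB file split across 16 data bricks
echo $(( 2048 / 16 )) MB per fragment            # 128 MB stored on each data brick
echo $(( 16 * 2048 / 16 )) MB read to rebuild    # 2048 MB read from the good bricks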
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Hash function

2017-05-28 Thread Serkan Çoban
Hashing is done on file names, but each directory has its own hash layout
(range assignment), so the same file name under different directories can
map to different bricks.
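
(This is visible on the bricks themselves: every directory carries its own
layout range in the trusted.glusterfs.dht xattr. The brick paths below are
just examples.)

# compare the layout range assigned to the same brick for two directories
getfattr -n trusted.glusterfs.dht -e hex /bricks/brick1/dir1
getfattr -n trusted.glusterfs.dht -e hex /bricks/brick1/dir2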

On Sun, May 28, 2017 at 1:00 PM, Stephen Remde
 wrote:
> Hi all,
>
> Am I correct in thinking the hash used to determine the storage brick used
> only looks at the file name (not the path)?
>
> If so can this functionality be overridden to potentially only use the path?
> My volume contains lots of files however only about 12 different filenames!
> Processing is also done on a per directory basis.
>
> Any advice would be appreciated.
>
> Steve
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Bad perf for small files on large EC volume

2017-05-08 Thread Serkan Çoban
There are about 300 million files, right? Or am I counting wrong?
With that file profile I would never use EC in the first place.
Maybe you can pack the files into tar archives or something similar before
migrating to gluster, as sketched below?
It will take ages to heal a drive with that file count...
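
(A minimal sketch of that kind of packing, one archive per source directory;
the paths are placeholders:)

# pack each top-level source directory into a single tar on the gluster mount
cd /data/source || exit 1
for d in */; do
    tar -cf "/mnt/glustervol/archives/${d%/}.tar" "$d"
done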

On Mon, May 8, 2017 at 3:59 PM, Ingard Mevåg  wrote:
> With attachments :)
>
> 2017-05-08 14:57 GMT+02:00 Ingard Mevåg :
>>
>> Hi
>>
>> We've got 3 servers with 60 drives each setup with an EC volume running on
>> gluster 3.10.0
>> The servers are connected via 10gigE.
>>
>> We've done the changes recommended here :
>> https://bugzilla.redhat.com/show_bug.cgi?id=1349953#c17 and we're able to
>> max out the network with the iozone tests referenced in the same ticket.
>>
>> However for small files we are getting 3-5 MB/s with the smallfile_cli.py
>> tool. For instance:
>> python smallfile_cli.py --operation create --threads 32 --file-size 100
>> --files 1000 --top /tmp/dfs-archive-001/
>> .
>> .
>> total threads = 32
>> total files = 31294
>> total data = 2.984 GB
>>  97.79% of requested files processed, minimum is  90.00
>> 785.542908 sec elapsed time
>> 39.837416 files/sec
>> 39.837416 IOPS
>> 3.890373 MB/sec
>> .
>>
>> We're going to use these servers for archive purposes, so the files will
>> be moved there and accessed very little. After noticing our migration tool
>> performing very badly we did some analyses on the data actually being moved
>> :
>>
>> Bucket 31808791 (16.27 GB) :: 0 bytes - 1.00 KB
>> Bucket 49448258 (122.89 GB) :: 1.00 KB - 5.00 KB
>> Bucket 13382242 (96.92 GB) :: 5.00 KB - 10.00 KB
>> Bucket 13557684 (195.15 GB) :: 10.00 KB - 20.00 KB
>> Bucket 22735245 (764.96 GB) :: 20.00 KB - 50.00 KB
>> Bucket 15101878 (1041.56 GB) :: 50.00 KB - 100.00 KB
>> Bucket 10734103 (1558.35 GB) :: 100.00 KB - 200.00 KB
>> Bucket 17695285 (5773.74 GB) :: 200.00 KB - 500.00 KB
>> Bucket 13632394 (10039.92 GB) :: 500.00 KB - 1.00 MB
>> Bucket 21815815 (32641.81 GB) :: 1.00 MB - 2.00 MB
>> Bucket 36940815 (117683.33 GB) :: 2.00 MB - 5.00 MB
>> Bucket 13580667 (91899.10 GB) :: 5.00 MB - 10.00 MB
>> Bucket 10945768 (232316.33 GB) :: 10.00 MB - 50.00 MB
>> Bucket 1723848 (542581.89 GB) :: 50.00 MB - 9223372036.85 GB
>>
>> So it turns out we've got a very large number of very small files being
>> written to this volume.
>> I've attached the volume config and 2 profiling runs so if someone wants
>> to take a look and maybe give us some hints in terms of what volume settings
>> will be best for writing a lot of small files that would be much
>> appreciated.
>>
>> kind regards
>> ingard
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-08 Thread Serkan Çoban
>What network do you have?
We have 2x10G bonded interfaces on each server.

Thanks to Xavier for the detailed explanation of the EC internals.

On Sat, May 6, 2017 at 2:20 AM, Alastair Neil <ajneil.t...@gmail.com> wrote:
> What network do you have?
>
>
> On 5 May 2017 at 09:51, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> In our use case every node has 26 bricks. I am using 60 nodes, one 9PB
>> volume with 16+4 EC configuration, each brick in a sub-volume is on
>> different host.
>> We put 15-20k 2GB files every day into 10-15 folders. So it is 1500K
>> files/folder. Our gluster version is 3.7.11.
>> Heal speed in this environment is 8-10MB/sec/brick.
>>
>> I did some tests for parallel self heal feature with version 3.9, two
>> servers 26 bricks each, 8+2 and 16+4 EC configuration.
>> This was a small test environment and the results are as I said 8+2 is
>> 2x faster then 16+4 with parallel self heal threads set to 2/4.
>> In 1-2 months our new servers arriving, I will do detailed tests for
>> heal performance for 8+2 and 16+4 and inform you the results.
>>
>>
>> On Fri, May 5, 2017 at 2:54 PM, Pranith Kumar Karampuri
>> <pkara...@redhat.com> wrote:
>> >
>> >
>> > On Fri, May 5, 2017 at 5:19 PM, Pranith Kumar Karampuri
>> > <pkara...@redhat.com> wrote:
>> >>
>> >>
>> >>
>> >> On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban <cobanser...@gmail.com>
>> >> wrote:
>> >>>
>> >>> It is the over all time, 8TB data disk healed 2x faster in 8+2
>> >>> configuration.
>> >>
>> >>
>> >> Wow, that is counter intuitive for me. I will need to explore about
>> >> this
>> >> to find out why that could be. Thanks a lot for this feedback!
>> >
>> >
>> > From memory I remember you said you have a lot of small files hosted on
>> > the
>> > volume, right? It could be because of the bug
>> > https://review.gluster.org/17151 is fixing. That is the only reason I
>> > could
>> > guess right now. We will try to test this kind of case if you could give
>> > us
>> > a bit more details about average file-size/depth of directories etc to
>> > simulate similar looking directory structure.
>> >
>> >>
>> >>
>> >>>
>> >>>
>> >>> On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
>> >>> <pkara...@redhat.com> wrote:
>> >>> >
>> >>> >
>> >>> > On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban
>> >>> > <cobanser...@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Healing gets slower as you increase m in m+n configuration.
>> >>> >> We are using 16+4 configuration without any problems other then
>> >>> >> heal
>> >>> >> speed.
>> >>> >> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals
>> >>> >> on
>> >>> >> 8+2 is faster by 2x.
>> >>> >
>> >>> >
>> >>> > As you increase number of nodes that are participating in an EC set
>> >>> > number
>> >>> > of parallel heals increase. Is the heal speed you saw improved per
>> >>> > file
>> >>> > or
>> >>> > the over all time it took to heal the data?
>> >>> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey <aspan...@redhat.com>
>> >>> >> wrote:
>> >>> >> >
>> >>> >> > 8+2 and 8+3 configurations are not the limitation but just
>> >>> >> > suggestions.
>> >>> >> > You can create 16+3 volume without any issue.
>> >>> >> >
>> >>> >> > Ashish
>> >>> >> >
>> >>> >> > 
>> >>> >> > From: "Alastair Neil" <ajneil.t...@gmail.com>
>> >>> >> > To: "gluster-users" <gluster-users@gluster.org>
>> >>> >> > Sent: Friday, May 5, 2017 2:23:32 AM
>> >>> >> > Subject: [Gluster-users] disperse volume brick counts limits in
>> >>>

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-05 Thread Serkan Çoban
In our use case every node has 26 bricks. I am using 60 nodes, one 9PB
volume with 16+4 EC configuration, each brick in a sub-volume is on
different host.
We put 15-20k 2GB files every day into 10-15 folders. So it is 1500K
files/folder. Our gluster version is 3.7.11.
Heal speed in this environment is 8-10MB/sec/brick.

I did some tests for parallel self heal feature with version 3.9, two
servers 26 bricks each, 8+2 and 16+4 EC configuration.
This was a small test environment and the results are, as I said, that 8+2 is
2x faster than 16+4 with parallel self-heal threads set to 2/4.
In 1-2 months our new servers are arriving; I will do detailed tests of
heal performance for 8+2 and 16+4 and let you know the results.


On Fri, May 5, 2017 at 2:54 PM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
>
>
> On Fri, May 5, 2017 at 5:19 PM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
>>
>>
>>
>> On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>>>
>>> It is the over all time, 8TB data disk healed 2x faster in 8+2
>>> configuration.
>>
>>
>> Wow, that is counter intuitive for me. I will need to explore about this
>> to find out why that could be. Thanks a lot for this feedback!
>
>
> From memory I remember you said you have a lot of small files hosted on the
> volume, right? It could be because of the bug
> https://review.gluster.org/17151 is fixing. That is the only reason I could
> guess right now. We will try to test this kind of case if you could give us
> a bit more details about average file-size/depth of directories etc to
> simulate similar looking directory structure.
>
>>
>>
>>>
>>>
>>> On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
>>> <pkara...@redhat.com> wrote:
>>> >
>>> >
>>> > On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban <cobanser...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Healing gets slower as you increase m in m+n configuration.
>>> >> We are using 16+4 configuration without any problems other then heal
>>> >> speed.
>>> >> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals on
>>> >> 8+2 is faster by 2x.
>>> >
>>> >
>>> > As you increase number of nodes that are participating in an EC set
>>> > number
>>> > of parallel heals increase. Is the heal speed you saw improved per file
>>> > or
>>> > the over all time it took to heal the data?
>>> >
>>> >>
>>> >>
>>> >>
>>> >> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey <aspan...@redhat.com>
>>> >> wrote:
>>> >> >
>>> >> > 8+2 and 8+3 configurations are not the limitation but just
>>> >> > suggestions.
>>> >> > You can create 16+3 volume without any issue.
>>> >> >
>>> >> > Ashish
>>> >> >
>>> >> > 
>>> >> > From: "Alastair Neil" <ajneil.t...@gmail.com>
>>> >> > To: "gluster-users" <gluster-users@gluster.org>
>>> >> > Sent: Friday, May 5, 2017 2:23:32 AM
>>> >> > Subject: [Gluster-users] disperse volume brick counts limits in RHES
>>> >> >
>>> >> >
>>> >> > Hi
>>> >> >
>>> >> > we are deploying a large (24node/45brick) cluster and noted that the
>>> >> > RHES
>>> >> > guidelines limit the number of data bricks in a disperse set to 8.
>>> >> > Is
>>> >> > there
>>> >> > any reason for this.  I am aware that you want this to be a power of
>>> >> > 2,
>>> >> > but
>>> >> > as we have a large number of nodes we were planning on going with
>>> >> > 16+3.
>>> >> > Dropping to 8+2 or 8+3 will be a real waste for us.
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> >
>>> >> > Alastair
>>> >> >
>>> >> >
>>> >> > ___
>>> >> > Gluster-users mailing list
>>> >> > Gluster-users@gluster.org
>>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>> >> >
>>> >> >
>>> >> > ___
>>> >> > Gluster-users mailing list
>>> >> > Gluster-users@gluster.org
>>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>> >> ___
>>> >> Gluster-users mailing list
>>> >> Gluster-users@gluster.org
>>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Pranith
>>
>>
>>
>>
>> --
>> Pranith
>
>
>
>
> --
> Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-05 Thread Serkan Çoban
It is the overall time; an 8TB data disk healed 2x faster in the 8+2 configuration.

On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
>
>
> On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Healing gets slower as you increase m in m+n configuration.
>> We are using 16+4 configuration without any problems other then heal
>> speed.
>> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals on
>> 8+2 is faster by 2x.
>
>
> As you increase number of nodes that are participating in an EC set number
> of parallel heals increase. Is the heal speed you saw improved per file or
> the over all time it took to heal the data?
>
>>
>>
>>
>> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey <aspan...@redhat.com> wrote:
>> >
>> > 8+2 and 8+3 configurations are not the limitation but just suggestions.
>> > You can create 16+3 volume without any issue.
>> >
>> > Ashish
>> >
>> > 
>> > From: "Alastair Neil" <ajneil.t...@gmail.com>
>> > To: "gluster-users" <gluster-users@gluster.org>
>> > Sent: Friday, May 5, 2017 2:23:32 AM
>> > Subject: [Gluster-users] disperse volume brick counts limits in RHES
>> >
>> >
>> > Hi
>> >
>> > we are deploying a large (24node/45brick) cluster and noted that the
>> > RHES
>> > guidelines limit the number of data bricks in a disperse set to 8.  Is
>> > there
>> > any reason for this.  I am aware that you want this to be a power of 2,
>> > but
>> > as we have a large number of nodes we were planning on going with 16+3.
>> > Dropping to 8+2 or 8+3 will be a real waste for us.
>> >
>> > Thanks,
>> >
>> >
>> > Alastair
>> >
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-05 Thread Serkan Çoban
Healing gets slower as you increase m in an m+n configuration.
We are using a 16+4 configuration without any problems other than heal speed.
I tested heal speed with 8+2 and 16+4 on 3.9.0 and saw that heals on
8+2 are 2x faster.


On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey  wrote:
>
> 8+2 and 8+3 configurations are not the limitation but just suggestions.
> You can create 16+3 volume without any issue.
>
> Ashish
>
> 
> From: "Alastair Neil" 
> To: "gluster-users" 
> Sent: Friday, May 5, 2017 2:23:32 AM
> Subject: [Gluster-users] disperse volume brick counts limits in RHES
>
>
> Hi
>
> we are deploying a large (24node/45brick) cluster and noted that the RHES
> guidelines limit the number of data bricks in a disperse set to 8.  Is there
> any reason for this.  I am aware that you want this to be a power of 2, but
> as we have a large number of nodes we were planning on going with 16+3.
> Dropping to 8+2 or 8+3 will be a real waste for us.
>
> Thanks,
>
>
> Alastair
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-27 Thread Serkan Çoban
I think this is the fix Gandalf is asking for:
https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce


On Thu, Apr 27, 2017 at 2:03 PM, Pranith Kumar Karampuri
 wrote:
> I am very positive about the two things I told you. These are the latest
> things that happened for VM corruption with rebalance.
>
> On Thu, Apr 27, 2017 at 4:30 PM, Gandalf Corvotempesta
>  wrote:
>>
>> I think we are talking about a different bug.
>>
>> Il 27 apr 2017 12:58 PM, "Pranith Kumar Karampuri" 
>> ha scritto:
>>>
>>> I am not a DHT developer, so some of what I say could be a little wrong.
>>> But this is what I gather.
>>> I think they found 2 classes of bugs in dht
>>> 1) Graceful fop failover when rebalance is in progress is missing for
>>> some fops, that lead to VM pause.
>>>
>>> I see that https://review.gluster.org/17085 got merged on 24th on master
>>> for this. I see patches are posted for 3.8.x for this one.
>>>
>>> 2) I think there is some work needs to be done for dht_[f]xattrop. I
>>> believe this is the next step that is underway.
>>>
>>>
>>> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta
>>>  wrote:

 Updates on this critical bug ?

 Il 18 apr 2017 8:24 PM, "Gandalf Corvotempesta"
  ha scritto:
>
> Any update ?
> In addition, if this is a different bug but the "workflow" is the same
> as the previous one, how is possible that fixing the previous bug
> triggered this new one ?
>
> Is possible to have some details ?
>
> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay :
> > Nope. This is a different bug.
> >
> > -Krutika
> >
> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
> >  wrote:
> >>
> >> This is a good news
> >> Is this related to the previously fixed bug?
> >>
> >> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"  ha
> >> scritto:
> >>>
> >>> So Raghavendra has an RCA for this issue.
> >>>
> >>> Copy-pasting his comment here:
> >>>
> >>> 
> >>>
> >>> Following is a rough algorithm of shard_writev:
> >>>
> >>> 1. Based on the offset, calculate the shards touched by current
> >>> write.
> >>> 2. Look for inodes corresponding to these shard files in itable.
> >>> 3. If one or more inodes are missing from itable, issue mknod for
> >>> corresponding shard files and ignore EEXIST in cbk.
> >>> 4. resume writes on respective shards.
> >>>
> >>> Now, imagine a write which falls to an existing "shard_file". For
> >>> the
> >>> sake of discussion lets consider a distribute of three subvols -
> >>> s1, s2, s3
> >>>
> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2
> >>> 2. add a subvolume s4 and initiate a fix layout. The layout of
> >>> ".shard"
> >>> is fixed to include s4 and hash ranges are changed.
> >>> 3. write that touches "shard_file" is issued.
> >>> 4. The inode for "shard_file" is not present in itable after a
> >>> graph
> >>> switch and features/shard issues an mknod.
> >>> 5. With new layout of .shard, lets say "shard_file" hashes to s3
> >>> and
> >>> mknod (shard_file) on s3 succeeds. But, the shard_file is already
> >>> present on
> >>> s2.
> >>>
> >>> So, we have two files on two different subvols of dht representing
> >>> same
> >>> shard and this will lead to corruption.
> >>>
> >>> 
> >>>
> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
> >>>
> >>> -Krutika
> >>>
> >>>
> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
> >>>  wrote:
> 
> 
> 
>  On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan
>  
>  wrote:
> >
> > Hi,
> >
> >
> > Do you guys have any update regarding this issue ?
> 
>  I do not actively work on this issue so I do not have an accurate
>  update, but from what I heard from Krutika and Raghavendra(works
>  on DHT) is:
>  Krutika debugged initially and found that the issue seems more
>  likely to be
>  in DHT, Satheesaran who helped us recreate this issue in lab found
>  that just
>  fix-layout without rebalance also caused the corruption 1 out of 3
>  times.
>  Raghavendra came up with a possible RCA for why this can happen.
>  Raghavendra(CCed) would be the right person to provide accurate
>  update.
> >
> >
> >
> > --
> >
> > Respectfully
> > Mahdi A. Mahdi
> >
> > 

Re: [Gluster-users] [Gluster-devel] Announcing release 3.11 : Scope, schedule and feature tracking

2017-04-25 Thread Serkan Çoban
How does this affect CPU usage? Does it read the whole file and
calculate a hash after it is written?
Will this patch land in 3.10.x?

On Tue, Apr 25, 2017 at 10:32 AM, Kotresh Hiremath Ravishankar
 wrote:
> Hi
>
> https://github.com/gluster/glusterfs/issues/188 is merged in master
> and needs to go in 3.11
>
> Thanks and Regards,
> Kotresh H R
>
> - Original Message -
>> From: "Kaushal M" 
>> To: "Shyam" 
>> Cc: gluster-users@gluster.org, "Gluster Devel" 
>> Sent: Thursday, April 20, 2017 12:16:39 PM
>> Subject: Re: [Gluster-devel] Announcing release 3.11 : Scope, schedule and 
>> feature tracking
>>
>> On Thu, Apr 13, 2017 at 8:17 PM, Shyam  wrote:
>> > On 02/28/2017 10:17 AM, Shyam wrote:
>> >>
>> >> Hi,
>> >>
>> >> With release 3.10 shipped [1], it is time to set the dates for release
>> >> 3.11 (and subsequently 4.0).
>> >>
>> >> This mail has the following sections, so please read or revisit as needed,
>> >>   - Release 3.11 dates (the schedule)
>> >>   - 3.11 focus areas
>> >
>> >
>> > Pinging the list on the above 2 items.
>> >
>> >> *Release 3.11 dates:*
>> >> Based on our release schedule [2], 3.11 would be 3 months from the 3.10
>> >> release and would be a Short Term Maintenance (STM) release.
>> >>
>> >> This puts 3.11 schedule as (working from the release date backwards):
>> >> - Release: May 30th, 2017
>> >> - Branching: April 27th, 2017
>> >
>> >
>> > Branching is about 2 weeks away, other than the initial set of overflow
>> > features from 3.10 nothing else has been raised on the lists and in github
>> > as requests for 3.11.
>> >
>> > So, a reminder to folks who are working on features, to raise the relevant
>> > github issue for the same, and post it to devel list for consideration in
>> > 3.11 (also this helps tracking and ensuring we are waiting for the right
>> > things at the time of branching).
>> >
>> >>
>> >> *3.11 focus areas:*
>> >> As maintainers of gluster, we want to harden testing around the various
>> >> gluster features in this release. Towards this the focus area for this
>> >> release are,
>> >>
>> >> 1) Testing improvements in Gluster
>> >>   - Primary focus would be to get automated test cases to determine
>> >> release health, rather than repeating a manual exercise every 3 months
>> >>   - Further, we would also attempt to focus on maturing Glusto[7] for
>> >> this, and other needs (as much as possible)
>> >>
>> >> 2) Merge all (or as much as possible) Facebook patches into master, and
>> >> hence into release 3.11
>> >>   - Facebook has (as announced earlier [3]) started posting their
>> >> patches mainline, and this needs some attention to make it into master
>> >>
>> >
>> > Further to the above, we are also considering the following features for
>> > this release, request feature owners to let us know if these are actively
>> > being worked on and if these will make the branching dates. (calling out
>> > folks that I think are the current feature owners for the same)
>> >
>> > 1) Halo - Initial Cut (@pranith)
>> > 2) IPv6 support (@kaushal)
>>
>> This is under review at https://review.gluster.org/16228 . The patch
>> mostly looks fine.
>>
>> The only issue is that it currently depends and links with an internal
>> FB fork of tirpc (mainly for some helper functions and utilities).
>> This makes it hard for the community to make actual use of  and test,
>> the IPv6 features/fixes introduced by the change.
>>
>> If the change were refactored the use publicly available versions of
>> tirpc or ntirpc, I'm OK for it to be merged. I did try it out myself.
>> While I was able to build it against available versions of tirpc, I
>> wasn't able to get it working correctly.
>>
>> > 3) Negative lookup (@poornima)
>> > 4) Parallel Readdirp - More changes to default settings. (@poornima, @du)
>> >
>> >
>> >> [1] 3.10 release announcement:
>> >> http://lists.gluster.org/pipermail/gluster-devel/2017-February/052188.html
>> >>
>> >> [2] Gluster release schedule:
>> >> https://www.gluster.org/community/release-schedule/
>> >>
>> >> [3] Mail regarding facebook patches:
>> >> http://lists.gluster.org/pipermail/gluster-devel/2016-December/051784.html
>> >>
>> >> [4] Release scope: https://github.com/gluster/glusterfs/projects/1
>> >>
>> >> [5] glusterfs github issues: https://github.com/gluster/glusterfs/issues
>> >>
>> >> [6] github issues for features and major fixes:
>> >> https://hackmd.io/s/BkgH8sdtg#
>> >>
>> >> [7] Glusto tests: https://github.com/gluster/glusto-tests
>> >> ___
>> >> Gluster-devel mailing list
>> >> gluster-de...@gluster.org
>> >> http://lists.gluster.org/mailman/listinfo/gluster-devel
>> >
>> > ___
>> > Gluster-devel mailing list
>> > gluster-de...@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/gluster-devel
>> 

Re: [Gluster-users] Add single server

2017-04-22 Thread Serkan Çoban
In EC, if you have an m+n configuration, you have to grow by m+n bricks at a time.
If you have 6+2 you need to add another 8 bricks, as sketched below.
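
To illustrate (volume name, host names and brick paths below are made up), a
6+2 distributed-disperse volume grows by a whole new 8-brick set, followed by
a rebalance:

gluster volume add-brick myvol \
    server9:/bricks/b1 server10:/bricks/b1 server11:/bricks/b1 server12:/bricks/b1 \
    server13:/bricks/b1 server14:/bricks/b1 server15:/bricks/b1 server16:/bricks/b1
gluster volume rebalance myvol start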

On Sat, Apr 22, 2017 at 3:02 PM, Gandalf Corvotempesta
 wrote:
> I'm still trying to figure out if adding a single server to an
> existing gluster cluster is possible or not, based on EC or standard
> replica.
>
> I don't think so, because with replica 3, when each server is already
> full (no more slots for disks), I need to add 3 server at once.
>
> Is this the same even with EC ? In example, a 6:2 configuration is
> "fixed" or can I add a single node moving from 6:2 to 7:2 and so on ?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] supermicro 60/90 bay servers

2017-04-20 Thread Serkan Çoban
What is your use case? Disperse is good for archive workloads with big files.
I suggest you buy 10 servers and use an 8+2 EC configuration; this way you can
handle two node failures. We are using 28-disk servers, but our next
cluster will use 68-disk servers.
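
As a rough sketch only (volume name, host names and brick paths are
placeholders), an 8+2 set with one brick per server would look something like:

# 8 data + 2 redundancy across 10 servers: any two servers can fail.
gluster volume create archvol disperse-data 8 redundancy 2 \
    srv{1..10}:/bricks/disk1/brick
gluster volume start archvol

With more disks per server you add further 8+2 sets, keeping each set spread
across the 10 servers.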


On Thu, Apr 20, 2017 at 1:19 PM, Ingard Mevåg  wrote:
> Hi
>
> We've been looking at supermicro 60 and 90 bay servers. Are anyone else
> using these models (or similar density) for gluster?
> Specifically I'd like to setup a distributed disperse volume with 8 of these
> servers.
>
> Any insight, does and donts or best practice guidelines would be appreciated
> :)
>
> kind regards
> Ingard
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Serkan Çoban
>Is this by design ? Is it tuneable ? 10MB/s/brick is too low for us.
>We will use 10GB ethernet, healing 10MB/s/brick would be a bottleneck.

That is the maximum if you are using EC volumes; I don't know about
other volume configurations.
With 3.9.0, parallel self-heal of EC volumes should be faster, though.
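
The knob for that is the disperse.shd-max-threads option mentioned further
down this thread; a hedged example on a placeholder volume name:

gluster volume set myvol disperse.shd-max-threads 4
gluster volume get myvol disperse.shd-max-threads   # verify the new value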



On Tue, Apr 18, 2017 at 1:38 PM, Gandalf Corvotempesta
<gandalf.corvotempe...@gmail.com> wrote:
> 2017-04-18 9:36 GMT+02:00 Serkan Çoban <cobanser...@gmail.com>:
>> Nope, healing speed is 10MB/sec/brick, each brick heals with this
>> speed, so one brick or one server each will heal in one week...
>
> Is this by design ? Is it tuneable ? 10MB/s/brick is too low for us.
> We will use 10GB ethernet, healing 10MB/s/brick would be a bottleneck.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Serkan Çoban
>I was asking about reading data in same disperse set like 8+2 disperse
config if one disk is replaced and when heal is in process and when client
reads data which is available in rest of the 9 disks.

My use case is write heavy and we barely read data, so I do not know if read
speed degrades during heal. But I know write speed does not change during
heal.

How big are your files? How many files are there on average in each directory?

On Tue, Apr 18, 2017 at 11:36 AM, Amudhan P <amudha...@gmail.com> wrote:

>
> I actually used this (find /mnt/gluster -d -exec getfattr -h -n
> trusted.ec.heal {} \; > /dev/null
> ) command on a specific folder to trigger heal but it was also not showing
> any difference in speed.
>
> I was asking about reading data in same disperse set like 8+2 disperse
> config if one disk is replaced and when heal is in process and when client
> reads data which is available in rest of the 9 disks.
>
> I am sure there was no bottleneck on network/disk IO in my case.
>
> I have tested 3.10.1 heal with disperse.shd-max-threads = 4. heal
> completed data size of 27GB in 13M15s. so it works well in a test
> environment but production environment it differs.
>
>
>
> On Tue, Apr 18, 2017 at 12:47 PM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
>
>> You can increase heal speed by running below command from a client:
>> find /mnt/gluster -d -exec getfattr -h -n trusted.ec.heal {} \; >
>> /dev/null
>>
>> You can write a script with different folders to make it parallel.
>>
>> In my case I see 6TB data was healed within 7-8 days with above command
>> running.
>> >did you face any issue in reading data from rest of the good bricks in
>> the set. like slow read < KB/s.
>> No, nodes generally have balanced network/disk  IO during heal..
>>
>> You should make a detailed tests with non-prod cluster and try to find
>> optimum heal configuration for your use case..
>> Our new servers are on the way, in a couple of months I also will do
>> detailed tests with 3.10.x and parallel disperse heal, will post the
>> results here...
>>
>>
>> On Tue, Apr 18, 2017 at 9:51 AM, Amudhan P <amudha...@gmail.com> wrote:
>> > Serkan,
>> >
>> > I have initially changed shd-max-thread 1 to 2 saw a little difference
>> and
>> > changing it to 4 & 8. doesn't make any difference.
>> > disk write speed was about <1MB and data passed in thru network for
>> healing
>> > node from other node were 4MB combined.
>> >
>> > Also, I tried ls -l from mount point to the folders and files which
>> need to
>> > be healed but have not seen any difference in performance.
>> >
>> > But after 3 days of heal process running disk write speed was increased
>> to 9
>> > - 11MB and data passed thru network for healing node from other node
>> were
>> > 40MB combined.
>> >
>> > Still 14GB of data to be healed when comparing to other disks in set.
>> >
>> > I saw in another thread you also had the issue with heal speed, did you
>> face
>> > any issue in reading data from rest of the good bricks in the set. like
>> slow
>> > read < KB/s.
>> >
>> > On Mon, Apr 17, 2017 at 2:05 PM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>> >>
>> >> Normally I see 8-10MB/sec/brick heal speed with gluster 3.7.11.
>> >> I tested parallel heal for disperse with version 3.9.0 and see that it
>> >> increase the heal speed to 20-40MB/sec
>> >> I tested with shd-max-threads 2,4,8 and saw that best performance
>> >> achieved with 2 or 4 threads.
>> >> you can try to start with 2 and test with 4 and 8 and compare the
>> results?
>> >
>> >
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Serkan Çoban
>But is this normal? Gluster need about 7-8 days to heal 6TB ?
This is the case with my configuration and version of gluster. The
parallel self-heal feature introduced in 3.9 will help to decrease
these times.
>In case of a server failure, you need some weeks to heal ?
Nope. Healing speed is 10MB/sec/brick and each brick heals at this speed in
parallel, so one brick or one whole server will each heal in about a week...

On Tue, Apr 18, 2017 at 10:20 AM, Gandalf Corvotempesta
<gandalf.corvotempe...@gmail.com> wrote:
> 2017-04-18 9:17 GMT+02:00 Serkan Çoban <cobanser...@gmail.com>:
>> In my case I see 6TB data was healed within 7-8 days with above command 
>> running.
>
> But is this normal? Gluster need about 7-8 days to heal 6TB ?
> In case of a server failure, you need some weeks to heal ?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Serkan Çoban
You can increase heal speed by running the command below from a client:
find /mnt/gluster -d -exec getfattr -h -n trusted.ec.heal {} \; > /dev/null

You can write a script with different folders to make it parallel.
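
Something like the sketch below is what I mean (the mount point and the
one-job-per-top-level-directory layout are just an example):

# Run the heal-triggering crawl in parallel, one background job per
# top-level directory on the mount.
for dir in /mnt/gluster/*/; do
    find "$dir" -d -exec getfattr -h -n trusted.ec.heal {} \; > /dev/null 2>&1 &
done
wait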

In my case 6TB of data was healed within 7-8 days with the above command running.
>did you face any issue in reading data from rest of the good bricks in the 
>set. like slow read < KB/s.
No, nodes generally have balanced network/disk IO during heal.

You should run detailed tests on a non-prod cluster and try to find the
optimum heal configuration for your use case.
Our new servers are on the way; in a couple of months I will also do
detailed tests with 3.10.x and parallel disperse heal, and will post the
results here...


On Tue, Apr 18, 2017 at 9:51 AM, Amudhan P <amudha...@gmail.com> wrote:
> Serkan,
>
> I have initially changed shd-max-thread 1 to 2 saw a little difference and
> changing it to 4 & 8. doesn't make any difference.
> disk write speed was about <1MB and data passed in thru network for healing
> node from other node were 4MB combined.
>
> Also, I tried ls -l from mount point to the folders and files which need to
> be healed but have not seen any difference in performance.
>
> But after 3 days of heal process running disk write speed was increased to 9
> - 11MB and data passed thru network for healing node from other node were
> 40MB combined.
>
> Still 14GB of data to be healed when comparing to other disks in set.
>
> I saw in another thread you also had the issue with heal speed, did you face
> any issue in reading data from rest of the good bricks in the set. like slow
> read < KB/s.
>
> On Mon, Apr 17, 2017 at 2:05 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Normally I see 8-10MB/sec/brick heal speed with gluster 3.7.11.
>> I tested parallel heal for disperse with version 3.9.0 and see that it
>> increase the heal speed to 20-40MB/sec
>> I tested with shd-max-threads 2,4,8 and saw that best performance
>> achieved with 2 or 4 threads.
>> you can try to start with 2 and test with 4 and 8 and compare the results?
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-17 Thread Serkan Çoban
Normally I see 8-10MB/sec/brick heal speed with gluster 3.7.11.
I tested parallel heal for disperse with version 3.9.0 and saw that it
increases the heal speed to 20-40MB/sec.
I tested with shd-max-threads 2, 4 and 8 and saw that the best performance was
achieved with 2 or 4 threads.
You can start with 2, then test with 4 and 8 and compare the results.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Backups

2017-03-23 Thread Serkan Çoban
Assuming a backup window of 12 hours, you need to send data at roughly 25GB/s
to the backup solution (1PB over 12 hours is about 23GB/s).
With 10G Ethernet on the hosts (roughly 1GB/s of effective throughput per link)
you need at least 25 hosts to handle 25GB/s.
You can build an EC gluster cluster that can handle these rates, or
just back up the valuable data from inside the VMs using open-source backup
tools like borg, attic, restic, etc...

On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
 wrote:
> Let's assume a 1PB storage full of VMs images with each brick over ZFS,
> replica 3, sharding enabled
>
> How do you backup/restore that amount of data?
>
> Backing up daily is impossible, you'll never finish the backup that the
> following one is starting (in other words, you need more than 24 hours)
>
> Restoring is even worse. You need more than 24 hours with the whole cluster
> down
>
> You can't rely on ZFS snapshot due to sharding (the snapshot took from one
> node is useless without all other node related at the same shard) and you
> still have the same restore speed
>
> How do you backup this?
>
> Even georep isn't enough, if you have to restore the whole storage in case
> of disaster
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] advice needed on configuring large gluster cluster

2017-03-15 Thread Serkan Çoban
Please find my comments inline.

> Hi
>
> we have a new gluster cluster we are planning on deploying.  We will have 24
> nodes each with JBOD, 39 8TB drives and 6, 900GB SSDs, and FDR IB
>
> We will not be using all of this as one volume , but I thought initially of
> using a distributed disperse volume.
>
> Never having attempted anything on this scale I have a couple of questions
> regarding EC and distibuted disperse volumes.
>
> Does a distributed dispersed volume have to start life as distributed
> dispersed, or can I  take a disperse volume and make it distributed by
> adding bricks?
Yes, you can start with one subvolume and increase the number of subvolumes later.
But be careful about planning: if you start with an m+n EC configuration,
you can only grow it by adding
another m+n subvolume to it.
>
> Does an EC scheme of 24+4 seem reasonable?  One requirement we will have is
> the need to tolerate two nodes down at once, as the nodes share a chassis.
> I assume that  distributed disperse volumes can be expanded in a similar
> fashion to distributed replicate volumes by adding additional disperse brick
> sets?
It is recommended that in an m+n configuration, m should be a power of two.
You can do 16+4 or 8+2. A higher m will cause slower healing, but
parallel self-heal
of EC volumes in 3.9+ will help. An 8+2 configuration with one brick from
every node will
tolerate the loss of two nodes.

>
> I would also like to consider adding a hot-tier using the SSDs,  I confess I
> have not done much reading on tiering, but am hoping I can use a different
> volume form for the hot tier.  Can I use create a disperse, or a distributed
> replicated?   If I am smoking rainbows then I can consider setting up a SSD
> only distributed disperse volume.
EC performance is quite good for our workload; I did not try any tier
in front of it.
Test your workload without a tier first; if it works, keep it simple (KISS).
>
> I'd also appreciate any feedback on likely performance issues and tuning
> tips?
You can find kernel performance tuning here:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
You may also change client.event-threads, server.event-threads and
heal-related parameters,
but do not forget to test your workload before and after changing those values.
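
For example (the values below are only a starting point for benchmarking, not
a recommendation):

gluster volume set yourvol client.event-threads 4
gluster volume set yourvol server.event-threads 4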
>
> Many Thanks
>
> -Alastair
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Maximum bricks per volume recommendation

2017-03-06 Thread Serkan Çoban
It depends on your workload and expectations.
I have 1500 bricks in a single volume and am happy with it.
I don't do metadata-heavy operations on it.
I also have a 4000-brick single volume planned, if it passes the evaluations.
Just do your tests and make sure it works before going to production.

On Mon, Mar 6, 2017 at 11:54 AM, qingwei wei <tcheng...@gmail.com> wrote:
> Hi Serkan,
>
> Thanks for the information. So 150 bricks should still be good. So
> what number of bricks is consider excessive?
>
> Cw
>
> On Mon, Mar 6, 2017 at 3:14 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>> Putting lots of bricks in a volume have side affects. Slow meta
>> operations, slow gluster commands executions, etc.
>> But 150 bricks are not that much.
>>
>> On Mon, Mar 6, 2017 at 9:41 AM, qingwei wei <tcheng...@gmail.com> wrote:
>>> Hi,
>>>
>>> Is there hard limit on the maximum number of bricks per Gluster
>>> volume. And if no such hard limit exists, then is there any best
>>> practice on selecting the number of bricks per volume. Example, if i
>>> would like to create a 200TB for my host, which config below is
>>> better?
>>>
>>> HDD: 4TB (1 brick on 1 physical disk)
>>> 1 Gluster volume = 10x3 (30 bricks in total)
>>> total 5 Gluster volumes are created and host will combine them as one
>>> logical volume
>>>
>>> or
>>>
>>> HDD: 4TB (1 brick on 1 physical disk)
>>> 1 Gluster volume = 50x3 (150 bricks in total)
>>> total 1 Gluster volume is created
>>>
>>>
>>> Thanks.
>>>
>>> Cw
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Maximum bricks per volume recommendation

2017-03-05 Thread Serkan Çoban
Putting lots of bricks in a volume has side effects: slow metadata
operations, slow gluster command execution, etc.
But 150 bricks is not that much.

On Mon, Mar 6, 2017 at 9:41 AM, qingwei wei  wrote:
> Hi,
>
> Is there hard limit on the maximum number of bricks per Gluster
> volume. And if no such hard limit exists, then is there any best
> practice on selecting the number of bricks per volume. Example, if i
> would like to create a 200TB for my host, which config below is
> better?
>
> HDD: 4TB (1 brick on 1 physical disk)
> 1 Gluster volume = 10x3 (30 bricks in total)
> total 5 Gluster volumes are created and host will combine them as one
> logical volume
>
> or
>
> HDD: 4TB (1 brick on 1 physical disk)
> 1 Gluster volume = 50x3 (150 bricks in total)
> total 1 Gluster volume is created
>
>
> Thanks.
>
> Cw
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Question about heterogeneous bricks

2017-02-21 Thread Serkan Çoban
I think gluster1 and gluster2 became a replica pair, and the smallest size
between them is the effective size (1GB).
Same for gluster3 and gluster4 (3GB). In total 4GB of space is available. This
is just a guess, though...

On Tue, Feb 21, 2017 at 1:18 PM, Daniele Antolini  wrote:
> Hi all,
>
> first of all, nice to meet you. I'm new here and I'm subscribing to do a
> very simple question.
>
> I don't understand completely how, in a distributed with replica
> environment, heterogeneous bricks are involved.
>
> I've just done a test with four bricks:
>
> gluster11 GB
> gluster22 GB
> gluster35 GB
> gluster43 GB
>
> Each partition is mounted locally at /opt/data
>
> I've created a gluster volume with:
>
> gluster volume create gv0 replica 2 gluster1:/opt/data/gv0
> gluster2:/opt/data/gv0 gluster3:/opt/data/gv0 gluster4:/opt/data/gv0
>
> and then mounted on a client:
>
> testgfs1:/gv0   4,0G   65M4,0G   2% /mnt/test
>
> I see 4 GB of free space but I cannot understand how this space has been
> allocated.
> Can please someone explain to me how this can happened?
>
> Thanks a lot
>
> Daniele
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster Disks configuration

2017-02-18 Thread Serkan Çoban
AFAIK, LVM is needed only if you use snapshots. Other than that, you
do not need it.
You can use RAID if you don't mind the extra lost space.
You can test both configs and then choose the right one for your workload.
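
For completeness, a rough sketch of the thin-provisioned LV layout that the
Red Hat article quoted below assumes (device name and sizes are made up, and
this is only needed if you want gluster snapshots):

pvcreate /dev/sdb
vgcreate vg_brick1 /dev/sdb
lvcreate -L 900G -T vg_brick1/thinpool            # thin pool
lvcreate -V 900G -T vg_brick1/thinpool -n brick1  # thin LV for the brick
mkfs.xfs -i size=512 /dev/vg_brick1/brick1
mkdir -p /bricks/brick1
mount /dev/vg_brick1/brick1 /bricks/brick1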


On Sat, Feb 18, 2017 at 10:03 PM, Mahdi Adnan  wrote:
> Hi,
>
>
> I have a question regarding disk preparation.
>
> I have 4 nodes, each has 24 SSD, i would like to know whats the best
> practice to setup the disks.
>
> The pool will be used as a vmware datastore.
>
> im planning on using each disk as a brick without lvm, pool will be
> distributed replicas with sharding enabled, do you have any comments on this
> setup ? because i dont know if i should use all disks as one big disk with
> RAID and disable sharding, or use each disk as a brick with sharding
> enabled.
>
>
> also, Redhat website stated that i have to use LVM, but i couldn't get an
> answer on why i should use LVM or why i should't.
>
> "You should not create Red Hat Storage volume bricks using raw disks. Bricks
> must be created on thin-provisioned Logical Volumes (LVs)"
>
> https://access.redhat.com/articles/1273933
>
>
>
> Thank you.
>
>
> --
>
> Respectfully
> Mahdi A. Mahdi
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Advice for sizing a POC

2017-02-18 Thread Serkan Çoban
With a ~1GB average file size you should definitely try JBOD with disperse volumes.
Gluster can easily reach 1GB/s per node of network throughput using disperse volumes.

We use 26 disks/node without problems and are planning to use 90 disks/node.

I don't think you'll need SSD caching for a sequential, read-heavy workload...

Just test the workload with different disperse configurations to find
the optimum for your use case.


On Fri, Feb 17, 2017 at 7:54 PM, Jake Davis  wrote:
> Greetings, I'm trying to spec hardware for a proof of concept. I'm hoping
> for a sanity check to see if I'm asking the right questions and making the
> right assumptions.
> I don't have real numbers for expected workload, but for our main use case,
> we're likely talking a few hundred thousand files, read heavy, with average
> file size around 1 GB. Fairly parallel access pattern.
>
> I've read elsewhere that the max recommended disk count for a RAID6 array is
> twelve. Is that per node, or per brick? i.e. if I have a number of 24 or 36
> disk arrays attached to a single node, would it make sense to divide the
> larger array into 2 or 3 bricks with 12 disk stripes, or do a want to limit
> the brick count to one per node in this case?
>
> For FUSE clients, assuming one 12 disk RAID6 brick per node, in general, how
> many nodes do I need in my cluster before I start meeting/exceeding the
> throughput of a direct attached raid via NFS mount?
>
> RAM; is it always a case of the more, the merrier? Or is there some rule of
> thumb for calculating return on investment there?
>
> Is there a scenario were adding a few SSD's to a node can increase the
> performance of a spinning disk brick by acting as a read cache or some such?
> Assuming non-ZFS.
>
> I've read that for highly parallel access, it might make more sense to use
> JBOD with one brick per disk. Is that advice file size dependent? And What
> question do I need to ask myself to determine how many of these single disk
> bricks I want per-node?
>
> Many thanks!
> -Jake
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 90 Brick/Server suggestions?

2017-02-17 Thread Serkan Çoban
>Any particular reason for this, other than maximising space by avoiding two 
>layers of RAID/redundancy?
Yes, that's right: we can get 720TB of net usable space per server with
90 x 10TB disks. Any RAID layer would cost too much...


On Fri, Feb 17, 2017 at 6:13 PM, Gambit15  wrote:
>> RAID is not an option, JBOD with EC will be used.
>
>
> Any particular reason for this, other than maximising space by avoiding two
> layers of RAID/redundancy?
> Local RAID would be far simpler & quicker for replacing failed drives, and
> it would greatly reduce the number of bricks & load on Gluster.
>
> We use RAID volumes for our bricks, and the benefits of simplified
> management far outweigh the costs of a little lost capacity.
>
> D
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 90 Brick/Server suggestions?

2017-02-16 Thread Serkan Çoban
>We have 12 on order.  Actually the DSS7000 has two nodes in the chassis,
>and  each accesses 45 bricks.  We will be using an erasure code scheme
>probably 24:3 or 24:4, we have not sat down and really thought about the
>exact scheme we will use.

If we cannot get a 1 node/90 disk configuration, we can also get it as 2
nodes with 45 disks each.
Be careful about EC. I am using 16+4 in production, and the only drawback is
slow rebuild times.
It takes 10 days to rebuild an 8TB disk. Although parallel heal for EC
improves this in 3.9,
don't forget to test rebuild times for different EC configurations.

>90 disks per server is a lot.  In particular, it might be out of balance with 
>other
>characteristics of the machine - number of cores, amount of memory, network
>or even bus bandwidth

Nodes will be pretty powerful: 2x18-core CPUs with 256GB RAM and 2x10Gb bonded
Ethernet. They will be used for archive purposes, so I don't need more
than 1GB/s per node.
RAID is not an option; JBOD with EC will be used.

>gluster volume set all cluster.brick-multiplex on
I just read the 3.10 release notes and saw this. I think this is a
good solution;
I plan to use 3.10.x, will probably test multiplexing, and will get in
touch for help.

Thanks for the suggestions,
Serkan


On Fri, Feb 17, 2017 at 1:39 AM, Jeff Darcy  wrote:
>> We are evaluating dell DSS7000 chassis with 90 disks.
>> Has anyone used that much brick per server?
>> Any suggestions, advices?
>
> 90 disks per server is a lot.  In particular, it might be out of balance with 
> other characteristics of the machine - number of cores, amount of memory, 
> network or even bus bandwidth.  Most people who put that many disks in a 
> server use some sort of RAID (HW or SW) to combine them into a smaller number 
> of physical volumes on top of which filesystems and such can be built.  If 
> you can't do that, or don't want to, you're in poorly explored territory.  My 
> suggestion would be to try running as 90 bricks.  It might work fine, or you 
> might run into various kinds of contention:
>
> (1) Excessive context switching would indicate not enough CPU.
>
> (2) Excessive page faults would indicate not enough memory.
>
> (3) Maxed-out network ports . . . well, you can figure that one out.  ;)
>
> If (2) applies, you might want to try brick multiplexing.  This is a new 
> feature in 3.10, which can reduce memory consumption by more than 2x in many 
> cases by putting multiple bricks into a single process (instead of one per 
> brick).  This also drastically reduces the number of ports you'll need, since 
> the single process only needs one port total instead of one per brick.  In 
> terms of CPU usage or performance, gains are far more modest.  Work in that 
> area is still ongoing, as is work on multiplexing in general.  If you want to 
> help us get it all right, you can enable multiplexing like this:
>
>   gluster volume set all cluster.brick-multiplex on
>
> If multiplexing doesn't help for you, speak up and maybe we can make it 
> better, or perhaps come up with other things to try.  Good luck!
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] 90 Brick/Server suggestions?

2017-02-15 Thread Serkan Çoban
Hi,

We are evaluating dell DSS7000 chassis with 90 disks.
Has anyone used that many bricks per server?
Any suggestions, advices?

Thanks,
Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster anc balance-alb

2017-02-14 Thread Serkan Çoban
In balance-alb mode you should see nearly equal TX volumes, but
something looks wrong with your statistics.
RX is balanced by rewriting the source MAC address in ARP replies, so in theory,
if you have enough clients, you should have equally balanced RX.
I am also using balance-alb with 60 gluster servers and nearly 1000
clients using them; my TX distribution is 50/50 between the two links and RX
is 5%-95%.
This means theory and practice differ way too much :)
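
For reference, the relevant part of a balance-alb bond definition on
RHEL/CentOS looks roughly like this (a sketch matching the interface names in
your output, not a drop-in config):

# /etc/sysconfig/network-scripts/ifcfg-bond2 (sketch)
DEVICE=bond2
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=balance-alb miimon=100"
MTU=9000
ONBOOT=yes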

On Tue, Feb 14, 2017 at 12:33 PM, Alessandro Briosi  wrote:
> Hi all,
> I'd like to have a clarification on bonding with gluster.
>
> I have a gluster deployment which is using a bond with 4 eths.
>
> The bond is configured with balance-alb as 2 are connected to 1 switch
> and the other 2 to another switch.
> This is for traffic balance and redundancy.
>
> The switches are stacked with a 10Gbit cable. They are managed.
>
> The same connection is used for server and client (the servers are also
> client of themselfs).
>
> For what I understand balance-alb balances single connections, so one
> connection can get at max 1Gb speed.
>
> It though seems that only 1 ethernet is mainly used.
>
> This is the output for the interested ethernets (the same basically
> applyes to the other servers)
>
> bond2 Link encap:Ethernet  HWaddr 00:0a:f7:a5:ec:5c
>   inet addr:192.168.102.1  Bcast:192.168.102.255  Mask:255.255.255.0
>   inet6 addr: fe80::20a:f7ff:fea5:ec5c/64 Scope:Link
>   UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
>   RX packets:195041678 errors:0 dropped:4795 overruns:0 frame:0
>   TX packets:244194369 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:346742936782 (322.9 GiB)  TX bytes:1202018794556 (1.0
> TiB)
>
> eth4  Link encap:Ethernet  HWaddr 00:0a:f7:a5:ec:5c
>   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
>   RX packets:194076526 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:239094839 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:346669905046 (322.8 GiB)  TX bytes:1185779765214 (1.0
> TiB)
>   Interrupt:88
>
> eth5  Link encap:Ethernet  HWaddr 00:0a:f7:a5:ec:5d
>   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
>   RX packets:317620 errors:0 dropped:1597 overruns:0 frame:0
>   TX packets:3969287 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:21155944 (20.1 MiB)  TX bytes:16107271750 (15.0 GiB)
>   Interrupt:84
>
> eth6  Link encap:Ethernet  HWaddr 00:0a:f7:a5:ec:5e
>   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
>   RX packets:317620 errors:0 dropped:1596 overruns:0 frame:0
>   TX packets:557634 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:21155972 (20.1 MiB)  TX bytes:35688576 (34.0 MiB)
>   Interrupt:88
>
> eth7  Link encap:Ethernet  HWaddr 00:0a:f7:a5:ec:5f
>   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
>   RX packets:317618 errors:0 dropped:1596 overruns:0 frame:0
>   TX packets:557633 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:21155816 (20.1 MiB)  TX bytes:35688512 (34.0 MiB)
>   Interrupt:84
>
> #cat /proc/net/bonding/bond2
>
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
> Bonding Mode: adaptive load balancing
> Primary Slave: None
> Currently Active Slave: eth4
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> Slave Interface: eth4
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:0a:f7:a5:ec:5c
> Slave queue ID: 0
>
> Slave Interface: eth5
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:0a:f7:a5:ec:5d
> Slave queue ID: 0
>
> Slave Interface: eth6
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:0a:f7:a5:ec:5e
> Slave queue ID: 0
>
> Slave Interface: eth7
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:0a:f7:a5:ec:5f
> Slave queue ID: 0
>
> Is this normal?
>
> I could use LACP though it would require me to use 2 bonds (1 for each
> switch), though I have no idea on how to configure a "failover".
>
> Any hint would be appreciated.
>
> Alessandro
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Notice: https://download.gluster.org:/pub/gluster/glusterfs/LATEST has changed

2017-01-01 Thread Serkan Çoban
Hi,

I want to try multi-threaded disperse heal in 3.9, but I have questions:
currently, whatever I do (find /bricks -exec stat ... from multiple
clients), I cannot get more than 10MB/sec heal speed for one brick.
Will multi-threaded heal improve this? I will do a test, but I am also asking you...
Secondly, if I upgrade to 3.9 and it causes problems, is it safe to
downgrade to 3.7.11? Any suggestions?

On Sat, Nov 19, 2016 at 9:12 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi,
>
> Sorry for late reply. I think I will wait for 3.10 LTS release to try
> it. I am on 3.7.11 and it is very stable for us.
>
> On Thu, Nov 17, 2016 at 1:05 PM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
>>
>>
>> On Wed, Nov 16, 2016 at 11:47 PM, Serkan Çoban <cobanser...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>> Will disperse related new futures be ported to 3.7? or we should
>>> upgrade for those features?
>>
>>
>> hi Serkan,
>>   Unfortunately, no they won't be backported to 3.7. We are adding
>> new features to latest releases to prevent accidental bugs slipping in
>> stable releases. While the features are working well, we did see a
>> performance problem very late in the cycle in the I/O path just with EC for
>> small files. You should wait before you upgrade IMO.
>>
>> You were trying to test how long it takes to heal data with multi-threaded
>> heal in EC right? Do you want to give us feedback by trying this feature
>> out?
>>
>>>
>>> On Wed, Nov 16, 2016 at 8:51 PM, Kaleb S. KEITHLEY <kkeit...@redhat.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > As some of you may have noticed, GlusterFS-3.9.0 was released. Watch
>>> > this space for the official announcement soon.
>>> >
>>> > If you are using Community GlusterFS packages from download.gluster.org
>>> > you should check your package metadata to be sure that an update doesn't
>>> > inadvertently update your system to 3.9.
>>> >
>>> > There is a new symlink:
>>> > https://download.gluster.org:/pub/gluster/glusterfs/LTM-3.8 which will
>>> > remain pointed at the GlusterFS-3.8 packages. Use this instead of
>>> > .../LATEST to keep getting 3.8 updates without risk of accidentally
>>> > getting 3.9. There is also a new LTM-3.7 symlink that you can use for
>>> > 3.7 updates.
>>> >
>>> > Also note that there is a new package signing key for the 3.9 packages
>>> > that are on download.gluster.org. The old key remains the same for 3.8
>>> > and earlier packages. New releases of 3.8 and 3.7 packages will continue
>>> > to use the old key.
>>> >
>>> > GlusterFS-3.9 is the first "short term" release; it will be supported
>>> > for approximately six months. 3.7 and 3.8 are Long Term Maintenance
>>> > (LTM) releases. 3.9 will be followed by 3.10; 3.10 will be a LTM release
>>> > and 3.9 and 3.7 will be End-of-Life (EOL) at that time.
>>> >
>>> >
>>> > --
>>> >
>>> > Kaleb
>>> > ___
>>> > Gluster-users mailing list
>>> > Gluster-users@gluster.org
>>> > http://www.gluster.org/mailman/listinfo/gluster-users
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Dispersed volume and auto-heal

2016-12-07 Thread Serkan Çoban
No, you should replace the brick.
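
A hedged sketch of the usual flow (volume name and brick paths are
placeholders): swap the failed brick for an empty one and let self-heal
rebuild the missing fragments onto it.

gluster volume replace-brick myvol \
    server3:/bricks/dead/brick server3:/bricks/new/brick commit force
gluster volume heal myvol full    # kick off a full heal if needed
gluster volume heal myvol info    # watch the remaining entry count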

On Wed, Dec 7, 2016 at 1:02 PM, Cedric Lemarchand  wrote:
> Hello,
>
> Is gluster able to auto-heal when some bricks are lost ? by auto-heal I mean 
> that losted parity are re-generated on bricks that are still available in 
> order to recover the level of redundancy without replacing the failed bricks.
>
> I am in the learning curve, apologies if the question is trivial.
>
> Cheers,
>
> Cédric
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] DISPERSED VOLUME

2016-11-25 Thread Serkan Çoban
I think you should try with bigger files: 1, 10, 100, 1000KB?
Small files might just be getting replicated to the bricks... (Just a guess.)
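
For example, something like this (reusing the paths from the mail below) would
show how the on-brick size changes as the file grows:

cd /home/cli1/gv7_dispersed_directory
for size in 1K 10K 100K 1M; do
    dd if=/dev/urandom of="test_${size}" bs="$size" count=1
done
# then, on each server, compare: ls -lh /data/brick1/gv7/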

On Fri, Nov 25, 2016 at 12:41 PM, Alexandre Blanca
 wrote:
> Hi,
>
> I am a beginner in distributed file systems and I currently work on
> Glusterfs.
> I work with 4 VM : srv1, srv2, srv3 and cli1
> I tested several types of volume (distributed, replicated, striped ...)
> which are for me JBOD, RAID 1 and RAID 0.
> When I try to make a dispersed volume (raid5 / 6) I have a misunderstanding
> ...
>
>
> gluster volume create gv7 disperse-data 3 redundancy 1
> ipserver1:/data/brick1/gv7 ipserver2:/data/brick1/gv7
> ipserver3:/data/brick1/gv7 ipserver4:/data/brick1/gv7
>
>
> gluster volume info
>
>
> Volume Name: gv7
> Type: Disperse
> Status: Created
> Number of Bricks: 4
> Transport-type: tcp
> Bricks:
> Brick1: ipserver1:/data/brick1/gv7
> Brick2: ipserver2:/data/brick1/gv7
> Brick3: ipserver3:/data/brick1/gv7
> Brick4: ipserver4:/data/brick1/gv7
>
> gluster volume start gv7
>
>
> mkdir /home/cli1/gv7_dispersed_directory
>
>
> mount -t glusterfs ipserver1:/gv7 /home/cli1/gv7_dispersed_directory
>
>
>
> Now, when I create a file on my mount point (gv7_dispersed_directory):
>
> cd /home/cli1/gv7_dispersed_directory
> echo 'hello world !' >> test_file
>
> I can see on my srv1:
>
> cd /data/brick1/gv7
> cat test
> hello world !
>
> On my srv2:
>
> cd /data/brick1/gv7
> cat test
> hello world !
>
> On my srv4:
>
> cd /data/brick1/gv7
> cat test
> hello world !
>
> But on my srv3:
>
> cd /data/brick1/gv7
> cat test
> hello world !
> hello world !
>
> Why?! The output on server 3 shows "hello world !" twice. Parity? Redundancy?
> I don't know...
>
> Best regards
>
> Alex
>
>
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Notice: https://download.gluster.org:/pub/gluster/glusterfs/LATEST has changed

2016-11-19 Thread Serkan Çoban
Hi,

Sorry for the late reply. I think I will wait for the 3.10 LTM release to try
it. I am on 3.7.11 and it is very stable for us.
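
For anyone who does move to 3.9 or later, the multi-threaded disperse self-heal
discussed below is controlled through volume options. A minimal sketch, assuming
the option names from the 3.9 feature set (verify with 'gluster volume set help'
on your version); "myvol" is a placeholder:

gluster volume set myvol disperse.shd-max-threads 8      # parallel heal threads per brick (default is 1)
gluster volume set myvol disperse.shd-wait-qlength 2048  # queue of entries waiting to be healed
gluster volume heal myvol full                           # then kick off a full heal and time it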

On Thu, Nov 17, 2016 at 1:05 PM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
>
>
> On Wed, Nov 16, 2016 at 11:47 PM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
>>
>> Hi,
>> Will the new disperse-related features be ported to 3.7, or should we
>> upgrade to get those features?
>
>
> Hi Serkan,
>   Unfortunately, no, they won't be backported to 3.7. We are adding
> new features only to the latest releases to prevent accidental bugs from
> slipping into stable releases. While the features are working well, we did
> see a performance problem very late in the cycle in the I/O path, just with
> EC for small files. You should wait before you upgrade, IMO.
>
> You were trying to test how long it takes to heal data with multi-threaded
> heal in EC right? Do you want to give us feedback by trying this feature
> out?
>
>>
>> On Wed, Nov 16, 2016 at 8:51 PM, Kaleb S. KEITHLEY <kkeit...@redhat.com>
>> wrote:
>> > Hi,
>> >
>> > As some of you may have noticed, GlusterFS-3.9.0 was released. Watch
>> > this space for the official announcement soon.
>> >
>> > If you are using Community GlusterFS packages from download.gluster.org
>> > you should check your package metadata to be sure that an update doesn't
>> > inadvertently update your system to 3.9.
>> >
>> > There is a new symlink:
>> > https://download.gluster.org:/pub/gluster/glusterfs/LTM-3.8 which will
>> > remain pointed at the GlusterFS-3.8 packages. Use this instead of
>> > .../LATEST to keep getting 3.8 updates without risk of accidentally
>> > getting 3.9. There is also a new LTM-3.7 symlink that you can use for
>> > 3.7 updates.
>> >
>> > Also note that there is a new package signing key for the 3.9 packages
>> > that are on download.gluster.org. The old key remains the same for 3.8
>> > and earlier packages. New releases of 3.8 and 3.7 packages will continue
>> > to use the old key.
>> >
>> > GlusterFS-3.9 is the first "short term" release; it will be supported
>> > for approximately six months. 3.7 and 3.8 are Long Term Maintenance
>> > (LTM) releases. 3.9 will be followed by 3.10; 3.10 will be a LTM release
>> > and 3.9 and 3.7 will be End-of-Life (EOL) at that time.
>> >
>> >
>> > --
>> >
>> > Kaleb
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
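
For anyone following the advice above and pinning to the LTM stream instead of
LATEST, a hedged sketch of what that could look like on an EL7 box. The exact
directory layout under the LTM-3.8 symlink and the key location are assumptions,
so check the directory index on download.gluster.org before using it:

cat > /etc/yum.repos.d/glusterfs-ltm-3.8.repo <<'EOF'
[glusterfs-ltm-3.8]
name=GlusterFS 3.8 (LTM) packages
# path below the LTM-3.8 symlink is assumed; verify it against the
# directory listing on download.gluster.org
baseurl=https://download.gluster.org/pub/gluster/glusterfs/LTM-3.8/EPEL.repo/epel-7/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://download.gluster.org/pub/gluster/glusterfs/LTM-3.8/rsa.pub
EOF
yum clean metadata && yum update 'glusterfs*'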

Re: [Gluster-users] question about glusterfs version migrate

2016-11-16 Thread Serkan Çoban
The link below lists the changes in each release.
https://github.com/gluster/glusterfs/tree/release-3.7/doc/release-notes
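
If you have a checkout of the glusterfs source, the same information can be
pulled from git; a small sketch, assuming the usual vX.Y.Z tag names:

git clone https://github.com/gluster/glusterfs.git && cd glusterfs
git log --oneline v3.7.6..v3.7.10                  # every commit between the two releases
git diff v3.7.6..v3.7.10 -- doc/release-notes      # just the release-note changes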


On Wed, Nov 16, 2016 at 11:49 AM, songxin  wrote:
> Hi,
> I am planning to migrate from gluster 3.7.6 to gluster 3.7.10.
> So I have two questions below.
> 1. How can I find out what has changed between gluster 3.7.6 and gluster 3.7.10?
> 2. Does my application need any NBC changes?
>
> Thanks,
> Xin
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Looking for use cases / opinions

2016-11-09 Thread Serkan Çoban
The disks are SAS-attached in each server. No hardware RAID (JBOD), no SSDs,
and XFS as the brick filesystem.
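
For context, a typical way such a JBOD brick might be prepared; the inode size
and mount options follow the usual Gluster recommendations for XFS bricks, and
the device name and paths are placeholders:

mkfs.xfs -i size=512 /dev/sdb                 # larger inodes leave room for gluster's xattrs
mkdir -p /data/brick1
echo '/dev/sdb /data/brick1 xfs inode64,noatime 0 0' >> /etc/fstab
mount /data/brick1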

On Wed, Nov 9, 2016 at 8:28 PM, Alastair Neil <ajneil.t...@gmail.com> wrote:
> Serkan
>
> I'd be interested to know how your disks are attached (SAS?). Do you use
> any hardware RAID or ZFS, and do you have any SSDs in there?
>
> On 9 November 2016 at 06:17, Serkan Çoban <cobanser...@gmail.com> wrote:
>>
>> Hi, I am using 26x8TB disks per server. There are 60 servers in the gluster
>> cluster.
>> Each disk is a brick, and the configuration is 16+4 EC in a single 9PB volume.
>> Clients use FUSE mounts.
>> Even with 1-2K files in a directory, ls from the clients takes ~60 secs.
>> So if you are sensitive to metadata operations, I suggest another
>> approach...
>>
>>
>> On Wed, Nov 9, 2016 at 1:05 PM, Frank Rothenstein
>> <f.rothenst...@bodden-kliniken.de> wrote:
>> > Since you said you want to have 3 or 4 replicas, I would use the ZFS
>> > knowledge and build one zpool per node with whatever config you know is
>> > fastest on this kind of hardware and as safe as you need (stripe,
>> > mirror, raidz1..3; resilvering ZFS is faster than healing Gluster, I
>> > think). One node -> one brick (per gluster volume).
>> >
>> > Frank
>> > On Tuesday, 08.11.2016, at 19:19, Thomas Wakefield wrote:
>> >> We haven’t decided how the JBODs would be configured.  They would
>> >> likely be SAS-attached without a RAID controller, for improved
>> >> performance.  I run large ZFS arrays this way, but only in single-server
>> >> NFS setups right now.
>> >> Mounting each hard drive as its own brick would probably give the
>> >> most usable space, but would need scripting to manage building all
>> >> the bricks.  But does Gluster handle 1000s of small bricks?
>> >>
>> >>
>> >>
>> >> > On Nov 8, 2016, at 9:18 AM, Frank Rothenstein <f.rothenstein@bodden-kliniken.de> wrote:
>> >> >
>> >> > Hi Thomas,
>> >> >
>> >> > That's a huge amount of storage.
>> >> > What I can say from my use case: don't use Gluster directly if the
>> >> > files are small. I don't know whether the file count matters, but if
>> >> > the files are small (a few KiB), Gluster takes ages to remove them,
>> >> > for example. Doing the same inside a VM with e.g. an ext4 disk on the
>> >> > very same Gluster volume gives a big speedup.
>> >> > There are many options for a new Gluster volume, like Lindsay
>> >> > mentioned.
>> >> > And there are other options, like Ceph or OrangeFS.
>> >> > How do you want to use the JBODs? I don't think you would use every
>> >> > single drive as a brick... How are these connected to the servers?
>> >> >
>> >> > I'm only dealing with Gluster volumes of about 10TiB, so far below
>> >> > your planned level, but I really would like to see some results if
>> >> > you go for Gluster!
>> >> >
>> >> > Frank
>> >> >
>> >> >
>> >> > On Tuesday, 08.11.2016, at 13:49, Thomas Wakefield wrote:
>> >> > > I think we are leaning towards erasure coding with 3 or 4
>> >> > > copies.  But open to suggestions.
>> >> > >
>> >> > >
>> >> > > > On Nov 8, 2016, at 8:43 AM, Lindsay Mathieson <...n@gmail.com> wrote:
>> >> > > >
>> >> > > > On 8/11/2016 11:38 PM, Thomas Wakefield wrote:
>> >> > > > > High Performance Computing, we have a small cluster on campus
>> >> > > > > of
>> >> > > > > about 50 linux compute servers.
>> >> > > > >
>> >> > > >
>> >> > > > D'oh! I should have thought of that.
>> >> > > >
>> >> > > >
>> >> > > > Are you looking at replication (2 or 3)/disperse or pure
>> >> > > > disperse?
>> >> > > >
>> >> > > > --
>> >> > > > Lindsay Mathieson
>> >> > > >
>> >> > >
>> >> > > ___
>> >

Re: [Gluster-users] Looking for use cases / opinions

2016-11-09 Thread Serkan Çoban
Hi, I am using 26x8TB disks per server. There are 60 servers in the gluster cluster.
Each disk is a brick, and the configuration is 16+4 EC in a single 9PB volume.
Clients use FUSE mounts.
Even with 1-2K files in a directory, ls from the clients takes ~60 secs.
So if you are sensitive to metadata operations, I suggest another approach...
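
For readers unfamiliar with the layout: a 16+4 setup like this is built as a
distributed-dispersed volume. A minimal sketch of the create command, with
hostnames and brick paths as placeholders and only the first group of 20 bricks
spelled out:

gluster volume create bigvol disperse 20 redundancy 4 \
    server{1..20}:/bricks/disk01/brick
# every further group of 20 bricks (added at create time or later with
# add-brick) becomes another 16+4 subvolume that files are distributed over
gluster volume info bigvol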


On Wed, Nov 9, 2016 at 1:05 PM, Frank Rothenstein
 wrote:
> Since you said you want to have 3 or 4 replicas, I would use the ZFS
> knowledge and build one zpool per node with whatever config you know is
> fastest on this kind of hardware and as safe as you need (stripe,
> mirror, raidz1..3; resilvering ZFS is faster than healing Gluster, I
> think). One node -> one brick (per gluster volume).
>
> Frank
> On Tuesday, 08.11.2016, at 19:19, Thomas Wakefield wrote:
>> We haven’t decided how the JBODs would be configured.  They would
>> likely be SAS-attached without a RAID controller, for improved
>> performance.  I run large ZFS arrays this way, but only in single-server
>> NFS setups right now.
>> Mounting each hard drive as its own brick would probably give the
>> most usable space, but would need scripting to manage building all
>> the bricks.  But does Gluster handle 1000s of small bricks?
>>
>>
>>
>> > On Nov 8, 2016, at 9:18 AM, Frank Rothenstein <f.rothenstein@bodden-kliniken.de> wrote:
>> >
>> > Hi Thomas,
>> >
>> > That's a huge amount of storage.
>> > What I can say from my use case: don't use Gluster directly if the
>> > files are small. I don't know whether the file count matters, but if
>> > the files are small (a few KiB), Gluster takes ages to remove them,
>> > for example. Doing the same inside a VM with e.g. an ext4 disk on the
>> > very same Gluster volume gives a big speedup.
>> > There are many options for a new Gluster volume, like Lindsay
>> > mentioned.
>> > And there are other options, like Ceph or OrangeFS.
>> > How do you want to use the JBODs? I don't think you would use every
>> > single drive as a brick... How are these connected to the servers?
>> >
>> > I'm only dealing with Gluster volumes of about 10TiB, so far below
>> > your planned level, but I really would like to see some results if
>> > you go for Gluster!
>> >
>> > Frank
>> >
>> >
>> > On Tuesday, 08.11.2016, at 13:49, Thomas Wakefield wrote:
>> > > I think we are leaning towards erasure coding with 3 or 4
>> > > copies.  But open to suggestions.
>> > >
>> > >
>> > > > On Nov 8, 2016, at 8:43 AM, Lindsay Mathieson <...n@gmail.com> wrote:
>> > > >
>> > > > On 8/11/2016 11:38 PM, Thomas Wakefield wrote:
>> > > > > High Performance Computing, we have a small cluster on campus
>> > > > > of
>> > > > > about 50 linux compute servers.
>> > > > >
>> > > >
>> > > > D'oh! I should have thought of that.
>> > > >
>> > > >
>> > > > Are you looking at replication (2 or 3)/disperse or pure
>> > > > disperse?
>> > > >
>> > > > --
>> > > > Lindsay Mathieson
>> > > >
>> > >
>> > > ___
>> > > Gluster-users mailing list
>> > > Gluster-users@gluster.org
>> > > http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>> >
>> >
>> >
>>
>>
>
>
>
>
>
