Re: [Gluster-users] 90 Brick/Server suggestions?
On 02/17/2017 10:13 AM, Gambit15 wrote:
>> RAID is not an option, JBOD with EC will be used.
>
> Any particular reason for this, other than maximising space by avoiding two layers of RAID/redundancy?
> Local RAID would be far simpler & quicker for replacing failed drives, and it would greatly reduce the number of bricks & load on Gluster.
>
> We use RAID volumes for our bricks, and the benefits of simplified management far outweigh the costs of a little lost capacity.
>
> D

This is as much a question as a comment.

My impression is that distributed filesystems like Gluster shine where the number of bricks is close to the number of servers and both of those numbers are as large as possible. So the ideal solution would be 90 disks as 90 bricks on 90 servers. This would be hard to do in practice, but the point of Gluster is to spread the load and potential failures over a large surface.

Putting all the disks into a big RAID array and then just duplicating that for redundancy is not much better than using something like DRBD, which would likely perform faster but be less scalable. In the end, with big RAID arrays and fewer servers you have a smaller surface to absorb failures.

Over the years I have seen RAID systems fail because users put them in, forget about them, and then see system failures because they did not monitor the RAID arrays. I would be willing to bet that 80%+ of all the RAID arrays out there are not monitored. Gluster is more in your face about failures and arguably should be more reliable in practice because you will know quickly about a failure.

Feel free to correct my misconceptions.

--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
al...@netvel.net ||
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Advice for sizing a POC
Greetings,

I'm trying to spec hardware for a proof of concept. I'm hoping for a sanity check to see if I'm asking the right questions and making the right assumptions.

I don't have real numbers for expected workload, but for our main use case, we're likely talking a few hundred thousand files, read heavy, with average file size around 1 GB. Fairly parallel access pattern.

I've read elsewhere that the max recommended disk count for a RAID6 array is twelve. Is that per node, or per brick? i.e. if I have a number of 24 or 36 disk arrays attached to a single node, would it make sense to divide the larger array into 2 or 3 bricks with 12 disk stripes, or do I want to limit the brick count to one per node in this case?

For FUSE clients, assuming one 12 disk RAID6 brick per node, in general, how many nodes do I need in my cluster before I start meeting/exceeding the throughput of a direct attached RAID via NFS mount?

RAM: is it always a case of the more, the merrier? Or is there some rule of thumb for calculating return on investment there?

Is there a scenario where adding a few SSDs to a node can increase the performance of a spinning disk brick by acting as a read cache or some such? Assuming non-ZFS.

I've read that for highly parallel access, it might make more sense to use JBOD with one brick per disk. Is that advice file size dependent? And what question do I need to ask myself to determine how many of these single disk bricks I want per node?

Many thanks!

-Jake
Re: [Gluster-users] connection attempt on 127.0.0.1:24007 failed ?
On 17/02/17 16:23, Joe Julian wrote:
> "invalid argument" in socket could be:
>
> EINVAL Unknown protocol, or protocol family not available.
> EINVAL Invalid flags in type
>
> Since we know that the flags don't cause errors elsewhere and don't change from one installation to another I think it's safe to disregard that possibility. That leaves the former. Obviously TCP is a known protocol. That leaves "protocol family not available". I haven't read the kernel code for this but off the top of my head I would look for ipv4 (if you are ipv6 only that's an invalid address) or socket exhaustion.

It had something to do with the kernel version. I run CentOS off kernel-ml, and v4.9.5 was where this message persisted; now with 4.9.6 it's gone. I wonder if the gluster dev guys also test CentOS releases against ml kernels.

thanks, L.

> On February 17, 2017 7:47:23 AM PST, lejeczek wrote:
>> hi guys
>>
>> I'm asking in case it's something trivial, before I start digging and removing bits. I see these logged every couple of seconds on one peer:
>>
>> [2017-02-17 15:44:40.012078] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)
>> [2017-02-17 15:44:43.837139] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)
>>
>> and sometimes:
>>
>> [2017-02-17 15:45:18.414234] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>> [2017-02-17 15:45:18.414260] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
>>
>> Is this caused by mount requests local to the peer?
>> b.w.
>> L.
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
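For anyone chasing the same log line, here is a minimal sketch of an independent check. It tries a plain IPv4 TCP connect to the glusterd management port and reports the errno name. The function name `probe` is made up for this note, and the sketch does not set any of the socket options Gluster itself uses, so it can only help separate "nothing is listening" from a lower-level failure; it is not a reproduction of the EINVAL above.

```python
import errno
import socket


def probe(host: str = "127.0.0.1", port: int = 24007, timeout: float = 2.0):
    """Try a plain TCP connect; return (ok, errno_name or None).

    Hypothetical helper for illustration only, not part of Gluster.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success or an errno value on failure
        code = sock.connect_ex((host, port))
    finally:
        sock.close()
    name = errno.errorcode.get(code, str(code)) if code else None
    return code == 0, name
```

Calling `probe()` on the affected peer while the errors are being logged should return `(True, None)` if glusterd accepts the connection; a refusal would typically show up as `ECONNREFUSED` rather than `EINVAL`.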
Re: [Gluster-users] Machine becomes its own peer
Joe - Scott had sent me a private email and I provided the workaround. For some (unknown) reason all the nodes ended up having two UUIDs for a particular peer, which caused it. I've asked for the log files to debug further.

On Fri, 17 Feb 2017 at 21:58, Joe Julian wrote:
> Does your repaired server have the correct uuid in /var/lib/glusterd/glusterd.info?
>
> On February 16, 2017 9:49:56 PM PST, Scott Hazelhurst <scott.hazelhu...@wits.ac.za> wrote:
>> Dear all
>>
>> Last week I posted a query about a problem I had with a machine that had failed but the underlying hard disk with the gluster brick was good. I’ve made some progress in restoring. I now have the problem with my new restored machine where it becomes its own peer, which then breaks everything.
>>
>> 1. Gluster daemons are off on all peers, content of /var/lib/glusterd/peers looks good.
>> 2. I start the gluster daemons on all peers. All looks good.
>> 3. For about 2 minutes, there’s no obvious problem: if I do a gluster peer status on any machine it looks good, if I do a gluster volume status A01 on any machine it looks good.
>> 4. Then at some point, the /var/lib/glusterd/peers file of the new, restored machine gets an entry for itself and things start breaking. A typical error message is the understandable
>>
>> : Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8
>>
>> 5. This is repeatable: if I stop daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two and then something magically puts something in /var/lib/glusterd/peers
>>
>> In a previous step in restoring my machine, I had a different error of mismatching cksums and what I did then may be the cause of the problem. In searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy the /var/lib/glusterd/vols/ from another of the peers to the new machine. This may not be the issue but this is the only thing I think I did that was unconventional.
>>
>> I am running version 3.7.5-19 on Scientific Linux 6.8
>>
>> If anyone can suggest a way forward I would be grateful
>>
>> Many thanks
>>
>> Scott
>>
>> This communication is intended for the addressee only. It is confidential. If you have received this communication in error, please notify us immediately and destroy the original message. You may not copy or disseminate this communication without the permission of the University. Only authorised signatories are competent to enter into agreements on behalf of the University and recipients are thus advised that the content of this message may not be legally binding on the University and may contain the personal views and opinions of the author, which are not necessarily the views and opinions of The University of the Witwatersrand, Johannesburg. All agreements between the University and outsiders are subject to South African Law unless the University agrees in writing to the contrary.

--
- Atin (atinm)
Re: [Gluster-users] Machine becomes its own peer
Does your repaired server have the correct uuid in /var/lib/glusterd/glusterd.info?

On February 16, 2017 9:49:56 PM PST, Scott Hazelhurst wrote:
> Dear all
>
> Last week I posted a query about a problem I had with a machine that had failed but the underlying hard disk with the gluster brick was good. I’ve made some progress in restoring. I now have the problem with my new restored machine where it becomes its own peer, which then breaks everything.
>
> 1. Gluster daemons are off on all peers, content of /var/lib/glusterd/peers looks good.
> 2. I start the gluster daemons on all peers. All looks good.
> 3. For about 2 minutes, there’s no obvious problem: if I do a gluster peer status on any machine it looks good, if I do a gluster volume status A01 on any machine it looks good.
> 4. Then at some point, the /var/lib/glusterd/peers file of the new, restored machine gets an entry for itself and things start breaking. A typical error message is the understandable
>
> : Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8
>
> 5. This is repeatable: if I stop daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two and then something magically puts something in /var/lib/glusterd/peers
>
> In a previous step in restoring my machine, I had a different error of mismatching cksums and what I did then may be the cause of the problem. In searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy the /var/lib/glusterd/vols/ from another of the peers to the new machine. This may not be the issue but this is the only thing I think I did that was unconventional.
>
> I am running version 3.7.5-19 on Scientific Linux 6.8
>
> If anyone can suggest a way forward I would be grateful
>
> Many thanks
>
> Scott
Re: [Gluster-users] connection attempt on 127.0.0.1:24007 failed ?
"invalid argument" in socket could be:

EINVAL Unknown protocol, or protocol family not available.
EINVAL Invalid flags in type

Since we know that the flags don't cause errors elsewhere and don't change from one installation to another I think it's safe to disregard that possibility. That leaves the former. Obviously TCP is a known protocol. That leaves "protocol family not available". I haven't read the kernel code for this but off the top of my head I would look for ipv4 (if you are ipv6 only that's an invalid address) or socket exhaustion.

On February 17, 2017 7:47:23 AM PST, lejeczek wrote:
> hi guys
>
> I'm asking in case it's something trivial, before I start digging and removing bits. I see these logged every couple of seconds on one peer:
>
> [2017-02-17 15:44:40.012078] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)
> [2017-02-17 15:44:43.837139] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)
>
> and sometimes:
>
> [2017-02-17 15:45:18.414234] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
> [2017-02-17 15:45:18.414260] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
>
> Is this caused by mount requests local to the peer?
> b.w.
> L.
Re: [Gluster-users] 90 Brick/Server suggestions?
I wouldn't do that kind of per-server density for anything but cold storage. Putting that many eggs in one basket increases the potential for catastrophic failure.

On February 15, 2017 11:04:16 AM PST, "Serkan Çoban" wrote:
> Hi,
>
> We are evaluating Dell DSS7000 chassis with 90 disks.
> Has anyone used that many bricks per server?
> Any suggestions or advice?
>
> Thanks,
> Serkan
[Gluster-users] connection attempt on 127.0.0.1:24007 failed ?
hi guys

I'm asking in case it's something trivial, before I start digging and removing bits. I see these logged every couple of seconds on one peer:

[2017-02-17 15:44:40.012078] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)
[2017-02-17 15:44:43.837139] E [socket.c:3097:socket_connect] 0-glusterfs: connection attempt on 127.0.0.1:24007 failed, (Invalid argument)

and sometimes:

[2017-02-17 15:45:18.414234] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2017-02-17 15:45:18.414260] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Is this caused by mount requests local to the peer?

b.w.
L.
Re: [Gluster-users] 90 Brick/Server suggestions?
> Any particular reason for this, other than maximising space by avoiding two layers of RAID/redundancy?

Yes, that's right: we can get 720TB net usable space per server with 90*10TB disks. Any RAID layer will cost too much.

On Fri, Feb 17, 2017 at 6:13 PM, Gambit15 wrote:
>> RAID is not an option, JBOD with EC will be used.
>
> Any particular reason for this, other than maximising space by avoiding two layers of RAID/redundancy?
> Local RAID would be far simpler & quicker for replacing failed drives, and it would greatly reduce the number of bricks & load on Gluster.
>
> We use RAID volumes for our bricks, and the benefits of simplified management far outweigh the costs of a little lost capacity.
>
> D
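The 720TB figure can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the message does not say which erasure-coding scheme the number assumes. A 16+4 layout (the one mentioned elsewhere in this thread as being in production) happens to give exactly 80% of raw capacity, and the 12-disk RAID6 group used for the "two layers" comparison is an idealised assumption, since 90 disks do not divide evenly into groups of 12.

```python
def usable_tb(disks: int, disk_tb: float, data: int, redundancy: int) -> float:
    """Net capacity of an erasure-coded layout with data+redundancy bricks per set."""
    raw = disks * disk_tb
    return raw * data / (data + redundancy)


raw_tb = 90 * 10                      # raw capacity of one 90-disk server
ec_only = usable_tb(90, 10, 16, 4)    # JBOD with 16+4 erasure coding (assumed scheme)
# Idealised second layer: RAID6 groups of 12 disks keep 10/12 of each group.
raid6_plus_ec = ec_only * 10 / 12

print(raw_tb)         # 900
print(ec_only)        # 720.0
print(raid6_plus_ec)  # 600.0
```

Under these assumptions, adding a RAID6 layer beneath the same erasure coding would cost roughly 120TB per server, which is the "cost" being objected to.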
Re: [Gluster-users] 90 Brick/Server suggestions?
> RAID is not an option, JBOD with EC will be used.

Any particular reason for this, other than maximising space by avoiding two layers of RAID/redundancy?
Local RAID would be far simpler & quicker for replacing failed drives, and it would greatly reduce the number of bricks & load on Gluster.

We use RAID volumes for our bricks, and the benefits of simplified management far outweigh the costs of a little lost capacity.

D
[Gluster-users] why is geo-rep so bloody impossible?
hi everyone,

I've been browsing the list's messages and it seems to me that users struggle with geo-replication; I certainly do. I am doing what I thought was simple, following the official docs. As root, I always run:

]$ gluster system:: execute gsec_create
]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica create push-pem force
]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica start

and I see:

256:log_raise_exception] : getting "No such file or directory"errors is most likely due to MISCONFIGURATION, please remove all the public keys added by geo-replication from authorized_keys file in slave nodes and run Geo-replication create command again.
263:log_raise_exception] : If `gsec_create container` was used, then run `gluster volume geo-replication [@]:: config remote-gsyncd (Example GSYNCD_PATH: `/usr/libexec/glusterfs/gsyncd`)

So I removed all the command="tar.. entries from ~/.ssh/authorized_keys on the geo-rep slave, then recreated the session on the master, but unfortunately that was not it. So I tried config gsyncd, only to see:

... ..Popen: command "ssh -oPasswordAuthentication=no.. returned with 1, saying:
0-cli: Started running /usr/sbin/gluster with version 3.8.8
0-cli: Connecting to remote glusterd at localhost
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[cli-cmd.c:130:cli_cmd_process] 0-: Exiting with: 110
gsyncd initializaion failed

and I have no idea where or how to troubleshoot it further.

Many thanks for any help,
L.
Re: [Gluster-users] 90 Brick/Server suggestions?
There may be some helpful information in this article:
http://45drives.blogspot.ca/2016/11/an-introduction-to-clustering-how-to.html

Disclaimer: I don't work for 45drives, I'm just a satisfied customer.

Good luck, and please let us know how this works out for you.

regards,
tp

On Fri, 17 Feb 2017, Serkan Çoban wrote:
>> We have 12 on order. Actually the DSS7000 has two nodes in the chassis, and each accesses 45 bricks. We will be using an erasure code scheme, probably 24:3 or 24:4; we have not sat down and really thought about the exact scheme we will use.
>
> If we cannot get the 1 node/90 disk configuration, we can also get it as 2 nodes/45 disks each.
> Be careful about EC. I am using 16+4 in production; the only drawback is slow rebuild times. It takes 10 days to rebuild an 8TB disk. Although parallel heal for EC improves it in 3.9, don't forget to test rebuild times for different EC configurations.
>
>> 90 disks per server is a lot. In particular, it might be out of balance with other characteristics of the machine - number of cores, amount of memory, network or even bus bandwidth
>
> Nodes will be pretty powerful: 2x18 core CPUs with 256GB RAM and 2x10Gb bonded ethernet. It will be used for archive purposes so I don't need more than 1GB/s/node.
> RAID is not an option, JBOD with EC will be used.
>
>> gluster volume set all cluster.brick-multiplex on
>
> I just read the 3.10 release notes and saw this. I think this is a good solution. I plan to use 3.10.x and will probably test multiplexing and get in touch for help.
>
> Thanks for the suggestions,
> Serkan
>
> On Fri, Feb 17, 2017 at 1:39 AM, Jeff Darcy wrote:
>>> We are evaluating Dell DSS7000 chassis with 90 disks.
>>> Has anyone used that many bricks per server?
>>> Any suggestions or advice?
>>
>> 90 disks per server is a lot. In particular, it might be out of balance with other characteristics of the machine - number of cores, amount of memory, network or even bus bandwidth. Most people who put that many disks in a server use some sort of RAID (HW or SW) to combine them into a smaller number of physical volumes on top of which filesystems and such can be built. If you can't do that, or don't want to, you're in poorly explored territory. My suggestion would be to try running as 90 bricks. It might work fine, or you might run into various kinds of contention:
>>
>> (1) Excessive context switching would indicate not enough CPU.
>> (2) Excessive page faults would indicate not enough memory.
>> (3) Maxed-out network ports . . . well, you can figure that one out. ;)
>>
>> If (2) applies, you might want to try brick multiplexing. This is a new feature in 3.10, which can reduce memory consumption by more than 2x in many cases by putting multiple bricks into a single process (instead of one per brick). This also drastically reduces the number of ports you'll need, since the single process only needs one port total instead of one per brick. In terms of CPU usage or performance, gains are far more modest. Work in that area is still ongoing, as is work on multiplexing in general. If you want to help us get it all right, you can enable multiplexing like this:
>>
>> gluster volume set all cluster.brick-multiplex on
>>
>> If multiplexing doesn't help for you, speak up and maybe we can make it better, or perhaps come up with other things to try. Good luck!
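To put the rebuild time quoted in this message into perspective (10 days for an 8TB disk), the average heal throughput can be worked out directly. This is only back-of-the-envelope arithmetic in decimal units; the helper names are made up for this note, and the 10TB projection simply assumes the same average rate would hold, which the thread does not claim.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rebuild_rate_mb_s(disk_tb: float, days: float) -> float:
    """Average rebuild throughput in MB/s (decimal units: 1 TB = 1,000,000 MB)."""
    return disk_tb * 1_000_000 / (days * SECONDS_PER_DAY)


def rebuild_days(disk_tb: float, rate_mb_s: float) -> float:
    """Days needed to rebuild a disk of disk_tb at a given average rate."""
    return disk_tb * 1_000_000 / rate_mb_s / SECONDS_PER_DAY


rate = rebuild_rate_mb_s(8, 10)
print(round(rate, 2))                    # 9.26
print(round(rebuild_days(10, rate), 1))  # 12.5
```

An average of about 9 MB/s is far below what a single spinning disk can sustain sequentially, which is why testing rebuild times for each candidate EC configuration before committing to 10TB disks seems worthwhile.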
Re: [Gluster-users] Machine becomes its own peer
On Fri, Feb 17, 2017 at 11:19 AM, Scott Hazelhurst <scott.hazelhu...@wits.ac.za> wrote:
> Dear all
>
> Last week I posted a query about a problem I had with a machine that had failed but the underlying hard disk with the gluster brick was good. I’ve made some progress in restoring. I now have the problem with my new restored machine where it becomes its own peer, which then breaks everything.
>
> 1. Gluster daemons are off on all peers, content of /var/lib/glusterd/peers looks good.
> 2. I start the gluster daemons on all peers. All looks good.
> 3. For about 2 minutes, there’s no obvious problem: if I do a gluster peer status on any machine it looks good, if I do a gluster volume status A01 on any machine it looks good.
> 4. Then at some point, the /var/lib/glusterd/peers file of the new, restored machine gets an entry for itself and things start breaking. A typical error message is the understandable
>
> : Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8
>
> 5. This is repeatable: if I stop daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two and then something magically puts something in /var/lib/glusterd/peers

I'd need a few more details here:

1. output of gluster peer status
2. output of cat /var/lib/glusterd/glusterd.info & cat /var/lib/glusterd/peers/* from all the nodes

> In a previous step in restoring my machine, I had a different error of mismatching cksums and what I did then may be the cause of the problem. In searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy the /var/lib/glusterd/vols/ from another of the peers to the new machine. This may not be the issue but this is the only thing I think I did that was unconventional.
>
> I am running version 3.7.5-19 on Scientific Linux 6.8
>
> If anyone can suggest a way forward I would be grateful
>
> Many thanks
>
> Scott

--
~ Atin (atinm)
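The two files requested above can also be cross-checked mechanically. The following is a rough sketch, not a Gluster tool: it assumes that glusterd.info holds a `UUID=` line and that each file under peers/ holds a `uuid=` line in simple key=value form, and the function names are invented for this note. It lists any peer file whose uuid matches the node's own UUID, which is the "machine becomes its own peer" symptom described in this thread.

```python
from pathlib import Path


def read_kv(path: Path) -> dict:
    """Parse a simple key=value file into a dict (assumed file format)."""
    pairs = {}
    for line in path.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def self_peer_entries(state_dir: Path = Path("/var/lib/glusterd")) -> list:
    """Return names of peer files whose uuid equals this node's own UUID."""
    own = read_kv(state_dir / "glusterd.info").get("UUID")
    if own is None:
        raise ValueError("no UUID= line found in glusterd.info")
    hits = []
    for peer_file in sorted((state_dir / "peers").iterdir()):
        if peer_file.is_file() and read_kv(peer_file).get("uuid") == own:
            hits.append(peer_file.name)
    return hits
```

Running `self_peer_entries()` as root on each node should return an empty list on a healthy peer; a non-empty result on the restored machine would point at the offending file before glusterd starts reporting lock errors.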