On Tue, Aug 8, 2017 at 12:03 AM, FERNANDO FREDIANI <fernando.fredi...@upx.com> wrote:
> Thanks for the detailed answer Erekle.
>
> I conclude that it is worthwhile in any scenario to have an arbiter node in
> order to avoid wasting more disk space on RAID X + Gluster replication on
> top of it. The cost seems much lower if you consider the running costs of
> the whole storage and compare them with the cost of building the arbiter
> node. Even having a fully redundant arbiter service with 2 nodes would make
> it worthwhile on a larger deployment.
>

Note that although you get the same consistency as a replica 3 setup, a
2+arbiter setup gives you the data availability of a replica 2 setup. That
may or may not be OK with your high availability requirements.
Y.

> Regards
> Fernando
>
> On 07/08/2017 17:07, Erekle Magradze wrote:
>
> Hi Fernando (sorry for misspelling your name, I used a different keyboard),
>
> So let's go with the following scenarios:
>
> 1. Let's say you have two servers (replication factor is 2), i.e. two
> bricks per volume. In this case it is strongly recommended to have an
> arbiter node, the metadata storage that guarantees avoiding the
> split-brain situation. For the arbiter you don't even need a disk with
> lots of space; a tiny SSD is enough, but it has to be hosted on a separate
> server. The advantage of such a setup is that you don't need RAID 1 for
> each brick: the metadata is stored on the arbiter node and brick
> replacement is easy.
>
> 2. If you have an odd number of bricks (let's say 3, i.e. replication
> factor is 3) in your volume and you neither created an arbiter node nor
> configured the quorum, the entire load of keeping the volume consistent
> resides on all 3 servers. Each of them is important and each brick
> contains key information; they need to cross-check each other (that's what
> people usually do on their first try of gluster :) ). In this case
> replacing a brick is a big pain and RAID 1 is a good option to have
> (that's the disadvantage, i.e. losing the space and not having the JBOD
> option). The advantage is that you don't have to have an additional
> arbiter node.
>
> 3. You have an odd number of bricks and a configured arbiter node. In this
> case you can easily go with JBOD; however, a good practice would be to
> have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly
> sufficient for volumes tens of TB in size).
>
> That's basically it.
>
> The rest about reliability and setup scenarios you can find in the gluster
> documentation; look especially for the quorum and arbiter node configs and
> options.
>
> Cheers
>
> Erekle
>
> P.S. What I was mentioning regarding good practice is mostly related to
> the operation of gluster, not installation or deployment, i.e. not the
> conceptual understanding of gluster (conceptually it's a JBOD system).
>
> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
>
> Thanks for the clarification Erekle.
>
> However I am surprised by this way of operating GlusterFS, as it adds
> another layer of complexity to the system (either hardware or software
> RAID) before the gluster config and increases the system's overall costs.
>
> An important point to consider is: in a RAID configuration you already
> have space 'wasted' in order to build redundancy (either RAID 1, 5, or 6).
> Then when you have GlusterFS on top of several RAIDs you again have more
> data replicated, so you end up with the same data consuming more space
> within a group of disks and again across several RAIDs, depending on the
> Gluster configuration you have (in a RAID 1 config with 2x Gluster
> replication the same data is stored 4 times).
>
> Yet another downside of having a RAID (especially RAID 5 or 6) is that it
> considerably reduces write speeds, as each group of disks ends up having
> the write speed of a single disk because all other disks of that group
> have to wait for each other to write as well.
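
A minimal sketch of the space arithmetic in the paragraph above, assuming equal-size disks and ignoring filesystem overhead and the small metadata footprint of an arbiter brick. The `Layout` class, its field names and the example layouts are hypothetical and only illustrate the calculation; they are not taken from Gluster or from any message in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    disks_per_brick: int   # disks in each brick's RAID set (1 means JBOD)
    redundancy_disks: int  # disks per brick spent on mirror or parity
    data_replicas: int     # full data copies kept by Gluster


def raw_per_usable(layout: Layout) -> float:
    """Raw disk capacity consumed for each unit of usable capacity."""
    raid_efficiency = (
        layout.disks_per_brick - layout.redundancy_disks
    ) / layout.disks_per_brick
    return layout.data_replicas / raid_efficiency


LAYOUTS = [
    Layout("RAID 1 + replica 2", 2, 1, 2),
    Layout("RAID 6 (7 disks) + replica 2", 7, 2, 2),
    Layout("JBOD + replica 3", 1, 0, 3),
    # Assumption: the arbiter brick holds metadata only, so it is ignored.
    Layout("JBOD + replica 2 + arbiter", 1, 0, 2),
]

for layout in LAYOUTS:
    print(f"{layout.name}: {raw_per_usable(layout):.1f}x raw per usable")
```

Under these assumptions RAID 1 with 2x replication costs 4.0x raw capacity, which matches the "4 times" figure above, while JBOD with 2 replicas plus an arbiter costs about 2.0x.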
>
> Therefore, if Gluster already replicates data, why does it create this big
> pain you mentioned, if the data is replicated somewhere else and can still
> be retrieved both to serve clients and to reconstruct the equivalent disk
> when it is replaced?
>
> Fernando
>
> On 07/08/2017 10:26, Erekle Magradze wrote:
>
> Hi Frenando,
>
> Here is my experience: if you use a single hard drive as a brick for a
> gluster volume and it dies, i.e. it becomes inaccessible, it's a huge
> hassle to discard that brick and exchange it with another one, since
> gluster sometimes tries to access that broken brick and that causes (at
> least it caused for me) a big pain. Therefore it's better to have a RAID
> as the brick, i.e. RAID 1 (mirroring) for each brick. In this case, if a
> disk is down you can easily exchange it and rebuild the RAID without going
> offline, i.e. without switching off the volume, doing brick manipulations
> and switching it back on.
>
> Cheers
>
> Erekle
>
> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
>
> For any RAID 5 or 6 configuration I normally follow a simple golden rule
> which has given good results so far:
> - up to 4 disks: RAID 5
> - 5 or more disks: RAID 6
>
> However I didn't really understand the recommendation to use any RAID
> with GlusterFS. I always thought that GlusterFS likes to work in JBOD mode
> and control the disks (bricks) directly, so you can create whatever
> distribution rule you wish, and if a single disk fails you just replace it
> and the data is obviously replicated back from another brick. The only
> downside of using it this way is that the replication data will flow
> across all servers, but that is not much of an issue.
>
> Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?
>
> Thanks
> Regards
> Fernando
>
> On 07/08/2017 03:46, Devin Acosta wrote:
>
> Moacir,
>
> I have recently installed multiple Red Hat Virtualization hosts for
> several different companies, and have dealt with the Red Hat Support Team
> in depth about the optimal configuration for setting up GlusterFS most
> efficiently, and I wanted to share with you what I learned.
>
> In general the Red Hat Virtualization team frowns upon using each disk of
> the system as just a JBOD. Sure, there is some protection from having the
> data replicated; however, the recommendation is to use RAID 6 (preferred)
> or RAID 5, or RAID 1 at the very least.
>
> Here is the direct quote from Red Hat when I asked about RAID and bricks:
>
> *"A typical Gluster configuration would use RAID underneath the bricks.
> RAID 6 is most typical as it gives you 2 disk failure protection, but RAID
> 5 could be used too. Once you have the RAIDed bricks, you'd then apply the
> desired replication on top of that. The most popular way of doing this
> would be distributed replicated with 2x replication. In general you'll get
> better performance with larger bricks. 12 drives is often a sweet spot.
> Another option would be to create a separate tier using all SSD’s."*
>
> *In order to do SSD tiering, from my understanding you would need 1 x NVMe
> drive in each server, or a 4 x SSD hot tier (it needs to be distributed
> replicated for the hot tier if not using NVMe). So with you only having 1
> SSD drive in each server, I’d suggest maybe looking into the NVMe option.*
>
> *Since you're using only 3 servers, what I’d probably suggest is to do (2
> Replicas + Arbiter Node). This setup actually doesn’t require the 3rd
> server to have big drives at all, as it only stores metadata about the
> files and not actually a full copy.*
>
> *Please see the attached document that was given to me by Red Hat to get
> more information on this. Hope this information helps you.*
>
> --
>
> Devin Acosta, RHCA, RHVCA
> Red Hat Certified Architect
>
> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira (
> moacirferre...@hotmail.com) wrote:
>
> I want to assemble an oVirt "pod" made of 3 servers, each with 2 CPU
> sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use
> GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC
> and a dual 10Gb NIC. So my intention is to create a loop like a server
> triangle using the 40Gb NICs for access to virtualization files (VM
> .qcow2 images) and to move VMs around the pod (east/west traffic), while
> using the 10Gb interfaces for giving services to the outside world
> (north/south traffic).
>
> This said, my first question is: how should I deploy GlusterFS in such an
> oVirt scenario? My questions are:
>
> 1 - Should I create 3 RAIDs (e.g. RAID 5), one on each oVirt node, and
> then create a GlusterFS volume using them?
>
> 2 - Instead, should I create a JBOD array made of all the servers' disks?
>
> 3 - What is the best Gluster configuration to provide HA while not
> consuming too much disk space?
>
> 4 - Does an oVirt hypervisor pod like the one I am planning to build, and
> the virtualization environment, benefit from tiering when using an SSD
> disk? And if yes, will Gluster do it by default or do I have to configure
> it to do so?
>
> Bottom line, what is the good practice for using GlusterFS in small
> pods for enterprises?
>
> Your opinion/feedback will be really appreciated!
>
> Moacir
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Recogizer Group GmbH
>
> Dr.rer.nat. Erekle Magradze
> Lead Big Data Engineering & DevOps
> Rheinwerkallee 2, 53227 Bonn
> Tel: +49 228 29974555
>
> E-Mail erekle.magra...@recogizer.de
> Web: www.recogizer.com
>
> Recogizer on LinkedIn https://www.linkedin.com/company-beta/10039182/
> Follow us on Twitter https://twitter.com/recogizer
>
> -----------------------------------------------------------------
> Recogizer Group GmbH
> Managing Directors: Oliver Habisch, Carsten Kreutze
> Commercial register: Amtsgericht Bonn HRB 20724
> Registered office: Bonn; VAT ID No.: DE294195993
>
> This e-mail contains confidential and/or legally protected information.
> If you are not the intended recipient or have received this e-mail in
> error, please inform the sender immediately and delete this mail.
> Unauthorized copying and unauthorized forwarding of this mail and the
> information it contains is not permitted.
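
A rough sketch of the availability point made at the top of the thread (replica 3 versus 2 replicas + arbiter), assuming a three-node pod and a simple "more than half of the bricks are up" quorum rule. The node names and the `survey` helper are hypothetical; this is a counting model for illustration, not Gluster's actual quorum implementation.

```python
from itertools import combinations

# Hypothetical three-node pod. True marks a brick holding a full data copy,
# False marks an arbiter brick that holds metadata only.
REPLICA_3 = {"node1": True, "node2": True, "node3": True}
REPLICA_2_ARBITER = {"node1": True, "node2": True, "node3": False}


def survey(bricks, failed_count):
    """List (failed_nodes, has_quorum, data_copies_left) for every way of
    losing `failed_count` nodes."""
    results = []
    for failed in combinations(sorted(bricks), failed_count):
        up = [node for node in bricks if node not in failed]
        # Simplified quorum: a strict majority of bricks must be reachable.
        has_quorum = len(up) > len(bricks) / 2
        # Count surviving bricks that hold a full copy of the data.
        copies = sum(bricks[node] for node in up)
        results.append((failed, has_quorum, copies))
    return results


for name, bricks in (("replica 3", REPLICA_3),
                     ("replica 2 + arbiter", REPLICA_2_ARBITER)):
    for failed_count in (1, 2):
        worst = min(copies for _, _, copies in survey(bricks, failed_count))
        print(f"{name}, {failed_count} node(s) down: "
              f"at least {worst} data copy(ies) left")
```

In this model both layouts keep quorum with one node down, but the arbiter layout can be left with a single data copy after one failure and with none after two, whereas replica 3 always retains at least one copy.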