Does HBase really benefit from 64 GB of RAM, since allocating too large a heap might increase GC time?
Another question: why not RAID 0, in order to aggregate disk bandwidth (and thus keep the 3x replication factor)?

On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel <[email protected]> wrote:

> Sorry,
>
> I need to clarify.
>
> 4GB per physical core is a good starting point.
> So with 2 quad-core chips, that is going to be 32GB.
>
> IMHO that's a minimum. If you go with HBase, you will want more.
> (Actually you will need more.) The next logical jump would be to 48 or
> 64GB.
>
> If we start to price out memory, depending on the vendor and your
> company's procurement, there really isn't much of a price difference
> between 32, 48, or 64GB.
> Note that it also depends on the chips themselves. You also need to see
> how many memory channels exist on the motherboard. You may need to buy
> in pairs or triplets. Your hardware vendor can help you. (Also keep an
> eye on your hardware vendor. Sometimes they will give you higher-density
> chips that are going to be more expensive...) ;-)
>
> I tend to like having extra memory from the start.
> It gives you a bit more freedom and also protects you from 'fat' code.
>
> Looking at YARN... you will need more memory too.
>
> With respect to the hard drives...
>
> The best recommendation is to keep the drives as JBOD and then use 3x
> replication.
> In this case, make sure that the disk controller cards can handle JBOD.
> (Some don't support JBOD out of the box.)
>
> With respect to RAID...
>
> If you are running MapR, there is no need for RAID.
> If you are running an Apache derivative, you could use RAID 1 and then
> cut your replication to 2x. This makes it easier to manage drive
> failures. (It's not the norm, but it works...) In some clusters, they
> are using appliances like NetApp's E-Series, where the machines see the
> drives as locally attached storage, and I think the appliances
> themselves are using RAID. I haven't played with this configuration;
> however, it could make sense and it's a valid design.
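The per-core sizing rule quoted above (4GB per physical core as a floor, with a jump to 48 or 64GB for HBase) can be sketched as a quick calculation — a hypothetical helper for illustration, not anything from the thread:

```python
def min_ram_gb(physical_cores, gb_per_core=4):
    """Rule-of-thumb minimum node RAM: ~4 GB per physical core."""
    return physical_cores * gb_per_core

# Two quad-core chips = 8 physical cores.
baseline = min_ram_gb(8)          # 32 GB: the stated minimum starting point
with_hbase = max(baseline, 48)    # next logical jump for HBase: 48 (or 64) GB
```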
> HTH
>
> -Mike
>
> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>
> > Hi Mike,
> >
> > Thanks for all those details!
> >
> > So to simplify the equation, for 16 virtual cores we need 48 to 64GB,
> > which means 3 to 4GB per core. So with quad cores, 12GB to 16GB are a
> > good start? Or did I simplify it too much?
> >
> > Regarding the hard drives: if you add more than one drive, do you
> > need to build them into RAID or similar systems? Or can Hadoop/HBase
> > be configured to use more than one drive?
> >
> > Thanks,
> >
> > JM
> >
> > 2012/11/27, Michael Segel <[email protected]>:
> >>
> >> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an
> >> inside joke...]
> >>
> >> So here's the problem...
> >>
> >> By default, your child processes in a map/reduce job get 512MB.
> >> The majority of the time, this gets raised to 1GB.
> >>
> >> 8 cores (dual quad cores) show up as 16 virtual processors in Linux.
> >> (Note: this is why when people talk about the number of cores, you
> >> have to specify physical cores or logical cores...)
> >>
> >> So if you were to oversubscribe and have, let's say, 12 mappers and
> >> 12 reducers, that's 24 slots, which means that you would need 24GB of
> >> memory reserved just for the child processes. This would leave 8GB
> >> for the DN, TT and the rest of the Linux OS processes.
> >>
> >> Can you live with that? Sure.
> >> Now add in R, HBase, Impala, or some other set of tools on top of the
> >> cluster.
> >>
> >> Oops! Now you are in trouble because you will swap.
> >> Also, adding in R, you may want to bump up those child procs from 1GB
> >> to 2GB. That means the 24 slots would now require 48GB. Now you have
> >> swap, and if that happens you will see HBase in a cascading failure.
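The slot arithmetic in Mike's reply can be checked with a short sketch, using the numbers from the thread (the helper name is made up for illustration):

```python
def slot_memory_gb(mappers, reducers, child_heap_gb):
    """Total RAM reserved for map/reduce child JVMs across all slots."""
    return (mappers + reducers) * child_heap_gb

total_ram = 32
reserved = slot_memory_gb(12, 12, 1)   # 24 slots x 1 GB = 24 GB
headroom = total_ram - reserved        # 8 GB left for DN, TT, and the OS

# Bumping child heaps to 2 GB doubles the reservation to 48 GB,
# which overruns a 32 GB node and pushes it into swap.
overcommitted = slot_memory_gb(12, 12, 2) > total_ram
```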
> >>
> >> So while you can do a rolling restart with the changed configuration
> >> (reducing the number of mappers and reducers), you end up with fewer
> >> slots, which will mean longer run times for your jobs. (Fewer slots
> >> == less parallelism.)
> >>
> >> Looking at the price of memory... you can get 48GB or even 64GB for
> >> around the same price point. (8GB chips)
> >>
> >> And I didn't even talk about adding SOLR, either. Again, a memory
> >> hog... ;-)
> >>
> >> Note that I matched the number of mappers with reducers. You could go
> >> with fewer reducers if you want. I tend to recommend a ratio of 2:1
> >> mappers to reducers, depending on the workflow...
> >>
> >> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA
> >> III interface is pretty much available in the new kit being shipped.
> >> It's just that you don't have enough drives. 8 cores should mean 8
> >> spindles if available.
> >> Otherwise you end up seeing your CPU load climb on wait states as the
> >> processes wait for the disk I/O to catch up.
> >>
> >> I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a
> >> 1U chassis based on price. You're making a trade-off, and you should
> >> be aware of the performance hit you will take.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >>
> >>> Hi Michael,
> >>>
> >>> So are you recommending 32GB per node?
> >>>
> >>> What about the disks? Are SATA drives too slow?
> >>>
> >>> JM
> >>>
> >>> 2012/11/26, Michael Segel <[email protected]>:
> >>>> Uhm, those specs are actually now out of date.
> >>>>
> >>>> If you're running HBase, or want to also run R on top of Hadoop,
> >>>> you will need to add more memory.
> >>>> Also, forget 1GbE, go 10GbE, and with 2 SATA drives you will be
> >>>> disk-I/O-bound way too quickly.
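The 2:1 mapper-to-reducer ratio recommended above can be sketched as a small split helper (a hypothetical function, just to make the arithmetic concrete):

```python
def split_slots(total_slots, mapper_ratio=2, reducer_ratio=1):
    """Split task slots by a mappers:reducers ratio (default 2:1)."""
    unit = total_slots // (mapper_ratio + reducer_ratio)
    reducers = unit * reducer_ratio
    mappers = total_slots - reducers
    return mappers, reducers

# 24 slots at 2:1 -> 16 mappers, 8 reducers.
mappers, reducers = split_slots(24)
```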
> >>>>
> >>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
> >>>>
> >>>>> Are you asking about hardware recommendations?
> >>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on
> >>>>> this. For mid-sized clusters (up to 300 nodes):
> >>>>> Processor: dual quad-core, 2.6 GHz
> >>>>> RAM: 24 GB DDR3
> >>>>> Dual 1 Gb Ethernet NICs
> >>>>> A SAS drive controller
> >>>>> At least two SATA II drives in a JBOD configuration
> >>>>>
> >>>>> The replication factor depends heavily on the primary use of your
> >>>>> cluster.
> >>>>>
> >>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> What are the recommended nodes for the NN, HMaster and ZK for a
> >>>>>> larger cluster, let's say 50-100+?
> >>>>>>
> >>>>>> Also, what would be the ideal replication factor for larger
> >>>>>> clusters when you have 3-4 racks?
> >>>>>>
> >>>>>> --
> >>>>>> David
> >>>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS
> >>>>>> CIENCIAS INFORMATICAS... CONNECTED TO THE FUTURE, CONNECTED TO
> >>>>>> THE REVOLUTION
> >>>>>>
> >>>>>> http://www.uci.cu
> >>>>>> http://www.facebook.com/universidad.uci
> >>>>>> http://www.flickr.com/photos/universidad_uci
> >>>>>
> >>>>> --
> >>>>> Marcos Luis Ortíz Valmaseda
> >>>>> about.me/marcosortiz
> >>>>> @marcosluis2186

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
