Hi Mike,

Thanks for all those details!

So to simplify the equation, for 16 virtual cores we need 48 to 64GB,
which means 3 to 4GB per core. So with quad cores, 12GB to 16GB are a
good start? Or did I simplify it too much?

Regarding the hard drives: if you add more than one drive, do you need
to build them on RAID or similar systems? Or can Hadoop/HBase be
configured to use more than one drive?

Thanks,

JM

2012/11/27, Michael Segel:
>
> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside
> joke ...]
>
> So here's the problem...
>
> By default, your child processes in a map/reduce job get a default 512MB.
> The majority of the time, this gets raised to 1GB.
>
> 8 cores (dual quad cores) shows up as 16 virtual processors in Linux. (Note:
> This is why when people talk about the number of cores, you have to specify
> physical cores or logical cores....)
>
> So if you were to over subscribe and have let's say 12 mappers and 12
> reducers, that's 24 slots. Which means that you would need 24GB of memory
> reserved just for the child processes. This would leave 8GB for DN, TT and
> the rest of the Linux OS processes.
>
> Can you live with that? Sure.
> Now add in R, HBase, Impala, or some other set of tools on top of the
> cluster.
>
> Ooops! Now you are in trouble because you will swap.
> Also adding in R, you may want to bump up those child procs from 1GB to
> 2GB. That means the 24 slots would now require 48GB. Now you have swap and
> if that happens you will see HBase in a cascading failure.
>
> So while you can do a rolling restart with the changed configuration
> (reducing the number of mappers and reducers) you end up with fewer slots,
> which will mean longer run times for your jobs. (Fewer slots == less
> parallelism)
>
> Looking at the price of memory... you can get 48GB or even 64GB for around
> the same price point. (8GB chips)
>
> And I didn't even talk about adding SOLR either, again a memory hog... ;-)
>
> Note that I matched the number of mappers w reducers.
> You could go with fewer reducers if you want. I tend to recommend a ratio
> of 2:1 mappers to reducers, depending on the work flow....
>
> As to the disks... no, 7200 RPM SATA III drives are fine. SATA III interface
> is pretty much available in the new kit being shipped.
> It's just that you don't have enough drives. 8 cores should be 8 spindles if
> available.
> Otherwise you end up seeing your CPU load climb on wait states as the
> processes wait for the disk i/o to catch up.
>
> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives in a 1 U
> chassis based on price. You're making a trade off and you should be aware of
> the performance hit you will take.
>
> HTH
>
> -Mike
>
> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari wrote:
>
>> Hi Michael,
>>
>> so are you recommending 32GB per node?
>>
>> What about the disks? Are SATA drives too slow?
>>
>> JM
>>
>> 2012/11/26, Michael Segel:
>>> Uhm, those specs are actually now out of date.
>>>
>>> If you're running HBase, or want to also run R on top of Hadoop, you
>>> will need to add more memory.
>>> Also forget 1GbE, go to 10GbE, and w 2 SATA drives, you will be disk i/o
>>> bound way too quickly.
>>>
>>>
>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz wrote:
>>>
>>>> Are you asking about hardware recommendations?
>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on this:
>>>> For mid-size clusters (up to 300 nodes):
>>>> Processor: A dual quad-core 2.6 GHz
>>>> RAM: 24 GB DDR3
>>>> Dual 1 Gb Ethernet NICs
>>>> a SAS drive controller
>>>> at least two SATA II drives in a JBOD configuration
>>>>
>>>> The replication factor depends heavily on the primary use of your
>>>> cluster.
>>>>
>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>> hi
>>>>>
>>>>> what are the recommended nodes for NN, HMaster and ZK for a larger
>>>>> cluster, let's say 50-100+?
>>>>>
>>>>> also, what would be the ideal replication factor for larger clusters
>>>>> when you have 3-4 racks?
>>>>>
>>>>> --
>>>>> David
>>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>>>> INFORMATICAS...
>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>>
>>>>> http://www.uci.cu
>>>>> http://www.facebook.com/universidad.uci
>>>>> http://www.flickr.com/photos/universidad_uci
>>>>
>>>> --
>>>>
>>>> Marcos Luis Ortíz Valmaseda
>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>
>>
>
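
PS: For reference, the slot arithmetic in Mike's reply can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not anything Hadoop itself provides: the function names are made up for illustration, the 12 + 12 slot layout and the 1GB/2GB child heaps come from the thread, the 32GB node is inferred from "24GB reserved, 8GB left", and it ignores everything except the child JVM heaps.

```python
# Rough per-node memory budget, following the arithmetic in the thread.
# Function names are hypothetical; nothing here is a Hadoop API.

def child_memory_gb(mappers, reducers, child_heap_gb):
    """Memory reserved for map/reduce child JVMs: one heap per slot."""
    return (mappers + reducers) * child_heap_gb


def headroom_gb(node_ram_gb, mappers, reducers, child_heap_gb):
    """RAM left over for DN, TT, the OS and any extras (HBase, R, ...)."""
    return node_ram_gb - child_memory_gb(mappers, reducers, child_heap_gb)


if __name__ == "__main__":
    # Node sizes and slot counts discussed in the thread.
    for ram in (32, 48, 64):
        for heap in (1, 2):
            left = headroom_gb(ram, mappers=12, reducers=12, child_heap_gb=heap)
            # Zero or negative headroom means nothing is left for the daemons.
            status = "ok" if left > 0 else "will swap"
            print(f"{ram}GB node, {heap}GB/child: {left:+d}GB left ({status})")
```

With 1GB children a 32GB node keeps 8GB free, as Mike says; with 2GB children the same 24 slots need 48GB, so only the 64GB node keeps any headroom.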
