OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside joke ...]
So here's the problem... By default, your child processes in a map/reduce job get 512MB. The majority of the time, this gets raised to 1GB.

8 cores (dual quad cores) show up as 16 virtual processors in Linux with hyper-threading. (Note: This is why, when people talk about the number of cores, you have to specify physical cores or logical cores....)

So if you were to oversubscribe and have, let's say, 12 mappers and 12 reducers, that's 24 slots. Which means that you would need 24GB of memory reserved just for the child processes. On a 32GB node this would leave 8GB for the DN, the TT and the rest of the Linux OS processes.

Can you live with that? Sure. Now add in R, HBase, Impala, or some other set of tools on top of the cluster. Ooops! Now you are in trouble because you will swap. Also, adding in R, you may want to bump up those child procs from 1GB to 2GB. That means the 24 slots would now require 48GB. Now you swap, and if that happens you will see HBase go into a cascading failure.

So while you can do a rolling restart with the changed configuration (reducing the number of mappers and reducers), you end up with fewer slots, which will mean longer run times for your jobs. (Fewer slots == less parallelism.)

Looking at the price of memory... you can get 48GB or even 64GB for around the same price point. (8GB chips)

And I didn't even talk about adding Solr, which is again a memory hog... ;-)

Note that I matched the number of mappers with reducers. You could go with fewer reducers if you want. I tend to recommend a ratio of 2:1 mappers to reducers, depending on the work flow....

As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III interface is pretty much available in the new kit being shipped. It's just that you don't have enough drives. 8 cores should be 8 spindles if available. Otherwise you end up seeing your CPU load climb on wait states as the processes wait for the disk I/O to catch up.

I mean you could build out a cluster with 4 x 3.5" 2TB drives in a 1U chassis based on price.
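
For anyone who wants to plug in their own numbers, here is a back-of-the-envelope sketch of the memory math above in Python. The function name and the 32GB/64GB node sizes are just illustrative assumptions, and it only counts one child heap per slot; it ignores JVM overhead beyond the heap, so treat the result as an optimistic upper bound on what is left for the DN, TT, OS and anything else you stack on the node.

```python
def memory_left_for_daemons(node_ram_gb, mappers, reducers, child_heap_gb):
    """Return the RAM (in GB) left on a node after reserving one child
    heap per map/reduce slot. A negative result means the node is
    oversubscribed and will swap once every slot is busy."""
    slots = mappers + reducers            # total task slots on the node
    reserved_gb = slots * child_heap_gb   # heap reserved for child processes
    return node_ram_gb - reserved_gb


# 12 mappers + 12 reducers with 1GB children on a 32GB node
print(memory_left_for_daemons(32, 12, 12, 1))   # 8

# Same slots, children bumped to 2GB (e.g. for R): oversubscribed
print(memory_left_for_daemons(32, 12, 12, 2))   # -16

# Same slots and 2GB children on a 64GB node
print(memory_left_for_daemons(64, 12, 12, 2))   # 16
```

Whatever is left over still has to cover HBase, Impala, Solr and friends, so a small positive number is not necessarily a comfortable one.
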
You're making a trade-off and you should be aware of the performance hit you will take.

HTH

-Mike

On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Michael,
>
> so are you recommending 32GB per node?
>
> What about the disks? Are SATA drives too slow?
>
> JM
>
> 2012/11/26, Michael Segel <[email protected]>:
>> Uhm, those specs are actually now out of date.
>>
>> If you're running HBase, or want to also run R on top of Hadoop, you will
>> need to add more memory.
>> Also forget 1GbE, go 10GbE, and with 2 SATA drives, you will be disk I/O bound
>> way too quickly.
>>
>>
>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
>>
>>> Are you asking about hardware recommendations?
>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on this.
>>> For mid-size clusters (up to 300 nodes):
>>> Processor: dual quad-core 2.6 GHz
>>> RAM: 24 GB DDR3
>>> Dual 1 Gb Ethernet NICs
>>> A SAS drive controller
>>> At least two SATA II drives in a JBOD configuration
>>>
>>> The replication factor depends heavily on the primary use of your
>>> cluster.
>>>
>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>> hi
>>>>
>>>> what are the recommended nodes for NN, HMaster and ZK for a larger
>>>> cluster, let's say 50-100+ nodes?
>>>>
>>>> also, what would be the ideal replication factor for larger clusters when
>>>> you have 3-4 racks?
>>>>
>>>> --
>>>> David
>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>>> INFORMATICAS...
>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>
>>>> http://www.uci.cu
>>>> http://www.facebook.com/universidad.uci
>>>> http://www.flickr.com/photos/universidad_uci
>>>
>>> --
>>>
>>> Marcos Luis Ortíz Valmaseda
>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
