Hi Gregory,

I found this about LVM:
-> http://blog.andrew.net.au/2006/08/09
-> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2

It seems that performance is still fine with it. I will most probably give
it a try and benchmark that too... I have one new hard drive which should
arrive tomorrow. Perfect timing ;)

JM

2012/11/28, Mohit Anchlia <[email protected]>:
>
> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <[email protected]> wrote:
>
>> Does HBase really benefit from 64 GB of RAM, since allocating too large
>> a heap might increase GC time?
>>
> The benefit you get is from the OS cache.
>
>> Another question: why not RAID 0, in order to aggregate disk bandwidth?
>> (and thus keep the 3x replication factor)
>>
>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel <[email protected]> wrote:
>>
>>> Sorry,
>>>
>>> I need to clarify.
>>>
>>> 4GB per physical core is a good starting point.
>>> So with 2 quad core chips, that is going to be 32GB.
>>>
>>> IMHO that's a minimum. If you go with HBase, you will want more.
>>> (Actually you will need more.) The next logical jump would be to 48 or
>>> 64GB.
>>>
>>> If we start to price out memory, depending on vendor and your company's
>>> procurement, there really isn't much of a price difference between 32,
>>> 48, or 64GB.
>>> Note that it also depends on the chips themselves. Also you need to see
>>> how many memory channels exist on the motherboard. You may need to buy
>>> in pairs or triplets. Your hardware vendor can help you. (Also you need
>>> to keep an eye on your hardware vendor. Sometimes they will give you
>>> higher density chips that are going to be more expensive...) ;-)
>>>
>>> I tend to like having extra memory from the start.
>>> It gives you a bit more freedom and also protects you from 'fat' code.
>>>
>>> Looking at YARN... you will need more memory too.
>>>
>>> With respect to the hard drives...
>>>
>>> The best recommendation is to keep the drives as JBOD and then use 3x
>>> replication.
>>> In this case, make sure that the disk controller cards can handle JBOD.
>>> (Some don't support JBOD out of the box.)
>>>
>>> With respect to RAID...
>>>
>>> If you are running MapR, no need for RAID.
>>> If you are running an Apache derivative, you could use RAID 1. Then cut
>>> your replication to 2x. This makes it easier to manage drive failures.
>>> (It's not the norm, but it works...) In some clusters, they are using
>>> appliances like NetApp's E-Series where the machines see the drives as
>>> locally attached storage and I think the appliances themselves are
>>> using RAID. I haven't played with this configuration, however it could
>>> make sense and it's a valid design.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari <[email protected]>
>>> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> Thanks for all those details!
>>>>
>>>> So to simplify the equation, for 16 virtual cores we need 48 to 64GB.
>>>> Which means 3 to 4GB per core. So with quad cores, 12GB to 16GB is a
>>>> good start? Or have I simplified it too much?
>>>>
>>>> Regarding the hard drives: if you add more than one drive, do you need
>>>> to build them on RAID or similar systems? Or can Hadoop/HBase be
>>>> configured to use more than one drive?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>>
>>>> 2012/11/27, Michael Segel <[email protected]>:
>>>>>
>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an
>>>>> inside joke ...]
>>>>>
>>>>> So here's the problem...
>>>>>
>>>>> By default, your child processes in a map/reduce job get 512MB.
>>>>> The majority of the time, this gets raised to 1GB.
>>>>>
>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in Linux.
>>>>> (Note: This is why when people talk about the number of cores, you
>>>>> have to specify physical cores or logical cores....)
>>>>>
>>>>> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>>>>> reducers, that's 24 slots. Which means that you would need 24GB of
>>>>> memory reserved just for the child processes.
>>>>> This would leave 8GB of the 32GB for DN, TT and the rest of the Linux
>>>>> OS processes.
>>>>>
>>>>> Can you live with that? Sure.
>>>>> Now add in R, HBase, Impala, or some other set of tools on top of the
>>>>> cluster.
>>>>>
>>>>> Ooops! Now you are in trouble because you will swap.
>>>>> Also adding in R, you may want to bump up those child procs from 1GB
>>>>> to 2GB. That means the 24 slots would now require 48GB. Now you have
>>>>> swap, and if that happens you will see HBase in a cascading failure.
>>>>>
>>>>> So while you can do a rolling restart with the changed configuration
>>>>> (reducing the number of mappers and reducers) you end up with fewer
>>>>> slots, which will mean longer run times for your jobs. (Fewer slots ==
>>>>> less parallelism)
>>>>>
>>>>> Looking at the price of memory... you can get 48GB or even 64GB for
>>>>> around the same price point. (8GB chips)
>>>>>
>>>>> And I didn't even talk about adding SOLR either, again a memory hog...
>>>>> ;-)
>>>>>
>>>>> Note that I matched the number of mappers w/ reducers. You could go
>>>>> with fewer reducers if you want. I tend to recommend a ratio of 2:1
>>>>> mappers to reducers, depending on the workflow....
>>>>>
>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>>>>> interface is pretty much available in the new kit being shipped.
>>>>> It's just that you don't have enough drives. 8 cores should be 8
>>>>> spindles if available.
>>>>> Otherwise you end up seeing your CPU load climb on wait states as the
>>>>> processes wait for the disk i/o to catch up.
>>>>>
>>>>> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives in a 1U
>>>>> chassis based on price. You're making a trade-off and you should be
>>>>> aware of the performance hit you will take.
>>>>>
>>>>> HTH
>>>>>
>>>>> -Mike
>>>>>
>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> So are you recommending 32GB per node?
>>>>>>
>>>>>> What about the disks? Are SATA drives too slow?
>>>>>>
>>>>>> JM
>>>>>>
>>>>>> 2012/11/26, Michael Segel <[email protected]>:
>>>>>>> Uhm, those specs are actually now out of date.
>>>>>>>
>>>>>>> If you're running HBase, or want to also run R on top of Hadoop, you
>>>>>>> will need to add more memory.
>>>>>>> Also forget 1GbE, go 10GbE, and w/ 2 SATA drives, you will be disk
>>>>>>> i/o bound way too quickly.
>>>>>>>
>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you asking about hardware recommendations?
>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on
>>>>>>>> this.
>>>>>>>> For mid-size clusters (up to 300 nodes):
>>>>>>>> Processor: a dual quad-core 2.6 GHz
>>>>>>>> RAM: 24 GB DDR3
>>>>>>>> Dual 1 Gb Ethernet NICs
>>>>>>>> A SAS drive controller
>>>>>>>> At least two SATA II drives in a JBOD configuration
>>>>>>>>
>>>>>>>> The replication factor depends heavily on the primary use of your
>>>>>>>> cluster.
>>>>>>>>
>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>>>>>> hi
>>>>>>>>>
>>>>>>>>> what are the recommended nodes for NN, hmaster and zk nodes for a
>>>>>>>>> larger cluster, let's say 50-100+
>>>>>>>>>
>>>>>>>>> also, what would be the ideal replication factor for larger
>>>>>>>>> clusters when you have 3-4 racks?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> David
>>>>>>>>> 10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS
>>>>>>>>> INFORMATICAS...
>>>>>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>>>>>>
>>>>>>>>> http://www.uci.cu
>>>>>>>>> http://www.facebook.com/universidad.uci
>>>>>>>>> http://www.flickr.com/photos/universidad_uci
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Marcos Luis Ortíz Valmaseda
>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>
>> --
>> Adrien Mogenet
>> 06.59.16.64.22
>> http://www.mogenet.me
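
For anyone who wants to replay the memory arithmetic from Mike's messages in this thread (12 mappers + 12 reducers, 1GB or 2GB per child, 4GB per physical core as a starting point), here is a minimal Python sketch. The function names are made up for illustration and are not part of Hadoop or HBase; it only reproduces the back-of-the-envelope numbers quoted above and ignores everything else that competes for RAM.

```python
# Back-of-the-envelope node memory sizing, using only the figures quoted in
# the thread. All function names are hypothetical helpers, not Hadoop APIs.


def child_memory_gb(mappers, reducers, gb_per_child):
    """Memory reserved for map/reduce child processes: one child per slot."""
    return (mappers + reducers) * gb_per_child


def headroom_gb(total_ram_gb, mappers, reducers, gb_per_child):
    """RAM left over for DN, TT, HBase, the OS cache, etc.

    A value of zero or less means the node is oversubscribed.
    """
    return total_ram_gb - child_memory_gb(mappers, reducers, gb_per_child)


def baseline_ram_gb(physical_cores, gb_per_core=4):
    """Starting point from the thread: 4GB per *physical* core."""
    return physical_cores * gb_per_core


if __name__ == "__main__":
    print("baseline for 8 physical cores:", baseline_ram_gb(8), "GB")
    for total in (32, 48, 64):
        for per_child in (1, 2):
            left = headroom_gb(total, 12, 12, per_child)
            status = "headroom" if left > 0 else "oversubscribed, expect swap"
            print(f"{total}GB node, {per_child}GB/child: {left}GB left ({status})")
```

With 32GB and 1GB children this gives the 8GB of headroom mentioned above; raising the children to 2GB makes the same 24 slots need 48GB, which is why 48GB or 64GB nodes come up as the next step.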
