Finally, it took me a while to run those tests because they took far longer than expected, but here are the results:
http://www.spaggiari.org/bonnie.html

LVM is not really slower than JBOD and does not really take more CPU.
So I will say, if you have to choose between the two, take the one you
prefer. Personally, I prefer LVM because it's easy to configure.

The big winner here is RAID0. It's WAY faster than anything else. But
it's using twice the space... Your choice.

I did not get a chance to test with the Ubuntu tool because it's not
working with LVM drives.

JM

2012/11/28, Michael Segel <[email protected]>:
> Ok, just a caveat.
>
> I am discussing MapR as part of a complete response. As Mohit posted, MapR
> takes the raw device for their MapR File System.
> They do stripe on their own within what they call a volume.
>
> But going back to Apache...
> You can stripe drives, however I wouldn't recommend it. I don't think the
> performance gains would really matter.
> You're going to end up getting blocked first by disk I/O, then your
> controller card, then your network... assuming 10GbE.
>
> With only 2 disks on an 8-core system, you will hit disk I/O first, and then
> you'll watch your CPU wait I/O climb.
>
> HTH
>
> -Mike
>
> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> Hi Mike,
>>
>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
>> at the same time, it should be better than RAID0 or a single drive,
>> no?
>>
>> 2012/11/28, Michael Segel <[email protected]>:
>>> Just a couple of things.
>>>
>>> I'm neutral on the use of LVMs. Some would point out that there's some
>>> overhead, but on the flip side, it can make managing the machines
>>> easier.
>>> If you're using MapR, you don't want to use LVMs but raw devices.
>>>
>>> In terms of GC, it's going to depend on the heap size and not the total
>>> memory. With respect to HBase... MSLAB is the way to go.
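For anyone who wants to reproduce the two setups benchmarked above, here is a minimal sketch. The device names (/dev/sdb, /dev/sdc), volume names, and stripe size are placeholders, not the values used in the benchmark:

```shell
# LVM: pool both drives into one volume group, then create a striped LV.
pvcreate /dev/sdb /dev/sdc
vgcreate hdfsvg /dev/sdb /dev/sdc
lvcreate --stripes 2 --stripesize 256 --extents 100%FREE --name hdfslv hdfsvg
mkfs.ext4 /dev/hdfsvg/hdfslv

# RAID0: stripe the same two drives with mdadm instead.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0
```

With `--stripes 2`, LVM alternates writes across both physical volumes, which is why its throughput comes close to RAID0 while remaining easy to resize later.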
>>>
>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>
>>>> Hi Gregory,
>>>>
>>>> I found this about LVM:
>>>> -> http://blog.andrew.net.au/2006/08/09
>>>> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
>>>>
>>>> Seems that performance is still correct with it. I will most
>>>> probably give it a try and bench that too... I have one new hard drive
>>>> which should arrive tomorrow. Perfect timing ;)
>>>>
>>>> JM
>>>>
>>>> 2012/11/28, Mohit Anchlia <[email protected]>:
>>>>>
>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <[email protected]> wrote:
>>>>>
>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating too
>>>>>> large a heap might increase GC time?
>>>>>>
>>>>> The benefit you get is from the OS cache.
>>>>>
>>>>>> Another question: why not RAID 0, in order to aggregate disk
>>>>>> bandwidth? (and thus keep the 3x replication factor)
>>>>>>
>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel <[email protected]> wrote:
>>>>>>
>>>>>>> Sorry,
>>>>>>>
>>>>>>> I need to clarify.
>>>>>>>
>>>>>>> 4GB per physical core is a good starting point.
>>>>>>> So with 2 quad-core chips, that is going to be 32GB.
>>>>>>>
>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more. (Actually
>>>>>>> you will need more.) The next logical jump would be to 48 or 64GB.
>>>>>>>
>>>>>>> If we start to price out memory, depending on the vendor and your company's
>>>>>>> procurement, there really isn't much of a price difference between
>>>>>>> 32, 48, or 64 GB.
>>>>>>> Note that it also depends on the chips themselves. Also you need to see
>>>>>>> how many memory channels exist on the motherboard. You may need to buy in
>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also you need to
>>>>>>> keep an eye on your hardware vendor.
Sometimes they will give you higher
>>>>>>> density chips that are going to be more expensive...) ;-)
>>>>>>>
>>>>>>> I tend to like having extra memory from the start.
>>>>>>> It gives you a bit more freedom and also protects you from 'fat' code.
>>>>>>>
>>>>>>> Looking at YARN... you will need more memory too.
>>>>>>>
>>>>>>> With respect to the hard drives...
>>>>>>>
>>>>>>> The best recommendation is to keep the drives as JBOD and then use 3x
>>>>>>> replication.
>>>>>>> In this case, make sure that the disk controller cards can handle JBOD.
>>>>>>> (Some don't support JBOD out of the box.)
>>>>>>>
>>>>>>> With respect to RAID...
>>>>>>>
>>>>>>> If you are running MapR, no need for RAID.
>>>>>>> If you are running an Apache derivative, you could use RAID 1, then cut
>>>>>>> your replication to 2x. This makes it easier to manage drive failures.
>>>>>>> (It's not the norm, but it works...) In some clusters, they are using
>>>>>>> appliances like NetApp's E-Series, where the machines see the drives as
>>>>>>> locally attached storage, and I think the appliances themselves are using
>>>>>>> RAID. I haven't played with this configuration, however it could make
>>>>>>> sense and it's a valid design.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> -Mike
>>>>>>>
>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> Thanks for all those details!
>>>>>>>>
>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to 64GB,
>>>>>>>> which means 3 to 4GB per core. So with quad cores, 12GB to 16GB is a
>>>>>>>> good start? Or did I simplify it too much?
>>>>>>>>
>>>>>>>> Regarding the hard drives: if you add more than one drive, do you need
>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase be
>>>>>>>> configured to use more than one drive?
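To answer the multi-drive question above: yes, Hadoop can use several drives without RAID. Each drive is mounted separately and listed in the DataNode's data directories, which is exactly the JBOD setup recommended in the thread. A sketch, using the Hadoop 1.x property names of the era (the mount points are placeholders):

```xml
<!-- hdfs-site.xml: one entry per physical drive, mounted individually -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>

<!-- mapred-site.xml: spread MapReduce spill space the same way -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk1/mapred/local,/mnt/disk2/mapred/local</value>
</property>
```

The DataNode round-robins block writes across the listed directories, so the drives are aggregated without any striping layer, and a single drive failure costs only that drive's blocks.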
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> JM
>>>>>>>>
>>>>>>>> 2012/11/27, Michael Segel <[email protected]>:
>>>>>>>>>
>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside
>>>>>>>>> joke...]
>>>>>>>>>
>>>>>>>>> So here's the problem...
>>>>>>>>>
>>>>>>>>> By default, your child processes in a map/reduce job get 512MB.
>>>>>>>>> The majority of the time, this gets raised to 1GB.
>>>>>>>>>
>>>>>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note:
>>>>>>>>> this is why when people talk about the number of cores, you have to specify
>>>>>>>>> physical cores or logical cores...)
>>>>>>>>>
>>>>>>>>> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>>>>>>>>> reducers, that's 24 slots. Which means that you would need 24GB of memory
>>>>>>>>> reserved just for the child processes. This would leave 8GB for the DN, TT,
>>>>>>>>> and the rest of the Linux OS processes.
>>>>>>>>>
>>>>>>>>> Can you live with that? Sure.
>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top of the
>>>>>>>>> cluster.
>>>>>>>>>
>>>>>>>>> Oops! Now you are in trouble because you will swap.
>>>>>>>>> Also, adding in R, you may want to bump up those child procs from 1GB to
>>>>>>>>> 2GB. That means the 24 slots would now require 48GB. Now you have swap, and
>>>>>>>>> if that happens you will see HBase in a cascading failure.
>>>>>>>>>
>>>>>>>>> So while you can do a rolling restart with the changed configuration
>>>>>>>>> (reducing the number of mappers and reducers), you end up with fewer slots,
>>>>>>>>> which will mean longer run times for your jobs. (Fewer slots == less
>>>>>>>>> parallelism.)
>>>>>>>>>
>>>>>>>>> Looking at the price of memory...
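The slot arithmetic above can be sketched in a few lines. This is just the back-of-the-envelope budget from the thread (12 mappers + 12 reducers on a 32GB box), not a sizing tool:

```python
# Memory budget for MapReduce slots, per the scenario discussed above.
def slot_budget(total_gb, mappers, reducers, gb_per_child):
    """Return (slots, GB reserved for children, GB left for DN/TT/OS)."""
    slots = mappers + reducers
    child_gb = slots * gb_per_child
    return slots, child_gb, total_gb - child_gb

# 1GB children: 24 slots use 24GB, leaving 8GB for DN, TT, and the OS.
print(slot_budget(32, 12, 12, 1))  # (24, 24, 8)

# Bump children to 2GB for R: the same 24 slots now need 48GB,
# more than the box has, so the node starts swapping.
print(slot_budget(32, 12, 12, 2))  # (24, 48, -16)
```

A negative remainder is exactly the swap scenario described: the child heaps alone exceed physical memory before HBase or the OS get a single byte.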
you can get 48GB or even 64GB for
>>>>>>>>> around the same price point (8GB chips).
>>>>>>>>>
>>>>>>>>> And I didn't even talk about adding SOLR, again a memory hog... ;-)
>>>>>>>>>
>>>>>>>>> Note that I matched the number of mappers with reducers. You could go with
>>>>>>>>> fewer reducers if you want. I tend to recommend a ratio of 2:1 mappers to
>>>>>>>>> reducers, depending on the workflow...
>>>>>>>>>
>>>>>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>>>>>>>>> interface is pretty much available in the new kit being shipped.
>>>>>>>>> It's just that you don't have enough drives. 8 cores should be 8 spindles,
>>>>>>>>> if available.
>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states as the
>>>>>>>>> processes wait for the disk I/O to catch up.
>>>>>>>>>
>>>>>>>>> I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a 1U
>>>>>>>>> chassis based on price. You're making a trade-off and you should be aware of
>>>>>>>>> the performance hit you will take.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> -Mike
>>>>>>>>>
>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Michael,
>>>>>>>>>>
>>>>>>>>>> So are you recommending 32GB per node?
>>>>>>>>>>
>>>>>>>>>> What about the disks? Are SATA drives too slow?
>>>>>>>>>>
>>>>>>>>>> JM
>>>>>>>>>>
>>>>>>>>>> 2012/11/26, Michael Segel <[email protected]>:
>>>>>>>>>>> Uhm, those specs are actually now out of date.
>>>>>>>>>>>
>>>>>>>>>>> If you're running HBase, or want to also run R on top of Hadoop, you will
>>>>>>>>>>> need to add more memory.
>>>>>>>>>>> Also, forget 1GbE, go 10GbE, and with 2 SATA drives, you will be disk I/O
>>>>>>>>>>> bound way too quickly.
>>>>>>>>>>>
>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Are you asking about hardware recommendations?
>>>>>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on this.
>>>>>>>>>>>> For mid-size clusters (up to 300 nodes):
>>>>>>>>>>>> Processor: a dual quad-core 2.6 GHz
>>>>>>>>>>>> RAM: 24 GB DDR3
>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
>>>>>>>>>>>> A SAS drive controller
>>>>>>>>>>>> At least two SATA II drives in a JBOD configuration
>>>>>>>>>>>>
>>>>>>>>>>>> The replication factor depends heavily on the primary use of your
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are the recommended nodes for the NN, HMaster, and ZK for a
>>>>>>>>>>>>> larger cluster, let's say 50-100+?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, what would be the ideal replication factor for larger clusters
>>>>>>>>>>>>> when you have 3-4 racks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> David
>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
>>>>>>>>>>>>> INFORMATICAS...
>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://www.uci.cu
>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>>>>
>>>>>> --
>>>>>> Adrien Mogenet
>>>>>> 06.59.16.64.22
>>>>>> http://www.mogenet.me
