Finally, it took me a while to run those tests because they took far longer than expected, but here are the results:
http://www.spaggiari.org/bonnie.html

LVM is not really slower than JBOD and does not really take more CPU.
So I will say, if you have to choose between the two, take the one you
prefer. Personally, I prefer LVM because it's easy to configure.

The big winner here is RAID0. It's WAY faster than anything else. But
it's using twice the space... Your choice.

I did not get a chance to test with the Ubuntu tool because it's not
working with LVM drives.

JM

2012/11/28, Michael Segel <[email protected]>:
> Ok, just a caveat.
>
> I am discussing MapR as part of a complete response. As Mohit posted, MapR
> takes the raw device for their MapR File System.
> They do stripe on their own within what they call a volume.
>
> But going back to Apache...
> You can stripe drives, however I wouldn't recommend it. I don't think the
> performance gains would really matter.
> You're going to end up getting blocked first by disk I/O, then your
> controller card, then your network... assuming 10GbE.
>
> With only 2 disks on an 8-core system, you will hit disk I/O first, and then
> you'll watch your CPU wait I/O climb.
>
> HTH
>
> -Mike
>
> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> Hi Mike,
>>
>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
>> at the same time, it should be better than RAID0 or a single drive,
>> no?
>>
>> 2012/11/28, Michael Segel <[email protected]>:
>>> Just a couple of things.
>>>
>>> I'm neutral on the use of LVMs. Some would point out that there's some
>>> overhead, but on the flip side, it can make managing the machines
>>> easier.
>>> If you're using MapR, you don't want to use LVMs but raw devices.
>>>
>>> In terms of GC, it's going to depend on the heap size and not the total
>>> memory. With respect to HBase... MSLAB is the way to go.
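For anyone who wants to reproduce the two setups benchmarked above, here is a minimal sketch. The device names (/dev/sdb, /dev/sdc), volume names, and stripe size are placeholders, not the values used in the benchmark:

```shell
# LVM: pool both drives into one volume group, then create a striped LV.
pvcreate /dev/sdb /dev/sdc
vgcreate hdfsvg /dev/sdb /dev/sdc
lvcreate --stripes 2 --stripesize 256 --extents 100%FREE --name hdfslv hdfsvg
mkfs.ext4 /dev/hdfsvg/hdfslv

# RAID0: stripe the same two drives with mdadm instead.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0
```

With `--stripes 2`, LVM alternates writes across both physical volumes, which is why its throughput comes close to RAID0 while remaining easy to resize later.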
>>>
>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>
>>>> Hi Gregory,
>>>>
>>>> I found this about LVM:
>>>> -> http://blog.andrew.net.au/2006/08/09
>>>> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
>>>>
>>>> Seems that performance is still correct with it. I will most
>>>> probably give it a try and bench that too... I have one new hard drive
>>>> which should arrive tomorrow. Perfect timing ;)
>>>>
>>>> JM
>>>>
>>>> 2012/11/28, Mohit Anchlia <[email protected]>:
>>>>>
>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <[email protected]> wrote:
>>>>>
>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating too
>>>>>> large a heap might increase GC time?
>>>>>>
>>>>> The benefit you get is from the OS cache.
>>>>>
>>>>>> Another question: why not RAID 0, in order to aggregate disk
>>>>>> bandwidth? (and thus keep the 3x replication factor)
>>>>>>
>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel <[email protected]> wrote:
>>>>>>
>>>>>>> Sorry,
>>>>>>>
>>>>>>> I need to clarify.
>>>>>>>
>>>>>>> 4GB per physical core is a good starting point.
>>>>>>> So with 2 quad-core chips, that is going to be 32GB.
>>>>>>>
>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more. (Actually
>>>>>>> you will need more.) The next logical jump would be to 48 or 64GB.
>>>>>>>
>>>>>>> If we start to price out memory, depending on the vendor and your company's
>>>>>>> procurement, there really isn't much of a price difference between
>>>>>>> 32, 48, or 64 GB.
>>>>>>> Note that it also depends on the chips themselves. Also you need to see
>>>>>>> how many memory channels exist on the motherboard. You may need to buy in
>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also you need to
>>>>>>> keep an eye on your hardware vendor.
Sometimes they will give you higher
>>>>>>> density chips that are going to be more expensive...) ;-)
>>>>>>>
>>>>>>> I tend to like having extra memory from the start.
>>>>>>> It gives you a bit more freedom and also protects you from 'fat' code.
>>>>>>>
>>>>>>> Looking at YARN... you will need more memory too.
>>>>>>>
>>>>>>> With respect to the hard drives...
>>>>>>>
>>>>>>> The best recommendation is to keep the drives as JBOD and then use 3x
>>>>>>> replication.
>>>>>>> In this case, make sure that the disk controller cards can handle JBOD.
>>>>>>> (Some don't support JBOD out of the box.)
>>>>>>>
>>>>>>> With respect to RAID...
>>>>>>>
>>>>>>> If you are running MapR, no need for RAID.
>>>>>>> If you are running an Apache derivative, you could use RAID 1, then cut
>>>>>>> your replication to 2x. This makes it easier to manage drive failures.
>>>>>>> (It's not the norm, but it works...) In some clusters, they are using
>>>>>>> appliances like NetApp's E-Series, where the machines see the drives as
>>>>>>> locally attached storage, and I think the appliances themselves are using
>>>>>>> RAID. I haven't played with this configuration, however it could make
>>>>>>> sense and it's a valid design.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> -Mike
>>>>>>>
>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> Thanks for all those details!
>>>>>>>>
>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to 64GB,
>>>>>>>> which means 3 to 4GB per core. So with quad cores, 12GB to 16GB is a
>>>>>>>> good start? Or did I simplify it too much?
>>>>>>>>
>>>>>>>> Regarding the hard drives: if you add more than one drive, do you need
>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase be
>>>>>>>> configured to use more than one drive?
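To answer the multi-drive question above: yes, Hadoop can use several drives without RAID. Each drive is mounted separately and listed in the DataNode's data directories, which is exactly the JBOD setup recommended in the thread. A sketch, using the Hadoop 1.x property names of the era (the mount points are placeholders):

```xml
<!-- hdfs-site.xml: one entry per physical drive, mounted individually -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>

<!-- mapred-site.xml: spread MapReduce spill space the same way -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk1/mapred/local,/mnt/disk2/mapred/local</value>
</property>
```

The DataNode round-robins block writes across the listed directories, so the drives are aggregated without any striping layer, and a single drive failure costs only that drive's blocks.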
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> JM
>>>>>>>>
>>>>>>>> 2012/11/27, Michael Segel <[email protected]>:
>>>>>>>>>
>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside
>>>>>>>>> joke...]
>>>>>>>>>
>>>>>>>>> So here's the problem...
>>>>>>>>>
>>>>>>>>> By default, your child processes in a map/reduce job get 512MB.
>>>>>>>>> The majority of the time, this gets raised to 1GB.
>>>>>>>>>
>>>>>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note:
>>>>>>>>> this is why when people talk about the number of cores, you have to specify
>>>>>>>>> physical cores or logical cores...)
>>>>>>>>>
>>>>>>>>> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>>>>>>>>> reducers, that's 24 slots. Which means that you would need 24GB of memory
>>>>>>>>> reserved just for the child processes. This would leave 8GB for the DN, TT,
>>>>>>>>> and the rest of the Linux OS processes.
>>>>>>>>>
>>>>>>>>> Can you live with that? Sure.
>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top of the
>>>>>>>>> cluster.
>>>>>>>>>
>>>>>>>>> Oops! Now you are in trouble because you will swap.
>>>>>>>>> Also, adding in R, you may want to bump up those child procs from 1GB to
>>>>>>>>> 2GB. That means the 24 slots would now require 48GB. Now you have swap, and
>>>>>>>>> if that happens you will see HBase in a cascading failure.
>>>>>>>>>
>>>>>>>>> So while you can do a rolling restart with the changed configuration
>>>>>>>>> (reducing the number of mappers and reducers), you end up with fewer slots,
>>>>>>>>> which will mean longer run times for your jobs. (Fewer slots == less
>>>>>>>>> parallelism.)
>>>>>>>>>
>>>>>>>>> Looking at the price of memory...
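The slot arithmetic above can be sketched in a few lines. This is just the back-of-the-envelope budget from the thread (12 mappers + 12 reducers on a 32GB box), not a sizing tool:

```python
# Memory budget for MapReduce slots, per the scenario discussed above.
def slot_budget(total_gb, mappers, reducers, gb_per_child):
    """Return (slots, GB reserved for children, GB left for DN/TT/OS)."""
    slots = mappers + reducers
    child_gb = slots * gb_per_child
    return slots, child_gb, total_gb - child_gb

# 1GB children: 24 slots use 24GB, leaving 8GB for DN, TT, and the OS.
print(slot_budget(32, 12, 12, 1))  # (24, 24, 8)

# Bump children to 2GB for R: the same 24 slots now need 48GB,
# more than the box has, so the node starts swapping.
print(slot_budget(32, 12, 12, 2))  # (24, 48, -16)
```

A negative remainder is exactly the swap scenario described: the child heaps alone exceed physical memory before HBase or the OS get a single byte.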
you can get 48GB or even 64GB for
>>>>>>>>> around the same price point (8GB chips).
>>>>>>>>>
>>>>>>>>> And I didn't even talk about adding SOLR, again a memory hog... ;-)
>>>>>>>>>
>>>>>>>>> Note that I matched the number of mappers with reducers. You could go with
>>>>>>>>> fewer reducers if you want. I tend to recommend a ratio of 2:1 mappers to
>>>>>>>>> reducers, depending on the workflow...
>>>>>>>>>
>>>>>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>>>>>>>>> interface is pretty much available in the new kit being shipped.
>>>>>>>>> It's just that you don't have enough drives. 8 cores should be 8 spindles,
>>>>>>>>> if available.
>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states as the
>>>>>>>>> processes wait for the disk I/O to catch up.
>>>>>>>>>
>>>>>>>>> I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a 1U
>>>>>>>>> chassis based on price. You're making a trade-off and you should be aware of
>>>>>>>>> the performance hit you will take.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> -Mike
>>>>>>>>>
>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Michael,
>>>>>>>>>>
>>>>>>>>>> So are you recommending 32GB per node?
>>>>>>>>>>
>>>>>>>>>> What about the disks? Are SATA drives too slow?
>>>>>>>>>>
>>>>>>>>>> JM
>>>>>>>>>>
>>>>>>>>>> 2012/11/26, Michael Segel <[email protected]>:
>>>>>>>>>>> Uhm, those specs are actually now out of date.
>>>>>>>>>>>
>>>>>>>>>>> If you're running HBase, or want to also run R on top of Hadoop, you will
>>>>>>>>>>> need to add more memory.
>>>>>>>>>>> Also, forget 1GbE, go 10GbE, and with 2 SATA drives, you will be disk I/O
>>>>>>>>>>> bound way too quickly.
>>>>>>>>>>>
>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Are you asking about hardware recommendations?
>>>>>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on this.
>>>>>>>>>>>> For mid-size clusters (up to 300 nodes):
>>>>>>>>>>>> Processor: a dual quad-core 2.6 GHz
>>>>>>>>>>>> RAM: 24 GB DDR3
>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
>>>>>>>>>>>> A SAS drive controller
>>>>>>>>>>>> At least two SATA II drives in a JBOD configuration
>>>>>>>>>>>>
>>>>>>>>>>>> The replication factor depends heavily on the primary use of your
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are the recommended nodes for the NN, HMaster, and ZK for a
>>>>>>>>>>>>> larger cluster, let's say 50-100+?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, what would be the ideal replication factor for larger clusters
>>>>>>>>>>>>> when you have 3-4 racks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> David
>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
>>>>>>>>>>>>> INFORMATICAS...
>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://www.uci.cu
>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>>>>
>>>>>> --
>>>>>> Adrien Mogenet
>>>>>> 06.59.16.64.22
>>>>>> http://www.mogenet.me
