Re: recommended nodes

Jean-Marc Spaggiari Wed, 28 Nov 2012 10:05:39 -0800

Hi Gregory,

I found this about LVM:
-> http://blog.andrew.net.au/2006/08/09
-> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2


It seems that performance is still acceptable with it. I will most
probably give it a try and benchmark it too... I have a new hard drive
that should arrive tomorrow. Perfect timing ;)



JM

2012/11/28, Mohit Anchlia <[email protected]>:
>
>
>
>
> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <[email protected]>
> wrote:
>
>> Does HBase really benefit from 64 GB of RAM, since allocating too large a
>> heap might increase GC time?
>>
> The benefit you get is from the OS cache.
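A rough way to see the OS-cache point: RAM that the JVM heaps do not claim is left to the Linux page cache, which holds recently read HDFS block files outside the Java heap and so adds nothing to GC time. The sketch below is arithmetic only; the heap sizes, the 2GB OS reserve, and the helper name are hypothetical.

```python
# Rough sketch: RAM left for the Linux page cache after JVM heaps.
# Heap sizes and the OS reserve are hypothetical, not from the thread.

def page_cache_gb(node_ram_gb, heaps_gb, os_reserve_gb=2):
    """Return GB of RAM not claimed by JVM heaps or the OS reserve."""
    left = node_ram_gb - sum(heaps_gb.values()) - os_reserve_gb
    return max(left, 0)

# Hypothetical heaps for a node running a RegionServer and a DataNode.
heaps = {"regionserver": 16, "datanode": 1}

for ram in (32, 64):
    print(f"{ram}GB node -> ~{page_cache_gb(ram, heaps)}GB for the page cache")
```

In this example, growing the node from 32GB to 64GB leaves the heaps, and therefore GC behaviour, unchanged while the cache grows from about 13GB to about 45GB.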
>> Another question: why not RAID 0, in order to aggregate disk bandwidth
>> (and thus keep a 3x replication factor)?
>>
>>
>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
>> <[email protected]> wrote:
>>
>>> Sorry,
>>>
>>> I need to clarify.
>>>
>>> 4GB per physical core is a good starting point.
>>> So with 2 quad-core chips, that is going to be 32GB.
>>>
>>> IMHO that's a minimum. If you go with HBase, you will want more.
>>> (Actually you will need more.) The next logical jump would be to 48 or
>>> 64GB.
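The 4GB-per-physical-core rule of thumb works out as below. This is a minimal sketch; the helper name is illustrative, and the key step is counting physical cores rather than the hyper-threaded logical CPUs Linux reports.

```python
# Sketch of the "4GB per physical core" starting point.
# Count physical cores, not the logical CPUs Linux shows with hyper-threading.

def baseline_ram_gb(sockets, cores_per_socket, gb_per_core=4):
    """Minimum RAM (GB) under a per-physical-core rule of thumb."""
    physical_cores = sockets * cores_per_socket
    return physical_cores * gb_per_core

# Two quad-core chips -> 8 physical cores -> 32GB starting point.
print(baseline_ram_gb(2, 4))
```

On the same 8-core box, the 48GB and 64GB steps correspond to 6 and 8 GB per physical core.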
>>>
>>> If we start to price out memory, depending on the vendor and your
>>> company's procurement, there really isn't much of a price difference
>>> between 32, 48, or 64GB.
>>> Note that it also depends on the chips themselves. You also need to see
>>> how many memory channels exist on the motherboard. You may need to buy
>>> in pairs or triplets. Your hardware vendor can help you. (Also, you need
>>> to keep an eye on your hardware vendor. Sometimes they will give you
>>> higher density chips that are going to be more expensive...) ;-)
>>>
>>> I tend to like having extra memory from the start.
>>> It gives you a bit more freedom and also protects you from 'fat' code.
>>>
>>> Looking at YARN... you will need more memory too.
>>>
>>>
>>> With respect to the hard drives...
>>>
>>> The best recommendation is to keep the drives as JBOD and then use 3x
>>> replication.
>>> In this case, make sure that the disk controller cards can handle JBOD.
>>> (Some don't support JBOD out of the box)
>>>
>>> With respect to RAID...
>>>
>>> If you are running MapR, no need for RAID.
>>> If you are running an Apache derivative, you could use RAID 1. Then cut
>>> your replication to 2X. This makes it easier to manage drive failures.
>>> (It's not the norm, but it works...) In some clusters, they are using
>>> appliances like NetApp's E-Series, where the machines see the drives as
>>> locally attached storage, and I think the appliances themselves are
>>> using RAID. I haven't played with this configuration; however, it could
>>> make sense and it's a valid design.
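To make the JBOD vs. RAID 1 trade-off concrete: RAID 1 mirrors every block on the node before HDFS replicates it across nodes, so RAID 1 + 2x replication keeps four physical copies where JBOD + 3x keeps three. A sketch with a hypothetical 12 x 2TB node (helper name illustrative):

```python
# Unique data one node's drives can hold under the two layouts above.
# The drive count and size are hypothetical examples.

def usable_tb(drives, tb_per_drive, replication, raid1=False):
    """Unique data (TB) one node's drives can hold once all copies count.

    RAID 1 is assumed to mirror pairs of drives, halving raw capacity
    before HDFS replication is applied on top.
    """
    raw = drives * tb_per_drive
    if raid1:
        raw /= 2  # each mirrored pair exposes one drive's worth of space
    return raw / replication

jbod = usable_tb(12, 2, replication=3)               # 24TB raw / 3 copies
raid1 = usable_tb(12, 2, replication=2, raid1=True)  # 12TB mirrored / 2 copies
print(f"JBOD + 3x:   {jbod:.1f} TB")
print(f"RAID 1 + 2x: {raid1:.1f} TB")
```

In this example RAID 1 + 2x gives up a quarter of the space (6TB vs. 8TB per node) in exchange for simpler drive-failure handling.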
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
>>> <[email protected]>
>>> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> Thanks for all those details!
>>>>
>>>> So to simplify the equation, for 16 virtual cores we need 48 to 64GB,
>>>> which means 3 to 4GB per core. So with quad cores, 12GB to 16GB is a
>>>> good start? Or did I simplify it too much?
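The per-core figure depends on which count you divide by. A small sketch, assuming a dual quad-core box with hyper-threading (16 logical CPUs), shows the same 48-64GB per logical CPU, per physical core, and per quad-core chip; the helper and constants are illustrative:

```python
# Divide the same node RAM by logical CPUs, physical cores, and sockets.
# Assumes a dual quad-core box with hyper-threading (16 logical CPUs).

SOCKETS = 2
CORES_PER_SOCKET = 4
THREADS_PER_CORE = 2  # hyper-threading

def per_unit_gb(total_gb):
    physical = SOCKETS * CORES_PER_SOCKET
    logical = physical * THREADS_PER_CORE
    return {
        "per_logical_cpu": total_gb / logical,
        "per_physical_core": total_gb / physical,
        "per_socket": total_gb / SOCKETS,
    }

for total in (48, 64):
    print(total, per_unit_gb(total))
```

Read this way, 3 to 4GB per logical CPU comes to 24 to 32GB per quad-core chip (6 to 8GB per physical core), rather than 12 to 16GB.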
>>>>
>>>> Regarding the hard drives: if you add more than one drive, do you need
>>>> to put them in RAID or a similar setup? Or can Hadoop/HBase be
>>>> configured to use more than one drive?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>>
>>>> 2012/11/27, Michael Segel <[email protected]>:
>>>>>
>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an
>>>>> inside joke...]
>>>>>
>>>>> So here's the problem...
>>>>>
>>>>> By default, your child processes in a map/reduce job get a default
>>>>> 512MB. The majority of the time, this gets raised to 1GB.
>>>>>
>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in Linux.
>>>>> (Note: This is why when people talk about the number of cores, you
>>>>> have to specify physical cores or logical cores....)
>>>>>
>>>>> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>>>>> reducers, that's 24 slots, which means that you would need 24GB of
>>>>> memory reserved just for the child processes. This would leave 8GB for
>>>>> the DN, TT, and the rest of the Linux OS processes.
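The slot arithmetic above as a quick sketch, assuming MRv1-style fixed slots that each reserve their full child heap; the 32GB node size is the figure under discussion and the helper name is illustrative:

```python
# Memory left for the DataNode, TaskTracker, and OS after reserving
# a fixed child heap for every map and reduce slot.

def leftover_gb(node_ram_gb, mappers, reducers, gb_per_child):
    reserved = (mappers + reducers) * gb_per_child
    return node_ram_gb - reserved

# 12 mappers + 12 reducers at 1GB each on a 32GB node.
print(leftover_gb(32, 12, 12, 1))
```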
>>>>>
>>>>> Can you live with that? Sure.
>>>>> Now add in R, HBase, Impala, or some other set of tools on top of the
>>>>> cluster.
>>>>>
>>>>> Ooops! Now you are in trouble because you will swap.
>>>>> Also, adding in R, you may want to bump up those child procs from 1GB
>>>>> to 2GB. That means the 24 slots would now require 48GB. Now you swap,
>>>>> and if that happens you will see HBase in a cascading failure.
>>>>>
>>>>> So while you can do a rolling restart with the changed configuration
>>>>> (reducing the number of mappers and reducers), you end up with fewer
>>>>> slots, which will mean longer run times for your jobs. (Fewer slots ==
>>>>> less parallelism.)
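The other side of the trade-off: once child heaps grow, the slot count has to shrink until they fit. A sketch assuming, hypothetically, that the 8GB left over in the 1GB example stays reserved for the DataNode, TaskTracker, and OS (helper name illustrative):

```python
# How many map/reduce slots fit once per-child heaps grow.
# The 8GB reserve is an assumption carried over from the 1GB example;
# real reserves vary, especially once HBase shares the node.

def max_slots(node_ram_gb, gb_per_child, reserve_gb=8):
    available = node_ram_gb - reserve_gb
    return max(available // gb_per_child, 0)

for ram in (32, 48, 64):
    print(f"{ram}GB node: {max_slots(ram, 2)} slots at 2GB each")
```

In this sketch, moving to 2GB children cuts a 32GB node from 24 slots to 12, while a 64GB node still gets 28.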
>>>>>
>>>>> Looking at the price of memory... you can get 48GB or even 64GB for
>>>>> around the same price point. (8GB chips)
>>>>>
>>>>> And I didn't even talk about adding SOLR, again a memory hog... ;-)
>>>>>
>>>>> Note that I matched the number of mappers with reducers. You could go
>>>>> with fewer reducers if you want. I tend to recommend a ratio of 2:1
>>>>> mappers to reducers, depending on the workflow....
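Splitting a slot budget at the 2:1 mapper-to-reducer ratio mentioned above could look like this; the helper is hypothetical and any remainder is given to the mappers:

```python
# Split a total slot budget between mappers and reducers at a given ratio.

def split_slots(total_slots, map_ratio=2, reduce_ratio=1):
    reducers = total_slots * reduce_ratio // (map_ratio + reduce_ratio)
    mappers = total_slots - reducers  # any remainder goes to mappers
    return mappers, reducers

print(split_slots(24))
print(split_slots(20))
```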
>>>>>
>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>>>>> interface is pretty much available in the new kit being shipped.
>>>>> It's just that you don't have enough drives. 8 cores should be 8
>>>>> spindles if available.
>>>>> Otherwise you end up seeing your CPU load climb on wait states as the
>>>>> processes wait for the disk I/O to catch up.
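The one-spindle-per-core guideline reduces to a ratio check. The 1.0 threshold follows the message; the helper and the drive counts tried are illustrative:

```python
# Compare data drives (spindles) to physical cores; a ratio below 1.0
# means tasks are more likely to pile up in I/O wait.

def spindles_per_core(drives, physical_cores):
    return drives / physical_cores

for drives in (2, 4, 8):
    ratio = spindles_per_core(drives, 8)
    status = "ok" if ratio >= 1.0 else "likely I/O bound"
    print(f"{drives} drives / 8 cores = {ratio:.2f} -> {status}")
```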
>>>>>
>>>>> I mean you could build out a cluster with 4 x 3 3.5" 2TB drives in a
>>>>> 1U chassis based on price. You're making a trade-off, and you should
>>>>> be aware of the performance hit you will take.
>>>>>
>>>>> HTH
>>>>>
>>>>> -Mike
>>>>>
>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> So are you recommending 32GB per node?
>>>>>>
>>>>>> What about the disks? Are SATA drives too slow?
>>>>>>
>>>>>> JM
>>>>>>
>>>>>> 2012/11/26, Michael Segel <[email protected]>:
>>>>>>> Uhm, those specs are actually out of date now.
>>>>>>>
>>>>>>> If you're running HBase, or want to also run R on top of Hadoop, you
>>>>>>> will need to add more memory.
>>>>>>> Also, forget 1GbE, go with 10GbE. And with 2 SATA drives, you will be
>>>>>>> disk I/O bound way too quickly.
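A back-of-the-envelope look at disk vs. network throughput. The NIC conversion is exact line rate (1 Gb/s = 125 MB/s before protocol overhead); the ~100 MB/s sequential rate per SATA drive is an assumption, not a figure from the thread, and the helper names are illustrative:

```python
# Aggregate sequential disk throughput vs. NIC line rate, in MB/s.
# PER_DRIVE_MBPS is an assumed figure for a 7200 RPM SATA drive.

PER_DRIVE_MBPS = 100

def nic_mbps(gigabits):
    return gigabits * 1000 / 8  # line rate, ignoring protocol overhead

def disk_mbps(drives, per_drive=PER_DRIVE_MBPS):
    return drives * per_drive

for drives in (2, 8, 12):
    print(f"{drives} drives ~{disk_mbps(drives)} MB/s | "
          f"1GbE {nic_mbps(1):.0f} MB/s | 10GbE {nic_mbps(10):.0f} MB/s")
```

Under that assumption, even two drives can outrun a single 1GbE link, and 8 or more drives need something closer to 10GbE to move data off the node at disk speed.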
>>>>>>>
>>>>>>>
>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you asking about hardware recommendations?
>>>>>>>> Eric Sammer did a great job on this in his "Hadoop Operations" book.
>>>>>>>> For mid-size clusters (up to 300 nodes):
>>>>>>>> Processor: dual quad-core 2.6 GHz
>>>>>>>> RAM: 24 GB DDR3
>>>>>>>> Dual 1 Gb Ethernet NICs
>>>>>>>> A SAS drive controller
>>>>>>>> At least two SATA II drives in a JBOD configuration
>>>>>>>>
>>>>>>>> The replication factor depends heavily on the primary use of your
>>>>>>>> cluster.
>>>>>>>>
>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> What are the recommended nodes for the NN, HMaster, and ZK in a
>>>>>>>>> larger cluster, let's say 50-100+?
>>>>>>>>>
>>>>>>>>> Also, what would be the ideal replication factor for larger
>>>>>>>>> clusters when you have 3-4 racks?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> David
>>>>>>>>> 10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS
>>>>>>>>> INFORMATICAS...
>>>>>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>>>>>>
>>>>>>>>> http://www.uci.cu
>>>>>>>>> http://www.facebook.com/universidad.uci
>>>>>>>>> http://www.flickr.com/photos/universidad_uci
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Marcos Luis Ortíz Valmaseda
>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>>>>>>
>>
>>
>> --
>> Adrien Mogenet
>> 06.59.16.64.22
>> http://www.mogenet.me
>
