Re: recommended nodes

Michael Segel Wed, 19 Dec 2012 17:15:13 -0800

Yeah, 
I couldn't argue against LVMs when talking with the system admins. 
In terms of speed it's noise because the CPUs are pretty efficient and unless 
you have more than 1 drive per physical core, you will end up saturating your 
disk I/O.


In terms of MapR, you want the raw disk. (But we're talking Apache)


On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <[email protected]> 
wrote:

> Finally, it took me a while to run those tests because it was way
> longer than expected, but here are the results:
> 
> http://www.spaggiari.org/bonnie.html
> 
> LVM is not really slower than JBOD and not really taking more CPU. So
> I will say, if you have to choose between the 2, take the one you
> prefer. Personally, I prefer LVM because it's easy to configure.
> 
> The big winner here is RAID0. It's WAY faster than anything else. But
> it's using twice the space... Your choice.
> 
> I did not get a chance to test with the Ubuntu tool because it's not
> working with LVM drives.
> 
> JM
> 
> 2012/11/28, Michael Segel <[email protected]>:
>> Ok, just a caveat.
>> 
>> I am discussing MapR as part of a complete response. As Mohit posted MapR
>> takes the raw device for their MapR File System.
>> They do stripe on their own within what they call a volume.
>> 
>> But going back to Apache...
>> You can stripe drives, however I wouldn't recommend it. I don't think the
>> performance gains would really matter.
>> You're going to end up getting blocked first by disk i/o, then your
>> controller card, then your network... assuming 10GBe.
>> 
>> With only 2 disks on an 8 core system, you will hit disk i/o first and then
>> you'll watch your CPU Wait I/O climb.
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <[email protected]>
>> wrote:
>> 
>>> Hi Mike,
>>> 
>>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
>>> at the same time, it should be better than RAID0 or a single drive,
>>> no?
>>> 
>>> 2012/11/28, Michael Segel <[email protected]>:
>>>> Just a couple of things.
>>>> 
>>>> I'm neutral on the use of LVMs. Some would point out that there's some
>>>> overhead, but on the flip side, it can make managing the machines
>>>> easier.
>>>> If you're using MapR, you don't want to use LVMs but raw devices.
>>>> 
>>>> In terms of GC, it's going to depend on the heap size and not the total
>>>> memory. With respect to HBase... MSLAB is the way to go.
>>>> 
>>>> 
>>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
>>>> <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Gregory,
>>>>> 
>>>>> I found this about LVM:
>>>>> -> http://blog.andrew.net.au/2006/08/09
>>>>> ->
>>>>> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
>>>>> 
>>>>> Performance still seems fine with it. I will most probably give it
>>>>> a try and bench that too... I have one new hard drive which should
>>>>> arrive tomorrow. Perfect timing ;)
>>>>> 
>>>>> 
>>>>> 
>>>>> JM
>>>>> 
>>>>> 2012/11/28, Mohit Anchlia <[email protected]>:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating too
>>>>>>> large a heap might increase GC time?
>>>>>>> 
>>>>>> Benefit you get is from OS cache
>>>>>>> Another question: why not RAID 0, in order to aggregate disk
>>>>>>> bandwidth (and thus keep the 3x replication factor)?
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Sorry,
>>>>>>>> 
>>>>>>>> I need to clarify.
>>>>>>>> 
>>>>>>>> 4GB per physical core is a good starting point.
>>>>>>>> So with 2 quad core chips, that is going to be 32GB.
>>>>>>>> 
>>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
>>>>>>>> (Actually you will need more.) The next logical jump would be to 48
>>>>>>>> or 64GB.
>>>>>>>> 
>>>>>>>> If we start to price out memory, depending on the vendor and your
>>>>>>>> company's procurement, there really isn't much of a price difference
>>>>>>>> between 32, 48, or 64 GB.
>>>>>>>> Note that it also depends on the chips themselves. Also you need to
>>>>>>>> see how many memory channels exist on the motherboard. You may need
>>>>>>>> to buy in pairs or triplets. Your hardware vendor can help you. (Also
>>>>>>>> you need to keep an eye on your hardware vendor. Sometimes they will
>>>>>>>> give you higher density chips that are going to be more
>>>>>>>> expensive...) ;-)
>>>>>>>> 
>>>>>>>> I tend to like having extra memory from the start.
>>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
>>>>>>>> code.
>>>>>>>> 
>>>>>>>> Looking at YARN... you will need more memory too.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> With respect to the hard drives...
>>>>>>>> 
>>>>>>>> The best recommendation is to keep the drives as JBOD and then use
>>>>>>>> 3x replication.
>>>>>>>> In this case, make sure that the disk controller cards can handle
>>>>>>>> JBOD. (Some don't support JBOD out of the box.)
>>>>>>>> 
>>>>>>>> With respect to RAID...
>>>>>>>> 
>>>>>>>> If you are running MapR, no need for RAID.
>>>>>>>> If you are running an Apache derivative, you could use RAID 1. Then
>>>>>>>> cut your replication to 2X. This makes it easier to manage drive
>>>>>>>> failures. (It's not the norm, but it works...) In some clusters, they
>>>>>>>> are using appliances like NetApp's E-Series, where the machines see
>>>>>>>> the drives as locally attached storage, and I think the appliances
>>>>>>>> themselves are using RAID. I haven't played with this configuration;
>>>>>>>> however, it could make sense and it's a valid design.
>>>>>>>> 
>>>>>>>> HTH
>>>>>>>> 
>>>>>>>> -Mike
>>>>>>>> 
>>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
>>>>>>>> <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Mike,
>>>>>>>>> 
>>>>>>>>> Thanks for all those details!
>>>>>>>>> 
>>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
>>>>>>>>> 64GB, which means 3 to 4GB per core. So with quad cores, 12GB to
>>>>>>>>> 16GB is a good start? Or did I simplify it too much?
>>>>>>>>> 
>>>>>>>>> Regarding the hard drives: if you add more than one drive, do you
>>>>>>>>> need to build them on RAID or similar systems? Or can Hadoop/HBase
>>>>>>>>> be configured to use more than one drive?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> JM
>>>>>>>>> 
>>>>>>>>> 2012/11/27, Michael Segel <[email protected]>:
>>>>>>>>>> 
>>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an
>>>>>>>>>> inside joke ...]
>>>>>>>>>> 
>>>>>>>>>> So here's the problem...
>>>>>>>>>> 
>>>>>>>>>> By default, your child processes in a map/reduce job get a default
>>>>>>>>>> 512MB. The majority of the time, this gets raised to 1GB.
>>>>>>>>>> 
>>>>>>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in
>>>>>>>>>> Linux. (Note: This is why when people talk about the number of
>>>>>>>>>> cores, you have to specify physical cores or logical cores....)
>>>>>>>>>> 
>>>>>>>>>> So if you were to oversubscribe and have, let's say, 12 mappers and
>>>>>>>>>> 12 reducers, that's 24 slots. Which means that you would need 24GB
>>>>>>>>>> of memory reserved just for the child processes. On a 32GB node,
>>>>>>>>>> this would leave 8GB for DN, TT and the rest of the Linux OS
>>>>>>>>>> processes.
>>>>>>>>>> 
>>>>>>>>>> Can you live with that? Sure.
>>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top of
>>>>>>>>>> the cluster.
>>>>>>>>>> 
>>>>>>>>>> Ooops! Now you are in trouble because you will swap.
>>>>>>>>>> Also adding in R, you may want to bump up those child procs from
>>>>>>>>>> 1GB to 2GB. That means the 24 slots would now require 48GB. Now you
>>>>>>>>>> have swap, and if that happens you will see HBase in a cascading
>>>>>>>>>> failure.
>>>>>>>>>> 
>>>>>>>>>> So while you can do a rolling restart with the changed
>>>>>>>>>> configuration (reducing the number of mappers and reducers), you
>>>>>>>>>> end up with fewer slots, which will mean longer run times for your
>>>>>>>>>> jobs. (Fewer slots == less parallelism.)
>>>>>>>>>> 
>>>>>>>>>> Looking at the price of memory... you can get 48GB or even 64GB
>>>>>>>>>> for around the same price point. (8GB chips)
>>>>>>>>>> 
>>>>>>>>>> And I didn't even talk about adding SOLR either, again a memory
>>>>>>>>>> hog... ;-)
>>>>>>>>>> 
>>>>>>>>>> Note that I matched the number of mappers with reducers. You could
>>>>>>>>>> go with fewer reducers if you want. I tend to recommend a ratio of
>>>>>>>>>> 2:1 mappers to reducers, depending on the work flow....
>>>>>>>>>> 
>>>>>>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA
>>>>>>>>>> III interface is pretty much available in the new kit being shipped.
>>>>>>>>>> It's just that you don't have enough drives. 8 cores should be 8
>>>>>>>>>> spindles if available.
>>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states as
>>>>>>>>>> the processes wait for the disk i/o to catch up.
>>>>>>>>>> 
>>>>>>>>>> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives in a
>>>>>>>>>> 1U chassis based on price. You're making a trade-off and you should
>>>>>>>>>> be aware of the performance hit you will take.
>>>>>>>>>> 
>>>>>>>>>> HTH
>>>>>>>>>> 
>>>>>>>>>> -Mike
>>>>>>>>>> 
>>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Michael,
>>>>>>>>>>> 
>>>>>>>>>>> so are you recommending 32GB per node?
>>>>>>>>>>> 
>>>>>>>>>>> What about the disks? Are SATA drives too slow?
>>>>>>>>>>> 
>>>>>>>>>>> JM
>>>>>>>>>>> 
>>>>>>>>>>> 2012/11/26, Michael Segel <[email protected]>:
>>>>>>>>>>>> Uhm, those specs are actually now out of date.
>>>>>>>>>>>> 
>>>>>>>>>>>> If you're running HBase, or want to also run R on top of Hadoop,
>>>>>>>>>>>> you will need to add more memory.
>>>>>>>>>>>> Also forget 1GBe, go 10GBe, and w 2 SATA drives, you will be
>>>>>>>>>>>> disk i/o bound way too quickly.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Are you asking about hardware recommendations?
>>>>>>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on
>>>>>>>>>>>>> this:
>>>>>>>>>>>>> For mid-size clusters (up to 300 nodes):
>>>>>>>>>>>>> Processor: A dual quad-core 2.6 GHz
>>>>>>>>>>>>> RAM: 24 GB DDR3
>>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
>>>>>>>>>>>>> a SAS drive controller
>>>>>>>>>>>>> at least two SATA II drives in a JBOD configuration
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The replication factor depends heavily on the primary use of
>>>>>>>>>>>>> your cluster.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>>>>>>>>>>> hi
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> what are the recommended nodes for NN, hmaster and zk for a
>>>>>>>>>>>>>> larger cluster, let's say 50-100+?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> also, what would be the ideal replication factor for larger
>>>>>>>>>>>>>> clusters when you have 3-4 racks?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
>>>>>>>>>>>>>> CIENCIAS
>>>>>>>>>>>>>> INFORMATICAS...
>>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> http://www.uci.cu
>>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
>>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
>>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Adrien Mogenet
>>>>>>> 06.59.16.64.22
>>>>>>> http://www.mogenet.me
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>

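The memory arithmetic from the thread (slots times child heap, plus headroom for the DataNode, TaskTracker and OS) can be sketched roughly as follows. This is only an illustration: the function name and the 8 GB reserve are assumptions taken from the 32GB example quoted above, not a sizing formula from Hadoop itself.

```python
def node_memory_gb(mappers, reducers, child_heap_gb, reserved_gb=8):
    """Rough per-node RAM estimate for a MapReduce (MRv1) worker.

    reserved_gb is a hypothetical allowance for the DataNode, TaskTracker
    and OS; 8 GB matches the 32GB example in the thread.
    """
    slots = mappers + reducers          # every slot runs one child JVM
    return slots * child_heap_gb + reserved_gb


# 12 mappers + 12 reducers with 1 GB children, as in the thread
print(node_memory_gb(12, 12, 1))
# Same slots with children bumped to 2 GB (e.g. when adding R)
print(node_memory_gb(12, 12, 2))
```

Anything else running on the node (HBase region servers, Impala, SOLR) would have to be added on top of this estimate, which is why the thread suggests starting at 48 or 64GB.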