I would take it one notch smaller; 32GB of RAM per node is probably more
than enough...

It would be hard to get full utilization of 128GB of RAM, and maybe even
of 64GB.  With 32GB you might even be able to get 2GB DIMMs (much
cheaper).

-ryan
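
(For illustration, here is one way a 32GB node's memory might be carved up
between the HDFS/MapReduce/HBase daemons and the child task JVMs. Every slot
count and heap size below is an assumption made for the sketch, not a figure
from this thread.)

# Rough per-node memory budget -- all numbers are illustrative assumptions,
# meant only to show the sizing exercise J-D describes further down.
region_server_heap_gb = 8      # assumed HBase RegionServer heap
datanode_heap_gb = 1           # assumed HDFS DataNode heap
tasktracker_heap_gb = 1        # assumed TaskTracker heap
os_and_page_cache_gb = 4       # headroom left for the OS and page cache

map_slots, reduce_slots = 8, 4   # assumed task slots per node
task_heap_gb = 1                 # assumed -Xmx for each child task JVM

total_gb = (region_server_heap_gb + datanode_heap_gb + tasktracker_heap_gb
            + os_and_page_cache_gb + (map_slots + reduce_slots) * task_heap_gb)
print("approximate peak usage: %d GB" % total_gb)   # 26 GB -- fits in 32 GB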

On Mon, Jun 7, 2010 at 10:48 PM, Sean Bigdatafun
<[email protected]> wrote:
> On Mon, Jun 7, 2010 at 1:13 PM, Todd Lipcon <[email protected]> wrote:
>
>> If those are your actual specs, I would definitely go with 16 of the
>> smaller ones. 128G heaps are not going to work well in a JVM; you're
>> better off running more nodes with a more common configuration.
>>
>
> I am not using one JVM on a machine, right? Each Map/Reduce task uses one
> JVM, I believe. And actually, my question really boils down to whether the
> current map/reduce scheduler is smart enough to make the best use of
> resources. If it is smart enough, I think virtualization does not make much
> sense; if it's not smart enough, I guess virtualization may help to improve
> performance.
>
> But you are right, here I was really just making up a case -- "128G mem" is
> simply double the "smaller machine"'s memory.
>
>
>
>>
>> -Todd
>>
>> On Mon, Jun 7, 2010 at 1:46 PM, Jean-Daniel Cryans <[email protected]>
>> wrote:
>>
>> > It really depends on your usage pattern, but there's a cost vs. hardware
>> > balance you must strike. At StumbleUpon we run with 2x i7, 24GB, 4x 1TB
>> > and it works like a charm. The only thing I would change is maybe more
>> > disks per node, but that's pretty much it. Some relevant questions:
>> >
>> >  - Do you have any mem-intensive jobs? If so, figure how many tasks
>> > you'll run per node and make the RAM fit the load.
>> >  - Do you plan to serve data out of HBase or will you just use it for
>> > MapReduce? Or will it be a mix (not recommended)?
>> >
>> > Also, keep in mind that losing 1 machine out of 8, compared to 1 out of
>> > 16, drastically changes the performance of your system at the time of
>> > the failure.
>> >
>> > About virtualization, it doesn't make sense. Also your disks should be in
>> > JBOD.
>> >
>> > J-D
>> >
>> > On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
>> > <[email protected]> wrote:
>> > > I am thinking of the following problem lately. I started thinking of
>> > > this problem in the following context.
>> > >
>> > > I have a predefined budget and I can either
>> > >  -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU + 128GB
>> > >        mem + 16 x 1TB disk) or
>> > >  -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU + 64GB
>> > >        mem + 8 x 1TB disk)
>> > >          NOTE: I am basically making up a half-horsepower scenario
>> > >  -- Let's say I am going to use a 10Gbps network switch and each
>> > >     machine has a 10Gbps network card
>> > >
>> > > In the above scenario, does A or B perform better, or about the same?
>> > > -- I guess this really depends on Hadoop's map/reduce scheduler.
>> > >
>> > > And then I have a follow-up question: does it make sense to virtualize
>> > > a Hadoop datanode at all? (If the answer to the above question is
>> > > "relatively the same", I'd say it does not make sense.)
>> > >
>> > > Thanks,
>> > > Sean
>> > >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
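
(On Sean's A-vs-B question in the quoted thread: with equal aggregate hardware,
one concrete difference is how much of the cluster a single failed node takes
with it, which is the point J-D makes above. A quick sketch using only the node
counts from the thread; the labels are shorthand, not exact specs.)

# Fraction of the cluster lost when one node dies, for the two options in
# Sean's question (aggregate CPU/RAM/disk are the same by construction).
options = {
    "A: 8 larger nodes": 8,
    "B: 16 smaller nodes": 16,
}
for name, nodes in options.items():
    lost = 100.0 / nodes
    print("%s: one failure removes %.1f%% of capacity, and that node's "
          "blocks must be re-replicated from the survivors" % (name, lost))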
