On Mon, Jun 7, 2010 at 1:13 PM, Todd Lipcon <[email protected]> wrote:
> If those are your actual specs, I would definitely go with 16 of the
> smaller ones. 128G heaps are not going to work well in a JVM; you're
> better off running more nodes with a more common configuration.

I am not running one JVM per machine, right? Each Map/Reduce task uses its
own JVM, I believe.

Actually, my question boils down to whether the current map/reduce
scheduler is smart enough to make the best use of resources. If it is, I
think virtualization does not make much sense; if it is not, I guess
virtualization may help improve performance.

But you are right, I was really making up a case here -- "128GB mem" is
just double the smaller machine's memory.

> -Todd
>
> On Mon, Jun 7, 2010 at 1:46 PM, Jean-Daniel Cryans <[email protected]>
> wrote:
>
> > It really depends on your usage pattern, but there's a balance wrt
> > cost vs. hardware you must achieve. At StumbleUpon we run with 2x i7,
> > 24GB, 4x 1TB and it works like a charm. The only thing I would change
> > is maybe more disks/node, but that's pretty much it. Some relevant
> > questions:
> >
> > - Do you have any mem-intensive jobs? If so, figure out how many tasks
> >   you'll run per node and make the RAM fit the load.
> > - Do you plan to serve data out of HBase or will you just use it for
> >   MapReduce? Or will it be a mix (not recommended)?
> >
> > Also, keep in mind that losing 1 machine out of 8, compared to 1 out
> > of 16, drastically changes the performance of your system at the time
> > of the failure.
> >
> > About virtualization, it doesn't make sense. Also, your disks should
> > be in JBOD.
> >
> > J-D
> >
> > On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
> > <[email protected]> wrote:
> > > I have been thinking about the following problem lately, in the
> > > following context.
> > >
> > > I have a predefined budget and I can either
> > > -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU +
> > >    128GB mem + 16 x 1TB disks), or
> > > -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU +
> > >    64GB mem + 8 x 1TB disks).
> > > NOTE: I am basically making up a half-horsepower scenario.
> > > -- Let's say I am going to use a 10Gbps network switch and each
> > >    machine has a 10Gbps network card.
> > >
> > > In the above scenario, does A or B perform better, or about the
> > > same? I guess this really depends on Hadoop's map/reduce scheduler.
> > >
> > > A follow-up question: does it make sense to virtualize a Hadoop
> > > datanode at all? (If the answer to the above question is "about the
> > > same", I'd say it does not make sense.)
> > >
> > > Thanks,
> > > Sean
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
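The sizing advice above maps onto a handful of Hadoop 0.20-era
configuration properties: each task runs in its own child JVM, so per-node
RAM is budgeted as (map slots + reduce slots) x child heap, and JBOD means
one dfs.data.dir entry per physical disk. The snippet below is a minimal
sketch for option B's 64GB / 8-core / 8-disk nodes; the slot counts, heap
size, and mount points are illustrative assumptions, not values given in
the thread.

  <!-- mapred-site.xml: 6 map + 2 reduce slots at 2GB each = 16GB for
       tasks (assumed numbers), leaving headroom for the DataNode,
       TaskTracker, HBase RegionServer, and OS page cache. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>

  <!-- hdfs-site.xml: JBOD, one directory per physical disk (hypothetical
       mount points); HDFS round-robins block writes across the list. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data,/disk5/dfs/data,/disk6/dfs/data,/disk7/dfs/data,/disk8/dfs/data</value>
  </property>

Under this model, a 128GB option A node would carry roughly twice the
slots rather than one huge heap, and whether those slots stay busy is
exactly the scheduler question raised above. The failure math, by
contrast, is independent of the scheduler: losing 1 node of 8 removes
12.5% of cluster capacity (and triggers re-replication of 12.5% of the
blocks), versus 6.25% for 1 of 16.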
