On Thu, Jun 10, 2010 at 11:27 AM, Buttler, David <[email protected]> wrote:

> It turns out that we just received a quote from a supplier where a rack of
> 2U 128GB machines with 16 cores (4x4, I think) and eight 1TB disks is
> cheaper than a rack of 1U machines with exactly half the spec (64GB RAM, 8
> cores, four 1TB disks).  My initial thought was that it would be better to
> have the 2U machines, as that would give us more flexibility if we wanted
> to run some map/reduce jobs that use more than 8GB per map task.
>

Both of those configurations are very RAM-heavy for typical workloads.

I would recommend, as a standard "balanced" configuration: 8 cores, 6 disks,
24GB RAM.

Also, take a look at:
http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/

-Todd




> The only worry is how it would affect HBase.  Would it be better to have 20
> region servers with a 16GB heap and 2 dedicated cores, or 40 region servers
> with an 8GB heap and one core? [Of course I realize we can't dedicate a core
> to a region server, but we can limit the number of map/reduce tasks so that
> there would be no more than 14 or 7 of them per node, depending on the
> configuration]
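>
> For illustration, capping tasks per node would, as far as I can tell, be
> done with something like this in mapred-site.xml (the numbers are just the
> ones from my scenario above):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>14</value>  <!-- or 7 on the half-spec nodes -->
>   </property>
>
> with the region server heap set via HBASE_HEAPSIZE in hbase-env.sh.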
>
> Finally, it seems like there are a bunch of related parameters that make
> sense to change together depending on heap size and average row size.  Is
> there a single place that describes how the parameters interrelate, so that
> I don't have to guess or reconstruct good settings from 10-100 emails on
> the list?  If I understood the issues I would be happy to write it up, but
> I am afraid I don't.
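>
> For example, these are the kinds of parameters I mean (the names are the
> usual hbase-site.xml knobs; the values are just placeholders, not a
> recommendation):
>
>   <property>
>     <name>hbase.regionserver.global.memstore.upperLimit</name>
>     <value>0.4</value>  <!-- fraction of heap all memstores may use -->
>   </property>
>   <property>
>     <name>hbase.hregion.memstore.flush.size</name>
>     <value>67108864</value>  <!-- 64MB per-region flush threshold -->
>   </property>
>   <property>
>     <name>hfile.block.cache.size</name>
>     <value>0.2</value>  <!-- fraction of heap for the block cache -->
>   </property>
>
> It's not obvious to me how these should move together as heap size and row
> size change.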
>
> Thanks,
> Dave
>
> -----Original Message-----
> From: Ryan Rawson [mailto:[email protected]]
> Sent: Monday, June 07, 2010 10:51 PM
> To: [email protected]
> Subject: Re: Big machines or (relatively) small machines?
>
> I would take it one notch smaller; 32GB of RAM per node is probably more
> than enough...
>
> It would be hard to get full utilization of 128GB of RAM, and maybe even
> of 64GB.  With 32GB you might even be able to use 2GB DIMMs (much
> cheaper).
>
> -ryan
>
> On Mon, Jun 7, 2010 at 10:48 PM, Sean Bigdatafun
> <[email protected]> wrote:
> > On Mon, Jun 7, 2010 at 1:13 PM, Todd Lipcon <[email protected]> wrote:
> >
> >> If those are your actual specs, I would definitely go with 16 of the
> >> smaller ones. 128G heaps are not going to work well in a JVM; you're
> >> better off running more nodes with a more common configuration.
> >>
> >
> > I am not running one JVM per machine, right? Each Map/Reduce task uses
> > its own JVM, I believe. And actually, my question can really be boiled
> > down to whether the current map/reduce scheduler is smart enough to make
> > the best use of resources. If it is smart enough, I think virtualization
> > does not make much sense; if it's not smart enough, I guess
> > virtualization may help to improve performance.
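> >
> > (For instance, the per-task heap is, as I understand it, controlled by
> > something like this in mapred-site.xml -- the -Xmx value here is made up:
> >
> >   <property>
> >     <name>mapred.child.java.opts</name>
> >     <value>-Xmx2048m</value>  <!-- heap for each task's child JVM -->
> >   </property>
> >
> > so each task slot gets its own JVM of that size, independent of any
> > other daemon's heap.)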
> >
> > But you are right, here I was really just making up a scenario -- "128GB
> > mem" is simply double the smaller machine's memory.
> >
> >
> >
> >>
> >> -Todd
> >>
> >> On Mon, Jun 7, 2010 at 1:46 PM, Jean-Daniel Cryans
> >> <[email protected]> wrote:
> >>
> >> > It really depends on your usage pattern, but there's a balance
> >> > between cost and hardware that you must strike. At StumbleUpon we run
> >> > with 2x i7, 24GB RAM, 4x 1TB and it works like a charm. The only
> >> > thing I would change is maybe more disks per node, but that's pretty
> >> > much it. Some relevant questions:
> >> >
> >> >  - Do you have any memory-intensive jobs? If so, figure out how many
> >> > tasks you'll run per node and make the RAM fit the load (see the
> >> > back-of-the-envelope example after this list).
> >> >  - Do you plan to serve data out of HBase, or will you just use it
> >> > for MapReduce? Or will it be a mix (not recommended)?
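> >> >
> >> > A back-of-the-envelope example, with made-up per-task numbers: 8 map
> >> > slots x 2GB per task = 16GB, plus a 4GB region server heap, plus
> >> > ~1GB each for the datanode and tasktracker, plus a couple GB for the
> >> > OS, lands right around our 24GB nodes. Redo that math with your own
> >> > task heaps.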
> >> >
> >> > Also, keep in mind that losing 1 machine out of 8 changes the
> >> > performance of your system at the time of the failure much more
> >> > drastically than losing 1 out of 16.
> >> >
> >> > About virtualization: it doesn't make sense here. Also, your disks
> >> > should be in JBOD.
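> >> >
> >> > (By JBOD I mean each disk mounted and listed separately, e.g. in
> >> > hdfs-site.xml -- substitute your actual mount points:
> >> >
> >> >   <property>
> >> >     <name>dfs.data.dir</name>
> >> >     <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
> >> >   </property>
> >> >
> >> > rather than striping the disks together with RAID.)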
> >> >
> >> > J-D
> >> >
> >> > On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
> >> > <[email protected]> wrote:
> >> > > I have been thinking about the following problem lately, in this
> >> > > context:
> >> > >
> >> > > I have a predefined budget and I can either
> >> > >  -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU +
> >> > > 128GB mem + 16x 1TB disks), or
> >> > >  -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU +
> >> > > 64GB mem + 8x 1TB disks).
> >> > >          NOTE: I am basically making up a half-horsepower scenario.
> >> > >  -- Let's say I am going to use a 10Gbps network switch and each
> >> > > machine has a 10Gbps network card.
> >> > >
> >> > > In the above scenario, does A or B perform better, or do they
> >> > > perform relatively the same? -- I guess this really depends on
> >> > > Hadoop's map/reduce scheduler.
> >> > >
> >> > > And then I have a follow-up question: does it make sense to
> >> > > virtualize a Hadoop datanode at all?  (If the answer to the above
> >> > > question is "relatively the same", I'd say it does not make sense.)
> >> > >
> >> > > Thanks,
> >> > > Sean
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
