On Thu, Jun 10, 2010 at 11:27 AM, Buttler, David <[email protected]> wrote:
> It turns out that we just received a quote from a supplier where a rack
> of 2U 128 GB machines with 16 cores (4x4 I think) and 8 1TB disks is
> cheaper than a rack of 1U machines with exactly half the spec (64 GB
> RAM, 8 core, 4 1TB disks). My initial thought was that it would be
> better to have the 2U machines as it would give us more flexibility if
> we wanted to have some map/reduce jobs that use more than 8 GB per map
> task.

Both of those configurations are very RAM-heavy for typical workloads. I
would recommend as a standard "balanced" configuration: 8 core, 6 disk,
24G RAM. Also, take a look at:
http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
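For example, the RAM on a node like that might be budgeted roughly as
follows. This is a minimal sketch in Python; the daemon heaps, slot
counts, and task heap are illustrative assumptions, not tested
recommendations:

    GB = 1024  # working in MB

    datanode_heap     = 1 * GB  # HDFS DataNode daemon
    tasktracker_heap  = 1 * GB  # MapReduce TaskTracker daemon
    regionserver_heap = 8 * GB  # HBase region server, if colocated
    os_headroom       = 2 * GB  # OS, buffers, page cache

    map_slots, reduce_slots = 6, 2  # roughly one task per core
    task_heap = 1 * GB              # e.g. mapred.child.java.opts -Xmx1024m

    tasks = (map_slots + reduce_slots) * task_heap
    used = (datanode_heap + tasktracker_heap + regionserver_heap +
            os_headroom + tasks)
    print("budgeted %d MB of %d MB" % (used, 24 * GB))
    # budgeted 20480 MB of 24576 MB

The point is that once the daemons and page cache get their share, a
24G node already carries a full slot-per-core load with room to spare.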
-Todd

> The only worry is how it would affect HBase. Would it be better to have
> 20 region servers with a 16GB heap and 2 dedicated cores, or 40 region
> servers with an 8GB heap and one core? [Of course I realize we can't
> dedicate a core to a region server, but we can limit the number of
> map/reduce jobs so that there would be no more than 14 or 7 of them,
> depending on the configuration.]
>
> Finally, it seems like there are a bunch of related parameters that make
> sense to change together depending on heap size and average row size. Is
> there a single place that describes the interrelatedness of the
> parameters, so that I don't have to guess or reconstruct good settings
> from 10-100 emails on the list? If I understood the issues I would be
> happy to write it up, but I am afraid I don't.
>
> Thanks,
> Dave
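On the parameter question: the key relationship is that the big limits
are fractions of the region server heap, while the per-region knobs are
absolute sizes. A minimal sketch (the parameter names are real; the
fractions are illustrative defaults, so check hbase-default.xml for the
version you run):

    # How heap-linked settings scale when the region server heap changes.
    def heap_breakdown(heap_gb):
        memstore_frac    = 0.40  # hbase.regionserver.global.memstore.upperLimit
        block_cache_frac = 0.20  # hfile.block.cache.size
        return (heap_gb * memstore_frac,
                heap_gb * block_cache_frac,
                heap_gb * (1 - memstore_frac - block_cache_frac))

    for heap in (8, 16):
        memstores, cache, rest = heap_breakdown(heap)
        print("%2dGB heap: %4.1fGB memstores, %4.1fGB block cache, %4.1fGB other"
              % (heap, memstores, cache, rest))

Doubling the heap doubles the global memstore and block cache budgets,
but a per-region knob like hbase.hregion.memstore.flush.size stays
fixed, so a bigger heap mostly buys room for more regions per server
rather than bigger flushes.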
> -----Original Message-----
> From: Ryan Rawson [mailto:[email protected]]
> Sent: Monday, June 07, 2010 10:51 PM
> To: [email protected]
> Subject: Re: Big machines or (relatively) small machines?
>
> I would take it one notch smaller; 32GB RAM per node is probably more
> than enough...
>
> It would be hard to get full utilization of 128GB RAM, and maybe even
> 64GB. With 32GB you might even be able to get 2GB DIMMs (much cheaper).
>
> -ryan
>
> On Mon, Jun 7, 2010 at 10:48 PM, Sean Bigdatafun
> <[email protected]> wrote:
> > On Mon, Jun 7, 2010 at 1:13 PM, Todd Lipcon <[email protected]> wrote:
> >
> >> If those are your actual specs, I would definitely go with 16 of the
> >> smaller ones. 128G heaps are not going to work well in a JVM; you're
> >> better off running more nodes with a more common configuration.
> >>
> >
> > I am not using one JVM on a machine, right? Each map/reduce task uses
> > one JVM, I believe. And actually, my question can really be boiled
> > down to whether the current map/reduce scheduler is smart enough to
> > make the best use of resources. If it is smart enough, I think
> > virtualization does not make too much sense; if it's not smart enough,
> > I guess virtualization may help to improve performance.
> >
> > But you are right, here I was really making up a case -- "128G mem" is
> > just double the "smaller machine"'s memory.
> >
> >> -Todd
> >>
> >> On Mon, Jun 7, 2010 at 1:46 PM, Jean-Daniel Cryans
> >> <[email protected]> wrote:
> >>
> >> > It really depends on your usage pattern, but there's a balance
> >> > between cost and hardware you must achieve. At StumbleUpon we run
> >> > with 2x i7, 24GB, 4x 1TB and it works like a charm. The only thing
> >> > I would change is maybe more disks/node, but that's pretty much it.
> >> > Some relevant questions:
> >> >
> >> > - Do you have any mem-intensive jobs? If so, figure out how many
> >> > tasks you'll run per node and make the RAM fit the load.
> >> > - Do you plan to serve data out of HBase or will you just use it
> >> > for MapReduce? Or will it be a mix (not recommended)?
> >> >
> >> > Also, keep in mind that losing 1 machine out of 8, compared to 1
> >> > out of 16, drastically changes the performance of your system at
> >> > the time of the failure.
> >> >
> >> > About virtualization: it doesn't make sense. Also, your disks
> >> > should be in JBOD.
> >> >
> >> > J-D
> >> >
> >> > On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
> >> > <[email protected]> wrote:
> >> > > I have been thinking about the following problem lately. I
> >> > > started thinking of it in the following context.
> >> > >
> >> > > I have a predefined budget and I can either
> >> > > -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU +
> >> > > 128GB mem + 16 x 1TB disks), or
> >> > > -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU +
> >> > > 64GB mem + 8 x 1TB disks)
> >> > > NOTE: I am basically making up a half-horsepower scenario
> >> > > -- Let's say I am going to use a 10Gbps network switch and each
> >> > > machine has a 10Gbps network card
> >> > >
> >> > > In the above scenario, does A or B perform better, or relatively
> >> > > the same? -- I guess this really depends on Hadoop's map/reduce
> >> > > scheduler.
> >> > >
> >> > > And then I have a follow-up question: does it make sense to
> >> > > virtualize a Hadoop datanode at all? (If the answer to the above
> >> > > question is "relatively the same", I'd say it does not make
> >> > > sense.)
> >> > >
> >> > > Thanks,
> >> > > Sean
> >> > >
> >> >
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>

--
Todd Lipcon
Software Engineer, Cloudera
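P.S. J-D's point about failure impact is easy to quantify for Sean's two
configurations. A quick sketch, assuming the dead node was full and the
data is spread evenly:

    # Losing one node: capacity gone, and re-replication load on the rest.
    def failure_impact(nodes, disks_per_node, tb_per_disk=1.0):
        capacity_lost = 1.0 / nodes
        lost_tb = disks_per_node * tb_per_disk   # data to re-replicate
        per_survivor_tb = lost_tb / (nodes - 1)  # spread over survivors
        return capacity_lost, per_survivor_tb

    for nodes, disks in ((8, 16), (16, 8)):
        lost, per_node = failure_impact(nodes, disks)
        print("%2d nodes: %5.2f%% of capacity gone, ~%.2f TB to"
              " re-replicate per survivor" % (nodes, lost * 100, per_node))
    # -> " 8 nodes: 12.50% of capacity gone, ~2.29 TB to re-replicate per survivor"
    # -> "16 nodes:  6.25% of capacity gone, ~0.53 TB to re-replicate per survivor"

The bigger-node cluster not only loses twice the share of its capacity
on a failure, each survivor also has to absorb roughly four times the
re-replication traffic.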