On Mon, Jun 7, 2010 at 1:13 PM, Todd Lipcon <[email protected]> wrote:
> If those are your actual specs, I would definitely go with 16 of the
> smaller ones. 128G heaps are not going to work well in a JVM; you're
> better off running more nodes with a more common configuration.

I am not running one JVM per machine, right? Each Map/Reduce task uses its
own JVM, I believe.

Actually, my question boils down to whether the current map/reduce
scheduler is smart enough to make the best use of resources. If it is, I
think virtualization does not make much sense; if it is not, I guess
virtualization may help improve performance.

But you are right, I was really making up a case here -- "128GB mem" is
just double the smaller machine's memory.

> -Todd
>
> On Mon, Jun 7, 2010 at 1:46 PM, Jean-Daniel Cryans <[email protected]>
> wrote:
>
> > It really depends on your usage pattern, but there's a balance wrt
> > cost vs. hardware you must achieve. At StumbleUpon we run with 2x i7,
> > 24GB, 4x 1TB and it works like a charm. The only thing I would change
> > is maybe more disks/node, but that's pretty much it. Some relevant
> > questions:
> >
> > - Do you have any mem-intensive jobs? If so, figure out how many tasks
> >   you'll run per node and make the RAM fit the load.
> > - Do you plan to serve data out of HBase or will you just use it for
> >   MapReduce? Or will it be a mix (not recommended)?
> >
> > Also, keep in mind that losing 1 machine out of 8, compared to 1 out
> > of 16, drastically changes the performance of your system at the time
> > of the failure.
> >
> > About virtualization, it doesn't make sense. Also, your disks should
> > be in JBOD.
> >
> > J-D
> >
> > On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
> > <[email protected]> wrote:
> > > I have been thinking about the following problem lately, in the
> > > following context.
> > >
> > > I have a predefined budget and I can either
> > > -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU +
> > >    128GB mem + 16 x 1TB disks), or
> > > -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU +
> > >    64GB mem + 8 x 1TB disks).
> > > NOTE: I am basically making up a half-horsepower scenario.
> > > -- Let's say I am going to use a 10Gbps network switch and each
> > >    machine has a 10Gbps network card.
> > >
> > > In the above scenario, does A or B perform better, or about the
> > > same? I guess this really depends on Hadoop's map/reduce scheduler.
> > >
> > > A follow-up question: does it make sense to virtualize a Hadoop
> > > datanode at all? (If the answer to the above question is "about the
> > > same", I'd say it does not make sense.)
> > >
> > > Thanks,
> > > Sean
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
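The sizing advice above maps onto a handful of Hadoop 0.20-era
configuration properties: each task runs in its own child JVM, so per-node
RAM is budgeted as (map slots + reduce slots) x child heap, and JBOD means
one dfs.data.dir entry per physical disk. The snippet below is a minimal
sketch for option B's 64GB / 8-core / 8-disk nodes; the slot counts, heap
size, and mount points are illustrative assumptions, not values given in
the thread.

  <!-- mapred-site.xml: 6 map + 2 reduce slots at 2GB each = 16GB for
       tasks (assumed numbers), leaving headroom for the DataNode,
       TaskTracker, HBase RegionServer, and OS page cache. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>

  <!-- hdfs-site.xml: JBOD, one directory per physical disk (hypothetical
       mount points); HDFS round-robins block writes across the list. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data,/disk5/dfs/data,/disk6/dfs/data,/disk7/dfs/data,/disk8/dfs/data</value>
  </property>

Under this model, a 128GB option A node would carry roughly twice the
slots rather than one huge heap, and whether those slots stay busy is
exactly the scheduler question raised above. The failure math, by
contrast, is independent of the scheduler: losing 1 node of 8 removes
12.5% of cluster capacity (and triggers re-replication of 12.5% of the
blocks), versus 6.25% for 1 of 16.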
