It really depends on your usage pattern, but there's a cost-vs-hardware balance you have to strike. At StumbleUpon we run with 2x i7, 24GB RAM, and 4x 1TB disks per node, and it works like a charm. The only thing I would change is maybe more disks per node, but that's pretty much it. Some relevant questions:
- Do you have any memory-intensive jobs? If so, figure out how many tasks
  you'll run per node and make the RAM fit the load (see the sketch below
  the quote for the arithmetic).
- Do you plan to serve data out of HBase, or will you just use it for
  MapReduce? Or will it be a mix (not recommended)?

Also, keep in mind that losing 1 machine out of 8 changes the performance
of your system at failure time much more drastically than losing 1 out of
16 (numbers below).

About virtualization: it doesn't make sense for a datanode. Also, your
disks should be in JBOD, not RAID (config snippet below).

J-D

On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun <[email protected]> wrote:
> I have been thinking about the following problem lately. I started
> thinking about it in the following context.
>
> I have a predefined budget and I can either
> -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU + 128GB RAM
> + 16 x 1TB disks), or
> -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU + 64GB RAM
> + 8 x 1TB disks).
> NOTE: I am basically making up a half-horsepower scenario.
> -- Let's say I am going to use a 10Gbps network switch and each machine
> has a 10Gbps network card.
>
> In the above scenario, does A or B perform better, or about the same? I
> guess this really depends on Hadoop's MapReduce scheduler.
>
> And then I have a follow-up question: does it make sense to virtualize a
> Hadoop datanode at all? (If the answer to the above question is "about
> the same", I'd say it does not make sense.)
>
> Thanks,
> Sean
>
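
For what it's worth, here's a quick back-of-envelope sketch of both the RAM
math and the failure math. All the inputs (slot counts, heap sizes, daemon
overheads) are made-up numbers for illustration, not recommendations; plug
in your own:

# Rough capacity-planning arithmetic. Inputs are illustrative only.

def ram_needed_gb(map_slots, reduce_slots, child_heap_gb,
                  datanode_gb=1.0, tasktracker_gb=1.0,
                  os_gb=2.0, hbase_gb=4.0):
    """RAM a node needs if every task slot is busy at the same time,
    plus the DataNode/TaskTracker daemons, the OS, and (if you also
    serve data) a region server heap."""
    tasks = map_slots + reduce_slots
    return (tasks * child_heap_gb + datanode_gb + tasktracker_gb
            + os_gb + hbase_gb)

# e.g. 8 map + 4 reduce slots at 1GB per child JVM:
print(ram_needed_gb(8, 4, 1.0))   # 20.0 -> fits in a 24GB node

def failure_impact(nodes, disks_per_node, disk_tb):
    """Fraction of cluster capacity lost, and TB of blocks HDFS has to
    re-replicate, when a single node dies."""
    return 1.0 / nodes, disks_per_node * disk_tb

print(failure_impact(8, 16, 1))   # option A: (0.125, 16)
print(failure_impact(16, 8, 1))   # option B: (0.0625, 8)

So with option A a dead node takes out 12.5% of the cluster and leaves 16TB
to re-replicate across 7 survivors; with option B it's 6.25% and 8TB spread
across 15 survivors.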
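
And to make the JBOD point concrete: skip the RAID controller and give HDFS
one data directory per physical disk in hdfs-site.xml (dfs.data.dir takes a
comma-separated list; the mount points here are just examples):

<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data</value>
</property>

The datanode round-robins block writes across those directories itself, so
you get all the spindles without paying for (or being slowed down by) RAID.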
