I have been thinking about the following problem lately. It came up in the
following context.
I have a predefined budget and I can either
-- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU + 128GB mem +
16 x 1TB disks), or
-- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU + 64GB mem +
8 x 1TB disks)
NOTE: I am basically making up a scenario where each B server has half the
horsepower of an A server
-- Let's say I am going to use a 10Gbps network switch and each machine has
a 10Gbps network card
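
To make the comparison concrete, here is a quick tally of the cluster-wide
totals for the two options. This is only a rough sketch: the class and field
names are made up for illustration, and the aggregate NIC figure assumes the
switch is never the bottleneck, which I have not verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterOption:
    """One purchase option; the names are illustrative, not from any tool."""
    servers: int
    cpus_per_server: int
    cores_per_cpu: int
    mem_gb_per_server: int
    disks_per_server: int   # each disk is 1TB
    nic_gbps_per_server: int

    def totals(self) -> dict:
        # Multiply the per-server figures by the server count.
        return {
            "cores": self.servers * self.cpus_per_server * self.cores_per_cpu,
            "mem_gb": self.servers * self.mem_gb_per_server,
            "disk_tb": self.servers * self.disks_per_server,
            # Assumption: sum of NIC speeds, ignoring switch limits.
            "nic_gbps": self.servers * self.nic_gbps_per_server,
        }


# Option A: 8 servers, 4 CPUs x 4 cores, 128GB, 16 x 1TB, 10Gbps NIC
option_a = ClusterOption(8, 4, 4, 128, 16, 10)
# Option B: 16 servers, 2 CPUs x 4 cores, 64GB, 8 x 1TB, 10Gbps NIC
option_b = ClusterOption(16, 2, 4, 64, 8, 10)

if __name__ == "__main__":
    for name, option in (("A", option_a), ("B", option_b)):
        print(name, option.totals())
```

Total cores, memory and disk come out identical for A and B. The only
differences on paper are that B has twice as many network cards in total, and
that a single server is 1/16 of the cluster in B versus 1/8 in A.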
In the above scenario, would A or B perform better, or would they be
relatively the same? I guess this really depends on Hadoop's MapReduce
scheduler.
I also have a follow-up question: does it make sense to virtualize a
Hadoop datanode at all? (If the answer to the question above is "relatively
the same", I'd say it does not make sense.)
Thanks,
Sean