Yes, I've often wondered about asymmetric configurations. Is there a mechanism that lets map/reduce jobs be aware of differences in processor speed and allocate less work to the slower processors?
To try to answer the question here: I have not had much experience with multi-node clusters, but I'd start by checking whether all 4 cores are being used, especially in the part of the process that takes the longest (Amdahl's law). Adding a node can only give you a speedup if that is already happening.

Here are a few other questions I go through:

Does the process take very long? At the very least, the task should take longer than twice the time it takes you to switch on and boot up the other computer, rebalance HDFS, run the job, and switch off the computer, plus all the time invested in figuring out how to use and maintain the multi-node configuration.

How often do you need to run the job? If it is only once a day, and it can be run in the background or while the processor is not busy, perhaps you can schedule it on your PC for when you are taking a break.

Are you developing code? If so, it is probably more efficient to run on one computer and test with a small chunk of data.

So, in summary, I'd use multiple computers as a last resort. Multi-core is good enough for me most of the time.

Thanks,
-Ajo.

On Sun, Feb 13, 2011 at 4:58 PM, Cam Bazz <camb...@gmail.com> wrote:
> Hello,
>
> So all my statistics is finally being calculated, results being
> processed etc, i have a 1 node cluster. Mainly taking 3 aggreate logs
> from my apache logs.
>
> How far this setup will go? I have another machine ready to be hooked
> up to my setup, and i wonder if it is worth at the moment to add this
> and be a 2 node cluster.
>
> The first node has 8gb ram and a quad core 3.0ghz, The second computer
> I have is much more noisy, and spends more electricity. İt has 8gb
> ram and dual opterons with dual cores - and running at 2.0ghz.
>
> Best Regards,
> C.B.
>
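
P.S. To put rough numbers on the Amdahl's law and break-even points in the reply above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the parallel fraction, and the timings are all made up for the example, and it treats every core as equally fast, which is optimistic for the 2.0 GHz Opteron box.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for `workers` equal-speed workers.

    parallel_fraction is the share of single-worker runtime that can run
    in parallel (0.0 to 1.0). This is a guess you have to measure yourself.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def seconds_saved(job_seconds, parallel_fraction, workers_before, workers_after):
    """Estimated seconds saved per run when going from workers_before to workers_after.

    job_seconds is the runtime measured on the current setup.
    """
    # Convert the measured runtime back to a hypothetical single-worker runtime.
    single_worker_seconds = job_seconds * amdahl_speedup(parallel_fraction, workers_before)
    after_seconds = single_worker_seconds / amdahl_speedup(parallel_fraction, workers_after)
    return job_seconds - after_seconds


def worth_adding_node(job_seconds, overhead_seconds, parallel_fraction,
                      workers_before=4, workers_after=8):
    """True if the estimated saving per run exceeds the per-run overhead.

    overhead_seconds stands for boot-up, HDFS rebalancing and shutdown time
    (hypothetical figure; measure it on your own hardware).
    """
    saved = seconds_saved(job_seconds, parallel_fraction, workers_before, workers_after)
    return saved > overhead_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 90% of the job parallelises, 5 minutes of overhead.
    for job in (600, 3600):
        print(job, worth_adding_node(job, overhead_seconds=300, parallel_fraction=0.9))
```

With these made-up numbers a 10-minute job is not worth the second node, while a one-hour job is. The real answer depends on the measured parallel fraction and on how much slower the second machine's cores are.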