I generally dislike graphs with offset references. Your run-time graph has a baseline of 1500 seconds, which makes it tricky for the reader to see your (entirely correct) point that having more than 8 machines isn't helpful. Plotted that way, the graph coincidentally looks like nearly perfect speedup across the entire range.
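To make the baseline point concrete, here is a rough sketch with made-up numbers (these are not your measurements). It assumes a simple model in which a fixed 1500 seconds never shrinks and a hypothetical 800 seconds of work divides evenly across machines, then compares the real speedup with what the eye reads off a plot whose y-axis starts at 1500.

```python
# All numbers are hypothetical and only illustrate the plotting effect.
FIXED_SECONDS = 1500.0    # assumed overhead that does not shrink with more machines
PARALLEL_SECONDS = 800.0  # assumed work that divides evenly across machines


def run_time(machines):
    """Total run time under the assumed fixed-plus-parallel model."""
    return FIXED_SECONDS + PARALLEL_SECONDS / machines


def true_speedup(machines):
    """Speedup relative to one machine, measured from zero."""
    return run_time(1) / run_time(machines)


def apparent_speedup(machines, baseline=FIXED_SECONDS):
    """Ratio of bar heights when the y-axis starts at `baseline` instead of zero."""
    return (run_time(1) - baseline) / (run_time(machines) - baseline)


if __name__ == "__main__":
    print("machines  time(s)  true  apparent")
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:8d}  {run_time(n):7.1f}  {true_speedup(n):4.2f}  {apparent_speedup(n):8.2f}")
```

Under these assumptions the apparent speedup equals the machine count exactly, while the true speedup can never exceed 2300/1500, about 1.53, no matter how many machines are added.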
No comments about your setup. My guess is that you could tune Hadoop to lower the overheads and get a better result, but the results won't be categorically different. Iterative algorithms on stock Hadoop are just plain problematic.

On Fri, Mar 4, 2011 at 4:33 PM, Danny Bickson <[email protected]> wrote:

> I would love to get any feedback you or others may have about the setup of
> this experiment.
