Obviously the algorithm matters. Here are some very old numbers (things are much better today), but they do show the 'linear' scaling with both nodes and dataset size:
http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/

100 TB Sort - 97 mins
1000 TB Sort - 975 mins

Arun

On Jan 17, 2013, at 7:09 PM, Thiago Vieira wrote:

> Hello!
>
> Is common to see this sentence: "Hadoop Scales Linearly". But, is there any
> performance evaluation to confirm this?
>
> In my evaluations, Hadoop processing capacity scales linearly, but not
> proportional to number of nodes, the processing capacity achieved with 20
> nodes is not the double of the processing capacity achieved with 10 nodes. Is
> there any evaluation about this?
>
> Thank you!
>
> --
> Thiago Vieira

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
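As a rough sanity check on the two sort times quoted above, here is a minimal Python sketch that compares how the elapsed time grows with the dataset. The variable and function names are only illustrative. The message does not give the cluster size for either run, so this only looks at scaling with data size, not with node count.

```python
# Sort times quoted in the message: (dataset size in TB, elapsed minutes).
runs = [(100, 97), (1000, 975)]


def throughput_tb_per_min(size_tb, minutes):
    """Average sort throughput for one run, in TB per minute."""
    return size_tb / minutes


# Throughput of each run; roughly equal values mean time grows linearly with data.
rates = [throughput_tb_per_min(size, mins) for size, mins in runs]

# How much larger the second dataset is, and how much longer it took to sort.
data_ratio = runs[1][0] / runs[0][0]
time_ratio = runs[1][1] / runs[0][1]

for (size, mins), rate in zip(runs, rates):
    print(f"{size} TB in {mins} min: {rate:.3f} TB/min")
print(f"data grew {data_ratio:.1f}x, time grew {time_ratio:.2f}x")
```

If the time ratio stays close to the data ratio, the sort time is growing about linearly with the input. The same comparison could be made for node counts (capacity at 20 nodes against capacity at 10 nodes), given measurements for both cluster sizes.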
