Hi,

I ran TestDFSIO on my Hadoop cluster:

    hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240

The generated report is:

    12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
    12/08/30 01:31:34 INFO fs.TestDFSIO: Date & time: Thu Aug 30 01:31:34 CDT 2012
    12/08/30 01:31:34 INFO fs.TestDFSIO: Number of files: 100
    12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0
    12/08/30 01:31:34 INFO fs.TestDFSIO: Throughput mb/sec: 5.54130695296031
    12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516
    12/08/30 01:31:34 INFO fs.TestDFSIO: IO rate std deviation: 1.503623716482166
    12/08/30 01:31:34 INFO fs.TestDFSIO: Test exec time sec: 3490.168

I was referring to this blog post:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Based on my understanding of that post, I calculated the throughput from the total MBytes processed and the test execution time:

    Throughput = (1024000*1000)/3490.168 = 293395.61

which is of course not the throughput shown in the report. Then I found a file in the HDFS output directory of the job:

    hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000

which gave me this:

    f:rate 587506.5
    f:sqrate 3677727.2
    l:size 1073741824000
    l:tasks 100
    l:time 184793950

I then used this time in the same formula:

    Throughput = (1024000*1000)/184793950 = 5.541

which matches the reported throughput.

Can someone tell me what exactly this time in the HDFS output file "part-00000" represents?

Thanks,
Gaurav Dasgupta
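For reference, the arithmetic above can be reproduced from the part-00000 fields with a short sketch. It rests on assumptions the post does not confirm: that l:time is a sum of per-task times in milliseconds, that f:rate and f:sqrate are per-task sums scaled by 1000, and that 1 MB is 1024*1024 bytes. These are inferred only from the fact that the resulting numbers line up with the report. The function names are hypothetical and not part of Hadoop.

```python
import math

MEGA = 1024 * 1024  # assumed bytes per MB; consistent with "Total MBytes processed"


def parse_part_file(text):
    """Parse 'f:rate 587506.5 l:size ...' into {'rate': 587506.5, 'size': ...}."""
    tokens = text.split()
    fields = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        type_tag, name = key.split(":", 1)
        # 'f:' prefixes float values, 'l:' prefixes integer (long) values
        fields[name] = float(value) if type_tag == "f" else int(value)
    return fields


def summarize(fields):
    """Recompute the report figures under the assumptions stated above."""
    total_mb = fields["size"] / MEGA
    # Assumption: 'time' is milliseconds summed over all tasks
    throughput = total_mb * 1000.0 / fields["time"]
    # Assumption: 'rate' and 'sqrate' are per-task sums scaled by 1000
    avg_rate = fields["rate"] / 1000.0 / fields["tasks"]
    variance = abs(fields["sqrate"] / 1000.0 / fields["tasks"] - avg_rate ** 2)
    return {
        "total_mb": total_mb,
        "throughput_mb_sec": throughput,
        "avg_io_rate_mb_sec": avg_rate,
        "io_rate_std_dev": math.sqrt(variance),
    }


if __name__ == "__main__":
    part = ("f:rate 587506.5 f:sqrate 3677727.2 "
            "l:size 1073741824000 l:tasks 100 l:time 184793950")
    for name, value in summarize(parse_part_file(part)).items():
        print(f"{name}: {value}")
```

Under these assumptions the sketch gives a throughput of about 5.54 MB/s and an average I/O rate of about 5.875 MB/s, in line with the report. The divisor is therefore the accumulated per-task time and not the wall-clock "Test exec time", which would explain why the first calculation was so far off.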
