It depends on your input data. For example, if your input consists of 10 files of 65 MB each, every file is split into 2 mappers (one 64 MB split plus one 1 MB split), so the job gets 20 mappers overall, but the actual input size is still 650 MB rather than 20 * 64 MB = 1280 MB. The last split of each file is usually smaller than a full block, so "number of splits * block size" overestimates the bytes actually read from HDFS.
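A rough back-of-the-envelope sketch in Java (hypothetical file sizes, not the actual FileInputFormat code) showing why the two numbers differ:

public class SplitEstimate {
    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;      // 64 MB block size, as in the example
        long[] fileSizes = new long[10];         // hypothetical: 10 files of 65 MB each
        java.util.Arrays.fill(fileSizes, 65L * 1024 * 1024);

        long splits = 0, actualBytes = 0;
        for (long size : fileSizes) {
            // ceil(size / blockSize): a 65 MB file yields 2 splits
            splits += (size + blockSize - 1) / blockSize;
            // HDFS only reads the bytes that really exist in the file
            actualBytes += size;
        }
        System.out.println("splits              = " + splits);                                  // 20
        System.out.println("splits * block size = " + splits * blockSize / (1 << 20) + " MB");  // 1280 MB
        System.out.println("actual input        = " + actualBytes / (1 << 20) + " MB");         // 650 MB
    }
}

The same effect explains your job: 2717 splits * 64 MB is only an upper bound, while the HDFS bytes-read counter reflects the real file sizes.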
On Tue, Dec 3, 2013 at 4:28 PM, ch huang <[email protected]> wrote:
> I ran an MR job, and in the MR output I see:
>
> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>
> Because my data block size is 64 MB, the total bytes should be 2717 * 64 MB / 1024 = 170 GB.
> But in the summary at the end I see the following info, where the HDFS bytes read is
> 126792190158 / 1024 / 1024 / 1024 = 118 GB. The two numbers are not very close. Why?
>
> File System Counters
> FILE: Number of bytes read=9642910241
> FILE: Number of bytes written=120327706125
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=126792190158
> HDFS: Number of bytes written=0
> HDFS: Number of read operations=8151
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=0
