Hi Ch Huang,

Are you sure your input data size is 170G? It is not necessarily true that 2717 splits amount to 170G (per your calculation, 2717 * 64M / 1024). Each file is treated as at least one separate split, and a file may be smaller than the block size.

Please cross-check the input size using the CLI.

Regards,
Nishan

From: ch huang [mailto:[email protected]]
Sent: 03 December 2013 01:58 PM
To: [email protected]
Subject: issue about total input byte of MR job

I ran an MR job, and in the MR output I see:

13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717

Because my data block size is 64M, the total bytes should be 2717 * 64M / 1024 = 170G. But in the summary at the end I see the following, so the HDFS bytes read are 126792190158 / 1024 / 1024 / 1024 = 118G. The two numbers are not very close; why?

File System Counters
    FILE: Number of bytes read=9642910241
    FILE: Number of bytes written=120327706125
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=126792190158
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=8151
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
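The gap Nishan describes can be illustrated with a small sketch. The file sizes below are hypothetical, and the split rule is a simplification of the default FileInputFormat behavior (roughly one split per full or partial block of each file): a file's last partial block, or a file smaller than a block, still counts as a whole split, so splits * block size overestimates the true input bytes.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS block size from the thread

# Hypothetical input: a few files, some much smaller than one block.
file_sizes = [200 * 1024 * 1024, 10 * 1024 * 1024, 3 * 1024 * 1024]

def count_splits(size, block=BLOCK_SIZE):
    """Simplified split rule: one split per full or partial block,
    and at least one split even for a tiny file."""
    return max(1, -(-size // block))  # -(-a // b) is ceiling division

splits = sum(count_splits(s) for s in file_sizes)
actual_bytes = sum(file_sizes)
estimate = splits * BLOCK_SIZE  # the "2717 * 64M" style of estimate

print(splits)                    # 6 splits for these three files
print(estimate - actual_bytes)   # the overestimate from partial blocks
```

For these files the estimate is 6 * 64M = 384M while the actual input is only 213M, which is the same kind of discrepancy as 170G estimated versus 118G actually read. Checking the real input size with `hadoop fs -du -s <input-path>` (path is whatever the job actually reads) settles the question.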
