This is the input; please help with a code example.
<Company>
  <Employee>
    <id>100</id>
    <ename>ert</ename>
    <Address>
      <home>eewre</home>
      <office>wefwef</office>
    </Address>
  </Employee>
</Company>

On Tue, Dec 3, 2013 at 2:11 PM, Jeff Zhang <[email protected]> wrote:

> It depends on your input data. E.g. if your input consists of 10 files, each
> 65M, then each file takes 2 mappers, so overall it costs 20 mappers, but the
> input size is actually 650M rather than 20*64M = 1280M.
>
>
> On Tue, Dec 3, 2013 at 4:28 PM, ch huang <[email protected]> wrote:
>
>> I ran the MR job, and in the MR output I see
>>
>> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>>
>> Because each of my data blocks is 64M, the total should be 2717*64M/1024 = 170G.
>>
>> But in the summary at the end I see the following, so the HDFS bytes read are
>> 126792190158/1024/1024/1024 = 118G. The two numbers are not very close; why?
>>
>> File System Counters
>> FILE: Number of bytes read=9642910241
>> FILE: Number of bytes written=120327706125
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=126792190158
>> HDFS: Number of bytes written=0
>> HDFS: Number of read operations=8151
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=0
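
For the <Employee> record at the top of this message, here is a minimal mapper sketch. It assumes an XML-aware input format (for example Mahout's XmlInputFormat, or any record reader configured with <Employee> as the start tag and </Employee> as the end tag) hands each map call one complete <Employee>...</Employee> block as its value. The class name EmployeeXmlMapper and the tab-separated output are illustrative choices, not something from this thread.

import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EmployeeXmlMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // Parse the single <Employee>...</Employee> record passed in as the map value.
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(value.toString())));

            String id     = doc.getElementsByTagName("id").item(0).getTextContent();
            String ename  = doc.getElementsByTagName("ename").item(0).getTextContent();
            String home   = doc.getElementsByTagName("home").item(0).getTextContent();
            String office = doc.getElementsByTagName("office").item(0).getTextContent();

            // Emit the employee id as the key and the remaining fields tab-separated.
            context.write(new Text(id), new Text(ename + "\t" + home + "\t" + office));
        } catch (Exception e) {
            // Count and skip records that are not well-formed XML instead of failing the task.
            context.getCounter("EmployeeXmlMapper", "MALFORMED_RECORD").increment(1);
        }
    }
}

Whatever XML-aware input format you pick, the important point is that it is configured with the record's start and end tags so an HDFS block boundary cannot cut a record in half; the mapper itself then only has to parse one small, complete record at a time.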
