I noticed recently that some Word Count jobs I've run are finishing with the MAP_INPUT_BYTES counter missing.
I'm using Hadoop 1.1.2 with a mostly default configuration on 5 nodes. The input was a single 100KB text file.

Questions:
- Is it normal for some final counter values to be missing?
- Is MAP_INPUT_BYTES the best way to determine the total input data size? (I read it programmatically, both while the job is running and after it has completed.)

The counters I did get:

Job Counters
  TOTAL_LAUNCHED_REDUCES: 1
  SLOTS_MILLIS_MAPS: 6006
  FALLOW_SLOTS_MILLIS_REDUCES: 0
  FALLOW_SLOTS_MILLIS_MAPS: 0
  TOTAL_LAUNCHED_MAPS: 1
  DATA_LOCAL_MAPS: 1
  SLOTS_MILLIS_REDUCES: 9293
File Output Format Counters
  BYTES_WRITTEN: 366752
FileSystemCounters
  FILE_BYTES_READ: 505552
  HDFS_BYTES_READ: 1085517
  FILE_BYTES_WRITTEN: 1122685
  HDFS_BYTES_WRITTEN: 366752
File Input Format Counters
  BYTES_READ: 1085357
Map-Reduce Framework
  MAP_OUTPUT_MATERIALIZED_BYTES: 505552
  MAP_INPUT_RECORDS: 19446
  REDUCE_SHUFFLE_BYTES: 505552
  SPILLED_RECORDS: 70358
  MAP_OUTPUT_BYTES: 1750111
  CPU_MILLISECONDS: 5700
  COMMITTED_HEAP_BYTES: 401997824
  COMBINE_INPUT_RECORDS: 181151
  SPLIT_RAW_BYTES: 160
  REDUCE_INPUT_RECORDS: 35179
  REDUCE_INPUT_GROUPS: 35179
  COMBINE_OUTPUT_RECORDS: 35179
  PHYSICAL_MEMORY_BYTES: 378482688
  REDUCE_OUTPUT_RECORDS: 35179
  VIRTUAL_MEMORY_BYTES: 1139838976
  MAP_OUTPUT_RECORDS: 181151

Here are most of the relevant screens from the JobTracker web interface: http://jsfiddle.net/Fguyy/2/embedded/result/

Here is the JobTracker log for the relevant time frame: http://pastebin.com/dvsMn4fB

Thanks!
Philippe

-------------------------------
*Philippe Signoret*
Skype: philippesignoret
+33 6 95 89 55 55
