I noticed recently that some Word Count jobs I've run are finishing with
the MAP_INPUT_BYTES counter missing.

I'm using Hadoop 1.1.2 on 5 nodes with a mostly default configuration. The
input was a single 100KB text file.

Questions:

   - Is it normal for some final counter values not to be present?
   - Is MAP_INPUT_BYTES the best way to determine the total input data size?
   (I read it programmatically, both while the job is running and after it
   completes.)

The counters I did get:

Job Counters
 TOTAL_LAUNCHED_REDUCES: 1
 SLOTS_MILLIS_MAPS: 6006
 FALLOW_SLOTS_MILLIS_REDUCES: 0
 FALLOW_SLOTS_MILLIS_MAPS: 0
 TOTAL_LAUNCHED_MAPS: 1
 DATA_LOCAL_MAPS: 1
 SLOTS_MILLIS_REDUCES: 9293
File Output Format Counters
 BYTES_WRITTEN: 366752
FileSystemCounters
 FILE_BYTES_READ: 505552
 HDFS_BYTES_READ: 1085517
 FILE_BYTES_WRITTEN: 1122685
 HDFS_BYTES_WRITTEN: 366752
File Input Format Counters
 BYTES_READ: 1085357
Map-Reduce Framework
 MAP_OUTPUT_MATERIALIZED_BYTES: 505552
 MAP_INPUT_RECORDS: 19446
 REDUCE_SHUFFLE_BYTES: 505552
 SPILLED_RECORDS: 70358
 MAP_OUTPUT_BYTES: 1750111
 CPU_MILLISECONDS: 5700
 COMMITTED_HEAP_BYTES: 401997824
 COMBINE_INPUT_RECORDS: 181151
 SPLIT_RAW_BYTES: 160
 REDUCE_INPUT_RECORDS: 35179
 REDUCE_INPUT_GROUPS: 35179
 COMBINE_OUTPUT_RECORDS: 35179
 PHYSICAL_MEMORY_BYTES: 378482688
 REDUCE_OUTPUT_RECORDS: 35179
 VIRTUAL_MEMORY_BYTES: 1139838976
 MAP_OUTPUT_RECORDS: 181151
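
As a rough sketch of the lookup in question (Python for brevity, not the
Hadoop Java API): parse a dump like the one above, prefer MAP_INPUT_BYTES,
and fall back to BYTES_READ under "File Input Format Counters" when it is
absent. The fallback is an assumption, not something confirmed for Hadoop
1.1.2, and parse_counters / input_bytes are hypothetical helper names.

```python
# Sketch only: parse a plain-text counter dump (a group line followed by
# indented "NAME: value" lines) and pick an input-size counter with a fallback.

def parse_counters(dump):
    """Return {group: {counter_name: int_value}} from an indented text dump."""
    groups = {}
    current = None
    for line in dump.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a counter that belongs to the current group.
            name, _, value = line.strip().partition(":")
            if current is not None and value.strip():
                groups[current][name.strip()] = int(value)
        else:
            # Unindented line: start of a new counter group.
            current = line.strip()
            groups[current] = {}
    return groups


def input_bytes(groups):
    """Prefer MAP_INPUT_BYTES; otherwise fall back to BYTES_READ, or None."""
    framework = groups.get("Map-Reduce Framework", {})
    if "MAP_INPUT_BYTES" in framework:
        return framework["MAP_INPUT_BYTES"]
    # Assumption: File Input Format BYTES_READ is an acceptable stand-in.
    return groups.get("File Input Format Counters", {}).get("BYTES_READ")


# A few lines copied from the dump above, used as sample input.
SAMPLE = """\
File Input Format Counters
 BYTES_READ: 1085357
Map-Reduce Framework
 MAP_INPUT_RECORDS: 19446
 COMBINE_OUTPUT_RECORDS:35179
"""

if __name__ == "__main__":
    print(input_bytes(parse_counters(SAMPLE)))
```

Returning None when neither counter exists keeps "counter missing" distinct
from a genuine value of 0.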


Here are most of the relevant screens from the JobTracker web interface:
http://jsfiddle.net/Fguyy/2/embedded/result/

Here is the JobTracker log (relevant time frame):
http://pastebin.com/dvsMn4fB

Thanks!
Philippe

-------------------------------
*Philippe Signoret*
Skype: philippesignoret
+33 6 95 89 55 55
