Hi,

I have large seed lists (>1M) which Nutch 2.x continuously fetches for me. I had never noticed this before: the generator job appears to complete successfully (both Map and Reduce), yet the Map task reports that only 98.30% was completed, leaving just under 2% apparently unfinished. I run Generate with generate.max.count = -1 and work byHost.

This is the first time I have seen this. Has anyone else noticed it, or is it common for some tasks to be marked as successful without reaching 100%? Some metrics below.

Thanks
Lewis
2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 23833886; bufvoid = 99614720
2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
2013-06-18 19:51:46,587 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2013-06-18 19:51:46,588 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2013-06-18 19:51:47,466 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: bufstart = 23833886; bufend = 47667631; bufvoid = 99614720
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
2013-06-18 19:51:59,909 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: bufstart = 47667631; bufend = 71500404; bufvoid = 99614720
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
2013-06-18 19:52:11,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2013-06-18 19:52:15,885 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-06-18 19:52:18,006 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2013-06-18 19:52:18,009 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2013-06-18 19:52:18,011 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4135088 bytes
2013-06-18 19:52:20,375 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2013-06-18 19:52:20,376 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4199798 bytes
2013-06-18 19:52:22,777 INFO org.apache.hadoop.mapred.Task: Task:attempt_201306181935_0002_m_000000_0 is done. And is in the process of commiting
2013-06-18 19:52:22,795 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201306181935_0002_m_000000_0' done.
2013-06-18 19:52:22,797 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName law for UID 1000 from the native implementation

-- 
*Lewis*

