Hi,
I have large seed lists (more than 1M URLs) that Nutch 2.x fetches continuously for me.
I hadn't noticed this before. Although the generator job appears to complete
successfully (both map and reduce), the map task reports only 98.30% complete,
which leaves about 1.7% apparently unfinished.
I run the generator with generate.max.count = -1, in byHost mode.
This is the first time I've noticed it. Has anyone else seen this? Is it
common for a task to report less than 100% progress but still be marked as
successful?
The map task log is below.
Thanks
Lewis

2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 23833886; bufvoid = 99614720
2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
2013-06-18 19:51:46,587 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2013-06-18 19:51:46,588 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2013-06-18 19:51:47,466 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: bufstart = 23833886; bufend = 47667631; bufvoid = 99614720
2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
2013-06-18 19:51:59,909 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: bufstart = 47667631; bufend = 71500404; bufvoid = 99614720
2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
2013-06-18 19:52:11,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2013-06-18 19:52:15,885 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-06-18 19:52:18,006 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2013-06-18 19:52:18,009 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2013-06-18 19:52:18,011 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2013-06-18 19:52:18,012 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4135088 bytes
2013-06-18 19:52:20,375 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2013-06-18 19:52:20,376 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4199798 bytes
2013-06-18 19:52:22,777 INFO org.apache.hadoop.mapred.Task: Task:attempt_201306181935_0002_m_000000_0 is done. And is in the process of commiting
2013-06-18 19:52:22,795 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201306181935_0002_m_000000_0' done.
2013-06-18 19:52:22,797 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName law for UID 1000 from the native implementation
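For anyone reading the spill lines in this log, here is a small Python sketch that calculates the number of bytes and records covered by each spill. It assumes the Hadoop 1.x MapTask wording shown here, and it treats both the data buffer and the record-accounting array as circular buffers, so each size is taken modulo bufvoid or length. That is an assumption about MapTask internals. The helper name spill_sizes is hypothetical and used only for illustration. With these numbers, each of the first three spills holds about 23.8 MB but 262,143 to 262,144 records. That is roughly 80% of length = 327680, yet only about 24% of the 99,614,720-byte data buffer. This is consistent with the "record full = true" message: the spills appear to be triggered by running out of record slots rather than buffer space, presumably with io.sort.spill.percent at a 0.80 threshold. It does not directly explain the 98.30% figure.

```python
import re

# Spill-boundary lines copied from the MapTask log (assumes Hadoop 1.x wording).
LOG = """\
bufstart = 0; bufend = 23833886; bufvoid = 99614720
kvstart = 0; kvend = 262144; length = 327680
bufstart = 23833886; bufend = 47667631; bufvoid = 99614720
kvstart = 262144; kvend = 196607; length = 327680
bufstart = 47667631; bufend = 71500404; bufvoid = 99614720
kvstart = 196607; kvend = 131070; length = 327680
"""

BUF_RE = re.compile(r"bufstart = (\d+); bufend = (\d+); bufvoid = (\d+)")
KV_RE = re.compile(r"kvstart = (\d+); kvend = (\d+); length = (\d+)")


def spill_sizes(log: str) -> list[tuple[int, int]]:
    """Hypothetical helper: return (bytes, records) for each logged spill."""
    bufs = [tuple(map(int, m.groups())) for m in BUF_RE.finditer(log)]
    kvs = [tuple(map(int, m.groups())) for m in KV_RE.finditer(log)]
    sizes = []
    for (bstart, bend, bvoid), (kstart, kend, klen) in zip(bufs, kvs):
        # Assumed circular buffers: a negative distance means the end wrapped,
        # so take the distance modulo the buffer's capacity.
        nbytes = (bend - bstart) % bvoid
        nrecords = (kend - kstart) % klen
        sizes.append((nbytes, nrecords))
    return sizes


for i, (nbytes, nrecords) in enumerate(spill_sizes(LOG)):
    print(f"spill {i}: {nbytes} bytes, {nrecords} records, "
          f"{nbytes / nrecords:.1f} bytes/record")
```

The second and third spills show why the modulo matters: kvend is smaller than kvstart there, so a plain subtraction would give a negative record count.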



-- 
*Lewis*
