This also happened during the fetching stage. The JobTracker shows that the Fetcher job was successful, but that only 98.30% of the Map was complete. It looks like user@nutch is the wrong list for this, so I will take it over to MR.
Lewis
On Tue, Jun 18, 2013 at 8:05 PM, Lewis John Mcgibbney < [email protected]> wrote:
> Hi,
> I have large seed lists (>1M) which Nutch 2.x continuously fetches for me.
> I never noticed it before, but although it seems that the generator task
> completes (both Map and Reduce) successfully, the Map task indicates that
> 98.30% was completed... leaving some <2% unfinished.
> I run Generate with generate.max.count = -1 and work byHost.
> This is the first time I noticed this and wondered if anyone else has
> noticed this, or if it is common for some tasks to not complete 100% but
> still be marked as successful?
> Some metrics below.
> Thanks
> Lewis
>
> 2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 23833886; bufvoid = 99614720
> 2013-06-18 19:51:43,412 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
> 2013-06-18 19:51:46,587 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2013-06-18 19:51:46,588 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2013-06-18 19:51:47,466 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: bufstart = 23833886; bufend = 47667631; bufvoid = 99614720
> 2013-06-18 19:51:55,800 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
> 2013-06-18 19:51:59,909 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
> 2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: bufstart = 47667631; bufend = 71500404; bufvoid = 99614720
> 2013-06-18 19:52:08,057 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
> 2013-06-18 19:52:11,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
> 2013-06-18 19:52:15,885 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
> 2013-06-18 19:52:18,006 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> 2013-06-18 19:52:18,009 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
> 2013-06-18 19:52:18,011 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2013-06-18 19:52:18,012 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2013-06-18 19:52:18,012 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4135088 bytes
> 2013-06-18 19:52:20,375 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
> 2013-06-18 19:52:20,376 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 4199798 bytes
> 2013-06-18 19:52:22,777 INFO org.apache.hadoop.mapred.Task: Task:attempt_201306181935_0002_m_000000_0 is done. And is in the process of commiting
> 2013-06-18 19:52:22,795 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201306181935_0002_m_000000_0' done.
> 2013-06-18 19:52:22,797 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2013-06-18 19:52:22,975 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName law for UID 1000 from the native implementation
>
> --
> *Lewis*

--
*Lewis*
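
For reference, the generator grouping described in the quoted message (generate.max.count = -1, counting byHost) is normally set by overriding properties in nutch-site.xml. A minimal sketch, assuming the stock generate.max.count and generate.count.mode property names from nutch-default.xml (check your Nutch version for the exact accepted values):

<!-- nutch-site.xml (sketch): settings assumed for the run quoted above -->
<property>
  <name>generate.max.count</name>
  <!-- -1 = no upper limit on URLs generated per group -->
  <value>-1</value>
</property>
<property>
  <name>generate.count.mode</name>
  <!-- count/group URLs per host ("byHost" above); exact value name assumed, see nutch-default.xml -->
  <value>host</value>
</property>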

