I fetched 3 segments and then ran updatedb on those 3 segments. The updatedb job
completed, but the crawldb was not updated (I checked the urls in the crawldb). The
lock file and the temp directory are still in the crawldb directory.
Apparently updatedb stopped before merging was done. There is only one error
message:

2010-09-26 16:02:48,122 WARN  mapred.TaskTracker - Error running child
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
        at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:67)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:1678)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1584)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.close(SequenceFileRecordReader.java:125)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:198)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:362)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

What causes updatedb to stop?

The updatedb job status is:
map() completion: 1.0
reduce() completion: 1.0
Counters: 18
        Job Counters
                Launched reduce tasks=8
                Launched map tasks=171
                Data-local map tasks=171
        FileSystemCounters
                FILE_BYTES_READ=31437553119
                HDFS_BYTES_READ=17803591396
                FILE_BYTES_WRITTEN=47532638653
                HDFS_BYTES_WRITTEN=7460484375
        Map-Reduce Framework
                Reduce input groups=45926460
                Combine output records=0
                Map input records=139824153
                Reduce shuffle bytes=16103425228
                Reduce output records=45926460
                Spilled Records=409880576
                Map output bytes=15813496752
                Map input bytes=17803440460
                Combine input records=0
                Map output records=137962810
                Reduce input records=137962810
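
As a sanity check on the counters above, here is a minimal Python sketch that
parses a few of them and compares the map and reduce record counts. The helper
names parse_counters and check_consistency are made up for illustration; they
are not part of Nutch or Hadoop. If both checks pass, that suggests the
MapReduce job itself ran to completion, which would be consistent with the
failure happening in a later step, though it does not prove it.

```python
# Counter values copied from the job status above.
COUNTERS_TEXT = """
Reduce input groups=45926460
Map input records=139824153
Reduce output records=45926460
Map output records=137962810
Reduce input records=137962810
"""


def parse_counters(text):
    """Parse 'Name=value' lines into a dict mapping name to int."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:  # skip blank lines and lines without '='
            counters[name] = int(value)
    return counters


def check_consistency(c):
    """Return named boolean checks on the map/reduce record counts."""
    return {
        # Every record emitted by the mappers reached a reducer.
        "reduce_received_all_map_output":
            c["Map output records"] == c["Reduce input records"],
        # The reducers emitted exactly one record per key group.
        "one_output_per_group":
            c["Reduce input groups"] == c["Reduce output records"],
    }


counters = parse_counters(COUNTERS_TEXT)
checks = check_consistency(counters)
for name, ok in checks.items():
    print(name, "OK" if ok else "MISMATCH")
```
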

thanks
aj
-- 
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA
