Hi,
A few months ago I started a crawl on a single machine (one process).
Now I'm trying to continue that crawl on the Hadoop file system (HDFS) on the same
machine, following the tutorial "How to Setup Nutch (V1.1) and Hadoop".
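To reuse the existing data, I copied the old crawl directory into HDFS, roughly like this (the paths are placeholders, not my exact layout):

  # start the single-node Hadoop daemons configured per the tutorial
  bin/start-all.sh
  # copy the crawl data produced by the old single-process run into HDFS
  bin/hadoop fs -put /data/crawl-old crawl
  # sanity check that crawldb/segments are visible in HDFS
  bin/hadoop fs -ls crawl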
When I run a crawl (topN=25000, depth=7) with the new configuration, the mergesegs
step fails.
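In case it matters, the crawl and the merge are invoked roughly like this (directory names are placeholders):

  # one-shot crawl with the parameters above
  bin/nutch crawl urls -dir crawl -depth 7 -topN 25000
  # merge the generated segments into a single output segment
  bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments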
The failed job's details show:
-----------------
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:250)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
    at org.apache.hadoop.io.Text.readString(Text.java:400)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
------------------
Any idea?
Regards,
Patricio