Hi all,
I'd like to bring back this topic, which has been ignored several times on
the Nutch mailing list as well as in JIRA (
http://issues.apache.org/jira/browse/NUTCH-94,
http://issues.apache.org/jira/browse/NUTCH-96,
http://issues.apache.org/jira/browse/NUTCH-117 ). Here is my error stack:
060104 110314 Finishing update
060104 110314 Processing pagesByURL: Sorted 11 instructions in 0.016seconds.
060104 110314 Processing pagesByURL: Sorted 687.5 instructions/second
java.io.IOException: already exists: C:\tomcat\webapps\ROOT\data\db\webdb.new\pagesByURL
at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:375)
This error happens not only at update time, but also at fetchlist time.
The weird thing is that it happens nondeterministically. I debugged
around, and the problem seems to be that some CloseProcessors didn't
terminate correctly, which leaves webdb.new undeletable. I then tried
reducing to only 1 thread with a lightweight load (as suggested in the JIRA
discussion), but it didn't help. However, when I stepped through the run in
the IDE's debugger, the problem did not occur.
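To test the "leftover webdb.new" theory outside of Nutch, something like the following could be run before an update or fetchlist step. It is only a minimal sketch in plain Java (java.nio.file, so Java 8 or later), not Nutch code: the class name StaleWebDbCheck, the method deleteTree, and the default path are all made up for illustration. It tries to delete the directory and reports every path that refuses to go away, which on Windows usually points at a file handle that was never closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
public class StaleWebDbCheck {

    /**
     * Tries to delete dir and everything under it.
     * Returns the paths that could not be deleted (empty list on success).
     */
    public static List<Path> deleteTree(Path dir) throws IOException {
        List<Path> failed = new ArrayList<>();
        if (!Files.exists(dir)) {
            return failed; // nothing left over, nothing to do
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children come before their parents.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            try {
                Files.delete(p);
            } catch (IOException e) {
                // A locked file lands here; its parent directories will
                // then fail as well because they are not empty.
                failed.add(p);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real webdb.new path as argument.
        Path stale = Paths.get(args.length > 0 ? args[0] : "data/db/webdb.new");
        List<Path> failed = deleteTree(stale);
        if (failed.isEmpty()) {
            System.out.println("No leftover " + stale + " remains.");
        } else {
            System.out.println("Could not delete (probably still open):");
            for (Path p : failed) {
                System.out.println("  " + p);
            }
        }
    }
}
```

If the list of undeletable paths is non-empty right after a failed run, that would support the idea that a writer is still holding files open rather than the directory simply being left behind.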
Can anyone help me to figure out this issue? Thanks very much.
Regards,
Giang