Hi all,

  I'd like to bring back a topic that has come up several times on the
Nutch mailing list and in JIRA without being resolved (
http://issues.apache.org/jira/browse/NUTCH-94,
http://issues.apache.org/jira/browse/NUTCH-96,
http://issues.apache.org/jira/browse/NUTCH-117 ). Here is my stack trace:

060104 110314 Finishing update
060104 110314 Processing pagesByURL: Sorted 11 instructions in 0.016seconds.
060104 110314 Processing pagesByURL: Sorted 687.5 instructions/second
java.io.IOException: already exists: C:\tomcat\webapps\ROOT\data\db\webdb.new\pagesByURL
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:375)


  This error happens not only at update time, but also at fetchlist time,
and the odd thing is that it happens nondeterministically. I debugged it,
and the problem seems to be that some CloseProcessors do not terminate
correctly, which leaves webdb.new undeletable. I then tried reducing to a
single thread with a light load (as suggested in the JIRA discussion), but
that did not help. However, when I step through the run in my IDE's
debugger, the problem does not occur.
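
  To check the "webdb.new is not deletable" suspicion outside Nutch, a
small standalone test along the following lines might help. This is only a
sketch using plain java.nio.file calls: the names StaleDirCheck, deleteTree
and deleteWithRetry are hypothetical and not part of Nutch, and it assumes
the leftover directory becomes removable once nothing holds its files open.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Nutch code: tries to remove a leftover directory
// such as webdb.new and reports whether the removal succeeded.
class StaleDirCheck {

    // Recursively delete a file or directory. Throws IOException if any
    // entry cannot be removed (for example, because it is still open).
    static void deleteTree(Path path) throws IOException {
        if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                for (Path entry : entries) {
                    deleteTree(entry);
                }
            }
        }
        Files.deleteIfExists(path);
    }

    // Attempt the deletion up to 'attempts' times, sleeping between tries.
    // Returns true once the directory no longer exists, false otherwise.
    static boolean deleteWithRetry(Path dir, int attempts, long pauseMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                deleteTree(dir);
                if (!Files.exists(dir)) {
                    return true;
                }
            } catch (IOException e) {
                // Log the failure so a held file handle shows up in the output.
                System.err.println("attempt " + (i + 1) + " failed: " + e);
            }
            Thread.sleep(pauseMillis);
        }
        return !Files.exists(dir);
    }
}
```

  Calling deleteWithRetry on the webdb.new path right after a failed run
would show whether the directory can be removed at all. If it only goes
away after a few retries, that would be consistent with a timing problem,
which might also explain why stepping through in the debugger hides it.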

  Can anyone help me to figure out this issue? Thanks very much.

  Regards,
  Giang
