I used to get this occasionally too, running Windows 2000.  It looks to
me like one of the tools can sometimes fail, for whatever reason, but
leave some kind of OS-level locks on some of the files.   This is
usually the UpdateDatabaseTool.  On the FOLLOWING run of the
UpdateDatabaseTool, the files in webdb.new don't get deleted as they
should be.  Hence the "file exists" errors as soon as the tool tries to
write to them.
 
The FetchListTool can exhibit similar symptoms.
 
I know practically nothing about the internals of Windows 2000, so I
don't know what kind of locks end up on the files, other than to say that
I usually can't delete webdb.new directly (through Windows Explorer) after
a failed run of the UpdateDatabaseTool.
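One way to see the problem before the next run fails is a small pre-run check. The sketch below is a minimal illustration, assuming Java 7+ NIO and that no Nutch tool is running at the time; the class name StaleWebDbCleaner and the default path db/webdb.new are hypothetical and not part of Nutch. It tries to remove a leftover webdb.new and lists every file or directory it could not delete, which on Windows usually points to something still held open. It does not release the locks; it only makes them visible before UpdateDatabaseTool stops with "already exists".

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: removes a stale webdb.new before a run.
public class StaleWebDbCleaner {

    /**
     * Recursively deletes dir and returns the paths that could not be deleted.
     * An empty list means the directory is gone (or never existed).
     */
    public static List<Path> deleteStale(Path dir) throws IOException {
        List<Path> undeletable = new ArrayList<>();
        if (!Files.exists(dir)) {
            return undeletable;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                tryDelete(file, undeletable);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) {
                // Children have been visited; try the directory itself.
                tryDelete(d, undeletable);
                return FileVisitResult.CONTINUE;
            }
        });
        return undeletable;
    }

    private static void tryDelete(Path p, List<Path> undeletable) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            // A file still held open on Windows typically fails here; a
            // directory also fails if one of its children could not be removed.
            undeletable.add(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path webdbNew = Paths.get(args.length > 0 ? args[0] : "db/webdb.new");
        List<Path> left = deleteStale(webdbNew);
        if (left.isEmpty()) {
            System.out.println("OK: " + webdbNew + " removed");
        } else {
            System.out.println("Could not delete:");
            for (Path p : left) {
                System.out.println("  " + p);
            }
            System.exit(1);
        }
    }
}
```

If the list is non-empty, the remaining entries are the files some process still holds open, which matches the symptom of not being able to delete webdb.new from Explorer.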
 
I don't have a good solution to this.  I've beefed up my environment so
that the most common failures of the UpdateDatabaseTool and
FetchListTool ("out of memory" and "out of disk space") don't happen any
more, so it's no longer really a problem for me.
 
Regards,
David.
 
 
 
Date: Wed, 4 Jan 2006 22:58:37 +0800
From: Nguyen Ngoc Giang <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Nutch-general] java.io.IOException: already exists

  Hi all,

  I'd like to bring back this topic, which has been ignored several
times on the Nutch mailing list as well as in JIRA
(http://issues.apache.org/jira/browse/NUTCH-94,
http://issues.apache.org/jira/browse/NUTCH-96,
http://issues.apache.org/jira/browse/NUTCH-117). Here is my stack
trace:

060104 110314 Finishing update
060104 110314 Processing pagesByURL: Sorted 11 instructions in 0.016seconds.
060104 110314 Processing pagesByURL: Sorted 687.5 instructions/second
java.io.IOException: already exists: C:\tomcat\webapps\ROOT\data\db\webdb.new\pagesByURL
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:375)


  This error happens not only at update time but also at fetchlist
time, and the weird thing is that it happens so nondeterministically. I
debugged around, and it seems the problem is that some CloseProcessors
don't terminate correctly, which leaves webdb.new undeletable. I then
tried reducing to only 1 thread with a lightweight load (as suggested in
the JIRA discussion), but it didn't help. However, when I step through
the code in the IDE's debugger, there is no problem.

  Can anyone help me figure out this issue? Thanks very much.

  Regards,
  Giang



********************************************************************************
This email may contain legally privileged information and is intended only for 
the addressee. It is not necessarily the official view or 
communication of the New Zealand Qualifications Authority. If you are not the 
intended recipient you must not use, disclose, copy or distribute this email or 
information in it. If you have received this email in error, please contact the 
sender immediately. NZQA does not accept any liability for changes made to this 
email or attachments after sending by NZQA. 

All emails have been scanned for viruses and content by MailMarshal. 
NZQA reserves the right to monitor all email communications through its network.

********************************************************************************
