Thanks for replying, folks :)

  I'm running Nutch on Windows Server 2003. In more detail: I developed a web
UI in Tomcat that lets users submit crawl requests. Each request is handled by
a servlet, which starts the crawler.

  At first, I also thought the problem was caused by NTFS. As mentioned in
the Java documentation, file locking on Windows is mandatory, so it can make
deleting files difficult. However, when I searched the mail archive, I found
that a number of people have run into the same problem on Linux.
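
  To see why a delete fails, rather than just getting a silent false back,
something like the sketch below might help. It is only an illustration under
my own assumptions: the class StaleDirCleaner and its methods are made up and
are not part of Nutch, and it needs Java 7 or later for java.nio.file.
Files.delete throws an IOException describing the failure (for example, a
file still held open by another thread), which File.delete() does not.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only; not part of Nutch.
class StaleDirCleaner {

    // Recursively deletes dir and reports whether it is really gone.
    // Any failure is printed with its cause instead of being swallowed.
    static boolean deleteTree(Path dir) {
        if (!Files.exists(dir)) {
            return true; // nothing to clean up
        }
        try {
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    Files.delete(file); // throws if the file cannot be removed
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException e)
                        throws IOException {
                    if (e != null) {
                        throw e;
                    }
                    Files.delete(d); // directory is empty at this point
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            System.err.println("Could not delete " + dir + ": " + e);
        }
        return !Files.exists(dir);
    }

    // Creates a throwaway directory with one file, then deletes it again.
    static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("webdb.new");
            Files.createFile(tmp.resolve("pagesByURL"));
            return deleteTree(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted: " + demo());
    }
}
```

  Calling deleteTree on the leftover webdb.new directory before the next
update would at least show which file is still held and why, if that is
indeed what is going on.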

  Anyway, I'm going to try my program on Linux to see whether the problem
happens there too. Again, thanks a lot.

  Regards,
  David

On 1/4/06, Byron Miller <[EMAIL PROTECTED]> wrote:
>
> Are you running on Win2k, Windows XP, or Windows Server?
> Do you have a virus scanner on? Do you have any firewall
> software enabled? Anything blocking ports?
>
> Do you use NDFS or local?
>
> Are you on NTFS or FAT32 file system?
>
> How large is the dataset you are working with? Have
> you tried splitting it into several smaller jobs
> instead of one large job?
>
> --- Nguyen Ngoc Giang <[EMAIL PROTECTED]> wrote:
>
> >   Hi all,
> >
> >   I'd like to bring back this topic, which has been
> > ignored several times on the
> > Nutch mailing list as well as in JIRA (
> > http://issues.apache.org/jira/browse/NUTCH-94,
> > http://issues.apache.org/jira/browse/NUTCH-96,
> > http://issues.apache.org/jira/browse/NUTCH-117 ).
> > Here is my error stack:
> >
> > 060104 110314 Finishing update
> > 060104 110314 Processing pagesByURL: Sorted 11 instructions in 0.016seconds.
> > 060104 110314 Processing pagesByURL: Sorted 687.5 instructions/second
> > java.io.IOException: already exists: C:\tomcat\webapps\ROOT\data\db\webdb.new\pagesByURL
> >         at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
> >         at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
> >         at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
> >         at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:375)
> >
> >
> >   This error happens not only at update time, but
> > also at fetchlist time. The weird thing is that it
> > happens nondeterministically. I debugged around, and
> > the problem seems to be that some CloseProcessors
> > don't terminate correctly, which leaves webdb.new
> > undeletable. I then tried reducing to a single thread
> > with a light load (as suggested in the JIRA
> > discussion), but that didn't help. Yet when I ran
> > step by step in the IDE's debugger, there was no
> > problem.
> >
> >   Can anyone help me to figure out this issue?
> > Thanks very much.
> >
> >   Regards,
> >   Giang
> >
>
>
