I'm using Nutch 0.7.1 on Windows.

My crawling and indexing task ended with this java.io.IOException:

java.io.IOException: already exists: C:\nutch-0.7\intranet_0308\db\webdb.new\pagesByURL
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)

I started this task with this nutch command:
bin/nutch crawl conf/intranet-urls.txt -dir intranet_0308 -depth 10

When I started this, there was no directory called intranet_0308.
So the file or directory that Nutch is complaining about already
existing was created by Nutch itself during this run.
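
A minimal sketch, assuming the directory layout shown in the exception
message (db\webdb.new under the -dir directory): the Java class below
reports whether a webdb.new directory is present on disk, which may help
confirm the state of the crawl directory after a failure. The class and
method names are hypothetical, and it uses only java.io.File, not any
Nutch API.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch: looks for the db/webdb.new
// directory named in the exception message.
public class WebDbNewCheck {

    // Returns true if <crawlDir>/db/webdb.new exists on disk.
    static boolean hasWebDbNew(File crawlDir) {
        File webDbNew = new File(new File(crawlDir, "db"), "webdb.new");
        return webDbNew.exists();
    }

    public static void main(String[] args) {
        // Defaults to the -dir value from the command above (an assumption).
        File crawlDir = new File(args.length > 0 ? args[0] : "intranet_0308");
        if (hasWebDbNew(crawlDir)) {
            System.out.println("webdb.new is present under " + crawlDir);
        } else {
            System.out.println("no webdb.new under " + crawlDir);
        }
    }
}
```

Run it from the Nutch install directory, passing the crawl directory as
the only argument, e.g. java WebDbNewCheck intranet_0308.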

I encountered a similar problem before. At that time, I reran the
same command, and it succeeded.

-kuro


_______________________________________________
Nutch-general mailing list