Hi,
I've been experimenting with Nutch and Lucene. Everything was
working fine, but now the crawl command is throwing an exception.
It gets through a few fetch cycles and then prints the following:
060301 161128 status: segment 20060301161046, 38 pages, 0 errors, 856591 bytes, 41199 ms
060301 161128 status: 0.92235243 pages/s, 162.43396 kb/s, 22541.87 bytes/page
060301 161129 Updating C:\PF\nutch-0.7.1\LIVE\db
060301 161129 Updating for C:\PF\nutch-0.7.1\LIVE\segments\20060301161046
060301 161129 Processing document 0
060301 161130 Finishing update
060301 161130 Processing pagesByURL: Sorted 952 instructions in 0.02 seconds.
060301 161130 Processing pagesByURL: Sorted 47600.0 instructions/second
java.io.IOException: already exists: C:\PF\nutch-0.7.1\LIVE\db\webdb.new\pagesByURL
    at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
    at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
    at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
    at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
    at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
Exception in thread "main"
Does anyone have any idea what the problem is likely
to be? I am running Nutch 0.7.1.
Thanks,
Julian.