Delete the folder/database and then re-issue the crawl command. The database/folder gets created when crawl is run. I am a recent user too, but I got the same message and fixed it by deleting the folder. If anyone has better ideas, please share. Thanks.

[EMAIL PROTECTED] wrote:

> Hi,
> I've been experimenting with Nutch and Lucene, and everything was working fine, but now I'm getting an exception thrown from the crawl command. The command manages a few fetch cycles, but then I get the following message:
>
> 060301 161128 status: segment 20060301161046, 38 pages, 0 errors, 856591 bytes, 41199 ms
> 060301 161128 status: 0.92235243 pages/s, 162.43396 kb/s, 22541.87 bytes/page
> 060301 161129 Updating C:\PF\nutch-0.7.1\LIVE\db
> 060301 161129 Updating for C:\PF\nutch-0.7.1\LIVE\segments\20060301161046
> 060301 161129 Processing document 0
> 060301 161130 Finishing update
> 060301 161130 Processing pagesByURL: Sorted 952 instructions in 0.02 seconds.
> 060301 161130 Processing pagesByURL: Sorted 47600.0 instructions/second
> java.io.IOException: already exists: C:\PF\nutch-0.7.1\LIVE\db\webdb.new\pagesByURL
>         at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
>         at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
>         at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
>         at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
>         at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
> Exception in thread "main"
>
> Does anyone have any ideas what the problem is likely to be? I am running Nutch 0.7.1.
>
> thanks,
> Julian.

Sudhi Seshachala
http://sudhilogs.blogspot.com/
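A rough shell sketch of the suggested recovery, assuming a Unix shell and a crawl directory named `LIVE` as in the log (the original path was `C:\PF\nutch-0.7.1\LIVE`; the `urls` seed file and `-depth` value below are placeholder arguments, not from the thread):

```shell
# Simulate the leftover web database that makes the next crawl fail
# with "already exists" (layout taken from the error message):
mkdir -p LIVE/db/webdb.new/pagesByURL

# Delete the partially written database so crawl can recreate it:
rm -rf LIVE/db

# Then re-issue the crawl, e.g. (placeholder arguments):
#   bin/nutch crawl urls -dir LIVE -depth 3

# Confirm the stale db is gone:
test ! -d LIVE/db && echo "db removed"
```

On Windows the equivalent deletion would be `rmdir /s /q` on the `db` folder; either way, the crawl command recreates the database on its next run.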
