Delete the folder/database and then re-issue the crawl command.
  The database/folder gets created when the crawl is run.
  I am a recent user too, but I got the same message and fixed it by
deleting the folder. If anyone has better ideas, please share.
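A minimal sketch of the workaround, using a hypothetical scratch directory (the real path in the log below is C:\PF\nutch-0.7.1\LIVE; adjust the variable and the crawl arguments to your own setup):

```shell
# Hypothetical crawl output directory standing in for C:\PF\nutch-0.7.1\LIVE
CRAWL_DIR=/tmp/nutch-live-demo

# Simulate a stale WebDB left over from an interrupted crawl
mkdir -p "$CRAWL_DIR/db/webdb.new/pagesByURL"

# Remove the whole crawl directory so nothing stale remains
rm -rf "$CRAWL_DIR"

# ...then re-issue the crawl, which recreates the db from scratch, e.g.:
#   bin/nutch crawl urls -dir "$CRAWL_DIR" -depth 3
[ -d "$CRAWL_DIR" ] && echo "removed: no" || echo "removed: yes"
# prints "removed: yes"
```

The key point is that the "already exists" IOException comes from the leftover webdb.new directory; once it is gone, the crawl command can recreate the database cleanly.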
   
  Thanks
   
  [EMAIL PROTECTED] wrote:
  Hi,

I've been experimenting with nutch and lucene,
everything was working fine, but now I'm getting an
exception thrown from the crawl command.

The command manages a few fetch cycles but then I get
the following message:

060301 161128 status: segment 20060301161046, 38
pages, 0 errors, 856591 bytes, 41199 ms
060301 161128 status: 0.92235243 pages/s, 162.43396
kb/s, 22541.87 bytes/page
060301 161129 Updating C:\PF\nutch-0.7.1\LIVE\db
060301 161129 Updating for
C:\PF\nutch-0.7.1\LIVE\segments\20060301161046
060301 161129 Processing document 0
060301 161130 Finishing update
060301 161130 Processing pagesByURL: Sorted 952
instructions in 0.02 seconds.
060301 161130 Processing pagesByURL: Sorted 47600.0
instructions/second
java.io.IOException: already exists:
C:\PF\nutch-0.7.1\LIVE\db\webdb.new\pagesByURL
at
org.apache.nutch.io.MapFile$Writer.&lt;init&gt;(MapFile.java:86)
at
org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
at
org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
at
org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
at
org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
at
org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
Exception in thread "main" 

Does anyone have any ideas what the problem is likely
to be? I am running Nutch 0.7.1.

thanks,


Julian.



  Sudhi Seshachala
  http://sudhilogs.blogspot.com/
   