Re: ERROR when recrawling... can ANYONE help?

TDLN Fri, 23 Jun 2006 10:46:56 -0700

Please specify what exact sequence of commands you are using.


For incremental crawling it is best to follow the "whole web" style process
as outlined in the tutorial. The one-stop crawl command cannot be used
effectively for that.
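
Before re-running the merge, it may help to see which segments have no
index directory, since that is what the IOException quoted below complains
about. The following is only a rough sketch, not part of Nutch: the function
name segments_missing_index is made up for illustration, and it assumes each
segment is a subdirectory of crawl/segments that should contain its own
"index" directory, as the path in the error message suggests.

```python
import os


def segments_missing_index(segments_dir):
    """Return the names of segment directories lacking an 'index' subdirectory.

    Assumption: every segment lives directly under segments_dir and is
    expected to hold a directory called 'index' once it has been indexed.
    """
    missing = []
    for name in sorted(os.listdir(segments_dir)):
        segment_path = os.path.join(segments_dir, name)
        if not os.path.isdir(segment_path):
            # Skip stray files; only directories are treated as segments.
            continue
        if not os.path.isdir(os.path.join(segment_path, "index")):
            # Covers both a missing 'index' and an 'index' that is a plain file.
            missing.append(name)
    return missing
```

Calling segments_missing_index("/home/honda/nutch-0.7.2/crawl/segments")
would list the segments that probably still need to be indexed (or removed)
before the merge step can succeed.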

HTH Thomas

On 6/23/06, Honda-Search Administrator <[EMAIL PROTECTED]> wrote:

I'm hoping that my emails actually reach other people, as they've been
ignored so far.

I just ran a recrawl today to crawl a few injected URLs that I have.  At the
end of the recrawl I received the following error:

060623 122916 merging segment indexes to: /home/honda/nutch-0.7.2/crawl/index
Exception in thread "main" java.io.IOException: /home/honda/nutch-0.7.2/crawl/segments/20060619230003/index not a directory
        at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:180)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
        at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:80)
        at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:160)

Of course all of the crawled segments are not in the index.

Can ANYONE tell me how to fix this?  I'm getting a bit discouraged with Nutch
due to the large number of errors I keep receiving during crawls.  I do not
want to have to recrawl my entire sitelist AGAIN just to fix this.

Anyone?

Matt
