Please specify the exact sequence of commands you are using. For incremental crawling, it is best to follow the "whole web" style process outlined in the tutorial. The one-stop crawl command cannot be used effectively for that.
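
As a quick check before merging, it may help to see which segments have no index directory at all. The exception quoted below complains that segments/20060619230003/index is not a directory, which usually suggests that segment was never indexed. The following is only a sketch: the function name segments_missing_index is made up for illustration, and it assumes one subdirectory per segment, each expected to contain an index/ subdirectory. It only lists paths and does not call Nutch.

```shell
#!/bin/sh
# Sketch: list segments that lack an index/ subdirectory.
# Assumption: every subdirectory of the given directory is one segment.
# segments_missing_index is a hypothetical helper, not part of Nutch.
segments_missing_index() {
    segments_dir=$1
    for seg in "$segments_dir"/*/; do
        # If there are no subdirectories the glob stays literal; skip it.
        [ -d "$seg" ] || continue
        seg=${seg%/}
        # A missing index, or an index that is a plain file, both count.
        if [ ! -d "$seg/index" ]; then
            printf '%s\n' "$seg"
        fi
    done
}

# Example call, using the path from the quoted error message:
#   segments_missing_index /home/honda/nutch-0.7.2/crawl/segments
```

Any segment this prints would need to be indexed (or left out) before the merge step is run again.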

HTH
Thomas

On 6/23/06, Honda-Search Administrator <[EMAIL PROTECTED]> wrote:
> I'm hoping that my emails actually reach other people, as they've been
> ignored so far.
>
> I just ran a recrawl today to crawl a few injected URLs that I have. At the
> end of the recrawl I received the following error:
>
> 060623 122916 merging segment indexes to:
> /home/honda/nutch-0.7.2/crawl/index
> Exception in thread "main" java.io.IOException:
> /home/honda/nutch-0.7.2/crawl/segments/20060619230003/index not a directory
>         at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:180)
>         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
>         at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:80)
>         at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:160)
>
> Of course all of the crawled segments are not in the index.
>
> Can ANYONE tell me how to fix this? I'm getting a bit discouraged with Nutch
> due to the large number of errors I keep receiving during crawls. I do not
> want to have to recrawl my entire sitelist AGAIN just to fix this.
>
> Anyone?
>
> Matt

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
