Hey Ben, 

Did you find a solution? I'm having the same problem with Cygwin and
nutch-0.9.

Thanks mate
Cornelius



Ben Ogle wrote:
> 
> Hi all, I am having problems recrawling our intranet. Something in the
> recrawl script (is it invertlinks?) creates a
> crawldir\linkdb\current\linkdb-merge-<number> folder which has a
> part-00000 folder under it. When the indexer is invoked, it looks for
> crawldir\linkdb\current\linkdb-merge-<number>\data, but that file doesn't
> exist because it's in the part-00000 directory. How do I get the indexer
> to look in the part-00000 dir? Is it a configuration error?
> 
> I am running a Python port of the recrawl script on a Windows 2000 machine
> without Cygwin, where the crawldir and Nutch 0.8 are on a Windows 2003
> server that I have very limited access to. Here's what the hadoop.log says
> about it:
> 
> 2006-09-07 13:02:39,696 INFO  indexer.Indexer - Indexer: starting
> 2006-09-07 13:02:39,696 INFO  indexer.Indexer - Indexer: linkdb:
> F:/nutch-0.8/intranet-crawl/linkdb
> 2006-09-07 13:02:40,696 INFO  indexer.Indexer - Indexer: adding segment:
> F:/nutch-0.8/intranet-crawl/segments/20060907130151
> 2006-09-07 13:02:50,804 WARN  mapred.LocalJobRunner - job_fn20sr
> java.io.IOException: Not a file:
> F:/nutch-0.8/intranet-crawl/linkdb/current/linkdb-merge-216906667/data
>       at
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:121)
>       at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> 
> If I move the contents of linkdb-merge-216906667/part-00000 to
> linkdb-merge-216906667, indexing works OK (well, it won't delete _0.f0,
> but that's another issue).
> 
> The same thing happens when this linkdb-merge-* directory exists already
> and I run invertlinks. 
> 
> What am I doing wrong? I haven't been able to find anyone with these
> issues, so I must be doing something wrong.
> 
> Ben
> 
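
For anyone else hitting this, the manual workaround Ben describes (moving the
contents of part-00000 up into the linkdb-merge-<number> directory) could be
scripted roughly as below. This is only a sketch of that workaround, not a fix
for whatever creates the extra directory level. The function name
flatten_merge_dirs and the example path are made up for illustration; only the
linkdb-merge-* and part-00000 directory names come from the message above.

```python
import shutil
from pathlib import Path


def flatten_merge_dirs(linkdb_current):
    """Move files out of linkdb-merge-*/part-00000 into linkdb-merge-*.

    Hypothetical helper: it only automates the manual move described in
    the quoted message and does not touch Nutch or Hadoop configuration.
    Returns the list of paths that were moved.
    """
    moved = []
    for merge_dir in sorted(Path(linkdb_current).glob("linkdb-merge-*")):
        part_dir = merge_dir / "part-00000"
        if not part_dir.is_dir():
            continue  # nothing to flatten in this merge directory
        for entry in sorted(part_dir.iterdir()):
            target = merge_dir / entry.name
            if target.exists():
                continue  # never overwrite something already in place
            shutil.move(str(entry), str(target))
            moved.append(target)
        # Remove part-00000 only if everything was moved out of it.
        if not any(part_dir.iterdir()):
            part_dir.rmdir()
    return moved


if __name__ == "__main__":
    # Example path taken from the log above; adjust to your crawl directory.
    for path in flatten_merge_dirs("F:/nutch-0.8/intranet-crawl/linkdb/current"):
        print("moved", path)
```

Run it between invertlinks and index, and back up the linkdb first, since it
moves files in place.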

-- 
View this message in context: 
http://www.nabble.com/IOException%3A-not-a-file-with-invertlinks-index-tp6197542p14309409.html
Sent from the Nutch - User mailing list archive at Nabble.com.
