Hey Ben, did you find a solution? I'm having the same problem with Cygwin and Nutch 0.9.
Thanks, mate,
Cornelius

Ben Ogle wrote:
>
> Hi all, I am having problems recrawling our intranet. Something in the
> recrawl script (is it invertlinks?) creates a
> crawldir\linkdb\current\linkdb-merge-<number> folder with a part-00000
> folder inside it. When the indexer is invoked, it looks for
> crawldir\linkdb\current\linkdb-merge-<number>\data, but that file doesn't
> exist because it's inside the part-00000 directory. How do I get the
> indexer to look in the part-00000 directory? Is it a configuration error?
>
> I am running a Python port of the recrawl script on a Windows 2000 machine
> without Cygwin. The crawldir and Nutch 0.8 live on a Windows 2003 server
> that I have very limited access to. Here's what hadoop.log says about it:
>
> 2006-09-07 13:02:39,696 INFO indexer.Indexer - Indexer: starting
> 2006-09-07 13:02:39,696 INFO indexer.Indexer - Indexer: linkdb: F:/nutch-0.8/intranet-crawl/linkdb
> 2006-09-07 13:02:40,696 INFO indexer.Indexer - Indexer: adding segment: F:/nutch-0.8/intranet-crawl/segments/20060907130151
> 2006-09-07 13:02:50,804 WARN mapred.LocalJobRunner - job_fn20sr
> java.io.IOException: Not a file: F:/nutch-0.8/intranet-crawl/linkdb/current/linkdb-merge-216906667/data
>         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:121)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
>
> If I move the contents of linkdb-merge-216906667/part-00000 up into
> linkdb-merge-216906667, indexing works OK (well, it won't delete _0.f0,
> but that's another issue).
>
> The same thing happens when this linkdb-merge-* directory already exists
> and I run invertlinks.
>
> What am I doing wrong? I haven't been able to find anyone else with these
> issues, so I must be doing something wrong.
>
> Ben
--
View this message in context: http://www.nabble.com/IOException%3A-not-a-file-with-invertlinks-index-tp6197542p14309409.html
Sent from the Nutch - User mailing list archive at Nabble.com.
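Ben's manual workaround (moving the contents of linkdb-merge-<number>/part-00000 up into linkdb-merge-<number> so the indexer finds a data file there) could be scripted and run between invertlinks and the index step. Below is a minimal sketch in Python, to match the Python port of the recrawl script. It assumes Python 3 and the linkdb/current/linkdb-merge-*/part-00000 layout shown in the log. The script and function names (flatten_linkdb.py, flatten_part_dir, flatten_linkdb) are hypothetical and not part of Nutch. It only papers over the symptom and does not explain why the linkdb-merge-* directory is left under linkdb/current in the first place.

```python
import os
import shutil
import sys


def flatten_part_dir(merge_dir, part_name="part-00000"):
    """Hypothetical helper: move every entry of merge_dir/part_name up into
    merge_dir, then remove the now-empty part_name directory.

    Returns the sorted list of entry names that were moved.
    """
    part_dir = os.path.join(merge_dir, part_name)
    if not os.path.isdir(part_dir):
        return []  # already flat (or never nested): nothing to do
    moved = []
    for name in os.listdir(part_dir):
        target = os.path.join(merge_dir, name)
        # Refuse to overwrite anything rather than silently clobbering it.
        if os.path.exists(target):
            raise FileExistsError(target)
        shutil.move(os.path.join(part_dir, name), target)
        moved.append(name)
    os.rmdir(part_dir)  # only succeeds because the directory is now empty
    return sorted(moved)


def flatten_linkdb(linkdb_dir):
    """Hypothetical helper: flatten every linkdb-merge-* directory found
    directly under <linkdb_dir>/current. Returns {dir name: moved entries}."""
    current = os.path.join(linkdb_dir, "current")
    results = {}
    for name in os.listdir(current):
        path = os.path.join(current, name)
        if name.startswith("linkdb-merge-") and os.path.isdir(path):
            results[name] = flatten_part_dir(path)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python flatten_linkdb.py F:/nutch-0.8/intranet-crawl/linkdb
    print(flatten_linkdb(sys.argv[1]))
```

Running it twice is harmless: once part-00000 is gone, the second pass finds nothing to move and reports an empty list for each linkdb-merge-* directory.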
