Jesse Hires wrote:
> I am getting warnings in hadoop.log that segments.gen and segments_2 are not
> directories, and as you can see by the listing, they are in fact files, not
> directories. I'm not sure what stage of the process this is happening in, as
> I just now stumbled on them, but it concerns me that it says it is skipping
> something. Any ideas before I start digging further?
>
> 2009-11-30 08:28:56,344 WARN mapred.FileInputFormat - Can't open index at hdfs://nn1:9000/user/nutch/crawl/index1/segments.gen:0+2147483647, skipping.
The most likely reason for this is that you defined your searcher.dir as
hdfs://nn1:9000/user/nutch/crawl/index1; instead, you should set it to
hdfs://nn1:9000/user/nutch/crawl . Please also note that the names "index"
and "indexes" are magic: Lucene indexes must be located under one of
these names ("index" for a single merged index, and "indexes" for
partial indexes), otherwise they won't be found by the NutchBean (the
search component in Nutch). So, for example, your Lucene index in index1/
won't be found.
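A quick way to check this layout is to look for the two magic names under the directory you intend to use as searcher.dir. Here is a minimal sketch, assuming a local copy of the crawl directory; find_index_dirs is a hypothetical helper name, not part of Nutch, and it only mirrors the naming rule described above rather than NutchBean's actual lookup code. On HDFS you would inspect the same directory with "hadoop fs -ls" instead.

```python
from pathlib import Path

# The two "magic" subdirectory names under searcher.dir:
# "index" for a single merged Lucene index, "indexes" for partial indexes.
MAGIC_INDEX_NAMES = ("index", "indexes")


def find_index_dirs(searcher_dir: str) -> list[str]:
    """Return which magic index subdirectories exist under searcher_dir.

    Hypothetical helper for a local copy of the crawl directory. An empty
    result means an index stored under any other name (e.g. index1/)
    would not be found by the naming convention.
    """
    root = Path(searcher_dir)
    return [name for name in MAGIC_INDEX_NAMES if (root / name).is_dir()]
```

For example, if the local crawl directory holds only index1/ and segments/, find_index_dirs on it returns an empty list; after the index is moved or renamed to crawl/index (or crawl/indexes), the same call reports that name.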
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com