Jesse Hires wrote:
I am getting warnings in hadoop.log that segments.gen and segments_2 are not
directories, and as you can see from the listing, they are in fact files, not
directories. I'm not sure at what stage of the process this is happening, as
I only just stumbled on them, but it concerns me that it says it is skipping
something. Any ideas before I start digging further?




2009-11-30 08:28:56,344 WARN  mapred.FileInputFormat - Can't open index at
hdfs://nn1:9000/user/nutch/crawl/index1/segments.gen:0+2147483647, skipping.

The most likely reason is that you set searcher.dir to hdfs://nn1:9000/user/nutch/crawl/index1; you should set it to hdfs://nn1:9000/user/nutch/crawl instead. Please also note that the names "index" and "indexes" are magic: Lucene indexes must be located under one of these names ("index" for a single merged index, "indexes" for partial indexes), otherwise the NutchBean (the search component in Nutch) won't find them. For example, your Lucene index in index1/ won't be found.
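To make the layout rule concrete, here is a minimal sketch of a sanity check you could run against a local copy of the crawl directory. The helper name check_searcher_dir is hypothetical and is not part of Nutch; it only mirrors the two points above (searcher.dir should point at the crawl directory, not at an index inside it, and the index must live in a directory named "index" or "indexes"). It works on a local filesystem path, not on HDFS directly.

```python
from pathlib import Path

# Hypothetical helper, NOT part of Nutch: it only mirrors the layout rule
# described above, using a local directory instead of HDFS.
MAGIC_NAMES = ("index", "indexes")


def check_searcher_dir(searcher_dir):
    """Return a list of human-readable problems with a searcher.dir layout."""
    root = Path(searcher_dir)
    problems = []

    # Lucene index files (e.g. segments.gen) directly under searcher.dir
    # suggest it points one level too deep, at an index instead of the crawl dir.
    if (root / "segments.gen").is_file():
        problems.append(
            f"{root} looks like a Lucene index itself; "
            "point searcher.dir at its parent crawl directory"
        )

    # The index must sit in a directory with one of the magic names.
    found = [name for name in MAGIC_NAMES if (root / name).is_dir()]
    if not found:
        problems.append(f"no 'index' or 'indexes' directory under {root}")

    return problems
```

With the layout from this thread, pointing the check at crawl/index1 flags it as an index rather than a crawl directory, and pointing it at crawl reports that neither "index" nor "indexes" exists, since the index sits in index1/. Renaming index1/ to index/ (or moving it under indexes/) clears both warnings.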


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
