Actually, searcher.dir is still the default "crawl". The warnings are showing
up either while indexing segments or while merging indexes. I need to spend
some time figuring out exactly where it is happening. I will look into it
later tonight; work doesn't like my hobbies intruding. :)
I may need some more info on "index" vs. "indexes" later, if you don't mind my
asking some dumb questions about them, but thus far things seem to be
working the way I have it set up, with the exception of the warnings
mentioned, of course.
The searchers run out of a different directory, and I run the indexes and
segments for them locally on the individual nodes. I am getting search
results back, and they increase with every pass, as expected.
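
For what it's worth, here is a minimal sketch of the naming rule described in
the reply quoted below, so the layout can be sanity-checked before digging
through hadoop.log. It is not Nutch code: the function names are made up, it
only looks at a local copy of the directory (not HDFS), and it assumes a
Lucene index can be recognized by a segments.gen or segments_N file.

```python
import os
import tempfile

# The two names the quoted reply calls "magic". Everything else in this
# sketch (function names, return shape) is hypothetical, not a Nutch API.
MAGIC_NAMES = ("index", "indexes")


def looks_like_lucene_index(path):
    """Heuristic (an assumption): the directory holds segments.gen or segments_N."""
    for name in os.listdir(path):
        is_segments_file = name == "segments.gen" or name.startswith("segments_")
        if is_segments_file and os.path.isfile(os.path.join(path, name)):
            return True
    return False


def check_searcher_dir(searcher_dir):
    """Return (found, misplaced): magic-named dirs, and index dirs under other names."""
    found, misplaced = [], []
    for name in sorted(os.listdir(searcher_dir)):
        full = os.path.join(searcher_dir, name)
        if not os.path.isdir(full):
            continue
        if name in MAGIC_NAMES:
            found.append(name)
        elif looks_like_lucene_index(full):
            misplaced.append(name)
    return found, misplaced


if __name__ == "__main__":
    # Build a throwaway layout resembling the one in this thread.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "index1"))
    for fname in ("segments.gen", "segments_2"):
        open(os.path.join(root, "index1", fname), "w").close()
    os.makedirs(os.path.join(root, "indexes", "part-00000"))
    os.makedirs(os.path.join(root, "segments", "20091130"))

    found, misplaced = check_searcher_dir(root)
    print("found under magic names:", found)
    print("index-like dirs under other names:", misplaced)
```

With the throwaway layout above, "indexes" is reported as found and "index1"
is reported as an index-like directory under a non-magic name.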
Jesse
int GetRandomNumber()
{
    return 4; // Chosen by fair roll of dice
              // Guaranteed to be random
} // xkcd.com
On Mon, Nov 30, 2009 at 8:57 AM, Andrzej Bialecki wrote:
> Jesse Hires wrote:
>
>> I am getting warnings in hadoop.log that segments.gen and segments_2 are not
>> directories, and as you can see by the listing, they are in fact files not
>> directories. I'm not sure what stage of the process this is happening in, as
>> I just now stumbled on them, but it concerns me that it says it is skipping
>> something. Any ideas before I start digging further?
>>
>> 2009-11-30 08:28:56,344 WARN mapred.FileInputFormat - Can't open index at
>> hdfs://nn1:9000/user/nutch/crawl/index1/segments.gen:0+2147483647,
>> skipping.
>>
>
> The most likely reason for this is that you defined your searcher.dir as
> hdfs://nn1:9000/user/nutch/crawl/index1 - instead you should set it to
> hdfs://nn1:9000/user/nutch/crawl . Please also note that the names "index"
> and "indexes" are magic - Lucene indexes must be located under one of these
> names ("index" for a single merged index, and "indexes" for partial
> indexes), otherwise they won't be found by the NutchBean (the search
> component in Nutch). So, for example, your Lucene index in index1/ won't be
> found.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>