Thanks! Fixing how I was merging the indexes took care of the warning.
Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
                // Guaranteed to be random
} // xkcd.com



On Tue, Dec 1, 2009 at 4:49 AM, Andrzej Bialecki <[email protected]> wrote:

> Jesse Hires wrote:
>
>> What is "segments.gen" and "segments_2" ?
>> The warning I am getting happens when I dedup two indexes.
>>
>> I create index1 and index2 through generate/fetch/index/...etc
>> index1 is an index of 1/2 the segments. index2 is an index of the other 1/2.
>>
>> The warning is happening on both datanodes.
>>
>> The command I am running is "bin/nutch dedup crawl/index1 crawl/index2"
>>
>> If segments.gen and segments_2 are supposed to be directories, then why
>> are they created as files?
>>
>> They are created as files from the start
>> "bin/nutch index crawl/index1 crawl/crawldb /crawl/linkdb
>> crawl/segments/XXX
>> crawl/segments/YYY"
>>
>> I don't see any errors or warnings about creating the index.
>>
>
> The command that you quote above produces multiple partial indexes, located
> in crawl/index1/part-NNNNN; the Lucene indexes can be found only in these
> subdirectories. However, the deduplication process doesn't accept partial
> indexes, so you need to specify each /part-NNNNN dir as an input to dedup.
>
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
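
For anyone reading this thread later, here is a rough sketch of what naming each part-NNNNN directory could look like, going only by Andrzej's description above. The helper name build_dedup_cmd is made up for illustration, and the sketch assumes the index directories sit on the local filesystem; on HDFS the part directories would have to be listed some other way. It only prints the command and does not run it.

```shell
# Hypothetical helper (not part of Nutch): print a dedup command that names
# every part-* subdirectory of the given index directories.
# Assumption: the index directories are on the local filesystem.
build_dedup_cmd() {
    cmd="bin/nutch dedup"
    for index_dir in "$@"; do
        for part in "$index_dir"/part-*; do
            # Skip the unexpanded pattern (no matches) and anything that is
            # not a directory.
            [ -d "$part" ] || continue
            cmd="$cmd $part"
        done
    done
    printf '%s\n' "$cmd"
}
```

Calling it as "build_dedup_cmd crawl/index1 crawl/index2" prints a single command line that can be checked before running it by hand.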
