Re: 0.8 output\index versus output\indexes

Andrzej Bialecki Fri, 22 Sep 2006 14:26:30 -0700

[EMAIL PROTECTED] wrote:

> Hi All-
>
> I am just curious if someone could explain the difference between the
> 'index' folder and the 'indexes' folder inside the output directory of
> the crawl?
>
> I noticed that indexes have parts (though mine only has one part) but
> index just contains the Lucene index.  My theory is that each part is
> the result of a Hadoop reduce task, and since I am only crawling with
> one machine there is only the one part... And index is the merge of
> those parts. Am I correct or just creative?

Correct.

> The motivation for my question is that I am trying to determine what
> parts of the crawl need to be deployed to my searcher machines (I don't
> use the servlet searcher but a custom class using the Nutch API).  It looks
> like it works with just 'index' and 'segments', but I want to be sure
> that I should not be deploying 'indexes' instead or in addition.

That's correct. NutchBean first tries to use "index"; if it can't be found, it tries "indexes".
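
The lookup order described above can be sketched roughly as follows. This is not NutchBean's actual code: the class IndexLocator and the method resolveIndexDirs are hypothetical, the sketch uses plain java.nio on a local filesystem instead of Nutch's Hadoop FileSystem, and it assumes every part under "indexes" is its own subdirectory (e.g. part-00000).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: mimics the "index first, then indexes" fallback.
public class IndexLocator {

  /** Returns the Lucene index directories a searcher would open for crawlDir. */
  public static List<Path> resolveIndexDirs(Path crawlDir) throws IOException {
    Path merged = crawlDir.resolve("index");
    if (Files.isDirectory(merged)) {
      // Merged index present: one Lucene index covers the whole crawl.
      return Collections.singletonList(merged);
    }
    // Fallback: open every part under "indexes" (one per reduce task).
    Path indexes = crawlDir.resolve("indexes");
    List<Path> parts = new ArrayList<>();
    if (Files.isDirectory(indexes)) {
      try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexes)) {
        for (Path p : stream) {
          if (Files.isDirectory(p)) {
            parts.add(p);
          }
        }
      }
    }
    Collections.sort(parts); // deterministic order, e.g. part-00000, part-00001
    return parts;
  }

  public static void main(String[] args) throws IOException {
    Path crawl = Paths.get(args.length > 0 ? args[0] : "crawl");
    for (Path dir : resolveIndexDirs(crawl)) {
      System.out.println(dir);
    }
  }
}
```

Under this reading, a searcher machine that receives "index" and "segments" never touches "indexes", which matches the deployment question above.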


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
