Hi all,

Could someone explain the difference between the 'index' folder and the 'indexes' folder inside the crawl's output directory?
I noticed that 'indexes' contains parts (though mine has only one part), while 'index' contains just the Lucene index. My theory is that each part is the output of a Hadoop reduce task, and because I am crawling with only one machine, there is only one part. 'index' would then be the merge of those parts. Am I correct, or just creative?

I'm asking because I am trying to determine which parts of the crawl need to be deployed to my searcher machines. I don't use the servlet searcher; I use a custom class built on the Nutch API. Searching seems to work with just 'index' and 'segments', but I want to be sure I shouldn't also be deploying 'indexes', either instead of 'index' or alongside it.

Thanks,
Jared
