Re: crawl indexes and part-00000

Brian Whitman Thu, 15 Feb 2007 14:06:41 -0800

The merge program doesn't care what the name of the folder is. It cares it
should be in a certain structure.
So if we assume you have a folder named indexes, the program wants that each
folder inside indexes (represents a previous run of index) should have a
Lucene index in it (it looks for a folder name segments).



Thanks Gal for the explanation. It makes sense.

What doesn't though is that

bin/nutch merge crawl/index crawl/index_1 crawl/index_2 crawl/index

(i.e. merging three indexes including the previously merged one) will not
generate the part-00000 in crawl/index, it just dumps the merged Lucene
index directly into crawl/index. So then the next time I do a crawl merge
I have to manually move the crawl/index/* to crawl/index/part-00000/.
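
For what it's worth, the manual move could probably be scripted along these lines. This is only a sketch: wrap_in_part is a made-up helper name, and it assumes the merged index files sit directly in the directory it is given and that none of them are hidden (dot) files.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): move a flat merged index
# into a part-00000 subfolder of the same directory.
wrap_in_part() {
    index_dir=$1
    part_dir="$index_dir/part-00000"

    # Fail if the directory is missing; do nothing if already wrapped.
    [ -d "$index_dir" ] || return 1
    [ -d "$part_dir" ] && return 0

    # Stage in a sibling directory so the glob never matches the target.
    tmp_dir="$index_dir.tmp.$$"
    mkdir "$tmp_dir" || return 1
    mv "$index_dir"/* "$tmp_dir"/ || { rmdir "$tmp_dir"; return 1; }

    # Siblings share a filesystem, so this is a plain rename.
    mv "$tmp_dir" "$part_dir"
}

# Example (assumed layout): wrap_in_part crawl/index
```

Calling it on a directory that already has a part-00000 folder leaves it alone, so it should be safe to run before every merge.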


But knowing this at least is helpful so I can update my scripts!

-Brian
