On 2010-11-12 20:33, Alexander Aristov wrote:
> Why are you so afraid of segment merger? It appears to be the only
> "official" way to get rid of excessive folders. Of course it's time and
> resource consuming, but is your system so heavily loaded?

The truth is that in 99% of cases SegmentMerger is not needed, and it is
indeed very resource intensive. Merging segments is completely optional:
NutchBean and the Nutch search server will work just fine without it, and
the performance degradation from using many segments is small. Up to a
point, of course: if you have hundreds of tiny segments you should merge
them (or at least merge some of them into larger segments), but if you
have just 30 or so, it should not be a big deal.
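
To make that rule of thumb concrete, here is a rough sketch that leaves
larger segments alone and only groups tiny ones into merge batches once
there are enough of them to matter. The function name plan_merges and all
thresholds are hypothetical, not Nutch settings; the resulting batches are
what you would then hand to SegmentMerger.

```python
def plan_merges(segment_sizes, small_bytes, min_small_count, batch_size):
    """Group tiny segments into batches that are worth merging.

    segment_sizes maps a segment name to its size in bytes. Returns a list
    of batches (lists of segment names); an empty list means "leave the
    segments alone". All thresholds are hypothetical tuning knobs.
    """
    # Only tiny segments are merge candidates; larger ones are left as-is
    small = sorted(name for name, size in segment_sizes.items()
                   if size < small_bytes)
    # A few dozen segments cost little at search time, so skip merging
    if len(small) < min_small_count:
        return []
    # Timestamp-named segments sort chronologically, so batches hold neighbours
    batches = [small[i:i + batch_size] for i in range(0, len(small), batch_size)]
    # A "batch" of a single segment gains nothing from merging
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    # 300 hypothetical 1 MB segments plus one large segment
    sizes = {"seg%03d" % i: 1_000_000 for i in range(300)}
    sizes["big"] = 5_000_000_000
    plan = plan_merges(sizes, small_bytes=50_000_000,
                       min_small_count=100, batch_size=50)
    print(len(plan), "merge batches")  # 6 merge batches
```

With only 30 tiny segments and min_small_count=100 the plan is empty,
matching the advice that a few dozen segments are not worth merging.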

> 
> Also, I might be wrong, but if you are not planning to return summaries and
> content from Nutch, then you can remove the folders with rm.
> 
> And you can get rid of segments completely by using the Solr indexer. After
> you perform indexing, you can delete the fetched segments. I presume this is
> what you saw in other threads.

If you keep up with re-fetching, i.e. new fetch cycles can keep up with
the number of pages that become outdated, then you can remove all
segments older than db.max.fetch.interval, because Nutch forces all
pages older than that to become ready for re-fetching, so they will most
likely be found in some newer segment.
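
A minimal sketch of that retention rule, assuming the segment directories
keep Nutch's timestamp names (yyyyMMddHHmmss, e.g. 20101112203300) and
that you pass in the value of db.max.fetch.interval converted to seconds.
The function name and the 90-day value in the usage are only for
illustration. It merely lists deletion candidates, and it is only safe if
re-fetching really does keep up.

```python
from datetime import datetime, timedelta

# Segment directories are named with a timestamp, e.g. "20101112203300"
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def expired_segments(segment_names, now, max_fetch_interval_seconds):
    """Return the segment names older than the maximum fetch interval.

    Assumes fetch cycles keep up, so every page in an expired segment has
    since been re-fetched into some newer segment.
    """
    cutoff = now - timedelta(seconds=max_fetch_interval_seconds)
    expired = []
    for name in segment_names:
        try:
            created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        except ValueError:
            # Skip anything that is not a timestamp-named segment
            continue
        if created < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    segments = ["20100801120000", "20101001120000", "20101110080000", "merged"]
    now = datetime(2010, 11, 12, 20, 33)
    # Hypothetical interval of 90 days; use your configured value instead
    print(expired_segments(segments, now, 90 * 24 * 3600))  # ['20100801120000']
```

Returning names instead of deleting directories keeps the destructive
step (rm, or its HDFS equivalent) a separate, deliberate action.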

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
