Jacob Brunson wrote:
> For some tests, I ran two fetches on segments which I generated with
> topN=50. I then tried to merge these segments using mergesegs with
> slice=200, which resulted in 8 segments.
>
> If I only fetched about 100 URLs, why do I end up with 8 segments
> containing (supposedly) 200 URLs each?


Ah ... that's a good question ;)

The only way to answer this would be to dump each segment and compare the list of URLs.
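
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes the URLs from each segment dump have already been extracted into plain lists; producing and parsing the dump itself is not shown. The function name compare_segment_urls and the segment names in the usage note are hypothetical, chosen only for this example.

```python
from collections import Counter


def compare_segment_urls(url_lists):
    """Compare URL lists taken from several segment dumps.

    url_lists: dict mapping a segment name to an iterable of URL strings
               (assumed to be extracted from each segment's dump beforehand).

    Returns a tuple:
      per_segment - dict of segment name -> number of distinct URLs in it
      total       - number of distinct URLs across all segments
      duplicates  - dict of URL -> number of segments containing it,
                    restricted to URLs found in more than one segment
    """
    segments_per_url = Counter()
    per_segment = {}
    for name, urls in url_lists.items():
        unique = set(urls)            # ignore repeats inside one segment
        per_segment[name] = len(unique)
        segments_per_url.update(unique)
    duplicates = {u: n for u, n in segments_per_url.items() if n > 1}
    return per_segment, len(segments_per_url), duplicates
```

Calling it with something like compare_segment_urls({"seg-a": [...], "seg-b": [...]}) shows how many distinct URLs each merged segment really holds and whether the same URL turns up in more than one of them. That would indicate whether the 8 segments actually contain 200 URLs each or only around 100 in total.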

> What is the optimal number of segments or size of segments for search
> efficiency?

It's a complicated issue ... it depends on the hardware, the index size and to a certain degree its content, and also on the expected performance levels. The best situation, if you can afford it, is to keep all indexes in memory, which on a 4 GB machine is IIRC equivalent to roughly 10 million documents.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

