Jacob Brunson wrote:
> For some tests, I ran two fetches on segments which I generated with
> topN=50. I then tried to merge these segments using mergesegs with
> slice=200 which resulted in 8 segments.
> If I only fetched about 100 URLs, why do I end up with 8 segments
> containing (supposedly) 200 URLs each?
Ah ... that's a good question ;)
The only way to answer this would be to dump each segment and compare
the list of URLs.
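A minimal sketch of that comparison in Python, assuming the URLs of each segment have already been dumped to a plain-text list, one URL per line. The function names (load_urls, compare_segments, expected_slices) are hypothetical helpers for illustration and are not part of Nutch; the "expected" count is only the naive ceil(unique / slice) estimate, not a statement of how mergesegs actually slices.

```python
from collections import Counter
from math import ceil


def load_urls(path):
    """Read one URL per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def compare_segments(segments):
    """Compare URL lists; segments maps a segment name to its list of URLs."""
    counts = Counter()
    for urls in segments.values():
        # Count each URL at most once per segment.
        counts.update(set(urls))
    return {
        # Every dumped entry, including repeats within a segment.
        "total_entries": sum(len(urls) for urls in segments.values()),
        # Distinct URLs across all segments.
        "unique_urls": len(counts),
        # URLs present in more than one segment, with the segment count.
        "shared_urls": {url: n for url, n in counts.items() if n > 1},
    }


def expected_slices(unique_urls, slice_size):
    """Naive estimate: slices needed if each holds up to slice_size URLs."""
    return ceil(unique_urls / slice_size)
```

With about 100 unique URLs and slice=200, expected_slices(100, 200) gives 1, so ending up with 8 segments suggests that the dumped lists need a closer look, for example for URLs that appear in several segments.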
> What is the optimal number of segments or size of segments for search
> efficiency?
It's a complicated issue ... it depends on the hardware, on the index size
and, to a certain degree, its content, and also on the expected performance
levels. The best situation, if you can afford it, is to keep all indexes
in memory, which on a 4 GB machine is, IIRC, equivalent to roughly
10 million documents.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com