Ken Krugler wrote:
Hi all,
I'm curious as to whether MultiSearcher (as of 1.9) does a good job of
blending search results, when the various indexes being searched have
significantly different characteristics.
For example, let's say I've got two indexes. One consists of documents
where the term "platypus" almost never occurs. This index will have a
very high IDF for that term.
The second index happens to be from the portion of the crawl that was
covering biology departments in Australian universities, so the term
"platypus" is significantly more common.
If I do a search on "platypus lifespan" using MultiSearcher, will hits
from the first index get an unnatural boost because of the
corresponding high IDF in that particular slice of the data? Or should
I expect that the results will (closely) match what I'd get back if I
merged the two indexes and used a regular searcher?
Unfortunately, this is still an open problem, and neither Nutch nor
Lucene handles it correctly. Please see NUTCH-92 for more
information and a sketch of a solution to this issue.
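
To make the effect concrete, here is a minimal sketch of the arithmetic.
It assumes an IDF of the form 1 + ln(numDocs / (docFreq + 1)), which is
the shape used by Lucene's default similarity, and it assumes each index
is scored with its own local statistics. The document counts and the
idf() helper are made up for illustration; they are not taken from any
real crawl or from the Lucene API.

```python
import math


def idf(doc_freq, num_docs):
    """IDF of the assumed form 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the term "platypus" in the two indexes.
general = {"num_docs": 1_000_000, "doc_freq": 10}      # term almost never occurs
biology = {"num_docs": 1_000_000, "doc_freq": 50_000}  # term is common

# IDF as each index would compute it from its own local statistics.
idf_general = idf(general["doc_freq"], general["num_docs"])
idf_biology = idf(biology["doc_freq"], biology["num_docs"])

# IDF a single merged index would use: statistics summed over both slices.
idf_global = idf(
    general["doc_freq"] + biology["doc_freq"],
    general["num_docs"] + biology["num_docs"],
)

print(f"local IDF, general index: {idf_general:.2f}")
print(f"local IDF, biology index: {idf_biology:.2f}")
print(f"global IDF, merged index: {idf_global:.2f}")
```

With numbers like these, the local IDF in the general index comes out
well above the global value, and the local IDF in the biology index
comes out below it. So a hit from the general index would be boosted
relative to the merged-index ranking, which is the skew described in
the question.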
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com