Ken Krugler wrote:
Hi all,

I'm curious as to whether MultiSearcher (as of 1.9) does a good job of blending search results, when the various indexes being searched have significantly different characteristics.

For example, let's say I've got two indexes. One consists of documents where the term "platypus" almost never occurs. This index will have a very high IDF for that term.

The second index happens to be from the portion of the crawl that was covering biology departments in Australian universities, so the term "platypus" is significantly more common.

If I do a search on "platypus lifespan" using MultiSearcher, will hits from the first index get an unnatural boost because of the corresponding high IDF in that particular slice of the data? Or should I expect that the results will (closely) match what I'd get back if I merged the two indexes and used a regular searcher?

Unfortunately, this is still an open problem: neither Nutch nor Lucene handles this case correctly. Please see NUTCH-92 for more information and a sketch of a solution to this issue.
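
To make the effect concrete, here is a rough sketch of the arithmetic. It assumes each index is scored with its own local statistics, and it uses the classic Lucene DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)). The document counts and frequencies are invented for illustration only; they are not taken from any real crawl.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene DefaultSimilarity IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the term "platypus" (made up for illustration).
general = {"num_docs": 1_000_000, "doc_freq": 10}      # term almost never occurs
biology = {"num_docs": 100_000, "doc_freq": 20_000}    # term is common

# IDF as each index would compute it from its own local statistics.
idf_general = idf(general["doc_freq"], general["num_docs"])
idf_biology = idf(biology["doc_freq"], biology["num_docs"])

# IDF a single merged index would compute from the combined statistics.
idf_merged = idf(
    general["doc_freq"] + biology["doc_freq"],
    general["num_docs"] + biology["num_docs"],
)

print(f"general index: {idf_general:.3f}")
print(f"biology index: {idf_biology:.3f}")
print(f"merged index:  {idf_merged:.3f}")
```

Under these assumed numbers the general index assigns "platypus" a much higher IDF than the merged index would, and the biology index a lower one, so hits from the two slices are not scored on a comparable scale.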

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

