On 2010-05-13 20:29, Hemanth Yamijala wrote:
> Hi,
> 
> I have a situation where we have data indexed from two different
> sources into different indexes. The nature of data indexed is roughly
> the same. For example, assume that they are from crawls of two websites of
> book sellers. When a user fires a query, I'd like to search both
> indexes and match the results. That is, I'd like to point out in the
> results that something like Book A from Index 1 is the same as Book B
> from Index 2. Is there some way of doing this with Nutch or any
> related projects like Solr, if required implementing custom plugins?

If you want to implement this in the Nutch searcher, you would have to
modify DistributedSearchBean, where the results coming from the
sub-searchers are merged. In Solr, this merging happens in SearchHandler.
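
As an illustration only, here is the kind of logic such a modified merge step might contain. It is written in Python rather than the Java that Nutch and Solr use. The Hit class, the match_key rule (normalized title plus author) and merge_and_match are hypothetical. They are not part of the DistributedSearchBean or SearchHandler APIs. The sketch merges two score-sorted hit lists, keeps the top N, and links hits from different indexes that share a match key.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    index: str    # which index / sub-searcher returned the hit
    doc_id: str
    title: str
    author: str
    score: float

def match_key(hit: Hit) -> str:
    # Hypothetical matching rule: lowercase title + author with punctuation
    # collapsed to single spaces. A real system might prefer ISBN or fuzzy matching.
    text = f"{hit.title} {hit.author}".lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()

def merge_and_match(hits_a, hits_b, top_n=10):
    """Merge two hit lists by score, keep top_n, and for each kept hit list
    the doc_ids of hits from the *other* index that share its match key."""
    merged = sorted(hits_a + hits_b, key=lambda h: h.score, reverse=True)[:top_n]
    by_key = {}
    for hit in merged:
        by_key.setdefault(match_key(hit), []).append(hit)
    results = []
    for hit in merged:
        same = [h.doc_id for h in by_key[match_key(hit)] if h.index != hit.index]
        results.append((hit.doc_id, same))
    return results

hits_a = [Hit("index1", "A1", "Dune", "Frank Herbert", 2.0),
          Hit("index1", "A2", "Emma", "Jane Austen", 1.0)]
hits_b = [Hit("index2", "B1", "Dune.", "Frank Herbert", 1.5)]

print(merge_and_match(hits_a, hits_b))
# [('A1', ['B1']), ('B1', ['A1']), ('A2', [])]
print(merge_and_match(hits_a, hits_b, top_n=1))
# [('A1', [])]
```

With top_n=1, B1 is cut from the merged list before matching, so A1 is reported without its counterpart. This is where the question of matching depth comes in.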

The main question, however, is how "deep" the matching needs to go: if
you have 10000 hits from A and 10000 hits from B, and you present only
the top 10, do you want to tell the user that hit #9999 from B matches
hit #1 from A?
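
To make the depth trade-off concrete, here is a minimal Python sketch. The normalize and find_matches names, the (doc_id, title, author) tuple layout and the depth parameter are all hypothetical. For each displayed hit, it looks for a match among only the first `depth` hits of the other index. Finding hit #9999 requires depth to be at least 9999, so that many hits must be fetched and indexed on every query.

```python
import re

def normalize(title: str, author: str) -> str:
    # Hypothetical match key: lowercase title + author, punctuation removed.
    return re.sub(r"[^a-z0-9]+", " ", f"{title} {author}".lower()).strip()

def find_matches(shown, other, depth):
    """For each displayed hit, return the doc_ids among the first `depth`
    hits of the other index that share its match key. Hits are
    (doc_id, title, author) tuples already sorted by score."""
    key_index = {}
    for doc_id, title, author in other[:depth]:   # cost grows with depth
        key_index.setdefault(normalize(title, author), []).append(doc_id)
    return {doc_id: key_index.get(normalize(title, author), [])
            for doc_id, title, author in shown}

# Top hit from index A, and 9999 hits from index B where only the last matches.
shown = [("A1", "Dune", "Frank Herbert")]
other = [(f"B{i}", f"Filler {i}", "Someone") for i in range(1, 9999)]
other.append(("B9999", "Dune", "Frank Herbert"))

print(find_matches(shown, other, depth=10))     # {'A1': []}
print(find_matches(shown, other, depth=10000))  # {'A1': ['B9999']}
```

A shallow depth keeps the merge cheap but silently misses deep matches. A full depth finds them at the price of retrieving every hit from every sub-searcher.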

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
