On 2010-05-13 20:29, Hemanth Yamijala wrote:
> Hi,
>
> I have a situation where we have data indexed from two different
> sources into different indexes. The nature of data indexed is roughly
> the same. For e.g. assume that they are from crawls of two websites of
> book sellers. When a user fires a query, I'd like to search both
> indexes and match the results. That is, I'd like to point out in the
> results that something like Book A from Index 1 is the same as Book B
> from Index 2. Is there some way of doing this with Nutch or any
> related projects like Solr, if required implementing custom plugins ?

If you want to implement this in the Nutch searcher, then you would have
to modify the DistributedSearchBean, where results coming from
sub-searchers are merged. In Solr this happens in SearchHandler.

The main question, however, is how "deep" that matching needs to go: if
you have 10000 hits from A and 10000 hits from B, and then present only
the top 10, do you want to tell the user that hit #9999 from B matches
hit #1 from A?

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
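
To make the "depth" question concrete, here is a minimal sketch in Python (not Nutch or Solr code) of matching hits from two result lists while looking only at the top `depth` hits of each. Everything in it is an assumption for illustration: the function names `normalize` and `match_top_hits` are hypothetical, hits are represented as plain title strings, and "same book" is approximated by a crude normalized-title key. A real merge step would more likely compare a stable identifier such as an ISBN.

```python
import re
from typing import Dict, List, Tuple


def normalize(title: str) -> str:
    """Hypothetical match key: lowercase, drop punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def match_top_hits(
    hits_a: List[str], hits_b: List[str], depth: int = 10
) -> List[Tuple[int, int]]:
    """Return (rank_in_a, rank_in_b) pairs for hits that share a match key.

    Only the first `depth` hits of each ranked list are inspected, so a
    match sitting deeper in either list is never reported.
    """
    # Index the top `depth` hits of B by key, keeping the best rank per key.
    best_rank_b: Dict[str, int] = {}
    for rank_b, title in enumerate(hits_b[:depth]):
        best_rank_b.setdefault(normalize(title), rank_b)

    # Probe that index with the top `depth` hits of A.
    pairs: List[Tuple[int, int]] = []
    for rank_a, title in enumerate(hits_a[:depth]):
        rank_b = best_rank_b.get(normalize(title))
        if rank_b is not None:
            pairs.append((rank_a, rank_b))
    return pairs


if __name__ == "__main__":
    index_1 = ["The Hobbit", "Dune"]
    index_2 = ["Emma", "the hobbit!", "Dune"]
    print(match_top_hits(index_1, index_2, depth=2))
    print(match_top_hits(index_1, index_2, depth=10))
```

With `depth=2` the "Dune" pair is missed because the hit sits at rank 2 (zero-based) in the second list. That is the trade-off in question: a shallow depth keeps the merge cheap but can hide matches, while a deep one means fetching and comparing far more hits than are ever displayed.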

