Andrzej,

>> I have a situation where we have data indexed from two different
>> sources into different indexes. The nature of data indexed is roughly
>> the same. For example, assume that they are from crawls of two websites
>> of book sellers. When a user fires a query, I'd like to search both
>> indexes and match the results. That is, I'd like to point out in the
>> results that something like Book A from Index 1 is the same as Book B
>> from Index 2. Is there some way of doing this with Nutch or any
>> related projects like Solr, implementing custom plugins if required?
>
> If you want to implement this in the Nutch searcher, then you would have
> to modify the DistributedSearchBean where results coming from
> sub-searchers are merged. In Solr this happens in SearchHandler.
>
Thank you. I will take a look at these classes.

> The main question however is how "deep" that matching needs to go - if
> you have 10000 hits from A and 10000 hits from B, and then present only
> top 10, do you want to tell the user that hit #9999 from B matches hit
> #1 from A?
>

In the current scenario, I think it is unlikely that hits below a certain
shallow depth are going to match, but that's just my guess right now.
Can you please tell me how the depth impacts the solution? Are you
thinking about likely performance issues?

Thanks
Hemanth
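To make the depth question concrete, here is a minimal sketch of matching hits from two indexes within a fixed depth. It assumes each hit can be reduced to a comparable key. Here that key is a hypothetical normalized title; an ISBN would be more reliable if both crawls extract one. The `Hit` class, `match_key` and `match_hits` are illustrative names only, not Nutch or Solr APIs. The matching step itself is a hash lookup that is linear in `depth`. What grows with depth is how many hits have to be retrieved from each index before the comparison can happen.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result. Hypothetical type, not a Nutch or Solr class."""
    index: str    # which index the hit came from, e.g. "index1"
    doc_id: str
    title: str
    score: float


def match_key(hit: Hit) -> str:
    """Hypothetical normalization: lowercase the title, drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", hit.title.lower())


def match_hits(hits_a, hits_b, depth):
    """Pair hits from two ranked lists that share a match key.

    Only the top `depth` hits of each list are considered, so a match
    ranked below `depth` in either list is never found.
    """
    by_key = {}
    for hit in hits_b[:depth]:
        # Keep the best-ranked hit from B for each key.
        by_key.setdefault(match_key(hit), hit)
    matches = []
    for hit in hits_a[:depth]:
        other = by_key.get(match_key(hit))
        if other is not None:
            matches.append((hit, other))
    return matches


a = [Hit("index1", "a1", "The Hobbit", 2.0), Hit("index1", "a2", "Dune", 1.5)]
b = [Hit("index2", "b1", "Dune", 1.8), Hit("index2", "b2", "the hobbit!", 0.9)]
print(match_hits(a, b, depth=1))  # [] : "The Hobbit" is only #2 in index2
print(match_hits(a, b, depth=2))  # both pairs are found
```

With `depth=1` neither pair is detected, because each book's counterpart is ranked second in the other index. With `depth=2` both are found. This is the trade-off in the question above: a shallow depth is cheap but can miss matches.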

