Andrzej,

>> I have a situation where we have data indexed from two different
>> sources into different indexes. The nature of data indexed is roughly
>> the same. For example, assume that they are from crawls of two websites of
>> book sellers. When a user fires a query, I'd like to search both
>> indexes and match the results. That is, I'd like to point out in the
>> results that something like Book A from Index 1 is the same as Book B
>> from Index 2. Is there some way of doing this with Nutch or any
>> related projects like Solr, implementing custom plugins if required?
>
> If you want to implement this in Nutch searcher, then you would have to
> modify the DistributedSearchBean where results coming from sub-searchers
> are merged. In Solr this happens in SearchHandler.
>

Thank you. I will take a look at these classes.

> The main question however is how "deep" that matching needs to go - if
> you have 10000 hits from A and 10000 hits from B, and then present only
> top 10, do you want to tell the user that hit #9999 from B matches hit
> #1 from A?
>

In the current scenario, I think it is unlikely that hits below a
certain shallow depth are going to match. But that's just my guess
right now. Can you please tell me how the depth impacts the solution?
Are you thinking about likely performance issues?
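
To make the question concrete, here is a rough sketch of what I imagine
matching at a fixed depth would look like. It is plain Python for
illustration only, not Nutch or Solr code. The names normalize and
match_top_hits, the idea of using a normalized title as the match key,
and the sample titles are all my own assumptions.

```python
import re


def normalize(title):
    # Hypothetical match key: lowercase, with punctuation runs collapsed to spaces.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_top_hits(hits_a, hits_b, depth):
    # Only the top `depth` hits of each ranked list are compared, so the
    # work grows with `depth`, not with the total number of hits.
    rank_by_key = {}
    for rank_b, title_b in enumerate(hits_b[:depth], start=1):
        # Keep the best (lowest) rank if a key occurs more than once.
        rank_by_key.setdefault(normalize(title_b), rank_b)

    matches = []
    for rank_a, title_a in enumerate(hits_a[:depth], start=1):
        rank_b = rank_by_key.get(normalize(title_a))
        if rank_b is not None:
            matches.append((rank_a, rank_b))
    return matches


# Made-up sample results from the two indexes, best hit first.
hits_a = ["Lucene in Action", "Hadoop: The Definitive Guide", "Solr Cookbook"]
hits_b = ["Programming Pig", "Solr Cookbook", "lucene in action!"]

print(match_top_hits(hits_a, hits_b, depth=2))
print(match_top_hits(hits_a, hits_b, depth=3))
```

With depth=2 nothing pairs up, while depth=3 finds both matches, so a
match is missed whenever either side ranks the book below the cutoff.
If that is the concern, I assume the cost of going deeper is mostly in
fetching and merging more hits from each sub-searcher.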

Thanks
Hemanth
