Hi, 

I'm considering writing a component for diversifying the results. I know that 
diversification can be achieved by using grouping, but I'm thinking about 
something different and query-biased. 
The idea is to have something that is applied after the normal retrieval and 
selects the k most diverse documents among the top results, based on some 
distance metric: 

Example:
Imagine that you ask for 10 rows and set diversify.rows=3, 
diversify.metric=tfidf and diversify.field=body.

Solr would retrieve the top 10 rows as usual, extract tf-idf vectors for the 
bodies and select the 3 stories that are most distant from each other 
according to cosine similarity. 
This would be different from grouping because whether documents are 
'collapsed' or not would depend on the subset of documents retrieved for the 
query. 
Do you think it would make sense to have it as a component? Any feedback or 
ideas? 
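
To make the selection step concrete, here is a rough sketch in plain Python 
(not Solr code). It is only an illustration under assumptions of mine: the 
function names are made up, the tokenizer is a naive whitespace split, the 
tf-idf statistics are computed over the retrieved rows only, and the greedy 
"farthest-first" rule (always keep the top hit, then repeatedly add the 
document least similar to those already chosen) is just one possible way to 
pick the most distant documents. 

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build a sparse tf-idf vector (term -> weight) for each text."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: number of texts that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so terms present in every text keep a small weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversify(texts, k):
    """Return the indices of k texts, chosen greedily, in selection order.

    `texts` is assumed to be ordered by relevance (best hit first).
    """
    if not texts or k <= 0:
        return []
    vectors = tfidf_vectors(texts)
    selected = [0]  # always keep the top-ranked hit
    while len(selected) < min(k, len(texts)):
        best, best_sim = None, None
        for i in range(len(texts)):
            if i in selected:
                continue
            # Similarity to the closest document selected so far.
            sim = max(cosine(vectors[i], vectors[j]) for j in selected)
            # Strict '<' means ties go to the higher-ranked document.
            if best_sim is None or sim < best_sim:
                best, best_sim = i, sim
        selected.append(best)
    return selected
```

Here `texts` would hold the body field of the 10 retrieved rows and `k` would 
be diversify.rows, so diversify(bodies, 3) returns the positions of the 3 
rows to keep. 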

