At the top of the De-Duplication wiki page is a note about collapsing results. Once you have the signature (identical for each of the duplicates), you'll want to collapse your results on that field, keeping the document with the max date.
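As a minimal sketch, that collapse can be expressed as a filter query using Solr's collapsing query parser, `{!collapse field=... max=...}`. It assumes the signature is stored in a single-valued string field named `signature` and the release year in a numeric field named `release_year`. Both names are hypothetical. Python is used only to show the request parameters:

```python
from urllib.parse import urlencode


def collapse_params(q: str,
                    signature_field: str = "signature",
                    year_field: str = "release_year") -> dict[str, str]:
    """Build Solr request parameters that keep one document per signature.

    Field names are hypothetical defaults; use the ones from your schema.
    """
    return {
        "q": q,
        # Collapsing query parser: group on the signature field and keep,
        # as the group head, the document with the largest year value.
        "fq": "{!collapse field=%s max=%s}" % (signature_field, year_field),
    }


# Query string to append to a /select request, e.g. .../select?<this>
print(urlencode(collapse_params("product A")))
```

Every year's copy stays in the index, and the newest one per signature is picked at query time. Year-scoped searches can simply leave out the `fq`, and adding a new release doesn't require touching the older years.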
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results

k/r,
Scott

On Thu, Oct 29, 2015 at 11:59 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> Yes, you can try to use the SignatureUpdateProcessorFactory to hash the
> content into a signature field, and group on the signature field during
> your search.
>
> You can find more information here:
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> I have been using this method to group the index with duplicated content,
> and it is working fine.
>
> Regards,
> Edwin
>
>
> On 30 October 2015 at 07:20, Shamik Bandopadhyay <sham...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm looking to customize index-time de-duplication. Here's my use case
> > and what I'm trying to achieve.
> >
> > I have identical documents coming from different release years of a given
> > product. I need to index them in Solr as they are required in the context
> > of individual years. But there's a generic search which spans all the
> > years and hence brings back duplicate/identical content. My goal is to
> > return only the latest document and filter out the rest. For example, if
> > product A has identical documents for 2015, 2014 and 2013, search should
> > only return 2015 (the latest document) and filter out the rest.
> >
> > What I'm thinking (if possible) at index time:
> >
> > Index all documents, but add a special tag (e.g. dedup=true) to the 2013
> > and 2014 content, keeping 2015 (the latest release) untouched. At query
> > time, I'll add a filter which excludes content tagged with "dedup".
> >
> > Just wondering if this is achievable by perhaps extending
> > UpdateRequestProcessorFactory or customizing
> > SignatureUpdateProcessorFactory?
> >
> > Any pointers will be appreciated.
> >
> > Regards,
> > Shamik

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com
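Shamik's index-time idea can be combined with Edwin's content signature and simulated in plain Python, to check the logic before touching Solr. The idea is to tag every copy except the newest with dedup=true, then filter on that tag at query time. This is a hypothetical batch step run before indexing, not an UpdateRequestProcessorFactory. The `Doc` class, the field names, and the use of MD5 over whitespace-normalized text are all illustrative assumptions. They do not reflect what SignatureUpdateProcessorFactory computes internally.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    product: str
    year: int
    content: str
    dedup: bool = False  # hypothetical flag from the index-time proposal


def signature(content: str) -> str:
    """Illustrative content hash standing in for the signature field."""
    normalized = " ".join(content.split())  # collapse runs of whitespace
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def tag_older_duplicates(docs: list[Doc]) -> list[Doc]:
    """Set dedup=True on every doc except the newest one per signature."""
    newest: dict[str, Doc] = {}
    for doc in docs:
        sig = signature(doc.content)
        if sig not in newest or doc.year > newest[sig].year:
            newest[sig] = doc
    keep = {d.id for d in newest.values()}  # assumes ids are unique
    for doc in docs:
        doc.dedup = doc.id not in keep
    return docs


docs = tag_older_duplicates([
    Doc("a-2013", "A", 2013, "Install guide"),
    Doc("a-2014", "A", 2014, "Install  guide"),
    Doc("a-2015", "A", 2015, "Install guide"),
    Doc("b-2014", "B", 2014, "Release notes"),
])
# The generic search would then filter out anything tagged dedup=true.
print([d.id for d in docs if not d.dedup])
```

The tagging needs to see every year of a product at once, so it fits a batch step. If a 2016 release arrived later, the 2015 copy would have to be re-tagged. That is one argument for the query-time collapse suggested above.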