[
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646417#action_12646417
]
Hoss Man commented on SOLR-799:
-------------------------------
bq. It seems like uniqueField should normally enforce uniqueness, regardless of
what this component does.
Agreed.
While I can imagine use cases for adding a signature field that is independent
from the uniqueKey field (i.e., query-time duplicate pruning/collapsing), I'm
having a really hard time thinking of any use cases where someone would need
special deletion logic on a (non-uniqueKey) signature field. If you want docs
with identical signatures deleted, why wouldn't you make that the uniqueKey
field? If you have both, you could really confuse the hell out of someone
who doesn't understand why adding one doc deleted a different doc with a
completely different uniqueKey.
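
To make the confusing case concrete, here is a rough sketch. It is a toy
in-memory model, not Solr code and not what the attached patch does: the
ToyIndex class, the "id" and "sig" field names, and the dedupe_on_signature
flag are all made up for illustration.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not Solr code."""

    def __init__(self, dedupe_on_signature=False):
        self.docs = {}  # maps uniqueKey value -> document
        self.dedupe_on_signature = dedupe_on_signature

    def add(self, doc):
        if self.dedupe_on_signature:
            # Assumed "special deletion logic": drop every doc that shares
            # this signature, whatever its uniqueKey happens to be.
            self.docs = {
                key: existing
                for key, existing in self.docs.items()
                if existing["sig"] != doc["sig"]
            }
        # Ordinary uniqueKey behaviour: a doc with the same id is overwritten.
        self.docs[doc["id"]] = doc


# Signature-based deletion on a non-uniqueKey field.
index = ToyIndex(dedupe_on_signature=True)
index.add({"id": "A", "sig": "abc123"})
index.add({"id": "B", "sig": "abc123"})  # different id, same signature
remaining_ids = sorted(index.docs)
print(remaining_ids)  # ['B']

# uniqueKey enforcement only: both docs survive.
plain = ToyIndex()
plain.add({"id": "A", "sig": "abc123"})
plain.add({"id": "B", "sig": "abc123"})
plain_ids = sorted(plain.docs)
print(plain_ids)  # ['A', 'B']
```

In the first run, adding doc B silently removes doc A even though their ids
differ, which is the surprise described above. If the signature were the
uniqueKey itself, the same outcome would just be a normal overwrite.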
> Add support for hash based exact/near duplicate document handling
> -----------------------------------------------------------------
>
> Key: SOLR-799
> URL: https://issues.apache.org/jira/browse/SOLR-799
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Mark Miller
> Priority: Minor
> Attachments: SOLR-799.patch, SOLR-799.patch
>
>
> Hash-based duplicate document detection is efficient and allows for blocking
> as well as field collapsing. Let's put it into Solr.
> http://wiki.apache.org/solr/Deduplication