: I want to deduplicate documents from search results. What should be the
: parameters on which I should decide an efficient SignatureClass? Also, what
: are the SignatureClasses available?

The signature classes available are the ones listed on the wiki:

https://wiki.apache.org/solr/Deduplication

Which one you should choose, and which fields you feed it, depend entirely on your goal.

If you want to deduplicate any time both the "user_fname" and "user_lname" fields are exactly the same, then use those fields with either the MD5Signature or the Lookup3Signature. (Lookup3 is faster, but some people want MD5 because they want to use the computed MD5 for other things.)

If you want to detect when some much longer "body" field containing a lot of full text is *nearly* identical, then you should consider the TextProfileSignature. How exactly it works and how you tune it I don't know off the top of my head.

-Hoss
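
To make the exact-match case concrete, here is a rough sketch of the idea behind a hash-based signature over "user_fname" and "user_lname". It is only an illustration in plain Python, not Solr's implementation: the helper names (exact_signature, deduplicate), the sample documents, and the NUL separator between field values are all assumptions made up for this example.

```python
import hashlib


def exact_signature(doc, fields):
    """Return an MD5 hex digest over the chosen field values (hypothetical helper)."""
    h = hashlib.md5()
    for name in fields:
        value = doc.get(name, "")
        h.update(value.encode("utf-8"))
        # Assumed separator so that ("ab", "c") and ("a", "bc") hash differently.
        h.update(b"\x00")
    return h.hexdigest()


def deduplicate(docs, fields):
    """Keep the first document seen for each distinct signature."""
    seen = set()
    unique = []
    for doc in docs:
        sig = exact_signature(doc, fields)
        if sig not in seen:
            seen.add(sig)
            unique.append(doc)
    return unique


# Made-up sample documents for illustration only.
docs = [
    {"id": "1", "user_fname": "Ada", "user_lname": "Lovelace"},
    {"id": "2", "user_fname": "Ada", "user_lname": "Lovelace"},
    {"id": "3", "user_fname": "Alan", "user_lname": "Turing"},
]
unique = deduplicate(docs, ["user_fname", "user_lname"])
print([d["id"] for d in unique])
```

Documents 1 and 2 share the same two field values, so they get the same signature and only the first is kept. Any difference in either field, however small, produces a different signature, which is why this style of hashing only fits the "exactly the same" case and not the "nearly identical" one.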