: I'm doing near-duplicate detection on a fairly large number of documents.
: Each document to be added to Solr will be compared with sample documents
: from all clusters in the index. I could, of course, dedupe documents on
: the client side, but the performance will not be as good.

have you considered the UpdateRequestProcessor API as an alternative to
mucking with DUH2 (DirectUpdateHandler2) directly?

http://lucene.apache.org/solr/api/org/apache/solr/update/processor/package-summary.html

(i don't really know much of the details about it, but i know it was added
specifically to support more "biz logic" related tasks at update time -- as
opposed to the really low level nitty gritty updating that DUH2 worries
about directly)
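
to make the general idea concrete, here is a rough, standalone sketch of the "chain of processors" pattern: a dedupe step sits in front of the step that does the actual indexing and decides whether each document is passed along. every class and method name below is a hypothetical stand-in written for illustration, not Solr's actual API, and the normalized-text signature is only a placeholder for a real near-duplicate comparison against cluster samples.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT Solr's classes.
class DedupeChainSketch {

    /** One link in the chain: handle the document, then pass it to the next link. */
    static abstract class Processor {
        protected final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        void processAdd(String docText) {
            if (next != null) {
                next.processAdd(docText);
            }
        }
    }

    /** "Biz logic" link: drops documents whose signature has already been seen. */
    static class DedupeProcessor extends Processor {
        private final Set<String> seen = new HashSet<>();

        DedupeProcessor(Processor next) {
            super(next);
        }

        // Assumption: a crude signature (lower-cased, whitespace collapsed).
        // A real near-duplicate check would compare against cluster samples.
        static String signature(String text) {
            return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        }

        @Override
        void processAdd(String docText) {
            // Set.add returns false when the signature is already present,
            // in which case the document never reaches the indexing link.
            if (seen.add(signature(docText))) {
                super.processAdd(docText);
            }
        }
    }

    /** Last link: stands in for the low-level update handler. */
    static class IndexingProcessor extends Processor {
        final List<String> indexed = new ArrayList<>();

        IndexingProcessor() {
            super(null);
        }

        @Override
        void processAdd(String docText) {
            indexed.add(docText);
        }
    }

    public static void main(String[] args) {
        IndexingProcessor index = new IndexingProcessor();
        Processor chain = new DedupeProcessor(index);

        chain.processAdd("Hello  World");
        chain.processAdd("hello world");     // same signature, dropped
        chain.processAdd("Something else");

        System.out.println(index.indexed);
    }
}
```

the point of the layout is that the dedupe decision lives in its own link, so the low-level indexing code stays untouched.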



-Hoss
