I have configured de-duplication according to the Wiki.

My signature field is defined as follows:

<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

and my update request processor chain as follows:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I am using SolrJ to write to the index in the binary format (as opposed to
XML), so my update handler is defined as below:

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

However, I was expecting Solr to allow only one instance of a duplicate
document into the index, but I get the results shown below when I query my
index.

I have deliberately added my ISA Letter file 4 times and can see it has 
correctly generated an identical signature for the first 4 entries 
(d91a5ce933457fd5). The fifth entry is a different document and correctly has a 
different signature. 

I was expecting to only see 1 instance of the duplicate. Am I misinterpreting 
the way it works? Many Thanks.

<result name="response" numFound="36" start="0">
<doc>
<str name="doctitle">ISA Letter</str>
<str name="signature">d91a5ce933457fd5</str>
</doc>
<doc>
<str name="doctitle">ISA Letter</str>
<str name="signature">d91a5ce933457fd5</str>
</doc>
<doc>
<str name="doctitle">ISA Letter</str>
<str name="signature">d91a5ce933457fd5</str>
</doc>
<doc>
<str name="doctitle">ISA Letter</str>
<str name="signature">d91a5ce933457fd5</str>
</doc>
<doc>
<str name="doctitle">ISA Mailing pack letter</str>
<str name="signature">fd9d9e1c0de32fb5</str>
</doc>
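
One variant that may be worth testing is the same chain with overwriteDupes
switched to true. This is a sketch under an assumption, not a confirmed fix:
as the Wiki describes the option, overwriteDupes controls whether an incoming
document overwrites existing documents that carry the same signature. With it
set to false, the signature would still be computed and stored, but every copy
would stay in the index, which would be consistent with the results above.
Everything except the overwriteDupes value is copied unchanged from the
configuration earlier in this message.

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Assumption to test: true instead of false, so that a new document
         replaces any indexed document with a matching signature -->
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Documents already in the index would presumably need to be removed or
re-indexed before the effect shows up in query results.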
