How do we hash the content?

Regards,
Edwin
On 4 September 2015 at 17:37, Arcadius Ahouansou <arcad...@menelic.com> wrote:

> You could try using a hash of the content?
> On Sep 4, 2015 9:00 AM, "Zheng Lin Edwin Yeo" <edwinye...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm trying out de-duplication. I've tried to create a new signature
> > field in schema.xml:
> >
> > <field name="signature" type="string" stored="true" indexed="true"
> >        multiValued="false" />
> >
> > I've also added the following to solrconfig.xml:
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">signature</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </processor>
> >   <processor class="solr.DistributedUpdateProcessorFactory" />
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> > However, I can't do a copyField of content into this signature field,
> > as some of my contents are more than 32766 characters in length.
> > Previously, I tried to point the signatureField directly to content,
> > but that did not work either.
> >
> > Is there anything else I can do to group on a new signatureField?
> >
> > Regards,
> > Edwin
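
As a rough illustration of the "hash of the content" suggestion: a hash gives a short, fixed-length value no matter how long the content is, so it fits in a string field where the full text would not. The SignatureUpdateProcessorFactory in the quoted chain is meant to compute such a value from the fields listed in `fields` and write it into `signatureField`, so a copyField should not be needed, provided the update request actually selects the chain (for example with `update.chain=dedupe`). If that cannot be made to work, one alternative is to compute the hash on the client before indexing. The sketch below assumes that approach; the helper name `content_signature` and the sample documents are hypothetical, and MD5 is used only as an example digest, not as what Lookup3Signature produces.

```python
import hashlib


def content_signature(content: str) -> str:
    """Return a fixed-length hex digest for arbitrarily long content.

    Hypothetical helper: MD5 is chosen only for illustration; any stable
    hash would do for grouping identical content.
    """
    return hashlib.md5(content.encode("utf-8")).hexdigest()


# Hypothetical documents; the first two share identical, very long content.
docs = [
    {"id": "1", "content": "x" * 40000},
    {"id": "2", "content": "x" * 40000},
    {"id": "3", "content": "something else"},
]

# Attach the digest to each document before sending it to Solr.
for doc in docs:
    doc["signature"] = content_signature(doc["content"])
```

Documents with identical content end up with the same `signature` value, which can then be used as the grouping field. Only exact duplicates match with this kind of hash; near-duplicates would need a fuzzy signature instead.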