Got it, with the help of Demian Katz, main developer of VuFind:
The VuFind import script was bypassing the deduplication settings
because it was writing directly to the Solr index.
After deactivating direct writing to the index and using the standard
update handler instead, it now works!
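
For anyone hitting the same problem: the dedupe chain only runs for documents that arrive through the request handler it is attached to. A minimal sketch of what such an update message looks like, assuming the standard Solr XML update format; the record values and the helper name build_add_message are made up for illustration and are not part of VuFind:

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical records: same bibliographic content, IDs shifted by a fixed number.
records = [
    {"id": "1001", "reference": "Some Journal 12(3)", "issn": "1234-5678"},
    {"id": "901001", "reference": "Some Journal 12(3)", "issn": "1234-5678"},
]

message = build_add_message(records)
print(message)
```

Posting this message to the /update handler (Content-Type text/xml, followed by a commit) sends it through the configured processor chain. The host, port and core path depend on the installation and are not shown here.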
Thanks to all who gave input!
Markus
Markus Fischer wrote:
I use
<bool name="overwriteDupes">true</bool>
and a field other than the ID to control deduplication. This is about
bibliographic data coming from different sources with different IDs
which may have the same content...
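
To illustrate the idea, here is a rough sketch of a signature computed over content fields only, so that the ID plays no role. It assumes the two fields named in the "fields" setting of the config quoted further down. It uses MD5 from the Python standard library as a stand-in and is not the Lookup3 hash that Solr computes; the sample records are invented.

```python
import hashlib

# Same field names as in the "fields" setting of the dedupe config.
SIGNATURE_FIELDS = ("reference", "issn")


def signature(doc, fields=SIGNATURE_FIELDS):
    """Hash only the chosen content fields; the ID is deliberately left out."""
    h = hashlib.md5()
    for name in fields:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


# Hypothetical records from two sources: different IDs, same content.
a = {"id": "1001", "reference": "Some Journal 12(3)", "issn": "1234-5678"}
b = {"id": "901001", "reference": "Some Journal 12(3)", "issn": "1234-5678"}
c = {"id": "1002", "reference": "Some Journal 12(3)", "issn": "8765-4321"}

print(signature(a) == signature(b))  # True: treated as duplicates
print(signature(a) == signature(c))  # False: different ISSN
```

With overwriteDupes set to true, a later document with the same signature replaces the earlier one instead of being added next to it.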
I attached solrconfig.xml if you want to take a look.
Thanks a lot!
Markus
Markus Jelsma wrote:
What does your solrconfig look like? There is no deduplication if
overwriteDupes is false and the signature field is a field other than
the (unique) doc ID field.
-----Original message-----
From: Markus Fischer <i...@flyingfischer.ch>
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org
Subject: Config issue for deduplication
I am trying to configure automatic deduplication for SOLR 1.4 in
Vufind. I followed:
http://wiki.apache.org/solr/Deduplication
However, nothing happens: all records are imported without any
deduplication.
What am I missing?
Thanks
Markus
What I did:
- created a duplicate set of records that differ from the originals
only in their IDs, which are shifted by a fixed number
---
solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---
In schema.xml I added the field
<field name="dedupeHash" type="string" stored="true" indexed="true"
multiValued="false" />
--
If I look at the created field "dedupeHash" it seems to be empty...!?