Dear list,

I have run into a strange problem with Solr/Lucene. I'm currently using apache-solr-4.0-2010-10-12_08-05-48.
I have written a MessageDigest filter for fields, which generally works. Part of my schema.xml is:

    ...
    <fieldType name="text_md" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory" mdAlgorithm="MD5" />
      </analyzer>
    </fieldType>
    ...
    <!-- UNIQUE ID -->
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    ...
    <field name="docid" type="text_md" indexed="true" stored="true" omitNorms="true" />
    ...
    <copyField source="id" dest="docid" />
    ...

I have a field type "text_md" which uses the KeywordTokenizerFactory followed by my TextMessageDigestFilterFactory. As an example, I compute the MD5 of "id" and store it in "docid".

The Field Analysis runs fine:

    ...
    Index Analyzer
    org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_40}
      term position     1
      term text         foo
      term type         word
      source start,end  0,3
      payload
    de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory {mdAlgorithm=MD5, luceneMatchVersion=LUCENE_40}
      term position     1
      term text         acbd18db4cc2f85cedef654fccc4a4d8
      term type         word
      source start,end  0,3
      payload

The problem: while loading via DIH, the debugger shows that the TextMessageDigestFilterFactory is called and runs without problems, and the result of my filter is returned properly. But somehow the result never reaches the IndexWriter and is never stored in the index.

Any idea where to look? Maybe a class at a higher level doesn't recognize the change?

In the analysis output above, "source start,end" is still "0,3" even after the term text has changed from "foo" to the MD5 string. Should it then be "0,32"?

Regards
Bernd
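
P.S. To show what the filter is expected to produce for each term, here is a simplified standalone sketch of the digest step only. It is not the real filter code: the class name Md5Sketch and the method digestHex are made up for illustration, the Lucene TokenFilter plumbing is left out entirely, and UTF-8 encoding of the term is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Sketch {

    // Hypothetical helper (not the actual filter): returns the lowercase
    // hex digest of a term for the given algorithm, e.g. "MD5".
    public static String digestHex(String term, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            // Assumption: the term is encoded as UTF-8 before hashing.
            byte[] hash = md.digest(term.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        // Same input as in the Field Analysis output above.
        System.out.println(digestHex("foo", "MD5"));
    }
}
```

For the term "foo" this gives acbd18db4cc2f85cedef654fccc4a4d8, the same value the Field Analysis page shows after the filter, so the digest computation itself does not seem to be the issue.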