Hi Rita, thanks for the advice, one problem solved. "source start,end" is now set to the correct value by the filter.
After further debugging, it looks like this is a bug in the Lucene indexer.
I'm surprised that no one has ever noticed this...

Kind regards,
Bernd

On 23.11.2010 09:07, Bernd Fehling wrote:
> Dear list,
>
> solr/lucene has a strange problem.
> I'm currently using apache-solr-4.0-2010-10-12_08-05-48
>
> I have written a MessageDigest filter for fields, which generally works.
> Part of my schema.xml is:
> ...
> <fieldType name="text_md" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter
>       class="de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory"
>       mdAlgorithm="MD5" />
>   </analyzer>
> </fieldType>
> ...
> <!-- UNIQUE ID -->
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> ...
> <field name="docid" type="text_md" indexed="true" stored="true"
>   omitNorms="true" />
> ...
> <copyField source="id" dest="docid" />
> ...
>
> I have a field type "text_md" which uses the KeywordTokenizerFactory and then
> my TextMessageDigestFilterFactory. As an example, I compute the MD5 of "id"
> and store it in docid.
> The Field Analysis runs fine.
> ...
> Index Analyzer
> org.apache.solr.analysis.KeywordTokenizerFactory
> {luceneMatchVersion=LUCENE_40}
> term position     1
> term text         foo
> term type         word
> source start,end  0,3
> payload
> de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory {mdAlgorithm=MD5,
> luceneMatchVersion=LUCENE_40}
> term position     1
> term text         acbd18db4cc2f85cedef654fccc4a4d8
> term type         word
> source start,end  0,3
> payload
>
> The problem is that while loading via DIH, the debugger shows that the
> TextMessageDigestFilterFactory is called and runs without problems, and the
> result of my filter is properly returned, but somehow the result never
> reaches the IndexWriter and is never stored in the index.
>
> Any idea where to look?
>
> Maybe a class at a higher level doesn't recognize the change?
>
> The above "source start,end" still has "0,3" even after the term text
> has changed from "foo" to the MD5 string. Should it then be "0,32"?
>
> Regards
> Bernd
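
For reference, the term text shown in the Field Analysis output can be reproduced outside Solr. The following is only a minimal standalone sketch using the JDK's MessageDigest, not the actual TextMessageDigestFilterFactory code; the class name Md5Sketch, the method md5Hex and the choice of UTF-8 as input encoding are assumptions made for illustration. It shows that the 3-character input "foo" yields a 32-character hex string, which is where the "0,3" versus "0,32" question comes from.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Solr/Lucene or of the filter discussed above.
public class Md5Sketch {

    // Returns the lowercase hex MD5 digest of the input (UTF-8 is an assumption).
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so each byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String term = "foo";
        String hashed = md5Hex(term);
        System.out.println(hashed);                                // acbd18db4cc2f85cedef654fccc4a4d8
        System.out.println(term.length() + " -> " + hashed.length()); // 3 -> 32
    }
}
```

The digest matches the "term text" reported for the filter stage above, so the filter output itself looks plausible; the sketch says nothing about what happens between the filter and the IndexWriter.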