Dear list,
I have a strange problem with Solr/Lucene.
I'm currently using apache-solr-4.0-2010-10-12_08-05-48

I have written a MessageDigest filter for fields, which generally works.
Part of my schema.xml is:
...
<fieldType name="text_md" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory" mdAlgorithm="MD5" />
  </analyzer>
</fieldType>
...
<!-- UNIQUE ID -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
...
<field name="docid" type="text_md" indexed="true" stored="true" omitNorms="true" />
...
<copyField source="id" dest="docid" />
...

I have a field type "text_md" which uses the KeywordTokenizerFactory followed
by my TextMessageDigestFilterFactory. As an example, I compute the MD5 of "id"
and store it in "docid".
The Field Analysis runs fine.
...
Index Analyzer
org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_40}
term position   1
term text       foo
term type       word
source start,end        0,3
payload         
de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory {mdAlgorithm=MD5, luceneMatchVersion=LUCENE_40}
term position   1
term text       acbd18db4cc2f85cedef654fccc4a4d8
term type       word
source start,end        0,3
payload         
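
For reference, the digest value shown above can be reproduced outside of Solr
with the JDK alone. This is only a minimal standalone sketch of the digest
step, not the actual filter code; the class name Md5Sketch, the method md5Hex
and the choice of UTF-8 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Solr or of the filter described here.
public class Md5Sketch {

    // Returns the lowercase hex MD5 digest of the input (UTF-8 is assumed).
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints acbd18db4cc2f85cedef654fccc4a4d8
        System.out.println(md5Hex("foo"));
    }
}
```

The output matches the "term text" reported by Field Analysis for the input
"foo", so the digest computation itself looks correct.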

The problem is that while loading via DIH, the debugger shows that the
TextMessageDigestFilterFactory is called and runs without problems, and the
result of my filter is returned properly, but somehow the result never reaches
the IndexWriter and is never stored in the index.

Any idea where to look?

Maybe a class at a higher level doesn't recognize the change?

The "source start,end" above still shows "0,3" even after the term text
has changed from "foo" to the MD5 string. Should it then be "0,32"?

Regards
Bernd
