Hi Rita,
thanks for the advice; one problem is solved.
"source start,end" is now set to the correct value by the filter.

After further debugging, it looks like this is a bug in the Lucene indexer.
I'm surprised that no one has noticed this before...

Kind regards,
Bernd


On 23.11.2010 09:07, Bernd Fehling wrote:
> Dear list,
> I'm seeing a strange problem with Solr/Lucene.
> I'm currently using apache-solr-4.0-2010-10-12_08-05-48
> 
> I have written a MessageDigest filter for fields, which generally works.
> Part of my schema.xml is:
> ...
> <fieldType name="text_md" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory"
>             mdAlgorithm="MD5" />
>   </analyzer>
> </fieldType>
> ...
> <!-- UNIQUE ID -->
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> ...
> <field name="docid" type="text_md" indexed="true" stored="true" omitNorms="true" />
> ...
> <copyField source="id" dest="docid" />
> ...
> 
> I have a field type "text_md" which uses the KeywordTokenizerFactory followed by
> my TextMessageDigestFilterFactory. As an example, I compute the MD5 of "id" and
> store it in docid.
> The Field Analysis runs fine.
> ...
> Index Analyzer
> org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_40}
> term position         1
> term text     foo
> term type     word
> source start,end      0,3
> payload       
> de.ubbielefeld.solr.analysis.TextMessageDigestFilterFactory {mdAlgorithm=MD5, luceneMatchVersion=LUCENE_40}
> term position         1
> term text     acbd18db4cc2f85cedef654fccc4a4d8
> term type     word
> source start,end      0,3
> payload       
> 
> The problem is that while loading via DIH, the debugger shows that the
> TextMessageDigestFilterFactory is called and runs without problems, and the
> result of my filter is properly returned, but somehow the result never reaches
> the IndexWriter and never gets stored in the index.
> 
> Any idea where to look?
> 
> Maybe a class at a higher level doesn't recognize the change?
> 
> The "source start,end" above is still "0,3" even after the term text
> has changed from "foo" to the MD5 string. Should it then be "0,32"?
> 
> Regards
> Bernd
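
For readers following the thread, here is a minimal, hedged sketch of the per-token transformation that the quoted message describes: the term text "foo" is replaced by its MD5 hex string, and the start/end offsets are carried over unchanged, matching the "0,3" shown in the Field Analysis output. This is not Bernd's TextMessageDigestFilterFactory and does not use the Lucene API. SketchToken and DigestFilterSketch are hypothetical names, and the UTF-8 input encoding and lowercase hex output are assumptions. The sketch only relies on java.security.MessageDigest from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a Lucene token (NOT the Lucene API): the term
// text plus the start/end character offsets in the original field value.
final class SketchToken {
    final String term;
    final int startOffset;
    final int endOffset;

    SketchToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

public class DigestFilterSketch {

    // Lowercase hex digest of the UTF-8 bytes of text (assumed encoding).
    // For "MD5" this yields a 32-character string.
    static String digestHex(String algorithm, String text) {
        final MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Replaces only the term text. The offsets are copied as-is, so they
    // still point at the characters of the original input ("foo" -> 0,3).
    static SketchToken apply(String algorithm, SketchToken in) {
        return new SketchToken(digestHex(algorithm, in.term), in.startOffset, in.endOffset);
    }

    public static void main(String[] args) {
        // KeywordTokenizer emits the whole value "foo" as a single token.
        SketchToken out = apply("MD5", new SketchToken("foo", 0, 3));
        // prints "acbd18db4cc2f85cedef654fccc4a4d8 0,3"
        System.out.println(out.term + " " + out.startOffset + "," + out.endOffset);
    }
}
```

Keeping the offsets unchanged is a design assumption of this sketch that mirrors the analysis output quoted above. It does not settle the "0,3" versus "0,32" question raised in the thread.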
