Dear solr-user community,
I'd like to apologize upfront if this question has been asked in a similar
way before, but while searching I didn't find anything that could help me
with my question/problem.

I'm seeing the following inconsistency with my schema.xml between debugging
(the ANALYSIS page) and index time.

From a tweet crawler I get content like this:


The "<" and ">" chars from this link are encoded in UTF-8.

At index time I'm trying to get rid of all the HTML markup, which I tried
using the


But because of the encoding, the tags above aren't recognised as HTML tags, so
I made a small tweak and added an additional

 to map the characters 

I tried this in the field debugging tool (ANALYSIS) in Solr and everything
worked fine: the HTML tags were stripped and the resulting terms look like
this: "Twitter|for|Android"
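
As an illustration of the intended order of steps (map the encoded brackets first, then strip the tags, then tokenize), here is a minimal Python sketch. It is not the Solr configuration from this post: the sample input, the assumption that the brackets arrive as `&lt;` and `&gt;` entities, and all function names are hypothetical, and the tag removal is a crude regular expression rather than a real HTML stripper.

```python
import re

# Hypothetical sample input; assumes the brackets arrive as HTML entities.
SAMPLE = ('&lt;a href="http://example.com/" rel="nofollow"&gt;'
          'Twitter for Android&lt;/a&gt;')

# Assumed character mapping, analogous to a mapping step run before stripping.
MAPPING = {"&lt;": "<", "&gt;": ">"}

# Crude tag pattern, for illustration only.
TAG_RE = re.compile(r"<[^>]+>")


def map_chars(text):
    """Step 1: replace each encoded sequence with its literal character."""
    for encoded, literal in MAPPING.items():
        text = text.replace(encoded, literal)
    return text


def strip_tags(text):
    """Step 2: replace anything that looks like a tag with a space."""
    return TAG_RE.sub(" ", text)


def analyze(text):
    """Map characters, strip tags, then split on whitespace."""
    return strip_tags(map_chars(text)).split()


if __name__ == "__main__":
    print("|".join(analyze(SAMPLE)))
```

Without the mapping step, `strip_tags(SAMPLE)` returns the input unchanged in this sketch, because the text contains no literal "<" for the pattern to match.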

But after indexing my documents, I checked the index and saw that the field
still had the same content as before:

that is, without the tags being stripped.

Does anybody have a clue why it works at debugging time but not at
index time?

I'd be glad if you could give me some input!
Thanks, Florian

PS: 

Here are the fieldtype and field definition, in case they are of use:






--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldType-works-fine-at-ANALYSIS-debuging-but-not-at-index-time-tp4013967.html
Sent from the Solr - User mailing list archive at Nabble.com.
