How are you indexing?
Maybe the angle brackets are being escaped (or not being escaped) properly.
-- Jack Krupansky
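
One way to check this outside Solr is to look at what text survives an XML round trip. The sketch below uses only Python's standard library. The field name `source` and both sample values are made up for illustration, and it mimics XML escaping in general, not any particular Solr client.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def field_value_after_xml_round_trip(raw_value: str) -> str:
    """Escape a value once, embed it in an XML field, and parse it back."""
    # "source" is a hypothetical field name used only for this illustration.
    xml_doc = '<doc><field name="source">' + escape(raw_value) + "</field></doc>"
    return ET.fromstring(xml_doc).find("field").text


# Hypothetical sample values, not taken from the original message.
literal_tags = '<a href="http://example.com">Twitter for Android</a>'
entity_tags = '&lt;a href="http://example.com"&gt;Twitter for Android&lt;/a&gt;'

# The receiver sees exactly what the sender had before escaping.
print(field_value_after_xml_round_trip(literal_tags))
print(field_value_after_xml_round_trip(entity_tags))
```

If the crawler already delivers "&lt;" and "&gt;" as text, a correctly escaped update hands the field that entity text and no real angle brackets, so a tag stripper on its own finds no tags to strip.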
-----Original Message-----
From: meier_flo
Sent: Tuesday, October 16, 2012 7:50 AM
To: solr-user@lucene.apache.org
Subject: FieldType works fine at ANALYSIS(debuging) but not at index time
Dear solr-user community,
I'd like to apologize upfront if this question has been asked in a similar
way before, but while searching I didn't find anything that could help me
with my question/problem....
My schema.xml behaves differently in the analysis (debugging) tool than it
does at index time.
From a tweet crawler I get stuff like this:
The "<" and ">" chars from this link are encoded in UTF-8.
At index time I'm trying to get rid of all the HTML stuff, which I tried
using the
But because of the encoding the tags above aren't recognised as HTML tags so
I did a little tweak and used an additional
to map the characters
I tried this in the field debugging tool (ANALYSIS) in Solr and everything
worked fine - the html tags were stripped and the result/terms are like
this: "Twitter|for|Android"
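
The two-step idea described above (first map the escaped angle brackets back to literal ones, then strip the tags) can be sketched outside Solr. This is a plain Python illustration and not Solr's analysis chain. The mapping table, the regex-based tag stripper, and the sample string are all assumptions made for the example.

```python
import re

# Assumed mapping from escaped angle brackets to literal ones.
ENTITY_MAP = {"&lt;": "<", "&gt;": ">"}


def map_chars(text: str) -> str:
    """Replace each escaped angle bracket with its literal character."""
    for escaped, literal in ENTITY_MAP.items():
        text = text.replace(escaped, literal)
    return text


def strip_tags(text: str) -> str:
    """Crude tag stripper: replace anything shaped like <...> with a space."""
    return re.sub(r"<[^>]*>", " ", text)


def analyze(text: str) -> str:
    """Map characters first, then strip tags, then split on whitespace."""
    return "|".join(strip_tags(map_chars(text)).split())


# Hypothetical sample; the original tweet source string is not reproduced here.
sample = '&lt;a href="http://example.com" rel="nofollow"&gt;Twitter for Android&lt;/a&gt;'

print(analyze(sample))  # Twitter|for|Android
# Without the mapping step the stripper finds no tags and the markup survives.
print("|".join(strip_tags(sample).split()))
```

The order matters: the mapping has to run before the tag stripping, otherwise the stripper never sees a literal "<" or ">".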
But after indexing my documents, I checked the index and saw that the field
still had the same content as before:
without stripping the tags
Does anybody have a clue why it works at debugging time but not at
index time?
I'd be glad if you could give me some input!
Thanks Florian
PS:
Here are the fieldtype and field definition, in case they are of use:
--
View this message in context:
http://lucene.472066.n3.nabble.com/FieldType-works-fine-at-ANALYSIS-debuging-but-not-at-index-time-tp4013967.html