That fixes it in analysis.jsp but not, I think, in
FieldAnalysisRequestHandler. Using the field definition below
and this request:
http://localhost:8983/solr/analysis/field?analysis.fieldtype=html_text&analysis.fieldvalue=%3Chtml%3E%3Cbody%3Ewhatever%3C/body%3E%3C/html%3E
I still see <str name="text"><html><body>whatever</body></html></str>
come out of WhitespaceTokenizer.
Does the consumer of an Analyzer from a FieldType have to do anything
special to make use of CharFilters? Or should it all "just work"?
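To make the expectation concrete, here is a minimal, self-contained sketch of the behavior I would expect. It is not Solr's or Lucene's actual code: TagStrippingReader, whitespaceTokens, and analyze are hypothetical stand-ins written against java.io only, and the tag handling is far cruder than HTMLStripCharFilterFactory. The one assumption it illustrates is that a char filter wraps the raw Reader, so the tokenizer only ever sees the filtered characters.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CharFilterSketch {

    /** Hypothetical stand-in for a char filter: replaces each <...> tag with one space. */
    static class TagStrippingReader extends FilterReader {
        private boolean inTag = false;

        TagStrippingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c;
            while ((c = in.read()) != -1) {
                if (inTag) {
                    if (c == '>') {          // end of tag: emit a separator
                        inTag = false;
                        return ' ';
                    }
                } else if (c == '<') {       // start of tag: swallow it
                    inTag = true;
                } else {
                    return c;                // ordinary character passes through
                }
            }
            return -1;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) break;
                buf[off + n++] = (char) c;
            }
            return (n == 0 && len > 0) ? -1 : n;
        }
    }

    /** Hypothetical stand-in for a whitespace tokenizer. */
    static List<String> whitespaceTokens(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append((char) c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** The filter wraps the raw reader before the tokenizer reads from it. */
    static List<String> analyze(String value) throws IOException {
        return whitespaceTokens(new TagStrippingReader(new StringReader(value)));
    }

    public static void main(String[] args) throws IOException {
        // prints [whatever]
        System.out.println(analyze("<html><body>whatever</body></html>"));
    }
}
```

If the analyzer behind FieldAnalysisRequestHandler composed things in this order, the tokenizer input for the request above would be "whatever" rather than the original markup.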
Erik
On Aug 17, 2009, at 10:52 PM, Yonik Seeley wrote:
I broke it with reusable token streams. Just checked in a fix - can
you try now?
-Yonik
http://www.lucidimagination.com
On Mon, Aug 17, 2009 at 10:17 PM, Erik Hatcher wrote:
I'm interested in using a CharFilter, something like this:
<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
The hope is to put in a value like
"<html><body>whatever</body></html>" and have "whatever" come back out.
In analysis.jsp, I see that happening in the verbose output, but the
stripped text doesn't make it to the tokenizer input - the original
string does.
I must be misunderstanding something about CharFilters and how to use
them in Solr. HTMLStripWhitespaceTokenizerFactory is deprecated in favor
of the above design, I think, but it does what I'm after.
Solr only seems to use CharFilters in analysis.jsp. Is that correct?
Shouldn't they be factored into the analyzer for each field (as in
FieldAnalysisRequestHandler)?
Thanks,
Erik