: As implemented, the trim filter does not update offsets when it trims a
: token. Is this intentional, or has it just not been important to anyone?
In Lucene, token offset information is supposed to reflect exactly where in the original stream of data the source of the token was found. If the token is modified in some way (i.e. stemmed, trimmed, etc.), the offsets are supposed to remain the same, because regardless of the token text munging, the original location has not actually changed. Mucking with the offsets can cause highlighter problems (this is the root cause of SOLR-42, where we currently get the offsets wrong in the HTMLStrip tokenizers).

: With the current impl, I get:
:
: <a href="/get/subject:aaa/"> aaa </a>--<a href="/get/subject:bbb/"> bbb
: </a>(<a href="/get/subject:ccc/">ccc</a>)
:
: I would like to get:
:
: <a href="/get/subject:aaa/">aaa</a> -- <a href="/get/subject:bbb/">
: bbb</a> (<a href="/get/subject:ccc/">ccc</a>)

It looks like it's doing exactly what it should: "highlighting" exactly what in the original text resulted in the ultimate token. If you want the second behavior, perhaps you should use a smarter Tokenizer?

: ryan

-Hoss
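To make the convention concrete, here is a minimal self-contained sketch (not actual Lucene code; the `Token` class and `trim` method are hypothetical stand-ins) of the rule described above: a trim step may shrink the token text, but the start/end offsets keep pointing at the original, untrimmed span, so a highlighter marks the exact source region.

```java
public class TrimOffsetsDemo {
    // Hypothetical stand-in for a Lucene token: text plus offsets
    // into the original character stream.
    static final class Token {
        String text;
        final int startOffset; // where the token began in the original stream
        final int endOffset;   // where the token ended in the original stream
        Token(String text, int start, int end) {
            this.text = text;
            this.startOffset = start;
            this.endOffset = end;
        }
    }

    // Trims whitespace from the token text; deliberately leaves the
    // offsets untouched, per the convention described in the thread.
    static Token trim(Token t) {
        t.text = t.text.trim();
        return t;
    }

    public static void main(String[] args) {
        String original = "<a> aaa </a>";
        // The token " aaa " was found at offsets 3..8 in the original text.
        Token t = trim(new Token(original.substring(3, 8), 3, 8));
        System.out.println(t.text);                              // aaa
        System.out.println(t.startOffset + ".." + t.endOffset);  // 3..8
        // Highlighting with the unchanged offsets reproduces the
        // original (untrimmed) span, including its whitespace:
        System.out.println("[" + original.substring(t.startOffset, t.endOffset) + "]"); // [ aaa ]
    }
}
```

Updating the offsets to the trimmed text instead would make the highlighted span disagree with where the source text actually sat in the stream, which is exactly the class of problem SOLR-42 describes.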
