: As implemented, the trim filter does not update offsets when it trims a
: token. Is this intentional, or has it just not been important to anyone?
In Lucene, token offset information is supposed to reflect exactly where in the original stream of data the source of the token was found. If the token is modified in some way (i.e. stemmed, trimmed, etc.), the offsets are supposed to remain the same, because regardless of the token text munging, the original location has not actually changed. Mucking with the offsets can cause highlighter problems (this is the root cause of SOLR-42, where we currently get the offsets wrong in the HTMLStrip tokenizers).

: With the current impl, I get:
:
: <a href="/get/subject:aaa/"> aaa </a>--<a href="/get/subject:bbb/"> bbb
: </a>(<a href="/get/subject:ccc/">ccc</a>)
:
: I would like to get:
:
: <a href="/get/subject:aaa/">aaa</a> -- <a href="/get/subject:bbb/">
: bbb</a> (<a href="/get/subject:ccc/">ccc</a>)

It looks like it's doing exactly what it should: "highlighting" exactly what in the original text resulted in the ultimate token. If you want the second behavior, perhaps you should use a smarter Tokenizer?

: ryan

-Hoss
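To make the convention concrete, here is a minimal self-contained sketch (not actual Lucene code; the `Token` class and `trim` method are hypothetical stand-ins) of the rule described above: a trim step may shrink the token text, but the start/end offsets keep pointing at the original, untrimmed span, so a highlighter marks the exact source region.

```java
public class TrimOffsetsDemo {
    // Hypothetical stand-in for a Lucene token: text plus offsets
    // into the original character stream.
    static final class Token {
        String text;
        final int startOffset; // where the token began in the original stream
        final int endOffset;   // where the token ended in the original stream
        Token(String text, int start, int end) {
            this.text = text;
            this.startOffset = start;
            this.endOffset = end;
        }
    }

    // Trims whitespace from the token text; deliberately leaves the
    // offsets untouched, per the convention described in the thread.
    static Token trim(Token t) {
        t.text = t.text.trim();
        return t;
    }

    public static void main(String[] args) {
        String original = "<a> aaa </a>";
        // The token " aaa " was found at offsets 3..8 in the original text.
        Token t = trim(new Token(original.substring(3, 8), 3, 8));
        System.out.println(t.text);                              // aaa
        System.out.println(t.startOffset + ".." + t.endOffset);  // 3..8
        // Highlighting with the unchanged offsets reproduces the
        // original (untrimmed) span, including its whitespace:
        System.out.println("[" + original.substring(t.startOffset, t.endOffset) + "]"); // [ aaa ]
    }
}
```

Updating the offsets to the trimmed text instead would make the highlighted span disagree with where the source text actually sat in the stream, which is exactly the class of problem SOLR-42 describes.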
