: Should the EdgeNGramFilter use the same term position for the ngrams within a
: single token?
i can see the argument going both ways ... imagine a hypothetical
CharSplitterTokenFilter that replaces each token in the stream with
one token per character in the original token (ie: hello
On 9/16/07, Ryan McKinley wrote:
> Should the EdgeNGramFilter use the same term position for the ngrams
> within a single token?
It feels like that is the right approach.
I don't see value in having them sequential, and I can think of uses
for having them overlap.
-Yonik
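To make the two options concrete, here is a minimal plain-Java sketch. It does not use Lucene's TokenFilter API, and the class, the `edgeNGrams` method and its parameters are hypothetical. It assumes front-edge grams with minGram=1 and maxGram=5 for the input "hello". With `samePosition=false`, every gram advances the position by 1. That gives positions 1 through 5, matching the sequential numbering Ryan reports from analysis.jsp. With `samePosition=true`, every gram after the first gets a position increment of 0, so all grams overlap at the token's position.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramPositions {

    /** One emitted gram plus the absolute term position it would be indexed at. */
    public record Gram(String text, int position) {}

    /**
     * Hypothetical helper: front-edge n-grams of a single token, sizes minGram..maxGram
     * (clamped to the token length). If samePosition is true, grams after the first get a
     * position increment of 0 (they overlap); otherwise each gram advances the position by 1.
     */
    public static List<Gram> edgeNGrams(String token, int minGram, int maxGram,
                                        int startPosition, boolean samePosition) {
        List<Gram> out = new ArrayList<>();
        int position = startPosition;
        int upper = Math.min(maxGram, token.length());
        for (int size = minGram; size <= upper; size++) {
            if (!out.isEmpty()) {
                // Position increment applied to this gram relative to the previous one.
                position += samePosition ? 0 : 1;
            }
            out.add(new Gram(token.substring(0, size), position));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("sequential:  " + edgeNGrams("hello", 1, 5, 1, false));
        System.out.println("overlapping: " + edgeNGrams("hello", 1, 5, 1, true));
    }
}
```

The overlapping variant uses a position increment of 0, the same device used to stack synonyms at one position. Phrase and proximity queries would then treat all grams of "hello" as occupying a single slot, not five consecutive ones.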
Should the EdgeNGramFilter use the same term position for the ngrams
within a single token?
As is, the EdgeNGramTokenFilter increments the term position for each
character. In analysis.jsp, with the input "hello", I get:
term position 1 2 3 4 5
term text h