Re: EdgeNGramTokenFilter, term position?
: Should the EdgeNGramFilter use the same term position for the ngrams within a
: single token?

i can see the argument going both ways ... imagine a hypothetical CharSplitterTokenFilter that replaces each token in the stream with one token per character in the original token (ie: hello becomes h,e,l,l,o) ... should those tokens all have the same position? they have a logical ordered flow to them, so in theory they are sequential ... but they did occupy the same space in the original token stream.

when in doubt: make it an option

-Hoss
Re: EdgeNGramTokenFilter, term position?
On 9/16/07, Ryan McKinley [EMAIL PROTECTED] wrote:
> Should the EdgeNGramFilter use the same term position for the ngrams within a single token?

It feels like that is the right approach. I don't see value in having them sequential, and I can think of uses for having them overlap.

-Yonik
EdgeNGramTokenFilter, term position?
Should the EdgeNGramFilter use the same term position for the ngrams within a single token?

As is, the EdgeNGramTokenFilter increments the term position for each character. In analysis.jsp, with the input hello, I get:

    term position   1      2      3      4      5
    term text       h      he     hel    hell   hello
    term type       word   word   word   word   word
    start,end       0,1    0,2    0,3    0,4    0,5

I would expect something more like what is generated from SOLR-357:

    term position   1
    term text       hello  hell    hel     he      h
    term type       word   prefix  prefix  prefix  prefix
    start,end       0,5    0,4     0,3     0,2     0,1

This seems like it would affect slop queries, but I don't really understand them yet.

thanks
ryan
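To make the two behaviours concrete, here is a minimal sketch in plain Python. It is not the Lucene or Solr implementation, and the names `edge_ngrams` and `same_position` are hypothetical, chosen only for illustration. It assumes the usual position-increment convention: an increment of 1 opens a new position, and an increment of 0 stacks a token on the previous one.

```python
def edge_ngrams(tokens, same_position):
    """Return (position, text, start, end) for the front-edge n-grams of each token.

    tokens: list of (text, start_offset) pairs, one per original token.
    same_position: hypothetical option; if True, all grams of one token
    share that token's position, otherwise each gram gets its own.
    """
    out = []
    position = 0
    for text, start in tokens:
        for i, n in enumerate(range(1, len(text) + 1)):
            # Increment 0 keeps the gram at the previous position;
            # increment 1 moves on to a new position.
            increment = 0 if (same_position and i > 0) else 1
            position += increment
            out.append((position, text[:n], start, start + n))
    return out


# One token, as in the analysis.jsp example above.
sequential = edge_ngrams([("hello", 0)], same_position=False)
stacked = edge_ngrams([("hello", 0)], same_position=True)

# Two tokens, to see how far apart grams of neighbouring words end up.
tokens = [("hello", 0), ("world", 6)]
for flag in (False, True):
    pos = {text: p for p, text, _, _ in edge_ngrams(tokens, flag)}
    print(flag, pos["hel"], pos["wor"])
```

With `same_position=False`, "hel" and "wor" land at positions 3 and 8; with `same_position=True` they land at 1 and 2, the same spacing as the original words. That difference in position gap is presumably what a phrase or slop query would see.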