Hi,

answering my own question for the record: our experiments show that the
described functionality is achievable with a TokenFilter implementation.
The only caveat, though, is that the Highlighter component stops working
properly if the match position goes beyond the length of the text field.
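For anyone finding this thread later, here is a minimal sketch of the idea.
It assumes a single-character delimiter token "|" and a gap of 10,000; both
are illustrative choices, not the values from our setup. To stay
self-contained it models the position arithmetic in plain Java instead of
calling Lucene. In an actual filter, the same increment would be set inside
incrementToken() through PositionIncrementAttribute.setPositionIncrement(int).
The phrase check is a simplified two-term version of Lucene's sloppy-phrase
distance, not the real scorer.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the sentence-gap idea; no Lucene classes are used.
final class SentenceGapDemo {

    // Hypothetical artificial sentence delimiter (illustrative choice).
    static final String DELIMITER = "|";

    // Illustrative gap, far larger than any slop we expect to query with.
    static final int SENTENCE_GAP = 10_000;

    // Accumulates positions like Lucene does: position = previous + increment.
    // Ordinary tokens get increment 1; the delimiter gets `gap`, mirroring what
    // a TokenFilter would set via PositionIncrementAttribute.setPositionIncrement.
    static List<Integer> positions(List<String> tokens, int gap) {
        List<Integer> result = new ArrayList<>();
        int pos = -1; // so the first ordinary token lands at position 0
        for (String token : tokens) {
            pos += DELIMITER.equals(token) ? gap : 1;
            result.add(pos);
        }
        return result;
    }

    // Simplified two-term sloppy phrase check for "t1 t2"~slop: some pair of
    // occurrences must satisfy |(p2 - 1) - p1| <= slop. Lucene's real scorer
    // also handles repeated terms and longer phrases.
    static boolean sloppyMatch(List<String> tokens, String t1, String t2,
                               int slop, int gap) {
        List<Integer> pos = positions(tokens, gap);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(t1)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j || !tokens.get(j).equals(t2)) continue;
                int distance = Math.abs((pos.get(j) - 1) - pos.get(i));
                if (distance <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two sentences separated by the artificial delimiter token.
        List<String> tokens =
            List.of("solr", "is", "fast", DELIMITER, "lucene", "is", "flexible");

        // Default increment of 1 for the delimiter: the phrase crosses sentences.
        System.out.println(sloppyMatch(tokens, "fast", "lucene", 2, 1));            // true
        // Large increment for the delimiter: the cross-sentence match disappears.
        System.out.println(sloppyMatch(tokens, "fast", "lucene", 2, SENTENCE_GAP)); // false
        // In-sentence matches are unaffected by the gap.
        System.out.println(sloppyMatch(tokens, "solr", "fast", 2, SENTENCE_GAP));   // true
    }
}
```

The point of the model: with the default increment, "fast lucene"~2 matches
across the sentence boundary; once the delimiter carries a large increment,
the distance becomes roughly the gap itself, so only slops of that size could
bridge two sentences, while matches inside one sentence are unchanged.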

As for performance, we have not noticed any major delays compared to the
original proximity search implementation.

Best,

Dmitry Kan

On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Dear list,
>
> We are currently evaluating proximity searches ("term1 term2"~slop) for
> a specific use case. In particular, each document contains artificial
> delimiter characters (one character between each pair of sentences in the
> text). Our goal is to match sentences individually for any proximity
> search and to avoid matches that cross sentence boundaries.
>
> We figured that by using a PositionIncrementAttribute as a field in a
> descendant of the TokenFilter class, it is possible to set the position
> increment of each artificial character (which is a term in Lucene / SOLR
> notation) to an arbitrarily large number. Thus any proximity search with a
> reasonably small slop value should automatically match within sentence
> boundaries.
>
> Does this sound like the right way to tackle the problem? Are there any
> performance costs involved?
>
> Thanks in advance for any input,
>
> Dmitry Kan
>
