Hi Ahment,
Thank you for the reply.
That's exactly what I am doing. At the moment, to index a document, I break
it into sentences, and each sentence is analyzed (lemmatization, stopword
removal, etc.).
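For reference, the sentence-splitting step can be done with the JDK alone. This is a minimal sketch that assumes plain-text input and uses java.text.BreakIterator as the splitter, since the splitter actually used here isn't stated. The class name SentenceSplitter is hypothetical:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {

    // Splits text into sentences using the JDK's locale-aware sentence BreakIterator.
    public static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trim the whitespace BreakIterator leaves attached to each sentence.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String doc = "Lucene indexes documents. Each sentence is analyzed separately.";
        for (String sentence : split(doc, Locale.ENGLISH)) {
            // Each sentence would then go through the per-sentence analysis.
            System.out.println(sentence);
        }
    }
}
```

BreakIterator uses fairly simple rules, so abbreviations such as "Mr. Smith" may produce extra splits. A dedicated sentence detector may do better on real text.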
Now, what I am looking for is a way to create an analyzer (a class that
extends Lucene's Analyzer).
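In Lucene, an Analyzer is essentially a Tokenizer followed by a chain of TokenFilters, wired together by overriding createComponents (its exact signature differs between Lucene versions). The plain-Java sketch below only models that chain so it runs without Lucene on the classpath. It is not Lucene's API. The class name, the tiny stopword list, and the tokenizing regex are hypothetical, and the lemmatizing step is left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class SentenceAnalysisChain {

    // Hypothetical, deliberately tiny stopword list.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "is", "of", "the");

    // Tokenizer step: split on runs of characters that are not letters or digits.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter steps, applied in order, like TokenFilters wrapped around a Tokenizer.
    static final List<UnaryOperator<List<String>>> FILTERS = List.of(
        // Lowercase filter.
        tokens -> tokens.stream()
                        .map(t -> t.toLowerCase(Locale.ROOT))
                        .collect(Collectors.toList()),
        // Stopword filter (runs after lowercasing, so "The" is removed too).
        tokens -> tokens.stream()
                        .filter(t -> !STOPWORDS.contains(t))
                        .collect(Collectors.toList())
        // A lemmatizing filter would be added here as another step.
    );

    // Runs one sentence through the tokenizer and then every filter.
    static List<String> analyze(String sentence) {
        List<String> tokens = tokenize(sentence);
        for (UnaryOperator<List<String>> filter : FILTERS) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The cat is on the mat.")); // [cat, on, mat]
    }
}
```

In a real Analyzer, each lambda would become a TokenFilter subclass (Lucene already ships LowerCaseFilter and StopFilter), and a lemmatizer would slot in as one more filter.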
Currently, how are you indexing sentence boundaries? Are you placing
sentences in distinct fields, leaving a position gap, or... what?
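On the "position gap" option: if each sentence is added as a separate value of the same field, Lucene adds Analyzer#getPositionIncrementGap(fieldName) extra positions between the values. A proximity query whose slop is smaller than that gap then cannot match across two sentences. The plain-Java sketch below simulates only that position arithmetic and uses no Lucene classes. The gap value of 100, the class name, and the simplified distance check are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class PositionGapDemo {

    // One indexed token and its position within the single multi-sentence field.
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Positions are consecutive inside a sentence; `gap` extra positions are
    // skipped before each new sentence (the role getPositionIncrementGap plays).
    static List<Posting> index(List<List<String>> sentences, int gap) {
        List<Posting> postings = new ArrayList<>();
        int pos = -1;
        for (int s = 0; s < sentences.size(); s++) {
            if (s > 0) {
                pos += gap;
            }
            for (String term : sentences.get(s)) {
                pos++;
                postings.add(new Posting(term, pos));
            }
        }
        return postings;
    }

    // Simplified, unordered proximity check: true if some occurrence of `a`
    // is at most `maxDistance` positions away from some occurrence of `b`.
    static boolean near(List<Posting> postings, String a, String b, int maxDistance) {
        for (Posting p : postings) {
            if (!p.term.equals(a)) {
                continue;
            }
            for (Posting q : postings) {
                if (q.term.equals(b) && Math.abs(p.position - q.position) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(List.of("cat", "sat"), List.of("dog", "ran"));
        // Without a gap, "sat" (end of sentence 1) and "dog" (start of sentence 2) are adjacent.
        System.out.println(near(index(doc, 0), "sat", "dog", 5));
        // With a gap larger than the allowed distance, the match across sentences disappears.
        System.out.println(near(index(doc, 100), "sat", "dog", 5));
    }
}
```

Running main prints true, then false. Matches inside a single sentence ("cat" near "sat") are unaffected by the gap.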
Ultimately it comes down to how you intend to query the data in a way that
respects sentence boundaries. To put it simply, why exactly do you care
where the
In the documentation for FieldMaskingSpanQuery, it says:
Note: as getField() returns the masked field, scoring will be done using the
Similarity and collection statistics of the field name supplied, but with the
term statistics of the real field. This may lead to exceptions, poor performance,