Re: Text dependent analyzer

2015-04-15 Thread Shay Hummel
Hi Ahment, thank you for the reply. That's exactly what I am doing. At the moment, to index a document, I break it into sentences, and each sentence is analyzed (lemmatizing, stopword removal, etc.). Now, what I am looking for is a way to create an analyzer (a class which extends Lucene's Analyzer).
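
For illustration only, the per-sentence workflow described in this message can be sketched in plain Java without any Lucene classes. This is a minimal sketch under stated assumptions, not the poster's code: the class name `SentenceAnalysisSketch`, the stopword list, and the tokenization rule are hypothetical, and lemmatization is left out because it would need an external library.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
class SentenceAnalysisSketch {
    // Assumed stopword list, chosen only for this example.
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "is", "it", "on", "of");

    // Split text into sentences using the JDK's sentence BreakIterator.
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Lowercase, split on non-letter characters, and drop stopwords.
    // Lemmatization is intentionally omitted from this sketch.
    static List<String> analyzeSentence(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.toLowerCase(Locale.ENGLISH).split("[^\\p{L}]+")) {
            if (!raw.isEmpty() && !STOPWORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Analyze a whole document one sentence at a time.
    static List<List<String>> analyze(String text) {
        List<List<String>> result = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            result.add(analyzeSentence(sentence));
        }
        return result;
    }
}
```

Calling `SentenceAnalysisSketch.analyze(...)` on a document returns one token list per sentence. Wrapping this behaviour in a class that extends Lucene's Analyzer is the open question of the thread and is not attempted here.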

Re: Text dependent analyzer

2015-04-15 Thread Jack Krupansky
Currently, how are you indexing sentence boundaries? Are you placing sentences in distinct fields, leaving a position gap, or... what? Ultimately it comes down to how you intend to query the data in a way that respects sentence boundaries. To put it simply, why exactly do you care where the
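
As a rough illustration of the "position gap" option raised in this message, the sketch below assigns token positions so that consecutive sentences are separated by a fixed gap. It is plain Java under stated assumptions, not Lucene code: the class `PositionGapSketch`, the `Posting` type, and the gap value of 100 are hypothetical. In Lucene itself, the corresponding hook is `Analyzer.getPositionIncrementGap(String fieldName)`, which applies between multiple values of the same field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class PositionGapSketch {
    // Assumed gap; it must exceed the longest sentence for sameSentence() to hold.
    static final int SENTENCE_GAP = 100;

    // A term together with its position in the token stream.
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Number tokens consecutively within a sentence and skip
    // SENTENCE_GAP extra positions before each later sentence.
    static List<Posting> assignPositions(List<List<String>> sentences) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        boolean first = true;
        for (List<String> sentence : sentences) {
            if (!first) {
                position += SENTENCE_GAP;
            }
            for (String term : sentence) {
                position++;
                postings.add(new Posting(term, position));
            }
            first = false;
        }
        return postings;
    }

    // Two positions fall in the same sentence if they are closer than the gap,
    // assuming every sentence has fewer than SENTENCE_GAP tokens.
    static boolean sameSentence(int positionA, int positionB) {
        return Math.abs(positionA - positionB) < SENTENCE_GAP;
    }
}
```

With positions laid out this way, a phrase or proximity query whose slop is smaller than the gap cannot match across a sentence boundary, which is one way a query can respect those boundaries.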

FieldMaskingSpanQuery and statistics

2015-04-15 Thread Stephen Wu
In the documentation for FieldMaskingSpanQuery, it says: Note: as getField() returns the masked field, scoring will be done using the Similarity and collection statistics of the field name supplied, but with the term statistics of the real field. This may lead to exceptions, poor performance,