Re: question about using lucene on large documents

2014-02-05 Thread Michael Sokolov
No, not really. What would you do if you had a match contained entirely within the overlapping region? You'd probably need a way to distinguish that from a term that matched in two adjacent chunks, but *not* in the overlap. Sounds very tricky to me.

-Mike
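
To make the overlap problem concrete, here is a minimal sketch in plain Java (no Lucene calls). It assumes a hypothetical layout with fixed-size chunks of 100 characters that overlap their neighbour by 20, and it assumes match offsets within each chunk are available, which a plain document-level hit would not give you. The class and method names are invented for illustration only. With offsets, a match inside the overlap maps to the same absolute position from both chunks, while two separate matches outside the overlap map to different positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class OverlapChunks {
    // Hypothetical layout: chunk i covers [i * STRIDE, i * STRIDE + SIZE).
    static final int SIZE = 100;
    static final int OVERLAP = 20;
    static final int STRIDE = SIZE - OVERLAP;

    /** Indices of every chunk that fully contains the span [start, end). */
    static List<Integer> chunksContaining(int start, int end) {
        List<Integer> result = new ArrayList<>();
        // Only chunks starting at or before 'start' can contain the span.
        for (int i = 0; i <= start / STRIDE; i++) {
            int chunkStart = i * STRIDE;
            if (chunkStart <= start && end <= chunkStart + SIZE) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Converts chunk-local hits, each given as {chunkIndex, localOffset},
     * into a sorted set of absolute offsets. A match reported by two
     * chunks because it lies in their overlap collapses to one entry.
     */
    static TreeSet<Integer> absoluteOffsets(int[][] hits) {
        TreeSet<Integer> offsets = new TreeSet<>();
        for (int[] hit : hits) {
            offsets.add(hit[0] * STRIDE + hit[1]);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A span inside the overlap of chunks 0 and 1 is contained in both.
        System.out.println(chunksContaining(85, 90));
        // The same match seen from both chunks: one absolute offset.
        System.out.println(absoluteOffsets(new int[][] {{0, 85}, {1, 5}}));
        // Two different matches, neither in the overlap: two offsets.
        System.out.println(absoluteOffsets(new int[][] {{0, 50}, {1, 40}}));
    }
}
```

Without per-chunk offsets, both situations look identical (chunks N and N+1 each report a hit), which is the ambiguity described above.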

Re: question about using lucene on large documents

2014-02-04 Thread mrodent
Thanks, gives me food for thought. So no { N, N+1 } ideas specifically...

--
View this message in context: http://lucene.472066.n3.nabble.com/question-about-using-lucene-on-large-documents-tp4115343p4115465.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: question about using lucene on large documents

2014-02-04 Thread Michael Sokolov
Ideally you would chunk a document at logical boundaries that will make sense as units of both search and presentation. For some content, these boundaries don't align; for example you might want to search for matches within a paragraph scope, or within a section, chapter, or part of a book, bu