RE: Searching for a phrase which spans on 2 pages

2006-07-13 Thread Ramesh Salla
Yes, this can easily be done using the TokenStream class and hence getting the BestTokens. But of course you have to have this content in the index. DONE Ramesh Reddy On Wed, 2006-07-12 at 12:43 +0100, Mike Streeton wrote: > The simplest solution is always the best - when storing the p

Re: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Erick Erickson
Sweet!

RE: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Mike Streeton
The simplest solution is always the best - when storing the page, do not break up sentences. So a page will consist of all the sentences that occur on it. If a sentence starts on one page and finishes on the next, it will be included in both pages in the index. Hope this helps Mike www.ardentia.com the h
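A minimal sketch of the idea Mike describes, not taken from the thread: build the text to index for each page so that a sentence crossing a page break is stored with both pages. The function name `page_documents` and the naive sentence splitter are assumptions made for illustration; a real setup would feed the resulting strings to whatever indexer is in use.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
# This is an illustrative assumption, not a robust sentence splitter.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def page_documents(pages):
    """Return one text per page; a sentence that crosses a page
    break is included in the text of both pages."""
    # Remember where each page starts and ends in the joined text.
    bounds = []
    pos = 0
    for page in pages:
        bounds.append((pos, pos + len(page)))
        pos += len(page) + 1  # +1 for the joining space
    text = " ".join(pages)

    # Collect (start, end) character spans of every sentence.
    spans = []
    start = 0
    for m in SENTENCE_END.finditer(text):
        spans.append((start, m.start()))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))

    # A sentence belongs to every page whose range it overlaps.
    docs = []
    for p_start, p_end in bounds:
        docs.append(" ".join(text[s:e] for s, e in spans
                             if s < p_end and e > p_start))
    return docs

pages = ["First one. The site is", "unburdened by clutter. Last one."]
docs = page_documents(pages)
```

With this input, the sentence that straddles the break ends up in both entries of `docs`, so a phrase query for "the site is unburdened" can match either page. The cost is some duplicated text in the index and possible double hits for phrases inside the shared sentence.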

Re: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Mile Rosu
Hello Erick, I have been trying some scenarios on Google Books and apparently found a Google bug ... It looks like they use approach number 2, as this query illustrates. http://books.google.com/books?vid=ISBN1564968316&id=14Xx2T8tmMYC&pg=PA8&lpg=PA8&dq=%2B%22the+site+is+unburdened%22&sig=QR

Re: Searching for a phrase which spans on 2 pages

2006-07-11 Thread Erick Erickson
I can think of several approaches, but the experts will no doubt show me up .. 1> index the entire book as a single document. Also, index the beginning and ending offset of each page in separate "documents". Assuming you can find the offset in the big doc of each matching phrase, you can also fin
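As a rough illustration of approach 1, assuming the page start offsets are kept as a sorted list and that the character offsets of a matching phrase in the whole-book text are already known, the page lookup could be sketched as below. The name `pages_for_match` and the offset representation are hypothetical, and how the match offsets are obtained from the index is left open here.

```python
from bisect import bisect_right

def pages_for_match(page_starts, match_start, match_end):
    """Map a match to the pages it touches.

    page_starts: sorted character offsets at which each page begins
                 in the whole-book text (page_starts[0] == 0).
    match_start, match_end: half-open character range of the match.
    Returns 1-based page numbers.
    """
    # Page containing the first character of the match.
    first = bisect_right(page_starts, match_start)
    # Page containing the last character of the match.
    last = bisect_right(page_starts, match_end - 1)
    return list(range(first, last + 1))

# Three pages starting at offsets 0, 23 and 55 (made-up values).
page_starts = [0, 23, 55]
spanning = pages_for_match(page_starts, 11, 33)
single = pages_for_match(page_starts, 60, 65)
```

Here `spanning` covers pages 1 and 2 because the match begins before offset 23 and ends after it, while `single` falls entirely on page 3.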