subject:"Re\: retrieve tokens"

Re: retrieve tokens

2004-12-22 Thread Otis Gospodnetic

Martijn, have you seen the Highlighter in the Lucene Sandbox? If you've stored your text in the Lucene index, there is no need to go back to DB to pull out the blog, parse it, and highlight it - the Highlighter in the Sandbox will do this for you. Otis --- M. Smit [EMAIL PROTECTED] wrote:

Re: retrieve tokens

2004-12-22 Thread M. Smit

Otis, The problem, though, is that I'm a little reluctant to store the data as Field.Text instead of Field.UnStored, because of the sheer size of the documents and the number I would like to index (say some 100-page * 2k documents). But then again, it's size versus

Re: retrieve tokens

2004-12-22 Thread Erik Hatcher

On Dec 22, 2004, at 12:04 PM, M. Smit wrote: The problem, though, is that I'm a little reluctant to store the data as Field.Text instead of Field.UnStored, because of the sheer size of the documents and the number I would like to index (say some 100-page * 2k documents). But then again, it's size

Re: retrieve tokens

2004-12-22 Thread M. Smit

Erik Hatcher wrote: Highlighter does not mandate you store your text in the index. It is just a convenient way to do it. You're free to pull the text from anywhere and highlight it based on the query. Furthermore, you are saying that the highlighter takes care of the corresponding
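The point above, that highlighting depends only on the text and the query and not on where the text is kept, can be illustrated with a minimal sketch. This is not the Sandbox Highlighter's API; the function name `highlight` and the `<b>` markers are assumptions made up for the example, and matching is plain whole-word, case-insensitive comparison rather than analyzer-based token matching.

```python
import re

def highlight(text, query_terms, pre="<b>", post="</b>"):
    """Wrap whole-word, case-insensitive matches of any query term in markers.

    Hypothetical helper for illustration only; it does not use Lucene.
    """
    terms = [re.escape(t) for t in query_terms if t]
    if not terms:
        # Nothing to match, so return the text unchanged.
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

# The text may come from a database row, a file, or a stored index field;
# the highlighting step is the same in every case.
text_from_db = "Lucene stores tokens; the highlighter marks matching tokens."
print(highlight(text_from_db, ["tokens"]))
```

The only inputs are the raw text and the query terms, so the storage decision (index versus database) can be made on size and speed grounds alone.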

Re: retrieve tokens

2004-12-22 Thread Mike Snare

But for the other issue on 'store lucene' vs 'store db'. Can anyone provide me with some field experience on size? The system I'm developing will provide searching through some 2000 PDFs, say some 200 pages each. I feed the plain text into Lucene on a Field.UnStored basis. I also store

Re: retrieve tokens

2004-12-22 Thread Otis Gospodnetic

I suspect Martijn really wants that snippet dynamically generated, with KWIC, as on the lucenebook.com screen shot. Thus, he can't generate and store the snippet at index time, and has to construct it at search time. Otis --- Mike Snare [EMAIL PROTECTED] wrote: But for the other issue on

Re: retrieve tokens

2004-12-22 Thread Otis Gospodnetic

For simpy.com I store the full text of web pages in Lucene, in order to provide full-text web searches. Nutch (nutch.org) does the same. You can set the maximum number of tokens you want indexed via IndexWriter. You can also compress fields in the newest version of Lucene (or maybe just the one

Re: retrieve tokens

2004-12-22 Thread Erik Hatcher

On Dec 22, 2004, at 12:43 PM, M. Smit wrote: Erik Hatcher wrote: But for the other issue on 'store lucene' vs 'store db'. Can anyone provide me with some field experience on size? The system I'm developing will provide searching through some 2000 PDFs, say some 200 pages each. I feed the

Re: retrieve tokens

2004-12-22 Thread Martijn

Erik Hatcher wrote: On Dec 22, 2004, at 12:43 PM, M. Smit wrote: Consider that you're only highlighting 20 or so entries at one time. Getting the text from a Lucene index you're already navigating will be quite quick. But it shouldn't be too bad to pull 20 records from a database either.
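A rough sketch of the idea in this message: build keyword-in-context snippets at search time, and only for the page of hits being displayed, so the cost of fetching text stays small wherever it lives. The names `kwic`, `snippets_for_page`, and `fetch_text` are hypothetical; `fetch_text` stands in for either a stored-field lookup or a database query, and the substring matching is a simplification of what a real highlighter does.

```python
def kwic(text, term, width=30):
    """Return a keyword-in-context fragment around the first match, or None."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix

def snippets_for_page(hit_ids, fetch_text, term, page_size=20):
    """Fetch text and build snippets only for the hits on the current page."""
    return {doc_id: kwic(fetch_text(doc_id), term)
            for doc_id in hit_ids[:page_size]}

# Usage: a dict stands in for the text store (assumption for the example).
store = {1: "alpha beta gamma", 2: "no match here"}
print(snippets_for_page([1, 2], store.get, "beta"))
```

With 20 hits per page, `fetch_text` is called at most 20 times per request, regardless of how many documents matched.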

Re: retrieve tokens

2004-12-22 Thread Martijn

Otis Gospodnetic wrote: I suspect Martijn really wants that snippet dynamically generated, with KWIC, as on the lucenebook.com screen shot. Thus, he can't generate and store the snippet at index time, and has to construct it at search time. Otis That is correct. I won't be having a lot of

10 matches
