Martijn, have you seen the Highlighter in the Lucene Sandbox?
If you've stored your text in the Lucene index, there is no need to go
back to the DB to pull out the blog, parse it, and highlight it - the
Highlighter in the Sandbox will do this for you.
Otis
--- M. Smit wrote:
Otis,
Problem is though that I'm a little reluctant to store the data as
Field.Text instead of Field.UnStored, because of the sheer size of the
documents and the multitude I would like to index (say some 100-page *
2k documents). But then again, it's size versus
On Dec 22, 2004, at 12:04 PM, M. Smit wrote:
Problem is though that I'm a little reluctant to store the data as
Field.Text instead of Field.UnStored, because of the sheer size of the
documents and the multitude I would like to index (say some 100-page *
2k documents). But then again, it's size
Erik Hatcher wrote:
Highlighter does not mandate you store your text in the index. It is
just a convenient way to do it. You're free to pull the text from
anywhere and highlight it based on the query.
Furthermore, you are saying that the highlighter takes care of the
corresponding
But for the other issue of 'store lucene' vs 'store db'. Can anyone
provide me with some field experience on size?
The system I'm developing will provide searching through some 2000
PDFs, say some 200 pages each. I feed the plain text into Lucene on a
Field.UnStored basis. I also store
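For a rough sense of scale, the figures above (2000 PDFs, about 200 pages each) can be turned into a back-of-the-envelope estimate of how much raw text a stored field would have to hold. The 2,000 bytes of plain text per page used below is purely an assumption for illustration, not a measurement from this thread, and StoredSizeSketch / estimateBytes are made-up names. Actual index size also depends on the analyzer, on compression, and on everything else stored per document.

```java
class StoredSizeSketch {

    /** Raw text volume in bytes: documents * pages per document * bytes per page. */
    static long estimateBytes(long documents, long pagesPerDocument, long bytesPerPage) {
        return documents * pagesPerDocument * bytesPerPage;
    }

    public static void main(String[] args) {
        long documents = 2000;        // from the thread
        long pagesPerDocument = 200;  // from the thread
        long bytesPerPage = 2000;     // ASSUMPTION: plain text per page, not measured

        long bytes = estimateBytes(documents, pagesPerDocument, bytesPerPage);
        System.out.println("Estimated raw text: " + bytes + " bytes");
    }
}
```

Under that assumption the stored text alone comes to 800,000,000 bytes, i.e. somewhat under 1 GB; changing bytesPerPage scales the result linearly.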
I suspect Martijn really wants that snippet dynamically generated, with
KWIC, as on the lucenebook.com screen shot. Thus, he can't generate
and store the snippet at index time, and has to construct it at search
time.
Otis
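A minimal sketch of what "construct it at search time" can mean for a KWIC (keyword in context) snippet, written in plain Java with no Lucene dependency. This is not the Sandbox Highlighter's API: KwicSketch, snippet, and indexOfIgnoreCase are hypothetical names, and matching a single literal term is a simplification of highlighting against a parsed query.

```java
class KwicSketch {

    /** Case-insensitive search for term in text; returns -1 if absent. */
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns up to `context` characters on either side of the first
     * occurrence of term, with the match wrapped in <b> tags, or null
     * if the term does not occur in the text.
     */
    static String snippet(String text, String term, int context) {
        int hit = indexOfIgnoreCase(text, term);
        if (hit < 0) {
            return null;
        }
        int end = hit + term.length();
        int from = Math.max(0, hit - context);
        int to = Math.min(text.length(), end + context);

        StringBuilder sb = new StringBuilder();
        if (from > 0) sb.append("...");          // text was cut on the left
        sb.append(text, from, hit);
        sb.append("<b>").append(text, hit, end).append("</b>");
        sb.append(text, end, to);
        if (to < text.length()) sb.append("..."); // text was cut on the right
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "The Highlighter lives in the Sandbox.";
        System.out.println(snippet(text, "highlighter", 4));
    }
}
```

Because the window depends on the search term, the snippet cannot be precomputed at index time; only the source text can be stored, in the index or elsewhere.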
--- Mike Snare wrote:
But for the other issue on
For simpy.com I store the full text of web pages in Lucene, in order to
provide full-text web searches. Nutch (nutch.org) does the same. You
can set the maximal number of tokens you want indexed via IndexWriter.
You can also compress fields in the newest version of Lucene (or maybe
just the one
On Dec 22, 2004, at 12:43 PM, M. Smit wrote:
Erik Hatcher wrote:
But for the other issue of 'store lucene' vs 'store db'. Can anyone
provide me with some field experience on size?
The system I'm developing will provide searching through some 2000
PDFs, say some 200 pages each. I feed the
Erik Hatcher wrote:
On Dec 22, 2004, at 12:43 PM, M. Smit wrote:
Consider that you're only highlighting 20 or so entries at one time.
Getting the text from a Lucene index you're already navigating will be
quite quick. But it shouldn't be too bad to pull 20 records from a
database either.
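The point about only highlighting a page of results at a time can be sketched as follows, in plain Java without Lucene. The text source is passed in as a function, so it could be backed by a stored index field or by a database lookup. PageHighlightSketch, highlightPage, and mark are hypothetical names, and the literal-term matching stands in for real query-based highlighting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

class PageHighlightSketch {

    /** Wraps every case-insensitive occurrence of term in <b> tags. */
    static String mark(String text, String term) {
        return Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .replaceAll("<b>$0</b>");
    }

    /**
     * Fetches and highlights text only for the hits on the requested page
     * (zero-based), so the cost is bounded by pageSize, not by the total
     * number of hits.
     */
    static List<String> highlightPage(List<String> hitIds, int page, int pageSize,
                                      Function<String, String> textSource, String term) {
        int from = Math.min(hitIds.size(), page * pageSize);
        int to = Math.min(hitIds.size(), from + pageSize);
        List<String> out = new ArrayList<>();
        for (String id : hitIds.subList(from, to)) {
            String text = textSource.apply(id); // one lookup per displayed hit
            out.add(mark(text, term));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the index or database: id -> full text.
        Map<String, String> store = new HashMap<>();
        store.put("a", "Lucene rocks");
        store.put("b", "I like lucene");
        store.put("c", "no match");

        List<String> hits = Arrays.asList("a", "b", "c");
        System.out.println(highlightPage(hits, 0, 2, store::get, "lucene"));
    }
}
```

With a page size of 20, at most 20 lookups are made per request regardless of how many documents matched, which is why either storage choice is workable for display.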
Otis Gospodnetic wrote:
I suspect Martijn really wants that snippet dynamically generated, with
KWIC, as on the lucenebook.com screen shot. Thus, he can't generate
and store the snippet at index time, and has to construct it at search
time.
Otis
That is correct. I won't be having a lot of