Re: Search results excerpt similar to Google

2005-01-28 Thread Maik Schreiber
Storing in the index has some performance benefits in the CVS 
version of Lucene, as you can store term position offset information and 
avoid having to re-analyze for highlighting.
Speaking of which, is there a planned release date for a version that 
contains this feature?

--
Maik Schreiber   *   http://www.blizzy.de
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Search results excerpt similar to Google

2005-01-28 Thread Erik Hatcher
On Jan 28, 2005, at 1:46 AM, Jason Polites wrote:
I think they do a proximity result based on keyword matches.  So... If 
you search for "lucene" and the document returned has this word at the 
very start and the very end of the document, then you will see the two 
sentences (sequences of words) surrounding the two keyword matches, 
one from the start of the document and one from the end.
There is a Highlighter package in the Lucene sandbox.  Highlighting 
looks like this:

http://www.lucenebook.com/search?query=highlighter
How you determine which words from the result you include in the 
summary is up to you.  The problem with this it that in Lucene-land 
you have to store the content of the document inside in index verbatim 
(so you can get arbitrary portions of it out).  This means your index 
will be larger than it really needs to be.
You do not have to store the content in the index, it just happens to 
be convenient for most situations.  Content could be stored anywhere.  
Getting the text and reanalyzing it for Highlighter is all that is 
required.  Storing in the index has some performance benefits in the 
CVS version of Lucene, as you can store term position offset 
information and avoid having to re-analyze for highlighting.

Erik
I usually just store the first 255 characters in the index and use 
this as a summary.  It's not as good as Google, but it seems to work 
ok.

- Original Message - From: "Ben" <[EMAIL PROTECTED]>
To: "Lucene" 
Sent: Friday, January 28, 2005 5:08 PM
Subject: Search results excerpt similar to Google

Hi
Is it hard to implement a function that displays the search results
excerpts similar to Google?
Is it just string manipulations or there are some logic behind it? I
like their excerpts.
Thanks
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Search results excerpt similar to Google

2005-01-27 Thread Jason Polites
I think they do a proximity result based on keyword matches.  So... If you 
search for "lucene" and the document returned has this word at the very 
start and the very end of the document, then you will see the two sentences 
(sequences of words) surrounding the two keyword matches, one from the start 
of the document and one from the end.

How you determine which words from the result you include in the summary is 
up to you.  The problem with this it that in Lucene-land you have to store 
the content of the document inside in index verbatim (so you can get 
arbitrary portions of it out).  This means your index will be larger than it 
really needs to be.

I usually just store the first 255 characters in the index and use this as a 
summary.  It's not as good as Google, but it seems to work ok.

- Original Message - 
From: "Ben" <[EMAIL PROTECTED]>
To: "Lucene" 
Sent: Friday, January 28, 2005 5:08 PM
Subject: Search results excerpt similar to Google


Hi
Is it hard to implement a function that displays the search results
excerpts similar to Google?
Is it just string manipulations or there are some logic behind it? I
like their excerpts.
Thanks
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]