Re: Search results excerpt similar to Google
Storing in the index has some performance benefits in the CVS version of Lucene, as you can store term position offset information and avoid having to re-analyze for highlighting. Speaking of which, is there a planned release date for a version that contains this feature? -- Maik Schreiber * http://www.blizzy.de GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713 Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Search results excerpt similar to Google
On Jan 28, 2005, at 1:46 AM, Jason Polites wrote: I think they do a proximity result based on keyword matches. So... If you search for "lucene" and the document returned has this word at the very start and the very end of the document, then you will see the two sentences (sequences of words) surrounding the two keyword matches, one from the start of the document and one from the end. There is a Highlighter package in the Lucene sandbox. Highlighting looks like this: http://www.lucenebook.com/search?query=highlighter How you determine which words from the result you include in the summary is up to you. The problem with this it that in Lucene-land you have to store the content of the document inside in index verbatim (so you can get arbitrary portions of it out). This means your index will be larger than it really needs to be. You do not have to store the content in the index, it just happens to be convenient for most situations. Content could be stored anywhere. Getting the text and reanalyzing it for Highlighter is all that is required. Storing in the index has some performance benefits in the CVS version of Lucene, as you can store term position offset information and avoid having to re-analyze for highlighting. Erik I usually just store the first 255 characters in the index and use this as a summary. It's not as good as Google, but it seems to work ok. - Original Message - From: "Ben" <[EMAIL PROTECTED]> To: "Lucene" Sent: Friday, January 28, 2005 5:08 PM Subject: Search results excerpt similar to Google Hi Is it hard to implement a function that displays the search results excerpts similar to Google? Is it just string manipulations or there are some logic behind it? I like their excerpts. Thanks - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Search results excerpt similar to Google
I think they do a proximity result based on keyword matches. So... If you search for "lucene" and the document returned has this word at the very start and the very end of the document, then you will see the two sentences (sequences of words) surrounding the two keyword matches, one from the start of the document and one from the end. How you determine which words from the result you include in the summary is up to you. The problem with this it that in Lucene-land you have to store the content of the document inside in index verbatim (so you can get arbitrary portions of it out). This means your index will be larger than it really needs to be. I usually just store the first 255 characters in the index and use this as a summary. It's not as good as Google, but it seems to work ok. - Original Message - From: "Ben" <[EMAIL PROTECTED]> To: "Lucene" Sent: Friday, January 28, 2005 5:08 PM Subject: Search results excerpt similar to Google Hi Is it hard to implement a function that displays the search results excerpts similar to Google? Is it just string manipulations or there are some logic behind it? I like their excerpts. Thanks - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]