Re: Duplicate hits using ParallelMultiSearcher

2005-01-24 Thread Jason Polites
Agreed on the "set of unique messages", however the problem I have is with 
the "count" of the Hits.  The Hits object may contain 100 results (for 
example), of which only 90 are unique.  Because I am paging through results 
10 at a time, I need to know the total count without loading each document. 
If I get a count of 100 but a Collection of only 90 my paging breaks.

After careful consideration I have decided that the better approach is to 
create a separate "global" index in which all messages are stored.  This 
will not only relieve my duplication issue but should also scale better 
if/when there are several hundred or several thousand distinct indexes.

Thanks,
- JP
- Original Message - 
From: "PA" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, January 24, 2005 10:43 PM
Subject: Re: Duplicate hits using ParallelMultiSearcher


On Jan 24, 2005, at 09:14, Jason Polites wrote:
I am aware of the Filter object however the unique identifier of my 
document is a field within the lucene document itself (messageid); and I 
am reluctant to access this field using the public API for every Hit as I 
fear it will have drastic performance implications.
Well... I don't see any way around that as you basically want to uniquely 
identify your messages based on their Message-ID.

That said, you don't need to do it during the search itself. You could 
simply perform your search as you do now and then create a set of unique 
messages while preserving Lucene Hits sort ordering for "relevance" 
purpose.

HTH.
Cheers
--
PA
http://alt.textdrive.com/
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Duplicate hits using ParallelMultiSearcher

2005-01-24 Thread PA
On Jan 24, 2005, at 09:14, Jason Polites wrote:
I am aware of the Filter object however the unique identifier of my 
document is a field within the lucene document itself (messageid); and 
I am reluctant to access this field using the public API for every Hit 
as I fear it will have drastic performance implications.
Well... I don't see any way around that as you basically want to 
uniquely identify your messages based on their Message-ID.

That said, you don't need to do it during the search itself. You could 
simply perform your search as you do now and then create a set of 
unique messages while preserving Lucene Hits sort ordering for 
"relevance" purpose.

HTH.
Cheers
--
PA
http://alt.textdrive.com/
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]