Re: Trimming the list of docs returned.

Yonik Seeley Wed, 08 Nov 2006 11:10:52 -0800

On 11/8/06, Tom <[EMAIL PROTECTED]> wrote:

On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
 > Yes, a custom hit collector would work.  Searcher.doc() would be
 > deadly for performance... but since each doc has at most one category,
 > the FieldCache could be used (it quickly maps a document id to its field
 > value, and was historically used for sorting).


Not to be dense, but how do I use a custom HitCollector with Solr?


You would need a custom request handler; from there, just use the
SolrIndexSearcher you get with the request. It exposes all of the
Lucene IndexSearcher methods.
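A minimal sketch of the collector idea discussed above, written as plain self-contained Java so it runs without Lucene on the classpath. The `collect(int doc, float score)` method mirrors the shape of the old Lucene `HitCollector` callback, but the class name `GroupCappingCollector`, the `groupOfDoc` array, and the per-group heap strategy are illustrative assumptions, not Solr or Lucene API. In a real handler the doc-id-to-group array would presumably come from the FieldCache (for example something like `FieldCache.DEFAULT.getStrings(reader, "group")` in Lucene versions of that era, where `"group"` is a hypothetical field name), which avoids calling `Searcher.doc()` for every hit. Because hits are not delivered in score order, the sketch keeps each group's best hits in a small min-heap rather than simply keeping the first ones seen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, self-contained sketch: caps the number of hits kept per group.
public class GroupCappingCollector {

    /** One matching document and its score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final String[] groupOfDoc;  // doc id -> group id; stand-in for a FieldCache array
    private final int maxPerGroup;
    // Per group, a min-heap on score holding that group's best hits seen so far.
    private final Map<String, PriorityQueue<Hit>> best = new HashMap<>();

    public GroupCappingCollector(String[] groupOfDoc, int maxPerGroup) {
        this.groupOfDoc = groupOfDoc;
        this.maxPerGroup = maxPerGroup;
    }

    /** Called once per matching document; hits need not arrive in score order. */
    public void collect(int doc, float score) {
        String group = groupOfDoc[doc];  // cheap array lookup instead of loading the stored document
        PriorityQueue<Hit> heap = best.computeIfAbsent(group,
                g -> new PriorityQueue<Hit>(Comparator.comparingDouble((Hit h) -> h.score)));
        if (heap.size() < maxPerGroup) {
            heap.add(new Hit(doc, score));
        } else if (maxPerGroup > 0 && score > heap.peek().score) {
            heap.poll();  // evict this group's current lowest-scoring hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns at most n surviving hits, highest score first. */
    public List<Hit> topHits(int n) {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> heap : best.values()) {
            all.addAll(heap);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

With `maxPerGroup = 2` this gives the Google-style behaviour from the original question. Memory grows with the number of distinct matching groups times `maxPerGroup`, not with the total number of hits.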

-Yonik

On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
 > Hi Tom, I moderated your email in... you need to subscribe to prevent
 > your emails from being blocked in the future.

Thanks. That's fixed, I hope. I was using the wrong address.

 > http://incubator.apache.org/solr/mailing_lists.html
 >
 > On 10/30/06, Tom <[EMAIL PROTECTED]> wrote:
 > > I'd like to be able to limit the number of documents returned from
 > > any particular group of documents, much as Google shows at most
 > > two results from any one website.
 >
 > You bring up an interesting problem that may be of general use.
 > Solr doesn't currently do this, but it should be possible (with some
 > work in the internals).
 >
 > > The docs are all marked with the group they belong to. There will
 > > probably be multiple groups returned from any search. Each document
 > > belongs to only one group.
 >
 > Documents belonging to only one group does make things easier.
 >
 > > I could just examine each returned document, and discard documents
 > > from groups I have seen before, but that seems slow (but I'm not sure
 > > there is a better alternative).
 > >
 > > The number of groups is a fairly high percentage of the number of
 > > documents (maybe 5% of all documents), so building something like a
 > > filter for each group doesn't seem feasible.
 > >
 > > A custom HitCollector of some sort could work, but the javadoc warns
 > > that it "should not call Searcher.doc(int) or IndexReader.document(int)
 > > on every document number encountered", and that seems necessary to get
 > > the group id.
 >
 > Yes, a custom hit collector would work.  Searcher.doc() would be
 > deadly for performance... but since each doc has at most one category,
 > the FieldCache could be used (it quickly maps a document id to its field
 > value, and was historically used for sorting).
 >
 > It might be useful to see what Nutch does in this regard too.
 >
 > -Yonik
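For comparison, the post-filtering approach Tom describes (examine each returned document and discard documents from groups already seen) can be sketched as below. It assumes the same kind of hypothetical doc-id-to-group array rather than loading stored fields, so the per-hit cost is an array lookup instead of a `Searcher.doc()` call. The names `GroupTrimmer` and `trim` are illustrative only. The input must already be sorted by descending score, and because some hits get discarded, the caller would likely need to ask the searcher for more than `n` hits up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of post-filtering a score-ordered hit list by group.
public class GroupTrimmer {

    /**
     * Walks doc ids in descending-score order, keeps at most maxPerGroup docs
     * from each group, and stops once n docs have been kept.
     */
    public static List<Integer> trim(int[] docsByScore, String[] groupOfDoc, int maxPerGroup, int n) {
        Map<String, Integer> keptPerGroup = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int doc : docsByScore) {
            if (kept.size() >= n) {
                break;  // already have enough results
            }
            String group = groupOfDoc[doc];  // array lookup, no stored-field load
            int count = keptPerGroup.getOrDefault(group, 0);
            if (count < maxPerGroup) {
                keptPerGroup.put(group, count + 1);
                kept.add(doc);
            }
        }
        return kept;
    }
}
```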
