Re: Plugin HitCollector

Dennis Kubes Mon, 23 Oct 2006 15:01:16 -0700

We are running into the same issue. Remember that Hits just give you the doc id, and getting hit details from the hit does another read. So looping through the hits to access every document will do a read per document. If it is a small number of hits, no big deal, but the more hits you access, the more time it takes. For our situation limiting the query doesn't work; we need to know information about the hit itself (i.e. a certain field, so we can do a count based on the field). We implemented it using HitCollector modifications in Lucene. This works but is not ideal in terms of speed, so we are looking at making modifications to the IndexReader itself so that when it gets the Hits it also gets our field. Understand, though, that doing something like this changes core Lucene functionality. I am not necessarily recommending doing it this way; we just couldn't find another way.
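A minimal sketch of counting hits by a field value without a per-document read, assuming the field's value for every document has already been loaded into an array indexed by doc id (the job Lucene's FieldCache does, e.g. via `FieldCache.DEFAULT.getStrings(reader, field)`). `DocCollector` is a hypothetical stand-in that mirrors the `collect(int doc, float score)` callback of Lucene's `HitCollector`, and `FieldValueCounter` is likewise a hypothetical name; this is an illustration of the idea, not the HitCollector or IndexReader modification described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's HitCollector: called once per matching
// document with its internal doc id and score, before any Hits are built.
interface DocCollector {
    void collect(int doc, float score);
}

// Counts matching documents per value of one field. The values are assumed
// to be preloaded into an array indexed by doc id (the role FieldCache plays
// in Lucene), so collect() never reads a stored document.
class FieldValueCounter implements DocCollector {
    private final String[] fieldValues; // fieldValues[doc] = field value for doc, or null
    private final Map<String, Integer> counts = new HashMap<>();

    FieldValueCounter(String[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    @Override
    public void collect(int doc, float score) {
        String value = fieldValues[doc];
        if (value != null) {
            // Increment the count for this value, starting at 1 if unseen.
            counts.merge(value, 1, Integer::sum);
        }
    }

    Map<String, Integer> getCounts() {
        return counts;
    }
}
```

Because `collect()` only does an array lookup, the cost per hit stays constant no matter how many hits there are; the trade-off is memory for one array entry per document in the index, loaded once per reader.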


Dennis


Andrzej Bialecki wrote:

steveb wrote:
I would like to use my own HitCollector when doing a search using the NutchBean, as I have a requirement to access every document in the result set but without incurring the cost of traversing the Hits collection.
Accessing every document will be costly no matter what interface you will use ... you may need to rethink your requirements. If you only need to check for presence / absence of certain fields/terms, then perhaps using Lucene query filters is a better idea.
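One way to picture the filter suggestion, as a hedged sketch: build a bit set of the doc ids that contain a term once, then answer presence/absence for each hit with a bit lookup instead of loading the document. Lucene filters of this period worked along these lines (a `Filter` produced a `java.util.BitSet` of allowed doc ids), but the `TermPresenceFilter` class and the `postings` map standing in for the index's inverted lists are hypothetical.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Sketch of a term-presence filter: the set of doc ids containing a term is
// computed once, then each hit is checked with a constant-time bit lookup.
class TermPresenceFilter {
    private final BitSet bits;

    // postings: hypothetical term -> doc ids map standing in for the index;
    // maxDoc: number of documents in the index (used as the initial size).
    TermPresenceFilter(Map<String, List<Integer>> postings, String term, int maxDoc) {
        bits = new BitSet(maxDoc);
        List<Integer> docs = postings.get(term);
        if (docs != null) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
    }

    // True if the document contains the term.
    boolean accepts(int doc) {
        return bits.get(doc);
    }

    // Returns a new set holding only the hits that pass the filter;
    // the caller's hit set is left unchanged.
    BitSet apply(BitSet hits) {
        BitSet result = (BitSet) hits.clone();
        result.and(bits);
        return result;
    }
}
```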
