We are running into the same issue.  Remember that the hits just give
you the doc id, and getting the details of a hit does another read.  So
looping through the hits to access every document costs one read per
document.  For a small number of hits that is no big deal, but the more
hits you access, the more time it takes.  In our situation limiting the
query doesn't work, because we need information about the hit itself
(i.e. a certain field, so we can do a count based on that field).  We
implemented this with a modified HitCollector in Lucene.  It works but
is not ideal in terms of speed, so we are looking at modifying the
IndexReader itself so that when it gets the Hits it also gets our field.
Understand, though, that doing something like this changes core Lucene
functionality.  I am not necessarily recommending this approach; we
just couldn't find another way.
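
To make the collector idea concrete, here is a minimal sketch under stated
assumptions; it is an illustration, not the actual modification described
above.  It does not use Lucene at all: DocCollector is a hypothetical
stand-in for a callback shaped like Lucene's HitCollector.collect(int doc,
float score), and fieldByDoc is a hypothetical in-memory array of field
values indexed by doc id (something like what
FieldCache.DEFAULT.getStrings(reader, field) returns could fill that role,
if your Lucene version has it).  The point is only that the count is built
from doc ids, with no stored-document read per hit.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Lucene-style collector callback: the search
// loop calls collect() once per matching document with its id and score.
interface DocCollector {
    void collect(int doc, float score);
}

// Counts hits per field value using an in-memory array indexed by doc id,
// so no stored document is read while collecting.
class FieldCountCollector implements DocCollector {
    private final String[] fieldByDoc;   // assumed to be loaded once, up front
    private final Map<String, Integer> counts = new TreeMap<>();

    FieldCountCollector(String[] fieldByDoc) {
        this.fieldByDoc = fieldByDoc;
    }

    @Override
    public void collect(int doc, float score) {
        String value = fieldByDoc[doc];
        if (value != null) {             // documents without the field are skipped
            counts.merge(value, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }
}

class FieldCountSketch {
    // Simulates the search loop: feeds each matching doc id to the collector.
    static Map<String, Integer> countByField(String[] fieldByDoc, int[] matchingDocs) {
        FieldCountCollector collector = new FieldCountCollector(fieldByDoc);
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
        return collector.counts();
    }
}
```

The trade-off in this sketch is memory: one array entry per document in
the index is held in RAM in exchange for skipping the per-hit read.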

Dennis

Andrzej Bialecki wrote:
> steveb wrote:
>> I would like to use my own HitCollector when doing a search using the
>> NutchBean as I have a requirement to access every document in the
>> result set but without incurring the cost of traversing the Hits
>> collection.
>>
>
> Accessing every document will be costly no matter what interface you
> use ... you may need to rethink your requirements. If you only
> need to check for presence / absence of certain fields/terms, then 
> perhaps using Lucene query filters is a better idea.
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
