Andrzej Bialecki wrote:
> I'll test it soon - one comment, though. Currently you use a subclass of
> RuntimeException to stop the collecting. I think we should come up with
> a better mechanism - throwing exceptions is too costly.
I thought about this, but I could not see a simple way to achieve it.
And one exception thrown per query is not very expensive. But it is bad
style. Sigh.
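
For reference, the throw-to-stop pattern under discussion looks roughly like the sketch below. Every name here (StopCollecting, DocCallback, ThrowToStopSketch, collectUpTo) is made up for illustration and is not taken from the patch or from Lucene. The exception turns off stack-trace capture, which is usually the expensive part of a throw; that constructor assumes Java 7 or later, and the lambda assumes Java 8.

```java
/** Hypothetical exception used only to abort collecting; not a Lucene class. */
class StopCollecting extends RuntimeException {
    StopCollecting() {
        // No message, no cause, suppression disabled, stack trace disabled.
        super(null, null, false, false);
    }
}

/** Hypothetical stand-in for a collect(doc, score) style callback. */
interface DocCallback {
    void collect(int doc, float score);
}

class ThrowToStopSketch {
    /** Feeds docs to a callback until it throws; returns how many were accepted. */
    static int collectUpTo(int[] matchingDocs, int maxTotalHits) {
        final int[] seen = {0};
        DocCallback callback = (doc, score) -> {
            if (seen[0] >= maxTotalHits) {
                throw new StopCollecting(); // at most one throw per query
            }
            seen[0]++;
        };
        try {
            for (int doc : matchingDocs) {
                callback.collect(doc, 1.0f); // toy score; a real scorer computes this
            }
        } catch (StopCollecting e) {
            // Expected: the callback asked us to stop early.
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(collectUpTo(new int[] {1, 2, 3, 4, 5}, 2));
    }
}
```

The loop has no way to ask the callback whether to continue, so the exception is the only exit, which is the style problem mentioned above.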
> Perhaps the
> HitCollector.collect() method should return a boolean to signal whether
> the searcher should continue working.
We don't really want a HitCollector in this case: we want a TopDocs. So
the patch I made is required: we need to extend the HitCollector that
implements TopDocs-based searching.
Long-term, to avoid the 'throw', we'd need to also:

1. Change:

     TopDocs Searchable.search(Query, Filter, int numHits)

   to:

     TopDocs Searchable.search(Query, Filter, int numHits, int maxTotalHits)

2. Add, for back-compatibility:

     TopDocs Searcher.search(Query query, Filter filter, int numHits) {
       return search(query, filter, numHits, Integer.MAX_VALUE);
     }

3. Add a new method:

     /** Return false to stop hit processing. */
     boolean HitCollector.processHit(int doc, float score) {
       collect(doc, score);   // for back-compatibility
       return true;
     }

   Then change all calls to HitCollector.collect to instead call this,
   and deprecate HitCollector.collect.
I think that would do it. But is it worth it?
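
A rough sketch of how step 3 might behave, using stand-in classes rather than the real Lucene ones. SketchHitCollector, LimitedCollector, EarlyStopSketch and scoreAll are hypothetical names invented for this illustration; only the collect/processHit shape comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for HitCollector; not the real Lucene class. */
abstract class SketchHitCollector {
    /** Existing callback: receives each matching document. */
    public abstract void collect(int doc, float score);

    /** Proposed callback: return false to stop hit processing. */
    public boolean processHit(int doc, float score) {
        collect(doc, score); // for back-compatibility
        return true;
    }
}

/** Hypothetical collector that stops after maxTotalHits documents. */
class LimitedCollector extends SketchHitCollector {
    private final int maxTotalHits;
    final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int maxTotalHits) {
        this.maxTotalHits = maxTotalHits;
    }

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
    }

    @Override
    public boolean processHit(int doc, float score) {
        if (docs.size() >= maxTotalHits) {
            return false; // limit already reached; do not collect this hit
        }
        collect(doc, score);
        return docs.size() < maxTotalHits; // false once the limit is hit
    }
}

class EarlyStopSketch {
    /** Toy scoring loop; returns how many hits were offered to the collector. */
    static int scoreAll(int[] matchingDocs, SketchHitCollector collector) {
        int offered = 0;
        for (int doc : matchingDocs) {
            offered++;
            if (!collector.processHit(doc, 1.0f)) { // toy score
                break; // collector asked to stop; no exception needed
            }
        }
        return offered;
    }

    public static void main(String[] args) {
        LimitedCollector limited = new LimitedCollector(3);
        int offered = scoreAll(new int[] {10, 20, 30, 40, 50}, limited);
        System.out.println(offered + " offered, collected " + limited.docs);
    }
}
```

A collector that only overrides collect() keeps working unchanged, since the default processHit() always returns true; that is the back-compatibility point of step 3.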
In the past I've frequently wanted to be able to extend TopDocs-based
searching, so I think the Lucene patch I've constructed so far is
generally useful.
Doug
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers