Hi,
why not dedup your complete index beforehand instead of at runtime?
There is a dedup tool for that.

Stefan

On 29.05.2006 at 21:20, Stefan Neufeind wrote:

Hi Eugen,

What I've found (if I'm right) is that the page calculation is done
in Lucene. Since it is quite "expensive" (time-consuming) to dedup _all_
results when you only need the first page, I guess it is currently not
done. However, since I also needed the exact number, I did at least find
the "dirty hack" described further down. That helps for the moment.
But as it can take quite a while to find out the exact number of pages,
I suggest that you e.g. compose a "hash" of the words searched for, plus
(to be on the safe side) the number of non-dedupped search results, so
you don't have to compute the exact number again and again when moving
between pages.
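As a rough illustration of that caching idea, here is a minimal sketch in
Java. The class and method names are made up for this example and are not
part of Nutch; how the raw (non-dedupped) hit count and the exact count are
obtained is left to the caller.

import java.util.HashMap;
import java.util.Map;

/*
 * Rough sketch of the caching idea above (illustrative only, not Nutch code):
 * key the expensive exact result count on the search words plus the cheap
 * non-dedupped hit count, so moving between pages reuses the cached value.
 */
public class PageCountCache {

    private final Map<String, Integer> cache = new HashMap<String, Integer>();

    /** Cache key = normalized query words + raw (non-dedupped) hit count. */
    private String key(String queryWords, long rawHitCount) {
        return queryWords.trim().toLowerCase() + "|" + rawHitCount;
    }

    /** Look up a previously computed exact count; null if not cached yet. */
    public synchronized Integer get(String queryWords, long rawHitCount) {
        return cache.get(key(queryWords, rawHitCount));
    }

    /** Remember the exact count once it has been computed the expensive way. */
    public synchronized void put(String queryWords, long rawHitCount, int exactCount) {
        cache.put(key(queryWords, rawHitCount), Integer.valueOf(exactCount));
    }
}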


Hope that helps,
 Stefan

Eugen Kochuev wrote:

And did you manage to locate the place where the per-site filtering
is done? Is it possible to tweak Nutch to make it tell
the exact number of pages after filtering, or is there a problem?

I've got a pending Nutch issue on this:
http://issues.apache.org/jira/browse/NUTCH-288

A dirty (though working) workaround is to do a search with one hit per page
and a start-index of 99999. That gives you the actual start-index of the
last item, which, plus one, is the number of results you are looking for.
Since requesting the last page takes a bit of resources, you might want to
cache that result - so users searching again or navigating
through pages get the number of pages faster.
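For what it's worth, here is a minimal sketch of that workaround in Java.
SearchClient and PageResult are hypothetical stand-ins for whatever call
returns one page of de-duplicated hits; they are not part of the Nutch API.

/*
 * Sketch of the "request the last page" workaround described above.
 */
public class DedupedTotal {

    /** One page of search results: the absolute index of its first hit
     *  and how many hits it actually contains. */
    public interface PageResult {
        int firstHitIndex();   // 0-based index of the first hit on this page
        int hitCount();        // number of hits actually returned
    }

    /** Hypothetical search call: ask for hitsPerPage hits starting at start. */
    public interface SearchClient {
        PageResult fetchPage(String query, int start, int hitsPerPage);
    }

    /**
     * Ask for a single hit at an absurdly high start index (e.g. 99999).
     * The engine falls back to the last available hit, so its index plus
     * the one hit returned is the exact number of de-duplicated results.
     */
    public static int exactResultCount(SearchClient client, String query) {
        PageResult last = client.fetchPage(query, 99999, 1);
        return last.firstHitIndex() + last.hitCount();
    }
}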

PS: To make the OpenSearch connector return the last page instead of
throwing an exception, please apply the patch I attached to the issue.


