[Nutch-dev] Re: IndexOptimizer (Re: Lucene performance bottlenecks)

Doug Cutting Tue, 13 Dec 2005 03:41:08 -0800

Andrzej Bialecki wrote:

Shouldn't this be combined with a HitCollector that collects only the first-n matches? Otherwise we still need to scan the whole posting list...


Yes.  I was just posting the work-in-progress.
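
A minimal sketch of the first-n idea, under stated assumptions: this is not the Lucene HitCollector API, and the class FirstNCollector, its scan helper, and the int[] stand-in for a posting list are all hypothetical names used only for illustration. It shows a collector that signals the scan loop to stop once n matches are gathered, so the rest of the posting list is never read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector; not Lucene's HitCollector.
public class FirstNCollector {
    private final int limit;
    private final List<Integer> docs = new ArrayList<>();
    private int lastDocSeen = -1;

    public FirstNCollector(int limit) {
        this.limit = limit;
    }

    // Records a matching doc id. Returns false when the caller should stop scanning.
    public boolean collect(int doc) {
        if (docs.size() >= limit) {
            return false;
        }
        docs.add(doc);
        lastDocSeen = doc;
        return docs.size() < limit;
    }

    public List<Integer> getDocs() {
        return docs;
    }

    // Highest doc id processed; useful later for estimating the total match count.
    public int getLastDocSeen() {
        return lastDocSeen;
    }

    // Scans a posting list (doc ids in increasing order) and stops after `limit` matches.
    public static FirstNCollector scan(int[] postings, int limit) {
        FirstNCollector collector = new FirstNCollector(limit);
        for (int doc : postings) {
            if (!collector.collect(doc)) {
                break; // remaining postings are never visited
            }
        }
        return collector;
    }
}
```

For example, FirstNCollector.scan(new int[] {2, 5, 9, 14, 20}, 3) gathers docs 2, 5 and 9 and leaves 14 and 20 unread.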

We will also need to estimate the total number of matches by extrapolating linearly from the maximum doc id processed. Finally, it is probably rather slow for large indexes, whose .fdt won't fit in memory. A simple way to improve that might be to use Similarity.floatToByte-encoded floats when sorting (e.g., the norm from an untokenized field) so that documents whose boosts are close are not re-ordered. I'll start work on these in the morning. (It is currently my middle-of-night.)
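
A rough sketch of both ideas, under stated assumptions. The class OptimizerSketch and its methods are hypothetical, and coarseKey is only a stand-in for a lossy float encoding: it keeps the sign, the exponent and the top three mantissa bits, which is not the encoding Similarity.floatToByte produces. The estimate assumes matches are spread evenly over doc ids, and the sort assumes non-negative boosts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; none of these names come from Lucene or Nutch.
public class OptimizerSketch {

    // Linear extrapolation: if matchesSeen hits were found among doc ids
    // 0..maxDocProcessed, scale that density up to the whole index.
    public static long estimateTotalMatches(int matchesSeen, int maxDocProcessed, int numDocs) {
        if (matchesSeen == 0) {
            return 0;
        }
        long scanned = (long) maxDocProcessed + 1; // doc ids start at 0
        return Math.round((double) matchesSeen * numDocs / scanned);
    }

    // Coarse sort key: drops the low 20 mantissa bits so nearby boosts collide.
    // For non-negative floats, ordering of the keys follows ordering of the values.
    public static int coarseKey(float boost) {
        return Float.floatToIntBits(boost) >>> 20;
    }

    // Returns doc ids sorted by descending coarse boost. List.sort is stable,
    // so docs whose boosts share a key keep their original relative order.
    public static List<Integer> sortByCoarseBoost(float[] boosts) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < boosts.length; i++) {
            docs.add(i);
        }
        docs.sort(Comparator.comparingInt((Integer d) -> coarseKey(boosts[d])).reversed());
        return docs;
    }
}
```

With boosts {1.00f, 2.0f, 1.01f, 4.0f}, docs 0 and 2 share a key and stay in their original order, so fewer documents move than under an exact sort. Likewise, 100 matches seen by doc id 9999 in a 1,000,000-document index extrapolate to about 10,000 total matches.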


Doug


