On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index, or only
somehow indexing the top N% (since they only show the
first 1000 or so results anyway)?

With this patch and a top result set in the XML file,
does that mean it will stop scanning the index at that
point? Is there a methodology to actually prune the
index by some scaling factor so that a 4-billion-page
index is only searchable about 1k results deep on
average?
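A minimal sketch of the first question, assuming (as I understand IndexSorter) that documents are renumbered in decreasing order of a query-independent static score, so low doc ids are the best pages. Posting lists then come out in score order, and an AND query can stop as soon as it has collected `max_hits` matches instead of reading every posting. It ignores query-dependent scoring entirely. The names `sort_index_by_static_score`, `search_top_n` and `max_hits` are hypothetical, not Nutch or Lucene APIs, and the in-memory dict stands in for a real inverted index.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory inverted index: term -> ascending list of doc ids.
Index = Dict[str, List[int]]


def sort_index_by_static_score(index: Index, static_score: Dict[int, float]) -> Index:
    """Renumber docs so id 0 has the highest static score, then rewrite postings."""
    ranked = sorted(static_score, key=static_score.__getitem__, reverse=True)
    new_id = {old: new for new, old in enumerate(ranked)}
    return {term: sorted(new_id[d] for d in docs) for term, docs in index.items()}


def search_top_n(index: Index, terms: List[str], max_hits: int) -> Tuple[List[int], int]:
    """AND query that stops after max_hits matches.

    On a score-sorted index the first matches found are the best by static
    score, so the rest of each posting list never has to be read.
    Returns (hits, number of postings consumed).
    """
    lists = [index.get(t, []) for t in terms]
    if not lists:
        return [], 0
    pos = [0] * len(lists)
    hits: List[int] = []
    while len(hits) < max_hits and all(p < len(pl) for p, pl in zip(pos, lists)):
        heads = [pl[p] for p, pl in zip(pos, lists)]
        target = max(heads)
        if all(h == target for h in heads):
            hits.append(target)  # every term occurs in this doc
            pos = [p + 1 for p in pos]
        else:
            # Skip ahead in every list that is behind the candidate doc.
            for i, pl in enumerate(lists):
                while pos[i] < len(pl) and pl[pos[i]] < target:
                    pos[i] += 1
    return hits, sum(pos)


scores = {10: 0.1, 11: 0.9, 12: 0.5, 13: 0.7, 14: 0.3}
raw = {"nutch": [10, 11, 12, 13, 14], "lucene": [11, 12, 13, 14]}
index = sort_index_by_static_score(raw, scores)
hits, read = search_top_n(index, ["nutch", "lucene"], max_hits=2)
print(hits, read)  # [0, 1] 4  (old docs 11 and 13; a full scan reads 9 postings)
```

The point is that the cutoff costs nothing at index time: the full index stays on disk, and only the query loop decides how deep to go.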

It seems like some method to do the above would
cut your search processing and index size down fairly
well. But it may be more expensive to post-process
to this scale than it is to simply push it all and let
the query optimizer ignore it as needed. After all, disk
space is getting rather cheap compared to CPU
processing and memory.
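The post-processing option raised above, static pruning, can be sketched as follows. Assuming doc ids already equal static rank (0 is the best page), pruning to the top `keep_top` documents just truncates each sorted posting list. The index shrinks, but the dropped pages can never be returned for any query; that lost recall is what gets traded against keeping the full index on cheap disk. `prune_index`, `posting_count` and `keep_top` are hypothetical names for illustration only.

```python
from bisect import bisect_left
from typing import Dict, List

# Hypothetical in-memory inverted index: term -> ascending doc ids,
# where doc id == static rank (0 is the highest-scored page).
Index = Dict[str, List[int]]


def prune_index(index: Index, keep_top: int) -> Index:
    """Static pruning: keep only postings for the keep_top best-ranked docs."""
    # Lists are sorted, so everything before bisect_left(docs, keep_top) has id < keep_top.
    return {term: docs[:bisect_left(docs, keep_top)] for term, docs in index.items()}


def posting_count(index: Index) -> int:
    """Total number of postings, a rough proxy for index size on disk."""
    return sum(len(docs) for docs in index.values())


index = {"nutch": list(range(10)), "lucene": [1, 4, 8]}
pruned = prune_index(index, keep_top=3)
print(pruned)                                       # {'nutch': [0, 1, 2], 'lucene': [1]}
print(posting_count(index), posting_count(pruned))  # 13 4
```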



--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Andrzej Bialecki wrote:
> > I'm happy to report that further tests performed on a larger index
> > seem to show that the overall impact of the IndexSorter is
> > definitely positive: performance improvements are significant, and
> > the overall quality of results seems at least comparable, if not
> > actually better.
> 
> Great news!
> 
> I will submit the Lucene patches ASAP, now that we know they're
> useful.
> 
> Doug
> 

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
