I've got a 400-million-page db I can run this against over the
next few days.

-byron

--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:

> Hi Andrzej,
> 
> Wow, this is really great news!
> > As I reported previously, some of the top-scoring results were
> > missing when using the optimized index. As it happens, the missing
> > results were typically the "junk" pages with high tf/idf but low
> > "boost". Since we collect up to N hits, going from higher to lower
> > "boost" values, the "junk" pages with low "boost" values were
> > automatically eliminated. So, overall, the subjective quality of
> > the results improved. On the other hand, some of the legitimate
> > results with decent "boost" values were also skipped because they
> > didn't fit within the fixed number of hits... ah, well. Perhaps we
> > should limit the number of hits in LimitedCollector using a cutoff
> > "boost" value, and not the maximum number of hits (or maybe both?).
> 
> While we are still experimenting, it would be good to have both.
> 
> > To conclude, I will add IndexSorter.java to the core classes, and
> > I suggest we continue the experiments ...
> 
> Maybe someone out there in the community has a commercial search
> engine running (e.g. a Google Appliance or similar), so we could set
> up a Nutch instance with the same pages and compare the results.
> I guess it will be difficult to compare Nutch with Yahoo or Google,
> since none of us has a 4-billion-page index up and running. I would
> run one on my laptop, but I do not have the bandwidth to fetch it all
> in the next two days. :-D
> Great work!
> 
> Cheers,
> Stefan 
> 
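The idea discussed above (stopping collection at a maximum number of hits, at a cutoff "boost" value, or both) can be sketched as follows. This is not Nutch's LimitedCollector or IndexSorter code: the class BoostCutoffCollector, its run() helper and the boosts array are hypothetical, and the sketch assumes matching documents arrive in increasing docId order over an index already sorted so that boost never increases with docId.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Nutch code: early termination over a boost-sorted index.
// Assumes docs are visited in increasing docId order and boost never increases with docId.
public class BoostCutoffCollector {
    private final float[] boosts;   // hypothetical per-document boost, indexed by docId
    private final int maxHits;      // <= 0 disables the hit-count limit
    private final float minBoost;   // Float.NEGATIVE_INFINITY disables the boost cutoff
    private final List<Integer> hits = new ArrayList<>();

    public BoostCutoffCollector(float[] boosts, int maxHits, float minBoost) {
        this.boosts = boosts;
        this.maxHits = maxHits;
        this.minBoost = minBoost;
    }

    /** Records a matching document; returns false when collection should stop. */
    public boolean collect(int doc) {
        // Index is sorted by decreasing boost: once one doc falls below the cutoff,
        // every later doc does too, so we can stop immediately.
        if (boosts[doc] < minBoost) {
            return false;
        }
        hits.add(doc);
        // Keep going unless the hit limit is enabled and has now been reached.
        return maxHits <= 0 || hits.size() < maxHits;
    }

    public List<Integer> hits() {
        return hits;
    }

    /** Feeds matching docIds (in increasing order) until the collector asks to stop. */
    public static List<Integer> run(int[] matches, float[] boosts, int maxHits, float minBoost) {
        BoostCutoffCollector c = new BoostCutoffCollector(boosts, maxHits, minBoost);
        for (int doc : matches) {
            if (!c.collect(doc)) {
                break;
            }
        }
        return c.hits();
    }

    public static void main(String[] args) {
        float[] boosts = {5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.5f}; // already sorted, highest first
        int[] matches = {0, 2, 3, 4, 5};                      // docs matching some query
        System.out.println(run(matches, boosts, 10, Float.NEGATIVE_INFINITY)); // [0, 2, 3, 4, 5]
        System.out.println(run(matches, boosts, 0, 1.5f));                     // [0, 2, 3]
        System.out.println(run(matches, boosts, 2, 1.5f));                     // [0, 2]
    }
}
```

With only a hit limit, low-boost "junk" is dropped only if it falls past the limit; with only a cutoff, every result above the chosen boost is kept however many there are; combining both caps the work while still discarding low-boost pages.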



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
