It's really helpful to get feedback like this. I had done some
tests of Nutch result quality a long time ago, but this is the
first external evaluation that I know of.

  If anyone from OSU is listening, thanks for your help!

  --Mike

PS - Does anyone know (Doug?) whether we are crawling the entire
OSU site?  Does Google have a coverage advantage?



On Wed, 2004-06-23 at 11:37, Doug Cutting wrote:
> All,
> 
> Attached is a comparison of Nutch to a Google Appliance, searching the 
> campus intranet of Oregon State University, contributed by Lyle Albert 
> Benedict.
> 
> Some of the problems with Nutch were configuration-related and easy to 
> fix.  Nutch wasn't configured correctly to crawl all of the Oregon State 
> domains, and it was configured to crawl too many CGIs (e.g., the campus
> map pages).
> 
> Some problems have already been fixed, e.g., indexing PDF and Word documents.
> 
> The most glaring deficiency that remains in Nutch is that we don't yet 
> combine multiple hits from the same host.  This is logged as bug #752168:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=752168&group_id=59548&atid=491356
> 
> Probably this logic should be added to NutchBean.java.  Does anyone want 
> to volunteer to work on this?
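A minimal sketch of what combining hits per host could look like, assuming the hits arrive already sorted by descending score. The `Hit` record, the `limitPerHost` method, and the two-per-host limit are hypothetical illustrations, not part of the actual NutchBean.java API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostDedup {
    // Hypothetical hit type (not Nutch's real Hit class): host, URL, score.
    record Hit(String host, String url, float score) {}

    // Keeps at most maxPerHost hits from each host.
    // Assumes `hits` is sorted by descending score, so the best hits per host survive
    // and the overall ranking order is preserved.
    static List<Hit> limitPerHost(List<Hit> hits, int maxPerHost) {
        Map<String, Integer> perHost = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            int count = perHost.getOrDefault(hit.host(), 0);
            if (count < maxPerHost) {
                kept.add(hit);
                perHost.put(hit.host(), count + 1);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical example hosts and scores.
        List<Hit> hits = List.of(
            new Hit("a.example.edu", "http://a.example.edu/1", 0.9f),
            new Hit("a.example.edu", "http://a.example.edu/2", 0.8f),
            new Hit("a.example.edu", "http://a.example.edu/3", 0.7f),
            new Hit("b.example.edu", "http://b.example.edu/1", 0.6f));
        for (Hit hit : limitPerHost(hits, 2)) {
            System.out.println(hit.url());
        }
    }
}
```

A single pass with a per-host counter keeps this linear in the number of hits. One caveat: if many hits are dropped, the searcher would need to fetch more raw hits to still fill a results page.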
> 
> But overall I think Nutch fared pretty well!
> 
> Thanks, Lyle!  This is really great to have.  Thanks also to Scott 
> Kveton and Robert Hopson at OSU.
> 
> Doug
> 
> -------- Original Message --------
> Subject: New results
> Date: Thu, 3 Jun 2004 14:28:44 -0500 (CDT)
> From: Lyle Albert Benedict
> To: Doug Cutting
> 
> Hi Doug
> 
> Here are more complete results for OSU. I wanted to run them by you; go ahead
> and post them on the list if you want.
> 
> The nutch server is down again. The results are pretty much the same as in
> the shorter study. I didn't really look much at the parameters: Nutch
> did very well in comparison to Google, and fine-tuning it for this type of
> study seemed like it might be counterproductive.
> 
> The only real problem I could find was a tendency for Nutch to get stuck
> in endless repetitions of the same thing, i.e., submaps of the campus maps
> and mailing lists. Google's key (similar to sponsored links) was also
> interesting. Sort of like coffee cup holders: not necessary, but a feature.
> 
> If you have another project like this, I would be happy to do it. I could
> also write documentation. Any ideas? I'll be on vacation at the end of
> the month, so I might not get back to you right away, but I'm gung
> ho.
> 
> Thanks
> Lyle
> 
> 
> 

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
