It's really helpful to get feedback like this; I had done some tests of Nutch results quality a long time ago, but this is the first external one that I know of.
If anyone from OSU is listening, thanks for your help!

--Mike

PS - Does anyone know (Doug?) whether we are crawling the entire OSU site? Does Google have a coverage advantage?

On Wed, 2004-06-23 at 11:37, Doug Cutting wrote:
> All,
>
> Attached is a comparison of Nutch to a Google Appliance, searching the
> campus intranet of Oregon State University, contributed by Lyle Albert
> Benedict.
>
> Some of the problems with Nutch were configuration-related and easy to
> fix. Nutch wasn't configured correctly to crawl all of the Oregon State
> domains, and it was configured to crawl too many CGIs (e.g., the campus
> map pages).
>
> Some problems have been fixed, e.g., indexing PDF and Word documents.
>
> The most glaring deficiency that remains in Nutch is that we don't yet
> combine multiple hits from the same host. This is logged as bug #752168:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=752168&group_id=59548&atid=491356
>
> Probably this logic should be added to NutchBean.java. Does anyone want
> to volunteer to work on this?
>
> But overall I think Nutch fared pretty well!
>
> Thanks, Lyle! This is really great to have. Thanks also to Scott
> Kveton and Robert Hopson at OSU.
>
> Doug
>
> -------- Original Message --------
> Subject: New results
> Date: Thu, 3 Jun 2004 14:28:44 -0500 (CDT)
> From: Lyle Albert Benedict
> To: Doug Cutting
>
> Hi Doug
>
> Here is a more complete result on OSU. I wanted to fly it by you--go ahead
> and post it on the list if you want.
>
> The server nutch is down again. The results are pretty much the same as
> the shorter study. I didn't really look much at the parameters--as nutch
> did very well in comparison to google--and fine tuning it on this type of
> study seemed like it might be counterproductive.
>
> The only real problem I could find was a tendency for nutch to get stuck
> in endless repetitions of the same thing--i.e. submaps of the campus maps
> and mailing lists. Google's key (similar to sponsored link) was also
> interesting. Sort of like coffee cup holders--not necessary--but a feature.
>
> If you have another project like this, I would be happy to do it. I could
> also write documentation. Any ideas? I'll be on vacation the last of
> the month, so I might not get back to you right away--but I'm gung
> ho.
>
> Thanks
> Lyle
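
For anyone considering the host-grouping work described in bug #752168, here is a minimal sketch of the general idea: walk a ranked hit list and keep only the first few hits per host. It is not Nutch code and does not use NutchBean's actual API. The class name HostCollapse, the method collapseByHost, the maxPerHost parameter, and the example.edu URLs are all hypothetical, and hits are represented as plain URL strings for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT part of Nutch: caps the number of hits shown per host.
class HostCollapse {

    // Returns the ranked URLs in their original order, keeping at most
    // maxPerHost entries for any single host.
    static List<String> collapseByHost(List<String> rankedUrls, int maxPerHost) {
        Map<String, Integer> seenPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            // URI.create throws IllegalArgumentException on malformed input;
            // a real implementation would need to handle that.
            String host = URI.create(url).getHost();
            if (host == null) {
                host = url; // no host component: treat the whole string as its own group
            }
            host = host.toLowerCase(Locale.ROOT); // host names are case-insensitive
            int count = seenPerHost.merge(host, 1, Integer::sum);
            if (count <= maxPerHost) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up example hosts, standing in for repeated campus-map hits.
        List<String> ranked = Arrays.asList(
            "http://maps.example.edu/a",
            "http://maps.example.edu/b",
            "http://maps.example.edu/c",
            "http://lists.example.edu/x");
        System.out.println(collapseByHost(ranked, 2));
    }
}
```

A real version would presumably also need to tell the user that further hits from a host were hidden, and to fetch extra hits so that a full page of results survives the filtering. This sketch leaves both out.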
