I have decided to work on documentation for Nutch and
would like to know whether there are any benchmark
programs or keyword lists that developers use to test
their implementations.

I have been running keyword searches to see how Nutch behaves.
Here are some of my observations:

1) Sites using Nutch return more diverse results than
the major search engines. Results from Google and
Yahoo overlap much more when searching with the same
keywords.

2) It looks like Nutch doesn't use anchor text to rank
search results. On Google, when people search for
"miserable failure", the top link is to the biography
of President George W. Bush. (This is because a large
number of weblogs link to that page using that phrase
as the anchor text.)

3) Nutch filters out less porn than the major search
engines. For example, try the keyword "cheerleaders".
----------
Barry Bowen


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers