I have decided to work on documentation for Nutch and wanted to know if there are any benchmark programs or keyword lists that developers are using to test their implementations.
I have been running keyword searches to see how Nutch works. Here are some of my observations:

1) Sites using Nutch return more diverse results than the major search engines. There is a lot more overlap between the results from Google and Yahoo when searching with the same keywords.

2) It looks like Nutch doesn't use anchor text to determine search results. On Google, when people search for "miserable failure", the top link is to the biography of President George W. Bush. (This is due to anchor text on a large number of weblogs.)

3) Nutch filters out less porn than the major search engines. For example, try the keyword "cheerleaders".

----------
Barry Bowen

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
