Florent Gluck wrote:
> Both return more or less the same results (with a difference of ~1.5% in the number of fetches, which is not surprising on a 100k set). I checked the logs, and in both cases I see exactly 100'000 fetch attempts. You were right: it actually makes sense that the settings in /mapred-default.xml/ would affect the local crawl as well, since they have nothing to do with NDFS. It therefore seems that /protocol-httpclient/ is reliable enough to be used (well, at least in my case).
This slight difference could perhaps be caused by the different protocol headers that the two plugins send and accept. The most important conclusion from these tests is that neither plugin is horribly broken; the problem seems to have been that the mapred values were set in the wrong file...
Thank you very much for checking this!

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
