Andrzej Bialecki wrote:
> (FYI: if you wonder how it was working before, the trick was to generate just 1 split for the fetch job, which then led to just one task being created for any input fetchlist.)
I don't think that's right. The generator calls setNumReduceTasks() with the desired number of fetch tasks, which controls how many host-disjoint fetchlists are generated. The fetcher then does not permit its input files to be split, so fetch tasks remain host-disjoint. So lots of splits can be generated (by default, as many as mapred.map.tasks), permitting lots of parallel fetching.
This should still work. If it does not, I'd be interested to hear more details.
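The mechanism described above can be illustrated with a minimal Python simulation. It assumes URLs are assigned to fetchlists by a hash of their host name, and every function name in it is a hypothetical illustration, not a Nutch or Hadoop API. It shows that when each fetchlist file becomes exactly one fetch task, no host appears in two tasks, whereas splitting a fetchlist into chunks can put one host into several tasks.

```python
import zlib
from urllib.parse import urlsplit


def host_partition(url, num_fetch_tasks):
    """Pick a fetchlist for a URL by hashing its host (hypothetical partitioner)."""
    host = urlsplit(url).hostname or ""
    # crc32 is stable across runs, unlike Python's per-process salted hash().
    return zlib.crc32(host.encode("utf-8")) % num_fetch_tasks


def generate(urls, num_fetch_tasks):
    """Stand-in for a generator run with num_fetch_tasks reduce tasks:
    each partition becomes one fetchlist, so all URLs of a host share a list."""
    fetchlists = [[] for _ in range(num_fetch_tasks)]
    for url in urls:
        fetchlists[host_partition(url, num_fetch_tasks)].append(url)
    return fetchlists


def unsplittable_tasks(fetchlists):
    """Input files may not be split: one fetch task per fetchlist, never more."""
    return [list(fl) for fl in fetchlists]


def splittable_tasks(fetchlists, chunk_size):
    """Counterexample: if files could be split, a fetchlist becomes several tasks."""
    return [fl[i:i + chunk_size]
            for fl in fetchlists
            for i in range(0, len(fl), chunk_size)]


def hosts_are_disjoint(tasks):
    """True if no host name appears in more than one task."""
    owner = {}
    for task_id, task in enumerate(tasks):
        for url in task:
            host = urlsplit(url).hostname
            if owner.setdefault(host, task_id) != task_id:
                return False
    return True


if __name__ == "__main__":
    hosts = ["a.example.com", "b.example.org", "c.example.net"]
    urls = [f"http://{h}/page{i}" for h in hosts for i in range(4)]
    fetchlists = generate(urls, num_fetch_tasks=2)
    print(hosts_are_disjoint(unsplittable_tasks(fetchlists)))        # True
    print(hosts_are_disjoint(splittable_tasks(fetchlists, chunk_size=2)))  # False
```

In this sketch the number of fetchlists (the generator's reduce-task count) sets the degree of parallelism, and refusing to split input files is what keeps each host confined to a single fetch task.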
Doug

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
