[ https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514874 ]
Doğacan Güney commented on NUTCH-524:
-------------------------------------

If you are fetching N URLs from a single host, then you should fetch all N URLs from a single machine, no matter how many machines you have. This is necessary for web politeness (your fetcher should keep at most one connection open to a server at any time).

PS: Your patch unnecessarily removes and re-adds the entire file even though it actually changes only a single line. In the future, please do not attach a patch that touches lines it doesn't change.

> Generate Problem with Single Node
> ---------------------------------
>
> Key: NUTCH-524
> URL: https://issues.apache.org/jira/browse/NUTCH-524
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 0.9.0
> Environment: All
> Reporter: Daniel Clark
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: nutch-0.9_PartitionUrlByHost.patch
>
>
> Nutch with Hadoop has problems with a single node in URL list when there is a
> cluster of two or more machines. I will provide a fix for this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
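
For readers following the thread, the host-affinity rule described in the comment can be illustrated with a minimal, standalone sketch. This is not the Nutch generator code or the attached patch; the class name HostPartitionSketch and the method partitionFor are hypothetical, and the hash-modulo scheme is only an assumption about one simple way to keep every URL of a host on the same machine.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only: not part of Nutch or of the attached patch.
final class HostPartitionSketch {

    // Maps a URL to a partition (machine) index using its host name only,
    // so every URL of one host lands on the same partition.
    static int partitionFor(String url, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        String key;
        try {
            String host = new URI(url).getHost();
            // Host names are case-insensitive; fall back to the raw URL
            // when no host can be extracted.
            key = (host != null) ? host.toLowerCase(Locale.ROOT) : url;
        } catch (URISyntaxException e) {
            key = url;
        }
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int machines = 4; // assumed cluster size, for illustration
        String[] urls = {
            "http://example.com/a",
            "http://example.com/b?x=1",
            "http://example.org/"
        };
        for (String u : urls) {
            System.out.println(u + " -> partition " + partitionFor(u, machines));
        }
    }
}
```

With a scheme like this, a URL list that contains a single host always maps to one partition, so only one machine fetches from that server regardless of cluster size.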