[
https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514874
]
Doğacan Güney commented on NUTCH-524:
-------------------------------------
If you are fetching N URLs from a single host, then you should fetch all N URLs
from a single machine, no matter how many machines you have. This is necessary
for web politeness (your fetcher should keep at most one connection open to a
server at any time).
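
To illustrate the point, here is a minimal sketch of partitioning by host
rather than by full URL. It is not the code from Nutch or from the attached
patch; the class name HostPartitionSketch and the helper partitionFor are
made up for this example, and the hashing scheme is only an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class HostPartitionSketch {

    // Hypothetical helper: choose a partition from the URL's host only,
    // so every URL of one host is assigned to the same partition.
    static int partitionFor(String url, int numPartitions) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            host = null;
        }
        // Fall back to the whole string when no host can be parsed.
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int machines = 4; // assumed cluster size, for illustration only
        String[] urls = {
            "http://example.com/a",
            "http://example.com/b?x=1",
            "http://example.org/"
        };
        for (String url : urls) {
            System.out.println(url + " -> partition " + partitionFor(url, machines));
        }
    }
}
```

Because the key is the host alone, the two example.com URLs always end up in
the same partition, whatever the number of machines. Different hosts may
still share a partition, which is fine for politeness.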
PS: Your patch unnecessarily removes and re-adds the entire file even though it
actually changes just a single line. In the future, please do not attach a
patch that touches lines it doesn't change.
> Generate Problem with Single Node
> ---------------------------------
>
> Key: NUTCH-524
> URL: https://issues.apache.org/jira/browse/NUTCH-524
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 0.9.0
> Environment: All
> Reporter: Daniel Clark
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: nutch-0.9_PartitionUrlByHost.patch
>
>
> Nutch with Hadoop has problems with a single node in URL list when there is a
> cluster of two or more machines. I will provide a fix for this.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers