[ 
https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514874
 ] 

Doğacan Güney commented on NUTCH-524:
-------------------------------------

If you are fetching N URLs from a single host, then you should fetch all N URLs 
from a single machine, no matter how many machines you have. This is necessary 
for web politeness (your fetcher should keep at most one connection open to a 
server at any time).
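
To illustrate the point, here is a minimal sketch of host-based partitioning, 
assuming the partition is derived from a hash of the URL's host. The class and 
method names (HostPartitionSketch, partitionFor) are hypothetical and are not 
Nutch's actual code; the idea is only that every URL of a given host maps to 
the same partition, and therefore to the same fetcher machine.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration; not the partitioner shipped with Nutch.
class HostPartitionSketch {

    // Returns an index in [0, numPartitions) that depends only on the URL's host,
    // so all URLs of one host end up in the same partition.
    static int partitionFor(String url, int numPartitions) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            host = null;
        }
        // Assumption: fall back to the whole URL when no host can be parsed.
        String key = (host != null ? host : url).toLowerCase(Locale.ROOT);
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this scheme, a URL list that contains only one host yields exactly one 
non-empty partition regardless of cluster size, which is the behavior described 
above rather than a bug.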

PS: Your patch unnecessarily removes and re-adds the entire file even though it 
actually changes just a single line. In the future, please do not attach a 
patch that touches lines it doesn't change.

> Generate Problem with Single Node
> ---------------------------------
>
>                 Key: NUTCH-524
>                 URL: https://issues.apache.org/jira/browse/NUTCH-524
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Daniel Clark
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: nutch-0.9_PartitionUrlByHost.patch
>
>
> Nutch with Hadoop has problems with a single node in URL list when there is a 
> cluster of two or more machines.  I will provide a fix for this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

