[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines
[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552028#comment-13552028 ]

Sebastian Nagel commented on NUTCH-1499:

So, a vote for "won't fix". Comments?

> Usage of multiple ipv4 addresses and network cards on fetcher machines
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-1499
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1499
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 1.5.1
>            Reporter: Walter Tietze
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: apache-nutch-1.5.1.NUTCH-1499.patch
>
>
> Adds for the fetcher threads the ability to use multiple configured ipv4
> addresses.
> On some cluster machines there are several ipv4 addresses configured where
> each ip address is associated with its own network interface.
> This patch enables to configure the protocol-http and the protocol-httpclient
> to use these network interfaces in a round robin style.
> If the feature is enabled, a helper class reads at *startup* the network
> configuration. In each http network connection the next ip address is taken.
> This method is synchronized, but this should be no bottleneck for the overall
> performance of the fetcher threads.
> This feature is tested on our cluster for the protocol-http and the
> protocol-httpclient protocol.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
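
The issue description quoted above (addresses read once at startup, then handed out in round-robin order by a synchronized method) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name RoundRobinLocalAddresses and its methods are hypothetical and are not taken from the attached patch, whose contents are not shown in this thread.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper for illustration; NOT the class from the NUTCH-1499 patch. */
class RoundRobinLocalAddresses {
    private final List<InetAddress> addresses;
    private int next = 0;

    RoundRobinLocalAddresses(List<InetAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("no local addresses given");
        }
        // Defensive copy: the list is fixed after construction ("read at startup").
        this.addresses = new ArrayList<>(addresses);
    }

    /** Collects the IPv4 addresses of all non-loopback interfaces that are up. */
    static RoundRobinLocalAddresses fromLocalInterfaces() throws SocketException {
        List<InetAddress> found = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics != null) {
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        found.add(addr);
                    }
                }
            }
        }
        return new RoundRobinLocalAddresses(found);
    }

    /** Returns the next address in round-robin order; synchronized for use by many fetcher threads. */
    synchronized InetAddress next() {
        InetAddress addr = addresses.get(next);
        next = (next + 1) % addresses.size();
        return addr;
    }

    /** Parses an IP literal such as "10.0.0.1" without a DNS lookup. */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }
}
```

A caller would then bind each outgoing connection to the selected source address before connecting, for example with new Socket(host, port, selector.next(), 0), where a local port of 0 lets the system pick a free port. The critical section is a list lookup and an increment, which fits the description's expectation that synchronization should not become a bottleneck.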
[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines
[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533956#comment-13533956 ]

Walter Tietze commented on NUTCH-1499:

Hi Sebastian,

it seems that the implemented feature of using multiple IPs does not have any impact on the performance of the crawler.
With a standard network configuration, all traffic has to go through the configured default gateway, which in this case is the bottleneck for network traffic.
Configuring advanced routing tables does not seem to be a real solution.
*Best practice for avoiding the network bottleneck seems to be your suggested solution with the bonding interface.*
This patch merely makes it possible to use several different IP addresses on one fetcher host.
You can close this task if you want to.
[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines
[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507944#comment-13507944 ]

Sebastian Nagel commented on NUTCH-1499:

Thanks! That's a plausible reason: (let's call it) "administrative constraints".
+1 (lean patch, looks good, I'll try to test it on a machine with suitable network settings)
[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines
[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506650#comment-13506650 ]

Walter Tietze commented on NUTCH-1499:

@Sebastian, sorry for not answering until now.
In our cluster we also use the bonding driver.
I have already asked our partners' network administrators why they did not want to use this kind of configuration, and I am still waiting for their response.
Whether or not I get good reasons, I will let you know at once!
[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines
[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504136#comment-13504136 ]

Sebastian Nagel commented on NUTCH-1499:

Short and precise patch. However, is there a reason why the problem is not solved at the hardware or system level, cf. bonding (http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding)?