[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

2013-01-12 Thread Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552028#comment-13552028 ]

Sebastian Nagel commented on NUTCH-1499:


So, a vote for "won't fix". Comments?

> Usage of multiple ipv4 addresses and network cards on fetcher machines
> ----------------------------------------------------------------------
>
> Key: NUTCH-1499
> URL: https://issues.apache.org/jira/browse/NUTCH-1499
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 1.5.1
> Reporter: Walter Tietze
> Priority: Minor
> Fix For: 1.7
>
> Attachments: apache-nutch-1.5.1.NUTCH-1499.patch
>
>
> Adds the ability for the fetcher threads to use multiple configured IPv4
> addresses.
> On some cluster machines several IPv4 addresses are configured, each
> associated with its own network interface.
> This patch allows protocol-http and protocol-httpclient to be configured
> to use these network interfaces in round-robin fashion.
> If the feature is enabled, a helper class reads the network configuration
> at *startup*. Each new HTTP connection takes the next IP address.
> The method that selects the address is synchronized, but this should not
> be a bottleneck for the overall performance of the fetcher threads.
> This feature has been tested on our cluster with both the protocol-http
> and the protocol-httpclient plugin.
>  
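
For readers without the attachment at hand, the round-robin selection described above could be sketched roughly as follows. This is not the code from apache-nutch-1.5.1.NUTCH-1499.patch; the class name RoundRobinLocalAddress and its methods are hypothetical, and only standard java.net calls are used.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not taken from the NUTCH-1499 patch.
public class RoundRobinLocalAddress {
    private final List<InetAddress> addresses;
    private int next = 0;

    public RoundRobinLocalAddress(List<InetAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("no local addresses available");
        }
        this.addresses = new ArrayList<>(addresses);
    }

    // Reads the network configuration once, e.g. at fetcher startup:
    // collects the IPv4 addresses of all interfaces that are up and
    // are not loopback interfaces.
    public static RoundRobinLocalAddress fromSystem() throws SocketException {
        List<InetAddress> found = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs != null) {
            for (NetworkInterface nif : Collections.list(nifs)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        found.add(addr);
                    }
                }
            }
        }
        return new RoundRobinLocalAddress(found);
    }

    // Synchronized so that concurrent fetcher threads each get the next
    // address in turn; the critical section is a single index update.
    public synchronized InetAddress next() {
        InetAddress addr = addresses.get(next);
        next = (next + 1) % addresses.size();
        return addr;
    }
}
```

A caller would then bind each outgoing socket to the selected local address before connecting, for example with socket.bind(new InetSocketAddress(selector.next(), 0)), where port 0 lets the system pick an ephemeral port.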

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

2012-12-17 Thread Walter Tietze (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533956#comment-13533956 ]

Walter Tietze commented on NUTCH-1499:
--

Hi Sebastian,

it seems that the implemented feature of using multiple IPs does not have any 
impact on the performance of the crawler.

With a standard network configuration, all traffic has to go through the 
configured default gateway, which in this case is the bottleneck for network 
traffic.

Configuring advanced routing tables does not seem to be a real solution.



*The best practice for avoiding the network bottleneck seems to be the 
solution you suggested, the bonding interface.*



This patch merely makes it possible to use several different IP addresses on 
one fetcher host.

You can close this task if you want to.





[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

2012-12-01 Thread Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507944#comment-13507944 ]

Sebastian Nagel commented on NUTCH-1499:


Thanks! That's a plausible reason: (let's call it) "administrative constraints".
+1 (lean patch, looks good, I'll try to test it on a machine with suitable 
network settings)



[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

2012-11-29 Thread Walter Tietze (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506650#comment-13506650 ]

Walter Tietze commented on NUTCH-1499:
--

@Sebastian,

sorry for not answering until now.

In our cluster we also use the bonding driver. I have already asked our 
partners' network administrators why they didn't want to use this kind of 
configuration and am still waiting for their response.

Whether or not I get good reasons, I will let you know at once!





[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

2012-11-26 Thread Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504136#comment-13504136 ]

Sebastian Nagel commented on NUTCH-1499:


Short and precise patch. However, is there a reason why the problem is not 
solved at the hardware or system level, cf. 
[[bonding|http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding]]?
