My teammates and I are running Giraph on a cluster where a firewall is configured on each compute node. We had 100 ports opened on the compute nodes (30000-30100), which we thought would be more than enough to accommodate a large number of workers. However, we're unable to go beyond about 90 workers with our Giraph jobs, because Netty ports are being allocated outside that range. We're not sure why this is happening. We shouldn't be running more than one worker per compute node, so we assumed that only port 30000 would be used on each node, but we routinely see Giraph try to use ports above 30100 when we request close to 100 workers. This leads us to believe that a simple one-up numbering scheme is being used, one that increments the port per worker without taking the host into consideration, although this is only speculation.
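To make that guess concrete, here is a minimal sketch of the numbering scheme we suspect. It is not taken from the Giraph source: the class, the method names, and the idea that the port is simply a base port plus a cluster-wide task id are all our assumptions, and the constants just mirror our firewall range.

```java
import java.util.ArrayList;
import java.util.List;

public class PortSchemeSketch {
    // Firewall range from our cluster (treated as inclusive here).
    static final int BASE_PORT = 30000;
    static final int MAX_OPEN_PORT = 30100;

    // Hypothetical scheme: the port depends only on a job-wide task id,
    // not on which host the task runs on. This is our speculation,
    // not confirmed Giraph behaviour.
    static int suspectedPort(int taskId) {
        return BASE_PORT + taskId;
    }

    // Task ids whose suspected port would fall outside the firewall range.
    static List<Integer> blockedTaskIds(int taskCount) {
        List<Integer> blocked = new ArrayList<>();
        for (int taskId = 0; taskId < taskCount; taskId++) {
            if (suspectedPort(taskId) > MAX_OPEN_PORT) {
                blocked.add(taskId);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("Blocked with 90 tasks:  " + blockedTaskIds(90));
        System.out.println("Blocked with 110 tasks: " + blockedTaskIds(110));
    }
}
```

If something like this is going on, every task gets a different port even when each task sits on its own host, so the number of open ports would cap the number of tasks in a job rather than the number of tasks per node.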
Is there a way around this problem? Our system admins understandably balked at opening 1000 ports.

Larry
