Avery,

Thanks for the clarification. I'll look into adding the configuration option, and I'll see about providing a patch if we go down that path.

Larry

On Fri, Nov 22, 2013 at 2:23 PM, Avery Ching wrote:

> The reason is actually simple. If you run more than one Giraph worker
> per machine, there will be a port conflict. Worse yet, imagine multiple
> Giraph jobs running simultaneously on a cluster; hence the
> increasing-port strategy. It would be straightforward to add a
> configurable option to use a single port for situations such as yours,
> though (especially since you know where the code is now).
>
> Avery
>
>
> On 11/22/13 11:19 AM, Larry Compton wrote:
>
> Avery,
>
> It looks like the ports are being allocated the way we suspected (30000 +
> task ID). That's a problem for us because we'll have to open a wide bank of
> ports (the SAs want to minimize open ports) and also keep them available
> for use by Giraph. Ideally, the port allocation would take the host into
> consideration. If you ask for 200 workers and they're each running on a
> different host, port 30000 could be used by every Netty server. The way
> it's working now, a different port is being allocated per worker, which
> appears unnecessary. Is there a reason a different port is used per
> worker/task?
>
> Is this still the way ports are allocated in Giraph 1.1.0?
>
> Larry
>
>
> On Fri, Nov 22, 2013 at 1:18 PM, Avery Ching wrote:
>
>> The port logic is a bit complex, but all encapsulated in NettyServer.java
>> (see below).
>>
>> If nothing else is running on those ports and you really only have one
>> Giraph worker per port you should be good to go. Can you look at the logs
>> for the worker that is trying to start on a port other than base port +
>> taskId?
>>
>>
>>     int taskId = conf.getTaskPartition();
>>     int numTasks = conf.getInt("mapred.map.tasks", 1);
>>     // Number of workers + 1 for master
>>     int numServers = conf.getInt(GiraphConstants.MAX_WORKERS, numTasks) + 1;
>>     int portIncrementConstant =
>>         (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
>>     int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf) + taskId;
>>     int bindAttempts = 0;
>>     final int maxIpcPortBindAttempts =
>>         MAX_IPC_PORT_BIND_ATTEMPTS.get(conf);
>>     final boolean failFirstPortBindingAttempt =
>>         GiraphConstants.FAIL_FIRST_IPC_PORT_BIND_ATTEMPT.get(conf);
>>
>>     // Simple handling of port collisions on the same machine while
>>     // preserving debugability from the port number alone.
>>     // Round up the max number of workers to the next power of 10 and use
>>     // it as a constant to increase the port number with.
>>     while (bindAttempts < maxIpcPortBindAttempts) {
>>       this.myAddress = new InetSocketAddress(localHostname, bindPort);
>>       if (failFirstPortBindingAttempt && bindAttempts == 0) {
>>         if (LOG.isInfoEnabled()) {
>>           LOG.info("start: Intentionally fail first " +
>>               "binding attempt as giraph.failFirstIpcPortBindAttempt " +
>>               "is true, port " + bindPort);
>>         }
>>         ++bindAttempts;
>>         bindPort += portIncrementConstant;
>>         continue;
>>       }
>>
>>       try {
>>         Channel ch = bootstrap.bind(myAddress);
>>         accepted.add(ch);
>>
>>         break;
>>       } catch (ChannelException e) {
>>         LOG.warn("start: Likely failed to bind on attempt " +
>>             bindAttempts + " to port " + bindPort, e);
>>         ++bindAttempts;
>>         bindPort += portIncrementConstant;
>>       }
>>     }
>>     if (bindAttempts == maxIpcPortBindAttempts || myAddress == null) {
>>       throw new IllegalStateException(
>>           "start: Failed to start NettyServer with " +
>>           bindAttempts + " attempts");
>>
>>     }
>>
>>
>>
>> On 11/22/13 9:15 AM, Larry Compton wrote:
>>
>>> My teammates and I are running Giraph on a cluster where a firewall is
>>> configured on each compute node. We had 100 ports opened on the compute
>>> nodes, which we thought would be more than enough to accommodate a large
>>> number of workers. However, we're unable to go beyond about 90 workers
>>> with our Giraph jobs, due to Netty ports being allocated outside of the
>>> range (30000-30100). We're not sure why this is happening. We shouldn't
>>> be running more than one worker per compute node, so we were assuming
>>> that only port 30000 would be used, but we're routinely seeing Giraph
>>> try to use ports greater than 30100 when we request close to 100
>>> workers. This leads us to believe that a simple one-up numbering scheme
>>> is being used that doesn't take the host into consideration, although
>>> this is only speculation.
>>>
>>> Is there a way around this problem? Our system admins understandably
>>> balked at opening 1000 ports.
>>>
>>> Larry
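
Note on the port arithmetic: the calculation in the quoted NettyServer.java excerpt can be reproduced on its own to see which ports a job may touch. The following is only a minimal sketch, not Giraph code. The class PortPlan and its method names are hypothetical, and singlePortBindPort is just a guess at what the single-port option discussed in this thread might compute; no such option is confirmed here. The base port 30000 and the roughly 100-worker job are taken from the thread.

```java
// Hypothetical helper that mirrors the arithmetic in the quoted excerpt.
final class PortPlan {
    private PortPlan() { }

    /** Retry step: the server count rounded up to a power of 10. */
    static int portIncrement(int numServers) {
        return (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
    }

    /** Port tried by a task on a given attempt (attempt 0 is the first try). */
    static int bindPort(int basePort, int taskId, int numServers, int attempt) {
        return basePort + taskId + attempt * portIncrement(numServers);
    }

    /**
     * Assumed single-port variant: the task ID is ignored, so every host
     * starts at the base port. Only safe with one worker per machine.
     */
    static int singlePortBindPort(int basePort, int numServers, int attempt) {
        return basePort + attempt * portIncrement(numServers);
    }

    public static void main(String[] args) {
        int basePort = 30000;
        int workers = 100;
        int numServers = workers + 1; // workers + 1 for the master

        System.out.println("increment: " + portIncrement(numServers));
        System.out.println("task 95, first attempt: "
            + bindPort(basePort, 95, numServers, 0));
        System.out.println("task 95, second attempt: "
            + bindPort(basePort, 95, numServers, 1));
        System.out.println("single-port, first attempt: "
            + singlePortBindPort(basePort, numServers, 0));
    }
}
```

With 101 servers the retry step is 1000, so a single failed or intentionally skipped first bind attempt (giraph.failFirstIpcPortBindAttempt) moves a task well outside a 30000-30100 firewall window, even when the first-attempt port would have fit.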
