Avery,

Thanks for the clarification. I'll look into adding the configuration
option. I'll see about providing a patch, if we go down that path.

Larry
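
For anyone following the thread, here is a minimal sketch of the port
arithmetic in the NettyServer snippet quoted below, plus one way a
single-port option might behave. It is not Giraph code: the class
PortSketch, the singlePort flag, and the choice to keep the same retry
increment in single-port mode are all hypothetical, and the worker count is
passed in directly instead of being read from the job configuration.

```java
// Illustrative only: PortSketch and the singlePort flag are hypothetical
// names, not part of Giraph.
final class PortSketch {
  private PortSketch() { }

  /**
   * Same arithmetic as the quoted snippet: take workers + 1 (for the
   * master) and round up to the next power of 10.
   */
  static int portIncrement(int maxWorkers) {
    int numServers = maxWorkers + 1;
    return (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
  }

  /**
   * Port tried on a given bind attempt (0-based). With singlePort set
   * (an assumed option), the task ID is ignored for the first attempt.
   */
  static int bindPort(int basePort, int taskId, int maxWorkers,
                      int attempt, boolean singlePort) {
    int firstPort = singlePort ? basePort : basePort + taskId;
    return firstPort + attempt * portIncrement(maxWorkers);
  }

  public static void main(String[] args) {
    // 100 workers + 1 master = 101 servers, so the retry step is 1000.
    System.out.println(portIncrement(100));                    // 1000
    // Current scheme: first attempt is base port + task ID.
    System.out.println(bindPort(30000, 95, 100, 0, false));    // 30095
    // A retry after a failed bind jumps by the increment.
    System.out.println(bindPort(30000, 95, 100, 1, false));    // 31095
    // Hypothetical single-port mode: every task starts at the base port.
    System.out.println(bindPort(30000, 95, 100, 0, true));     // 30000
  }
}
```

With these numbers, any task ID above 100, or any retry after a failed
bind, lands outside a 30000-30100 firewall window, whereas the hypothetical
single-port mode would stay on 30000 as long as the first bind succeeds.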


On Fri, Nov 22, 2013 at 2:23 PM, Avery Ching <[email protected]> wrote:

>  The reason is actually simple.  If you run more than one Giraph worker
> per machine, there will be a port conflict.  Worse yet, imagine multiple
> Giraph jobs running simultaneously on a cluster; hence we have the
> increasing-port strategy.  It would be straightforward to add a
> configurable option to use a single port for situations such as yours,
> though (especially since you know where the code is now).
>
> Avery
>
>
> On 11/22/13 11:19 AM, Larry Compton wrote:
>
>  Avery,
>
>  It looks like the ports are being allocated the way we suspected (30000 +
> task ID). That's a problem for us because we'll have to open a wide bank of
> ports (the SAs want to minimize open ports) and also keep them available
> for use by Giraph. Ideally, the port allocation would take the host into
> consideration. If you ask for 200 workers and they're each running on a
> different host, port 30000 could be used by every Netty server. The way
> it's working now, a different port is being allocated per worker, which
> appears unnecessary. Is there a reason a different port is used per
> worker/task?
>
> Is this still the way ports are allocated in Giraph 1.1.0?
>
>  Larry
>
>
> On Fri, Nov 22, 2013 at 1:18 PM, Avery Ching <[email protected]> wrote:
>
>> The port logic is a bit complex, but all encapsulated in NettyServer.java
>> (see below).
>>
>> If nothing else is running on those ports and you really only have one
>> Giraph worker per port, you should be good to go.  Can you look at the logs
>> for the worker that is trying to bind a port other than base port + taskId?
>>
>>
>>     int taskId = conf.getTaskPartition();
>>     int numTasks = conf.getInt("mapred.map.tasks", 1);
>>     // Number of workers + 1 for master
>>     int numServers = conf.getInt(GiraphConstants.MAX_WORKERS, numTasks) + 1;
>>     int portIncrementConstant =
>>         (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
>>     int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf) + taskId;
>>     int bindAttempts = 0;
>>     final int maxIpcPortBindAttempts =
>>         MAX_IPC_PORT_BIND_ATTEMPTS.get(conf);
>>     final boolean failFirstPortBindingAttempt =
>>         GiraphConstants.FAIL_FIRST_IPC_PORT_BIND_ATTEMPT.get(conf);
>>
>>     // Simple handling of port collisions on the same machine while
>>     // preserving debugability from the port number alone.
>>     // Round up the max number of workers to the next power of 10 and use
>>     // it as a constant to increase the port number with.
>>     while (bindAttempts < maxIpcPortBindAttempts) {
>>       this.myAddress = new InetSocketAddress(localHostname, bindPort);
>>       if (failFirstPortBindingAttempt && bindAttempts == 0) {
>>         if (LOG.isInfoEnabled()) {
>>           LOG.info("start: Intentionally fail first " +
>>               "binding attempt as giraph.failFirstIpcPortBindAttempt " +
>>               "is true, port " + bindPort);
>>         }
>>         ++bindAttempts;
>>         bindPort += portIncrementConstant;
>>         continue;
>>       }
>>
>>       try {
>>         Channel ch = bootstrap.bind(myAddress);
>>         accepted.add(ch);
>>
>>         break;
>>       } catch (ChannelException e) {
>>         LOG.warn("start: Likely failed to bind on attempt " +
>>             bindAttempts + " to port " + bindPort, e);
>>         ++bindAttempts;
>>         bindPort += portIncrementConstant;
>>       }
>>     }
>>     if (bindAttempts == maxIpcPortBindAttempts || myAddress == null) {
>>       throw new IllegalStateException(
>>           "start: Failed to start NettyServer with " +
>>               bindAttempts + " attempts");
>>     }
>>
>>
>>
>> On 11/22/13 9:15 AM, Larry Compton wrote:
>>
>>> My teammates and I are running Giraph on a cluster where a firewall is
>>> configured on each compute node. We had 100 ports opened on the compute
>>> nodes, which we thought would be more than enough to accommodate a large
>>> number of workers. However, we're unable to go beyond about 90 workers with
>>> our Giraph jobs, due to Netty ports being allocated outside of the range
>>> (30000-30100). We're not sure why this is happening. We shouldn't be
>>> running more than one worker per compute node, so we were assuming that
>>> only port 30000 would be used, but we're routinely seeing Giraph try to use
>>> ports greater than 30100 when we request close to 100 workers. This leads
>>> us to believe that a simple one-up numbering scheme is being used that
>>> doesn't take the host into consideration, although this is only speculation.
>>>
>>> Is there a way around this problem? Our system admins understandably
>>> balked at opening 1000 ports.
>>>
>>> Larry
>>>
>>
>>
>
>
>
