Here are the AM logs:

2015-07-21 17:08:14,279 INFO [ServiceThread:DAGClientRPCServer] ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-07-21 17:08:14,285 INFO [ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag] ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-07-21 17:08:14,299 INFO [Socket Reader #1 for port 46373] ipc.Server: Starting Socket Reader #1 for port 46373
2015-07-21 17:08:14,300 INFO [Socket Reader #1 for port 37949] ipc.Server: Starting Socket Reader #1 for port 37949
2015-07-21 17:08:14,358 INFO [IPC Server Responder] ipc.Server: IPC Server Responder: starting
2015-07-21 17:08:14,364 INFO [IPC Server listener on 46373] ipc.Server: IPC Server listener on 46373: starting
2015-07-21 17:08:14,364 INFO [IPC Server Responder] ipc.Server: IPC Server Responder: starting
2015-07-21 17:08:14,365 INFO [IPC Server listener on 37949] ipc.Server: IPC Server listener on 37949: starting
2015-07-21 17:08:14,374 INFO [ServiceThread:DAGClientRPCServer] client.DAGClientServer: Instantiated DAGClientRPCServer at ip-10-16-141-168.ec2.internal/10.16.141.168:46373
2015-07-21 17:08:14,377 INFO [HistoryEventHandlingThread] impl.SimpleHistoryLoggingService: Writing event AM_LAUNCHED to history file


The interesting thing to note is that the Tez task is trying to connect to
port 37949. The DAGClientRPCServer (which uses the private DNS name) is
instantiated on port 46373. The AM also starts another IPC server on 37949,
though I'm not sure what it is for.
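
To make the port comparison above concrete, here is a rough Python sketch
that pulls the relevant endpoints out of the log lines in this thread. The
function names (am_ports, client_rpc_endpoint, task_target) are made up for
illustration, and the "Got host:port" line only exists because it was added
for debugging, so this is an assumption-laden helper rather than anything
Tez provides.

```python
import re

# Log excerpts copied from this thread, with wrapped lines joined.
AM_LOG = """\
2015-07-21 17:08:14,299 INFO [Socket Reader #1 for port 46373] ipc.Server: Starting Socket Reader #1 for port 46373
2015-07-21 17:08:14,300 INFO [Socket Reader #1 for port 37949] ipc.Server: Starting Socket Reader #1 for port 37949
2015-07-21 17:08:14,374 INFO [ServiceThread:DAGClientRPCServer] client.DAGClientServer: Instantiated DAGClientRPCServer at ip-10-16-141-168.ec2.internal/10.16.141.168:46373
"""

# "Got host:port" is a custom debug line, not a stock Tez message.
TASK_LOG = """\
2015-07-21 17:08:23,025 INFO [main] task.TezChild: Got host:port: 10.16.141.168:37949
"""


def am_ports(log):
    """Ports for which the AM log reports a socket reader starting."""
    pattern = r"Starting Socket Reader #\d+ for port (\d+)"
    return {int(p) for p in re.findall(pattern, log)}


def client_rpc_endpoint(log):
    """(hostname, port) of the DAGClientRPCServer, or None if not logged."""
    m = re.search(r"Instantiated DAGClientRPCServer at (\S+?)/[\d.]+:(\d+)", log)
    return (m.group(1), int(m.group(2))) if m else None


def task_target(log):
    """(host, port) the task says it will connect to, or None if not logged."""
    m = re.search(r"Got host:port: (\S+):(\d+)", log)
    return (m.group(1), int(m.group(2))) if m else None


rpc_host, rpc_port = client_rpc_endpoint(AM_LOG)
task_host, task_port = task_target(TASK_LOG)
other_ports = am_ports(AM_LOG) - {rpc_port}

print("DAGClientRPCServer:", rpc_host, rpc_port)
print("Other AM IPC ports:", sorted(other_ports))
print("Task connects to:  ", task_host, task_port)
print("Task targets the other IPC server:", task_port in other_ports)
```

Run against these excerpts, it shows the task addressing the second IPC
server by IP, while the only endpoint the AM logs by hostname is the
DAGClientRPCServer on the other port.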

On Tue, Jul 21, 2015 at 10:13 AM, Rajat Jain <[email protected]> wrote:

> Hi,
>
> I am running a yarn cluster on AWS. The slave nodes (NMs) are all
> configured to listen on private DNS. For example, a sample node manager
> listens on ip-10-16-141-168.ec2.internal:8042
> <https://multicluster.qubole.net/cluster-proxy?encodedUrl=http%3A%2F%2Fip-10-16-141-168.ec2.internal%3A8042%2F>
> .
>
> When I try to run a Tez job (even a simple one like select count(*)
> from nation), it fails because the child tasks are unable to connect to
> the AM. The issue is that they are trying to connect to the IP instead of
> the private DNS name. Here are some sample log lines (a couple of them
> added by me for debugging):
>
> 2015-07-21 17:08:21,919 INFO [main] task.TezChild: TezChild starting
> 2015-07-21 17:08:22,310 INFO [main] task.TezChild: Using socket factory class: org.apache.hadoop.net.StandardSocketFactory
> 2015-07-21 17:08:22,336 INFO [main] task.TezChild: PID, containerIdentifier: 3699, container_1437498369268_0001_01_000002
> 2015-07-21 17:08:22,418 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> 2015-07-21 17:08:23,025 INFO [main] task.TezChild: Got host:port: 10.16.141.168:37949
> 2015-07-21 17:08:23,035 INFO [main] task.TezChild: address variables: 10.16.141.168:37949
> 2015-07-21 17:08:23,143 INFO [TezChild] task.ContainerReporter: Attempting to fetch new task
> 2015-07-21 17:08:24,201 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:25,202 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:26,757 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:27,758 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
>
>
> The task ultimately fails. Any idea how this can be fixed? These jobs ran
> fine on Tez 0.4.1.
>
> Thanks,
> Rajat
>
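
One way to check whether the retries in the quoted task log reflect a plain
TCP-level failure is to probe the same endpoint by IP and by private DNS
name from the node running the container. The sketch below uses only the
Python standard library; can_connect is a hypothetical helper name, and the
2-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Resolution failures, refusals and timeouts all surface as OSError
    (socket.gaierror and socket.timeout are subclasses), so they map to False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, comparing can_connect("10.16.141.168", 37949) with
can_connect("ip-10-16-141-168.ec2.internal", 37949) while the AM is still
running would show whether the two forms of the address behave differently.
A False result only says the connection could not be opened; it does not
say why.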
