Thanks for responding.

After much hair pulling I found and fixed this.  The problem was that I
wasn't calling setFromConfiguration(tezConf) on the
OrderedPartitionedKVEdgeConfig builder (the other edge config builders
probably need the same call).  The comments in the sample code say the
call is optional (it just lets you override the config with command-line
parameters), but that appears not to be the case, at least for my code :-(
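
In case it helps anyone else, the working shape of my edge setup is
roughly the following sketch.  The key/value/partitioner classes and the
vertex names are placeholders, not necessarily what I actually use:

    // Assumes org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig,
    // org.apache.hadoop.io.{Text, IntWritable},
    // org.apache.hadoop.mapreduce.lib.partition.HashPartitioner,
    // and org.apache.tez.dag.api.Edge are imported.
    OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
        .newBuilder(Text.class.getName(), IntWritable.class.getName(),
            HashPartitioner.class.getName())
        .setFromConfiguration(tezConf)   // <-- the call I was missing
        .build();

    Edge edge = Edge.create(mapVertex, reduceVertex,
        edgeConf.createDefaultEdgeProperty());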

I also needed to make sure that the TezConfiguration I passed in had
already been given to UserGroupInformation.setConfiguration(tezConf).
There's a lot of behind-the-scenes stuff I wasn't aware of...
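
The ordering that worked for me, as a sketch (the TezClient bits are just
how my driver happens to be structured, and "MyApp" is a placeholder):

    // Assumes org.apache.tez.dag.api.TezConfiguration,
    // org.apache.hadoop.security.UserGroupInformation,
    // and org.apache.tez.client.TezClient are imported.
    TezConfiguration tezConf = new TezConfiguration();
    // Do this before any RPC/HDFS access happens with tezConf.
    UserGroupInformation.setConfiguration(tezConf);
    TezClient tezClient = TezClient.create("MyApp", tezConf);
    tezClient.start();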

--Scott

On Thu, Jun 30, 2016 at 3:48 PM, Siddharth Seth <[email protected]> wrote:

> Scott,
> Do you have logs for the entire job? I haven't seen this error before.
> The trace may be the end result of an earlier failure or a decision made
> to kill the task - which causes the task to be interrupted, and hence the
> trace.
>
> Thanks,
> Sid
>
> On Wed, Jun 29, 2016 at 10:00 AM, Scott McCarty <[email protected]>
> wrote:
>
>> Hi,
>>
>> I am trying to get Tez 0.9.0-SNAPSHOT (the latest commit as of this
>> writing, though it also fails with earlier 0.9.0 commits) working with
>> vanilla hadoop 2.6.0, but it's failing with the following under certain
>> conditions:
>>
>> java.lang.RuntimeException: java.io.IOException: Failed on local
>> exception: java.nio.channels.ClosedByInterruptException; Host Details :
>> local host is: "localhost/127.0.0.1"; destination host is:
>> "localhost":9000;
>> at
>> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:209)
>> at
>> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initialize(TezGroupedSplitsInputFormat.java:156)
>> at
>> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:157)
>> at
>> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:88)
>> at
>> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:694)
>> at
>> org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:622)
>> at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:586)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
>> at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: Failed on local exception:
>> java.nio.channels.ClosedByInterruptException; Host Details : local host is:
>> "localhost/127.0.0.1"; destination host is: "localhost":9000;
>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>> at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
>> at
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
>> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
>> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
>> at
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
>> at
>> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:207)
>> ... 11 more
>> Caused by: java.nio.channels.ClosedByInterruptException
>> at
>> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>> at
>> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>> at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>> ... 31 more
>>
>> This happens only if the DAG I've submitted has an Edge connecting nodes
>> in it--if I submit a DAG with just a Vertex then it works as expected.
>> Comparing the log files for the working case and the failing case shows
>> that the working case does a SASL negotiation whereas the failing case
>> doesn't.  I have no idea if that's significant or not.
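>>
>> (As a sketch, the failing DAG shape is roughly the following, with
>> placeholder vertex names and an edge config built elsewhere:
>>
>>   DAG dag = DAG.create("twoVertexDag");
>>   dag.addVertex(mapVertex)
>>      .addVertex(reduceVertex)
>>      .addEdge(Edge.create(mapVertex, reduceVertex,
>>          edgeConf.createDefaultEdgeProperty()));
>>
>> whereas a DAG containing only mapVertex, with no addEdge() call, runs
>> fine.)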
>>
>> What generally can cause this ClosedByInterruptException?  Searching
>> suggests that a protocol version mismatch could also cause it.
>>
>> My hadoop setup is a single node (pseudo-distributed) cluster using the
>> 2.6.0 tarball directly from Apache.  My core-site.xml has fs.defaultFS set
>> to hdfs://localhost:9000 and that's it.
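>>
>> For reference, that means the entire core-site.xml is just:
>>
>>   <configuration>
>>     <property>
>>       <name>fs.defaultFS</name>
>>       <value>hdfs://localhost:9000</value>
>>     </property>
>>   </configuration>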
>>
>> Any ideas on how to solve this would be most appreciated!
>>
>> --Scott
>>
>>
>
