Scott,

Do you have logs for the entire job? I haven't seen this error before. The trace may be the end result of an earlier failure, or of a decision to kill the task - which causes the task to be interrupted, and hence the trace.
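FWIW, the exception itself is just what Java NIO throws when a thread blocked in an interruptible channel operation gets interrupted. A minimal standalone sketch of that mechanism (illustrative only, nothing Tez-specific; the address is a non-routable one so the connect blocks long enough for the interrupt to land):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.ClosedByInterruptException;
    import java.nio.channels.SocketChannel;

    public class InterruptDemo {
        public static void main(String[] args) throws Exception {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        SocketChannel ch = SocketChannel.open();
                        // Blocking connect on an interruptible channel; if this
                        // thread is interrupted while blocked here, NIO closes
                        // the channel and throws ClosedByInterruptException.
                        ch.connect(new InetSocketAddress("10.255.255.1", 9000));
                        ch.close();
                    } catch (ClosedByInterruptException e) {
                        e.printStackTrace();  // same exception as in the task log
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            worker.start();
            Thread.sleep(100);   // give the connect a moment to block
            worker.interrupt();  // analogous to the framework killing the task
            worker.join();
        }
    }

So the interesting question is usually what interrupted the task thread in the first place - the full job logs should show that.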
Thanks,
Sid

On Wed, Jun 29, 2016 at 10:00 AM, Scott McCarty <[email protected]> wrote:
> Hi,
>
> I am trying to get Tez 0.9.0-SNAPSHOT (latest commit as of this writing,
> but still fails with earlier 0.9.0 commits) working with vanilla hadoop
> 2.6.0 but it's failing with the following under certain conditions:
>
> java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "localhost/127.0.0.1"; destination host is: "localhost":9000;
>     at org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:209)
>     at org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initialize(TezGroupedSplitsInputFormat.java:156)
>     at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:157)
>     at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:88)
>     at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:694)
>     at org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:622)
>     at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:586)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
>     at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "localhost/127.0.0.1"; destination host is: "localhost":9000;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>     at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
>     at org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:207)
>     ... 11 more
> Caused by: java.nio.channels.ClosedByInterruptException
>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>     ... 31 more
>
> This happens only if the DAG I've submitted has an Edge connecting nodes
> in it--if I submit a DAG with just a Vertex then it works as expected.
> Comparing the log files for the working case and the failing case shows
> that the working case does a SASL negotiation whereas the failing case
> doesn't. I have no idea if that's significant or not.
>
> What generally can cause this ClosedByInterruptException? Searching
> suggests that a protocol version mismatch could also cause it.
>
> My Hadoop setup is a single-node (pseudo-distributed) cluster using the
> 2.6.0 tarball directly from Apache. My core-site.xml has fs.defaultFS set
> to hdfs://localhost:9000 and that's it.
>
> Any ideas on how to solve this would be most appreciated!
>
> --Scott
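P.S. For anyone trying to reproduce this: the DAG shape being described - two vertices joined by an edge, versus a single vertex - would look roughly like the sketch below against the Tez 0.9 DAG API. The processor class names are placeholders, and the ordered-partitioned edge config is just one common choice, not necessarily what the failing job uses.

    import org.apache.hadoop.io.Text;
    import org.apache.tez.dag.api.DAG;
    import org.apache.tez.dag.api.Edge;
    import org.apache.tez.dag.api.ProcessorDescriptor;
    import org.apache.tez.dag.api.Vertex;
    import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
    import org.apache.tez.runtime.library.partitioner.HashPartitioner;

    public class DagShape {
        public static DAG build() {
            // Placeholder processor class names -- stand-ins for whatever
            // the real job runs.
            Vertex v1 = Vertex.create("v1",
                ProcessorDescriptor.create("com.example.SourceProcessor"), 1);
            Vertex v2 = Vertex.create("v2",
                ProcessorDescriptor.create("com.example.SinkProcessor"), 1);

            // One common edge configuration (ordered, partitioned key-value);
            // the key/value/partitioner choices here are arbitrary.
            OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
                .newBuilder(Text.class.getName(), Text.class.getName(),
                    HashPartitioner.class.getName())
                .build();

            // The failing case: two vertices connected by an edge. Dropping
            // v2 and the edge (a single-vertex DAG) is the case that works.
            return DAG.create("two-vertex-dag")
                .addVertex(v1)
                .addVertex(v2)
                .addEdge(Edge.create(v1, v2,
                    edgeConf.createDefaultEdgeProperty()));
        }
    }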
