From the logs, it looks like a network drop between nodes. If it fails at an exact elapsed time, say 10 minutes, then check the firewall settings, since a firewall may be closing the connection after a fixed timeout.

On Feb 10, 2016 12:27 AM, "Paul Friedman" <[email protected]> wrote:
> Hello...
>
> I'm executing a long-running Drill (1.4) query (4-10mins) called via JDBC
> from Talend and sometimes I'm seeing an error stack like this (see below)
>
> The query is a select statement with an order by against a directory of
> Parquet files which were produced by Spark. Probably half the time it
> succeeds and returns the expected results, but often it's erroring out as
> below.
>
> Can you help with any insights?
>
> Thanks in advance.
>
> ---Paul
>
> ...
> 2016-02-08 16:47:47,275 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State change requested RUNNING --> FINISHED
> 2016-02-08 16:47:47,276 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State to report: FINISHED
> 2016-02-08 16:48:25,496 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested RUNNING --> FAILED
> 2016-02-08 16:48:25,778 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
> 2016-02-08 16:48:25,779 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
> 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> CANCELLATION_REQUESTED
> 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] WARN o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: Ignoring unexpected state transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-08 16:48:25,779 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
> 2016-02-08 16:48:25,780 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FINISHED
> 2016-02-08 16:48:25,781 [UserServer-1] WARN o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel. Connection: /172.20.20.154:31010 <--> /172.20.20.157:64101 (user client)
> java.nio.channels.ClosedChannelException: null
> 2016-02-08 16:48:25,783 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
>
> Fragment 0:0
>
> [Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on chai.dev.streetlightdata.com:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
>
> Fragment 0:0
>
> [Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on chai.dev.streetlightdata.com:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
> Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
>     at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:175) ~[drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:151) ~[drill-rpc-1.4.0.jar:1.4.0]
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.closeOnRead(AbstractEpollStreamChannel.java:409) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:647) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollRdHupReady(AbstractEpollStreamChannel.java:573) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:315) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>     ... 1 common frames omitted
> 2016-02-08 16:48:25,785 [drill-executor-42] WARN o.a.d.exec.rpc.control.WorkEventBus - Fragment 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0 not found in the work bus.
> 2016-02-08 16:48:25,810 [CONTROL-rpc-event-queue] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at CANCELED state (which is terminal).
> 2016-02-08 16:48:25,811 [UserServer-1] INFO o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate query result to initiating client. This would happen if a client is disconnected before response notice can be sent.
> org.apache.drill.exec.rpc.ChannelClosedException: null
>     at org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67) [drill-rpc-1.4.0.jar:1.4.0]
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
> 2016-02-08 16:48:25,812 [UserServer-1] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to FAILED state as query is already at CANCELED state (which is terminal).
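
P.S. One way to test the fixed-timeout idea from the top of this reply is to compare how long each failed run lasted before the channel closed. Below is a minimal Python sketch, assuming you record when each query was submitted and when it failed. The function names, the 5-second tolerance, and the sample timestamps are hypothetical; only the timestamp format is taken from the drillbit log quoted above.

```python
from datetime import datetime

# Drillbit log timestamps look like "2016-02-08 16:48:25,496".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the seconds between two drillbit-style log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


def looks_like_fixed_timeout(durations, tolerance=5.0):
    """True if at least two failures lasted nearly the same time.

    `tolerance` (seconds) is an arbitrary threshold chosen for this sketch.
    """
    if len(durations) < 2:
        return False
    return max(durations) - min(durations) <= tolerance


if __name__ == "__main__":
    # Hypothetical (submitted, failed) pairs; replace with your own runs.
    failed_runs = [
        ("2016-02-08 16:38:25,100", "2016-02-08 16:48:25,496"),
        ("2016-02-09 09:02:10,000", "2016-02-09 09:12:11,250"),
    ]
    durations = [elapsed_seconds(s, e) for s, e in failed_runs]
    print(durations, looks_like_fixed_timeout(durations))
```

If the durations cluster around one value, a firewall or other idle-connection timeout between the client and the drillbit is a plausible suspect. If they vary widely, the cause is more likely something else.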
