This brings us back to the feedback that John Omernik has given time and time again: we need to improve the error/warning messages. Wasn't this discussed at a session on the user experience at the Drill hackathon this week?
Best,
Saurabh

On Thu, Sep 21, 2017 at 10:36 AM, Kunal Khatua <[email protected]> wrote:
> Glad you could resolve it, Alan!
>
> The fact that bumping up the memory settings worked means you might be
> hitting frequent garbage collection pauses that slowed your system down
> to the point that it timed out.
>
> Could you share the config of your nodes and the memory settings that
> worked for you? That way, others can also benefit from your experience.
>
> -----Original Message-----
> From: Alan Höng [mailto:[email protected]]
> Sent: Thursday, September 21, 2017 2:25 AM
> To: [email protected]
> Subject: Re: User client timeout with results > 2M rows
>
> It was resolved by increasing heap memory and direct memory as suggested
> in the thread. Still, the error message is quite confusing.
>
> Thanks for your help!
>
> On Thu, 21 Sep 2017 at 10:07 Alan Höng <[email protected]> wrote:
>
> > Thanks for the search; unfortunately, no result from there seemed to fit
> > my problem. I think what comes closest to my issue is the following
> > thread, which I found on my initial search:
> > https://lists.apache.org/[email protected]:gte=1d:user%20client%20closed%20unexpectedly
> >
> > I'm trying to adjust resources now, but I highly doubt that this is the
> > problem, as I'm running the query on quite a big machine: 64 cores,
> > ~500 GB RAM.
> >
> > Here is the stack trace from the drillbit log:
> >
> > 2017-09-20 14:54:23,860 [263d7f11-78ce-df33-344e-d9a615530c26:frag:1:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 263d7f11-78ce-df33-344e-d9a615530c26:1:0: State to report: FINISHED
> > 2017-09-20 14:56:24,741 [UserServer-1] INFO o.a.drill.exec.rpc.user.UserServer - RPC connection /172.19.0.6:31010 <--> /172.19.0.3:52382 (user client) timed out. Timeout was set to 30 seconds. Closing connection.
> > 2017-09-20 14:56:24,755 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 263d7f11-78ce-df33-344e-d9a615530c26:0:0: State change requested RUNNING --> FAILED
> > 2017-09-20 14:56:24,763 [UserServer-1] WARN o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc response.
> > java.lang.IllegalArgumentException: Self-suppression not permitted
> >   at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.8.0_131]
> >   at org.apache.drill.common.DeferredException.addException(DeferredException.java:88) ~[drill-common-1.9.0.jar:1.9.0]
> >   at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97) ~[drill-common-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:407) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:421) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:208) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42) ~[drill-java-exec-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134) ~[drill-rpc-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74) [drill-rpc-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64) [drill-rpc-1.9.0.jar:1.9.0]
> >   at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692) [hppc-0.7.1.jar:na]
> >   at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58) [drill-rpc-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RemoteConnection.channelClosed(RemoteConnection.java:175) [drill-rpc-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167) [drill-rpc-1.9.0.jar:1.9.0]
> >   at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:146) [drill-rpc-1.9.0.jar:1.9.0]
> >   at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> >   at org.apache.drill.exec.rpc.BasicServer$LogggingReadTimeoutHandler.readTimedOut(BasicServer.java:121) [drill-rpc-1.9.0.jar:1.9.0]
> >   at io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> >   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> >   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> > Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /172.19.0.6:31010 <--> /172.19.0.3:52382.
> >   at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:166) [drill-rpc-1.9.0.jar:1.9.0]
> >   ... 25 common frames omitted
> >
> > On Wed, 20 Sep 2017 at 22:18 Kunal Khatua <[email protected]> wrote:
> >
> >> The client error reported is usually trimmed. What is the stack trace
> >> in the drillbit logs? That will tell you exactly where the timeout
> >> occurred and make it easier for you to work with.
> >>
> >> I did a quick browse through for 'S3 Timeout' (
> >> https://lists.apache.org/[email protected]:gte=1d:%20S3%20timeout ).
> >> You could browse through to see if any of the suggestions there help
> >> unblock you.
> >>
> >> -----Original Message-----
> >> From: Alan Höng [mailto:[email protected]]
> >> Sent: Wednesday, September 20, 2017 12:22 PM
> >> To: [email protected]
> >> Subject: Re: User client timeout with results > 2M rows
> >>
> >> Yes, it takes about 2-3 minutes for the timeout to appear; the query
> >> itself should finish in that time. The files are not that big for
> >> debugging. I have looked there, but I couldn't find anything relevant
> >> or helpful for my situation so far.
> >>
> >> On Wed, 20 Sep 2017 at 20:41 Kunal Khatua <[email protected]> wrote:
> >>
> >> > Do you know how long it takes for this timeout to occur? There might
> >> > be some tuning needed to increase a timeout. Also, I think this (S3
> >> > specifically) has been seen before, so you might find a solution
> >> > within the mailing list archives. Did you try looking there?
> >> >
> >> > From: Alan Höng
> >> > Sent: Wednesday, September 20, 8:46 AM
> >> > Subject: User client timeout with results > 2M rows
> >> > To: [email protected]
> >> >
> >> > Hello,
> >> >
> >> > I'm getting errors when trying to fetch results from Drill for
> >> > queries that evaluate to bigger tables. Surprisingly, it works like
> >> > a charm if the returned table has fewer than 2M rows. It also seems
> >> > like the query is executed and finishes successfully.
> >> >
> >> > I'm querying Parquet files with GZIP compression on S3. I'm running
> >> > Drill in distributed mode with ZooKeeper, using version 1.9 from the
> >> > container available on Docker Hub, "harisekhon/apache-drill:1.9".
> >> > I'm using the pydrill package, which uses the REST API to submit
> >> > queries and gather results.
> >> >
> >> > I get the following error message from the client:
> >> >
> >> > TransportError(500, '{\n "errorMessage" : "CONNECTION ERROR: Connection /172.19.0.3:52382 <--> ef53daab0ef8/172.19.0.6:31010 (user client) closed unexpectedly. Drillbit down?\\n\\n\\n[Error Id: 6a19835b-2325-431e-9bad-dde8f1d3c192 ]"\n}')
> >> >
> >> > I would appreciate any help with this.
> >> >
> >> > Best
> >> > Alan Höng
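[Note for readers finding this thread later: the fix Alan describes, increasing heap and direct memory, is usually done in Drill's conf/drill-env.sh via the standard DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY variables. Alan did not report the exact values he used, so the sizes below are purely illustrative assumptions, not his settings:

```shell
# conf/drill-env.sh -- sketch only; tune the sizes to your hardware.
# DRILL_HEAP is the JVM heap for the drillbit; DRILL_MAX_DIRECT_MEMORY
# is the off-heap (direct) memory Drill uses for query execution buffers.
export DRILL_HEAP="8G"
export DRILL_MAX_DIRECT_MEMORY="32G"
```

The drillbits must be restarted (e.g. bin/drillbit.sh restart on each node) before the new settings take effect.]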