Another thing to consider: make sure you have a spill location set up, and then disable hashagg/hashjoin for the query...
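
For what it's worth, here is roughly what that could look like at the session level. This is only a sketch: the option names are as I recall them from the Drill docs (`planner.enable_hashagg`, `planner.enable_hashjoin`), and the numeric values are made-up placeholders, so please check both against your Drill version before relying on them. The spill location itself is not a session option; as far as I know it lives in drill-override.conf under `drill.exec.sort.external.spill.directories`.

```sql
-- Sketch only; verify option names and defaults on your Drill version.
-- Turn off the hash-based operators for this session so the planner
-- falls back to sort-based plans, whose sort step can spill to disk.
ALTER SESSION SET `planner.enable_hashagg` = false;
ALTER SESSION SET `planner.enable_hashjoin` = false;

-- The two parallelism options Sudheesh mentioned in the thread below.
-- The values here are illustrative placeholders, not recommendations.
ALTER SESSION SET `planner.width.max_per_node` = 2;   -- decrease this
ALTER SESSION SET `planner.slice_target` = 200000;    -- increase this
```

Using ALTER SESSION keeps the change limited to the connection running the failing query; ALTER SYSTEM would apply it to all queries.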

On Wed, Mar 1, 2017 at 1:25 PM, Abhishek Girish <[email protected]> wrote:

> Hey Anup,
>
> This is indeed an issue, and I can understand that having an unstable
> environment is not something anyone wants. DRILL-4708 is still unresolved -
> hopefully someone will get to it soon. I've bumped up the priority.
>
> Unfortunately we do not publish any sizing guidelines, so you'd have to
> experiment to settle on the right load for your cluster. Please decrease
> the concurrency (number of queries running in parallel). And try bumping up
> Drill DIRECT memory. Also, please set the system options recommended by
> Sudheesh. While this may not solve the issue, it may help reduce it's
> occurrence.
>
> Can you also update the JIRA with your configurations, type of queries and
> the relevant logs?
>
> -Abhishek
>
> On Wed, Mar 1, 2017 at 10:17 AM, Anup Tiwari <[email protected]> wrote:
>
> > Hi,
> >
> > Can someone look into it? As we are now getting this more frequently in
> > Adhoc queries as well.
> > And for automation jobs, we are moving to Hive as in drill we are getting
> > this more frequently.
> >
> > Regards,
> > *Anup Tiwari*
> >
> > On Sat, Dec 31, 2016 at 12:11 PM, Anup Tiwari <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > We are getting this issue bit more frequently. can someone please look
> > > into it and tell us that why it is happening since as mention in earlier
> > > mail when this query gets executed no other query is running at that time.
> > >
> > > Thanks in advance.
> > >
> > > Regards,
> > > *Anup Tiwari*
> > >
> > > On Sat, Dec 24, 2016 at 10:20 AM, Anup Tiwari <[email protected]> wrote:
> > >
> > >> Hi Sudheesh,
> > >>
> > >> Please find below ans :-
> > >>
> > >> 1. Total 4,(3 Datanodes, 1 namenode)
> > >> 2. Only one query, as this query is part of daily dump and runs in early
> > >> morning.
> > >>
> > >> And as @chun mentioned , it seems similar to DRILL-4708 , so any update
> > >> on progress of this ticket?
> > >>
> > >>
> > >> On 22-Dec-2016 12:13 AM, "Sudheesh Katkam" <[email protected]> wrote:
> > >>
> > >> Two more questions..
> > >>
> > >> (1) How many nodes in your cluster?
> > >> (2) How many queries are running when the failure is seen?
> > >>
> > >> If you have multiple large queries running at the same time, the load on
> > >> the system could cause those failures (which are heartbeat related).
> > >>
> > >> The two options I suggested decrease the parallelism of stages in a
> > >> query, this implies lesser load but slower execution.
> > >>
> > >> System level option affect all queries, and session level affect queries
> > >> on a specific connection. Not sure what is preferred in your environment.
> > >>
> > >> Also, you may be interested in metrics. More info here:
> > >>
> > >> http://drill.apache.org/docs/monitoring-metrics/
> > >>
> > >> Thank you,
> > >> Sudheesh
> > >>
> > >> > On Dec 21, 2016, at 4:31 AM, Anup Tiwari <[email protected]> wrote:
> > >> >
> > >> > @sudheesh, yes drill bit is running on datanodeN/10.*.*.5:31010).
> > >> >
> > >> > Can you tell me how this will impact to query and do i have to set this at
> > >> > session level OR system level?
> > >> >
> > >> > Regards,
> > >> > *Anup Tiwari*
> > >> >
> > >> > On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang <[email protected]> wrote:
> > >> >
> > >> >> I am pretty sure this is the same as DRILL-4708.
> > >> >>
> > >> >> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam <[email protected]> wrote:
> > >> >>
> > >> >>> Is the drillbit service (running on datanodeN/10.*.*.5:31010) actually
> > >> >>> down when the error is seen?
> > >> >>>
> > >> >>> If not, try lowering parallelism using these two session options, before
> > >> >>> running the queries:
> > >> >>>
> > >> >>> planner.width.max_per_node (decrease this)
> > >> >>> planner.slice_target (increase this)
> > >> >>>
> > >> >>> Thank you,
> > >> >>> Sudheesh
> > >> >>>
> > >> >>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari <[email protected]> wrote:
> > >> >>>>
> > >> >>>> Hi Team,
> > >> >>>>
> > >> >>>> We are running some drill automation script on a daily basis and we often
> > >> >>>> see that some query gets failed frequently by giving below error , Also i
> > >> >>>> came across DRILL-4708 <https://issues.apache.org/jira/browse/DRILL-4708>
> > >> >>>> which seems similar, Can anyone give me update on that OR workaround to
> > >> >>>> avoid such issue ?
> > >> >>>>
> > >> >>>> *Stack Trace :-*
> > >> >>>>
> > >> >>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> > >> >>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > >> >>>>
> > >> >>>>
> > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> > >> >>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> > >> >>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > >> >>>>
> > >> >>>>
> > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
> > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
> > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
> > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
> > >> >>>>     at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
> > >> >>>>     at sqlline.Commands.execute(Commands.java:841)
> > >> >>>>     at sqlline.Commands.sql(Commands.java:751)
> > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > >> >>>>     at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> > >> >>>>     at sqlline.Commands.run(Commands.java:1304)
> > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >> >>>>     at java.lang.reflect.Method.invoke(Method.java:498)
> > >> >>>>     at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:742)
> > >> >>>>     at sqlline.SqlLine.initArgs(SqlLine.java:553)
> > >> >>>>     at sqlline.SqlLine.begin(SqlLine.java:596)
> > >> >>>>     at sqlline.SqlLine.start(SqlLine.java:375)
> > >> >>>>     at sqlline.SqlLine.main(SqlLine.java:268)
> > >> >>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION
> > >> >>>> ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user
> > >> >>>> client) closed unexpectedly. Drillbit down?
> > >> >>>>
> > >> >>>>
> > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > >> >>>>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
> > >> >>>>     at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
> > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> > >> >>>>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> > >> >>>>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > >> >>>>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
> > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
> > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > >> >>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > >> >>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > >> >>>>     at java.lang.Thread.run(Thread.java:745)
> > >> >>>>
> > >> >>>>
> > >> >>>> Regards,
> > >> >>>> *Anup Tiwari*
