+1 on John's suggestion.

On Fri, Mar 3, 2017 at 6:24 AM, John Omernik <[email protected]> wrote:
> So your node has 32G of RAM, yet you are allowing Drill to use 36G. I would
> change your settings to be 8GB of Heap and 22GB of Direct Memory. See if
> this helps with your issues. Also, are you using a distributed filesystem?
> If so, you may want to allow even more free RAM, i.e. 8GB of Heap and 20GB
> of Direct.
>
> On Fri, Mar 3, 2017 at 8:20 AM, Anup Tiwari <[email protected]>
> wrote:
>
> > Hi,
> >
> > Please find our configuration details :-
> >
> > Number of Nodes : 4
> > RAM/Node : 32GB
> > Core/Node : 8
> > DRILL_MAX_DIRECT_MEMORY="20G"
> > DRILL_HEAP="16G"
> >
> > And all other variables are set to default.
> >
> > Since we have tried some of the settings suggested above but are still
> > facing this issue more frequently, kindly suggest the best configuration
> > for our environment.
> >
> > Regards,
> > *Anup Tiwari*
> >
> > On Thu, Mar 2, 2017 at 1:26 AM, John Omernik <[email protected]> wrote:
> >
> > > Another thing to consider is to ensure you have a Spill Location set up,
> > > and then disable hashagg/hashjoin for the query...
> > >
> > > On Wed, Mar 1, 2017 at 1:25 PM, Abhishek Girish <[email protected]>
> > > wrote:
> > >
> > > > Hey Anup,
> > > >
> > > > This is indeed an issue, and I can understand that having an unstable
> > > > environment is not something anyone wants. DRILL-4708 is still
> > > > unresolved - hopefully someone will get to it soon. I've bumped up the
> > > > priority.
> > > >
> > > > Unfortunately we do not publish any sizing guidelines, so you'd have to
> > > > experiment to settle on the right load for your cluster. Please decrease
> > > > the concurrency (number of queries running in parallel). And try bumping
> > > > up Drill DIRECT memory. Also, please set the system options recommended
> > > > by Sudheesh. While this may not solve the issue, it may help reduce its
> > > > occurrence.
> > > >
> > > > Can you also update the JIRA with your configurations, types of queries
> > > > and the relevant logs?
> > > >
> > > > -Abhishek
> > > >
> > > > On Wed, Mar 1, 2017 at 10:17 AM, Anup Tiwari <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Can someone look into it? We are now getting this more frequently in
> > > > > ad hoc queries as well.
> > > > > And for automation jobs, we are moving to Hive, as in Drill we are
> > > > > getting this more frequently.
> > > > >
> > > > > Regards,
> > > > > *Anup Tiwari*
> > > > >
> > > > > On Sat, Dec 31, 2016 at 12:11 PM, Anup Tiwari <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We are getting this issue a bit more frequently. Can someone please
> > > > > > look into it and tell us why it is happening? As mentioned in the
> > > > > > earlier mail, when this query gets executed no other query is
> > > > > > running at that time.
> > > > > >
> > > > > > Thanks in advance.
> > > > > >
> > > > > > Regards,
> > > > > > *Anup Tiwari*
> > > > > >
> > > > > > On Sat, Dec 24, 2016 at 10:20 AM, Anup Tiwari <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Sudheesh,
> > > > > >>
> > > > > >> Please find the answers below :-
> > > > > >>
> > > > > >> 1. Total 4 (3 datanodes, 1 namenode)
> > > > > >> 2. Only one query, as this query is part of the daily dump and runs
> > > > > >> in the early morning.
> > > > > >>
> > > > > >> And as @chun mentioned, it seems similar to DRILL-4708, so is there
> > > > > >> any update on the progress of this ticket?
> > > > > >>
> > > > > >>
> > > > > >> On 22-Dec-2016 12:13 AM, "Sudheesh Katkam" <[email protected]> wrote:
> > > > > >>
> > > > > >> Two more questions..
> > > > > >>
> > > > > >> (1) How many nodes in your cluster?
> > > > > >> (2) How many queries are running when the failure is seen?
> > > > > >>
> > > > > >> If you have multiple large queries running at the same time, the
> > > > > >> load on the system could cause those failures (which are heartbeat
> > > > > >> related).
> > > > > >>
> > > > > >> The two options I suggested decrease the parallelism of stages in a
> > > > > >> query; this implies less load but slower execution.
> > > > > >>
> > > > > >> System-level options affect all queries, and session-level options
> > > > > >> affect queries on a specific connection. Not sure what is preferred
> > > > > >> in your environment.
> > > > > >>
> > > > > >> Also, you may be interested in metrics. More info here:
> > > > > >>
> > > > > >> http://drill.apache.org/docs/monitoring-metrics/
> > > > > >>
> > > > > >> Thank you,
> > > > > >> Sudheesh
> > > > > >>
> > > > > >> > On Dec 21, 2016, at 4:31 AM, Anup Tiwari <[email protected]> wrote:
> > > > > >> >
> > > > > >> > @sudheesh, yes, the drillbit is running on datanodeN/10.*.*.5:31010.
> > > > > >> >
> > > > > >> > Can you tell me how this will impact the query, and do I have to
> > > > > >> > set this at session level or system level?
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > *Anup Tiwari*
> > > > > >> >
> > > > > >> > On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang <[email protected]> wrote:
> > > > > >> >
> > > > > >> >> I am pretty sure this is the same as DRILL-4708.
> > > > > >> >>
> > > > > >> >> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam <[email protected]>
> > > > > >> >> wrote:
> > > > > >> >>
> > > > > >> >>> Is the drillbit service (running on datanodeN/10.*.*.5:31010)
> > > > > >> >>> actually down when the error is seen?
> > > > > >> >>>
> > > > > >> >>> If not, try lowering parallelism using these two session options
> > > > > >> >>> before running the queries:
> > > > > >> >>>
> > > > > >> >>> planner.width.max_per_node (decrease this)
> > > > > >> >>> planner.slice_target (increase this)
> > > > > >> >>>
> > > > > >> >>> Thank you,
> > > > > >> >>> Sudheesh
> > > > > >> >>>
> > > > > >> >>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari <[email protected]> wrote:
> > > > > >> >>>>
> > > > > >> >>>> Hi Team,
> > > > > >> >>>>
> > > > > >> >>>> We are running some Drill automation scripts on a daily basis and
> > > > > >> >>>> we often see that some queries fail with the error below. I also
> > > > > >> >>>> came across DRILL-4708 <https://issues.apache.org/jira/browse/DRILL-4708>,
> > > > > >> >>>> which seems similar. Can anyone give me an update on that or a
> > > > > >> >>>> workaround to avoid this issue?
> > > > > >> >>>>
> > > > > >> >>>> *Stack Trace :-*
> > > > > >> >>>>
> > > > > >> >>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> > > > > >> >>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
> > > > > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
> > > > > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
> > > > > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> > > > > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
> > > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
> > > > > >> >>>>     at sqlline.Commands.execute(Commands.java:841)
> > > > > >> >>>>     at sqlline.Commands.sql(Commands.java:751)
> > > > > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > > > > >> >>>>     at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> > > > > >> >>>>     at sqlline.Commands.run(Commands.java:1304)
> > > > > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > >> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > >> >>>>     at java.lang.reflect.Method.invoke(Method.java:498)
> > > > > >> >>>>     at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> > > > > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:742)
> > > > > >> >>>>     at sqlline.SqlLine.initArgs(SqlLine.java:553)
> > > > > >> >>>>     at sqlline.SqlLine.begin(SqlLine.java:596)
> > > > > >> >>>>     at sqlline.SqlLine.start(SqlLine.java:375)
> > > > > >> >>>>     at sqlline.SqlLine.main(SqlLine.java:268)
> > > > > >> >>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > >> >>>>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
> > > > > >> >>>>     at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
> > > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> > > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> > > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> > > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> > > > > >> >>>>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > > > > >> >>>>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> > > > > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> > > > > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> > > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> > > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
> > > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
> > > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > > > > >> >>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > > > > >> >>>>     at java.lang.Thread.run(Thread.java:745)
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>> Regards,
> > > > > >> >>>> *Anup Tiwari*

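For anyone skimming the thread, the arithmetic behind John's suggestion and the options Sudheesh named can be sketched as below. This is only an illustration, not an official sizing tool: the helper names (parse_gb, headroom_gb, session_statements) and the example values 4 and 200000 are hypothetical. The option names planner.enable_hashagg and planner.enable_hashjoin are my assumption for "disable hashagg/hashjoin", since the thread does not spell them out, and the ALTER SESSION syntax is assumed from Drill releases of that period.

```python
from typing import List


def parse_gb(value: str) -> int:
    """Parse a whole-gigabyte size such as "20G" or "32GB" into an int."""
    text = value.strip().upper()
    for suffix in ("GB", "G"):
        if text.endswith(suffix):
            text = text[: -len(suffix)]
            break
    return int(text)


def headroom_gb(ram: str, heap: str, direct: str) -> int:
    """RAM left over after heap + direct memory; negative means oversubscribed."""
    return parse_gb(ram) - (parse_gb(heap) + parse_gb(direct))


def session_statements(width_per_node: int, slice_target: int,
                       disable_hash_ops: bool = False) -> List[str]:
    """Build ALTER SESSION statements for the options discussed in the thread."""
    options = [
        ("planner.width.max_per_node", str(width_per_node)),  # decrease this
        ("planner.slice_target", str(slice_target)),          # increase this
    ]
    if disable_hash_ops:
        # Assumed option names for "disable hashagg/hashjoin".
        options += [
            ("planner.enable_hashagg", "false"),
            ("planner.enable_hashjoin", "false"),
        ]
    return ["ALTER SESSION SET `{}` = {};".format(name, value)
            for name, value in options]


if __name__ == "__main__":
    # Settings reported in the thread: 32GB node, 16G heap, 20G direct.
    print(headroom_gb("32GB", "16G", "20G"))  # -4 (oversubscribed)
    # John's two suggestions: 8G heap with 22G or 20G direct.
    print(headroom_gb("32GB", "8G", "22G"))   # 2
    print(headroom_gb("32GB", "8G", "20G"))   # 4
    # Hypothetical values; tune them for your own cluster.
    for statement in session_statements(4, 200000, disable_hash_ops=True):
        print(statement)
```

The printed statements would be run in sqlline before the failing query. Per Sudheesh's note, setting them at system level instead would affect all queries rather than one connection.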