So your nodes have 32GB of RAM each, yet you are allowing Drill to use 36GB
(16GB of Heap plus 20GB of Direct Memory). I would change your settings to
8GB of Heap and 22GB of Direct Memory, and see if that helps with your
issues. Also, are you using a distributed filesystem? If so, you may want to
leave even more RAM free, e.g. 8GB of Heap and 20GB of Direct.
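
For reference, a minimal sketch of that split in conf/drill-env.sh on each
node, using the same variables you listed (the values are just the suggestion
above; restart the drillbits after changing them):

    # conf/drill-env.sh : suggested memory split on a 32GB node
    DRILL_HEAP="8G"
    DRILL_MAX_DIRECT_MEMORY="22G"
    # if a distributed filesystem also runs on the node, leave more free:
    # DRILL_MAX_DIRECT_MEMORY="20G"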

On Fri, Mar 3, 2017 at 8:20 AM, Anup Tiwari <[email protected]>
wrote:

> Hi,
>
> Please find our configuration details :-
>
> Number of Nodes : 4
> RAM/Node : 32GB
> Core/Node : 8
> DRILL_MAX_DIRECT_MEMORY="20G"
> DRILL_HEAP="16G"
>
> And all other variables are set to default.
>
> We have tried some of the settings suggested above but are still facing
> this issue frequently, so kindly suggest the best configuration for our
> environment.
>
> Regards,
> *Anup Tiwari*
>
> On Thu, Mar 2, 2017 at 1:26 AM, John Omernik <[email protected]> wrote:
>
> > Another thing to consider: ensure you have a spill location set up, and
> > then disable hashagg/hashjoin for the query...
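> >
> > A sketch of both steps, assuming a Drill 1.x install (the spill keys below
> > are the external-sort ones from that era; verify them against your
> > version's drill-override.conf, and the directory is only a placeholder):
> >
> >     # drill-override.conf : give the external sort somewhere to spill
> >     drill.exec: {
> >       sort.external.spill.fs: "file:///",
> >       sort.external.spill.directories: [ "/tmp/drill/spill" ]
> >     }
> >
> >     -- then, per session, fall back to the sort-based operators
> >     ALTER SESSION SET `planner.enable_hashagg` = false;
> >     ALTER SESSION SET `planner.enable_hashjoin` = false;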
> >
> > On Wed, Mar 1, 2017 at 1:25 PM, Abhishek Girish <[email protected]>
> > wrote:
> >
> > > Hey Anup,
> > >
> > > This is indeed an issue, and I can understand that having an unstable
> > > environment is not something anyone wants. DRILL-4708 is still
> > unresolved -
> > > hopefully someone will get to it soon. I've bumped up the priority.
> > >
> > > Unfortunately we do not publish any sizing guidelines, so you'd have to
> > > experiment to settle on the right load for your cluster. Please decrease
> > > the concurrency (number of queries running in parallel), and try bumping
> > > up Drill DIRECT memory. Also, please set the system options recommended
> > > by Sudheesh. While this may not solve the issue, it may help reduce its
> > > occurrence.
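> > >
> > > If you want Drill itself to enforce a concurrency cap, one option is the
> > > legacy query queue (assuming the exec.queue options available in Drill
> > > 1.x; the numbers are placeholders):
> > >
> > >     ALTER SYSTEM SET `exec.queue.enable` = true;
> > >     ALTER SYSTEM SET `exec.queue.small` = 2;  -- concurrent small queries
> > >     ALTER SYSTEM SET `exec.queue.large` = 1;  -- concurrent large queries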
> > >
> > > Can you also update the JIRA with your configurations, type of queries
> > and
> > > the relevant logs?
> > >
> > > -Abhishek
> > >
> > > On Wed, Mar 1, 2017 at 10:17 AM, Anup Tiwari <
> [email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Can someone look into it? We are now getting this more frequently in
> > > > ad-hoc queries as well, and for our automation jobs we are moving to
> > > > Hive because we are hitting this more frequently in Drill.
> > > >
> > > > Regards,
> > > > *Anup Tiwari*
> > > >
> > > > On Sat, Dec 31, 2016 at 12:11 PM, Anup Tiwari <
> > [email protected]
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We are getting this issue a bit more frequently. Can someone please
> > > > > look into it and tell us why it is happening, since, as mentioned in
> > > > > the earlier mail, no other query is running when this query gets
> > > > > executed.
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > > Regards,
> > > > > *Anup Tiwari*
> > > > >
> > > > > On Sat, Dec 24, 2016 at 10:20 AM, Anup Tiwari <
> > > [email protected]
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Hi Sudheesh,
> > > > >>
> > > > >> Please find the answers below:
> > > > >>
> > > > >> 1. Total 4 nodes (3 datanodes, 1 namenode).
> > > > >> 2. Only one query, as this query is part of a daily dump and runs in
> > > > >> the early morning.
> > > > >>
> > > > >> And as @chun mentioned, it seems similar to DRILL-4708, so is there any
> > > > >> update on the progress of that ticket?
> > > > >>
> > > > >>
> > > > >> On 22-Dec-2016 12:13 AM, "Sudheesh Katkam" <[email protected]>
> > > > wrote:
> > > > >>
> > > > >> Two more questions..
> > > > >>
> > > > >> (1) How many nodes in your cluster?
> > > > >> (2) How many queries are running when the failure is seen?
> > > > >>
> > > > >> If you have multiple large queries running at the same time, the load
> > > > >> on the system could cause those failures (which are heartbeat related).
> > > > >>
> > > > >> The two options I suggested decrease the parallelism of stages in a
> > > > >> query; this implies less load but slower execution.
> > > > >>
> > > > >> System-level options affect all queries, and session-level options
> > > > >> affect queries on a specific connection. Not sure which is preferred in
> > > > >> your environment.
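> > > > >>
> > > > >> For example, using one of those options to show the two scopes (the
> > > > >> value is only a placeholder):
> > > > >>
> > > > >>     -- session level: affects only queries on this connection
> > > > >>     ALTER SESSION SET `planner.slice_target` = 1000000;
> > > > >>     -- system level: affects all queries on the cluster
> > > > >>     ALTER SYSTEM SET `planner.slice_target` = 1000000;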
> > > > >>
> > > > >> Also, you may be interested in metrics. More info here:
> > > > >>
> > > > >> http://drill.apache.org/docs/monitoring-metrics/
> > > > >>
> > > > >> Thank you,
> > > > >> Sudheesh
> > > > >>
> > > > >> > On Dec 21, 2016, at 4:31 AM, Anup Tiwari <
> > [email protected]
> > > >
> > > > >> wrote:
> > > > >> >
> > > > >> > @sudheesh, yes, the drillbit is running on datanodeN/10.*.*.5:31010.
> > > > >> >
> > > > >> > Can you tell me how this will impact the query, and do I have to set
> > > > >> > this at session level or system level?
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > Regards,
> > > > >> > *Anup Tiwari*
> > > > >> >
> > > > >> > On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang <
> [email protected]
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> >> I am pretty sure this is the same as DRILL-4708.
> > > > >> >>
> > > > >> >> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam <
> > > > >> [email protected]>
> > > > >> >> wrote:
> > > > >> >>
> > > > >> >>> Is the drillbit service (running on datanodeN/10.*.*.5:31010)
> > > > actually
> > > > >> >>> down when the error is seen?
> > > > >> >>>
> > > > >> >>> If not, try lowering parallelism using these two session
> > options,
> > > > >> before
> > > > >> >>> running the queries:
> > > > >> >>>
> > > > >> >>> planner.width.max_per_node (decrease this)
> > > > >> >>> planner.slice_target (increase this)
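> > > > >> >>>
> > > > >> >>> For example (a sketch; the values are placeholders to tune for your
> > > > >> >>> cluster, and the defaults noted are approximate):
> > > > >> >>>
> > > > >> >>>     ALTER SESSION SET `planner.width.max_per_node` = 4;  -- default is roughly 70% of cores
> > > > >> >>>     ALTER SESSION SET `planner.slice_target` = 1000000;  -- default is 100000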
> > > > >> >>>
> > > > >> >>> Thank you,
> > > > >> >>> Sudheesh
> > > > >> >>>
> > > > >> >>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari <
> > > > [email protected]
> > > > >> >
> > > > >> >>> wrote:
> > > > >> >>>>
> > > > >> >>>> Hi Team,
> > > > >> >>>>
> > > > >> >>>> We are running some Drill automation scripts on a daily basis, and
> > > > >> >>>> we often see that some query fails with the error below. I also came
> > > > >> >>>> across DRILL-4708 <https://issues.apache.org/jira/browse/DRILL-4708>,
> > > > >> >>>> which seems similar. Can anyone give me an update on that, or a
> > > > >> >>>> workaround to avoid such issues?
> > > > >> >>>>
> > > > >> >>>> *Stack Trace :-*
> > > > >> >>>>
> > > > >> >>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > >> >>>>
> > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> > > > >> >>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > >> >>>>
> > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
> > > > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
> > > > >> >>>>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
> > > > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> > > > >> >>>>     at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
> > > > >> >>>>     at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
> > > > >> >>>>     at sqlline.Commands.execute(Commands.java:841)
> > > > >> >>>>     at sqlline.Commands.sql(Commands.java:751)
> > > > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > > > >> >>>>     at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> > > > >> >>>>     at sqlline.Commands.run(Commands.java:1304)
> > > > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > >> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > >> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > >> >>>>     at java.lang.reflect.Method.invoke(Method.java:498)
> > > > >> >>>>     at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> > > > >> >>>>     at sqlline.SqlLine.dispatch(SqlLine.java:742)
> > > > >> >>>>     at sqlline.SqlLine.initArgs(SqlLine.java:553)
> > > > >> >>>>     at sqlline.SqlLine.begin(SqlLine.java:596)
> > > > >> >>>>     at sqlline.SqlLine.start(SqlLine.java:375)
> > > > >> >>>>     at sqlline.SqlLine.main(SqlLine.java:268)
> > > > >> >>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > >> >>>>
> > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > >> >>>>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
> > > > >> >>>>     at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
> > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> > > > >> >>>>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> > > > >> >>>>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > > > >> >>>>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> > > > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> > > > >> >>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
> > > > >> >>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
> > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > > > >> >>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > > > >> >>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > > > >> >>>>     at java.lang.Thread.run(Thread.java:745)
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> Regards,
> > > > >> >>>> *Anup Tiwari*
> > > > >> >>>
> > > > >> >>>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
