Hi Dweep,

I see that you are using the REST API to execute your queries. For complex,
resource-intensive queries, the recommended approach is to submit them
through the JDBC or ODBC driver instead. The differences between the two
are described below for your reference. Please use JDBC/ODBC to execute
your queries and let us know whether that resolves the issue for you.

*REST API* is a stateless protocol, so when a REST client submits a query,
the request thread on the server side blocks until the query has fully
executed. Once all the results are available on the server side, they are
converted into strings and sent back to the REST client. If the query
results are very large, this consumes a lot of memory on the server side,
and at the same time some REST clients (such as browsers) struggle to
render all of those results at once. This is why we see unresponsive
browsers as well as high heap usage in the Drillbit. In short, because of
the statelessness there is no pagination-style protocol. Because of this
limitation, we always suggest using the REST API only for queries with a
small footprint, or for queries with a LIMIT clause that keeps the result
set, and therefore the memory footprint, small.
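
As a rough illustration of a small-footprint REST request, here is a
minimal Java sketch that builds a LIMIT query for Drill's POST /query.json
endpoint. The host, the default web port 8047, the table name
dfs.tmp.`events` and the class and method names are assumptions made for
the example, and the JSON escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DrillRestSketch {
    // Minimal escaping so the SQL can sit inside a JSON string.
    // Handles only backslashes and double quotes, not control characters.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // JSON body for Drill's POST /query.json endpoint.
    static String buildBody(String sql) {
        return "{\"queryType\":\"SQL\",\"query\":\"" + jsonEscape(sql) + "\"}";
    }

    // Builds the request without sending it.
    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/query.json"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(sql)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical table; the LIMIT keeps the server-side result small.
        String sql = "SELECT * FROM dfs.tmp.`events` LIMIT 100";
        HttpRequest req = buildRequest("http://localhost:8047", sql);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(buildBody(sql));
        // To send it: HttpClient.newHttpClient()
        //         .send(req, HttpResponse.BodyHandlers.ofString());
        // That call returns only after the server has built the full response.
    }
}
```

Without the LIMIT, the same request would make the Drillbit materialize
the whole result before replying.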

*JDBC/ODBC API*, on the other hand, is a stateful protocol, which means
the client side keeps some context, such as when a query has actually
completed, and can fetch the results incrementally. With the JDBC API, the
Drill client does not have to wait for every result row to arrive from the
server side before passing data to the application. It can start handing
over results as soon as the first batch is available, and JDBC provides an
API for receiving results incrementally (the next() call). The higher
memory footprint that comes from fetching all the result rows at once
therefore does not arise. If the JDBC application pulls data slowly, a
built-in feedback mechanism also affects the execution of the query: the
client can tell the server side to stop the query pipeline from sending
more results because the application is not pulling them fast enough. All
of this is possible because the JDBC/ODBC side of the protocol is stateful.
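
To make the incremental fetch concrete, here is a minimal Java sketch
using the standard java.sql API. It assumes the Drill JDBC driver is on
the classpath; the URL form jdbc:drill:drillbit=<host>:31010 (direct
connection on the default user port) and the class and method names are
assumptions for the example, so adjust them to your setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class DrillJdbcSketch {
    // Pulls rows one at a time. Each next() call advances to the next row,
    // so the application never needs the whole result set in memory.
    static long consume(ResultSet rs) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            rows++; // process the current row here, e.g. rs.getObject(1)
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            // e.g. "jdbc:drill:drillbit=localhost:31010" "SELECT ..."
            System.out.println("usage: DrillJdbcSketch <jdbc-url> <sql>");
            return;
        }
        // try-with-resources closes the result set, statement and connection.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(args[1])) {
            System.out.println("rows read: " + consume(rs));
        }
    }
}
```

A statement that returns no result set you need to read, such as a CTAS,
can be submitted with Statement.execute() on the same connection.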

Thanks,
Khurram

On Fri, Jun 7, 2019 at 4:50 PM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi Dweep,
>
> This mailing list does not support attachments. Consider filing a JIRA
> ticket and attaching your images there: [1]
>
> You mention you've assigned Drill 14 GB of heap. You also mention that
> your task ran out of heap. As it turns out, Drill also uses direct memory
> to store intermediate data. I wonder if the error condition is actually
> about direct memory. How much direct memory have you given to Drill?
>
> 14 GB of heap and the default direct memory (8 GB) should be plenty for a
> query that produces a 48 MB Parquet file, assuming that the input size is
> comparable: ~200 MB (uncompressed JSON).
>
>
> You mention that you run the JSON-to-Parquet conversion once per hour. Do
> you use this Drill instance for any other tasks? Are there other tasks
> running at the same time? How many nodes of Drill are in use?
>
>
> Finally, you mention you use the REST API. Perhaps something odd is
> happening there. A stack trace of your error would help. The stack trace
> may be in the error message, or in the Drill log file.
>
>
> Thanks,
> - Paul
>
> [1] https://issues.apache.org/jira
>
>
>     On Friday, June 7, 2019, 2:36:36 AM PDT, Dweep Sharma <
> dweep.sha...@redbus.com> wrote:
>
>  Hi Divya,
>
> The size is 48 MB (after converting to Parquet)
>
>
>
>
>
> On Fri, Jun 7, 2019 at 1:45 PM Divya Gehlot <divinediv...@gmail.com>
> wrote:
>
> Can you share more details?
> The query profile and other aspects, such as the data size, would give a
> better view of what’s happening.
>
>
> Thanks,
> Divya
>
> On Fri, 7 Jun 2019 at 4:13 PM, Dweep Sharma <dweep.sha...@redbus.com>
> wrote:
>
> > Data is in JSON format.
> >
> > On Fri, Jun 7, 2019 at 1:39 PM Dweep Sharma <dweep.sha...@redbus.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a memory leak issue. 14GB memory is assigned to heap but it gets
> > > full within a day with just one cron running.
> > >
> > > Task is a CTAS query from Kafka to S3 once every hour. CTAS is issued
> > > via the Drill REST API.
> > >
> > > Please assist on a resolution.
> > >
> > > -Dweep
> > >
> >
> >
>
>
