Re: Heap memory - leak

Kunal Khatua Tue, 25 Jun 2019 21:08:45 -0700

Yes. The reason is that "0% of 0GB" basically means 0% of the maximum direct 
memory claimed so far.


Drill claims system memory for Direct memory as it needs it and can 
(potentially) return it as well. In your case, Direct memory appears to be 
barely used, so it is very likely that the Drillbit is doing its I/O reads 
through Heap memory. That is typical for formats that are not natively 
supported, e.g. where Hive SerDes are used. So most of the I/O is done on the Heap.

You might also want to look at what jstack shows. As Khurram suggested, the 
data read into memory (say, onto the Heap) is then converted into an HTTP 
response to your REST API call. That conversion in the WebServer adds further 
memory usage on the heap.
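
A rough sketch of how the thread dump and heap occupancy could be captured 
from the shell. It assumes the JDK tools jstack and jstat are on the PATH; the 
function name and the output file name are made up for illustration and are 
not part of Drill:

```shell
#!/usr/bin/env bash
# Sketch only: capture a thread dump and sample heap occupancy for a running
# Drillbit. capture_drillbit_diagnostics is a hypothetical helper.

capture_drillbit_diagnostics() {
  local pid="$1"
  if [ -z "$pid" ]; then
    echo "usage: capture_drillbit_diagnostics <drillbit-pid>" >&2
    return 2
  fi
  # Thread dump: shows what the web server and query threads are doing.
  jstack "$pid" > "drillbit-jstack-${pid}.txt" || return 1
  # Heap occupancy per generation (percent), 5 samples taken 5 seconds apart.
  jstat -gcutil "$pid" 5000 5
}
```

It could be called as capture_drillbit_diagnostics "$(pgrep -f org.apache.drill.exec.server.Drillbit)", 
assuming pgrep matches exactly one Drillbit JVM on the host.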

~ Kunal
On 6/25/2019 3:27:04 AM, Dweep Sharma <[email protected]> wrote:
Hi All,

Thanks for the response. I currently see Heap utilization go upwards of
80-90%, but Direct Memory shows 0GB (0% of 0GB).

I have the following settings in drill-env.sh. Is it normal for Direct
Memory utilization to show as 0GB?

export DRILLBIT_MAX_PROC_MEM=${DRILLBIT_MAX_PROC_MEM:-"25G"}
export DRILL_HEAP=${DRILL_HEAP:-"16G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"2G"}
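
As a quick arithmetic check on these values, the heap, direct memory, and code
cache settings can be summed and compared against the process cap. This is a
sketch only: it assumes DRILLBIT_MAX_PROC_MEM is meant to cover heap + direct +
code cache, the to_gb helper is hypothetical and handles whole-gigabyte values
only, and how Drill reacts to the cap may depend on the version.

```shell
#!/usr/bin/env bash
# Sketch only: sum the three memory settings and compare the total with
# DRILLBIT_MAX_PROC_MEM. The values are the defaults from the settings above.

to_gb() {
  # Hypothetical helper: strip a trailing "G" (whole-gigabyte values only).
  echo "${1%G}"
}

DRILLBIT_MAX_PROC_MEM="25G"
DRILL_HEAP="16G"
DRILL_MAX_DIRECT_MEMORY="8G"
DRILLBIT_CODE_CACHE_SIZE="2G"

total=$(( $(to_gb "$DRILL_HEAP") + $(to_gb "$DRILL_MAX_DIRECT_MEMORY") + $(to_gb "$DRILLBIT_CODE_CACHE_SIZE") ))
limit=$(to_gb "$DRILLBIT_MAX_PROC_MEM")

if [ "$total" -gt "$limit" ]; then
  echo "heap + direct + code cache = ${total}G exceeds the ${limit}G process cap"
else
  echo "heap + direct + code cache = ${total}G fits within the ${limit}G process cap"
fi
```

With the values above, 16G + 8G + 2G comes to 26G against a 25G cap.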

-Dweep

On Thu, Jun 13, 2019 at 1:27 AM Khurram Faraaz wrote:

> Hi Dweep,
>
> I see that you are using the REST API to execute the queries. For complex,
> resource-intensive queries, the recommended way is to always submit them
> through the JDBC or ODBC driver. The difference is described below for your
> reference. Please use JDBC/ODBC to execute your queries and let us know
> if that resolves the issue for you.
>
> *REST API* is a stateless protocol, so when a REST client submits a query,
> the request thread on the server side blocks until the query has fully
> executed. Once all the results are available on the server side, they are
> converted into strings to be sent back to the REST client. If the query
> results are very big, this ends up consuming a lot of memory on the server
> side, and at the same time it is difficult for certain REST clients (like
> browsers) to render all those results at once. This is why we see an
> unresponsive browser and high heap usage in the Drillbit as well. Basically,
> because of the statelessness there is no pagination-style protocol. Because
> of this limitation, we always suggest using the REST API only for queries
> with a small footprint, or for queries with a LIMIT, which result in a
> smaller memory footprint.
>
> *JDBC/ODBC API*, on the other hand, is a stateful protocol, which means
> that the client side keeps some context about when a query has actually
> completed and can fetch results incrementally. So with the JDBC API, the
> Drill client does not have to wait to receive all the result rows from the
> server side before passing them to the application. It can start passing
> results along as soon as the first batch is available, and JDBC provides
> an API to receive results incrementally (the next() call). So the problem of
> a higher memory footprint from fetching all the result rows at once does
> not arise. If the JDBC application pulls data slowly, there is a built-in
> feedback mechanism that affects the execution of the query too: the client
> can tell the server side to stop the query pipeline from sending more
> results because the application is not pulling them fast enough. All of
> this is possible because of the statefulness of the JDBC/ODBC side of the
> protocol.
>
> Thanks,
> Khurram
>
> On Fri, Jun 7, 2019 at 4:50 PM Paul Rogers wrote:
>
> > Hi Dweep,
> >
> > This mailing list does not support attachments. Consider filing a JIRA
> > ticket and attaching your images there: [1]
> >
> > You mention you've assigned Drill 14 GB of heap. You also mention that
> > your task ran out of heap. As it turns out, Drill also uses direct memory
> > to store intermediate data. I wonder if the error condition is actually
> > about direct memory. How much direct memory have you given to Drill?
> >
> > 14 GB of heap and the default direct memory (8GB) should be plenty for a
> > query that produces a 48 MB Parquet file, assuming that the input size is
> > of a similar order: ~200 MB (uncompressed JSON).
> >
> >
> > You mention that you run the JSON-to-Parquet conversion once per hour. Do
> > you use this Drill instance for any other tasks? Are there other tasks
> > running at the same time? How many nodes of Drill are in use?
> >
> >
> > Finally, you mention you use the REST API. Perhaps something odd is
> > happening there. A stack trace of your error would help. The stack trace
> > may be in the error message, or in the Drill log file.
> >
> >
> > Thanks,
> > - Paul
> >
> > [1] https://issues.apache.org/jira
> >
> >
> > On Friday, June 7, 2019, 2:36:36 AM PDT, Dweep Sharma
> > [email protected]> wrote:
> >
> > Hi Divya,
> >
> > The size is 48 MB (after converting to Parquet)
> >
> >
> >
> >
> >
> > On Fri, Jun 7, 2019 at 1:45 PM Divya Gehlot wrote:
> >
> > Can you share more details? The query profile and other aspects, such as
> > the data size, would give a better view of what’s happening.
> >
> >
> > Thanks,
> > Divya
> >
> > On Fri, 7 Jun 2019 at 4:13 PM, Dweep Sharma wrote:
> >
> > > Data is in JSON format.
> > >
> > > On Fri, Jun 7, 2019 at 1:39 PM Dweep Sharma wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a memory leak issue. 14GB of memory is assigned to the heap, but
> > > > it fills up within a day with just one cron job running.
> > > >
> > > > The task is a CTAS query from Kafka to S3, run once every hour. The CTAS
> > > > is issued via the Drill REST API.
> > > >
> > > > Please assist on a resolution.
> > > >
> > > > -Dweep
> > > >
> > >
> > >
> >
> >
>
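
Khurram's advice above about keeping REST queries small can be sketched from
the shell. This assumes a Drillbit web server reachable on localhost:8047 and
its /query.json endpoint; the helper names, the DRILL_URL variable, and the
table in the usage line are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: submit a small, LIMIT-ed query over the REST API.

# Hypothetical helper: wrap a SQL string in the JSON body for the query
# endpoint. It does no JSON escaping, so the SQL must not contain double
# quotes or backslashes.
build_query_payload() {
  printf '{"queryType":"SQL","query":"%s"}' "$1"
}

# Hypothetical helper: POST the payload. The whole response is built on the
# server before it is returned, so keep the result set small.
submit_rest_query() {
  curl -s -X POST -H "Content-Type: application/json" \
       -d "$(build_query_payload "$1")" \
       "${DRILL_URL:-http://localhost:8047}/query.json"
}
```

For example: submit_rest_query 'SELECT * FROM dfs.tmp.events LIMIT 100', where
dfs.tmp.events is a placeholder table name. For larger result sets, the
JDBC/ODBC route described above avoids building the full response on the
server.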

