Hi Dweep,

I see that you are using the REST API to execute your queries. For complex, resource-intensive queries, we recommend always submitting them through the JDBC or ODBC driver instead. The differences between the two are summarized below for reference. Please use JDBC/ODBC to execute your queries and let us know whether that resolves the issue for you.
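The difference between the two delivery styles can be shown with a small toy program. This is an illustrative sketch only, not Drill's implementation: the class `ResultDeliveryModel`, its methods, the batch counts and the buffer capacity are all hypothetical. The "REST-style" method keeps every result batch in memory until the whole query has finished, so its peak usage grows with the size of the result. The "JDBC-style" method hands batches to a slow consumer through a bounded `ArrayBlockingQueue`, so a full buffer blocks the producer. That blocking is a simplified stand-in for the feedback mechanism described below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy model, NOT Drill's implementation: contrasts buffering a whole result
 * before replying with handing batches over through a bounded buffer.
 */
class ResultDeliveryModel {

    /** REST-style: materialize every batch, then reply once. Returns the peak number of buffered batches. */
    static int restStylePeak(int totalBatches) {
        List<String> response = new ArrayList<>();
        for (int b = 0; b < totalBatches; b++) {
            // Each batch is converted to a string and kept until the query finishes.
            response.add("batch-" + b);
        }
        return response.size(); // peak grows with the size of the result
    }

    /**
     * JDBC-style: a producer (server pipeline) hands batches to a slow consumer
     * (client application) through a bounded queue. A full queue blocks the
     * producer, a simplified stand-in for backpressure. Returns the peak number
     * of batches ever buffered at once.
     */
    static int jdbcStylePeak(int totalBatches, int capacity) throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicInteger peak = new AtomicInteger();
        final int endOfResults = -1; // sentinel marking the end of the result set

        Thread server = new Thread(() -> {
            try {
                for (int b = 0; b < totalBatches; b++) {
                    buffer.put(b); // blocks while the client is not keeping up
                    peak.accumulateAndGet(buffer.size(), Math::max);
                }
                buffer.put(endOfResults);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();

        int received = 0;
        while (true) {
            int batch = buffer.take(); // plays the role of next() fetching the next batch
            if (batch == endOfResults) {
                break;
            }
            received++;
            Thread.sleep(1); // simulate an application that consumes slowly
        }
        server.join();
        if (received != totalBatches) {
            throw new IllegalStateException("expected " + totalBatches + " batches, got " + received);
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int batches = 200;
        System.out.println("REST-style peak buffered batches: " + restStylePeak(batches));
        System.out.println("JDBC-style peak buffered batches: " + jdbcStylePeak(batches, 4));
    }
}
```

Running `main` should report a REST-style peak equal to the number of batches and a JDBC-style peak no larger than the buffer capacity. In a real client application, the equivalent of the consumer loop is simply iterating a `java.sql.ResultSet` with `next()`.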
*REST API* is a stateless protocol. When a REST client submits a query, the request thread on the server blocks until the query has fully executed. Once all the results are available on the server, they are converted into strings and sent back to the REST client in a single response. If the result set is very large, this consumes a lot of memory on the server, and at the same time some REST clients (such as browsers) struggle to render all of the results at once. This is why we see unresponsive browsers as well as high heap usage in the Drillbit. Because the protocol is stateless, there is no pagination-style mechanism for returning results in pieces. Given this limitation, we suggest using the REST API only for queries with a small memory footprint, such as queries with a LIMIT clause.

*JDBC/ODBC API*, on the other hand, is a stateful protocol: the client keeps context about the query while it runs, knows when it has completed, and can receive results incrementally. With the JDBC API, the Drill client does not have to wait for all result rows to arrive from the server before passing them to the application. It can start delivering results as soon as the first batch is available, and JDBC provides an API for consuming results incrementally (the next() call). As a result, the memory problem of materializing all result rows at once does not arise.

If a JDBC application pulls data slowly, a built-in feedback mechanism also affects query execution: the client can tell the server to stop the query pipeline from sending more results because the application is not consuming them fast enough. This feedback mechanism, like incremental delivery, is possible only because the JDBC/ODBC protocol is stateful.

Thanks,
Khurram

On Fri, Jun 7, 2019 at 4:50 PM Paul Rogers <par0...@yahoo.com.invalid> wrote:

> Hi Dweep,
>
> This mailing list does not support attachments.
> Consider filing a JIRA ticket and attaching your images there: [1]
>
> You mention you've assigned Drill 14 GB of heap. You also mention that
> your task ran out of heap. As it turns out, Drill also uses direct memory
> to store intermediate data. I wonder if the error condition is actually
> about direct memory. How much direct memory have you given to Drill?
>
> 14 GB of heap and the default direct memory (8GB) should be plenty for a
> query that produces a 48 MB Parquet file: assuming that the input size is
> similar: ~200 MB (uncompressed JSON).
>
> You mention that you run the JSON-to-Parquet conversion once per hour. Do
> you use this Drill instance for any other tasks? Are there other tasks
> running at the same time? How many nodes of Drill are in use?
>
> Finally, you mention you use the REST API. Perhaps something odd is
> happening there. A stack trace of your error would help. The stack trace
> may be in the error message, or in the Drill log file.
>
> Thanks,
> - Paul
>
> [1] https://issues.apache.org/jira
>
> On Friday, June 7, 2019, 2:36:36 AM PDT, Dweep Sharma <
> dweep.sha...@redbus.com> wrote:
>
> Hi Divya,
>
> The size is 48 MB (after converting to Parquet)
>
> On Fri, Jun 7, 2019 at 1:45 PM Divya Gehlot <divinediv...@gmail.com>
> wrote:
>
> Can you share the more details.
> Query profile and other aspects like data size and all to have better view
> what’s happening
>
> Thanks,
> Divya
>
> On Fri, 7 Jun 2019 at 4:13 PM, Dweep Sharma <dweep.sha...@redbus.com>
> wrote:
>
> > Data is in JSON format.
> >
> > On Fri, Jun 7, 2019 at 1:39 PM Dweep Sharma <dweep.sha...@redbus.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a memory leak issue. 14GB memory is assigned to heap but it gets
> > > full within a day with just one cron running.
> > >
> > > Task is a CTAS query from Kafka to S3 once every hour. CTAS is issued via
> > > the Drill Rest API.
> > >
> > > Please assist on a resolution.
> > >
> > > -Dweep