Hi Preet,

Welcome to the Drill community! You asked why the memory consumed by your 
query is not coming down.

Perhaps you meant to ask "why is the consumed JVM memory not coming down?" 
This is standard JVM behavior: once the JVM has claimed memory from the 
operating system, it generally does not give it back. The memory is free 
within the JVM and will be reused for the next query.
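
As a rough illustration (plain Java, nothing Drill-specific; the 1 GB 
allocation and the -Xmx2g suggestion are just example numbers), you can watch 
this with Runtime: after a large allocation is dropped and collected, 
totalMemory() typically stays high with common server GC settings while 
freeMemory() grows, meaning the space is reusable inside the JVM but was not 
handed back to the OS:

    // Run with something like: java -Xmx2g JvmMemoryDemo
    public class JvmMemoryDemo {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("Before:    total=" + mb(rt.totalMemory()) + " MB");

            // Allocate roughly 1 GB in 16 MB chunks, then drop the references.
            byte[][] chunks = new byte[64][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[16 << 20];
            }
            System.out.println("Allocated: total=" + mb(rt.totalMemory()) + " MB");

            chunks = null;
            System.gc();  // only a hint, but usually honored in a demo like this

            // total stays high while free grows: the heap is recycled, not released.
            System.out.println("After GC:  total=" + mb(rt.totalMemory())
                + " MB, free=" + mb(rt.freeMemory()) + " MB");
        }

        private static long mb(long bytes) { return bytes >> 20; }
    }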

Also, the memory given to Drill seems excessive: a total of 96 GB for 50 
million records. That works out to roughly 1,920 bytes (nearly 2 KB) per 
record. Are your records that large?

Depending on the structure of your query, Drill may not hold the entire data 
set in memory. If you run a simple "SELECT * FROM yourtable", Drill holds 
only a few "batches" of data in memory as data streams from your Parquet 
file, through Drill, to your client application.

On the other hand, if your query includes an ORDER BY, GROUP BY, or similar 
operator, Drill must buffer the entire data set in memory. If memory is 
insufficient, Drill spills to disk.
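
To make the contrast concrete, here is a hedged JDBC sketch. The connection 
URL, file path, table, and column names are placeholders I made up, and it 
assumes the Drill JDBC driver is on the classpath. The first query streams; 
the second must buffer, because the sort cannot emit a single row until it 
has seen all of its input. The ALTER SESSION line uses Drill's 
planner.memory.max_query_memory_per_node option (the 8 GB value is only an 
example):

    import java.sql.*;

    public class QueryShapes {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:drill:drillbit=localhost");
                 Statement stmt = conn.createStatement()) {

                // Optional: raise the per-query memory budget so less data spills.
                stmt.execute("ALTER SESSION SET "
                    + "`planner.memory.max_query_memory_per_node` = 8589934592");

                // Streaming: Drill holds only a few batches at a time; rows
                // flow out as soon as they are read from the Parquet file.
                try (ResultSet rs = stmt.executeQuery(
                         "SELECT * FROM dfs.`/data/myfile.parquet`")) {
                    while (rs.next()) { /* consume one row at a time */ }
                }

                // Buffering: the sort sees all input before emitting a row,
                // so Drill buffers the data set and spills under pressure.
                try (ResultSet rs = stmt.executeQuery(
                         "SELECT region, SUM(amount) AS total "
                       + "FROM dfs.`/data/myfile.parquet` "
                       + "GROUP BY region ORDER BY total DESC")) {
                    while (rs.next()) { /* consume */ }
                }
            }
        }
    }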

A few questions for you:

* How big is your Parquet file in MB?
* What client are you using? For this size of data, you should use a CREATE 
TABLE AS statement, or a JDBC or ODBC client (see the sketch after this 
list). This is NOT a good query for the REST API or the Drill web console.
* What is your query? Simple SELECT, or does it include sorting, grouping or 
other operations?
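
Here is the CTAS sketch mentioned above (again with invented paths and names; 
dfs.tmp is a writable workspace in Drill's default configuration). It writes 
the result to storage instead of shipping 50 million rows back to the client, 
so client-side memory stays flat:

    import java.sql.*;

    public class CtasExample {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:drill:drillbit=localhost");
                 Statement stmt = conn.createStatement()) {
                // The result lands in storage as a new table; no large
                // result set ever crosses the wire to the client.
                stmt.execute(
                    "CREATE TABLE dfs.tmp.`my_result` AS "
                  + "SELECT * FROM dfs.`/data/myfile.parquet` "
                  + "WHERE amount > 100");
            }
        }
    }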

Thanks,
- Paul

On Thursday, July 18, 2019, 02:53:48 PM PDT, preet singh <preetre...@gmail.com> wrote:

Hello Drill Team

I am evaluating Apache Drill for a query over 50 million records in
Parquet files. I set max direct memory to 64 GB and heap to 32 GB.

I am able to execute my query, but after it completes, why is the memory
consumed by the query not coming down (it stays at around 64 GB)?

Thanks
Preet Singh