Re: CTAS memory leak

Kunal Khatua Thu, 30 Aug 2018 11:04:52 -0700

Scott,

I think I can explain why you are getting the OutOfMemory error.

Drill essentially has two pools of memory: the standard JVM heap and the
Netty-managed direct memory. When Drill reads a JSON document, the JSON parser
libraries it uses first deserialize the document into Java heap objects. Drill
then converts those objects into its internal representation in direct memory.
The issue you are seeing is most likely that this initial deserialization step
is consuming a very large amount of heap memory.

So, the options you have are:
1. Reduce the size of the individual units of the dataset (I'm assuming it is 
one giant JSON document within the source file)
2. Increase the Heap, possibly at the cost of Direct (say, 12GB Xmx and 6GB 
Direct)
3. Reduce the parallelization, so that fewer JSON files are read and 
materialized in the heap memory at a given time.
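
For option 3, a minimal sketch of how the parallelization could be lowered for
the session that runs the CTAS. It assumes `planner.width.max_per_node` is the
option to turn down, which this thread does not name; the value 2, the
`dfs.tmp` workspace, the table name and the source path are all hypothetical
placeholders.

```sql
-- Assumption: planner.width.max_per_node is the option to lower; the thread
-- does not name it. The value 2 is only an illustrative starting point.
ALTER SESSION SET `planner.width.max_per_node` = 2;

-- Write Parquet from the CTAS (parquet is already the default store.format).
ALTER SESSION SET `store.format` = 'parquet';

-- Hypothetical workspace, table name and source directory.
CREATE TABLE dfs.tmp.`events_parquet` AS
SELECT * FROM dfs.`/data/json/events`;
```

Fewer fragments per node should mean fewer JSON readers holding heap at the
same time, at the cost of a slower CTAS.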

~ Kunal

On 8/29/2018 5:10:55 PM, Boaz Ben-Zvi wrote:
Hi Scott,

1. "swaps and then crashes" - do you mean an Out-Of-Memory error?

2. Version 1.14 is available now, with several memory control
improvements (e.g., Hash Join spilling, output batch sizing)

3. Direct memory is only 10G - why not go higher? This is where most of
Drill's in-memory data is held (not so much the stack and heap).

4. You may want to increase the memory available to each query on each node;
the default (2GB) is too conservative (i.e. low).

E.g., to go to 8GB, do

alter session set `planner.memory.max_query_memory_per_node` = 8589934592;
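
The value is given in bytes: 8GB = 8 * 1024^3 = 8589934592, and the 2GB
default is 2147483648. A small sketch for checking what is currently in
effect and for undoing the change afterwards, assuming the `sys.options`
system table is available (its columns differ between Drill versions, hence
the `SELECT *`):

```sql
-- Show the current setting; the column layout of sys.options varies by version.
SELECT *
FROM sys.options
WHERE name = 'planner.memory.max_query_memory_per_node';

-- Return to the default once the CTAS has finished.
ALTER SESSION RESET `planner.memory.max_query_memory_per_node`;
```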

Thanks,

Boaz

On 8/29/18 4:09 PM, scott wrote:
> Hi all,
> I've got a problem using the CREATE TABLE AS (CTAS) option that I was hoping
> someone could help with. I am trying to create Parquet files from existing
> JSON files using this method. It works on smaller datasets, but when I try
> this on a large dataset, Drill takes up all the memory on my servers until it
> swaps and then crashes. I'm running version 1.12 on CentOS 7. I've got my
> drillbits set to -Xmx8G, which seems to work for most queries and it does
> not exceed that limit by much, but when I do the CTAS, memory just keeps
> growing without bounds.
> I run 4 drillbits on each server with these settings: -Xms8G -Xmx8G
> -XX:MaxDirectMemorySize=10G on a server that has 48G RAM.
> Has anyone else experienced this? Are there any workarounds you can suggest?
>
> Thanks for your time,
> Scott
>
