Scott, I think I can explain why you are getting the OutOfMemory error.
Drill essentially has 2 pools of memory... the standard JVM Heap and the Netty-managed Direct memory. When you are reading a JSON document, it needs to be deserialized into Java heap objects because of the JSON parser libraries Drill uses. After that, Drill converts it into its internal representation within the Direct memory space. The issue you are seeing is most likely that this initial step is consuming a very large amount of Heap memory.

So, the options you have are
1. Reduce the size of the individual units of the dataset (I'm assuming it is one giant JSON document within the source file)
2. Increase the Heap, possibly at the cost of Direct (say, 12GB Xmx and 6GB Direct)
3. Reduce the parallelization, so that fewer JSON files are read and materialized in the heap memory at a given time.

~ Kunal

On 8/29/2018 5:10:55 PM, Boaz Ben-Zvi <[email protected]> wrote:
Hi Scott,

1. "swaps and then crashes" - do you mean an Out-Of-Memory error ?
2. Version 1.14 is available now, with several memory control improvements (e.g., Hash Join spilling, output batch sizing)
3. Direct memory is only 10G - why not go higher ? This is where most of Drill's in-memory data is held (not so much the stack and heap).
4. May want to increase the memory available to each query on each node; the default ( 2GB ) is too conservative (i.e. low). E.g., to go to 8GB, do

alter session set `planner.memory.max_query_memory_per_node` = 8589934592;

Thanks,

Boaz

On 8/29/18 4:09 PM, scott wrote:
> Hi all,
> I've got a problem using the create table as option I was hoping someone
> could help with. I am trying to create parquet files from existing json
> files using this method. It works on smaller datasets, but when I try this
> on a large dataset, drill will take up all memory on my servers until it
> swaps and then crashes. I'm running version 1.12 on centos 7. I've got my
> drillbits set to xmx 8G, which seems to work for most queries and it does
> not exceed that limit by much, but when I do the CTAS, it just keeps
> growing without bounds.
> I run 4 drillbits on each server with these settings: -Xms8G -Xmx8G
> -XX:MaxDirectMemorySize=10G on a server that has 48G RAM.
> Has anyone else experienced this? Are there any workarounds you can suggest?
>
> Thanks for your time,
> Scott
>
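
For reference, a back-of-the-envelope check of the settings quoted in this thread. This is only a sketch: the helper name committed_gib is made up for illustration, and the sum counts just heap plus direct memory, ignoring other JVM overhead such as thread stacks and metaspace. It adds up what 4 drillbits may claim on one 48G server, both with the current settings and with the 12GB heap / 6GB direct split suggested as option 2.

```python
# Rough per-server memory budget, using the figures quoted in the thread.
# committed_gib is a hypothetical helper, not part of Drill or any library.

def committed_gib(drillbits, heap_gib, direct_gib):
    """Heap plus direct memory that all drillbits on one server may claim."""
    return drillbits * (heap_gib + direct_gib)

server_ram_gib = 48          # "a server that has 48G RAM"
drillbits_per_server = 4     # "I run 4 drillbits on each server"

# Current settings: -Xmx8G and -XX:MaxDirectMemorySize=10G
current = committed_gib(drillbits_per_server, heap_gib=8, direct_gib=10)

# Option 2 from the reply: 12GB Xmx and 6GB Direct
option2 = committed_gib(drillbits_per_server, heap_gib=12, direct_gib=6)

for label, total in [("current", current), ("option 2", option2)]:
    over = total - server_ram_gib
    print(f"{label}: up to {total} GiB claimed vs {server_ram_gib} GiB RAM "
          f"(over by {over} GiB)")
```

Under these assumptions both configurations allow the four JVMs to claim more than the physical RAM, which would be consistent with the swapping described above. Shifting memory from direct to heap changes where the room is, not the total, so fewer drillbits per server or smaller per-drillbit limits may also be worth considering.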
