See if you can bump your heap to 2G. I have not worked with Drill embedded on Windows, but there should be no reason why setting the node width to 1 would not work; I have used it on Linux and Mac with no issue to create Parquet.
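For reference, the heap bump would go in conf/drill-env.sh; a minimal sketch, assuming the settings Boris quotes later in this thread (exact defaults may vary by Drill version and platform):

```shell
# conf/drill-env.sh -- sketch based on the values discussed in this thread.
# Raise the JVM heap from 1G to 2G; keep direct memory at 4G for
# memory-hungry operators such as hash join and the Parquet writer.
export DRILL_HEAP="2G"
export DRILL_MAX_DIRECT_MEMORY="4G"
```

On a Windows embedded install the same values would need to be set in the environment (or the equivalent startup script) before launching Drill.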
Also decrease the Parquet block size to 32MB on such a small system if you keep running into memory issues. You can also turn off hash join to limit memory usage (planner.enable_hashjoin), but it may slow down the query; hash join and the Parquet writer can both consume a lot of memory.

On the last query, which columns do you actually want? A SELECT * on a join may not always be ideal with CTAS. Without knowing the data it is hard to tell what the desired outcome is.

—Andries

> On Aug 4, 2015, at 9:05 AM, Boris Chmiel <[email protected]> wrote:
>
> Hi Andries,
>
> I am using Drill 1.1.0. The configuration is:
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> planner.memory.max_query_memory_per_node is 4147483648
> Physical RAM is 8G. The computer is dedicated to testing Drill (fresh Windows install).
> However, total.max peaks at 2,019,033,088 in Metrics.
>
> During the initial query:
> - 6 minor fragments for PARQUET_WRITER are instantiated; the query fails before writing starts.
> - 3 minor fragments for each of the 2 PARQUET_ROW_GROUP_SCANs: the first did not start, and the second fails with peak memory at 125MB + 121MB + 21MB.
>
> Re-running the query after dropping planner.width.max_per_node from 3 to 1 causes:
> - only 1 minor fragment for each PARQUET_ROW_GROUP_SCAN operator;
> - both PARQUET_ROW_GROUP_SCAN operators to start (32K rows read of 760K, and 760K of 4840K);
> - the query to still fail, with PARQUET_WRITER and HASH_JOIN initiated (Major Fragment 1).
>
> The total peak memory usage within the plan is:
> - 57MB for PARQUET_WRITER
> - 109MB for HASH_JOIN
> - 170MB for PARQUET_ROW_GROUP_SCAN #1
> - 360MB for PARQUET_ROW_GROUP_SCAN #2
> - 25MB for a PROJECT operator
> => 721MB peak
>
> Do you think my configuration is not appropriate for what I'm trying to do? Am I simply limited by physical memory?
>
> Thanks
> Regards
> Boris
>
>
> On Tuesday, August 4, 2015 at 5:10 PM, Andries Engelbrecht <[email protected]> wrote:
>
> How much memory is allocated to Drill in the drill-env.sh file?
>
> CTAS with Parquet can consume quite a bit of memory, as various structures are allocated in memory before the Parquet files are written. If you look in the query profiles you will get a good indication of the memory usage.
>
> Also check how many fragments are working on creating the Parquet files; if you are limited on memory you can reduce the number of fragments in CTAS to limit memory usage. Check planner.width.max_per_node and reduce it if it is higher than 1.
>
> Which version of Drill are you using?
>
> —Andries
>
>> On Aug 4, 2015, at 7:50 AM, Boris Chmiel <[email protected]> wrote:
>>
>> Hi all,
>>
>> I am trying to figure out how to optimize my queries. I found that when I prepare my data prior to querying it, using CTAS to apply a schema and transform my CSV files to Parquet format, subsequent queries are much more likely to reach OOM.
>>
>> For example, this direct query on CSV files works:
>>
>> CREATE TABLE t3parquet AS (
>>   SELECT * FROM Table1.csv
>>   INNER JOIN Table2.csv ON table1.columns[0] = table2.columns[0]);
>>
>> While this combination does not:
>>
>> CREATE TABLE t1parquet AS (
>>   SELECT
>>     CAST(columns[0] AS varchar(10)) key1,
>>     CAST(columns[1] … and so on)
>>   FROM Table1.csv);
>>
>> CREATE TABLE t2parquet AS (
>>   SELECT
>>     CAST(columns[0] AS varchar(10)) key1,
>>     CAST(columns[1] … and so on)
>>   FROM Table2.csv);
>>
>> CREATE TABLE t3parquet AS (
>>   SELECT * FROM t2parquet
>>   INNER JOIN t1parquet ON t1parquet.key1 = t2parquet.key1);
>>
>> This last query runs OOM on PARQUET_ROW_GROUP_SCAN.
>>
>> I use embedded mode on Windows, file system storage, a 64MB Parquet block size, and not-so-big files (less than a few hundred MB in raw format).
>>
>> Does the way Drill / Parquet works imply preferring queries/views on raw files over Parquet to save memory? Is this behavior normal?
>>
>> Do you think my memory configuration should be tuned, or am I misunderstanding something?
>>
>> Thanks in advance, and sorry for my English.
>>
>> Regards
>> Boris
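[Editor's note: the session options discussed in this thread can be applied before re-running the CTAS. A sketch using the option names mentioned above (the column list in the final CTAS is hypothetical, for illustration only):]

```sql
-- Session-level tuning discussed in this thread.
ALTER SESSION SET `store.parquet.block-size` = 33554432;  -- 32MB row groups
ALTER SESSION SET `planner.width.max_per_node` = 1;       -- one minor fragment per operator
ALTER SESSION SET `planner.enable_hashjoin` = false;      -- may slow the query down

-- Instead of SELECT * on the join, list only the columns you need:
CREATE TABLE t3parquet AS (
  SELECT t1.key1, t2.key1 AS key2   -- hypothetical column list
  FROM t1parquet t1
  INNER JOIN t2parquet t2 ON t1.key1 = t2.key1);
```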
