I am trying to run a query on Apache Drill that simply counts the number of 
rows in a table stored in Parquet format in S3. I am running this on a 20-node 
cluster of r3.8xlarge EC2 instances, with direct memory set to 80GB, heap 
memory set to 32GB, and planner.memory.max_memory_per_node set to a very high 
value. However, counting the rows in this table takes around 7662 seconds, or 
just over 2 hours, on a 9.93TB dataset with 56 billion rows and 174 columns. 
From the logs and the web console, it seems that query planning itself takes 
nearly 99% of the time and actual query execution takes almost no time. I ran 
the same query on PrestoDB with a similar setup (20 r3.8xlarge nodes) and it 
completed in 137 seconds, or just over 2 minutes. Is there something wrong 
with my Drill configuration, or is this expected for Drill?
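
To put the numbers above side by side, here is a quick back-of-the-envelope 
sketch in Python. The timings and row count are the ones quoted in this post; 
the 0.99 planning share is only my rough reading of the logs and web console, 
not a measured value, and the variable names are just for illustration.

```python
# Figures quoted in the post.
drill_seconds = 7662            # total wall-clock time for the Drill count
presto_seconds = 137            # total wall-clock time for the PrestoDB count
rows = 56_000_000_000           # rows in the table

# Assumption: planning takes roughly 99% of the Drill run (approximate).
planning_share = 0.99

drill_hours = drill_seconds / 3600
slowdown = drill_seconds / presto_seconds
planning_seconds = drill_seconds * planning_share
execution_seconds = drill_seconds - planning_seconds

print(f"Drill total:      {drill_hours:.2f} hours")
print(f"PrestoDB total:   {presto_seconds / 60:.2f} minutes")
print(f"Drill / PrestoDB: {slowdown:.1f}x slower")
print(f"Drill planning:   ~{planning_seconds:.0f} s (estimated)")
print(f"Drill execution:  ~{execution_seconds:.0f} s (estimated)")
print(f"Rows per second:  Drill {rows / drill_seconds:,.0f}, "
      f"PrestoDB {rows / presto_seconds:,.0f}")
```

If the 99% estimate is roughly right, the execution phase alone would be 
comparable to or shorter than the whole PrestoDB run, which is why I suspect 
planning rather than execution.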
