I am trying to run a query on Apache Drill that simply counts the number of rows
in a table stored in Parquet format in S3. I am running this on a 20-node cluster
of r3.8xlarge EC2 instances, with direct memory set to 80GB, heap
memory set to 32GB, and planner.memory.max_memory_per_node set to a very
high value. However, counting the rows in this table takes around 7662 seconds,
or a little over 2 hours, on a dataset of 9.93TB with 56 billion rows and 174
columns. From the logs and the web console, it seems that query planning
itself takes nearly 99% of the time and actual query execution takes almost
no time. I ran the same query on PrestoDB with a similar setup
(20-node r3.8xlarge) and found that it completed in 137 seconds, or just over 2
minutes. Is there something wrong with my Drill configuration, or is
this what is expected for Drill?
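
To put the two timings side by side, here is a rough back-of-the-envelope calculation in Python. The run times are the ones quoted above; the 99% planning share is only my approximate reading of the logs and web console, not a measured value, and the function name is just for illustration.

```python
from datetime import timedelta

# Timings quoted in the question.
DRILL_SECONDS = 7662
PRESTO_SECONDS = 137

# Assumption: planning share is a rough estimate read off the logs/web console.
PLANNING_SHARE = 0.99


def summarize(drill_s, presto_s, planning_share):
    """Return human-readable durations and simple ratios for the two runs."""
    return {
        "drill_wall_clock": str(timedelta(seconds=drill_s)),
        "presto_wall_clock": str(timedelta(seconds=presto_s)),
        "slowdown_factor": drill_s / presto_s,
        "drill_planning_seconds": drill_s * planning_share,
        "drill_execution_seconds": drill_s * (1 - planning_share),
    }


summary = summarize(DRILL_SECONDS, PRESTO_SECONDS, PLANNING_SHARE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the 99% estimate is roughly right, the actual execution part of the Drill query would be well under the PrestoDB total, which is why I suspect planning rather than the scan itself.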