Drill performance tuning parquet

Dan Holmes Thu, 27 Jul 2017 12:59:34 -0700

I am performance testing a single Drill instance with different vCPU 
configurations in AWS.  I have parquet files on an EFS volume and use the 
same data for each EC2 instance.


I have used 4, 8 and 16 vCPUs.  Query time is ~25, 15 and 12 seconds 
respectively.  I have not changed any of the options.  This is an 
out-of-the-box 1.11 installation.
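
To make the scaling in these numbers easier to see, here is a small sketch that computes speedup and parallel efficiency relative to the 4 vCPU run. The timings are the approximate figures quoted above. The function name `scaling_table` and the definition of efficiency (speedup divided by the relative increase in vCPUs) are illustrative assumptions, not anything provided by Drill.

```python
# Approximate timings from the runs described above: vCPU count -> seconds.
timings = {4: 25.0, 8: 15.0, 16: 12.0}


def scaling_table(timings):
    """Return (vcpus, seconds, speedup, efficiency) rows.

    Speedup and efficiency are measured relative to the smallest
    configuration. Efficiency here is an assumed definition:
    speedup divided by the factor by which the vCPU count grew.
    """
    base_cpus = min(timings)
    base_secs = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        secs = timings[cpus]
        speedup = base_secs / secs
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, secs, speedup, efficiency))
    return rows


for cpus, secs, speedup, eff in scaling_table(timings):
    print(f"{cpus:>2} vCPUs: {secs:>4.0f} s  speedup {speedup:.2f}x  efficiency {eff:.0%}")
```

On these figures, doubling from 4 to 8 vCPUs gives roughly a 1.67x speedup, while doubling again from 8 to 16 only takes the query from 15 to 12 seconds.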

What Drill tuning options should I experiment with?  I have read 
https://drill.apache.org/docs/asynchronous-parquet-reader/ but it is too 
technical for me to follow, although it reads as if the default options are 
the best ones.

The query looks like this:
SELECT store_key, SUM(sales_dollars) sd
FROM dfs.root.sales_p
GROUP BY store_key
ORDER BY sd DESC
LIMIT 10

Dan Holmes | Architect | Revenue Analytics, Inc.
300 Galleria Parkway, Suite 1900 | Atlanta, Georgia 30339
Direct: 770.859.1255 Cell: 404.617.3444
www.revenueanalytics.com | LinkedIn: https://www.linkedin.com/company/revenue-analytics-inc- | Twitter: https://twitter.com/Rev_Analytics
