Hi Yijie,

Thanks for the profile. From the Operator Profiles overview, 03-xx-02 HASH_AGGREGATE and 03-xx-06 PARQUET_ROW_GROUP_SCAN took most of the time:

Operator                           Setup (min/avg/max)     Process (min/avg/max)    Wait (min/avg/max)      Peak Mem (avg/max)
03-xx-02 HASH_AGGREGATE            0.020s/0.083s/0.213s    1m06s/1m55s/3m12s        0.000s/0.000s/0.000s    16MB/16MB
03-xx-03 PROJECT                   0.004s/0.009s/0.024s    6.539s/11.249s/16.504s   0.000s/0.000s/0.000s    536KB/536KB
03-xx-04 SELECTION_VECTOR_REMOVER  0.002s/0.004s/0.014s    34.456s/58.595s/1m26s    0.000s/0.000s/0.000s    664KB/664KB
03-xx-05 FILTER                    0.001s/0.003s/0.009s    2.556s/4.355s/6.543s     0.000s/0.000s/0.000s    8MB/8MB
03-xx-06 PARQUET_ROW_GROUP_SCAN    0.000s/0.000s/0.000s    2m13s/3m40s/5m14s        5.392s/6.372s/7.371s    12MB/12MB

For the SCAN, there is some skew among the batches/records processed by the minor fragments: from ~5,000 batches up to 15,000+ batches, and the ~5m process times belong to the fragments with the bigger batch counts. Similar skew shows up in 03-xx-02. If the load can be balanced across the fragments, it may speed up the query a little bit (see the sketch below for the relevant knob).
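For reference, the per-node parallelism you mention below is controlled by a Drill option. A minimal sketch of inspecting and adjusting it, assuming a recent 1.x build (the value 24 is just your core count, not a tuned recommendation):

    -- Inspect the current width settings via Drill's sys.options table.
    SELECT name, num_val
    FROM sys.options
    WHERE name LIKE 'planner.width%';

    -- Raise the per-node parallelism for the current session only.
    ALTER SESSION SET `planner.width.max_per_node` = 24;

Note this only changes how many minor fragments run per node; the skew comes from how the row groups are spread across fragments, so a wider setting alone may not rebalance the load.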
On the other hand, the 10min query time is not that far off the mark, considering that you have only 3 nodes and each node has only 7 HDDs. I did a similar run of the query on my cluster (10 nodes, each with 32 cores and 23 HDDs). For 03-xx-02 and 03-xx-06, the process time per record is about the same as what you got, even though the total query time is much shorter because it ran on a much bigger system.

Another note: for your test, disk IO is about ~24MB/s/HDD (150GB / 300s / 21 HDDs; worked out below), which is about the IO limit when there are a lot of small IOs. What is the chunk (block) size of your HDFS?
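To make that arithmetic explicit, here is the same back-of-the-envelope check as a query you can run in Drill itself (sys.version is used only as a convenient one-row table):

    -- 150GB of input read in ~300s of scan time across 3 nodes x 7 HDDs = 21 disks:
    SELECT 150 * 1024.0 / 300 / 21 AS mb_per_sec_per_hdd
    FROM sys.version;
    -- returns roughly 24.4

A single HDD can usually sustain much higher sequential read rates, so landing at ~24MB/s suggests the access pattern (many small reads) is what is capping the disks.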
Hope this helps, let me know if you have any further questions.

Thanks,
Dechang

On Mon, May 23, 2016 at 7:28 PM, Yijie Shen <[email protected]> wrote:

Hi Dechang,

Thanks for your reply. I've changed the parallelism of each node from 17 to 24 but saw no performance improvement. The 1516 files are distributed across an HDFS cluster of 3 datanodes, the same 3 servers where I start the Drillbits. For each datanode, 7 HDDs are configured as dfs.datanode.data.dir. The data was generated with the TPC-H dbgen tool and loaded into Parquet format using Apache Spark; the files are 101MB on average, with no skew seen.

I've pasted the `Full json profile` content and attached it to this mail. Thank you.

On Mon, May 23, 2016 at 11:12 PM, Dechang Gu <[email protected]> wrote:

Hi Yijie,
This is Dechang at MapR. I work on Drill performance. From what you described, it looks like the scan took most of the time. How are the files distributed on the disks, and is there any skew? How many disks are there? If possible, can you provide the profile for the run?
Thanks,
Dechang

On Sun, May 22, 2016 at 9:06 AM, Yijie Shen <[email protected]> wrote:

Hi all,

I've been trying out Drill on the master branch lately and have deployed a cluster on three physical servers. The input data `lineitem` is in Parquet format, 150GB in total: 1516 files of 101MB each on average. Each server has two Intel(R) Xeon(R) E5645 @ 2.40GHz CPUs, 24 cores in total, and 32GB memory.

While executing Q1:

SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)),
       AVG(L_QUANTITY), AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM dfs.tpch.`lineitem`
WHERE L_SHIPDATE <= '1998-09-02'
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS

I've noticed the parallelism was 51 (planner.width.max_per_node = 17) in my case for Major Fragment 03 (Scan, Filter, Project, HashAgg and Project), and each minor fragment lasted about 8 to 9 minutes. One record for example:

03-00-xx  hw080  7.309s  42.358s  9m35s  118,758,489  14,540  22:31:32  22:31:32  33MB  FINISHED

Is this a normal speed (more than 10 minutes) for Drill on my current cluster? Did I miss something important in the conf to accelerate the execution?

Thanks very much!
Yijie
