Gautam Parai created DRILL-7245:
-----------------------------------

             Summary: TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics 
is enabled at sf 100
                 Key: DRILL-7245
                 URL: https://issues.apache.org/jira/browse/DRILL-7245
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: Gautam Parai
            Assignee: Gautam Parai
             Fix For: 1.17.0


Here is query 65:
{noformat}
SELECT s_store_name, 
               i_item_desc, 
               sc.revenue, 
               i_current_price, 
               i_wholesale_cost, 
               i_brand 
FROM   store, 
       item, 
       (SELECT ss_store_sk, 
               Avg(revenue) AS ave 
        FROM   (SELECT ss_store_sk, 
                       ss_item_sk, 
                       Sum(ss_sales_price) AS revenue 
                FROM   store_sales, 
                       date_dim 
                WHERE  ss_sold_date_sk = d_date_sk 
                       AND d_month_seq BETWEEN 1199 AND 1199 + 11 
                GROUP  BY ss_store_sk, 
                          ss_item_sk) sa 
        GROUP  BY ss_store_sk) sb, 
       (SELECT ss_store_sk, 
               ss_item_sk, 
               Sum(ss_sales_price) AS revenue 
        FROM   store_sales, 
               date_dim 
        WHERE  ss_sold_date_sk = d_date_sk 
               AND d_month_seq BETWEEN 1199 AND 1199 + 11 
        GROUP  BY ss_store_sk, 
                  ss_item_sk) sc 
WHERE  sb.ss_store_sk = sc.ss_store_sk 
       AND sc.revenue <= 0.1 * sb.ave 
       AND s_store_sk = sc.ss_store_sk 
       AND i_item_sk = sc.ss_item_sk 
ORDER  BY s_store_name, 
          i_item_desc
LIMIT 100; 
{noformat}
Here is the new plan.
{noformat}
00-00    Screen : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY 
revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 
100.0, cumulative cost = {1.170402776556728E9 rows, 9.49229008144672E9 cpu, 
2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id 
= 45433
00-01      Project(s_store_name=[$0], i_item_desc=[$1], revenue=[$2], 
i_current_price=[$3], i_wholesale_cost=[$4], i_brand=[$5]) : rowType = 
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, 
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = 
{1.170402766556728E9 rows, 9.49229007144672E9 cpu, 2.305289192E9 io, 
5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45432
00-02        SelectionVectorRemover : rowType = RecordType(ANY s_store_name, 
ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY 
i_brand): rowcount = 100.0, cumulative cost = {1.170402666556728E9 rows, 
9.49228947144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 
1.3320421632282749E8 memory}, id = 45431
00-03          Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY 
i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY 
i_brand): rowcount = 100.0, cumulative cost = {1.170402566556728E9 rows, 
9.49228937144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 
1.3320421632282749E8 memory}, id = 45430
00-04            SingleMergeExchange(sort0=[0], sort1=[1]) : rowType = 
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, 
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = 
{1.170402466556728E9 rows, 9.49228897144672E9 cpu, 2.305289192E9 io, 
5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45429
01-01              OrderedMuxExchange(sort0=[0], sort1=[1]) : rowType = 
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, 
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = 
{1.170402366556728E9 rows, 9.492286842675482E9 cpu, 2.305289192E9 io, 
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45428
02-01                SelectionVectorRemover : rowType = RecordType(ANY 
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = 
{1.170402266556728E9 rows, 9.492286742675482E9 cpu, 2.305289192E9 io, 
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45427
02-02                  Limit(fetch=[100]) : rowType = RecordType(ANY 
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = 
{1.170402166556728E9 rows, 9.492286642675482E9 cpu, 2.305289192E9 io, 
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45426
02-03                    SelectionVectorRemover : rowType = RecordType(ANY 
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = 
{1.170402066556728E9 rows, 9.492286242675482E9 cpu, 2.305289192E9 io, 
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45425
02-04                      TopN(limit=[100]) : rowType = RecordType(ANY 
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = 
{1.1699855881208198E9 rows, 9.491869764239574E9 cpu, 2.305289192E9 io, 
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45424
02-05                        HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : 
rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY 
i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 
416478.4359081372, cumulative cost = {1.1695691096849117E9 rows, 
9.469733581565046E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 
1.3320421632282749E8 memory}, id = 45423
03-01                          Project(s_store_name=[$1], i_item_desc=[$3], 
revenue=[$11], i_current_price=[$4], i_wholesale_cost=[$5], i_brand=[$6]) : 
rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY 
i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 
416478.4359081372, cumulative cost = {1.1691526312490036E9 rows, 
9.464735840334148E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 
1.3320421632282749E8 memory}, id = 45422
03-02                            SelectionVectorRemover : rowType = 
RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, 
ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY 
ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 
416478.4359081372, cumulative cost = {1.1687361528130956E9 rows, 
9.4622369697187E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 
1.3320421632282749E8 memory}, id = 45421
03-03                              Filter(condition=[<=($11, *(0.1, $8))]) : 
rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY 
i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY 
ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount 
= 416478.4359081372, cumulative cost = {1.1683196743771875E9 rows, 
9.461820491282793E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 
1.3320421632282749E8 memory}, id = 45420
03-04                                Project(s_store_sk=[$5], 
s_store_name=[$6], i_item_sk=[$7], i_item_desc=[$8], i_current_price=[$9], 
i_wholesale_cost=[$10], i_brand=[$11], ss_store_sk=[$3], ave=[$4], 
ss_store_sk0=[$0], ss_item_sk=[$1], revenue=[$2]) : rowType = RecordType(ANY 
s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY 
i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, 
ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 832956.8718162744, 
cumulative cost = {1.1674867175053713E9 rows, 9.456822750051895E9 cpu, 
2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, 
id = 45419
03-05                                  HashJoin(condition=[=($7, $1)], 
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, 
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY 
s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 832956.8718162744, cumulative cost = 
{1.1666537606335552E9 rows, 9.4468272675901E9 cpu, 2.305289192E9 io, 
4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45418
03-07                                    HashToRandomExchange(dist0=[[$1]]) : 
rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY 
ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount = 
832377.0685035395, cumulative cost = {1.1652093835650516E9 rows, 
9.430922742768057E9 cpu, 2.304269192E9 io, 4.544006312385362E10 network, 
1.2961381632282749E8 memory}, id = 45415
04-01                                      HashJoin(condition=[=($5, $0)], 
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, 
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY 
s_store_name): rowcount = 832377.0685035395, cumulative cost = 
{1.164377006496548E9 rows, 9.417604709672E9 cpu, 2.304269192E9 io, 
2.157414781572014E10 network, 1.2961381632282749E8 memory}, id = 45414
04-03                                        HashJoin(condition=[=($3, $0)], 
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, 
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave): rowcount = 
832377.0685035397, cumulative cost = {1.1635434234280443E9 rows, 
9.407608948849958E9 cpu, 2.304268388E9 io, 2.1544179841320137E10 network, 
1.2960674112282749E8 memory}, id = 45411
04-05                                          HashAgg(group=[{0, 1}], 
revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY 
revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8 
rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 
5.674418293829948E7 memory}, id = 45393
04-07                                            
HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY 
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, 
cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 
1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, 
id = 45392
06-01                                              HashAgg(group=[{0, 1}], 
revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY 
revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8 
rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 
3.476942832980602E7 memory}, id = 45391
06-02                                                Project(ss_store_sk=[$2], 
ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk, 
ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative 
cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io, 
2.48987648E7 network, 5878.400000000001 memory}, id = 45390
06-03                                                  
HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY 
ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, 
cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 
2.48987648E7 network, 5878.400000000001 memory}, id = 45389
06-05                                                    Scan(table=[[dfs, 
/tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, 
numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, 
`ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY 
ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount 
= 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 
1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45384
06-04                                                    BroadcastExchange : 
rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, 
cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7 
network, 0.0 memory}, id = 45388
09-01                                                      
SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): 
rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 
0.0 network, 0.0 memory}, id = 45387
09-02                                                        
Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY 
d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 
rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45386
09-03                                                          
Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, 
filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , 
less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, 
`d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): 
rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 
0.0 network, 0.0 memory}, id = 45385
04-04                                          Project(ss_store_sk0=[$0], 
ave=[$1]) : rowType = RecordType(ANY ss_store_sk0, ANY ave): rowcount = 202.0, 
cumulative cost = {5.822713399551244E8 rows, 4.714959034142926E9 cpu, 
1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id 
= 45410
04-06                                            BroadcastExchange : rowType = 
RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = 
{5.822711379551244E8 rows, 4.714958630142926E9 cpu, 1.152134194E9 io, 
1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45409
07-01                                              Project(ss_store_sk=[$0], 
ave=[divide(CastHigh(CASE(=($2, 0), null, $1)), $2)]) : rowType = 
RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = 
{5.822709359551244E8 rows, 4.714957014142926E9 cpu, 1.152134194E9 io, 
1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45408
07-02                                                HashAgg(group=[{0}], 
agg#0=[$SUM0($1)], agg#1=[$SUM0($2)]) : rowType = RecordType(ANY ss_store_sk, 
ANY $f1, BIGINT $f2): rowcount = 202.0, cumulative cost = {5.822707339551244E8 
rows, 4.714956004142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 
7.2859002984528E7 memory}, id = 45407
07-03                                                  
HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY ss_store_sk, ANY 
$f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = 
{5.82187496248274E8 rows, 4.712292397523715E9 cpu, 1.152134194E9 io, 
1.1275973124348644E10 network, 7.139401934396178E7 memory}, id = 45406
10-01                                                    HashAgg(group=[{0}], 
agg#0=[$SUM0($1)], agg#1=[COUNT($1)]) : rowType = RecordType(ANY ss_store_sk, 
ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = 
{5.821042585414236E8 rows, 4.710960594214109E9 cpu, 1.152134194E9 io, 
1.0253148182571495E10 network, 7.139401934396178E7 memory}, id = 45405
10-02                                                      
Project(ss_store_sk=[$0], revenue=[$2]) : rowType = RecordType(ANY ss_store_sk, 
ANY revenue): rowcount = 832377.0685035397, cumulative cost = 
{5.812718814729201E8 rows, 4.6843245280219965E9 cpu, 1.152134194E9 io, 
1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45404
10-03                                                        HashAgg(group=[{0, 
1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, 
ANY revenue): rowcount = 832377.0685035397, cumulative cost = 
{5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 
1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45403
10-04                                                          
HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY 
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, 
cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 
1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, 
id = 45402
11-01                                                            
HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY 
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, 
cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu, 
1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45401
11-02                                                              
Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = 
RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount = 
1316801.1337047734, cumulative cost = {5.774579491337048E8 rows, 
4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 
5878.400000000001 memory}, id = 45400
11-03                                                                
HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY 
ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, 
cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 
2.48987648E7 network, 5878.400000000001 memory}, id = 45399
11-05                                                                  
Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///tpcdsParquet10/SF100/store_sales]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, 
numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, 
`ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY 
ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount 
= 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 
1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45394
11-04                                                                  
BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): 
rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 
2.48987648E7 network, 0.0 memory}, id = 45398
12-01                                                                    
SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): 
rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 
0.0 network, 0.0 memory}, id = 45397
12-02                                                                      
Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY 
d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 
rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45396
12-03                                                                        
Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, 
filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , 
less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, 
`d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): 
rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 
0.0 network, 0.0 memory}, id = 45395
04-02                                        BroadcastExchange : rowType = 
RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost 
= {804.0 rows, 4020.0 cpu, 804.0 io, 2.99679744E7 network, 0.0 memory}, id = 
45413
08-01                                          Scan(table=[[dfs, 
/tpcdsParquet10/SF100/store]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/store, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`s_store_sk`, `s_store_name`]]]) : rowType = 
RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost 
= {402.0 rows, 804.0 cpu, 804.0 io, 0.0 network, 0.0 memory}, id = 45412
03-06                                    HashToRandomExchange(dist0=[[$0]]) : 
rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY 
i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost = 
{408000.0 rows, 4284000.0 cpu, 1020000.0 io, 4.17792E9 network, 0.0 memory}, id 
= 45417
05-01                                      Scan(table=[[dfs, 
/tpcdsParquet10/SF100/item]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/item]], 
selectionRoot=maprfs:/tpcdsParquet10/SF100/item, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`i_item_sk`, `i_item_desc`, `i_current_price`, 
`i_wholesale_cost`, `i_brand`]]]) : rowType = RecordType(ANY i_item_sk, ANY 
i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount 
= 204000.0, cumulative cost = {204000.0 rows, 1020000.0 cpu, 1020000.0 io, 0.0 
network, 0.0 memory}, id = 45416
{noformat}

For query 65, there are two HashAggs (operator 04-05 and 10-03) that are taking 
a long time. Their rowcounts are incorrect. Operator 04-05 is estimated to be 
800K and is actually 55M. Operator 10-03 is estimated to be 800K and is also 
55M. As a result, these two operators only have 9 minor fragments in the new 
plan. This may be caused by the HashJoin rowcount for operator (06-03). The 
estimated rowcount for the HashJoin is 1.3M rows but the actual rowcount is 55M 
rows.

For query 1, hash join operator 03-04 may have a similar problem.

For query 45, hash join operator 16-02 may have a similar problem.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to