[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the cost of a join (per node) sometimes could lead to a bad join 
order.

The issue could shown using the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala will be using the larger table ('larger_tbl') 
as the build side table in the hash join node. When there is data skew in the 
larger table, it's possible that there will be only one single executor working 
on building the hash table based on the only hash partition that contains data, 
which in turn could cause the executor node to run into memory issue.
{code:java}
+--+
| Explain String
   |
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7  
   |
| Per-Host Resource Estimates: Memory=414MB 
   |
| WARNING: The following tables are missing relevant table and/or column 
statistics.   |
| test_db.larger_tbl, test_db.smaller_tbl   
   |
|   
   |
| PLAN-ROOT SINK
   |
| | 
   |
| 09:MERGING-EXCHANGE [UNPARTITIONED]   
   |
| |  order by: t0.string_col ASC
   |
| |  limit: 1001
   |
| | 
   |
| 04:TOP-N [LIMIT=1001] 
   |
| |  order by: t0.string_col ASC
   |
| |  row-size=12B cardinality=1.00K 
   |
| | 
   |
| 08:AGGREGATE [FINALIZE]   
   |
| |  group by: t0.string_col
   |
| |  row-size=12B cardinality=2.89M 
   |
| | 
   |
| 07:EXCHANGE [HASH(t0.string_col)] 
   |
| | 
   |
| 03:AGGREGATE [STREAMING]  
   |
| |  group by: t0.string_col
   

[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the cost of a join (per node) sometimes could lead to a bad join 
order.

The issue could shown using the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala will be using the larger table ('larger_tbl') 
as the build side table in the hash join node. When there is data skew in the 
larger table, it's possible that there will be only one single executor working 
on building the hash table based on the only hash partition that contains data, 
which in turn could cause the executor node to run into memory issue.
{code:java}
+--+
| Explain String
   |
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7  
   |
| Per-Host Resource Estimates: Memory=414MB 
   |
| WARNING: The following tables are missing relevant table and/or column 
statistics.   |
| test_db.larger_tbl, test_db.smaller_tbl   
   |
|   
   |
| PLAN-ROOT SINK
   |
| | 
   |
| 09:MERGING-EXCHANGE [UNPARTITIONED]   
   |
| |  order by: t0.string_col ASC
   |
| |  limit: 1001
   |
| | 
   |
| 04:TOP-N [LIMIT=1001] 
   |
| |  order by: t0.string_col ASC
   |
| |  row-size=12B cardinality=1.00K 
   |
| | 
   |
| 08:AGGREGATE [FINALIZE]   
   |
| |  group by: t0.string_col
   |
| |  row-size=12B cardinality=2.89M 
   |
| | 
   |
| 07:EXCHANGE [HASH(t0.string_col)] 
   |
| | 
   |
| 03:AGGREGATE [STREAMING]  
   |
| |  group by: t0.string_col
   

[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the cost of a join (per node) sometimes could lead to a bad join 
order.

The issue could shown using by the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala will be using the larger table ('larger_tbl') 
as the build side table in the hash join node. When there is data skew in the 
larger table, it's possible that there will be only one single executor working 
on building the hash table based on the only hash partition that contains data, 
which in turn could cause the executor node to run into memory issue.
{code:java}
+--+
| Explain String
   |
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7  
   |
| Per-Host Resource Estimates: Memory=414MB 
   |
| WARNING: The following tables are missing relevant table and/or column 
statistics.   |
| test_db.larger_tbl, test_db.smaller_tbl   
   |
|   
   |
| PLAN-ROOT SINK
   |
| | 
   |
| 09:MERGING-EXCHANGE [UNPARTITIONED]   
   |
| |  order by: t0.string_col ASC
   |
| |  limit: 1001
   |
| | 
   |
| 04:TOP-N [LIMIT=1001] 
   |
| |  order by: t0.string_col ASC
   |
| |  row-size=12B cardinality=1.00K 
   |
| | 
   |
| 08:AGGREGATE [FINALIZE]   
   |
| |  group by: t0.string_col
   |
| |  row-size=12B cardinality=2.89M 
   |
| | 
   |
| 07:EXCHANGE [HASH(t0.string_col)] 
   |
| | 
   |
| 03:AGGREGATE [STREAMING]  
   |
| |  group by: t0.string_col