[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

2019-04-13 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated HIVE-21452:
--
Affects Version/s: (was: 2.3.4)
   (was: 3.1.0)

> Loss of query condition when exist exists
> -
>
> Key: HIVE-21452
> URL: https://issues.apache.org/jira/browse/HIVE-21452
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: zengxl
>Priority: Major
>
> In our production environment, there are four tables to do association 
> queries. There are exists in the conditions, and we found that the first two 
> conditions were lost.
> The following two conditions are missing:
> {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
>   
>  In the test environment, I prepared the data of the following four tables, 
> and a condition was lost in the simulation.
> tables:
> test_table1
>  cust_no,name
>  60001,lisa
>  60002,tina
>  60003,kylin
>  60004,jeny
>  60005,john
>  60006,jamse
> test_table2
>  cust_no,acct_type
>  60001,1
>  60001,1
>  60001,2
>  60002,1
>  60003,2
>  60003,3
> test_table3
>  cust_no
>  60001
>  60002
>  60003
>  60004
>  60005
>  60007
> test_table4
>  cust_no,cust_level
>  60001,1
>  60002,2
>  60003,3
>  60004,4
>  60005,5
>  
> create table tmp.test_table1(cust_no string,name string);
>  create table tmp.test_table2(cust_no string,acct_type string);
>  create table tmp.test_table3(cust_no string);
>  create table tmp.test_table4(cust_no string,cust_level string);
> insert into tmp.test_table1 select '60001','lisa';
>  insert into tmp.test_table1 select '60002','tina';
>  insert into tmp.test_table1 select '60003','kylin';
>  insert into tmp.test_table1 select '60004','jeny';
>  insert into tmp.test_table1 select '60005','john';
>  insert into tmp.test_table1 select '60006','jamse';
> insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','2';
>  insert into tmp.test_table2 select '60002','1';
>  insert into tmp.test_table2 select '60003','2';
>  insert into tmp.test_table2 select '60002','3';
> insert into tmp.test_table3 select '60001';
>  insert into tmp.test_table3 select '60002';
>  insert into tmp.test_table3 select '60003';
>  insert into tmp.test_table3 select '60004';
>  insert into tmp.test_table3 select '60005';
>  insert into tmp.test_table3 select '60007';
> insert into tmp.test_table4 select '60001','1';
>  insert into tmp.test_table4 select '60002','2';
>  insert into tmp.test_table4 select '60003','3';
>  insert into tmp.test_table4 select '60004','4';
>  insert into tmp.test_table4 select '60005','5';
>   
>  Here is my query SQL And shut down mapjoin:
> set hive.auto.convert.join=false;
> select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from 
> tmp.test_table1 t1
>  left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
>  and t2.acct_type='1'
>  left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
>  where t2.cust_no is null and t4.cust_level not in ('4','5') and exists 
> (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)
>  
> All I want is to include cust_no for 6003,But the result is inclusive 6004 
> and 6005,this wrong 。{color:#f79232}In my production environment, 6001 came 
> out。Loss of condition because cust_no is  null。{color}
> {color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') 
> condition missing{color}
> *{color:#f6c342}Explain:{color}*
> STAGE DEPENDENCIES:
>  Stage-1 is a root stage
>  Stage-2 depends on stages: Stage-1
>  Stage-0 depends on stages: Stage-2
> STAGE PLANS:
>  Stage: Stage-1
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: t1
>  Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: cust_no is not null (type: boolean)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  value expressions: name (type: string)
>  TableScan
>  alias: t2
>  Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  TableScan
>  alias: t4
>  Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE 

[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

2019-03-28 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated HIVE-21452:
--
Description: 
In our production environment, there are four tables to do association queries. 
There are exists in the conditions, and we found that the first two conditions 
were lost.

The following two conditions are missing:

{color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
  
 In the test environment, I prepared the data of the following four tables, and 
a condition was lost in the simulation.

tables:

test_table1
 cust_no,name
 60001,lisa
 60002,tina
 60003,kylin
 60004,jeny
 60005,john
 60006,jamse

test_table2
 cust_no,acct_type
 60001,1
 60001,1
 60001,2
 60002,1
 60003,2
 60003,3

test_table3
 cust_no
 60001
 60002
 60003
 60004
 60005
 60007

test_table4
 cust_no,cust_level
 60001,1
 60002,2
 60003,3
 60004,4
 60005,5

 

create table tmp.test_table1(cust_no string,name string);
 create table tmp.test_table2(cust_no string,acct_type string);
 create table tmp.test_table3(cust_no string);
 create table tmp.test_table4(cust_no string,cust_level string);

insert into tmp.test_table1 select '60001','lisa';
 insert into tmp.test_table1 select '60002','tina';
 insert into tmp.test_table1 select '60003','kylin';
 insert into tmp.test_table1 select '60004','jeny';
 insert into tmp.test_table1 select '60005','john';
 insert into tmp.test_table1 select '60006','jamse';

insert into tmp.test_table2 select '60001','1';
 insert into tmp.test_table2 select '60001','1';
 insert into tmp.test_table2 select '60001','2';
 insert into tmp.test_table2 select '60002','1';
 insert into tmp.test_table2 select '60003','2';
 insert into tmp.test_table2 select '60002','3';

insert into tmp.test_table3 select '60001';
 insert into tmp.test_table3 select '60002';
 insert into tmp.test_table3 select '60003';
 insert into tmp.test_table3 select '60004';
 insert into tmp.test_table3 select '60005';
 insert into tmp.test_table3 select '60007';

insert into tmp.test_table4 select '60001','1';
 insert into tmp.test_table4 select '60002','2';
 insert into tmp.test_table4 select '60003','3';
 insert into tmp.test_table4 select '60004','4';
 insert into tmp.test_table4 select '60005','5';
  
 Here is my query SQL And shut down mapjoin:

set hive.auto.convert.join=false;

select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from tmp.test_table1 
t1
 left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
 and t2.acct_type='1'
 left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
 where t2.cust_no is null and t4.cust_level not in ('4','5') and exists (select 
1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)

 

All I want is to include cust_no for 6003,But the result is inclusive 6004 and 
6005,this wrong 。{color:#f79232}In my production environment, 6001 came 
out。Loss of condition because cust_no is  null。{color}

{color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') 
condition missing{color}

*{color:#f6c342}Explain:{color}*

STAGE DEPENDENCIES:
 Stage-1 is a root stage
 Stage-2 depends on stages: Stage-1
 Stage-0 depends on stages: Stage-2

STAGE PLANS:
 Stage: Stage-1
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: cust_no is not null (type: boolean)
 Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
NONE
 value expressions: name (type: string)
 TableScan
 alias: t2
 Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
 Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: NONE
 TableScan
 alias: t4
 Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: cust_no is not null (type: boolean)
 Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: NONE
 Reduce Operator Tree:
 Join Operator
 condition map:
 Left Outer Join0 to 1
 Left Outer Join0 to 2
 keys:
 0 cust_no (type: string)
 1 cust_no (type: string)
 2 cust_no (type: string)
 outputColumnNames: _col0, _col1, _col5
 Statistics: Num rows: 13 

[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

2019-03-27 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated HIVE-21452:
--
Affects Version/s: 3.1.0
   2.3.4

> Loss of query condition when exist exists
> -
>
> Key: HIVE-21452
> URL: https://issues.apache.org/jira/browse/HIVE-21452
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.1.0, 2.3.4
>Reporter: zengxl
>Priority: Major
>
> In our production environment, there are four tables to do association 
> queries. There are exists in the conditions, and we found that the first two 
> conditions were lost.
> The following two conditions are missing:
> {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
>   
>  In the test environment, I prepared the data of the following four tables, 
> and a condition was lost in the simulation.
> tables:
> test_table1
>  cust_no,name
>  60001,lisa
>  60002,tina
>  60003,kylin
>  60004,jeny
>  60005,john
>  60006,jamse
> test_table2
>  cust_no,acct_type
>  60001,1
>  60001,1
>  60001,2
>  60002,1
>  60003,2
>  60003,3
> test_table3
>  cust_no
>  60001
>  60002
>  60003
>  60004
>  60005
>  60007
> test_table4
>  cust_no,cust_level
>  60001,1
>  60002,2
>  60003,3
>  60004,4
>  60005,5
>  
> create table tmp.test_table1(cust_no string,name string);
>  create table tmp.test_table2(cust_no string,acct_type string);
>  create table tmp.test_table3(cust_no string);
>  create table tmp.test_table4(cust_no string,cust_level string);
> insert into tmp.test_table1 select '60001','lisa';
>  insert into tmp.test_table1 select '60002','tina';
>  insert into tmp.test_table1 select '60003','kylin';
>  insert into tmp.test_table1 select '60004','jeny';
>  insert into tmp.test_table1 select '60005','john';
>  insert into tmp.test_table1 select '60006','jamse';
> insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','2';
>  insert into tmp.test_table2 select '60002','1';
>  insert into tmp.test_table2 select '60003','2';
>  insert into tmp.test_table2 select '60002','3';
> insert into tmp.test_table3 select '60001';
>  insert into tmp.test_table3 select '60002';
>  insert into tmp.test_table3 select '60003';
>  insert into tmp.test_table3 select '60004';
>  insert into tmp.test_table3 select '60005';
>  insert into tmp.test_table3 select '60007';
> insert into tmp.test_table4 select '60001','1';
>  insert into tmp.test_table4 select '60002','2';
>  insert into tmp.test_table4 select '60003','3';
>  insert into tmp.test_table4 select '60004','4';
>  insert into tmp.test_table4 select '60005','5';
>   
>  Here is my query SQL And shut down mapjoin:
> set hive.auto.convert.join=false;
> select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from 
> tmp.test_table1 t1
>  left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
>  and t2.acct_type='1'
>  left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
>  where t2.cust_no is null and t4.cust_level not in ('4','5') and exists 
> (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)
>  
> All I want is to include cust_no for 6003,But the result is inclusive 6004 
> and 6005,this wrong 。{color:#f79232}In my production environment, 6001 came 
> out。Loss of condition because cust_no is  null。{color}
> {color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') 
> condition missing{color}
> *{color:#f6c342}Explain:{color}*
> STAGE DEPENDENCIES:
>  Stage-1 is a root stage
>  Stage-2 depends on stages: Stage-1
>  Stage-0 depends on stages: Stage-2
> STAGE PLANS:
>  Stage: Stage-1
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: t1
>  Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: cust_no is not null (type: boolean)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  value expressions: name (type: string)
>  TableScan
>  alias: t2
>  Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  TableScan
>  alias: t4
>  Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column 

[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

2019-03-18 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated HIVE-21452:
--
Description: 
In our production environment, there are four tables to do association queries. 
There are exists in the conditions, and we found that the first two conditions 
were lost.

The following two conditions are missing:

{color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
  
 In the test environment, I prepared the data of the following four tables, and 
a condition was lost in the simulation.

tables:

test_table1
 cust_no,name
 60001,lisa
 60002,tina
 60003,kylin
 60004,jeny
 60005,john
 60006,jamse

test_table2
 cust_no,acct_type
 60001,1
 60001,1
 60001,2
 60002,1
 60003,2
 60003,3

test_table3
 cust_no
 60001
 60002
 60003
 60004
 60005
 60007

test_table4
 cust_no,cust_level
 60001,1
 60002,2
 60003,3
 60004,4
 60005,5

 

create table tmp.test_table1(cust_no string,name string);
 create table tmp.test_table2(cust_no string,acct_type string);
 create table tmp.test_table3(cust_no string);
 create table tmp.test_table4(cust_no string,cust_level string);

insert into tmp.test_table1 select '60001','lisa';
 insert into tmp.test_table1 select '60002','tina';
 insert into tmp.test_table1 select '60003','kylin';
 insert into tmp.test_table1 select '60004','jeny';
 insert into tmp.test_table1 select '60005','john';
 insert into tmp.test_table1 select '60006','jamse';

insert into tmp.test_table2 select '60001','1';
 insert into tmp.test_table2 select '60001','1';
 insert into tmp.test_table2 select '60001','2';
 insert into tmp.test_table2 select '60002','1';
 insert into tmp.test_table2 select '60003','2';
 insert into tmp.test_table2 select '60002','3';

insert into tmp.test_table3 select '60001';
 insert into tmp.test_table3 select '60002';
 insert into tmp.test_table3 select '60003';
 insert into tmp.test_table3 select '60004';
 insert into tmp.test_table3 select '60005';
 insert into tmp.test_table3 select '60007';

insert into tmp.test_table4 select '60001','1';
 insert into tmp.test_table4 select '60002','2';
 insert into tmp.test_table4 select '60003','3';
 insert into tmp.test_table4 select '60004','4';
 insert into tmp.test_table4 select '60005','5';
  
 Here is my query SQL And shut down mapjoin:

set hive.auto.convert.join=false;

select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from tmp.test_table1 
t1
 left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
 and t2.acct_type='1'
 left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
 where t2.cust_no is null and t4.cust_level not in ('4','5') and exists (select 
1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)

 

All I want is to include cust_no for 6003,But the result is inclusive 6004 and 
6005,this wrong 。{color:#f79232}In my production environment, 6001 came 
out。Loss of condition because cust_no is  null。{color}

{color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') 
condition missing{color}

*{color:#f6c342}Explain:{color}*

STAGE DEPENDENCIES:
 Stage-1 is a root stage
 Stage-2 depends on stages: Stage-1
 Stage-0 depends on stages: Stage-2

STAGE PLANS:
 Stage: Stage-1
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: cust_no is not null (type: boolean)
 Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
NONE
 value expressions: name (type: string)
 TableScan
 alias: t2
 Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
 Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: NONE
 TableScan
 alias: t4
 Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column stats: 
NONE
 Filter Operator
 predicate: cust_no is not null (type: boolean)
 Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: cust_no (type: string)
 sort order: +
 Map-reduce partition columns: cust_no (type: string)
 Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: NONE
 Reduce Operator Tree:
 Join Operator
 condition map:
 Left Outer Join0 to 1
 Left Outer Join0 to 2
 keys:
 0 cust_no (type: string)
 1 cust_no (type: string)
 2 cust_no (type: string)
 outputColumnNames: _col0, _col1, _col5
 Statistics: Num rows: 13