[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17148:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vlad!

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Fix For: 3.0.0
>
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, 
> HIVE-17148.3.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-11 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.3.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, 
> HIVE-17148.3.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.2.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Open  (was: Patch Available)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-02 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Open  (was: Patch Available)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-02 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-02 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.1.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-24 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:{color:red}(a1 is not null and b1 is not null){color}
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:{color:red}(a1 is not null and b1 is not null){color}
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates whis problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRO

[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-24 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:{color:red}(a1 is not null and b1 is not null){color}
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> S

[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Environment: (was: {color:red}colored text{color})

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:{color:red}(a1 is not null and b1 is not null){color}
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type? otherwise HiveJoinAddNotRule which 
> creates whis problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:{color:red}(a1 is not null and b1 is not null){color}
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates whis problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is 
any column mentioned in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:{color:red}(a1 is not null and b1 is not null){color}
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type? otherwise HiveJoinAddNotRule which 
creates whis problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRO