[jira] [Resolved] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27537. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, > HIVE-21498 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-27537: Affects Version/s: 3.1.3 (was: 3.2.0) > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, > HIVE-21498 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.1.3 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj updated HIVE-27537: Summary: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498 (was: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498 to branch-3) > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, > HIVE-21498 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-27537: Summary: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, HIVE-21498 to branch-3 (was: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833 to branch-3) > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833, > HIVE-21498 to branch-3 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj updated HIVE-27537: Summary: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453 (was: Backport of HIVE-19416 to branch-3) > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27537) Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-27537: Summary: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833 to branch-3 (was: Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453) > Backport of HIVE-19416, HIVE-20293, HIVE-20117, HIVE-18453, HIVE-20833 to > branch-3 > -- > > Key: HIVE-27537 > URL: https://issues.apache.org/jira/browse/HIVE-27537 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765949#comment-17765949 ] Diksha commented on HIVE-27324: --- {quote}!sql_queries.png! {quote} > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > Attachments: sql_queries.png > > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > --- > > Then executed the below query: > -- > select * from t3 > where age not in (select distinct(age) age from t1); > -- > > The result should be as below: > {code:java} > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > | 9 | ron | 3 | > | 10 | Sam | 22 | > | 11 | nick | 19 | > | 12 | fed | 18 | > | 13 | kong | 13 | > ++--+-+ > 5 rows selected (35.897 seconds) {code} > But when we run the above query it is giving zero records: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t1); > INFO : Compiling > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): > select * from t3 > where age not in (select distinct(age) age from t1) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > . > INFO : Completed executing > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); > Time taken: 10.191 seconds > INFO : OK > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > ++--+-+ > No rows selected (12.17 seconds) {code} > The query works fine when we use nvl function or not null condition. > > So as a workaround we can use nvl function for both main and sub query as > below: > {code:java} > select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age > from t1); {code} > > *SECOND ISSUE:* > Also while testing multiple scenario's i found one more issue as well. > When the sub query table (t1) doesn't contain any null values then the query > is giving result but it is ignoring the null values of the main table(t3) . > > For example: Created another table t4 and inserted the data's without any > null values: > create table t4 (id int,name string, age int); > insert into t4 > values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > Now i tested with the below query and it gives 5 records. The count should be > six and it omitted the null value of the table t3: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t4); > INFO : Compiling > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): > select * from t3 > where age not in (select distinct(age) age from t4) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > .. > INFO : Completed executing > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a11
[jira] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324 ] Diksha deleted comment on HIVE-27324: --- was (Author: JIRAUSER298900): {quote}!sql_queries.png! {quote} > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > Attachments: sql_queries.png > > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > --- > > Then executed the below query: > -- > select * from t3 > where age not in (select distinct(age) age from t1); > -- > > The result should be as below: > {code:java} > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > | 9 | ron | 3 | > | 10 | Sam | 22 | > | 11 | nick | 19 | > | 12 | fed | 18 | > | 13 | kong | 13 | > ++--+-+ > 5 rows selected (35.897 seconds) {code} > But when we run the above query it is giving zero records: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t1); > INFO : Compiling > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): > select * from t3 > where age not in (select distinct(age) age from t1) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > . > INFO : Completed executing > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); > Time taken: 10.191 seconds > INFO : OK > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > ++--+-+ > No rows selected (12.17 seconds) {code} > The query works fine when we use nvl function or not null condition. > > So as a workaround we can use nvl function for both main and sub query as > below: > {code:java} > select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age > from t1); {code} > > *SECOND ISSUE:* > Also while testing multiple scenario's i found one more issue as well. > When the sub query table (t1) doesn't contain any null values then the query > is giving result but it is ignoring the null values of the main table(t3) . > > For example: Created another table t4 and inserted the data's without any > null values: > create table t4 (id int,name string, age int); > insert into t4 > values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > Now i tested with the below query and it gives 5 records. The count should be > six and it omitted the null value of the table t3: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t4); > INFO : Compiling > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): > select * from t3 > where age not in (select distinct(age) age from t4) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > .. > INFO : Completed executing > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); > Time taken: 17.724 seconds > INFO : OK > ++--
[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diksha updated HIVE-27324: -- Attachment: sql_queries.png > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > Attachments: sql_queries.png > > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > --- > > Then executed the below query: > -- > select * from t3 > where age not in (select distinct(age) age from t1); > -- > > The result should be as below: > {code:java} > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > | 9 | ron | 3 | > | 10 | Sam | 22 | > | 11 | nick | 19 | > | 12 | fed | 18 | > | 13 | kong | 13 | > ++--+-+ > 5 rows selected (35.897 seconds) {code} > But when we run the above query it is giving zero records: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t1); > INFO : Compiling > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): > select * from t3 > where age not in (select distinct(age) age from t1) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > . > INFO : Completed executing > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); > Time taken: 10.191 seconds > INFO : OK > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > ++--+-+ > No rows selected (12.17 seconds) {code} > The query works fine when we use nvl function or not null condition. > > So as a workaround we can use nvl function for both main and sub query as > below: > {code:java} > select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age > from t1); {code} > > *SECOND ISSUE:* > Also while testing multiple scenario's i found one more issue as well. > When the sub query table (t1) doesn't contain any null values then the query > is giving result but it is ignoring the null values of the main table(t3) . > > For example: Created another table t4 and inserted the data's without any > null values: > create table t4 (id int,name string, age int); > insert into t4 > values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > Now i tested with the below query and it gives 5 records. The count should be > six and it omitted the null value of the table t3: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t4); > INFO : Compiling > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): > select * from t3 > where age not in (select distinct(age) age from t4) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > .. > INFO : Completed executing > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); > Time taken: 17.724 seconds > INFO : OK > +---
[jira] [Comment Edited] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765948#comment-17765948 ] Diksha edited comment on HIVE-27324 at 9/16/23 11:03 AM: - sql queries ran in online sql compiler: create table t3 (id int,name varchar(255), age int); insert into t3 values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); create table t4 (id int,name varchar(255), age int); insert into t4 values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); create table t5 (id int,name varchar(255), ages int); insert into t5 values(1,'Sagar',23),(3,'Surya',NULL),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); select * from t3 where age in (select distinct(age) age from t4) order by age ; select * from t3 where age not in (select distinct(age) age from t4 ) order by age ; select * from t3 where age not in (select distinct(ages) ages from t5 where t5.ages is not null) order by age ; select count(*) from t3 where age not in (select distinct(ages) ages from t5 ) order by age ; select count(*) from t3 where age not in (23,22, null ); select count(*) from t3 where age not in (select distinct(age)age from t3 t1 where t1.age > 10); Outputs: 6|Ramya|5 1|Sagar|23 3|Surya|23 5|Scott|23 7||23 8||23 4|Raman|45 14|hela|45 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 0 0 2 [Execution complete with exit code 0] was (Author: JIRAUSER298900): sql queries ran in online sql compiler: create table t3 (id int,name varchar(255), age int); insert into t3 values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); create table t4 (id int,name varchar(255), age int); insert into t4 values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); create table t5 (id int,name varchar(255), ages int); insert into t5 values(1,'Sagar',23),(3,'Surya',NULL),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); select * from t3 where age in (select distinct(age) age from t4) order by age ; select * from t3 where age not in (select distinct(age) age from t4 ) order by age ; select * from t3 where age not in (select distinct(ages) ages from t5 where t5.ages is not null) order by age ; select count(*) from t3 where age not in (select distinct(ages) ages from t5 ) order by age ; select count(*) from t3 where age not in (23,22, null ); select count(*) from t3 where age not in (select distinct(age)age from t3 t1 where t1.age > 10); Outputs: 6|Ramya|5 1|Sagar|23 3|Surya|23 5|Scott|23 7||23 8||23 4|Raman|45 14|hela|45 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 0 0 2 [Execution complete with exit code 0] > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > ---
[jira] [Commented] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765948#comment-17765948 ] Diksha commented on HIVE-27324: --- sql queries ran in online sql compiler: create table t3 (id int,name varchar(255), age int); insert into t3 values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); create table t4 (id int,name varchar(255), age int); insert into t4 values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); create table t5 (id int,name varchar(255), ages int); insert into t5 values(1,'Sagar',23),(3,'Surya',NULL),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); select * from t3 where age in (select distinct(age) age from t4) order by age ; select * from t3 where age not in (select distinct(age) age from t4 ) order by age ; select * from t3 where age not in (select distinct(ages) ages from t5 where t5.ages is not null) order by age ; select count(*) from t3 where age not in (select distinct(ages) ages from t5 ) order by age ; select count(*) from t3 where age not in (23,22, null ); select count(*) from t3 where age not in (select distinct(age)age from t3 t1 where t1.age > 10); Outputs: 6|Ramya|5 1|Sagar|23 3|Surya|23 5|Scott|23 7||23 8||23 4|Raman|45 14|hela|45 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 9|ron|3 13|kong|13 12|fed|18 11|nick|19 10|Sam|22 0 0 2 [Execution complete with exit code 0] > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > --- > > Then executed the below query: > -- > select * from t3 > where age not in (select distinct(age) age from t1); > -- > > The result should be as below: > {code:java} > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > | 9 | ron | 3 | > | 10 | Sam | 22 | > | 11 | nick | 19 | > | 12 | fed | 18 | > | 13 | kong | 13 | > ++--+-+ > 5 rows selected (35.897 seconds) {code} > But when we run the above query it is giving zero records: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t1); > INFO : Compiling > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): > select * from t3 > where age not in (select distinct(age) age from t1) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > . > INFO : Completed executing > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); > Time taken: 10.191 seconds > INFO : OK > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > ++--+-+ > No rows selected (12.17 seconds) {code} > The query works fine when we use nvl function or not null condition. > > So as a workaround we can use nvl function for both main a
[jira] [Updated] (HIVE-27696) Docker build from source should include iceberg profile
[ https://issues.apache.org/jira/browse/HIVE-27696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27696: -- Labels: pull-request-available (was: ) > Docker build from source should include iceberg profile > --- > > Key: HIVE-27696 > URL: https://issues.apache.org/jira/browse/HIVE-27696 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > Building docker image from source doesn't include iceberg profile. So, > creating iceberg tables by images built by it fails > {noformat} > 0: jdbc:hive2://localhost:1/> CREATE TABLE test (ID INT) STORED BY > ICEBERG TBLPROPERTIES('format-version'='2'); > Error: Error while compiling statement: FAILED: SemanticException Cannot find > class 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > (state=42000,code=4){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27696) Docker build from source should include iceberg profile
Ayush Saxena created HIVE-27696: --- Summary: Docker build from source should include iceberg profile Key: HIVE-27696 URL: https://issues.apache.org/jira/browse/HIVE-27696 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena Assignee: Ayush Saxena Building docker image from source doesn't include iceberg profile. So, creating iceberg tables by images built by it fails {noformat} 0: jdbc:hive2://localhost:1/> CREATE TABLE test (ID INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2'); Error: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' (state=42000,code=4){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27644) Backport of HIVE-17917, HIVE-21457, HIVE-22582
[ https://issues.apache.org/jira/browse/HIVE-27644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27644. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-17917, HIVE-21457, HIVE-22582 > -- > > Key: HIVE-27644 > URL: https://issues.apache.org/jira/browse/HIVE-27644 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27644) Backport of HIVE-17917, HIVE-21457, HIVE-22582
[ https://issues.apache.org/jira/browse/HIVE-27644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-27644: Affects Version/s: 3.1.3 (was: 3.2.0) > Backport of HIVE-17917, HIVE-21457, HIVE-22582 > -- > > Key: HIVE-27644 > URL: https://issues.apache.org/jira/browse/HIVE-27644 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.1.3 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.
[ https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765925#comment-17765925 ] Sankar Hariappan commented on HIVE-27324: - [~diksha193] Pls attach the mysql output for queries in discussion. > Hive query with NOT IN condition is giving incorrect results when the sub > query table contains the null value. > -- > > Key: HIVE-27324 > URL: https://issues.apache.org/jira/browse/HIVE-27324 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: All Versions, 3.1.0 >Reporter: Shobika Selvaraj >Assignee: Diksha >Priority: Major > Labels: pull-request-available > > Hive query giving empty results when the sub query table contains the null > value. > We encountered two issues here. > 1) The query - "select * from t3 where age not in (select distinct(age) age > from t1);" is giving empty results when the table t1 contains a null value. > Disabling cbo didn't helped here. > 2) Let's consider the table t3 has null value and table t1 doesn't have any > null values. Now if we run the above query it is returning other data's but > not the null value from t3. If we disable the cbo then it's giving result > with null value. > > *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:* > *FIRST ISSUE:* > Create two tables and insert data with null values as below: > --- > create table t3 (id int,name string, age int); > > insert into t3 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45); > > create table t1 (id int,name string, age int); > insert into t1 > values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > --- > > Then executed the below query: > -- > select * from t3 > where age not in (select distinct(age) age from t1); > -- > > The result should be as below: > {code:java} > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > | 9 | ron | 3 | > | 10 | Sam | 22 | > | 11 | nick | 19 | > | 12 | fed | 18 | > | 13 | kong | 13 | > ++--+-+ > 5 rows selected (35.897 seconds) {code} > But when we run the above query it is giving zero records: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t1); > INFO : Compiling > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): > select * from t3 > where age not in (select distinct(age) age from t1) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > . > INFO : Completed executing > command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); > Time taken: 10.191 seconds > INFO : OK > ++--+-+ > | t3.id | t3.name | t3.age | > ++--+-+ > ++--+-+ > No rows selected (12.17 seconds) {code} > The query works fine when we use nvl function or not null condition. > > So as a workaround we can use nvl function for both main and sub query as > below: > {code:java} > select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age > from t1); {code} > > *SECOND ISSUE:* > Also while testing multiple scenario's i found one more issue as well. > When the sub query table (t1) doesn't contain any null values then the query > is giving result but it is ignoring the null values of the main table(t3) . > > For example: Created another table t4 and inserted the data's without any > null values: > create table t4 (id int,name string, age int); > insert into t4 > values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23); > Now i tested with the below query and it gives 5 records. The count should be > six and it omitted the null value of the table t3: > {code:java} > 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3 > . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select > distinct(age) age from t4); > INFO : Compiling > command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): > select * from t3 > where age not in (select distinct(age) age from t4) > INFO : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross > product > .. > .. > INFO : Completed executing > command(queryId=hive_20230427164745_f20f47ce-614d-4
[jira] [Updated] (HIVE-26751) Bug Fixes and Improvements for 3.2.0 release
[ https://issues.apache.org/jira/browse/HIVE-26751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-26751: Issue Type: Improvement (was: Task) > Bug Fixes and Improvements for 3.2.0 release > > > Key: HIVE-26751 > URL: https://issues.apache.org/jira/browse/HIVE-26751 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.3 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Blocker > Labels: hive-3.2.0-must, pull-request-available, release-3.2.0 > > Creating subtask to track the bug fixes that will go as part of 3.2.0 release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27212) Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-27212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27212. - Fix Version/s: 3.2.0 Assignee: Diksha Resolution: Duplicate Duplicate with https://issues.apache.org/jira/browse/HIVE-27058 > Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-27212 > URL: https://issues.apache.org/jira/browse/HIVE-27212 > Project: Hive > Issue Type: Sub-task >Reporter: Diksha >Assignee: Diksha >Priority: Major > Fix For: 3.2.0 > > > Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)