[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313012#comment-17313012 ] Stamatis Zampetakis commented on HIVE-24957: {quote}it seems to me that we could avoid this issue by materializing the coalesced value before the correlated subq: {quote} [~kgyrtkirk] This is what I thought as well :) I am building a prototype along this idea. > Wrong results when subquery has COALESCE in correlation predicate > - > > Key: HIVE-24957 > URL: https://issues.apache.org/jira/browse/HIVE-24957 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > Consider the following example: > {code:sql} > create table author ( > a_authorkey int, > a_name varchar(50)); > create table book ( > b_bookkey int, > b_title varchar(50), > b_authorkey int); > insert into author values (10, 'Victor Hugo'); > insert into author values (20, 'Alexandre Dumas'); > insert into author values (300, 'UNKNOWN'); > insert into book values (1, 'Les Miserables', 10); > insert into book values (2, 'The Count of Monte Cristo', 20); > insert into book values (3, 'Men Without Women', 30); > insert into book values (4, 'Odyssey', null); > select b.b_title > from book b > where exists > (select a_authorkey >from author a >where coalesce(b.b_authorkey, 300) = a.a_authorkey); > {code} > *Expected results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > |Odyssey| > *Actual results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > {{Odyssey}} is missing from the result set and it shouldn't since with the > application of COALESCE operator it should match with the UNKNOWN author. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312977#comment-17312977 ] Zoltan Haindrich commented on HIVE-24957: - I was wondering if changing the semijoin compare conditional to {{<=>}} (is not distinct) would help in this case or not - not sure if we can do I saw some null related comments in the decorrelator... it seems to me that we could avoid this issue by materializing the coalesced value before the correlated subq: {code} with b as (select *,coalesce(b_authorkey, 300) as bb from book) select b.b_title from b where exists (select a_authorkey from author a where bb = a.a_authorkey); {code} > Wrong results when subquery has COALESCE in correlation predicate > - > > Key: HIVE-24957 > URL: https://issues.apache.org/jira/browse/HIVE-24957 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > Consider the following example: > {code:sql} > create table author ( > a_authorkey int, > a_name varchar(50)); > create table book ( > b_bookkey int, > b_title varchar(50), > b_authorkey int); > insert into author values (10, 'Victor Hugo'); > insert into author values (20, 'Alexandre Dumas'); > insert into author values (300, 'UNKNOWN'); > insert into book values (1, 'Les Miserables', 10); > insert into book values (2, 'The Count of Monte Cristo', 20); > insert into book values (3, 'Men Without Women', 30); > insert into book values (4, 'Odyssey', null); > select b.b_title > from book b > where exists > (select a_authorkey >from author a >where coalesce(b.b_authorkey, 300) = a.a_authorkey); > {code} > *Expected results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > |Odyssey| > *Actual results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > {{Odyssey}} is missing from the result set and it shouldn't since with the > application of COALESCE operator it should match with the UNKNOWN author. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311521#comment-17311521 ] Stamatis Zampetakis commented on HIVE-24957: The problem lies in the query plan and more specifically in the {{HiveRelDecorrelator}}. {noformat} 2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] rules.RelFieldTrimmer: Plan after trimming unused fields HiveProject(b_title=[$0]) HiveFilter(condition=[EXISTS({ HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) })]) HiveProject(b_title=[$1], b_authorkey=[$2]) HiveTableScan(table=[[default, book]], table:alias=[b]) 2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan before removing subquery: HiveProject(b_title=[$1]) HiveFilter(condition=[EXISTS({ HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) })]) HiveTableScan(table=[[default, book]], table:alias=[b]) 2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan just after removing subquery: HiveProject(b_title=[$1]) LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}]) HiveTableScan(table=[[default, book]], table:alias=[b]) HiveProject(literalTrue=[true]) HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) 2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan after decorrelation: HiveProject(b_title=[$1]) HiveSemiJoin(condition=[=($8, $2)], joinType=[semi]) HiveTableScan(table=[[default, book]], table:alias=[b]) HiveProject(literalTrue=[true], b_authorkey=[$1]) HiveProject(_o__c0=[1], b_authorkey=[$6]) HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, author]], table:alias=[a]) HiveAggregate(group=[{0}]) HiveProject(b_authorkey=[$2]) HiveTableScan(table=[[default, book]], table:alias=[b]) {noformat} The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that books with NULL {{b_authorkey}} are removed from the result set. > Wrong results when subquery has COALESCE in correlation predicate > - > > Key: HIVE-24957 > URL: https://issues.apache.org/jira/browse/HIVE-24957 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > Consider the following example: > {code:sql} > create table author ( > a_authorkey int, > a_name varchar(50)); > create table book ( > b_bookkey int, > b_title varchar(50), > b_authorkey int); > insert into author values (10, 'Victor Hugo'); > insert into author values (20, 'Alexandre Dumas'); > insert into author values (300, 'UNKNOWN'); > insert into book values (1, 'Les Miserables', 10); > insert into book values (2, 'The Count of Monte Cristo', 20); > insert into book values (3, 'Men Without Women', 30); > insert into book values (4, 'Odyssey', null); > select b.b_title > from book b > where exists > (select a_authorkey >from author a >where coalesce(b.b_authorkey, 300) = a.a_authorkey); > {code} > *Expected results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > |Odyssey| > *Actual results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > {{Odyssey}} is missing from the result set and it shouldn't since with the > application of COALESCE operator it should match with the UNKNOWN author. -- This message was sent by Atlassian Jira (v8.3.4#803005)