subject:"\[jira\] \[Commented\] \(HIVE\-24957\) Wrong results when subquery has COALESCE in correlation predicate"

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-01 Thread Stamatis Zampetakis (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313012#comment-17313012
 ] 

Stamatis Zampetakis commented on HIVE-24957:


{quote}it seems to me that we could avoid this issue by materializing the 
coalesced value before the correlated subq:
{quote}
[~kgyrtkirk]  This is what I thought as well :) I am building a prototype along 
this idea.

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set and it shouldn't since with the 
> application of COALESCE operator it should match with the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-01 Thread Zoltan Haindrich (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312977#comment-17312977
 ] 

Zoltan Haindrich commented on HIVE-24957:
-

I was wondering if changing the semijoin compare conditional to {{<=>}} (is not 
distinct) would help in this case or not - not sure if we can do I saw some 
null related comments in the decorrelator...

it seems to me that we could avoid this issue by materializing the coalesced 
value before the correlated subq:
{code}
with 
b as (select *,coalesce(b_authorkey, 300) as bb from book)
select b.b_title
from b
where exists
  (select a_authorkey
   from author a
   where bb = a.a_authorkey);
{code}

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set and it shouldn't since with the 
> application of COALESCE operator it should match with the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-03-30 Thread Stamatis Zampetakis (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311521#comment-17311521
 ] 

Stamatis Zampetakis commented on HIVE-24957:


The problem lies in the query plan and more specifically in the 
{{HiveRelDecorrelator}}. 
{noformat}
2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
  HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
  HiveTableScan(table=[[default, author]], table:alias=[a])

2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
  HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
  HiveProject(_o__c0=[1], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], 
joinType=[inner], algorithm=[none], cost=[not available])
  HiveTableScan(table=[[default, author]], table:alias=[a])
  HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])
{noformat}
The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that 
books with NULL {{b_authorkey}} are removed from the result set. 

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set and it shouldn't since with the 
> application of COALESCE operator it should match with the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

3 matches

Site Navigation

Mail list logo

Footer information