[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24902:
--
Labels: pull-request-available  (was: )

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following sql returns only one record (20210308) but expected two(20210308
> 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>   then  
>   cast (from_unixtime(unix_timestamp(cast(20210309 as 
> string),'MMdd') - 86400,'MMdd') as bigint)
>   else 
>   20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> The query plan has incorrect predict: 
>  predicate: COALESCE((col0 = 1),false) (type: boolean)
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-19 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24902:
-
Description: 
The following sql returns only one record (20210308) but expected two(20210308
20210309).

{code:sql}
select * from (
select 
case when b.a=1
then  
cast (from_unixtime(unix_timestamp(cast(20210309 as 
string),'MMdd') - 86400,'MMdd') as bigint)
else 
20210309 
   end 
as col
from 
(select stack(2,1,2) as (a))
 as b
) t 
where t.col is not null;
{code}

The query plan has incorrect predict: 
 predicate: COALESCE((col0 = 1),false) (type: boolean)
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
stats: COMPLETE
  Select Operator
expressions: 2 (type: int), 1 (type: int), 2 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column 
stats: COMPLETE
UDTF Operator
  Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
  function name: stack
  Filter Operator
predicate: COALESCE((col0 = 1),false) (type: boolean)
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
(20210309L) END (type: bigint)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)

{code}


  was:
The following sql returns only one record (20210308)but we expect two(20210308
20210309).

{code:sql}
select * from (
select 
case when b.a=1
   then  
cast 
(from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 
86400,'MMdd') as bigint)
  else 
  20210309 
   end 
as col
from 
(select stack(2,1,2) as (a))
 as b
) t 
where t.col is not null;
{code}

After debuging, i find the ReduceExpressionsRule changes expression in the 
wrong way.
Original expression:

{code:sql}
IS NOT NULL(CASE(=($0, 1), 
CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 
20210309))
{code}

After reducing expressions:
{code:sql}
CASE(=($0, 1), IS NOT 
NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT), true)
{code}

The query plan in main branch:
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
stats: COMPLETE
  Select Operator
expressions: 2 (type: int), 1 (type: int), 2 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column 
stats: COMPLETE
UDTF Operator
  Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
  function name: stack
  Filter Operator
predicate: COALESCE((col0 = 1),false) (type: boolean)
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
(20210309L) END (type: bigint)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)

{code}



> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>

[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-19 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24902:
-
Summary: Incorrect result after fold CASE into COALESCE  (was: Incorrect 
result after fold CASE into NVL)

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> The following sql returns only one record (20210308)but we expect two(20210308
> 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>  then  
>   cast 
> (from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 
> 86400,'MMdd') as bigint)
> else 
> 20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> After debuging, i find the ReduceExpressionsRule changes expression in the 
> wrong way.
> Original expression:
> {code:sql}
> IS NOT NULL(CASE(=($0, 1), 
> CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 
> 20210309))
> {code}
> After reducing expressions:
> {code:sql}
> CASE(=($0, 1), IS NOT 
> NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT), 
> true)
> {code}
> The query plan in main branch:
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)