[ 
https://issues.apache.org/jira/browse/IMPALA-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-9707.
-------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Parquet stat filtering issue when min/max values are cast to NULL
> -----------------------------------------------------------------
>
>                 Key: IMPALA-9707
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9707
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: correctness, parquet
>             Fix For: Impala 4.0
>
>
> This issue can occur if there is a cast during the evaluation of the min/max 
> stats and the min or the max value is cast to NULL.
> Example:
> {code}
> create table ts (dt string) stored as parquet;
> insert into ts values ("2010-01-01"), ("non ts");
> set PARQUET_READ_STATISTICS=1;
> select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 0 rows
> set PARQUET_READ_STATISTICS=0;
> select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 1 row
> {code}
> The issue doesn't occur if "non ts" is not added to the table.
> I think the root cause is that cast(max_stat_for_dt as timestamp) >= 
> cast("2010-01-01" as timestamp) is evaluated during stat filtering, and as 
> "non ts" is the biggest STRING in the table, we'll cast it to TIMESTAMP, 
> which returns NULL. As >= with NULL always returns NULL, Impala will think 
> that the row group doesn't contain values >= 2010-01-01.
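
The root-cause reasoning above can be modeled outside Impala. The following is a minimal Python sketch, not Impala's actual code: `cast_to_timestamp`, `sql_le`, `sql_and`, `may_contain_naive` and `may_contain_safe` are hypothetical names, `None` stands in for SQL NULL, and the "safe" variant only illustrates one conservative option (keep the row group whenever a cast statistic is NULL). It is not claimed to be the fix that went into Impala 4.0.

```python
from datetime import datetime
from typing import Optional


def cast_to_timestamp(s: Optional[str]) -> Optional[datetime]:
    """Model of CAST(string AS TIMESTAMP): unparsable input yields NULL (None)."""
    if s is None:
        return None
    try:
        return datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return None


def sql_le(a, b) -> Optional[bool]:
    """SQL-style a <= b: any NULL operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a <= b


def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE wins, otherwise NULL wins, otherwise TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def may_contain_naive(min_stat: str, max_stat: str, literal: str) -> bool:
    """Keep the row group only if min <= literal AND literal <= max is TRUE.

    A NULL result is treated like FALSE, which is the behaviour the issue
    describes: the row group is skipped.
    """
    lit = cast_to_timestamp(literal)
    lo = cast_to_timestamp(min_stat)
    hi = cast_to_timestamp(max_stat)
    return sql_and(sql_le(lo, lit), sql_le(lit, hi)) is True


def may_contain_safe(min_stat: str, max_stat: str, literal: str) -> bool:
    """Hypothetical conservative variant: a NULL statistic proves nothing,
    so the row group is kept and the rows are checked one by one."""
    lit = cast_to_timestamp(literal)
    lo = cast_to_timestamp(min_stat)
    hi = cast_to_timestamp(max_stat)
    if lo is None or hi is None:
        return True
    return sql_and(sql_le(lo, lit), sql_le(lit, hi)) is True


# String min/max of the example table: "2010-01-01" sorts before "non ts".
print(may_contain_naive("2010-01-01", "non ts", "2010-01-01"))  # row group skipped
print(may_contain_safe("2010-01-01", "non ts", "2010-01-01"))   # row group kept
```

With the string statistics of the example table, the naive check drops the row group that holds the matching row, while the conservative variant keeps it. When both statistics cast cleanly, the two checks agree.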



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

