[ https://issues.apache.org/jira/browse/IMPALA-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer resolved IMPALA-9707.
-------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Parquet stat filtering issue when min/max values are cast to NULL
> -----------------------------------------------------------------
>
>                 Key: IMPALA-9707
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9707
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: correctness, parquet
>             Fix For: Impala 4.0
>
>
> This issue can occur if there is a cast during the evaluation of the min/max
> stats and the min or the max value is cast to NULL.
> Example:
> {code}
> create table ts (dt string) stored as parquet;
> insert into ts values ("2010-01-01"), ("non ts");
> set PARQUET_READ_STATISTICS=1;
> select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 0 rows
> set PARQUET_READ_STATISTICS=0;
> select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 1 row
> {code}
> The issue doesn't occur if "non ts" is not added to the table.
> I think the root cause is that cast(max_stat_for_dt as timestamp) >=
> cast("2010-01-01" as timestamp) is evaluated during stat filtering. As
> "non ts" is the largest STRING in the table, it becomes the max stat, and
> casting it to TIMESTAMP returns NULL. As >= with NULL always returns NULL,
> Impala will think that the row group cannot contain 2010-01-01 and skips it.
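
The suspected root cause can be modelled in a few lines of Python. This is only an illustrative sketch, not Impala's code: `cast_to_timestamp`, `sql_le`, `may_contain_buggy` and `may_contain_safe` are hypothetical names, the cast is approximated with a fixed `%Y-%m-%d` format, and the "safe" variant shows one conservative way to treat a NULL comparison, which is not necessarily the fix that went into Impala 4.0.

```python
from datetime import datetime
from typing import Optional


def cast_to_timestamp(s: str) -> Optional[datetime]:
    """Model a SQL cast that yields NULL (None) for unparseable strings."""
    try:
        return datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return None


def sql_le(a: Optional[datetime], b: Optional[datetime]) -> Optional[bool]:
    """SQL-style <=: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a <= b


def may_contain_buggy(min_stat: str, max_stat: str, literal: datetime) -> bool:
    """Keep the row group only if both comparisons are definitely TRUE.

    A NULL comparison result is treated like FALSE, so the row group is
    skipped even though nothing has been proven about its contents.
    """
    lo = sql_le(cast_to_timestamp(min_stat), literal)
    hi = sql_le(literal, cast_to_timestamp(max_stat))
    return lo is True and hi is True


def may_contain_safe(min_stat: str, max_stat: str, literal: datetime) -> bool:
    """Skip the row group only if a comparison is definitely FALSE."""
    lo = sql_le(cast_to_timestamp(min_stat), literal)
    hi = sql_le(literal, cast_to_timestamp(max_stat))
    return lo is not False and hi is not False


# The two STRING values from the example; min/max use string ordering.
rows = ["2010-01-01", "non ts"]
min_stat, max_stat = min(rows), max(rows)
literal = datetime(2010, 1, 1)

skipped_by_buggy_filter = not may_contain_buggy(min_stat, max_stat, literal)
kept_by_safe_filter = may_contain_safe(min_stat, max_stat, literal)
```

Here `max_stat` is "non ts", its cast is NULL, and the naive filter drops a row group that does contain the matching row, while the conservative variant keeps it and leaves the decision to the row-level predicate.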