Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 )
Change subject: IMPALA-7368: Add initial support for DATE type ...................................................................... Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@20 PS11, Line 20: - from STRING to DATE. The string value must be formatted as : yyyy-MM-dd. This seems to be stricter than CAST() in postgres, mysql, and hive (see https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0 row 31) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38 PS11, Line 38: E.g: year('2019-02-15') must resolve to : year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is : not implemented yet, so this is not an issue at the moment but : it will be in the future. > Not necessarily. The valid range for TIMESTAMP is from 1400-01-01 to 9999-1 Should we be emitting a WARNING for any out-of-range values? it seems surprising to silently emit NULL for out-of-range casts (I had a customer complain about this last week wrt timestamp) http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: public boolean compare(Function other, CompareMode mode) { > I agree that the function overload resolution algorithm needs to be redesig I'm afraid about introducing new semantics here which we'll need to break again later. I did some testing on your patch relative to Hive 2, Hive 3, postgres, and MariaDB, and the results were pretty interesting: https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0 It seems like at least in the cases where Hive3 and Postgres agree, we should probably be giving the same results as well? It certainly seems like Impala is giving NULL in some places I wouldn't have expected. Maybe we can add a few more queries here, and a column for Impala (prior to the patch) to make sure we aren't introducing any backward-incompatible behavior? In other words, I think the priorities should be: 1) don't break existing queries 2) for queries where postgres and Hive3 agree, we should give the same result 3) for queries where postgres and Hive differ, we should probably go with Hive3 or collaborate with the Hive team to see if they think this is a bug In the case that this would require a big restructuring, I'd prefer if in the meantime we result these queries in an error like "ambiguous types detected, insert casts to resolve". Later, if we can figure out correct implicit casting rules, we can loosen it. But I'm against introducing new behavior which we already know that we'll have to break. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 02 Apr 2019 19:35:01 +0000 Gerrit-HasComments: Yes
