Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@20
PS11, Line 20:     - from STRING to DATE. The string value must be formatted as
             :       yyyy-MM-dd.
This seems to be stricter than CAST() in postgres, mysql, and hive (see 
https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0
 row 31)


http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38
PS11, Line 38: E.g: year('2019-02-15') must resolve to
             :        year(TIMESTAMP) instead of year(DATE). Note, that 
year(DATE) is
             :        not implemented yet, so this is not an issue at the 
moment but
             :        it will be in the future.
> Not necessarily. The valid range for TIMESTAMP is from 1400-01-01 to 9999-1
Should we be emitting a WARNING for any out-of-range values? it seems 
surprising to silently emit NULL for out-of-range casts (I had a customer 
complain about this last week wrt timestamp)


http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java
File fe/src/main/java/org/apache/impala/catalog/Function.java:

http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191
PS11, Line 191:   public boolean compare(Function other, CompareMode mode) {
> I agree that the function overload resolution algorithm needs to be redesig
I'm afraid about introducing new semantics here which we'll need to break again 
later. I did some testing on your patch relative to Hive 2, Hive 3, postgres, 
and MariaDB, and the results were pretty interesting:

https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0

It seems like at least in the cases where Hive3 and Postgres agree, we should 
probably be giving the same results as well? It certainly seems like Impala is 
giving NULL in some places I wouldn't have expected.

Maybe we can add a few more queries here, and a column for Impala (prior to the 
patch) to make sure we aren't introducing any backward-incompatible behavior?

In other words, I think the priorities should be:

1) don't break existing queries
2) for queries where postgres and Hive3 agree, we should give the same result
3) for queries where postgres and Hive differ, we should probably go with Hive3 
or collaborate with the Hive team to see if they think this is a bug

In the case that this would require a big restructuring, I'd prefer if in the 
meantime we result these queries in an error like "ambiguous types detected, 
insert casts to resolve". Later, if we can figure out correct implicit casting 
rules, we can loosen it. But I'm against introducing new behavior which we 
already know that we'll have to break.



--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 02 Apr 2019 19:35:01 +0000
Gerrit-HasComments: Yes

Reply via email to