Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 )
Change subject: IMPALA-7368: Add initial support for DATE type ...................................................................... Patch Set 14: (5 comments) Looking pretty close to good. I just have some questions digging into the function resolution stuff. I don't have a really specific example of something I think is wrong, but I want to make sure we're pretty clear on what our function resolution policy is and that we think it won't have any unexpected user behavior. I'll spend some more time on that tomorrow http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@52 PS14, Line 52: public List<PartitionKeyValue> getPartitionSpecKeyValues() { return partitionSpec_; } Do we ever expect this to get called prior to analysis? Maybe it makes sense to add a bool analyzed_ and Preconditions.checkState(analyzed_); here to ensure that we don't ever return the pre-analysis KVs? This makes sure that we don't expose out the partitionSpec_ before it becomes immutable. http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@156 PS14, Line 156: public List<TPartitionKeyValue> toThrift() { same (Precondition on analysis if possible) http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: public boolean compare(Function other, CompareMode mode) { > I took a quick look at the SQL ANSI standard, but couldn't find anything ab hm, I spent some time researching this across multiple databases: DB2's logic is very clearly described: https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_determinbestfitfunction.html (it seems to go left-to-right) Postgres's is also pretty well described and seems relatively similar to what's done here: https://www.postgresql.org/docs/10/typeconv-func.html Hive's also seems to bail earlier with an "ambiguous" error when doing method resolution for UDFs: "Closest match is defined as the one that requires the least number of arguments to be converted. In case more than one matches are found, the method throws an ambiguous method exception" Comparing Postgres to what's implemented here, it seems like our two gaps are: 1) they have a well-defined concept of "preferred type". I checked the pg catalog and for the 'date/time' category, timestamptz is the 'preferred type': => select oid,typname, typispreferred from pg_type where typcategory = 'D'; oid | typname | typispreferred -------+-------------+---------------- 702 | abstime | f 1082 | date | f 1083 | time | f 1114 | timestamp | f 1184 | timestamptz | t 1266 | timetz | f 12401 | time_stamp | f (7 rows) It seems that, because it only allows implicit conversion to "preferred" types, it's somewhat more strict than what we do today, though. 2) they have a concept of 'unknown' type to represent string literals in queries as being distinct from an expression with a defined type (eg any other expression than a string literal). I wonder if we should be treating StringLiteral casts differently than other exprs? http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@236 PS11, Line 236: // Check trailing varargs. I think we could simplify this code a lot by adding some utility function like: Type[] tryExtendArgsToLength(int numArgs) { if (!hasVarArgs_ || argTypes_.length <= numArgs) return argTypes_; Type[] ret = Arrays.copyOf(argTypes_, numArgs); for (int i = argTypes_.length; i < numArgs; i++) { ret[i] = getVarArgsType(); } return ret; } then in these comparison functions, we can extend the varargs to match the user-provided call types and do the rest of the logic on fixed-length argument lists, without having to duplicate code between L227-234 and L239-247. http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java File fe/src/main/java/org/apache/impala/util/FunctionUtils.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@243 PS11, Line 243: return max_func; it's interesting that our function resolution returns the first fit, from the order returned by the catalog. Best I can tell, we never will fail with an "ambiguous resolution" error. Now that we have both DATE and TIMESTAMP (and assumedly we'll add DATETIME at some point?) is there really no case that we want to force an explicit match by failing the user query when ambiguous? -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 26 Mar 2019 00:21:02 +0000 Gerrit-HasComments: Yes