Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
......................................................................


Patch Set 14:

(5 comments)

Looking pretty close to good. I just have some questions digging into the 
function resolution stuff. I don't have a really specific example of something 
I think is wrong, but I want to make sure we're pretty clear on what our 
function resolution policy is and that we think it won't have any unexpected 
user behavior. I'll spend some more time on that tomorrow

http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java:

http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@52
PS14, Line 52:   public List<PartitionKeyValue> getPartitionSpecKeyValues() { 
return partitionSpec_; }
Do we ever expect this to get called prior to analysis?

Maybe it makes sense to add a bool analyzed_ and 
Preconditions.checkState(analyzed_); here to ensure that we don't ever return 
the pre-analysis KVs? This makes sure that we don't expose out the 
partitionSpec_ before it becomes immutable.


http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@156
PS14, Line 156:   public List<TPartitionKeyValue> toThrift() {
same (Precondition on analysis if possible)


http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java
File fe/src/main/java/org/apache/impala/catalog/Function.java:

http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191
PS11, Line 191:   public boolean compare(Function other, CompareMode mode) {
> I took a quick look at the SQL ANSI standard, but couldn't find anything ab
hm, I spent some time researching this across multiple databases:

DB2's logic is very clearly described: 
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_determinbestfitfunction.html
(it seems to go left-to-right)

Postgres's is also pretty well described and seems relatively similar to what's 
done here:

https://www.postgresql.org/docs/10/typeconv-func.html

Hive's also seems to bail earlier with an "ambiguous" error when doing method 
resolution for UDFs: "Closest match is defined as the one that requires the 
least number of arguments to be converted. In case more than one matches are 
found, the method throws an ambiguous method exception"


Comparing Postgres to what's implemented here, it seems like our two gaps are:
1) they have a well-defined concept of "preferred type". I checked the pg 
catalog and for the 'date/time' category, timestamptz is the 'preferred type':

=> select oid,typname, typispreferred from pg_type where typcategory = 'D';
  oid  |   typname   | typispreferred
-------+-------------+----------------
   702 | abstime     | f
  1082 | date        | f
  1083 | time        | f
  1114 | timestamp   | f
  1184 | timestamptz | t
  1266 | timetz      | f
 12401 | time_stamp  | f
(7 rows)

It seems that, because it only allows implicit conversion to "preferred" types, 
it's somewhat more strict than what we do today, though.

2) they have a concept of 'unknown' type to represent string literals in 
queries as being distinct from an expression with a defined type (eg any other 
expression than a string literal). I wonder if we should be treating 
StringLiteral casts differently than other exprs?


http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@236
PS11, Line 236:     // Check trailing varargs.
I think we could simplify this code a lot by adding some utility function like:

Type[] tryExtendArgsToLength(int numArgs) {
  if (!hasVarArgs_ || argTypes_.length <= numArgs) return argTypes_;
  Type[] ret = Arrays.copyOf(argTypes_, numArgs);
  for (int i = argTypes_.length; i < numArgs; i++) {
    ret[i] = getVarArgsType();
  }
  return ret;
}

then in these comparison functions, we can extend the varargs to match the 
user-provided call types and do the rest of the logic on fixed-length argument 
lists, without having to duplicate code between L227-234 and L239-247.


http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java
File fe/src/main/java/org/apache/impala/util/FunctionUtils.java:

http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@243
PS11, Line 243:     return max_func;
it's interesting that our function resolution returns the first fit, from the 
order returned by the catalog. Best I can tell, we never will fail with an 
"ambiguous resolution" error. Now that we have both DATE and TIMESTAMP (and 
assumedly we'll add DATETIME at some point?) is there really no case that we 
want to force an explicit match by failing the user query when ambiguous?



--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Tue, 26 Mar 2019 00:21:02 +0000
Gerrit-HasComments: Yes

Reply via email to