Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19881 )

Change subject: IMPALA-10173: Allow implicit casts between numeric and string 
types when inserting into table
......................................................................


Patch Set 5:

(32 comments)

Thanks for the replies.

http://gerrit.cloudera.org:8080/#/c/19881/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19881/5//COMMIT_MSG@15
PS5, Line 15: set operations
>From this it seems that for set operations the casted expressions don't need 
>to be constant, but at the end of the commit message it says that (for 
>string->[var]char conversions) in inserts the expressions need to be const. 
>Are there other conversions for which the exprs don't need to be const or does 
>the constness requirement also apply to all set operations? It should be made 
>clear in the commit message.


http://gerrit.cloudera.org:8080/#/c/19881/5/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19881/5/common/thrift/ImpalaService.thrift@794
PS5, Line 794:   // Allowing implicit casts with loss of precision.
Nit: empty line before this.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3281
PS5, Line 3281:     if (compatibility.isUnsafe()) {
In this overload we check constness in case of UNSAFE compatibility, but in the 
overload in L3264 we don't. Is there a reason for that?

I would think that we should always check constness with UNSAFE compatibility 
(unless there are cases where non-const expressions are allowed with UNSAFE, 
but so far I've thought there aren't).

But then the problem arises what happens if there are non-const expressions but 
there is a common type also with regular compatibility. Maybe we should fall 
back to the regular compatibility in these cases, or the opposite, always try 
with regular first and only try permissive if it fails. If I understand it 
correctly L3389 is an example of that, but other parts in the code don't do 
that. Maybe we should push it down to Type.getAssignmentCompatibleType() and/or 
ScalarType.getAssignmentCompatibleType() so that no caller calls them without 
constness checks, for example Expr.castTo().


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3284
PS5, Line 3284:       Optional<Expr> rightNonConstExpr = 
expr.getFirstNonConstSourceExpr();
Optional: we only need to get the right first non-const expr if the left one 
doesn't exist, so it could also be

Optional<Expr> nonConstExpr = lastCompatibleExpr.getFirstNonConstSourceExpr();
if (!nonConstExpr.isPresent()) {
  Optional<Expr> nonConstExpr = expr.getFirstNonConstSourceExpr();
}


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3389
PS5, Line 3389:           compatibleType = getCompatibleType(
Do we know of a case where getCompatibleType() gives (non-error) results for 
both regular and permissive compatibility and the results are different? Is it 
the reason we try it with the regular compatibility first?

Or is it because if we try it with UNSAFE first and it contains non-const exprs 
it throws?


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3407
PS5, Line 3407: getCompatibleType
With this overload it's not checked whether the expression contains non-const 
subexpressions, shouldn't it be?

Also, if there are non-const subexpressions, shouldn't we try to check the 
compatible type with regular compatibility here? The overall compatible type 
may have been found with regular compatibility on L3389.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3451
PS5, Line 3451: compatibility
The comment should explain what the function does with 'compatibility'.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3458
PS5, Line 3458:   public TypeCompatibility getRegularCompatibilityLevel() {
Nit: empty line before function.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@659
PS5, Line 659: TypeCompatibility.STRICT_DECIMAL
Is it only possible to call this function with TypeCompatibility.STRICT_DECIMAL 
or TypeCompatibility.DEFAULT? Can't it be called for example with 
TypeCompatibility.ALL_STRICT? I think instead of the enum values we should 
refer to TypeCompatibility.isStrictDecimal() here.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1114
PS5, Line 1114: getPermissiveCompatibilityLevel
Can we get here if we're not in an INSERT or SET OPERATION? Permissive 
compatibility should only apply to those cases.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1559
PS5, Line 1559: compatibility
See L1114 about permissive compatibility. Also, constness should be checked 
(see the comment at Analyzer.java L3281) - or do we allow non-const expressions 
here?


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1943
PS5, Line 1943: 1. checking whether
We only do step 1. if there is no underlying slot ref.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1943
PS5, Line 1943: itself constant
Nit: ... itself is constant ...


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1951
PS5, Line 1951: !
It would be clearer if we removed the negation ('!') and reversed the order of 
the result operands.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1951
PS5, Line 1951: isConstant
The comment of the isConstant() function says that it may be unreliable before 
the expression is analyzed. Do we know if the expression is analyzed here? We 
could insert a Precondition check for that and run some tests.

Unfortunately the comment doesn't tell us more about the uncertainty, for 
example if only false negatives can occur or also false positives.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1952
PS5, Line 1952:       return nonConstExpr;
Could return the expression directly, without assigning it to 'nonConstExpr'.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1953
PS5, Line 1953: else
No need to put it in an ELSE block as the IF block always returns. 
'nonConstExpr' could be declared after the IF.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1954
PS5, Line 1954: isConstant
It slotRef.isConstant() is true, can't we early-return Optional.empty() here?
In this case we could also eliminate the 'nonConstExpr' variable.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1954
PS5, Line 1954: !
Nit: remove negation and reverse result operands.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1967
PS5, Line 1967: findFirst
If the comment on L1954 about early-returning is correct, we only reach here if 
the slot ref is not constant. In this case the value returned by findFirst() 
should not be empty, we should add a Precondition check for it.

(Or if, for some reason, it's possible for a slot ref to not be const if all 
source expressions are const, in this case we should return the slot ref as the 
first non-const expr. Maybe we should check how 'isConstant_' is set during 
analysis, that could give more info about what states are possible.)


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@216
PS5, Line 216: compatibility
We called it 'regularCompatibility()' in Analyzer.java, I think that's a better 
name.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java@24
PS5, Line 24: from
It can't be cast from STRING because of L36, right?


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java
File fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java@25
PS5, Line 25: apply
I think it would be better on a new line.


http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java@38
PS3, Line 38:     matrix[BOOLEAN.ordinal()][TINYINT.ordinal()] = 
PrimitiveType.TINYINT;
> matrix[TINYINT.ordinal()][BOOLEAN.ordinal()] cannot happen, the size of TIN
I think we should either fill the matrix symmetrically or leave a comment that 
the matrix is conceptually symmetrical and types should/will be looked up in a 
defined order. We could also refer to the comment of Type.compatibilityMatrix.

Filling the matrix symmetrically feels better because in this case default and 
permissive matrices can be handled uniformly, though of course it has some cost 
in terms of code, (performance) and effort because if the matrix becomes 
symmetrical the ordered lookup we use now becomes unnecessary and should be 
removed.

On the other hand, the default and strict matrices are used to get a common 
type, and the permissive matrix is used to check whether a type can be 
converted to another (if I understand it correctly), so they are used 
differently anyways. But a comment should describe the situation.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java@80
PS5, Line 80: the
Nit: in the?


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java
File fe/src/main/java/org/apache/impala/catalog/Type.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@675
PS5, Line 675: compatibilityMatrix
It could be added to the comment that types are to be looked up in a specific 
order ("smaller" first) (if the matrix is not modified to be symmetrical).


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@684
PS5, Line 684:    * If unsafe mode is enabled, 'unsafeCompatibilityMatrix' is 
used for lookup. This
It could be added that it is not actually used to find a compatible type but 
check whether conversion in the given direction is allowed.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@687
PS5, Line 687: ,
Nit: dot instead of comma.


http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java@21
PS3, Line 21:
> It gets truncated to the limit of the target type, maybe the "Possible loss
This truncating behaviour should be mentioned here.


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java@56
PS5, Line 56:         // Unreachable state
Can't the Java compiler see that the switch is exhaustive so no need for a 
default branch?


http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java
File fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java:

http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java@81
PS5, Line 81:     // CHAR, VARCHAR to STRING
What about STRING to CHAR, VARCHAR? In TypeCompatibility.java it is written 
that STRINGs can be converted to CHARs and VARCHARs.


http://gerrit.cloudera.org:8080/#/c/19881/3/testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test
File testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test:

http://gerrit.cloudera.org:8080/#/c/19881/3/testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test@39
PS3, Line 39:
> Yes, the common type for the values clause will be DECIMAL(6,1). The type c
Ok, can you open an issue for it so we know about it?



--
To view, visit http://gerrit.cloudera.org:8080/19881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee5db2301216c2e088b4b3e4f6cb5a1fd10600f7
Gerrit-Change-Number: 19881
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Fri, 16 Jun 2023 13:51:06 +0000
Gerrit-HasComments: Yes

Reply via email to