Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19881 )
Change subject: IMPALA-10173: Allow implicit casts between numeric and string types when inserting into table ...................................................................... Patch Set 5: (32 comments) Thanks for the replies. http://gerrit.cloudera.org:8080/#/c/19881/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19881/5//COMMIT_MSG@15 PS5, Line 15: set operations >From this it seems that for set operations the casted expressions don't need >to be constant, but at the end of the commit message it says that (for >string->[var]char conversions) in inserts the expressions need to be const. >Are there other conversions for which the exprs don't need to be const or does >the constness requirement also apply to all set operations? It should be made >clear in the commit message. http://gerrit.cloudera.org:8080/#/c/19881/5/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/19881/5/common/thrift/ImpalaService.thrift@794 PS5, Line 794: // Allowing implicit casts with loss of precision. Nit: empty line before this. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3281 PS5, Line 3281: if (compatibility.isUnsafe()) { In this overload we check constness in case of UNSAFE compatibility, but in the overload in L3264 we don't. Is there a reason for that? I would think that we should always check constness with UNSAFE compatibility (unless there are cases where non-const expressions are allowed with UNSAFE, but so far I've thought there aren't). But then the problem arises what happens if there are non-const expressions but there is a common type also with regular compatibility. Maybe we should fall back to the regular compatibility in these cases, or the opposite, always try with regular first and only try permissive if it fails. If I understand it correctly L3389 is an example of that, but other parts in the code don't do that. Maybe we should push it down to Type.getAssignmentCompatibleType() and/or ScalarType.getAssignmentCompatibleType() so that no caller calls them without constness checks, for example Expr.castTo(). http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3284 PS5, Line 3284: Optional<Expr> rightNonConstExpr = expr.getFirstNonConstSourceExpr(); Optional: we only need to get the right first non-const expr if the left one doesn't exist, so it could also be Optional<Expr> nonConstExpr = lastCompatibleExpr.getFirstNonConstSourceExpr(); if (!nonConstExpr.isPresent()) { Optional<Expr> nonConstExpr = expr.getFirstNonConstSourceExpr(); } http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3389 PS5, Line 3389: compatibleType = getCompatibleType( Do we know of a case where getCompatibleType() gives (non-error) results for both regular and permissive compatibility and the results are different? Is it the reason we try it with the regular compatibility first? Or is it because if we try it with UNSAFE first and it contains non-const exprs it throws? http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3407 PS5, Line 3407: getCompatibleType With this overload it's not checked whether the expression contains non-const subexpressions, shouldn't it be? Also, if there are non-const subexpressions, shouldn't we try to check the compatible type with regular compatibility here? The overall compatible type may have been found with regular compatibility on L3389. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3451 PS5, Line 3451: compatibility The comment should explain what the function does with 'compatibility'. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3458 PS5, Line 3458: public TypeCompatibility getRegularCompatibilityLevel() { Nit: empty line before function. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@659 PS5, Line 659: TypeCompatibility.STRICT_DECIMAL Is it only possible to call this function with TypeCompatibility.STRICT_DECIMAL or TypeCompatibility.DEFAULT? Can't it be called for example with TypeCompatibility.ALL_STRICT? I think instead of the enum values we should refer to TypeCompatibility.isStrictDecimal() here. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1114 PS5, Line 1114: getPermissiveCompatibilityLevel Can we get here if we're not in an INSERT or SET OPERATION? Permissive compatibility should only apply to those cases. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1559 PS5, Line 1559: compatibility See L1114 about permissive compatibility. Also, constness should be checked (see the comment at Analyzer.java L3281) - or do we allow non-const expressions here? http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1943 PS5, Line 1943: 1. checking whether We only do step 1. if there is no underlying slot ref. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1943 PS5, Line 1943: itself constant Nit: ... itself is constant ... http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1951 PS5, Line 1951: ! It would be clearer if we removed the negation ('!') and reversed the order of the result operands. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1951 PS5, Line 1951: isConstant The comment of the isConstant() function says that it may be unreliable before the expression is analyzed. Do we know if the expression is analyzed here? We could insert a Precondition check for that and run some tests. Unfortunately the comment doesn't tell us more about the uncertainty, for example if only false negatives can occur or also false positives. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1952 PS5, Line 1952: return nonConstExpr; Could return the expression directly, without assigning it to 'nonConstExpr'. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1953 PS5, Line 1953: else No need to put it in an ELSE block as the IF block always returns. 'nonConstExpr' could be declared after the IF. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1954 PS5, Line 1954: isConstant It slotRef.isConstant() is true, can't we early-return Optional.empty() here? In this case we could also eliminate the 'nonConstExpr' variable. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1954 PS5, Line 1954: ! Nit: remove negation and reverse result operands. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@1967 PS5, Line 1967: findFirst If the comment on L1954 about early-returning is correct, we only reach here if the slot ref is not constant. In this case the value returned by findFirst() should not be empty, we should add a Precondition check for it. (Or if, for some reason, it's possible for a slot ref to not be const if all source expressions are const, in this case we should return the slot ref as the first non-const expr. Maybe we should check how 'isConstant_' is set during analysis, that could give more info about what states are possible.) http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@216 PS5, Line 216: compatibility We called it 'regularCompatibility()' in Analyzer.java, I think that's a better name. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java File fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/BinaryCompatibility.java@24 PS5, Line 24: from It can't be cast from STRING because of L36, right? http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java File fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/CompatibilityRule.java@25 PS5, Line 25: apply I think it would be better on a new line. http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java File fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java@38 PS3, Line 38: matrix[BOOLEAN.ordinal()][TINYINT.ordinal()] = PrimitiveType.TINYINT; > matrix[TINYINT.ordinal()][BOOLEAN.ordinal()] cannot happen, the size of TIN I think we should either fill the matrix symmetrically or leave a comment that the matrix is conceptually symmetrical and types should/will be looked up in a defined order. We could also refer to the comment of Type.compatibilityMatrix. Filling the matrix symmetrically feels better because in this case default and permissive matrices can be handled uniformly, though of course it has some cost in terms of code, (performance) and effort because if the matrix becomes symmetrical the ordered lookup we use now becomes unnecessary and should be removed. On the other hand, the default and strict matrices are used to get a common type, and the permissive matrix is used to check whether a type can be converted to another (if I understand it correctly), so they are used differently anyways. But a comment should describe the situation. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java File fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/DefaultCompatibility.java@80 PS5, Line 80: the Nit: in the? http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java File fe/src/main/java/org/apache/impala/catalog/Type.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@675 PS5, Line 675: compatibilityMatrix It could be added to the comment that types are to be looked up in a specific order ("smaller" first) (if the matrix is not modified to be symmetrical). http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@684 PS5, Line 684: * If unsafe mode is enabled, 'unsafeCompatibilityMatrix' is used for lookup. This It could be added that it is not actually used to find a compatible type but check whether conversion in the given direction is allowed. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/Type.java@687 PS5, Line 687: , Nit: dot instead of comma. http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java File fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/3/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java@21 PS3, Line 21: > It gets truncated to the limit of the target type, maybe the "Possible loss This truncating behaviour should be mentioned here. http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java File fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/TypeCompatibility.java@56 PS5, Line 56: // Unreachable state Can't the Java compiler see that the switch is exhaustive so no need for a default branch? http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java File fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java: http://gerrit.cloudera.org:8080/#/c/19881/5/fe/src/main/java/org/apache/impala/catalog/UnsafeCompatibility.java@81 PS5, Line 81: // CHAR, VARCHAR to STRING What about STRING to CHAR, VARCHAR? In TypeCompatibility.java it is written that STRINGs can be converted to CHARs and VARCHARs. http://gerrit.cloudera.org:8080/#/c/19881/3/testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test File testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test: http://gerrit.cloudera.org:8080/#/c/19881/3/testdata/workloads/functional-query/queries/QueryTest/insert-unsafe.test@39 PS3, Line 39: > Yes, the common type for the values clause will be DECIMAL(6,1). The type c Ok, can you open an issue for it so we know about it? -- To view, visit http://gerrit.cloudera.org:8080/19881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee5db2301216c2e088b4b3e4f6cb5a1fd10600f7 Gerrit-Change-Number: 19881 Gerrit-PatchSet: 5 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Comment-Date: Fri, 16 Jun 2023 13:51:06 +0000 Gerrit-HasComments: Yes
