Peter Rozsa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19790 )
Change subject: IMPALA-12042: Invalid casts in set operations calculation
......................................................................

IMPALA-12042: Invalid casts in set operations calculation

This change fixes invalid casts in set operations where the common type
calculation fails to detect an incompatibility among three or more types.
The previous solution did not check every possible combination of types.
The new approach groups the available types in each of the statement's
slots and reduces the set of available types by calculating every
combination until only one type remains or a type combination yields an
invalid type.

The algorithm has 4 steps:

1) The initial list of expressions is transposed. In this format every
   slot has its own list of expressions, which makes creating the type
   combinations easier.

2) A simplification step groups the expressions by type; every group
   stores the first type of maximal size. This works because every
   parametric type can cast a smaller-size instance to a larger-size one;
   for example, VARCHAR(10) can be safely cast to VARCHAR(11). This step
   significantly reduces the number of intermediate type combinations:
   regardless of the size of the expression list, the output's upper bound
   is the number of available types (DECIMALs with different scales are
   treated as separate types).

3) The root type compatibilities are calculated from the simplified list
   of expressions and stored in a set. A root type compatibility stores
   the original expressions, which are used for error reporting. After the
   first round of compatibility calculation, regular compatibilities are
   calculated, each of which references its parent compatibilities. With
   this tree-like structure, the original expressions can be traced back
   and reported properly in case of a type compatibility error.
   The compatibility calculation continues until the set contains a single
   element or a compatibility error occurs.

4) If the common type is deduced, a final step goes through the list of
   all expressions and validates that every expression is compatible with
   it. This step is required because of the intermediate "type-casting"
   condition: if the combination of two types yields a third, different
   type, the remaining combination set might not reveal the
   incompatibility.

Example:

  Start:       A B C D E
  Iteration 1: B D F   (A, B) -> B, (C, D) -> D, (D, E) -> F (new type introduced)
  Iteration 2: D F     (B, D) -> D, (D, F) -> F
  Iteration 3: F       (D, F) -> F

  The (A, F) check is missing because A is eliminated in the first
  iteration, so an incompatibility between A and F would go undetected.
  The final step unconditionally checks that every expression is
  compatible with the common type.

Tests:
 - new test case added to AnalyzeStmtsTest
 - unit test added for SetCompatibilityHandler (SetCompatibilityHandlerTest)

Change-Id: I02df42c67deda37b7f71db267dc761778a9caa2b
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/SetCompatibilityHandler.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/SetCompatibilityHandlerTest.java
7 files changed, 514 insertions(+), 55 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/19790/3
--
To view, visit http://gerrit.cloudera.org:8080/19790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02df42c67deda37b7f71db267dc761778a9caa2b
Gerrit-Change-Number: 19790
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Peter Rozsa
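The four steps described in the commit message can be illustrated with a minimal Java sketch (Java 16+ for records). This is not Impala's SetCompatibilityHandler. The following parts are hypothetical:

- the toy type `Toy` (a base name plus a size, standing in for something like VARCHAR(n));
- the rule table `RULES`;
- the helpers `transpose`, `simplify`, `reduce` and `commonType`;
- the all-pairs reduction loop.

The tracking of parent compatibilities for error reporting is omitted. The rule table is deliberately non-transitive: A widens to D and E, and D and E combine to a new type F, but A has no rule with F. This is the situation the final validation step exists to catch.

```java
import java.util.*;

/**
 * Hypothetical sketch of the four-step common-type search. The toy types,
 * the rule table and all names are illustrative assumptions, not Impala's API.
 */
public class CommonTypeSketch {
  /** Toy parametric type, e.g. Toy("D", 10) stands in for something like VARCHAR(10). */
  record Toy(String base, int size) {
    @Override public String toString() { return base + "(" + size + ")"; }
  }

  /** Hypothetical symmetric rules on base names; a missing entry means incompatible. */
  static final Map<String, String> RULES = Map.of(
      "A+D", "D", "A+E", "E", "D+E", "F", "D+F", "F", "E+F", "F");

  /** Pairwise compatibility: same base keeps the base, otherwise look up RULES; null if incompatible. */
  static Toy combine(Toy x, Toy y) {
    String base = x.base().equals(y.base())
        ? x.base()
        : RULES.getOrDefault(x.base() + "+" + y.base(), RULES.get(y.base() + "+" + x.base()));
    return base == null ? null : new Toy(base, Math.max(x.size(), y.size()));
  }

  /** Step 1: turn per-operand rows into per-slot columns. */
  static List<List<Toy>> transpose(List<List<Toy>> rows) {
    List<List<Toy>> cols = new ArrayList<>();
    for (int c = 0; c < rows.get(0).size(); c++) {
      List<Toy> col = new ArrayList<>();
      for (List<Toy> row : rows) col.add(row.get(c));
      cols.add(col);
    }
    return cols;
  }

  /** Step 2: keep one type per base, the first one of maximal size. */
  static List<Toy> simplify(List<Toy> column) {
    Map<String, Toy> widest = new LinkedHashMap<>();
    for (Toy t : column) {
      widest.merge(t.base(), t, (old, cur) -> cur.size() > old.size() ? cur : old);
    }
    return new ArrayList<>(widest.values());
  }

  /**
   * Step 3: combine every pair in the current set until one type remains;
   * returns null on the first incompatible pair. Assumes a non-empty input and
   * rules that converge (true for the toy table above).
   */
  static Toy reduce(List<Toy> types) {
    Set<Toy> current = new LinkedHashSet<>(types);
    while (current.size() > 1) {
      List<Toy> list = new ArrayList<>(current);
      Set<Toy> next = new LinkedHashSet<>();
      for (int i = 0; i < list.size(); i++) {
        for (int j = i + 1; j < list.size(); j++) {
          Toy joined = combine(list.get(i), list.get(j));
          if (joined == null) return null;
          next.add(joined);
        }
      }
      current = next;
    }
    return current.iterator().next();
  }

  /** Step 4: the candidate is accepted only if every original type is compatible with it. */
  static Toy commonType(List<Toy> column) {
    Toy candidate = reduce(simplify(column));
    if (candidate == null) return null;
    for (Toy t : column) {
      if (combine(t, candidate) == null) return null;
    }
    return candidate;
  }

  public static void main(String[] args) {
    // Three operands of a hypothetical set operation, two slots each.
    List<List<Toy>> rows = List.of(
        List.of(new Toy("A", 1), new Toy("D", 5)),
        List.of(new Toy("D", 3), new Toy("D", 8)),
        List.of(new Toy("E", 2), new Toy("D", 2)));
    for (List<Toy> column : transpose(rows)) {
      Toy reducedOnly = reduce(simplify(column));
      Toy validated = commonType(column);
      System.out.println(column + ": reduced=" + reducedOnly
          + ", validated=" + (validated == null ? "incompatible" : validated));
    }
  }
}
```

Running main should print `[A(1), D(3), E(2)]: reduced=F(3), validated=incompatible` and `[D(5), D(8), D(2)]: reduced=D(8), validated=D(8)`. For the first slot the pairwise reduction alone settles on F(3), because A disappears after the first iteration. Only the final check against every original type exposes the missing (A, F) rule.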
