Peter Rozsa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19790 )

Change subject: IMPALA-12042: Invalid casts in set operations calculation
......................................................................

IMPALA-12042: Invalid casts in set operations calculation

This change fixes invalid casts in set operations where the common type
calculation fails to detect an incompatibility among three or more
types. The previous solution did not check every possible combination
of types. The new approach groups the types available in each slot of
the statement and reduces the set of available types by calculating
every combination, until only one type remains or some type
combination yields an invalid type.

The algorithm has 4 steps:
1) The initial list of expressions is transposed so that every slot
has its own list of expressions, which makes creating the type
combinations easier.

2) A simplification step groups the expressions by type; every group
stores the first type with the maximal size. This is possible because
every parametric type can safely be cast from a smaller size to a
bigger size; for example, VARCHAR(10) can be cast to VARCHAR(11)
without any concern. This step significantly reduces the number of
intermediate type combinations: regardless of the size of the
expression list, the output's upper bound is the number of available
types (DECIMALs with different scales are treated as separate types).

3) The root type compatibilities are calculated from the simplified
list of expressions and stored in a set. A root type compatibility
stores the original expressions, which are used for error reporting.
After the first round of compatibility calculation, regular
compatibilities are calculated, each of which contains its parent
compatibilities. With this tree-like structure, in case of a type
compatibility error, the original expressions can be traced back and
reported properly. The compatibility calculation continues until the
set contains a single element or a compatibility error occurs.

4) If the common type is deduced, a final step goes through the list of
all expressions and validates that every expression is compatible with
it. This step is required because of the intermediate "type-casting"
condition: if the combination of two types turns out to be a third,
different type, the remaining combination set might not detect the
incompatibility. Example:
   1) A B C D E
   2) B D F    (A, B) -> B, (C, D) -> D, (D, E) -> F (new type introduced)
   3) D F      (B, D) -> D, (D, F) -> F
   4) F        (D, F) -> F
The (A, F) check is never performed because A is eliminated in the
first iteration, even though A and F might be incompatible.
This final step unconditionally checks that every expression
is compatible with the common type.
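Step 1 above can be pictured with the following minimal Java sketch. It is not the code from SetCompatibilityHandler; the class name TransposeSketch and the use of plain strings for expressions are assumptions made for illustration. Each inner list of the input is one operand's result expression list, and each inner list of the output is one slot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the patch's actual code: turns per-operand
// result expression lists into per-slot (per-column) lists.
final class TransposeSketch {
  static <T> List<List<T>> transpose(List<List<T>> operands) {
    List<List<T>> slots = new ArrayList<>();
    if (operands.isEmpty()) return slots;
    int numSlots = operands.get(0).size();
    for (List<T> operand : operands) {
      // Every operand of a set operation must produce the same number of columns.
      if (operand.size() != numSlots) {
        throw new IllegalArgumentException("operands have different column counts");
      }
    }
    for (int slot = 0; slot < numSlots; ++slot) {
      List<T> column = new ArrayList<>();
      for (List<T> operand : operands) column.add(operand.get(slot));
      slots.add(column);
    }
    return slots;
  }
}
```

After transposing, every type combination for a slot can be formed from a single inner list.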
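The grouping in step 2 can be sketched as follows, assuming a hypothetical ToyType class in place of Impala's real type classes. The group key (the type name, plus the scale for DECIMAL) and the strict size comparison that keeps the first maximal-size type are assumptions based on the description above, not the patch's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the simplification step.
final class SimplifySketch {
  /** Hypothetical stand-in for a (possibly parametric) SQL type. */
  static final class ToyType {
    final String name;  // e.g. "VARCHAR", "INT", "DECIMAL"
    final int size;     // length for CHAR/VARCHAR, precision for DECIMAL, 0 otherwise
    final int scale;    // only meaningful for DECIMAL

    ToyType(String name, int size, int scale) {
      this.name = name;
      this.size = size;
      this.scale = scale;
    }

    /** DECIMALs with different scales end up in separate groups. */
    String groupKey() {
      return name.equals("DECIMAL") ? name + "/" + scale : name;
    }

    @Override
    public String toString() {
      if (name.equals("DECIMAL")) return name + "(" + size + "," + scale + ")";
      return size > 0 ? name + "(" + size + ")" : name;
    }
  }

  /** Keeps, per group, the first type with the maximal size. */
  static List<ToyType> simplify(List<ToyType> slotTypes) {
    Map<String, ToyType> groups = new LinkedHashMap<>();
    for (ToyType t : slotTypes) {
      ToyType best = groups.get(t.groupKey());
      // Strict '>' so that the first occurrence of the maximal size wins.
      if (best == null || t.size > best.size) groups.put(t.groupKey(), t);
    }
    return new ArrayList<>(groups.values());
  }
}
```

Whatever the number of input expressions, the result has at most one entry per group key, which is the upper bound the description mentions.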
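A minimal sketch of the reduction in step 3 follows. Everything in it is hypothetical: the toy type lattice in commonType(), the Compat node class, and the pairing scheme (consecutive pairs, with an odd last element paired with its predecessor), which is only inferred from the worked example in step 4. Each round keys its nodes by result type to mirror the set described above, and a failure reports the original expressions collected by walking back through the parent compatibilities.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of step 3; not the patch's SetCompatibilityHandler.
final class ReductionSketch {
  // Toy lattice (assumption): numeric types widen along this chain,
  // any other type is only compatible with itself.
  private static final List<String> NUMERIC =
      List.of("TINYINT", "SMALLINT", "INT", "BIGINT", "DOUBLE");

  /** Returns the common type of a and b, or null if they are incompatible. */
  static String commonType(String a, String b) {
    if (a.equals(b)) return a;
    int ia = NUMERIC.indexOf(a);
    int ib = NUMERIC.indexOf(b);
    if (ia < 0 || ib < 0) return null;
    return NUMERIC.get(Math.max(ia, ib));
  }

  /** Node of the compatibility tree. */
  static final class Compat {
    final String type;
    final List<String> exprs;  // original expressions, set only on roots
    final Compat left, right;  // parent compatibilities, null on roots

    Compat(String type, List<String> exprs, Compat left, Compat right) {
      this.type = type;
      this.exprs = exprs;
      this.left = left;
      this.right = right;
    }

    /** Walks back to the roots and collects the original expressions. */
    void collectOrigins(Set<String> out) {
      if (left == null) {
        out.addAll(exprs);
      } else {
        left.collectOrigins(out);
        right.collectOrigins(out);
      }
    }
  }

  /** Consecutive index pairs; an odd last element is paired with its predecessor. */
  static List<int[]> pairs(int n) {
    List<int[]> result = new ArrayList<>();
    for (int i = 0; i + 1 < n; i += 2) result.add(new int[] {i, i + 1});
    if (n % 2 == 1 && n > 1) result.add(new int[] {n - 2, n - 1});
    return result;
  }

  static IllegalArgumentException incompatible(Collection<String> exprs) {
    return new IllegalArgumentException("incompatible types in expressions " + exprs);
  }

  /** Reduces (expression label -> type) entries to a single common type. */
  static String reduce(LinkedHashMap<String, String> exprs) {
    if (exprs.isEmpty()) throw new IllegalArgumentException("no expressions");
    List<String> labels = new ArrayList<>(exprs.keySet());
    if (labels.size() == 1) return exprs.get(labels.get(0));
    // First round: root compatibilities built directly from the expressions.
    Map<String, Compat> current = new LinkedHashMap<>();
    for (int[] p : pairs(labels.size())) {
      String a = labels.get(p[0]);
      String b = labels.get(p[1]);
      String t = commonType(exprs.get(a), exprs.get(b));
      if (t == null) throw incompatible(List.of(a, b));
      current.putIfAbsent(t, new Compat(t, List.of(a, b), null, null));
    }
    // Later rounds: combine compatibilities until a single one remains.
    while (current.size() > 1) {
      List<Compat> nodes = new ArrayList<>(current.values());
      Map<String, Compat> next = new LinkedHashMap<>();
      for (int[] p : pairs(nodes.size())) {
        Compat a = nodes.get(p[0]);
        Compat b = nodes.get(p[1]);
        String t = commonType(a.type, b.type);
        if (t == null) {
          Set<String> origins = new LinkedHashSet<>();
          a.collectOrigins(origins);
          b.collectOrigins(origins);
          throw incompatible(origins);
        }
        next.putIfAbsent(t, new Compat(t, List.of(), a, b));
      }
      current = next;
    }
    return current.values().iterator().next().type;
  }
}
```

Because each round turns n nodes into at most ceil(n/2) nodes, the loop in this sketch always terminates.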
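The final validation in step 4 can be illustrated with the abstract types A to F from the example above. The compatibility table below is hypothetical (it lists only the pairs the example needs and deliberately leaves out (A, F)), and in this sketch an expression counts as compatible when combining its type with the common type yields the common type again; the actual check in the patch may be defined differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of step 4 using the abstract types A..F of the example.
final class FinalCheckSketch {
  // Hypothetical compatibility table; unordered pairs that are not listed
  // (for example (A, F)) are treated as incompatible.
  private static final Map<String, String> TABLE = new HashMap<>();

  static {
    put("A", "B", "B");
    put("C", "D", "D");
    put("D", "E", "F");
    put("B", "D", "D");
    put("D", "F", "F");
    put("B", "F", "F");
    put("C", "F", "F");
    put("E", "F", "F");
  }

  private static void put(String a, String b, String result) {
    TABLE.put(a + "," + b, result);
    TABLE.put(b + "," + a, result);
  }

  /** Returns the common type of a and b, or null if they are incompatible. */
  static String commonType(String a, String b) {
    return a.equals(b) ? a : TABLE.get(a + "," + b);
  }

  /**
   * Checks every expression (label -> type) against the deduced common type
   * and returns the label of the first incompatible one, or null if all pass.
   */
  static String firstIncompatible(Map<String, String> exprs, String common) {
    for (Map.Entry<String, String> e : exprs.entrySet()) {
      // Assumption: "compatible" means the combination yields the common type.
      if (!common.equals(commonType(e.getValue(), common))) return e.getKey();
    }
    return null;
  }
}
```

With expressions of types A, B, C, D and E and the deduced common type F, only the expression of type A fails the check, which is exactly the case the pairwise reduction misses.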

Tests:
 - new test case added to AnalyzeStmtsTest
 - unit tests added in SetCompatibilityHandlerTest

Change-Id: I02df42c67deda37b7f71db267dc761778a9caa2b
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/SetCompatibilityHandler.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/SetCompatibilityHandlerTest.java
7 files changed, 514 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/19790/3
--
To view, visit http://gerrit.cloudera.org:8080/19790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02df42c67deda37b7f71db267dc761778a9caa2b
Gerrit-Change-Number: 19790
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Peter Rozsa
