Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18999 )
Change subject: IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, all values of a column are of CHAR type but
not all of them have the same length, the common type chosen is
CHAR(max(lengths)). This means that the shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than inserting the values individually,
in separate statements. This behaviour is suboptimal because
information (the original length of the shorter values) is lost.

This patch fixes that by implicitly casting the values to the VARCHAR
type of the longest value if all values in a column are of CHAR type
AND not all of them have the same length. This VARCHAR type becomes the
common type of the column in the VALUES statement.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type of the same or shorter length and
also to STRING, while STRING cannot safely be converted to VARCHAR
because its length is not bounded. If the common type were STRING and
the destination column were VARCHAR, we would therefore run into
problems.

Note: although the VALUES statement is implemented under the hood as a
special UNION operation, this patch does not change the behaviour of
explicit UNION statements; it only applies to VALUES statements.

TODO: If the destination type is also CHAR, there is no need to cast
the values, but this can be tricky to detect in time. Can we do it?

Testing:
 - Added tests verifying that unneeded padding does not occur and that
   the queries succeed in various situations, e.g. with different
   destination column types and with multi-column inserts.
   See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test
M tests/query_test/test_chars.py
5 files changed, 309 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/1
--
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
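The common-type rule described in the commit message can be sketched in Java. This is a minimal, self-contained sketch under stated assumptions, not the code in ValuesStmt.java or Type.java. The class ValuesCommonType, the Kind/StrType type model and the helpers commonTypeForCharColumn and castLiteral are all hypothetical. The sketch only covers the all-CHAR case; it returns empty otherwise, which stands in for "fall back to the existing common-type logic".

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of string types; these names are illustrative
// only and are not Impala's Type/ScalarType classes.
final class ValuesCommonType {
  enum Kind { CHAR, VARCHAR, STRING }

  // len is the declared length of CHAR(n)/VARCHAR(n); it is ignored for STRING.
  record StrType(Kind kind, int len) {
    @Override
    public String toString() {
      return kind == Kind.STRING ? "STRING" : kind + "(" + len + ")";
    }
  }

  // Common type of one VALUES column under the rule from the commit message.
  // Returns empty if not every value is CHAR, i.e. the rule does not apply and
  // the existing common-type logic would be used instead.
  static Optional<StrType> commonTypeForCharColumn(List<StrType> types) {
    if (types.isEmpty()) return Optional.empty();
    int firstLen = types.get(0).len();
    int maxLen = 0;
    boolean sameLen = true;
    for (StrType t : types) {
      if (t.kind() != Kind.CHAR) return Optional.empty();
      maxLen = Math.max(maxLen, t.len());
      sameLen &= t.len() == firstLen;
    }
    // All lengths equal: CHAR(n) pads nothing, so keep it.
    // Different lengths: VARCHAR(max) keeps the shorter values unpadded.
    return Optional.of(new StrType(sameLen ? Kind.CHAR : Kind.VARCHAR, maxLen));
  }

  // Simplified model of casting a value to the target type: CHAR(n) and
  // VARCHAR(n) truncate to n characters, and only CHAR(n) pads with spaces.
  static String castLiteral(String value, StrType target) {
    if (target.kind() == Kind.STRING) return value;
    String s = value.length() > target.len() ? value.substring(0, target.len()) : value;
    if (target.kind() == Kind.VARCHAR) return s;
    return s + " ".repeat(target.len() - s.length());
  }

  public static void main(String[] args) {
    // Models VALUES (CAST('a' AS CHAR(1))), (CAST('abc' AS CHAR(3))).
    List<StrType> column = List.of(new StrType(Kind.CHAR, 1), new StrType(Kind.CHAR, 3));
    StrType common = commonTypeForCharColumn(column).orElseThrow();
    System.out.println(common);                                                  // VARCHAR(3)
    System.out.println("[" + castLiteral("a", new StrType(Kind.CHAR, 3)) + "]"); // [a  ]
    System.out.println("[" + castLiteral("a", common) + "]");                    // [a]
  }
}
```

The second line of output shows the old behaviour (CHAR(max) pads 'a' to three characters); the third shows that with the VARCHAR common type the value keeps its original length, matching what a separate single-row insert would store in a VARCHAR or STRING column.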
