Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18999 )


Change subject: IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) 
values are inserted
......................................................................

IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because the original lengths of the shorter values are lost.
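
For illustration only, the padding effect described above can be modelled
in plain Python. This is a sketch, not Impala code: pad_to_common_char is
a hypothetical helper, and the string literals stand in for CHAR(1) and
CHAR(3) values in one VALUES clause.

```python
# Sketch of the pre-patch behaviour, modelled in plain Python (not Impala code).

def pad_to_common_char(values):
    """Hypothetical helper: pad every value with spaces to CHAR(max(lengths))."""
    width = max(len(v) for v in values)
    return [v.ljust(width) for v in values]


# Two values written in a single VALUES clause: the common type is CHAR(3),
# so the shorter value gains two trailing spaces.
multi_row = pad_to_common_char(["a", "abc"])

# The same values inserted in separate statements keep their own lengths.
single_rows = ["a", "abc"]

print(multi_row)    # ['a  ', 'abc']
print(single_rows)  # ['a', 'abc']
```

With a VARCHAR or STRING destination column the trailing spaces are kept,
so the two ways of inserting the same values give different results.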

This patch fixes that by implicitly casting the values to the VARCHAR
type of the longest value if all values in a column are CHAR types AND
not all have the same length. This VARCHAR type will be the common type
of the column in the VALUES statement.
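
The rule above can be sketched as follows. This is a simplified Python
model, not the Java code in ValuesStmt.java or Type.java: ScalarType and
values_common_type are hypothetical names, and only the all-CHAR case
described in this message is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarType:
    """Hypothetical stand-in for a column type: kind is 'CHAR' or 'VARCHAR'."""
    kind: str
    length: int


def values_common_type(column_types):
    """Pick the common type for one column of a VALUES clause (all-CHAR case only)."""
    if not column_types:
        raise ValueError("at least one value is required")
    if not all(t.kind == "CHAR" for t in column_types):
        raise ValueError("this sketch only covers all-CHAR columns")
    lengths = {t.length for t in column_types}
    longest = max(lengths)
    if len(lengths) == 1:
        # All values have the same length: CHAR(N) stays as it is.
        return ScalarType("CHAR", longest)
    # Lengths differ: use VARCHAR of the longest value, so nothing is padded.
    return ScalarType("VARCHAR", longest)


print(values_common_type([ScalarType("CHAR", 1), ScalarType("CHAR", 3)]))
print(values_common_type([ScalarType("CHAR", 2), ScalarType("CHAR", 2)]))
```

The first call yields VARCHAR(3) and the second CHAR(2), matching the
two cases in the paragraph above.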

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded. We would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements; it only applies to VALUES statements.

TODO: If the destination type is also CHAR, there is no need to cast the
values, but this can be tricky to detect in time. Can we do it?

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test
M tests/query_test/test_chars.py
5 files changed, 309 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/1
--
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
