Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are 
inserted
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8
PS8, Line 8:
           : If, in a VALUES clause, for the same column all of the values are 
CHAR
           : types but not all are of the same length, the common type chosen is
           : CHAR(max(lengths)). This means that shorter values are padded with
           : spaces. If the destination column is not CHAR but VARCHAR or 
STRING,
           : this produces different results than if the values in the column 
are
           : inserted individually, in separate statements. This behaviour is
           : suboptimal because information is lost.
An example could make this clearer.


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17
PS8, Line 17: VALUES_STMT_NON_LOSSY_COMMON_TYPE
I would prefer to add CHAR to the name somehow as it is only applied for that 
type.
e.g. VALUES_STMT_AVOID_LOSSY_CHAR_PADDING


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26
PS8, Line 26: We choose VARCHAR instead of STRING as the common type because 
VARCHAR
Can you add a note about already different behavior in CHAR(N) padding compared 
to Hive? Postgres could be also mentioned.


http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646
PS8, Line 646: 0
nit: false


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200
PS8, Line 200: his is only used
This is no longer true, right?


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146
PS8, Line 146:     if (analyzer.getQueryCtx().client_request
             :         .query_options.values_stmt_non_lossy_common_type) {
             :       // We suppress any AnalysisException thrown by 
castIfAllChars() because it is the
             :       // responsibility of super.analyze() to validate the query 
and report errors. Any
             :       // error that results in castIfAllChars() throwing an 
exception should also trigger
             :       // an exception in super.analyze(), but if there are 
multiple errors, the exception
             :       // thrown by castIfAllChars() may not correspond to the 
error first encountered by
             :       // super.analyze(). To keep error reporting consistent, we 
always choose the
             :       // exception thrown by super.analyze() and suppress 
exceptions thrown by
             :       // castIfAllChars().
             :       try {
             :         castIfAllChars(analyzer);
             :       } catch (AnalysisException e) {
             :         ex = e;
             :       }
             :     }
             :     super.analyze(analyzer);
Calling this before analyze looks a bit convoluted.

It would be nicer to do this during the usual analyses, e.g. create function in 
SetOperationStmt like use EnforceNonLossyCommonType() which could return true 
for ValuesStmt with values_stmt_non_lossy_common_type. This bool could be 
passed to castToSetOpCompatibleTypes() which looks to best place for this logic 
for me.


http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
File 
testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test:

http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32
PS8, Line 32: (cast("12" as char(2)));
Can you also add longer string that will be truncated?
+ the same for varchars



--
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Mon, 03 Jul 2023 13:10:59 +0000
Gerrit-HasComments: Yes

Reply via email to