Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )
Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted ...................................................................... Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8 PS8, Line 8: : If, in a VALUES clause, for the same column all of the values are CHAR : types but not all are of the same length, the common type chosen is : CHAR(max(lengths)). This means that shorter values are padded with : spaces. If the destination column is not CHAR but VARCHAR or STRING, : this produces different results than if the values in the column are : inserted individually, in separate statements. This behaviour is : suboptimal because information is lost. An example could make this clearer. http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17 PS8, Line 17: VALUES_STMT_NON_LOSSY_COMMON_TYPE I would prefer to add CHAR to the name somehow as it is only applied for that type. e.g. VALUES_STMT_AVOID_LOSSY_CHAR_PADDING http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26 PS8, Line 26: We choose VARCHAR instead of STRING as the common type because VARCHAR Can you add a note about already different behavior in CHAR(N) padding compared to Hive? Postgres could be also mentioned. http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646 PS8, Line 646: 0 nit: false http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200 PS8, Line 200: his is only used This is no longer true, right? http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146 PS8, Line 146: if (analyzer.getQueryCtx().client_request : .query_options.values_stmt_non_lossy_common_type) { : // We suppress any AnalysisException thrown by castIfAllChars() because it is the : // responsibility of super.analyze() to validate the query and report errors. Any : // error that results in castIfAllChars() throwing an exception should also trigger : // an exception in super.analyze(), but if there are multiple errors, the exception : // thrown by castIfAllChars() may not correspond to the error first encountered by : // super.analyze(). To keep error reporting consistent, we always choose the : // exception thrown by super.analyze() and suppress exceptions thrown by : // castIfAllChars(). : try { : castIfAllChars(analyzer); : } catch (AnalysisException e) { : ex = e; : } : } : super.analyze(analyzer); Calling this before analyze looks a bit convoluted. It would be nicer to do this during the usual analyses, e.g. create function in SetOperationStmt like use EnforceNonLossyCommonType() which could return true for ValuesStmt with values_stmt_non_lossy_common_type. This bool could be passed to castToSetOpCompatibleTypes() which looks to best place for this logic for me. http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test File testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test: http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32 PS8, Line 32: (cast("12" as char(2))); Can you also add longer string that will be truncated? + the same for varchars -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Comment-Date: Mon, 03 Jul 2023 13:10:59 +0000 Gerrit-HasComments: Yes
