Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14197 )
Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
......................................................................

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu. This commit adds
support for Kudu's VARCHAR type to Impala.

The length of a Kudu VARCHAR is applied as a character length, as
opposed to the byte length that Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters, and each ASCII
character is exactly one byte, so the byte and character lengths are
the same. Additionally, if a value were somehow too long, the Kudu
client would truncate the extra bytes.

When reading data from Kudu, a value written by some other application
may be wider in bytes than Impala expects and can handle. This can
happen when the value contains multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn't a great solution, but it is one that other
integrations have also taken, given that Impala doesn't support UTF-8
values. IMPALA-5675 tracks adding UTF-8 character length support to
VARCHAR columns, and the truncation code is marked with a TODO that
references that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters and
  tested that length truncation resolves that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with VARCHAR type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column, key, hash
   partition, and range partition
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test: Deletes with VARCHAR columns
** kudu-scan-node.test: Tests basic predicates with VARCHAR columns

Follow-on work:
- IMPALA-9580: Add min-max runtime filter support/tests
- IMPALA-9581: Pushdown string predicates
- IMPALA-9583: Automated multibyte truncation tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Reviewed-on: http://gerrit.cloudera.org:8080/14197
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
17 files changed, 806 insertions(+), 639 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14197

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
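The read-path truncation described in the IMPALA-5092 commit message can be illustrated with a minimal sketch. It is written in Python for brevity, not in the C++ of `kudu-scanner.cc`. The helpers `utf8_char_length` and `truncate_to_byte_length` are hypothetical names, not functions from the patch, and the sketch assumes values arrive as UTF-8 bytes and that Impala simply keeps the first `max_len` bytes of an over-long value. The actual scanner code may handle this differently.

```python
def utf8_char_length(data: bytes) -> int:
    """Count characters (code points) in UTF-8 bytes, i.e. the unit Kudu
    uses for a VARCHAR length."""
    return len(data.decode("utf-8"))


def truncate_to_byte_length(data: bytes, max_len: int) -> bytes:
    """Hypothetical helper: keep at most max_len bytes, matching Impala's
    byte-based VARCHAR(max_len). A plain byte slice may split a
    multi-byte character."""
    return data[:max_len]


max_len = 5  # column declared as VARCHAR(5)

# 'héllo' is 5 characters but 6 bytes, because 'é' encodes as b'\xc3\xa9'.
value = "héllo".encode("utf-8")
print(utf8_char_length(value))                  # 5: fits Kudu's character limit
print(len(value))                               # 6: exceeds Impala's byte limit
print(truncate_to_byte_length(value, max_len))  # b'h\xc3\xa9ll'

# 'abcdé' also fits Kudu's limit, but the 5-byte cut lands inside 'é'.
split = truncate_to_byte_length("abcdé".encode("utf-8"), max_len)
print(split)                                    # b'abcd\xc3'
```

The second example shows why the commit calls this an imperfect solution: a byte-level cut can leave a dangling lead byte (`\xc3`) that no longer decodes as UTF-8. Character-length semantics, as tracked in IMPALA-5675, would presumably avoid this.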