Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
             : columns and marked the truncation code with a TODO that references
             : that Jira.
I don't expect any additional sorting or predicate issues beyond those that may already exist. VARCHAR is effectively a STRING (which Kudu has had for some time) with a length limit.

Impala warns users that UTF-8 functionality is effectively undefined:
https://impala.apache.org/docs/build/html/topics/impala_string.html

> For full support in all Impala subsystems, restrict string values to the
> ASCII character set. Although some UTF-8 character data can be stored in
> Impala and retrieved through queries, UTF-8 strings containing non-ASCII
> characters are not guaranteed to work properly in combination with many SQL
> aspects, including but not limited to:

- String manipulation functions.
- Comparison operators.
- The ORDER BY clause.
- Values in partition key columns.

If these edge cases and tests look important, we should prioritize UTF-8 functionality as a whole in Impala.

Note: Hive, Parquet, and ORC all support UTF-8.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte characters
             : and tested that length truncation resolve that issue.
> If this test is very hard to integrate into the Impala environment, then I
I will look at implementing it. The main challenge is inserting data directly via a Kudu client, given that Impala doesn't support UTF-8 strings.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> What is the current state of min/max runtime filters for varchars? Are they
I am under the impression they should work (given it's just strings), but they need tests. Thomas would likely know.


-- 
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 30 Mar 2020 21:39:45 +0000
Gerrit-HasComments: Yes
