Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13760 )
Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1 ...................................................................... Patch Set 19: (3 comments) I had some questions about how CHAR padding should be handled, so Attila, Grant and I had a quick chat. I'm summarizing our conclusions below: 1. It makes sense for the CHAR that's stored on-disk to be truncated down to its max length and be free of trailing whitespace. This patch currently only handles the former; it needs to also handle the latter. 2. For simplicity's sake, let's only offer whatever client-side CHAR APIs we actually need. For DECIMAL, Impala bypasses the "safe" handling in KuduScanBatch and the Python bindings do decimal scaling themselves. If CHAR is to be comparable, we should only offer unpadded APIs; it's a simple matter for query engines (and other applications) to pad the CHAR themselves. This should simplify the C++ client API in that there's no longer any need for KuduScanBatch's getter to copy the CHAR into a new std::string; previously this was done to add the padding. 3. Attila is going to prototype the Impala-side changes to expose CHAR/VARCHAR and test them before merging this patch, to avoid any surprises down the line. He's also going to investigate whether there are any Impala CHAR/VARCHAR tests that we can easily repurpose for Kudu so as to benefit from the various truncation/padding scenarios. http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h File src/kudu/util/char_util.h: http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h@44 PS19, Line 44: <==> What's this mean? http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h@54 PS19, Line 54: std::string UTF8Resize(Slice val, size_t utf8_length); For simplicity, could UTF8Truncate be folded into UTF8Resize? Meaning, could UTF8Resize just do the truncation if utf8_length < val.size()? The two functions seem like different halves of the same problem. http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.cc File src/kudu/util/char_util.cc: http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.cc@44 PS19, Line 44: Slice UTF8Truncate(Slice val, size_t max_utf8_length) { Should DCHECK that max_utf8_length < val.size()? -- To view, visit http://gerrit.cloudera.org:8080/13760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54 Gerrit-Change-Number: 13760 Gerrit-PatchSet: 19 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 16 Jul 2019 23:13:19 +0000 Gerrit-HasComments: Yes
