Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13760 )

Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1
......................................................................


Patch Set 19:

(3 comments)

I had some questions about how CHAR padding should be handled, so Attila, Grant 
and I had a quick chat. I'm summarizing our conclusions below:

1. It makes sense for the CHAR that's stored on-disk to be truncated down to 
its max length and be free of trailing whitespace. This patch currently only 
handles the former; it needs to also handle the latter.

2. For simplicity's sake, let's only offer whatever client-side CHAR APIs we 
actually need. For DECIMAL, Impala bypasses the "safe" handling in 
KuduScanBatch and the Python bindings do decimal scaling themselves. If CHAR is 
to be comparable, we should only offer unpadded APIs; it's a simple matter for 
query engines (and other applications) to pad the CHAR themselves. This should 
simplify the C++ client API in that there's no longer any need for 
KuduScanBatch's getter to copy the CHAR into a new std::string; previously this 
was done to add the padding.

3. Attila is going to prototype the Impala-side changes to expose CHAR/VARCHAR 
and test them before merging this patch, to avoid any surprises down the line. 
He's also going to investigate whether there are any Impala CHAR/VARCHAR tests 
that we can easily repurpose for Kudu so as to benefit from the various 
truncation/padding scenarios.

http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h
File src/kudu/util/char_util.h:

http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h@44
PS19, Line 44: <==>
What's this mean?


http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.h@54
PS19, Line 54:   std::string UTF8Resize(Slice val, size_t utf8_length);
For simplicity, could UTF8Truncate be folded into UTF8Resize? Meaning, could 
UTF8Resize just do the truncation if utf8_length < val.size()? The two 
functions seem like different halves of the same problem.


http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

http://gerrit.cloudera.org:8080/#/c/13760/19/src/kudu/util/char_util.cc@44
PS19, Line 44: Slice UTF8Truncate(Slice val, size_t max_utf8_length) {
Should DCHECK that max_utf8_length < val.size()?



--
To view, visit http://gerrit.cloudera.org:8080/13760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54
Gerrit-Change-Number: 13760
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 16 Jul 2019 23:13:19 +0000
Gerrit-HasComments: Yes

Reply via email to