Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13760 )
Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1 ...................................................................... Patch Set 20: (7 comments) http://gerrit.cloudera.org:8080/#/c/13760/20//COMMIT_MSG Commit Message: PS20: I'd use the commit message to document the expected padding/truncation semantics, as well as a discussion of trade-offs (i.e. why are we truncating before we store the data server-side?) http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row-test.cc File src/kudu/common/partial_row-test.cc: http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row-test.cc@290 PS20, Line 290: s = row.SetVarchar("varchar_val", "shortval"); Also add a VARCHAR test that proves that it doesn't truncate excess whitespace. http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row.cc File src/kudu/common/partial_row.cc: http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row.cc@423 PS20, Line 423: case CHAR: : relocated_val = UTF8Truncate(val, col.type_attributes().length, : TRUNCATE_WHITESPACE); : break; : case VARCHAR: : relocated_val = UTF8Truncate(val, col.type_attributes().length, : NO_TRUNCATE_WHITESPACE); : break; Maybe rewrite like this: case CHAR: case VARCHAR: relocated_val = UTF8Truncate(val, col.type_attributes().length, T::type == CHAR ? TRUNCATE_WHITESPACE : NO_TRUNCATE_WHITESPACE); break; http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h File src/kudu/common/schema.h: http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h@124 PS20, Line 124: // Maximum value of the length is 65,535 for compatibility reasons as it's : // used by VARCHAR type which can be set to a maximum of 65,535 in case of : // MySQL and less for other major RDMBMS implementations. Did you consider limiting it to 255 in order to be exactly compatible with Impala? Looks like Impala will truncate excess characters if the underlying storage stores more than 255. Do we care? Also, I'm trying to figure out whether the length should refer to a _byte quantity_ or a _char quantity_ (the two aren't the same in UTF-8). The Impala docs are a little ambiguous here. http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h File src/kudu/util/char_util.h: http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h@30 PS20, Line 30: enum TruncateType { : NO_TRUNCATE_WHITESPACE, : TRUNCATE_WHITESPACE, : }; I might rename this: enum class TrailingWhitespace { IGNORE, TRUNCATE }; By making it an enum class, references to it have to be fully qualified (i.e. TrailingWhitespace::IGNORE). This makes it perfectly clear what the effect will be. http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h@43 PS20, Line 43: // Copy and truncate a slice Should also say that the returned Slice "owns" its memory (which is somewhat unusual for a Slice, hence the need to doc it). http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.cc File src/kudu/util/char_util.cc: http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.cc@45 PS20, Line 45: / sizeof(uint8_t) Not necessary: isn't this always "divide by 1"? -- To view, visit http://gerrit.cloudera.org:8080/13760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54 Gerrit-Change-Number: 13760 Gerrit-PatchSet: 20 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 17 Jul 2019 17:14:18 +0000 Gerrit-HasComments: Yes
