Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13760 )
Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1 ...................................................................... Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/scan_batch.h File src/kudu/client/scan_batch.h: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/scan_batch.h@266 PS10, Line 266: Status GetCharVarchar(const Slice& col_name, Slice* val) const WARN_UNUSED_RESULT; Similar to GetBinary and GetString, I think these should be different methods, GetChar and GetVarChar. The semantics and expectations between the types are different enough to warrant different setters and getters. See my comments elsewhere about padding he CHAR type client side. Separating these methods will make this easy. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc File src/kudu/client/schema.cc: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc@348 PS10, Line 348: } else if (data_->type.value() == KuduColumnSchema::CHAR || I think the standard max CHAR length is 254. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc@377 PS10, Line 377: uint16_t length = (data_->length) ? data_->length.value() : kDefaultCharLength; This should likely default to 0 given it's "unused" in this case. That matches how precision is handled. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/column_predicate-test.cc File src/kudu/common/column_predicate-test.cc: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/column_predicate-test.cc@1124 PS10, Line 1124: ColumnSchema chars("chars", CHAR); nit: "char" http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.h@167 PS10, Line 167: Status SetCharVarchar(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT; Similar to SetBinary and SetString, I think these should be different methods, SetChar and SetVarChar. The semantics and expectations between the types are different enough to warrant different setters and getters. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.cc File src/kudu/common/partial_row.cc: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.cc@455 PS10, Line 455: if (type == CHAR) { I think we should follow the same semantics as Impala: https://impala.apache.org/docs/build/html/topics/impala_char.html If you store a CHAR value containing trailing spaces in a table, those trailing spaces are not stored in the data file. Essentially when setting a CHAR value always truncate trailing spaces, and when getting the CHAR value always pad with the required spaces. This saves space on the wire and disk. This also ensures predicates behave correctly. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h File src/kudu/common/types.h: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h@634 PS10, Line 634: struct DataTypeTraits<CHAR> : public DerivedTypeTraits<BINARY>{ Can this be DerivedTypeTraits STRING to unify all the UTF8 and debug string stuff? http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h@647 PS10, Line 647: struct DataTypeTraits<VARCHAR> : public DerivedTypeTraits<BINARY>{ Can this be DerivedTypeTraits STRING to unify all the UTF8 and debug string stuff? http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h File src/kudu/util/char_util.h: http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h@28 PS10, Line 28: static const uint16_t kMaxCharLength = pow(2, 16) - 1; // 65535 I think the standard for max CHAR length is 255 and max VARCHAR is 65535. http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h@29 PS10, Line 29: static const uint16_t kDefaultCharLength = kMaxCharLength; Do we need to have a default? Shouldn't we require it to be specified fo CHAR and VARCHAR columns? -- To view, visit http://gerrit.cloudera.org:8080/13760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54 Gerrit-Change-Number: 13760 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Sat, 29 Jun 2019 12:30:33 +0000 Gerrit-HasComments: Yes
