Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13760 )

Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1
......................................................................


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/scan_batch.h
File src/kudu/client/scan_batch.h:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/scan_batch.h@266
PS10, Line 266:   Status GetCharVarchar(const Slice& col_name, Slice* val) 
const WARN_UNUSED_RESULT;
Similar to GetBinary and GetString, I think these should be different methods, 
GetChar and GetVarChar.

The semantics and expectations between the types are different enough to 
warrant different setters and getters.

See my comments elsewhere about padding he CHAR type client side. Separating 
these methods will make this easy.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc
File src/kudu/client/schema.cc:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc@348
PS10, Line 348:   } else if (data_->type.value() == KuduColumnSchema::CHAR ||
I think the standard max CHAR length is 254.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/client/schema.cc@377
PS10, Line 377:   uint16_t length = (data_->length) ? data_->length.value() : 
kDefaultCharLength;
This should likely default to 0 given it's "unused" in this case. That matches 
how precision is handled.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/column_predicate-test.cc@1124
PS10, Line 1124:     ColumnSchema chars("chars", CHAR);
nit: "char"


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.h@167
PS10, Line 167:   Status SetCharVarchar(const Slice& col_name, const Slice& 
val) WARN_UNUSED_RESULT;
Similar to SetBinary and SetString, I think these should be different methods, 
SetChar and SetVarChar.

The semantics and expectations between the types are different enough to 
warrant different setters and getters.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/partial_row.cc@455
PS10, Line 455:   if (type == CHAR) {
I think we should follow the same semantics as Impala: 
https://impala.apache.org/docs/build/html/topics/impala_char.html

If you store a CHAR value containing trailing spaces in a table, those trailing 
spaces are not stored in the data file.

Essentially when setting a CHAR value always truncate trailing spaces, and when 
getting the CHAR value always pad with the required spaces. This saves space on 
the wire and disk. This also ensures predicates behave correctly.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h
File src/kudu/common/types.h:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h@634
PS10, Line 634: struct DataTypeTraits<CHAR> : public DerivedTypeTraits<BINARY>{
Can this be DerivedTypeTraits STRING to unify all the UTF8 and debug string 
stuff?


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/common/types.h@647
PS10, Line 647: struct DataTypeTraits<VARCHAR> : public 
DerivedTypeTraits<BINARY>{
Can this be DerivedTypeTraits STRING to unify all the UTF8 and debug string 
stuff?


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h
File src/kudu/util/char_util.h:

http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h@28
PS10, Line 28:   static const uint16_t kMaxCharLength = pow(2, 16) - 1; // 65535
I think the standard for max CHAR length is 255 and max VARCHAR is 65535.


http://gerrit.cloudera.org:8080/#/c/13760/10/src/kudu/util/char_util.h@29
PS10, Line 29:   static const uint16_t kDefaultCharLength = kMaxCharLength;
Do we need to have a default? Shouldn't we require it to be specified fo CHAR 
and VARCHAR columns?



--
To view, visit http://gerrit.cloudera.org:8080/13760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54
Gerrit-Change-Number: 13760
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Sat, 29 Jun 2019 12:30:33 +0000
Gerrit-HasComments: Yes

Reply via email to