Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13760 )

Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1
......................................................................


Patch Set 20:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13760/20//COMMIT_MSG
Commit Message:

PS20:
I'd use the commit message to document the expected padding/truncation 
semantics, as well as a discussion of trade-offs (i.e. why are we truncating 
before we store the data server-side?)


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row-test.cc
File src/kudu/common/partial_row-test.cc:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row-test.cc@290
PS20, Line 290:   s = row.SetVarchar("varchar_val", "shortval");
Also add a VARCHAR test that proves that it doesn't truncate excess whitespace.


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/partial_row.cc@423
PS20, Line 423:     case CHAR:
              :       relocated_val = UTF8Truncate(val, 
col.type_attributes().length,
              :                                    TRUNCATE_WHITESPACE);
              :       break;
              :     case VARCHAR:
              :       relocated_val = UTF8Truncate(val, 
col.type_attributes().length,
              :                                    NO_TRUNCATE_WHITESPACE);
              :       break;
Maybe rewrite like this:

  case CHAR:
  case VARCHAR:
    relocated_val = UTF8Truncate(val, col.type_attributes().length,
                                 T::type == CHAR ? TRUNCATE_WHITESPACE : 
NO_TRUNCATE_WHITESPACE);
    break;


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h
File src/kudu/common/schema.h:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h@124
PS20, Line 124:   // Maximum value of the length is 65,535 for compatibility 
reasons as it's
              :   // used by VARCHAR type which can be set to a maximum of 
65,535 in case of
              :   // MySQL and less for other major RDMBMS implementations.
Did you consider limiting it to 255 in order to be exactly compatible with 
Impala? Looks like Impala will truncate excess characters if the underlying 
storage stores more than 255. Do we care?

Also, I'm trying to figure out whether the length should refer to a _byte 
quantity_ or a _char quantity_ (the two aren't the same in UTF-8). The Impala 
docs are a little ambiguous here.


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h
File src/kudu/util/char_util.h:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h@30
PS20, Line 30:   enum TruncateType {
             :     NO_TRUNCATE_WHITESPACE,
             :     TRUNCATE_WHITESPACE,
             :   };
I might rename this:

  enum class TrailingWhitespace {
    IGNORE,
    TRUNCATE
  };

By making it an enum class, references to it have to be fully qualified (i.e. 
TrailingWhitespace::IGNORE). This makes it perfectly clear what the effect will 
be.


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.h@43
PS20, Line 43:   // Copy and truncate a slice
Should also say that the returned Slice "owns" its memory (which is somewhat 
unusual for a Slice, hence the need to doc it).


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/util/char_util.cc@45
PS20, Line 45: / sizeof(uint8_t)
Not necessary: isn't this always "divide by 1"?



--
To view, visit http://gerrit.cloudera.org:8080/13760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54
Gerrit-Change-Number: 13760
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 17 Jul 2019 17:14:18 +0000
Gerrit-HasComments: Yes

Reply via email to