Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14306 )

Change subject: add a tool to create table
......................................................................


Patch Set 13:

(23 comments)

Haven't reviewed the test code yet, but this is looking much better!

http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/client/schema.cc
File src/kudu/client/schema.cc:

PS13:
Rather than introduce the UNKNOWN types, could we make these functions return a 
Status (and pass the converted enum as an OUT parameter)? Then we could return 
InvalidArgument if we don't recognize the string.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@338
PS13, Line 338:   optional bool is_nullable = 3 [default = false];
That's the built-in default value for a bool.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@342
PS13, Line 342:   // The following attributes refer to the on-disk storage of 
the column.
              :   // They won't always be set, depending on context.
This comment is copied from common.proto but doesn't make sense here.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@347
PS13, Line 347: [default=0]
That's the built-in default value for an int32.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@413
PS13, Line 413:   repeated ColumnPB schema = 2;
What do you think about defining a SchemaPB message, and putting both "repeated 
ColumnPB columns" and "repeated string key_column_names" in it? It'll make it 
easier to evolve the schema in the future, and it'll allow you to move the 
primary key parsing into ParseKuduSchema more naturally.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@415
PS13, Line 415:   repeated string primary_key = 3;
How about "key_column_names"? That explains that we're expecting more than one 
entry, and that these are column names.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@421
PS13, Line 421:   optional ExtraConfigPB extra_config = 6;
Nit: "extra_configs"


http://gerrit.cloudera.org:8080/#/c/14306/10/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

http://gerrit.cloudera.org:8080/#/c/14306/10/src/kudu/tools/tool_action_table.cc@753
PS10, Line 753:     }
> Here I'm not sure if decimal type supports setting default value. Anad no s
KuduValue::FromDecimal should work, even for default values. See 
decimal-itest.cc:79 for an example. Bear in mind the first arg to FromDecimal 
is the raw int128_t value, not the precision.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@640
PS13, Line 640:   vector<pair<string, KuduColumnSchema::DataType>> 
range_name_type_map;
This isn't a map though. Maybe "range_col_names_and_types"?

Elsewhere too.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@870
PS13, Line 870: Status TransferValues(const RepeatedPtrField<string>& values,
This is converting the PB values into a JSON representation of KuduPartialRow, 
right? Maybe call it ToJsonPartialRow?


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@879
PS13, Line 879:   for (int i = 0; i < values.size(); ++i) {
              :     const string& value = values[i];
              :     if (range_columns[i].second == KuduColumnSchema::STRING ||
              :         range_columns[i].second == KuduColumnSchema::BINARY) {
              :       (*v) += "\"";
              :       (*v) += value;
              :       (*v) += "\",";
              :     } else {
              :       (*v) += value;
              :       (*v) += ",";
              :     }
              :   }
Use JoinMapped from gutil/strings/join.h to simplify this.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@899
PS13, Line 899: (*b).
b->


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@908
PS13, Line 908:       if (type != KuduColumnSchema::DECIMAL) {
              :         return Status::InvalidArgument(Substitute(
              :             "column $0 type attributes just can be used for 
decimal type, not $1.",
              :             column.column_name(), column.column_type()));
              :       }
This is already validated in KuduSchemaBuilder::Build.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@967
PS13, Line 967:     table_creator->add_hash_partitions(hash_keys, 
hash_partition.num_buckets());
This is where you should use the optional seed.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@977
PS13, Line 977:     vector<string>(range_keys.begin(), range_keys.end()));
Nit: indent 4 chars further.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@978
PS13, Line 978:     for (const auto& column : table_req.schema()) {
              :       // Find the range key type,
              :       if (ContainsKey(range_keys, column.column_name())) {
You should be able to do this with the KuduSchema (using the Column() method)  
and thus not need the entire table_req here.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@986
PS13, Line 986:     string range_values;
How about bound_partial_row_json instead?


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@990
PS13, Line 990:       KuduTableCreator::RangePartitionBound bound1 = 
KuduTableCreator::INCLUSIVE_BOUND;
              :       KuduTableCreator::RangePartitionBound bound2 = 
KuduTableCreator::INCLUSIVE_BOUND;
Rename to lower_bound_type and upper_bound_type respectively.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1033
PS13, Line 1033: else {
               :     table_creator->set_range_partition_columns({});
               :   }
To reduce nesting and make the code more readable, invert the conditions so 
that you can handle this case first, and do an early return when it happens. So:

  if not has_range_partitions:
    set_range_partition_columns({})
    return Status::OK()

  <handle has_range_partitions case>
  return Status::OK()


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1046
PS13, Line 1046:   if (json_str.empty()) {
               :     return Status::InvalidArgument(Substitute(
               :         "please use a right json object to create a table, it 
can't be empty"));
               :   }
Doesn't the call to JsonStringToMessage below handle this case already?


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1287
PS13, Line 1287:       .Description("Create a new table")
We should probably doc (maybe in ExtraDescription?) that tool.proto describes 
the layout for the JSON object.


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1289
PS13, Line 1289:       .AddRequiredParameter({ kCreateTableJSONArg, "JOSN 
object for creating table" })
JSON


http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1292
PS13, Line 1292:   return ModeBuilder("table")
In theory the actions are supposed to be alphabetically sorted. Could you 
restore that?



--
To view, visit http://gerrit.cloudera.org:8080/14306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0bce4733a504f8ef5f024089a16fe3c6f1e493f1
Gerrit-Change-Number: 14306
Gerrit-PatchSet: 13
Gerrit-Owner: YangSong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Mon, 14 Oct 2019 04:47:54 +0000
Gerrit-HasComments: Yes

Reply via email to