Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14306 )
Change subject: add a tool to create table ...................................................................... Patch Set 13: (23 comments) Haven't reviewed the test code yet, but this is looking much better! http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/client/schema.cc File src/kudu/client/schema.cc: PS13: Rather than introduce the UNKNOWN types, could we make these functions return a Status (and pass the converted enum as an OUT parameter)? Then we could return InvalidArgument if we don't recognize the string. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto File src/kudu/tools/tool.proto: http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@338 PS13, Line 338: optional bool is_nullable = 3 [default = false]; That's the built-in default value for a bool. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@342 PS13, Line 342: // The following attributes refer to the on-disk storage of the column. : // They won't always be set, depending on context. This comment is copied from common.proto but doesn't make sense here. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@347 PS13, Line 347: [default=0] That's the built-in default value for an int32. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@413 PS13, Line 413: repeated ColumnPB schema = 2; What do you think about defining a SchemaPB message, and putting both "repeated ColumnPB columns" and "repeated string key_column_names" in it? It'll make it easier to evolve the schema in the future, and it'll allow you to move the primary key parsing into ParseKuduSchema more naturally. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@415 PS13, Line 415: repeated string primary_key = 3; How about "key_column_names"? That explains that we're expecting more than one entry, and that these are column names. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool.proto@421 PS13, Line 421: optional ExtraConfigPB extra_config = 6; Nit: "extra_configs" http://gerrit.cloudera.org:8080/#/c/14306/10/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: http://gerrit.cloudera.org:8080/#/c/14306/10/src/kudu/tools/tool_action_table.cc@753 PS10, Line 753: } > Here I'm not sure if decimal type supports setting default value. Anad no s KuduValue::FromDecimal should work, even for default values. See decimal-itest.cc:79 for an example. Bear in mind the first arg to FromDecimal is the raw int128_t value, not the precision. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@640 PS13, Line 640: vector<pair<string, KuduColumnSchema::DataType>> range_name_type_map; This isn't a map though. Maybe "range_col_names_and_types"? Elsewhere too. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@870 PS13, Line 870: Status TransferValues(const RepeatedPtrField<string>& values, This is converting the PB values into a JSON representation of KuduPartialRow, right? Maybe call it ToJsonPartialRow? http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@879 PS13, Line 879: for (int i = 0; i < values.size(); ++i) { : const string& value = values[i]; : if (range_columns[i].second == KuduColumnSchema::STRING || : range_columns[i].second == KuduColumnSchema::BINARY) { : (*v) += "\""; : (*v) += value; : (*v) += "\","; : } else { : (*v) += value; : (*v) += ","; : } : } Use JoinMapped from gutil/strings/join.h to simplify this. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@899 PS13, Line 899: (*b). b-> http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@908 PS13, Line 908: if (type != KuduColumnSchema::DECIMAL) { : return Status::InvalidArgument(Substitute( : "column $0 type attributes just can be used for decimal type, not $1.", : column.column_name(), column.column_type())); : } This is already validated in KuduSchemaBuilder::Build. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@967 PS13, Line 967: table_creator->add_hash_partitions(hash_keys, hash_partition.num_buckets()); This is where you should use the optional seed. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@977 PS13, Line 977: vector<string>(range_keys.begin(), range_keys.end())); Nit: indent 4 chars further. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@978 PS13, Line 978: for (const auto& column : table_req.schema()) { : // Find the range key type, : if (ContainsKey(range_keys, column.column_name())) { You should be able to do this with the KuduSchema (using the Column() method) and thus not need the entire table_req here. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@986 PS13, Line 986: string range_values; How about bound_partial_row_json instead? http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@990 PS13, Line 990: KuduTableCreator::RangePartitionBound bound1 = KuduTableCreator::INCLUSIVE_BOUND; : KuduTableCreator::RangePartitionBound bound2 = KuduTableCreator::INCLUSIVE_BOUND; Rename to lower_bound_type and upper_bound_type respectively. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1033 PS13, Line 1033: else { : table_creator->set_range_partition_columns({}); : } To reduce nesting and make the code more readable, invert the conditions so that you can handle this case first, and do an early return when it happens. So: if not has_range_partitions: set_range_partition_columns({}) return Status::OK() <handle has_range_partitions case> return Status::OK() http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1046 PS13, Line 1046: if (json_str.empty()) { : return Status::InvalidArgument(Substitute( : "please use a right json object to create a table, it can't be empty")); : } Doesn't the call to JsonStringToMessage below handle this case already? http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1287 PS13, Line 1287: .Description("Create a new table") We should probably doc (maybe in ExtraDescription?) that tool.proto describes the layout for the JSON object. http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1289 PS13, Line 1289: .AddRequiredParameter({ kCreateTableJSONArg, "JOSN object for creating table" }) JSON http://gerrit.cloudera.org:8080/#/c/14306/13/src/kudu/tools/tool_action_table.cc@1292 PS13, Line 1292: return ModeBuilder("table") In theory the actions are supposed to be alphabetically sorted. Could you restore that? -- To view, visit http://gerrit.cloudera.org:8080/14306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0bce4733a504f8ef5f024089a16fe3c6f1e493f1 Gerrit-Change-Number: 14306 Gerrit-PatchSet: 13 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Mon, 14 Oct 2019 04:47:54 +0000 Gerrit-HasComments: Yes
