Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17025 )
Change subject: wip KUDU-2671: Adds new fields to PartitionSchema. ...................................................................... Patch Set 2: (7 comments) mostly fixing test failures w/ 2nd patch, still need to look into RowOperationsPBDecoder allowing a client schema w/ column IDs more. http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@9 PS1, Line 9: two new fields to PartitionSchema, range bounds : and their respective hash bucket schemas > nit: seems like you've since merged these into one field, which I like. Ack http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@20 PS1, Line 20: viability of : using the schema without its columnIds as the client_schema. > I think this should be safe. The main point of distinction between 'client_ Ack http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto File src/kudu/common/common.proto: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto@356 PS1, Line 356: // Each index of 'range_bounds' represents the upper and lower bounds of : // ranges whose hash bucket schemas were s > nit: should doc what these are and ordering requirements with respect to ea Done http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h File src/kudu/common/partition.h: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@176 PS1, Line 176: schema, > It'd be great if you could add more context into how and in what context th Currently, the RowOperationsPBDecoder constructor requires a client schema since it maps the client column indices to the server-side column indices. The two places where we have access to an explicit client schema is in CatalogManager, both in the create/alter table paths. The other use cases will always require a copy of the server schema without its column IDs. As for your idea of allowing the client schema to have column IDs and no-op the ClientServerMapping checks, I agree with the thesis that in the use cases where we don't have an explicit schema, using the same schema for both the server/client schema renders the Mapping useless. As for the implementation, I need to look into it further to ensure its viability. http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@180 PS1, Line 180: rtitio > nit: add docs for new fields Done http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@256 PS1, Line 256: hash_bucket_pb->set_num_buckets(hash_bucket.num_buckets); > nit: should probably wrap all of this in an if (!ranges_with_hash_schemas.e Done http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@262 PS1, Line 262: pb->mutable_range_bounds()->Reserve(ranges_with_hash_schemas_.size()); : } > I forget if we've discussed this, but is it possible for either of these to Currently, if the client doesn't specify a lower/upper bound for a range, the range's respective lower/upper bound is -Inf and +Inf. I haven't written any explicit tests for per-range hash schemas for unbounded ranges but given it's compatible with today's partitioning design I don't see why it wouldn't be. It wouldn't hurt to add tests for it though. -- To view, visit http://gerrit.cloudera.org:8080/17025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5d8615ab9967fdb40292b9c77eb68a19baeca1d Gerrit-Change-Number: 17025 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Wed, 10 Feb 2021 21:45:06 +0000 Gerrit-HasComments: Yes
