Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17025 )
Change subject: wip KUDU-2671: Adds new fields to PartitionSchema. ...................................................................... Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@9 PS1, Line 9: two new fields to PartitionSchema, range bounds : and their respective hash bucket schemas nit: seems like you've since merged these into one field, which I like. http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@20 PS1, Line 20: viability of : using the schema without its columnIds as the client_schema. I think this should be safe. The main point of distinction between 'client_schema' and 'tablet_schema' is that way back when, we didn't want to expose column IDs as a public interface, and so effort was put in to ensuring clients don't know about them. AFAICT there isn't a correctness concern in using the tablet schema sans IDs as the client schema. http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto File src/kudu/common/common.proto: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto@356 PS1, Line 356: repeated PerRangeHashBucketSchemasPB range_hash_schemas = 3; : repeated RowOperationsPB range_bounds = 4; nit: should doc what these are and ordering requirements with respect to each other. http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h File src/kudu/common/partition.h: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@176 PS1, Line 176: client_schema It'd be great if you could add more context into how and in what context this will be used. In the places it matters, will we always have access to a client schema without column IDs? Or will we have to make a copy every time? I don't think this is used on any hot paths (at worst, table/tablet creation), but I wonder if it's worth updating the decoder code to just use 'schema'. E.g. Could we update DecodeOperations to allow the client schema to have column IDs, and if it does, no-op the ClientServerMapping checks? I can imagine maybe allowing ClientServerMapping's client_schema to be null, and if so, always returning Status::OK(). Have you considered that? It seems like it'd be safe, since the whole purpose of the ClientServerMapping is to ensure the schemas match, which doesn't seem important if we only have one schema. http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@180 PS1, Line 180: Schema nit: add docs for new fields http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@256 PS1, Line 256: pb->mutable_range_hash_schemas()->Reserve(ranges_with_hash_schemas_.size()); nit: should probably wrap all of this in an if (!ranges_with_hash_schemas.empty()), since calling mutable_* allocates a new object. http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@262 PS1, Line 262: KuduPartialRow lower(&schema); : KuduPartialRow upper(&schema); I forget if we've discussed this, but is it possible for either of these to be -Inf or +Inf? Can users specify per-range hash buckets for unbounded ranges? -- To view, visit http://gerrit.cloudera.org:8080/17025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5d8615ab9967fdb40292b9c77eb68a19baeca1d Gerrit-Change-Number: 17025 Gerrit-PatchSet: 1 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 19:32:09 +0000 Gerrit-HasComments: Yes
