Mahesh Reddy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17025 )

Change subject: wip KUDU-2671: Adds new fields to PartitionSchema.
......................................................................


Patch Set 2:

(7 comments)

mostly fixing test failures w/ 2nd patch, still need to look into 
RowOperationsPBDecoder allowing a client schema w/ column IDs more.

http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@9
PS1, Line 9: two new fields to PartitionSchema, range bounds
           : and their respective hash bucket schemas
> nit: seems like you've since merged these into one field, which I like.
Ack


http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@20
PS1, Line 20: viability of
            : using the schema without its columnIds as the client_schema.
> I think this should be safe. The main point of distinction between 'client_
Ack


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto@356
PS1, Line 356:   // Each index of 'range_bounds' represents the upper and lower 
bounds of
             :   // ranges whose hash bucket schemas were s
> nit: should doc what these are and ordering requirements with respect to ea
Done


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@176
PS1, Line 176: schema,
> It'd be great if you could add more context into how and in what context th
Currently, the RowOperationsPBDecoder constructor requires a client schema 
since it maps the client column indices to the server-side column indices. The 
two places where we have access to an explicit client schema is in 
CatalogManager, both in the create/alter table paths. The other use cases will 
always require a copy of the server schema without its column IDs.

As for your idea of allowing the client schema to have column IDs and no-op the 
ClientServerMapping checks, I agree with the thesis that in the use cases where 
we don't have an explicit schema, using the same schema for both the 
server/client schema renders the Mapping useless. As for the implementation, I 
need to look into it further to ensure its viability.


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@180
PS1, Line 180: rtitio
> nit: add docs for new fields
Done


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@256
PS1, Line 256:   hash_bucket_pb->set_num_buckets(hash_bucket.num_buckets);
> nit: should probably wrap all of this in an if (!ranges_with_hash_schemas.e
Done


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@262
PS1, Line 262: 
pb->mutable_range_bounds()->Reserve(ranges_with_hash_schemas_.size());
             :   }
> I forget if we've discussed this, but is it possible for either of these to
Currently, if the client doesn't specify a lower/upper bound for a range, the 
range's respective lower/upper bound is -Inf and +Inf. I haven't written any 
explicit tests for per-range hash schemas for unbounded ranges but given it's 
compatible with today's partitioning design I don't see why it wouldn't be. It 
wouldn't hurt to add tests for it though.



--
To view, visit http://gerrit.cloudera.org:8080/17025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5d8615ab9967fdb40292b9c77eb68a19baeca1d
Gerrit-Change-Number: 17025
Gerrit-PatchSet: 2
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Wed, 10 Feb 2021 21:45:06 +0000
Gerrit-HasComments: Yes

Reply via email to