Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17025 )

Change subject: wip KUDU-2671: Adds new fields to PartitionSchema.
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@9
PS1, Line 9: two new fields to PartitionSchema, range bounds
           : and their respective hash bucket schemas
nit: seems like you've since merged these into one field, which I like.


http://gerrit.cloudera.org:8080/#/c/17025/1//COMMIT_MSG@20
PS1, Line 20: viability of
            : using the schema without its columnIds as the client_schema.
I think this should be safe. The main point of distinction between 
'client_schema' and 'tablet_schema' is that way back when, we didn't want to 
expose column IDs as a public interface, and so effort was put in to ensuring 
clients don't know about them. AFAICT there isn't a correctness concern in 
using the tablet schema sans IDs as the client schema.


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/common.proto@356
PS1, Line 356:   repeated PerRangeHashBucketSchemasPB range_hash_schemas = 3;
             :   repeated RowOperationsPB range_bounds = 4;
nit: should doc what these are and ordering requirements with respect to each 
other.


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@176
PS1, Line 176: client_schema
It'd be great if you could add more context into how and in what context this 
will be used. In the places it matters, will we always have access to a client 
schema without column IDs? Or will we have to make a copy every time? I don't 
think this is used on any hot paths (at worst, table/tablet creation), but I 
wonder if it's worth updating the decoder code to just use 'schema'.

E.g. Could we update DecodeOperations to allow the client schema to have column 
IDs, and if it does, no-op the ClientServerMapping checks? I can imagine maybe 
allowing ClientServerMapping's client_schema to be null, and if so, always 
returning Status::OK(). Have you considered that? It seems like it'd be safe, 
since the whole purpose of the ClientServerMapping is to ensure the schemas 
match, which doesn't seem important if we only have one schema.


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.h@180
PS1, Line 180: Schema
nit: add docs for new fields


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@256
PS1, Line 256: 
pb->mutable_range_hash_schemas()->Reserve(ranges_with_hash_schemas_.size());
nit: should probably wrap all of this in an if 
(!ranges_with_hash_schemas.empty()), since calling mutable_* allocates a new 
object.


http://gerrit.cloudera.org:8080/#/c/17025/1/src/kudu/common/partition.cc@262
PS1, Line 262: KuduPartialRow lower(&schema);
             :     KuduPartialRow upper(&schema);
I forget if we've discussed this, but is it possible for either of these to be 
-Inf or +Inf? Can users specify per-range hash buckets for unbounded ranges?



--
To view, visit http://gerrit.cloudera.org:8080/17025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5d8615ab9967fdb40292b9c77eb68a19baeca1d
Gerrit-Change-Number: 17025
Gerrit-PatchSet: 1
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 19:32:09 +0000
Gerrit-HasComments: Yes

Reply via email to