[kudu-CR] KUDU-1802: Avoid call to master when deserializing scan tokens

Andrew Wong (Code Review) Mon, 08 Jun 2020 15:23:59 -0700

Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16031 )


Change subject: KUDU-1802: Avoid call to master when deserializing scan tokens
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16031/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16031/5//COMMIT_MSG@23
PS5, Line 23: This doesn’t avoid the need for a call to the master in the case
            : of writing data to Kudu
It also doesn't populate the meta cache, right? We still need to send 
GetTableLocationsRequests to the master, no?


http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@60
PS5, Line 60: This shouldn't be required.
nit: it may not be obvious from the laid out solution why this is required. 
Mind fleshing out the issue a bit more?


http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java:

http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@232
PS5, Line 232:             partitionSchema.getHashBucketSchemas()) {
             :       Common.PartitionSchemaPB.HashBucketSchemaPB.Builder 
hbsBuilder =
             :               
Common.PartitionSchemaPB.HashBucketSchemaPB.newBuilder()
             :                       
.addAllColumns(idsToPb(hashBucketSchema.getColumnIds()))
             :                       
.setNumBuckets(hashBucketSchema.getNumBuckets())
             :                       .setSeed(hashBucketSchema.getSeed());
             :       builder.addHashBucketSchemas(hbsBuilder.build());
             :     }
             :
             :     Common.PartitionSchemaPB.RangeSchemaPB rangeSchemaPB =
             :             Common.PartitionSchemaPB.RangeSchemaPB.newBuilder()
             :                     
.addAllColumns(idsToPb(partitionSchema.getRangeSchema().getColumnIds()))
             :                     .build();
nit: typical newline alignment in Java code is to 4 spaces, not 8, though 
required by the GSG. Also align the .calls()? Same elsewhere


http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java:

http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@489
PS5, Line 489: // TODO(ghenke): This shouldn't be required. Instead we should 
use snapshot scans
             :         //  and Kudu should use the schema at the given snapshot 
time when scanning or we
             :         //  should leverage column ids in the projection and 
predicates for scans
             :         //  (similar to how we use the table id to handle table 
renames).
Should this be a jira? Seems high-hanging enough that a TODO might not do this 
justice, especially if we're copying this rationale everywhere.


http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/client.proto@28
PS5, Line 28: message TableMetadataPB {
nit: add some docs explaining how this can be used, maybe adding some color 
around what RPCs are sent in the absence of these tokens.


http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc
File src/kudu/client/scan_token-internal.cc:

http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc@128
PS5, Line 128: table.reset(new KuduTable(client->shared_from_this(), 
message.table_name(), message.table_id(),
             :         metadata.num_replicas(), kuduSchema, partition_schema, 
extra_configs));
This doesn't clear the meta cache as we would in OpenTable() -- is that 
important? There's a blurb in client-internal.cc about why it might be.


http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc@266
PS5, Line 266:   pb.set_table_id(table->id());
             :   pb.set_table_name(table->name());
Since these are optional when including table metadata, is there a reason to 
set these?

Ah from the commit message, it's for backwards compat. Maybe add a comment here 
denoting that, since it isn't obvious.


http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-test.cc@621
PS5, Line 621:   ASSERT_OK(tokenWithMetadata->IntoKuduScanner(&scanner_ptr));
nit: Perhaps we should convert this to some convenience test method like:

Status IntoUniqueScanner(const KuduScanToken& token, unique_ptr<KuduScanner>* 
scanner_ptr) {
  KuduScanner* scanner_ptr_raw;
  RETURN_NOT_OK(token.IntoKuduScanner(&scanner_ptr_raw));
  scanner_ptr->reset(scanner_ptr_raw);
  return Status::OK()
}



--
To view, visit http://gerrit.cloudera.org:8080/16031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b
Gerrit-Change-Number: 16031
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Mon, 08 Jun 2020 22:23:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1802: Avoid call to master when deserializing scan tokens

Reply via email to