Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16031 )
Change subject: KUDU-1802: Avoid call to master when deserializing scan tokens ...................................................................... Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/16031/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16031/5//COMMIT_MSG@23 PS5, Line 23: This doesn’t avoid the need for a call to the master in the case : of writing data to Kudu It also doesn't populate the meta cache, right? We still need to send GetTableLocationsRequests to the master, no? http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@60 PS5, Line 60: This shouldn't be required. nit: it may not be obvious from the laid out solution why this is required. Mind fleshing out the issue a bit more? http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java: http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@232 PS5, Line 232: partitionSchema.getHashBucketSchemas()) { : Common.PartitionSchemaPB.HashBucketSchemaPB.Builder hbsBuilder = : Common.PartitionSchemaPB.HashBucketSchemaPB.newBuilder() : .addAllColumns(idsToPb(hashBucketSchema.getColumnIds())) : .setNumBuckets(hashBucketSchema.getNumBuckets()) : .setSeed(hashBucketSchema.getSeed()); : builder.addHashBucketSchemas(hbsBuilder.build()); : } : : Common.PartitionSchemaPB.RangeSchemaPB rangeSchemaPB = : Common.PartitionSchemaPB.RangeSchemaPB.newBuilder() : .addAllColumns(idsToPb(partitionSchema.getRangeSchema().getColumnIds())) : .build(); nit: typical newline alignment in Java code is to 4 spaces, not 8, though required by the GSG. Also align the .calls()? Same elsewhere http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java: http://gerrit.cloudera.org:8080/#/c/16031/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@489 PS5, Line 489: // TODO(ghenke): This shouldn't be required. Instead we should use snapshot scans : // and Kudu should use the schema at the given snapshot time when scanning or we : // should leverage column ids in the projection and predicates for scans : // (similar to how we use the table id to handle table renames). Should this be a jira? Seems high-hanging enough that a TODO might not do this justice, especially if we're copying this rationale everywhere. http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/client.proto File src/kudu/client/client.proto: http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/client.proto@28 PS5, Line 28: message TableMetadataPB { nit: add some docs explaining how this can be used, maybe adding some color around what RPCs are sent in the absence of these tokens. http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc File src/kudu/client/scan_token-internal.cc: http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc@128 PS5, Line 128: table.reset(new KuduTable(client->shared_from_this(), message.table_name(), message.table_id(), : metadata.num_replicas(), kuduSchema, partition_schema, extra_configs)); This doesn't clear the meta cache as we would in OpenTable() -- is that important? There's a blurb in client-internal.cc about why it might be. http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-internal.cc@266 PS5, Line 266: pb.set_table_id(table->id()); : pb.set_table_name(table->name()); Since these are optional when including table metadata, is there a reason to set these? Ah from the commit message, it's for backwards compat. Maybe add a comment here denoting that, since it isn't obvious. http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-test.cc File src/kudu/client/scan_token-test.cc: http://gerrit.cloudera.org:8080/#/c/16031/5/src/kudu/client/scan_token-test.cc@621 PS5, Line 621: ASSERT_OK(tokenWithMetadata->IntoKuduScanner(&scanner_ptr)); nit: Perhaps we should convert this to some convenience test method like: Status IntoUniqueScanner(const KuduScanToken& token, unique_ptr<KuduScanner>* scanner_ptr) { KuduScanner* scanner_ptr_raw; RETURN_NOT_OK(token.IntoKuduScanner(&scanner_ptr_raw)); scanner_ptr->reset(scanner_ptr_raw); return Status::OK() } -- To view, visit http://gerrit.cloudera.org:8080/16031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b Gerrit-Change-Number: 16031 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Mon, 08 Jun 2020 22:23:31 +0000 Gerrit-HasComments: Yes
