Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 )
Change subject: IMPALA-8630: Hash the full path when calculating consistent remote placement ...................................................................... Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG@24 PS7, Line 24: see nit: sees? http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h File be/src/scheduling/scheduler-test-util.h: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h@95 PS7, Line 95: PARTITIONED_SIMPLE_NAMES PARTITIONED_SINGLE_NAME? PARTITIONED_CONSTANT_NAMES? http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@295 PS7, Line 295: // Encoding the table name and index in the file helps debugging. Maybe include one or two examples of a path in the comment in each case? That might make the intent easier to follow. http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@353 PS7, Line 353: GetBlockPaths(table_name, true, spec_idx, naming_policy, &relative_path, One of the parameters could be returned by value, e.g. the partition_id http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@264 PS7, Line 264: void InitRemoteCluster(Cluster *cluster) { This could be a member of Cluster, e.g. static Cluster CreateRemoteCluster(num_impala_nodes, num_data_nodes). http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@305 PS7, Line 305: Can you add brief comments above each test describing what to do (see other tests in this file for examples)? http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@314 PS7, Line 314: Schema schema(cluster); These loops could benefit from adding a SCOPED_TRACE for the naming policy (https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#adding-traces-to-assertions). http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@195 PS7, Line 195: partition's path For S3, this could be a bucket? Does it matter whether it uses the whole path or just the partition id? Should we just call it "partition_hash"? http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@196 PS7, Line 196: consistent "consistent" here mean "given the same input, must return the same hash", right? -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 19 Jun 2019 22:29:03 +0000 Gerrit-HasComments: Yes