Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 )
Change subject: IMPALA-8630: Hash the full path when calculating consistent remote placement ...................................................................... Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG@24 PS7, Line 24: see > nit: sees? Done http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h File be/src/scheduling/scheduler-test-util.h: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h@95 PS7, Line 95: PARTITIONED_SIMPLE_NAMES > PARTITIONED_SINGLE_NAME? PARTITIONED_CONSTANT_NAMES? Changed these to PARTITIONED_SINGLE_FILENAME and PARTITIONED_UNIQUE_FILENAMES. http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@295 PS7, Line 295: // Encoding the table name and index in the file helps debugging. > Maybe include one or two examples of a path in the comment in each case? Th Added examples for each section http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@353 PS7, Line 353: GetBlockPaths(table_name, true, spec_idx, naming_policy, &relative_path, > One of the parameters could be returned by value, e.g. the partition_id I think I prefer returning multiple things all via args rather than having one via return value, so I'm going to leave this. http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@264 PS7, Line 264: void InitRemoteCluster(Cluster *cluster) { > This could be a member of Cluster, e.g. static Cluster CreateRemoteCluster( Moved this to a static function on Cluster. I added a CreateStandardRemoteCluster for the 50 impala node, 3 data node configuration. http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@305 PS7, Line 305: > Can you add brief comments above each test describing what to do (see other Added comments for the new tests http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@314 PS7, Line 314: Schema schema(cluster); > These loops could benefit from adding a SCOPED_TRACE for the naming policy Added SCOPED_TRACE for the naming policy. http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@195 PS7, Line 195: partition's path > For S3, this could be a bucket? Does it matter whether it uses the whole pa The partition_id is something that Impala assigns relatively arbitrarily and it is not really stable across time. For example, it is not stable across an "invalidate metadata" for a table. So, it does need to be the path. http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@196 PS7, Line 196: consistent > "consistent" here mean "given the same input, must return the same hash", r Right -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Thu, 20 Jun 2019 00:55:58 +0000 Gerrit-HasComments: Yes
