Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13545 )

Change subject: IMPALA-8630: Hash the full path when calculating consistent 
remote placement
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG@24
PS7, Line 24: see
nit: sees?


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h
File be/src/scheduling/scheduler-test-util.h:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h@95
PS7, Line 95: PARTITIONED_SIMPLE_NAMES
PARTITIONED_SINGLE_NAME? PARTITIONED_CONSTANT_NAMES?


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@295
PS7, Line 295:     // Encoding the table name and index in the file helps 
debugging.
Maybe include one or two examples of a path in the comment in each case? That 
might make the intent easier to follow.


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@353
PS7, Line 353:   GetBlockPaths(table_name, true, spec_idx, naming_policy, 
&relative_path,
One of the parameters could be returned by value, e.g. the partition_id


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@264
PS7, Line 264: void InitRemoteCluster(Cluster *cluster) {
This could be a member of Cluster, e.g. static Cluster 
CreateRemoteCluster(num_impala_nodes, num_data_nodes).


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@305
PS7, Line 305:
Can you add brief comments above each test describing what to do (see other 
tests in this file for examples)?


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@314
PS7, Line 314:     Schema schema(cluster);
These loops could benefit from adding a SCOPED_TRACE for the naming policy 
(https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#adding-traces-to-assertions).


http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@195
PS7, Line 195: partition's path
For S3, this could be a bucket? Does it matter whether it uses the whole path 
or just the partition id? Should we just call it "partition_hash"?


http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@196
PS7, Line 196: consistent
"consistent" here mean "given the same input, must return the same hash", right?



--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Wed, 19 Jun 2019 22:29:03 +0000
Gerrit-HasComments: Yes

Reply via email to