Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13545 )

Change subject: IMPALA-8630: Hash the full path when calculating consistent 
remote placement
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13545/7//COMMIT_MSG@24
PS7, Line 24: see
> nit: sees?
Done


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h
File be/src/scheduling/scheduler-test-util.h:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.h@95
PS7, Line 95: PARTITIONED_SIMPLE_NAMES
> PARTITIONED_SINGLE_NAME? PARTITIONED_CONSTANT_NAMES?
Changed these to PARTITIONED_SINGLE_FILENAME and PARTITIONED_UNIQUE_FILENAMES.


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@295
PS7, Line 295:     // Encoding the table name and index in the file helps 
debugging.
> Maybe include one or two examples of a path in the comment in each case? Th
Added examples for each section


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test-util.cc@353
PS7, Line 353:   GetBlockPaths(table_name, true, spec_idx, naming_policy, 
&relative_path,
> One of the parameters could be returned by value, e.g. the partition_id
I think I prefer returning multiple things all via args rather than having one 
via return value, so I'm going to leave this.


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@264
PS7, Line 264: void InitRemoteCluster(Cluster *cluster) {
> This could be a member of Cluster, e.g. static Cluster CreateRemoteCluster(
Moved this to a static function on Cluster. I added a 
CreateStandardRemoteCluster for the 50 impala node, 3 data node configuration.


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@305
PS7, Line 305:
> Can you add brief comments above each test describing what to do (see other
Added comments for the new tests


http://gerrit.cloudera.org:8080/#/c/13545/7/be/src/scheduling/scheduler-test.cc@314
PS7, Line 314:     Schema schema(cluster);
> These loops could benefit from adding a SCOPED_TRACE for the naming policy
Added SCOPED_TRACE for the naming policy.


http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@195
PS7, Line 195: partition's path
> For S3, this could be a bucket? Does it matter whether it uses the whole pa
The partition_id is something that Impala assigns relatively arbitrarily and it 
is not really stable across time. For example, it is not stable across an 
"invalidate metadata" for a table. So, it does need to be the path.


http://gerrit.cloudera.org:8080/#/c/13545/7/common/thrift/PlanNodes.thrift@196
PS7, Line 196: consistent
> "consistent" here mean "given the same input, must return the same hash", r
Right



--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 20 Jun 2019 00:55:58 +0000
Gerrit-HasComments: Yes

Reply via email to