Hello Lars Volker, Tim Armstrong, Todd Lipcon, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13545

to look at the new patch set (#8).

Change subject: IMPALA-8630: Hash the full path when calculating consistent 
remote placement
......................................................................

IMPALA-8630: Hash the full path when calculating consistent remote placement

Consistent remote placement currently uses the relative filename within
a partition for the consistent hash. If the relative filenames for
different partitions have a simple naming scheme, then multiple
partitions may have files of the same name. This is true for some
tables written by Hive (e.g. in our minicluster the tpcds.store_sales
has this problem). This can lead to unbalanced placement of remote
ranges.

This adds a partition_path_hash to the THdfsFileSplit and
THdfsFileSplitGeneratorSpec, calculated in the frontend (which has all of
the partition information). The scheduler hashes this in addition to
the relative path.

Testing:
 - Added several new scheduler tests that verify the consistent remote
   scheduling sees blocks with different relative paths, partition paths,
   or offsets as distinct.
 - Ran core tests

Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
7 files changed, 330 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13545/8
--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to