Baike Xia has uploaded a new patch set (#14).
(http://gerrit.cloudera.org:8080/19430)

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and improves performance
for some join queries. It places no mandatory requirement on how the
table's data is distributed, so it is unlikely to cause data skew.

Bucket Shuffle Join applies only to equi-joins (joins whose condition
is an equality), because it relies on hashing the join keys to compute
the required data distribution.
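
To illustrate why only equality conditions qualify, here is a minimal
Python sketch. It is not the code in this patch: the names `bucket_of`,
`partition`, and `bucketed_equi_join` are hypothetical, and a plain
modulo stands in for the real hash. Rows with equal keys always map to
the same bucket, so each pair of matching buckets can be joined on its
own.

```python
from collections import defaultdict


def bucket_of(key, num_buckets):
    # Stand-in hash for illustration only (assumption, not the patch's hash).
    return key % num_buckets


def partition(rows, num_buckets):
    """Group (key, payload) rows by the bucket their key falls in."""
    buckets = defaultdict(list)
    for key, payload in rows:
        buckets[bucket_of(key, num_buckets)].append((key, payload))
    return buckets


def bucketed_equi_join(left, right, num_buckets):
    """Join bucket i of the left input only against bucket i of the right."""
    left_b = partition(left, num_buckets)
    right_b = partition(right, num_buckets)
    out = []
    for b in range(num_buckets):
        for lk, lv in left_b.get(b, []):
            for rk, rv in right_b.get(b, []):
                if lk == rk:
                    out.append((lk, lv, rv))
    return sorted(out)


def naive_equi_join(left, right):
    """Reference result: compare every left row with every right row."""
    return sorted((lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk)
```

With a non-equality condition such as `lk < rk`, matching rows can sit
in different buckets, so a per-bucket join would miss results.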

Bucket Shuffle Join is considered when the equi-join condition contains
the bucket columns of the two tables. If the bucket columns of the left
table appear in the equi-join condition, the join is very likely to be
planned as a Bucket Shuffle Join.
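
The condition above can be sketched as a simple set check. This is an
assumption about the shape of the rule, not the planner logic in this
patch: `covers_bucket_columns` is a hypothetical helper, and the real
planner may apply further checks and cost considerations before
choosing a Bucket Shuffle Join.

```python
def covers_bucket_columns(equi_join_columns, bucket_columns):
    """Return True if every bucket column appears among the equi-join columns.

    Hypothetical helper for illustration; passing this check only makes a
    table a candidate, it does not guarantee the plan.
    """
    if not bucket_columns:
        # A table without bucket columns cannot drive a bucket shuffle.
        return False
    return set(bucket_columns) <= set(equi_join_columns)
```

For example, a left table bucketed on `id` qualifies for a join on
`id` and `day`, but a table bucketed on `id` and `region` does not
qualify for a join on `id` alone.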

A join or grouping operation may use one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.

Currently, only HDFS-backed tables are supported, and only the
following plan node types:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To stay consistent with Hive, the bucket hash is computed with the same
method that Hive uses to hash a column value.
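
As a rough illustration of a Hive-style bucket hash, the Python sketch
below follows my understanding of Hive's legacy (bucketing version 1)
scheme for INT and STRING columns. Treat it as an assumption: newer
Hive versions can use a Murmur3-based scheme, the commit message does
not say which one the patch follows, and all function names here are
hypothetical.

```python
def to_int32(x):
    """Wrap a Python integer to Java's signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def hash_int(value):
    # Assumed legacy rule: an INT hashes to its own value.
    return to_int32(value)


def hash_string(value):
    # Assumed legacy rule: 31-based rolling hash over the UTF-8 bytes,
    # with each byte treated as a signed Java byte.
    h = 0
    for b in value.encode("utf-8"):
        signed = b - 256 if b >= 128 else b
        h = to_int32(31 * h + signed)
    return h


def combine(field_hashes):
    """Fold the per-column hashes of a multi-column bucket key into one."""
    h = 0
    for fh in field_hashes:
        h = to_int32(31 * h + fh)
    return h


def bucket_number(hash_code, num_buckets):
    # Clear the sign bit so the modulo result is never negative.
    return (hash_code & 0x7FFFFFFF) % num_buckets
```

Masking the sign bit before the modulo keeps negative hash codes from
producing a negative bucket index.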

Add new query options, including the feature switch:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
A be/src/util/hash-util-test.cc
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
53 files changed, 2,466 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/14
--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Baike Xia
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
