Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19430 )
Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and provides better
performance for some join queries. It imposes no mandatory requirement
on the data distribution of the table, so it is unlikely to cause data
skew.

Bucket Shuffle Join applies only to equi-joins, because it relies on
hashing to compute the specified data distribution. The equi-join
condition contains the bucket columns of the two tables. If the bucket
column of the left table appears in the equi-join condition, the query
will be planned as a Bucket Shuffle Join with high probability.

A join/group operation can use one or more bucket columns. In a
multi-table join, ensure that the left table is a bucketed table.

Currently, only tables based on HDFS storage are supported. Only the
following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.
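As an illustration of the hashing idea above, here is a minimal sketch of how a Hive-compatible bucket id could be derived for integer bucket columns. It is not taken from this patch. It assumes Hive's legacy bucketing scheme, in which an int column hashes to its own value, per-column hashes are combined as 31 * h + field_hash with 32-bit overflow, and the bucket id is the non-negative hash modulo the bucket count. Which Hive hash variant the patch actually follows is not stated here, and the names to_int32, bucket_hash and bucket_id are hypothetical.

```python
INT_MAX = 0x7FFFFFFF


def to_int32(x):
    """Wrap a Python int to a signed 32-bit value (mimics Java int overflow)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def bucket_hash(int_columns):
    """Combine the hashes of the bucket columns (assumed: int columns only).

    Assumption: legacy Hive hashing, where an int hashes to itself and
    columns are combined as 31 * h + field_hash.
    """
    h = 0
    for value in int_columns:
        h = to_int32(31 * h + to_int32(value))
    return h


def bucket_id(int_columns, num_buckets):
    """Map a row's bucket columns to a bucket in [0, num_buckets)."""
    return (bucket_hash(int_columns) & INT_MAX) % num_buckets


if __name__ == "__main__":
    num_buckets = 4
    # Rows with the same join key hash to the same bucket on both sides,
    # so matching rows can meet without reshuffling the bucketed side.
    for key in (1, 5, 6, 11):
        print(key, "->", bucket_id([key], num_buckets))
```

Because both sides of an equi-join apply the same function to the join key, rows that can match always agree on the bucket id, which is why the technique is limited to equality conditions.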

Add new query options as function switches:
ENABLE_BUCKET_SHUFFLE
FRAGMENT_INSTANCE_BUCKET_NUM
BUCKET_EXEC_BACKEND_RATIO

Testing:
- Add e2e tests
- Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,279 insertions(+), 77 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/11
--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
