Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()
......................................................................

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 20.03   | -6.44%     | 13.56      | -7.15%         |
+------------+-----------------------+---------+------------+------------+----------------+

+---------------------+-----------------------+---------+------------+------------+----------------+
| Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+---------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56%     | 12.28      | -5.30%         |
+---------------------+-----------------------+---------+------------+------------+----------------+

+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload                | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10%     | 7.16       | -4.33%         |
+-------------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02%     | 17.46      | -4.71%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
A testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M tests/query_test/test_codegen.py
25 files changed, 427 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/2
--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
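
Illustrative note on the change description above: the two specializations it names (unrolling the loop over the partitioning expressions, and turning the sender's channel count into a constant) can be sketched roughly as follows. This is only an analogy, not code from the patch. Impala's codegen rewrites LLVM IR at query run time, whereas this sketch imitates the effect with C++17 templates at compile time. The names `Row`, `HashCombine`, `GenericChannel`, and `SpecializedChannel` are hypothetical, and the toy hash stands in for whatever hash function the sender actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: three fixed-width integer columns.
using Row = std::array<std::uint32_t, 3>;

// Toy mixing step (assumption); not the hash used by the real sender.
inline std::uint32_t HashCombine(std::uint32_t seed, std::uint32_t value) {
  return (seed ^ value) * 16777619u;
}

// Interpreted path: the partitioning columns and the channel count are
// only known at run time, so the loop and the modulo stay fully generic.
inline std::uint32_t GenericChannel(const Row& row,
                                    const std::vector<std::size_t>& part_cols,
                                    std::uint32_t num_channels) {
  assert(num_channels > 0);
  std::uint32_t h = 2166136261u;
  for (std::size_t col : part_cols) h = HashCombine(h, row[col]);
  return h % num_channels;
}

// Specialized path: the columns and the channel count are constants, so
// the loop is unrolled (via a fold expression) and the compiler can
// simplify the modulo by a known divisor.
template <std::uint32_t kNumChannels, std::size_t... kCols>
std::uint32_t SpecializedChannel(const Row& row) {
  static_assert(kNumChannels > 0, "need at least one channel");
  std::uint32_t h = 2166136261u;
  ((h = HashCombine(h, row[kCols])), ...);  // one step per column, in order
  return h % kNumChannels;
}
```

Under these assumptions, `SpecializedChannel<16, 0, 2>(row)` picks the same channel as `GenericChannel(row, {0, 2}, 16)` for any row; only the amount of work left for run time differs.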