Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5844: use a MemPool for expr local allocations ......................................................................
IMPALA-5844: use a MemPool for expr local allocations Local allocations in expressions have the following properties: * They are usually small allocations * They can be made frequently (e.g. every function call) * They are owned and managed by the Impala runtime * They are freed in bulk at various points in query execution. A MemPool (i.e. bump allocator) is the right mechanism to manage allocations with the above properties. Before this patch FunctionContext's used a FreePool + vector of allocations to emulate the above behaviour. This patch switches to using a MemPool to bring these allocations in line with the rest of the codebase. The steps required to do this conversion. * Use a MemPool for FunctionContext local allocations. * Identify appropriate MemPools for all of the local allocations from function contexts so that the memory lifetime is correct. * Various cleanup and documentation of existing MemPools. * Replaces calls to FreeLocalAllocations() with calls to MemPool::Clear() More involved surgery was required in a few places: * Made the Sorter own its comparator, exprs and MemPool. * Remove FunctionContext::ReallocateLocal() and just have StringFunctions::Replace() do the doubling itself to avoid the need for a special interface. Worst-case this doubles the memory requirements for Replace() since n / 2 + n / 4 + n / 8 + .... bytes of memory could be wasted instead of recycled for an n-byte output string. * Provide a way redirect agg fn Serialize()/Finalize() allocations to come directly from the output RowBatch's MemPool. This is also potentially applicable to other places where we currently copy out strings from local allocations, e.g. AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs(). * --stress_free_pool_alloc was changed to instead intercept at the FunctionContext layer so that it retains the old behaviour even though allocations do not all come from FreePools. Testing: * ran exhaustive and ASAN Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/common/global-flags.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/plan-root-sink.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/case-expr.cc M be/src/exprs/expr-test.cc M be/src/exprs/hive-udf-call.cc M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/string-functions-ir.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/descriptors.cc D be/src/runtime/free-pool.cc M be/src/runtime/free-pool.h M be/src/runtime/sorted-run-merger.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/tuple.cc M be/src/service/fe-support.cc M be/src/udf/udf-internal.h M be/src/udf/udf-test-harness.cc M be/src/udf/udf.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M tests/custom_cluster/test_alloc_fail.py 72 files changed, 621 insertions(+), 741 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/8025/6 -- To view, visit http://gerrit.cloudera.org:8080/8025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]>
