Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24403

to look at the new patch set (#4).

Change subject: IMPALA-14702: Add ability to build against Google Tcmalloc
......................................................................

IMPALA-14702: Add ability to build against Google Tcmalloc

Impala currently uses Gperftools TCMalloc, which was originally
developed by Google but is now its own open source community.
Google continued development internally and created a new open
source project with their improved version. The biggest changes
are:
 - Google TCMalloc uses Linux RSEQ (restartable sequences) to
   maintain per-CPU caches rather than per-thread caches. This
   avoids stranding memory in inactive threads. It also avoids
   work when threads start and stop.
 - Google TCMalloc adds native huge page support. It backs most
   allocations with huge pages, which can reduce TLB misses.
There are many other changes across many other areas, including
profiling and NUMA support.

This adds support for building against Google TCMalloc. It is
currently controlled by the IMPALA_MALLOC_IMPL environment
variable, which defaults to "gperftools". When set to
"googletcmalloc", it builds against Google TCMalloc. This uses
a custom CMake build of Google TCMalloc with a couple of
patches to make it work. Unlike the regular Google TCMalloc,
this build uses madvise() with MADV_HUGEPAGE so that it works
on systems with only madvise huge page support. Google TCMalloc
requires Abseil, so this adds an Abseil dependency.

Google TCMalloc retains unused memory, and Impala uses the same
integration points as gperftools with aggressive decommit off.
We start a background thread that periodically releases memory.
Unlike gperftools, Google TCMalloc provides a
MallocExtension::ProcessBackgroundActions() function that does
various maintenance actions and releases memory periodically
to control the memory overhead. Rather than implementing our
own logic, we use that logic and rely on its decisions about
retaining memory. We also register a garbage collection function
to free memory immediately when hitting the process memory limit.
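
The memory-limit hook described above can be illustrated with a toy
model. This is only a sketch in Python, not Impala's C++ code: the
class ProcessMemLimit and its methods are hypothetical names, and the
callback stands in for whatever function asks the allocator to release
retained memory.

```python
class ProcessMemLimit:
    """Toy model (hypothetical) of a process memory limit with GC callbacks."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._gc_functions = []

    def add_gc_function(self, fn):
        # fn(bytes_needed) should release memory and return the bytes freed.
        self._gc_functions.append(fn)

    def try_consume(self, nbytes):
        """Account for nbytes, running GC callbacks first if over the limit."""
        excess = self.used_bytes + nbytes - self.limit_bytes
        if excess > 0:
            for fn in self._gc_functions:
                self.used_bytes -= fn(excess)
                excess = self.used_bytes + nbytes - self.limit_bytes
                if excess <= 0:
                    break
        if excess > 0:
            # Still over the limit after garbage collection: refuse.
            return False
        self.used_bytes += nbytes
        return True
```

In this model the periodic background release and the immediate release
are independent: the callback only runs when a request would exceed the
limit.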

Since Google TCMalloc is aware of huge pages, this changes the
buffer pool's madvise_huge_page to avoid using madvise() when
the malloc implementation natively supports huge pages.
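
The decision described above can be sketched as a small predicate.
This is an illustration in Python rather than the actual C++ in
system-allocator.cc; the function name, its parameters, and the set of
implementations treated as huge-page aware are assumptions.

```python
# Assumption: only "googletcmalloc" backs allocations with huge pages natively.
NATIVE_HUGE_PAGE_IMPLS = frozenset(["googletcmalloc"])


def should_madvise_huge_pages(flag_enabled, malloc_impl="gperftools"):
    """Return True if the buffer pool should call madvise(MADV_HUGEPAGE) itself."""
    if not flag_enabled:
        # The user turned the feature off, so never call madvise().
        return False
    # Skip madvise() when the allocator already manages huge pages.
    return malloc_impl not in NATIVE_HUGE_PAGE_IMPLS
```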

Google TCMalloc's per-CPU caches rely on RSEQ support, and
its use of RSEQ currently conflicts with glibc's use of
RSEQ. This disables glibc's use of RSEQ by setting
GLIBC_TUNABLES=glibc.pthread.rseq=0 when using Google TCMalloc
in the dev environment.
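
A launcher script could apply this setting roughly as follows. This is
a minimal sketch, not the code in this change: the helper names are
hypothetical, and it assumes any existing GLIBC_TUNABLES value should
be preserved. GLIBC_TUNABLES is a colon-separated list of name=value
pairs, so the sketch replaces any existing glibc.pthread.rseq entry
rather than overwriting the whole variable.

```python
RSEQ_TUNABLE = "glibc.pthread.rseq"


def with_glibc_rseq_disabled(env):
    """Return a copy of env with glibc.pthread.rseq=0 in GLIBC_TUNABLES."""
    new_env = dict(env)
    existing = new_env.get("GLIBC_TUNABLES", "")
    # Keep unrelated tunables; drop any previous rseq setting.
    tunables = [
        t for t in existing.split(":")
        if t and not t.startswith(RSEQ_TUNABLE + "=")
    ]
    tunables.append(RSEQ_TUNABLE + "=0")
    new_env["GLIBC_TUNABLES"] = ":".join(tunables)
    return new_env


def env_for_malloc_impl(env):
    """Disable glibc's RSEQ use only when Google TCMalloc is selected."""
    if env.get("IMPALA_MALLOC_IMPL", "gperftools") == "googletcmalloc":
        return with_glibc_rseq_disabled(env)
    return dict(env)
```

The result of env_for_malloc_impl(os.environ) could then be passed as
the env argument when starting a daemon with subprocess.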

There will be future changes to package this properly.

Testing:
 - Ran a core job with IMPALA_MALLOC_IMPL=googletcmalloc

Change-Id: I5a84eacb66eb0a216bfb2159542a0d7e4ddf8ec2
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/global-flags.cc
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/runtime/bufferpool/system-allocator.h
M be/src/util/CMakeLists.txt
A be/src/util/malloc-util-googletcmalloc.h
M be/src/util/malloc-util.cc
M be/src/util/malloc-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/run-binary.sh
M bin/run-jvm-binary.sh
M bin/start-impala-cluster.py
M common/thrift/metrics.json
M tests/common/environ.py
M tests/common/skip.py
M tests/custom_cluster/test_malloc_impls.py
18 files changed, 449 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/24403/4
--
To view, visit http://gerrit.cloudera.org:8080/24403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5a84eacb66eb0a216bfb2159542a0d7e4ddf8ec2
Gerrit-Change-Number: 24403
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
