Hello Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24402
to look at the new patch set (#4).
Change subject: IMPALA-14900: Add support for turning off aggressive decommit
......................................................................
IMPALA-14900: Add support for turning off aggressive decommit
Impala has used TCMalloc's aggressive decommit setting
for several years, but it increases the OS allocation /
deallocation rate and can lead to contention on TCMalloc's
central structures. Many pieces of code still rely on
malloc for their memory, including performance-sensitive
parts of query execution. Retaining some malloc memory can
accelerate those codepaths by avoiding OS allocation /
deallocation cycles. TCMalloc holds a lock while
allocating from and returning memory to its central
structures, so retaining memory can also avoid extreme
cases of high lock contention. For example, there have
been previous issues with 1MB Parquet data pages, because
such large allocations bypass the thread caches and come
directly from the central structures.
There is a long history behind this setting:
- As of late 2015 / Impala 2.3, Impala would let tcmalloc
retain memory. It had two mechanisms for releasing
memory. The first was a periodic check to see if the
overhead of tcmalloc exceeded the memory used. The
second was a garbage collection function that ran
when hitting the process memory limit. Both mechanisms
would free ALL excess tcmalloc heap memory via a single
call to ReleaseFreeMemory(). TCMalloc holds a lock for
the duration of this call, which can stall other work
until it completes. When freeing dozens of GBs, this
could hold the lock for 15 seconds. This issue was
reported via IMPALA-2800.
- In IMPALA-3162, Impala moved to gperftools 2.5, which
had aggressive decommit enabled by default. This frees
memory immediately, so the mechanisms to free memory
had nothing to do. This solved IMPALA-2800. The obsolete
code for the periodic check and the garbage collection
function was removed in IMPALA-5220.
- Gperftools only had aggressive decommit enabled by
default for a short period of time. It was enabled by
default in 2.4 and was disabled by default in 2.6.
- When Impala later upgraded gperftools, we added code
to explicitly enable aggressive decommit.
This adds back an option to turn off aggressive decommit.
The shape is similar to the old mechanisms: there is a
background thread doing a periodic check to manage the
memory overhead and a garbage collection function that
gets called when hitting the process memory limit. This
has been redesigned to avoid the issue from IMPALA-2800
(based on an early approach to IMPALA-2800 by Todd Lipcon):
- Both enforcement locations free a specific amount of
memory rather than all accumulated memory (i.e. they
call ReleaseToSystem() with a target number of bytes to
free). The background thread caps the free memory
overhead at the amount specified by the
tcmalloc_max_free_bytes startup option. This can be an
absolute value or a percentage of the process memory
limit, and it defaults to 5% of the process memory
limit. The garbage collection function frees enough
memory to avoid hitting the process memory limit, plus
a bit extra (512MB) so that it is not called too
frequently.
- Both enforcement locations free memory in small chunks
to avoid holding the lock for extended periods of time.
The chunk size is specified by the tcmalloc_garbage_collection_chunk_size
startup option and defaults to 10MB.
- Compared to the old mechanisms, this retains
significantly less memory and frees it without holding
the lock for extended periods of time.
- Other things have changed since then: the buffer pool
retains memory and frees it gradually over time, which
further reduces the need to free a large amount of
memory at once.
Turning off aggressive decommit is currently incompatible
with the madvise_huge_pages=true startup option. This
modifies the startup check so that aggressive decommit
can be false only if madvise_huge_pages is false. A
future change may provide a way to mmap huge buffers so
that the two can work together.
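The compatibility rule in the startup check can be expressed as a small predicate. This is a hedged C++ sketch, not the patch's code: CheckDecommitFlags and its error text are hypothetical, and the two booleans stand for the aggressive decommit setting and the madvise_huge_pages startup option.

```cpp
#include <string>

// Hypothetical sketch of the startup compatibility check. The real check in
// this patch may be structured differently.
// Returns an empty string when the combination is allowed, otherwise an error.
std::string CheckDecommitFlags(bool aggressive_decommit, bool madvise_huge_pages) {
  // Turning off aggressive decommit is only allowed when huge-page madvise is off.
  if (!aggressive_decommit && madvise_huge_pages) {
    return "Turning off TCMalloc aggressive decommit is incompatible with "
           "madvise_huge_pages=true";
  }
  return "";
}
```

Of the four flag combinations, only aggressive decommit off together with madvise_huge_pages=true is rejected.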
This adds the --tcmalloc_aggressive_decommit option to
bin/start-impala-cluster.py to make it easier to start
the cluster with aggressive decommit turned off. The
default value is determined by the
IMPALA_TCMALLOC_AGGRESSIVE_DECOMMIT environment variable,
so existing cluster tests can be run with aggressive
decommit off.
Testing:
- Added a custom cluster test to run TPC-DS with tcmalloc
aggressive decommit off
- Ran a core job with IMPALA_TCMALLOC_AGGRESSIVE_DECOMMIT=false
Change-Id: If6022f14093f362a5de9a854f4f4496c90b049b8
---
M be/src/common/daemon-env.cc
M be/src/common/global-flags.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/test-env.cc
M be/src/util/malloc-util-gperftools.h
M be/src/util/malloc-util-libc.h
M be/src/util/malloc-util-sanitizers.h
M be/src/util/malloc-util.h
M be/src/util/metrics-test.cc
M bin/start-impala-cluster.py
M tests/common/environ.py
M tests/common/skip.py
A tests/custom_cluster/test_malloc_impls.py
17 files changed, 299 insertions(+), 58 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/24402/4
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6022f14093f362a5de9a854f4f4496c90b049b8
Gerrit-Change-Number: 24402
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>