Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24402

to look at the new patch set (#4).

Change subject: IMPALA-14900: Add support for turning off aggressive decommit
......................................................................

IMPALA-14900: Add support for turning off aggressive decommit

Impala has used TCMalloc's aggressive decommit setting
for several years, but it increases the OS allocation /
deallocation rate and can lead to contention on TCMalloc's
central structures. There are many pieces of code that
still rely on malloc for their memory, including
performance-sensitive pieces of query execution. Retaining
some malloc memory can accelerate those codepaths by
avoiding OS allocation / deallocation cycles. TCMalloc
holds a lock while allocating and deallocating memory,
and retaining memory can also avoid extreme cases with
high lock contention. For example, there have been previous
issues using 1MB Parquet data pages, because the large
allocations bypass the thread caches and come directly
from the central structures.

There is a long history behind this setting:
 - As of late 2015 / Impala 2.3, Impala would let tcmalloc
   retain memory. It had two mechanisms for releasing
   memory. The first was a periodic check to see if the
   overhead of tcmalloc exceeded the memory used. The
   second was a garbage collection function that ran
   when hitting the process memory limit. Both mechanisms
   would free ALL excess tcmalloc heap memory via a single
   call to ReleaseFreeMemory(). TCMalloc holds a lock for
   the duration of this call, which can stall other work
   until it completes. When freeing dozens of GBs, the call
   could hold the lock for 15 seconds. This issue was
   reported via IMPALA-2800.
 - In IMPALA-3162, Impala moved to gperftools 2.5, which
   had aggressive decommit enabled by default. This frees
   memory immediately, so the mechanisms to free memory
   had nothing to do. This solved IMPALA-2800. The obsolete
   code for the periodic check and garbage collection
   function was removed in IMPALA-5220.
 - Gperftools only had aggressive decommit enabled by
   default for a short period of time. It was enabled by
   default in 2.4 and was disabled by default in 2.6.
 - When Impala upgraded gperftools later, we added code
   to manually set aggressive decommit.

This adds back an option to turn off aggressive decommit.
The shape is similar to the old mechanisms: there is a
background thread doing a periodic check to manage the
memory overhead and a garbage collection function that
gets called when hitting the process memory limit. This
has been redesigned to avoid the issue from IMPALA-2800
(based on an early approach to IMPALA-2800 by Todd Lipcon):
 - Both enforcement locations free a specific amount of
   memory rather than all accumulated memory (i.e. they
   call ReleaseToSystem() with a target amount of memory
   to free). The background thread keeps the overhead within
   the limit specified by the tcmalloc_max_free_bytes startup
   option. This can be an absolute value or a percentage of
   the process memory limit. It defaults to 5% of the process
   memory limit. The garbage collection function frees
   enough memory to avoid hitting the process memory limit,
   plus a bit extra (512MB) to avoid calling the GC function
   too frequently.
 - Both enforcement locations free memory in small chunks
   to avoid holding the lock for extended periods of time.
   The chunk size is specified by the tcmalloc_garbage_collection_chunk_size
   startup option and defaults to 10MB.
 - The implementation retains significantly less memory and
   frees it without holding the lock for extended periods of
   time.
 - Other things have changed since then: The buffer pool
   retains memory and frees it gradually over time. This also
   reduces the need for freeing a large amount of memory
   immediately.
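
The release logic described above can be sketched roughly as
follows. This is an illustrative Python model, not the C++ code
in this change: the 5% default, the 10MB chunk size, and the
512MB margin come from the description, while the function
names, the accepted "N%" syntax, and the exact arithmetic are
assumptions. release_fn stands in for a call such as
ReleaseToSystem().

```python
MB = 1024 * 1024
GC_EXTRA_BYTES = 512 * MB  # extra margin freed by the GC function


def resolve_max_free_bytes(spec, process_mem_limit):
    """Interpret a limit given as plain bytes or as 'N%' of the process limit.

    The accepted syntax is an assumption made for this sketch.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        return int(process_mem_limit * float(spec[:-1]) / 100)
    return int(spec)


def background_bytes_to_free(free_bytes, max_free_bytes):
    """Periodic check: free only the amount that exceeds the allowed overhead."""
    return max(0, free_bytes - max_free_bytes)


def gc_bytes_to_free(bytes_needed, free_bytes):
    """GC function: free what is needed plus a margin, capped at what is free."""
    return min(free_bytes, bytes_needed + GC_EXTRA_BYTES)


def release_in_chunks(bytes_to_free, chunk_size, release_fn):
    """Release memory in bounded steps so no single call holds the lock long.

    Returns the number of calls made to release_fn.
    """
    calls = 0
    remaining = bytes_to_free
    while remaining > 0:
        step = min(chunk_size, remaining)
        release_fn(step)  # hypothetical stand-in for ReleaseToSystem(step)
        remaining -= step
        calls += 1
    return calls
```

For example, freeing 25MB with a 10MB chunk size results in
three release calls of 10MB, 10MB, and 5MB, so other threads
get a chance to take the lock between calls.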

Turning off aggressive decommit is currently incompatible with
the madvise_huge_pages=true startup option. This modifies the
startup check so that aggressive decommit can be false if
madvise_huge_pages is false. A future change may provide a
way to mmap huge buffers to allow these to work together.
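
As a rough illustration of the startup check, the rule amounts
to rejecting only one of the four flag combinations. The sketch
below is a Python model with a hypothetical function name and
error message; the actual check lives in the C++ startup code.

```python
def check_malloc_flags(aggressive_decommit, madvise_huge_pages):
    """Reject the one unsupported combination of the two startup options.

    Hypothetical helper: the name and the error text are assumptions.
    """
    if madvise_huge_pages and not aggressive_decommit:
        raise ValueError(
            "madvise_huge_pages=true requires aggressive decommit to be enabled")


# Allowed: aggressive decommit off, as long as huge pages are off too.
check_malloc_flags(aggressive_decommit=False, madvise_huge_pages=False)
```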

This adds the --tcmalloc_aggressive_decommit option to
bin/start-impala-cluster.py to make it easier to start up
the cluster with this setting. The default value is
determined by the IMPALA_TCMALLOC_AGGRESSIVE_DECOMMIT
environment variable, so this makes it possible to run
cluster tests with this option.
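
A minimal sketch of an option whose default comes from an
environment variable is shown below. It uses argparse for
illustration and assumes that an unset variable means "true";
the parser library, the helper names, and the accepted boolean
spellings are assumptions, not the script's actual code.

```python
import argparse
import os


def parse_bool(text):
    """Parse 'true'/'false' (case-insensitive); anything else is an error."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % text)


def build_parser(environ=None):
    """Build a parser whose default is taken from the environment."""
    environ = os.environ if environ is None else environ
    # Assumption: aggressive decommit stays on when the variable is unset.
    default = parse_bool(
        environ.get("IMPALA_TCMALLOC_AGGRESSIVE_DECOMMIT", "true"))
    parser = argparse.ArgumentParser()
    parser.add_argument("--tcmalloc_aggressive_decommit", type=parse_bool,
                        default=default,
                        help="Whether tcmalloc aggressive decommit is enabled.")
    return parser
```

With this shape, an explicit command-line value overrides the
environment, and the environment only supplies the default.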

Testing:
 - Added a custom cluster test to run TPC-DS with tcmalloc
   aggressive decommit off
 - Ran a core job with IMPALA_TCMALLOC_AGGRESSIVE_DECOMMIT=false

Change-Id: If6022f14093f362a5de9a854f4f4496c90b049b8
---
M be/src/common/daemon-env.cc
M be/src/common/global-flags.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/test-env.cc
M be/src/util/malloc-util-gperftools.h
M be/src/util/malloc-util-libc.h
M be/src/util/malloc-util-sanitizers.h
M be/src/util/malloc-util.h
M be/src/util/metrics-test.cc
M bin/start-impala-cluster.py
M tests/common/environ.py
M tests/common/skip.py
A tests/custom_cluster/test_malloc_impls.py
17 files changed, 299 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/24402/4
--
To view, visit http://gerrit.cloudera.org:8080/24402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6022f14093f362a5de9a854f4f4496c90b049b8
Gerrit-Change-Number: 24402
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
