Hello Tidy Bot, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9676

to look at the new patch set (#4).

Change subject: WIP: KUDU-1447. Automatically disable THP on process memory
......................................................................

WIP: KUDU-1447. Automatically disable THP on process memory

Per KUDU-1447, we have long seen issues with process stalls when
transparent huge pages are enabled on the system. Though we've tried to
encourage people to disable it, especially on old kernels like el6, it's
hard to enforce that.

This patch applies one of the suggestions from the discussion on that
JIRA. It hooks tcmalloc so that whenever tcmalloc allocates new memory
from the system, we call madvise(MADV_NOHUGEPAGE) on it. On RHEL 7 and
later, this unregisters the memory region from khugepaged's list of
pages to scan. On all operating systems, this ensures that we don't try
to synchronously allocate a huge page when we first touch the memory.
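
The core call described above can be sketched outside of tcmalloc. This is a minimal illustration, not the code from this patch: AllocNoHugePage is a hypothetical helper, and it obtains memory with a plain mmap() rather than through a tcmalloc hook. It only shows where madvise(MADV_NOHUGEPAGE) would be applied to a freshly obtained region.

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not from the patch): map `len` bytes of anonymous
// private memory, then advise the kernel not to back it with transparent
// huge pages. Returns nullptr only if the mapping itself fails.
// A failed madvise() is reported through `advised` but is not treated as
// fatal, since kernels built without THP support reject the advice.
void* AllocNoHugePage(size_t len, bool* advised) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;

  // mmap() returns a page-aligned address, which madvise() requires.
  int rc = madvise(p, len, MADV_NOHUGEPAGE);
  if (advised != nullptr) *advised = (rc == 0);
  if (rc != 0) {
    fprintf(stderr, "madvise(MADV_NOHUGEPAGE) failed: %s\n", strerror(errno));
  }
  return p;
}
```

In an allocator hook, the same madvise() call would be issued on each region the allocator obtains from the system, before the memory is first touched.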

On el6, madvise() doesn't unregister the pages from khugepaged, so we
take a different approach, and actually replace tcmalloc's system
allocator with one that uses MAP_SHARED instead of MAP_PRIVATE.
Transparent huge pages on el6 don't support shared mappings, so this has
the effect of making our memory invisible to khugepaged as well.
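
As a rough sketch of the el6 approach, the only difference from an ordinary anonymous allocation is the mapping type. AllocSharedAnon and FreeSharedAnon are hypothetical names used for illustration; the actual patch plugs its allocator into tcmalloc, which is not shown here.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper (not from the patch): obtain anonymous memory through
// a MAP_SHARED mapping instead of the usual MAP_PRIVATE one.
// Returns nullptr on failure.
void* AllocSharedAnon(size_t len) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Releases a region previously returned by AllocSharedAnon().
// Returns true on success.
bool FreeSharedAnon(void* p, size_t len) {
  return munmap(p, len) == 0;
}
```

One consequence worth keeping in mind with this technique: a MAP_SHARED anonymous region stays shared with any child created by fork() rather than becoming copy-on-write, so a child's writes to it are visible to the parent.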

It's hard to add an automated test, but I started a tserver and then
looked at the 'VmFlags' line in /proc/<pid>/smaps for the heap VM area.
When running with --disable_hugepages=always, I saw the 'nh'
(nohugepages) flag set on the heap. Without that command-line flag, the
heap did not have the 'nh' flag.

On el6, the 'VmFlags' aren't listed in smaps, but I just looked at the
'AnonHugePages' field in this file to verify whether huge pages were or
were not being used.
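
The manual check described above could be scripted along these lines. This is a sketch under assumptions: SmapsSummary and SummarizeSmaps are hypothetical names, and the function summarizes every mapping in the input rather than picking out the heap VM area as I did by hand.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary of an smaps dump (names are illustrative only).
struct SmapsSummary {
  bool any_nohugepage = false;  // some mapping lists 'nh' in its VmFlags line
  long anon_huge_kb = 0;        // sum of all AnonHugePages fields, in kB
};

// Scans smaps-formatted text line by line. Lines whose first token is
// neither "VmFlags:" nor "AnonHugePages:" are ignored.
SmapsSummary SummarizeSmaps(std::istream& in) {
  SmapsSummary out;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    fields >> key;
    if (key == "VmFlags:") {
      std::string flag;
      while (fields >> flag) {
        if (flag == "nh") out.any_nohugepage = true;
      }
    } else if (key == "AnonHugePages:") {
      long kb = 0;
      if (fields >> kb) out.anon_huge_kb += kb;
    }
  }
  return out;
}
```

To inspect a live process, pass a std::ifstream opened on /proc/<pid>/smaps. On el6, where 'VmFlags' is absent, only anon_huge_kb is meaningful.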

The default value for the new flag is "auto", which looks at the kernel
version to determine whether THP is problematic. Based on my research
(described in a comment in the code) it seems kernel 4.6 and later are
not susceptible to THP-related stalls.
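
A minimal sketch of how the three flag values could map to a decision, assuming the kernel release string comes from uname(2). KernelAtLeast and ShouldDisableThp are hypothetical names and the patch's actual logic may differ; in particular, treating an unparseable release string as "old kernel" is my assumption for the sketch.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if a release string such as
// "3.10.0-514.10.2.el7.x86_64" is at least want_major.want_minor.
// Returns false if the leading "major.minor" cannot be parsed.
bool KernelAtLeast(const std::string& release, int want_major, int want_minor) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2) return false;
  if (major != want_major) return major > want_major;
  return minor >= want_minor;
}

// Hypothetical decision function for the flag values "always", "never",
// and "auto". In "auto" mode THP is disabled on kernels older than 4.6;
// an unparseable release is conservatively treated as old (assumption).
bool ShouldDisableThp(const std::string& mode, const std::string& release) {
  if (mode == "always") return true;
  if (mode == "never") return false;
  return !KernelAtLeast(release, 4, 6);
}
```

With the two kernels used in the testing below, "auto" would choose to disable THP on both 2.6.32-504.30.3.el6.x86_64 and 3.10.0-514.10.2.el7.x86_64.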

Tested the effect on latencies using three different configurations:

1) THPBad:

Set up khugepaged in a very aggressive mode. With these settings,
khugepaged will very actively look for page collapsing opportunities.
This exposes worst-case behavior that the default settings may not
trigger in short test runs.

- Set /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/alloc_sleep_millisecs
  to 10
- Set /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/scan_sleep_millisecs
  to 10
- echo 1 > /proc/sys/vm/compact_memory before running the test to ensure
  an even starting point.
- Run the tserver with --disable_hugepages=never

2) THPDef:

Left khugepaged in the default configuration (alloc_sleep_millisecs =
60000, scan_sleep_millisecs = 1000).

Triggered compact_memory before the test as above.

3) NoTHP:

The same as the 'THPBad' setup, except with this patch's functionality
enabled.

I ran the following load from the same host:
./build/latest/bin/kudu perf loadgen --num_rows_per_thread=25000000 \
    -string_len=1000 -num_threads 3 localhost

I recorded some percentiles from three latency histograms (all in microseconds):
- Write: the 'Write' RPC latency
- Reactor: the reactor_active_latency_us histogram
- Apply: the op_apply_run_time_us histogram

Kernel: 2.6.32-504.30.3.el6.x86_64 (RHEL 6)

                mean   95p    99p    99.99p    max
Write(THPBad)   2858   6688   12992  667458    1674187
Write(THPDef)   2300   2864   3408   317440    354110
Write(NoTHP)    2146   2912   3600   145408    154593
Reactor(THPBad) 35     95     179    1368      584342
Reactor(THPDef) 32     78     164    470       348747
Reactor(NoTHP)  25     63     150    388       22818
Apply(THPBad)   1697   5216   11648  421888    1304537
Apply(THPDef)   1227   1744   2192   272384    351778
Apply(NoTHP)    1152   1824   2432   24704     154043

Kernel: 3.10.0-514.10.2.el7.x86_64 (RHEL 7)

TBD

Change-Id: I4e356466f0473546d52763123e7948f2e8756ceb
---
M src/kudu/kserver/kserver.cc
M src/kudu/util/process_memory.cc
M src/kudu/util/process_memory.h
3 files changed, 285 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/9676/4
--
To view, visit http://gerrit.cloudera.org:8080/9676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e356466f0473546d52763123e7948f2e8756ceb
Gerrit-Change-Number: 9676
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
