ori added a comment.
I rebuilt libjemalloc1 with `--prof-enabled`:
```
lang=bash
$ apt-get source libjemalloc1
$ cd jemalloc-3.5.1
$ cat >> debian/rules
override_dh_auto_configure:
dh_auto_configure -- --enable-prof
$ dch --distribution trusty-wikimedia --local wmf "Rebuilt with --enable-prof"
$ sudo dpkg-buildpackage -b -uc
```
Then I installed the resultant package on mw1114 and mw1189, the HHVM API
servers. (It's on the API servers that the leak is most evident.)
I then enabled jemalloc's heap profiling capabilities via `/etc/malloc.conf`.
The way to do it is to make that file a symlink whose target is an option
string rather than an actual file. Strange!
```
lang=sh
$ ln -sf "prof:true,prof_gdump:true,lg_prof_sample:21,prof_prefix:/tmp/heaps/
```
- `prof_gdump` tells jemalloc to dump a memory profile every time the total
virtual memory exceeds the previous maximum.
- `lg_prof_sample` specifies the average interval (log base 2) between
allocation samples, as measured in bytes of allocation activity. 21 = 2 MiB.
This is granular enough to be useful while keeping the computational overhead
light.
- `prof_prefix` sets the filename prefix for profile dumps. If unset, dumps
are written to the process's working directory -- in this case `/run/hhvm`.
`prof_gdump` ends up being less useful than I thought. Because virtual memory
usage is steadily increasing, there is a new high water mark every few seconds,
so I end up with a lot of noise.
The next thing I try is to use the admin server API to request a heap profile
every ten minutes.
```
lang=sh
#!/bin/bash
# Request a heap profile from HHVM once every ten minutes.
# Profiles are written to /tmp/heaps.
sudo -u apache -g apache mkdir -m0755 -p /tmp/heaps
curl "localhost:9002/jemalloc-prof-activate"
for (( i=1; ; i++ )); do
curl "localhost:9002/jemalloc-prof-dump?file=/tmp/heaps/${i}.heap"
sleep 600
done
```
My plan is to use pprof's `--base` arg to subtract a heap profile from its
successor. The basic invocation looks like this:
```
$ google-pprof `which hhvm` --show_bytes --base=1.heap 2.heap
```
This puts you in interactive mode. You can run 'top' to see the top 10
routines, etc.
I hack together a simple shell script to help:
```
lang=sh
#!/bin/bash
for (( i=1; ; i++ )); do
base="$((i)).heap" prof="$((i+1)).heap"
[ -e "$prof" ] || break
echo "-------------------- $base => $prof --------------------"
google-pprof `which hhvm` --show_bytes --text --base="$base" "$prof"
2>/dev/null | head -6
done
```
Here's the output for an hour's worth of heap profiles:
```
-------------------- 1.heap => 2.heap --------------------
Total: 573794313 B
402653184 70.2% 70.2% 402653184 70.2% HPHP::AsyncFuncImpl::start
48235840 8.4% 78.6% 56624736 9.9% xmlNewText
20972160 3.7% 82.2% 94374384 16.4% xmlSAX2StartElement
18913696 3.3% 85.5% 18913696 3.3% 00007fe80e4343b8
16777520 2.9% 88.5% 16777520 2.9% xmlStrndup
-------------------- 2.heap => 3.heap --------------------
Total: 483619254 B
402653184 83.3% 83.3% 402653184 83.3% HPHP::AsyncFuncImpl::start
50333056 10.4% 93.7% 56624648 11.7% xmlNewText
25166592 5.2% 98.9% 25166592 5.2% xmlNewNodeEatName
23069376 4.8% 103.6% 106957784 22.1% xmlSAX2StartElement
8388760 1.7% 105.4% 8388760 1.7% xmlStrndup
-------------------- 3.heap => 4.heap --------------------
Total: 139819215 B
44041360 31.5% 31.5% 50332872 36.0% xmlNewText
43129392 30.8% 62.3% 43129392 30.8% HPHP::MemoryManager::newSlab
12583296 9.0% 71.3% 81790932 58.5% xmlSAX2StartElement
12583296 9.0% 80.3% 12583296 9.0% xmlNewNodeEatName
12582979 9.0% 89.3% 12582979 9.0% xmlStrndup
-------------------- 4.heap => 5.heap --------------------
Total: 1258200573 B
1207959552 96.0% 96.0% 1207959552 96.0% HPHP::AsyncFuncImpl::start
33555344 2.7% 98.7% 39846844 3.2% xmlNewText
12609571 1.0% 99.7% 12609571 1.0% 00007fe80e4343b8
12605141 1.0% 100.7% 12605141 1.0% pcre_compile2
12583296 1.0% 101.7% 12583296 1.0% xmlNewNodeEatName
-------------------- 5.heap => 6.heap --------------------
Total: 129656509 B
35652544 27.5% 27.5% 37749720 29.1% xmlNewText
33176456 25.6% 53.1% 33176456 25.6% HPHP::MemoryManager::newSlab
23069376 17.8% 70.9% 23069376 17.8% xmlNewNodeEatName
14694959 11.3% 82.2% 14694959 11.3% 00007fe80e4343b8
14680512 11.3% 93.5% 81791088 63.1% xmlSAX2StartElement
```
Working with XML could be the most common, memory-intensive things the API
servers do, so the regular appearance of `xmlNewText` in these lists //could//
be innocent, but it looks suspicious to me.
I'm out of time for now, but the next thing I plan to set a breakpoint on
`xmlNewText` in gdb and capture some backtraces.
TASK DETAIL
https://phabricator.wikimedia.org/T820
REPLY HANDLER ACTIONS
Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign
<username>.
To: tstarling, ori
Cc: wikibugs-l, tstarling, ori
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l