Stuart Henderson wrote:
> On 2016/03/10 19:18, Stefan Kempf wrote:
> > There's still at least one issue with the diff. Again in amap_extend().
> > The slotalloc computation was still off :-(
>
> It's not perfect but this is very significantly better. I've put
> it under load and the machine is still ok.
Cool. I have an idea why processes still become unresponsive. See
the comments on the vmstat output below.
> Here's a snapshot of a top display:
>
> 46 processes: 1 running, 43 idle, 2 on processor up 6:27
> CPU0 states: 0.0% user, 0.0% nice, 69.2% system, 0.0% interrupt, 30.8% idle
> CPU1 states: 0.0% user, 0.0% nice, 92.3% system, 0.0% interrupt, 7.7% idle
> Memory: Real: 3211M/5422M act/tot Free: 6540M Cache: 153M Swap: 0K/2640M
>
> PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 61052 sthen -18 0 2472M 1505M onproc - 63:51 54.39% perl
> 86140 sthen -18 0 2308M 1732M run - 57:39 52.54% perl
> 47388 sthen 28 0 1284K 2480K onproc - 0:45 1.03% top
> 59819 _pflogd 4 0 684K 260K sleep bpf 0:15 0.73% pflogd
> 89941 sthen 2 0 3552K 1780K sleep select 0:13 0.15% sshd
> 46513 _ntp 2 -20 1240K 1456K sleep poll 0:08 0.05% ntpd
> 89307 root 2 -20 812K 1596K idle poll 3:44 0.00% ntpd
> 78543 sthen 2 0 2536K 1784K sleep poll 0:10 0.00% systat
> 21968 sthen 2 0 3568K 1772K idle select 0:08 0.00% sshd
> 10538 sthen 2 0 9220K 8400K sleep kqread 0:05 0.00% tmux
> 45455 sthen 2 0 3564K 1716K sleep select 0:01 0.00% sshd
> 1 root 10 0 440K 4K idle wait 0:01 0.00% init
> 74816 sthen 2 0 3564K 1736K sleep select 0:01 0.00% sshd
> 39769 _syslogd 2 0 1036K 1144K idle kqread 0:00 0.00% syslogd
>
> At this point the big perl processes were flipping between onproc/run
> and sleep with wchan "fltamap".
>
> The not-so-good bit: I couldn't kill them by signalling with ^C ^\ or
> kill(1), however I was able to get them to stop by killing the tmux they
> were running in. But without the diff this would have been a machine
> hang for sure.
>
> Below I include vmstat -m output, firstly from around the point of
> the top(1) run above, secondly after the processes had exited.
>
> Given the area more eyes are definitely wanted but it reads well to
> me, and I'm happy from my testing that this is a huge improvement,
> so: OK sthen.
Thanks. Agreed, any additional review and testing is useful,
also to ensure it doesn't break stuff.
> I have one nit, "systat pool" normally shows only 11 chars so
> you get "amapslotpl1, amapslotpl1, amapslotpl1, ... amapslotpl2" etc.
> I considered renaming to amapslpl1, 2, .. but that's pretty ugly,
> and since vmstat -m displays 12 chars, I propose this:
I think your diff is ok.
> Index: pool.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/systat/pool.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 pool.c
> --- pool.c 16 Jan 2015 00:03:37 -0000 1.10
> +++ pool.c 11 Mar 2016 21:09:35 -0000
> @@ -51,7 +51,7 @@ struct pool_info *pools = NULL;
>
>
> field_def fields_pool[] = {
> - {"NAME", 11, 32, 1, FLD_ALIGN_LEFT, -1, 0, 0, 0},
> + {"NAME", 12, 32, 1, FLD_ALIGN_LEFT, -1, 0, 0, 0},
> {"SIZE", 8, 24, 1, FLD_ALIGN_RIGHT, -1, 0, 0, 0},
> {"REQUESTS", 8, 24, 1, FLD_ALIGN_RIGHT, -1, 0, 0, 0},
> {"FAIL", 8, 24, 1, FLD_ALIGN_RIGHT, -1, 0, 0, 0},
>
>
> |... during run
>
> <sthen@stamp1:~:295>$ vmstat -m
> Memory statistics by bucket size
> Size In Use Free Requests HighWater Couldfree
> 16 2498 318 8882 1280 12
> 32 1038 114 4266 640 0
> 64 52460 20 114795 320 4544
> 128 129653 43 508458 160 68133
> 256 132 28 3161 80 0
> 512 120414 2 813289 40 40368
> 1024 307 25 60779 20 23820
> 2048 232 58 1439 10 284
> 4096 122 5 110326 5 45531
> 8192 206 0 242 5 0
> 16384 4 0 5 5 0
> 32768 9 0 10 5 0
> 65536 3 0 511 5 0
> 131072 2 0 2 5 0
The 64, 128 and 512 buckets are the biggest consumers.
>
> Memory statistics by type Type Kern
> Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
> [...]
> UVM amap 292647 78645K 78645K 78644K 1198929 0 0
> 16,32,64,128,256,512,1024,2048,4096,8192
> [...]
Currently, amap per-page reference counters (ppref) and the per-page slots
for amaps with > 16 slots are still allocated with malloc(9).
Your machine is still hitting the ~76M limit on malloc'd amap memory.
That's why you still see processes waiting in fltamap (which would
explain their unresponsiveness). But the machine seems able to recover
repeatedly now, since most amap memory is allocated from pools.

The 64 and 128 byte buckets must hold memory for pprefs: one ppref is
4 bytes, so these buckets can hold 16 and 32 pprefs respectively.
They must belong to amaps with 16 or 32 slots; the slot memory for
16-slot amaps already comes from pool(9).
Managing one slot currently takes 16 bytes on amd64, so the 512 byte
bucket must hold slot memory for amaps with 32 slots.
Large (> 16 slot) amaps are allocated:
- when the init process is created
- when loading an ELF segment into memory during exec()
- when a process calls sbrk() with a size of more than 16 pages
- when copying an amap for shared mappings between two processes
Our perl does sbrk() calls, which would explain the 512 byte
bucket being used for amap memory.
I have a diff that makes sbrk() allocate smaller amap slot chunks lazily.
That would reduce kmem pressure some more, because sbrk regions would
then use the amap slot pools. I'll post that one separately.
We could also allocate pprefs for amaps with <= 16 slots from pool(9),
but that would be a separate diff as well.
> Memory Totals: In Use Free Requests
> 83372K 185K 1626165
> Memory resource pool statistics
> Name         Size   Requests Fail   InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> [...]
> amappl         72  240636456    0  776218 28852 10878 17974 18648     0    75    1
> amapslotpl1    16    1515614    0  588746  5760  2719  3041  3556     0    17    0
> amapslotpl2    32      39438    0    8389    72     4    68    68     0    34    0
> amapslotpl3    48       1375    0     305     8     3     5     6     0    50    0
> amapslotpl4    64       3373    0     839    28    12    16    19     0    67    0
> amapslotpl5    80        728    0     123     5     1     4     4     0    84    0
> amapslotpl6    96        168    0      22     2     1     1     1     0   100    0
> amapslotpl7   112        632    0      49     2     0     2     2     0   118    0
> amapslotpl8   128        362    0      86     4     1     3     4     0   133    0
> amapslotpl9   144        280    0      55     8     5     3     4     0   152    0
> amapslotpl10  160        457    0      37     2     0     2     2     0   171    0
> amapslotpl11  176        211    0      53     4     1     3     3     0   187    0
> amapslotpl12  192        131    0      11     1     0     1     1     0   205    0
> amapslotpl13  208         86    0      15     1     0     1     1     0   216    0
> amapslotpl14  224         81    0       7     1     0     1     1     0   241    0
> amapslotpl15  240        208    0      32     4     2     2     2     0   241    0
> amapslotpl16  256     202599    0   57473  6304  2711  3593  3593     0   256    0