the pool cache code counts the number of times the mutex on the global info was already held, which we can interpret as demand. in response to this demand we can grow the number of items a cpu can cache, which in turn should reduce the demand/contention on the global info.
the contention is measured in the gc tick task, which runs once a
second. if the cache mtx was contended more than 8 times in that
interval, we consider growing the list length.
rather than doubling the list length, this adds 8 items each time.
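as a rough user-space sketch of the tick logic described above (not the kernel code): the struct and function names here are made up for illustration, and the memory limit is deliberately left out.

```c
/* hypothetical stand-in for the relevant struct pool fields */
struct cache_model {
	unsigned int items;		/* target list length */
	unsigned int contention;	/* running count of contended mtx enters */
	unsigned int contention_prev;	/* count seen at the previous tick */
};

#define CONTENTION_THRESHOLD	8	/* the "magic" number */
#define GROW_STEP		8	/* additive growth, not doubling */

/*
 * called once per tick. returns 1 if the list length grew.
 * the unsigned subtraction still gives the right delta if the
 * running counter wraps.
 */
int
cache_model_tick(struct cache_model *cm)
{
	unsigned int contention = cm->contention;
	int grew = 0;

	if ((contention - cm->contention_prev) > CONTENTION_THRESHOLD) {
		cm->items += GROW_STEP;
		grew = 1;
	}
	cm->contention_prev = contention;

	return (grew);
}
```

note the comparison is strictly greater than 8, so exactly 8 contended enters in a tick does not grow the list.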
ive placed a limit on how much the list can grow to try and avoid
having cpus starve each other. basically, we look at how many pool
items the current set of pages can provide, and compare that to how
many items the cpus could cache if we grew the list.
this means that the length of the lists is bounded by the amount of
memory available to the pools, and is proportional to the number
of cpus in the system.
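the limit check can be sketched on its own like this. it is a standalone model with a hypothetical function name, taking the page count, items per page and cpu count as plain arguments rather than reading them from the pool.

```c
/*
 * hypothetical helper: return the list length after one growth
 * attempt. the list only grows if every cpu could fill both of its
 * lists at the new length without exceeding what the current pages
 * can provide.
 */
unsigned int
cache_grow_limited(unsigned int items, unsigned int npages,
    unsigned int itemsperpage, unsigned int ncpus)
{
	unsigned int limit = npages * itemsperpage;
	unsigned int grown = items + 8;
	unsigned int cache = ncpus * grown * 2;

	return ((cache < limit) ? grown : items);
}
```

because the bound is computed from the pages the pool currently holds, it moves as the pool allocates or releases pages.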
this semantic may be wrong or bad, but id like to put it in the
tree and see how we go. this is the last major chunk of pool work
for a while, which should mean if it doesnt work out it will be
simple to revert or disable.
NAME     LEN  NL NGC CPU        REQ        REL    LREQ    LREL
mbufpl    56  14  38   0   49972265  164731885    2015 2420679
                       1   33376057    7886458  544004    6510
                       2   70548537   26047463  979997   25398
                       3   63359512   22846575  871000   23154
                       4    5978337    1821032   80711    4033
                       5     157443      69606    2032     212
                       6      12656       5297     195      29
                       7       2526       2126      19      11
mtagpl     8   0   0   0          0          0       0       0
                       1          0          0       0       0
                       2          0          0       0       0
                       3          0          0       0       0
                       4          0          0       0       0
                       5          0          0       0       0
                       6          0          0       0       0
                       7          0          0       0       0
mcl2k      8   1  22   0          8         11       0       0
                       1       2458       1031     179       0
                       2       2288       3258      13     133
                       3       2253       3108      17     123
                       4       1061        873      36      12
                       5         47         55       1       1
                       6          6          7       0       0
                       7          1          5       0       0
mcl2k2    56   3  27   0   49935396       7311 1701468       1
                       1        865    6802697       0  223701
                       2        602   22136816       0  774735
                       3        595   19413129       0  655773
                       4        161    1515461       0   45021
                       5          2      57970       0    2071
                       6          0       4204       0     153
                       7          0       1656       0      43
mcl4k      8   4  53   0         93         64      10       5
                       1     120770      43303   11392    1708
                       2      98822     155361    7703   14770
                       3     111419     135419    9456   12454
                       4      24665      21605    1670    1287
                       5        434        942      21      83
                       6         57         78       3       5
                       7          4         19       0       0
mcl8k      8  14  63   0       1154        188     142      20
                       1     796165     297379   93094   30745
                       2     784731    1153063   91364  137405
                       3     841866    1057558   98997  125957
                       4     247836     161689   28512   17743
                       5       3796       5849     434     689
                       6        386        567      43      65
                       7          7        332       0      39
mcl9k      8   3  29   0         26         11       3       0
                       1      83201      20680    8693     877
                       2      60037      99224    5197   10095
                       3      64869      88702    5807    8785
                       4       3870       3575     135      98
                       5         77        180       1      12
                       6         10         27       0       1
                       7          0          8       0       0
mcl12k     8   1  27   0         36         35       3       1
                       1     123459      34162   12719    1556
                       2      86304     147409    7419   15056
                       3      97288     124924    8853   12306
                       4       6369       7108     205     296
                       5        125        237       3      15
                       6         18         26       1       1
                       7          0          9       0       0
mcl16k     8   7  35   0        455        547      55      65
                       1     845535     244604  101335   26218
                       2     722711    1219333   85386  147463
                       3     903736    1004833  108363  120999
                       4      15360      18394     817    1196
                       5        298        707      11      61
                       6         33         95       0       6
                       7          4         26       0       1
mcl64k     8   2  27   0        441        351      52      39
                       1    1108484     440270  128271   44744
                       2     708723    1129795   78252  130886
                       3     751146    1016019   84471  117579
                       4     110388      91667   10780    8438
                       5       2106       3545     200     378
                       6        273        199      27      17
                       7         57         66       4       5
NAME      SIZE  REQUESTS FAIL INUSE PGREQ PGREL NPAGE HIWA
mbufpl     256   219596K    0   300    47     0    47   47
mcl12k   12288    315686    0     0    11     0    11   11
mcl16k   16384   2502660    0    12    18     1    17   17
mcl2k     2048      8400    0     3    10     1     9   10
mcl2k2    2112  50265456    0    82    63     1    62   62
mcl4k     4096    359165    0     3    17     1    16   17
mcl64k   65536   2700619    0     8    13     0    13   13
mcl8k     8192   2695280    0     9    28     0    28   28
mcl9k     9216    213441    0     0     7     0     7    7
this is a sparc64 with ix. relative to growing the lists, the mbuf
numbers are the interesting ones.
mbufs are 256 bytes, and they come from 8k pool pages, so there's
32 mbufs per page. NPAGE of 47 * 32 means the pool is managing
1504 mbufs (both free and allocated ones). there's 8 cpus with 2
possible lists of items (the actv and prev lists), so 16 lists in
total. 1504 mbufs divided by this number of lists tells us how
long those lists can grow without the cpus starving each other. so
1504 / 16 is 94 (or 88 if we stick to the multiples of 8 we
grow the lists by).
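the arithmetic above can be checked with a small standalone sketch. the function name is hypothetical, and the starting length of 8 is an assumption based on the LEN column of the idle pools in the first table.

```c
/*
 * hypothetical helper: keep growing the list length by 8 for as long
 * as every cpu could still fill both of its lists from the items the
 * current pages provide, and return where it stops.
 */
unsigned int
cache_items_ceiling(unsigned int start, unsigned int npages,
    unsigned int itemsperpage, unsigned int ncpus)
{
	unsigned int limit = npages * itemsperpage;
	unsigned int items = start;

	while (ncpus * (items + 8) * 2 < limit)
		items += 8;

	return (items);
}
```

for mbufpl on this machine that would be cache_items_ceiling(8, 47, 32, 8), which stops at 88: 8 cpus * 88 items * 2 lists is 1408, under the 1504 limit, while 96 would need 1536.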
anyway, ok?
Index: sys/pool.h
===================================================================
RCS file: /cvs/src/sys/sys/pool.h,v
retrieving revision 1.71
diff -u -p -r1.71 pool.h
--- sys/pool.h 16 Jun 2017 01:55:45 -0000 1.71
+++ sys/pool.h 16 Jun 2017 03:03:03 -0000
@@ -189,6 +189,7 @@ struct pool {
 	u_int		pr_cache_nlist;	/* # of idle lists */
 	u_int		pr_cache_items;	/* target list length */
 	u_int		pr_cache_contention;
+	u_int		pr_cache_contention_prev;
 	int		pr_cache_tick;	/* time idle list was empty */
 	int		pr_cache_nout;
 	uint64_t	pr_cache_ngc;	/* # of times the gc released a list */
Index: kern/subr_pool.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.214
diff -u -p -r1.214 subr_pool.c
--- kern/subr_pool.c 16 Jun 2017 01:55:45 -0000 1.214
+++ kern/subr_pool.c 16 Jun 2017 03:03:03 -0000
@@ -1926,6 +1926,8 @@ pool_cache_destroy(struct pool *pp)
 void
 pool_cache_gc(struct pool *pp)
 {
+	unsigned int contention;
+
 	if ((ticks - pp->pr_cache_tick) > (hz * pool_wait_gc) &&
 	    !TAILQ_EMPTY(&pp->pr_cache_lists) &&
 	    mtx_enter_try(&pp->pr_cache_mtx)) {
@@ -1944,6 +1946,25 @@ pool_cache_gc(struct pool *pp)
 		pool_cache_list_put(pp, pl);
 	}
+
+	/*
+	 * if there's a lot of contention on the pr_cache_mtx then consider
+	 * growing the length of the list to reduce the need to access the
+	 * global pool.
+	 */
+
+	contention = pp->pr_cache_contention;
+	if ((contention - pp->pr_cache_contention_prev) > 8 /* magic */) {
+		unsigned int limit = pp->pr_npages * pp->pr_itemsperpage;
+		unsigned int items = pp->pr_cache_items + 8;
+		unsigned int cache = ncpusfound * items * 2;
+
+		/* are there enough items around so every cpu can hold some? */
+
+		if (cache < limit)
+			pp->pr_cache_items = items;
+	}
+	pp->pr_cache_contention_prev = contention;
 }
 void
