obmalloc is very good at allocating small memory blocks (up to about 224 bytes).
But the current SMALL_REQUEST_THRESHOLD (512) seems too large to me.

```
>>> pool_size = 4096 - 48  # 48 is pool header size
>>> for bs in range(16, 513, 16):
...     n,r = pool_size//bs, pool_size%bs + 48
...     print(bs, n, r, 100*r/4096)
...
16 253 48 1.171875
32 126 64 1.5625
48 84 64 1.5625
64 63 64 1.5625
80 50 96 2.34375
96 42 64 1.5625
112 36 64 1.5625
128 31 128 3.125
144 28 64 1.5625
160 25 96 2.34375
176 23 48 1.171875
192 21 64 1.5625
208 19 144 3.515625
224 18 64 1.5625
240 16 256 6.25
256 15 256 6.25
272 14 288 7.03125
288 14 64 1.5625
304 13 144 3.515625
320 12 256 6.25
336 12 64 1.5625
352 11 224 5.46875
368 11 48 1.171875
384 10 256 6.25
400 10 96 2.34375
416 9 352 8.59375
432 9 208 5.078125
448 9 64 1.5625
464 8 384 9.375
480 8 256 6.25
496 8 128 3.125
512 7 512 12.5
```

There are two problems here.

First, pool overhead is at most about 3.5% for size classes up to 224 bytes.
But it becomes 6.25% at 240 bytes, 8.6% at 416 bytes, 9.4% at 464
bytes, and 12.5% at 512 bytes.

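Second, some size classes have the same number of memory blocks per pool.
Classes 272 and 288 both have 14 blocks.  320 and 336 both have 12 blocks.
The smaller class in each pair fits no more blocks per pool than the larger
one, so keeping them separate only spreads allocations over more partially
filled pools.  This reduces pool utilization.  The problem gets worse on
32-bit platforms.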
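
To list every such group rather than only the two pairs above, here is a
minimal sketch.  The helper name shared_block_counts is made up for this
mail and is not part of obmalloc.  It assumes the same 48-byte pool header
and 16-byte size-class step as the table above.

```python
from collections import defaultdict


def shared_block_counts(pool_size=4096, header_size=48, step=16, threshold=512):
    """Map blocks-per-pool -> size classes, keeping counts shared by 2+ classes.

    Hypothetical helper for illustration; header_size and step are the
    values assumed in the table above, not read from obmalloc itself.
    """
    by_count = defaultdict(list)
    for bs in range(step, threshold + 1, step):
        by_count[(pool_size - header_size) // bs].append(bs)
    return {n: sizes for n, sizes in by_count.items() if len(sizes) > 1}


# Print the groups of size classes that fit the same number of blocks.
for n, sizes in sorted(shared_block_counts().items(), reverse=True):
    print(n, sizes)
```

With 4 KiB pools this reports six groups, all above 256 bytes, including
three classes each for 9 blocks (416, 432, 448) and 8 blocks (464, 480, 496).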

Increasing the pool size is one obvious way to fix both problems.
I think a 16 KiB pool size and a 2 MiB arena size (the x86 huge page size)
would be a sweet spot for recent web servers (typically about 32 threads
and 64 GiB of memory), but I have no evidence for that.
We need a reference application and scenario to benchmark against.
pyperformance is not well suited to measuring the memory usage of complex
applications.

```
>>> header_size = 48
>>> pool_size = 16*1024
>>> for bs in range(16, 513, 16):
...     n = (pool_size - header_size) // bs
...     r = (pool_size - header_size) % bs + header_size
...     print(bs, n, r, 100 * r / pool_size)
...
16 1021 48 0.29296875
32 510 64 0.390625
48 340 64 0.390625
64 255 64 0.390625
80 204 64 0.390625
96 170 64 0.390625
112 145 144 0.87890625
128 127 128 0.78125
144 113 112 0.68359375
160 102 64 0.390625
176 92 192 1.171875
192 85 64 0.390625
208 78 160 0.9765625
224 72 256 1.5625
240 68 64 0.390625
256 63 256 1.5625
272 60 64 0.390625
288 56 256 1.5625
304 53 272 1.66015625
320 51 64 0.390625
336 48 256 1.5625
352 46 192 1.171875
368 44 192 1.171875
384 42 256 1.5625
400 40 384 2.34375
416 39 160 0.9765625
432 37 400 2.44140625
448 36 256 1.5625
464 35 144 0.87890625
480 34 64 0.390625
496 32 512 3.125
512 31 512 3.125
```

Another way to fix these problems is to shrink SMALL_REQUEST_THRESHOLD
to 256 and trust malloc to work well for medium-sized memory blocks.
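
To compare the two options on worst-case pool overhead only, here is a
rough sketch.  worst_overhead is a hypothetical helper, and it uses the
same 48-byte header and 16-byte step assumed in the tables above.  It says
nothing about how malloc behaves for the sizes that would no longer go
through obmalloc.

```python
def worst_overhead(pool_size, threshold, header_size=48, step=16):
    """Return (percent, block_size) for the size class that wastes the most.

    Hypothetical helper for illustration.  "Wasted" counts the pool header
    plus the tail of the pool that is too short to hold one more block.
    On ties, the larger block size is reported.
    """
    worst = (0.0, 0)
    for bs in range(step, threshold + 1, step):
        wasted = (pool_size - header_size) % bs + header_size
        worst = max(worst, (100 * wasted / pool_size, bs))
    return worst


# Current setup, smaller threshold, and larger pool, in that order.
for pool_size, threshold in [(4096, 512), (4096, 256), (16 * 1024, 512)]:
    pct, bs = worst_overhead(pool_size, threshold)
    print(f"pool={pool_size} threshold={threshold}: worst {pct}% at {bs} bytes")
```

By this measure, lowering the threshold to 256 halves the worst case
(12.5% to 6.25%), while 16 KiB pools bring it down to 3.125%.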

-- 
Inada Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AG6UUPKFXYOTZALFV7XD7EUV62SHOI3P/
