Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
Enji,

On Thu, Feb 14, 2019 at 05:12:21PM -0800, Enji Cooper wrote:
E> > On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
E> > J> This seems to break 32-bit platforms, or at least 32-bit book-e
E> > J> powerpc, which has a limited KVA space (~500MB). It preallocates I've
E> > J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
E> > J> leaving very little left for the rest of runtime.
E> > J>
E> > J> I spent a couple hours earlier today debugging with Mark Johnston, and
E> > J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
E> > J> platforms. Unfortunately I know very little about this area, so can't
E> > J> provide much extra insight, but can readily reproduce the issues I see
E> > J> triggered by this change, so am willing to help where I can.
E> >
E> > Ok, let's roll back to old default on 32-bit platforms and somewhat
E> > reduce the default on 64-bits.
E> >
E> > Can you please confirm that the patch attached works for you?
E>
E> Quick question: why was the value reduced by a factor of 4 on 64-bit platforms?

Fair question. Replying to you and Bruce.

This pool of pbufs is used for sendfile(2), and the default value of
nswbuf / 2 wasn't enough for modern speeds of several Gbit/s. At Netflix
we run with nswbuf * 8, since we run up to 100 Gbit/s of sendfile traffic.
Together with the new pbuf allocator I bumped the value up to what we use.
Apparently that was overkill for 32-bit machines, so for them I fully
switched it back to the old value. Nobody is expected to use these
machines as high performance web servers.

Also, I decided to reduce the default for 64-bit machines as well. Not
everybody runs 100 Gbit/s, but I'd like to see default FreeBSD (no tunables
in loader.conf) be able to run 10 Gbit/s and more. So I've chosen a middle
ground between the old value of nswbuf / 2 and the value we use at Netflix.

P.S. Ideally, this needs to be autotuned. The problem is that now we need
to pre-allocate pbufs.
-- Gleb Smirnoff ___ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Thu, 14 Feb 2019 15:34:10 -0800 Gleb Smirnoff wrote:
> Hi Justin,
>
> On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
> J> This seems to break 32-bit platforms, or at least 32-bit book-e
> J> powerpc, which has a limited KVA space (~500MB). It preallocates
> J> I've seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> J> leaving very little left for the rest of runtime.
> J>
> J> I spent a couple hours earlier today debugging with Mark Johnston,
> J> and his consensus is that the vnode_pbuf_zone is too big on 32-bit
> J> platforms. Unfortunately I know very little about this area, so
> J> can't provide much extra insight, but can readily reproduce the
> J> issues I see triggered by this change, so am willing to help where
> J> I can.
>
> Ok, let's roll back to old default on 32-bit platforms and somewhat
> reduce the default on 64-bits.
>
> Can you please confirm that the patch attached works for you?
>

Hi Gleb,

Thanks for the patch. I've built and installed. My machine boots up
fine, and I dropped to ddb to check vmem. Results are as follows:

r343029:
kernel arena domain:  size: 67108864   inuse: 66482176   free: 62668
kernel arena:         size: 624951296  inuse: 579207168  free: 45744128

r344123 with your patch:
kernel arena domain:  size: 71303168   inuse: 68153344   free: 3149824
kernel arena:         size: 645922816  inuse: 632369152  free: 13553664

I've kicked off a buildworld+buildkernel to see how it survives. This
machine has 8GB RAM and 4GB swap, if that has any impact.

- Justin
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Thu, 14 Feb 2019, Mark Johnston wrote:

> On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
>> * ...
>> The only relevant commit between the good and bad versions seems to be
>> r343453. This fixes uma_prealloc() to actually work. But it is a feature
>> for it to not work when its caller asks for too much.
>
> I guess you meant r343353. In any case, the pbuf keg is _NOFREE, so
> even without preallocation the large pbuf zone limits may become
> problematic if there are bursts of allocation requests.

Oops.

>> * ...
>> I don't understand how pbuf_preallocate() allocates for the other
>> pbuf pools. When I debugged this for clpbufs, the preallocation was
>> not used. pbuf types other than clpbufs seem to be unused in my
>> configurations. I thought that pbufs were used during initialization,
>> since they end up with a nonzero FREE count, but their only use seems
>> to be to preallocate them.
>
> All of the pbuf zones share a common slab allocator. The zones have
> individual limits but can tap in to the shared preallocation.

It seems to be working as intended now (except the allocation count is
3 higher than expected):

XX ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
XX
XX swrbuf:                 336,    128,       0,       0,       0,   0,   0
XX swwbuf:                 336,     64,       0,       0,       0,   0,   0
XX nfspbuf:                336,    128,       0,       0,       0,   0,   0
XX mdpbuf:                 336,     25,       0,       0,       0,   0,   0
XX clpbuf:                 336,    128,       0,      35,    2918,   0,   0
XX vnpbuf:                 336,   2048,       0,       0,       0,   0,   0
XX pbuf:                   336,     16,       0,    2505,       0,   0,   0

pbuf should have 2537 preallocated and FREE initially, but seems to
actually have 2540. pbufs were only used for clustering, and 35 of them
were moved from pbuf to clpbuf.

In the buggy version, the preallocations stopped after 4. Then
clustering presumably moved these 4 to clpbuf. After that, clustering
presumably used non-preallocated buffers until it reached its limit, and
then recycled its own buffers.

What should happen to recover the old overcommit behaviour with better
debugging is to preallocate 256 buffers (a few more for large systems)
in pbuf and move these to other pools as needed, but never allocate
from other pools (keep buffers in other pools only as an optimization
and release them to the main pool under pressure). Also allow dynamic
tuning of the pool[s] size[s]. The vnode cache does essentially this
by using 1 overcommitted pool with unlimited size in uma and external
management of the size. The separate pools correspond to separate file
systems. These are too hard to manage, so the vnode cache throws
everything into the main pool and depends on locality for the overcommit
to not be too large.

Bruce
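The accounting in the table earlier in this message can be checked with a few lines of C. This is only a sketch for illustration, not code from the tree: the struct and function names are hypothetical, and the numbers are copied from the LIMIT and FREE columns quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one row of the quoted zone table. */
struct pbuf_zone_limit {
	const char *name;
	unsigned limit;
};

/* LIMIT column as quoted in this message. */
const struct pbuf_zone_limit pbuf_limits[] = {
	{ "swrbuf", 128 }, { "swwbuf", 64 }, { "nfspbuf", 128 },
	{ "mdpbuf", 25 }, { "clpbuf", 128 }, { "vnpbuf", 2048 },
	{ "pbuf", 16 },
};

/* Sum of all per-zone limits, i.e. the requested preallocation. */
unsigned
sum_limits(const struct pbuf_zone_limit *z, size_t n)
{
	unsigned total = 0;

	assert(z != NULL);
	for (size_t i = 0; i < n; i++)
		total += z[i].limit;
	return (total);
}

/* Items observed on free lists: FREE of pbuf plus FREE of clpbuf. */
unsigned
observed_free(unsigned pbuf_free, unsigned clpbuf_free)
{
	return (pbuf_free + clpbuf_free);
}
```

With the quoted values, the limits sum to 2537 (0x9e9), while the FREE counts of pbuf and clpbuf add up to 2540, which is the surplus of 3 mentioned above.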
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Thu, 14 Feb 2019, Gleb Smirnoff wrote:

> On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
> J> This seems to break 32-bit platforms, or at least 32-bit book-e
> J> powerpc, which has a limited KVA space (~500MB). It preallocates I've
> J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> J> leaving very little left for the rest of runtime.
> J>
> J> I spent a couple hours earlier today debugging with Mark Johnston, and
> J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
> J> platforms. Unfortunately I know very little about this area, so can't
> J> provide much extra insight, but can readily reproduce the issues I see
> J> triggered by this change, so am willing to help where I can.
>
> Ok, let's roll back to old default on 32-bit platforms and somewhat
> reduce the default on 64-bits.

This reduces the largest allocation by a factor of 16 on 32-bit arches
(back to where it was), but it leaves the other allocations unchanged,
so the total allocation is still almost 5 times larger than before
(down from 20 times larger). E.g., with the usual limit of 256 on
nswbuf, the total allocation was 32MB with overcommit by a factor of
about 5/2 on all systems, but it is now almost 80MB with no overcommit
on 32-bit systems. Approximately 0MB of the extras are available on
systems with 1GB kva, and less on systems with 512MB kva.

> Can you please confirm that the patch attached works for you?

I don't have any systems affected by the bug, except when I boot with
small hw.physmem or large kmem to test things. hw.physmem=72m leaves
about 2MB available to map into buffers, and doesn't properly reduce
nswbuf, so almost 80MB of kva is still used for pbufs. Allocating these
must fail due to the RAM shortage. The old value of 32MB gives much the
same failures (in practice, a larger operation like fork or exec tends
to fail first).
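The totals quoted above follow from the pbuf count times the KVA per pbuf. A minimal sketch, assuming 128kB of KVA per pbuf as reported earlier in the thread; the function name is hypothetical, and the count of 617 is an assumption obtained by taking the per-zone limits quoted elsewhere in this thread with vnpbuf lowered from 2048 to nswbuf / 2 = 128:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every preallocated pbuf reserves 128 kB of KVA. */
#define PBUF_KVA_KB 128u

/* KVA reserved by npbufs preallocated pbufs, in whole MB (rounded down). */
unsigned
pbufs_kva_mb(unsigned npbufs)
{
	/* Guard against overflow of the multiplication below. */
	assert(npbufs <= UINT32_MAX / PBUF_KVA_KB);
	return (npbufs * PBUF_KVA_KB / 1024u);
}
```

Under these assumptions 256 pbufs give 32MB, 617 give 77MB ("almost 80MB"), and the 2537 preallocated before the patch give 317MB.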
Limiting available kva is more interesting, and I haven't tested
reducing it intentionally (except once I expanded kmem a lot to put a
maximal md malloc()-backed disk in it). Expanding kmem steals from
residual kva, and residual kva is not properly scaled except in my
version. Large allocations tend to cause panics at boot time, except
for ones that crash because they don't check for errors.

Here is debugging output for large allocations (1MB or more) at boot
time on i386:

XX pae_mode=0 with ~2.7 GB mapped RAM:
XX kva_alloc: large allocation: 7490 pages: 0x5800000[0x1d42000] vm radix
XX kva_alloc: large allocation: 6164 pages: 0x8400000[0x1814000] pmap init
XX kva_alloc: large allocation: 28876 pages: 0xa000000[0x70cc000] buf
XX kmem_suballoc: large allocation: 1364 pages: 0x11400000[0x554000] exec
XX kmem_suballoc: large allocation: 10986 pages: 0x11954000[0x2aea000] pipe
XX kva_alloc: large allocation: 6656 pages: 0x14800000[0x1a00000] sfbuf

It went far above the old size of 1GB to nearly 1.5GB, but there is
plenty to spare out of 4GB. Versions that fitted in 1GB started these
allocations about 256MB lower and were otherwise similar.

XX pae_mode=1 with 16 GB mapped RAM:
XX kva_alloc: large allocation: 43832 pages: 0x14e00000[0xab38000] vm radix
XX kva_alloc: large allocation: 15668 pages: 0x20000000[0x3d34000] pmap init
XX kva_alloc: large allocation: 28876 pages: 0x23e00000[0x70cc000] buf
XX kmem_suballoc: large allocation: 1364 pages: 0x2b000000[0x554000] exec
XX kmem_suballoc: large allocation: 16320 pages: 0x2b554000[0x3fc0000] pipe
XX kva_alloc: large allocation: 6656 pages: 0x2f600000[0x1a00000] sfbuf

Only the vm radix and pmap init allocations are different, and they
start much higher. The allocations now go over 3GB without any useful
expansion except for the page tables. PAE didn't work with 16 GB RAM
and 1 GB kva, except in my version. PAE needed to be configured with
2 GB of kva to work with 16 GB RAM, but that was not the default or
clearly documented.
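In the debugging output above, the bracketed value appears to be the allocation size in bytes, i.e. the page count times the page size. A minimal sketch of that conversion, assuming the 4096-byte i386 page size; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 kB pages, as on i386. */
#define I386_PAGE_SIZE 4096u

/* Size in bytes of an allocation of the given number of pages. */
uint32_t
pages_to_bytes(uint32_t pages)
{
	/* Guard against overflow of the 32-bit multiplication. */
	assert(pages <= UINT32_MAX / I386_PAGE_SIZE);
	return (pages * I386_PAGE_SIZE);
}

/* Same size expressed in whole MB (rounded down). */
uint32_t
pages_to_mb(uint32_t pages)
{
	return (pages_to_bytes(pages) >> 20);
}
```

For example, the 7490-page vm radix allocation is 0x1d42000 bytes, the 1364-page exec allocation is 0x554000 bytes, and the 28876-page buf allocation is about 112MB.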
XX old PAE fixed to fit and work with 16GB RAM in 1GB KVA:
XX kva_alloc: large allocation: 15691 pages: 0xd2c00000[0x3d4b000] pmap init
XX kva_alloc: large allocation: 43917 pages: 0xd6a00000[0xab8d000] vm radix
XX kva_alloc: large allocation: 27300 pages: 0xe1600000[0x6aa4000] buf
XX kmem_suballoc: large allocation: 1364 pages: 0xe8200000[0x554000] exec
XX kmem_suballoc: large allocation: 2291 pages: 0xe8754000[0x8f3000] pipe
XX kva_alloc: large allocation: 6336 pages: 0xe9200000[0x18c0000] sfbuf

PAE uses much more kva (almost 256MB extra) before the pmap and radix
initializations here too. This is page table metadata before kva
allocations are available. The fixes start by keeping track of this
amount. It is about 1/16 of the address space for PAE in 1GB, so all
later scaling was off by a factor of 16/15 (too high), and since there
was less than 1/16 of 1GB to spare, PAE didn't fit.

Only 'pipe' is reduced significantly to fit. swzone is reduced to 1
page in all cases, so it doesn't show here. It is about the same as
sfbuf IIRC. The
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
> On Feb 14, 2019, at 15:34, Gleb Smirnoff wrote:
>
> Hi Justin,
>
> On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
> J> This seems to break 32-bit platforms, or at least 32-bit book-e
> J> powerpc, which has a limited KVA space (~500MB). It preallocates I've
> J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> J> leaving very little left for the rest of runtime.
> J>
> J> I spent a couple hours earlier today debugging with Mark Johnston, and
> J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
> J> platforms. Unfortunately I know very little about this area, so can't
> J> provide much extra insight, but can readily reproduce the issues I see
> J> triggered by this change, so am willing to help where I can.
>
> Ok, let's roll back to old default on 32-bit platforms and somewhat
> reduce the default on 64-bits.
>
> Can you please confirm that the patch attached works for you?

Quick question: why was the value reduced by a factor of 4 on 64-bit platforms?

Thanks so much!
-Enji
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
Hi Justin,

On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
J> This seems to break 32-bit platforms, or at least 32-bit book-e
J> powerpc, which has a limited KVA space (~500MB). It preallocates I've
J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
J> leaving very little left for the rest of runtime.
J>
J> I spent a couple hours earlier today debugging with Mark Johnston, and
J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
J> platforms. Unfortunately I know very little about this area, so can't
J> provide much extra insight, but can readily reproduce the issues I see
J> triggered by this change, so am willing to help where I can.

Ok, let's roll back to old default on 32-bit platforms and somewhat
reduce the default on 64-bits.

Can you please confirm that the patch attached works for you?

--
Gleb Smirnoff

diff --git a/sys/vm/vnode_pager.c b/sys/vm/vnode_pager.c
index 3e71ab4436cc..ded9e65e4e4c 100644
--- a/sys/vm/vnode_pager.c
+++ b/sys/vm/vnode_pager.c
@@ -115,13 +115,23 @@ SYSCTL_PROC(_debug, OID_AUTO, vnode_domainset, CTLTYPE_STRING | CTLFLAG_RW,
     &vnode_domainset, 0, sysctl_handle_domainset, "A",
     "Default vnode NUMA policy");
 
+static int nvnpbufs;
+SYSCTL_INT(_vm, OID_AUTO, vnode_pbufs, CTLFLAG_RDTUN | CTLFLAG_NOFETCH,
+    &nvnpbufs, 0, "number of physical buffers allocated for vnode pager");
+
 static uma_zone_t vnode_pbuf_zone;
 
 static void
 vnode_pager_init(void *dummy)
 {
 
-	vnode_pbuf_zone = pbuf_zsecond_create("vnpbuf", nswbuf * 8);
+#ifdef __LP64__
+	nvnpbufs = nswbuf * 2;
+#else
+	nvnpbufs = nswbuf / 2;
+#endif
+	TUNABLE_INT_FETCH("vm.vnode_pbufs", &nvnpbufs);
+	vnode_pbuf_zone = pbuf_zsecond_create("vnpbuf", nvnpbufs);
 }
 SYSINIT(vnode_pager, SI_SUB_CPU, SI_ORDER_ANY, vnode_pager_init, NULL);
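The sizing logic of the patch above can be modelled in userland to see which value the zone ends up with. This is a sketch under stated assumptions, not the kernel code: the function names are hypothetical, the `lp64` flag stands in for the `__LP64__` check, and the `tunable_set`/`tunable_value` pair stands in for TUNABLE_INT_FETCH(), which overrides the default only when vm.vnode_pbufs is set in the loader environment.

```c
#include <assert.h>
#include <stdbool.h>

/* Default from the patch: nswbuf * 2 on LP64 platforms, nswbuf / 2 otherwise. */
int
vnode_pbufs_default(int nswbuf, bool lp64)
{
	assert(nswbuf > 0);
	return (lp64 ? nswbuf * 2 : nswbuf / 2);
}

/*
 * Effective zone limit: the platform default, unless the (modelled)
 * vm.vnode_pbufs tunable was set, in which case the tunable wins.
 */
int
vnode_pbufs_effective(int nswbuf, bool lp64, bool tunable_set,
    int tunable_value)
{
	int n = vnode_pbufs_default(nswbuf, lp64);

	if (tunable_set)
		n = tunable_value;
	return (n);
}
```

With the usual nswbuf of 256 this gives 512 on 64-bit and 128 on 32-bit platforms, compared with 2048 for the previous nswbuf * 8; a setting such as vm.vnode_pbufs=2048 would restore the larger value.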
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
> On Wed, 13 Feb 2019, Justin Hibbits wrote:
>
> > On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> > Gleb Smirnoff wrote:
> >
> >> Author: glebius
> >> Date: Tue Jan 15 01:02:16 2019
> >> New Revision: 343030
> >> URL: https://svnweb.freebsd.org/changeset/base/343030
> >>
> >> Log:
> >>   Allocate pager bufs from UMA instead of 80-ish mutex protected
> >>   linked list.
> > ...
> >
> > This seems to break 32-bit platforms, or at least 32-bit book-e
> > powerpc, which has a limited KVA space (~500MB). It preallocates I've
> > seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> > leaving very little left for the rest of runtime.
>
> Hrmph. I complained other things in this commit this when it was
> committed, but not this largest bug since preallocation was broken then
> so I thought that it wasn't done, so that problems are smaller unless the
> excessive limits are actually reached.
>
> Now i386 does it:
>
> XX ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> XX
> XX swrbuf:                 336,    128,       0,       0,       0,   0,   0
> XX swwbuf:                 336,     64,       0,       0,       0,   0,   0
> XX nfspbuf:                336,    128,       0,       0,       0,   0,   0
> XX mdpbuf:                 336,     25,       0,       0,       0,   0,   0
> XX clpbuf:                 336,    128,       0,       5,       4,   0,   0
> XX vnpbuf:                 336,   2048,       0,       0,       0,   0,   0
> XX pbuf:                   336,     16,       0,    2535,       0,   0,   0
>
> but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
> noticed there.
>
> The preallocation wasn't there in my last mail to the author about nearby
> bugs, on 24 Jan 2019:
>
> YY vnpbuf:                 568,   2048,       0,       0,       0,   0,   0
> YY clpbuf:                 568,    128,       0,     128,    8750,   0,   1
> YY pbuf:                   568,     16,       0,       4,       0,   0,   0
>
> This output is on amd64 where the SIZE is larger and everything else was
> the same as on i386. Now amd64 shows the large preallocation too.
>
> There seems to be another bug for the especially small LIMIT of 16 to
> turn into a preallocation of 2535 and not cause immediate reduction to
> the limit.
>
> I happen to have kernels from 24 and 25 Jan handy. The first one is
> amd64 r343346M built on Jan 23, and it doesn't do the large
> preallocation. The second one is i386 r343388:343418M built on Jan
> 25, and it does the large preallocation. Both call uma_prealloc() to
> ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
> 4 buffers while later version allocate 0x9e9 buffers.
>
> The only relevant commit between the good and bad versions seems to be
> r343453. This fixes uma_prealloc() to actually work. But it is a feature
> for it to not work when its caller asks for too much.

I guess you meant r343353. In any case, the pbuf keg is _NOFREE, so
even without preallocation the large pbuf zone limits may become
problematic if there are bursts of allocation requests.

> 0x9e9 is the sum of the LIMITs of all pbuf pools. The main bug in
> r343030 is that it expands nswbuf, which is supposed to give the
> combined limit, from its normal value of 256 to 0x9e9. (r343030
> actually used nswbuf before it was properly initialized, so used its
> maximum value of 256 even on small systems with nswbuf = 16. Only
> this has been fixed.)
>
> On i386, nbuf is excessively limited so as to give a maxbufspace of
> about 100MB so as to fit in 1GB of kva even with infinite RAM and
> -current's actual 4GB of kva. nbuf is correctly limited to give a
> much smaller maxbufspace when RAM is small (kva scaling for this is
> not done so well). nswbuf is restricted if nbuf is restricted, but
> not enough (except in my version). It is normally 256, so the pbuf
> allocation used to be 32MB, and this is already a bit large compared
> with 100MB for maxbufspace. Expanding pbufs by a factor of 0x9e9/0x100
> gives the silly combination of 100MB for maxbufspace and 317MB for
> pbufs.
>
> If kva is only 512MB instead of 1GB, then maxbufspace should be only
> 50MB and nswbuf should be smaller too. Similarly for PAE on i386 back
> when it was configured with 1GB kva by default. Only about 512MB are
> left after allocating space for page table metadata. I have fixes
> that scale most of this better. Large subsystems starting with kmem
> get a hard-coded fraction of the usable kva. E.g., kmem gets about
> 60% of usable kva instead of about 40% of nominal kva. Most other
> large subsystems including the buffer cache get about 1/8 of the
> remaining 40% of usable kva. Scaling for other subsystems is mostly
> worse than for kmem. pbufs are part of the buffer cache allocation.
> The expansion factor of 0x9e9/0x100 breaks this.
>
> I don't understand how pbuf_preallocate() allocates
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
> I don't understand how pbuf_preallocate() allocates for the other
> pbuf pools. When I debugged this for clpbufs, the preallocation was
> not used. pbuf types other than clpbufs seem to be unused in my
> configurations. I thought that pbufs were used during initialization,
> since they end up with a nonzero FREE count, but their only use seems
> to be to preallocate them.

vnode_pager_generic_getpages() is typically not used for UFS on modern
systems. Instead the buffer pager is active, which does not need pbufs;
it uses real buffers coherent with the UFS buffer cache.

To get to the actual use of pbufs now you can:
- perform clustered buffer io;
- use vnode-backed md(4) (this case is still broken if md(4) is loaded
  as a module);
- cause system swapping;
- use sendfile(2).
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Wed, 13 Feb 2019, Justin Hibbits wrote:

> On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> Gleb Smirnoff wrote:
>
>> Author: glebius
>> Date: Tue Jan 15 01:02:16 2019
>> New Revision: 343030
>> URL: https://svnweb.freebsd.org/changeset/base/343030
>>
>> Log:
>>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>>   linked list.
> ...
>
> This seems to break 32-bit platforms, or at least 32-bit book-e
> powerpc, which has a limited KVA space (~500MB). It preallocates I've
> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> leaving very little left for the rest of runtime.

Hrmph. I complained about other things in this commit when it was
committed, but not this largest bug since preallocation was broken then
so I thought that it wasn't done, so that problems are smaller unless the
excessive limits are actually reached.

Now i386 does it:

XX ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
XX
XX swrbuf:                 336,    128,       0,       0,       0,   0,   0
XX swwbuf:                 336,     64,       0,       0,       0,   0,   0
XX nfspbuf:                336,    128,       0,       0,       0,   0,   0
XX mdpbuf:                 336,     25,       0,       0,       0,   0,   0
XX clpbuf:                 336,    128,       0,       5,       4,   0,   0
XX vnpbuf:                 336,   2048,       0,       0,       0,   0,   0
XX pbuf:                   336,     16,       0,    2535,       0,   0,   0

but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
noticed there.

The preallocation wasn't there in my last mail to the author about nearby
bugs, on 24 Jan 2019:

YY vnpbuf:                 568,   2048,       0,       0,       0,   0,   0
YY clpbuf:                 568,    128,       0,     128,    8750,   0,   1
YY pbuf:                   568,     16,       0,       4,       0,   0,   0

This output is on amd64 where the SIZE is larger and everything else was
the same as on i386. Now amd64 shows the large preallocation too.

There seems to be another bug for the especially small LIMIT of 16 to
turn into a preallocation of 2535 and not cause immediate reduction to
the limit.

I happen to have kernels from 24 and 25 Jan handy. The first one is
amd64 r343346M built on Jan 23, and it doesn't do the large
preallocation. The second one is i386 r343388:343418M built on Jan
25, and it does the large preallocation. Both call uma_prealloc() to
ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
4 buffers while the later version allocates 0x9e9 buffers.

The only relevant commit between the good and bad versions seems to be
r343453. This fixes uma_prealloc() to actually work. But it is a feature
for it to not work when its caller asks for too much.

0x9e9 is the sum of the LIMITs of all pbuf pools. The main bug in
r343030 is that it expands nswbuf, which is supposed to give the
combined limit, from its normal value of 256 to 0x9e9. (r343030
actually used nswbuf before it was properly initialized, so used its
maximum value of 256 even on small systems with nswbuf = 16. Only
this has been fixed.)

On i386, nbuf is excessively limited so as to give a maxbufspace of
about 100MB so as to fit in 1GB of kva even with infinite RAM and
-current's actual 4GB of kva. nbuf is correctly limited to give a
much smaller maxbufspace when RAM is small (kva scaling for this is
not done so well). nswbuf is restricted if nbuf is restricted, but
not enough (except in my version). It is normally 256, so the pbuf
allocation used to be 32MB, and this is already a bit large compared
with 100MB for maxbufspace. Expanding pbufs by a factor of 0x9e9/0x100
gives the silly combination of 100MB for maxbufspace and 317MB for
pbufs.

If kva is only 512MB instead of 1GB, then maxbufspace should be only
50MB and nswbuf should be smaller too. Similarly for PAE on i386 back
when it was configured with 1GB kva by default. Only about 512MB are
left after allocating space for page table metadata. I have fixes
that scale most of this better. Large subsystems starting with kmem
get a hard-coded fraction of the usable kva. E.g., kmem gets about
60% of usable kva instead of about 40% of nominal kva. Most other
large subsystems including the buffer cache get about 1/8 of the
remaining 40% of usable kva. Scaling for other subsystems is mostly
worse than for kmem. pbufs are part of the buffer cache allocation.
The expansion factor of 0x9e9/0x100 breaks this.

I don't understand how pbuf_preallocate() allocates for the other
pbuf pools. When I debugged this for clpbufs, the preallocation was
not used. pbuf types other than clpbufs seem to be unused in my
configurations. I thought that pbufs were used during initialization,
since they end up with a nonzero FREE count, but their only use seems
to be to preallocate them.

Bruce
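The kva scaling described in this message (kmem taking about 60% of usable kva, other large subsystems about 1/8 of the remainder) can be illustrated with simple arithmetic. This is a sketch only: the actual fixes are not shown in this thread, the function names are hypothetical, and the fractions are the approximate ones stated above.

```c
#include <assert.h>
#include <stdint.h>

/* kmem share: about 60% of usable kva (units are whatever the caller uses). */
uint64_t
kmem_share(uint64_t usable_kva)
{
	/* Guard against overflow of the multiplication below. */
	assert(usable_kva <= UINT64_MAX / 60);
	return (usable_kva * 60 / 100);
}

/* Share of one other large subsystem: about 1/8 of what kmem leaves. */
uint64_t
subsystem_share(uint64_t usable_kva)
{
	return ((usable_kva - kmem_share(usable_kva)) / 8);
}
```

With about 500MB of usable kva, for example, this gives roughly 300MB for kmem and 25MB for a subsystem such as the buffer cache, which puts the 317MB of preallocated pbufs into perspective.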
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Tue, 15 Jan 2019 01:02:17 +0000 (UTC) Gleb Smirnoff wrote:
> Author: glebius
> Date: Tue Jan 15 01:02:16 2019
> New Revision: 343030
> URL: https://svnweb.freebsd.org/changeset/base/343030
>
> Log:
>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>   linked list.
>
>   o In vm_pager_bufferinit() create pbuf_zone and start accounting on
>     how many pbufs are we going to have set.
>     In various subsystems that are going to utilize pbufs create
>     private zones via call to pbuf_zsecond_create(). The latter calls
>     uma_zsecond_create(), and sets a limit on created zone. After startup
>     preallocate pbufs according to requirements of all pbuf zones.
>
>     Subsystems that used to have a private limit with old allocator
>     now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS
>     cluster, FFS, swap, vnode pager.
>
>     The following subsystems use shared pbuf zone: cam(4), nvme(4),
>     physio(9), aio(4). They should have their private limits, but
>     changing that is out of scope of this commit.
>
>   o Fetch tunable value of kern.nswbuf from init_param2() and while
>     here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that
>     was holding only this option.
>     Default values aren't touched by this commit, but they probably
>     should be reviewed wrt to modern hardware.
>
>   This change removes a tight bottleneck from sendfile(2) operation,
>   that uses pbufs in vnode pager. Other pagers also would benefit from
>   faster allocation.
>
>   Together with: gallatin
>   Tested by: pho
>
> Modified:
>   head/sys/cam/cam_periph.c
>   head/sys/conf/options
>   head/sys/dev/md/md.c
>   head/sys/dev/nvme/nvme_ctrlr.c
>   head/sys/fs/fuse/fuse_main.c
>   head/sys/fs/fuse/fuse_vnops.c
>   head/sys/fs/nfsclient/nfs_clbio.c
>   head/sys/fs/nfsclient/nfs_clport.c
>   head/sys/fs/smbfs/smbfs_io.c
>   head/sys/fs/smbfs/smbfs_vfsops.c
>   head/sys/kern/kern_physio.c
>   head/sys/kern/subr_param.c
>   head/sys/kern/vfs_aio.c
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_cluster.c
>   head/sys/sys/buf.h
>   head/sys/ufs/ffs/ffs_rawread.c
>   head/sys/vm/swap_pager.c
>   head/sys/vm/vm_pager.c
>   head/sys/vm/vnode_pager.c
>

Hi Gleb,

This seems to break 32-bit platforms, or at least 32-bit book-e
powerpc, which has a limited KVA space (~500MB). I've seen it
preallocate over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
leaving very little left for the rest of runtime.

I spent a couple hours earlier today debugging with Mark Johnston, and
his consensus is that the vnode_pbuf_zone is too big on 32-bit
platforms. Unfortunately I know very little about this area, so can't
provide much extra insight, but can readily reproduce the issues I see
triggered by this change, so am willing to help where I can.

- Justin
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On WITNESS builds after this change and all followup fixes I can see
(@ r343108), I get a new warning:

Sleeping on "pageprocwait" with the following non-sleepable locks held:
exclusive sleep mutex pbuf (UMA zone) r = 0 (0xf80003033e00) locked @ .../sys/vm/uma_core.c:1139
stack backtrace:
#0 0x80c0a164 at witness_debugger.part.14+0xa4
#1 0x80c0d465 at witness_warn+0x285
#2 0x80baa3b9 at _sleep+0x59
#3 0x80f03c73 at vm_wait_doms+0x103
#4 0x80eeb8e5 at vm_domainset_iter_policy+0x55
#5 0x80eeab59 at uma_prealloc+0xc9
#6 0x80f0d643 at pbuf_zsecond_create+0x63
#7 0x80ee2c9f at swap_pager_swap_init+0x5f
#8 0x80f0cf57 at vm_pageout+0x27

For what it's worth, this is a bhyve guest with vm.ndomains 1. (The
bhyve host has 2 domains, but I don't see how that would be relevant.)

Best,
Conrad

On Mon, Jan 14, 2019 at 5:02 PM Gleb Smirnoff wrote:
>
> Author: glebius
> Date: Tue Jan 15 01:02:16 2019
> New Revision: 343030
> URL: https://svnweb.freebsd.org/changeset/base/343030
>
> Log:
>   Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
>
>   o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
>     pbufs are we going to have set.
>     In various subsystems that are going to utilize pbufs create private zones
>     via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
>     and sets a limit on created zone. After startup preallocate pbufs
>     according
>     to requirements of all pbuf zones.
>
>     Subsystems that used to have a private limit with old allocator now have
>     private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
>     swap, vnode pager.
>
>     The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
>     aio(4). They should have their private limits, but changing that is out of
>     scope of this commit.
>
>   o Fetch tunable value of kern.nswbuf from init_param2() and while here move
>     NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
>     this option.
>     Default values aren't touched by this commit, but they probably should be
>     reviewed wrt to modern hardware.
>
>   This change removes a tight bottleneck from sendfile(2) operation, that
>   uses pbufs in vnode pager. Other pagers also would benefit from faster
>   allocation.
>
>   Together with: gallatin
>   Tested by: pho
>
> Modified:
>   head/sys/cam/cam_periph.c
>   head/sys/conf/options
>   head/sys/dev/md/md.c
>   head/sys/dev/nvme/nvme_ctrlr.c
>   head/sys/fs/fuse/fuse_main.c
>   head/sys/fs/fuse/fuse_vnops.c
>   head/sys/fs/nfsclient/nfs_clbio.c
>   head/sys/fs/nfsclient/nfs_clport.c
>   head/sys/fs/smbfs/smbfs_io.c
>   head/sys/fs/smbfs/smbfs_vfsops.c
>   head/sys/kern/kern_physio.c
>   head/sys/kern/subr_param.c
>   head/sys/kern/vfs_aio.c
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_cluster.c
>   head/sys/sys/buf.h
>   head/sys/ufs/ffs/ffs_rawread.c
>   head/sys/vm/swap_pager.c
>   head/sys/vm/vm_pager.c
>   head/sys/vm/vnode_pager.c
>
> Modified: head/sys/cam/cam_periph.c
> ==============================================================================
> --- head/sys/cam/cam_periph.c Tue Jan 15 00:52:41 2019 (r343029)
> +++ head/sys/cam/cam_periph.c Tue Jan 15 01:02:16 2019 (r343030)
> @@ -936,7 +936,7 @@ cam_periph_mapmem(union ccb *ccb, struct cam_periph_ma
>                 /*
>                  * Get the buffer.
>                  */
> -               mapinfo->bp[i] = getpbuf(NULL);
> +               mapinfo->bp[i] = uma_zalloc(pbuf_zone, M_WAITOK);
>
>                 /* put our pointer in the data slot */
>                 mapinfo->bp[i]->b_data = *data_ptrs[i];
> @@ -962,9 +962,9 @@ cam_periph_mapmem(union ccb *ccb, struct cam_periph_ma
>                         for (j = 0; j < i; ++j) {
>                                 *data_ptrs[j] = mapinfo->bp[j]->b_caller1;
>                                 vunmapbuf(mapinfo->bp[j]);
> -                               relpbuf(mapinfo->bp[j], NULL);
> +                               uma_zfree(pbuf_zone, mapinfo->bp[j]);
>                         }
> -                       relpbuf(mapinfo->bp[i], NULL);
> +                       uma_zfree(pbuf_zone, mapinfo->bp[i]);
>                         PRELE(curproc);
>                         return(EACCES);
>                 }
> @@ -1052,7 +1052,7 @@ cam_periph_unmapmem(union ccb *ccb, struct cam_periph_
>                 vunmapbuf(mapinfo->bp[i]);
>
>                 /* release the buffer */
> -               relpbuf(mapinfo->bp[i], NULL);
> +               uma_zfree(pbuf_zone, mapinfo->bp[i]);
>         }
>
>         /* allow ourselves to be swapped once again */
>
> Modified: head/sys/conf/options
> ==============================================================================
> --- head/sys/conf/options Tue Jan 15 00:52:41 2019 (r343029)
> +++ head/sys/conf/options Tue Jan
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On Tue, Jan 15, 2019 at 11:13:18AM -0500, Pedro Giffuni wrote:
P> >> Allocate pager bufs from UMA instead of 80-ish mutex protected
P> >> linked list.
P> >
P> >> Together with: gallatin
P> >
P> > Thank you so much for carrying this over the finish line!
P> >
P> It appears to be very impressive! Plans for MFC?

Nope. I'm very conservative about stable branch being stable branch :)

--
Gleb Smirnoff
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On 1/15/19 11:07 AM, Andrew Gallatin wrote:
> On 1/14/19 8:02 PM, Gleb Smirnoff wrote:
>
>> Log:
>>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>>   linked list.
>
> <...>
>
>> Together with: gallatin
>
> Thank you so much for carrying this over the finish line!
>
> Drew

It appears to be very impressive! Plans for MFC?

Pedro.
Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
On 1/14/19 8:02 PM, Gleb Smirnoff wrote:

> Log:
>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>   linked list.

<...>

> Together with: gallatin

Thank you so much for carrying this over the finish line!

Drew