Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-15 Thread Gleb Smirnoff
  Enji,

On Thu, Feb 14, 2019 at 05:12:21PM -0800, Enji Cooper wrote:
E> > On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
E> > J> This seems to break 32-bit platforms, or at least 32-bit book-e
E> > J> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
E> > J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
E> > J> leaving very little left for the rest of runtime.
E> > J> 
E> > J> I spent a couple hours earlier today debugging with Mark Johnston, and
E> > J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
E> > J> platforms.  Unfortunately I know very little about this area, so can't
E> > J> provide much extra insight, but can readily reproduce the issues I see
E> > J> triggered by this change, so am willing to help where I can.
E> > 
E> > Ok, let's roll back to old default on 32-bit platforms and somewhat
E> > reduce the default on 64-bits.
E> > 
E> > Can you please confirm that the patch attached works for you?
E> 
E> Quick question: why was the value reduced by a factor of 4 on 64-bit
E> platforms?

Fair question. Replying to you and Bruce.

This pool of pbufs is used for sendfile(2), and the old default of nswbuf / 2
wasn't enough for modern multi-Gbit/s speeds. At Netflix we run with
nswbuf * 8, since we push up to 100 Gbit/s of sendfile traffic.

Together with the new pbuf allocator I bumped the value up to what we use.
Apparently that was overkill for 32-bit machines, so for them I switched it
fully back to the old value. Nobody is expected to use those machines as
high-performance web servers.

Also, I decided to reduce the default for 64-bit machines as well. Not
everybody runs 100 Gbit/s, but I'd like a default FreeBSD install (no
tunables in loader.conf) to be able to run at 10 Gbit/s and more. So
I've chosen a middle ground between the old value of nswbuf / 2 and the
value we use at Netflix.
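
For a rough sense of the KVA cost (assuming the usual nswbuf = 256 and
128kB per pbuf, with the defaults from the attached patch):

  old default:   nswbuf / 2 =  128 vnode pbufs ->  16MB
  Netflix:       nswbuf * 8 = 2048 vnode pbufs -> 256MB
  new (64-bit):  nswbuf * 2 =  512 vnode pbufs ->  64MB
  new (32-bit):  nswbuf / 2 =  128 vnode pbufs ->  16MB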

P.S. Ideally, this needs to be autotuned. The problem is that now we need
to pre-allocate pbufs.

-- 
Gleb Smirnoff


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-15 Thread Justin Hibbits
On Thu, 14 Feb 2019 15:34:10 -0800
Gleb Smirnoff wrote:

>   Hi Justin,
> 
> On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
> J> This seems to break 32-bit platforms, or at least 32-bit book-e
> J> powerpc, which has a limited KVA space (~500MB).  It preallocates
> J> I've seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> J> leaving very little left for the rest of runtime.
> J> 
> J> I spent a couple hours earlier today debugging with Mark Johnston,
> J> and his consensus is that the vnode_pbuf_zone is too big on 32-bit
> J> platforms.  Unfortunately I know very little about this area, so
> J> can't provide much extra insight, but can readily reproduce the
> J> issues I see triggered by this change, so am willing to help where
> J> I can.  
> 
> Ok, let's roll back to old default on 32-bit platforms and somewhat
> reduce the default on 64-bits.
> 
> Can you please confirm that the patch attached works for you?
> 

Hi Gleb,

Thanks for the patch.  I've built and installed.  My machine boots up
fine, and I dropped to ddb to check vmem.  Results are as follows:

r343029:
  kernel arena domain:
size: 67108864
inuse: 66482176
free: 62668

  kernel arena:
size: 624951296
inuse: 579207168
free: 45744128

r344123 with your patch:
  kernel arena domain:
size: 71303168
inuse: 68153344
free: 3149824
  kernel arena:
size: 645922816
inuse: 632369152
free: 13553664
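
(In round numbers, that is a ~596MB arena with ~552MB in use before, and
~616MB with ~603MB in use after: about 51MB more in use, with free space
down from ~44MB to ~13MB.)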

I've kicked off a buildworld+buildkernel to see how it survives.

This machine has 8GB RAM and 4GB swap, if that has any impact.

- Justin


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Bruce Evans

On Thu, 14 Feb 2019, Mark Johnston wrote:


On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
* ...

The only relevant commit between the good and bad versions seems to be
r343453.  This fixes uma_prealloc() to actually work.  But it is a feature
for it to not work when its caller asks for too much.


I guess you meant r343353.  In any case, the pbuf keg is _NOFREE, so
even without preallocation the large pbuf zone limits may become
problematic if there are bursts of allocation requests.


Oops.


* ...

I don't understand how pbuf_preallocate() allocates for the other
pbuf pools.  When I debugged this for clpbufs, the preallocation was
not used.  pbuf types other than clpbufs seem to be unused in my
configurations.  I thought that pbufs were used during initialization,
since they end up with a nonzero FREE count, but their only use seems
to be to preallocate them.


All of the pbuf zones share a common slab allocator.  The zones have
individual limits but can tap in to the shared preallocation.
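
Roughly, in code (a simplified sketch based on my reading of r343030, not
the verbatim vm_pager.c; pbuf_ctor/pbuf_dtor and nswbuf_max are the names I
believe the commit uses, all living in vm_pager.c):

	uma_zone_t
	pbuf_zsecond_create(const char *name, int max)
	{
		uma_zone_t zone;

		/* Secondary zone: shares its slabs (the keg) with pbuf_zone. */
		zone = uma_zsecond_create(name, pbuf_ctor, pbuf_dtor, NULL, NULL,
		    pbuf_zone);
		/* Per-zone item limit; the sum of limits is what gets preallocated. */
		uma_zone_set_max(zone, max);
		nswbuf_max += max;

		return (zone);
	}

so clpbuf, vnpbuf, etc. each enforce their own LIMIT, while USED/FREE items
all come from the one shared keg that pbuf_preallocate() filled.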


It seems to be working as intended now (except the allocation count is
3 higher than expected):

XX ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
XX
XX swrbuf: 336,128,   0,   0,   0,   0,   0
XX swwbuf: 336, 64,   0,   0,   0,   0,   0
XX nfspbuf:336,128,   0,   0,   0,   0,   0
XX mdpbuf: 336, 25,   0,   0,   0,   0,   0
XX clpbuf: 336,128,   0,  35,2918,   0,   0
XX vnpbuf: 336,   2048,   0,   0,   0,   0,   0
XX pbuf:   336, 16,   0,2505,   0,   0,   0

pbuf should have 2537 preallocated and FREE initially, but seems to actually
have 2540.  pbufs were only used for clustering, and 35 of them were moved
from pbuf to clpbuf.
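
(Checking the arithmetic against the table: the LIMITs sum to
128 + 64 + 128 + 25 + 128 + 2048 + 16 = 2537, while the FREE columns show
2505 in pbuf plus 35 in clpbuf = 2540, i.e. the 3 extra.)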

In the buggy version, the preallocations stopped after 4.  Then clustering
presumably moved these 4 to clpbuf.  After that, clustering presumably used
non-preallocated buffers until it reached its limit, and then recycled its
own buffers.

What should happen to recover the old overcommit behaviour with better
debugging is 256 preallocated buffers (a few more for large systems) in
pbuf and moving these to other pools, but never allocating from other
pools (keep buffers in other pools only as an optimization and release
them to the main pool under pressure).  Also allow dynamic tuning of
the pool[s] size[s].  The vnode cache does essentially this by using
1 overcommitted pool with unlimited size in uma and external management
of the size.  The separate pools correspond to separate file systems.
These are too hard to manage, so the vnode cache throws everything into
the main pool and depends on locality for the overcommit to not be too
large.

Bruce


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Bruce Evans

On Thu, 14 Feb 2019, Gleb Smirnoff wrote:


On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
J> This seems to break 32-bit platforms, or at least 32-bit book-e
J> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
J> leaving very little left for the rest of runtime.
J>
J> I spent a couple hours earlier today debugging with Mark Johnston, and
J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
J> platforms.  Unfortunately I know very little about this area, so can't
J> provide much extra insight, but can readily reproduce the issues I see
J> triggered by this change, so am willing to help where I can.

Ok, let's roll back to old default on 32-bit platforms and somewhat
reduce the default on 64-bits.


This reduces the largest allocation by a factor of 16 on 32-bit arches
(back to where it was), but it leaves the other allocations unchanged,
so the total allocation is still almost 5 times larger than before
(down from 20 times larger).  E.g., with the usual limit of 256 on
nswbuf, the total allocation was 32MB with overcommit by a factor of
about 5/2 on all systems, but it is now almost 80MB with no overcommit
on 32-bit systems.  Approximately 0MB of the extras are available on
systems with 1GB kva, and less on systems with 512MB kva.
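
(Spelling that out, assuming 128kB per pbuf: with the patch the 32-bit
per-pool limits sum to 128 + 64 + 128 + 25 + 128 + 128 + 16 = 617.  Before,
only nswbuf = 256 buffers (32MB) could ever exist behind those limits, an
overcommit of roughly 617/256 ~= 5/2; now all 617 are preallocated, i.e.
617 * 128kB ~= 77MB.)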


Can you please confirm that the patch attached works for you?


I don't have any systems affected by the bug, except when I boot with
small hw.physmem or large kmem to test things.  hw.physmem=72m leaves
about 2MB available to map into buffers, and doesn't properly reduce
nswbuf, so almost 80MB of kva is still used for pbufs.  Allocating these
must fail due to the RAM shortage.  The old value of 32MB gives much the
same failures (in practice, a larger operation like fork or exec tends
to fail first).  Limiting available kva is more interesting, and I haven't
tested reducing it intentionally, except once when I expanded kmem a lot to
put a maximal md malloc()-backed disk in it.  Expanding kmem steals from
residual kva, and residual kva is not properly scaled except in my version.
Large allocations then tend to cause panics at boot time, except for ones that
crash because they don't check for errors.
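
(For reproducing this sort of test: RAM can be capped from the loader,
e.g. "set hw.physmem=72m" at the loader prompt before "boot", or the
equivalent hw.physmem="72m" line in /boot/loader.conf.)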

Here is debugging output for large allocations (1MB or more) at boot time
on i386:

XX pae_mode=0 with ~2.7 GB mapped RAM:
XX kva_alloc: large allocation: 7490 pages: 0x580[0x1d42000]   vm radix
XX kva_alloc: large allocation: 6164 pages: 0x840[0x1814000]   pmap init
XX kva_alloc: large allocation: 28876 pages: 0xa00[0x70cc000]  buf
XX kmem_suballoc: large allocation: 1364 pages: 0x1140[0x554000]   exec
XX kmem_suballoc: large allocation: 10986 pages: 0x11954000[0x2aea000] pipe
XX kva_alloc: large allocation: 6656 pages: 0x1480[0x1a0]  sfbuf

It went far above the old size of 1GB to nearly 1.5GB, but there is plenty
to spare out of 4GB.  Versions that fitted in 1GB started these allocations
about 256MB lower and were otherwise similar.

XX pae_mode=1 with 16 GB mapped RAM:
XX kva_alloc: large allocation: 43832 pages: 0x14e0[0xab38000] vm radix
XX kva_alloc: large allocation: 15668 pages: 0x2000[0x3d34000] pmap init
XX kva_alloc: large allocation: 28876 pages: 0x23e0[0x70cc000] buf
XX kmem_suballoc: large allocation: 1364 pages: 0x2b00[0x554000]   exec
XX kmem_suballoc: large allocation: 16320 pages: 0x2b554000[0x3fc] pipe
XX kva_alloc: large allocation: 6656 pages: 0x2f60[0x1a0]  sfbuf

Only the vm radix and pmap init allocations are different, and they start
much higher.  The allocations now go over 3GB without any useful expansion
except for the page tables.  PAE didn't work with 16 GB RAM and 1 GB
kva, except in my version.  PAE needed to be configured with 2 GB of kva
to work with 16 GB RAM, but that was not the default or clearly documented.
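
(From memory rather than the tree: that kva size is the i386 KVA_PAGES
kernel option, counted in PDE-sized units (4MB each, 2MB with PAE), so
getting 2GB of kva meant something like "options KVA_PAGES=512" without
PAE and twice that with PAE.)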

XX old PAE fixed fit work with 16GB RAM in 1GB KVA:
XX kva_alloc: large allocation: 15691 pages: 0xd2c0[0x3d4b000]   pmap init
XX kva_alloc: large allocation: 43917 pages: 0xd6a0[0xab8d000]   vm radix
XX kva_alloc: large allocation: 27300 pages: 0xe160[0x6aa4000]   buf
XX kmem_suballoc: large allocation: 1364 pages: 0xe820[0x554000] exec
XX kmem_suballoc: large allocation: 2291 pages: 0xe8754000[0x8f3000] pipe
XX kva_alloc: large allocation: 6336 pages: 0xe920[0x18c]sfbuf

PAE uses much more kva (almost 256MB extra) before the pmap and radix
initializations here too.  This is page table metadata before kva
allocations are available.  The fixes start by keeping track of this
amount.  It is about 1/16 of the address space for PAE in 1GB, so all
later scaling was off by a factor of 16/15 (too high), and since there
was less than 1/16 of 1GB to spare, PAE didn't fit.

Only 'pipe' is reduced significantly to fit.  swzone is reduced to 1 page
in all cases, so it doesn't show here.  It is about the same as sfbuf IIRC.
The 

Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Enji Cooper


> On Feb 14, 2019, at 15:34, Gleb Smirnoff wrote:
> 
>  Hi Justin,
> 
> On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
> J> This seems to break 32-bit platforms, or at least 32-bit book-e
> J> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
> J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> J> leaving very little left for the rest of runtime.
> J> 
> J> I spent a couple hours earlier today debugging with Mark Johnston, and
> J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
> J> platforms.  Unfortunately I know very little about this area, so can't
> J> provide much extra insight, but can readily reproduce the issues I see
> J> triggered by this change, so am willing to help where I can.
> 
> Ok, let's roll back to old default on 32-bit platforms and somewhat
> reduce the default on 64-bits.
> 
> Can you please confirm that the patch attached works for you?

Quick question: why was the value reduced by a factor of 4 on 64-bit 
platforms?
Thanks so much!
-Enji


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Gleb Smirnoff
  Hi Justin,

On Wed, Feb 13, 2019 at 07:24:50PM -0600, Justin Hibbits wrote:
J> This seems to break 32-bit platforms, or at least 32-bit book-e
J> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
J> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
J> leaving very little left for the rest of runtime.
J> 
J> I spent a couple hours earlier today debugging with Mark Johnston, and
J> his consensus is that the vnode_pbuf_zone is too big on 32-bit
J> platforms.  Unfortunately I know very little about this area, so can't
J> provide much extra insight, but can readily reproduce the issues I see
J> triggered by this change, so am willing to help where I can.

Ok, let's roll back to old default on 32-bit platforms and somewhat
reduce the default on 64-bits.

Can you please confirm that the patch attached works for you?
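
With the patch the vnode pbuf count also becomes a loader tunable (read-only
after boot), so an affected machine can override the default explicitly, e.g.
with a line like the following in /boot/loader.conf (the value here is only
an illustration, not a recommendation):

	vm.vnode_pbufs="64"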

-- 
Gleb Smirnoff
diff --git a/sys/vm/vnode_pager.c b/sys/vm/vnode_pager.c
index 3e71ab4436cc..ded9e65e4e4c 100644
--- a/sys/vm/vnode_pager.c
+++ b/sys/vm/vnode_pager.c
@@ -115,13 +115,23 @@ SYSCTL_PROC(_debug, OID_AUTO, vnode_domainset, CTLTYPE_STRING | CTLFLAG_RW,
    &vnode_domainset, 0, sysctl_handle_domainset, "A",
 "Default vnode NUMA policy");
 
+static int nvnpbufs;
+SYSCTL_INT(_vm, OID_AUTO, vnode_pbufs, CTLFLAG_RDTUN | CTLFLAG_NOFETCH,
+    &nvnpbufs, 0, "number of physical buffers allocated for vnode pager");
+
 static uma_zone_t vnode_pbuf_zone;
 
 static void
 vnode_pager_init(void *dummy)
 {
 
-	vnode_pbuf_zone = pbuf_zsecond_create("vnpbuf", nswbuf * 8);
+#ifdef __LP64__
+	nvnpbufs = nswbuf * 2;
+#else
+	nvnpbufs = nswbuf / 2;
+#endif
+	TUNABLE_INT_FETCH("vm.vnode_pbufs", &nvnpbufs);
+	vnode_pbuf_zone = pbuf_zsecond_create("vnpbuf", nvnpbufs);
 }
 SYSINIT(vnode_pager, SI_SUB_CPU, SI_ORDER_ANY, vnode_pager_init, NULL);
 


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Mark Johnston
On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
> On Wed, 13 Feb 2019, Justin Hibbits wrote:
> 
> > On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> > Gleb Smirnoff wrote:
> >
> >> Author: glebius
> >> Date: Tue Jan 15 01:02:16 2019
> >> New Revision: 343030
> >> URL: https://svnweb.freebsd.org/changeset/base/343030
> >>
> >> Log:
> >>   Allocate pager bufs from UMA instead of 80-ish mutex protected
> >> linked list.
> > ...
> >
> > This seems to break 32-bit platforms, or at least 32-bit book-e
> > powerpc, which has a limited KVA space (~500MB).  It preallocates I've
> > seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> > leaving very little left for the rest of runtime.
> 
> Hrmph.  I complained about other things in this commit when it was
> committed, but not about this, the largest bug, since preallocation was
> broken then and I thought it wasn't being done, so the problems are
> smaller unless the excessive limits are actually reached.
> 
> Now i386 does it:
> 
> XX ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
> XX 
> XX swrbuf: 336,128,   0,   0,   0,   0,   0
> XX swwbuf: 336, 64,   0,   0,   0,   0,   0
> XX nfspbuf:336,128,   0,   0,   0,   0,   0
> XX mdpbuf: 336, 25,   0,   0,   0,   0,   0
> XX clpbuf: 336,128,   0,   5,   4,   0,   0
> XX vnpbuf: 336,   2048,   0,   0,   0,   0,   0
> XX pbuf:   336, 16,   0,2535,   0,   0,   0
> 
> but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
> noticed there.
> 
> The preallocation wasn't there in my last mail to the author about nearby
> bugs, on 24 Jan 2019:
> 
> YY vnpbuf: 568,   2048,   0,   0,   0,   0,   0
> YY clpbuf: 568,128,   0, 128,8750,   0,   1
> YY pbuf:   568, 16,   0,   4,   0,   0,   0
> 
> This output is on amd64 where the SIZE is larger and everything else was
> the same as on i386.  Now amd64 shows the large preallocation too.
> 
> There seems to be another bug for the especially small LIMIT of 16 to
> turn into a preallocation of 2535 and not cause immediate reduction to
> the limit.
> 
> I happen to have kernels from 24 and 25 Jan handy.  The first one is
> amd64 r343346M built on Jan 23, and it doesn't do the large
> preallocation.  The second one is i386 r343388:343418M built on Jan
> 25, and it does the large preallocation.  Both call uma_prealloc() to
> ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
> 4 buffers while the later version allocates 0x9e9 buffers.
> 
> The only relevant commit between the good and bad versions seems to be
> r343453.  This fixes uma_prealloc() to actually work.  But it is a feature
> for it to not work when its caller asks for too much.

I guess you meant r343353.  In any case, the pbuf keg is _NOFREE, so
even without preallocation the large pbuf zone limits may become
problematic if there are bursts of allocation requests.

> 0x9e9 is the sum of the LIMITs of all pbuf pools.  The main bug in
> r343030 is that it expands nswbuf, which is supposed to give the
> combined limit, from its normal value of 256 to 0x9e9.  (r343030
> actually used nswbuf before it was properly initialized, so used its
> maximum value of 256 even on small systems with nswbuf = 16.  Only
> this has been fixed.)
> 
> On i386, nbuf is excessively limited so as to give a maxbufspace of
> about 100MB so as to fit in 1GB of kva even with infinite RAM and
> -current's actual 4GB of kva.  nbuf is correctly limited to give a
> much smaller maxbufspace when RAM is small (kva scaling for this is
> not done so well).  nswbuf is restricted if nbuf is restricted, but
> not enough (except in my version).  It is normally 256, so the pbuf
> allocation used to be 32MB, and this is already a bit large compared
> with 100MB for maxbufspace.  Expanding pbufs by a factor of 0x9e9/0x100
> gives the silly combination of 100MB for maxbufspace and 317MB for
> pbufs.
> 
> If kva is only 512MB instead of 1GB, then maxbufspace should be only
> 50MB and nswbuf should be smaller too.  Similarly for PAE on i386 back
> when it was configured with 1GB kva by default.  Only about 512MB are
> left after allocating space for page table metadata.  I have fixes
> that scale most of this better.  Large subsystems starting with kmem
> get a hard-coded fraction of the usable kva.  E.g., kmem gets about
> 60% of usable kva instead of about 40% of nominal kva.  Most other
> large subsystems including the buffer cache get about 1/8 of the
> remaining 40% of usable kva.  Scaling for other subsystems is mostly
> worse than for kmem.  pbufs are part of the buffer cache allocation.
> The expansion factor of 0x9e9/0x100 breaks this.
> 
> I don't understand how pbuf_preallocate() allocates 

Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-14 Thread Konstantin Belousov
On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
> I don't understand how pbuf_preallocate() allocates for the other
> pbuf pools.  When I debugged this for clpbufs, the preallocation was
> not used.  pbuf types other than clpbufs seem to be unused in my
> configurations.  I thought that pbufs were used during initialization,
> since they end up with a nonzero FREE count, but their only use seems
> to be to preallocate them.
vnode_pager_generic_getpages() is typically not used for UFS on modern
systems.  Instead the buffer pager is active, which does not need pbufs;
it uses real buffers coherent with the UFS buffer cache.

To get to the actual use of pbufs now you can:
- perform clustered buffer io;
- use vnode-backed md(4) (this case is still broken if md(4) is loaded
  as a module);
- cause system swapping;
- use sendfile(2).
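
As an illustration of the last item, this is what any static-content web
server effectively does, and it is the path for which the vnpbuf sizing
matters.  A minimal userland sketch (untested, descriptor names made up):

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/uio.h>
	#include <fcntl.h>
	#include <unistd.h>

	/* Send a whole file over an already-connected socket. */
	static int
	send_file(int sock, const char *path)
	{
		off_t sbytes = 0;
		int fd, error;

		fd = open(path, O_RDONLY);
		if (fd == -1)
			return (-1);
		/* nbytes == 0 asks sendfile(2) to send until end of file. */
		error = sendfile(fd, sock, 0, 0, NULL, &sbytes, 0);
		(void)close(fd);
		return (error);
	}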


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-13 Thread Bruce Evans

On Wed, 13 Feb 2019, Justin Hibbits wrote:


On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
Gleb Smirnoff wrote:


Author: glebius
Date: Tue Jan 15 01:02:16 2019
New Revision: 343030
URL: https://svnweb.freebsd.org/changeset/base/343030

Log:
  Allocate pager bufs from UMA instead of 80-ish mutex protected
linked list.

...

This seems to break 32-bit platforms, or at least 32-bit book-e
powerpc, which has a limited KVA space (~500MB).  It preallocates I've
seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
leaving very little left for the rest of runtime.


Hrmph.  I complained about other things in this commit when it was
committed, but not about this, the largest bug, since preallocation was
broken then and I thought it wasn't being done, so the problems are
smaller unless the excessive limits are actually reached.

Now i386 does it:

XX ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
XX 
XX swrbuf: 336,128,   0,   0,   0,   0,   0

XX swwbuf: 336, 64,   0,   0,   0,   0,   0
XX nfspbuf:336,128,   0,   0,   0,   0,   0
XX mdpbuf: 336, 25,   0,   0,   0,   0,   0
XX clpbuf: 336,128,   0,   5,   4,   0,   0
XX vnpbuf: 336,   2048,   0,   0,   0,   0,   0
XX pbuf:   336, 16,   0,2535,   0,   0,   0

but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
noticed there.

The preallocation wasn't there in my last mail to the author about nearby
bugs, on 24 Jan 2019:

YY vnpbuf: 568,   2048,   0,   0,   0,   0,   0
YY clpbuf: 568,128,   0, 128,8750,   0,   1
YY pbuf:   568, 16,   0,   4,   0,   0,   0

This output is on amd64 where the SIZE is larger and everything else was
the same as on i386.  Now amd64 shows the large preallocation too.

There seems to be another bug for the especially small LIMIT of 16 to
turn into a preallocation of 2535 and not cause immediate reduction to
the limit.

I happen to have kernels from 24 and 25 Jan handy.  The first one is
amd64 r343346M built on Jan 23, and it doesn't do the large
preallocation.  The second one is i386 r343388:343418M built on Jan
25, and it does the large preallocation.  Both call uma_prealloc() to
ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
4 buffers while the later version allocates 0x9e9 buffers.

The only relevant commit between the good and bad versions seems to be
r343453.  This fixes uma_prealloc() to actually work.  But it is a feature
for it to not work when its caller asks for too much.

0x9e9 is the sum of the LIMITs of all pbuf pools.  The main bug in
r343030 is that it expands nswbuf, which is supposed to give the
combined limit, from its normal value of 256 to 0x9e9.  (r343030
actually used nswbuf before it was properly initialized, so used its
maximum value of 256 even on small systems with nswbuf = 16.  Only
this has been fixed.)
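
(From the zone table above: 128 + 64 + 128 + 25 + 128 + 2048 + 16 = 2537 =
0x9e9, and 2537 pbufs at 128kB each is where the ~317MB figure below comes
from.)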

On i386, nbuf is excessively limited so as to give a maxbufspace of
about 100MB so as to fit in 1GB of kva even with infinite RAM and
-current's actual 4GB of kva.  nbuf is correctly limited to give a
much smaller maxbufspace when RAM is small (kva scaling for this is
not done so well).  nswbuf is restricted if nbuf is restricted, but
not enough (except in my version).  It is normally 256, so the pbuf
allocation used to be 32MB, and this is already a bit large compared
with 100MB for maxbufspace.  Expanding pbufs by a factor of 0x9e9/0x100
gives the silly combination of 100MB for maxbufspace and 317MB for
pbufs.

If kva is only 512MB instead of 1GB, then maxbufspace should be only
50MB and nswbuf should be smaller too.  Similarly for PAE on i386 back
when it was configured with 1GB kva by default.  Only about 512MB are
left after allocating space for page table metadata.  I have fixes
that scale most of this better.  Large subsystems starting with kmem
get a hard-coded fraction of the usable kva.  E.g., kmem gets about
60% of usable kva instead of about 40% of nominal kva.  Most other
large subsystems including the buffer cache get about 1/8 of the
remaining 40% of usable kva.  Scaling for other subsystems is mostly
worse than for kmem.  pbufs are part of the buffer cache allocation.
The expansion factor of 0x9e9/0x100 breaks this.
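
(As a rough concrete example of those fractions, taking 1GB of nominal kva
with about 15/16 of it usable after page table metadata: kmem would get
about 0.60 * 960MB ~= 576MB, and the buffer cache about
(0.40 * 960MB) / 8 ~= 48MB, with pbufs coming out of that last share.)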

I don't understand how pbuf_preallocate() allocates for the other
pbuf pools.  When I debugged this for clpbufs, the preallocation was
not used.  pbuf types other than clpbufs seem to be unused in my
configurations.  I thought that pbufs were used during initialization,
since they end up with a nonzero FREE count, but their only use seems
to be to preallocate them.

Bruce

Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-02-13 Thread Justin Hibbits
On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
Gleb Smirnoff wrote:

> Author: glebius
> Date: Tue Jan 15 01:02:16 2019
> New Revision: 343030
> URL: https://svnweb.freebsd.org/changeset/base/343030
> 
> Log:
>   Allocate pager bufs from UMA instead of 80-ish mutex protected
> linked list. 
>   o In vm_pager_bufferinit() create pbuf_zone and start accounting on
> how many pbufs are we going to have set.
> In various subsystems that are going to utilize pbufs create
> private zones via call to pbuf_zsecond_create(). The latter calls
> uma_zsecond_create(), and sets a limit on created zone. After startup
> preallocate pbufs according to requirements of all pbuf zones.
>   
> Subsystems that used to have a private limit with old allocator
> now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS
> cluster, FFS, swap, vnode pager.
>   
> The following subsystems use shared pbuf zone: cam(4), nvme(4),
> physio(9), aio(4). They should have their private limits, but
> changing that is out of scope of this commit.
>   
>   o Fetch tunable value of kern.nswbuf from init_param2() and while
> here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that
> was holding only this option.
> Default values aren't touched by this commit, but they probably
> should be reviewed wrt to modern hardware.
>   
>   This change removes a tight bottleneck from sendfile(2) operation,
> that uses pbufs in vnode pager. Other pagers also would benefit from
> faster allocation.
>   
>   Together with:  gallatin
>   Tested by:  pho
> 
> Modified:
>   head/sys/cam/cam_periph.c
>   head/sys/conf/options
>   head/sys/dev/md/md.c
>   head/sys/dev/nvme/nvme_ctrlr.c
>   head/sys/fs/fuse/fuse_main.c
>   head/sys/fs/fuse/fuse_vnops.c
>   head/sys/fs/nfsclient/nfs_clbio.c
>   head/sys/fs/nfsclient/nfs_clport.c
>   head/sys/fs/smbfs/smbfs_io.c
>   head/sys/fs/smbfs/smbfs_vfsops.c
>   head/sys/kern/kern_physio.c
>   head/sys/kern/subr_param.c
>   head/sys/kern/vfs_aio.c
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_cluster.c
>   head/sys/sys/buf.h
>   head/sys/ufs/ffs/ffs_rawread.c
>   head/sys/vm/swap_pager.c
>   head/sys/vm/vm_pager.c
>   head/sys/vm/vnode_pager.c
> 

Hi Gleb,

This seems to break 32-bit platforms, or at least 32-bit book-e
powerpc, which has a limited KVA space (~500MB).  It preallocates pbufs
(I've seen over 2500 of them, at 128kB each), eating up over 300MB of KVA
and leaving very little for the rest of runtime.

I spent a couple hours earlier today debugging with Mark Johnston, and
his consensus is that the vnode_pbuf_zone is too big on 32-bit
platforms.  Unfortunately I know very little about this area, so can't
provide much extra insight, but can readily reproduce the issues I see
triggered by this change, so am willing to help where I can.

- Justin


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-01-17 Thread Conrad Meyer
On WITNESS builds after this change and all followup fixes I can see
(@ r343108), I get a new warning:

Sleeping on "pageprocwait" with the following non-sleepable locks held:
exclusive sleep mutex pbuf (UMA zone) r = 0 (0xf80003033e00)
locked @ .../sys/vm/uma_core.c:1139
stack backtrace:
#0 0x80c0a164 at witness_debugger.part.14+0xa4
#1 0x80c0d465 at witness_warn+0x285
#2 0x80baa3b9 at _sleep+0x59
#3 0x80f03c73 at vm_wait_doms+0x103
#4 0x80eeb8e5 at vm_domainset_iter_policy+0x55
#5 0x80eeab59 at uma_prealloc+0xc9
#6 0x80f0d643 at pbuf_zsecond_create+0x63
#7 0x80ee2c9f at swap_pager_swap_init+0x5f
#8 0x80f0cf57 at vm_pageout+0x27

For what it's worth, this is a bhyve guest with vm.ndomains 1.  (The
bhyve host has 2 domains, but I don't see how that would be relevant.)

Best,
Conrad

On Mon, Jan 14, 2019 at 5:02 PM Gleb Smirnoff wrote:
>
> Author: glebius
> Date: Tue Jan 15 01:02:16 2019
> New Revision: 343030
> URL: https://svnweb.freebsd.org/changeset/base/343030
>
> Log:
>   Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
>
>   o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
> pbufs are we going to have set.
> In various subsystems that are going to utilize pbufs create private zones
> via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
> and sets a limit on created zone. After startup preallocate pbufs 
> according
> to requirements of all pbuf zones.
>
> Subsystems that used to have a private limit with old allocator now have
> private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
> swap, vnode pager.
>
> The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
> aio(4). They should have their private limits, but changing that is out of
> scope of this commit.
>
>   o Fetch tunable value of kern.nswbuf from init_param2() and while here move
> NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
> this option.
> Default values aren't touched by this commit, but they probably should be
> reviewed wrt to modern hardware.
>
>   This change removes a tight bottleneck from sendfile(2) operation, that
>   uses pbufs in vnode pager. Other pagers also would benefit from faster
>   allocation.
>
>   Together with:  gallatin
>   Tested by:  pho
>
> Modified:
>   head/sys/cam/cam_periph.c
>   head/sys/conf/options
>   head/sys/dev/md/md.c
>   head/sys/dev/nvme/nvme_ctrlr.c
>   head/sys/fs/fuse/fuse_main.c
>   head/sys/fs/fuse/fuse_vnops.c
>   head/sys/fs/nfsclient/nfs_clbio.c
>   head/sys/fs/nfsclient/nfs_clport.c
>   head/sys/fs/smbfs/smbfs_io.c
>   head/sys/fs/smbfs/smbfs_vfsops.c
>   head/sys/kern/kern_physio.c
>   head/sys/kern/subr_param.c
>   head/sys/kern/vfs_aio.c
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_cluster.c
>   head/sys/sys/buf.h
>   head/sys/ufs/ffs/ffs_rawread.c
>   head/sys/vm/swap_pager.c
>   head/sys/vm/vm_pager.c
>   head/sys/vm/vnode_pager.c
>
> Modified: head/sys/cam/cam_periph.c
> ==
> --- head/sys/cam/cam_periph.c   Tue Jan 15 00:52:41 2019(r343029)
> +++ head/sys/cam/cam_periph.c   Tue Jan 15 01:02:16 2019(r343030)
> @@ -936,7 +936,7 @@ cam_periph_mapmem(union ccb *ccb, struct cam_periph_ma
> /*
>  * Get the buffer.
>  */
> -   mapinfo->bp[i] = getpbuf(NULL);
> +   mapinfo->bp[i] = uma_zalloc(pbuf_zone, M_WAITOK);
>
> /* put our pointer in the data slot */
> mapinfo->bp[i]->b_data = *data_ptrs[i];
> @@ -962,9 +962,9 @@ cam_periph_mapmem(union ccb *ccb, struct cam_periph_ma
> for (j = 0; j < i; ++j) {
> *data_ptrs[j] = mapinfo->bp[j]->b_caller1;
> vunmapbuf(mapinfo->bp[j]);
> -   relpbuf(mapinfo->bp[j], NULL);
> +   uma_zfree(pbuf_zone, mapinfo->bp[j]);
> }
> -   relpbuf(mapinfo->bp[i], NULL);
> +   uma_zfree(pbuf_zone, mapinfo->bp[i]);
> PRELE(curproc);
> return(EACCES);
> }
> @@ -1052,7 +1052,7 @@ cam_periph_unmapmem(union ccb *ccb, struct cam_periph_
> vunmapbuf(mapinfo->bp[i]);
>
> /* release the buffer */
> -   relpbuf(mapinfo->bp[i], NULL);
> +   uma_zfree(pbuf_zone, mapinfo->bp[i]);
> }
>
> /* allow ourselves to be swapped once again */
>
> Modified: head/sys/conf/options
> ==
> --- head/sys/conf/options   Tue Jan 15 00:52:41 2019(r343029)
> +++ head/sys/conf/options   Tue Jan 

Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-01-15 Thread Gleb Smirnoff
On Tue, Jan 15, 2019 at 11:13:18AM -0500, Pedro Giffuni wrote:
P> >>    Allocate pager bufs from UMA instead of 80-ish mutex protected 
P> >> linked list.
P> >
P> >>    Together with:    gallatin
P> >
P> > Thank you so much for carrying this over the finish line!
P> >
P> It appears to be very impressive! Plans for MFC?

Nope. I'm very conservative about stable branch being stable branch :)

-- 
Gleb Smirnoff


Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-01-15 Thread Pedro Giffuni


On 1/15/19 11:07 AM, Andrew Gallatin wrote:

On 1/14/19 8:02 PM, Gleb Smirnoff wrote:


Log:
   Allocate pager bufs from UMA instead of 80-ish mutex protected 
linked list.


<...>


   Together with:    gallatin


Thank you so much for carrying this over the finish line!

Drew



It appears to be very impressive! Plans for MFC?

Pedro.



Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

2019-01-15 Thread Andrew Gallatin

On 1/14/19 8:02 PM, Gleb Smirnoff wrote:


Log:
   Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.


<...>


   Together with:   gallatin


Thank you so much for carrying this over the finish line!

Drew
