Re: nullfs and ZFS issues

2022-04-20 Thread Alexander Leidinger
Quoting Doug Ambrisko  (from Wed, 20 Apr 2022  
09:20:33 -0700):



On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
| Quoting Doug Ambrisko  (from Mon, 18 Apr 2022
| 16:32:38 -0700):
|
| > With nullfs, nocache and setting max vnodes to a low number I can
|
| Where is nocache documented? I don't see it in mount_nullfs(8),
| mount(8) or nullfs(5).

I didn't find it but it is in:
	src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||


Also some file systems disable it via MNTK_NULL_NOCACHE
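
For reference, a minimal sketch of that check (an illustration only, not the
actual null_vfsops.c code; the helper name is made up) showing how a file
system can detect a per-mount "nocache" option with vfs_getopt(9):

	#include <sys/param.h>
	#include <sys/mount.h>

	/*
	 * Illustrative helper only: vfs_getopt(9) returns 0 when the named
	 * option is present in the mount's new option list, so this reports
	 * whether "nocache" was passed to mount(8).
	 */
	static bool
	mount_has_nocache(struct mount *mp)
	{
		return (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0);
	}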


Does the attached diff look ok?


| I tried a nullfs mount with nocache and it doesn't show up in the
| output of "mount".

Yep, I saw that as well.  I could tell by dropping into ddb and then
doing a "show mount" on the FS and looking at the count.  That is why I
added the vnode count to mount -v, so I could see the usage without
dropping into ddb.


I tried nocache on a system with a lot of jails which use nullfs, and
which showed very slow daily periodic runs (12h for the run in the night
after boot, 24h or more in subsequent nights). Now the first nightly run
after boot finished after 4h.


What is the benefit of not disabling the cache in nullfs? I would  
expect zfs (or ufs) to cache the (meta)data anyway.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
diff --git a/sbin/mount/mount.8 b/sbin/mount/mount.8
index 2a877c04c07..823df63953d 100644
--- a/sbin/mount/mount.8
+++ b/sbin/mount/mount.8
@@ -28,7 +28,7 @@
 .\" @(#)mount.8	8.8 (Berkeley) 6/16/94
 .\" $FreeBSD$
 .\"
-.Dd March 17, 2022
+.Dd April 21, 2022
 .Dt MOUNT 8
 .Os
 .Sh NAME
@@ -245,6 +245,9 @@ This file system should be skipped when
 is run with the
 .Fl a
 flag.
+.It Cm nocache
+Disable caching.
+Some filesystems may not support this.
 .It Cm noclusterr
 Disable read clustering.
 .It Cm noclusterw




'set but unused' breaks drm-*-kmod

2022-04-20 Thread Michael Butler

Seems this new requirement breaks kmod builds too ..

The first of many errors was (I stopped chasing them all for lack of 
time) ..


--- amdgpu_cs.o ---
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26: 
error: variable 'priority' set but not used 
[-Werror,-Wunused-but-set-variable]

enum drm_sched_priority priority;
^
1 error generated.
*** [amdgpu_cs.o] Error code 1
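
For context, a stripped-down example (hypothetical, not the amdgpu source) of
the pattern that -Wunused-but-set-variable rejects once -Werror is in effect:

	/* The variable is written below, but its value is never read. */
	int
	example(int flags)
	{
		int priority;

		priority = flags & 0xff;	/* warning: variable 'priority' set but not used */
		return (0);
	}

The usual fixes are to delete the variable or actually use the computed value.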



Re: nullfs and ZFS issues

2022-04-20 Thread Doug Ambrisko
On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
| Quoting Doug Ambrisko  (from Mon, 18 Apr 2022  
| 16:32:38 -0700):
| 
| > With nullfs, nocache and setting max vnodes to a low number I can
| 
| Where is nocache documented? I don't see it in mount_nullfs(8),  
| mount(8) or nullfs(5).

I didn't find it but it is in:
src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||

Also some file systems disable it via MNTK_NULL_NOCACHE

| I tried a nullfs mount with nocache and it doesn't show up in the  
| output of "mount".

Yep, I saw that as well.  I could tell by dropping into ddb and then
doing a "show mount" on the FS and looking at the count.  That is why I
added the vnode count to mount -v, so I could see the usage without
dropping into ddb.

Doug A.



Re: nullfs and ZFS issues

2022-04-20 Thread Doug Ambrisko
On Wed, Apr 20, 2022 at 11:43:10AM +0200, Mateusz Guzik wrote:
| On 4/19/22, Doug Ambrisko  wrote:
| > On Tue, Apr 19, 2022 at 11:47:22AM +0200, Mateusz Guzik wrote:
| > | Try this: https://people.freebsd.org/~mjg/vnlru_free_pick.diff
| > |
| > | this is not committable but should validate whether it works fine
| >
| > As a POC it's working.  I see the vnode count for the nullfs and
| > ZFS go up.  The ARC cache also goes up until it exceeds the ARC max.
| > size, then the vnodes for nullfs and ZFS go down.  The ARC cache goes
| > down as well.  This all repeats over and over.  The system seems
| > healthy.  No excessive running of arc_prune or arc_evict.
| >
| > My only comment is that the vnode freeing seems a bit aggressive.
| > Going from ~15,000 to ~200 vnodes for nullfs and the same for ZFS.
| > The ARC drops from 70M to 7M (max is set at 64M) for this unit
| > test.
| >
| 
| Can you check what kind of shrinking is requested by arc to begin
| with? I imagine encountering a nullfs vnode may end up recycling 2
| instead of 1, but even repeated a lot it does not explain the above.

I dug into it a bit more and think there could be a bug in
module/zfs/arc.c, in arc_evict_meta_balanced(uint64_t meta_used):

	prune += zfs_arc_meta_prune;
	//arc_prune_async(prune);
	arc_prune_async(zfs_arc_meta_prune);

Since arc_prune_async queues up a run of arc_prune_task for each
call, it is actually already accumulating the zfs_arc_meta_prune
amount.  Passing the running total makes the count handed to
vnlru_free_impl get really big quickly, since the caller is looping
via restart.
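
To see the effect, here is a small stand-alone sketch (user-space, made-up
numbers, not ZFS code) comparing the total prune count requested when the
accumulated "prune" value is passed on every restart versus passing only the
per-call zfs_arc_meta_prune:

	#include <stdio.h>

	int
	main(void)
	{
		const long zfs_arc_meta_prune = 100;	/* per-iteration increment (illustrative value) */
		const int restarts = 10;		/* restarts of the eviction loop (illustrative) */
		long prune = 0, total_accum = 0, total_fixed = 0;

		for (int i = 0; i < restarts; i++) {
			prune += zfs_arc_meta_prune;
			total_accum += prune;			/* models arc_prune_async(prune) */
			total_fixed += zfs_arc_meta_prune;	/* models arc_prune_async(zfs_arc_meta_prune) */
		}
		printf("accumulated argument: %ld vnodes requested in total\n", total_accum);
		printf("per-call argument:    %ld vnodes requested in total\n", total_fixed);
		return (0);
	}

With 10 restarts this asks for 5500 vnodes in the accumulated case versus
1000 in the per-call case, which is the kind of blow-up described above.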

   1 HELLO arc_prune_task 164   ticks 2147465958 count 2048

dmesg | grep arc_prune_task | uniq -c
  14 HELLO arc_prune_task 164   ticks -2147343772 count 100
  50 HELLO arc_prune_task 164   ticks -2147343771 count 100
  46 HELLO arc_prune_task 164   ticks -2147343770 count 100
  49 HELLO arc_prune_task 164   ticks -2147343769 count 100
  44 HELLO arc_prune_task 164   ticks -2147343768 count 100
 116 HELLO arc_prune_task 164   ticks -2147343767 count 100
1541 HELLO arc_prune_task 164   ticks -2147343766 count 100
  53 HELLO arc_prune_task 164   ticks -2147343101 count 100
 100 HELLO arc_prune_task 164   ticks -2147343100 count 100
  75 HELLO arc_prune_task 164   ticks -2147343099 count 100
  52 HELLO arc_prune_task 164   ticks -2147343098 count 100
  50 HELLO arc_prune_task 164   ticks -2147343097 count 100
  51 HELLO arc_prune_task 164   ticks -2147343096 count 100
 783 HELLO arc_prune_task 164   ticks -2147343095 count 100
 884 HELLO arc_prune_task 164   ticks -2147343094 count 100

Note I shrank vfs.zfs.arc.meta_prune to 100 to see how that might
help.  Changing it to 1 helps more!  I see less aggressive
swings.

I added

	printf("HELLO %s %d   ticks %d count %ld\n", __FUNCTION__, __LINE__, ticks, nr_scan);

to arc_prune_task.

Adjusting both
sysctl vfs.zfs.arc.meta_adjust_restarts=1
sysctl vfs.zfs.arc.meta_prune=100

without changing arc_prune_async(prune) helps avoid excessive swings.

Thanks,

Doug A.

| > | On 4/19/22, Mateusz Guzik  wrote:
| > | > On 4/19/22, Mateusz Guzik  wrote:
| > | >> On 4/19/22, Doug Ambrisko  wrote:
| > | >>> I've switched my laptop to use nullfs and ZFS.  Previously, I used
| > | >>> localhost NFS mounts instead of nullfs when nullfs would complain
| > | >>> that it couldn't mount.  Since that check has been removed, I've
| > | >>> switched to nullfs only.  However, every so often my laptop would
| > | >>> get slow and the ARC evict and prune thread would consume two
| > | >>> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
| > | >>> it to 2G now.  Looking into this has uncovered some issues:
| > | >>>  - nullfs would prevent vnlru_free_vfsops from doing anything
| > | >>>    when called from ZFS arc_prune_task
| > | >>>  - nullfs would hang onto a bunch of vnodes unless mounted with
| > | >>>    nocache
| > | >>>  - nullfs and nocache would break untar.  This has been fixed
| > | >>>    now.
| > | >>>
| > | >>> With nullfs, nocache and setting max vnodes to a low number I can
| > | >>> keep the ARC around the max. without evict and prune consuming
| > | >>> 100% of 2 cores.  This doesn't seem like the best solution but it's
| > | >>> better than when the ARC starts spinning.
| > | >>>
| > | >>> Looking into this issue with bhyve and a md drive for testing I create
| > | >>> a brand new zpool mounted as /test and then nullfs mount /test to /mnt.
| > | >>> I loop through untarring the Linux kernel into the nullfs mount, rm -rf it
| > | >>> and repeat.  I set the ARC to the smallest value I can.  Untarring the
| > | >>> Linux kernel was enough to get the ARC evict and prune to spin since
| > | >>> they couldn't evict/prune anything.
| > | >>>
| > | >>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
| > | >>>   static int

Re: nullfs and ZFS issues

2022-04-20 Thread Mateusz Guzik
On 4/19/22, Doug Ambrisko  wrote:
> On Tue, Apr 19, 2022 at 11:47:22AM +0200, Mateusz Guzik wrote:
> | Try this: https://people.freebsd.org/~mjg/vnlru_free_pick.diff
> |
> | this is not committable but should validate whether it works fine
>
> As a POC it's working.  I see the vnode count for the nullfs and
> ZFS go up.  The ARC cache also goes up until it exceeds the ARC max.
> size, then the vnodes for nullfs and ZFS go down.  The ARC cache goes
> down as well.  This all repeats over and over.  The system seems
> healthy.  No excessive running of arc_prune or arc_evict.
>
> My only comment is that the vnode freeing seems a bit aggressive.
> Going from ~15,000 to ~200 vnodes for nullfs and the same for ZFS.
> The ARC drops from 70M to 7M (max is set at 64M) for this unit
> test.
>

Can you check what kind of shrinking is requested by arc to begin
with? I imagine encountering a nullfs vnode may end up recycling 2
instead of 1, but even repeated a lot it does not explain the above.

>
> | On 4/19/22, Mateusz Guzik  wrote:
> | > On 4/19/22, Mateusz Guzik  wrote:
> | >> On 4/19/22, Doug Ambrisko  wrote:
> | >>> I've switched my laptop to use nullfs and ZFS.  Previously, I used
> | >>> localhost NFS mounts instead of nullfs when nullfs would complain
> | >>> that it couldn't mount.  Since that check has been removed, I've
> | >>> switched to nullfs only.  However, every so often my laptop would
> | >>> get slow and the ARC evict and prune thread would consume two
> | >>> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
> | >>> it to 2G now.  Looking into this has uncovered some issues:
> | >>>  -  nullfs would prevent vnlru_free_vfsops from doing anything
> | >>> when called from ZFS arc_prune_task
> | >>>  -  nullfs would hang onto a bunch of vnodes unless mounted with
> | >>> nocache
> | >>>  -  nullfs and nocache would break untar.  This has been fixed
> | >>> now.
> | >>>
> | >>> With nullfs, nocache and setting max vnodes to a low number I can
> | >>> keep the ARC around the max. without evict and prune consuming
> | >>> 100% of 2 cores.  This doesn't seem like the best solution but it's
> | >>> better than when the ARC starts spinning.
> | >>>
> | >>> Looking into this issue with bhyve and a md drive for testing I create
> | >>> a brand new zpool mounted as /test and then nullfs mount /test to /mnt.
> | >>> I loop through untarring the Linux kernel into the nullfs mount, rm -rf it
> | >>> and repeat.  I set the ARC to the smallest value I can.  Untarring the
> | >>> Linux kernel was enough to get the ARC evict and prune to spin since
> | >>> they couldn't evict/prune anything.
> | >>>
> | >>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
> | >>>   static int
> | >>>   vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
> | >>>   {
> | >>> ...
> | >>>
> | >>> for (;;) {
> | >>> ...
> | >>> vp = TAILQ_NEXT(vp, v_vnodelist);
> | >>> ...
> | >>>
> | >>> /*
> | >>>  * Don't recycle if our vnode is from different type
> | >>>  * of mount point.  Note that mp is type-safe, the
> | >>>  * check does not reach unmapped address even if
> | >>>  * vnode is reclaimed.
> | >>>  */
> | >>> if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
> | >>> mp->mnt_op != mnt_op) {
> | >>> continue;
> | >>> }
> | >>> ...
> | >>>
> | >>> The vp ends up being on the nullfs mount and then hits the continue
> | >>> even though the passed in mvp is on ZFS.  If I do a hack to
> | >>> comment out the continue then I see the ARC, nullfs vnodes and
> | >>> ZFS vnodes grow.  When the ARC calls arc_prune_task that calls
> | >>> vnlru_free_vfsops and now the vnodes go down for nullfs and ZFS.
> | >>> The ARC cache usage also goes down.  Then they increase again until
> | >>> the ARC gets full and then they go down again.  So with this hack
> | >>> I don't need nocache passed to nullfs and I don't need to limit
> | >>> the max vnodes.  Doing multiple untars in parallel over and over
> | >>> doesn't seem to cause any issues for this test.  I'm not saying
> | >>> commenting out continue is the fix but a simple POC test.
> | >>>
> | >>
> | >> I don't see an easy way to say "this is a nullfs vnode holding onto a
> | >> zfs vnode". Perhaps the routine can be extrended with issuing a nullfs
> | >> callback, if the module is loaded.
> | >>
> | >> In the meantime I think a good enough(tm) fix would be to check that
> | >> nothing was freed and fallback to good old regular clean up without
> | >> filtering by vfsops. This would be very similar to what you are doing
> | >> with your hack.
> | >>
> | >
> | > Now that I wrote this perhaps an acceptable hack would be to extend
> | > struct mount with a pointer to "lower layer" mount 
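
A minimal sketch of the "check that nothing was freed and fall back" idea
quoted above (assumptions: the wrapper name is made up, and vnlru_free_impl()
is assumed to return the number of vnodes it freed, as the int return type in
the excerpt suggests):

	#include <sys/param.h>
	#include <sys/mount.h>
	#include <sys/vnode.h>

	/* Prototype taken from the excerpt quoted earlier in the thread. */
	static int	vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp);

	/*
	 * Hypothetical wrapper: try the vfsops-filtered reclaim first and, if
	 * nothing could be freed (for example because only nullfs vnodes sit in
	 * front of the marker), retry once without the filter.  This is roughly
	 * what commenting out the "continue" achieves, but only as a fallback.
	 */
	static int
	vnlru_free_with_fallback(int count, struct vfsops *mnt_op, struct vnode *mvp)
	{
		int freed;

		freed = vnlru_free_impl(count, mnt_op, mvp);
		if (freed == 0 && mnt_op != NULL)
			freed = vnlru_free_impl(count, NULL, mvp);
		return (freed);
	}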

Re: nullfs and ZFS issues

2022-04-20 Thread Alexander Leidinger
Quoting Doug Ambrisko  (from Mon, 18 Apr 2022  
16:32:38 -0700):



With nullfs, nocache and setting max vnodes to a low number I can


Where is nocache documented? I don't see it in mount_nullfs(8),  
mount(8) or nullfs(5).


I tried a nullfs mount with nocache and it doesn't show up in the  
output of "mount".


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF

