Re: Adaptive RW locks

2020-03-10 Thread Andrew Doran
Following up again..  It turns out the deadlocks I saw with this were down
to a problem with kernel_lock that has since been solved.

I made a couple more tweaks to it: up the max number of tracked holds to 8
because vmobjlock became a rwlock, and because aiodoned is gone now be more
selective giving up and going to sleep in the softint case (walk down the
list of pinned LWPs on curcpu and see if any of them holds the lock that's
wanted, if not held on curcpu then some other CPU has it, and it's OK to
spin).

http://www.netbsd.org/~ad/2020/rwlock3.diff

I'm going to sit on this change for a while longer though to make sure there
is no performance regression or other problems.

Andrew


re: Please review: lookup changes

2020-03-10 Thread matthew green
> This reminds me that we need to find a way to release upper layer
> vnodes when the lower layer is recyclable -- see the comment in
> layer_inactive.  Otherwise files deleted under a null mount while open
> through the null mount may persist on disk indefinitely as long as the
> upper layer is still around...

FWIW, i think it's worse than you describe.  for me:

- it doesn't matter what layer i delete the files from

- the files aren't open anymore

the only way i found to recover disk space was to unmount
the upper layer (which hung for a while, assuming writing
the new disk contents), or, later reduce kern.maxvnodes.


.mrg.


Re: NULL pointer arithmetic issues

2020-03-10 Thread Greg A. Woods
At Mon, 9 Mar 2020 17:36:24 +0100, Joerg Sonnenberger  wrote:
Subject: Re: NULL pointer arithmetic issues
>
> I consider it as something even worse. Just like the case of passing
> NULL pointers to memcpy and friends with zero as size, this
> interpretation / restriction in the standard is actively harmful to some
> code for the sake of potential optimisation opportunities in other code.
> It seems to be a poor choice at that. I.e. it requires adding
> conditional branches for something that behaves sanely everywhere but
> may the DS9k.

Indeed.

I way the very existence of anything called "Undefined Behaviour" and
its exploitation by optimizers is evil.  (by definition, especially if
we accept as valid the claim that "Premature optimization is the root of
all evil in programming" -- this is of course a little bit of a stretch
since my claim could be twisted to say that any and all automatic
optimzation by a compiler or toolchain is evil, but of course that's not
exactly my intent -- normal optimization which does not change the
behaviour and intent of the code is, IMO, OK, but defining "intent" is
obviously the problem)

So in Standard C all "Undefined Behaviour" should be changed to
"Implementation Defined" and it should be required that the
implementation is not allowed to abuse any such things for the evil
purpose of premature optimzation.  For this kind of thing adding an
integer to a pointer (or the equivalent, e.g. taking the address of a
field in a struct pointed to by a nil pointer) should always do just
that, even if the pointer can be proven to be a nil pointer at compile
time.  It is wrong to do anything else, and absolute insanity to remove
any other code just because the compiler assumes SIGSEGV
would/should/could happen before the other code gets a chance to run.

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 


pgpGIvFrvwd81.pgp
Description: OpenPGP Digital Signature


Re: Please review: lookup changes

2020-03-10 Thread Taylor R Campbell
> Date: Mon, 9 Mar 2020 19:59:03 +
> From: Andrew Doran 
> 
> As far as I recall layer vnodes persist unless recycled like any other
> vnode, so there should be nothing special happening there around contention
> due to lifecycle.  I think you could cache or transpose the cached names
> above, maybe using some kind of VOP.  I definitely think it's doable but
> beyond that, at least for me thare lots of unknowns so I have avoided it.

This reminds me that we need to find a way to release upper layer
vnodes when the lower layer is recyclable -- see the comment in
layer_inactive.  Otherwise files deleted under a null mount while open
through the null mount may persist on disk indefinitely as long as the
upper layer is still around...

(But no need for that to be in the scope of namecache improvements.)

> I also expect that if ufs_lookup() were opened up to allow a shared lock
> along the full path then pressure would be diverted to buffer locks
> previously shielded by the vnode lock.  I would be unsurprised to see that
> performing worse under contention than the single exclusive lock with that
> code as-is.  There's also the fact that modifications and uncached lookups*
> are often a tiny proportion of activity on the system in comparison to
> cached lookups, so without a specific application in mind I'm not sure
> there's a great deal to be won there.
> 
> Looking up names directly in the namecache bypasses these problem to some
> extent, because the directory vnode lock needs to be touched far less often,
> and when it is touched it's because there's real work to be done.

OK, makes sense.

Other than layerfs specifically avoiding the namecache for now, are
there any other barriers to pulling the copypasta cache_* calls out of
every VOP_LOOKUP and into namei?

> LOCKLEAF is something very old and LOCKSHARED a minor tweak to it (a
> conditional picking either LK_SHARED/LK_EXCLUSIVE on the way out of
> namei()).  I'd prefer that namei() didn't lock any vnodes to begin with but
> that's the scheme we have (see my comment re: 1993 :-).  It's also what
> FreeBSD does for this and in my experience these little differences are
> troublesome WRT code sharing so I decided to match their flag.

How much code sharing can we actually take advantage of in the vfs
layer?

> > > vfs_syscalls.c:
> > > 
> > >   Corrected vnode lock types: LK_SHARED ok for VOP_ACCESS(), but
> > >   LK_EXCLUSIVE still needed for VOP_GETATTR() due to itimes handling.
> > >   That's a shame and it would be nice to use a seqlock or something
> > >   there eventually.
> > 
> > I don't understand this itimes handling.  Why do we ever _set_ any
> > itimes in VOP_GETATTR?
> > [...]
> > Why don't we check our watch immediately when we set IN_CHANGE ,
> > and defer the disk write to update the inode like UFS_WAPBL_UPDATE
> > does -- and just skip UFS_ITIMES in VOP_GETATTR?
> 
> This looks to go back least as far as Unix 32V (& possibly further) in one
> form or another, from what I can see.  Whatever the original intent I think
> it might have morphed into an attempt to avoid querying the system clock at
> some point, which is about the only point I can see for it now.

That was my conclusion too.  Let's get rid of it and make VOP_GETATTR
work with a shared lock (if not a seqlock)!

> > > vfs_vnode.c:
> > > 
> > >   Namecache related changes, and the change to not block new vnode
> > >   references across VOP_INACTIVE() mentioned on tech-kern.  I don't
> > >   think this blocking is needed at least for well behaved FSes like
> > >   ffs & tmpfs.
> > 
> > hannken@ cautions me that not blocking until VOP_INACTIVE completes
> > may be dangerous.
> > 
> > The last major deadlock I found in lfs is [...]
> 
> Is this something that should be done in VOP_RECLAIM() instead?  Then any
> concerns about VOP_INACTIVE() don't apply since the inode is well and truly
> dead.

I don't think so.  I believe the reason we block during VOP_INACTIVE
is to prevent file systems from gaining new references to vnodes while
vrele is deciding whether to reclaim them -- that way, the decision to
reclaim a vnode is final and made, under the vnode lock, in one place
that doesn't have to worry about backing out.

Historically, the logic that dealt with backing out of the decision to
reclaim a vnode has been riddled with races and bugs and incoherent
workarounds, so I think it's a Good Thing that hannken@ structured the
logic so that the decision really is final.

For the lfs segment writer, examining a vnode that may be about to be
reclaimed is safe because the lfs segment writer never creates new
references -- actually it just wants to skip all of the vnodes with
ip->i_nlink == 0 because they never need to be written out to disk.
The namecache is a different story, though!


Re: modload(8) v.s. alias

2020-03-10 Thread Rin Okuyama

Hi,

Thank you for your kind explanation! I understand the situation.
The current behavior is clean and safe, and we should not try to
``improve'' it easily.

For the case of NFS on m68k, we can rename ffs to __ffssi2 for
__KERNEL; we do not provide ffs(9) as a kernel routine.

Thanks,
rin

On 2020/03/10 0:49, Paul Goyette wrote:

Our in-kernel linker does not understand weak references.  It would
be a lot of work to implement, and there are some issues that would
need to be considered carefully.

For example, if module A defines a weak alias, and module B is loaded
and resolves the alias, what do you do if module C gets loaded and we
discover a "strong" definition of the same symbol?  Do you keep the
original resolution, or do you re-resolve it to the new definition?


On Mon, 9 Mar 2020, Rin Okuyama wrote:


Hi,

modload(8) fails if some depended functions are alias, at least
on m68k. For example:

  # uname -a
  NetBSD  9.99.49 NetBSD 9.99.49 (GENERIC) #6: Mon Mar  9 22:53:07 JST 2020 
rin@latipes:/build/src/sys/arch/sun2/compile/GENERIC sun2
  # modload nfs
  [xxx.xxx] kobj_checksyms, 998: [nfs]: linker error: symbol '__ffssi2' not 
found
  [xxx.xxx] WARNING: module error: unable to affix module 'nfs', error 8
  modload: nfs: Exec format error
  #

Here, __ffssi4 is weak alias of ffs on m68k:

  # nm /netbsd | grep __ffssi2
  0012327c W __ffssi2
  # nm /netbsd | grep 0012327c
  0012327c W __ffssi2
  0012327c T ffs
  #

If this symbol is turned into a ``real'' object by attached patch,
modload(8) and loaded module work fine:

  # nm /netbsd | grep __ffssi4
  0012327c T __ffssi2
  # modload nfs
  # mount_nfs server:/path /mnt
  # ls /mnt
  foo bar baz ...
  #

Is this a bug or feature? How can I fix this problem?

Thanks,
rin

Index: common/lib/libc/arch/m68k/string/ffs.S
===
RCS file: /cvsroot/src/common/lib/libc/arch/m68k/string/ffs.S,v
retrieving revision 1.7
diff -p -u -r1.7 ffs.S
--- common/lib/libc/arch/m68k/string/ffs.S    9 Mar 2020 13:36:10 - 1.7
+++ common/lib/libc/arch/m68k/string/ffs.S    9 Mar 2020 14:09:13 -
@@ -45,7 +45,11 @@
/* bit = ffs(value) */
+#ifdef _LIBC
WEAK_ALIAS(__ffssi2,ffs)
+#else /* KERNEL */
+#define ffs __ffssi2
+#endif
#if (!defined(__mc68010__) && !defined(__mcoldfire__)) || defined(__mcfisac__)

!DSPAM:5e664fb8246343444541299!



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+