Re: NCHNAMLEN vnode cache limitation removal

2019-09-15 Thread David Holland
On Sat, Sep 14, 2019 at 09:51:12AM -0700, Jason Thorpe wrote:
 > >> Given the list of smart people I've seen discussing this, I'm
 > >> presumably just missing something, but what?
 > > 
 > > Loonix.
 > 
 > Linux isn't the only system that has such a capability.

No, but I'm pretty sure it was the first to commit to it (was quite a
ways back now) and it's where the assumption you can do this gets
baked into dodgy software.

-- 
David A. Holland
dholl...@netbsd.org


Re: NCHNAMLEN vnode cache limitation removal

2019-09-15 Thread Mateusz Guzik
On 9/14/19, Mouse  wrote:
>> [...], because fexecve is causing rumbles about doing significantly
>> more reverse lookups.
>
> Why is everyone so concerned about finding "the" name of an inode, or
> indeed any name for an inode?  As far as I can tell there has never
> been any promise that any given inode has any names pointing to it, at
> least not unless it's got no other references to it (in which case it
> would be freed if it had no names).
>
> Given the list of smart people I've seen discussing this, I'm
> presumably just missing something, but what?
>

I think an always working name resolution is nicer to users when it comes
to diagnosing issues in userspace. In particular if someone spawns a new
program, that program performs a lot of ops in given fd, you can check
what file it is if name resolution works (or you have a facility similar
to linux's /proc//fd). Otherwise you are left with dirty hacks.

There is a very different angle to this though and it has to do with
performance, especially in face of concurrent access. When doing SMP
you want to avoid writes to shared areas as they induce caches misses
on next access by other CPUs. If everyone accesses read-only they can
keep doing it without interfering with others. If everyone writes to
it, everyone else (minus 1) stalls waiting for their turn.

Vast majority of all lookups only care about the last path component.
Currently the kernel leapfrogs through all of them, e.g. consider the
lookup of /foo/bar/baz/qux.

Along the way it refs foo, locks foo, finds bar, references bar, locks
bar, unlocks and unrefs foo... repeat that with s/bar/baz/
and s/foo/bar/. Note that's very simplified (e.g. i skipped the
beginning of the lookup and other work like permission checking).
Crossing filesystems is another hurdle. This is only to highlight
the point below.

That's a lot of work single-threaded which can be partially avoided in
the common case, provided all the info is structured for it. But most
importantly that's a lot of atomic ops on shared areas which severely
limit your performance in case of concurrent access (especially on
multi-socket systems). It's a long way to get to a point where you can
do this in a write-free manner and FreeBSD is very early in there.

If everything is cached and you can roll forward in the common case
without refing/locking anything but the last component you win big
time. (Of course you can't "just" do it, the leapfrogging is there for
a reason but it can be worked out with tracking the state and having
safe memory reclamation.)

TL;DR it's minor quality of life improvement for users and a de facto
mandatory part of VFS if it is to scale on big systems.

-- 
Mateusz Guzik 



Re: NCHNAMLEN vnode cache limitation removal

2019-09-14 Thread Jason Thorpe



> On Sep 14, 2019, at 1:20 AM, David Holland  wrote:
> 
>> Given the list of smart people I've seen discussing this, I'm
>> presumably just missing something, but what?
> 
> Loonix.

Linux isn't the only system that has such a capability.

-- thorpej



Re: NCHNAMLEN vnode cache limitation removal

2019-09-14 Thread David Holland
On Fri, Sep 13, 2019 at 07:49:54PM -0400, Mouse wrote:
 > > [...], because fexecve is causing rumbles about doing significantly
 > > more reverse lookups.
 > 
 > Why is everyone so concerned about finding "the" name of an inode, or
 > indeed any name for an inode?  As far as I can tell there has never
 > been any promise that any given inode has any names pointing to it, at
 > least not unless it's got no other references to it (in which case it
 > would be freed if it had no names).
 > 
 > Given the list of smart people I've seen discussing this, I'm
 > presumably just missing something, but what?

Loonix.

-- 
David A. Holland
dholl...@netbsd.org


Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread Mouse
> [...], because fexecve is causing rumbles about doing significantly
> more reverse lookups.

Why is everyone so concerned about finding "the" name of an inode, or
indeed any name for an inode?  As far as I can tell there has never
been any promise that any given inode has any names pointing to it, at
least not unless it's got no other references to it (in which case it
would be freed if it had no names).

Given the list of smart people I've seen discussing this, I'm
presumably just missing something, but what?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread Rhialto
On Fri 13 Sep 2019 at 18:04:06 +, David Holland wrote:
> On Wed, Sep 11, 2019 at 09:43:18AM +0300, Jason Thorpe wrote:
>  > Another advantage of de-duping the name strings is that you can
>  > compare names by testing pointer equivalence.
> 
> I'd expect that trying to dedup the strings would destroy all the
> available parallelism.

One could have a hybrid method: if the pointers are the same, the
strings surely are, and otherwise the strings can be compared as before.
Maybe a string that is detected as duplicate in the second case can be
easily reduced to the deduped version from that point onwards.
It would need to be tested how much that wins and loses in regards to
comparison time vs. parallelism and that sort of things.

-Olaf.
-- 
Olaf 'Rhialto' Seibert -- rhialto at falu dot nl
___  Anyone who is capable of getting themselves made President should on
\X/  no account be allowed to do the job.   --Douglas Adams, "THGTTG"


signature.asc
Description: PGP signature


Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread Christos Zoulas
In article <20190913180602.gb20...@netbsd.org>,
David Holland   wrote:
>On Wed, Sep 11, 2019 at 03:53:18PM +0700, Robert Elz wrote:
> > What's more interesting to me is to know just how many long names people
> > are seeing which are currently excluded from the cache, and would benefit
> > from the change - that is, what percentage of all lookups fail in the
> > cache for that reason, and do we have a histogram of the longer lengths
> > with their frequencies (that is, would simply making the cutoff a little
> > bigger improve things without requiring mem allocation to be added).
>
>The goal is to not arbitrarily exclude names from the cache at all,
>because fexecve is causing rumbles about doing significantly more
>reverse lookups.

I am going to run more tests with a bigger cache, but if you run:

cc https://www.netbsd.org/~christos/countlen.c
./a.out /usr/src/ /usr/obj/ /usr/xsrc/

you'll see that there are still lots of names now > 32, so perhaps
making NCHNAMLEN 40 makes more sense this days (rather than 31
which was chosen ~40 years ago). Seems that people like longer
names as time goes forward. Also the way the code is now we can
make NCHNAMLEN a variable and use sysctl to change it (flushing
the cache), for quicker experiments that don't need recompiling.

But now we are not excluding anything; it is just that names over
NCHNAMLEN use kmem_alloc instead of pooled storage...

Best,

christos



Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread David Holland
On Wed, Sep 11, 2019 at 03:53:18PM +0700, Robert Elz wrote:
 > What's more interesting to me is to know just how many long names people
 > are seeing which are currently excluded from the cache, and would benefit
 > from the change - that is, what percentage of all lookups fail in the
 > cache for that reason, and do we have a histogram of the longer lengths
 > with their frequencies (that is, would simply making the cutoff a little
 > bigger improve things without requiring mem allocation to be added).

The goal is to not arbitrarily exclude names from the cache at all,
because fexecve is causing rumbles about doing significantly more
reverse lookups.

-- 
David A. Holland
dholl...@netbsd.org


Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread David Holland
On Wed, Sep 11, 2019 at 09:43:18AM +0300, Jason Thorpe wrote:
 > > I'm confused; nothing in there should lead to duplicate entries...
 > 
 > Duplicate names != duplicate entries.
 > [...]
 > Another advantage of de-duping the name strings is that you can
 > compare names by testing pointer equivalence.

I'd expect that trying to dedup the strings would destroy all the
available parallelism.

-- 
David A. Holland
dholl...@netbsd.org


Re: NCHNAMLEN vnode cache limitation removal

2019-09-11 Thread Jason Thorpe



> On Sep 11, 2019, at 11:53 AM, Robert Elz  wrote:
> 
>Date:Wed, 11 Sep 2019 09:43:18 +0300
>From:Jason Thorpe 
>Message-ID:  
> 
>  | Another advantage of de-duping the name strings is that you can
>  | compare names by testing pointer equivalence.
> 
> That works only if string you're comparing against the cache entry
> is itself a cache entry, which would mean a whole new data struct
> to make lookup of those strings fast (as it is now, comparisons get
> a pointer to the string in the cache, and simply compare it with the
> lookup string .. only insert (which is relatively rare) would need to
> search the string cache to see if a name is already there, so that can
> use a slow algorithm, with no extra data struct to manage.

True, you'd have to "create a de-dup'd string" to compare it to other de-dup'd 
strings using the pointer-equivalence method.  There's probably a string length 
threshold where that makes sense to do, and below that where it doesn't.

-- thorpej



Re: NCHNAMLEN vnode cache limitation removal

2019-09-11 Thread Robert Elz
Date:Wed, 11 Sep 2019 09:43:18 +0300
From:Jason Thorpe 
Message-ID:  

  | Another advantage of de-duping the name strings is that you can
  | compare names by testing pointer equivalence.

That works only if string you're comparing against the cache entry
is itself a cache entry, which would mean a whole new data struct
to make lookup of those strings fast (as it is now, comparisons get
a pointer to the string in the cache, and simply compare it with the
lookup string .. only insert (which is relatively rare) would need to
search the string cache to see if a name is already there, so that can
use a slow algorithm, with no extra data struct to manage.

Af for namecache operation, the proposed changes would slow down insert/delete
of cache entries a little (even more for insert if de-dup is included, which
would also mean adding ref counts to the strings, or a garbage collector).
Lookups, which is what matters really, should be very little affected.

What's more interesting to me is to know just how many long names people
are seeing which are currently excluded from the cache, and would benefit
from the change - that is, what percentage of all lookups fail in the
cache for that reason, and do we have a histogram of the longer lengths
with their frequencies (that is, would simply making the cutoff a little
bigger improve things without requiring mem allocation to be added).

kre



Re: NCHNAMLEN vnode cache limitation removal

2019-09-11 Thread Jason Thorpe


> On Sep 11, 2019, at 8:18 AM, David Holland  wrote:
> 
> I'm confused; nothing in there should lead to duplicate entries...

Duplicate names != duplicate entries.

Consider the case:

bin/CVS
bin/cat/CVS
.
.
.
bin/sh/CVS
.
.
.

Distinct vnodes, with distinct parents, all having the same name.

Another advantage of de-duping the name strings is that you can compare names 
by testing pointer equivalence.

-- thorpej



Re: NCHNAMLEN vnode cache limitation removal

2019-09-10 Thread David Holland
On Wed, Sep 11, 2019 at 06:49:05AM +0300, Jason Thorpe wrote:
 > > On Sep 11, 2019, at 2:23 AM, Christos Zoulas  wrote:
 > > 
 > > Comments?
 > 
 > This looks good, and I think it would be fine for it to go in now.
 > However, I think we should probably instrument how many duplicate
 > names may end up in the name cache over the course of "normal"
 > operation (insert "standard" workloads here).  It may be worth
 > implementing a generic string cache to de-dup those names (which
 > has the secondary advantage of making the name cache entry itself
 > fixed-size again).

I'm confused; nothing in there should lead to duplicate entries...

(also it would be helpful to hear from someone who has an informed
opinion about how rearranging the structure is likely to affect cache
behavior, since my understanding is that namecache lookups are both
critical and sensitive.)


-- 
David A. Holland
dholl...@netbsd.org


Re: NCHNAMLEN vnode cache limitation removal

2019-09-10 Thread Jason Thorpe



> On Sep 11, 2019, at 2:23 AM, Christos Zoulas  wrote:
> 
> Comments?

This looks good, and I think it would be fine for it to go in now.  However, I 
think we should probably instrument how many duplicate names may end up in the 
name cache over the course of "normal" operation (insert "standard" workloads 
here).  It may be worth implementing a generic string cache to de-dup those 
names (which has the secondary advantage of making the name cache entry itself 
fixed-size again).

> 
> Thanks,
> 
> christos
> 
> Index: sys/namei.h
> ===
> RCS file: /cvsroot/src/sys/sys/namei.h,v
> retrieving revision 1.98
> diff -u -p -u -r1.98 namei.h
> --- sys/namei.h   3 Jun 2019 06:05:39 -   1.98
> +++ sys/namei.h   10 Sep 2019 23:20:48 -
> @@ -196,19 +196,9 @@ struct nameidata {
> #endif
> 
> /*
> + * Namecache entry.  
>  * This structure describes the elements in the cache of recent
> - * names looked up by namei. NCHNAMLEN is sized to make structure
> - * size a power of two to optimize allocations.  Minimum reasonable
> - * size is 15.
> - */
> -
> -#define  NCHNAMLEN   31  /* maximum name segment length we 
> bother with */
> -
> -/*
> - * Namecache entry.  This structure is arranged so that frequently
> - * accessed and mostly read-only data is toward the front, with
> - * infrequently accessed data and the lock towards the rear.  The
> - * lock is then more likely to be in a separate cache line.
> + * names looked up by namei.
>  *
>  * Locking rules:
>  *
> @@ -225,14 +215,14 @@ struct namecache {
>   struct  vnode *nc_dvp;  /* N vnode of parent of name */
>   struct  vnode *nc_vp;   /* N vnode the name refers to */
>   int nc_flags;   /* - copy of componentname ISWHITEOUT */
> - charnc_nlen;/* - length of name */
> - charnc_name[NCHNAMLEN]; /* - segment name */
>   void*nc_gcqueue;/* N queue for garbage collection */
> - TAILQ_ENTRY(namecache) nc_lru;  /* L psuedo-lru chain */
> + TAILQ_ENTRY(namecache) nc_lru;  /* L pseudo-lru chain */
>   LIST_ENTRY(namecache) nc_dvlist;/* L dvp's list of cache entries */
>   LIST_ENTRY(namecache) nc_vlist; /* L vp's list of cache entries */
>   kmutex_t nc_lock;   /*   lock on this entry */
>   int nc_hittime; /* N last time scored a hit */
> + u_short nc_nlen;/* - length of name */
> + charnc_name[0]; /* - segment name */
> };
> 
> #ifdef _KERNEL
> Index: sys/namei.src
> ===
> RCS file: /cvsroot/src/sys/sys/namei.src,v
> retrieving revision 1.42
> diff -u -p -u -r1.42 namei.src
> --- sys/namei.src 3 Jun 2019 06:04:21 -   1.42
> +++ sys/namei.src 10 Sep 2019 23:20:48 -
> @@ -188,19 +188,9 @@ NAMEIFL  PARAMASK0x02ee300   /* mask of pa
> #endif
> 
> /*
> + * Namecache entry. 
>  * This structure describes the elements in the cache of recent
> - * names looked up by namei. NCHNAMLEN is sized to make structure
> - * size a power of two to optimize allocations.  Minimum reasonable
> - * size is 15.
> - */
> -
> -#define  NCHNAMLEN   31  /* maximum name segment length we 
> bother with */
> -
> -/*
> - * Namecache entry.  This structure is arranged so that frequently
> - * accessed and mostly read-only data is toward the front, with
> - * infrequently accessed data and the lock towards the rear.  The
> - * lock is then more likely to be in a separate cache line.
> + * names looked up by namei.
>  *
>  * Locking rules:
>  *
> @@ -217,14 +207,14 @@ struct namecache {
>   struct  vnode *nc_dvp;  /* N vnode of parent of name */
>   struct  vnode *nc_vp;   /* N vnode the name refers to */
>   int nc_flags;   /* - copy of componentname ISWHITEOUT */
> - charnc_nlen;/* - length of name */
> - charnc_name[NCHNAMLEN]; /* - segment name */
>   void*nc_gcqueue;/* N queue for garbage collection */
>   TAILQ_ENTRY(namecache) nc_lru;  /* L psuedo-lru chain */
>   LIST_ENTRY(namecache) nc_dvlist;/* L dvp's list of cache entries */
>   LIST_ENTRY(namecache) nc_vlist; /* L vp's list of cache entries */
>   kmutex_t nc_lock;   /*   lock on this entry */
>   int nc_hittime; /* N last time scored a hit */
> + charnc_nlen;/* - length of name */
> + charnc_name[0]; /* - segment name */
> };
> 
> #ifdef _KERNEL
> Index: kern/vfs_cache.c
> ===
> RCS file: /cvsroot/src/sys/kern/vfs_cache.c,v
> retrieving revision 1.120
> diff -u -p -u -r1.120 vfs_cache.c
> --- kern/vfs_cache.c  18 Mar 2017 22:36:56 -  1.120
> +++ kern/vfs_cache.c  10 Sep 2019 23:20:48