Re: NCHNAMLEN vnode cache limitation removal
On Sat, Sep 14, 2019 at 09:51:12AM -0700, Jason Thorpe wrote: > >> Given the list of smart people I've seen discussing this, I'm > >> presumably just missing something, but what? > > > > Loonix. > > Linux isn't the only system that has such a capability. No, but I'm pretty sure it was the first to commit to it (was quite a ways back now) and it's where the assumption you can do this gets baked into dodgy software. -- David A. Holland dholl...@netbsd.org
Re: NCHNAMLEN vnode cache limitation removal
On 9/14/19, Mouse wrote: >> [...], because fexecve is causing rumbles about doing significantly >> more reverse lookups. > > Why is everyone so concerned about finding "the" name of an inode, or > indeed any name for an inode? As far as I can tell there has never > been any promise that any given inode has any names pointing to it, at > least not unless it's got no other references to it (in which case it > would be freed if it had no names). > > Given the list of smart people I've seen discussing this, I'm > presumably just missing something, but what? > I think an always working name resolution is nicer to users when it comes to diagnosing issues in userspace. In particular if someone spawns a new program, that program performs a lot of ops in given fd, you can check what file it is if name resolution works (or you have a facility similar to linux's /proc//fd). Otherwise you are left with dirty hacks. There is a very different angle to this though and it has to do with performance, especially in face of concurrent access. When doing SMP you want to avoid writes to shared areas as they induce caches misses on next access by other CPUs. If everyone accesses read-only they can keep doing it without interfering with others. If everyone writes to it, everyone else (minus 1) stalls waiting for their turn. Vast majority of all lookups only care about the last path component. Currently the kernel leapfrogs through all of them, e.g. consider the lookup of /foo/bar/baz/qux. Along the way it refs foo, locks foo, finds bar, references bar, locks bar, unlocks and unrefs foo... repeat that with s/bar/baz/ and s/foo/bar/. Note that's very simplified (e.g. i skipped the beginning of the lookup and other work like permission checking). Crossing filesystems is another hurdle. This is only to highlight the point below. That's a lot of work single-threaded which can be partially avoided in the common case, provided all the info is structured for it. But most importantly that's a lot of atomic ops on shared areas which severely limit your performance in case of concurrent access (especially on multi-socket systems). It's a long way to get to a point where you can do this in a write-free manner and FreeBSD is very early in there. If everything is cached and you can roll forward in the common case without refing/locking anything but the last component you win big time. (Of course you can't "just" do it, the leapfrogging is there for a reason but it can be worked out with tracking the state and having safe memory reclamation.) TL;DR it's minor quality of life improvement for users and a de facto mandatory part of VFS if it is to scale on big systems. -- Mateusz Guzik
Re: NCHNAMLEN vnode cache limitation removal
> On Sep 14, 2019, at 1:20 AM, David Holland wrote: > >> Given the list of smart people I've seen discussing this, I'm >> presumably just missing something, but what? > > Loonix. Linux isn't the only system that has such a capability. -- thorpej
Re: NCHNAMLEN vnode cache limitation removal
On Fri, Sep 13, 2019 at 07:49:54PM -0400, Mouse wrote: > > [...], because fexecve is causing rumbles about doing significantly > > more reverse lookups. > > Why is everyone so concerned about finding "the" name of an inode, or > indeed any name for an inode? As far as I can tell there has never > been any promise that any given inode has any names pointing to it, at > least not unless it's got no other references to it (in which case it > would be freed if it had no names). > > Given the list of smart people I've seen discussing this, I'm > presumably just missing something, but what? Loonix. -- David A. Holland dholl...@netbsd.org
Re: NCHNAMLEN vnode cache limitation removal
> [...], because fexecve is causing rumbles about doing significantly > more reverse lookups. Why is everyone so concerned about finding "the" name of an inode, or indeed any name for an inode? As far as I can tell there has never been any promise that any given inode has any names pointing to it, at least not unless it's got no other references to it (in which case it would be freed if it had no names). Given the list of smart people I've seen discussing this, I'm presumably just missing something, but what? /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: NCHNAMLEN vnode cache limitation removal
On Fri 13 Sep 2019 at 18:04:06 +, David Holland wrote: > On Wed, Sep 11, 2019 at 09:43:18AM +0300, Jason Thorpe wrote: > > Another advantage of de-duping the name strings is that you can > > compare names by testing pointer equivalence. > > I'd expect that trying to dedup the strings would destroy all the > available parallelism. One could have a hybrid method: if the pointers are the same, the strings surely are, and otherwise the strings can be compared as before. Maybe a string that is detected as duplicate in the second case can be easily reduced to the deduped version from that point onwards. It would need to be tested how much that wins and loses in regards to comparison time vs. parallelism and that sort of things. -Olaf. -- Olaf 'Rhialto' Seibert -- rhialto at falu dot nl ___ Anyone who is capable of getting themselves made President should on \X/ no account be allowed to do the job. --Douglas Adams, "THGTTG" signature.asc Description: PGP signature
Re: NCHNAMLEN vnode cache limitation removal
In article <20190913180602.gb20...@netbsd.org>, David Holland wrote: >On Wed, Sep 11, 2019 at 03:53:18PM +0700, Robert Elz wrote: > > What's more interesting to me is to know just how many long names people > > are seeing which are currently excluded from the cache, and would benefit > > from the change - that is, what percentage of all lookups fail in the > > cache for that reason, and do we have a histogram of the longer lengths > > with their frequencies (that is, would simply making the cutoff a little > > bigger improve things without requiring mem allocation to be added). > >The goal is to not arbitrarily exclude names from the cache at all, >because fexecve is causing rumbles about doing significantly more >reverse lookups. I am going to run more tests with a bigger cache, but if you run: cc https://www.netbsd.org/~christos/countlen.c ./a.out /usr/src/ /usr/obj/ /usr/xsrc/ you'll see that there are still lots of names now > 32, so perhaps making NCHNAMLEN 40 makes more sense this days (rather than 31 which was chosen ~40 years ago). Seems that people like longer names as time goes forward. Also the way the code is now we can make NCHNAMLEN a variable and use sysctl to change it (flushing the cache), for quicker experiments that don't need recompiling. But now we are not excluding anything; it is just that names over NCHNAMLEN use kmem_alloc instead of pooled storage... Best, christos
Re: NCHNAMLEN vnode cache limitation removal
On Wed, Sep 11, 2019 at 03:53:18PM +0700, Robert Elz wrote: > What's more interesting to me is to know just how many long names people > are seeing which are currently excluded from the cache, and would benefit > from the change - that is, what percentage of all lookups fail in the > cache for that reason, and do we have a histogram of the longer lengths > with their frequencies (that is, would simply making the cutoff a little > bigger improve things without requiring mem allocation to be added). The goal is to not arbitrarily exclude names from the cache at all, because fexecve is causing rumbles about doing significantly more reverse lookups. -- David A. Holland dholl...@netbsd.org
Re: NCHNAMLEN vnode cache limitation removal
On Wed, Sep 11, 2019 at 09:43:18AM +0300, Jason Thorpe wrote: > > I'm confused; nothing in there should lead to duplicate entries... > > Duplicate names != duplicate entries. > [...] > Another advantage of de-duping the name strings is that you can > compare names by testing pointer equivalence. I'd expect that trying to dedup the strings would destroy all the available parallelism. -- David A. Holland dholl...@netbsd.org
Re: NCHNAMLEN vnode cache limitation removal
> On Sep 11, 2019, at 11:53 AM, Robert Elz wrote: > >Date:Wed, 11 Sep 2019 09:43:18 +0300 >From:Jason Thorpe >Message-ID: > > | Another advantage of de-duping the name strings is that you can > | compare names by testing pointer equivalence. > > That works only if string you're comparing against the cache entry > is itself a cache entry, which would mean a whole new data struct > to make lookup of those strings fast (as it is now, comparisons get > a pointer to the string in the cache, and simply compare it with the > lookup string .. only insert (which is relatively rare) would need to > search the string cache to see if a name is already there, so that can > use a slow algorithm, with no extra data struct to manage. True, you'd have to "create a de-dup'd string" to compare it to other de-dup'd strings using the pointer-equivalence method. There's probably a string length threshold where that makes sense to do, and below that where it doesn't. -- thorpej
Re: NCHNAMLEN vnode cache limitation removal
Date:Wed, 11 Sep 2019 09:43:18 +0300 From:Jason Thorpe Message-ID: | Another advantage of de-duping the name strings is that you can | compare names by testing pointer equivalence. That works only if string you're comparing against the cache entry is itself a cache entry, which would mean a whole new data struct to make lookup of those strings fast (as it is now, comparisons get a pointer to the string in the cache, and simply compare it with the lookup string .. only insert (which is relatively rare) would need to search the string cache to see if a name is already there, so that can use a slow algorithm, with no extra data struct to manage. Af for namecache operation, the proposed changes would slow down insert/delete of cache entries a little (even more for insert if de-dup is included, which would also mean adding ref counts to the strings, or a garbage collector). Lookups, which is what matters really, should be very little affected. What's more interesting to me is to know just how many long names people are seeing which are currently excluded from the cache, and would benefit from the change - that is, what percentage of all lookups fail in the cache for that reason, and do we have a histogram of the longer lengths with their frequencies (that is, would simply making the cutoff a little bigger improve things without requiring mem allocation to be added). kre
Re: NCHNAMLEN vnode cache limitation removal
> On Sep 11, 2019, at 8:18 AM, David Holland wrote: > > I'm confused; nothing in there should lead to duplicate entries... Duplicate names != duplicate entries. Consider the case: bin/CVS bin/cat/CVS . . . bin/sh/CVS . . . Distinct vnodes, with distinct parents, all having the same name. Another advantage of de-duping the name strings is that you can compare names by testing pointer equivalence. -- thorpej
Re: NCHNAMLEN vnode cache limitation removal
On Wed, Sep 11, 2019 at 06:49:05AM +0300, Jason Thorpe wrote: > > On Sep 11, 2019, at 2:23 AM, Christos Zoulas wrote: > > > > Comments? > > This looks good, and I think it would be fine for it to go in now. > However, I think we should probably instrument how many duplicate > names may end up in the name cache over the course of "normal" > operation (insert "standard" workloads here). It may be worth > implementing a generic string cache to de-dup those names (which > has the secondary advantage of making the name cache entry itself > fixed-size again). I'm confused; nothing in there should lead to duplicate entries... (also it would be helpful to hear from someone who has an informed opinion about how rearranging the structure is likely to affect cache behavior, since my understanding is that namecache lookups are both critical and sensitive.) -- David A. Holland dholl...@netbsd.org
Re: NCHNAMLEN vnode cache limitation removal
> On Sep 11, 2019, at 2:23 AM, Christos Zoulas wrote: > > Comments? This looks good, and I think it would be fine for it to go in now. However, I think we should probably instrument how many duplicate names may end up in the name cache over the course of "normal" operation (insert "standard" workloads here). It may be worth implementing a generic string cache to de-dup those names (which has the secondary advantage of making the name cache entry itself fixed-size again). > > Thanks, > > christos > > Index: sys/namei.h > === > RCS file: /cvsroot/src/sys/sys/namei.h,v > retrieving revision 1.98 > diff -u -p -u -r1.98 namei.h > --- sys/namei.h 3 Jun 2019 06:05:39 - 1.98 > +++ sys/namei.h 10 Sep 2019 23:20:48 - > @@ -196,19 +196,9 @@ struct nameidata { > #endif > > /* > + * Namecache entry. > * This structure describes the elements in the cache of recent > - * names looked up by namei. NCHNAMLEN is sized to make structure > - * size a power of two to optimize allocations. Minimum reasonable > - * size is 15. > - */ > - > -#define NCHNAMLEN 31 /* maximum name segment length we > bother with */ > - > -/* > - * Namecache entry. This structure is arranged so that frequently > - * accessed and mostly read-only data is toward the front, with > - * infrequently accessed data and the lock towards the rear. The > - * lock is then more likely to be in a separate cache line. > + * names looked up by namei. > * > * Locking rules: > * > @@ -225,14 +215,14 @@ struct namecache { > struct vnode *nc_dvp; /* N vnode of parent of name */ > struct vnode *nc_vp; /* N vnode the name refers to */ > int nc_flags; /* - copy of componentname ISWHITEOUT */ > - charnc_nlen;/* - length of name */ > - charnc_name[NCHNAMLEN]; /* - segment name */ > void*nc_gcqueue;/* N queue for garbage collection */ > - TAILQ_ENTRY(namecache) nc_lru; /* L psuedo-lru chain */ > + TAILQ_ENTRY(namecache) nc_lru; /* L pseudo-lru chain */ > LIST_ENTRY(namecache) nc_dvlist;/* L dvp's list of cache entries */ > LIST_ENTRY(namecache) nc_vlist; /* L vp's list of cache entries */ > kmutex_t nc_lock; /* lock on this entry */ > int nc_hittime; /* N last time scored a hit */ > + u_short nc_nlen;/* - length of name */ > + charnc_name[0]; /* - segment name */ > }; > > #ifdef _KERNEL > Index: sys/namei.src > === > RCS file: /cvsroot/src/sys/sys/namei.src,v > retrieving revision 1.42 > diff -u -p -u -r1.42 namei.src > --- sys/namei.src 3 Jun 2019 06:04:21 - 1.42 > +++ sys/namei.src 10 Sep 2019 23:20:48 - > @@ -188,19 +188,9 @@ NAMEIFL PARAMASK0x02ee300 /* mask of pa > #endif > > /* > + * Namecache entry. > * This structure describes the elements in the cache of recent > - * names looked up by namei. NCHNAMLEN is sized to make structure > - * size a power of two to optimize allocations. Minimum reasonable > - * size is 15. > - */ > - > -#define NCHNAMLEN 31 /* maximum name segment length we > bother with */ > - > -/* > - * Namecache entry. This structure is arranged so that frequently > - * accessed and mostly read-only data is toward the front, with > - * infrequently accessed data and the lock towards the rear. The > - * lock is then more likely to be in a separate cache line. > + * names looked up by namei. > * > * Locking rules: > * > @@ -217,14 +207,14 @@ struct namecache { > struct vnode *nc_dvp; /* N vnode of parent of name */ > struct vnode *nc_vp; /* N vnode the name refers to */ > int nc_flags; /* - copy of componentname ISWHITEOUT */ > - charnc_nlen;/* - length of name */ > - charnc_name[NCHNAMLEN]; /* - segment name */ > void*nc_gcqueue;/* N queue for garbage collection */ > TAILQ_ENTRY(namecache) nc_lru; /* L psuedo-lru chain */ > LIST_ENTRY(namecache) nc_dvlist;/* L dvp's list of cache entries */ > LIST_ENTRY(namecache) nc_vlist; /* L vp's list of cache entries */ > kmutex_t nc_lock; /* lock on this entry */ > int nc_hittime; /* N last time scored a hit */ > + charnc_nlen;/* - length of name */ > + charnc_name[0]; /* - segment name */ > }; > > #ifdef _KERNEL > Index: kern/vfs_cache.c > === > RCS file: /cvsroot/src/sys/kern/vfs_cache.c,v > retrieving revision 1.120 > diff -u -p -u -r1.120 vfs_cache.c > --- kern/vfs_cache.c 18 Mar 2017 22:36:56 - 1.120 > +++ kern/vfs_cache.c 10 Sep 2019 23:20:48