Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-20 Thread Niels Kobschaetzki

> On 19. Apr 2018, at 00:07, Rainer Duffner wrote:
> 
> 
> 
>> On 17.04.2018 at 06:45, Niels Kobschätzki wrote:
>> 
>> I have finally solved my problem after two weeks, and it wasn't NFS. I
>> kept getting derailed from the real solution by some people, so I
>> didn't look in the right place. The cache misses are gone now, and the
>> application now performs faster than on the other servers.
> 
> 
> 
> OK, but what was it?

Miscommunication and a missing php71-opcache. Without the opcache, PHP 
apparently makes a lot of syscalls, each of which leads to a getattr call that 
the NFS cache cannot handle. This has been a problem for many years and it 
won’t be dealt with.
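
To make the access pattern concrete: every lookup of a script file needs that
file's attributes, and on an NFS mount the client either answers from its
attribute cache or sends a Getattr RPC. The following is only a minimal sketch
of such repeated lookups, not what PHP does internally; stat_repeatedly is a
hypothetical helper name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: stat() the same path n times and return how many
 * calls succeeded. Each call asks the kernel for the file's attributes;
 * on an NFS mount the client serves them from its attribute cache when
 * it can and otherwise has to ask the server.
 */
size_t
stat_repeatedly(const char *path, size_t n)
{
        struct stat sb;
        size_t ok = 0;

        assert(path != NULL);
        for (size_t i = 0; i < n; i++) {
                if (stat(path, &sb) == 0)
                        ok++;
        }
        return (ok);
}
```

Calling it on a file below the NFS mount while watching the client statistics
(for example with nfsstat -c) should show whether the attribute lookups are
served from the cache.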

Niels

P.S. The application is now nearly twice as fast, needs half the RAM, and 
produces half the load compared to FBSD 10.3/php56.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-16 Thread Niels Kobschaetzki


On 04/14/2018 03:49 AM, Rick Macklem wrote:

> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
> top, where it calculates "timeo" from it.
> Running this hacked kernel might show you if either of these fields is bogus.
> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
> clause that increments "attrcache_misses", which is where the cache misses
> happen to see why it is missing the cache.)
> If you could do this for the 10.3 kernel as well, this might indicate why the
> miss rate has increased?
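
A user-space model may help to see what those printf()s would reveal. This is
a simplified sketch, not the kernel code: the struct and function names are
hypothetical, the "age of the file divided by 10" heuristic and the clamping
to a minimum and maximum are assumptions about how "timeo" is derived, and
details such as the directory case and modified files are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the fields named above. */
struct attr_cache_entry {
        time_t mtime;           /* like np->n_mtime.tv_sec */
        time_t attrstamp;       /* like np->n_attrstamp: when attrs were cached */
};

/* Assumed heuristic: timeout grows with file age, clamped to [acmin, acmax]. */
long
attr_timeo(time_t now, const struct attr_cache_entry *e, long acmin, long acmax)
{
        long timeo;

        assert(e != NULL && acmin <= acmax);
        timeo = (long)(now - e->mtime) / 10;
        if (timeo < acmin)
                timeo = acmin;
        else if (timeo > acmax)
                timeo = acmax;
        return (timeo);
}

/* A miss: the cached attributes are at least as old as the timeout. */
bool
attr_cache_miss(time_t now, const struct attr_cache_entry *e, long acmin,
    long acmax)
{
        return ((long)(now - e->attrstamp) >= attr_timeo(now, e, acmin, acmax));
}
```

In this model a bogus "now" or mtime (for instance an mtime in the future)
pushes the timeout down to the minimum, so entries expire sooner and the miss
count goes up, which is the kind of thing the suggested printf()s would show.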

I just checked the code to figure out exactly where to put the printf(),
and saw that there are #ifdefs for NFS_ACDEBUG, which seems to be a kernel
option. When I add NFS_ACDEBUG to the kernel config file, the build fails
with:

/usr/src/sys/amd64/conf/ACDEBUG: unknown option "NFS_ACDEBUG"

I looked in sysctl.h and it isn't defined there. Am I doing something wrong,
or did this sysctl tunable get lost at some point?
Can I just use this code to get the information by removing the #ifdef?

Sorry, my C is practically nonexistent, so I have to ask :/

The relevant parts (except the part that reads the sysctl) look like this:
#ifdef NFS_ACDEBUG
        if (nfs_acdebug > 1)
                printf("ncl_getattrcache: initial timeo = %d\n", timeo);
#endif

……


#ifdef NFS_ACDEBUG
        if (nfs_acdebug > 2)
                printf("acregmin %d; acregmax %d; acdirmin %d; acdirmax %d\n",
                    nmp->nm_acregmin, nmp->nm_acregmax,
                    nmp->nm_acdirmin, nmp->nm_acdirmax);

        if (nfs_acdebug)
                printf("ncl_getattrcache: age = %d; final timeo = %d\n",
                    (time_second - np->n_attrstamp), timeo);
#endif


I would remove the #ifdefs and the "if (nfs_acdebug …)" checks.
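
To preview the line that the last printf() would emit once the guards are
gone, the same format string can be tried in user space. This is only a
sketch: format_acdebug is a hypothetical helper, and it takes plain ints where
the kernel code uses time_second, np->n_attrstamp and timeo.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper: format the "age / final timeo" debug
 * line from the snippet above into buf and return the number of
 * characters written (not counting the terminating NUL).
 */
int
format_acdebug(char *buf, size_t len, int now, int attrstamp, int timeo)
{
        assert(buf != NULL && len > 0);
        return (snprintf(buf, len,
            "ncl_getattrcache: age = %d; final timeo = %d\n",
            now - attrstamp, timeo));
}
```

An age that regularly reaches the final timeo in such output would point at
entries expiring, rather than at something else invalidating the cache.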

Niels


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Niels Kobschaetzki

> On 15. Apr 2018, at 01:18, Rick Macklem wrote:
> 
> Niels Kobschätzki wrote:
>>> On 04/14/2018 03:49 AM, Rick Macklem wrote:
>>> Niels Kobschätzki wrote:
>>>> Sorry for the cross-posting, but so far I have had no real luck on the
>>>> forum or on question, so I want to try my luck here as well.
>>> I read email lists but don't do the other stuff, so I just saw this 
>>> yesterday.
>>> Short answer, I haven't a clue why cache hits rate would have changed.
>>> 
>>> The code that decides if there is a hit/miss for the attribute cache is in
>>> ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
>>> except the old code did a mtx_lock(), but I can't imagine how that
>>> would affect the code.
>>> 
>>> You might want to:
>>> # sysctl -a | fgrep vfs.nfs
>>> for both the 10.3 and 11.1 systems, to check if any defaults have somehow
>>> been changed. (I don't recall any being changed, but??)
>> 
>> I did that and nothing had changed.
>> 
>>> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
>>> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
>>> top, where it calculates "timeo" from it.
>>> Running this hacked kernel might show you if either of these fields is bogus.
>>> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
>>> clause that increments "attrcache_misses", which is where the cache misses
>>> happen to see why it is missing the cache.)
>>> If you could do this for the 10.3 kernel as well, this might indicate why
>>> the miss rate has increased?
>> 
>> I will do this next week. On Monday we are switching to other NFS servers
>> for unrelated reasons, and once we see that they run stably, I will do
>> this next.
> With a miss rate of 2.7%, I doubt printing the above will help. I thought
> you were seeing a high miss rate.

It is low, but it increased by a factor of several hundred compared to before 
(from about 0.004% to 2.7%). I hope the printf() output will help. It will just 
take a lot of grepping through wherever I can get this data. 

>> Btw., I have now calculated the percentages. The old servers had an attr
>> miss rate of something like 0.004%, while the upgraded one has more like
>> 2.7%. This is still low from what I've read (I remember that you should
>> start adjusting acreg* when you hit more than 40% misses) but far higher
>> than before.
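
For reference, the percentage is simply misses divided by all attribute cache
lookups. A minimal sketch, assuming the hit and miss counters are taken from
the client statistics; attr_miss_pct is a hypothetical helper, and the counts
in the usage note are made up to reproduce the two percentages, not measured
values.

```c
#include <assert.h>

/* Miss rate in percent: misses / (hits + misses) * 100. */
double
attr_miss_pct(unsigned long hits, unsigned long misses)
{
        unsigned long total = hits + misses;

        assert(total > 0);
        return (100.0 * (double)misses / (double)total);
}
```

With 99996 hits and 4 misses this gives 0.004%, and with 973 hits and 27
misses it gives 2.7%; the ratio of the two rates is 675.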
> You could try increasing acregmin, acregmax and see if the misses are reduced.
> (The only risk with increasing the cache timeout is that, if another client
> changes the attributes, then the client will use stale ones for longer.
> Usually, this doesn't cause serious problems.)

I tried that and it had no effect at all.

> To be honest, a Getattr RPC is pretty low overhead, so I doubt the increase
> to 2.7% will affect your application's performance, but it is interesting that
> it increased.

It is a website with quite a lot of traffic, handled by three webservers behind 
a pair of load balancers. 
We see a 20% loss in speed (TTFB increased by 100 ms; that doesn’t sound like a 
lot, but Google et al. don’t like it at all) after upgrading to 11.1 combined 
with an upgrade to PHP 7.1. On another server without NFS, that upgrade improved 
performance considerably (ca. 30%, I was told by the front-end dev).

> You might also try increasing acdirmin, acdirmax in case it is the directory
> attributes that are having cache misses.

I did that, too.

> Oh, and check that your time of day clocks are in sync with the server,
> since the caches are time based, since there is no cache coherency protocol
> in NFS.

I checked that. All three frontends use the same NTP server.

Thanks so far,

Niels