Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-20 Thread Niels Kobschaetzki

> On 19. Apr 2018, at 00:07, Rainer Duffner wrote:
>
>> On 17.04.2018 at 06:45, Niels Kobschätzki wrote:
>>
>> I have now finally solved my problem after two weeks, and it wasn't NFS.
>> I kept getting derailed from the real solution by some people, so I
>> didn't look in the right place. The cache misses are gone now, and the
>> application now performs faster than on the other servers.
> 
> 
> 
> OK, but what was it?

Miscommunication and a missing php71-opcache. Without it, PHP apparently makes
a lot of syscalls, which lead to getattr calls that cannot be served from the
NFS cache. This has been a problem for many years and it won't be dealt with.

Niels

P.S. The application is now nearly twice as fast, needs half the RAM, and
produces half the load compared to FBSD 10.3/php56.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-18 Thread Rainer Duffner


> On 17.04.2018 at 06:45, Niels Kobschätzki wrote:
>
> I have now finally solved my problem after two weeks, and it wasn't NFS.
> I kept getting derailed from the real solution by some people, so I
> didn't look in the right place. The cache misses are gone now, and the
> application now performs faster than on the other servers.



OK, but what was it?








Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-18 Thread Rick Macklem
Niels Kobschätzki wrote:
[stuff snipped]
>I have now finally solved my problem after two weeks, and it wasn't NFS.
>I kept getting derailed from the real solution by some people, so I
>didn't look in the right place. The cache misses are gone now, and the
>application now performs faster than on the other servers.
Good work. Btw, that was why I suggested running the new kernel on a
server with the old userland. It would have isolated out any userland
differences, and hopefully what was causing the problem.

Glad to hear NFS isn't the culprit, rick



Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-16 Thread Niels Kobschätzki
On 04/17/2018 01:03 AM, Rick Macklem wrote:
> Niels Kobschaetzki wrote:
> [stuff snipped]
>> I just checked the code to see if I can figure out where exactly I have
>> to put the printf(). And then I saw that there are ifdefs for
>> NFS_ACDEBUG which seems to be a kernel option. When I add NFS_ACDEBUG in
>> the config-file for the kernel, the build fails with an
> I don't have sources handy right now, but you can probably just put a line
> like:
> #define NFS_ACDEBUG 1
> at the top of the file /usr/src/sys/fs/nfsclient/nfs_clsubs.c

ok
> After building/booting the kernel "sysctl -a" should have a
> vfs.nfs.acdebug
> in the list. Set it to "1" to get the basic timeout info.
> 
>> /usr/src/sys/amd64/conf/ACDEBUG: unknown option "NFS_ACDEBUG"
>>
>> I looked in sysctl.h and there it isn't defined. Do I do something wrong
>> or did this sysctl-tunable got lost at some point in time?
>> Can I just use this code by removing the ifdef for getting information?
>>
>> Sorry, my C is not really existent, thus I have to ask :/


>> I would remove the ifdefs and the "if (nfs_acdebug …)"
> This would work, too, rick

That worked, I had the kernel running yesterday.

I have now finally solved my problem after two weeks, and it wasn't NFS.
I kept getting derailed from the real solution by some people, so I
didn't look in the right place. The cache misses are gone now, and the
application now performs faster than on the other servers.

Thanks so so much for your help.

Niels


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-16 Thread Rick Macklem
Niels Kobschaetzki wrote:
[stuff snipped]
>I just checked the code to see if I can figure out where exactly I have
>to put the printf(). And then I saw that there are ifdefs for
>NFS_ACDEBUG which seems to be a kernel option. When I add NFS_ACDEBUG in
>the config-file for the kernel, the build fails with an
I don't have sources handy right now, but you can probably just put a line
like:
#define NFS_ACDEBUG 1
at the top of the file /usr/src/sys/fs/nfsclient/nfs_clsubs.c

After building/booting the kernel "sysctl -a" should have a
vfs.nfs.acdebug
in the list. Set it to "1" to get the basic timeout info.

>/usr/src/sys/amd64/conf/ACDEBUG: unknown option "NFS_ACDEBUG"
>
>I looked in sysctl.h and there it isn't defined. Do I do something wrong
>or did this sysctl-tunable got lost at some point in time?
>Can I just use this code by removing the ifdef for getting information?
>
>Sorry, my C is not really existent, thus I have to ask :/
>
>The parts (except the part that looks at the sysctl) look like this:
>#ifdef NFS_ACDEBUG
>        if (nfs_acdebug > 1)
>                printf("ncl_getattrcache: initial timeo = %d\n", timeo);
>#endif
>
>……
>
>#ifdef NFS_ACDEBUG
>        if (nfs_acdebug > 2)
>                printf("acregmin %d; acregmax %d; acdirmin %d; acdirmax %d\n",
>                    nmp->nm_acregmin, nmp->nm_acregmax,
>                    nmp->nm_acdirmin, nmp->nm_acdirmax);
>
>        if (nfs_acdebug)
>                printf("ncl_getattrcache: age = %d; final timeo = %d\n",
>                    (time_second - np->n_attrstamp), timeo);
>#endif
>
>
>I would remove the ifdefs and the "if (nfs_acdebug …)"
This would work, too, rick


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-16 Thread Niels Kobschaetzki


On 04/14/2018 03:49 AM, Rick Macklem wrote:

> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
> top, where it calculates "timeo" from it.
> Running this hacked kernel might show you if either of these fields is bogus.
> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
> clause that increments "attrcache_misses", which is where the cache misses
> happen to see why it is missing the cache.)
> If you could do this for the 10.3 kernel as well, this might indicate why the
> miss rate has increased?

I just checked the code to see if I can figure out where exactly I have
to put the printf(). Then I saw that there are ifdefs for NFS_ACDEBUG,
which seems to be a kernel option. When I add NFS_ACDEBUG to the kernel
config file, the build fails with:

/usr/src/sys/amd64/conf/ACDEBUG: unknown option "NFS_ACDEBUG"

I looked in sysctl.h and it isn't defined there. Am I doing something
wrong, or did this sysctl tunable get lost at some point?
Can I just use this code by removing the ifdefs to get the information?

Sorry, my C is practically nonexistent, so I have to ask :/

The parts (except the part that looks at the sysctl) look like this:
#ifdef NFS_ACDEBUG
        if (nfs_acdebug > 1)
                printf("ncl_getattrcache: initial timeo = %d\n", timeo);
#endif

……

#ifdef NFS_ACDEBUG
        if (nfs_acdebug > 2)
                printf("acregmin %d; acregmax %d; acdirmin %d; acdirmax %d\n",
                    nmp->nm_acregmin, nmp->nm_acregmax,
                    nmp->nm_acdirmin, nmp->nm_acdirmax);

        if (nfs_acdebug)
                printf("ncl_getattrcache: age = %d; final timeo = %d\n",
                    (time_second - np->n_attrstamp), timeo);
#endif


I would remove the ifdefs and the "if (nfs_acdebug …)"

Niels


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Eugene Grosbein
15.04.2018 21:19, Rodney W. Grimes wrote:

>>>> It is a website with quite some traffic, handled by three webservers
>>>> behind a pair of loadbalancers.
>>>> We see a loss of 20% in speed (TTFB got worse by 100ms; that doesn't
>>>> sound like a lot, but Google et al don't like it at all) after
>>>> upgrading to 11.1 with a combined upgrade to php7.1. On another server
>>>> without NFS that upgrade improved performance considerably (I was told
>>>> ca. 30% by the front-end dev).
>>> One thing you could try is booting the 11.1 kernel on a 10.3 system. Newer
>>> FreeBSD kernels should work with older userland.
>>
>> Though one should remember that some important system utilities
>> may not, and probably will not, work with a newer kernel, like /sbin/ipfw,
>> route, ifconfig, netstat etc.
>
> I thought that as long as the newer kernel has the right
> COMPAT_FREEBSD10 compiled in that all this stuff should work.
> Am I misunderstanding this kernel option?

COMPAT_FREEBSD10 does not cover all cases (in-kernel structures etc.)
and bugs can happen, too. You won't be happy if you discover a
similar case in /sbin/ipfw or ifconfig, as this combination (old ipfw + new
kernel) is not something we test thoroughly.




Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Rodney W. Grimes
> 15.04.2018 19:58, Rick Macklem wrote:
> 
> > Niels Kobschaetzki wrote:
> > [stuff snipped]
> >> It is a website with quite some traffic, handled by three webservers
> >> behind a pair of loadbalancers.
> >> We see a loss of 20% in speed (TTFB got worse by 100ms; that doesn't
> >> sound like a lot, but Google et al don't like it at all) after
> >> upgrading to 11.1 with a combined upgrade to php7.1. On another server
> >> without NFS that upgrade improved performance considerably (I was told
> >> ca. 30% by the front-end dev).
> > One thing you could try is booting the 11.1 kernel on a 10.3 system. Newer
> > FreeBSD kernels should work with older userland.
>
> Though one should remember that some important system utilities
> may not, and probably will not, work with a newer kernel, like /sbin/ipfw,
> route, ifconfig, netstat etc.

I thought that as long as the newer kernel has the right
COMPAT_FREEBSD10 compiled in that all this stuff should work.
Am I misunderstanding this kernel option?

> 
> > This would tell you if it is kernel changes or userland changes that are
> > causing the higher miss rate.
> 

-- 
Rod Grimes rgri...@freebsd.org


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Eugene Grosbein
15.04.2018 19:58, Rick Macklem wrote:

> Niels Kobschaetzki wrote:
> [stuff snipped]
>> It is a website with quite some traffic, handled by three webservers
>> behind a pair of loadbalancers.
>> We see a loss of 20% in speed (TTFB got worse by 100ms; that doesn't
>> sound like a lot, but Google et al don't like it at all) after
>> upgrading to 11.1 with a combined upgrade to php7.1. On another server
>> without NFS that upgrade improved performance considerably (I was told
>> ca. 30% by the front-end dev).
> One thing you could try is booting the 11.1 kernel on a 10.3 system. Newer
> FreeBSD kernels should work with older userland.

Though one should remember that some important system utilities
may not, and probably will not, work with a newer kernel, like /sbin/ipfw,
route, ifconfig, netstat etc.

> This would tell you if it is kernel changes or userland changes that are
> causing the higher miss rate.



Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Rick Macklem
Niels Kobschaetzki wrote:
[stuff snipped]
>It is a website with quite some traffic, handled by three webservers
>behind a pair of loadbalancers.
>We see a loss of 20% in speed (TTFB got worse by 100ms; that doesn't
>sound like a lot, but Google et al don't like it at all) after
>upgrading to 11.1 with a combined upgrade to php7.1. On another server
>without NFS that upgrade improved performance considerably (I was told
>ca. 30% by the front-end dev).
One thing you could try is booting the 11.1 kernel on a 10.3 system. Newer
FreeBSD kernels should work with older userland.
This would tell you if it is kernel changes or userland changes that are causing
the higher miss rate.

Good luck with it, rick


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-15 Thread Niels Kobschaetzki

> On 15. Apr 2018, at 01:18, Rick Macklem wrote:
>
> Niels Kobschätzki wrote:
>>> On 04/14/2018 03:49 AM, Rick Macklem wrote:
>>> Niels Kobschätzki wrote:
>>>> sorry for the cross-posting but so far I had no real luck on the forum
>>>> or on question, thus I want to try my luck here as well.
>>> I read email lists but don't do the other stuff, so I just saw this
>>> yesterday.
>>> Short answer, I haven't a clue why cache hits rate would have changed.
>>> 
>>> The code that decides if there is a hit/miss for the attribute cache is in
>>> ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
>>> except the old code did a mtx_lock(), but I can't imagine how that
>>> would affect the code.
>>> 
>>> You might want to:
>>> # sysctl -a | fgrep vfs.nfs
>>> for both the 10.3 and 11.1 systems, to check if any defaults have somehow
>>> been changed. (I don't recall any being changed, but??)
>> 
>> I did that and nothing had changed.
>> 
>>> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
>>> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
>>> top, where it calculates "timeo" from it.
>>> Running this hacked kernel might show you if either of these fields is 
>>> bogus.
>>> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
>>> clause that increments "attrcache_misses", which is where the cache misses
>>> happen to see why it is missing the cache.)
>>> If you could do this for the 10.3 kernel as well, this might indicate why 
>>> the
>>> miss rate has increased?
>> 
>> I will do this next week. On monday we switch for other reasons to other
>> nfs-servers and when we see that they run stable, I will do this next.
> With a miss rate of 2.7%, I doubt printing the above will help. I thought
> you were seeing a high miss rate.

It is low, but it increased by a factor of nearly 1000 compared to before. I
hope the printf() will help. It will just be a lot of grepping through
wherever I can get this data.

>> Btw, I have now calculated the percentages. The old servers had an attr
>> miss rate of something like 0.004%, while the upgraded one has more like
>> 2.7%. This is still low from what I've read (I remember that you should
>> start adjusting acreg* when you hit more than 40% misses), but far higher
>> than before.
> You could try increasing acregmin, acregmax and see if the misses are reduced.
> (The only risk with increasing the cache timeout is that, if another client
> changes the attributes, then the client will use stale ones for longer.
> Usually, this doesn't cause serious problems.)

I tried that and it had no effect at all.

> To be honest, a Getattr RPC is pretty low overhead, so I doubt the increase
> to 2.7% will affect your application's performance, but it is interesting that
> it increased.

It is a website with quite some traffic, handled by three webservers behind
a pair of loadbalancers.
We see a loss of 20% in speed (TTFB got worse by 100ms; that doesn't sound
like a lot, but Google et al don't like it at all) after upgrading to 11.1
with a combined upgrade to php7.1. On another server without NFS that
upgrade improved performance considerably (I was told ca. 30% by the
front-end dev).

> You might also try increasing acdirmin, acdirmax in case it is the directory
> attributes that are having cache misses.

I did that, too

> Oh, and check that your time of day clocks are in sync with the server,
> since the caches are time based, since there is no cache coherency protocol
> in NFS.

I checked that. All three frontends use the same server for NTP.

Thanks so far,

Niels


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-14 Thread Rick Macklem
Niels Kobschätzki wrote:
>On 04/14/2018 03:49 AM, Rick Macklem wrote:
>> Niels Kobschätzki wrote:
>>> sorry for the cross-posting but so far I had no real luck on the forum
>>> or on question, thus I want to try my luck here as well.
>> I read email lists but don't do the other stuff, so I just saw this 
>> yesterday.
>> Short answer, I haven't a clue why cache hits rate would have changed.
>>
>> The code that decides if there is a hit/miss for the attribute cache is in
>> ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
>> except the old code did a mtx_lock(), but I can't imagine how that
>> would affect the code.
>>
>> You might want to:
>> # sysctl -a | fgrep vfs.nfs
>> for both the 10.3 and 11.1 systems, to check if any defaults have somehow
>> been changed. (I don't recall any being changed, but??)
>
>I did that and nothing had changed.
>
>> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
>> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
>> top, where it calculates "timeo" from it.
>> Running this hacked kernel might show you if either of these fields is bogus.
>> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
>> clause that increments "attrcache_misses", which is where the cache misses
>> happen to see why it is missing the cache.)
>> If you could do this for the 10.3 kernel as well, this might indicate why the
>> miss rate has increased?
>
>I will do this next week. On monday we switch for other reasons to other
>nfs-servers and when we see that they run stable, I will do this next.
With a miss rate of 2.7%, I doubt printing the above will help. I thought
you were seeing a high miss rate.

>Btw, I have now calculated the percentages. The old servers had an attr
>miss rate of something like 0.004%, while the upgraded one has more like
>2.7%. This is still low from what I've read (I remember that you should
>start adjusting acreg* when you hit more than 40% misses), but far higher
>than before.
You could try increasing acregmin, acregmax and see if the misses are reduced.
(The only risk with increasing the cache timeout is that, if another client
changes the attributes, then the client will use stale ones for longer.
Usually, this doesn't cause serious problems.)
To be honest, a Getattr RPC is pretty low overhead, so I doubt the increase
to 2.7% will affect your application's performance, but it is interesting that
it increased.

You might also try increasing acdirmin, acdirmax in case it is the directory
attributes that are having cache misses.

Oh, and check that your time of day clocks are in sync with the server,
since the caches are time based, since there is no cache coherency protocol
in NFS.
[good stuff snipped]
rick


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-13 Thread Niels Kobschätzki
On 04/14/2018 03:49 AM, Rick Macklem wrote:
> Niels Kobschätzki wrote:
>> sorry for the cross-posting but so far I had no real luck on the forum
>> or on question, thus I want to try my luck here as well.
> I read email lists but don't do the other stuff, so I just saw this yesterday.
> Short answer, I haven't a clue why cache hits rate would have changed.
> 
> The code that decides if there is a hit/miss for the attribute cache is in
> ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
> except the old code did a mtx_lock(), but I can't imagine how that
> would affect the code.
> 
> You might want to:
> # sysctl -a | fgrep vfs.nfs
> for both the 10.3 and 11.1 systems, to check if any defaults have somehow
> been changed. (I don't recall any being changed, but??)

I did that and nothing had changed.

> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
> top, where it calculates "timeo" from it.
> Running this hacked kernel might show you if either of these fields is bogus.
> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
> clause that increments "attrcache_misses", which is where the cache misses
> happen to see why it is missing the cache.)
> If you could do this for the 10.3 kernel as well, this might indicate why the
> miss rate has increased?

I will do this next week. On Monday we are switching to other NFS servers
for unrelated reasons, and once we see that they run stably, I will do
this next.

Btw, I have now calculated the percentages. The old servers had an attr
miss rate of something like 0.004%, while the upgraded one has more like
2.7%. This is still low from what I've read (I remember that you should
start adjusting acreg* when you hit more than 40% misses), but far higher
than before.

nfsstat -c for one of the working servers looks like this (I did a -cz
before to reset it and did this a couple of seconds later):
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
 10085375       255   9163995       577       540         0         0         0
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
     1380         0         0         0         0         0   9169427       277

and for the non-working server:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
  1606365     20647   1418205       239       581         0         0         0
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
      895         0         0         0         0         0   1439080       337


>> I upgraded a machine from 10.3-Prerelease (custom kernel with
>> tcp_fastopen added) to 11.1-Release (standard kernel) with
>> freebsd-update. I have two other machines that are still on
>> 10.3-Prerelease. Those machines mount an NFS-export from a
>> Linux-NFS-server and use NFSv3. The machine that got upgraded shows now
>> far more cache misses for getattr than on the 10.3-machines (we talk a
>> factor of 100) in munin. munin also shows a lot more cache-misses for
>> other metrics like biow, biorl, biod (where can I find what those
>> metrics mean…currently I have not even an understanding what these are)
>> etc.
>>
>> Can anybody help me debug this problem, or does anybody have an idea
>> what could cause it? The result of this behavior is that this machine
>> shows lower performance than the others, and I cannot upgrade the
>> other machines until I have fixed this bug.
> I haven't run a 10.x system in quite a while. When I get home in a few days,
> I might be able to reproduce this. If I can, I can poke at it, but it would
> be at least a week before I might have an answer and I may not figure it out
> for a long time.

Ok, thanks a lot. That would be great.

Niels


Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-13 Thread Rick Macklem
Niels Kobschätzki wrote:
>sorry for the cross-posting but so far I had no real luck on the forum
>or on question, thus I want to try my luck here as well.
I read email lists but don't do the other stuff, so I just saw this yesterday.
Short answer, I haven't a clue why cache hits rate would have changed.

The code that decides if there is a hit/miss for the attribute cache is in
ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
except the old code did a mtx_lock(), but I can't imagine how that
would affect the code.

You might want to:
# sysctl -a | fgrep vfs.nfs
for both the 10.3 and 11.1 systems, to check if any defaults have somehow
been changed. (I don't recall any being changed, but??)
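
One way to do that comparison is to save the output on each machine and diff
the two dumps. A minimal Python sketch, assuming each saved line has the usual
"name: value" form printed by sysctl; the function names and the sample
entries (vfs.nfs.example_*) are made up for illustration and are not real
tunables:

```python
def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'name: value' lines as printed by sysctl into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (old_value, new_value)} for entries that differ or are missing."""
    changed = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


# Made-up sample dumps; real ones would come from the 10.3 and 11.1 systems.
dump_103 = "vfs.nfs.example_a: 1\nvfs.nfs.example_b: 30\n"
dump_111 = "vfs.nfs.example_a: 1\nvfs.nfs.example_b: 60\nvfs.nfs.example_c: 0\n"

result = diff_sysctl(parse_sysctl(dump_103), parse_sysctl(dump_111))
for name, (old, new) in result.items():
    print(f"{name}: {old} -> {new}")
```

A value of None on one side means the entry exists on only one of the two
systems.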

If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
top, where it calculates "timeo" from it.
Running this hacked kernel might show you if either of these fields is bogus.
(You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
clause that increments "attrcache_misses", which is where the cache misses
happen to see why it is missing the cache.)
If you could do this for the 10.3 kernel as well, this might indicate why the
miss rate has increased?
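
To make clear what those printf()s would be checking, here is a rough Python
model of a time-based attribute cache check. It is not the kernel code. The
thread only says that timeo is calculated from time_second and
np->n_mtime.tv_sec and that a comparison against it decides whether
attrcache_misses is incremented; the age-divided-by-ten heuristic, the
clamping to a min/max pair, and all Python names below are assumptions made
for illustration.

```python
def attr_timeout(now: int, mtime: int, acmin: int, acmax: int) -> int:
    """Derive a cache timeout from the file's age, clamped to [acmin, acmax].

    ASSUMPTION: age // 10 is an illustrative heuristic, not taken from the thread.
    """
    timeo = (now - mtime) // 10
    return max(acmin, min(timeo, acmax))


def is_cache_miss(now: int, attrstamp: int, timeo: int) -> bool:
    """Cached attributes count as stale once their age reaches the timeout."""
    return (now - attrstamp) >= timeo


# Example values only: acmin=3 and acmax=60 seconds.
recent = attr_timeout(now=1000, mtime=980, acmin=3, acmax=60)    # young file
old = attr_timeout(now=100000, mtime=0, acmin=3, acmax=60)       # old file
bogus = attr_timeout(now=1000, mtime=2000, acmin=3, acmax=60)    # mtime in the future

print(recent, old, bogus)
print(is_cache_miss(now=1000, attrstamp=990, timeo=old))
```

In this model a bogus mtime (for example one that lies in the future) pins
the timeout to the minimum, so attributes expire quickly and misses go up,
which is the kind of thing printing the two fields could reveal.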

>I upgraded a machine from 10.3-Prerelease (custom kernel with
>tcp_fastopen added) to 11.1-Release (standard kernel) with
>freebsd-update. I have two other machines that are still on
>10.3-Prerelease. Those machines mount an NFS-export from a
>Linux-NFS-server and use NFSv3. The machine that got upgraded shows now
>far more cache misses for getattr than on the 10.3-machines (we talk a
>factor of 100) in munin. munin also shows a lot more cache-misses for
>other metrics like biow, biorl, biod (where can I find what those
>metrics mean…currently I have not even an understanding what these are)
>etc.
>
>Can anybody help me debug this problem, or does anybody have an idea
>what could cause it? The result of this behavior is that this machine
>shows lower performance than the others, and I cannot upgrade the
>other machines until I have fixed this bug.
I haven't run a 10.x system in quite a while. When I get home in a few days,
I might be able to reproduce this. If I can, I can poke at it, but it would be
at least a week before I might have an answer and I may not figure it out for a
long time.

rick


High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release

2018-04-12 Thread Niels Kobschätzki
Hi,

Sorry for the cross-posting, but so far I have had no real luck on the forum
or on question, so I want to try my luck here as well.

I upgraded a machine from 10.3-Prerelease (custom kernel with
tcp_fastopen added) to 11.1-Release (standard kernel) with
freebsd-update. I have two other machines that are still on
10.3-Prerelease. Those machines mount an NFS export from a
Linux NFS server and use NFSv3. In munin, the upgraded machine now shows
far more cache misses for getattr than the 10.3 machines (we are talking
about a factor of 100). munin also shows a lot more cache misses for
other metrics like biow, biorl, biod, etc. (Where can I find what those
metrics mean? Currently I don't even understand what they are.)

Can anybody help me debug this problem, or does anybody have an idea
what could cause it? The result of this behavior is that this machine
shows lower performance than the others, and I cannot upgrade the
other machines until I have fixed this bug.

Thanks,

Niels

