[nfs-discuss] svc_cots_kdup no slots free

2006-06-20 Thread Robert Milkowski
Hello Spencer,

Monday, June 19, 2006, 5:47:21 PM, you wrote:

>> Hello Spencer,
>> 
>> Thursday, June 15, 2006, 11:48:57 PM, you wrote:
>> 
>> SS> On Thu, Robert Milkowski wrote:
>> >> Hello nfs-discuss,
>> >> 
>> >>   Sometimes on nfs servers I get below messages and then I have
>> >>   performance problems. What are they about and what can I do about
>> >>   this?
>> >> 
>> >>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
>> >>   last message repeated 700353 times
>> 
>> SS> This is an error from the NFS server when it attempts to place
>> SS> a request into the duplicate request cache and finds that
>> SS> all of the ones in the duplicate request table are "in progress".
>> 
>> SS> The maximum for the duplicate request cache is 1024.
>> 
>> SS> The most effective way to increase the value is /etc/system
>> SS> and the variable would be: rpcmod:cotsmaxdupreqs 
>> 
>> I set it to 8192 and see some problems with nfsd+zfs - like almost all
>> threads hanging in ZFS and no actual IOs are happening while I can
>> issue IOs to the same zfs filesystems locally (see nfsd threads hang
>> in ZFS on zfs-discuss list).
>> 
>> I did set rpcmod:cotsmaxdupreqs=8192
>>   rpcmod:maxdupreqs=8192
>> 
>> Maybe it's too high?

SS> Are you still receiving the original error message you reported
SS> after making these changes? (the "no slots free" error)

No more errors - thanks for hint.

SS> If not, then the problem is in the local filesystem.
SS> All you have done is allow the NFS server to use the
SS> threads that it can create instead of rejecting the incoming
SS> requests with an error.

ok, thank you.



-- 
Best regards,
 Robert  mailto:rmilkowski at task.gda.pl
   http://milek.blogspot.com




2006-06-19 Thread Spencer Shepler
> Hello Spencer,
> 
> Thursday, June 15, 2006, 11:48:57 PM, you wrote:
> 
> SS> On Thu, Robert Milkowski wrote:
> >> Hello nfs-discuss,
> >> 
> >>   Sometimes on nfs servers I get below messages and then I have
> >>   performance problems. What are they about and what can I do about
> >>   this?
> >> 
> >>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
> >>   last message repeated 700353 times
> 
> SS> This is an error from the NFS server when it attempts to place
> SS> a request into the duplicate request cache and finds that
> SS> all of the ones in the duplicate request table are "in progress".
> 
> SS> The maximum for the duplicate request cache is 1024.
> 
> SS> The most effective way to increase the value is /etc/system
> SS> and the variable would be: rpcmod:cotsmaxdupreqs 
> 
> I set it to 8192 and see some problems with nfsd+zfs - like almost all
> threads hanging in ZFS and no actual IOs are happening while I can
> issue IOs to the same zfs filesystems locally (see nfsd threads hang
> in ZFS on zfs-discuss list).
> 
> I did set rpcmod:cotsmaxdupreqs=8192
>   rpcmod:maxdupreqs=8192
> 
> Maybe it's too high?

Are you still receiving the original error message you reported
after making these changes? (the "no slots free" error)

If not, then the problem is in the local filesystem.
All you have done is allow the NFS server to use the
threads that it can create instead of rejecting the incoming
requests with an error.

Spencer




2006-06-18 Thread Robert Milkowski
Hello Spencer,

Thursday, June 15, 2006, 11:48:57 PM, you wrote:

SS> On Thu, Robert Milkowski wrote:
>> Hello nfs-discuss,
>> 
>>   Sometimes on nfs servers I get below messages and then I have
>>   performance problems. What are they about and what can I do about
>>   this?
>> 
>>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
>>   last message repeated 700353 times

SS> This is an error from the NFS server when it attempts to place
SS> a request into the duplicate request cache and finds that
SS> all of the ones in the duplicate request table are "in progress".

SS> The maximum for the duplicate request cache is 1024.

SS> The most effective way to increase the value is /etc/system
SS> and the variable would be: rpcmod:cotsmaxdupreqs 

I set it to 8192 and see some problems with nfsd+zfs - like almost all
threads hanging in ZFS and no actual IOs are happening while I can
issue IOs to the same zfs filesystems locally (see nfsd threads hang
in ZFS on zfs-discuss list).

I did set rpcmod:cotsmaxdupreqs=8192
  rpcmod:maxdupreqs=8192

Maybe it's too high?

-- 
Best regards,
 Robert  mailto:rmilkowski at task.gda.pl
   http://milek.blogspot.com




2006-06-16 Thread Robert Milkowski
Hello Peter,

Friday, June 16, 2006, 11:33:01 AM, you wrote:

PH> FYI - this reminds me of a discussion I had once regarding bug 4321293
PH> (Unable to exit svc_cots_kdup() all cache marked DUP_INPROGRESS).

PH> That bug is fixed but someone thought they were still hitting it, turned
PH> out they were just hitting the limit you are seeing.

PH> The cache is dynamically sized up to the limit of cotsmaxdupreqs 
PH> (connection orientated, eg TCP) and maxdupreqs (connectionless, eg UDP).

PH> The current behaviour is the expected effect of the fix to bug 4321293
PH> in that previously it would 'soft hang', now it just logs the error and
PH> returns an error to the client.

PH> If the NFS server is stuck processing incoming requests so that it never
PH> replies then the incoming requests will fill up the dup request cache.
PH> In other words, if there's some other problem that is preventing the NFS
PH> server from replying to requests then you will also see this behaviour.
PH> In that case the dup syslog messages are a side effect of the underlying
PH> problem, ie the NFS server not being able to process requests at all, or
PH> at least fast enough.

Like hanging in ZFS due to txg throttling problem? :)
ok, I'm not sure if it's really the cause.


-- 
Best regards,
 Robert  mailto:rmilkowski at task.gda.pl
   http://milek.blogspot.com




2006-06-16 Thread Peter Harvey
FYI - this reminds me of a discussion I had once regarding bug 4321293 
(Unable to exit svc_cots_kdup() all cache marked DUP_INPROGRESS).

That bug is fixed but someone thought they were still hitting it, turned 
out they were just hitting the limit you are seeing.

The cache is dynamically sized up to the limit of cotsmaxdupreqs 
(connection orientated, eg TCP) and maxdupreqs (connectionless, eg UDP).

The current behaviour is the expected effect of the fix to bug 4321293 
in that previously it would 'soft hang', now it just logs the error and 
returns an error to the client.

If the NFS server is stuck processing incoming requests so that it never 
replies then the incoming requests will fill up the dup request cache. 
In other words, if there's some other problem that is preventing the NFS 
server from replying to requests then you will also see this behaviour. 
In that case the dup syslog messages are a side effect of the underlying 
problem, ie the NFS server not being able to process requests at all, or 
at least fast enough.

-- Peter



2006-06-16 Thread Robert Milkowski
Hello Spencer,

Thursday, June 15, 2006, 11:48:57 PM, you wrote:

SS> On Thu, Robert Milkowski wrote:
>> Hello nfs-discuss,
>> 
>>   Sometimes on nfs servers I get below messages and then I have
>>   performance problems. What are they about and what can I do about
>>   this?
>> 
>>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
>>   last message repeated 700353 times

SS> This is an error from the NFS server when it attempts to place
SS> a request into the duplicate request cache and finds that
SS> all of the ones in the duplicate request table are "in progress".

SS> The maximum for the duplicate request cache is 1024.

SS> The most effective way to increase the value is /etc/system
SS> and the variable would be: rpcmod:cotsmaxdupreqs 

And when it finds that all requests are 'in progress' - then what?
Thank you for info.


btw: it looks like A LOT of messages are being sent in that case to
syslog and that is perhaps one of the things which kills performance
and makes whole server unresponsive.

-- 
Best regards,
 Robert  mailto:rmilkowski at task.gda.pl
   http://milek.blogspot.com




2006-06-15 Thread Robert Milkowski
Hello nfs-discuss,

  Sometimes on nfs servers I get below messages and then I have
  performance problems. What are they about and what can I do about
  this?

  rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
  last message repeated 700353 times
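
To gauge how often the warning really fired, the syslog repeat counts have to be added to the individual warning lines. The following is only a rough sketch: the function name count_no_slots and the sample input are made up for illustration, and it assumes a "last message repeated" line belongs to the warning when it follows one.

```python
import re

# Text of the kernel warning quoted in this message.
WARNING = "svc_cots_kdup no slots free"
# syslog collapses consecutive identical lines into this form.
REPEATED = re.compile(r"last message repeated (\d+) times")


def count_no_slots(lines):
    """Count warning occurrences, including collapsed repeats."""
    total = 0
    previous_was_warning = False
    for line in lines:
        match = REPEATED.search(line)
        if match:
            # A repeat line only counts if the last real message was the warning.
            if previous_was_warning:
                total += int(match.group(1))
            continue
        previous_was_warning = WARNING in line
        if previous_was_warning:
            total += 1
    return total


sample = [
    "rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free",
    "last message repeated 700353 times",
]
print(count_no_slots(sample))
```

On a live system the lines would come from the system log file rather than a list; the path to that file is site-specific and is deliberately left out here.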

-- 
Best regards,
 Robert  mailto:rmilkowski at task.gda.pl
 http://milek.blogspot.com




2006-06-15 Thread Robert Gordon

On Jun 15, 2006, at 5:32 PM, Robert Milkowski wrote:

> And when it finds that all requests are 'in progress' - then what?

When the non-idempotent reply cannot be placed in the duplicate request
cache, we call svcerr_systemerr(), which places SYSTEM_ERR in the reply's
ar_stat. This causes the client to re-drive the call, in the hope that
the duplicate request cache congestion has passed.
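
The behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel code: the class DupRequestCache, its methods, and the use of an RPC transaction id (xid) as the key are all hypothetical, and the slot-reuse policy (oldest completed entry first) is a guess made for illustration. Only the names DUP_INPROGRESS and SYSTEM_ERR come from this thread.

```python
from collections import OrderedDict

DUP_INPROGRESS = "DUP_INPROGRESS"
DUP_DONE = "DUP_DONE"
SYSTEM_ERR = "SYSTEM_ERR"
NEW = "NEW"


class DupRequestCache:
    """Toy model of a bounded duplicate request cache."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # xid -> state, oldest first

    def begin(self, xid):
        """Try to reserve a slot for a non-idempotent request."""
        if xid in self.entries:
            # Retransmission: report the state already on record.
            return self.entries[xid]
        if len(self.entries) >= self.max_entries:
            # Look for the oldest slot whose reply has been sent.
            reusable = next(
                (x for x, s in self.entries.items() if s == DUP_DONE), None
            )
            if reusable is None:
                # Every slot is still in progress: "no slots free".
                return SYSTEM_ERR
            del self.entries[reusable]
        self.entries[xid] = DUP_INPROGRESS
        return NEW

    def finish(self, xid):
        """Mark a request as replied to, which makes its slot reusable."""
        self.entries[xid] = DUP_DONE


cache = DupRequestCache(max_entries=2)
print(cache.begin(1))  # NEW
print(cache.begin(2))  # NEW
print(cache.begin(3))  # SYSTEM_ERR, both slots are still in progress
cache.finish(1)
print(cache.begin(3))  # NEW, the completed slot of xid 1 is reused
```

In the model, a request that never finishes holds its slot forever, so a server whose threads are all stuck fills the cache no matter how large the limit is.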

> Thank you for info.
>
>
> btw: it looks like A LOT of messages are being sent in that case to
> syslog and that is perhaps one of the things which kills performance
> and makes whole server unresponsive.

The unresponsiveness is most likely due to the clients having
to re-drive the requests. Increase the size of the duplicate request
cache as Spencer suggests and see if that helps.

-- Robert.



2006-06-15 Thread Spencer Shepler
> 
> Thursday, June 15, 2006, 11:48:57 PM, you wrote:
> 
> SS> On Thu, Robert Milkowski wrote:
> >> Hello nfs-discuss,
> >> 
> >>   Sometimes on nfs servers I get below messages and then I have
> >>   performance problems. What are they about and what can I do about
> >>   this?
> >> 
> >>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
> >>   last message repeated 700353 times
> 
> SS> This is an error from the NFS server when it attempts to place
> SS> a request into the duplicate request cache and finds that
> SS> all of the ones in the duplicate request table are "in progress".
> 
> SS> The maximum for the duplicate request cache is 1024.
> 
> SS> The most effective way to increase the value is /etc/system
> SS> and the variable would be: rpcmod:cotsmaxdupreqs 
> 
> And when it finds that all requests are 'in progress' - then what?
> Thank you for info.
> 
> 
> btw: it looks like A LOT of messages are being sent in that case to
> syslog and that is perhaps one of the things which kills performance
> and makes whole server unresponsive.

The NFS server will return an error to the client and the client will
resend the request; yes, performance will suffer.

The duplicate request cache is only used for non-idempotent NFS
operations (things like create, remove, write).  Getattr and lookup
requests are not affected by this, so your server could have
more than 1024 requests in progress.  The message means the server is
very busy, but it may also indicate issues with the underlying filesystem.

So the number of duplicate request cache entries should at least
match the number of available threads in the NFS server.
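
As a rough illustration of that sizing rule, the sketch below turns a thread count into /etc/system lines. The helper name etc_system_lines and the example thread count are hypothetical; the 1024 floor is the limit quoted in this thread, and applying the same value to maxdupreqs (the connectionless counterpart) is an assumption rather than something the rule requires.

```python
DEFAULT_MAX_DUPREQS = 1024  # limit quoted in this thread


def etc_system_lines(nfsd_threads):
    """Return /etc/system lines sizing the dup request caches to the thread count."""
    # Never go below the quoted default; otherwise match the thread count.
    size = max(DEFAULT_MAX_DUPREQS, nfsd_threads)
    return [
        f"set rpcmod:cotsmaxdupreqs={size}",  # connection-oriented, e.g. TCP
        f"set rpcmod:maxdupreqs={size}",      # connectionless, e.g. UDP
    ]


for line in etc_system_lines(nfsd_threads=8192):
    print(line)
```

Settings in /etc/system take effect at the next boot, so a change like this needs a reboot before the warning can be expected to go away.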

Spencer



2006-06-15 Thread Spencer Shepler
On Thu, Robert Milkowski wrote:
> Hello nfs-discuss,
> 
>   Sometimes on nfs servers I get below messages and then I have
>   performance problems. What are they about and what can I do about
>   this?
> 
>   rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
>   last message repeated 700353 times

This is an error from the NFS server when it attempts to place
a request into the duplicate request cache and finds that
all of the ones in the duplicate request table are "in progress".

The maximum for the duplicate request cache is 1024.

The most effective way to increase the value is /etc/system
and the variable would be: rpcmod:cotsmaxdupreqs 

Spencer