[CentOS] Intermittent NFS problems with NetApp server

2009-03-11 Thread Alfred von Campe
I've been experiencing some intermittent problems accessing a NetApp  
server via NFS and automount.  I'm running CentOS 5.2 (fully updated)  
on all my servers and workstations.  Usually, everything is working  
just fine, when suddenly we get the following error:

   /bin/sh: /home/epd/srcref/swtools/Crontabs/run_release_requests.sh: Permission denied

This is actually an email from cron, because we try to run that shell
script every minute (yes, the crontab entry is
* * * * * /home/epd/srcref/swtools/Crontabs/run_release_requests.sh),
and /home/epd is an automounted directory.  Here is its map entry:

   epd -rw,nointr,rsize=32768,wsize=32768 XX:/epd

When this is happening, other users can successfully access that  
directory on the server.  The directory is actually mounted  
correctly, and unmounting doesn't fix the issue.  Furthermore, the  
same user that is being denied access, can successfully access that  
directory on a different server.  The problem usually lasts about 20  
minutes and then resolves itself.  We have been pulling our hair out  
trying to debug this problem, because it's intermittent and the debug  
window is fairly short.

Recently we have been getting help from one of the NetApp admins, and  
he ran a command on the NetApp that produced the following warning:

   The TCP receive window advertised by NFS client XXX is 5888.
   This is less than the recommended value of 32768 bytes.
   You should increase the TCP receive buffer size for NFS on the client.

Some googling around got me to check these values for TCP:

   # sysctl net.ipv4.tcp_mem
   net.ipv4.tcp_mem = 98304 131072 196608
   # sysctl net.ipv4.tcp_rmem
   net.ipv4.tcp_rmem = 4096 87380 4194304
   # sysctl net.ipv4.tcp_wmem
   net.ipv4.tcp_wmem = 4096 16384 4194304

So these seem fine to me (i.e., the max is greater than 32768).  Is  
there an NFS (as opposed to TCP) setting I should be tweaking?  Any  
ideas why the NetApp is issuing those warnings?  Any other  
suggestions on how to debug this problem?

Thanks,
Alfred

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intermittent NFS problems with NetApp server

2009-03-11 Thread Louis Lagendijk
On Wed, 2009-03-11 at 17:23 -0400, Alfred von Campe wrote:

 
# sysctl net.ipv4.tcp_mem
net.ipv4.tcp_mem = 98304 131072 196608
# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 87380 4194304
# sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096 16384 4194304
 
 So these seem fine to me (i.e., the max is greater than 32768).  Is  
 there an NFS (as opposed to TCP) setting I should be tweaking?  Any  
 ideas why the NetApp is issuing those warnings?  Any other  
 suggestions on how to debug this problem?
 
man nfs
man mount.nfs
cat /proc/mounts
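
The fourth field of each /proc/mounts line lists the options the kernel is actually using for that mount, which can differ from what the automount map requested. A minimal sketch that prints only the NFS-related ones (version, transport, rsize, wsize); the function name nfs_mount_opts and the sample line are made up for illustration, and which options appear depends on the kernel version:

```shell
#!/bin/sh
# Print selected options for every NFS mount listed in a /proc/mounts-style
# file. Fields are: device, mount point, fs type, options, dump, pass.
nfs_mount_opts() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        n = split($4, opts, ",")
        line = $2 ":"
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(vers|proto|rsize|wsize)=/)
                line = line " " opts[i]
        print line
    }' "${1:-/proc/mounts}"
}

# Hypothetical sample line (not taken from the poster's system); "-" makes
# awk read standard input. On a real client, call nfs_mount_opts with no
# argument to read /proc/mounts.
printf '%s\n' \
    'XX:/epd /home/epd nfs rw,vers=3,rsize=32768,wsize=32768,hard,nointr,proto=tcp 0 0' \
    | nfs_mount_opts -
```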

Louis



Re: [CentOS] Intermittent NFS problems with NetApp server

2009-03-11 Thread Ray Van Dolson
snip

 So these seem fine to me (i.e., the max is greater than 32768).  Is  
 there an NFS (as opposed to TCP) setting I should be tweaking?  Any  
 ideas why the NetApp is issuing those warnings?  Any other  
 suggestions on how to debug this problem?

Sounds like a very interesting problem.  The only times I've gotten such
errors have been with NFSv4 issues between Linux and Solaris hosts, never
with a NetApp.

You might try asking on the linux-nfs[1] list as well as the
toasters[2] list.  I'd be interested to hear what you come up with.
Very strange symptoms though.  Are you using NFS over TCP or UDP?  It
seems like one side is attempting to use a stale session...

I've always found NFS stuff like this very difficult to troubleshoot.
If you can reproduce the problem on demand maybe you could get a packet
dump right when the issue begins...
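
Since the failure cannot be triggered on demand and lasts only about 20 minutes, one option is to poll the affected path and record the moment it first fails, so that a capture or other diagnostics can be started right away. A rough sketch, assuming a POSIX shell; probe is a hypothetical helper and the 10-second interval is arbitrary:

```shell
#!/bin/sh
# Report whether the current user can read a path; on failure, include the
# user name and a timestamp so the start of the outage can be pinned down.
probe() {
    if [ -r "$1" ]; then
        echo "ok $1"
    else
        echo "FAIL $1 user=$(id -un) time=$(date '+%Y-%m-%d %H:%M:%S')"
        return 1
    fi
}

# Example polling loop (commented out so the snippet has no side effects);
# it stops at the first failed probe:
# while probe /home/epd/srcref/swtools/Crontabs/run_release_requests.sh; do
#     sleep 10
# done
probe /
```

The loop would need to run as the same user that cron uses, since the symptom is specific to one user on one client.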

Ray

[1]: http://vger.kernel.org/majordomo-info.html
[2]: http://toasters.mathworks.com/toasters.html


Re: [CentOS] Intermittent NFS problems with NetApp server

2009-03-11 Thread Ross Walker
On Mar 11, 2009, at 5:23 PM, Alfred von Campe alf...@von-campe.com wrote:

 So these seem fine to me (i.e., the max is greater than 32768).  Is
 there an NFS (as opposed to TCP) setting I should be tweaking?  Any
 ideas why the NetApp is issuing those warnings?  Any other
 suggestions on how to debug this problem?

Run a headers-only tcpdump of the NFS traffic, starting at mount time and
continuing until the problem occurs, then use Wireshark to analyze it.
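
A headers-only capture along these lines might look like the following. This is only a sketch: the interface name eth0, the 256-byte snap length, and the output file name are assumptions, and XX stands for the filer as in the original post. The hypothetical helper nfs_capture_cmd just prints the command so it can be reviewed before being run as root:

```shell
#!/bin/sh
# Build a tcpdump command line that keeps only the first SNAPLEN bytes of
# each packet (-s), skips name resolution (-n), and writes raw packets to a
# file (-w) that Wireshark can open later.
nfs_capture_cmd() {
    server=$1
    iface=${2:-eth0}
    snaplen=${3:-256}
    echo "tcpdump -n -i $iface -s $snaplen -w nfs-$server.pcap host $server"
}

nfs_capture_cmd XX
```

Filtering on the host rather than on port 2049 keeps the portmapper and mountd exchanges from the mount itself in the trace.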

Maybe the page cache is putting too much pressure on TCP buffering, so you
need to increase the minimum buffer size?
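
For reference, the three tcp_rmem fields are the minimum, default, and maximum receive buffer sizes in bytes, so with the values from the original post only the minimum (4096) is below the filer's recommended 32768. A small sketch that makes the comparison explicit; check_rmem is a hypothetical helper, and whether raising the minimum actually changes the advertised window in this case is untested:

```shell
#!/bin/sh
# Compare each field of a tcp_rmem-style triple against a threshold.
# Usage: check_rmem "MIN DEFAULT MAX" THRESHOLD
check_rmem() {
    triple=$1
    want=$2
    # Split the triple on whitespace into min, default and max.
    set -- $triple
    for pair in "min:$1" "default:$2" "max:$3"; do
        name=${pair%%:*}
        value=${pair#*:}
        if [ "$value" -lt "$want" ]; then
            echo "$name=$value is below $want"
        else
            echo "$name=$value ok"
        fi
    done
}

# On a live system the triple would come from the kernel, e.g.:
#   check_rmem "$(cat /proc/sys/net/ipv4/tcp_rmem)" 32768
check_rmem "4096 87380 4194304" 32768
```

If the minimum does turn out to matter, it could be raised at runtime with something like sysctl -w net.ipv4.tcp_rmem="32768 87380 4194304"; those numbers are illustrative, not a recommendation.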

-Ross
