Hi Thomas,
I've been meaning to reply, but life's been rather hectic at the mo.
On Wed, 5 Jun 2002, Thomas McLaughlin wrote:
> I am getting the following error while trying to copy a file (186k) to
> an NFS mounted filesystem
> cp: closing `/jupiter/public/nfs.ps': Input/output error
The Input/output error message is a catch-all; you have to look at
kernel-level output to see what's actually happening.
You can turn on NFS debugging with some proc magic:
echo 1 > /proc/sys/sunrpc/nfs_debug
for client side, and:
echo 1 > /proc/sys/sunrpc/nfsd_debug
for server side. *Make sure you turn it off again*, as it will fill
syslog very quickly otherwise. Do "echo 0 > /proc/..." to turn off.
Last time I looked, most of the output was unintelligible gibberish,
but it might say something useful.
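To make it harder to forget the "turn it off again" step, something like
the wrapper below might help. It is only a sketch: the with_nfs_debug name
and the DEBUG_FILE variable are my own invention, it needs to run as root,
and it only covers the client-side entry mentioned above.

```shell
#!/bin/sh
# Sketch: switch NFS client debugging on only while one command runs.
# DEBUG_FILE is a made-up variable name; it defaults to the proc entry
# quoted above and can be pointed at a scratch file for a dry run.
DEBUG_FILE="${DEBUG_FILE:-/proc/sys/sunrpc/nfs_debug}"

with_nfs_debug() {
    echo 1 > "$DEBUG_FILE" || return 1   # debugging on
    "$@"                                 # run the command under test
    status=$?
    echo 0 > "$DEBUG_FILE"               # debugging off again, even on failure
    return "$status"
}
```

You would then run "with_nfs_debug cp nfs.ps /jupiter/public/nfs.ps" and
look at the kernel log straight afterwards.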
> I can copy smaller files ok and can can copy from the NFS mounted
> filesystems without any problems. I cannot find any errors in log
> files in any of the machines.
The fact that it works for smaller files is very suggestive. I
agree with Martin that this sounds like a problem with network
packets on one machine. The "rsize=8192,wsize=8192" options will
definitely cause IP fragmentation (for files > ~1.5 kiB), so the remote
machine has to reassemble these packets. If your network stack is in any
way non-standard (i.e. firewall, rewriting rules, masquerading, ...), or
the machine is not running a Linus kernel (e.g. it runs a RedHat kernel),
try upgrading to the latest 2.4-series kernel (and choose the networking
options carefully).
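To put a rough number on the fragmentation, here is a back-of-envelope
calculation. It assumes a standard 1500-byte Ethernet MTU, a 20-byte IP
header with no options and an 8-byte UDP header, and it ignores the RPC/NFS
headers that travel with the data. The fragments_for name is just for
illustration.

```shell
#!/bin/sh
# Rough estimate only: ignores the RPC/NFS headers that ride along with
# the data, and assumes a 20-byte IP header with no options.
fragments_for() {
    payload=$1            # bytes of data in one UDP datagram (e.g. wsize)
    mtu=${2:-1500}        # link MTU; 1500 is the usual Ethernet value
    datagram=$((payload + 8))              # add the 8-byte UDP header
    per_frag=$(( (mtu - 20) / 8 * 8 ))     # fragment data is a multiple of 8
    echo $(( (datagram + per_frag - 1) / per_frag ))
}
```

"fragments_for 8192" prints 6 and "fragments_for 1024" prints 1, so each
8 kiB write is split into about six IP fragments, and losing any one of
them loses the whole datagram.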
Another possibility is packets being dropped. NFS works over sunrpc,
which uses either UDP or TCP. AFAIK, Linux doesn't support NFS over TCP,
so you'll be using UDP packets, which are unreliable. If your network
is dropping these (for some reason), and the kernel is unable to recover
(it _should_ be able to recover), then you might get this I/O error. It
may be that your kernel(s) is/are less likely to recover from fragmented IP
packets, but this would be a pretty major bug.
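One way to see whether fragments are going missing might be to compare the
kernel's IP reassembly counters before and after a failed copy. The sketch
below assumes the usual two-line "Ip:" layout of /proc/net/snmp (a header
line of names followed by a line of values). The ip_counter name is made up
for the example.

```shell
#!/bin/sh
# Sketch: print one counter from the "Ip:" section of /proc/net/snmp.
# Assumes the first "Ip:" line holds the names and the second the values.
ip_counter() {
    # $1 = counter name (e.g. ReasmFails), $2 = file to read (optional)
    awk -v name="$1" '
        $1 == "Ip:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Ip:" && seen  { if (name in col) print $(col[name]); exit }
    ' "${2:-/proc/net/snmp}"
}
```

Running "ip_counter ReasmFails" on the receiving machine before and after
the cp would show whether that counter moved. If it did, that points at
lost or mangled fragments rather than at NFS itself.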
Try running tcpdump on the local and remote sites. Limit your search by
only looking for UDP traffic from the remote site ("tcpdump -i eth0 host
<remote host> and host <local host> and udp" should do it), then trigger
the fault. Might be interesting to see what's happening at the IP-level.
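If the plain UDP capture is too noisy, narrowing it to fragmented packets
may make the picture clearer. The helpers below only build the filter
strings, and their names are invented for the example. The
"ip[6:2] & 0x3fff != 0" test matches any packet with the more-fragments
flag set or a non-zero fragment offset.

```shell
#!/bin/sh
# Sketch: build tcpdump filter expressions for the traffic described above.
# Both host arguments are placeholders for your own machines.
nfs_udp_filter() {
    # $1 = local host, $2 = remote host
    printf 'host %s and host %s and udp' "$1" "$2"
}

nfs_frag_filter() {
    # Same traffic, restricted to IP fragments (MF flag set or offset > 0).
    printf '%s and (ip[6:2] & 0x3fff != 0)' "$(nfs_udp_filter "$1" "$2")"
}
```

For example: tcpdump -n -i eth0 "$(nfs_frag_filter <local host> <remote host>)",
run on both machines while you trigger the fault. Fragments that show up
leaving one side but never arrive on the other would be the thing to
look for.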
BTW, make sure you've got an up-to-the-minute version of tcpdump.
A buffer overflow vulnerability was announced a few days ago, and
RedHat/SuSE/... have only just started to release updated versions.
Cheers,
Paul.
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Particle Physics (Theory & Experimental) Groups Paul Millar
Department of Physics and Astronomy [EMAIL PROTECTED]
University of Glasgow [EMAIL PROTECTED]
Glasgow, G12 8QQ, Scotland http://www.astro.gla.ac.uk/users/paulm
+44 (0)141 330 4717 A54C A9FC 6A77 1664 2E4E 90E3 FFD2 704B BF0F 03E9
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --