My responses are down near the bottom of this email interspersed among
Jon's suggestions.
Steven Yellin
On Fri, 10 Apr 2009, Jon Peatfield wrote:
On Thu, 9 Apr 2009, Steven J. Yellin wrote:
We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5. I'll
call them "A" and "B". Each exports two file systems, and each runs amd to
mount whatever filesystems are requested from elsewhere. Filesystems
requested from SL3.0.9 systems mount without problem, and filesystems
requested from the SL5.1 systems also mounted without problem until
recently. But attempts to access from "A" a filesystem exported by "B", or
from "B" a filesystem exported by "A", have started failing with the
message "Input/output error". Similar requests on an SL3.0.9 system to
view one exported from SL5.1 give "Permission denied". I'd appreciate advice.
I'll give some more information in the following, and will be glad to add
more depending on what others think might be useful.
The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to
a set of computers including "A" and "B".
Unless I've made a mistake, the firewall is open between "A" and "B".
The following is what went into /var/log/messages on "A" and "B" at
the time of an attempt to look from "A" at a filesystem exported by "B",
with a perhaps ineffectual paranoid attempt to maintain a low profile by
replacing computer names and IPs with "A" and "B".
On "A" at the time of the "Input/output error", a set of lines went to
/var/log/messages all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and
otherwise containing
get_nfs_version: returning NFS(3,tcp) on host "B"
get_nfs_version: returning NFS(3,udp) on host "B"
Using NFS version 3, protocol tcp on host "B"
initializing "B"'s pinger to 30 sec
creating mountpoint directory '/.automount/"B"/root'
file server "B", type nfs, state starts up
Flushed /net/"B"; dependent on "B"
recompute_portmap: NFS version 3 on "B"
Using MOUNT version: 3
amfs_host_mount: NFS version 3
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
/net/"B": mount (amfs_cont): Input/output error
On "B" at that time lines in /var/log/messages began with "Apr 9 12:04:34 "B"
mountd[9831]: " and otherwise contained:
authenticated mount request from "A":1023 for /data (/data)
authenticated mount request from "A":1023 for /scratch (/scratch)
Steven Yellin
To narrow the search I'd suggest seeing if a manual nfs mount from A to B
(and vice versa) works.
On "B" the command 'mount "A":/scratch /mnt/tmp' failed with response
mount: "A":/scratch failed, reason given by server: Permission denied.
There were no messages at that time in /var/log/messages of "B", but in
/var/log/messages of "A" was
Apr 9 19:01:58 "A" mountd[12500]: authenticated mount request from "B":777 for /scratch (/scratch)
There was nothing in /var/log/secure at the time of a failed mount for
either "A" or "B".
The same happened in the other direction, mounting on "A" a filesystem
exported by "B".
If the manual mount works then we need to look more closely at how amd
differs from the manual mount, and if it doesn't we have excluded amd from
the equation and should look at the nfs setup...
I haven't modified /etc/sysconfig/nfs, which has only comment lines.
The next step (whether the manual mount works or not) may well be to check
/var/log/secure for relevant (e.g. blocking) messages and run
rpcinfo -p
against A and B to see that all the expected sunrpc services are registered
and what ports they are listening on (e.g. in case those are being blocked
somewhere...)
On both "A" and "B" the command 'rpcinfo -p' showed portmapper,
status, ypbind, nlockmgr, rquotad, nfs, mountd and amd, all with proto tcp
and udp. I didn't see any port that looked familiar as one that could be
blocked somewhere, but maybe that's just because I don't know how to tell.
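One way to make the "are all the expected services registered" check less of an eyeball exercise is sketched below. It is only a sketch: the function name missing_services is made up for illustration, and it assumes the usual Linux 'rpcinfo -p' layout in which the protocol is the third column and the service name the fifth.

```shell
#!/bin/bash
# Hypothetical helper, not from the thread.
# Read `rpcinfo -p` output on stdin; print each service named in the
# remaining arguments that has no registration for protocol $1 (tcp or udp).
missing_services() {
    local proto=$1; shift
    local listing svc
    # Collect the service names registered for the requested protocol.
    listing=$(awk -v p="$proto" '$3 == p { print $5 }' | sort -u)
    for svc in "$@"; do
        # -x: whole-line match, -F: treat the name as a fixed string.
        printf '%s\n' "$listing" | grep -qxF "$svc" || echo "$svc"
    done
}
```

For example, 'rpcinfo -p "B" | missing_services tcp portmapper nfs mountd nlockmgr status' prints nothing if all five are registered over tcp on "B".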
From "A" for all the tcp ports shown by 'rpcinfo -p "B"'
telnet "B" <port>
always made a connection.
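The telnet check can be scripted roughly as follows. This is a sketch under assumptions: the helper names are invented here, it needs bash (for /dev/tcp) and the coreutils 'timeout' command, which older installations may lack. It only shows that a TCP connection is accepted; it says nothing about UDP or about whether the service answers RPC calls correctly. For the latter, 'rpcinfo -t "B" mountd 3' asks mountd version 3 on "B" for a reply over tcp.

```shell
#!/bin/bash
# Hypothetical helpers, not from the thread.

# Read `rpcinfo -p` output on stdin; print each distinct TCP port once.
tcp_ports() {
    awk '$3 == "tcp" { print $4 }' | sort -un
}

# Try a TCP connection to host $1, port $2, giving up after 3 seconds.
# Inside the child shell, $0 is the host and $1 is the port.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report the state of every TCP port that host $1 registers with portmap.
check_host() {
    local port
    for port in $(rpcinfo -p "$1" | tcp_ports); do
        if port_open "$1" "$port"; then
            echo "$1:$port open"
        else
            echo "$1:$port blocked or closed"
        fi
    done
}
```

Running 'check_host "B"' on "A" (and the reverse on "B") would repeat the test above for every registered tcp port.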
BTW, from the error '...mountd...RPC: Can't decode result' it *sounds* like
amd doesn't like (or can't understand) the reply it is getting from mountd,
but that could be a problem with either mountd or amd...
BTW do you have a spare box to try as a 3rd sl5 machine 'C'?
Yes, I could install SL5 on an old Pentium III machine now running
SL3.0.9. I hope there's something simpler to do to diagnose the problem.
I remember some time ago having trouble exporting from "A" before "B"
was purchased, though instead of trying to diagnose the problem, I just
rebooted "A" and the problem went away for a while. The machines are more
heavily used now, so I don't feel quite as free to do that. If a 3rd SL5
machine is set up, I suspect it won't have any trouble exporting at first,
any more than "A" or "B" had.
Just in case you can use this information, here's what's in the
/etc/exports file of both "A" and "B", with other computer names also
replaced by something in quotes:
/data "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
/scratch "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
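To confirm which clients an exports file names for a given path, something like the sketch below might help. The function name export_clients is invented for illustration, and it assumes each export sits on a single line with no backslash continuations. Its output can then be compared with what 'exportfs -v' reports on the server, since the file and the live export table can differ if the file was edited without re-exporting.

```shell
#!/bin/bash
# Hypothetical helper, not from the thread.
# Read exports(5)-style text on stdin; print the client names listed for
# the export path given as $1, one per line, with the options removed.
export_clients() {
    awk -v path="$1" '$1 == path {
        for (i = 2; i <= NF; i++) {
            sub(/\(.*/, "", $i)   # drop "(rw,sync)" and similar
            print $i
        }
    }'
}
```

For example, 'export_clients /scratch < /etc/exports' on "A" should list "B" among the clients if the file is as shown above.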
--
/--------------------------------------------------------------------\
| "Computers are different from telephones. Computers do not ring." |
| -- A. Tanenbaum, "Computer Networks", p. 32 |
|--------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP, University of Cambridge |
| Mail: [email protected] Web: http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/