After trying what seemed like good ideas for diagnosing the failure of two
SL5.1 systems to export filesystems, I was still unable to correct the
problem, even by restarting nfs and related services. Suggestions stopped
coming in, and local users got impatient for the problem to be corrected,
even if it meant interrupting their work. So I rebooted the machines, and
that seemed to fix the problem. But the refusal to export may well recur
-- it has happened before. Ideas are still welcome on how to diagnose or
correct the problem in the future without rebooting.
Steven Yellin
On Thu, 9 Apr 2009, Steven J. Yellin wrote:
My responses are down near the bottom of this email interspersed among
Jon's suggestions.
Steven Yellin
On Fri, 10 Apr 2009, Jon Peatfield wrote:
On Thu, 9 Apr 2009, Steven J. Yellin wrote:
We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5.
I'll call them "A" and "B". Each exports two file systems, and each runs
amd to mount whatever filesystems are requested from elsewhere.
Filesystems requested from SL3.0.9 systems mount without problem, and
filesystems requested from the SL5.1 systems also mounted without problem
until recently. But recently attempts to access from "A" a filesystem
exported by "B" or access from "B" a filesystem exported by "A" started
being met with a message "Input/output error". Similar requests from an
SL3.0.9 system to view an SL5.1-exported filesystem give "Permission denied". I'd
appreciate advice. I'll give some more information in the following, and
will be glad to add more depending on what others think might be useful.
The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to
a set of computers including "A" and "B".
Unless I've made a mistake, the firewall is open between "A" and "B".
The following is what went into /var/log/messages on "A" and "B" at
the time of an attempt to look from "A" at a filesystem exported by "B",
with a perhaps ineffectual, paranoid attempt to maintain a low profile by
replacing computer names and IPs with "A" and "B".
On "A" at the time of the "Input/output error", a set of lines went to
/var/log/messages all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and
otherwise containing
get_nfs_version: returning NFS(3,tcp) on host "B"
get_nfs_version: returning NFS(3,udp) on host "B"
Using NFS version 3, protocol tcp on host "B"
initializing "B"'s pinger to 30 sec
creating mountpoint directory '/.automount/"B"/root'
file server "B", type nfs, state starts up
Flushed /net/"B"; dependent on "B"
recompute_portmap: NFS version 3 on "B"
Using MOUNT version: 3
amfs_host_mount: NFS version 3
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
/net/"B": mount (amfs_cont): Input/output error
On "B" at that time, lines in /var/log/messages began with "Apr 9 12:04:34 "B"
mountd[9831]: " and otherwise contained:
authenticated mount request from "A":1023 for /data (/data)
authenticated mount request from "A":1023 for /scratch (/scratch)
Steven Yellin
To narrow the search I'd suggest seeing if a manual nfs mount from A to B
(and vice versa) works.
On "B" the command 'mount "A":/scratch /mnt/tmp' failed with response
mount: "A":/scratch failed, reason given by server: Permission denied.
There were no messages at that time in /var/log/messages of "B", but in
/var/log/messages of "A" was
Apr 9 19:01:58 "A" mountd[12500]: authenticated mount request from "B":777
for /scratch (/scratch)
There was nothing in /var/log/secure at the time of a failed mount for either
"A" or "B".
The same happened with the roles of "A" and "B" swapped.
If the manual mount works then we need to look more closely at how amd is
differing from the manual mount, and if it doesn't we have excluded amd
from the equation and should look at the nfs setup...
I haven't modified /etc/sysconfig/nfs, which has only comment lines.
The next step (whether the manual mount works or not) may well be to check
/var/log/secure for relevant (e.g. blocking) messages and run
rpcinfo -p
against A and B to see that all the expected sunrpc services are registered
and what ports they are listening on (e.g. in case those are being blocked
somewhere...)
On both "A" and "B" the command 'rpcinfo -p' showed portmapper, status,
ypbind, nlockmgr, rquotad, nfs, mountd and amd, all with proto tcp and udp.
I didn't see any port that looked familiar as one that could be blocked
somewhere, but maybe that's just because I don't know how to tell. From "A"
for all the tcp ports shown by 'rpcinfo -p "B"'
telnet "B" <port>
always made a connection.
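
The port check just described could be automated along the following
lines. This is only a minimal Python sketch, not what was actually run
here: the function names parse_rpcinfo, tcp_port_open and
check_tcp_services are hypothetical, and it assumes the usual
'rpcinfo -p' column layout (program, vers, proto, port, service). It
tests TCP only, and a successful connect shows that the port is
reachable, not that the RPC service behind it replies correctly.

```python
import socket


def parse_rpcinfo(text):
    """Parse `rpcinfo -p` output into (program, version, proto, port, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a data row.
        if len(fields) < 4:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit() and fields[3].isdigit()):
            continue
        name = fields[4] if len(fields) > 4 else ""
        entries.append((int(fields[0]), int(fields[1]), fields[2], int(fields[3]), name))
    return entries


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (like `telnet host port`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tcp_services(host, rpcinfo_text):
    """Map each distinct (service name, tcp port) pair to its reachability."""
    results = {}
    for _prog, _vers, proto, port, name in parse_rpcinfo(rpcinfo_text):
        if proto == "tcp":
            results[(name, port)] = tcp_port_open(host, port)
    return results
```

For example, the output of 'rpcinfo -p "B"' captured on "A" could be
passed as rpcinfo_text to check_tcp_services("B", rpcinfo_text), with
"B" standing in for the real host name.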
btw from the error '...mountd...RPC: Can't decode result' it *sounds* like
amd isn't liking (or can't understand) the reply it is getting from mountd
- but that could be a problem with mountd or amd...
BTW do you have a spare box to try as a 3rd sl5 machine 'C'?
Yes, I could install SL5 on an old Pentium III machine now running
SL3.0.9. I hope there's something simpler to do to diagnose the problem.
I remember some time ago having trouble exporting from "A" before "B" was
purchased, though instead of trying to diagnose the problem, I just rebooted
"A" and the problem went away for a while. The machines are more heavily used
now, so I don't feel quite as free to do that. If a 3rd SL5 machine is
set up, I suspect it won't have any trouble exporting at first, any more than
"A" or "B" had.
Just in case you can use this information, here's what's in the
/etc/exports file of both "A" and "B", with other computer names also
replaced by something in quotes:
/data "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync)
"X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
/scratch "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync)
"X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
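
As a cross-check of export lists like the ones above, a short Python
sketch such as the following could report which clients each path is
exported to and whether an expected client is missing. The helper
names parse_exports and missing_clients are hypothetical. It handles
only the simple 'path client(options)' form shown above, and it
assumes that an entry spanning several lines in the real /etc/exports
ends every line but the last with a backslash (the wrapping shown in
this email is presumably just mail formatting).

```python
def parse_exports(text):
    """Return {path: {client: [options]}} for simple /etc/exports content."""
    # Join backslash-continued lines into single logical lines.
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)

    exports = {}
    for line in logical:
        fields = line.split()
        if not fields:
            continue
        path, clients = fields[0], fields[1:]
        entry = exports.setdefault(path, {})
        for spec in clients:
            # "host(rw,sync)" -> host, ["rw", "sync"]; bare "host" -> no options
            host, _, opts = spec.partition("(")
            entry[host] = [o for o in opts.rstrip(")").split(",") if o]
    return exports


def missing_clients(exports, required):
    """List (path, client) pairs where a required client is not exported to."""
    return [(path, c) for path, clients in exports.items()
            for c in required if c not in clients]
```

Reading the file on each server and calling
missing_clients(parse_exports(text), ["A", "B"]) (with the real host
names in place of "A" and "B") would return an empty list if both
machines appear for every exported path.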
--
/--------------------------------------------------------------------\
| "Computers are different from telephones. Computers do not ring." |
| -- A. Tanenbaum, "Computer Networks", p. 32 |
|--------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP, University of Cambridge |
| Mail: [email protected] Web: http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/