My responses are down near the bottom of this email interspersed among Jon's suggestions.

Steven Yellin

On Fri, 10 Apr 2009, Jon Peatfield wrote:

On Thu, 9 Apr 2009, Steven J. Yellin wrote:

We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5. I'll call them "A" and "B". Each exports two filesystems, and each runs amd to mount whatever filesystems are requested from elsewhere. Filesystems requested from SL3.0.9 systems mount without problem, and filesystems requested from the SL5.1 systems also mounted without problem until recently. But recently attempts to access from "A" a filesystem exported by "B", or from "B" a filesystem exported by "A", started being met with the message "Input/output error". Similar requests on an SL3.0.9 system to view an SL5.1 exported one give "Permission denied".

I'd appreciate advice. I'll give some more information in the following, and will be glad to add more depending on what others think might be useful.

The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to a set of computers including "A" and "B". Unless I've made a mistake, the firewall is open between "A" and "B".

The following is what went into /var/log/messages on "A" and "B" at the time of an attempt to look from "A" at a filesystem exported by "B", with a perhaps ineffectual paranoid attempt to maintain a low profile by replacing computer names and IPs with "A" and "B". On "A" at the time of the "Input/output error", a set of lines went to /var/log/messages, all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and otherwise containing

get_nfs_version: returning NFS(3,tcp) on host "B"
get_nfs_version: returning NFS(3,udp) on host "B"
Using NFS version 3, protocol tcp on host "B"
initializing "B"'s pinger to 30 sec
creating mountpoint directory '/.automount/"B"/root'
file server "B", type nfs, state starts up
Flushed /net/"B"; dependent on "B"
recompute_portmap: NFS version 3 on "B"
Using MOUNT version: 3
amfs_host_mount: NFS version 3
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
/net/"B": mount (amfs_cont): Input/output error

On "B" at that time, lines in /var/log/messages began with "Apr 9 12:04:34 "B" mountd[9831]: " and otherwise contained:

authenticated mount request from "A":1023 for /data (/data)
authenticated mount request from "A":1023 for /scratch (/scratch)

Steven Yellin

To narrow the search I'd suggest seeing if a manual nfs mount from A to B (and vice versa) works.


    On "B" the command 'mount "A":/scratch /mnt/tmp' failed with response

 mount: "A":/scratch failed, reason given by server: Permission denied.

There were no messages at that time in /var/log/messages of "B", but in /var/log/messages of "A" was

Apr 9 19:01:58 "A" mountd[12500]: authenticated mount request from "B":777 for /scratch (/scratch)

There was nothing in /var/log/secure at the time of a failed mount for either "A" or "B".
    The same happened with the roles of "A" and "B" interchanged.

If the manual mount works then we need to look more closely at how amd is differing from the manual mount, and if it doesn't we have excluded amd from the equation and should look at the nfs setup...

    I haven't modified /etc/sysconfig/nfs, which has only comment lines.


The next step (whether the manual mount works or not) may well be to check /var/log/secure for relevant (e.g. blocking) messages and run

rpcinfo -p

against A and B to see that all the expected sunrpc services are registered and what ports they are listening on (e.g. in case those are being blocked somewhere...)

On both "A" and "B" the command 'rpcinfo -p' showed portmapper, status, ypbind, nlockmgr, rquotad, nfs, mountd and amd, all with proto tcp and udp. I didn't see any port that looked familiar as one that could be blocked somewhere, but maybe that's just because I don't know how to tell.
From "A" for all the tcp ports shown by 'rpcinfo -p "B"'

telnet "B" <port>

always made a connection.
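The telnet test above can be scripted. The following is only a sketch, not something either poster ran: the function names `tcp_ports` and `check_tcp_ports` are made up for illustration, and it assumes an `nc` that supports `-z` and `-w`. Like telnet, it only shows that a TCP connection can be opened. It says nothing about the UDP registrations, and it does not show that the service answers RPC calls; `rpcinfo -t "B" mountd 3` (or `-u` for UDP) asks the service itself to respond.

```shell
# Print the unique TCP port numbers found in `rpcinfo -p` output on stdin.
# Columns are: program vers proto port service; the header line is skipped
# because its third field is "proto", not "tcp".
tcp_ports() {
    awk '$3 == "tcp" { print $4 }' | sort -un
}

# Try a TCP connection to every port that the given host registers.
# Assumption: nc accepts -z (connect only) and -w (timeout in seconds).
check_tcp_ports() {
    host=$1
    rpcinfo -p "$host" | tcp_ports | while read -r port; do
        if nc -z -w 3 "$host" "$port"; then
            echo "tcp/$port open"
        else
            echo "tcp/$port blocked or closed"
        fi
    done
}

# Usage (hypothetical host name):
#   check_tcp_ports B
```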



btw from the error '...mountd...RPC: Can't decode result' it *sounds* like amd isn't liking (or can't understand) the reply it is getting from mountd - but that could be a problem with mountd or amd...

BTW do you have a spare box to try as a 3rd sl5 machine 'C'?


Yes, I could install SL5 on an old Pentium III machine now running SL3.0.9. I hope there's something simpler to do to diagnose the problem.

I remember some time ago having trouble exporting from "A" before "B" was purchased, though instead of trying to diagnose the problem, I just rebooted "A" and the problem went away for a while. The machines are more heavily used now, so I don't feel quite as free to do that. If a 3rd SL5 machine is set up, I suspect it won't have any trouble exporting at first, any more than "A" or "B" had. Just in case you can use this information, here's what's in the /etc/exports file of both "A" and "B", with other computer names also replaced by something in quotes:

/data "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
/scratch "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
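One cheap check on an exports file like this is whether the clients it lists for a path match what the running server has actually exported, which `exportfs -v` prints. The helper below is a hypothetical sketch (the name `export_clients` is invented here). It assumes one export path per line with no backslash continuations.

```shell
# List the client(options) entries that exports-style input grants for one
# path. The file is read on stdin; the export path is the first argument.
# Assumption: one path per line, no line continuations.
export_clients() {
    awk -v path="$1" '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        $1 == path { for (i = 2; i <= NF; i++) print $i }
    '
}

# Usage, run on the server:
#   export_clients /scratch < /etc/exports
#   exportfs -v        # compare with the currently active export table
```

If the two lists disagree, the file was probably edited after the last `exportfs -ra`.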



--
/--------------------------------------------------------------------\
| "Computers are different from telephones.  Computers do not ring." |
|       -- A. Tanenbaum, "Computer Networks", p. 32                  |
---------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP,  University of Cambridge |
| Mail:  [email protected]     Web:  http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/
