Hi all!

I'm having some strange issues with my nfs setup: both poor speed and mount failures.

The system:
I am running an ldap/kerberos/nfs4 environment with ubuntu 10.04 servers and clients only.
- one ldap/kerberos (MIT) host
- one nfs host
- one dhcp/dns host
- a number of other servers running ldap authentication and nfs4/krb mounts
- a number of clients running the same setup

All machines that are members of the ldap domain mount users' home directories from the fileserver using kerberos nfs4 shares. The mounting is done by autofs, which gets its mount definitions from the ldap directory.


Most of the time it all works flawlessly, but every now and then machines (clients and servers) start to lock up when logging in over ssh. When it happens, all users (except local ones) cannot get access to their home directories, and therefore cannot get a shell going. Users can type in their password and the MOTD is printed, but the session then locks up.

I've been searching the internet up and down while trying a heap of different proposed solutions, but nothing seems to work.

What I've tried:
 - disabling firewalls on server and client
 - checking that the portmap service is running on the clients and server
 - doing portmap checks (rpcinfo -[tu] <server> <program> <version>), which seem to work fine both ways (server->client, client->server)
 - restarting the nfs-kernel-server on the server and all services installed by the nfs-common package on the clients
 - changing rsize and wsize on the mounts; both are currently set to 4096
 - async and sync, wdelay and no_wdelay, intr and nointr exports
 - checking the network interfaces on server and client; neither seems to report any errors
 - enabling debugging for nfs on the server and clients; I cannot see anything other than these:
   * svc: failed to register lockdv1 RPC service (errno 97).
   * NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
   * NFSD: starting 90-second grace period
   * /export/fileserver/homes and /export/fileserver/homes have same filehandle for gss/krb5, using first
 - one message that I have a feeling is the source of some of the issues is this:
   * nslcd[1556]: [a1da7b] nslcd_passwd_byname(nfs/sega.example.com): invalid user name
      - but it shouldn't prevent the shares from being mounted?
 - probably a few other things which I have forgotten...
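
A minimal sketch of the kind of portmap check meant above, for anyone who wants to repeat it. The function name probe_nfs is made up for this mail, and probing only the "nfs" program, version 4, over TCP is an assumption; substitute whichever programs and versions are of interest:

```shell
# Probe the RPC services on an NFS server.
# Usage: probe_nfs fileserver.example.com
probe_nfs() {
    server="$1"
    # List everything registered with the portmapper on the server.
    rpcinfo -p "$server" || return 1
    # Check that the nfs program, version 4, answers over TCP.
    rpcinfo -t "$server" nfs 4 || return 1
}
```

The function stops at the first failing probe and returns non-zero, so it can be run from both the server and a client to compare the two directions.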

Some configurations:
# fileserver /etc/exports:

/export gss/krb5(rw,fsid=0,sync,subtree_check,no_root_squash,crossmnt)
/export/fileserver    gss/krb5(rw,sync,subtree_check,no_root_squash)
/export/fileserver/homes        gss/krb5(rw,no_wdelay,async,no_subtree_check,root_squash,crossmnt)

# It should be noted that the /export/fileserver directory is a bind mount of /fileserver.

# client mount command:
rsize=4096,wsize=4096,hard,intr,noatime,tcp,async,timeo=70,retrans=2,fstype=nfs4,rw,sec=krb5 fileserver.example.com:/fileserver/homes/<username>
; this could be a bit mis-formatted, as it is copied from the ldap automount cn's
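
One detail about the options above: per nfs(5), timeo is given in tenths of a second, so timeo=70 means the client waits 7 seconds for a reply before retrying a request. A small sketch for pulling a value out of such an option string; the helper names opt_value and timeo_seconds are made up for illustration:

```shell
# Print the value of option $2 from the comma-separated option list $1.
# Prints nothing for flag-style options (e.g. "hard") or missing options.
opt_value() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# Convert the timeo option (tenths of a second) to whole seconds.
timeo_seconds() {
    t=$(opt_value "$1" timeo)
    [ -n "$t" ] || return 1
    echo $(( t / 10 ))
}
```

For example, timeo_seconds "hard,intr,tcp,timeo=70,retrans=2" prints 7.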


So my questions are:
Is there anything I should check which I haven't already?
Is there anyone who has had the same kind of issues and figured out how to fix them?


And just as a note, setting RPCGSSDOPTS in /etc/default/nfs-common does not seem to work as advertised: the rpc.gssd process is launched without any parameters.
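
A sketch of one way to check what rpc.gssd was actually started with, assuming the procps version of ps (for the -C and -o options). The function names are made up for illustration:

```shell
# Print the command line of the running rpc.gssd (empty if not running).
gssd_cmdline() {
    ps -o args= -C rpc.gssd
}

# Return 0 if rpc.gssd is running with at least one argument, 1 otherwise.
gssd_has_args() {
    line=$(gssd_cmdline)
    case "$line" in
        *rpc.gssd\ ?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running gssd_cmdline next to "grep RPCGSSDOPTS /etc/default/nfs-common" shows whether the configured options made it onto the command line.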


Hope someone has some good ideas, because I'm running dry at this point...


--
Thanks,
Tor Martin Slåen
