Rebooted some of the bad clients last night to no avail. I have added some answers to your questions below.
I think that the clients may have stopped working on 11 April actually (not 10 April).
Thanks for your help on this.
JS.
> > >Is /afs available on other AFS clients? That rules out some
> > >possibilities.
> >
> > Yes, on some of them. 50% have this problem though.
That's interesting. One I can understand. 50% gives me pause ... unless of course you only have 2 clients :)
I've discovered at least 6 clients with this problem so far :(
Mmm not sure how to do this? I'm not exactly an AFS expert as you prob guessed!
> > >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see if
> > >/afs is still available on that client.
> >
> > Yes, tried this and /afs was still available.
>
> vos exa root.afs gave (on well behaved client):
>
> vos exa root.afs.readonly gave:
>
These are both fine.
I do suggest creating an RO on "server rsl155 partition /vicepa RW Site"
-- same server, same partition as the RW. It doesn't cost much, since such
an RO is a COW clone, and it will allow rsl155 to be used as an RO failover site.
(Client won't fail over to the RW if it's supposed to get an RO, and when
root.afs is replicated the client will be looking for an RO.)
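
A minimal sketch of that suggestion, assuming admin tokens and the standard
"vos addsite" and "vos release" commands. The server, partition and volume
names come from this thread; the add_ro_site function and the DRY_RUN
variable are hypothetical helpers, not part of AFS. By default it only
prints the commands so they can be checked before anything runs.

```shell
#!/bin/sh
# Hypothetical helper: print (or run) the two commands that define an RO
# site for a volume on a server/partition and then release the volume.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
add_ro_site() {
    server=$1
    partition=$2
    volume=$3
    for cmd in \
        "vos addsite -server $server -partition $partition -id $volume" \
        "vos release -id $volume"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose: the string is split into command + args.
            $cmd || return 1
        fi
    done
}

# Dry run with the names used in this thread.
add_ro_site rsl155 /vicepa root.afs
```

Running it with DRY_RUN=0 would execute the commands for real, so it is
worth reading the printed lines first.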
Are you able to "vos listvl root.afs" from a good client/bad client?
Bad client:
$ vos listvl root.afs
vsu_ClientInit: Could not process files in configuration directory (/usr/vice/etc).
could not initialize VLDB library (code=4294967295)
Good client:
$ vos listvl root.afs
root.afs
RWrite: 536870915 ROnly: 536870916
number of sites -> 3
server rsl155 partition /vicepa RW Site
server rsl156 partition /vicepa RO Site
server rsl59 partition /vicepa RO Site
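
To see at a glance which servers hold an RO copy, the site lines in that
listing can be filtered. This is only a sketch: ro_sites is a hypothetical
helper name, and it assumes the "server ... partition ... RO Site" line
layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "vos listvl <volume>" output on stdin and print
# the server name of every RO site, one per line.
ro_sites() {
    awk '$1 == "server" && $3 == "partition" && $5 == "RO" { print $2 }'
}

# Example using the good client's listing from this thread.
ro_sites <<'EOF'
root.afs
    RWrite: 536870915 ROnly: 536870916
    number of sites -> 3
    server rsl155 partition /vicepa RW Site
    server rsl156 partition /vicepa RO Site
    server rsl59 partition /vicepa RO Site
EOF
```

On a live client the same filter would be used as
"vos listvl root.afs | ro_sites". For the listing above it prints rsl156
and rsl59, so rsl155 currently has no RO site.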
> >
> >rxdebug <hostname> 7001 will give you some info about activity on the
> >AFS
> >client's callback port
>
> On a working client:
Nothing conclusive here, but nothing unexpected either. rsl55 has talked to
a fileserver (apparently itself) and still has an open connection.
> rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
> Trying 167.156.154.55 (port 7001):
> Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
> serial 128760, natMTU 1444, flags pktCksum, security index 2,
> client conn
> rxkad: level clear, flags pktCksum
> Received 271944 bytes in 2518 packets
> Sent 180088696 bytes in 128706 packets
> call 0: # 2518, state dally, mode: receiving, flags: receive_done
> call 1: # 0, state not initialized
> call 2: # 0, state not initialized
> call 3: # 0, state not initialized
> Done.
While rsl56 either hasn't talked to a fileserver recently or at all -- in any case there's no connection.
????? Someone correct me if I'm wrong ... IIRC the connections to 7001 from
a given fileserver will time out after a period of non use???
> On a broken client:
>
> rsl56:/# rxdebug rsl56 7001
> Trying 167.156.154.56 (port 7001):
> Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> Done.
>
> > >df to see if /afs still appears in the output
>
> /dev/logsarc 9469952 6327480 34% 244 1% /logs/archive
> AFS
> df: /afs: No such file or directory
>
Expected from broken client.
Is there anything (like a reboot) that happened .... oh, I see, it looks like you're doing the Sunday default fileserver restarts ... judging from the dates ...
rsl57:/usr/afs/local# ps -ef | grep afs
> > > root 17314 40486 0 11 Apr - 0:00 /usr/afs/bin/fileserver
> > > root 17686 40486 0 11 Apr - 0:00 /usr/afs/bin/volserver
> > > root 20134 1 0 08 May - 17:24 /usr/vice/etc/afsd
> > > -stat 2800
> > > -dcache 2400 -daemons 5 -volumes 128
> > > root 20384 1 0 08 May - 17:23 /usr/vice/etc/afsd
> > > -stat 2800
How about sending the output from bos status <fileserver> -long ... for each
fileserver -- or at least telling me when the last restart times were for
each.
$ bos status -server rsl155 -long
Bosserver reports inappropriate access on server directories
Instance fs, (type is fs) currently running normally.
    Auxiliary status is: file server running.
    Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
    Command 1 is '/usr/afs/bin/fileserver'
    Command 2 is '/usr/afs/bin/volserver'
    Command 3 is '/usr/afs/bin/salvager'

Instance kaserver, (type is simple) currently running normally.
    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
    Command 1 is '/usr/afs/bin/kaserver'

Instance buserver, (type is simple) currently running normally.
    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
    Command 1 is '/usr/afs/bin/buserver'

Instance ptserver, (type is simple) currently running normally.
    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
    Command 1 is '/usr/afs/bin/ptserver'
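
For comparing restart times across several fileservers, the
"Process last started at" lines can be pulled out of each report. A rough
sketch, assuming the usual layout of "bos status -long" with one item per
line; last_starts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "bos status <server> -long" output on stdin and
# print "<instance>: <last start time>" for every instance in the report.
last_starts() {
    awk '
        # Remember the current instance name, dropping the trailing comma.
        $1 == "Instance" { name = $2; sub(/,$/, "", name) }
        /Process last started at/ {
            # Find the word "at"; the five fields after it are the timestamp.
            for (i = 1; i <= NF; i++) if ($i == "at") break
            print name ": " $(i+1), $(i+2), $(i+3), $(i+4), $(i+5)
        }'
}

# Example with two instances from the rsl155 report in this thread.
last_starts <<'EOF'
Instance fs, (type is fs) currently running normally.
    Auxiliary status is: file server running.
    Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
Instance kaserver, (type is simple) currently running normally.
    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
EOF
```

On a live cell this would be fed from "bos status -server <fileserver> -long"
for each fileserver in turn, which makes it easy to spot servers that all
restarted at the same minute.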
All of these are possible but it's unlikely 6 clients would have been restarted at the same time.
Is it possible that all your fileservers restarted at the same time, or that
the two fileservers with root.afs.readonly restarted at the same time, or
were unavailable at the same time?
If so, what were the broken clients doing during the restart? Might be worth checking uptime on the broken clients to see if they were restarted while the fileservers were restarting.
????? Do any of the fs commands (fs checkv or fs checks, e.g.) work on the client? I don't know if those are blocked when /afs won't mount or not. Anyone???
These ones worked:
# fs checkv
All volumeID/name mappings checked.
# fs checks
All servers are running.
How about fs setcache? (Gotta be root, and I'd try this on a broken client
you don't care about hurting ... but it might be worth resetting the cache
size to 1, waiting for it to purge, then resetting to 0 -- which restores
the original size. fs checkv tells the client to refetch info from the VLDB
instead of trusting its cache, but is irrelevant if the unmounted /afs
breaks all the fs commands -- not sure because afsd is running even tho /afs
is broken)
# fs setcache 1
New cache size set.
# fs setcache 0
New cache size set.
# fs checkv
All volumeID/name mappings checked.
# cd /afs
ksh: /afs: not found.
Possibility ... I'm thinking that the broken clients may have tried to mount
/afs (root.afs.readonly) while the fileservers were restarting. Couldn't
find any root.afs.readonly so failed to mount /afs. If so, argues for not
restarting all the FS at the same time, and for having root.afs.readonly on
each of your X (how many do you have) fileservers.
Looks like afsd has been up for almost a year? While AFS fileserver procs were restarted 11 Apr? Clients were good on Apr 10 ...
I'm not sure but it looks as if it was OK until the 11th April.
While we're at it, has anything changed with your AFS DB servers? Are the CellServDB files correct on clients (/usr/vice/etc) and servers (/usr/afs/etc)? What does "bos listhosts" report from each of the fileservers? "fs listc" from the clients?
# bos listhosts -server rsl155
Cell name is uk.dd.com
Host 1 is rsl155
Host 2 is rsl156

Although if I run this on rsl155 it gives:
$ bos listhosts -server rsl155
bos: can't open cell database (/usr/vice/etc)
even though /usr/vice/etc/CellServDB is present.

# fs listc
Cell uk.dd.com on hosts rsl155.dd.com.

# cat /usr/vice/etc/CellServDB
uk.dd.com #Cell name
161.2.249.91 #rsl155.dd.com
Kim
> > > Hi,
> > >
> > > I'm having problems getting in to the /afs directory on an AIX
> > > box and I'm
> > > not sure how to fix it:
> > >
> > > # cd /afs
> > > ksh: /afs: not found.
> > >
> > > This has been running fine until now though. The processes are still
> > > running:
> > >
> > > rsl57:/usr/afs/local# ps -ef | grep afs
> > > root 17314 40486 0 11 Apr - 0:00 /usr/afs/bin/fileserver
> > > root 17686 40486 0 11 Apr - 0:00 /usr/afs/bin/volserver
> > > root 20134 1 0 08 May - 17:24 /usr/vice/etc/afsd
> > > -stat 2800
> > > -dcache 2400 -daemons 5 -volumes 128
> > > root 20384 1 0 08 May - 17:23 /usr/vice/etc/afsd
> > > -stat 2800
> > > -dcache 2400 -daemons 5 -volumes 128
> > > ..
> > > ..
> > > root 40486 1 0 11 Apr - 0:00 /usr/afs/bin/bosserver
> > >
> > > And the logs look normal. I can ping the AFS server also.
> > >
> > > Is there anything else I can try?
> > >
> > > Thanks for any help.
> > >
> > > JS.
_______________________________________________ OpenAFS-info mailing list [EMAIL PROTECTED] https://lists.openafs.org/mailman/listinfo/openafs-info
