Today we had a weird failure that ended up affecting many of our campus services.
A fileserver that holds nothing but user volumes became unresponsive.
This fileserver is physically in another building, on a different subnet.
Our web servers, which mount /afs/msu/web-volumes, went south: an ls of /afs/ came back empty. The volumes that the web servers mount are located on totally different fileservers. Rebooting the remote fileserver restored normal operation. In fact, as soon as the server had been hard reset, the cell
came back to normal, even before the server was back online.

I think this is related to a problem we had about 6 months ago.
In an effort to provide disaster recovery, we put a root.cell.readonly and a root.afs.readonly on this remote fileserver. This proved troublesome because of network issues, so we moved root.afs.readonly and root.cell.readonly back onto a server within our own building. This was done over 6 months ago.
So, after this long story, here's my question:
Can I query the local cache and find out where the client thinks root.cell.readonly is? My theory is that the clients (mostly Solaris) think the root.* volumes are still on this remote fileserver, and when that server gets wedged the clients hang. Why these clients can't find the real volumes is beyond me. vos exam root.cell tells me that these volumes are not on the affected fileserver and that they are where I expect them to be.
Any thoughts on why this is happening?
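
For reference, one way to compare the client's view against the VLDB is to run fs whereis on the client and look for the suspect server in the result. The Python sketch below is only an illustration, not a tested tool: it assumes fs whereis prints lines of the form "File PATH is on host(s) NAME ...", and the helper names (parse_whereis, suspect_paths, client_view) and the example.edu hostnames are hypothetical.

```python
import re
import subprocess

# Assumed line format of `fs whereis` output:
#   File /afs is on hosts fs1.example.edu fs2.example.edu
#   File /afs/msu is on host fs1.example.edu
WHEREIS_RE = re.compile(r"^File (.+?) is on hosts? (.*)$")


def parse_whereis(output):
    """Map each path to the list of fileservers the client reports for it."""
    locations = {}
    for line in output.splitlines():
        match = WHEREIS_RE.match(line.strip())
        if match:
            locations[match.group(1)] = match.group(2).split()
    return locations


def suspect_paths(locations, bad_server):
    """Return the paths whose reported server list still includes bad_server."""
    return sorted(path for path, hosts in locations.items() if bad_server in hosts)


def client_view(paths):
    """Run `fs whereis` on this client for the given paths (requires an AFS client)."""
    result = subprocess.run(
        ["fs", "whereis", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_whereis(result.stdout)
```

On a client, something like suspect_paths(client_view(["/afs", "/afs/msu"]), "remote.example.edu") would list any of those paths that the client still associates with the remote server. If the client's answer disagrees with what vos exam reports, fs checkvolumes asks the cache manager to refresh its volume information, which may be worth trying before a reboot.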

--
Steve Devine
Storage Systems
Academic Computing & Network Services
Michigan State University

506 Computer Center
East Lansing, MI 48824-1042
1-517-432-7327

Baseball is ninety percent mental; the other half is physical.
- Yogi Berra

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
