> Ah, I had thought the whereis results updated when servers were down.
> (i.e. preference dropped when the server was down) A bit of background -
> I'm trying to track down a problem whereby a number of my server-clients
> seem to not be recovering from server failure, even though they seem to
> be hanging on access to read-only replicated volumes.

The preference remains the same; a separate bit in the afs_server struct 
indicates up/down state.  This way, once the server returns to service, the 
preferences are still as the user had requested.
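
To make that concrete, here is a rough sketch in Python (not the actual cache manager code; ServerEntry, rank, is_down and pick_server are names made up for illustration). It assumes the usual AFS convention that a lower rank is more preferred, and shows that marking a server down only changes which server is picked, never the stored rank.

```python
from dataclasses import dataclass


@dataclass
class ServerEntry:
    # Hypothetical stand-in for the per-server record; not the real struct.
    name: str
    rank: int              # user-set preference; lower is preferred
    is_down: bool = False  # separate up/down bit, flipped by probe results


def pick_server(servers):
    """Return the best-ranked server that is currently up, or None."""
    up = [s for s in servers if not s.is_down]
    return min(up, key=lambda s: s.rank) if up else None
```

With this layout, flipping is_down back to False after a successful probe is enough for the preferred server to be chosen again, since its rank was never touched.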

(Interestingly, this is one of the things that MS-Dfs got badly wrong.  When the 
Microsoft client fails over to a replica location, the original server is never used 
again (unless all the other replicas fail in turn).  So while you can implement some 
sort of preference scheme with MS-Dfs, eventually you will wind up stuck using the 
server which is in Tokyo.  It seems evident that nobody at Microsoft put more than a 
day's effort into the set of hacks they call Dfs.)

Nathan, the cache manager tries to make periodic GetTime RPC calls to "down" servers.  
You can watch them easily with tcpdump, and when you do, you'll see that the call to a 
truly down server is repeatedly retransmitted and never receives a response.  But the 
call to an "up" server gets a response in the form of an RX error packet, where the 
error code is actually the time.  This seems kind of weird, and I have never heard 
anyone say whether it was intentional or not, but it works fine.

So one of four things could be happening:
1. The GetTime calls to the previously-down-but-now-up server are not returning at all.
2. The GetTime calls are returning a negative error code, which the cache manager 
   interprets as "call timed out".
3. The GetTime calls are returning properly, but the cache manager is failing to mark 
   the server as "up".
4. The cache manager is failing to make GetTime calls at all.
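
As a rough aid for narrowing this down, the four cases can be written as a small decision function. This is only a sketch in Python: diagnose and its parameters are hypothetical names, and the inputs are things you would have to establish yourself (from tcpdump and from the client's view of the server), not values any AFS tool reports in this form.

```python
def diagnose(probe_sent, reply_seen, reply_code, marked_up):
    """Map observations to one of the four cases above (0 = recovered).

    probe_sent: a GetTime call to the server was seen on the wire
    reply_seen: some response to that call came back
    reply_code: the code carried in the response, or None if unknown
    marked_up:  the client now treats the server as up
    """
    if not probe_sent:
        return 4  # no GetTime calls are being made at all
    if not reply_seen:
        return 1  # calls go out but never return
    if reply_code is not None and reply_code < 0:
        return 2  # negative code, read as "call timed out"
    if not marked_up:
        return 3  # reply looks fine but the server stays "down"
    return 0      # probe answered and server marked up
```

For example, diagnose(True, True, -1, False) returns 2, while diagnose(True, True, 1000000000, False) returns 3.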





_______________________________________________
OpenAFS-devel mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo.cgi/openafs-devel
