If you have tcpdump data for cache manager <-> vlserver and cache manager <-> fileserver traffic during one of these corruptions, that could be very helpful. I've found tcpdump (or wireshark/tshark) to be useful in tracking down issues like this because you can very quickly see whether the problem is:
1- cache manager asking for the wrong thing to start with (possibly cache corruption; this is not conclusive, because you have to determine whether the cache manager got the bad data and cached it, or whether the cache manager itself 'broke' the data. Picking one client, clearing its cache, and then retrying can help answer that question). Note that in your case this is pretty unlikely, given that you saw it across multiple clients on multiple OSes.
2- vlserver giving a wrong answer
3- neither of the above, which means the fileserver is giving a wrong answer.

The usual suspects (e.g., cmdebug) are also helpful here. If you are in case 3 above, it might also be useful to get the callback state from the fileservers to see what they think the cache managers have for data.

Given that 'failed volume moves' seem to have been a trigger for this, the logfiles might have something interesting, especially if you can provide volume names and volume IDs for the 'X-volumes'.

-- 
Steven Jenkins
End Point Corporation
http://www.endpoint.com/
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
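
For anyone wanting to set up the capture described above, here is a rough sketch in Python of how the tcpdump invocation and the three-way triage could be expressed. It assumes the standard AFS UDP ports (7000 fileserver, 7001 cache manager callback, 7003 vlserver). The function names, the interface name `eth0`, and the output filename are hypothetical, and the two booleans passed to `triage` stand for judgments you make yourself after reading the trace; nothing here decodes Rx packets for you.

```python
import shlex

# Standard AFS-3 UDP ports (assumed to be unchanged at your site).
AFS_PORTS = {"fileserver": 7000, "callback": 7001, "vlserver": 7003}


def build_tcpdump_command(interface, outfile,
                          services=("fileserver", "vlserver", "callback")):
    """Return an argv list that captures full packets for the given AFS services.

    `interface` and `outfile` are placeholders chosen by the caller.
    """
    ports = [AFS_PORTS[name] for name in services]
    capture_filter = " or ".join(f"udp port {port}" for port in ports)
    # -s 0 captures whole packets; -w writes raw packets for wireshark/tshark.
    return ["tcpdump", "-i", interface, "-s", "0", "-w", outfile, capture_filter]


def triage(client_request_ok, vlserver_reply_ok):
    """Map what you observed in the trace onto the three cases in the list above."""
    if not client_request_ok:
        return "case 1: cache manager asked for the wrong thing"
    if not vlserver_reply_ok:
        return "case 2: vlserver gave a wrong answer"
    return "case 3: fileserver gave a wrong answer"


if __name__ == "__main__":
    # Hypothetical interface and filename; run the printed command as root on a client.
    print(shlex.join(build_tcpdump_command("eth0", "afs-trace.pcap")))
    print(triage(client_request_ok=True, vlserver_reply_ok=True))
```

Capturing on a single client whose cache has just been cleared keeps the trace small and makes case 1 easier to rule in or out.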
