I wrote:
] Since the start of the semester, OpenAFS seems to occasionally hang
] for a few seconds (5? 10?) when trying to do things like write files.
] I finally had it happen while running a script that was doing fs calls,
] and got the message:
] fs:'path-to-directory-in-afs': server not responding promptly

Other people have asked whether this happens on reads. I haven't
noticed it on reads. It seems to happen only on writes (and mkdir etc.),
and only on the first write after a period of inactivity. After waiting
10 seconds for one write, later writes are fast. This seems consistent
with the server waiting for callbacks to break.

Someone else suggested I use rxdebug. I ran rxdebug twice while an AFS
write was hanging (the results were printed long before the hung write
finally went through). Both times there was no indication of
insufficient threads. Here's one example:

% rxdebug jeremiah.cs.uwm.edu
Trying 129.89.143.70 (port 7000):
Free packets: 370, packet reclaims: 331, calls: 409011, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
10 threads are idle
Connection ... (lots of connections to report).

] My original guess is that the server is hanging while waiting to break callbacks
] from clients that are behind firewalls and not responding. But even
] running 'fs checks' from all possible clients that are accessing the
] volume doesn't seem to work; at least it still takes a few seconds more.
] But this sort of behavior presumably would drive everyone mad and would
] have been fixed before 1.4.12, so now I'm at a loss.

Someone pointed out that one cannot be sure that all possible clients
have been contacted. That's true. But all the evidence points back to
this being a known problem: the server hangs on writes to a volume
while it waits to break callbacks. The resulting behavior is very
annoying and has bad effects (if the hang is long enough, there are I/O
errors and applications start to fail).

Are we really the only place with a significant number of AFS clients
behind poorly behaved NAT routers? That seems hard to believe. We are a
tiny cell with only two file servers and about 100 users at most (but
90% of them are behind NATs).

On the other hand, the fact that I keep discovering new ways in which
Windows 7 doesn't support AFS may mean that we have a relatively large
group of naive AFS users, with the added effect that they aren't
configuring their NAT routers to support AFS, hence the callback
problems mentioned.

Best regards,
John
_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info
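
The thread check done by eye on the rxdebug output in this message could be scripted along the following lines. This is only a minimal sketch: it assumes the summary lines look exactly like the sample quoted in this message, and the names parse_rxdebug_summary and threads_exhausted are hypothetical helpers, not part of OpenAFS. It works on text that has already been captured from rxdebug and does not run the command itself.

```python
import re

# Sample summary copied from the rxdebug output quoted in this message.
SAMPLE = """Trying 129.89.143.70 (port 7000):
Free packets: 370, packet reclaims: 331, calls: 409011, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
10 threads are idle
"""


def parse_rxdebug_summary(text):
    """Pull a few counters out of captured rxdebug text.

    Hypothetical helper: the patterns assume the line wording shown in
    SAMPLE. A counter that is not found is reported as None.
    """
    patterns = {
        "free_packets": r"Free packets:\s*(\d+)",
        "calls_waiting": r"(\d+)\s+calls waiting for a thread",
        "idle_threads": r"(\d+)\s+threads are idle",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        summary[key] = int(match.group(1)) if match else None
    return summary


def threads_exhausted(summary):
    """True only if calls are queued for a thread and no thread is idle."""
    waiting = summary.get("calls_waiting")
    idle = summary.get("idle_threads")
    return bool(waiting) and idle == 0


if __name__ == "__main__":
    summary = parse_rxdebug_summary(SAMPLE)
    print(summary)
    print("thread starvation:", threads_exhausted(summary))
```

With the sample above, the check reports no thread starvation, which matches the reading given in the message: the hang does not look like a shortage of server threads.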
