Hi all,

1.6.6pre1 and 1.6.6pre2 contain an extra feature in the OpenAFS fileserver that could possibly help with communicating with clients behind NATs (Network Address Translation). It's not completely certain how much this feature helps, though, so it will be removed from the 1.6.6 release unless we get some more information about it.

If you are running a fileserver that you believe may have some trouble talking to clients behind NATs, testing this feature would be very helpful. This is most relevant for any site whose fileservers talk to NAT'ed clients that are old enough to lack the client-side NAT improvements (pre-1.6); this is most common at sites where users access AFS from home and don't know much about AFS. You can test this new feature by just running a fileserver with 1.6.6pre* and seeing if anything improves; there is no additional configuration to do.

But how do you know if this is a problem for you at all? Usually the most user-visible symptom is that access to AFS hangs while a client is trying to write to AFS, but a lot of different things can cause that. To know if that is being caused _specifically_ by problems reaching clients behind NATs, you can check the fileserver's FileLog. In there, if you see a lot of log messages about errors trying to contact specific IPs and port numbers, you may be suffering from this. In particular, it's somewhat likely to be related to NATs if you see a lot of such error messages referring to non-7001 ports. And it's especially likely if you see a lot of connection errors for non-7001 ports that are obviously incrementing over time. (For example, you see an error for port 8005, then 8006, then 8007, etc., all from the same IP.)

It can also help to know if the IPs you see logged in FileLog are behind NATs in the first place. If you have no way of knowing that, you can sort-of detect which hosts may be behind NATs by sending the fileserver the SIGXCPU signal and looking at the resulting /usr/afs/local/hosts.dump file. If you see an entry for a host with a public IP like "ip:203.0.113.40", and later on in that entry you see a list of IPs that includes private IPs, like "[ 203.0.113.40:7001 192.168.1.5:7001]", that host may be behind a NAT.
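
The hosts.dump check described above can be sketched in a few lines of Python. This is only a rough illustration, not a tool that ships with OpenAFS: the function names are made up for this example, and it assumes you have already pulled the primary IP and the bracketed address list out of a hosts.dump entry by hand, since the exact layout of the rest of the entry is not covered here. It tests membership in the three RFC 1918 ranges explicitly, because Python's `is_private` attribute also counts documentation ranges such as 203.0.113.0/24 as private.

```python
import ipaddress
import re

# The three RFC 1918 private ranges (10/8, 172.16/12, 192.168/16).
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Matches "a.b.c.d:port" pairs such as "192.168.1.5:7001".
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def is_rfc1918(ip):
    """Return True if the IPv4 address string is in an RFC 1918 range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def looks_natted(primary_ip, addr_list):
    """Hypothetical helper: True if a non-private primary IP is listed
    together with at least one private address.

    primary_ip -- the address after "ip:" in the entry
    addr_list  -- the bracketed list, e.g. "[ 203.0.113.40:7001 192.168.1.5:7001]"
    """
    listed = [ip for ip, _port in ADDR_RE.findall(addr_list)]
    return not is_rfc1918(primary_ip) and any(is_rfc1918(ip) for ip in listed)


if __name__ == "__main__":
    entry_list = "[ 203.0.113.40:7001 192.168.1.5:7001]"
    print(looks_natted("203.0.113.40", entry_list))
```

A True result only means the host is worth a closer look; a client can be behind a NAT without showing a private address at all.
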
"Detecting" a client behind a NAT in this way is far from perfect, but it's just another thing to check. Common private IP ranges are of course 192.168/16, 172.16/12, and 10/8. A client can obviously be behind a NAT without an IP in any of those ranges, but those are commonly used by consumer-grade home routers and the like.

Anyway, if you ever look into why an OpenAFS fileserver appears to be slow or hanging, and the above information suggests that client NATs are an issue, it would be very helpful if you tried looking into some possible fixes. If you cannot deploy 1.6.6pre* on a server experiencing this issue, we can also provide patches specifically for this issue based on a previous stable version, if that's more feasible. There are also additional possible patches in this area that are not in 1.6.6pre*, if you want to try other approaches.

Or even if you can't actually deploy any testing code, I'd still like to hear from you if you think you are experiencing issues in this area. More information is always appreciated. Remember that if we don't hear anything, this feature will be pulled out of 1.6.6.

For developers: obviously I'm skipping over the details of what any of this actually does. The 'extra feature' is gerrit 9420, which will be reverted via gerrit 10135. See also: gerrits 10144-10147.

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
