On Tue, 04 Nov 2014 16:05:24 +0100 Christian <[email protected]> wrote:

>> On some of our Windows clients (Win7 Enterprise x64, OpenAFS 1.7.31), we
>> are seeing issues where, if I try to access a volume on a given server,
>> it gives me "RPC service unavailable". This only happens for one of our
>> two file and db servers, which are almost identical (the first one
>> was in fact cloned from the second one). The servers run OpenAFS
>> 1.6.9-1~bpo7 from wheezy-backports on Debian wheezy. While that is
>> happening, "fs checkservers" reports that particular server as being
>> down.

> Does syslog report the server coming back up later, if you don't try to
> access anything?

Sometimes. I can sometimes also "fix" it by completely uninstalling the
AFS client and reinstalling it.

>> udebug <server> 7003 works, though, and I can ping that server or
>> ssh to it just fine. Should I post trace logs and udebug output for
>> people to look at, or what is the appropriate way to debug this? Thanks
>> a lot,

> It's much more likely that you're failing to contact the fileserver
> (port 7000), not the vlserver (port 7003). You can check basic
> connectivity for that with 'rxdebug <server> 7000 -version'.
>
> But that will probably just succeed and won't tell you anything. What
> would really tell you what's happening is if you could capture AFS
> traffic (udp port 7000) close to the client, and close to the server (at
> least, 'before' and 'after' the openvpn link). If Jeff's suggestion is
> what is happening, you'll see packets that appear to be sent on the
> server side, but will not appear on the client side. Specifically, you'd
> see packets over a certain size not appear on the client side.
>
> You can either look at the dump yourself in Wireshark or something, or
> provide it for one of us to look at. But you don't really need to know
> anything about AFS to do the above analysis; just see if larger packets
> appear in one dump but not the other.
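The before/after comparison described above can be sketched in a few lines of Python. This is a minimal, hypothetical helper, not an OpenAFS tool. It assumes each capture was saved as tcpdump/windump text output with one packet per line, that the packet length tcpdump prints in parentheses is the last thing on each line (as in the dumps in this thread), and that there is no NAT between the two capture points, so addresses and ports look the same on both sides. It ignores timestamps (the two machines' clocks differ) and reports which (source, destination, size) combinations the server-side capture saw more often than the client-side one.

```python
import re
import sys
from collections import Counter

# Hypothetical helper, not part of OpenAFS. Expects lines such as
#   22:00:42.166283 IP 1.2.3.4.55607 > 5.6.7.8.7000: rx data ... (32)
# and treats the trailing "(N)" as the packet length tcpdump printed.
LINE_RE = re.compile(r"^\S+ IP (\S+) > (\S+): .*\((\d+)\)$")


def packet_sizes(lines):
    """Count packets per (source, destination, size) in a text dump."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip headers, blank lines and non-Rx output
            src, dst, size = match.group(1), match.group(2), int(match.group(3))
            counts[(src, dst, size)] += 1
    return counts


def missing_on_client(server_lines, client_lines):
    """Packets seen in the server capture but not (as often) in the client capture.

    Packets are matched only by addresses, ports and size; timestamps are
    ignored. Assumes no NAT between the capture points.
    """
    # Counter subtraction keeps only the positive differences.
    missing = packet_sizes(server_lines) - packet_sizes(client_lines)
    # Smallest sizes first, so a size threshold is easy to spot.
    return sorted(missing.items(), key=lambda item: item[0][2])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_dumps.py server-dump.txt client-dump.txt
    with open(sys.argv[1]) as server_dump, open(sys.argv[2]) as client_dump:
        for (src, dst, size), count in missing_on_client(server_dump, client_dump):
            print(f"{count} packet(s) of {size} bytes {src} > {dst} not seen on client")
```

The script name is hypothetical. An empty result means every packet the server capture recorded also showed up on the client side; if only packets above some size are reported, that is the pattern Jeff's suggestion predicts, namely something on the path dropping larger packets.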
>
> If you determine that what Jeff mentioned is what's happening, and you
> can't fix or alter the thing that's dropping packets, you might be able
> to change a setting in the Windows client to reduce the max size of
> packets that we use (RxMaxMTU). Or change the MTU on the local
> interface; I don't recall the specifics of changing this on
> Windows.

OK, so udebug 130.75.103.223 7000 fails on that machine. But it also
fails for our other server, which I can access via the AFS client just
fine. So I did this:
(on the file server, 130.75.103.223)

tcpdump -n host 130.75.103.223 and host 130.75.102.221 and udp
22:00:42.166283 IP 130.75.102.221.55607 > 130.75.103.223.7000: rx data fs call op#10006 (32)
22:00:42.166401 IP 130.75.103.223.7000 > 130.75.102.221.55607: rx abort (32)
22:00:42.169060 IP 130.75.102.221.55607 > 130.75.103.223.7000: rx data fs call op#10004 (32)
22:00:42.169157 IP 130.75.103.223.7000 > 130.75.102.221.55607: rx abort (32)

(on the client, 130.75.102.221)

windump.exe -n -i blah host 130.75.103.223 and host 130.75.102.221 and udp
22:00:42.166283 IP 130.75.102.221.55607 > 130.75.103.223.7000: rx data fs call op#10006 (32)
22:00:42.166401 IP 130.75.103.223.7000 > 130.75.102.221.55607: rx abort (32)
22:00:42.169060 IP 130.75.102.221.55607 > 130.75.103.223.7000: rx data fs call op#10004 (32)
22:00:42.169157 IP 130.75.103.223.7000 > 130.75.102.221.55607: rx abort (32)

This is the result of udebug 130.75.103.223 7000 on the client, which
fails with "return code -2 from VOTE_debug".

Bizarre. I cannot see much of a difference...

Thanks for looking into this,
Christian
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
