Hi! I am experiencing frequent hangs and annoying delays with recent openafs-clients. The delays appear both when reading or writing files. Sometimes transfer contunies after some delay (about 8 to 10 seconds), sometimes afs reports "Lost contact to fileserver".
I've seen those hangs with 1.3.84, 1.3.85, 1.4.0rc1 and rc5 clients on Linux (Kernel 2.6). 1.3.80 and 1.3.82 work fine, so I expect that some change between 1.3.82 and 1.3.84 causes the problems. The fileserver is from transarc: # rxdebug c-hoernchen -version Trying 137.208.3.48 (port 7000): AFS version: Base configuration afs3.4 5.77 To track down the problem, I've captured the network traffic between client and server while creating 10 files with 100k each. The dumps are available in pcap format (suitable for viewing with ethereal) and as tcpdump output at http://leo.kloburg.at/tmp/openafs/ Maybe the files for 1.3.82 (ok) and 1.3.84 (hangs) will be relevant. The command that produced the files was: perl -le 'for $i ((1..20)) { print "$i"; local *F; \ open(F, ">f_$i") or die "open: $!"; print F "x"x10240; }' The delays look like that: 17:25:33.967776 IP 137.208.3.48.afs3-fileserver > 137.208.89.100.afs3-callback: rx ack first 1 serial 3151 reason ack requested acked 16777216 (62) 17:25:41.242157 IP 137.208.89.100.afs3-callback > 137.208.3.48.afs3-fileserver: rx data (1472) 17:25:41.305587 IP 137.208.3.48.afs3-fileserver > 137.208.89.100.afs3-callback: rx ack first 1 serial 3162 reason ack requested acked 16777216 (62) 17:25:51.319114 IP 137.208.3.48.afs3-fileserver > 137.208.89.100.afs3-callback: rx ack first 2 serial 3162 reason ping (61) 17:25:51.319224 IP 137.208.89.100.afs3-callback > 137.208.3.48.afs3-fileserver: rx ack first 1 serial 1761 reason ping response (65) 17:25:51.377321 IP 137.208.3.48.afs3-fileserver > 137.208.89.100.afs3-callback: rx ack first 1 serial 3168 reason ack requested acked 16777216 (62) 17:26:01.375107 IP 137.208.3.48.afs3-fileserver > 137.208.89.100.afs3-callback: rx ack first 2 serial 3168 reason ping (61) 17:26:01.375215 IP 137.208.89.100.afs3-callback > 137.208.3.48.afs3-fileserver: rx ack first 1 serial 1765 reason ping response (65) I've also noticed that in versions 1.3.80 and 1.3.82 (those that do not show the delays) each store-data UDP-packet is 5700 bytes and is splitted in four UDP fragments. However, this is also true for 1.3.84, which already shows the problems. In 1.4.0rcX, the store-data packets seem to be smaller, the UDP packet is only 2896 bytes and comes in two fragments. Is there any specific reason why all those packets are larger than the MTU? Any help would be greatly appreciated! Cheers, --leo -- ----------------------------------------------------------------------- [EMAIL PROTECTED] Fax: +43-1-31336-906050 Zentrum fuer Informatikdienste - Wirtschaftsuniversitaet Wien - Austria _______________________________________________ OpenAFS-devel mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-devel
