I've stumbled into a weird performance problem with the NFS client. I have CompuLab's Utilite Pro (evbarm) machine running a very current -current. It's connected to my Ubuntu laptop over Ethernet (direct cable link) and uses it as an NFS server for, e.g., the pkgsrc tree and distfiles. The performance is really bad: a make extract of a package may take literal ages (I did a lot of the initial investigation for this mail while the python distfile was being extracted). Extracting the 31M uncompressed bash distfile may take, from run to run:
    real    2m21.110s
    user    0m0.635s
    sys     0m4.233s

or

    real    4m52.010s
    user    0m0.769s
    sys     0m4.815s

or whatever.

Looking at the traffic with wireshark I can see a curious recurring pattern: the time/sequence plot immediately shows short bursts of activity separated by huge gaps of no traffic. Here's one such instance, abridged and wrapped for clarity. Timestamps are shown as deltas to the previous frame. The window scale is 3 for the client (utilite) and 7 for the server (majava), i.e. the server has plenty of window open.

> 413 00.000351 IP utilite.1011 > majava.nfs: Flags [.], seq 177121:178569,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448: NFS request xid 1992059772 1444 write fh ...
      5406 (5406) bytes @ 16384
> 414 00.000048 IP utilite.1011 > majava.nfs: Flags [.], seq 178569:180017,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448
  415 00.000009 IP majava.nfs > utilite.1011: Flags [.], ack 180017,
      win 1834, options [nop,nop,TS val 1941833656 ecr 111], length 0
> 416 00.000051 IP utilite.1011 > majava.nfs: Flags [.], seq 180017:181465,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448
  417 00.043745 IP majava.nfs > utilite.1011: Flags [.], ack 181465,
      win 1834, options [nop,nop,TS val 1941833700 ecr 111], length 0
> 418 00.994813 IP utilite.1011 > majava.nfs: Flags [P.], seq 181465:182645,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1180
  419 00.000032 IP majava.nfs > utilite.1011: Flags [.], ack 182645,
      win 1834, options [nop,nop,TS val 1941834694 ecr 111], length 0
! 420 00.000007 IP utilite.1011 > majava.nfs: Flags [P.], seq 181465:182645,
      ack 79601, win 3225, options [nop,nop,TS val 113 ecr 1941833700],
      length 1180
  421 00.000009 IP majava.nfs > utilite.1011: Flags [.], ack 182645,
      win 1834, options [nop,nop,TS val 1941834694 ecr 113,nop,nop,
      sack 1 {181465:182645}], length 0

Here frames 413, 414, 416 and 418 comprise a single NFS write request. All but the last segment are sent very fast. Then note that the last segment (418) is sent only after a 1 second delay, and then immediately resent (420, marked as "spurious retransmission" by wireshark).

This pattern repeats through the trace: from time to time a large write has its last segment delayed by 1 second, then there's an ACK from the server, and then that last segment is immediately "spuriously" resent.

Does this ring any bells, either from the TCP point of view or about what NFS might be doing here that could trigger it? Just copying a large file to the server seems to be OK; its time/sequence plot is nice and linear.

-uwe
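
P.S. To spell out the "plenty of window" claim with actual numbers: assuming the usual RFC 7323 interpretation (advertised window = the raw "win" field shifted left by the scale factor negotiated at SYN time), the values from the frames above work out as in this little throwaway C snippet (variable names are mine, just for illustration):

    #include <stdio.h>

    int main(void)
    {
        /* scale factors 3 (utilite) and 7 (majava), as quoted above */
        unsigned client_win = 3225u << 3;  /* utilite advertises ~25 KiB  */
        unsigned server_win = 1834u << 7;  /* majava advertises ~229 KiB  */

        printf("utilite window: %u bytes\n", client_win);
        printf("majava  window: %u bytes\n", server_win);
        return 0;
    }

which prints 25800 and 234752 bytes respectively, so the client is nowhere near filling the server's window when it stalls.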