I've stumbled into a weird performance problem with the NFS client. I have CompuLab's Utilite Pro (evbarm) machine running a very current -current. It's connected to my Ubuntu laptop over Ethernet (direct cable link) and uses it as an NFS server for, e.g., the pkgsrc tree and distfiles. The performance is really bad: a make extract of a package may take literal ages (I did a lot of the initial investigation for this mail while the python distfile was being extracted). Extracting the 31M uncompressed bash distfile may take, from run to run:
    real    2m21.110s
    user    0m0.635s
    sys     0m4.233s

or

    real    4m52.010s
    user    0m0.769s
    sys     0m4.815s

or whatever.

Looking at the traffic with wireshark I can see a curious recurring pattern: the time/sequence plot immediately shows short bursts of activity separated by huge gaps of no traffic. Here's one such instance, abridged and wrapped for clarity. Timestamps are shown as deltas to the previous frame. The window scale is 3 for the client (utilite) and 7 for the server (majava), i.e. the server has plenty of window open.

> 413 00.000351 IP utilite.1011 > majava.nfs: Flags [.], seq 177121:178569,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448: NFS request xid 1992059772 1444 write fh ...
      5406 (5406) bytes @ 16384
> 414 00.000048 IP utilite.1011 > majava.nfs: Flags [.], seq 178569:180017,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448
  415 00.000009 IP majava.nfs > utilite.1011: Flags [.], ack 180017,
      win 1834, options [nop,nop,TS val 1941833656 ecr 111], length 0
> 416 00.000051 IP utilite.1011 > majava.nfs: Flags [.], seq 180017:181465,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1448
  417 00.043745 IP majava.nfs > utilite.1011: Flags [.], ack 181465,
      win 1834, options [nop,nop,TS val 1941833700 ecr 111], length 0
> 418 00.994813 IP utilite.1011 > majava.nfs: Flags [P.], seq 181465:182645,
      ack 79601, win 3225, options [nop,nop,TS val 111 ecr 1941833655],
      length 1180
  419 00.000032 IP majava.nfs > utilite.1011: Flags [.], ack 182645,
      win 1834, options [nop,nop,TS val 1941834694 ecr 111], length 0
! 420 00.000007 IP utilite.1011 > majava.nfs: Flags [P.], seq 181465:182645,
      ack 79601, win 3225, options [nop,nop,TS val 113 ecr 1941833700],
      length 1180
  421 00.000009 IP majava.nfs > utilite.1011: Flags [.], ack 182645,
      win 1834, options [nop,nop,TS val 1941834694 ecr 113,nop,nop,
      sack 1 {181465:182645}], length 0

Here frames 413, 414, 416 and 418 comprise a single NFS write request. All but the last segment are sent very fast. Then note that the last segment (418) is sent only after a 1 second delay, and then immediately resent (420, marked as "spurious retransmission" by wireshark).

This pattern repeats through the trace: from time to time a large write has its last segment delayed by 1 second, then there's an ACK from the server, and then that last segment is immediately "spuriously" resent.

Does this ring any bells, either from the TCP point of view or about what NFS might be doing here that could trigger it? Just copying a large file to the server seems to be OK; its time/sequence plot is nice and linear.

-uwe
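
P.S. To spell out the "plenty of window" claim with actual numbers: assuming the usual RFC 7323 interpretation (advertised window = the raw "win" field shifted left by the scale factor negotiated at SYN time), the values from the frames above work out as in this little throwaway C snippet (variable names are mine, just for illustration):

    #include <stdio.h>

    int main(void)
    {
        /* scale factors 3 (utilite) and 7 (majava), as quoted above */
        unsigned client_win = 3225u << 3;  /* utilite advertises ~25 KiB  */
        unsigned server_win = 1834u << 7;  /* majava advertises ~229 KiB  */

        printf("utilite window: %u bytes\n", client_win);
        printf("majava  window: %u bytes\n", server_win);
        return 0;
    }

which prints 25800 and 234752 bytes respectively, so the client is nowhere near filling the server's window when it stalls.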