On Wed, Mar 30, 2011 at 03:07:13PM +0200, Walter Haidinger wrote:
> On 29.03.2011 22:42, Claudio Jeker wrote:
> > Here is a possible fix. The problem was that because of the way NFS uses
> > the socket API it did not turn off the sendbuffer scaling which reset the
> > size of the socket back to 17376 bytes which is a no go when a buffer of
> > more than 17k is generated by NFS. It is better to initialize the sb_wat
> > in soreserve() which is called by NFS and all attach functions.
> >
> > Please test and report back.
>
> Thanks for the patch. Glad to test it.
>
> Well, the good news: No more lockups, neither in a VM nor on real hardware.
>
> Everything is also pretty fine with larger buffers, but with small buffers
> (e.g. -o tcp,-r=512,-w=512), the system doesn't respond sometimes, like
> short freezes of a couple of seconds as if there is a pause while some
> buffers are emptied.

Buffers below 8k are stupid. For TCP just use 32k or even 64k. 512-byte
buffers are silly. They get internally rounded up since the smallest packet
seems to be 512 bytes of data plus header. This will give you TCP send and
recv buffers of around 1200 bytes. No wonder it is slow as hell.

> This is also visible in the "runtime", i.e. time stats to put a 16 MiB file
> (dd if=/dev/urandom of=/nfs/foo bs=4096 count=4096), first column is the
> buffer size used for the nfs mount:
>
>   512:    1m9.07s real     0m0.01s user     0m1.13s system
>  1024:   0m20.13s real     0m0.00s user     0m1.23s system
>  2048:    0m5.59s real     0m0.00s user     0m1.13s system
>  4096:    0m2.07s real     0m0.00s user     0m0.86s system
>  8192:    0m1.41s real     0m0.00s user     0m0.91s system
> 16384:    0m1.19s real     0m0.00s user     0m0.82s system
> 32768:    0m1.11s real     0m0.00s user     0m0.76s system
>
> Writing a 64 MiB file:
>
>   512:    6m2.83s real     0m0.03s user     0m5.78s system
>  1024:   2m58.62s real     0m0.03s user     0m5.45s system
>  2048:   1m12.66s real     0m0.07s user     0m4.66s system
>  4096:   0m27.60s real     0m0.05s user     0m4.47s system
>  8192:   0m11.68s real     0m0.01s user     0m3.85s system
> 16384:    0m6.50s real     0m0.00s user     0m3.64s system
> 32768:    0m6.15s real     0m0.00s user     0m3.22s system
>
> ktrace dumps for all dd runs are available, I can put
> them somewhere for download if required.
>

The default block size is 8k; for smaller buffers the overhead of headers
and round-trip time is just too big.

-- 
:wq Claudio
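
PS: A rough back-of-the-envelope check of the 16 MiB numbers quoted above,
as a small Python sketch. It assumes every write request carries at most
one mount buffer (the -w value) worth of data, which is a simplification
of what the NFS client really does. The timings are copied from the quoted
table; the function names write_requests and summarize are made up for
illustration.

```python
FILE_SIZE = 16 * 1024 * 1024  # 16 MiB, as in the quoted dd run

# Write buffer size in bytes -> "real" time in seconds (quoted 16 MiB table).
REAL_SECONDS = {
    512: 69.07,
    1024: 20.13,
    2048: 5.59,
    4096: 2.07,
    8192: 1.41,
    16384: 1.19,
    32768: 1.11,
}


def write_requests(file_size, wsize):
    """Requests needed if each one carries at most wsize bytes (assumption)."""
    return -(-file_size // wsize)  # ceiling division


def summarize(file_size=FILE_SIZE, timings=REAL_SECONDS):
    """Return (wsize, request count, average ms per request) rows."""
    rows = []
    for wsize, seconds in sorted(timings.items()):
        n = write_requests(file_size, wsize)
        rows.append((wsize, n, seconds / n * 1000.0))
    return rows


if __name__ == "__main__":
    for wsize, n, ms in summarize():
        print(f"{wsize:>6}: {n:>6} requests, {ms:.3f} ms per request")
```

Under that assumption the 512-byte mount needs 32768 requests at roughly
2.1 ms each, while the 32k mount needs only 512 requests, so the fixed
per-request cost is what dominates the small-buffer runs.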