Re: Very slow NFS writes

2013-04-23 Thread Tomas Bodzar
On Mon, Apr 22, 2013 at 2:46 PM, Mattieu Baptiste mattie...@gmail.com wrote:

 Hi,

 I'm currently trying to access files on a NAS running FreeNAS (8.3.1) from
 my OpenBSD -current/amd64 workstation. On my workstation, the filesystem
 is a read/write NFS-mounted share. Its size is about 5.2TB.
 Reading seems normal, at about 45MB/s, but writing is a lot slower: it
 fluctuates between 10MB/s and 20MB/s before eventually stalling (under
 1MB/s). Note that at the start of the write, my box is totally
 unresponsive. When the writes fall below 1MB/s, the box becomes responsive
 again.

 PF is disabled on my box, and both sides have em(4) interfaces
 (autonegotiated at 1000baseT).

 With CIFS shares, the NAS achieves much higher throughput: above 50MB/s
 for writes.

 I suspect a problem with the OpenBSD NFS client, since I saw similar
 reports in the archives. Moreover, it seems strange that my box becomes
 unresponsive when writing at only 20MB/s.

 Any clues?

 I'm sorry I don't have more hard numbers, apart from the dmesg of my box.
 The NAS isn't accessible to me all the time. I can provide more details in
 the future.



You can start on the client side as well and provide some numbers from there:

nfsstat -c
systat (check several of its views)
vmstat
netstat -m
top
...
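
As a rough sketch of how those numbers could be gathered in one go, the
Python snippet below runs the non-interactive commands from the list and
joins their output into a plain-text report. The function names collect and
format_report are made up for this illustration, and systat and top are left
out because they are interactive by default.

```python
import shutil
import subprocess

# Non-interactive commands taken from the list above. systat and top are
# interactive by default, so this sketch leaves them out.
COMMANDS = [
    ["nfsstat", "-c"],
    ["vmstat"],
    ["netstat", "-m"],
]


def collect(commands=COMMANDS, timeout=30):
    """Run each command and return {command line: output or error note}."""
    results = {}
    for argv in commands:
        label = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[label] = "(command not found)"
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            results[label] = proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            results[label] = "(timed out)"
        except OSError as exc:
            results[label] = "(failed to run: %s)" % exc
    return results


def format_report(results):
    """Join the outputs into one plain-text report suitable for a mail."""
    parts = []
    for label, output in results.items():
        parts.append("$ " + label)
        parts.append(output.rstrip())
        parts.append("")
    return "\n".join(parts)
```

Calling print(format_report(collect())) once before the copy and once while
the writes are stalled would give two snapshots to compare.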






Very slow NFS writes

2013-04-22 Thread Mattieu Baptiste
Hi,

I'm currently trying to access files on a NAS running FreeNAS (8.3.1) from
my OpenBSD -current/amd64 workstation. On my workstation, the filesystem
is a read/write NFS-mounted share. Its size is about 5.2TB.
Reading seems normal, at about 45MB/s, but writing is a lot slower: it
fluctuates between 10MB/s and 20MB/s before eventually stalling (under
1MB/s). Note that at the start of the write, my box is totally
unresponsive. When the writes fall below 1MB/s, the box becomes responsive
again.

PF is disabled on my box, and both sides have em(4) interfaces
(autonegotiated at 1000baseT).

With CIFS shares, the NAS achieves much higher throughput: above 50MB/s
for writes.

I suspect a problem with the OpenBSD NFS client, since I saw similar
reports in the archives. Moreover, it seems strange that my box becomes
unresponsive when writing at only 20MB/s.

Any clues?

I'm sorry I don't have more hard numbers, apart from the dmesg of my box.
The NAS isn't accessible to me all the time. I can provide more details in
the future.
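
One possible way to turn the "fluctuates, then stalls" observation into
numbers is to log the write rate per time window. The Python sketch below is
only an illustration: measure_writes is a made-up helper, and the target path
passed to it would have to be a file on the NFS mount. Random data is used so
that server-side compression (if any is enabled on the NAS) does not flatter
the result.

```python
import os
import time


def measure_writes(path, total_bytes, chunk_size=1 << 20, interval=1.0):
    """Write total_bytes to path.

    Returns (samples, average) where samples is a list of
    (elapsed_seconds, MB_per_second) tuples, one per time window, and
    average is the overall rate in MB/s including the final fsync.
    """
    chunk = os.urandom(chunk_size)  # one random block, reused for every write
    samples = []
    written = 0
    window_bytes = 0
    start = window_start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
            window_bytes += n
            now = time.monotonic()
            dt = now - window_start
            if dt >= interval and dt > 0:
                samples.append((now - start, window_bytes / dt / 1e6))
                window_bytes = 0
                window_start = now
        f.flush()
        os.fsync(f.fileno())  # wait until the data has been pushed out
    elapsed = max(time.monotonic() - start, 1e-9)
    return samples, written / elapsed / 1e6
```

For example, measure_writes("/mnt/nas/testfile", 2 * 10**9) would write about
2GB to a hypothetical mount point and return one sample per second, which
should make the point where the rate collapses visible.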


OpenBSD 5.3-current (GENERIC.MP) #12: Mon Apr 15 15:18:44 CEST 2013
matt...@kronenbourg.brimbelle.org:/usr/src/sys/arch/amd64/compile/
GENERIC.MP
real mem = 8571518976 (8174MB)
avail mem = 8335634432 (7949MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf0710 (68 entries)
bios0: vendor American Megatrends Inc. version 2003 date 12/14/2010
bios0: ASUSTeK Computer INC. P7P55D
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET DMAR ASPT OSFR
acpi0: wakeup devices P0P4(S4) BR1E(S4) UAR1(S4) PS2K(S4) PS2M(S4) EUSB(S4)
USB0(S4) USB1(S4) USB2(S4) USB3(S4) USBE(S4) USB4(S4) USB5(S4) USB6(S4)
BR21(S4) BR22(S4) BR23(S4) P0P1(S4) P0P3(S4) P0P5(S4) P0P6(S4) USB8(S4)
BR20(S4) BR24(S4) BR25(S4) BR26(S4) BR27(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5 CPU 660 @ 3.33GHz, 3374.33 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,LONG,LAHF,PERF,ITSC
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: apic clock running at 160MHz
cpu1 at mainbus0: apid 4 (application processor)
cpu1: Intel(R) Core(TM) i5 CPU 660 @ 3.33GHz, 3373.90 MHz
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,LONG,LAHF,PERF,ITSC
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 2, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i5 CPU 660 @ 3.33GHz, 3373.90 MHz
cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,LONG,LAHF,PERF,ITSC
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 5 (application processor)
cpu3: Intel(R) Core(TM) i5 CPU 660 @ 3.33GHz, 3373.90 MHz
cpu3:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,LONG,LAHF,PERF,ITSC
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 2, package 0
ioapic0 at mainbus0: apid 6 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 6
acpimcfg0 at acpi0 addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 7 (BR1E)
acpiprt2 at acpi0: bus -1 (BR21)
acpiprt3 at acpi0: bus -1 (BR22)
acpiprt4 at acpi0: bus -1 (BR23)
acpiprt5 at acpi0: bus 1 (P0P1)
acpiprt6 at acpi0: bus -1 (P0P3)
acpiprt7 at acpi0: bus -1 (P0P5)
acpiprt8 at acpi0: bus -1 (P0P6)
acpiprt9 at acpi0: bus 6 (BR20)
acpiprt10 at acpi0: bus 5 (BR24)
acpiprt11 at acpi0: bus 4 (BR25)
acpiprt12 at acpi0: bus 3 (BR26)
acpiprt13 at acpi0: bus 2 (BR27)
acpiec0 at acpi0
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
aibs0 at acpi0: GGRP GITM SITM
acpibtn0 at acpi0: PWRB
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 Intel Core Host rev 0x12
ppb0 at pci0 dev 1 function 0 Intel Core PCIE rev 0x12: msi
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 ATI Radeon HD 4670 rev 0x00
radeondrm0 at vga1: apic 6 int 16
drm0 at radeondrm0
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
azalia0 at pci1 dev 0 function 1 ATI Radeon HD 4000 HD Audio rev 0x00: msi
azalia0: no supported codecs
ehci0 at pci0 dev 26 function 0 Intel 3400 USB rev 0x06: apic 6 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 Intel EHCI root hub 

Re: Very slow NFS writes

2013-04-22 Thread mxb
Have you tried using jumbo frames (MTU 9000) on both client and server
(if that is possible in your environment)?

//mxb
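
To get a feel for what jumbo frames can and cannot change, here is a small
back-of-the-envelope sketch in Python. It assumes IPv4 and TCP without
options and full-size frames; the thread does not say which transport the
mount uses, so treat the figures as illustrative only. The helper names are
made up for this example.

```python
# Per-frame overhead on Ethernet, in bytes.
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 and TCP headers without options (assumed)


def tcp_efficiency(mtu):
    """Fraction of the wire time that carries TCP payload for full-size frames."""
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu, payload_rate_bytes):
    """Full-size frames needed per second to carry the given payload rate."""
    return payload_rate_bytes / (mtu - IP_TCP_HEADERS)


for mtu in (1500, 9000):
    eff = tcp_efficiency(mtu)
    fps = frames_per_second(mtu, 20e6)  # 20MB/s, the write rate from the report
    print(f"MTU {mtu}: efficiency {eff:.1%}, about {fps:.0f} frames/s at 20MB/s")
```

Under these assumptions the gain in wire efficiency is only a few percent;
the larger difference is in the number of frames each side has to handle per
second.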


Re: Very slow NFS writes

2013-04-22 Thread Stuart Henderson
On 2013-04-22, Mattieu Baptiste mattie...@gmail.com wrote:
 Hi,

 I'm currently trying to access files on a NAS running FreeNAS (8.3.1) from
 my OpenBSD -current/amd64 workstation. On my workstation, the filesystem
 is a read/write NFS-mounted share. Its size is about 5.2TB.
 Reading seems normal, at about 45MB/s, but writing is a lot slower: it
 fluctuates between 10MB/s and 20MB/s before eventually stalling (under
 1MB/s). Note that at the start of the write, my box is totally
 unresponsive. When the writes fall below 1MB/s, the box becomes responsive
 again.

I had a lot of problems with NFS writes dragging the client to a halt
with NFSv3 on *some* systems, which were greatly improved by switching to
NFSv2. On the other hand, other machines were perfectly OK with NFSv3...

NFSv2 has other problems, not least a big write amplification effect
when NFS and disk block sizes don't match (at least with OpenBSD as a
server). It also limits files to 2GB, which makes it unusable in some
situations, but it might be worth a try to see if the problem remains.
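
Before trying NFSv2, it may help to check whether any of the files to be
copied would run into the 2GB limit. The Python sketch below walks a directory
tree and lists regular files above a threshold. The limit is taken here as
2^31 - 1 bytes, which is an assumption about where exactly the boundary
falls, and files_over_limit is a made-up helper name.

```python
import os
import stat

# Assumed value for the NFSv2 file size limit mentioned above (2GB).
NFSV2_MAX_FILE_SIZE = 2**31 - 1


def files_over_limit(root, limit=NFSV2_MAX_FILE_SIZE):
    """Yield (path, size) for regular files under root larger than limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > limit:
                yield path, st.st_size
```

If list(files_over_limit("/path/to/source")) comes back empty for the data in
question (the path is a placeholder), the size limit should not get in the
way of an NFSv2 test.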