While doing more tests with my setup, I've noticed quite large
differences between requested and effective throughput, with both
hfsc and cbq. The problems start when the requested bandwidth
is above 10 - 12 Mb and is also *limited* (so either upperlimit
is set in hfsc, or there's no borrow keyword in cbq). The tests
were made with 26Mb limits, and later with 40Mb (and some values
in between, but the effects were similar - results are at the
bottom, before the dmesg).

Simplified queue setup (the box normally functions as a 3-interface
router):

if_100  = fxp0
srv_100 = 192.168.100.1
tport   = 1234

#all 3 at their respective defaults for 100Mb

bw100  = 100Mb
tbr100 = 12000
que100 = 50

altq on $if_100 hfsc bandwidth $bw100 tbrsize $tbr100 \
  qlimit $que100 queue { if100_misc, if100_test, if100_ack }

queue if100_misc on $if_100  bandwidth  5% priority 1 \
  qlimit $que100 hfsc(realtime  2% linkshare  5% upperlimit 100% default)
queue if100_ack  on $if_100  bandwidth  5% priority 7 \
  qlimit $que100 hfsc(realtime  2% linkshare  5% upperlimit 100%)

#"test" queue checked with many variations of realtime and linkshare
#none of them mattered, as long as upperlimit was there

queue if100_test on $if_100  bandwidth 26% priority 3 \
  qlimit $que100 hfsc(realtime 24% linkshare 26% upperlimit  26%)

#alternative cbq version:
#queue if100_test on $if_100  bandwidth 26% priority 3 \
#  qlimit $que100 cbq

#and the rule

pass in on $if_100 inet proto tcp from any to $srv_100 port { \
                80, $tport} \
        flags S/SAFR keep state queue (if100_test, if100_ack)
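
To put the tbrsize values used in this post into perspective, here is
a rough sketch of how large a back-to-back burst each bucket size
allows on the 100Mb link. It assumes full-size 1500-byte frames and
that a full bucket drains at line rate; FRAME_BYTES and burst_ms are
names made up for this illustration, and none of this is taken from
the altq code itself.

```python
# Rough burst arithmetic for the token bucket sizes tried in this post.
# Assumptions (not from the altq source): 1500-byte frames, and a full
# bucket is sent back to back at the 100Mb line rate.

LINK_BPS = 100e6      # bw100 = 100Mb
FRAME_BYTES = 1500    # assumed full-size frame


def burst_ms(tbrsize_bytes, link_bps=LINK_BPS):
    """Time in milliseconds to send one full bucket at line rate."""
    return tbrsize_bytes * 8 / link_bps * 1000


for tbrsize in (12000, 5000, 4000, 3000, 1500):
    frames = tbrsize / FRAME_BYTES
    print(f"tbrsize {tbrsize:5d}: ~{frames:.1f} frames, "
          f"{burst_ms(tbrsize):.2f} ms per burst")
```

Under these assumptions the default 12000-byte bucket corresponds to
8 full-size frames per burst, while 1500 allows a single frame.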


For actual testing, either 2 NCs were used:

'nc -l $tport </dev/zero' on tested box
and 'nc $srv_100 $tport >/dev/null' on another (linux, with disabled ipt)

or

httpd - standard one from obsd 3.9
and 'wget -O /dev/null http://$srv_100/zeros'

zeros was a 0.5GB file prepared earlier with dd

Anyway, as long as OBSD was on its default settings (including all
net.inet.tcp ones), the observed throughput (pfctl -vvsq, pftop,
etc.) was around 17Mb instead of 26Mb. As already mentioned, even
setting realtime to 26Mb didn't change it a bit.

However, the following methods helped:

 - saturating the link with another transfer. I'm not really sure
   how to interpret this. As long as another pair of NCs / ftp / http
   used some (larger) amount of bandwidth, the problematic transfer
   automagically achieved upperlimit speed (26Mb in the above case)

 - decreasing tbrsize also helped (to 3000 or less), but
   at the cost of increased interrupt load, as expected

 - increasing net.inet.tcp.sendspace from the default 16k to 24k or
   more (actually, the bigger the upperlimit was, the more I had to
   push the window) also fixed it. This seems quite unusual
   considering LAN conditions (= fast acks) and speeds that are not
   that high after all: 20 - 40 Mb
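
The sendspace observation can be sanity-checked with some simple
arithmetic. The sketch below assumes the transfer is purely
window-limited (throughput = window / RTT), which is only a guess at
what is going on; the function names are made up for the example. It
derives the effective round-trip time implied by 16k of sendspace at
the observed ~17Mb, and then the window needed to reach 26Mb and 40Mb
at that same delay.

```python
# Back-of-the-envelope check, assuming throughput = window / RTT.
# This is a guess at the mechanism, not a statement about pf/altq.


def implied_rtt_s(window_bytes, throughput_bps):
    """Effective RTT (seconds) implied by a window and a throughput."""
    return window_bytes * 8 / throughput_bps


def window_needed_bytes(target_bps, rtt_s):
    """Window (bytes) needed to sustain target_bps at a given RTT."""
    return target_bps * rtt_s / 8


default_window = 16 * 1024   # net.inet.tcp.sendspace default (16k)
observed_bps = 17e6          # ~17Mb seen with hfsc and tbrsize 12000

rtt = implied_rtt_s(default_window, observed_bps)
print(f"implied effective RTT: {rtt * 1000:.2f} ms")

for target_mbit in (26, 40):
    need = window_needed_bytes(target_mbit * 1e6, rtt)
    print(f"{target_mbit}Mb would need about {need / 1024:.1f}k of window")
```

With these numbers the implied delay is several milliseconds, far
more than a plain LAN round trip, and the window needed for 26Mb
comes out at roughly 24.5k, which is at least consistent with "24k
or more" above.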

OTOH, the following things made no difference whatsoever:

 - different network cards (xl, vr)
 - cpu speed (1000 and 1400)
 - adjusting realtime / linkshare values, as long as upperlimit was set
 - adjusting qlimit or priority
 - using only one queue instead of two (for low delay etc. stuff)
 - removing all the queue definitions except for the one used
 - other even less logical stuff I probably forgot

When I switched the scheduler from hfsc to cbq, the speed was
maintained at ~21.8Mb, and in this case changing tbrsize or sendspace
barely mattered at all. Still, saturating the link with some other
traffic had the same positive effect as in the hfsc case.

Increasing the limit to ~40Mb didn't improve much. Hfsc remained around
17.6Mb, cbq was a tiny bit better - 23.12Mb. Saturating the link was, as
usual, the ultimate cure.

At limits below 10Mb nothing like this happens. Both hfsc and cbq set
the speed very precisely.

Is this expected behaviour? As mentioned, it's not an issue for small
speeds, but for larger ones it more or less rules out strict limiting
of the bandwidth.

And the fact that saturating the link or messing with sendspace helps
seems a bit strange either way.

Here are the results mentioned at the beginning:

cbq, tbrsize 12k vs. 1500
(requested limit - achieved @ 12k / achieved @ 1500)

 8Mb    -  8 / 8
10Mb    - 10 / 10
12Mb    - 12 / 12
14Mb    - 14 / 14
16Mb    - 15.80 / 16
18Mb    - 15.84 / 18
20Mb    - 16    / 20
22Mb    - 17.05 / 21.90
24Mb    - 17.07 / 23.87
26Mb    - 17.18 / 25.73 (19.78 @ 5000) (22 @ 4000) (25.3 @ 3000)
28Mb    - 20.05 / 27.74
30Mb    - 22.93 / 29.65
32Mb    - 21.93 / 31.88 (drop ?)
..
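
To make the pattern in these numbers easier to see, here is a small
sketch that turns them into a percentage shortfall against the
requested limit. It reads the two columns as tbrsize 12000 and
tbrsize 1500, which is my interpretation of the "12k vs. 1500"
header; shortfall_pct is a made-up helper.

```python
# Shortfall of measured vs. requested bandwidth, from the table above.
# Columns: (requested Mb, measured @ tbrsize 12000, measured @ 1500).
# The column meaning is an interpretation of the "12k vs. 1500" header.

results = [
    (8, 8, 8), (10, 10, 10), (12, 12, 12), (14, 14, 14),
    (16, 15.80, 16), (18, 15.84, 18), (20, 16, 20),
    (22, 17.05, 21.90), (24, 17.07, 23.87), (26, 17.18, 25.73),
    (28, 20.05, 27.74), (30, 22.93, 29.65), (32, 21.93, 31.88),
]


def shortfall_pct(requested, measured):
    """How far below the requested rate the measured rate is, in %."""
    return 100.0 * (requested - measured) / requested


for req, at_12k, at_1500 in results:
    print(f"{req:2d}Mb  12k: {shortfall_pct(req, at_12k):5.1f}%  "
          f"1500: {shortfall_pct(req, at_1500):5.1f}%")

# Rows with the largest shortfall for each bucket size.
worst_12k = max(results, key=lambda r: shortfall_pct(r[0], r[1]))
worst_1500 = max(results, key=lambda r: shortfall_pct(r[0], r[2]))
print("worst @ 12k: ", worst_12k[0], "Mb")
print("worst @ 1500:", worst_1500[0], "Mb")
```

Read this way, the 12k column falls short by up to about a third of
the requested rate, while the 1500 column stays within about 1.2% of
it on every row.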


OBSD 3.9, generic (stable) kernel. dmesg follows (nothing
extraordinary there though - old msi k7t266pro2 motherboard, with
good old athlon and new seasonic psu):

cpu0: AMD Athlon(tm) Processor ("AuthenticAMD" 686-class, 256KB L2 cache) 1.06 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
cpu0: AMD Powernow: TS
real mem  = 536387584 (523816K)
avail mem = 455569408 (444892K)
using 4278 buffers containing 53846016 bytes (52584K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 06/18/02, BIOS32 rev. 0 @ 0xfdb00
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 30102 dobusy 0 doidle 1
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf7ca0/192 (10 entries)
pcibios0: PCI Interrupt Router at 000:17:0 ("VIA VT8233 ISA" rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0x8000
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "VIA VT8366 PCI" rev 0x00
ppb0 at pci0 dev 1 function 0 "VIA VT8366 AGP" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci0 dev 5 function 0 "S3 ViRGE DX/GX" rev 0x01
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-3 added (80x25, vt100 emulation)
xl0 at pci0 dev 6 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 10, address 00:10:5a:f3:5e:73
exphy0 at xl0 phy 24: 3Com internal media interface
fxp0 at pci0 dev 7 function 0 "Intel 8255x" rev 0x01, i82557: irq 12, address 00:a0:c9:24:3a:bd
nsphy0 at fxp0 phy 1: DP83840 10/100 PHY, rev. 0
xl1 at pci0 dev 8 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 5, address 00:50:da:7b:f1:7c
exphy1 at xl1 phy 24: 3Com internal media interface
viapm0 at pci0 dev 17 function 0 "VIA VT8233 ISA" rev 0x00
iic0 at viapm0
"unknown" at iic0 addr 0x18 not configured
lm1 at iic0 addr 0x2d: W83627HF
pciide0 at pci0 dev 17 function 1 "VIA VT82C571 IDE" rev 0x06: ATA100, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <ST3802110A>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
atapiscsi0 at pciide0 channel 0 drive 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <LG, CD-ROM CRD-8522B, 1.02> SCSI0 5/cdrom removable
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
cd0(pciide0:0:1): using PIO mode 4, DMA mode 2
wd1 at pciide0 channel 1 drive 0: <WDC WD400JB-00JJA0>
wd1: 16-sector PIO, LBA, 38166MB, 78165360 sectors
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 5
isa0 at mainbus0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lm0 at isa0 port 0x290/8: W83627HF
lm1 detached
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask ebcd netmask ffed ttymask ffef
pctr: user-level cycle counter enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matches BIOS drive 0x80
dkcsum: wd1 matches BIOS drive 0x81
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
