Re: ral rt2661 tx interrupt race fix

2012-10-04 Thread Stefan Sperling
On Sun, Sep 30, 2012 at 10:32:23PM +0100, Tom Murphy wrote:
 Stefan,
 
   Your patch works well on my system:
 
   ral0 at pci0 dev 14 function 0 Ralink RT2661 rev 0x00: irq 10, address 
   00:14:85:d5:39:bb
   ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)
 
   Only problem is downloading from the net is extremely slow. Benchmarks
   have it at 512 KB/sec as opposed to 10 megabits/s.
 
   This is (internet) - OpenBSD - ral0 - wifi client
 
   But it doesn't crash or bring up OACTIVE flag anymore which is
   fantastic.. however, it's a little to slow to use with any regularity.
 
   Uploads are fine (wifi - ral(4) - OpenBSD - out to the net). 

I've already replied to Tom in private requesting some additional
testing but I would still be interested in other reports about
transmission speed issues with ral.

I've also noticed that my ral-based AP can be ridiculously slow.
I believe this is a separate bug which is independent of the OACTIVE
lock-up problem.

Is anyone else out there seeing this, too?

IIRC dragonfly have some ral performance fixes in their git history
which haven't been ported back to OpenBSD yet.



Re: ral rt2661 tx interrupt race fix

2012-10-04 Thread Kenneth R Westerback
On Thu, Oct 04, 2012 at 07:21:50PM +0200, Stefan Sperling wrote:
 On Sun, Sep 30, 2012 at 10:32:23PM +0100, Tom Murphy wrote:
  Stefan,
  
Your patch works well on my system:
  
ral0 at pci0 dev 14 function 0 Ralink RT2661 rev 0x00: irq 10, address 
00:14:85:d5:39:bb
ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)
  
Only problem is downloading from the net is extremely slow. Benchmarks
have it at 512 KB/sec as opposed to 10 megabits/s.
  
This is (internet) - OpenBSD - ral0 - wifi client
  
But it doesn't crash or bring up OACTIVE flag anymore which is
fantastic.. however, it's a little to slow to use with any regularity.
  
Uploads are fine (wifi - ral(4) - OpenBSD - out to the net). 
 
 I've already replied to Tom in private requesting some additional
 testing but I would still be interested in other reports about
 transmission speed issues with ral.
 
 I've also noticed that my ral-based AP can be ridiculously slow.
 I believe this is a separate bug which is independent of the OACTIVE
 lock-up problem.
 
 Is anyone else out there seeing this, too?
 
 IIRC dragonfly have some ral performance fixes in their git history
 which haven't been ported back to OpenBSD yet.
 

I gave up on my ral ap a couple of years ago due to ridiculous slowness,
but the athn replacement was just as slow. Got a new motherboard
going into the firewall shortly and may be motivated to recheck the
wireless performance of -current then.

 Ken



Re: ral rt2661 tx interrupt race fix

2012-09-30 Thread Tom Murphy
Stefan,

  Your patch works well on my system:

  ral0 at pci0 dev 14 function 0 Ralink RT2661 rev 0x00: irq 10, address 
  00:14:85:d5:39:bb
  ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)

  Only problem is downloading from the net is extremely slow. Benchmarks
  have it at 512 KB/sec as opposed to 10 megabits/s.

  This is (internet) - OpenBSD - ral0 - wifi client

  But it doesn't crash or bring up OACTIVE flag anymore which is
  fantastic.. however, it's a little to slow to use with any regularity.

  Uploads are fine (wifi - ral(4) - OpenBSD - out to the net). 

  Tom



Re: ral rt2661 tx interrupt race fix

2012-08-16 Thread Tobias Ulmer
On Wed, Aug 15, 2012 at 10:17:09PM +0200, Stefan Sperling wrote:
 On Tue, Aug 14, 2012 at 08:31:31PM +0200, Tobias Ulmer wrote:
  Finally got around to dig out the card and put it in a Blade 1500. With
  -current, I get around 10Mb using tcpbench. With your diff,
  freelist corruption messages pop up rather quickly, speed is the same. I
  guess the panic at the end is just an effect of the modify after free.
 
 Thanks for catching this, Tobias!
 
 I could reproduce this problem on a Blade 100 with a Ralink RT2561S card.
 This updated diff should fix the problem. It works well for me in both
 hostap and STA modes.

Survived 9h of tcpbench testing here, no problems (except for a low
data rate today, but there was no difference to the stock kernel)



Re: ral rt2661 tx interrupt race fix

2012-08-14 Thread Tobias Ulmer
On Fri, Aug 03, 2012 at 01:31:18PM +0200, Stefan Sperling wrote:
 I haven't received any test reports so far, apart from my own
 testing and edd@'s testing. It's been working splendid for me
 so far but I would like to get more testing if possible.
 
 I've Bcc'd some people who were involved in testing earlier ral diffs
 or complained about ral not working for them. It would be great if
 some of you could try this diff on access points and clients. Cheers!

Finally got around to dig out the card and put it in a Blade 1500. With
-current, I get around 10Mb using tcpbench. With your diff,
freelist corruption messages pop up rather quickly, speed is the same. I
guess the panic at the end is just an effect of the modify after free.

Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2012 OpenBSD. All rights reserved.  http://www.OpenBSD.org

: Tue Aug 14 19:57:29 CEST 2012
tobi...@blade.tmux.org:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 2147483648 (2048MB)
avail mem = 2100854784 (2003MB)
mainbus0 at root: Sun Blade 1500 (Silver)
cpu0 at mainbus0: SUNW,UltraSPARC-IIIi (rev 3.4) @ 1503 MHz
cpu0: physical 32K instruction (32 b/l), 64K data (32 b/l), 1024K external (64 
b/l)
memory-controller at mainbus0 not configured
schizo0 at mainbus0: Tomatillo, version 4, ign 780, bus A 0 to 1
schizo0: dvma map c000-dfff
pci0 at schizo0
ebus0 at pci0 dev 7 function 0 Acer Labs M1533 ISA rev 0x00
flashprom at ebus0 addr 0-f not configured
rtc0 at ebus0 addr 70-71: m5823
pcfiic0 at ebus0 addr 320-321 ivec 0x2e
iic0 at pcfiic0
admtt0 at iic0 addr 0x2e
spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem1 at iic0 addr 0x51: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem2 at iic0 addr 0x52: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem3 at iic0 addr 0x53: 512MB DDR SDRAM registered ECC PC2300CL2.5
ics951601 at iic0 addr 0x69 not configured
power0 at ebus0 addr 800-82f ivec 0x20
com0 at ebus0 addr 3f8-3ff ivec 0x2c: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2c: ns16550a, 16 byte fifo
dma at ebus0 addr 0- not configured
alipm0 at pci0 dev 6 function 0 Acer Labs M7101 Power rev 0x00: 223KHz clock
iic1 at alipm0
scm001 at alipm0 addr 0x20 skipped due to alipm0 bugs
autri0 at pci0 dev 8 function 0 Acer Labs M5451 Audio rev 0x02: ivec 0x7a4
ac97: codec id 0x41445348 (Analog Devices AD1881A)
ac97: codec features headphone, Analog Devices Phat Stereo
audio0 at autri0
midi0 at autri0: 4DWAVE MIDI UART
ohci0 at pci0 dev 10 function 0 Acer Labs M5237 USB rev 0x03: ivec 0x7a7, 
version 1.0, legacy support
ohci1 at pci0 dev 11 function 0 Acer Labs M5237 USB rev 0x03: ivec 0x7a6, 
version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 Acer Labs M5229 UDMA IDE rev 0xc4: DMA, 
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x798 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: ST3120026A
wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
wd1 at pciide0 channel 0 drive 1: ST3250620A
wd1: 16-sector PIO, LBA48, 238474MB, 488395055 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
ral0 at pci0 dev 2 function 0 Ralink RT2561 rev 0x00: ivec 0x790, address 
00:80:5a:38:c4:0b
ral0: MAC/BBP RT2661B, RF RT2527
ppb0 at pci0 dev 3 function 0 TI PCI2250 PCI-PCI rev 0x02
pci1 at ppb0 bus 1
ohci2 at pci1 dev 8 function 0 NEC USB rev 0x43: ivec 0x78c, version 1.0
ohci3 at pci1 dev 8 function 1 NEC USB rev 0x43: ivec 0x78d, version 1.0
ehci0 at pci1 dev 8 function 2 NEC USB rev 0x04: ivec 0x78e
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 NEC EHCI root hub rev 2.00/1.00 addr 1
TI TSB43AB23 FireWire rev 0x00 at pci1 dev 11 function 0 not configured
usb1 at ohci2: USB revision 1.0
uhub1 at usb1 NEC OHCI root hub rev 1.00/1.00 addr 1
usb2 at ohci3: USB revision 1.0
uhub2 at usb2 NEC OHCI root hub rev 1.00/1.00 addr 1
usb3 at ohci0: USB revision 1.0
uhub3 at usb3 Acer Labs OHCI root hub rev 1.00/1.00 addr 1
usb4 at ohci1: USB revision 1.0
uhub4 at usb4 Acer Labs OHCI root hub rev 1.00/1.00 addr 1
ppm at mainbus0 not configured
schizo1 at mainbus0: Tomatillo, version 4, ign 7c0, bus B 0 to 0
schizo1: dvma map c000-dfff
pci2 at schizo1
bge0 at pci2 dev 2 function 0 Broadcom BCM5703 rev 0x00, BCM5702/5703 A2 
(0x1002): ivec 0x7dc, address 00:14:4f:23:72:a2
brgphy0 at bge0 phy 1: BCM5703 10/100/1000baseT PHY, rev. 2
ifb0 at pci2 dev 3 function 0 3D Labs Wildcat 5110 rev 0x01
ifb0: XVR-1200 (SUNW,375-3101), 1152x900
wsdisplay0 at ifb0 mux 1
wsdisplay0: screen 0 added (std, sun emulation)
3D Labs Wildcat 5110 rev 0x01 at pci2 dev 3 function 1 not configured
i2c at mainbus0 not configured
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
scsibus1 at softraid0: 256 targets
bootpath: 

ral rt2661 tx interrupt race fix

2012-07-18 Thread Stefan Sperling
ral(4) cards using the rt2661 driver code (RT2561, RT2561S, RT2661
variants) suffer a race in TX interrupt handling which can cause
TX processing to get stuck.

This problem was previously discussed here:
http://marc.info/?l=openbsd-miscm=125895269930106w=2
The patch proposed there was rejected by Damien for reasons unknown to me:
http://marc.info/?l=openbsd-techm=126451543507085w=2

Sepherosa Ziehau from dragonfly fixed this problem in a different way:
http://gitweb.dragonflybsd.org/dragonfly.git/commit/9dd87f8a6f730e54dfa4f21f3d9fae6a615b1908

My approach for a fix is based on Sephe's approach.

With 2661, TX interrupt processing is split between two interrupt handlers.
One that runs when the chip has offloaded the frame to the ASIC
(rt2661_tx_dma_intr) and one that runs when the ASIC reports back
about transmission success or failure (rt2661_tx_intr).

Currently, we free the mbuf for the transmitted frame during the
first interrupt (rt2661_tx_dma_intr), but do not release the STA
node reference yet and do not make space in the TX queue.

When the second interrupt handler (rt2661_tx_intr) runs, we obtain the
STA node corresponding to the transmitted frame, and update the STA node's
associated AMRR node, which contains counters used by the rate adaptation
code (Adaptive Multi Rate Retry algorithm).
Then space is made in the TX queue.

This approach assumes that rt2661_tx_intr and rt2661_tx_dma_intr will always
be triggered in the right amount and order. This assumption doesn't seem
to be safe. I've seen the driver end up in a situation where some entries
in the TX queue are not removed at all.

To fix this, we can keep track of which AMRR node needs to be updated when
the second interrupt handler gets to run. By not tying the AMRR node to the
STA node directly, we can release the STA node reference in the first
interrupt handler and make room in the TX queue there.

The hardware provides an 8-bit field for driver-specific data in the
TX descriptor, which can be written before TX and read back when TX
completes. This field is currently used to store the ID of the queue
a frame was transmitted from, and read back in rt2661_tx_intr to obtain,
via the queue, the STA node corresponding to the transmitted frame.

Instead of storing a queue ID, we can store an identifier for the AMRR
node that needs to be updated when the TX has completed (in case the
STA node has already been freed when the second interrupt occurs,
we free the corresponding AMRR node instead of updating it).

With the patch below I've haven't seen my ral AP lock up in 2 days,
whereas before I could trigger the problem simply by taking the laptop
to my kitchen, where signal to the AP is poor.

Can anyone confirm that this patch provides similar benefits elsewhere?
If your ral(4) device reports itself as Ralink RT2561, Ralink RT2561S,
or Ralink RT2661, please test and report back. Thanks!

Index: rt2661.c
===
RCS file: /cvs/src/sys/dev/ic/rt2661.c,v
retrieving revision 1.67
diff -u -p -r1.67 rt2661.c
--- rt2661.c17 Jul 2012 14:43:12 -  1.67
+++ rt2661.c18 Jul 2012 13:38:39 -
@@ -34,6 +34,7 @@
 #include sys/timeout.h
 #include sys/conf.h
 #include sys/device.h
+#include sys/queue.h
 
 #include machine/bus.h
 #include machine/endian.h
@@ -57,6 +58,7 @@
 #include net80211/ieee80211_var.h
 #include net80211/ieee80211_amrr.h
 #include net80211/ieee80211_radiotap.h
+#include net80211/ieee80211_node.h
 
 #include dev/ic/rt2661var.h
 #include dev/ic/rt2661reg.h
@@ -88,6 +90,8 @@ void  rt2661_reset_rx_ring(struct rt2661
 void   rt2661_free_rx_ring(struct rt2661_softc *,
struct rt2661_rx_ring *);
 struct ieee80211_node *rt2661_node_alloc(struct ieee80211com *);
+void   rt2661_node_free(struct ieee80211com *,
+   struct ieee80211_node *);
 intrt2661_media_change(struct ifnet *);
 void   rt2661_next_scan(void *);
 void   rt2661_iter_func(void *, struct ieee80211_node *);
@@ -115,7 +119,7 @@ uint16_trt2661_txtime(int, int, uint32_
 uint8_trt2661_plcp_signal(int);
 void   rt2661_setup_tx_desc(struct rt2661_softc *,
struct rt2661_tx_desc *, uint32_t, uint16_t, int, int,
-   const bus_dma_segment_t *, int, int);
+   const bus_dma_segment_t *, int, int, u_int8_t);
 intrt2661_tx_mgt(struct rt2661_softc *, struct mbuf *,
struct ieee80211_node *);
 intrt2661_tx_data(struct rt2661_softc *, struct mbuf *,
@@ -156,6 +160,13 @@ intrt2661_prepare_beacon(struct rt2661
 #endif
 void   rt2661_enable_tsf_sync(struct rt2661_softc *);
 intrt2661_get_rssi(struct rt2661_softc *, uint8_t);
+struct rt2661_amrr_node *rt2661_amrr_node_alloc(struct ieee80211com *,
+   struct rt2661_node *);
+void