Re: Instability in -current with ral/rt2860?

2009-02-17 Thread Dorian Büttner

bbee wrote:

On Sat, 7 Feb 2009, Dorian Büttner wrote:

bbee wrote:

 In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
 ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
 ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

Do you also have the mini pci card? Here is mine, which I got myself 
2 or 3 days ago:
ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
Also in a net5501.


No, this is a pci card, an Edimax EW-7728IN, it's listed in ral(4).

I can't confirm the 2-hour time frame or the need for a reboot at all.
I do lose the connection every now and then, but I'm about to get some
external antennas for the box and hope that will help. During initial
setup I had a system crash after which the box came up with a date 20h
in the future, which I haven't seen before, but that is more likely a
configuration problem on my side (pf or similar).


I wasn't having connection problems before, other than described in PR 
5958, where ral just stops transmitting. If that's also the issue 
you're having (which is different from the hangs I'm having with 
-current), then I don't think it's an antenna problem. I have 3 10dBi 
omni antennas on this ral.


Did you try -current? Could you please grab a snapshot, disable the 
watchdog and see if your box also hangs after a few hours, since we 
have pretty much identical hardware? It's 5 minutes of work using 
bsd.rd and sysmerge.


Can you make sure your power supply is OK and not running at its limit?
There have been issues with those originally shipped with the 5501.


I'm pretty sure the PSU is ok, it worked fine for 4.4 and it's an 
official Soekris one (well.. sort of.. I haggled it off of Wim, no 
idea where he got it :)


Thanks for your reply.

bbee

My net5501 showed that error again today. It no longer accepted any input
on the serial console, but I have two pictures of systat vmstat and top;
if anyone thinks they're useful I can send them off-list. Personally, I
haven't spotted anything unusual on those screens.

If there is anything else to test, please let me know.



Re: Instability in -current with ral/rt2860?

2009-02-08 Thread Andreas Vögele
bbee writes:

 Hi,

 In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
 ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
 ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

 I've been trying snapshots off and on since damien@ started tinkering
 with the rt2860 code two months ago. With any snapshot from the last 2
 months, I can't get the box to stay up for more than 2 hours (or less)
 without it rebooting.  [...]

No problems here.  I've got a net4801 with a SparkLAN WMIR-215GN Mini
PCI card, running the snapshot from 23rd December:

OpenBSD 4.4-current (GENERIC) #1637: Tue Dec 23 15:22:33 MST 2008
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
[...]
ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 11, address 00:0e:8e:xx:xx:xx
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

The net4801 has been up for 44 days.  It's an open access point.  WEP and WPA
aren't enabled.  Only 11g connections are accepted.  The interface is
configured with these settings:

inet 192.168.0.1 255.255.255.0 NONE media autoselect mode 11g mediaopt hostap nwid myexample chan 5

I've put another SparkLAN card into my laptop but I've connected to the
access point with Atheros and Intel cards as well.  Also, several
neighbours have used my access point in recent weeks.

Regards,
Andreas



Re: Instability in -current with ral/rt2860?

2009-02-08 Thread Stuart Henderson
On 2009-02-07, bbee bumble@xs4all.nl wrote:
 On Sat, 7 Feb 2009, Stuart Henderson wrote:

 enable ddb.console=1 and send it a BREAK, see if you can get some
 trace out of ddb.

 Thanks for the suggestion. I tried it, but the kernel's not responding to
 the break :(

Does BREAK work under normal circumstances (i.e. before crashing)?
It should drop you to ddb, from where you can type c to continue.
If the watchdog is enabled doing this will trigger a reboot if you
don't continue quickly enough.

 send dmesg :-)

 I'd rather not spam the list

it's much spammier to *not* include it, then be asked to send it,
then to say no.

 it's just an ordinary net5501, dmesg is easily googled.

that says nothing about the exact OS version you have installed.
or how the particular kernel you're running picks up the devices
on your particular hardware.

even making people stop and think, oh that's a net5501, hmm that
has a geode cpu so it _must_ be running some i386 kernel (even if
they already know and don't have to stop reading mail and go into
a web browser and look it up) wastes their time.

the point of including dmesg is to include relevant details in one
place, to save time for people who might be interested in looking
into the problem.

and in any event, google does not easily find me a dmesg from a
net5501 with an RT2860 running OpenBSD.

 I've been running recent snaps on an ALIX board with RT2860 with
 no trouble.

 That's.. unfortunate. I keep thinking that since some people don't even see
 the problems with traffic stalling in PR 5958, there might be something
 specific to the location of the AP, like load or some specific client that
 makes it go boom. Grasping at straws, here.

well, I have seen the problem from 5958 on one busy AP with a larger
range of clients, but never seen it on my home AP in a relatively
uncrowded area RF-wise with just a couple of OpenBSD clients...
but the problem one is quiet over the winter, so I can't tell if the
fixes from early December helped yet.

FWIW, here's how it looks in the alix2c3 (working).

OpenBSD 4.4-current (GENERIC) #1672: Fri Feb  6 14:11:28 MST 2009
t...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD 586-class) 499 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX
real mem  = 268009472 (255MB)
avail mem = 25088 (239MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/10/07, BIOS32 rev. 0 @ 0xfceb2
pcibios0 at bios0: rev 2.1 @ 0xf/0x1
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xe/0xa800
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x33
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 10, address 00:0d:b9:13:51:98
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr1 at pci0 dev 10 function 0 VIA VT6105M RhineIII rev 0x96: irq 11, address 00:0d:b9:13:51:99
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr2 at pci0 dev 11 function 0 VIA VT6105M RhineIII rev 0x96: irq 12, address 00:0d:b9:13:51:9a
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
ral0 at pci0 dev 12 function 0 Ralink RT2860 rev 0x00: irq 9, address 00:0e:8e:1d:f1:71
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
glxpcib0 at pci0 dev 15 function 0 AMD CS5536 ISA rev 0x03: rev 0, 32-bit 3579545Hz timer, watchdog, gpio
gpio0 at glxpcib0: 32 pins
pciide0 at pci0 dev 15 function 2 AMD CS5536 IDE rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: SanDisk SDCFJ-1024
wd0: 4-sector PIO, LBA, 977MB, 2001888 sectors
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 15 function 4 AMD CS5536 USB rev 0x02: irq 15, version 1.0, legacy support
ehci0 at pci0 dev 15 function 5 AMD CS5536 USB rev 0x02: irq 15
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
biomask e1ef netmask ffef ttymask 
mtrr: K6-family MTRR support (2 registers)
nvram: invalid checksum
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
clock: unknown CMOS layout



Re: Instability in -current with ral/rt2860?

2009-02-07 Thread Stuart Henderson
In gmane.os.openbsd.misc, you wrote:
 Hi,

 In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
 ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
 ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

 I've been trying snapshots off and on since damien@ started tinkering with 
 the rt2860 code two months ago. With any snapshot from the last 2 months, I 
 can't get the box to stay up for more than 2 hours (or less) without it 
 rebooting. If I turn off the watchdog timer, it will just hang without 
 printing any messages. If I ifconfig ral0 down, the box is rock stable.

enable ddb.console=1 and send it a BREAK, see if you can get some
trace out of ddb.

leave some sessions open, run things like top -s.1, systat vmstat .1
and see what the system's doing when it freezes.

send dmesg :-)

 Is anyone else seeing this with -current or a snapshot, with this ral or a 
 different one? I'd file a problem report but there's nothing to go on, 
 other than my suspicions that the changes to rt2860 in the last 2 months 
 are the cause.
 I can try to narrow it down to a specific commit if that will help?

I've been running recent snaps on an ALIX board with RT2860 with
no trouble.



Re: Instability in -current with ral/rt2860?

2009-02-07 Thread Dorian Büttner

bbee wrote:

Hi,

In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

Do you also have the mini pci card? Here is mine, which I got myself 2
or 3 days ago:
ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
Also in a net5501.

I've been trying snapshots off and on since damien@ started tinkering
with the rt2860 code two months ago. With any snapshot from the last 2
months, I can't get the box to stay up for more than 2 hours (or less)
without it rebooting. If I turn off the watchdog timer, it will just
hang without printing any messages. If I ifconfig ral0 down, the box
is rock stable.


Is anyone else seeing this with -current or a snapshot, with this ral 
or a different one? I'd file a problem report but there's nothing to 
go on, other than my suspicions that the changes to rt2860 in the last 
2 months are the cause.

I can try to narrow it down to a specific commit if that will help?

I can't confirm the 2-hour time frame or the need for a reboot at all.
I do lose the connection every now and then, but I'm about to get some
external antennas for the box and hope that will help. During initial
setup I had a system crash after which the box came up with a date 20h
in the future, which I haven't seen before, but that is more likely a
configuration problem on my side (pf or similar).

Can you make sure your power supply is OK and not running at its limit?
There have been issues with those originally shipped with the 5501.

If I switch to a 4.4 kernel, the hangs stop but the widely reported
ral traffic freezes are still there (PR 5958), which was what I was
hoping to fix.


Please CC,

bbee




Re: Instability in -current with ral/rt2860?

2009-02-07 Thread bbee

On Sat, 7 Feb 2009, Stuart Henderson wrote:

In gmane.os.openbsd.misc, you wrote:

In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

I've been trying snapshots off and on since damien@ started tinkering with
the rt2860 code two months ago. With any snapshot from the last 2 months, I
can't get the box to stay up for more than 2 hours (or less) without it
rebooting. If I turn off the watchdog timer, it will just hang without
printing any messages. If I ifconfig ral0 down, the box is rock stable.


enable ddb.console=1 and send it a BREAK, see if you can get some
trace out of ddb.


Thanks for the suggestion. I tried it, but the kernel's not responding to 
the break :(



leave some sessions open, run things like top -s.1, systat vmstat .1
and see what the system's doing when it freezes.


Right, the top is not showing anything out of the ordinary, the vmstat 
shows 7.1% interrupt load and nothing else on the processor at that time:


   7.1%Int   0.0%Sys   0.0%Usr   0.0%Nic  92.9%Idle

Interrupts
732 total
229 vr0
192 vr3
 82 ral0
pciide0
ohci0
com0
101 clock
128 rtc

Proc:r  d  s  w    Csw   Trp   Sys   Int   Sof  Flt
                   755     9   841   731   110  357

All seems fairly standard to me, some light load on lan/wlan.
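
As a quick cross-check of the listing above, the per-source interrupt rates can be summed and compared with the reported total. This is a minimal Python sketch with the values copied by hand from the systat output. The names `RATES`, `interrupt_shares` and `matches_reported_total` are illustrative and not part of any tool, and treating the sources printed without a number as zero is an assumption.

```python
# Per-source interrupt rates copied from the systat vmstat listing.
# Sources shown without a number (pciide0, ohci0, com0) are taken as 0,
# which is an assumption about how systat renders idle sources.
RATES = {
    "vr0": 229,
    "vr3": 192,
    "ral0": 82,
    "pciide0": 0,
    "ohci0": 0,
    "com0": 0,
    "clock": 101,
    "rtc": 128,
}
REPORTED_TOTAL = 732


def interrupt_shares(rates):
    """Return each source's share of the summed interrupt rate, in percent."""
    total = sum(rates.values())
    return {name: 100.0 * count / total for name, count in rates.items()}


def matches_reported_total(rates, reported):
    """True if the per-source rates add up to the reported total."""
    return sum(rates.values()) == reported
```

With these numbers the sources add up to the reported 732, and ral0 accounts for roughly 11% of the interrupts, well below either vr interface.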


send dmesg :-)


I'd rather not spam the list, it's just an ordinary net5501, dmesg is 
easily googled.



Is anyone else seeing this with -current or a snapshot, with this ral or a
different one? I'd file a problem report but there's nothing to go on,
other than my suspicions that the changes to rt2860 in the last 2 months
are the cause.
I can try to narrow it down to a specific commit if that will help?


I've been running recent snaps on an ALIX board with RT2860 with
no trouble.


That's.. unfortunate. I keep thinking that since some people don't even see 
the problems with traffic stalling in PR 5958, there might be something 
specific to the location of the AP, like load or some specific client that 
makes it go boom. Grasping at straws, here.


Thanks for the suggestion,

bbee



Re: Instability in -current with ral/rt2860?

2009-02-07 Thread bbee
On Sat, 7 Feb 2009, Dorian Büttner wrote:
 bbee wrote:
  In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
  ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
  ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

 Do you also have the mini pci card? Here is mine, which I got myself 2 or 3
 days ago:
 ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
 ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
 Also in a net5501.

No, this is a pci card, an Edimax EW-7728IN, it's listed in ral(4).

 I can't confirm the 2-hour time frame or the need for a reboot at all. I do
 lose the connection every now and then, but I'm about to get some external
 antennas for the box and hope that will help. During initial setup I had a
 system crash after which the box came up with a date 20h in the future,
 which I haven't seen before, but that is more likely a configuration
 problem on my side (pf or similar).

I wasn't having connection problems before, other than described in PR
5958, where ral just stops transmitting. If that's also the issue you're
having (which is different from the hangs I'm having with -current), then I
don't think it's an antenna problem. I have 3 10dBi omni antennas on this
ral.

Did you try -current? Could you please grab a snapshot, disable the
watchdog and see if your box also hangs after a few hours, since we have
pretty much identical hardware? It's 5 minutes of work using bsd.rd and
sysmerge.

 Can you make sure your power supply is OK and not running at its limit?
 There have been issues with those originally shipped with the 5501.

I'm pretty sure the PSU is ok, it worked fine for 4.4 and it's an official
Soekris one (well.. sort of.. I haggled it off of Wim, no idea where he got
it :)

Thanks for your reply.

bbee



Re: Instability in -current with ral/rt2860?

2009-02-07 Thread Lars Kotthoff
FYI, I'm having the same problems with

ral0 at pci0 dev 21 function 0 Ralink RT2860 rev 0x00: irq 11, address 00:00:00:00:00:00
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (2T3R)

I get both traffic freezes and instability with 4.3 and 4.4 kernels, although
the box is stable for a bit longer (a couple of days). I've posted about this
before, only I wasn't sure about the cause then. There's nothing that explains
the instability (no increased CPU or memory usage, nothing in any log, no
increased traffic).

Lars



Instability in -current with ral/rt2860?

2009-02-06 Thread bbee

Hi,

In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

I've been trying snapshots off and on since damien@ started tinkering with 
the rt2860 code two months ago. With any snapshot from the last 2 months, I 
can't get the box to stay up for more than 2 hours (or less) without it 
rebooting. If I turn off the watchdog timer, it will just hang without 
printing any messages. If I ifconfig ral0 down, the box is rock stable.


Is anyone else seeing this with -current or a snapshot, with this ral or a 
different one? I'd file a problem report but there's nothing to go on, 
other than my suspicions that the changes to rt2860 in the last 2 months 
are the cause.

I can try to narrow it down to a specific commit if that will help?

If I switch to a 4.4 kernel, the hangs stop but the widely reported ral 
traffic freezes are still there (PR 5958), which was what I was hoping to 
fix.
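
Narrowing the hang down to a specific change could be done by bisecting over the kernels (snapshots or commits) between the last known-good and the first known-bad one. This is a minimal Python sketch of that search. The `is_stable` callback is hypothetical and stands for "boot this kernel and see whether the box survives a few hours". The sketch also assumes a single transition from stable to unstable kernels.

```python
def first_bad(candidates, is_stable):
    """Binary-search an ordered list for the first unstable entry.

    candidates: kernels ordered oldest to newest, where the oldest is
                known good and the newest is known bad.
    is_stable:  hypothetical callback returning True if the box stays up
                with that kernel (in practice a manual multi-hour test).
    """
    lo, hi = 0, len(candidates) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stable(candidates[mid]):
            lo = mid  # the regression was introduced after mid
        else:
            hi = mid  # the regression is at mid or earlier
    return candidates[hi]
```

The number of test boots grows with the logarithm of the number of candidates, so 64 candidate kernels need at most 6 boots rather than 64.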


Please CC,

bbee