Re: Instability in -current with ral/rt2860?
bbee wrote:
> On Sat, 7 Feb 2009, Dorian Büttner wrote:
>> bbee wrote:
>>> In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
>>> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
>>> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
>> Do you also have the mini pci card? Here is mine, which I got myself 2 or 3 days ago:
>> ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
>> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
>> Also in a net5501.
> No, this is a pci card, an Edimax EW-7728IN, it's listed in ral(4).
>> I really can't confirm the 2 hour time frame or the need for a reboot. I have connection loss every now and then, but I'm just about to get some external antennas for the box, hope that will help. During initial setup (though that is more likely a problem on the configuration side, pf or something) I had a system crash which made the box come up with a date 20h in the future, which I haven't seen before.
> I wasn't having connection problems before, other than described in PR 5958, where ral just stops transmitting. If that's also the issue you're having (which is different from the hangs I'm having with -current), then I don't think it's an antenna problem. I have 3 10dBi omni antennas on this ral. Did you try -current? Could you please grab a snapshot, disable the watchdog and see if your box also hangs after a few hours, since we have pretty much identical hardware? It's 5 minutes of work using bsd.rd and sysmerge.
>> Can you make sure your power supply is ok/not running at its limit? There have been issues with those originally shipped with the 5501.
> I'm pretty sure the PSU is ok, it worked fine for 4.4 and it's an official Soekris one (well.. sort of.. I haggled it off of Wim, no idea where he got it :)
> Thanks for your reply.
> bbee

My net5501 showed that error again today: it no longer accepted any input on the serial console. I have two pictures of systat vmstat and top; if anyone thinks they're useful I can send them off-list. I personally haven't spotted anything unusual in the screens, though. If there is anything else to test, please let me know.
Re: Instability in -current with ral/rt2860?
bbee writes:
> Hi, In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
> I've been trying snapshots off and on since damien@ started tinkering with the rt2860 code two months ago. With any snapshot from the last 2 months, I can't get the box to stay up for more than 2 hours (or less) without it rebooting. [...]

No problems here. I've got a net4801 with a SparkLAN WMIR-215GN Mini PCI card, running the snapshot from 23rd December:

OpenBSD 4.4-current (GENERIC) #1637: Tue Dec 23 15:22:33 MST 2008
    dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
[...]
ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 11, address 00:0e:8e:xx:xx:xx
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

The net4801 has been up for 44 days. It's an open access point. WEP and WPA aren't enabled. Only 11g connections are accepted. The interface is configured with these settings:

inet 192.168.0.1 255.255.255.0 NONE media autoselect mode 11g mediaopt hostap nwid myexample chan 5

I've put another SparkLAN card into my laptop but I've connected to the access point with Atheros and Intel cards as well. Also, several neighbours have used my access point in recent weeks.

Regards,
Andreas
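For reference, the settings listed in this message have the shape of a one-line hostname.if(5) file. A minimal sketch, assuming the file is /etc/hostname.ral0 (the file name is an assumption; the message only gives the settings themselves):

```
# /etc/hostname.ral0 (file name assumed, settings copied from the message)
inet 192.168.0.1 255.255.255.0 NONE media autoselect mode 11g mediaopt hostap nwid myexample chan 5
```

As far as I recall from hostname.if(5), NONE stands in for the broadcast address so that the remaining words can be passed on as ifconfig options; mediaopt hostap is what puts the card into access point mode.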
Re: Instability in -current with ral/rt2860?
On 2009-02-07, bbee bumble@xs4all.nl wrote:
> On Sat, 7 Feb 2009, Stuart Henderson wrote:
>> enable ddb.console=1 and send it a BREAK, see if you can get some trace out of ddb.
> Thanks for the suggestion. I tried it, but the kernel's not responding to the break :(

Does BREAK work under normal circumstances (i.e. before crashing)? It should drop you to ddb, from where you can type c to continue. If the watchdog is enabled doing this will trigger a reboot if you don't continue quickly enough.

>> send dmesg :-)
> I'd rather not spam the list

it's much spammier to *not* include it, then be asked to send it, then to say no.

> it's just an ordinary net5501, dmesg is easily googled.

that says nothing about the exact OS version you have installed. or how the particular kernel you're running picks up the devices on your particular hardware. even making people stop and think, oh that's a net5501, hmm that has a geode cpu so it _must_ be running some i386 kernel (even if they already know and don't have to stop reading mail and go into a web browser and look it up) wastes their time.

the point of including dmesg is to include relevant details in one place, to save time for people who might be interested in looking into the problem.

and in any event, google does not easily find me a dmesg from a net5501 with an RT2860 running OpenBSD.

>> I've been running recent snaps on an ALIX board with RT2860 with no trouble.
> That's.. unfortunate. I keep thinking that since some people don't even see the problems with traffic stalling in PR 5958, there might be something specific to the location of the AP, like load or some specific client that makes it go boom. Grasping at straws, here.

well, I have seen the problem from 5958 on one busy AP with a larger range of clients, but never seen it on my home AP in a relatively uncrowded area RF-wise with just a couple of OpenBSD clients... but the problem one is quiet over the winter, so I can't tell if the fixes from early December helped yet.
FWIW, here's how it looks in the alix2c3 (working).

OpenBSD 4.4-current (GENERIC) #1672: Fri Feb 6 14:11:28 MST 2009
    t...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD 586-class) 499 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX
real mem = 268009472 (255MB)
avail mem = 25088 (239MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/10/07, BIOS32 rev. 0 @ 0xfceb2
pcibios0 at bios0: rev 2.1 @ 0xf/0x1
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xe/0xa800
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x33
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 10, address 00:0d:b9:13:51:98
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr1 at pci0 dev 10 function 0 VIA VT6105M RhineIII rev 0x96: irq 11, address 00:0d:b9:13:51:99
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr2 at pci0 dev 11 function 0 VIA VT6105M RhineIII rev 0x96: irq 12, address 00:0d:b9:13:51:9a
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
ral0 at pci0 dev 12 function 0 Ralink RT2860 rev 0x00: irq 9, address 00:0e:8e:1d:f1:71
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
glxpcib0 at pci0 dev 15 function 0 AMD CS5536 ISA rev 0x03: rev 0, 32-bit 3579545Hz timer, watchdog, gpio
gpio0 at glxpcib0: 32 pins
pciide0 at pci0 dev 15 function 2 AMD CS5536 IDE rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: SanDisk SDCFJ-1024
wd0: 4-sector PIO, LBA, 977MB, 2001888 sectors
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 15 function 4 AMD CS5536 USB rev 0x02: irq 15, version 1.0, legacy support
ehci0 at pci0 dev 15 function 5 AMD CS5536 USB rev 0x02: irq 15
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
biomask e1ef netmask ffef ttymask
mtrr: K6-family MTRR support (2 registers)
nvram: invalid checksum
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
clock: unknown CMOS layout
Re: Instability in -current with ral/rt2860?
In gmane.os.openbsd.misc, you wrote:
> Hi, In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
> I've been trying snapshots off and on since damien@ started tinkering with the rt2860 code two months ago. With any snapshot from the last 2 months, I can't get the box to stay up for more than 2 hours (or less) without it rebooting. If I turn off the watchdog timer, it will just hang without printing any messages. If I ifconfig ral0 down, the box is rock stable.

enable ddb.console=1 and send it a BREAK, see if you can get some trace out of ddb.

leave some sessions open, run things like top -s.1, systat vmstat .1 and see what the system's doing when it freezes.

send dmesg :-)

> Is anyone else seeing this with -current or a snapshot, with this ral or a different one? I'd file a problem report but there's nothing to go on, other than my suspicions that the changes to rt2860 in the last 2 months are the cause. I can try to narrow it down to a specific commit if that will help?

I've been running recent snaps on an ALIX board with RT2860 with no trouble.
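The first suggestion above can be sketched as a sysctl.conf(5) fragment. This assumes the setting is applied at boot rather than on the running system; as far as I know, ddb.console cannot be changed once the securelevel is above 0, so setting it with sysctl on a running multi-user box may be refused.

```
# /etc/sysctl.conf (fragment; putting the setting here is an assumption)
ddb.console=1		# 1=Permit entry of kernel debugger from console
```

After a reboot, a BREAK on the serial console should drop into ddb. The top -s.1 and systat vmstat .1 sessions mentioned above are then simply left running in separate logins, so that the last screen before a freeze stays visible.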
Re: Instability in -current with ral/rt2860?
bbee wrote:
> Hi, In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

Do you also have the mini pci card? Here is mine, which I got myself 2 or 3 days ago:

ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

Also in a net5501.

> I've been trying snapshots off and on since damien@ started tinkering with the rt2860 code two months ago. With any snapshot from the last 2 months, I can't get the box to stay up for more than 2 hours (or less) without it rebooting. If I turn off the watchdog timer, it will just hang without printing any messages. If I ifconfig ral0 down, the box is rock stable. Is anyone else seeing this with -current or a snapshot, with this ral or a different one? I'd file a problem report but there's nothing to go on, other than my suspicions that the changes to rt2860 in the last 2 months are the cause. I can try to narrow it down to a specific commit if that will help?

I really can't confirm the 2 hour time frame or the need for a reboot. I have connection loss every now and then, but I'm just about to get some external antennas for the box, hope that will help. During initial setup (though that is more likely a problem on the configuration side, pf or something) I had a system crash which made the box come up with a date 20h in the future, which I haven't seen before.

Can you make sure your power supply is ok/not running at its limit? There have been issues with those originally shipped with the 5501.

> If I switch to a 4.4 kernel, the hangs stop but the widely reported ral traffic freezes are still there (PR 5958), which was what I was hoping to fix. Please CC, bbee
Re: Instability in -current with ral/rt2860?
On Sat, 7 Feb 2009, Stuart Henderson wrote:
> In gmane.os.openbsd.misc, you wrote:
>> In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
>> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
>> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
>> I've been trying snapshots off and on since damien@ started tinkering with the rt2860 code two months ago. With any snapshot from the last 2 months, I can't get the box to stay up for more than 2 hours (or less) without it rebooting. If I turn off the watchdog timer, it will just hang without printing any messages. If I ifconfig ral0 down, the box is rock stable.
> enable ddb.console=1 and send it a BREAK, see if you can get some trace out of ddb.

Thanks for the suggestion. I tried it, but the kernel's not responding to the break :(

> leave some sessions open, run things like top -s.1, systat vmstat .1 and see what the system's doing when it freezes.

Right, the top is not showing anything out of the ordinary, the vmstat shows 7.1% interrupt load and nothing else on the processor at that time:

 7.1%Int  0.0%Sys  0.0%Usr  0.0%Nic 92.9%Idle

Interrupts
 732 total
 229 vr0
 192 vr3
  82 ral0
     pciide0
     ohci0
     com0
 101 clock
 128 rtc

Proc:r  d  s  w   Csw  Trp  Sys  Int  Sof  Flt
                  755    9  841  731  110  357

All seems fairly standard to me, some light load on lan/wlan.

> send dmesg :-)

I'd rather not spam the list, it's just an ordinary net5501, dmesg is easily googled.

>> Is anyone else seeing this with -current or a snapshot, with this ral or a different one? I'd file a problem report but there's nothing to go on, other than my suspicions that the changes to rt2860 in the last 2 months are the cause. I can try to narrow it down to a specific commit if that will help?
> I've been running recent snaps on an ALIX board with RT2860 with no trouble.

That's.. unfortunate. I keep thinking that since some people don't even see the problems with traffic stalling in PR 5958, there might be something specific to the location of the AP, like load or some specific client that makes it go boom. Grasping at straws, here.

Thanks for the suggestion,
bbee
Re: Instability in -current with ral/rt2860?
On Sat, 7 Feb 2009, Dorian Büttner wrote:
> bbee wrote:
>> In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:
>> ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
>> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
> Do you also have the mini pci card? Here is mine, which I got myself 2 or 3 days ago:
> ral0 at pci0 dev 17 function 0 Ralink RT2860 rev 0x00: irq 15, address 00:08:54:86:5e:6e
> ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)
> Also in a net5501.

No, this is a pci card, an Edimax EW-7728IN, it's listed in ral(4).

> I really can't confirm the 2 hour time frame or the need for a reboot. I have connection loss every now and then, but I'm just about to get some external antennas for the box, hope that will help. During initial setup (though that is more likely a problem on the configuration side, pf or something) I had a system crash which made the box come up with a date 20h in the future, which I haven't seen before.

I wasn't having connection problems before, other than described in PR 5958, where ral just stops transmitting. If that's also the issue you're having (which is different from the hangs I'm having with -current), then I don't think it's an antenna problem. I have 3 10dBi omni antennas on this ral.

Did you try -current? Could you please grab a snapshot, disable the watchdog and see if your box also hangs after a few hours, since we have pretty much identical hardware? It's 5 minutes of work using bsd.rd and sysmerge.

> Can you make sure your power supply is ok/not running at its limit? There have been issues with those originally shipped with the 5501.

I'm pretty sure the PSU is ok, it worked fine for 4.4 and it's an official Soekris one (well.. sort of.. I haggled it off of Wim, no idea where he got it :)

Thanks for your reply.
bbee
Re: Instability in -current with ral/rt2860?
FYI, I'm having the same problems with

ral0 at pci0 dev 21 function 0 Ralink RT2860 rev 0x00: irq 11, address 00:00:00:00:00:00
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (2T3R)

I get both traffic freezes and instability with 4.3 and 4.4 kernels, although the box is stable for a bit longer (a couple of days). I've posted about this before, only I wasn't sure about the cause then. There's nothing that explains the instability (no increased CPU or memory usage, nothing in any log, no increased traffic).

Lars
Instability in -current with ral/rt2860?
Hi,

In a net5501 I have a rt2860 ral card, running the Feb 04 snapshot:

ral0 at pci0 dev 14 function 0 Ralink RT2860 rev 0x00: irq 10
ral0: MAC/BBP RT2860 (rev 0x0101), RF RT2820 (MIMO 2T3R)

I've been trying snapshots off and on since damien@ started tinkering with the rt2860 code two months ago. With any snapshot from the last 2 months, I can't get the box to stay up for more than 2 hours (or less) without it rebooting. If I turn off the watchdog timer, it will just hang without printing any messages. If I ifconfig ral0 down, the box is rock stable.

Is anyone else seeing this with -current or a snapshot, with this ral or a different one? I'd file a problem report but there's nothing to go on, other than my suspicions that the changes to rt2860 in the last 2 months are the cause. I can try to narrow it down to a specific commit if that will help?

If I switch to a 4.4 kernel, the hangs stop but the widely reported ral traffic freezes are still there (PR 5958), which was what I was hoping to fix.

Please CC,
bbee