Re: BCM4312 Fails when xdm is started

2008-12-07 Thread Yuval Hager
On Tuesday 25 November 2008, Peter Stuge wrote:
 Yuval Hager wrote:
  I played around with different video drivers and the results are:
  * If using the 'via' driver, I lose the PCIe card immediately upon
    initialization
  * Using the 'openchrome' (trunk version), it works well in the
    beginning.
    After first blanking the register reads are all 1's, and then
    when the screen is blank I get a different read (some registers
    are correct, some are wrong), and when the screen is unblanked, I
    get 0xff's again. Very consistent and predictable (same read every
    time).
  * Using the 'vesa' driver I could not recreate the problem. I could
    not get the screen to blank for some reason, but closing the lid,
    going on standby/hibernate, restarting X - all didn't matter much
    to the PCI and the wireless card kept on working.

 Good work! You have beyond any doubt established that the X graphics
 driver can cause this problem.


This issue has been tracked down to the openchrome driver. It appears that
the driver somehow corrupted the PCI bus and took down the device right
after the video card, which is the wireless card.

The current workaround is here:
http://wiki.openchrome.org/pipermail/openchrome-devel/2008-November/000139.html

It is just a quick hack, not a fix, although it works great for me. The
openchrome team is working on a patch based on it.

Thanks for all the help - I am a very happy HP2133 user, and a very happy
community member. This was an amazing open source support experience, which
I'll remember for a long time. Thank you all!

Cheers,

-- 
Yuval Hager
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: BCM4312 Fails when xdm is started

2008-12-07 Thread Larry Finger
Yuval Hager wrote:
 
 This issue has been tracked down to be at the openchrome driver. It appears 
 that somehow it corrupted the PCI bus, and damaged the device right after the 
 video card - the wireless card. 
 
 The current workaround for this is here - 
 http://wiki.openchrome.org/pipermail/openchrome-devel/2008-November/000139.html
  - 
 but this is just a quick hack, not a fix, although it works great for me. The 
 openchrome team is working on a patch based on this.
 
 Thanks for all the help - I am a very happy HP2133 user, and a very happy 
 community member. This was an amazing opensource support experience, which 
 I'll remember for a long time. Thank you all!

Thanks for the feedback. The community may struggle a bit on some problems, and
the approach to a solution may look like a drunken sailor's walk, but we usually
get there.

Larry



Re: BCM4312 Fails when xdm is started

2008-11-25 Thread Michael Buesch
On Tuesday 25 November 2008 06:43:22 Yuval Hager wrote:
 However, I have some few interesting findings. 
 First, this is totally unrelated to b43, but to the PCI. I get the flawed 1's 
 read from lspci even without loading b43.
 
 I played around with different video drivers and the results are:
 * If using the 'via' driver, I lose the PCIe card immediately upon 
 initialization
 * Using the 'openchrome' (trunk version), It works well in the beginning. 
 After first blanking the register reads are all 1's, and then when the screen 
 is blank I get a different read (some registers are correct, some are wrong), 
 and when the screen is unblanked, I get 0xff's again. Very consistent and 
 predictable (same read every time).
 * Using the 'vesa' driver I could not recreate the problem. I could not get 
 the screen to blank for some reason, but closing the lid, going on 
 standby/hibernate, restarting X - all didn't matter much to the PCI and the 
 wireless card kept on working.

Ok, then you should report the stuff to the X guys. This is not a b43 problem
and I also don't think it's a kernel problem.

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-24 Thread Yuval Hager
On Sunday 23 November 2008, Larry Finger wrote:
 From a config file posted earlier, the OP is using SLAB. Is there any point
 in trying SLUB?

Another try, not sure what it means:

* Added CONFIG_SLUB and CONFIG_SLUB_DEBUG

* boot parameters: root=/dev/sda3 debug memory_corruption_check=1 devres.log=1 
debug_objects debugpat 
acpi.debug_layer=0x00410002 acpi.debug_level=0x acpi=off apic=debug 
nolapic irqpoll pci=noacpi slub_debug=FZPU

* cat /proc/interrupts is
           CPU0
  0:      16658   XT-PIC-XT   timer
  1:        289   XT-PIC-XT   i8042
  2:          0   XT-PIC-XT   cascade
  3:         60   XT-PIC-XT   uhci_hcd:usb2, ehci_hcd:usb4
  5:       9163   XT-PIC-XT   sata_via, HDA Intel
  7:          0   XT-PIC-XT   uhci_hcd:usb3
  8:          2   XT-PIC-XT   rtc
 10:       1712   XT-PIC-XT   b43
 11:        131   XT-PIC-XT   uhci_hcd:usb1
 12:        706   XT-PIC-XT   i8042
 14:          0   XT-PIC-XT   ide0
 15:          0   XT-PIC-XT   ide1
NMI:  0   Non-maskable interrupts
LOC:  0   Local timer interrupts
RES:  0   Rescheduling interrupts
CAL:  0   Function call interrupts
TLB:  0   TLB shootdowns
TRM:  0   Thermal event interrupts
SPU:  0   Spurious interrupts
ERR:  0
MIS:  0

* lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00

* xset dpms force standby
* wake up
* dmesg is virtually the same as before, complaining about nobody handling irq 
10 and disabling it.
* /proc/interrupts now is
  0:      80987   XT-PIC-XT   timer
  1:       1027   XT-PIC-XT   i8042
  2:          0   XT-PIC-XT   cascade
  3:         60   XT-PIC-XT   uhci_hcd:usb2, ehci_hcd:usb4
  5:      10400   XT-PIC-XT   sata_via, HDA Intel
  7:          0   XT-PIC-XT   uhci_hcd:usb3
  8:          2   XT-PIC-XT   rtc
 10:         20   XT-PIC-XT   b43
 11:        131   XT-PIC-XT   uhci_hcd:usb1
 12:       3059   XT-PIC-XT   i8042
 14:          0   XT-PIC-XT   ide0
 15:          0   XT-PIC-XT   ide1
NMI:  0   Non-maskable interrupts
LOC:  0   Local timer interrupts
RES:  0   Rescheduling interrupts
CAL:  0   Function call interrupts
TLB:  0   TLB shootdowns
TRM:  0   Thermal event interrupts
SPU:  0   Spurious interrupts
ERR:  0
MIS:  0

* Now check this out - the output of lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

(I double checked this)

huh?
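
For anyone following along, the two lspci dumps above can be compared mechanically. What follows is a minimal Python sketch, not something posted in the thread: it parses `lspci -x` style hex rows, decodes the standard vendor ID, device ID and Interrupt Line (offset 0x3c) fields, and flags a config space that reads back as all 0xff. The function and constant names are made up for illustration.

```python
import re

# One hex row of `lspci -x` output: "10: 04 c0 ff fd ..." (up to 16 bytes).
ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})$")


def parse_lspci_hex(dump):
    """Return the config-space bytes found in an `lspci -x` style dump."""
    data = bytearray()
    for line in dump.splitlines():
        match = ROW.match(line.strip())
        if match:  # the device description line does not match and is skipped
            data.extend(bytes.fromhex(match.group(2).replace(" ", "")))
    return bytes(data)


def looks_unreachable(cfg):
    """True when every byte reads 0xff, i.e. nothing answered the read."""
    return len(cfg) > 0 and all(b == 0xFF for b in cfg)


def summarize(cfg):
    """Decode a few standard header fields (stored little-endian)."""
    return {
        "vendor": int.from_bytes(cfg[0x00:0x02], "little"),
        "device": int.from_bytes(cfg[0x02:0x04], "little"),
        "irq_line": cfg[0x3C],  # Interrupt Line register
    }


# The "before" dump from this message, and an all-0xff dump like the "after".
HEALTHY = """\
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
"""
DEAD = "\n".join(f"{off:02x}:" + " ff" * 16 for off in range(0, 0x40, 0x10))

if __name__ == "__main__":
    print(summarize(parse_lspci_hex(HEALTHY)))
    print(looks_unreachable(parse_lspci_hex(DEAD)))
```

On the healthy dump the Interrupt Line byte decodes to 10, which lines up with the IRQ 10 that dmesg complains about.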

--yuval




Re: BCM4312 Fails when xdm is started

2008-11-24 Thread Michael Buesch
On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
 On Sunday 23 November 2008, Larry Finger wrote:
  From a config file posted earlier, the OP is using SLAB. Is there any point
  in trying SLUB?
 
 Another try, not sure what it means:
 
 * Added CONFIG_SLUB and CONFIG_SLUB_DEBUG
 
 * boot parameters: root=/dev/sda3 debug memory_corruption_check=1 
 devres.log=1 debug_objects debugpat 
 acpi.debug_layer=0x00410002 acpi.debug_level=0x acpi=off apic=debug 
 nolapic irqpoll pci=noacpi slub_debug=FZPU
 
 * cat /proc/interrupts is
CPU0   
   0:  16658XT-PIC-XTtimer
   1:289XT-PIC-XTi8042
   2:  0XT-PIC-XTcascade
   3: 60XT-PIC-XTuhci_hcd:usb2, ehci_hcd:usb4
   5:   9163XT-PIC-XTsata_via, HDA Intel
   7:  0XT-PIC-XTuhci_hcd:usb3
   8:  2XT-PIC-XTrtc
  10:   1712XT-PIC-XTb43
  11:131XT-PIC-XTuhci_hcd:usb1
  12:706XT-PIC-XTi8042
  14:  0XT-PIC-XTide0
  15:  0XT-PIC-XTide1
 NMI:  0   Non-maskable interrupts
 LOC:  0   Local timer interrupts
 RES:  0   Rescheduling interrupts
 CAL:  0   Function call interrupts
 TLB:  0   TLB shootdowns
 TRM:  0   Thermal event interrupts
 SPU:  0   Spurious interrupts
 ERR:  0
 MIS:  0
 
 * lspci -d 14e4:4312 -x
 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
 00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
 10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
 
 * xset dpms force standby
 * wake up
 * dmesg is virtually the same as before, complaining about nobody handling 
 irq 10 and disabling it.

Actually, b43 _does_ use IRQ10 now.
I guess the card dies such a horrible death, that it also asserts the IRQ line 
forever.
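
The "nobody cared" message comes from the kernel's spurious-interrupt accounting. As a rough illustration only (a toy Python model, not kernel code; the class name is invented, and the 100,000 / 99,900 thresholds are recalled from kernels of that era and should be treated as assumptions), a line that stays asserted while every handler declines it ends up disabled:

```python
class SpuriousIrqMonitor:
    """Toy model of 'nobody cared' detection for a single IRQ line."""

    def __init__(self, window=100_000, unhandled_limit=99_900):
        self.window = window                    # interrupts per accounting window (assumed)
        self.unhandled_limit = unhandled_limit  # unhandled count that trips the check (assumed)
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; disable the line if a window is almost all unhandled."""
        if self.disabled:
            return
        self.count += 1
        if not handled:
            self.unhandled += 1
        if self.count < self.window:
            return
        if self.unhandled > self.unhandled_limit:
            self.disabled = True  # corresponds to "Disabling IRQ #10" in the log
        self.count = 0
        self.unhandled = 0


def stuck_line_gets_disabled(window=1000, limit=999):
    """Simulate a device that asserts its IRQ forever while no handler claims it."""
    monitor = SpuriousIrqMonitor(window, limit)
    for _ in range(window):
        monitor.note_interrupt(handled=False)
    return monitor.disabled


if __name__ == "__main__":
    print(stuck_line_gets_disabled())
```

The smaller window in `stuck_line_gets_disabled` only keeps the simulation short; the logic is the same.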

 * /proc/interrupts now is
   0:  80987XT-PIC-XTtimer
   1:   1027XT-PIC-XTi8042
   2:  0XT-PIC-XTcascade
   3: 60XT-PIC-XTuhci_hcd:usb2, ehci_hcd:usb4
   5:  10400XT-PIC-XTsata_via, HDA Intel
   7:  0XT-PIC-XTuhci_hcd:usb3
   8:  2XT-PIC-XTrtc
  10: 20XT-PIC-XTb43
  11:131XT-PIC-XTuhci_hcd:usb1
  12:   3059XT-PIC-XTi8042
  14:  0XT-PIC-XTide0
  15:  0XT-PIC-XTide1
 NMI:  0   Non-maskable interrupts
 LOC:  0   Local timer interrupts
 RES:  0   Rescheduling interrupts
 CAL:  0   Function call interrupts
 TLB:  0   TLB shootdowns
 TRM:  0   Thermal event interrupts
 SPU:  0   Spurious interrupts
 ERR:  0
 MIS:  0
 
 * Now check this out - the output of lspci -d 14e4:4312 -x
 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 
 (I double checked this)
 
 huh?

Hah, interesting. I think your hardware may be faulty, in fact.
To me it really seems like the mainboard has power failures on the PCI bus.

This is a laptop, so you can't pull random hardware? Can you run some
hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or memtest?
If that doesn't help, can you try with another operating system?

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-24 Thread Larry Finger
Michael Buesch wrote:
 On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
 * Now check this out - the output of lspci -d 14e4:4312 -x
 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

 (I double checked this)

 huh?
 
 Hah, interesting. I think your hardware may be faulty, in fact.
 To me it really seems like the mainboard has power failures on the PCI bus.
 
 This is a laptop, so you can't pull random hardware? Can you run some
 hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or memtest?
 If that doesn't help, can you try with another operating system?
 

I also think you are seeing a hardware failure. Another test to try is
http://freshmeat.net/projects/cpuburn/?topic_id=146, which will exercise the 
system.

Larry


Re: BCM4312 Fails when xdm is started

2008-11-24 Thread Yuval Hager
On Monday 24 November 2008, Larry Finger wrote:
 Michael Buesch wrote:
  On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
  * Now check this out - the output of lspci -d 14e4:4312 -x
  02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
  00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 
  (I double checked this)
 
  huh?
 
  Hah, interesting. I think your hardware may be faulty, in fact.
  To me it really seems like the mainboard has power failures on the PCI
  bus.
 
  This is a laptop, so you can't pull random hardware? Can you run some
  hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or
  memtest? If that doesn't help, can you try with another operating system?

 I also think you are seeing a hardware failure. Another test to try is
 http://freshmeat.net/projects/cpuburn/?topic_id=146, which will exercise
 the system.

 Larry

I can't argue with what the bits mean, but I must say it doesn't feel like a
hardware problem. It is very consistent and deterministic.

I've been running mprime, burnBX and burnMMX for over 6 hours and it is all
fine (memtest not run yet).

However, I have a few interesting findings.
First, this is not related to b43 at all, but to the PCI bus. I get the flawed
all-1's read from lspci even without loading b43.

I played around with different video drivers and the results are:
* If using the 'via' driver, I lose the PCIe card immediately upon
  initialization
* Using the 'openchrome' (trunk version), it works well in the beginning.
  After first blanking the register reads are all 1's, and then when the
  screen is blank I get a different read (some registers are correct, some
  are wrong), and when the screen is unblanked, I get 0xff's again. Very
  consistent and predictable (same read every time).
* Using the 'vesa' driver I could not recreate the problem. I could not get
  the screen to blank for some reason, but closing the lid, going on
  standby/hibernate, restarting X - all didn't matter much to the PCI and the
  wireless card kept on working.

--yuval




Re: BCM4312 Fails when xdm is started

2008-11-24 Thread Peter Stuge
Yuval Hager wrote:
 I played around with different video drivers and the results are:
 * If using the 'via' driver, I lose the PCIe card immediately upon 
   initialization
 * Using the 'openchrome' (trunk version), It works well in the
   beginning.
   After first blanking the register reads are all 1's, and then
   when the screen is blank I get a different read (some registers
   are correct, some are wrong), and when the screen is unblanked, I
   get 0xff's again. Very consistent and predictable (same read every
   time).
 * Using the 'vesa' driver I could not recreate the problem. I could
   not get the screen to blank for some reason, but closing the lid,
   going on standby/hibernate, restarting X - all didn't matter much
   to the PCI and the wireless card kept on working.

Good work! You have beyond any doubt established that the X graphics
driver can cause this problem.

Were you using a kernel framebuffer driver when you saw the problem
also without running X at all? If not, the cause of trouble then is
still unidentified.


//Peter




Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Michael Buesch
On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
 [  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
 [  182.891409] ** b43: SSB_TMSLOW 0x2015
 [  258.299027] irq 10: nobody cared (try booting with the irqpoll option)


Does the kernel disable the PCI device, if it ignores the IRQ?


 [  258.299038] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #15
 [  258.299043] Call Trace:
 [  258.299062]  [c0148d9a] __report_bad_irq+0x24/0x69
 [  258.299071]  [c0148da1] __report_bad_irq+0x2b/0x69
 [  258.299080]  [c0148ec8] note_interrupt+0xe9/0x12d
 [  258.299090]  [c014976d] handle_level_irq+0x87/0xba
 [  258.299101]  [c010564e] do_IRQ+0x89/0x9f
 [  258.299109]  [c0103ea8] common_interrupt+0x28/0x30
 [  258.299119]  [c0125406] do_softirq+0x37/0x4d
 [  258.299127]  [c0125301] __do_softirq+0x62/0x130
 [  258.299135]  [c0125406] do_softirq+0x37/0x4d
 [  258.299142]  [c0105653] do_IRQ+0x8e/0x9f
 [  258.299150]  [c0103ea8] common_interrupt+0x28/0x30
 [  258.299161]  [c0108682] default_idle+0x2f/0x4c
 [  258.299168]  [c0101a20] cpu_idle+0x63/0x77
 [  258.299173] handlers:
 [  258.299176] [f7906455] (b43_interrupt_handler+0x0/0x1b7 [b43])
 [  258.299212] Disabling IRQ #10
 [  258.315148] b43-phy0: Radio hardware status changed to DISABLED
 [  258.315160] b43-phy0:  B43_B43_MMIO_RADIO_HWENABLED_HI 0x
 [  258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
 [  258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = 
 '/class/rfkill/rfkill0'
 [  258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = 
 '/devices/pci:00/:00:02.0/:02:00.0/ssb0:0'
 [  258.391951] 
 [  258.391956] =
 [  258.391964] [ INFO: inconsistent lock state ]
 [  258.391971] 2.6.28-rc5 #15
 [  258.391975] -
 [  258.391980] inconsistent {in-hardirq-W} - {hardirq-on-W} usage.
 [  258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [  258.391993]  (irq_desc_lock_class){++..}, at: [c0148c60] 
 try_one_irq+0x15/0xe8
 [  258.392016] {in-hardirq-W} state was registered at:
 [  258.392021]   [c013bc07] __lock_acquire+0x490/0x6bc
 [  258.392034]   [c013be8d] lock_acquire+0x5a/0x74
 [  258.392043]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392053]   [c03c4842] _spin_lock+0x1c/0x45
 [  258.392066]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392076]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392085]   [c010564e] do_IRQ+0x89/0x9f
 [  258.392096]   [c0103ea8] common_interrupt+0x28/0x30
 [  258.392105]   [c03c4cc2] _spin_unlock_irqrestore+0x37/0x39
 [  258.392115]   [c01487e6] __setup_irq+0x17a/0x1f3
 [  258.392124]   [c05ce79d] start_kernel+0x285/0x2f1
 [  258.392140]   [] 0x
 [  258.392159] irq event stamp: 1844456
 [  258.392164] hardirqs last  enabled at (1844456): [c03c4b6f] 
 _spin_unlock_irq+0x20/0x23
 [  258.392175] hardirqs last disabled at (1844455): [c03c4ac3] 
 _spin_lock_irq+0xa/0x4b
 [  258.392186] softirqs last  enabled at (1844310): [c0125406] 
 do_softirq+0x37/0x4d
 [  258.392198] softirqs last disabled at (187): [c0125406] 
 do_softirq+0x37/0x4d


That's a bit weird. Looks like another bug in the IRQ layer.


 [  258.392208] 
 [  258.392209] other info that might help us debug this:
 [  258.392215] no locks held by X/3965.
 [  258.392219] 
 [  258.392220] stack backtrace:
 [  258.392226] Pid: 3965, comm: X Not tainted 2.6.28-rc5 #15
 [  258.392231] Call Trace:
 [  258.392241]  [c0139175] print_usage_bug+0x13d/0x146
 [  258.392249]  [c013a2ff] mark_lock+0x4b1/0x7c7
 [  258.392257]  [c013bc7e] __lock_acquire+0x507/0x6bc
 [  258.392266]  [c013be8d] lock_acquire+0x5a/0x74
 [  258.392275]  [c0148c60] try_one_irq+0x15/0xe8
 [  258.392283]  [c03c4842] _spin_lock+0x1c/0x45
 [  258.392291]  [c0148c60] try_one_irq+0x15/0xe8
 [  258.392300]  [c0148c60] try_one_irq+0x15/0xe8
 [  258.392308]  [c03c4b6f] _spin_unlock_irq+0x20/0x23
 [  258.392317]  [c0148d33] poll_spurious_irqs+0x0/0x43
 [  258.392326]  [c0148d55] poll_spurious_irqs+0x22/0x43
 [  258.392338]  [c012874a] run_timer_softirq+0x101/0x156
 [  258.392346]  [c0125321] __do_softirq+0x82/0x130
 [  258.392354]  [c0125406] do_softirq+0x37/0x4d
 [  258.392362]  [c0105653] do_IRQ+0x8e/0x9f
 [  258.392370]  [c0103ea8] common_interrupt+0x28/0x30

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Yuval Hager
 It doesn't hurt to turn on all debugging options. Often you get some hint
 by doing so.

I booted with 'root=/dev/sda3 debug memory_corruption_check=1 devres.log=1 
debug_objects debugpat 
acpi.debug_layer=0x00410002 acpi.debug_level=0x acpi=off noapic nolapic 
irqpoll pci=noacpi' and issued 'xset dpms force 
standby'. After touching the mouse the system locked for about a minute, and 
the wireless stopped working. 
Here's the log portion, I hope it provides a hint of some sort:

[  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
[  182.891409] ** b43: SSB_TMSLOW 0x2015
[  258.299027] irq 10: nobody cared (try booting with the irqpoll option)
[  258.299038] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #15
[  258.299043] Call Trace:
[  258.299062]  [c0148d9a] __report_bad_irq+0x24/0x69
[  258.299071]  [c0148da1] __report_bad_irq+0x2b/0x69
[  258.299080]  [c0148ec8] note_interrupt+0xe9/0x12d
[  258.299090]  [c014976d] handle_level_irq+0x87/0xba
[  258.299101]  [c010564e] do_IRQ+0x89/0x9f
[  258.299109]  [c0103ea8] common_interrupt+0x28/0x30
[  258.299119]  [c0125406] do_softirq+0x37/0x4d
[  258.299127]  [c0125301] __do_softirq+0x62/0x130
[  258.299135]  [c0125406] do_softirq+0x37/0x4d
[  258.299142]  [c0105653] do_IRQ+0x8e/0x9f
[  258.299150]  [c0103ea8] common_interrupt+0x28/0x30
[  258.299161]  [c0108682] default_idle+0x2f/0x4c
[  258.299168]  [c0101a20] cpu_idle+0x63/0x77
[  258.299173] handlers:
[  258.299176] [f7906455] (b43_interrupt_handler+0x0/0x1b7 [b43])
[  258.299212] Disabling IRQ #10
[  258.315148] b43-phy0: Radio hardware status changed to DISABLED
[  258.315160] b43-phy0:  B43_B43_MMIO_RADIO_HWENABLED_HI 0x
[  258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
[  258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = '/class/rfkill/rfkill0'
[  258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = '/devices/pci:00/:00:02.0/:02:00.0/ssb0:0'
[  258.391951] 
[  258.391956] =
[  258.391964] [ INFO: inconsistent lock state ]
[  258.391971] 2.6.28-rc5 #15
[  258.391975] -
[  258.391980] inconsistent {in-hardirq-W} - {hardirq-on-W} usage.
[  258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  258.391993]  (irq_desc_lock_class){++..}, at: [c0148c60] try_one_irq+0x15/0xe8
[  258.392016] {in-hardirq-W} state was registered at:
[  258.392021]   [c013bc07] __lock_acquire+0x490/0x6bc
[  258.392034]   [c013be8d] lock_acquire+0x5a/0x74
[  258.392043]   [c01496f8] handle_level_irq+0x12/0xba
[  258.392053]   [c03c4842] _spin_lock+0x1c/0x45
[  258.392066]   [c01496f8] handle_level_irq+0x12/0xba
[  258.392076]   [c01496f8] handle_level_irq+0x12/0xba
[  258.392085]   [c010564e] do_IRQ+0x89/0x9f
[  258.392096]   [c0103ea8] common_interrupt+0x28/0x30
[  258.392105]   [c03c4cc2] _spin_unlock_irqrestore+0x37/0x39
[  258.392115]   [c01487e6] __setup_irq+0x17a/0x1f3
[  258.392124]   [c05ce79d] start_kernel+0x285/0x2f1
[  258.392140]   [] 0x
[  258.392159] irq event stamp: 1844456
[  258.392164] hardirqs last  enabled at (1844456): [c03c4b6f] _spin_unlock_irq+0x20/0x23
[  258.392175] hardirqs last disabled at (1844455): [c03c4ac3] _spin_lock_irq+0xa/0x4b
[  258.392186] softirqs last  enabled at (1844310): [c0125406] do_softirq+0x37/0x4d
[  258.392198] softirqs last disabled at (187): [c0125406] do_softirq+0x37/0x4d
[  258.392208] 
[  258.392209] other info that might help us debug this:
[  258.392215] no locks held by X/3965.
[  258.392219] 
[  258.392220] stack backtrace:
[  258.392226] Pid: 3965, comm: X Not tainted 2.6.28-rc5 #15
[  258.392231] Call Trace:
[  258.392241]  [c0139175] print_usage_bug+0x13d/0x146
[  258.392249]  [c013a2ff] mark_lock+0x4b1/0x7c7
[  258.392257]  [c013bc7e] __lock_acquire+0x507/0x6bc
[  258.392266]  [c013be8d] lock_acquire+0x5a/0x74
[  258.392275]  [c0148c60] try_one_irq+0x15/0xe8
[  258.392283]  [c03c4842] _spin_lock+0x1c/0x45
[  258.392291]  [c0148c60] try_one_irq+0x15/0xe8
[  258.392300]  [c0148c60] try_one_irq+0x15/0xe8
[  258.392308]  [c03c4b6f] _spin_unlock_irq+0x20/0x23
[  258.392317]  [c0148d33] poll_spurious_irqs+0x0/0x43
[  258.392326]  [c0148d55] poll_spurious_irqs+0x22/0x43
[  258.392338]  [c012874a] run_timer_softirq+0x101/0x156
[  258.392346]  [c0125321] __do_softirq+0x82/0x130
[  258.392354]  [c0125406] do_softirq+0x37/0x4d
[  258.392362]  [c0105653] do_IRQ+0x8e/0x9f
[  258.392370]  [c0103ea8] common_interrupt+0x28/0x30
[  260.311944] wlan0: No ProbeResp from current AP 00:22:3f:18:89:5e - assume out of range
[  304.082520] [ cut here ]
[  304.082531] WARNING: at drivers/net/wireless/b43/phy_common.c:135 b43_radio_lock+0x29/0x7c [b43]()
[  304.082538] Modules linked in: rfkill_input b43 ssb led_class input_polldev via drm rtc hci_usb snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore ehci_hcd uhci_hcd usbcore sg via_agp agpgart
[  304.082593] Pid: 5913, 

Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Larry Finger
Michael Buesch wrote:
 On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
 [  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
 [  182.891409] ** b43: SSB_TMSLOW 0x2015
 [  258.299027] irq 10: nobody cared (try booting with the irqpoll option)
 
 
 Does the kernel disable the PCI device, if it ignores the IRQ?

According to /proc/interrupts that Yuval posted earlier, IRQ 10 is not used.

# cat /proc/interrupts

   CPU0
  0: 271998   IO-APIC-edge  timer
  1:  8   IO-APIC-edge  i8042
  8:  2   IO-APIC-edge  rtc
  9:   2796   IO-APIC-fasteoi   acpi
 12:132   IO-APIC-edge  i8042
 14:  0   IO-APIC-edge  ide0
 15:  0   IO-APIC-edge  ide1
 16:  29161   IO-APIC-fasteoi   eth0
 17:503   IO-APIC-fasteoi   HDA Intel
 20:  0   IO-APIC-fasteoi   uhci_hcd:usb2
 21:  11085   IO-APIC-fasteoi   sata_via, ehci_hcd:usb1, uhci_hcd:usb3
 23:  0   IO-APIC-fasteoi   uhci_hcd:usb4
 24:  28003   IO-APIC-fasteoi   b43
NMI:  0   Non-maskable interrupts
LOC:   2098   Local timer interrupts
RES:  0   Rescheduling interrupts
CAL:  0   function call interrupts
TLB:  0   TLB shootdowns
TRM:  0   Thermal event interrupts
SPU:  0   Spurious interrupts
ERR:  0
MIS:  0

After the failure occurs, b43 has disappeared from the IRQ 24 list:

 24:  34154   IO-APIC-fasteoi
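
The before/after comparison above can also be done with a small script. This is a hedged Python sketch rather than anything used in the thread: it assumes a single-CPU `/proc/interrupts` layout like the one shown (one count column, a one-word chip name, then comma-separated handler names), and the function names are invented for illustration.

```python
def parse_interrupts(text):
    """Map IRQ number -> (count, [handler names]) for numeric IRQ rows only."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].endswith(":"):
            continue  # header line, blank line, or a row without a chip column
        irq = parts[0][:-1]
        if not irq.isdigit() or not parts[1].isdigit():
            continue  # NMI/LOC/... summary rows
        # parts[2] is the chip name; everything after it is the handler list.
        handlers = [h for h in " ".join(parts[3:]).split(", ") if h]
        table[int(irq)] = (int(parts[1]), handlers)
    return table


def lost_handlers(before, after):
    """Return {irq: [names]} for handlers registered before but gone afterwards."""
    lost = {}
    for irq, (_, names) in before.items():
        remaining = after.get(irq, (0, []))[1]
        gone = [name for name in names if name not in remaining]
        if gone:
            lost[irq] = gone
    return lost


if __name__ == "__main__":
    before = parse_interrupts(
        " 21:  11085   IO-APIC-fasteoi   sata_via, ehci_hcd:usb1, uhci_hcd:usb3\n"
        " 24:  28003   IO-APIC-fasteoi   b43\n"
    )
    after = parse_interrupts(
        " 21:  11085   IO-APIC-fasteoi   sata_via, ehci_hcd:usb1, uhci_hcd:usb3\n"
        " 24:  34154   IO-APIC-fasteoi\n"
    )
    print(lost_handlers(before, after))
```

On a multi-CPU machine there is one count column per CPU, so the column indexes would need adjusting.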


Larry


Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Michael Buesch
On Sunday 23 November 2008 16:42:28 Larry Finger wrote:
 Michael Buesch wrote:
  On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
  [  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
  [  182.891409] ** b43: SSB_TMSLOW 0x2015
  [  258.299027] irq 10: nobody cared (try booting with the irqpoll option)
  
  
  Does the kernel disable the PCI device, if it ignores the IRQ?
 
 According to /proc/interrupts that Yuval posted earlier, IRQ 10 is not used.

Can you try booting with kernel parameters noapic and noacpi
and reproduce?

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Michael Buesch
On Sunday 23 November 2008 18:55:36 Yuval Hager wrote:
 On Sunday 23 November 2008, you wrote:
  On Sunday 23 November 2008 16:42:28 Larry Finger wrote:
   Michael Buesch wrote:
On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
[  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
[  182.891409] ** b43: SSB_TMSLOW 0x2015
[  258.299027] irq 10: nobody cared (try booting with the irqpoll
option)
   
Does the kernel disable the PCI device, if it ignores the IRQ?
  
   According to /proc/interrupts that Yuval posted earlier, IRQ 10 is not
   used.
 
  Can you try booting with kernel parameters noapic and noacpi
  and reproduce?
 
 The dump above was generated with acpi=off noapic nolapic pci=noacpi boot 
 parameters (see my last email). The /proc/interrupts output was from an 
 earlier occurrence, without any of these parameters.
 Did you mean to reproduce and provide logs as well as the content 
 of /proc/interrupts before and after the failure?

No thanks.
Anyway, I cannot really help you with the issue.
I don't think this is a b43 bug, but either a hardware bug or a bug
in some code that controls the PCI bus.

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Peter Stuge
Michael Buesch wrote:
 On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
  [  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
  [  182.891409] ** b43: SSB_TMSLOW 0x2015
  [  258.299027] irq 10: nobody cared (try booting with the irqpoll option)
 
 Does the kernel disable the PCI device, if it ignores the IRQ?

The kernel disables the IRQ at least internally, maybe also by
deconfiguring the interrupt register in any devices using it, which
would explain the change in config register 0x3c (but not the changes
in all the other bytes, could that be a freak chain reaction inside
the hardware?) but I haven't heard/seen the kernel disable the PCI
device itself. I don't know if it can.

Why doesn't b43 care about this interrupt? Without APIC interrupt 10
is what both device and driver should be using (according to earlier
lspci -x output).


  [  258.299173] handlers:
  [  258.299176] [f7906455] (b43_interrupt_handler+0x0/0x1b7 [b43])
  [  258.299212] Disabling IRQ #10
  [  258.315148] b43-phy0: Radio hardware status changed to DISABLED
  [  258.315160] b43-phy0:  B43_B43_MMIO_RADIO_HWENABLED_HI 0x
  [  258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
  [  258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = 
  '/class/rfkill/rfkill0'
  [  258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = 
  '/devices/pci:00/:00:02.0/:02:00.0/ssb0:0'

Why does the radio hw status change here?
How is the change notified to the driver?


  [  258.391951] 
  [  258.391956] =
  [  258.391964] [ INFO: inconsistent lock state ]
  [  258.391971] 2.6.28-rc5 #15
  [  258.391975] -
  [  258.391980] inconsistent {in-hardirq-W} - {hardirq-on-W} usage.
  [  258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
  [  258.391993]  (irq_desc_lock_class){++..}, at: [c0148c60] 
  try_one_irq+0x15/0xe8
  [  258.392016] {in-hardirq-W} state was registered at:
  [  258.392021]   [c013bc07] __lock_acquire+0x490/0x6bc
  [  258.392034]   [c013be8d] lock_acquire+0x5a/0x74
  [  258.392043]   [c01496f8] handle_level_irq+0x12/0xba
  [  258.392053]   [c03c4842] _spin_lock+0x1c/0x45
  [  258.392066]   [c01496f8] handle_level_irq+0x12/0xba
  [  258.392076]   [c01496f8] handle_level_irq+0x12/0xba
  [  258.392085]   [c010564e] do_IRQ+0x89/0x9f
  [  258.392096]   [c0103ea8] common_interrupt+0x28/0x30
  [  258.392105]   [c03c4cc2] _spin_unlock_irqrestore+0x37/0x39
  [  258.392115]   [c01487e6] __setup_irq+0x17a/0x1f3
  [  258.392124]   [c05ce79d] start_kernel+0x285/0x2f1
  [  258.392140]   [] 0x
  [  258.392159] irq event stamp: 1844456
  [  258.392164] hardirqs last  enabled at (1844456): [c03c4b6f] 
  _spin_unlock_irq+0x20/0x23
  [  258.392175] hardirqs last disabled at (1844455): [c03c4ac3] 
  _spin_lock_irq+0xa/0x4b
  [  258.392186] softirqs last  enabled at (1844310): [c0125406] 
  do_softirq+0x37/0x4d
  [  258.392198] softirqs last disabled at (187): [c0125406] 
  do_softirq+0x37/0x4d
 
 
 That's a bit weird. Looks like another bug in the IRQ layer.

Something happens with the hardware that confuses the kernel. It's
triggered by software but I don't know where.. Like Michael, I'm
not too convinced that it is in b43. :\


//Peter


Re: BCM4312 Fails when xdm is started

2008-11-23 Thread Larry Finger
Peter Stuge wrote:
 Michael Buesch wrote:
 On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
 [  182.891400] ** b43: B43_MMIO_MACCTL 0x840A0503
 [  182.891409] ** b43: SSB_TMSLOW 0x2015
 [  258.299027] irq 10: nobody cared (try booting with the irqpoll option)
 Does the kernel disable the PCI device, if it ignores the IRQ?
 
 The kernel disables the IRQ at least internally, maybe also by
 deconfiguring the interrupt register in any devices using it, which
 would explain the change in config register 0x3c (but not the changes
 in all the other bytes, could that be a freak chain reaction inside
 the hardware?) but I haven't heard/seen the kernel disable the PCI
 device itself. I don't know if it can.
 
 Why doesn't b43 care about this interrupt? Without APIC interrupt 10
 is what both device and driver should be using (according to earlier
 lspci -x output).

I think by this point the BCM43xx hardware is disabled.

 [  258.299173] handlers:
 [  258.299176] [f7906455] (b43_interrupt_handler+0x0/0x1b7 [b43])
 [  258.299212] Disabling IRQ #10
 [  258.315148] b43-phy0: Radio hardware status changed to DISABLED
 [  258.315160] b43-phy0:  B43_B43_MMIO_RADIO_HWENABLED_HI 0x
 [  258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
 [  258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = 
 '/class/rfkill/rfkill0'
 [  258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = 
 '/devices/pci:00/:00:02.0/:02:00.0/ssb0:0'
 
 Why does the radio hw status change here?
 How is the change notified to the driver?

By setting a bit in the appropriate register; however, the device is disabled,
so all bits read as set. This is a false indication.
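
To illustrate why this is a false indication: when a PCI device stops
responding, reads typically come back as all ones, so any status bit tested in
that value appears set. A minimal Python sketch, assuming a 32-bit register
and a hypothetical mask (RADIO_DISABLED_MASK is illustrative only, not the
driver's actual constant):

```python
ALL_ONES_32 = 0xFFFFFFFF

# Hypothetical bit position, chosen only for illustration; the real
# driver defines its own mask for the radio hardware-enable register.
RADIO_DISABLED_MASK = 1 << 16


def device_looks_gone(value: int) -> bool:
    """An all-ones read usually means the device did not answer at all."""
    return value == ALL_ONES_32


def radio_status(value: int) -> str:
    """Interpret a status read, refusing to trust an all-ones value."""
    if device_looks_gone(value):
        return "unknown (device not responding)"
    return "disabled" if value & RADIO_DISABLED_MASK else "enabled"


# A naive bit test on an all-ones read would report "disabled":
print(bool(ALL_ONES_32 & RADIO_DISABLED_MASK), radio_status(ALL_ONES_32))
```

Checking for the all-ones pattern first is only a heuristic, since a register
could legitimately hold that value, but it separates "radio switched off" from
"device fell off the bus" in a case like this one.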

 [  258.391951] 
 [  258.391956] =
 [  258.391964] [ INFO: inconsistent lock state ]
 [  258.391971] 2.6.28-rc5 #15
 [  258.391975] -
 [  258.391980] inconsistent {in-hardirq-W} - {hardirq-on-W} usage.
 [  258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [  258.391993]  (irq_desc_lock_class){++..}, at: [c0148c60] 
 try_one_irq+0x15/0xe8
 [  258.392016] {in-hardirq-W} state was registered at:
 [  258.392021]   [c013bc07] __lock_acquire+0x490/0x6bc
 [  258.392034]   [c013be8d] lock_acquire+0x5a/0x74
 [  258.392043]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392053]   [c03c4842] _spin_lock+0x1c/0x45
 [  258.392066]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392076]   [c01496f8] handle_level_irq+0x12/0xba
 [  258.392085]   [c010564e] do_IRQ+0x89/0x9f
 [  258.392096]   [c0103ea8] common_interrupt+0x28/0x30
 [  258.392105]   [c03c4cc2] _spin_unlock_irqrestore+0x37/0x39
 [  258.392115]   [c01487e6] __setup_irq+0x17a/0x1f3
 [  258.392124]   [c05ce79d] start_kernel+0x285/0x2f1
 [  258.392140]   [] 0x
 [  258.392159] irq event stamp: 1844456
 [  258.392164] hardirqs last  enabled at (1844456): [c03c4b6f] 
 _spin_unlock_irq+0x20/0x23
 [  258.392175] hardirqs last disabled at (1844455): [c03c4ac3] 
 _spin_lock_irq+0xa/0x4b
 [  258.392186] softirqs last  enabled at (1844310): [c0125406] 
 do_softirq+0x37/0x4d
 [  258.392198] softirqs last disabled at (187): [c0125406] 
 do_softirq+0x37/0x4d

 That's a bit weird. Looks like another bug in the IRQ layer.
 
 Something happens with the hardware that confuses the kernel. It's
 triggered by software but I don't know where.. Like Michael, I'm
 not too convinced that it is in b43. :\

From a config file posted earlier, the OP is using SLAB. Is there any point in
trying SLUB?

Larry



Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Peter Stuge
Yuval Hager wrote:
 When the wireless is working:
 00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
 10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
 
 After it fails:
 00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
 10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00

Differences:

04h bit 1: A value of 1 allows the device to respond to Memory Space addresses.
04h bit 2: A value of 1 allows the device to behave as a bus master.
04h bit 8: A value of 1 enables the SERR# driver.

0ch bit 3: System cacheline size in units of DWORDs.

10h: BAR0 (memory mapped address for device)

3ch bits 7:0: Interrupt Line

Basically the card has been deconfigured. This should never happen.
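
The comparison above can be reproduced mechanically. The following is a
minimal Python sketch, not part of the original mail, that diffs the two
dumps byte by byte and decodes the three Command register bits listed above;
the function names are illustrative:

```python
WORKING = """\
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
"""
FAILED = """\
00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
"""

# Only the Command register (04h) bits named in the list above
COMMAND_BITS = {1: "Memory Space", 2: "Bus Master", 8: "SERR# Enable"}


def parse_dump(text):
    """Map config-space offset -> byte value from `lspci -x` style rows."""
    regs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        base, data = line.split(":", 1)
        for i, byte in enumerate(data.split()):
            regs[int(base, 16) + i] = int(byte, 16)
    return regs


def diff_dumps(before, after):
    """Return (offset, old, new) for every byte that differs."""
    a, b = parse_dump(before), parse_dump(after)
    return [(off, a[off], b.get(off)) for off in sorted(a) if a[off] != b.get(off)]


def command_flags(regs):
    """Decode the 16-bit little-endian Command register at offset 04h."""
    cmd = regs[0x04] | (regs[0x05] << 8)
    return [name for bit, name in COMMAND_BITS.items() if cmd & (1 << bit)]


changes = diff_dumps(WORKING, FAILED)
for off, old, new in changes:
    print(f"{off:02x}h: {old:02x} -> {new:02x}")
print("working:", command_flags(parse_dump(WORKING)))
print("failed: ", command_flags(parse_dump(FAILED)))
```
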

Try the following (somewhat naive) command to see if it starts
working again:

setpci -d 14e4:4312 c.l=8 10.l=fdffc004 4.w=0106
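
As a sanity check on the values in that command: config space is
little-endian, so each value should serialize to the bytes seen at the same
offset in the working dump (.l is a 32-bit write, .w a 16-bit write). A short
Python sketch, with an illustrative helper name:

```python
import struct

# (offset, width in bytes, value) for each write in the setpci command
WRITES = [(0x0C, 4, 0x00000008), (0x10, 4, 0xFDFFC004), (0x04, 2, 0x0106)]

# Bytes from the "working" lspci -x dump at the same offsets
WORKING = {
    0x04: bytes.fromhex("0601"),
    0x0C: bytes.fromhex("08000000"),
    0x10: bytes.fromhex("04c0fffd"),
}


def encode(width, value):
    """Serialize a value the way it is laid out in config space."""
    fmt = {1: "<B", 2: "<H", 4: "<I"}[width]
    return struct.pack(fmt, value)


for offset, width, value in WRITES:
    raw = encode(width, value)
    print(f"{offset:02x}h: {raw.hex(' ')} matches={raw == WORKING[offset]}")
```
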


//Peter




Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Yuval Hager
On Saturday 22 November 2008, Peter Stuge wrote:
 Yuval Hager wrote:
  When the wireless is working:
  00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
  10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
  30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
 
  After it fails:
  00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
  10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00

 Differences:

 04h bit 1: A value of 1 allows the device to respond to Memory Space addresses.
 04h bit 2: A value of 1 allows the device to behave as a bus master.
 04h bit 8: A value of 1 enables the SERR# driver.

 0ch bit 3: System cacheline size in units of DWORDs.

 10h: BAR0 (memory mapped address for device)

 3ch bits 7:0: Interrupt Line

 Basically the card has been deconfigured. This should never happen.

 Try the following (somewhat naive) command to see if it starts
 working again:

 setpci -d 14e4:4312 c.l=8 10.l=fdffc004 4.w=0106


Nope, that doesn't work.
At first I get the same register reads as in the beginning, but no network 
access. When I try to restart the interface, I get Fatal DMA error 
and Controller RESET (DMA error). Trying to unload and reload the modules 
leads to a complete lockup.

--yuval




Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Michael Buesch
On Saturday 22 November 2008 07:39:24 Yuval Hager wrote:
 On Friday 21 November 2008, Larry Finger wrote:
  Yuval,
 
  Michael Buesch wrote:
   Can you dump PCI config space and SSB registers (TMSLOW, maybe others,
   too). It looks like a random bus write disabled the device.
 
  Please incorporate the following patch and run your system. In addition,
  run the following command when the wireless is working and after it fails:
 
  sudo lspci -d 14e4:4312 -x
 
 
 When the wireless is working:
 $ lspci -d 14e4:4312 -x
 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
 00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
 10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
 
 After it fails:
 $ lspci -d 14e4:4312 -x
 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
 00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
                 ^^                      ^^
Somebody disabled MMIO and busmastering.
And somebody cleared the CACHE_LINE_SIZE register.

 10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Michael Buesch
On Saturday 22 November 2008 12:59:27 Yuval Hager wrote:
 On Saturday 22 November 2008, Peter Stuge wrote:
  Yuval Hager wrote:
   When the wireless is working:
   00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
   10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
   30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
  
   After it fails:
   00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
   10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
 
  Differences:
 
  04h bit 1: A value of 1 allows the device to respond to Memory Space addresses.
  04h bit 2: A value of 1 allows the device to behave as a bus master.
  04h bit 8: A value of 1 enables the SERR# driver.
 
  0ch bit 3: System cacheline size in units of DWORDs.
 
  10h: BAR0 (memory mapped address for device)
 
  3ch bits 7:0: Interrupt Line
 
  Basically the card has been deconfigured. This should never happen.
 
  Try the following (somewhat naive) command to see if it starts
  working again:
 
  setpci -d 14e4:4312 c.l=8 10.l=fdffc004 4.w=0106
 
 
 Nope, that doesn't work.
 At first I get the same register reads as in the beginning, but no network 
 access. When I try to restart the interface, I get Fatal DMA error 
 and Controller RESET (DMA error). Trying to unload and reload the modules 
 leads to a complete lockup.

Ok, kind of expected.

Can you turn on _all_ kernel-hacking options for memory debugging?

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Larry Finger
Michael Buesch wrote:

 Somebody disabled MMIO and busmastering.
 And somebody cleared the CACHE_LINE_SIZE register.

Are these all the read/write bits in the configuration area? Should I conclude
that someone zeroed this area?

In case the kernel memory diagnostics don't help, is there any way to trap
writes to the configuration registers?

Larry


Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Michael Buesch
On Saturday 22 November 2008 16:32:08 Larry Finger wrote:
 Michael Buesch wrote:
 
  Somebody disabled MMIO and busmastering.
  And somebody cleared the CACHE_LINE_SIZE register.
 
 Are these all the read/write bits in the configuration area? Should I conclude
 that someone zeroed this area?

Yeah well. I'm not sure. It _looks_ like someone completely cut the physical
power line to the card and it reset its complete PCI config.
So well, X does poke with the PCI devices. But as you said it also happens if
X doesn't run, I'd rule that out.
But I would not rule out a fucked BIOS, yet.
Does the BIOS have any powersave options and/or spread-spectrum options for
the PCI-bus? Can you try to turn them all off?
I have a machine that has PCI-slot autodetect and turns off the PCI clock if
it doesn't detect a card in that slot. Turn that off too, if you have it.

 In case the kernel memory diagnostics don't help, is there any way to trap
 writes to the configuration registers?

Well, if we have random memory corruption, that can hit memory and MMIO.
It doesn't hurt to turn on all debugging options. Often you get some hint
by doing so.

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Peter Stuge
Michael Buesch wrote:
  Are these all the read/write bits in the configuration area?

No, there are more of them. Most bytes in config space are rw, except
the first four.


  Should I conclude that someone zeroed this area?

No, because there are still valid bytes. Especially the first byte in
the BAR being non-zero (maybe even unchanged) is peculiar.


 Yeah well. I'm not sure. It _looks_ like someone completely cut the
 physical power line to the card and it reset its complete PCI
 config.

PCIe maps config space into MMIO IIRC. It might be clobbered that
way.


 So well, X does poke with the PCI devices. But as you said it also
 happens if X doesn't run, I'd rule that out.

Agree.


 But I would not rule out a fucked BIOS, yet.

Possibly.


 Does the BIOS have any powersave options and/or spread-spectrum
 options for the PCI-bus? Can you try to turn them all off?

Being a HP I don't expect there are many options in the BIOS. :\


  In case the kernel memory diagnostics don't help, is there any
  way to trap writes to the configuration registers?
 
 Well, if we have random memory corruption, that can hit memory and MMIO.
 It doesn't hurt to turn on all debugging options. Often you get some hint
 by doing so.

I hope this will give some more info. I think it will.


//Peter


Re: BCM4312 Fails when xdm is started

2008-11-22 Thread Yuval Hager
On Saturday 22 November 2008, Michael Buesch wrote:
 On Saturday 22 November 2008 16:32:08 Larry Finger wrote:
  Michael Buesch wrote:
   Somebody disabled MMIO and busmastering.
   And somebody cleared the CACHE_LINE_SIZE register.
 
  Are these all the read/write bits in the configuration area? Should I
  conclude that someone zeroed this area?

 Yeah well. I'm not sure. It _looks_ like someone completely cut the
 physical power line to the card and it reset its complete PCI config.
 So well, X does poke with the PCI devices. But as you said it also happens
 if X doesn't run, I'd rule that out.
 But I would not rule out a fucked BIOS, yet.
 Does the BIOS have any powersave options and/or spread-spectrum options for
 the PCI-bus? Can you try to turn them all off?
 I have a machine that has PCI-slot autodetect and turns off the PCI clock
 if it doesn't detect a card in that slot. Turn that off too, if you have
 it.

  In case the kernel memory diagnostics don't help, is there any way to
  trap writes to the configuration registers?

 Well, if we have random memory corruption, that can hit memory and MMIO.
 It doesn't hurt to turn on all debugging options. Often you get some hint
 by doing so.

I've enabled all CONFIG*DEBUG I could find relevant, and ran the system with:
'debug memory_corruption_check=1 devres.log=1 debug_objects debugpat 
acpi.debug_layer=0x00410002 acpi.debug_level=0x'
but no hint appears in the logs during the failure.

I did find that certain events recreate the problem immediately. If I run 'xset 
dpms force standby', it happens on wakeup. 'xset -dpms' causes it 
immediately as well. If I load X without DPMS support, it still happens after 
the monitor wakes up from (hardware?) blackness.
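
Since the failure can be triggered on demand, the moment of deconfiguration
could be timestamped by polling the config space while running the xset
commands. A rough Python sketch, assuming lspci is on the PATH and may read
the device's config space (root may be required); the function names and the
one-second interval are illustrative choices:

```python
import subprocess
import time


def read_config(device="14e4:4312"):
    """Return the raw `lspci -x` output for the given vendor:device ID."""
    result = subprocess.run(
        ["lspci", "-d", device, "-x"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def watch(read=read_config, interval=1.0, max_polls=None,
          clock=time.time, sleep=time.sleep):
    """Poll the dump and yield (timestamp, old, new) whenever it changes."""
    previous = read()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = read()
        if current != previous:
            yield clock(), previous, current
            previous = current
        polls += 1


# Usage (not run here):
#   for stamp, old, new in watch():
#       print(stamp, new)
```

The reader, clock and sleep functions are passed in as parameters so the
polling loop can be exercised without real hardware.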

--yuval




Re: BCM4312 Fails when xdm is started

2008-11-21 Thread Michael Buesch
On Friday 21 November 2008 17:25:22 Larry Finger wrote:
 A problem was recently posted to the bcm43xx mailing list that I am unable to
 solve. The machine in question is an HP Mini 2133 (HP product number FU346EA)
 with a BCM4312 PCIe wireless card. This card is known to work with the b43
 driver (I have one.) and it does work on this machine - at least initially.
 
 A problem occurs when xdm/kde is started. Suddenly a read operation on device
 hardware returns all ones, as though the register did not exist or had been
 suddenly mismapped. If the OP doesn't try to run xdm, the same problem still
 occurs eventually; it just takes longer.

Can you dump PCI config space and SSB registers (TMSLOW, maybe others, too).
It looks like a random bus write disabled the device.

 [0.00] Zone PFN ranges:
 [0.00]   DMA  0x - 0x1000
 [0.00]   Normal   0x1000 - 0x000373fe
 [0.00]   HighMem  0x000373fe - 0x0006feb0
 
 On my 64-bit HP machine, I see:
 
 Zone PFN ranges:
   DMA  0x - 0x1000
   DMA32    0x1000 - 0x0010
   Normal   0x0010 - 0x0010
 
 Is it normal for there not to be a DMA32 range with a 32-bit version of 
 Linux?

Yeah, I think so.

-- 
Greetings Michael.


Re: BCM4312 Fails when xdm is started

2008-11-21 Thread Yuval Hager
On Friday 21 November 2008, Larry Finger wrote:
 Yuval,

 Michael Buesch wrote:
  Can you dump PCI config space and SSB registers (TMSLOW, maybe others,
  too). It looks like a random bus write disabled the device.

 Please incorporate the following patch and run your system. In addition,
 run the following command when the wireless is working and after it fails:

 sudo lspci -d 14e4:4312 -x


When the wireless is working:
$ lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00

After it fails:
$ lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00


 Post the results of the above commands and any entries in /var/log/messages
 that dump registers. They should all be prefaced with 


Sorry for the long time to reply, it took a while to recreate the problem. 
According to the logs, it happened exactly when I checked the machine in the 
morning.
At the beginning, the register dumps look like this:

[   57.279984] ** b43: B43_MMIO_MACCTL 0x840A0503
[   57.279992] ** b43: SSB_TMSLOW 0x2015
(these lines repeat exactly the same; skipping)

[31723.961262] ** b43: B43_MMIO_MACCTL 0x840A0503
[31723.961275] ** b43: SSB_TMSLOW 0x2015
[31732.959490] b43-phy0: Radio hardware status changed to DISABLED
[31732.959505] b43-phy0:  B43_B43_MMIO_RADIO_HWENABLED_HI 0x
[31738.130551] wlan0: No ProbeResp from current AP 00:22:3f:18:89:5e - assume 
out of range
[31783.855931] [ cut here ]
[31783.855944] WARNING: at drivers/net/wireless/b43/phy_common.c:135 
b43_radio_lock+0x29/0x7e [b43]()
[31783.855955] Modules linked in: via drm rfkill_input hci_usb b43 led_class 
input_polldev rtc snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep 
snd soundcore ehci_hcd uhci_hcd usbcore sg ssb video output via_agp agpgart
[31783.856023] Pid: 1220, comm: b43 Not tainted 2.6.28-rc5 #13
[31783.856032] Call Trace:
[31783.856055]  [c011f4e9] warn_on_slowpath+0x40/0x59
[31783.856102]  [f7d93da3] b43_gphy_op_write+0x25/0x29 [b43]
[31783.856143]  [f7d90735] b43_calc_nrssi_slope+0x103e/0x105a [b43]
[31783.856180]  [f7cba429] ssb_pci_write32+0x15/0x3f [ssb]
[31783.856209]  [f7cba545] ssb_pci_read16+0x31/0x3f [ssb]
[31783.856244]  [f7d8854c] __b43_shm_read16+0x79/0x81 [b43]
[31783.856272]  [f7cba545] ssb_pci_read16+0x31/0x3f [ssb]
[31783.856306]  [f7d8854c] __b43_shm_read16+0x79/0x81 [b43]
[31783.856334]  [f7cba4eb] ssb_pci_read32+0x12/0x3b [ssb]
[31783.856370]  [f7d8dc77] b43_radio_lock+0x29/0x7e [b43]
[31783.856408]  [f7d91e92] b43_gphy_op_adjust_txpower+0x111/0x138 [b43]
[31783.856446]  [f7d8da06] b43_phy_txpower_adjust_work+0x0/0x39 [b43]
[31783.856483]  [f7d8da36] b43_phy_txpower_adjust_work+0x30/0x39 [b43]
[31783.856500]  [c012b2a4] run_workqueue+0x6a/0xdf
[31783.856515]  [c012b9ba] worker_thread+0x0/0xbd
[31783.856527]  [c012ba6d] worker_thread+0xb3/0xbd
[31783.856545]  [c012dc8c] autoremove_wake_function+0x0/0x2d
[31783.856560]  [c012dbc9] kthread+0x38/0x5f
[31783.856573]  [c012db91] kthread+0x0/0x5f
[31783.856588]  [c01040a7] kernel_thread_helper+0x7/0x10
[31783.856598] ---[ end trace 7548c7ede66fa0d3 ]---
[31783.856607] ** b43: B43_MMIO_MACCTL 0x
[31783.856616] ** b43: SSB_TMSLOW 0x

And from now on, all reads are ones. I have the full logs if you need.

Cheers,

--yuval

