Re: Problem With Driver
On Wed, 2006-08-30 at 12:34 -0400, David Reimer wrote: The 05:06.X device is the card bus bridge that the BroadCom card is connetcted to. Basically, it the pcmcia slot. What machine are you using? I tested my own pcmcia slot with a 16-bit device last night on my powerbook and it didn't work because there were interrupt routing problems, even after doing echo -n 1 add_card or whatever that sysfs knob is called under class/pcmcia. johannes ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: Problem With Driver
On Thu, 2006-09-07 at 08:48 +0200, Johannes Berg wrote: What machine are you using? I tested my own pcmcia slot with a 16-bit device last night on my powerbook and it didn't work because there were interrupt routing problems, even after doing echo -n 1 add_card or whatever that sysfs knob is called under class/pcmcia. Eh, umm, never mind. It does show up in your slot after all. I have no idea how drivers shall bind to 16-bit devices though. johannes ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] ucode debug status via sysfs for wireless-2.6
On Thursday 07 September 2006 03:34, Larry Finger wrote: John, Please apply this patch by Martin Langer to wireless-2.6. It must follow the patch to Add firmware version printout to wireless-2.6 (bcm43xx-softmac). As originally submitted, the patch was appropriate for an obsolete version of bcm43xx-softmac, but I have updated and tested. Thanks, Larry This patch prints out the ucode debug status to sysfs. So, users can watch the microcode status of their hardware. Signed-off-by: Martin Langer [EMAIL PROTECTED] Signed-off-by: Larry Finger [EMAIL PROTECTED] = diff --git a/drivers/net/wireless/bcm43xx/bcm43xx.h b/drivers/net/wireless/bcm43xx/bcm43xx.h index 62fd7e2..7c2163e 100644 Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx.h +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h @@ -311,6 +311,7 @@ #define BCM43xx_UCODE_PATCHLEVEL0x0002 #define BCM43xx_UCODE_DATE 0x0004 #define BCM43xx_UCODE_TIME 0x0006 +#define BCM43xx_UCODE_STATUS0x0040 /* MicrocodeFlagsBitfield (addr + lo-word values?)*/ #define BCM43xx_UCODEFLAGS_OFFSET 0x005E Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c @@ -373,6 +373,56 @@ static DEVICE_ATTR(phymode, 0644, bcm43xx_attr_phymode_show, bcm43xx_attr_phymode_store); +static ssize_t bcm43xx_attr_microcode_show(struct device *dev, +struct device_attribute *attr, +char *buf) +{ + struct bcm43xx_private *bcm = dev_to_bcm(dev); + ssize_t count = 0; + u16 status; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + you want to take the spinlock lock here, too. + mutex_lock((bcm)-mutex); + status = bcm43xx_shm_read16(bcm, BCM43xx_SHM_SHARED, + BCM43xx_UCODE_STATUS); can unlock both here. + switch (status) { + case 0x: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (invalid)\n, + status); + break; + case 0x0001: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (init)\n, + status); + break; + case 0x0002: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (active)\n, + status); + break; + case 0x0003: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (suspended)\n, + status); + break; + case 0x0004: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (asleep)\n, + status); + break; + default: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (unknown)\n, + status); + break; + } + mutex_unlock((bcm)-mutex); + + return count; +} + +static DEVICE_ATTR(microcodestatus, 0444, +bcm43xx_attr_microcode_show, +NULL); + int bcm43xx_sysfs_register(struct bcm43xx_private *bcm) { struct device *dev = bcm-pci_dev-dev; @@ -392,9 +442,14 @@ int bcm43xx_sysfs_register(struct bcm43x err = device_create_file(dev, dev_attr_phymode); if (err) goto err_remove_shortpreamble; + err = device_create_file(dev, dev_attr_microcodestatus); + if (err) + goto err_remove_phymode; out: return err; +err_remove_phymode: + device_remove_file(dev, dev_attr_phymode); err_remove_shortpreamble: device_remove_file(dev, dev_attr_shortpreamble); err_remove_interfmode: @@ -408,6 +463,7 @@ void bcm43xx_sysfs_unregister(struct bcm { struct device *dev = bcm-pci_dev-dev; + device_remove_file(dev, dev_attr_microcodestatus); device_remove_file(dev, dev_attr_phymode); device_remove_file(dev, dev_attr_shortpreamble); device_remove_file(dev, dev_attr_interference); === -- Greetings Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] A change in periodic work scheduling in bcm43xx
On Thursday 07 September 2006 10:43, Bin Zhang wrote: On 9/5/06, Hendrik Sattler [EMAIL PROTECTED] wrote: Am Dienstag 05 September 2006 19:58 schrieb Larry Finger: Based on user reports and my own experiences, the current problems with NETDEV WATCHDOG tx timeouts, and the device just falling over do not happen when periodic work is not preemptible. These problems seem to affect BCM4306 rev 2 3 chips. Since I changed BADNESS_LIMIT to 20 to disable preemption during periodic work, my device has stayed up continuously for more than 18 hours. Previously, the longest time between failures was less than 6 hours, and sometimes as short as 10 minutes. I have a BCM4306 rev 3, running with linux-2.6.17.x and have none of these problems. It seems to me that this problem appears only when you use 2.6.18. I had already twice this problem after more then 10 hours uptime : Sep 7 09:39:45 localhost kernel: bcm43xx: Controller restarted Sep 7 09:43:09 localhost syslogd 1.4.1#18: restart. With 2.6.17, I have never seen this problem. That was not the question. 2.6.17 does not have preemptible work. We need a way to trigger this more quick. It's rather hard to debug something if it only triggers every 10 hours. Especially if we don't know for sure what's going on. The problem is _not_ the BADNESS_LIMIT. So raising this _won't_ help us. The problem is somewhere in the preemptible work card- stopping-code. That is the code path triggered by a high badness. A higher BADNESS_LIMIT _won't_ fix the code. It will only introduce more bugs and make it even harder to debug. So _lowering_ the BADNESS_LIMIT could actually be a way to trigger it in reasonable time. We could also try to fire periodic work more often. Say once per second. -- Greetings Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] A change in periodic work scheduling in bcm43xx
On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote: Martin Langer wrote: Larry, IIRC your hardware is a 0x812 rev 4. This core will load a different microcode (bcm43xx_microcode4.fw) than all later core revisions. (I guess the pci revision number isn't usefull here.) Those old microcodes 2 and 4 seem to have a different instruction set in their firmware than later ones. So I'm afraid that those old bcm43xx cores are based on a different microprocessor. And if they are really based on a different microprocessor then their drivers will probably have different problems. But this theory will only fit if your problems are limited to the 0x812 rev2/rev4 world. Just my $0.02. No, mine is a 4306 rev 2 card. Sure. But there are more rev numbers in your logfiles. And there has to be the revision of your wlan core 0x812, too. Look for a line like bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled ^^^ Which microcode will be loaded depends only on that revision number. The output of your new microcode version printout is Microcode rev 0x123, pl 0x21 (2005-01-22 19:48:06) Yeah, the patch seems to work like expected. That's good. Martin ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] A change in periodic work scheduling in bcm43xx
On Thu, Sep 07, 2006 at 02:28:53PM +0200, Bin Zhang wrote: On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote: On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote: Martin Langer wrote: Those old microcodes 2 and 4 seem to have a different instruction set in their firmware than later ones. So I'm afraid that those old bcm43xx cores are based on a different microprocessor. And if they are really based on a different microprocessor then their drivers will probably have different problems. But this theory will only fit if your problems are limited to the 0x812 rev2/rev4 world. Just my $0.02. I have this problem. This is the line in my syslog : bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled Ok, this will load bcm43xx_microcode5.fw which is one of the newer ones. So we can forget that idea. It can not related to those rare old ones only. Martin PS: Don't forget to send a copy of your mail to [EMAIL PROTECTED] next time. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] A change in periodic work scheduling in bcm43xx
Martin Langer wrote: On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote: Sure. But there are more rev numbers in your logfiles. And there has to be the revision of your wlan core 0x812, too. Look for a line like bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled ^^^ Which microcode will be loaded depends only on that revision number. You're right. My numbers are bcm43xx: Core 0: ID 0x800, rev 0x2, vendor 0x4243, enabled bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled bcm43xx: Core 2: ID 0x80d, rev 0x1, vendor 0x4243, enabled bcm43xx: Core 3: ID 0x807, rev 0x1, vendor 0x4243, disabled bcm43xx: Core 4: ID 0x804, rev 0x7, vendor 0x4243, enabled bcm43xx: Core 5: ID 0x812, rev 0x4, vendor 0x4243, disabled The output of your new microcode version printout is Microcode rev 0x123, pl 0x21 (2005-01-22 19:48:06) Yeah, the patch seems to work like expected. That's good. Sure does. Larry ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] A change in periodic work scheduling in bcm43xx
On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote: On Thu, Sep 07, 2006 at 02:28:53PM +0200, Bin Zhang wrote: On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote: On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote: Martin Langer wrote: Those old microcodes 2 and 4 seem to have a different instruction set in their firmware than later ones. So I'm afraid that those old bcm43xx cores are based on a different microprocessor. And if they are really based on a different microprocessor then their drivers will probably have different problems. But this theory will only fit if your problems are limited to the 0x812 rev2/rev4 world. Just my $0.02. I have this problem. This is the line in my syslog : bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled Ok, this will load bcm43xx_microcode5.fw which is one of the newer ones. So we can forget that idea. It can not related to those rare old ones only. grep bcm43xx: Core /var/log/syslog gives me other numbers Sep 7 14:44:17 localhost kernel: bcm43xx: Core 0: ID 0x800, rev 0x4, vendor 0x4243, enabled Sep 7 14:44:17 localhost kernel: bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled Sep 7 14:44:17 localhost kernel: bcm43xx: Core 2: ID 0x80d, rev 0x2, vendor 0x4243, enabled Sep 7 14:44:17 localhost kernel: bcm43xx: Core 3: ID 0x807, rev 0x2, vendor 0x4243, disabled Sep 7 14:44:17 localhost kernel: bcm43xx: Core 4: ID 0x804, rev 0x9, vendor 0x4243, enabled Thanks, Bin Martin PS: Don't forget to send a copy of your mail to [EMAIL PROTECTED] next time. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] ucode debug status via sysfs for wireless-2.6
Michael Buesch wrote: On Thursday 07 September 2006 03:34, Larry Finger wrote: +return -EPERM; + you want to take the spinlock lock here, too. Obviously, I copied the wrong model. Is it correct that one should take both locks if your code will touch the hardware, but the mutex lock only is sufficient if your code just accesses data structures? This seems to be the pattern in the other bcm43xx_attr__show routines. Larry ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[PATCH] Try 2: ucode debug status via sysfs for wireless-2.6
John, Please apply this patch by Martin Langer to wireless-2.6. It must follow the patch to Add firmware version printout to wireless-2.6 (bcm43xx-softmac). Martins's original version was appropriate for an obsolete version of bcm43xx-softmac, but I have updated and tested. This version incorporates the locking revision comments of Michael Buesch. Thanks, Larry This patch prints out the ucode debug status to sysfs. So, users can watch the microcode status of their hardware. Signed-off-by: Martin Langer [EMAIL PROTECTED] Signed-off-by: Larry Finger [EMAIL PROTECTED] === diff --git a/drivers/net/wireless/bcm43xx/bcm43xx.h b/drivers/net/wireless/bcm43xx/bcm43xx.h index 62fd7e2..7c2163e 100644 Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx.h +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h @@ -311,6 +311,7 @@ #define BCM43xx_UCODE_PATCHLEVEL0x0002 #define BCM43xx_UCODE_DATE 0x0004 #define BCM43xx_UCODE_TIME 0x0006 +#define BCM43xx_UCODE_STATUS0x0040 /* MicrocodeFlagsBitfield (addr + lo-word values?)*/ #define BCM43xx_UCODEFLAGS_OFFSET 0x005E Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c @@ -373,6 +373,59 @@ static DEVICE_ATTR(phymode, 0644, bcm43xx_attr_phymode_show, bcm43xx_attr_phymode_store); +static ssize_t bcm43xx_attr_microcode_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + unsigned long flags; + struct bcm43xx_private *bcm = dev_to_bcm(dev); + ssize_t count = 0; + u16 status; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + mutex_lock((bcm)-mutex); + spin_lock_irqsave(bcm-irq_lock, flags); + status = bcm43xx_shm_read16(bcm, BCM43xx_SHM_SHARED, + BCM43xx_UCODE_STATUS); + + spin_unlock_irqrestore(bcm-irq_lock, flags); + mutex_unlock((bcm)-mutex); + switch (status) { + case 0x: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (invalid)\n, +status); + break; + case 0x0001: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (init)\n, +status); + break; + case 0x0002: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (active)\n, +status); + break; + case 0x0003: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (suspended)\n, +status); + break; + case 0x0004: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (asleep)\n, +status); + break; + default: + count = snprintf(buf, PAGE_SIZE, 0x%.4x (unknown)\n, +status); + break; + } + + return count; +} + +static DEVICE_ATTR(microcodestatus, 0444, + bcm43xx_attr_microcode_show, + NULL); + int bcm43xx_sysfs_register(struct bcm43xx_private *bcm) { struct device *dev = bcm-pci_dev-dev; @@ -392,9 +445,14 @@ int bcm43xx_sysfs_register(struct bcm43x err = device_create_file(dev, dev_attr_phymode); if (err) goto err_remove_shortpreamble; + err = device_create_file(dev, dev_attr_microcodestatus); + if (err) + goto err_remove_phymode; out: return err; +err_remove_phymode: + device_remove_file(dev, dev_attr_phymode); err_remove_shortpreamble: device_remove_file(dev, dev_attr_shortpreamble); err_remove_interfmode: @@ -408,6 +466,7 @@ void bcm43xx_sysfs_unregister(struct bcm { struct device *dev = bcm-pci_dev-dev; + device_remove_file(dev, dev_attr_microcodestatus); device_remove_file(dev, dev_attr_phymode); device_remove_file(dev, dev_attr_shortpreamble); device_remove_file(dev, dev_attr_interference); ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: Problem With Driver
First I'd just like to say thank you to everyone who's been trying to help me. You've all been very helpful and patient with all of my newbe questions. I'm still having problems getting my card bound (I believe that's the correct terminology, please correct me if I'm wrong). I'll first do a little rehash of what's going on: I am using kernel 2.6.17-11, and have the following options: Bus Options * PCCard (PCMCIA/CardBus) support * 16 bit PCMCIA [*] Load CIS updates from userspace [*] PCMCIA control ioctl * CardBus yenta- compatible bridge support * Cirrus PD6729 compatible bridge support * i82092 compatible bridge support * I82365 compatible bridge support * Databook TCIC compatible bridge support Networking M Generic IEEE 802.11 Networking Stack M Software MAC add-on to IEEE 802.11 [*] Enable full debugging output Device Drivers Network device support Wireless Lan [*] Wireless Lan Driver (non-ham) M Broadcom BCM43xx wireless support [*] Broadcom BCM43xx debugging BCM43xx data transfer Mode (X) DMA + PIO lspcmcia outputs: lspcmcia, lspcmcia ls, lspcmcia info, lspcmica status, lspcmcia config, and lspcmcia idnet all return: Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0) Socket 0 Device 0: [-- no driver --] (bus ID: 0.0) lspcmcia -v returns: Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0) Configuration: state: on ready: yes Voltage: 3.3V Vcc: 3.3V Vpp: 3.3V Socket 0 Device 0: [-- no driver --] (bus ID: 0.0) Configuration: state: on Product Name: SummitDC 802.11g SC CF 4.0 Identification: manf_id: 0x02d0 card_id: 0x0448 function: 6 (network) prod_id(1): SummitDC (0xde2746ff) prod_id(2): 802.11g SC CF (0xdbe5c59c) prod_id(3): 4.0 (0x0ebc74cc) prod_id(4): --- (---) pccard outputs: pccardctl ls returns: Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0) Socket 0 Device 0: [-- no driver --] (bus ID: 0.0) pccardctl info returns: PRODID_1=SummitDC PRODID_2=802.11g SC CF PRODID_3=4.0 PRODID_4= MANFID=02d0,0448 FUNCID=6 pccardctl status returns: Socket 0: 3.3V 16-bit PC Card Subdevice 0 (function 0) [unbound] pccardctl ident returns: Socket 0: product info: SummitDC, 802.11g SC CF, 4.0, manfid: 0x02d0, 0x0448 function: 6 (network) I guess my first questions is what mechanism is used to bind a device to a driver, and secondly where is a good source for me to read to start to understand the Linux Device Driver system? I've already picked up the O'Reilly book (Linux Device Drivers 3rd Edition). I'm just trying to educate myself, so I can quit bothering all of you with my novice questions. Thank you all again you've been great. Dave ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: Problem With Driver
Am Donnerstag 07 September 2006 19:19 schrieb David Reimer: I guess my first questions is what mechanism is used to bind a device to a driver, and secondly where is a good source for me to read to start to understand the Linux Device Driver system? I suggest existing drivers as example. I know that driver/net/wireless/orinoco_cs.c works well. Also check the files in Documentation/pcmcia. I've already picked up the O'Reilly book (Linux Device Drivers 3rd Edition). That one is a good start. HS ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
RFC/T: Possible fix for bcm43xx periodic work bug
Hi all, I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly fixed, I hope to get the fix into mainline before release of the bug into the stable series. I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler weren't causing us to just miss. To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per minute. Now failures would occur in minutes instead of hours. Operating from the premise that the DMA needed some time to reach the idle state after the MAC was suspended, I tried various delays, but nothing worked. Then I decided to test the premise that the problem was associated with shutting down and restarting the network. That lead to the current patch, which has run for what is effectively 100 times longer than previous versions. Larry --- Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle * be preemtible. */ mutex_lock(bcm-mutex); - netif_stop_queue(bcm-net_dev); - synchronize_net(); spin_lock_irqsave(bcm-irq_lock, flags); bcm43xx_mac_suspend(bcm); if (bcm43xx_using_pio(bcm)) @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle if (bcm43xx_using_pio(bcm)) bcm43xx_pio_thaw_txqueues(bcm); bcm43xx_mac_enable(bcm); - netif_wake_queue(bcm-net_dev); } mmiowb(); spin_unlock_irqrestore(bcm-irq_lock, flags); -- ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: RFC/T: Possible fix for bcm43xx periodic work bug
On Thursday 07 September 2006 20:17, Larry Finger wrote: Hi all, I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly fixed, I hope to get the fix into mainline before release of the bug into the stable series. I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler weren't causing us to just miss. To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per minute. Now failures would occur in minutes instead of hours. Operating from the premise that the DMA needed some time to reach the idle state after the MAC was suspended, I tried various delays, but nothing worked. Then I decided to test the premise that the problem was associated with shutting down and restarting the network. That lead to the current patch, which has run for what is effectively 100 times longer than previous versions. Larry --- Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle * be preemtible. */ mutex_lock(bcm-mutex); - netif_stop_queue(bcm-net_dev); - synchronize_net(); spin_lock_irqsave(bcm-irq_lock, flags); bcm43xx_mac_suspend(bcm); if (bcm43xx_using_pio(bcm)) @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle if (bcm43xx_using_pio(bcm)) bcm43xx_pio_thaw_txqueues(bcm); bcm43xx_mac_enable(bcm); - netif_wake_queue(bcm-net_dev); } mmiowb(); spin_unlock_irqrestore(bcm-irq_lock, flags); The real question is: Why does this patch help? Let's explain it. We don't stop networking just for fun there. While executing long preemptible periodic work, we must ensure that the TX path into the driver is not entered. It's the same reason why we disable IRQs in the first place. We can't take the mutex in the TX path and the IRQ handler. (That are the only places where we can't take the mutex). Short: We must stop netif here. The question is: Why does stopping netif queue cause a watchdog trigger here? The maximum time it can take for the periodic work inside of the critical section is about 0.2sec. So the queue is stopped for about 0.2sec max. Why does the watchdog trigger? Any idea from some networking guru? Could synchronize_net() take over 5sec in some worst case? Why? Questions over questions :D -- Greetings Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] ucode debug status via sysfs for wireless-2.6
On Thursday 07 September 2006 15:21, Larry Finger wrote: Michael Buesch wrote: On Thursday 07 September 2006 03:34, Larry Finger wrote: + return -EPERM; + you want to take the spinlock lock here, too. Obviously, I copied the wrong model. Is it correct that one should take both locks if your code will touch the hardware, but the mutex lock only is sufficient if your code just accesses data structures? This seems to be the pattern in the other bcm43xx_attr__show routines. In general, no. But, for some sysfs attrs it is sufficient to only take the mutex, because: * We don't access hardware. * We don't modify this data in a spinlock-only critical section. Yes, I know that having two locks does not really fit the lock data, not code model. But it's well defined in bcm43xx, so I think we can live with it. (and we must live with it, if we want to have preemptible periodic work. And we _want_). It's defined by the following rules: 1) Always take both, mutex and lock. 2) There are only two places where we can't take the mutex, but only the spinlock. IRQ and TX paths. (Yes, I know that there are still exceptions to 2. At least in dscape. The softmac port should be OK. These are bugs, I am aware of them and will fix it) So these two rules lead to the following rule: * Code where we only take the mutex can race against the TX and IRQ paths. Now we come back to the sysfs problem above. If the data, we access in this sysfs code, is not touched in either TX or IRQ path we don't need to take the spinlock. Yes, it's a little bit black magic. So if you aren't sure, always take both locks. -- Greetings Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: RFC/T: Possible fix for bcm43xx periodic work bug
Michael Buesch wrote: The real question is: Why does this patch help? Let's explain it. We don't stop networking just for fun there. While executing long preemptible periodic work, we must ensure that the TX path into the driver is not entered. It's the same reason why we disable IRQs in the first place. We can't take the mutex in the TX path and the IRQ handler. (That are the only places where we can't take the mutex). Short: We must stop netif here. The question is: Why does stopping netif queue cause a watchdog trigger here? The maximum time it can take for the periodic work inside of the critical section is about 0.2sec. So the queue is stopped for about 0.2sec max. Why does the watchdog trigger? Any idea from some networking guru? Could synchronize_net() take over 5sec in some worst case? Why? Questions over questions :D To check if it takes more than 5 seconds, I restored the original network disabling code and increased the timeout to 30 seconds. If this works without error, I'll try to margin the time. I'm still running that branch every second. Larry Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c @@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st SET_MODULE_OWNER(net_dev); SET_NETDEV_DEV(net_dev, pdev-dev); + net_dev-watchdog_timeo = 30 * HZ; net_dev-open = bcm43xx_net_open; net_dev-stop = bcm43xx_net_stop; net_dev-get_stats = bcm43xx_net_get_stats; Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c @@ -3207,7 +3207,7 @@ static void do_periodic_work(struct bcm4 bcm43xx_periodic_every15sec(bcm); bcm-periodic_state = state + 1; - schedule_delayed_work(bcm-periodic_work, HZ * 15); + schedule_delayed_work(bcm-periodic_work, HZ * 1); } /* Estimate a Badness value based on the periodic work @@ -3227,7 +3227,7 @@ static int estimate_periodic_work_badnes if (state % 1 == 0) /* every 15 sec */ badness += 1; -#define BADNESS_LIMIT 4 +#define BADNESS_LIMIT 0 return badness; } @@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st SET_MODULE_OWNER(net_dev); SET_NETDEV_DEV(net_dev, pdev-dev); + net_dev-watchdog_timeo = 30 * HZ; net_dev-open = bcm43xx_net_open; net_dev-stop = bcm43xx_net_stop; net_dev-get_stats = bcm43xx_net_get_stats; ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev