Re: Problem With Driver

2006-09-07 Thread Johannes Berg
On Wed, 2006-08-30 at 12:34 -0400, David Reimer wrote:

 The 05:06.X device is the card bus bridge that the BroadCom card is 
 connetcted to. Basically, it the pcmcia slot.

What machine are you using? I tested my own pcmcia slot with a 16-bit
device last night on my powerbook and it didn't work because there were
interrupt routing problems, even after doing echo -n 1  add_card or
whatever that sysfs knob is called under class/pcmcia.

johannes
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: Problem With Driver

2006-09-07 Thread Johannes Berg
On Thu, 2006-09-07 at 08:48 +0200, Johannes Berg wrote:

 What machine are you using? I tested my own pcmcia slot with a 16-bit
 device last night on my powerbook and it didn't work because there were
 interrupt routing problems, even after doing echo -n 1  add_card or
 whatever that sysfs knob is called under class/pcmcia.

Eh, umm, never mind. It does show up in your slot after all. I have no
idea how drivers shall bind to 16-bit devices though.

johannes
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] ucode debug status via sysfs for wireless-2.6

2006-09-07 Thread Michael Buesch
On Thursday 07 September 2006 03:34, Larry Finger wrote:
 John,
 
 Please apply this patch by Martin Langer to wireless-2.6. It must follow the 
 patch to Add firmware 
 version printout to wireless-2.6 (bcm43xx-softmac). As originally submitted, 
 the patch was 
 appropriate for an obsolete version of bcm43xx-softmac, but I have updated 
 and tested.
 
 Thanks,
 
 Larry
 
 This patch prints out the ucode debug status to sysfs. So, users can
 watch the microcode status of their hardware.
 
 Signed-off-by: Martin Langer [EMAIL PROTECTED]
 Signed-off-by: Larry Finger [EMAIL PROTECTED]
 
 =
 
 diff --git a/drivers/net/wireless/bcm43xx/bcm43xx.h 
 b/drivers/net/wireless/bcm43xx/bcm43xx.h
 index 62fd7e2..7c2163e 100644
 Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h
 ===
 --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx.h
 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h
 @@ -311,6 +311,7 @@
   #define BCM43xx_UCODE_PATCHLEVEL0x0002
   #define BCM43xx_UCODE_DATE  0x0004
   #define BCM43xx_UCODE_TIME  0x0006
 +#define BCM43xx_UCODE_STATUS0x0040
 
   /* MicrocodeFlagsBitfield (addr + lo-word values?)*/
   #define BCM43xx_UCODEFLAGS_OFFSET   0x005E
 Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
 ===
 --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
 @@ -373,6 +373,56 @@ static DEVICE_ATTR(phymode, 0644,
  bcm43xx_attr_phymode_show,
  bcm43xx_attr_phymode_store);
 
 +static ssize_t bcm43xx_attr_microcode_show(struct device *dev,
 +struct device_attribute *attr,
 +char *buf)
 +{
 + struct bcm43xx_private *bcm = dev_to_bcm(dev);
 + ssize_t count = 0;
 + u16 status;
 +
 + if (!capable(CAP_NET_ADMIN))
 + return -EPERM;
 +

you want to take the spinlock lock here, too.

 + mutex_lock((bcm)-mutex);
 + status = bcm43xx_shm_read16(bcm, BCM43xx_SHM_SHARED,
 + BCM43xx_UCODE_STATUS);

can unlock both here.

 + switch (status) {
 + case 0x:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (invalid)\n,
 +  status);
 + break;
 + case 0x0001:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (init)\n,
 +  status);
 + break;
 + case 0x0002:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (active)\n,
 +  status);
 + break;
 + case 0x0003:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (suspended)\n,
 +  status);
 + break;
 + case 0x0004:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (asleep)\n,
 +  status);
 + break;
 + default:
 + count = snprintf(buf, PAGE_SIZE, 0x%.4x (unknown)\n,
 +  status);
 + break;
 + }
 + mutex_unlock((bcm)-mutex);
 +
 + return count;
 +}
 +
 +static DEVICE_ATTR(microcodestatus, 0444,
 +bcm43xx_attr_microcode_show,
 +NULL);
 +
   int bcm43xx_sysfs_register(struct bcm43xx_private *bcm)
   {
   struct device *dev = bcm-pci_dev-dev;
 @@ -392,9 +442,14 @@ int bcm43xx_sysfs_register(struct bcm43x
   err = device_create_file(dev, dev_attr_phymode);
   if (err)
   goto err_remove_shortpreamble;
 + err = device_create_file(dev, dev_attr_microcodestatus);
 + if (err)
 + goto err_remove_phymode;
 
   out:
   return err;
 +err_remove_phymode:
 + device_remove_file(dev, dev_attr_phymode);
   err_remove_shortpreamble:
   device_remove_file(dev, dev_attr_shortpreamble);
   err_remove_interfmode:
 @@ -408,6 +463,7 @@ void bcm43xx_sysfs_unregister(struct bcm
   {
   struct device *dev = bcm-pci_dev-dev;
 
 + device_remove_file(dev, dev_attr_microcodestatus);
   device_remove_file(dev, dev_attr_phymode);
   device_remove_file(dev, dev_attr_shortpreamble);
   device_remove_file(dev, dev_attr_interference);
 
 
 
 ===
 

-- 
Greetings Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] A change in periodic work scheduling in bcm43xx

2006-09-07 Thread Michael Buesch
On Thursday 07 September 2006 10:43, Bin Zhang wrote:
 On 9/5/06, Hendrik Sattler [EMAIL PROTECTED] wrote:
  Am Dienstag 05 September 2006 19:58 schrieb Larry Finger:
   Based on user reports and my own experiences, the current problems with
   NETDEV WATCHDOG tx timeouts, and the device just falling over do not 
   happen
   when periodic work is not preemptible. These problems seem to affect
   BCM4306 rev 2  3 chips. Since I changed BADNESS_LIMIT to 20 to disable
   preemption during periodic work, my device has stayed up continuously for
   more than 18 hours. Previously, the longest time between failures was less
   than 6 hours, and sometimes as short as 10 minutes.
 
  I have a BCM4306 rev 3, running with linux-2.6.17.x and have none of these
  problems.
 
 It seems to me that this problem appears only when you use 2.6.18.
 I had already twice this problem after more then 10 hours uptime :
 Sep  7 09:39:45 localhost kernel: bcm43xx: Controller restarted
 Sep  7 09:43:09 localhost syslogd 1.4.1#18: restart.
 
 With 2.6.17, I have never seen this problem.

That was not the question. 2.6.17 does not have preemptible work.

We need a way to trigger this more quick. It's rather hard to debug
something if it only triggers every 10 hours. Especially if we don't
know for sure what's going on.

The problem is _not_ the BADNESS_LIMIT. So raising this _won't_
help us. The problem is somewhere in the preemptible work card-
stopping-code. That is the code path triggered by a high badness.
A higher BADNESS_LIMIT _won't_ fix the code. It will only introduce
more bugs and make it even harder to debug.
So _lowering_ the BADNESS_LIMIT could actually be a way to
trigger it in reasonable time.
We could also try to fire periodic work more often.
Say once per second.

-- 
Greetings Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] A change in periodic work scheduling in bcm43xx

2006-09-07 Thread Martin Langer
On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote:
 Martin Langer wrote:
  Larry, IIRC your hardware is a 0x812 rev 4. This core will load a 
  different microcode (bcm43xx_microcode4.fw) than all later core 
  revisions. (I guess the pci revision number isn't usefull here.)
  
  Those old microcodes 2 and 4 seem to have a different instruction set in 
  their firmware than later ones. So I'm afraid that those old bcm43xx 
  cores are based on a different microprocessor. And if they are really 
  based on a different microprocessor then their drivers will probably 
  have different problems. But this theory will only fit if your problems 
  are limited to the 0x812 rev2/rev4 world. Just my $0.02.
 
 No, mine is a 4306 rev 2 card. 

Sure. But there are more rev numbers in your logfiles. And there has to 
be the revision of your wlan core 0x812, too. Look for a line like

bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled
   ^^^

Which microcode will be loaded depends only on that revision number. 

 The output of your new microcode version printout is Microcode rev 
 0x123, pl 0x21 (2005-01-22  19:48:06)

Yeah, the patch seems to work like expected. That's good.

Martin
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] A change in periodic work scheduling in bcm43xx

2006-09-07 Thread Martin Langer
On Thu, Sep 07, 2006 at 02:28:53PM +0200, Bin Zhang wrote:
 On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote:
 On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote:
  Martin Langer wrote:
   Those old microcodes 2 and 4 seem to have a different instruction set 
 in
   their firmware than later ones. So I'm afraid that those old bcm43xx
   cores are based on a different microprocessor. And if they are really
   based on a different microprocessor then their drivers will probably
   have different problems. But this theory will only fit if your problems
   are limited to the 0x812 rev2/rev4 world. Just my $0.02.

 I have this problem. This is the line in my syslog :
 bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled

Ok, this will load bcm43xx_microcode5.fw which is one of the newer ones.
So we can forget that idea. It can not related to those rare old ones 
only.

Martin

PS: Don't forget to send a copy of your mail to [EMAIL PROTECTED] 
next time.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] A change in periodic work scheduling in bcm43xx

2006-09-07 Thread Larry Finger
Martin Langer wrote:
 On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote:
 
 Sure. But there are more rev numbers in your logfiles. And there has to 
 be the revision of your wlan core 0x812, too. Look for a line like
 
 bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled
^^^
 
 Which microcode will be loaded depends only on that revision number. 

You're right. My numbers are

bcm43xx: Core 0: ID 0x800, rev 0x2, vendor 0x4243, enabled
bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled
bcm43xx: Core 2: ID 0x80d, rev 0x1, vendor 0x4243, enabled
bcm43xx: Core 3: ID 0x807, rev 0x1, vendor 0x4243, disabled
bcm43xx: Core 4: ID 0x804, rev 0x7, vendor 0x4243, enabled
bcm43xx: Core 5: ID 0x812, rev 0x4, vendor 0x4243, disabled

 
 The output of your new microcode version printout is Microcode rev 
 0x123, pl 0x21 (2005-01-22  19:48:06)
 
 Yeah, the patch seems to work like expected. That's good.

Sure does.

Larry

___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] A change in periodic work scheduling in bcm43xx

2006-09-07 Thread Bin Zhang
On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote:
 On Thu, Sep 07, 2006 at 02:28:53PM +0200, Bin Zhang wrote:
  On 9/7/06, Martin Langer [EMAIL PROTECTED] wrote:
  On Wed, Sep 06, 2006 at 04:51:08PM -0500, Larry Finger wrote:
   Martin Langer wrote:
Those old microcodes 2 and 4 seem to have a different instruction set
  in
their firmware than later ones. So I'm afraid that those old bcm43xx
cores are based on a different microprocessor. And if they are really
based on a different microprocessor then their drivers will probably
have different problems. But this theory will only fit if your problems
are limited to the 0x812 rev2/rev4 world. Just my $0.02.

  I have this problem. This is the line in my syslog :
  bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled

 Ok, this will load bcm43xx_microcode5.fw which is one of the newer ones.
 So we can forget that idea. It can not related to those rare old ones
 only.

grep bcm43xx: Core /var/log/syslog gives me other numbers

Sep  7 14:44:17 localhost kernel: bcm43xx: Core 0: ID 0x800, rev 0x4,
vendor 0x4243, enabled
Sep  7 14:44:17 localhost kernel: bcm43xx: Core 1: ID 0x812, rev 0x5,
vendor 0x4243, disabled
Sep  7 14:44:17 localhost kernel: bcm43xx: Core 2: ID 0x80d, rev 0x2,
vendor 0x4243, enabled
Sep  7 14:44:17 localhost kernel: bcm43xx: Core 3: ID 0x807, rev 0x2,
vendor 0x4243, disabled
Sep  7 14:44:17 localhost kernel: bcm43xx: Core 4: ID 0x804, rev 0x9,
vendor 0x4243, enabled


Thanks,
Bin

 Martin

 PS: Don't forget to send a copy of your mail to [EMAIL PROTECTED]
 next time.

___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] ucode debug status via sysfs for wireless-2.6

2006-09-07 Thread Larry Finger
Michael Buesch wrote:
 On Thursday 07 September 2006 03:34, Larry Finger wrote:
 +return -EPERM;
 +
 
 you want to take the spinlock lock here, too.

Obviously, I copied the wrong model. Is it correct that one should take both 
locks if your code will 
touch the hardware, but the mutex lock only is sufficient if your code just 
accesses data 
structures? This seems to be the pattern in the other bcm43xx_attr__show 
routines.

Larry

___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[PATCH] Try 2: ucode debug status via sysfs for wireless-2.6

2006-09-07 Thread Larry Finger
John,

Please apply this patch by Martin Langer to wireless-2.6. It must follow the 
patch to Add firmware
version printout to wireless-2.6 (bcm43xx-softmac). Martins's original version 
was appropriate for 
an obsolete version of bcm43xx-softmac, but I have updated and tested. This 
version incorporates the 
locking revision comments of Michael Buesch.

Thanks,

Larry

This patch prints out the ucode debug status to sysfs. So, users can
watch the microcode status of their hardware.

Signed-off-by: Martin Langer [EMAIL PROTECTED]
Signed-off-by: Larry Finger [EMAIL PROTECTED]



===

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx.h 
b/drivers/net/wireless/bcm43xx/bcm43xx.h
index 62fd7e2..7c2163e 100644
Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx.h
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h
@@ -311,6 +311,7 @@
  #define BCM43xx_UCODE_PATCHLEVEL0x0002
  #define BCM43xx_UCODE_DATE  0x0004
  #define BCM43xx_UCODE_TIME  0x0006
+#define BCM43xx_UCODE_STATUS0x0040

  /* MicrocodeFlagsBitfield (addr + lo-word values?)*/
  #define BCM43xx_UCODEFLAGS_OFFSET 0x005E
Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c
@@ -373,6 +373,59 @@ static DEVICE_ATTR(phymode, 0644,
   bcm43xx_attr_phymode_show,
   bcm43xx_attr_phymode_store);

+static ssize_t bcm43xx_attr_microcode_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   unsigned long flags;
+   struct bcm43xx_private *bcm = dev_to_bcm(dev);
+   ssize_t count = 0;
+   u16 status;
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
+   mutex_lock((bcm)-mutex);
+   spin_lock_irqsave(bcm-irq_lock, flags);
+   status = bcm43xx_shm_read16(bcm, BCM43xx_SHM_SHARED,
+   BCM43xx_UCODE_STATUS);
+
+   spin_unlock_irqrestore(bcm-irq_lock, flags);
+   mutex_unlock((bcm)-mutex);
+   switch (status) {
+   case 0x:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (invalid)\n,
+status);
+   break;
+   case 0x0001:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (init)\n,
+status);
+   break;
+   case 0x0002:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (active)\n,
+status);
+   break;
+   case 0x0003:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (suspended)\n,
+status);
+   break;
+   case 0x0004:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (asleep)\n,
+status);
+   break;
+   default:
+   count = snprintf(buf, PAGE_SIZE, 0x%.4x (unknown)\n,
+status);
+   break;
+   }
+
+   return count;
+}
+
+static DEVICE_ATTR(microcodestatus, 0444,
+  bcm43xx_attr_microcode_show,
+  NULL);
+
  int bcm43xx_sysfs_register(struct bcm43xx_private *bcm)
  {
struct device *dev = bcm-pci_dev-dev;
@@ -392,9 +445,14 @@ int bcm43xx_sysfs_register(struct bcm43x
err = device_create_file(dev, dev_attr_phymode);
if (err)
goto err_remove_shortpreamble;
+   err = device_create_file(dev, dev_attr_microcodestatus);
+   if (err)
+   goto err_remove_phymode;

  out:
return err;
+err_remove_phymode:
+   device_remove_file(dev, dev_attr_phymode);
  err_remove_shortpreamble:
device_remove_file(dev, dev_attr_shortpreamble);
  err_remove_interfmode:
@@ -408,6 +466,7 @@ void bcm43xx_sysfs_unregister(struct bcm
  {
struct device *dev = bcm-pci_dev-dev;

+   device_remove_file(dev, dev_attr_microcodestatus);
device_remove_file(dev, dev_attr_phymode);
device_remove_file(dev, dev_attr_shortpreamble);
device_remove_file(dev, dev_attr_interference);


___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: Problem With Driver

2006-09-07 Thread David Reimer
First I'd just like to say thank you to everyone who's been trying to help 
me. You've all been very helpful and patient with all of my newbe questions.

I'm still having problems getting my card bound (I believe that's the 
correct terminology, please correct me if I'm wrong).

I'll first do a little rehash of what's going on:

I am using kernel 2.6.17-11, and have the following options:
Bus Options
* PCCard (PCMCIA/CardBus) support
* 16 bit PCMCIA
[*]   Load CIS updates from userspace
[*]   PCMCIA control ioctl
* CardBus yenta- compatible bridge support
* Cirrus PD6729 compatible bridge support
* i82092 compatible bridge support
* I82365 compatible bridge support
* Databook TCIC compatible bridge support
Networking
M Generic IEEE 802.11 Networking Stack
M Software MAC add-on to IEEE 802.11
[*] Enable full debugging output
Device Drivers
Network device support
Wireless Lan
[*] Wireless Lan Driver (non-ham)
M Broadcom BCM43xx wireless support
[*] Broadcom BCM43xx debugging
BCM43xx data transfer Mode
(X) DMA + PIO

lspcmcia outputs:
lspcmcia, lspcmcia ls, lspcmcia info, lspcmica status, lspcmcia config, and 
lspcmcia idnet all return:
Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0)
Socket 0 Device 0:  [-- no driver --]   (bus ID: 0.0)

lspcmcia -v returns:
Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0)
Configuration:  state: on   ready: yes
Voltage: 3.3V Vcc: 3.3V Vpp: 3.3V
Socket 0 Device 0:  [-- no driver --]   (bus ID: 0.0)
Configuration:  state: on
Product Name:   SummitDC 802.11g SC CF 4.0
Identification: manf_id: 0x02d0 card_id: 0x0448
function: 6 (network)
prod_id(1): SummitDC (0xde2746ff)
prod_id(2): 802.11g SC CF (0xdbe5c59c)
prod_id(3): 4.0 (0x0ebc74cc)
prod_id(4): --- (---)

pccard outputs:
pccardctl ls returns:
Socket 0 Bridge:[yenta_cardbus] (bus ID: :03:06.0)
Socket 0 Device 0:  [-- no driver --]   (bus ID: 0.0)

pccardctl info returns:
PRODID_1=SummitDC
PRODID_2=802.11g SC CF
PRODID_3=4.0
PRODID_4=
MANFID=02d0,0448
FUNCID=6

pccardctl status returns:
Socket 0:
  3.3V 16-bit PC Card
  Subdevice 0 (function 0) [unbound]

pccardctl ident returns:
Socket 0:
  product info: SummitDC, 802.11g SC CF, 4.0, 
  manfid: 0x02d0, 0x0448
  function: 6 (network)

I guess my first questions is what mechanism is used to bind a device to a 
driver, and secondly where is a good source for me to read to start to 
understand the Linux Device Driver system?

I've already picked up the O'Reilly book (Linux Device Drivers 3rd Edition).

I'm just trying to educate myself, so I can quit bothering all of you with 
my novice questions.

Thank you all again you've been great.

Dave


___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: Problem With Driver

2006-09-07 Thread Hendrik Sattler
Am Donnerstag 07 September 2006 19:19 schrieb David Reimer:
 I guess my first questions is what mechanism is used to bind a device to a
 driver, and secondly where is a good source for me to read to start to
 understand the Linux Device Driver system?

I suggest existing drivers as example. I know that 
driver/net/wireless/orinoco_cs.c works well.
Also check the files in Documentation/pcmcia.

 I've already picked up the O'Reilly book (Linux Device Drivers 3rd
 Edition).

That one is a good start.

HS
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


RFC/T: Possible fix for bcm43xx periodic work bug

2006-09-07 Thread Larry Finger
Hi all,

I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx 
timeouts and would like it
to get as much testing as possible as this bug affects V2.6.18-rcX. If the 
problem is truly
fixed, I hope to get the fix into mainline before release of the bug into the 
stable series.

I got the idea for the fix when I discovered that the timeout interval used for 
the watchdog is the 
default value of 5 seconds. Obviously, the few milliseconds used in the 
periodic work handler 
weren't causing us to just miss.

To exacerbate the problem, I changed the repeat timer for periodic work from 15 
to 1 sec. I also set 
BADNESS_LIMIT to 0. As a result, I was running the problem code once per second 
instead of once per 
minute. Now failures would occur in minutes instead of hours.

Operating from the premise that the DMA needed some time to reach the idle 
state after the MAC was 
suspended, I tried various delays, but nothing worked.

Then I decided to test the premise that the problem was associated with 
shutting down and restarting 
the network. That lead to the current patch, which has run for what is 
effectively 100 times longer 
than previous versions.

Larry
---


Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
 * be preemtible.
 */
mutex_lock(bcm-mutex);
-   netif_stop_queue(bcm-net_dev);
-   synchronize_net();
spin_lock_irqsave(bcm-irq_lock, flags);
bcm43xx_mac_suspend(bcm);
if (bcm43xx_using_pio(bcm))
@@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
if (bcm43xx_using_pio(bcm))
bcm43xx_pio_thaw_txqueues(bcm);
bcm43xx_mac_enable(bcm);
-   netif_wake_queue(bcm-net_dev);
}
mmiowb();
spin_unlock_irqrestore(bcm-irq_lock, flags);



--
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: RFC/T: Possible fix for bcm43xx periodic work bug

2006-09-07 Thread Michael Buesch
On Thursday 07 September 2006 20:17, Larry Finger wrote:
 Hi all,
 
 I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx 
 timeouts and would like it
 to get as much testing as possible as this bug affects V2.6.18-rcX. If the 
 problem is truly
 fixed, I hope to get the fix into mainline before release of the bug into the 
 stable series.
 
 I got the idea for the fix when I discovered that the timeout interval used 
 for the watchdog is the 
 default value of 5 seconds. Obviously, the few milliseconds used in the 
 periodic work handler 
 weren't causing us to just miss.
 
 To exacerbate the problem, I changed the repeat timer for periodic work from 
 15 to 1 sec. I also set 
 BADNESS_LIMIT to 0. As a result, I was running the problem code once per 
 second instead of once per 
 minute. Now failures would occur in minutes instead of hours.
 
 Operating from the premise that the DMA needed some time to reach the idle 
 state after the MAC was 
 suspended, I tried various delays, but nothing worked.
 
 Then I decided to test the premise that the problem was associated with 
 shutting down and restarting 
 the network. That lead to the current patch, which has run for what is 
 effectively 100 times longer 
 than previous versions.
 
 Larry
 ---
 
 
 Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
 ===
 --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
 @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
* be preemtible.
*/
   mutex_lock(bcm-mutex);
 - netif_stop_queue(bcm-net_dev);
 - synchronize_net();
   spin_lock_irqsave(bcm-irq_lock, flags);
   bcm43xx_mac_suspend(bcm);
   if (bcm43xx_using_pio(bcm))
 @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
   if (bcm43xx_using_pio(bcm))
   bcm43xx_pio_thaw_txqueues(bcm);
   bcm43xx_mac_enable(bcm);
 - netif_wake_queue(bcm-net_dev);
   }
   mmiowb();
   spin_unlock_irqrestore(bcm-irq_lock, flags);

The real question is: Why does this patch help?

Let's explain it. We don't stop networking just for fun there.
While executing long preemptible periodic work, we must ensure
that the TX path into the driver is not entered. It's the same
reason why we disable IRQs in the first place. We can't take the
mutex in the TX path and the IRQ handler. (That are the only places
where we can't take the mutex).
Short: We must stop netif here.
The question is: Why does stopping netif queue cause a watchdog
trigger here? The maximum time it can take for the periodic
work inside of the critical section is about 0.2sec. So the queue
is stopped for about 0.2sec max. Why does the watchdog trigger?
Any idea from some networking guru?
Could synchronize_net() take over 5sec in some worst case? Why?
Questions over questions :D

-- 
Greetings Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] ucode debug status via sysfs for wireless-2.6

2006-09-07 Thread Michael Buesch
On Thursday 07 September 2006 15:21, Larry Finger wrote:
 Michael Buesch wrote:
  On Thursday 07 September 2006 03:34, Larry Finger wrote:
  +  return -EPERM;
  +
  
  you want to take the spinlock lock here, too.
 
 Obviously, I copied the wrong model. Is it correct that one should take both 
 locks if your code will 
 touch the hardware, but the mutex lock only is sufficient if your code just 
 accesses data 
 structures? This seems to be the pattern in the other bcm43xx_attr__show 
 routines.

In general, no.
But, for some sysfs attrs it is sufficient to only take
the mutex, because:
* We don't access hardware.
* We don't modify this data in a spinlock-only critical section.

Yes, I know that having two locks does not really fit the
lock data, not code model. But it's well defined in bcm43xx,
so I think we can live with it. (and we must live with it,
if we want to have preemptible periodic work. And we _want_).
It's defined by the following rules:

1) Always take both, mutex and lock.
2) There are only two places where we can't take the
   mutex, but only the spinlock. IRQ and TX paths.

(Yes, I know that there are still exceptions to 2.
At least in dscape. The softmac port should be OK.
These are bugs, I am aware of them and will fix it)

So these two rules lead to the following rule:

* Code where we only take the mutex can race against the
TX and IRQ paths.
Now we come back to the sysfs problem above. If the data, we
access in this sysfs code, is not touched in either TX or IRQ path
we don't need to take the spinlock. Yes, it's a little bit black
magic. So if you aren't sure, always take both locks.

-- 
Greetings Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: RFC/T: Possible fix for bcm43xx periodic work bug

2006-09-07 Thread Larry Finger

Michael Buesch wrote:


The real question is: Why does this patch help?

Let's explain it. We don't stop networking just for fun there.
While executing long preemptible periodic work, we must ensure
that the TX path into the driver is not entered. It's the same
reason why we disable IRQs in the first place. We can't take the
mutex in the TX path and the IRQ handler. (That are the only places
where we can't take the mutex).
Short: We must stop netif here.
The question is: Why does stopping netif queue cause a watchdog
trigger here? The maximum time it can take for the periodic
work inside of the critical section is about 0.2sec. So the queue
is stopped for about 0.2sec max. Why does the watchdog trigger?
Any idea from some networking guru?
Could synchronize_net() take over 5sec in some worst case? Why?
Questions over questions :D



To check if it takes more than 5 seconds, I restored the original network disabling code and 
increased the timeout to 30 seconds. If this works without error, I'll try to margin the time. I'm 
still running that branch every second.


Larry


Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st
SET_MODULE_OWNER(net_dev);
SET_NETDEV_DEV(net_dev, pdev-dev);

+   net_dev-watchdog_timeo = 30 * HZ;
net_dev-open = bcm43xx_net_open;
net_dev-stop = bcm43xx_net_stop;
net_dev-get_stats = bcm43xx_net_get_stats;



Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3207,7 +3207,7 @@ static void do_periodic_work(struct bcm4
bcm43xx_periodic_every15sec(bcm);
bcm-periodic_state = state + 1;
 
-   schedule_delayed_work(bcm-periodic_work, HZ * 15);
+   schedule_delayed_work(bcm-periodic_work, HZ * 1);
 }
 
 /* Estimate a Badness value based on the periodic work
@@ -3227,7 +3227,7 @@ static int estimate_periodic_work_badnes
if (state % 1 == 0) /* every 15 sec */
badness += 1;
 
-#define BADNESS_LIMIT  4
+#define BADNESS_LIMIT  0
return badness;
 }
 
@@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st
SET_MODULE_OWNER(net_dev);
SET_NETDEV_DEV(net_dev, pdev-dev);
 
+   net_dev-watchdog_timeo = 30 * HZ;
net_dev-open = bcm43xx_net_open;
net_dev-stop = bcm43xx_net_stop;
net_dev-get_stats = bcm43xx_net_get_stats;

___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev