Re: [PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523

2009-11-23 Thread Larry Finger
On 11/23/2009 11:37 PM, Peter Lemenkov wrote:
> 
> Seems that b43 has similar issue(s):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=539267
> 

That one has already been addressed by mainline commit no.
d50bae33d1358b909ade05ae, and sent on to stable.

Thanks for the notification,

Larry
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523

2009-11-23 Thread Peter Lemenkov
Hello All!

2009/11/24 Larry Finger :
> For PPC architecture with PHY Revision < 3, a read of the register
> B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43legacy_status()
> returns a value of 2 (B43legacy_STAT_STARTED); however, one finds that
> the driver is unable to associate after resuming from hibernation unless
> this routine returns 1. To satisfy both conditions, the routine is rewritten
> to return TRUE whenever b43legacy_status() returns a value < 2.
>
> This patch fixes the second problem listed in the postings for Red Hat
> Bugzilla #538523.
>
> Signed-off-by: Larry Finger 
> Cc: Stable 

Seems that b43 has similar issue(s):

https://bugzilla.redhat.com/show_bug.cgi?id=539267

-- 
With best regards, Peter Lemenkov.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: Fatal DMA error problem with netbook and BCM4312

2009-11-23 Thread William Bourque

Larry Finger wrote:
> One last check. I would appreciate receiving answers to the following 
> questions.
> These questions apply to anyone else with this problem.
> 
> Does the pm_qos patch help your "fatal DMA error" problem, particularly when
> booted from power-off?
> 
> If you warm-boot after loading the wl driver, does the patch make any 
> difference?
> 
> Larry

Hi

I run a test case scenario on my notebook to figure out the QOS patch
effect on the card reliability.
I would say that the results are not very conclusive, but it seems the
patch helped slightly, but not very time.

To explain the methodology, I ran 4 series of test (Cold boot and warm
boot, on battery and pluged in) between a patched and unpatched
wireless-testing kernel.
The time in the sheet is the time between the first line of the load of
the firmware (b43 ssb0:0: firmware: requesting b43/ucode15.fw) to the
first DMA error (b43-phy0 ERROR: Fatal DMA error: )
Three time, the card crashed but not DMA error were printed in the log.
In that case, the line about the interface failing (b43-phy0 debug:
Wireless interface stopped) were used instead.
The "time" is taken accordingly to kernel timestamp printed by dmesg.

Note that this test was design to make the card fails quickly. As soon
as the card was up, the full kernel tree was transfert over a wireless
LAN, as it has been observed before that fast rate transfert were good
at crashing DMA. (I tried "ping -f 172.16.0.1" before but the card was
still alive after 5 minutes).
Technically the following command was run : ifup wlan0 && scp -r
/usr/src/linux 172.16.0.1:~/


The result seems to show that the patch has no effect on warm-booted
system but it seems to make the card slightly more reliable on
cold-booted system, with an average time before crashing of 24.07
(patched cold boot) vs 22.98 (unpatched coldboot).

- William
Command issued : ifup wlan0 && scp -r /usr/src/linux 172.16.0.1:~/

Count start line : b43 ssb0:0: firmware: requesting b43/ucode15.fw
Count end line   : b43-phy0 ERROR: Fatal DMA error:
Alternate end line: b43-phy0 debug: Wireless interface stopped

 QOS PATH 
Warm boot on Battery: 
1-14.030134
2-14.050803
3-14.012732

Cold boot on Battery :
1-16.981828
2-*29.269881
3-*25.666901

Warm boot Plugged in : 
1-13.992476
2-14.000156
3-14.047689
4-14.010163

Cold boot Plugged in :
1-16.454922
2-16.005255
3-30.873647
4-33.235585


** NO QOS PATCH:
Warm boot on Battery: 
1-14.010197
2-14.043525
3-14.040962

Cold boot on Battery :
1-22.858424
2-24.253975
3-15.584923

Warm boot Plugged in : 
1-13.994969
2-14.022592
3-14.000205

Cold boot Plugged in :
1-19.990719
2-*24.234147
3-29.321423
4-24.582662
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523

2009-11-23 Thread Larry Finger
For PPC architecture with PHY Revision < 3, a read of the register
B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43legacy_status()
returns a value of 2 (B43legacy_STAT_STARTED); however, one finds that
the driver is unable to associate after resuming from hibernation unless
this routine returns 1. To satisfy both conditions, the routine is rewritten
to return TRUE whenever b43legacy_status() returns a value < 2.

This patch fixes the second problem listed in the postings for Red Hat
Bugzilla #538523.

Signed-off-by: Larry Finger 
Cc: Stable 
---

John,

This is 2.6.32 material.

Larry
---

Index: wireless-testing/drivers/net/wireless/b43legacy/rfkill.c
===
--- wireless-testing.orig/drivers/net/wireless/b43legacy/rfkill.c
+++ wireless-testing/drivers/net/wireless/b43legacy/rfkill.c
@@ -34,6 +34,13 @@ bool b43legacy_is_hw_radio_enabled(struc
  & B43legacy_MMIO_RADIO_HWENABLED_HI_MASK))
return 1;
} else {
+   /* To prevent CPU fault on PPC, do not read a register
+* unless the interface is started; however, on resume
+* for hibernation, this routine is entered early. When
+* that happens, unconditionally return TRUE.
+*/
+   if (b43legacy_status(dev) < B43legacy_STAT_STARTED)
+   return 1;
if (b43legacy_read16(dev, B43legacy_MMIO_RADIO_HWENABLED_LO)
& B43legacy_MMIO_RADIO_HWENABLED_LO_MASK)
return 1;
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[PATCH] b43: Fix regression from Bug #14538

2009-11-23 Thread Larry Finger
The routine b43_is_hw_radio_enabled() has long been a problem.
For PPC architecture with PHY Revision < 3, a read of the register
B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status()
returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that
results in Bug 14538 in which the driver is unable to reassociate
after resuming from hibernation because b43_status() returns 0.

The correct fix would be to determine why the status is 0; however,
I have not yet found why that happens. The correct value is found for
my device, which has PHY revision >= 3.

Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes
the regression for 2.6.32.

This patch fixes the problem in Red Hat Bugzilla #538523.

Signed-off-by: Larry Finger 
Tested-by: Christian Casteyde 
---

John,

Your suggested change was a good one and has been implemented. This is
2.6.32 material and should be backported to stable; however, it may not
apply cleanly. Should I make it Cc: stable, or submit it to GregKH in the
proper form once it is committed to mainline?

Larry
---


Index: wireless-testing/drivers/net/wireless/b43/rfkill.c
===
--- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c
+++ wireless-testing/drivers/net/wireless/b43/rfkill.c
@@ -33,8 +33,14 @@ bool b43_is_hw_radio_enabled(struct b43_
  & B43_MMIO_RADIO_HWENABLED_HI_MASK))
return 1;
} else {
-   if (b43_status(dev) >= B43_STAT_STARTED &&
-   b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
+   /* To prevent CPU fault on PPC, do not read a register
+* unless the interface is started; however, on resume
+* for hibernation, this routine is entered early. When
+* that happens, unconditionally return TRUE.
+*/
+   if (b43_status(dev) < B43_STAT_STARTED)
+   return 1;
+   if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
& B43_MMIO_RADIO_HWENABLED_LO_MASK)
return 1;
}
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] b43: Fix regression from Bug #14538

2009-11-23 Thread John W. Linville
On Mon, Nov 23, 2009 at 01:55:06PM -0600, Larry Finger wrote:
> The routine b43_is_hw_radio_enabled() has long been a problem.
> For PPC architecture with PHY Revision < 3, a read of the register
> B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status()
> returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that
> results in Bug 14538 in which the driver is unable to reassociate
> after resuming from hibernation because b43_status() returns 0.
> 
> The correct fix would be to determine why the status is 0; however,
> I have not yet found why that happens. The correct value is found for
> my device, which has PHY revision >= 3.
> 
> Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes
> the regression for 2.6.32.
> 
> Signed-off-by: Larry Finger 
> Tested-by: Christian Casteyde 
> ---
> 
> Index: wireless-testing/drivers/net/wireless/b43/rfkill.c
> ===
> --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c
> +++ wireless-testing/drivers/net/wireless/b43/rfkill.c
> @@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_
> & B43_MMIO_RADIO_HWENABLED_HI_MASK))
>   return 1;
>   } else {
> - if (b43_status(dev) >= B43_STAT_STARTED &&
> - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
> - & B43_MMIO_RADIO_HWENABLED_LO_MASK)
> + /* To prevent CPU fault on PPC, do not read a register
> +  * unless the interface is started; however, on resume
> +  * for hibernation, this routine is entered early. When
> +  * that happens, unconditionally return TRUE.
> +  */
> + if (b43_status(dev) >= B43_STAT_STARTED) {
> + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
> + & B43_MMIO_RADIO_HWENABLED_LO_MASK)
> + return 1;
> + } else
>   return 1;
>   }
>   return 0;

Maybe just me, but I think this version is easier to read (and especially to 
see the difference):

diff --git a/drivers/net/wireless/b43/rfkill.c 
b/drivers/net/wireless/b43/rfkill.c
index ffdce6f..ddc3c93 100644
--- a/drivers/net/wireless/b43/rfkill.c
+++ b/drivers/net/wireless/b43/rfkill.c
@@ -33,8 +33,14 @@ bool b43_is_hw_radio_enabled(struct b43_wldev *dev)
  & B43_MMIO_RADIO_HWENABLED_HI_MASK))
return 1;
} else {
-   if (b43_status(dev) >= B43_STAT_STARTED &&
-   b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
+   /* To prevent CPU fault on PPC, do not read a register
+* unless the interface is started; however, on resume
+* for hibernation, this routine is entered early. When
+* that happens, unconditionally return TRUE.
+*/
+   if (b43_status(dev) < B43_STAT_STARTED)
+   return 1;
+   if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
& B43_MMIO_RADIO_HWENABLED_LO_MASK)
return 1;
}
-- 
John W. LinvilleSomeday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [RFC] b43: Fix regression from Bug #14538

2009-11-23 Thread Michael Buesch
On Monday 23 November 2009 20:55:06 Larry Finger wrote:
> The routine b43_is_hw_radio_enabled() has long been a problem.
> For PPC architecture with PHY Revision < 3, a read of the register
> B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status()
> returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that
> results in Bug 14538 in which the driver is unable to reassociate
> after resuming from hibernation because b43_status() returns 0.
> 
> The correct fix would be to determine why the status is 0; however,
> I have not yet found why that happens. The correct value is found for
> my device, which has PHY revision >= 3.
> 
> Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes
> the regression for 2.6.32.
> 
> Signed-off-by: Larry Finger 
> Tested-by: Christian Casteyde 
> ---
> 
> Index: wireless-testing/drivers/net/wireless/b43/rfkill.c
> ===
> --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c
> +++ wireless-testing/drivers/net/wireless/b43/rfkill.c
> @@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_
> & B43_MMIO_RADIO_HWENABLED_HI_MASK))
>   return 1;
>   } else {
> - if (b43_status(dev) >= B43_STAT_STARTED &&
> - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
> - & B43_MMIO_RADIO_HWENABLED_LO_MASK)
> + /* To prevent CPU fault on PPC, do not read a register
> +  * unless the interface is started; however, on resume
> +  * for hibernation, this routine is entered early. When
> +  * that happens, unconditionally return TRUE.
> +  */
> + if (b43_status(dev) >= B43_STAT_STARTED) {
> + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
> + & B43_MMIO_RADIO_HWENABLED_LO_MASK)
> + return 1;
> + } else
>   return 1;
>   }
>   return 0;

looks OK as quick workaround.

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[PATCH] ssb: Fix range check in sprom write

2009-11-23 Thread Michael Buesch
The range check in the sprom image parser hex2sprom() is broken.
One sprom word is 4 hex characters.
This fixes the check and also adds much better sanity checks to the code.
We better make sure the image is OK by doing some sanity checks to avoid
bricking the device by accident.

Signed-off-by: Michael Buesch 
Cc: sta...@kernel.org

---

Index: wireless-testing/drivers/ssb/sprom.c
===
--- wireless-testing.orig/drivers/ssb/sprom.c   2009-11-23 14:24:57.0 
+0100
+++ wireless-testing/drivers/ssb/sprom.c2009-11-23 20:43:04.0 
+0100
@@ -13,6 +13,8 @@
 
 #include "ssb_private.h"
 
+#include 
+
 
 static const struct ssb_sprom *fallback_sprom;
 
@@ -33,17 +35,27 @@ static int sprom2hex(const u16 *sprom, c
 static int hex2sprom(u16 *sprom, const char *dump, size_t len,
 size_t sprom_size_words)
 {
-   char tmp[5] = { 0 };
-   int cnt = 0;
+   char c, tmp[5] = { 0 };
+   int err, cnt = 0;
unsigned long parsed;
 
-   if (len < sprom_size_words * 2)
+   /* Strip whitespace at the end. */
+   while (len) {
+   c = dump[len - 1];
+   if (!isspace(c) && c != '\0')
+   break;
+   len--;
+   }
+   /* Length must match exactly. */
+   if (len != sprom_size_words * 4)
return -EINVAL;
 
while (cnt < sprom_size_words) {
memcpy(tmp, dump, 4);
dump += 4;
-   parsed = simple_strtoul(tmp, NULL, 16);
+   err = strict_strtoul(tmp, 16, &parsed);
+   if (err)
+   return err;
sprom[cnt++] = swab16((u16)parsed);
}
 

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[RFC] b43: Fix regression from Bug #14538

2009-11-23 Thread Larry Finger
The routine b43_is_hw_radio_enabled() has long been a problem.
For PPC architecture with PHY Revision < 3, a read of the register
B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status()
returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that
results in Bug 14538 in which the driver is unable to reassociate
after resuming from hibernation because b43_status() returns 0.

The correct fix would be to determine why the status is 0; however,
I have not yet found why that happens. The correct value is found for
my device, which has PHY revision >= 3.

Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes
the regression for 2.6.32.

Signed-off-by: Larry Finger 
Tested-by: Christian Casteyde 
---

Index: wireless-testing/drivers/net/wireless/b43/rfkill.c
===
--- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c
+++ wireless-testing/drivers/net/wireless/b43/rfkill.c
@@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_
  & B43_MMIO_RADIO_HWENABLED_HI_MASK))
return 1;
} else {
-   if (b43_status(dev) >= B43_STAT_STARTED &&
-   b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
-   & B43_MMIO_RADIO_HWENABLED_LO_MASK)
+   /* To prevent CPU fault on PPC, do not read a register
+* unless the interface is started; however, on resume
+* for hibernation, this routine is entered early. When
+* that happens, unconditionally return TRUE.
+*/
+   if (b43_status(dev) >= B43_STAT_STARTED) {
+   if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
+   & B43_MMIO_RADIO_HWENABLED_LO_MASK)
+   return 1;
+   } else
return 1;
}
return 0;

___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


[PATCH] ssb: Fix SPROM writing

2009-11-23 Thread Michael Buesch
The SPROM writing routines were broken since we rewrote the suspend
handling on wireless devices, because SPROM writing depended on suspend.

This patch changes it and freezes devices with the driver remove(), probe()
callbacks instead. This also simplifies the whole logics a lot.

Signed-off-by: Michael Buesch 

---

Tested on BCM4306


Index: wireless-testing/drivers/ssb/main.c
===
--- wireless-testing.orig/drivers/ssb/main.c2009-11-06 21:45:29.0 
+0100
+++ wireless-testing/drivers/ssb/main.c 2009-11-23 19:59:24.0 +0100
@@ -140,6 +140,19 @@ static void ssb_device_put(struct ssb_de
put_device(dev->dev);
 }
 
+static inline struct ssb_driver *ssb_driver_get(struct ssb_driver *drv)
+{
+   if (drv)
+   get_driver(&drv->drv);
+   return drv;
+}
+
+static inline void ssb_driver_put(struct ssb_driver *drv)
+{
+   if (drv)
+   put_driver(&drv->drv);
+}
+
 static int ssb_device_resume(struct device *dev)
 {
struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
@@ -210,90 +223,81 @@ int ssb_bus_suspend(struct ssb_bus *bus)
 EXPORT_SYMBOL(ssb_bus_suspend);
 
 #ifdef CONFIG_SSB_SPROM
-int ssb_devices_freeze(struct ssb_bus *bus)
+/** ssb_devices_freeze - Freeze all devices on the bus.
+ *
+ * After freezing no device driver will be handling a device
+ * on this bus anymore. ssb_devices_thaw() must be called after
+ * a successful freeze to reactivate the devices.
+ *
+ * @bus: The bus.
+ * @ctx: Context structure. Pass this to ssb_devices_thaw().
+ */
+int ssb_devices_freeze(struct ssb_bus *bus, struct ssb_freeze_context *ctx)
 {
-   struct ssb_device *dev;
-   struct ssb_driver *drv;
-   int err = 0;
-   int i;
-   pm_message_t state = PMSG_FREEZE;
+   struct ssb_device *sdev;
+   struct ssb_driver *sdrv;
+   unsigned int i;
+
+   memset(ctx, 0, sizeof(*ctx));
+   ctx->bus = bus;
+   SSB_WARN_ON(bus->nr_devices > ARRAY_SIZE(ctx->device_frozen));
 
-   /* First check that we are capable to freeze all devices. */
for (i = 0; i < bus->nr_devices; i++) {
-   dev = &(bus->devices[i]);
-   if (!dev->dev ||
-   !dev->dev->driver ||
-   !device_is_registered(dev->dev))
-   continue;
-   drv = drv_to_ssb_drv(dev->dev->driver);
-   if (!drv)
+   sdev = ssb_device_get(&bus->devices[i]);
+
+   if (!sdev->dev || !sdev->dev->driver ||
+   !device_is_registered(sdev->dev)) {
+   ssb_device_put(sdev);
continue;
-   if (!drv->suspend) {
-   /* Nope, can't suspend this one. */
-   return -EOPNOTSUPP;
}
-   }
-   /* Now suspend all devices */
-   for (i = 0; i < bus->nr_devices; i++) {
-   dev = &(bus->devices[i]);
-   if (!dev->dev ||
-   !dev->dev->driver ||
-   !device_is_registered(dev->dev))
-   continue;
-   drv = drv_to_ssb_drv(dev->dev->driver);
-   if (!drv)
+   sdrv = ssb_driver_get(drv_to_ssb_drv(sdev->dev->driver));
+   if (!sdrv || SSB_WARN_ON(!sdrv->remove)) {
+   ssb_device_put(sdev);
continue;
-   err = drv->suspend(dev, state);
-   if (err) {
-   ssb_printk(KERN_ERR PFX "Failed to freeze device %s\n",
-  dev_name(dev->dev));
-   goto err_unwind;
}
+   sdrv->remove(sdev);
+   ctx->device_frozen[i] = 1;
}
 
return 0;
-err_unwind:
-   for (i--; i >= 0; i--) {
-   dev = &(bus->devices[i]);
-   if (!dev->dev ||
-   !dev->dev->driver ||
-   !device_is_registered(dev->dev))
-   continue;
-   drv = drv_to_ssb_drv(dev->dev->driver);
-   if (!drv)
-   continue;
-   if (drv->resume)
-   drv->resume(dev);
-   }
-   return err;
 }
 
-int ssb_devices_thaw(struct ssb_bus *bus)
+/** ssb_devices_thaw - Unfreeze all devices on the bus.
+ *
+ * This will re-attach the device drivers and re-init the devices.
+ *
+ * @ctx: The context structure from ssb_devices_freeze()
+ */
+int ssb_devices_thaw(struct ssb_freeze_context *ctx)
 {
-   struct ssb_device *dev;
-   struct ssb_driver *drv;
-   int err;
-   int i;
+   struct ssb_bus *bus = ctx->bus;
+   struct ssb_device *sdev;
+   struct ssb_driver *sdrv;
+   unsigned int i;
+   int err, result = 0;
 
for (i = 0; i < bus->nr_devices; i++) {
-   dev = &(bus->devices[i]);
-   if (!dev->dev ||
-

Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

2009-11-23 Thread Larry Finger
On 11/23/2009 05:00 AM, Francesco Gringoli wrote:
> 
> so you can observe this behavior at your site. Do you mind describing
> the exact configuration? Maybe this time I can reproduce this behavior,
> as I tried everything to make it happen. I also asked Larry one of his
> boards and put it into several PCs but had no chance to reproduce the
> crash. Could you please also report the neighboring stations, the AP you
> are connected and so on.

As Michael said, I was the one that reported the behavior. My card was a BCM4318
in a Cardbus format running V5.2 of the openfwwf. The AP is a Linksys WRT54G V5
running standard firmware v1.02.6. A list of nearby AP's with channels and
strengths are as follows:

  Cell 01 - Address: 00:14:BF:85:49:FA
Channel:1
Frequency:2.412 GHz (Channel 1)
Quality=70/70  Signal level=-6 dBm
Encryption key:on  (WPA2)
ESSID:"lwfdjf_rad"
  Cell 02 - Address: 00:1B:11:5C:B0:83
Channel:1
Frequency:2.412 GHz (Channel 1)
Quality=25/70  Signal level=-85 dBm
Encryption key:on   (WEP)
ESSID:"Browns"
  Cell 03 - Address: 00:1A:70:46:BA:B1
Channel:11
Frequency:2.462 GHz (Channel 11)
Quality=42/70  Signal level=-68 dBm
Encryption key:on(WEP)
ESSID:"Larry with space"
  Cell 04 - Address: 00:23:69:81:B7:D9
Channel:6
Frequency:2.437 GHz (Channel 6)
Quality=34/70  Signal level=-76 dBm
Encryption key:on(WEP)
ESSID:"Hoover"
  Cell 05 - Address: 00:14:BF:0C:7E:14
Channel:6
Frequency:2.437 GHz (Channel 6)
Quality=28/70  Signal level=-82 dBm
Encryption key:on (WPA)
ESSID:"linksys"
  Cell 06 - Address: 00:22:6B:78:18:7D
Channel:11
Frequency:2.462 GHz (Channel 11)
Quality=26/70  Signal level=-84 dBm
Encryption key:on (WPA)
ESSID:"Go Away"

I was connected to the AP in Cell 01. The test was my usual with a repeating
tcpperf run in one terminal and a flood ping to the same server in a second.
Please let me know if I missed any useful information.

This condition may take a long time to show up. For instance, my latest test ran
for 25 hours before failure. All other tests have failed much more quickly, but
one never knows. I have not seen this failure with standard firmware.

Larry

Larry





___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

2009-11-23 Thread Francesco Gringoli
On Nov 23, 2009, at 11:49 AM, Michael Buesch wrote:

> On Monday 23 November 2009 05:45:47 Larry Finger wrote:
>> On 11/19/2009 03:24 PM, Michael Buesch wrote:
>>> This rewrites the error handling policies in the TX status handler.
>>> It tries to be error-tolerant as in "try hard to not crash the  
>>> machine".
>>> It won't recover from errors (that are bugs in the firmware or  
>>> driver),
>>> because that's impossible. However, it will return a more or less  
>>> useful
>>> error message and bail out. It also tries hard to use rate-limited  
>>> messages
>>> to not flood the syslog in case of a failure.
>>
>> This patch definitely helped open-source firmware, but it is not a  
>> complete fix.
>
> It is no fix _at_ _all_.
> The patch does not change a single line of code that wasn't either  
> an assertion
> or a machine crash before.
> So it just transforms assertions into more verbose assertions and  
> crashes into
> assertions without a crash.
>
>> debug: Out of order TX status report on DMA ring 1. Expected 114,  
>> but got 146
>
> Ok, this is what I expected.
>
> Let's see what's going on. Here's the ring. o is unused, * is used.
>
> ooo 
> ***ooo
>   ^   ^ ^
>   114 146   newest
>   oldest
>
> So as you can see, the firmware reported a TX status for a frame  
> right in the middle of
> the ringbuffer. The new code detects this now before getting a  
> double free and/or silent
> memory corruption (freeing of used memory).
Hi Michael,

so you can observe this behavior at your site. Do you mind describing  
the exact configuration? Maybe this time I can reproduce this  
behavior, as I tried everything to make it happen. I also asked Larry  
one of his boards and put it into several PCs but had no chance to  
reproduce the crash. Could you please also report the neighboring  
stations, the AP you are connected and so on.

Many thanks,
-Francesco



Informativa sulla privacy: http://help.ing.unibs.it/privacy.php


___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

2009-11-23 Thread Michael Buesch
On Monday 23 November 2009 12:00:15 Francesco Gringoli wrote:
> Hi Michael,
> 
> so you can observe this behavior at your site. Do you mind describing  
> the exact configuration? Maybe this time I can reproduce this  
> behavior, as I tried everything to make it happen. I also asked Larry  
> one of his boards and put it into several PCs but had no chance to  
> reproduce the crash. Could you please also report the neighboring  
> stations, the AP you are connected and so on.

I don't reproduce this, because I never really tried running the opensource 
firmware.
Larry reproduced it, this time.

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

2009-11-23 Thread Michael Buesch
On Monday 23 November 2009 05:45:47 Larry Finger wrote:
> On 11/19/2009 03:24 PM, Michael Buesch wrote:
> > This rewrites the error handling policies in the TX status handler.
> > It tries to be error-tolerant as in "try hard to not crash the machine".
> > It won't recover from errors (that are bugs in the firmware or driver),
> > because that's impossible. However, it will return a more or less useful
> > error message and bail out. It also tries hard to use rate-limited messages
> > to not flood the syslog in case of a failure.
> 
> This patch definitely helped open-source firmware, but it is not a complete 
> fix.

It is no fix _at_ _all_.
The patch does not change a single line of code that wasn't either an assertion
or a machine crash before.
So it just transforms assertions into more verbose assertions and crashes into
assertions without a crash.

> debug: Out of order TX status report on DMA ring 1. Expected 114, but got 146

Ok, this is what I expected.

Let's see what's going on. Here's the ring. o is unused, * is used.

ooo***ooo
   ^   ^ ^
   114 146   newest
   oldest

So as you can see, the firmware reported a TX status for a frame right in the 
middle of
the ringbuffer. The new code detects this now before getting a double free 
and/or silent
memory corruption (freeing of used memory).

It really is illegal to report a TX status for a frame that's not the oldest 
one in the ring.
The firmware is required to process all frames in-order on one ring.

So how can this failure happen? I think there basically are three ways this can 
happen.

- First is that the ordering within one ring really gets messed up and it loses 
track
  of its ring pointers. I'm not sure if this is likely. Probably not.
- It messes up the ring membership. So it reports a TX status on the wrong ring.
  Note that the "ring" kernel pointer in the TX status report handler is derived
  from the cookie (and so also the number in the message "Out of order TX status
  report on DMA ring 1" is derived from the cookie). So it's untrustworthy in 
case of
  broken firmware. The firmware has QoS-alike mechanisms, even if QoS is 
disabled. Maybe
  these mechanisms are broken.
- Third is the possibility of a driver bug. I rule that out as long as nobody is
  able to reproduce it with proprietary firmware.

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: Fatal DMA error problem with netbook and BCM4312

2009-11-23 Thread Johannes Berg
On Sun, 2009-11-22 at 19:52 -0600, Larry Finger wrote:

> We know that the wl driver does something to the interface that persists 
> across
> a warm boot - we just do not know what. It does not appear to be done in any 
> of
> the MMIO traffic - at least I have not seen it in the mmio-trace output. If
> anyone has a KVM setup using PCI passthrough, it is possible to trace PCI
> configuration traffic?

I'm pretty sure even the binary driver has to go through
drivers/pci/access.c, maybe you can just insert logging into that code?

johannes


signature.asc
Description: This is a digitally signed message part
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

2009-11-23 Thread Michael Buesch
On Monday 23 November 2009 02:34:09 Larry Finger wrote:
> On 11/19/2009 03:24 PM, Michael Buesch wrote:
> > This rewrites the error handling policies in the TX status handler.
> > It tries to be error-tolerant as in "try hard to not crash the machine".
> > It won't recover from errors (that are bugs in the firmware or driver),
> > because that's impossible. However, it will return a more or less useful
> > error message and bail out. It also tries hard to use rate-limited messages
> > to not flood the syslog in case of a failure.
> > 
> > Signed-off-by: Michael Buesch 
> > 
> > ---
> 
> Tested and ACKed by Larry Finger. Not only does this improve the error 
> handling
> for b43, but it also appears to fix the skb == NULL error that I experienced
> with the open-source firmware.

I don't think there's any way it can fix this. The patch doesn't change the
code behavior. It just changes the sanity checks, that under normal 
circumstances
should never trigger.

> John - please push this into wireless-testing. It should also go to 2.6.32, 
> but
> it is likely too large for the current stage. At least Cc it to stable.

Don't put it into stable. This is not a fix.
I don't think it's suitable for 2.6.32 at this stage, too.

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev