Re: [PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523
On 11/23/2009 11:37 PM, Peter Lemenkov wrote: > > Seems that b43 has similar issue(s): > > https://bugzilla.redhat.com/show_bug.cgi?id=539267 > That one has already been addressed by mainline commit no. d50bae33d1358b909ade05ae, and sent on to stable. Thanks for the notification, Larry ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523
Hello All! 2009/11/24 Larry Finger : > For PPC architecture with PHY Revision < 3, a read of the register > B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43legacy_status() > returns a value of 2 (B43legacy_STAT_STARTED); however, one finds that > the driver is unable to associate after resuming from hibernation unless > this routine returns 1. To satisfy both conditions, the routine is rewritten > to return TRUE whenever b43legacy_status() returns a value < 2. > > This patch fixes the second problem listed in the postings for Red Hat > Bugzilla #538523. > > Signed-off-by: Larry Finger > Cc: Stable Seems that b43 has similar issue(s): https://bugzilla.redhat.com/show_bug.cgi?id=539267 -- With best regards, Peter Lemenkov. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: Fatal DMA error problem with netbook and BCM4312
Larry Finger wrote: > One last check. I would appreciate receiving answers to the following > questions. > These questions apply to anyone else with this problem. > > Does the pm_qos patch help your "fatal DMA error" problem, particularly when > booted from power-off? > > If you warm-boot after loading the wl driver, does the patch make any > difference? > > Larry Hi I run a test case scenario on my notebook to figure out the QOS patch effect on the card reliability. I would say that the results are not very conclusive, but it seems the patch helped slightly, but not very time. To explain the methodology, I ran 4 series of test (Cold boot and warm boot, on battery and pluged in) between a patched and unpatched wireless-testing kernel. The time in the sheet is the time between the first line of the load of the firmware (b43 ssb0:0: firmware: requesting b43/ucode15.fw) to the first DMA error (b43-phy0 ERROR: Fatal DMA error: ) Three time, the card crashed but not DMA error were printed in the log. In that case, the line about the interface failing (b43-phy0 debug: Wireless interface stopped) were used instead. The "time" is taken accordingly to kernel timestamp printed by dmesg. Note that this test was design to make the card fails quickly. As soon as the card was up, the full kernel tree was transfert over a wireless LAN, as it has been observed before that fast rate transfert were good at crashing DMA. (I tried "ping -f 172.16.0.1" before but the card was still alive after 5 minutes). Technically the following command was run : ifup wlan0 && scp -r /usr/src/linux 172.16.0.1:~/ The result seems to show that the patch has no effect on warm-booted system but it seems to make the card slightly more reliable on cold-booted system, with an average time before crashing of 24.07 (patched cold boot) vs 22.98 (unpatched coldboot). - William Command issued : ifup wlan0 && scp -r /usr/src/linux 172.16.0.1:~/ Count start line : b43 ssb0:0: firmware: requesting b43/ucode15.fw Count end line : b43-phy0 ERROR: Fatal DMA error: Alternate end line: b43-phy0 debug: Wireless interface stopped QOS PATH Warm boot on Battery: 1-14.030134 2-14.050803 3-14.012732 Cold boot on Battery : 1-16.981828 2-*29.269881 3-*25.666901 Warm boot Plugged in : 1-13.992476 2-14.000156 3-14.047689 4-14.010163 Cold boot Plugged in : 1-16.454922 2-16.005255 3-30.873647 4-33.235585 ** NO QOS PATCH: Warm boot on Battery: 1-14.010197 2-14.043525 3-14.040962 Cold boot on Battery : 1-22.858424 2-24.253975 3-15.584923 Warm boot Plugged in : 1-13.994969 2-14.022592 3-14.000205 Cold boot Plugged in : 1-19.990719 2-*24.234147 3-29.321423 4-24.582662 ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[PATCH] b43legacy: Fix problem for PPC architecture noted in Red Hat Bugzilla #538523
For PPC architecture with PHY Revision < 3, a read of the register B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43legacy_status() returns a value of 2 (B43legacy_STAT_STARTED); however, one finds that the driver is unable to associate after resuming from hibernation unless this routine returns 1. To satisfy both conditions, the routine is rewritten to return TRUE whenever b43legacy_status() returns a value < 2. This patch fixes the second problem listed in the postings for Red Hat Bugzilla #538523. Signed-off-by: Larry Finger Cc: Stable --- John, This is 2.6.32 material. Larry --- Index: wireless-testing/drivers/net/wireless/b43legacy/rfkill.c === --- wireless-testing.orig/drivers/net/wireless/b43legacy/rfkill.c +++ wireless-testing/drivers/net/wireless/b43legacy/rfkill.c @@ -34,6 +34,13 @@ bool b43legacy_is_hw_radio_enabled(struc & B43legacy_MMIO_RADIO_HWENABLED_HI_MASK)) return 1; } else { + /* To prevent CPU fault on PPC, do not read a register +* unless the interface is started; however, on resume +* for hibernation, this routine is entered early. When +* that happens, unconditionally return TRUE. +*/ + if (b43legacy_status(dev) < B43legacy_STAT_STARTED) + return 1; if (b43legacy_read16(dev, B43legacy_MMIO_RADIO_HWENABLED_LO) & B43legacy_MMIO_RADIO_HWENABLED_LO_MASK) return 1; ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[PATCH] b43: Fix regression from Bug #14538
The routine b43_is_hw_radio_enabled() has long been a problem. For PPC architecture with PHY Revision < 3, a read of the register B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status() returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that results in Bug 14538 in which the driver is unable to reassociate after resuming from hibernation because b43_status() returns 0. The correct fix would be to determine why the status is 0; however, I have not yet found why that happens. The correct value is found for my device, which has PHY revision >= 3. Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes the regression for 2.6.32. This patch fixes the problem in Red Hat Bugzilla #538523. Signed-off-by: Larry Finger Tested-by: Christian Casteyde --- John, Your suggested change was a good one and has been implemented. This is 2.6.32 material and should be backported to stable; however, it may not apply cleanly. Should I make it Cc: stable, or submit it to GregKH in the proper form once it is committed to mainline? Larry --- Index: wireless-testing/drivers/net/wireless/b43/rfkill.c === --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c +++ wireless-testing/drivers/net/wireless/b43/rfkill.c @@ -33,8 +33,14 @@ bool b43_is_hw_radio_enabled(struct b43_ & B43_MMIO_RADIO_HWENABLED_HI_MASK)) return 1; } else { - if (b43_status(dev) >= B43_STAT_STARTED && - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) + /* To prevent CPU fault on PPC, do not read a register +* unless the interface is started; however, on resume +* for hibernation, this routine is entered early. When +* that happens, unconditionally return TRUE. +*/ + if (b43_status(dev) < B43_STAT_STARTED) + return 1; + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) & B43_MMIO_RADIO_HWENABLED_LO_MASK) return 1; } ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] b43: Fix regression from Bug #14538
On Mon, Nov 23, 2009 at 01:55:06PM -0600, Larry Finger wrote: > The routine b43_is_hw_radio_enabled() has long been a problem. > For PPC architecture with PHY Revision < 3, a read of the register > B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status() > returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that > results in Bug 14538 in which the driver is unable to reassociate > after resuming from hibernation because b43_status() returns 0. > > The correct fix would be to determine why the status is 0; however, > I have not yet found why that happens. The correct value is found for > my device, which has PHY revision >= 3. > > Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes > the regression for 2.6.32. > > Signed-off-by: Larry Finger > Tested-by: Christian Casteyde > --- > > Index: wireless-testing/drivers/net/wireless/b43/rfkill.c > === > --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c > +++ wireless-testing/drivers/net/wireless/b43/rfkill.c > @@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_ > & B43_MMIO_RADIO_HWENABLED_HI_MASK)) > return 1; > } else { > - if (b43_status(dev) >= B43_STAT_STARTED && > - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) > - & B43_MMIO_RADIO_HWENABLED_LO_MASK) > + /* To prevent CPU fault on PPC, do not read a register > + * unless the interface is started; however, on resume > + * for hibernation, this routine is entered early. When > + * that happens, unconditionally return TRUE. > + */ > + if (b43_status(dev) >= B43_STAT_STARTED) { > + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) > + & B43_MMIO_RADIO_HWENABLED_LO_MASK) > + return 1; > + } else > return 1; > } > return 0; Maybe just me, but I think this version is easier to read (and especially to see the difference): diff --git a/drivers/net/wireless/b43/rfkill.c b/drivers/net/wireless/b43/rfkill.c index ffdce6f..ddc3c93 100644 --- a/drivers/net/wireless/b43/rfkill.c +++ b/drivers/net/wireless/b43/rfkill.c @@ -33,8 +33,14 @@ bool b43_is_hw_radio_enabled(struct b43_wldev *dev) & B43_MMIO_RADIO_HWENABLED_HI_MASK)) return 1; } else { - if (b43_status(dev) >= B43_STAT_STARTED && - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) + /* To prevent CPU fault on PPC, do not read a register +* unless the interface is started; however, on resume +* for hibernation, this routine is entered early. When +* that happens, unconditionally return TRUE. +*/ + if (b43_status(dev) < B43_STAT_STARTED) + return 1; + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) & B43_MMIO_RADIO_HWENABLED_LO_MASK) return 1; } -- John W. LinvilleSomeday the world will need a hero, and you linvi...@tuxdriver.com might be all we have. Be ready. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [RFC] b43: Fix regression from Bug #14538
On Monday 23 November 2009 20:55:06 Larry Finger wrote: > The routine b43_is_hw_radio_enabled() has long been a problem. > For PPC architecture with PHY Revision < 3, a read of the register > B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status() > returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that > results in Bug 14538 in which the driver is unable to reassociate > after resuming from hibernation because b43_status() returns 0. > > The correct fix would be to determine why the status is 0; however, > I have not yet found why that happens. The correct value is found for > my device, which has PHY revision >= 3. > > Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes > the regression for 2.6.32. > > Signed-off-by: Larry Finger > Tested-by: Christian Casteyde > --- > > Index: wireless-testing/drivers/net/wireless/b43/rfkill.c > === > --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c > +++ wireless-testing/drivers/net/wireless/b43/rfkill.c > @@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_ > & B43_MMIO_RADIO_HWENABLED_HI_MASK)) > return 1; > } else { > - if (b43_status(dev) >= B43_STAT_STARTED && > - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) > - & B43_MMIO_RADIO_HWENABLED_LO_MASK) > + /* To prevent CPU fault on PPC, do not read a register > + * unless the interface is started; however, on resume > + * for hibernation, this routine is entered early. When > + * that happens, unconditionally return TRUE. > + */ > + if (b43_status(dev) >= B43_STAT_STARTED) { > + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) > + & B43_MMIO_RADIO_HWENABLED_LO_MASK) > + return 1; > + } else > return 1; > } > return 0; looks OK as quick workaround. -- Greetings, Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[PATCH] ssb: Fix range check in sprom write
The range check in the sprom image parser hex2sprom() is broken. One sprom word is 4 hex characters. This fixes the check and also adds much better sanity checks to the code. We better make sure the image is OK by doing some sanity checks to avoid bricking the device by accident. Signed-off-by: Michael Buesch Cc: sta...@kernel.org --- Index: wireless-testing/drivers/ssb/sprom.c === --- wireless-testing.orig/drivers/ssb/sprom.c 2009-11-23 14:24:57.0 +0100 +++ wireless-testing/drivers/ssb/sprom.c2009-11-23 20:43:04.0 +0100 @@ -13,6 +13,8 @@ #include "ssb_private.h" +#include + static const struct ssb_sprom *fallback_sprom; @@ -33,17 +35,27 @@ static int sprom2hex(const u16 *sprom, c static int hex2sprom(u16 *sprom, const char *dump, size_t len, size_t sprom_size_words) { - char tmp[5] = { 0 }; - int cnt = 0; + char c, tmp[5] = { 0 }; + int err, cnt = 0; unsigned long parsed; - if (len < sprom_size_words * 2) + /* Strip whitespace at the end. */ + while (len) { + c = dump[len - 1]; + if (!isspace(c) && c != '\0') + break; + len--; + } + /* Length must match exactly. */ + if (len != sprom_size_words * 4) return -EINVAL; while (cnt < sprom_size_words) { memcpy(tmp, dump, 4); dump += 4; - parsed = simple_strtoul(tmp, NULL, 16); + err = strict_strtoul(tmp, 16, &parsed); + if (err) + return err; sprom[cnt++] = swab16((u16)parsed); } -- Greetings, Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[RFC] b43: Fix regression from Bug #14538
The routine b43_is_hw_radio_enabled() has long been a problem. For PPC architecture with PHY Revision < 3, a read of the register B43_MMIO_HWENABLED_LO will cause a CPU fault unless b43_status() returns a value of 2 (B43_STAT_STARTED) (BUG 14181). Fixing that results in Bug 14538 in which the driver is unable to reassociate after resuming from hibernation because b43_status() returns 0. The correct fix would be to determine why the status is 0; however, I have not yet found why that happens. The correct value is found for my device, which has PHY revision >= 3. Returning TRUE when the PHY revision < 3 and b43_status() returns 0 fixes the regression for 2.6.32. Signed-off-by: Larry Finger Tested-by: Christian Casteyde --- Index: wireless-testing/drivers/net/wireless/b43/rfkill.c === --- wireless-testing.orig/drivers/net/wireless/b43/rfkill.c +++ wireless-testing/drivers/net/wireless/b43/rfkill.c @@ -33,9 +33,16 @@ bool b43_is_hw_radio_enabled(struct b43_ & B43_MMIO_RADIO_HWENABLED_HI_MASK)) return 1; } else { - if (b43_status(dev) >= B43_STAT_STARTED && - b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) - & B43_MMIO_RADIO_HWENABLED_LO_MASK) + /* To prevent CPU fault on PPC, do not read a register +* unless the interface is started; however, on resume +* for hibernation, this routine is entered early. When +* that happens, unconditionally return TRUE. +*/ + if (b43_status(dev) >= B43_STAT_STARTED) { + if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO) + & B43_MMIO_RADIO_HWENABLED_LO_MASK) + return 1; + } else return 1; } return 0; ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
[PATCH] ssb: Fix SPROM writing
The SPROM writing routines were broken since we rewrote the suspend handling on wireless devices, because SPROM writing depended on suspend. This patch changes it and freezes devices with the driver remove(), probe() callbacks instead. This also simplifies the whole logics a lot. Signed-off-by: Michael Buesch --- Tested on BCM4306 Index: wireless-testing/drivers/ssb/main.c === --- wireless-testing.orig/drivers/ssb/main.c2009-11-06 21:45:29.0 +0100 +++ wireless-testing/drivers/ssb/main.c 2009-11-23 19:59:24.0 +0100 @@ -140,6 +140,19 @@ static void ssb_device_put(struct ssb_de put_device(dev->dev); } +static inline struct ssb_driver *ssb_driver_get(struct ssb_driver *drv) +{ + if (drv) + get_driver(&drv->drv); + return drv; +} + +static inline void ssb_driver_put(struct ssb_driver *drv) +{ + if (drv) + put_driver(&drv->drv); +} + static int ssb_device_resume(struct device *dev) { struct ssb_device *ssb_dev = dev_to_ssb_dev(dev); @@ -210,90 +223,81 @@ int ssb_bus_suspend(struct ssb_bus *bus) EXPORT_SYMBOL(ssb_bus_suspend); #ifdef CONFIG_SSB_SPROM -int ssb_devices_freeze(struct ssb_bus *bus) +/** ssb_devices_freeze - Freeze all devices on the bus. + * + * After freezing no device driver will be handling a device + * on this bus anymore. ssb_devices_thaw() must be called after + * a successful freeze to reactivate the devices. + * + * @bus: The bus. + * @ctx: Context structure. Pass this to ssb_devices_thaw(). + */ +int ssb_devices_freeze(struct ssb_bus *bus, struct ssb_freeze_context *ctx) { - struct ssb_device *dev; - struct ssb_driver *drv; - int err = 0; - int i; - pm_message_t state = PMSG_FREEZE; + struct ssb_device *sdev; + struct ssb_driver *sdrv; + unsigned int i; + + memset(ctx, 0, sizeof(*ctx)); + ctx->bus = bus; + SSB_WARN_ON(bus->nr_devices > ARRAY_SIZE(ctx->device_frozen)); - /* First check that we are capable to freeze all devices. */ for (i = 0; i < bus->nr_devices; i++) { - dev = &(bus->devices[i]); - if (!dev->dev || - !dev->dev->driver || - !device_is_registered(dev->dev)) - continue; - drv = drv_to_ssb_drv(dev->dev->driver); - if (!drv) + sdev = ssb_device_get(&bus->devices[i]); + + if (!sdev->dev || !sdev->dev->driver || + !device_is_registered(sdev->dev)) { + ssb_device_put(sdev); continue; - if (!drv->suspend) { - /* Nope, can't suspend this one. */ - return -EOPNOTSUPP; } - } - /* Now suspend all devices */ - for (i = 0; i < bus->nr_devices; i++) { - dev = &(bus->devices[i]); - if (!dev->dev || - !dev->dev->driver || - !device_is_registered(dev->dev)) - continue; - drv = drv_to_ssb_drv(dev->dev->driver); - if (!drv) + sdrv = ssb_driver_get(drv_to_ssb_drv(sdev->dev->driver)); + if (!sdrv || SSB_WARN_ON(!sdrv->remove)) { + ssb_device_put(sdev); continue; - err = drv->suspend(dev, state); - if (err) { - ssb_printk(KERN_ERR PFX "Failed to freeze device %s\n", - dev_name(dev->dev)); - goto err_unwind; } + sdrv->remove(sdev); + ctx->device_frozen[i] = 1; } return 0; -err_unwind: - for (i--; i >= 0; i--) { - dev = &(bus->devices[i]); - if (!dev->dev || - !dev->dev->driver || - !device_is_registered(dev->dev)) - continue; - drv = drv_to_ssb_drv(dev->dev->driver); - if (!drv) - continue; - if (drv->resume) - drv->resume(dev); - } - return err; } -int ssb_devices_thaw(struct ssb_bus *bus) +/** ssb_devices_thaw - Unfreeze all devices on the bus. + * + * This will re-attach the device drivers and re-init the devices. + * + * @ctx: The context structure from ssb_devices_freeze() + */ +int ssb_devices_thaw(struct ssb_freeze_context *ctx) { - struct ssb_device *dev; - struct ssb_driver *drv; - int err; - int i; + struct ssb_bus *bus = ctx->bus; + struct ssb_device *sdev; + struct ssb_driver *sdrv; + unsigned int i; + int err, result = 0; for (i = 0; i < bus->nr_devices; i++) { - dev = &(bus->devices[i]); - if (!dev->dev || -
Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks
On 11/23/2009 05:00 AM, Francesco Gringoli wrote: > > so you can observe this behavior at your site. Do you mind describing > the exact configuration? Maybe this time I can reproduce this behavior, > as I tried everything to make it happen. I also asked Larry one of his > boards and put it into several PCs but had no chance to reproduce the > crash. Could you please also report the neighboring stations, the AP you > are connected and so on. As Michael said, I was the one that reported the behavior. My card was a BCM4318 in a Cardbus format running V5.2 of the openfwwf. The AP is a Linksys WRT54G V5 running standard firmware v1.02.6. A list of nearby AP's with channels and strengths are as follows: Cell 01 - Address: 00:14:BF:85:49:FA Channel:1 Frequency:2.412 GHz (Channel 1) Quality=70/70 Signal level=-6 dBm Encryption key:on (WPA2) ESSID:"lwfdjf_rad" Cell 02 - Address: 00:1B:11:5C:B0:83 Channel:1 Frequency:2.412 GHz (Channel 1) Quality=25/70 Signal level=-85 dBm Encryption key:on (WEP) ESSID:"Browns" Cell 03 - Address: 00:1A:70:46:BA:B1 Channel:11 Frequency:2.462 GHz (Channel 11) Quality=42/70 Signal level=-68 dBm Encryption key:on(WEP) ESSID:"Larry with space" Cell 04 - Address: 00:23:69:81:B7:D9 Channel:6 Frequency:2.437 GHz (Channel 6) Quality=34/70 Signal level=-76 dBm Encryption key:on(WEP) ESSID:"Hoover" Cell 05 - Address: 00:14:BF:0C:7E:14 Channel:6 Frequency:2.437 GHz (Channel 6) Quality=28/70 Signal level=-82 dBm Encryption key:on (WPA) ESSID:"linksys" Cell 06 - Address: 00:22:6B:78:18:7D Channel:11 Frequency:2.462 GHz (Channel 11) Quality=26/70 Signal level=-84 dBm Encryption key:on (WPA) ESSID:"Go Away" I was connected to the AP in Cell 01. The test was my usual with a repeating tcpperf run in one terminal and a flood ping to the same server in a second. Please let me know if I missed any useful information. This condition may take a long time to show up. For instance, my latest test ran for 25 hours before failure. All other tests have failed much more quickly, but one never knows. I have not seen this failure with standard firmware. Larry Larry ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks
On Nov 23, 2009, at 11:49 AM, Michael Buesch wrote: > On Monday 23 November 2009 05:45:47 Larry Finger wrote: >> On 11/19/2009 03:24 PM, Michael Buesch wrote: >>> This rewrites the error handling policies in the TX status handler. >>> It tries to be error-tolerant as in "try hard to not crash the >>> machine". >>> It won't recover from errors (that are bugs in the firmware or >>> driver), >>> because that's impossible. However, it will return a more or less >>> useful >>> error message and bail out. It also tries hard to use rate-limited >>> messages >>> to not flood the syslog in case of a failure. >> >> This patch definitely helped open-source firmware, but it is not a >> complete fix. > > It is no fix _at_ _all_. > The patch does not change a single line of code that wasn't either > an assertion > or a machine crash before. > So it just transforms assertions into more verbose assertions and > crashes into > assertions without a crash. > >> debug: Out of order TX status report on DMA ring 1. Expected 114, >> but got 146 > > Ok, this is what I expected. > > Let's see what's going on. Here's the ring. o is unused, * is used. > > ooo > ***ooo > ^ ^ ^ > 114 146 newest > oldest > > So as you can see, the firmware reported a TX status for a frame > right in the middle of > the ringbuffer. The new code detects this now before getting a > double free and/or silent > memory corruption (freeing of used memory). Hi Michael, so you can observe this behavior at your site. Do you mind describing the exact configuration? Maybe this time I can reproduce this behavior, as I tried everything to make it happen. I also asked Larry one of his boards and put it into several PCs but had no chance to reproduce the crash. Could you please also report the neighboring stations, the AP you are connected and so on. Many thanks, -Francesco Informativa sulla privacy: http://help.ing.unibs.it/privacy.php ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks
On Monday 23 November 2009 12:00:15 Francesco Gringoli wrote: > Hi Michael, > > so you can observe this behavior at your site. Do you mind describing > the exact configuration? Maybe this time I can reproduce this > behavior, as I tried everything to make it happen. I also asked Larry > one of his boards and put it into several PCs but had no chance to > reproduce the crash. Could you please also report the neighboring > stations, the AP you are connected and so on. I don't reproduce this, because I never really tried running the opensource firmware. Larry reproduced it, this time. -- Greetings, Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks
On Monday 23 November 2009 05:45:47 Larry Finger wrote: > On 11/19/2009 03:24 PM, Michael Buesch wrote: > > This rewrites the error handling policies in the TX status handler. > > It tries to be error-tolerant as in "try hard to not crash the machine". > > It won't recover from errors (that are bugs in the firmware or driver), > > because that's impossible. However, it will return a more or less useful > > error message and bail out. It also tries hard to use rate-limited messages > > to not flood the syslog in case of a failure. > > This patch definitely helped open-source firmware, but it is not a complete > fix. It is no fix _at_ _all_. The patch does not change a single line of code that wasn't either an assertion or a machine crash before. So it just transforms assertions into more verbose assertions and crashes into assertions without a crash. > debug: Out of order TX status report on DMA ring 1. Expected 114, but got 146 Ok, this is what I expected. Let's see what's going on. Here's the ring. o is unused, * is used. ooo***ooo ^ ^ ^ 114 146 newest oldest So as you can see, the firmware reported a TX status for a frame right in the middle of the ringbuffer. The new code detects this now before getting a double free and/or silent memory corruption (freeing of used memory). It really is illegal to report a TX status for a frame that's not the oldest one in the ring. The firmware is required to process all frames in-order on one ring. So how can this failure happen? I think there basically are three ways this can happen. - First is that the ordering within one ring really gets messed up and it loses track of its ring pointers. I'm not sure if this is likely. Probably not. - It messes up the ring membership. So it reports a TX status on the wrong ring. Note that the "ring" kernel pointer in the TX status report handler is derived from the cookie (and so also the number in the message "Out of order TX status report on DMA ring 1" is derived from the cookie). So it's untrustworthy in case of broken firmware. The firmware has QoS-alike mechanisms, even if QoS is disabled. Maybe these mechanisms are broken. - Third is the possibility of a driver bug. I rule that out as long as nobody is able to reproduce it with proprietary firmware. -- Greetings, Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: Fatal DMA error problem with netbook and BCM4312
On Sun, 2009-11-22 at 19:52 -0600, Larry Finger wrote: > We know that the wl driver does something to the interface that persists > across > a warm boot - we just do not know what. It does not appear to be done in any > of > the MMIO traffic - at least I have not seen it in the mmio-trace output. If > anyone has a KVM setup using PCI passthrough, it is possible to trace PCI > configuration traffic? I'm pretty sure even the binary driver has to go through drivers/pci/access.c, maybe you can just insert logging into that code? johannes signature.asc Description: This is a digitally signed message part ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev
Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks
On Monday 23 November 2009 02:34:09 Larry Finger wrote: > On 11/19/2009 03:24 PM, Michael Buesch wrote: > > This rewrites the error handling policies in the TX status handler. > > It tries to be error-tolerant as in "try hard to not crash the machine". > > It won't recover from errors (that are bugs in the firmware or driver), > > because that's impossible. However, it will return a more or less useful > > error message and bail out. It also tries hard to use rate-limited messages > > to not flood the syslog in case of a failure. > > > > Signed-off-by: Michael Buesch > > > > --- > > Tested and ACKed by Larry Finger. Not only does this improve the error > handling > for b43, but it also appears to fix the skb == NULL error that I experienced > with the open-source firmware. I don't think there's any way it can fix this. The patch doesn't change the code behavior. It just changes the sanity checks, that under normal circumstances should never trigger. > John - please push this into wireless-testing. It should also go to 2.6.32, > but > it is likely too large for the current stage. At least Cc it to stable. Don't put it into stable. This is not a fix. I don't think it's suitable for 2.6.32 at this stage, too. -- Greetings, Michael. ___ Bcm43xx-dev mailing list Bcm43xx-dev@lists.berlios.de https://lists.berlios.de/mailman/listinfo/bcm43xx-dev