Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
On Wednesday, 08.02.2012, at 23:16 +0100, Francois Romieu wrote:
> Paul Menzel pm.deb...@googlemail.com :
> […]
> > [18764.958557] r8169 :02:00.0: PME# disabled
> > [18781.998004] r8169 :02:00.0: eth0: link down
>                ^^
> > [18781.998024] r8169 :02:00.0: eth0: link down
>                ^^
> Two link events within 20 us. /me wonders...

> > [20195.408746] r8169 :02:00.0: PME# enabled
> > [20195.466159] r8169 :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> > [20195.466175] r8169 :02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdaff004)
> > [20195.466182] r8169 :02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
> > [20195.466187] r8169 :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
> > [20195.466194] r8169 :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
> > [20195.466826] r8169 :02:00.0: PME# disabled
> > [20211.376483] r8169 :02:00.0: eth0: link down
> > [20211.376507] r8169 :02:00.0: eth0: link down

Here the difference is only 24 us. But during this resume the network came back up fine. The same pair of events also appears at the beginning of the output I pasted earlier, and the network worked fine then, too.

> > [20213.000840] r8169 :02:00.0: eth0: link up
> > [32598.618289] r8169 :02:00.0: eth0: link down

This event sometimes shows up during resume and sometimes it does not. I could not find a correlation between successful and failed resumes.

> > [32599.249802] r8169 :02:00.0: PME# enabled
> > [32599.397941] r8169 :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> > [32599.397956] r8169 :02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdaff004)
> > [32599.397963] r8169 :02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
> > [32599.397968] r8169 :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
> > [32599.397975] r8169 :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
> > [32599.398766] r8169 :02:00.0: PME# disabled
> > [32599.416148] r8169 :02:00.0: eth0: link down

Here it did not work and the link did not come back up. There is over half a second time bet

> > [32673.504101] r8169 :02:00.0: PCI INT A disabled

The module is removed and loaded below.

> > [32676.218019] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> > [32676.218078] r8169 :02:00.0: PCI INT A - GSI 19 (level, low) - IRQ 19
> > [32676.218152] r8169 :02:00.0: setting latency timer to 64
> > [32676.218246] r8169 :02:00.0: irq 41 for MSI/MSI-X
> > [32676.219792] r8169 :02:00.0: eth0: RTL8168b/8111b at 0xc9368000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
> > [32676.219803] r8169 :02:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
> > [32676.563203] r8169 :02:00.0: eth0: link down
> > [32678.356237] r8169 :02:00.0: eth0: link up
>
> The datasheet states "[the PHYStatus] register is updated continuously at maximum periods of 300us." but it is far from clear that the coherency with the interrupt status register can be taken for granted. Hayes ?
>
> Paul, can you try the hack below ?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7a0c800..6daca05 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1278,6 +1278,7 @@ static void __rtl8169_check_link_status(struct net_device *dev,
>  {
>  	unsigned long flags;
>  
> +	udelay(500);
>  	spin_lock_irqsave(&tp->lock, flags);
>  	if (tp->link_ok(ioaddr)) {
>  		rtl_link_chg_patch(tp);

I will try this hack next week. Thank you!

Could it be, though, that something is wrong with the locking or with parallel execution? Sometimes

    [32598.618289] r8169 :02:00.0: eth0: link down

is shown before

    [32599.249802] r8169 :02:00.0: PME# enabled

and sometimes it is not shown at all, or only afterward.

Thanks,

Paul
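The intervals discussed in this message (20 us and 24 us between paired "link down" events) can be read off the `dmesg` timestamps mechanically. The following is a minimal sketch, not part of the original thread; the function name `link_down_gaps_us` and the regular expression are illustrative assumptions about the log layout shown above.

```python
import re

# Assumed line layout, as pasted in this thread:
# "[18781.998004] r8169 :02:00.0: eth0: link down"
LINE_RE = re.compile(r"\[\s*(\d+)\.(\d{6})\]\s+(.*)")


def link_down_gaps_us(log_text):
    """Return the gaps, in microseconds, between consecutive 'link down' lines."""
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(3).endswith("link down"):
            seconds, micros = int(match.group(1)), int(match.group(2))
            # Integer arithmetic avoids floating-point rounding on the timestamps.
            stamps.append(seconds * 1_000_000 + micros)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Two lines copied from the log quoted above.
SAMPLE = """\
[18781.998004] r8169 :02:00.0: eth0: link down
[18781.998024] r8169 :02:00.0: eth0: link down
"""

print(link_down_gaps_us(SAMPLE))  # prints [20]
```

Feeding it a full `dmesg | grep 8169` capture also yields the large gaps between separate resume cycles, so only the small values are of interest here.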
Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
Dear Francois,

thank you for your fast reply.

On Sunday, 05.02.2012, at 18:57 +0100, Francois Romieu wrote:
> Paul Menzel pm.deb...@googlemail.com :
> [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
> > I experienced this problem (only) three times until now. If I remember correctly the last time with 3.2.1. I still do not know how to reproduce this.
>
> (good PR, nice)
>
> An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume and a failed one could help if it's a driver thing.

The problem has not shown up again until now so I only send the output from the successful resume. Currently Linux version 3.2.4 is installed.

The following outputs are identical after startup and (a successful) resume.

    $ sudo ethtool --version
    ethtool version 3.1
    $ sudo ethtool eth0 # The option `-d` does not exist.
    Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Link partner advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x0033 (51)
            drv probe ifdown ifup
        Link detected: yes

    $ sudo mii-tool --version
    $Id: mii-tool.c,v 1.9 2006/09/27 20:59:18 ecki Exp $
    (Author: David Hinds based on Donald Becker's mii-diag)
    net-tools 1.60
    $ sudo mii-tool -v
    eth0: negotiated 100baseTx-FD flow-control, link ok
      product info: vendor 00:07:32, model 17 rev 2
      basic mode: autonegotiation enabled
      basic status: autonegotiation complete, link ok
      capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
      advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
      link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

> You may check if runtime power management is enabled or not, especially after a failed resume. See the /sys/devices/pci:../:..:/power directory and its control, runtime_enabled and runtime_status files (control = on -> runtime PM disabled, see Documentation/power/runtime_pm.txt)

The document is online at [1].

For some reason the ethernet controller is not listed under `/sys/devices`.

    $ lspci | grep RTL
    02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
    $ lspci -n -s 02:00.0
    02:00.0 0200: 10ec:8168 (rev 01)
    $ ls /sys/devices/
    breakpoint  i2c-2  LNXSYSTM:00  pnp0      tracepoint
    cpu         i2c-3  pci:00       software  virtual
    i2c-1       i2c-4  platform     system

I use `/sys/bus/pci/devices` instead.

    $ more /sys/bus/pci/devices/\:02\:00.0/power/{control,runtime_enabled,runtime_status}
    ::
    /sys/bus/pci/devices/:02:00.0/power/control
    ::
    on
    --More--(Next file: /sys/bus/pci/devices/:02:00.0/power/runtime_
    ::
    /sys/bus/pci/devices/:02:00.0/power/runtime_enabled
    ::
    forbidden
    --More--(Next file: /sys/bus/pci/devices/:02:00.0/power/runtime_
    ::
    /sys/bus/pci/devices/:02:00.0/power/runtime_status
    ::
    active

> If it is enabled and the link does not come up fast enough (5 s), runtime PM will suspend the device. It should not matter as long as the link is still present because the device should (TM) soon generate a power management event. The latter not happening or the PME being ignored could explain the bug.
>
> If so, temporarily disabling runtime PM for your device after a failed resume instead of removing the module or the cable may be enough to recover the link. It's just a guess though.

So judging from the output above runtime
Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
On Wednesday, 08.02.2012, at 16:28 +0100, Paul Menzel wrote:
> On Sunday, 05.02.2012, at 18:57 +0100, Francois Romieu wrote:
> > Paul Menzel pm.deb...@googlemail.com :
> > [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
> > > I experienced this problem (only) three times until now. If I remember correctly the last time with 3.2.1. I still do not know how to reproduce this.
> >
> > (good PR, nice)
> >
> > An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume and a failed one could help if it's a driver thing.
>
> The problem has not shown up again until now so I only send the output from the successful resume. Currently Linux version 3.2.4 is installed.

Right on cue, one suspend cycle later, the problem turned up again.

> The following outputs are identical after startup and (a successful) resume.
>
>     $ sudo ethtool --version
>     ethtool version 3.1
>     $ sudo ethtool eth0 # The option `-d` does not exist.
>     Settings for eth0:
>         Supported ports: [ TP MII ]
>         Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
>         Supported pause frame use: No
>         Supports auto-negotiation: Yes
>         Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
>         Advertised pause frame use: Symmetric Receive-only
>         Advertised auto-negotiation: Yes
>         Link partner advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
>         Link partner advertised pause frame use: Symmetric
>         Link partner advertised auto-negotiation: Yes
>         Speed: 100Mb/s
>         Duplex: Full
>         Port: MII
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: pumbg
>         Wake-on: g
>         Current message level: 0x0033 (51)
>             drv probe ifdown ifup
>         Link detected: yes

Now from a failed resume.

    $ sudo ethtool eth0
    Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Link partner advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x0033 (51)
            drv probe ifdown ifup

I could not spot a difference.

>     $ sudo mii-tool --version
>     $Id: mii-tool.c,v 1.9 2006/09/27 20:59:18 ecki Exp $
>     (Author: David Hinds based on Donald Becker's mii-diag)
>     net-tools 1.60
>     $ sudo mii-tool -v
>     eth0: negotiated 100baseTx-FD flow-control, link ok
>       product info: vendor 00:07:32, model 17 rev 2
>       basic mode: autonegotiation enabled
>       basic status: autonegotiation complete, link ok
>       capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
>       advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
>       link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

From a failed resume it looks like the following.

    $ sudo mii-tool -v
    eth0: negotiated 100baseTx-FD flow-control, link ok
      product info: vendor 00:07:32, model 17 rev 2
      basic mode: autonegotiation enabled
      basic status: autonegotiation complete, link ok
      capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
      advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
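Comparing the two captures by eye is error-prone, so a mechanical diff may help. The following is a minimal sketch, not part of the original thread, using Python's standard `difflib`; the function name `diff_captures` and the `successful-resume` / `failed-resume` labels are illustrative assumptions.

```python
import difflib


def diff_captures(good, bad):
    """Return unified-diff lines between two captured command outputs.

    An empty list means the two captures are identical line by line.
    """
    return list(
        difflib.unified_diff(
            good.splitlines(),
            bad.splitlines(),
            fromfile="successful-resume",
            tofile="failed-resume",
            lineterm="",
        )
    )
```

It could be applied to the saved text of `ethtool eth0` or `mii-tool -v` from a good and a bad resume, for example `print("\n".join(diff_captures(good_text, bad_text)))`, where `good_text` and `bad_text` are hypothetical variables holding the two outputs.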
Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
Paul Menzel pm.deb...@googlemail.com :
[forget runtime PM]
> [18764.958557] r8169 :02:00.0: PME# disabled
> [18781.998004] r8169 :02:00.0: eth0: link down
             ^^
> [18781.998024] r8169 :02:00.0: eth0: link down
             ^^
Two link events within 20 us. /me wonders...

The datasheet states "[the PHYStatus] register is updated continuously at maximum periods of 300us." but it is far from clear that the coherency with the interrupt status register can be taken for granted. Hayes ?

Paul, can you try the hack below ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7a0c800..6daca05 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1278,6 +1278,7 @@ static void __rtl8169_check_link_status(struct net_device *dev,
 {
 	unsigned long flags;
 
+	udelay(500);
 	spin_lock_irqsave(&tp->lock, flags);
 	if (tp->link_ok(ioaddr)) {
 		rtl_link_chg_patch(tp);
Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
Paul Menzel pm.deb...@googlemail.com :
[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
> I experienced this problem (only) three times until now. If I remember correctly the last time with 3.2.1. I still do not know how to reproduce this.

(good PR, nice)

An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume and a failed one could help if it's a driver thing.

You may check if runtime power management is enabled or not, especially after a failed resume. See the /sys/devices/pci:../:..:/power directory and its control, runtime_enabled and runtime_status files (control = on -> runtime PM disabled, see Documentation/power/runtime_pm.txt)

If it is enabled and the link does not come up fast enough (5 s), runtime PM will suspend the device. It should not matter as long as the link is still present because the device should (TM) soon generate a power management event. The latter not happening or the PME being ignored could explain the bug.

If so, temporarily disabling runtime PM for your device after a failed resume instead of removing the module or the cable may be enough to recover the link. It's just a guess though.

Please stay with v3.2 or above in the meantime.

-- 
Ueimor
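The runtime PM check described in this message can be scripted so that it is easy to repeat after every resume. The following is a minimal sketch, not part of the original thread: it reads the three sysfs attributes named above from a device directory passed in by the caller. The function names are illustrative assumptions, and the device directory itself must be supplied by the user.

```python
from pathlib import Path

# Attribute names as given in the message above.
PM_FILES = ("control", "runtime_enabled", "runtime_status")


def read_runtime_pm(device_dir):
    """Read the runtime PM attributes below <device_dir>/power.

    Attributes that are missing or unreadable are reported as None.
    """
    power_dir = Path(device_dir) / "power"
    state = {}
    for name in PM_FILES:
        try:
            state[name] = (power_dir / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


def runtime_pm_disabled(state):
    """Per the message above, control == "on" means runtime PM is disabled."""
    return state.get("control") == "on"
```

Calling `read_runtime_pm()` with the PCI device's sysfs directory once after a successful resume and once after a failed one gives two dictionaries that can be compared directly.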
Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
found 656331 3.2.2-1
quit

Dear Linux folks,

On Wednesday, 18.01.2012, at 15:50 +0000, Ben Hutchings wrote:
> On Wed, 2012-01-18 at 16:32 +0100, Paul Menzel wrote:
> > On Wednesday, 18.01.2012, at 15:03 +0000, Ben Hutchings wrote:
> > > On Wed, 2012-01-18 at 15:15 +0100, Paul Menzel wrote:
> > > > Package: linux-2.6
> > > > Version: 3.1.8-2
> > > > Severity: normal
> > > >
> > > > suspending and resuming a lot, it happens once to me, that the network device did not come back correctly.

I experienced this problem (only) three times until now. If I remember correctly the last time with 3.2.1. I still do not know how to reproduce this.

> > > > The work around is to unplug and replug the network cable or to unload the module and load it again.
> > > [...]
> > >
> > > Some of the RTL81xx gigabit Ethernet controllers need a firmware patch to be reliable. I can't tell whether you have one of these. Are there any kernel log messages about requesting a firmware file for the NIC? If so, does installing firmware-realtek fix the problem?
> >
> > There are no Linux messages requesting the firmware.
>
> OK.
>
> > And not being able to trigger I probably just have to wait that it happens again. Can I increase some log level to capture more information next time? Or since this could be a firmware bug Linux cannot do anything about this?
>
> No, it's probably something that can be fixed in the driver.

That sounds promising. It would be great if this could be fixed. Please tell me how I can get you better debugging information next time this happens.

> > $ dmesg | grep -i firmware
> > $ dmesg | grep 8169
> > [1.109369] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> > [1.109417] r8169 :02:00.0: PCI INT A - GSI 19 (level, low) - IRQ 19
> > [1.109452] r8169 :02:00.0: setting latency timer to 64
> > [1.109511] r8169 :02:00.0: irq 41 for MSI/MSI-X
> > [1.110094] r8169 :02:00.0: eth0: RTL8168b/8111b at 0xc9364000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
> > [...]
>
> Right, this variant doesn't need a firmware patch.
>
> Please can you re-send your bug report to: […]
>
> Be sure to include the log messages from your second mail.
Please find the messages from the Linux kernel ring buffer (`dmesg`) attached at the end.

Thanks,

Paul

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331

    $ dmesg | grep -i firmware
    $ dmesg | grep 8169
    [1.109369] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
    [1.109417] r8169 :02:00.0: PCI INT A - GSI 19 (level, low) - IRQ 19
    [1.109452] r8169 :02:00.0: setting latency timer to 64
    [1.109511] r8169 :02:00.0: irq 41 for MSI/MSI-X
    [1.110094] r8169 :02:00.0: eth0: RTL8168b/8111b at 0xc9364000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
    [ 299.062777] r8169 :02:00.0: eth0: link down
    [ 300.770805] r8169 :02:00.0: eth0: link up
    [ 3287.250629] r8169 :02:00.0: PME# enabled
    [ 3287.397826] r8169 :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    [ 3287.397841] r8169 :02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
    [ 3287.397848] r8169 :02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
    [ 3287.397853] r8169 :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
    [ 3287.397860] r8169 :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
    [ 3287.398443] r8169 :02:00.0: PME# disabled
    [ 3314.429403] r8169 :02:00.0: eth0: link down
    [ 3314.429432] r8169 :02:00.0: eth0: link down
    [ 3316.043306] r8169 :02:00.0: eth0: link up
    [ 4821.512812] r8169 :02:00.0: PME# enabled
    [ 4821.661948] r8169 :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    [ 4821.661963] r8169 :02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
    [ 4821.661969] r8169 :02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
    [ 4821.661975] r8169 :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
    [ 4821.661981] r8169 :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
    [ 4821.662532] r8169 :02:00.0: PME# disabled
    [ 4839.563375] r8169 :02:00.0: eth0: link down
    [ 4839.563398] r8169 :02:00.0: eth0: link down
    [ 4841.198305] r8169 :02:00.0: eth0: link up
    [ 7732.802716] r8169 :02:00.0: PME# enabled
    [ 7732.910146] r8169 :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    [ 7732.910161] r8169 :02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
    [ 7732.910168] r8169 :02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
    [ 7732.910173] r8169 :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
    [ 7732.910180] r8169 :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
    [ 7732.910733] r8169 :02:00.0: PME# disabled
    [ 7761.865970] r8169 :02:00.0: eth0:
Processed: Re: Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume
Processing commands for cont...@bugs.debian.org:

> found 656331 3.2.2-1
Bug #656331 [linux-2.6] r8169 with ASUS M2A-VM (SB600): Network device stays down after resume
There is no source info for the package 'linux-2.6' at version '3.2.2-1' with architecture ''
Unable to make a source version for version '3.2.2-1'
Bug Marked as found in versions 3.2.2-1.
> quit
Stopping processing here.

Please contact me if you need assistance.
-- 
656331: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems