Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-12 Thread Paul Menzel
On Wednesday, 08.02.2012, at 23:16 +0100, Francois Romieu wrote:
 Paul Menzel pm.deb...@googlemail.com :

[…]

  [18764.958557] r8169 0000:02:00.0: PME# disabled
  [18781.998004] r8169 0000:02:00.0: eth0: link down
  ^^
  [18781.998024] r8169 0000:02:00.0: eth0: link down
  ^^
 Two link events within 20 us. /me wonders...

[20195.408746] r8169 0000:02:00.0: PME# enabled
[20195.466159] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[20195.466175] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdaff004)
[20195.466182] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
[20195.466187] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
[20195.466194] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
[20195.466826] r8169 0000:02:00.0: PME# disabled
[20211.376483] r8169 0000:02:00.0: eth0: link down
[20211.376507] r8169 0000:02:00.0: eth0: link down

Only 24 us difference here. During this resume, however, the network
came back up fine. The same double message also appears at the beginning
of the output I pasted earlier, and the network worked fine there too.
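
The gaps quoted above (20 us and 24 us) can be recomputed from the dmesg timestamps. The following is a minimal sketch, not something posted in the thread; the function name `link_down_gaps_us` is made up for illustration, and the parser assumes the usual `[seconds.microseconds]` dmesg prefix.

```python
import re

# Matches the "[seconds.microseconds]" prefix of a "link down" line.
STAMP_RE = re.compile(r"\[\s*(\d+)\.(\d{6})\].*link down")


def link_down_gaps_us(lines):
    """Return the gaps, in microseconds, between consecutive 'link down' lines.

    Timestamps are converted to integer microseconds to avoid float rounding.
    """
    stamps = []
    for line in lines:
        match = STAMP_RE.search(line)
        if match:
            stamps.append(int(match.group(1)) * 1_000_000 + int(match.group(2)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    first_resume = [
        "[18781.998004] r8169 0000:02:00.0: eth0: link down",
        "[18781.998024] r8169 0000:02:00.0: eth0: link down",
    ]
    second_resume = [
        "[20211.376483] r8169 0000:02:00.0: eth0: link down",
        "[20211.376507] r8169 0000:02:00.0: eth0: link down",
    ]
    print(link_down_gaps_us(first_resume))   # gap for the first pair
    print(link_down_gaps_us(second_resume))  # gap for the second pair
```

Feeding it the output of `dmesg | grep 8169` would list the gap for every pair of adjacent link-down messages, which makes the doubled events easy to spot.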

[20213.000840] r8169 0000:02:00.0: eth0: link up
[32598.618289] r8169 0000:02:00.0: eth0: link down

This event during resume sometimes shows up and sometimes it does not. I
could not find a correlation between successful and failed resumes.

[32599.249802] r8169 0000:02:00.0: PME# enabled
[32599.397941] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[32599.397956] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdaff004)
[32599.397963] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
[32599.397968] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
[32599.397975] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
[32599.398766] r8169 0000:02:00.0: PME# disabled
[32599.416148] r8169 0000:02:00.0: eth0: link down

Here it did not work and the link did not come back up. There is over half a 
second time bet

[32673.504101] r8169 0000:02:00.0: PCI INT A disabled

The module is removed and loaded below.

[32676.218019] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[32676.218078] r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[32676.218152] r8169 0000:02:00.0: setting latency timer to 64
[32676.218246] r8169 0000:02:00.0: irq 41 for MSI/MSI-X
[32676.219792] r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xc9368000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
[32676.219803] r8169 0000:02:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
[32676.563203] r8169 0000:02:00.0: eth0: link down
[32678.356237] r8169 0000:02:00.0: eth0: link up

 The datasheet states [the PHYStatus] register is updated continuously at
 maximum periods of 300us. but it is far from clear that the coherency
 with the interrupt status register can be taken for granted. Hayes ?
 
 Paul, can you try the hack below ?
 
 diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
 index 7a0c800..6daca05 100644
 --- a/drivers/net/ethernet/realtek/r8169.c
 +++ b/drivers/net/ethernet/realtek/r8169.c
 @@ -1278,6 +1278,7 @@ static void __rtl8169_check_link_status(struct net_device *dev,
  {
  	unsigned long flags;
  
 +	udelay(500);
  	spin_lock_irqsave(&tp->lock, flags);
  	if (tp->link_ok(ioaddr)) {
  		rtl_link_chg_patch(tp);

I will try this hack next week. Thank you!

Could there be something wrong with the locking, though, or with
parallel execution? Sometimes

[32598.618289] r8169 0000:02:00.0: eth0: link down

is shown before

[32599.249802] r8169 0000:02:00.0: PME# enabled

and sometimes it is not shown at all, or only afterward.
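
To look for a correlation across many suspend cycles, the log can be tabulated mechanically. The sketch below is an illustration only, not something from the thread: the function name `classify_cycles` and the rule "a cycle is successful if `link up` is logged before the next `PME# enabled` or driver reload" are assumptions made for this example.

```python
import re

# Events of interest in r8169 kernel log lines.
EVENT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+r8169\b.*?"
    r"(PME# enabled|link up|link down|Gigabit Ethernet driver)"
)


def classify_cycles(lines):
    """Return (timestamp, link_down_before, link_came_up) per 'PME# enabled'.

    link_down_before: a 'link down' was logged after the last 'link up'
                      and before this 'PME# enabled'.
    link_came_up:     a 'link up' followed before the next cycle or before
                      the driver was reloaded.
    """
    results = []
    current = None          # [timestamp, link_down_before, link_came_up]
    down_pending = False    # 'link down' seen since the last 'link up'
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        stamp, event = match.groups()
        if event == "PME# enabled":
            if current:
                results.append(tuple(current))
            current = [stamp, down_pending, False]
            down_pending = False
        elif event == "Gigabit Ethernet driver":
            # Module reload: later events belong to a fresh probe.
            if current:
                results.append(tuple(current))
            current = None
            down_pending = False
        elif event == "link down":
            down_pending = True
        elif event == "link up":
            down_pending = False
            if current:
                current[2] = True
    if current:
        results.append(tuple(current))
    return results


if __name__ == "__main__":
    import sys
    # Example: dmesg | grep 8169 | python3 this_script.py
    for row in classify_cycles(sys.stdin):
        print(row)
```

Two cycles are far too few to draw any conclusion; the table only becomes useful once a larger number of successful and failed resumes has been collected.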


Thanks,

Paul




Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-08 Thread Paul Menzel
Dear Francois,


thank you for your fast reply.


On Sunday, 05.02.2012, at 18:57 +0100, Francois Romieu wrote:
 Paul Menzel pm.deb...@googlemail.com :
 [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
  I experienced this problem (only) three times until now. If I remember
  correctly the last time with 3.2.1. I still do not know how to reproduce
  this.
 
 (good PR, nice)
 
 An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume
 and a failed one could help if it's a driver thing.

The problem has not shown up again so far, so I am only sending the
output from a successful resume. Linux version 3.2.4 is currently
installed. The following outputs are identical after startup and after
(a successful) resume.

$ sudo ethtool --version
ethtool version 3.1
$ sudo ethtool eth0 # The option `-d` does not exist.
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Link partner advertised link modes:  10baseT/Half 10baseT/Full 
 100baseT/Half 
100baseT/Full 
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x0033 (51)
   drv probe ifdown ifup
Link detected: yes

$ sudo mii-tool --version
$Id: mii-tool.c,v 1.9 2006/09/27 20:59:18 ecki Exp $
(Author: David Hinds based on Donald Becker's mii-diag)
net-tools 1.60
$ sudo mii-tool -v
eth0: negotiated 100baseTx-FD flow-control, link ok
  product info: vendor 00:07:32, model 17 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 
10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD 
flow-control
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 
10baseT-FD 10baseT-HD flow-control

 You may check if runtime power management is enabled or not, especially
 after a failed resume. See the /sys/devices/pci0000:../0000:..:/power
 directory and its control, runtime_enabled and runtime_status files
 (control = on - runtime PM disabled, see Documentation/power/runtime_pm.txt)

The document is online at [1].

For some reason the ethernet controller is not listed under
`/sys/devices`.

$ lspci | grep RTL
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
$ lspci -n -s 02:00.0
02:00.0 0200: 10ec:8168 (rev 01)
$ ls /sys/devices/
breakpoint  i2c-2  LNXSYSTM:00  pnp0      tracepoint
cpu         i2c-3  pci0000:00   software  virtual
i2c-1       i2c-4  platform     system

I use `/sys/bus/pci/devices` instead.

$ more /sys/bus/pci/devices/0000\:02\:00.0/power/{control,runtime_enabled,runtime_status}
::::::::::::::
/sys/bus/pci/devices/0000:02:00.0/power/control
::::::::::::::
on
::::::::::::::
/sys/bus/pci/devices/0000:02:00.0/power/runtime_enabled
::::::::::::::
forbidden
::::::::::::::
/sys/bus/pci/devices/0000:02:00.0/power/runtime_status
::::::::::::::
active

 If it is enabled and the link does not come up fast enough (5 s), runtime
 PM will suspend the device. It should not matter as long as the link is
 still present because the device should (TM) soon generate a power management
 event. The latter not happening or the PME being ignored could explain
 the bug. If so, temporarily disabling runtime PM for your device after a
 failed resume instead of removing the module or the cable may be enough
 to recover the link. It's just a guess though.

So judging from the output above runtime 

Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-08 Thread Paul Menzel
On Wednesday, 08.02.2012, at 16:28 +0100, Paul Menzel wrote:

 On Sunday, 05.02.2012, at 18:57 +0100, Francois Romieu wrote:
  Paul Menzel pm.deb...@googlemail.com :
  [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
   I experienced this problem (only) three times until now. If I remember
   correctly the last time with 3.2.1. I still do not know how to reproduce
   this.
  
  (good PR, nice)
  
  An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume
  and a failed one could help if it's a driver thing.
 
 The problem has not shown up again until now so I only send the output
 from the successful resume. Currently Linux version 3.2.4 is installed.

Right on cue, one suspend cycle later, the problem turned up again.

 The following outputs are identical after startup and (a successful)
 resume.
 
 $ sudo ethtool --version
 ethtool version 3.1
 $ sudo ethtool eth0 # The option `-d` does not exist.
 Settings for eth0:
   Supported ports: [ TP MII ]
   Supported link modes:   10baseT/Half 10baseT/Full 
   100baseT/Half 100baseT/Full 
   1000baseT/Half 1000baseT/Full 
   Supported pause frame use: No
   Supports auto-negotiation: Yes
   Advertised link modes:  10baseT/Half 10baseT/Full 
   100baseT/Half 100baseT/Full 
   1000baseT/Half 1000baseT/Full 
   Advertised pause frame use: Symmetric Receive-only
   Advertised auto-negotiation: Yes
   Link partner advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 
 100baseT/Full 
   Link partner advertised pause frame use: Symmetric
   Link partner advertised auto-negotiation: Yes
   Speed: 100Mb/s
   Duplex: Full
   Port: MII
   PHYAD: 0
   Transceiver: internal
   Auto-negotiation: on
   Supports Wake-on: pumbg
   Wake-on: g
   Current message level: 0x0033 (51)
  drv probe ifdown ifup
   Link detected: yes

Now from a failed resume.

$ sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Link partner advertised link modes:  10baseT/Half 10baseT/Full 
 100baseT/Half 
100baseT/Full 
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x0033 (51)
   drv probe ifdown ifup

I could not spot a difference.
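
Comparing two long tool outputs by eye is error-prone, so a mechanical diff can help. The following is a minimal sketch using Python's standard `difflib`; the function names and the sample strings in the usage part are invented for illustration and are not taken from the outputs above.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace and drop blank lines.

    This keeps re-wrapped or re-indented mail quotes from showing up
    as spurious differences.
    """
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def report_differences(good, bad):
    """Return only the changed lines ('-' = successful run, '+' = failed run).

    Caveat: a content line that itself starts with '--' or '++' would be
    mistaken for a diff header and dropped.
    """
    diff = difflib.unified_diff(
        normalize(good), normalize(bad),
        "successful", "failed", lineterm="", n=0,
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    # Hypothetical sample data, only to show the output format.
    good = "Speed: 100Mb/s\nDuplex: Full\nLink detected: yes\n"
    bad = "Speed:   100Mb/s\nDuplex: Full\nLink detected: no\n"
    for line in report_differences(good, bad):
        print(line)
```

In practice the two inputs would be the saved outputs of `ethtool eth0` (or `mii-tool -v`) after a successful and after a failed resume.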

 $ sudo mii-tool --version
 $Id: mii-tool.c,v 1.9 2006/09/27 20:59:18 ecki Exp $
 (Author: David Hinds based on Donald Becker's mii-diag)
 net-tools 1.60
 $ sudo mii-tool -v
 eth0: negotiated 100baseTx-FD flow-control, link ok
   product info: vendor 00:07:32, model 17 rev 2
   basic mode:   autonegotiation enabled
   basic status: autonegotiation complete, link ok
   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 
 10baseT-FD 10baseT-HD
   advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD 
 flow-control
   link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 
 10baseT-FD 10baseT-HD flow-control

From a failed resume it looks like the following.

$ sudo mii-tool -v
eth0: negotiated 100baseTx-FD flow-control, link ok
  product info: vendor 00:07:32, model 17 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 
10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD 

Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-08 Thread Francois Romieu
Paul Menzel pm.deb...@googlemail.com :
[forget runtime PM]
 [18764.958557] r8169 0000:02:00.0: PME# disabled
 [18781.998004] r8169 0000:02:00.0: eth0: link down
 ^^
 [18781.998024] r8169 0000:02:00.0: eth0: link down
 ^^
Two link events within 20 us. /me wonders...

The datasheet states [the PHYStatus] register is updated continuously at
maximum periods of 300us. but it is far from clear that the coherency
with the interrupt status register can be taken for granted. Hayes ?

Paul, can you try the hack below ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7a0c800..6daca05 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1278,6 +1278,7 @@ static void __rtl8169_check_link_status(struct net_device *dev,
 {
 	unsigned long flags;
 
+	udelay(500);
 	spin_lock_irqsave(&tp->lock, flags);
 	if (tp->link_ok(ioaddr)) {
 		rtl_link_chg_patch(tp);
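
The reasoning behind the 500 us delay can be written down as a toy model. This is only a sketch under stated assumptions, not a description of the hardware: it assumes the link-change interrupt fires at the moment the link changes, and that the PHYStatus register catches up at some unknown point no later than the 300 us refresh period quoted from the datasheet. All names below are made up for illustration.

```python
REFRESH_PERIOD_US = 300  # maximum PHYStatus update period quoted above
HACK_DELAY_US = 500      # the udelay(500) added by the patch


def phy_status_read(old_link, new_link, update_latency_us, read_delay_us):
    """Value a handler would see when reading the register.

    update_latency_us: how long after the interrupt the register is refreshed
                       (assumed to be unknown, between 0 and the period).
    read_delay_us:     how long after the interrupt the handler reads it.
    """
    return new_link if read_delay_us >= update_latency_us else old_link


def can_read_stale(read_delay_us, max_latency_us=REFRESH_PERIOD_US):
    """True if some latency within the refresh period yields the old value."""
    return any(
        phy_status_read(False, True, latency, read_delay_us) is False
        for latency in range(max_latency_us + 1)
    )


if __name__ == "__main__":
    print(can_read_stale(0))              # immediate read
    print(can_read_stale(HACK_DELAY_US))  # read after the added delay
```

Under these assumptions any delay of at least 300 us rules out a stale read, and 500 us simply adds margin. Whether the real register and the interrupt status are actually coherent is exactly the open question put to Hayes above.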



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/20120208221644.ga25...@electric-eye.fr.zoreil.com



Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-05 Thread Francois Romieu
Paul Menzel pm.deb...@googlemail.com :
[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331]
 I experienced this problem (only) three times until now. If I remember
 correctly the last time with 3.2.1. I still do not know how to reproduce
 this.

(good PR, nice)

An 'ethtool -d' and a 'mii-tool -v' of the device after a successful resume
and a failed one could help if it's a driver thing.

You may check if runtime power management is enabled or not, especially
after a failed resume. See the /sys/devices/pci0000:../0000:..:/power
directory and its control, runtime_enabled and runtime_status files
(control = on - runtime PM disabled, see Documentation/power/runtime_pm.txt)
If it is enabled and the link does not come up fast enough (5 s), runtime
PM will suspend the device. It should not matter as long as the link is
still present because the device should (TM) soon generate a power management
event. The latter not happening or the PME being ignored could explain
the bug. If so, temporarily disabling runtime PM for your device after a
failed resume instead of removing the module or the cable may be enough
to recover the link. It's just a guess though.
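
The check and the temporary workaround described above could be scripted roughly as follows. This is a minimal sketch, not a tested recovery procedure: the function names are invented, the device path in the usage part is an assumption based on the 02:00.0 address reported by `lspci`, and writing to `control` requires root.

```python
from pathlib import Path

POWER_FILES = ("control", "runtime_enabled", "runtime_status")


def read_runtime_pm(power_dir):
    """Read the three runtime PM files from a device's sysfs power directory."""
    power_dir = Path(power_dir)
    return {name: (power_dir / name).read_text().strip() for name in POWER_FILES}


def runtime_pm_allowed(state):
    """'auto' lets the kernel runtime-suspend the device; 'on' keeps it powered."""
    return state["control"] == "auto"


def disable_runtime_pm(power_dir):
    """Write 'on' to the control file, which forbids runtime suspend."""
    (Path(power_dir) / "control").write_text("on\n")


if __name__ == "__main__":
    # Assumed path; adjust the PCI address to the device in question.
    power = Path("/sys/bus/pci/devices/0000:02:00.0/power")
    if power.is_dir():
        state = read_runtime_pm(power)
        print(state)
        print("runtime PM allowed:", runtime_pm_allowed(state))
```

Calling `disable_runtime_pm(power)` after a failed resume corresponds to the suggestion of temporarily disabling runtime PM instead of reloading the module or replugging the cable.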

Please stay with v3.2 or above in the meantime.

-- 
Ueimor






Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-04 Thread Paul Menzel
found 656331 3.2.2-1
quit


Dear Linux folks,


On Wednesday, 18.01.2012, at 15:50 +0000, Ben Hutchings wrote:
 On Wed, 2012-01-18 at 16:32 +0100, Paul Menzel wrote:
  On Wednesday, 18.01.2012, at 15:03 +0000, Ben Hutchings wrote:
   On Wed, 2012-01-18 at 15:15 +0100, Paul Menzel wrote:
Package: linux-2.6
Version: 3.1.8-2
Severity: normal
  
while suspending and resuming a lot, it happened to me once that the
network device did not come back correctly.

I experienced this problem (only) three times until now. If I remember
correctly the last time with 3.2.1. I still do not know how to reproduce
this.

The workaround is to unplug and replug the network cable or to unload
the module and load it again.

   [...]
   
   Some of the RTL81xx gigabit Ethernet controllers need a firmware patch
   to be reliable.  I can't tell whether you have one of these.  Are there
   any kernel log messages about requesting a firmware file for the NIC?
   If so, does installing firmware-realtek fix the problem?
  
  There are no Linux messages requesting the firmware.
 
 OK.
 
  And since I am not able to trigger it, I probably just have to wait
  until it happens again. Can I increase some log level to capture more
  information next time? Or, since this could be a firmware bug, is
  there nothing Linux can do about this?
 
 No, it's probably something that can be fixed in the driver.

That sounds promising. It would be great if this could be fixed. Please
tell me how I can get you better debugging information next time this
happens.

  $ dmesg | grep -i firmware
  $ dmesg | grep 8169
  [    1.109369] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
  [    1.109417] r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
  [    1.109452] r8169 0000:02:00.0: setting latency timer to 64
  [    1.109511] r8169 0000:02:00.0: irq 41 for MSI/MSI-X
  [    1.110094] r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xc9364000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
 [...]
 
 Right, this variant doesn't need a firmware patch.
 
 Please can you re-send your bug report to:

[…]

 Be sure to include the log messages from your second mail.

Please find the messages from the Linux kernel ring buffer (`dmesg`)
attached at the end.


Thanks,

Paul


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331


$ dmesg | grep -i firmware
$ dmesg | grep 8169
[    1.109369] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.109417] r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    1.109452] r8169 0000:02:00.0: setting latency timer to 64
[    1.109511] r8169 0000:02:00.0: irq 41 for MSI/MSI-X
[    1.110094] r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xc9364000, 00:1e:8c:aa:1d:b5, XID 1800 IRQ 41
[  299.062777] r8169 0000:02:00.0: eth0: link down
[  300.770805] r8169 0000:02:00.0: eth0: link up
[ 3287.250629] r8169 0000:02:00.0: PME# enabled
[ 3287.397826] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[ 3287.397841] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
[ 3287.397848] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
[ 3287.397853] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
[ 3287.397860] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
[ 3287.398443] r8169 0000:02:00.0: PME# disabled
[ 3314.429403] r8169 0000:02:00.0: eth0: link down
[ 3314.429432] r8169 0000:02:00.0: eth0: link down
[ 3316.043306] r8169 0000:02:00.0: eth0: link up
[ 4821.512812] r8169 0000:02:00.0: PME# enabled
[ 4821.661948] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[ 4821.661963] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
[ 4821.661969] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
[ 4821.661975] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
[ 4821.661981] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
[ 4821.662532] r8169 0000:02:00.0: PME# disabled
[ 4839.563375] r8169 0000:02:00.0: eth0: link down
[ 4839.563398] r8169 0000:02:00.0: eth0: link down
[ 4841.198305] r8169 0000:02:00.0: eth0: link up
[ 7732.802716] r8169 0000:02:00.0: PME# enabled
[ 7732.910146] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[ 7732.910161] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0x4, writing 0xfdfff004)
[ 7732.910168] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xdc01)
[ 7732.910173] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x8)
[ 7732.910180] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100407)
[ 7732.910733] r8169 0000:02:00.0: PME# disabled
[ 7761.865970] r8169 0000:02:00.0: eth0: 

Processed: Re: Bug#656331: RTL8168b/8111b with ASUS M2A-VM (SB600): Network device stays down after resume

2012-02-04 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

 found 656331 3.2.2-1
Bug #656331 [linux-2.6] r8169 with ASUS M2A-VM (SB600): Network device stays 
down after resume
There is no source info for the package 'linux-2.6' at version '3.2.2-1' with 
architecture ''
Unable to make a source version for version '3.2.2-1'
Bug Marked as found in versions 3.2.2-1.
 quit
Stopping processing here.

Please contact me if you need assistance.
-- 
656331: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656331
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

