Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-04-23 Thread Mike .

Hi,

Sorry for the delay; I couldn't test for quite some time due to some fried
hardware. I have now created a kernel.org bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=43152

Cheers,

Mike.

 Date: Fri, 16 Mar 2012 17:04:13 -0500
 From: jrnie...@gmail.com
 To: mike-bugrep...@hotmail.com
 CC: eric.duma...@gmail.com; b...@decadent.org.uk; kirja...@gmail.com; 
 net...@vger.kernel.org; benoit.mort...@opensides.be; 
 herb...@gondor.apana.org.au
 Subject: Re: Sundance network driver (D-Link DFE-580TX) timeouts rendering 
 interface unusable
 
 Hi again,
 
 Mike . wrote:
 
  Oh well, we also must make sure we hold np->lock in TX completion when
  doing our test to eventually call netif_wake_queue(); I missed that it
  was released too early.
 
  Here is a more complete patch.
 
  I applied the patch, recompiled the module, loaded it into the kernel and
  started testing traffic on the interface with the following result:
 
  [ 1124.008030] [ cut here ]
  [ 1124.008101] WARNING: at 
  /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255
   dev_watchdog+0xb1/0x104()
  [ 1124.008201] Hardware name:
  [ 1124.008252] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
 [...]
  After this, the same transmit timeouts (as posted earlier) repeat in the
  log until I take the interface down.
 
 Thanks.  I assume current 3.3 release candidates behave the same way.
 
 Based on [2], it looks like v2.6.25-rc9~99^2~24 ([NET]: Add preemption
 point in qdisc_run, 2008-03-28) made this easier to trip.
 
 As for the next step: I'd suggest posting a summary of the symptoms,
 which kernel versions you have tested, and a link to [1] at
 http://bugzilla.kernel.org/, product Drivers, component Network, and
 letting us know the bug number so we can track it without forgetting
 what has already been learned.
 
 Hope that helps,
 Jonathan
 
 [1] http://thread.gmane.org/gmane.linux.network/219101
  

Bug#656476: Info received (Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable)

2012-02-21 Thread Mike .

Some more info.

There is indeed no difference in the sundance driver module between
2.6.18.dfsg.1-23etch1 and 2.6.18.dfsg.1-24etch1, as mentioned by dann frazier
in #514833 (both the driver binary and the source are 100% identical).

Looking at the initial warning from my report, I compared
/net/sched/sch_generic.c in the sources for 2.6.18.dfsg.1-23etch1 and
2.6.18.dfsg.1-24etch1, and there is a difference there:


etch-dlink-test:~/tmp# diff sch_generic-2.6.18.dfsg.1-23etch1.c sch_generic-2.6.18.dfsg.1-24etch1.c
185a186,187
> 	unsigned long start_time = jiffies;
> 
189,190c191,204
< 	while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
< 		/* NOTHING */;
---
> 	while (qdisc_restart(dev) < 0) {
> 		if (netif_queue_stopped(dev))
> 			break;
> 
> 		/*
> 		 * Postpone processing if
> 		 * 1. another process needs the CPU;
> 		 * 2. we've been doing it for too long.
> 		 */
> 		if (need_resched() || jiffies != start_time) {
> 			netif_schedule(dev);
> 			break;
> 		}
> 	}

Hope this helps,

Mike.
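The difference between the two loops can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct model_dev, the model_* functions and the simulated jiffies counter are invented for illustration, and qdisc_restart() is reduced to "hand one packet to the driver, return -1 while more remain". The -23etch1 loop drains the backlog unless the driver stops its queue; the -24etch1 loop can also bail out after a clock tick or when need_resched() is set, leaving the rest to a later netif_schedule() run, which plausibly changes how qdisc processing interleaves with the driver's interrupt handler.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a device as seen by __qdisc_run(). */
struct model_dev {
    int backlog;         /* packets still queued in the qdisc */
    int ring_free;       /* free TX descriptors left in the driver */
    bool queue_stopped;  /* models netif_queue_stopped(dev) */
    bool rescheduled;    /* models a netif_schedule(dev) call */
};

/* Stand-in for qdisc_restart(): hand one packet to the driver.
 * Returns -1 while packets remain, 0 when there is nothing (more) to do. */
static int model_qdisc_restart(struct model_dev *d)
{
    if (d->backlog == 0 || d->queue_stopped)
        return 0;
    d->backlog--;
    if (--d->ring_free == 0)
        d->queue_stopped = true;   /* driver stops its queue when full */
    return d->backlog > 0 ? -1 : 0;
}

/* -23etch1 loop: drain until empty or until the driver stops the queue. */
static int model_qdisc_run_old(struct model_dev *d)
{
    const int before = d->backlog;

    while (model_qdisc_restart(d) < 0 && !d->queue_stopped)
        ;   /* NOTHING */
    return before - d->backlog;    /* packets sent in this run */
}

/* -24etch1 loop: same, plus the preemption point. The simulated jiffies
 * counter advances once every iters_per_tick loop iterations. */
static int model_qdisc_run_new(struct model_dev *d, int iters_per_tick,
                               bool need_resched)
{
    const int before = d->backlog;
    const unsigned long start_time = 0;
    unsigned long jiffies = start_time;
    int iter = 0;

    while (model_qdisc_restart(d) < 0) {
        if (d->queue_stopped)
            break;
        if (++iter % iters_per_tick == 0)
            jiffies++;
        /* Postpone if another task needs the CPU or a tick has passed. */
        if (need_resched || jiffies != start_time) {
            d->rescheduled = true;  /* remaining work runs later */
            break;
        }
    }
    return before - d->backlog;    /* packets sent in this run */
}
```

For example, with a 10-packet backlog and the clock ticking every 4 iterations, model_qdisc_run_old() sends all 10 packets, while model_qdisc_run_new() sends 4 and marks the device for rescheduling.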

  

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-19 Thread Mike .

Any chance of a follow-up on this?


Did some more searching myself and ran into bug report #514833, which
appears to be exactly the same problem.

Grabbed an old Etch (4.0r7) ISO and installed it to test for the problem, and
indeed the error occurs after some heavy traffic.

etch-dlink-test:~# dpkg -l |grep linux-image
ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-24etch1
Linux 2.6.18 image on PPro/Celeron/PII/PIII/

After downgrading the kernel image (taken from an Etch 4.0r6 ISO) to:

etch-dlink-test:~# dpkg -l |grep linux-image
ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-23etch1
Linux 2.6.18 image on PPro/Celeron/PII/PIII/

I tested the box for an hour with maximum incoming+outgoing traffic on the 
interface without any problems.

If there's any other testing I can do please let me know.

Mike.

  

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-19 Thread Jonathan Nieder
found 656476 linux-2.6/2.6.18.dfsg.1-24etch1
quit

Hi Mike,

Mike . wrote:

 Grabbed an old Etch (4.0r7) iso and installed it to test for the
 problem and indeed after some heavy traffic the error occurs.

 etch-dlink-test:~# dpkg -l |grep linux-image
 ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-24etch1
 Linux 2.6.18 image on PPro/Celeron/PII/PIII/

 After downgrading the kernel-image (taken from an Etch 4.0r6 iso) to :

 etch-dlink-test:~# dpkg -l |grep linux-image
 ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-23etch1
 Linux 2.6.18 image on PPro/Celeron/PII/PIII/

 I tested the box for an hour with maximum incoming+outgoing traffic on the 
 interface without any problems.

That's awesome. :)  Does 2.6.20-1 reproduce the bug, too?

That range points to

  * NET: Add preemption point in qdisc_run (CVE-2008-5713)

which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the
triggering change.  Old kernels can be found at
http://snapshot.debian.org/ if you are curious about how any
particular one behaves.



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20120220011351.GE969@burratino



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-19 Thread Ben Hutchings
On Sun, 2012-02-19 at 19:13 -0600, Jonathan Nieder wrote:
 found 656476 linux-2.6/2.6.18.dfsg.1-24etch1
 quit
 
 Hi Mike,
 
 Mike . wrote:
 
  Grabbed an old Etch (4.0r7) iso and installed it to test for the
  problem and indeed after some heavy traffic the error occurs.
 
  etch-dlink-test:~# dpkg -l |grep linux-image
  ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-24etch1
  Linux 2.6.18 image on PPro/Celeron/PII/PIII/
 
  After downgrading the kernel-image (taken from an Etch 4.0r6 iso) to :
 
  etch-dlink-test:~# dpkg -l |grep linux-image
  ii  linux-image-2.6.18-6-686  2.6.18.dfsg.1-23etch1
  Linux 2.6.18 image on PPro/Celeron/PII/PIII/
 
  I tested the box for an hour with maximum incoming+outgoing traffic on the 
  interface without any problems.
 
 That's awesome. :)  Does 2.6.20-1 reproduce the bug, too?
 
 That range points to
 
   * NET: Add preemption point in qdisc_run (CVE-2008-5713)

This just made the existing race conditions in the driver easier to hit.

Ben.

 which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the
 triggering change.  Old kernels can be found at
 http://snapshot.debian.org/ if you are curious about how any
 particular one behaves.
 
 
 

-- 
Ben Hutchings
If at first you don't succeed, you're doing about average.


signature.asc
Description: This is a digitally signed message part


Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-19 Thread Jonathan Nieder
reassign 514833 linux-2.6 linux-2.6/2.6.18.dfsg.1-24etch1
merge 656476 514833
quit

Ben Hutchings wrote:
 On Sun, 2012-02-19 at 19:13 -0600, Jonathan Nieder wrote:

   * NET: Add preemption point in qdisc_run (CVE-2008-5713)

 This just made the existing race conditions in the driver easier to
 hit.

Sure.  I was mostly happy with the discovery because it provides an
answer to the question "How could everyone working on the driver have
missed this?"



-- 
Archive: http://lists.debian.org/20120220022638.GG969@burratino



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-19 Thread Mike .

  That's awesome. :)  Does 2.6.20-1 reproduce the bug, too?

It does indeed, yes.

  That range points to
  
* NET: Add preemption point in qdisc_run (CVE-2008-5713)
 
 This just made the existing race conditions in the driver easier to hit.

Just as with the 2.6.18 kernel, it takes quite some time/traffic to trigger the
bug; on the 2.6.32 and 3.2.0 kernels it happens much faster.

  which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the
  triggering change.  Old kernels can be found at
  http://snapshot.debian.org/ if you are curious about how any
  particular one behaves.

Thanks for the pointer! Much easier than searching ISOs for the correct file :)

Mike.

  

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-02-01 Thread Mike .

 Oh well, we also must make sure we hold np->lock in TX completion when
 doing our test to eventually call netif_wake_queue(); I missed that it
 was released too early.
 
 Here is a more complete patch.

I applied the patch, recompiled the module, loaded it into the kernel and
started testing traffic on the interface with the following result:

[ 1124.008030] [ cut here ]
[ 1124.008101] WARNING: at 
/build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255
 dev_watchdog+0xb1/0x104()
[ 1124.008201] Hardware name:
[ 1124.008252] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
[ 1124.008309] Modules linked in: sundance(O) p4_clockmod cpufreq_powersave 
cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib mperf fuse 
w83627ehf hwmon_vid coretemp loop ohci_hcd snd_intel8x0 snd_ac97_codec ehci_hcd 
ac97_bus snd_pcm usbcore snd_seq snd_timer snd_seq_device shpchp psmouse snd 
sis900 pci_hotplug serio_raw pcspkr mii evdev soundcore parport_pc 
snd_page_alloc parport processor tpm_tis tpm tpm_bios thermal_sys button 
usb_common ext3 jbd mbcache sd_mod crc_t10dif ata_generic sata_sis pata_sis 
libata scsi_mod [last unloaded: sundance]
[ 1124.010147] Pid: 5122, comm: gnome-terminal Tainted: G   O 
3.2.0-1-686-pae #1
[ 1124.010219] Call Trace:
[ 1124.010286]  [c1038280] ? warn_slowpath_common+0x68/0x79
[ 1124.010344]  [c1229e38] ? dev_watchdog+0xb1/0x104
[ 1124.010399]  [c10382f9] ? warn_slowpath_fmt+0x29/0x2d
[ 1124.010455]  [c1229e38] ? dev_watchdog+0xb1/0x104
[ 1124.010511]  [c103ccb5] ? local_bh_enable+0x2/0x2
[ 1124.010567]  [c1041e78] ? run_timer_softirq+0x150/0x1f3
[ 1124.010622]  [c1229d87] ? netif_tx_unlock+0x3a/0x3a
[ 1124.010678]  [c103ccb5] ? local_bh_enable+0x2/0x2
[ 1124.010733]  [c103cd49] ? __do_softirq+0x94/0x12f
[ 1124.010788]  [c103ccb5] ? local_bh_enable+0x2/0x2
[ 1124.010841]  IRQ  [c103cf3a] ? irq_exit+0x32/0x80
[ 1124.010931]  [c101e6f4] ? smp_apic_timer_interrupt+0x5b/0x65
[ 1124.012339]  [c12b9b11] ? apic_timer_interrupt+0x31/0x38
[ 1124.012397]  [c12b007b] ? set_cpu_sibling_map+0x200/0x250
[ 1124.012452] ---[ end trace d55b57d11770d7d5 ]---

After this, the same transmit timeouts (as posted earlier) repeat in the log
until I take the interface down.

Mike.

  

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Denis Kirjanov
I'll check this out. After kernel.org was cracked, I lost my
@kernel.org mail account.

On 1/29/12, Ben Hutchings b...@decadent.org.uk wrote:
 [Trying a different address.]

 Denis,

 It looks like you were working on sundance for a while; are you still
 interested in it?

 Mike reported that:
 Network traffic on my D-Link DFE-580TX card results in a transmit
 queue timeout and gives endless resets after that until the interface
 is brought down.

 The amount of traffic required to generate the error seems to vary but
 sooner rather than later it will occur.

 and the messages logged under Linux 3.2.1 are:

 [  430.008026] [ cut here ]
 [  430.008100] WARNING:
 at
 /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255
 dev_watchdog+0xb1/0x104()
 [  430.008200] Hardware name:
 [  430.008251] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0
 timed out
 [  430.008307] Modules linked in: p4_clockmod cpufreq_powersave
 cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib
 mperf fuse w83627ehf hwmon_vid coretemp loop snd_intel8x0
 snd_ac97_codec ac97_bus snd_pcm snd_seq snd_timer
 snd_seq_device ohci_hcd ehci_hcd tpm_tis sis900 sundance tpm usbcore
 tpm_bios pcspkr psmouse snd parport_pc evdev serio_raw parport mii
 button usb_common soundcore processor shpchp pci_hotplug thermal_sys
 snd_page_alloc ext3 jbd mbcache
 sd_mod crc_t10dif sata_sis ata_generic pata_sis libata scsi_mod
 [  430.010093] Pid: 0, comm: swapper/0 Not tainted 3.2.0-1-686-pae #1
 [  430.010149] Call Trace:
 [  430.010203]  [c1038280] ? warn_slowpath_common+0x68/0x79
 [  430.010260]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010314]  [c10382f9] ? warn_slowpath_fmt+0x29/0x2d
 [  430.010370]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010428]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010484]  [c1041e78] ? run_timer_softirq+0x150/0x1f3
 [  430.010539]  [c1229d87] ? netif_tx_unlock+0x3a/0x3a
 [  430.010595]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010649]  [c103cd49] ? __do_softirq+0x94/0x12f
 [  430.010704]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010757]  IRQ  [c103cf3a] ? irq_exit+0x32/0x80
 [  430.010847]  [c101e6f4] ? smp_apic_timer_interrupt+0x5b/0x65
 [  430.010906]  [c12b9b11] ? apic_timer_interrupt+0x31/0x38
 [  430.010963]  [c120007b] ? rtc_proc_show+0x15e/0x22d
 [  430.011020]  [c1010e5a] ? mwait_idle+0x65/0x8b
 [  430.011076]  [c100b234] ? cpu_idle+0x95/0xaf
 [  430.011132]  [c1412708] ? start_kernel+0x32a/0x32f
 [  430.011185] ---[ end trace 4f9c55881a85ddc2 ]---
 [  430.011244] eth1: Transmit timed out, TxStatus 00 TxFrameId 1a,
 resetting...
 [  430.011302] 00 35afc000 35afc010 8001(00) 34c2d802 85ea
 [  430.011307] 01 35afc010 35afc020 0005(01) 34cfc802 85ea
 [  430.011311] 02 35afc020 35afc030 8009(02) 357ca802 85ea
 [  430.011316] 03 35afc030 35afc040 000d(03) 34d01802 85ea
 [  430.011320] 04 35afc040 35afc050 8011(04) 34d2 85ea
 [  430.011324] 05 35afc050 35afc060 0015(05) 35a9f802 85ea
 [  430.011328] 06 35afc060 35afc070 8019(06) 34c75002 85ea
 [  430.011333] 07 35afc070 35afc080 001d(07) 35ac0002 85ea
 [  430.011337] 08 35afc080 35afc090 8021(08) 34d4e802 85ea
 [  430.011341] 09 35afc090 35afc0a0 0025(09) 357b0002 85ea
 [  430.011346] 0a 35afc0a0 35afc0b0 8029(0a) 34d66802 85ea
 [  430.011350] 0b 35afc0b0 35afc0c0 002d(0b) 354f2802 85ea
 [  430.011354] 0c 35afc0c0 35afc0d0 8031(0c) 34d04802 85ea
 [  430.011359] 0d 35afc0d0 35afc0e0 0035(0d) 34cd1002 85ea
 [  430.011363] 0e 35afc0e0 35afc0f0 8039(0e) 34cc9802 85ea
 [  430.011367] 0f 35afc0f0 35afc100 003d(0f) 34d3d002 85ea
 [  430.011371] 10 35afc100 35afc110 8041(10) 355d3002 85ea
 [  430.011376] 11 35afc110 35afc120 0045(11) 34d02802 85ea
 [  430.011380] 12 35afc120 35afc130 8049(12) 34d8b002 85ea
 [  430.011384] 13 35afc130 35afc140 004d(13) 34cc9002 85ea
 [  430.011389] 14 35afc140 35afc150 8051(14) 34d51002 85ea
 [  430.011393] 15 35afc150 35afc160 0055(15) 357c7802 85ea
 [  430.011397] 16 35afc160  8059(16) 34d4f002 85ea
 [  430.011401] 17 35afc170 35afc180 0001805d(17)  
 [  430.011406] 18 35afc180 35afc190 00018061(18)  
 [  430.011410] 19 35afc190 35afc1a0 00018065(19)  
 [  430.011414] 1a 35afc1a0 35afc1b0 00018069(1a)  
 [  430.011419] 1b 35afc1b0 35afc1c0 806d(1b) 34eea002 85ea
 [  430.011423] 1c 35afc1c0 35afc1d0 8071(1c) 355d9802 85ea
 [  430.011427] 1d 35afc1d0 35afc1e0 8075(1d) 34d19002 85ea
 [  430.011431] 1e 35afc1e0 35afc1f0 8079(1e) 354e4002 85ea
 [  430.011436] 1f 35afc1f0 35afc000 007d(1f) 354ea002 85ea
 [  430.011440] TxListPtr=35afc1b0 netif_queue_stopped=1
 [  430.011444] cur_tx=154807(17) dirty_tx=154779(1b)
 [  430.011447] 
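The cur_tx/dirty_tx line at the end of the dump can be decoded with modular arithmetic. A minimal sketch, assuming TX_RING_SIZE is 32 (the dump shows entries 00 to 1f) and assuming TX_QUEUE_LEN is TX_RING_SIZE - 1 (not stated in this thread); the MODEL_ names and helper functions are hypothetical. It reproduces the (17) and (1b) slot numbers printed next to the counters and shows 28 descriptors outstanding while netif_queue_stopped=1.

```c
/* Assumed values: 32 slots matches entries 00..1f in the dump; a queue
 * length of TX_RING_SIZE - 1 is an assumption, not taken from this thread. */
#define MODEL_TX_RING_SIZE 32u
#define MODEL_TX_QUEUE_LEN (MODEL_TX_RING_SIZE - 1u)

/* Ring slot used by a free-running counter such as cur_tx or dirty_tx. */
static unsigned int ring_entry(unsigned int counter)
{
    return counter % MODEL_TX_RING_SIZE;
}

/* Descriptors queued but not yet reclaimed. Unsigned subtraction stays
 * correct when the counters wrap past UINT_MAX. */
static unsigned int tx_in_flight(unsigned int cur_tx, unsigned int dirty_tx)
{
    return cur_tx - dirty_tx;
}

/* Stop threshold used by start_tx() (as in the proposed patch). */
static int would_stop(unsigned int in_flight)
{
    return in_flight >= MODEL_TX_QUEUE_LEN - 1;
}

/* Wake threshold used by intr_handler(). */
static int would_wake(unsigned int in_flight)
{
    return in_flight < MODEL_TX_QUEUE_LEN - 4;
}
```

With these assumed values, 28 outstanding descriptors is below the stop threshold (30) but not below the wake threshold (27), so intr_handler() would not wake the queue until at least two more descriptors complete; TxListPtr=35afc1b0 in the dump is the address of slot 1b, the oldest unreclaimed entry.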

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Eric Dumazet
On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
 I'll check this out. After kernel.org was cracked, I lost my
 @kernel.org mail account.


At first glance, start_tx() is racy against TX completion.

It does:

if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
    !netif_queue_stopped(dev)) {
	/* do nothing */
} else {
	netif_stop_queue (dev);
}

So it can call netif_stop_queue() right after the TX completion handler
has cleaned up all queued packets.


Note intr_handler() doesn't hold the queue spinlock when it does:

if (netif_queue_stopped(dev) &&
    np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
	/* The ring is no longer full, clear busy flag. */
	netif_wake_queue (dev);
}
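The window described above can be made concrete with a single-threaded replay of the interleaving. This is a hypothetical model: struct txq, QLEN (standing in for TX_QUEUE_LEN, assumed to be 31) and the helper functions are invented, and the queue state is a plain bool rather than the kernel's netif_stop_queue()/netif_wake_queue(). Like the description above, it assumes the completion reclaims every queued packet between start_tx()'s test and its netif_stop_queue() call.

```c
#include <stdbool.h>

#define QLEN 31u   /* assumed stand-in for TX_QUEUE_LEN */

/* Hypothetical model of the shared TX state. */
struct txq {
    unsigned int cur_tx;    /* advanced by start_tx() */
    unsigned int dirty_tx;  /* advanced by the TX completion handler */
    bool stopped;           /* models netif_queue_stopped(dev) */
};

/* start_tx(), first half: the unlocked test from the original driver. */
static bool start_tx_wants_stop(const struct txq *q)
{
    return !(q->cur_tx - q->dirty_tx < QLEN - 1 && !q->stopped);
}

/* start_tx(), second half: act on a result that may now be stale. */
static void start_tx_stop(struct txq *q, bool wants_stop)
{
    if (wants_stop)
        q->stopped = true;            /* netif_stop_queue(dev) */
}

/* TX completion in intr_handler(): reclaim everything, then maybe wake. */
static void tx_completion_all(struct txq *q)
{
    q->dirty_tx = q->cur_tx;
    if (q->stopped && q->cur_tx - q->dirty_tx < QLEN - 4)
        q->stopped = false;           /* netif_wake_queue(dev) */
}

/* Returns true if the queue ends up stopped with nothing in flight,
 * i.e. no further completion will ever wake it. */
static bool replay_is_stuck(bool irq_between_test_and_stop)
{
    struct txq q = { .cur_tx = 30, .dirty_tx = 0, .stopped = false };
    bool wants_stop = start_tx_wants_stop(&q);   /* 30 in flight: "full" */

    if (irq_between_test_and_stop)
        tx_completion_all(&q);        /* sees stopped == false: no wake */
    start_tx_stop(&q, wants_stop);
    if (!irq_between_test_and_stop)
        tx_completion_all(&q);        /* sees stopped == true: wakes */
    return q.stopped && q.cur_tx == q.dirty_tx;
}
```

Replaying the same steps with the completion after the stop leaves the queue running, which is why the failure depends on timing.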





-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/1327918447.2288.24.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Eric Dumazet
On Monday, 30 January 2012 at 11:14 +0100, Eric Dumazet wrote:
 On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
  I'll check this out. After kernel.org was cracked, I lost my
  @kernel.org mail account.
 
 
 At first glance, start_tx() is racy against TX completion.
 
 It does:
 
 if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
     !netif_queue_stopped(dev)) {
 	/* do nothing */
 } else {
 	netif_stop_queue (dev);
 }
 
 So it can call netif_stop_queue() right after the TX completion handler
 has cleaned up all queued packets.
 
 
 Note intr_handler() doesn't hold the queue spinlock when it does:
 
 if (netif_queue_stopped(dev) &&
     np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
 	/* The ring is no longer full, clear busy flag. */
 	netif_wake_queue (dev);
 }
 

So I would try the following patch:

 drivers/net/ethernet/dlink/sundance.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/dlink/sundance.c b/drivers/net/ethernet/dlink/sundance.c
index 28a3a9b..c671a6c 100644
--- a/drivers/net/ethernet/dlink/sundance.c
+++ b/drivers/net/ethernet/dlink/sundance.c
@@ -1099,11 +1099,13 @@ start_tx (struct sk_buff *skb, struct net_device *dev)
 	tasklet_schedule(&np->tx_tasklet);
 
 	/* On some architectures: explicitly flush cache lines here. */
-	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
-	    !netif_queue_stopped(dev)) {
-		/* do nothing */
-	} else {
-		netif_stop_queue (dev);
+	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&np->lock, flags);
+		if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
+			netif_stop_queue(dev);
+		spin_unlock_irqrestore(&np->lock, flags);
 	}
 	if (netif_msg_tx_queued(np)) {
 		printk (KERN_DEBUG





-- 
Archive: 
http://lists.debian.org/1327919763.2288.26.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Ben Hutchings
On Mon, 2012-01-30 at 11:14 +0100, Eric Dumazet wrote:
 On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
  I'll check this out. After kernel.org was cracked, I lost my
  @kernel.org mail account.
 
 
 At first glance, start_tx() is racy against TX completion.
 
 It does:
 
 if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
     !netif_queue_stopped(dev)) {
 	/* do nothing */
 } else {
 	netif_stop_queue (dev);
 }
 
 So it can call netif_stop_queue() right after the TX completion handler
 has cleaned up all queued packets.

Yes, I spotted that.  But no descriptors are pushed to the hardware
here; that's done in the driver's TX tasklet.  Although... maybe that
can run immediately when scheduled from here?  I've never had to deal
with tasklets so I really don't know their semantics.

Ben.

 Note intr_handler() doesn't hold the queue spinlock when it does:
 
 if (netif_queue_stopped(dev) &&
     np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
 	/* The ring is no longer full, clear busy flag. */
 	netif_wake_queue (dev);
 }
 
 
 

-- 
Ben Hutchings
Lowery's Law:
 If it jams, force it. If it breaks, it needed replacing anyway.




Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Eric Dumazet
On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:

 Yes, I spotted that.  But no descriptors are pushed to the hardware
 here; that's done in the driver's TX tasklet.  Although... maybe that
 can run immediately when scheduled from here?  I've never had to deal
 with tasklets so I really don't know their semantics.

That's probable on SMP...





-- 
Archive: 
http://lists.debian.org/1327933736.2288.41.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Ben Hutchings
On Mon, 2012-01-30 at 15:28 +0100, Eric Dumazet wrote:
 On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:
 
  Yes, I spotted that.  But no descriptors are pushed to the hardware
  here; that's done in the driver's TX tasklet.  Although... maybe that
  can run immediately when scheduled from here?  I've never had to deal
  with tasklets so I really don't know their semantics.
 
 That's probable on SMP...

The bug report is for a UP system running a kernel built with
SMP-alternatives.

Ben.

-- 
Ben Hutchings
Lowery's Law:
 If it jams, force it. If it breaks, it needed replacing anyway.




Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Eric Dumazet
On Monday, 30 January 2012 at 14:41 +0000, Ben Hutchings wrote:
 On Mon, 2012-01-30 at 15:28 +0100, Eric Dumazet wrote:
  On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:
  
   Yes, I spotted that.  But no descriptors are pushed to the hardware
   here; that's done in the driver's TX tasklet.  Although... maybe that
   can run immediately when scheduled from here?  I've never had to deal
   with tasklets so I really don't know their semantics.
  
  That's probable on SMP...
 
 The bug report is for a UP system running a kernel built with
 SMP-alternatives.

Hmm, TX _completion_ is not run from the tasklet but from the hardware
IRQ handler; this is why I added the spin_lock_irqsave().


The tasklet fires the TX, but the hardware IRQ handler does the TX
completion part.

This driver is ... interesting :)
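The split described above can be pictured as free-running counters advanced by different contexts. This is a hypothetical model: cur_tx and dirty_tx come from the driver code quoted in this thread and a cur_task counter appears in the timeout dump, but hw_done, struct tx_pipeline and the stage_* functions are invented here. start_tx() only fills descriptors, the tasklet hands them to the NIC, and the hardware IRQ reclaims what the NIC has finished, so TX completion can never get ahead of what the tasklet has already pushed.

```c
#include <stdbool.h>

/* Hypothetical model of the TX stages; all counters are free-running. */
struct tx_pipeline {
    unsigned int cur_tx;    /* descriptors filled by start_tx() */
    unsigned int cur_task;  /* descriptors handed to the NIC by the tasklet */
    unsigned int hw_done;   /* descriptors the NIC has sent (invented) */
    unsigned int dirty_tx;  /* descriptors reclaimed by the IRQ handler */
};

/* start_tx(): fill one descriptor (the real driver then schedules the tasklet). */
static void stage_start_tx(struct tx_pipeline *p) { p->cur_tx++; }

/* TX tasklet: hand every filled descriptor to the hardware. */
static void stage_tasklet(struct tx_pipeline *p) { p->cur_task = p->cur_tx; }

/* NIC: finish up to n of the descriptors it has been given. */
static void stage_nic(struct tx_pipeline *p, unsigned int n)
{
    unsigned int pending = p->cur_task - p->hw_done;
    p->hw_done += n < pending ? n : pending;
}

/* Hardware IRQ (TX completion): reclaim what the NIC has finished. */
static void stage_irq(struct tx_pipeline *p) { p->dirty_tx = p->hw_done; }

/* Invariant, measured as distances from dirty_tx so it survives wraparound:
 * dirty_tx <= hw_done <= cur_task <= cur_tx. */
static bool pipeline_ok(const struct tx_pipeline *p)
{
    return p->hw_done - p->dirty_tx <= p->cur_task - p->dirty_tx &&
           p->cur_task - p->dirty_tx <= p->cur_tx - p->dirty_tx;
}
```

For example, after three stage_start_tx() calls an IRQ reclaims nothing until stage_tasklet() has run and the NIC has reported progress.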







-- 
Archive: 
http://lists.debian.org/1327935455.2297.5.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-30 Thread Eric Dumazet
On Monday, 30 January 2012 at 15:57 +0100, Eric Dumazet wrote:

 Hmm, TX _completion_ is not run from the tasklet but from the hardware
 IRQ handler; this is why I added the spin_lock_irqsave().
 
 
 The tasklet fires the TX, but the hardware IRQ handler does the TX
 completion part.
 
 This driver is ... interesting :)
 

Oh well, we also must make sure we hold np->lock in TX completion when
doing our test to eventually call netif_wake_queue(); I missed that it
was released too early.

Here is a more complete patch.

diff --git a/drivers/net/ethernet/dlink/sundance.c b/drivers/net/ethernet/dlink/sundance.c
index 28a3a9b..d5e9472 100644
--- a/drivers/net/ethernet/dlink/sundance.c
+++ b/drivers/net/ethernet/dlink/sundance.c
@@ -1099,11 +1099,13 @@ start_tx (struct sk_buff *skb, struct net_device *dev)
 	tasklet_schedule(&np->tx_tasklet);
 
 	/* On some architectures: explicitly flush cache lines here. */
-	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
-	    !netif_queue_stopped(dev)) {
-		/* do nothing */
-	} else {
-		netif_stop_queue (dev);
+	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&np->lock, flags);
+		if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
+			netif_stop_queue(dev);
+		spin_unlock_irqrestore(&np->lock, flags);
 	}
 	if (netif_msg_tx_queued(np)) {
 		printk (KERN_DEBUG
@@ -1242,8 +1244,8 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 		hw_frame_id = ioread8(ioaddr + TxFrameId);
 	}
 
+	spin_lock(&np->lock);
 	if (np->pci_dev->revision >= 0x14) {
-		spin_lock(&np->lock);
 		for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
 			int entry = np->dirty_tx % TX_RING_SIZE;
 			struct sk_buff *skb;
@@ -1267,9 +1269,7 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 			np->tx_ring[entry].frag[0].addr = 0;
 			np->tx_ring[entry].frag[0].length = 0;
 		}
-		spin_unlock(&np->lock);
 	} else {
-		spin_lock(&np->lock);
 		for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
 			int entry = np->dirty_tx % TX_RING_SIZE;
 			struct sk_buff *skb;
@@ -1286,7 +1286,6 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 			np->tx_ring[entry].frag[0].addr = 0;
 			np->tx_ring[entry].frag[0].length = 0;
 		}
-		spin_unlock(&np->lock);
 	}
 
 	if (netif_queue_stopped(dev) &&
@@ -1294,6 +1293,7 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 		/* The ring is no longer full, clear busy flag. */
 		netif_wake_queue (dev);
 	}
+	spin_unlock(&np->lock);
 	/* Abnormal error summary/uncommon events handlers. */
 	if (intr_status & (IntrPCIErr | LinkChange | StatsMax))
 		netdev_error(dev, intr_status);
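The intent of this patch can be sketched as a deterministic replay. Hypothetical model: an atomic_flag spinlock stands in for np->lock, QLEN for TX_QUEUE_LEN (assumed to be 31), and struct txq and the helpers are invented. start_tx() re-tests the ring under the lock before stopping, and intr_handler() keeps the lock across both the reclaim loop and the wake test. The replay forces each ordering by call order, so it checks the logic rather than exercising real concurrency. Note that Mike reported the transmit timeouts still occurred with this patch applied, so the sketch illustrates the proposed fix, not a confirmed cure.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QLEN 31u   /* assumed stand-in for TX_QUEUE_LEN */

/* Stands in for np->lock: a minimal spinlock built on atomic_flag. */
static atomic_flag model_np_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&model_np_lock,
                                             memory_order_acquire))
        ;   /* spin, like spin_lock() */
}

static void model_unlock(void)
{
    atomic_flag_clear_explicit(&model_np_lock, memory_order_release);
}

/* Hypothetical model of the shared TX state. */
struct txq {
    unsigned int cur_tx;    /* advanced by start_tx() */
    unsigned int dirty_tx;  /* advanced by TX completion */
    bool stopped;           /* models netif_queue_stopped(dev) */
};

/* Patched start_tx(): after an unlocked "looks full" test, re-test under
 * the lock so a completion that already drained the ring is seen. */
static void stop_if_still_full(struct txq *q)
{
    model_lock();
    if (q->cur_tx - q->dirty_tx >= QLEN - 1)
        q->stopped = true;              /* netif_stop_queue(dev) */
    model_unlock();
}

/* Patched intr_handler(): reclaim and the wake test share one critical
 * section instead of dropping the lock after the reclaim loop. */
static void tx_completion(struct txq *q, unsigned int done)
{
    model_lock();
    q->dirty_tx += done;
    if (q->stopped && q->cur_tx - q->dirty_tx < QLEN - 4)
        q->stopped = false;             /* netif_wake_queue(dev) */
    model_unlock();
}

/* Forces one of the two orderings around start_tx()'s lock; returns true
 * if the queue ends up stopped with an empty ring (nobody left to wake it). */
static bool replay_patched_is_stuck(bool irq_in_window)
{
    struct txq q = { .cur_tx = 30, .dirty_tx = 0, .stopped = false };
    bool looks_full = q.cur_tx - q.dirty_tx >= QLEN - 1;  /* unlocked test */

    if (irq_in_window)
        tx_completion(&q, 30);          /* IRQ drains the ring first */
    if (looks_full)
        stop_if_still_full(&q);         /* locked re-test; skips the stop if drained */
    if (!irq_in_window)
        tx_completion(&q, 30);          /* IRQ after the stop wakes the queue */
    return q.stopped && q.cur_tx == q.dirty_tx;
}
```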





-- 
Archive: 
http://lists.debian.org/1327936900.2297.7.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC



Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-28 Thread Ben Hutchings
Denis,

It looks like you were working on sundance for a while; are you still
interested in it?

Mike reported that:
 Network traffic on my D-Link DFE-580TX card results in a transmit
 queue timeout and gives endless resets after that until the interface
 is brought down.
 
 The amount of traffic required to generate the error seems to vary but
 sooner rather than later it will occur.

and the messages logged under Linux 3.2.1 are:

 [  430.008026] [ cut here ]
 [  430.008100] WARNING:
 at 
 /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255
  dev_watchdog+0xb1/0x104()
 [  430.008200] Hardware name:
 [  430.008251] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0
 timed out
 [  430.008307] Modules linked in: p4_clockmod cpufreq_powersave
 cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib
 mperf fuse w83627ehf hwmon_vid coretemp loop snd_intel8x0
 snd_ac97_codec ac97_bus snd_pcm snd_seq snd_timer
 snd_seq_device ohci_hcd ehci_hcd tpm_tis sis900 sundance tpm usbcore
 tpm_bios pcspkr psmouse snd parport_pc evdev serio_raw parport mii
 button usb_common soundcore processor shpchp pci_hotplug thermal_sys
 snd_page_alloc ext3 jbd mbcache
 sd_mod crc_t10dif sata_sis ata_generic pata_sis libata scsi_mod
 [  430.010093] Pid: 0, comm: swapper/0 Not tainted 3.2.0-1-686-pae #1
 [  430.010149] Call Trace:
 [  430.010203]  [c1038280] ? warn_slowpath_common+0x68/0x79
 [  430.010260]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010314]  [c10382f9] ? warn_slowpath_fmt+0x29/0x2d
 [  430.010370]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010428]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010484]  [c1041e78] ? run_timer_softirq+0x150/0x1f3
 [  430.010539]  [c1229d87] ? netif_tx_unlock+0x3a/0x3a
 [  430.010595]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010649]  [c103cd49] ? __do_softirq+0x94/0x12f
 [  430.010704]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010757]  IRQ  [c103cf3a] ? irq_exit+0x32/0x80
 [  430.010847]  [c101e6f4] ? smp_apic_timer_interrupt+0x5b/0x65
 [  430.010906]  [c12b9b11] ? apic_timer_interrupt+0x31/0x38
 [  430.010963]  [c120007b] ? rtc_proc_show+0x15e/0x22d
 [  430.011020]  [c1010e5a] ? mwait_idle+0x65/0x8b
 [  430.011076]  [c100b234] ? cpu_idle+0x95/0xaf
 [  430.011132]  [c1412708] ? start_kernel+0x32a/0x32f
 [  430.011185] ---[ end trace 4f9c55881a85ddc2 ]---
 [  430.011244] eth1: Transmit timed out, TxStatus 00 TxFrameId 1a,
 resetting...
 [  430.011302] 00 35afc000 35afc010 8001(00) 34c2d802 85ea
 [  430.011307] 01 35afc010 35afc020 0005(01) 34cfc802 85ea
 [  430.011311] 02 35afc020 35afc030 8009(02) 357ca802 85ea
 [  430.011316] 03 35afc030 35afc040 000d(03) 34d01802 85ea
 [  430.011320] 04 35afc040 35afc050 8011(04) 34d2 85ea
 [  430.011324] 05 35afc050 35afc060 0015(05) 35a9f802 85ea
 [  430.011328] 06 35afc060 35afc070 8019(06) 34c75002 85ea
 [  430.011333] 07 35afc070 35afc080 001d(07) 35ac0002 85ea
 [  430.011337] 08 35afc080 35afc090 8021(08) 34d4e802 85ea
 [  430.011341] 09 35afc090 35afc0a0 0025(09) 357b0002 85ea
 [  430.011346] 0a 35afc0a0 35afc0b0 8029(0a) 34d66802 85ea
 [  430.011350] 0b 35afc0b0 35afc0c0 002d(0b) 354f2802 85ea
 [  430.011354] 0c 35afc0c0 35afc0d0 8031(0c) 34d04802 85ea
 [  430.011359] 0d 35afc0d0 35afc0e0 0035(0d) 34cd1002 85ea
 [  430.011363] 0e 35afc0e0 35afc0f0 8039(0e) 34cc9802 85ea
 [  430.011367] 0f 35afc0f0 35afc100 003d(0f) 34d3d002 85ea
 [  430.011371] 10 35afc100 35afc110 8041(10) 355d3002 85ea
 [  430.011376] 11 35afc110 35afc120 0045(11) 34d02802 85ea
 [  430.011380] 12 35afc120 35afc130 8049(12) 34d8b002 85ea
 [  430.011384] 13 35afc130 35afc140 004d(13) 34cc9002 85ea
 [  430.011389] 14 35afc140 35afc150 8051(14) 34d51002 85ea
 [  430.011393] 15 35afc150 35afc160 0055(15) 357c7802 85ea
 [  430.011397] 16 35afc160  8059(16) 34d4f002 85ea
 [  430.011401] 17 35afc170 35afc180 0001805d(17)  
 [  430.011406] 18 35afc180 35afc190 00018061(18)  
 [  430.011410] 19 35afc190 35afc1a0 00018065(19)  
 [  430.011414] 1a 35afc1a0 35afc1b0 00018069(1a)  
 [  430.011419] 1b 35afc1b0 35afc1c0 806d(1b) 34eea002 85ea
 [  430.011423] 1c 35afc1c0 35afc1d0 8071(1c) 355d9802 85ea
 [  430.011427] 1d 35afc1d0 35afc1e0 8075(1d) 34d19002 85ea
 [  430.011431] 1e 35afc1e0 35afc1f0 8079(1e) 354e4002 85ea
 [  430.011436] 1f 35afc1f0 35afc000 007d(1f) 354ea002 85ea
 [  430.011440] TxListPtr=35afc1b0 netif_queue_stopped=1
 [  430.011444] cur_tx=154807(17) dirty_tx=154779(1b)
 [  430.011447] cur_rx=0 dirty_rx=0
 [  430.011449] cur_task=154807
 [  438.008046] eth1: Transmit timed out, TxStatus 00 TxFrameId 00,
 resetting...
 [  438.008115] 00 35afc000 35afc010 00010001(00) 

Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable

2012-01-28 Thread Ben Hutchings
[Trying a different address.]

Denis,

It looks like you were working on sundance for a while; are you still
interested in it?

Mike reported that:
 Network traffic on my D-Link DFE-580TX card results in a transmit
 queue timeout and gives endless resets after that until the interface
 is brought down.
 
 The amount of traffic required to generate the error seems to vary but
 sooner rather than later it will occur.

and the messages logged under Linux 3.2.1 are:

 [  430.008026] [ cut here ]
 [  430.008100] WARNING:
 at 
 /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255
  dev_watchdog+0xb1/0x104()
 [  430.008200] Hardware name:
 [  430.008251] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0
 timed out
 [  430.008307] Modules linked in: p4_clockmod cpufreq_powersave
 cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib
 mperf fuse w83627ehf hwmon_vid coretemp loop snd_intel8x0
 snd_ac97_codec ac97_bus snd_pcm snd_seq snd_timer
 snd_seq_device ohci_hcd ehci_hcd tpm_tis sis900 sundance tpm usbcore
 tpm_bios pcspkr psmouse snd parport_pc evdev serio_raw parport mii
 button usb_common soundcore processor shpchp pci_hotplug thermal_sys
 snd_page_alloc ext3 jbd mbcache
 sd_mod crc_t10dif sata_sis ata_generic pata_sis libata scsi_mod
 [  430.010093] Pid: 0, comm: swapper/0 Not tainted 3.2.0-1-686-pae #1
 [  430.010149] Call Trace:
 [  430.010203]  [c1038280] ? warn_slowpath_common+0x68/0x79
 [  430.010260]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010314]  [c10382f9] ? warn_slowpath_fmt+0x29/0x2d
 [  430.010370]  [c1229e38] ? dev_watchdog+0xb1/0x104
 [  430.010428]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010484]  [c1041e78] ? run_timer_softirq+0x150/0x1f3
 [  430.010539]  [c1229d87] ? netif_tx_unlock+0x3a/0x3a
 [  430.010595]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010649]  [c103cd49] ? __do_softirq+0x94/0x12f
 [  430.010704]  [c103ccb5] ? local_bh_enable+0x2/0x2
 [  430.010757]  IRQ  [c103cf3a] ? irq_exit+0x32/0x80
 [  430.010847]  [c101e6f4] ? smp_apic_timer_interrupt+0x5b/0x65
 [  430.010906]  [c12b9b11] ? apic_timer_interrupt+0x31/0x38
 [  430.010963]  [c120007b] ? rtc_proc_show+0x15e/0x22d
 [  430.011020]  [c1010e5a] ? mwait_idle+0x65/0x8b
 [  430.011076]  [c100b234] ? cpu_idle+0x95/0xaf
 [  430.011132]  [c1412708] ? start_kernel+0x32a/0x32f
 [  430.011185] ---[ end trace 4f9c55881a85ddc2 ]---
 [  430.011244] eth1: Transmit timed out, TxStatus 00 TxFrameId 1a,
 resetting...
 [  430.011302] 00 35afc000 35afc010 8001(00) 34c2d802 85ea
 [  430.011307] 01 35afc010 35afc020 0005(01) 34cfc802 85ea
 [  430.011311] 02 35afc020 35afc030 8009(02) 357ca802 85ea
 [  430.011316] 03 35afc030 35afc040 000d(03) 34d01802 85ea
 [  430.011320] 04 35afc040 35afc050 8011(04) 34d2 85ea
 [  430.011324] 05 35afc050 35afc060 0015(05) 35a9f802 85ea
 [  430.011328] 06 35afc060 35afc070 8019(06) 34c75002 85ea
 [  430.011333] 07 35afc070 35afc080 001d(07) 35ac0002 85ea
 [  430.011337] 08 35afc080 35afc090 8021(08) 34d4e802 85ea
 [  430.011341] 09 35afc090 35afc0a0 0025(09) 357b0002 85ea
 [  430.011346] 0a 35afc0a0 35afc0b0 8029(0a) 34d66802 85ea
 [  430.011350] 0b 35afc0b0 35afc0c0 002d(0b) 354f2802 85ea
 [  430.011354] 0c 35afc0c0 35afc0d0 8031(0c) 34d04802 85ea
 [  430.011359] 0d 35afc0d0 35afc0e0 0035(0d) 34cd1002 85ea
 [  430.011363] 0e 35afc0e0 35afc0f0 8039(0e) 34cc9802 85ea
 [  430.011367] 0f 35afc0f0 35afc100 003d(0f) 34d3d002 85ea
 [  430.011371] 10 35afc100 35afc110 8041(10) 355d3002 85ea
 [  430.011376] 11 35afc110 35afc120 0045(11) 34d02802 85ea
 [  430.011380] 12 35afc120 35afc130 8049(12) 34d8b002 85ea
 [  430.011384] 13 35afc130 35afc140 004d(13) 34cc9002 85ea
 [  430.011389] 14 35afc140 35afc150 8051(14) 34d51002 85ea
 [  430.011393] 15 35afc150 35afc160 0055(15) 357c7802 85ea
 [  430.011397] 16 35afc160  8059(16) 34d4f002 85ea
 [  430.011401] 17 35afc170 35afc180 0001805d(17)  
 [  430.011406] 18 35afc180 35afc190 00018061(18)  
 [  430.011410] 19 35afc190 35afc1a0 00018065(19)  
 [  430.011414] 1a 35afc1a0 35afc1b0 00018069(1a)  
 [  430.011419] 1b 35afc1b0 35afc1c0 806d(1b) 34eea002 85ea
 [  430.011423] 1c 35afc1c0 35afc1d0 8071(1c) 355d9802 85ea
 [  430.011427] 1d 35afc1d0 35afc1e0 8075(1d) 34d19002 85ea
 [  430.011431] 1e 35afc1e0 35afc1f0 8079(1e) 354e4002 85ea
 [  430.011436] 1f 35afc1f0 35afc000 007d(1f) 354ea002 85ea
 [  430.011440] TxListPtr=35afc1b0 netif_queue_stopped=1
 [  430.011444] cur_tx=154807(17) dirty_tx=154779(1b)
 [  430.011447] cur_rx=0 dirty_rx=0
 [  430.011449] cur_task=154807
 [  438.008046] eth1: Transmit timed out, TxStatus 00 TxFrameId 00,
 resetting...
 [  438.008115] 00