Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
Hi,

Sorry for the delay; I couldn't test for quite some time due to some fried hardware. The kernel.org bug report has now been created:

https://bugzilla.kernel.org/show_bug.cgi?id=43152

Cheers,
Mike.

> Date: Fri, 16 Mar 2012 17:04:13 -0500
> From: jrnie...@gmail.com
> To: mike-bugrep...@hotmail.com
> CC: eric.duma...@gmail.com; b...@decadent.org.uk; kirja...@gmail.com; net...@vger.kernel.org; benoit.mort...@opensides.be; herb...@gondor.apana.org.au
> Subject: Re: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
>
> Hi again,
>
> Mike . wrote:
>
>>> Oh well, we also must make sure we held np->lock in TX completion when doing our test to eventually call netif_wake_queue(), I missed it was released too early. here is a more complete patch.
>>
>> I applied the patch, recompiled the module, loaded it into the kernel and started testing traffic on the interface with the following result:
>>
>> [ 1124.008030] ------------[ cut here ]------------
>> [ 1124.008101] WARNING: at /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255 dev_watchdog+0xb1/0x104()
>> [ 1124.008201] Hardware name:
>> [ 1124.008252] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
> [...]
>> After this the same repeat of transmit timeouts (as posted earlier) in the log until I down the interface.
>
> Thanks. I assume current 3.3 release candidates behave the same way.
>
> Based on [2], it looks like v2.6.25-rc9~99^2~24 ([NET]: Add preemption point in qdisc_run, 2008-03-28) made this easier to trip.
>
> As for the next step: I'd suggest posting a summary of the symptoms, which kernel versions you have tested, and a link to [1] at http://bugzilla.kernel.org/, product Drivers, component Network, and letting us know the bug number so we can track it without forgetting what has already been learned.
>
> Hope that helps,
> Jonathan
>
> [1] http://thread.gmane.org/gmane.linux.network/219101
Bug#656476: Info received (Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable)
Some more info.

There is indeed no difference in the sundance driver module between 2.6.18.dfsg.1-23etch1 and 2.6.18.dfsg.1-24etch1, as mentioned by dann frazier in #514833 (both driver binary and source are 100% identical).

Looking at the initial warning from my report, I compared /net/sched/sch_generic.c from the sources for 2.6.18.dfsg.1-23etch1 and 2.6.18.dfsg.1-24etch1, and there is a difference there.

etch-dlink-test:~/tmp# diff sch_generic-2.6.18.dfsg.1-23etch1.c sch_generic-2.6.18.dfsg.1-24etch1.c
185a186,187
> 	unsigned long start_time = jiffies;
>
189,190c191,204
< 	while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
< 		/* NOTHING */;
---
> 	while (qdisc_restart(dev) < 0) {
> 		if (netif_queue_stopped(dev))
> 			break;
>
> 		/*
> 		 * Postpone processing if
> 		 * 1. another process needs the CPU;
> 		 * 2. we've been doing it for too long.
> 		 */
> 		if (need_resched() || jiffies != start_time) {
> 			netif_schedule(dev);
> 			break;
> 		}
> 	}

Hope this helps,
Mike.
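To make the control-flow difference in that diff easier to see, here is a small userspace model of the two loop shapes. It is only a sketch: `struct qdisc_model`, `model_restart()`, `run_old()` and `run_new()` are hypothetical stand-ins, not kernel API, and the "one clock tick per packet" cost is an assumption chosen purely so that the preemption point triggers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct qdisc_model {
	int backlog;            /* packets still waiting in the qdisc */
	bool queue_stopped;     /* stands in for netif_queue_stopped(dev) */
	bool need_resched;      /* stands in for need_resched() */
	unsigned long jiffies;  /* model clock */
	bool rescheduled;       /* set where the kernel would call netif_schedule(dev) */
};

/*
 * Stands in for qdisc_restart(): hands one packet to the driver and returns
 * a negative value while more packets remain. Assumption for illustration:
 * every packet costs one clock tick.
 */
static int model_restart(struct qdisc_model *m)
{
	assert(m->backlog >= 0);
	if (m->backlog == 0)
		return 0;
	m->backlog--;
	m->jiffies++;
	return m->backlog > 0 ? -1 : 0;
}

/* Loop shape from 2.6.18.dfsg.1-23etch1: drain until empty or stopped. */
static void run_old(struct qdisc_model *m)
{
	while (model_restart(m) < 0 && !m->queue_stopped)
		/* NOTHING */;
}

/* Loop shape from 2.6.18.dfsg.1-24etch1, with the added preemption point. */
static void run_new(struct qdisc_model *m)
{
	unsigned long start_time = m->jiffies;

	while (model_restart(m) < 0) {
		if (m->queue_stopped)
			break;
		/* Postpone processing if the CPU is wanted or time has passed. */
		if (m->need_resched || m->jiffies != start_time) {
			m->rescheduled = true;
			break;
		}
	}
}
```

Under these assumptions the old loop drains the whole backlog in one call, while the new loop sends one packet and defers the rest. The model says nothing about why the driver then misbehaves; it only shows that the transmit path is left and re-entered more often.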
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
Any chance of a follow up on this?

Did some more searching myself and ran into the following bug report #514833 which appears to be the exact same problem.

Grabbed an old Etch (4.0r7) iso and installed it to test for the problem and indeed after some heavy traffic the error occurs.

etch-dlink-test:~# dpkg -l |grep linux-image
ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-24etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/

After downgrading the kernel-image (taken from an Etch 4.0r6 iso) to:

etch-dlink-test:~# dpkg -l |grep linux-image
ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-23etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/

I tested the box for an hour with maximum incoming+outgoing traffic on the interface without any problems.

If there's any other testing I can do please let me know.

Mike.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
found 656476 linux-2.6/2.6.18.dfsg.1-24etch1
quit

Hi Mike,

Mike . wrote:

> Grabbed an old Etch (4.0r7) iso and installed it to test for the problem and indeed after some heavy traffic the error occurs.
>
> etch-dlink-test:~# dpkg -l |grep linux-image
> ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-24etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/
>
> After downgrading the kernel-image (taken from an Etch 4.0r6 iso) to:
>
> etch-dlink-test:~# dpkg -l |grep linux-image
> ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-23etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/
>
> I tested the box for an hour with maximum incoming+outgoing traffic on the interface without any problems.

That's awesome. :) Does 2.6.20-1 reproduce the bug, too?

That range points to

  * NET: Add preemption point in qdisc_run (CVE-2008-5713)

which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the triggering change.

Old kernels can be found at http://snapshot.debian.org/ if you are curious about how any particular one behaves.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Sun, 2012-02-19 at 19:13 -0600, Jonathan Nieder wrote:
> found 656476 linux-2.6/2.6.18.dfsg.1-24etch1
> quit
>
> Hi Mike,
>
> Mike . wrote:
>
>> Grabbed an old Etch (4.0r7) iso and installed it to test for the problem and indeed after some heavy traffic the error occurs.
>>
>> etch-dlink-test:~# dpkg -l |grep linux-image
>> ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-24etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/
>>
>> After downgrading the kernel-image (taken from an Etch 4.0r6 iso) to:
>>
>> etch-dlink-test:~# dpkg -l |grep linux-image
>> ii linux-image-2.6.18-6-686 2.6.18.dfsg.1-23etch1 Linux 2.6.18 image on PPro/Celeron/PII/PIII/
>>
>> I tested the box for an hour with maximum incoming+outgoing traffic on the interface without any problems.
>
> That's awesome. :) Does 2.6.20-1 reproduce the bug, too?
>
> That range points to
>
>   * NET: Add preemption point in qdisc_run (CVE-2008-5713)

This just made the existing race conditions in the driver easier to hit.

Ben.

> which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the triggering change.
>
> Old kernels can be found at http://snapshot.debian.org/ if you are curious about how any particular one behaves.

--
Ben Hutchings
If at first you don't succeed, you're doing about average.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
reassign 514833 linux-2.6 linux-2.6/2.6.18.dfsg.1-24etch1
merge 656476 514833
quit

Ben Hutchings wrote:
> On Sun, 2012-02-19 at 19:13 -0600, Jonathan Nieder wrote:

>>   * NET: Add preemption point in qdisc_run (CVE-2008-5713)
>
> This just made the existing race conditions in the driver easier to hit.

Sure. I was mostly happy with the discovery because it provides an answer to the question "How could everyone working on the driver have missed this?"
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
>> That's awesome. :) Does 2.6.20-1 reproduce the bug, too?

It does indeed, yes.

>> That range points to
>>
>>   * NET: Add preemption point in qdisc_run (CVE-2008-5713)
>
> This just made the existing race conditions in the driver easier to hit.

Just as with the 2.6.18 kernel, it takes quite some time/traffic to produce the bug; on the 2.6.32 and 3.2.0 kernels it happens much faster.

>> which is commit v2.6.25-rc9~99^2~24 (2008-03-28) upstream as the triggering change.
>>
>> Old kernels can be found at http://snapshot.debian.org/ if you are curious about how any particular one behaves.

Thanks for the pointer! Much easier than searching ISOs for the correct file :)

Mike.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
> Oh well, we also must make sure we held np->lock in TX completion when doing our test to eventually call netif_wake_queue(), I missed it was released too early. here is a more complete patch.

I applied the patch, recompiled the module, loaded it into the kernel and started testing traffic on the interface with the following result:

[ 1124.008030] ------------[ cut here ]------------
[ 1124.008101] WARNING: at /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255 dev_watchdog+0xb1/0x104()
[ 1124.008201] Hardware name:
[ 1124.008252] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
[ 1124.008309] Modules linked in: sundance(O) p4_clockmod cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib mperf fuse w83627ehf hwmon_vid coretemp loop ohci_hcd snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus snd_pcm usbcore snd_seq snd_timer snd_seq_device shpchp psmouse snd sis900 pci_hotplug serio_raw pcspkr mii evdev soundcore parport_pc snd_page_alloc parport processor tpm_tis tpm tpm_bios thermal_sys button usb_common ext3 jbd mbcache sd_mod crc_t10dif ata_generic sata_sis pata_sis libata scsi_mod [last unloaded: sundance]
[ 1124.010147] Pid: 5122, comm: gnome-terminal Tainted: G O 3.2.0-1-686-pae #1
[ 1124.010219] Call Trace:
[ 1124.010286] [<c1038280>] ? warn_slowpath_common+0x68/0x79
[ 1124.010344] [<c1229e38>] ? dev_watchdog+0xb1/0x104
[ 1124.010399] [<c10382f9>] ? warn_slowpath_fmt+0x29/0x2d
[ 1124.010455] [<c1229e38>] ? dev_watchdog+0xb1/0x104
[ 1124.010511] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 1124.010567] [<c1041e78>] ? run_timer_softirq+0x150/0x1f3
[ 1124.010622] [<c1229d87>] ? netif_tx_unlock+0x3a/0x3a
[ 1124.010678] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 1124.010733] [<c103cd49>] ? __do_softirq+0x94/0x12f
[ 1124.010788] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 1124.010841] <IRQ> [<c103cf3a>] ? irq_exit+0x32/0x80
[ 1124.010931] [<c101e6f4>] ? smp_apic_timer_interrupt+0x5b/0x65
[ 1124.012339] [<c12b9b11>] ? apic_timer_interrupt+0x31/0x38
[ 1124.012397] [<c12b007b>] ? set_cpu_sibling_map+0x200/0x250
[ 1124.012452] ---[ end trace d55b57d11770d7d5 ]---

After this the same repeat of transmit timeouts (as posted earlier) in the log until I down the interface.

Mike.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
I'll check this out.
After kernel.org was cracked I've missed @kernel.org mail account.

On 1/29/12, Ben Hutchings b...@decadent.org.uk wrote:
> [Trying a different address.]
>
> Denis,
>
> It looks like you were working on sundance for a while; are you still interested in it?
>
> Mike reported that:
>
>> Network traffic on my D-Link DFE-580TX card results in a transmit queue timeout and gives endless resets after that until the interface is brought down. The amount of traffic required to generate the error seems to vary but sooner rather than later it will occur.
>
> and the messages logged under Linux 3.2.1 are:
>
> [ 430.008026] ------------[ cut here ]------------
> [ 430.008100] WARNING: at /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255 dev_watchdog+0xb1/0x104()
> [ 430.008200] Hardware name:
> [ 430.008251] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
> [ 430.008307] Modules linked in: p4_clockmod cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib mperf fuse w83627ehf hwmon_vid coretemp loop snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq snd_timer snd_seq_device ohci_hcd ehci_hcd tpm_tis sis900 sundance tpm usbcore tpm_bios pcspkr psmouse snd parport_pc evdev serio_raw parport mii button usb_common soundcore processor shpchp pci_hotplug thermal_sys snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sata_sis ata_generic pata_sis libata scsi_mod
> [ 430.010093] Pid: 0, comm: swapper/0 Not tainted 3.2.0-1-686-pae #1
> [ 430.010149] Call Trace:
> [ 430.010203] [<c1038280>] ? warn_slowpath_common+0x68/0x79
> [ 430.010260] [<c1229e38>] ? dev_watchdog+0xb1/0x104
> [ 430.010314] [<c10382f9>] ? warn_slowpath_fmt+0x29/0x2d
> [ 430.010370] [<c1229e38>] ? dev_watchdog+0xb1/0x104
> [ 430.010428] [<c103ccb5>] ? local_bh_enable+0x2/0x2
> [ 430.010484] [<c1041e78>] ? run_timer_softirq+0x150/0x1f3
> [ 430.010539] [<c1229d87>] ? netif_tx_unlock+0x3a/0x3a
> [ 430.010595] [<c103ccb5>] ? local_bh_enable+0x2/0x2
> [ 430.010649] [<c103cd49>] ? __do_softirq+0x94/0x12f
> [ 430.010704] [<c103ccb5>] ? local_bh_enable+0x2/0x2
> [ 430.010757] <IRQ> [<c103cf3a>] ? irq_exit+0x32/0x80
> [ 430.010847] [<c101e6f4>] ? smp_apic_timer_interrupt+0x5b/0x65
> [ 430.010906] [<c12b9b11>] ? apic_timer_interrupt+0x31/0x38
> [ 430.010963] [<c120007b>] ? rtc_proc_show+0x15e/0x22d
> [ 430.011020] [<c1010e5a>] ? mwait_idle+0x65/0x8b
> [ 430.011076] [<c100b234>] ? cpu_idle+0x95/0xaf
> [ 430.011132] [<c1412708>] ? start_kernel+0x32a/0x32f
> [ 430.011185] ---[ end trace 4f9c55881a85ddc2 ]---
> [ 430.011244] eth1: Transmit timed out, TxStatus 00 TxFrameId 1a, resetting...
> [ 430.011302] 00 35afc000 35afc010 8001(00) 34c2d802 85ea
> [ 430.011307] 01 35afc010 35afc020 0005(01) 34cfc802 85ea
> [ 430.011311] 02 35afc020 35afc030 8009(02) 357ca802 85ea
> [ 430.011316] 03 35afc030 35afc040 000d(03) 34d01802 85ea
> [ 430.011320] 04 35afc040 35afc050 8011(04) 34d2 85ea
> [ 430.011324] 05 35afc050 35afc060 0015(05) 35a9f802 85ea
> [ 430.011328] 06 35afc060 35afc070 8019(06) 34c75002 85ea
> [ 430.011333] 07 35afc070 35afc080 001d(07) 35ac0002 85ea
> [ 430.011337] 08 35afc080 35afc090 8021(08) 34d4e802 85ea
> [ 430.011341] 09 35afc090 35afc0a0 0025(09) 357b0002 85ea
> [ 430.011346] 0a 35afc0a0 35afc0b0 8029(0a) 34d66802 85ea
> [ 430.011350] 0b 35afc0b0 35afc0c0 002d(0b) 354f2802 85ea
> [ 430.011354] 0c 35afc0c0 35afc0d0 8031(0c) 34d04802 85ea
> [ 430.011359] 0d 35afc0d0 35afc0e0 0035(0d) 34cd1002 85ea
> [ 430.011363] 0e 35afc0e0 35afc0f0 8039(0e) 34cc9802 85ea
> [ 430.011367] 0f 35afc0f0 35afc100 003d(0f) 34d3d002 85ea
> [ 430.011371] 10 35afc100 35afc110 8041(10) 355d3002 85ea
> [ 430.011376] 11 35afc110 35afc120 0045(11) 34d02802 85ea
> [ 430.011380] 12 35afc120 35afc130 8049(12) 34d8b002 85ea
> [ 430.011384] 13 35afc130 35afc140 004d(13) 34cc9002 85ea
> [ 430.011389] 14 35afc140 35afc150 8051(14) 34d51002 85ea
> [ 430.011393] 15 35afc150 35afc160 0055(15) 357c7802 85ea
> [ 430.011397] 16 35afc160 8059(16) 34d4f002 85ea
> [ 430.011401] 17 35afc170 35afc180 0001805d(17)
> [ 430.011406] 18 35afc180 35afc190 00018061(18)
> [ 430.011410] 19 35afc190 35afc1a0 00018065(19)
> [ 430.011414] 1a 35afc1a0 35afc1b0 00018069(1a)
> [ 430.011419] 1b 35afc1b0 35afc1c0 806d(1b) 34eea002 85ea
> [ 430.011423] 1c 35afc1c0 35afc1d0 8071(1c) 355d9802 85ea
> [ 430.011427] 1d 35afc1d0 35afc1e0 8075(1d) 34d19002 85ea
> [ 430.011431] 1e 35afc1e0 35afc1f0 8079(1e) 354e4002 85ea
> [ 430.011436] 1f 35afc1f0 35afc000 007d(1f) 354ea002 85ea
> [ 430.011440] TxListPtr=35afc1b0 netif_queue_stopped=1
> [ 430.011444] cur_tx=154807(17) dirty_tx=154779(1b)
> [ 430.011447]
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
> I'll check this out.
> After kernel.org was cracked I've missed @kernel.org mail account.

At first glance, start_tx() is racy against TX completion.

It does:

	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
	    !netif_queue_stopped(dev)) {
		/* do nothing */
	} else {
		netif_stop_queue (dev);
	}

So it can call netif_stop_queue() while the TX completion handler did a cleanup of all queued packets right before.

Note intr_handler() doesn't hold the queue spinlock when it does:

	if (netif_queue_stopped(dev) &&
	    np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
		/* The ring is no longer full, clear busy flag. */
		netif_wake_queue (dev);
	}
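The interleaving described above can be replayed deterministically in a single-threaded userspace model. This is a sketch under stated assumptions, not driver code: `struct tx_model` and all function names are hypothetical, `TX_RING_SIZE` is taken as 32 because the logged ring dump lists entries 00 through 1f, and `TX_QUEUE_LEN` is assumed to be `TX_RING_SIZE - 1`, which the thread does not state.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 32u                   /* the logged ring dump has entries 00..1f */
#define TX_QUEUE_LEN (TX_RING_SIZE - 1u)   /* assumed value, not stated in the thread */

struct tx_model {
	unsigned int cur_tx;    /* descriptors queued so far */
	unsigned int dirty_tx;  /* descriptors reclaimed so far */
	bool stopped;           /* stands in for netif_queue_stopped(dev) */
};

/* First half of the start_tx() tail: queue one packet, evaluate the test. */
static bool xmit_check_full(struct tx_model *m)
{
	m->cur_tx++;
	assert(m->cur_tx - m->dirty_tx <= TX_RING_SIZE);
	return !(m->cur_tx - m->dirty_tx < TX_QUEUE_LEN - 1u && !m->stopped);
}

/* Second half: act on the (possibly stale) result of the test. */
static void xmit_apply(struct tx_model *m, bool full)
{
	if (full)
		m->stopped = true;   /* netif_stop_queue(dev) */
}

/* TX completion as quoted from intr_handler(): reclaim everything, maybe wake. */
static void tx_complete(struct tx_model *m)
{
	m->dirty_tx = m->cur_tx;
	if (m->stopped && m->cur_tx - m->dirty_tx < TX_QUEUE_LEN - 4u)
		m->stopped = false;  /* netif_wake_queue(dev) */
}

/*
 * Fills the ring up to the stop threshold, then lets the completion
 * interrupt run either between the check and the stop (irq_in_between)
 * or after both. Returns true when the queue ends up stopped with
 * nothing left in flight.
 */
static bool replay(bool irq_in_between)
{
	struct tx_model m = { 0, 0, false };
	bool full;
	unsigned int i;

	for (i = 0; i < TX_QUEUE_LEN - 2u; i++)
		xmit_apply(&m, xmit_check_full(&m));

	full = xmit_check_full(&m);
	if (irq_in_between)
		tx_complete(&m);
	xmit_apply(&m, full);
	if (!irq_in_between)
		tx_complete(&m);

	return m.stopped && m.cur_tx == m.dirty_tx;
}
```

In this model `replay(true)` leaves the queue stopped with an empty ring, so no later completion would wake it, while `replay(false)` ends with the queue running. The logged state in this bug shows descriptors still in flight at timeout, so this replay illustrates the race in principle rather than reproducing the exact failure.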
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Monday, 30 January 2012 at 11:14 +0100, Eric Dumazet wrote:
> On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
>> I'll check this out.
>> After kernel.org was cracked I've missed @kernel.org mail account.
>
> At first glance, start_tx() is racy against TX completion.
>
> It does:
>
> 	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
> 	    !netif_queue_stopped(dev)) {
> 		/* do nothing */
> 	} else {
> 		netif_stop_queue (dev);
> 	}
>
> So it can call netif_stop_queue() while the TX completion handler did a cleanup of all queued packets right before.
>
> Note intr_handler() doesn't hold the queue spinlock when it does:
>
> 	if (netif_queue_stopped(dev) &&
> 	    np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
> 		/* The ring is no longer full, clear busy flag. */
> 		netif_wake_queue (dev);
> 	}

So I would try the following patch:

 drivers/net/ethernet/dlink/sundance.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/dlink/sundance.c b/drivers/net/ethernet/dlink/sundance.c
index 28a3a9b..c671a6c 100644
--- a/drivers/net/ethernet/dlink/sundance.c
+++ b/drivers/net/ethernet/dlink/sundance.c
@@ -1099,11 +1099,13 @@ start_tx (struct sk_buff *skb, struct net_device *dev)
 	tasklet_schedule(&np->tx_tasklet);
 
 	/* On some architectures: explicitly flush cache lines here. */
-	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
-	    !netif_queue_stopped(dev)) {
-		/* do nothing */
-	} else {
-		netif_stop_queue (dev);
+	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&np->lock, flags);
+		if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
+			netif_stop_queue(dev);
+		spin_unlock_irqrestore(&np->lock, flags);
 	}
 	if (netif_msg_tx_queued(np)) {
 		printk (KERN_DEBUG
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Mon, 2012-01-30 at 11:14 +0100, Eric Dumazet wrote:
> On Monday, 30 January 2012 at 12:51 +0300, Denis Kirjanov wrote:
>> I'll check this out.
>> After kernel.org was cracked I've missed @kernel.org mail account.
>
> At first glance, start_tx() is racy against TX completion.
>
> It does:
>
> 	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
> 	    !netif_queue_stopped(dev)) {
> 		/* do nothing */
> 	} else {
> 		netif_stop_queue (dev);
> 	}
>
> So it can call netif_stop_queue() while the TX completion handler did a cleanup of all queued packets right before.

Yes, I spotted that. But no descriptors are pushed to the hardware here; that's done in the driver's TX tasklet. Although... maybe that can run immediately when scheduled from here? I've never had to deal with tasklets so I really don't know their semantics.

Ben.

> Note intr_handler() doesn't hold the queue spinlock when it does:
>
> 	if (netif_queue_stopped(dev) &&
> 	    np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
> 		/* The ring is no longer full, clear busy flag. */
> 		netif_wake_queue (dev);
> 	}

--
Ben Hutchings
Lowery's Law: If it jams, force it. If it breaks, it needed replacing anyway.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:
> Yes, I spotted that. But no descriptors are pushed to the hardware here; that's done in the driver's TX tasklet. Although... maybe that can run immediately when scheduled from here? I've never had to deal with tasklets so I really don't know their semantics.

That's probable on SMP ...
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Mon, 2012-01-30 at 15:28 +0100, Eric Dumazet wrote:
> On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:
>> Yes, I spotted that. But no descriptors are pushed to the hardware here; that's done in the driver's TX tasklet. Although... maybe that can run immediately when scheduled from here? I've never had to deal with tasklets so I really don't know their semantics.
>
> That's probable on SMP ...

The bug report is for a UP system running a kernel built with SMP-alternatives.

Ben.

--
Ben Hutchings
Lowery's Law: If it jams, force it. If it breaks, it needed replacing anyway.
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Monday, 30 January 2012 at 14:41 +0000, Ben Hutchings wrote:
> On Mon, 2012-01-30 at 15:28 +0100, Eric Dumazet wrote:
>> On Monday, 30 January 2012 at 14:05 +0000, Ben Hutchings wrote:
>>> Yes, I spotted that. But no descriptors are pushed to the hardware here; that's done in the driver's TX tasklet. Although... maybe that can run immediately when scheduled from here? I've never had to deal with tasklets so I really don't know their semantics.
>>
>> That's probable on SMP ...
>
> The bug report is for a UP system running a kernel built with SMP-alternatives.

Hmm, TX _completion_ is not run from the tasklet but from hardware IRQ; this is why I added the spin_lock_irqsave().

The tasklet fires the TX, but the hardware IRQ does the TX completion part.

This driver is ... interesting :)
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
On Monday, 30 January 2012 at 15:57 +0100, Eric Dumazet wrote:
> Hmm, TX _completion_ is not run from the tasklet but from hardware IRQ; this is why I added the spin_lock_irqsave().
>
> The tasklet fires the TX, but the hardware IRQ does the TX completion part.
>
> This driver is ... interesting :)

Oh well, we also must make sure we hold np->lock in TX completion when doing our test to eventually call netif_wake_queue(); I missed that it was released too early.

Here is a more complete patch.

diff --git a/drivers/net/ethernet/dlink/sundance.c b/drivers/net/ethernet/dlink/sundance.c
index 28a3a9b..d5e9472 100644
--- a/drivers/net/ethernet/dlink/sundance.c
+++ b/drivers/net/ethernet/dlink/sundance.c
@@ -1099,11 +1099,13 @@ start_tx (struct sk_buff *skb, struct net_device *dev)
 	tasklet_schedule(&np->tx_tasklet);
 
 	/* On some architectures: explicitly flush cache lines here. */
-	if (np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 1 &&
-	    !netif_queue_stopped(dev)) {
-		/* do nothing */
-	} else {
-		netif_stop_queue (dev);
+	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&np->lock, flags);
+		if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
+			netif_stop_queue(dev);
+		spin_unlock_irqrestore(&np->lock, flags);
 	}
 	if (netif_msg_tx_queued(np)) {
 		printk (KERN_DEBUG
@@ -1242,8 +1244,8 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 			hw_frame_id = ioread8(ioaddr + TxFrameId);
 		}
 
+		spin_lock(&np->lock);
 		if (np->pci_dev->revision >= 0x14) {
-			spin_lock(&np->lock);
 			for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
 				int entry = np->dirty_tx % TX_RING_SIZE;
 				struct sk_buff *skb;
@@ -1267,9 +1269,7 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 				np->tx_ring[entry].frag[0].addr = 0;
 				np->tx_ring[entry].frag[0].length = 0;
 			}
-			spin_unlock(&np->lock);
 		} else {
-			spin_lock(&np->lock);
 			for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
 				int entry = np->dirty_tx % TX_RING_SIZE;
 				struct sk_buff *skb;
@@ -1286,7 +1286,6 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 				np->tx_ring[entry].frag[0].addr = 0;
 				np->tx_ring[entry].frag[0].length = 0;
 			}
-			spin_unlock(&np->lock);
 		}
 
 		if (netif_queue_stopped(dev) &&
@@ -1294,6 +1293,7 @@ static irqreturn_t intr_handler(int irq, void *dev_instance)
 			/* The ring is no longer full, clear busy flag. */
 			netif_wake_queue (dev);
 		}
+		spin_unlock(&np->lock);
 		/* Abnormal error summary/uncommon events handlers. */
 		if (intr_status & (IntrPCIErr | LinkChange | StatsMax))
 			netdev_error(dev, intr_status);
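The intent of the patch can be sketched in a single-CPU userspace model, where "taking the lock with interrupts saved" simply defers the completion interrupt until the lock is released. Everything here is hypothetical illustration: `struct dev_model`, `model_lock_irqsave()`, `model_unlock_irqrestore()` and the other names are not kernel API, and `TX_QUEUE_LEN = TX_RING_SIZE - 1` with a ring size of 32 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 32u
#define TX_QUEUE_LEN (TX_RING_SIZE - 1u)   /* assumed value */

struct dev_model {
	unsigned int cur_tx, dirty_tx;
	bool stopped;      /* stands in for netif_queue_stopped(dev) */
	bool irqs_off;     /* models interrupts disabled by spin_lock_irqsave() */
	bool irq_pending;  /* completion interrupt raised while irqs_off */
};

/* TX completion; in the patch this whole body runs under np->lock. */
static void tx_complete(struct dev_model *m)
{
	m->dirty_tx = m->cur_tx;
	if (m->stopped && m->cur_tx - m->dirty_tx < TX_QUEUE_LEN - 4u)
		m->stopped = false;
}

/* Hardware raises the completion interrupt; it is deferred while masked. */
static void raise_tx_irq(struct dev_model *m)
{
	if (m->irqs_off)
		m->irq_pending = true;
	else
		tx_complete(m);
}

static void model_lock_irqsave(struct dev_model *m)
{
	assert(!m->irqs_off);
	m->irqs_off = true;
}

static void model_unlock_irqrestore(struct dev_model *m)
{
	m->irqs_off = false;
	if (m->irq_pending) {
		m->irq_pending = false;
		tx_complete(m);
	}
}

/* Tail of start_tx() as changed by the patch: re-check under the lock. */
static void start_tx_patched(struct dev_model *m, bool irq_during_check)
{
	m->cur_tx++;
	if (m->cur_tx - m->dirty_tx >= TX_QUEUE_LEN - 1u) {
		model_lock_irqsave(m);
		if (irq_during_check)
			raise_tx_irq(m);   /* arrives at the worst moment */
		if (m->cur_tx - m->dirty_tx >= TX_QUEUE_LEN - 1u)
			m->stopped = true;
		model_unlock_irqrestore(m);
	}
}

/* Returns true if the queue ends up stopped with an empty ring. */
static bool stuck_after_fill(bool irq_during_check)
{
	struct dev_model m = { 0 };
	unsigned int i;

	for (i = 0; i < TX_QUEUE_LEN - 2u; i++)
		start_tx_patched(&m, false);
	start_tx_patched(&m, irq_during_check);
	if (!irq_during_check)
		raise_tx_irq(&m);
	return m.stopped && m.cur_tx == m.dirty_tx;
}
```

In this model neither ordering leaves the queue stuck, because the completion handler can only run before the check or after the stop. It covers just this one interleaving on one CPU; the reporter's test of the patch in this thread still showed transmit timeouts, so the model should not be read as evidence that the patch resolves the bug.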
Bug#656476: Sundance network driver (D-Link DFE-580TX) timeouts rendering interface unusable
Denis,

It looks like you were working on sundance for a while; are you still interested in it?

Mike reported that:

> Network traffic on my D-Link DFE-580TX card results in a transmit queue timeout and gives endless resets after that until the interface is brought down. The amount of traffic required to generate the error seems to vary but sooner rather than later it will occur.

and the messages logged under Linux 3.2.1 are:

[ 430.008026] ------------[ cut here ]------------
[ 430.008100] WARNING: at /build/buildd-linux-2.6_3.2.1-2-i386-4wAPNj/linux-2.6-3.2.1/debian/build/source_i386_none/net/sched/sch_generic.c:255 dev_watchdog+0xb1/0x104()
[ 430.008200] Hardware name:
[ 430.008251] NETDEV WATCHDOG: eth1 (sundance): transmit queue 0 timed out
[ 430.008307] Modules linked in: p4_clockmod cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats speedstep_lib mperf fuse w83627ehf hwmon_vid coretemp loop snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq snd_timer snd_seq_device ohci_hcd ehci_hcd tpm_tis sis900 sundance tpm usbcore tpm_bios pcspkr psmouse snd parport_pc evdev serio_raw parport mii button usb_common soundcore processor shpchp pci_hotplug thermal_sys snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sata_sis ata_generic pata_sis libata scsi_mod
[ 430.010093] Pid: 0, comm: swapper/0 Not tainted 3.2.0-1-686-pae #1
[ 430.010149] Call Trace:
[ 430.010203] [<c1038280>] ? warn_slowpath_common+0x68/0x79
[ 430.010260] [<c1229e38>] ? dev_watchdog+0xb1/0x104
[ 430.010314] [<c10382f9>] ? warn_slowpath_fmt+0x29/0x2d
[ 430.010370] [<c1229e38>] ? dev_watchdog+0xb1/0x104
[ 430.010428] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 430.010484] [<c1041e78>] ? run_timer_softirq+0x150/0x1f3
[ 430.010539] [<c1229d87>] ? netif_tx_unlock+0x3a/0x3a
[ 430.010595] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 430.010649] [<c103cd49>] ? __do_softirq+0x94/0x12f
[ 430.010704] [<c103ccb5>] ? local_bh_enable+0x2/0x2
[ 430.010757] <IRQ> [<c103cf3a>] ? irq_exit+0x32/0x80
[ 430.010847] [<c101e6f4>] ? smp_apic_timer_interrupt+0x5b/0x65
[ 430.010906] [<c12b9b11>] ? apic_timer_interrupt+0x31/0x38
[ 430.010963] [<c120007b>] ? rtc_proc_show+0x15e/0x22d
[ 430.011020] [<c1010e5a>] ? mwait_idle+0x65/0x8b
[ 430.011076] [<c100b234>] ? cpu_idle+0x95/0xaf
[ 430.011132] [<c1412708>] ? start_kernel+0x32a/0x32f
[ 430.011185] ---[ end trace 4f9c55881a85ddc2 ]---
[ 430.011244] eth1: Transmit timed out, TxStatus 00 TxFrameId 1a, resetting...
[ 430.011302] 00 35afc000 35afc010 8001(00) 34c2d802 85ea
[ 430.011307] 01 35afc010 35afc020 0005(01) 34cfc802 85ea
[ 430.011311] 02 35afc020 35afc030 8009(02) 357ca802 85ea
[ 430.011316] 03 35afc030 35afc040 000d(03) 34d01802 85ea
[ 430.011320] 04 35afc040 35afc050 8011(04) 34d2 85ea
[ 430.011324] 05 35afc050 35afc060 0015(05) 35a9f802 85ea
[ 430.011328] 06 35afc060 35afc070 8019(06) 34c75002 85ea
[ 430.011333] 07 35afc070 35afc080 001d(07) 35ac0002 85ea
[ 430.011337] 08 35afc080 35afc090 8021(08) 34d4e802 85ea
[ 430.011341] 09 35afc090 35afc0a0 0025(09) 357b0002 85ea
[ 430.011346] 0a 35afc0a0 35afc0b0 8029(0a) 34d66802 85ea
[ 430.011350] 0b 35afc0b0 35afc0c0 002d(0b) 354f2802 85ea
[ 430.011354] 0c 35afc0c0 35afc0d0 8031(0c) 34d04802 85ea
[ 430.011359] 0d 35afc0d0 35afc0e0 0035(0d) 34cd1002 85ea
[ 430.011363] 0e 35afc0e0 35afc0f0 8039(0e) 34cc9802 85ea
[ 430.011367] 0f 35afc0f0 35afc100 003d(0f) 34d3d002 85ea
[ 430.011371] 10 35afc100 35afc110 8041(10) 355d3002 85ea
[ 430.011376] 11 35afc110 35afc120 0045(11) 34d02802 85ea
[ 430.011380] 12 35afc120 35afc130 8049(12) 34d8b002 85ea
[ 430.011384] 13 35afc130 35afc140 004d(13) 34cc9002 85ea
[ 430.011389] 14 35afc140 35afc150 8051(14) 34d51002 85ea
[ 430.011393] 15 35afc150 35afc160 0055(15) 357c7802 85ea
[ 430.011397] 16 35afc160 8059(16) 34d4f002 85ea
[ 430.011401] 17 35afc170 35afc180 0001805d(17)
[ 430.011406] 18 35afc180 35afc190 00018061(18)
[ 430.011410] 19 35afc190 35afc1a0 00018065(19)
[ 430.011414] 1a 35afc1a0 35afc1b0 00018069(1a)
[ 430.011419] 1b 35afc1b0 35afc1c0 806d(1b) 34eea002 85ea
[ 430.011423] 1c 35afc1c0 35afc1d0 8071(1c) 355d9802 85ea
[ 430.011427] 1d 35afc1d0 35afc1e0 8075(1d) 34d19002 85ea
[ 430.011431] 1e 35afc1e0 35afc1f0 8079(1e) 354e4002 85ea
[ 430.011436] 1f 35afc1f0 35afc000 007d(1f) 354ea002 85ea
[ 430.011440] TxListPtr=35afc1b0 netif_queue_stopped=1
[ 430.011444] cur_tx=154807(17) dirty_tx=154779(1b)
[ 430.011447] cur_rx=0 dirty_rx=0
[ 430.011449] cur_task=154807
[ 438.008046] eth1: Transmit timed out, TxStatus 00 TxFrameId 00, resetting...
[ 438.008115] 00 35afc000 35afc010 00010001(00)
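The counters in the first timeout report can be checked with a few lines of arithmetic. The sketch below assumes a ring of 32 descriptors, which is inferred from the dump listing entries 00 through 1f; the helper names `ring_slot()` and `tx_in_flight()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define TX_RING_SIZE 32u   /* inferred: the dump lists ring entries 00 through 1f */

/* Ring slot addressed by a free-running counter such as cur_tx or dirty_tx. */
static unsigned int ring_slot(unsigned int counter)
{
	/* Slots stay continuous across counter wraparound only for power-of-two rings. */
	assert((TX_RING_SIZE & (TX_RING_SIZE - 1u)) == 0);
	return counter % TX_RING_SIZE;
}

/*
 * Descriptors queued but not yet reclaimed. Unsigned subtraction keeps the
 * result correct even after the counters wrap.
 */
static unsigned int tx_in_flight(unsigned int cur_tx, unsigned int dirty_tx)
{
	return cur_tx - dirty_tx;
}
```

With the logged values, `ring_slot(154807)` is 0x17 and `ring_slot(154779)` is 0x1b, matching the parenthesised slots in `cur_tx=154807(17) dirty_tx=154779(1b)`, and `tx_in_flight(154807, 154779)` is 28. That leaves four free slots, which lines up with entries 17 through 1a being the only four in the dump that carry a different-looking status word.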