Bug#600286: linux-image-2.6.32-5-amd64: atl1c driver hangs after NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out

2011-07-31 Thread Niels Möller
Moritz Mühlenhoff <j...@inutil.org> writes:

 Niels, did you try the patch or has this been resolved in more recent kernels?

I built and installed a custom kernel with this patch. I think I've seen
the same (or very similar) problem a few times with the patched kernel,
but I haven't really tried to provoke it.

Since I built this custom kernel, I have not tried any newer
Debian-packaged kernels.

I'm sorry I haven't had much time to look further into this.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.



--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/nnei16s051@stalhein.lysator.liu.se



Bug#600286: linux-image-2.6.32-5-amd64: atl1c driver hangs after NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out

2011-07-30 Thread Moritz Mühlenhoff
On Sun, Oct 17, 2010 at 02:38:22AM +0100, Ben Hutchings wrote:
 I have no idea why the kernel sent such large packets. According to
 ifconfig eth0, the MTU is 1500 bytes. Is it supposed to work like
 that, or is this another bug?
 [...]
 
 This driver and hardware implement TCP Segmentation Offload (TSO), which
 means the kernel can provide oversized pseudo-packets that are turned
 into multiple packets on the wire.
 
 I did notice a later bug fix to this driver which might possibly address
 the bug you're seeing.  Please can you test the attached patch,
 following the instructions at
 http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Niels, did you try the patch or has this been resolved in more recent kernels?

Cheers,
Moritz



Archive: http://lists.debian.org/20110730101423.GA6580@pisco.westfalen.local



Bug#600286: linux-image-2.6.32-5-amd64: atl1c driver hangs after NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out

2010-10-16 Thread Ben Hutchings
On Fri, 2010-10-15 at 16:34 +0200, Niels Möller wrote:
 Package: linux-2.6
 Version: 2.6.32-23
 Severity: important
 
 *** Please type your report below this line ***
 
 My wired network interface stops working (not able to transmit any
 packets, I think, but it might be reception that is broken. Anyway,
 pinging hosts on the local network results in a "No route to host" error,
 which means that sending or receiving ARP packets fails).
[...]
 It has hung in this way twice today, each time after I did two things
 which might be related:
 
 1. I sent some large packets (TCP over IPv4 over Ethernet) of size up to
    roughly 2 bytes and at a rate close to the link's capacity of
    100Mbit/s. One such packet, displayed by tcpdump:
 
 13:58:15.872214 00:26:9e:b3:2f:3b (oui Unknown) > 00:11:25:85:b0:1a (oui Unknown), ethertype IPv4 (0x0800), length 22790: 192.168.1.135.47058 > 192.168.1.108.4711: Flags [.], seq 1079380:1102104, ack 1, win 32, options [nop,nop,TS val 19304579 ecr 1030215], length 22724
 
    I have no idea why the kernel sent such large packets. According to
    ifconfig eth0, the MTU is 1500 bytes. Is it supposed to work like
    that, or is this another bug?
[...]

This driver and hardware implement TCP Segmentation Offload (TSO), which
means the kernel can provide oversized pseudo-packets that are turned
into multiple packets on the wire.

I did notice a later bug fix to this driver which might possibly address
the bug you're seeing.  Please can you test the attached patch,
following the instructions at
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From: Jie Yang <jie.y...@atheros.com>
Date: Tue, 27 Oct 2009 22:31:19 -0700
Subject: [PATCH] atl1c: duplicate atl1c_get_tpd

commit 678b77e265f6d66f1e68f3d095841c44ba5ab112 upstream.

remove duplicate atl1c_get_tpd, it may cause hardware to send wrong packets.

Signed-off-by: Jie Yang <jie.y...@atheros.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 drivers/net/atl1c/atl1c_main.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/atl1c/atl1c_main.c b/drivers/net/atl1c/atl1c_main.c
index 1372e9a..3b8801a 100644
--- a/drivers/net/atl1c/atl1c_main.c
+++ b/drivers/net/atl1c/atl1c_main.c
@@ -1981,8 +1981,6 @@ static void atl1c_tx_map(struct atl1c_adapter *adapter,
 		else {
 			use_tpd = atl1c_get_tpd(adapter, type);
 			memcpy(use_tpd, tpd, sizeof(struct atl1c_tpd_desc));
-			use_tpd = atl1c_get_tpd(adapter, type);
-			memcpy(use_tpd, tpd, sizeof(struct atl1c_tpd_desc));
 		}
 		buffer_info = atl1c_get_tx_buffer(adapter, use_tpd);
 		buffer_info->length = buf_len - mapped_len;
-- 
1.7.1





Bug#600286: linux-image-2.6.32-5-amd64: atl1c driver hangs after NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out

2010-10-15 Thread Niels Möller
Package: linux-2.6
Version: 2.6.32-23
Severity: important

*** Please type your report below this line ***

My wired network interface stops working (not able to transmit any
packets, I think, but it might be reception that is broken. Anyway,
pinging hosts on the local network results in a "No route to host" error,
which means that sending or receiving ARP packets fails).

I see the following traceback in dmesg:

  [73228.976129] device eth0 entered promiscuous mode
  [73247.596111] device eth0 left promiscuous mode
  [74027.804522] [ cut here ]
  [74027.804536] WARNING: at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
  [74027.804541] Hardware name: Aspire 1810TZ
  [74027.804544] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
  [74027.804547] Modules linked in: acpi_cpufreq cpufreq_stats 
cpufreq_powersave cpufreq_userspace cpufreq_conservative sco bridge stp bnep 
rfcomm l2cap crc16 binfmt_misc uinput fuse coretemp loop 
snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec 
snd_hwdep snd_pcm_oss snd_mixer_oss arc4 ecb snd_pcm snd_seq_midi snd_rawmidi 
iwlagn snd_seq_midi_event joydev iwlcore snd_seq snd_timer uvcvideo 
snd_seq_device i915 drm_kms_helper drm btusb snd videodev acer_wmi mac80211 
v4l1_compat i2c_i801 i2c_algo_bit psmouse v4l2_compat_ioctl32 led_class 
bluetooth soundcore video cfg80211 rfkill snd_page_alloc output wmi i2c_core 
pcspkr evdev serio_raw processor battery ac button ext3 jbd mbcache sd_mod 
crc_t10dif uhci_hcd ahci libata thermal thermal_sys ehci_hcd atl1c scsi_mod 
usbcore nls_base [last unloaded: scsi_wait_scan]
  [74027.804649] Pid: 0, comm: swapper Tainted: GW  2.6.32-5-amd64 #1
  [74027.804652] Call Trace:
  [74027.804655]  <IRQ>  [8126157e] ? dev_watchdog+0xe2/0x194
  [74027.804666]  [8126157e] ? dev_watchdog+0xe2/0x194
  [74027.804672]  [8104db34] ? warn_slowpath_common+0x77/0xa3
  [74027.804678]  [8126149c] ? dev_watchdog+0x0/0x194
  [74027.804683]  [8104dbbc] ? warn_slowpath_fmt+0x51/0x59
  [74027.804689]  [8104199b] ? enqueue_task_fair+0x24/0x68
  [74027.804696]  [8103a2fd] ? activate_task+0x20/0x26
  [74027.804701]  [81049fde] ? try_to_wake_up+0x249/0x259
  [74027.804707]  [81261470] ? netif_tx_lock+0x3d/0x69
  [74027.804713]  [8124c2f8] ? netdev_drivername+0x3b/0x40
  [74027.804718]  [8126157e] ? dev_watchdog+0xe2/0x194
  [74027.804724]  [8103f81e] ? __wake_up+0x30/0x44
  [74027.804731]  [8105a137] ? run_timer_softirq+0x1c9/0x268
  [74027.804738]  [810538af] ? __do_softirq+0xdd/0x1a2
  [74027.804744]  [810240da] ? lapic_next_event+0x18/0x1d
  [74027.804750]  [81011cac] ? call_softirq+0x1c/0x30
  [74027.804756]  [8101322b] ? do_softirq+0x3f/0x7c
  [74027.804761]  [8105371e] ? irq_exit+0x36/0x76
  [74027.804766]  [81024ba8] ? smp_apic_timer_interrupt+0x87/0x95
  [74027.804772]  [81011673] ? apic_timer_interrupt+0x13/0x20
  [74027.804775]  <EOI>  [a0059271] ? acpi_idle_enter_c1+0x9d/0xb8 [processor]
  [74027.804794]  [a005924c] ? acpi_idle_enter_c1+0x78/0xb8 [processor]
  [74027.804801]  [81238936] ? cpuidle_idle_call+0x94/0xee
  [74027.804808]  [8100feb1] ? cpu_idle+0xa2/0xda
  [74027.804812] ---[ end trace a7919e7f17c0a727 ]---

It has hung in this way twice today, each time after I did two things
which might be related:

1. I sent some large packets (TCP over IPv4 over Ethernet) of size up to
   roughly 2 bytes and at a rate close to the link's capacity of
   100Mbit/s. One such packet, displayed by tcpdump:

13:58:15.872214 00:26:9e:b3:2f:3b (oui Unknown) > 00:11:25:85:b0:1a (oui Unknown), ethertype IPv4 (0x0800), length 22790: 192.168.1.135.47058 > 192.168.1.108.4711: Flags [.], seq 1079380:1102104, ack 1, win 32, options [nop,nop,TS val 19304579 ecr 1030215], length 22724

   I have no idea why the kernel sent such large packets. According to
   ifconfig eth0, the MTU is 1500 bytes. Is it supposed to work like
   that, or is this another bug?

2. I used tcpdump, switching the interface to promiscuous mode and
   back (the default behaviour; tcpdump -p would have worked just as well
   for me).

Taking the interface down and up (ifdown eth0; ifup eth0) didn't
help. After hibernating the computer and waking it up, the network
interface started working again.

The version of the atl1c network driver is 1.0.0.1-NAPI. There seems
to be a newer one in Linus' kernel tree, 1.0.1-0-NAPI, but I haven't
been able to figure out if there are any changes which might fix this
problem, and I haven't tried booting any other kernel.

Regards,
/Niels Möller

-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-23) (da...@debian.org) (gcc version 
4.3.5 (Debian 4.3.5-3) ) #1 SMP Fri Sep 17 21:50:19 UTC 2010

** Command line: