Bug#506419: kernel trace during IPv6 ssh output
also sprach martin f krafft [EMAIL PROTECTED] [2008.12.04.1551 +0100]:
> I cannot reproduce the same bug anymore, which may be due to the
> fact that I am using a proto-41 IPv6 tunnel at the new location
> (and thus lower transmission rates).

I can reproduce it now that the machine is back in a native IPv6
network.

> So the bug is kinda horrid: the NIC that provides eth0 also
> connects to the IPMI card, and when I cause a whole lot of IPv6
> traffic (incoming, e.g. downloading ISOs from switch.ch via IPv6),
> the NIC locks up to the point where the IPMI card also becomes
> unreachable. A soft-reboot fixes the problem. There is nothing in
> the logs.

Of course I can no longer reproduce this one at home, even with
a proto-41 tunnel to my gateway, and bridging and iptables. JOY!

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
Bug#506419: kernel trace during IPv6 ssh output
also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.29.2153 +0100]:
> This could be a bug in the interaction of bridging or netfilter
> with GSO. Could you try to rule out either of those?

I cannot reproduce the same bug anymore, which may be due to the
fact that I am using a proto-41 IPv6 tunnel at the new location
(and thus lower transmission rates). But I found a bug that seems
awfully related. This time, it's on incoming traffic though, not
outgoing.

So the bug is kinda horrid: the NIC that provides eth0 also
connects to the IPMI card, and when I cause a whole lot of IPv6
traffic (incoming, e.g. downloading ISOs from switch.ch via IPv6),
the NIC locks up to the point where the IPMI card also becomes
unreachable. A soft-reboot fixes the problem. There is nothing in
the logs.

I can reproduce this problem with both bridging and iptables
enabled, or with either one alone, but not with both disabled. But
since it is rather intermittent, sometimes requiring several
gigabytes to be shoved across the line before it hangs, it could be
that the plain no-iptables-no-bridge setup also has the problem.

The machine is coming back home with me. :(

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
Bug#506419: kernel trace during IPv6 ssh output
On Fri, 2008-11-21 at 14:58 +0100, martin f krafft wrote:
> Thanks Aioanei and Ben for your replies. Feels like we are getting
> somewhere.
>
> also sprach Aioanei Rares [EMAIL PROTECTED] [2008.11.21.1353 +0100]:
> > Can you reproduce on another version of the kernel or a vanilla?
>
> also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.21.1432 +0100]:
> > This is a warning from the GSO (like TSO) code which only applies
> > to outgoing traffic.
>
> I built vanilla 2.6.26.7 and 2.6.27.7 and tried them (using deb-pkg
> target to create the .debs, and mkinitramfs to make the initrd).
> With 2.6.26.7, ethtool reports generic segmentation offload (GSO)
> to be off, but it's on again with 2.6.27.7! As expected, the
> problem does not appear with 2.6.26.7, but it does appear with
> 2.6.27.7.

GSO is enabled by default in 2.6.27. It implements something like
TSO (really delayed segmentation) for IPv4 and IPv6 for any device
that supports TX checksum offload but not TSO.

You later wrote:
> Bastian Blank helpfully diagnosed the problem to be with the
> forcedeth driver. He pointed me at commit edcfe5f[0], which should
> be in 2.6.26.4, but apparently it's either not present in Debian's
> 2.6.26-10 package, or it does not fix the bug entirely.

I believe that bug would result in IPv6 packets being sent with
incorrect checksums, and is entirely separate from this. GSO
depends on TX checksum offload so it is automatically disabled when
you disable TX checksum offload. It is only being used for IPv6
because the hardware implements TSO for IPv4.

This could be a bug in the interaction of bridging or netfilter
with GSO. Could you try to rule out either of those?

Ben.

-- 
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.
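
Ben's description of when GSO is used can be restated as a small decision function. This is only a toy model of the rules as stated in the message above, not kernel code; the function name `segmentation_path` and its three boolean parameters are hypothetical and chosen purely for illustration.

```python
def segmentation_path(tx_checksum_offload, hw_tso, gso_enabled):
    """Toy model: who splits a large outgoing packet into segments?

    Follows the rules as described in the thread:
    - if the hardware implements TSO for this protocol, the NIC does it;
    - otherwise GSO (delayed software segmentation) is used, but only
      when GSO is enabled and TX checksum offload is available;
    - otherwise the protocol stack segments before handing packets down.
    """
    if hw_tso:
        return "hardware TSO"
    if gso_enabled and tx_checksum_offload:
        return "software GSO"
    return "protocol stack"


# The situation described for this NIC: TSO exists for IPv4 only.
ipv4 = segmentation_path(tx_checksum_offload=True, hw_tso=True, gso_enabled=True)
ipv6 = segmentation_path(tx_checksum_offload=True, hw_tso=False, gso_enabled=True)

# Disabling TX checksum offload implicitly takes GSO out of the picture.
ipv6_no_csum = segmentation_path(tx_checksum_offload=False, hw_tso=False, gso_enabled=True)
```

Under this model only the IPv6 case goes through the GSO code, which fits the observation that the warning never shows up for IPv4 connections.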
Bug#506419: kernel trace during IPv6 ssh output
On Fri, Nov 21, 2008 at 12:06 PM, martin f krafft [EMAIL PROTECTED] wrote:
> Package: linux-image-2.6.26-1-amd64
> Version: 2.6.26-10
> Severity: important
> Tags: ipv6
>
> This is a lenny system. Important because IPv6 is a release goal.
>
> When I run dmesg through a (native) IPv6 SSH connection on a new
> server, the kernel spews plenty traces to the console. The first
> trace says the kernel is not tainted, in subsequent traces, the
> taint is claimed to be GW(512). Note below how it is tainted at
> 10:25:50 (second trace), but not at 10:25:49 (first trace).
>
> The same problem arises during *outgoing* scp and renders all other
> SSH sessions basically unusable for the duration of the transfer.
> I can reproduce this with an *outgoing* HTTP session too.
>
> I cannot reproduce the problem with an IPv4 connection.
> I cannot reproduce the problem with an incoming transfer.
>
> Syslog excerpt of the last boot and first trace attached.

[snip]

Can you reproduce on another version of the kernel or a vanilla?
Bug#506419: kernel trace during IPv6 ssh output
On Fri, 2008-11-21 at 11:06 +0100, martin f krafft wrote:
> Package: linux-image-2.6.26-1-amd64
> Version: 2.6.26-10
> Severity: important
> Tags: ipv6
>
> This is a lenny system. Important because IPv6 is a release goal.
>
> When I run dmesg through a (native) IPv6 SSH connection on a new
> server, the kernel spews plenty traces to the console. The first
> trace says the kernel is not tainted, in subsequent traces, the
> taint is claimed to be GW(512). Note below how it is tainted at
> 10:25:50 (second trace), but not at 10:25:49 (first trace).

A WARN or BUG taints the kernel.

> The same problem arises during *outgoing* scp and renders all other
> SSH sessions basically unusable for the duration of the transfer.
> I can reproduce this with an *outgoing* HTTP session too.

This is a warning from the GSO (like TSO) code which only applies to
outgoing traffic. The warning comes from:

    if (WARN_ON(skb->ip_summed != CHECKSUM_PARTIAL)) {

which means something generated an skb with incorrect flags (GSO
depends on having a partial checksum).

I notice there's bridging code in the call trace, and kvm in the
modules list. Were you initiating the IPv6 connections from inside
a VM?

Ben.
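
The jump from "not tainted" in the first trace to GW(512) in the second fits Ben's remark: 512 is 1 << 9, which the kernel's tainted-kernels documentation lists as the W (warning issued) flag, and G only means that no proprietary module is loaded. A minimal sketch of that decoding follows; the helper name `decode_taint` is hypothetical and the table covers only the first ten bits.

```python
# Taint bits 0-9 as listed in the kernel's tainted-kernels documentation.
# Bit 0 is special: it prints "P" when set and "G" when clear.
TAINT_FLAGS = [
    (0, "P", "proprietary module loaded"),
    (1, "F", "module force-loaded"),
    (2, "S", "SMP on hardware not rated for it"),
    (3, "R", "module force-unloaded"),
    (4, "M", "machine check exception"),
    (5, "B", "bad page reference"),
    (6, "U", "taint requested by userspace"),
    (7, "D", "kernel oops or BUG"),
    (8, "A", "ACPI table overridden"),
    (9, "W", "kernel warning (WARN_ON)"),
]


def decode_taint(mask):
    """Return (letter, reason) for every taint bit set in mask."""
    return [
        (letter, reason)
        for bit, letter, reason in TAINT_FLAGS
        if mask & (1 << bit)
    ]


# The value reported in the second trace of the bug report.
flags = decode_taint(512)
```

For the value in the report this yields a single entry, the W flag, so the first WARN_ON itself is what tainted the kernel between 10:25:49 and 10:25:50.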
Bug#506419: kernel trace during IPv6 ssh output
Thanks Aioanei and Ben for your replies. Feels like we are getting
somewhere.

also sprach Aioanei Rares [EMAIL PROTECTED] [2008.11.21.1353 +0100]:
> Can you reproduce on another version of the kernel or a vanilla?

also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.21.1432 +0100]:
> This is a warning from the GSO (like TSO) code which only applies
> to outgoing traffic.

I built vanilla 2.6.26.7 and 2.6.27.7 and tried them (using deb-pkg
target to create the .debs, and mkinitramfs to make the initrd).
With 2.6.26.7, ethtool reports generic segmentation offload (GSO)
to be off, but it's on again with 2.6.27.7! As expected, the
problem does not appear with 2.6.26.7, but it does appear with
2.6.27.7.

Sure enough, turning gso off fixes the problem on the original
Debian kernel too.

> I notice there's bridging code in the call trace, and kvm in the
> modules list. Were you initiating the IPv6 connections from inside
> a VM?

Not from inside the VM, but from the host, which is connected via
the same bridge.

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems

oxymoron: micro$oft works
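
The GSO state mentioned above is what `ethtool -k eth0` prints, and `ethtool -K eth0 gso off` is presumably how it was switched off for the test. A minimal sketch for checking that state from a script is shown here. The function name `parse_offloads` is hypothetical, and the sample text is made up for illustration rather than captured from the affected machine; older ethtool versions print feature names with spaces and newer ones with hyphens, so the parser normalises both.

```python
import re


def parse_offloads(ethtool_k_output):
    """Map each offload feature in `ethtool -k` output to True/False."""
    features = {}
    for line in ethtool_k_output.splitlines():
        # Accept "name: on" / "name: off", ignoring anything after the state
        # (newer ethtool versions may append "[fixed]").
        m = re.match(r"\s*([A-Za-z][A-Za-z0-9 \-]*?)\s*:\s*(on|off)\b", line)
        if m:
            name = m.group(1).strip().lower().replace(" ", "-")
            features[name] = m.group(2) == "on"
    return features


# Illustrative output only, not taken from the bug report.
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
generic segmentation offload: on
"""

offloads = parse_offloads(SAMPLE)
gso_on = offloads["generic-segmentation-offload"]
```

On a real system the input would come from running `ethtool -k` on the interface in question; comparing the result before and after `ethtool -K eth0 gso off` confirms whether the setting took effect.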