Bug#506419: kernel trace during IPv6 ssh output

2008-12-05 Thread martin f krafft
also sprach martin f krafft [EMAIL PROTECTED] [2008.12.04.1551 +0100]:
> I cannot reproduce the same bug anymore, which may be due to the
> fact that I am using a proto-41 IPv6 tunnel at the new location
> (and thus lower transmission rates).

I can reproduce it now that the machine is back in a native IPv6
network.

> So the bug is kinda horrid: the NIC that provides eth0 also
> connects to the IPMI card, and when I cause a whole lot of IPv6
> traffic (incoming, e.g. downloading ISOs from switch.ch via IPv6),
> the NIC locks up to the point where the IPMI card also becomes
> unreachable. A soft-reboot fixes the problem. There is nothing in
> the logs.

Of course I can no longer reproduce this one at home, even with
a proto-41 tunnel to my gateway, and bridging and iptables.

JOY!

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems



Bug#506419: kernel trace during IPv6 ssh output

2008-12-04 Thread martin f krafft
also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.29.2153 +0100]:
> This could be a bug in the interaction of bridging or netfilter with
> GSO.  Could you try to rule out either of those?

I cannot reproduce the same bug anymore, which may be due to the
fact that I am using a proto-41 IPv6 tunnel at the new location (and
thus lower transmission rates).

But I found a bug that seems awfully related. This time, it's on
incoming traffic though, not outgoing.

So the bug is kinda horrid: the NIC that provides eth0 also connects
to the IPMI card, and when I cause a whole lot of IPv6 traffic
(incoming, e.g. downloading ISOs from switch.ch via IPv6), the NIC
locks up to the point where the IPMI card also becomes unreachable.
A soft-reboot fixes the problem. There is nothing in the logs.

I can produce this problem with bridging and iptables, or either of
the two, but not if I disable bridging and iptables. But since it is
rather intermittent, sometimes requiring several gigabytes to be
shoved across the line before it hangs up, it could be that plain,
no-iptables-no-bridge also has the problem.

The machine is coming back home with me. :(

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems



Bug#506419: kernel trace during IPv6 ssh output

2008-11-29 Thread Ben Hutchings
On Fri, 2008-11-21 at 14:58 +0100, martin f krafft wrote:
> Thanks Aioanei and Ben for your replies. Feels like we are getting
> somewhere.
>
> also sprach Aioanei Rares [EMAIL PROTECTED] [2008.11.21.1353 +0100]:
> > Can you reproduce on another version of the kernel or a vanilla?
>
> also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.21.1432 +0100]:
> > This is a warning from the GSO (like TSO) code which only applies to
> > outgoing traffic.
>
> I built vanilla 2.6.26.7 and 2.6.27.7 and tried them (using deb-pkg
> target to create the .debs, and mkinitramfs to make the initrd).
>
> With 2.6.26.7, ethtool reports generic segmentation offload (GSO) to
> be off, but it's on again with 2.6.27.7! As expected, the problem
> does not appear with 2.6.26.7, but it does appear with 2.6.27.7.

GSO is enabled by default in 2.6.27.  It implements something like TSO
(really delayed segmentation) for IPv4 and IPv6 for any device that
supports TX checksum offload but not TSO.

You later wrote:
> Bastian Blank helpfully diagnosed the problem to be with the
> forcedeth driver. He pointed me at commit edcfe5f[0], which should
> be in 2.6.26.4, but apparently it's either not present in Debian's
> 2.6.26-10 package, or it does not fix the bug entirely.

I believe that bug would result in IPv6 packets being sent with
incorrect checksums, and is entirely separate from this.

GSO depends on TX checksum offload so it is automatically disabled when
you disable TX checksum offload.  It is only being used for IPv6 because
the hardware implements TSO for IPv4.
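
The rule described above can be written down as a small userspace
model. This is only a sketch of the reasoning in this mail, not kernel
code: the FEAT_* bits, the proto enum and both function names are made
up for illustration and do not correspond to the kernel's NETIF_F_*
flags.

```c
#include <stdbool.h>

/* Hypothetical feature bits for this sketch; not the kernel's NETIF_F_* values. */
enum {
    FEAT_TX_CSUM = 1 << 0, /* device can checksum outgoing packets */
    FEAT_TSO_V4  = 1 << 1, /* hardware TCP segmentation for IPv4 */
    FEAT_TSO_V6  = 1 << 2, /* hardware TCP segmentation for IPv6 */
    FEAT_GSO     = 1 << 3  /* generic (software) segmentation offload */
};

enum proto { PROTO_IPV4, PROTO_IPV6 };

/* GSO depends on TX checksum offload: clear it when checksumming is off. */
unsigned effective_features(unsigned feats)
{
    if (!(feats & FEAT_TX_CSUM))
        feats &= ~(unsigned)FEAT_GSO;
    return feats;
}

/* True when software GSO, rather than hardware TSO, segments the flow. */
bool uses_software_gso(unsigned feats, enum proto p)
{
    unsigned tso = (p == PROTO_IPV4) ? FEAT_TSO_V4 : FEAT_TSO_V6;

    feats = effective_features(feats);
    return (feats & FEAT_GSO) && !(feats & tso);
}
```

For a NIC modelled as FEAT_TX_CSUM | FEAT_TSO_V4 | FEAT_GSO, the
function reports software GSO for IPv6 only, and for neither protocol
once FEAT_TX_CSUM is cleared, which matches the behaviour reported in
this bug.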

This could be a bug in the interaction of bridging or netfilter with
GSO.  Could you try to rule out either of those?

Ben.

-- 
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.



Bug#506419: kernel trace during IPv6 ssh output

2008-11-21 Thread Aioanei Rares
On Fri, Nov 21, 2008 at 12:06 PM, martin f krafft [EMAIL PROTECTED] wrote:

> Package: linux-image-2.6.26-1-amd64
> Version: 2.6.26-10
> Severity: important
> Tags: ipv6
>
> This is a lenny system. Important because IPv6 is a release goal.
>
> When I run dmesg through a (native) IPv6 SSH connection on a new
> server, the kernel spews plenty traces to the console. The first
> trace says the kernel is not tainted, in subsequent traces, the
> taint is claimed to be GW(512). Note below how it is tainted at
> 10:25:50 (second trace), but not at 10:25:49 (first trace).
>
> The same problem arises during *outgoing* scp and renders all other
> SSH sessions basically unusable for the duration of the transfer.
> I can reproduce this with an *outgoing* HTTP session too.
>
> I cannot reproduce the problem with an IPv4 connection.
> I cannot reproduce the problem with an incoming transfer.
>
> Syslog excerpt of the last boot and first trace attached.


 [snip]


Can you reproduce on another version of the kernel or a vanilla?


Bug#506419: kernel trace during IPv6 ssh output

2008-11-21 Thread Ben Hutchings
On Fri, 2008-11-21 at 11:06 +0100, martin f krafft wrote:
> Package: linux-image-2.6.26-1-amd64
> Version: 2.6.26-10
> Severity: important
> Tags: ipv6
>
> This is a lenny system. Important because IPv6 is a release goal.
>
> When I run dmesg through a (native) IPv6 SSH connection on a new
> server, the kernel spews plenty traces to the console. The first
> trace says the kernel is not tainted, in subsequent traces, the
> taint is claimed to be GW(512). Note below how it is tainted at
> 10:25:50 (second trace), but not at 10:25:49 (first trace).

A WARN or BUG taints the kernel.

> The same problem arises during *outgoing* scp and renders all other
> SSH sessions basically unusable for the duration of the transfer.
> I can reproduce this with an *outgoing* HTTP session too.

This is a warning from the GSO (like TSO) code which only applies to
outgoing traffic.

The warning comes from:
    if (WARN_ON(skb->ip_summed != CHECKSUM_PARTIAL)) {
which means something generated an skb with incorrect flags (GSO depends
on having a partial checksum).
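
To make the check concrete, here is a minimal userspace model of it.
Everything in it is a stand-in invented for illustration: struct
fake_skb replaces struct sk_buff, the enum values are arbitrary rather
than the kernel's, and gso_checksum_warn() only imitates the fact that
WARN_ON logs and evaluates to true when its condition holds.

```c
#include <stdbool.h>
#include <stdio.h>

/* Checksum states named after the kernel's CHECKSUM_* constants.
 * The numeric values here are arbitrary, not the kernel's. */
enum ip_summed {
    CHECKSUM_NONE,
    CHECKSUM_UNNECESSARY,
    CHECKSUM_COMPLETE,
    CHECKSUM_PARTIAL
};

/* Stand-in for struct sk_buff, holding only what the check reads. */
struct fake_skb {
    enum ip_summed ip_summed;
    unsigned int len;
};

/* Logs a warning and returns true when the packet reaches the GSO
 * path without a partial checksum, i.e. with incorrect flags. */
bool gso_checksum_warn(const struct fake_skb *skb)
{
    bool bad = skb->ip_summed != CHECKSUM_PARTIAL;

    if (bad)
        fprintf(stderr, "WARNING: GSO skb with ip_summed=%d\n",
                (int)skb->ip_summed);
    return bad;
}
```

An skb marked CHECKSUM_PARTIAL passes silently; any other state trips
the warning, which is the case that produced the traces in this report.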

I notice there's bridging code in the call trace, and kvm in the modules
list.  Were you initiating the IPv6 connections from inside a VM?

Ben.




Bug#506419: kernel trace during IPv6 ssh output

2008-11-21 Thread martin f krafft
Thanks Aioanei and Ben for your replies. Feels like we are getting
somewhere.



also sprach Aioanei Rares [EMAIL PROTECTED] [2008.11.21.1353 +0100]:
> Can you reproduce on another version of the kernel or a vanilla?

also sprach Ben Hutchings [EMAIL PROTECTED] [2008.11.21.1432 +0100]:
> This is a warning from the GSO (like TSO) code which only applies to
> outgoing traffic.

I built vanilla 2.6.26.7 and 2.6.27.7 and tried them (using deb-pkg
target to create the .debs, and mkinitramfs to make the initrd).

With 2.6.26.7, ethtool reports generic segmentation offload (GSO) to
be off, but it's on again with 2.6.27.7! As expected, the problem
does not appear with 2.6.26.7, but it does appear with 2.6.27.7.

Sure enough, turning GSO off fixes the problem on the original
Debian kernel too.

> I notice there's bridging code in the call trace, and kvm in the
> modules list.  Were you initiating the IPv6 connections from
> inside a VM?

Not from inside the VM, but from the host, which is connected via
the same bridge.

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
 
oxymoron: micro$oft works

