Bug#572201: forcedeth driver hangs under heavy load

2010-04-14 Thread stephen mulcahy
Ayaz Abdulla wrote: Attached fix has been submitted to netdev. I've run my reproducer with this patch applied to the Debian 2.6.32 kernel, and so far the problem with nodes becoming unresponsive hasn't occurred. NIC settings were left at the default, so this looks positive. r...@node23:~#

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Eric Dumazet wrote: OK, it seems forcedeth has a problem with checksums? Try changing the ethtool -k eth0 settings? ethtool -K eth0 tso off tx off Yes, that makes an unresponsive system responsive again immediately, nice! Should the driver default to disabling this until we problem is

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote: Eric Dumazet wrote: OK, it seems forcedeth has a problem with checksums? Try changing the ethtool -k eth0 settings? ethtool -K eth0 tso off tx off Yes, that makes an unresponsive system responsive again immediately, nice!

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Eric Dumazet wrote: On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote: Eric Dumazet wrote: OK, it seems forcedeth has a problem with checksums? Try changing the ethtool -k eth0 settings? ethtool -K eth0 tso off tx off Yes, that makes an unresponsive system responsive again

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Ben Hutchings
On Tue, 2010-04-13 at 12:00 +0100, stephen mulcahy wrote: Eric Dumazet wrote: On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote: Eric Dumazet wrote: OK, it seems forcedeth has a problem with checksums? Try changing the ethtool -k eth0 settings? ethtool -K eth0 tso off tx

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Ok, I've tried both of the following with my reproducer:
1. ethtool -K eth0 tso off
   RESULT: reproducer causes multiple hosts to become unresponsive on first run.
2. ethtool -K eth0 tx off
   RESULT: reproducer runs three times without any hosts becoming unresponsive.
-stephen
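The two test cases above can be sketched as a small shell helper. This is only an illustration: the function names `offload_cmd` and `set_offload` and the `DRY_RUN` variable are hypothetical and not from the thread; the `ethtool -K <iface> <feature> <on|off>` form is the one quoted in the messages. Changing offload settings for real normally needs root.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not from the thread.

# Print the ethtool command that sets one offload feature.
#   $1 = interface (e.g. eth0), $2 = feature (tso or tx), $3 = on|off
offload_cmd() {
    printf 'ethtool -K %s %s %s\n' "$1" "$2" "$3"
}

# Apply one offload setting. With DRY_RUN=1 the command is only printed,
# which is handy for reviewing a test plan before touching a live NIC.
set_offload() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        offload_cmd "$1" "$2" "$3"
    else
        ethtool -K "$1" "$2" "$3"
    fi
}
```

With `DRY_RUN=1`, `set_offload eth0 tso off` and `set_offload eth0 tx off` print the commands for test cases 1 and 2 respectively; without it, each call changes the setting before the next run of the reproducer.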

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote: Ok, I've tried both of the following with my reproducer 1. ethtool -K eth0 tso off RESULT: reproducer causes multiple hosts to become unresponsive on first run. 2. ethtool -K eth0 tx off RESULT: reproducer runs three

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Eric Dumazet wrote: On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote: Ok, I've tried both of the following with my reproducer 1. ethtool -K eth0 tso off RESULT: reproducer causes multiple hosts to become unresponsive on first run. 2. ethtool -K eth0 tx off RESULT:

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
stephen mulcahy wrote: Now some brave souls to check the 6410 lines of this driver? ;) Question of the day: Why is TSO broken in forcedeth? Is it generically broken, or is it broken for specific NICs? Actually, it is only when tx-checksumming is turned off that the problem doesn't occur

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 15:49 +0100, stephen mulcahy wrote: Eric Dumazet wrote: On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote: Ok, I've tried both of the following with my reproducer 1. ethtool -K eth0 tso off RESULT: reproducer causes multiple hosts to become

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Eric Dumazet wrote: I am scratching my head, but I thought you told me that ethtool -K eth0 tso off ethtool -K eth0 tx on was working? No, sorry for the confusion. ethtool -K eth0 tx off fixes the problem. Setting only ethtool -K eth0 tso off ethtool -K eth0 tx on still results in

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 16:08 +0100, stephen mulcahy wrote: Eric Dumazet wrote: I am scratching my head, but I thought you told me that ethtool -K eth0 tso off ethtool -K eth0 tx on was working? No, sorry for the confusion. ethtool -K eth0 tx off fixes the problem.

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread stephen mulcahy
Eric Dumazet wrote: OK, thanks for the clarification. Last question: did you try a vanilla kernel, such as 2.6.33.2? I built a Debian package from the vanilla 2.6.33.2 and installed that on all nodes and tried my reproducer with the same results - nodes becoming unresponsive. I

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 16:25 +0100, stephen mulcahy wrote: Eric Dumazet wrote: OK, thanks for the clarification. Last question: did you try a vanilla kernel, such as 2.6.33.2? I built a Debian package from the vanilla 2.6.33.2 and installed that on all nodes and tried my

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Eric Dumazet
On Tuesday 13 April 2010 at 14:43 -0700, David Miller wrote: Do you really come to the conclusion that TSO is broken with the above test results? I would conclude that there is a TX checksumming issue, since merely turning TSO off does not fix the problem whereas turning TX checksumming off

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com Date: Tue, 13 Apr 2010 16:42:21 +0200 On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote: Ok, I've tried both of the following with my reproducer 1. ethtool -K eth0 tso off RESULT: reproducer causes multiple hosts to become

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread Ayaz Abdulla
Attached fix has been submitted to netdev. Ayaz Eric Dumazet wrote: On Tuesday 13 April 2010 at 14:43 -0700, David Miller wrote: Do you really come to the conclusion that TSO is broken with the above test results? I would conclude that there is a TX checksumming issue, since merely turning

Bug#572201: forcedeth driver hangs under heavy load

2010-04-13 Thread David Miller
From: Ayaz Abdulla aabdu...@nvidia.com Date: Wed, 14 Apr 2010 01:33:15 -0400 Attached fix has been submitted to netdev. Thanks! I'll apply this soon.

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread stephen mulcahy
Ben Hutchings wrote: Stephen Mulcahy reported a regression in forcedeth at http://bugs.debian.org/572201. The system information and some diagnostic information can be found there. Anyone able to help? Incidentally, I also tried the 2.6.33.2 kernel with CONFIG_FORCEDETH_NAPI set to y to see

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread stephen mulcahy
stephen mulcahy wrote: It doesn't - further testing over the weekend saw 6 of 45 machines drop off the network with this problem. Nothing in dmesg or system logs. Happy to run more tests if someone can advise on what should be run. I also just tried using the 2.6.30-2-amd64 (Debian) forcedeth

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread Eric Dumazet
On Monday 12 April 2010 at 13:39 +0100, stephen mulcahy wrote: stephen mulcahy wrote: It doesn't - further testing over the weekend saw 6 of 45 machines drop off the network with this problem. Nothing in dmesg or system logs. Happy to run more tests if someone can advise on what should

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread stephen mulcahy
Eric Dumazet wrote: On Monday 12 April 2010 at 13:39 +0100, stephen mulcahy wrote: I am not sure I understand. Are you saying that using the 2.6.30-2-amd64 kernel also makes your forcedeth adapter non-functional? Hi Eric, If I run my tests with the 2.6.30-2-amd64 kernel the network

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread stephen mulcahy
stephen mulcahy wrote: Are both ways non-functional (RX and TX), or only one side? What's the best way of testing this? (tcpdump listening on both hosts and then running pings between the systems?)
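The test proposed above (tcpdump on both hosts, then pings between them) could look roughly like the sketch below. The helper names `capture_cmd` and `ping_cmd` are hypothetical and only print the commands to run; the host names in the usage note are the ones that appear elsewhere in the thread, and `eth0` is assumed to be the forcedeth interface. A ping-based check only exercises ICMP, so it may not show a fault that affects other traffic.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the thread.

# Print the capture command to start on each host before pinging.
#   $1 = interface, $2 = peer host name
# -n avoids name lookups, which could themselves stall on a sick network.
capture_cmd() {
    printf 'tcpdump -n -i %s icmp and host %s\n' "$1" "$2"
}

# Print the ping command to run from one host towards the other.
#   $1 = peer host name, $2 = number of echo requests
ping_cmd() {
    printf 'ping -c %s %s\n' "$2" "$1"
}
```

For example, run the output of `capture_cmd eth0 node05` on node20 and of `capture_cmd eth0 node20` on node05, then the output of `ping_cmd node05 5` on node20. Comparing which echo requests and replies show up in each capture indicates which direction is losing packets.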

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread Eric Dumazet
On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote: Does that help? Well, yes, because it seems to be a TCP problem. r...@node20:~# tcpdump host node20 and node05 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet),

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread stephen mulcahy
Eric Dumazet wrote: On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote: Do you have any netfilter rules? Hi Eric, I don't have any netfilter rules: r...@node34:~# for table in filter nat mangle raw; do iptables -t $table -L; done Chain INPUT (policy ACCEPT) target

Bug#572201: forcedeth driver hangs under heavy load

2010-04-12 Thread Eric Dumazet
On Monday 12 April 2010 at 17:11 +0100, stephen mulcahy wrote: Eric Dumazet wrote: On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote: Do you have any netfilter rules? Hi Eric, I don't have any netfilter rules: r...@node34:~# for table in filter nat mangle raw;

Bug#572201: forcedeth driver hangs under heavy load

2010-04-10 Thread Ben Hutchings
Stephen Mulcahy reported a regression in forcedeth at http://bugs.debian.org/572201. The system information and some diagnostic information can be found there. Anyone able to help? Ben. stephen mulcahy wrote: When running linux-image-2.6.32-trunk-amd64, the network stops responding if large