Re: e1000: Detected Tx Unit Hang

2008-02-19 Thread Kok, Auke
Bernd Schubert wrote: On Saturday 16 February 2008, Kok, Auke wrote: Bernd Schubert wrote: Hello, I can't login to one of our servers and just got this in an ipmi sol session: [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.209183] Tx Queue 0

e1000: Detected Tx Unit Hang

2008-02-15 Thread Bernd Schubert
Hello, I can't login to one of our servers and just got this in an ipmi sol session: [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.209183] Tx Queue 0 [18169.209184] TDH e3 [18169.209185] TDT e3 [18169.209186]

Re: e1000: Detected Tx Unit Hang

2008-02-15 Thread Kok, Auke
Bernd Schubert wrote: Hello, I can't login to one of our servers and just got this in an ipmi sol session: [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.209183] Tx Queue 0 [18169.209184] TDH e3 [18169.209185] TDT

Re: e1000: Detected Tx Unit Hang

2008-02-15 Thread Bernd Schubert
On Saturday 16 February 2008, Kok, Auke wrote: Bernd Schubert wrote: Hello, I can't login to one of our servers and just got this in an ipmi sol session: [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.209183] Tx Queue 0 [18169.209184]

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-21 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Mon, 21 Jan 2008 14:27:13 +0100 Yes it works. e1000 tested for ~3 hours with very high load and interface up/down every 5th sec. Without the patch the irqs get disabled within a couple of seconds. A resolute way of handling the

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-21 Thread Robert Olsson
David Miller writes: Yes, this semaphore thing is highly problematic. In the most crucial areas where network driver consistency matters the most for ease of understanding and debugging, the Intel drivers choose to be different :-( The way the napi_disable() logic breaks out from

RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-20 Thread Brandeburg, Jesse
David Miller wrote: From: Robert Olsson [EMAIL PROTECTED] Date: Fri, 18 Jan 2008 14:00:57 +0100 I don't understand the idea with semaphore for enabling/disabling irq's either the overall logic must safer/better without it. They must have had code paths where they didn't know if IRQs

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-20 Thread Andrey Rahmatullin
On Sun, Jan 20, 2008 at 01:20:11AM -0800, Brandeburg, Jesse wrote: I continually get the kernel: unregister_netdevice: waiting for eth2 to become free. Usage count = 1 http://bugzilla.kernel.org/show_bug.cgi?id=9778 -- WBR, wRAR (ALT Linux Team)

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-20 Thread Badalian Vyacheslav
Hello. It works, thanks for resending it! Sorry, I understood that patch 53e52c729cc169db82a6105fac7a166e10c2ec36 ([NET]: Make ->poll() breakout consistent in Intel ethernet drivers.) had a regression and rolled it back; I did not see your patch. Sorry again. Thanks! From: Badalian Vyacheslav [EMAIL

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-18 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Wed, 16 Jan 2008 18:07:38 +0100 eth0 e1000_irq_enable sem = 1 <- High netload eth0 e1000_irq_enable sem = 1 eth0 e1000_irq_enable sem = 1 eth0 e1000_irq_enable sem = 1 eth0 e1000_irq_enable sem = 1 eth0 e1000_irq_enable sem = 1 eth0

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-18 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Fri, 18 Jan 2008 14:00:57 +0100 I don't understand the idea with semaphore for enabling/disabling irq's either the overall logic must safer/better without it. They must have had code paths where they didn't know if IRQs were enabled or not

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-18 Thread Robert Olsson
David Miller writes: eth0 e1000_irq_enable sem = 1 <- ifconfig eth0 down eth0 e1000_irq_disable sem = 2 **e1000_open <- ifconfig eth0 up eth0 e1000_irq_disable sem = 3 Dead. irqs can't be enabled e1000_irq_enable miss eth0 e1000_irq_enable sem = 2
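
The trace above suggests a counted enable/disable scheme in which interrupts come back only once the counter is fully unwound. The following is a minimal sketch of that failure mode, not the driver's code: the class `IrqGate`, its fields, and the sequence of calls are all hypothetical, and the counter values are not meant to match the `sem` numbers in the trace.

```python
class IrqGate:
    """Toy model of a counted interrupt gate; not the driver's actual code."""

    def __init__(self):
        self.disable_count = 0  # number of outstanding disable() calls
        self.enabled = True

    def disable(self):
        self.disable_count += 1
        self.enabled = False

    def enable(self):
        # Only the call that brings the count back to zero re-enables.
        self.disable_count -= 1
        if self.disable_count == 0:
            self.enabled = True
        return self.enabled


gate = IrqGate()

# Balanced pair: interrupts come back.
gate.disable()
gate.enable()
balanced_ok = gate.enabled

# One extra disable (assumed here to come from a down/up cycle):
# the single enable that follows cannot reach zero.
gate.disable()
gate.disable()
gate.enable()
stuck = not gate.enabled
```

In this model any code path that disables once more than it enables leaves the gate closed for good, which is consistent with the "irqs can't be enabled" observation in the trace.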

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-17 Thread David Miller
From: Frans Pop [EMAIL PROTECTED] Date: Thu, 17 Jan 2008 08:51:55 +0100 On Thursday 17 January 2008, David Miller wrote: From: Brandeburg, Jesse [EMAIL PROTECTED] We spent Wednesday trying to reproduce (without the patch) these issues without much luck, and have applied the patch

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-17 Thread Arnaldo Carvalho de Melo
Em Thu, Jan 17, 2008 at 12:00:02AM -0800, David Miller escreveu: From: Frans Pop [EMAIL PROTECTED] Date: Thu, 17 Jan 2008 08:51:55 +0100 On Thursday 17 January 2008, David Miller wrote: From: Brandeburg, Jesse [EMAIL PROTECTED] We spent Wednesday trying to reproduce (without the

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-17 Thread David Miller
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED] Date: Thu, 17 Jan 2008 07:40:07 -0200 I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again to reproduce. Thanks for the datapoints and testing. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread Frans Pop
On Wednesday 16 January 2008, David Miller wrote: Ok, here is the patch I'll propose to fix this. The goal is to make it as simple as possible without regressing the thing we were trying to fix. Looks good to me. Tested with -rc8. Cheers, FJP

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread Badalian Vyacheslav
applied to 2.6.24-rc7-git2 Have messages Also have regression after apply patch. System may do above 800mbs traffic before patch. After its exit polling mode? (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3 CPU is IDLE) After patch system was go to exit polling mode at above 600mbs. Thanks.

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread David Miller
From: Frans Pop [EMAIL PROTECTED] Date: Wed, 16 Jan 2008 09:56:08 +0100 On Wednesday 16 January 2008, David Miller wrote: Ok, here is the patch I'll propose to fix this. The goal is to make it as simple as possible without regressing the thing we were trying to fix. Looks good to me.

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread David Miller
From: Badalian Vyacheslav [EMAIL PROTECTED] Date: Wed, 16 Jan 2008 12:02:28 +0300 Also have regression after apply patch. BTW, if you are using the e1000e driver then this initial patch will not work. My more recent patch posting for this problem, will. I include it again below for you:

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread David Miller
From: Badalian Vyacheslav [EMAIL PROTECTED] Date: Wed, 16 Jan 2008 12:02:28 +0300 applied to 2.6.24-rc7-git2 Have messages Also have regression after apply patch. System may do above 800mbs traffic before patch. After its exit polling mode? (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread Robert Olsson
David Miller writes: On Wednesday 16 January 2008, David Miller wrote: Ok, here is the patch I'll propose to fix this. The goal is to make it as simple as possible without regressing the thing we were trying to fix. Looks good to me. Tested with -rc8. Thanks for

RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread Brandeburg, Jesse
David Miller wrote: From: Brandeburg, Jesse [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 13:53:43 -0800 The tx code has an early exit that tries to limit the amount of tx packets handled in a single poll loop and requires napi or interrupt rescheduling based on the return value from

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread David Miller
From: Brandeburg, Jesse [EMAIL PROTECTED] Date: Wed, 16 Jan 2008 23:09:47 -0800 We spent Wednesday trying to reproduce (without the patch) these issues without much luck, and have applied the patch cleanly and will continue testing it. Given the simplicity of the changes, and the community

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-16 Thread Frans Pop
On Thursday 17 January 2008, David Miller wrote: From: Brandeburg, Jesse [EMAIL PROTECTED] We spent Wednesday trying to reproduce (without the patch) these issues without much luck, and have applied the patch cleanly and will continue testing it. Given the simplicity of the changes, and

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 Thread Frans Pop
On Tuesday 15 January 2008, David Miller wrote: From: Frans Pop [EMAIL PROTECTED] kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Does this make the problem go away? Yes, it very much looks like that solves it. I ran with the patch for 6 hours or so without any errors. I then

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 Thread slavon
Quoting Frans Pop [EMAIL PROTECTED]: On Tuesday 15 January 2008, David Miller wrote: From: Frans Pop [EMAIL PROTECTED] kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Does this make the problem go away? Yes, it very much looks like that solves it. I ran with the patch for 6

RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 Thread Brandeburg, Jesse
[EMAIL PROTECTED] wrote: Quoting Frans Pop [EMAIL PROTECTED]: (Note this isn't the final correct patch we should apply. There is no reason why this revert back to the older ->poll() logic here should have any effect on the TX hang triggering...) s/no reason/no obvious reason/ ? ;-) The

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 Thread David Miller
From: Brandeburg, Jesse [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 13:53:43 -0800 The tx code has an early exit that tries to limit the amount of tx packets handled in a single poll loop and requires napi or interrupt rescheduling based on the return value from e1000_clean_tx_irq. That explains
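
The early-exit behaviour described in this message can be pictured with a small model. This is a sketch under stated assumptions, not the e1000 code: `TX_WORK_LIMIT`, `clean_tx`, `poll`, and the convention that `clean_tx` returns True when work remains are all hypothetical names and choices made for illustration.

```python
from collections import deque

TX_WORK_LIMIT = 4  # hypothetical cap on descriptors reclaimed per poll


def clean_tx(pending, limit=TX_WORK_LIMIT):
    """Reclaim up to `limit` completed descriptors.

    Returns True if the cap stopped us early and work remains.
    """
    for _ in range(min(limit, len(pending))):
        pending.popleft()
    return len(pending) > 0


def poll(pending_tx, rx_done, budget):
    """Return True if polling must stay scheduled, False if it may
    complete and re-enable interrupts."""
    tx_more = clean_tx(pending_tx)
    # Leaving polling while tx work remains would strand descriptors
    # until the next interrupt, so either condition keeps us polling.
    return tx_more or rx_done >= budget


queue = deque(range(10))
results = [poll(queue, rx_done=0, budget=64) for _ in range(3)]
```

The point of the model is only that a poll routine which ignores the early-exit result could stop polling with transmit descriptors still unreclaimed.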

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-14 Thread David Miller
From: Frans Pop [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 06:25:10 +0100 kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Does this make the problem go away? (Note this isn't the final correct patch we should apply. There is no reason why this revert back to the older ->poll()

[REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-14 Thread Frans Pop
After compiling v2.6.24-rc7-163-g1a1b285 (x86_64) yesterday I suddenly see this error repeatedly: kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang kernel: Tx Queue 0 kernel: TDH a kernel: TDT a kernel: next_to_use a
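
When comparing several such reports, it can help to pull the ring fields out of the hang dump. The following is a minimal sketch that assumes the values are hexadecimal (the other report in this listing shows `TDH e3`) and that they may or may not be wrapped in angle brackets; the helper name `parse_hang_dump` is hypothetical.

```python
import re

# Field name followed by a hex value, optionally wrapped in <...>.
FIELD = re.compile(
    r"\b(TDH|TDT|next_to_use|next_to_clean)\s+<?([0-9a-fA-F]+)>?"
)


def parse_hang_dump(text):
    """Return the ring fields found in a hang dump as integers."""
    return {name: int(value, 16) for name, value in FIELD.findall(text)}


sample = (
    "kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang "
    "kernel: Tx Queue 0 kernel: TDH a kernel: TDT a kernel: next_to_use a"
)
fields = parse_hang_dump(sample)

# Head equal to tail generally means the hardware has no descriptors
# left to process, which hints that the ring itself is not stalled.
head_equals_tail = fields["TDH"] == fields["TDT"]
```

For the dump quoted in this message the head and tail registers are equal, which may be why the report was treated as a driver-side accounting problem rather than a hardware stall.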

Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-14 Thread Frans Pop
Wow. That's fast! :-) On Tuesday 15 January 2008, David Miller wrote: From: Frans Pop [EMAIL PROTECTED] kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Does this make the problem go away? I'm compiling a kernel with the patch now. Will let you know the result. May take a

Re: e1000 Detected Tx Unit Hang

2006-09-16 Thread Paul Aviles
- Original Message - From: Jesse Brandeburg [EMAIL PROTECTED] To: Paul Aviles [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Tuesday, September 05, 2006 12:09 PM Subject: Re: e1000 Detected Tx Unit Hang On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote: Hey Jesse, thanks for your reply

Re: e1000 Detected Tx Unit Hang

2006-09-10 Thread Paul Aviles
Jesse, testing without NAPI, will see how it behaves. Paul Aviles - Original Message - From: Jesse Brandeburg [EMAIL PROTECTED] To: Paul Aviles [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Tuesday, September 05, 2006 12:09 PM Subject: Re: e1000 Detected Tx Unit Hang On 9/3/06

Re: e1000 Detected Tx Unit Hang

2006-09-05 Thread Jesse Brandeburg
On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote: Hey Jesse, thanks for your reply. Here is the stuff on /procs. The weird no problem, part is that I have several other identical systems and only one is affected. Today I moved the hard drive to another similar system and I am not seeing the

Re: e1000 Detected Tx Unit Hang

2006-09-05 Thread Paul Aviles
Detected Tx Unit Hang On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote: Hey Jesse, thanks for your reply. Here is the stuff on /procs. The weird no problem, part is that I have several other identical systems and only one is affected. Today I moved the hard drive to another similar system and I

Re: e1000 Detected Tx Unit Hang

2006-09-03 Thread Jesse Brandeburg
On 9/2/06, Paul Aviles [EMAIL PROTECTED] wrote: I am getting e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang using stock 2.6.17.11, 2.6.17.5 or 2.6.17.4 kernels on centos 4.3. The server is a Tyan GS12 ( 82541GI/PI and 82547GI) and is connected to a Netgear GS724T Gig switch. I can

Re: e1000 Detected Tx Unit Hang

2006-09-03 Thread Paul Aviles
NMI: 0 0 LOC: 7715839 7715838 ERR: 0 MIS: 0 - Original Message - From: Jesse Brandeburg [EMAIL PROTECTED] To: Paul Aviles [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Sunday, September 03, 2006 1:45 PM Subject: Re: e1000 Detected Tx Unit Hang

e1000 Detected Tx Unit Hang

2006-09-02 Thread Paul Aviles
I am getting e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang using stock 2.6.17.11, 2.6.17.5 or 2.6.17.4 kernels on centos 4.3. The server is a Tyan GS12 ( 82541GI/PI and 82547GI) and is connected to a Netgear GS724T Gig switch. I can easily reproduce the problem by trying to do a

Re: e1000 Detected Tx Unit Hang

2006-09-01 Thread Auke Kok
Paul Aviles wrote: I am getting e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang using stock 2.6.17.11, 2.6.17.5 or 2.6.17.4 kernels on centos 4.3. The server is a Tyan GS10 and is connected to a Netgear GS724T Gig switch. I can easily reproduce the problem by trying to do a large ftp