Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Pavel Machek wrote:
> On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
>> Pavel Machek wrote:
>>> Hi!
>>>
>>>>> I have the famous e1000 latency problems:
>>>>>
>>>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>>>
>>>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>>>> checksum problems, which I fixed by the update, but problems are not
>>>>> gone.
>>>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>>>> pci-express devices and has the infamous L1 ASPM disable patch to
>>>> fix this issue.
>>> Ok, e1000e seems to work for me.
>>>
>>> In another email, you asked for lspci - of failing e1000
>>> case. Should I still provide it?
>> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
>> whereas with e1000 it is still enabled. That's the fix that you need...
> 
> Is there an easy way to push that fix to e1000, too? Or print "use e1000e
> instead" and refuse to load?

well we're going to delete all pci-e related code from this driver soon anyway,
but I am indeed writing a patch right now that prints out this warning...

Auke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Pavel Machek
On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
> Pavel Machek wrote:
> > Hi!
> > 
> >>> I have the famous e1000 latency problems:
> >>>
> >>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> >>>
> >>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> >>> checksum problems, which I fixed by the update, but problems are not
> >>> gone.
> >> pavel, start using "e1000e" instead - this driver replaces e1000 for all 
> >> the
> >> pci-express devices and has the infamous L1 ASPM disable patch to
> >> fix this issue.
> > 
> > Ok, e1000e seems to work for me.
> > 
> > In another email, you asked for lspci - of failing e1000
> > case. Should I still provide it?
> 
> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
> whereas with e1000 it is still enabled. That's the fix that you need...

Is there an easy way to push that fix to e1000, too? Or print "use e1000e
instead" and refuse to load?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Pavel Machek wrote:
> Hi!
> 
>>> I have the famous e1000 latency problems:
>>>
>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>
>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>> checksum problems, which I fixed by the update, but problems are not
>>> gone.
>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>> pci-express devices and has the infamous L1 ASPM disable patch to
>> fix this issue.
> 
> Ok, e1000e seems to work for me.
> 
> In another email, you asked for lspci - of failing e1000
> case. Should I still provide it?

well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
whereas with e1000 it is still enabled. That's the fix that you need...
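
A minimal sketch of how that check could be scripted, assuming the usual `lspci -vvv` layout in which the PCIe link control line reads `LnkCtl: ASPM <state>; ...`. The two sample lines are hypothetical and are not taken from Pavel's machine; the helper names are made up for illustration.

```python
import re

# Assumed layout of the link-control line in `lspci -vvv` output:
#   LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ASPM_RE = re.compile(r"LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_state(lspci_text):
    """Return the ASPM field of the first LnkCtl line, or None if absent."""
    match = ASPM_RE.search(lspci_text)
    return match.group(1).strip() if match else None


def l1_enabled(lspci_text):
    """True when the ASPM field lists L1 as enabled."""
    state = aspm_state(lspci_text)
    return state is not None and "L1" in state and state.endswith("Enabled")


# Hypothetical excerpts, one per driver, for illustration only.
sample_e1000 = "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+\n"
sample_e1000e = "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+\n"

print("e1000 :", aspm_state(sample_e1000), "-> L1 on:", l1_enabled(sample_e1000))
print("e1000e:", aspm_state(sample_e1000e), "-> L1 on:", l1_enabled(sample_e1000e))
```

On a real system the text would come from `lspci -vvv` restricted to the NIC (for example with `-s <slot>`), since the helper only looks at the first LnkCtl line it finds.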

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Pavel Machek
Hi!

> > I have the famous e1000 latency problems:
> > 
> > 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> > 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> > 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> > 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> > 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> > 
> > ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> > checksum problems, which I fixed by the update, but problems are not
> > gone.
> 
> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> pci-express devices and has the infamous L1 ASPM disable patch to
> fix this issue.

Ok, e1000e seems to work for me.

In another email, you asked for lspci - of failing e1000
case. Should I still provide it?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Max Krasnyansky
Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> Kok, Auke wrote:
>>>>> Max Krasnyansky wrote:
>>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>>
>>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>>
>>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>>> maximum.
>>>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking
>>>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>>>> bug where timer did not expire or something.
>>> we don't use a software timer in e1000 irq coalescing/moderation, it's all 
>>> in
>>> hardware, so we don't have that problem at all. And I certainly have never 
>>> seen
>>> anything you are referring to with e1000 hardware, and I do not know of any 
>>> bug
>>> related to this.
>>>
>>> are you maybe confused with other hardware ?
>>>
>>> feel free to demonstrate an example...
>> Just to give you a background. I wrote and maintain http://libe1000.sf.net
>> So I know E1000 HW and SW in and out.
> 
> wow, even I do not dare to say that!
Ok maybe that was a bit of an overstatement :). 

>> And no I'm not confused with other HW and I know that we're
>> not using SW timers for the coalescing. HW can be buggy as well. Note that 
>> I'm not saying that I
>> know for sure that the problem is coalescing, I'm just suggesting to take it 
>> out of the equation
>> while Pavel is investigating.
>>
>> Unfortunately I cannot demonstrate an example but I've seen unexplained 
>> packet delays in the range 
>> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my 
>> labs). Once coalescing 
>> was disabled those problems have gone away.
> 
> this sounds like you have some sort of PCI POST-ing problem and those can indeed
> be worse if you use any form of interrupt coalescing. In any case that is largely
> irrelevant to the in-kernel drivers, and as I said we definitely have no open
> issues on that right now, and I do not recollect any either (other
> than the issue of interference when both ends are irq coalescing)
I was actually talking about in-kernel drivers, i.e. we were seeing delays with
TIPC running over the in-kernel E1000 driver. And no, it was not a TIPC issue:
everything worked fine over TG3, and the issues went away when coalescing was
disabled.
Anyway, I think we can drop this subject.

Max




Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Max Krasnyansky wrote:
> 
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> Kok, Auke wrote:
>>>> Max Krasnyansky wrote:
>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>
>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>
>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>> maximum.
>>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. 
>>> Plus you're talking
>>> about the case when coalescing logic is working as designed ;-). What if 
>>> there is some kind of 
>>> bug where timer did not expire or something.
>> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
>> hardware, so we don't have that problem at all. And I certainly have never 
>> seen
>> anything you are referring to with e1000 hardware, and I do not know of any 
>> bug
>> related to this.
>>
>> are you maybe confused with other hardware ?
>>
>> feel free to demonstrate an example...
> 
> Just to give you a background. I wrote and maintain http://libe1000.sf.net
> So I know E1000 HW and SW in and out.

wow, even I do not dare to say that!

> And no I'm not confused with other HW and I know that we're
> not using SW timers for the coalescing. HW can be buggy as well. Note that 
> I'm not saying that I
> know for sure that the problem is coalescing, I'm just suggesting to take it 
> out of the equation
> while Pavel is investigating.
> 
> Unfortunately I cannot demonstrate an example but I've seen unexplained 
> packet delays in the range 
> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my 
> labs). Once coalescing 
> was disabled those problems have gone away.

this sounds like you have some sort of PCI POST-ing problem and those can indeed
be worse if you use any form of interrupt coalescing. In any case that is largely
irrelevant to the in-kernel drivers, and as I said we definitely have no open
issues on that right now, and I do not recollect any either (other
than the issue of interference when both ends are irq coalescing)

Cheers,

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Max Krasnyansky


Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>
>>>> Add this to modprobe.conf and reload e1000 module
>>>>
>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>> that can't be the problem. irq moderation would only account for 2-3ms 
>>> variance
>>> maximum.
>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. 
>> Plus you're talking
>> about the case when coalescing logic is working as designed ;-). What if 
>> there is some kind of 
>> bug where timer did not expire or something.
> 
> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
> hardware, so we don't have that problem at all. And I certainly have never 
> seen
> anything you are referring to with e1000 hardware, and I do not know of any 
> bug
> related to this.
> 
> are you maybe confused with other hardware ?
> 
> feel free to demonstrate an example...

Just to give you a background. I wrote and maintain http://libe1000.sf.net
So I know E1000 HW and SW in and out. And no I'm not confused with other HW and
I know that we're not using SW timers for the coalescing. HW can be buggy as
well. Note that I'm not saying that I know for sure that the problem is
coalescing, I'm just suggesting to take it out of the equation while Pavel is
investigating.

Unfortunately I cannot demonstrate an example but I've seen unexplained packet
delays in the range of 1-20 milliseconds on E1000 HW (and boy ... I do have a
lot of it in my labs). Once coalescing was disabled those problems have gone
away.

Max


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Pavel Machek wrote:
> Hi!
> 
> I have the famous e1000 latency problems:
> 
> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> 
> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> checksum problems, which I fixed by the update, but problems are not
> gone.
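
A minimal sketch for putting numbers on the ping replies quoted above. The sample text is copied from the report; the 100 ms cut-off for a "slow" reply is an arbitrary assumption, and the helper names are made up for illustration.

```python
import re

# Ping replies copied from the report quoted above.
PING_OUTPUT = """\
64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
"""

REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def parse_ping(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    samples = []
    for line in text.splitlines():
        match = REPLY_RE.search(line)
        if match:
            samples.append((int(match.group(1)), float(match.group(2))))
    return samples


def slow_replies(samples, threshold_ms=100.0):
    """Replies whose round-trip time exceeds the (assumed) threshold."""
    return [sample for sample in samples if sample[1] > threshold_ms]


samples = parse_ping(PING_OUTPUT)
worst = max(samples, key=lambda sample: sample[1])
print(f"{len(slow_replies(samples))} of {len(samples)} replies slower than 100 ms")
print(f"worst: icmp_seq={worst[0]} at {worst[1]} ms")
```

Running the same helper on ping output captured with e1000 and then with e1000e would give a like-for-like comparison of the two drivers.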

pavel, start using "e1000e" instead - this driver replaces e1000 for all the
pci-express devices and has the infamous L1 ASPM disable patch to fix this
issue.

make sure you have CONFIG_E1000E=m or CONFIG_E1000E=y in your .config, otherwise
the old e1000 code will drive your card, and that driver does not have the fix.
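
A minimal sketch of that .config check, assuming the standard kernel .config syntax (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`). The two config fragments are hypothetical and the helper name is made up for illustration.

```python
import re


def e1000e_setting(config_text):
    """Return 'y', 'm', or None for CONFIG_E1000E in kernel .config text."""
    match = re.search(r"^CONFIG_E1000E=([ym])$", config_text, re.MULTILINE)
    return match.group(1) if match else None


# Hypothetical .config fragments, for illustration only.
with_e1000e = "CONFIG_E1000=m\nCONFIG_E1000E=m\n"
without_e1000e = "CONFIG_E1000=m\n# CONFIG_E1000E is not set\n"

for name, text in (("with", with_e1000e), ("without", without_e1000e)):
    setting = e1000e_setting(text)
    if setting is None:
        print(f"{name}: e1000e not enabled, the old e1000 code will drive the card")
    else:
        print(f"{name}: CONFIG_E1000E={setting}")
```

Against a real tree, the text would be the contents of the .config file in the kernel build directory.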

BAH, this is a good example of how Linus' patch can wreak havoc - a lot of people
will now not see fixes, since they only go into e1000e, and people can go on
using e1000 unnoticed for too long...

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Max Krasnyansky wrote:
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>> I'd suggest to try and disable the coalescing and see if it makes any 
>>> difference.
>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 
>>> second) though.
>>>
>>> Add this to modprobe.conf and reload e1000 module
>>>
>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 
>>> TxIntDelay=0,0 TxAbsIntDelay=0,0
>> that can't be the problem. irq moderation would only account for 2-3ms 
>> variance
>> maximum.
> Oh, I've definitely seen worse than that. Not as bad as a 1second though. 
> Plus you're talking
> about the case when coalescing logic is working as designed ;-). What if 
> there is some kind of 
> bug where timer did not expire or something.

we don't use a software timer in e1000 irq coalescing/moderation, it's all in
hardware, so we don't have that problem at all. And I certainly have never seen
anything you are referring to with e1000 hardware, and I do not know of any bug
related to this.

are you maybe confused with other hardware?

feel free to demonstrate an example...

Cheers,

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Max Krasnyansky
Kok, Auke wrote:
> Max Krasnyansky wrote:
>> So you don't think it's related to the interrupt coalescing by any chance ?
>> I'd suggest to try and disable the coalescing and see if it makes any 
>> difference.
>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 
>> second) though.
>>
>> Add this to modprobe.conf and reload e1000 module
>>
>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 
>> TxIntDelay=0,0 TxAbsIntDelay=0,0
> 
> that can't be the problem. irq moderation would only account for 2-3ms 
> variance
> maximum.
Oh, I've definitely seen worse than that. Not as bad as 1 second though. Plus
you're talking about the case when coalescing logic is working as designed ;-).
What if there is some kind of bug where the timer did not expire or something?

Max


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Max Krasnyansky wrote:
> Pavel Machek wrote:
>> Hi!
>>
>> I have the famous e1000 latency problems:
>>
>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>
>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>> checksum problems, which I fixed by the update, but problems are not
>> gone.
>>
>> irqpoll helps.
>>
>> nosmp (which implies XT-PIC is being used) does not help.
>>
>>  16:   1925  0   IO-APIC-fasteoi   ahci, yenta, uhci_hcd:usb2, eth0
>>
>> Booting kernel with nosmp/ no yenta, no usb does not help.
>>
>> Hmm, as expected, interrupt load on ahci (find /) makes latencies go
>> away.
>>
>> It should be easily reproducible on x60 with latest bios, it is 100%
>> reproducible for me...
> 
> So you don't think it's related to the interrupt coalescing by any chance ?
> I'd suggest to try and disable the coalescing and see if it makes any 
> difference.
> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 
> second) though.
> 
> Add this to modprobe.conf and reload e1000 module
> 
> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 
> TxIntDelay=0,0 TxAbsIntDelay=0,0

that can't be the problem. irq moderation would only account for 2-3ms variance
maximum.

Pavel, can you send me the `lspci -vvv` of your machine with the very latest git
tree and after it's showing the poor ping performance?

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Pavel Machek
On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
> Pavel Machek wrote:
> > Hi!
> >
> >>> I have the famous e1000 latency problems:
> >>>
> >>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> >>>
> >>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> >>> checksum problems, which I fixed by the update, but problems are not
> >>> gone.
> >> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> >> pci-express devices and has the infamous L1 ASPM disable patch to
> >> fix this issue.
> >
> > Ok, e1000e seems to work for me.
> >
> > In another email, you asked for lspci - of failing e1000
> > case. Should I still provide it?
>
> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
> whereas with e1000 it is still enabled. That's the fix that you need...

Is there easy way to push that fix to e1000, too? Or print "use e1000e
instead" and refuse to load?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Pavel Machek wrote:
> Hi!
>
>>> I have the famous e1000 latency problems:
>>>
>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>
>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>> checksum problems, which I fixed by the update, but problems are not
>>> gone.
>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>> pci-express devices and has the infamous L1 ASPM disable patch to
>> fix this issue.
>
> Ok, e1000e seems to work for me.
>
> In another email, you asked for lspci - of failing e1000
> case. Should I still provide it?

well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
whereas with e1000 it is still enabled. That's the fix that you need...
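
A minimal sketch, not part of the original thread, of how the lspci comparison described above could be automated. It assumes the `LnkCtl: ASPM ...;` line that `lspci -vv` prints for PCI Express devices; the exact wording differs between pciutils versions, and the helper name `aspm_l1_enabled` and both sample lines are hypothetical illustrations, not output from Pavel's machine.

```python
import re

# Assumed format of the link-control line in `lspci -vv` output, e.g.
#   "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+"
#   "LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+"
LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+(?P<state>[^;]+);")


def aspm_l1_enabled(lspci_text):
    """Return True if the first LnkCtl line reports L1 ASPM as enabled,
    False if it reports otherwise, and None if no LnkCtl line is found."""
    match = LNKCTL_RE.search(lspci_text)
    if match is None:
        return None
    state = match.group("state").strip()
    if state.startswith("Disabled"):
        return False
    # State strings look like "L1 Enabled" or "L0s L1 Enabled".
    return "L1" in state.split()


# Hypothetical sample lines standing in for real `lspci -vv` output.
with_e1000 = "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+"
with_e1000e = "LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+"

print("e1000  L1 ASPM enabled:", aspm_l1_enabled(with_e1000))
print("e1000e L1 ASPM enabled:", aspm_l1_enabled(with_e1000e))
```

To use it on a real system, capture the output of `lspci -vv` for the network controller once with each driver loaded and pass the text to the function.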

Auke


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Pavel Machek
Hi!

> > I have the famous e1000 latency problems:
> >
> > 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> > 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> > 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> > 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> > 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> >
> > ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> > checksum problems, which I fixed by the update, but problems are not
> > gone.
>
> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> pci-express devices and has the infamous L1 ASPM disable patch to
> fix this issue.

Ok, e1000e seems to work for me.

In another email, you asked for lspci - of failing e1000
case. Should I still provide it?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [E1000-devel] e1000 1sec latency problem

2008-02-07 Thread Kok, Auke
Pavel Machek wrote:
> On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
>> Pavel Machek wrote:
>>> Hi!
>>>
>>>>> I have the famous e1000 latency problems:
>>>>>
>>>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>>>
>>>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>>>> checksum problems, which I fixed by the update, but problems are not
>>>>> gone.
>>>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>>>> pci-express devices and has the infamous L1 ASPM disable patch to
>>>> fix this issue.
>>> Ok, e1000e seems to work for me.
>>>
>>> In another email, you asked for lspci - of failing e1000
>>> case. Should I still provide it?
>> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
>> whereas with e1000 it is still enabled. That's the fix that you need...
>
> Is there easy way to push that fix to e1000, too? Or print "use e1000e
> instead" and refuse to load?

well we're going to delete all pci-e related code from this driver soon anyway,
but I am indeed writing a patch right now that prints out this warning...

Auke