Re: [E1000-devel] e1000 1sec latency problem
Pavel Machek wrote:
> On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
>> Pavel Machek wrote:
>>> Hi!
>>>
>>>>> I have the famous e1000 latency problems:
>>>>>
>>>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>>>
>>>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>>>> checksum problems, which I fixed by the update, but problems are not
>>>>> gone.
>>>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>>>> pci-express devices and has the infamous L1 ASPM disable patch to
>>>> fix this issue.
>>> Ok, e1000e seems to work for me.
>>>
>>> In another email, you asked for lspci - of failing e1000
>>> case. Should I still provide it?
>> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
>> whereas with e1000 it is still enabled. That's the fix that you need...
>
> Is there easy way to push that fix to e1000, too? Or print "use e1000e
> instead" and refuse to load?

well we're going to delete all pci-e related code from this driver soon anyway,
but I am indeed writing a patch right now that prints out this warning...

Auke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
> Pavel Machek wrote:
>> Hi!
>>
>>>> I have the famous e1000 latency problems:
>>>>
>>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>>
>>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>>> checksum problems, which I fixed by the update, but problems are not
>>>> gone.
>>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>>> pci-express devices and has the infamous L1 ASPM disable patch to
>>> fix this issue.
>> Ok, e1000e seems to work for me.
>>
>> In another email, you asked for lspci - of failing e1000
>> case. Should I still provide it?
>
> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
> whereas with e1000 it is still enabled. That's the fix that you need...

Is there easy way to push that fix to e1000, too? Or print "use e1000e
instead" and refuse to load?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [E1000-devel] e1000 1sec latency problem
Pavel Machek wrote:
> Hi!
>
>>> I have the famous e1000 latency problems:
>>>
>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>
>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>> checksum problems, which I fixed by the update, but problems are not
>>> gone.
>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>> pci-express devices and has the infamous L1 ASPM disable patch to
>> fix this issue.
>
> Ok, e1000e seems to work for me.
>
> In another email, you asked for lspci - of failing e1000
> case. Should I still provide it?

well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
whereas with e1000 it is still enabled. That's the fix that you need...

Auke
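The check Auke describes (L1 ASPM disabled under e1000e, still enabled under e1000) can be read off the LnkCtl line of `lspci -vvv`. The following is a minimal Python sketch of that comparison. The two sample lines are hypothetical excerpts, not output from Pavel's machine, and the assumed "LnkCtl: ASPM ...;" layout may differ between pciutils versions.

```python
import re

# Assumption: lspci -vvv prints the link control register as
# "LnkCtl: ASPM <state>; ..." for PCI Express devices.
_LNKCTL = re.compile(r"LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_states(lspci_text):
    """Return the ASPM state string of every LnkCtl line in the text."""
    return [m.group(1).strip() for m in _LNKCTL.finditer(lspci_text)]


def l1_enabled(state):
    """True if the state string reports L1 among the enabled link states."""
    return "L1" in state and "Enabled" in state


# Hypothetical excerpts for illustration only.
WITH_E1000 = "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+\n"
WITH_E1000E = "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+\n"

if __name__ == "__main__":
    for label, text in (("e1000", WITH_E1000), ("e1000e", WITH_E1000E)):
        for state in aspm_states(text):
            print(f"{label}: ASPM {state} (L1 enabled: {l1_enabled(state)})")
```

In practice the text would come from running `lspci -vvv` as root once with each driver loaded and comparing the Ethernet controller's entry.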
Re: [E1000-devel] e1000 1sec latency problem
Hi!

>> I have the famous e1000 latency problems:
>>
>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>
>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>> checksum problems, which I fixed by the update, but problems are not
>> gone.
>
> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> pci-express devices and has the infamous L1 ASPM disable patch to
> fix this issue.

Ok, e1000e seems to work for me.

In another email, you asked for lspci - of failing e1000
case. Should I still provide it?

Pavel
Re: [E1000-devel] e1000 1sec latency problem
Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> Kok, Auke wrote:
>>>>> Max Krasnyansky wrote:
>>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>>
>>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>>
>>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>>> maximum.
>>>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking
>>>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>>>> bug where timer did not expire or something.
>>> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
>>> hardware, so we don't have that problem at all. And I certainly have never seen
>>> anything you are referring to with e1000 hardware, and I do not know of any bug
>>> related to this.
>>>
>>> are you maybe confused with other hardware ?
>>>
>>> feel free to demonstrate an example...
>> Just to give you a background. I wrote and maintain http://libe1000.sf.net
>> So I know E1000 HW and SW in and out.
>
> wow, even I do not dare to say that!

Ok maybe that was a bit of an overstatement :).

>> And no I'm not confused with other HW and I know that we're
>> not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I
>> know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation
>> while Pavel is investigating.
>>
>> Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range
>> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing
>> was disabled those problems have gone away.
>
> this sounds like you have some sort of PCI POST-ing problem and those can indeed
> be worse if you use any form of interrupt coalescing. In any case that is largely
> irrelevant to the in-kernel drivers, and as I said we definitely have no open
> issues on that right now, and I really do not recollect any as well either (other
> than the issue of interference when both ends are irq coalescing)

I was actually talking about in kernel drivers. ie We were seeing delays with TIPC running
over in kernel E1000 driver. And no it was not a TIPC issue, everything worked fine with
over TG3 and issues went away when coalescing was disabled.

Anyway, I think we can drop this subject.

Max
Re: [E1000-devel] e1000 1sec latency problem
Max Krasnyansky wrote:
>
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> Kok, Auke wrote:
>>>> Max Krasnyansky wrote:
>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>
>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>
>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>> maximum.
>>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking
>>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>>> bug where timer did not expire or something.
>> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
>> hardware, so we don't have that problem at all. And I certainly have never seen
>> anything you are referring to with e1000 hardware, and I do not know of any bug
>> related to this.
>>
>> are you maybe confused with other hardware ?
>>
>> feel free to demonstrate an example...
>
> Just to give you a background. I wrote and maintain http://libe1000.sf.net
> So I know E1000 HW and SW in and out.

wow, even I do not dare to say that!

> And no I'm not confused with other HW and I know that we're
> not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I
> know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation
> while Pavel is investigating.
>
> Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range
> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing
> was disabled those problems have gone away.

this sounds like you have some sort of PCI POST-ing problem and those can indeed
be worse if you use any form of interrupt coalescing. In any case that is largely
irrelevant to the in-kernel drivers, and as I said we definitely have no open
issues on that right now, and I really do not recollect any as well either (other
than the issue of interference when both ends are irq coalescing)

Cheers,

Auke
Re: [E1000-devel] e1000 1sec latency problem
Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>
>>>> Add this to modprobe.conf and reload e1000 module
>>>>
>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>> maximum.
>> Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking
>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>> bug where timer did not expire or something.
>
> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
> hardware, so we don't have that problem at all. And I certainly have never seen
> anything you are referring to with e1000 hardware, and I do not know of any bug
> related to this.
>
> are you maybe confused with other hardware ?
>
> feel free to demonstrate an example...

Just to give you a background. I wrote and maintain http://libe1000.sf.net
So I know E1000 HW and SW in and out. And no I'm not confused with other HW and I know
that we're not using SW timers for the coalescing. HW can be buggy as well. Note that
I'm not saying that I know for sure that the problem is coalescing, I'm just suggesting
to take it out of the equation while Pavel is investigating.

Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in
the range of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs).
Once coalescing was disabled those problems have gone away.

Max
Re: [E1000-devel] e1000 1sec latency problem
Pavel Machek wrote:
> Hi!
>
> I have the famous e1000 latency problems:
>
> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>
> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> checksum problems, which I fixed by the update, but problems are not
> gone.

pavel, start using "e1000e" instead - this driver replaces e1000 for all the
pci-express devices and has the infamous L1 ASPM disable patch to fix this issue.

make sure you have CONFIG_E1000E=m/y in your .config, otherwise the old e1000 code
will drive your card, and that driver does not have the fix.

BAH, this is a good example how Linus' patch can wreak havoc - a lot of people will
now not see fixes since they only go into e1000e, but people can unnoticed now go
and use e1000 for too long...

Auke
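The CONFIG_E1000E=m/y check mentioned above can be scripted. This is a minimal Python sketch under stated assumptions: the function name and the sample .config fragments are hypothetical, and only the standard kernel .config line forms ("CONFIG_X=y", "CONFIG_X=m", "# CONFIG_X is not set") are handled.

```python
import re

# Matches a line that enables the driver built-in (y) or as a module (m).
_E1000E = re.compile(r"^CONFIG_E1000E=([ym])\s*$")


def e1000e_setting(config_text):
    """Return 'y' or 'm' if CONFIG_E1000E is enabled in the text, else None."""
    for line in config_text.splitlines():
        match = _E1000E.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical .config fragments for illustration only.
OLD_DRIVER_ONLY = "CONFIG_E1000=y\n# CONFIG_E1000E is not set\n"
BOTH_DRIVERS = "CONFIG_E1000=m\nCONFIG_E1000E=m\n"

if __name__ == "__main__":
    for name, text in (("old only", OLD_DRIVER_ONLY), ("both", BOTH_DRIVERS)):
        setting = e1000e_setting(text)
        if setting is None:
            print(f"{name}: e1000e not enabled, the old e1000 code will drive the card")
        else:
            print(f"{name}: CONFIG_E1000E={setting}")
```

To use it on a real tree, read the .config from the kernel build directory and pass its contents to `e1000e_setting`.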
Re: [E1000-devel] e1000 1sec latency problem
Max Krasnyansky wrote:
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>
>>> Add this to modprobe.conf and reload e1000 module
>>>
>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>> that can't be the problem. irq moderation would only account for 2-3ms variance
>> maximum.
> Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking
> about the case when coalescing logic is working as designed ;-). What if there is some kind of
> bug where timer did not expire or something.

we don't use a software timer in e1000 irq coalescing/moderation, it's all in
hardware, so we don't have that problem at all. And I certainly have never seen
anything you are referring to with e1000 hardware, and I do not know of any bug
related to this.

are you maybe confused with other hardware?

feel free to demonstrate an example...

Cheers,

Auke
Re: [E1000-devel] e1000 1sec latency problem
Kok, Auke wrote:
> Max Krasnyansky wrote:
>> So you don't think it's related to the interrupt coalescing by any chance ?
>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>
>> Add this to modprobe.conf and reload e1000 module
>>
>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>
> that can't be the problem. irq moderation would only account for 2-3ms variance
> maximum.

Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're
talking about the case when coalescing logic is working as designed ;-). What if there
is some kind of bug where timer did not expire or something.

Max
Re: [E1000-devel] e1000 1sec latency problem
Max Krasnyansky wrote:
> Pavel Machek wrote:
>> Hi!
>>
>> I have the famous e1000 latency problems:
>>
>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>
>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>> checksum problems, which I fixed by the update, but problems are not
>> gone.
>>
>> irqpoll helps.
>>
>> nosmp (which implies XT-PIC is being used) does not help.
>>
>> 16: 1925 0 IO-APIC-fasteoi ahci, yenta, uhci_hcd:usb2, eth0
>>
>> Booting kernel with nosmp/ no yenta, no usb does not help.
>>
>> Hmm, as expected, interrupt load on ahci (find /) makes latencies go
>> away.
>>
>> It should be easily reproducible on x60 with latest bios, it is 100%
>> reproducible for me...
>
> So you don't think it's related to the interrupt coalescing by any chance ?
> I'd suggest to try and disable the coalescing and see if it makes any difference.
> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>
> Add this to modprobe.conf and reload e1000 module
>
> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0

that can't be the problem. irq moderation would only account for 2-3ms variance
maximum.

Pavel, can you send me the `lspci -vvv` of your machine with the very latest git tree
and after it's showing the poor ping performance?

Auke
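To make the size of the problem concrete, here is a minimal Python sketch that parses the ping samples quoted in the report and lists the round trips far above the 2-3ms that irq moderation could explain. The 50 ms cutoff and the function names are assumptions chosen for illustration, not values from the thread.

```python
import re

# Ping samples copied from Pavel's report.
PING_OUTPUT = """\
64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
"""

# Assumed cutoff, well above the 2-3 ms attributed to irq moderation.
THRESHOLD_MS = 50.0

_SAMPLE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def parse_rtts(text):
    """Return (icmp_seq, rtt_ms) tuples for every ping reply line in text."""
    return [(int(m.group(1)), float(m.group(2))) for m in _SAMPLE.finditer(text)]


def slow_samples(samples, threshold_ms=THRESHOLD_MS):
    """Keep only the samples whose round-trip time exceeds the threshold."""
    return [(seq, rtt) for seq, rtt in samples if rtt > threshold_ms]


if __name__ == "__main__":
    for seq, rtt in slow_samples(parse_rtts(PING_OUTPUT)):
        print(f"icmp_seq={seq}: {rtt} ms")
```

Five of the seven samples exceed the cutoff, which is consistent with Auke's point that moderation alone does not account for delays of this size.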
Re: [E1000-devel] e1000 1sec latency problem
Kok, Auke wrote: Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: So you don't think it's related to the interrupt coalescing by any chance ? I'd suggest to try and disable the coalescing and see if it makes any difference. We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though. Add this to modprobe.conf and reload e1000 module options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0 that can't be the problem. irq moderation would only account for 2-3ms variance maximum. Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking about the case when coalescing logic is working as designed ;-). What if there is some kind of bug where timer did not expire or something. we don't use a software timer in e1000 irq coalescing/moderation, it's all in hardware, so we don't have that problem at all. And I certainly have never seen anything you are referring to with e1000 hardware, and I do not know of any bug related to this. are you maybe confused with other hardware ? feel free to demonstrate an example... Just to give you a background. I wrote and maintain http://libe1000.sf.net So I know E1000 HW and SW in and out. wow, even I do not dare to say that! Ok maybe that was a bit of an overstatement :). And no I'm not confused with other HW and I know that we're not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation while Pavel is investigating. Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing was disabled those problems have gone away. 
this sounds like you have some sort of PCI POST-ing problem and those can indeed be worse if you use any form of interrupt coalescing. In any case that is largely irrelevant to the in-kernel drivers, and as I said we definately have no open issues on that right now, and I really do not recollect any as well either (other than the issue of interference when both ends are irq coalescing) I was actually talking about in kernel drivers. ie We were seeing delays with TIPC running over in kernel E1000 driver. And no it was not a TIPC issue, everything worked fine with over TG3 and issues went away when coalescing was disabled. Anyway, I think we can drop this subject. Max -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
Pavel Machek wrote: Hi! I have the famous e1000 latency problems: 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms ...and they are still there in 2.6.25-git0. I had ethernet EEPROM checksum problems, which I fixed by the update, but problems are not gone. pavel, start using e1000e instead - this driver replaces e1000 for all the pci-express devices and has the infamous L1 ASPM disable patch to fix this issue. make sure you have CONFIG_E1000E=m/y in your .config, otherwise the old e1000 code will drive your card, and that driver does not have the fix. BAH, this is a good example how Linus' patch can wreak havoc - a lot of people will now not see fixes since they only go into e1000e, but people can unnoticed now go and use e1000 for too long... Auke -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
Kok, Auke wrote: Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: So you don't think it's related to the interrupt coalescing by any chance ? I'd suggest to try and disable the coalescing and see if it makes any difference. We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though. Add this to modprobe.conf and reload e1000 module options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0 that can't be the problem. irq moderation would only account for 2-3ms variance maximum. Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking about the case when coalescing logic is working as designed ;-). What if there is some kind of bug where timer did not expire or something. we don't use a software timer in e1000 irq coalescing/moderation, it's all in hardware, so we don't have that problem at all. And I certainly have never seen anything you are referring to with e1000 hardware, and I do not know of any bug related to this. are you maybe confused with other hardware ? feel free to demonstrate an example... Just to give you a background. I wrote and maintain http://libe1000.sf.net So I know E1000 HW and SW in and out. And no I'm not confused with other HW and I know that we're not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation while Pavel is investigating. Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing was disabled those problems have gone away. 
Max -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: So you don't think it's related to the interrupt coalescing by any chance ? I'd suggest to try and disable the coalescing and see if it makes any difference. We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though. Add this to modprobe.conf and reload e1000 module options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0 that can't be the problem. irq moderation would only account for 2-3ms variance maximum. Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking about the case when coalescing logic is working as designed ;-). What if there is some kind of bug where timer did not expire or something. we don't use a software timer in e1000 irq coalescing/moderation, it's all in hardware, so we don't have that problem at all. And I certainly have never seen anything you are referring to with e1000 hardware, and I do not know of any bug related to this. are you maybe confused with other hardware ? feel free to demonstrate an example... Just to give you a background. I wrote and maintain http://libe1000.sf.net So I know E1000 HW and SW in and out. wow, even I do not dare to say that! And no I'm not confused with other HW and I know that we're not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation while Pavel is investigating. Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing was disabled those problems have gone away. this sounds like you have some sort of PCI POST-ing problem and those can indeed be worse if you use any form of interrupt coalescing. 
In any case that is largely irrelevant to the in-kernel drivers, and as I said we definately have no open issues on that right now, and I really do not recollect any as well either (other than the issue of interference when both ends are irq coalescing) Cheers, Auke -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
Max Krasnyansky wrote: Kok, Auke wrote: Max Krasnyansky wrote: So you don't think it's related to the interrupt coalescing by any chance ? I'd suggest to try and disable the coalescing and see if it makes any difference. We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though. Add this to modprobe.conf and reload e1000 module options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0 that can't be the problem. irq moderation would only account for 2-3ms variance maximum. Oh, I've definitely seen worse than that. Not as bad as a 1second though. Plus you're talking about the case when coalescing logic is working as designed ;-). What if there is some kind of bug where timer did not expire or something. we don't use a software timer in e1000 irq coalescing/moderation, it's all in hardware, so we don't have that problem at all. And I certainly have never seen anything you are referring to with e1000 hardware, and I do not know of any bug related to this. are you maybe confused with other hardware? feel free to demonstrate an example... Cheers, Auke -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
On Thu 2008-02-07 14:32:16, Kok, Auke wrote: Pavel Machek wrote: Hi! I have the famous e1000 latency problems: 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms ...and they are still there in 2.6.25-git0. I had ethernet EEPROM checksum problems, which I fixed by the update, but problems are not gone. pavel, start using e1000e instead - this driver replaces e1000 for all the pci-express devices and has the infamous L1 ASPM disable patch to fix this issue. Ok, e1000e seems to work for me. In another email, you asked for lspci - of failing e1000 case. Should I still provide it? well, if you do it you should see that L1 ASPM is now disabled (with e1000e) whereas with e1000 it is still enabled. That's the fix that you need... Is there easy way to push that fix to e1000, too? Or print use e1000e instead and refuse to load? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [E1000-devel] e1000 1sec latency problem
Pavel Machek wrote:
> Hi!
>
>>> I have the famous e1000 latency problems:
>>>
>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>
>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>> checksum problems, which I fixed by the update, but problems are not
>>> gone.
>> pavel, start using "e1000e" instead - this driver replaces e1000 for all
>> the pci-express devices and has the infamous L1 ASPM disable patch to
>> fix this issue.
>
> Ok, e1000e seems to work for me.
>
> In another email, you asked for lspci - of failing e1000
> case. Should I still provide it?

well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
whereas with e1000 it is still enabled. That's the fix that you need...

Auke
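For anyone who wants to check the L1 ASPM state themselves, here is a rough Python sketch. It assumes `lspci -vv` prints the PCI Express link control register as a `LnkCtl:` line with an `ASPM ...;` field; the function names and the two sample lines are made up for illustration and are not output from the machine discussed in this thread.

```python
import re


def aspm_states(lspci_vv_text):
    """Return the ASPM field of every LnkCtl line in `lspci -vv` output."""
    states = []
    for line in lspci_vv_text.splitlines():
        # Assumed line shape: "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..."
        match = re.search(r"LnkCtl:\s*ASPM\s+([^;]+);", line)
        if match:
            states.append(match.group(1).strip())
    return states


def l1_enabled(state):
    """True if the ASPM field reports L1 as enabled."""
    return "L1" in state and "Enabled" in state


# Hypothetical sample lines, one per driver case described above.
SAMPLE_E1000 = "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+\n"
SAMPLE_E1000E = "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+\n"

if __name__ == "__main__":
    for name, text in (("e1000", SAMPLE_E1000), ("e1000e", SAMPLE_E1000E)):
        for state in aspm_states(text):
            print(name, "->", state, "| L1 enabled:", l1_enabled(state))
```

To use it on a real system, feed it the text of `lspci -vv` (run as root so the capability registers are readable) for the Ethernet controller instead of the sample strings.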
Re: [E1000-devel] e1000 1sec latency problem
Hi!

>> I have the famous e1000 latency problems:
>>
>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>
>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>> checksum problems, which I fixed by the update, but problems are not
>> gone.
> pavel, start using "e1000e" instead - this driver replaces e1000 for all
> the pci-express devices and has the infamous L1 ASPM disable patch to
> fix this issue.

Ok, e1000e seems to work for me.

In another email, you asked for lspci - of failing e1000
case. Should I still provide it?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
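To make the spikes in the ping output above easier to see, here is a small Python sketch that pulls the round-trip times out of those seven lines and flags the slow ones. The 100 ms cutoff and the helper names are assumptions picked for illustration, nothing from the driver or from this thread.

```python
import re

# The seven ping replies quoted in the report above.
PING_OUTPUT = """\
64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
"""

SLOW_MS = 100.0  # assumed threshold for calling a reply "slow"


def parse_rtts(text):
    """Return (icmp_seq, rtt_ms) pairs found in ping output."""
    pairs = re.findall(r"icmp_seq=(\d+).*?time=([\d.]+) ms", text)
    return [(int(seq), float(ms)) for seq, ms in pairs]


def slow_replies(rtts, threshold_ms=SLOW_MS):
    """Return the sequence numbers whose RTT exceeds the threshold."""
    return [seq for seq, ms in rtts if ms > threshold_ms]


if __name__ == "__main__":
    rtts = parse_rtts(PING_OUTPUT)
    print("worst RTT (ms):", max(ms for _, ms in rtts))
    print("slow icmp_seq values:", slow_replies(rtts))
```

Running the same check on ping output captured under e1000 and under e1000e would give a quick before/after comparison.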