Re: e1000 full-duplex TCP performance well below wire speed
Bill Fink wrote:
> If the receive direction uses a different GigE NIC that's part of the same quad-GigE, all is fine:
>
> [EMAIL PROTECTED] ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.5.79
> tx: 1186.5051 MB / 10.05 sec = 990.2250 Mbps 12 %TX 13 %RX 0 retrans
> rx: 1186.7656 MB / 10.05 sec = 990.5204 Mbps 15 %TX 14 %RX 0 retrans

Could this be an issue with pause frames? At a previous job I remember having issues with a similar configuration using two Broadcom SB1250 devices, each with 3 GigE ports. If I ran bidirectional tests on a single pair of ports connected via a crossover cable, it was slower than when I gave each direction its own pair of ports. The problem turned out to be that pause frame generation and handling were not configured correctly.

-Ack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.25 patch] the planned eepro100 removal
Kok, Auke wrote:
> Jeff Garzik wrote:
>> Bill Davidsen wrote:
>>> Adrian Bunk wrote:
>>>> This patch contains the planned removal of the eepro100 driver.
>>>
>>> Are the e100 people satisfied that e100 now handles all known cases? I
>>
>> Nope. There is still e100 work outstanding that means we cannot kill eepro100.
>
> Agreed, there is still a receive unit hang in the last version that I got from David Acker.
>
> Auke

Yes, I have not had much time to work on it. :-( In testing, I was also able to get crashes which I believe are from receive skb corruption. It never appears in our normal use testing, only when running pktgen against the port while the port itself is sending with pktgen. Sadly this has pushed it down the slippery priority slope.

I am sorry if I am holding things up. I will try to put some more time in on this issue as soon as I can.

-Ack
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
Lennert Buytenhek wrote:
> The people who need a LE network driver can use Christian's driver, as Christian's driver works in LE just fine. The people who care about LE support can add LE support to the driver that Krzysztof wrote.
>
> I don't think that not supporting LE is a reason not to merge Krzysztof's driver. Don't make supporting LE systems Krzysztof's problem.
>
> Krzysztof has written an excellent driver, and while it would be 100% Debian style to reject his driver just because it doesn't support LE[*], thankfully, Linux is not Debian. Please don't turn Linux into Debian.

I am using the ixp425 on the Avila board from Gateworks. It only has 16 MB of flash built in. We needed to squeeze a production and a failsafe Linux image inside that, so Debian was not an option. I found Intel's original drivers horrible to read, maintain, and use. We are using Christian's driver (rev 0.2.1) and are preparing to go to his latest for the crypto support. I only had one bug in 0.2.1, which is fixed in later versions. I would love to see mainline support for this device, including the ethernet ports and the crypto capabilities.

We run in big endian despite the extra difficulty in toolset setup and finding lots of brain-damaged, designed-for-x86 code. We can already use up the CPU when we have the mini-PCI slots populated with 802.11g radios and the ethernet port in use. Swapping packets would kill us. Never mind if we do any kind of software based crypto! For those of us in the embedded space, performance matters. Big endian is the natural setup for the NPEs on this hardware according to the Intel documentation. Please don't stop this driver from moving forward so that a few folks can run their hardware in slow mode.

-Ack
Re: [PATCH] e100 rx: or s and el bits
David Acker wrote:
> So far my testing has shown both the original and the new version of the S-bit patch work in that no corruption seemed to occur over long term runs.

I spoke too soon. Further testing has not gone well. If I use the default settings for CPU saver and drop the receive pool down to 16 buffers I can cause problems with various forms of the patch.

With the original S-bit patch I can get:

    e100: eth0: e100_update_stats: exec cuc_dump_reset failed

At this point sends and receives are failing and their counters are not changing. When a raw socket opened on the port tries to send, it gets errno 11, Resource temporarily unavailable. When this error occurred, I added a reset call through the tx timeout:

    schedule_work(&nic->tx_timeout_task);

This made the port come back, but obviously this is somewhat of a kludge, and eventually the entire system hangs or gets an oops from memory corruption. Apparently my earlier tests got lucky with a large receive pool size. The only way the original patch was stable for me was by disabling CPU saver and setting the NAPI weight (default of 16) as large as the default receive pool size (256), so that all buffers were claimed and reallocated in one NAPI call. None of this should be needed, so there is definitely a problem with the original patch.

The updated patch produced a different issue. We got an RNR interrupt indicating the receive unit got ahead of the software. The S-bit patch removed any handling of this issue as it assumed the hardware would spin on the S-bit. Apparently if both the S-bit and the EL-bit are set on the same RFD, it follows the EL-bit handling. Printing the stat/ack and status bytes on the RNR interrupts I get:

    status   01001000 = 0x48 = RUS of 0010 = No Resources, CUS of 01 = Suspended
    stat/ack 01010000 = 0x50 = FR, RNR
          or 00010000 = 0x10 = RNR

Notice that the RUS went into No Resources and not Suspended. Thus clearing the S-bit does not wake it up; it needs a new start command.
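The decoding above can be reproduced with a few shifts and masks. The following is only a sketch: the field positions (CUS in bits 7:6 and RUS in bits 5:2 of the status byte, FR = 0x40 and RNR = 0x10 in the stat/ack byte) are read off the values printed above, and the names are hypothetical helpers for illustration, not the identifiers used in e100.c.

```c
#include <stdint.h>

/* Values taken from the decoding in the message above.
 * All names here are hypothetical, chosen only for this sketch. */
enum { RUS_NO_RESOURCES = 0x2 };   /* RUS field value 0010 */
enum { CUS_SUSPENDED    = 0x1 };   /* CUS field value 01   */
enum { STAT_ACK_RNR = 0x10, STAT_ACK_FR = 0x40 };

/* CUS occupies bits 7:6 of the status byte. */
static unsigned scb_cus(uint8_t status)
{
    return (status >> 6) & 0x3;
}

/* RUS occupies bits 5:2 of the status byte. */
static unsigned scb_rus(uint8_t status)
{
    return (status >> 2) & 0xF;
}

/* Per the observation above: in No Resources the receive unit is not
 * merely suspended, so clearing the S-bit is not enough and a new
 * RU start command is needed. */
static int ru_needs_start(uint8_t status)
{
    return scb_rus(status) == RUS_NO_RESOURCES;
}
```

With status 0x48, scb_rus() gives 2 (No Resources) and scb_cus() gives 1 (Suspended), which matches the printout; a stat/ack of 0x50 has both STAT_ACK_FR and STAT_ACK_RNR set, while 0x10 has only STAT_ACK_RNR.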
I could not find documentation that states that the S-bit need only be cleared to take the RU out of suspended state. Before the S-bit patch the driver tried to track this need, but that version of the driver didn't work for me either. By the way, I am using "Intel 8255x 10/100 Mbps Ethernet Controller Family, Open Source Software Developer Manual, January 2006" as my documentation.

This got me looking at just how in the world this worked on the old eepro100 driver. It had another difference; it did not reap the last rx buffer in the chain. It set a postponed bit and then picked it up on the next interrupt after more buffers had been allocated. It then noticed that the RU was in a suspended or no resources state and did a soft reset.

I don't believe this avoid-the-last-buffer trick really fixes the race. Imagine the following:

1. 4 buffers in receive pool, all freshly allocated
2. Hardware consumes 3 buffers
3. Software processes 3 buffers, begins to allocate new buffers
4. Hardware writes status bits into buffer 4 while software updates link and command word bits in buffer 4. They share a cache line and corrupt each other.

This appears to be possible with any of the versions of this driver I have seen. The problem is one of packet ownership. Once the driver gives a list of buffers to hardware, hardware owns them all. The driver cannot safely change these buffers. Sadly, this means that the idea of the driver "staying ahead" of the hardware such that the hardware never runs out of resources will not work here. Once the driver gives the hardware a packet with S or EL bits set, it must let the hardware encounter the packet and return it to software.

I think the driver needs to protect the last entry in the ring by putting the S-bit on the entry before it. The first time the driver allocates a block of packets, it writes a new S-bit out on the next to last packet.
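As a rough illustration of the placement rule just described (S-bit on the next-to-last entry of a freshly allocated block), here is a minimal sketch. It is not the patch under discussion: the struct layout and the cb_s value 0x4000 are the ones quoted elsewhere in this thread, mark_suspend_point() is a hypothetical helper, and the sketch ignores the DMA mapping, cache syncing, and little-endian conversion a real driver would need.

```c
#include <stddef.h>
#include <stdint.h>

#define CB_S 0x4000u   /* suspend bit in the RFD command word */

/* RFD layout as quoted in this thread. */
struct rfd {
    uint16_t status;
    uint16_t command;
    uint32_t link;
    uint32_t rbd;
    uint16_t actual_size;
    uint16_t size;
};

/* Hypothetical helper: set the S-bit on the next-to-last RFD of a
 * block of n freshly allocated RFDs, so the hardware stops before it
 * reaches the last entry. Returns the zero-based index that was
 * marked, or -1 if the block is too short to have a next-to-last
 * entry. */
static int mark_suspend_point(struct rfd *ring, size_t n)
{
    if (ring == NULL || n < 2)
        return -1;
    ring[n - 2].command |= CB_S;
    return (int)(n - 2);
}
```

For a block of 4 buffers this marks index 2, that is, buffer 3 when counting from 1.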
As buffers complete it allocates more packets in the chain but does not set a new S-bit, since the old one will stop hardware. It cannot clear the old S-bit because the driver does not own the buffer; hardware does. After processing the S-bit packet the hardware will interrupt with a stat/ack of RNR and RUS of suspended. When software processes a packet with an old S-bit it allocates new buffers and sets the S-bit on the new next-to-last packet.

The above case changes now:

1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated. S-bit is on buffer 3.
2. Hardware consumes 3 buffers, hits S-bit, RNR interrupts
3. Software processes 3 buffers, begins to allocate new buffers
4. Software sends resume once buffers are allocated, S-bit is on buffer 2.
5. Hardware gets resume. When it processed buffer 3, it saved the link to buffer 4 and thus resumes at buffer 4.

Here is a different flow where the software stays ahead:

1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated. S-bit is on
Re: [PATCH] e100 rx: or s and el bits
David Acker wrote:
> Milton Miller wrote:
>> In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description talks about emulating another driver by setting additional bits and then being unable to test when submitted.
>>
>> Seeing the & operator to set more bits made me suspicious, and indeed the bits are defined in positive logic:
>>
>>     cb_s  = 0x4000,
>>     cb_el = 0x8000,
>>
>> So anding those together would be 0. I'm guessing they should be or'd, but don't have hardware here to test, much like the committed patch. In fact, I'll let someone else do the compile test too. I'll update the comment.
>
> I wonder if this worked for me because the hardware also spun on the link field being NULL? Since the RU base is also set to 0, the calculated physical address would be 0 as well. I would imagine if the hardware tried to read/write to very low addresses across PCI, there would be issues. I will retest with a small receive pool to try to hit the problem.
>
> I will also run these tests with the new patch and with a smaller receive pool (default is 256) to make the pool run out more often.

So far my testing has shown both the original and the new version of the S-bit patch work, in that no corruption seemed to occur over long term runs. The previous S-bit patch may have only worked due to something specific about how my PCI companion chip handles I/O to low memory addresses (from dereferencing a link address of 0). Perhaps the e100 handles the NULL link as well, but given that the manual does not seem to state what happens when the hardware encounters a buffer with a link of 0, I think Milton's fix is the proper way to do it. The old eepro100 driver did set both bits, although it did it with a hardcoded constant.

I will continue testing with slab debug on but that will take longer. Has anyone tried this on other platforms?
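Milton's point about the operator is plain bit arithmetic and easy to check. The sketch below uses the two flag values quoted above; the two function names are hypothetical and exist only to show the difference between the committed expression and the proposed one.

```c
#include <stdint.h>

/* Flag values as quoted from the driver above. */
enum {
    cb_s  = 0x4000,
    cb_el = 0x8000,
};

/* What the committed patch computed: the two flags share no bits,
 * so and-ing them yields a command word with neither bit set. */
static uint16_t flags_anded(void)
{
    return (uint16_t)(cb_s & cb_el);
}

/* What the fix computes: both the S-bit and the EL-bit set. */
static uint16_t flags_ored(void)
{
    return (uint16_t)(cb_s | cb_el);
}
```

flags_anded() is 0x0000 and flags_ored() is 0xC000, so with the & operator the RFD was written with neither S nor EL set.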
-Ack
Re: [PATCH] e100 rx: or s and el bits
Milton Miller wrote:
> In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description talks about emulating another driver by setting additional bits and then being unable to test when submitted.
>
> Seeing the & operator to set more bits made me suspicious, and indeed the bits are defined in positive logic:
>
>     cb_s  = 0x4000,
>     cb_el = 0x8000,
>
> So anding those together would be 0. I'm guessing they should be or'd, but don't have hardware here to test, much like the committed patch. In fact, I'll let someone else do the compile test too. I'll update the comment.

I wonder if this worked for me because the hardware also spun on the link field being NULL? Since the RU base is also set to 0, the calculated physical address would be 0 as well. I would imagine if the hardware tried to read/write to very low addresses across PCI, there would be issues. I will retest with a small receive pool to try to hit the problem.

It seems to apply to a pretty recent git pull from Linus's tree. I manually merged this into the 2.6.18.4 kernel we are using. With the original in-kernel driver (just EL bit, no S bit), I had two tests that would always end in horrible memory corruption on a PXA255 based system. One is a 12 hour bidirectional TCP test using iperf with the ethernet port sending packets to a wireless card and vice versa. The other is a similar configuration running a 12 hour UDP test sending 20 megabits/second in each direction. Even though the original S-bit patch seems broken, we have had days of continuous traffic through it without issue where previously we could never go more than 6 hours. I will let folks know how it goes.

In the UDP test, the ethernet side often gets ahead of the available buffers due to CPU and PCI usage by the wireless card and its driver. I will also run these tests with the new patch and with a smaller receive pool (default is 256) to make the pool run out more often.
-Ack
Re: [RFT] e100 driver on ARM
Lennart Sorensen wrote:
> Well the IT8152G+PXA255 combination used on the SBC we tried a couple of years ago did not work. The PCI bus had errors and the SBC maker gave up trying to fix it. We switched to a Geode SC1200 based board instead which works fine PCI wise.

I don't think this is it. Other PCI devices work fine on this board, including several 802.11 radios. The S-bit patch totally fixes the problem. Here is a quote from Russell that describes what I believe is the main problem:

http://www-gatago.com/linux/kernel/15457063.html

> Has e100 actually been fixed to use the PCI DMA API correctly yet? Looking at it, it doesn't look like it, so until it does, eepro100 is the far better bet for platforms needing working DMA API.
>
> What I'm talking about is e100's apparent belief that it can modify rfd's in the receive ring on a non-cache coherent architecture and expect the data around it to remain unaffected (see e100_rx_alloc_skb):
>
>     struct rfd {
>             u16 status;
>             u16 command;
>             u32 link;
>             u32 rbd;
>             u16 actual_size;
>             u16 size;
>     };
>
> it touches command and link. This means that the whole rfd plus maybe the following or preceding 16 bytes get loaded into a cache line (assuming cache lines of 32 bytes), and that data written out again at sync.
>
> However, it does this on what seems to be an active receive chain. So, both the CPU _and_ the device own the same data. Which is a violation of the DMA API.

I think that the S-bit patch fixes it because the hardware spins on the S-bit instead of using the packet. With just the EL-bit, the hardware tries to use the same cache line that the software is updating. Can someone from Intel let us know if I understand the hardware's handling of the S and EL bits? If my interpretation is correct, can the S-bit patch be applied? It seems like the correct way to lock out the hardware while a packet is being updated. I have not seen a reason given not to apply the patch.
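The arithmetic behind Russell's remark can be made concrete. The sketch below assumes the 32-byte cache line he mentions and the rfd layout he quotes; share_cache_line() and the addresses used with it are hypothetical, for illustration only. It shows that the 16-byte rfd fits inside one line, so the status word the device writes and the command/link words the CPU writes always land in the same line, together with 16 bytes of whatever sits next to the rfd.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumption: 32-byte lines, as in the quote */

/* RFD layout as quoted above, with fixed-width types. */
struct rfd {
    uint16_t status;
    uint16_t command;
    uint32_t link;
    uint32_t rbd;
    uint16_t actual_size;
    uint16_t size;
};

/* Hypothetical helper: returns 1 if the byte ranges [a, a + len_a)
 * and [b, b + len_b) touch at least one common cache line, else 0.
 * Both lengths must be greater than zero. */
static int share_cache_line(uintptr_t a, size_t len_a,
                            uintptr_t b, size_t len_b)
{
    uintptr_t a_first = a / CACHE_LINE;
    uintptr_t a_last  = (a + len_a - 1) / CACHE_LINE;
    uintptr_t b_first = b / CACHE_LINE;
    uintptr_t b_last  = (b + len_b - 1) / CACHE_LINE;

    return a_first <= b_last && b_first <= a_last;
}
```

For an rfd placed at a line-aligned address such as 0x1000, share_cache_line(0x1000 + offsetof(struct rfd, status), 2, 0x1000 + offsetof(struct rfd, command), 2) returns 1, and so does the check between the rfd and the 16 bytes that follow it at 0x1010. Data starting at 0x1020 is in the next line and is not affected.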
-Ack
Re: [RFT] e100 driver on ARM
Lennart Sorensen wrote:
> Which PCI host controller are you using with the PXA255? We tried using a PXA255 based system with a PCI controller a couple of years ago and had to change to a different CPU in the end due to the PCI controller simply not being valid PCI. The PXA255 wasn't designed for PCI, and I get the impression that none of the PCI companion chips for it do a good enough job to actually add it correctly.

Sorry for the delay in responding...my wife and I just had twins! We are using the IT8152G RISC-to-PCI companion chip.

-Ack
Re: [RFT] e100 driver on ARM
Kok, Auke wrote:
> Lennert Buytenhek wrote:
>> On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:
>>> 1) Does e100 driver work on ARM?
>>
>> FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based) board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based) board. ixp2350 works both with hardware coherency turned on (cpu snoops bus) and turned off (manual dma cache clean/invalidate as usual.)
>>
>> As for the other ARM platforms that I'm interested in / have hardware for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other two (iop32x/iop33x) I can't test because I don't have such systems with e100 NICs, but I expect those would work, since they're both xscale based like the ixp2400, and the ixp2400 works.
>
> I just got an iop342 board dropped on my lap. Once it's running, I'll make sure to make this the first thing to test.

I have a pxa255 based system with PCI added to it. The e100 would have memory corruption in its receive buffers, detected by slab debugging, unless I put in the patch to use the S-bit. Here is a link to the patch posting:

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc3-mm1/broken-out/git-netdev-all.patch

Search for e100.c.

http://www-gatago.com/linux/kernel/15457063.html - This discussion seems to hit the issue.

There appears to be a race on the cache line where the EL bit and the next packet info live. In my case the hardware appeared to write to a free packet. The S-bit seems to make the hardware stop and spin on the bit, while the EL bit seems to let the hardware try to use that packet.

This race would occur less often when the receive buffer chain is always refilled before the hardware can use the buffers up. On our 400 MHz XScale, we can use up all 256 buffers if the PCI bus has another busy device on it. In our case it is an 802.11g miniPCI card, and our software was routing all Ethernet packets to the wireless interface and vice versa while TCP streams were running across these connections.
-Ack
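The point about refilling the chain before the hardware uses it up can be illustrated with a toy model. This is a rough sketch and not e100 code: only the ring size of 256 comes from the message above, while the per-tick rates and the function `ticks_until_ring_empty` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u /* receive buffers, as in the message above */

/*
 * Toy model of the receive ring. In every tick the device first consumes
 * up to rx_per_tick free buffers, then the driver hands back up to
 * refill_per_tick buffers (never more than the ring holds).
 *
 * Returns the first tick in which the device consumes the last free
 * buffer, i.e. reaches the end of the chain, or -1 if that does not
 * happen within max_ticks.
 */
static long ticks_until_ring_empty(unsigned rx_per_tick,
                                   unsigned refill_per_tick,
                                   long max_ticks)
{
    unsigned free_bufs = RING_SIZE;

    assert(max_ticks >= 0);

    for (long tick = 1; tick <= max_ticks; tick++) {
        if (rx_per_tick >= free_bufs)
            return tick; /* device catches up with the driver */

        free_bufs -= rx_per_tick;
        free_bufs += refill_per_tick;
        if (free_bufs > RING_SIZE)
            free_bufs = RING_SIZE;
    }

    return -1;
}
```

In this model a driver that keeps pace (for example 4 buffers in, 4 buffers refilled per tick) never lets the device reach the end of the chain, while a driver that falls behind by one buffer per tick runs dry after a few hundred ticks. That is the window in which the race described above could show up.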