Re: sk98lin for 2.6.23-rc1
On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:
> On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> > From http://www.krose.org/~krose/computing.html:
> >
> > Since the sky2 driver continues to suck ass (which is a technical
> > description for "it hangs all the time under load, at least on my
> > hardware" :-) ), I've fixed the sk98lin driver to compile for
> > linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> > still use 2.6.23-rc1, simply by doing the following:
> >...
> > Personally, I'd like to see sk98lin remain in the kernel proper until
> > sky2 goes at least 6 months without reported problems. The fact that I
> > am not the only one still seeing issues is a clear indication that sky2
> > (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> > sk98lin.
> >...
>
> This sounds good in theory.
>
> The practical problem with this approach is that there are always many
> people who use the old driver when the new driver doesn't work for them
> instead of reporting their problems with the new driver.
>
> For these people a new driver will often suck when the old driver gets
> removed, but after the removal of the old driver they are finally forced
> to report their bugs, resulting in a better new driver for everyone.
>
> The sky2 driver has been in the kernel for nearly 2 years and Stephen is
> usually quite good at handling bugs.

The driver still (2.6.20/sky2 1.13) hangs for me (more rarely than in the
past), and cycling the module generally fixes the issues. I have supplied
all the information that Stephen has asked for, but there is still no
resolution. I am not complaining about the lack of a fix, but don't assume
that all it takes to get sky2 working is adequate bug reports.

I have been and remain willing to test and assist with debugging, but after
several dropped threads, I feel like the desire or ability to fix this
issue isn't there (and remote debugging of an intermittent hardware issue
IS hard), and I didn't want to be a nuisance to someone who has no
obligation to me to address the issue in the first place.

Stability has improved; it's just not there yet. I'll switch to 1.16 soon,
and respond to Stephen's request on netdev for current issues.
--
Rob
Re: sky2 PHY setup
On Sat, Apr 07, 2007 at 11:53:48AM -0600, Rob Sims wrote:
> On Wed, Apr 04, 2007 at 11:19:30AM -0700, Stephen Hemminger wrote:
...
> > You might have overrun the hub and it wedged. Try doing:
> > ethtool -r eth0
> > that forces a down/up
>
> I'm still seeing throughput drop to under 1 Mb/s (50-100kB/s)
> periodically. I did an ethtool -S and ethtool -d to capture state
> (sky2-fail.log). I ran ethtool -r and retested; no change
> (sky2-ethtool-r.log). Finally, ifdown - rmmod - modprobe - ifup, which
> restored Gigabit speeds (96+ MB/s) (sky2-modcycle.log).
>
> Please let me know if there's anything else I can poke into. As a
> reminder, this is an 88E8053 r20. Next time I see a degradation, I'll
> try cycling the switch.

I continued to have problems with this interface. I tried bringing up
the second interface on the motherboard; I added a cable to the same hub
as the first interface and configured an address on the same subnet. I
intended to see if I could use it as a redundant interface for when the
first hung. I haven't been able to test that, because I haven't seen a
hang or slowdown since I brought the second interface up.

Could the idle interface have generated an interrupt which is never
cleared because the interface is down? Or something similar?
--
Rob
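The recovery sequence quoted above (ifdown, rmmod, modprobe, ifup) could be wrapped in a small script along the following lines. This is only a sketch: the function names `run` and `cycle_sky2` and the `DRY_RUN` variable are made up for illustration, and it assumes a system where `ifdown`/`ifup` manage the interface, as they did on the machine described in the message.

```shell
#!/bin/sh
# Sketch of the module-cycle workaround described in the thread.
# run, cycle_sky2 and DRY_RUN are hypothetical names, not part of any tool.

# run CMD [ARGS...]: print the command when DRY_RUN is set, else execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cycle_sky2 [IFACE]: take the interface down, reload the sky2 module,
# and bring the interface back up. Stops at the first failing step.
cycle_sky2() {
    iface=${1:-eth0}
    run ifdown "$iface" &&
    run rmmod sky2 &&
    run modprobe sky2 &&
    run ifup "$iface"
}
```

Running `DRY_RUN=1; cycle_sky2 eth0` prints the four commands without touching the interface, which is a cheap way to check the sequence before running it as root.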
Re: sky2 PHY setup
On Wed, Apr 04, 2007 at 11:19:30AM -0700, Stephen Hemminger wrote:
> On Mon, 26 Mar 2007 21:24:06 -0600
> Rob Sims <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Mar 16, 2007 at 02:16:48PM -0700, Stephen Hemminger wrote:
> > > Use ethtool -S to see if there are any pause frames, etc. See if
> > > frames are still making it into PHY statistics but not being received.
> > >
> > > Use ethtool -d to dump registers. Need current version of ethtool with
> > > decode logic.
> > >
> > > Then look for things like is Ram buffer read/write pointer changing?
> > >
> > > Is GMAC stuck in pause:
> > >
> > > Normal is:
> > > GMAC 1
> > > Status 0x5010 (see GM_GPSR_XXX in sky2.h)
> > > Control 0x1800
> > >
> > > Stuck is
> > > GMAC 1
> > > Status 0x5810 (or 0x5A10)
> >
> > First, here's the described hang in action, on the Core2 Duo on a 1Gb
> > hub:
> > GMAC 1 Status/Control remains at 0x5010/0x1800 until module is removed.
> > Read/write buffer pointers are changing. Full ethtool output in
> > http://www.robsims.com/sky2.netmon.log.gz
> >
> > This machine was also having major throughput problems - 17 kB/s.
> > Rebooting brought it to ~ 20 MB/s. Booting into a kernel with the
> > proprietary sk98lin kernel module showed ~ 80MB/s. Finally, returning
> > to sky2 gave 117 MB/s. Tests run using netcat, dd, /dev/zero, and
> > /dev/null, transmitting from the problem box to an e1000 via a Netgear
> > GS108. No hangs were observed during the "load test."
>
> The vendor driver does not do hardware flow control correctly. It ignores
> Tx pause frames.
> Were you using Jumbo MTU?

No - 1500.

> You might have overrun the hub and it wedged. Try doing:
> ethtool -r eth0
> that forces a down/up

I'm still seeing throughput drop to under 1 Mb/s (50-100kB/s)
periodically. I did an ethtool -S and ethtool -d to capture state
(sky2-fail.log). I ran ethtool -r and retested; no change
(sky2-ethtool-r.log). Finally, ifdown - rmmod - modprobe - ifup, which
restored Gigabit speeds (96+ MB/s) (sky2-modcycle.log).

Please let me know if there's anything else I can poke into. As a
reminder, this is an 88E8053 r20. Next time I see a degradation, I'll
try cycling the switch.
--
Rob

NIC statistics:
     tx_bytes: 84771583
     rx_bytes: 94680579
     tx_broadcast: 3
     rx_broadcast: 1356
     tx_multicast: 655
     rx_multicast: 22
     tx_unicast: 173780
     rx_unicast: 181842
     tx_mac_pause: 0
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 2360
     rx_65_to_127_byte_packets: 82349
     rx_128_to_255_byte_packets: 35420
     rx_256_to_511_byte_packets: 8584
     rx_512_to_1023_byte_packets: 3734
     rx_1024_to_1518_byte_packets: 50773
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 1744
     tx_65_to_127_byte_packets: 102366
     tx_128_to_255_byte_packets: 19607
     tx_256_to_511_byte_packets: 8735
     tx_512_to_1023_byte_packets: 4438
     tx_1024_to_1518_byte_packets: 31033
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0

PCI config
--
00: ab 11 62 43 07 00 10 00 20 00 00 02 04 00 00 00
10: 04 c0 9f fa 00 00 00 00 01 c8 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 42 81
30: 00 00 9c fa 48 00 00 00 00 00 00 00 05 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 13
50: 03 5c fc 80 00 00 00 01 00 00 00 01 05 e0 82 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Control Registers
-
Register Access Port             0x00
LED Control/Status               0xA603164A
Interrupt Source                 0x
Interrupt Mask                   0xC01D
Interrupt Hardware Error Source  0x
Interrupt Hardware Error Mask    0x2E003F3F

Bus Management Unit
---
CSR Receive Queue 1              0x0001
CSR Sync Queue 1                 0x
CSR Async Queue 1                0x

MAC Addresses
---
Addr 1                           00 1A 92 23 52 4D
Addr 2                           00 1A 92 23 52 4D
Addr 3                           00 00 00 00 00 00

Connector type                   0x4A (J)
PMD type                         0x31 (1)
PHY type                         0x80
Chip Id                          0xB6 Yukon-2 EC (rev 0)
Ram Buffer                       0x0C

Status BMU:
---
Control                          0x0002220A
Last Index                       0x07FF
Put Index                        0x025F
List Address                     0x1276
Transmit 1 done index            0x0118
Transmit index threshold         0x000A

Status FIFO
Write Pointer                    0x16
Read Pointer                     0x16
Level                            0x00
Watermark                        0x10
ISR Watermark                    0x10

Status level
Init
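Since the first diagnostic step suggested in this thread is to look for pause frames in the `ethtool -S` output, a filter like the one below can pull just those two counters out of a capture. It is a sketch only: `pause_counters` is a made-up helper name, and it assumes the counter names `tx_mac_pause` and `rx_mac_pause` exactly as they appear in the statistics above.

```shell
#!/bin/sh
# pause_counters (hypothetical helper): read `ethtool -S` output on stdin
# and print the MAC pause counters as name=value, one per line.
pause_counters() {
    awk -F': *' '
        /^[ \t]*(tx|rx)_mac_pause:/ {
            name = $1
            gsub(/^[ \t]+/, "", name)   # drop the leading indentation
            print name "=" $2
        }'
}

# Example (needs ethtool and a real interface, so it is left commented out):
#   ethtool -S eth0 | pause_counters
```

Comparing two captures taken a few seconds apart shows whether the pause counters are moving while throughput is degraded; in the capture above both are 0.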
Re: sky2 PHY setup
On Fri, Mar 16, 2007 at 02:16:48PM -0700, Stephen Hemminger wrote:
> On Fri, 16 Mar 2007 14:36:45 -0600
> Rob Sims <[EMAIL PROTECTED]> wrote:
> > Are there some debug hooks that can be activated? My sky2 stops
> > responding (very light load) about twice a day. The netdev watchdog
> > notices after a while and is able to reactivate the interface:
> > Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
> > Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 done=458
> > Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
> > Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
> > Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
> > Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
>
> Use ethtool -S to see if there are any pause frames, etc. See if frames
> are still making it into PHY statistics but not being received.
>
> Use ethtool -d to dump registers. Need current version of ethtool with
> decode logic.
>
> Then look for things like is Ram buffer read/write pointer changing?
>
> Is GMAC stuck in pause:
>
> Normal is:
> GMAC 1
> Status 0x5010 (see GM_GPSR_XXX in sky2.h)
> Control 0x1800
>
> Stuck is
> GMAC 1
> Status 0x5810 (or 0x5A10)

First, here's the described hang in action, on the Core2 Duo on a 1Gb
hub:
GMAC 1 Status/Control remains at 0x5010/0x1800 until module is removed.
Read/write buffer pointers are changing. Full ethtool output in
http://www.robsims.com/sky2.netmon.log.gz

This machine was also having major throughput problems - 17 kB/s.
Rebooting brought it to ~ 20 MB/s. Booting into a kernel with the
proprietary sk98lin kernel module showed ~ 80MB/s. Finally, returning
to sky2 gave 117 MB/s. Tests run using netcat, dd, /dev/zero, and
/dev/null, transmitting from the problem box to an e1000 via a Netgear
GS108. No hangs were observed during the "load test."

I also had a hang on a Pentium 4 w/sky2, 100Mb/s hub. I neglected to
try removing and re-inserting the module before rebooting.
GMAC 1
Status 0xF004
Control 0x1800
RAM buffer pointers not moving; Read buffer Read pointer != Write pointer.
http://www.robsims.com/sky2.ethtooldumps.tgz

Thanks for looking at this.
--
Rob
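The GMAC status check described above can be expressed as a small lookup against the reference values quoted in the thread. This is a sketch under assumptions: `classify_gmac_status` and its output labels are invented for illustration, and the only values it knows are the three given by Stephen (0x5010, 0x5810, 0x5A10). As the report above shows, a hang was also observed while the status stayed at 0x5010, so a "normal" result does not rule out a problem.

```shell
#!/bin/sh
# classify_gmac_status STATUS (hypothetical helper)
# Compares a "GMAC 1 Status" value from `ethtool -d` against the reference
# values quoted in this thread. Anything else is reported as "unknown".
classify_gmac_status() {
    # Normalise hex digits to upper case; the "x" in the prefix is untouched.
    status=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    case "$status" in
        0x5010)        echo "normal" ;;
        0x5810|0x5A10) echo "stuck-in-pause" ;;
        *)             echo "unknown" ;;
    esac
}
```

For the two machines in this message, `classify_gmac_status 0x5010` reports the Core2 Duo value as normal, while the Pentium 4 value 0xF004 falls into the unknown case because the thread gives no interpretation for it.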
Re: sky2 PHY setup
On Fri, Mar 16, 2007 at 09:59:32AM -0700, Stephen Hemminger wrote:
> On Fri, 16 Mar 2007 01:29:12 +0100
> Thomas Glanzmann <[EMAIL PROTECTED]> wrote:
>
> > Hello Stephen,
> >
> > > yesterday I pulled from Linus tree because I saw the sky2 updated and I
> > > tried to break it but it seems that my problems are gone. I let you know
> > > if anything pops up in the future.
> >
> > bad news. I today tried the sky2 driver which is in Linus Kernel Tree
> > (HEAD) on a machine with very high network load and it stopped working
> > without any kernel messages after doing a flawless job under high load
> > for 5 hours. My watchdog rebooted the machine after 500 seconds. ;-(
> >
> > Thomas
>
> I have run for 2+ days under load without problems. It is hard to
> reproduce or do much about your problem without more info.

Are there some debug hooks that can be activated? My sky2 stops
responding (very light load) about twice a day. The netdev watchdog
notices after a while and is able to reactivate the interface:
Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 done=458
Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

This machine is a Core2 Duo e6700, and the interface is:
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
connected to a 1 Gb hub.

On a Pentium 4, with:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
I have no issues, but that machine sees only a very light network load
on a 100 Mb/s hub.

Each machine has two identical interfaces; only one has a cable in it.
Both machines can be used for testing/debug.
--
Rob
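To put a number on "about twice a day", the watchdog messages can be counted in a saved kernel log. The helper below is a sketch: `count_tx_timeouts` is a made-up name, and the pattern assumes the message text matches the log excerpt above ("NETDEV WATCHDOG: <iface>: transmit timed out").

```shell
#!/bin/sh
# count_tx_timeouts (hypothetical helper): read kernel log lines on stdin
# and print how many netdev watchdog transmit timeouts they contain.
# Note that grep -c exits non-zero when the count is 0.
count_tx_timeouts() {
    grep -c 'NETDEV WATCHDOG: .*: transmit timed out'
}

# Example (log path is an assumption and varies by distribution):
#   count_tx_timeouts < /var/log/kern.log
```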
Re: Change in NFS client behavior
On Fri, Sep 02, 2005 at 12:19:07AM -0400, Trond Myklebust wrote:
> On Fri, 02.09.2005 at 00:15 (-0400), Trond Myklebust wrote:
>
> > Sure. The other problem is that the test is made before the i_sem is
> > grabbed. OK, so how about the appended patch instead?
>
> Doh!
>
> Trond

> VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
>
> POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
> whereas truncate() should only do so if the file size actually changes.
>
> Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
> the VFS truncate() so that it now enforces the POSIX rules.
>
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
>  attr.c      |   14 +++---
>  nfs/inode.c |    5 -
>  open.c      |   25 +++--
>  3 files changed, 26 insertions(+), 18 deletions(-)

This patch does not fix the original issue - timestamps are not updated
as expected.
--
Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Change in NFS client behavior
On Thu, Sep 01, 2005 at 11:43:17PM -0400, Trond Myklebust wrote:
> On Thu, 01.09.2005 at 19:38 (-0400), Trond Myklebust wrote:
> > This is a consequence of 2.6 NFS clients optimising away unnecessary
> > truncate calls. Whereas this is correct behaviour for truncate(), it
> > appears to be incorrect for open(O_TRUNC).
>
> In fact, local filesystems like xfs and ext3 appear to have the opposite
> problem: they change ctime if you call ftruncate(0) on the zero-length
> file, as the attached test shows.

The following patch fixes the problem, at least when applied against
2.6.8:

--- inode.c.orig 2005-08-31 16:54:27.0 -0600
+++ inode.c 2005-08-31 17:06:52.0 -0600
@@ -756,7 +756,7 @@
 	int error;
 	if (attr->ia_valid & ATTR_SIZE) {
-		if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
+		if (!S_ISREG(inode->i_mode))
 			attr->ia_valid &= ~ATTR_SIZE;
 	}

> Could you please check if the following patch fixes NFS (and also the
> local filesystems) for you?

I'll try the latest in the flood today. Thanks for all the help!
--
Rob
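The rule under discussion in this thread (open(O_TRUNC) always updates ctime/mtime, while truncate() does so only when the file size actually changes) can be written down as a tiny decision function. This is only an illustration of the stated rule, not kernel code: `should_bump_times` and the operation labels `open_trunc` and `truncate` are made-up names.

```shell
#!/bin/sh
# should_bump_times OP OLD_SIZE NEW_SIZE (hypothetical helper)
# Prints "yes" or "no" according to the rule described in this thread:
#   open_trunc : timestamps are always updated
#   truncate   : timestamps are updated only if the size changes
should_bump_times() {
    case "$1" in
        open_trunc)
            echo yes ;;
        truncate)
            if [ "$2" -ne "$3" ]; then echo yes; else echo no; fi ;;
        *)
            echo "unknown operation: $1" >&2
            return 2 ;;
    esac
}
```

Under this rule, truncating an already empty file with truncate() leaves the timestamps alone, whereas reopening it with O_TRUNC (what `cat /dev/null > file` does) is expected to update them.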
Change in NFS client behavior
We have noticed when changing from kernel 2.4.23 to 2.6.8 that
timestamps of files are not changed if opened for a write and nothing is
written. When using 2.4.23 timestamps are changed. When using a local
filesystem (reiserfs) with either kernel, timestamps are changed.
Symptoms vary with the client, not the server. See the script below.

When run on a 2.4.23 machine in an NFS mounted directory, output is
"Good." When run on a 2.6.8 or 2.6.12-rc4 machine in an NFS directory,
output is "Error."

Is this a bug? How do we revert to the 2.4/local fs behavior?

Thanks,
Rob

#!/bin/sh
if [ -n "$1" ]; then
    if [ -e "$1" ]; then
        printf "%s exists - please specify a new file name.\n" "$1"
    else
        # Create the file and record its atime, mtime and ctime.
        touch "$1"
        origtime=`stat -c '%X %Y %Z' "$1"`
        sleep 5
        # Truncate the already empty file via open(O_TRUNC).
        cat /dev/null > "$1"
        newtime=`stat -c '%X %Y %Z' "$1"`
        rm "$1"
        printf "%s\n%s\n" "$origtime" "$newtime"
        if [ "$origtime" = "$newtime" ]; then
            printf "Error - timestamps not modified\n"
        else
            printf "Good - timestamps modified\n"
        fi
    fi
else
    printf "Please specify a file name.\n"
fi