Re: sk98lin for 2.6.23-rc1

2007-07-29 Thread Rob Sims
On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:
> On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> > From http://www.krose.org/~krose/computing.html:
> > 
> > Since the sky2 driver continues to suck ass (which is a technical
> > description for "it hangs all the time under load, at least on my
> > hardware" :-) ), I've fixed the sk98lin driver to compile for
> > linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> > still use 2.6.23-rc1, simply by doing the following:
> >...
> > Personally, I'd like to see sk98lin remain in the kernel proper until
> > sky2 goes at least 6 months without reported problems.  The fact that I
> > am not the only one still seeing issues is a clear indication that sky2
> > (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> > sk98lin.
> >...
> 
> This sounds good in theory.
> 
> The practical problem with this approach is that there are always many 
> people who use the old driver when the new driver doesn't work for them 
> instead of reporting their problems with the new driver.
> 
> For these people a new driver will often suck when the old driver gets 
> removed, but after the removal of the old driver they are finally forced 
> to report their bugs resulting in a better new driver for everyone.
> 
> The sky2 driver has been in the kernel for nearly 2 years and Stephen is 
> usually quite good at handling bugs.

The driver still (2.6.20/sky2 1.13) hangs for me, though more rarely than
in the past, and cycling the module generally fixes it.  I have
supplied all the information that Stephen has asked for, but there is
still no resolution.  I am not complaining about the lack of a fix, but
don't assume that all it takes to get sky2 working is adequate bug
reports.  I have been and remain willing to test and help debug, but
after several dropped threads, I feel that the desire or ability to fix
this issue isn't there (and remote debugging of an intermittent hardware
issue IS hard), and I didn't want to be a nuisance to someone who has no
obligation to me to address the issue in the first place.

Stability has improved; it's just not there yet.

I'll switch to 1.16 soon, and respond to Stephen's request on netdev for
current issues.
-- 
Rob


signature.asc
Description: Digital signature


Re: sky2 PHY setup

2007-05-06 Thread Rob Sims
On Sat, Apr 07, 2007 at 11:53:48AM -0600, Rob Sims wrote:
> On Wed, Apr 04, 2007 at 11:19:30AM -0700, Stephen Hemminger wrote:
... 
> > You might have overrun the hub and it wedged.  Try doing:
> > ethtool -r eth0
> > that forces a down/up
> 
> I'm still seeing throughput drop to under 1 Mb/s (50-100 kB/s)
> periodically.  I ran ethtool -S and ethtool -d to capture state
> (sky2-fail.log).  I then ran ethtool -r and retested; no change
> (sky2-ethtool-r.log).  Finally, I did ifdown - rmmod - modprobe - ifup,
> which restored Gigabit speeds (96+ MB/s; sky2-modcycle.log).
> 
> Please let me know if there's anything else I can poke into.  As a
> reminder, this is an 88E8053 rev 20.  Next time I see a degradation, I'll
> try cycling the switch.

I continued to have problems with this interface.  I tried bringing up
the second interface on the motherboard: I added a cable to the same hub
as the first interface and configured an address on the same subnet.
I intended to see if I could use it as a redundant interface for when
the first one hangs.  I haven't been able to test that, because I haven't
seen a hang or slowdown since I brought the second interface up.

Could the idle interface have generated an interrupt which is never
cleared because the interface is down?  Or something similar?
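
One way to probe that guess is to watch whether the interrupt counters
for the interface keep increasing while it is hung.  The following is
only a minimal sketch, not something that was run in this thread: the
helper name irq_total is made up, and it assumes the usual
/proc/interrupts layout (IRQ number, one count column per CPU, then the
controller and device names).  Which label the sky2 interrupt carries
(eth0, eth1, sky2, ...) depends on the kernel, so check the file by hand
first.

```sh
#!/bin/sh
# irq_total NAME: read /proc/interrupts-style text on stdin and print the sum
# of the per-CPU counters on every line that mentions NAME.
# (Hypothetical helper; NAME is whatever label your kernel uses for the IRQ.)
irq_total() {
  awk -v name="$1" '
    index($0, name) {
      # Field 1 is the IRQ number ("16:"); the numeric fields after it are
      # the per-CPU counts.  Stop at the first non-numeric field.
      for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
        total += $i
    }
    END { printf "%.0f\n", total }
  '
}

# Example: sample the counter twice and compare the two totals.
#   irq_total eth0 < /proc/interrupts; sleep 10; irq_total eth0 < /proc/interrupts
```

If the total for the idle port keeps climbing while that port is down,
that would at least be consistent with the guess above; it would not
prove it.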
-- 
Rob


Re: sky2 PHY setup

2007-04-07 Thread Rob Sims
On Wed, Apr 04, 2007 at 11:19:30AM -0700, Stephen Hemminger wrote:
> On Mon, 26 Mar 2007 21:24:06 -0600
> Rob Sims <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, Mar 16, 2007 at 02:16:48PM -0700, Stephen Hemminger wrote:

> > > Use ethtool -S to see if there are any pause frames, etc. See if frames
> > > are still making it into PHY statistics but not being received.
> > > 
> > > Use ethtool -d to dump registers. Need current version of ethtool with 
> > > decode logic.
> > > 
> > > Then look for things like: is the Ram buffer read/write pointer changing?
> > > 
> > > Is GMAC stuck in pause?
> > > 
> > > Normal is:
> > >   GMAC 1
> > >   Status   0x5010  (see GM_GPSR_XXX in sky2.h)
> > >   Control  0x1800
> > > 
> > > Stuck is
> > >   GMAC 1
> > >   Status   0x5810 (or 0x5A10)
> > 
> > First, here's the described hang in action, on the Core2 Duo on a 1Gb
> > hub:
> > GMAC 1 Status/Control remains at 0x5010/0x1800 until module is removed.
> > Read/write buffer pointers are changing.  Full ethtool output in
> > http://www.robsims.com/sky2.netmon.log.gz
> > 
> > This machine was also having major throughput problems - 17 kB/s.
> > Rebooting brought it to ~20 MB/s.  Booting into a kernel with the
> > proprietary sk98lin kernel module showed ~80 MB/s.  Finally, returning
> > to sky2 gave 117 MB/s.  Tests were run using netcat, dd, /dev/zero, and
> > /dev/null, transmitting from the problem box to an e1000 via a Netgear
> > GS108.  No hangs were observed during the "load test."
> 
> The vendor driver does not do hardware flow control correctly. It ignores
> Tx pause frames.
 
> Were you using Jumbo MTU?

No - 1500.

> You might have overrun the hub and it wedged.  Try doing:
>   ethtool -r eth0
> that forces a down/up

I'm still seeing throughput drop to under 1 Mb/s (50-100 kB/s)
periodically.  I ran ethtool -S and ethtool -d to capture state
(sky2-fail.log).  I then ran ethtool -r and retested; no change
(sky2-ethtool-r.log).  Finally, I did ifdown - rmmod - modprobe - ifup,
which restored Gigabit speeds (96+ MB/s; sky2-modcycle.log).

Please let me know if there's anything else I can poke into.  As a
reminder, this is an 88E8053 rev 20.  Next time I see a degradation, I'll
try cycling the switch.
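
For reference, the ifdown - rmmod - modprobe - ifup sequence can be
wrapped in a small function.  This is only a sketch: the names run and
cycle_sky2 and the DRY_RUN switch are made up for illustration,
ifdown/ifup assume a Debian-style ifupdown setup, and eth0 is a
placeholder for the affected interface.

```sh
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# cycle_sky2 [IFACE]: take the interface down, reload the sky2 module and
# bring the interface back up.  Stops at the first step that fails.
cycle_sky2() {
  iface="${1:-eth0}"
  run ifdown "$iface" &&
  run rmmod sky2 &&
  run modprobe sky2 &&
  run ifup "$iface"
}

# Needs root when not in dry-run mode:
#   DRY_RUN=1 cycle_sky2 eth0    # print the four commands only
```

Note that rmmod takes down every port the driver handles, so on a board
with two sky2 ports both links drop during the cycle.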
-- 
Rob
NIC statistics:
 tx_bytes: 84771583
 rx_bytes: 94680579
 tx_broadcast: 3
 rx_broadcast: 1356
 tx_multicast: 655
 rx_multicast: 22
 tx_unicast: 173780
 rx_unicast: 181842
 tx_mac_pause: 0
 rx_mac_pause: 0
 collisions: 0
 late_collision: 0
 aborted: 0
 single_collisions: 0
 multi_collisions: 0
 rx_short: 0
 rx_runt: 0
 rx_64_byte_packets: 2360
 rx_65_to_127_byte_packets: 82349
 rx_128_to_255_byte_packets: 35420
 rx_256_to_511_byte_packets: 8584
 rx_512_to_1023_byte_packets: 3734
 rx_1024_to_1518_byte_packets: 50773
 rx_1518_to_max_byte_packets: 0
 rx_too_long: 0
 rx_fifo_overflow: 0
 rx_jabber: 0
 rx_fcs_error: 0
 tx_64_byte_packets: 1744
 tx_65_to_127_byte_packets: 102366
 tx_128_to_255_byte_packets: 19607
 tx_256_to_511_byte_packets: 8735
 tx_512_to_1023_byte_packets: 4438
 tx_1024_to_1518_byte_packets: 31033
 tx_1519_to_max_byte_packets: 0
 tx_fifo_underrun: 0
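
A counter dump like the one above is most useful when compared against a
second dump taken a little later, for example to see whether
rx_mac_pause or the rx packet counters still move during a slowdown.  A
minimal sketch, assuming two saved `ethtool -S` outputs in the
"name: value" layout shown above; the function name stats_changed and
the file names in the example are hypothetical.

```sh
#!/bin/sh
# stats_changed BEFORE AFTER: compare two saved `ethtool -S` dumps and print
# every counter whose value differs, as "name: old -> new".
stats_changed() {
  awk -F': *' '
    # First file: remember each counter value by name.
    NR == FNR {
      if (NF == 2) { gsub(/^ +/, "", $1); before[$1] = $2 }
      next
    }
    # Second file: report counters that exist in both dumps and changed.
    NF == 2 {
      gsub(/^ +/, "", $1)
      if (($1 in before) && $2 != before[$1])
        printf "%s: %s -> %s\n", $1, before[$1], $2
    }
  ' "$1" "$2"
}

# Example:
#   ethtool -S eth0 > stats.1; sleep 10; ethtool -S eth0 > stats.2
#   stats_changed stats.1 stats.2
```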

PCI config
--
00: ab 11 62 43 07 00 10 00 20 00 00 02 04 00 00 00
10: 04 c0 9f fa 00 00 00 00 01 c8 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 42 81
30: 00 00 9c fa 48 00 00 00 00 00 00 00 05 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 13
50: 03 5c fc 80 00 00 00 01 00 00 00 01 05 e0 82 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Control Registers
-
Register Access Port 0x00
LED Control/Status   0xA603164A
Interrupt Source 0x
Interrupt Mask   0xC01D
Interrupt Hardware Error Source  0x
Interrupt Hardware Error Mask    0x2E003F3F

Bus Management Unit
---
CSR Receive Queue 1  0x0001
CSR Sync Queue 1 0x
CSR Async Queue 1 0x

MAC Addresses
---
Addr 1  00 1A 92 23 52 4D
Addr 2  00 1A 92 23 52 4D
Addr 3  00 00 00 00 00 00

Connector type   0x4A (J)
PMD type 0x31 (1)
PHY type 0x80
Chip Id  0xB6 Yukon-2 EC (rev 0)
Ram Buffer   0x0C

Status BMU:
---
Control0x0002220A
Last Index 0x07FF
Put Index  0x025F
List Address   0x1276
Transmit 1 done index  0x0118
Transmit index threshold   0x000A

Status FIFO
Write Pointer0x16
Read Pointer 0x16
Level0x00
Watermark0x10
ISR Watermark0x10
Status level
Init

Re: sky2 PHY setup

2007-03-28 Thread Rob Sims
On Fri, Mar 16, 2007 at 02:16:48PM -0700, Stephen Hemminger wrote:
> On Fri, 16 Mar 2007 14:36:45 -0600
> Rob Sims <[EMAIL PROTECTED]> wrote:

> > Are there some debug hooks that can be activated?  My sky2 stops
> > responding (very light load) about twice a day.  The netdev watchdog
> > notices after a while and is able to reactivate the interface:

> > Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
> > Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 
> > done=458
> > Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
> > Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
> > Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
> > Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full 
> > duplex, flow control both
> 
> Use ethtool -S to see if there are any pause frames, etc. See if frames are
> still making it into PHY statistics but not being received.
> 
> Use ethtool -d to dump registers. Need current version of ethtool with decode 
> logic.
> 
> Then look for things like: is the Ram buffer read/write pointer changing?
> 
> Is GMAC stuck in pause?
> 
> Normal is:
>   GMAC 1
>   Status   0x5010  (see GM_GPSR_XXX in sky2.h)
>   Control  0x1800
> 
> Stuck is
>   GMAC 1
>   Status   0x5810 (or 0x5A10)
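
Both "stuck" values differ from the normal 0x5010 by having bit 0x0800
set (0x5A10 additionally has 0x0200 set).  A small sketch of that check
follows.  It assumes 0x0800 is the pause indication among the
GM_GPSR_XXX bits in sky2.h; that mapping is inferred from the values
quoted here and should be confirmed against the header.  The function
name gmac_pause_state is made up.

```sh
#!/bin/sh
# gmac_pause_state STATUS: classify a GMAC status word as printed by
# `ethtool -d` (for example 0x5010).
# Assumption: bit 0x0800 is the pause indication; check sky2.h.
gmac_pause_state() {
  if [ $(( $1 & 0x0800 )) -ne 0 ]; then
    echo "stuck in pause"
  else
    echo "not in pause"
  fi
}

# Example:
#   gmac_pause_state 0x5810
```

By this check, a status of 0x5010 during a hang would mean the GMAC is
not sitting in pause, so the cause would have to be looked for
elsewhere.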

First, here's the described hang in action, on the Core2 Duo on a 1Gb
hub:
GMAC 1 Status/Control remains at 0x5010/0x1800 until module is removed.
Read/write buffer pointers are changing.  Full ethtool output in
http://www.robsims.com/sky2.netmon.log.gz

This machine was also having major throughput problems - 17 kB/s.
Rebooting brought it to ~20 MB/s.  Booting into a kernel with the
proprietary sk98lin kernel module showed ~80 MB/s.  Finally, returning
to sky2 gave 117 MB/s.  Tests were run using netcat, dd, /dev/zero, and
/dev/null, transmitting from the problem box to an e1000 via a Netgear
GS108.  No hangs were observed during the "load test."
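
The numbers above came from a plain dd-through-netcat transfer.  A rough
sketch of such a test follows; it is not the exact command line used
here.  The host name, port, and transfer size are placeholders, netcat's
listen flags differ between implementations, and the helper name
mb_per_s is made up.  MB is taken as 10^6 bytes.

```sh
#!/bin/sh
# Receiver (on the e1000 box), discarding everything it gets:
#   nc -l -p 5001 > /dev/null        # some netcat variants want: nc -l 5001
# Sender (on the sky2 box), timing a transfer of zeros:
#   time dd if=/dev/zero bs=1M count=1000 | nc receiver 5001

# mb_per_s BYTES SECONDS: convert a transfer size and duration to MB/s,
# with 1 MB = 1000000 bytes.
mb_per_s() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Example: 1000 MiB sent in 9 seconds
#   mb_per_s 1048576000 9
```

For scale, 1000 Mb/s is 125 MB/s before protocol overhead, so 117 MB/s
is close to what a Gigabit link can carry.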

I also had a hang on a Pentium 4 w/sky2, on a 100 Mb/s hub.  I neglected
to try removing and re-inserting the module before rebooting.
GMAC 1
Status   0xF004
Control  0x1800

RAM buffer pointers are not moving; read buffer read pointer != write pointer.
http://www.robsims.com/sky2.ethtooldumps.tgz

Thanks for looking at this.
-- 
Rob


Re: sky2 PHY setup

2007-03-16 Thread Rob Sims
On Fri, Mar 16, 2007 at 09:59:32AM -0700, Stephen Hemminger wrote:
> On Fri, 16 Mar 2007 01:29:12 +0100
> Thomas Glanzmann <[EMAIL PROTECTED]> wrote:
> 
> > Hello Stephen,
> > 
> > > Yesterday I pulled from Linus' tree because I saw that sky2 was
> > > updated, and I tried to break it, but it seems that my problems are
> > > gone. I will let you know if anything pops up in the future.
> > 
> > Bad news. Today I tried the sky2 driver which is in Linus' kernel tree
> > (HEAD) on a machine with very high network load, and it stopped working
> > without any kernel messages after doing a flawless job under high load
> > for 5 hours. My watchdog rebooted the machine after 500 seconds. ;-(
> > 
> > Thomas
> 
> I have run for 2+ days under load without problems. It is hard to
> reproduce or do much about your problem without more info.

Are there some debug hooks that can be activated?  My sky2 stops
responding (very light load) about twice a day.  The netdev watchdog
notices after a while and is able to reactivate the interface:

Mar 15 13:28:12 btd kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 15 13:28:12 btd kernel: sky2 eth0: tx timeout
Mar 15 13:28:12 btd kernel: sky2 eth0: transmit ring 458 .. 435 report=458 
done=458
Mar 15 13:28:12 btd kernel: sky2 eth0: disabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: enabling interface
Mar 15 13:28:12 btd kernel: sky2 eth0: ram buffer 48K
Mar 15 13:28:15 btd kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, 
flow control both
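
To put a number on "about twice a day", the tx timeout messages can be
counted per day from the kernel log.  A minimal sketch, assuming syslog
lines in the format shown above; the function name tx_timeouts_per_day
and the log path in the example are assumptions.

```sh
#!/bin/sh
# tx_timeouts_per_day: read syslog text on stdin and print one line per day
# ("Mon DD COUNT") with the number of "sky2 <iface>: tx timeout" messages.
# Output is sorted as text, which is not necessarily chronological.
tx_timeouts_per_day() {
  awk '/sky2 [^ ]+: tx timeout/ { n[$1 " " $2]++ }
       END { for (d in n) print d, n[d] }' | sort
}

# Example (log location varies by distribution):
#   tx_timeouts_per_day < /var/log/kern.log
```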

This machine is a Core2 Duo e6700, and the interface is:
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E 
Gigabit Ethernet Controller (rev 20)
connected to a 1 Gb hub.

On a Pentium 4, with:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E 
Gigabit Ethernet Controller (rev 15)
I have no issues, but only with a very light network load on a 100 Mb/s hub.

Each machine has two identical interfaces; only one has a cable in it.

Both machines can be used for testing/debug.
-- 
Rob


Re: Change in NFS client behavior

2005-09-07 Thread Rob Sims
On Fri, Sep 02, 2005 at 12:19:07AM -0400, Trond Myklebust wrote:
> On Fri, 02.09.2005 at 00:15 (-0400), Trond Myklebust wrote:
> 
> > Sure. The other problem is that the test is made before the i_sem is
> > grabbed. OK, so how about the appended patch instead?
> 
> Doh!
> 
> Trond

> VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
> 
>  POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
>  whereas truncate() should only do so if the file size actually changes.
> 
>  Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
>  the VFS truncate() so that it now enforces the POSIX rules.
> 
>  Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
>  attr.c  |   14 +++---
>  nfs/inode.c |5 -
>  open.c  |   25 +++--
>  3 files changed, 26 insertions(+), 18 deletions(-)

This patch does not fix the original issue - timestamps are not updated
as expected.
-- 
Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Change in NFS client behavior

2005-09-02 Thread Rob Sims
On Thu, Sep 01, 2005 at 11:43:17PM -0400, Trond Myklebust wrote:
> On Thu, 01.09.2005 at 19:38 (-0400), Trond Myklebust wrote:
> > This is a consequence of 2.6 NFS clients optimising away unnecessary
> > truncate calls. Whereas this is correct behaviour for truncate(), it
> > appears to be incorrect for open(O_TRUNC).

> > In fact, local filesystems like xfs and ext3 appear to have the opposite
> > problem: they change ctime if you call ftruncate(0) on the zero-length
> > file, as the attached test shows.
 
The following patch fixes the problem, at least when applied against
2.6.8:

--- inode.c.orig        2005-08-31 16:54:27.0 -0600
+++ inode.c     2005-08-31 17:06:52.0 -0600
@@ -756,7 +756,7 @@
        int error;
 
        if (attr->ia_valid & ATTR_SIZE) {
-               if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
+               if (!S_ISREG(inode->i_mode))
                        attr->ia_valid &= ~ATTR_SIZE;
        }

> Could you please check if the following patch fixes NFS (and also the
> local filesystems) for you?
 
I'll try the latest in the flood today.  Thanks for all the help!
-- 
Rob


Change in NFS client behavior

2005-08-31 Thread Rob Sims
After moving from kernel 2.4.23 to 2.6.8, we noticed that the timestamps
of a file are not changed if it is opened for writing and nothing is
written.  With 2.4.23, the timestamps are changed.  With a local
filesystem (reiserfs), the timestamps are changed under either kernel.
The symptoms vary with the client, not the server.  See the script below.

When run on a 2.4.23 machine in an NFS mounted directory, output is
"Good."  When run on a 2.6.8 or 2.6.12-rc4 machine in an NFS directory,
output is "Error."

Is this a bug?  How do we revert to the 2.4/local fs behavior?  

Thanks,
Rob

#!/bin/sh

if [ -n "$1" ]; then
  if [ -e "$1" ]; then
    printf "%s exists - please specify a new file name.\n" "$1"
  else
    # Create the file and record its atime, mtime, and ctime.
    touch "$1"
    origtime=`stat -c '%X %Y %Z' "$1"`
    sleep 5
    # Open the (already empty) file for writing without writing any data.
    cat /dev/null > "$1"
    newtime=`stat -c '%X %Y %Z' "$1"`
    rm "$1"

    printf "%s\n%s\n" "$origtime" "$newtime"
    if [ "$origtime" = "$newtime" ]; then
      printf "Error - timestamps not modified\n"
    else
      printf "Good - timestamps modified\n"
    fi
  fi
else
  printf "Please specify a file name.\n"
fi