Launchpad has imported 23 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=8962.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2007-08-30T11:22:39+00:00 gbailey wrote:

Most recent kernel where this bug did not occur:  Unknown

Distribution:  CentOS 4.5

Hardware Environment:  Intel server board SE7320VP21

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller (rev 18)
        Subsystem: Intel Corporation: Unknown device 3466
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at deefc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at b800 [size=256]
        Expansion ROM at deec0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0

Software Environment:  CentOS 4.5 install + "vanilla" kernel 2.6.23-rc4

Problem Description:

Discovered while attempting to troubleshoot:
https://bugzilla.redhat.com/show_bug.cgi?id=228733

I'm trying to understand the "tx timeout" messages, and how to reproduce
them.  In my test environment, I have 2 servers, each of which has a
sky2 Marvell NIC connected to a switch as "eth0".

On server "B", I type "nc -l -p 3409 > /dev/null"

On server "A", I type "nc server-B 3409 < /dev/zero"

I see lots of traffic from A->B, as would be expected.  If I shut down
eth0 on server "B" using "ifdown eth0", wait a few seconds, and then
re-enable eth0 on server "B" using "ifup eth0", I see the following in
"dmesg" output on server B:

sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
ip_tables: (C) 2000-2006 Netfilter Core Team

As expected...  The problem is that server B can occasionally end up in
a state where it is unable to ping or access the local subnet anymore.
Both "mii-tool" and "ethtool eth0" show a link present.

If I perform "ifdown eth0; ifup eth0" on server B, it doesn't help anything. 
If I unload the sky2 module, then things clear up and I'm back on the network 
again.

I'm curious about this testcase because the symptom seems to match the
earlier "tx timeout" messages; the driver tried to re-enable itself
after a timeout, but it's still not able to see any traffic.

Steps to reproduce:

See "Problem Description" above.  While traffic is continuously being
transmitted from server "A" to server "B", shut down the network
interface on server "B", and then start the interface on server "B".
Monitoring RX traffic on server "B" will indicate when it is no longer
receiving the bytes sent from server "A".
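
The RX monitoring step can be sketched in Python. This is a minimal sketch and not part of the original report: it assumes the usual /proc/net/dev layout (interface name, a colon, then the receive byte counter as the first field), and the helper names rx_bytes and rx_stalled are hypothetical.

```python
def rx_bytes(proc_net_dev_text, iface):
    """Return the receive byte counter for iface from /proc/net/dev content."""
    for line in proc_net_dev_text.splitlines():
        # Header lines contain no colon; data lines look like "eth0: 123 ...".
        name, sep, counters = line.partition(":")
        if sep and name.strip() == iface:
            return int(counters.split()[0])
    raise KeyError(iface)


def rx_stalled(before, after, iface):
    """True if the RX byte counter did not move between two snapshots."""
    return rx_bytes(after, iface) == rx_bytes(before, iface)
```

On server "B", two snapshots of /proc/net/dev taken a few seconds apart while server "A" is still sending can be passed to rx_stalled(before, after, "eth0"); a True result means no bytes arrived in that interval.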

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/4

------------------------------------------------------------------------
On 2007-08-30T13:40:30+00:00 stephen wrote:

CentOS has an older version of the driver; please update to the latest
version from 2.6.22.6 or 2.6.23-rc4.  There were several bugs that caused
tx timeouts (hung chip), and a problem that led to PHY clock issues.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/5

------------------------------------------------------------------------
On 2007-08-30T14:42:09+00:00 gbailey wrote:

The kernel version I encountered this on is 2.6.23-rc4, as marked in the
bug report; that is why I wrote "CentOS 4.5 install + "vanilla" kernel
2.6.23-rc4" under "Software Environment".

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/6

------------------------------------------------------------------------
On 2007-09-05T06:35:35+00:00 stephen wrote:

Please enable the sky2 debugfs kernel configuration option.
Mount debugfs somewhere (e.g. /debug).
Hang the system, then capture the sky2 state (cat /debug/sky2/eth0 >savefile).
It will show the status of the IRQ and of receive/transmit.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/8

------------------------------------------------------------------------
On 2007-09-07T13:59:57+00:00 gbailey wrote:

Rebuilt 2.6.23-rc5 with SKY2_DEBUG.  I've reproduced the issue where
ifdown/ifup does not reset the interface properly.

# cat /debug/sky2/eth0
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=24...24 report=24 done=24

Rx ring hw get=169 put=169 last=1023
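
For comparing several such dumps, the key=value pairs can be pulled out with a few lines of Python. This is only a sketch: the format is assumed from the single sample above, the function names parse_sky2_dump and rings_look_idle are hypothetical, and "idle" here just means the ring indices are equal; it says nothing about why the chip stopped receiving.

```python
import re

SAMPLE = """\
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=24...24 report=24 done=24

Rx ring hw get=169 put=169 last=1023
"""


def parse_sky2_dump(text):
    """Return {section: {key: value}} for every line carrying key=value pairs."""
    sections = {}
    for line in text.splitlines():
        pairs = re.findall(r"(\w+)=(\S+)", line)
        if not pairs:
            continue  # e.g. "Status ring (empty)" or blank lines
        # Everything before the first key=value pair names the section.
        name = line[: line.index(pairs[0][0] + "=")].strip()
        sections[name] = dict(pairs)
    return sections


def rings_look_idle(sections):
    """True if the Tx and Rx indices in the dump are all caught up (assumed meaning)."""
    tx = sections["Tx ring"]
    rx = sections["Rx ring hw"]
    start, end = tx["pending"].split("...")
    return start == end and tx["report"] == tx["done"] and rx["get"] == rx["put"]
```

Running rings_look_idle(parse_sky2_dump(SAMPLE)) on the dump above reports that nothing is pending, which matches the "Status ring (empty)" line.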

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/9

------------------------------------------------------------------------
On 2007-09-10T13:49:12+00:00 tony wrote:

I can confirm that we can reproduce this issue (or one nearly identical
to it). We are using the current stable 2.6.22.6 kernel on a system with
a Marvell 88E8055 (Panasonic Toughbook CF-74).

To reproduce it, we can open any kind of persistent socket connection
(such as an Apache SSL connection using a browser) and then yank the
cable. We wait a bit and pop the cable back in and the driver is dead.
We can't ping in or out until we down the interface, remove and reinsert
the sky2 driver and bring the interface back up.

I will be happy to provide any info or test any patches you provide.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/10

------------------------------------------------------------------------
On 2007-09-11T17:04:30+00:00 xt.knight wrote:

I'm having trouble here too.  Using an Ubuntu Gutsy kernel:

Linux andy-desktop 2.6.22-10-generic #1 SMP Wed Aug 22 07:42:05 GMT 2007
x86_64 GNU/Linux

I'm not getting tx timeouts AFAIK.  I'm not getting any driver crash
dumps either.  I'm just having connection issues.  I'm not transferring
anything big.  I will be browsing the web, then all of a sudden the
interface will get in some type of corrupted state where nothing works.
Sometimes ifdown/ifup will do it, sometimes it will not.  Sometimes
dhclient works, sometimes not.  Unloading sky2 and reloading it *always*
fixes the problem, indicating some type of issue with the "current
state" of the driver.  Maybe a variable not getting cleared/etc but I
can only guess.

Sometimes ifdown/ifup will work and then it will only work for about a
minute.  Redoing ifdown/ifup will make sky2 work for another few hours
(it's like refilling your gas tank, just on a smaller level ;)).

Sometimes I will get Destination Host Unreachable from pinging my
router, sometimes ping says nothing at all.

I tried with the modprobe sky2 debug=16 option but the log output looks
not much different from when the adapter is working.  And, I haven't
caught it just when it stopped working, yet.  I have only turned on my
monitor to notice that my net wasn't working and then dumped a few logs
of it.  In any case, I don't think they're helpful but if you need them
I will gladly post them.

Most importantly, this is a regression from 2.6.20.  I hope this can get
fixed and if so I'll notify those at Ubuntu and get this into the kernel
and hopefully an exception for it if necessary.

Ubuntu bug link: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/138611

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/11

------------------------------------------------------------------------
On 2007-09-16T11:14:38+00:00 xt.knight wrote:

I fixed my problems by using 2.6.23-rc6.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/12

------------------------------------------------------------------------
On 2007-09-17T11:49:19+00:00 nhorman wrote:

Interesting, the only thing that went in between rc5 and rc6 was the
restore multicast list on resume, which while potentially applicable,
doesn't sound like it addresses the whole of the problem.  Does rc5 fix
the problem for you as well?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/13

------------------------------------------------------------------------
On 2007-09-26T15:26:02+00:00 xt.knight wrote:

Sorry for the misunderstanding.

I fixed my problems by upgrading from the Ubuntu Linux 2.6.22-11 kernel
to the vanilla 2.6.23-rc6 kernel.  I hadn't even tried any other 2.6.23
yet.  I'm thinking the Ubuntu kernel has a problem due to mismatched or
partially backported patches, at least in my case.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/14

------------------------------------------------------------------------
On 2007-09-30T16:55:57+00:00 xt.knight wrote:

Created attachment 13006
debugfs sky2

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/15

------------------------------------------------------------------------
On 2007-09-30T16:56:11+00:00 xt.knight wrote:

Created attachment 13007
debugfs sky2 (when it did work)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/16

------------------------------------------------------------------------
On 2007-09-30T16:56:43+00:00 xt.knight wrote:

I am still having issues with 2.6.23-rc6 and rc8, but it took a while for
them to begin happening again.  I attached two debugfs logs of sky2.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/17

------------------------------------------------------------------------
On 2007-10-27T10:20:54+00:00 rf wrote:

I'm running SuSe 10.3, and with an updated kernel (2.6.23.1-164-default)
the problem remains.
The interface is listed as "sky2 0000:02:00.0: v1.18 addr 0xd5020000 irq 17 Yukon-EC (0xb6) rev 1"
I only run 100 Mbit to a switch.  I use the machine as a media server, and
unfortunately, after a few hours of reasonably heavy use streaming media,
the interface dies; then 3-4 hours later, the machine crashes.
If I get to the machine before it dies, I can restart the interface, but
as others report, it lasts for a shorter time.
When restarting it, "ifstatus" reports it as up in the failed mode; doing
an "ifdown" and "ifup" restarts it.
ifup reports: "device: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)"
I see nothing in dmesg when the interface dies.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/18

------------------------------------------------------------------------
On 2007-10-29T08:15:48+00:00 stephen wrote:

There is a problem on Yukon-EC that causes the receive fifo to hang.
There is workaround code in 2.6.23 that is supposed to detect and fix it.

The problem also only occurs if there is no flow control. The sky2 driver
autonegotiates to enable flow control, but some hardware doesn't support
flow control or has it disabled.
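
To check which flow control mode was actually negotiated, the driver's "Link is up" message can be parsed. A minimal Python sketch, assuming the message format matches the dmesg lines quoted in this report; parse_link_up is a hypothetical helper name.

```python
import re

# Pattern modelled on the "Link is up" lines quoted in this report.
LINK_UP = re.compile(
    r"sky2 (?P<iface>\S+): Link is up at (?P<speed>\d+) Mbps, "
    r"(?P<duplex>\w+) duplex, flow\s+control (?P<flow>\w+)"
)


def parse_link_up(line):
    """Return iface, speed (int, Mbps), duplex and flow control, or None if no match."""
    match = LINK_UP.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["speed"] = int(info["speed"])
    return info
```

Feeding each dmesg line through parse_link_up and looking at the "flow" field shows whether the link came up with flow control in one direction, both, or not at all.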

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/19

------------------------------------------------------------------------
On 2007-10-29T19:49:16+00:00 rf wrote:

Thanks.
Unfortunately the log reports:
kernel: [  982.916325] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
So I'm not sure it's limited to the case when flow control is off.
I noticed some threads earlier this year where you tried flow control off. Is 
that worth trying again with latest release, if so how?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/20

------------------------------------------------------------------------
On 2007-10-30T11:50:32+00:00 rf wrote:

I can repeat the failure by trying to copy about 20G of files over a
Samba connection from a Windows box.  I can never get past 5G before it
fails.  So perhaps I can do some debugging for you?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/21

------------------------------------------------------------------------
On 2008-01-07T22:27:32+00:00 stephen wrote:

Is this the same bug as the original report, or is the bug becoming a
tar baby for all the possible "my sky2 has hung" reports?

The original report said problem was reproducible after up/down. Not one
of the "my box hangs under load" problems.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/22

------------------------------------------------------------------------
On 2008-01-09T16:45:47+00:00 rf wrote:

Sorry, no.  To avoid raising another bug on sky2, I used this one as the
nearest I could find.
Sky2 hangs under load; that's the problem.  Very repeatable.
I've now compiled and switched to the Marvell driver sk98lin, and that
gives me no problems...

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/23

------------------------------------------------------------------------
On 2008-04-24T15:28:20+00:00 andree182 wrote:

Tried to find the bug source, but couldn't ;-( I used ubuntu 2.6.24
sources, placed the 2.6.22 (ubuntu) sky2.[ch] (ver. 1.18) files into the
tree and applied the

[NET]: Make NAPI polling independent of struct net_device objects.
+
[NET]: Nuke SET_MODULE_OWNER macro.

patches (from git). Then I built the module, did a rmmod/modprobe, but
nothing changed - the sky2 still fails with "sky2 eth0: rx error ..." in
the dmesg.

Thus I guess the error could be somewhere else (maybe the napi polling
isn't working quite right?), or maybe... I guess I'm gonna try to really
find the bug...

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/24

------------------------------------------------------------------------
On 2008-04-30T14:34:41+00:00 ryan.roth wrote:

I have consistently had the same issue reported above, where the kernel
reports the following and the interface does not work.  It seems to work
fine the first time you bring up the interface, but if you do an
ifdown/ifup you get the following message, but no connection.

"sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both"

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/25

------------------------------------------------------------------------
On 2009-03-23T11:34:11+00:00 alan wrote:

Closing out old bugs

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/29

------------------------------------------------------------------------
On 2021-10-15T17:59:37+00:00 ucelsanicin wrote:

------8<-------
 1 size_t fwrite(const void * __restrict ptr, size_t size,
 2               size_t nmemb, register FILE * __restrict stream)
 3 {
 4     size_t retval;
 5     __STDIO_AUTO_THREADLOCK_VAR;
 6
 7 >   __STDIO_AUTO_THREADLOCK(stream);
 8
 9     retval = fwrite_unlocked(ptr, size, nmemb, stream);
10
11     __STDIO_AUTO_THREADUNLOCK(stream);
12
13     return retval;
14 }
------>8-------

Here, we are at line 7. Using the "next" command leads nowhere. However,
setting a breakpoint on line 9 and issuing "continue" works.

Looking at the assembly instructions reveals that we're dealing with the
critical section entry code [1] that should never be interrupted, in this
case by the debugger's implicit breakpoints:

------8<-------
  ...
1 add_s   r0,r13,0x38
2 mov_s   r3,1
3 llock   r2,[r0]        <-.
4 brne.nt r2,0,14     --.  |
5 scond   r3,[r0]       |  |
6 bne     -10         --|--'
7 brne_s  r2,0,84     <-'
  ...
------>8-------

Lines 3 through 5 (inclusive) are supposed to be executed atomically. Therefore,
GDB should never (implicitly) insert a breakpoint on lines 4 and 5, else the
program will try to acquire the lock again by jumping back to line 3 and
get stuck in an infinite loop.

The solution is to make GDB aware of these patterns so it inserts breakpoints
after the sequence -- line 6 in this example.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114019/comments/30


** Bug watch added: Red Hat Bugzilla #228733
   https://bugzilla.redhat.com/show_bug.cgi?id=228733

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/114019

Title:
  sky2 driver "tx timeout" with large uploads

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/114019/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
