via-rhine driver stalls with: PHY status 786d, resetting...

2007-11-03 Thread Martin J. Bligh

Linux 2.6.23

http://bugzilla.kernel.org/show_bug.cgi?id=9300

Under any sort of traffic load (a recursive scp of a bunch of MP3s from
another box, for instance), I get continuous stalls. It recovers every
time, but is dog slow.

NETDEV WATCHDOG: eth2: transmit timed out
eth2: Transmit timed out, status , PHY status 786d, resetting...
eth2: link up, 100Mbps, full-duplex, lpa 0xCDE1

driver is via-rhine.

A Google search indicates this has been a problem since at least 2.4.19
and 2002... can we not fix this somehow? I have an e1000 card in this
box too, but that has similar issues ;-(
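The two register values in the timeout message can be decoded by hand. A minimal Python sketch, assuming that the `PHY status` via-rhine prints is the MII basic mode status register (BMSR, register 1) and that `lpa` is the MII link partner ability register (register 5); bit names follow the IEEE 802.3 clause 22 definitions, and only the bits listed in the tables are reported.

```python
# Decode MII register values from the via-rhine timeout message.
# Assumption: "PHY status" is BMSR (MII register 1), "lpa" is LPA (MII register 5).

BMSR_BITS = {
    0x0001: "extended capabilities",
    0x0002: "jabber detected",
    0x0004: "link up",
    0x0008: "autoneg capable",
    0x0010: "remote fault",
    0x0020: "autoneg complete",
    0x0800: "10baseT half",
    0x1000: "10baseT full",
    0x2000: "100baseTX half",
    0x4000: "100baseTX full",
    0x8000: "100baseT4",
}

LPA_BITS = {
    0x0020: "10baseT half",
    0x0040: "10baseT full",
    0x0080: "100baseTX half",
    0x0100: "100baseTX full",
    0x0200: "100baseT4",
    0x0400: "pause",
    0x0800: "asymmetric pause",
    0x2000: "remote fault",
    0x4000: "link partner ACK",
    0x8000: "next page",
}


def decode(value: int, table: dict) -> list:
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


if __name__ == "__main__":
    print("BMSR 0x786d:", ", ".join(decode(0x786D, BMSR_BITS)))
    print("LPA  0xcde1:", ", ".join(decode(0xCDE1, LPA_BITS)))
    # The low five bits of LPA are the selector field; 1 means IEEE 802.3.
    print("LPA selector:", 0xCDE1 & 0x001F)
```

Under those assumptions, 0x786d reports link up with autonegotiation complete, and 0xcde1 shows a partner advertising 100baseTX full duplex with pause, which agrees with the `link up, 100Mbps, full-duplex` line. That suggests the link itself is healthy, though it says nothing about why the transmitter stalled.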
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Strange errors from e1000 driver (2.6.18)

2006-10-22 Thread Martin J. Bligh

I'm getting a lot of these type of errors if I run 2.6.18. If
I run the standard Ubuntu Dapper kernel, I don't get them.
What do they indicate?

Oct 21 18:48:28 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:28 localhost kernel:   time_stamp   7b79d33
Oct 21 18:48:28 localhost kernel:   next_to_watch  3d
Oct 21 18:48:28 localhost kernel:   jiffies  7b7a0c1
Oct 21 18:48:28 localhost kernel:   next_to_watch.status 0
Oct 21 18:48:30 localhost kernel:   Tx Queue 0
Oct 21 18:48:30 localhost kernel:   TDH  3d
Oct 21 18:48:30 localhost kernel:   TDT  44
Oct 21 18:48:30 localhost kernel:   next_to_use  44
Oct 21 18:48:30 localhost kernel:   next_to_clean  39
Oct 21 18:48:30 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:30 localhost kernel:   time_stamp   7b79d33
Oct 21 18:48:30 localhost kernel:   next_to_watch  3d
Oct 21 18:48:30 localhost kernel:   jiffies  7b7a2b5
Oct 21 18:48:30 localhost kernel:   next_to_watch.status 0
Oct 21 18:48:32 localhost kernel:   Tx Queue 0
Oct 21 18:48:32 localhost kernel:   TDH  3d
Oct 21 18:48:32 localhost kernel:   TDT  44
Oct 21 18:48:32 localhost kernel:   next_to_use  44
Oct 21 18:48:32 localhost kernel:   next_to_clean  39
Oct 21 18:48:32 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:32 localhost kernel:   time_stamp   7b79d33
Oct 21 18:48:32 localhost kernel:   next_to_watch  3d
Oct 21 18:48:32 localhost kernel:   jiffies  7b7a4a9
Oct 21 18:48:32 localhost kernel:   next_to_watch.status 0
Oct 21 18:48:34 localhost kernel:   Tx Queue 0
Oct 21 18:48:34 localhost kernel:   TDH  3d
Oct 21 18:48:34 localhost kernel:   TDT  44
Oct 21 18:48:34 localhost kernel:   next_to_use  44
Oct 21 18:48:34 localhost kernel:   next_to_clean  39
Oct 21 18:48:34 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:34 localhost kernel:   time_stamp   7b79d33
Oct 21 18:48:34 localhost kernel:   next_to_watch  3d
Oct 21 18:48:34 localhost kernel:   jiffies  7b7a69d
Oct 21 18:48:34 localhost kernel:   next_to_watch.status 0
Oct 21 18:48:35 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 21 18:48:36 localhost kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
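The hex fields in these messages are easier to read as differences. A small Python sketch that takes the `time_stamp` and `jiffies` values copied from the log above and reports how long the buffer at `next_to_clean` has been pending. HZ = 250 is an assumption (it fits the 500-tick step between reports logged two seconds apart); the log itself does not print it.

```python
# Hypothetical helper: how long has the buffer at next_to_clean been pending?
# Values are copied from the log above; HZ = 250 is an assumption.
HZ = 250

time_stamp = 0x7B79D33
jiffies_samples = [0x7B7A0C1, 0x7B7A2B5, 0x7B7A4A9, 0x7B7A69D]


def pending_ticks(now: int, stamp: int) -> int:
    """Ticks elapsed since `stamp`, treating jiffies as a wrapping 32-bit counter."""
    return (now - stamp) & 0xFFFFFFFF


for now in jiffies_samples:
    ticks = pending_ticks(now, time_stamp)
    print(f"jiffies {now:x}: pending {ticks} ticks, about {ticks / HZ:.2f} s")

# Gap between consecutive reports, in ticks.
steps = [b - a for a, b in zip(jiffies_samples, jiffies_samples[1:])]
print("ticks between reports:", steps)
```

The buffer's `time_stamp` never changes while `jiffies` keeps advancing by 500 ticks per report, so the same descriptor stays unfinished for several seconds before the NETDEV WATCHDOG fires.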



Re: Strange errors from e1000 driver (2.6.18)

2006-10-22 Thread Martin J. Bligh


Actually, maybe this set is more helpful:

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  6
  TDT  1f
  next_to_use  1f
  next_to_clean  2
buffer_info[next_to_clean]
  time_stamp   2de8b54
  next_to_watch  6
  jiffies  2de8db7
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  6
  TDT  1f
  next_to_use  1f
  next_to_clean  2
buffer_info[next_to_clean]
  time_stamp   2de8b54
  next_to_watch  6
  jiffies  2de8fab
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  6
  TDT  1f
  next_to_use  1f
  next_to_clean  2
buffer_info[next_to_clean]
  time_stamp   2de8b54
  next_to_watch  6
  jiffies  2de919f
  next_to_watch.status 0
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
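This second set shows the pattern more clearly: across three reports, 500 jiffies apart, TDH stays at 6 while TDT stays at 0x1f. A minimal Python sketch that parses a copy of those reports, trimmed to the TDH, TDT and jiffies lines, and checks whether the hardware head pointer ever moved. It assumes the `NAME  hexvalue` layout shown here and is not a general e1000 log parser.

```python
import re

# Trimmed copy of the three "Detected Tx Unit Hang" reports above.
LOG = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH  6
  TDT  1f
  jiffies  2de8db7
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH  6
  TDT  1f
  jiffies  2de8fab
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH  6
  TDT  1f
  jiffies  2de919f
"""

FIELD = re.compile(r"^\s*(TDH|TDT|jiffies)\s+([0-9a-fA-F]+)\s*$", re.MULTILINE)


def parse_reports(text: str) -> list:
    """Split the log into hang reports and return their fields as integers."""
    reports = []
    for chunk in text.split("Detected Tx Unit Hang")[1:]:
        reports.append({name: int(val, 16) for name, val in FIELD.findall(chunk)})
    return reports


def head_stuck(reports: list) -> bool:
    """True if TDH never moved while descriptors were still queued (TDH != TDT)."""
    heads = {r["TDH"] for r in reports}
    return len(heads) == 1 and all(r["TDH"] != r["TDT"] for r in reports)


reports = parse_reports(LOG)
for r in reports:
    print(f"jiffies {r['jiffies']:x}: TDH={r['TDH']:#x} TDT={r['TDT']:#x}")
print("head stuck:", head_stuck(reports))
```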




Re: Strange errors from e1000 driver (2.6.18)

2006-10-22 Thread Martin J. Bligh

Jesse Brandeburg wrote:

On 10/22/06, Martin J. Bligh [EMAIL PROTECTED] wrote:

Martin J. Bligh wrote:
 I'm getting a lot of these type of errors if I run 2.6.18. If
 I run the standard Ubuntu Dapper kernel, I don't get them.
 What do they indicate?


Hi Martin, they indicate that you're getting transmit hangs. That means
your hardware is having trouble with some of the buffers it is being
handed. Because the TDH and TDT (transmit descriptor head and tail)
noted below are not equal, the hardware is hung processing buffers
that the driver gave to it.
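The head/tail reasoning can be made concrete. The driver advances the tail (TDT) as it queues packets and the hardware advances the head (TDH) as it finishes them, so TDT minus TDH, modulo the ring size, is the number of descriptors still owned by the hardware. A Python sketch of that arithmetic; the ring size of 256 is an assumption (the logs do not show it), and for the two pairs logged in this thread the result does not depend on it because neither wraps.

```python
RING_SIZE = 256  # assumed number of Tx descriptors; not shown in the logs


def hw_owned(tdh: int, tdt: int, count: int = RING_SIZE) -> int:
    """Descriptors the driver has queued that the hardware has not finished."""
    # Python's % always returns a non-negative result, which handles wraparound.
    return (tdt - tdh) % count


print(hw_owned(0x3D, 0x44))  # first log: TDH 3d, TDT 44
print(hw_owned(0x06, 0x1F))  # second log: TDH 6, TDT 1f
print(hw_owned(0xFE, 0x02))  # a wrapped example: tail has passed the end of the ring
```

A result of zero means the ring is drained; a non-zero value that stays the same across successive reports is the "hung processing buffers" state described above.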

We need the standard bug report particulars,


Sure.

lspci -vv, 


:00:0a.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
	Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
	Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at ef02 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at a000 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4]
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address:   Data:

:00:0a.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
	Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
	Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
	Interrupt: pin B routed to IRQ 11
	Region 0: Memory at ef00 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at a400 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4]
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address:   Data:
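The `Latency: 32 (63750ns min)` line is worth decoding, given the later remark that the latency may be low. lspci reports the latency timer in PCI clocks and derives the `min` figure from the Min_Gnt register, which counts in units of 250 ns. A Python sketch of that arithmetic; which bus clock applies is an assumption, since `66MHz+` only says the device is 66 MHz capable, so both common clocks are shown.

```python
# Decode the "Latency: 32 (63750ns min)" line from lspci -vv.
# lspci prints Min_Gnt multiplied by 250 ns; the bus clocks below are assumptions.
MIN_GNT_UNIT_NS = 250


def min_gnt_register(min_ns: int) -> int:
    """Recover the raw Min_Gnt register value from lspci's nanosecond figure."""
    return min_ns // MIN_GNT_UNIT_NS


def latency_timer_ns(clocks: int, bus_mhz: float) -> float:
    """Length of the latency-timer window in ns for a given PCI bus clock."""
    return clocks * 1000.0 / bus_mhz


print("Min_Gnt:", hex(min_gnt_register(63750)))
for mhz in (33.0, 66.0):
    print(f"32 clocks at {mhz:.0f} MHz is about {latency_timer_ns(32, mhz):.0f} ns")
```

Read that way, the card asks for bursts of up to about 64 µs, while a 32-clock latency timer only guarantees it roughly 0.5 to 1 µs of bus ownership once another device wants the bus. That may be what the "latency may be low" comment refers to.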

cat /proc/interrupts, 


           CPU0
  0:  146271373    XT-PIC  timer
  1:     179459    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  5:    1975991    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
  6:          2    XT-PIC  floppy
 10:          0    XT-PIC  uhci_hcd:usb4, uhci_hcd:usb5, uhci_hcd:usb6
 11:          0    XT-PIC  ehci_hcd:usb1, uhci_hcd:usb3, uhci_hcd:usb7, uhci_hcd:usb8
 12:    2758142    XT-PIC  i8042
 14:    6344745    XT-PIC  ide0
 15:   20014468    XT-PIC  ide1
NMI:          0
LOC:  146264664
ERR:      52805


dmesg


Did that bit already.


ethtool -e eth0,


[EMAIL PROTECTED]:/usr/local/autotest/bin # ethtool -e eth0
Offset  Values
--  --
0x0000  00 07 e9 09 0b 08 30 05 ff ff ff ff ff ff ff ff
0x0010  44 a9 03 98 0b 46 11 10 86 80 10 10 86 80 68 34
0x0020  0c 00 10 10 00 00 02 21 c8 18 ff ff ff ff ff ff
0x0030  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040  0c c3 61 78 04 50 02 21 c8 08 ff ff ff ff ff ff
0x0050  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
0x0060  2c 00 00 40 07 11 00 00 2c 00 00 40 ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 4f 29
0x0080  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
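One quick sanity check on an e1000 EEPROM dump is its checksum: the convention the e1000 driver validates is that the first 64 16-bit words (bytes 0x00 to 0x7F), read little-endian, sum to 0xBABA modulo 2^16, and the first three words hold the MAC address. A Python sketch that re-checks this against rows 0x0000 to 0x0070 of the dump (the first row is labelled 0x0000 in the code); the helper names are illustrative.

```python
# Rows 0x0000-0x0070 copied from the ethtool -e dump: EEPROM words 0x00-0x3F.
DUMP = """\
0x0000  00 07 e9 09 0b 08 30 05 ff ff ff ff ff ff ff ff
0x0010  44 a9 03 98 0b 46 11 10 86 80 10 10 86 80 68 34
0x0020  0c 00 10 10 00 00 02 21 c8 18 ff ff ff ff ff ff
0x0030  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040  0c c3 61 78 04 50 02 21 c8 08 ff ff ff ff ff ff
0x0050  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
0x0060  2c 00 00 40 07 11 00 00 2c 00 00 40 ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 4f 29
"""


def parse_bytes(dump: str) -> bytes:
    """Turn 'offset  b0 b1 ...' rows into a flat byte string."""
    data = bytearray()
    for line in dump.splitlines():
        data.extend(int(b, 16) for b in line.split()[1:])
    return bytes(data)


def eeprom_checksum(data: bytes, words: int = 0x40) -> int:
    """Sum the first `words` little-endian 16-bit words, modulo 2**16."""
    total = 0
    for i in range(words):
        total += data[2 * i] | (data[2 * i + 1] << 8)
    return total & 0xFFFF


data = parse_bytes(DUMP)
print(f"checksum: {eeprom_checksum(data):#06x}")  # 0xbaba means the image is consistent
print("MAC:", ":".join(f"{b:02x}" for b in data[:6]))
```

For this dump the sum comes out to 0xbaba and the MAC reads 00:07:e9:09:0b:08, so the EEPROM contents at least pass the driver's own consistency check.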

Re: Strange errors from e1000 driver (2.6.18)

2006-10-22 Thread Martin J. Bligh

Jesse Brandeburg wrote:

Analysis follows, but I wanted to ask you to bisect back, if you can,
to find the patch that apparently makes the difference. Basically, at
this point I'd say it's not likely to be an e1000 issue, but I'd like to
follow up and make sure.


That's going to be ugly, since I can't reproduce it at will. Maybe if
I netperf it to the other box I can push it over.


Nothing seems out of order, but the latency may be low; I'd be curious
what these values looked like with the old kernel. Another thing to
compare would be the lspci -vv output for your chipset under the old
and new kernels, in case the bridge/system configuration changed. There
are no known problems right now with the 82546EB itself.


OK, will try later when I have more time. For now I've switched to the
onboard via-rhine controller.


Shared interrupt, fine, but what's with the ERR: count?


Hmm. Having rebooted, they look rather lower, but that might be a time thing.

           CPU0
  0:    1405995    XT-PIC  timer
  1:       5910    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  5:          0    XT-PIC  uhci_hcd:usb3
  7:      27135    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
 10:          0    XT-PIC  uhci_hcd:usb4, uhci_hcd:usb5, uhci_hcd:usb6
 11:          0    XT-PIC  ehci_hcd:usb1, uhci_hcd:usb7, uhci_hcd:usb8
 12:     157547    XT-PIC  i8042
 14:      36296    XT-PIC  ide0
 15:     196690    XT-PIC  ide1
NMI:          0
LOC:    1406006
ERR:         26
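The question here concerns the ERR: line. A small Python sketch that extracts the ERR: and LOC: totals and the shared IRQ lines from trimmed copies of the two listings in this thread. Dividing ERR by LOC (local timer interrupts) to judge whether the lower count after the reboot is just "a time thing" is an assumption of this sketch, not something proposed in the thread, and the parser assumes the single-CPU layout shown here.

```python
# Trimmed copies of the two /proc/interrupts listings in this thread.
BEFORE = """\
           CPU0
  5:    1975991    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
NMI:          0
LOC:  146264664
ERR:      52805
"""

AFTER = """\
           CPU0
  7:      27135    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
NMI:          0
LOC:    1406006
ERR:         26
"""


def parse_interrupts(text: str):
    """Parse single-CPU /proc/interrupts text.

    Returns (irqs, totals): irqs maps IRQ number -> (count, [device names]);
    totals maps non-numeric rows such as NMI, LOC and ERR to their counts.
    """
    irqs, totals = {}, {}
    for line in text.splitlines()[1:]:      # first row is the CPU header
        label, _, rest = line.partition(":")
        fields = rest.split(None, 2)        # count, controller, device list
        if not fields:
            continue
        label = label.strip()
        if label.isdigit():
            devices = [d.strip() for d in fields[2].split(",")] if len(fields) > 2 else []
            irqs[int(label)] = (int(fields[0]), devices)
        else:
            totals[label] = int(fields[0])
    return irqs, totals


for name, text in (("before reboot", BEFORE), ("after reboot", AFTER)):
    irqs, totals = parse_interrupts(text)
    shared = {irq: devs for irq, (_, devs) in irqs.items() if len(devs) > 1}
    rate = totals["ERR"] / totals["LOC"] * 1e6
    print(f"{name}: ERR={totals['ERR']} LOC={totals['LOC']} "
          f"({rate:.1f} ERR per million LOC ticks)")
    print("  shared IRQs:", shared)
```

With these numbers the rate falls from about 361 to about 18.5 ERR per million timer ticks, so the drop is not only a matter of uptime.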


except you didn't include any of the e1000 load information or the
system's boot information as it came up.


OK, it had gone since the reboot, but I rebooted just now; new info
attached.


This chipset is one of the most common elements in problem
reports of TX hangs for e1000.  My current theory (we've bought a
bunch of these systems and never reproduced the issue) is that there
is something either design specific or BIOS specific that causes this
chipset to interact very badly with e1000 hardware.  Some systems have
the issue and some don't.  If you could bisect back to a working point
it would be interesting to see where that pointed.
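Bisecting here means binary-searching an ordered list of kernels for the first one that shows the hang, which takes about log2(n) test boots. A generic Python sketch, assuming a hypothetical `is_bad(version)` test that gives a reliable yes/no answer; with an intermittent problem like this one, each answer is expensive and a false "good" sends the search the wrong way.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad entry.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once a version is bad every later one is bad too.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return versions[bad]


if __name__ == "__main__":
    # Illustrative only: placeholder names and a made-up predicate.
    candidates = [f"candidate-{i}" for i in range(8)]
    print(first_bad(candidates, lambda v: int(v.split("-")[1]) >= 5))
```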


OK, it's going to be hard to bisect, since the other one was an Ubuntu
kernel, but I guess I can at least give a vanilla 2.6.15 a shot.


It doesn't seem you're overclocked. Good.


Nah, I'm pretty conservative with hardware, get enough problems when
it's all running within specs ;-)

Thanks for looking at all this.

M.


Linux version 2.6.18 ([EMAIL PROTECTED]) (gcc version 3.4.6 (Ubuntu 3.4.6-1ubuntu2)) #2 Sun Oct 22 13:45:39 PDT 2006
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3fff (usable)
 BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
 BIOS-e820: 3fff3000 - 4000 (ACPI data)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820:  - 0001 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
found SMP MP-table at 000f52b0
On node 0 totalpages: 229376
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
DMI 2.2 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: OEM0 Product ID: PROD APIC at: 0xFEE0
Processor #0 6:10 APIC version 17
I/O APIC #2 Version 17 at 0xFEC0.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 5000 (gap: 4000:bec0)
Detected 1098.980 MHz processor.
Built 1 zonelists.  Total pages: 229376
Kernel command line: root=/dev/hda1 ro lapic profile=2
kernel profiling enabled (shift: 2)
mapped APIC to d000 (fee0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c04e4000 soft=c04e3000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 902328k/917504k available (2647k kernel code, 14784k reserved, 1144k data, 160k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 2200.00 BogoMIPS (lpj=4400011)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff c1c3fbff
CPU: After vendor identify, caps: 0383fbff c1c3fbff
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1c3fbff  0420  
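Two numbers in this boot log can be cross-checked. The 896MB warning follows from the zone sizes, and the BogoMIPS line together with `lpj` gives the timer frequency. A Python sketch of both, assuming 4 KiB pages and the formula the kernel uses when printing its delay calibration (BogoMIPS = lpj × HZ / 500000).

```python
PAGE_SIZE = 4096  # i386 page size, in bytes


def pages_to_mib(pages: int) -> int:
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE // (1024 * 1024)


def hz_from_bogomips(bogomips: float, lpj: int) -> int:
    """Invert BogoMIPS = lpj * HZ / 500000 to recover HZ."""
    return round(bogomips * 500000 / lpj)


print("total: ", pages_to_mib(229376), "MiB")   # "On node 0 totalpages: 229376"
print("DMA:   ", pages_to_mib(4096), "MiB")
print("Normal:", pages_to_mib(225280), "MiB")
print("HZ:    ", hz_from_bogomips(2200.00, 4400011))
```

This gives 896 MiB in total (16 MiB DMA plus 880 MiB Normal), matching `917504k`, and HZ = 250, which also fits the 500-jiffy gap between e1000 reports logged two seconds apart earlier in the thread.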

Re: [patch 09/13] net: add the UdpSndbufErrors and UdpRcvbufErrors MIBs

2006-08-15 Thread Martin J. Bligh

David Miller wrote:

From: [EMAIL PROTECTED]
Date: Mon, 14 Aug 2006 23:03:43 -0700


From: Martin Bligh [EMAIL PROTECTED]

Signed-off-by: Martin Bligh [EMAIL PROTECTED]
Cc: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]


Applied to net-2.6.19.

I implemented the IPv6 side since you didn't bother to.


Thanks Dave ... was more ignorance than malice ;-)

M.




Re: open bugzilla reports

2006-02-04 Thread Martin J. Bligh

  [Bug 5958] CF bluetooth card oopses machine when
  http://bugzilla.kernel.org/show_bug.cgi?id=5958
 
 This isn't a serial bug - it's a bluetooth ldisc bug.  I reported it
 to the bluetooth folk back when it first got raised by Pavel.  However,
 they seem to be completely uninterested in fixing it.
 
 Unfortunately, there isn't a category for bt crap in bugzilla, otherwise
 I'd reassign it.  Please kick the bt folk.

Stick it under Drivers/Other if you want ...

M.


Re: Network vm deadlock... solution?

2005-08-02 Thread Martin J. Bligh


--Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 23:43:40 +0200):

 Daniel Phillips [EMAIL PROTECTED] :
 [...]
 A point on memory pressure: here, we are not talking about the continuous 
 state of running under heavy load, but rather the microscopically short 
 periods where not a single page of memory is available to normal tasks.  It 
 is when a block IO event happens to land inside one of those microscopically 
 short periods that we run into problems.
 
 You suggested in a previous message to use an emergency allocation pool at
 the driver level. Afaik, 1) the usual network driver can already buffer a
 bit with its Rx descriptor ring and 2) it more or less tries to refill it
 each time napi issues its ->poll() method. So it makes me wonder:
 - have you collected evidence that the drivers actually run out of memory
   in the (microscopical) situation you describe ?

There are other situations where it does (e.g. the swap device dies).

 - instead of modifying each and every driver to be vm aware, why don't
   you hook in net_rx_action() when memory starts to be low ?
 
 Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems redundant
 with the threshold (if (memory_pressure)) used in the Rx path to decide
 that memory is low.

It's send-side, not receive.

M.
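The emergency-pool idea under discussion can be modelled in a few lines: ordinary allocations fail once free memory would drop below a reserve threshold, while allocations flagged as critical (the role the proposed GFP_CRITICAL flag would play) may dip into the reserve. This Python sketch is a toy model, not the kernel's mempool API; the class, the threshold and the `critical` flag are hypothetical. Per the reply above, the allocations that need the reserve are on the send side.

```python
class ReservePool:
    """Toy page allocator with an emergency reserve (hypothetical, illustration only)."""

    def __init__(self, total_pages: int, reserve_pages: int):
        self.free = total_pages
        self.reserve = reserve_pages

    def alloc(self, pages: int, critical: bool = False) -> bool:
        # Normal requests must leave the reserve untouched; critical ones may use it.
        floor = 0 if critical else self.reserve
        if self.free - pages < floor:
            return False
        self.free -= pages
        return True

    def release(self, pages: int) -> None:
        self.free += pages


if __name__ == "__main__":
    pool = ReservePool(total_pages=100, reserve_pages=10)
    print(pool.alloc(90))                # succeeds: free memory drops exactly to the reserve
    print(pool.alloc(1))                 # fails: would eat into the reserve
    print(pool.alloc(5, critical=True))  # succeeds: e.g. a packet needed to complete writeout
```

The design point the toy captures is that the reserve only helps if normal traffic cannot consume it, so the critical/normal distinction has to be made at allocation time, on the path that actually allocates.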
