via-rhine driver stalls with: PHY status 786d, resetting...
Linux 2.6.23
http://bugzilla.kernel.org/show_bug.cgi?id=9300

Under any sort of traffic load (recursive scp from another box of a bunch of mp3s, for instance), I get continuous stalls. Recovers every time, but is dog slow.

NETDEV WATCHDOG: eth2: transmit timed out
eth2: Transmit timed out, status , PHY status 786d, resetting...
eth2: link up, 100Mbps, full-duplex, lpa 0xCDE1

driver is via-rhine. Google search indicates this has been a problem since at least 2.4.19 and 2002 ... can we not fix this somehow? I have an e1000 card in this box too, but that has similar issues ;-(

-
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Strange errors from e1000 driver (2.6.18)
I'm getting a lot of these type of errors if I run 2.6.18. If I run the standard Ubuntu Dapper kernel, I don't get them. What do they indicate?

Oct 21 18:48:28 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:28 localhost kernel: time_stamp 7b79d33
Oct 21 18:48:28 localhost kernel: next_to_watch 3d
Oct 21 18:48:28 localhost kernel: jiffies 7b7a0c1
Oct 21 18:48:28 localhost kernel: next_to_watch.status 0
Oct 21 18:48:30 localhost kernel: Tx Queue 0
Oct 21 18:48:30 localhost kernel: TDH 3d
Oct 21 18:48:30 localhost kernel: TDT 44
Oct 21 18:48:30 localhost kernel: next_to_use 44
Oct 21 18:48:30 localhost kernel: next_to_clean 39
Oct 21 18:48:30 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:30 localhost kernel: time_stamp 7b79d33
Oct 21 18:48:30 localhost kernel: next_to_watch 3d
Oct 21 18:48:30 localhost kernel: jiffies 7b7a2b5
Oct 21 18:48:30 localhost kernel: next_to_watch.status 0
Oct 21 18:48:32 localhost kernel: Tx Queue 0
Oct 21 18:48:32 localhost kernel: TDH 3d
Oct 21 18:48:32 localhost kernel: TDT 44
Oct 21 18:48:32 localhost kernel: next_to_use 44
Oct 21 18:48:32 localhost kernel: next_to_clean 39
Oct 21 18:48:32 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:32 localhost kernel: time_stamp 7b79d33
Oct 21 18:48:32 localhost kernel: next_to_watch 3d
Oct 21 18:48:32 localhost kernel: jiffies 7b7a4a9
Oct 21 18:48:32 localhost kernel: next_to_watch.status 0
Oct 21 18:48:34 localhost kernel: Tx Queue 0
Oct 21 18:48:34 localhost kernel: TDH 3d
Oct 21 18:48:34 localhost kernel: TDT 44
Oct 21 18:48:34 localhost kernel: next_to_use 44
Oct 21 18:48:34 localhost kernel: next_to_clean 39
Oct 21 18:48:34 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:34 localhost kernel: time_stamp 7b79d33
Oct 21 18:48:34 localhost kernel: next_to_watch 3d
Oct 21 18:48:34 localhost kernel: jiffies 7b7a69d
Oct 21 18:48:34 localhost kernel: next_to_watch.status 0
Oct 21 18:48:35 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 21 18:48:36 localhost kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
Re: Strange errors from e1000 driver (2.6.18)
Martin J. Bligh wrote:

I'm getting a lot of these type of errors if I run 2.6.18. If I run the standard Ubuntu Dapper kernel, I don't get them. What do they indicate?

Actually, maybe this set is more helpful:

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue 0
TDH 6
TDT 1f
next_to_use 1f
next_to_clean 2
buffer_info[next_to_clean]
time_stamp 2de8b54
next_to_watch 6
jiffies 2de8db7
next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue 0
TDH 6
TDT 1f
next_to_use 1f
next_to_clean 2
buffer_info[next_to_clean]
time_stamp 2de8b54
next_to_watch 6
jiffies 2de8fab
next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue 0
TDH 6
TDT 1f
next_to_use 1f
next_to_clean 2
buffer_info[next_to_clean]
time_stamp 2de8b54
next_to_watch 6
jiffies 2de919f
next_to_watch.status 0
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
Re: Strange errors from e1000 driver (2.6.18)
Jesse Brandeburg wrote:

On 10/22/06, Martin J. Bligh [EMAIL PROTECTED] wrote:

Martin J. Bligh wrote:

I'm getting a lot of these type of errors if I run 2.6.18. If I run the standard Ubuntu Dapper kernel, I don't get them. What do they indicate?

Hi Martin, they indicate that you're getting transmit hangs. Means your hardware is having issues with some of the buffers it is being handed. Because the TDH and TDT noted below are not equal, it means the hardware is hung processing buffers that the driver gave to it.

We need the standard bug report particulars,

Sure.

lspci -vv,

:00:0a.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
        Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at ef02 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at a000 [size=64]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4]
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: Data:

:00:0a.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
        Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
        Interrupt: pin B routed to IRQ 11
        Region 0: Memory at ef00 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at a400 [size=64]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4]
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: Data:

cat /proc/interrupts,

           CPU0
  0:  146271373    XT-PIC  timer
  1:     179459    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  5:    1975991    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
  6:          2    XT-PIC  floppy
 10:          0    XT-PIC  uhci_hcd:usb4, uhci_hcd:usb5, uhci_hcd:usb6
 11:          0    XT-PIC  ehci_hcd:usb1, uhci_hcd:usb3, uhci_hcd:usb7, uhci_hcd:usb8
 12:    2758142    XT-PIC  i8042
 14:    6344745    XT-PIC  ide0
 15:   20014468    XT-PIC  ide1
NMI:          0
LOC:  146264664
ERR:      52805

dmesg

Did that bit already.

ethtool -e eth0,

[EMAIL PROTECTED]:/usr/local/autotest/bin # ethtool -e eth0
Offset Values
-- --
0x 00 07 e9 09 0b 08 30 05 ff ff ff ff ff ff ff ff
0x0010 44 a9 03 98 0b 46 11 10 86 80 10 10 86 80 68 34
0x0020 0c 00 10 10 00 00 02 21 c8 18 ff ff ff ff ff ff
0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040 0c c3 61 78 04 50 02 21 c8 08 ff ff ff ff ff ff
0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
0x0060 2c 00 00 40 07 11 00 00 2c 00 00 40 ff ff ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 4f 29
0x0080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
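Jesse's point that TDH and TDT differ during a hang can be checked mechanically against the reports posted earlier in the thread. The following is a minimal sketch, not part of the driver or of any tool mentioned here: the function names are hypothetical, the parser only handles the flattened layout seen in this archive, and reading time_stamp as the jiffies value at which the pending buffer was queued is an assumption.

```python
import re

# One report copied from the thread, with the fused fields
# ("next_to_clean2", "next_to_watch6") left as the archive shows them.
SAMPLE = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue 0 TDH 6 TDT 1f next_to_use 1f next_to_clean2
buffer_info[next_to_clean] time_stamp 2de8b54 next_to_watch6
jiffies 2de8db7 next_to_watch.status 0
"""

# Hexadecimal fields printed in each report.
FIELDS = ("TDH", "TDT", "next_to_use", "next_to_clean",
          "time_stamp", "next_to_watch", "jiffies")


def parse_tx_hang(dump):
    """Return the hexadecimal fields of one hang report as integers."""
    report = {}
    for name in FIELDS:
        # \s* accepts both "TDH 6" and fused text such as "next_to_clean2".
        # Requiring a hex value right after the name skips
        # "buffer_info[next_to_clean]" and "next_to_watch.status".
        match = re.search(rf"\b{name}\s*([0-9a-fA-F]+)\b", dump)
        if match is None:
            raise ValueError(f"field {name!r} not found")
        report[name] = int(match.group(1), 16)
    return report


def hardware_has_pending_work(report):
    """TDH != TDT: descriptors were handed over but not yet consumed."""
    return report["TDH"] != report["TDT"]


def jiffies_stuck(report):
    """Ticks between time_stamp and jiffies, masked for 32-bit wrap."""
    return (report["jiffies"] - report["time_stamp"]) & 0xFFFFFFFF


report = parse_tx_hang(SAMPLE)
print("pending work:", hardware_has_pending_work(report))
print("jiffies stuck:", jiffies_stuck(report))
```

Dividing the tick count by the kernel's HZ value would give seconds; HZ is not stated anywhere in the thread, so the sketch leaves the result in jiffies.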
Re: Strange errors from e1000 driver (2.6.18)
Jesse Brandeburg wrote:

Analysis follows, but I wanted to ask you to bisect back if you can to find the apparent patch to make the difference. Basically at this point I'd say it's not likely to be an e1000 issue, but I'd like to follow up and make sure.

That's going to be ugly, since I can't reproduce it at will. Maybe if I netperf it to the other box I can push it over.

Nothing seems out of order, but the latency may be low, I'd be curious what these looked like before with the old kernel. Some of the other things to compare would have been the lspci -vv output from your chipset with old/new kernel, in case the bridge/system configuration changed. There are no known problems right now with this chipset 82546EB

OK. will try later when I have more time. For now I switched to the onboard via rhine controller.

shared int, fine, but what's with the ERR: ?

Hmm. Having rebooted they look rather lower. but might be a time thing.

           CPU0
  0:    1405995    XT-PIC  timer
  1:       5910    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  5:          0    XT-PIC  uhci_hcd:usb3
  7:      27135    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
 10:          0    XT-PIC  uhci_hcd:usb4, uhci_hcd:usb5, uhci_hcd:usb6
 11:          0    XT-PIC  ehci_hcd:usb1, uhci_hcd:usb7, uhci_hcd:usb8
 12:     157547    XT-PIC  i8042
 14:      36296    XT-PIC  ide0
 15:     196690    XT-PIC  ide1
NMI:          0
LOC:    1406006
ERR:         26

except you didn't include any of the e1000 load information nor the system's boot information as it came up.

OK, it had gone since reboot, but I rebooted just now new info attached.

This chipset is one of the most frequent common elements in problem reports of TX hangs for e1000. My current theory (we've bought a bunch of these systems and never reproduced the issue) is that there is something either design specific or BIOS specific that causes this chipset to interact very badly with e1000 hardware. Some systems have the issue and some don't. If you could bisect back to a working point it would be interesting to see where that pointed.

OK, is going to be hard to bisect, since the other one was an Ubuntu kernel, but I guess I can give 2.6.15 virgin a shot, at least.

doesn't seem you're overclocked. Good.

Nah, I'm pretty conservative with hardware, get enough problems when it's all running within specs ;-)

Thanks for looking at all this.

M.

Linux version 2.6.18 ([EMAIL PROTECTED]) (gcc version 3.4.6 (Ubuntu 3.4.6-1ubuntu2)) #2 Sun Oct 22 13:45:39 PDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3fff (usable)
 BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
 BIOS-e820: 3fff3000 - 4000 (ACPI data)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: - 0001 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
found SMP MP-table at 000f52b0
On node 0 totalpages: 229376
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
DMI 2.2 present.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: OEM0 Product ID: PROD APIC at: 0xFEE0
Processor #0 6:10 APIC version 17
I/O APIC #2 Version 17 at 0xFEC0.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 5000 (gap: 4000:bec0)
Detected 1098.980 MHz processor.
Built 1 zonelists. Total pages: 229376
Kernel command line: root=/dev/hda1 ro lapic profile=2
kernel profiling enabled (shift: 2)
mapped APIC to d000 (fee0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c04e4000 soft=c04e3000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 902328k/917504k available (2647k kernel code, 14784k reserved, 1144k data, 160k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 2200.00 BogoMIPS (lpj=4400011)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff c1c3fbff
CPU: After vendor identify, caps: 0383fbff c1c3fbff
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1c3fbff 0420
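The "shared int" remark refers to eth0 sitting on the same interrupt line as other devices in the /proc/interrupts listings posted in this thread. As a rough illustration, the sketch below picks out the lines a given device shares. It assumes the single-CPU XT-PIC layout shown in these messages (IRQ number, count, controller, comma-separated devices); newer kernels print extra columns that it does not handle, and both function names are hypothetical.

```python
import re

# Excerpt of the first /proc/interrupts listing from the thread.
SAMPLE = """\
           CPU0
  0:  146271373    XT-PIC  timer
  5:    1975991    XT-PIC  ehci_hcd:usb2, VIA8237, eth0
 11:          0    XT-PIC  ehci_hcd:usb1, uhci_hcd:usb3, uhci_hcd:usb7, uhci_hcd:usb8
 14:    6344745    XT-PIC  ide0
NMI:          0
ERR:      52805
"""

# IRQ number, one or more per-CPU counts, controller name, device list.
LINE = re.compile(r"^\s*(\d+):\s*((?:\d+\s+)+)(\S+)\s+(.+)$")


def parse_interrupts(text):
    """Map each numbered IRQ line to the list of devices attached to it."""
    table = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # header row, or NMI/LOC/ERR counters
        irq = int(match.group(1))
        table[irq] = [dev.strip() for dev in match.group(4).split(",")]
    return table


def irqs_shared_with(table, device):
    """Return the IRQ lines on which `device` has at least one neighbour."""
    return {irq: devs for irq, devs in table.items()
            if device in devs and len(devs) > 1}


table = parse_interrupts(SAMPLE)
for irq, devices in irqs_shared_with(table, "eth0").items():
    print(f"IRQ {irq}: {', '.join(devices)}")
```

Running the same function over the listings taken before and after the reboot shows whether eth0 moved to a different line and which devices it shares with in each case.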
Re: [patch 09/13] net: add the UdpSndbufErrors and UdpRcvbufErrors MIBs
David Miller wrote:

From: [EMAIL PROTECTED]
Date: Mon, 14 Aug 2006 23:03:43 -0700

From: Martin Bligh [EMAIL PROTECTED]
Signed-off-by: Martin Bligh [EMAIL PROTECTED]
Cc: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]

Applied to net-2.6.19. I implemented the IPv6 side since you didn't bother to.

Thanks Dave ... was more ignorance than malice ;-)

M.
Re: open bugzilla reports
[Bug 5958] CF bluetooth card oopses machine when
http://bugzilla.kernel.org/show_bug.cgi?id=5958

This isn't a serial bug - it's a bluetooth ldisc bug. I reported it to the bluetooth folk back when it first got raised by Pavel. However, they seem to be completely disinterested in fixing it.

Unfortunately, there isn't a category for bt crap in bugzilla, otherwise I'd reassign it. Please kick the bt folk.

Stick it under Drivers/Other if you want ...

M.
Re: Network vm deadlock... solution?
--Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 23:43:40 +0200):

Daniel Phillips [EMAIL PROTECTED] :
[...]
A point on memory pressure: here, we are not talking about the continuous state of running under heavy load, but rather the microscopically short periods where not a single page of memory is available to normal tasks. It is when a block IO event happens to land inside one of those microscopically short periods that we run into problems.

You suggested in a previous message to use an emergency allocation pool at the driver level. Afaik, 1) the usual network driver can already buffer a bit with its Rx descriptor ring and 2) it more or less tries to refill it each time napi issues its ->poll() method. So it makes me wonder:

- have you collected evidence that the drivers actually run out of memory in the (microscopical) situation you describe ?

There's other situations where it does (ie swap device dies, etc).

- instead of modifying each and every driver to be vm aware, why don't you hook in net_rx_action() when memory starts to be low ?

Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems redundant with the threshold (if (memory_pressure)) used in the Rx path to decide that memory is low.

It's send-side, not receive.

M.
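For readers unfamiliar with the "emergency allocation pool" idea being debated, here is a toy model of the general reserve-pool pattern: try the normal allocator first, fall back to a small pre-filled reserve when it fails, and top the reserve back up when buffers are returned. It is an illustration only. ReservePool and FlakyAllocator are hypothetical names, nothing here is kernel code, and it does not claim to reproduce what Daniel Phillips actually proposed.

```python
class FlakyAllocator:
    """Stand-in for a normal allocator that can be switched to 'out of memory'."""

    def __init__(self, size=2048):
        self.size = size
        self.exhausted = False

    def __call__(self):
        # Returns None when memory is (pretend) unavailable.
        return None if self.exhausted else bytearray(self.size)


class ReservePool:
    """Keep `min_nr` buffers aside for moments when normal allocation fails."""

    def __init__(self, min_nr, alloc):
        self._alloc = alloc
        self._min_nr = min_nr
        # Fill the reserve up front, while memory is still available.
        self._reserve = [alloc() for _ in range(min_nr)]
        if any(buf is None for buf in self._reserve):
            raise MemoryError("could not pre-fill the reserve")

    def get(self):
        """Prefer the normal allocator; dip into the reserve only on failure."""
        buf = self._alloc()
        if buf is not None:
            return buf
        if self._reserve:
            return self._reserve.pop()
        return None  # reserve exhausted too: the caller has to wait or drop

    def put(self, buf):
        """Returned buffers refill the reserve before anything else."""
        if len(self._reserve) < self._min_nr:
            self._reserve.append(buf)
        # Otherwise the buffer is simply released to the garbage collector.


allocator = FlakyAllocator()
pool = ReservePool(min_nr=2, alloc=allocator)
allocator.exhausted = True          # simulate the short out-of-memory window
first = pool.get()                  # served from the reserve
print("got a buffer under pressure:", first is not None)
```

The reserve only helps if its size covers the work that must complete before memory is freed again, which is part of what the thread is arguing about.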