em0 lock up / hangs (WAS: em0: Watchdog timeout -- resetting)
Hello, Eugene.

You wrote on February 1, 2011, 15:38:33:

> Eugene wrote:
>> You could give a try to netisr parallelism of RELENG_8 instead of POLLING
>> (and tune interrupt throttling) if your box does not have lots of dynamic
>> interfaces like when using mpd.
> Jack wrote:
>> I don't test POLLING, sounds like it's broken, I don't understand
>> why you think you need it? This hardware supports
>> MSI, why not use it?
> I send one answer to two messages, because data is the same.
> Here is a snapshot of "top -S" with "H" pressed when server sends
> 1Gbit/s via SMB with polling (Windows 7 client copies 8GiB sparse file to very
> fast local disk):
> the same without polling, with net.isr settings:
> # sysctl net.isr
> net.isr.direct: 0
> net.isr.direct_force: 0

With these settings the server lost its network connection. It still works locally and does not panic, but "ping gateway" reports "No buffer space available", and any other network activity fails with the same message. Bringing the interface down and up again helps.

I have attached the output of the following commands, taken BEFORE the interface reset:

vmstat -m
netstat -m
sysctl dev.em0

Configuration: no polling, net.isr.direct=0, net.isr.direct_force=0.

--
// Black Lion AKA Lev Serebryakov

Attachments: sysctl.dev.em0.log, vmstat-m.log, netstat-m.log

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: em0: Watchdog timeout -- resetting
Hello, Eugene.

You wrote on February 1, 2011, 16:52:57:

>> = INTR - ISR.DIRECT=1
>> Real speed (according to the Windows 7 report) ~101MiB/s.
>> I've re-created file to flush caches on both sides between tries.
> netisr queues help to deal with lots of incoming traffic.
> If you bother about outgoing traffic only, it won't help.

This server is a mostly read-only storage server, so outgoing traffic is what I care about.

And now, after switching polling off and running the experiments, the server is lost: about 30 minutes after the experiments it stopped answering pings and all other network activity. I'll be near the local console only at night, so I can report a panic or anything else only then.

--
// Black Lion AKA Lev Serebryakov
Re: em0: Watchdog timeout -- resetting
On 01.02.2011 18:38, Lev Serebryakov wrote:

> = INTR - ISR.DIRECT=1
> Real speed (according to the Windows 7 report) ~101MiB/s.
>
> I've re-created file to flush caches on both sides between tries.

netisr queues help to deal with lots of incoming traffic. If you are concerned about outgoing traffic only, they won't help.

Eugene Grosbein
Re: em0: Watchdog timeout -- resetting
Hello, Eugene & Jack.

You wrote on February 1, 2011, 11:23:23:

Eugene wrote:
> You could give a try to netisr parallelism of RELENG_8 instead of POLLING
> (and tune interrupt throttling) if your box does not have lots of dynamic
> interfaces like when using mpd.
Jack wrote:
> I don't test POLLING, sounds like it's broken, I don't understand
> why you think you need it? This hardware supports
> MSI, why not use it?

I am sending one answer to both messages, because the data is the same.

Here is a snapshot of "top -S" with "H" pressed while the server sends 1Gbit/s via SMB with polling (a Windows 7 client copies an 8GiB sparse file to a very fast local disk):

= POLLING
CPU: 0.5% user, 0.0% nice, 0.6% system, 1.3% interrupt, 98.1% idle
  PID USERNAME  PRI NICE    SIZE    RES STATE  C    TIME    WCPU COMMAND
   11 root      171 ki31      0K    32K CPU1   1   90.1H 100.00% {idle: cpu1}
   11 root      171 ki31      0K    32K RUN    0   82.1H 100.00% {idle: cpu0}
   12 root      -64    -      0K   304K WAIT   0   33:40   0.68% {irq18: uhci2 ehc}
   12 root      -44    -      0K   304K WAIT   1  225:22   0.00% {swi1: netisr 0}
   14 root      -68    -      0K   528K -      1   16:19   0.00% {usbus3}
   12 root      -40    -      0K   304K WAIT   0   14:25   0.00% {swi2: cambio}
   12 root      -64    -      0K   304K WAIT   1   12:50   0.00% {irq22: ahci0}
    4 root       -8    -      0K    16K -      0   12:26   0.00% g_down
= POLLING

NB: there is no "smbd" process at all in the first 8 positions.
Real speed (according to the Windows 7 report): ~75MiB/s.

The same without polling, with these net.isr settings:

# sysctl net.isr
net.isr.numthreads: 1
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 1
net.isr.direct: 0
net.isr.direct_force: 0

= INTR - ISR.DIRECT=0
CPU: 3.8% user, 0.0% nice, 26.5% system, 6.6% interrupt, 63.2% idle
  PID USERNAME  PRI NICE    SIZE    RES STATE  C    TIME    WCPU COMMAND
   11 root      171 ki31      0K    32K RUN    0   82.1H  83.59% {idle: cpu0}
   11 root      171 ki31      0K    32K RUN    1   90.1H  64.06% {idle: cpu1}
33873 root       72    0  28912K  5432K select 0    0:28  34.96% smbd
   12 root      -44    -      0K   304K WAIT   0  225:29   9.18% {swi1: netisr 0}
    0 root      -68    0      0K   128K -      1    0:02   6.30% {em0 taskq}
   12 root      -68    -      0K   304K WAIT   0    0:00   1.56% {irq20: em0 fwohc}
    7 root       44    -      0K    16K psleep 0    3:12   0.39% pagedaemon
   12 root      -64    -      0K   304K WAIT   1   33:41   0.20% {irq18: uhci2 ehc}
   14 root      -68    -      0K   528K -      0   16:19   0.00% {usbus3}
   12 root      -40    -      0K   304K WAIT   0   14:25   0.00% {swi2: cambio}
= INTR - ISR.DIRECT=0

Real speed (according to the Windows 7 report): ~85MiB/s.

The same without polling, with these net.isr settings:

# sysctl net.isr
net.isr.numthreads: 1
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 1
net.isr.direct: 1
net.isr.direct_force: 1

= INTR - ISR.DIRECT=1
CPU: 2.8% user, 0.0% nice, 30.1% system, 1.7% interrupt, 65.4% idle
  PID USERNAME  PRI NICE    SIZE    RES STATE  C    TIME    WCPU COMMAND
   11 root      171 ki31      0K    32K RUN    1   90.2H  89.36% {idle: cpu1}
   11 root      171 ki31      0K    32K RUN    0   82.2H  67.87% {idle: cpu0}
33873 root      103    0  28912K  5424K CPU0   0    0:51  33.98% smbd
    0 root      -68    0      0K   128K -      1    0:06  12.70% {em0 taskq}
   12 root      -68    -      0K   304K WAIT   0    0:01   1.66% {irq20: em0 fwohc}
    7 root       45    -      0K    16K psleep 0    3:12   0.78% pagedaemon
   12 root      -64    -      0K   304K WAIT   0   33:42   0.20% {irq18: uhci2 ehc}
   12 root      -44    -      0K   304K WAIT   1  225:33   0.00% {swi1: netisr 0}
   14 root      -68    -      0K   528K -      1   16:20   0.00% {usbus3}
   12 root      -40    -      0K   304K WAIT   0   14:25   0.00% {swi2: cambio}
= INTR - ISR.DIRECT=1

Real speed (according to the Windows 7 report): ~101MiB/s.

I re-created the file between tries to flush the caches on both sides.

--
// Black Lion AKA Lev Serebryakov
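To put the three reported speeds (~75, ~85 and ~101 MiB/s) in context, they can be compared with the raw 1 Gbit/s line rate mentioned in the message. The following is only a rough sketch: it ignores Ethernet, IP, TCP and SMB overhead, so the achievable payload rate is lower than the figure computed here, and the function and dictionary names are made up for illustration.

```python
# Rough comparison of the reported SMB copy speeds with the raw 1 Gbit/s
# line rate. Protocol overhead is deliberately ignored (assumption), so
# the real ceiling for payload data is somewhat lower.

LINE_RATE_BITS_PER_S = 1_000_000_000  # "1Gbit/s" from the message
MIB = 2 ** 20                         # bytes per MiB

# Speeds as reported by the Windows 7 client in the message, in MiB/s.
MEASURED_MIB_PER_S = {
    "POLLING": 75,
    "INTR, net.isr.direct=0": 85,
    "INTR, net.isr.direct=1": 101,
}


def line_rate_mib_per_s(bits_per_s=LINE_RATE_BITS_PER_S):
    """Convert a raw bit rate into MiB/s."""
    return bits_per_s / 8 / MIB


def fraction_of_line_rate(measured_mib_per_s):
    """Return the measured speed as a fraction of the raw line rate."""
    return measured_mib_per_s / line_rate_mib_per_s()


if __name__ == "__main__":
    print(f"raw line rate: {line_rate_mib_per_s():.1f} MiB/s")
    for mode, speed in MEASURED_MIB_PER_S.items():
        print(f"{mode}: {speed} MiB/s = {fraction_of_line_rate(speed):.0%} of line rate")
```

Under these assumptions the interrupt-driven run with net.isr.direct=1 is already fairly close to what a single gigabit link can carry, which fits the observation that polling was the slowest of the three configurations.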
Re: em0: Watchdog timeout -- resetting
We have tried POLLING here on Intel cards attached to the igb driver (see my post entitled "High interrupt rate on a PF box + performance" from 27/01/2011). This broke carp *badly* and we switched back to interrupts.

You say a single thread eats up a full CPU core. Can you post a top output showing the %interrupt and your smb process' usage?

On 2/1/11 10:28 AM, Jack Vogel wrote:
> I don't test POLLING, sounds like it's broken, I don't understand
> why you think you need it? This hardware supports
> MSI, why not use it?
>
> Jack
>
>
> 2011/1/31 Lev Serebryakov
>
>> Hello, Freebsd-stable.
>> You wrote on February 1, 2011, 10:24:16:
>>
>>> And all connections are reset. Before latest commits to driver
>>> this system panicked in swi_clock. Now it works without panics, but
>>> seems, that problem is not fixed completely.
>> I forgot to give one last piece of information: POLLING is in action.
>> Without it single thread copy from this server via SMB eats one core
>> of CPU completely.
>>
>> --
>> // Black Lion AKA Lev Serebryakov
Re: em0: Watchdog timeout -- resetting
I don't test POLLING; it sounds like it's broken. I don't understand why you think you need it. This hardware supports MSI, so why not use it?

Jack

2011/1/31 Lev Serebryakov

> Hello, Freebsd-stable.
> You wrote on February 1, 2011, 10:24:16:
>
> > And all connections are reset. Before latest commits to driver
> > this system panicked in swi_clock. Now it works without panics, but
> > seems, that problem is not fixed completely.
> I forgot to give one last piece of information: POLLING is in action.
> Without it single thread copy from this server via SMB eats one core
> of CPU completely.
>
> --
> // Black Lion AKA Lev Serebryakov
Re: em0: Watchdog timeout -- resetting
On 01.02.2011 13:58, Lev Serebryakov wrote:
> Hello, Freebsd-stable.
> You wrote on February 1, 2011, 10:24:16:
>
>> And all connections are reset. Before latest commits to driver
>> this system panicked in swi_clock. Now it works without panics, but
>> seems, that problem is not fixed completely.
> I forgot to give one last piece of information: POLLING is in action.
> Without it single thread copy from this server via SMB eats one core
> of CPU completely.
>

You could try the netisr parallelism of RELENG_8 instead of POLLING (and tune interrupt throttling), provided your box does not have lots of dynamic interfaces, as it would when using mpd.

In /etc/sysctl.conf:

net.isr.direct=0
net.isr.direct_force=0

Eugene Grosbein.
Re: em0: Watchdog timeout -- resetting
Hello, Freebsd-stable.

You wrote on February 1, 2011, 10:24:16:

> And all connections are reset. Before latest commits to driver
> this system panicked in swi_clock. Now it works without panics, but
> seems, that problem is not fixed completely.

I forgot to give one last piece of information: POLLING is in action. Without it, a single-threaded copy from this server via SMB eats one CPU core completely.

--
// Black Lion AKA Lev Serebryakov
em0: Watchdog timeout -- resetting
Hello, Freebsd-stable.

The system is 8-STABLE (8.2-PRERELEASE) with the very latest e1000 driver (cvsupped on 27 Jan; the last commits to e1000 were made on 22 Jan).

The NIC is:

em0: port 0xdc00-0xdc1f mem 0xfea4-0xfea5,0xfea79000-0xfea79fff irq 20 at device 25.0 on pci0
em0: No MSI/MSIX using a Legacy IRQ
em0: [FILTER]

em0@pci0:0:25:0: class=0x02 card=0x82681043 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfea4, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xfea79000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled

It is the on-board LAN of an ASUS P5R-VM DO motherboard (Q35 chipset).

I have these tunables in "/etc/loader.conf":

hw.em.rxd=4096
hw.em.txd=4096

And these non-standard sysctls:

dev.em.0.rx_int_delay=200
dev.em.0.tx_int_delay=200
dev.em.0.rx_abs_int_delay=4000
dev.em.0.tx_abs_int_delay=4000
dev.em.0.rx_processing_limit=4096

Several times a day I get messages like this:

em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 1302, hw tdt = 1265
em0: TX(0) desc avail = 31,Next TX to Clean = 1296
em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 3999, hw tdt = 3959
em0: TX(0) desc avail = 31,Next TX to Clean = 3990
em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 1431, hw tdt = 1394
em0: TX(0) desc avail = 31,Next TX to Clean = 1425

And all connections are reset. Before the latest commits to the driver this system panicked in swi_clock. Now it works without panics, but it seems that the problem is not completely fixed.

--
// Black Lion AKA Lev Serebryakov
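The numbers in the watchdog messages above can be cross-checked with a little ring arithmetic. The sketch below is not taken from the driver; it assumes that tdh and tdt are the hardware transmit descriptor head and tail indices, that "Next TX to Clean" is the driver's cleanup index, and that the ring has 4096 slots (hw.em.txd=4096 from the message). The function and field names are invented for illustration.

```python
import re

RING_SIZE = 4096  # assumption: taken from hw.em.txd=4096 in the message

# The three samples quoted in the message above.
LOG = """\
em0: Queue(0) tdh = 1302, hw tdt = 1265
em0: TX(0) desc avail = 31,Next TX to Clean = 1296
em0: Queue(0) tdh = 3999, hw tdt = 3959
em0: TX(0) desc avail = 31,Next TX to Clean = 3990
em0: Queue(0) tdh = 1431, hw tdt = 1394
em0: TX(0) desc avail = 31,Next TX to Clean = 1425
"""

QUEUE_RE = re.compile(r"tdh = (\d+), hw tdt = (\d+)")
CLEAN_RE = re.compile(r"desc avail = (\d+),\s*Next TX to Clean = (\d+)")


def analyze(log, ring_size=RING_SIZE):
    """Pair each Queue line with the following TX line and derive ring occupancy."""
    heads_tails = [(int(h), int(t)) for h, t in QUEUE_RE.findall(log)]
    cleans = [(int(a), int(c)) for a, c in CLEAN_RE.findall(log)]
    results = []
    for (tdh, tdt), (avail, clean) in zip(heads_tails, cleans):
        results.append({
            # Slots between the tail and the cleanup index: free for the driver.
            "free_by_driver": (clean - tdt) % ring_size,
            "reported_avail": avail,
            # Slots the hardware head has passed but the driver has not cleaned.
            "done_but_uncleaned": (tdh - clean) % ring_size,
            # Slots queued between head and tail, still owned by the hardware.
            "still_pending_in_hw": (tdt - tdh) % ring_size,
        })
    return results


if __name__ == "__main__":
    for sample in analyze(LOG):
        print(sample)
```

For these three samples the computed free count equals the reported "desc avail = 31" each time, so under the stated assumptions the ring was almost completely full at every timeout, with only a handful of descriptors completed by the hardware but not yet cleaned.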
"em0: watchdog timeout -- resetting" revisited
I am having some issues with an em(4) card: I am getting the "em0: watchdog timeout -- resetting" error. The box has a DFI KT600-AL motherboard and an Intel Pro/1000 GT NIC. I am running 6.2-RELEASE. I have RMA'd the card twice already and get the same error. I've tried different PCI slots, different switch ports, and different cables, with the same negative results. I rebuilt the kernel without em support and used the latest em(4) driver from Intel (v6.6.6) as a module. I still get the same error. The card is connected via a Cat6 patch cable to a 3Com OfficeConnect unmanaged GigE switch. The switch LEDs show a 1000bTX full-duplex link.

Below is some info from the system:

Script started on Fri Nov 2 16:48:32 2007
freenas:~# uname -a
FreeBSD freenas.local 6.2-RELEASE-p8 FreeBSD 6.2-RELEASE-p8 #0: Thu Nov 1 14:37:18 EDT 2007 [EMAIL PROTECTED]:/usr/obj/freenas/usr/src/sys/FREENAS-i386 i386
freenas:~# dmesg
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p8 #0: Thu Nov 1 14:37:18 EDT 2007
    [EMAIL PROTECTED]:/usr/obj/freenas/usr/src/sys/FREENAS-i386
MPTable:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) (1999.79-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
  Features=0x383fbff
  AMD Features=0xc0400800
real memory  = 536870912 (512 MB)
avail memory = 461340672 (439 MB)
ioapic0: Assuming intbase of 0
ioapic0 irqs 0-23 on motherboard
wlan: mac acl policy registered
kbd1 at kbdmux0
PadLock: No ACE support.
module_register_init: MOD_LOAD (padlock, 0xc0896bb0, 0) error 22
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
rr174x: RocketRAID 174x controller driver v1.02 (Feb 1 2007 10:51:17)
ACPI-0159: *** Error: AcpiLoadTables: Could not get RSDP, AE_NO_ACPI_TABLES
ACPI-0213: *** Error: AcpiLoadTables: Could not load tables: AE_NO_ACPI_TABLES
ACPI: table load failed: AE_NO_ACPI_TABLES
cpu0 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
agp0: mem 0xd000-0xd7ff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
em0: port 0xd000-0xd03f mem 0xe31a-0xe31b,0xe318-0xe319 irq 16 at device 13.0 on pci0
em0: Ethernet address: 00:1b:21:0a:0d:5a
em0: [FAST]
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd400-0xd40f at device 15.0 on pci0
ata0: on atapci0
ata1: on atapci0
uhci0: port 0xd800-0xd81f irq 21 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xdc00-0xdc1f irq 21 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xe000-0xe01f irq 21 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0xe400-0xe41f irq 21 at device 16.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: on uhci3
usb3: USB revision 1.0
uhub3: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: mem 0xe31c-0xe31c00ff irq 21 at device 16.4 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: on ehci0
usb4: USB revision 2.0
uhub4: VIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
isab0: at device 17.0 on pci0
isa0: on isab0
rr174x0: port 0xe800-0xe8ff mem 0xe300-0xe30f irq 19 at device 20.0 on pci0
rr174x: adapter at PCI 0:20:0, IRQ 19
pmtimer0 on isa0
orm0: at iomem 0xc-0xcf7ff,0xd-0xd0fff,0xd1000-0xd8fff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
ppi0: on ppbus0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown: can't assign resources (port)
speaker0: at port 0x61 on isa0
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounter "TSC" frequency 1999790331 Hz quality 800
Timecounters tick every 10.000 msec
rr174x: sta
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
Hello,

Just wanted to send a "me too" on this issue. Whenever it happens I can see our Cisco switch reporting the interface going down and up as well (Line Protocol).

FreeBSD localhost.localdomain 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Wed Sep 13 00:10:04 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PT i386

em0: flags=8843 mtu 1500
        options=b
        media: Ethernet autoselect (1000baseTX )
        status: active

[EMAIL PROTECTED]:11:0: class=0x02 card=0x10048086 chip=0x10048086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82543GC Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet

(This is an add-in 64-bit PCI card.)

I am stress-testing -STABLE on a spare server to help make 6.2 as bug-free as possible. It is set up as an NFS server with two Linux NFS clients connected, which are concurrently extracting 5 copies of /usr/src to it and running a program that creates millions of files with random UIDs to test for QUOTA issues. On the server I repeatedly dump the exported filesystem with snapshot and cache enabled (dump -L -C 32 -af /dev/null ...).

I'm building today's -STABLE on a different server with SMP and two onboard em NICs, and will start similar tests on it to see if I can reproduce the watchdog timeouts there as well.

--
Frode Nordahl
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On 9/15/06, Martin Nilsson <[EMAIL PROTECTED]> wrote:
> I'm also seeing these on a Supermicro PDSMi board with a recent stable.
> Please tell me what debugging info that is needed to fix this.
>
> /Martin
>
> FreeBSD mailbox 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Sun Sep 10 17:43:15 CEST 2006
> [EMAIL PROTECTED]:/usr/obj-local/usr/src/sys/SMP amd64
>
> lspci -v output:
>
> 04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
>         Subsystem: Super Micro Computer Inc Unknown device 108c
>         Flags: bus master, fast devsel, latency 0, IRQ 16
>         Memory at ed20 (32-bit, non-prefetchable)
>         I/O ports at 4000
>         Capabilities: [c8] Power Management version 2
>         Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
>         Capabilities: [e0] Express Endpoint IRQ 0
>
> 05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
>         Subsystem: Super Micro Computer Inc Unknown device 109a
>         Flags: bus master, fast devsel, latency 0, IRQ 17
>         Memory at ed30 (32-bit, non-prefetchable)
>         I/O ports at 5000
>         Capabilities: [c8] Power Management version 2
>         Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
>         Capabilities: [e0] Express Endpoint IRQ 0

Martin, do you see similar problems using either port? I ask because this system may be similar to one that Yahoo has, where there was a problem with only one port and not the other. Can you check this out, please?

Jack
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On 9/14/06, David C. Myers <[EMAIL PROTECTED]> wrote:
>> watchdogs mean that the transmit ring is not being cleaned, so the
>> question is what is your machine doing at 100% cpu, if its that busy
>> the network watchdogs may just be a side effect and not the real
>> problem?
>
> I get them with a completely idle machine. My home directory is mounted
> via NFS (from FreeBSD 6.1 on an amd64 machine), and with the kernel from
> earlier this week, the machine would just hang for 30 seconds to a couple
> of minutes. A slew of "watchdog timeout" messages would appear. Then I'd
> get a moment's responsiveness out of the machine, then another long wait,
> then a moment's responsiveness, then a long wait... The machine would
> never recover from this cycle (at least, so far as I was patient enough
> to wait).
>
> Going back to a kernel dated late July resolved everything.
>
> Someone else asked me for the hardware version of my em0 board...
>
> [EMAIL PROTECTED]:10:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82540EM Gigabit Ethernet Controller'
>     class    = network
>     subclass = ethernet

Could you perhaps go back to the kernel you say was stable and then drop in the latest em driver? Or, if that has issues building, do it the other way around: take the em driver from the build that gave you no problems and put it on the kernel you are running now?

It would be helpful to know if this is a driver problem or something in the stack.

Cheers,
Jack
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
Something with em0 is really wrong. I don't get timeouts, but I do have a problem. Before cvsup I had 6.0-PRERELEASE and didn't have a problem. Now I have "FreeBSD 6.2-PRERELEASE #8: Fri Sep 15 03:44:49 MSD 2006" and the problem is this (the machine has LARGE_NAT, em0, em1, em2):

On a fresh system, ping to www.ru from a client computer (which goes to the Internet via NAT) is 3-5 ms. After a few hours (I see it at night, when traffic is lower) ping to www.ru is 11-12 ms. Why? After a reboot it is good again for a few hours.

The kernel is FreeBSD/amd64 with:

options DEVICE_POLLING
options HZ=2500

With HZ=1000 and without DEVICE_POLLING nothing changes: ping still goes to 11-12 ms after a few hours.

PS. Should I downgrade to 6.0-RELEASE or earlier, or could tonight's cvsup updates resolve the problem (the files look TCP-related)?

 Checkout src/sys/contrib/ipfilter/netinet/ip_nat.h
 Edit src/sys/netinet/in_pcb.c
 Edit src/sys/netinet/tcp_input.c
 Edit src/sys/netinet/tcp_subr.c
 Edit src/sys/netinet/tcp_timer.c
 Edit src/sys/netinet/tcp_timer.h
 Edit src/sys/netinet/tcp_var.h
 Edit src/sys/sys/param.h
 Edit src/usr.sbin/pkg_install/add/main.c

PPS. I am rebuilding the kernels now and will see tomorrow night.
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On Thu, Sep 14, 2006 at 02:27:29AM +0200, Ronald Klop wrote:
> The manual page em(4) mentions trying another cable when the watchdog
> timeout happens, so I tried that. But it didn't help.
> Is there anything I can test to (help) debug this?
> It happens a lot when my machine is under load. (100% CPU)
> Is it possible that it happens since I upgraded the memory from 1GB to 2
> GB?

I don't think it's the cable. I started getting these recently as well (starting about a week ago), always when there's a lot of CPU and disk I/O load. Sometimes my USB keyboard would also become unresponsive at about the same time (under high load). Sometimes it would stutter and act as if a key were being held down for a second or two.

I built a new kernel (6.2-PRE now) on 9/12. The keyboard problem seems to be gone, but I still get the em watchdog timeouts occasionally.

Craig
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
I'm also seeing these on a Supermicro PDSMi board with a recent stable. Please tell me what debugging info is needed to fix this.

/Martin

FreeBSD mailbox 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Sun Sep 10 17:43:15 CEST 2006
[EMAIL PROTECTED]:/usr/obj-local/usr/src/sys/SMP amd64

lspci -v output:

04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
        Subsystem: Super Micro Computer Inc Unknown device 108c
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ed20 (32-bit, non-prefetchable)
        I/O ports at 4000
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0

05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Super Micro Computer Inc Unknown device 109a
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at ed30 (32-bit, non-prefetchable)
        I/O ports at 5000
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On Fri, 15 Sep 2006 02:06:08 +0200, David C. Myers <[EMAIL PROTECTED]> wrote:

>> watchdogs mean that the transmit ring is not being cleaned, so the
>> question is what is your machine doing at 100% cpu, if its that busy
>> the network watchdogs may just be a side effect and not the real
>> problem?
>
> I get them with a completely idle machine. My home directory is mounted
> via NFS (from FreeBSD 6.1 on an amd64 machine), and with the kernel from
> earlier this week, the machine would just hang for 30 seconds to a couple
> of minutes. A slew of "watchdog timeout" messages would appear. Then I'd
> get a moment's responsiveness out of the machine, then another long wait,
> then a moment's responsiveness, then a long wait... The machine would
> never recover from this cycle (at least, so far as I was patient enough
> to wait).
>
> Going back to a kernel dated late July resolved everything.
>
> Someone else asked me for the hardware version of my em0 board...
>
> [EMAIL PROTECTED]:10:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82540EM Gigabit Ethernet Controller'
>     class    = network
>     subclass = ethernet
>
> -David.

This sounds similar to my problem. I solved it today by enabling polling. I know it's a workaround.

--
Ronald Klop
Amsterdam, The Netherlands
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
> watchdogs mean that the transmit ring is not being cleaned, so the
> question is what is your machine doing at 100% cpu, if its that busy
> the network watchdogs may just be a side effect and not the real
> problem?

I get them with a completely idle machine. My home directory is mounted via NFS (from FreeBSD 6.1 on an amd64 machine), and with the kernel from earlier this week, the machine would just hang for 30 seconds to a couple of minutes. A slew of "watchdog timeout" messages would appear. Then I'd get a moment's responsiveness out of the machine, then another long wait, then a moment's responsiveness, then a long wait... The machine would never recover from this cycle (at least, so far as I was patient enough to wait).

Going back to a kernel dated late July resolved everything.

Someone else asked me for the hardware version of my em0 board...

[EMAIL PROTECTED]:10:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

-David.
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
Jack Vogel wrote:
> On 9/13/06, Ronald Klop <[EMAIL PROTECTED]> wrote:
>> ...
>> The manual page em(4) mentions trying another cable when the watchdog
>> timeout happens, so I tried that. But it didn't help.
>> Is there anything I can test to (help) debug this?
>> It happens a lot when my machine is under load. (100% CPU)
>> Is it possible that it happens since I upgraded the memory from 1GB to 2
>> GB?
>
> watchdogs mean that the transmit ring is not being cleaned, so the
> question is what is your machine doing at 100% cpu, if its that busy
> the network watchdogs may just be a side effect and not the real
> problem?
>
> Jack

I see these too when installing packages over NFS on my laptop. If I run with a low level of network traffic (e.g. an ssh compile) and peg the CPU with a benchmark such as flops, I don't see these timeouts.

6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 26 14:45:40 CDT 2006

[EMAIL PROTECTED]:1:0: class=0x02 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EP Gigabit Ethernet Controller (Mobile)'
    class    = network

Any suggestions?

Thanks
Dan
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On 9/13/06, Ronald Klop <[EMAIL PROTECTED]> wrote:
> ...
> The manual page em(4) mentions trying another cable when the watchdog
> timeout happens, so I tried that. But it didn't help.
> Is there anything I can test to (help) debug this?
> It happens a lot when my machine is under load. (100% CPU)
> Is it possible that it happens since I upgraded the memory from 1GB to 2
> GB?

Watchdogs mean that the transmit ring is not being cleaned, so the question is: what is your machine doing at 100% CPU? If it's that busy, the network watchdogs may just be a side effect and not the real problem.

Jack
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
At 10:20 PM 9/13/2006, David Myers wrote:
>> Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
>
> I got a bazillion of these, and a completely unusable machine, when I
> upgraded to 6.1-stable sources as of two days ago. The machine would
> simply freeze for minutes at a time.
>
> Going back to my previous kernel (dating from late July) made everything
> just fine. So something got seriously broken in the em driver in the
> last few weeks.

Which version of the NIC do you have? (pciconf -lv)

---Mike
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
> Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting

I got a bazillion of these, and a completely unusable machine, when I upgraded to 6.1-stable sources as of two days ago. The machine would simply freeze for minutes at a time.

Going back to my previous kernel (dating from late July) made everything just fine. So something got seriously broken in the em driver in the last few weeks.

-David.
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On Tue, 05 Sep 2006 23:52:05 +0200, Ronald Klop <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I get these errors a lot.
>
> Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
> Sep 5 11:55:12 ronald kernel: em0: link state changed to DOWN
> Sep 5 11:55:14 ronald kernel: em0: link state changed to UP
> Sep 5 12:00:37 ronald kernel: em0: watchdog timeout -- resetting
> Sep 5 12:00:37 ronald kernel: em0: link state changed to DOWN
> Sep 5 12:00:39 ronald kernel: em0: link state changed to UP
>
> I tried turning off rxcsum/txcsum and set these sysctl's.
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 0 (default 66)
> But the error is still there.
> Searching the internet and the list provides more of the same problems,
> but I didn't find an answer.
>
> My dmesg is attached.
>
> Is there any info I need to provide to debug this or can I try patches?

The manual page em(4) mentions trying another cable when the watchdog timeout happens, so I tried that. But it didn't help.

Is there anything I can test to (help) debug this? It happens a lot when my machine is under load (100% CPU). Is it possible that it happens since I upgraded the memory from 1 GB to 2 GB?

(dmesg was attached to my previous mail, but I can provide it again.)

Ronald.

--
Ronald Klop
Amsterdam, The Netherlands
Re: em0: watchdog timeout -- resetting (6.1-STABLE)
On Tuesday 05 September 2006 14:52, Ronald Klop wrote:
> Hello,
>
> I get these errors a lot.
>
> Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
> Sep 5 11:55:12 ronald kernel: em0: link state changed to DOWN
> Sep 5 11:55:14 ronald kernel: em0: link state changed to UP
> Sep 5 12:00:37 ronald kernel: em0: watchdog timeout -- resetting
> Sep 5 12:00:37 ronald kernel: em0: link state changed to DOWN
> Sep 5 12:00:39 ronald kernel: em0: link state changed to UP

So am I. Especially when I transfer a GB or 2 from Windows XP to 6.1-stable. I use the FreeBSD machine as a backup for digital photos and my ripped mp3 files. A photo session is usually in excess of 1 GB and can hang with the watchdog timeout.

Kent

> I tried turning off rxcsum/txcsum and set these sysctl's.
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 0 (default 66)
> But the error is still there.
> Searching the internet and the list provides more of the same
> problems, but I didn't find an answer.
>
> My dmesg is attached.
>
> Is there any info I need to provide to debug this or can I try
> patches?
>
> Ronald.

--
Kent Stewart
Richland, WA
http://www.soyandina.com/ "I am Andean project".
http://users.owt.com/kstewart/index.html
em0: watchdog timeout -- resetting (6.1-STABLE)
Hello,

I get these errors a lot.

Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
Sep 5 11:55:12 ronald kernel: em0: link state changed to DOWN
Sep 5 11:55:14 ronald kernel: em0: link state changed to UP
Sep 5 12:00:37 ronald kernel: em0: watchdog timeout -- resetting
Sep 5 12:00:37 ronald kernel: em0: link state changed to DOWN
Sep 5 12:00:39 ronald kernel: em0: link state changed to UP

I tried turning off rxcsum/txcsum and set these sysctl's.
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 0 (default 66)
But the error is still there.

Searching the internet and the list provides more of the same problems, but I didn't find an answer.

My dmesg is attached.

Is there any info I need to provide to debug this or can I try patches?

Ronald.

--
Ronald Klop
Amsterdam, The Netherlands

Attachment: dmesg.boot (binary data)
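When reporting a problem like this, it can help to say how often the resets happen. The following is a minimal sketch in Python that tallies watchdog lines and the time between them. It assumes syslog lines shaped exactly like the ones quoted in this message; the names WATCHDOG_RE, watchdog_events and SAMPLE are hypothetical, and the year is supplied by hand because syslog does not record it. Month names are parsed with the default (English) locale.

```python
import re
from datetime import datetime

# Pattern for lines such as:
#   Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
# It is an assumption based on the log lines quoted in this thread.
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ifname>\w+): watchdog timeout -- resetting$"
)


def watchdog_events(lines, year=2006):
    """Return a list of (interface, datetime) for each watchdog timeout line."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.match(line.strip())
        if m:
            stamp = datetime.strptime(
                f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            events.append((m.group("ifname"), stamp))
    return events


# The six lines quoted in the message above.
SAMPLE = """\
Sep 5 11:55:12 ronald kernel: em0: watchdog timeout -- resetting
Sep 5 11:55:12 ronald kernel: em0: link state changed to DOWN
Sep 5 11:55:14 ronald kernel: em0: link state changed to UP
Sep 5 12:00:37 ronald kernel: em0: watchdog timeout -- resetting
Sep 5 12:00:37 ronald kernel: em0: link state changed to DOWN
Sep 5 12:00:39 ronald kernel: em0: link state changed to UP
"""

events = watchdog_events(SAMPLE.splitlines())
# Seconds between consecutive watchdog timeouts.
gaps = [(b[1] - a[1]).total_seconds() for a, b in zip(events, events[1:])]
print(len(events), "timeouts; seconds between them:", gaps)
```

For the six lines above this reports two timeouts, 325 seconds apart. Run against a full /var/log/messages, the same function would show whether the resets cluster around periods of heavy load.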