Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Hi,

We have a Dell 1950 with the same problem (bce). We tried debug.mpsafenet=0, but to no avail. It's a very frustrating show-stopper for us as well; we're moving all 1950s out of the production environment.

Any help would be greatly appreciated. See the attached mail to freebsd-current.

Kind regards,
Fredrik Widlund

---BeginMessage---
Hi,

Suddenly the problem occurred again. We are running the same setup as below, but with debug.mpsafenet=0, and it didn't help.

This is indeed a showstopper for us; we are moving all our Dell 1950s out of the production environment until we can solve this issue.

Any help would be greatly appreciated.

Kind regards,
Fredrik Widlund

bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
[repeat 30 times]

# vmstat -i
interrupt                   total       rate
irq14: ata0                    47          0
irq16: bce0 bce1             3019          5
irq18: mfi0                   123          0
irq21: uhci0 uhci+              6          0
irq64: mpt0                  1214          2
cpu0: timer               1118344       1997
Total                     1122753       2004

Fredrik Widlund wrote:
> Hi,
>
> I can't reproduce the problem. Everything is exactly the same, but I get no timeouts and the NIC seems to work without any problems.
>
> Kind regards,
> Fredrik Widlund
>
> Fredrik Widlund wrote:
>> Hi,
>>
>> An update: right now the BCE NIC seems to work, I'm not sure exactly why yet. I'm attaching the dmesg however.
>>
>> SAS adapter is the PERC 5I, which is handled by the MPT driver in 6.2-Beta2. I'll continue to look at this. There are some unhandled events (0x12, 0x16), but these might not be needed.
>>
>> [mpi_ioc.h]
>> #define MPI_EVENT_SAS_PHY_LINK_STATUS (0x0012)
>> ...
>> #define MPI_EVENT_SAS_DISCOVERY (0x0016)
>>
>> [dmesg mpt part]
>> mpt0: LSILogic SAS/SATA Adapter port 0xec00-0xecff mem 0xfc7fc000-0xfc7f,0xfc7e-0xfc7e irq 64 at device 8.0 on pci2
>> mpt0: [GIANT-LOCKED]
>> mpt0: MPI Version=1.5.12.0
>> mpt0: mpt_cam_event: 0x16
>> mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
>> mpt0: mpt_cam_event: 0x12
>> mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
>> mpt0: mpt_cam_event: 0x16
>> mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
>>
>> Kind regards,
>> Fredrik Widlund
>>
>> Fredrik Widlund wrote:
>>> Hi,
>>>
>>> I'm trying to get FreeBSD working on Dell 1950 (and 2950), which is vital since it's no longer possible to buy 1850/2850 units here.
>>>
>>> Hardware: PE1950 Xeon 5130, 2GB 667MHz SAS 5I PERC5E
>>>
>>> 6.1-RELEASE: not possible since SAS drives aren't found.
>>> 6.2-BETA2: bce interfaces do not work at all, a watchdog timeout occurred every other second, and _no_ connectivity.
>>>
>>> We are also having problems with some PE1850 failing from time to time with watchdog timeout hangs, and have had to debug.mpsafenet=0 these.
>>>
>>> How can we help solve this issue? It would really be a pity to be forced to leave FreeBSD but we really can't afford to replace our choice of hardware platform.
>>>
>>> Kind regards,
>>> Fredrik Widlund

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-BETA2 #0: Mon Oct 2 03:32:44 UTC 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5130 @ 2.00GHz (1995.01-MHz 686-class CPU)
Origin = GenuineIntel Id = 0x6f6 Stepping = 6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,b9,CX16,b14,b15,b18>
AMD Features=0x2010<NX,LM>
AMD Features2=0x1<LAHF>
Cores per package: 2
real memory = 2147123200 (2047 MB)
avail memory = 2096009216 (1998 MB)
ACPI APIC Table: DELL PE_SC3
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 2
ioapic1: Changing APIC ID to 3
ioapic1: WARNING: intbase 64 != expected base 24
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 64-87 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
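The watchdog resets in console output like the above can be tallied per interface with a few lines of Python. This is only a sketch: the regular expression is an assumption built from the two message spellings quoted in this thread (bce's "Watchdog timeout occurred, resetting!" and em's "watchdog timeout -- resetting"), and other drivers may word the message differently.

```python
import re
from collections import Counter

# Assumed pattern: "<ifname>: ... watchdog timeout ...", matched case-insensitively.
WATCHDOG_RE = re.compile(r"\b([a-z]+\d+):.*watchdog timeout", re.IGNORECASE)


def count_watchdogs(lines):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the reports in this thread.
sample = [
    "bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!",
    "bce0: link state changed to DOWN",
    "bce0: link state changed to UP",
    "Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting",
]
print(count_watchdogs(sample))
```

Feeding it the contents of a saved console log or /var/log/messages would give a quick per-NIC count to compare between kernels.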
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Scott Long ([EMAIL PROTECTED]) on 04/10/2006 at 14:49 wrote:

>> #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56 # OK
>> #
>> #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00 # BROKEN
>> ...
>> #*default release=cvs tag=RELENG_6 # BROKEN
>>
>> From sys commitlogs the culprit commits are:
>>
>> glebius 2006-08-08 09:19:25 utc
>> glebius 2006-08-08 09:20:26 utc
>
> So you tested before these two changes and after these two changes, yes?

Yes that's it.

> What about with just the first change and not the second? Anyways, I'm

Because building a kernel that only has the first change (2006-08-08 09:19:25) fails.

> Can you try a quick test? Reboot and press '6' at the FreeBSD loader menu. That will drop you to a prompt. Then enter the following line:
>
> set hint.apic.0.disabled=1

Done: synced to STABLE-6 of this morning (9:00 UTC), made world and kernel, and booted with APIC disabled. Still the same freeze after starting X and loading a few tabs in Firefox.

Thanks for the suggestion Scott.

--
bug

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
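The narrowing-down discussed here amounts to a bisection over checkout dates. A minimal Python sketch, assuming the breakage is monotonic (every checkout after the first broken one is also broken). `is_broken` is a hypothetical callback standing in for "check out that date, build, boot and test"; it does not handle an intermediate checkout that fails to build, which is the situation described in this message.

```python
def first_broken(dates, is_broken):
    """Binary search for the earliest date whose checkout is broken.

    dates must be sorted ascending, with dates[0] known good
    and dates[-1] known broken.
    """
    lo, hi = 0, len(dates) - 1  # invariant: dates[lo] is OK, dates[hi] is BROKEN
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(dates[mid]):
            hi = mid
        else:
            lo = mid
    return dates[hi]


# Dates in the cvsup "date=" format used in this thread.
dates = [
    "2006.06.23.17.00.00",
    "2006.08.08.09.12.56",
    "2006.08.08.09.21.00",
]


def reported_broken(date):
    """Stand-in for a real build-and-test run: the results reported in the thread."""
    return date >= "2006.08.08.09.21.00"


print(first_broken(dates, reported_broken))
```

With more intermediate dates in the list, the same loop needs only about log2(n) build-and-test cycles.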
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:

> One thing this patch definitely did do though, is break the nvidia driver pretty badly. Couldn't keep the X server running for more than a minute before it froze solid. Lots of Xid: blah blah blah messages.
>
> Yes I remembered to rebuild the kernel module ;)

Hi,

Since rebuilding to 6.2-PRERELEASE

FreeBSD 6.2-PRERELEASE #1: Mon Oct 2 15:24:04 CEST 2006 DEBUG i386

on a box having em sharing IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

interrupt                   total       rate
irq1: atkbd0                    5          0
irq14: ata0                    47          0
irq16: nvidia0 em+          86545        185
irq17: fwohci0                  7          0
irq21: twe0                  6426         13
cpu0: timer                927735       1986
Total                     1020765       2185

I freeze the box by starting Firefox, which reloads a few tabs I keep open in my session when under X. This is perfectly reproducible. From the logs, first I see:

Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010597
Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel
Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010598
Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010599
Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059a
Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059b
Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059c
Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059d
Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059e
Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059f
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a0

then come the watchdogs:

Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a2
Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a3
Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a4
Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a5
Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:49 mojito kernel: em0: link state changed to UP

and the box ends up frozen less than a minute later. The traffic on the Intel card can be low (pinging a host for a few dozen seconds), medium (reloading a few pages in the tabs of Firefox) or high (downloading several ISO images from our local FTP mirror): whatever I do, if both nvidia and em0 are used, the box freezes.

Note that I can't freeze the box when doing several simultaneous big downloads or tarring up a lot of files but NOT running X. So I guess it is a shared nvidia/em IRQ issue.

FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem. The DEBUG kernconf is GENERIC + witness options enabled (but they do not help in this case).

I traced back to find which changeset introduced the trouble. The results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00 # OK
...
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56 # OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00 # BROKEN
...
#*default release=cvs tag=RELENG_6 # BROKEN

From sys commitlogs the culprit commits are:

glebius 2006-08-08 09:19:25 utc
freebsd src repository
modified files:
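The `irq16: nvidia0 em+` row in the `vmstat -i` table is what points at a shared interrupt. A small Python sketch for picking such rows out of saved output, assuming each row has the layout "irqN: device names, total, rate" as in the table in this message (the sample keeps total and rate as separate columns), and assuming a trailing `+` means the device list was cut short:

```python
def shared_irqs(vmstat_output):
    """Return {irq: [device names]} for interrupt rows that list several devices."""
    shared = {}
    for line in vmstat_output.splitlines():
        if not line.startswith("irq"):
            continue  # skip the header, the cpu timer row and the Total row
        irq, rest = line.split(":", 1)
        fields = rest.split()
        devices = fields[:-2]  # the last two columns are total and rate
        if len(devices) > 1 or any(d.endswith("+") for d in devices):
            shared[irq] = devices
    return shared


sample = """\
interrupt total rate
irq1: atkbd0 5 0
irq14: ata0 47 0
irq16: nvidia0 em+ 86545 185
irq17: fwohci0 7 0
irq21: twe0 6426 13
cpu0: timer 927735 1986
Total 1020765 2185
"""
print(shared_irqs(sample))
```

Running it on output from each affected machine would make it easier to compare which reports involve a shared IRQ and which do not.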
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
In response to Scott Long [EMAIL PROTECTED]:

> Corrected patch is at:
>
> http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

I have a Dell 1950 here that's been dedicated to helping solve this problem. I can reliably reproduce the watchdog timeout by doing the following steps:

1) Mount /usr/src via nfs
2) start a -j99 buildworld
3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory

Usually only takes a few minutes before a watchdog occurs, and I have no more networking.

Your patch applied cleanly, and everything built OK. The results are:

a) My USB keyboard stopped working :(
b) The problem does _not_ improve.

In my case, it's a bce driver that's doing it. I also have some em cards in this machine that I can test if the information will be helpful.

This is quite a show-stopper for us, if there's any other testing/etc I can do, _please_ let me know. I might even be able to get remote console access to this machine approved for a developer.

--
Bill Moran
Collaborative Fusion Inc.

IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission.
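For anyone wanting to repeat the three reproduction steps on their own hardware, they can be written down as a small Python helper that only builds the command strings and runs nothing. This is a sketch under assumptions: `nfs_export` is a hypothetical server:path for wherever /usr/src lives, and the exact mount and make invocations are guesses at what the steps mean rather than the commands actually typed.

```python
def repro_commands(nfs_export, big_dir="/big/directory", jobs=99):
    """Return the shell commands for the three reproduction steps (not executed)."""
    return [
        # 1) mount /usr/src via NFS (nfs_export is a placeholder, e.g. "server:/export/src")
        f"mount -t nfs {nfs_export} /usr/src",
        # 2) start a heavily parallel buildworld on the NFS-mounted tree
        f"cd /usr/src && make -j{jobs} buildworld",
        # 3) on a different terminal, write a large tarball onto the NFS mount
        f"tar czvf /usr/src/temp.tgz {big_dir}",
    ]


for step, command in enumerate(repro_commands("server:/export/src"), start=1):
    print(f"{step}) {command}")
```

Steps 2 and 3 are meant to run at the same time, so that the interface carries NFS reads and writes concurrently.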
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
In response to Bill Moran [EMAIL PROTECTED]:

> In my case, it's a bce driver that's doing it. I also have some em cards in this machine that I can test if the information will be helpful.

Note that I can _not_ reproduce the problem with an em interface (a PCI NIC). As mentioned earlier, I can reliably and easily produce a watchdog timeout on the bce interface (onboard). The em interface seems rock-solid.

I guess I have a workaround for now, but the offer to test/provide more information stands.

--
Bill Moran
Collaborative Fusion Inc.
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Wed, Oct 04, 2006 at 10:40:25AM -0400, Bill Moran wrote:

> In response to Scott Long [EMAIL PROTECTED]:
>
>> Corrected patch is at:
>>
>> http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
>
> I have a Dell 1950 here that's been dedicated to helping solve this problem. I can reliably reproduce the watchdog timeout by doing the following steps:
>
> 1) Mount /usr/src via nfs
> 2) start a -j99 buildworld
> 3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory
>
> Usually only takes a few minutes before a watchdog occurs, and I have no more networking.
>
> Your patch applied cleanly, and everything built OK. The results are:
>
> a) My USB keyboard stopped working :(
> b) The problem does _not_ improve.
>
> In my case, it's a bce driver that's doing it. I also have some em cards in this machine that I can test if the information will be helpful.
>
> This is quite a show-stopper for us, if there's any other testing/etc I can do, _please_ let me know. I might even be able to get remote console access to this machine approved for a developer.

Remote console access would be a help. I suspect there may be more than one problem here.

Kris
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
At 12:27 PM 10/4/2006, Bill Moran wrote:

> In response to Bill Moran [EMAIL PROTECTED]:
>
>> In my case, it's a bce driver that's doing it. I also have some em cards in this machine that I can test if the information will be helpful.
>
> Note that I can _not_ reproduce the problem with an em interface (a PCI NIC). As mentioned earlier, I can reliably and easily produce

Hi,

Just to clarify, you mean without the patch you do run into the problem, but with the patch you cannot generate the problem? Or with the em NIC, you have never seen the issue at all?

---Mike
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
In response to Mike Tancsa [EMAIL PROTECTED]:

> At 12:27 PM 10/4/2006, Bill Moran wrote:
>
>> In response to Bill Moran [EMAIL PROTECTED]:
>>
>>> In my case, it's a bce driver that's doing it. I also have some em cards in this machine that I can test if the information will be helpful.
>>
>> Note that I can _not_ reproduce the problem with an em interface (a PCI NIC). As mentioned earlier, I can reliably and easily produce
>
> Hi,
>
> Just to clarify, you mean without the patch you do run into the problem, but with the patch you cannot generate the problem? Or with the em NIC, you have never seen the issue at all?

Without patch:
* bce locks up easily
* Unable to lock up em
* keyboard works

With patch:
* bce locks up easily
* unable to lock up em
* keyboard doesn't work

--
Bill Moran
Collaborative Fusion Inc.
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
In response to Kris Kennaway [EMAIL PROTECTED]:

>> This is quite a show-stopper for us, if there's any other testing/etc I can do, _please_ let me know. I might even be able to get remote console access to this machine approved for a developer.
>
> Remote console access would be a help. I suspect there may be more than one problem here.

In progress ... I'll contact you privately when it's ready.

--
Bill Moran
Collaborative Fusion Inc.
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Hi,

> What about with just the first change and not the second? Anyways, I'm starting to see a trend here. Problem reports are clustering around UP systems, not SMP systems. I don't know if that's just coincidence or not.

We also have about twenty SMP systems, seven of them now with 6.1 Prerelease, and we don't have any affected systems. bge and em cards are working fine, even under high load situations.

Martin
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
I also have been using em (on-board NIC) with SMP without any problems. I just upgraded to check and all is still fine:

New kernel: FreeBSD 6.2-PRERELEASE #7: Mon Oct 2 15:15:47 PDT 2006
Old kernel: FreeBSD 6.1-STABLE #4: Wed Sep 6 16:01:23 PDT 2006

I also have nvidia and use Firefox with pre-saved tabs (~30); all works fine even on re-loading.

Let me know if you would like any other info.

Jorge

On Thu, 5 Oct 2006, Martin Blapp wrote:

> Hi,
>
> We've got also about twenty SMP Systems, seven of them now with 6.1 Prerelease and we don't have any affected systems. bge- and em- cards are working fine, even under high load situations.
>
> Martin
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Martin Blapp wrote:

> Hi,
>
>> What about with just the first change and not the second? Anyways, I'm starting to see a trend here. Problem reports are clustering around UP systems, not SMP systems. I don't know if that's just coincidence or not.
>
> We've got also about twenty SMP Systems, seven of them now with 6.1 Prerelease and we don't have any affected systems. bge- and em- cards are working fine, even under high load situations.
>
> Martin

I remember having this problem a few years ago on an OpenBSD box with 2 NICs. At that time, I found a mailing list post outlining a process where you'd enter a break sequence to get to a command prompt before booting and enter some command there, I believe to disable ACPI, and that would help. It's been like 3-4 years, so I don't remember the details.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Are you enabling an option, like IPv6, that puts Giant over the network stack?

Am not enabling anything, but if INET6 is part of GENERIC (which I think it is, isn't it?) then I would have that in my kernels, as they basically look like this:

include GENERIC
options SMP
device pf
device atapicam
options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_CDNR
options ALTQ_PRIQ
options ALTQ_NOPCC

Actually, how do I 'unoption' something which has already been included? Is there some equivalent to 'nodevice' for options?

-pete.

> Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Sun, Oct 01, 2006 at 01:37:38PM +0100, Pete French wrote:

>> Are you enabling an option, like IPv6, that puts Giant over the network stack?
>
> Am not enabling anything, but if INET6 is part of GENERIC (which I think it is isn't it?) then I would have that in my kernels as they basically look like this:
>
> include GENERIC
> options SMP
> device pf
> device atapicam
> options ALTQ
> options ALTQ_CBQ
> options ALTQ_RED
> options ALTQ_RIO
> options ALTQ_HFSC
> options ALTQ_CDNR
> options ALTQ_PRIQ
> options ALTQ_NOPCC
>
> Actually, how do I 'unoption' something which has already been included, is there some equivalent to 'nodevice' for options ?

nooption

Kris
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Sun, Oct 01, 2006 at 01:37:38PM +0100, Pete French wrote:

>> Are you enabling an option, like IPv6, that puts Giant over the network stack?
>
> Am not enabling anything, but if INET6 is part of GENERIC (which I think it is isn't it?) then I would have that in my kernels as they basically look like this:
>
> include GENERIC
> options SMP
> device pf
> device atapicam
> options ALTQ
> options ALTQ_CBQ
> options ALTQ_RED
> options ALTQ_RIO
> options ALTQ_HFSC
> options ALTQ_CDNR
> options ALTQ_PRIQ
> options ALTQ_NOPCC
>
> Actually, how do I 'unoption' something which has already been included, is there some equivalent to 'nodevice' for options ?

Yes, there is such a thing. It is (not too surprisingly) spelled 'nooption' and is actually documented in the config(5) manpage.

--
Insert your favourite quote here.
Erik Trulsson
[EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Just an observation.

All the boxes I've had this problem on have _two_ em interfaces. I have never seen it on my boxes with just one em NIC.

The error is always an em0 timeout, never em1 (I haven't seen any!). Yesterday my local network got completely wacky, and the gateway had em0 timeouts on the screen. But em0 is the _outside_ interface; the Windows box that I had to reboot was attached to the inside on em1!

Could there be something wrong in the driver if we have more than one em interface?

Regards,
Martin
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Martin Nilsson wrote:

> Just an observation.
>
> All the boxes I've had this problem on have _two_ em interfaces. I have never seen it on my boxes with just one em NIC.
>
> The error is always em0 timeout - never em1 (I haven't seen any!) Yesterday my local network got completely wacky, the gateway had em0 timeouts on the screen: but em0 is the _outside_ the windows box that I had to reboot was attached to the inside on em1!
>
> Could there be something wrong in the driver if we have more than one em interface?
>
> Regards,
> Martin

Multiple instances of the driver have no knowledge of each other. Nothing between them is shared. Even if they share an interrupt, it is a detail that is hidden.

Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Just an observation.
>
> All the boxes I've had this problem on have _two_ em interfaces. I have never seen it on my boxes with just one em NIC.
>
> The error is always em0 timeout - never em1 (I haven't seen any!) Yesterday my local network got completely wacky, the gateway had em0 timeouts on the screen: but em0 is the _outside_ the windows box that I had to reboot was attached to the inside on em1!
>
> Could there be something wrong in the driver if we have more than one em interface?

A machine I have here that shows the problem has one fxp and one em and the timeouts occur on both interfaces.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Just to add a data point: I just upgraded feral.com to the latest RELENG_6 branch. I have a dual port em for internal networks and I've never seen the problems reported.

Also, for -current, things have now been stable again for the last week or so for em on multiple machines (most of which have dual em i/f's).
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM -0400:

> On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
>> On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
>>> On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
>>>> http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
>>>
>>> At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.
>>
>> I can confirm that despite the other side effect I already mentioned, this patch does fix or at least mask the problem I'm seeing with em (and probably usb).
>
> Which is odd since the hypothesis Scott was working on should have shown up clearly in the mutex trace, but did not. But it is consistent with there being a beat-frequency problem with respect to the scheduler.

I think the number you really need is not how long Giant was held but how long was spent waiting for it.

Paul
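The distinction between time holding a lock and time waiting for it can be illustrated with a user-space analogy. The Python sketch below is not kernel mutex profiling and says nothing about Giant itself; `TimedLock` is a hypothetical wrapper, written only to show that the two totals are measured at different points and can differ a lot.

```python
import threading
import time


class TimedLock:
    """Lock wrapper that accumulates total time waiting for it and holding it."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock
        self._acquired_at = None
        self.wait_time = 0.0
        self.hold_time = 0.0

    def __enter__(self):
        requested = self._clock()
        self._lock.acquire()
        self._acquired_at = self._clock()
        # Counters are only touched while the lock is held, so waiters do not race.
        self.wait_time += self._acquired_at - requested
        return self

    def __exit__(self, *exc):
        self.hold_time += self._clock() - self._acquired_at
        self._lock.release()
        return False


big_lock = TimedLock()


def worker():
    with big_lock:
        time.sleep(0.01)  # pretend to do some work under the lock


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"held {big_lock.hold_time:.3f}s, waited {big_lock.wait_time:.3f}s")
```

With several contending threads the summed wait time usually comes out larger than the summed hold time, because each waiter sits through the hold periods of everyone ahead of it.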
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Craig Boston wrote:
> On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.
> One thing this patch definitely did do though, is break the nvidia driver pretty badly. Couldn't keep the X server running for more than a minute before it froze solid. Lots of Xid: blah blah blah messages. Yes I remembered to rebuild the kernel module ;)
> Oh, and if anyone is curious, I am able to reproduce the problem after booting without nvidia.ko loaded, using qemu in -nographic mode. Just wanted to rule that out since it's code that's out of our control and would be a prime target to blame if I didn't.
> Craig

My patch shouldn't have a single effect on nvidia. It just gets the USB out of the way of other drivers. Weird. But what does 'blah blah' translate into?

Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
David G Lawrence wrote:
> > > Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> > I am running this on a single processor machine with an SMP kernel and it does not have any effect. I don't think I have any single processor machines running a non SMP kernel to try it on though. Not particularly helpful I know. I'll
> Actually, I think it is helpful to know that the program only has an effect on some machines. We just need to figure out what the common denominator is.

Are you enabling an option, like IPv6, that puts Giant over the network stack?

Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Are you enabling an option, like IPv6, that puts Giant over the network stack?

From dmesg:

WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.

...the kernel has IPSEC.

-DG
David G. Lawrence
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Sat, 30 Sep 2006, Scott Long wrote:
> David G Lawrence wrote:
> > > > Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> > > I am running this on a single processor machine with an SMP kernel and it does not have any effect. I don't think I have any single processor machines running a non SMP kernel to try it on though. Not particularly helpful I know. I'll
> > Actually, I think it is helpful to know that the program only has an effect on some machines. We just need to figure out what the common denominator is.
> Are you enabling an option, like IPv6, that puts Giant over the network stack?

IPv6 has Giant over its netisr, but not over the entire network stack. If Giant is being placed over the stack due to use of an option that forces it (such as KAME IPSEC) you should be able to grep this out of dmesg by doing something along the lines of the following:

grep "WARNING: debug.mpsafenet" /var/run/dmesg

Robert N M Watson
Computer Laboratory
University of Cambridge
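For anyone who would rather check a saved boot log from a program than with grep, the same test can be sketched in C. This is only an illustration: the function names `has_mpsafenet_warning` and `file_has_mpsafenet_warning` are made up for this example, and the path passed in is whatever file holds your dmesg output on your system.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the text contains the warning the kernel prints when it
 * forces debug.mpsafenet off, 0 otherwise. (Hypothetical helper name.) */
int has_mpsafenet_warning(const char *text)
{
    return strstr(text, "WARNING: debug.mpsafenet") != NULL;
}

/* Scans a saved dmesg file line by line.
 * Returns 1 if the warning is present, 0 if not, -1 if the file
 * cannot be opened. (Hypothetical helper name.) */
int file_has_mpsafenet_warning(const char *path)
{
    char line[1024];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = has_mpsafenet_warning(line);
    fclose(fp);
    return found;
}
```

A result of 1 means some kernel option put Giant back over the whole network stack, which is the condition being asked about in this part of the thread.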
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, 29 Sep 2006, David G Lawrence wrote:
> > Are you enabling an option, like IPv6, that puts Giant over the network stack?
> From dmesg:
> WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
> WARNING: MPSAFE network stack disabled, expect reduced performance.
> ...the kernel has IPSEC.

If you're not using IPv6 over IPSEC, consider trying FAST_IPSEC instead.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Fri, Sep 29, 2006 at 11:05:35PM -0700, Paul Allen wrote:
> From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM -0400:
> > On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
> > > On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
> > > > On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > > > > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> > > > At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.
> > > I can confirm that despite the other side effect I already mentioned, this patch does fix or at least mask the problem I'm seeing with em (and probably usb).
> > Which is odd since the hypothesis Scott was working on should have shown up clearly in the mutex trace, but did not.
> But it is consistent with there being a beat-frequency problem with respect to the scheduler. I think the number you really need is not how long Giant was held but how long was spent waiting for it.

It also seemed to show that nothing was really waiting for it (the cnt_* entries).

Kris
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Sat, Sep 30, 2006 at 12:14:17AM -0600, Scott Long wrote:
> > One thing this patch definitely did do though, is break the nvidia driver pretty badly. Couldn't keep the X server running for more than a minute before it froze solid. Lots of Xid: blah blah blah messages. Yes I remembered to rebuild the kernel module ;)
> My patch shouldn't have a single effect on nvidia. It just gets the USB out of the way of other drivers. Weird. But what does 'blah blah' translate into?

It didn't make any sense to me either after looking at the patch... I'm 100% sure that was the only change between boots, and it started working again after I reverted the sys/dev/usb directory and rebuilt. (svk is great for juggling patch sets around)

That's one of the reasons I briefly suspected the nvidia driver causing problems somewhere, so I removed that from the mix just to be sure.

'blah blah' translates into numbers that mean nothing to me, but they may be useful to someone:

Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head Count 0ae5
Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae4
Sep 29 16:57:11 kernel: NVRM: Xid (0001:00): 8, Channel
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head Count 0ae6
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae5
Sep 29 16:57:19 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head Count 0ae7
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae6
Sep 29 16:57:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head Count 0ae8
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae7
Sep 29 16:57:35 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head Count 0ae9
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae8
Sep 29 16:57:43 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:49 kernel: NVRM: Xid (0001:00): 16, Head Count 0aea
Sep 29 16:58:19 kernel: NVRM: Xid (0001:00): 8, Channel
Sep 29 16:58:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:58:51 last message repeated 3 times
Sep 29 16:58:51 kernel: NVRM: Xid (0001:00): 7, Ch 0001 M D bfef0007 intr 0001
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Sat, Sep 30, 2006 at 02:39:06PM -0400, Kris Kennaway wrote:
> > > Which is odd since the hypothesis Scott was working on should have shown up clearly in the mutex trace, but did not.
> > But it is consistent with there being a beat-frequency problem with respect to the scheduler. I think the number you really need is not how long Giant was held but how long was spent waiting for it.
> It also seemed to show that nothing was really waiting for it (the cnt_* entries).

I can set up a serial console and poke around in DDB during my test case if anyone thinks some useful information can be found. Unfortunately I'm remote from the machine right now so I won't be able to do that until Monday :/

Craig
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
I wonder if this is related to the breakage of the Rocketport driver (PR is open, but it appears that nobody has looked at it.)

It breaks specifically when I use a piece of software that does a lot of SELECTs on a terminal line to do pretty much what poll does, but it is not specific to a uniprocessor or SMP kernel - it is reliably hosed in both cases.

--
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant Kids Rights Activist
http://www.denninger.net My home on the net - links to everything I do!
http://scubaforum.org Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com Musings Of A Sentient Mind

On Fri, Sep 29, 2006 at 05:14:33AM -0700, David G Lawrence wrote:
> > > Do you have any history of seeing the watchdog timeout problem on your machine?
> > On this machine no - but it's the only one running em0. On other machines running bge0 then, yes, I see it a lot. But those are all SMP machines, aside from one. On that one I am currently building the latest 6-STABLE and when it's done (give it a couple of hours) I will give it a shot with your code and see what happens.
> Another data point: After rebooting my machine, the program no longer causes the problem. It appears that something else has to occur first on the machine to put it into a state that makes it susceptible to the program.
> -DG
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.

WARNING: This program will kill the network on your 6.x server. Do not run this on a production machine unless you are on the console and can ctrl-C it!

-DG
David G. Lawrence

#include <sys/poll.h>

int
main(void)
{
	struct pollfd pfd;

	pfd.fd = 1;		/* stdout */
	pfd.events = POLLOUT;
	pfd.revents = 0;

	/* stdout is normally writable, so poll() returns at once and this spins. */
	while (1) {
		if (poll(&pfd, 1, -1) < 0)
			break;
	}
	return 0;
}
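To see why the program above turns into a tight system-call loop without actually taking a machine's network down, here is a bounded sketch of the same call pattern. It is not the attached program: the function names are invented for this illustration, the loop runs a fixed number of times instead of forever, and the demo polls the write end of a fresh pipe rather than stdout so that the descriptor is known to be writable.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Polls `fd` for POLLOUT `iterations` times with an infinite timeout and
 * counts how many calls reported the descriptor writable.
 * Returns -1 if poll() itself fails. (Hypothetical helper name.) */
int count_ready_polls(int fd, int iterations)
{
    struct pollfd pfd;
    int ready = 0;

    assert(iterations >= 0);
    pfd.fd = fd;
    pfd.events = POLLOUT;

    for (int i = 0; i < iterations; i++) {
        pfd.revents = 0;
        /* Timeout -1 means "wait forever", but a writable descriptor
         * makes poll() return immediately every time. */
        int n = poll(&pfd, 1, -1);
        if (n < 0)
            return -1;
        if (n > 0 && (pfd.revents & POLLOUT))
            ready++;
    }
    return ready;
}

/* Runs the bounded loop against the write end of a new, empty pipe,
 * which has buffer space and is therefore writable. */
int count_ready_polls_on_pipe(int iterations)
{
    int fds[2];
    int ready;

    if (pipe(fds) != 0)
        return -1;
    ready = count_ready_polls(fds[1], iterations);
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

Every call comes back ready, so the process never sleeps in poll(); it just enters and leaves the kernel as fast as it can. Whether that alone is enough to starve interrupt threads is exactly what the rest of the thread is trying to establish.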
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.

Oh, one more thing - I've only tried this on uni-processor machines. The only MP machine that I have here is a production machine that I can't test this on right now. If running this on an SMP machine doesn't show the problem, then try running multiple copies of it (one for each CPU).

-DG
David G. Lawrence
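The "one copy per CPU" suggestion can be automated along these lines. This is a hedged sketch rather than anything posted to the list: `run_loop_per_cpu` and `bounded_poll_loop` are names invented here, the loop is bounded so the children exit on their own, and `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension (present on FreeBSD and Linux) rather than something POSIX guarantees.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same call pattern as the test program, but limited to `iterations`. */
static void bounded_poll_loop(int fd, int iterations)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    for (int i = 0; i < iterations; i++) {
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0)
            break;
    }
}

/* Forks one child per online CPU, each running the bounded loop on `fd`,
 * then waits for them. Returns the number of children reaped.
 * (Hypothetical helper name.) */
int run_loop_per_cpu(int fd, int iterations)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    int started = 0;
    int reaped = 0;

    if (ncpu < 1)
        ncpu = 1;               /* fall back to a single copy */

    for (long i = 0; i < ncpu; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* could not start more copies */
        if (pid == 0) {
            bounded_poll_loop(fd, iterations);
            _exit(0);
        }
        started++;
    }
    while (reaped < started && wait(NULL) > 0)
        reaped++;
    return reaped;
}
```

Passing `STDOUT_FILENO` as `fd` matches the original program most closely; any descriptor that is known to be writable gives the same spinning behaviour.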
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> WARNING: This program will kill the network on your 6.x server. Do not run this on a production machine unless you are on the console and can ctrl-C it!

I have tried this program on my workstation and I have not got any timeouts; the network works fine.

sysadm:~uname -a
FreeBSD sysadm.stc 6.1-STABLE FreeBSD 6.1-STABLE #4: Fri Aug 11 14:11:18 MSD 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SYSADM amd64
sysadm:~ ifconfig
nve0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	inet6 fe80::2e0:81ff:fe55:bc54%nve0 prefixlen 64 scopeid 0x1
	inet 192.168.2.26 netmask 0xff00 broadcast 192.168.2.255
	inet 192.168.2.55 netmask 0x broadcast 192.168.2.55
	ether 00:e0:81:55:bc:54
	media: Ethernet autoselect (100baseTX full-duplex)
	status: active
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> > Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> > WARNING: This program will kill the network on your 6.x server. Do not run this on a production machine unless you are on the console and can ctrl-C it!
> I have tried this program on my workstation and I have not got any timeouts, network works good.
> sysadm:~uname -a
> FreeBSD sysadm.stc 6.1-STABLE FreeBSD 6.1-STABLE #4: Fri Aug 11 14:11:18

Is this build date also about the same date that you cvsup'd the sources?

> MSD 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SYSADM amd64
> sysadm:~ ifconfig
> nve0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> 	inet6 fe80::2e0:81ff:fe55:bc54%nve0 prefixlen 64 scopeid 0x1
> 	inet 192.168.2.26 netmask 0xff00 broadcast 192.168.2.255
> 	inet 192.168.2.55 netmask 0x broadcast 192.168.2.55
> 	ether 00:e0:81:55:bc:54
> 	media: Ethernet autoselect (100baseTX full-duplex)
> 	status: active

Is this a UP machine or MP machine?

-DG
David G. Lawrence
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 01:16:47AM -0700, David G Lawrence wrote:
> Is this a UP machine or MP machine?

Dualcore AMD64.

sysadm:~sysctl hw.ncpu
hw.ncpu: 2
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.

I am running this on a single processor machine with an SMP kernel and it does not have any effect. I don't think I have any single processor machines running a non SMP kernel to try it on though. Not particularly helpful I know. I'll try building a non SMP kernel for this machine if I can.

-pete.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> > Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> I am running this on a single processor machine with an SMP kernel and it does not have any effect. I don't think I have any single processor machines running a non SMP kernel to try it on though. Not particularly helpful I know. I'll

Actually, I think it is helpful to know that the program only has an effect on some machines. We just need to figure out what the common denominator is.

> try building a non SMP kernel for this machine if I can.

Do you have any history of seeing the watchdog timeout problem on your machine?

-DG
David G. Lawrence
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Do you have any history of seeing the watchdog timeout problem on your machine?

On this machine no - but it's the only one running em0. On other machines running bge0 then, yes, I see it a lot. But those are all SMP machines, aside from one. On that one I am currently building the latest 6-STABLE and when it's done (give it a couple of hours) I will give it a shot with your code and see what happens.

-pete.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> Do you have any history of seeing the watchdog timeout problem on your machine?

O.K., I just finished compiling a uniprocessor kernel for the machine on which I had been seeing bge0 timeouts, and the lopppoll.c code has no effect there. The kernel I am running is the latest STABLE from a couple of hours ago.

-pete.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> > Do you have any history of seeing the watchdog timeout problem on your machine?
> On this machine no - but it's the only one running em0. On other machines running bge0 then, yes, I see it a lot. But those are all SMP machines, aside from one. On that one I am currently building the latest 6-STABLE and when it's done (give it a couple of hours) I will give it a shot with your code and see what happens.

Another data point: After rebooting my machine, the program no longer causes the problem. It appears that something else has to occur first on the machine to put it into a state that makes it susceptible to the program.

-DG
David G. Lawrence
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
I've been experiencing this problem too, along with my USB keyboard acting 'wonky' (stuttering from time to time). For me at least it seems to be tied to CPU usage, meaning it's probably related to the taskqueue or maybe even the scheduler.

I can also reproduce the problem on a much bigger scale than I've seen mentioned anywhere else (up to 30 seconds!). One sure-fire way to trigger it for me is to boot the Ubuntu 6.06.1 CD inside of qemu. I don't have kqemu or anything loaded -- it can be provoked by an ordinary process running as an ordinary user.

While it's sitting at the GRUB screen (30 second countdown), my USB keyboard becomes inoperable, and em0 goes totally dead. It feels like no interrupts are getting through -- if a key was pressed it will repeat until the 30 seconds are up or I kill the process.

I initially suspected something holding Giant for a long time, so I tried the giantless USB patches but that didn't help.

Interestingly, I have another em interface in this machine but it continues to work. em0 is sharing irq19 with uhci1 (which the keyboard is attached to). em1 is on irq18. So whatever it is somehow stops irq19 from getting through, but the other IRQ lines seem unaffected. Sounding more and more like an APIC problem to me. Or possibly the ithread getting stuck.

This machine *DID* work fine until sometime between 6.1 release and now. Unfortunately I can't seem to reproduce the problem on any of my test machines, only on the one that I need for day to day work :)

I'm about halfway through reading the thread, but will be happy to test any patches and do whatever I can to help.

Craig

[EMAIL PROTECTED]:10:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82540EM Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
[EMAIL PROTECTED]:12:0: class=0x02 card=0x002e1028 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82540EM Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

On Thu, Sep 28, 2006 at 08:13:51AM -0600, Scott Long wrote:
> All,
>
> Attached is my first cut at addressing the problems described in this thread. As I discussed earlier, the VM syncer thread is likely starving the USB interrupt thread. This causes the shared usb+network interrupt to remain masked, preventing network interrupts from being delivered, and thus triggering watchdog timeouts.
>
> This patch only addresses the USB driver. If your network card is sharing an interrupt with something other than (or additional to) USB, this might not work for you. Also, this patch is just a very rough proof-of-concept and is not meant for production use. But I'd like to get feedback now before I spend more time on this. If this works then I'll clean it up and make it suitable for the release, and I'll look at some of the other drivers like ichsmb.
>
> If this is to be fixed for 6.2, I need lots of feedback ASAP. So please do not be shy =-) The patch is at:
>
> http://people.freebsd.org/~scottl/usb_fastintr.diff
>
> Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Doesn't seem to have any effect for me (other than high sys% times). qemu is really good at provoking my em0 to timeout.

On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
>
> WARNING: This program will kill the network on your 6.x server. Do not run this on a production machine unless you are on the console and can ctrl-C it!
>
> -DG
>
> #include <sys/poll.h>
>
> int
> main(void)
> {
> 	struct pollfd pfd;
>
> 	pfd.fd = 1;		/* stdout */
> 	pfd.events = POLLOUT;
> 	pfd.revents = 0;
>
> 	while (1) {
> 		if (poll(&pfd, 1, -1) < 0)
> 			break;
> 	}
> 	return 0;
> }
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Friday 29 September 2006 06:37, Pete French wrote:
> > Attached is a simple user program that will immediately cause pretty much all of the network drivers (at least the ones I own) to stop working and get watchdog timeouts.
> I am running this on a single processor machine with an SMP kernel and it does not have any effect. I don't think I have any single processor machines running a non SMP kernel to try it on though. Not particularly helpful I know. I'll try building a non SMP kernel for this machine if I can.

You can set kern.smp.disabled=1 from the loader to force UP with an SMP kernel. No need to recompile.

--
John Baldwin
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 04:21:55PM -0500, Craig Boston wrote:
> Doesn't seem to have any effect for me (other than high sys% times). qemu is really good at provoking my em0 to timeout.

What might be useful for someone who can provoke this is to configure your kernel with MUTEX_PROFILING, then do the following:

sysctl debug.mutex.prof.enable=1
start_your_test_case
sysctl debug.mutex.prof.enable=0

Then:

sysctl debug.mutex.prof.stats > stats.out

and provide access to that file. This will help to show whether something is causing Giant starvation.

Kris
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
> What might be useful for someone who can provoke this, is to configure your kernel with MUTEX_PROFILING, then do the following:
> <snip>
> This will help to show whether something is causing Giant starvation.

I'm currently building a kernel with Scott's patch -- if it still happens I'll build one with MUTEX_PROFILING and get the results.

Craig
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
> and provide access to that file. This will help to show whether something is causing Giant starvation.

http://www.gank.org/freebsd/stats.out

That's after about 25 seconds of the em0 interface being unable to receive because of an apparent lack of interrupt processing (it can still transmit though! I had a half-open ssh session that continued to receive data for a while).

Interesting data point #1: After a fresh boot, I'm unable to reproduce the problem until I use the kernel linker. After kldloading a module (any module) and then immediately unloading it, my test case works 100% until I reboot again.

Interesting data point #2: After a reboot, the problem moved from em0 on irq19 to em1 on irq18. I'm remote from the machine right now so I can't test the usb controller that's sharing that interrupt, though I suspect it experienced the same lack of response.

Craig
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.

One thing this patch definitely did do though, is break the nvidia driver pretty badly. Couldn't keep the X server running for more than a minute before it froze solid. Lots of Xid: blah blah blah messages. Yes I remembered to rebuild the kernel module ;)

Oh, and if anyone is curious, I am able to reproduce the problem after booting without nvidia.ko loaded, using qemu in -nographic mode. Just wanted to rule that out since it's code that's out of our control and would be a prime target to blame if I didn't.

Craig
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
> On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.

I can confirm that despite the other side effect I already mentioned, this patch does fix or at least mask the problem I'm seeing with em (and probably usb).

Craig
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Fri, Sep 29, 2006 at 08:03:29PM -0500, Craig Boston wrote:
> On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
> > and provide access to that file. This will help to show whether something is causing Giant starvation.
> http://www.gank.org/freebsd/stats.out
> That's after about 25 seconds of the em0 interface being unable to receive because of an apparent lack of interrupt processing (it can still transmit though! I had a half-open ssh session that continued to receive data for a while).

max total count avg cnt_hold cnt_lock name
61 5748 285205 24 /compile/src/sys/kern/kern_conf.c:311 (Giant)
921 2016 219610 /compile/src/sys/kern/kern_sysctl.c:1313 (Giant)
27 646 69 901 /compile/src/sys/kern/kern_conf.c:287 (Giant)
223113685334 253 /compile/src/sys/kern/kern_timeout.c:258 (Giant)
67351714496 774 /compile/src/sys/kern/kern_conf.c:323 (Giant)
40 1421 236 610 /compile/src/sys/kern/kern_conf.c:299 (Giant)
2 698 360 10 10 /compile/src/sys/kern/kern_intr.c:681 (Giant)
931037046 212 17421 /compile/src/sys/kern/kern_synch.c:218 (Giant)
2440 3304 13 25404 /compile/src/sys/net/netisr.c:339 (Giant)
18 5 100 /compile/src/sys/i386/i386/sys_machdep.c:115 (Giant)
29 585 341700 /compile/src/sys/kern/kern_descrip.c:376 (Giant)
162 162 1 16200 /compile/src/sys/kern/uipc_usrreq.c:937 (Giant)
88 1 800 /compile/src/sys/kern/uipc_usrreq.c:1032 (Giant)
81 138 26900 /compile/src/sys/kern/kern_conf.c:265 (Giant)
423 457 2 22830 /compile/src/sys/fs/fifofs/fifo_vnops.c:733 (Giant)
41 76 32500 /compile/src/sys/fs/fifofs/fifo_vnops.c:711 (Giant)
29 29 12900 /compile/src/sys/kern/vfs_syscalls.c:336 (Giant)

The times are in microseconds. There are a couple of places where Giant was held on the order of milliseconds at least once (the max column). One thing to note is that you are using IPv6 on this machine, which is still under Giant; that may be relevant. Nothing really stands out to me as being a major problem though.

Kris
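The "held on the order of milliseconds" check can be done mechanically once a stats line is split into its seven columns. The sketch below assumes one whitespace-separated row per line in the column order given by the header (max, total, count, avg, cnt_hold, cnt_lock, name); the struct and function names are invented for this illustration, and the sample row in the usage note is hypothetical, not taken from the posted stats.out. Rows whose columns have run together in an archived copy are rejected rather than guessed at.

```c
#include <stdio.h>

/* One row of mutex profiling output; times are in microseconds.
 * (Hypothetical type name.) */
struct mtx_stat {
    long long max_us, total_us, count, avg_us, cnt_hold, cnt_lock;
    char name[256];
};

/* Parses "max total count avg cnt_hold cnt_lock name".
 * Returns 0 on success, -1 if the line does not have all seven fields. */
int parse_mtx_stat(const char *line, struct mtx_stat *out)
{
    int n = sscanf(line, "%lld %lld %lld %lld %lld %lld %255[^\n]",
                   &out->max_us, &out->total_us, &out->count,
                   &out->avg_us, &out->cnt_hold, &out->cnt_lock,
                   out->name);
    return n == 7 ? 0 : -1;
}

/* True if the longest single hold was at least one millisecond. */
int held_at_least_1ms(const struct mtx_stat *s)
{
    return s->max_us >= 1000;
}
```

For example, a made-up row such as `1500 42000 300 140 2 5 /sys/kern/example.c:10 (Giant)` parses cleanly and is flagged, since its max of 1500 microseconds is above the threshold. As noted elsewhere in the thread, hold time is only half the picture; time spent waiting for the lock is not something this column captures.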
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
> On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
> > On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> > At first glance it appeared to work, but I'm about to do some more testing since I just discovered that I have to kldload something (anything) first in order to reproduce the problem. Weird.
> I can confirm that despite the other side effect I already mentioned, this patch does fix or at least mask the problem I'm seeing with em (and probably usb).

Which is odd since the hypothesis Scott was working on should have shown up clearly in the mutex trace, but did not.

Kris
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
To add another twist to this: I added options POLLING to the kernel, moved the FireWire and USB drivers out of the kernel and loaded them as modules. I have NOT enabled polling on the em interface, but this new kernel, built from the same sources as the failing one, works without a hitch.

As before, let me know if there is anything I can do to help.

Regards, Goran L

--On Wednesday, September 27, 2006 13:24:15 +0200 glz [EMAIL PROTECTED] wrote:

I have seen the watchdog and reset problem on a -STABLE laptop, with both em and iwi. It only occurs when I try to connect using the Mulberry e-mail client, so I thought it could be a problem with the linuxulator. The load on the box is normally low, but both drivers have shared interrupts, either with cbb or usb.

Here is what I can see:

uname -a:
FreeBSD viglaf 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #55: Thu Sep 21 22:15:38 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/VIGLAF i386

dmesg:
em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x8000-0x803f mem 0xc022-0xc023,0xc020-0xc020 irq 11 at device 1.0 on pci2
em0: Ethernet address: 00:0d:60:89:36:e8
em0: [FAST]
iwi0: Intel(R) PRO/Wireless 2915ABG mem 0xc0214000-0xc0214fff irq 9 at device 2.0 on pci2
iwi0: Ethernet address: 00:16:6f:8b:0a:21

vmstat -i
interrupt total rate
irq0: clk 11148090999
irq1: atkbd0 32271 2
irq5: pcm0 atapci+ 157115 14
irq6: fdc0 1 0
irq7: 1 0
stray irq7 1 0
irq8: rtc 1426745127
irq9: cbb1 cbb2++* 26582 2
irq11: cbb0 em0++* 762544 68
irq12: psm0 516858 46
irq14: ata0 43494 3
irq15: ata1 82 0
Total 14113784 1265

This is a development machine so I can debug and test patches as needed.

Best regards, Goran L

Patrick M. Hausen wrote:

Hello!

On -stable occasionally other people complained about very similar looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certain chipsets and certain architectures.

I forgot to mention: we do have systems with em interfaces that never showed this problem!

Regards, Patrick

--
... the future isMobile
Goran Lowkrantz [EMAIL PROTECTED]
System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
Phone: +46(0)920-75559 Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82
http://www.ismobile.com ...

___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
All,

Attached is my first cut at addressing the problems described in this thread. As I discussed earlier, the VM syncer thread is likely starving the USB interrupt thread. This causes the shared usb+network interrupt to remain masked, preventing network interrupts from being delivered, and thus triggering watchdog timeouts.

This patch only addresses the USB driver. If your network card is sharing an interrupt with something other than (or additional to) USB, this might not work for you. Also, this patch is just a very rough proof-of-concept and is not meant for production use. But I'd like to get feedback now before I spend more time on this. If this works then I'll clean it up and make it suitable for the release, and I'll look at some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP. So please do not be shy =-)

The patch is at: http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott
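Since the hypothesis above depends on the network card sharing an interrupt line with another device, testers may want a quick way to list shared lines from `vmstat -i`. The following is only a minimal sketch, assuming whitespace-separated output in the name/total/rate layout quoted elsewhere in this thread. The function name `shared_irqs` and the sample text are illustrative, and treating a trailing `+` or `*` as a sign of additional handlers is an assumption based on how such names appear in the posted output.

```python
import re

# Sample lines in the `vmstat -i` layout quoted in this thread
# (name, total, rate). Illustrative input, not new measurements.
SAMPLE = """\
interrupt                          total       rate
irq14: ata0                           47          0
irq16: em0                      13001181          7
irq21: uhci0 uhci+                     6          0
cpu0: timer                      1118344       1997
Total                            1122753       2004
"""

# irqN: <one or more handler names> <total> <rate>
LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq: [handler names]} for IRQ lines with more than one handler."""
    shared = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # header, cpu timer and Total lines
        irq, names = m.group(1), m.group(2).split()
        # Assumption: a trailing '+' or '*' means the name list was cut short.
        truncated = any(n.endswith(("+", "*")) for n in names)
        if len(names) > 1 or truncated:
            shared[irq] = names
    return shared


if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(names))
```

Feeding it the real output (for example by piping `vmstat -i` into a small wrapper that calls `shared_irqs`) shows at a glance whether em or bge sits on a line with USB, ichsmb or a disk controller.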
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wednesday, 27 September 2006 at 9:40:52 -0700, Jeremy Chadwick wrote:

On Wed, Sep 27, 2006 at 06:32:59PM +0200, Patrick M. Hausen wrote:

On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:

I don't think it has to do with ichsmb in particular, but only with the fact that ichsmb is, for me, exactly the thing that shares the interrupt with the em interface that shows the problems.

I can confirm that making em0 share an interrupt with the SATA controller on my box makes the problem much, much more apparent.

So we're all on the same page here: this really appears to be some kind of kernel interrupt handler problem (something somewhere is getting deadlocked? Not sure). Has anyone tried rolling back to previous 6.2 builds to try and figure out the timeframe when this was introduced? From my perspective, it happened sometime between August and the end of September.

I want to confirm that I have watchdog timeouts on 6.1-RELEASE-p3 with a GENERIC kernel. USB was disabled in the BIOS entirely. Another box, called media2, is running 6.1-STABLE from Fri Sep 1 11:54:11 EDT 2006 with a GENERIC kernel. Both boxes are UP machines.
Here is additional info:

===
[EMAIL PROTECTED]:~# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=b<RXCSUM,TXCSUM,VLAN_MTU>
    media: Ethernet autoselect (1000baseTX full-duplex)
    status: active
===
[EMAIL PROTECTED]:~# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=b<RXCSUM,TXCSUM,VLAN_MTU>
    media: Ethernet autoselect (1000baseTX full-duplex)
    status: active
===
[EMAIL PROTECTED]:~# vmstat -i
interrupt total rate
irq1: atkbd0 576 0
irq6: fdc0 9 0
irq14: ata0 47 0
irq24: amr0 1314154135
irq28: em0 42909062 4415
cpu0: timer 19436324 2000
Total 63660172 6550
===
[EMAIL PROTECTED]:~# vmstat -i
interrupt total rate
irq1: atkbd0 14 0
irq28: em0 1480465106 3616
irq48: amr0 66293858161
irq72: amr1 6586 0
cpu0: timer 818728378 2000
Total 2365493942 5779
===
[EMAIL PROTECTED]:~# pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x06 card=0x348015d9 chip=0x254c8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'E7501 Host Controller'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:0:1: class=0xff card=0x348015d9 chip=0x25418086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'E7500 System Controller (MCH, Hub Interface A) Error Reporter'
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x chip=0x25438086 rev=0x01 hdr=0x01
    vendor = 'Intel Corporation'
    device = 'E7500/E7501 HI_B Virtual PCI-to-PCI Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:3:0: class=0x060400 card=0x chip=0x25458086 rev=0x01 hdr=0x01
    vendor = 'Intel Corporation'
    device = 'E7500/E7501 HI_C Virtual PCI-to-PCI Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:29:0: class=0x0c0300 card=0x348015d9 chip=0x24828086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:29:1: class=0x0c0300 card=0x348015d9 chip=0x24848086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:29:2: class=0x0c0300 card=0x348015d9 chip=0x24878086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0x42 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB Hub Interface to PCI Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x24808086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801CA/CAM (ICH3-S/ICH3-M) LPC Interface'
    class = bridge
    subclass = PCI-ISA
[EMAIL PROTECTED]:31:1:
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Scott Long wrote:

[...]

The patch is at: http://people.freebsd.org/~scottl/usb_fastintr.diff

The patch does not work on my box:

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=athlon64 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx -mno-3dnow -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -Werror /usr/src/sys/dev/usb/usb.c
/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)
/usr/src/sys/dev/usb/usb.c:282: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1
Stop in /usr/obj/usr/src/sys/THOR.
*** Error code 1
Stop in /usr/src.
*** Error code 1
Stop in /usr/src.

Uname: FreeBSD my.box.org 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #85: Thu Sep 28 17:09:24 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THOR amd64
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
O. Hartmann wrote:

[...]

patch does not work on my box:

[...]

/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)

[...]

I accidentally posted a patch against HEAD, not RELENG_6. I'll correct that shortly.

Scott
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
At 03:15 PM 9/28/2006, O. Hartmann wrote:

/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1

Are you sure the patch applied cleanly to STABLE? There are a couple of spots you need to change manually, as it assumes the version of USB from HEAD. If you are using STABLE, manually apply the patch for usb.c and ohci_pci.c, remove the offending bits from the patch, and it should compile cleanly.

---Mike
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Additional data point: On 6.1-RELEASE I've observed the same sort of behavior, but without any noticeable consistency. It affects bge(4) and em(4) systems. In the case of the bge(4)-equipped system, there's a very weak correlation between heavy disk activity and watchdog timeouts. However, on that system, it doesn't look like the network card shares its PCI bus and interrupt with any other devices:

bgehost % pciconf -l
[EMAIL PROTECTED]:0:0: class=0x06 card=0x chip=0x00081166 rev=0x23 hdr=0x00
[EMAIL PROTECTED]:0:1: class=0x06 card=0x chip=0x00081166 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:2: class=0x06 card=0x chip=0x00061166 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:3: class=0x06 card=0x chip=0x00061166 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:8:0: class=0x01 card=0xe2a09005 chip=0x00809005 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:14:0: class=0x03 card=0x00d11028 chip=0x47521002 rev=0x27 hdr=0x00
[EMAIL PROTECTED]:15:0: class=0x060100 card=0x02001166 chip=0x02001166 rev=0x50 hdr=0x00
[EMAIL PROTECTED]:15:1: class=0x01018a card=0x chip=0x0266 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:15:2: class=0x0c0310 card=0x02201166 chip=0x02201166 rev=0x04 hdr=0x00
[EMAIL PROTECTED]:8:0: class=0x02 card=0x00d11028 chip=0x164414e4 rev=0x12 hdr=0x00
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0068 chip=0x09628086 rev=0x01 hdr=0x01
[EMAIL PROTECTED]:2:1: class=0x010400 card=0x00d11028 chip=0x00021028 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:4:0: class=0x02 card=0x009b1028 chip=0x12298086 rev=0x08 hdr=0x00

bgehost % grep irq /var/run/dmesg.boot
ioapic0 Version 1.1 irqs 0-15 on motherboard
ioapic1 Version 1.1 irqs 16-31 on motherboard
ahc0: Adaptec 29160 Ultra160 SCSI adapter port 0xec00-0xecff mem 0xfe102000-0xfe102fff irq 18 at device 8.0 on pci0
ohci0: OHCI (generic) USB controller mem 0xfe10-0xfe100fff irq 5 at device 15.2 on pci0
bge0: Broadcom BCM5700 Gigabit Ethernet, ASIC rev. 0x7102 mem 0xfeb0-0xfeb0 irq 17 at device 8.0 on pci1
aac0: Dell PERC 3/Di mem 0xf000-0xf7ff irq 31 at device 2.1 on pci2
fxp0: Intel 82559 Pro/100 Ethernet port 0xccc0-0xccff mem 0xfe90-0xfe900fff,0xfe70-0xfe7f irq 16 at device 4.0 on pci2
fdc0: floppy drive controller port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
psm0: PS/2 Mouse irq 12 on atkbdc0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0

This is an SMP host (a pair of Pentium IIIs).

The em(4)-equipped host emits watchdog timeout warnings far more frequently, but not with any discernible pattern. However, it routinely handles a *lot* more network traffic, and that traffic is unpredictable and bursty in nature. Its interfaces also appear to have their own resources allocated:

emhost % pciconf -l
[EMAIL PROTECTED]:0:0: class=0x06 card=0x chip=0x25788086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:3:0: class=0x060400 card=0x chip=0x257b8086 rev=0x02 hdr=0x01
[EMAIL PROTECTED]:28:0: class=0x060400 card=0x0050 chip=0x25ae8086 rev=0x02 hdr=0x01
[EMAIL PROTECTED]:29:0: class=0x0c0300 card=0x01651028 chip=0x25a98086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:1: class=0x0c0300 card=0x01651028 chip=0x25aa8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:4: class=0x088000 card=0x01651028 chip=0x25ab8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:5: class=0x080020 card=0x01651028 chip=0x25ac8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:7: class=0x0c0320 card=0x01651028 chip=0x25ad8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0x0a hdr=0x01
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x25a18086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:31:2: class=0x01018a card=0x01651028 chip=0x25a38086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:31:3: class=0x0c0500 card=0x01651028 chip=0x25a48086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:1:0: class=0x02 card=0x01651028 chip=0x10758086 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:1:0: class=0x02 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:1:1: class=0x02 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:2:0: class=0x02 card=0x01651028 chip=0x10768086 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:3:0: class=0x010400 card=0x05201028 chip=0x19601000 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:14:0: class=0x03 card=0x01651028 chip=0x47521002 rev=0x27 hdr=0x00

emhost % grep irq /var/run/dmesg.boot
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
em0: Intel(R) PRO/1000 Network Connection Version - 3.2.18 port 0xece0-0xecff mem 0xfe3e-0xfe3f irq 18 at device 1.0 on pci1
em1: Intel(R) PRO/1000 Network
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Mike Tancsa wrote:

At 03:15 PM 9/28/2006, O. Hartmann wrote:

/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1

Are you sure the patch applied cleanly to STABLE? There are a couple of spots you need to change manually as it assumes the version of USB from HEAD. Manually apply the patch for usb.c and ohci_pci.c if you are using STABLE and remove the offending bits from the patch and it should compile cleanly.

---Mike

Corrected patch is at: http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

Sorry for the confusion.

Scott
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Scott Long wrote:

All, Attached is my first cut at addressing the problems described in this thread. As I discussed earlier, the VM syncer thread is likely starving the USB interrupt thread. This causes the shared usb+network interrupt to remain masked, preventing network interrupts from being delivered, and thus triggering watchdog timeouts.

Just to be clear, has it been established that the problem only occurs when em is sharing an interrupt? I have a lot of production machines using the PDSMi board, which is one of the boards that the problem was noticed on. However, I do not share any IRQs; I always disable USB in the BIOS.

# vmstat -i
interrupt total rate
irq16: em0 13001181 7
irq19: atapci0 76559511 42
cpu0: timer 3643365617 1999
cpu1: timer 3643365610 1999
Total 7376291919 4048
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi!

On Thu, Sep 28, 2006 at 02:47:09PM -0500, Alan Amesbury wrote:

Additional data point: On 6.1-RELEASE I've observed the same sort of behavior, but without any noticeable consistency. It affects bge(4) and em(4) systems. In the case of the bge(4)-equipped system, there's a very weak correlation between heavy disk activity and watchdog timeouts. However, on that system, it doesn't look like the network card shares its PCI bus and interrupt with any other devices:

Same here. Just to make sure that point gets through:

- em doesn't share an interrupt with anything else: a hang will occur sooner or later if the system is busy (sometimes later, but reproducibly)
- force the system to share an interrupt between, say, ata0 and em0: immediate *kaboom* whenever both are busy

HTH, Patrick

--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Mike Jakubik wrote:

Scott Long wrote:

All, Attached is my first cut at addressing the problems described in this thread. As I discussed earlier, the VM syncer thread is likely starving the USB interrupt thread. This causes the shared usb+network interrupt to remain masked, preventing network interrupts from being delivered, and thus triggering watchdog timeouts.

Just to be clear, has it been established that the problem only occurs when em is sharing an interrupt? I have a lot of production machines using the PDSMi board, which is one of the boards that the problem was noticed on, however i do not share any irqs, i always disable USB in the BIOS.

On many of our servers we have bge cards, and I can see a lot of watchdog timeouts. We always disable USB in the BIOS, and the cards don't share an IRQ.

# vmstat -i
interrupt total rate
irq16: em0 13001181 7
irq19: atapci0 76559511 42
cpu0: timer 3643365617 1999
cpu1: timer 3643365610 1999
Total 7376291919 4048

An example from our FTP server (ftp8.fr.freebsd.org), an HP DL360 G4 SMP:

# vmstat -i
interrupt total rate
irq1: atkbd0 1576 0
irq4: sio0 3 0
irq6: fdc0 12 0
irq14: ata0 57 0
irq24: ciss1 17181184 8
irq25: bge0 841821262402
irq26: bge1 674342644322
irq72: ciss0 24194679 11
cpu0: timer 4180478365 1999
cpu1: timer 4180886439 1999
Total 9918906221 4743

# bzgrep watchdog /var/log/messages*
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.1.bz2:Sep 6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting

# pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x06 card=0x32000e11 chip=0x35908086 rev=0x0a hdr=0x00
    vendor = 'Intel Corporation'
    device = 'E752x Server Memory Controller Hub'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0050 chip=0x35958086 rev=0x0a hdr=0x01
    vendor = 'Intel Corporation'
    device = 'E752x Memory Controller Hub PCI Express Port A0'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:4:0: class=0x060400 card=0x0050 chip=0x35978086 rev=0x0a hdr=0x01
    vendor = 'Intel Corporation'
    device = 'E752x Memory Controller Hub PCI Express Port B0'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:6:0: class=0x060400 card=0x0050 chip=0x35998086 rev=0x0a hdr=0x01
    vendor = 'Intel Corporation'
    device = 'E752x Memory Controller Hub PCI Express Port C0'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:28:0: class=0x060400 card=0x0050 chip=0x25ae8086 rev=0x02 hdr=0x01
    vendor = 'Intel Corporation'
    device = '6300ESB Hub Interface to PCI-X Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0x0a hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB Hub Interface to PCI Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x25a18086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '6300ESB LPC Interface Bridge'
    class = bridge
    subclass = PCI-ISA
[EMAIL PROTECTED]:31:1: class=0x01018a card=0x32010e11 chip=0x25a28086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '6300ESB IDE Controller'
    class = mass storage
    subclass = ATA
[EMAIL PROTECTED]:0:0: class=0x060400 card=0x0044 chip=0x03298086 rev=0x09 hdr=0x01
    vendor = 'Intel Corporation'
    device = '6700PXH PCI Express-to-PCI Express Bridge A'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:0:2: class=0x060400 card=0x0044 chip=0x032a8086 rev=0x09 hdr=0x01
    vendor = 'Intel Corporation'
    device = '6700PXH PCI Express-to-PCI Express Bridge B'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:1:0: class=0x010400 card=0x409b0e11 chip=0x00460e11 rev=0x01 hdr=0x00
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
On many of our servers, we have bge cards and I can see a lot of watchdog timeouts. We always disable USB in the bios and they didn't share irq.

I see the same thing - we have a number of HP blades which use bge interfaces and I get many watchdog timeouts on them. These are also not sharing any interrupts.

interrupt total rate
irq1: atkbd0 2 0
irq24: ciss0 13208 11
irq74: bge1 1452046216120
cpu0: timer 2581779930214
cpu2: timer 2579262777214
cpu1: timer 2581771929214
cpu3: timer 2579262777214
Total 11909678839989

This is 6.1 - I have a couple of boxes running 6.2 and those have not shown any timeouts so far. They are, however, far more lightly loaded.

-pete.
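Several posters in this thread report watchdog timeouts by grepping their logs. To compare machines, it can help to tally the resets per interface. The following is a minimal sketch, assuming the two message formats quoted in the thread (the em/bge `watchdog timeout -- resetting` form and the bce `Watchdog timeout occurred, resetting!` form). The helper name `count_timeouts` and the regular expression are illustrative and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Log lines copied from messages in this thread.
SAMPLE = """\
Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
"""

# Interface name (letters followed by a unit number), an optional
# "file(line):" field as printed by bce, then the watchdog text.
PATTERN = re.compile(r"\b([a-z]+\d+): (?:\S+: )?watchdog timeout", re.IGNORECASE)


def count_timeouts(lines):
    """Count watchdog timeout messages per network interface."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    for ifname, n in sorted(count_timeouts(SAMPLE.splitlines()).items()):
        print(ifname, n)
```

Run against decompressed copies of `/var/log/messages*`, the per-interface counts can be set beside `vmstat -i` rates to see whether the busier interface is also the one that resets more often.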
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi,

On Wed, Sep 27, 2006 at 08:00:21AM +0200, Martin Nilsson wrote:

I get tons of these:

em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP

mailbox# pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'PRO/1000 PM'
    class = network
    subclass = ethernet
[EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet

[...]

I have only seen them on em0. Yesterday I tried sysutils/cpuburn on similar boxes that are netbooted with NFS mounted drives and every time I loaded the two CPU cores the network went down.

I see the same. It is very pronounced on this machine, where I work around the problem by using polling. It's a UP machine.

FreeBSD nessie 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #3: Fri Sep 15 09:48:36 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NESSIE i386

[EMAIL PROTECTED]:1:0: class=0x02 card=0x10198086 chip=0x10198086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82547EI Gigabit Ethernet Controller (LOM)'
    class = network
    subclass = ethernet

irq18: em0 uhci23319 0

Another machine, also UP, but with two interfaces. The problem is not as apparent as on the first machine, but it's there. This machine is usually not as loaded (CPU-wise) as the first machine. The problem is ONLY on em1:

FreeBSD hudson 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #48: Thu Sep 14 10:19:46 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6 i386

[EMAIL PROTECTED]:1:0: class=0x02 card=0x10758086 chip=0x10758086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82547EI Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
[EMAIL PROTECTED]:2:0: class=0x02 card=0x10768086 chip=0x10768086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82547EI Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

irq17: em1 ichsmb0 950121879855
irq18: em0 71437344 64

The problem appeared after the em updates to the kernel during the last few weeks and had not been observed before. em is always loaded as a module in my kernels. The problem seems to occur more often if the machine's CPU is busy.

I have several SMP machines with the following em interfaces, which DON'T show the problem, but they also have a different chipset on the em interface. Most of the kernels were built between Sep 7 and Sep 19.

3 times this:

[EMAIL PROTECTED]:5:0: class=0x02 card=0x34248086 chip=0x10108086 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:5:1: class=0x02 card=0x34248086 chip=0x10108086 rev=0x01 hdr=0x00
irq23: em0 970303432750

3 times this:

[EMAIL PROTECTED]:5:0: class=0x02 card=0x34258086 chip=0x100e8086 rev=0x02 hdr=0x00
irq23: em0 292477376435

So I can observe at least 3 interesting differences:

- the interface showing the problems shares the interrupt
- for me it happens on UP machines only
- the chips are different

What I can't do: move the interfaces between machines, since these are onboard interfaces.

What I could do: try to unload the USB driver or the ichsmb driver on the machines where the interrupts are shared. The USB is not used currently anyway (I have it enabled to be prepared to hook up a USB Mass Storage device, which has never happened since the problem occurred). The ichsmb is also usually not queried.

Any suggestions on how I could help?

- Olli

--
| Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ |
| I am the Internet. So help me God. |
| Commercial use of any of the addresses contained here is not permitted! |
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Oliver Brandmueller wrote:

[...]

So I can observe at least 3 interesting differences:

- the interface showing the problems shares the interrupt
- for me it happens on UP machines only
- the chips are different

[...]

Any suggestions on how I could help?

- Olli

Well, the best I can say at the moment is, Wow. =-(

I guess the thing to do here is to figure out whether the problem lies with the em interrupt handler not getting run, or with the taskqueue not getting run. Since you've stated that it seems to be related to shared interrupts, the first possibility is more likely. However, I'm not sure why the symptom would only be showing up now. The Intel docs say that the 82547EI is a bit interesting, and I wonder if the assumptions that we make about PCI ordering aren't true (or if there are bugs that make our assumptions invalid).

Does this happen after there has been a lot of disk activity, like a large tar extraction? Are you using the SMBus interface at all, or is it sitting completely idle?

Scott
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hello! Well, the best I can say at the moment is, Wow. =-( I guess the thing to do here is to figure out if the problem lies with the em interrupt handler not getting run, or the taskqueue not getting run. I helped Pyun with some debugging by providing ssh access to a machine showing the (seemingly) same problem. At first he thought the interrupt handler of the em driver was the culprit, but we applied quite a few patches and tested afterwards, and it seems the driver is not the cause. On -stable, other people have occasionally complained about very similar-looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certain chipsets and certain architectures. So, Pyun suggested it was a problem with the taskqueue that was introduced some time between 6.0 and 6.1. With my system (Tyan GT20 B5161G20) the problem shows when there is heavy disk and CPU activity, like make buildworld. I made sure that the em interface doesn't share an interrupt with the SATA controller. When the problem occurs, I get the well-known watchdog timeout messages and then the system's network activity over that interface freezes completely for a couple of minutes. Usually the system recovers after a while without a reboot or other measures. What I can do: give ssh access to a system showing this behaviour, including a network connection to another box, so one can transfer large amounts of data over a private LAN. I used FTP of a big sparse file. Prerequisite: a fixed IP address for the machine that the developer wishes to use to connect to my system. HTH, Patrick -- punkt.de GmbH Internet - Dienstleistungen - Beratung Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100 76137 Karlsruhe http://punkt.de
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On 9/27/06, Martin Nilsson [EMAIL PROTECTED] wrote: mailbox# uname -a FreeBSD mailbox 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Fri Sep 22 00:31:29 CEST 2006 [EMAIL PROTECTED]:/usr/obj-local/usr/src/sys/SMP amd64 I get tons of these: em0: watchdog timeout -- resetting em0: link state changed to DOWN em0: link state changed to UP mailbox# pciconf -lv [EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00 vendor = 'Intel Corporation' device = 'PRO/1000 PM' class= network subclass = ethernet [EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' class= network subclass = ethernet em0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 options=bRXCSUM,TXCSUM,VLAN_MTU inet6 fe80::230:48ff:fe89:c958%em0 prefixlen 64 scopeid 0x1 inet 192.168.10.2 netmask 0xff00 broadcast 192.168.10.255 ether 00:30:48:89:c9:58 media: Ethernet autoselect (1000baseTX full-duplex) status: active We have several SMP systems with onboard em0/em1 Interfaces running on a RELENG_6 snapshot taken at 2006-09-20 00:00+0. They are not in production yet, so the load is not that much. However I haven't seen any watchdog timeouts on them. Only annoyance is, that the em(4) interfaces take too long for the interface to come up, ie, the system will boot, run ifconfig, the interface still has no link so syslogd/ntpdate/ntpd will complain about 'no route to host'. A 'sleep 5' fixes that problem, though I'd like to avoid such hacks. 
Anyway, here's the data: [EMAIL PROTECTED]:2:0: class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03 hdr=0x00 vendor = 'Intel Corporation' device = '82546EB Dual Port Gigabit Ethernet Controller' class= network subclass = ethernet [EMAIL PROTECTED]:2:1: class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03 hdr=0x00 vendor = 'Intel Corporation' device = '82546EB Dual Port Gigabit Ethernet Controller' class= network subclass = ethernet em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x3040-0x307f mem 0xd832-0xd833 irq 54 at device 2.0 on pci3 em0: Ethernet address: XX em0: [FAST] em1: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x3080-0x30bf mem 0xd834-0xd835 irq 55 at device 2.1 on pci3 em1: Ethernet address: XX em1: [FAST] em0: link state changed to UP em0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 options=bRXCSUM,TXCSUM,VLAN_MTU inet 1.2.3.4 netmask 0xff00 broadcast 1.2.3.4 ether X media: Ethernet autoselect (100baseTX full-duplex) status: active Hope this helps to narrow down the problem. Uli ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
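As a possible alternative to the fixed 'sleep 5' mentioned above, a boot-time helper could poll until ifconfig reports the link as active and give up after a timeout. The following is only a minimal Python sketch under stated assumptions: the names link_is_active, wait_for_link and read_status are hypothetical and introduced here for illustration, they are not part of the FreeBSD rc system, and the check relies on ifconfig printing a "status: active" line as in the output quoted above.

```python
import time


def link_is_active(ifconfig_output):
    """Return True if the ifconfig text contains a 'status: active' line."""
    return any(line.strip() == "status: active"
               for line in ifconfig_output.splitlines())


def wait_for_link(read_status, timeout=10.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll read_status() until the link is active or the timeout expires.

    read_status is a caller-supplied function (hypothetical) that returns
    the current `ifconfig <interface>` output as a string.
    """
    deadline = clock() + timeout
    while True:
        if link_is_active(read_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In real use, read_status could wrap something like subprocess.run(["ifconfig", "em0"], capture_output=True, text=True).stdout; the interface name is just the one from this thread.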
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hello! On -stable occasionally other people complained about very similar looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certain chipsets and certain architectures. I forgot to mention: we do have systems with em interfaces that never showed this problem! Regards, Patrick -- punkt.de GmbH Internet - Dienstleistungen - Beratung Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100 76137 Karlsruhe http://punkt.de
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
I have seen the watchdog and reset problem on a -STABLE laptop, both em and iwi. It only occur when I try to connect using Mulberry e-mail client so I thought it could be a problem with the linuxilator. The load on the box is normally low but both driver have shared interrupts, either with cbb or usb. Here is what I can see: uname -a: FreeBSD viglaf 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #55: Thu Sep 21 22:15:38 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/VIGLAF i386 dmesg: em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x8000-0x803f mem 0xc022-0xc023,0xc020-0xc020 irq 11 at device 1.0 on pci2 em0: Ethernet address: 00:0d:60:89:36:e8 em0: [FAST] iwi0: Intel(R) PRO/Wireless 2915ABG mem 0xc0214000-0xc0214fff irq 9 at device 2.0 on pci2 iwi0: Ethernet address: 00:16:6f:8b:0a:21 vmstat -i interrupt total rate irq0: clk 11148090999 irq1: atkbd0 32271 2 irq5: pcm0 atapci+157115 14 irq6: fdc0 1 0 irq7: 1 0 stray irq7 1 0 irq8: rtc1426745127 irq9: cbb1 cbb2++* 26582 2 irq11: cbb0 em0++*762544 68 irq12: psm0 516858 46 irq14: ata043494 3 irq15: ata1 82 0 Total 14113784 1265 This is a development machine so I can debug and test patches as needed. Best regards, Goran L Patrick M. Hausen wrote: Hello! On -stable occasionally other people complained about very similar looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certaiin chipsets and certain architectures. I forgot to mention: we do have systems with em interfaces that never showed this problem! Regards, Patrick -- ... the future isMobile Goran Lowkrantz [EMAIL PROTECTED] System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden Phone: +46(0)920-75559 Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82 http://www.ismobile.com ... 
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, 27 Sep 2006 13:24:15 +0200 glz [EMAIL PROTECTED] wrote about Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2: I have seen the watchdog and reset problem on a -STABLE laptop, both em and iwi. It only occurs when I try to connect using Mulberry e-mail client so I thought it could be a problem with the linuxilator. Same (or at least similar) behaviour here on an HP/Compaq nx7010 with an internal rl interface. I can trigger the problems using cvsup (even at moderate speeds connected via ADSL). Just drop me a note about what further information is needed for debugging. cu Gerrit
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi, it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeout in the log with a bge card, but maybe it's not the same problem... : /var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting /var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: link state changed to DOWN /var/log/messages:Sep 23 02:47:11 anubis kernel: bge1: link state changed to UP /var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting /var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: link state changed to DOWN /var/log/messages.0.bz2:Sep 12 22:22:51 anubis kernel: bge1: link state changed to UP /var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting /var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: link state changed to DOWN /var/log/messages.0.bz2:Sep 17 15:22:06 anubis kernel: bge1: link state changed to UP /var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting /var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: link state changed to DOWN /var/log/messages.0.bz2:Sep 20 12:13:11 anubis kernel: bge1: link state changed to UP /var/log/messages.1.bz2:Sep 6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting /var/log/messages.1.bz2:Sep 6 08:33:54 anubis kernel: bge1: link state changed to DOWN /var/log/messages.1.bz2:Sep 6 08:33:59 anubis kernel: bge1: link state changed to UP /var/log/messages.2.bz2:Sep 4 17:39:25 anubis kernel: bge1: link state changed to DOWN /var/log/messages.2.bz2:Sep 4 17:39:28 anubis kernel: bge1: link state changed to UP /var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting /var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: link state changed to DOWN /var/log/messages.3.bz2:Aug 29 12:09:41 anubis kernel: bge0: link state changed to UP /var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- 
resetting /var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: link state changed to DOWN /var/log/messages.4.bz2:Aug 22 15:44:03 anubis kernel: bge0: link state changed to UP -- Philippe Pegon Patrick M. Hausen wrote: Hello! Well, the best I can say at the moment is, Wow. =-( I guess the thing to do here is to figure out if the problem lies with the em interrupt handler not getting run, or the taskqueue not getting run. I helped Pyun with some debugging by providing ssh access to a machine showing the (seemingly) same problem. At first he thought the interrupt handler of the em driver was the culprit, but we applied quite a few patches and tested afterwards - seems like the driver is not the cause. On -stable occasionally other people complained about very similar looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certaiin chipsets and certain architectures. So, Pyun suggested it was a problem with the taskqueue that was introduced some time between 6.0 and 6.1. With my system (Tyan GT20 B5161G20) the problem shows when there is heavy disk and cpu activity, like make buildworld. I made sure that the em interface doesn't share an interrupt with the SATA controller. When the problem occurs, I get the well known watchdog timeout messages and then the system's network activity over that interface freezes completely for a couple of minutes. Usually the system recovers after a while without reboot or other measures. What I can do: give ssh access to a system showing this behaviour including a network connection to another box, so one can transfer large amounts of data over a private LAN. I used FTP of a sparse big file. Prerequisite: fixed IP address of the machine that the developer whishes to use to connect to my system. 
HTH, Patrick
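The log excerpt in the message above can be summarised per interface to see how often each NIC hits the watchdog. This is a minimal Python sketch, not something posted in the thread; the function name count_watchdog_resets is hypothetical, and the sample lines are copied from the log above with the file-name prefix dropped.

```python
import re
from collections import Counter

# Matches kernel log lines such as "kernel: bge1: watchdog timeout -- resetting"
WATCHDOG = re.compile(r"kernel: (\w+): watchdog timeout -- resetting")


def count_watchdog_resets(lines):
    """Count watchdog resets per interface name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting",
    "Sep 23 02:47:06 anubis kernel: bge1: link state changed to DOWN",
    "Sep 23 02:47:11 anubis kernel: bge1: link state changed to UP",
    "Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    for name, total in sorted(count_watchdog_resets(SAMPLE).items()):
        print(name, total)
```

Feeding it the full (decompressed) /var/log/messages files would show whether the resets cluster on one interface or on particular days.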
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Me Too(tm). FreeBSD jacinta.home.cacheboy.net 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Mon Sep 18 07:59:50 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386 Lots of this in dmesg: em0: watchdog timeout -- resetting em0: link state changed to DOWN em0: link state changed to UP vmstat -i: irq16: em01053995830 2844 According to dmesg only em0 is on the bus. This is on an NForce2 board with an AMD 1800XP+. Adrian ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi! On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote: it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeout in the log with a bge card, but maybe it's not the same problem... : As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. Some people experience additional complete hangs of network communications, that may or may not be related to them. Regards, Patrick -- punkt.de GmbH Internet - Dienstleistungen - Beratung Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100 76137 Karlsruhe http://punkt.de ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
s/is on the bus/is alone on the irq/. (And it shows up when I'm running polygraph and apachebench tests.) On 9/27/06, Adrian Chadd [EMAIL PROTECTED] wrote: Me Too(tm). FreeBSD jacinta.home.cacheboy.net 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE#0: Mon Sep 18 07:59:50 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386 Lots of this in dmesg: em0: watchdog timeout -- resetting em0: link state changed to DOWN em0: link state changed to UP vmstat -i: irq16: em01053995830 2844 According to dmesg only em0 is on the bus. This is on an NForce2 board with an AMD 1800XP+. Adrian -- Adrian Chadd - [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote: On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote: it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeout in the log with a bge card, but maybe it's not the same problem... : As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. You'll still see impact -- that is, no packets flowing. The reason things are recoverable is solely because of the retry functionality for layer 2 packets... In general, it's not a good thing to have watchdog timeouts. It means the interrupt is hung, or the card is hung. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networkinghttp://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 09:06:09PM +0800, Adrian Chadd wrote: Me Too(tm). Me three -- and the interesting part (in my case) is that em0 shares an IRQ with the ATA controller. http://www.freebsd.org/cgi/query-pr.cgi?pr=103435 Because people are reporting this on more than just the em driver (bge driver as well), my guess is that it's not specific to the Ethernet drivers. I've seen some semi-recent commits pertaining to the APIC handling code -- could these explain what's happening? -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networkinghttp://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
At 09:25 AM 9/27/2006, Patrick M. Hausen wrote: Hi! On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote: it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeout in the log with a bge card, but maybe it's not the same problem... : As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. If it ups and downs the interface, it can be painful depending on your setup. In one of the colos I don't have control over, the switch port will block for 15 seconds for Spanning Tree when the interface transitions like that. Even in cases where this does not happen, a 1-2 second network outage can play havoc with some applications. ---Mike
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi! On Wed, Sep 27, 2006 at 06:52:51AM -0700, Jeremy Chadwick wrote: On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote: On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote: it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeout in the log with a bge card, but maybe it's not the same problem... : As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. You'll still see impact -- that is, no packets flowing. The reason things are recoverable is solely because of the retry functionality for layer 2 packets... You are, of course, right. What I meant is: these timeouts should not lead to freezing of all network communications for a couple of minutes like me and some other people seem to experience. TCP and most UDP based upper level protocols will recover gently from a lost packet or two. Regards, Patrick -- punkt.de GmbH Internet - Dienstleistungen - Beratung Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100 76137 Karlsruhe http://punkt.de ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi. On Wed, Sep 27, 2006 at 04:19:55PM +0200, Patrick M. Hausen wrote: You'll still see impact -- that is, no packets flowing. The reason things are recoverable is solely because of the retry functionality for layer 2 packets... You are, of course, right. What I meant is: these timeouts should not lead to freezing of all network communications for a couple of minutes like me and some other people seem to experience. Port fast on a switchport is not a desirable option in all cases, apart from the fact that in some places you probably don't have the access or the choice to enable it. Without it, this means at least 10-20 seconds without network on some switches until the port is up again on the switch after it went down! - Oliver -- | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin | | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ | | Ich bin das Internet. Sowahr ich Gott helfe. | | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi Scott, On Wed, Sep 27, 2006 at 03:16:57AM -0600, Scott Long wrote: Well, the best I can say at the moment is, Wow. =-( I guess the thing to do here is to figure out if the problem lies with the em interrupt handler not getting run, or the taskqueue not getting run. Since you've stated that it seems to be related to shared interrupts, the first possibility is more likely. However, I'm not sure why the symptom would only be showing up now. The Intel docs say that the 82547EI are a bit interesting, and I wonder if assumptions that we make about PCI ordering aren't true (or if there are bugs that make our assumptions invalid). Does this happen after there has been a lot of disk activity, like a large tar extraction? Are you using the SMBus interface at all, or is it sitting completely idle? Disk activity does not trigger the problem: I hammered the disk with around 85 MB/s (dd) for about half an hour without seeing any effect. A CPU-bound thing like a buildworld triggered the problem. The SMBus interface is not used at all (it's not even really usable). Anyway, as soon as I unload the ichsmb module I cannot trigger the problem anymore. If I load it again, the problem can again be triggered by a buildworld. Statistical relevance: I did 4 buildworlds, alternating the load/unload of ichsmb. Both times with ichsmb loaded I saw 3 watchdog timeouts while the buildworld was running; while ichsmb was not loaded I did not see a single watchdog timeout. The use of the interface was about the same the whole time (constant NFS traffic of around 1-2 MBit/s). Since we all seem to see this only on interfaces sharing interrupts (as I read the other posters' mails) and the problem can be worked around by using polling, it seems pretty clear that it has to do with interrupt handling.
The UP/SMP idea seems to be of interest only because on a UP machine it's more likely that interrupts are shared than on SMP machines; it has nothing to do with UP or SMP itself. - Oliver -- | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin | | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ | | Ich bin das Internet. Sowahr ich Gott helfe. | | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |
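Since the common factor reported so far is an interrupt line shared between the NIC and another device, it may help to list shared lines mechanically from `vmstat -i` output. The following is a minimal Python sketch, not a tool from the thread: the function name shared_irqs is hypothetical, the counts in the sample are placeholders rather than figures from any poster's machine, and treating a trailing '+' or '*' as "more handlers than fit in the name" is an assumption based on the outputs quoted earlier (for example "uhci0 uhci+").

```python
import re

# "irqN: dev [dev ...]   total   rate" as printed by vmstat -i
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [device names]} for lines with more than one handler."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line)
        if not match:
            continue  # header, cpu timer and Total lines are skipped
        irq, names = match.group(1), match.group(2).split()
        # Assumption: a trailing '+' or '*' marks a truncated handler list.
        truncated = any(name.endswith(("+", "*")) for name in names)
        if len(names) > 1 or truncated:
            shared[irq] = names
    return shared


# Device names follow the thread; the numbers are placeholders.
SAMPLE = """\
interrupt                          total       rate
irq17: em1 ichsmb0                  1000         10
irq18: em0                          2000         20
cpu0: timer                         3000         30
Total                               6000         60
"""

if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(irq, "shared by", ", ".join(names))
```

With the sample above only irq17 (em1 and ichsmb0) is reported, which matches the pairing described for the affected machine.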
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 09:56:22AM -0400, Mike Tancsa wrote: If it up / downs the interface, it can be painful depending on your setup. In one of the colos I dont have control over, the switch port will block for 15 seconds for Spanning Tree when the interface transitions like that. Even in cases where this does not happen, a 1-2 second network outage can play havoc with some applications. Ouch! This is one of many reasons people don't use STP. (I did note the colos I don't have control over part -- frustrating eh?) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networkinghttp://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 05:28:24PM +0200, Oliver Brandmueller wrote: Hi Scott, On Wed, Sep 27, 2006 at 03:16:57AM -0600, Scott Long wrote: Well, the best I can say at the moment is, Wow. =-( I guess the thing to do here is to figure out if the problem lies with the em interrupt handler not getting run, or the taskqueue not getting run. Since you've stated that it seems to be related to shared interrupts, the first possibility is more likely. However, I'm not sure why the symptom would only be showing up now. The Intel docs say that the 82547EI are a bit interesting, and I wonder if assumptions that we make about PCI ordering aren't true (or if there are bugs that make our assumptions invalid). Does this happen after there has been a lot of disk activity, like a large tar extraction? Are you using the SMBus interface at all, or is it sitting completely idle? Disk activity does not trigger the problem, I hammered the disk with around 85 MB/s (dd) for about half an hour without seeing any effect. A CPU bound thing like a buildworld triggered the problem. I'm not sure that's a valid test by itself. As things go, dd is pretty easy on the disk IO system, especially with large buffer sizes. I'd suggest tar extraction or possibly parallel tar extraction. The goal is to generate a large number of transactions, not large transactions. -- Brooks
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 05:28:24PM +0200, Oliver Brandmueller wrote: Disk activity does not trigger the problem, I hammered the disk with around 85 MB/s (dd) for about half an hour without seeing any effect. A CPU bound thing like a buildworld triggered the problem. The SMBus Interface is not used at all (it's not even really usable). Anyway, as soon as I unload the ichsmb module I cannot triger the problem anymore. If I load it again, the problem cann again be triggered by a buildworld. Statistical relevance: I did 4 buildworlds, alternating the load/unload of ichsmb - both times with ichsmb loaded I saw 3 watchdog timeouts during the buildworld was running, while ichsmb was not loaded I did not see a single watchdog timeout. The use of the interface was around the same during all the time (constant NFS traffic of around 1-2 MBit/s). Interesting find. For what it's worth -- I too load the appropriate smbus drivers on the system with the em0 problem (loading smbus and ichsmb). That system is a single processor / single core system, with HT disabled in the BIOS (which doesn't matter since FreeBSD disables it anyways). Kernel is non-SMP. Only reason I mention this is: The UP/SMP idea seems to be only of interest, because on an UP machine it's more likely to share interrupts than on SMP machines, it has nothing to do with the fact of UP or SMP itself. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networkinghttp://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi, On Wed, Sep 27, 2006 at 10:50:55AM -0500, Brooks Davis wrote: Disk activity does not trigger the problem, I hammered the disk with around 85 MB/s (dd) for about half an hour without seeing any effect. A CPU bound thing like a buildworld triggered the problem. I'm not sure that's a valid test by itself. As things go, dd is pretty easy on the disk IO system especially with large buffer sizes. I'd suggest tar extraction or possibly parallel tar extraction. The goal is to generate a large number of transactions, not large transactions. The dd generated (according to gstat) around 600 tps by itself. Anyway, at night, when the to-disk backups from the other machines are coming in, there are various large and small disk operations, and it never happens in that case. On the other hand, my other server, which does only a few things on the disk but has less CPU power and more CPU-bound work to do, shows the behaviour very often (until I started to use polling). Disk activity might be a cause if the interrupt is shared with a disk controller, which is not the case for any of my affected machines. - Oliver -- | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin | | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ | | Ich bin das Internet. Sowahr ich Gott helfe. | | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi, On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote: The SMBus Interface is not used at all (it's not even really usable). Anyway, as soon as I unload the ichsmb module I cannot trigger the problem anymore. If I load it again, the problem can again be triggered by a buildworld. Statistical relevance: I did 4 buildworlds, alternating the load/unload of ichsmb - both times with ichsmb loaded I saw 3 watchdog timeouts while the buildworld was running, while ichsmb was not loaded I did not see a single watchdog timeout. The use of the interface was around the same during all the time (constant NFS traffic of around 1-2 MBit/s). Interesting find. For what it's worth -- I too load the appropriate smbus drivers on the system with the em0 problem (loading smbus and ichsmb). That system is a single processor / single core system, with HT disabled in the BIOS (which doesn't matter since FreeBSD disables it anyways). Kernel is non-SMP. Only reason I mention this is: The UP/SMP idea seems to be only of interest, because on an UP machine it's more likely to share interrupts than on SMP machines, it has nothing to do with the fact of UP or SMP itself. I don't think it has to do with ichsmb in particular, but only with the fact that, for me, ichsmb is exactly the thing that shares the interrupt with the em interface that shows the problems. - Oliver -- | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin | | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ | | Ich bin das Internet. Sowahr ich Gott helfe. | | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi!

On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:
> I don't think it has to do especially with ichsmb here, but only with the fact that ichsmb is, for me, exactly the thing that shares the interrupt with the em interface that shows the problems.

I can confirm that making em0 share an interrupt with the SATA controller on my box makes the problem much, much more apparent.

HTH,
Patrick

-- 
punkt.de GmbH        Internet - Services - Consulting
Vorholzstr. 25       Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe      http://punkt.de
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Oliver Brandmueller wrote:
> Hi,
>
> On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote:
> > > The SMBus interface is not used at all (it's not even really usable). Anyway, as soon as I unload the ichsmb module I cannot trigger the problem anymore. If I load it again, the problem can again be triggered by a buildworld.
> > >
> > > Statistical relevance: I did 4 buildworlds, alternating the load/unload of ichsmb. Both times with ichsmb loaded I saw 3 watchdog timeouts while the buildworld was running; while ichsmb was not loaded I did not see a single watchdog timeout. The use of the interface was about the same the whole time (constant NFS traffic of around 1-2 MBit/s).
> >
> > Interesting find. For what it's worth -- I too load the appropriate smbus drivers on the system with the em0 problem (loading smbus and ichsmb). That system is a single processor / single core system, with HT disabled in the BIOS (which doesn't matter since FreeBSD disables it anyways). Kernel is non-SMP. Only reason I mention this is:
>
> The UP/SMP idea seems to be of interest only because on a UP machine it's more likely that interrupts are shared than on SMP machines; it has nothing to do with UP or SMP itself.
>
> I don't think it has to do especially with ichsmb here, but only with the fact that ichsmb is, for me, exactly the thing that shares the interrupt with the em interface that shows the problems.
>
> - Oliver

My theory here is that something in the kernel, likely VM/VFS, is holding the Giant lock for an inordinate amount of time. During this time, an interrupt fires on the shared em/ichsmb interrupt. The em interrupt handler runs and schedules a task to handle the event. Then the system blocks the interrupt at the PIC and schedules the ichsmb ithread. However, as soon as this ithread tries to run, it gets blocked on the Giant lock that is held elsewhere. While it is blocked, the interrupt stays masked at the PIC, blocking out both ichsmb and em device interrupts. Normally the PIC would get unmasked after the ithread has run, but until the ithread unblocks, this cannot happen. This goes on long enough that pending transactions on the em interface trigger a timeout.

Assuming this analysis is correct, there are a couple of questions. First would be, why is the ithread being blocked for so long? Is the Giant lock actually being held continuously for that long, or is it being dropped and relocked often but the scheduler isn't giving the ithread a chance to grab it and run? Second is, why is this only being noticed now? Whether the em driver uses an INTR_FAST handler, like it does now, or an ithread handler, like it used to in 6.1, doesn't affect the ichsmb driver and its interaction with the Giant lock. Maybe there isn't a direct correlation here, and it's just a coincidence that something else in the system changed at the same time as the driver changing.

I have a few ideas on tracking down the root cause, but they are pretty painful and slow. The root cause does need to be found and fixed, as it's either a very bad scheduler bug, or a very badly misbehaving subsystem. Both have implications for other possible problems in FreeBSD. Also, the usb driver has the same potential for blocking as the ichsmb driver, as do other drivers.

But in the meantime, something needs to be done for 6.2. The options are:

1. Revert the em driver to its 6.1 form, ask people to test if the problem persists. If it doesn't, leave it at that for now.

2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither uses an ithread. Without an ithread, no PIC masking will happen, and these drivers can block all they want without interfering with the em driver. This is a bit of risky work, though, and may not be possible if the devices don't support certain functionality. Also, it doesn't address the root problem. But getting more interrupt handlers away from needing Giant is a good thing, even if this is only a band-aid.

3. Spend the time tracking down and fixing the root problem for 6.2. This is ideal, but it is also an unbounded problem. Thus, it is absolutely not conducive to having a timely and successful 6.2 release.

4. Do nothing for now and tell people to disable usb, ichsmb, etc., as needed. This, of course, is not a good option.

Option 1 is the quickest and likely most risk-free fix for the 6.2 release. If someone could test doing a revert and report back, I would appreciate it. Any volunteers?

Scott
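The masking sequence in Scott's theory can be written down as a toy timeline to see when a watchdog would trip. This is a sketch under stated assumptions only: the class, its field names and all numbers are hypothetical, and it models nothing beyond the ordering described in the message (line masked when the ithread is scheduled, unmasked only after the ithread has run, ithread unable to start while Giant is held elsewhere).

```python
from dataclasses import dataclass


@dataclass
class SharedIrqModel:
    """Toy timeline of one shared interrupt line (all times in seconds)."""
    giant_release_time: float  # when the unrelated Giant holder lets go
    ithread_runtime: float     # time the Giant-locked ithread needs once it runs
    watchdog_timeout: float    # how long the NIC driver waits before giving up

    def masked_window(self, irq_time):
        """Return (start, end) of the period the line stays masked."""
        # The line is masked as soon as the ithread is scheduled ...
        start = irq_time
        # ... the ithread cannot start before Giant is free ...
        ithread_start = max(irq_time, self.giant_release_time)
        # ... and the line is unmasked only after the ithread has finished.
        end = ithread_start + self.ithread_runtime
        return start, end

    def nic_times_out(self, irq_time):
        """True if the masked window is at least as long as the watchdog."""
        start, end = self.masked_window(irq_time)
        return (end - start) >= self.watchdog_timeout


if __name__ == "__main__":
    # Hypothetical numbers: Giant held until t=6.0, watchdog set to 5 seconds.
    stuck = SharedIrqModel(giant_release_time=6.0, ithread_runtime=0.25,
                           watchdog_timeout=5.0)
    print(stuck.masked_window(0.5), stuck.nic_times_out(0.5))
```

In this model the em device is an innocent bystander: its own handler never blocks, but it shares the masked line, which is why options 1 and 2 both work around the symptom without touching whoever holds Giant.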
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Well, HTH - I don't have *any* problems with this configuration:

FreeBSD 6.2-PRERELEASE #6: Wed Sep 20 18:52:56 CEST 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/MAILSMP
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.20-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf48  Stepping = 8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x649d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,b14>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
  Logical CPUs per core: 2
real memory  = 9126805504 (8704 MB)
avail memory = 8302972928 (7918 MB)
ACPI APIC Table: <DELL PE BKC>

pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xecc0-0xecff mem 0xfe6e-0xfe6f irq 64 at device 7.0 on pci6
em0: [FAST]
pci7: <ACPI PCI bus> on pcib7
em1: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xdcc0-0xdcff mem 0xfe4e-0xfe4f irq 65 at device 8.0 on pci7
em1: [FAST]

[EMAIL PROTECTED]:7:0: class=0x02 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82547EI Gigabit Ethernet Controller'
[EMAIL PROTECTED]:8:0: class=0x02 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82547EI Gigabit Ethernet Controller'

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=b<RXCSUM,TXCSUM,VLAN_MTU>
    media: Ethernet autoselect (1000baseTX <full-duplex>)

(em1 is not used)

interrupt                          total       rate
irq1: atkbd0                        1139          0
irq6: fdc0                             8          0
irq14: ata0                           36          0
irq18: uhci2                    21714980         37
irq23: ehci0                           3          0
irq46: amr0                     20493929         34
irq64: em0                     106173807        181
cpu0: timer                   1172649960       1999

This is a heavy-duty mail server, loaded with a lot of postfix/amavis/courier processes. I can provide my kernel/loader/sysctl configuration on request.

Ponc

-- 
Tomasz Pilat http://poncki.freebsd.pl./
AXEL SPRINGER POLSKA Sp. z o.o.
PONC-RIPE | PGPKEY-EDEB47FC

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on e-mail/Usenet?
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 06:32:59PM +0200, Patrick M. Hausen wrote:
> On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:
> > I don't think it has to do especially with ichsmb here, but only with the fact that ichsmb is, for me, exactly the thing that shares the interrupt with the em interface that shows the problems.
>
> I can confirm that making em0 share an interrupt with the SATA controller on my box makes the problem much, much more apparent.

So we're all on the same page here -- this really appears to be some kind of kernel interrupt handler problem (something somewhere is getting deadlocked? Not sure).

Has anyone tried rolling back to previous 6.2 builds to try and figure out the timeframe when this was introduced? From my perspective, it happened sometime between August and the end of September.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Sep 27, 2006, at 11:50 AM, Jeremy Chadwick wrote:
> On Wed, Sep 27, 2006 at 09:56:22AM -0400, Mike Tancsa wrote:
> > If it up / downs the interface, it can be painful depending on your setup. In one of the colos I don't have control over, the switch port will block for 15 seconds for Spanning Tree when the interface transitions like that. Even in cases where this does not happen, a 1-2 second network outage can play havoc with some applications.
>
> Ouch! This is one of many reasons people don't use STP. (I did note the "colos I don't have control over" part -- frustrating eh?)

You could enable port fast and still have spanning tree in place.

What many reasons do you and others have to shun STP?

-jav
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Hi, Scott!

On Wed, Sep 27, 2006 at 10:32:49AM -0600, Scott Long wrote:
> 1. Revert the em driver to its 6.1 form, ask people to test if the problem persists. If it doesn't, leave it at that for now.

For me the problem manifested itself some time between 6.0 and 6.1. I did the testing with Pyun with 6-STABLE up to two weeks before 6.2-PRERELEASE.

Currently we do not dare upgrade typo3.org from 5.5 to 6.x because of precisely this problem. 5.5 is running fine for the time being; no need to hurry for the latest and greatest yet. And the problem is not at all bound to shared interrupts.

I'll let you ssh in, if you like.

Kind regards,
Patrick
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
As an optional data point you might wish to consider the Intel driver I am about to release; it has everything that 6.2 does EXCEPT the interrupt changes. I kept those out because I didn't want to break backward compatibility.

If someone who has repro'd this problem wants to check this, speak up and I'll send a tarball.

Jack

On 9/27/06, Scott Long [EMAIL PROTECTED] wrote:
> [...]
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 12:44:04PM -0400, Javier Henderson wrote:
> You could enable port fast and still have spanning tree in place.
>
> What many reasons do you and others have to shun STP?

Rather than ramble off all the things I've experienced with STP, most of them are covered in this caveat document written by none other than Cisco:

http://www.cisco.com/warp/public/473/16.html

portfast is mentioned, but I'll remind you that not everyone uses Cisco equipment (nor should they). I consider portfast an admission that STP wasn't such a great idea after all.

<opinion>
My logic is as follows: a properly managed network should never encounter layer 1 loops. STP is most commonly used for "oh crap, I made a mistake" situations. Humans aren't perfect, but if you've engineers who continue to make physical segment loops over and over, you're better off getting different engineers rather than deploying STP and making a mess of network fail-over reliability.
</opinion>

Regardless, this is totally off-topic for the list. I'll be more than happy to discuss all of this privately. :-)
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
At 12:32 PM 9/27/2006, Scott Long wrote:
> My theory here is that something in the kernel, likely VM/VFS, is holding the Giant lock for an inordinate amount of time. During this time, an interrupt fires on the shared em/ichsmb interrupt. The em

Hi Scott,

Do you think this issue is something particular to Intel-based chipsets, and specifically NICs that share their interrupt with ichsmb or the USB subsystem? I have not gone through all the threads, but I don't recall people with, say, AMD-based boards running into this issue.

---Mike
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
> Hi!
>
> On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
> > it's just a "me too". On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeouts in the log with a bge card, but maybe it's not the same problem... :
>
> As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. Some people experience additional complete hangs of network communications, which may or may not be related to them.

I had watchdog timeouts occur on a small network setup, for a:

    ssh remote "cd /usr && tar cf - ports" | tar xvf -

and this resulted in a pretty sparse ports tree on the local drive. Lots of stuff being dropped. Shifting a single big tarball worked, though.

-- 
Jonathan Chen [EMAIL PROTECTED]
--
"If everything's under control, you're going too slow" - Mario Andretti
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Thu, Sep 28, 2006 at 06:32:16AM +1200, Jonathan Chen wrote:
> On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
> > Hi!
> >
> > On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
> > > it's just a "me too". On our ftp server (ftp8.fr.freebsd.org), sometimes we see some watchdog timeouts in the log with a bge card, but maybe it's not the same problem... :
> >
> > As far as I know the watchdog timeouts are _supposed_ to be mostly harmless, i.e. recoverable. Some people experience additional complete hangs of network communications, which may or may not be related to them.
>
> I had watchdog timeouts occur on a small network setup, for a:
>
>     ssh remote "cd /usr && tar cf - ports" | tar xvf -
>
> and this resulted in a pretty sparse ports tree on the local drive. Lots of stuff being dropped. Shifting a single big tarball worked, though.

I'm highly skeptical of this claim. It's possible the connection failed partway through and thus you didn't get all your files, but you wouldn't get random dropouts. TCP doesn't work that way.

-- Brooks
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On Wed, 2006-Sep-27 10:32:49 -0600, Scott Long wrote:
> My theory here is that something in the kernel, likely VM/VFS, is holding the Giant lock for an inordinate amount of time.

In the past (RELENG_5) I've had major problems with syncer delaying interrupt threads for long periods (I've seen 8 msec). See
http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
I'm not sure if this is still a problem (but I am still having some problems which may be caused by excessive interrupt latency and will be doing some debugging as I get time).

> I have a few ideas on tracking down the root cause, but they are pretty painful and slow.

In my case, I was fairly certain that the problem I was seeing was excessive interrupt latency for my driver. The approach I took was to capture TSC, IRQ number and curproc address in lapic_handle_intr(), atpic_handle_intr() and at the beginning of my interrupt handler into a ring buffer. I'd dump the ring buffer into a file using a userland tool and then post-process the file looking for oddities. In my case, there was a _very_ high correlation between long latencies and syncer. If anyone's interested in this approach, I can provide the relevant code diffs.

> 2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither uses an ithread.

The problem I ran into with this approach was that my interrupt handler needs to use psignal(9), which requires PROC_LOCK(), which (AFAIK) isn't allowed in an INTR_FAST handler.

It would be useful if our interrupt subsystem allowed both INTR_FAST and normal interrupt handlers to be defined. If an INTR_FAST handler is defined then it gets executed and decides whether its associated interrupt thread handler needs to be triggered. If there's no INTR_FAST handler then the interrupt thread is always triggered.

-- 
Peter Jeremy
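The post-processing step Peter describes might look roughly like the sketch below. Everything about the record layout is an assumption: the thread does not show the ring-buffer format, so the `Sample` tuple, the `"entry"`/`"handler"` labels for the two capture points, and the use of a process name rather than a curproc address are all hypothetical stand-ins.

```python
from collections import Counter, namedtuple

# Hypothetical record layout; the real dump format is not given in the thread.
# point is "entry" (low-level interrupt entry) or "handler" (driver handler start).
Sample = namedtuple("Sample", "tsc irq point curproc")


def long_latencies(samples, tsc_hz, threshold_s):
    """Pair each 'entry' sample with the next 'handler' sample for the same IRQ.

    Returns a list of (latency_seconds, irq, curproc_at_entry) for every pair
    whose latency is at least threshold_s.
    """
    pending = {}  # irq -> entry sample still waiting for its handler
    slow = []
    for s in sorted(samples, key=lambda s: s.tsc):
        if s.point == "entry":
            pending.setdefault(s.irq, s)  # keep the earliest unmatched entry
        elif s.point == "handler" and s.irq in pending:
            entry = pending.pop(s.irq)
            latency = (s.tsc - entry.tsc) / tsc_hz
            if latency >= threshold_s:
                slow.append((latency, s.irq, entry.curproc))
    return slow


def blame(slow):
    """Count which process was current when each slow interrupt arrived."""
    return Counter(curproc for _, _, curproc in slow)


if __name__ == "__main__":
    demo = [
        Sample(0, 16, "entry", "syncer"),
        Sample(8, 16, "handler", "syncer"),
        Sample(100, 16, "entry", "idle"),
        Sample(100, 16, "handler", "idle"),
    ]
    # tsc_hz=1000 makes one tick equal one millisecond in this toy data.
    print(blame(long_latencies(demo, tsc_hz=1000, threshold_s=0.005)))
```

A strong skew in the `blame` counts toward one process is the kind of correlation Peter reports seeing with syncer.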
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Peter Jeremy wrote:
> On Wed, 2006-Sep-27 10:32:49 -0600, Scott Long wrote:
> > My theory here is that something in the kernel, likely VM/VFS, is holding the Giant lock for an inordinate amount of time.
>
> In the past (RELENG_5) I've had major problems with syncer delaying interrupt threads for long periods (I've seen 8 msec). See
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> I'm not sure if this is still a problem (but I am still having some problems which may be caused by excessive interrupt latency and will be doing some debugging as I get time).
>
> > I have a few ideas on tracking down the root cause, but they are pretty painful and slow.
>
> In my case, I was fairly certain that the problem I was seeing was excessive interrupt latency for my driver. The approach I took was to capture TSC, IRQ number and curproc address in lapic_handle_intr(), atpic_handle_intr() and at the beginning of my interrupt handler into a ring buffer. I'd dump the ring buffer into a file using a userland tool and then post-process the file looking for oddities. In my case, there was a _very_ high correlation between long latencies and syncer. If anyone's interested in this approach, I can provide the relevant code diffs.

Yes, I was thinking about the syncer too, but the timeouts for ethernet interfaces are measured in seconds, not milliseconds.

> > 2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither uses an ithread.
>
> The problem I ran into with this approach was that my interrupt handler needs to use psignal(9), which requires PROC_LOCK(), which (AFAIK) isn't allowed in an INTR_FAST handler.

You can define a very simple INTR_FAST handler that just disables the interrupt at the device and then schedules a taskqueue to do the real work. This is what I did for if_em, actually.

> It would be useful if our interrupt subsystem allowed both INTR_FAST and normal interrupt handlers to be defined. If an INTR_FAST handler is defined then it gets executed and decides whether its associated interrupt thread handler needs to be triggered. If there's no INTR_FAST handler then the interrupt thread is always triggered.

This was an SoC2006 project, and I believe it will be committed fairly soon.

Scott
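The split Scott describes (a tiny fast handler that only quiets the device and queues the real work for later) can be shown as a plain Python model. This is not the FreeBSD API: `Device`, `Taskqueue`, `fast_handler` and `deferred_work` are hypothetical names chosen for illustration, and the model captures only the ordering of the steps.

```python
from collections import deque


class Device:
    """Stand-in for a NIC with an interrupt-enable bit and pending events."""

    def __init__(self):
        self.intr_enabled = True
        self.pending = []
        self.handled = []


class Taskqueue:
    """Minimal queue of deferred callables, drained from thread context."""

    def __init__(self):
        self._tasks = deque()

    def enqueue(self, task):
        self._tasks.append(task)

    def __len__(self):
        return len(self._tasks)

    def run(self):
        while self._tasks:
            self._tasks.popleft()()


def deferred_work(dev):
    """Runs later, in a context where blocking locks would be allowed."""
    dev.handled.extend(dev.pending)
    dev.pending.clear()
    dev.intr_enabled = True  # re-arm the device only after the work is done


def fast_handler(dev, tq):
    """Runs in interrupt context: does no real work and takes no blocking locks."""
    dev.intr_enabled = False  # silence the device itself, not the shared line
    tq.enqueue(lambda: deferred_work(dev))


if __name__ == "__main__":
    dev, tq = Device(), Taskqueue()
    dev.pending = ["rx", "tx"]
    fast_handler(dev, tq)
    tq.run()
    print(dev.handled, dev.intr_enabled)
```

Because the interrupt is disabled at the device rather than masked at the PIC, other devices sharing the line keep receiving interrupts while the deferred work waits its turn, which is the property option 2 relies on.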
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
> In the past (RELENG_5) I've had major problems with syncer delaying interrupt threads for long periods (I've seen 8 msec). See
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> I'm not sure if this is still a problem (but I am still having some problems which may be caused by excessive interrupt latency and will be doing some debugging as I get time).
> ...
> tool and then post-process the file looking for oddities. In my case, there was a _very_ high correlation between long latencies and syncer. If anyone's interested in this approach, I can provide the relevant code diffs.

I've seen this problem as well. It results in around 9-10 ms of occasional scheduling delay for a real-time streaming application that I'm developing. Shutting off softupdates on all of the mounted filesystems helps.

Note that the watchdog timeout for the network drivers is usually 8000 ms (8 seconds), so this is unlikely to be related to that problem.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
On 37378-12-23 20:59, Patrick M. Hausen wrote:
> Hello!
>
> > Well, the best I can say at the moment is, "Wow." =-( I guess the thing to do here is to figure out if the problem lies with the em interrupt handler not getting run, or the taskqueue not getting run.
>
> I helped Pyun with some debugging by providing ssh access to a machine showing the (seemingly) same problem. At first he thought the interrupt handler of the em driver was the culprit, but we applied quite a few patches and tested afterwards - seems like the driver is not the cause.
>
> On -stable, other people occasionally complained about very similar-looking problems with bge and other drivers. My guess is, though I'm not a kernel developer, just an experienced admin, that em stands out as problematic just by coincidence. Certain onboard network components tend to come with certain chipsets and certain architectures.
>
> So, Pyun suggested it was a problem with the taskqueue that was introduced some time between 6.0 and 6.1.
>
> With my system (Tyan GT20 B5161G20) the problem shows when there is heavy disk and CPU activity, like make buildworld. I made sure that the em interface doesn't share an interrupt with the SATA controller.
>
> When the problem occurs, I get the well-known watchdog timeout messages and then the system's network activity over that interface freezes completely for a couple of minutes. Usually the system recovers after a while without a reboot or other measures.

Strange... I've seen exactly that on a (recent) RELENG_6 box, but using a dirty old USB 1.1 NIC (aue). I've seen DOWN and UP messages on the console all the time (mostly while rebuilding kernel + world + ports), but did not care about them.

The machine in question is an Athlon XP-64 Socket 939, Asus A8N-VM CSM. The USB ethernet NIC is a low-budget ADMtek device.

My observations are probably not related to your issues, but they may be a sign that this is not really a driver issue and not GigE related.

Greetings,

Volker
Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
David G Lawrence wrote:
> > In the past (RELENG_5) I've had major problems with syncer delaying interrupt threads for long periods (I've seen 8 msec). See
> > http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> > I'm not sure if this is still a problem (but I am still having some problems which may be caused by excessive interrupt latency and will be doing some debugging as I get time).
> > ...
> > tool and then post-process the file looking for oddities. In my case, there was a _very_ high correlation between long latencies and syncer. If anyone's interested in this approach, I can provide the relevant code diffs.
>
> I've seen this problem as well. It results in around 9-10 ms of occasional scheduling delay for a real-time streaming application that I'm developing. Shutting off softupdates on all of the mounted filesystems helps.
>
> Note that the watchdog timeout for the network drivers is usually 8000 ms (8 seconds), so this is unlikely to be related to that problem.

Well, I kinda danced around the issue before, but I'll say it now. I, as well as a few others, have seen instances of Giant being held by the syncer for 5 or more seconds at a time. I can't explain why, and I've never been able to catch it in the act in a meaningful way. But it is known to happen. My best wild guess is that the syncer is doing a lot of work (there is no question here), and keeps on getting preempted, and as part of this, it blocks without locks being dropped.

Actually, this is most likely exactly what is going on. The syncer is sending out I/O and is getting interrupted and preempted by the SATA controller and its driver, and it winds up making very slow progress while never actually releasing Giant. An easy way to test this would be to turn off preemption. Could someone with this problem remove the 'option PREEMPTION' line in their kernel config and recompile/retest?

If this is in fact the root cause, then it indeed has nothing to do with the em driver INTR_FAST changes. The easiest fix then becomes the ichsmb and usb driver shims that I talked about. The longer-term fix is to continue progress on making the syncer run without Giant and also not do so much work. I think that there should also be some discussion on the locking consequences of preemption.

Scott