Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Fri, Oct 17, 2008 at 03:21:48PM +0200, Aniruddha wrote:
> On Thu, 2008-10-16 at 08:56 +0900, Pyun YongHyeon wrote:
> > It seems that msk(4) in HEAD does not build correctly on RELENG_7.
> > Try attached patch.
> >
> > Save attached patch to /path/to/patch
> > #cd /usr/src/sys/dev/msk
> > #patch -p0 < /path/to/patch/msk.watchdog.diff
> >
> > And rebuild your kernel.
>
> This patch failed with the following error:
>
> /if_msk.c
> /usr/src/sys/dev/msk/if_msk.c:845:50: error: macro MEXTADD passed 8 arguments, but takes just 7
> /usr/src/sys/dev/msk/if_msk.c: In function 'msk_jumbo_newbuf':
> /usr/src/sys/dev/msk/if_msk.c:844: error: 'MEXTADD' undeclared (first use in this function)
> /usr/src/sys/dev/msk/if_msk.c:844: error: (Each undeclared identifier is reported only once
> /usr/src/sys/dev/msk/if_msk.c:844: error: for each function it appears in.)
> *** Error code 1
> Stop in /usr/obj/usr/src/sys/MYKERNEL.
> *** Error code 1
> Stop in /usr/src.
> *** Error code 1
> Stop in /usr/src.

Are you sure you've backed out all earlier changes to
if_msk.c/if_mskreg.h? The patch above should be applied to the
if_msk.c/if_mskreg.h shipped with 7.1-PRERELEASE. I've compile-tested
it, and it should build without problems.

--
Regards,
Pyun YongHyeon
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Thu, 2008-10-16 at 08:56 +0900, Pyun YongHyeon wrote:
> It seems that msk(4) in HEAD does not build correctly on RELENG_7.
> Try attached patch.
>
> Save attached patch to /path/to/patch
> #cd /usr/src/sys/dev/msk
> #patch -p0 < /path/to/patch/msk.watchdog.diff
>
> And rebuild your kernel.

This patch failed with the following error:

/if_msk.c
/usr/src/sys/dev/msk/if_msk.c:845:50: error: macro MEXTADD passed 8 arguments, but takes just 7
/usr/src/sys/dev/msk/if_msk.c: In function 'msk_jumbo_newbuf':
/usr/src/sys/dev/msk/if_msk.c:844: error: 'MEXTADD' undeclared (first use in this function)
/usr/src/sys/dev/msk/if_msk.c:844: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/msk/if_msk.c:844: error: for each function it appears in.)
*** Error code 1
Stop in /usr/obj/usr/src/sys/MYKERNEL.
*** Error code 1
Stop in /usr/src.
*** Error code 1
Stop in /usr/src.
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> Each time my internet connection is under heavy load it gets killed
> after ten minutes or so. I tried the following commands to get the
> internet back up, but nothing helped:
>
> /etc/rc.d/netif restart
> ifconfig mynic down
> ifconfig mynic up
>
> Even worse, the last time I issued a '/etc/rc.d/netif restart' my
> whole system hardlocked (wasn't responding to capslock presses). So
> far the only solution has been to reboot the computer.
>
> Is there any way I can prevent my internet connection from getting
> killed? How do I get it back up after it has been killed? Thanks in
> advance!

What network card are you using? Can you provide output from the
following commands?

dmesg
vmstat -i
netstat -in

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
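The three commands requested above can be gathered into one file for pasting into a reply. This is only a minimal sketch: the function name `collect_diag` and the output path are made up for illustration, and nothing here is specific to msk(4).

```sh
#!/bin/sh
# collect_diag OUTFILE  (hypothetical helper, not a FreeBSD tool)
# Runs each diagnostic command from the message above and appends its
# output to OUTFILE under a header, so the three outputs stay separate.
collect_diag() {
    out=$1
    : > "$out"                          # start with an empty file
    for cmd in "dmesg" "vmstat -i" "netstat -in"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # $cmd is left unquoted on purpose so "vmstat -i" splits into
        # a command name and its option.
        $cmd >> "$out" 2>&1 || printf '(%s failed)\n' "$cmd" >> "$out"
    done
}
```

Calling `collect_diag /tmp/msk-diag.txt` right after the connection dies, before rebooting, would capture the interface error counters and interrupt counts at the moment of failure.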
Re: Under heavy load internet gets killed, only a reboot can bring it back up
> netstat -in
>
> I have a Marvell Yukon onboard nic.

That's normal ;)
Buy some NORMAL, working NIC.

You may also try disabling some features on that NIC.
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 01:17:58PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
> > On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> > > Each time my internet connection is under heavy load it gets killed
> > > after ten minutes or so. I tried the following commands to get the
> > > internet back up, but nothing helped:
> > >
> > > /etc/rc.d/netif restart
> > > ifconfig mynic down
> > > ifconfig mynic up
> > >
> > > Even worse, the last time I issued a '/etc/rc.d/netif restart' my
> > > whole system hardlocked (wasn't responding to capslock presses). So
> > > far the only solution has been to reboot the computer.
> > >
> > > Is there any way I can prevent my internet connection from getting
> > > killed? How do I get it back up after it has been killed? Thanks in
> > > advance!
> >
> > What network card are you using? Can you provide output from the
> > following commands?
> >
> > dmesg
> > vmstat -i
> > netstat -in
>
> I have a Marvell Yukon onboard nic. Here's the output:
>
> netstat -in
> Name    Mtu Network       Address       Ipkts Ierrs Opkts Oerrs Coll
> msk0   1500 <Link#1>                       29     0    25     0    0
> msk0   1500 :                               0     -     5     -    -
> msk0   1500 192.168.2.0/2 192.168.2.111    16     -    14     -    -
> fwe0*  1500 <Link#2>                        0     0     0     0    0
> fwip0  1500 <Link#3>                        0     0     0     0    0
> lo0   16384 <Link#4>                        0     0     0     0    0
> lo0   16384 ::1/128       ::1               0     -     0     -    -
> lo0   16384 ::1/64                          0     -     0     -    -
> lo0   16384 127.0.0.0/8   127.0.0.1         0     -     0     -    -

This looks okay. I see no interface errors, which is good.

> vmstat -i
> interrupt                          total       rate
> irq17: atapci0+                       13          0
> irq18: atapci1+                     1045          5
> irq20: uhci0 ehci0                 13462         69
> irq21: fwohci0                         3          0
> irq23: atapci3                    102718        529
> cpu0: timer                       386229       1990
> irq256: mskc0                         46          0
> cpu1: timer                       376453       1940
> Total                             879969       4535

msk(4) appears to be using MSI/MSI-X here.

One thing worth trying would be to disable MSI/MSI-X. You can disable
these by adding the following to your /boot/loader.conf:

hw.pci.enable_msix=0
hw.pci.enable_msi=0

> Copyright (c) 1992-2008 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-BETA #0: Sun Sep 7 13:49:18 UTC 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC Timecounter i8254 frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (3001.18-MHz 686-class CPU) Origin = GenuineIntel Id = 0x10676 Stepping = 6 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0x8e3fdSSE3,RSVD2,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,b19 AMD Features=0x2000LM AMD Features2=0x1LAHF Cores per package: 2 real memory = 3220701184 (3071 MB) avail memory = 3146145792 (3000 MB) ACPI APIC Table: A_M_I_ OEMAPIC FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0 Version 2.0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: A_M_I_ OEMRSDT on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, bff0 (3) failed Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0 pci0: ACPI PCI bus on pcib0 pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0 pci5: ACPI PCI bus on pcib1 vgapci0: VGA-compatible display port 0xc800-0xc8ff mem 0xd000-0xdfff,0xff9f-0xff9f irq 16 at device 0.0 on pci5 pci5: multimedia at device 0.1 (no driver attached) pci0: multimedia at device 27.0 (no driver attached) pcib2: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0 pci4: ACPI PCI bus on pcib2 pcib3: ACPI PCI-PCI bridge irq 19 at device 28.3 on pci0 pci3: ACPI PCI bus on pcib3 mskc0: Marvell Yukon 88E8053 Gigabit Ethernet port 0xb800-0xb8ff mem 0xff8fc000-0xff8f irq 19 at device 0.0 on pci3 msk0: Marvell Technology Group Ltd. 
Yukon EC Id 0xb6 Rev 0x02 on mskc0 msk0: Ethernet address: 00:1e:8c:5a:62:da miibus0: MII bus on msk0 e1000phy0: Marvell 88E Gigabit PHY PHY 0 on miibus0 e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX,
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, 2008-10-15 at 21:09 +0900, PYUN Yong-Hyeon wrote:
> This controller is known to be a buggy one. See below.
>
> [...]
>
> > > Trying to mount root from ufs:/dev/ad16s3a
> > > WARNING: / was not properly dismounted
> > > GEOM_LABEL: Label ext2fs/home removed.
> > > GEOM_LABEL: Label ext2fs/data removed.
> > > mskc0: Uncorrectable PCI Express error
> > > mskc0: Uncorrectable PCI Express error
> >
> > Those errors at the end of your dmesg don't look good; could be the
> > sign of a NIC or motherboard that's going bad, or possibly a very
> > strange driver problem.
>
> I guess the message above could be safely ignored.
>
> > Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
> > msk(4) driver.
> >
> > Yong-Hyeon, do you know of any conditions where heavy network I/O
> > could cause msk(4) to lock up or stop transmitting traffic, or
> > possibly hard-lock on ifconfig down/up?
>
> I think a workaround for the controller bug was committed to HEAD
> (SVN r183346). To the original poster: would you try the latest
> if_msk.c from HEAD? (Just copy if_msk.c/if_mskreg.h from HEAD to your
> box.)

You've got to help me a little bit here. How do I achieve this? Btw, I
am running FreeBSD 7.1-BETA. Doesn't that mean the fix is already
applied?

--
Regards,
Aniruddha
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 04:24:21PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 21:09 +0900, PYUN Yong-Hyeon wrote:
> > This controller is known to be a buggy one. See below.
> >
> > [...]
> >
> > > > Trying to mount root from ufs:/dev/ad16s3a
> > > > WARNING: / was not properly dismounted
> > > > GEOM_LABEL: Label ext2fs/home removed.
> > > > GEOM_LABEL: Label ext2fs/data removed.
> > > > mskc0: Uncorrectable PCI Express error
> > > > mskc0: Uncorrectable PCI Express error
> > >
> > > Those errors at the end of your dmesg don't look good; could be the
> > > sign of a NIC or motherboard that's going bad, or possibly a very
> > > strange driver problem.
> >
> > I guess the message above could be safely ignored.
> >
> > > Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
> > > msk(4) driver.
> > >
> > > Yong-Hyeon, do you know of any conditions where heavy network I/O
> > > could cause msk(4) to lock up or stop transmitting traffic, or
> > > possibly hard-lock on ifconfig down/up?
> >
> > I think a workaround for the controller bug was committed to HEAD
> > (SVN r183346). To the original poster: would you try the latest
> > if_msk.c from HEAD? (Just copy if_msk.c/if_mskreg.h from HEAD to
> > your box.)
>
> You've got to help me a little bit here. How do I achieve this? Btw, I
> am running FreeBSD 7.1-BETA. Doesn't that mean the fix is already
> applied?

FreeBSD 7.1-BETA == RELENG_7 in CVS tag terms.
FreeBSD 8.0 == CURRENT == HEAD in CVS tag terms.

You need to download the data at the links below and save the output
in the files shown on the left:

if_msk.c    -- http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk/if_msk.c?rev=1.34;content-type=text%2Fplain
if_mskreg.h -- http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk/if_mskreg.h?rev=1.13;content-type=text%2Fplain

These are the msk(4) Ethernet driver sources in CURRENT.

*DO NOT* visit those web pages in a browser and then copy/paste the
output into a file. Use a tool like fetch(1) or wget(1) to do the work
for you. It's not hard.

Once you have those two files, you will need to replace your existing
driver code with the new files. First make backups:

$ cd /usr/src/sys/dev/msk
$ cp -p if_msk.c if_msk.c.orig
$ cp -p if_mskreg.h if_mskreg.h.orig

Now replace the old code with the new:

$ cd /wherever/you/downloaded/the/files
$ mv if_msk.c /usr/src/sys/dev/msk
$ mv if_mskreg.h /usr/src/sys/dev/msk

Now you need to rebuild the kernel and install the kernel. In this
scenario, DO NOT use any -j flags when building the kernel: if the
driver doesn't build, you'll be scrolling back through pages of output
trying to find out why.

If the build fails, paste the errors you get here and one of us can
try to figure out why. Otherwise, installkernel and reboot. You should
not need to build world for this.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
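The backup, download, and rebuild steps described in the message above can be laid out as one reviewable command list. This is a sketch, not a tested procedure: `msk_upgrade_plan` is a hypothetical helper that only prints commands, the kernel configuration name defaults to GENERIC as an assumption, and fetch(1) is used for the download with the URLs single-quoted. Note that elsewhere in this thread the HEAD files are reported not to build on RELENG_7, so treat this as an illustration of the steps only.

```sh
#!/bin/sh
# Base URL taken from the message above.
BASE='http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk'

# msk_upgrade_plan [KERNCONF]  (hypothetical helper)
# Prints the commands instead of running them, so they can be read
# first and then run as root by hand.
msk_upgrade_plan() {
    kernconf=${1:-GENERIC}              # assumption: default config name
    cat <<EOF
cd /usr/src/sys/dev/msk
cp -p if_msk.c if_msk.c.orig
cp -p if_mskreg.h if_mskreg.h.orig
fetch -o if_msk.c '$BASE/if_msk.c?rev=1.34;content-type=text%2Fplain'
fetch -o if_mskreg.h '$BASE/if_mskreg.h?rev=1.13;content-type=text%2Fplain'
cd /usr/src
make buildkernel KERNCONF=$kernconf
make installkernel KERNCONF=$kernconf
EOF
}
```

Running `msk_upgrade_plan MYKERNEL` prints the eight commands for a custom kernel named MYKERNEL. The make lines carry no -j flag, following the advice above, so a compile error ends up at the bottom of the output.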
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
> On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> > Each time my internet connection is under heavy load it gets killed
> > after ten minutes or so. I tried the following commands to get the
> > internet back up, but nothing helped:
> >
> > /etc/rc.d/netif restart
> > ifconfig mynic down
> > ifconfig mynic up
> >
> > Even worse, the last time I issued a '/etc/rc.d/netif restart' my
> > whole system hardlocked (wasn't responding to capslock presses). So
> > far the only solution has been to reboot the computer.
> >
> > Is there any way I can prevent my internet connection from getting
> > killed? How do I get it back up after it has been killed? Thanks in
> > advance!
>
> What network card are you using? Can you provide output from the
> following commands?
>
> dmesg
> vmstat -i
> netstat -in

I have a Marvell Yukon onboard nic. Here's the output:

netstat -in
Name    Mtu Network       Address       Ipkts Ierrs Opkts Oerrs Coll
msk0   1500 <Link#1>                       29     0    25     0    0
msk0   1500 :                               0     -     5     -    -
msk0   1500 192.168.2.0/2 192.168.2.111    16     -    14     -    -
fwe0*  1500 <Link#2>                        0     0     0     0    0
fwip0  1500 <Link#3>                        0     0     0     0    0
lo0   16384 <Link#4>                        0     0     0     0    0
lo0   16384 ::1/128       ::1               0     -     0     -    -
lo0   16384 ::1/64                          0     -     0     -    -
lo0   16384 127.0.0.0/8   127.0.0.1         0     -     0     -    -

vmstat -i
interrupt                          total       rate
irq17: atapci0+                       13          0
irq18: atapci1+                     1045          5
irq20: uhci0 ehci0                 13462         69
irq21: fwohci0                         3          0
irq23: atapci3                    102718        529
cpu0: timer                       386229       1990
irq256: mskc0                         46          0
cpu1: timer                       376453       1940
Total                             879969       4535

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-BETA #0: Sun Sep 7 13:49:18 UTC 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC Timecounter i8254 frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (3001.18-MHz 686-class CPU) Origin = GenuineIntel Id = 0x10676 Stepping = 6 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0x8e3fdSSE3,RSVD2,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,b19 AMD Features=0x2000LM AMD Features2=0x1LAHF Cores per package: 2 real memory = 3220701184 (3071 MB) avail memory = 3146145792 (3000 MB) ACPI APIC Table: A_M_I_ OEMAPIC FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0 Version 2.0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: A_M_I_ OEMRSDT on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, bff0 (3) failed Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0 pci0: ACPI PCI bus on pcib0 pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0 pci5: ACPI PCI bus on pcib1 vgapci0: VGA-compatible display port 0xc800-0xc8ff mem 0xd000-0xdfff,0xff9f-0xff9f irq 16 at device 0.0 on pci5 pci5: multimedia at device 0.1 (no driver attached) pci0: multimedia at device 27.0 (no driver attached) pcib2: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0 pci4: ACPI PCI bus on pcib2 pcib3: ACPI PCI-PCI bridge irq 19 at device 28.3 on pci0 pci3: ACPI PCI bus on pcib3 mskc0: Marvell Yukon 88E8053 Gigabit Ethernet port 0xb800-0xb8ff mem 0xff8fc000-0xff8f irq 19 at device 0.0 on pci3 msk0: Marvell Technology Group Ltd. 
Yukon EC Id 0xb6 Rev 0x02 on mskc0 msk0: Ethernet address: 00:1e:8c:5a:62:da miibus0: MII bus on msk0 e1000phy0: Marvell 88E Gigabit PHY PHY 0 on miibus0 e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto mskc0: [FILTER] pcib4: ACPI PCI-PCI bridge irq 17 at device 28.5 on pci0 pci2: ACPI PCI bus on pcib4 atapci0: JMicron AHCI controller mem 0xff7fe000-0xff7f irq 17 at device 0.0 on pci2 atapci0: [ITHREAD] atapci0: AHCI Version 01.00 controller with 2 ports detected ata2: ATA channel 0 on atapci0 ata2: [ITHREAD] ata3: ATA channel 1 on atapci0 ata3: [ITHREAD] atapci1: JMicron JMB363 UDMA133 controller port
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 04:31:01AM -0700, Jeremy Chadwick wrote:
> On Wed, Oct 15, 2008 at 01:17:58PM +0200, Aniruddha wrote:
> > On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
> > > On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> > > > Each time my internet connection is under heavy load it gets killed
> > > > after ten minutes or so. I tried the following commands to get the
> > > > internet back up, but nothing helped:
> > > >
> > > > /etc/rc.d/netif restart
> > > > ifconfig mynic down
> > > > ifconfig mynic up
> > > >
> > > > Even worse, the last time I issued a '/etc/rc.d/netif restart' my
> > > > whole system hardlocked (wasn't responding to capslock presses). So
> > > > far the only solution has been to reboot the computer.
> > > >
> > > > Is there any way I can prevent my internet connection from getting
> > > > killed? How do I get it back up after it has been killed? Thanks in
> > > > advance!
> > >
> > > What network card are you using? Can you provide output from the
> > > following commands?
> > >
> > > dmesg
> > > vmstat -i
> > > netstat -in
> >
> > I have a Marvell Yukon onboard nic. Here's the output:
> >
> > netstat -in
> > Name    Mtu Network       Address       Ipkts Ierrs Opkts Oerrs Coll
> > msk0   1500 <Link#1>                       29     0    25     0    0
> > msk0   1500 :                               0     -     5     -    -
> > msk0   1500 192.168.2.0/2 192.168.2.111    16     -    14     -    -
> > fwe0*  1500 <Link#2>                        0     0     0     0    0
> > fwip0  1500 <Link#3>                        0     0     0     0    0
> > lo0   16384 <Link#4>                        0     0     0     0    0
> > lo0   16384 ::1/128       ::1               0     -     0     -    -
> > lo0   16384 ::1/64                          0     -     0     -    -
> > lo0   16384 127.0.0.0/8   127.0.0.1         0     -     0     -    -
>
> This looks okay. I see no interface errors, which is good.
>
> > vmstat -i
> > interrupt                          total       rate
> > irq17: atapci0+                       13          0
> > irq18: atapci1+                     1045          5
> > irq20: uhci0 ehci0                 13462         69
> > irq21: fwohci0                         3          0
> > irq23: atapci3                    102718        529
> > cpu0: timer                       386229       1990
> > irq256: mskc0                         46          0
> > cpu1: timer                       376453       1940
> > Total                             879969       4535
>
> msk(4) appears to be using MSI/MSI-X here.
>
> One thing worth trying would be to disable MSI/MSI-X. You can disable
> these by adding the following to your /boot/loader.conf:
>
> hw.pci.enable_msix=0
> hw.pci.enable_msi=0

The settings above will disable all MSI/MSI-X capability of the box.
If the intention is to disable only the MSI feature of the Marvell
network controller, add hw.msk.msi_disable=1 to /boot/loader.conf.
But I don't think you need to disable MSI capability unless you have
buggy PCI bridges. Without MSI, msk(4) would normally share interrupts
with other devices (e.g. USB).

> > Copyright (c) 1992-2008 The FreeBSD Project.
> > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > The Regents of the University of California. All rights reserved.
> > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > FreeBSD 7.1-BETA #0: Sun Sep 7 13:49:18 UTC 2008
> > [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
> > Timecounter i8254 frequency 1193182 Hz quality 0
> > CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (3001.18-MHz 686-class CPU)

[...]

> > mskc0: Marvell Yukon 88E8053 Gigabit Ethernet port 0xb800-0xb8ff mem 0xff8fc000-0xff8f irq 19 at device 0.0 on pci3
> > msk0: Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02 on mskc0
> > msk0: Ethernet address: 00:1e:8c:5a:62:da
> > miibus0: MII bus on msk0
> > e1000phy0: Marvell 88E Gigabit PHY PHY 0 on miibus0
> > e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
> > mskc0: [FILTER]

This controller is known to be a buggy one. See below.

[...]

> > Trying to mount root from ufs:/dev/ad16s3a
> > WARNING: / was not properly dismounted
> > GEOM_LABEL: Label ext2fs/home removed.
> > GEOM_LABEL: Label ext2fs/data removed.
> > mskc0: Uncorrectable PCI Express error
> > mskc0: Uncorrectable PCI Express error
>
> Those errors at the end of your dmesg don't look good; could be the
> sign of a NIC or motherboard that's going bad, or possibly a very
> strange driver problem.

I guess the message above could be safely ignored.

> Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
> msk(4) driver.
>
> Yong-Hyeon, do you know of any conditions where heavy network I/O
> could cause msk(4) to lock up or stop transmitting traffic, or
> possibly hard-lock on ifconfig down/up?

I think a workaround for the controller bug was committed to HEAD
(SVN r183346). To the original poster: would you try the latest
if_msk.c from HEAD? (Just copy
Re: Under heavy load internet gets killed, only a reboot can bring it back up
> One thing worth trying would be to disable MSI/MSI-X. You can disable
> these by adding the following to your /boot/loader.conf:
>
> hw.pci.enable_msix=0
> hw.pci.enable_msi=0

What's wrong with MSI interrupts?

> mskc0: Uncorrectable PCI Express error
> mskc0: Uncorrectable PCI Express error
>
> Those errors at the end of your dmesg don't look good; could be the
> sign of a NIC or motherboard that's going bad, or possibly a very
> strange driver problem.

Or the connectors just need cleaning, or the card isn't seated
properly: contact problems.
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 09:09:11PM +0900, PYUN Yong-Hyeon wrote:
> On Wed, Oct 15, 2008 at 04:31:01AM -0700, Jeremy Chadwick wrote:
> > On Wed, Oct 15, 2008 at 01:17:58PM +0200, Aniruddha wrote:
> > > On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
> > > > On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> > > > > Each time my internet connection is under heavy load it gets killed
> > > > > after ten minutes or so. I tried the following commands to get the
> > > > > internet back up, but nothing helped:
> > > > >
> > > > > /etc/rc.d/netif restart
> > > > > ifconfig mynic down
> > > > > ifconfig mynic up
> > > > >
> > > > > Even worse, the last time I issued a '/etc/rc.d/netif restart' my
> > > > > whole system hardlocked (wasn't responding to capslock presses). So
> > > > > far the only solution has been to reboot the computer.
> > > > >
> > > > > Is there any way I can prevent my internet connection from getting
> > > > > killed? How do I get it back up after it has been killed? Thanks in
> > > > > advance!
> > > >
> > > > What network card are you using? Can you provide output from the
> > > > following commands?
> > > >
> > > > dmesg
> > > > vmstat -i
> > > > netstat -in
> > >
> > > I have a Marvell Yukon onboard nic. Here's the output:
> > >
> > > netstat -in
> > > Name    Mtu Network       Address       Ipkts Ierrs Opkts Oerrs Coll
> > > msk0   1500 <Link#1>                       29     0    25     0    0
> > > msk0   1500 :                               0     -     5     -    -
> > > msk0   1500 192.168.2.0/2 192.168.2.111    16     -    14     -    -
> > > fwe0*  1500 <Link#2>                        0     0     0     0    0
> > > fwip0  1500 <Link#3>                        0     0     0     0    0
> > > lo0   16384 <Link#4>                        0     0     0     0    0
> > > lo0   16384 ::1/128       ::1               0     -     0     -    -
> > > lo0   16384 ::1/64                          0     -     0     -    -
> > > lo0   16384 127.0.0.0/8   127.0.0.1         0     -     0     -    -
> >
> > This looks okay. I see no interface errors, which is good.
> >
> > > vmstat -i
> > > interrupt                          total       rate
> > > irq17: atapci0+                       13          0
> > > irq18: atapci1+                     1045          5
> > > irq20: uhci0 ehci0                 13462         69
> > > irq21: fwohci0                         3          0
> > > irq23: atapci3                    102718        529
> > > cpu0: timer                       386229       1990
> > > irq256: mskc0                         46          0
> > > cpu1: timer                       376453       1940
> > > Total                             879969       4535
> >
> > msk(4) appears to be using MSI/MSI-X here.
> >
> > One thing worth trying would be to disable MSI/MSI-X. You can disable
> > these by adding the following to your /boot/loader.conf:
> >
> > hw.pci.enable_msix=0
> > hw.pci.enable_msi=0
>
> The settings above will disable all MSI/MSI-X capability of the box.
> If the intention is to disable only the MSI feature of the Marvell
> network controller, add hw.msk.msi_disable=1 to /boot/loader.conf.
> But I don't think you need to disable MSI capability unless you have
> buggy PCI bridges. Without MSI, msk(4) would normally share interrupts
> with other devices (e.g. USB).

Based on your conclusion below (about this particular Marvell NIC
and/or PHY being buggy), I don't think disabling MSI/MSI-X will do any
good.

> > > mskc0: Marvell Yukon 88E8053 Gigabit Ethernet port 0xb800-0xb8ff mem 0xff8fc000-0xff8f irq 19 at device 0.0 on pci3
> > > msk0: Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02 on mskc0
> > > msk0: Ethernet address: 00:1e:8c:5a:62:da
> > > miibus0: MII bus on msk0
> > > e1000phy0: Marvell 88E Gigabit PHY PHY 0 on miibus0
> > > e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
> > > mskc0: [FILTER]
>
> This controller is known to be a buggy one. See below.
>
> > Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
> > msk(4) driver.
> >
> > Yong-Hyeon, do you know of any conditions where heavy network I/O
> > could cause msk(4) to lock up or stop transmitting traffic, or
> > possibly hard-lock on ifconfig down/up?
>
> I think a workaround for the controller bug was committed to HEAD
> (SVN r183346). To the original poster: would you try the latest
> if_msk.c from HEAD? (Just copy if_msk.c/if_mskreg.h from HEAD to your
> box.)

As usual, thanks much for the explanation. :-)

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
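The exchange above names two ways to turn MSI off: the system-wide pair hw.pci.enable_msi / hw.pci.enable_msix, and the driver-only hw.msk.msi_disable. Either is a single line in /boot/loader.conf. The sketch below assumes a hypothetical helper, `set_loader_tunable`, that appends such a line only when the tunable is not already set; the tunable names come from the thread, the helper does not.

```sh
#!/bin/sh
# set_loader_tunable FILE NAME VALUE  (hypothetical helper)
# Appends NAME="VALUE" to FILE unless a line starting with NAME= is
# already present. The dots in NAME are treated as regex wildcards by
# grep, which is loose but good enough for a sketch.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "$name is already set in $file; edit it by hand" >&2
        return 1
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$file"
}
```

As root, `set_loader_tunable /boot/loader.conf hw.msk.msi_disable 1` would add the driver-only setting, which takes effect at the next boot. Deleting the line again restores the default, which makes this an easy experiment to back out.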
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, 2008-10-15 at 07:43 -0700, Jeremy Chadwick wrote:
> Now you need to rebuild the kernel and install the kernel. In this
> scenario, DO NOT use any -j flags when building the kernel: if the
> driver doesn't build, you'll be scrolling back through pages of output
> trying to find out why.
>
> If the build fails, paste the errors you get here and one of us can
> try to figure out why. Otherwise, installkernel and reboot. You should
> not need to build world for this.

Thanks for the extensive description. Unfortunately I got the
following error:

param inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding
/usr/src/sys/dev/msk/if_msk.c:241:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:244:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:245:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:246:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:247:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:248:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:249:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:250:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:251:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:252:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:253:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:254:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:255:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:256:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:258:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:259:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:260:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:261:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:262:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:263:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:264:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:265:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:267:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:268:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:269:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:270:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:271:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:273:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:274:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:275:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:276:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:278:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:279:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:280:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:282:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:283:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:285:10: error: #include expects "FILENAME" or <FILENAME>
/usr/src/sys/dev/msk/if_msk.c:292:10: error: #include expects "FILENAME" or <FIL
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 01:40:45PM +0200, Wojciech Puchar wrote:
> > One thing worth trying would be to disable MSI/MSI-X. You can disable
> > these by adding the following to your /boot/loader.conf:
> >
> > hw.pci.enable_msix=0
> > hw.pci.enable_msi=0
>
> what's wrong in MSI interrupts?

Nothing -- but there are known compatibility problems with MSI/MSI-X
on some boards. I remember reading about this with regards to em(4)
not too long ago. It's worth ruling out, especially since his problem
is reproducible (if disabling MSI doesn't fix the problem, he can
simply remove those two loader.conf variables and we've ruled out one
possibility).

> > mskc0: Uncorrectable PCI Express error
> > mskc0: Uncorrectable PCI Express error
> >
> > Those errors at the end of your dmesg don't look good; could be the
> > sign of a NIC or motherboard that's going bad, or possibly a very
> > strange driver problem.
>
> or just connectors should be cleaner or card isn't fitted well -
> contact problems.

I'm under the impression his NIC is on-board, not a physical PCI-E
card.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 07:26:36PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 07:43 -0700, Jeremy Chadwick wrote:
> > Now you need to rebuild the kernel and install the kernel. In this
> > scenario, DO NOT use any -j flags when building the kernel: if the
> > driver doesn't build, you'll be scrolling back through pages of
> > output trying to find out why.
> >
> > If the build fails, paste the errors you get here and one of us can
> > try to figure out why. Otherwise, installkernel and reboot. You
> > should not need to build world for this.
>
> Thanks for the extensive description. Unfortunately I got the
> following error:
>
> param inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding
> /usr/src/sys/dev/msk/if_msk.c:241:10: error: #include expects "FILENAME" or <FILENAME>

How exactly did you download the URLs I gave you?

Can you show me what's on line 241 of if_msk.c? A
'grep ^#include if_msk.c' for me returns only lines whose filenames
are surrounded with <> or "".

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
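The check asked for above (is the saved file really C source, or a web page?) can be automated roughly as follows. This is a sketch under assumptions: `check_source_download` is a made-up name, and the three patterns are a guess at what an HTML-wrapped CVSweb page would contain, on the theory that such a page escapes the angle brackets of every #include as `&lt;` and `&gt;`.

```sh
#!/bin/sh
# check_source_download FILE  (hypothetical helper)
# Returns 1 if FILE contains markup that plain C source should not
# have: an <html> or <!doctype> tag, or HTML-escaped angle brackets.
check_source_download() {
    file=$1
    if grep -qiE '<html|<!doctype|&lt;' "$file"; then
        echo "$file: contains HTML markup, probably a saved web page"
        return 1
    fi
    echo "$file: no HTML markup found"
    return 0
}
```

For a manual look, `sed -n '241p' if_msk.c` prints the line the compiler complained about, and `grep -n '^#include' if_msk.c` lists every include together with its line number.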
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 10:02:46PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 11:49 -0700, Jeremy Chadwick wrote:
> > How exactly did you download the URLs I gave you? Can you show me
> > what's on line 241 of if_msk.c? A 'grep ^#include if_msk.c' for me
> > returns lines which only include filenames surrounded with "" or <>.
>
> I downloaded the files this way:
>
> wget http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk/if_msk.c?rev=1.34;content-type=text%2Fplain

This is wrong. Your shell has interpreted special characters in the URL, and you ended up downloading the wrong URL, which caused HTML and other things to appear in the file. Is this your first time using UNIX? This is a little surprising.

You need to do (note the apostrophes, DO NOT use double-quotes):

$ wget -O if_msk.c 'http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk/if_msk.c?rev=1.34;content-type=text%2Fplain'

And be sure to do the same for the include file (change the -O argument, obviously).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
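To see what the missing quotes do, here is a minimal sketch for a POSIX sh (csh/tcsh behave differently in the details). It uses printf in place of wget so nothing is downloaded, and eval to re-parse the text the way a typed command line is parsed. The unquoted ';' ends the command, so only the part of the URL up to rev=1.34 is passed on, and the rest is run as a second command that fails.

```shell
#!/bin/sh
url='http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/msk/if_msk.c?rev=1.34;content-type=text%2Fplain'

# Quoted: the command receives the complete URL as one argument.
quoted=$(printf '%s\n' "$url")

# Unquoted: eval parses the text like a typed command line, so ';' ends
# the printf command. The remainder is run as a separate command, which
# fails; its error message is discarded here.
unquoted=$(eval "printf '%s\n' $url" 2>/dev/null || true)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

Without the content-type part, cvsweb is asked for a different page than the plain-text file, which fits the HTML that ended up in if_msk.c.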
Re: Under heavy load internet gets killed, only a reboot can bring it back up
On Wed, Oct 15, 2008 at 04:24:21PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 21:09 +0900, PYUN Yong-Hyeon wrote:
> > This controller is known to be a buggy one. See below.
> >
> > [...]
> >
> > > > Trying to mount root from ufs:/dev/ad16s3a
> > > > WARNING: / was not properly dismounted
> > > > GEOM_LABEL: Label ext2fs/home removed.
> > > > GEOM_LABEL: Label ext2fs/data removed.
> > > > mskc0: Uncorrectable PCI Express error
> > > > mskc0: Uncorrectable PCI Express error
> > >
> > > Those errors at the end of your dmesg don't look good; could be the
> > > sign of a NIC or motherboard that's going bad, or possibly a very
> > > strange driver problem.
> >
> > I guess the message above could be safely ignored.
> >
> > > Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
> > > msk(4) driver.
> > >
> > > Yong-Hyeon, do you know of any conditions where heavy network I/O
> > > could cause msk(4) to lock up or stop transmitting traffic, or
> > > possibly hard-lock on ifconfig down/up?
> >
> > I think workaround for the controller bug was committed to
> > HEAD(SVN r183346). To original poster, would you try latest
> > if_msk.c from HEAD?(Just copy if_msk.c/if_mskreg.h from HEAD to
> > your box.)
>
> You got to help me a little bit here. How do I achieve this? Btw I am
> running FreeBSD 7.1 BETA. Doesn't that mean the fix is already applied?

It seems that msk(4) in HEAD does not build correctly on RELENG_7. Try attached patch.

Save attached patch to /path/to/patch
#cd /usr/src/sys/dev/msk
#patch -p0 < /path/to/patch/msk.watchdog.diff
And rebuild your kernel.

-- 
Regards,
Pyun YongHyeon

Index: if_msk.c
===================================================================
--- if_msk.c	(revision 183165)
+++ if_msk.c	(working copy)
@@ -244,6 +244,9 @@
 static int msk_handle_events(struct msk_softc *);
 static void msk_handle_hwerr(struct msk_if_softc *, uint32_t);
 static void msk_intr_hwerr(struct msk_softc *);
+#ifndef __NO_STRICT_ALIGNMENT
+static __inline void msk_fixup_rx(struct mbuf *);
+#endif
 static void msk_rxeof(struct msk_if_softc *, uint32_t, int);
 static void msk_jumbo_rxeof(struct msk_if_softc *, uint32_t, int);
 static void msk_txeof(struct msk_if_softc *, int);
@@ -783,7 +786,12 @@
 		return (ENOBUFS);
 
 	m->m_len = m->m_pkthdr.len = MCLBYTES;
-	m_adj(m, ETHER_ALIGN);
+	if ((sc_if->msk_flags & MSK_FLAG_RAMBUF) == 0)
+		m_adj(m, ETHER_ALIGN);
+#ifndef __NO_STRICT_ALIGNMENT
+	else
+		m_adj(m, MSK_RX_BUF_ALIGN);
+#endif
 
 	if (bus_dmamap_load_mbuf_sg(sc_if->msk_cdata.msk_rx_tag,
 	    sc_if->msk_cdata.msk_rx_sparemap, m, segs, &nsegs,
@@ -840,7 +848,12 @@
 		return (ENOBUFS);
 	}
 	m->m_pkthdr.len = m->m_len = MSK_JLEN;
-	m_adj(m, ETHER_ALIGN);
+	if ((sc_if->msk_flags & MSK_FLAG_RAMBUF) == 0)
+		m_adj(m, ETHER_ALIGN);
+#ifndef __NO_STRICT_ALIGNMENT
+	else
+		m_adj(m, MSK_RX_BUF_ALIGN);
+#endif
 
 	if (bus_dmamap_load_mbuf_sg(sc_if->msk_cdata.msk_jumbo_rx_tag,
 	    sc_if->msk_cdata.msk_jumbo_rx_sparemap, m, segs, &nsegs,
@@ -1041,14 +1054,16 @@
 {
 	int next;
 	int i;
-	uint8_t val;
 
 	/* Get adapter SRAM size. */
-	val = CSR_READ_1(sc, B2_E_0);
-	sc->msk_ramsize = (val == 0) ? 128 : val * 4;
+	sc->msk_ramsize = CSR_READ_1(sc, B2_E_0) * 4;
 	if (bootverbose)
 		device_printf(sc->msk_dev,
 		    "RAM buffer size : %dKB\n", sc->msk_ramsize);
+	if (sc->msk_ramsize == 0)
+		return (0);
+
+	sc->msk_pflags |= MSK_FLAG_RAMBUF;
 	/*
 	 * Give receiver 2/3 of memory and round down to the multiple
 	 * of 1024. Tx/Rx RAM buffer size of Yukon II shoud be multiple
@@ -1412,6 +1427,7 @@
 	sc_if->msk_if_dev = dev;
 	sc_if->msk_port = port;
 	sc_if->msk_softc = sc;
+	sc_if->msk_flags = sc->msk_pflags;
 	sc->msk_if[port] = sc_if;
 	/* Setup Tx/Rx queue register offsets. */
 	if (port == MSK_PORT_A) {
@@ -1976,6 +1992,7 @@
 	struct msk_rxdesc *jrxd;
 	struct msk_jpool_entry *entry;
 	uint8_t *ptr;
+	bus_size_t rxalign;
 	int error, i;
 
 	mtx_init(&sc_if->msk_jlist_mtx, "msk_jlist_mtx", NULL, MTX_DEF);
@@ -2107,9 +2124,16 @@
 		goto fail;
 	}
+	rxalign = 1;
+	/*
+	 * Workaround hardware hang which seems to happen when Rx buffer
+	 * is not aligned on multiple of FIFO word(8 bytes).
+	 */
+	if ((sc_if->msk_flags & MSK_FLAG_RAMBUF) != 0)
+		rxalign = MSK_RX_BUF_ALIGN;
 	/* Create tag for Rx buffers. */
 	error = bus_dma_tag_create(sc_if->msk_cdata.msk_parent_tag,/* parent */
-	    1, 0,			/* alignment, boundary */
+	    rxalign, 0,			/* alignment, boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,