Re: panic: spin lock held too long (RELENG_8 from today)
on 07/07/2011 08:55 Mike Tancsa said the following: I did a buildworld on this box to bring it up to RELENG_8 for the BIND fixes. Unfortunately, the formerly solid box (April 13th kernel) panic'd tonight with Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long panic: spin lock held too long cpuid = 0 Uptime: 13h30m4s Physical memory: 2035 MB Its a somewhat busy box taking in mail as well as backups for a few servers over nfs. At the time, it would have been getting about 250Mb/s inbound on its gigabit interface. Full core.txt file at http://www.tancsa.com/core-jul8-2011.txt I thought that this was supposed to contain output of 'thread apply all bt' in kgdb. Anyway, I think that stacktrace for tid 100107 may have some useful information. #0 doadump () at pcpu.h:231 231 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump () at pcpu.h:231 #1 0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429 #2 0xc06fd937 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:602 #3 0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at /usr/src/sys/kern/kern_mutex.c:490 #4 0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526 #5 0xc0720254 in sched_add (td=0xc5dac5c0, flags=0) at /usr/src/sys/kern/sched_ule.c:1119 #6 0xc07203f9 in sched_wakeup (td=0xc5dac5c0) at /usr/src/sys/kern/sched_ule.c:1950 #7 0xc07061f8 in setrunnable (td=0xc5dac5c0) at /usr/src/sys/kern/kern_synch.c:499 #8 0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0, pri=Variable pri is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:751 #9 0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:825 #10 0xc06b6764 in cv_signal (cvp=0xc5fafe50) at /usr/src/sys/kern/kern_condvar.c:422 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available. ) at /usr/src/sys/rpc/svc.c:342 #12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at /usr/src/sys/rpc/svc.c:378 #13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600, waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747 #14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34) at /usr/src/sys/kern/uipc_sockbuf.c:191 #15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824, so=0xc6372ce0, tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:1775 #16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20) at /usr/src/sys/netinet/tcp_input.c:1329 #17 0xc07ddaf7 in ip_input (m=0xcaa8d200) at /usr/src/sys/netinet/ip_input.c:787 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:859 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:946 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:894 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:753 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at /usr/src/sys/dev/nfe/if_nfe.c:2187 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at /usr/src/sys/kern/subr_taskqueue.c:248 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at /usr/src/sys/kern/subr_taskqueue.c:385 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop, arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861 #26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275 (kgdb) -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: panic: spin lock held too long (RELENG_8 from today)
On Thu, Jul 07, 2011 at 10:36:42AM +0300, Andriy Gapon wrote: on 07/07/2011 08:55 Mike Tancsa said the following: I did a buildworld on this box to bring it up to RELENG_8 for the BIND fixes. Unfortunately, the formerly solid box (April 13th kernel) panic'd tonight with Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long panic: spin lock held too long cpuid = 0 Uptime: 13h30m4s Physical memory: 2035 MB Its a somewhat busy box taking in mail as well as backups for a few servers over nfs. At the time, it would have been getting about 250Mb/s inbound on its gigabit interface. Full core.txt file at http://www.tancsa.com/core-jul8-2011.txt I thought that this was supposed to contain output of 'thread apply all bt' in kgdb. Anyway, I think that stacktrace for tid 100107 may have some useful information. #0 doadump () at pcpu.h:231 231 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump () at pcpu.h:231 #1 0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429 #2 0xc06fd937 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:602 #3 0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at /usr/src/sys/kern/kern_mutex.c:490 #4 0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526 #5 0xc0720254 in sched_add (td=0xc5dac5c0, flags=0) at /usr/src/sys/kern/sched_ule.c:1119 #6 0xc07203f9 in sched_wakeup (td=0xc5dac5c0) at /usr/src/sys/kern/sched_ule.c:1950 #7 0xc07061f8 in setrunnable (td=0xc5dac5c0) at /usr/src/sys/kern/kern_synch.c:499 #8 0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0, pri=Variable pri is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:751 #9 0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:825 #10 0xc06b6764 in cv_signal (cvp=0xc5fafe50) at /usr/src/sys/kern/kern_condvar.c:422 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available. ) at /usr/src/sys/rpc/svc.c:342 #12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at /usr/src/sys/rpc/svc.c:378 #13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600, waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747 #14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34) at /usr/src/sys/kern/uipc_sockbuf.c:191 #15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824, so=0xc6372ce0, tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:1775 #16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20) at /usr/src/sys/netinet/tcp_input.c:1329 #17 0xc07ddaf7 in ip_input (m=0xcaa8d200) at /usr/src/sys/netinet/ip_input.c:787 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:859 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:946 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:894 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:753 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at /usr/src/sys/dev/nfe/if_nfe.c:2187 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at /usr/src/sys/kern/subr_taskqueue.c:248 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at /usr/src/sys/kern/subr_taskqueue.c:385 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop, arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861 #26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275 (kgdb) BTW, we had a similar panic, spinlock held too long, the spinlock is the sched lock N, on busy 8-core box recently upgraded to the stable/8. Unfortunately, machine hung dumping core, so the stack trace for the owner thread was not available. I was unable to make any conclusion from the data that was present. If the situation is reproducable, you coulld try to revert r221937. This is pure speculation, though. pgpW2o7azLFBo.pgp Description: PGP signature
system internal timer runs 10 times too slow
We upgraded an existing system to a new motherboard/CPU and found that timing in various programs is very odd. For example top only updates every 10 seconds instead of every second. And this confirms the oddness: # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:01 EST 2011 Thu Jul 7 19:09:11 EST 2011 Thu Jul 7 19:09:21 EST 2011 10 seconds instead of 1. So I looked first at the kernel timers: # dmesg | grep -i time Timecounter i8254 frequency 1193182 Hz quality 0 Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 pci3: multimedia, HDA at device 0.1 (no driver attached) atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0 Timecounter HPET frequency 14318180 Hz quality 900 Timecounters tick every 1.000 msec I switched i8254 and then to HPET. No difference. # sysctl -w kern.timecounter.hardware=i8254 kern.timecounter.hardware: ACPI-fast - i8254 # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:40 EST 2011 Thu Jul 7 19:09:41 EST 2011 I switched to TSC: # sysctl -w kern.timecounter.hardware=TSC kern.timecounter.hardware: HPET - TSC # while true; do echo `date`; sleep 1; done Thu Jul 7 19:25:56 EST 2011 Thu Jul 7 19:25:57 EST 2011 Thu Jul 7 19:25:58 EST 2011 Now this looks like it fixed the problem, but actually it is worse. Now the clock matches what you'd expect, but there is still 10 seconds in real time between those date entries. That is, now the system clock is running 10 times too slow as well. # uname -a FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Base board information Manufacturer: ASUSTeK Computer INC. Product Name: P6X58D-E BIOS information Vendor: American Megatrends Inc. Version: 0502 Release Date: 11/16/2010 BIOS Revision: 8.15 CPU Model: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz Thanks in advance for any help. Ari -- -- Aristedes Maniatis ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: system internal timer runs 10 times too slow
On Thu, Jul 07, 2011 at 07:39:05PM +1000, Aristedes Maniatis wrote: We upgraded an existing system to a new motherboard/CPU and found that timing in various programs is very odd. For example top only updates every 10 seconds instead of every second. And this confirms the oddness: # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:01 EST 2011 Thu Jul 7 19:09:11 EST 2011 Thu Jul 7 19:09:21 EST 2011 10 seconds instead of 1. So I looked first at the kernel timers: # dmesg | grep -i time Timecounter i8254 frequency 1193182 Hz quality 0 Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 pci3: multimedia, HDA at device 0.1 (no driver attached) atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0 Timecounter HPET frequency 14318180 Hz quality 900 Timecounters tick every 1.000 msec I switched i8254 and then to HPET. No difference. # sysctl -w kern.timecounter.hardware=i8254 kern.timecounter.hardware: ACPI-fast - i8254 # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:40 EST 2011 Thu Jul 7 19:09:41 EST 2011 I switched to TSC: # sysctl -w kern.timecounter.hardware=TSC kern.timecounter.hardware: HPET - TSC # while true; do echo `date`; sleep 1; done Thu Jul 7 19:25:56 EST 2011 Thu Jul 7 19:25:57 EST 2011 Thu Jul 7 19:25:58 EST 2011 Now this looks like it fixed the problem, but actually it is worse. Now the clock matches what you'd expect, but there is still 10 seconds in real time between those date entries. That is, now the system clock is running 10 times too slow as well. # uname -a FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Base board information Manufacturer: ASUSTeK Computer INC. Product Name: P6X58D-E BIOS information Vendor: American Megatrends Inc. Version: 0502 Release Date: 11/16/2010 BIOS Revision: 8.15 CPU Model:Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz Do you have anything like powerd(8) enabled, or EIST / Intel SpeedStep technology enabled in your system BIOS? If so, can you try disabling powerd and/or disabling EIST/SS? Alternately, and this isn't to say FreeBSD doesn't have a problem, do you have a replacement/spare motherboard you can try? There's always the possibility that you have a bad crystal on the motherboard and a replacement board would rule that out. -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: panic: spin lock held too long (RELENG_8 from today)
On 7/7/2011 4:20 AM, Kostik Belousov wrote: BTW, we had a similar panic, spinlock held too long, the spinlock is the sched lock N, on busy 8-core box recently upgraded to the stable/8. Unfortunately, machine hung dumping core, so the stack trace for the owner thread was not available. I was unable to make any conclusion from the data that was present. If the situation is reproducable, you coulld try to revert r221937. This is pure speculation, though. Another crash just now after 5hrs uptime. I will try and revert r221937 unless there is any extra debugging you want me to add to the kernel instead ? This is an inbound mail server so a little disruption is possible kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-marcel-freebsd... Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long panic: spin lock held too long cpuid = 0 Uptime: 5h37m43s Physical memory: 2035 MB Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5 Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from /boot/kernel/amdsbwd.ko.symbols...done. done. Loaded symbols for /boot/kernel/amdsbwd.ko #0 doadump () at pcpu.h:231 231 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:231 #1 0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429 #2 0xc06fd937 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:602 #3 0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at /usr/src/sys/kern/kern_mutex.c:490 #4 0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526 #5 0xc0720254 in sched_add (td=0xc61892e0, flags=0) at /usr/src/sys/kern/sched_ule.c:1119 #6 0xc07203f9 in sched_wakeup (td=0xc61892e0) at /usr/src/sys/kern/sched_ule.c:1950 #7 0xc07061f8 in setrunnable (td=0xc61892e0) at /usr/src/sys/kern/kern_synch.c:499 #8 0xc07362af in sleepq_resume_thread (sq=0xc55311c0, td=0xc61892e0, pri=Variable pri is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:751 #9 0xc0736e18 in sleepq_signal (wchan=0xc60386d0, flags=1, pri=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:825 #10 0xc06b6764 in cv_signal (cvp=0xc60386d0) at /usr/src/sys/kern/kern_condvar.c:422 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available. ) at /usr/src/sys/rpc/svc.c:342 #12 0xc08ec502 in xprt_active (xprt=0xc5db8a00) at /usr/src/sys/rpc/svc.c:378 #13 0xc08ee051 in svc_vc_soupcall (so=0xc618a19c, arg=0xc5db8a00, waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747 #14 0xc075bbb1 in sowakeup (so=0xc618a19c, sb=0xc618a1f0) at /usr/src/sys/kern/uipc_sockbuf.c:191 #15 0xc08447bc in tcp_do_segment (m=0xc6567a00, th=0xc6785824, so=0xc618a19c, tp=0xc617e000, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:1775 #16 0xc0847930 in tcp_input (m=0xc6567a00, off0=20) at /usr/src/sys/netinet/tcp_input.c:1329 #17 0xc07ddaf7 in ip_input (m=0xc6567a00) at /usr/src/sys/netinet/ip_input.c:787 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xc6567a00) at /usr/src/sys/net/netisr.c:859 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xc6567a00) at /usr/src/sys/net/netisr.c:946 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xc6567a00) at /usr/src/sys/net/if_ethersubr.c:894 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xc6567a00) at /usr/src/sys/net/if_ethersubr.c:753 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at /usr/src/sys/dev/nfe/if_nfe.c:2187 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at /usr/src/sys/kern/subr_taskqueue.c:248 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at /usr/src/sys/kern/subr_taskqueue.c:385 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop, arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861 #26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275 -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: panic: spin lock held too long (RELENG_8 from today)
On Thu, Jul 07, 2011 at 07:32:41AM -0400, Mike Tancsa wrote: On 7/7/2011 4:20 AM, Kostik Belousov wrote: BTW, we had a similar panic, spinlock held too long, the spinlock is the sched lock N, on busy 8-core box recently upgraded to the stable/8. Unfortunately, machine hung dumping core, so the stack trace for the owner thread was not available. I was unable to make any conclusion from the data that was present. If the situation is reproducable, you coulld try to revert r221937. This is pure speculation, though. Another crash just now after 5hrs uptime. I will try and revert r221937 unless there is any extra debugging you want me to add to the kernel instead ? This is an inbound mail server so a little disruption is possible kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-marcel-freebsd... Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long panic: spin lock held too long cpuid = 0 Uptime: 5h37m43s Physical memory: 2035 MB Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5 Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from /boot/kernel/amdsbwd.ko.symbols...done. done. Loaded symbols for /boot/kernel/amdsbwd.ko #0 doadump () at pcpu.h:231 231 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:231 #1 0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429 #2 0xc06fd937 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:602 #3 0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at /usr/src/sys/kern/kern_mutex.c:490 #4 0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526 #5 0xc0720254 in sched_add (td=0xc61892e0, flags=0) at /usr/src/sys/kern/sched_ule.c:1119 #6 0xc07203f9 in sched_wakeup (td=0xc61892e0) at /usr/src/sys/kern/sched_ule.c:1950 #7 0xc07061f8 in setrunnable (td=0xc61892e0) at /usr/src/sys/kern/kern_synch.c:499 #8 0xc07362af in sleepq_resume_thread (sq=0xc55311c0, td=0xc61892e0, pri=Variable pri is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:751 #9 0xc0736e18 in sleepq_signal (wchan=0xc60386d0, flags=1, pri=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:825 #10 0xc06b6764 in cv_signal (cvp=0xc60386d0) at /usr/src/sys/kern/kern_condvar.c:422 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available. ) at /usr/src/sys/rpc/svc.c:342 #12 0xc08ec502 in xprt_active (xprt=0xc5db8a00) at /usr/src/sys/rpc/svc.c:378 #13 0xc08ee051 in svc_vc_soupcall (so=0xc618a19c, arg=0xc5db8a00, waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747 #14 0xc075bbb1 in sowakeup (so=0xc618a19c, sb=0xc618a1f0) at /usr/src/sys/kern/uipc_sockbuf.c:191 #15 0xc08447bc in tcp_do_segment (m=0xc6567a00, th=0xc6785824, so=0xc618a19c, tp=0xc617e000, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:1775 #16 0xc0847930 in tcp_input (m=0xc6567a00, off0=20) at /usr/src/sys/netinet/tcp_input.c:1329 #17 0xc07ddaf7 in ip_input (m=0xc6567a00) at /usr/src/sys/netinet/ip_input.c:787 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xc6567a00) at /usr/src/sys/net/netisr.c:859 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xc6567a00) at /usr/src/sys/net/netisr.c:946 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xc6567a00) at /usr/src/sys/net/if_ethersubr.c:894 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xc6567a00) at /usr/src/sys/net/if_ethersubr.c:753 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at /usr/src/sys/dev/nfe/if_nfe.c:2187 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at /usr/src/sys/kern/subr_taskqueue.c:248 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at /usr/src/sys/kern/subr_taskqueue.c:385 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop, arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861 #26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275 1. info threads 2. Find the index value that matches the tid in question (in the above spin lock panic, that'd be tid 100109). The index value will be the first number shown on the left 3. thread {index} 4. bt If this doesn't work, alternatively you can try (from the beginning) thread apply all bt and provide the output from that. (It will be quite lengthy, and at this point I think tid 100109 is the one of interest in this
Re: panic: spin lock held too long (RELENG_8 from today)
on 07/07/2011 14:41 Jeremy Chadwick said the following: 1. info threads 2. Find the index value that matches the tid in question (in the above spin lock panic, that'd be tid 100109). The index value will be the first number shown on the left 3. thread {index} Just in case, in kgdb there is a command 'tid' that does all of the above steps in one go. 4. bt If this doesn't work, alternatively you can try (from the beginning) thread apply all bt and provide the output from that. (It will be quite lengthy, and at this point I think tid 100109 is the one of interest in this crash, based on what Andriy said earlier) -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: panic: spin lock held too long (RELENG_8 from today)
On 7/7/2011 7:32 AM, Mike Tancsa wrote: On 7/7/2011 4:20 AM, Kostik Belousov wrote: BTW, we had a similar panic, spinlock held too long, the spinlock is the sched lock N, on busy 8-core box recently upgraded to the stable/8. Unfortunately, machine hung dumping core, so the stack trace for the owner thread was not available. I was unable to make any conclusion from the data that was present. If the situation is reproducable, you coulld try to revert r221937. This is pure speculation, though. Another crash just now after 5hrs uptime. I will try and revert r221937 unless there is any extra debugging you want me to add to the kernel instead ? This is an inbound mail server so a little disruption is possible kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-marcel-freebsd... Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long panic: spin lock held too long cpuid = 0 Uptime: 5h37m43s Physical memory: 2035 MB Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5 And the second crash from today kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-marcel-freebsd... Unread portion of the kernel message buffer: spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long panic: spin lock held too long cpuid = 0 Uptime: 5h37m43s Physical memory: 2035 MB Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5 Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from /boot/kernel/amdsbwd.ko.symbols...done. done. Loaded symbols for /boot/kernel/amdsbwd.ko #0 doadump () at pcpu.h:231 231 pcpu.h: No such file or directory. in pcpu.h (kgdb) tid 100109 [Switching to thread 82 (Thread 100109)]#0 sched_switch (td=0xc5dac2e0, newtd=0xc553c5c0, flags=260) at /usr/src/sys/kern/sched_ule.c:1866 1866cpuid = PCPU_GET(cpuid); (kgdb) list 1861/* 1862 * We may return from cpu_switch on a different cpu. However, 1863 * we always return with td_lock pointing to the current cpu's 1864 * run queue lock. 1865 */ 1866cpuid = PCPU_GET(cpuid); 1867tdq = TDQ_CPU(cpuid); 1868lock_profile_obtain_lock_success( 1869TDQ_LOCKPTR(tdq)-lock_object, 0, 0, __FILE__, __LINE__); 1870#ifdef HWPMC_HOOKS (kgdb) p *td $1 = {td_lock = 0xc0b1d200, td_proc = 0xc5db4000, td_plist = {tqe_next = 0xc5dac5c0, tqe_prev = 0xc5dac008}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc0b1d334}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc65d3b00}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xc51f6b38}, td_cpuset = 0xc5533e38, td_sel = 0x0, td_sleepqueue = 0xc65d3b00, td_turnstile = 0xc63ceb80, td_umtxq = 0xc5d229c0, td_tid = 100109, td_sigqueue = { sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xc5dac340}, sq_proc = 0xc5db4000, sq_flags = 1}, td_flags = 4, td_inhibitors = 0, td_pflags = 2097152, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 0 '\0', td_oncpu = 1 '\001', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = -291, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xc5538100, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 1048, ru_ixrss = 85216, ru_idrss = 3834720, ru_isrss = 681728, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 82, ru_oublock = 271222, ru_msgsnd = 135625, ru_msgrcv = 2427350, ru_nsignals = 0, ru_nvcsw = 2076938, ru_nivcsw = 731134}, td_incruntime = 852332612, td_runtime = 88202475877, td_pticks = 5326, td_sticks = 48, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 2808072, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0, td_profil_addr
Re: Crashes with Promise controller
On 06.07.2011 01:00, George Kontostanos wrote: [Promise PDC40718 SATA300 controller] There are a lot of people I know that have similar issues. It has caused me to replace 3 disks so far. I am afraid that this controller should be marked as junk. Do you have an alternative controller in mind? Preferably I mean one that doesn't cost ten times as much. :-) Cheers! Chris ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Crashes with Promise controller
On 7/7/11 2:09 PM, Christian Baer wrote: Do you have an alternative controller in mind? Preferably I mean one that doesn't cost ten times as much. :-) I suggest an LSI 1068 based SAS controller. With the SAS-SATA cables included it will cost at most five times as much :) Regards, Thomas ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Crashes with Promise controller
On 06.07.2011 01:00, George Kontostanos wrote: There are a lot of people I know that have similar issues. It has caused me to replace 3 disks so far. I am afraid that this controller should be marked as junk. Just go another crash. This one is different... --- snip --- dev = ad16p1.eli, ino = 4, fs = /archive/drives/archive01 panic: ffs_freefile: freeing free inode KDB: stack backtrace: db_trace_self_wrapper(c0a1552c,6972642f,2f736576,68637261,30657669,a0d31,0,0,c485b0a4,100,c3ad05c0,d124c000,d96ceb7c,c06eb07b,c485b0a4,100,d96ceb7c,c06eb32b,c0a13393,d96ceb88) at db_trace_self_wrapper+0x26 kdb_backtrace(c0a13393,c0ace8c0,c0a2edcc,d96cebac,d96cebac,...) at kdb_backtrace+0x2b panic(c0a2edcc,c3cd7a78,4,c3b070d4,0,...) at panic+0xf8 ffs_freefile(c3b0b400,c3b07000,c3d8b000,4,41c0,...) at ffs_freefile+0x357 handle_workitem_freefile(0,d96cec6c,2,d96cec78,c06f1ea0,...) at handle_workitem_freefile+0xc7 process_worklist_item(c0a3012c,0,0,0,c3ad05c0,...) at process_worklist_item+0x2bc softdep_process_worklist(c3b1a284,0,44,c0a3012c,3e8,...) at softdep_process_worklist+0xc2 softdep_flush(0,d96ced28,0,0,0,...) at softdep_flush+0x161 fork_exit(c08fb600,0,d96ced28) at fork_exit+0x86 fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xd96ced60, ebp = 0 --- KDB: enter: panic [thread pid 18 tid 100038 ] Stopped at kdb_enter+0x3b: movl$0,kdb_why --- snap --- I got it while trying to delete some files (5 or 6 in all). Does that help anyone here? Cheers! Chris ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Crashes with Promise controller
On Thu, Jul 7, 2011 at 3:15 PM, Thomas Ronner tho...@ronner.org wrote: On 7/7/11 2:09 PM, Christian Baer wrote: Do you have an alternative controller in mind? Preferably I mean one that doesn't cost ten times as much. :-) I suggest an LSI 1068 based SAS controller. With the SAS-SATA cables included it will cost at most five times as much :) Regards, Thomas ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org Hi Thomas, I have found a used LSI SAS3041E-HP 4-Port PCI-E SAS/SATA RAID controller on ebay. It comes around 70$ which is pretty much the same price I paid for the junk am using now. Do you know if this is supported because I can't find anything relevant in HCL and I read that some people use it without problems. Thanks -- George Kontostanos aisecure.net ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: system internal timer runs 10 times too slow
--- On Thu, 7/7/11, Jeremy Chadwick free...@jdc.parodius.com wrote: From: Jeremy Chadwick free...@jdc.parodius.com Subject: Re: system internal timer runs 10 times too slow To: Aristedes Maniatis a...@ish.com.au Cc: freebsd-stable freebsd-stable@freebsd.org Date: Thursday, July 7, 2011, 1:58 PM On Thu, Jul 07, 2011 at 07:39:05PM +1000, Aristedes Maniatis wrote: We upgraded an existing system to a new motherboard/CPU and found that timing in various programs is very odd. For example top only updates every 10 seconds instead of every second. And this confirms the oddness: # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:01 EST 2011 Thu Jul 7 19:09:11 EST 2011 Thu Jul 7 19:09:21 EST 2011 10 seconds instead of 1. So I looked first at the kernel timers: # dmesg | grep -i time Timecounter i8254 frequency 1193182 Hz quality 0 Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 pci3: multimedia, HDA at device 0.1 (no driver attached) atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0 Timecounter HPET frequency 14318180 Hz quality 900 Timecounters tick every 1.000 msec I switched i8254 and then to HPET. No difference. # sysctl -w kern.timecounter.hardware=i8254 kern.timecounter.hardware: ACPI-fast - i8254 # while true; do echo `date`; sleep 1; done Thu Jul 7 19:09:40 EST 2011 Thu Jul 7 19:09:41 EST 2011 I switched to TSC: # sysctl -w kern.timecounter.hardware=TSC kern.timecounter.hardware: HPET - TSC # while true; do echo `date`; sleep 1; done Thu Jul 7 19:25:56 EST 2011 Thu Jul 7 19:25:57 EST 2011 Thu Jul 7 19:25:58 EST 2011 Now this looks like it fixed the problem, but actually it is worse. Now the clock matches what you'd expect, but there is still 10 seconds in real time between those date entries. That is, now the system clock is running 10 times too slow as well. # uname -a FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Base board information Manufacturer: ASUSTeK Computer INC. Product Name: P6X58D-E BIOS information Vendor: American Megatrends Inc. Version: 0502 Release Date: 11/16/2010 BIOS Revision: 8.15 CPU Model: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz Do you have anything like powerd(8) enabled, or EIST / Intel SpeedStep technology enabled in your system BIOS? If so, can you try disabling powerd and/or disabling EIST/SS? Alternately, and this isn't to say FreeBSD doesn't have a problem, do you have a replacement/spare motherboard you can try? There's always the possibility that you have a bad crystal on the motherboard and a replacement board would rule that out. I also suggest to check you C mode hw.acpi.cpu.cx_lowest. I had same behavior on my note some time ago. This happened when i tried to use C3 so i stayed at C2. Kirill ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org