Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Andriy Gapon
on 07/07/2011 08:55 Mike Tancsa said the following:
 I did a buildworld on this box to bring it up to RELENG_8 for the BIND
 fixes.  Unfortunately, the formerly solid box (April 13th kernel)
 panic'd tonight with
 
 Unread portion of the kernel message buffer:
 spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
 panic: spin lock held too long
 cpuid = 0
 Uptime: 13h30m4s
 Physical memory: 2035 MB
 
 
 Its a somewhat busy box taking in mail as well as backups for a few
 servers over nfs.  At the time, it would have been getting about 250Mb/s
 inbound on its gigabit interface.  Full core.txt file at
 
 http://www.tancsa.com/core-jul8-2011.txt

I thought that this was supposed to contain output of 'thread apply all bt' in
kgdb.  Anyway, I think that stacktrace for tid 100107 may have some useful
information.

 #0  doadump () at pcpu.h:231
 231 pcpu.h: No such file or directory.
 in pcpu.h
 (kgdb) #0  doadump () at pcpu.h:231
 #1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
 #2  0xc06fd937 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:602
 #3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0)
 at /usr/src/sys/kern/kern_mutex.c:490
 #4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
 file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
 #5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0)
 at /usr/src/sys/kern/sched_ule.c:1119
 #6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0)
 at /usr/src/sys/kern/sched_ule.c:1950
 #7  0xc07061f8 in setrunnable (td=0xc5dac5c0)
 at /usr/src/sys/kern/kern_synch.c:499
 #8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0,
 pri=Variable pri is not available.
 )
 at /usr/src/sys/kern/subr_sleepqueue.c:751
 #9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0)
 at /usr/src/sys/kern/subr_sleepqueue.c:825
 #10 0xc06b6764 in cv_signal (cvp=0xc5fafe50)
 at /usr/src/sys/kern/kern_condvar.c:422
 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
 ) at /usr/src/sys/rpc/svc.c:342
 #12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at
 /usr/src/sys/rpc/svc.c:378
 #13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600,
 waitflag=1)
 at /usr/src/sys/rpc/svc_vc.c:747
 #14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34)
 at /usr/src/sys/kern/uipc_sockbuf.c:191
 #15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824,
 so=0xc6372ce0,
 tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2)
 at /usr/src/sys/netinet/tcp_input.c:1775
 #16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20)
 at /usr/src/sys/netinet/tcp_input.c:1329
 #17 0xc07ddaf7 in ip_input (m=0xcaa8d200)
 at /usr/src/sys/netinet/ip_input.c:787
 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200)
 at /usr/src/sys/net/netisr.c:859
 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200)
 at /usr/src/sys/net/netisr.c:946
 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200)
 at /usr/src/sys/net/if_ethersubr.c:894
 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200)
 at /usr/src/sys/net/if_ethersubr.c:753
 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1)
 at /usr/src/sys/dev/nfe/if_nfe.c:2187
 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440)
 at /usr/src/sys/kern/subr_taskqueue.c:248
 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130)
 at /usr/src/sys/kern/subr_taskqueue.c:385
 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop,
 arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
 #26 0xc09a5c24 in fork_trampoline () at
 /usr/src/sys/i386/i386/exception.s:275
 (kgdb)
 


-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Kostik Belousov
On Thu, Jul 07, 2011 at 10:36:42AM +0300, Andriy Gapon wrote:
 on 07/07/2011 08:55 Mike Tancsa said the following:
  I did a buildworld on this box to bring it up to RELENG_8 for the BIND
  fixes.  Unfortunately, the formerly solid box (April 13th kernel)
  panic'd tonight with
  
  Unread portion of the kernel message buffer:
  spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
  panic: spin lock held too long
  cpuid = 0
  Uptime: 13h30m4s
  Physical memory: 2035 MB
  
  
  Its a somewhat busy box taking in mail as well as backups for a few
  servers over nfs.  At the time, it would have been getting about 250Mb/s
  inbound on its gigabit interface.  Full core.txt file at
  
  http://www.tancsa.com/core-jul8-2011.txt
 
 I thought that this was supposed to contain output of 'thread apply all bt' in
 kgdb.  Anyway, I think that stacktrace for tid 100107 may have some useful
 information.
 
  #0  doadump () at pcpu.h:231
  231 pcpu.h: No such file or directory.
  in pcpu.h
  (kgdb) #0  doadump () at pcpu.h:231
  #1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
  #2  0xc06fd937 in panic (fmt=Variable fmt is not available.
  ) at /usr/src/sys/kern/kern_shutdown.c:602
  #3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0)
  at /usr/src/sys/kern/kern_mutex.c:490
  #4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
  file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
  #5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0)
  at /usr/src/sys/kern/sched_ule.c:1119
  #6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0)
  at /usr/src/sys/kern/sched_ule.c:1950
  #7  0xc07061f8 in setrunnable (td=0xc5dac5c0)
  at /usr/src/sys/kern/kern_synch.c:499
  #8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0,
  pri=Variable pri is not available.
  )
  at /usr/src/sys/kern/subr_sleepqueue.c:751
  #9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0)
  at /usr/src/sys/kern/subr_sleepqueue.c:825
  #10 0xc06b6764 in cv_signal (cvp=0xc5fafe50)
  at /usr/src/sys/kern/kern_condvar.c:422
  #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
  ) at /usr/src/sys/rpc/svc.c:342
  #12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at
  /usr/src/sys/rpc/svc.c:378
  #13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600,
  waitflag=1)
  at /usr/src/sys/rpc/svc_vc.c:747
  #14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34)
  at /usr/src/sys/kern/uipc_sockbuf.c:191
  #15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824,
  so=0xc6372ce0,
  tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2)
  at /usr/src/sys/netinet/tcp_input.c:1775
  #16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20)
  at /usr/src/sys/netinet/tcp_input.c:1329
  #17 0xc07ddaf7 in ip_input (m=0xcaa8d200)
  at /usr/src/sys/netinet/ip_input.c:787
  #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200)
  at /usr/src/sys/net/netisr.c:859
  #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200)
  at /usr/src/sys/net/netisr.c:946
  #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200)
  at /usr/src/sys/net/if_ethersubr.c:894
  #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200)
  at /usr/src/sys/net/if_ethersubr.c:753
  #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1)
  at /usr/src/sys/dev/nfe/if_nfe.c:2187
  #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440)
  at /usr/src/sys/kern/subr_taskqueue.c:248
  #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130)
  at /usr/src/sys/kern/subr_taskqueue.c:385
  #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop,
  arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
  #26 0xc09a5c24 in fork_trampoline () at
  /usr/src/sys/i386/i386/exception.s:275
  (kgdb)
  

BTW, we had a similar panic, spinlock held too long, the spinlock
is the sched lock N, on busy 8-core box recently upgraded to the
stable/8. Unfortunately, machine hung dumping core, so the stack trace
for the owner thread was not available.

I was unable to make any conclusion from the data that was present.
If the situation is reproducable, you coulld try to revert r221937. This
is pure speculation, though.


pgpW2o7azLFBo.pgp
Description: PGP signature


system internal timer runs 10 times too slow

2011-07-07 Thread Aristedes Maniatis

We upgraded an existing system to a new motherboard/CPU and found that timing in various 
programs is very odd. For example top only updates every 10 seconds instead 
of every second. And this confirms the oddness:

# while true; do echo `date`; sleep 1; done
Thu Jul 7 19:09:01 EST 2011
Thu Jul 7 19:09:11 EST 2011
Thu Jul 7 19:09:21 EST 2011

10 seconds instead of 1.


So I looked first at the kernel timers:

# dmesg | grep -i time
Timecounter i8254 frequency 1193182 Hz quality 0
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
pci3: multimedia, HDA at device 0.1 (no driver attached)
atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
Timecounter HPET frequency 14318180 Hz quality 900
Timecounters tick every 1.000 msec


I switched i8254 and then to HPET. No difference.

# sysctl -w kern.timecounter.hardware=i8254
kern.timecounter.hardware: ACPI-fast - i8254
# while true; do echo `date`; sleep 1; done
Thu Jul 7 19:09:40 EST 2011
Thu Jul 7 19:09:41 EST 2011

I switched to TSC:

# sysctl -w kern.timecounter.hardware=TSC
kern.timecounter.hardware: HPET - TSC
# while true; do echo `date`; sleep 1; done
Thu Jul 7 19:25:56 EST 2011
Thu Jul 7 19:25:57 EST 2011
Thu Jul 7 19:25:58 EST 2011

Now this looks like it fixed the problem, but actually it is worse. Now the 
clock matches what you'd expect, but there is still 10 seconds in real time 
between those date entries. That is, now the system clock is running 10 times 
too slow as well.


# uname -a
FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 
02:41:51 UTC 2011 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  
amd64

Base board information
Manufacturer: ASUSTeK Computer INC.
Product Name: P6X58D-E

BIOS information
Vendor: American Megatrends Inc.
Version: 0502
Release Date: 11/16/2010
BIOS Revision: 8.15

CPU Model:  Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz


Thanks in advance for any help.


Ari


--
--
Aristedes Maniatis
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: system internal timer runs 10 times too slow

2011-07-07 Thread Jeremy Chadwick
On Thu, Jul 07, 2011 at 07:39:05PM +1000, Aristedes Maniatis wrote:
 We upgraded an existing system to a new motherboard/CPU and found that timing 
 in various programs is very odd. For example top only updates every 10 
 seconds instead of every second. And this confirms the oddness:
 
 # while true; do echo `date`; sleep 1; done
 Thu Jul 7 19:09:01 EST 2011
 Thu Jul 7 19:09:11 EST 2011
 Thu Jul 7 19:09:21 EST 2011
 
 10 seconds instead of 1.
 
 
 So I looked first at the kernel timers:
 
 # dmesg | grep -i time
 Timecounter i8254 frequency 1193182 Hz quality 0
 Timecounter ACPI-fast frequency 3579545 Hz quality 1000
 acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
 pci3: multimedia, HDA at device 0.1 (no driver attached)
 atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0
 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
 Timecounter HPET frequency 14318180 Hz quality 900
 Timecounters tick every 1.000 msec
 
 
 I switched i8254 and then to HPET. No difference.
 
 # sysctl -w kern.timecounter.hardware=i8254
 kern.timecounter.hardware: ACPI-fast - i8254
 # while true; do echo `date`; sleep 1; done
 Thu Jul 7 19:09:40 EST 2011
 Thu Jul 7 19:09:41 EST 2011
 
 I switched to TSC:
 
 # sysctl -w kern.timecounter.hardware=TSC
 kern.timecounter.hardware: HPET - TSC
 # while true; do echo `date`; sleep 1; done
 Thu Jul 7 19:25:56 EST 2011
 Thu Jul 7 19:25:57 EST 2011
 Thu Jul 7 19:25:58 EST 2011
 
 Now this looks like it fixed the problem, but actually it is worse. Now the 
 clock matches what you'd expect, but there is still 10 seconds in real time 
 between those date entries. That is, now the system clock is running 10 times 
 too slow as well.
 
 
 # uname -a
 FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 
 02:41:51 UTC 2011 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC 
  amd64
 
 Base board information
 Manufacturer: ASUSTeK Computer INC.
 Product Name: P6X58D-E
 
 BIOS information
 Vendor: American Megatrends Inc.
 Version: 0502
 Release Date: 11/16/2010
 BIOS Revision: 8.15
 
 CPU Model:Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz

Do you have anything like powerd(8) enabled, or EIST / Intel SpeedStep
technology enabled in your system BIOS?  If so, can you try disabling
powerd and/or disabling EIST/SS?

Alternately, and this isn't to say FreeBSD doesn't have a problem, do
you have a replacement/spare motherboard you can try?  There's always
the possibility that you have a bad crystal on the motherboard and a
replacement board would rule that out.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Mike Tancsa
On 7/7/2011 4:20 AM, Kostik Belousov wrote:
 
 BTW, we had a similar panic, spinlock held too long, the spinlock
 is the sched lock N, on busy 8-core box recently upgraded to the
 stable/8. Unfortunately, machine hung dumping core, so the stack trace
 for the owner thread was not available.
 
 I was unable to make any conclusion from the data that was present.
 If the situation is reproducable, you coulld try to revert r221937. This
 is pure speculation, though.

Another crash just now after 5hrs uptime. I will try and revert r221937
unless there is any extra debugging you want me to add to the kernel
instead  ?

This is an inbound mail server so a little disruption is possible

 kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd...

Unread portion of the kernel message buffer:
spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long
panic: spin lock held too long
cpuid = 0
Uptime: 5h37m43s
Physical memory: 2035 MB
Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5

Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from
/boot/kernel/amdsbwd.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/amdsbwd.ko
#0  doadump () at pcpu.h:231
231 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:231
#1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
#2  0xc06fd937 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:602
#3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at
/usr/src/sys/kern/kern_mutex.c:490
#4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:526
#5  0xc0720254 in sched_add (td=0xc61892e0, flags=0) at
/usr/src/sys/kern/sched_ule.c:1119
#6  0xc07203f9 in sched_wakeup (td=0xc61892e0) at
/usr/src/sys/kern/sched_ule.c:1950
#7  0xc07061f8 in setrunnable (td=0xc61892e0) at
/usr/src/sys/kern/kern_synch.c:499
#8  0xc07362af in sleepq_resume_thread (sq=0xc55311c0, td=0xc61892e0,
pri=Variable pri is not available.
)
at /usr/src/sys/kern/subr_sleepqueue.c:751
#9  0xc0736e18 in sleepq_signal (wchan=0xc60386d0, flags=1, pri=0, queue=0)
at /usr/src/sys/kern/subr_sleepqueue.c:825
#10 0xc06b6764 in cv_signal (cvp=0xc60386d0) at
/usr/src/sys/kern/kern_condvar.c:422
#11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
) at /usr/src/sys/rpc/svc.c:342
#12 0xc08ec502 in xprt_active (xprt=0xc5db8a00) at
/usr/src/sys/rpc/svc.c:378
#13 0xc08ee051 in svc_vc_soupcall (so=0xc618a19c, arg=0xc5db8a00,
waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747
#14 0xc075bbb1 in sowakeup (so=0xc618a19c, sb=0xc618a1f0) at
/usr/src/sys/kern/uipc_sockbuf.c:191
#15 0xc08447bc in tcp_do_segment (m=0xc6567a00, th=0xc6785824,
so=0xc618a19c, tp=0xc617e000, drop_hdrlen=52,
tlen=1448, iptos=0 '\0', ti_locked=2) at
/usr/src/sys/netinet/tcp_input.c:1775
#16 0xc0847930 in tcp_input (m=0xc6567a00, off0=20) at
/usr/src/sys/netinet/tcp_input.c:1329
#17 0xc07ddaf7 in ip_input (m=0xc6567a00) at
/usr/src/sys/netinet/ip_input.c:787
#18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xc6567a00)
at /usr/src/sys/net/netisr.c:859
#19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xc6567a00) at
/usr/src/sys/net/netisr.c:946
#20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xc6567a00) at
/usr/src/sys/net/if_ethersubr.c:894
#21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xc6567a00) at
/usr/src/sys/net/if_ethersubr.c:753
#22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at
/usr/src/sys/dev/nfe/if_nfe.c:2187
#23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at
/usr/src/sys/kern/subr_taskqueue.c:248
#24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at
/usr/src/sys/kern/subr_taskqueue.c:385
#25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop,
arg=0xc56ff130, frame=0xc538ed28)
at /usr/src/sys/kern/kern_fork.c:861
#26 0xc09a5c24 in fork_trampoline () at
/usr/src/sys/i386/i386/exception.s:275


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Jeremy Chadwick
On Thu, Jul 07, 2011 at 07:32:41AM -0400, Mike Tancsa wrote:
 On 7/7/2011 4:20 AM, Kostik Belousov wrote:
  
  BTW, we had a similar panic, spinlock held too long, the spinlock
  is the sched lock N, on busy 8-core box recently upgraded to the
  stable/8. Unfortunately, machine hung dumping core, so the stack trace
  for the owner thread was not available.
  
  I was unable to make any conclusion from the data that was present.
  If the situation is reproducable, you coulld try to revert r221937. This
  is pure speculation, though.
 
 Another crash just now after 5hrs uptime. I will try and revert r221937
 unless there is any extra debugging you want me to add to the kernel
 instead  ?
 
 This is an inbound mail server so a little disruption is possible
 
  kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as i386-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long
 panic: spin lock held too long
 cpuid = 0
 Uptime: 5h37m43s
 Physical memory: 2035 MB
 Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5
 
 Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from
 /boot/kernel/amdsbwd.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/amdsbwd.ko
 #0  doadump () at pcpu.h:231
 231 pcpu.h: No such file or directory.
 in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:231
 #1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
 #2  0xc06fd937 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:602
 #3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at
 /usr/src/sys/kern/kern_mutex.c:490
 #4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
 file=0x0, line=0)
 at /usr/src/sys/kern/kern_mutex.c:526
 #5  0xc0720254 in sched_add (td=0xc61892e0, flags=0) at
 /usr/src/sys/kern/sched_ule.c:1119
 #6  0xc07203f9 in sched_wakeup (td=0xc61892e0) at
 /usr/src/sys/kern/sched_ule.c:1950
 #7  0xc07061f8 in setrunnable (td=0xc61892e0) at
 /usr/src/sys/kern/kern_synch.c:499
 #8  0xc07362af in sleepq_resume_thread (sq=0xc55311c0, td=0xc61892e0,
 pri=Variable pri is not available.
 )
 at /usr/src/sys/kern/subr_sleepqueue.c:751
 #9  0xc0736e18 in sleepq_signal (wchan=0xc60386d0, flags=1, pri=0, queue=0)
 at /usr/src/sys/kern/subr_sleepqueue.c:825
 #10 0xc06b6764 in cv_signal (cvp=0xc60386d0) at
 /usr/src/sys/kern/kern_condvar.c:422
 #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
 ) at /usr/src/sys/rpc/svc.c:342
 #12 0xc08ec502 in xprt_active (xprt=0xc5db8a00) at
 /usr/src/sys/rpc/svc.c:378
 #13 0xc08ee051 in svc_vc_soupcall (so=0xc618a19c, arg=0xc5db8a00,
 waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747
 #14 0xc075bbb1 in sowakeup (so=0xc618a19c, sb=0xc618a1f0) at
 /usr/src/sys/kern/uipc_sockbuf.c:191
 #15 0xc08447bc in tcp_do_segment (m=0xc6567a00, th=0xc6785824,
 so=0xc618a19c, tp=0xc617e000, drop_hdrlen=52,
 tlen=1448, iptos=0 '\0', ti_locked=2) at
 /usr/src/sys/netinet/tcp_input.c:1775
 #16 0xc0847930 in tcp_input (m=0xc6567a00, off0=20) at
 /usr/src/sys/netinet/tcp_input.c:1329
 #17 0xc07ddaf7 in ip_input (m=0xc6567a00) at
 /usr/src/sys/netinet/ip_input.c:787
 #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xc6567a00)
 at /usr/src/sys/net/netisr.c:859
 #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xc6567a00) at
 /usr/src/sys/net/netisr.c:946
 #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xc6567a00) at
 /usr/src/sys/net/if_ethersubr.c:894
 #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xc6567a00) at
 /usr/src/sys/net/if_ethersubr.c:753
 #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at
 /usr/src/sys/dev/nfe/if_nfe.c:2187
 #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at
 /usr/src/sys/kern/subr_taskqueue.c:248
 #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at
 /usr/src/sys/kern/subr_taskqueue.c:385
 #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop,
 arg=0xc56ff130, frame=0xc538ed28)
 at /usr/src/sys/kern/kern_fork.c:861
 #26 0xc09a5c24 in fork_trampoline () at
 /usr/src/sys/i386/i386/exception.s:275

1. info threads
2. Find the index value that matches the tid in question (in the above
   spin lock panic, that'd be tid 100109).  The index value will be
   the first number shown on the left
3. thread {index}
4. bt

If this doesn't work, alternatively you can try (from the beginning)
thread apply all bt and provide the output from that.  (It will be
quite lengthy, and at this point I think tid 100109 is the one of
interest in this 

Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Andriy Gapon
on 07/07/2011 14:41 Jeremy Chadwick said the following:
 1. info threads
 2. Find the index value that matches the tid in question (in the above
spin lock panic, that'd be tid 100109).  The index value will be
the first number shown on the left
 3. thread {index}

Just in case, in kgdb there is a command 'tid' that does all of the above steps 
in
one go.

 4. bt
 
 If this doesn't work, alternatively you can try (from the beginning)
 thread apply all bt and provide the output from that.  (It will be
 quite lengthy, and at this point I think tid 100109 is the one of
 interest in this crash, based on what Andriy said earlier)


-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Mike Tancsa
On 7/7/2011 7:32 AM, Mike Tancsa wrote:
 On 7/7/2011 4:20 AM, Kostik Belousov wrote:

 BTW, we had a similar panic, spinlock held too long, the spinlock
 is the sched lock N, on busy 8-core box recently upgraded to the
 stable/8. Unfortunately, machine hung dumping core, so the stack trace
 for the owner thread was not available.

 I was unable to make any conclusion from the data that was present.
 If the situation is reproducable, you coulld try to revert r221937. This
 is pure speculation, though.
 
 Another crash just now after 5hrs uptime. I will try and revert r221937
 unless there is any extra debugging you want me to add to the kernel
 instead  ?
 
 This is an inbound mail server so a little disruption is possible
 
  kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as i386-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long
 panic: spin lock held too long
 cpuid = 0
 Uptime: 5h37m43s
 Physical memory: 2035 MB
 Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5

And the second crash from today

 kgdb /usr/obj/usr/src/sys/recycle/kernel.debug vmcore.13
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd...

Unread portion of the kernel message buffer:
spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac2e0 (tid 100109) too long
panic: spin lock held too long
cpuid = 0
Uptime: 5h37m43s
Physical memory: 2035 MB
Dumping 260 MB: 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5

Reading symbols from /boot/kernel/amdsbwd.ko...Reading symbols from
/boot/kernel/amdsbwd.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/amdsbwd.ko
#0  doadump () at pcpu.h:231
231 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) tid 100109
[Switching to thread 82 (Thread 100109)]#0  sched_switch (td=0xc5dac2e0,
newtd=0xc553c5c0, flags=260)
at /usr/src/sys/kern/sched_ule.c:1866
1866cpuid = PCPU_GET(cpuid);
(kgdb) list
1861/*
1862 * We may return from cpu_switch on a different
cpu.  However,
1863 * we always return with td_lock pointing to the
current cpu's
1864 * run queue lock.
1865 */
1866cpuid = PCPU_GET(cpuid);
1867tdq = TDQ_CPU(cpuid);
1868lock_profile_obtain_lock_success(
1869TDQ_LOCKPTR(tdq)-lock_object, 0, 0,
__FILE__, __LINE__);
1870#ifdef  HWPMC_HOOKS
(kgdb) p *td
$1 = {td_lock = 0xc0b1d200, td_proc = 0xc5db4000, td_plist = {tqe_next =
0xc5dac5c0, tqe_prev = 0xc5dac008},
  td_runq = {tqe_next = 0x0, tqe_prev = 0xc0b1d334}, td_slpq = {tqe_next
= 0x0, tqe_prev = 0xc65d3b00},
  td_lockq = {tqe_next = 0x0, tqe_prev = 0xc51f6b38}, td_cpuset =
0xc5533e38, td_sel = 0x0,
  td_sleepqueue = 0xc65d3b00, td_turnstile = 0xc63ceb80, td_umtxq =
0xc5d229c0, td_tid = 100109, td_sigqueue = {
sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0,
0}}, sq_list = {tqh_first = 0x0,
  tqh_last = 0xc5dac340}, sq_proc = 0xc5db4000, sq_flags = 1},
td_flags = 4, td_inhibitors = 0,
  td_pflags = 2097152, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0,
td_wmesg = 0x0, td_lastcpu = 0 '\0',
  td_oncpu = 1 '\001', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0',
td_locks = -291, td_rw_rlocks = 0,
  td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested =
{lh_first = 0x0}, td_sleeplocks = 0x0,
  td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xc5538100,
td_estcpu = 0, td_slptick = 0,
  td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
ru_stime = {tv_sec = 0, tv_usec = 0},
ru_maxrss = 1048, ru_ixrss = 85216, ru_idrss = 3834720, ru_isrss =
681728, ru_minflt = 0, ru_majflt = 0,
ru_nswap = 0, ru_inblock = 82, ru_oublock = 271222, ru_msgsnd =
135625, ru_msgrcv = 2427350,
ru_nsignals = 0, ru_nvcsw = 2076938, ru_nivcsw = 731134},
td_incruntime = 852332612,
  td_runtime = 88202475877, td_pticks = 5326, td_sticks = 48, td_iticks
= 0, td_uticks = 0, td_intrval = 0,
  td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0,
0, 0}}, td_generation = 2808072,
  td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0,
td_profil_addr 

Re: Crashes with Promise controller

2011-07-07 Thread Christian Baer
On 06.07.2011 01:00, George Kontostanos wrote:

[Promise PDC40718 SATA300 controller]

 There are a lot of people I know that have similar issues. It has
 caused me to replace 3 disks so far. I am afraid that this controller
 should be marked as junk.

Do you have an alternative controller in mind? Preferably I mean one
that doesn't cost ten times as much. :-)

Cheers!
Chris

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Crashes with Promise controller

2011-07-07 Thread Thomas Ronner

On 7/7/11 2:09 PM, Christian Baer wrote:

Do you have an alternative controller in mind? Preferably I mean one
that doesn't cost ten times as much. :-)


I suggest an LSI 1068 based SAS controller. With the SAS-SATA cables 
included it will cost at most five times as much :)




Regards,
Thomas
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Crashes with Promise controller

2011-07-07 Thread Christian Baer
On 06.07.2011 01:00, George Kontostanos wrote:

 There are a lot of people I know that have similar issues. It has
 caused me to replace 3 disks so far. I am afraid that this controller
 should be marked as junk.

Just go another crash. This one is different...

--- snip ---
dev = ad16p1.eli, ino = 4, fs = /archive/drives/archive01
panic: ffs_freefile: freeing free inode
KDB: stack backtrace:
db_trace_self_wrapper(c0a1552c,6972642f,2f736576,68637261,30657669,a0d31,0,0,c485b0a4,100,c3ad05c0,d124c000,d96ceb7c,c06eb07b,c485b0a4,100,d96ceb7c,c06eb32b,c0a13393,d96ceb88)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0a13393,c0ace8c0,c0a2edcc,d96cebac,d96cebac,...) at
kdb_backtrace+0x2b
panic(c0a2edcc,c3cd7a78,4,c3b070d4,0,...) at panic+0xf8
ffs_freefile(c3b0b400,c3b07000,c3d8b000,4,41c0,...) at ffs_freefile+0x357
handle_workitem_freefile(0,d96cec6c,2,d96cec78,c06f1ea0,...) at
handle_workitem_freefile+0xc7
process_worklist_item(c0a3012c,0,0,0,c3ad05c0,...) at
process_worklist_item+0x2bc
softdep_process_worklist(c3b1a284,0,44,c0a3012c,3e8,...) at
softdep_process_worklist+0xc2
softdep_flush(0,d96ced28,0,0,0,...) at softdep_flush+0x161
fork_exit(c08fb600,0,d96ced28) at fork_exit+0x86
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd96ced60, ebp = 0 ---
KDB: enter: panic
[thread pid 18 tid 100038 ]
Stopped at  kdb_enter+0x3b: movl$0,kdb_why
--- snap ---

I got it while trying to delete some files (5 or 6 in all). Does that
help anyone here?

Cheers!
Chris

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Crashes with Promise controller

2011-07-07 Thread George Kontostanos
On Thu, Jul 7, 2011 at 3:15 PM, Thomas Ronner tho...@ronner.org wrote:
 On 7/7/11 2:09 PM, Christian Baer wrote:

 Do you have an alternative controller in mind? Preferably I mean one
 that doesn't cost ten times as much. :-)

 I suggest an LSI 1068 based SAS controller. With the SAS-SATA cables
 included it will cost at most five times as much :)



 Regards,
 Thomas
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Hi Thomas,

I have found a used LSI SAS3041E-HP 4-Port PCI-E SAS/SATA RAID
controller on ebay. It comes around 70$ which is pretty much the same
price I paid for the junk am using now. Do you know if this is
supported because I can't find anything relevant in HCL and I read
that some people use it without problems.

Thanks

-- 
George Kontostanos
aisecure.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: system internal timer runs 10 times too slow

2011-07-07 Thread Kirill Yelizarov


--- On Thu, 7/7/11, Jeremy Chadwick free...@jdc.parodius.com wrote:

 From: Jeremy Chadwick free...@jdc.parodius.com
 Subject: Re: system internal timer runs 10 times too slow
 To: Aristedes Maniatis a...@ish.com.au
 Cc: freebsd-stable freebsd-stable@freebsd.org
 Date: Thursday, July 7, 2011, 1:58 PM
 On Thu, Jul 07, 2011 at 07:39:05PM
 +1000, Aristedes Maniatis wrote:
  We upgraded an existing system to a new
 motherboard/CPU and found that timing in various programs is
 very odd. For example top only updates every 10 seconds
 instead of every second. And this confirms the oddness:
  
  # while true; do echo `date`; sleep 1; done
  Thu Jul 7 19:09:01 EST 2011
  Thu Jul 7 19:09:11 EST 2011
  Thu Jul 7 19:09:21 EST 2011
  
  10 seconds instead of 1.
  
  
  So I looked first at the kernel timers:
  
  # dmesg | grep -i time
  Timecounter i8254 frequency 1193182 Hz quality 0
  Timecounter ACPI-fast frequency 3579545 Hz quality
 1000
  acpi_timer0: 24-bit timer at 3.579545MHz port
 0x808-0x80b on acpi0
  pci3: multimedia, HDA at device 0.1 (no driver
 attached)
  atrtc0: AT realtime clock port 0x70-0x71 irq 8
 on acpi0
  acpi_hpet0: High Precision Event Timer iomem
 0xfed0-0xfed003ff on acpi0
  Timecounter HPET frequency 14318180 Hz quality 900
  Timecounters tick every 1.000 msec
  
  
  I switched i8254 and then to HPET. No difference.
  
  # sysctl -w kern.timecounter.hardware=i8254
  kern.timecounter.hardware: ACPI-fast - i8254
  # while true; do echo `date`; sleep 1; done
  Thu Jul 7 19:09:40 EST 2011
  Thu Jul 7 19:09:41 EST 2011
  
  I switched to TSC:
  
  # sysctl -w kern.timecounter.hardware=TSC
  kern.timecounter.hardware: HPET - TSC
  # while true; do echo `date`; sleep 1; done
  Thu Jul 7 19:25:56 EST 2011
  Thu Jul 7 19:25:57 EST 2011
  Thu Jul 7 19:25:58 EST 2011
  
  Now this looks like it fixed the problem, but actually
 it is worse. Now the clock matches what you'd expect, but
 there is still 10 seconds in real time between those date
 entries. That is, now the system clock is running 10 times
 too slow as well.
  
  
  # uname -a
  FreeBSD delish.ish.com.au 8.2-RELEASE FreeBSD
 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 
    r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC 
 amd64
  
  Base board information
  Manufacturer: ASUSTeK Computer INC.
  Product Name: P6X58D-E
  
  BIOS information
  Vendor: American Megatrends Inc.
  Version: 0502
  Release Date: 11/16/2010
  BIOS Revision: 8.15
  
  CPU Model:    Intel(R) Core(TM) i7
 CPU         960  @
 3.20GHz
 
 Do you have anything like powerd(8) enabled, or EIST /
 Intel SpeedStep
 technology enabled in your system BIOS?  If so, can
 you try disabling
 powerd and/or disabling EIST/SS?
 
 Alternately, and this isn't to say FreeBSD doesn't have a
 problem, do
 you have a replacement/spare motherboard you can try? 
 There's always
 the possibility that you have a bad crystal on the
 motherboard and a
 replacement board would rule that out.

I also suggest to check you C mode hw.acpi.cpu.cx_lowest. I had same behavior 
on my note some time ago. This happened when i tried to use C3 so i stayed at 
C2.

Kirill


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org