Re: panic in deadlkres() on r267110

2014-06-06 Thread Sean Bruno
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
 Two machines in the cluster panic last night with the same backtrace.
 It is unclear yet exactly what was happening on the systems, but both
 are port building machines using ports-mgmt/tinderbox.
 
 Any ideas or information on how to further debug this would be
 appreciated.
 
These machines were happily running r266621 previously to this update
yesterday.  So, that gives us a bisection point.

sean


 Script started on Fri Jun  6 14:01:53 2014
 command: /bin/sh
 # uname -a
 FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 
 r267110: Thu Jun  5 15:57:43 UTC 2014 
 sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD  amd64
 # kgdb ./kernel.debug /var/crash/vmcore.0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as amd64-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 panic: deadlkres: possible deadlock detected on allproc_lock
 
 cpuid = 17
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0
 panic() at panic+0x155/frame 0xfe1838702b50
 deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0
 fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0
 --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 ---
 KDB: enter: panic
 
 Reading symbols from /boot/kernel/zfs.ko.symbols...done.
 Loaded symbols for /boot/kernel/zfs.ko.symbols
 Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
 Loaded symbols for /boot/kernel/opensolaris.ko.symbols
 Reading symbols from /boot/kernel/ums.ko.symbols...done.
 Loaded symbols for /boot/kernel/ums.ko.symbols
 Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
 Loaded symbols for /boot/kernel/linprocfs.ko.symbols
 Reading symbols from /boot/kernel/linux.ko.symbols...done.
 Loaded symbols for /boot/kernel/linux.ko.symbols
 #0  doadump (textdump=-946873840) at pcpu.h:219
 219 __asm(movq %%gs:%1,%0 : =r (td)
 (kgdb) bt
 #0  doadump (textdump=-946873840) at pcpu.h:219
 #1  0x8034e865 in db_fncall (dummy1=value optimized out, 
 dummy2=value optimized out, dummy3=value optimized out, 
 dummy4=value optimized out) at /usr/src/sys/ddb/db_command.c:578
 #2  0x8034e54d in db_command (cmd_table=0x0)
 at /usr/src/sys/ddb/db_command.c:449
 #3  0x8034e2c4 in db_command_loop ()
 at /usr/src/sys/ddb/db_command.c:502
 #4  0x80350d20 in db_trap (type=value optimized out, code=0)
 at /usr/src/sys/ddb/db_main.c:231
 #5  0x809a9bd9 in kdb_trap (type=3, code=0, tf=value optimized out)
 at /usr/src/sys/kern/subr_kdb.c:656
 #6  0x80dc00e3 in trap (frame=0xfe1838702a00)
 at /usr/src/sys/amd64/amd64/trap.c:551
 #7  0x80da29c2 in calltrap ()
 at /usr/src/sys/amd64/amd64/exception.S:231
 #8  0x809a933e in kdb_enter (why=0x81039a72 panic, 
 msg=value optimized out) at cpufunc.h:63
 #9  0x8096a8b5 in panic (fmt=value optimized out)
 at /usr/src/sys/kern/kern_shutdown.c:749
 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, 
 arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
 ---Type return to continue, or q return to quit---
 #12 0x80da2efe in fork_trampoline ()
 at /usr/src/sys/amd64/amd64/exception.S:605
 #13 0x in ?? ()
 Current language:  auto; currently minimal
 (kgdb) fr 10
 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
 203 panic(%s: possible deadlock detected on 
 allproc_lock\n,
 (kgdb) l
 198  * priority inversion problem leading to starvation.
 199  * If the lock can't be held after 100 tries, panic.
 200  */
 201 if (!sx_try_slock(allproc_lock)) {
 202 if (tryl  100)
 203 panic(%s: possible deadlock detected on 
 allproc_lock\n,
 204 __func__);
 205 tryl++;
 206 pause(allproc, sleepfreq * hz);
 207 continue;
 (kgdb) up
 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, 
 arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
 977 callout(arg, frame);
 (kgdb) l
 972  * cpu_set_fork_handler intercepts 

Re: panic in deadlkres() on r267110

2014-06-06 Thread Ryan Stone
On Fri, Jun 6, 2014 at 10:12 AM, Glen Barber g...@freebsd.org wrote:
 Two machines in the cluster panic last night with the same backtrace.
 It is unclear yet exactly what was happening on the systems, but both
 are port building machines using ports-mgmt/tinderbox.

 Any ideas or information on how to further debug this would be
 appreciated.

The first thing that I would do is try to figure out who holds the
allproc_lock.  If the lock is held exclusively then
the pointer to the thread struct will be in allproc_lock.sx_lock.

If the lock is held shared then it gets trickier.  I'm not sure if
there is a better possible way than doing a thread apply all bt in
kgdb and finding which stack(s) could hold allproc_lock.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: panic in deadlkres() on r267110

2014-06-06 Thread Glen Barber
On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
 On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
  Two machines in the cluster panic last night with the same backtrace.
  It is unclear yet exactly what was happening on the systems, but both
  are port building machines using ports-mgmt/tinderbox.
  
  Any ideas or information on how to further debug this would be
  appreciated.
  
 These machines were happily running r266621 previously to this update
 yesterday.  So, that gives us a bisection point.
 

Some more debug information.  Thank you to Attilio for information on
what data to get.

Script started on Fri Jun  6 15:00:53 2014
command: /bin/sh
# kgdb ./kernel.debug /var/crash/vmcore.0
[...]
#0  doadump (textdump=-946873840) at pcpu.h:219
219 __asm(movq %%gs:%1,%0 : =r (td)
(kgdb) p/x allproc_lock.sx_lock
$1 = 0xf813ae7f4924
Current language:  auto; currently minimal
(kgdb) p ((struct thread *)0xf813ae7f4924)
$2 = (struct thread *) 0xf813ae7f4924
(kgdb) p $2-td_tid
$3 = 0
(kgdb) ^D
# ^D
Script done on Fri Jun  6 15:03:19 2014

Glen



pgpAoA_3Wzmb1.pgp
Description: PGP signature


Re: panic in deadlkres() on r267110

2014-06-06 Thread Attilio Rao
On Fri, Jun 6, 2014 at 5:06 PM, Glen Barber g...@freebsd.org wrote:
 On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
 On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
  Two machines in the cluster panic last night with the same backtrace.
  It is unclear yet exactly what was happening on the systems, but both
  are port building machines using ports-mgmt/tinderbox.
 
  Any ideas or information on how to further debug this would be
  appreciated.
 
 These machines were happily running r266621 previously to this update
 yesterday.  So, that gives us a bisection point.


 Some more debug information.  Thank you to Attilio for information on
 what data to get.

 Script started on Fri Jun  6 15:00:53 2014
 command: /bin/sh
 # kgdb ./kernel.debug /var/crash/vmcore.0
 [...]
 #0  doadump (textdump=-946873840) at pcpu.h:219
 219 __asm(movq %%gs:%1,%0 : =r (td)
 (kgdb) p/x allproc_lock.sx_lock
 $1 = 0xf813ae7f4924
 Current language:  auto; currently minimal
 (kgdb) p ((struct thread *)0xf813ae7f4924)

The actual thread address is: 0xf813ae7f4920.

Then look at the GDB threads list and match with the tid.

Attilio
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: panic in deadlkres() on r267110

2014-06-06 Thread Glen Barber
On Fri, Jun 06, 2014 at 11:06:40AM -0400, Glen Barber wrote:
 On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
  On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
   Two machines in the cluster panic last night with the same backtrace.
   It is unclear yet exactly what was happening on the systems, but both
   are port building machines using ports-mgmt/tinderbox.
   
   Any ideas or information on how to further debug this would be
   appreciated.
   
  These machines were happily running r266621 previously to this update
  yesterday.  So, that gives us a bisection point.
  
 
 Some more debug information.  Thank you to Attilio for information on
 what data to get.
 

Sorry, I did not mask the bits properly.

Script started on Fri Jun  6 15:20:14 2014
command: /bin/sh
# kgdb ./kernel.debug /var/crash/vmcore.0
[...]
#0  doadump (textdump=-946873840) at pcpu.h:219
219 __asm(movq %%gs:%1,%0 : =r (td)
(kgdb) p/x allproc_lock.sx_lock
$1 = 0xf813ae7f4924
Current language:  auto; currently minimal
(kgdb) p ((struct thread *)0xf813ae7f4920)
$4 = (struct thread *) 0xf813ae7f4920
(kgdb) p $4-td_tid
$5 = 195038
(kgdb) tid 195038
[Switching to thread 94414 (Thread 195038)]#0  0x80daf1f8 in 
cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
1432savectx(stoppcbs[cpu]);
(kgdb) bt
#0  0x80daf1f8 in cpustop_handler ()
at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
#1  0x80daf1bf in ipi_nmi_handler ()
at /usr/src/sys/amd64/amd64/mp_machdep.c:1417
#2  0x80dbfcba in trap (frame=0xfe17d5389f30)
at /usr/src/sys/amd64/amd64/trap.c:189
#3  0x80da2e03 in nmi_calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:504
#4  0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, pages=Cannot 
access memory at address 0x4
)
at /usr/src/sys/kern/kern_fork.c:268
#5  0x8092f00f in sys_fork (td=0xf813ae7f4920, 
uap=value optimized out) at /usr/src/sys/kern/kern_fork.c:106
#6  0x80dc118b in amd64_syscall (td=0xf813ae7f4920, traced=0)
at subr_syscall.c:133
#7  0x80da2cab in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:390
#8  0x000800d46d1a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 

Glen



pgpI1kaMzAPLk.pgp
Description: PGP signature


Re: panic in deadlkres() on r267110

2014-06-06 Thread Ryan Stone
On Fri, Jun 6, 2014 at 11:33 AM, Glen Barber g...@freebsd.org wrote:
 #4  0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, 
 pages=Cannot access memory at address 0x4
 )
 at /usr/src/sys/kern/kern_fork.c:268

This looks like a suspicious piece of code.  Could something be
causing the kernel to go into an infinite loop there?
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: panic in deadlkres() on r267110

2014-06-06 Thread Sean Bruno
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
 Two machines in the cluster panic last night with the same backtrace.
 It is unclear yet exactly what was happening on the systems, but both
 are port building machines using ports-mgmt/tinderbox.
 
 Any ideas or information on how to further debug this would be
 appreciated.
 
 Script started on Fri Jun  6 14:01:53 2014
 command: /bin/sh
 # uname -a
 FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 
 r267110: Thu Jun  5 15:57:43 UTC 2014 
 sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD  amd64
 # kgdb ./kernel.debug /var/crash/vmcore.0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as amd64-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 panic: deadlkres: possible deadlock detected on allproc_lock
 
 cpuid = 17
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0
 panic() at panic+0x155/frame 0xfe1838702b50
 deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0
 fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0
 --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 ---
 KDB: enter: panic
 
 Reading symbols from /boot/kernel/zfs.ko.symbols...done.
 Loaded symbols for /boot/kernel/zfs.ko.symbols
 Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
 Loaded symbols for /boot/kernel/opensolaris.ko.symbols
 Reading symbols from /boot/kernel/ums.ko.symbols...done.
 Loaded symbols for /boot/kernel/ums.ko.symbols
 Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
 Loaded symbols for /boot/kernel/linprocfs.ko.symbols
 Reading symbols from /boot/kernel/linux.ko.symbols...done.
 Loaded symbols for /boot/kernel/linux.ko.symbols
 #0  doadump (textdump=-946873840) at pcpu.h:219
 219 __asm(movq %%gs:%1,%0 : =r (td)
 (kgdb) bt
 #0  doadump (textdump=-946873840) at pcpu.h:219
 #1  0x8034e865 in db_fncall (dummy1=value optimized out, 
 dummy2=value optimized out, dummy3=value optimized out, 
 dummy4=value optimized out) at /usr/src/sys/ddb/db_command.c:578
 #2  0x8034e54d in db_command (cmd_table=0x0)
 at /usr/src/sys/ddb/db_command.c:449
 #3  0x8034e2c4 in db_command_loop ()
 at /usr/src/sys/ddb/db_command.c:502
 #4  0x80350d20 in db_trap (type=value optimized out, code=0)
 at /usr/src/sys/ddb/db_main.c:231
 #5  0x809a9bd9 in kdb_trap (type=3, code=0, tf=value optimized out)
 at /usr/src/sys/kern/subr_kdb.c:656
 #6  0x80dc00e3 in trap (frame=0xfe1838702a00)
 at /usr/src/sys/amd64/amd64/trap.c:551
 #7  0x80da29c2 in calltrap ()
 at /usr/src/sys/amd64/amd64/exception.S:231
 #8  0x809a933e in kdb_enter (why=0x81039a72 panic, 
 msg=value optimized out) at cpufunc.h:63
 #9  0x8096a8b5 in panic (fmt=value optimized out)
 at /usr/src/sys/kern/kern_shutdown.c:749
 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, 
 arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
 ---Type return to continue, or q return to quit---
 #12 0x80da2efe in fork_trampoline ()
 at /usr/src/sys/amd64/amd64/exception.S:605
 #13 0x in ?? ()
 Current language:  auto; currently minimal
 (kgdb) fr 10
 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
 203 panic(%s: possible deadlock detected on 
 allproc_lock\n,
 (kgdb) l
 198  * priority inversion problem leading to starvation.
 199  * If the lock can't be held after 100 tries, panic.
 200  */
 201 if (!sx_try_slock(allproc_lock)) {
 202 if (tryl  100)
 203 panic(%s: possible deadlock detected on 
 allproc_lock\n,
 204 __func__);
 205 tryl++;
 206 pause(allproc, sleepfreq * hz);
 207 continue;
 (kgdb) up
 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, 
 arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
 977 callout(arg, frame);
 (kgdb) l
 972  * cpu_set_fork_handler intercepts this function call to
 973  * have this call a non-return function to stay in kernel 
 mode.
 974  *