Re: panic in deadlkres() on r267110
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: > Two machines in the cluster panic last night with the same backtrace. > It is unclear yet exactly what was happening on the systems, but both > are port building machines using ports-mgmt/tinderbox. > > Any ideas or information on how to further debug this would be > appreciated. > > Script started on Fri Jun 6 14:01:53 2014 > command: /bin/sh > # uname -a > FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 > r267110: Thu Jun 5 15:57:43 UTC 2014 > sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD amd64 > # kgdb ./kernel.debug /var/crash/vmcore.0 > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > panic: deadlkres: possible deadlock detected on allproc_lock > > cpuid = 17 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0 > panic() at panic+0x155/frame 0xfe1838702b50 > deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0 > fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0 > --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 --- > KDB: enter: panic > > Reading symbols from /boot/kernel/zfs.ko.symbols...done. > Loaded symbols for /boot/kernel/zfs.ko.symbols > Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. > Loaded symbols for /boot/kernel/opensolaris.ko.symbols > Reading symbols from /boot/kernel/ums.ko.symbols...done. > Loaded symbols for /boot/kernel/ums.ko.symbols > Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. > Loaded symbols for /boot/kernel/linprocfs.ko.symbols > Reading symbols from /boot/kernel/linux.ko.symbols...done. > Loaded symbols for /boot/kernel/linux.ko.symbols > #0 doadump (textdump=-946873840) at pcpu.h:219 > 219 __asm("movq %%gs:%1,%0" : "=r" (td) > (kgdb) bt > #0 doadump (textdump=-946873840) at pcpu.h:219 > #1 0x8034e865 in db_fncall (dummy1=, > dummy2=, dummy3=, > dummy4=) at /usr/src/sys/ddb/db_command.c:578 > #2 0x8034e54d in db_command (cmd_table=0x0) > at /usr/src/sys/ddb/db_command.c:449 > #3 0x8034e2c4 in db_command_loop () > at /usr/src/sys/ddb/db_command.c:502 > #4 0x80350d20 in db_trap (type=, code=0) > at /usr/src/sys/ddb/db_main.c:231 > #5 0x809a9bd9 in kdb_trap (type=3, code=0, tf=) > at /usr/src/sys/kern/subr_kdb.c:656 > #6 0x80dc00e3 in trap (frame=0xfe1838702a00) > at /usr/src/sys/amd64/amd64/trap.c:551 > #7 0x80da29c2 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:231 > #8 0x809a933e in kdb_enter (why=0x81039a72 "panic", > msg=) at cpufunc.h:63 > #9 0x8096a8b5 in panic (fmt=) > at /usr/src/sys/kern/kern_shutdown.c:749 > #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 > #11 0x8093170a in fork_exit (callout=0x8090cd40 , > arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 > ---Type to continue, or q to quit--- > #12 0x80da2efe in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:605 > #13 0x in ?? () > Current language: auto; currently minimal > (kgdb) fr 10 > #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 > 203 panic("%s: possible deadlock detected on > allproc_lock\n", > (kgdb) l > 198 * priority inversion problem leading to starvation. > 199 * If the lock can't be held after 100 tries, panic. > 200 */ > 201 if (!sx_try_slock(&allproc_lock)) { > 202 if (tryl > 100) > 203 panic("%s: possible deadlock detected on > allproc_lock\n", > 204 __func__); > 205 tryl++; > 206 pause("allproc", sleepfreq * hz); > 207 continue; > (kgdb) up > #11 0x8093170a in fork_exit (callout=0x8090cd40 , > arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 > 977 callout(arg, frame); > (kgdb) l > 972 * cpu_set_fork_handler intercepts this function call to > 973 * have this call a non-return function to stay in kernel > mode. > 974 * initproc has its own fork handler, but it does return. > 975
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 11:33 AM, Glen Barber wrote: > #4 0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, > pages=Cannot access memory at address 0x4 > ) > at /usr/src/sys/kern/kern_fork.c:268 This looks like a suspicious piece of code. Could something be causing the kernel to go into an infinite loop there? ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: panic in deadlkres() on r267110
On Fri, Jun 06, 2014 at 11:06:40AM -0400, Glen Barber wrote: > On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: > > On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: > > > Two machines in the cluster panic last night with the same backtrace. > > > It is unclear yet exactly what was happening on the systems, but both > > > are port building machines using ports-mgmt/tinderbox. > > > > > > Any ideas or information on how to further debug this would be > > > appreciated. > > > > > These machines were happily running r266621 previously to this update > > yesterday. So, that gives us a bisection point. > > > > Some more debug information. Thank you to Attilio for information on > what data to get. > Sorry, I did not mask the bits properly. Script started on Fri Jun 6 15:20:14 2014 command: /bin/sh # kgdb ./kernel.debug /var/crash/vmcore.0 [...] #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) p/x allproc_lock.sx_lock $1 = 0xf813ae7f4924 Current language: auto; currently minimal (kgdb) p ((struct thread *)0xf813ae7f4920) $4 = (struct thread *) 0xf813ae7f4920 (kgdb) p $4->td_tid $5 = 195038 (kgdb) tid 195038 [Switching to thread 94414 (Thread 195038)]#0 0x80daf1f8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 1432savectx(&stoppcbs[cpu]); (kgdb) bt #0 0x80daf1f8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 #1 0x80daf1bf in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1417 #2 0x80dbfcba in trap (frame=0xfe17d5389f30) at /usr/src/sys/amd64/amd64/trap.c:189 #3 0x80da2e03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:504 #4 0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, pages=Cannot access memory at address 0x4 ) at /usr/src/sys/kern/kern_fork.c:268 #5 0x8092f00f in sys_fork (td=0xf813ae7f4920, uap=) at /usr/src/sys/kern/kern_fork.c:106 #6 0x80dc118b in amd64_syscall (td=0xf813ae7f4920, traced=0) at subr_syscall.c:133 #7 0x80da2cab in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390 #8 0x000800d46d1a in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) Glen pgpI1kaMzAPLk.pgp Description: PGP signature
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 5:06 PM, Glen Barber wrote: > On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: >> On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: >> > Two machines in the cluster panic last night with the same backtrace. >> > It is unclear yet exactly what was happening on the systems, but both >> > are port building machines using ports-mgmt/tinderbox. >> > >> > Any ideas or information on how to further debug this would be >> > appreciated. >> > >> These machines were happily running r266621 previously to this update >> yesterday. So, that gives us a bisection point. >> > > Some more debug information. Thank you to Attilio for information on > what data to get. > > Script started on Fri Jun 6 15:00:53 2014 > command: /bin/sh > # kgdb ./kernel.debug /var/crash/vmcore.0 > [...] > #0 doadump (textdump=-946873840) at pcpu.h:219 > 219 __asm("movq %%gs:%1,%0" : "=r" (td) > (kgdb) p/x allproc_lock.sx_lock > $1 = 0xf813ae7f4924 > Current language: auto; currently minimal > (kgdb) p ((struct thread *)0xf813ae7f4924) The actual thread address is: 0xf813ae7f4920. Then look at the GDB threads list and match with the tid. Attilio ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: panic in deadlkres() on r267110
On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: > On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: > > Two machines in the cluster panic last night with the same backtrace. > > It is unclear yet exactly what was happening on the systems, but both > > are port building machines using ports-mgmt/tinderbox. > > > > Any ideas or information on how to further debug this would be > > appreciated. > > > These machines were happily running r266621 previously to this update > yesterday. So, that gives us a bisection point. > Some more debug information. Thank you to Attilio for information on what data to get. Script started on Fri Jun 6 15:00:53 2014 command: /bin/sh # kgdb ./kernel.debug /var/crash/vmcore.0 [...] #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) p/x allproc_lock.sx_lock $1 = 0xf813ae7f4924 Current language: auto; currently minimal (kgdb) p ((struct thread *)0xf813ae7f4924) $2 = (struct thread *) 0xf813ae7f4924 (kgdb) p $2->td_tid $3 = 0 (kgdb) ^D # ^D Script done on Fri Jun 6 15:03:19 2014 Glen pgpAoA_3Wzmb1.pgp Description: PGP signature
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 10:12 AM, Glen Barber wrote: > Two machines in the cluster panic last night with the same backtrace. > It is unclear yet exactly what was happening on the systems, but both > are port building machines using ports-mgmt/tinderbox. > > Any ideas or information on how to further debug this would be > appreciated. The first thing that I would do is try to figure out who holds the allproc_lock. If the lock is held exclusively then the pointer to the thread struct will be in allproc_lock.sx_lock. If the lock is held shared then it gets trickier. I'm not sure if there is a better possible way than doing a thread apply all bt in kgdb and finding which stack(s) could hold allproc_lock. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: panic in deadlkres() on r267110
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: > Two machines in the cluster panic last night with the same backtrace. > It is unclear yet exactly what was happening on the systems, but both > are port building machines using ports-mgmt/tinderbox. > > Any ideas or information on how to further debug this would be > appreciated. > These machines were happily running r266621 previously to this update yesterday. So, that gives us a bisection point. sean > Script started on Fri Jun 6 14:01:53 2014 > command: /bin/sh > # uname -a > FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 > r267110: Thu Jun 5 15:57:43 UTC 2014 > sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD amd64 > # kgdb ./kernel.debug /var/crash/vmcore.0 > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > panic: deadlkres: possible deadlock detected on allproc_lock > > cpuid = 17 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0 > panic() at panic+0x155/frame 0xfe1838702b50 > deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0 > fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0 > --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 --- > KDB: enter: panic > > Reading symbols from /boot/kernel/zfs.ko.symbols...done. > Loaded symbols for /boot/kernel/zfs.ko.symbols > Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. > Loaded symbols for /boot/kernel/opensolaris.ko.symbols > Reading symbols from /boot/kernel/ums.ko.symbols...done. > Loaded symbols for /boot/kernel/ums.ko.symbols > Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. > Loaded symbols for /boot/kernel/linprocfs.ko.symbols > Reading symbols from /boot/kernel/linux.ko.symbols...done. > Loaded symbols for /boot/kernel/linux.ko.symbols > #0 doadump (textdump=-946873840) at pcpu.h:219 > 219 __asm("movq %%gs:%1,%0" : "=r" (td) > (kgdb) bt > #0 doadump (textdump=-946873840) at pcpu.h:219 > #1 0x8034e865 in db_fncall (dummy1=, > dummy2=, dummy3=, > dummy4=) at /usr/src/sys/ddb/db_command.c:578 > #2 0x8034e54d in db_command (cmd_table=0x0) > at /usr/src/sys/ddb/db_command.c:449 > #3 0x8034e2c4 in db_command_loop () > at /usr/src/sys/ddb/db_command.c:502 > #4 0x80350d20 in db_trap (type=, code=0) > at /usr/src/sys/ddb/db_main.c:231 > #5 0x809a9bd9 in kdb_trap (type=3, code=0, tf=) > at /usr/src/sys/kern/subr_kdb.c:656 > #6 0x80dc00e3 in trap (frame=0xfe1838702a00) > at /usr/src/sys/amd64/amd64/trap.c:551 > #7 0x80da29c2 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:231 > #8 0x809a933e in kdb_enter (why=0x81039a72 "panic", > msg=) at cpufunc.h:63 > #9 0x8096a8b5 in panic (fmt=) > at /usr/src/sys/kern/kern_shutdown.c:749 > #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 > #11 0x8093170a in fork_exit (callout=0x8090cd40 , > arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 > ---Type to continue, or q to quit--- > #12 0x80da2efe in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:605 > #13 0x in ?? () > Current language: auto; currently minimal > (kgdb) fr 10 > #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 > 203 panic("%s: possible deadlock detected on > allproc_lock\n", > (kgdb) l > 198 * priority inversion problem leading to starvation. > 199 * If the lock can't be held after 100 tries, panic. > 200 */ > 201 if (!sx_try_slock(&allproc_lock)) { > 202 if (tryl > 100) > 203 panic("%s: possible deadlock detected on > allproc_lock\n", > 204 __func__); > 205 tryl++; > 206 pause("allproc", sleepfreq * hz); > 207 continue; > (kgdb) up > #11 0x8093170a in fork_exit (callout=0x8090cd40 , > arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 > 977 callout(arg, frame); > (kgdb) l > 972 * cpu_set_fork_handler intercepts this function call to > 973 * have this call a non-ret