Re: panic in deadlkres() on r267110
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. These machines were happily running r266621 previously to this update yesterday. So, that gives us a bisection point. sean Script started on Fri Jun 6 14:01:53 2014 command: /bin/sh # uname -a FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r267110: Thu Jun 5 15:57:43 UTC 2014 sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD amd64 # kgdb ./kernel.debug /var/crash/vmcore.0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as amd64-marcel-freebsd... Unread portion of the kernel message buffer: panic: deadlkres: possible deadlock detected on allproc_lock cpuid = 17 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0 panic() at panic+0x155/frame 0xfe1838702b50 deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0 fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0 --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 --- KDB: enter: panic Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. Loaded symbols for /boot/kernel/linprocfs.ko.symbols Reading symbols from /boot/kernel/linux.ko.symbols...done. Loaded symbols for /boot/kernel/linux.ko.symbols #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm(movq %%gs:%1,%0 : =r (td) (kgdb) bt #0 doadump (textdump=-946873840) at pcpu.h:219 #1 0x8034e865 in db_fncall (dummy1=value optimized out, dummy2=value optimized out, dummy3=value optimized out, dummy4=value optimized out) at /usr/src/sys/ddb/db_command.c:578 #2 0x8034e54d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:449 #3 0x8034e2c4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502 #4 0x80350d20 in db_trap (type=value optimized out, code=0) at /usr/src/sys/ddb/db_main.c:231 #5 0x809a9bd9 in kdb_trap (type=3, code=0, tf=value optimized out) at /usr/src/sys/kern/subr_kdb.c:656 #6 0x80dc00e3 in trap (frame=0xfe1838702a00) at /usr/src/sys/amd64/amd64/trap.c:551 #7 0x80da29c2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231 #8 0x809a933e in kdb_enter (why=0x81039a72 panic, msg=value optimized out) at cpufunc.h:63 #9 0x8096a8b5 in panic (fmt=value optimized out) at /usr/src/sys/kern/kern_shutdown.c:749 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 ---Type return to continue, or q return to quit--- #12 0x80da2efe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605 #13 0x in ?? () Current language: auto; currently minimal (kgdb) fr 10 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 203 panic(%s: possible deadlock detected on allproc_lock\n, (kgdb) l 198 * priority inversion problem leading to starvation. 199 * If the lock can't be held after 100 tries, panic. 200 */ 201 if (!sx_try_slock(allproc_lock)) { 202 if (tryl 100) 203 panic(%s: possible deadlock detected on allproc_lock\n, 204 __func__); 205 tryl++; 206 pause(allproc, sleepfreq * hz); 207 continue; (kgdb) up #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 977 callout(arg, frame); (kgdb) l 972 * cpu_set_fork_handler intercepts
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 10:12 AM, Glen Barber g...@freebsd.org wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. The first thing that I would do is try to figure out who holds the allproc_lock. If the lock is held exclusively then the pointer to the thread struct will be in allproc_lock.sx_lock. If the lock is held shared then it gets trickier. I'm not sure if there is a better possible way than doing a thread apply all bt in kgdb and finding which stack(s) could hold allproc_lock. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic in deadlkres() on r267110
On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. These machines were happily running r266621 previously to this update yesterday. So, that gives us a bisection point. Some more debug information. Thank you to Attilio for information on what data to get. Script started on Fri Jun 6 15:00:53 2014 command: /bin/sh # kgdb ./kernel.debug /var/crash/vmcore.0 [...] #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm(movq %%gs:%1,%0 : =r (td) (kgdb) p/x allproc_lock.sx_lock $1 = 0xf813ae7f4924 Current language: auto; currently minimal (kgdb) p ((struct thread *)0xf813ae7f4924) $2 = (struct thread *) 0xf813ae7f4924 (kgdb) p $2-td_tid $3 = 0 (kgdb) ^D # ^D Script done on Fri Jun 6 15:03:19 2014 Glen pgpAoA_3Wzmb1.pgp Description: PGP signature
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 5:06 PM, Glen Barber g...@freebsd.org wrote: On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. These machines were happily running r266621 previously to this update yesterday. So, that gives us a bisection point. Some more debug information. Thank you to Attilio for information on what data to get. Script started on Fri Jun 6 15:00:53 2014 command: /bin/sh # kgdb ./kernel.debug /var/crash/vmcore.0 [...] #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm(movq %%gs:%1,%0 : =r (td) (kgdb) p/x allproc_lock.sx_lock $1 = 0xf813ae7f4924 Current language: auto; currently minimal (kgdb) p ((struct thread *)0xf813ae7f4924) The actual thread address is: 0xf813ae7f4920. Then look at the GDB threads list and match with the tid. Attilio ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic in deadlkres() on r267110
On Fri, Jun 06, 2014 at 11:06:40AM -0400, Glen Barber wrote: On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote: On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. These machines were happily running r266621 previously to this update yesterday. So, that gives us a bisection point. Some more debug information. Thank you to Attilio for information on what data to get. Sorry, I did not mask the bits properly. Script started on Fri Jun 6 15:20:14 2014 command: /bin/sh # kgdb ./kernel.debug /var/crash/vmcore.0 [...] #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm(movq %%gs:%1,%0 : =r (td) (kgdb) p/x allproc_lock.sx_lock $1 = 0xf813ae7f4924 Current language: auto; currently minimal (kgdb) p ((struct thread *)0xf813ae7f4920) $4 = (struct thread *) 0xf813ae7f4920 (kgdb) p $4-td_tid $5 = 195038 (kgdb) tid 195038 [Switching to thread 94414 (Thread 195038)]#0 0x80daf1f8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 1432savectx(stoppcbs[cpu]); (kgdb) bt #0 0x80daf1f8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 #1 0x80daf1bf in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1417 #2 0x80dbfcba in trap (frame=0xfe17d5389f30) at /usr/src/sys/amd64/amd64/trap.c:189 #3 0x80da2e03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:504 #4 0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, pages=Cannot access memory at address 0x4 ) at /usr/src/sys/kern/kern_fork.c:268 #5 0x8092f00f in sys_fork (td=0xf813ae7f4920, uap=value optimized out) at /usr/src/sys/kern/kern_fork.c:106 #6 0x80dc118b in amd64_syscall (td=0xf813ae7f4920, traced=0) at subr_syscall.c:133 #7 0x80da2cab in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390 #8 0x000800d46d1a in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) Glen pgpI1kaMzAPLk.pgp Description: PGP signature
Re: panic in deadlkres() on r267110
On Fri, Jun 6, 2014 at 11:33 AM, Glen Barber g...@freebsd.org wrote: #4 0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, pages=Cannot access memory at address 0x4 ) at /usr/src/sys/kern/kern_fork.c:268 This looks like a suspicious piece of code. Could something be causing the kernel to go into an infinite loop there? ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic in deadlkres() on r267110
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote: Two machines in the cluster panic last night with the same backtrace. It is unclear yet exactly what was happening on the systems, but both are port building machines using ports-mgmt/tinderbox. Any ideas or information on how to further debug this would be appreciated. Script started on Fri Jun 6 14:01:53 2014 command: /bin/sh # uname -a FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r267110: Thu Jun 5 15:57:43 UTC 2014 sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD amd64 # kgdb ./kernel.debug /var/crash/vmcore.0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as amd64-marcel-freebsd... Unread portion of the kernel message buffer: panic: deadlkres: possible deadlock detected on allproc_lock cpuid = 17 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0 panic() at panic+0x155/frame 0xfe1838702b50 deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0 fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0 --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 --- KDB: enter: panic Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. Loaded symbols for /boot/kernel/linprocfs.ko.symbols Reading symbols from /boot/kernel/linux.ko.symbols...done. Loaded symbols for /boot/kernel/linux.ko.symbols #0 doadump (textdump=-946873840) at pcpu.h:219 219 __asm(movq %%gs:%1,%0 : =r (td) (kgdb) bt #0 doadump (textdump=-946873840) at pcpu.h:219 #1 0x8034e865 in db_fncall (dummy1=value optimized out, dummy2=value optimized out, dummy3=value optimized out, dummy4=value optimized out) at /usr/src/sys/ddb/db_command.c:578 #2 0x8034e54d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:449 #3 0x8034e2c4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502 #4 0x80350d20 in db_trap (type=value optimized out, code=0) at /usr/src/sys/ddb/db_main.c:231 #5 0x809a9bd9 in kdb_trap (type=3, code=0, tf=value optimized out) at /usr/src/sys/kern/subr_kdb.c:656 #6 0x80dc00e3 in trap (frame=0xfe1838702a00) at /usr/src/sys/amd64/amd64/trap.c:551 #7 0x80da29c2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231 #8 0x809a933e in kdb_enter (why=0x81039a72 panic, msg=value optimized out) at cpufunc.h:63 #9 0x8096a8b5 in panic (fmt=value optimized out) at /usr/src/sys/kern/kern_shutdown.c:749 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 ---Type return to continue, or q return to quit--- #12 0x80da2efe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605 #13 0x in ?? () Current language: auto; currently minimal (kgdb) fr 10 #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203 203 panic(%s: possible deadlock detected on allproc_lock\n, (kgdb) l 198 * priority inversion problem leading to starvation. 199 * If the lock can't be held after 100 tries, panic. 200 */ 201 if (!sx_try_slock(allproc_lock)) { 202 if (tryl 100) 203 panic(%s: possible deadlock detected on allproc_lock\n, 204 __func__); 205 tryl++; 206 pause(allproc, sleepfreq * hz); 207 continue; (kgdb) up #11 0x8093170a in fork_exit (callout=0x8090cd40 deadlkres, arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977 977 callout(arg, frame); (kgdb) l 972 * cpu_set_fork_handler intercepts this function call to 973 * have this call a non-return function to stay in kernel mode. 974 *