Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
panic: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/182999
deadlock: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/183007

Anton
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
On Wed, Oct 16, 2013 at 09:02:19AM +0100, Anton Shterenlikht wrote:
panic: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/182999

db> show pginfo 0xe0027d352600
page 0xe0027d352600 obj 0xe000128fda00 pidx 0x0 phys 0x275dc6000
q 255 hold 0 wire 1 af 0x0 of 0x0 f 0x0 act 0 busy 1 valid 0xff dirty 0x0

AFAIR ia64 uses 8K pages.

Please do the following:
1. Apply the patch at the end of this message, reproduce the problem, and show me both the exact panic message from the patched kernel and the 'show pginfo addr' output again.
2. Show me the ls -la output for the file which was accessed through nginx. Also, which filesystem does the file reside on?
Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
On Wed, Oct 16, 2013 at 02:55:26PM +0300, Konstantin Belousov wrote:
[trim]

Sure, I forgot the patch.

diff --git a/sys/kern/uipc_syscalls.c b/sys/kern/uipc_syscalls.c
index 322550b..9d46dc7 100644
--- a/sys/kern/uipc_syscalls.c
+++ b/sys/kern/uipc_syscalls.c
@@ -2070,7 +2070,7 @@ free_page:
 	}
 	KASSERT(error != 0 || (m->wire_count > 0 &&
 	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
-	    ("wrong page state m %p", m));
+	    ("wrong page state m %p off %#jx xfsize %d", m, off, xfsize));
 	VM_OBJECT_WUNLOCK(obj);
 	return (error);
 }
Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
From kostik...@gmail.com Wed Oct 16 13:02:51 2013
[trim]
Sure, I forgot the patch.
[trim]

The patch didn't apply cleanly. I concluded that my src was older than yours, so I updated to r256624. Now I don't get this panic! I don't know whether to be happy or not.

Anyway, I was on r255488 when the panic happened, and there have been a lot of changes under sys/kern since then. Specifically related to the patch:

# svn info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.eu.freebsd.org/base/head/sys/kern
Relative URL: ^/head/sys/kern
Repository Root: https://svn0.eu.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 256624
Node Kind: directory
Schedule: normal
Last Changed Author: mav
Last Changed Rev: 256614
Last Changed Date: 2013-10-16 10:56:40 +0100 (Wed, 16 Oct 2013)

# svn diff -r 255488 uipc_syscalls.c
Index: uipc_syscalls.c
===
--- uipc_syscalls.c	(revision 255488)
+++ uipc_syscalls.c	(working copy)
@@ -123,21 +123,13 @@
 
 /*
  * sendfile(2)-related variables and associated sysctls
  */
-int nsfbufs;
-int nsfbufspeak;
-int nsfbufsused;
+static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0,
+    "sendfile(2) tunables");
 static int sfreadahead = 1;
+SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW,
+    &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks");
 
-SYSCTL_INT(_kern_ipc, OID_AUTO, nsfbufs, CTLFLAG_RDTUN, &nsfbufs, 0,
-    "Maximum number of sendfile(2) sf_bufs available");
-SYSCTL_INT(_kern_ipc, OID_AUTO, nsfbufspeak, CTLFLAG_RD, &nsfbufspeak, 0,
-    "Number of sendfile(2) sf_bufs at peak usage");
-SYSCTL_INT(_kern_ipc, OID_AUTO, nsfbufsused, CTLFLAG_RD, &nsfbufsused, 0,
-    "Number of sendfile(2) sf_bufs in use");
-SYSCTL_INT(_kern_ipc, OID_AUTO, sfreadahead, CTLFLAG_RW, &sfreadahead, 0,
-    "Number of sendfile(2) read-ahead MAXBSIZE blocks");
-
 static void
 sfstat_init(const void *unused)
 {
@@ -2076,10 +2068,10 @@
 		vm_page_free(m);
 		vm_page_unlock(m);
 	}
+	KASSERT(error != 0 || (m->wire_count > 0 &&
+	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
+	    ("wrong page state m %p off %#jx xfsize %d", m, off, xfsize));
 	VM_OBJECT_WUNLOCK(obj);
-	KASSERT(error != 0 || (m->wire_count > 0 && m->valid ==
-	    VM_PAGE_BITS_ALL),
-	    ("wrong page state m %p", m));
 	return (error);
 }
 
#

Please let me know if there are any other diagnostics you'd like to see. Otherwise, till the next panic...
Many thanks for your help

Anton
panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
From davide.itali...@gmail.com Mon Oct 14 12:50:44 2013
This is fair enough. If you're still at the ddb prompt, please print the whole panic message (or at least the address of the lock reported as deadlocked by DEADLKRES), so that we can at least have a candidate.

Here's another one, followed by a savecore deadlock. ia64 r255488:

panic: wrong page state m 0xe0027a9adb40
cpuid = 0
KDB: stack backtrace:
db_trace_self(0x9ffc00158380) at db_trace_self+0x40
db_trace_self_wrapper(0x9ffc00607370) at db_trace_self_wrapper+0x70
kdb_backtrace(0x9ffc00ed0e10, 0x9ffc0058e660, 0x40c, 0x9ffc010a44a0) at kdb_backtrace+0xc0
vpanic(0x9ffc00dd3fe0, 0xa0009de61118, 0x9ffc00ef9670, 0x9ffc00ed0bc0) at vpanic+0x260
kassert_panic(0x9ffc00dd3fe0, 0xe0027a9adb40, 0x81f, 0xe002013cf400, 0x9ffc006a0220, 0x2c60, 0xe002013cf400, 0xe002013cf418) at kassert_panic+0x120
vn_sendfile(0x8df, 0xd, 0x0, 0x0, 0x0, 0x8df, 0x7fffdfe0, 0x0) at vn_sendfile+0x15d0
sys_sendfile(0xe00012aef200, 0xa0009de614e8, 0x10, 0xa0009de61360) at sys_sendfile+0x2b0
syscall(0xe000154f2940, 0xd, 0x0, 0xe00012aef200, 0x0, 0x0, 0x9ffc00ab7280, 0x8) at syscall+0x5e0
epc_syscall_return() at epc_syscall_return
KDB: enter: panic
[ thread pid 5989 tid 100111 ]
Stopped at kdb_enter+0x92: [I2] addl r14=0xffe2c990,gp ;;
db>
db> scripts
lockinfo=show locks; show alllocks; show lockedvnods
zzz=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
db>
db> run zzz

I get to:

db:0:alltrace> capture off
db:0:off> call doadump
Dumping 10220 MB (25 chunks)
  chunk 0: 1 pages ... ok
  chunk 1: 159 pages ... ok
  chunk 2: 256 pages ... ok
  chunk 3: 7680 pages ... ok
  chunk 4: 8192 pages ... ok
  chunk 5: 239734 pages ... ok
  chunk 6: 748 pages ... ok
  chunk 7: 533 pages ... ok
  chunk 8: 21 pages ... ok
  chunk 9: 1572862 pages ... ok
  chunk 10: 781683 pages ... ok
  chunk 11: 512 pages ... ok
  chunk 12: 139 pages ... ok
  chunk 13: 484 pages ... ok
  chunk 14: 1565 pages ... ok
  chunk 15: 1 pages ... ok
  chunk 16: 506 pages ... ok
  chunk 17: 1 pages ... ok
  chunk 18: 3 pages ... ok
  chunk 19: 566 pages ... ok
  chunk 20: 66 pages ... ok
  chunk 21: 1 pages ... ok
  chunk 22: 285 pages ... ok
  chunk 23: 6 pages ... ok
  chunk 24: 354 pages ... ok

Dump complete
= 0
db:0:doadump> reset

So far, so good. On reboot I get:

Starting ddb.
ddb: sysctl: debug.ddb.scripting.scripts: Invalid argument
/etc/rc: WARNING: failed to start ddb

This probably already indicates some problem?

Eventually I get to:

savecore: reboot after panic: wrong page state m 0xe0027a9adb40
Oct 15 09:05:50 mech-as28 savecore: reboot after panic: wrong page state m 0xe0027a9adb40
savecore: writing core to /var/crash/vmcore.9

So here I'm confused. I think I set up textdump as described in the man page, so I don't think a full core should have been written. Instead, I was expecting ddb.txt, config.txt, etc., as described in textdump(4).

Anyway, savecore eventually deadlocks:

panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked for 901401 ticks
cpuid = 0
KDB: stack backtrace:
db_trace_self(0x9ffc00158380) at db_trace_self+0x40
db_trace_self_wrapper(0x9ffc00607370) at db_trace_self_wrapper+0x70
kdb_backtrace(0x9ffc00ed0e10, 0x9ffc0058e660, 0x40c, 0x9ffc010a44a0) at kdb_backtrace+0xc0
vpanic(0x9ffc00db8a18, 0xa0009dca7518) at vpanic+0x260
panic(0x9ffc00db8a18, 0x9ffc00db8c70, 0xe000127b7b00, 0xdc119) at panic+0x80
deadlkres(0xdc119, 0xe000127b7b00, 0x9ffc00dbb648, 0x9ffc00db89a8) at deadlkres+0x420
fork_exit(0x9ffc00e0fca0, 0x0, 0xa0009dca7550) at fork_exit+0x120
enter_userland() at enter_userland
KDB: enter: panic
[ thread pid 0 tid 100053 ]
Stopped at kdb_enter+0x92: [I2] addl r14=0xffe2c990,gp ;;
db>
db> scripts
lockinfo=show locks; show alllocks; show lockedvnods
db> run lockinfo
db:0:lockinfo> show locks
db:0:locks> show alllocks
db:0:alllocks> show lockedvnods
Locked vnodes

0xe000127cbba8: tag devfs, type VCHR
    usecount 1, writecount 0, refcount 19 mountedhere 0xe000126ab200
    flags (VI_ACTIVE)
    v_object 0xe000127c2b00 ref 0 pages 422
    lock type devfs: EXCL by thread 0xe0001269 (pid 21, syncer, tid 100062)
        dev da3p1

0xe000127f4ec0: tag ufs, type VREG
    usecount 1, writecount 1, refcount 32934 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xe000127f7200 ref 0 pages 1242850
    lock type ufs: EXCL by thread 0xe000127b7b00 (pid 805, savecore, tid 100079)
        ino 6500740, on dev da3p1
db>
db> ps
  pid  ppid  pgrp  uid  state  wmesg     wchan           cmd
  805   803    24    0  L+     *vm page  0xe00012402fc0  savecore
  803    24    24    0  DL+    vm map (  0xe0001285fa88  sh
  801     1   801    0  Ss     select    0xe00010c296c0  syslogd
  792     1   792    0  Ss
Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
On Tue, Oct 15, 2013 at 10:43 AM, Anton Shterenlikht <me...@bris.ac.uk> wrote:
Anyway, savecore eventually deadlocks:
panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked for 901401 ticks
[trim]
Tracing command savecore pid 805 tid 100079 td 0xe000127b7b00
cpu_switch(0xe000127b7b00, 0xe00011178900, 0xe00012402fc0, 0x9ffc005e7e80) at cpu_switch+0xd0
sched_switch(0xe000127b7b00, 0xe00011178900, 0x9ffc00f15698, 0x9ffc00f15680) at sched_switch+0x890
mi_switch(0x103, 0x0, 0xe000127b7b00, 0x9ffc0062d1f0) at mi_switch+0x3f0
turnstile_wait(0xe00012402fc0, 0xe00012400480, 0x0, 0x9ffc00dcb698) at turnstile_wait+0x960
__mtx_lock_sleep(0x9ffc010f9998, 0xe000127b7b00, 0xe00012402fc0, 0x9ffc00dc0558, 0x742) at __mtx_lock_sleep+0x2f0
__mtx_lock_flags(0x9ffc010f9980, 0x0, 0x9ffc00dd4a90, 0x742) at __mtx_lock_flags+0x1e0
vfs_vmio_release(0xa0009ebe72f0, 0xe0027ed2ab70, 0x3, 0xa0009ebe736c, 0xa0009ebe7498, 0xa0009ebe72f8, 0x9ffc00dd4a90, 0x9ffc010f9680) at vfs_vmio_release+0x290
getnewbuf(0xe000127f4ec0, 0x0, 0x0, 0x8000, 0xa0009ebe99a8, 0x0, 0x9ffc010f0798, 0xa0009ebe72f0) at getnewbuf+0x7e0
getblk(0xe000127f4ec0, 0x4cbaa, 0x8000, 0x0, 0x0, 0x0, 0x0, 0x0) at getblk+0xee0
ffs_balloc_ufs2(0xe000127f4ec0, 0x4cbaa, 0xa000c60ba000, 0xe00011165a00, 0x7f05, 0xa0009dd79160) at ffs_balloc_ufs2+0x2950
ffs_write(0xa0009dd79248, 0x3000, 0x265d5) at ffs_write+0x5c0
VOP_WRITE_APV(0x9ffc00e94ac0, 0xa0009dd79248, 0x0, 0x0) at VOP_WRITE_APV+0x330
vn_write(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 0xe000129ae830, 0xe000127f4ec0) at vn_write+0x450
vn_io_fault(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 0xe000127b7b00) at vn_io_fault+0x330
dofilewrite(0xe000127b7b00, 0x7, 0xe000129ae820, 0xa0009dd79360, 0x, 0x0) at dofilewrite+0x180
kern_writev(0xe000127b7b00, 0x7, 0xa0009dd79360) at kern_writev+0xa0
sys_write(0xe000127b7b00, 0xa0009dd794e8, 0x9ffc00abac80, 0x48d) at sys_write+0x100
syscall(0xe000129d04a0, 0x140857000, 0x8000, 0xe000127b7b00, 0x0, 0x0, 0x9ffc00ab7280, 0x8) at syscall+0x5e0
--More--

I'm not commenting on the first panic you got, only on the deadlock reported by DEADLKRES. I think that's the vm_page lock. You can run

kgdb /boot/${KERNEL}/kernel

where ${KERNEL} is the affected kernel, then

l *vfs_vmio_release+0x290

to get the exact point where it fails. I'm unsure here because the 'show alllocks' and 'show locks' outputs are empty. Are you building your kernel with WITNESS etc.?

Thanks,

--
Davide

There are no solved problems; there are only problems that are more or less solved -- Henri Poincare
Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
From davide.itali...@gmail.com Tue Oct 15 11:30:07 2013
On Tue, Oct 15, 2013 at 10:43 AM, Anton Shterenlikht <me...@bris.ac.uk> wrote:
Anyway, savecore eventually deadlocks:
panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked for 901401 ticks
[trim]
Tracing command savecore pid 805 tid 100079 td 0xe000127b7b00
[trim]
--More--

I'm not commenting on the first panic you got, only on the deadlock reported by DEADLKRES. I think that's the vm_page lock. You can run kgdb /boot/${KERNEL}/kernel, where ${KERNEL} is the affected kernel, then l *vfs_vmio_release+0x290 to get the exact point where it fails.

Like this?

# kgdb /boot/kernel/kernel
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-marcel-freebsd"...
(kgdb) l *vfs_vmio_release+0x290
0x9ffc006b8830 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1859).
1854			/*
1855			 * In order to keep page LRU ordering consistent, put
1856			 * everything on the inactive queue.
1857			 */
1858			vm_page_lock(m);
1859			vm_page_unwire(m, 0);
1860	
1861			/*
1862			 * Might as well free the page if we can and it has
1863			 * no valid data.  We also free the page if the
(kgdb)

I'm unsure here because the 'show alllocks' and 'show locks' outputs are empty. Are you building your kernel with WITNESS etc.?

I think so:

# Debugging support.  Always need this:
options 	KDB			# Enable kernel debugger support.
options 	KDB_TRACE		# Print a stack trace for a panic.
# For full debugger support use (turn off in stable branch):
options 	DDB			# Support DDB
options 	GDB			# Support remote GDB
options 	DEADLKRES		# Enable the deadlock resolver
options 	INVARIANTS		# Enable calls of extra sanity checking
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# Enable checks to detect deadlocks and cycles
options 	WITNESS_SKIPSPIN	# Don't run witness on spinlocks for speed
options 	MALLOC_DEBUG_MAXZONES=8	# Separate malloc(9) zones
# textdump(4)
options 	TEXTDUMP_PREFERRED
options 	TEXTDUMP_VERBOSE
# http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html
options 	DEBUG_LOCKS
options 	DEBUG_VFS_LOCKS
options 	DIAGNOSTIC

Also, does this look right:

$ sysctl -a | grep kdb
debug.ddb.scripting.scripts: kdb.enter.panic=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call