Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
on 25/12/2012 02:11 Derek Kulinski said the following:
> Hello Andriy,
>
> Monday, December 24, 2012, 3:28:00 PM, you wrote:
>
>> I've looked through the cores, and it does look like in all cases some
>> sort of memory corruption is a precursor to a subsequent crash. I can't
>> say for certain whether the corruptions are caused by the hardware, by
>> some code overwriting random memory locations (a rogue driver), or by a
>> simpler bug like a use after free. I am always inclined to suspect the
>> hardware first.
>>
>> You can try to reproduce the problem with some additional checks enabled
>> in the kernel. Those should catch the problem earlier and thus make its
>> source clearer. I recommend the following:
>>
>>     options INVARIANTS
>>     options INVARIANT_SUPPORT
>>     options WITNESS
>>     options DEBUG_MEMGUARD
>>     makeoptions DEBUG+=-DDEBUG
>>
>> The last is really needed only for the ZFS and OpenSolaris compat code.
>> It may result in some extra noise from unrelated subsystems. Perhaps you
>> could just add #define DEBUG to
>> sys/cddl/contrib/opensolaris/uts/common/sys/debug.h. I haven't tested
>> this approach though.
>>
>> Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.
>>
>> Please note that these options will make your system significantly slower.
>
> I recompiled the kernel and am running with the options you specified (I
> enabled DEBUG in the file). Anyway, even at boot time I started getting
> the following warnings. Is this anything to worry about?

These witness warnings are OK-ish. Watch for panics.

BTW, I should have said this earlier. Whatever the kind of corruption, it would be much worse if it got propagated to stable storage, especially if it ended up in any kind of pool metadata. So your data is at great risk now. Please also take measures to back it up, preferably using a different system.
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Hello Andriy,

Tuesday, December 25, 2012, 12:18:23 AM, you wrote:

>> I recompiled the kernel and am running with the options you specified (I
>> enabled DEBUG in the file). Anyway, even at boot time I started getting
>> the following warnings. Is this anything to worry about?

> These witness warnings are OK-ish. Watch for panics.

Sadly (I guess :) I did not have a crash today (I don't even see a warning at that time). I'll update when it happens. I will also try to run that task today and see if it does anything.

> BTW, I should have said this earlier. Whatever the kind of corruption, it
> would be much worse if it got propagated to stable storage, especially if
> it ended up in any kind of pool metadata. So your data is at great risk
> now. Please also take measures to back it up, preferably using a
> different system.

I'm backing it up using zfs send, since dump doesn't appear to work on ZFS. Anyway, zpool scrub does not show any problems... I performed my last scrub when I started this thread.

--
Best regards,
Derek mailto:tak...@takeda.tk

Hand over the calculator, friends don't let friends derive drunk
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
on 24/12/2012 00:23 Derek Kulinski said the following:
> Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

So do you have the crash dump(s)?

--
Andriy Gapon
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Hello Andriy,

Monday, December 24, 2012, 8:01:26 AM, you wrote:

> on 24/12/2012 00:23 Derek Kulinski said the following:
>> Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

> So do you have the crash dump(s)?

Yes, but they are 3.5GB each. I attached a text dump to GNATS, but I can resend it to you (I don't know if it's OK to send attachments to the mailing list).

If you would prefer, I could give you access to the box.

--
Best regards,
Derek mailto:tak...@takeda.tk

-- Programmer - A red-eyed, mumbling mammal capable of conversing with inanimate objects.
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
> Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
> resend it to you

We have a limit of 500K on GNATS PRs.

For something that huge, a PR database is really not the right place for it -- please post the dumps somewhere and include a URL to them in a followup to the PR.

Thanks.

Mark Linimon, on behalf of bugmeister
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Hello Mark,

Monday, December 24, 2012, 12:46:53 PM, you wrote:

> On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
>> Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
>> resend it to you

> We have a limit of 500K on GNATS PRs.

> For something that huge, a PR database is really not the right place for
> it -- please post the dumps somewhere and include a URL to them in a
> followup to the PR.

> Thanks.

> Mark Linimon, on behalf of bugmeister

I included the text dump, but I do not see it when I visit the web interface, so I don't know if it was attached there or not.

--
Best regards,
Derek mailto:tak...@takeda.tk

My new car runs at 56Kbps
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
on 24/12/2012 20:17 Derek Kulinski said the following:
> Hello Andriy,
>
> Monday, December 24, 2012, 8:01:26 AM, you wrote:
>
>> on 24/12/2012 00:23 Derek Kulinski said the following:
>>> Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
>> So do you have the crash dump(s)?
>
> Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
> resend it to you (I don't know if it's ok to send attachments to the
> mailing list).
>
> If you would prefer I could give you access to the box.

Derek,

I've looked through the cores, and it does look like in all cases some sort of memory corruption is a precursor to a subsequent crash. I can't say for certain whether the corruptions are caused by the hardware, by some code overwriting random memory locations (a rogue driver), or by a simpler bug like a use after free. I am always inclined to suspect the hardware first.

You can try to reproduce the problem with some additional checks enabled in the kernel. Those should catch the problem earlier and thus make its source clearer. I recommend the following:

    options INVARIANTS
    options INVARIANT_SUPPORT
    options WITNESS
    options DEBUG_MEMGUARD
    makeoptions DEBUG+=-DDEBUG

The last is really needed only for the ZFS and OpenSolaris compat code. It may result in some extra noise from unrelated subsystems. Perhaps you could just add #define DEBUG to sys/cddl/contrib/opensolaris/uts/common/sys/debug.h. I haven't tested this approach though.

Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.

Please note that these options will make your system significantly slower.

--
Andriy Gapon
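For reference, the options recommended in this message could be collected in a separate debug kernel configuration. What follows is only a minimal sketch, not something posted in the thread: the file name and ident CHINATSU-DEBUG are hypothetical, and it assumes the existing custom configuration is the CHINATSU file that appears in the uname output of the original report.

```
# Hypothetical file: /usr/src/sys/amd64/conf/CHINATSU-DEBUG
# Assumption: CHINATSU is the existing custom kernel config on this box.
include     CHINATSU
ident       CHINATSU-DEBUG      # hypothetical name, chosen for illustration

# Extra consistency checks recommended in the message above.
options     INVARIANTS
options     INVARIANT_SUPPORT
options     WITNESS
options     DEBUG_MEMGUARD

# Per the message, only really needed for the ZFS and OpenSolaris compat
# code. Leave this out if #define DEBUG is added to
# sys/cddl/contrib/opensolaris/uts/common/sys/debug.h instead.
makeoptions DEBUG+=-DDEBUG
```

The memguard setting from the same message is a loader tunable rather than a kernel option, so it would go into /boot/loader.conf as vm.memguard.desc="arc_buf_hdr_t". Building and installing such a config would presumably follow the usual make buildkernel KERNCONF=CHINATSU-DEBUG and make installkernel KERNCONF=CHINATSU-DEBUG steps from /usr/src.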
Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Hello Andriy,

Monday, December 24, 2012, 3:28:00 PM, you wrote:

> I've looked through the cores, and it does look like in all cases some
> sort of memory corruption is a precursor to a subsequent crash. I can't
> say for certain whether the corruptions are caused by the hardware, by
> some code overwriting random memory locations (a rogue driver), or by a
> simpler bug like a use after free. I am always inclined to suspect the
> hardware first.

> You can try to reproduce the problem with some additional checks enabled
> in the kernel. Those should catch the problem earlier and thus make its
> source clearer. I recommend the following:

>     options INVARIANTS
>     options INVARIANT_SUPPORT
>     options WITNESS
>     options DEBUG_MEMGUARD
>     makeoptions DEBUG+=-DDEBUG

> The last is really needed only for the ZFS and OpenSolaris compat code.
> It may result in some extra noise from unrelated subsystems. Perhaps you
> could just add #define DEBUG to
> sys/cddl/contrib/opensolaris/uts/common/sys/debug.h. I haven't tested
> this approach though.

> Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.

> Please note that these options will make your system significantly slower.

I recompiled the kernel and am running with the options you specified (I enabled DEBUG in the file). Anyway, even at boot time I started getting the following warnings. Is this anything to worry about?

Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
Dec 24 16:06:03 chinatsu kernel: lock order reversal:
Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
Dec 24 16:06:03 chinatsu kernel: .
Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xff85fb2ebbb0, rbp = 0 ---
Dec 24 16:06:03 chinatsu kernel: No core dumps found.
Dec 24 16:06:04 chinatsu kernel: lock order reversal:
Dec 24 16:06:04 chinatsu kernel: 1st 0xff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
Dec 24 16:06:04 chinatsu kernel: 2nd 0xfe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x33
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove() at
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove+0x16
Dec 24 16:06:04 chinatsu kernel: ufs_dirremove() at ufs_dirremove+0x1bb
Dec 24 16:06:04 chinatsu kernel: ufs_remove() at ufs_remove+0x92
Dec 24 16:06:04 chinatsu kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xb7
Dec 24 16:06:04 chinatsu kernel: kern_unlinkat() at kern_unlinkat+0x2eb
Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
Dec 24 16:06:04 chinatsu
FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Please help. I reported this issue at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/174372, but the crashes are unbearable: they happen regularly at night, most of the time when periodic.daily is called (3am), though there are exceptions.

It seems like it can be triggered by any heavy disk activity. In many of the crash dumps the current process is the find command, but of course that is not always the cause.

When I run a scrub, it appears to pass successfully. Smartmontools is not reporting any disk errors. I tested memory using memtest86 about a month ago and it passed the tests successfully.

I never had this type of issue on 9.0, and not much changed in my kernel config besides installing a WiFi card.

System:
FreeBSD chinatsu.takeda.tk 9.1-RELEASE FreeBSD 9.1-RELEASE #2 r244482: Wed Dec 19 23:28:15 PST 2012 r...@chinatsu.takeda.tk:/usr/obj/usr/src/sys/CHINATSU amd64

It's compiled from the releng/9.1 branch.

Thank you for any help,
Derek

Example crash messages:

Today:

panic: general protection fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
trap_fatal() at trap_fatal+0x285
trap() at trap+0x23e
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0x8101f5ac, rsp = 0xff8230ac18c0, rbp = 0xff8230ac18d0 ---
list_prev() at list_prev+0xc
arc_evict() at arc_evict+0x194
arc_adjust() at arc_adjust+0x1a1
arc_reclaim_thread() at arc_reclaim_thread+0x1a6
fork_exit() at fork_exit+0x11c
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8230ac1bb0, rbp = 0 ---
Uptime: 20h44m51s
Dumping 1157 out of 8072 MB:..2%..12%..21%..31%..41%..52%..61%..71%..81%..92%

Yesterday:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xffe8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x8102d1e7
stack pointer = 0x28:0xff82315ec190
frame pointer = 0x28:0xff82315ec280
code segment= base 0x0, limit 0xf, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 34744 (find)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
trap_fatal() at trap_fatal+0x285
trap_pfault() at trap_pfault+0x216
trap() at trap+0x363
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x8102d1e7, rsp = 0xff82315ec190, rbp = 0xff82315ec280 ---
arc_evict() at arc_evict+0x1e7
arc_get_data_buf() at arc_get_data_buf+0x1d5
arc_read_nolock() at arc_read_nolock+0x1ec
arc_read() at arc_read+0x93
dbuf_read() at dbuf_read+0x452
dmu_buf_hold() at dmu_buf_hold+0xe0
zap_lockdir() at zap_lockdir+0x58
zap_cursor_retrieve() at zap_cursor_retrieve+0x19b
zfs_freebsd_readdir() at zfs_freebsd_readdir+0x2ee
VOP_READDIR_APV() at VOP_READDIR_APV+0x4a
kern_getdirentries() at kern_getdirentries+0x225
sys_getdirentries() at sys_getdirentries+0x23
amd64_syscall() at amd64_syscall+0x5d8
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (196, FreeBSD ELF64, sys_getdirentries), rip = 0x80089b29c, rsp = 0x7fffd8a8, rbp = 0x1 ---
Uptime: 2d3h49m3s
Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

--
Best regards,
Derek mailto:tak...@takeda.tk