Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-25 Thread Andriy Gapon
on 25/12/2012 02:11 Derek Kulinski said the following:
 Hello Andriy,
 
 Monday, December 24, 2012, 3:28:00 PM, you wrote:
 
 I've looked through the cores and it does look like in all cases some sort of
 memory corruption is a precursor to a subsequent crash.
 
 I can't say decisively whether the corruptions are caused by the hardware, by
 some code overwriting random memory locations (a rogue driver), or by a
 simpler bug like a use-after-free.
 
 I am always inclined to suspect the hardware first.
 
 You can try to reproduce the problem with some additional checks enabled in
 the kernel.  Those should catch the problem earlier and thus make its source
 clearer.
 
 I recommend the following:
 options INVARIANTS
 options INVARIANT_SUPPORT
 options WITNESS
 options DEBUG_MEMGUARD
 makeoptions DEBUG+=-DDEBUG
 
 The last is really needed only for the ZFS and OpenSolaris compat code.  It may
 result in some extra noise from unrelated subsystems.
 Perhaps you could just add #define DEBUG to
 sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
 approach though.
 
 Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.
 
 Please note that these options will make your system significantly slower.
 
 I recompiled the kernel and it is running with the options you specified (I
 enabled DEBUG in the file).
 
 Anyway, even at boot time I started getting the following warnings. Is this
 anything to worry about?

These witness warnings are OK-ish.
Watch for panics.

BTW, I should have said this earlier.  Whatever the kind of corruption, it
would be much worse if it got propagated to stable storage, especially into
any kind of pool metadata.

So, your data is at great risk now.
Please also take measures to back it up, preferably by using a different
system.

 Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
 Dec 24 16:06:03 chinatsu kernel: lock order reversal:
 Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
 Dec 24 16:06:03 chinatsu kernel: .
 Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
 Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
 Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
 Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
 Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
 Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
 Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
 Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
 Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
 Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
 Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
 Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
 Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
 Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
 Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
 Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
 Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
 Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
 Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
 Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
 Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
 Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
 Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
 Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
 Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
 Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xff85fb2ebbb0, rbp = 0 ---
 Dec 24 16:06:03 chinatsu kernel: No core dumps found.
 Dec 24 16:06:04 chinatsu kernel: lock order reversal:
 Dec 24 16:06:04 chinatsu kernel: 1st 0xff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
 Dec 24 16:06:04 chinatsu kernel: 2nd 0xfe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
 Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
 Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
 Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
 Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
 Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
 Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at 
 

Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-25 Thread Derek Kulinski
Hello Andriy,

Tuesday, December 25, 2012, 12:18:23 AM, you wrote:


 I recompiled the kernel and it is running with the options you specified (I
 enabled DEBUG in the file).
 
 Anyway, even at boot time I started getting the following warnings. Is this
 anything to worry about?

 These witness warnings are OK-ish.
 Watch for panics.

Sadly (I guess :) I did not have a crash today (I don't even see a
warning at that time). I'll update when it happens. I will also try to
run that task today and see if it does anything.

 BTW, I should have said this earlier.  Whatever the kind of corruption, it
 would be much worse if it got propagated to stable storage, especially into
 any kind of pool metadata.

 So, your data is at great risk now.
 Please also take measures to back it up, preferably by using a different
 system.

I'm backing it up using zfs send, since dump(8) doesn't appear to work on
ZFS.
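A minimal sketch of that kind of off-box copy, assuming a pool named tank, a second machine reachable over ssh as backuphost, and a receiving dataset backup/tank on it (all hypothetical names). The flags are the ones zfs(8) documents for send -R and receive -F/-d/-u, but please check them against the ZFS version on both systems before relying on this. The helper names run and backup_pool are made up for the example; DRYRUN=1 prints the commands instead of running them.

```sh
# Run a command string, or only print it when DRYRUN=1.
run() {
    if [ "${DRYRUN:-0}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# Replicate every dataset of a pool to another host.
# Pool, host and target dataset are hypothetical placeholders.
backup_pool() {
    pool=$1 host=$2 target=${3:-backup/$1}
    snap="$pool@backup-$(date +%Y%m%d)"
    # -R sends descendants and properties; on the receiving side -d maps
    # tank/x to $target/x, -F allows overwriting $target, and -u keeps the
    # received file systems unmounted so their mountpoints cannot collide.
    run "zfs snapshot -r $snap" &&
        run "ssh $host zfs create -p $target" &&
        run "zfs send -R $snap | ssh $host zfs receive -F -d -u $target"
}

# Preview first, then run for real (as root):
#   DRYRUN=1 backup_pool tank backuphost
#   backup_pool tank backuphost
```

Keeping the received copy unmounted (-u) is deliberate: a replication stream carries the mountpoint properties of the source, which would otherwise try to mount over the backup machine's own file systems.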

Anyway, zpool scrub does not show any problems... I performed my last
scan when I started this thread.
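For reference, a rough sketch of running a scrub to completion and then checking the result, assuming a pool named tank (hypothetical) and the "scrub in progress" wording that v28-era zpool status prints while a scrub is running. The function name scrub_and_report is made up for this example.

```sh
# Scrub one pool, wait until the scrub finishes, then show detailed status.
# Returns non-zero if the scrub cannot start or if any pool is unhealthy.
scrub_and_report() {
    pool=$1 interval=${2:-60}
    zpool scrub "$pool" || return 1
    # Poll until the status no longer reports a running scrub.
    while zpool status "$pool" | grep -q 'scrub in progress'; do
        sleep "$interval"
    done
    zpool status -v "$pool"
    # `zpool status -x` prints "all pools are healthy" when no imported
    # pool has a problem; any other output makes this test fail.
    [ "$(zpool status -x)" = "all pools are healthy" ]
}

# Usage (as root):
#   scrub_and_report tank
```

A clean scrub only shows that the blocks on disk still match their checksums; it cannot catch data that was already corrupted in memory before ZFS checksummed and wrote it.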

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

Hand over the calculator, friends don't let friends derive drunk

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Andriy Gapon
on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

So do you have the crash dump(s)?
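If it helps, a small sketch for locating the newest dump that savecore(8) left behind, assuming the default /var/crash directory and that the kernel which crashed is still installed as /boot/kernel/kernel (both assumptions; point kgdb at the matching kernel with debug symbols otherwise). latest_vmcore is a hypothetical helper name.

```sh
# Print the path of the highest-numbered vmcore.N in a crash directory.
# Returns 1 if the directory holds no numbered vmcore files.
latest_vmcore() {
    dir=${1:-/var/crash}
    n=$(for f in "$dir"/vmcore.*; do
            [ -e "$f" ] || continue          # glob matched nothing
            printf '%s\n' "${f##*/vmcore.}"  # keep only the suffix
        done | grep -E '^[0-9]+$' | sort -n | tail -n 1)
    [ -n "$n" ] || return 1
    printf '%s/vmcore.%s\n' "$dir" "$n"
}

# Usage on the affected machine:
#   core=$(latest_vmcore) && kgdb /boot/kernel/kernel "$core"
# then type `bt` at the (kgdb) prompt to get the panic backtrace.
```

Sorting numerically matters here: a plain lexical sort would rank vmcore.9 above vmcore.10.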

-- 
Andriy Gapon


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Andriy,

Monday, December 24, 2012, 8:01:26 AM, you wrote:

 on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

 So do you have the crash dump(s)?

Yes, but they are 3.5GB each. I attached the text dump to GNATS, but I can
resend it to you (I don't know if it's OK to send attachments to the
mailing list). If you would prefer, I could give you access to the box.

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

-- Programmer - A red-eyed, mumbling mammal capable of conversing with inanimate objects.



Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Mark Linimon
On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
 Yes, but they are 3.5GB each. I attached the text dump to GNATS, but I can
 resend it to you

We have a limit of 500K on GNATS PRs.  For something that huge, a PR
database is really not the right place for it -- please post the dumps
somewhere and include a URL to them in a followup to the PR.

Thanks.

Mark Linimon, on behalf of bugmeister


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Mark,

Monday, December 24, 2012, 12:46:53 PM, you wrote:

 On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
 Yes, but they are 3.5GB each. I attached the text dump to GNATS, but I can
 resend it to you

 We have a limit of 500K on GNATS PRs.  For something that huge, a PR
 database is really not the right place for it -- please post the dumps
 somewhere and include a URL to them in a followup to the PR.

 Thanks.

 Mark Linimon, on behalf of bugmeister

I included the text dump, but I do not see it when I visit the web
interface so I don't know if it was attached there or not.

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

My new car runs at 56Kbps



Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Andriy Gapon
on 24/12/2012 20:17 Derek Kulinski said the following:
 Hello Andriy,
 
 Monday, December 24, 2012, 8:01:26 AM, you wrote:
 
 on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
 
 So do you have the crash dump(s)?
 
 Yes, but they are 3.5GB each. I attached the text dump to GNATS, but I can
 resend it to you (I don't know if it's OK to send attachments to the
 mailing list). If you would prefer, I could give you access to the box.

Derek,

I've looked through the cores and it does look like in all cases some sort of
memory corruption is a precursor to a subsequent crash.

I can't say decisively whether the corruptions are caused by the hardware, by
some code overwriting random memory locations (a rogue driver), or by a
simpler bug like a use-after-free.

I am always inclined to suspect the hardware first.

You can try to reproduce the problem with some additional checks enabled in the
kernel.  Those should catch the problem earlier and thus make its source 
clearer.

I recommend the following:
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_MEMGUARD
makeoptions DEBUG+=-DDEBUG

The last is really needed only for the ZFS and OpenSolaris compat code.  It may
result in some extra noise from unrelated subsystems.
Perhaps you could just add #define DEBUG to
sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
approach though.
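As an illustration, the options above can be collected into a small custom config that includes GENERIC, so only the additions need to be listed. This is a sketch assuming an amd64 machine with sources in /usr/src; DEBUGZFS and debug_kernconf are hypothetical names, and `include`/`ident` are the standard config(5) directives for deriving from GENERIC.

```sh
# Print a kernel config that layers the suggested debugging options
# on top of GENERIC. The config name defaults to the hypothetical DEBUGZFS.
debug_kernconf() {
    name=${1:-DEBUGZFS}
    cat <<EOF
include GENERIC
ident   $name
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_MEMGUARD
makeoptions DEBUG+=-DDEBUG
EOF
}

# Usage on an amd64 box with sources in /usr/src (as root):
#   debug_kernconf > /usr/src/sys/amd64/conf/DEBUGZFS
#   cd /usr/src && make buildkernel KERNCONF=DEBUGZFS &&
#       make installkernel KERNCONF=DEBUGZFS
```

Starting from `include GENERIC` keeps the custom file short and means later updates to GENERIC are picked up automatically on the next build.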

Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.
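A sketch of adding that tunable without leaving duplicate entries behind, assuming the stock /boot/loader.conf location; set_loader_tunable is a hypothetical helper. After a reboot, `sysctl vm.memguard.desc` should report the value if memguard picked it up.

```sh
# Set key="value" in a loader.conf-style file, dropping any earlier line
# for the same key so repeated runs stay idempotent.
set_loader_tunable() {
    key=$1 value=$2 file=${3:-/boot/loader.conf}
    tmp="$file.tmp.$$"
    [ -e "$file" ] || : > "$file"
    # Keep every line that does not start with exactly "key=".
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp" || return 1
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Usage (as root), then reboot and check:
#   set_loader_tunable vm.memguard.desc arc_buf_hdr_t
#   sysctl vm.memguard.desc
```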

Please note that these options will make your system significantly slower.

-- 
Andriy Gapon


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Andriy,

Monday, December 24, 2012, 3:28:00 PM, you wrote:

 I've looked through the cores and it does look like in all cases some sort of
 memory corruption is a precursor to a subsequent crash.

 I can't say decisively whether the corruptions are caused by the hardware, by
 some code overwriting random memory locations (a rogue driver), or by a
 simpler bug like a use-after-free.

 I am always inclined to suspect the hardware first.

 You can try to reproduce the problem with some additional checks enabled in
 the kernel.  Those should catch the problem earlier and thus make its source
 clearer.

 I recommend the following:
 options INVARIANTS
 options INVARIANT_SUPPORT
 options WITNESS
 options DEBUG_MEMGUARD
 makeoptions DEBUG+=-DDEBUG

 The last is really needed only for the ZFS and OpenSolaris compat code.  It may
 result in some extra noise from unrelated subsystems.
 Perhaps you could just add #define DEBUG to
 sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
 approach though.

 Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.

 Please note that these options will make your system significantly slower.

I recompiled the kernel and it is running with the options you specified (I
enabled DEBUG in the file).

Anyway, even at boot time I started getting the following warnings. Is this
anything to worry about?

Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
Dec 24 16:06:03 chinatsu kernel: lock order reversal:
Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
Dec 24 16:06:03 chinatsu kernel: .
Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xff85fb2ebbb0, rbp = 0 ---
Dec 24 16:06:03 chinatsu kernel: No core dumps found.
Dec 24 16:06:04 chinatsu kernel: lock order reversal:
Dec 24 16:06:04 chinatsu kernel: 1st 0xff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
Dec 24 16:06:04 chinatsu kernel: 2nd 0xfe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x33
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove() at
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove+0x16
Dec 24 16:06:04 chinatsu kernel: ufs_dirremove() at ufs_dirremove+0x1bb
Dec 24 16:06:04 chinatsu kernel: ufs_remove() at ufs_remove+0x92
Dec 24 16:06:04 chinatsu kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xb7
Dec 24 16:06:04 chinatsu kernel: kern_unlinkat() at kern_unlinkat+0x2eb
Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
Dec 24 16:06:04 chinatsu