Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-25 Thread Andriy Gapon
on 25/12/2012 02:11 Derek Kulinski said the following:
 Hello Andriy,
 
 Monday, December 24, 2012, 3:28:00 PM, you wrote:
 
 I've looked through the cores and it does look like in all cases some sort of
 memory corruption is a precursor to a subsequent crash.
 
 I can't decidedly say if the corruptions are caused by the hardware, by some
 code overwriting random memory locations (rogue driver) or by a simpler bug
 like use after free.
 
 I am always inclined to suspect the hardware first.
 
 You can try to reproduce the problem with some additional checks enabled in the
 kernel.  Those should catch the problem earlier and thus make its source clearer.
 
 I recommend the following:
 options INVARIANTS
 options INVARIANT_SUPPORT
 options WITNESS
 options DEBUG_MEMGUARD
 makeoptions DEBUG+=-DDEBUG
 
 The last is really needed only for the ZFS and OpenSolaris compat code.  It may
 result in some extra noise from unrelated subsystems.
 Perhaps you could just add #define DEBUG to
 sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
 approach though.
 
 Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.
 
 Please note that these options will make your system significantly slower.
 
 I recompiled the kernel and am running with the options you specified (I
 enabled DEBUG in the file).
 
 Anyway, even at boot time I started getting the following warnings. Is this
 anything to worry about?

These witness warnings are OK-ish.
Watch for panics.
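
For anyone sifting a long log for these reports, here is a minimal Python sketch that pairs up the "1st"/"2nd" lines of each lock order reversal. It assumes unwrapped syslog lines in the format shown in this thread. The function name and the regular expression are illustrative and not part of any FreeBSD tool.

```python
import re

# Matches e.g. "... kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /path/file.c:3330"
LOCK_RE = re.compile(
    r"kernel: (1st|2nd) (0x[0-9a-f]+) (.+?) \((.+?)\) @ (\S+):(\d+)"
)

def lock_order_reversals(lines):
    """Return a list of (first, second) pairs, each 'lock name @ file:line'."""
    reports = []
    pending = None  # the most recent "1st" entry still waiting for its "2nd"
    for line in lines:
        m = LOCK_RE.search(line)
        if not m:
            continue
        order, _addr, name, _desc, path, lineno = m.groups()
        entry = f"{name} @ {path}:{lineno}"
        if order == "1st":
            pending = entry
        elif pending is not None:
            reports.append((pending, entry))
            pending = None
    return reports

# Sample lines taken from the log in this thread (re-joined where the mail wrapped them).
SAMPLE = [
    "Dec 24 16:06:03 chinatsu kernel: lock order reversal:",
    "Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330",
    "Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384",
]

for first, second in lock_order_reversals(SAMPLE):
    print(f"{first}  ->  {second}")
```

Collapsing the reports to lock-name pairs makes it easier to see whether the same few reversals keep repeating, as opposed to something new showing up shortly before a panic.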

BTW, I should have said this earlier.  Whatever the kind of corruption, it
would be much worse if it got propagated to stable storage,
especially if it ended up in any kind of pool metadata.

So, your data is at great risk now.
Please also take measures to back it up, preferably by using a different
system.

 Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
 Dec 24 16:06:03 chinatsu kernel: lock order reversal:
 Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task 
 mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
 Dec 24 16:06:03 chinatsu kernel: .
 Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head 
 (radix node head) @ /usr/src/sys/net/route.c:384
 Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
 Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at 
 db_trace_self_wrapper+0x2a
 Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
 Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
 Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at 
 witness_checkorder+0x844
 Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
 Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
 Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
 Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
 Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
 Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
 Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
 Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
 Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
 Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
 Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
 Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at 
 netisr_dispatch_src+0x170
 Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
 Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
 Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at 
 netisr_dispatch_src+0x170
 Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
 Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at 
 taskqueue_run_locked+0x93
 Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at 
 taskqueue_thread_loop+0x3e
 Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
 Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
 Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 
 0xff85fb2ebbb0, rbp = 0 ---
 Dec 24 16:06:03 chinatsu kernel: No core dumps found.
 Dec 24 16:06:04 chinatsu kernel: lock order reversal:
 Dec 24 16:06:04 chinatsu kernel: 1st 0xff85b9cb8dd8 bufwait (bufwait) @ 
 /usr/src/sys/kern/vfs_bio.c:2677
 Dec 24 16:06:04 chinatsu kernel: 2nd 0xfe00092c5c00 dirhash (dirhash) @ 
 /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
 Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
 Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at 
 db_trace_self_wrapper+0x2a
 Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
 Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
 Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at 
 witness_checkorder+0x844
 Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
 Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at 
 

Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-25 Thread Derek Kulinski
Hello Andriy,

Tuesday, December 25, 2012, 12:18:23 AM, you wrote:


 I recompiled the kernel and am running with the options you specified (I
 enabled DEBUG in the file).
 
 Anyway, even at boot time I started getting the following warnings. Is this
 anything to worry about?

 These witness warnings are OK-ish.
 Watch for panics.

Sadly (I guess :) I did not have a crash today (I don't even see a
warning around that time). I'll update when it happens. I will also try to
run that task today and see if it triggers anything.

 BTW, I should have said this earlier.  Whatever the kind of corruption, it
 would be much worse if it got propagated to stable storage,
 especially if it ended up in any kind of pool metadata.

 So, your data is at great risk now.
 Please also take measures to back it up, preferably by using a different
 system.

I'm backing it up using zfs send since dump doesn't appear to work on
ZFS.
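
A rough sketch of that kind of backup, written as a small Python wrapper around the usual snapshot / send / receive pipeline. The pool name `tank`, the snapshot name, the host `backuphost` and the target dataset `backup/chinatsu` are all hypothetical placeholders. Only standard `zfs snapshot -r`, `zfs send -R` and `zfs receive -d` invocations are assumed.

```python
import subprocess

def backup_commands(pool, snapshot, remote_host, remote_dataset):
    """Build the three command lines; nothing is executed here."""
    snap = f"{pool}@{snapshot}"
    take = ["zfs", "snapshot", "-r", snap]   # recursive snapshot of the pool
    send = ["zfs", "send", "-R", snap]       # replication stream of that snapshot
    recv = ["ssh", remote_host, "zfs", "receive", "-d", remote_dataset]
    return take, send, recv

def run_backup(pool, snapshot, remote_host, remote_dataset):
    """Run snapshot, then pipe 'zfs send' into 'zfs receive' on another machine."""
    take, send, recv = backup_commands(pool, snapshot, remote_host, remote_dataset)
    subprocess.run(take, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.run(recv, stdin=sender.stdout)
    sender.stdout.close()
    if sender.wait() != 0 or receiver.returncode != 0:
        raise RuntimeError("zfs send/receive failed")

# Show what would be run, using placeholder names.
for cmd in backup_commands("tank", "pre-debug", "backuphost", "backup/chinatsu"):
    print(" ".join(cmd))
```

Receiving on a different machine follows the advice to keep the copy off the system whose memory is suspect.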

Anyway, zpool scrub does not show any problems... I performed my last
scrub when I started this thread.

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

Hand over the calculator, friends don't let friends derive drunk

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Andriy Gapon
on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

So do you have the crash dump(s)?

-- 
Andriy Gapon


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Andriy,

Monday, December 24, 2012, 8:01:26 AM, you wrote:

 on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

 So do you have the crash dump(s)?

Yes, but they are 3.5GB each. I attached the text dump to GNATS but I can
resend it to you (I don't know if it's ok to send attachments to the
mailing list). If you would prefer, I could give you access to the
box.

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

-- Programmer - A red-eyed, mumbling mammal capable of conversing with 
inanimate objects.



Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Mark Linimon
On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
 Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
 resend it to you

We have a limit of 500K on GNATS PRs.  For something that huge, a PR
database is really not the right place for it -- please post the dumps
somewhere and include a URL to them in a followup to the PR.

Thanks.

Mark Linimon, on behalf of bugmeister


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Mark,

Monday, December 24, 2012, 12:46:53 PM, you wrote:

 On Mon, Dec 24, 2012 at 10:17:19AM -0800, Derek Kulinski wrote:
 Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
 resend it to you

 We have a limit of 500K on GNATS PRs.  For something that huge, a PR
 database is really not the right place for it -- please post the dumps
 somewhere and include a URL to them in a followup to the PR.

 Thanks.

 Mark Linimon, on behalf of bugmeister

I included the text dump, but I do not see it when I visit the web
interface so I don't know if it was attached there or not.

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk

My new car runs at 56Kbps



Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Andriy Gapon
on 24/12/2012 20:17 Derek Kulinski said the following:
 Hello Andriy,
 
 Monday, December 24, 2012, 8:01:26 AM, you wrote:
 
 on 24/12/2012 00:23 Derek Kulinski said the following:
 Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
 
 So do you have the crash dump(s)?
 
 Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
 resend it to you (I don't know if it's ok to send attachments to the
 mailing list). If you would prefer I could give you access to the
 box.

Derek,

I've looked through the cores and it does look like in all cases some sort of
memory corruption is a precursor to a subsequent crash.

I can't decidedly say if the corruptions are caused by the hardware, by some
code overwriting random memory locations (rogue driver) or by a simpler bug
like use after free.

I am always inclined to suspect the hardware first.

You can try to reproduce the problem with some additional checks enabled in the
kernel.  Those should catch the problem earlier and thus make its source clearer.

I recommend the following:
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_MEMGUARD
makeoptions DEBUG+=-DDEBUG

The last is really needed only for the ZFS and OpenSolaris compat code.  It may
result in some extra noise from unrelated subsystems.
Perhaps you could just add #define DEBUG to
sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
approach though.

Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.

Please note that these options will make your system significantly slower.
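
Put together, the settings above could look like the fragment below. This is only a sketch: the config name CHINATSU-DEBUG is hypothetical, and it assumes the existing custom kernel config (CHINATSU, per the uname output in this thread) is pulled in with an include line.

```
# /usr/src/sys/amd64/conf/CHINATSU-DEBUG   (hypothetical file name)
include		CHINATSU
ident		CHINATSU-DEBUG

options 	INVARIANTS
options 	INVARIANT_SUPPORT
options 	WITNESS
options 	DEBUG_MEMGUARD
makeoptions	DEBUG+=-DDEBUG

# /boot/loader.conf
vm.memguard.desc="arc_buf_hdr_t"

# Build and install (run from /usr/src):
#   make buildkernel KERNCONF=CHINATSU-DEBUG
#   make installkernel KERNCONF=CHINATSU-DEBUG
```

Keeping the debug options in a separate config file makes it easy to go back to the normal kernel once the problem has been caught.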

-- 
Andriy Gapon


Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-24 Thread Derek Kulinski
Hello Andriy,

Monday, December 24, 2012, 3:28:00 PM, you wrote:

 I've looked through the cores and it does look like in all cases some sort of
 memory corruption is a precursor to a subsequent crash.

 I can't decidedly say if the corruptions are caused by the hardware, by some
 code overwriting random memory locations (rogue driver) or by a simpler bug
 like use after free.

 I am always inclined to suspect the hardware first.

 You can try to reproduce the problem with some additional checks enabled in the
 kernel.  Those should catch the problem earlier and thus make its source clearer.

 I recommend the following:
 options INVARIANTS
 options INVARIANT_SUPPORT
 options WITNESS
 options DEBUG_MEMGUARD
 makeoptions DEBUG+=-DDEBUG

 The last is really needed only for the ZFS and OpenSolaris compat code.  It may
 result in some extra noise from unrelated subsystems.
 Perhaps you could just add #define DEBUG to
 sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
 approach though.

 Also, please put vm.memguard.desc=arc_buf_hdr_t into loader.conf.

 Please note that these options will make your system significantly slower.

I recompiled the kernel and am running with the options you specified (I
enabled DEBUG in the file).

Anyway, even at boot time I started getting the following warnings. Is this
anything to worry about?

Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
Dec 24 16:06:03 chinatsu kernel: lock order reversal:
Dec 24 16:06:03 chinatsu kernel: 1st 0x80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
Dec 24 16:06:03 chinatsu kernel: .
Dec 24 16:06:03 chinatsu kernel: 2nd 0xfe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xff85fb2ebbb0, rbp = 0 ---
Dec 24 16:06:03 chinatsu kernel: No core dumps found.
Dec 24 16:06:04 chinatsu kernel: lock order reversal:
Dec 24 16:06:04 chinatsu kernel: 1st 0xff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
Dec 24 16:06:04 chinatsu kernel: 2nd 0xfe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x33
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove() at
Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove+0x16
Dec 24 16:06:04 chinatsu kernel: ufs_dirremove() at ufs_dirremove+0x1bb
Dec 24 16:06:04 chinatsu kernel: ufs_remove() at ufs_remove+0x92
Dec 24 16:06:04 chinatsu kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xb7
Dec 24 16:06:04 chinatsu kernel: kern_unlinkat() at kern_unlinkat+0x2eb
Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
Dec 24 16:06:04 chinatsu 

FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines

2012-12-23 Thread Derek Kulinski
Please help. I reported this issue at
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/174372 but the crashes
are unbearable since they happen regularly at night, most of the time
when periodic.daily is called (3am), though there are exceptions. It seems
like they can be triggered by any heavy disk activity. In many of the
crash dumps the current process is the find command, but of course that
does not mean it is always the cause.

When I run a scrub, it appears to complete successfully. smartmontools is
not reporting any disk errors. I tested the memory using memtest86 about a
month ago and it passed the tests successfully.

I never had this type of issue on 9.0, and not much changed in my
kernel config besides adding a WiFi card.

System:
FreeBSD chinatsu.takeda.tk 9.1-RELEASE FreeBSD 9.1-RELEASE #2 r244482: Wed Dec 19 23:28:15 PST 2012 r...@chinatsu.takeda.tk:/usr/obj/usr/src/sys/CHINATSU  amd64
It's compiled from the releng/9.1 branch.

Thank you for any help,
Derek


Example crash messages:

Today:
panic: general protection fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
trap_fatal() at trap_fatal+0x285
trap() at trap+0x23e
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0x8101f5ac, rsp = 0xff8230ac18c0, rbp = 0xff8230ac18d0 ---
list_prev() at list_prev+0xc
arc_evict() at arc_evict+0x194
arc_adjust() at arc_adjust+0x1a1
arc_reclaim_thread() at arc_reclaim_thread+0x1a6
fork_exit() at fork_exit+0x11c
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8230ac1bb0, rbp = 0 ---
Uptime: 20h44m51s
Dumping 1157 out of 8072 MB:..2%..12%..21%..31%..41%..52%..61%..71%..81%..92%


Yesterday:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xffe8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x8102d1e7
stack pointer           = 0x28:0xff82315ec190
frame pointer           = 0x28:0xff82315ec280
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 34744 (find)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
trap_fatal() at trap_fatal+0x285
trap_pfault() at trap_pfault+0x216
trap() at trap+0x363
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x8102d1e7, rsp = 0xff82315ec190, rbp = 0xff82315ec280 ---
arc_evict() at arc_evict+0x1e7
arc_get_data_buf() at arc_get_data_buf+0x1d5
arc_read_nolock() at arc_read_nolock+0x1ec
arc_read() at arc_read+0x93
dbuf_read() at dbuf_read+0x452
dmu_buf_hold() at dmu_buf_hold+0xe0
zap_lockdir() at zap_lockdir+0x58
zap_cursor_retrieve() at zap_cursor_retrieve+0x19b
zfs_freebsd_readdir() at zfs_freebsd_readdir+0x2ee
VOP_READDIR_APV() at VOP_READDIR_APV+0x4a
kern_getdirentries() at kern_getdirentries+0x225
sys_getdirentries() at sys_getdirentries+0x23
amd64_syscall() at amd64_syscall+0x5d8
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (196, FreeBSD ELF64, sys_getdirentries), rip = 0x80089b29c, rsp = 0x7fffd8a8, rbp = 0x1 ---
Uptime: 2d3h49m3s
Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
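
To compare backtraces like the two above, a small Python sketch can pull out the frames that follow the first "--- trap" marker (the code that was running when the fault hit) and intersect them. The helper name and regular expression are illustrative only. The "yesterday" sample is abbreviated, and the wrapped trap lines are re-joined.

```python
import re

# A frame line looks like "arc_evict() at arc_evict+0x194".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+$")

def frames_after_trap(backtrace):
    """Return function names between the first '--- trap' marker and the next '---' marker."""
    frames = []
    seen_trap = False
    for line in backtrace.splitlines():
        line = line.strip()
        if line.startswith("--- "):
            if seen_trap:
                break  # reached the end of the faulting call chain
            seen_trap = line.startswith("--- trap")
            continue
        m = FRAME_RE.match(line)
        if seen_trap and m:
            frames.append(m.group(1))
    return frames

TODAY = """\
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0x8101f5ac, rsp = 0xff8230ac18c0, rbp = 0xff8230ac18d0 ---
list_prev() at list_prev+0xc
arc_evict() at arc_evict+0x194
arc_adjust() at arc_adjust+0x1a1
arc_reclaim_thread() at arc_reclaim_thread+0x1a6
fork_exit() at fork_exit+0x11c
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8230ac1bb0, rbp = 0 ---
"""

# Abbreviated: only the first three frames after the trap are reproduced here.
YESTERDAY = """\
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x8102d1e7, rsp = 0xff82315ec190, rbp = 0xff82315ec280 ---
arc_evict() at arc_evict+0x1e7
arc_get_data_buf() at arc_get_data_buf+0x1d5
arc_read_nolock() at arc_read_nolock+0x1ec
--- syscall (196, FreeBSD ELF64, sys_getdirentries), rip = 0x80089b29c, rsp = 0x7fffd8a8, rbp = 0x1 ---
"""

common = set(frames_after_trap(TODAY)) & set(frames_after_trap(YESTERDAY))
print(sorted(common))
```

For these two samples the only function the faulting call chains share is arc_evict. That fits the subject line, but a shared frame alone does not say whether ZFS is the cause or just the code that touched the bad memory.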

  

-- 
Best regards,
 Derek  mailto:tak...@takeda.tk
