Re: zfs recv panic

2017-05-16 Thread Kristof Provost

On 16 May 2017, at 19:58, Andriy Gapon wrote:
> On 16/05/2017 16:49, Kristof Provost wrote:
>> On 16 May 2017, at 15:41, Andriy Gapon wrote:
>>> On 10/05/2017 12:37, Kristof Provost wrote:
>>>> I have a reproducible panic on CURRENT (r318136) doing
>>>> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
>>>> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
>>>>
>>>> For clarity, the receiving machine is CURRENT r318136, the sending machine is
>>>> running a somewhat older CURRENT version.
>>>>
>>>> The receiving machine panics a few seconds in:
>>>>
>>>> receiving full stream of zroot/var@before-kernel-2017-04-03 into
>>>> tank/jupiter/var@before-kernel-2017-04-03
>>>> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
>>>> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
>>>> line: 2007
>>>
>>> could you please try to revert commits related to the compressed send and see if
>>> that helps?  I assume that the sending machine does not have (does not use) the
>>> feature while the target machine is capable of the feature.
>>>
>>> The commits are: r317648 and r317414.  Not that I really suspect that change,
>>> but just to eliminate the possibility.
>>
>> Those commits appear to be the trigger.
>> I’ve not changed the sender, but with those reverted I don’t see the panic any
>> more.
>
> Thank you for testing.
> Do you still have the old kernel / module and the crash dump?
> It would be interesting to poke around in frame 14.

This contains the kernel and crash files:
https://www.sigsegv.be/files/zfs_recv_kernel_crash.tar.bz2

I was running r318356 at the time of this panic.

Regards,
Kristof
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: zfs recv panic

2017-05-16 Thread Andriy Gapon
On 16/05/2017 16:49, Kristof Provost wrote:
> On 16 May 2017, at 15:41, Andriy Gapon wrote:
>> On 10/05/2017 12:37, Kristof Provost wrote:
>>> I have a reproducible panic on CURRENT (r318136) doing
>>> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
>>> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
>>>
>>> For clarity, the receiving machine is CURRENT r318136, the sending machine is
>>> running a somewhat older CURRENT version.
>>>
>>> The receiving machine panics a few seconds in:
>>>
>>> receiving full stream of zroot/var@before-kernel-2017-04-03 into
>>> tank/jupiter/var@before-kernel-2017-04-03
>>> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
>>> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
>>> line: 2007
>>
>> could you please try to revert commits related to the compressed send and see if
>> that helps?  I assume that the sending machine does not have (does not use) the
>> feature while the target machine is capable of the feature.
>>
>> The commits are: r317648 and r317414.  Not that I really suspect that change,
>> but just to eliminate the possibility.
> 
> Those commits appear to be the trigger.
> I’ve not changed the sender, but with those reverted I don’t see the panic any
> more.

Thank you for testing.
Do you still have the old kernel / module and the crash dump?
It would be interesting to poke around in frame 14.


-- 
Andriy Gapon

Re: zfs recv panic

2017-05-16 Thread Kristof Provost

On 16 May 2017, at 15:41, Andriy Gapon wrote:
> On 10/05/2017 12:37, Kristof Provost wrote:
>> I have a reproducible panic on CURRENT (r318136) doing
>> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
>> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
>>
>> For clarity, the receiving machine is CURRENT r318136, the sending machine is
>> running a somewhat older CURRENT version.
>>
>> The receiving machine panics a few seconds in:
>>
>> receiving full stream of zroot/var@before-kernel-2017-04-03 into
>> tank/jupiter/var@before-kernel-2017-04-03
>> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
>> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
>> line: 2007
>
> could you please try to revert commits related to the compressed send and see if
> that helps?  I assume that the sending machine does not have (does not use) the
> feature while the target machine is capable of the feature.
>
> The commits are: r317648 and r317414.  Not that I really suspect that change,
> but just to eliminate the possibility.

Those commits appear to be the trigger.
I’ve not changed the sender, but with those reverted I don’t see the panic any more.


Regards,
Kristof

Re: zfs recv panic

2017-05-16 Thread Andriy Gapon
On 10/05/2017 12:37, Kristof Provost wrote:
> Hi,
> 
> I have a reproducible panic on CURRENT (r318136) doing
> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
> 
> For clarity, the receiving machine is CURRENT r318136, the sending machine is
> running a somewhat older CURRENT version.
> 
> The receiving machine panics a few seconds in:
> 
> receiving full stream of zroot/var@before-kernel-2017-04-03 into
> tank/jupiter/var@before-kernel-2017-04-03
> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
> line: 2007

Kristof,

could you please try to revert commits related to the compressed send and see if
that helps?  I assume that the sending machine does not have (does not use) the
feature while the target machine is capable of the feature.

The commits are: r317648 and r317414.  Not that I really suspect that change,
but just to eliminate the possibility.
Thank you.
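
To make the failed check concrete: in `(0x0 == 0x1)` the left value belongs to `dbuf_is_metadata(db)` and the right one to `arc_is_metadata(buf)`, so the dbuf being filled was classified as ordinary data while the ARC buffer handed to it was tagged as metadata. The toy model below illustrates only that invariant, as a sketch. All names are hypothetical, the dbuf-side rule (indirect blocks, i.e. level > 0, or blocks of a metadata object type count as metadata) is an assumption about ZFS internals rather than code taken from dbuf.c, and it makes no claim about where the mismatched tag comes from in the compressed-send receive path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a dbuf. */
struct toy_dbuf {
	int level;                /* assumed: 0 = data block, > 0 = indirect block */
	bool object_type_is_meta; /* assumed: derived from the owning object's type */
};

/* Hypothetical stand-in for an ARC buffer. */
struct toy_arcbuf {
	bool is_metadata;         /* tag chosen by whoever allocated the buffer */
};

/* Assumed classification rule for the dbuf side of the check. */
static bool
toy_dbuf_is_metadata(const struct toy_dbuf *db)
{
	return (db->level > 0 || db->object_type_is_meta);
}

/* The buffer side simply reports the tag it was allocated with. */
static bool
toy_arc_is_metadata(const struct toy_arcbuf *buf)
{
	return (buf->is_metadata);
}

/* The invariant the failing assertion enforces before attaching the buffer. */
static bool
toy_can_assign(const struct toy_dbuf *db, const struct toy_arcbuf *buf)
{
	return (toy_dbuf_is_metadata(db) == toy_arc_is_metadata(buf));
}
```

With `level = 0`, a non-metadata object type, and a buffer tagged as metadata, `toy_can_assign()` returns false, which corresponds to the reported `(0x0 == 0x1)`; pairing the same dbuf with a data-tagged buffer satisfies the check.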

> cpuid = 0
> time = 1494408122
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0120cad930
> vpanic() at vpanic+0x19c/frame 0xfe0120cad9b0
> panic() at panic+0x43/frame 0xfe0120cada10
> assfail3() at assfail3+0x2c/frame 0xfe0120cada30
> dbuf_assign_arcbuf() at dbuf_assign_arcbuf+0xf2/frame 0xfe0120cada80
> dmu_assign_arcbuf() at dmu_assign_arcbuf+0x170/frame 0xfe0120cadad0
> receive_writer_thread() at receive_writer_thread+0x6ac/frame 0xfe0120cadb70
> fork_exit() at fork_exit+0x84/frame 0xfe0120cadbb0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe0120cadbb0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 7 tid 100672 ]
> Stopped at  kdb_enter+0x3b: movq $0,kdb_why
> db>
> 
> 
> kgdb backtrace:
> #0  doadump (textdump=0) at pcpu.h:232
> #1  0x803a208b in db_dump (dummy=<optimized out>, dummy2=<optimized out>,
>     dummy3=<optimized out>, dummy4=<optimized out>)
>     at /usr/src/sys/ddb/db_command.c:546
> #2  0x803a1e7f in db_command (cmd_table=<optimized out>)
>     at /usr/src/sys/ddb/db_command.c:453
> #3  0x803a1bb4 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:506
> #4  0x803a4c7f in db_trap (type=<optimized out>, code=<optimized out>)
>     at /usr/src/sys/ddb/db_main.c:248
> #5  0x80a93cb3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:654
> #6  0x80ed3de6 in trap (frame=0xfe0120cad860)
>     at /usr/src/sys/amd64/amd64/trap.c:537
> #7  0x80eb62f1 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:236
> #8  0x80a933eb in kdb_enter (why=0x8143d8f5 "panic", msg=<optimized out>)
>     at cpufunc.h:63
> #9  0x80a51cf9 in vpanic (fmt=<optimized out>, ap=0xfe0120cad9f0)
>     at /usr/src/sys/kern/kern_shutdown.c:772
> #10 0x80a51d63 in panic (fmt=<optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:710
> #11 0x8262b26c in assfail3 (a=<optimized out>, lv=<optimized out>,
>     op=<optimized out>, rv=<optimized out>, f=<optimized out>, l=<optimized out>)
>     at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #12 0x822ad892 in dbuf_assign_arcbuf (db=0xf8008f23e560,
>     buf=0xf8008f09fcc0, tx=0xf8008a8d5200)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:2007
> #13 0x822b87f0 in dmu_assign_arcbuf (handle=<optimized out>, offset=0,
>     buf=0xf8008f09fcc0, tx=0xf8008a8d5200)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1542
> #14 0x822bf7fc in receive_writer_thread (arg=0xfe0120a1d168)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2284
> #15 0x80a13704 in fork_exit (callout=0x822bf150, arg=0xfe0120a1d168,
>     frame=0xfe0120cadbc0) at /usr/src/sys/kern/kern_fork.c:1038
> #16 0x80eb682e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:611
> #17 0x in ?? ()
> 
> Let me know if there’s any other information I can provide, or things I can 
> test.
> Fortunately the target machine is not a production machine, so I can panic it 
> as
> often as required.

-- 
Andriy Gapon

zfs recv panic

2017-05-10 Thread Kristof Provost

Hi,

I have a reproducible panic on CURRENT (r318136) doing
(jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
(dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var

For clarity, the receiving machine is CURRENT r318136, the sending 
machine is running a somewhat older CURRENT version.


The receiving machine panics a few seconds in:

receiving full stream of zroot/var@before-kernel-2017-04-03 into
tank/jupiter/var@before-kernel-2017-04-03
panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 == 0x1),
file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c, line: 2007

cpuid = 0
time = 1494408122
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0120cad930
vpanic() at vpanic+0x19c/frame 0xfe0120cad9b0
panic() at panic+0x43/frame 0xfe0120cada10
assfail3() at assfail3+0x2c/frame 0xfe0120cada30
dbuf_assign_arcbuf() at dbuf_assign_arcbuf+0xf2/frame 0xfe0120cada80
dmu_assign_arcbuf() at dmu_assign_arcbuf+0x170/frame 0xfe0120cadad0
receive_writer_thread() at receive_writer_thread+0x6ac/frame 0xfe0120cadb70
fork_exit() at fork_exit+0x84/frame 0xfe0120cadbb0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0120cadbb0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 7 tid 100672 ]
Stopped at  kdb_enter+0x3b: movq $0,kdb_why
db>


kgdb backtrace:
#0  doadump (textdump=0) at pcpu.h:232
#1  0x803a208b in db_dump (dummy=<optimized out>, dummy2=<optimized out>,
    dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:546
#2  0x803a1e7f in db_command (cmd_table=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:453
#3  0x803a1bb4 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:506
#4  0x803a4c7f in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src/sys/ddb/db_main.c:248
#5  0x80a93cb3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0x80ed3de6 in trap (frame=0xfe0120cad860)
    at /usr/src/sys/amd64/amd64/trap.c:537
#7  0x80eb62f1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0x80a933eb in kdb_enter (why=0x8143d8f5 "panic", msg=<optimized out>)
    at cpufunc.h:63
#9  0x80a51cf9 in vpanic (fmt=<optimized out>, ap=0xfe0120cad9f0)
    at /usr/src/sys/kern/kern_shutdown.c:772
#10 0x80a51d63 in panic (fmt=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:710
#11 0x8262b26c in assfail3 (a=<optimized out>, lv=<optimized out>,
    op=<optimized out>, rv=<optimized out>, f=<optimized out>, l=<optimized out>)
    at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#12 0x822ad892 in dbuf_assign_arcbuf (db=0xf8008f23e560,
    buf=0xf8008f09fcc0, tx=0xf8008a8d5200)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:2007
#13 0x822b87f0 in dmu_assign_arcbuf (handle=<optimized out>, offset=0,
    buf=0xf8008f09fcc0, tx=0xf8008a8d5200)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1542
#14 0x822bf7fc in receive_writer_thread (arg=0xfe0120a1d168)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2284
#15 0x80a13704 in fork_exit (callout=0x822bf150, arg=0xfe0120a1d168,
    frame=0xfe0120cadbc0) at /usr/src/sys/kern/kern_fork.c:1038
#16 0x80eb682e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#17 0x in ?? ()
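
The panic string has a fixed shape: the stringified expression, then the left and right values in hex, then file and line. Frame #11 shows `assfail3()` receiving exactly those pieces (`a`, `lv`, `op`, `rv`, `f`, `l`). What follows is a minimal userland sketch of how an `ASSERT3U`-style check could produce that text. `DEMO_ASSERT3U_EQ`, `demo_assert3u_eq`, `demo_assfail3`, `demo_msg`, and the two toy classifiers are hypothetical names, the format string is inferred from the panic text rather than copied from `opensolaris_cmn_err.c`, and the message is written to a buffer instead of calling `panic()`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: holds the last failure message (the kernel panics instead). */
static char demo_msg[512];

/*
 * Hypothetical stand-in for assfail3(): same argument list as frame #11,
 * with a format assumed to match the panic text in the report.
 */
static void
demo_assfail3(const char *a, uintmax_t lv, const char *op, uintmax_t rv,
    const char *f, int l)
{
	snprintf(demo_msg, sizeof(demo_msg),
	    "solaris assert: %s (0x%jx %s 0x%jx), file: %s, line: %d",
	    a, lv, op, rv, f, l);
}

/* Each operand is evaluated once by the caller; report both on mismatch. */
static int
demo_assert3u_eq(uintmax_t lv, uintmax_t rv, const char *expr,
    const char *file, int line)
{
	if (lv == rv)
		return (1);
	demo_assfail3(expr, lv, "==", rv, file, line);
	return (0);
}

/* Hypothetical ASSERT3U(LEFT, ==, RIGHT)-style macro: stringify both sides. */
#define DEMO_ASSERT3U_EQ(LEFT, RIGHT)					\
	demo_assert3u_eq((uintmax_t)(LEFT), (uintmax_t)(RIGHT),		\
	    #LEFT " == " #RIGHT, __FILE__, __LINE__)

/* Toy classifiers returning the two values reported in the panic. */
static int demo_dbuf_is_metadata(void) { return (0); }
static int demo_arc_is_metadata(void) { return (1); }
```

Evaluating `DEMO_ASSERT3U_EQ(demo_dbuf_is_metadata(), demo_arc_is_metadata())` (for example from `main()`) returns 0 and leaves `solaris assert: demo_dbuf_is_metadata() == demo_arc_is_metadata() (0x0 == 0x1), file: ..., line: ...` in `demo_msg`. Read the same way, the real message attributes `0x0` to `dbuf_is_metadata(db)` and `0x1` to `arc_is_metadata(buf)`.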

Let me know if there’s any other information I can provide, or things 
I can test.
Fortunately the target machine is not a production machine, so I can 
panic it as often as required.


Regards,
Kristof