Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
>
> #11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
> bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
> bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0x8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xf80de4654000
> (kgdb) p bucket->ub_bucket   
> $3 = 0xf801c7fd5218
>
> (kgdb)
>
Not sure if it's a coincidence or not, but previously I was running with
the ARC limited to ~30G of the 64G of RAM on the box.  I removed that
limit a few weeks ago after upgrading the box to RELENG_12 to pull in
the OpenSSL changes.  The panic seems to happen under disk load.  I have
3 ZFS pools that are pretty busy receiving snapshots.  One day a week, we
write a full set to a 4th ZFS pool on some geli-attached drives connected
via USB for offsite cold storage.  The crashes happened with that extra
level of disk work.  gstat shows most of the 12 drives on the 2 mrsas
controllers at or close to 100% busy during the 18 hours it takes to dump
out the files.

Trying a new cold storage run now with the ARC limit set back to
vfs.zfs.arc_max=29334498304
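
For readers who want the byte value above in more familiar units, here is a
minimal sketch that converts it. The only inputs are the arc_max value from
this message and the 65398 MB total from the "Dumping ... out of 65398 MB"
line in the panic output; treating that total as MiB is an assumption, and
the helper names are made up for illustration.

```python
# Convert the ARC cap quoted in the message into GiB, GB and a share of RAM.
ARC_MAX_BYTES = 29334498304  # vfs.zfs.arc_max value from the message
RAM_MIB = 65398              # from the dump line; assumed to be MiB


def to_gib(n_bytes):
    """Bytes to binary gigabytes (2**30 bytes)."""
    return n_bytes / 2**30


def to_gb(n_bytes):
    """Bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


arc_gib = to_gib(ARC_MAX_BYTES)
arc_gb = to_gb(ARC_MAX_BYTES)
ram_share = (ARC_MAX_BYTES / 2**20) / RAM_MIB

print(f"{arc_gib:.2f} GiB, {arc_gb:.2f} GB, {ram_share:.1%} of RAM")
```

Under those assumptions the cap works out to roughly 27.3 GiB (29.3 GB
decimal), a bit under half of the memory in the box, which fits the "~30G
of the 64G" description.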

    ---Mike



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:07 AM, Mark Johnston wrote:
>
> Could you go to frame 11 and print zone->uz_name and
> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
> somehow.

Thank you for looking!

(kgdb) frame 11

#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
bucket->ub_cnt);
(kgdb) p zone->uz_name
$1 = 0x8102118a "mbuf_jumbo_9k"
(kgdb) p bucket->ub_bucket[18]
$2 = (void *) 0xf80de4654000
(kgdb) p bucket->ub_bucket   
$3 = 0xf801c7fd5218

(kgdb)
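
The kgdb values above can be cross-checked with a little pointer arithmetic.
This is only a sketch: it assumes 8-byte pointers (amd64, as the source paths
in the trace suggest) and 4 KiB pages, and it uses the addresses exactly as
they appear in this archive. It checks internal consistency of the printed
values and says nothing conclusive about whether the item pointer is valid.

```python
POINTER_SIZE = 8   # bytes; assumption: amd64
PAGE_SIZE = 4096   # bytes; assumption: 4 KiB base pages

bucket = 0xf801c7fd5200     # bucket argument shown in frame 11
ub_bucket = 0xf801c7fd5218  # $3: start of the bucket->ub_bucket array
item = 0xf80de4654000       # $2: bucket->ub_bucket[18]
index = 18                  # slot asked about in the message

# Offset of the pointer array from the start of the bucket structure.
array_offset = ub_bucket - bucket

# Address of the slot that holds the item pointer.
slot_addr = ub_bucket + index * POINTER_SIZE

# Whether the stored item pointer sits on a page boundary.
item_page_aligned = item % PAGE_SIZE == 0

print(hex(array_offset), hex(slot_addr), item_page_aligned)
```

In kgdb itself, `p &bucket->ub_bucket[18]` should print the same slot
address, which gives a quick way to confirm the arithmetic against the dump.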

    ---Mike



Re: zfs panic RELENG_12

2020-12-22 Thread Mark Johnston
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware, as it seems different?
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x0
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer   = 0x28:0xfe00bc0f8540
> frame pointer   = 0x28:0xfe00bc0f8590
> code segment    = base 0x0, limit 0xf, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and
bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
somehow.


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
Hmmm, another one. Not sure if this is hardware, as it seems different?



Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x0
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x80ca0826
stack pointer   = 0x28:0xfe00bc0f8540
frame pointer   = 0x28:0xfe00bc0f8590
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 33 (dom0)
trap number = 12
panic: page fault
cpuid = 11
time = 1608641071
KDB: stack backtrace:
#0 0x80a3fe85 at kdb_backtrace+0x65
#1 0x809f406b at vpanic+0x17b
#2 0x809f3ee3 at panic+0x43
#3 0x80e3fe71 at trap_fatal+0x391
#4 0x80e3fecf at trap_pfault+0x4f
#5 0x80e3f516 at trap+0x286
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
#8 0x80c9e302 at zone_drain_wait+0xa2
#9 0x80ca2bbd at uma_reclaim_locked+0x6d
#10 0x80ca2af4 at uma_reclaim+0x34
#11 0x80cc5321 at vm_pageout_worker+0x421
#12 0x80cc4ee3 at vm_pageout+0x193
#13 0x809b55be at fork_exit+0x7e
#14 0x80e1a34e at fork_trampoline+0xe
Uptime: 5d20h37m16s
Dumping 16057 out of 65398
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
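
The "time =" and "Uptime:" lines above can be turned into wall-clock
timestamps, which helps when lining a panic up against backup schedules or
other logs. A minimal sketch, assuming "time =" is seconds since the Unix
epoch and the uptime string always uses the d/h/m/s form shown here;
parse_uptime is a hypothetical helper, not part of any FreeBSD tool.

```python
import re
from datetime import datetime, timedelta, timezone

PANIC_EPOCH = 1608641071  # "time =" value from the panic message
UPTIME = "5d20h37m16s"    # "Uptime:" value from the panic message


def parse_uptime(text):
    """Parse a string such as '5d20h37m16s' into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None:
        raise ValueError(f"unrecognised uptime string: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


panic_at = datetime.fromtimestamp(PANIC_EPOCH, tz=timezone.utc)
booted_at = panic_at - parse_uptime(UPTIME)

print("panic :", panic_at.isoformat())
print("booted:", booted_at.isoformat())
```

With these inputs the panic lands at 12:44:31 UTC on 2020-12-22 and the
boot at 16:07:15 UTC on 2020-12-16.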

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x80ca0826 in slab_free_item (keg=0xf800037fa380,
slab=0xf80de4656fb0, item=) at
/usr/src/sys/vm/uma_core.c:3357
#10 zone_release (zone=, bucket=0xf801c7fd5218,
cnt=) at /usr/src/sys/vm/uma_core.c:3404
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
#12 bucket_cache_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:915
#13 0x80c9e302 in zone_drain_wait (zone=0xf800037da000,
waitok=1) at /usr/src/sys/vm/uma_core.c:1037
#14 0x80ca2bbd in zone_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:1056
#15 zone_foreach (zfunc=) at /usr/src/sys/vm/uma_core.c:1985
#16 uma_reclaim_locked (kmem_danger=) at
/usr/src/sys/vm/uma_core.c:3737
#17 0x80ca2af4 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:3757
#18 0x80cc5321 in vm_pageout_lowmem () at
/usr/src/sys/vm/vm_pageout.c:1890
#19 vm_pageout_worker (arg=) at
/usr/src/sys/vm/vm_pageout.c:1966
#20 0x80cc4ee3 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2126
#21 0x809b55be in fork_exit (callout=0x80cc4d50
, arg=0x0, frame=0xfe00bc0f8b00) at
/usr/src/sys/kern/kern_fork.c:1080
#22 
(kgdb) bt full
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    td = 
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
    error = 
    coredump = 
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
    once = 
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
    buf = "page fault", '\000' 
    other_cpus = {__bits = {2047, 0, 0, 0}}
    td = 0xf80004964740
    newpanic = 
    bootopt = 
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
    ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xfe00bc0f82c0, reg_save_area = 0xfe00bc0f8260}}
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
    softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27,
ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
    code = 
    type = 
    ss = 40
    handled = 
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
    td = 0xf80004964740
    p = 
    eva = 0
    map = 
    ftype = 
    rv = 
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
    ksi = {ksi_link = {tqe_next = 

zfs panic RELENG_12

2020-12-15 Thread mike tancsa
Was doing a backup via zfs send | zfs recv when the box panicked.  It's a
not-so-old RELENG_12 box from last week.  Any ideas whether this is a
hardware issue or a bug?  It's r368493 from last Wednesday.  I don't see
any ECC errors logged, so I don't think it's hardware.

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x823a554b
stack pointer   = 0x28:0xfe0343231000
frame pointer   = 0x28:0xfe03432310c0
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 87427 (zfs)
trap number = 12
panic: page fault
cpuid = 1
time = 1608065221
KDB: stack backtrace:
#0 0x80a3fa05 at kdb_backtrace+0x65
#1 0x809f3beb at vpanic+0x17b
#2 0x809f3a63 at panic+0x43
#3 0x80e400d1 at trap_fatal+0x391
#4 0x80e4012f at trap_pfault+0x4f
#5 0x80e3f776 at trap+0x286
#6 0x80e19568 at calltrap+0x8
#7 0x82393a5e at dmu_object_info+0x1e
#8 0x823983a5 at dmu_recv_stream+0x7b5
#9 0x8244b706 at zfs_ioc_recv+0xac6
#10 0x8244dd3d at zfsdev_ioctl+0x62d
#11 0x808a35e0 at devfs_ioctl+0xb0
#12 0x80f3becb at VOP_IOCTL_APV+0x7b
#13 0x80ad1b0a at vn_ioctl+0x16a
#14 0x808a3bce at devfs_ioctl_f+0x1e
#15 0x80a5d807 at kern_ioctl+0x2b7
#16 0x80a5d4aa at sys_ioctl+0xfa
#17 0x80e40c87 at amd64_syscall+0x387
Uptime: 3d14h59m52s
Dumping 17213 out of 65366
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
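
The kgdb frames further down in this message show the ioctl request as a
bare decimal, com=3222821403. A minimal sketch of how that number splits
into its fields, assuming the usual BSD ioctl encoding from sys/ioccom.h
(direction in the top three bits, a 13-bit parameter length, then an 8-bit
group and an 8-bit command number). The constants are written out by hand
here and decode_ioctl is a hypothetical helper.

```python
# Field layout assumed from the BSD ioctl encoding (sys/ioccom.h).
IOCPARM_SHIFT = 13
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1
IOC_VOID = 0x20000000  # no parameters
IOC_OUT = 0x40000000   # parameters copied out to userland
IOC_IN = 0x80000000    # parameters copied in from userland


def decode_ioctl(com):
    """Split an ioctl request number into direction, length, group, number."""
    direction = []
    if com & IOC_VOID:
        direction.append("void")
    if com & IOC_OUT:
        direction.append("out")
    if com & IOC_IN:
        direction.append("in")
    return {
        "direction": "+".join(direction),
        "length": (com >> 16) & IOCPARM_MASK,
        "group": chr((com >> 8) & 0xFF),
        "number": com & 0xFF,
    }


decoded = decode_ioctl(3222821403)
print(hex(3222821403), decoded)
```

The request decodes to a read/write ioctl in group 'Z' with command number
27 and a 24-byte argument, which is at least consistent with a ZFS ioctl
reaching zfs_ioc_recv as the backtrace shows.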

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
    at /usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3805 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f3c43 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3a63 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e400d1 in trap_fatal (frame=0xfe0343230f40, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e4012f in trap_pfault (frame=0xfe0343230f40,
    usermode=, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f776 in trap (frame=0xfe0343230f40)
    at /usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x823a554b in dnode_hold_impl (os=0xf805e1d2b800,
    object=, flag=, slots=,
    tag=, dnp=0xfe03432310d8)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1370
#10 0x82393a5e in dmu_object_info (os=0xf80777890070,
    object=18446744071600721588, doi=0xfe03432312e0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:2615
#11 0x823983a5 in receive_read_record (ra=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2821
#12 dmu_recv_stream (drc=0xfe0343231430, fp=,
    voffp=, cleanup_fd=8, action_handlep=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:3203
#13 0x8244b706 in zfs_ioc_recv (zc=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4745
#14 0x8244dd3d in zfsdev_ioctl (dev=,
    zcmd=, arg=, flag=,
    td=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6956
#15 0x808a35e0 in devfs_ioctl (ap=0xfe0343231778)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:797
#16 0x80f3becb in VOP_IOCTL_APV (
    vop=0x816a2fe0 , a=0xfe0343231778)
    at vnode_if.c:1067
#17 0x80ad1b0a in vn_ioctl (fp=0xf8001802b5a0,
    com=, data=0xfe0343231910,
    active_cred=0xf80032214300, td=0x2070)
    at /usr/src/sys/kern/vfs_vnops.c:1508
#18 0x808a3bce in devfs_ioctl_f (fp=0xf80777890070,
    com=18446744071600721588, data=0x824e34ed <.L.str+1>, cred=0x0,
    td=0xf8029885) at /usr/src/sys/fs/devfs/devfs_vnops.c:755
#19 0x80a5d807 in fo_ioctl (fp=0xf8001802b5a0, com=3222821403,
    data=0x824e34ed <.L.str+1>, active_cred=0x0,
    td=0xf8029885) at /usr/src/sys/sys/file.h:337
#20 kern_ioctl (td=0x2070, fd=, com=3222821403,
    data=0x824e34ed <.L.str+1> "zrl->zr_mtx")
    at /usr/src/sys/kern/sys_generic.c:805
#21 0x80a5d4aa in sys_ioctl (td=0xf8029885,
    uap=0xf802988503c0) at /usr/src/sys/kern/sys_generic.c:713
#22 0x80e40c87 in syscallenter (td=0xf8029885)
    at