Re: ZFS Panics Still

2023-09-11 Thread Cy Schubert
On Tue, 12 Sep 2023 05:29:41 +0100
Graham Perrin  wrote:

> On 12/09/2023 00:17, Cy Schubert wrote:
> 
> > … poudriere …  
> 
> > panic: vm_page_dequeue_deferred: page 0xfe000b7e9748 has unexpected 
> > queue state
> > …  
>  is for arm64. 
> Should we broaden the hardware field, there?

Probably.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: ZFS Panics Still

2023-09-11 Thread Graham Perrin

On 12/09/2023 00:17, Cy Schubert wrote:


… poudriere …



panic: vm_page_dequeue_deferred: page 0xfe000b7e9748 has unexpected queue 
state
…
 is for arm64. 
Should we broaden the hardware field, there?

Re: aarch64 main [so: 15] panic's in kyua's sys/net/if_lagg_test:status_stress [confirmed with snapshot kernel]

2023-09-11 Thread Mark Millard
On Sep 11, 2023, at 19:40, Mark Millard  wrote:

> On Sep 11, 2023, at 01:13, Mark Millard  wrote:
> 
>> It will be some time before I can try this with
>> an official snapshot instead of a personal build.
>> The build is based on b6ce41118bb1 :
>> 
>> # uname -apKU
>> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 150 
>> #17 main-n265279-b6ce41118bb1-dirty: Sun Sep 10 14:36:47 PDT 2023 
>> root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C
>>  arm64 aarch64 150 150
>> 
>> So it was a non-debug build, although I do not
>> strip symbols and such in my builds.
>> 
>> . . .
>> sys/net/if_lagg_test:create  ->  passed  [0.105s]
>> sys/net/if_lagg_test:create_destroy_stress  ->  skipped: Skipping this test 
>> because it easily panics the machine  [0.019s]
>> sys/net/if_lagg_test:lacp_linkstate_destroy_stress  ->  passed  [60.045s]
>> sys/net/if_lagg_test:set_ether  ->  passed  [0.066s]
>> sys/net/if_lagg_test:status_stress  ->  
>> 
>> The core.txt.5 is not great, unfortunately:
>> 
>> panic: vm_fault failed: 0x006b96dc error 1
>> 
>> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
>> . . .
>> Reading symbols from /boot/kernel/kernel...
>> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
>> 
>> Unread portion of the kernel message buffer:
>> (dump_iface + 0x2c0)
>> elr: 0x006b96dc (dump_sa + 0x1c)
>> spsr: 0x00400045
>> far: 0x44572d4338374144
>> esr: 0x9604
>> panic: vm_fault failed: 0x006b96dc error 1
>> cpuid = 2
>> time = 1694414226
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x1a0
>> panic() at panic+0x44
>> data_abort() at data_abort+0x304
>> handle_el1h_sync() at handle_el1h_sync+0x14
>> --- exception, esr 0x9604
>> dump_sa() at dump_sa+0x1c
>> dump_iface() at dump_iface+0x2bc
>> dump_cb() at dump_cb+0x18
>> if_foreach_sleep() at if_foreach_sleep+0x244
>> rtnl_handle_getlink() at rtnl_handle_getlink+0xec
>> rtnl_handle_message() at rtnl_handle_message+0x19c
>> nl_taskqueue_handler() at nl_taskqueue_handler+0x674
>> taskqueue_run_locked() at taskqueue_run_locked+0x194
>> taskqueue_thread_loop() at taskqueue_thread_loop+0xcc
>> fork_exit() at fork_exit+0x88
>> fork_trampoline() at fork_trampoline+0x14
>> KDB: enter: panic
>> 
>> get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
>> 77  __asm __volatile("ldr   %0, [x18]" : "="(td));
>> (kgdb) #0  get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
>> #1  doadump (textdump=0, textdump@entry=4003518992)
>>   at /usr/main-src/sys/kern/kern_shutdown.c:405
>> #2  0x000f7704 in db_dump (dummy=,  
>> dummy2=, dummy3=, dummy4=)
>>   at /usr/main-src/sys/ddb/db_command.c:591
>> #3  0x000f74e0 in db_command (last_cmdp=,  
>> cmd_table=, dopager=true)
>>   at /usr/main-src/sys/ddb/db_command.c:504
>> #4  0x000f71b8 in db_command_loop ()
>>   at /usr/main-src/sys/ddb/db_command.c:551
>> #5  0x000fad9c in db_trap (type=, code=<optimized out>)
>>   at /usr/main-src/sys/ddb/db_main.c:268
>> #6  0x004f4ec4 in kdb_trap (type=60, code=0, tf=)
>>   at /usr/main-src/sys/kern/subr_kdb.c:790
>> #7  
>> #8  
>> #9  
>> #10 
>> #11 
>> #12 
>> #13 
>> #14 
>> #15 
>> #16 
>> #17 
>> #18 
>> #19 
>> #20 
>> #21 
>> #22 
>> Backtrace stopped: Cannot access memory at address 0x10
>> (kgdb) 
>> 
>> 
>> So some transcribing of a picture in order to
>> show register values that were reported:
>> 
>> Fatal data abort:
>>   x0: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>>   x1: 0x0001
>>   x2: 0x44572d4338374143
>>   x3: 0x005d3f90 (ifdead_ioctl + 0x0)
>>   x4: 0xa00b7f0d185e
>>   x5: 0xa0023fe4b992
>>   x6: 0x6767616c
>>   x7: 0x00706174016f7575
>>   x8: 0x01a4
>>   x9: 0x00210005
>>  x10: 0x0800
>>  x11: 0xfefefefefefefeff
>>  x12: 0x0008
>>  x13: 0x
>>  x14: 0x00ff
>>  x15: 0x0700
>>  x16: 0x0008
>>  x17: 0x0007
>>  x18: 0x0001eea0e500 (_DYNAMIC + 0x6d816358)
>>  x19: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>>  x20: 0xa00b7f0d1800
>>  x21: 0xa00b7f0d1858
>>  x22: 0x000c
>>  x23: 0x0005
>>  x24: 0x
>>  x25: 0x00c68000 (sysctl___kern_features_netlink + 0x10)
>>  x26: 0x
>>  x27: 0x00ce9000 (cap_linkat_source_rights + 0x8)
>>  x28: 0x006bb0a0 (dump_cb + 0x0)
>>  x29: 0x0001eea0e520 (_DYNAMIC + 0x6d816378)
>>   sp: 0x0001eea0e500
>>   lr: 0x006b8fe0 (dump_iface + 0x2c0)
>>  elr: 0x006b96dc (dump_sa + 0x1c)
>> spsr: 0x00400045
>>  far: 0x44572d4338374144
>>  esr: 0x9604
>> panic: vm_fault failed: 0x006b96dc error 1
>> 
>> I expect that this is similar to reports I'd made
>> back in 

Re: aarch64 main [so: 15] panic's in kyua's sys/net/if_lagg_test:status_stress [confirmed with snapshot kernel]

2023-09-11 Thread Mark Millard
On Sep 11, 2023, at 01:13, Mark Millard  wrote:

> It will be some time before I can try this with
> an official snapshot instead of a personal build.
> The build is based on b6ce41118bb1 :
> 
> # uname -apKU
> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 150 #17 
> main-n265279-b6ce41118bb1-dirty: Sun Sep 10 14:36:47 PDT 2023 
> root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C
>  arm64 aarch64 150 150
> 
> So it was a non-debug build, although I do not
> strip symbols and such in my builds.
> 
> . . .
> sys/net/if_lagg_test:create  ->  passed  [0.105s]
> sys/net/if_lagg_test:create_destroy_stress  ->  skipped: Skipping this test 
> because it easily panics the machine  [0.019s]
> sys/net/if_lagg_test:lacp_linkstate_destroy_stress  ->  passed  [60.045s]
> sys/net/if_lagg_test:set_ether  ->  passed  [0.066s]
> sys/net/if_lagg_test:status_stress  ->  
> 
> The core.txt.5 is not great, unfortunately:
> 
> panic: vm_fault failed: 0x006b96dc error 1
> 
> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
> . . .
> Reading symbols from /boot/kernel/kernel...
> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
> 
> Unread portion of the kernel message buffer:
> (dump_iface + 0x2c0)
> elr: 0x006b96dc (dump_sa + 0x1c)
> spsr: 0x00400045
> far: 0x44572d4338374144
> esr: 0x9604
> panic: vm_fault failed: 0x006b96dc error 1
> cpuid = 2
> time = 1694414226
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x1a0
> panic() at panic+0x44
> data_abort() at data_abort+0x304
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x9604
> dump_sa() at dump_sa+0x1c
> dump_iface() at dump_iface+0x2bc
> dump_cb() at dump_cb+0x18
> if_foreach_sleep() at if_foreach_sleep+0x244
> rtnl_handle_getlink() at rtnl_handle_getlink+0xec
> rtnl_handle_message() at rtnl_handle_message+0x19c
> nl_taskqueue_handler() at nl_taskqueue_handler+0x674
> taskqueue_run_locked() at taskqueue_run_locked+0x194
> taskqueue_thread_loop() at taskqueue_thread_loop+0xcc
> fork_exit() at fork_exit+0x88
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> 
> get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
> 77  __asm __volatile("ldr   %0, [x18]" : "="(td));
> (kgdb) #0  get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
> #1  doadump (textdump=0, textdump@entry=4003518992)
>   at /usr/main-src/sys/kern/kern_shutdown.c:405
> #2  0x000f7704 in db_dump (dummy=,
> dummy2=, dummy3=, dummy4=)
>   at /usr/main-src/sys/ddb/db_command.c:591
> #3  0x000f74e0 in db_command (last_cmdp=,
> cmd_table=, dopager=true)
>   at /usr/main-src/sys/ddb/db_command.c:504
> #4  0x000f71b8 in db_command_loop ()
>   at /usr/main-src/sys/ddb/db_command.c:551
> #5  0x000fad9c in db_trap (type=, code=)
>   at /usr/main-src/sys/ddb/db_main.c:268
> #6  0x004f4ec4 in kdb_trap (type=60, code=0, tf=)
>   at /usr/main-src/sys/kern/subr_kdb.c:790
> #7  
> #8  
> #9  
> #10 
> #11 
> #12 
> #13 
> #14 
> #15 
> #16 
> #17 
> #18 
> #19 
> #20 
> #21 
> #22 
> Backtrace stopped: Cannot access memory at address 0x10
> (kgdb) 
> 
> 
> So some transcribing of a picture in order to
> show register values that were reported:
> 
> Fatal data abort:
>   x0: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>   x1: 0x0001
>   x2: 0x44572d4338374143
>   x3: 0x005d3f90 (ifdead_ioctl + 0x0)
>   x4: 0xa00b7f0d185e
>   x5: 0xa0023fe4b992
>   x6: 0x6767616c
>   x7: 0x00706174016f7575
>   x8: 0x01a4
>   x9: 0x00210005
>  x10: 0x0800
>  x11: 0xfefefefefefefeff
>  x12: 0x0008
>  x13: 0x
>  x14: 0x00ff
>  x15: 0x0700
>  x16: 0x0008
>  x17: 0x0007
>  x18: 0x0001eea0e500 (_DYNAMIC + 0x6d816358)
>  x19: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>  x20: 0xa00b7f0d1800
>  x21: 0xa00b7f0d1858
>  x22: 0x000c
>  x23: 0x0005
>  x24: 0x
>  x25: 0x00c68000 (sysctl___kern_features_netlink + 0x10)
>  x26: 0x
>  x27: 0x00ce9000 (cap_linkat_source_rights + 0x8)
>  x28: 0x006bb0a0 (dump_cb + 0x0)
>  x29: 0x0001eea0e520 (_DYNAMIC + 0x6d816378)
>   sp: 0x0001eea0e500
>   lr: 0x006b8fe0 (dump_iface + 0x2c0)
>  elr: 0x006b96dc (dump_sa + 0x1c)
> spsr: 0x00400045
>  far: 0x44572d4338374144
>  esr: 0x9604
> panic: vm_fault failed: 0x006b96dc error 1
> 
> I expect that this is similar to reports I'd made
> back in 14.0-CURRENT days. As I remember, snapshot
> builds of the time also got the panic.
> 
> I will note that an earlier 14.0-BETA1 snapshot
> kernel test run did not 
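
Several of the register values transcribed above look like ASCII text rather than addresses. A minimal sketch (Python, not part of the original report) that prints each value as little-endian bytes; the helper name reg_to_ascii is made up for illustration, and the values are copied from the register dump as archived:

```python
def reg_to_ascii(value: int, width: int = 8) -> str:
    """Render a register value as little-endian bytes, '.' for non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Values copied from the "Fatal data abort" register dump in the report.
regs = {
    "x2": 0x44572D4338374143,
    "far": 0x44572D4338374144,
    "x6": 0x6767616C,
    "x7": 0x00706174016F7575,
}

decoded = {name: reg_to_ascii(value) for name, value in regs.items()}

for name, text in decoded.items():
    print(f"{name:>4}: {regs[name]:#018x} -> {text!r}")

# How far the faulting address is from x2.
print("far - x2 =", regs["far"] - regs["x2"])
```

x2 decodes to "CA78C-WD", the first eight characters of the hostname in the uname output, x6 starts with "lagg", and far is x2 + 1. That may mean dump_sa() followed a pointer that had been overwritten with string data, but this is only a reading of the numbers, not something the thread confirms.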

ZFS Panics Still

2023-09-11 Thread Cy Schubert
Hi,

One of my machines, running poudriere to build i386 packages, panics (see
below) when poudriere terminates, while poudriere is cleaning up and
performing the zfs umount.
I just happened to catch the machine right after poudriere had completed
and while it was unmounting filesystems.

panic: vm_page_dequeue_deferred: page 0xfe000b7e9748 has unexpected queue 
state
cpuid = 1
time = 1694472686
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00bebac7d0
vpanic() at vpanic+0x132/frame 0xfe00bebac900
panic() at panic+0x43/frame 0xfe00bebac960
vm_page_dequeue_deferred() at vm_page_dequeue_deferred+0xb2/frame 
0xfe00bebac970
vm_page_free_prep() at vm_page_free_prep+0x11b/frame 0xfe00bebac990
vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfe00bebac9c0
vm_object_page_remove() at vm_object_page_remove+0xb6/frame 0xfe00bebaca20
bufobj_invalbuf() at bufobj_invalbuf+0x198/frame 0xfe00bebaca80
vgonel() at vgonel+0x2ce/frame 0xfe00bebacaf0
vflush() at vflush+0x3ad/frame 0xfe00bebacc40
zfs_umount() at zfs_umount+0xca/frame 0xfe00bebacc80
dounmount() at dounmount+0x7b5/frame 0xfe00bebaccf0
kern_unmount() at kern_unmount+0x2eb/frame 0xfe00bebace00
amd64_syscall() at amd64_syscall+0x138/frame 0xfe00bebacf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00bebacf30
--- syscall (22, FreeBSD ELF64, unmount), rip = 0x36bbc3e23dba, rsp = 
0x364f08e8, rbp = 0x364f08f0 ---
Uptime: 46m16s
Dumping 3452 out of 7996 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
pcpu,
(kgdb) bt
#0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1)
at /opt/src/git-src/sys/kern/kern_shutdown.c:405
#2  0x806c20b0 in kern_reboot (howto=260)
at /opt/src/git-src/sys/kern/kern_shutdown.c:526
#3  0x806c25af in vpanic (
fmt=0x80b5e2e6 "%s: page %p has unexpected queue state", 
ap=ap@entry=0xfe00bebac940)
at /opt/src/git-src/sys/kern/kern_shutdown.c:970
#4  0x806c2353 in panic (fmt=)
at /opt/src/git-src/sys/kern/kern_shutdown.c:894
#5  0x809db0c2 in vm_page_dequeue_deferred (m=, 
m@entry=0xfe000b7e9748) at /opt/src/git-src/sys/vm/vm_page.c:3790
#6  0x809de5fb in vm_page_free_prep (m=m@entry=0xfe000b7e9748)
at /opt/src/git-src/sys/vm/vm_page.c:3928
#7  0x809d6162 in vm_page_free_toq (m=, 
m@entry=0xfe000b7e9748) at /opt/src/git-src/sys/vm/vm_page.c:3970
#8  0x809d614b in vm_page_free (m=, 
m@entry=0xfe000b7e9748) at /opt/src/git-src/sys/vm/vm_page.c:1328
#9  0x809d0f16 in vm_object_page_remove (object=0xf8013edc0108, 
start=0, end=0, options=1) at /opt/src/git-src/sys/vm/vm_object.c:2157
#10 0x807b6a28 in bufobj_invalbuf (bo=0xf800665c50e0, flags=1, 
slpflag=slpflag@entry=0, slptimeo=slptimeo@entry=0)
--Type  for more, q to quit, c to continue without paging--c
at /opt/src/git-src/sys/kern/vfs_subr.c:2156
#11 0x807ba2ee in vgonel (vp=vp@entry=0xf800665c5000)
at /opt/src/git-src/sys/kern/vfs_subr.c:2187
#12 0x807b999d in vflush (mp=mp@entry=0xfe00da21cb00, 
rootrefs=rootrefs@entry=0, flags=flags@entry=2, 
td=td@entry=0xfe00c4610720)
at /opt/src/git-src/sys/kern/vfs_subr.c:3939
#13 0x83871bea in zfs_umount (vfsp=0xfe00da21cb00, 
vfsp@entry=, 
fflag=, 
fflag@entry=)
at 
/opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1708
#14 0x807ae405 in dounmount (mp=, 
mp@entry=0xfe00da21cb00, flags=flags@entry=524288, 
td=td@entry=0xfe00c4610720)
at /opt/src/git-src/sys/kern/vfs_mount.c:2327
#15 0x807adbeb in kern_unmount (td=0xfe00c4610720, 
path=0x5a7647a1a3c0 , flags=524288) at /opt/src/git-src/sys/kern/vfs_mount.c:1785
#16 0x80a741b8 in syscallenter (td=)
at /opt/src/git-src/sys/amd64/amd64/../../kern/subr_syscall.c:187
#17 amd64_syscall (td=0xfe00c4610720, traced=0)
at /opt/src/git-src/sys/amd64/amd64/trap.c:1197
#18 
#19 0x36bbc3e23dba in ?? ()
Backtrace stopped: Cannot access memory at address 0x364f08e8

At frame 6 *m contains:

(kgdb) p *m
$5 = {plinks = {q = {tqe_next = 0x, 
  tqe_prev = 0x}, s = {ss = {
sle_next = 0x}}, memguard = {p = 18446744073709551615, 
  v = 18446744073709551615}, uma = {slab = 0x, 
  zone = 0x}}, listq = {tqe_next = 0x, 
tqe_prev = 0x}, object = 0x0, pindex = 14685, 
  phys_addr = 7595216896, md = {pv_list = {tqh_first = 0x0, 
  tqh_last = 0xfe000b7e9780}, pv_gen = 20, pat_mode = 6}, 
  ref_count = 0, busy_lock = 4294967294, a = {{flags = 16, queue = 255 '\377', 
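
The decimal fields in the (truncated) `p *m` output are easier to read in hex. A small sketch, not from the original mail, that converts them and checks the queue state. The constants PQ_NONE = 255 and PGA_DEQUEUE = 0x10 are assumptions recalled from sys/vm/vm_page.h and should be checked against the tree that produced the dump; the function name summarize is hypothetical:

```python
# Values copied from the `p *m` output above.
page = {
    "memguard_p": 18446744073709551615,
    "phys_addr": 7595216896,
    "busy_lock": 4294967294,
    "flags": 16,
    "queue": 255,
}

# ASSUMPTIONS: recalled from sys/vm/vm_page.h; verify against your source tree.
PQ_NONE = 255
PGA_DEQUEUE = 0x10
PAGE_SIZE = 4096  # amd64 base page size


def summarize(p: dict) -> dict:
    """Convert the decimal fields to hex and test the queue-state bits."""
    return {
        "memguard_p_hex": hex(p["memguard_p"]),
        "busy_lock_hex": hex(p["busy_lock"]),
        "phys_addr_hex": hex(p["phys_addr"]),
        "page_aligned": p["phys_addr"] % PAGE_SIZE == 0,
        "queue_is_none": p["queue"] == PQ_NONE,
        "dequeue_flag_set": bool(p["flags"] & PGA_DEQUEUE),
    }


summary = summarize(page)
for key, value in summary.items():
    print(f"{key:>17}: {value}")
```

If the assumed constants are right, the page reports no queue (queue = 255) while a queue-state flag is still set (flags = 16), which would match the "unexpected queue state" wording of the panic. Treat that as a hypothesis to confirm in the source, not a diagnosis.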
  

Re: kernel trap 12 .. cam_periph_release_locked_buses() panics under panic?

2023-09-11 Thread Warner Losh
On Mon, Sep 11, 2023 at 8:26 AM Bjoern A. Zeeb <
bzeeb-li...@lists.zabbadoz.net> wrote:

> On Mon, 11 Sep 2023, Warner Losh wrote:
>
> > That's a crazy traceback. We get a fatal trap and then call into the wifi
> > stack? That makes no sense in the absence of some crazy data corruption
> or
> > a weird traceback issue.
>
> No, we panicked in wifi and then it iterated again and again.
> The first one is the lkpi_sta_auth_to_scan() panic.
>

Ah. OK. I don't think there's anything in cam_periph_release_locked_buses
that could cause this... but if you get a dump I can help look at it.

Warner


> > On Mon, Sep 11, 2023, 7:47 AM Bjoern A. Zeeb <
> bzeeb-li...@lists.zabbadoz.net>
> > wrote:
> >
> >> Hi,
> >>
> >> I had a kernel hit an all-too-well-known wifi issue and panic (I was
> >> actually happy I could reproduce it), and then the screen kept scrolling
> >> for a while, panicking all over again, and ddb was unusable (not so happy).
> >>
> >> I assume the problem is cam_periph_release_locked_buses()?
> >>
> >
> > Unlikely given the rest of the traceback
> >
> > Can you get a core so we can look at it more deeply?
>
> No, after  iterations. ddb gave up and stopped and power cycle was
> the only thing I could still do.
>
>
>
> >> /bz
> >>
> >> ...
> >> --- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8,
> rbp =
> >> 0xfe00907fa4a0 ---
> >> cam_periph_release_locked_buses() at
> >> cam_periph_release_locked_buses+0x43/frame 0xfe00907fa4a0
> >> kernel trap 12 with interrupts disabled
> >>
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 2; apic id = 02
> >> fault virtual address   = 0xfe00907fa4a8
> >> fault code  = supervisor read data, page not present
> >> instruction pointer = 0x20:0x8101f660
> >> stack pointer   = 0x0:0xfe00907f8f90
> >> frame pointer   = 0x0:0xfe00907f9020
> >> code segment= base 0x0, limit 0xf, type 0x1b
> >>  = DPL 0, pres 1, long 1, def32 0, gran 1
> >> processor eflags= resume, IOPL = 0
> >> current process = 0 (iwlwifi0 net80211 t)
> >> rdi: fe00907f8f90 rsi: 0008 rdx: fe00907fa4a8
> >> rcx: fe00907f9030  r8:   r9: 
> >> rax:  rbx: fe00907f90f0 rbp: fe00907f9020
> >> r10:  r11:  r12: fe00907fa4a8
> >> r13: 0008 r14:  r15: fe00907f9030
> >> trap number = 12
> >> panic: page fault
> >> cpuid = 2
> >> time = 1694439681
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfe00907f8c60
> >> vpanic() at vpanic+0x132/frame 0xfe00907f8d90
> >> panic() at panic+0x43/frame 0xfe00907f8df0
> >> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f8e50
> >> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f8ec0
> >> calltrap() at calltrap+0x8/frame 0xfe00907f8ec0
> >> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f8f90, rbp =
> >> 0xfe00907f9020 ---
> >> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9020
> >> db_get_value() at db_get_value+0x31/frame 0xfe00907f9060
> >> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f90e0
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfe00907f9160
> >> vpanic() at vpanic+0x132/frame 0xfe00907f9290
> >> panic() at panic+0x43/frame 0xfe00907f92f0
> >> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9350
> >> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f93c0
> >> calltrap() at calltrap+0x8/frame 0xfe00907f93c0
> >> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9490, rbp =
> >> 0xfe00907f9520 ---
> >> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9520
> >> db_get_value() at db_get_value+0x31/frame 0xfe00907f9560
> >> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f95e0
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfe00907f9660
> >> vpanic() at vpanic+0x132/frame 0xfe00907f9790
> >> panic() at panic+0x43/frame 0xfe00907f97f0
> >> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9850
> >> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f98c0
> >> calltrap() at calltrap+0x8/frame 0xfe00907f98c0
> >> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9990, rbp =
> >> 0xfe00907f9a20 ---
> >> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9a20
> >> db_get_value() at db_get_value+0x31/frame 0xfe00907f9a60
> >> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f9ae0
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfe00907f9b60
> >> vpanic() at vpanic+0x132/frame 0xfe00907f9c90
> >> panic() at panic+0x43/frame 0xfe00907f9cf0
> >> lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x388/frame
> >> 0xfe00907f9d70
> >> lkpi_iv_newstate() at 

Re: kernel trap 12 .. cam_periph_release_locked_buses() panics under panic?

2023-09-11 Thread Bjoern A. Zeeb

On Mon, 11 Sep 2023, Warner Losh wrote:


That's a crazy traceback. We get a fatal trap and then call into the wifi
stack? That makes no sense in the absence of some crazy data corruption or
a weird traceback issue.


No, we panicked in wifi and then it iterated again and again.
The first one is the lkpi_sta_auth_to_scan() panic.
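
The point about nesting can be checked mechanically: in a DDB backtrace the newest frames come first, so the original panic is the last panic() in the listing and its caller is the next frame. A rough sketch, not from the thread; the function names frames and first_panic_caller are hypothetical, and the sample is an excerpt of the trace in this thread with the middle repetitions left out:

```python
import re

# Matches DDB-style lines such as "panic() at panic+0x43/frame 0x...".
FRAME_RE = re.compile(r"^(\w+)\(\) at ")


def frames(trace: str) -> list[str]:
    """Return function names from the backtrace, newest frame first."""
    return [m.group(1) for line in trace.splitlines()
            if (m := FRAME_RE.match(line.strip()))]


def first_panic_caller(names: list[str]) -> str:
    """The oldest panic is the last panic() listed; its caller follows it."""
    idx = max(i for i, name in enumerate(names) if name == "panic")
    return names[idx + 1]


sample = """\
vpanic() at vpanic+0x132/frame 0xfe00907f8d90
panic() at panic+0x43/frame 0xfe00907f8df0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f8e50
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f8ec0
calltrap() at calltrap+0x8/frame 0xfe00907f8ec0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f8f90, rbp = 0xfe00907f9020 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9020
db_get_value() at db_get_value+0x31/frame 0xfe00907f9060
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f90e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00907f9b60
vpanic() at vpanic+0x132/frame 0xfe00907f9c90
panic() at panic+0x43/frame 0xfe00907f9cf0
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x388/frame 0xfe00907f9d70
lkpi_iv_newstate() at lkpi_iv_newstate+0x2eb/frame 0xfe00907f9df0
"""

names = frames(sample)
print("panic() frames in excerpt:", names.count("panic"))
print("original panic raised from:", first_panic_caller(names))
```

On the excerpt this names lkpi_sta_auth_to_scan() as the origin; the later panic() frames all come from db_backtrace() faulting in db_read_bytes() while printing the stack.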



On Mon, Sep 11, 2023, 7:47 AM Bjoern A. Zeeb 
wrote:


Hi,

I had a kernel hit an all-too-well-known wifi issue and panic (I was actually
happy I could reproduce it), and then the screen kept scrolling for a while,
panicking all over again, and ddb was unusable (not so happy).

I assume the problem is cam_periph_release_locked_buses()?



Unlikely given the rest of the traceback

Can you get a core so we can look at it more deeply?


No, after  iterations. ddb gave up and stopped and power cycle was
the only thing I could still do.




/bz

...
--- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp =
0xfe00907fa4a0 ---
cam_periph_release_locked_buses() at
cam_periph_release_locked_buses+0x43/frame 0xfe00907fa4a0
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xfe00907fa4a8
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x8101f660
stack pointer   = 0x0:0xfe00907f8f90
frame pointer   = 0x0:0xfe00907f9020
code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 0 (iwlwifi0 net80211 t)
rdi: fe00907f8f90 rsi: 0008 rdx: fe00907fa4a8
rcx: fe00907f9030  r8:   r9: 
rax:  rbx: fe00907f90f0 rbp: fe00907f9020
r10:  r11:  r12: fe00907fa4a8
r13: 0008 r14:  r15: fe00907f9030
trap number = 12
panic: page fault
cpuid = 2
time = 1694439681
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe00907f8c60
vpanic() at vpanic+0x132/frame 0xfe00907f8d90
panic() at panic+0x43/frame 0xfe00907f8df0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f8e50
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f8ec0
calltrap() at calltrap+0x8/frame 0xfe00907f8ec0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f8f90, rbp =
0xfe00907f9020 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9020
db_get_value() at db_get_value+0x31/frame 0xfe00907f9060
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f90e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe00907f9160
vpanic() at vpanic+0x132/frame 0xfe00907f9290
panic() at panic+0x43/frame 0xfe00907f92f0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9350
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f93c0
calltrap() at calltrap+0x8/frame 0xfe00907f93c0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9490, rbp =
0xfe00907f9520 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9520
db_get_value() at db_get_value+0x31/frame 0xfe00907f9560
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f95e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe00907f9660
vpanic() at vpanic+0x132/frame 0xfe00907f9790
panic() at panic+0x43/frame 0xfe00907f97f0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9850
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f98c0
calltrap() at calltrap+0x8/frame 0xfe00907f98c0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9990, rbp =
0xfe00907f9a20 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9a20
db_get_value() at db_get_value+0x31/frame 0xfe00907f9a60
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f9ae0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe00907f9b60
vpanic() at vpanic+0x132/frame 0xfe00907f9c90
panic() at panic+0x43/frame 0xfe00907f9cf0
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x388/frame
0xfe00907f9d70
lkpi_iv_newstate() at lkpi_iv_newstate+0x2eb/frame 0xfe00907f9df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame
0xfe00907f9e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame
0xfe00907f9ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame
0xfe00907f9ef0
fork_exit() at fork_exit+0x82/frame 0xfe00907f9f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00907f9f30
--- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp =
0xfe00907fa4a0 ---
cam_periph_release_locked_buses() at
cam_periph_release_locked_buses+0x43/frame 0xfe00907fa4a0
kernel trap 12 with interrupts disabled
...

--
Bjoern A. Zeeb r15:7






--
Bjoern A. Zeeb   

Re: kernel trap 12 .. cam_periph_release_locked_buses() panics under panic?

2023-09-11 Thread Warner Losh
That's a crazy traceback. We get a fatal trap and then call into the wifi
stack? That makes no sense in the absence of some crazy data corruption or
a weird traceback issue.

On Mon, Sep 11, 2023, 7:47 AM Bjoern A. Zeeb 
wrote:

> Hi,
>
> I had a kernel hit an all-too-well-known wifi issue and panic (I was actually
> happy I could reproduce it), and then the screen kept scrolling for a while,
> panicking all over again, and ddb was unusable (not so happy).
>
> I assume the problem is cam_periph_release_locked_buses()?
>

Unlikely given the rest of the traceback

Can you get a core so we can look at it more deeply?

Warner


> /bz
>
> ...
> --- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp =
> 0xfe00907fa4a0 ---
> cam_periph_release_locked_buses() at
> cam_periph_release_locked_buses+0x43/frame 0xfe00907fa4a0
> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0xfe00907fa4a8
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x8101f660
> stack pointer   = 0x0:0xfe00907f8f90
> frame pointer   = 0x0:0xfe00907f9020
> code segment= base 0x0, limit 0xf, type 0x1b
>  = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= resume, IOPL = 0
> current process = 0 (iwlwifi0 net80211 t)
> rdi: fe00907f8f90 rsi: 0008 rdx: fe00907fa4a8
> rcx: fe00907f9030  r8:   r9: 
> rax:  rbx: fe00907f90f0 rbp: fe00907f9020
> r10:  r11:  r12: fe00907fa4a8
> r13: 0008 r14:  r15: fe00907f9030
> trap number = 12
> panic: page fault
> cpuid = 2
> time = 1694439681
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00907f8c60
> vpanic() at vpanic+0x132/frame 0xfe00907f8d90
> panic() at panic+0x43/frame 0xfe00907f8df0
> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f8e50
> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f8ec0
> calltrap() at calltrap+0x8/frame 0xfe00907f8ec0
> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f8f90, rbp =
> 0xfe00907f9020 ---
> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9020
> db_get_value() at db_get_value+0x31/frame 0xfe00907f9060
> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f90e0
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00907f9160
> vpanic() at vpanic+0x132/frame 0xfe00907f9290
> panic() at panic+0x43/frame 0xfe00907f92f0
> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9350
> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f93c0
> calltrap() at calltrap+0x8/frame 0xfe00907f93c0
> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9490, rbp =
> 0xfe00907f9520 ---
> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9520
> db_get_value() at db_get_value+0x31/frame 0xfe00907f9560
> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f95e0
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00907f9660
> vpanic() at vpanic+0x132/frame 0xfe00907f9790
> panic() at panic+0x43/frame 0xfe00907f97f0
> trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9850
> trap_pfault() at trap_pfault+0xae/frame 0xfe00907f98c0
> calltrap() at calltrap+0x8/frame 0xfe00907f98c0
> --- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9990, rbp =
> 0xfe00907f9a20 ---
> db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9a20
> db_get_value() at db_get_value+0x31/frame 0xfe00907f9a60
> db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f9ae0
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00907f9b60
> vpanic() at vpanic+0x132/frame 0xfe00907f9c90
> panic() at panic+0x43/frame 0xfe00907f9cf0
> lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x388/frame
> 0xfe00907f9d70
> lkpi_iv_newstate() at lkpi_iv_newstate+0x2eb/frame 0xfe00907f9df0
> ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame
> 0xfe00907f9e40
> taskqueue_run_locked() at taskqueue_run_locked+0xab/frame
> 0xfe00907f9ec0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame
> 0xfe00907f9ef0
> fork_exit() at fork_exit+0x82/frame 0xfe00907f9f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00907f9f30
> --- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp =
> 0xfe00907fa4a0 ---
> cam_periph_release_locked_buses() at
> cam_periph_release_locked_buses+0x43/frame 0xfe00907fa4a0
> kernel trap 12 with interrupts disabled
> ...
>
> --
> Bjoern A. Zeeb r15:7
>
>


kernel trap 12 .. cam_periph_release_locked_buses() panics under panic?

2023-09-11 Thread Bjoern A. Zeeb

Hi,

I had a kernel hit an all-too-well-known wifi issue and panic (I was actually
happy I could reproduce it), and then the screen kept scrolling for a while,
panicking all over again, and ddb was unusable (not so happy).

I assume the problem is cam_periph_release_locked_buses()?

/bz

...
--- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp = 
0xfe00907fa4a0 ---
cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x43/frame 
0xfe00907fa4a0
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xfe00907fa4a8
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x8101f660
stack pointer   = 0x0:0xfe00907f8f90
frame pointer   = 0x0:0xfe00907f9020
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 0 (iwlwifi0 net80211 t)
rdi: fe00907f8f90 rsi: 0008 rdx: fe00907fa4a8
rcx: fe00907f9030  r8:   r9: 
rax:  rbx: fe00907f90f0 rbp: fe00907f9020
r10:  r11:  r12: fe00907fa4a8
r13: 0008 r14:  r15: fe00907f9030
trap number = 12
panic: page fault
cpuid = 2
time = 1694439681
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00907f8c60
vpanic() at vpanic+0x132/frame 0xfe00907f8d90
panic() at panic+0x43/frame 0xfe00907f8df0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f8e50
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f8ec0
calltrap() at calltrap+0x8/frame 0xfe00907f8ec0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f8f90, rbp = 
0xfe00907f9020 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9020
db_get_value() at db_get_value+0x31/frame 0xfe00907f9060
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f90e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00907f9160
vpanic() at vpanic+0x132/frame 0xfe00907f9290
panic() at panic+0x43/frame 0xfe00907f92f0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9350
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f93c0
calltrap() at calltrap+0x8/frame 0xfe00907f93c0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9490, rbp = 
0xfe00907f9520 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9520
db_get_value() at db_get_value+0x31/frame 0xfe00907f9560
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f95e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00907f9660
vpanic() at vpanic+0x132/frame 0xfe00907f9790
panic() at panic+0x43/frame 0xfe00907f97f0
trap_fatal() at trap_fatal+0x40c/frame 0xfe00907f9850
trap_pfault() at trap_pfault+0xae/frame 0xfe00907f98c0
calltrap() at calltrap+0x8/frame 0xfe00907f98c0
--- trap 0xc, rip = 0x8101f660, rsp = 0xfe00907f9990, rbp = 
0xfe00907f9a20 ---
db_read_bytes() at db_read_bytes+0xa0/frame 0xfe00907f9a20
db_get_value() at db_get_value+0x31/frame 0xfe00907f9a60
db_backtrace() at db_backtrace+0x1d9/frame 0xfe00907f9ae0
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00907f9b60
vpanic() at vpanic+0x132/frame 0xfe00907f9c90
panic() at panic+0x43/frame 0xfe00907f9cf0
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x388/frame 0xfe00907f9d70
lkpi_iv_newstate() at lkpi_iv_newstate+0x2eb/frame 0xfe00907f9df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfe00907f9e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfe00907f9ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfe00907f9ef0
fork_exit() at fork_exit+0x82/frame 0xfe00907f9f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00907f9f30
--- trap 0x80bc1f07, rip = 0x80381e83, rsp = 0x3d7bb6db69f8, rbp = 
0xfe00907fa4a0 ---
cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x43/frame 
0xfe00907fa4a0
kernel trap 12 with interrupts disabled
...

--
Bjoern A. Zeeb r15:7



Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char

2023-09-11 Thread Mark Millard
On Sep 11, 2023, at 00:03, Mark Millard  wrote:

> On Sep 10, 2023, at 23:57, Dag-Erling Smørgrav  wrote:
> 
>> Mark Millard  writes:
>>> I'm not aware of there being other documentation for what
>>> is appropriate for setting up such for kyua runs.
>> 
>> https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84
>> 
> 
> Thanks for the reference that does not involve looking at
> CI log files. Filed away for future reference.
> 
> 
> Side note . . .
> 
> Turns out that tcptestsuite does not build for aarch64
> due to alignment problems via packing in net/packetdrill :
> 
> In file included from run_packet.c:45:
> In file included from ./tcp_options_iterator.h:31:
> ./tcp_options.h:108:2: error: field  within 'struct tcp_option' is less 
> aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is 
> usually due to 'struct tcp_option' being packed, which can lead to unaligned 
> accesses [-Werror,-Wunaligned-access]
>   union {
>   ^
> --- sctp_iterator.o ---
> cc  -O2 -pipe -mcpu=cortex-a7

Looks like I messed up and reported an armv7 context.
aarch64 built net/packetdrill and net/tcptestsuite just
fine. Sorry for the noise.

>  -Wno-deprecated -g -fstack-protector-strong -fno-strict-aliasing  
> -mcpu=cortex-a7 -Wall -Werror -g -c sctp_iterator.c -o sctp_iterator.o
> --- tcp_options.o ---
> cc  -O2 -pipe -mcpu=cortex-a7  -Wno-deprecated -g -fstack-protector-strong 
> -fno-strict-aliasing  -mcpu=cortex-a7 -Wall -Werror -g -c tcp_options.c -o 
> tcp_options.o
> --- run_packet.o ---
> 1 error generated.
> *** [run_packet.o] Error code 1
> 
> make[1]: stopped in 
> /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill
> --- tcp_options.o ---
> In file included from tcp_options.c:25:
> ./tcp_options.h:108:2: error: field  within 'struct tcp_option' is less 
> aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is 
> usually due to 'struct tcp_option' being packed, which can lead to unaligned 
> accesses [-Werror,-Wunaligned-access]
>   union {
>   ^
> 1 error generated.
> *** [tcp_options.o] Error code 1
> 
> make[1]: stopped in 
> /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill
> 2 errors
> 



===
Mark Millard
marklmi at yahoo.com




aarch64 main [so: 15] panic's in kyua's sys/net/if_lagg_test:status_stress

2023-09-11 Thread Mark Millard
It will be some time before I can try this with
an official snapshot instead of a personal build.
The build is based on b6ce41118bb1 :

# uname -apKU
FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 150 #17 
main-n265279-b6ce41118bb1-dirty: Sun Sep 10 14:36:47 PDT 2023 
root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C
 arm64 aarch64 150 150

So it was a non-debug build, although I do not
strip symbols and such in my builds.

. . .
sys/net/if_lagg_test:create  ->  passed  [0.105s]
sys/net/if_lagg_test:create_destroy_stress  ->  skipped: Skipping this test 
because it easily panics the machine  [0.019s]
sys/net/if_lagg_test:lacp_linkstate_destroy_stress  ->  passed  [60.045s]
sys/net/if_lagg_test:set_ether  ->  passed  [0.066s]
sys/net/if_lagg_test:status_stress  ->  

The core.txt.5 is not great, unfortunately:

panic: vm_fault failed: 0x006b96dc error 1

GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
. . .
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
 (dump_iface + 0x2c0)
 elr: 0x006b96dc (dump_sa + 0x1c)
spsr: 0x00400045
 far: 0x44572d4338374144
 esr: 0x9604
panic: vm_fault failed: 0x006b96dc error 1
cpuid = 2
time = 1694414226
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x1a0
panic() at panic+0x44
data_abort() at data_abort+0x304
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x9604
dump_sa() at dump_sa+0x1c
dump_iface() at dump_iface+0x2bc
dump_cb() at dump_cb+0x18
if_foreach_sleep() at if_foreach_sleep+0x244
rtnl_handle_getlink() at rtnl_handle_getlink+0xec
rtnl_handle_message() at rtnl_handle_message+0x19c
nl_taskqueue_handler() at nl_taskqueue_handler+0x674
taskqueue_run_locked() at taskqueue_run_locked+0x194
taskqueue_thread_loop() at taskqueue_thread_loop+0xcc
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic

get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
77  __asm __volatile("ldr   %0, [x18]" : "="(td));
(kgdb) #0  get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
#1  doadump (textdump=0, textdump@entry=4003518992)
at /usr/main-src/sys/kern/kern_shutdown.c:405
#2  0x000f7704 in db_dump (dummy=,  
dummy2=, dummy3=, dummy4=)
at /usr/main-src/sys/ddb/db_command.c:591
#3  0x000f74e0 in db_command (last_cmdp=,  
cmd_table=, dopager=true)
at /usr/main-src/sys/ddb/db_command.c:504
#4  0x000f71b8 in db_command_loop ()
at /usr/main-src/sys/ddb/db_command.c:551
#5  0x000fad9c in db_trap (type=, code=)
at /usr/main-src/sys/ddb/db_main.c:268
#6  0x004f4ec4 in kdb_trap (type=60, code=0, tf=)
at /usr/main-src/sys/kern/subr_kdb.c:790
#7  
#8  
#9  
#10 
#11 
#12 
#13 
#14 
#15 
#16 
#17 
#18 
#19 
#20 
#21 
#22 
Backtrace stopped: Cannot access memory at address 0x10
(kgdb) 


So some transcribing of a picture in order to
show register values that were reported:

Fatal data abort:
x0: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
x1: 0x0001
x2: 0x44572d4338374143
x3: 0x005d3f90 (ifdead_ioctl + 0x0)
x4: 0xa00b7f0d185e
x5: 0xa0023fe4b992
x6: 0x6767616c
x7: 0x00706174016f7575
x8: 0x01a4
x9: 0x00210005
   x10: 0x0800
   x11: 0xfefefefefefefeff
   x12: 0x0008
   x13: 0x
   x14: 0x00ff
   x15: 0x0700
   x16: 0x0008
   x17: 0x0007
   x18: 0x0001eea0e500 (_DYNAMIC + 0x6d816358)
   x19: 0x0001eea0e7f0 (_DYNAMIC + 0x6d816648)
   x20: 0xa00b7f0d1800
   x21: 0xa00b7f0d1858
   x22: 0x000c
   x23: 0x0005
   x24: 0x
   x25: 0x00c68000 (sysctl___kern_features_netlink + 0x10)
   x26: 0x
   x27: 0x00ce9000 (cap_linkat_source_rights + 0x8)
   x28: 0x006bb0a0 (dump_cb + 0x0)
   x29: 0x0001eea0e520 (_DYNAMIC + 0x6d816378)
sp: 0x0001eea0e500
lr: 0x006b8fe0 (dump_iface + 0x2c0)
   elr: 0x006b96dc (dump_sa + 0x1c)
  spsr: 0x00400045
   far: 0x44572d4338374144
   esr: 0x9604
panic: vm_fault failed: 0x006b96dc error 1
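
An aside on the dump above, offered as an observation rather than a
diagnosis: the far and x2 values consist entirely of printable ASCII
bytes. Read least significant byte first, x2 spells "CA78C-WD", which
matches the first eight characters of the hostname in the uname output,
and far is x2 + 1. That would be consistent with dump_sa() following a
pointer that held string data instead of an address, but the
transcription alone cannot confirm it. A minimal sketch of the decoding
(reg_to_ascii is a hypothetical helper name, not a kernel or ddb
function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack a 64-bit register value into its eight bytes, least significant
 * byte first, which is the order a little-endian load would have read
 * them from memory. Non-printable bytes are rendered as '.'.
 * Hypothetical helper for illustration only.
 */
static void
reg_to_ascii(uint64_t value, char out[9])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++) {
		unsigned char byte = (unsigned char)(value >> (8 * i));

		out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
	}
	out[8] = '\0';
}
```

Calling reg_to_ascii(0x44572d4338374144, buf) on the far value yields
the text "DA78C-WD"; the x2 value differs only in its first byte.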

I expect that this is similar to reports I'd made
back in 14.0-CURRENT days. As I remember, snapshot
builds of the time also got the panic.

I will note that an earlier 14.0-BETA1 snapshot
kernel test run did not panic at this point in the
sequence (or at any point). But I do not know how
repeatable the panics are in the various contexts.

I'll note that I've tried to have the various ports
installed (poudriere built) that are listed at:


Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char

2023-09-11 Thread Mark Millard
On Sep 10, 2023, at 23:57, Dag-Erling Smørgrav  wrote:

> Mark Millard  writes:
>> I'm not aware of there being other documentation for what
>> is appropriate for setting up such for kyua runs.
> 
> https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84
> 

Thanks for the reference that does not involve looking at
CI log files. Filed away for future reference.


Side note . . .

Turns out that tcptestsuite does not build for aarch64
due to alignment problems via packing in net/packetdrill :

In file included from run_packet.c:45:
In file included from ./tcp_options_iterator.h:31:
./tcp_options.h:108:2: error: field  within 'struct tcp_option' is less aligned 
than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is usually 
due to 'struct tcp_option' being packed, which can lead to unaligned accesses 
[-Werror,-Wunaligned-access]
   union {
   ^
--- sctp_iterator.o ---
cc  -O2 -pipe -mcpu=cortex-a7  -Wno-deprecated -g -fstack-protector-strong 
-fno-strict-aliasing  -mcpu=cortex-a7 -Wall -Werror -g -c sctp_iterator.c -o 
sctp_iterator.o
--- tcp_options.o ---
cc  -O2 -pipe -mcpu=cortex-a7  -Wno-deprecated -g -fstack-protector-strong 
-fno-strict-aliasing  -mcpu=cortex-a7 -Wall -Werror -g -c tcp_options.c -o 
tcp_options.o
--- run_packet.o ---
1 error generated.
*** [run_packet.o] Error code 1

make[1]: stopped in 
/wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill
--- tcp_options.o ---
In file included from tcp_options.c:25:
./tcp_options.h:108:2: error: field  within 'struct tcp_option' is less aligned 
than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is usually 
due to 'struct tcp_option' being packed, which can lead to unaligned accesses 
[-Werror,-Wunaligned-access]
   union {
   ^
1 error generated.
*** [tcp_options.o] Error code 1

make[1]: stopped in 
/wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill
2 errors
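
For readers unfamiliar with the diagnostic in the log above, here is a
minimal sketch of the kind of layout that produces it. The struct is a
hypothetical stand-in, not the actual packetdrill definition: a packed
outer struct places a union with 4-byte natural alignment at offset 2.
The compile lines in the log use -mcpu=cortex-a7, and I would only
expect clang to raise -Wunaligned-access on targets like that one.
Reading the field through memcpy() is one common way to avoid depending
on the field's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, loosely modelled on a TCP option header. */
struct demo_option {
	uint8_t kind;
	uint8_t length;
	union {
		uint16_t mss;		/* natural alignment 2 */
		uint32_t ts_val;	/* natural alignment 4 */
	} data;				/* lands at offset 2 once packed */
} __attribute__((packed));

/*
 * Read ts_val without assuming the field is aligned: copy the bytes
 * out of the packed struct instead of dereferencing a uint32_t pointer.
 */
static uint32_t
demo_read_ts_val(const struct demo_option *opt)
{
	uint32_t value;

	assert(opt != NULL);
	memcpy(&value,
	    (const unsigned char *)opt + offsetof(struct demo_option, data),
	    sizeof(value));
	return (value);
}
```

With the packed attribute the union sits at offset 2 and the whole
struct is 6 bytes, so any uint32_t access through the field may be
misaligned.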


===
Mark Millard
marklmi at yahoo.com




Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char

2023-09-11 Thread Dag-Erling Smørgrav
Mark Millard  writes:
> I'm not aware of there being other documentation for what
> is appropriate for setting up such for kyua runs.

https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: sed in CURRENT fails in textproc/jq

2023-09-11 Thread Alexander Leidinger

On 2023-09-10 18:53, Robert Clausecker wrote:

Hi Warner,

Thank you for your response.

On Sun, Sep 10, 2023 at 09:53:03AM -0600, Warner Losh wrote:

On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker  wrote:

> Hi Warner,
>
> I have pushed a fix.  It should hopefully address those failing tests.
> The same issue should also affect memcmp(), but unlike for memchr(), it is
> illegal to pass a length to memcmp() that extends past the actual end of
> the buffer as memcmp() is permitted to examine the whole buffer regardless
> of where the first mismatch is.
>
> I am considering a change to improve the behaviour of memcmp() on such
> erroneous inputs.  There are two options: (a) I could change memcmp() the
> same way I fixed memchr() and have implausible buffer lengths behave as if
> the buffer goes to the end of the address space or (b) I could change
> memcmp() to crash loudly if it detects such a case.  I could also
> (c) leave memcmp() as is.  Which of these three choices is preferable?
>

What does the standard say? I'm highly skeptical that these corner cases are
UB behavior.

I'd like actual support for this statement, rather than your conjecture that
it's illegal. Even if you can come up with that, preserving the old behavior
is my first choice. Especially since many of these functions aren't well
defined by a standard, but are extensions.

As for memchr,
https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html
has no such permission to examine 'the entire buffer at once' nor any
restriction as to the length extending beyond the address space. I'm skeptical
of your reading that it allows one to examine all of [b, b + len), so please
explain where the standard supports reading past the first occurrence.


memchr() in particular is specified to only examine the input until the
matching character is found (ISO/IEC 9899:2011 § 7.24.5.1):

***
The memchr function locates the first occurrence of c (converted to an
unsigned char) in the initial n characters (each interpreted as unsigned
char) of the object pointed to by s. The implementation shall behave as
if it reads the characters sequentially and stops as soon as a matching
character is found.
***

Therefore, it appears reasonable that calls with fake buffer lengths
(e.g. SIZE_MAX, to read until a mismatch occurs) must be supported.
However, memcmp() has no such language and the text explicitly states
that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1):

***
The memcmp function compares the first n characters of the object
pointed to by s1 to the first n characters of the object pointed to by s2.
***

By omission, this seems to give license to e.g. implement memcmp() like
timingsafe_memcmp() where it inspects all n characters of both buffers
and only then gives a result.  So if n is longer than the actual buffer
(e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be
a crash due to crossing into an unmapped page).

Thus I have patched memchr() to behave correctly when length SIZE_MAX is
given (commit b2618b65).  My memcmp() suffers from similarly flawed
logic and may need to be patched.  However, as the language I cited above
does not indicate that such usage needs to be supported for memcmp()
(whereas it must be for memchr(), contrary to my assumptions), I was
asking you how to proceed with memcmp() (hence choices (a)--(c)).
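
The distinction drawn above can be sketched with two reference loops.
Both are illustrative stand-ins with hypothetical names (ref_memchr,
ref_memcmp_full), not the libc routines or the patched assembly. The
first stops at the first match, so an oversized n is harmless as long
as a match exists inside the real buffer. The second inspects all n
bytes of both buffers before returning, which the memcmp() wording
appears to permit, so n must not exceed the real buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential search: never reads past the first matching byte. */
static void *
ref_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;

	assert(s != NULL);
	for (size_t i = 0; i < n; i++) {
		if (p[i] == ch)
			return ((void *)(p + i));
	}
	return (NULL);
}

/*
 * Whole-buffer compare: walks backwards over all n bytes so that the
 * mismatch with the lowest index is the one that decides the result.
 * Every byte of both buffers is read regardless of where they differ.
 */
static int
ref_memcmp_full(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	int result = 0;

	assert(s1 != NULL && s2 != NULL);
	for (size_t i = n; i-- > 0; ) {
		int diff = (int)a[i] - (int)b[i];

		if (diff != 0)
			result = diff;
	}
	return (result);
}
```

ref_memchr("hello", 'l', SIZE_MAX) returns a pointer to the first 'l'
without touching anything beyond it, while passing SIZE_MAX to
ref_memcmp_full() would read far outside both buffers.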


My 2ct:
What did the previous implementation of memcmp() do in this case?
 - If it was generous and behaved similar to the requirements of
   memchr(), POLA requires the same behaviour now too.
 - If it was crashing or silently going on (= lurking bugs in 3rd
   party code), we may have the possibility to do a coredump in case
   of running past the end of the buffer to prevent malicious use.
 - In general I go with the robustness principle, "be liberal in what
   you accept, but strict in what you provide" = memcmp() should behave
   as if it is supported.
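
To make option (a) from the quoted message concrete, here is a rough
sketch of treating an implausible length as if the buffer ran to the
end of the address space. It assumes a flat address space in which
uintptr_t and size_t have the same width; clamp_len is a hypothetical
helper name, and this is not a description of what commit b2618b65
does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Limit n so that buf + n cannot wrap around the top of the address
 * space. "room" is the distance from buf to the highest address, which
 * is one byte conservative but avoids overflow when buf is at 0.
 * Assumes a flat address space; hypothetical helper.
 */
static size_t
clamp_len(const void *buf, size_t n)
{
	uintptr_t start = (uintptr_t)buf;
	uintptr_t room = UINTPTR_MAX - start;

	assert(buf != NULL);
	return ((n > room) ? (size_t)room : n);
}
```

A plausible length passes through unchanged, while clamp_len(buf,
SIZE_MAX) comes back as the distance to the top of the address space,
so a loop bounded by it can no longer wrap.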

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF

