Re: disklabel change?
On Mon, Apr 22, 2024 at 03:00:40PM +0100, Patrick Welche wrote:
> On Mon, Apr 22, 2024 at 01:11:56PM -0000, Michael van Elst wrote:
> > pr...@welche.eu (Patrick Welche) writes:
> >
> > >In fact, the difference is between "-t" and "-rt":
> >
> > >I deem "-t" output to be correct (and matches what I had in /etc/diskpart)
> >
> >
> > The in-kernel disklabel gets the RAW_PART from by the disk geometry
> > and if RAW_PART == 3, it gets d_partitions[2] from the MBR partition
> > table.
> >
> > That explains why 'disklabel -t' looks correct, it shows the in-kernel
> > disklabel.
> >
> > It doesn't explain why the on-disk label has the entries swapped.
> > When you edit the disklabel, the kernel writes to the disk. When
> > that corrects the error, the bug is in the disklabel program,
> > otherwise it's in the kernel.
>
> Given that the content of /etc/disktab agrees with the correct version,
> I am pretty sure that 3 years ago I did a "disklabel -r -w sd0 perc"
>
> I suppose I could do it again and see if -rt becomes correct... right?

After all that: there was a typo in the disktab:

  :pc#4294703103:od#264192:\
  :pd#4294967295:od#0:\

"od" in both lines... After fixing that, -t and -rt agree!

Sorry for the noise - it was surprising though...

Cheers,

Patrick
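A repeated capability such as the doubled "od" above is easy to miss by eye. The following is a minimal sketch of a checker, not part of disklabel(8) or any NetBSD tool: `find_duplicate_cap()` is a hypothetical helper that assumes the simple `:name#value:` / `:name=value:` layout shown in this thread and does not implement the full getcap(3) syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CAPS 64
#define MAX_NAME 8

/*
 * Return 1 if any capability name appears more than once in a
 * disktab(5)-style entry, 0 otherwise.  A capability name is the text that
 * follows a ':' up to the next '#', '=' or ':'.  The first repeated name is
 * copied into dup_out when dup_out is non-NULL.
 */
static int
find_duplicate_cap(const char *entry, char *dup_out, size_t dup_len)
{
	char seen[MAX_CAPS][MAX_NAME];
	size_t nseen = 0;
	const char *p;

	assert(entry != NULL);
	p = strchr(entry, ':');	/* skip the "name|description" field */

	while (p != NULL) {
		char name[MAX_NAME];
		size_t len = 0;

		p++;	/* step past ':' */
		/* Backslash-newline continuations and indentation end a name too. */
		while (*p != '\0' && strchr(":#=\\ \t\n", *p) == NULL &&
		    len < MAX_NAME - 1)
			name[len++] = *p++;
		name[len] = '\0';

		if (len > 0) {
			for (size_t i = 0; i < nseen; i++) {
				if (strcmp(seen[i], name) == 0) {
					if (dup_out != NULL && dup_len > 0)
						snprintf(dup_out, dup_len, "%s", name);
					return 1;
				}
			}
			if (nseen < MAX_CAPS)
				memcpy(seen[nseen++], name, len + 1);
		}
		p = strchr(p, ':');
	}
	return 0;
}
```

Run over the 2021 "pc"/"pd" lines it reports "od"; with the "pc" line corrected to `oc#264192` it finds nothing.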
Re: disklabel change?
On Mon, Apr 22, 2024 at 01:11:56PM -0000, Michael van Elst wrote:
> pr...@welche.eu (Patrick Welche) writes:
>
> >In fact, the difference is between "-t" and "-rt":
>
> >I deem "-t" output to be correct (and matches what I had in /etc/diskpart)
>
> The in-kernel disklabel gets the RAW_PART from by the disk geometry
> and if RAW_PART == 3, it gets d_partitions[2] from the MBR partition
> table.
>
> That explains why 'disklabel -t' looks correct, it shows the in-kernel
> disklabel.
>
> It doesn't explain why the on-disk label has the entries swapped.
> When you edit the disklabel, the kernel writes to the disk. When
> that corrects the error, the bug is in the disklabel program,
> otherwise it's in the kernel.

Given that the content of /etc/disktab agrees with the correct version,
I am pretty sure that 3 years ago I did a "disklabel -r -w sd0 perc"

I suppose I could do it again and see if -rt becomes correct... right?

Cheers,

Patrick
Re: disklabel change?
In fact, the difference is between "-t" and "-rt":

$ disklabel -t sd0
perc|Automatically generated label:\
  :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
  :su#4294967295:\
  :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
  :pb#33554432:ob#21235712:tb=swap:\
  :pc#4294703103:oc#264192:\
  :pd#4294703103:od#0:\
  :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

$ disklabel -rt sd0
perc|Automatically generated label:\
  :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
  :su#4294967295:\
  :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
  :pb#33554432:ob#21235712:tb=swap:\
  :pc#4294703103:oc#0:\
  :pd#4294967295:od#264192:\
  :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

Given

$ sysctl kern.rawpartition
kern.rawpartition = 3

I deem "-t" output to be correct (and matches what I had in /etc/diskpart)

Cheers,

Patrick

On Sun, Apr 21, 2024 at 07:02:46PM +0100, Patrick Welche wrote:
> With a kernel & userland of 14 April 2024, on amd64, I just did:
>
> # disklabel -rt sd0
> perc|Automatically generated label:\
>   :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
>   :su#4294967295:\
>   :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
>   :pb#33554432:ob#21235712:tb=swap:\
>   :pc#4294703103:oc#0:\
>   :pd#4294967295:od#264192:\
>   :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:
>
> and was surprised to see that d no longer was the whole disk. My
> recollection was that peecees were odd and i386/amd64 used d, whereas
> everyone else used c. Has this changed for consistency?
>
> That computer's /etc/disktab contains
>
> perc|PERC H700:\
>   :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
>   :su#4294967295:\
>   :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
>   :pb#33554432:ob#21235712:tb=swap:\
>   :pc#4294703103:od#264192:\
>   :pd#4294967295:od#0:\
>   :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:
>
> which shows it was the other way around on 29 April 2021 when the
> disklabel was written.
>
>
> Cheers,
>
> Patrick
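The reasoning "kern.rawpartition = 3, so the whole-disk partition is d and should start at sector 0" can be made mechanical. This is a sketch only: `raw_part_letter()`, `cap_number()` and `raw_part_starts_at_zero()` are hypothetical helpers, and the first-occurrence-wins lookup is a choice made for this sketch rather than a statement about how disklabel(8) parses disktab.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Partition letter for a raw-partition index: 2 -> 'c', 3 -> 'd'. */
static char
raw_part_letter(int rawpartition)
{
	assert(rawpartition >= 0 && rawpartition < 26);
	return (char)('a' + rawpartition);
}

/*
 * Look up ":<name>#<number>" in a disktab-style entry.  Returns 1 and stores
 * the number on success, 0 if absent.  If a name occurs twice, the first
 * occurrence wins (a choice made for this sketch).
 */
static int
cap_number(const char *entry, const char *name, unsigned long long *out)
{
	size_t n = strlen(name);

	for (const char *p = strchr(entry, ':'); p != NULL; p = strchr(p + 1, ':')) {
		if (strncmp(p + 1, name, n) == 0 && p[1 + n] == '#') {
			*out = strtoull(p + 2 + n, NULL, 10);
			return 1;
		}
	}
	return 0;
}

/* 1 if the raw partition's offset ("o" + letter) is present and equal to 0. */
static int
raw_part_starts_at_zero(const char *entry, int rawpartition)
{
	const char off[3] = { 'o', raw_part_letter(rawpartition), '\0' };
	unsigned long long v;

	return cap_number(entry, off, &v) && v == 0;
}
```

With rawpartition 3 the "-t" lines pass and the "-rt" lines fail. Fed the 2021 disktab lines, a first-match lookup sees `od#264192` and no `oc`, which resembles the "-rt" output; that is consistent with the typo being the cause, not proof of how the label was written.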
disklabel change?
With a kernel & userland of 14 April 2024, on amd64, I just did:

# disklabel -rt sd0
perc|Automatically generated label:\
  :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
  :su#4294967295:\
  :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
  :pb#33554432:ob#21235712:tb=swap:\
  :pc#4294703103:oc#0:\
  :pd#4294967295:od#264192:\
  :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

and was surprised to see that d no longer was the whole disk. My
recollection was that peecees were odd and i386/amd64 used d, whereas
everyone else used c. Has this changed for consistency?

That computer's /etc/disktab contains

perc|PERC H700:\
  :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
  :su#4294967295:\
  :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
  :pb#33554432:ob#21235712:tb=swap:\
  :pc#4294703103:od#264192:\
  :pd#4294967295:od#0:\
  :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

which shows it was the other way around on 29 April 2021 when the
disklabel was written.

Cheers,

Patrick
Re: xsrc mesalib build problem
On Sat, Apr 13, 2024 at 01:19:05PM +0100, Patrick Welche wrote:
> Building xsrc on -current/amd64 with
>
> HAVE_MESA_VER=21
> HAVE_GCC=12
>
> fails for me with
>
> /usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: error: 'STN_UNDEF' undeclared (first use in this function); did you mean 'SHN_UNDEF'?
>   658 | if (r_sym == STN_UNDEF) {
>       |^
>       |SHN_UNDEF
> /usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: note: each undeclared identifier is reported only once for each function it appears in
>
> Non standard build options I know, but it had worked, and I had a complete
> build on 27 March...
>
> STN_UNDEF / SHN_UNDEF appear to be from LLVM? (and this bit of build I think
> is gallium == llvmpipe?)

Really confused: STN_UNDEF should be found via

libelf.h -> sys/exec_elf.h

cd /usr/src/external/mit
make -j24 dependall
make -j24 install

works without complaint, and then

cd /usr/src
sh build.sh -u -x -j24 -E build

fails with the above

*** Failed target: ac_rtld.pico

If I run the lengthy output of

*** Failed commands:
    ${_MKTARGET_COMPILE}
    => @echo '# ' "compile " gallium/ac_rtld.pico
    ${COMPILE.c} ${COPTS.${.IMPSRC:T}} ${CPUFLAGS.${.IMPSRC:T}} ${CPPFLAGS.${.IMPSRC:T}} ${CSHLIBFLAGS} ${.IMPSRC} -o ${.TARGET}

I can reproduce the problem.

If I use "gcc -E", I see

----
typedef struct {
 uint32_t gh_nbuckets;
 uint32_t gh_symndx;
 uint32_t gh_maskwords;
 uint32_t gh_shift2;
} Elf_GNU_Hash_Header;
# 35 "/usr/include/elfdefinitions.h" 2 3 4
# 41 "/usr/include/libelf.h" 2 3 4

typedef struct _Elf Elf;
typedef struct _Elf_Scn Elf_Scn;
----

c.f. libelf.h

----
#ifdef BUILTIN_ELF_HEADERS
# include
# include
# include "elfdefinitions.h"
#elif HAVE_NBTOOL_CONFIG_H
# include
#elif defined(__NetBSD__)
# include
# include
#elif defined(__FreeBSD__)
# include
# include
# include
#else
#error "No valid elf headers"
#endif

/* Library private data structures */
typedef struct _Elf Elf;
typedef struct _Elf_Scn Elf_Scn;
----

My current suspicion is that "elfdefinitions.h" is being used rather than
sys/exec_elf.h, the latter is the one which defines STN_UNDEF:

$ grep _UNDEF elfdefinitions.h exec_elf.h
elfdefinitions.h:#define SHN_UNDEF  0
exec_elf.h:#define ELF_SYM_UNDEFINED  0
exec_elf.h:#define STN_UNDEF  0  /* undefined index */
exec_elf.h:#define SHN_UNDEF  0  /* Undefined section */

but I don't see "BUILTIN_ELF_HEADERS" in the failing command line, and why
would "make dependall" behave differently?

My guess is that adding STN_UNDEF to elfdefinitions.h will patch over this,
but I don't see why build.sh would behave differently.

Possibly related to

Author: riastradh
Date:   Mon Apr 1 18:33:22 2024 +0000

    elftoolchain: Be consistent about which ELF header files we use.

which would match my notion of it working on 27 March?

Cheers,

Patrick
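The "patch over it" idea can be expressed as a guarded fallback. This is only a sketch of a local stop-gap, not the fix that went into the tree, and it does not explain why build.sh picks up elfdefinitions.h; it relies on the value 0 shown for STN_UNDEF in the exec_elf.h grep above. `reloc_has_no_symbol()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Stop-gap: if the ELF header that actually got included does not define
 * STN_UNDEF, fall back to 0, the value sys/exec_elf.h uses (symbol-table
 * index 0 is the reserved "undefined" entry).
 */
#ifndef STN_UNDEF
#define STN_UNDEF 0
#endif

/* Returns 1 if a relocation's symbol index refers to no symbol. */
static int
reloc_has_no_symbol(unsigned long r_sym)
{
	return r_sym == STN_UNDEF;
}
```

Because of the `#ifndef`, the fallback is inert whenever sys/exec_elf.h is the header in use.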
xsrc mesalib build problem
Building xsrc on -current/amd64 with

HAVE_MESA_VER=21
HAVE_GCC=12

fails for me with

/usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: error: 'STN_UNDEF' undeclared (first use in this function); did you mean 'SHN_UNDEF'?
  658 | if (r_sym == STN_UNDEF) {
      |^
      |SHN_UNDEF
/usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: note: each undeclared identifier is reported only once for each function it appears in

Non standard build options I know, but it had worked, and I had a complete
build on 27 March...

STN_UNDEF / SHN_UNDEF appear to be from LLVM? (and this bit of build I think
is gallium == llvmpipe?)

Cheers,

Patrick
Re: gdb crashes on current
On Wed, Mar 20, 2024 at 11:33:30PM +0500, Vitaly Shevtsov wrote:
> Hello!
>
> It seems that gdb from base NetBSD image doesn't work with netbsd's
> libcurses in tui mode:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x756547929fba in _lwp_kill () from /usr/lib/libc.so.12
> [Current thread is 1 (process 12621)]
> (gdb) bt
> #0  0x756547929fba in _lwp_kill () from /usr/lib/libc.so.12
> #1  0x0101ef97 in handle_fatal_signal(int) ()
> #2  0x0101f152 in handle_sigsegv(int) ()
> #3
> #4  0x756547e679e1 in prefresh () from /usr/lib/libcurses.so.9
> #5  0x00eec1a0 in tui_source_window_base::refresh_window() ()
> #6  0x00eedf18 in tui_unhighlight_win(tui_win_info*) ()
> #7  0x00eec5f8 in
> tui_source_window_base::do_erase_source_content(char const*) ()
> #8  0x00ef60e5 in tui_layout_split::apply(int, int, int, int, bool) ()
> #9  0x00ef4aa8 in tui_apply_current_layout(bool) ()
> #10 0x00ef4ddb in tui_set_layout(tui_layout_split*) ()
> #11 0x00ee4f2f in tui_enable() ()
> #12 0x00ee5487 in tui_rl_switch_mode(int, int) ()
> #13 0x01379fa8 in _rl_dispatch_subseq ()
> #14 0x0137aa27 in _rl_dispatch_callback ()
> #15 0x0135a1f1 in rl_callback_read_char ()
> #16 0x0101f40e in gdb_rl_callback_read_char_wrapper_noexcept() ()
> #17 0x0102020d in gdb_rl_callback_read_char_wrapper(void*) ()
> #18 0x0101eeb1 in stdin_event_handler(int, void*) ()
> #19 0x01300eba in gdb_wait_for_event(int) [clone .part.0] ()
> #20 0x0130155a in gdb_do_one_event(int) ()
> #21 0x00fa0576 in captured_command_loop() ()
> #22 0x00fa2113 in gdb_main(captured_main_args*) ()
> #23 0x013c66fb in main ()

Just had a go, and "tui enable" doesn't get as far as libcurses

(gdb) bt
#0  0x7f7ff78dfffa in ?? ()
#1  0x00222b45 in gdb_rl_callback_handler ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/event-top.c:262
#2  0x00222cfa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count ()
    at /usr/export/amd64/usr/include/g++/bits/shared_ptr_base.h:1070
#3  std::__shared_ptr, std::allocator >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr ()
    at /usr/export/amd64/usr/include/g++/bits/shared_ptr_base.h:1524
#4  std::shared_ptr, std::allocator > >::~shared_ptr ()
    at /usr/export/amd64/usr/include/g++/bits/shared_ptr.h:175
#5  gdb_exception::~gdb_exception ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdbsupport/common-exceptions.h:114
#6  gdb_rl_callback_read_char_wrapper_noexcept ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/event-top.c:212
#7  0x7f7ff78e00b0 in ?? ()
#8  0x0001000b in ?? ()
#9  0x in ?? ()

but "gdb -tui" does:

Thread 1 "" received signal SIGSEGV, Segmentation fault.
prefresh (pad=0x0, pbegy=0, pbegx=0, sbegy=1, sbegx=5, smaxy=0, smaxx=78)
    at /usr/src/lib/libcurses/refresh.c:511
511             pad->pbegy = pbegy;
(gdb) bt
#0  prefresh (pad=0x0, pbegy=0, pbegx=0, sbegy=1, sbegx=5, smaxy=0, smaxx=78)
    at /usr/src/lib/libcurses/refresh.c:511
#1  0x000f60ff in tui_source_window_base::refresh_window ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:267
#2  0x000f7e68 in tui_unhighlight_win ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-wingeneral.c:138
#3  tui_unhighlight_win ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-wingeneral.c:131
#4  0x000f6552 in tui_source_window_base::do_erase_source_content ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:219
#5  0x0010008b in tui_layout_split::apply ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:1019
#6  0x000fe927 in tui_apply_current_layout ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:81
#7  0x000fec65 in tui_set_layout ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:150
#8  0x000f1f2f in tui_enable ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui.c:452
#9  0x001c319c in interp_set ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/interps.c:191
#10 0x001a7c55 in captured_main_1 ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1145
#11 0x001a8b0f in captured_main ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1320
#12 gdb_main ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1345
#13 0x005be5cf in main ()
    at /usr/src/external/gpl3/gdb/bin/gdb/../../dist/gdb/gdb.c:32
(gdb) print pad
$1 = (WINDOW *) 0x0
(gdb) frame 1
#1  0x000f60ff in tui_source_window_base::refresh_window ()
    at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:267
267
Re: binary in dtrace output
On Wed, Dec 27, 2023 at 12:15:36AM +0000, RVP wrote:
> On Fri, 8 Dec 2023, Patrick Welche wrote:
>
> > When profiling my simulation with
> >
> > dtrace -x ustackframes=100 -n 'profile-9 / execname == "mds" / {
> > @[ustack()] = count(); } tick-180s { exit(0); }' -o /tmp/out.stacks
> >
> > every function name is preceded by c0 df ff ff 7f 7f e.g.,
> >
> > 0060 20 20 20 20 20 20 20 20 20 c0 df ff ff 7f 7f 60 |
> > ..`|
> > 0070 66 5f 70 61 69 72 5f 44 4c 56 4f 28 61 74 6f 6d |f_pair_DLVO(atom|
> >
> > which is making perl, i.e., FlameGraph unhappy.
> >
> > 6 bytes seems a strange number.
> >
> > Is this normal / any hints on what I can do about it?
> >
>
> This looks like junk on the stack:
>
> src/external/cddl/osnet/dist/lib/libdtrace/common/dt_consume.c:dt_print_ustack()
> doesn't initialize `objname' and doesn't check the return of proc_objname()
> (line 1385) either.
>
> Can you try this patch (fudged from FreeBSD)?

Thanks - it works a treat!

Cheers,

Patrick
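The bug pattern RVP describes (an uninitialised name buffer printed whether or not the lookup succeeded) and its general fix can be shown in isolation. This is not RVP's patch and not libdtrace code: `lookup_objname()` is a hypothetical stand-in for a lookup such as proc_objname(), and the "module`function" format simply mirrors the ustack lines in the hexdump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an object-name lookup: returns NULL on failure
 * and leaves buf untouched, which is exactly the case that leaks stack junk.
 */
static const char *
lookup_objname(unsigned long pc, char *buf, size_t len)
{
	if (pc < 0x1000)	/* pretend low addresses have no mapping */
		return NULL;
	snprintf(buf, len, "mds");
	return buf;
}

/*
 * Format one stack frame as "module`function", or just "function" when the
 * module name is unknown.  The fix pattern: initialise the buffer and check
 * the lookup's return value instead of printing whatever was on the stack.
 */
static void
format_frame(unsigned long pc, const char *func, char *out, size_t outlen)
{
	char objname[64] = "";	/* initialised: never print stack junk */

	assert(out != NULL && outlen > 0);
	if (lookup_objname(pc, objname, sizeof(objname)) != NULL &&
	    objname[0] != '\0')
		snprintf(out, outlen, "%s`%s", objname, func);
	else
		snprintf(out, outlen, "%s", func);
}
```

Without the initialiser and the return check, the failing case would print whatever bytes happened to be in `objname`, followed by the backtick.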
binary in dtrace output
When profiling my simulation with

dtrace -x ustackframes=100 -n 'profile-9 / execname == "mds" / { @[ustack()] = count(); } tick-180s { exit(0); }' -o /tmp/out.stacks

every function name is preceded by c0 df ff ff 7f 7f e.g.,

0060 20 20 20 20 20 20 20 20 20 c0 df ff ff 7f 7f 60 | ..`|
0070 66 5f 70 61 69 72 5f 44 4c 56 4f 28 61 74 6f 6d |f_pair_DLVO(atom|

which is making perl, i.e., FlameGraph unhappy.

6 bytes seems a strange number.

Is this normal / any hints on what I can do about it?

Cheers,

Patrick
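One possible reading of why it is exactly 6 bytes, offered as a guess rather than a diagnosis: read least-significant byte first, as amd64 stores integers, `c0 df ff ff 7f 7f` is 0x7f7fffffdfc0, which looks like a user-space stack address. A 64-bit pointer with that value has two zero high bytes, so if it were sitting in an uninitialised char buffer, the buffer would read as a 6-byte string followed by a NUL, and then the backtick separator. The helper below only checks the arithmetic. `le_bytes_to_u64()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble up to 8 bytes, least significant first (little-endian, as on amd64). */
static uint64_t
le_bytes_to_u64(const unsigned char *b, size_t n)
{
	uint64_t v = 0;

	assert(n <= 8);
	for (size_t i = n; i > 0; i--)
		v = (v << 8) | b[i - 1];
	return v;
}
```

Applied to the six bytes from the hexdump, it yields 0x7f7fffffdfc0.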
Re: gcc 12 question
On Thu, Nov 23, 2023 at 12:31:34PM +0000, Robert Swindells wrote:
>
> Patrick Welche wrote:
> > I'm trying to build a release on amd64 using
> >
> > HAVE_MESA_VER=21
> > HAVE_GCC=12
>
> What does pkgsrc graphics/MesaLib do if built using gcc 12?

It builds OK. Given

https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716

my guess is that the pkgsrc package doesn't treat warnings as errors.
(-Werror=stringop-overread)

Cheers,

Patrick
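An alternative to relaxing -Werror for the whole package is to silence the suspected false positive only at the offending call. The following is a sketch using GCC's diagnostic pragmas. `unswizzle_4f()` and `struct fmt_desc` are hypothetical stand-ins shaped like the Mesa prototype quoted in this thread, not Mesa's code, and nothing here is what xsrc or pkgsrc actually does. The version check reflects that -Wstringop-overread first appeared in GCC 11.

```c
#include <assert.h>

/* Simplified stand-in: indices 0..3 pick a source channel, others give 0. */
static void
unswizzle_4f(float *dst, const float *src, const unsigned char swz[4])
{
	for (int i = 0; i < 4; i++)
		dst[i] = swz[i] < 4 ? src[swz[i]] : 0.0f;
}

struct fmt_desc {		/* hypothetical, mirrors "unsigned char swizzle[4]" */
	unsigned char swizzle[4];
};

static void
apply_swizzle(float *dst, const float *src, const struct fmt_desc *desc)
{
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overread"
#endif
	/* Only this call has the warning suppressed; -Werror stays on elsewhere. */
	unswizzle_4f(dst, src, desc->swizzle);
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#pragma GCC diagnostic pop
#endif
}
```

The push/pop pair keeps the suppression scoped, so a genuine overread elsewhere in the file would still fail the build.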
Re: gcc 12 question
On Thu, Nov 23, 2023 at 09:22:11AM +0000, Patrick Welche wrote:
> I'm trying to build a release on amd64 using
>
> HAVE_MESA_VER=21
> HAVE_GCC=12
>
> and get the following build error which I am puzzled by:
>
>     inlined from 'r300_merge_textures_and_samplers' at /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:823:17,
>     inlined from 'r300_update_derived_state' at /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:1064:9:
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5: error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 [-Werror=stringop-overread]
>   676 | util_format_unswizzle_4f(border_swizzled, border, desc->swizzle);
>       | ^~~~
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5: note: referencing argument 3 of type 'const unsigned char[4]'
> In file included from /usr/xsrc/external/mit/MesaLib/dist/src/gallium/auxiliary/util/u_pack_color.h:40,
>                  from /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:28:
> /usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h: In function 'r300_update_derived_state':
> /usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h:1671:6: note: in a call to function 'util_format_unswizzle_4f'
>  1671 | void util_format_unswizzle_4f(float *dst, const float *src,
>       | ^~~~
>
>
> desc is a const struct util_format_description *
>
> util/format/u_format.h:226 defines the swizzle member of
> util_format_description as:
>
>    unsigned char swizzle[4];
>
> and the truncated quote of util/format/u_format.h:1671 is
>
> void util_format_unswizzle_4f(float *dst, const float *src,
>                               const unsigned char swz[4]);
>
> so all is consistent.
>
> error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 [-Werror=stringop-overread]
>
> Assuming "a region" is the 3rd argument, then how can the size of an
> unsigned char[4] be zero?
>
> Puzzled...

Seems others were puzzled too:

https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716

Cheers,

Patrick
gcc 12 question
I'm trying to build a release on amd64 using

HAVE_MESA_VER=21
HAVE_GCC=12

and get the following build error which I am puzzled by:

    inlined from 'r300_merge_textures_and_samplers' at /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:823:17,
    inlined from 'r300_update_derived_state' at /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:1064:9:
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5: error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 [-Werror=stringop-overread]
  676 | util_format_unswizzle_4f(border_swizzled, border, desc->swizzle);
      | ^~~~
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5: note: referencing argument 3 of type 'const unsigned char[4]'
In file included from /usr/xsrc/external/mit/MesaLib/dist/src/gallium/auxiliary/util/u_pack_color.h:40,
                 from /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:28:
/usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h: In function 'r300_update_derived_state':
/usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h:1671:6: note: in a call to function 'util_format_unswizzle_4f'
 1671 | void util_format_unswizzle_4f(float *dst, const float *src,
      | ^~~~


desc is a const struct util_format_description *

util/format/u_format.h:226 defines the swizzle member of
util_format_description as:

   unsigned char swizzle[4];

and the truncated quote of util/format/u_format.h:1671 is

void util_format_unswizzle_4f(float *dst, const float *src,
                              const unsigned char swz[4]);

so all is consistent.

error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 [-Werror=stringop-overread]

Assuming "a region" is the 3rd argument, then how can the size of an
unsigned char[4] be zero?

Puzzled...

Cheers,

Patrick
Re: ffmpeg6 and SSP?
On Wed, Nov 15, 2023 at 01:48:19PM +0200, Vitaly Shevtsov wrote:
> Even arcticfox cannot be built due to the same reason.

Christos fixed it - cvs update and rebuild, and check you have

# nm -g /lib/libc.so | grep ssp
00055136 T __ssp_protected_getcwd
0005512c T __ssp_protected_read
00055131 T __ssp_protected_readlink
0007cc3a T _getfsspec
0007cc3a W getfsspec
0019822f T isspace
00198245 T isspace_l
0004afb7 T wcsspn

Cheers,

Patrick
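The `nm -g` check above can also be automated, for example in a test that runs after updating libc. This is a sketch: `nm_defines()` is a hypothetical helper, not a NetBSD tool. It takes text in the `<value> <type> <name>` format shown above, which you would capture from `nm -g /lib/libc.so`, and reports whether a symbol is defined there. An undefined reference (nm type `U`) is not counted as a definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `sym` appears in nm-style text with a defining type
 * (T/t text, D/d data, W/w weak), 0 otherwise.  Lines that do not parse as
 * "<value> <type> <name>", such as "U <name>" lines, never match.
 */
static int
nm_defines(const char *nm_text, const char *sym)
{
	const char *line = nm_text;

	assert(nm_text != NULL && sym != NULL);
	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol != NULL ? (size_t)(eol - line) : strlen(line);
		char buf[192], value[32], name[128], type;

		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%31s %c %127s", value, &type, name) == 3 &&
			    strchr("TtDdWw", type) != NULL && strcmp(name, sym) == 0)
				return 1;
		}
		line = eol != NULL ? eol + 1 : line + len;
	}
	return 0;
}
```

Fed the "Before" listing from the thread it returns 0 for `__ssp_protected_read`; fed the "After" listing it returns 1.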
Re: ffmpeg6 and SSP?
On Tue, Nov 14, 2023 at 11:30:27AM +0000, Patrick Welche wrote:
> On Tue, Nov 14, 2023 at 10:32:01AM +0000, Patrick Welche wrote:
> > On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> > > I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> > > with HAVE_GCC=12 and pkgsrc-current I'm seeing
> > >
> > > => Bootstrap dependency digest>=20211023: found digest-20220214
> > > ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> > > ===> Building for ffmpeg6-6.0nb6
> > > LD ffmpeg6_g
> > > LD ffprobe6_g
> > > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> > > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> > > ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> > > ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> > > gmake: *** [Makefile:131: ffprobe6_g] Error 1
> > > gmake: *** Waiting for unfinished jobs
> > > gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> > > *** Error code 2
> > >
> > >
> > > Suggestions? Try no FORTIFY?
> >
> > I tried "no FORTIFY" on ffmpeg6 as
> >
> > CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""
> >
> > which didn't help.
> >
> > I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
> > which didn't help.
> >
> > I also see the problem with the simpler lang/gawk package:
> >
> > ld: awkgram.o: in function `get_src_buf':
> > awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `iop_alloc':
> > io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `get_a_record':
> > io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `after_beginfile':
> > io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `redirect_string':
> > io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
> > ld: io.o:io.c:(.text+0x5606): more undefined references to `__ssp_protected_read' follow
> >
> > If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
> > the definition of __ssp_inline and make it static again, then gawk builds,
> >
> > i.e., reverting
> >
> > -/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $ */
> > +/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $ */
> >
> > allows gawk to build.
>
> Userland was built with MKUPDATE=yes - maybe I didn't rebuild whichever
> library should contain the extern definition of __ssp_protected_read ?
>
> git grep ssp_protected_read
>
> on https://github.com/NetBSD/src.git returned nothing - where should
> the __ssp_protected_read symbol live?

Thank you to Christos for putting the symbol in libc today with the
addition of ssp_redirect.c!

Before:

$ nm -g libc.so.12.221 | grep ssp
0007bb8a T _getfsspec
0007bb8a W getfsspec
0019717f T isspace
00197195 T isspace_l
00049f67 T wcsspn

After:

$ nm -g libc.so.12.221 | grep ssp
00055136 T __ssp_protected_getcwd
0005512c T __ssp_protected_read
00055131 T __ssp_protected_readlink
0007cc3a T _getfsspec
0007cc3a W getfsspec
0019822f T isspace
00198245 T isspace_l
0004afb7 T wcsspn

Cheers,

Patrick
SSP
Talking of SSP, what can you do once a detection happens? I see in
/var/log/messages:

Nov 15 06:59:32 mail -: mail.example.com exim - - - stack overflow detected; terminated

I have:

kern.coredump.setid.dump = 1
kern.coredump.setid.path = /var/crash/%n.core
proc.curproc.rlimit.coredumpsize.soft = unlimited
proc.curproc.rlimit.coredumpsize.hard = unlimited

but /var/crash is empty. How do you make use of SSP?

Cheers,

Patrick
Re: ffmpeg6 and SSP?
On Tue, Nov 14, 2023 at 10:32:01AM +0000, Patrick Welche wrote:
> On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> > I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> > with HAVE_GCC=12 and pkgsrc-current I'm seeing
> >
> > => Bootstrap dependency digest>=20211023: found digest-20220214
> > ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> > ===> Building for ffmpeg6-6.0nb6
> > LD ffmpeg6_g
> > LD ffprobe6_g
> > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> > ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> > ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> > gmake: *** [Makefile:131: ffprobe6_g] Error 1
> > gmake: *** Waiting for unfinished jobs
> > gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> > *** Error code 2
> >
> >
> > Suggestions? Try no FORTIFY?
>
> I tried "no FORTIFY" on ffmpeg6 as
>
> CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""
>
> which didn't help.
>
> I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
> which didn't help.
>
> I also see the problem with the simpler lang/gawk package:
>
> ld: awkgram.o: in function `get_src_buf':
> awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `iop_alloc':
> io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `get_a_record':
> io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `after_beginfile':
> io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `redirect_string':
> io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
> ld: io.o:io.c:(.text+0x5606): more undefined references to `__ssp_protected_read' follow
>
> If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
> the definition of __ssp_inline and make it static again, then gawk builds,
>
> i.e., reverting
>
> -/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $ */
> +/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $ */
>
> allows gawk to build.

Userland was built with MKUPDATE=yes - maybe I didn't rebuild whichever
library should contain the extern definition of __ssp_protected_read ?

git grep ssp_protected_read

on https://github.com/NetBSD/src.git returned nothing - where should
the __ssp_protected_read symbol live?

Cheers,

Patrick
Re: ffmpeg6 and SSP?
On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> with HAVE_GCC=12 and pkgsrc-current I'm seeing
>
> => Bootstrap dependency digest>=20211023: found digest-20220214
> ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> ===> Building for ffmpeg6-6.0nb6
> LD ffmpeg6_g
> LD ffprobe6_g
> ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> gmake: *** [Makefile:131: ffprobe6_g] Error 1
> gmake: *** Waiting for unfinished jobs
> gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> *** Error code 2
>
>
> Suggestions? Try no FORTIFY?

I tried "no FORTIFY" on ffmpeg6 as

CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""

which didn't help.

I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
which didn't help.

I also see the problem with the simpler lang/gawk package:

ld: awkgram.o: in function `get_src_buf':
awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
ld: io.o: in function `iop_alloc':
io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
ld: io.o: in function `get_a_record':
io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
ld: io.o: in function `after_beginfile':
io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
ld: io.o: in function `redirect_string':
io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
ld: io.o:io.c:(.text+0x5606): more undefined references to `__ssp_protected_read' follow

If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
the definition of __ssp_inline and make it static again, then gawk builds,

i.e., reverting

-/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $ */
+/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $ */

allows gawk to build.

Cheers,

Patrick
Re: blocklist puzzle
On Sun, Feb 19, 2023 at 09:52:24AM +0100, J. Hannken-Illjes wrote:
> > On 18. Feb 2023, at 23:34, Patrick Welche wrote:
> >
> > 12 hours after rebooting
> >
> > # npfctl rule blocklistd list
> > block in final family inet4 proto tcp from 61.177.173.35/32 to any port 22 # id="1"
> > #
> >
> > contains a single block, yet /var/log/messages is full:
> >
> > Feb 18 17:47:44 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 172800 seconds
> > Feb 18 18:18:00 mail blocklistd[596]: released 171.225.184.179/32:22 after 172800 seconds
> > Feb 18 18:18:07 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 172800 seconds
> > Feb 18 18:35:18 mail blocklistd[596]: blocked 31.41.244.124/32:22 for 172800 seconds
> > Feb 18 18:48:10 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 172800 seconds
> > Feb 18 19:18:02 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 172800 seconds
> > Feb 18 20:18:13 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 172800 seconds
> > Feb 18 20:47:46 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 172800 seconds
> > Feb 18 21:17:48 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 172800 seconds
> > Feb 18 21:47:55 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 172800 seconds
> >
> >
> >
> > If something were misconfigured, I would expect no hosts in the ruleset,
> > rather than some (or one). How can this work partially?
> >
> > extract of npf.conf:
> >
> > group "external" on $ext_if {
> >    pass stateful out final all
> >
> >    ruleset "blocklistd"
> >
> > ...
>
> Looks like your ruleset "blocklistd" never fires as the rule above is "final all".

I thought this would only apply to packets on their way out, whereas the
blocking should happen on the way in? npf.conf(5) gives the example:

group "external" on $ext_if {
    pass stateful out final all

    block in final from ...

which suggests that it should work? "npfctl rule blocklistd list" also
lists more hosts today, so it at least works _sometimes_.

The puzzle is this apparent _sometimes_ - I would expect an empty list if
this were misconfigured, which is why I can't guess where to look for the
problem.

Cheers,

Patrick
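To make the rule-ordering question concrete, here is a deliberately simplified model of direction-aware, "final"-aware rule evaluation. It is an assumption-laden sketch, not NPF's implementation: it ignores stateful tracking, tables and rulesets, and it uses a default of PASS. It shows the point Patrick raises: under these assumptions a rule restricted to outbound traffic cannot end evaluation for an inbound packet, however it is marked. `evaluate()` and `struct rule` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum dir { DIR_IN = 1, DIR_OUT = 2, DIR_ANY = 3 };
enum action { PASS, BLOCK };

struct rule {
	enum dir dir;		/* directions the rule applies to */
	int final;		/* stop evaluating when this rule matches */
	enum action action;
	unsigned src;		/* 0 = any source, else must equal packet source */
};

/*
 * Simplified model: walk the rules in order; the last matching rule decides,
 * unless a matching rule is final, which decides immediately.  A rule for
 * the other direction never matches.  Default (no match) is PASS.
 */
static enum action
evaluate(const struct rule *rules, size_t n, enum dir pkt_dir, unsigned pkt_src)
{
	enum action result = PASS;

	assert(pkt_dir == DIR_IN || pkt_dir == DIR_OUT);
	for (size_t i = 0; i < n; i++) {
		const struct rule *r = &rules[i];

		if ((r->dir & pkt_dir) == 0)
			continue;	/* wrong direction: not a match */
		if (r->src != 0 && r->src != pkt_src)
			continue;
		result = r->action;
		if (r->final)
			break;
	}
	return result;
}
```

With `{ DIR_OUT, final, PASS, any }` followed by `{ DIR_IN, final, BLOCK, host 42 }`, an inbound packet from host 42 is blocked in this model. So if real blocks are being missed, the cause would have to lie outside what the model covers (state, ruleset loading, or blocklistd itself).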
blocklist puzzle
I see in /var/log/messages (NetBSD-10.99.2/XEN3_DOMU/amd64):

...
Feb 18 00:19:16 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 00:49:33 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 01:18:58 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 01:49:45 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 02:18:50 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 02:49:23 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 03:49:05 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 04:18:15 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 04:49:27 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 05:18:16 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 05:49:14 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 06:48:01 mail blocklistd[625]: blocked 195.226.194.142/32:22 fo

172800 seconds = 48 hours, so the hourly attempt shouldn't make it.

# npfctl rule blocklistd list | grep 195.226
#

but npf doesn't appear to be blocking it, though some are blocked:

# npfctl rule blocklistd list
block in final family inet4 proto tcp from 179.60.147.157/32 to any port 22 # id="d"
block in final family inet4 proto tcp from 171.225.184.179/32 to any port 22 # id="f"
block in final family inet4 proto tcp from 113.249.95.65/32 to any port 22 # id="10"
...

(I noticed while wondering why mail to said domu stops being received,
which seems to happen every 4 days.)

Thoughts?

Cheers,

Patrick
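The "48 hours, so the hourly attempt shouldn't make it" reasoning suggests a simple log check: any address that blocklistd blocks again well inside its own block window points to the first block not having taken effect. The following is a sketch of such a check. `count_reblocks()` is hypothetical. It assumes the exact log-line layout quoted above and that all lines fall on the same day. It only counts suspicious repeats; it does not say why NPF let them through.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ADDRS 64

/*
 * Count "blocked" lines whose address was already blocked less than N
 * seconds earlier, where N is the "for N seconds" value on the later line.
 * Expected format (same day only):
 *   "Mon DD HH:MM:SS host blocklistd[pid]: blocked ADDR for N seconds"
 */
static int
count_reblocks(const char *const *lines, size_t n)
{
	char addr[MAX_ADDRS][64];
	long when[MAX_ADDRS];
	size_t naddr = 0;
	int reblocks = 0;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int hh, mm, ss;
		char ip[64];
		long dur, t;
		size_t j;

		if (sscanf(lines[i], "%*s %*d %d:%d:%d %*s %*s blocked %63s for %ld seconds",
		    &hh, &mm, &ss, ip, &dur) != 5)
			continue;	/* not a complete "blocked" line */
		t = hh * 3600L + mm * 60L + ss;

		for (j = 0; j < naddr; j++)
			if (strcmp(addr[j], ip) == 0)
				break;
		if (j < naddr) {
			if (t - when[j] < dur)
				reblocks++;	/* blocked again inside its window */
			when[j] = t;
		} else if (naddr < MAX_ADDRS) {
			memcpy(addr[naddr], ip, strlen(ip) + 1);
			when[naddr++] = t;
		}
	}
	return reblocks;
}
```

On the first four lines of the log above it reports 2: each of 195.226.194.142 and 195.226.194.242 is blocked again about half an hour after its first 172800-second block.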
uvm_pagelookup panic, no bt
Just had this panic on -current/amd64, but it doesn't make for much of a bug report:

Crash version 10.99.2, image version 10.99.2.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: kernel diagnostic assertion "uvm_pagelookup(>u_obj, (UAO_SWHASH_ELT_PAGEIDX_BASE(elt) + j) << PAGE_SHIFT) == NULL" failed: file "../../../../uvm/uvm_aobj.c", line 1364
Backtrace from time of crash is available.
crash> bt
end() at 0
crash: _kvm_kvatop(b610cec18aa8)
crash: kvm_read(0xb610cec18aa8, 8): could not read PT level 3 entry: Invalid argument

I'll try running with LOCKDEBUG for a bit...

Cheers,
Patrick
Re: ctfmerge i/o error
On Wed, Dec 14, 2022 at 09:15:40PM -, Christos Zoulas wrote: > In article , > Patrick Welche wrote: > >While trying to build a release, I am having trouble trying to make > >GENERIC_KASLR.debug, so manually, I tried > > > >cd /usr/obj/sys/arch/amd64/compile.amd64/GENERIC_KASLR > >make clean > >make dependall > > > >and repeatedly get > > > ># link GENERIC_KASLR/netbsd > >ld -Map netbsd.map --cref -T netbsd.ldscript -Ttext 0x8020 > >-e start --split-by-file=0x10 -r -d -X -o netbsd > >${SYSTEM_OBJ:[@]:Nswapnetbsd.o} ${EXTRA_OBJ} vers.o swapnetbsd.o > >NetBSD 9.99.108 (GENERIC_KASLR) #4: Wed Dec 14 10:40:49 CST 2022 > > text data bss dec hex filename > >21686890 686136 466512 22839538 15c80f2 netbsd > >ERROR: ctfmerge: netbsd.ctf: Cannot finalize temp file: I/O error: > >Operation already in progress > >*** Error code 1 > > > >which is not a message I recognize. FWIW /usr/obj is a ZFS filesystem, but > >this hasn't caused trouble so far... > > > >Thoughts? > > Can you ktrace to find out what syscall caused EALREADY. zfs uses EALREADY > for TX_WRITE, when a block is already being synced according to my quick > glance to the code. I am not sure how this stuff is supposed to work, but > I don't think that this error is supposed to be returned by filesystem > related syscalls, but only for connect(2)? And of course now it is no longer reproducible. I note that that ZFS partition now has more free space than earlier, so I will guess the error message might mean "out of space"... Thanks, Patrick
ctfmerge i/o error
While trying to build a release, I am having trouble making GENERIC_KASLR.debug, so I tried it manually:

cd /usr/obj/sys/arch/amd64/compile.amd64/GENERIC_KASLR
make clean
make dependall

and repeatedly get

# link GENERIC_KASLR/netbsd
ld -Map netbsd.map --cref -T netbsd.ldscript -Ttext 0x8020 -e start --split-by-file=0x10 -r -d -X -o netbsd ${SYSTEM_OBJ:[@]:Nswapnetbsd.o} ${EXTRA_OBJ} vers.o swapnetbsd.o
NetBSD 9.99.108 (GENERIC_KASLR) #4: Wed Dec 14 10:40:49 CST 2022
   text    data     bss      dec     hex filename
21686890  686136  466512 22839538 15c80f2 netbsd
ERROR: ctfmerge: netbsd.ctf: Cannot finalize temp file: I/O error: Operation already in progress
*** Error code 1

which is not a message I recognize. FWIW /usr/obj is a ZFS filesystem, but this hasn't caused trouble so far...

Thoughts?

Cheers,
Patrick
Re: Usable Notebook for NetBSD-current wanted
On Fri, Sep 09, 2022 at 06:34:03PM +0200, Frank Kardel wrote: > I have seen quite a bit of work in the drm/X area - thanks for that. I was > hoping that i915 was a common denominator that would allow many Notebooks to > work. I think i have learned now that i915 is many critters all alike or > different so things don't always work as expected. Not sure of the definition of notebook vs laptop - I got a Dell Latitude 7300 when it was the current model in 2020 (so not that long ago!) and use it daily running NetBSD-current. The only "glitch" you already know about(!): https://mail-index.netbsd.org/current-users/2022/07/21/msg042712.html and it rarely manifests itself, as often there is either cpu load, or a video running, oh, and I use a USB wifi dongle. Cheers, Patrick
raspberry pi zero W serial port overlay fun
[I posted this to port-arm around 4th July, but it hasn't made it through. Reposting here in case it is useful...]

tl;dr On a raspberry pi zero W, updating the firmware allows the disable-bt overlay to work, resulting in a stable serial console.

Experimental method(!)
- grab https://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202207031950Z/evbarm-earmv6hf/binary/gzimg/rpi.img.gz
- gunzip / dd to card
- remove "console=fb" from the first line of cmdline.txt
- connect a raspberry pi zero W 1.1 to the serial port via pins 4,6,8,10 and boot
- watch the output with tip & baud rate 115200
- all is well until /etc/rc.local is run and changes the cpu frequency, which messes up the baud rate: by default the raspberry pi zero w reserves its real UART for bluetooth and attaches a "mini" UART to the console, and this "mini" UART doesn't appear to have its own clock:

...
[ 1.000] simplebus0 at armfdt0: Raspberry Pi Zero W Rev 1.1
...
[ 1.000] plcom0 at simplebus1: ARM PL011 UART
...
[ 1.000] com0 at simplebus1: BCM AUX UART, 1-byte FIFO
[ 1.000] com0: console
...
Starting local daemons:.
JRQ ܊VP�҇KZ�� � ���Q

At this point, we can either:
1) delete /etc/rc.local
2) try to make use of the "real" UART by disabling bluetooth

1) works, 2) would be nice.
2):
- create an overlays directory in the SD card's /boot partition:
  mount /dev/sd0e /mnt
  cd /mnt
  mkdir overlays
  cp /tmp/firmware/boot/overlays/disable-bt.dtbo overlays
- add "dtoverlay=disable-bt" to config.txt
- boot, but no change:
  [ 1.000] com0 at simplebus1: BCM AUX UART, 1-byte FIFO
  [ 1.000] com0: console
- grab the precompiled raspberry pi firmware:
  cd /tmp
  git clone --depth=1 https://github.com/raspberrypi/firmware.git
  Compare disable-bt.dtbo to the rpi.img dtb using file:
  /tmp/firmware/boot/overlays/disable-bt.dtbo: Device Tree Blob version 17, size=1073, boot CPU=0, string block size=145, DT structure block size=872
  /mnt/dtb/bcm2835-rpi-zero-w.dtb: Device Tree Blob version 17, size=19566, boot CPU=0, string block size=1770, DT structure block size=16700
  Same version, so looking hopeful.
- replace the firmware:
  mount /dev/sd0e /mnt
  cd /mnt
  for sfx in elf bin dat; do
      rm *.${sfx}
      cp /tmp/firmware/boot/*.${sfx} .
  done
  rm dtb/*
  cp /tmp/firmware/boot/bcm2708-rpi-zero-w.dtb dtb
  NB: in rpi.img.gz the dtb was called bcm2835-rpi-zero-w.dtb, not bcm2708-rpi-zero-w.dtb, yet
  dtc -I dtb -O dts ./boot/overlays/disable-bt.dtbo
  shows
  /dts-v1/;
  / {
  compatible = "brcm,bcm2835";
  ...

Now, boot, and success!

[ 1.000] plcom0 at simplebus1: ARM PL011 UART
[ 1.000] plcom0: txfifo disabled
[ 1.000] plcom0: console

rpi# sysctl machdep.cpu.frequency
machdep.cpu.frequency.target = 1000
machdep.cpu.frequency.current = 1000
machdep.cpu.frequency.min = 700
machdep.cpu.frequency.max = 1000
machdep.cpu.frequency.available = 700 1000

and that is read over the working serial console.

BTW my impression is that there is no bluetooth support, so we are not missing anything by disabling it? (as opposed to miniuart-bt).

Cheers,
Patrick
Re: Branching for NetBSD 10
On Sat, Jun 04, 2022 at 05:16:34AM +, Thomas Mueller wrote: > My big concern with the branch is the entropy bug, where building many > packages from pkgsrc is stopped. Isn't "entropy bug" a misnomer for the interesting libpthread bug https://gnats.netbsd.org/56414 which has recently been fixed? Cheers, Patrick
Re: uvideo uvm_fault panic
On Sat, May 14, 2022 at 03:30:49PM +, Taylor R Campbell wrote: > > Date: Sat, 14 May 2022 15:14:50 + > > From: sc.dy...@gmail.com > > > > On 2022/05/14 13:47, Taylor R Campbell wrote: > > > Can you please try the two attached patches? > > > > > > 1. uvideobadstream.patch should fix the _crash_ when you try to open a > > >video stream on a device that the driver deemed to have a bad > > >descriptor. Try this one first, if you have time -- it prevents a > > >malicious USB device from causing this kernel crash. > > > > Yes, it fixes crash -- as it prevents attaching video(4)... > > > > > 2. uvideosizeof.patch should fix the sizeof calculations so that the > > >driver stops rejecting your device's descriptor. This one should > > >make your device work again. > > > > It works well again. > > Thanks, committed! Just logged into zoom.us without a panic :-) and now # videoctl -a videoctl: couldn't open '/dev/video0': Input/output error Does this dmesg look sensible? $ dmesg -t | grep video uvideo0 at uhub1 port 6 configuration 1 interface 0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 video0 at uvideo0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 video1 at uvideo0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 uvideo1 at uhub1 port 6 configuration 1 interface 2: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 video2 at uvideo1: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 video3 at uvideo1: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2 there is just the one built-in camera... Cheers, Patrick
Re: uvideo uvm_fault panic
On Fri, Apr 22, 2022 at 09:52:55AM +0100, Patrick Welche wrote: > Logged in to zoom.us using firefox on 9.99.96/amd64 from Tuesday and > got the following panic: ... > --- trap (number 6) --- > uvideo_open() at uvideo_open+0x17 > videoopen() at videoopen+0x71 ... Even easier reproducer: "videoctl -a" P
uvideo uvm_fault panic
Logged in to zoom.us using firefox on 9.99.96/amd64 from Tuesday and got the following panic:

crash> bt
end() at 0
kern_reboot() at sys_reboot
vpanic() at vpanic+0x181
panic() at printf_tolog
trap() at startlwp
--- trap (number 6) ---
uvideo_open() at uvideo_open+0x17
videoopen() at videoopen+0x71
cdev_open() at cdev_open+0x19c
spec_open() at spec_open+0x224
VOP_OPEN() at VOP_OPEN+0x36
vn_open() at vn_open+0x32e
do_open() at do_open+0xc3
do_sys_openat() at do_sys_openat+0x74
sys_open() at sys_open+0x24
syscall() at syscall+0x18c
--- syscall (number 5) ---
syscall+0x18c:
[ 4205.3040059] uvm_fault(0xa978ca4ba1b0, 0x0, 1) -> e
[ 4205.3040059] fatal page fault in supervisor mode
[ 4205.3040059] trap type 6 code 0 rip 0x803a8018 cs 0x8 rflags 0x10286 cr2 0 ilevel 0 rsp 0x8501501d2a30
[ 4205.3040059] curlwp 0xa978f1bce940 pid 2365.6641 lowest kstack 0x8501501ce2c0

(also surprised at the web site, as I had no intention of using the camera)
(gdb doesn't cope with the corefile)

crash> print uvideo_open
803a8001
uvideo_open:       movq  8(%rdi),%rdx
uvideo_open+0x4:   movl  28(%rdx),%ecx
uvideo_open+0x7:   testl %ecx,%ecx
uvideo_open+0x9:   jnz   803a8064
uvideo_open+0xb:   pushq %rbp
uvideo_open+0xc:   movq  %rsp,%rbp
uvideo_open+0xf:   subq  $0x30,%rsp
uvideo_open+0x13:  movq  40(%rdi),%rax
uvideo_open+0x17:  movq  0(%rax),%rcx
uvideo_open+0x1a:  movq  %rcx,ffd4(%rbp)
uvideo_open+0x1e:  movq  8(%rax),%rcx
uvideo_open+0x22:  movq  %rcx,ffdc(%rbp)
uvideo_open+0x26:  movq  10(%rax),%rcx
uvideo_open+0x2a:  movq  %rcx,ffe4(%rbp)
uvideo_open+0x2e:  movq  18(%rax),%rcx
uvideo_open+0x32:  movq  %rcx,ffec(%rbp)
uvideo_open+0x36:  movq  20(%rax),%rcx
uvideo_open+0x3a:  movq  %rcx,fff4(%rbp)
uvideo_open+0x3e:  movl  28(%rax),%eax
uvideo_open+0x41:  movl  %eax,fffc(%rbp)
uvideo_open+0x44:  movl  28(%rdx),%eax

 PID    LID S CPU FLAGS STRUCT LWP *    NAME         WAIT
2365 >6641  7   0   100 a978f1bce940    VideoCapture
   0 >213   7   3   240 a978c35ea8c0    usb1

Thoughts?

Cheers,
Patrick
Re: current - unable to rebuild gobject-introspection due to libffi
On Mon, Feb 14, 2022 at 03:13:41PM +0100, Riccardo Mottola wrote: > Hi Patrick, > > > Patrick Welche wrote: > >> Given the error below, there seems to be a failure right on the libffi > >> needed for libgirepository, but it is the one I am trying to replace, or > >> not? > > I think one cunning plan seen on this list was to look at the output from > > > > grep libffi.so.7 /usr/pkg/pkgdb/*/+BUILD_INFO > > the output is below. Sorry if it doesn't illuminate immediately. Does it > mean I should force a rebuild of another package first? That's what I thought - I assume you have libffi.so.8, so those packages will be in trouble - but none seem relevant. The other workaround seen on the mailing lists was to remove the installed gobject-introspection, then build it, which would explain why pbulk builds are happy. This might make sense as: ld: warning: libffi.so.7, needed by /usr/pkg/lib/libgirepository-1.0.so.1, not found (try using -rpath or -rpath-link) suggests an already installed gobject-introspection... (I think there is a way of not removing all the packages which depend on gobject-introspection) Cheers, Patrick
Re: current - unable to rebuild gobject-introspection due to libffi
On Mon, Feb 14, 2022 at 01:03:33PM +0100, Riccardo Mottola wrote: > I am on netbsd 9.99.93 and current pkgsrc tree. > After upgrading the whole base system, upgrading pkgsrc is also a good idea. > pkg_rolling-replace -uv is my usual friend. > > I always stop at gobject introspection, which is currently broken due to > libffi upgrade > > Given the error below, there seems to be a failure right on the libffi > needed for libgirepository, but it is the one I am trying to replace, or > not? I think one cunning plan seen on this list was to look at the output from grep libffi.so.7 /usr/pkg/pkgdb/*/+BUILD_INFO Cheers, Patrick
Re: ixg weirdness
On Wed, Dec 22, 2021 at 01:34:25PM +0100, Hauke Fath wrote: > On Wed, 22 Dec 2021 12:26:21 +0000, Patrick Welche wrote: > > The box in 53155 is Hauke's - also a Dell, but slightly different model. > > he@, not hauke@ -- no Dell boxes here. Sorry - Havard's! On the 53155 front, dholland asks if the 2 bnx hang issue is the same as 47229, and it looks like it. From the email threads quoted in 47229, the gist seems to be that the issue doesn't exist on /i386, just /amd64. Cheers, Patrick
Re: ixg weirdness
On Wed, Dec 22, 2021 at 12:05:38PM +0900, SAITOH Masanobu wrote: > On 2021/12/22 9:38, SAITOH Masanobu wrote: > > I will take a look of it. Thanks - reams of logs heading to you off-list... > If the machine is the same as kern/53155 I don't know why > the MSI-X allocation fails. If the fallback code in ixgbe.c > has a bug, it would worth to try the following diff: The box in 53155 is Hauke's - also a Dell, but slightly different model. Cheers, Patrick
ixg weirdness
On a box with 4 bnx and 4 ixg interfaces, I just hit PR kern/53155 when trying to use bnx1. (Built LOCKDEBUG etc kernel, with serial console. Hang such that ~# doesn't drop into ddb) No problems running as an NFS server for a year or two just using bnx0. (I didn't try "up"ing bnx2) So I tried swapping to use ixg0 and ixg1 instead. I see a strange bursty pattern with what looks like a 1s count down, e.g.: 64 bytes from 10.0.0.236: icmp_seq=642 ttl=255 time=37004.721972 ms 64 bytes from 10.0.0.236: icmp_seq=643 ttl=255 time=36004.533428 ms 64 bytes from 10.0.0.236: icmp_seq=644 ttl=255 time=35004.224479 ms 64 bytes from 10.0.0.236: icmp_seq=645 ttl=255 time=34003.925027 ms 64 bytes from 10.0.0.236: icmp_seq=646 ttl=255 time=33003.615239 ms 64 bytes from 10.0.0.236: icmp_seq=647 ttl=255 time=32003.313832 ms 64 bytes from 10.0.0.236: icmp_seq=648 ttl=255 time=31003.008233 ms 64 bytes from 10.0.0.236: icmp_seq=649 ttl=255 time=30002.702356 ms 64 bytes from 10.0.0.236: icmp_seq=650 ttl=255 time=29002.396480 ms 64 bytes from 10.0.0.236: icmp_seq=651 ttl=255 time=28002.090882 ms 64 bytes from 10.0.0.236: icmp_seq=652 ttl=255 time=27001.772992 ms 64 bytes from 10.0.0.236: icmp_seq=653 ttl=255 time=26001.477731 ms 64 bytes from 10.0.0.236: icmp_seq=654 ttl=255 time=25001.291421 ms 64 bytes from 10.0.0.236: icmp_seq=655 ttl=255 time=24000.965150 ms 64 bytes from 10.0.0.236: icmp_seq=656 ttl=255 time=23000.622398 ms 64 bytes from 10.0.0.236: icmp_seq=657 ttl=255 time=22000.278807 ms 64 bytes from 10.0.0.236: icmp_seq=658 ttl=255 time=20999.931305 ms 64 bytes from 10.0.0.236: icmp_seq=659 ttl=255 time=1.592463 ms 64 bytes from 10.0.0.236: icmp_seq=660 ttl=255 time=19009.253137 ms 64 bytes from 10.0.0.236: icmp_seq=661 ttl=255 time=18008.910105 ms 64 bytes from 10.0.0.236: icmp_seq=662 ttl=255 time=17008.551987 ms 64 bytes from 10.0.0.236: icmp_seq=663 ttl=255 time=16008.224040 ms 64 bytes from 10.0.0.236: icmp_seq=664 ttl=255 time=15007.874862 ms 64 bytes from 10.0.0.236: 
icmp_seq=665 ttl=255 time=14007.533506 ms 64 bytes from 10.0.0.236: icmp_seq=666 ttl=255 time=13007.194943 ms 64 bytes from 10.0.0.236: icmp_seq=667 ttl=255 time=12006.852469 ms 64 bytes from 10.0.0.236: icmp_seq=668 ttl=255 time=11006.509437 ms 64 bytes from 10.0.0.236: icmp_seq=669 ttl=255 time=10006.193223 ms 64 bytes from 10.0.0.236: icmp_seq=670 ttl=255 time=9005.846559 ms 64 bytes from 10.0.0.236: icmp_seq=671 ttl=255 time=8005.508556 ms 64 bytes from 10.0.0.236: icmp_seq=672 ttl=255 time=7005.165803 ms 64 bytes from 10.0.0.236: icmp_seq=673 ttl=255 time=6004.818579 ms 64 bytes from 10.0.0.236: icmp_seq=674 ttl=255 time=5004.479458 ms 64 bytes from 10.0.0.236: icmp_seq=675 ttl=255 time=4004.132514 ms 64 bytes from 10.0.0.236: icmp_seq=676 ttl=255 time=3003.794232 ms 64 bytes from 10.0.0.236: icmp_seq=677 ttl=255 time=2003.431084 ms 64 bytes from 10.0.0.236: icmp_seq=678 ttl=255 time=1003.103697 ms 64 bytes from 10.0.0.236: icmp_seq=679 ttl=255 time=2.761223 ms 64 bytes from 10.0.0.236: icmp_seq=717 ttl=255 time=6373.442427 ms 64 bytes from 10.0.0.236: icmp_seq=718 ttl=255 time=5373.238237 ms 64 bytes from 10.0.0.236: icmp_seq=719 ttl=255 time=4372.937388 ms 64 bytes from 10.0.0.236: icmp_seq=720 ttl=255 time=3372.631791 ms 64 bytes from 10.0.0.236: icmp_seq=721 ttl=255 time=2372.325913 ms 64 bytes from 10.0.0.236: icmp_seq=722 ttl=255 time=1372.006627 ms 64 bytes from 10.0.0.236: icmp_seq=723 ttl=255 time=371.714159 ms ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ... then eventually wakes up again when pinging to its ixg0 interface. You see the bursts while running tcpdump -ni ixg0. 
ixg0 at pci8 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k ixg0: device 82599EB ixg0: ETrackID 81a5 ixg0: autoconfiguration error: failed to allocate MSI-X interrupt ixg0: interrupting at ioapic1 pin 22 ixg0: Ethernet address 00:1b:21:9a:d4:84 ixg0: PHY: OUI 0x0014a6 model 0x0001, rev. 0 ixg0: PCI Express Bus: Speed 5.0GT/s Width x8 ixg0: feature cap 0x1780 ixg0: feature ena 0x1000 ixg0: flags=0x8843 mtu 1500 capabilities=0x7ff80 capabilities=0x7ff80 capabilities=0x7ff80 enabled=0 ec_capabilities=0xf ec_enabled=0x7 address: 00:1b:21:9a:d4:84 media: Ethernet autoselect (1000baseT full-duplex) status: active inet6 fe80::21b:21ff:fe9a:d484%ixg0/64 flags 0 scopeid 0x5 inet 10.0.0.236/24 broadcast 10.0.0.255 flags 0 This is with 15 December 2021 -current/amd64. Any ideas on what might be going on? Cheers, Patrick
Re: IDENTIFY failed
On Mon, Nov 08, 2021 at 08:42:44PM +0900, Rin Okuyama wrote: > Jun, Patrick, thank you for dmesg (and discussion offlist). > > For Jun, the problem is no longer reproducible even with the original > copy of kernel, which failed before. > > So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine: > > https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c > > With this patch, AHCISATA_EXTRA_DELAY option is no longer required for > this machine. I cvs updated, rebuilt the kernel without the DELAY, and checked that the problem still existed. (it does) Then applied your gist patch, and had a successful reboot! (I haven't tried reducing the delay) Thanks, Patrick
Re: IDENTIFY failed
On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote: > From: matthew green > Subject: re: IDENTIFY failed > Date: Fri, 29 Oct 2021 07:18:09 +1100 > > >> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for > >> > drive 0 > >> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html > > this one has reduced timeframe, too: > >> between > >> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK > >> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed > > which changed how some interrupt handling works, and: > >http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html > > which removed some delays in the probe path. possibly this one > > is more likely to be at fault since it touches the probe path > > directly. > > add > /usr/src/sys/arch/amd64/conf/GENERIC.local > options AHCISATA_EXTRA_DELAY > > compile kernel That did the trick - thanks! (Wanted to be near the box before trying it) Cheers, Patrick
IDENTIFY failed
Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure: wd1 at atabus1 drive 0 autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0 wd1: autoconfiguration error: IDENTIFY failed wd1(ahcisata0:1:0): using PIO mode 0 and booting fails. Reverting and booting with 9.99.90 gets me a working box: wd1 at atabus1 drive 0 wd1: wd1: drive supports 16-sector PIO transfers, LBA48 addressing wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect... ... wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags) I'm sure someone else saw this too, but I can't find the original post... Cheers, Patrick
Re: /bin/sh tabcomplete
On Tue, Sep 14, 2021 at 11:04:47AM -0400, Christos Zoulas wrote: > This is a side effect of the change to add file-completion for commands. > Fixed. > > christos > > > On Sep 14, 2021, at 6:24 AM, Robert Elz wrote: > > > >Date:Tue, 14 Sep 2021 09:19:36 +0100 > >From:Patrick Welche > >Message-ID: > > > > | It seems that after updating a box from August -current to yesterday's > > | -current, /bin/sh's tabcomplete no longer escapes spaces? > > > > If something changed there, it must be related to the way that libedit now > > works, Christos? That was quick - thanks for fixing! Patrick
/bin/sh tabcomplete
It seems that after updating a box from August -current to yesterday's -current, /bin/sh's tabcomplete no longer escapes spaces? Cheers, Patrick
Re: zpool import skips wedges due to a race condition
On Wed, Sep 08, 2021 at 06:38:02AM -, Michael van Elst wrote: > al...@yandex.ru (Alexander Nasonov) writes: > > >When I run zfs import, it launches 32 threads and opens 32 disks in > >parallel, including cgd1 and dk24. But it can't open dk24 while > >cgd1 is still open (it fails with EBUSY). > > >I fixed it in the attached patch by running only one thread. It's > >not the best approach but I'm not sure how to fix it properly. > > > There are other issues with scanning devices in an arbitrary order, > a parallel scan just makes it worse by adding randomness. > > LVM tries to solve this with an optional filter for device names > when scanning for physical volumes. > > The root detection code tries to solve this by scanning twice, once > for wedges, once for everything else, and by identifying wedges > that alias a partition. > > For a complete solution you would need to know all the device relationships > (dkX on wdY, cgdN on dmN, etc, but also e.g. dkX on cgdN). That still leaves > out hot-plug devices where "upper" devices appear late. I see something similar in another context: shutdown can stall on a system with dkX on cgdN, with the sequence: detaching dkX, detaching cgdN, detaching dkX (same X), then a hang. If the dice are rolled correctly, and cgdN gets the detach before dkX, it shuts down properly... Cheers, Patrick
Re: Build failure for evbarmv6hf due to new OpenSSH
On Fri, Sep 03, 2021 at 06:46:20PM +0900, Rin Okuyama wrote: > Build for evbarmv6hf{,eb} fails due to collision of symbol tilde_expand b/w > libedit and new libopenssh: I just hit this too, and was bemused by why the amd64 build wasn't affected: # find /usr/src/distrib/amd64 -name list | grep ssh # find /usr/src/distrib/evbarm -name list | grep ssh /usr/src/distrib/evbarm/instkernel/sshramdisk/list ... applying your patch locally! Cheers, Patrick
Re: 9.99.86 HEAD
On Wed, Jun 30, 2021 at 05:57:46PM +, David Holland wrote: > On Wed, Jun 30, 2021 at 04:10:17PM +0100, Patrick Welche wrote: > > I see what you mean: the next for me is urtwn0 unable to open /dev/bpf. > > This should be fixed now. All much better now - thanks all! P
Re: 9.99.86 HEAD
On Wed, Jun 30, 2021 at 03:49:52PM +0200, Martin Husemann wrote: > On Wed, Jun 30, 2021 at 02:23:23PM +0100, Patrick Welche wrote: > > The one bug I see is that running cgdconfig causes a panic > > > > --- trap (number 6) --- > > vn_open() at vn_open+0x31d > > That one is fixed now, but there is still serious fallout. Thanks, yes, after your fix cgdconfig doesn't cause a panic. I see what you mean: the next for me is urtwn0 unable to open /dev/bpf. Cheers, Patrick
Re: 9.99.86 HEAD
On Wed, Jun 30, 2021 at 12:11:43PM +, voidpin wrote: > On Wed, Jun 30, 2021 at 12:03:21PM +, voidpin wrote: > > I've just tried to upgrade from 9.99.85 to 9.99.86 and I'm unable to even > > boot the system. > > Several errors related to acpi, cpu-temp, ipv6 and fs. > > It just loops back to the welcome screen and tries to reboot. > > > > Am I missing some major change? I haven't seen anything hinting at this in > > the mailing lists. > > > > The last working install image is from 29-Jun-2021 00:25 > > The following three look broken to me. > > On the plus side, I've managed to run fsck from the live image (29-Jun-2021 > > 00:25), correct the file system errors and downgrade back to 9.99.85. > > 9.99.86 was a bigger update, there were three independent kernel bump > > changes. > > > > It was discussed on the developers mailing list. > > > > There have been at least two major bugs reported by martin@ already. > > > > Please report your issues to current-users@ as well. > > The last two images for 9.99.85 also look to be broken. > At least, they were unable to boot my system after the upgrade to 9.99.86 > when I was trying to downgrade. > > The one from 29-Jun-2021 00:25 did work and my system is back up. Not sure of cause-and-effect, but my first attempt at booting 9.99.86 just restarted during boot. make clean in the kernel compile directory and rebuilding got me a kernel which booted. The one bug I see is that running cgdconfig causes a panic --- trap (number 6) --- vn_open() at vn_open+0x31d vn_bdev_openpath() at vn_bdev_openpath+0x40 cgdioctl() at cgdioctl+0x5f3 VOP_IOCTL() at VOP_IOCTL+0x41 vn_ioctl() at vn_ioctl+0xad sys_ioctl() at sys_ioctl+0x555 syscall() at syscall+0x1f2 --- syscall (number 54) --- syscall+0x1f2: Cheers, Patrick
Re: dump/restore out of range inode
On Sat, Jun 05, 2021 at 06:45:24PM +0200, J. Hannken-Illjes wrote: > Patrick, > > please try the attached diff so the "spcl.c_addr" test > no longer runs off the spcl record. > > "blks" is used for multi-tape checkpointing and examining > TS_INODE/TS_ADDR records should be sufficient as they are > the only records that support holes in data. Thanks! With your patch, the dump | restore has been happily running for about 12 hours now. In your previous email you mention: > This trace makes no sense, bitmaps (CLRI and BITS) don't have holes > and therefore ignore the "c_addr" array. I have no idea how dumping > a bitmap ends in the hole processing of flushtape(). Is it worth investigating further while I have the reproducer? Cheers, Patrick
Re: dump/restore out of range inode
On Sat, Jun 05, 2021 at 10:03:21AM -, Michael van Elst wrote:
> pr...@cam.ac.uk (Patrick Welche) writes:
>
> >How can gdb not see a spcl anywhere?
>
> /usr/include/protocols/dumprestore.h:#define spcl u_spcl.s_spcl
>
> spcl is just a define that got resolved by the compiler.

ach... here it is:

(gdb) print u_spcl.s_spcl
$2 = {c_type = 6, c_old_date = 0, c_old_ddate = 0, c_volume = 1, c_old_tapea = 0, c_inumber = 397083647, c_magic = 424935705, c_checksum = 1906085926, __c_ino = {__uc_dinode = {di_mode = 0, di_nlink = 0, di_oldids = {0, 0}, di_size = 0, di_atime = 0, di_atimensec = 0, di_mtime = 0, di_mtimensec = 0, di_ctime = 0, di_ctimensec = 0, di_db = {0 }, di_ib = {0, 0, 0}, di_flags = 0, di_blocks = 0, di_gen = 0, di_uid = 0, di_gid = 0, di_modrev = 0}, __uc_ino = {__uc_mode = 0, __uc_spare1 = {0, 0, 0}, __uc_size = 0, __uc_old_atime = 0, __uc_atimensec = 0, __uc_old_mtime = 0, __uc_mtimensec = 0, __uc_spare2 = {0, 0}, __uc_rdev = 0, __uc_birthtimensec = 0, __uc_birthtime = 0, __uc_atime = 0, __uc_mtime = 0, __uc_spare4 = {0, 0, 0, 0, 0, 0, 0}, __uc_file_flags = 0, __uc_spare5 = {0, 0}, __uc_uid = 0, __uc_gid = 0, __uc_spare6 = {0, 0}}}, c_count = 48473, c_addr = '\000' , c_label = "none", '\000' , c_level = 0, c_filesys = "/store/backup", '\000' , c_dev = "/dev/rdk18", '\000' , c_host = "quantz", '\000' , c_flags = 2, c_old_firstrec = 0, c_date = 1622887657, c_ddate = 0, c_tapea = 10, c_firstrec = 0, c_spare = {0 }}
(gdb) bt
#0  flushtape () at /usr/src/sbin/dump/tape.c:333
#1  0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168
#2  0x00208e49 in dumpmap (map=, type=type@entry=6, ino=ino@entry=397083647) at /usr/src/sbin/dump/traverse.c:716
#3  0x0020b355 in main (argc=1, argv=0x7f7fe7e8) at /usr/src/sbin/dump/main.c:646
(gdb) list
328             }
329
330             blks = 0;
331             if (iswap32(spcl.c_type) != TS_END) {
332                     for (i = 0; i < iswap32(spcl.c_count); i++)
333                             if (spcl.c_addr[i] != 0)
334                                     blks++;
335             }
336             slp->count = lastspclrec + blks + 1 - iswap64(spcl.c_tapea);
337             slp->tapea = iswap64(spcl.c_tapea);
(gdb) print i
$6 =
(gdb) print u_spcl.s_spcl.c_count
$7 = 48473
(gdb) whatis u_spcl.s_spcl.c_addr
type = char [512]

so I guess the optimized-out i is >> 512.

c_type == 6 is TS_CLRI: the map of inodes deleted since the last dump.

(a bit odd:
(gdb) print needswap
$11 = 0
(gdb) print iswap32(u_spcl.s_spcl.c_count)
$10 = 1505558528
)

Still puzzled...

Cheers,
Patrick
Re: dump/restore out of range inode
On Thu, Jun 03, 2021 at 05:14:07PM -, Michael van Elst wrote: > pr...@cam.ac.uk (Patrick Welche) writes: > > > DUMP: Child 29322 returns LOB status 213 > >213=0xd5 > > That's octal. Return status 0213 = 139 -> WCOREFLAG(==128) + signal 11. > > >Can this happen if the original filesystem is broken? At a distance > >it just looks as though restore hasn't read a symbol table before using it > >and the filesystem seems to have a valid inode? > > Segfaults should never happen. > > maxino has probably never been set since dump crashed and restore got > an early end-of-file. # dump -0auf foo.dmp /store/backup ... DUMP: pid=3262 Dumping /dev/rdk18 (/store/backup) to foo.dmp DUMP: pid=3262 Label: none Using 512 buffers (33574952 bytes) DUMP: pid=3262 mapping (Pass I) [regular files] DUMP: pid=3262 mapping (Pass II) [directories] DUMP: pid=3262 estimated 3632204910 tape blocks. DUMP: pid=3262 Tape: 1; parent process: 3262 child process 3402 DUMP: pid=3402 Child on Tape 1 has parent 3262, my pid = 3402 DUMP: pid=3402 Volume 1 started at: Sat Jun 5 10:35:18 2021 slave 0 wrote 10240 werror 22 and here process 3402 gets the SIGSEGV Program received signal SIGSEGV, Segmentation fault. flushtape () at /usr/src/sbin/dump/tape.c:333 333 if (spcl.c_addr[i] != 0) (gdb) bt #0 flushtape () at /usr/src/sbin/dump/tape.c:333 #1 0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168 #2 0x00208e49 in dumpmap (map=, type=type@entry=6, ino=ino@entry=397083647) at /usr/src/sbin/dump/traverse.c:716 #3 0x0020b355 in main (argc=1, argv=0x7f7fe7d8) at /usr/src/sbin/dump/main.c:646 (gdb) print spcl No symbol "spcl" in current context. (gdb) frame 1 #1 0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168 168 flushtape(); (gdb) print spcl No symbol "spcl" in current context. (gdb) print isspcl $2 = 0 How can gdb not see a spcl anywhere? Cheers, Patrick
dump/restore out of range inode
On a 9.99.83/amd64 box, I just observed the following: # mount -r -o noatime NAME=backup /store/backup # cd /tmp/foo # dump -0auf - /store/backup | /sbin/restore vdrf - Verify tape and initialize maps DUMP: Found /dev/rdk18 on /store/backup in mount table DUMP: Date of this level 0 dump: Thu Jun 3 17:12:25 2021 DUMP: Date of last level 0 dump: the epoch DUMP: Dumping /dev/rdk18 (/store/backup) to standard output DUMP: Label: none DUMP: mapping (Pass I) [regular files] DUMP: mapping (Pass II) [directories] DUMP: estimated 3632204910 tape blocks. DUMP: Volume 1 started at: Thu Jun 3 17:13:51 2021 DUMP: Child 29322 returns LOB status 213 Volume header (new inode format) Dump date: Thu Jun 3 17:12:25 2021 Dumped from: the epoch Level 0 dump of /store/backup on quantz:/dev/rdk18 Label: none End-of-tape encountered Warning: End-of-input encountered while extracting End-of-tape encountered Warning: End-of-input encountered while extracting Begin level 0 restore Initialize symbol table. addino: out of range 2 abort? [yn] n [1] Done dump -0auf - /store/backup | Floating point exception (core dumped) /sbin/restore vdrf - Puzzle: DUMP: Child 29322 returns LOB status 213 213=0xd5 tape.c then looks as though 0xd5 >> 8 = 0 => X_FINOK all is well? addino: out of range 2 symtab.c: out of range as 2 >= maxino, which from the coredump is zero! entrytblsize also = zero. Can this happen if the original filesystem is broken? At a distance it just looks as though restore hasn't read a symbol table before using it and the filesystem seems to have a valid inode? Cheers, Patrick
Re: booting xen [was Re: serial console puzzle]
On Sat, May 01, 2021 at 12:33:17PM -0700, Greg A. Woods wrote: > I've copied this reply to port-xen as it's entirely Xen related. ... > On serial console machines I've been using NetBSD "console=xencons" for > ages. > > This is the documented (by Xen, i.e. preferred Xen way), for serial > consoles: > > menu=Boot Xen:load /netbsd-XEN3_DOM0 -v bootdev=dk0 > console=xencons;multiboot /xen bootscrub=false dom0_mem=4G console=com1,vga > console_timestamps=datems dom0_max_vcpus=4 dom0_vcpus_pin=true > pv-l1tf=off,domu=off vpmu=on cpuid=rdrand spec-ctrl=no-xen,l1d-flush=off > guest_loglvl=all Amazing: I removed "rndseed /var/db/entropy-file;" from the beginning of the xen entry in /boot.cfg and instead of getting (XEN) *** Building a PV Dom0 *** (XEN) ELF: not an ELF binary (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Could not construct domain 0 (XEN) I got (XEN) Dom0 has maximum 1128 PIRQs (XEN) *** Building a PV Dom0 *** (XEN) ELF: phdr: paddr=0x8020 memsz=0xe07000 (XEN) ELF: memory: 0x8020 -> 0x81007000 (XEN) ELF: note: GUEST_OS = "NetBSD" (XEN) ELF: note: GUEST_VERSION = "4.99" (XEN) ELF: note: XEN_VERSION = "xen-3.0" ... ! The only file which was edited was boot.cfg. Still no joy though: (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input) (XEN) Freed 604kB init memory [ 1.000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 20, [ 1.000] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, , [ 1.000] 2018, 2019, 2020, 2021 The NetBSD Foundation, Inc. All righ. [ 1.000] Copyright (c) 1982, 1986, 1989, 1991, 1993 [ 1.000] The Regents of the University of California. All rights res. [ 1.000] NetBSD 9.99.82 (XEN3_DOM0) #4: Wed Apr 28 10:53:21 BST 2021 ... [ 1.030] pci14: i/o space, memory space enabled [ 1.030] entropy: WARNING: extracting entropy too early (XEN) mm.c:2980:d0v0 Bad type (saw e401 != exp 2000) fo) (XEN) mm.c:1142:d0v0 Attempt to create linear p.t. 
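Here is one possible reading of this result, which the thread does not confirm. Xen's "ELF: not an ELF binary" check is applied to the multiboot module it takes as the dom0 kernel. With `rndseed` first in the menu entry, that module may have been the entropy file rather than /netbsd-XEN3_DOM0. The check itself needs only the four ELF identification bytes (0x7f, 'E', 'L', 'F') defined by the ELF specification. The minimal C sketch below performs that test. It is not Xen's code, and `looks_like_elf` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ELF identification bytes e_ident[EI_MAG0..EI_MAG3]. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/* Hypothetical helper: true if buf starts with the ELF magic number. */
static bool
looks_like_elf(const unsigned char *buf, size_t len)
{
	return len >= sizeof(elf_magic) &&
	    memcmp(buf, elf_magic, sizeof(elf_magic)) == 0;
}
```

Run on the first bytes of a kernel such as /netbsd-XEN3_DOM0, this should return true. On an entropy seed file it should almost certainly return false.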
with write perms [ 1.030] xpq_flush_queue: 2 entries (0 successful) on cpu0 (0) [ 1.030] panic: HYPERVISOR_mmu_update failed, ret: -22 [ 1.030] cpu0: Begin traceback... [ 1.030] vpanic() at netbsd:vpanic+0x14a [ 1.030] device_printf() at netbsd:device_printf [ 1.030] xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update [ 1.030] pmap_zero_page() at netbsd:pmap_zero_page+0xe3 [ 1.030] uvm_pagealloc_strat() at netbsd:uvm_pagealloc_strat+0x1ef [ 1.030] pmap_get_physpage() at netbsd:pmap_get_physpage+0x1cb [ 1.030] pmap_growkernel() at netbsd:pmap_growkernel+0x1b3 [ 1.030] uvm_map_prepare() at netbsd:uvm_map_prepare+0x3a2 [ 1.030] uvm_map() at netbsd:uvm_map+0x70 [ 1.030] ubc_init() at netbsd:ubc_init+0x15b [ 1.030] main() at netbsd:main+0x33b [ 1.030] cpu0: End traceback... [ 1.030] fatal breakpoint trap in supervisor mode [ 1.030] trap type 1 code 0 rip 0x8024093d cs 0xe030 rflags 0x2020 [ 1.030] curlwp 0x80e75040 pid 0.0 lowest kstack 0x8198720 Stopped in pid 0.0 (system) at netbsd:breakpoint+0x5: leave ds es 0 fs bae0 gs ba80 rdi 6 rsi deadbeefdeadf00d rbp 8198bad0 rbx 2 rdx 1 rcx 6 rax 0 r8 2 r9 75 r10 0 r11 fffe r12 80c57088ostype+0x138 r13 8198bb18 r14 104 r15 10 rip 8024093dbreakpoint+0x5 cs e030 rflags 202 rsp 8198bad0 ss e02b netbsd:breakpoint+0x5: leave Cheers, Patrick
Re: booting xen [was Re: serial console puzzle]
On Fri, Apr 30, 2021 at 08:50:10PM +0200, Manuel Bouyer wrote: > On Fri, Apr 30, 2021 at 07:28:57PM +0100, Patrick Welche wrote: > > On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote: > > > On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote: > > > > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots. > > > > Nothing more appears on the console. (-current XEN, xen.gz from > > > > xenkernel415) > > > > > > Try xen-debug.gz ? > > > Do you get the Xen boot messages ? > > > > I don't get the Xen boot messages. Just tried xen-debug.gz and again I just > > see loading, and then a reboot. I don't think it gets as far xen*.gz. > > > > boot.cfg contains: > > > > menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load > > /netbsd-XEN3_ > > DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz > > dom0_mem=1024M > > should probably be: > menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load > /netbsd-XEN3_ DOM0 console=com0;multiboot /xen-debug.gz dom0_mem=1024M > console=com1 com1=57600,8n1,0x3f8 > > (should really be console=com0 for NetBSD, it doens't access the hardware and > use the I/O services from the hypervisor) Ah - I remembered that NetBSD starts at 0, but xen at 1, but clearly still muddled the boot.cfg. We now have some serial console output! 
Some of what flew by: (XEN) Xen version 4.15.0nb0 (prlw1@) (gcc (nb1 20210411) 10.3.0) debug=y Thu Ap1 (XEN) Latest ChangeSet: (XEN) build-id: d2ee973db9f01886c1297a3a469888de162702c6 (XEN) Bootloader: NetBSD/x86 BIOS Boot, Revision 5.11 (Tue Apr 20 14:32:11 UTC ) (XEN) Command line: dom0_mem=1024M console=com1 com1=57600,8n1,0x3f8 (XEN) Xen image load base address: 0 (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds (XEN) EDID info not retrieved because no DDC retrieval method detected (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) CPU Vendor: AMD, Family 16 (0x10), Model 9 (0x9), Stepping 1 (raw 00100f9) (XEN) Xen-e820 RAM map: (XEN) [, 0009dfff] (usable) ... (XEN) 5 disabled (XEN) 6 disabled (XEN) 7 disabled (XEN) TOM2: 00182000 (WB) (XEN) Xenoprofile: AMD IBS detected (0x1f) (XEN) Running stub recovery selftests... (XEN) Fixup #UD[]: 82d07fffe040 [82d07fffe040] -> 82d04038aa07 (XEN) Fixup #GP[]: 82d07fffe041 [82d07fffe041] -> 82d04038aa07 (XEN) Fixup #SS[]: 82d07fffe040 [82d07fffe040] -> 82d04038aa07 (XEN) Fixup #BP[]: 82d07fffe041 [82d07fffe041] -> 82d04038aa07 (XEN) HPET: 3 timers usable for broadcast (4 total) (XEN) NX (Execute Disable) protection active (XEN) Dom0 has maximum 1128 PIRQs (XEN) *** Building a PV Dom0 *** (XEN) ELF: not an ELF binary (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Could not construct domain 0 (XEN) (XEN) (XEN) Reboot in five seconds... $ file /netbsd* /netbsd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for NetBSD 9.99.82, not stripped /netbsd-XEN3_DOM0: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for NetBSD 9.99.82, with debug_info, not stripped (Should we move this to port-xen?) Cheers, Patrick
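The off-by-one that muddled boot.cfg is easy to trip over. NetBSD's com(4) units count from com0, while Xen's serial options count from com1, so NetBSD com0 is Xen com1. On a PC the conventional UART bases are 0x3f8, 0x2f8, 0x3e8 and 0x2e8. Below is a minimal C sketch that builds the Xen half of the corrected multiboot line from a NetBSD unit number. Following Manuel's advice, the NetBSD kernel itself keeps `console=com0`. The helper `xen_console_opts` is hypothetical. As the serial console thread showed, firmware does not always hand these ports over in the conventional order, so the explicit I/O address is the part that matters.

```c
#include <stdio.h>

/* Conventional PC UART bases: NetBSD com0..com3 == Xen com1..com4. */
static const unsigned int legacy_base[4] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };

/*
 * Hypothetical helper: format "console=comN comN=SPEED,8n1,0xBASE" for the
 * Xen command line from a NetBSD com unit. Returns snprintf's result, or
 * -1 for a unit outside com0..com3.
 */
static int
xen_console_opts(char *buf, size_t len, int netbsd_unit, int speed)
{
	int xen_unit;

	if (netbsd_unit < 0 || netbsd_unit > 3)
		return -1;
	xen_unit = netbsd_unit + 1;     /* Xen numbers serial ports from 1 */
	return snprintf(buf, len, "console=com%d com%d=%d,8n1,0x%x",
	    xen_unit, xen_unit, speed, legacy_base[netbsd_unit]);
}
```

For NetBSD com0 at 57600 baud, this produces the `console=com1 com1=57600,8n1,0x3f8` string used on the multiboot line above.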
Re: booting xen [was Re: serial console puzzle]
On Fri, Apr 30, 2021 at 07:28:57PM +0100, Patrick Welche wrote: > On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote: > > On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote: > > > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots. > > > Nothing more appears on the console. (-current XEN, xen.gz from > > > xenkernel415) > > > > Try xen-debug.gz ? > > Do you get the Xen boot messages ? > > I don't get the Xen boot messages. Just tried xen-debug.gz and again I just > see loading, and then a reboot. I don't think it gets as far xen*.gz. > > boot.cfg contains: > > menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load > /netbsd-XEN3_ > DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M > > [Any one know how to avoid "Collecting System Inventory ..." so booting > doesn't take forever?] Bizarre observation: boot.cfg: menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot menu=Boot single user:rndseed /var/db/entropy-file;consdev com0,57600;boot -s menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load /netbsd-XEN3_DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M menu=Drop to boot prompt:prompt default=3 timeout=5 clear=1 If I press "1", I still get "loading /netbsd-XEN3_DOM0" and a spontaneous reboot. I need to press "2", then ctrl-D in single user for the equivalent of "1". Cheers, Patrick
booting xen [was Re: serial console puzzle]
On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote: > On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote: > > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots. > > Nothing more appears on the console. (-current XEN, xen.gz from > > xenkernel415) > > Try xen-debug.gz ? > Do you get the Xen boot messages ? I don't get the Xen boot messages. Just tried xen-debug.gz and again I just see loading, and then a reboot. I don't think it gets as far as xen*.gz. boot.cfg contains: menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load /netbsd-XEN3_DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M [Does anyone know how to avoid "Collecting System Inventory ..." so booting doesn't take forever?] Cheers, Patrick
Re: serial console puzzle
On Fri, Apr 30, 2021 at 04:52:41PM +0100, Patrick Welche wrote: > On Fri, Apr 30, 2021 at 05:23:54PM +0200, Manuel Bouyer wrote: > > On Fri, Apr 30, 2021 at 04:18:49PM +0100, Patrick Welche wrote: > > > On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote: > > > > On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote: > > > > > In /boot.cfg: > > > > > > > > > > menu=Boot normally:rndseed /var/db/entropy-file;consdev > > > > > com0,57600;boot > > > > > > > > > > # installboot -ve /dev/rsd0a > > > > > File system: /dev/rsd0a > > > > > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, > > > > > console com0 > > > > > > > > > > Yet in dmesg: > > > > > > > > > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO > > > > > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO > > > > > com1: console > > > > > > > > > > (so I don't actually see anything) > > > > > > > > > > (Wednesday's -current/amd64) > > > > > > > > > > > > > > > Thoughts? > > > > > > > > one possibility is that the bios has com0 and com1 swapped. > > > > In some case I had to explicitely set ioaddr with installboot to have > > > > the serial console working. > > > > > > I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B" > > > is 0x2f8, so they are the right way around. > > > > I've seen BIOSes report it the right way on in setup, but the wrong way > > to the boot loader. > > In such cases and explicit ioaddr did help. > > Indeed - it did! > > # installboot -ve /dev/rsd0a > File system: /dev/rsd0a > Boot options:timeout 5, flags 0, speed 57600, ioaddr 3f8, console com0 > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO > com0: console > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO > > now for xen... no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots. Nothing more appears on the console. (-current XEN, xen.gz from xenkernel415) This is BIOS boot, with disklabeled disk. GENERIC boots. Ho hum Patrick
Re: serial console puzzle
On Fri, Apr 30, 2021 at 05:23:54PM +0200, Manuel Bouyer wrote: > On Fri, Apr 30, 2021 at 04:18:49PM +0100, Patrick Welche wrote: > > On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote: > > > On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote: > > > > In /boot.cfg: > > > > > > > > menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot > > > > > > > > # installboot -ve /dev/rsd0a > > > > File system: /dev/rsd0a > > > > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console > > > > com0 > > > > > > > > Yet in dmesg: > > > > > > > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO > > > > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO > > > > com1: console > > > > > > > > (so I don't actually see anything) > > > > > > > > (Wednesday's -current/amd64) > > > > > > > > > > > > Thoughts? > > > > > > one possibility is that the bios has com0 and com1 swapped. > > > In some case I had to explicitely set ioaddr with installboot to have > > > the serial console working. > > > > I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B" > > is 0x2f8, so they are the right way around. > > I've seen BIOSes report it the right way on in setup, but the wrong way > to the boot loader. > In such cases and explicit ioaddr did help. Indeed - it did! # installboot -ve /dev/rsd0a File system: /dev/rsd0a Boot options:timeout 5, flags 0, speed 57600, ioaddr 3f8, console com0 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO com0: console com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO now for xen... Thanks, Patrick
Re: serial console puzzle
On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote: > On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote: > > In /boot.cfg: > > > > menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot > > > > # installboot -ve /dev/rsd0a > > File system: /dev/rsd0a > > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console com0 > > > > Yet in dmesg: > > > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO > > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO > > com1: console > > > > (so I don't actually see anything) > > > > (Wednesday's -current/amd64) > > > > > > Thoughts? > > one possibility is that the bios has com0 and com1 swapped. > In some case I had to explicitely set ioaddr with installboot to have > the serial console working. I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B" is 0x2f8, so they are the right way around. (Of course UEFI is fine, and I get console output, but...) Cheers, Patrick
serial console puzzle
In /boot.cfg: menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot # installboot -ve /dev/rsd0a File system: /dev/rsd0a Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console com0 Yet in dmesg: com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO com1: console (so I don't actually see anything) (Wednesday's -current/amd64) Thoughts? Cheers, Patrick
Re: running xen on current
On Thu, Apr 15, 2021 at 07:28:32AM -0400, Brad Spencer wrote: > Manuel Bouyer writes: > > > On Thu, Apr 15, 2021 at 09:53:50AM +0100, Patrick Welche wrote: > >> I have tried and failed to run xen on 3 -current/amd64 systems with > >> 3 different failure modes: > >> > >> 1) laptop: xen.gz Building a PV Dom0 / ELF: not an ELF binary -> > >> panic/reboot > >> 2) desktop: XEN3_DOM0 panics including PR port-xen/55978 > >> 3) server: Trampoline space cannot be allocated; will try fallback -> > >> reboot > >> > >> They are all working NetBSD-current/amd64 systems. > >> > >> My conclusion was that xen is hopelessly broken, so was quite surprised > >> by Greg Wood's thread about the finer points of running a guest OS, given > >> that those systems won't even start the host OS. > >> > >> I dug out an old desktop, and to my pleasant surprise it booted XEN3_DOM0, > >> and I have managed to run some XEN3_DOMUs. > >> > >> The difference between the working/broken setups seems to be that the > >> working one is "BIOS" booting rather than EFI booting. > >> > >> Among all your xen success stories, are any of you EFI booting? > > > > AFAIK EFI is not yet supported by Xen (maybe this is supported by 4.15, > > I've not had a chance to try yet). I have it running on fairly recent > > Dell servers (in BIOS mode) > > > There has been fiddling with Xen and EFI for quite some time. See: > > https://wiki.xenproject.org/wiki/Xen_EFI > > for example (might be old)... this indicates that Xen 4.3 or later could > be built as a EFI binary and probably booted from the EFI firmware > directly or with grub2 when grub2 is a EFI binary itself. Of course > those instructions are all Linux-centric and I don't know if you created > a Xen kernel like this if it would boot a NetBSD DOM0 kernel. I am in > no position to try any tests with this right now personally, but it is > tempting as I have a EFI only laptop that I could probably replace the > hard drive temporarily. 
Looking at https://xenproject.org/2021/04/08/xen-project-hypervisor-4-15/ (so 4.15 only just came out!) I see Unified boot images: It is now possible to create an image bundling together files needed for Xen to boot into a single EFI binary; making it now possible to boot a functional Xen system directly from the EFI boot manager, rather than having to go through grub multiboot. Files that can be bundled include a hypervisor, dom0 kernel, dom0 initrd, Xen KConfig, XSM configuration, and a device tree. I thought that "go through grub multiboot" was the equivalent of our boot.cfg "multiboot /xen.gz dom0_mem=1024M", but apparently not? (Seems different to booting straight from the EFI boot menu) Cheers, Patrick
running xen on current
I have tried and failed to run xen on 3 -current/amd64 systems with 3 different failure modes: 1) laptop: xen.gz Building a PV Dom0 / ELF: not an ELF binary -> panic/reboot 2) desktop: XEN3_DOM0 panics including PR port-xen/55978 3) server: Trampoline space cannot be allocated; will try fallback -> reboot They are all working NetBSD-current/amd64 systems. My conclusion was that xen is hopelessly broken, so I was quite surprised by Greg Woods's thread about the finer points of running a guest OS, given that those systems won't even start the host OS. I dug out an old desktop, and to my pleasant surprise it booted XEN3_DOM0, and I have managed to run some XEN3_DOMUs. The difference between the working/broken setups seems to be that the working one is "BIOS" booting rather than EFI booting. Among all your xen success stories, are any of you EFI booting? Cheers, Patrick = Some extra gory details 1) laptop: Building a PV Dom0 ELF: Not an ELF binary *** Panic on CPU 0: Could not set up DOM0 guest OS *** Reboot in five seconds... 2) desktop: selection of panics in addition to PR port-xen/55978 [ 80.989] panic: LIST_INSERT_HEAD 0xa080073eec28 ../../../../arch/x86/x86/pmap.c:2285 [ 80.989] cpu13: Begin traceback... [ 80.989] vpanic() at netbsd:vpanic+0x14a [ 80.989] snprintf() at netbsd:snprintf [ 80.989] pmap_enter_ma() at netbsd:pmap_enter_ma+0x14e7 [ 80.989] pmap_enter() at netbsd:pmap_enter+0x32 [ 80.989] udv_fault() at netbsd:udv_fault+0x100 [ 80.989] uvm_fault_internal() at netbsd:uvm_fault_internal+0x574 [ 80.989] trap() at netbsd:trap+0x432 [ 80.989] --- trap (number 6) --- [ 80.989] 7a60617787af: [ 80.989] cpu13: End traceback... [ 75.6599981] panic: kernel diagnostic assertion "ncp->nc_dvp == dvp" failed: file "../../../../kern/vfs_cache.c", line 432 [ 75.6599981] cpu0: Begin traceback...
[ 75.6599981] vpanic() at netbsd:vpanic+0x14a [ 75.6599981] kern_assert() at netbsd:kern_assert+0x48 [ 75.6599981] cache_lookup_entry() at netbsd:cache_lookup_entry+0xde [ 75.6599981] cache_lookup_linked() at netbsd:cache_lookup_linked+0x160 [ 75.6599981] namei_tryemulroot() at netbsd:namei_tryemulroot+0x298 [ 75.6599981] namei() at netbsd:namei+0x29 [ 75.6599981] vn_open() at netbsd:vn_open+0x8f [ 75.6599981] do_open() at netbsd:do_open+0x119 [ 75.6599981] do_sys_openat() at netbsd:do_sys_openat+0x74 [ 75.6599981] sys_open() at netbsd:sys_open+0x24 [ 75.6599981] syscall() at netbsd:syscall+0x9c [ 75.6599981] --- syscall (number 5) --- [ 75.6599981] netbsd:syscall+0x9c: [ 75.6599981] cpu0: End traceback... 3) server: EFI boot of Feb 6 2021, xenkernel413-4.13.3.tgz, serial console On serial console, all that is seen is: 2415648+1324000=0x3910ec Loading /var/db/entropy-file Loading /netbsd-XEN3_DOM0 Start @ 0xce60 [1=0xce991000-0xce9910ec]... Trampoline space cannot be allocated; will try fallback. then it reboots
atomic_load_relaxed panic
On an otherwise idle amd64 box running NetBSD 9.99.81 (GENERIC) #3: Fri Mar 26 17:37:48 GMT 2021 I got tantalisingly close to managing to clone the NetBSD source: # hg clone http://anonhg.netbsd.org/src destination directory: src applying clone bundle from https://cdn.NetBSD.org/_bundles/src/77d2a2ece3a06d837da45acd0fda80086ab4113c.zstd.hg adding changesets adding manifests adding file changes added 931876 changesets with 2425841 changes to 439702 files (+417 heads) finished applying clone bundle searching for changes adding changesets adding manifests adding file changes adding changesets adding manifests adding file changes added 14681 changesets with 130115 changes to 91055 files (+5 heads) new changesets 26c8f37631b6:87ca3e58cbb1 (241 drafts) 280333 local changesets published updating to branch trunk updating [=> ] 71400/202610 17s (any hints on making this go faster?) When: [ 257027.9185592] panic: kernel diagnostic assertion "atomic_load_relaxed(> [ 257027.9485598] cpu2: Begin traceback... 
[ 257027.9585600] vpanic() at netbsd:vpanic+0x156 [ 257027.9685597] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax [ 257027.9785609] pmap_clear_attrs() at netbsd:pmap_clear_attrs+0x124 [ 257027.9985632] uvm_pagemarkdirty() at netbsd:uvm_pagemarkdirty+0x37b [ 257028.0085600] uao_get() at netbsd:uao_get+0x315 [ 257028.0185597] ubc_fault() at netbsd:ubc_fault+0x16a [ 257028.0285614] uvm_fault_internal() at netbsd:uvm_fault_internal+0x57a [ 257028.0485613] trap() at netbsd:trap+0x5b7 [ 257028.0485613] --- trap (number 6) --- [ 257028.0585610] copyout() at netbsd:copyout+0x33 [ 257028.0685603] uiomove() at netbsd:uiomove+0x80 [ 257028.0785605] ubc_uiomove() at netbsd:ubc_uiomove+0x157 [ 257028.0885612] tmpfs_read() at netbsd:tmpfs_read+0xbe [ 257028.0985609] VOP_READ() at netbsd:VOP_READ+0x40 [ 257028.1185598] vn_read() at netbsd:vn_read+0x97 [ 257028.1285595] dofileread() at netbsd:dofileread+0x79 [ 257028.1385631] sys_read() at netbsd:sys_read+0x49 [ 257028.1485614] syscall() at netbsd:syscall+0x23e [ 257028.1585616] --- syscall (number 3) --- [ 257028.1685608] netbsd:syscall+0x23e: [ 257028.1685608] cpu2: End traceback... [ 257028.1785615] dumping to dev 168,2 (offset=8, size=41938679): [ 257028.1885651] dump 9684 9683 9682 9681 9680 9679 9678 9677 9676 9675 9674 9 tip in tmux doesn't appear to be wrapped... Hopefully the dump will succeed... Cheers, Patrick
Re: bluetooth ubt0
On Tue, Mar 16, 2021 at 07:33:14AM +, Iain Hibbert wrote: > > I wish someone could implement newer Intel ubt support. > > Hm I can look into it - is it possible to buy one of these intel adapters > which is USB or do they only come built in to laptops etc? This one is built-in. (A dongle is already in the post.) I'm happy to test anything... In the meantime, the workaround of booting into Windows (which, going by your analysis, loads the firmware) and then restarting into NetBSD got me working bluetooth(!). As this is my first foray, back to some basic questions that I'm still in the dark about after perusing chapter 21 of the guide. Overall, I'm trying to update the firmware of a bluetooth label printer. Allegedly this will move the print head away from the label to stop it jamming. Really? The instructions are to upload the file using "serial bluetooth terminal" from Android, selecting protocol "raw". I see: 2: bdaddr 00:07:80:ac:19:4f : name "Pro 02105 " : class [0x040680] Printer : page scan rep mode 0x01 : clock offset 17621 : rssi 0 # sdpquery -d ubt0 -a 00:07:80:ac:19:4f Browse ServiceRecordHandle: 0x0001 ServiceClassIDList: Serial Port ProtocolDescriptorList: L2CAP RFCOMM (channel 1) BrowseGroupList: Public Browse Root LanguageBaseAttributeIDList: en.UTF-8 base 0x0100 ServiceName: "Bluetooth Serial Port" so it is looking promising given the SP, but what next? I tried # cat /etc/bluetooth/hosts 00:07:80:ac:19:4f printer # btpin -d ubt0 -a printer -p # rfcomm_sppd -d ubt0 -a printer -t /dev/ttyp1 rfcomm_sppd: Cannot open `/dev/ptyp1': No such file or directory # rfcomm_sppd -d ubt0 -a printer -t /dev/pts/5 rfcomm_sppd: Cannot open `/dev/pts/p': No such file or directory # rfcomm_sppd -d ubt0 -a printer rfcomm_sppd: SP: Host is down Cheers, Patrick
Re: bluetooth ubt0
My previous note hasn't appeared on the list yet, so out of order resolution: when the printer was running on battery, it behaved differently and # rfcomm_sppd -d ubt0 -a printer < Pro\ printer\ config\ file.txt rfcomm_sppd[1645]: Starting on stdio... rfcomm_sppd[1645]: Completed on stdio suggests success! Cheers, Patrick
bluetooth ubt0
A first foray into bluetooth on this amd64 laptop gives me: ubt0: Intel (0x8087) product 0aaa (0x0aaa), rev 2.00/0.02, addr 4 ubt0: autoconfiguration error: CommandComplete opcode (003|0003) failed (status= 0x01) and after a btconfig ubt0 up, btconfig shows ubt0: bdaddr 00:00:00:00:00:00 flags 0x2e0 which doesn't look like a valid address. (Bluetooth is "on", at least according to OtherOS) Any thoughts on how to get it going? Cheers, Patrick # usbdevs -v -v -a 4 Controller /dev/usb0: Controller /dev/usb1: addr 4: full speed, self powered, config 1, product 0aaa(0x0aaa), Intel(0x8087), rev 0.02(0x0002) Wireless(0xe0), Radio Frequency(0x01), proto 1 Controller /dev/usb2: Controller /dev/usb3:
sed and xentools413
xentools413 is repeatably failing with ./config/ioapi.h:17:10: fatal error: config/local/ioapi.h: No such file or directory 17 | #include In a very roundabout way this seems to be related to [DEPS] arch/x86/drivers/net/undiisr.S sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\) gmake[7]: *** Deleting file 'bin/deps/arch/x86/drivers/net/undiisr.S.d' [DEPS] arch/x86/transitions/librm.S sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\) gmake[7]: *** Deleting file 'bin/deps/arch/x86/transitions/librm.S.d' ... apparently from pkgsrc/sysutils/xentools413/work.x86_64/ipxe-1dd56dbd11082fb622c2ed21cfaced4f47d798a6/src/Makefile.housekeeping sed 's/\.o\s*:/_DEPS +=/' > $(BIN)/deps/$(1).d Index: Makefile === RCS file: /cvsroot/pkgsrc/sysutils/xentools413/Makefile,v retrieving revision 1.17 diff -u -r1.17 Makefile --- Makefile 8 Mar 2021 08:13:06 - 1.17 +++ Makefile 8 Mar 2021 15:12:13 - @@ -53,7 +53,7 @@ EGDIR= ${PREFIX}/share/examples/xen MESSAGE_SUBST+=EGDIR=${EGDIR} -USE_TOOLS+=pod2man gmake pkg-config makeinfo perl bash cmake +USE_TOOLS+=pod2man gmake pkg-config makeinfo perl bash cmake gsed USE_LANGUAGES= c c++ GNU_CONFIGURE= YES gets me a successful build. I'm assuming a WORKSFORME response for xentools413, so I'm wondering whether something changed in -current sed that would explain the above. Cheers, Patrick
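Whatever changed in -current, the Makefile.housekeeping rule depends on `\s`, a GNU extension rather than POSIX regular-expression syntax. Adding `gsed` to USE_TOOLS sidesteps that dependency. A spelling any POSIX sed should accept is `s/\.o[[:space:]]*:/_DEPS +=/`. Below is a minimal C sketch of the same single (non-global) substitution using the POSIX regcomp(3)/regexec(3) interface with that bracket expression. The helper `rewrite_deps_line` is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: apply s/\.o[[:space:]]*:/_DEPS +=/ once to line,
 * writing the result to out. Uses only POSIX basic regular expression
 * syntax (no GNU \s). Returns 1 if substituted, 0 if no match, -1 on error.
 */
static int
rewrite_deps_line(const char *line, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	int rc;

	if (regcomp(&re, "\\.o[[:space:]]*:", 0) != 0)   /* 0 = basic RE */
		return -1;
	rc = regexec(&re, line, 1, &m, 0);
	regfree(&re);
	if (rc == REG_NOMATCH) {
		snprintf(out, outlen, "%s", line);        /* sed prints it unchanged */
		return 0;
	}
	if (rc != 0)
		return -1;
	/* text before the match, the replacement, then the text after it */
	snprintf(out, outlen, "%.*s_DEPS +=%s",
	    (int)m.rm_so, line, line + m.rm_eo);
	return 1;
}
```

For example, `bin/undiisr.o : foo.S bar.h` becomes `bin/undiisr_DEPS += foo.S bar.h`, which is what the original rule intends.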
Re: serial console
On Wed, Feb 17, 2021 at 12:34:21AM -0800, Darrin B. Jewell wrote: > > I honestly hope this isn't related to your problem, but I did make a > change to the amd64 cdrom building recently that is in this area > and may be the issue. > >https://mail-index.netbsd.org/source-changes/2021/02/06/msg126676.html > > I thought I was only affecting the boot-com.iso install cd, but this > change should probably be examined to make sure it isn't causing your > problem. I didn't think your change had anything to do with it, but just in case, I repeated the experiment with today's code (as you reverted your change already) and had the same outcome. Cheers, Patrick
serial console
Has something changed recently in the land of serial consoles? I am pretty sure that once I enabled serial console redirection "after boot" in the "bios" of this amd64 uefi booting server, with a serial port plugged in, I would have a serial console with the default /boot.cfg. After getting access to the building(!), it seems I now need to add consdev com0,115200 to each menu item in boot.cfg. (Putting it on a line on its own seems insufficient - is that a bug?) Cheers, Patrick
alignment and packed structs
I just tried to compile if_iwn as a module. It failed with dev/pci/if_iwn.c:2685:6: error: converting a packed 'struct iwn_fw_dump' pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Werror=address-of-packed-member] I got around it with diff -u -r1.17 if_iwnreg.h --- if_iwnreg.h 19 Jul 2017 16:55:12 - 1.17 +++ if_iwnreg.h 6 Jan 2021 17:24:01 - @@ -1447,7 +1447,7 @@ uint32_tsrc_line; uint32_ttsf; uint32_ttime[2]; -} __packed; +} __attribute__((aligned(4),packed)); /* TLV firmware header. */ struct iwn_fw_tlv_hdr { Why isn't this necessary when building if_iwn.c as part of a kernel? Is the above the right solution? Cheers, Patrick
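The error concerns alignment. `__packed` on its own drops the struct's alignment to 1, so a `uint32_t` pointer derived from it may be misaligned. Adding `aligned(4)` restores 4-byte alignment for the struct as a whole without reintroducing padding. The minimal C sketch below uses GCC/Clang attributes and a cut-down, hypothetical struct rather than the real `struct iwn_fw_dump`. It also shows the usual alternative for a packed-only struct, copying with memcpy(3). It illustrates only the alignment difference; it does not explain why the module and kernel builds behave differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down firmware dump record, packed only. */
struct dump_packed {
	uint32_t src_line;
	uint32_t tsf;
	uint32_t time[2];
} __attribute__((packed));

/* Same members, packed but with the struct itself 4-byte aligned. */
struct dump_packed_aligned {
	uint32_t src_line;
	uint32_t tsf;
	uint32_t time[2];
} __attribute__((aligned(4), packed));

/* Same size either way: no padding is added back. */
_Static_assert(sizeof(struct dump_packed) == 16, "packed size");
_Static_assert(sizeof(struct dump_packed_aligned) == 16, "aligned size");
/* Only the alignment differs. */
_Static_assert(_Alignof(struct dump_packed) == 1, "packed drops alignment");
_Static_assert(_Alignof(struct dump_packed_aligned) == 4, "aligned(4) restores it");

/* time sits at offset 8 in a 4-aligned struct, so this pointer is aligned. */
static uint32_t
first_time(const struct dump_packed_aligned *d)
{
	const uint32_t *p = &d->time[0];

	return *p;
}

/* Packed-only alternative: copy bytes instead of forming a uint32_t pointer. */
static uint32_t
first_time_packed(const struct dump_packed *d)
{
	uint32_t v;

	memcpy(&v, (const unsigned char *)d + offsetof(struct dump_packed, time),
	    sizeof v);
	return v;
}
```

The attribute change keeps the existing code as written. The memcpy route leaves the struct packed but changes every access site.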
Re: netbsd32_coredump
> On Sat, 14 Nov 2020, Patrick Welche wrote: > > > Just upgraded a pi zero w from 9.99.10 to 9.99.75/evbarm-earmv6hf > > i.e., 32 bit, and on boot with a standard RPI kernel and new dtb (but > > presumably old startelf) > > > > panic: kernel diagnostic assertion "!*hooked" failed: file > > "/usr/src/sys/kern/kern_module_hook.c", line 70 > On Sat, Nov 14, 2020 at 09:36:07AM -0800, Paul Goyette wrote: > That should already be fixed. Can you update to HEAD? Sure enough that .75 was from last Sunday - yesterday's .75 is fine! Thanks, Patrick
netbsd32_coredump
Just upgraded a pi zero w from 9.99.10 to 9.99.75/evbarm-earmv6hf i.e., 32 bit, and on boot with a standard RPI kernel and new dtb (but presumably old startelf) panic: kernel diagnostic assertion "!*hooked" failed: file "/usr/src/sys/kern/kern_module_hook.c", line 70 0x8097ae3c: netbsd:vpanic+0xc 0x8097ae54: netbsd:kern_assert+0x3c 0x8097ae84: netbsd:module_hook_set+0x98 0x8097aea4: netbsd:compat_netbsd32_coredump_modcmd+0xa0 0x8097af0c: netbsd:module_do_builtin+0x16c 0x8097af4c: netbsd:module_init_class+0x210 0x8097af9c: netbsd:main+0x38c 0x8097afac: netbsd:kernel_text+0x54 Cheers, Patrick
Re: vmstat
On Tue, Nov 10, 2020 at 05:07:10PM +0100, Lars Reichardt wrote: > Those pool names seem clearly broken. > It seems the box is running with a zfs pool which has it's own cache > (ARC) which memory is entirely wired and might grow quite large. > What does vmstat -vmC show for those pools? zpool export various... # zpool list no pools available # zfs list no datasets available # modunload zfs and those pools are gone! 10G still wired though, however 12G free memory, so the box is zipping along once more! Those zfs partitions weren't being used... Thanks, Patrick
Re: vmstat
On Tue, Nov 10, 2020 at 05:07:10PM +0100, Lars Reichardt wrote: > On Tue, 10 Nov 2020 12:12:14 + > Patrick Welche wrote: > > > # vmstat -C > > Pool cache statistics. > > Name Spin GrpSz Full Emty PoolLayer CacheLayer Hit% > > CpuLayer Hit% amappl 863 660496221 > > 1592973 68.8 563372236 99.7 anonpl 6077263 11400 0 > > 55980901 177419830 68.4 9594977028 98.2 ... > > xhcixfer 01500 5 8 37.5 > > 77 89.6 xhcixfer 01500 5 11 > > 54.5 193 94.3 zfs_znode_cache 2906 15 510 4258698 > > 17474551 75.6 215763165 91.9 ©ÿÿ 236815 53 > > 0 4471316 17390812 74.3 305275419 94.3 ©ÿÿ 0 > > 1500258353 259551 0.5 277790 6.6 ©ÿÿ > > 01500117290 117459 0.1 119696 > > 1.9 ©ÿÿ 01500 74991 75197 0.3 > > 79474 5.4 ©ÿÿ 21500 36643 > > 36739 0.3 37986 3.3 ... > > > > > > interesting "Pool" name... (10G wired memory box) > > > > > > Those pool names seem clearly broken. > It seems the box is running with a zfs pool which has it's own cache > (ARC) which memory is entirely wired and might grow quite large. > What does vmstat -vmC show for those pools? Memory resource pool statistics NameSize Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle zfs_znode_cache 2906 1500 4258698 17474551 75.6 215763165 91.9 zil_lwb_cache 01500 0 0 nan 0 nan ©ÿÿ 23681500 4471316 17390812 74.3 305275419 94.3 ©ÿÿ 01500258353 259551 0.5 277790 6.6 ©ÿÿ 01500117290 117459 0.1 119696 1.9 ©ÿÿ 01500 74991 75197 0.3 79474 5.4 ©ÿÿ 21500 36643 36739 0.3 37986 3.3 ... Thanks - I'll see if unmounting them etc helps... Patrick
Re: wired memory
On Tue, Nov 10, 2020 at 05:26:26AM -0800, Chuck Silvers wrote: > On Tue, Nov 10, 2020 at 11:40:08AM +0000, Patrick Welche wrote: > > My 4 Nov -current/amd64 build box seems to be building slowly, with a lot > > of "biowait", e.g. watching the RES of a cc1plus slowly crawl up to SIZE > > at around 3M per 5s. > > > > Is 10G of "Wired" normal? (32G RAM + 64G swap) > > (lots of malloc(9) and no free(9)?) > > kernel memory usage is not reported as "wired" memory. I got that notion from: https://www.netbsd.org/docs/internals/en/chap-memory.html#wired_memory > > Memory: 3235M Act, 115M Inact, 10G Wired, 75M Exec, 2860M File, 2800M Free > > Swap: 64G Total, 11G Used, 53G Free > > no, 10GB wired memory out of 32GB total RAM is not a typical situation. > could you send me the output of "ps aux" and "vmstat -m" ? Thanks - sent off list > with 11GB of swap in use, it's not surprising that the system would be > very slow. Part of the question is why it needs to swap... (where did the 10G go?) > > it is "just" building... > > > > Could kern.maxfiles = 108928 be an issue? (3404 default below) > > do you mean that on the system that is slow, kern.maxfiles is 108928? > that doesn't seem overly large for 32GB RAM. > > I'm not sure what you mean by "3404 default below". just that the box below had the default kern.maxfiles=3404 setting, but as you say that shouldn't be too large, and without it, one sees complaints of insufficient files. > > Just looked at another amd64 box for comparison (also building - no swap): > > > > Memory: 15G Act, 3820K Inact, 16M Wired, 44M Exec, 14G File, 135G Free > > Swap: > > this system has a lot more RAM and most of it isn't even used, > so I would expect this system to perform much better than the first one. Yes, but also just 16M of "Wired" memory... Cheers, Patrick
vmstat
# vmstat -C Pool cache statistics. Name Spin GrpSz Full Emty PoolLayer CacheLayer Hit%CpuLayer Hit% amappl 863 6604962211592973 68.8 563372236 99.7 anonpl 6077263 11400 0 55980901 177419830 68.4 9594977028 98.2 ... xhcixfer 01500 5 8 37.5 77 89.6 xhcixfer 01500 5 11 54.5 193 94.3 zfs_znode_cache 2906 15 510 4258698 17474551 75.6 215763165 91.9 ©ÿÿ 236815 530 4471316 17390812 74.3 305275419 94.3 ©ÿÿ 01500258353 259551 0.5 277790 6.6 ©ÿÿ 01500117290 117459 0.1 119696 1.9 ©ÿÿ 01500 74991 75197 0.3 79474 5.4 ©ÿÿ 21500 36643 36739 0.3 37986 3.3 ... interesting "Pool" name... (10G wired memory box) Cheers, Patrick
wired memory
My 4 Nov -current/amd64 build box seems to be building slowly, with a lot of "biowait", e.g. watching the RES of a cc1plus slowly crawl up to SIZE at around 3M per 5s. Is 10G of "Wired" normal? (32G RAM + 64G swap) (lots of malloc(9) and no free(9)?) Memory: 3235M Act, 115M Inact, 10G Wired, 75M Exec, 2860M File, 2800M Free Swap: 64G Total, 11G Used, 53G Free it is "just" building... Could kern.maxfiles = 108928 be an issue? (3404 default below) Just looked at another amd64 box for comparison (also building - no swap): Memory: 15G Act, 3820K Inact, 16M Wired, 44M Exec, 14G File, 135G Free Swap: Cheers, Patrick
gdb core dump
A box running a Monday amd64 kernel panicked overnight during a bulk build. Reading symbols from netbsd.0... (No debugging symbols found in netbsd.0) (gdb) target kvm netbsd.0.core [1] Abort trap (core dumped) gdb netbsd.0 # gdb `which gdb` gdb.core GNU gdb (GDB) 11.0.50.20200914-git ... Program terminated with signal SIGABRT, Aborted. # crash -N netbsd.0 -M netbsd.0.core Crash version 9.99.74, image version 9.99.74. System panicked: kernel diagnostic assertion "(pg->flags & PG_PAGEOUT) == 0" failed: file "../../../../uvm/uvm_page.c", line 1448 Backtrace from time of crash is available. crash> bt _KERNEL_OPT_NAGR() at 0 ?() at c9803de74b01 sys_reboot() at sys_reboot vpanic() at vpanic+0x160 __x86_indirect_thunk_rax() at __x86_indirect_thunk_rax uvm_pagefree() at uvm_pagefree+0x62d uvm_anon_release() at uvm_anon_release+0x6d uvm_aio_aiodone_pages() at uvm_aio_aiodone_pages+0x439 uvm_aio_aiodone() at uvm_aio_aiodone+0x226 dkiodone() at dkiodone+0x97 biointr() at biointr+0x61 softint_dispatch() at softint_dispatch+0xf5 crash: _kvm_kvatop(c9825cf930b8) crash: kvm_read(0xc9825cf930b8, 8): invalid translation (invalid PTE) # ident /netbsd.old | grep uvm_ | grep 2020/10 $NetBSD: uvm_bio.c,v 1.123 2020/10/18 08:52:15 rin Exp $ $NetBSD: uvm_init.c,v 1.54 2020/10/07 17:51:50 chs Exp $ $NetBSD: uvm_page.c,v 1.249 2020/10/18 18:31:31 chs Exp $ $NetBSD: uvm_pager.c,v 1.130 2020/10/18 18:22:29 chs Exp $ $NetBSD: uvm_pgflcache.c,v 1.6 2020/10/18 18:31:31 chs Exp $ $NetBSD: uvm_pglist.c,v 1.86 2020/10/07 17:51:50 chs Exp $ $NetBSD: uvm_swap.c,v 1.200 2020/10/07 17:51:50 chs Exp $ Cheers, Patrick
panic rebooting yesterday's kernel
Booted an amd64 kernel built from yesterday's source, and on reboot: [ 17037.8583948] unmounting 0xfef63b813000 / (/dev/dk14)... [ 17037.8583948] forcefully unmounting / (/dev/dk14)... [ 17037.8783949] dk14 at wd4 (root) deleted [ 17037.8783949] wd4: detached [ 17037.8883950] atabus8: detached [ 17037.8883950] ahcisata1: detached [ 17038.6083908] Kernel lock error: _kernel_lock,244: spinout [ 17038.6083908] lock address : 0x81082f00 type : spin [ 17038.6083908] initialized : 0x80bc8340 [ 17038.6083908] shared holds : 0 exclusive: 1 [ 17038.6083908] shares wanted: 0 exclusive: 3 [ 17038.6083908] relevant cpu : 0 last held: 1 [ 17038.6083908] relevant lwp : 0xfefd19744080 last held: 0xfef642bf4280 [ 17038.6083908] last locked* : 0x80a42ab1 unlocked : 0x80323d4c [ 17038.6083908] curcpu holds : 0 wanted by: 0xfefd19744080 [ 17038.6763377] Skipping crash dump on recursive panic [ 17038.6816213] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout [ 17038.6886673] cpu0: Begin traceback... [ 17038.6886673] vpanic() at netbsd:vpanic+0x156 [ 17038.6983892] snprintf() at netbsd:snprintf [ 17038.6983892] lockdebug_more() at netbsd:lockdebug_more [ 17038.7083890] _kernel_lock() at netbsd:_kernel_lock+0x22a [ 17038.7183890] intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x14 [ 17038.7183890] Xhandle_ioapic_edge17() at netbsd:Xhandle_ioapic_edge17+0x6e [ 17038.7296195] --- interrupt --- [ 17038.7296195] _kernel_lock() at netbsd:_kernel_lock+0x1f8 [ 17038.7403251] callout_softclock() at netbsd:callout_softclock+0x425 [ 17038.7483888] softint_dispatch() at netbsd:softint_dispatch+0xf5 address 0xc8025cf990b8 is invalid address 0xc8025cf990b0 is invalid address 0xc8025cf990c0 is invalid address 0xc8025cf990b8 is invalid address 0xc8025cf990c8 is invalid address 0xc8025cf990c0 is invalid address 0xc8025cf990d0 is invalid address 0xc8025cf990c8 is invalid [ 17038.7784674] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xc8025cf98ff0 [ 17038.7900680] Xsoftintr() at
netbsd:Xsoftintr+0x4f [ 17038.7993371] --- interrupt --- address 0xc8025cf990c8 is invalid address 0xc8025cf99080 is invalid [ 17038.8084475] Bad frame pointer: 0xc8025cf98450 [ 17038.8084475] c8025cf98450: [ 17038.8185590] cpu0: End traceback... so no ddb either... Cheers, Patrick
Re: file system corruption
On Sun, Oct 11, 2020 at 11:19:16PM +0200, Thomas Klausner wrote: > I've had serious file system corruption. Mostly in mercurial and > sqlite3 databases, but also in normal files. > Anyone else having problems? Is yours a Ryzen system? (Mine is, and it has filesystem issues; just trying to see why it is not a common issue.) Cheers, Patrick
Re: file system corruption
On Mon, Oct 12, 2020 at 06:39:48AM +0200, Martin Husemann wrote: > On Sun, Oct 11, 2020 at 11:19:16PM +0200, Thomas Klausner wrote: > > I don't know enough about the internals of the hg and sqlite3, but I > > also saw a broken zip archive and had a good copy for comparison. In > > that case, a block of 256 bytes was zero instead of the real data. > > Do you know the file offset where the corruption started? > Can you show "dumpfs $rawdev | head -15" for that file system? Reminds me of PR kern/55362. If I started with a disk full of zeros, some ranges would contain zeros instead of the real data. If I started with a disk full of ones, some ranges would contain ones instead of the real data. In other news, just now, after a clean reboot to use a new kernel, the system came up with [ 1885.434544] panic: ffs_blkfree: bad size: dev = 0xa803, bno = 331526 bsize = 32768, size = 12288, fs = /usr/obj (different filesystem & disk) Cheers, Patrick
ctype and gcc9
Since gcc9, essentially every piece of software that uses the ctype functions fails with error: array subscript has type 'char' [-Werror=char-subscripts] which prompts a style question: cast every argument of every call to a ctype function in every piece of software to (unsigned char), add -Wno-error=char-subscripts, or something else? Cheers, Patrick
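For what it's worth, the cast option looks like the sketch below (`count_spaces` is a hypothetical name used only for illustration). The cast is about correctness as much as silencing the warning: the ctype functions are only defined for EOF and for values representable as unsigned char, so where plain char is signed, a byte such as 0xe9 would otherwise be passed as a negative int. On NetBSD the ctype functions are macros that index a table, which is presumably why gcc reports this as an array subscript of type char.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count whitespace characters in a string.
 * Each char is converted to unsigned char before calling isspace(),
 * so bytes >= 0x80 stay in the 0..255 range the ctype functions accept
 * instead of becoming negative ints on signed-char platforms. */
size_t count_spaces(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```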
Re: mesalib abort
On Thu, Sep 17, 2020 at 03:42:09PM +0200, Martin Husemann wrote: > On Thu, Sep 17, 2020 at 02:11:13PM +0100, Patrick Welche wrote: > > #2 0x7f7ff160c63d in pthread_create (thread=0x2f9d, > > thread@entry=0x7f7fd588, attr=attr@entry=0x0, > > startfunc=startfunc@entry=0x7f7fed764930 , > > arg=arg@entry=0x7f7ff7e9f230) at /usr/src/lib/libpthread/pthread.c:404 > > That is: > > if (__predict_false(__uselibcstub)) { > pthread__errorfunc(__FILE__, __LINE__, __func__, > "pthread_create() requires linking with -lpthread"); > return __libc_thr_create_stub(thread, attr, startfunc, arg); > } > > > So "something" needs to be linked with -pthread but isn't. Thanks: I confused myself thinking that the glmark2 in pkgsrc also coredumps. It doesn't. I updated glmark2, as it depends on python 3 rather than 2. This is interesting: commit a7bd0084f67b80ae9c32189e4f28a4e7d08f0b92 Author: Jamie Madill Date: Thu Feb 7 23:33:08 2019 -0500 Add gl, egl and glx loader using GLAD. Instead of hard linking against libGL, libEGL and libGLESv2 we can load the entry points at runtime. Loading dynamically is more flexible across different platforms. Note that the glad loaders are licensed under public domain. Preparation for Windows support in Issue #9. and it is /usr/X11R7/lib/libGL.so.3 which provides the libpthread, which is now absent as it isn't linked. Thanks for the clues! Cheers, Patrick
Re: mesalib abort
On Thu, Sep 17, 2020 at 01:38:21PM +0200, Tobias Nygren wrote: > On Thu, 17 Sep 2020 12:04:10 +0100 > Patrick Welche wrote: > > It looks as though line 50 simply undoes line 48, and it doesn't matter > > whether or not line 49 fails. How can this break? > > This is a red herring. Because they mask the SIGSEGV/SIGBUS handlers > you are not getting the actual fault location which happens inside > thrd_create. Instead the process takes the fault when the signal mask > is restored. This code is severly broken and upstream should fix it. > Anyway, If you comment out the signal masking crap you will get a > proper backtrace. It does break in between as you say: #0 0x7f7ff598991a in _lwp_kill () from /usr/lib/libc.so.12 #1 0x7f7ff5846c3c in __libc_thr_create_stub (tp=tp@entry=0x2f9d, ta=ta@entry=0x0, f=f@entry=0x7f7fed764930 , a=a@entry=0x7f7ff7e9f230) at /usr/src/lib/libc/thread-stub/thread-stub.c:418 #2 0x7f7ff160c63d in pthread_create (thread=0x2f9d, thread@entry=0x7f7fd588, attr=attr@entry=0x0, startfunc=startfunc@entry=0x7f7fed764930 , arg=arg@entry=0x7f7ff7e9f230) at /usr/src/lib/libpthread/pthread.c:404 #3 0x7f7fed764a90 in thrd_create ( func=0x7f7fed764af6 , arg=0x7f7ff7e9f220, thr=0x7f7fd588) at /usr/xsrc/external/mit/MesaLib/dist/include/c11/threads_posix.h:289 #4 u_thread_create (param=0x7f7ff7e9f220, routine=0x7f7fed764af6 ) at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:54 Thanks, Patrick
mesalib abort
When running glmark2 with native xsrc on -current/amd64, I get

Program terminated with signal SIGABRT, Aborted.
#0  0x7f7ff5844a2a in _sys___sigprocmask14 () from /usr/lib/libc.so.12
(gdb) bt
#0  0x7f7ff5844a2a in _sys___sigprocmask14 () from /usr/lib/libc.so.12
#1  0x7f7ff160a461 in pthread_sigmask (how=, set=, oset=)
    at /usr/src/lib/libpthread/pthread_misc.c:164
#2  0x7f7fed75fec3 in u_thread_create (
    routine=0x7f7fed75ff06 , param=0x7f7ff7e9f220)
    at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:50
#3  util_queue_create_thread (queue=queue@entry=0x7f7ff7e8b900, index=index@entry=0)
    at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_queue.c:350
#4  0x7f7fed7604ad in util_queue_init (queue=queue@entry=0x7f7ff7e8b900,
    name=name@entry=0x7f7fef727c8b "disk$", max_jobs=max_jobs@entry=32,
    num_threads=num_threads@entry=1, flags=flags@entry=7)
    at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_queue.c:466
...

/usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:50:

41    thrd_t thread;
42 #ifdef HAVE_PTHREAD
43    sigset_t saved_set, new_set;
44    int ret;
45
46    sigfillset(&new_set);
47    sigdelset(&new_set, SIGSYS);
48    pthread_sigmask(SIG_BLOCK, &new_set, &saved_set);
49    ret = thrd_create( &thread, routine, param );
50    pthread_sigmask(SIG_SETMASK, &saved_set, NULL);
51 #else
52    int ret;
53    ret = thrd_create( &thread, routine, param );
54 #endif
55    if (ret)
56       return 0;
57
58    return thread;

It looks as though line 50 simply undoes line 48, and it doesn't matter whether or not line 49 fails. How can this break?

Cheers, Patrick
Re: GCC 9 enabled for x86 and arm platforms.
On Sun, Sep 13, 2020 at 07:28:16AM +, Thomas Mueller wrote: > > i've switched x86 and arm to GCC 9. several others are liking > > to switch soon, and they will all likely need clean builds to > > be stable. > > > please send email here or to me or send-pr for problems. > > > thanks! > > > > .mrg. > > Would -r flag in build.sh command be sufficient to ensure a clean build? > > Something like > ===> build.sh command:./build.sh -m amd64 -B nb899-20190723 -M ../obj -T > ../tooldir -r -U distribution kernel=SANDY7 > > Or would it be necessary to explicitly clean obj and tooldir directories, like > rm -R ../obj/* > rm -R ../tooldir/* > ? > > At this stage I wouldn't want to keep any remnants from old builds. Making sure there was neither MKUPDATE=yes nor -u in the build.sh call was enough for me. The explicit clean would probably be better (-r doesn't touch OBJDIR). Cheers, Patrick
Re: no entropy?
On Thu, Sep 10, 2020 at 05:52:08PM +0200, Martin Husemann wrote: > On Thu, Sep 10, 2020 at 04:06:12PM +0100, Patrick Welche wrote: > > I just upgraded to ancient boxen to -current/amd64. One has its 256 bits, > > the other has none?! I have tried playing spot the difference but > > haven't spotted anything: > > Both machines do not have a proper hardware random number generator. > One of them has been seeded before and saved the entropy > in /var/db/entropy-file. > > On the one that has not, you can manually do it by writing 32 random > bytes to /dev/random with dd(1), eg. after extracting them from some > machine with newer cpu (and hardware random number support) or from > the properly seeded machine by reading 32bytes from /dev/urandom. 256 bits currently stored in pool (max 256) Thanks, it's happier now! Patrick
no entropy?
I just upgraded two ancient boxen to -current/amd64. One has its 256 bits, the other has none?! I have tried playing spot the difference but haven't spotted anything:

OK:

[ 1.00] entropy: no seed from bootloader
[ 7.024989] entropy: ready
# rndctl -ls
Source          Bits Type      Flags
raid0              0 disk      estimate, collect, v, t, dt
ucom0              0 tty       estimate, collect, v, t, dt
/dev/random        0 ???       estimate, collect, v
wd1                0 disk      estimate, collect, v, t, dt
wd0                0 disk      estimate, collect, v, t, dt
cpu1               0 vm        estimate, collect, v, t, dv
cpu0               0 vm        estimate, collect, v, t, dv
coretemp1-cpu1     0 env       estimate, collect, v, t, dv, dt
coretemp0-cpu0     0 env       estimate, collect, v, t, dv, dt
wm1                0 net       estimate, v, t, dt
wm0                0 net       estimate, v, t, dt
system-power       0 power     estimate, collect, v, t, dt
autoconf           0 ???       estimate, collect, t
seed             256 ???       estimate, collect, v
0 bits mixed into pool
256 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated

puzzling:

[ 1.00] entropy: no seed from bootloader
[ 7.383099] entropy: WARNING: consolidating less than full entropy
# rndctl -ls
Source          Bits Type      Flags
raid3              0 disk      estimate, collect, v, t, dt
raid2              0 disk      estimate, collect, v, t, dt
raid1              0 disk      estimate, collect, v, t, dt
raid0              0 disk      estimate, collect, v, t, dt
/dev/random        0 ???       estimate, collect, v
wd1                0 disk      estimate, collect, v, t, dt
wd0                0 disk      estimate, collect, v, t, dt
cpu1               0 vm        estimate, collect, v, t, dv
cpu0               0 vm        estimate, collect, v, t, dv
coretemp1-cpu1     0 env       estimate, collect, v, t, dv, dt
coretemp0-cpu0     0 env       estimate, collect, v, t, dv, dt
wm1                0 net       estimate, v, t, dt
wm0                0 net       estimate, v, t, dt
system-power       0 power     estimate, collect, v, t, dt
autoconf           0 ???       estimate, collect, t
seed               0 ???       estimate, collect, v
0 bits mixed into pool
0 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated

Both were installed from the same tarballs, with GENERIC, and both apparently have the same model of CPU; both show

[ 1.991010] aes: Intel SSSE3 vpaes
[ 1.991010] aes_ccm: self-test passed
[ 5.142260] cgd: self-test aes-xts-256
[ 5.142260] cgd: self-test aes-xts-512
[ 5.142260] cgd: self-test aes-cbc-128
[ 5.142260] cgd: self-test aes-cbc-256
[ 5.142260] cgd: self-test aes-cbc-128 (encblkno8)

Maybe I need to toss a coin...

Cheers, Patrick
Re: hang while updating pkg_rolling-replace libvdpau
On Wed, Sep 02, 2020 at 11:50:52PM +0200, Riccardo Mottola wrote: > I finished updating all my core system to current on i386-64, kernel, > userland, etc. > > Now I launched pkg_rolling replace, it crunches through several packages, > but then hangs. > > > I tried running it several times, rebooting in between... but nothing. What > is "hang" ? The CPU stays idle too. Hangs exactly there. What does e.g., ps auxwwd say when it hangs? Maybe another manifestation of "cmake hanging" http://mail-index.netbsd.org/current-users/2020/05/24/msg038692.html ? Cheers, Patrick
ptrace(PT_DUMPCORE) permission?
pbulk was stuck idle building kio at: dbus 24339 0.0 0.0 182156 9644 pts/6 Il 12:45PM 0:00.04 /usr/pkg/bin/cmake -E cmake_autogen /tmp/pkgsrc/devel/kio/work/kio-5.70.1/_KDE_build/src/urifilters/ikws/CMakeFiles/kurisearchfilter_autogen.dir/AutogenInfo.json Trying to debug: # gcore -c cmake.core 24339 gcore: ptrace(PT_DUMPCORE) to 24339 failed: Permission denied # gcore 24339 gcore: ptrace(PT_ATTACH) to 24339 failed: No such process Permission denied? And sure enough, after that there was no process. Thoughts on giving root permission? (I should have just attached gdb...) Cheers, Patrick
radixtree panic
An amd64 box updated to -current yesterday panicked overnight with (gdb) target kvm netbsd.0.core 0x80222535 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:713 713 dumpsys(); (gdb) bt #0 0x80222535 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:713 #1 0x8062750a in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../kern/kern_reboot.c:73 #2 0x80657570 in vpanic (fmt=fmt@entry=0x809397d0 "trap", ap=ap@entry=0xa9814eb90ac8) at ../../../../kern/subr_prf.c:290 #3 0x80657634 in panic (fmt=fmt@entry=0x809397d0 "trap") at ../../../../kern/subr_prf.c:209 #4 0x80224b6b in trap (frame=0xa9814eb90c10) at ../../../../arch/amd64/amd64/trap.c:326 #5 0x8021da35 in alltraps () #6 0x807596d5 in radix_tree_lookup_ptr (tagmask=0, alloc=false, path=0xa9814eb90d00, idx=10, t=0x84733b57b1d8) at ../../../../../../lib/libkern/../../../common/lib/libc/gen/radixtree.c:557 #7 radix_tree_clear_tag (t=t@entry=0x84733b57b1d8, idx=idx@entry=10, tagmask=tagmask@entry=1) at ../../../../../../lib/libkern/../../../common/lib/libc/gen/radixtree.c:1113 #8 0x805f2f42 in uvm_pagemarkdirty (pg=pg@entry=0xa98005772880, newstatus=newstatus@entry=1) at ../../../../uvm/uvm_page_status.c:109 #9 0x805f44a8 in uvmpd_scan_queue () at ../../../../uvm/uvm_pdaemon.c:752 #10 uvmpd_scan () at ../../../../uvm/uvm_pdaemon.c:900 #11 uvm_pageout (arg=) at ../../../../uvm/uvm_pdaemon.c:316 #12 0x802086f7 in lwp_trampoline () This reminds me of PR 55493, which was on a different box. Cheers, Patrick
Re: strange assert failure on today's -current
On Mon, Aug 03, 2020 at 02:59:43PM +0100, Patrick Welche wrote: > On Sun, Aug 02, 2020 at 09:22:22PM +0100, Chavdar Ivanov wrote: > > I've rebuilt pkgin itself, it doesn't do that on the build host, only > > on one which normally receives its packages via pkgin. > > Oddly I saw that too, got as far as deciding that > > https://github.com/NetBSDfr/pkgin/issues/86 > > looked similar. Thought it might be good to get a better backtrace and > recompiled with "-O0 -ggdb". Of course, then there was no more problem, > and pkgin just worked. I think the client had a pkgin that had been built on the server. The recompile bit happened on the client. That might be more relevant than the flags, but you say you "rebuilt pkgin"? Cheers, Patrick
Re: strange assert failure on today's -current
On Sun, Aug 02, 2020 at 09:22:22PM +0100, Chavdar Ivanov wrote: > I've rebuilt pkgin itself, it doesn't do that on the build host, only > on one which normally receives its packages via pkgin. Oddly I saw that too, got as far as deciding that https://github.com/NetBSDfr/pkgin/issues/86 looked similar. Thought it might be good to get a better backtrace and recompiled with "-O0 -ggdb". Of course, then there was no more problem, and pkgin just worked. Cheers, Patrick
Re: Failure during nbmake build
On Sun, Jul 26, 2020 at 10:33:33AM +0100, Chavdar Ivanov wrote: > cc -D_PATH_DEFSYSPATH="/home/sysbuild/src/share/mk" > -DDEFSHELL_CUSTOM="/bin/sh" -DHAVE_SETENV=1 -DHAVE_STRDUP=1 > -DHAVE_STRERROR=1 -DHAVE_STRFTIME=1 -DHAVE_VSNPRINTF=1 -O -c > /home/sysbuild/src/usr.bin/make/lst.lib/*.c > cc: error: /home/sysbuild/src/usr.bin/make/lst.lib/*.c: No such file > or directory > cc: fatal error: no input files > compilation terminated. I think the lst.lib directory has just been removed. I haven't seen your build error, but do see a couple of references to lst.lib in Makefiles. Cheers, Patrick
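The error itself is what sh(1) does with a pattern that matches nothing: it passes the pattern through unchanged, so if lst.lib was indeed removed, cc was asked to open a file literally named `*.c`. A minimal sketch reproducing that behaviour with glob(3) and GLOB_NOCHECK, which returns the pattern itself when nothing matches (`glob_left_literal` is a hypothetical helper, not anything in make's sources):

```c
#include <glob.h>
#include <string.h>

/* Hypothetical helper: returns 1 if expanding `pattern` sh(1)-style
 * leaves it unexpanded (nothing matched), 0 if it matched real paths,
 * and -1 if glob() itself failed. */
int glob_left_literal(const char *pattern)
{
    glob_t g;
    int literal;

    /* GLOB_NOCHECK: on no match, return the pattern as the only entry. */
    if (glob(pattern, GLOB_NOCHECK, NULL, &g) != 0)
        return -1;

    literal = g.gl_pathc == 1 && strcmp(g.gl_pathv[0], pattern) == 0;
    globfree(&g);
    return literal;
}
```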
Re: wm0 panic
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote: > Trying a today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can > boot multiuser without a network. If I log in as root, as soon as I hit > enter: > > # ifconfig wm0 inet 10.0.0.62 netmask 0xff00 > [ 127.5763268] Kernel lock error 127.5763268] lock address : > 0x8106ab40 type : spin I can't reproduce this after http://mail-index.netbsd.org/source-changes/2020/07/07/msg119158.html Cheers, Patrick
Re: x86 in-kernel fpu bug
On Mon, Jul 20, 2020 at 04:49:14PM +, Taylor R Campbell wrote: > > Date: Mon, 20 Jul 2020 11:04:21 +0100 > > From: Patrick Welche > > > > After a -current/amd64 update, a sudden outbreak of floating point > > exceptions: > > > > /usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1: > > > > internal compiler error: Floating point exception > > > > > > Fetching message headers... > > Floating point exception (core dumped) mutt > > > > Any guesses? > > There's a good chance this has been fixed in sys/arch/x86/x86/fpu.c > revision 1.72 -- can you update and try again with a new kernel? Indeed! Thanks, Patrick
Re: Samba DC provisioning fails with ACL-enabled NetBSD-current
On Mon, Jul 20, 2020 at 05:47:59PM +0200, Matthias Petermann wrote: > test10# mount > /dev/dk0 on / type ffs (acls, log, local) In /etc/fstab, try /dev/dk0 / ffs rw,posix1eacls 1 1 (rather than acls) Cheers, Patrick
Re: floating point exceptions
On Mon, Jul 20, 2020 at 11:24:29AM -, Michael van Elst wrote: > pr...@cam.ac.uk (Patrick Welche) writes: > > >After a -current/amd64 update, a sudden outbreak of floating point > >exceptions: > > >/usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1: > > > >internal compiler error: Floating point exception > > > >Fetching message headers... > >Floating point exception (core dumped) mutt > > >Any guesses? > > > FPU usage in the kernel was enabled to support AES-NI for ipsec and cgd. > A workaround is to comment out all aes_md_init() calls in identcpu.c. Yes, this laptop is much happier now! I'm surprised as I was using the patch on an AMD build box before it was even committed with only a cgd speed-up to report. The unhappy laptop has a cpu0: "Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz" cpu0: features1 0x7ffafbff cpu0: features1 0x7ffafbff Cheers, Patrick
floating point exceptions
After a -current/amd64 update, a sudden outbreak of floating point exceptions: /usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1: internal compiler error: Floating point exception Fetching message headers... Floating point exception (core dumped) mutt Any guesses? Cheers, Patrick
Re: Heads up: ubc_direct enabled by default
On Thu, Apr 23, 2020 at 10:10:56PM +, Andrew Doran wrote: > This affects amd64, alpha and aarch64, but only 1 and 2 CPU systems so far. > Any more and it's still off by default. Only the default has changed so the > sysctl (vm.ubc_direct) still works for turning it on and off manually. > > This works great for me on amd64 but needs some tweaks to handle many CPUs. > I have some ideas on that one and hopefully will have something to try soon. What are the risks of switching it on for > 2 CPUs? (With a LOCKDEBUG kernel on an 8-core Ryzen 1700 for PR kern/55362, after days I'm at # dd if=/dev/zero obs=64k | progress -l 12884901888b dd of=/dev/rdk6 ibs=64k 8% |** | 544 GiB 2.18 MiB/s --:-- ETA which I think could be helped with ubc_direct. I don't want to increase the problem domain, though.) Cheers, Patrick
Re: rump bridge fun
On Tue, Jul 14, 2020 at 11:20:24AM +0100, Patrick Welche wrote: > On 9.99.68/amd64, in one xterm: > > $ rump_allserver -sv unix:///tmp/sock > > in another xterm: > > $ export RUMP_SERVER=unix:///tmp/sock > $ rump.ifconfig -a > lo0: flags=0x8049 mtu 33624 > inet 127.0.0.1/8 flags 0 > inet6 ::1/128 flags 0x20 > inet6 fe80::1%lo0/64 flags 0 scopeid 0x1 > $ rump.ifconfig bridge0 create > > and watch rump_allserver's cpu use hit 100% (!) Filed as PR kern/55489 Cheers, Patrick
rump bridge fun
On 9.99.68/amd64, in one xterm: $ rump_allserver -sv unix:///tmp/sock in another xterm: $ export RUMP_SERVER=unix:///tmp/sock $ rump.ifconfig -a lo0: flags=0x8049 mtu 33624 inet 127.0.0.1/8 flags 0 inet6 ::1/128 flags 0x20 inet6 fe80::1%lo0/64 flags 0 scopeid 0x1 $ rump.ifconfig bridge0 create and watch rump_allserver's cpu use hit 100% (!) Cheers, Patrick
Re: wm0 panic
On Mon, Jun 29, 2020 at 12:53:23PM +0900, Kengo NAKAHARA wrote: > It seems some other code have held KERNEL_LOCK too long time. > Could you show the function of last locked address? > # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5 With Jun 28 14:26 code # addr2line -e netbsd.3.gdb -f 0x80a4c526 doifioctl /usr/src/sys/arch/amd64/compile/QUANTZDBG/../../../../net/if.c:3403 (discriminator 3) > If the panic can reappear, could you show "show all locks/t" of ddb? It is nicely reproducible (boot single user, type "ifconfig wm0 up"), I have a core dump and a serial console, but debugging locking issues is "interesting"! Thanks, Patrick type : spin initialized : 0x80ada119 shared holds : 0 exclusive: 1 shares wanted: 0 exclusive: 3 relevant cpu : 1 last held: 0 relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40 last locked* : 0x80a4c526 unlocked : 0x80a4c517 curcpu holds : 0 wanted by: 0xf1c63767f200 db{1}> show all locks /t [Locks tracked through LWPs] ** LWP 330.330 (ifconfig) @ 0xf1c6387c8a40, l_stat=7 *** Locks held: * Lock 0 (ick address : 0xf1c637a4e380 type : sleep/adaptive initialized : 0x80a475fd shared holds : 0 exclusive: 1 shares wanted: 0 exclusive: 0 relevant cpu : 0 last held: 0 relevant lwp : 0xf1c6388a40 last locked* : 0x80a4bf94 unlocked : 0x80a4c02b owner field : 0xf1c6387c8a40 wait/spin:0/0 Turnstile: no active turnstile for this lock. *** Loczed at module_hook_init) lock address : 0x8106a800 type : sleep/adaptive initialized : 0x80952c6e shared holds : 0 exclusive: 0 shares wanted: 0 exclusive: 0 relevant cpu : 0 last held: 0 relevant lwp : 0xf1c6387c8a40 last held: 00 last locked : 00 unlocked*: 00 owner field : 00 wait/spin:0/0 Turnstile: no active turnstile for this lock. 
*** Traceback: trace: pid 330 lid 330 at 0x8 address 0x283 is invalid ?() at 283 address 0x10 is invalid address 0x8 is invalid db_printf() at netbsd:db_printf ** LWP 0.402 (iic1) @ 0xf1c637f1aa40, l_stat=7 *** Locks held: none *** Locks wanted: * Lock 0 (initialized at main) lock address : 0x8106a700 type :0x80ada119 shared holds : 0 exclusive: 1 shares wanted: 0 exclusive: 3 relevant cpu : 2 last hellwp : 0xf1c637f1aa40 last held: 0xf1c6387c8a40 last locked* : 0x80a4c526 unlocked : 0x80a4c517 curcpu holds : 0 wanted by: 0xf1c63767f200 *** 02 at 0xb0025da16ec0 sleepq_block() at netbsd:sleepq_block+0x211 iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52 ** LWP 0.401 (iic0) @ 0xf1c637f1a600, l_stat=7 *** Locks held: none *** Locks wanted: * Lock 0 (initiax8106a700 type : spin initialized : 0x80ada119 shared holds : 0 exclusive: 1 shares wanted: 0 exclusive: 3 relevant cpu : 1 last held: 0 relevant lwp : 0xf1c637f1a600 last held: 0xf1c6387c8a40 last locked* : 0x80a4c526 unlocked : 0x80a4c517 curcpu holds : 0 wanted by: 0xf1c63767f200 *** Traceback: trace: pid 0 lid 401 at 0xb0025da11ec0 sleepq_block() at netbsd:sleepq_block+0x211 iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52 ** LWP 0.23 (softclk/1) @ 0xf1c63767f200, l_stat=7 *** Locks held: * Lock 0 (initialized at soinit) lock address : 0xf1cd177e3080 type : sleep/adaptive initialed holds : 0 exclusive: 1 shares wanted: 0 exclusive: 0 relevant cpu : 1 last held: 1 r last held: 0xf1c63767f200 last locked* : 0x806c3e65 unlocked : 0x806d5ebd owner field : 0xf1c63767f200 wait/spin:0/0 Turnstile: no active turnstileted: * Lock 0 (initialized at main) lock address : 0x8106a700 type : spin initialized : 0x80ada119 shared holds : 0 exclusive: 1 shares wanted: 0 exclusive: 3 relevant cpu : 1 last held: 0 relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40 last locked* : 0x80a4c526 unlocked :
Re: wm0 panic
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote: > (must try with biosboot instead fo EFI which is the case here) makes no difference
wm0 panic
Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can boot multiuser without a network. If I log in as root, as soon as I hit enter: # ifconfig wm0 inet 10.0.0.62 netmask 0xff00 [ 127.5763268] Kernel lock error 127.5763268] lock address : 0x8106ab40 type : spin [ 127.5863237] initialized : 0x80b0bbb9 [ 127.5863237] shared holds : 0 exclusive: 1 [ 127.5963238] shares wanted: 0 exclusive: 1 [ 127.6063236] relevant cpu : 1 last held: 0 [ 127.6163235] relevant lwp : 0x8d419a07f20 [ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6 [ 127.6263235] curcpu holds : 0 wanted by: 0x8d419a07f200 [ 127.6363234] panic: LOCKDEBock,244: spinout [ 127.6363234] cpu1: Begin traceback... [ 127.6463233] vpanic() at netbsd:vpanic+0x152 [ 127.6463233] snprintf() at netbsd:snprintf [ 127.6563232] lockdebug_more() at netbsd:lockdebug_more [ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244 [ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a [ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34 [ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f [ 127.6863230] softint_disph+0x108 [ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0 [ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f [ 127.7063229] --- interrupt --- [ 127.706322traceback... (The box is happily usable without LOCKDEBUG; it just means I can't debug what I'm trying to get at...) (Must try with biosboot instead of EFI, which is what this box uses.) wm0 at pci7 dev 0 function 0: I211 Ethernet (COPPER) (rev. 0x03) wm0: for TX and RX interrupting at msix3 vec 0 affinity to 1 wm0: for TX and RX interrupting at msix3 vec 1 affinity to 2 wm0: for LINK interrupting at msix3 vec 2 wm0: PCI-Express bus wm0: 64 words iNVM, version 0.6 wm0: Ethernet address 60:45:cb:9e:13:dd wm0: COMPAT = wm0: Copper wm0: 0xc614420 makphy0 at wm0 phy 1: I210 10/100/1000 media interface, rev. 0 # strings /netbsd | grep if_wm.c $NetBSD: if_wm.c,v 1.679 2020/06/27 13:32:00 jmcneill Exp $ Cheers, Patrick