from:"Patrick Welche"

Re: disklabel change?

2024-04-22 Thread Patrick Welche

On Mon, Apr 22, 2024 at 03:00:40PM +0100, Patrick Welche wrote:
> On Mon, Apr 22, 2024 at 01:11:56PM -, Michael van Elst wrote:
> > pr...@welche.eu (Patrick Welche) writes:
> > 
> > >In fact, the difference is between "-t" and "-rt":
> > 
> > >I deem "-t" output to be correct (and matches what I had in /etc/disktab)
> > 
> > 
> > The in-kernel disklabel gets the RAW_PART from the disk geometry
> > and if RAW_PART == 3, it gets d_partitions[2] from the MBR partition
> > table.
> > 
> > That explains why 'disklabel -t' looks correct, it shows the in-kernel
> > disklabel.
> > 
> > It doesn't explain why the on-disk label has the entries swapped.
> > When you edit the disklabel, the kernel writes to the disk. When
> > that corrects the error, the bug is in the disklabel program,
> > otherwise it's in the kernel.
> 
> Given that the content of /etc/disktab agrees with the correct version,
> I am pretty sure that 3 years ago I did a "disklabel -r -w sd0 perc"
> 
> I suppose I could do it again and see if -rt becomes correct... right?

After all that: there was a typo in the disktab:

:pc#4294703103:od#264192:\
:pd#4294967295:od#0:\

"od" in both lines... After fixing that, -t and -rt agree!
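A plausible reading of the swap, assuming disktab entries are looked up getcap(3)-style and a lookup returns the first matching capability: the first "od#264192" shadows "od#0", so d gets offset 264192, while c, having no "oc" at all, ends up at offset 0, which is exactly what -rt showed. A minimal sketch that flags repeated capability names in an entry already joined onto one line; find_duplicate_cap and its helpers are hypothetical, not part of disklabel(8) or libc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Length of the capability name at the start of cap: everything before
 * the first '#' (number), '=' (string), '@' (cancelled) or ':' (boolean).
 */
static size_t
cap_name_len(const char *cap)
{
	return strcspn(cap, "#=@:");
}

/* Advance past the current ':'-separated field and its terminating ':'. */
static const char *
next_field(const char *p)
{
	p += strcspn(p, ":");
	return (*p == ':') ? p + 1 : p;
}

/*
 * Return the first capability in a one-line disktab entry
 * ("name|description:cap:cap:...") whose name already appeared earlier,
 * or NULL if every name is unique.
 */
static const char *
find_duplicate_cap(const char *entry)
{
	const char *start, *cur, *prev;
	size_t len;

	assert(entry != NULL);
	start = strchr(entry, ':');	/* skip the name|description field */
	if (start == NULL)
		return NULL;
	start++;

	for (cur = start; *cur != '\0'; cur = next_field(cur)) {
		len = cap_name_len(cur);
		if (len == 0)
			continue;	/* empty field, e.g. from "::" */
		for (prev = start; prev < cur; prev = next_field(prev))
			if (cap_name_len(prev) == len &&
			    strncmp(prev, cur, len) == 0)
				return cur;
	}
	return NULL;
}
```

Run against the original entry (continuation lines joined), it points at the second "od"; with the first one corrected to "oc" it returns NULL.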

Sorry for the noise - it was surprising though...


Cheers,

Patrick

Re: disklabel change?

2024-04-22 Thread Patrick Welche

On Mon, Apr 22, 2024 at 01:11:56PM -, Michael van Elst wrote:
> pr...@welche.eu (Patrick Welche) writes:
> 
> >In fact, the difference is between "-t" and "-rt":
> 
> >I deem "-t" output to be correct (and matches what I had in /etc/disktab)
> 
> 
> The in-kernel disklabel gets the RAW_PART from the disk geometry
> and if RAW_PART == 3, it gets d_partitions[2] from the MBR partition
> table.
> 
> That explains why 'disklabel -t' looks correct, it shows the in-kernel
> disklabel.
> 
> It doesn't explain why the on-disk label has the entries swapped.
> When you edit the disklabel, the kernel writes to the disk. When
> that corrects the error, the bug is in the disklabel program,
> otherwise it's in the kernel.

Given that the content of /etc/disktab agrees with the correct version,
I am pretty sure that 3 years ago I did a "disklabel -r -w sd0 perc"

I suppose I could do it again and see if -rt becomes correct... right?


Cheers,

Patrick

Re: disklabel change?

2024-04-22 Thread Patrick Welche

In fact, the difference is between "-t" and "-rt":

$ disklabel -t sd0
perc|Automatically generated label:\
:dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
:su#4294967295:\
:pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
:pb#33554432:ob#21235712:tb=swap:\
:pc#4294703103:oc#264192:\
:pd#4294703103:od#0:\
:pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:
$ disklabel -rt sd0
perc|Automatically generated label:\
:dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
:su#4294967295:\
:pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
:pb#33554432:ob#21235712:tb=swap:\
:pc#4294703103:oc#0:\
:pd#4294967295:od#264192:\
:pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

Given

$ sysctl kern.rawpartition
kern.rawpartition = 3

I deem "-t" output to be correct (and matches what I had in /etc/disktab)
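With kern.rawpartition = 3 the raw partition is 'a' + 3 = 'd', and on this layout it should start at sector 0, while c is the MBR-derived partition at 264192 (per the in-kernel label logic described earlier in the thread). A minimal sketch of that check, fed with the a to d entries from the two listings above; struct part and the helper names are illustrative, not disklabel's own types.

```c
#include <assert.h>
#include <stdint.h>

/* One partition as printed by "disklabel -t": size (p?) and offset (o?). */
struct part {
	uint32_t size;
	uint32_t offset;
};

/* Partition letter for a raw partition index: 0 -> 'a', 3 -> 'd'. */
static char
raw_part_letter(int rawpart)
{
	assert(rawpart >= 0 && rawpart < 26);
	return (char)('a' + rawpart);
}

/*
 * Nonzero if the raw partition starts at sector 0. Only the offset is
 * checked: sizes in this label are clamped to 32 bits (su#4294967295),
 * so they cannot describe the full disk anyway.
 */
static int
raw_part_starts_at_zero(const struct part *parts, int nparts, int rawpart)
{
	assert(rawpart >= 0 && rawpart < nparts);
	return parts[rawpart].offset == 0;
}
```

The in-kernel label (-t) passes for partition d; the on-disk label (-rt) does not.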


Cheers,

Patrick


On Sun, Apr 21, 2024 at 07:02:46PM +0100, Patrick Welche wrote:
> With a kernel & userland of 14 April 2024, on amd64, I just did:
> 
> # disklabel -rt sd0
> perc|Automatically generated label:\
> :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
> :su#4294967295:\
> :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
> :pb#33554432:ob#21235712:tb=swap:\
> :pc#4294703103:oc#0:\
> :pd#4294967295:od#264192:\
> :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:
> 
> and was surprised to see that d no longer was the whole disk. My
> recollection was that peecees were odd and i386/amd64 used d, whereas
> everyone else used c. Has this changed for consistency?
> 
> That computer's /etc/disktab contains
> 
> perc|PERC H700:\
> :dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
> :su#4294967295:\
> :pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
> :pb#33554432:ob#21235712:tb=swap:\
> :pc#4294703103:od#264192:\
> :pd#4294967295:od#0:\
> :pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:
> 
> which shows it was the other way around on 29 April 2021 when the
> disklabel was written.
> 
> 
> Cheers,
> 
> Patrick

disklabel change?

2024-04-21 Thread Patrick Welche

With a kernel & userland of 14 April 2024, on amd64, I just did:

# disklabel -rt sd0
perc|Automatically generated label:\
:dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
:su#4294967295:\
:pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
:pb#33554432:ob#21235712:tb=swap:\
:pc#4294703103:oc#0:\
:pd#4294967295:od#264192:\
:pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

and was surprised to see that d no longer was the whole disk. My
recollection was that peecees were odd and i386/amd64 used d, whereas
everyone else used c. Has this changed for consistency?

That computer's /etc/disktab contains

perc|PERC H700:\
:dt=SCSI:se#512:ns#32:nt#64:sc#2048:nc#2143360:\
:su#4294967295:\
:pa#20971520:oa#264192:ta=4.2BSD:ba#0:fa#0:\
:pb#33554432:ob#21235712:tb=swap:\
:pc#4294703103:od#264192:\
:pd#4294967295:od#0:\
:pe#4240177151:oe#54790144:te=4.2BSD:be#0:fe#0:

which shows it was the other way around on 29 April 2021 when the
disklabel was written.


Cheers,

Patrick

Re: xsrc mesalib build problem

2024-04-13 Thread Patrick Welche

On Sat, Apr 13, 2024 at 01:19:05PM +0100, Patrick Welche wrote:
> Building xsrc on -current/amd64 with 
> 
> HAVE_MESA_VER=21
> HAVE_GCC=12
> 
> fails for me with
> 
> /usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: error: 
> 'STN_UNDEF' undeclared (first use in this function); did you mean 'SHN_UNDEF'?
>   658 |   if (r_sym == STN_UNDEF) {
>       |                ^~~~~~~~~
>       |                SHN_UNDEF
> /usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: note: 
> each undeclared identifier is reported only once for each function it appears 
> in 
> 
> Non standard build options I know, but it had worked, and I had a complete
> build on 27 March...
> 
> STN_UNDEF / SHN_UNDEF appear to be from LLVM? (and this bit of build I think
> is gallium == llvmpipe?)

Really confused:

STN_UNDEF should be found via libelf.h -> sys/exec_elf.h

cd /usr/src/external/mit
make -j24 dependall
make -j24 install

works without complaint, and then

cd /usr/src
sh build.sh -u -x -j24 -E build

fails with the above
*** Failed target: ac_rtld.pico

If I run the lengthy output of

*** Failed commands:
${_MKTARGET_COMPILE}
=> @echo '#  ' "compile " gallium/ac_rtld.pico
${COMPILE.c} ${COPTS.${.IMPSRC:T}} ${CPUFLAGS.${.IMPSRC:T}} 
${CPPFLAGS.${.IMPSRC:T}} ${CSHLIBFLAGS} ${.IMPSRC} -o ${.TARGET}

I can reproduce the problem.

If I use "gcc -E", I see

--
typedef struct {
 uint32_t gh_nbuckets;
 uint32_t gh_symndx;
 uint32_t gh_maskwords;
 uint32_t gh_shift2;
} Elf_GNU_Hash_Header;
# 35 "/usr/include/elfdefinitions.h" 2 3 4
# 41 "/usr/include/libelf.h" 2 3 4

typedef struct _Elf Elf;
typedef struct _Elf_Scn Elf_Scn;
-
c.f. libelf.h
-
#ifdef BUILTIN_ELF_HEADERS
# include 
# include 
# include "elfdefinitions.h"
#elif HAVE_NBTOOL_CONFIG_H
# include 
#elif defined(__NetBSD__)
# include 
# include 
#elif defined(__FreeBSD__)
# include 
# include 
# include 
#else
  #error "No valid elf headers"
#endif

/* Library private data structures */
typedef struct _Elf Elf;
typedef struct _Elf_Scn Elf_Scn;
-

My current suspicion is that "elfdefinitions.h" is being used rather
than sys/exec_elf.h; the latter is the one that defines STN_UNDEF:

$ grep _UNDEF elfdefinitions.h exec_elf.h 
elfdefinitions.h:#define SHN_UNDEF   0
exec_elf.h:#define ELF_SYM_UNDEFINED 0
exec_elf.h:#define STN_UNDEF 0   /* undefined index */
exec_elf.h:#define SHN_UNDEF 0   /* Undefined section */

but I don't see "BUILTIN_ELF_HEADERS" in the failing command line,
and why would "make dependall" behave differently?

My guess is that adding STN_UNDEF to elfdefinitions.h will patch over
this, but I don't see why build.sh would behave differently.
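The local workaround floated above amounts to a guarded definition. A minimal sketch, assuming only that STN_UNDEF is 0, as the ELF specification reserves symbol table index 0 as the undefined symbol; reloc_has_symbol and rela64_sym are hypothetical helpers shaped like the test at ac_rtld.c:658. This patches the symptom and says nothing about why build.sh ends up with different headers than "make dependall".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fallback only: sys/exec_elf.h normally provides this. The ELF
 * specification reserves symbol table index 0 as the undefined symbol.
 */
#ifndef STN_UNDEF
#define STN_UNDEF 0
#endif

/* Symbol index of a 64-bit relocation's r_info (same split as ELF64_R_SYM). */
static uint32_t
rela64_sym(uint64_t r_info)
{
	return (uint32_t)(r_info >> 32);
}

/* Does this relocation refer to a symbol at all? */
static int
reloc_has_symbol(uint64_t r_info)
{
	return rela64_sym(r_info) != STN_UNDEF;
}
```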

Possibly related to

Author: riastradh 
Date:   Mon Apr 1 18:33:22 2024 +

elftoolchain: Be consistent about which ELF header files we use.

which would match my notion of it working on 27 March?

Cheers,

Patrick

xsrc mesalib build problem

2024-04-13 Thread Patrick Welche

Building xsrc on -current/amd64 with 

HAVE_MESA_VER=21
HAVE_GCC=12

fails for me with

/usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: error: 
'STN_UNDEF' undeclared (first use in this function); did you mean 'SHN_UNDEF'?
  658 |   if (r_sym == STN_UNDEF) {
      |                ^~~~~~~~~
      |                SHN_UNDEF
/usr/xsrc/external/mit/MesaLib/dist/src/amd/common/ac_rtld.c:658:20: note: each 
undeclared identifier is reported only once for each function it appears in 

Non standard build options I know, but it had worked, and I had a complete
build on 27 March...

STN_UNDEF / SHN_UNDEF appear to be from LLVM? (and this bit of build I think
is gallium == llvmpipe?)


Cheers,

Patrick

Re: gdb crashes on current

2024-03-20 Thread Patrick Welche

On Wed, Mar 20, 2024 at 11:33:30PM +0500, Vitaly Shevtsov wrote:
> Hello!
> 
> It seems that gdb from base NetBSD image doesn't work with netbsd's
> libcurses in tui mode:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x756547929fba in _lwp_kill () from /usr/lib/libc.so.12
> [Current thread is 1 (process 12621)]
> (gdb) bt
> #0  0x756547929fba in _lwp_kill () from /usr/lib/libc.so.12
> #1  0x0101ef97 in handle_fatal_signal(int) ()
> #2  0x0101f152 in handle_sigsegv(int) ()
> #3  
> #4  0x756547e679e1 in prefresh () from /usr/lib/libcurses.so.9
> #5  0x00eec1a0 in tui_source_window_base::refresh_window() ()
> #6  0x00eedf18 in tui_unhighlight_win(tui_win_info*) ()
> #7  0x00eec5f8 in
> tui_source_window_base::do_erase_source_content(char const*) ()
> #8  0x00ef60e5 in tui_layout_split::apply(int, int, int, int, bool) ()
> #9  0x00ef4aa8 in tui_apply_current_layout(bool) ()
> #10 0x00ef4ddb in tui_set_layout(tui_layout_split*) ()
> #11 0x00ee4f2f in tui_enable() ()
> #12 0x00ee5487 in tui_rl_switch_mode(int, int) ()
> #13 0x01379fa8 in _rl_dispatch_subseq ()
> #14 0x0137aa27 in _rl_dispatch_callback ()
> #15 0x0135a1f1 in rl_callback_read_char ()
> #16 0x0101f40e in gdb_rl_callback_read_char_wrapper_noexcept() ()
> #17 0x0102020d in gdb_rl_callback_read_char_wrapper(void*) ()
> #18 0x0101eeb1 in stdin_event_handler(int, void*) ()
> #19 0x01300eba in gdb_wait_for_event(int) [clone .part.0] ()
> #20 0x0130155a in gdb_do_one_event(int) ()
> #21 0x00fa0576 in captured_command_loop() ()
> #22 0x00fa2113 in gdb_main(captured_main_args*) ()
> #23 0x013c66fb in main ()

Just had a go, and "tui enable" doesn't get as far as libcurses

(gdb) bt
#0  0x7f7ff78dfffa in ?? ()
#1  0x00222b45 in gdb_rl_callback_handler ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/event-top.c:262
#2  0x00222cfa in 
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count () at 
/usr/export/amd64/usr/include/g++/bits/shared_ptr_base.h:1070
#3  std::__shared_ptr, 
std::allocator >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr ()
at /usr/export/amd64/usr/include/g++/bits/shared_ptr_base.h:1524
#4  std::shared_ptr, 
std::allocator > >::~shared_ptr ()
at /usr/export/amd64/usr/include/g++/bits/shared_ptr.h:175
#5  gdb_exception::~gdb_exception ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdbsupport/common-exceptions.h:114
#6  gdb_rl_callback_read_char_wrapper_noexcept ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/event-top.c:212
#7  0x7f7ff78e00b0 in ?? ()
#8  0x0001000b in ?? ()
#9  0x in ?? ()

but "gdb -tui" does:

Thread 1 "" received signal SIGSEGV, Segmentation fault.
prefresh (pad=0x0, pbegy=0, pbegx=0, sbegy=1, sbegx=5, smaxy=0, smaxx=78)
at /usr/src/lib/libcurses/refresh.c:511
511 pad->pbegy = pbegy;
(gdb) bt
#0  prefresh (pad=0x0, pbegy=0, pbegx=0, sbegy=1, sbegx=5, smaxy=0, smaxx=78)
at /usr/src/lib/libcurses/refresh.c:511
#1  0x000f60ff in tui_source_window_base::refresh_window ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:267
#2  0x000f7e68 in tui_unhighlight_win ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-wingeneral.c:138
#3  tui_unhighlight_win ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-wingeneral.c:131
#4  0x000f6552 in tui_source_window_base::do_erase_source_content ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:219
#5  0x0010008b in tui_layout_split::apply ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:1019
#6  0x000fe927 in tui_apply_current_layout ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:81
#7  0x000fec65 in tui_set_layout ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-layout.c:150
#8  0x000f1f2f in tui_enable ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui.c:452
#9  0x001c319c in interp_set ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/interps.c:191
#10 0x001a7c55 in captured_main_1 ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1145
#11 0x001a8b0f in captured_main ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1320
#12 gdb_main ()
at /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/main.c:1345
#13 0x005be5cf in main ()
at /usr/src/external/gpl3/gdb/bin/gdb/../../dist/gdb/gdb.c:32
(gdb) print pad
$1 = (WINDOW *) 0x0
(gdb) frame 1
#1  0x000f60ff in tui_source_window_base::refresh_window ()
at 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/tui/tui-winsource.c:267
267
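The print shows gdb's TUI handing a NULL pad to prefresh(), which dereferences it at refresh.c:511. A minimal sketch of the kind of guard that would turn this into an error return; struct pad_origin and set_pad_origin are hypothetical stand-ins, not libcurses' WINDOW or its actual code, and the real fix may equally belong on the gdb side, where the pad should have been created before the refresh.

```c
#include <assert.h>
#include <stddef.h>

#define PAD_ERR	(-1)	/* curses-style error return */
#define PAD_OK	0

/* Hypothetical stand-in for the WINDOW fields written at refresh.c:511. */
struct pad_origin {
	int pbegy;
	int pbegx;
};

/*
 * Record the pad origin only if the pad exists; a NULL pad (pad=0x0 in
 * the backtrace) becomes an error return instead of a SIGSEGV.
 */
static int
set_pad_origin(struct pad_origin *pad, int pbegy, int pbegx)
{
	if (pad == NULL)
		return PAD_ERR;
	pad->pbegy = pbegy;
	pad->pbegx = pbegx;
	return PAD_OK;
}
```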

Re: binary in dtrace output

2023-12-27 Thread Patrick Welche

On Wed, Dec 27, 2023 at 12:15:36AM +, RVP wrote:
> On Fri, 8 Dec 2023, Patrick Welche wrote:
> 
> > When profiling my simulation with
> > 
> > dtrace -x ustackframes=100 -n 'profile-9 / execname == "mds" / { 
> > @[ustack()] = count(); } tick-180s { exit(0); }' -o /tmp/out.stacks
> > 
> > every function name is preceded by c0 df ff ff 7f 7f e.g.,
> > 
> > 0060  20 20 20 20 20 20 20 20  20 c0 df ff ff 7f 7f 60  |         ......`|
> > 0070  66 5f 70 61 69 72 5f 44  4c 56 4f 28 61 74 6f 6d  |f_pair_DLVO(atom|
> > 
> > which is making perl, i.e., FlameGraph unhappy.
> > 
> > 6 bytes seems a strange number.
> > 
> > Is this normal / any hints on what I can do about it?
> > 
> 
> This looks like junk on the stack:
> 
> src/external/cddl/osnet/dist/lib/libdtrace/common/dt_consume.c:dt_print_ustack()
> doesn't initialize `objname' and doesn't check the return of proc_objname()
> (line 1385) either.
> 
> Can you try this patch (fudged from FreeBSD)?

Thanks - it works a treat!
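RVP's diagnosis also explains the backtick after the junk: each frame is printed as object`function, and the object part came from an uninitialised buffer. A minimal sketch of the defensive pattern, not the actual dt_consume.c patch; format_frame is hypothetical, and a NULL objname stands in for a failed proc_objname() lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one user stack frame as "object`function", or just "function"
 * when the object name is unknown. The caller passes NULL (or "") after
 * a failed lookup instead of printing whatever was left in the buffer.
 */
static int
format_frame(char *out, size_t outlen, const char *objname, const char *func)
{
	assert(out != NULL && func != NULL);
	if (objname == NULL || objname[0] == '\0')
		return snprintf(out, outlen, "%s", func);
	return snprintf(out, outlen, "%s`%s", objname, func);
}
```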

Cheers,

Patrick

binary in dtrace output

2023-12-08 Thread Patrick Welche

When profiling my simulation with

dtrace -x ustackframes=100 -n 'profile-9 / execname == "mds" / { @[ustack()] = 
count(); } tick-180s { exit(0); }' -o /tmp/out.stacks

every function name is preceded by c0 df ff ff 7f 7f e.g.,

0060  20 20 20 20 20 20 20 20  20 c0 df ff ff 7f 7f 60  |         ......`|
0070  66 5f 70 61 69 72 5f 44  4c 56 4f 28 61 74 6f 6d  |f_pair_DLVO(atom|

which is making perl, i.e., FlameGraph unhappy.

6 bytes seems a strange number.

Is this normal / any hints on what I can do about it?
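On the "6 bytes" puzzle: read as a little-endian 64-bit value, c0 df ff ff 7f 7f is 0x7f7fffffdfc0, which has the shape of a NetBSD/amd64 user-space address. If the two bytes that followed were zero, as they would be for such a pointer, printing the buffer as a C string stops after exactly six bytes. The trailing zeros are an assumption, not visible in the dump. A minimal sketch of that arithmetic; both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble up to 8 bytes as a little-endian integer (amd64 byte order). */
static uint64_t
le_bytes_to_u64(const unsigned char *b, size_t n)
{
	uint64_t v = 0;

	assert(n <= 8);
	for (size_t i = n; i > 0; i--)
		v = (v << 8) | b[i - 1];
	return v;
}

/* Number of bytes a "%s" print would emit: bytes before the first NUL, at most n. */
static size_t
bytes_before_nul(const unsigned char *b, size_t n)
{
	size_t i = 0;

	while (i < n && b[i] != 0)
		i++;
	return i;
}
```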


Cheers,

Patrick

Re: gcc 12 question

2023-11-24 Thread Patrick Welche

On Thu, Nov 23, 2023 at 12:31:34PM +, Robert Swindells wrote:
> 
> Patrick Welche  wrote:
> > I'm trying to build a release on amd64 using
> >
> > HAVE_MESA_VER=21
> > HAVE_GCC=12
> 
> What does pkgsrc graphics/MesaLib do if built using gcc 12?

It builds OK.

Given

https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716

my guess is that the pkgsrc package doesn't treat warnings as errors.
(-Werror=stringop-overread)


Cheers,

Patrick

Re: gcc 12 question

2023-11-24 Thread Patrick Welche

On Thu, Nov 23, 2023 at 09:22:11AM +, Patrick Welche wrote:
> I'm trying to build a release on amd64 using
> 
> HAVE_MESA_VER=21
> HAVE_GCC=12
> 
> and get the following build error which I am puzzled by:
> 
> inlined from 'r300_merge_textures_and_samplers' at 
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:823:17,
> inlined from 'r300_update_derived_state' at 
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:1064:9:
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5:
>  error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 
> [-Werror=stringop-overread]
>   676 | util_format_unswizzle_4f(border_swizzled, border, desc->swizzle);
>   | ^~~~
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5:
>  note: referencing argument 3 of type 'const unsigned char[4]'
> In file included from 
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/auxiliary/util/u_pack_color.h:40,
>  from 
> /usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:28:
> /usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h: In function 
> 'r300_update_derived_state':
> /usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h:1671:6: note: 
> in a call to function 'util_format_unswizzle_4f'
>  1671 | void util_format_unswizzle_4f(float *dst, const float *src,
>   |  ^~~~
> 
> 
> desc is a const struct util_format_description *
> 
> util/format/u_format.h:226 defines the swizzle member of
> util_format_description as:
> 
> unsigned char swizzle[4];
> 
> and the truncated quote of util/format/u_format.h:1671 is
> 
> void util_format_unswizzle_4f(float *dst, const float *src,
>   const unsigned char swz[4]);
> 
> so all is consistent.
> 
> error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 
> [-Werror=stringop-overread]
> 
> Assuming "a region" is the 3rd argument, then how can the size of an
> unsigned char[4] be zero?
> 
> Puzzled...

Seems others were puzzled too:

https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716
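Until a compiler-side resolution, a locally scoped suppression keeps -Werror in force for everything else. A minimal sketch with hypothetical stand-ins for the Mesa types and function (the swizzle encoding below is assumed for illustration, not taken from Mesa); only the pragma pattern is the point. GCC may still report warnings that arise through an inlining chain despite the pragma, in which case -Wno-error=stringop-overread for that one file is the blunter fallback.

```c
#include <assert.h>

/*
 * Hypothetical simplified unswizzle: swz[i] in 0..3 selects a source
 * channel, 5 yields 1.0f and anything else yields 0.0f.
 */
static void
unswizzle_4f(float *dst, const float *src, const unsigned char swz[4])
{
	for (int i = 0; i < 4; i++) {
		if (swz[i] < 4)
			dst[i] = src[swz[i]];
		else
			dst[i] = (swz[i] == 5) ? 1.0f : 0.0f;
	}
}

struct format_desc {
	unsigned char swizzle[4];	/* same shape as the Mesa member */
};

/*
 * Only this call is exempted from -Wstringop-overread, and only on
 * GCC 11 or later, where that warning exists.
 */
static void
convert_border(float *out, const float *border, const struct format_desc *desc)
{
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overread"
#endif
	unswizzle_4f(out, border, desc->swizzle);
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#pragma GCC diagnostic pop
#endif
}
```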


Cheers,

Patrick

gcc 12 question

2023-11-23 Thread Patrick Welche

I'm trying to build a release on amd64 using

HAVE_MESA_VER=21
HAVE_GCC=12

and get the following build error which I am puzzled by:

inlined from 'r300_merge_textures_and_samplers' at 
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:823:17,
inlined from 'r300_update_derived_state' at 
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:1064:9:
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5:
 error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 
[-Werror=stringop-overread]
  676 | util_format_unswizzle_4f(border_swizzled, border, desc->swizzle);
  | ^~~~
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:676:5:
 note: referencing argument 3 of type 'const unsigned char[4]'
In file included from 
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/auxiliary/util/u_pack_color.h:40,
 from 
/usr/xsrc/external/mit/MesaLib/dist/src/gallium/drivers/r300/r300_state_derived.c:28:
/usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h: In function 
'r300_update_derived_state':
/usr/xsrc/external/mit/MesaLib/dist/src/util/format/u_format.h:1671:6: note: in 
a call to function 'util_format_unswizzle_4f'
 1671 | void util_format_unswizzle_4f(float *dst, const float *src,
  |  ^~~~


desc is a const struct util_format_description *

util/format/u_format.h:226 defines the swizzle member of
util_format_description as:

unsigned char swizzle[4];

and the truncated quote of util/format/u_format.h:1671 is

void util_format_unswizzle_4f(float *dst, const float *src,
  const unsigned char swz[4]);

so all is consistent.

error: 'util_format_unswizzle_4f' reading 4 bytes from a region of size 0 
[-Werror=stringop-overread]

Assuming "a region" is the 3rd argument, then how can the size of an
unsigned char[4] be zero?

Puzzled...


Cheers,

Patrick

Re: ffmpeg6 and SSP?

2023-11-15 Thread Patrick Welche

On Wed, Nov 15, 2023 at 01:48:19PM +0200, Vitaly Shevtsov wrote:
> Even arcticfox cannot be built due to the same reason.

Christos fixed it - cvs update and rebuild, and check you have

# nm -g /lib/libc.so | grep ssp
00055136 T __ssp_protected_getcwd
0005512c T __ssp_protected_read
00055131 T __ssp_protected_readlink
0007cc3a T _getfsspec
0007cc3a W getfsspec
0019822f T isspace
00198245 T isspace_l
0004afb7 T wcsspn


Cheers,

Patrick

Re: ffmpeg6 and SSP?

2023-11-15 Thread Patrick Welche

On Tue, Nov 14, 2023 at 11:30:27AM +, Patrick Welche wrote:
> On Tue, Nov 14, 2023 at 10:32:01AM +0000, Patrick Welche wrote:
> > On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> > > I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> > > with HAVE_GCC=12 and pkgsrc-current I'm seeing
> > > 
> > > => Bootstrap dependency digest>=20211023: found digest-20220214
> > > ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> > > ===> Building for ffmpeg6-6.0nb6
> > > LD  ffmpeg6_g
> > > LD  ffprobe6_g
> > > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of 
> > > `environ'
> > > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of 
> > > `environ'
> > > ld: libavdevice/libavdevice.so: undefined reference to 
> > > `__ssp_protected_read'
> > > ld: libavdevice/libavdevice.so: undefined reference to 
> > > `__ssp_protected_read'
> > > gmake: *** [Makefile:131: ffprobe6_g] Error 1
> > > gmake: *** Waiting for unfinished jobs
> > > gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> > > *** Error code 2
> > > 
> > > 
> > > Suggestions? Try no FORTIFY?
> > 
> > I tried "no FORTIFY" on ffmpeg6 as
> > 
> >   CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""
> > 
> > which didn't help.
> > 
> > I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
> > which didn't help.
> > 
> > I also see the problem with the simpler lang/gawk package:
> > 
> > ld: awkgram.o: in function `get_src_buf':
> > awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `iop_alloc':
> > io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `get_a_record':
> > io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `after_beginfile':
> > io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
> > ld: io.o: in function `redirect_string':
> > io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
> > ld: io.o:io.c:(.text+0x5606): more undefined references to 
> > `__ssp_protected_read' follow
> > 
> > If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
> > the definition of __ssp_inline and make it static again, then gawk builds,
> > 
> > i.e., reverting
> > 
> > -/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $*/
> > +/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $*/
> > 
> > allows gawk to build.
> 
> Userland was built with MKUPDATE=yes - maybe I didn't rebuild whichever
> library should contain the extern definition of __ssp_protected_read ?
> 
> git grep ssp_protected_read
> 
> on https://github.com/NetBSD/src.git returned nothing - where should
> the __ssp_protected_read symbol live?


Thank you to Christos for putting the symbol in libc today with
the addition of ssp_redirect.c!

Before:
$ nm -g libc.so.12.221 | grep ssp
0007bb8a T _getfsspec
0007bb8a W getfsspec
0019717f T isspace
00197195 T isspace_l
00049f67 T wcsspn

After:
$ nm -g libc.so.12.221 | grep ssp
00055136 T __ssp_protected_getcwd
0005512c T __ssp_protected_read
00055131 T __ssp_protected_readlink
0007cc3a T _getfsspec
0007cc3a W getfsspec
0019822f T isspace
00198245 T isspace_l
0004afb7 T wcsspn
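Why a header change produced link errors: per GCC's documentation of the gnu_inline attribute, an extern inline definition carrying it is used only for inlining, and any call that is not inlined refers to a single out-of-line copy that must exist in some library. The 1.15 ssp.h apparently relied on such copies, which libc only gained with ssp_redirect.c; the old static inline version gave every object file its own copy instead, which is why reverting it let gawk build. A minimal sketch of the pattern with a hypothetical name, both copies in one file for brevity (GCC and Clang accept the second definition only because the first is gnu_inline).

```c
#include <assert.h>

/*
 * "Header" copy. Because it is extern inline with __gnu_inline__, this
 * body is only ever used for inlining; a call the compiler does not
 * inline (for example at -O0) is emitted as a plain external reference.
 */
extern __inline __attribute__((__gnu_inline__)) int
demo_inline_add(int a, int b)
{
	return a + b;
}

/*
 * "Library" copy: the one out-of-line definition those references resolve
 * to. Normally it lives in a separate file inside the library; without it
 * the link fails with "undefined reference to `demo_inline_add'".
 */
int
demo_inline_add(int a, int b)
{
	return a + b;
}
```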


Cheers,

Patrick

SSP

2023-11-15 Thread Patrick Welche

Talking of SSP, what can you do once a detection happens?

I see in /var/log/messages:

Nov 15 06:59:32 mail -: mail.example.com exim - - - stack overflow detected; 
terminated

I have:

kern.coredump.setid.dump = 1
kern.coredump.setid.path = /var/crash/%n.core
proc.curproc.rlimit.coredumpsize.soft = unlimited
proc.curproc.rlimit.coredumpsize.hard = unlimited

but /var/crash is empty.

How do you make use of SSP?
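One thing worth ruling out: proc.curproc.* reports the limits of the process running sysctl(8), not those of the exim that aborted, so exim's own core limit may still be 0. A minimal sketch of raising the soft core limit from inside a process with the standard getrlimit/setrlimit interface; raise_core_limit is a hypothetical helper, and whether a set-id exim then dumps into kern.coredump.setid.path is a separate question this does not settle.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise this process's soft core-file limit to its hard limit. Limits
 * are inherited across fork/exec, so this must happen in the daemon (or
 * whatever starts it), not in the shell used to inspect sysctl values.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
raise_core_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```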


Cheers,

Patrick

Re: ffmpeg6 and SSP?

2023-11-14 Thread Patrick Welche

On Tue, Nov 14, 2023 at 10:32:01AM +0000, Patrick Welche wrote:
> On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> > I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> > with HAVE_GCC=12 and pkgsrc-current I'm seeing
> > 
> > => Bootstrap dependency digest>=20211023: found digest-20220214
> > ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> > ===> Building for ffmpeg6-6.0nb6
> > LD  ffmpeg6_g
> > LD  ffprobe6_g
> > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of 
> > `environ'
> > ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of 
> > `environ'
> > ld: libavdevice/libavdevice.so: undefined reference to 
> > `__ssp_protected_read'
> > ld: libavdevice/libavdevice.so: undefined reference to 
> > `__ssp_protected_read'
> > gmake: *** [Makefile:131: ffprobe6_g] Error 1
> > gmake: *** Waiting for unfinished jobs
> > gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> > *** Error code 2
> > 
> > 
> > Suggestions? Try no FORTIFY?
> 
> I tried "no FORTIFY" on ffmpeg6 as
> 
>   CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""
> 
> which didn't help.
> 
> I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
> which didn't help.
> 
> I also see the problem with the simpler lang/gawk package:
> 
> ld: awkgram.o: in function `get_src_buf':
> awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `iop_alloc':
> io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `get_a_record':
> io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `after_beginfile':
> io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
> ld: io.o: in function `redirect_string':
> io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
> ld: io.o:io.c:(.text+0x5606): more undefined references to 
> `__ssp_protected_read' follow
> 
> If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
> the definition of __ssp_inline and make it static again, then gawk builds,
> 
> i.e., reverting
> 
> -/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $*/
> +/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $*/
> 
> allows gawk to build.

Userland was built with MKUPDATE=yes - maybe I didn't rebuild whichever
library should contain the extern definition of __ssp_protected_read ?

git grep ssp_protected_read

on https://github.com/NetBSD/src.git returned nothing - where should
the __ssp_protected_read symbol live?


Cheers,

Patrick

Re: ffmpeg6 and SSP?

2023-11-14 Thread Patrick Welche

On Mon, Nov 13, 2023 at 11:22:55AM +0000, Patrick Welche wrote:
> I'm pretty sure ffmpeg6 compiled recently, but on today's NetBSD-current
> with HAVE_GCC=12 and pkgsrc-current I'm seeing
> 
> => Bootstrap dependency digest>=20211023: found digest-20220214
> ===> Checking for vulnerabilities in ffmpeg6-6.0nb6
> ===> Building for ffmpeg6-6.0nb6
> LD  ffmpeg6_g
> LD  ffprobe6_g
> ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> ld: /usr/lib/crt0.o and /usr/lib/crt0.o: warning: multiple common of `environ'
> ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> ld: libavdevice/libavdevice.so: undefined reference to `__ssp_protected_read'
> gmake: *** [Makefile:131: ffprobe6_g] Error 1
> gmake: *** Waiting for unfinished jobs
> gmake: *** [Makefile:131: ffmpeg6_g] Error 1
> *** Error code 2
> 
> 
> Suggestions? Try no FORTIFY?

I tried "no FORTIFY" on ffmpeg6 as

  CONFIGURE_ENV+="CPPFLAGS=\"-D_FORTIFY_SOURCE=0\""

which didn't help.

I tried a NetBSD-current box with gcc 10.5.0 (i.e., without HAVE_GCC=12)
which didn't help.

I also see the problem with the simpler lang/gawk package:

ld: awkgram.o: in function `get_src_buf':
awkgram.c:(.text+0x2d8c): undefined reference to `__ssp_protected_read'
ld: io.o: in function `iop_alloc':
io.c:(.text+0xf03): undefined reference to `__ssp_protected_read'
ld: io.o: in function `get_a_record':
io.c:(.text+0x22d6): undefined reference to `__ssp_protected_read'
ld: io.o: in function `after_beginfile':
io.c:(.text+0x27c7): undefined reference to `__ssp_protected_read'
ld: io.o: in function `redirect_string':
io.c:(.text+0x55e7): undefined reference to `__ssp_protected_read'
ld: io.o:io.c:(.text+0x5606): more undefined references to 
`__ssp_protected_read' follow

If I simply edit /usr/include/ssp/ssp.h to remove the __gnu_inline__ from
the definition of __ssp_inline and make it static again, then gawk builds,

i.e., reverting

-/* $NetBSD: ssp.h,v 1.14 2023/03/29 13:37:10 christos Exp $*/
+/* $NetBSD: ssp.h,v 1.15 2023/11/10 23:03:37 christos Exp $*/

allows gawk to build.


Cheers,

Patrick

Re: blocklist puzzle

2023-02-19 Thread Patrick Welche

On Sun, Feb 19, 2023 at 09:52:24AM +0100, J. Hannken-Illjes wrote:
> > On 18. Feb 2023, at 23:34, Patrick Welche  wrote:
> > 
> > 12 hours after rebooting
> > 
> > # npfctl rule blocklistd list
> > block in final family inet4 proto tcp from 61.177.173.35/32 to any port 22 
> > # id="1"
> > #
> > 
> > contains a single block, yet /var/log/messages is full:
> > 
> > Feb 18 17:47:44 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 
> > 172800 seconds
> > Feb 18 18:18:00 mail blocklistd[596]: released 171.225.184.179/32:22 after 
> > 172800 seconds
> > Feb 18 18:18:07 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 
> > 172800 seconds
> > Feb 18 18:35:18 mail blocklistd[596]: blocked 31.41.244.124/32:22 for 
> > 172800 seconds
> > Feb 18 18:48:10 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 
> > 172800 seconds
> > Feb 18 19:18:02 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 
> > 172800 seconds
> > Feb 18 20:18:13 mail blocklistd[596]: blocked 195.226.194.142/32:22 for 
> > 172800 seconds
> > Feb 18 20:47:46 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 
> > 172800 seconds
> > Feb 18 21:17:48 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 
> > 172800 seconds
> > Feb 18 21:47:55 mail blocklistd[596]: blocked 195.226.194.242/32:22 for 
> > 172800 seconds
> > 
> > 
> > 
> > If something were misconfigured, I would expect no hosts in the ruleset,
> > rather than some (or one). How can this work partially?
> > 
> > extract of npf.conf:
> > 
> > group "external" on $ext_if {
> >pass stateful out final all
> > 
> >ruleset "blocklistd"
> > 
> > ...
> 
> Looks like your ruleset "blocklistd" never fires as the rule above is "final 
> all".

I thought this would only apply to packets on their way out, whereas the
blocking should happen on the way in?

npf.conf(5) gives the example:

 group "external" on $ext_if {
 pass stateful out final all

 block in final from 
...
which suggests that it should work?

"npfctl rule blocklistd list" also lists more hosts today, so it at least
works _sometimes_.

The puzzle is this apparent _sometimes_ - I would expect an empty list if
this were misconfigured, which is why I can't guess where to look for the
problem.
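Patrick's reading can be checked against a simplified model of NPF rule evaluation, assumed here from npf.conf(5): rules are inspected in order, the last matching rule decides, and a matching final rule stops the inspection. Since "pass stateful out final all" only matches outbound packets, inbound packets from a listed address still reach the blocklistd entries. A minimal sketch with hypothetical structures that ignores state tracking, protocols, ports and the dynamic ruleset mechanism; it supports the configuration reading but does not explain the intermittent behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum dir { DIR_IN = 1, DIR_OUT = 2, DIR_ANY = 3 };
enum action { ACT_PASS, ACT_BLOCK };

/* Simplified rule: direction mask, source address (0 = any), final flag. */
struct rule {
	unsigned dirs;
	uint32_t src;	/* IPv4 source in host order; 0 matches any */
	bool final;
	enum action act;
};

/*
 * Inspect rules in order; the last matching rule decides, and a matching
 * final rule stops inspection at once. Returns def if nothing matches.
 */
static enum action
evaluate(const struct rule *rules, size_t n, enum dir d, uint32_t src,
    enum action def)
{
	enum action result = def;

	for (size_t i = 0; i < n; i++) {
		const struct rule *r = &rules[i];

		if ((r->dirs & (unsigned)d) == 0)
			continue;	/* wrong direction: no match */
		if (r->src != 0 && r->src != src)
			continue;	/* different source address */
		result = r->act;
		if (r->final)
			break;
	}
	return result;
}
```

For example, with 195.226.194.142 (0xC3E2C28E) as a blocklistd entry after the outbound final rule, an inbound packet from it is still blocked.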


Cheers,

Patrick

blocklist puzzle

2023-02-18 Thread Patrick Welche

I see in /var/log/messages (NetBSD-10.99.2/XEN3_DOMU/amd64):


...
Feb 18 00:19:16 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 00:49:33 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 01:18:58 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 01:49:45 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 02:18:50 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 02:49:23 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 03:49:05 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 04:18:15 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 04:49:27 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 05:18:16 mail blocklistd[625]: blocked 195.226.194.142/32:22 for 172800 seconds
Feb 18 05:49:14 mail blocklistd[625]: blocked 195.226.194.242/32:22 for 172800 seconds
Feb 18 06:48:01 mail blocklistd[625]: blocked 195.226.194.142/32:22 fo

172800 seconds = 48 hours, so the hourly attempt shouldn't make it.

# npfctl rule blocklistd list | grep 195.226
# 

but npf doesn't appear to be blocking it, though some are blocked:

# npfctl rule blocklistd list 
block in final family inet4 proto tcp from 179.60.147.157/32 to any port 22 # id="d"
block in final family inet4 proto tcp from 171.225.184.179/32 to any port 22 # id="f"
block in final family inet4 proto tcp from 113.249.95.65/32 to any port 22 # id="10"
...


(I noticed this while wondering why mail to said domU stops being received,
which seems to happen every 4 days.)

Thoughts?


Cheers,

Patrick

uvm_pagelookup panic, no bt

2023-01-30 Thread Patrick Welche

Just had this panic on -current/amd64, but it doesn't make for much
of a bug report:

Crash version 10.99.2, image version 10.99.2.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: kernel diagnostic assertion "uvm_pagelookup(>u_obj, (UAO_SWHASH_ELT_PAGEIDX_BASE(elt) + j) << PAGE_SHIFT) == NULL" failed: file "../../../../uvm/uvm_aobj.c", line 1364
Backtrace from time of crash is available.
crash> bt
end() at 0
crash: _kvm_kvatop(b610cec18aa8)
crash: kvm_read(0xb610cec18aa8, 8): could not read PT level 3 entry: Invalid argument


I'll try running with LOCKDEBUG for a bit...


Cheers,

Patrick

Re: ctfmerge i/o error

2022-12-14 Thread Patrick Welche

On Wed, Dec 14, 2022 at 09:15:40PM -, Christos Zoulas wrote:
> In article ,
> Patrick Welche   wrote:
> >While trying to build a release, I am having trouble trying to make
> >GENERIC_KASLR.debug, so manually, I tried
> >
> >cd /usr/obj/sys/arch/amd64/compile.amd64/GENERIC_KASLR
> >make clean
> >make dependall
> >
> >and repeatedly get
> >
> >#  link  GENERIC_KASLR/netbsd
> >ld -Map netbsd.map --cref -T netbsd.ldscript -Ttext 0x8020
> >-e start --split-by-file=0x10 -r -d -X -o netbsd
> >${SYSTEM_OBJ:[@]:Nswapnetbsd.o} ${EXTRA_OBJ} vers.o swapnetbsd.o
> >NetBSD 9.99.108 (GENERIC_KASLR) #4: Wed Dec 14 10:40:49 CST 2022
> >    text    data     bss      dec      hex filename
> >21686890  686136  466512 22839538  15c80f2 netbsd
> >ERROR: ctfmerge: netbsd.ctf: Cannot finalize temp file: I/O error:
> >Operation already in progress
> >*** Error code 1
> >
> >which is not a message I recognize. FWIW /usr/obj is a ZFS filesystem, but
> >this hasn't caused trouble so far...
> >
> >Thoughts?
> 
> Can you ktrace to find out what syscall caused EALREADY. zfs uses EALREADY
> for TX_WRITE, when a block is already being synced according to my quick
> glance to the code. I am not sure how this stuff is supposed to work, but
> I don't think that this error is supposed to be returned by filesystem
> related syscalls, but only for connect(2)?

and of course now it is no longer reproducible. I note that that ZFS
partition now has more free space than earlier, so I will guess the
error message might mean "out of space"...


Thanks,

Patrick

ctfmerge i/o error

2022-12-14 Thread Patrick Welche

While trying to build a release, I am having trouble trying to make
GENERIC_KASLR.debug, so manually, I tried

cd /usr/obj/sys/arch/amd64/compile.amd64/GENERIC_KASLR
make clean
make dependall

and repeatedly get

#  link  GENERIC_KASLR/netbsd
ld -Map netbsd.map --cref -T netbsd.ldscript -Ttext 0x8020 -e start --split-by-file=0x10 -r -d -X -o netbsd ${SYSTEM_OBJ:[@]:Nswapnetbsd.o} ${EXTRA_OBJ} vers.o swapnetbsd.o
NetBSD 9.99.108 (GENERIC_KASLR) #4: Wed Dec 14 10:40:49 CST 2022
   text    data     bss      dec      hex filename
21686890  686136  466512 22839538  15c80f2 netbsd
ERROR: ctfmerge: netbsd.ctf: Cannot finalize temp file: I/O error: Operation already in progress
*** Error code 1

which is not a message I recognize. FWIW /usr/obj is a ZFS filesystem, but
this hasn't caused trouble so far...

Thoughts?


Cheers,

Patrick

Re: Usable Notebook for NetBSD-current wanted

2022-09-10 Thread Patrick Welche

On Fri, Sep 09, 2022 at 06:34:03PM +0200, Frank Kardel wrote:
> I have seen quite a bit work in the drm/X area - thanks for that. I was
> hoping that i915 was a common denominator that would allow many Notebooks to
> work. I think i have learned now that i915 is many critters all alike or
> different so things don't always work as expected.

Not sure of the definition of notebook vs laptop - I got a Dell Latitude 7300
when it was the current model in 2020 (so not that long ago!) and use
it daily running NetBSD-current. The only "glitch" you already know about(!):

  https://mail-index.netbsd.org/current-users/2022/07/21/msg042712.html

and it rarely manifests itself, as often there is either cpu load,
or a video running, oh, and I use a USB wifi dongle.

Cheers,

Patrick

raspberry pi zero W serial port overlay fun

2022-07-20 Thread Patrick Welche

[I posted this to port-arm around 4th July, but it hasn't made it. Reposting
 here in case it is useful...]


tl;dr On a raspberry pi zero W, updating the firmware allows the disable-bt
  overlay to function resulting in a stable serial console.

Experimental method(!)

- grab 
https://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202207031950Z/evbarm-earmv6hf/binary/gzimg/rpi.img.gz
- gunzip / dd to card / remove "console=fb" from first line of cmdline.txt
- connect the raspberry pi zero W 1.1 to a serial port via pins 4,6,8,10; it boots
- watch output with tip & baud rate 115200
- all is well until /etc/rc.local is run, changing the cpu frequency,
  which messes up the baud rate because the raspberry pi zero w, by
  default, reserves its real UART for bluetooth, and attaches a "mini"
  UART to the console, and this "mini" UART doesn't appear to have its
  own clock:
...
[   1.000] simplebus0 at armfdt0: Raspberry Pi Zero W Rev 1.1
...
[   1.000] plcom0 at simplebus1: ARM PL011 UART
...
[   1.000] com0 at simplebus1: BCM AUX UART, 1-byte FIFO
[   1.000] com0: console
...
Starting local daemons:.
JRQ
   ܊VP�҇KZ�� �   ���Q


At this point, we can either:
1) delete /etc/rc.local
2) try to make use of the "real" UART by disabling bluetooth

1) works, 2) would be nice.

2):
- create overlays directory in SD card's /boot partition:
mount /dev/sd0e /mnt
cd /mnt
mkdir overlays
cp /tmp/firmware/boot/overlays/disable-bt.dtbo overlays

- add "dtoverlay=disable-bt" to config.txt

- boot, but no change:

[   1.000] com0 at simplebus1: BCM AUX UART, 1-byte FIFO
[   1.000] com0: console



- Grab precompiled raspberry pi firmware:
cd /tmp
git clone --depth=1 https://github.com/raspberrypi/firmware.git

Compare disable-bt.dtbo to the rpi.img dtb using file:
/tmp/firmware/boot/overlays/disable-bt.dtbo: Device Tree Blob version 17, size=1073, boot CPU=0, string block size=145, DT structure block size=872
/mnt/dtb/bcm2835-rpi-zero-w.dtb: Device Tree Blob version 17, size=19566, boot CPU=0, string block size=1770, DT structure block size=16700

Same version, so looking hopeful.

- replace firmware:

mount /dev/sd0e /mnt
cd /mnt
for sfx in elf bin dat; do
  rm *.${sfx}
  cp /tmp/firmware/boot/*.${sfx} .
done;
rm dtb/*
cp /tmp/firmware/boot/bcm2708-rpi-zero-w.dtb dtb

  NB the dtb in rpi.img.gz was called bcm2835-rpi-zero-w.dtb, not bcm2708-rpi-zero-w.dtb, yet
  dtc -I dtb -O dts  ./boot/overlays/disable-bt.dtbo
  shows
/dts-v1/;

/ {
compatible = "brcm,bcm2835";
...


Now, boot, and success!

[   1.000] plcom0 at simplebus1: ARM PL011 UART
[   1.000] plcom0: txfifo disabled
[   1.000] plcom0: console

rpi# sysctl machdep.cpu.frequency
machdep.cpu.frequency.target = 1000
machdep.cpu.frequency.current = 1000
machdep.cpu.frequency.min = 700
machdep.cpu.frequency.max = 1000
machdep.cpu.frequency.available = 700 1000

and that is read over the working serial console.

BTW my impression is there is no bluetooth support, so we are not
missing anything by disabling it? (as opposed to miniuart-bt).


Cheers,

Patrick

Re: Branching for NetBSD 10

2022-06-04 Thread Patrick Welche

On Sat, Jun 04, 2022 at 05:16:34AM +, Thomas Mueller wrote:
> My big concern with the branch is the entropy bug, where building many 
> packages from pkgsrc is stopped.


Isn't "entropy bug" a misnomer for the interesting libpthread bug

  https://gnats.netbsd.org/56414

which has recently been fixed?



Cheers,

Patrick

Re: uvideo uvm_fault panic

2022-05-16 Thread Patrick Welche

On Sat, May 14, 2022 at 03:30:49PM +, Taylor R Campbell wrote:
> > Date: Sat, 14 May 2022 15:14:50 +
> > From: sc.dy...@gmail.com
> > 
> > On 2022/05/14 13:47, Taylor R Campbell wrote:
> > > Can you please try the two attached patches?
> > > 
> > > 1. uvideobadstream.patch should fix the _crash_ when you try to open a
> > >video stream on a device that the driver deemed to have a bad
> > >descriptor.  Try this one first, if you have time -- it prevents a
> > >malicious USB device from causing this kernel crash.
> > 
> > Yes, it fixes crash -- as it prevents attaching video(4)...
> > 
> > > 2. uvideosizeof.patch should fix the sizeof calculations so that the
> > >driver stops rejecting your device's descriptor.  This one should
> > >make your device work again.
> > 
> > It works well again.
> 
> Thanks, committed!

Just logged into zoom.us without a panic :-)

and now

# videoctl -a
videoctl: couldn't open '/dev/video0': Input/output error

Does this dmesg look sensible?

$ dmesg -t | grep video
uvideo0 at uhub1 port 6 configuration 1 interface 0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2
video0 at uvideo0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2
video1 at uvideo0: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2
uvideo1 at uhub1 port 6 configuration 1 interface 2: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2
video2 at uvideo1: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2
video3 at uvideo1: CKFIH12P466071019182 (0x0bda) Integrated_Webcam_HD (0x5531), rev 2.01/81.78, addr 2

there is just the one built-in camera...


Cheers,

Patrick

Re: uvideo uvm_fault panic

2022-04-22 Thread Patrick Welche

On Fri, Apr 22, 2022 at 09:52:55AM +0100, Patrick Welche wrote:
> Logged in to zoom.us using firefox on 9.99.96/amd64 from Tuesday and
> got the following panic:
... 
> --- trap (number 6) ---
> uvideo_open() at uvideo_open+0x17
> videoopen() at videoopen+0x71
...

Even easier reproducer: "videoctl -a"

P

uvideo uvm_fault panic

2022-04-22 Thread Patrick Welche

Logged in to zoom.us using firefox on 9.99.96/amd64 from Tuesday and
got the following panic:

crash> bt
end() at 0
kern_reboot() at sys_reboot
vpanic() at vpanic+0x181
panic() at printf_tolog
trap() at startlwp
--- trap (number 6) ---
uvideo_open() at uvideo_open+0x17
videoopen() at videoopen+0x71
cdev_open() at cdev_open+0x19c
spec_open() at spec_open+0x224
VOP_OPEN() at VOP_OPEN+0x36
vn_open() at vn_open+0x32e
do_open() at do_open+0xc3
do_sys_openat() at do_sys_openat+0x74
sys_open() at sys_open+0x24
syscall() at syscall+0x18c
--- syscall (number 5) ---
syscall+0x18c:

[ 4205.3040059] uvm_fault(0xa978ca4ba1b0, 0x0, 1) -> e
[ 4205.3040059] fatal page fault in supervisor mode
[ 4205.3040059] trap type 6 code 0 rip 0x803a8018 cs 0x8 rflags 0x10286 cr2 0 ilevel 0 rsp 0x8501501d2a30
[ 4205.3040059] curlwp 0xa978f1bce940 pid 2365.6641 lowest kstack 0x8501501ce2c0


(also surprised at the web site as I had no intention of using the camera)
(gdb doesn't cope with the corefile)

crash> print uvideo_open
803a8001

uvideo_open:        movq    8 (%rdi),%rdx
uvideo_open+0x4:    movl    28 (%rdx),%ecx
uvideo_open+0x7:    testl   %ecx,%ecx
uvideo_open+0x9:    jnz     803a8064
uvideo_open+0xb:    pushq   %rbp
uvideo_open+0xc:    movq    %rsp,%rbp
uvideo_open+0xf:    subq    $0x30,%rsp
uvideo_open+0x13:   movq    40 (%rdi),%rax
uvideo_open+0x17:   movq    0 (%rax),%rcx
uvideo_open+0x1a:   movq    %rcx,ffd4 (%rbp)
uvideo_open+0x1e:   movq    8 (%rax),%rcx
uvideo_open+0x22:   movq    %rcx,ffdc (%rbp)
uvideo_open+0x26:   movq    10 (%rax),%rcx
uvideo_open+0x2a:   movq    %rcx,ffe4 (%rbp)
uvideo_open+0x2e:   movq    18 (%rax),%rcx
uvideo_open+0x32:   movq    %rcx,ffec (%rbp)
uvideo_open+0x36:   movq    20 (%rax),%rcx
uvideo_open+0x3a:   movq    %rcx,fff4 (%rbp)
uvideo_open+0x3e:   movl    28 (%rax),%eax
uvideo_open+0x41:   movl    %eax,fffc (%rbp)
uvideo_open+0x44:   movl    28 (%rdx),%eax

PIDLID S CPU FLAGS   STRUCT LWP *   NAME WAIT
2365 >6641 7   0   100   a978f1bce940   VideoCapture
0> 213 7   3   240   a978c35ea8c0   usb1


Thoughts?


Cheers,

Patrick

Re: current - unable to rebuild gobject-introspection due to libffi

2022-02-14 Thread Patrick Welche

On Mon, Feb 14, 2022 at 03:13:41PM +0100, Riccardo Mottola wrote:
> Hi Patrick,
> 
> 
> Patrick Welche wrote:
> >> Given the error below, there seems to be a failure right on the libffi
> >> needed for libgirepository, but it is the one I am trying to replace, or
> >> not?
> > I think one cunning plan seen on this list was to look at the output from
> > 
> >   grep libffi.so.7 /usr/pkg/pkgdb/*/+BUILD_INFO
> 
> the output is below. Sorry if it doesn't illuminate immediately. Does it
> mean I should force a rebuild of another package first?

That's what I thought - I assume you have libffi.so.8, so those packages
will be in trouble - but none seem relevant.

The other workaround seen on the mailing lists was to remove the installed
gobject-introspection, then build it, which would explain why pbulk builds
are happy. This might make sense as:

ld: warning: libffi.so.7, needed by /usr/pkg/lib/libgirepository-1.0.so.1, not found (try using -rpath or -rpath-link)

suggests an already installed gobject-introspection...

(I think there is a way of not removing all the packages which depend
on gobject-introspection)

Cheers,

Patrick

Re: current - unable to rebuild gobject-introspection due to libffi

2022-02-14 Thread Patrick Welche

On Mon, Feb 14, 2022 at 01:03:33PM +0100, Riccardo Mottola wrote:
> I am on netbsd 9.99.93 and current pkgsrc tree.
> After upgrading the whole base system, upgrading pkgsrc is also a good idea.
> pkg_rolling-replace -uv is my usual friend.
> 
> I always stop at gobject introspection, which is currently broken due to
> libffi upgrade
> 
> Given the error below, there seems to be a failure right on the libffi
> needed for libgirepository, but it is the one I am trying to replace, or
> not?

I think one cunning plan seen on this list was to look at the output from

  grep libffi.so.7 /usr/pkg/pkgdb/*/+BUILD_INFO

Cheers,

Patrick

Re: ixg wierdness

2021-12-22 Thread Patrick Welche

On Wed, Dec 22, 2021 at 01:34:25PM +0100, Hauke Fath wrote:
> On Wed, 22 Dec 2021 12:26:21 +0000, Patrick Welche wrote:
> > The box in 53155 is Hauke's - also a Dell, but slightly different model.
> 
> he@, not hauke@ -- no Dell boxes here.

Sorry - Havard's!

On the 53155 front, dholland asks if the 2 bnx hang issue is the same as
47229, and it looks like it. From the email threads quoted in 47229, the
gist seems to be that the issue doesn't exist on /i386, just /amd64.

Cheers,

Patrick

Re: ixg wierdness

2021-12-22 Thread Patrick Welche

On Wed, Dec 22, 2021 at 12:05:38PM +0900, SAITOH Masanobu wrote:
> On 2021/12/22 9:38, SAITOH Masanobu wrote:
> > I will take a look of it.

Thanks - reams of logs heading to you off-list...

> If the machine is the same as kern/53155 I don't know why
> the MSI-X allocation fails. If the fallback code in ixgbe.c
> has a bug, it would worth to try the following diff:

The box in 53155 is Hauke's - also a Dell, but slightly different model.


Cheers,

Patrick

ixg wierdness

2021-12-21 Thread Patrick Welche

On a box with 4 bnx and 4 ixg interfaces, I just hit PR kern/53155
when trying to use bnx1. (Built LOCKDEBUG etc kernel, with serial
console. Hang such that ~# doesn't drop into ddb) No problems
running as an NFS server for a year or two just using bnx0.
(I didn't try "up"ing bnx2)

So I tried swapping to use ixg0 and ixg1 instead.

I see a strange bursty pattern with what looks like a 1s countdown, e.g.:

64 bytes from 10.0.0.236: icmp_seq=642 ttl=255 time=37004.721972 ms
64 bytes from 10.0.0.236: icmp_seq=643 ttl=255 time=36004.533428 ms
64 bytes from 10.0.0.236: icmp_seq=644 ttl=255 time=35004.224479 ms
64 bytes from 10.0.0.236: icmp_seq=645 ttl=255 time=34003.925027 ms
64 bytes from 10.0.0.236: icmp_seq=646 ttl=255 time=33003.615239 ms
64 bytes from 10.0.0.236: icmp_seq=647 ttl=255 time=32003.313832 ms
64 bytes from 10.0.0.236: icmp_seq=648 ttl=255 time=31003.008233 ms
64 bytes from 10.0.0.236: icmp_seq=649 ttl=255 time=30002.702356 ms
64 bytes from 10.0.0.236: icmp_seq=650 ttl=255 time=29002.396480 ms
64 bytes from 10.0.0.236: icmp_seq=651 ttl=255 time=28002.090882 ms
64 bytes from 10.0.0.236: icmp_seq=652 ttl=255 time=27001.772992 ms
64 bytes from 10.0.0.236: icmp_seq=653 ttl=255 time=26001.477731 ms
64 bytes from 10.0.0.236: icmp_seq=654 ttl=255 time=25001.291421 ms
64 bytes from 10.0.0.236: icmp_seq=655 ttl=255 time=24000.965150 ms
64 bytes from 10.0.0.236: icmp_seq=656 ttl=255 time=23000.622398 ms
64 bytes from 10.0.0.236: icmp_seq=657 ttl=255 time=22000.278807 ms
64 bytes from 10.0.0.236: icmp_seq=658 ttl=255 time=20999.931305 ms
64 bytes from 10.0.0.236: icmp_seq=659 ttl=255 time=1.592463 ms
64 bytes from 10.0.0.236: icmp_seq=660 ttl=255 time=19009.253137 ms
64 bytes from 10.0.0.236: icmp_seq=661 ttl=255 time=18008.910105 ms
64 bytes from 10.0.0.236: icmp_seq=662 ttl=255 time=17008.551987 ms
64 bytes from 10.0.0.236: icmp_seq=663 ttl=255 time=16008.224040 ms
64 bytes from 10.0.0.236: icmp_seq=664 ttl=255 time=15007.874862 ms
64 bytes from 10.0.0.236: icmp_seq=665 ttl=255 time=14007.533506 ms
64 bytes from 10.0.0.236: icmp_seq=666 ttl=255 time=13007.194943 ms
64 bytes from 10.0.0.236: icmp_seq=667 ttl=255 time=12006.852469 ms
64 bytes from 10.0.0.236: icmp_seq=668 ttl=255 time=11006.509437 ms
64 bytes from 10.0.0.236: icmp_seq=669 ttl=255 time=10006.193223 ms
64 bytes from 10.0.0.236: icmp_seq=670 ttl=255 time=9005.846559 ms
64 bytes from 10.0.0.236: icmp_seq=671 ttl=255 time=8005.508556 ms
64 bytes from 10.0.0.236: icmp_seq=672 ttl=255 time=7005.165803 ms
64 bytes from 10.0.0.236: icmp_seq=673 ttl=255 time=6004.818579 ms
64 bytes from 10.0.0.236: icmp_seq=674 ttl=255 time=5004.479458 ms
64 bytes from 10.0.0.236: icmp_seq=675 ttl=255 time=4004.132514 ms
64 bytes from 10.0.0.236: icmp_seq=676 ttl=255 time=3003.794232 ms
64 bytes from 10.0.0.236: icmp_seq=677 ttl=255 time=2003.431084 ms
64 bytes from 10.0.0.236: icmp_seq=678 ttl=255 time=1003.103697 ms
64 bytes from 10.0.0.236: icmp_seq=679 ttl=255 time=2.761223 ms
64 bytes from 10.0.0.236: icmp_seq=717 ttl=255 time=6373.442427 ms
64 bytes from 10.0.0.236: icmp_seq=718 ttl=255 time=5373.238237 ms
64 bytes from 10.0.0.236: icmp_seq=719 ttl=255 time=4372.937388 ms
64 bytes from 10.0.0.236: icmp_seq=720 ttl=255 time=3372.631791 ms
64 bytes from 10.0.0.236: icmp_seq=721 ttl=255 time=2372.325913 ms
64 bytes from 10.0.0.236: icmp_seq=722 ttl=255 time=1372.006627 ms
64 bytes from 10.0.0.236: icmp_seq=723 ttl=255 time=371.714159 ms
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
...
then eventually wakes up again

when pinging to its ixg0 interface.

You see the bursts while running tcpdump -ni ixg0.

ixg0 at pci8 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k
ixg0: device 82599EB
ixg0: ETrackID 81a5
ixg0: autoconfiguration error: failed to allocate MSI-X interrupt
ixg0: interrupting at ioapic1 pin 22
ixg0: Ethernet address 00:1b:21:9a:d4:84
ixg0: PHY: OUI 0x0014a6 model 0x0001, rev. 0
ixg0: PCI Express Bus: Speed 5.0GT/s Width x8
ixg0: feature cap 0x1780
ixg0: feature ena 0x1000

ixg0: flags=0x8843 mtu 1500
capabilities=0x7ff80
capabilities=0x7ff80
capabilities=0x7ff80
enabled=0
ec_capabilities=0xf
ec_enabled=0x7
address: 00:1b:21:9a:d4:84
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet6 fe80::21b:21ff:fe9a:d484%ixg0/64 flags 0 scopeid 0x5
inet 10.0.0.236/24 broadcast 10.0.0.255 flags 0

This is with 15 December 2021 -current/amd64.

Any ideas on what might be going on?


Cheers,

Patrick

Re: IDENTIFY failed

2021-11-09 Thread Patrick Welche

On Mon, Nov 08, 2021 at 08:42:44PM +0900, Rin Okuyama wrote:
> Jun, Patrick, thank you for dmesg (and discussion offlist).
> 
> For Jun, the problem is no longer reproducible even with the original
> copy of kernel, which failed before.
> 
> So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:
> 
> https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c
> 
> With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
> this machine.

I cvs updated, rebuilt the kernel without the DELAY, and checked that
the problem still existed. (it does) Then applied your gist patch, and
had a successful reboot!

(I haven't tried reducing the delay)


Thanks,

Patrick

Re: IDENTIFY failed

2021-11-01 Thread Patrick Welche

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:
> From: matthew green 
> Subject: re: IDENTIFY failed
> Date: Fri, 29 Oct 2021 07:18:09 +1100
> 
> >> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for 
> >> > drive 0
> >> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html
> > this one has reduced timeframe, too:
> >> between
> >> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> >> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed 
> > which changed how some interrupt handling works, and:
> >http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
> > which removed some delays in the probe path.  possibly this one
> > is more likely to be at fault since it touches the probe path
> > directly.
> 
> add 
> /usr/src/sys/arch/amd64/conf/GENERIC.local
> options AHCISATA_EXTRA_DELAY
> 
> compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick

IDENTIFY failed

2021-10-28 Thread Patrick Welche

Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure:

wd1 at atabus1 drive 0
autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0
wd1: autoconfiguration error: IDENTIFY failed
wd1(ahcisata0:1:0): using PIO mode 0

and booting fails. Reverting and booting with 9.99.90 gets me a working box:

wd1 at atabus1 drive 0
wd1: 
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
...
wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)

I'm sure someone else saw this too, but I can't find the original post...


Cheers,

Patrick

Re: /bin/sh tabcomplete

2021-09-14 Thread Patrick Welche

On Tue, Sep 14, 2021 at 11:04:47AM -0400, Christos Zoulas wrote:
> This is a side effect of the change to add file-completion for commands. 
> Fixed.
> 
> christos
> 
> > On Sep 14, 2021, at 6:24 AM, Robert Elz  wrote:
> > 
> >Date:Tue, 14 Sep 2021 09:19:36 +0100
> >From:Patrick Welche 
> >Message-ID:  
> > 
> >  | It seems that after updating a box from August -current to yesterday's
> >  | -current, /bin/sh's tabcomplete no longer escapes spaces?
> > 
> > If something changed there, it must be related to the way that libedit now
> > works, Christos?

That was quick - thanks for fixing!

Patrick

/bin/sh tabcomplete

2021-09-14 Thread Patrick Welche

It seems that after updating a box from August -current to yesterday's
-current, /bin/sh's tabcomplete no longer escapes spaces?

Cheers,

Patrick

Re: zpool import skips wedges due to a race condition

2021-09-08 Thread Patrick Welche

On Wed, Sep 08, 2021 at 06:38:02AM -, Michael van Elst wrote:
> al...@yandex.ru (Alexander Nasonov) writes:
> 
> >When I run zfs import, it launches 32 threads and opens 32 disks in
> >parallel, including cgd1 and dk24. But it can't open dk24 while
> >cgd1 is still open (it fails with EBUSY).
> 
> >I fixed it in the attatched patch by running only one thread. It's
> >not the best approach but I'm not sure how to fix it properly.
> 
> 
> There are other issues with scanning devices in an arbitrary order,
> a parallel scan just makes it worse by adding randomness.
> 
> LVM tries to solve this with an optional filter for device names
> when scanning for physical volumes.
> 
> The root detection code tries to solve this by scanning twice, once
> for wedges, once for everything else, and by identifying wedges
> that alias a partition.
> 
> For a complete solution you would need to know all the device relationships
> (dkX on wdY, cgdN on dmN, etc, but also e.g. dkX on cgdN). That still leaves
> out hot-plug devices where "upper" devices appear late.

I see something similar in another context: when I shut down, shutdown
can stall on a system with dkX on cgdN, with

detaching dkX
detaching cgdN
detaching dkX(same X)
(hang)

If the dice are rolled correctly, and cgdN gets the detach before dkX,
it shuts down properly...


Cheers,

Patrick

Re: Build failure for evbarmv6hf due to new OpenSSH

2021-09-03 Thread Patrick Welche

On Fri, Sep 03, 2021 at 06:46:20PM +0900, Rin Okuyama wrote:
> Build for evbarmv6hf{,eb} fails due to collision of symbol tilde_expand b/w
> libedit and new libopenssh:

I just hit this too, and was bemused by why the amd64 build wasn't affected:

# find /usr/src/distrib/amd64 -name list | grep ssh
# find /usr/src/distrib/evbarm  -name list | grep ssh
/usr/src/distrib/evbarm/instkernel/sshramdisk/list


... applying your patch locally!


Cheers,

Patrick

Re: 9.99.86 HEAD

2021-06-30 Thread Patrick Welche

On Wed, Jun 30, 2021 at 05:57:46PM +, David Holland wrote:
> On Wed, Jun 30, 2021 at 04:10:17PM +0100, Patrick Welche wrote:
>  > I see what you mean: the next for me is urtwn0 unable to open /dev/bpf.
> 
> This should be fixed now.

All much better now - thanks all!

P

Re: 9.99.86 HEAD

2021-06-30 Thread Patrick Welche

On Wed, Jun 30, 2021 at 03:49:52PM +0200, Martin Husemann wrote:
> On Wed, Jun 30, 2021 at 02:23:23PM +0100, Patrick Welche wrote:
> > The one bug I see is that running cgdconfig causes a panic
> > 
> > --- trap (number 6) ---
> > vn_open() at vn_open+0x31d
> 
> That one is fixed now, but there is still serious fallout.

Thanks, yes, after your fix cgdconfig doesn't cause a panic.

I see what you mean: the next for me is urtwn0 unable to open /dev/bpf.


Cheers,

Patrick

Re: 9.99.86 HEAD

2021-06-30 Thread Patrick Welche

On Wed, Jun 30, 2021 at 12:11:43PM +, voidpin wrote:
> On Wed, Jun 30, 2021 at 12:03:21PM +, voidpin wrote:
> > I've just tried to upgrade from 9.99.85 to 9.99.86 and I'm unable to even 
> > boot the system.
> > Several errors related to acpi, cpu-temp, ipv6 and fs.
> > It just loops back to the welcome screen and tries to reboot.
> >
> > Am I missing some major change? I haven't seen anything hinting at this in 
> > the mailing lists.
> >
> > The last working install image is from 29-Jun-2021 00:25
> > The following three look broken to me.
> > On the plus side, I've managed to run fsck from the live image (29-Jun-2021 
> > 00:25), correct the file system errors and downgrade back to 9.99.85.
> > 9.99.86 was a bigger update, there were three independent kernel bump 
> > changes.
> >
> > It was discussed on the developers mailing list.
> >
> > There have been at least two major bugs reported by martin@ already.
> >
> > Please report your issues to current-users@ as well.
> 
> The last two images for 9.99.85 also look to be broken.
> At least, they were unable to boot my system after the upgrade to 9.99.86 
> when I was trying to downgrade.
> 
> The one from 29-Jun-2021 00:25 did work and my system is back up.

Not sure of cause-and-effect, but my first attempt at booting
9.99.86 just restarted during boot. make clean in the kernel compile
directory and rebuilding got me a kernel which booted.

The one bug I see is that running cgdconfig causes a panic

--- trap (number 6) ---
vn_open() at vn_open+0x31d
vn_bdev_openpath() at vn_bdev_openpath+0x40
cgdioctl() at cgdioctl+0x5f3
VOP_IOCTL() at VOP_IOCTL+0x41
vn_ioctl() at vn_ioctl+0xad
sys_ioctl() at sys_ioctl+0x555
syscall() at syscall+0x1f2
--- syscall (number 54) ---
syscall+0x1f2:


Cheers,

Patrick

Re: dump/restore out of range inode

2021-06-06 Thread Patrick Welche

On Sat, Jun 05, 2021 at 06:45:24PM +0200, J. Hannken-Illjes wrote:
> Patrick,
> 
> please try the attached diff so the "spcl.c_addr" test
> no longer runs off the spcl record.
> 
> "blks" is used for multi-tape checkpointing and examining
> TS_INODE/TS_ADDR records should be sufficient as the are
> the only records that support holes in data.

Thanks! With your patch, the dump | restore has been happily
running for about 12 hours now.

In your previous email you mention:

> This trace makes no sense, bitmaps (CLRI and BITS) don't have holes
> and therefore ignore the "c_addr" array.  I have no idea how dumping
> a bitmap ends in the hole processing of flushtape().

Is it worth investigating further while I have the reproducer?


Cheers,

Patrick

Re: dump/restore out of range inode

2021-06-05 Thread Patrick Welche

On Sat, Jun 05, 2021 at 10:03:21AM -, Michael van Elst wrote:
> pr...@cam.ac.uk (Patrick Welche) writes:
> 
> >How can gdb not see a spcl anywhere?
> 
> /usr/include/protocols/dumprestore.h:#define spcl u_spcl.s_spcl
> 
> spcl is just a define that got resolved by the compiler.

ach... here it is:

(gdb) print u_spcl.s_spcl
$2 = {c_type = 6, c_old_date = 0, c_old_ddate = 0, c_volume = 1, 
  c_old_tapea = 0, c_inumber = 397083647, c_magic = 424935705, 
  c_checksum = 1906085926, __c_ino = {__uc_dinode = {di_mode = 0, 
  di_nlink = 0, di_oldids = {0, 0}, di_size = 0, di_atime = 0, 
  di_atimensec = 0, di_mtime = 0, di_mtimensec = 0, di_ctime = 0, 
  di_ctimensec = 0, di_db = {0 }, di_ib = {0, 0, 0}, 
  di_flags = 0, di_blocks = 0, di_gen = 0, di_uid = 0, di_gid = 0, 
  di_modrev = 0}, __uc_ino = {__uc_mode = 0, __uc_spare1 = {0, 0, 0}, 
  __uc_size = 0, __uc_old_atime = 0, __uc_atimensec = 0, 
  __uc_old_mtime = 0, __uc_mtimensec = 0, __uc_spare2 = {0, 0}, 
  __uc_rdev = 0, __uc_birthtimensec = 0, __uc_birthtime = 0, 
  __uc_atime = 0, __uc_mtime = 0, __uc_spare4 = {0, 0, 0, 0, 0, 0, 0}, 
  __uc_file_flags = 0, __uc_spare5 = {0, 0}, __uc_uid = 0, __uc_gid = 0, 
  __uc_spare6 = {0, 0}}}, c_count = 48473, 
  c_addr = '\000' , 
  c_label = "none", '\000' , c_level = 0, 
  c_filesys = "/store/backup", '\000' , 
  c_dev = "/dev/rdk18", '\000' , 
  c_host = "quantz", '\000' , c_flags = 2, 
  c_old_firstrec = 0, c_date = 1622887657, c_ddate = 0, c_tapea = 10, 
  c_firstrec = 0, c_spare = {0 }}
(gdb) bt
#0  flushtape () at /usr/src/sbin/dump/tape.c:333
#1  0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", 
isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168
#2  0x00208e49 in dumpmap (map=, type=type@entry=6, 
ino=ino@entry=397083647) at /usr/src/sbin/dump/traverse.c:716
#3  0x0020b355 in main (argc=1, argv=0x7f7fe7e8)
at /usr/src/sbin/dump/main.c:646
(gdb) list
328 }
329 
330 blks = 0;
331 if (iswap32(spcl.c_type) != TS_END) {
332 for (i = 0; i < iswap32(spcl.c_count); i++)
333 if (spcl.c_addr[i] != 0)
334 blks++;
335 }
336 slp->count = lastspclrec + blks + 1 - iswap64(spcl.c_tapea);
337 slp->tapea = iswap64(spcl.c_tapea);
(gdb) print i
$6 = 
(gdb) print u_spcl.s_spcl.c_count  
$7 = 48473
(gdb) whatis u_spcl.s_spcl.c_addr
type = char [512]

so my guess is that the optimized-out i is far beyond 512: the loop runs up
to c_count (48473), but c_addr only has 512 entries.

c_type==6 = TS_CLRI map of inodes deleted since last dump

(a bit odd:
(gdb) print needswap
$11 = 0
(gdb) print iswap32(u_spcl.s_spcl.c_count)
$10 = 1505558528
)

Still puzzled...


Cheers,

Patrick

Re: dump/restore out of range inode

2021-06-05 Thread Patrick Welche

On Thu, Jun 03, 2021 at 05:14:07PM -, Michael van Elst wrote:
> pr...@cam.ac.uk (Patrick Welche) writes:
> 
> >  DUMP: Child 29322 returns LOB status 213
> >213=0xd5
> 
> That's octal. Return status 0213 = 139 -> WCOREFLAG(==128) + signal 11.
> 
> >Can this happen if the original filesystem is broken? At a distance
> >it just looks as though restore hasn't read a symbol table before using it
> >and the filesystem seems to have a valid inode?
> 
> Segfaults should never happen.
> 
> maxino has probably never been set since dump crashed and restore got
> an early end-of-file.
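Michael's reading can be checked by hand. Assuming the message prints the low byte of the child's wait status in octal (taken from his reply, not from tape.c), 0213 = 139 = 0x80 + 11: the core-dump flag plus signal 11, which is SIGSEGV on NetBSD. A minimal decoder; decode_lob and struct lob_status are hypothetical names:

```c
/* Decode the low byte of a traditional Unix wait status by hand:
 * bits 0-6 hold the terminating signal, bit 7 (0x80) is the
 * core-dump flag. This mirrors what WTERMSIG()/WCOREDUMP() report on
 * common implementations; POSIX itself only specifies the macros. */
struct lob_status {
	int signo;	/* terminating signal number, 0 if none */
	int cored;	/* non-zero if a core dump was written */
};

static struct lob_status
decode_lob(unsigned int lob)
{
	struct lob_status s;

	s.signo = (int)(lob & 0x7f);
	s.cored = (lob & 0x80) != 0;
	return s;
}
```

Reading the same digits as hex (0xd5) instead gives a nonsensical signal number, which is why the "all is well" interpretation looked plausible but was not.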

# dump -0auf foo.dmp /store/backup
...
  DUMP: pid=3262 Dumping /dev/rdk18 (/store/backup) to foo.dmp
  DUMP: pid=3262 Label: none
Using 512 buffers (33574952 bytes)
  DUMP: pid=3262 mapping (Pass I) [regular files]
  DUMP: pid=3262 mapping (Pass II) [directories]
  DUMP: pid=3262 estimated 3632204910 tape blocks.
  DUMP: pid=3262 Tape: 1; parent process: 3262 child process 3402
  DUMP: pid=3402 Child on Tape 1 has parent 3262, my pid = 3402
  DUMP: pid=3402 Volume 1 started at: Sat Jun  5 10:35:18 2021
slave 0 wrote 10240 werror 22

and here process 3402 gets the SIGSEGV

Program received signal SIGSEGV, Segmentation fault.
flushtape () at /usr/src/sbin/dump/tape.c:333
333 if (spcl.c_addr[i] != 0)
(gdb) bt
#0  flushtape () at /usr/src/sbin/dump/tape.c:333
#1  0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", 
isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168
#2  0x00208e49 in dumpmap (map=, type=type@entry=6, 
ino=ino@entry=397083647) at /usr/src/sbin/dump/traverse.c:716
#3  0x0020b355 in main (argc=1, argv=0x7f7fe7d8)
at /usr/src/sbin/dump/main.c:646
(gdb) print spcl
No symbol "spcl" in current context.
(gdb) frame 1
#1  0x0020763e in writerec (dp=dp@entry=0x7f7ff3a01380 "", 
isspcl=isspcl@entry=0) at /usr/src/sbin/dump/tape.c:168
168 flushtape();
(gdb) print spcl
No symbol "spcl" in current context.
(gdb) print isspcl
$2 = 0

How can gdb not see a spcl anywhere?
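The later gdb session in this thread (print u_spcl.s_spcl.c_count) hints at the answer: spcl looks to be a preprocessor macro naming a union member, not a variable, so there is no symbol called spcl for gdb to find. gdb can only expand macros when the program is built with macro debug information (for GCC, -g3). A cut-down sketch of that pattern; the layout below is illustrative, not dump's real header:

```c
#include <stdint.h>

/* Illustrative layout: a union padding one header record to a fixed
 * block size, with the header struct as a member. */
union u_spcl {
	char dummy[1024];
	struct s_spcl {
		int32_t c_type;
		int32_t c_count;
	} s_spcl;
} u_spcl;

/* The short name is only a preprocessor alias; the compiler, and by
 * default the debugger, only ever see u_spcl.s_spcl. */
#define spcl u_spcl.s_spcl
```

In gdb, printing the expansion (u_spcl.s_spcl) works where spcl does not.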

Cheers,

Patrick

dump/restore out of range inode

2021-06-03 Thread Patrick Welche

On a 9.99.83/amd64 box, I just observed the following:

# mount -r -o noatime NAME=backup /store/backup
# cd /tmp/foo 
# dump -0auf - /store/backup | /sbin/restore vdrf -   
Verify tape and initialize maps 
  DUMP: Found /dev/rdk18 on /store/backup in mount table
  DUMP: Date of this level 0 dump: Thu Jun  3 17:12:25 2021 
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rdk18 (/store/backup) to standard output   
  DUMP: Label: none 
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories] 
  DUMP: estimated 3632204910 tape blocks.
  DUMP: Volume 1 started at: Thu Jun  3 17:13:51 2021
  DUMP: Child 29322 returns LOB status 213
Volume header (new inode format) 
Dump   date: Thu Jun  3 17:12:25 2021
Dumped from: the epoch
Level 0 dump of /store/backup on quantz:/dev/rdk18
Label: none
End-of-tape encountered
Warning: End-of-input encountered while extracting 
End-of-tape encountered
Warning: End-of-input encountered while extracting 
Begin level 0 restore
Initialize symbol table.
addino: out of range 2
abort? [yn] n
[1]   Done                    dump -0auf - /store/backup |
      Floating point exception (core dumped) /sbin/restore vdrf -

Puzzle:

  DUMP: Child 29322 returns LOB status 213
213=0xd5

tape.c then looks as though 0xd5 >> 8 = 0 => X_FINOK all is well?


addino: out of range 2

symtab.c: out of range as 2 >= maxino, which from the coredump is zero!
entrytblsize is also zero.

Can this happen if the original filesystem is broken? At a distance
it just looks as though restore hasn't read a symbol table before using it
and the filesystem seems to have a valid inode?


Cheers,

Patrick

Re: booting xen [was Re: serial console puzzle]

2021-05-07 Thread Patrick Welche

On Sat, May 01, 2021 at 12:33:17PM -0700, Greg A. Woods wrote:
> I've copied this reply to port-xen as it's entirely Xen related.
... 
> On serial console machines I've been using NetBSD "console=xencons" for
> ages.
> 
> This is the documented (by Xen, i.e. preferred Xen way), for serial
> consoles:
> 
>   menu=Boot Xen:load /netbsd-XEN3_DOM0 -v bootdev=dk0 
> console=xencons;multiboot /xen bootscrub=false dom0_mem=4G console=com1,vga 
> console_timestamps=datems dom0_max_vcpus=4 dom0_vcpus_pin=true 
> pv-l1tf=off,domu=off vpmu=on cpuid=rdrand spec-ctrl=no-xen,l1d-flush=off 
> guest_loglvl=all

Amazing: I removed "rndseed /var/db/entropy-file;" from the beginning
of the xen entry in /boot.cfg and instead of getting

(XEN) *** Building a PV Dom0 ***
(XEN) ELF: not an ELF binary
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) 

I got

(XEN) Dom0 has maximum 1128 PIRQs
(XEN) *** Building a PV Dom0 ***
(XEN) ELF: phdr: paddr=0x8020 memsz=0xe07000
(XEN) ELF: memory: 0x8020 -> 0x81007000
(XEN) ELF: note: GUEST_OS = "NetBSD"
(XEN) ELF: note: GUEST_VERSION = "4.99"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
...
!

The only file which was edited was boot.cfg.
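Since `file` reports /netbsd-XEN3_DOM0 as a valid ELF executable, one plausible reading (an assumption, not confirmed in this thread) is that with rndseed first in the entry, the hypervisor was handed the entropy file rather than the kernel as the dom0 image. An ELF loader's first test is the 4-byte magic number, which a random-looking entropy file will almost never pass. A minimal sketch of that check; looks_like_elf is a hypothetical name, not Xen's code:

```c
#include <stddef.h>
#include <string.h>

/* An ELF file starts with the four bytes 0x7f 'E' 'L' 'F'. A loader
 * that rejects a module as "not an ELF binary" has typically failed
 * a check like this one on the module's first bytes. */
static int
looks_like_elf(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```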


Still no joy though:

(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 604kB init memory
[   1.000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 20,
[   1.000] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, ,
[   1.000] 2018, 2019, 2020, 2021 The NetBSD Foundation, Inc.  All righ.
[   1.000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.000] The Regents of the University of California.  All rights res.

[   1.000] NetBSD 9.99.82 (XEN3_DOM0) #4: Wed Apr 28 10:53:21 BST 2021
...
[   1.030] pci14: i/o space, memory space enabled
[   1.030] entropy: WARNING: extracting entropy too early
(XEN) mm.c:2980:d0v0 Bad type (saw e401 != exp 2000) fo)
(XEN) mm.c:1142:d0v0 Attempt to create linear p.t. with write perms
[   1.030] xpq_flush_queue: 2 entries (0 successful) on cpu0 (0)
[   1.030] panic: HYPERVISOR_mmu_update failed, ret: -22

[   1.030] cpu0: Begin traceback...
[   1.030] vpanic() at netbsd:vpanic+0x14a
[   1.030] device_printf() at netbsd:device_printf
[   1.030] xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
[   1.030] pmap_zero_page() at netbsd:pmap_zero_page+0xe3
[   1.030] uvm_pagealloc_strat() at netbsd:uvm_pagealloc_strat+0x1ef
[   1.030] pmap_get_physpage() at netbsd:pmap_get_physpage+0x1cb
[   1.030] pmap_growkernel() at netbsd:pmap_growkernel+0x1b3
[   1.030] uvm_map_prepare() at netbsd:uvm_map_prepare+0x3a2
[   1.030] uvm_map() at netbsd:uvm_map+0x70
[   1.030] ubc_init() at netbsd:ubc_init+0x15b
[   1.030] main() at netbsd:main+0x33b
[   1.030] cpu0: End traceback...
[   1.030] fatal breakpoint trap in supervisor mode
[   1.030] trap type 1 code 0 rip 0x8024093d cs 0xe030 rflags 0x2020
[   1.030] curlwp 0x80e75040 pid 0.0 lowest kstack 0x8198720
Stopped in pid 0.0 (system) at  netbsd:breakpoint+0x5:  leave
ds  
es  0
fs  bae0
gs  ba80
rdi 6
rsi deadbeefdeadf00d
rbp 8198bad0
rbx 2
rdx 1
rcx 6
rax 0
r8  2
r9  75
r10 0
r11 fffe
r12 80c57088    ostype+0x138
r13 8198bb18
r14 104
r15 10
rip 8024093d    breakpoint+0x5
cs  e030
rflags  202
rsp 8198bad0
ss  e02b
netbsd:breakpoint+0x5:  leave


Cheers,

Patrick

Re: booting xen [was Re: serial console puzzle]

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 08:50:10PM +0200, Manuel Bouyer wrote:
> On Fri, Apr 30, 2021 at 07:28:57PM +0100, Patrick Welche wrote:
> > On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote:
> > > On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote:
> > > > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots.
> > > > Nothing more appears on the console. (-current XEN, xen.gz from 
> > > > xenkernel415)
> > > 
> > > Try xen-debug.gz ?
> > > Do you get the Xen boot messages ?
> > 
> > I don't get the Xen boot messages. Just tried xen-debug.gz and again I just
> > see loading, and then a reboot. I don't think it gets as far as xen*.gz.
> > 
> > boot.cfg contains:
> > 
> > menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load 
> > /netbsd-XEN3_
> > DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz 
> > dom0_mem=1024M
> 
> should probably be:
> menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load 
> /netbsd-XEN3_DOM0 console=com0;multiboot /xen-debug.gz dom0_mem=1024M 
> console=com1 com1=57600,8n1,0x3f8
> 
> (should really be console=com0 for NetBSD, it doesn't access the hardware and
> uses the I/O services from the hypervisor)

Ah, I remembered that NetBSD numbers serial ports from 0 and Xen from 1, but I
clearly still muddled the boot.cfg.

We now have some serial console output!

Some of what flew by:

(XEN) Xen version 4.15.0nb0 (prlw1@) (gcc (nb1 20210411) 10.3.0) debug=y Thu Ap1
(XEN) Latest ChangeSet:
(XEN) build-id: d2ee973db9f01886c1297a3a469888de162702c6
(XEN) Bootloader: NetBSD/x86 BIOS Boot, Revision 5.11 (Tue Apr 20 14:32:11 UTC )
(XEN) Command line: dom0_mem=1024M console=com1 com1=57600,8n1,0x3f8
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) CPU Vendor: AMD, Family 16 (0x10), Model 9 (0x9), Stepping 1 (raw 00100f9)
(XEN) Xen-e820 RAM map:
(XEN)  [, 0009dfff] (usable)
...
(XEN)   5 disabled
(XEN)   6 disabled
(XEN)   7 disabled
(XEN) TOM2: 00182000 (WB)
(XEN) Xenoprofile: AMD IBS detected (0x1f)
(XEN) Running stub recovery selftests...
(XEN) Fixup #UD[]: 82d07fffe040 [82d07fffe040] -> 82d04038aa07
(XEN) Fixup #GP[]: 82d07fffe041 [82d07fffe041] -> 82d04038aa07
(XEN) Fixup #SS[]: 82d07fffe040 [82d07fffe040] -> 82d04038aa07
(XEN) Fixup #BP[]: 82d07fffe041 [82d07fffe041] -> 82d04038aa07
(XEN) HPET: 3 timers usable for broadcast (4 total)
(XEN) NX (Execute Disable) protection active
(XEN) Dom0 has maximum 1128 PIRQs
(XEN) *** Building a PV Dom0 ***
(XEN) ELF: not an ELF binary
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

$ file /netbsd*
/netbsd:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
statically linked, for NetBSD 9.99.82, not stripped
/netbsd-XEN3_DOM0: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
statically linked, for NetBSD 9.99.82, with debug_info, not stripped

(Should we move this to port-xen?)

Cheers,

Patrick

Re: booting xen [was Re: serial console puzzle]

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 07:28:57PM +0100, Patrick Welche wrote:
> On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote:
> > On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote:
> > > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots.
> > > Nothing more appears on the console. (-current XEN, xen.gz from 
> > > xenkernel415)
> > 
> > Try xen-debug.gz ?
> > Do you get the Xen boot messages ?
> 
> I don't get the Xen boot messages. Just tried xen-debug.gz and again I just
> see loading, and then a reboot. I don't think it gets as far as xen*.gz.
> 
> boot.cfg contains:
> 
> menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load 
> /netbsd-XEN3_
> DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M
> 
> [Anyone know how to avoid "Collecting System Inventory ..." so booting
> doesn't take forever?]


Bizarre observation:

boot.cfg:

menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot
menu=Boot single user:rndseed /var/db/entropy-file;consdev com0,57600;boot -s
menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load /netbsd-XEN3_DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M
menu=Drop to boot prompt:prompt
default=3
timeout=5
clear=1


If I press "1", I still get "loading /netbsd-XEN3_DOM0" and a spontaneous
reboot. I need to press "2", then ctrl-D in single user for the equivalent
of "1".


Cheers,

Patrick

booting xen [was Re: serial console puzzle]

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 07:00:38PM +0200, Manuel Bouyer wrote:
> On Fri, Apr 30, 2021 at 05:55:37PM +0100, Patrick Welche wrote:
> > no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots.
> > Nothing more appears on the console. (-current XEN, xen.gz from 
> > xenkernel415)
> 
> Try xen-debug.gz ?
> Do you get the Xen boot messages ?

I don't get the Xen boot messages. Just tried xen-debug.gz and again I just
see loading, and then a reboot. I don't think it gets as far as xen*.gz.

boot.cfg contains:

menu=Boot Xen:rndseed /var/db/entropy-file;consdev com0,57600;load /netbsd-XEN3_DOM0 console=com1 com1=57600,8n1,0x3f8;multiboot /xen-debug.gz dom0_mem=1024M

[Anyone know how to avoid "Collecting System Inventory ..." so booting
doesn't take forever?]


Cheers,

Patrick

Re: serial console puzzle

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 04:52:41PM +0100, Patrick Welche wrote:
> On Fri, Apr 30, 2021 at 05:23:54PM +0200, Manuel Bouyer wrote:
> > On Fri, Apr 30, 2021 at 04:18:49PM +0100, Patrick Welche wrote:
> > > On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote:
> > > > On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote:
> > > > > In /boot.cfg:
> > > > > 
> > > > > menu=Boot normally:rndseed /var/db/entropy-file;consdev 
> > > > > com0,57600;boot
> > > > > 
> > > > > # installboot -ve /dev/rsd0a
> > > > > File system: /dev/rsd0a
> > > > > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, 
> > > > > console com0
> > > > > 
> > > > > Yet in dmesg:
> > > > > 
> > > > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
> > > > > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
> > > > > com1: console
> > > > > 
> > > > > (so I don't actually see anything)
> > > > > 
> > > > > (Wednesday's -current/amd64)
> > > > > 
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > one possibility is that the bios has com0 and com1 swapped.
> > > > In some cases I had to explicitly set ioaddr with installboot to have
> > > > the serial console working.
> > > 
> > > I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B"
> > > is 0x2f8, so they are the right way around.
> > 
> > I've seen BIOSes report it the right way round in setup, but the wrong way
> > to the boot loader.
> > In such cases an explicit ioaddr did help.
> 
> Indeed - it did!
> 
> # installboot -ve /dev/rsd0a
> File system: /dev/rsd0a
> Boot options:timeout 5, flags 0, speed 57600, ioaddr 3f8, console com0
> 
> com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
> com0: console
> com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
> 
> now for xen...

no luck. I see loading /netbsd-XEN3_DOM0, and then it just reboots.
Nothing more appears on the console. (-current XEN, xen.gz from xenkernel415)

This is BIOS boot, with a disklabeled disk. GENERIC boots.

Ho hum


Patrick

Re: serial console puzzle

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 05:23:54PM +0200, Manuel Bouyer wrote:
> On Fri, Apr 30, 2021 at 04:18:49PM +0100, Patrick Welche wrote:
> > On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote:
> > > On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote:
> > > > In /boot.cfg:
> > > > 
> > > > menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot
> > > > 
> > > > # installboot -ve /dev/rsd0a
> > > > File system: /dev/rsd0a
> > > > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console 
> > > > com0
> > > > 
> > > > Yet in dmesg:
> > > > 
> > > > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
> > > > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
> > > > com1: console
> > > > 
> > > > (so I don't actually see anything)
> > > > 
> > > > (Wednesday's -current/amd64)
> > > > 
> > > > 
> > > > Thoughts?
> > > 
> > > one possibility is that the bios has com0 and com1 swapped.
> > > In some cases I had to explicitly set ioaddr with installboot to have
> > > the serial console working.
> > 
> > I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B"
> > is 0x2f8, so they are the right way around.
> 
> I've seen BIOSes report it the right way round in setup, but the wrong way
> to the boot loader.
> In such cases an explicit ioaddr did help.

Indeed - it did!

# installboot -ve /dev/rsd0a
File system: /dev/rsd0a
Boot options:timeout 5, flags 0, speed 57600, ioaddr 3f8, console com0

com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
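Manuel's explanation can be modelled in a few lines. Assuming the boot loader takes comN's I/O base from a table the BIOS provides (on PCs, the BIOS Data Area at physical address 0x400 holds up to four COM base addresses), a BIOS that fills that table in a different order from what its setup screen shows sends "com0" output to 0x2f8, which the kernel then attaches as com1; an explicit ioaddr bypasses the table. A sketch under that assumption; com_ioaddr and its parameters are hypothetical, not NetBSD's boot code:

```c
#include <stdint.h>

/* Hypothetical model: resolve the I/O base for serial unit `unit`.
 * A non-zero explicit address wins; otherwise the address comes from
 * the BIOS-provided table of up to four COM port bases. */
static uint16_t
com_ioaddr(unsigned int unit, const uint16_t bios_table[4],
    uint16_t explicit_addr)
{
	if (explicit_addr != 0)
		return explicit_addr;
	if (unit >= 4)
		return 0;	/* no such port */
	return bios_table[unit];
}
```

With a table of { 0x2f8, 0x3f8, 0, 0 }, unit 0 resolves to 0x2f8 (matching the earlier "com1: console"), while an explicit 0x3f8 gives the expected port.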

now for xen...


Thanks,

Patrick

Re: serial console puzzle

2021-04-30 Thread Patrick Welche

On Fri, Apr 30, 2021 at 05:04:34PM +0200, Manuel Bouyer wrote:
> On Fri, Apr 30, 2021 at 03:44:46PM +0100, Patrick Welche wrote:
> > In /boot.cfg:
> > 
> > menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot
> > 
> > # installboot -ve /dev/rsd0a
> > File system: /dev/rsd0a
> > Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console com0
> > 
> > Yet in dmesg:
> > 
> > com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
> > com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
> > com1: console
> > 
> > (so I don't actually see anything)
> > 
> > (Wednesday's -current/amd64)
> > 
> > 
> > Thoughts?
> 
> one possibility is that the bios has com0 and com1 swapped.
> In some cases I had to explicitly set ioaddr with installboot to have
> the serial console working.

I should have said: according to the BIOS "COM A" is 0x3f8, and "COM B"
is 0x2f8, so they are the right way around.

(Of course UEFI is fine, and I get console output, but...)


Cheers,

Patrick

serial console puzzle

2021-04-30 Thread Patrick Welche

In /boot.cfg:

menu=Boot normally:rndseed /var/db/entropy-file;consdev com0,57600;boot

# installboot -ve /dev/rsd0a
File system: /dev/rsd0a
Boot options:timeout 5, flags 0, speed 57600, ioaddr 0, console com0

Yet in dmesg:

com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 1-byte FIFO
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 1-byte FIFO
com1: console

(so I don't actually see anything)

(Wednesday's -current/amd64)


Thoughts?

Cheers,

Patrick

Re: running xen on current

2021-04-15 Thread Patrick Welche

On Thu, Apr 15, 2021 at 07:28:32AM -0400, Brad Spencer wrote:
> Manuel Bouyer  writes:
> 
> > On Thu, Apr 15, 2021 at 09:53:50AM +0100, Patrick Welche wrote:
> >> I have tried and failed to run xen on 3 -current/amd64 systems with
> >> 3 different failure modes:
> >> 
> >> 1) laptop:  xen.gz Building a PV Dom0 / ELF: not an ELF binary -> 
> >> panic/reboot
> >> 2) desktop: XEN3_DOM0 panics including PR port-xen/55978
> >> 3) server:  Trampoline space cannot be allocated; will try fallback -> 
> >> reboot
> >> 
> >> They are all working NetBSD-current/amd64 systems.
> >> 
> >> My conclusion was that xen is hopelessly broken, so I was quite surprised
> >> by Greg Woods' thread about the finer points of running a guest OS, given
> >> that those systems won't even start the host OS.
> >> 
> >> I dug out an old desktop, and to my pleasant surprise it booted XEN3_DOM0,
> >> and I have managed to run some XEN3_DOMUs.
> >> 
> >> The difference between the working/broken setups seems to be that the
> >> working one is "BIOS" booting rather than EFI booting.
> >> 
> >> Among all your xen success stories, are any of you EFI booting?
> >
> > AFAIK EFI is not yet supported by Xen (maybe this is supported by 4.15,
> > I've not had a chance to try yet). I have it running on fairly recent
> > Dell servers (in BIOS mode)
> 
> 
> There has been fiddling with Xen and EFI for quite some time.  See:
> 
> https://wiki.xenproject.org/wiki/Xen_EFI
> 
> for example (might be old)... this indicates that Xen 4.3 or later could
> be built as an EFI binary and probably booted from the EFI firmware
> directly or with grub2 when grub2 is an EFI binary itself.  Of course
> those instructions are all Linux-centric and I don't know if you created
> a Xen kernel like this if it would boot a NetBSD DOM0 kernel.  I am in
> no position to try any tests with this right now personally, but it is
> tempting as I have an EFI-only laptop that I could probably replace the
> hard drive temporarily.

Looking at

  https://xenproject.org/2021/04/08/xen-project-hypervisor-4-15/

(so 4.15 only just came out!) I see

  Unified boot images: It is now possible to create an image bundling
  together files needed for Xen to boot into a single EFI binary;
  making it now possible to boot a functional Xen system directly
  from the EFI boot manager, rather than having to go through grub
  multiboot.  Files that can be bundled include a hypervisor, dom0
  kernel, dom0 initrd, Xen KConfig, XSM configuration, and a device
  tree.

I thought that "go through grub multiboot" was the equivalent of our
boot.cfg "multiboot /xen.gz dom0_mem=1024M", but apparently not?
(Seems different to booting straight from the EFI boot menu)


Cheers,

Patrick

running xen on current

2021-04-15 Thread Patrick Welche

I have tried and failed to run xen on 3 -current/amd64 systems with
3 different failure modes:

1) laptop:  xen.gz Building a PV Dom0 / ELF: not an ELF binary -> panic/reboot
2) desktop: XEN3_DOM0 panics including PR port-xen/55978
3) server:  Trampoline space cannot be allocated; will try fallback -> reboot

They are all working NetBSD-current/amd64 systems.

My conclusion was that xen is hopelessly broken, so I was quite surprised
by Greg Woods' thread about the finer points of running a guest OS, given
that those systems won't even start the host OS.

I dug out an old desktop, and to my pleasant surprise it booted XEN3_DOM0,
and I have managed to run some XEN3_DOMUs.

The difference between the working/broken setups seems to be that the
working one is "BIOS" booting rather than EFI booting.

Among all your xen success stories, are any of you EFI booting?


Cheers,

Patrick


=

Some extra gory details

1) laptop:

 Building a PV Dom0 
ELF: Not an ELF binary

***
Panic on CPU 0:
Could not set up DOM0 guest OS
***

Reboot in five seconds...


2) desktop: a selection of panics in addition to PR port-xen/55978


[  80.989] panic: LIST_INSERT_HEAD 0xa080073eec28 
../../../../arch/x86/x86/pmap.c:2285
[  80.989] cpu13: Begin traceback...
[  80.989] vpanic() at netbsd:vpanic+0x14a
[  80.989] snprintf() at netbsd:snprintf
[  80.989] pmap_enter_ma() at netbsd:pmap_enter_ma+0x14e7
[  80.989] pmap_enter() at netbsd:pmap_enter+0x32
[  80.989] udv_fault() at netbsd:udv_fault+0x100
[  80.989] uvm_fault_internal() at netbsd:uvm_fault_internal+0x574
[  80.989] trap() at netbsd:trap+0x432
[  80.989] --- trap (number 6) ---
[  80.989] 7a60617787af:
[  80.989] cpu13: End traceback...

[  75.6599981] panic: kernel diagnostic assertion "ncp->nc_dvp == dvp" failed: 
file "../../../../kern/vfs_cache.c", line 432 
[  75.6599981] cpu0: Begin traceback...
[  75.6599981] vpanic() at netbsd:vpanic+0x14a
[  75.6599981] kern_assert() at netbsd:kern_assert+0x48
[  75.6599981] cache_lookup_entry() at netbsd:cache_lookup_entry+0xde
[  75.6599981] cache_lookup_linked() at netbsd:cache_lookup_linked+0x160
[  75.6599981] namei_tryemulroot() at netbsd:namei_tryemulroot+0x298
[  75.6599981] namei() at netbsd:namei+0x29
[  75.6599981] vn_open() at netbsd:vn_open+0x8f
[  75.6599981] do_open() at netbsd:do_open+0x119
[  75.6599981] do_sys_openat() at netbsd:do_sys_openat+0x74
[  75.6599981] sys_open() at netbsd:sys_open+0x24
[  75.6599981] syscall() at netbsd:syscall+0x9c
[  75.6599981] --- syscall (number 5) ---
[  75.6599981] netbsd:syscall+0x9c:
[  75.6599981] cpu0: End traceback...


3) server: EFI boot of Feb 6 2021, xenkernel413-4.13.3.tgz, serial console

On serial console, all that is seen is:

2415648+1324000=0x3910ec 
Loading /var/db/entropy-file
Loading /netbsd-XEN3_DOM0
Start @ 0xce60 [1=0xce991000-0xce9910ec]... 
Trampoline space cannot be allocated; will try fallback.

then it reboots

atomic_load_relaxed panic

2021-03-29 Thread Patrick Welche

On an otherwise idle amd64 box running

  NetBSD 9.99.81 (GENERIC) #3: Fri Mar 26 17:37:48 GMT 2021

I got tantalisingly close to managing to clone the NetBSD source:

# hg clone http://anonhg.netbsd.org/src
destination directory: src
applying clone bundle from 
https://cdn.NetBSD.org/_bundles/src/77d2a2ece3a06d837da45acd0fda80086ab4113c.zstd.hg
adding changesets
adding manifests
adding file changes
added 931876 changesets with 2425841 changes to 439702 files (+417 heads)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
adding changesets
adding manifests
adding file changes
added 14681 changesets with 130115 changes to 91055 files (+5 heads)
new changesets 26c8f37631b6:87ca3e58cbb1 (241 drafts)
280333 local changesets published
updating to branch trunk
updating [=>   ]  71400/202610 17s

(any hints on making this go faster?)

When:

[ 257027.9185592] panic: kernel diagnostic assertion "atomic_load_relaxed(> 
[ 257027.9485598] cpu2: Begin traceback...  
[ 257027.9585600] vpanic() at netbsd:vpanic+0x156   
[ 257027.9685597] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax
[ 257027.9785609] pmap_clear_attrs() at netbsd:pmap_clear_attrs+0x124
[ 257027.9985632] uvm_pagemarkdirty() at netbsd:uvm_pagemarkdirty+0x37b
[ 257028.0085600] uao_get() at netbsd:uao_get+0x315
[ 257028.0185597] ubc_fault() at netbsd:ubc_fault+0x16a
[ 257028.0285614] uvm_fault_internal() at netbsd:uvm_fault_internal+0x57a
[ 257028.0485613] trap() at netbsd:trap+0x5b7
[ 257028.0485613] --- trap (number 6) ---
[ 257028.0585610] copyout() at netbsd:copyout+0x33
[ 257028.0685603] uiomove() at netbsd:uiomove+0x80
[ 257028.0785605] ubc_uiomove() at netbsd:ubc_uiomove+0x157
[ 257028.0885612] tmpfs_read() at netbsd:tmpfs_read+0xbe
[ 257028.0985609] VOP_READ() at netbsd:VOP_READ+0x40
[ 257028.1185598] vn_read() at netbsd:vn_read+0x97
[ 257028.1285595] dofileread() at netbsd:dofileread+0x79
[ 257028.1385631] sys_read() at netbsd:sys_read+0x49
[ 257028.1485614] syscall() at netbsd:syscall+0x23e
[ 257028.1585616] --- syscall (number 3) ---
[ 257028.1685608] netbsd:syscall+0x23e:
[ 257028.1685608] cpu2: End traceback...

[ 257028.1785615] dumping to dev 168,2 (offset=8, size=41938679):
[ 257028.1885651] dump 9684 9683 9682 9681 9680 9679 9678 9677 9676 9675 9674 9

tip in tmux doesn't appear to be wrapped... Hopefully the dump will succeed...


Cheers,

Patrick

Re: bluetooth ubt0

2021-03-16 Thread Patrick Welche

On Tue, Mar 16, 2021 at 07:33:14AM +, Iain Hibbert wrote:
> > I wish someone could implement newer Intel ubt support.
> 
> Hm I can look into it - is it possible to buy one of these intel adapters 
> which is USB or do they only come built in to laptops etc?

This one is built-in. (A dongle is already in the post.) I'm happy to
test anything...

In the meantime, the workaround of booting into Windows (which, from your
analysis, loads the firmware) and then restarting into NetBSD got me
working Bluetooth(!). As this is my first foray, back to some basic
questions which, after perusing chapter 21 of the guide, I'm still
in the dark about.

Overall, I'm trying to update the firmware of a bluetooth label printer.
Allegedly this will move the print head away from the label to stop
it jamming. Really? The instructions are to upload the file using
"serial bluetooth terminal" from Android, selecting protocol "raw".

I see:

  2: bdaddr 00:07:80:ac:19:4f
   : name "Pro 02105  "
   : class [0x040680] Printer 
   : page scan rep mode 0x01
   : clock offset 17621
   : rssi 0

# sdpquery -d ubt0 -a 00:07:80:ac:19:4f Browse
ServiceRecordHandle: 0x0001
ServiceClassIDList: 
Serial Port
ProtocolDescriptorList: 
L2CAP
RFCOMM (channel 1)
BrowseGroupList: 
Public Browse Root
LanguageBaseAttributeIDList: 
en.UTF-8 base 0x0100
ServiceName: "Bluetooth Serial Port"

so it is looking promising given the SP, but what next?

I tried

# cat /etc/bluetooth/hosts 
00:07:80:ac:19:4f   printer
# btpin -d ubt0 -a printer -p 
# rfcomm_sppd -d ubt0 -a printer -t /dev/ttyp1
rfcomm_sppd: Cannot open `/dev/ptyp1': No such file or directory
# rfcomm_sppd -d ubt0 -a printer -t /dev/pts/5
rfcomm_sppd: Cannot open `/dev/pts/p': No such file or directory
# rfcomm_sppd -d ubt0 -a printer 
rfcomm_sppd: SP: Host is down

Cheers,

Patrick

Re: bluetooth ubt0

2021-03-16 Thread Patrick Welche

My previous note hasn't appeared on the list yet, so here is the resolution
out of order: when the printer was running on battery, it behaved differently
and

# rfcomm_sppd -d ubt0 -a printer < Pro\ printer\ config\ file.txt
rfcomm_sppd[1645]: Starting on stdio...
rfcomm_sppd[1645]: Completed on stdio

suggests success!


Cheers,

Patrick

bluetooth ubt0

2021-03-15 Thread Patrick Welche

A first foray into bluetooth on this amd64 laptop gives me:

ubt0: Intel (0x8087) product 0aaa (0x0aaa), rev 2.00/0.02, addr 4
ubt0: autoconfiguration error: CommandComplete opcode (003|0003) failed (status=0x01)

and after a btconfig ubt0 up, btconfig shows

ubt0: bdaddr 00:00:00:00:00:00 flags 0x2e0

which doesn't look like a valid address. (Bluetooth is "on", at least
according to OtherOS)

Any thoughts on how to get it going?


Cheers,

Patrick

# usbdevs -v -v -a 4
Controller /dev/usb0:
Controller /dev/usb1:
addr 4: full speed, self powered, config 1, product 0aaa(0x0aaa), 
Intel(0x8087), rev 0.02(0x0002)
  Wireless(0xe0), Radio Frequency(0x01), proto 1
Controller /dev/usb2:
Controller /dev/usb3:

sed and xentools413

2021-03-08 Thread Patrick Welche

xentools413 is repeatably failing with

./config/ioapi.h:17:10: fatal error: config/local/ioapi.h: No such file or directory
   17 | #include <config/local/ioapi.h>

In a very roundabout way this seems to be related to

  [DEPS] arch/x86/drivers/net/undiisr.S
sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\)
gmake[7]: *** Deleting file 'bin/deps/arch/x86/drivers/net/undiisr.S.d'
  [DEPS] arch/x86/transitions/librm.S
sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\)
gmake[7]: *** Deleting file 'bin/deps/arch/x86/transitions/librm.S.d'
...

apparently from

pkgsrc/sysutils/xentools413/work.x86_64/ipxe-1dd56dbd11082fb622c2ed21cfaced4f47d798a6/src/Makefile.housekeeping

   sed 's/\.o\s*:/_DEPS +=/' > $(BIN)/deps/$(1).d
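`\s` is a GNU extension; POSIX regular expressions spell "any whitespace" as `[[:space:]]`, so an expression written that way should not depend on which sed is installed (a portability assumption, not something tested in this thread). A sketch of the same substitution through the POSIX regcomp/regexec interface; subst_deps and the sample input are hypothetical:

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Apply the Makefile's substitution s/\.o\s*:/_DEPS +=/ with the
 * POSIX class [[:space:]] in place of \s, which POSIX basic regular
 * expressions do not define. Replaces the first match only, like sed
 * without the g flag. Returns 0 on success, -1 on no match or error. */
static int
subst_deps(const char *in, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	int rc;

	if (regcomp(&re, "\\.o[[:space:]]*:", 0) != 0)
		return -1;
	rc = regexec(&re, in, 1, &m, 0);
	regfree(&re);
	if (rc != 0)
		return -1;

	/* prefix before the match, the replacement, then the rest */
	if ((size_t)snprintf(out, outlen, "%.*s_DEPS +=%s",
	    (int)m.rm_so, in, in + m.rm_eo) >= outlen)
		return -1;
	return 0;
}
```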



Index: Makefile
===
RCS file: /cvsroot/pkgsrc/sysutils/xentools413/Makefile,v
retrieving revision 1.17
diff -u -r1.17 Makefile
--- Makefile	8 Mar 2021 08:13:06 -   1.17
+++ Makefile	8 Mar 2021 15:12:13 -
@@ -53,7 +53,7 @@
 EGDIR= ${PREFIX}/share/examples/xen
 MESSAGE_SUBST+=EGDIR=${EGDIR}
 
-USE_TOOLS+=pod2man gmake pkg-config makeinfo perl bash cmake
+USE_TOOLS+=pod2man gmake pkg-config makeinfo perl bash cmake gsed
 USE_LANGUAGES= c c++
 
 GNU_CONFIGURE= YES


gets me a successful build.

I'm assuming a WORKSFORME response for xentools413, so I'm wondering
whether something changed in -current sed that would explain the
above.


Cheers,

Patrick

Re: serial console

2021-03-01 Thread Patrick Welche

On Wed, Feb 17, 2021 at 12:34:21AM -0800, Darrin B. Jewell wrote:
> 
> I honestly hope this isn't related to your problem, but I did make a
> change to the amd64 cdrom building recently that is in this area
> and may be the issue.
> 
>https://mail-index.netbsd.org/source-changes/2021/02/06/msg126676.html
> 
> I thought I was only affecting the boot-com.iso install cd, but this
> change should probably be examined to make sure it isn't causing your
> problem.

I didn't think your change had anything to do with it, but just in case,
I repeated the experiment with today's code (as you reverted your
change already) and had the same outcome.


Cheers,

Patrick

serial console

2021-02-16 Thread Patrick Welche

Has something changed recently in the land of serial consoles?

I am pretty sure that once I enabled serial console redirection
"after boot" in the "bios" of this amd64 uefi booting server, with
a serial port plugged in, I would have a serial console with the
default /boot.cfg.

After getting access to the building(!), it seems I now need to add
consdev com0,115200 to each menu item in boot.cfg. (Putting it
on a line on its own seems insufficient - is that a bug?)


Cheers,

Patrick

alignment and packed structs

2021-01-06 Thread Patrick Welche

I just tried to compile if_iwn as a module. It failed with

dev/pci/if_iwn.c:2685:6: error: converting a packed 'struct iwn_fw_dump' 
pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 
4) may result in an unaligned pointer value [-Werror=address-of-packed-member]


I got around it with

diff -u -r1.17 if_iwnreg.h
--- if_iwnreg.h 19 Jul 2017 16:55:12 -  1.17
+++ if_iwnreg.h 6 Jan 2021 17:24:01 -
@@ -1447,7 +1447,7 @@
 	uint32_t	src_line;
 	uint32_t	tsf;
 	uint32_t	time[2];
-} __packed;
+} __attribute__((aligned(4),packed));
 
 /* TLV firmware header. */
 struct iwn_fw_tlv_hdr {


Why isn't this necessary when building if_iwn.c as part of a kernel?
Is the above the right solution?
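For comparison, here is a small sketch of what the attribute change does, and of the other common way to silence -Waddress-of-packed-member: keep plain `__packed` and copy words out with memcpy() instead of taking a `uint32_t *` into the struct. The struct and function names are hypothetical stand-ins, not the real iwn_fw_dump layout:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a firmware dump struct: packed only. */
struct dump_packed {
	uint32_t valid;
	uint32_t time[2];
} __attribute__((packed));

/* Same layout, but the compiler may now assume 4-byte alignment. */
struct dump_aligned {
	uint32_t valid;
	uint32_t time[2];
} __attribute__((aligned(4), packed));

_Static_assert(alignof(struct dump_packed) == 1, "packed gives alignment 1");
_Static_assert(alignof(struct dump_aligned) == 4, "aligned(4) raises it to 4");
_Static_assert(sizeof(struct dump_packed) == sizeof(struct dump_aligned),
    "adding aligned(4) does not change the size here");

/* Sum the 32-bit words without ever forming an unaligned uint32_t pointer. */
static uint32_t
sum_words_packed(const struct dump_packed *d)
{
	const unsigned char *p = (const unsigned char *)d;
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(*d) / sizeof(uint32_t); i++) {
		uint32_t w;
		memcpy(&w, p + i * sizeof(w), sizeof(w));
		sum += w;
	}
	return sum;
}
```

The aligned(4) variant is only safe if every instance really is 4-byte aligned (e.g. it is never embedded at an odd offset in a byte buffer); the memcpy() variant makes no such assumption.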


Cheers,

Patrick

Re: netbsd32_coredump

2020-11-15 Thread Patrick Welche

> On Sat, 14 Nov 2020, Patrick Welche wrote:
> 
> > Just upgraded a pi zero w from 9.99.10 to 9.99.75/evbarm-earmv6hf
> > i.e., 32 bit, and on boot with a standard RPI kernel and new dtb (but
> > presumably old startelf)
> > 
> > panic: kernel diagnostic assertion "!*hooked" failed: file 
> > "/usr/src/sys/kern/kern_module_hook.c", line 70
>
On Sat, Nov 14, 2020 at 09:36:07AM -0800, Paul Goyette wrote:
> That should already be fixed.  Can you update to HEAD?

Sure enough that .75 was from last Sunday - yesterday's .75 is fine!

Thanks,

Patrick

netbsd32_coredump

2020-11-14 Thread Patrick Welche

Just upgraded a pi zero w from 9.99.10 to 9.99.75/evbarm-earmv6hf
i.e., 32 bit, and on boot with a standard RPI kernel and new dtb (but
presumably old startelf)

panic: kernel diagnostic assertion "!*hooked" failed: file 
"/usr/src/sys/kern/kern_module_hook.c", line 70 

0x8097ae3c: netbsd:vpanic+0xc
0x8097ae54: netbsd:kern_assert+0x3c
0x8097ae84: netbsd:module_hook_set+0x98
0x8097aea4: netbsd:compat_netbsd32_coredump_modcmd+0xa0
0x8097af0c: netbsd:module_do_builtin+0x16c
0x8097af4c: netbsd:module_init_class+0x210
0x8097af9c: netbsd:main+0x38c
0x8097afac: netbsd:kernel_text+0x54


Cheers,

Patrick

Re: vmstat

2020-11-10 Thread Patrick Welche

On Tue, Nov 10, 2020 at 05:07:10PM +0100, Lars Reichardt wrote:
> Those pool names seem clearly broken.
> It seems the box is running with a zfs pool which has its own cache
> (ARC) which memory is entirely wired and might grow quite large.
> What does vmstat -vmC show for those pools?

zpool export various...

# zpool list
no pools available
# zfs list
no datasets available
# modunload zfs

and those pools are gone!

10G still wired though, however 12G free memory, so the box is zipping
along once more! Those zfs partitions weren't being used...

Thanks,

Patrick

Re: vmstat

2020-11-10 Thread Patrick Welche

On Tue, Nov 10, 2020 at 05:07:10PM +0100, Lars Reichardt wrote:
> On Tue, 10 Nov 2020 12:12:14 +
> Patrick Welche  wrote:
> 
> > # vmstat -C
> > Pool cache statistics.
> > Name  Spin GrpSz Full Emty PoolLayer CacheLayer  Hit%
> > CpuLayer  Hit% amappl   863   660496221
> > 1592973  68.8   563372236  99.7 anonpl   6077263 11400   0
> > 55980901  177419830  68.4  9594977028  98.2 ...
> > xhcixfer 01500 5  8  37.5
> >  77  89.6 xhcixfer 01500 5 11
> > 54.5 193  94.3 zfs_znode_cache 2906   15   510   4258698
> >  17474551  75.6   215763165  91.9 ©ÿÿ  236815   53
> > 0   4471316   17390812  74.3   305275419  94.3 ©ÿÿ 0
> >  1500258353 259551   0.5  277790   6.6 ©ÿÿ
> > 01500117290 117459   0.1  119696
> > 1.9 ©ÿÿ 01500 74991  75197   0.3
> > 79474   5.4 ©ÿÿ 21500 36643
> > 36739   0.3   37986   3.3 ...
> > 
> > 
> > interesting "Pool" name... (10G wired memory box)
> > 
> > 
> 
> Those pool names seem clearly broken.
> It seems the box is running with a zfs pool which has its own cache
> (ARC) which memory is entirely wired and might grow quite large.
> What does vmstat -vmC show for those pools?

Memory resource pool statistics
NameSize Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
zfs_znode_cache 2906   1500   4258698   17474551  75.6   215763165  91.9
zil_lwb_cache 01500 0  0   nan   0   nan
©ÿÿ  23681500   4471316   17390812  74.3   305275419  94.3
©ÿÿ 01500258353 259551   0.5  277790   6.6
©ÿÿ 01500117290 117459   0.1  119696   1.9
©ÿÿ 01500 74991  75197   0.3   79474   5.4
©ÿÿ 21500 36643  36739   0.3   37986   3.3
...

Thanks - I'll see if unmounting them etc helps...


Patrick

Re: wired memory

2020-11-10 Thread Patrick Welche

On Tue, Nov 10, 2020 at 05:26:26AM -0800, Chuck Silvers wrote:
> On Tue, Nov 10, 2020 at 11:40:08AM +0000, Patrick Welche wrote:
> > My 4 Nov -current/amd64 build box seems to be building slowly, with a lot
> > of "biowait", e.g. watching the RES of a cc1plus slowly crawl up to SIZE
> > at around 3M per 5s.
> > 
> > Is 10G of "Wired" normal? (32G RAM + 64G swap)
> > (lots of malloc(9) and no free(9)?)
> 
> kernel memory usage is not reported as "wired" memory.

I got that notion from:

https://www.netbsd.org/docs/internals/en/chap-memory.html#wired_memory

> > Memory: 3235M Act, 115M Inact, 10G Wired, 75M Exec, 2860M File, 2800M Free
> > Swap: 64G Total, 11G Used, 53G Free
> 
> no, 10GB wired memory out of 32GB total RAM is not a typical situation.
> could you send me the output of "ps aux" and "vmstat -m" ?

Thanks - sent off list

> with 11GB of swap in use, it's not surprising that the system would be
> very slow.

Part of the question is why it needs to swap... (where did the 10G go?)

> > it is "just" building...
> > 
> > Could kern.maxfiles = 108928 be an issue? (3404 default below)
> 
> do you mean that on the system that is slow, kern.maxfiles is 108928?
> that doesn't seem overly large for 32GB RAM.
> 
> I'm not sure what you mean by "3404 default below".

just that the box below had the default kern.maxfiles=3404 setting, but
as you say that shouldn't be too large, and without it, one sees complaints
of insufficient files.


> > Just looked at another amd64 box for comparison (also building - no swap):
> > 
> > Memory: 15G Act, 3820K Inact, 16M Wired, 44M Exec, 14G File, 135G Free
> > Swap: 
> 
> this system has a lot more RAM and most of it isn't even used,
> so I would expect this system to perform much better than the first one.

Yes, but also just 16M of "Wired" memory...


Cheers,

Patrick

vmstat

2020-11-10 Thread Patrick Welche

# vmstat -C
Pool cache statistics.
Name  Spin GrpSz Full Emty PoolLayer CacheLayer  Hit%CpuLayer  Hit%
amappl   863   6604962211592973  68.8   563372236  99.7
anonpl   6077263 11400   0  55980901  177419830  68.4  9594977028  98.2
...
xhcixfer 01500 5  8  37.5  77  89.6
xhcixfer 01500 5 11  54.5 193  94.3
zfs_znode_cache 2906   15   510   4258698   17474551  75.6   215763165  91.9
©ÿÿ  236815   530   4471316   17390812  74.3   305275419  94.3
©ÿÿ 01500258353 259551   0.5  277790   6.6
©ÿÿ 01500117290 117459   0.1  119696   1.9
©ÿÿ 01500 74991  75197   0.3   79474   5.4
©ÿÿ 21500 36643  36739   0.3   37986   3.3
...


interesting "Pool" name... (10G wired memory box)


Cheers,

Patrick

wired memory

2020-11-10 Thread Patrick Welche

My 4 Nov -current/amd64 build box seems to be building slowly, with a lot
of "biowait", e.g. watching the RES of a cc1plus slowly crawl up to SIZE
at around 3M per 5s.

Is 10G of "Wired" normal? (32G RAM + 64G swap)
(lots of malloc(9) and no free(9)?)

Memory: 3235M Act, 115M Inact, 10G Wired, 75M Exec, 2860M File, 2800M Free
Swap: 64G Total, 11G Used, 53G Free

it is "just" building...

Could kern.maxfiles = 108928 be an issue? (3404 default below)


Just looked at another amd64 box for comparison (also building - no swap):

Memory: 15G Act, 3820K Inact, 16M Wired, 44M Exec, 14G File, 135G Free
Swap: 


Cheers,

Patrick

gdb core dump

2020-10-21 Thread Patrick Welche

A box running an amd64 kernel from Monday panicked overnight during a bulk build.

Reading symbols from netbsd.0...
(No debugging symbols found in netbsd.0)
(gdb) target kvm netbsd.0.core
[1]   Abort trap (core dumped) gdb netbsd.0
# gdb `which gdb` gdb.core
GNU gdb (GDB) 11.0.50.20200914-git
...
Program terminated with signal SIGABRT, Aborted.


# crash -N netbsd.0 -M netbsd.0.core 
Crash version 9.99.74, image version 9.99.74.
System panicked: kernel diagnostic assertion "(pg->flags & PG_PAGEOUT) == 0" 
failed: file "../../../../uvm/uvm_page.c", line 1448 
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
?() at c9803de74b01
sys_reboot() at sys_reboot
vpanic() at vpanic+0x160
__x86_indirect_thunk_rax() at __x86_indirect_thunk_rax
uvm_pagefree() at uvm_pagefree+0x62d
uvm_anon_release() at uvm_anon_release+0x6d
uvm_aio_aiodone_pages() at uvm_aio_aiodone_pages+0x439
uvm_aio_aiodone() at uvm_aio_aiodone+0x226
dkiodone() at dkiodone+0x97
biointr() at biointr+0x61
softint_dispatch() at softint_dispatch+0xf5
crash: _kvm_kvatop(c9825cf930b8)
crash: kvm_read(0xc9825cf930b8, 8): invalid translation (invalid PTE)

# ident /netbsd.old | grep uvm_ | grep 2020/10
 $NetBSD: uvm_bio.c,v 1.123 2020/10/18 08:52:15 rin Exp $
 $NetBSD: uvm_init.c,v 1.54 2020/10/07 17:51:50 chs Exp $
 $NetBSD: uvm_page.c,v 1.249 2020/10/18 18:31:31 chs Exp $
 $NetBSD: uvm_pager.c,v 1.130 2020/10/18 18:22:29 chs Exp $
 $NetBSD: uvm_pgflcache.c,v 1.6 2020/10/18 18:31:31 chs Exp $
 $NetBSD: uvm_pglist.c,v 1.86 2020/10/07 17:51:50 chs Exp $
 $NetBSD: uvm_swap.c,v 1.200 2020/10/07 17:51:50 chs Exp $



Cheers,

Patrick

panic rebooting yesterday's kernel

2020-10-21 Thread Patrick Welche

Booted an amd64 kernel built from yesterday's source, and on reboot

[ 17037.8583948] unmounting 0xfef63b813000 / (/dev/dk14)...
[ 17037.8583948] forcefully unmounting / (/dev/dk14)...
[ 17037.8783949] dk14 at wd4 (root) deleted
[ 17037.8783949] wd4: detached
[ 17037.8883950] atabus8: detached
[ 17037.8883950] ahcisata1: detached
[ 17038.6083908] Kernel lock error: _kernel_lock,244: spinout

[ 17038.6083908] lock address : 0x81082f00 type :   spin
[ 17038.6083908] initialized  : 0x80bc8340
[ 17038.6083908] shared holds :  0 exclusive:  1
[ 17038.6083908] shares wanted:  0 exclusive:  3
[ 17038.6083908] relevant cpu :  0 last held:  1
[ 17038.6083908] relevant lwp : 0xfefd19744080 last held: 0xfef642bf4280
[ 17038.6083908] last locked* : 0x80a42ab1 unlocked : 0x80323d4c
[ 17038.6083908] curcpu holds :  0 wanted by: 0xfefd19744080

[ 17038.6763377] Skipping crash dump on recursive panic
[ 17038.6816213] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
[ 17038.6886673] cpu0: Begin traceback...
[ 17038.6886673] vpanic() at netbsd:vpanic+0x156
[ 17038.6983892] snprintf() at netbsd:snprintf
[ 17038.6983892] lockdebug_more() at netbsd:lockdebug_more
[ 17038.7083890] _kernel_lock() at netbsd:_kernel_lock+0x22a
[ 17038.7183890] intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x14
[ 17038.7183890] Xhandle_ioapic_edge17() at netbsd:Xhandle_ioapic_edge17+0x6e
[ 17038.7296195] --- interrupt ---
[ 17038.7296195] _kernel_lock() at netbsd:_kernel_lock+0x1f8
[ 17038.7403251] callout_softclock() at netbsd:callout_softclock+0x425
[ 17038.7483888] softint_dispatch() at netbsd:softint_dispatch+0xf5
address 0xc8025cf990b8 is invalid
address 0xc8025cf990b0 is invalid
address 0xc8025cf990c0 is invalid
address 0xc8025cf990b8 is invalid
address 0xc8025cf990c8 is invalid
address 0xc8025cf990c0 is invalid
address 0xc8025cf990d0 is invalid
address 0xc8025cf990c8 is invalid
[ 17038.7784674] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 
0xc8025cf98ff0
[ 17038.7900680] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 17038.7993371] --- interrupt ---
address 0xc8025cf990c8 is invalid
address 0xc8025cf99080 is invalid
[ 17038.8084475] Bad frame pointer: 0xc8025cf98450
[ 17038.8084475] c8025cf98450:
[ 17038.8185590] cpu0: End traceback...


so no ddb either...


Cheers,

Patrick

Re: file system corruption

2020-10-15 Thread Patrick Welche

On Sun, Oct 11, 2020 at 11:19:16PM +0200, Thomas Klausner wrote:
> I've had serious file system corruption. Mostly in mercurial and
> sqlite3 databases, but also in normal files.

> Anyone else having problems?

Is yours a Ryzen system? (mine is, and it has filesystem issues - just
trying to see why it is not a common issue)

Cheers,

Patrick

Re: file system corruption

2020-10-14 Thread Patrick Welche

On Mon, Oct 12, 2020 at 06:39:48AM +0200, Martin Husemann wrote:
> On Sun, Oct 11, 2020 at 11:19:16PM +0200, Thomas Klausner wrote:
> > I don't know enough about the internals of the hg and sqlite3, but I
> > also saw a broken zip archive and had a good copy for comparison. In
> > that case, a block of 256 bytes was zero instead of the real data.
> 
> Do you know the file offset where the corruption started?
> Can you show "dumpfs $rawdev | head -15" for that file system?

Reminds me of PR kern/55362. If I started with a disk full of zeros,
some ranges would have zero instead of the real data. If I started
with a disk full of ones, some ranges would contain ones instead of
the real data.


In other news, just now, after a clean reboot to use a new kernel, the
system came up with

[  1885.434544] panic: ffs_blkfree: bad size: dev = 0xa803, bno = 331526 bsize 
= 32768, size = 12288, fs = /usr/obj

(different filesystem & disk)


Cheers,

Patrick

ctype and gcc9

2020-09-21 Thread Patrick Welche

Since gcc9, essentially every ctype using piece of software fails with

   error: array subscript has type 'char' [-Werror=char-subscripts]

which prompts a style question: cast every argument of every call to
a ctype function in every piece of software to (unsigned char), use
-Wno-error=char-subscripts, or something else?
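For reference, the cast idiom looks like this. The ctype functions take an int that must be EOF or a value representable as unsigned char, so a plain char holding a negative value (any byte >= 0x80 where char is signed) is undefined behaviour; on NetBSD the ctype macros use their argument as a table index, which is why the compiler can see it and warn. A minimal sketch, with a hypothetical helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic characters in a NUL-terminated string. */
static size_t
count_alpha(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		/* Cast first: a negative char is not a valid ctype argument. */
		if (isalpha((unsigned char)*s))
			n++;
	}
	return n;
}
```

Unlike -Wno-error=char-subscripts, the cast fixes the underlying bug rather than hiding the diagnostic.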


Cheers,

Patrick

Re: mesalib abort

2020-09-17 Thread Patrick Welche

On Thu, Sep 17, 2020 at 03:42:09PM +0200, Martin Husemann wrote:
> On Thu, Sep 17, 2020 at 02:11:13PM +0100, Patrick Welche wrote:
> > #2  0x7f7ff160c63d in pthread_create (thread=0x2f9d, 
> > thread@entry=0x7f7fd588, attr=attr@entry=0x0, 
> > startfunc=startfunc@entry=0x7f7fed764930 , 
> > arg=arg@entry=0x7f7ff7e9f230) at /usr/src/lib/libpthread/pthread.c:404
> 
> That is:
> 
> if (__predict_false(__uselibcstub)) {
> pthread__errorfunc(__FILE__, __LINE__, __func__,
> "pthread_create() requires linking with -lpthread");
> return __libc_thr_create_stub(thread, attr, startfunc, arg);
> }
> 
> 
> So "something" needs to be linked with -pthread but isn't.

Thanks: I confused myself thinking that the glmark2 in pkgsrc also coredumps.
It doesn't. I updated glmark2, as it depends on python 3 rather than 2.

This is interesting:

commit a7bd0084f67b80ae9c32189e4f28a4e7d08f0b92
Author: Jamie Madill 
Date:   Thu Feb 7 23:33:08 2019 -0500

Add gl, egl and glx loader using GLAD.

Instead of hard linking against libGL, libEGL and libGLESv2 we can
load the entry points at runtime. Loading dynamically is more
flexible across different platforms.

Note that the glad loaders are licensed under public domain.

Preparation for Windows support in Issue #9.

and it is /usr/X11R7/lib/libGL.so.3 which provides the libpthread, which
is now absent as it isn't linked.

Thanks for the clues!

Cheers,

Patrick

Re: mesalib abort

2020-09-17 Thread Patrick Welche

On Thu, Sep 17, 2020 at 01:38:21PM +0200, Tobias Nygren wrote:
> On Thu, 17 Sep 2020 12:04:10 +0100
> Patrick Welche  wrote:
> > It looks as though line 50 simply undoes line 48, and it doesn't matter
> > whether or not line 49 fails. How can this break?
> 
> This is a red herring. Because they mask the SIGSEGV/SIGBUS handlers
> you are not getting the actual fault location which happens inside
> thrd_create. Instead the process takes the fault when the signal mask
> is restored. This code is severely broken and upstream should fix it.
> Anyway, if you comment out the signal masking crap you will get a
> proper backtrace.

It does break in between as you say:

#0  0x7f7ff598991a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x7f7ff5846c3c in __libc_thr_create_stub (tp=tp@entry=0x2f9d, 
ta=ta@entry=0x0, f=f@entry=0x7f7fed764930 , 
a=a@entry=0x7f7ff7e9f230)
at /usr/src/lib/libc/thread-stub/thread-stub.c:418
#2  0x7f7ff160c63d in pthread_create (thread=0x2f9d, 
thread@entry=0x7f7fd588, attr=attr@entry=0x0, 
startfunc=startfunc@entry=0x7f7fed764930 , 
arg=arg@entry=0x7f7ff7e9f230) at /usr/src/lib/libpthread/pthread.c:404
#3  0x7f7fed764a90 in thrd_create (
func=0x7f7fed764af6 , arg=0x7f7ff7e9f220, 
thr=0x7f7fd588)
at /usr/xsrc/external/mit/MesaLib/dist/include/c11/threads_posix.h:289
#4  u_thread_create (param=0x7f7ff7e9f220, 
routine=0x7f7fed764af6 )
at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:54


Thanks,

Patrick

mesalib abort

2020-09-17 Thread Patrick Welche

When running glmark2 with native xsrc on -current/amd64, I get

Program terminated with signal SIGABRT, Aborted.
#0  0x7f7ff5844a2a in _sys___sigprocmask14 () from /usr/lib/libc.so.12
(gdb) bt
#0  0x7f7ff5844a2a in _sys___sigprocmask14 () from /usr/lib/libc.so.12
#1  0x7f7ff160a461 in pthread_sigmask (how=, 
set=, oset=)
at /usr/src/lib/libpthread/pthread_misc.c:164
#2  0x7f7fed75fec3 in u_thread_create (
routine=0x7f7fed75ff06 , param=0x7f7ff7e9f220)
at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:50
#3  util_queue_create_thread (queue=queue@entry=0x7f7ff7e8b900, 
index=index@entry=0)
at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_queue.c:350
#4  0x7f7fed7604ad in util_queue_init (queue=queue@entry=0x7f7ff7e8b900, 
name=name@entry=0x7f7fef727c8b "disk$", max_jobs=max_jobs@entry=32, 
num_threads=num_threads@entry=1, flags=flags@entry=7)
at /usr/xsrc/external/mit/MesaLib/dist/src/util/u_queue.c:466
...

/usr/xsrc/external/mit/MesaLib/dist/src/util/u_thread.h:50:
41    thrd_t thread;
42 #ifdef HAVE_PTHREAD
43    sigset_t saved_set, new_set;
44    int ret;
45
46    sigfillset(&new_set);
47    sigdelset(&new_set, SIGSYS);
48    pthread_sigmask(SIG_BLOCK, &new_set, &saved_set);
49    ret = thrd_create( &thread, routine, param );
50    pthread_sigmask(SIG_SETMASK, &saved_set, NULL);
51 #else
52    int ret;
53    ret = thrd_create( &thread, routine, param );
54 #endif
55    if (ret)
56       return 0;
57
58    return thread;


It looks as though line 50 simply undoes line 48, and it doesn't matter
whether or not line 49 fails. How can this break?

Cheers,

Patrick

Re: GCC 9 enabled for x86 and arm platforms.

2020-09-13 Thread Patrick Welche

On Sun, Sep 13, 2020 at 07:28:16AM +, Thomas Mueller wrote:
> > i've switched x86 and arm to GCC 9.  several others are liking
> > to switch soon, and they will all likely need clean builds to
> > be stable.
> 
> > please send email here or to me or send-pr for problems.
> 
> > thanks! 
> 
> 
> > .mrg.   
> 
> Would -r flag in build.sh command be sufficient to ensure a clean build?
> 
> Something like
> ===> build.sh command:./build.sh -m amd64 -B nb899-20190723 -M ../obj -T 
> ../tooldir -r -U distribution kernel=SANDY7
> 
> Or would it be necessary to explicitly clean obj and tooldir directories, like
> rm -R ../obj/*
> rm -R ../tooldir/*
> ?
> 
> At this stage I wouldn't want to keep any remnants from old builds.

Making sure there isn't MKUPDATE=yes nor -u in build.sh call was enough
for me. The explicit clean would probably be better. (-r doesn't touch
OBJDIR)

Cheers,

Patrick

Re: no entropy?

2020-09-10 Thread Patrick Welche

On Thu, Sep 10, 2020 at 05:52:08PM +0200, Martin Husemann wrote:
> On Thu, Sep 10, 2020 at 04:06:12PM +0100, Patrick Welche wrote:
> > I just upgraded two ancient boxen to -current/amd64. One has its 256 bits,
> > the other has none?! I have tried playing spot the difference but
> > haven't spotted anything:
> 
> Both machines do not have a proper hardware random number generator.
> One of them has been seeded before and saved the entropy
> in /var/db/entropy-file.
> 
> On the one that has not, you can manually do it by writing 32 random
> bytes to /dev/random with dd(1), eg. after extracting them from some
> machine with newer cpu (and hardware random number support) or from 
> the properly seeded machine by reading 32bytes from /dev/urandom.
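The usual tool for this is dd(1); the same two steps can be sketched in C (function name hypothetical). Run it once on the seeded machine from /dev/urandom to a file, carry the file across, then run it on the unseeded machine from that file to /dev/random. Copying a machine's own /dev/urandom into its /dev/random adds nothing.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy exactly n bytes (n <= 64) from src to dst.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t
copy_seed(const char *src, const char *dst, size_t n)
{
	unsigned char buf[64];
	ssize_t got, put;
	int in, out;

	if (n > sizeof(buf))
		return -1;
	if ((in = open(src, O_RDONLY)) == -1)
		return -1;
	got = read(in, buf, n);
	close(in);
	if (got != (ssize_t)n)
		return -1;
	/* O_CREAT so a plain file can be the target; harmless for /dev/random */
	if ((out = open(dst, O_WRONLY | O_CREAT, 0600)) == -1)
		return -1;
	put = write(out, buf, n);
	close(out);
	return put;
}
```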


  256 bits currently stored in pool (max 256)

Thanks, it's happier now!

Patrick

no entropy?

2020-09-10 Thread Patrick Welche

I just upgraded two ancient boxen to -current/amd64. One has its 256 bits,
the other has none?! I have tried playing spot the difference but
haven't spotted anything:

OK:
[ 1.00] entropy: no seed from bootloader
[ 7.024989] entropy: ready
# rndctl -ls
Source Bits Type  Flags
raid0 0 disk estimate, collect, v, t, dt
ucom0 0 tty  estimate, collect, v, t, dt
/dev/random   0 ???  estimate, collect, v
wd1   0 disk estimate, collect, v, t, dt
wd0   0 disk estimate, collect, v, t, dt
cpu1  0 vm   estimate, collect, v, t, dv
cpu0  0 vm   estimate, collect, v, t, dv
coretemp1-cpu10 env  estimate, collect, v, t, dv, dt
coretemp0-cpu00 env  estimate, collect, v, t, dv, dt
wm1   0 net  estimate, v, t, dt
wm0   0 net  estimate, v, t, dt
system-power  0 power estimate, collect, v, t, dt
autoconf  0 ???  estimate, collect, t
seed256 ???  estimate, collect, v
0 bits mixed into pool
  256 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated


puzzling:
[ 1.00] entropy: no seed from bootloader
[ 7.383099] entropy: WARNING: consolidating less than full entropy
# rndctl -ls
Source Bits Type  Flags
raid3 0 disk estimate, collect, v, t, dt
raid2 0 disk estimate, collect, v, t, dt
raid1 0 disk estimate, collect, v, t, dt
raid0 0 disk estimate, collect, v, t, dt
/dev/random   0 ???  estimate, collect, v
wd1   0 disk estimate, collect, v, t, dt
wd0   0 disk estimate, collect, v, t, dt
cpu1  0 vm   estimate, collect, v, t, dv
cpu0  0 vm   estimate, collect, v, t, dv
coretemp1-cpu10 env  estimate, collect, v, t, dv, dt
coretemp0-cpu00 env  estimate, collect, v, t, dv, dt
wm1   0 net  estimate, v, t, dt
wm0   0 net  estimate, v, t, dt
system-power  0 power estimate, collect, v, t, dt
autoconf  0 ???  estimate, collect, t
seed  0 ???  estimate, collect, v
0 bits mixed into pool
0 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated

both installed from the same tarballs, with GENERIC, and both apparently
with the same model of CPU, both show

[ 1.991010] aes: Intel SSSE3 vpaes
[ 1.991010] aes_ccm: self-test passed
[ 5.142260] cgd: self-test aes-xts-256
[ 5.142260] cgd: self-test aes-xts-512
[ 5.142260] cgd: self-test aes-cbc-128
[ 5.142260] cgd: self-test aes-cbc-256
[ 5.142260] cgd: self-test aes-cbc-128 (encblkno8)

Maybe I need to toss a coin...


Cheers,

Patrick

Re: hang while updating pkg_rolling-replace libvdpau

2020-09-03 Thread Patrick Welche

On Wed, Sep 02, 2020 at 11:50:52PM +0200, Riccardo Mottola wrote:
> I finished updating all my core system to current on i386-64, kernel,
> userland, etc.
> 
> Now I launched pkg_rolling replace, it crunches through several packages,
> but then hangs.
> 
> 
> I tried running it several times, rebooting in between... but nothing. What
> is "hang" ? The CPU stays idle too. Hangs exactly there.

What does e.g., ps auxwwd say when it hangs? Maybe another manifestation
of "cmake hanging"

  http://mail-index.netbsd.org/current-users/2020/05/24/msg038692.html

?


Cheers,

Patrick

ptrace(PT_DUMPCORE) permission?

2020-08-21 Thread Patrick Welche

pbulk was idle stuck building kio at

dbus24339  0.0  0.0 182156   9644 pts/6  Il   12:45PM   0:00.04 
/usr/pkg/bin/cmake -E cmake_autogen 
/tmp/pkgsrc/devel/kio/work/kio-5.70.1/_KDE_build/src/urifilters/ikws/CMakeFiles/kurisearchfilter_autogen.dir/AutogenInfo.json
  

Trying to debug:

# gcore -c cmake.core 24339
gcore: ptrace(PT_DUMPCORE) to 24339 failed: Permission denied
# gcore 24339
gcore: ptrace(PT_ATTACH) to 24339 failed: No such process

Permission denied? and sure enough after that, there was no process.

Thoughts on giving root permission? (I should have just gdb attached...)


Cheers,

Patrick

radixtree panic

2020-08-05 Thread Patrick Welche

An amd64 box updated to -current yesterday panicked overnight with

(gdb) target kvm netbsd.0.core
0x80222535 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:713
713 dumpsys();
(gdb) bt
#0  0x80222535 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:713
#1  0x8062750a in kern_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at ../../../../kern/kern_reboot.c:73
#2  0x80657570 in vpanic (fmt=fmt@entry=0x809397d0 "trap", 
ap=ap@entry=0xa9814eb90ac8) at ../../../../kern/subr_prf.c:290
#3  0x80657634 in panic (fmt=fmt@entry=0x809397d0 "trap")
at ../../../../kern/subr_prf.c:209
#4  0x80224b6b in trap (frame=0xa9814eb90c10)
at ../../../../arch/amd64/amd64/trap.c:326
#5  0x8021da35 in alltraps ()
#6  0x807596d5 in radix_tree_lookup_ptr (tagmask=0, alloc=false, 
path=0xa9814eb90d00, idx=10, t=0x84733b57b1d8)
at 
../../../../../../lib/libkern/../../../common/lib/libc/gen/radixtree.c:557
#7  radix_tree_clear_tag (t=t@entry=0x84733b57b1d8, idx=idx@entry=10, 
tagmask=tagmask@entry=1)
at 
../../../../../../lib/libkern/../../../common/lib/libc/gen/radixtree.c:1113
#8  0x805f2f42 in uvm_pagemarkdirty (pg=pg@entry=0xa98005772880, 
newstatus=newstatus@entry=1) at ../../../../uvm/uvm_page_status.c:109
#9  0x805f44a8 in uvmpd_scan_queue ()
at ../../../../uvm/uvm_pdaemon.c:752
#10 uvmpd_scan () at ../../../../uvm/uvm_pdaemon.c:900
#11 uvm_pageout (arg=) at ../../../../uvm/uvm_pdaemon.c:316
#12 0x802086f7 in lwp_trampoline ()


reminds me of PR 55493 which was on a different box


Cheers,

Patrick

Re: strange assert failure on today's -current

2020-08-03 Thread Patrick Welche

On Mon, Aug 03, 2020 at 02:59:43PM +0100, Patrick Welche wrote:
> On Sun, Aug 02, 2020 at 09:22:22PM +0100, Chavdar Ivanov wrote:
> > I've rebuilt pkgin itself, it doesn't do that on the build host, only
> > on one which normally receives its packages via pkgin.
> 
> Oddly I saw that too, got as far as deciding that
> 
>   https://github.com/NetBSDfr/pkgin/issues/86
> 
> looked similar. Thought it might be good to get a better backtrace and
> recompiled with "-O0 -ggdb". Of course, then there was no more problem,
> and pkgin just worked.

I think the client had a pkgin that had been built on the server. The
recompile bit happened on the client. That might be more relevant than
the flags, but you say you "rebuilt pkgin"?


Cheers,

Patrick

Re: strange assert failure on today's -current

2020-08-03 Thread Patrick Welche

On Sun, Aug 02, 2020 at 09:22:22PM +0100, Chavdar Ivanov wrote:
> I've rebuilt pkgin itself, it doesn't do that on the build host, only
> on one which normally receives its packages via pkgin.

Oddly I saw that too, got as far as deciding that

  https://github.com/NetBSDfr/pkgin/issues/86

looked similar. Thought it might be good to get a better backtrace and
recompiled with "-O0 -ggdb". Of course, then there was no more problem,
and pkgin just worked.

Cheers,

Patrick

Re: Failure durin nbmake build

2020-07-26 Thread Patrick Welche

On Sun, Jul 26, 2020 at 10:33:33AM +0100, Chavdar Ivanov wrote:
> cc  -D_PATH_DEFSYSPATH="/home/sysbuild/src/share/mk"
> -DDEFSHELL_CUSTOM="/bin/sh" -DHAVE_SETENV=1 -DHAVE_STRDUP=1
> -DHAVE_STRERROR=1 -DHAVE_STRFTIME=1 -DHAVE_VSNPRINTF=1  -O -c
> /home/sysbuild/src/usr.bin/make/lst.lib/*.c
> cc: error: /home/sysbuild/src/usr.bin/make/lst.lib/*.c: No such file
> or directory
> cc: fatal error: no input files
> compilation terminated.

I think the lst.lib directory has just been removed. I haven't seen
your build error, but do see a couple of references to lst.lib in Makefiles.

Cheers,

Patrick

Re: wm0 panic

2020-07-23 Thread Patrick Welche

On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> Trying a today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
> boot multiuser without a network. If I log in as root, as soon as I hit
> enter:
> 
> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
> [ 127.5763268] Kernel lock error 127.5763268] lock address : 
> 0x8106ab40 type :   spin

I can't reproduce this after
 
  http://mail-index.netbsd.org/source-changes/2020/07/07/msg119158.html
 
Cheers,
 
Patrick

Re: x86 in-kernel fpu bug

2020-07-21 Thread Patrick Welche

On Mon, Jul 20, 2020 at 04:49:14PM +, Taylor R Campbell wrote:
> > Date: Mon, 20 Jul 2020 11:04:21 +0100
> > From: Patrick Welche 
> > 
> > After a -current/amd64 update, a sudden outbreak of floating point 
> > exceptions:
> > 
> > /usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1:
> >  
> > internal compiler error: Floating point exception   
> > 
> > 
> > Fetching message headers...
> > Floating point exception (core dumped) mutt
> > 
> > Any guesses?
> 
> There's a good chance this has been fixed in sys/arch/x86/x86/fpu.c
> revision 1.72 -- can you update and try again with a new kernel?

Indeed!

Thanks,

Patrick

Re: Samba DC provisioning fails with ACL-enabled NetBSD-current

2020-07-20 Thread Patrick Welche

On Mon, Jul 20, 2020 at 05:47:59PM +0200, Matthias Petermann wrote:
> test10# mount
> /dev/dk0 on / type ffs (acls, log, local)

In /etc/fstab, try

/dev/dk0   /   ffs rw,posix1eacls   1 1

(rather than acls)


Cheers,

Patrick

Re: floating point exceptions

2020-07-20 Thread Patrick Welche

On Mon, Jul 20, 2020 at 11:24:29AM -, Michael van Elst wrote:
> pr...@cam.ac.uk (Patrick Welche) writes:
> 
> >After a -current/amd64 update, a sudden outbreak of floating point 
> >exceptions:
> 
> >/usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1:
> > 
> >internal compiler error: Floating point exception
> >   
> >Fetching message headers...
> >Floating point exception (core dumped) mutt
> 
> >Any guesses?
> 
> 
> FPU usage in the kernel was enabled to support AES-NI for ipsec and cgd.
> A workaround is to comment out all aes_md_init() calls in identcpu.c.

Yes, this laptop is much happier now! I'm surprised as I was using the
patch on an AMD build box before it was even committed with only a cgd
speed-up to report. The unhappy laptop has a

cpu0: "Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz"
cpu0: features1 0x7ffafbff
cpu0: features1 0x7ffafbff


Cheers,

Patrick

floating point exceptions

2020-07-20 Thread Patrick Welche

After a -current/amd64 update, a sudden outbreak of floating point exceptions:

/usr/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/tree-ssa-operands.c:1348:1: 
internal compiler error: Floating point exception   

Fetching message headers...
Floating point exception (core dumped) mutt

Any guesses?


Cheers,

Patrick

Re: Heads up: ubc_direct enabled by default

2020-07-20 Thread Patrick Welche

On Thu, Apr 23, 2020 at 10:10:56PM +, Andrew Doran wrote:
> This affects amd64, alpha and aarch64, but only 1 and 2 CPU systems so far. 
> Any more and it's still off by default.  Only the default has changed so the
> sysctl (vm.ubc_direct) still works for turning it on and off manually.
> 
> This works great for me on amd64 but needs some tweaks to handle many CPUs.
> I have some ideas on that one and hopefully will have something to try soon.


What are the risks of switching it on for > 2 CPUs?


(With a LOCKDEBUG kernel on an 8-core Ryzen 1700 for PR kern/55362, after
days I'm at

# dd if=/dev/zero obs=64k | progress -l 12884901888b dd of=/dev/rdk6 ibs=64k
  8% |** |   544 GiB2.18 MiB/s--:-- ETA

which I think could be helped with ubc_direct. I don't want to increase
the problem domain though.)
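To put numbers on "after days": assuming progress(1) reads the trailing b in `-l 12884901888b` as 512-byte blocks (which fits 544 GiB being shown as 8%), the target is 6 TiB, and at 2.18 MiB/s the remaining 5600 GiB would take about a month. A trivial helper for that arithmetic (name hypothetical):

```c
#include <stdint.h>

/*
 * Seconds left to copy `total` bytes when `done` are written, at a steady
 * `rate` bytes per second. Returns -1.0 if the rate is unknown.
 */
static double
eta_seconds(uint64_t total, uint64_t done, double rate)
{
	if (rate <= 0.0)
		return -1.0;
	if (done >= total)
		return 0.0;
	return (double)(total - done) / rate;
}
```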


Cheers,

Patrick

Re: rump bridge fun

2020-07-14 Thread Patrick Welche

On Tue, Jul 14, 2020 at 11:20:24AM +0100, Patrick Welche wrote:
> On 9.99.68/amd64, in one xterm:
> 
> $ rump_allserver -sv unix:///tmp/sock
> 
> in another xterm:
> 
> $ export RUMP_SERVER=unix:///tmp/sock
> $ rump.ifconfig -a
> lo0: flags=0x8049 mtu 33624
> inet 127.0.0.1/8 flags 0
> inet6 ::1/128 flags 0x20
> inet6 fe80::1%lo0/64 flags 0 scopeid 0x1
> $ rump.ifconfig bridge0 create
> 
> and watch rump_allserver's cpu use hit 100% (!)

Filed as PR kern/55489

Cheers,

Patrick

rump bridge fun

2020-07-14 Thread Patrick Welche

On 9.99.68/amd64, in one xterm:

$ rump_allserver -sv unix:///tmp/sock

in another xterm:

$ export RUMP_SERVER=unix:///tmp/sock
$ rump.ifconfig -a
lo0: flags=0x8049 mtu 33624
inet 127.0.0.1/8 flags 0
inet6 ::1/128 flags 0x20
inet6 fe80::1%lo0/64 flags 0 scopeid 0x1
$ rump.ifconfig bridge0 create

and watch rump_allserver's cpu use hit 100% (!)


Cheers,

Patrick

Re: wm0 panic

2020-06-29 Thread Patrick Welche

On Mon, Jun 29, 2020 at 12:53:23PM +0900, Kengo NAKAHARA wrote:
> It seems some other code have held KERNEL_LOCK too long time.
> Could you show the function of last locked address?
> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5

With Jun 28 14:26 code

# addr2line -e netbsd.3.gdb -f 0x80a4c526
doifioctl
/usr/src/sys/arch/amd64/compile/QUANTZDBG/../../../../net/if.c:3403 (discriminator 3)
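The lookup Kengo suggests (turning a "last locked" address into a function name with addr2line) can be batched for every lock in a ddb "show all locks /t" dump. A minimal sketch follows; the helper names are hypothetical, and the regex assumes the "last locked* : 0x..." field layout seen in the dump below. It only builds the command line and does not run addr2line.

```python
import re

# Hypothetical helper. ASSUMPTION: each lock record contains a field like
# "last locked* : 0x80a4c526" (the "*" is optional), as in the ddb dump.
LAST_LOCKED = re.compile(r"last locked\*?\s*:\s*(0x[0-9a-fA-F]+)")


def last_locked_addresses(dump: str) -> list[str]:
    """Return the unique 'last locked' addresses in order of first appearance."""
    seen: dict[str, None] = {}
    for match in LAST_LOCKED.finditer(dump):
        seen.setdefault(match.group(1), None)
    return list(seen)


def addr2line_argv(kernel: str, addrs: list[str]) -> list[str]:
    """Build an addr2line command: -e names the image, -f prints function names."""
    return ["addr2line", "-e", kernel, "-f", *addrs]


sample = """\
last locked* : 0x80a4c526 unlocked : 0x80a4c517
last locked* : 0x80a4bf94 unlocked : 0x80a4c02b
last locked* : 0x806c3e65 unlocked : 0x806d5ebd
last locked* : 0x80a4c526 unlocked : 0x80a4c517
"""
print(" ".join(addr2line_argv("netbsd.3.gdb", last_locked_addresses(sample))))
```

Deduplicating first keeps the output short when many LWPs are all waiting on the same kernel lock, as they are here.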

> If the panic can reappear, could you show "show all locks/t" of ddb?

It is nicely reproducible (boot single user, type "ifconfig wm0 up").
I have a core dump and a serial console, but debugging locking issues
is "interesting"!


Thanks,

Patrick
type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

db{1}> show all locks /t
[Locks tracked through LWPs]

** LWP 330.330 (ifconfig) @ 0xf1c6387c8a40, l_stat=7

*** Locks held:

* Lock 0 (ick address : 0xf1c637a4e380 type : sleep/adaptive
initialized  : 0x80a475fd
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
relevant cpu :  0 last held:  0
relevant lwp : 0xf1c6388a40
last locked* : 0x80a4bf94 unlocked : 0x80a4c02b
owner field  : 0xf1c6387c8a40 wait/spin:0/0
Turnstile: no active turnstile for this lock.

*** Loczed at module_hook_init)
lock address : 0x8106a800 type : sleep/adaptive
initialized  : 0x80952c6e
shared holds :  0 exclusive:  0
shares wanted:  0 exclusive:  0
relevant cpu :  0 last held:  0
relevant lwp : 0xf1c6387c8a40 last held: 00
last locked  : 00 unlocked*: 00
owner field  : 00 wait/spin:0/0
Turnstile: no active turnstile for this lock.
   
*** Traceback:
   
trace: pid 330 lid 330 at 0x8
address 0x283 is invalid
?() at 283 
address 0x10 is invalid
address 0x8 is invalid
db_printf() at netbsd:db_printf
   
   
** LWP 0.402 (iic1) @ 0xf1c637f1aa40, l_stat=7
   
*** Locks held: none

*** Locks wanted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  2 last hellwp : 0xf1c637f1aa40 last held: 
0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

*** 02 at 0xb0025da16ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52


** LWP 0.401 (iic0) @ 0xf1c637f1a600, l_stat=7
   
*** Locks held: none

*** Locks wanted:

* Lock 0 (initiax8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c637f1a600 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

*** Traceback:

trace: pid 0 lid 401 at 0xb0025da11ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52


** LWP 0.23 (softclk/1) @ 0xf1c63767f200, l_stat=7
   
*** Locks held:

* Lock 0 (initialized at soinit)
lock address : 0xf1cd177e3080 type : sleep/adaptive
initialed holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
relevant cpu :  1 last held:  1
r last held: 0xf1c63767f200
last locked* : 0x806c3e65 unlocked : 0x806d5ebd
owner field  : 0xf1c63767f200 wait/spin:0/0
Turnstile: no active turnstileted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked :

Re: wm0 panic

2020-06-27 Thread Patrick Welche

On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> (must try with biosboot instead of EFI which is the case here)
It makes no difference.

wm0 panic

2020-06-27 Thread Patrick Welche

Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
boot multiuser without a network. If I log in as root, as soon as I hit
enter:

# ifconfig wm0 inet 10.0.0.62 netmask 0xff00
[ 127.5763268] Kernel lock error 127.5763268] lock address : 0x8106ab40 
type :   spin
[ 127.5863237] initialized  : 0x80b0bbb9
[ 127.5863237] shared holds :  0 exclusive:  1
[ 127.5963238] shares wanted:  0 exclusive:  1
[ 127.6063236] relevant cpu :  1 last held:  0
[ 127.6163235] relevant lwp : 0x8d419a07f20
[ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
[ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
[ 127.6363234] panic: LOCKDEBock,244: spinout
[ 127.6363234] cpu1: Begin traceback...
[ 127.6463233] vpanic() at netbsd:vpanic+0x152
[ 127.6463233] snprintf() at netbsd:snprintf
[ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
[ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
[ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
[ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
[ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
[ 127.6863230] softint_disph+0x108
[ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
[ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 127.7063229] --- interrupt ---
[ 127.706322traceback...


(The box is happily usable without LOCKDEBUG; it just means I can't debug
what I'm trying to get at...)
(I must try with biosboot instead of EFI, which is what this box uses.)

wm0 at pci7 dev 0 function 0: I211 Ethernet (COPPER) (rev. 0x03)
wm0: for TX and RX interrupting at msix3 vec 0 affinity to 1
wm0: for TX and RX interrupting at msix3 vec 1 affinity to 2
wm0: for LINK interrupting at msix3 vec 2
wm0: PCI-Express bus
wm0: 64 words iNVM, version 0.6
wm0: Ethernet address 60:45:cb:9e:13:dd
wm0: COMPAT = 
wm0: Copper
wm0: 0xc614420
makphy0 at wm0 phy 1: I210 10/100/1000 media interface, rev. 0

# strings /netbsd | grep if_wm.c
$NetBSD: if_wm.c,v 1.679 2020/06/27 13:32:00 jmcneill Exp $



Cheers,

Patrick
