Re: Automated report: NetBSD-current/i386 test failure

2023-10-13 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> This is an automatically generated notice of a new failure of the
> NetBSD test suite.
> 
> The newly failing test case is:
> 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v4v6_esp_cast128cbc
> 
> The above test failed in each of the last 4 test runs, and passed in
> at least 26 consecutive runs before that.
> 
> The following commits were made between the last successful test and
> the first failed test:
> 
> 2023.10.12.17.18.38 riastradh src/crypto/external/bsd/heimdal/Makefile.inc 1.11
(etc)

Sorry about the multitude of automated reports, some of which (such as
the one quoted above) pointed at the wrong commit.

What happened is that a commit by ad@ caused a large number of tests
to start failing randomly, and because the failures are random and
numerous and today is Friday the 13th, some of the tests happened to
fail for the first time only after some later commit, and then
also happened to fail several times in a row.

This resulted in a pattern of a long stretch of successes followed by
several failures in a row.  This looks very much like a
deterministic failure introduced by the later commit, and the testbed
mistook it for one.
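
The kind of rule that can be fooled this way can be sketched as
follows.  This is only an illustration, not the testbed's actual code:
the function name and the list-of-booleans history are made up, and the
thresholds 26 and 4 are simply the numbers quoted in the report above.

```python
def first_failing_run(history, min_passes=26, min_failures=4):
    """Look for "long run of passes, then a run of failures".

    history is a list of booleans, oldest run first, True meaning the
    test passed.  Returns the index of the first run in the trailing
    failure streak if the pattern matches, otherwise None.
    """
    n = len(history)

    # Count the failures at the very end of the history.
    i = n
    while i > 0 and not history[i - 1]:
        i -= 1
    if n - i < min_failures:
        return None

    # Count the passes immediately before that failure streak.
    j = i
    while j > 0 and history[j - 1]:
        j -= 1
    if i - j < min_passes:
        return None

    return i


# One stray failure, 26 passes, then 4 failures in a row still matches.
flaky = [True] * 5 + [False] + [True] * 26 + [False] * 4
suspect = first_failing_run(flaky)
```

A randomly failing test can produce a history like `flaky` without any
deterministic breakage, and a rule of this shape will then blame
whatever was committed just before run `suspect`.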

Even though some of the test failures were reported as an excessive
number of separate emails and attributed to the wrong commit, the
failures themselves are real.
-- 
Andreas Gustafsson, g...@gson.org


Re: odd setlist failure

2022-02-25 Thread Andreas Gustafsson
Greg Troxel wrote:
> current fails to build for me, complaining about ati_drv.so.19 in
> destdir but not in setlist.   I see that .6 is in the setlists now.   In
> my destdir I have:
> 
> -r--r--r--  1 gdt  wheel  7420 Jan 26 10:48 /usr/obj/gdt-current/destdir/i386/usr/X11R7/lib/modules/drivers/ati_drv.so.6
> -r--r--r--  1 gdt  wheel  7420 Feb 25 08:06 /usr/obj/gdt-current/destdir/i386/usr/X11R7/lib/modules/drivers/ati_drv.so.19
> lrwxr-xr-x  1 gdt  wheel    13 Feb 25 08:06 /usr/obj/gdt-current/destdir/i386/usr/X11R7/lib/modules/drivers/ati_drv.so -> ati_drv.so.19
> 
> Build host is netbsd-9 amd64.
> 
> Is anyone else seeing this?

Yes:

  https://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2022.02.html#2022.02.24.08.06.41

I'm guessing it started with this commit:

  2022.02.23.17.28.31 mrg src/external/mit/xorg/server/drivers/xf86-video-ati/Makefile 1.7

-- 
Andreas Gustafsson, g...@gson.org


Re: black screen, boot doesn't finish

2022-02-18 Thread Andreas Gustafsson
Thomas Klausner wrote:
> This commit
> 
> $NetBSD: drmfb.c,v 1.13 2022/02/16 23:30:10 riastradh Exp $
> 
> makes my graphical console disappear.

It also makes my i386 laptop testbed hang during boot:

  http://www.gson.org/netbsd/bugs/build/i386-laptop/commits-2022.02.html#2022.02.16.23.30.10

-- 
Andreas Gustafsson, g...@gson.org


Re: Heads up: objdir is now rm -rf resistant

2021-12-15 Thread Andreas Gustafsson
m...@netbsd.org wrote:
> I hope fixing this is enough to fix all the cryptic issues.

The build is now fixed, but I still need to give the testbeds the
ability to automatically remove objdirs containing non-writable
directories, because otherwise they will get stuck whenever they
decide to build a historic version from the affected time range.

This is also going to be an ongoing pitfall for anyone building
historic versions, for example when bisecting.
-- 
Andreas Gustafsson, g...@gson.org


Heads up: objdir is now rm -rf resistant

2021-12-14 Thread Andreas Gustafsson
All,

The TNF testbed is currently failing to start new builds because it is
unable to remove the objdirs from previous builds using the Python
equivalent of "rm -rf".

Specifically, after the i386 build fails the way it currently does,
the objdir contains two directories with mode 0111, which rm -rf is
unable to remove:

  obj/distrib/i386/cdroms/bootcd/cdrom/var/spool/ftp/hidden
  obj/distrib/i386/cdroms/bootcd-com/cdrom/var/spool/ftp/hidden

The work-around is to manually chmod the directories to 0755 before
removing the objdir, but until I get around to automating that on the
testbed, you can expect a reduced level of automated testing service.
Also, you may want to be on the lookout for this failure mode in your
own builds (or the cleanup after them).
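
For anyone who wants to script the work-around, a minimal sketch in
Python follows.  It is not the testbed's code; the function name is
made up, and it assumes that forcing every directory to mode 0755 is
acceptable for a throwaway objdir.

```python
import os
import shutil


def force_rmtree(top):
    """Remove the tree at top, first making its directories accessible.

    A directory with mode 0111 cannot be listed, so a plain recursive
    remove fails on it.  Here each directory is chmod'ed to 0755 just
    before os.walk() descends into it.
    """
    os.chmod(top, 0o755)
    for dirpath, dirnames, _filenames in os.walk(top):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Do not chmod through symlinks; os.walk() will not
            # descend into them either.
            if not os.path.islink(path):
                os.chmod(path, 0o755)
    shutil.rmtree(top)
```

Calling `force_rmtree("obj")` in place of a plain recursive remove
should then get past directories like the two listed above.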
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-12-14 Thread Andreas Gustafsson
The build is still failing, now with:

  --- release-bootcd-com ---
  Copying set gpufw
  pax: line 206: ./libdata/firmware/nouveau/nvidia/gp107/gr/fecs_bl.bin: type mismatch: specfile link, tree file

This is from:

  http://releng.netbsd.org/b5reports/i386/2021/2021.12.14.16.55.45/build.log.tail

-- 
Andreas Gustafsson, g...@gson.org


i386 install failing

2021-11-25 Thread Andreas Gustafsson
Hi ryo,

NetBSD-current/i386 panics when booting the install media since your
recent COMPAT_LINUX32 commits, with

  panic: kernel diagnostic assertion "*e->e_sigobject == NULL" failed: file "/tmp/build/2021.11.25.03.08.05-i386/src/sys/kern/kern_exec.c", line 2032

Logs:

  http://releng.NetBSD.org/b5reports/i386/commits-2021.11.html#2021.11.25.03.08.05

The sparc port is also broken; it installs but panics when booting the installed
system.
-- 
Andreas Gustafsson, g...@gson.org


Panic running tests

2021-10-20 Thread Andreas Gustafsson
2fs/ext2fs_vnops.c,v 1.136
2021.10.20.03.08.19 thorpej src/sys/ufs/lfs/lfs_rename.c,v 1.25
2021.10.20.03.08.19 thorpej src/sys/ufs/lfs/lfs_vnops.c,v 1.340
2021.10.20.03.08.19 thorpej src/sys/ufs/lfs/ulfs_readwrite.c,v 1.28
2021.10.20.03.08.19 thorpej src/sys/ufs/lfs/ulfs_vnops.c,v 1.55
2021.10.20.03.08.19 thorpej src/sys/ufs/ufs/ufs_acl.c,v 1.3
2021.10.20.03.08.19 thorpej src/sys/ufs/ufs/ufs_extern.h,v 1.88
2021.10.20.03.08.19 thorpej src/sys/ufs/ufs/ufs_readwrite.c,v 1.127
2021.10.20.03.08.19 thorpej src/sys/ufs/ufs/ufs_rename.c,v 1.14
2021.10.20.03.08.19 thorpej src/sys/ufs/ufs/ufs_vnops.c,v 1.260
2021.10.20.03.08.19 thorpej src/tests/kernel/kqueue/t_vnode.c,v 1.2
2021.10.20.03.09.45 thorpej src/sys/sys/param.h,v 1.705
2021.10.20.03.13.14 thorpej src/sys/kern/vnode_if.c,v 1.115
2021.10.20.03.13.14 thorpej src/sys/rump/include/rump/rumpvnode_if.h,v 1.37
2021.10.20.03.13.14 thorpej src/sys/rump/librump/rumpvfs/rumpvnode_if.c,v 1.37
2021.10.20.03.13.14 thorpej src/sys/sys/vnode_if.h,v 1.108

Logs can be found at:

  http://releng.NetBSD.org/b5reports/i386/commits-2021.10.html#2021.10.20.03.26.20

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-09-26 Thread Andreas Gustafsson
Hi maya,

The build is still failing with:
> ./libdata/firmware/nouveau/nvidia/LICENCE.nvidia
> ./libdata/firmware/nouveau/nvidia/gm206/fecs_data.bin
> ./libdata/firmware/nouveau/nvidia/gm206/fecs_inst.bin
> ./libdata/firmware/nouveau/nvidia/gm206/gpccs_data.bin
> ./libdata/firmware/nouveau/nvidia/gm206/gpccs_inst.bin
> =  end of 5 extra files  ===
> *** Failed target: checkflist
> *** Failed commands:
>   ${SETSCMD} ${.CURDIR}/checkflist  ${MAKEFLIST_FLAGS} ${CHECKFLIST_FLAGS} ${METALOG.unpriv}
> *** [checkflist] Error code 1
> nbmake[2]: stopped in /tmp/build/2021.09.25.21.26.04-i386/src/distrib/sets
> 1 error
> nbmake[2]: stopped in /tmp/build/2021.09.25.21.26.04-i386/src/distrib/sets
> nbmake[1]: stopped in /tmp/build/2021.09.25.21.26.04-i386/src
> nbmake: stopped in /tmp/build/2021.09.25.21.26.04-i386/src
> ERROR: Failed to make release

since this commit:
> 2021.09.25.21.26.03 maya src/distrib/common/bootimage/Makefile.installimage,v 1.10
> 2021.09.25.21.26.03 maya src/distrib/sets/sets.subr,v 1.197
> 2021.09.25.21.26.04 maya src/distrib/sets/lists/gpufw/mi,v 1.3
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-08-18 Thread Andreas Gustafsson
Yesterday, the NetBSD Test Fixture wrote:
> nbmake[3]: stopped in /tmp/build/2021.08.17.17.31.59-i386/src
> nbmake[2]: stopped in /tmp/build/2021.08.17.17.31.59-i386/src
> nbmake[1]: stopped in /tmp/build/2021.08.17.17.31.59-i386/src
> nbmake: stopped in /tmp/build/2021.08.17.17.31.59-i386/src
> ERROR: Failed to make release

kre@ fixed this particular error, but the build is still failing on
i386 and other 32-bit platforms, now with different errors such as
these:

  --- dependall-sodium ---
  In file included from /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/ed25519_ref10_fe_51.h:3,
                   from /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/ed25519_ref10.h:23,
                   from /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/crypto_scalarmult/curve25519/ref10/x25519_ref10.c:7:
  /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/common.h:14:1: error: unable to emulate 'TI'
     14 | typedef unsigned uint128_t __attribute__((mode(TI)));
        | ^~~
  In file included from /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/ed25519_ref10.h:23,
                   from /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/crypto_scalarmult/curve25519/ref10/x25519_ref10.c:7:
  /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/ed25519_ref10_fe_51.h: In function 'fe25519_mul':
  /tmp/build/2021.08.17.22.29.11-i386/src/sys/external/isc/libsodium/dist/src/libsodium/include/sodium/private/ed25519_ref10_fe_51.h:300:17: error: right shift count >= width of type [-Werror=shift-count-overflow]
    300 | carry  = r0 >> 51;
        | ^~

This is from:

  http://releng.netbsd.org/b5reports/i386/2021/2021.08.17.22.29.11/build.log.tail

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-07-24 Thread Andreas Gustafsson
On Monday, David Holland wrote:
> Right... I wonder what happened to bracket's error-matching script; it
> usually does better than that.

I have now deployed bracket 2.15 on babylon5.netbsd.org, and the latest
build failure report looks much better:

  http://mail-index.netbsd.org/current-users/2021/07/24/msg041311.html

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-07-18 Thread Andreas Gustafsson
David Holland wrote:
> On Mon, Jul 19, 2021 at 10:32:20AM +0900, Rin Okuyama wrote:
>  > Logs below are usually more helpful.
> 
> Right... I wonder what happened to bracket's error-matching script; it
> usually does better than that.

There are multiple causes, but a major one is that since babylon5 was
upgraded to a new server with more cores, the builds have more
parallelism, which causes make(1) to print more output from the other
parallel jobs after the actual error message, and bracket isn't
looking far enough back in the log.  I have a fix in testing on my own
testbed but still need to deploy it on babylon5.
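
To illustrate the effect, here is a toy sketch in Python.  It is not
bracket's actual matching code; the function name, the regular
expression, and the idea of a fixed look-back window are assumptions
made only for the example.

```python
import re

# Hypothetical pattern for a compiler-style error line.
ERROR_RE = re.compile(r"\berror:")


def find_error(log_lines, window):
    """Search only the last `window` lines of the log for an error.

    Returns the index of the matching line closest to the end of the
    log, or None if nothing in the window matches.
    """
    start = max(0, len(log_lines) - window)
    for i in range(len(log_lines) - 1, start - 1, -1):
        if ERROR_RE.search(log_lines[i]):
            return i
    return None


# A made-up log: the real error is followed by output from 50 other
# parallel jobs before make finally gives up.
log = (["cc -c a.c", "a.c:1:1: error: boom"]
       + ["cc -c other%d.c" % k for k in range(50)]
       + ["ERROR: Failed to make release"])

found_small = find_error(log, 10)    # window too small for this log
found_large = find_error(log, 100)   # window reaches the error line
```

With more cores, more unrelated lines land between the error and the
end of the log, so a window that was large enough on the old server
can come up empty on the new one.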
-- 
Andreas Gustafsson, g...@gson.org


Re: 9.99.86 HEAD

2021-07-01 Thread Andreas Gustafsson
Martin Husemann wrote:
> Hmm, that is the last commit I needed to get everything working again
> here - any idea what exactly hangs and where?

The same way it has been hanging ever since dholland's commit of
2021.06.29.22.37.11 (except for a period when the tests didn't even
start):

  kernel/t_umountstress (98/899): 2 test cases
  fileop:

The latest test run on b5 has hung but not timed out yet; when it
does, the log will appear here:

  http://releng.netbsd.org/b5reports/i386/commits-2021.07.html#2021.07.01.04.25.51

-- 
Andreas Gustafsson, g...@gson.org


Re: 9.99.86 HEAD

2021-07-01 Thread Andreas Gustafsson
Martin Husemann wrote:
> All regressions I am aware of have been fixed now.

At least i386 still hangs while running the ATF tests as of source
date 2021.07.01.04.25.51.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 install success

2021-06-13 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> The NetBSD-current/i386 install is working again.

So it is, but the system now panics during the ATF tests:

  dev/fss/t_fss (38/899): 1 test cases
  basic: [ 148.2076287] panic: kernel diagnostic assertion "KERNEL_LOCKED_P()" failed: file "/tmp/build/2021.06.13.03.09.20-i386/src/sys/kern/subr_autoconf.c", line 1972
  [ 148.2076287] cpu0: Begin traceback...
  [ 148.2076287] vpanic(c11949f0,c9c33d7c,c9c33db0,c0cdc391,c11949f0,c1194937,c11cf716,c129d924,7b4,c2786c00) at netbsd:vpanic+0x13c
  [ 148.2076287] kern_assert(c11949f0,c1194937,c11cf716,c129d924,7b4,c2786c00,0,c14e1de0,c2689a00,c2689a00) at netbsd:kern_assert+0x23
  [ 148.2076287] config_detach(c2689a00,2,c1d1d000,0,0,c0d7335e,0,a300,80,0) at netbsd:config_detach+0x430
  [ 148.2076287] fss_close(a300,0,1,6000,c2a89180,0,a300,0,1,c21740c0) at netbsd:fss_close+0x131
  [ 148.2200843] spec_close(c9c33e3c,3,0,c117cc10,c2786c00,1,c1a304c0,c2786c00,c9c33e74,c0d69ab7) at netbsd:spec_close+0x209
  [ 148.2200843] VOP_CLOSE(c2786c00,1,c1a304c0,0,c9c33f38,c22ad980,c22ad99c,c9c33e88,c0d69b48,c2786c00) at netbsd:VOP_CLOSE+0x3d
  [ 148.2200843] vn_close(c2786c00,1,c1a304c0,c9c33ec4,c0c8971c,c22ad980,0,0,c9c33eac,c26a4680) at netbsd:vn_close+0x39
  [ 148.2200843] vn_closefile(c22ad980,0,0,c9c33eac,c26a4680,2c,402c7413,c2b05bc0,0,c2b05c4c) at netbsd:vn_closefile+0x22
  [ 148.2200843] closef(c22ad980,c2a89180,c9c33f9c,c012273c,c270a4d4,b3ff,2,c26a4680,c22ad980,c2b05bf0) at netbsd:closef+0x4f
  [ 148.2200843] fd_close(3,0,c2a89180,c2a89180,c2a89180,c9c33f9c,c04a1f8b,c2a89180,c9c33f68,c9c33f60) at netbsd:fd_close+0x17d
  [ 148.2200843] sys_close(c2a89180,c9c33f68,c9c33f60,c19f7bc8,0,6,c9c33f60,c9c33f68,0,0) at netbsd:sys_close+0x20
  [ 148.2200843] syscall() at netbsd:syscall+0x17c
  [ 148.2395535] --- syscall (number 6) ---
  [ 148.2395535] b405b597:
  [ 148.2395535] cpu0: End traceback...

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2021.06.html#2021.06.13.00.11.17

amd64 is also affected.
-- 
Andreas Gustafsson, g...@netbsd.org


Re: Automated report: NetBSD-current/i386 build failure

2021-05-27 Thread Andreas Gustafsson
The i386 build is still failing, but now with a different error:

  --- in6_pcb.o ---
  /tmp/build/2021.05.27.08.58.29-i386/src/sys/netinet6/in6_pcb.c: In function 'in6_pcblookup_port':
  cc1: error: function may return address of local variable [-Werror=return-local-addr]
  /tmp/build/2021.05.27.08.58.29-i386/src/sys/netinet6/in6_pcb.c:1056:26: note: declared here
   1056 |   struct vestigial_inpcb better;
        |  ^~

It's not clear to me which of the commits made since christos first
broke the build could have triggered this, nor why this is not
affecting all ports.

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2021.05.html#2021.05.27.08.41.35

-- 
Andreas Gustafsson, g...@gson.org


Re: Problem reports for version control systems

2021-05-02 Thread Andreas Gustafsson
Brett Lymn wrote:
> Just for your info... there are a few NetBSD developers in .au, myself
> included.  I haven't had any issues with cvs disconnects.  Not to deny
> you have an issue, just letting you know it works ok for people near
> you.

For what it's worth, my connections to anoncvs from Finland frequently
break in the middle of a transfer, though I'm using rsync rather than
cvs.  I have had an admins@ ticket open about this for a couple of
years (#160795).
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2021-04-18 Thread Andreas Gustafsson
The i386 build is still failing, with errors like these:

  --- dependall-tmux ---
  /tmp/build/2021.04.18.12.05.29-i386/src/external/bsd/tmux/dist/cmd-display-menu.c: In function 'cmd_display_menu_get_position':
  /tmp/build/2021.04.18.12.05.29-i386/src/external/bsd/tmux/dist/cmd-display-menu.c:158:8: error: comparison of integer expressions of different signedness: 'long int' and 'u_int' {aka 'unsigned int'} [-Werror=sign-compare]
    158 |  if (n >= tty->sy)
        |^~
  /tmp/build/2021.04.18.12.05.29-i386/src/external/bsd/tmux/dist/cmd-display-menu.c:191:8: error: comparison of integer expressions of different signedness: 'long int' and 'u_int' {aka 'unsigned int'} [-Werror=sign-compare]
    191 |  if (n >= tty->sy)
        |^~
  /tmp/build/2021.04.18.12.05.29-i386/src/external/bsd/tmux/dist/cmd-display-menu.c:239:8: error: comparison of integer expressions of different signedness: 'long int' and 'u_int' {aka 'unsigned int'} [-Werror=sign-compare]
    239 |  if (n < h)
        |    ^

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2021-01-17 Thread Andreas Gustafsson
The cause of the 1000+ new test failures has now been narrowed down to
the following commit:

 2021.01.16.23.50.49 chs src/sys/rump/librump/rumpkern/rump.c,v 1.352
 2021.01.16.23.51.50 chs src/sys/arch/arm/arm/psci.c,v 1.5
 2021.01.16.23.51.50 chs src/sys/conf/files,v 1.1278
 2021.01.16.23.51.51 chs src/sys/lib/libkern/arch/hppa/bcopy.S,v 1.16
 2021.01.16.23.51.51 chs src/sys/lib/libkern/libkern.h,v 1.141
 2021.01.16.23.51.51 chs src/sys/sys/cdefs.h,v 1.156
 2021.01.16.23.51.51 chs src/sys/sys/queue.h,v 1.76

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2021.01.html#2021.01.16.23.51.51

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-12-09 Thread Andreas Gustafsson
Yesterday, the NetBSD Test Fixture wrote:
> The newly failing test case is:
> 
> rump/rumpkern/t_vm:busypage

This one is still failing.  The rump kernel panics with:

  [   1.1400050] panic: kernel diagnostic assertion "(pg->flags & PG_FAKE) == 0" failed: file "/tmp/build/2020.12.07.10.02.51-i386/src/lib/librump/../../sys/rump/librump/rumpkern/vm.c", line 710

A full log and backtrace are at:

  http://releng.netbsd.org/b5reports/i386/2020/2020.12.07.10.02.51/test.html#rump_rumpkern_t_vm_busypage

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-12-07 Thread Andreas Gustafsson
Roland Illig wrote:
> >===  2 extra files in DESTDIR  =
> Fixed in distrib/sets/lists/tests/mi 1.984.

Confirmed fixed, thanks.  The i386 build is still failing, though -
it's now back to failing in mpu_acpi.c:

  --- kern-GENERIC ---
  /tmp/build/2020.12.07.08.31.07-i386/src/sys/dev/acpi/mpu_acpi.c:119:38: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    119 |  sc->arg = acpi_intr_establish(self, (uint64_t)aa->aa_node->ad_handle,
        |  ^

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2020.12.html#2020.12.07.08.31.07

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-12-06 Thread Andreas Gustafsson
The build is now failing differently:

  ===  2 extra files in DESTDIR  =
  Files in DESTDIR but missing from flist.
  File is obsolete or flist is out of date ?
  --
  ./usr/tests/usr.bin/make/unit-tests/opt-keep-going-multiple.exp
  ./usr/tests/usr.bin/make/unit-tests/opt-keep-going-multiple.mk
  =  end of 2 extra files  ===

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2020.12.html#2020.12.07.01.32.04

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-12-06 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> nbmake[2]: stopped in /tmp/build/2020.12.06.12.54.32-i386/obj/sys/arch/i386/compile/LEGACY

More specifically:

  --- mpu_acpi.o ---
  /tmp/build/2020.12.06.12.23.13-i386/src/sys/dev/acpi/mpu_acpi.c: In function 'mpu_acpi_attach':
  /tmp/build/2020.12.06.12.23.13-i386/src/sys/dev/acpi/mpu_acpi.c:119:38: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    119 |  sc->arg = acpi_intr_establish(self, (uint64_t)aa->aa_node->ad_handle,
        |  ^

> The following commits were made between the last successful build and
> the failed build:

Now bisected to this commit:

> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/amdccp_acpi.c,v 1.3
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/atppc_acpi.c,v 1.18
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/fdc_acpi.c,v 1.44
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/lpt_acpi.c,v 1.21
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/mpu_acpi.c,v 1.14
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/pckbc_acpi.c,v 1.38
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/spic_acpi.c,v 1.7
> 2020.12.06.12.23.13 jmcneill src/sys/dev/acpi/wb_acpi.c,v 1.6
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build success

2020-12-06 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> The NetBSD-current/i386 build is working again.

It is, but the amd64 build is failing with:

===  1 extra files in DESTDIR  =
Files in DESTDIR but missing from flist.
File is obsolete or flist is out of date ?
--
./usr/bin/gdbserver
=  end of 1 extra files  ===

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-11-13 Thread Andreas Gustafsson
The build is now failing with:

  ===  2 extra files in DESTDIR  =
  Files in DESTDIR but missing from flist.
  File is obsolete or flist is out of date ?
  --
  ./usr/tests/usr.bin/make/unit-tests/objdir-writable.exp
  ./usr/tests/usr.bin/make/unit-tests/objdir-writable.mk
  =  end of 2 extra files  ===

Logs at:

  http://releng.netbsd.org/b5reports/i386/commits-2020.11.html#2020.11.13.09.56.53

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-11-13 Thread Andreas Gustafsson
The build is still failing, but now with a different error:

  nbmake[5]: "/tmp/build/2020.11.13.08.33.07-i386/src/external/ofl/Makefile" line 3: Malformed conditional (${MKX11} != "no")

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-11-09 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> nbmake[7]: nbmake[7]: don't know how to make -ltermlib. Stop

The build is still failing.  The problems started with this commit:

 2020.11.08.21.56.47 nia src/external/bsd/kyua-cli/Makefile.inc,v 1.8
 2020.11.08.21.56.47 nia src/external/ibm-public/postfix/Makefile.inc,v 1.25
 2020.11.08.21.56.48 nia src/external/public-domain/sqlite/Makefile.inc,v 1.9
 2020.11.08.21.56.48 nia src/external/public-domain/sqlite/bin/Makefile,v 1.7
 2020.11.08.21.56.48 nia src/external/public-domain/sqlite/lib/Makefile,v 1.12
 2020.11.08.21.56.48 nia src/external/public-domain/sqlite/lib/sqlite3.pc.in,v 1.3
 2020.11.08.21.56.48 nia src/usr.sbin/makemandb/Makefile,v 1.10

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-11-02 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> This is an automatically generated notice of a new failure of the
> NetBSD test suite.
> 
> The newly failing test case is:
> 
> lib/libm/t_fmod:fmod
> 
[...]
> 2020.08.23.06.12.52 rillig src/usr.bin/make/buf.c,v 1.36
[...]

False alarm - it looks like the testbed has somehow managed to dig up
an old test failure from August that has already been fixed.  Sorry
about that; I will make some changes to keep it from happening again.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-10-29 Thread Andreas Gustafsson
nia wrote:
> It should be fixed already.

It's not; it's now failing earlier in the build:

  http://releng.netbsd.org/b5reports/i386/commits-2020.10.html#2020.10.29.16.35.33

-- 
Andreas Gustafsson, g...@gson.org


Re: All ATF curses tests failing on babylon5 i386

2020-10-28 Thread Andreas Gustafsson
> >   | Should be fixed.
> > Is fixed, thanks.
> Thanks for the fixes.

The wgetch test case is still failing:

  http://releng.netbsd.org/b5reports/i386/2020/2020.10.28.03.21.25/test.html#lib_libcurses_t_curses_wgetch

-- 
Andreas Gustafsson, g...@gson.org


Automated report: NetBSD-current/i386 test failure

2020-10-24 Thread Andreas Gustafsson
[Manually forwarded as the automated notifications are temporarily
disabled while new hardware is being tested]

This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

lib/libcurses/t_curses:addch
lib/libcurses/t_curses:addchnstr
lib/libcurses/t_curses:addchstr
lib/libcurses/t_curses:addnstr
lib/libcurses/t_curses:addstr
lib/libcurses/t_curses:assume_default_colors
lib/libcurses/t_curses:attributes
lib/libcurses/t_curses:background
lib/libcurses/t_curses:beep
lib/libcurses/t_curses:bkgdset
lib/libcurses/t_curses:box
lib/libcurses/t_curses:can_change_color
lib/libcurses/t_curses:cbreak
lib/libcurses/t_curses:chgat
lib/libcurses/t_curses:clear
lib/libcurses/t_curses:copywin
lib/libcurses/t_curses:curs_set
lib/libcurses/t_curses:define_key
lib/libcurses/t_curses:derwin
lib/libcurses/t_curses:doupdate
lib/libcurses/t_curses:dupwin
lib/libcurses/t_curses:erasechar
lib/libcurses/t_curses:flash
lib/libcurses/t_curses:getattrs
lib/libcurses/t_curses:getbkgd
lib/libcurses/t_curses:getch
lib/libcurses/t_curses:getcurx
lib/libcurses/t_curses:getmaxx
lib/libcurses/t_curses:getmaxy
lib/libcurses/t_curses:getnstr
lib/libcurses/t_curses:getparx
lib/libcurses/t_curses:getstr
lib/libcurses/t_curses:has_colors
lib/libcurses/t_curses:has_ic
lib/libcurses/t_curses:hline
lib/libcurses/t_curses:inch
lib/libcurses/t_curses:inchnstr
lib/libcurses/t_curses:init_color
lib/libcurses/t_curses:innstr
lib/libcurses/t_curses:is_linetouched
lib/libcurses/t_curses:is_wintouched
lib/libcurses/t_curses:keyname
lib/libcurses/t_curses:keyok
lib/libcurses/t_curses:killchar
lib/libcurses/t_curses:meta
lib/libcurses/t_curses:mvaddch
lib/libcurses/t_curses:mvaddchnstr
lib/libcurses/t_curses:mvaddchstr
lib/libcurses/t_curses:mvaddnstr
lib/libcurses/t_curses:mvaddstr
lib/libcurses/t_curses:mvchgat
lib/libcurses/t_curses:mvcur
lib/libcurses/t_curses:mvderwin
lib/libcurses/t_curses:mvgetnstr
lib/libcurses/t_curses:mvgetstr
lib/libcurses/t_curses:mvhline
lib/libcurses/t_curses:mvinchnstr
lib/libcurses/t_curses:mvprintw
lib/libcurses/t_curses:mvscanw
lib/libcurses/t_curses:mvvline
lib/libcurses/t_curses:mvwin
lib/libcurses/t_curses:nocbreak
lib/libcurses/t_curses:nodelay
lib/libcurses/t_curses:pad
lib/libcurses/t_curses:startup
lib/libcurses/t_curses:termattrs
lib/libcurses/t_curses:timeout
lib/libcurses/t_curses:wborder
lib/libcurses/t_curses:window
lib/libcurses/t_curses:wprintw
lib/libcurses/t_curses:wscrl

The above tests failed in each of the last 4 test runs, and passed in
at least 26 consecutive runs before that.

Between the last successful test and the failed test, a total of 299
revisions were committed, by the following developers:

blymn
rillig

The first of these commits was made on CVS date 2020.10.24.04.40.45,
and the last on 2020.10.24.04.46.17.

Logs can be found at:

  http://releng.NetBSD.org/b5reports/i386/commits-2020.10.html#2020.10.24.04.46.17



Automated report: NetBSD-current/i386 build failure

2020-10-24 Thread Andreas Gustafsson
[Manually forwarded as the automated notifications are temporarily
disabled while new hardware is being tested]

This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2020.10.24.04.47.43.

An extract from the build.sh output follows:

./usr/tests/lib/libcurses/tests/window_hierarchy
./usr/tests/lib/libcurses/tests/winnstr
./usr/tests/lib/libcurses/tests/winnwstr
./usr/tests/lib/libcurses/tests/wins_nwstr
./usr/tests/lib/libcurses/tests/wins_wch
./usr/tests/lib/libcurses/tests/wins_wstr
./usr/tests/lib/libcurses/tests/winsch
./usr/tests/lib/libcurses/tests/winwstr
./usr/tests/lib/libcurses/tests/wredrawln
./usr/tests/lib/libcurses/tests/wsetscrreg
./usr/tests/lib/libcurses/tests/wstandout
./usr/tests/lib/libcurses/tests/wtimeout
./usr/tests/lib/libcurses/tests/wtouchln
./usr/tests/lib/libcurses/tests/wunderscore
./usr/tests/lib/libcurses/tests/wvline
./usr/tests/lib/libcurses/tests/wvline_set
  end of 249 missing files  ==
*** [checkflist] Error code 1
nbmake[2]: stopped in /tmp/build/2020.10.24.04.47.43-i386/src/distrib/sets
1 error
nbmake[2]: stopped in /tmp/build/2020.10.24.04.47.43-i386/src/distrib/sets
ERROR: Failed to make release

The following commits were made between the last successful build and
the failed build:

2020.10.24.04.47.43 blymn src/distrib/sets/lists/tests/mi,v 1.951

Logs can be found at:

  http://releng.NetBSD.org/b5reports/i386/commits-2020.10.html#2020.10.24.04.47.43



Re: Automated report: NetBSD-current/i386 test failure (l2tp)

2020-10-23 Thread Andreas Gustafsson
Roy Marples wrote:
> This is rump crashing and I don't know why.

If the rump kernel crashes in the test, that likely means the real
kernel will crash in actual use.

> I can't get a backtrace to tell me where the problem is.

I managed to get one this way:

  sysctl -w kern.defcorename="/tmp/%n.core"
  cd /usr/tests/net/if_l2tp
  ./t_l2tp l2tp_basic_ipv4overipv4
  gdb rump_server /tmp/rump_server.core

It looks like this:

  (gdb) bt
  #0  0x752206d751ea in _lwp_kill () from /usr/lib/libc.so.12
  #1  0x752206d756e5 in abort ()
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/libc/stdlib/abort.c:74
  #2  0x7522076088bf in rumpuser_exit (rv=rv@entry=-1)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librumpuser/rumpuser.c:236
  #3  0x7522082c2b74 in cpu_reboot (howto=<optimized out>, 
      bootstr=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/librump/rumpkern/emul.c:429
  #4  0x75220827b08d in kern_reboot (howto=4, bootstr=0x0)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/../kern/kern_reboot.c:73
  #5  0x752208279efe in vpanic (
      fmt=0x752205ea5428 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=0x75220319fc88)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/../kern/subr_prf.c:290
  #6  0x75220825f298 in kern_assert (fmt=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/../lib/libkern/kern_assert.c:51
  #7  0x752205e9fa7d in if_percpuq_enqueue (ipq=0x0, m=0x752208061650)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libnet/../../../../net/if.c:911
  #8  0x752204a03801 in in_l2tp_input (eparg=<optimized out>, 
      proto=<optimized out>, off=20, m=0x752208061858)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libl2tp/../../../../netinet/in_l2tp.c:349
  #9  in_l2tp_input (m=0x752208061858, off=20, proto=<optimized out>, 
      eparg=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libl2tp/../../../../netinet/in_l2tp.c:249
  #10 0x752205e75097 in encap4_input (m=0x752208061858, off=20, proto=115)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libnet/../../../../netinet/ip_encap.c:357
  #11 0x752205e7d465 in ip_input (ifp=<optimized out>, m=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libnet/../../../../netinet/ip_input.c:821
  #12 ipintr (arg=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/sys/rump/net/lib/libnet/../../../../netinet/ip_input.c:412
  #13 0x7522082c265c in sithread (arg=<optimized out>)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/librump/rumpkern/intr.c:180
  #14 0x7522082bf52e in threadbouncer (arg=0x75220883bac0)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/librump/../../sys/rump/librump/rumpkern/threads.c:90
  #15 0x75220720bf7e in pthread__create_tramp (cookie=0x7522085a4800)
      at /tmp/build/2020.10.22.11.21.42-amd64-debug/src/lib/libpthread/pthread.c:560
  #16 0x752206c91dc0 in ?? () from /usr/lib/libc.so.12

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure (l2tp)

2020-10-22 Thread Andreas Gustafsson
Hi Roy,

On Oct 16, the NetBSD Test Fixture wrote:
> The newly failing test cases are:
> 
> net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv4
> net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv6
> net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv4
> net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv6
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_hmacsha512
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_rijndaelcbc
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_hmacsha512
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_rijndaelcbc
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_hmacsha512
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_rijndaelcbc
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_hmacsha512
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_null
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_rijndaelcbc

These are still failing as of 2020.10.21.15.12.15, and the commit that
triggered the failures has now been identified:

  2020.10.15.02.54.10 roy src/sys/net/if_l2tp.c 1.44

For logs, see

  http://www.gson.org/netbsd/bugs/build/amd64/commits-2020.10.html#2020.10.15.02.54.10

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-10-21 Thread Andreas Gustafsson
Two days ago, the NetBSD Test Fixture wrote:
> This is an automatically generated notice of new failures of the
> NetBSD test suite.
> 
> The newly failing test cases are:
> 
> sbin/resize_ffs/t_grow:grow_16M_v0_8192
> sbin/resize_ffs/t_grow:grow_16M_v1_16384
> sbin/resize_ffs/t_grow:grow_16M_v2_32768
> sbin/resize_ffs/t_grow_swapped:grow_16M_v0_65536
> sbin/resize_ffs/t_grow_swapped:grow_16M_v1_4096
> sbin/resize_ffs/t_grow_swapped:grow_16M_v2_8192
> sbin/resize_ffs/t_shrink:shrink_24M_16M_v0_32768
> sbin/resize_ffs/t_shrink:shrink_24M_16M_v1_65536
> sbin/resize_ffs/t_shrink_swapped:shrink_24M_16M_v0_4096
> sbin/resize_ffs/t_shrink_swapped:shrink_24M_16M_v1_8192

These are still failing as of source date 2020.10.21.06.36.10, and the
commit that triggered the failures has now been identified:

  2020.10.18.18.22.29 chs src/sys/rump/librump/rumpvfs/vm_vfs.c 1.39
  2020.10.18.18.22.29 chs src/sys/uvm/uvm_page.c 1.248
  2020.10.18.18.22.29 chs src/sys/uvm/uvm_pager.c 1.130

Logs from real amd64 hardware are at:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.10.html#2020.10.18.18.22.29

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-10-14 Thread Andreas Gustafsson
On Oct 8, the NetBSD Test Fixture wrote:
> The newly failing test cases are:
>
> net/carp/t_basic:carp_handover_ipv4_halt_carpdevip
> net/carp/t_basic:carp_handover_ipv4_halt_nocarpdevip
> net/carp/t_basic:carp_handover_ipv4_ifdown_carpdevip
> net/carp/t_basic:carp_handover_ipv4_ifdown_nocarpdevip
> net/carp/t_basic:carp_handover_ipv6_halt_carpdevip
> net/carp/t_basic:carp_handover_ipv6_ifdown_carpdevip

These were fixed on Oct 8, but then broken again on Oct 12:

  http://releng.netbsd.org/b5reports/i386/commits-2020.10.html#2020.10.12.11.07.27

They are still failing as of source date 2020.10.13.21.27.18.
-- 
Andreas Gustafsson, g...@gson.org


-current fails to boot

2020-10-04 Thread Andreas Gustafsson
All,

At least i386 and amd64 are currently in a state where installation
using sysinst completes successfully, but the installed system fails
to boot.  The problem appears to have started yesterday during a
period of build breakage encompassing the following commits:

  2020.10.03.17.30.54 rillig src/usr.bin/make/unit-tests/Makefile 1.159
  2020.10.03.17.30.54 rillig src/usr.bin/make/unit-tests/hanoi-include.exp 1.1
  2020.10.03.17.30.54 rillig src/usr.bin/make/unit-tests/hanoi-include.mk 1.1
  2020.10.03.17.31.46 thorpej src/sys/arch/alpha/alpha/autoconf.c 1.55
  2020.10.03.17.31.46 thorpej src/sys/arch/alpha/alpha/machdep.c 1.366
  2020.10.03.17.31.46 thorpej src/sys/arch/alpha/alpha/prom.c 1.58
  2020.10.03.17.31.46 thorpej src/sys/arch/alpha/include/alpha.h 1.42
  2020.10.03.17.31.46 thorpej src/sys/arch/alpha/include/prom.h 1.16
  2020.10.03.17.32.49 thorpej src/sys/arch/alpha/alpha/qemu.c 1.3
  2020.10.03.17.33.23 thorpej src/sys/arch/alpha/include/rpb.h 1.44
  2020.10.03.18.06.37 christos src/sbin/mount_nfs/mount_nfs.8 1.49
  2020.10.03.18.06.37 christos src/sbin/mount_nfs/mount_nfs.c 1.73
  2020.10.03.18.29.02 wiz src/sbin/mount_nfs/mount_nfs.8 1.50
  2020.10.03.18.30.39 christos src/include/rpc/auth.h 1.20
  2020.10.03.18.31.29 christos src/lib/libc/rpc/Makefile.inc 1.27
  2020.10.03.18.31.29 christos src/lib/libc/rpc/auth_unix.c 1.27
  2020.10.03.18.31.29 christos src/lib/libc/rpc/rpc_clnt_auth.3 1.7
  2020.10.03.18.33.52 christos src/distrib/sets/lists/comp/mi 1.2361
  2020.10.03.18.34.15 christos src/lib/libc/shlib_version 1.290
  2020.10.03.18.35.21 christos src/distrib/sets/lists/base/shl.mi 1.908
  2020.10.03.18.35.21 christos src/distrib/sets/lists/debug/shl.mi 1.267
  2020.10.03.18.42.20 christos src/sbin/mount_nfs/mount_nfs.c 1.74
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/bsddisklabel.c 1.46
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/disklabel.c 1.40
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/gpt.c 1.19
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/label.c 1.26
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/mbr.c 1.34
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/part_edit.c 1.18
  2020.10.03.18.54.18 martin src/usr.sbin/sysinst/partitions.h 1.17
  2020.10.03.20.34.06 rillig src/distrib/sets/lists/tests/mi 1.937

-- 
Andreas Gustafsson, g...@gson.org


Finding errors in build logs

2020-09-27 Thread Andreas Gustafsson
On June 21, Simon J. Gerraty wrote:
> > It would be helpful for both human and robotic users if error messages
> > consistently included the word "error", or if there was some other easy
> > way of identifying them in the build log.
> 
> The regex 'make.*stopped' is the best clue to look for since it will
> always be present.

It is not present in this recent log extract:

  http://releng.netbsd.org/b5reports/i386/2020/2020.09.27.13.59.24/build.log.tail

I also checked the full log, and it's not there, either.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-08-28 Thread Andreas Gustafsson
The NetBSD Test Fixture reported this test failure twice:
> net/net/t_unix:sockaddr_un_local_peereid

Sorry about the duplicate report.  The testbed is now using Python 3
and that appears to have broken the duplicate suppression.  I'll fix it.
-- 
Andreas Gustafsson, g...@gson.org


Re: System panicing on boot since recent uvm changes

2020-08-16 Thread Andreas Gustafsson
Tobias Nygren wrote:
> Seems there is still something wrong with -current.
> ./build.sh -j8 hangs in <10 seconds on a t3.2xlarge EC2 instance.
> Reverting to a -D20200812 kernel makes it stable.

FWIW, I successfully completed a "build.sh -j 24 release" of 9.0
hosted on a -current built from source date 2020.08.16.00.24.41,
running on real amd64 hardware.
-- 
Andreas Gustafsson, g...@gson.org


Re: System panicing on boot since recent uvm changes

2020-08-16 Thread Andreas Gustafsson
Chuck Silvers wrote:
> this should be fixed now.
> sorry about that, the problem did not happen for me and
> it took me forever to find a way that I could reproduce it.

This is not to pick on you specifically as almost everyone is doing
the same thing, but IMO, in cases like this it would generally be
better to revert the commit immediately and later re-commit a correct
version rather than leaving things broken during the entire process of
reproducing and fixing the issue.
-- 
Andreas Gustafsson, g...@gson.org


System panicing on boot since recent uvm changes

2020-08-15 Thread Andreas Gustafsson
Hi chs,

At least i386, amd64, and sparc are all panicking on boot since this commit:

  2020.08.14.09.06.14 chs src/sys/miscfs/genfs/genfs_io.c 1.100
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_extern.h 1.231
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_object.c 1.24
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_object.h 1.39
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_page.c 1.245
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_page_status.c 1.6
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_pager.c 1.129
  2020.08.14.09.06.15 chs src/sys/uvm/uvm_vnode.c 1.116

Logs:

  http://releng.netbsd.org/b5reports/i386/commits-2020.08.html#2020.08.14.09.06.15

Please revert the commit.
-- 
Andreas Gustafsson, g...@gson.org


Re: i386 and amd64 testbeds now use NVMM

2020-08-10 Thread Andreas Gustafsson
Jukka Ruohonen wrote:
> > This reduces the time it takes to run the test suite from more than
> > 20 hours to about 3-4 hours.  Many thanks to Maxime Villard for making
> > this possible by writing NVMM.
> 
> Does this mean that the amount of test runs increases accordingly (i.e.,
> to about six runs per 24h)?

It's a bit more complicated than that.  Since multiple source versions
are tested in parallel, the i386 tests have been achieving a
throughput of more than six runs per 24 h even before the switch to
NVMM.  Using NVMM frees up a significant amount of CPU, but the builds
and the sparc tests still use as much CPU as before.  So the overall
throughput of the server has increased, but by a smaller factor than
the latency of the i386 and amd64 tests.
-- 
Andreas Gustafsson, g...@gson.org


i386 and amd64 testbeds now use NVMM

2020-08-10 Thread Andreas Gustafsson
Hi all,

The TNF testbed is now using NVMM for the i386 and amd64 tests:

  http://releng.netbsd.org/b5reports/i386/
  http://releng.netbsd.org/b5reports/amd64/

This reduces the time it takes to run the test suite from more than
20 hours to about 3-4 hours.  Many thanks to Maxime Villard for making
this possible by writing NVMM.

The switch to NVMM was made yesterday, but since the testbed may test
source versions out of order, there is not necessarily an unambiguous
transition point in terms of -current source dates.  To determine
whether a given test run was made using NVMM or not, look for "-accel
nvmm" in the qemu command line in the console log.
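
As a rough illustration of that check, here is a small Python sketch that
scans a console log for the option.  The function name and the sample log
excerpts are made up for the example; only the "-accel nvmm" string comes
from the description above, and a log could conceivably spell the option
differently.

```python
import re

# Matches "-accel nvmm" even if the log wraps or pads the whitespace
# between the option and its argument.
NVMM_RE = re.compile(r"-accel\s+nvmm\b")

def used_nvmm(console_log: str) -> bool:
    """Return True if the qemu command line in the log requests NVMM."""
    return NVMM_RE.search(console_log) is not None

# Hypothetical log excerpts, not taken from a real test run.
with_nvmm = "qemu-system-i386 -m 512 -accel\nnvmm -nographic"
without_nvmm = "qemu-system-i386 -m 512 -nographic"
print(used_nvmm(with_nvmm), used_nvmm(without_nvmm))
```

Allowing arbitrary whitespace between the two words is deliberate, since
long command lines are often wrapped in logs.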

Some test cases that were previously failing are now passing and vice
versa.  For example, kernel/t_trapsignal:fpe_* now pass, but
lib/libpthread/t_condwait:* now fail (these contain a work-around for
the qemu timing issues of PR 43997, but now fail to detect that they
are running under qemu).
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-08-09 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> --- kern-XEN3PAE_DOMU ---
> *** [netbsd] Error code 1
> nbmake[2]: stopped in /tmp/bracket/build/2020.08.09.11.04.05-i386/obj/sys/arch/i386/compile/XEN3PAE_DOMU

Specifically:

  --- kern-XEN3PAE_DOMU ---
  /tmp/bracket/build/2020.08.09.11.04.05-i386/tools/bin/i486--netbsdelf-ld: trap.o: in function `trap':
  trap.c:(.text+0xe27): undefined reference to `x86_cpu_is_lcall'

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-07-26 Thread Andreas Gustafsson
The build is still failing.  The current error, as of source date
2020.07.26.09.17.24, is:

  ===  1 extra files in DESTDIR  =
  Files in DESTDIR but missing from flist.
  File is obsolete or flist is out of date ?
  --
  ./usr/libdata/debug/usr/tests/sys/crypto/chacha
  =  end of 1 extra files  ===

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-07-19 Thread Andreas Gustafsson
The build is still broken as of source date 2020.07.19.16.22.44:

  http://releng.netbsd.org/b5reports/i386/commits-2020.07.html#2020.07.19.16.22.44

-- 
Andreas Gustafsson, g...@gson.org


Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1

2020-07-06 Thread Andreas Gustafsson
Brian Buhrow wrote:
>   Hello.  I'm thinking of notebooks.   Yes, they have screens and
> keyboards, but those are not always usable and, having a serial console
> over USB could let someone install to a notebook remotely.
> Also, I've encountered some Intel based appliance boards that don't have
> easily used serial ports on them.  When they're installed in cramped wiring
> closets, it's much easier to get a USB serial port on them than it is to
> get a screen and keyboard.

It's not just laptops and appliance boards - even ATX sized PC
motherboards have been made with no com ports for a long time,
for example the Intel DH67CL from 2011.  The specifications at

  https://ark.intel.com/content/www/us/en/ark/products/50101/intel-desktop-board-dh67cl.html

say

  # of Serial Ports:  0
  Serial Port via Internal Header: No

and when booting NetBSD on one, the dmesg output contains no "com" entry.
-- 
Andreas Gustafsson, g...@gson.org


Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1

2020-07-06 Thread Andreas Gustafsson
Martin Husemann wrote:
> USB keyboards as console in ddb worked fine last I tested.

That has not been my experience.  For example, 

  PR 52569 Entering ddb using USB keyboard panics with "locking against myself"
  PR 54599 Can't enter ddb using USB keyboard because console

> Running the usb host in polled mode however is quite a bit simpler than
> doing the device part (where you have to obey timings from the host).

Why would acting as a device be needed?
-- 
Andreas Gustafsson, g...@gson.org


Re: Build error on amd64 -current

2020-06-27 Thread Andreas Gustafsson
Paul Goyette wrote:
> With up-to-date sources I'm getting
> 
> /build/netbsd-compat/src_ro/sys/arch/xen/x86/cpu.c: In function 'mp_cpu_start':
> /build/netbsd-compat/src_ro/sys/arch/xen/x86/cpu.c:999:1: error: stack usage is 5408 bytes [-Werror=stack-usage=]
>   mp_cpu_start(struct cpu_info *ci, vaddr_t target)
>   ^~~~

It started with this commit:

  2020.06.25.14.52.26 jdolecek src/sys/conf/Makefile.kern.inc 1.274

  enable gcc stack usage limit for kernel functions, set to 3.5 KiB for now
  as that seems to be enough to accommodate the current biggest stack usages

  there are about six functions which use over 3KiB local stack, and
  about a dozen between 2-3 KiB, so pushing this further needs more work
  if desired

  compile tested on amd64, i386, sparc64, sparc, powerpc (evbppc - BookE),
  m68k (mac68k)

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-06-21 Thread Andreas Gustafsson
Simon J. Gerraty wrote:
> Simon J. Gerraty  wrote:
> > > It would be helpful for both human and robotic users if error messages
> > > consistently included the word "error", or if there was some other easy
> > > way of identifying them in the build log.
> > 
> > The regex 'make.*stopped' is the best clue to look for since it will
> > always be present.

I'll change bracket to look for that and see how it works.

> BTW if this behavior change is a problem for your automation, you can
> disable it by setting .MAKE.DIE_QUIETLY=no

That would be counter to my principle of testing with default settings.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-06-21 Thread Andreas Gustafsson
Martin pointed me to this error some 63 lines from the end of the log:

  --- dependall-tests ---
  nbmake[7]: nbmake[7]: don't know how to make t_cabsl.cc. Stop

I think the reason I didn't find it myself is that I have developed a
habit of searching for the message "Error code 1" (or similar with
another number) which used to be printed by make, but that's no longer
there.  Bracket also looks for that string as part of its heuristics
for deciding how much of the build log to include in the email report,
which is why this report didn't include any of it.

It would be helpful for both human and robotic users if error messages
consistently included the word "error", or if there was some other easy
way of identifying them in the build log.
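
To make the problem concrete, here is a minimal Python sketch of the kind
of pattern matching a log scanner has to do today.  The pattern list is
assembled from messages that appear in this thread ("Error code N",
'make.*stopped', "don't know how to make", and compiler "error:" lines);
it is an assumption for illustration and not Bracket's actual heuristics,
and the last line of the sample log is invented.

```python
import re

# Patterns drawn from messages seen in this thread; the list is an
# assumption for illustration, not Bracket's actual heuristics.
ERROR_PATTERNS = [
    re.compile(r"Error code \d+"),
    re.compile(r"make.*stopped"),
    re.compile(r"don't know how to make"),
    re.compile(r"\berror:"),
]

def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(p.search(line) for p in ERROR_PATTERNS):
            hits.append((number, line))
    return hits

sample = "\n".join([
    "--- dependall-tests ---",
    "nbmake[7]: nbmake[7]: don't know how to make t_cabsl.cc. Stop",
    "--- dependall-lib ---",  # invented line, for illustration only
])
for number, line in find_error_lines(sample):
    print(number, line)
```

Every new wording of a failure message needs another entry in such a
list, which is why a single consistent marker would be easier to rely on.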
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-06-21 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> This is an automatically generated notice of a NetBSD-current/i386
> build failure.
> 
> The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
> using sources from CVS date 2020.06.21.03.39.21.
> 
> The following commits were made between the last successful build and
> the failed build:
> 
> 2020.06.21.03.39.21 lukem src/share/mk/bsd.dep.mk,v 1.85
> 
> Logs can be found at:
> 
> http://releng.NetBSD.org/b5reports/i386/commits-2020.06.html#2020.06.21.03.39.21

The full build log can be found at:

  http://releng.netbsd.org/b5reports/i386/2020/2020.06.21.03.39.21/build.log
  
It's not clear from the log what the error was or where it occurred,
and I'm wondering if the lack of identifying and locating information
could be related to another recent commit:

  2020.06.19.21.17.48 sjg src/usr.bin/make/job.c 1.198
  2020.06.19.21.17.48 sjg src/usr.bin/make/main.c 1.275
  2020.06.19.21.17.48 sjg src/usr.bin/make/make.h 1.108

  Avoid unnecessary noise when sub-make or sibling dies

  When analyzing a build log, the first 'stopped' output
  from make, is the end of interesting output.

  Normally when a build fails deep down in a parallel build
  the log ends with many blocks of error output from make,
  with all but the first being unhelpful.

  We add a function dieQuietly() which will return true
  if we should suppress the error output from make.
  If the failing node was a sub-make, we want to die quietly.

  Also when we read an abort token we call dieQuietly telling we
  want to die quietly.

  This behavior is suppressed by -dj or
  setting .MAKE.DIE_QUIETLY=no

  Reviewed by: christos

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-04-28 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> This is an automatically generated notice of new failures of the
> NetBSD test suite.
> 
> The newly failing test cases are:
> 
> dev/audio/t_audio:AUDIO_ERROR_RDWR
> dev/audio/t_audio:AUDIO_ERROR_WRONLY
[and many more]

That message got stuck somewhere (moderation?) for three days, and
those particular failures have already been fixed.

There are still plenty of other test cases failing, though, 79 of them
at last count:

  http://releng.netbsd.org/b5reports/i386/2020/2020.04.27.02.54.42/test.html#failed-tcs-summary

-- 
Andreas Gustafsson, g...@gson.org


Re: github.com/NetBSD/src 5 days old?

2020-04-28 Thread Andreas Gustafsson
m...@netbsd.org wrote:
> Yes, I believe joerg and spz are changing the conversion from
> cvs->??->git to hg->git, to match what will be done once we stop using
> CVS.

Has there been a formal decision choosing hg over git?
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-04-25 Thread Andreas Gustafsson
There are actually now more than 2,000 failing test cases in total,
but the email message reporting most of them has failed to appear on
current-users, perhaps because of its size.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-04-18 Thread Andreas Gustafsson
The NetBSD Test Fixture sent three reports listing the following
groups of commits, respectively:

>2020.04.16.14.39.58 joerg src/lib/libc/gen/pthread_atfork.c,v 1.13
>2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.c,v 1.204
>2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.h,v 1.139
>2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/symbols.map,v 1.3

>2020.04.16.18.20.46 msaitoh src/sys/dev/pci/pcidevs,v 1.1406
>2020.04.16.18.21.12 msaitoh src/sys/dev/pci/pcidevs.h,v 1.1394
>2020.04.16.18.21.12 msaitoh src/sys/dev/pci/pcidevs_data.h,v 1.1393
>2020.04.16.18.32.29 msaitoh src/sys/dev/pci/ichsmb.c,v 1.67
>2020.04.16.18.51.47 pgoyette src/share/man/man4/man4.x86/imcsmb.4,v 1.8
>2020.04.16.18.56.04 pgoyette src/share/man/man4/man4.x86/imcsmb.4,v 1.9
>2020.04.16.19.23.50 bouyer src/sys/arch/xen/xen/Attic/xen_clock.c,v 1.1

>2020.04.16.15.47.19 christos src/external/gpl3/binutils/dist/ld/emultempl/elf.em,v 1.2
>2020.04.16.15.58.13 jdolecek src/sys/external/mit/xen-include-public/dist/xen/include/public/io/blkif.h,v 1.2
>2020.04.16.16.38.43 jdolecek src/sys/arch/xen/xen/xbd_xenbus.c,v 1.116
>2020.04.16.17.18.27 nat src/sys/dev/ic/rtwnreg.h,v 1.3
>2020.04.16.17.18.27 nat src/sys/dev/usb/if_urtwn.c,v 1.86

The latter two reports are spurious, and the commits listed in them
have nothing to do with the breakage.  The reason for the spurious
reports is that a large number of t_ptrace_wait* test cases started
failing with the commit listed in the first report, but are not failing
in every run.  Tests that happened to pass in the first run and fail
four times in a row after that got reported one commit too late, etc.
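
The effect can be sketched in a few lines of Python.  This assumes a
simplified detector that blames the first run in a streak of four
consecutive failures; that is an assumption made for the example, not
the testbed's actual code, and the function name is hypothetical.

```python
def first_reported_failure(history, streak=4):
    """Return the index of the first run in the earliest streak of
    `streak` consecutive failures, or None if there is no such streak.

    `history` is a string with one character per test run in commit
    order: "-" for success and "X" for failure.
    """
    run_start = None
    for index, outcome in enumerate(history):
        if outcome == "X":
            if run_start is None:
                run_start = index
            if index - run_start + 1 >= streak:
                return run_start
        else:
            run_start = None
    return None

# The breaking commit is at index 3, but the test happened to pass there.
history = "---" + "-" + "XXXX"
print(first_reported_failure(history))
```

In this history the detector points at index 4, one commit after the one
that actually introduced the intermittent failure.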
-- 
Andreas Gustafsson, g...@gson.org


Re: Build time measurements

2020-04-11 Thread Andreas Gustafsson
Earlier, I wrote:
> > After disabling DIAGNOSTIC and acpicpu, they are:
> > 
> > 2016.09.06.06.27.17  3319.87 real  9767.39 user  4184.24 sys
> > 2019.10.18.17.16.50  3525.65 real 10309.00 user 11618.57 sys
> > 2020.03.17.22.03.41  2419.52 real  9577.58 user  9602.81 sys
> > 2020.03.22.19.56.07  2363.06 real  9482.36 user  7614.66 sys

One more with the same settings:

2020.04.09.11.10.07  2210.82 real  9435.36 user  4388.02 sys

That's a great reduction in system time in the last few weeks.
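
For a rough sense of scale, the following Python sketch computes the
relative drop in system time between two earlier measurements and the
latest one.  The helper name is made up for the example; the numbers are
the system times from the runs with DIAGNOSTIC and acpicpu disabled.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before

# System times in seconds, copied from the measurements quoted above.
sys_times = {
    "2019.10.18.17.16.50": 11618.57,
    "2020.03.22.19.56.07": 7614.66,
    "2020.04.09.11.10.07": 4388.02,
}
latest = sys_times["2020.04.09.11.10.07"]
for date in ("2019.10.18.17.16.50", "2020.03.22.19.56.07"):
    print(date, round(percent_reduction(sys_times[date], latest), 1))
```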
-- 
Andreas Gustafsson, g...@gson.org


Re: WRT the failing ATF tests (some of them)

2020-04-08 Thread Andreas Gustafsson
Robert Elz wrote:
> A bunch of the not yet fixed, relatively recent ATF test failures are
> caused by:
> 
>   rn_init: radix functions require max_keylen be set

I don't think there is a causal relationship between those messages
and any current test failures.  If it's any consolation, I made the 
same mistaken assumption back in 2011.

Tens or hundreds of those messages appear in the output from most test
runs starting in 2009, including ones where all tests passed.  I think
the only runs where they don't occur are those where the system was
unable to actually run the tests.

Can you please file a PR (about the messages being printed and
confusing people for a decade now, not about the tests being broken)?
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-04-07 Thread Andreas Gustafsson
NetBSD Test Fixture wrote:
> This is an automatically generated notice of new failures of the
> NetBSD test suite.
> 
> The newly failing test cases are:
> 
> fs/puffs/t_basic:root_chrdev
> fs/puffs/t_basic:root_fifo
> fs/puffs/t_basic:root_lnk
(etc)

These are already reported in kern/55146 as the automated report was
delayed until kern/54786 got fixed.  Some but not all of the failures
reported are already fixed; a more up-to-date list of the tests still
failing is at

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.04.06.20.26.16/test.html#failed-tcs-summary

-- 
Andreas Gustafsson, g...@gson.org


Re: Build time measurements

2020-04-06 Thread Andreas Gustafsson
Andrew,

You wrote:
> > 2016.09.06.06.27.17  3319.87 real  9767.39 user  4184.24 sys
> > 2019.10.18.17.16.50  3525.65 real 10309.00 user 11618.57 sys
> > 2020.03.17.22.03.41  2419.52 real  9577.58 user  9602.81 sys
> > 2020.03.22.19.56.07  2363.06 real  9482.36 user  7614.66 sys
> 
> Thanks for repeating the tests.  For the sys time to still be that high in
> relation to user, there's some other limiting factor.  Does that machine
> have tmpfs /tmp?

It is a fresh install with all default settings except for disabling
DIAGNOSTIC and acpicpu.  For 2020.03.22.19.56.07 that means it does
have a tmpfs /tmp, but I have not checked the others.

The SRCDIR, OBJDIR, etc are all on a single SATA SSD.

> Is NUMA enabled in the BIOS?  Different node number for
> CPUs in recent kernels in dmesg is a good clue.

Different from the other CPUs in the same dmesg, or different from a
non-recent kernel?  And how recent is recent?

> Is it a really old source tree?

Every build is of the official NetBSD-8.1/amd64 tree.

> I would be interested to see lockstat output from a kernel build at
> some point, if you're so inclined.

Is this just "lockstat build.sh ...", or are there some specific lockstat
options I should use?

PS. I would prefer that you prioritize fixing the fallout from the
changes you have already made so far over making further changes.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-04-03 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> --- dependall-gdb ---
> 
> CC=/tmp/bracket/build/2020.04.02.11.52.41-i386/tools/bin/i486--netbsdelf-c++ 
> /tmp/bracket/build/2020.04.02.11.52.41-i386/tools/bin/nbmkdep -f maint.d.tmp  
> --   -std=gnu++11   
> --sysroot=/tmp/bracket/build/2020.04.02.11.52.41-i386/destdir -D_KERNTYPES 
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/arch/i386
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/config
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/common
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gnulib/import
>  
> -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb/lib/libgdb/../../dist/include/opcode
>  -I/tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gdb
>  /lib/libgdb/../../dist/libdecn--- dependall-gcc ---
> 
> /tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gcc/dist/gcc/machmode.h:593:30:
>  error: 'mode_nunits_inline' was not declared in this scope
> ? mode_nunits_inline (mode) : mode_nunits[mode]);
>   ^
> *** [min-insn-modes.lo] Error code 1
> nbmake[9]: stopped in 
> /tmp/bracket/build/2020.04.02.11.52.41-i386/src/external/gpl3/gcc/usr.bin/backend
> --- gengtype.lo ---

This looks like a random failure, and there have been a couple of
other similar ones also involving machmode.h:

  http://releng.netbsd.org/b5reports/i386/2019/2019.11.26.08.38.19/build.log.tail
  http://releng.netbsd.org/b5reports/i386/2020/2020.03.15.15.58.24/build.log.tail

Someone please fix.
-- 
Andreas Gustafsson, g...@gson.org

Re: Build time measurements

2020-03-27 Thread Andreas Gustafsson
On Wednesday, I said:
> I will rerun the 24-core tests with these disabled for comparison.

Done.  To recap, with a stock GENERIC kernel, the numbers were:

2016.09.06.06.27.17  3321.55 real  9853.49 user  5156.92 sys
2019.10.18.17.16.50  3767.63 real 10376.15 user 16100.99 sys
2020.03.17.22.03.41  2910.76 real  9696.10 user 18367.58 sys
2020.03.22.19.56.07  2711.14 real  9729.10 user 12068.90 sys

After disabling DIAGNOSTIC and acpicpu, they are:

2016.09.06.06.27.17  3319.87 real  9767.39 user  4184.24 sys
2019.10.18.17.16.50  3525.65 real 10309.00 user 11618.57 sys
2020.03.17.22.03.41  2419.52 real  9577.58 user  9602.81 sys
2020.03.22.19.56.07  2363.06 real  9482.36 user  7614.66 sys

-- 
Andreas Gustafsson, g...@gson.org


Re: Build time measurements

2020-03-25 Thread Andreas Gustafsson
Andrew,

You wrote:
> Thank you for doing this, and for bisecting the performance losses over
> time (I fixed the vnode regression you found BTW).

Thank you for the fix and the other performance improvements!

> There are two options enabled in -current that spoil performance on multi
> processor machines: DIAGNOSTIC and acpicpu.  I'm guessing that you had both
> enabled during your test runs.

Yes, my tests so far have all been using unmodified GENERIC kernels.

> We ship releases without DIAGNOSTIC, and acpicpu really needs to be
> fixed.

I will rerun the 24-core tests with these disabled for comparison.
-- 
Andreas Gustafsson, g...@gson.org


Build time measurements

2020-03-23 Thread Andreas Gustafsson
Hi all,

In September and November, I reported some measurements of the amount
of system time it takes to build a NetBSD-8/amd64 release on different
versions of -current/amd64.  I have now repeated the measurements with
a couple of newer versions of -current on the same hardware, and here
are the results.  The left column is the source date of the -current
system hosting the build.

  HP ProLiant DL360 G7, 2 x Xeon L5630, 8 cores, 32 GB, build.sh -j 8

  2016.09.06.06.27.17  3930.86 real 15737.04 user  4245.26 sys
  2019.10.18.17.16.50  4461.47 real 16687.37 user  9344.68 sys
  2020.03.17.22.03.41  4723.81 real 16646.42 user  8928.72 sys
  2020.03.22.19.56.07  4595.95 real 16592.80 user  8171.56 sys

I also measured the same versions on a newer machine with more cores:

  Dell PowerEdge 630, 2 x Xeon E5-2678 v3, 24 cores, 32 GB, build.sh -j 24

  2016.09.06.06.27.17  3321.55 real  9853.49 user  5156.92 sys
  2019.10.18.17.16.50  3767.63 real 10376.15 user 16100.99 sys
  2020.03.17.22.03.41  2910.76 real  9696.10 user 18367.58 sys
  2020.03.22.19.56.07  2711.14 real  9729.10 user 12068.90 sys
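
One way to compare the two machines is the ratio of system time to user
time for each hosting version.  The Python sketch below computes it from
the figures above; the choice of this ratio as a metric and the names
used are mine for the sake of the example, not part of the measurement
procedure.

```python
# (source date, real, user, sys) in seconds, from the two tables above.
EIGHT_CORE = [
    ("2016.09.06.06.27.17", 3930.86, 15737.04, 4245.26),
    ("2019.10.18.17.16.50", 4461.47, 16687.37, 9344.68),
    ("2020.03.17.22.03.41", 4723.81, 16646.42, 8928.72),
    ("2020.03.22.19.56.07", 4595.95, 16592.80, 8171.56),
]
TWENTY_FOUR_CORE = [
    ("2016.09.06.06.27.17", 3321.55, 9853.49, 5156.92),
    ("2019.10.18.17.16.50", 3767.63, 10376.15, 16100.99),
    ("2020.03.17.22.03.41", 2910.76, 9696.10, 18367.58),
    ("2020.03.22.19.56.07", 2711.14, 9729.10, 12068.90),
]

def sys_to_user(rows):
    """Map each source date to its system/user time ratio."""
    return {date: sys_s / user_s for date, _real_s, user_s, sys_s in rows}

for label, rows in (("8 cores", EIGHT_CORE), ("24 cores", TWENTY_FOUR_CORE)):
    for date, ratio in sys_to_user(rows).items():
        print(label, date, round(ratio, 2))
```

A ratio above 1 means the build spent more CPU time in the kernel than
in user space.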

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-03-07 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> cc1: all warnings being treated as errors
> *** [t_ptrace_wait.o] Error code 1

The compiler error message did not appear because it was too far
back from the end of the build log (5149 lines):

--- dependall-sys ---
/tmp/bracket/build/2020.03.07.14.53.14-i386/src/tests/lib/libc/sys/t_ptrace_wait.c: In function 'traceme_crash':
/tmp/bracket/build/2020.03.07.14.53.14-i386/src/tests/lib/libc/sys/t_ptrace_wait.c:441:24: error: implicit declaration of function 'are_fpu_exceptions_supported'; did you mean 'are_fpu_exceptions_supporter'? [-Werror=implicit-function-declaration]
  if (sig == SIGFPE && !are_fpu_exceptions_supported())
^~~~
are_fpu_exceptions_supporter

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-03-06 Thread Andreas Gustafsson
This morning, the NetBSD Test Fixture wrote:
> The newly failing test case is:
> 
> net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
> 
> The above test failed in each of the last 3 test runs, and passed in
> at least 27 consecutive runs before that.
> 
> The following commits were made between the last successful test and
> the failed test:
> 
> 2020.03.04.22.00.03 ad src/sys/arch/x86/x86/pmap.c,v 1.362
> 2020.03.04.22.07.08 christos src/external/bsd/Makefile,v 1.68
> 2020.03.04.22.09.00 christos src/distrib/sets/lists/base/mi,v 1.1231
> 2020.03.04.22.09.00 christos src/distrib/sets/lists/debug/mi,v 1.296
> 2020.03.04.22.24.46 fcambus src/share/misc/inter.phone,v 1.32
> 2020.03.04.22.56.08 jmcneill src/external/bsd/Makefile,v 1.69

I'm not sure what happened here.  The ipsecif_natt_transport_null and
ipsecif_natt_transport_rijndaelcbc test cases both failed in the same
three consecutive tests, which seems unlikely to be a coincidence, but
whatever it was, it appears to have been resolved as they have both
passed twice since then.

Here are the outcomes of the last 40 runs for the two test cases
with "-" meaning success and "X" meaning failure:

  XX-X--X-X--XXX--   net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
  -X-XXX--   net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-03-03 Thread Andreas Gustafsson
NetBSD Test Fixture wrote:
> *** [cleandir-pamu2fcfg] Error code 2
> nbmake[7]: stopped in 
> /tmp/bracket/build/2020.03.03.00.47.33-i386/src/external/bsd/pam-u2f/bin

The build is still broken as of source date 2020.03.03.08.56.05:

  
http://releng.netbsd.org/b5reports/i386/commits-2020.03.html#2020.03.03.08.56.05

Would it be too much to ask that imports of entire new subsystems like
this be at least build tested with "build.sh release"?
-- 
Andreas Gustafsson, g...@gson.org


Re: Regressions

2020-03-01 Thread Andreas Gustafsson
Jason Thorpe wrote:
> The issue seems to be that rump really wants to join threads that
> are created for work queues when the rump server exits.  But in this
> particular case, there's a global work queue that never goes away
> because in the real kernel, there's no need to do this before the
> system reboots / shuts down.  Any change to fix this will be 100%
> for the appeasement of rump.

Well, yes, just like any change to fix the current build breakage in
if_stge.c will be 100% for the appeasement of 32-bit platforms.
Are you saying fixing one or the other is not your responsibility,
and if so, whose?
-- 
Andreas Gustafsson, g...@gson.org


Regressions

2020-03-01 Thread Andreas Gustafsson
Hi all,

NetBSD-current is again suffering from a number of regressions.  The
last time the ATF tests showed zero unexpected failures on real amd64
hardware was on Dec 12, and the sparc, sparc64, pmax, and hpcmips
tests have all been unable to run to completion for more than a month.

Here are the PRs for some of the issues:

  50350 rump/rumpkern/t_sp/stress_{long,short} fail on Core 2 Quad
  54810 sparc64 pool_redzone_check errors during install
  54845 sparc panics in sleepq_remove
  54923 pmax test runs fail to complete since Jan 15
  55018 atf tests for pppoe sometimes leave rump_server processes around
  55020 dbregs_dr?_dont_inherit_lwp test cases fail on real hardware
  55032 rump/rumpkern/t_vm:uvmwait test case now fails

What can be done?
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-02-29 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> *** [hifn7751.o] Error code 1
> nbmake[8]: stopped in 
> /tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/modules/hifn

Specifically:

--- dependall-hifn ---
In file included from 
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:53:
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c: In 
function 'hifn_rng_locked':
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:692:13: 
error: comparison of integer expressions of different signedness: 'unsigned 
int' and 'int' [-Werror=sign-compare]
nwords = MIN(__arraycount(num), nwords);
 ^~~
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:692:13: 
error: operand of ?: changes signedness from 'int' to 'unsigned int' due to 
unsignedness of other operand [-Werror=sign-compare]
nwords = MIN(__arraycount(num), nwords);
 ^~~
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c: In 
function 'hifn_next_signature':
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:850:16: 
error: comparison of integer expressions of different signedness: 'int' and 
'u_int' {aka 'unsigned int'} [-Werror=sign-compare]
  for (i = 0; i < cnt; i++) {
^
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c: In 
function 'hifn_ramtype':
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:1134:16: 
error: comparison of integer expressions of different signedness: 'int' and 
'unsigned int' [-Werror=sign-compare]
  for (i = 0; i < sizeof(data); i++)
^
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:1145:16: 
error: comparison of integer expressions of different signedness: 'int' and 
'unsigned int' [-Werror=sign-compare]
  for (i = 0; i < sizeof(data); i++)
^
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c: In 
function 'hifn_sramsize':
/tmp/bracket/build/2020.02.29.11.03.44-i386/src/sys/dev/pci/hifn7751.c:1171:16: 
error: comparison of integer expressions of different signedness: 'int32_t' 
{aka 'int'} and 'unsigned int' [-Werror=sign-compare]
  for (i = 0; i < sizeof(data); i++)
        ^

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-02-26 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> nbmake[8]: nbmake[8]: don't know how to make libpam_echo.so.. Stop

This was fixed but the build is still broken as of 2020.02.27.03.25.08:

  #  link  rescue/rescue
  [...]
  
/tmp/bracket/build/2020.02.27.03.25.08-i386/tools/lib/gcc/i486--netbsdelf/8.3.0/../../../../i486--netbsdelf/bin/ld:
 
/tmp/bracket/build/2020.02.27.03.25.08-i386/destdir/usr/lib/libssh.a(sshkey.o): 
in function `.L1885':
  sshkey.c:(.text+0x83d9): undefined reference to `sshsk_sign'
  collect2: error: ld returned 1 exit status
  *** [rescue] Error code 1

More logs:

  
http://releng.netbsd.org/b5reports/i386/commits-2020.02.html#2020.02.27.03.25.08

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-02-14 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> nbmake[8]: stopped in 
> /tmp/bracket/build/2020.02.14.04.38.48-i386/src/sys/modules/drmkms
> 1 error

--- dependall-sys ---
/tmp/bracket/build/2020.02.14.04.38.48-i386/src/sys/external/bsd/drm2/dist/drm/drm_bufs.c:958:40:
 error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
buf->address = (void *)(dmah->vaddr + offset);
        ^
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-02-11 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> The newly failing test case is:
> 
> net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_null
> 
> The above test failed in each of the last 3 test runs, and passed in
> at least 27 consecutive runs before that.

The fourth test run passed, so this looks like another random
occurrence made more likely by the high frequency of ipsec
test failures reported in PR 54897.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2020-01-29 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> The newly failing test case is:
> 
> net/ipsec/t_ipsec_tunnel:ipsec_tunnel_ipv4_ah_keyedmd5
> 
> The above test failed in each of the last 3 test runs, and passed in
> at least 27 consecutive runs before that.
> 
> The following commits were made between the last successful test and
> the failed test:
> 
> 2020.01.28.07.43.42 martin src/usr.sbin/sysinst/partitions.c,v 1.10
> 2020.01.28.07.47.26 skrll src/sys/arch/arm/mainbus/cpu_mainbus.c,v 1.17
> 2020.01.28.08.09.19 martin src/sys/dev/fdt/fdtbus.c,v 1.32
> 2020.01.28.09.23.15 ad src/lib/libpthread/pthread.c,v 1.158

This is probably unrelated to the commits listed.  As reported in
PR 54897, many IPSEC tests are failing randomly since Feb 15, and with
so many randomly failing tests, one of them failing three times in a
row after succeeding 27 times is not all that unlikely.
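
As a rough illustration of why such a streak is not all that unlikely,
here is a back-of-the-envelope sketch.  It assumes every flaky test fails
independently with the same probability on every run, and it treats each
test run as a separate chance for the pattern to appear; neither is
strictly true.  The failure rate, the number of flaky tests, and the
number of runs are made-up illustrative values, not measurements from
the testbed.

```python
def streak_probability(p_fail: float, passes: int = 27, fails: int = 3) -> float:
    """Chance that one test passes `passes` times and then fails `fails` times."""
    return (1.0 - p_fail) ** passes * p_fail ** fails


def any_test_probability(p_fail: float, n_tests: int, n_windows: int) -> float:
    """Chance that at least one of n_tests shows the pattern in n_windows tries.

    Treats every (test, window) pair as independent, which is only an
    approximation because consecutive windows overlap.
    """
    q = streak_probability(p_fail)
    return 1.0 - (1.0 - q) ** (n_tests * n_windows)


if __name__ == "__main__":
    # Hypothetical inputs: 50 flaky tests, each failing 10% of the time,
    # observed over 100 test runs.
    print(f"{any_test_probability(0.10, 50, 100):.2f}")
```

With these invented numbers the chance of at least one misleading report
comes out at roughly one in four, so an occasional false alarm is to be
expected.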
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2020-01-09 Thread Andreas Gustafsson
The i386 build is still failing as of source date 2020.01.09.04.04.01:

  --- dependall-exec_elf32 ---
  /tmp/bracket/build/2020.01.09.04.04.01-i386/src/sys/kern/core_elf32.c: In 
function 'coredump_note_elf32':
  /tmp/bracket/build/2020.01.09.04.04.01-i386/src/sys/kern/core_elf32.c:518:2: 
error: 'PT32_GETXSTATE' undeclared (first use in this function); did you mean 
'PT_GETXSTATE'?
COREDUMP_MACHDEP_LWP_NOTES(l, ns, name);
^~
  /tmp/bracket/build/2020.01.09.04.04.01-i386/src/sys/kern/core_elf32.c:518:2: 
note: each undeclared identifier is reported only once for each function it 
appears in
  *** [core_elf32.o] Error code 1

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2019-12-18 Thread Andreas Gustafsson
Andrew Doran wrote:
> > sbin/resize_ffs/t_grow:grow_16M_v1_16384
> > sbin/resize_ffs/t_grow:grow_16M_v2_32768
> > sbin/resize_ffs/t_grow_swapped:grow_16M_v0_65536
> > sbin/resize_ffs/t_grow_swapped:grow_16M_v1_4096
> > sbin/resize_ffs/t_grow_swapped:grow_16M_v2_8192
> > sbin/resize_ffs/t_shrink:shrink_24M_16M_v0_32768
> > sbin/resize_ffs/t_shrink:shrink_24M_16M_v1_65536
> > sbin/resize_ffs/t_shrink_swapped:shrink_24M_16M_v0_4096
> > sbin/resize_ffs/t_shrink_swapped:shrink_24M_16M_v1_8192
> 
> Hmm, I wonder if this is a rump issue.  In any case I'll take a look into the
> failures this evening.

Looks like the resize_ffs failures have been fixed already:

  
http://www.gson.org/netbsd/bugs/build/i386-baremetal/commits-2019.12.html#2019.12.17.18.59.39

The lfs ones are still failing.
-- 
Andreas Gustafsson, g...@gson.org


Re: current/Xen i386 broken on 2019-12-16 01:20 UTC

2019-12-18 Thread Andreas Gustafsson
Martin Husemann wrote:
> We see that on various architectures.

Indeed.  Here's one from i386 under qemu/KVM under Linux, with a more
helpful backtrace than the Xen one:

  ipsec_l2tp_ipv6_tunnel_ah_null: [18.164581s] Passed.
  ipsec_l2tp_ipv6_tunnel_esp_null: [19.390658s] Passed.
  ipsec_l2tp_ipv6_tunnel_esp_rijndaelcbc: [ 4272.8545386] panic: kernel 
diagnostic assertion "pg->offset >= nextoff" failed: file 
"/bracket/build/2019.12.15.23.13.33-i386/src/sys/miscfs/genfs/genfs_io.c", line 
972
  [ 4272.8545386] cpu0: Begin traceback...
  [ 4272.8545386] 
vpanic(c10df2ac,c52e9c98,c52e9dd8,c09f5e28,c10df2ac,c10df1ef,c11b1014,c11b0c0c,3cc,0)
 at netbsd:vpanic+0x139
  [ 4272.8545386] 
kern_assert(c10df2ac,c10df1ef,c11b1014,c11b0c0c,3cc,0,0,c52e9cf0,c0974f02,c17a188c)
 at netbsd:kern_assert+0x23
  [ 4272.8545386] 
genfs_do_putpages(c1a309dc,0,0,0,0,8011,0,c52e9e30,c09f23cb,c52e9e10) at 
netbsd:genfs_do_putpages+0x75e
  [ 4272.8648708] genfs_putpages(c52e9e10,0,10,c10548c4,c1a309dc,0,0,0,0,8011) 
at netbsd:genfs_putpages+0x3f
  [ 4272.8648708] VOP_PUTPAGES(c1a309dc,0,0,0,0,8011,6,c1a309dc,0,0) at 
netbsd:VOP_PUTPAGES+0x4c
  [ 4272.8648708] 
vflushbuf(c1a309dc,8,c180d300,c1972100,3d5a9ff1,c1972c00,c1972100,4,c0155f51,c52e9f3c)
 at netbsd:vflushbuf+0x62
  [ 4272.8648708] 
ffs_full_fsync(c1a309dc,8,c0980010,30,c09e3a39,10,c1992008,c1a309dc,c52e9f08,102)
 at netbsd:ffs_full_fsync+0x134
  [ 4272.8745850] 
ffs_fsync(c52e9f3c,c52e9f60,c09e7529,c1054c18,c1a309dc,c1808040,8,0,0,0) at 
netbsd:ffs_fsync+0x127
  [ 4272.8745850] VOP_FSYNC(c1a309dc,c1808040,8,0,0,0,0,5df6e568,0,c1ba7188) at 
netbsd:VOP_FSYNC+0x4f
  [ 4272.8745850] sched_sync(c1972c00,16bd000,16c8000,0,c01005a3,0,0,0,0,0) at 
netbsd:sched_sync+0x1f0
  [ 4272.8745850] cpu0: End traceback...

This is from:

  
http://www.gson.org/netbsd/bugs/build/i386-linuxhost/2019/2019.12.15.23.13.33/test.log

There's also:

  http://releng.netbsd.org/b5reports/i386/2019/2019.12.15.22.50.51/install.log

  
http://releng.netbsd.org/b5reports/sparc64/2019/2019.12.16.00.03.50/install.log
  
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 test failure

2019-12-18 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
> The newly failing test case is:
> 
> sbin/resize_ffs/t_grow:grow_16M_v0_8192
> 
> The above test failed in each of the last 4 test runs, and passed in
> at least 36 consecutive runs before that.
> 
> The following commits were made between the last successful test and
> the failed test:
> 
> 2019.12.15.23.13.33 uwe src/lib/libpthread/pthread_rwlock.c,v 1.35
> 2019.12.16.00.03.50 jmcneill src/sys/arch/aarch64/aarch64/efi_machdep.c,v 
> 1.5
> 2019.12.16.00.03.50 jmcneill src/sys/arch/arm/arm/efi_runtime.c,v 1.3
> 2019.12.16.00.03.50 jmcneill src/sys/arch/arm/arm/efi_runtime.h,v 1.3

There was a second report that showed several other resize_ffs test
cases already failing before these commits were made, so it seems
likely that this particular test case just happened to randomly pass
in the first run after the bug was introduced, which would mean these
commits are innocent.
-- 
Andreas Gustafsson, g...@gson.org


New ATF test failures

2019-12-15 Thread Andreas Gustafsson
The following test cases are now failing on multiple testbeds:

  dev/sysmon/t_swsensor/alarm_sensor
  dev/sysmon/t_swsensor/entropy_interrupt_sensor
  dev/sysmon/t_swsensor/entropy_polled_sensor
  dev/sysmon/t_swsensor/limit_sensor
  dev/sysmon/t_swsensor/simple_sensor
  net/if_vlan/t_vlan/vlan_auto_follow_mtu
  net/if_vlan/t_vlan/vlan_auto_follow_mtu6
  net/if_vlan/t_vlan/vlan_basic
  net/if_vlan/t_vlan/vlan_basic6
  net/if_vlan/t_vlan/vlan_bridge
  net/if_vlan/t_vlan/vlan_bridge6
  net/if_vlan/t_vlan/vlan_configs
  net/if_vlan/t_vlan/vlan_configs6
  net/if_vlan/t_vlan/vlan_create_destroy
  net/if_vlan/t_vlan/vlan_create_destroy6
  net/if_vlan/t_vlan/vlan_multicast
  net/if_vlan/t_vlan/vlan_multicast6
  net/if_vlan/t_vlan/vlan_vlanid
  net/if_vlan/t_vlan/vlan_vlanid6

since these commits:

  2019.12.12.22.55.20 pgoyette src/sys/kern/files.kern 1.39
  2019.12.12.22.55.20 pgoyette src/sys/kern/init_main.c 1.509
  2019.12.12.22.55.20 pgoyette src/sys/kern/kern_module.c 1.141
  2019.12.12.22.55.20 pgoyette src/sys/kern/kern_module_hook.c 1.1
  2019.12.12.22.55.20 pgoyette src/sys/rump/librump/rumpkern/Makefile.rumpkern 
1.178
  2019.12.12.22.55.20 pgoyette src/sys/sys/module_hook.h 1.6
  2019.12.12.22.55.20 pgoyette src/sys/sys/param.h 1.624

For logs, see:

  
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2019.12.html#2019.12.12.22.55.20

-- 
Andreas Gustafsson, g...@gson.org


Build breakage

2019-12-12 Thread Andreas Gustafsson
Hi all,

As of source date 2019.12.12.11.47.30, the evbarm-earmv7hf build is
failing with:

  --- kern_module.o ---
  
/tmp/bracket/build/2019.12.12.11.47.30-evbarm-earmv7hf/src/lib/librump/../../sys/rump/../kern/kern_module.c:
 In function 'module_init':
  
/tmp/bracket/build/2019.12.12.11.47.30-evbarm-earmv7hf/src/lib/librump/../../sys/rump/../kern/kern_module.c:456:20:
 error: implicit declaration of function 'pserialize_create'; did you mean 
'sysctl_create'? [-Werror=implicit-function-declaration]
module_hook_psz = pserialize_create();
  ^
  sysctl_create
  cc1: all warnings being treated as errors
  *** [kern_module.o] Error code 1

and the sparc build is also failing with a similar error.
-- 
Andreas Gustafsson, g...@gson.org


Re: Current test failures

2019-12-07 Thread Andreas Gustafsson
Taylor R Campbell wrote:
> OOPS -- rmind removed pserialize_init from rump_init, so the mutex
> never got initialized.  Fixed in rump.c 1.337!

Perhaps, but before Taylor made that commit, at least one other bug
was introduced that is causing the system to panic before finishing
the tests:

fs/vfs/t_renamerace (726/847): 28 test cases
ext2fs_renamerace: [6.743565s] Failed: Test program received signal 11 
(core dumped)
ext2fs_renamerace_dirs: [6.690776s] Failed: Test program received signal 11 
(core dumped)
ffs_renamerace: [6.602727s] Failed: Test program received signal 11 (core 
dumped)
ffs_renamerace_dirs: [ 3923.9308316] panic: kernel diagnostic assertion 
"l->l_cpu == ci" failed: file 
"/tmp/bracket/build/2019.12.06.21.45.14-amd64-baremetal/src/sys/kern/kern_synch.c",
 line 764
[ 3924.1108893] cpu7: Begin traceback...
[ 3924.1509019] vpanic() at netbsd:vpanic+0x178
[ 3924.2009181] kern_assert() at netbsd:kern_assert+0x48
[ 3924.2609379] mi_switch() at netbsd:mi_switch+0x569
[ 3924.3209576] sleepq_block() at netbsd:sleepq_block+0xb7
[ 3924.3809774] lwp_park() at netbsd:lwp_park+0x10d
[ 3924.4409956] syslwp_park60() at netbsd:syslwp_park60+0x5d
[ 3924.5110189] syscall() at netbsd:syscall+0x299
[ 3924.5610351] --- syscall (number 478) ---
[ 3924.6110531] 7adcb44b035a:
[ 3924.6410624] cpu7: End traceback...

More logs at:

  
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2019.12.html#2019.12.07.14.55.58

Could everyone please refrain from committing new kernel-crashing bugs
until the test infrastructure has recovered from the previous round?
-- 
Andreas Gustafsson, g...@netbsd.org


Re: Current test failures

2019-12-07 Thread Andreas Gustafsson
Martin Husemann wrote:
> Here is a simple recipe to reproduce the massive test lossage in -current:
> 
>   cd /usr/tests/dev/raidframe && atf-run

I have now bisected it down to the following commits:

  2019.12.05.03.21.08 riastradh src/sys/kern/subr_percpu.c 1.20
  2019.12.05.03.21.17 riastradh src/sys/kern/subr_pserialize.c 1.16
  2019.12.05.03.21.29 riastradh src/sys/kern/subr_pserialize.c 1.17
  2019.12.05.03.21.42 riastradh src/external/cddl/osnet/sys/sys/opentypes.h 1.5

-- 
Andreas Gustafsson, g...@netbsd.org


Testbed breakage

2019-12-06 Thread Andreas Gustafsson
Hi all,

For the last few days, most of the testbeds have been seeing the
system under test either hang or panic before the ATF tests have run
to completion.  The failures are too many and varied to file a PR
about each, but for a start, you can look for "tests: did not
complete" in the following:

  http://releng.netbsd.org/b5reports/i386/commits-2019.12.html
  http://releng.netbsd.org/b5reports/amd64/commits-2019.12.html
  http://releng.netbsd.org/b5reports/evbarm-aarch64/commits-2019.12.html
  http://releng.netbsd.org/b5reports/pmax/commits-2019.12.html

For sparc, there is PR 54734.  Both qemu and gxemul based testbeds are
failing, but my i386 and amd64 testbeds running on real hardware are
not (other than the latest amd64 test run showing 1336 new test
failures, which looks like an unrelated bug).  That the failing hosts
are uniprocessors and the working ones are multiprocessors may or may
not be a coincidence.

Please help find and fix the offending commit(s); until that is done,
there can be very little automated testing of new commits.
-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Steffen Nurpmeso wrote:
> This thread reminds me of me turning off hyperthreading.
> Using the four cores i have with HT turned on results in a 40
> percent time penalty compared to when its off.  (For example,
> compiling the Linux kernel 4.19.X takes almost exactly 10 minutes
> when it is turned off, and about 14 minutes when it is turned
> on.  Just a thought.)

FWIW, these tests were run with hyperthreading disabled.
-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Mateusz Guzik wrote:
> >   http://www.gson.org/netbsd/bugs/system-time/fg.svg
> 
> First thing which jumps at me is DIAGNOSTIC being on (seen with e.g.,
> _vstate_assert). Did your older kernels have it? If you just compiled
> GENERIC from release branches it is presumably removed, so would be
> nice to retest without it.

All the versions tested were built from the CVS trunk, and all used
the GENERIC kernel.  The only thing from a release branch was the
build target (8.1), which was the same in all test runs.

> That said, can you rerun without DIAGNOSTIC but with lockstat?

I'd rather leave that to someone else, and to a separate thread.  All
the test results presented in this thread were produced with the same
options so that they can be meaningfully compared, and running new
tests with different options would only confuse things.
-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Jaromír Doleček wrote:
> I wonder also if we could try enabling vm.ubc_direct on the build machine?

Using 2019.11.14.13.58.22 sources:

with default settings:
4612.56 real 16896.10 user  9325.87 sys

with vm.ubc_direct = 1:
4615.95 real 16819.96 user  9416.13 sys

-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Mateusz Guzik wrote:
> Can you get a kernel-side flamegraph?

Done, using sources from 2019.11.14.13.58.22:

  http://www.gson.org/netbsd/bugs/system-time/fg.svg

-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-14 Thread Andreas Gustafsson
Michael van Elst wrote:
> g...@gson.org (Andreas Gustafsson) writes:
> 
> >mitigations, which I guess is not really surprising.  But the 12% net
> >increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
> >to call for closer investigation.
> 
> Is this also reflected in real time?

Only partly.  With the jemalloc change, the system time increased by
938 seconds, but the real time only increased by 170 seconds, and the
user time decreased by 89 seconds:

  2019.03.08.20.34.24/build_8.log.gz: 4229.21 real 16686.03 user  
7932.08 sys
  2019.03.08.20.35.10/build_8.log.gz: 4398.88 real 16597.25 user  
8870.49 sys

With the vfs_vnode.c change, the system time increased by 305 seconds,
but the real time only increased by 35 seconds:

  2016.12.14.15.48.55/build_8.log.gz: 3934.44 real 15707.68 user  
4243.02 sys
  2016.12.14.15.49.35/build_8.log.gz: 3969.58 real 15718.85 user  
4548.50 sys
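
The differences quoted above can be recomputed from the time(1) lines
with a small helper.  This is only a sketch: the Times type and the
function names are invented for illustration, and the input strings are
the jemalloc before/after lines with the log file names stripped.

```python
from typing import NamedTuple


class Times(NamedTuple):
    real: float
    user: float
    sys: float


def parse_time_output(text: str) -> Times:
    """Parse '4229.21 real 16686.03 user 7932.08 sys' into a Times tuple."""
    fields = text.split()
    # Fields alternate between a number and its label.
    values = {label: float(number)
              for number, label in zip(fields[0::2], fields[1::2])}
    return Times(values["real"], values["user"], values["sys"])


def delta(before: Times, after: Times) -> Times:
    """Per-field difference (after minus before), rounded to 0.01 s."""
    return Times(*(round(a - b, 2) for a, b in zip(after, before)))


if __name__ == "__main__":
    before = parse_time_output("4229.21 real 16686.03 user  7932.08 sys")
    after = parse_time_output("4398.88 real 16597.25 user  8870.49 sys")
    print(delta(before, after))
```

A negative field in the result means that the value went down, as the
user time did with the jemalloc change.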

-- 
Andreas Gustafsson, g...@gson.org


Increases in build system time

2019-11-14 Thread Andreas Gustafsson


Hi all,

Back in September, I wrote:
> I'm trying to run a bisection to determine why builds hosted on recent
> versions of NetBSD seem to be taking significantly more system time
> than they used to, building the same thing.

I finally have some results to report.  These are from builds of the
NetBSD-8/amd64 release hosted on various versions of -current/amd64,
on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
amount of system time taken by each build was measured using time(1).

Between a -current from September 2016 and one from October 2019, the
system time more than doubled, from 4245 seconds to 9344 seconds.
The time(1) output from the oldest and newest version was:

3930.86 real 15737.04 user  4245.26 sys
4461.47 real 16687.37 user  9344.68 sys

This means that on the recent -current, on average, roughly four of
the eight cores were executing the build tools (compilers, etc),
roughly two were executing the kernel, and the remaining two were
presumably idle.
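
The per-core estimate follows from dividing the accumulated user and
system times by the elapsed real time.  A minimal sketch of that
arithmetic, using the figures from the newest build above (the function
name is made up for illustration, and idle time is simply whatever is
left over out of 8 cores):

```python
def busy_cores(real: float, user: float, sys_time: float, ncpu: int = 8):
    """Average number of cores spent in user mode, in the kernel, and idle."""
    user_cores = user / real
    sys_cores = sys_time / real
    idle_cores = ncpu - user_cores - sys_cores
    return user_cores, sys_cores, idle_cores


if __name__ == "__main__":
    user_cores, sys_cores, idle_cores = busy_cores(4461.47, 16687.37, 9344.68)
    print(f"user {user_cores:.1f}  sys {sys_cores:.1f}  idle {idle_cores:.1f}")
```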

The increase did not happen all at once but in several smaller steps
as shown in this graph:

  http://www.gson.org/netbsd/bugs/system-time/graph.png

For each step, finding the commits that caused it required a separate
bisection.  Each bisection took 1-2 days to run, so I have only
bisected the largest steps, those of 5 percent or more.  They are
listed below in order from largest to smallest, with CVS revisions
and commit messages.

  38% increase:

2018.04.04.12.59.49 maxv src/sys/arch/amd64/amd64/machdep.c 1.303
2018.04.04.12.59.49 maxv src/sys/arch/x86/include/cpu.h 1.91
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/cpu.c 1.154
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/spectre.c 1.8

Enable the SpectreV2 mitigation by default at boot time.

  12% increase:

2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108

Back to using jemalloc for x86_64; all problems have been resolved.

  9% increase:

2018.02.26.05.52.50 maxv src/sys/arch/amd64/conf/GENERIC 1.485

Enable SVS by default.

  7% increase:

2016.12.14.15.49.35 hannken src/sys/kern/vfs_vnode.c 1.63

Change the freelists to lrulists, all vnodes are always on one
of the lists.  Speeds up namei on cached vnodes by ~3 percent.

Merge "vrele_thread" into "vdrain_thread" so we have one thread
working on the lrulists.  Adapt vfs_drainvnodes() to always wait
for a complete cycle of vdrain_thread().

  5% increase:

2018.04.07.22.39.31 christos src/external/Makefile 1.21
2018.04.07.22.39.31 christos src/external/README 1.16
[302 more revisions by christos elided]
2018.04.07.22.39.53 christos src/external/bsd/Makefile 1.59
2018.04.07.22.41.55 christos src/doc/3RDPARTY 1.1515
2018.04.07.22.41.55 christos src/doc/CHANGES 1.2376
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/ALL 1.85
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/GENERIC 1.489
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/ALL 1.437
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/GENERIC 1.1177
2018.04.08.01.30.01 christos src/external/mpl/Makefile 1.1

[Too many commit messages to list here, but the following from
mrg's commit of src/sys/arch/amd64/conf/GENERIC 1.489 may
be relevant]

turn on GCC spectre v2 mitigation options.

  5% increase:

2019.03.10.15.32.42 christos src/external/bsd/jemalloc/lib/Makefile.inc 1.5

turn on debugging to help find problems

  5% decrease:

2019.07.23.06.31.20 martin src/external/bsd/jemalloc/lib/Makefile.inc 1.10

Disable JEMALLOC_DEBUG, it served us well, but now we want performance
back. Discussed with christos.

To summarize, most of the increase was due to Spectre and Meltdown
mitigations, which I guess is not really surprising.  But the 12% net
increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
to call for closer investigation.
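
As a rough cross-check, the bisected steps can be multiplied together
and compared with the overall increase.  This sketch assumes that each
percentage is relative to the system time just before that step, so
that the steps compound; the text above does not say how the
percentages were computed, so treat the result as indicative only.

```python
from math import prod

# Bisected steps from the list above, in percent (the last one is the decrease).
STEPS_PERCENT = [38, 12, 9, 7, 5, 5, -5]


def compounded_factor(steps_percent) -> float:
    """Overall growth factor if every step applies to the previous total."""
    return prod(1.0 + step / 100.0 for step in steps_percent)


if __name__ == "__main__":
    bisected = compounded_factor(STEPS_PERCENT)
    observed = 9344.68 / 4245.26  # newest / oldest system time
    print(f"bisected steps: x{bisected:.2f}")
    print(f"observed:       x{observed:.2f}")
    print(f"not accounted:  x{observed / bisected:.2f}")
```

Under that assumption the bisected steps account for a bit less than
the observed doubling, and the remainder would come from the smaller
steps that were not bisected.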
-- 
Andreas Gustafsson, g...@gson.org


Re: vm.ubc_direct

2019-11-14 Thread Andreas Gustafsson
Patrick Welche wrote:
> I have been running with vm.ubc_direct=1 and feeling a speedup and no
> inconveniences on multicore systems. What are thoughts on having it as
> a default?

No such option is documented, hence the question makes no sense.
-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2019-11-12 Thread Andreas Gustafsson
Robert Elz wrote:
> There's no info in the log (the available log on the website) to allow
> anyone to work out what happened there (why is make debug enabled?  The
> log that is there is filled with make debug noise)

I don't know the answer to this, but the question is known as PR 53561.
-- 
Andreas Gustafsson, g...@gson.org


Re: Xen kernel diagnostic assertion "powerof2(align)" failed

2019-11-03 Thread Andreas Gustafsson
Manuel Bouyer wrote:
> Hello,
> the xen test run for the 201911010850Z build fails with:
> [   1.1185040] panic: kernel diagnostic assertion "powerof2(align)" failed: 
> file "/usr/src/sys/uvm/uvm_map.c", line 196
[...]
> Has it been fixed since then ?

Yes, by:

  commit 2019.11.01.13.04.22 rin src/sys/uvm/uvm_map.c 1.366

-- 
Andreas Gustafsson, g...@gson.org


Re: time(1) reporting corrupted system time

2019-10-31 Thread Andreas Gustafsson
Mateusz Guzik wrote:
> Hi, I failed to find a follow up to this.
> 
> I see someone gave you the fix for corrupted time accounting.
> Did you get around to finding the offending commit?

For the corrupted system time, I believe the offending commit was
kern_resource.c 1.180, and the fix was 1.182.

As for the increased system time taken by release builds, it has
happened in multiple steps.  I have bisected the largest increases,
but analyzing and writing up the results for current-users is still
on my "to do" list.
-- 
Andreas Gustafsson, g...@gson.org


-current panics on boot in atabus_alloc_drives()

2019-10-22 Thread Andreas Gustafsson
Christos,

Both i386 and amd64 are failing to install on the testbed due to the
install kernel panicking on boot.  The amd64 console log contains a
backtrace:

  [   1.0264751] panic: lock error: Mutex: mutex_vector_exit,742: exiting 
unheld spin mutex: lock 0xc83161f0 cpu 0 lwp 0xc217e7c90540
  [   1.0264751] cpu0: Begin traceback...
  [   1.2751115] vpanic() at netbsd:vpanic+0x160
  [   1.2751115] snprintf() at netbsd:snprintf
  [   1.3032327] lockdebug_abort() at netbsd:lockdebug_abort+0xee
  [   1.3032327] mutex_vector_exit() at netbsd:mutex_vector_exit+0xbd
  [   1.3032327] atabus_alloc_drives() at netbsd:atabus_alloc_drives+0x61
  [   1.3231953] wdc_drvprobe() at netbsd:wdc_drvprobe+0x40
  [   1.3432056] atabusconfig() at netbsd:atabusconfig+0x65
  [   1.3432056] atabus_thread() at netbsd:atabus_thread+0x7e
  [   1.3432056] cpu0: End traceback...

This is from:

  http://releng.netbsd.org/b5reports/amd64/2019/2019.10.21.19.00.11/install.log

On i386, the only kernel commits between the last success and the failure
were:

  2019.10.21.18.37.47 christos src/sys/dev/ata/ata.c 1.152
  2019.10.21.18.58.57 christos src/sys/dev/ata/ata.c 1.153
  2019.10.21.19.00.11 christos src/sys/dev/pci/satalink.c 1.57

-- 
Andreas Gustafsson, g...@gson.org


Re: time(1) reporting corrupted system time

2019-09-29 Thread Andreas Gustafsson
Michael van Elst wrote:
> First there was a change to prevent user/system time from decreasing
> in kern_resource.c 1.180. But it shouldn't be related to negative numbers.
> 
> Additionally a possible underflow of user/system time was fixed in
> kern_resource.c 1.182. This prevents negative numbers, but IIRC this
> would only happen for very small values, not when values already
> accumulated to a few thousand seconds.

Thanks.  I think what happened is that 1.180 caused the bug, and 1.182
fixed it.  In any case, I think I now have what I need to back-port
the fix and bisect the other bug.
-- 
Andreas Gustafsson, g...@gson.org


time(1) reporting corrupted system time

2019-09-29 Thread Andreas Gustafsson
Hi all,

I'm trying to run a bisection to determine why builds hosted on recent
versions of NetBSD seem to be taking significantly more system time
than they used to, building the same thing.

My efforts are hampered by time(1) reporting corrupted system times on
certain past versions of -current:

  2017.01.01.03.06.06/build_8.log: 3562.32 real 15806.10 user  
4893.62 sys
  2018.05.21.10.28.13/build_8.log: 4250.22 real 16835.23 user 
608742554440425.55 sys
  2019.01.30.20.20.36/build_8.log: 4228.25 real 16801.48 user 
700976274808841.24 sys
  2019.09.27.08.57.12/build_8.log: 4488.49 real 16670.79 user  
9279.25 sys

Does anyone happen to know which commits caused and/or fixed this?
This information could save me a couple of days of bisection run time.
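
A bisection script can flag the corrupted samples automatically,
because user plus system time cannot exceed the real time multiplied by
the number of CPUs.  The sketch below is an illustration only: the
regular expression, the function name, and the default of 8 CPUs are
assumptions, and the wrapped log lines are joined back into one line
each.

```python
import re

# Matches the "<n> real <n> user <n> sys" part of a time(1) line.
TIME_RE = re.compile(r"([\d.]+)\s+real\s+([\d.]+)\s+user\s+([\d.]+)\s+sys")


def is_plausible(line: str, ncpu: int = 8) -> bool:
    """Return False if the CPU time exceeds what ncpu CPUs could deliver."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"no time(1) output in: {line!r}")
    real, user, sys_time = map(float, match.groups())
    return user + sys_time <= real * ncpu


if __name__ == "__main__":
    samples = [
        "2017.01.01.03.06.06/build_8.log: 3562.32 real 15806.10 user 4893.62 sys",
        "2018.05.21.10.28.13/build_8.log: 4250.22 real 16835.23 user 608742554440425.55 sys",
    ]
    for sample in samples:
        print(is_plausible(sample), sample.split("/")[0])
```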
-- 
Andreas Gustafsson, g...@gson.org


Re: [Small Heads up] USE_SHLIBDIR=yes added to some library Makefiles

2019-09-23 Thread Andreas Gustafsson
Brad Spencer wrote:
> I committed a change today to add USE_SHLIBDIR=yes to the libraries used
> by /sbin/{zfs,mount_zfs,zpool}.  The general effect will be to move the
> libraries from /usr/lib to /lib and put compatibility links in place so
> that things, say in /usr/pkg, continue to work as expected.  Run tested
> on amd64 and i386 and compile tested on evbarm.  This will allow /usr
> and /var to be mounted as a ZFS legacy filesystem and keeping with the
> apparent pattern of having items in /sbin depend on items in /lib and
> not /usr/lib.
> 
> Sorry if this breaks anything.

Looks like it broke MKDEBUG=yes builds:

  ===  8 extra files in DESTDIR  =
  Files in DESTDIR but missing from flist.
  File is obsolete or flist is out of date ?
  --
  ./usr/libdata/debug/lib/libavl.so.0.0.debug
  ./usr/libdata/debug/lib/libnvpair.so.0.0.debug
  ./usr/libdata/debug/lib/libpthread.so.1.4.debug
  ./usr/libdata/debug/lib/libumem.so.0.0.debug
  ./usr/libdata/debug/lib/libuutil.so.0.0.debug
  ./usr/libdata/debug/lib/libzfs.so.0.0.debug
  ./usr/libdata/debug/lib/libzfs_core.so.0.0.debug
  ./usr/libdata/debug/lib/libzpool.so.0.0.debug
  =  end of 8 extra files  ===

  *** [checkflist] Error code 1

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2019-09-16 Thread Andreas Gustafsson
The build is still failing as of source date 2019.09.16.04.59.32,
now with a different error:

  ==  2 missing files in DESTDIR  
  Files in flist but missing from DESTDIR.
  File wasn't installed ?
  --
  ./usr/share/man/html8/mount_zfs.html
  ./usr/share/man/man8/mount_zfs.8
    end of 2 missing files  ==

-- 
Andreas Gustafsson, g...@gson.org


Re: Automated report: NetBSD-current/i386 build failure

2019-09-08 Thread Andreas Gustafsson
The build is still failing as of source date 2019.09.08.11.53.23, with:

  /tmp/bracket/build/2019.09.08.11.53.23-i386/src/sys/dev/usb/xhci.c: In 
function 'xhci_address_device':
  /tmp/bracket/build/2019.09.08.11.53.23-i386/src/sys/dev/usb/xhci.c:2893:26: 
error: suggest braces around empty body in an 'else' statement 
[-Werror=empty-body]
 icp, slot_id, 0, 0);

-- 
Andreas Gustafsson, g...@gson.org

