obsolete manhtml files not deleted

2024-04-11 Thread Thomas Klausner
Hi!

I just upgraded from a couple months ago and found:

Only in /usr/share/man/html8: dnssec-dsfromkey.html
Only in /usr/share/man/html8: dnssec-importkey.html
Only in /usr/share/man/html8: dnssec-keyfromlabel.html
Only in /usr/share/man/html8: dnssec-keygen.html
Only in /usr/share/man/html8: dnssec-revoke.html
Only in /usr/share/man/html8: dnssec-settime.html
Only in /usr/share/man/html8: dnssec-signzone.html
Only in /usr/share/man/html8: dnssec-verify.html
Only in /usr/share/man/html8: named-checkconf.html
Only in /usr/share/man/html8: named-checkzone.html
Only in /usr/share/man/html8: named-compilezone.html
Only in /usr/share/man/html8: named-journalprint.html
Only in /usr/share/man/html8: nsec3hash.html

some files that were in my old install and not the new one, and which
should have been deleted.

The man8 versions of these files are indeed gone.

I see that these are listed in the manhtml set - does the "postinstall
fix" step for cleaning up obsolete files need changes for supporting
manhtml?
 Thomas
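
A minimal sketch of the kind of check that would flag such leftovers, assuming a plain-text list with one expected file name per line. The function name `list_stale` and the list file are hypothetical; this is not how postinstall itself is implemented.

```sh
# list_stale DIR LISTFILE
# Print the names of *.html files in DIR that are not listed
# (one name per line) in LISTFILE.
list_stale() {
    dir=$1
    list=$2
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=${f##*/}                    # strip the directory part
        grep -qxF -- "$name" "$list" || printf '%s\n' "$name"
    done
}
```

Run against /usr/share/man/html8 and a list derived from the current manhtml set, this would print candidates for removal without deleting anything.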


header installation not make-jobs safe?

2024-04-11 Thread Thomas Klausner
Hi!

I had an interesting build failure today when using

build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes \
    -T /usr/obj/tools.gcc -m amd64 -O /usr/obj/src.amd64 \
    -D /usr/obj/amd64.gcc.20240411 -R /usr/obj/amd64.gcc.20240411.release \
    distribution

The build stopped quite early with:

--- /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h ---
*** Failed target: /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h
*** Failed commands:
@cmp -s ${.ALLSRC} ${.TARGET} > /dev/null 2>&1 ||  (${_MKSHMSG_INSTALL} 
${.TARGET};  ${_MKSHECHO} "${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP}  -m 
${NONBINMODE} ${.ALLSRC} ${.TARGET}" &&  ${INSTALL_FILE} -o ${BINOWN} -g 
${BINGRP}  -m ${NONBINMODE} ${.ALLSRC} ${.TARGET})
=> @cmp -s krb5_asn1.h 
/usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h > /dev/null 2>&1 ||  
(echo '#  ' "install " 
/usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h;  echo 
"/usr/obj/tools.gcc/bin/x86_64--netbsd-install  -N 
/disk/storage-202004/archive/foreign/src/etc -c  -r -o root -g wheel  -m 444 
krb5_asn1.h /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h" &&  
/usr/obj/tools.gcc/bin/x86_64--netbsd-install  -N 
/disk/storage-202004/archive/foreign/src/etc -c  -r -o root -g wheel  -m 444 
krb5_asn1.h /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h)
*** [/usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h] Error code 1
nbmake[4]: stopped in 
/disk/storage-202004/archive/foreign/src/crypto/external/bsd/heimdal/lib/libasn1

A second try of the same command on the same machine with the same
sources succeeded.

Full log available on request.
 Thomas


Re: tmux-direct entry only has 8 colors

2024-03-13 Thread Thomas Klausner
On Wed, Jan 31, 2024 at 10:31:41PM +0100, Thomas Klausner wrote:
> I've tried to get my terminal+tmux to display true colors today using
> the latest terminfo as imported to NetBSD.

I debugged this a bit further.  It works fine if I just use two
entries, but as soon as a third in the style of kitty+setal is added,
it breaks.


--- terminfotest2 ---
kitty+setal|set underline colors (nonstandard),
setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,

minfirst|TermInfo Test,
# if the second number is >32767, it disappears!
   use=min, use=max,
maxfirst|TermInfo Test,
# putting the bigger one first makes "promotion" happen.
   use=max, use=min,

max|any number > INT16_MAX,
   colors#16777216,

min|any num < INT16_MAX,
   colors#8,

kittymin|kitty+min,
use=kitty+setal, use=min, use=max
kittymax|kitty+max,
use=kitty+setal, use=max, use=min
--- end of terminfotest2 ---

> tic -x terminfotest2

> infocmp -1x -A /home/wiz/terminfotest2.cdb minfirst
# Reconstructed from /home/wiz/terminfotest2.cdb
minfirst|TermInfo Test,
colors#8,
> infocmp -1x -A /home/wiz/terminfotest2.cdb maxfirst
# Reconstructed from /home/wiz/terminfotest2.cdb
maxfirst|TermInfo Test,
colors#16777216,

Here you can see the first encountered definition wins.

> infocmp -1x -A /home/wiz/terminfotest2.cdb kittymin
# Reconstructed from /home/wiz/terminfotest2.cdb
kittymin|kitty+min,
colors#8,
setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,
> infocmp -1x -A /home/wiz/terminfotest2.cdb kittymax
# Reconstructed from /home/wiz/terminfotest2.cdb
kittymax|kitty+max,
colors#8,
setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,

but here it no longer does, or does kitty+setal's non-definition count
as a colors#8?

I've filed PR 58034 for this. Roy, Christos, can either of you please
take a look?

Thanks,
 Thomas
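
The "first encountered definition wins" rule seen with minfirst and maxfirst can be modelled in a few lines of shell. This is only a sketch of the expected lookup order, not the tic or infocmp implementation; the function name `first_cap` and the flattened entry format (comma-separated, no spaces) are assumptions made for illustration.

```sh
# first_cap NAME ENTRY...
# Each ENTRY is a comma-separated capability list without spaces,
# e.g. "colors#8,pairs#64".  Prints the value of the first NAME#value
# found while scanning the entries from left to right.
first_cap() {
    cap=$1
    shift
    for entry in "$@"; do
        old_ifs=$IFS
        IFS=,
        set -f                           # no globbing while splitting
        for item in $entry; do
            case $item in
            "$cap#"*)
                IFS=$old_ifs
                set +f
                value=${item#"$cap#"}    # drop the "NAME#" prefix
                printf '%s\n' "$value"
                return 0
                ;;
            esac
        done
        IFS=$old_ifs
        set +f
    done
    return 1                             # NAME not defined anywhere
}
```

Under this model an entry that does not define colors at all, such as kitty+setal, is simply skipped, so a use= chain of kitty+setal, max, min would be expected to yield 16777216 rather than the 8 reported above.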


Re: bug in ftp(1)?

2024-02-18 Thread Thomas Klausner
On Sun, Feb 18, 2024 at 12:19:57PM -, Michael van Elst wrote:
> w...@netbsd.org (Thomas Klausner) writes:
> 
> >ftp: Receiving HTTP reply: Input line is too long
> 
> #define   FTPBUFLEN   (4 * MAXPATHLEN)
> char buf[FTPBUFLEN];
> 
> That's 4kB.
> 
> >curl -v https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2
> 
> This returns a 5kB HTTP header "content-security-policy".
> 
> There is no protocol limit, but common server implementations do limit header
> lines to something between 4k (some nginx versions) to 48k (tomcat).

Thanks for the analysis. I've increased the size to 16kB.
 Thomas
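
To see whether a given response would trip a fixed-size line buffer, the longest header line can be measured directly. The following is a rough sketch; the function names `longest_line_len` and `header_fits` are made up for illustration, and the limits 4096 and 16384 are the old and new buffer sizes mentioned in this thread.

```sh
# longest_line_len: print the byte length of the longest line on stdin.
longest_line_len() {
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }'
}

# header_fits LIMIT: succeed if every line on stdin is shorter than
# LIMIT bytes, leaving one byte for the terminating NUL of a C buffer.
header_fits() {
    [ "$(longest_line_len)" -lt "$1" ]
}
```

For example, `curl -s -D - -o /dev/null URL | header_fits 4096` would fail for a response carrying a 5kB header line, while the same check against 16384 would pass. A trailing carriage return on each header line is counted as part of the line.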


bug in ftp(1)?

2024-02-18 Thread Thomas Klausner
Hi!

When fetching the distfile for mail/courier-unicode, I see:

=> Bootstrap dependency digest>=20211023: found digest-20220214
=> Fetching courier-unicode-2.3.0.tar.bz2
=> Total size: 657354 bytes
Trying [2606:4700:4400::ac40:9691]:443 ...
Requesting https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2
ftp: Receiving HTTP reply: Input line is too long
fetch: Unable to fetch expected file courier-unicode-2.3.0.tar.bz2
...

wget fetches the file fine.

curl -v gives some more information on the return value:


curl -v https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2
* Host sourceforge.net:443 was resolved.
* IPv6: 2606:4700:4400::ac40:9691, 2606:4700:4400::6812:256f
* IPv4: 104.18.37.111, 172.64.150.145
*   Trying [2606:4700:4400::ac40:9691]:443...
* Connected to sourceforge.net (2606:4700:4400::ac40:9691) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: none
*  CApath: /etc/openssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / 
id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; 
CN=sourceforge.net
*  start date: Feb  4 00:00:00 2024 GMT
*  expire date: Dec 31 23:59:59 2024 GMT
*  subjectAltName: host "sourceforge.net" matched cert's "sourceforge.net"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), 
signed using ecdsa-with-SHA256
*   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), 
signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (2048/112 Bits/secBits), signed 
using sha1WithRSAEncryption
* using HTTP/2
* [HTTP/2] [1] OPENED stream for 
https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: sourceforge.net]
* [HTTP/2] [1] [:path: 
/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2]
* [HTTP/2] [1] [user-agent: Mozilla/5.0]
* [HTTP/2] [1] [accept: */*]
> GET 
> /projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2 
> HTTP/2
> Host: sourceforge.net
> User-Agent: Mozilla/5.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 301
< date: Sun, 18 Feb 2024 11:09:47 GMT
< content-type: text/html; charset=UTF-8
< location: 
https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2/
< cache-control: no-cache
< pragma: no-cache
< x-ua-compatible: IE=edge,chrome=1
< permissions-policy: geolocation=(), microphone=(), camera=(), payment=(), 
document-domain=(), display-capture=(), autoplay=()
< feature-policy: geolocation 'none'; microphone 'none'; camera 'none'; payment 
'none'; document-domain 'none'; display-capture 'none'; autoplay 'none'
< x-frame-options: SAMEORIGIN
< content-security-policy: frame-ancestors 'self'; script-src 'self' 
adservice.google.co.jp adservice.google.co.tz adservice.google.nr *.crsspxl.com 
adservice.google.ge adservice.google.com.gi adservice.google.com.br 
adservice.google.com.tr adservice.google.so adservice.google.com.pe 
adservice.google.com.sb adservice.google.st *.sharethrough.com 
adservice.google.com.co adservice.google.com.pk adservice.google.ad 
adservice.google.cv adservice.google.ws adservice.google.gm adservice.google.gy 
adservice.google.tn adservice.google.no adservice.google.rs *.gstatic.cn 
*.googlesyndication.com adservice.google.com.bn adservice.google.tm 
http://c.sf-syn.com translate.googleapis.com adservice.google.com.my 
adservice.google.as *.google.com adservice.google.com.tw *.2mdn.net 
adservice.google.de adservice.google.lu adservice.google.com.hk 
adservice.google.pl adservice.google.gg adservice.google.tt 
adservice.google.com.pa adservice.google.vu adservice.google.co.ve 
adservice.google.fi adservice.google.mu adservice.google.vg adservice.google.to 
adservice.google.co.th adservice.google.iq adservice.google.ml 
adservice.google.com.bo adservice.google.com.ai adservice.google.com.uy 
adservice.google.ro adservice.google.ae adservice.google.cg *.trustarc.com 
adservice.google.co.bw adservice.google.tg adservice.google.com.eg *.tiny.cloud 
adservice.google.rw adservice.google.cz adservice.google.gr 
adservice.google.co.id 

tmux-direct entry only has 8 colors

2024-01-31 Thread Thomas Klausner
Hi!

I've tried to get my terminal+tmux to display true colors today using
the latest terminfo as imported to NetBSD.

Either I misunderstand something or the tmux-direct entry is broken.

> infocmp tmux-direct
# Reconstructed from /usr/share/misc/terminfo.cdb
tmux-direct|tmux with direct-color indexing,
am, hs, km, mir, msgr, xenl,
colors#8, cols#80, it#8, lines#24, pairs#64,
...

It shouldn't be 'colors#8', but 16777216, which is the whole point of
the "-direct" entries.

xterm-direct looks good:

> infocmp xterm-direct
# Reconstructed from /usr/share/misc/terminfo.cdb
xterm-direct|xterm with direct-color indexing,
am, bce, km, mc5i, mir, msgr, npc, xenl,
colors#16777216, cols#80, it#8, lines#24, pairs#65536,
...


Reading terminfo.src I see:

tmux-direct|tmux with direct-color indexing,
use=kitty+setal, use=xterm+direct, use=tmux,

> infocmp kitty+setal
kitty+setal|set underline colors (nonstandard),

Compare that to terminfo.src:

kitty+setal|set underline colors (nonstandard),
setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1
  %{255}%&%dm,

Seems like NetBSD's infocmp (or terminfo) doesn't support setal,
sounds like a bug.



> infocmp xterm+direct
# Reconstructed from /usr/share/misc/terminfo.cdb
xterm+direct|xterm with direct-color indexing (building-block),
colors#16777216, pairs#65536,
op=\E[39;49m,
setab=\E[%?%p1%{8}%<%t4%p1%d%e48:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m,
setaf=\E[%?%p1%{8}%<%t3%p1%d%e38:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m,

Compared to terminfo file:

xterm+direct|xterm with direct-color indexing (building-block),
RGB,
colors#0x100, pairs#0x1, CO#8,
initc@, op=\E[39;49m,
setab=\E[%?%p1%{8}%<%t4%p1%d%e48:2::%p1%{65536}%/%d:%p1
  %{256}%/%{255}%&%d:%p1%{255}%&%d%;m,
setaf=\E[%?%p1%{8}%<%t3%p1%d%e38:2::%p1%{65536}%/%d:%p1
  %{256}%/%{255}%&%d:%p1%{255}%&%d%;m,
setb@, setf@,

Again, a couple things seem to get lost (RGB, CO#8, initc@, setb@,
setf@) but the colors are there.

> infocmp tmux
# Reconstructed from /usr/share/misc/terminfo.cdb
tmux|tmux terminal multiplexer,
am, hs, km, mir, msgr, xenl,
colors#8, cols#80, it#8, lines#24, pairs#64,
...

so the colors get lost here because colors#8 overwrites the
xterm+direct entry.

From terminfo:
tmux|tmux terminal multiplexer,
invis=\E[8m, rmso=\E[27m,
sgr=\E[0%?%p6%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?
%p5%t;2%;%?%p7%t;8%;m%?%p9%t\016%e\017%;,
smso=\E[7m, E3=\E[3J, Smulx=\E[4:%p1%dm,
use=ecma+italics, use=ecma+strikeout, use=xterm+edit,
use=xterm+pcfkeys, use=xterm+sl, use=xterm+tmux,
use=screen, use=bracketed+paste, use=report+version,
use=xterm+focus,

I looked at the 'use' entries and some of them are empty (which makes me
think NetBSD's infocmp or terminfo is missing more features) and then I
found:

# Reconstructed from /usr/share/misc/terminfo.cdb
screen|VT 100/ANSI X3.64 virtual terminal,
am, km, mir, msgr, xenl,
colors#8, cols#80, it#8, lines#24, pairs#64,
...

So it looks to me like the 'colors' from the 'screen' entry via the
'tmux' entry overwrite the colors defined by 'xterm+direct'.

When I run infocmp from ncurses 6.4, I get a different output for
tmux-direct:

# /usr/pkg/bin/infocmp tmux-direct
#   Reconstructed via infocmp from file: /usr/pkg/share/terminfo/t/tmux-direct
tmux-direct|tmux with direct-color indexing,
am, hs, km, mir, msgr, xenl,
colors#0x7fff, cols#80, it#8, lines#24, pairs#0x7fff,
acsc=++\,\,--..00``aaffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
clear=\E[H\E[J, cnorm=\E[34h\E[?25h, cr=\r,
csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H,
cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C,
cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\EM,
cvvis=\E[34l, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m,
dl=\E[%p1%dM, dl1=\E[M, dsl=\E]0;\007, ed=\E[J, el=\E[K,
el1=\E[1K, enacs=\E(B\E)0, flash=\Eg, fsl=^G, home=\E[H,
hpa=\E[%i%p1%dG, ht=^I, hts=\EH, ich=\E[%p1%d@,
il=\E[%p1%dL, il1=\E[L, ind=\n, indn=\E[%p1%dS,
invis=\E[8m, is2=\E)0, kDC=\E[3;2~, kEND=\E[1;2F,
kHOM=\E[1;2H, kIC=\E[2;2~, kLFT=\E[1;2D, kNXT=\E[6;2~,
kPRV=\E[5;2~, kRIT=\E[1;2C, kbs=^?, kcbt=\E[Z, kcub1=\EOD,
kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kdch1=\E[3~,
kend=\E[4~, kf1=\EOP, kf10=\E[21~, kf11=\E[23~,
kf12=\E[24~, kf13=\E[1;2P, kf14=\E[1;2Q, kf15=\E[1;2R,
kf16=\E[1;2S, kf17=\E[15;2~, kf18=\E[17;2~,
kf19=\E[18;2~, kf2=\EOQ, kf20=\E[19;2~, kf21=\E[20;2~,
kf22=\E[21;2~, kf23=\E[23;2~, kf24=\E[24;2~,
kf25=\E[1;5P, kf26=\E[1;5Q, kf27=\E[1;5R, kf28=\E[1;5S,
kf29=\E[15;5~, kf3=\EOR, kf30=\E[17;5~, 

mktemp POSIX (and Linux) divergence

2024-01-16 Thread Thomas Klausner
Hi!

Our mktemp man page says:

RETURN VALUES
 The mktemp() and mkdtemp() functions return a pointer to the template on
 success and NULL on failure.

But POSIX[1] (and Linux) say:

The mktemp() function shall return the pointer template. If a unique name 
cannot be created, template shall point to a null string.

where 'null string' is[2]

  3.146 Empty String (or Null String)

  A string whose first byte is a null byte

So NetBSD's mktemp returns NULL on error, while Linux returns a
pointer to a string of length 0.

I think mktemp has been removed from POSIX in the meantime, but should
we switch to the POSIX behaviour?

 Thomas


[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/mktemp.html
[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html


Update ARFLAGS?

2023-12-28 Thread Thomas Klausner
Hi!

As noted in PR 57565, the default ARFLAGS in share/mk/sys.mk are
broken - they use 'l' which changed behaviour between binutils 2.34
and 2.39.

Ok to commit the change?

(This broke the build of ruby-nokogiri recently, which is how I
noticed.)
 Thomas


Re: grafana rc.d scripts reports process not running

2023-12-27 Thread Thomas Klausner
On Wed, Dec 27, 2023 at 08:53:10AM +, RVP wrote:
> A way to check for a process-name different from the command-name seems
> to be documented in /etc/rc.subr. Does this patch work?

I think that works too, yes.
 Thomas


Re: grafana rc.d scripts reports process not running

2023-12-15 Thread Thomas Klausner
Thanks for the suggestions.

It turns out that starting 'grafana-server ...' ends up starting
'grafana server ...' which made the process name check fail - it
expected arg0 to be grafana-server, not grafana.

I've changed the script to start grafana as 'grafana server' instead
and it works now.
 Thomas
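
The mismatch can be reproduced with a simplified model of the check: compare the first word of the process's command line, as ps shows it, against the configured command. This is only a sketch of the idea, not rc.subr's actual code, and the function name `argv0_matches` is hypothetical.

```sh
# argv0_matches COMMAND "PS-COMMAND-LINE"
# Succeed if the first word of the ps command line has the same
# base name as COMMAND.
argv0_matches() {
    want=$1
    set -f                 # no globbing while splitting
    set -- $2              # split on whitespace; $1 is now argv[0]
    set +f
    [ "${1##*/}" = "${want##*/}" ]
}
```

With command set to /usr/pkg/bin/grafana-server and a running process shown as 'grafana server ...', the comparison fails; with command set to /usr/pkg/bin/grafana it succeeds, which matches the behaviour described above.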


grafana rc.d scripts reports process not running

2023-12-15 Thread Thomas Klausner
Hi!

I'm currently trying out grafana, and I noticed one weirdness after
starting it using the pkgsrc rc.d script.

# /etc/rc.d/grafana status
grafana is not running.
# cat /var/run/grafana.pid
21719# ps -auxwww | grep 21719
root     7846  0.0  0.0   12468   2212 pts/4  O+  3:14nachm. 0:00.00 grep 21719
grafana 21719  0.0  0.1 1681352 157636 pts/4  Sl  3:08nachm. 0:02.15 grafana server -homepath /usr/pkg/share/grafana -config /usr/pkg/etc/grafana.conf -pidfile /var/run/grafana.pid

So grafana saved its PID into /var/run/grafana.pid, which is what's
configured in the rc.d script as pidfile, but the status command
thinks it's not running, despite a grafana process with the
corresponding PID running. (I tried manually adding a newline to the
pidfile, but that doesn't change the behaviour.)

There is not even an interpreter involved, /usr/pkg/bin/grafana is a
go binary.

Does anyone have an idea what the problem could be here?

grafana rc.d script attached.

Thanks,
 Thomas
#!/bin/sh
#
# $NetBSD: grafana.sh,v 1.6 2022/11/29 22:06:47 wiz Exp $
#
# PROVIDE: grafana
# REQUIRE: DAEMON
# KEYWORD: shutdown
#
# Consider installing pkgtools/rc.subr in unprivileged.
#
# You will need to set some variables in /etc/rc.conf to start grafana:
#
# grafana=YES

if [ -f /etc/rc.subr ]; then
    $_rc_subr_loaded . /etc/rc.subr
fi

name="grafana"
rcvar=$name
grafana_user="grafana"
grafana_group="grafana"
grafana_home="/usr/pkg/share/${name}"
pidfile="/var/run/${name}.pid"
command="/usr/pkg/bin/grafana-server"
command_args="-homepath ${grafana_home} -config /usr/pkg/etc/grafana.conf -pidfile ${pidfile} < /dev/null > /dev/null 2>&1 &"
start_precmd="grafana_precmd"

grafana_precmd() {
    if [ ! -r "${pidfile}" ]; then
        touch "${pidfile}"
        chown "${grafana_user}:${grafana_group}" "${pidfile}"
        chmod 644 "${pidfile}"
    fi
}

if [ -f /etc/rc.subr -a -d /etc/rc.d -a -f /etc/rc.d/DAEMON ]; then
    load_rc_config $name
    run_rc_command "$1"
else
    if [ -f /etc/rc.conf ]; then
        . /etc/rc.conf
    fi
    case "$1" in
    start)
        if [ -r "${pidfile}" ]; then
            echo "Already running ${name}."
        else
            echo "Starting ${name}."
            eval ${command} ${command_args}
        fi
        ;;
    stop)
        if [ -r "${pidfile}" ]; then
            echo "Stopping ${name}."
            kill `/bin/cat "${pidfile}"` && /bin/rm "${pidfile}"
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop}" 1>&2
        exit 10
        ;;
    esac
fi


stack guard setup?

2023-11-17 Thread Thomas Klausner
Hi!

We found some operating system specific code in rust and would like to
know how this should be done for NetBSD.

Can someone please explain the stack guard setup on NetBSD?

Below is the last mail from the thread on tech-pkg, with a link to the
rust code that shows how it's implemented in rust for other BSDs.

Thanks,
Thomas

- Forwarded message from Havard Eidnes  -

Date: Thu, 16 Nov 2023 19:29:25 +0100 (CET)
From: Havard Eidnes 
To: w...@netbsd.org
Cc: jper...@mnx.io, tech-...@netbsd.org
Subject: Re: rust problem when building firefox
X-Mailer: Mew version 6.9 on Emacs 26.3

>> > Is this a bug in rust?
>>
>> It might be missing some initialisation in the NetBSD implementation of
>> Rust.  It's panicking in this function:
>>
>>   https://doc.rust-lang.org/src/std/sys/unix/thread.rs.html#835-914
>>
>> There's some OS-specific code for other BSDs.  Does NetBSD also need to do
>> something specific here?
>
> Does he@ know? :)

he@ is regrettably blissfully ignorant about this issue.

Can someone please describe how our stack guard page is placed
etc, e.g. in the same terms as in the comments for the other
BSDs, I'll take a look at getting something suitable in,
initially as a patch, but I'll also take care of upstreaming it
if we can demonstrate that it works properly.

Regards,

- Håvard

- End forwarded message -


ure(4) or xhci(4) error?

2023-11-05 Thread Thomas Klausner
Hi!

After about 1.5 days of uptime I saw

xhci2: xhci_set_dequeue: endpoint 0x0: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: TIMEOUT
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out

Is this an xhci issue or an ure one?
Has anyone else seen this?
 Thomas


updating kernel AND modules

2023-11-04 Thread Thomas Klausner
Hi!

I'm used to just running a fully-compiled kernel without kernel modules
to speak of, so I just do 'build.sh kernel=GENERIC' and copy the
resulting kernel to /netbsd manually.

However, e.g. dtrace is a kernel module, so if I'm interested in
bugfixes for that, the kernel module needs to be updated as well.

The NetBSD guide does not talk about kernel modules at all in the
updating section
(https://www.netbsd.org/docs/guide/en/chap-kernel.html,
http://netbsd.org/docs/guide/en/chap-updating.html)

What is the current best-practice method for that?

Thanks,
 Thomas


Re: weird hangs in current (ghc, gnucash)

2023-11-04 Thread Thomas Klausner
On Thu, Nov 02, 2023 at 11:33:54AM +0100, Martin Husemann wrote:
> On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> > Should we back out ad's changes until he has time to look at them?
> 
> I just did that on behalf of core.
> Can you test if this solves your problem?

Thank you, both my test cases work again with a GENERIC.
 Thomas


rge(4) completely hangs

2023-11-01 Thread Thomas Klausner
Hi!

After the latest fixes, rge(4) is better, but it's completely hung up
the network interface twice so far - no network traffic possible on it
- both times so hard, that the BIOS had some kind of issue on the next
boot and needed 15 minutes to sort itself out (before even showing
anything on the screen).

I'm running a kernel from Oct 22.

In /var/log/messages I see:
Nov  1 18:59:43 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 18:59:43 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 18:59:45 exadelic dhcpcd[2191]: rge0: Router Advertisement from ::1
Nov  1 18:59:46 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 18:59:46 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 18:59:58 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:01:11 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:01:19 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:01:19 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:01:31 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:01:57 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:02:05 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:02:05 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:02:17 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:04:27 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:04:35 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:04:35 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:04:47 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:06:12 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:06:20 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:06:21 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:06:33 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:09:27 exadelic /netbsd: [ 91537.5847758] nfs server 
192.168.178.19:/path: not responding
Nov  1 19:15:16 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:15:24 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:15:24 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:15:36 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:16:51 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov  1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space 
available
Nov  1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space 
available
Nov  1 19:16:59 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov  1 19:16:59 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:16:59 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space 
available
Nov  1 19:17:11 exadelic syslogd[2290]: last message repeated 3 times
Nov  1 19:17:11 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:17:44 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space 
available

Just in case it matters, I'm not running with the default sysctls; I have

kern.sbmax: 262144 -> 16777216
net.inet.tcp.recvbuf_max: 262144 -> 16777216
net.inet.tcp.sendbuf_max: 262144 -> 16777216
net.inet.tcp.recvspace: 32768 -> 262144
net.inet.tcp.sendspace: 32768 -> 262144

because of

https://mail-index.netbsd.org/current-users/2017/09/21/msg032369.html

I've now switched to an ure(4) device.

Has anyone else seen this?
 Thomas


Re: weird hangs in current (ghc, gnucash)

2023-11-01 Thread Thomas Klausner
Should we back out ad's changes until he has time to look at them?
 Thomas


On Wed, Nov 01, 2023 at 09:36:01AM +, Chavdar Ivanov wrote:
> This weird hang still takes place on
> 
> ❯ uname -a
> NetBSD ymir.lorien.lan 10.99.10 NetBSD 10.99.10 (GENERIC) #13: Mon Oct 30 19:45:39 GMT 2023 sysbu...@ymir.lorien.lan:/dumps/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64
> 
> - again during building a haskell package:
> 
> ===> Configuring for hs-tagged-0.8.8
> [1 of 2] Compiling Main ( Setup.lhs, Setup.o )
> 
> 
> Htop gives weird output for the process not-yet-created:
> 
> 11506 root63   0 33283   873 S   0.0  0.0  0:00.00 |  `- make
> 20458 root62   0 34832   613 S   0.0  0.0  0:00.00 |  `-
> /bin/sh -c set -e; test -n "" && echo 1>&2 "ERROR:"  && exit
> 1;  exec 3<&0;??? whil
> 24942 root63   0 33296   882 S   0.0  0.0  0:00.00 |  `-
> /usr/bin/make _MAKE=/usr/bin/make OPSYS=NetBSD OS_VERSION=10.99.10
> OPSYS_VERSION=109910 LOWE
> 21643 root58   0 34302   606 S   0.0  0.0  0:00.00 |  `-
> /bin/sh -c set -e;? if test -n "" &&  /usr/pkg/sbin/pkg_info -K
> /usr/pkg/pkgdb -qe hs
> 19149 root63   0 34367   920 S   0.0  0.0  0:00.00 |  `-
> /usr/bin/make LOWER_OPSYS=netbsd _PKGSRC_BARRIER=yes
> ALLOW_VULNERABLE_PACKAGES= reinst
> 23303 root58   0 33685   603 S   0.0  0.0  0:00.00 |   `-
> /bin/sh -c set -e; ulimit -d `ulimit -H -d`; ulimit -v `ulimit -H -v`;
> cd /usr/pkgs
> 27078 root21   0  256G 37735 S   0.0  0.9  0:00.00 | `-
> /usr/pkg/lib/ghc-9.6.3/bin/./ghc-9.6.3 -B/usr/pkg/lib/ghc-9.6.3/lib
> -package-env
> 22058 root   -22   0 0 0 Z   0.0  0.0  0:00.00 |
> `- gcc   <==
> ---
> 
> 
> I guess it is back to the kernel from the 9th of October.
> 
> Chavdar
> 
> -
> 
> On Mon, 23 Oct 2023 at 09:27, Chavdar Ivanov  wrote:
> >
> > I can confirm that after reverting to the kernel from 9th of October 
> > devel/happy builds OK.
> >
> > On Mon, 23 Oct 2023 at 05:56, Markus Kilbinger  wrote:
> >>
> >> ... and probably
> >>
> >> 3. PR kern/57660
> >> https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660
> >>
> >> Markus
> >>
> >> On Sun, 22 Oct 2023 at 23:10, Thomas Klausner wrote:
> >> >
> >> > On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> >> > > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> >> > > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to 
> >> > > > Oct
> >> > > > 20) to test the rge(4) changes, and started a bulk build, and the
> >> > > > packages using ghc seem to wait for something and make no progress.
> >> > > ...
> >> > > > I see one other new weird behaviour on that machine - gnucash doesn't
> >> > > > finish starting up.
> >> > >
> >> > > I've backed out ad's changes from the 13th, and both problems are gone.
> >> > >
> >> > > I'll attach my local change.
> >> > >
> >> > > Andrew, can you please take a look?
> >> >
> >> > Two test cases to see the problem I have:
> >> >
> >> > 1. start gnucash, it doesn't finish starting up, the splash screen hangs.
> >> >
> >> > 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
> >> >The 'build' step has two parts, it hangs after the first one.
> >> >
> >> >  Thomas


Re: dtracing unlink

2023-10-30 Thread Thomas Klausner
On Mon, Oct 30, 2023 at 11:33:24AM +, RVP wrote:
> The NetBSD copyinstr() _disables_ SMAP before copying data from
> userspace.  The dtrace version _does not_. I think this is what
> fails on some CPUs. My Intel CPU's more than 10 years old so it
> doesn't support SMAP (only SMEP), dtrace works for me. If you and
> bch tell me that your CPUs support SMAP, then that would be the
> smoking gun.

# cpuctl identify 0 | grep SMAP
cpu0: features5 0xf1bf97a9

Looks that way!
 Thomas
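
A small helper makes the same check scriptable. It is a sketch only: the function name `has_cpu_feature` is hypothetical, and it assumes the feature names appear as separate words in the `cpuctl identify` output.

```sh
# has_cpu_feature NAME
# Succeed if NAME appears as a whole word somewhere on stdin.
has_cpu_feature() {
    grep -Eq "(^|[^A-Za-z0-9_])$1([^A-Za-z0-9_]|\$)"
}
```

Usage would be along the lines of `cpuctl identify 0 | has_cpu_feature SMAP && echo "SMAP supported"`.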


Re: dtracing unlink

2023-10-30 Thread Thomas Klausner
RVP looked at this some more and it seems related to
time-after-booting or perhaps RAM churn. It starts happening on RVP's
machine too after some uptime.

Still looking for a dtrace guru to help out here :)
 Thomas


Re: weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> > 20) to test the rge(4) changes, and started a bulk build, and the
> > packages using ghc seem to wait for something and make no progress.
> ...
> > I see one other new weird behaviour on that machine - gnucash doesn't
> > finish starting up.
> 
> I've backed out ad's changes from the 13th, and both problems are gone.
> 
> I'll attach my local change.
> 
> Andrew, can you please take a look?

Two test cases to see the problem I have:

1. start gnucash, it doesn't finish starting up, the splash screen hangs.

2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
   The 'build' step has two parts, it hangs after the first one.

 Thomas


Re: weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> 20) to test the rge(4) changes, and started a bulk build, and the
> packages using ghc seem to wait for something and make no progress.
...
> I see one other new weird behaviour on that machine - gnucash doesn't
> finish starting up.

I've backed out ad's changes from the 13th, and both problems are gone.

I'll attach my local change.

Andrew, can you please take a look?

Thanks,
 Thomas
Module Name:src
Committed By:   ad
Date:   Fri Oct 13 18:48:56 UTC 2023

Modified Files:
src/sys/kern: kern_condvar.c kern_sleepq.c
src/sys/rump/librump/rumpkern: locks.c locks_up.c
src/sys/sys: condvar.h lwp.h

Log Message:
Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming.  Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.


To generate a diff of this commit:
cvs rdiff -u -r1.59 -r1.60 src/sys/kern/kern_condvar.c
cvs rdiff -u -r1.83 -r1.84 src/sys/kern/kern_sleepq.c
cvs rdiff -u -r1.86 -r1.87 src/sys/rump/librump/rumpkern/locks.c
cvs rdiff -u -r1.12 -r1.13 src/sys/rump/librump/rumpkern/locks_up.c
cvs rdiff -u -r1.17 -r1.18 src/sys/sys/condvar.h
cvs rdiff -u -r1.227 -r1.228 src/sys/sys/lwp.h


Module Name:src
Committed By:   ad
Date:   Fri Oct 13 18:50:39 UTC 2023

Modified Files:
src/sys/kern: uipc_socket.c uipc_syscalls.c
src/sys/sys: socketvar.h

Log Message:
Use cv_fdrestart() to implement fo_restart.


To generate a diff of this commit:
cvs rdiff -u -r1.305 -r1.306 src/sys/kern/uipc_socket.c
cvs rdiff -u -r1.208 -r1.209 src/sys/kern/uipc_syscalls.c
cvs rdiff -u -r1.165 -r1.166 src/sys/sys/socketvar.h


Module Name:src
Committed By:   ad
Date:   Fri Oct 13 19:07:09 UTC 2023

Modified Files:
src/sys/ddb: db_command.c db_interface.h db_xxx.c
src/sys/kern: sys_pipe.c
src/sys/sys: pipe.h
src/usr.bin/fstat: fstat.c

Log Message:
Simplify/streamline pipes a little bit:

- Allocate only one struct pipe not two (no need to be bidirectional here).
- Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops.
- Never wake the other side or acquire long-term (I/O) lock unless needed.
- Whenever possible, defer wakeups until after locks have been released.
- Do some things locklessly in pipe_ioctl() and pipe_poll().

Some notable results:

- -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process.
- 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine.
- 1.5x bandwidth with 1kB messages on the same 48 CPU machine (8kB: same b/w).


To generate a diff of this commit:
cvs rdiff -u -r1.186 -r1.187 src/sys/ddb/db_command.c
cvs rdiff -u -r1.41 -r1.42 src/sys/ddb/db_interface.h
cvs rdiff -u -r1.77 -r1.78 src/sys/ddb/db_xxx.c
cvs rdiff -u -r1.164 -r1.165 src/sys/kern/sys_pipe.c
cvs rdiff -u -r1.39 -r1.40 src/sys/sys/pipe.h
cvs rdiff -u -r1.118 -r1.119 src/usr.bin/fstat/fstat.c



ad.backed.out.diff.gz
Description: Binary data


weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
Hi!

I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
20) to test the rge(4) changes, and started a bulk build, and the
packages using ghc seem to wait for something and make no progress.

In one of my sandboxes there is a hs-data-array-byte build but it's not
doing anything.

The log stops at:

===> Creating toolchain wrappers for hs-data-array-byte-0.1.0.1nb2
===> Configuring for hs-data-array-byte-0.1.0.1nb2
=> Checking for portability problems in extracted files
[1 of 2] Compiling Main ( Setup.hs, Setup.o )

From ps:

pbulk   26131  0.0  0.1 1073923564  140684 ?  Il8:23PM 0:00.23 
/usr/pkg/lib/ghc-9.4.7/bin/./ghc-9.4.7 -B/usr/pkg/lib/ghc-9.4.7/lib 
-package-env - --make Setup -dynamic 

(btw, that is a really huge process size?!)
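
To put that number in perspective, a quick conversion, assuming the VSZ
column printed by ps is in 1024-byte units (kib_to_gib is a made-up
helper name):

```shell
#!/bin/sh
# kib_to_gib KIB: convert a size in KiB to whole GiB (integer division).
# Assumption: the VSZ column printed by ps is in 1024-byte units.
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

kib_to_gib 1073923564   # VSZ of the ghc process above; prints 1024
```

So the process has about 1 TiB of virtual address space, while the
resident size (140684 KiB) is only around 137 MiB. That pattern would
fit a runtime that reserves a large address range up front, but that
reading is a guess and not something confirmed in this thread.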

Attaching with gdb shows me:

[Switching to LWP 20090 of process 26131]
0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
(gdb) bt
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa22f010, 
pMut=pMut@entry=0x7195fa22f038) at rts/posix/OSThreads.c:143
#3  0x7195faa903e1 in waitForWorkerCapability (task=) at 
rts/Capability.c:707
#4  yieldCapability (pCap=pCap@entry=0x7195f77fff10, 
task=task@entry=0x7195fa22f000, gcAllowed=gcAllowed@entry=true) at 
rts/Capability.c:1011
#5  0x7195faab0026 in scheduleYield (task=0x7195fa22f000, 
pcap=0x7195f77fff08) at rts/Schedule.c:709
#6  schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa22f000) at rts/Schedule.c:319
#7  0x7195faab20b9 in scheduleWorker (cap=cap@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa22f000) at rts/Schedule.c:2668
#8  0x7195faab78a2 in workerStart (task=0x7195fa22f000) at rts/Task.c:444
#9  0x7195fa97f2df in pthread.create_tramp () from /usr/lib/libpthread.so.1
#10 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#11 0x0020 in ?? ()
#12 0x in ?? ()
(gdb) thread apply all bt

Thread 6 (LWP 26131 of process 26131 ""):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2010, 
pMut=pMut@entry=0x7195fa2b2038) at rts/posix/OSThreads.c:143
#3  0x7195faa903e1 in waitForWorkerCapability (task=) at 
rts/Capability.c:707
#4  yieldCapability (pCap=pCap@entry=0x7f7fff2287c0, 
task=task@entry=0x7195fa2b2000, gcAllowed=gcAllowed@entry=true) at 
rts/Capability.c:1011
#5  0x7195faab0026 in scheduleYield (task=0x7195fa2b2000, 
pcap=0x7f7fff2287b8) at rts/Schedule.c:709
#6  schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa2b2000) at rts/Schedule.c:319
#7  0x7195faab2069 in scheduleWaitThread (tso=0x4200406ce8, 
ret=ret@entry=0x0, pcap=pcap@entry=0x7f7fff228940) at rts/Schedule.c:2651
#8  0x7195faaa85fb in rts_evalLazyIO (cap=cap@entry=0x7f7fff228940, 
p=p@entry=0x1071e60, ret=ret@entry=0x0) at rts/RtsAPI.c:566
#9  0x7195faaabb48 in hs_main (argc=, argv=, 
main_closure=0x1071e60, rts_config=...) at rts/RtsMain.c:72
#10 0x01063124 in main ()

Thread 5 (LWP 7329 of process 26131 "ghc_ticker"):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fab21bc0 
, pMut=pMut@entry=0x7195fab21b80 ) at 
rts/posix/OSThreads.c:143
#3  0x7195faae040e in itimer_thread_func (_handle_tick=0x7195faab9c57 
) at rts/posix/ticker/Pthread.c:140
#4  0x7195fa97f2df in pthread.create_tramp () from /usr/lib/libpthread.so.1
#5  0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#6  0x in ?? ()

Thread 4 (LWP 15032 of process 26131 "ghc_worker"):
#0  0x7195fa5a030a in _sys___kevent100 () from /usr/lib/libc.so.12
#1  0x7195fa97a8a7 in __kevent100 () from /usr/lib/libpthread.so.1
#2  0x7195fba014f2 in base_GHCziEventziKQueue_new12_info () from 
/usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3  0x in ?? ()

Thread 3 (LWP 17781 of process 26131 "ghc_worker"):
#0  0x7195fa5a016a in poll () from /usr/lib/libc.so.12
#1  0x7195fa97ae63 in poll () from /usr/lib/libpthread.so.1
#2  0x7195fba0ff55 in ?? () from 
/usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3  0x in ?? ()

Thread 2 (LWP 23219 of process 26131 "ghc_worker"):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2190, 
pMut=pMut@entry=0x7195fa2b21b8) at rts/posix/OSThreads.c:143
#3  

Re: dtracing unlink

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 07:40:17AM +, RVP wrote:
> Ah, that attachment is still based on _my_ version which is plain wrong: You
> can't do copyinstr(arg0) in the :entry action because the kernel may not have
> paged in the memory containing the pathname (yet).
> 
> Use your version (which is correct--it does copyinstr() in :return when the
> kernel is sure to have the pathname already in memory):

Yes, then we're back at the start:
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid 
address (0x77002a73f7ce) in action #1 at DIF offset 12
: No such file or directory

 Thomas


Re: dtracing unlink

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 06:00:43AM +, RVP wrote:
> On Fri, 20 Oct 2023, Thomas Klausner wrote:
> 
> > # dtrace -n syscall::unlink:entry'/pid == 27647/{ self->file = arg0;  }' -n 
> > syscall::unlink:return'{ trace(copyinstr(self->file)); self->file = 0; }'
> > 
> > but this just gives me lots of
> > 
> > dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): 
> > invalid address (0x79c4586577ce) in action #1 at DIF offset 12
> > : No such file or directory
> > 
> 
> Actually, this command-line is almost correct. What's missing is the paired
> /pid == 27647/ for syscall::unlink:return. Without it, unlink:return is called
> for _every_ pid and there's not going to be a valid self->file for almost 
> every
> one of them.

I tried that (see attachment), didn't help.

dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid 
address (0x7a8e0685a7ce) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid 
address (0x0) in action #2
: No such file or directory
dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid 
address (0x7a8e0685a7ce) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid 
address (0x0) in action #2
: No such file or directory


The machine has 128 GB RAM and ~450 GB swap. I haven't tried limiting
the RAM from BIOS yet.
 Thomas
#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

syscall::unlink:entry /pid == 28651/
{
self->file = copyinstr(arg0);
}

syscall::unlink:return /pid == 28651/
{
printf("%d %s\n", pid, self->file);
self->file = 0;
}


Re: dtracing unlink

2023-10-21 Thread Thomas Klausner
On Sat, Oct 21, 2023 at 10:30:54AM +, RVP wrote:
> On Sat, 21 Oct 2023, Thomas Klausner wrote:
> 
> > With that I see:
> > 
> > # ./dtrace.unlink2
> > dtrace: buffer size lowered to 1m
> > dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): 
> > invalid address (0xc48240) in action #1 at DIF offset 12
> > : No such file or directory
> > dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): 
> > invalid address (0x0) in action #2
> > : No such file or directory
> > 
> 
> Odd. Are you running a KASLR kernel?

No, a standard GENERIC kernel (first try with one from daily releng
builds, second built just now from today's sources).
 Thomas


Re: dtracing unlink

2023-10-21 Thread Thomas Klausner
On Sat, Oct 21, 2023 at 06:10:17AM +, RVP wrote:
> On Fri, 20 Oct 2023, bch wrote:
> 
> > What OS release/architecture are you using that is getting favorable
> > results?
> > 
> 
> $ uname -a
> NetBSD x202e.localdomain 10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 19 
> 23:43:40 UTC 2023  
> mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
> 
> > I’m following along on ~up-to-the-minute -current AMD64 on my Thinkpad, and
> > only seeing the same memory errors as wiz’ original example.
> > 
> 
> Do the copyinstr in ::entry like this:
> 
> ```
> #!/usr/sbin/dtrace -s
> 
> #pragma D option destructive
> #pragma D option quiet
> 
> syscall::unlink:entry
> {
>   self->file = copyinstr(arg0);
> }
> 
> syscall::unlink:return
> {
>   printf("%d %s\n", pid, self->file);
>   self->file = 0;
> }
> ```

With that I see:

# ./dtrace.unlink2
dtrace: buffer size lowered to 1m
dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid 
address (0xc48240) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid 
address (0x0) in action #2
: No such file or directory

 Thomas


cpuctl ucode: no patch available

2023-10-21 Thread Thomas Klausner
Hi!

I read about a new microcode update for the AMD Zen family, downloaded
the linux firmware repository and tried to apply it.

I put the new file for my CPU in /libdata/firmware/x86/amd/ (per man
page) as microcode_amd_fam19h.bin (as the filename is in the
repository).

# ls -l /libdata/firmware/x86/amd/microcode_amd_fam19h.bin
-rw-rw-r--  1 root  wheel  39172 Oct 21 10:34 
/libdata/firmware/x86/amd/microcode_amd_fam19h.bin

# cpuctl identify 0 | grep -i -e family -e ucode
cpu0: AMD Family 19h (686-class), 4491.57 MHz
cpu0: family 0x19 model 0x61 stepping 0x2 (id 0xa60f12)
cpu0: UCode version: 0xa601203

Then I try to apply it:

# cpuctl ucode
cpuctl: please also check dmesg(8) output for additional error information
cpuctl: IOC_CPU_UCODE_APPLY: No such file or directory
# dmesg | tail -1
autoconfiguration error: ucode: No patch available for this cpu

So this looks like it didn't find a patch file.

When I run it under ktrace I see:
  3719   3719 cpuctl   NAMI  
"/libdata/firmware/x86/amd/microcode_amd_fam19h.bin"
  3719   3719 cpuctl   RET   ioctl -1 errno 2 No such file or directory

so it looks in the right path.

Why does it claim there is no patch available?
 Thomas
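
As a cross-check that the file name matches the CPU, the family, model
and stepping can be derived from the id value. A sketch of the usual x86
CPUID signature decoding; decode_cpuid is a hypothetical helper, not a
cpuctl feature:

```shell
#!/bin/sh
# decode_cpuid ID: split an x86 CPUID signature into family/model/stepping.
# Usual rule: the extended family is added when the base family is 0xf, and
# the extended model is used when the base family is 0x6 or 0xf.
decode_cpuid() {
    sig=$(( $1 ))
    stepping=$(( sig & 0xf ))
    base_model=$(( (sig >> 4) & 0xf ))
    base_family=$(( (sig >> 8) & 0xf ))
    ext_model=$(( (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))

    family=$base_family
    model=$base_model
    if [ "$base_family" -eq 15 ]; then
        family=$(( base_family + ext_family ))
    fi
    if [ "$base_family" -eq 6 ] || [ "$base_family" -eq 15 ]; then
        model=$(( (ext_model << 4) | base_model ))
    fi
    printf 'family 0x%x model 0x%x stepping 0x%x\n' "$family" "$model" "$stepping"
}

decode_cpuid 0xa60f12   # id from the cpuctl output above
```

This prints the same family, model and stepping that cpuctl reports, so
the fam19h file is at least for the right family. It says nothing about
whether the container actually holds a patch for this model and
stepping.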


Re: dtracing unlink

2023-10-20 Thread Thomas Klausner
On Fri, Oct 20, 2023 at 11:20:00PM +0200, Roland Illig wrote:
> Am 20.10.2023 um 22:38 schrieb Thomas Klausner:
> > Hi!
> >
> > I'm trying to find out what a program does, and found it does a lot of
> > unlink syscalls, so I wanted to see what it unlinks.
> 
> Did you try 'ktruss | grep NAMI' before diving deep into dtrace?

The interesting work load is _very_ long running, so I'm not sure I
want to ktrace all of it - so no, I didn't do that yet.
 Thomas


dtracing unlink

2023-10-20 Thread Thomas Klausner
Hi!

I'm trying to find out what a program does, and found it does a lot of
unlink syscalls, so I wanted to see what it unlinks.

I tried

# dtrace -n syscall::unlink:entry'/pid == 27647/{ self->file = arg0;  }' -n 
syscall::unlink:return'{ trace(copyinstr(self->file)); self->file = 0; }'

but this just gives me lots of

dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid 
address (0x79c4586577ce) in action #1 at DIF offset 12
: No such file or directory

(yes, including that weird newline in the middle).

What's the proper way to do this?

Thanks,
 Thomas


file-backed cgd backup question

2023-10-19 Thread Thomas Klausner
Hi!

For a cgd in a file that I mount via vnd+cgd, the file system contents
inside may change, but the actual file on the hard disk outside only
has 'access' time changes. So "smart" backup programs that check
timestamps to find out if they need to re-hash files don't notice it
was changed. How do you handle this? Manually touch it?

Cheers,
 Thomas


Re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread Thomas Klausner
On Tue, Oct 17, 2023 at 10:07:14AM +1100, Matthew Green wrote:
> > panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file 
> > "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0
> 
> this is from:
> 
> KASSERTMSG(offset < map->dm_mapsize,
> "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
> offset, map->dm_mapsize);
> 
> the mapsize being zero indicates that there's nothing mapped
> currently in this dma map, so there's nothing to sync.  ie,
> the caller seems to be trying to sync something not mapped.
> 
> can you post the full back trace?

Sure:

(gdb) bt
#0  0x80239c75 in cpu_reboot ()
#1  0x80ddb28d in kern_reboot ()
#2  0x80e21798 in vpanic ()
#3  0x80fe6e5f in kern_assert ()
#4  0x8058be67 in bus_dmamap_sync ()
#5  0x8044edc7 in rge_rxeof ()
#6  0x804536fd in rge_intr ()
#7  0x80592c15 in intr_biglock_wrapper ()
#8  0x80214405 in Xhandle_ioapic_edge18 ()
#9  0x8023547d in x86_mwait ()
#10 0x805819d0 in acpicpu_cstate_idle ()
#11 0x80dbe5d6 in idle_loop ()
#12 0x80210327 in lwp_trampoline ()
#13 0x in ?? ()

 Thomas


panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread Thomas Klausner
Hi!

I just tried checking out pkgsrc on an nvme when the machine panicked:

panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file 
"/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

That's a GENERIC 10.99.10/amd64 from releng, Oct 11.

Has anyone seen this one before?

I have a crash dump but no debug kernel, since I didn't build it
myself. dmesg attached, there is one warning from ACPI:
acpi0: autoconfiguration error: invalid PCI address for D005
no idea if that could be related.
 Thomas


dmesg.redacted.txt.gz
Description: Binary data


Re: panic: assertion "!cpu_softintr_p()" failed

2023-10-02 Thread Thomas Klausner
On Mon, Oct 02, 2023 at 09:23:59AM +1100, Matthew Green wrote:
> Thomas Klausner writes:
> > panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file 
> > "/usr/src/sys/kern/subr_kmem.c", line 451
> >
> > gdb says:
> >
> > #10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel 
> > %sassertion \"%s\" failed: file \"%s\", line %d ", 
> > ap=ap@entry=0xae2110a93e08)
> > at /usr/src/sys/kern/subr_prf.c:286
> > #11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880 
> > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
> > at /usr/src/sys/lib/libkern/kern_assert.c:51
> > #12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at 
> > /usr/src/sys/kern/subr_kmem.c:451
> > #13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at 
> > /usr/src/sys/kern/kern_rwlock_obj.c:127
> > #14 0x80d825d3 in uvm_anon_release (anon=) at 
> > /usr/src/sys/uvm/uvm_anon.c:385
> 
> i think this is a new bug.  this line changed from:
> 
> 1.11 (ad   12-Sep-23):  pool_cache_put(rw_obj_cache, ro);
> 
> to
> 
> 1.12 (ad   23-Sep-23):  kmem_free(ro, sizeof(*ro));
> 
> i guess it just should be kmem_free_intr(), as pool_cache
> is intr-safe as well.

Thanks, I'll try a kernel with the attached diff.
 Thomas
Index: kern_rwlock_obj.c
===
RCS file: /cvsroot/src/sys/kern/kern_rwlock_obj.c,v
retrieving revision 1.12
diff -u -r1.12 kern_rwlock_obj.c
--- kern_rwlock_obj.c   23 Sep 2023 18:21:11 -  1.12
+++ kern_rwlock_obj.c   2 Oct 2023 07:51:31 -
@@ -124,7 +124,7 @@
}
membar_acquire();
rw_destroy(&ro->ro_lock);
-   kmem_free(ro, sizeof(*ro));
+   kmem_intr_free(ro, sizeof(*ro));
return true;
 }
 


Re: cgd questions

2023-10-02 Thread Thomas Klausner
Follow up question because it just happened to me:

I have a USB disk with ffs-on-cgd.  I unmounted the ffs but forgot
to unconfigure the cgd before unplugging the disk.

Can this cause problems? What kinds?
 Thomas


Re: cgd questions

2023-10-01 Thread Thomas Klausner
On Sun, Oct 01, 2023 at 09:31:03AM -0400, Greg Troxel wrote:
> Thomas Klausner  writes:
> 
> > When I pick up a cgd disk and want to use it on a NetBSD system to
> > which it was not connected before, what do I need?
> >
> > - the passphrase
> > - the /etc/cgd/foo file?
> >
> > If you need the /etc/cgd/foo file too, how do people handle those for
> > cgds used as backup disks?
> 
> Yes, you need the /etc/cgd/foo file because the passphrase is salted,
> and you might need an iv depending on iv method.  IMHO this is a design
> bug in cgd.  At least as a normal path, one should be able to access
> with just the passphrase.
> 
> My setup is
> 
>   (this is for a 512-sector disk)
>   GPT partition on disk
>   index 2: 16384 sectors starting at 64, ffs
>   index 1: rest of disk, cgd
> 
>   in index 2, newfs and then rsync all my cgd init files.
>   in index 1, cgconfig
> 
> Thus, any backup disk has the params for all of them.

That is a great idea. I should have thought of that before creating
partitions on my backup disks :|
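
For scale, the small ffs partition in the quoted layout is tiny compared
to any backup disk. A quick calculation, taking the 512-byte sector size
from the quoted text (sectors_to_mib is a made-up helper name):

```shell
#!/bin/sh
# sectors_to_mib COUNT SECTOR_SIZE: size of a partition in whole MiB.
sectors_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

sectors_to_mib 16384 512   # the ffs partition from the quoted layout; prints 8
```

So keeping the parameters files next to the cgd costs about 8 MiB per
disk.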

> > The other question is that the cgd man page says that some ciphers are
> > obsolete. How can I switch from an obsolete cipher to a new one - is
> > the only method to make a new cgd with the new cipher and copy the
> > data manually?
> 
> I believe that's the only way.  I can't even figure out how to change
> the passphrase without doing that.

If I understand the cgdconfig man page correctly, this is how you do that:

 To create a new parameters file that will generate the same key as an old
 parameters file:

 # cgdconfig -G -o newparamsfile oldparamsfile
 old file's passphrase:
 new file's passphrase:

 Thomas


cgd questions

2023-10-01 Thread Thomas Klausner
Hi!

I tried finding this in the man page, but it wasn't fully clear to me.

When I pick up a cgd disk and want to use it on a NetBSD system to
which it was not connected before, what do I need?

- the passphrase
- the /etc/cgd/foo file?

If you need the /etc/cgd/foo file too, how do people handle those for
cgds used as backup disks?


The other question is that the cgd man page says that some ciphers are
obsolete. How can I switch from an obsolete cipher to a new one - is
the only method to make a new cgd with the new cipher and copy the
data manually?

Thanks,
 Thomas


panic: assertion "!cpu_softintr_p()" failed

2023-10-01 Thread Thomas Klausner
Hi!

I updated to 10.99.9 last night and started a bulk build, which
didn't get very far.

panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file 
"/usr/src/sys/kern/subr_kmem.c", line 451

gdb says:

#10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel %sassertion 
\"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xae2110a93e08)
at /usr/src/sys/kern/subr_prf.c:286
#11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880 "kernel 
%sassertion \"%s\" failed: file \"%s\", line %d ")
at /usr/src/sys/lib/libkern/kern_assert.c:51
#12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at 
/usr/src/sys/kern/subr_kmem.c:451
#13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at 
/usr/src/sys/kern/kern_rwlock_obj.c:127
#14 0x80d825d3 in uvm_anon_release (anon=) at 
/usr/src/sys/uvm/uvm_anon.c:385
#15 0x80d9e525 in uvm_aio_aiodone_pages 
(pgs=pgs@entry=0xae2110a93f30, npages=npages@entry=16, 
write=write@entry=true, error=error@entry=0)
at /usr/src/sys/uvm/uvm_pager.c:466
#16 0x80d9e954 in uvm_aio_aiodone (bp=0x9b158a500ed8) at 
/usr/src/sys/uvm/uvm_pager.c:526
#17 0x80ece109 in dkiodone (bp=) at 
/usr/src/sys/dev/dkwedge/dk.c:1658
#18 0x80e878a3 in biointr (cookie=) at 
/usr/src/sys/kern/vfs_bio.c:1737
#19 0x80dfd7bf in softint_execute (s=3, l=0x9b16c8abd8c0) at 
/usr/src/sys/kern/kern_softint.c:597
#20 softint_dispatch (pinned=, s=3) at 
/usr/src/sys/kern/kern_softint.c:842
#21 0x8023480c in Xsoftintr ()

 Thomas


make[1]: Cannot open `.' (Permission denied)

2023-09-05 Thread Thomas Klausner
Hi!

I was doing a limited bulk build on today's current/amd64, and only
libreoffice was left, when it failed like this:

[build HPX] zh-TW/helpcontent2/source/text/swriter/guide
[build HPX] zh-TW/helpcontent2/source/text/swriter/librelogo
[build HPX] zh-TW/helpcontent2/source/text/swriter/menu
[build HIX] scalc/en-US
[build HEJ] scalc/en-US
[build HIX] schart/en-US
[build HEJ] schart/en-US
Error reading directory 
file:///scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpTarget/scalc/en-US/content
gmake[1]: *** 
[/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/solenv/gbuild/HelpTarget.mk:460:
 
/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpIndexTarget/scalc/en-US.done]
 Error 1
gmake[1]: *** Waiting for unfinished jobs
Error reading directory 
file:///scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpTarget/schart/en-US/content
gmake[1]: *** 
[/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/solenv/gbuild/HelpTarget.mk:460:
 
/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpIndexTarget/schart/en-US.done]
 Error 1
gmake: *** [Makefile:289: build] Error 2
*** Error code 2

Stop.
make[1]: stopped in /usr/pkgsrc/misc/libreoffice
make[1]: Cannot open `.' (Permission denied)
*** Error code 2

Stop.
make: stopped in /usr/pkgsrc/misc/libreoffice

I'm used to gmake failing randomly once in a while, but that usually
looks different.

There is nothing in /var/log/messages, and the permissions of
/usr/pkgsrc/misc/libreoffice are fine for the pbulk user (otherwise
the build wouldn't have started anyway).

I had such a make(1) problem before:

http://gnats.netbsd.org/42484

but there were fixes committed for that in 2013. Perhaps they were not
sufficient.

The bulk build clients are in sandboxes that look like this:

tmpfs on /archive/sandboxes/client3 type tmpfs (local)
ptyfs on /archive/sandboxes/client3/dev/ptyfs type ptyfs (local)
procfs on /archive/sandboxes/client3/proc type procfs (local)
/bin on /archive/sandboxes/client3/bin type null (read-only, local)
/sbin on /archive/sandboxes/client3/sbin type null (read-only, local)
/lib on /archive/sandboxes/client3/lib type null (read-only, local)
/libexec on /archive/sandboxes/client3/libexec type null (read-only, local)
/usr/bin on /archive/sandboxes/client3/usr/bin type null (read-only, local)
/usr/games on /archive/sandboxes/client3/usr/games type null (read-only, local)
/usr/include on /archive/sandboxes/client3/usr/include type null (read-only, 
local)
/usr/lib on /archive/sandboxes/client3/usr/lib type null (read-only, local)
/usr/libdata on /archive/sandboxes/client3/usr/libdata type null (read-only, 
local)
/usr/libexec on /archive/sandboxes/client3/usr/libexec type null (read-only, 
local)
/usr/share on /archive/sandboxes/client3/usr/share type null (read-only, local)
/usr/sbin on /archive/sandboxes/client3/usr/sbin type null (read-only, local)
/usr/X11R7 on /archive/sandboxes/client3/usr/X11R7 type null (read-only, local)
/var/mail on /archive/sandboxes/client3/var/mail type null (read-only, local)
/usr/src on /archive/sandboxes/client3/usr/src type null (read-only, local)
/usr/pkgsrc on /archive/sandboxes/client3/usr/pkgsrc type null (local)
/disk/scratch_ssd/client3 on /archive/sandboxes/client3/scratch type null 
(local)
/usr/xsrc on /archive/sandboxes/client3/usr/xsrc type null (read-only, local)
/packages on /archive/sandboxes/client3/packages type null (local)
/distfiles on /archive/sandboxes/client3/distfiles type null (local)

Worth a PR, or re-opening the old one?

Cheers,
 Thomas


Re: 10.99.7 panic: defibrillate

2023-08-14 Thread Thomas Klausner
On Mon, Aug 14, 2023 at 12:41:06PM +0200, Thomas Klausner wrote:
> I had followed your suggestion and bumped the heartbeat limit from 15
> to 300, but today it panicked again.
> 
> panic: cpu8: found cpu9 heart stopped beating and unresponsive
> 
> I have a core dump in case you want any particular details.
> 
> I've now set it to 0.

and had a hard hang less than half a day later.

This hasn't been happening in 10.99.5 (at least not with that
frequency), which had uptimes of weeks, so either the heartbeat code
introduced additional problems (even if disabled this way) or
something else got worse, or I am really really unlucky right now.
 Thomas


Re: 10.99.7 panic: defibrillate

2023-08-14 Thread Thomas Klausner
I had followed your suggestion and bumped the heartbeat limit from 15
to 300, but today it panicked again.

panic: cpu8: found cpu9 heart stopped beating and unresponsive

I have a core dump in case you want any particular details.

I've now set it to 0.
 Thomas


Re: 10.99.7 panic: defibrillate

2023-08-13 Thread Thomas Klausner
So it happened again, no bulk build this time, just qt5-qtwebengine in
a sandbox.

panic: cpu0: softints stuck for 16 seconds

I've got a kernel coredump this time, let me know what information
would be useful.

Btw, gdb 13.2 (built on Aug 11) doesn't work with kernel core dumps:
(gdb) target kvm netbsd.40.core
Undefined target command: "kvm netbsd.40.core".  Try "help target".

Using an older gdb I get:
(gdb) target kvm netbsd.40.core
0x80239c95 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
717 dumpsys();
(gdb) bt
#0  0x80239c95 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1  0x80deeb3d in kern_reboot (howto=260, bootstr=bootstr@entry=0x0) at 
/usr/src/sys/kern/kern_reboot.c:73
#2  0x80b72c04 in db_reboot_cmd (addr=, 
have_addr=, count=, modif=)
at /usr/src/sys/ddb/db_command.c:1589
#3  0x80b732da in db_command 
(last_cmdp=last_cmdp@entry=0x8187a360 ) at 
/usr/src/sys/ddb/db_command.c:964
#4  0x80b737ac in db_command_loop () at 
/usr/src/sys/ddb/db_command.c:623
#5  0x80b77a98 in db_trap (type=type@entry=1, code=code@entry=0) at 
/usr/src/sys/ddb/db_trap.c:91
#6  0x80236b14 in kdb_trap (type=type@entry=1, code=code@entry=0, 
regs=regs@entry=0xd3a110933ad0)
at /usr/src/sys/arch/amd64/amd64/db_interface.c:251
#7  0x8023c2a4 in trap (frame=0xd3a110933ad0) at 
/usr/src/sys/arch/amd64/amd64/trap.c:315
#8  0x80234ad4 in alltraps ()
#9  0x0003 in ?? ()
#10 0x0001 in ?? ()
#11 0x0001 in ?? ()
#12 0x in ?? ()
 Thomas


Re: 10.99.7 panic: defibrillate

2023-08-12 Thread Thomas Klausner
On Sat, Aug 12, 2023 at 04:03:59PM +, Taylor R Campbell wrote:
> This panic means that one CPU has detected that another CPU has failed
> to run either the hardclock interrupt handler or the SOFTINT_CLOCK
> softints in over 15 seconds, and triggered an interprocessor interrupt
> in an attempt to panic rather than stay stuck where it appears to be
> stuck -- here, pmap_tlb_shootnow.
> 
> Normally the hardclock interrupt handler runs every 10ms (or 1/hz sec;
> default hz=100), and softints run reasonably promptly, so failing to
> do this for 15 sec is extremely unusual and likely indicates a CPU is
> wedged and unable to make progress.  For example, something may be
> stuck in an infinite loop with a spin lock held or spl raised, which
> blocks interrupts.
> 
> (The HEARTBEAT option, this system where CPUs check one another for
> progress, is new as of last month.  The problems it uncovers would
> likely have manifested as silent unresponsive hang before.)
> 
> 1. Did you notice anything sluggish before the crash?

I was active on the machine in a remote terminal and I didn't notice
anything in particular. The machine has 32 (virtual) cores so I
probably wouldn't notice one stuck - except that I would notice if
pbulk never finished because one was stuck.

I know that this machine is sometimes sluggish when all three parallel
pbulk clients want to interact with the disk (e.g. libreoffice and
rust unpacking at the same time, or something similar).
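
To get a feeling for the numbers in the quoted explanation: with the
default hz=100 a tick is 10 ms, so a 15 second window covers 1500 ticks.
A small sketch of that arithmetic (ticks_in is a made-up helper; the hz
and limit values are the ones quoted above):

```shell
#!/bin/sh
# ticks_in SECONDS HZ: number of hardclock ticks expected in SECONDS.
ticks_in() {
    echo $(( $1 * $2 ))
}

hz=100                      # default hz per the quoted explanation
echo "tick length: $(( 1000 / hz )) ms"
echo "ticks in 15 s: $(ticks_in 15 "$hz")"
```

In other words, roughly 1500 consecutive hardclock ticks have to go
unserviced on one CPU before the check fires, which is why a single
slow disk operation seems an unlikely trigger.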

> 2. Can you start another bulk build and run the following dtrace
>script for a while and share the final output?
> 
> dtrace -x cleanrate=50hz -n '
> fbt::pmap_tlb_shootnow:entry,
> fbt::uvm_pagermapout:entry {
> self->starttime[probefunc] = timestamp
> }
> fbt::pmap_tlb_shootnow:return,
> fbt::uvm_pagermapout:return /self->starttime[probefunc]/ {
> @[probefunc] = quantize(timestamp -
> self->starttime[probefunc]);
> self->starttime[probefunc] = 0
> }
> tick-60s {
> printa(@)
> }
> '
> 
> You may need to modload dtrace_fbt and dtrace_profile first.  The
> tick-60s probe will print the current state of data collection once a
> minute, showing a histogram of the time spent in the functions
> pmap_tlb_shootnow and uvm_pagermapout.
> 
> If it says something like
> 
> dtrace: 429 dynamic variable drops with non-empty dirty list
> 
> then just hit ^C and save the last output.

I left it running for a bit and this is what it said:

# ./dtrace-script-20230812.sh
dtrace: description '
fbt::pmap_tlb_shootnow:entry,
fbt::uvm_pagermapout:entry ' matched 5 probes
dtrace: aggregation size lowered to 2m
dtrace: 312026 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 22746 dynamic variable drops with non-empty rinsing list
: Operation timed out
dtrace: 255016 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 341982 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 273456 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 11589 dynamic variable drops with non-empty rinsing list
: Operation timed out
dtrace: 303313 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 312509 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 345693 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 16682 dynamic variable drops with non-empty rinsing list
: Operation timed out
dtrace: 333801 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 327437 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 412853 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 539988 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 471254 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 501274 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 475914 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 501722 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 400591 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 370924 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 395296 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 276777 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 255151 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 300495 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 263274 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 390073 dynamic variable drops with non-empty dirty list
: Operation timed out
dtrace: 

10.99.7 panic: defibrillate

2023-08-12 Thread Thomas Klausner
Hi!

I just got a new panic in 10.99.7 after running a pbulk for less than
a day (after updating from 10.99.5, which was stable for weeks).

OCR'd from screenshot and manually corrected:

[ 24737.0090714] hardclock() at netbsd:hardclock+0x8b
[ 24737.0090714] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[ 24737.0090714] --- interrupt ---
[ 24737.0090714] pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x1f7
[ 24737.0090714] pmap_update() at netbsd:pmap_update+0x17
[ 24737.0090714] uvm_pagermapout() at netbsd:uvm_pagermapout+0x29
[ 24737.0090714] genfs_getpages() at netbsd:genfs_getpages+0x1755
[ 24737.0090714] VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
[ 24737.0190721] ufs_balloc_range() at netbsd:ufs_balloc_range+0x11a
[ 24737.0190721] ffs_write() at netbsd:ffs_write+0x34c
[ 24737.0190721] layer_bypass() at netbsd:layer_bypass+0x102
[ 24737.0190721] VOP_WRITE() at netbsd:VOP_WRITE+0x103
[ 24737.0190721] vn_write() at netbsd:vn_write+0xe0
[ 24737.0190721] dofilewrite() at netbsd:dofilewrite+0x80
[ 24737.0190721] sys_write() at netbsd:sys_write+0x49
[ 24737.0190721] syscall() at netbsd:syscall+0x196
[ 24737.0190721] --- syscall (number 4) ---
[ 24737.0190721] netbsd:syscall+0x196:
[ 24737.0190721] cpu14: End traceback...
[ 24737.0190721] fatal breakpoint trap in supervisor mode
[ 24737.0190721] trap type 1 code 0 rip 0x80235425 cs 0x8 rflags 0x202 
cr2 0x71f902c561cf ilevel 0x7 rsp 0xaaa17d0d9628
[ 24737.0190721] curlwp @xe@cbc87ed700 pid 9818.9818 lowest kstack 
0xaaa17ded52c0
Stopped in pid 9818.9818 (as) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
panic() at netbsd:panic+0x3c
defibrillate() at netbsd:defibrillate+0xe3
hardclock() at netbsd:hardclock+0x8b
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
--- interrupt ---
pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x1f7
pmap_update() at netbsd:pmap_update+0x17
uvm_pagermapout() at netbsd:uvm_pagermapout+0x29
genfs_getpages() at netbsd:genfs_getpages+0x1755
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
ufs_balloc_range() at netbsd:ufs_balloc_range+0x11a
ffs_write() at netbsd:ffs_write+0x34c
layer_bypass() at netbsd:layer_bypass+0x102
VOP_WRITE() at netbsd:VOP_WRITE+0x103
vn_write() at netbsd:vn_write+0xe0
dofilewrite() at netbsd:dofilewrite+0x80
sys_write() at netbsd:sys_write+0x49
syscall() at netbsd:syscall+0x196

Sorry, no crash dump available.

Any ideas what this one is about?
 Thomas


Re: kernel size change

2023-07-13 Thread Thomas Klausner
On Thu, Jul 13, 2023 at 08:16:34AM +, RVP wrote:
> That's one of them. The DATA segment is now at 0x1a0 instead of at
> 0x180 (2MB difference). The CODE segment must've increased in size for
> this. Check the previous `Section Headers:' display to see how the sizes
> have changed, starting with the `.text' section.

Here's the complete diff for readelf -We:

--- old 2023-07-13 12:58:51.079219761 +0200
+++ new 2023-07-13 12:58:37.280681398 +0200
@@ -10,7 +10,7 @@
   Version:   0x1
   Entry point address:   0x8020e000
   Start of program headers:  64 (bytes into file)
-  Start of section headers:  29650104 (bytes into file)
+  Start of section headers:  31749240 (bytes into file)
   Flags: 0x0
   Size of this header:   64 (bytes)
   Size of program headers:   56 (bytes)
@@ -22,39 +22,39 @@
 Section Headers:
   [Nr] Name  TypeAddress  OffSize   ES Flg 
Lk Inf Al
   [ 0]   NULL 00 00 00 
 0   0  0
-  [ 1] .text PROGBITS8020 20 e0 00  AX 
 0   0 4096
-  [ 2] .rodata.hotpatch  PROGBITS8100 100 00229c 00   
A  0   0  1
-  [ 3] .rodata   PROGBITS810022c0 10022c0 4fab40 00   
A  0   0 64
-  [ 4] .eh_frame PROGBITS814fce00 14fce00 1bf5b8 00   
A  0   0  8
-  [ 5] link_set_x86_hotpatch_descriptors PROGBITS816bc3b8 
16bc3b8 68 00   A  0   0  8
-  [ 6] link_set_modules  PROGBITS816bc420 16bc420 000908 00   
A  0   0  8
-  [ 7] link_set_sdt_argtypes_set PROGBITS816bcd28 16bcd28 
0020d8 00   A  0   0  8
-  [ 8] link_set_sdt_probes_set PROGBITS816bee00 16bee00 000bb0 
00   A  0   0  8
-  [ 9] link_set_sdt_providers_set PROGBITS816bf9b0 16bf9b0 
30 00   A  0   0  8
-  [10] link_set_sysctl_funcs PROGBITS816bf9e0 16bf9e0 0002f0 
00   A  0   0  8
-  [11] link_set_acpi_device_calls PROGBITS816bfcd0 16bfcd0 
10 00   A  0   0  8
-  [12] link_set_evcnts   PROGBITS816bfce0 16bfce0 000138 00   
A  0   0  8
-  [13] link_set_linux_module_param_desc PROGBITS816bfe18 
16bfe18 0002b8 00   A  0   0  8
-  [14] link_set_linux_module_param_info PROGBITS816c00d0 
16c00d0 0002c0 00   A  0   0  8
-  [15] link_set_domains  PROGBITS816c0390 16c0390 58 00   
A  0   0  8
-  [16] link_set_ieee80211_funcs PROGBITS816c03e8 16c03e8 
20 00   A  0   0  8
-  [17] link_set_ah_chips PROGBITS816c0408 16c0408 38 00   
A  0   0  8
-  [18] link_set_ah_rfs   PROGBITS816c0440 16c0440 38 00   
A  0   0  8
-  [19] link_set_dkwedge_methods PROGBITS816c0478 16c0478 
18 00   A  0   0  8
-  [20] link_set_prop_linkpools PROGBITS816c0490 16c0490 40 
00   A  0   0  8
-  [21] .data PROGBITS8180 180 0c3bf8 00  
WA  0   0 64
-  [22] .data.cacheline_aligned PROGBITS818c3c00 18c3c00 00e178 
00  WA  0   0 64
-  [23] .data.read_mostly PROGBITS818d1d80 18d1d80 001468 00  
WA  0   0 32
-  [24] .bss  NOBITS  818d4000 18d31e8 12c000 00  
WA  0   0 4096
-  [25] .note.netbsd.ident NOTE81a0 18d31e8 18 00   
   0   0  4
-  [26] .note.Xen NOTE 18d3200 000198 00
  0   0  4
-  [27] .identPROGBITS 18d3398 031eef 01  
MS  0   0  1
-  [28] .comment  PROGBITS 1905287 22 01  
MS  0   0  1
-  [29] .SUNW_ctf PROGBITS 19052ac 0e9048 00
  0   0  4
-  [30] .gnu_debuglinkPROGBITS 19ee2f4 18 00
  0   0  4
-  [31] .symtab   SYMTAB   19ee310 15a800 18
 32 31599  8
-  [32] .strtab   STRTAB   1b48b10 0fdf6e 00
  0   0  1
-  [33] .shstrtab STRTAB   1c46a7e 000238 00
  0   0  1
+  [ 1] .text PROGBITS8020 20 100 00  
AX  0   0 4096
+  [ 2] .rodata.hotpatch  PROGBITS8120 120 00229c 00   
A  0   0  1
+  [ 3] .rodata   PROGBITS812022c0 12022c0 4fbc80 00   
A  0   0 64
+  [ 4] .eh_frame PROGBITS816fdf40 16fdf40 1bf940 00   
A  0   0  8
+  [ 5] link_set_x86_hotpatch_descriptors PROGBITS818bd880 
18bd880 68 00   A  0   0  8
+  [ 6] link_set_modules  PROGBITS818bd8e8 18bd8e8 000908 00   
A  0   0  8
+  [ 7] link_set_sdt_argtypes_set PROGBITS818be1f0 18be1f0 
0020d8 00   A  0   0  8
+  [ 8] link_set_sdt_probes_set PROGBITS

Re: kernel size change

2023-07-12 Thread Thomas Klausner
On Wed, Jul 12, 2023 at 03:01:54PM +, RVP wrote:
> On Wed, 12 Jul 2023, Jonathan A. Kollasch wrote:
> 
> > The amd64 maximum page size (or something like that) is 2MiB and I
> > suspect a section of your kernel just crossed that boundary.  Anyway,
> > check things like size(1) and nm(1) --print-size (maybe with --size-sort)
> > on both kernels.
> > 
> 
> Yeah, that's more likely. A diff of `readelf -We' on the kernels would
> confirm this. See if the offset has changed a lot.

Many changes, but this one's the one you mean, I think:
66,67c66,67
<   LOAD   0x20 0x8020 0x0020 0x14c04d0 
0x14c04d0 R E 0x20
<   LOAD   0x180 0x8180 0x0180 0x0d31e8 
0x20 RW  0x20
---
>   LOAD   0x20 0x8020 0x0020 0x16c1998 
> 0x16c1998 R E 0x20
>   LOAD   0x1a0 0x81a0 0x01a0 0x0d3228 
> 0x20 RW  0x20

 Thomas


Re: kernel size change

2023-07-12 Thread Thomas Klausner
On Wed, Jul 12, 2023 at 09:19:50AM -0500, Jonathan A. Kollasch wrote:
> On Wed, Jul 12, 2023 at 02:28:15PM +0200, Thomas Klausner wrote:
> > Hi!
> > 
> > For the last few years, my nearly-GENERIC[1] kernel had a size of around 30MB.
> > Yesterday's kernel is 32MB.
> > 
> > Any ideas what changed, or how to find out?
> > 
> > -rwxr-xr-x  2 root  wheel   29652280 Jun 27 12:40 /netbsd.10.99.4
> > -rwxr-xr-x  2 root  wheel   31751416 Jul 11 22:57 /netbsd.10.99.5
> > 
> >  Thomas
> > 
> > 
> > [1] amd64/GENERIC plus
> > options FONT_GO_MONO12x23
> > no options FONT_BOLD16x32
> > no options FONT_BOLD8x16
> > options COMPAT_LINUX
> > options COMPAT_LINUX32
> 
> The amd64 maximum page size (or something like that) is 2MiB and I
> suspect a section of your kernel just crossed that boundary.  Anyway,
> check things like size(1) and nm(1) --print-size (maybe with --size-sort)
> on both kernels.

There are a lot of small changes, so I guess your page size idea is
the real change.

# size /netbsd.10.99.4
    text    data     bss      dec     hex filename
21759148  864728 1228800 23852676 16bf684 /netbsd.10.99.4
# size /netbsd.10.99.5
    text    data     bss      dec     hex filename
23861620  864792 1228800 25955212 18c0b8c /netbsd.10.99.5
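
The numbers in this thread fit the 2 MiB boundary idea: both the file size and the text size grew by slightly more than 2 MiB. A small sketch of the arithmetic, using the sizes reported in the thread; the `align_up` helper and the two section sizes around 14 MiB are hypothetical and only illustrate how rounding up to a 2 MiB boundary can turn a small growth into a 2 MiB jump.

```python
TWO_MIB = 2 * 1024 * 1024


def align_up(size: int, alignment: int = TWO_MIB) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


# Sizes reported in the thread (ls -l and size(1) output).
old_file, new_file = 29652280, 31751416
old_text, new_text = 21759148, 23861620

file_growth = new_file - old_file
text_growth = new_text - old_text
print(file_growth, file_growth - TWO_MIB)
print(text_growth, text_growth - TWO_MIB)

# Hypothetical section sizes just below and just above a 14 MiB boundary.
below = 14 * 1024 * 1024 - 4096
above = 14 * 1024 * 1024 + 4096
print(align_up(above) - align_up(below))
```

Under that assumption, a section that grows by only a few KiB can add a full 2 MiB of padding once it crosses the boundary.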

diff old new (after removing address column):

355a356
> 0001 d besteffort.5
394a396
> 0001 d ready.6
3190a3193
> 0006 t memfd_ioctl
3408a3412
> 0007 r memfd_prefix
7525d7528
< 000a T dk_done
9332d9334
< 000f r __func__.2
9345a9348
> 000f r __func__.4
9822c9825
< 0010 b lasttime.6
---
> 0010 b lasttime.8
10488c10491
< 0010 r interval.5
---
> 0010 r interval.7
11515d11517
< 0012 r __func__.1
11565a11568
> 0012 r __func__.3
12316a12320
> 0014 r __func__.1
13375a13380,13381
> 0018 r CSWTCH.104
> 0018 r CSWTCH.110
13412d13417
< 0018 r CSWTCH.82
13415d13419
< 0018 r CSWTCH.88
14166a14171
> 0019 r __func__.2
14174d14178
< 0019 r __func__.3
14177a14182
> 0019 r __func__.5
16330a16336
> 0020 t crashme_kpreempt_spinout
16813d16818
< 0024 T i915_ggtt_enable_hw
17063a17069
> 0025 T i915_ggtt_enable_hw
18504c18510
< 002c r CSWTCH.62
---
> 002c r CSWTCH.64
22531a22538
> 003a t memfd_seek
22938a22946
> 003e T curcpu_stable
24149a24158
> 0044 t memfd_close
24914a24924
> 0049 t crashme_spl_spinout
26882d26891
< 0057 t entropy_softintr
28521a28531
> 005c t memfd_fcntl
29191a29202
> 0060 t entropy_softintr
29815d29825
< 0065 t entropy_pending_cpu
30843c30853
< 006d t tpm_poll
---
> 006d t tpm_poll.constprop.0
30907a30918
> 006e T sys_ftruncate
32288d32298
< 0078 R audio_fileops
32308d32317
< 0078 R drm_fileops
32345d32353
< 0078 R pad_fileops
32360d32367
< 0078 R socketops
32384d32390
< 0078 R vnops
32476d32481
< 0078 r bpf_fileops
32483,32487d32487
< 0078 r cryptofops
< 0078 r dmabuf_fileops
< 0078 r drm_syncobj_file_ops
< 0078 r drvctl_fileops
< 0078 r dtv_demux_fileops
32490d32489
< 0078 r eventfd_fileops
32493,32494d32491
< 0078 r fops
< 0078 r fops.2
32515,32516d32511
< 0078 r kqueueops
< 0078 r ksyms_fileops
32520d32514
< 0078 r mqops
32569,32571d32562
< 0078 r pipeops
< 0078 r putter_fileops
< 0078 r semops
32578,32579d32568
< 0078 r sync_file_ops
< 0078 r tap_fileops
32582d32570
< 0078 r timerfd_fileops
33346a5
> 0080 R audio_fileops
33347a7
> 0080 R drm_fileops
33355a33346
> 0080 R pad_fileops
33357a33349
> 0080 R socketops
33358a33351
> 0080 R vnops
33448a33442
> 0080 r bpf_fileops
33456a33451
> 0080 r cryptofops
33460a33456
> 0080 r dmabuf_fileops
33461a33458,33460
> 0080 r drm_syncobj_file_ops
> 0080 r drvctl_fileops
> 0080 r dtv_demux_fileops
33462a33462
> 0080 r eventfd_fileops
33463a33464,33465
> 0080 r fops
> 0080 r fops.2
33515a33518,33519
> 0080 r kqueueops
> 0080 r ksyms_fileops
33518a3

Re: kernel size change

2023-07-12 Thread Thomas Klausner
On Wed, Jul 12, 2023 at 02:28:15PM +0200, Thomas Klausner wrote:
> For the last few years, my nearly-GENERIC[1] kernel had a size of around 30MB.
> Yesterday's kernel is 32MB.
> 
> Any ideas what changed, or how to find out?

Comparing "nm /netbsd | sed "s/^[^ ] //" | sort" of the old and new kernel, I
see the diff included after my signature.

So mostly heartbeat, memfd, and some CSWTCH.* I don't understand.
Are these really 2MB?
 Thomas

214a215
> A _KERNEL_OPT_HEARTBEAT
7788a7790
> T addrulwp
10609a10612
> T curcpu_stable
10776a10780
> T db_syncobj_owner
17752a17757
> T linux_sys_memfd_create
25004a25010
> T sys_memfd_create
28043c28049
< b lasttime.6
---
> b lasttime.8
28872a28879
> d besteffort.5
29696a29704
> d ready.6
31127a31136
> r CSWTCH.104
31135a31145
> r CSWTCH.110
31137d31146
< r CSWTCH.1147
31140a31150
> r CSWTCH.1153
31146c31156
< r CSWTCH.1280
---
> r CSWTCH.1283
31151,31152c31161
< r CSWTCH.1336
< r CSWTCH.1337
---
> r CSWTCH.1339
31153a31163
> r CSWTCH.1340
31336d31345
< r CSWTCH.62
31341a31351
> r CSWTCH.64
31365d31374
< r CSWTCH.82
31381d31389
< r CSWTCH.88
32775a32784
> r __func__.4
32862a32872
> r __func__.5
37991c38001
< r interval.5
---
> r interval.7
38370a38381,38382
> r memfd_fileops
> r memfd_prefix
45017a45030
> t crashme_kpreempt_spinout
45021a45035
> t crashme_spl_spinout
45268a45283
> t db_show_all_tstiles
50636a50652,50660
> t memfd_close
> t memfd_fcntl
> t memfd_ioctl
> t memfd_mmap
> t memfd_read
> t memfd_seek
> t memfd_stat
> t memfd_truncate
> t memfd_write
56216c56240
< t tpm_poll
---
> t tpm_poll.constprop.0
56225c56249
< t tpm_waitfor.constprop.0
---
> t tpm_waitfor.constprop.0.isra.0
58046a58071
> t vn_truncate
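
The comparison described in this message (drop the address column from the nm output, sort, then diff) can be sketched as follows. The helper names and the sample nm lines are hypothetical; real input would come from running nm(1) on each kernel.

```python
import difflib


def strip_address(line: str) -> str:
    """Drop the leading address column from one line of nm output."""
    if line[:1].isspace():
        # Undefined symbols have a blank address column; keep type and name.
        return line.strip()
    return line.split(None, 1)[1]


def symbol_lines(nm_output: str) -> list:
    """Return sorted 'type name' entries with addresses removed."""
    return sorted(strip_address(l) for l in nm_output.splitlines() if l.strip())


def symbol_changes(old_nm: str, new_nm: str) -> list:
    """List removed ('-') and added ('+') symbol entries."""
    diff = difflib.unified_diff(
        symbol_lines(old_nm), symbol_lines(new_nm), lineterm="", n=0
    )
    return [
        d for d in diff
        if d[:1] in "+-" and not d.startswith(("--- ", "+++ "))
    ]


# Made-up nm output for illustration only.
old_nm = "ffffffff80200000 T start\nffffffff80300000 t tpm_poll\n"
new_nm = (
    "ffffffff80200000 T start\n"
    "ffffffff80400000 t memfd_close\n"
    "ffffffff80500000 t tpm_poll.constprop.0\n"
)
changes = symbol_changes(old_nm, new_nm)
print("\n".join(changes))
```

Because addresses are removed before diffing, symbols that merely moved do not show up; only added, removed, or renamed entries remain.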


kernel size change

2023-07-12 Thread Thomas Klausner
Hi!

For the last few years, my nearly-GENERIC[1] kernel had a size of around 30MB.
Yesterday's kernel is 32MB.

Any ideas what changed, or how to find out?

-rwxr-xr-x  2 root  wheel   29652280 Jun 27 12:40 /netbsd.10.99.4
-rwxr-xr-x  2 root  wheel   31751416 Jul 11 22:57 /netbsd.10.99.5

 Thomas


[1] amd64/GENERIC plus
options FONT_GO_MONO12x23
no options FONT_BOLD16x32
no options FONT_BOLD8x16
options COMPAT_LINUX
options COMPAT_LINUX32


scp/sftp -R broken?

2023-06-05 Thread Thomas Klausner
Hi!

When I try to recursively copy a directory with "scp -r" or sftp's
"put -Rp" between a -current and a NetBSD 9, I see:

# scp -r a netbsd-9:
scp: realpath ./a: No such file
scp: upload "./a": path canonicalization failed
scp: failed to upload directory a to .

# ssh -V
OpenSSH_9.1 NetBSD_Secure_Shell-20221004-hpn13v14-lpk, OpenSSL 3.0.8 7 Feb 2023

netbsd-9# ssh -V
OpenSSH_8.0 NetBSD_Secure_Shell-20220604-hpn13v14-lpk, OpenSSL 1.1.1k  25 Mar 
2021

scp of single files works.

The same command works if I copy it onto the same machine (and thus
same ssh on the other side), both current -> current and netbsd9 ->
netbsd9.

Any ideas why this doesn't work, and what the error message is trying to tell me?
 Thomas


Re: -current build failure

2023-06-04 Thread Thomas Klausner
On Mon, Jun 05, 2023 at 03:01:37AM +1000, Luke Mewburn wrote:
> I managed to reproduce this by just building the tools with -V MKLLVM=yes.
> I've reverted tools/Makefile.host revision 1.35 and it seems to fix the
> tools build for me.
> 
> Does this resolve the issue for you?

Yes.

Now I just have a setlists problem:

===  10 extra files in DESTDIR  =
Files in DESTDIR but missing from flist.
File is obsolete or flist is out of date ?
--
./usr/tests/libexec/ld.elf_so/libh_abuse_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_abuse_static_g.a
./usr/tests/libexec/ld.elf_so/libh_def_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_def_static_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyctor_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_onlydef_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyuse_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyuse_static_g.a
./usr/tests/libexec/ld.elf_so/libh_use_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_use_static_g.a
=  end of 10 extra files  ===
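
The report above is in essence a set difference: files present in DESTDIR that no set list mentions. A minimal sketch of that check, with hypothetical function name and sample paths; it does not reproduce the actual build tooling.

```python
def extra_files(destdir_files, flist_files):
    """Return files found in DESTDIR that are missing from the file list."""
    return sorted(set(destdir_files) - set(flist_files))


# Hypothetical inputs for illustration only.
destdir = [
    "./usr/tests/libexec/ld.elf_so/libh_def_dynamic.so",
    "./usr/tests/libexec/ld.elf_so/libh_def_dynamic_g.a",
]
flist = [
    "./usr/tests/libexec/ld.elf_so/libh_def_dynamic.so",
]
extras = extra_files(destdir, flist)
print(len(extras), "extra files in DESTDIR")
for path in extras:
    print(path)
```

Each reported path then needs either a set list entry or a fix so that the file is no longer installed.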

 Thomas


-current build failure

2023-06-04 Thread Thomas Klausner
Hi!

I just tried updating my -current but the build failed:

build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes -V 
NOGCCERROR=yes -T /usr/obj/tools.gcc -m amd64 -O /usr/obj/src.amd64 -D 
/usr/obj/amd64.gcc.20230604 -R /usr/obj/amd64.gcc.20230604.release distribution

--- support-modules ---
g++: error: unrecognized command-line option '-stdlib=libc++'
g++: error: unrecognized command-line option '-fmodules'; did you mean 
'-fmoduleinfo'?
g++: error: unrecognized command-line option '-fcxx-modules'
g++: error: unrecognized command-line option 
'-fmodules-cache-path=./module.cache'


Any ideas how to fix this?

Cheers,
 Thomas


Re: LLONG_MAX not available from c++

2023-03-31 Thread Thomas Klausner
On Fri, Mar 31, 2023 at 02:46:18PM +0200, Joerg Sonnenberger wrote:
> Am Fri, Mar 31, 2023 at 02:39:40PM +0200 schrieb Thomas Klausner:
> > On Fri, Mar 31, 2023 at 02:35:38PM +0200, Martin Husemann wrote:
> > > Which options does it pass to g++ ?
> > 
> > Good point, but it's not the compiler, it's lua itself:
> > 
> >  tar xvzf lua-5.4.4.tar.gz
> >  cd lua-5.4.4/src
> >  c++ lbaselib.c
> > 
> > and see it fail.
> > 
> > In file included from lua.h:16,
> >  from lbaselib.c:18:
> > luaconf.h:557:2: error: #error "Compiler does not support 'long long'. Use 
> > option '-DLUA_32BITS'   or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for 
> > details)"
> >   557 | #error "Compiler does not support 'long long'. Use option 
> > '-DLUA_32BITS' \
> >   |  ^
> 
> Make sure c++ is using at least -std=c++11?

Same error, also with c++17 and gnu++17. Probably lua does something
weird.

> Also, to ensure stack
> unwinding for C, -fexceptions should be enough.

Thanks for the information. I don't expect I'd get MAME to change to
that though.
 Thomas


Re: LLONG_MAX not available from c++

2023-03-31 Thread Thomas Klausner
On Fri, Mar 31, 2023 at 02:35:38PM +0200, Martin Husemann wrote:
> Which options does it pass to g++ ?

Good point, but it's not the compiler, it's lua itself:

 tar xvzf lua-5.4.4.tar.gz
 cd lua-5.4.4/src
 c++ lbaselib.c

and see it fail.

In file included from lua.h:16,
 from lbaselib.c:18:
luaconf.h:557:2: error: #error "Compiler does not support 'long long'. Use 
option '-DLUA_32BITS'   or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for 
details)"
  557 | #error "Compiler does not support 'long long'. Use option 
'-DLUA_32BITS' \
  |  ^

 Thomas


LLONG_MAX not available from c++

2023-03-31 Thread Thomas Klausner
Hi!

mame wants to compile lua with a c++ compiler.[1]

lua has a check in its headers to detect C99 mode by looking for
LLONG_MAX. If that is not found (and no workaround like an explicit
fallback to 32-bit ints is defined) then it fails to compile.

g++ in -current doesn't get this symbol when you include limits.h
(which lua does, since this is still C code) because of (from
/usr/include/machine/limits.h):

#if defined(_ISOC99_SOURCE) || (__STDC_VERSION__ - 0) >= 199901L || \
defined(_NETBSD_SOURCE)
#define ULLONG_MAX  0xffffffffffffffffULL   /* max unsigned long long */
#define LLONG_MAX   0x7fffffffffffffffLL    /* max signed long long */
#define LLONG_MIN   (-0x7fffffffffffffffLL-1) /* min signed long long */
#endif
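
To make the guard easier to reason about, here is a small model of the #if condition quoted above. It is only a sketch: `llong_max_visible` and the three macro environments are hypothetical, and which macros a particular compiler invocation really predefines is an assumption, not something stated in this message.

```python
def llong_max_visible(macros: dict) -> bool:
    """Model of the quoted #if guard; macros maps defined names to values."""
    # In #if, an identifier that is not defined evaluates to 0.
    stdc_version = macros.get("__STDC_VERSION__", 0)
    return (
        "_ISOC99_SOURCE" in macros
        or (stdc_version - 0) >= 199901
        or "_NETBSD_SOURCE" in macros
    )


# Assumed macro environments, for illustration only.
c99_build = {"__STDC_VERSION__": 199901}
cxx_build = {"__cplusplus": 201703}
cxx_with_netbsd_source = {"__cplusplus": 201703, "_NETBSD_SOURCE": 1}

for name, env in [
    ("C99", c99_build),
    ("C++", cxx_build),
    ("C++ with _NETBSD_SOURCE", cxx_with_netbsd_source),
]:
    print(name, llong_max_visible(env))
```

In this model a C++ build that defines none of the three macros never sees LLONG_MAX, which matches the failure described here.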

What is the best solution here?

1. define LUA_32BITS to use 32-bit ints?

2. pass a magic define to the compiler so the header works better when
used from c++? (which one would that be, _NETBSD_SOURCE?)

3. change the installed header in some way (but that won't help NetBSD
8/9/10)

Suggestions welcome!
 Thomas


[1] https://www.mamedev.org/ says: The technical reason for this
change is that MAME requires C++ stack frames to be unwound correctly,
including destructor calls, when Lua errors are raised from C++
code. Using Lua compiled as C will cause resource leaks.


Re: error installing libiconv-1.17

2023-03-27 Thread Thomas Klausner
On Mon, Mar 27, 2023 at 10:03:18AM +, Riccardo Mottola wrote:
> I am trying to upgrade current pkgsrc packages on current.
> 
> Current installed version:
> libiconv-1.14nb3Character set conversion library

IIRC libiconv doesn't build if a different version is already
installed - is that the case in your setup?
 Thomas


Re: nouveau: console stops updating

2023-03-22 Thread Thomas Klausner
On Sun, Mar 19, 2023 at 02:23:42PM +0100, Thomas Klausner wrote:
> Ok, so here I'm answering my own question - I looked in the BIOS
> settings and in the 'default boot options' I selected 'Legacy OPROM'
> instead of either that or UEFI, and the machine booted fine and I
> could start X! Yay :)

And with today's snapshot which has the attached pullup, I don't even
need to do that - it just works.

Thank you :-)
 Thomas

--- Begin Message ---
The following reply was made to PR port-amd64/53126; it has been noted by GNATS.

From: "Martin Husemann" 
To: gnats-b...@gnats.netbsd.org
Cc: 
Subject: PR/53126 CVS commit: [netbsd-10] src/sys
Date: Mon, 20 Mar 2023 17:24:15 +

 Module Name:   src
 Committed By:  martin
 Date:  Mon Mar 20 17:24:15 UTC 2023
 
 Modified Files:
src/sys/dev/wscons [netbsd-10]: wsdisplay.c wsdisplayvar.h
src/sys/external/bsd/drm2/dist/drm/amd/amdgpu [netbsd-10]:
amdgpu_gart.c
src/sys/external/bsd/drm2/nouveau [netbsd-10]: nouveau_pci.c
src/sys/external/bsd/drm2/radeon [netbsd-10]: radeon_pci.c
 
 Log Message:
 Pull up following revision(s) (requested by mrg in ticket #122):
 
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_gart.c: revision 1.11
sys/external/bsd/drm2/nouveau/nouveau_pci.c: revision 1.37
sys/external/bsd/drm2/radeon/radeon_pci.c: revision 1.22
sys/dev/wscons/wsdisplay.c: revision 1.166
sys/dev/wscons/wsdisplayvar.h: revision 1.57
 
 amdgpu: Fix bogus loop invariant assertions in amdgpu_gart_map.
 nouveau: Kick out genfb on firmware framebuffer before initializing.
 
 PR kern/53126
 
 radeon: Kick out genfb on firmware framebuffer before initializing.
 this is the same change as nouveau_pci.c:1.37, and should fix at
 least PR#56714 and i thought at least another PR i can't find right
 now.  it fixes at least 2 different radeon cards for me on UEFI
 booted system.
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.165 -r1.165.4.1 src/sys/dev/wscons/wsdisplay.c
 cvs rdiff -u -r1.56 -r1.56.4.1 src/sys/dev/wscons/wsdisplayvar.h
 cvs rdiff -u -r1.10 -r1.10.4.1 \
 src/sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_gart.c
 cvs rdiff -u -r1.36 -r1.36.4.1 \
 src/sys/external/bsd/drm2/nouveau/nouveau_pci.c
 cvs rdiff -u -r1.21 -r1.21.4.1 src/sys/external/bsd/drm2/radeon/radeon_pci.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 
--- End Message ---


Re: nouveau: console stops updating

2023-03-19 Thread Thomas Klausner
On Sun, Mar 19, 2023 at 02:16:47PM +0100, Thomas Klausner wrote:
> I tried a NetBSD 10 snapshot with a GTX 970 today.
> 
> Sysinst ran fine -- in high resolution! -- but when I booted NetBSD
> after the installation, I get the screen update stop as reported in PR
> 57168 and PR 53126.
> 
> So I wonder why it worked for sysinst, and how I could force my
> BIOS to do the same for the installed NetBSD. Any ideas/hints?

Ok, so here I'm answering my own question - I looked in the BIOS
settings and in the 'default boot options' I selected 'Legacy OPROM'
instead of either that or UEFI, and the machine booted fine and I
could start X! Yay :)
 Thomas


nouveau: console stops updating

2023-03-19 Thread Thomas Klausner
Hi!

I tried a NetBSD 10 snapshot with a GTX 970 today.

Sysinst ran fine -- in high resolution! -- but when I booted NetBSD
after the installation, I get the screen update stop as reported in PR
57168 and PR 53126.

So I wonder why it worked for sysinst, and how I could force my
BIOS to do the same for the installed NetBSD. Any ideas/hints?
 Thomas


Re: 10.99.2 panic in kern_timeout.c

2023-02-03 Thread Thomas Klausner
On Fri, Feb 03, 2023 at 09:24:11AM -, Michael van Elst wrote:
> w...@netbsd.org (Thomas Klausner) writes:
> 
> >> The biggest change recently is probably that my bulk build switched
> >> from ghc92 to ghc94, but I don't know if that could cause this.
> 
> >Next bulk build, next panic, quite reliably. Has anyone else seen this?
> 
> Not yet, maybe this is the first use of timerfd from multiple threads.

I used

/usr/pkgsrc/mk> cvs di haskell.mk 
Index: haskell.mk
===
RCS file: /cvsroot/pkgsrc/mk/haskell.mk,v
retrieving revision 1.54
diff -u -r1.54 haskell.mk
--- haskell.mk  1 Feb 2023 03:37:21 -   1.54
+++ haskell.mk  4 Feb 2023 07:56:28 -
@@ -148,7 +148,7 @@
 HASKELL_ENABLE_TESTS?= no
 HASKELL_UNRESTRICT_DEPENDENCIES?=  # empty
 
-.include "../../lang/ghc94/buildlink3.mk"
+.include "../../lang/ghc92/buildlink3.mk"
 
 # Some Cabal packages requires preprocessors to build, and we don't
 # want them to implicitly depend on such tools. Place dummy scripts by


and the bulk build succeeded.

Can someone else please try building e.g. pandoc in a bulk build on
10.99.2/amd64/Jan 27 and check if they see the same issue?

Thanks,
 Thomas


Re: 10.99.2 panic in kern_timeout.c

2023-02-03 Thread Thomas Klausner
On Wed, Feb 01, 2023 at 02:00:07PM +0100, Thomas Klausner wrote:
> I have a new problem on a system running 10.99.2/amd64 from Jan 27,
> which was heavily bulk building most of the time, and stable.
> 
> Now I have seen this panic twice today already (OCR'd so beware of typos):
> 
> panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || 
> c->c_cpu->cc_active != c" failed: file "/usr/src/sys/kern/kern_timeout.c" line 
> 381 running callout 0x96631f85b80: c_func (Ox8@e070e3) c_flags 
> (0x100) destroyed from 0xffff80e51f99
> cpu31: Begin traceback...
> vpanic() at netbsd:vpanic+0x183
> kern_assert() at netbsd:kern_assert+0x4b
> callout_destroy() at netbsd:callout_destroy+0xa2
> timerfd_fop_close() at netbsd:timerfd_fop_close+0x36
> closef() at netbsd:closef+0x60
> fd_close() at netbsd:fd_close+0x138
> sys_close() at netbsd:sys_close+0x22
> syscall() at netbsd:syscall+0x196
> --- syscall (number 6) ---
> netbsd:syscall+0x196:
> cpu31: End traceback...
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 0x80235315 cs 0x8 rflags 0x202 cr2 
> 0x71b8870b12dd ilevel 0 rsp 0xa2a133029db0
> 
> The biggest change recently is probably that my bulk build switched
> from ghc92 to ghc94, but I don't know if that could cause this.

Next bulk build, next panic, quite reliably. Has anyone else seen this?
 Thomas


10.99.2 panic in kern_timeout.c

2023-02-01 Thread Thomas Klausner
Hi!

I have a new problem on a system running 10.99.2/amd64 from Jan 27,
which was heavily bulk building most of the time, and stable.

Now I have seen this panic twice today already (OCR'd so beware of typos):

panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || 
c->c_cpu->cc_active != c" failed: file "/usr/src/sys/kern/kern_timeout.c" line 
381 running callout 0x96631f85b80: c_func (Ox8@e070e3) c_flags 
(0x100) destroyed from 0xffff80e51f99
cpu31: Begin traceback...
vpanic() at netbsd:vpanic+0x183
kern_assert() at netbsd:kern_assert+0x4b
callout_destroy() at netbsd:callout_destroy+0xa2
timerfd_fop_close() at netbsd:timerfd_fop_close+0x36
closef() at netbsd:closef+0x60
fd_close() at netbsd:fd_close+0x138
sys_close() at netbsd:sys_close+0x22
syscall() at netbsd:syscall+0x196
--- syscall (number 6) ---
netbsd:syscall+0x196:
cpu31: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0x80235315 cs 0x8 rflags 0x202 cr2 
0x71b8870b12dd ilevel 0 rsp 0xa2a133029db0

The biggest change recently is probably that my bulk build switched
from ghc92 to ghc94, but I don't know if that could cause this.

Ideas?
 Thomas


Re: gnucash coredump on startup

2023-01-13 Thread Thomas Klausner
On Sun, Jan 08, 2023 at 11:58:09PM +0100, Thomas Klausner wrote:
> On Sat, Jan 07, 2023 at 03:42:00PM -, Christos Zoulas wrote:
> > In article ,
> > Thomas Klausner   wrote:
> > >Hi!
> > >
> > >I've just replaced my 10.99.2/20221231 userland (kernel slightly
> > >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland.
> > >
> > >Now gnucash dumps core on startup:
> > 
> > > Could be rtld related. Can you try with the older ld.elf_so?
> 
> I can't go back with just that one, but I did the following test:
> 
> Downgraded my whole userland to 20221231:
> gnucash works
> 
> install ld.elf_so from 20230107:
> gnucash dumps core
> 
> install ld.elf_so from 20221231:
> gnucash works
> 
> So yes, definitely an issue in ld.elf_so :)

After christos' commits from the last hours, this bug is fixed - both
gnucash and guile (old binaries) still work on an updated system.

I'll re-build the system from scratch next and report if there are any
issues.

Thanks, christos!
 Thomas


Re: gnucash coredump on startup

2023-01-08 Thread Thomas Klausner
On Sat, Jan 07, 2023 at 03:42:00PM -, Christos Zoulas wrote:
> In article ,
> Thomas Klausner   wrote:
> >Hi!
> >
> >I've just replaced my 10.99.2/20221231 userland (kernel slightly
> >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland.
> >
> >Now gnucash dumps core on startup:
> 
> Could be rtld related. Can you try with the older ld.elf_so?

I can't go back with just that one, but I did the following test:

Downgraded my whole userland to 20221231:
gnucash works

install ld.elf_so from 20230107:
gnucash dumps core

install ld.elf_so from 20221231:
gnucash works

So yes, definitely an issue in ld.elf_so :)
 Thomas


Re: ldscripts not cleaned up

2023-01-08 Thread Thomas Klausner
On Sun, Jan 08, 2023 at 08:48:24PM -, Christos Zoulas wrote:
> In article ,
> Thomas Klausner   wrote:
> >Hi!
> >
> >NetBSD after the switch to binutils 2.39 does not install the
> >following files any longer, but they are not marked as obsolete
> >either:
> >
> >/usr/libdata/ldscripts/elf_k1om.x
> >/usr/libdata/ldscripts/elf_k1om.xbn
> >/usr/libdata/ldscripts/elf_k1om.xc
> >/usr/libdata/ldscripts/elf_k1om.xd
> >/usr/libdata/ldscripts/elf_k1om.xdc
> >/usr/libdata/ldscripts/elf_k1om.xdw
> >/usr/libdata/ldscripts/elf_k1om.xn
> >/usr/libdata/ldscripts/elf_k1om.xr
> >/usr/libdata/ldscripts/elf_k1om.xs
> >/usr/libdata/ldscripts/elf_k1om.xsc
> >/usr/libdata/ldscripts/elf_k1om.xsw
> >/usr/libdata/ldscripts/elf_k1om.xu
> >/usr/libdata/ldscripts/elf_k1om.xw
> >/usr/libdata/ldscripts/elf_l1om.x
> >/usr/libdata/ldscripts/elf_l1om.xbn
> >/usr/libdata/ldscripts/elf_l1om.xc
> >/usr/libdata/ldscripts/elf_l1om.xd
> >/usr/libdata/ldscripts/elf_l1om.xdc
> >/usr/libdata/ldscripts/elf_l1om.xdw
> >/usr/libdata/ldscripts/elf_l1om.xn
> >/usr/libdata/ldscripts/elf_l1om.xr
> >/usr/libdata/ldscripts/elf_l1om.xs
> >/usr/libdata/ldscripts/elf_l1om.xsc
> >/usr/libdata/ldscripts/elf_l1om.xsw
> >/usr/libdata/ldscripts/elf_l1om.xu
> >/usr/libdata/ldscripts/elf_l1om.xw
> >
> >Should new binutils install them, or should they be marked as obsolete?
> >
> They should be marked as obsolete...

Done!
 Thomas


lang/guile30 crash in build even without lto [was Re: lang/guile30 build issue: lto support missing in ar/ranlib]

2023-01-08 Thread Thomas Klausner
On Sun, Jan 08, 2023 at 12:38:05PM -0500, Greg Troxel wrote:
> Thomas Klausner  writes:
> 
> > On 10.99.2 after the load sections 2->4 change I see the following
> > when building lang/guile30:
> >
> > ar: libguile_3.0_la-alist.o: plugin needed to handle lto object
> > ranlib: .libs/libguile-3.0.a(libguile_3.0_la-alist.o): plugin needed to 
> > handle lto object
> >   CCLD guile
> >
> > and the resulting binary segfaults when run (which also happens during
> > the build), backtrace below.
> >
> > Is there a flag to turn off lto, or can we please get ar/ranlib
> > support for lto?
> >
> > To reproduce, just try building 'lang/guile30'.
> 
> It fails to even build on i386.  I have a local patch, pending figuring
> it out, to just disable lto.  I was unsure if that belonged on only some
> arches, but seems best to mass disable and then figure it out.

Ok, I tried that out - the warning is gone, but guile is still crashing.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x799bb2aff3a6 in scm_sloppy_assq () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
(gdb) bt
#0  0x799bb2aff3a6 in scm_sloppy_assq () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#1  0x799bb2b27c28 in scm_hash_fn_ref () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#2  0x799bb2b15f18 in expand () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#3  0x799bb2b160c4 in expand_and () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#4  0x799bb2b16e49 in expand_cond_clauses () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#5  0x799bb2b16e06 in expand_cond_clauses () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#6  0x799bb2b15dda in expand () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#7  0x799bb2b18f7f in expand_letrec_helper () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#8  0x799bb2b17492 in expand_lambda_star_case () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#9  0x799bb2b179a9 in expand_lambda_star () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#10 0x799bb2b1688a in expand_set_x () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#11 0x799bb2b1618b in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#12 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#13 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#14 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#15 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#16 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#17 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#18 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#19 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#20 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#21 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#22 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#23 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#24 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#25 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#26 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#27 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#28 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#29 0x799bb2b1617c in expand_sequence () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1
#30 0x799bb2b1617c in expand_sequence () from 
/scratch/l

ldscripts not cleaned up

2023-01-08 Thread Thomas Klausner
Hi!

NetBSD after the switch to binutils 2.39 does not install the
following files any longer, but they are not marked as obsolete
either:

/usr/libdata/ldscripts/elf_k1om.x
/usr/libdata/ldscripts/elf_k1om.xbn
/usr/libdata/ldscripts/elf_k1om.xc
/usr/libdata/ldscripts/elf_k1om.xd
/usr/libdata/ldscripts/elf_k1om.xdc
/usr/libdata/ldscripts/elf_k1om.xdw
/usr/libdata/ldscripts/elf_k1om.xn
/usr/libdata/ldscripts/elf_k1om.xr
/usr/libdata/ldscripts/elf_k1om.xs
/usr/libdata/ldscripts/elf_k1om.xsc
/usr/libdata/ldscripts/elf_k1om.xsw
/usr/libdata/ldscripts/elf_k1om.xu
/usr/libdata/ldscripts/elf_k1om.xw
/usr/libdata/ldscripts/elf_l1om.x
/usr/libdata/ldscripts/elf_l1om.xbn
/usr/libdata/ldscripts/elf_l1om.xc
/usr/libdata/ldscripts/elf_l1om.xd
/usr/libdata/ldscripts/elf_l1om.xdc
/usr/libdata/ldscripts/elf_l1om.xdw
/usr/libdata/ldscripts/elf_l1om.xn
/usr/libdata/ldscripts/elf_l1om.xr
/usr/libdata/ldscripts/elf_l1om.xs
/usr/libdata/ldscripts/elf_l1om.xsc
/usr/libdata/ldscripts/elf_l1om.xsw
/usr/libdata/ldscripts/elf_l1om.xu
/usr/libdata/ldscripts/elf_l1om.xw

Should new binutils install them, or should they be marked as obsolete?
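
A minimal sketch for checking whether any of these files are still
present on an installed system. The helper name list_stale_ldscripts is
made up for illustration, and the optional argument is an assumed
alternative root such as a DESTDIR:

```sh
# Print the elf_k1om/elf_l1om linker scripts that still exist below a
# root directory (default: the running system's /).
list_stale_ldscripts() {
    root=${1:-}
    for emul in elf_k1om elf_l1om; do
        for suffix in x xbn xc xd xdc xdw xn xr xs xsc xsw xu xw; do
            f="$root/usr/libdata/ldscripts/$emul.$suffix"
            [ -e "$f" ] && printf '%s\n' "$f"
        done
    done
    return 0
}
```

Running it without an argument checks the live system; no output means
none of the 26 files listed above are left over.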
 Thomas


lang/guile30 build issue: lto support missing in ar/ranlib

2023-01-08 Thread Thomas Klausner
Hi!

On 10.99.2 after the load sections 2->4 change I see the following
when building lang/guile30:

ar: libguile_3.0_la-alist.o: plugin needed to handle lto object
ranlib: .libs/libguile-3.0.a(libguile_3.0_la-alist.o): plugin needed to handle 
lto object
  CCLD guile

and the resulting binary segfaults when run (which also happens during
the build), backtrace below.

Is there a flag to turn off lto, or can we please get ar/ranlib
support for lto?

To reproduce, just try building 'lang/guile30'.

Thanks,
 Thomas

[New process 4469]
Core was generated by `guile'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  weak_set_lookup.constprop.0 (set=set@entry=0x7902a5065150, 
hash=581991487143039215, hash@entry=290995743571519607, 
pred=pred@entry=0x7902a58f6f1b ,
closure=closure@entry=0x7f7fff302a50, dflt=) at 
weak-set.c:483
483   other_hash = entries[k].hash;
(gdb) bt
#0  weak_set_lookup.constprop.0 (set=set@entry=0x7902a5065150, 
hash=581991487143039215, hash@entry=290995743571519607, 
pred=pred@entry=0x7902a58f6f1b ,
closure=closure@entry=0x7f7fff302a50, dflt=) at 
weak-set.c:483
#1  0x7902a58f56b8 in scm_c_weak_set_lookup (dflt=0x4, 
closure=0x7f7fff302a50, pred=0x7902a58f6f1b , 
raw_hash=290995743571519607, set=) at weak-set.c:763
#2  lookup_interned_symbol (raw_hash=290995743571519607, name=0x7902a500ef80) 
at symbols.c:112
#3  scm_i_str2symbol (str=0x7902a500ef80) at symbols.c:244
#4  0x7902a58f92a6 in scm_string_to_symbol (string=) at 
symbols.c:360
#5  0x7902a58fe437 in scm_gensym (prefix=0x7902a4ed30e0) at symbols.c:408
#6  0x7902a5872cb1 in transform_bindings 
(bindings=bindings@entry=0x7902a5028cb0, expr=expr@entry=0x7902a5028d50, 
names=names@entry=0x7f7fff302c20, vars=vars@entry=0x7f7fff302c18,
initptr=initptr@entry=0x7f7fff302c10) at expand.c:948
#7  0x7902a5872e49 in expand_let (expr=0x7902a5028d50, env=0x7902a5070ff0) 
at expand.c:1012
#8  0x7902a5871157 in expand_if (expr=0x7902a501d120, env=0x7902a5070ff0) 
at expand.c:586
#9  0x7902a587266c in expand_letstar_clause (bindings=0x7902a501d140, 
body=0x7902a502cdf0, env=0x7902a5065150) at expand.c:1074
#10 0x7902a587266c in expand_letstar_clause (bindings=0x7902a501d1e0, 
body=0x7902a502cdf0, env=0x7902a5065200) at expand.c:1074
#11 0x7902a5871157 in expand_if (expr=0x7902a501d5c0, env=0x7902a5065200) 
at expand.c:586
#12 0x7902a5871359 in expand_lambda_case (clause=0x7902a501d5d0, 
alternate=alternate@entry=0x4, env=) at expand.c:662
#13 0x7902a587162e in expand_lambda (expr=0x7902a501d610, env=) at expand.c:676
#14 0x7902a58732ce in expand_exprs (env=0x7902a5062a40, 
forms=0x7902a5062ad0) at expand.c:393
#15 expand_letrec_helper (expr=, env=0x7902a5062a40, 
in_order_p=0x404) at expand.c:1040
#16 0x7902a5871359 in expand_lambda_case (clause=0x7902a501dd80, 
alternate=alternate@entry=0x4, env=) at expand.c:662
#17 0x7902a587162e in expand_lambda (expr=0x7902a501ddc0, env=) at expand.c:676
#18 0x7902a58732ce in expand_exprs (env=0x7902a5059560, 
forms=0x7902a5059c20) at expand.c:393
#19 expand_letrec_helper (expr=, env=0x7902a5059560, 
in_order_p=0x404) at expand.c:1040
#20 0x7902a58703d8 in expand (exp=0x7902a501de50, env=0x7902a5055020) at 
expand.c:361
#21 0x7902a5870703 in expand_sequence (forms=0x7902a5034160, 
env=0x7902a5055020) at expand.c:405
#22 0x7902a58706f4 in expand_sequence (forms=0x7902a501de60, 
env=0x7902a5055020) at expand.c:405
#23 0x7902a58706f4 in expand_sequence (forms=0x7902a501df10, 
env=0x7902a5055020) at expand.c:405
#24 0x7902a58706f4 in expand_sequence (forms=0x7902a501dfc0, 
env=0x7902a5055020) at expand.c:405
#25 0x7902a58706f4 in expand_sequence (forms=0x7902a501a070, 
env=0x7902a5055020) at expand.c:405
#26 0x7902a58706f4 in expand_sequence (forms=0x7902a501a120, 
env=0x7902a5055020) at expand.c:405
#27 0x7902a58706f4 in expand_sequence (forms=0x7902a501a1d0, 
env=0x7902a5055020) at expand.c:405
#28 0x7902a58706f4 in expand_sequence (forms=0x7902a501a900, 
env=0x7902a5055020) at expand.c:405
#29 0x7902a58706f4 in expand_sequence (forms=0x7902a5014230, 
env=0x7902a5055020) at expand.c:405
#30 0x7902a58706f4 in expand_sequence (forms=0x7902a5014860, 
env=0x7902a5055020) at expand.c:405
#31 0x7902a58706f4 in expand_sequence (forms=0x7902a500c1f0, 
env=0x7902a5055020) at expand.c:405
#32 0x7902a58706f4 in expand_sequence (forms=0x7902a500cc50, 
env=0x7902a5055020) at expand.c:405
#33 0x7902a58706f4 in expand_sequence (forms=0x7902a50096b0, 
env=0x7902a5055020) at expand.c:405
#34 0x7902a58706f4 in expand_sequence (forms=0x7902a50056e0, 
env=0x7902a5055020) at expand.c:405
#35 0x7902a58706f4 in expand_sequence (forms=0x7902a50010d0, 
env=0x7902a5055020) at expand.c:405
#36 0x7902a58706f4 in expand_sequence (forms=0x7902a5001cc0, 
env=0x7902a5055020) at expand.c:405
#37 0x7902a58706f4 in expand_sequence (forms=0x7902a4ffc8b0, 
env=0x7902a5055020) at 

Re: gnucash coredump on startup

2023-01-07 Thread Thomas Klausner
On Sat, Jan 07, 2023 at 03:42:00PM -, Christos Zoulas wrote:
> In article ,
> Thomas Klausner   wrote:
> >Hi!
> >
> >I've just replaced my 10.99.2/20221231 userland (kernel slightly
> >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland.
> >
> >Now gnucash dumps core on startup:
>
> Could be rtld related. Can you try with the older ld_elf.so?

Not really, because the base system uses 4 segments now and the old
one doesn't handle it - I can just downgrade the whole system.

# cd /archive/build/amd64.gcc.20221231/libexec/
# install -c ld.elf_so /libexec/
# ls
ls: Shared object "libutil.so.7" not found
# cd /archive/build/amd64.gcc.20230107/libexec/
# install -c ld.elf_so /libexec/
install: Shared object "libutil.so.7" not found
# LD_PRELOAD=/lib/libutil.so.7 install
/lib/libutil.so.7: wrong number of segments (4 != 2)

But yes, I suspect that too.
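
A minimal sketch for checking how many loadable segments an object has,
assuming binutils' readelf is available. The helper name
count_load_segments is made up for illustration:

```sh
# Count PT_LOAD program headers in "readelf -lW" output read from stdin.
count_load_segments() {
    awk '$1 == "LOAD" { n++ } END { print n + 0 }'
}

# Example use (path as in the transcript above):
#   readelf -lW /lib/libutil.so.7 | count_load_segments
```

Going by the error message above, a library from the new userland would
be expected to report 4 here, and one from the old userland 2.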

Btw, I also see a core dump building lang/guile30 on that system,
that's probably related and has fewer dependencies for trying out on
your system.
 Thomas


gnucash coredump on startup

2023-01-07 Thread Thomas Klausner
Hi!

I've just replaced my 10.99.2/20221231 userland (kernel slightly
older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland.

Now gnucash dumps core on startup:

(gdb) bt
#0  0x in ?? ()
#1  0x7f59b66414b1 in scm_c_hook_run (hook=0x7f59b695d140 
, data=0x0) at chooks.c:95
#2  0x7f59b665fb79 in after_gc_async_thunk () at gc.c:523
#3  0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#4  0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=5) at vm.c:1608
#5  0x7f59b6647b6a in scm_apply_0 (proc=0x7f59a6a9c500, args=0x304) at 
eval.c:603
#6  0x7f59b66dcbd7 in scm_throw (key=0x7f59a6a41560, args=0x7f59a6b3c2a0) 
at throw.c:262
#7  0x7f59b66dcbfe in scm_ithrow (key=, args=, no_return=) at throw.c:457
#8  0x7f59b6644224 in scm_error_scm (key=key@entry=0x7f59a6a41560, 
subr=, message=message@entry=0x7f59a6a574e0, 
args=args@entry=0x7f59a6b3c2e0, data=data@entry=0x7f59a6b3c310) at error.c:90
#9  0x7f59b6644282 in scm_error (key=0x7f59a6a41560, 
subr=subr@entry=0x7f59b670f487 "scm_hash_fn_get_handle", 
message=message@entry=0x7f59b670c9e0 "Wrong type argument in position ~A 
(expecting ~A): ~S", 
args=0x7f59a6b3c2e0, rest=rest@entry=0x7f59a6b3c310) at error.c:62
#10 0x7f59b664592a in scm_wrong_type_arg_msg (subr=0x7f59b670f487 
"scm_hash_fn_get_handle", pos=1, bad_value=0x7f59a6a48200, szMessage=) at error.c:282
#11 0x7f59b665c1ad in scm_hash_fn_get_handle (table=, 
obj=, hash_fn=, assoc_fn=, 
closure=) at hashtab.c:226
#12 0x7f59b665c1f5 in scm_hash_fn_ref (table=, 
obj=, dflt=0x4, hash_fn=, assoc_fn=, closure=) at hashtab.c:300
#13 0x7f59b6680574 in scm_symbol_to_keyword (symbol=0x7f59a6a45ec0) at 
keywords.c:72
#14 0x7f59b66df931 in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:1486
#15 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#16 0x7f59b668589a in load_thunk_from_memory (data=0x7f59a621b000 
"\177ELF\002\001\001\377", len=716869, is_read_only=) at 
loader.c:480
#17 0x7f59b6709a64 in scm_c_with_exception_handler.constprop.0 (type=0x404, 
handler_data=handler_data@entry=0x7f7fff6f1830, 
thunk_data=thunk_data@entry=0x7f7fff6f1830, thunk=, 
handler=)
at exceptions.c:170
#18 0x7f59b66db1f6 in scm_c_catch (tag=, body=, body_data=, handler=, 
handler_data=, pre_unwind_handler=, 
pre_unwind_handler_data=0x0) at throw.c:168
#19 0x7f59b66861e6 in try_load_thunk_from_file (filename=0x7f59a6b393c0) at 
load.c:622
#20 load_thunk_from_path (filename=filename@entry=0x7f59a6b39400, 
source_file_name=source_file_name@entry=0x7f59a6b393e0, 
source_stat_buf=source_stat_buf@entry=0x7f7fff6f1be0, 
found_stale_file=found_stale_file@entry=0x7f7fff6f1b3c) at load.c:765
#21 0x7f59b66863f0 in scm_primitive_load_path (args=) at 
load.c:1209
#22 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#23 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#24 0x7f59b668650a in scm_primitive_load_path (args=) at 
load.c:1259
#25 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#26 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#27 0x7f59b668650a in scm_primitive_load_path (args=) at 
load.c:1259
#28 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#29 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#30 0x7f59b668650a in scm_primitive_load_path (args=) at 
load.c:1259
#31 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#32 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=3) at vm.c:1608
#33 0x7f59b6642d8e in scm_call_3 (proc=, arg1=, arg2=, arg3=) at eval.c:510
#34 0x7f59b6684c8d in scm_public_variable (module_name=0x7f59a6b11830, 
name=0x7f59a6a42120) at modules.c:673
#35 0x7f59b66d8028 in init_eval_string_var_and_k_module () at strports.c:363
#36 0x7f59b69fa717 in pthread_once 
(once_control=once_control@entry=0x7f59b69555e0 , 
routine=routine@entry=0x7f59b66d7ffc )
at /disk/6/archive/foreign/src/lib/libpthread/pthread_once.c:66
#37 0x7f59b66d3f53 in scm_eval_string_in_module (string=0x7f59a6b0b2e0, 
module=0x904) at strports.c:379
#38 0x7f59b66d4004 in scm_eval_string (string=) at 
strports.c:394
#39 0x7f59b66d5d3c in scm_c_eval_string (expr=) at 
strports.c:347
#40 0x00f50740 in scm_run_gnucash (data=0x7f7fff6f2de0, argc=, argv=) at 
/scratch/finance/gnucash/work/gnucash-4.13/gnucash/gnucash.cpp:142
#41 0x7f59b665b6fc in invoke_main_func (body_data=0x7f7fff6f2da0) at 
init.c:312
#42 0x7f59b6640e62 in c_body (d=0x7f7fff6f2c60) at continuations.c:430
#43 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at 
vm-engine.c:972
#44 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=2) at vm.c:1608
#45 0x7f59b6642cde in scm_call_2 (proc=, arg1=, arg2=) at eval.c:503
#46 

MKLLVM build broken

2022-12-26 Thread Thomas Klausner
build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes -V 
NOGCCERROR=yes -m amd64 distribution

with cvs from about an hour ago failed with:

src/external/bsd/compiler_rt/lib/clang/lib/netbsd/safestack-m64/../../../../../../../../sys/external/bsd/compiler_rt/dist/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cc:2182:31:
 error: 'TIOCRCVFRAME' was not declared in this scope; did you mean 
'IOCTL_TIOCRCVFRAME'?
 2182 | unsigned IOCTL_TIOCRCVFRAME = TIOCRCVFRAME;
  |   ^~~~
  |   IOCTL_TIOCRCVFRAME
src/external/bsd/compiler_rt/lib/clang/lib/netbsd/safestack-m64/../../../../../../../../sys/external/bsd/compiler_rt/dist/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cc:2183:31:
 error: 'TIOCXMTFRAME' was not declared in this scope; did you mean 
'TIOCPTSNAME'?
 2183 | unsigned IOCTL_TIOCXMTFRAME = TIOCXMTFRAME;
  |   ^~~~
  |   TIOCPTSNAME


 Thomas


Re: .gdbinit files in the repository

2022-11-23 Thread Thomas Klausner
On Wed, Nov 23, 2022 at 04:15:37PM -0500, Andrew Cagney wrote:
> On Tue, 22 Nov 2022 at 11:49, Thomas Klausner  wrote:
> >
> > Hi!
> >
> > Should these files be there?
> >
> > /usr/src> find . -name .gdbinit
> > ./external/gpl3/binutils/dist/gprof/.gdbinit
> > ./external/gpl3/binutils.old/dist/gprof/.gdbinit
> > ./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
> > ./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
> > ./external/gpl3/gdb/dist/sim/ppc/.gdbinit
> > ./external/gpl3/gdb/dist/gprof/.gdbinit
> > ./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
> > ./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
> > ./external/gpl3/gdb.old/dist/sim/ppc/.gdbinit
> > ./external/lgpl3/gmp/dist/.gdbinit
> >
> > Looks to me like they should be deleted by the *2netbsd import preparation 
> > scripts.
> 
> Is there a problem?  For instance, removing
> gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit would break
> that test (if someone were to desire to run it).

No problem - but usually dot files are not displayed (with ls) and
this could lead to surprises when you start gdb in those directories.
 Thomas


.gdbinit files in the repository

2022-11-22 Thread Thomas Klausner
Hi!

Should these files be there?

/usr/src> find . -name .gdbinit
./external/gpl3/binutils/dist/gprof/.gdbinit
./external/gpl3/binutils.old/dist/gprof/.gdbinit
./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
./external/gpl3/gdb/dist/sim/ppc/.gdbinit
./external/gpl3/gdb/dist/gprof/.gdbinit
./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
./external/gpl3/gdb.old/dist/sim/ppc/.gdbinit
./external/lgpl3/gmp/dist/.gdbinit

Looks to me like they should be deleted by the *2netbsd import preparation 
scripts.

Ok to remove?
 Thomas


Re: weird less(1) CTRL-Z behaviour

2022-11-06 Thread Thomas Klausner
On Mon, Oct 17, 2022 at 12:06:51PM +0200, Thomas Klausner wrote:
> On Fri, Oct 14, 2022 at 11:25:49PM +, RVP wrote:
> > On Wed, 12 Oct 2022, RVP wrote:
> > 
> > > On Wed, 12 Oct 2022, Thomas Klausner wrote:
> > > 
> > > > bin/57053: continuation problem in shell pipelines
> > > > 
> > > 
> > > FYI: Just tried on FreeBSD 13.1, and zsh-5.9 is broken there too.
> > > 
> > 
> > More: The prev. version, zsh-5.8.1, works on -HEAD. zsh-5.9 has the
> > same problem on Ubuntu 19.04 too.
> 
> Thanks for testing on other operating systems!
> 
> I've reported this issue to the zsh developers:
> 
> https://zsh.org/mla/workers/2022/msg01115.html

Upstream is still iterating, but I've added the second candidate patch
to pkgsrc:

https://zsh.org/mla/workers/2022/msg01204.html

Cheers,
 Thomas


Re: noisy dhcpcd messages

2022-11-05 Thread Thomas Klausner
On Tue, Nov 01, 2022 at 01:29:19PM +0300, Valeriy E. Ushakov wrote:
> On Tue, Nov 01, 2022 at 10:05:19 +0100, Thomas Klausner wrote:
> 
> > What's up with these log lines?
> > 
> > Oct 31 07:52:59 yt dhcpcd[3496]: wm0: requesting DHCPv6 information
> > Oct 31 07:53:52 yt syslogd[4885]: last message repeated 5 times
> [...]
> > This is not a new issue, I can find these log lines in my messages
> > files going back to at least August 31. Does this message have a point
> > or should we remove it from the default log level?
> 
> It is, but in -current,

Do you mean "it is removed, but in -current"?  That confuses me,
because these messages I see are on 9.99.104.

Or syslog has a different bug, because more details from the same log
file:

Nov  5 07:24:05 yt ntpd[6574]: 86.59.113.124 local addr 192.168.0.33 -> 
Nov  5 07:24:06 yt dhcpcd[3514]: wm0: requesting DHCPv6 information
Nov  5 07:24:48 yt syslogd[4883]: last message repeated 4 times
Nov  5 07:26:48 yt syslogd[4883]: last message repeated 12 times
Nov  5 07:36:48 yt syslogd[4883]: last message repeated 60 times
Nov  5 07:46:48 yt syslogd[4883]: last message repeated 60 times
Nov  5 07:56:49 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:06:49 yt syslogd[4883]: last message repeated 59 times
Nov  5 08:16:49 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:26:49 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:36:49 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:46:50 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:56:50 yt syslogd[4883]: last message repeated 60 times
Nov  5 08:59:21 yt /netbsd: [ 49084.9251623] 192.168.0.19:/volume2/transfer: 
inaccurate wcc data (ctime) detected, disabling wcc (ctime 1667627567.063208771 
1667627567.063208771, mtime 1667627567.063208771 1667627567.063208771)
Nov  5 08:59:21 yt /netbsd: [ 56679.1666898] nfs server 
192.168.0.19:/volume2/games: not responding
Nov  5 08:59:21 yt /netbsd: [ 56679.2966918] nfs server 
192.168.0.19:/volume2/games: is alive again
Nov  5 09:06:50 yt syslogd[4883]: last message repeated 60 times
Nov  5 09:16:50 yt syslogd[4883]: last message repeated 60 times
Nov  5 09:26:50 yt syslogd[4883]: last message repeated 60 times

It doesn't make sense that the "is alive again" message should be
repeated without the "not responding" one before that.

Or I don't understand syslog messages :)
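
For scale: 60 repeats per ten-minute interval is roughly one message
every 10 seconds. A minimal sketch for totalling the repeat counts in a
log excerpt follows. The helper name sum_repeats is made up for
illustration, and /var/log/messages is an assumed log location:

```sh
# Sum the counts from syslogd's "last message repeated N times" lines
# read from stdin.
sum_repeats() {
    awk '/last message repeated [0-9]+ times/ {
             for (i = 1; i < NF; i++)
                 if ($i == "repeated") total += $(i + 1)
         }
         END { print total + 0 }'
}

# Example use:
#   grep 'Nov  5 07:' /var/log/messages | sum_repeats
```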
 Thomas


Re: building source against installed libraries?

2022-10-31 Thread Thomas Klausner
On Mon, Oct 31, 2022 at 01:46:56PM +0300, Valeriy E. Ushakov wrote:
> On Mon, Oct 31, 2022 at 11:10:24 +0100, Thomas Klausner wrote:
> 
> > For test builds, I use 'USETOOLS=no make' to avoid building a
> > toolchain.  However that still wants to link against libraries built
> > in the source tree, i.e. I have to 'cd /usr/src/lib/libcrypto &&
> > USETOOLS=no make' to build a new libcrypto if this library is used.
> > 
> > Is there a toggle to build against the installed libraries instead?
> 
> It's entirely unclear from this description what exactly you are
> trying to do and how does it fail.

I'm trying this:

wiz@yt:/usr/src/external/bsd/nsd> USETOOLS=no make

and get
...
all ===> lib/libnsd
all ===> lib/libxfrd
all ===> sbin
all ===> sbin/nsd
make[2]: don't know how to make 
/disk/6/archive/foreign/src/external/bsd/libevent/lib/libevent/libevent.a. Stop

I want the build to use /usr/lib/libevent.a instead.

> If I have to venture a guess (I don't have time atm to second guess/
> reverse engineer the question), you are probably running into
> something like LIBDPLIBS dependencies that are explicitly listed in
> the in-tree makefiles, b/c those makefiles are intended to build the
> in-tree code (e.g. for curses I would disable its LIBDPLIBS dependency
> on terminfo).  Just overriding them on the command line might help.

I guess this is

./Makefile.inc:DPLIBS+= event ${NETBSDSRCDIR}/external/bsd/libevent/lib/libevent

so perhaps that is what you're talking about?
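
A minimal sketch for finding every such dependency line below a source
directory. The helper name list_dplibs is made up for illustration; the
pattern also matches LIBDPLIBS lines:

```sh
# Print file name, line number and text of every DPLIBS line found in
# makefiles below a directory (default: the current directory).
list_dplibs() {
    find "${1:-.}" -type f -name 'Makefile*' \
        -exec grep -n 'DPLIBS' /dev/null {} +
}

# Example use:
#   cd /usr/src/external/bsd/nsd && list_dplibs
```

The /dev/null argument only serves to make grep print the file name
even when find hands it a single file.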

What do I have to set? Do I have to do this for every library
separately?
 Thomas


building source against installed libraries?

2022-10-31 Thread Thomas Klausner
Hi!

For test builds, I use 'USETOOLS=no make' to avoid building a
toolchain.  However that still wants to link against libraries built
in the source tree, i.e. I have to 'cd /usr/src/lib/libcrypto &&
USETOOLS=no make' to build a new libcrypto if this library is used.

Is there a toggle to build against the installed libraries instead?
 Thomas


Re: 9.99.104: panic in tcp_shutdown_wrapper

2022-10-29 Thread Thomas Klausner
Hi!

A couple hours later, my shell was in an NFS mounted directory (probably idle 
for some time) and I tried tab-completing an entry, and it panicked again.
Same location as below.

Hand copied:
tcp_shutdown_wrapper+0x20
nfs_disconnect+0x69
nfs_reconnect+0x1a
nfs_request+0x7fb
nfs_access+0x1ed
VOP_ACCESS+0x61
nfs_lookup+052f
VOP_LOOKUP+0x8a
lookup_once+0x1a6
namei_tryemulroot+0xb00
namei+0x29
vn_open+0x133
do_open+0xc3
do_sys_openat+0x74
sys_open+0x24
syscall+0x196

 Thomas

> On 29.10.2022, at 11:53, Thomas Klausner  wrote:
> 
> Hi!
> 
> I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel +
> userland, but packages still the old ones built on 9.99.100 in case it matters).
> A couple hours later I started transmission-gtk and the machine immediately 
> panicked.
> 
> Hand copied:
> 
> uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e
> Fatal page fault in supervisor mode
> Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38 
> ilevel 0 rsp 0xfc62191caaaf0
> Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0
> Kernel: page fault trap, code = 0
> Stopped in pid 6904.22757 (transmission-gtk) at 
> netbsd:tcp_shutdown_wrapper+0x20
> : movq 0x38(%rax), %r14
> tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper+0x20
> nfs_disconnect() at netbsd:nfs_disconnect+0x69
> nfs_reconnect() at netbsd:nfs_reconnect+0x1a
> nfs_request() at netbsd:nfs_request+0x7fb
> nfs_statvfs() at netbsd:nfs_statvfs+0x173
> VFS_STATVFS() at netbsd:VFS_STATVFS+0x22
> dostatvfs() at netbsd:dostatvfs+0x132
> do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f
> sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b
> syscall() at netbsd:syscall+0x196
> 
> I have nfs mounted some shares from a Synology station.
> 
> Ideas? Perhaps the pcb merge changes from this week?
> Thomas



9.99.104: panic in tcp_shutdown_wrapper

2022-10-29 Thread Thomas Klausner
Hi!

I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel +
userland, but packages still the old ones built on 9.99.100 in case it matters).
A couple hours later I started transmission-gtk and the machine immediately 
panicked.

Hand copied:

uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e
Fatal page fault in supervisor mode
Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38 ilevel 
0 rsp 0xfc62191caaaf0
Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0
Kernel: page fault trap, code = 0
Stopped in pid 6904.22757 (transmission-gtk) at netbsd:tcp_shutdown_wrapper+0x20
: movq 0x38(%rax), %r14
tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper+0x20
nfs_disconnect() at netbsd:nfs_disconnect+0x69
nfs_reconnect() at netbsd:nfs_reconnect+0x1a
nfs_request() at netbsd:nfs_request+0x7fb
nfs_statvfs() at netbsd:nfs_statvfs+0x173
VFS_STATVFS() at netbsd:VFS_STATVFS+0x22
dostatvfs() at netbsd:dostatvfs+0x132
do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f
sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b
syscall() at netbsd:syscall+0x196

I have nfs mounted some shares from a Synology station.

Ideas? Perhaps the pcb merge changes from this week?
 Thomas

Re: weird less(1) CTRL-Z behaviour

2022-10-17 Thread Thomas Klausner
On Fri, Oct 14, 2022 at 11:25:49PM +, RVP wrote:
> On Wed, 12 Oct 2022, RVP wrote:
> 
> > On Wed, 12 Oct 2022, Thomas Klausner wrote:
> > 
> > > bin/57053: continuation problem in shell pipelines
> > > 
> > 
> > FYI: Just tried on FreeBSD 13.1, and zsh-5.9 is broken there too.
> > 
> 
> More: The prev. version, zsh-5.8.1, works on -HEAD. zsh-5.9 has the
> same problem on Ubuntu 19.04 too.

Thanks for testing on other operating systems!

I've reported this issue to the zsh developers:

https://zsh.org/mla/workers/2022/msg01115.html

 Thomas


Re: weird less(1) CTRL-Z behaviour

2022-10-12 Thread Thomas Klausner
On Wed, Oct 12, 2022 at 10:58:56AM +, RVP wrote:
> File a PR.

This is now

bin/57053: continuation problem in shell pipelines

Thanks,
 Thomas


weird less(1) CTRL-Z behaviour

2022-10-12 Thread Thomas Klausner
Hi!

I've been using the following shell function for ages:

dir() { ls -al "$@" | less; }

On -current (9.99.100 kernel from Oct 9, Userland from Sep 21, zsh
from May), when I CTRL-Z the less(1) and then want to go back in, it
doesn't work and I see the following:

> dir
zsh: done   ls -al "$@" | 
zsh: suspended
> fg
[1]  + done   ls -al "$@" | 
   continued  
zsh: done   ls -al "$@" | 
zsh: suspended (tty output)  
zsh: done   ls -al "$@" | 
zsh: suspended (tty output)  

That happens every time I try to 'fg' it.

This was working fine not so long ago, but I don't remember exactly
when it started happening.
 Thomas


Re: Panic (KASSERT) in src/sys/uvm/uvm_map.c", line 2120 (today's HEAD).

2022-09-18 Thread Thomas Klausner
On Sun, Sep 18, 2022 at 01:44:25PM +0700, Robert Elz wrote:
> mmap_hint: [ 991.7219923] panic: kernel diagnostic assertion "!topdown || 
> hint <= orig_hint" failed: file "/release/src/sys/uvm/uvm_map.c", line 2120 
> hint: 0x1ff000, orig_hint: 0x1000

I think this is http://gnats.netbsd.org/56900

An assertion riastradh added that should be true isn't always true.
 Thomas


Re: namespace pollution? clone()

2022-08-01 Thread Thomas Klausner
On Mon, Aug 01, 2022 at 06:06:19PM +0300, Valeriy E. Ushakov wrote:
> On Mon, Aug 01, 2022 at 16:50:14 +0200, Thomas Klausner wrote:
> 
> > On Mon, Aug 01, 2022 at 05:45:23PM +0300, Valeriy E. Ushakov wrote:
> > > Shouldn't we expose __clone(2) (the real symbol in the reserved
> > > namespace) under _NETBSD_SOURCE and only hide clone(2) weak alias
> > > under _GNU_SOURCE?  You kinda sidestep some potential issues here in
> > > this case b/c __clone is an assembler syscall stub, so there's no C
> > > source that implements __clone() that has to see the declaration.
> > 
> > I don't understand the problem you see here - please fix it as you
> > find appropriate.
> 
> I think we should still expose __clone() under _NETBSD_SOURCE, but
> expose clone() only under _GNU_SOURCE.  My original reply that
> prompted your patch was not very clear about this, but it talked
> specifically about clone() (and clone() only).
> 
> Your patch hides both clone() and __clone() under _GNU_SOURCE.  You
> were not forced to consider this choice b/c __clone() is not
> implemented in C, so there's no C code in the tree that needs to see
> the __clone() prototype that your patch hides.
> 
> __clone is in the reserved namespace, so no well behaving programs
> should be affected by that declaration.

I don't understand why we expose __clone() in a public header at all,
but I understand your comments to result in the attached patch.

Please suggest a comment to put before the __clone() line.

Thanks,
 Thomas
Index: sched.h
===
RCS file: /cvsroot/src/include/sched.h,v
retrieving revision 1.14
diff -u -r1.14 sched.h
--- sched.h 1 Aug 2022 14:34:01 -   1.14
+++ sched.h 1 Aug 2022 15:10:52 -
@@ -73,13 +73,17 @@
 
 /*
  * Historical functions, not defined in standard
- * Linux man page documents these functions as only available when
+ * Linux man page documents clone() as only available when
  * _GNU_SOURCE is defined
  */
 pid_t   clone(int (*)(void *), void *, int, void *);
+#endif /* _GNU_SOURCE */
+
+#if defined(_NETBSD_SOURCE)
+
 pid_t  __clone(int (*)(void *), void *, int, void *);
 
-#endif /* _GNU_SOURCE */
+#endif /* _NETBSD_SOURCE */
 
 __END_DECLS
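
The effect of splitting the guards can be sketched in miniature. This is only an illustration under stated assumptions: the DEMO_* macros and demo_* functions are hypothetical stand-ins for _GNU_SOURCE, _NETBSD_SOURCE, clone() and __clone(), and the block is not the real sched.h.

```c
/*
 * Miniature stand-in for the patched header.  All DEMO_* and demo_* names
 * are hypothetical; the real header uses _GNU_SOURCE, _NETBSD_SOURCE,
 * clone() and __clone().
 */
#define DEMO_NETBSD_SOURCE 1	/* application asks for NetBSD extensions */
/* DEMO_GNU_SOURCE is deliberately left undefined. */

#if defined(DEMO_GNU_SOURCE)
/* Unreserved name: declared only when the application opts in. */
int demo_clone(int (*)(void *), void *, int, void *);
#endif

#if defined(DEMO_NETBSD_SOURCE)
/* Stand-in for the reserved-namespace symbol (declared, never called here). */
int demo_reserved_clone(int (*)(void *), void *, int, void *);
#endif

/*
 * Application code that uses the unreserved name for its own purposes.
 * This compiles only because the header-side declaration above is hidden;
 * defining DEMO_GNU_SOURCE turns it into a conflicting-declaration error.
 */
static int
demo_clone(int value)
{
	return value * 2;
}
```

With only DEMO_NETBSD_SOURCE defined, the application's own demo_clone() coexists with the header; adding "#define DEMO_GNU_SOURCE 1" at the top reproduces the kind of clash reported for inkscape.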
 


Re: namespace pollution? clone()

2022-08-01 Thread Thomas Klausner
On Mon, Aug 01, 2022 at 05:45:23PM +0300, Valeriy E. Ushakov wrote:
> Shouldn't we expose __clone(2) (the real symbol in the reserved
> namespace) under _NETBSD_SOURCE and only hide clone(2) weak alias
> under _GNU_SOURCE?  You kinda sidestep some potential issues here in
> this case b/c __clone is an assembler syscall stub, so there's no C
> source that implements __clone() that has to see the declaration.

I don't understand the problem you see here - please fix it as you
find appropriate.

Thanks,
 Thomas


Re: namespace pollution? clone()

2022-08-01 Thread Thomas Klausner
On Mon, Aug 01, 2022 at 07:32:26AM -0700, Jason Thorpe wrote:
> 
> > On Aug 1, 2022, at 7:22 AM, Thomas Klausner  wrote:
> > 
> > On Mon, Aug 01, 2022 at 11:20:11PM +0900, Rin Okuyama wrote:
> >> On 2022/08/01 23:13, Martin Husemann wrote:
> >>> On Mon, Aug 01, 2022 at 03:57:19PM +0200, Thomas Klausner wrote:
> >>>> The attached diff survived a complete amd64-current build. Ok to commit?
> >>> 
> >>> Looks good to me.
> >> 
> >> Can you please mention _GNU_SOURCE in clone(2)?
> > 
> > Thanks for the reminder - done!
> 
> Please also fix the comment style to conform to KNF.

Done.
 Thomas


Re: namespace pollution? clone()

2022-08-01 Thread Thomas Klausner
On Mon, Aug 01, 2022 at 11:20:11PM +0900, Rin Okuyama wrote:
> On 2022/08/01 23:13, Martin Husemann wrote:
> > On Mon, Aug 01, 2022 at 03:57:19PM +0200, Thomas Klausner wrote:
> > > The attached diff survived a complete amd64-current build. Ok to commit?
> > 
> > Looks good to me.
> 
> Can you please mention _GNU_SOURCE in clone(2)?

Thanks for the reminder - done!
 Thomas


Re: namespace pollution? clone()

2022-08-01 Thread Thomas Klausner
On Tue, Jul 26, 2022 at 03:03:54PM +0200, Martin Husemann wrote:
> On Tue, Jul 26, 2022 at 03:46:14PM +0300, Valery Ushakov wrote:
> > On Linux clone(2) is declared only for _GNU_SOURCE, which explains why
> > linux doesn't run into the name clash.  I gather we should follow
> > suit, as that's what the apps expect.
> 
> Yes, that is the right thing to do here, especially as clone(2) does
> only exist as a portability helper for linux code.
> 
> I think we could even pull that change up to -9.

The attached diff survived a complete amd64-current build. Ok to commit?
 Thomas
Index: sched.h
===
RCS file: /cvsroot/src/include/sched.h,v
retrieving revision 1.12
diff -u -r1.12 sched.h
--- sched.h 11 Jan 2009 03:04:12 -  1.12
+++ sched.h 1 Aug 2022 13:57:06 -
@@ -59,20 +59,26 @@
 #define sched_yield     __libc_thr_yield
 #endif /* __LIBPTHREAD_SOURCE__ */
 
-#if defined(_NETBSD_SOURCE)
-
 __BEGIN_DECLS
 
+#if defined(_NETBSD_SOURCE)
+
 /* Process affinity functions (not portable) */
 int     sched_getaffinity_np(pid_t, size_t, cpuset_t *);
 int     sched_setaffinity_np(pid_t, size_t, cpuset_t *);
 
+#endif /* _NETBSD_SOURCE */
+
+#if defined(_GNU_SOURCE)
+
 /* Historical functions, not defined in standard */
+/* Linux man page documents these functions as only available when
+ * _GNU_SOURCE is defined */
 pid_t   clone(int (*)(void *), void *, int, void *);
 pid_t  __clone(int (*)(void *), void *, int, void *);
 
-__END_DECLS
+#endif /* _GNU_SOURCE */
 
-#endif /* _NETBSD_SOURCE */
+__END_DECLS
 
 #endif /* _SCHED_H_ */


Re: namespace pollution? clone()

2022-07-26 Thread Thomas Klausner
On Tue, Jul 26, 2022 at 06:11:36AM -0400, Greg Troxel wrote:
> So where is the visibility restriction?

Oh, that's probably a misunderstanding on my side.
 Thomas




namespace pollution? clone()

2022-07-26 Thread Thomas Klausner
Hi!

When compiling inkscape I found a weird compilation error that I
traced down to clone() being in the visible namespace.

https://gitlab.com/inkscape/inbox/-/issues/7378

I wonder why it's visible though, since in sched.h it's protected by
_NETBSD_SOURCE.

The command line is

cd /scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src && 
c++ -DHAVE_CONFIG_H -DHAVE_X11 -DWITH_CSSBLEND -DWITH_MESH -DWITH_SVG2 
-D_REENTRANT -Dinkscape_base_EXPORTS 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/include 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/adaptagrams
 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/2geom/include
 
-I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/2geom/include/2geom
 -isystem /usr/pkg/include/harfbuzz -isystem /usr/pkg/include/freetype2 
-isystem /usr/pkg/include -isystem /usr/pkg/include/glib-2.0 -isystem 
/usr/pkg/lib/glib-2.0/include -isystem /usr/pkg/include/pango-1.0 -isystem 
/usr/pkg/include/fribidi -isystem /usr/pkg/include/cairo -isystem 
/usr/pkg/include/pixman-1 -isystem /usr/pkg/include/libpng16 -isystem 
/usr/pkg/include/libsoup-2.4 -isystem /usr/pkg/include/libxml2 -isystem 
/usr/pkg/include/poppler -isystem /usr/pkg/include/libwpg-0.3 -isystem 
/usr/pkg/include/librevenge-0.0 -isystem /usr/pkg/include/libwpd-0.10 -isystem 
/usr/pkg/include/libvisio-0.1 -isystem /usr/pkg/include/libcdr-0.1 -isystem 
/usr/pkg/include/gtkmm-3.0 -isystem /usr/pkg/lib/gtkmm-3.0/include -isystem 
/usr/pkg/include/giomm-2.4 -isystem /usr/pkg/lib/giomm-2.4/include -isystem 
/usr/pkg/include/glibmm-2.4 -isystem /usr/pkg/lib/glibmm-2.4/include -isystem 
/usr/pkg/include/sigc++-2.0 -isystem /usr/pkg/lib/sigc++-2.0/include -isystem 
/usr/pkg/include/gtk-3.0 -isystem /usr/pkg/include/gdk-pixbuf-2.0 -isystem 
/usr/pkg/include/gio-unix-2.0 -isystem /usr/pkg/include/libdrm -isystem 
/usr/pkg/include/atk-1.0 -isystem /usr/pkg/include/at-spi2-atk/2.0 -isystem 
/usr/pkg/include/dbus-1.0 -isystem /usr/pkg/lib/dbus-1.0/include -isystem 
/usr/pkg/include/at-spi-2.0 -isystem /usr/pkg/include/cairomm-1.0 -isystem 
/usr/pkg/lib/cairomm-1.0/include -isystem /usr/pkg/include/pangomm-1.4 -isystem 
/usr/pkg/lib/pangomm-1.4/include -isystem /usr/pkg/include/atkmm-1.6 -isystem 
/usr/pkg/lib/atkmm-1.6/include -isystem /usr/pkg/include/gtk-3.0/unix-print 
-isystem /usr/pkg/include/gdkmm-3.0 -isystem /usr/pkg/lib/gdkmm-3.0/include -O2 
-g -fPIC -D_FORTIFY_SOURCE=2 -fstack-check -pthread -I/usr/pkg/include 
-I/usr/include -I/usr/pkg/include/freetype2 -I/usr/pkg/include/glib-2.0 
-I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include 
-I/usr/pkg/include/harfbuzz -I/usr/pkg/include/python3.10 
-I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -DG_DISABLE_ASSERT 
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DGLIBMM_DISABLE_DEPRECATED 
-DGTKMM_DISABLE_DEPRECATED -DGDKMM_DISABLE_DEPRECATED -DGTK_DISABLE_DEPRECATED 
-DGDK_DISABLE_DEPRECATED -fstack-protector-strong -Werror=format 
-Werror=format-security -Werror=ignored-qualifiers -Werror=return-type 
-Wno-switch -Wstrict-null-sentinel -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -pthread 
-D_REENTRANT -D_REENTRANT -DNDEBUG -fPIC   -pthread -fPIC -std=gnu++17 -MD -MT 
src/CMakeFiles/inkscape_base.dir/actions/actions-edit.cpp.o -MF 
CMakeFiles/inkscape_base.dir/actions/actions-edit.cpp.o.d -E 
/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/actions/actions-edit.cpp

Cheers,
 Thomas


Re: panic in evo_wait

2022-07-18 Thread Thomas Klausner
Hi Matt!

On Mon, Jul 18, 2022 at 01:53:49PM +1000, Matthew Green wrote:
> > [184218.xxx] warning: 
> > /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
> >  1
> 
> can you patch this code to print the value of "data" here?
> it's probably a bad request for userland, but the BUG_ON()
> here does not give you any indication on _what_.

Ok, I'll use the attached diff for my next kernel.

> > [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> > [184218.xxx] fatal page fault in supervisor mode
> > [184218.xxx] trap type 6 code 0x2 ...
> 
> this line's contents would have included the fault address,
> which is kinda useful for next time :-)

I've got the rip -- it's 0x8095e177.

> > [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
> > kernel: page fault trap, code=0
> > Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
> > evo_wait() at netbsd:evo_wait+0x7b
> > base507c_ntfy_set()
> > nv50_wndw_flush_set()
> > nv50_disp_atomic_commit_tail()
> > nv50_disp_atomic_commit()
> > drm_atomic_helper_set_config()
> > drm_mode_setcrtc()
> > drm_ioctl()
> 
> can you find out where evo_wait+0x7b is?  in my kernel it's
> at line 243, and the disasm seems to match your "movl" above.
>
> 235 evo_wait(struct nv50_dmac *evoc, int nr)
> 236 {
> 237 struct nv50_dmac *dmac = evoc;
> 238 struct nvif_device *device = dmac->base.device;
> 239 u32 put = nvif_rd32(&dmac->base.user, 0x) / 4;
> 240
> 241 spin_lock(&dmac->lock);
> 242 if (put + nr >= (PAGE_SIZE / 4) - 8) {
> 243 dmac->ptr[put] = 0x2000;
> 244 evo_flush(dmac);
> 
> Dump of assembler code for function evo_wait:
>0x8084dfe1 <+0>:   push   %rbp
> [...]
>0x8084e05c <+123>: movl   $0x2000,(%rdx,%rax,1)
> 
> (0x7b = 123)

exactly:

(gdb) 
241 spin_lock(&dmac->lock);
242 if (put + nr >= (PAGE_SIZE / 4) - 8) {
243 dmac->ptr[put] = 0x2000;
244 evo_flush(dmac);
245
246 nvif_wr32(&dmac->base.user, 0x, 0x);
247 if (nvif_msec(device, 2000,
248 if (!nvif_rd32(&dmac->base.user, 0x0004))
249 break;
250 ) < 0) {
(gdb) info line *(evo_wait+0x7b)
Line 243 of 
"/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c"
 starts at address 0x8095e170  and ends at 
0x8095e17e .

which also matches the rip:

(gdb) info line *(0x8095e177)
Line 243 of 
"/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c"
 starts at address 0x8095e170  and ends at 
0x8095e17e .
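
The address bookkeeping above can be checked with a few lines of C. This is a minimal sketch: the helper names are made up for illustration, the addresses are used exactly as they appear in this message (they may be shortened), and treating gdb's "starts at ... ends at ..." as a half-open range is an assumption.

```c
#include <stdint.h>

/* Address of symbol + offset, as printed by ddb ("evo_wait+0x7b"). */
static uint64_t
demo_sym_plus_off(uint64_t sym_start, uint64_t offset)
{
	return sym_start + offset;
}

/* 1 if addr lies in the half-open range [start, end), else 0. */
static int
demo_addr_in_line(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr < end;
}
```

For the quoted disassembly, demo_sym_plus_off(0x8084dfe1, 0x7b) gives 0x8084e05c, the address of the movl; for the local kernel, demo_addr_in_line(0x8095e177, 0x8095e170, 0x8095e17e) is true, so the rip falls inside line 243.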

> probably "dmac->ptr" is invalid here.  a quick guess at the
> code indicates it's only set once in nv50_dmac_create(),
> the source from the caller(s).  at least, i can't see it
> set anywhere else right now.
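
For reference, the index arithmetic in the quoted excerpt can be sketched as follows. This is not the driver code: the DEMO_* constants and demo_* helpers are hypothetical, the 4 KiB page size is an assumption, and so is the reading of "(PAGE_SIZE / 4) - 8" as the bound of a one-page buffer of 32-bit words. The excerpt itself only shows that "put" comes from a register read and is then used as an index into dmac->ptr.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE	4096u			/* assumption: 4 KiB pages */
#define DEMO_RING_WORDS	(DEMO_PAGE_SIZE / 4)	/* 1024 32-bit slots */

/* Mirrors the quoted condition: must the buffer wrap before nr more words? */
static int
demo_needs_wrap(uint32_t put, uint32_t nr)
{
	return put + nr >= DEMO_RING_WORDS - 8;
}

/* Hypothetical sanity check: is put a valid index into a one-page buffer? */
static int
demo_put_in_range(uint32_t put)
{
	return put < DEMO_RING_WORDS;
}
```

Under these assumptions the wrap branch is taken once put + nr reaches 1016, and a "put" value outside 0..1023 would index beyond the page even if dmac->ptr itself were valid.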

 Thomas
Index: nouveau_nvkm_engine_disp_headgf119.c
===
RCS file: 
/cvsroot/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c,v
retrieving revision 1.2
diff -u -r1.2 nouveau_nvkm_engine_disp_headgf119.c
--- nouveau_nvkm_engine_disp_headgf119.c18 Dec 2021 23:45:35 -  
1.2
+++ nouveau_nvkm_engine_disp_headgf119.c18 Jul 2022 18:36:47 -
@@ -80,7 +80,7 @@
case 0: state->or.depth = 18; break; /*XXX: "default" */
default:
state->or.depth = 18;
-   WARN_ON(1);
+   WARN_ON(data);
break;
}
 }
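
A note on the diff above, with a sketch. The transcribed warning lines end in " 1", which would be consistent with a warning macro that prints the source text of its argument rather than its value; this is an assumption and has not been checked against the drm2 compat code. If it holds, a report that also carries the run-time value might look like the following. demo_format_warning() and DEMO_WARN_VALUE() are hypothetical names, not kernel interfaces.

```c
#include <stdio.h>

/*
 * Format a warning that shows both the source text of an expression and
 * its run-time value.  Hypothetical helper, not the kernel's WARN_ON().
 */
static int
demo_format_warning(char *buf, size_t len, const char *file, int line,
    const char *expr_text, unsigned long value)
{
	return snprintf(buf, len, "warning: %s:%d: %s = 0x%lx",
	    file, line, expr_text, value);
}

/* Stringify the expression and pass its value along. */
#define DEMO_WARN_VALUE(buf, len, expr) \
	demo_format_warning((buf), (len), __FILE__, __LINE__, #expr, \
	    (unsigned long)(expr))
```

In a real kernel the equivalent would be a plain printf of "data" next to the existing warning; the sketch only shows the shape of the message.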


panic in evo_wait

2022-07-17 Thread Thomas Klausner
Hi!

Yesterday I had a panic on 9.99.98/amd64 from June 22 while playing a
couple of videos using mpv. Hand-transcribed from the console:

[184197.xxx] nouveau0: error: bus: MMIO read of  FAULT at 409800 
[TIMEOUT ]
[184199.xxx] nouveau0: warn: timeout
[184199.xxx] nouveau0: error: gr: init failed, -16
[184201.xxx] nouveau0: warn: timeout
[184203.xxx] nouveau0: warn: timeout
[184205.xxx] nouveau0: warn: timeout
[184207.xxx] nouveau0: warn: timeout
[184209.xxx] nouveau0: warn: timeout
[184211.xxx] nouveau0: warn: timeout
[184213.xxx] nouveau0: warn: timeout
[184215.xxx] nouveau0: warn: timeout
[184218.xxx] nouveau0: warn: timeout
[184218.xxx] warning: 
/usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
 1
[184218.xxx] warning: 
/usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
 1
[184218.xxx] warning: 
/usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
 1
[184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
[184218.xxx] fatal page fault in supervisor mode
[184218.xxx] trap type 6 code 0x2 ...
[184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
kernel: page fault trap, code=0
Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
evo_wait() at netbsd:evo_wait+0x7b
base507c_ntfy_set()
nv50_wndw_flush_set()
nv50_disp_atomic_commit_tail()
nv50_disp_atomic_commit()
drm_atomic_helper_set_config()
drm_mode_setcrtc()
drm_ioctl()
drm_ioctl_shim()
sys_ioctl()
syscall()
--- syscall (number 54) ---


Does this ring a bell with anyone?

Should I file a PR?
 Thomas


savecore weirdness

2022-06-29 Thread Thomas Klausner
Hi!

For the last few weeks, every reboot has tried to write a crashdump, but
savecore fails at the end, and the next boot tries to write it again.

savecore: writing compressed core to ...
savecore: writing compressed kernel to ...
savecore: kvm_read ksymcs: _kvm_kvatop(e9x031814c8cf8c7)
savecore: (null): Bad address
/etc/rc.d/savecore exited with code 1

That looks like a bug in savecore. It shouldn't misbehave whatever the
crash data.

I've tried overwriting the first 100MB of the 'dp' entry in my fstab
with zeroes in the hope of getting rid of the crashdump, but that
didn't help either. How can I get rid of the crashdump so savecore
doesn't try again to write it out?

Thanks,
 Thomas


Panic in uvm_map_findspace (was Re: 9.99.98: spontaneous reboots)

2022-06-24 Thread Thomas Klausner



> On 23.06.2022, at 09:34, Thomas Klausner  wrote:
> 
> On Tue, Jun 21, 2022 at 11:04:03PM +0200, Thomas Klausner wrote:
>> I've been running a 9.99.97 kernel from June 1 and it was stable,
>> including bulk builds. Today I upgraded to 9.99.98 and started a fresh
>> bulk build, and it rebooted after a couple hours, nothing in dmesg or
>> syslog, no crashdump.
>> 
>> I restarted the bulk build and it rebooted again after about 5 minutes
>> and one finished package. I think it was building nodejs and
>> webkit-gtk at the time (and some third package I don't know).
>> 
>> Has anyone else seen stability issues with today's current?
> 
> Thanks for the feedback.
> 
> I think you were lucky :) or we have different hardware.
> 
> I've tried bulk building with 98 for a third time and had a reboot
> shortly afterwards. Going back to 97 from June 1, I could finish a
> bulk build.
> 
> I've locally backed out the uvm changes from early June and will
> report back if that brought back stability.
> 

With the UVM changes from early June backed out, the system was stable.
I had ddb_onpanic=0, I switched to ddb_onpanic=1 and after some time with a 
9.99.98 kernel I got this:

panic: kernel diagnostic assertion "!topdown || hint <= orig_hint" failed: file 
"/usr/src/sys/uvm/uvm_map.c", line 1795 map=0xc8415a1d9388 
hint=0xfff944a0 orig_hint=0x8200 length=0x77e00 uobj=0x0 
uoffset=0xf align=0 flags=0x80710 entry=0xc8415a1d93e0 
(uvm_map_findspace line 1998)
cpu0: begin traceback
vpanic()
kern_assert()
uvm_findspace_invariants() at netbsd:uvm_findspace_invariants+0x83
uvm_map_findspace() at netbsd:uvm_map_findspace+0x1c6
uvm_map_prepare() at netbsd:uvm_map_prepare+0x1f6
uvm_map() at netbsd:uvm_map+0x70
uvm_mmap.part.0() at netbsd:uvm_mmap.part.0+0x15a
sys_mmap() at netbsd:sys_mmap+0x42f
syscall() number 197

(Handcopied with autocorrection, but I have pictures if you want to verify 
something).
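
The asserted condition can be restated as a small predicate. This is a sketch only: demo_findspace_invariant() is a hypothetical name, and the values in the note below are taken exactly as transcribed in the panic message, which may not be the full addresses.

```c
#include <stdint.h>

/*
 * The condition from the panic message: a top-down search must not end
 * up with a hint above the one it started from.
 */
static int
demo_findspace_invariant(int topdown, uint64_t hint, uint64_t orig_hint)
{
	return !topdown || hint <= orig_hint;
}
```

With the transcribed hint=0xfff944a0 and orig_hint=0x8200, the predicate is false for a top-down map, which matches the assertion firing; for a bottom-up map it holds regardless of the two values.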

 Thomas

Re: 9.99.98: spontaneous reboots

2022-06-23 Thread Thomas Klausner
On Tue, Jun 21, 2022 at 11:04:03PM +0200, Thomas Klausner wrote:
> I've been running a 9.99.97 kernel from June 1 and it was stable,
> including bulk builds. Today I upgraded to 9.99.98 and started a fresh
> bulk build, and it rebooted after a couple hours, nothing in dmesg or
> syslog, no crashdump.
> 
> I restarted the bulk build and it rebooted again after about 5 minutes
> and one finished package. I think it was building nodejs and
> webkit-gtk at the time (and some third package I don't know).
> 
> Has anyone else seen stability issues with today's current?

Thanks for the feedback.

I think you were lucky :) or we have different hardware.

I've tried bulk building with 98 for a third time and had a reboot
shortly afterwards. Going back to 97 from June 1, I could finish a
bulk build.

I've locally backed out the uvm changes from early June and will
report back if that brought back stability.
 Thomas


9.99.98: spontaneous reboots

2022-06-21 Thread Thomas Klausner
Hi!

I've been running a 9.99.97 kernel from June 1 and it was stable,
including bulk builds. Today I upgraded to 9.99.98 and started a fresh
bulk build, and it rebooted after a couple hours, nothing in dmesg or
syslog, no crashdump.

I restarted the bulk build and it rebooted again after about 5 minutes
and one finished package. I think it was building nodejs and
webkit-gtk at the time (and some third package I don't know).

Has anyone else seen stability issues with today's current?
 Thomas


Re: scp -r incompatibility between -current and NetBSD releases

2022-06-11 Thread Thomas Klausner
On Sat, Jun 11, 2022 at 08:48:10AM -0700, Brian Buhrow wrote:
>   Hello.  What version of openssh are you using?  I just tested between 
> NetBSD-5.2 and
> -current as of 99.77.  Those versions are:
> 5.2: OpenSSH_5.0 NetBSD_Secure_Shell-20080403-hpn13v1
> 99.77: OpenSSH_8.4 NetBSD_Secure_Shell-20201204-hpn13v14-lpk,
> 
> your command, with a nested directory, works in both directions between these 
> two machines
> without an issue.

OpenSSH_9.0 NetBSD_Secure_Shell-20220415-hpn13v14-lpk, OpenSSL 1.1.1n  15 Mar 
2022

 Thomas


scp -r incompatibility between -current and NetBSD releases

2022-06-11 Thread Thomas Klausner
Hi!

I cannot use 'scp -r' from -current to NetBSD 8 or NetBSD 9.

> scp -r a target:
scp: realpath ./a: No such file
scp: upload "./a": path canonicalization failed
scp: failed to upload directory a to .

scp without -r still works fine.

Is there a compatibility setting I can enable to make this work again?
 Thomas


Re: nouveau: back in text console after switch to graphical one

2022-06-08 Thread Thomas Klausner
Did either of you install any firmware files?
Which firmware file is loaded?
 Thomas


Re: nouveau: back in text console after switch to graphical one

2022-06-08 Thread Thomas Klausner
On Wed, Jun 08, 2022 at 04:21:16PM +0200, Cygnus X-1 wrote:
> On 22/06/08 06:58AM, Paul Goyette wrote:
> > Yup. At least with 9.99.97 my nouveau is running great on my Geforce
> > GTX 1050 Ti  
>
> Thanks a lot for the precious feedback.
> Alright, I guess it's time to upgrade to 9.99.97 and report back.

Well, I was already on 9.99.97 from June, so that's not enough for me.
 Thomas


Re: boot.cfg syntax question

2022-06-05 Thread Thomas Klausner
On Sun, Jun 05, 2022 at 12:53:09AM +, RVP wrote:
> On Sun, 5 Jun 2022, Thomas Klausner wrote:
> 
> > However, when I press '3' in that config, I get a kernel where nouveau
> > is disabled.
> > 
> > Did I misunderstand the man page or is there a bug here?
> > 
> 
> Looks like a bug when a bare `boot' is encountered. Work around it by
> forcing a kernel filename:
> 
> --- boot.cfg.orig   2022-06-05 00:48:51.47679 +
> +++ boot.cfg2022-06-05 00:49:18.797459000 +
> @@ -1,6 +1,6 @@
> -menu=Boot without nouveau:rndseed /var/db/entropy-file;userconf disable 
> nouveau*;boot
> +menu=Boot without nouveau:rndseed /var/db/entropy-file;userconf disable 
> nouveau*;boot /netbsd
>  menu=Boot old without nouveau:rndseed /var/db/entropy-file;userconf disable 
> nouveau*;boot /netbsd.old
> -menu=Boot normally:rndseed /var/db/entropy-file;boot
> +menu=Boot normally:rndseed /var/db/entropy-file;boot /netbsd
>  menu=Boot single user:rndseed /var/db/entropy-file;boot -s
>  menu=Drop to boot prompt:prompt
>  default=1

Yes, this works around the issue for me. Thanks!
 Thomas
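
To find other menu entries that would need the same workaround, a boot.cfg line can be scanned for a command that is exactly "boot". The following is a minimal sketch, assuming the "menu=label:command;command" layout seen in the quoted diff; demo_menu_has_bare_boot() is a hypothetical helper, not part of any NetBSD tool.

```c
#include <string.h>

/*
 * Return 1 if a boot.cfg "menu=" line contains a command that is exactly
 * "boot" (no kernel name, no flags), 0 otherwise.  Hypothetical helper.
 */
static int
demo_menu_has_bare_boot(const char *line)
{
	const char *p;

	if (strncmp(line, "menu=", 5) != 0)
		return 0;
	p = strchr(line + 5, ':');	/* end of the menu label */
	if (p == NULL)
		return 0;
	p++;
	while (*p != '\0') {
		size_t len = strcspn(p, ";");	/* length of this command */
		size_t end;

		/* Skip leading blanks of the command. */
		while (len > 0 && *p == ' ') {
			p++;
			len--;
		}
		/* Ignore trailing blanks and a trailing newline. */
		end = len;
		while (end > 0 && (p[end - 1] == ' ' || p[end - 1] == '\n'))
			end--;
		if (end == 4 && strncmp(p, "boot", 4) == 0)
			return 1;
		p += len;		/* now at ';' or end of string */
		if (*p == ';')
			p++;
	}
	return 0;
}
```

For the lines in the diff, the original "Boot normally" entry is flagged, while the versions ending in "boot /netbsd" and the "boot -s" entry are not.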

