obsolete manhtml files not deleted
Hi!

I just upgraded from a couple of months ago and found:

Only in /usr/share/man/html8: dnssec-dsfromkey.html
Only in /usr/share/man/html8: dnssec-importkey.html
Only in /usr/share/man/html8: dnssec-keyfromlabel.html
Only in /usr/share/man/html8: dnssec-keygen.html
Only in /usr/share/man/html8: dnssec-revoke.html
Only in /usr/share/man/html8: dnssec-settime.html
Only in /usr/share/man/html8: dnssec-signzone.html
Only in /usr/share/man/html8: dnssec-verify.html
Only in /usr/share/man/html8: named-checkconf.html
Only in /usr/share/man/html8: named-checkzone.html
Only in /usr/share/man/html8: named-compilezone.html
Only in /usr/share/man/html8: named-journalprint.html
Only in /usr/share/man/html8: nsec3hash.html

some files that were in my old install and not the new one, and which should have been deleted. The man8 versions of the files are indeed gone.

I see that these are listed in the manhtml set - does the "postinstall fix" step for cleaning up obsolete files need changes for supporting manhtml?

Thomas
header installation not make-jobs safe?
Hi!

I had an interesting build failure today when using

build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes -T /usr/obj/tools.gcc -m amd64 -O /usr/obj/src.amd64 -D /usr/obj/amd64.gcc.20240411 -R /usr/obj/amd64.gcc.20240411.release distribution

The build stopped quite early with:

--- /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h ---
*** Failed target: /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h
*** Failed commands:
        @cmp -s ${.ALLSRC} ${.TARGET} > /dev/null 2>&1 || (${_MKSHMSG_INSTALL} ${.TARGET}; ${_MKSHECHO} "${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP} -m ${NONBINMODE} ${.ALLSRC} ${.TARGET}" && ${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP} -m ${NONBINMODE} ${.ALLSRC} ${.TARGET})
        => @cmp -s krb5_asn1.h /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h > /dev/null 2>&1 || (echo '# ' "install " /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h; echo "/usr/obj/tools.gcc/bin/x86_64--netbsd-install -N /disk/storage-202004/archive/foreign/src/etc -c -r -o root -g wheel -m 444 krb5_asn1.h /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h" && /usr/obj/tools.gcc/bin/x86_64--netbsd-install -N /disk/storage-202004/archive/foreign/src/etc -c -r -o root -g wheel -m 444 krb5_asn1.h /usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h)
*** [/usr/obj/amd64.gcc.20240411/usr/include/krb5/krb5_asn1.h] Error code 1

nbmake[4]: stopped in /disk/storage-202004/archive/foreign/src/crypto/external/bsd/heimdal/lib/libasn1

A second try of the same command on the same machine with the same sources succeeded.

Full log available on request.

Thomas
Re: tmux-direct entry only has 8 colors
On Wed, Jan 31, 2024 at 10:31:41PM +0100, Thomas Klausner wrote:
> I've tried to get my terminal+tmux to display true colors today using
> the latest terminfo as imported to NetBSD.

I debugged this a bit further. It works fine if I just use two entries, but as soon as a third in the style of kitty+setal is added, it breaks.

--- terminfotest2 ---
kitty+setal|set underline colors (nonstandard),
        setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,

minfirst|TermInfo Test,
# if the second number is >32767, it disappears!
        use=min,
        use=max,

maxfirst|TermInfo Test,
# putting the bigger one first makes "promotion" happen.
        use=max,
        use=min,

max|any number > INT16_MAX,
        colors#16777216,

min|any num < INT16_MAX,
        colors#8,

kittymin|kitty+min,
        use=kitty+setal,
        use=min,
        use=max

kittymax|kitty+max,
        use=kitty+setal,
        use=max,
        use=min
--- end of terminfotest2 ---

> tic -x terminfotest2
> infocmp -1x -A /home/wiz/terminfotest2.cdb minfirst
# Reconstructed from /home/wiz/terminfotest2.cdb
minfirst|TermInfo Test,
        colors#8,
> infocmp -1x -A /home/wiz/terminfotest2.cdb maxfirst
# Reconstructed from /home/wiz/terminfotest2.cdb
maxfirst|TermInfo Test,
        colors#16777216,

Here you can see the first encountered definition wins.

> infocmp -1x -A /home/wiz/terminfotest2.cdb kittymin
# Reconstructed from /home/wiz/terminfotest2.cdb
kittymin|kitty+min,
        colors#8,
        setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,
> infocmp -1x -A /home/wiz/terminfotest2.cdb kittymax
# Reconstructed from /home/wiz/terminfotest2.cdb
kittymax|kitty+max,
        colors#8,
        setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm,

but here it doesn't any longer, or kitty+setal's non-definition counts as a colors#8?

I've filed PR 58034 for this.

Roy, Christos, can either of you please take a look?

Thanks,
Thomas
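The behaviour this test file expects can be modelled in a few lines. The following is only a sketch of "first encountered definition wins" resolution for use= chains; it is not the code of NetBSD's tic or of ncurses, and the dictionary layout and the resolve() helper are made up for illustration.

```python
# Simplified model of terminfo "use=" resolution: the first definition
# encountered wins, and an entry that does not define a capability at all
# (like kitty+setal for "colors") must not influence the result.
# The data layout and the function name are hypothetical.

SETAL = r"\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%dm"

ENTRIES = {
    "kitty+setal": {"caps": {"setal": SETAL}, "use": []},
    "min": {"caps": {"colors": 8}, "use": []},
    "max": {"caps": {"colors": 16777216}, "use": []},
    "minfirst": {"caps": {}, "use": ["min", "max"]},
    "maxfirst": {"caps": {}, "use": ["max", "min"]},
    "kittymin": {"caps": {}, "use": ["kitty+setal", "min", "max"]},
    "kittymax": {"caps": {}, "use": ["kitty+setal", "max", "min"]},
}


def resolve(name, entries=ENTRIES):
    """Return the merged capabilities for the entry called `name`."""
    entry = entries[name]
    merged = dict(entry["caps"])            # the entry's own values come first
    for used in entry["use"]:               # then each use= in the listed order
        for cap, value in resolve(used, entries).items():
            merged.setdefault(cap, value)   # keep the first definition seen
    return merged


for name in ("minfirst", "maxfirst", "kittymin", "kittymax"):
    print(name, resolve(name).get("colors"))
```

Under this model kittymax resolves to colors#16777216, because kitty+setal contributes only setal; that is the result the report expects instead of the colors#8 shown by infocmp.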
Re: bug in ftp(1)?
On Sun, Feb 18, 2024 at 12:19:57PM -, Michael van Elst wrote:
> w...@netbsd.org (Thomas Klausner) writes:
>
> >ftp: Receiving HTTP reply: Input line is too long
>
> #define FTPBUFLEN (4 * MAXPATHLEN)
> char buf[FTPBUFLEN];
>
> That's 4kB.
>
> >curl -v https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2
>
> This returns a 5kB HTTP header "content-security-policy".
>
> There is no protocol limit, but common server implementations do limit header
> lines to something between 4k (some nginx versions) to 48k (tomcat).

Thanks for the analysis. I've increased the size to 16kB.

Thomas
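The arithmetic behind the error can be spelled out as a small sketch. MAXPATHLEN = 1024 is an assumption that matches the "4kB" quoted above, and line_fits() is a made-up helper; the exact length check in ftp(1) may differ.

```python
# Why a 5 kB header line overflows the old buffer but not the new one.
# MAXPATHLEN = 1024 is assumed (it matches the "4kB" figure in the thread).
MAXPATHLEN = 1024
OLD_FTPBUFLEN = 4 * MAXPATHLEN   # 4096 bytes, the old limit
NEW_FTPBUFLEN = 16 * 1024        # 16 kB, the size after the change


def line_fits(line_length, buflen):
    """Hypothetical check: the line must be shorter than the buffer,
    leaving room for a terminator."""
    return line_length < buflen


header_length = 5 * 1024         # roughly the 5 kB header reported above
print("old buffer:", line_fits(header_length, OLD_FTPBUFLEN))
print("new buffer:", line_fits(header_length, NEW_FTPBUFLEN))
```

With these numbers the header does not fit into 4096 bytes but does fit into 16384, which is consistent with the failure and with the fix.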
bug in ftp(1)?
Hi! When fetching the distfile for mail/courier-unicode, I see: => Bootstrap dependency digest>=20211023: found digest-20220214 => Fetching courier-unicode-2.3.0.tar.bz2 => Total size: 657354 bytes Trying [2606:4700:4400::ac40:9691]:443 ... Requesting https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2 ftp: Receiving HTTP reply: Input line is too long fetch: Unable to fetch expected file courier-unicode-2.3.0.tar.bz2 ... wget fetches the file fine. curl -v gives some more information on the return value: curl -v https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2 * Host sourceforge.net:443 was resolved. * IPv6: 2606:4700:4400::ac40:9691, 2606:4700:4400::6812:256f * IPv4: 104.18.37.111, 172.64.150.145 * Trying [2606:4700:4400::ac40:9691]:443... * Connected to sourceforge.net (2606:4700:4400::ac40:9691) port 443 * ALPN: curl offers h2,http/1.1 * TLSv1.3 (OUT), TLS handshake, Client hello (1): * CAfile: none * CApath: /etc/openssl/certs * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / id-ecPublicKey * ALPN: server accepted h2 * Server certificate: * subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=sourceforge.net * start date: Feb 4 00:00:00 2024 GMT * expire date: Dec 31 23:59:59 2024 GMT * subjectAltName: host "sourceforge.net" matched cert's "sourceforge.net" * issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3 * SSL certificate verify ok. 
* Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256 * Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using sha256WithRSAEncryption * Certificate level 2: Public key type RSA (2048/112 Bits/secBits), signed using sha1WithRSAEncryption * using HTTP/2 * [HTTP/2] [1] OPENED stream for https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2 * [HTTP/2] [1] [:method: GET] * [HTTP/2] [1] [:scheme: https] * [HTTP/2] [1] [:authority: sourceforge.net] * [HTTP/2] [1] [:path: /projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2] * [HTTP/2] [1] [user-agent: Mozilla/5.0] * [HTTP/2] [1] [accept: */*] > GET > /projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2 > HTTP/2 > Host: sourceforge.net > User-Agent: Mozilla/5.0 > Accept: */* > * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * old SSL session ID is stale, removing < HTTP/2 301 < date: Sun, 18 Feb 2024 11:09:47 GMT < content-type: text/html; charset=UTF-8 < location: https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2/ < cache-control: no-cache < pragma: no-cache < x-ua-compatible: IE=edge,chrome=1 < permissions-policy: geolocation=(), microphone=(), camera=(), payment=(), document-domain=(), display-capture=(), autoplay=() < feature-policy: geolocation 'none'; microphone 'none'; camera 'none'; payment 'none'; document-domain 'none'; display-capture 'none'; autoplay 'none' < x-frame-options: SAMEORIGIN < content-security-policy: frame-ancestors 'self'; script-src 'self' adservice.google.co.jp adservice.google.co.tz adservice.google.nr *.crsspxl.com adservice.google.ge adservice.google.com.gi adservice.google.com.br adservice.google.com.tr adservice.google.so adservice.google.com.pe adservice.google.com.sb adservice.google.st *.sharethrough.com 
adservice.google.com.co adservice.google.com.pk adservice.google.ad adservice.google.cv adservice.google.ws adservice.google.gm adservice.google.gy adservice.google.tn adservice.google.no adservice.google.rs *.gstatic.cn *.googlesyndication.com adservice.google.com.bn adservice.google.tm http://c.sf-syn.com translate.googleapis.com adservice.google.com.my adservice.google.as *.google.com adservice.google.com.tw *.2mdn.net adservice.google.de adservice.google.lu adservice.google.com.hk adservice.google.pl adservice.google.gg adservice.google.tt adservice.google.com.pa adservice.google.vu adservice.google.co.ve adservice.google.fi adservice.google.mu adservice.google.vg adservice.google.to adservice.google.co.th adservice.google.iq adservice.google.ml adservice.google.com.bo adservice.google.com.ai adservice.google.com.uy adservice.google.ro adservice.google.ae adservice.google.cg *.trustarc.com adservice.google.co.bw adservice.google.tg adservice.google.com.eg *.tiny.cloud adservice.google.rw adservice.google.cz adservice.google.gr adservice.google.co.id
tmux-direct entry only has 8 colors
Hi! I've tried to get my terminal+tmux to display true colors today using the latest terminfo as imported to NetBSD. Either I misunderstand something or the tmux-direct entry is broken. > infocmp tmux-direct # Reconstructed from /usr/share/misc/terminfo.cdb tmux-direct|tmux with direct-color indexing, am, hs, km, mir, msgr, xenl, colors#8, cols#80, it#8, lines#24, pairs#64, ... It shouldn't be 'colors#8', but 16777216, which is the whole point of the "-direct" entries. xterm-direct looks good: > infocmp xterm-direct # Reconstructed from /usr/share/misc/terminfo.cdb xterm-direct|xterm with direct-color indexing, am, bce, km, mc5i, mir, msgr, npc, xenl, colors#16777216, cols#80, it#8, lines#24, pairs#65536, ... Reading terminfo.src I see: tmux-direct|tmux with direct-color indexing, use=kitty+setal, use=xterm+direct, use=tmux, > infocmp kitty+setal kitty+setal|set underline colors (nonstandard), Compare that to terminfo.src: kitty+setal|set underline colors (nonstandard), setal=\E[58:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1 %{255}%&%dm, Seems like NetBSD's infocmp (or terminfo) doesn't support setal, sounds like a bug. > infocmp xterm+direct # Reconstructed from /usr/share/misc/terminfo.cdb xterm+direct|xterm with direct-color indexing (building-block), colors#16777216, pairs#65536, op=\E[39;49m, setab=\E[%?%p1%{8}%<%t4%p1%d%e48:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m, setaf=\E[%?%p1%{8}%<%t3%p1%d%e38:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m, Compared to terminfo file: xterm+direct|xterm with direct-color indexing (building-block), RGB, colors#0x100, pairs#0x1, CO#8, initc@, op=\E[39;49m, setab=\E[%?%p1%{8}%<%t4%p1%d%e48:2::%p1%{65536}%/%d:%p1 %{256}%/%{255}%&%d:%p1%{255}%&%d%;m, setaf=\E[%?%p1%{8}%<%t3%p1%d%e38:2::%p1%{65536}%/%d:%p1 %{256}%/%{255}%&%d:%p1%{255}%&%d%;m, setb@, setf@, Again, a couple things seem to get lost (RGB, CO#8, initc@, setb@, setf@) but the colors are there. 
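For readers not used to terminfo parameter strings: the setal string quoted above is plain integer arithmetic on its first parameter. A rough Python equivalent follows, assuming the parameter is a colour packed as 0xRRGGBB; setal_sequence() is a made-up name for illustration only.

```python
def setal_sequence(color):
    """Expand the setal string for a 24-bit colour value 0xRRGGBB."""
    red = color // 65536            # %p1%{65536}%/
    green = (color // 256) & 255    # %p1%{256}%/%{255}%&
    blue = color & 255              # %p1%{255}%&
    # \E is the escape character; 58:2:: selects a direct underline colour.
    return "\x1b[58:2::{}:{}:{}m".format(red, green, blue)


print(repr(setal_sequence(0xFF8000)))
```

The same split into three 8-bit components appears in the setab and setaf strings of xterm+direct, which is why those entries need colors#16777216 rather than colors#8.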
> infocmp tmux
# Reconstructed from /usr/share/misc/terminfo.cdb
tmux|tmux terminal multiplexer,
        am, hs, km, mir, msgr, xenl,
        colors#8, cols#80, it#8, lines#24, pairs#64,
...

so the colors get lost here because colors#8 overwrites the xterm+direct entry.

From terminfo:

tmux|tmux terminal multiplexer,
        invis=\E[8m, rmso=\E[27m,
        sgr=\E[0%?%p6%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?
            %p5%t;2%;%?%p7%t;8%;m%?%p9%t\016%e\017%;,
        smso=\E[7m, E3=\E[3J, Smulx=\E[4:%p1%dm,
        use=ecma+italics, use=ecma+strikeout, use=xterm+edit,
        use=xterm+pcfkeys, use=xterm+sl, use=xterm+tmux,
        use=screen, use=bracketed+paste, use=report+version,
        use=xterm+focus,

I looked at the 'use' and some of them are empty (which makes me think NetBSD's infocmp or terminfo are missing more features) and then I found:

# Reconstructed from /usr/share/misc/terminfo.cdb
screen|VT 100/ANSI X3.64 virtual terminal,
        am, km, mir, msgr, xenl,
        colors#8, cols#80, it#8, lines#24, pairs#64,
...

So it looks to me like the 'colors' from the 'screen' entry via the 'tmux' entry overwrite the colors defined by 'xterm+direct'.
When I run infocmp from ncurses 6.4, I get a different output for tmux-direct: # /usr/pkg/bin/infocmp tmux-direct # Reconstructed via infocmp from file: /usr/pkg/share/terminfo/t/tmux-direct tmux-direct|tmux with direct-color indexing, am, hs, km, mir, msgr, xenl, colors#0x7fff, cols#80, it#8, lines#24, pairs#0x7fff, acsc=++\,\,--..00``aaffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~, bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l, clear=\E[H\E[J, cnorm=\E[34h\E[?25h, cr=\r, csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H, cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C, cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\EM, cvvis=\E[34l, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m, dl=\E[%p1%dM, dl1=\E[M, dsl=\E]0;\007, ed=\E[J, el=\E[K, el1=\E[1K, enacs=\E(B\E)0, flash=\Eg, fsl=^G, home=\E[H, hpa=\E[%i%p1%dG, ht=^I, hts=\EH, ich=\E[%p1%d@, il=\E[%p1%dL, il1=\E[L, ind=\n, indn=\E[%p1%dS, invis=\E[8m, is2=\E)0, kDC=\E[3;2~, kEND=\E[1;2F, kHOM=\E[1;2H, kIC=\E[2;2~, kLFT=\E[1;2D, kNXT=\E[6;2~, kPRV=\E[5;2~, kRIT=\E[1;2C, kbs=^?, kcbt=\E[Z, kcub1=\EOD, kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kdch1=\E[3~, kend=\E[4~, kf1=\EOP, kf10=\E[21~, kf11=\E[23~, kf12=\E[24~, kf13=\E[1;2P, kf14=\E[1;2Q, kf15=\E[1;2R, kf16=\E[1;2S, kf17=\E[15;2~, kf18=\E[17;2~, kf19=\E[18;2~, kf2=\EOQ, kf20=\E[19;2~, kf21=\E[20;2~, kf22=\E[21;2~, kf23=\E[23;2~, kf24=\E[24;2~, kf25=\E[1;5P, kf26=\E[1;5Q, kf27=\E[1;5R, kf28=\E[1;5S, kf29=\E[15;5~, kf3=\EOR, kf30=\E[17;5~,
mktemp POSIX (and Linux) divergence
Hi!

Our mktemp man page says:

RETURN VALUES
     The mktemp() and mkdtemp() functions return a pointer to the template on success and NULL on failure.

But POSIX[1] (and Linux) say:

     The mktemp() function shall return the pointer template. If a unique name cannot be created, template shall point to a null string.

where 'null string' is[2]

     3.146 Empty String (or Null String)
     A string whose first byte is a null byte

So NetBSD's mktemp returns NULL on error, while Linux returns a pointer to a string of length 0.

I think mktemp has been removed from POSIX in the meantime, but should we switch to the POSIX behaviour?

Thomas

[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/mktemp.html
[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
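Until the two behaviours agree, a caller that wants to be portable has to treat both results as failure. The sketch below only models that check; it does not call the C function. None stands for a NULL pointer, a bytes object stands for the string the returned pointer refers to, and mktemp_failed() is a made-up name.

```python
def mktemp_failed(result):
    """Return True if `result` signals failure under either convention.

    `result` models the C return value: None stands for NULL (the
    behaviour documented in the NetBSD man page), an empty bytes object
    stands for a pointer to an empty string (the POSIX/Linux behaviour).
    """
    return result is None or len(result) == 0


print(mktemp_failed(None))            # NetBSD-style failure
print(mktemp_failed(b""))             # POSIX-style failure
print(mktemp_failed(b"/tmp/fooAbC"))  # a generated name, i.e. success
```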
Update ARFLAGS?
Hi! As noted in PR 57565, the default ARFLAGS in share/mk/sys.mk are broken - they use 'l' which changed behaviour between binutils 2.34 and 2.39. Ok to commit the change? (This broke the build of ruby-nokogiri recently, which is how I noticed.) Thomas
Re: grafana rc.d scripts reports process not running
On Wed, Dec 27, 2023 at 08:53:10AM +, RVP wrote: > A way to check for a process-name different from the command-name seems > to be documented in /etc/rc.subr. Does this patch work? I think that works too, yes. Thomas
Re: grafana rc.d scripts reports process not running
Thanks for the suggestions. It turns out that starting 'grafana-server ...' ends up starting 'grafana server ...' which made the process name check fail - it expected arg0 to be grafana-server, not grafana. I've changed the script to start grafana as 'grafana server' instead and it works now. Thomas
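The mismatch can be illustrated with a small model of the status check. This is not rc.subr's code; it only assumes, as described above, that the check compares arg0 of the running process with the configured command. Accepting the basename as well is a further assumption, and matches_command() is a made-up helper.

```python
import os


def matches_command(ps_command, expected):
    """Compare arg0 of a ps(1) command line with the expected command."""
    arg0 = ps_command.split()[0]
    return arg0 == expected or arg0 == os.path.basename(expected)


# Command line as shown by ps for the running process.
running = "grafana server -homepath /usr/pkg/share/grafana"

print(matches_command(running, "/usr/pkg/bin/grafana-server"))
print(matches_command(running, "/usr/pkg/bin/grafana"))
```

In this model the first comparison fails and the second succeeds, which corresponds to the status command reporting "not running" until the script starts 'grafana server' directly.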
grafana rc.d scripts reports process not running
Hi!

I'm currently trying out grafana, and I noticed one weirdness after starting it using the pkgsrc rc.d script.

# /etc/rc.d/grafana status
grafana is not running.
# cat /var/run/grafana.pid
21719# ps -auxwww | grep 21719
root     7846 0.0 0.0   12468   2212 pts/4 O+ 3:14nachm. 0:00.00 grep 21719
grafana 21719 0.0 0.1 1681352 157636 pts/4 Sl 3:08nachm. 0:02.15 grafana server -homepath /usr/pkg/share/grafana -config /usr/pkg/etc/grafana.conf -pidfile /var/run/grafana.pid

So grafana saved its PID into /var/run/grafana.pid, which is what's configured in the rc.d script as pidfile, but the status command thinks it's not running, despite a grafana process with the corresponding PID running.

(I tried manually adding a newline to the pidfile, but that doesn't change the behaviour.)

There is not even an interpreter involved, /usr/pkg/bin/grafana is a go binary.

Does anyone have an idea what the problem could be here?

grafana rc.d script attached.

Thanks,
Thomas

#!/bin/sh
#
# $NetBSD: grafana.sh,v 1.6 2022/11/29 22:06:47 wiz Exp $
#
# PROVIDE: grafana
# REQUIRE: DAEMON
# KEYWORD: shutdown
#
# Consider installing pkgtools/rc.subr in unprivileged.
#
# You will need to set some variables in /etc/rc.conf to start grafana:
#
# grafana=YES

if [ -f /etc/rc.subr ]; then
    $_rc_subr_loaded . /etc/rc.subr
fi

name="grafana"
rcvar=$name
grafana_user="grafana"
grafana_group="grafana"
grafana_home="/usr/pkg/share/${name}"
pidfile="/var/run/${name}.pid"
command="/usr/pkg/bin/grafana-server"
command_args="-homepath ${grafana_home} -config /usr/pkg/etc/grafana.conf -pidfile ${pidfile} < /dev/null > /dev/null 2>&1 &"

start_precmd="grafana_precmd"

grafana_precmd() {
    if [ ! -r "${pidfile}" ]; then
        touch "${pidfile}"
        chown "${grafana_user}:${grafana_group}" "${pidfile}"
        chmod 644 "${pidfile}"
    fi
}

if [ -f /etc/rc.subr -a -d /etc/rc.d -a -f /etc/rc.d/DAEMON ]; then
    load_rc_config $name
    run_rc_command "$1"
else
    if [ -f /etc/rc.conf ]; then
        . /etc/rc.conf
    fi
    case "$1" in
    start)
        if [ -r "${pidfile}" ]; then
            echo "Already running ${name}."
        else
            echo "Starting ${name}."
            eval ${command} ${command_args}
        fi
        ;;
    stop)
        if [ -r "${pidfile}" ]; then
            echo "Stopping ${name}."
            kill `/bin/cat "${pidfile}"` && /bin/rm "${pidfile}"
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop}" 1>&2
        exit 10
        ;;
    esac
fi
stack guard setup?
Hi!

We found some operating system specific code in rust and would like to know how this should be done for NetBSD.

Can someone please explain the stack guard setup on NetBSD?

Below is the last mail from the thread on tech-pkg, with a link to the rust code that shows how it's implemented in rust for other BSDs.

Thanks,
Thomas

- Forwarded message from Havard Eidnes -

Date: Thu, 16 Nov 2023 19:29:25 +0100 (CET)
From: Havard Eidnes
To: w...@netbsd.org
Cc: jper...@mnx.io, tech-...@netbsd.org
Subject: Re: rust problem when building firefox
X-Mailer: Mew version 6.9 on Emacs 26.3

>> > Is this a bug in rust?
>>
>> It might be missing some initialisation in the NetBSD implementation of
>> Rust. It's panicking in this function:
>>
>> https://doc.rust-lang.org/src/std/sys/unix/thread.rs.html#835-914
>>
>> There's some OS-specific code for other BSDs. Does NetBSD also need to do
>> something specific here?
>
> Does he@ know? :)

he@ is regrettably blissfully ignorant about this issue.

Can someone please describe how our stack guard page is placed etc, e.g. in the same terms as in the comments for the other BSDs? I'll take a look at getting something suitable in, initially as a patch, but I'll also take care of upstreaming it if we can demonstrate that it works properly.

Regards,

- Håvard

- End forwarded message -
ure(4) or xhci(4) error?
Hi!

After about 1.5 days of uptime I saw

xhci2: xhci_set_dequeue: endpoint 0x0: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: TIMEOUT
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out
ure0: usb error on tx: IOERROR
ure0: watchdog timeout
xhci2: xhci_reset_endpoint: endpoint 0x2: timed out
xhci2: endpoint 0x2 failed to stop
xhci2: xhci_set_dequeue: endpoint 0x2: timed out

Is this an xhci issue or an ure one? Has anyone else seen this?

Thomas
updating kernel AND modules
Hi!

I'm used to just running a fully-compiled kernel without kernel modules to speak of, so I just do 'build.sh kernel=GENERIC' and copy the resulting kernel to /netbsd manually.

However, e.g. dtrace is a kernel module, so if I'm interested in bugfixes for that, the kernel module needs to be updated as well.

The NetBSD guide does not talk about kernel modules at all in the updating section (https://www.netbsd.org/docs/guide/en/chap-kernel.html, http://netbsd.org/docs/guide/en/chap-updating.html)

What is the current best-practice method for that?

Thanks,
Thomas
Re: weird hangs in current (ghc, gnucash)
On Thu, Nov 02, 2023 at 11:33:54AM +0100, Martin Husemann wrote: > On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote: > > Should we back out ad's changes until he has time to look at them? > > I just did that on behalf of core. > Can you test if this solves your problem? Thank you, both my test cases work again with a GENERIC. Thomas
rge(4) completely hangs
Hi!

After the latest fixes, rge(4) is better, but it's completely hung up the network interface twice so far - no network traffic possible on it - both times so hard that the BIOS had some kind of issue on the next boot and needed 15 minutes to sort itself out (before even showing anything on the screen).

I'm running a kernel from Oct 22.

In /var/log/messages I see:

Nov 1 18:59:43 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 18:59:43 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 18:59:45 exadelic dhcpcd[2191]: rge0: Router Advertisement from ::1
Nov 1 18:59:46 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 18:59:46 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 18:59:58 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:01:11 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:01:19 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:01:19 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:01:31 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:01:57 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:02:05 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:02:05 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:02:17 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:04:27 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:04:35 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:04:35 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:04:47 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:06:12 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:06:20 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:06:21 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:06:33 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:09:27 exadelic /netbsd: [ 91537.5847758] nfs server 192.168.178.19:/path: not responding
Nov 1 19:15:16 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:15:24 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:15:24 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:15:36 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:16:51 exadelic dhcpcd[2191]: rge0: ::1 is reachable again
Nov 1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:16:59 exadelic dhcpcd[2191]: rge0: ::1 is unreachable
Nov 1 19:16:59 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:16:59 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:17:11 exadelic syslogd[2290]: last message repeated 3 times
Nov 1 19:17:11 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:17:44 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available

Just in case it matters, I'm not running with default sysctls; I have

kern.sbmax: 262144 -> 16777216
net.inet.tcp.recvbuf_max: 262144 -> 16777216
net.inet.tcp.sendbuf_max: 262144 -> 16777216
net.inet.tcp.recvspace: 32768 -> 262144
net.inet.tcp.sendspace: 32768 -> 262144

because of https://mail-index.netbsd.org/current-users/2017/09/21/msg032369.html

I've now switched to an ure(4) device.

Has anyone else seen this?

Thomas
Re: weird hangs in current (ghc, gnucash)
Should we back out ad's changes until he has time to look at them? Thomas On Wed, Nov 01, 2023 at 09:36:01AM +, Chavdar Ivanov wrote: > This weird hang still takes place on > > ❯ uname -a > NetBSD ymir.lorien.lan 10.99.10 NetBSD 10.99.10 (GENERIC) #13: Mon Oct > 30 19:45:39 GMT 2023 > sysbu...@ymir.lorien.lan:/dumps/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/com > pile/GENERIC amd64 > > - again during building a haskell package: > > ===> Configuring for hs-tagged-0.8.8 > [1 of 2] Compiling Main ( Setup.lhs, Setup.o ) > > > Htop gives weird output for the process not-yet-created: > > 11506 root63 0 33283 873 S 0.0 0.0 0:00.00 | `- make > 20458 root62 0 34832 613 S 0.0 0.0 0:00.00 | `- > /bin/sh -c set -e; test -n "" && echo 1>&2 "ERROR:" && exit > 1; exec 3<&0;??? whil > 24942 root63 0 33296 882 S 0.0 0.0 0:00.00 | `- > /usr/bin/make _MAKE=/usr/bin/make OPSYS=NetBSD OS_VERSION=10.99.10 > OPSYS_VERSION=109910 LOWE > 21643 root58 0 34302 606 S 0.0 0.0 0:00.00 | `- > /bin/sh -c set -e;? if test -n "" && /usr/pkg/sbin/pkg_info -K > /usr/pkg/pkgdb -qe hs > 19149 root63 0 34367 920 S 0.0 0.0 0:00.00 | `- > /usr/bin/make LOWER_OPSYS=netbsd _PKGSRC_BARRIER=yes > ALLOW_VULNERABLE_PACKAGES= reinst > 23303 root58 0 33685 603 S 0.0 0.0 0:00.00 | `- > /bin/sh -c set -e; ulimit -d `ulimit -H -d`; ulimit -v `ulimit -H -v`; > cd /usr/pkgs > 27078 root21 0 256G 37735 S 0.0 0.9 0:00.00 | `- > /usr/pkg/lib/ghc-9.6.3/bin/./ghc-9.6.3 -B/usr/pkg/lib/ghc-9.6.3/lib > -package-env > 22058 root -22 0 0 0 Z 0.0 0.0 0:00.00 | > `- gcc <== > --- > > > I guess it is back to the kernel from the 9th of October. > > Chavdar > > - > > On Mon, 23 Oct 2023 at 09:27, Chavdar Ivanov wrote: > > > > I can confirm that after reverting to the kernel from 9th of October > > devel/happy builds OK. > > > > On Mon, 23 Oct 2023 at 05:56, Markus Kilbinger wrote: > >> > >> ... and probably > >> > >> 3. 
PR kern/57660 > >> https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660 > >> > >> Markus > >> > >> Am So., 22. Okt. 2023 um 23:10 Uhr schrieb Thomas Klausner > >> : > >> > > >> > On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote: > >> > > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote: > >> > > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to > >> > > > Oct > >> > > > 20) to test the rge(4) changes, and started a bulk build, and the > >> > > > packages using ghc seem to wait for something and make no progress. > >> > > ... > >> > > > I see one other new weird behaviour on that machine - gnucash doesn't > >> > > > finish starting up. > >> > > > >> > > I've backed out ad's changes from the 13th, and both problems are gone. > >> > > > >> > > I'll attach my local change. > >> > > > >> > > Andrew, can you please take a look? > >> > > >> > Two test cases to see the problem I have: > >> > > >> > 1. start gnucash, it doesn't finish starting up, the splash screen hangs. > >> > > >> > 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make > >> >The 'build' step has two parts, it hangs after the first one. > >> > > >> > Thomas > > > > > > > > -- > > > > > > -- >
Re: dtracing unlink
On Mon, Oct 30, 2023 at 11:33:24AM +, RVP wrote:
> The NetBSD copyinstr() _disables_ SMAP before copying data from
> userspace. The dtrace version _does not_. I think this is what
> fails on some CPUs. My Intel CPU's more than 10 years old so it
> doesn't support SMAP (only SMEP), dtrace works for me. If you and
> bch tell me that your CPUs support SMAP, then that would be the
> smoking gun.

# cpuctl identify 0 | grep SMAP
cpu0: features5 0xf1bf97a9

Looks that way!

Thomas
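The raw value can also be decoded by hand. The sketch below assumes that "features5" is the EBX word of CPUID leaf 7, which is an assumption about cpuctl's output and not something stated in this thread; in that word SMEP is bit 7 and SMAP is bit 20. has_feature() is a made-up helper.

```python
SMEP_BIT = 7    # CPUID leaf 7 (subleaf 0), EBX bit 7
SMAP_BIT = 20   # CPUID leaf 7 (subleaf 0), EBX bit 20


def has_feature(features, bit):
    """Return True if the given bit is set in the feature word."""
    return bool((features >> bit) & 1)


# Value reported by cpuctl above; assumed to be CPUID leaf 7 EBX.
features5 = 0xf1bf97a9

print("SMEP:", has_feature(features5, SMEP_BIT))
print("SMAP:", has_feature(features5, SMAP_BIT))
```

Under that assumption both bits are set in 0xf1bf97a9, which agrees with the grep for SMAP matching this line.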
Re: dtracing unlink
RVP looked at this some more and it seems related to time-after-booting or perhaps RAM churn. It starts happening on RVP's machine too after some uptime. Still looking for a dtrace guru to help out here :) Thomas
Re: weird hangs in current (ghc, gnucash)
On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote: > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote: > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct > > 20) to test the rge(4) changes, and started a bulk build, and the > > packages using ghc seem to wait for something and make no progress. > ... > > I see one other new weird behaviour on that machine - gnucash doesn't > > finish starting up. > > I've backed out ad's changes from the 13th, and both problems are gone. > > I'll attach my local change. > > Andrew, can you please take a look? Two test cases to see the problem I have: 1. start gnucash, it doesn't finish starting up, the splash screen hangs. 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make The 'build' step has two parts, it hangs after the first one. Thomas
Re: weird hangs in current (ghc, gnucash)
On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote: > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct > 20) to test the rge(4) changes, and started a bulk build, and the > packages using ghc seem to wait for something and make no progress. ... > I see one other new weird behaviour on that machine - gnucash doesn't > finish starting up. I've backed out ad's changes from the 13th, and both problems are gone. I'll attach my local change. Andrew, can you please take a look? Thanks, Thomas Module Name:src Committed By: ad Date: Fri Oct 13 18:48:56 UTC 2023 Modified Files: src/sys/kern: kern_condvar.c kern_sleepq.c src/sys/rump/librump/rumpkern: locks.c locks_up.c src/sys/sys: condvar.h lwp.h Log Message: Add cv_fdrestart() (better name suggestions welcome): Like cv_broadcast(), but make any LWPs that share the same file descriptor table as the caller return ERESTART when resuming. Used to dislodge LWPs waiting for I/O that prevent a file descriptor from being closed, without upsetting access to the file (not descriptor) made from another direction. To generate a diff of this commit: cvs rdiff -u -r1.59 -r1.60 src/sys/kern/kern_condvar.c cvs rdiff -u -r1.83 -r1.84 src/sys/kern/kern_sleepq.c cvs rdiff -u -r1.86 -r1.87 src/sys/rump/librump/rumpkern/locks.c cvs rdiff -u -r1.12 -r1.13 src/sys/rump/librump/rumpkern/locks_up.c cvs rdiff -u -r1.17 -r1.18 src/sys/sys/condvar.h cvs rdiff -u -r1.227 -r1.228 src/sys/sys/lwp.h Module Name:src Committed By: ad Date: Fri Oct 13 18:50:39 UTC 2023 Modified Files: src/sys/kern: uipc_socket.c uipc_syscalls.c src/sys/sys: socketvar.h Log Message: Use cv_fdrestart() to implement fo_restart. 
To generate a diff of this commit: cvs rdiff -u -r1.305 -r1.306 src/sys/kern/uipc_socket.c cvs rdiff -u -r1.208 -r1.209 src/sys/kern/uipc_syscalls.c cvs rdiff -u -r1.165 -r1.166 src/sys/sys/socketvar.h Module Name:src Committed By: ad Date: Fri Oct 13 19:07:09 UTC 2023 Modified Files: src/sys/ddb: db_command.c db_interface.h db_xxx.c src/sys/kern: sys_pipe.c src/sys/sys: pipe.h src/usr.bin/fstat: fstat.c Log Message: Simplify/streamline pipes a little bit: - Allocate only one struct pipe not two (no need to be bidirectional here). - Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops. - Never wake the other side or acquire long-term (I/O) lock unless needed. - Whenever possible, defer wakeups until after locks have been released. - Do some things locklessly in pipe_ioctl() and pipe_poll(). Some notable results: - -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process. - 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine. - 1.5x bandwith with 1kB messages on the same 48 CPU machine (8kB: same b/w). To generate a diff of this commit: cvs rdiff -u -r1.186 -r1.187 src/sys/ddb/db_command.c cvs rdiff -u -r1.41 -r1.42 src/sys/ddb/db_interface.h cvs rdiff -u -r1.77 -r1.78 src/sys/ddb/db_xxx.c cvs rdiff -u -r1.164 -r1.165 src/sys/kern/sys_pipe.c cvs rdiff -u -r1.39 -r1.40 src/sys/sys/pipe.h cvs rdiff -u -r1.118 -r1.119 src/usr.bin/fstat/fstat.c ad.backed.out.diff.gz Description: Binary data
weird hangs in current (ghc, gnucash)
Hi! I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct 20) to test the rge(4) changes, and started a bulk build, and the packages using ghc seem to wait for something and make no progress. In one of my sandboxes there is a hs-data-array-byte build but it's not doing anything. The log stops at: ===> Creating toolchain wrappers for hs-data-array-byte-0.1.0.1nb2 ===> Configuring for hs-data-array-byte-0.1.0.1nb2 => Checking for portability problems in extracted files [1 of 2] Compiling Main ( Setup.hs, Setup.o ) >From ps: pbulk 26131 0.0 0.1 1073923564 140684 ? Il8:23PM 0:00.23 /usr/pkg/lib/ghc-9.4.7/bin/./ghc-9.4.7 -B/usr/pkg/lib/ghc-9.4.7/lib -package-env - --make Setup -dynamic (btw, that is a really huge process size?!) Attaching with gdb shows me: [Switching to LWP 20090 of process 26131] 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12 (gdb) bt #0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12 #1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1 #2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa22f010, pMut=pMut@entry=0x7195fa22f038) at rts/posix/OSThreads.c:143 #3 0x7195faa903e1 in waitForWorkerCapability (task=) at rts/Capability.c:707 #4 yieldCapability (pCap=pCap@entry=0x7195f77fff10, task=task@entry=0x7195fa22f000, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:1011 #5 0x7195faab0026 in scheduleYield (task=0x7195fa22f000, pcap=0x7195f77fff08) at rts/Schedule.c:709 #6 schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 , task=task@entry=0x7195fa22f000) at rts/Schedule.c:319 #7 0x7195faab20b9 in scheduleWorker (cap=cap@entry=0x7195fab21cc0 , task=task@entry=0x7195fa22f000) at rts/Schedule.c:2668 #8 0x7195faab78a2 in workerStart (task=0x7195fa22f000) at rts/Task.c:444 #9 0x7195fa97f2df in pthread.create_tramp () from /usr/lib/libpthread.so.1 #10 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12 #11 0x0020 in ?? () #12 0x in ?? 
() (gdb) thread apply all bt Thread 6 (LWP 26131 of process 26131 ""): #0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12 #1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1 #2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2010, pMut=pMut@entry=0x7195fa2b2038) at rts/posix/OSThreads.c:143 #3 0x7195faa903e1 in waitForWorkerCapability (task=) at rts/Capability.c:707 #4 yieldCapability (pCap=pCap@entry=0x7f7fff2287c0, task=task@entry=0x7195fa2b2000, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:1011 #5 0x7195faab0026 in scheduleYield (task=0x7195fa2b2000, pcap=0x7f7fff2287b8) at rts/Schedule.c:709 #6 schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 , task=task@entry=0x7195fa2b2000) at rts/Schedule.c:319 #7 0x7195faab2069 in scheduleWaitThread (tso=0x4200406ce8, ret=ret@entry=0x0, pcap=pcap@entry=0x7f7fff228940) at rts/Schedule.c:2651 #8 0x7195faaa85fb in rts_evalLazyIO (cap=cap@entry=0x7f7fff228940, p=p@entry=0x1071e60, ret=ret@entry=0x0) at rts/RtsAPI.c:566 #9 0x7195faaabb48 in hs_main (argc=, argv=, main_closure=0x1071e60, rts_config=...) at rts/RtsMain.c:72 #10 0x01063124 in main () Thread 5 (LWP 7329 of process 26131 "ghc_ticker"): #0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12 #1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1 #2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fab21bc0 , pMut=pMut@entry=0x7195fab21b80 ) at rts/posix/OSThreads.c:143 #3 0x7195faae040e in itimer_thread_func (_handle_tick=0x7195faab9c57 ) at rts/posix/ticker/Pthread.c:140 #4 0x7195fa97f2df in pthread.create_tramp () from /usr/lib/libpthread.so.1 #5 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12 #6 0x in ?? 
() Thread 4 (LWP 15032 of process 26131 "ghc_worker"): #0 0x7195fa5a030a in _sys___kevent100 () from /usr/lib/libc.so.12 #1 0x7195fa97a8a7 in __kevent100 () from /usr/lib/libpthread.so.1 #2 0x7195fba014f2 in base_GHCziEventziKQueue_new12_info () from /usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so #3 0x in ?? () Thread 3 (LWP 17781 of process 26131 "ghc_worker"): #0 0x7195fa5a016a in poll () from /usr/lib/libc.so.12 #1 0x7195fa97ae63 in poll () from /usr/lib/libpthread.so.1 #2 0x7195fba0ff55 in ?? () from /usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so #3 0x in ?? () Thread 2 (LWP 23219 of process 26131 "ghc_worker"): #0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12 #1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1 #2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2190, pMut=pMut@entry=0x7195fa2b21b8) at rts/posix/OSThreads.c:143 #3
Re: dtracing unlink
On Sun, Oct 22, 2023 at 07:40:17AM +0000, RVP wrote:
> Ah, that attachment is still based on _my_ version which is plain wrong: You
> can't do copyinstr(arg0) in the :entry action because the kernel may not have
> paged in the memory containing the pathname (yet).
>
> Use your version (which is correct--it does copyinstr() in :return when the
> kernel is sure to have the pathname already in memory):

Yes, then we're back at the start:

dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid address (0x77002a73f7ce) in action #1 at DIF offset 12
: No such file or directory

 Thomas
Re: dtracing unlink
On Sun, Oct 22, 2023 at 06:00:43AM +0000, RVP wrote:
> On Fri, 20 Oct 2023, Thomas Klausner wrote:
>
> > # dtrace -n syscall::unlink:entry'/pid == 27647/{ self->file = arg0; }' -n
> > syscall::unlink:return'{ trace(copyinstr(self->file)); self->file = 0; }'
> >
> > but this just gives me lots of
> >
> > dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return):
> > invalid address (0x79c4586577ce) in action #1 at DIF offset 12
> > : No such file or directory
> >
>
> Actually, this command-line is almost correct. What's missing is the paired
> /pid == 27647/ for syscall::unlink:return. Without it, unlink:return is called
> for _every_ pid and there's not going to be a valid self->file for almost
> every one of them.

I tried that (see attachment), didn't help.

dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid address (0x7a8e0685a7ce) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid address (0x0) in action #2
: No such file or directory
dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid address (0x7a8e0685a7ce) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid address (0x0) in action #2
: No such file or directory

The machine has 128 GB RAM and ~450 GB swap. I haven't tried limiting
the RAM from BIOS yet.

 Thomas

#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

syscall::unlink:entry
/pid == 28651/
{
	self->file = copyinstr(arg0);
}

syscall::unlink:return
/pid == 28651/
{
	printf("%d %s\n", pid, self->file);
	self->file = 0;
}
Re: dtracing unlink
On Sat, Oct 21, 2023 at 10:30:54AM +0000, RVP wrote:
> On Sat, 21 Oct 2023, Thomas Klausner wrote:
>
> > With that I see:
> >
> > # ./dtrace.unlink2
> > dtrace: buffer size lowered to 1m
> > dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry):
> > invalid address (0xc48240) in action #1 at DIF offset 12
> > : No such file or directory
> > dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return):
> > invalid address (0x0) in action #2
> > : No such file or directory
> >
>
> Odd. Are you running a KASLR kernel?

No, a standard GENERIC kernel (first try with one from daily releng
builds, second built just now from today's sources).

 Thomas
Re: dtracing unlink
On Sat, Oct 21, 2023 at 06:10:17AM +0000, RVP wrote:
> On Fri, 20 Oct 2023, bch wrote:
>
> > What OS release/architecture are you using that is getting favorable
> > results?
> >
>
> $ uname -a
> NetBSD x202e.localdomain 10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 19
> 23:43:40 UTC 2023
> mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>
> > I’m following along on ~up-to-the-minute -current AMD64 on my Thinkpad, and
> > only seeing the same memory errors as wiz’ original example.
> >
>
> Do the copyinstr in ::entry like this:
>
> ```
> #!/usr/sbin/dtrace -s
>
> #pragma D option destructive
> #pragma D option quiet
>
> syscall::unlink:entry
> {
> 	self->file = copyinstr(arg0);
> }
>
> syscall::unlink:return
> {
> 	printf("%d %s\n", pid, self->file);
> 	self->file = 0;
> }
> ```

With that I see:

# ./dtrace.unlink2
dtrace: buffer size lowered to 1m
dtrace: error on enabled probe ID 1 (ID 404: syscall::unlink:entry): invalid address (0xc48240) in action #1 at DIF offset 12
: No such file or directory
dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid address (0x0) in action #2
: No such file or directory

 Thomas
cpuctl ucode: no patch available
Hi!

I read about a new microcode update for the AMD Zen family, downloaded
the linux firmware repository and tried to apply it.

I put the new file for my CPU in /libdata/firmware/x86/amd/ (per man
page) as microcode_amd_fam19h.bin (as the filename is in the
repository).

# ls -l /libdata/firmware/x86/amd/microcode_amd_fam19h.bin
-rw-rw-r--  1 root  wheel  39172 Oct 21 10:34 /libdata/firmware/x86/amd/microcode_amd_fam19h.bin
# cpuctl identify 0 | grep -i -e family -e ucode
cpu0: AMD Family 19h (686-class), 4491.57 MHz
cpu0: family 0x19 model 0x61 stepping 0x2 (id 0xa60f12)
cpu0: UCode version: 0xa601203

Then I try to apply it:

# cpuctl ucode
cpuctl: please also check dmesg(8) output for additional error information
cpuctl: IOC_CPU_UCODE_APPLY: No such file or directory
# dmesg | tail -1
autoconfiguration error: ucode: No patch available for this cpu

So this looks like it didn't find a patch file. When I run it under
ktrace I see:

  3719   3719 cpuctl   NAMI  "/libdata/firmware/x86/amd/microcode_amd_fam19h.bin"
  3719   3719 cpuctl   RET   ioctl -1 errno 2 No such file or directory

so it looks in the right path.

Why does it claim there is no patch available?

 Thomas
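As a sanity check on the cpuctl identify output above, the id value can be decoded by hand. The following is a minimal sketch in POSIX shell, assuming the standard x86 CPUID leaf 1 EAX layout; the function name decode_cpuid is made up for illustration. It only confirms that id 0xa60f12 corresponds to family 0x19, model 0x61, stepping 0x2. It does not by itself explain why no patch is found.

```sh
#!/bin/sh
# Hypothetical helper: split a CPUID signature (leaf 1, EAX) into
# family/model/stepping, assuming the standard x86 bit layout.
decode_cpuid() {
	id=$(( $1 ))
	stepping=$(( id & 0xf ))
	base_model=$(( (id >> 4) & 0xf ))
	base_family=$(( (id >> 8) & 0xf ))
	ext_model=$(( (id >> 16) & 0xf ))
	ext_family=$(( (id >> 20) & 0xff ))
	# The extended fields are added in when the base family is 0xf,
	# which is the case for AMD Zen parts. (Intel also uses the
	# extended model for family 6; that case is ignored here.)
	if [ "$base_family" -eq 15 ]; then
		family=$(( base_family + ext_family ))
		model=$(( (ext_model << 4) | base_model ))
	else
		family=$base_family
		model=$base_model
	fi
	printf 'family 0x%x model 0x%x stepping 0x%x\n' \
	    "$family" "$model" "$stepping"
}

# The id reported by "cpuctl identify 0" in the message above.
decode_cpuid 0xa60f12
```

If the decoded values did not match what cpuctl prints, that would point at a misread id rather than at the firmware file.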
Re: dtracing unlink
On Fri, Oct 20, 2023 at 11:20:00PM +0200, Roland Illig wrote:
> Am 20.10.2023 um 22:38 schrieb Thomas Klausner:
> > Hi!
> >
> > I'm trying to find out what a program does, and found it does a lot of
> > unlink syscalls, so I wanted to see what it unlinks.
>
> Did you try 'ktruss | grep NAMI' before diving deep into dtrace?

The interesting work load is _very_ long running, so I'm not sure I
want to ktrace all of it - so no, I didn't do that yet.

 Thomas
dtracing unlink
Hi!

I'm trying to find out what a program does, and found it does a lot of
unlink syscalls, so I wanted to see what it unlinks.

I tried

# dtrace -n syscall::unlink:entry'/pid == 27647/{ self->file = arg0; }' -n syscall::unlink:return'{ trace(copyinstr(self->file)); self->file = 0; }'

but this just gives me lots of

dtrace: error on enabled probe ID 2 (ID 405: syscall::unlink:return): invalid address (0x79c4586577ce) in action #1 at DIF offset 12
: No such file or directory

(yes, including that weird newline in the middle).

What's the proper way to do this?

Thanks,
 Thomas
file-backed cgd backup question
Hi! For a cgd in a file that I mount via vnd+cgd, the file system contents inside may change, but the actual file on the hard disk outside only has 'access' time changes. So "smart" backup programs that check timestamps to find out if they need to re-hash files don't notice it was changed. How do you handle this? Manually touch it? Cheers, Thomas
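The "manually touch it" idea could look roughly like this. It is a sketch only: the function name and the image path are hypothetical, and whether touching the file is the best answer is exactly the open question here. It bumps the modification time of the image once the file system is unmounted and the cgd and vnd are unconfigured, so that timestamp-based backup tools re-hash the file.

```sh
#!/bin/sh
# Hypothetical helper: mark a cgd image file as modified so that
# timestamp-based backup tools notice it. Meant to be called after the
# file system is unmounted and the cgd/vnd devices are unconfigured.
mark_image_modified() {
	image=$1
	if [ ! -f "$image" ]; then
		echo "no such image: $image" >&2
		return 1
	fi
	# -m: update only the modification time, leave the access time alone.
	touch -m -- "$image"
}

# Example (the path is made up):
# mark_image_modified /home/user/secret.img
```

The drawback is that the file then looks changed after every mount, even if nothing inside the cgd was written.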
Re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed
On Tue, Oct 17, 2023 at 10:07:14AM +1100, Matthew Green wrote:
> > panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file
> > "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0
>
> this is from:
>
>         KASSERTMSG(offset < map->dm_mapsize,
>             "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
>             offset, map->dm_mapsize);
>
> the mapsize being zero indicates that there's nothing mapped
> currently in this dma map, so there's nothing to sync. ie,
> the caller seems to be trying to sync something not mapped.
>
> can you post the full back trace?

Sure:

(gdb) bt
#0  0x80239c75 in cpu_reboot ()
#1  0x80ddb28d in kern_reboot ()
#2  0x80e21798 in vpanic ()
#3  0x80fe6e5f in kern_assert ()
#4  0x8058be67 in bus_dmamap_sync ()
#5  0x8044edc7 in rge_rxeof ()
#6  0x804536fd in rge_intr ()
#7  0x80592c15 in intr_biglock_wrapper ()
#8  0x80214405 in Xhandle_ioapic_edge18 ()
#9  0x8023547d in x86_mwait ()
#10 0x805819d0 in acpicpu_cstate_idle ()
#11 0x80dbe5d6 in idle_loop ()
#12 0x80210327 in lwp_trampoline ()
#13 0x in ?? ()

 Thomas
panic: kernel diagnostic assertion "offset < map->dm_maps" failed
Hi!

I just tried checking out pkgsrc on an nvme when the machine panicked:

panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

That's a GENERIC 10.99.10/amd64 from releng, Oct 11.

Has anyone seen this one before?

I have a crash dump but no debug kernel, since I didn't build it
myself.

dmesg attached, there is one warning from ACPI:

acpi0: autoconfiguration error: invalid PCI address for D005

no idea if that could be related.

 Thomas

dmesg.redacted.txt.gz
Description: Binary data
Re: panic: assertion "!cpu_softintr_p()" failed
On Mon, Oct 02, 2023 at 09:23:59AM +1100, Matthew Green wrote:
> Thomas Klausner writes:
> > panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file
> > "/usr/src/sys/kern/subr_kmem.c", line 451
> >
> > gdb says:
> >
> > #10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel
> > %sassertion \"%s\" failed: file \"%s\", line %d ",
> > ap=ap@entry=0xae2110a93e08)
> > at /usr/src/sys/kern/subr_prf.c:286
> > #11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880
> > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
> > at /usr/src/sys/lib/libkern/kern_assert.c:51
> > #12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at
> > /usr/src/sys/kern/subr_kmem.c:451
> > #13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at
> > /usr/src/sys/kern/kern_rwlock_obj.c:127
> > #14 0x80d825d3 in uvm_anon_release (anon=) at
> > /usr/src/sys/uvm/uvm_anon.c:385
>
> i think this is a new bug. this line changed from:
>
> 1.11 (ad 12-Sep-23): pool_cache_put(rw_obj_cache, ro);
>
> to
>
> 1.12 (ad 23-Sep-23): kmem_free(ro, sizeof(*ro));
>
> i guess it just should be kmem_free_intr(), as pool_cache
> is intr-safe as well.

Thanks, I'll try a kernel with the attached diff.

 Thomas

Index: kern_rwlock_obj.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_rwlock_obj.c,v
retrieving revision 1.12
diff -u -r1.12 kern_rwlock_obj.c
--- kern_rwlock_obj.c	23 Sep 2023 18:21:11 -0000	1.12
+++ kern_rwlock_obj.c	2 Oct 2023 07:51:31 -0000
@@ -124,7 +124,7 @@
 	}
 	membar_acquire();
 	rw_destroy(&ro->ro_lock);
-	kmem_free(ro, sizeof(*ro));
+	kmem_intr_free(ro, sizeof(*ro));
 
 	return true;
 }
Re: cgd questions
Follow-up question because it just happened to me:

I have a USB disk with ffs-on-cgd. I unmounted the ffs but forgot to
unconfigure the cgd before unplugging the disk.

Can this cause problems? What kinds?

 Thomas
Re: cgd questions
On Sun, Oct 01, 2023 at 09:31:03AM -0400, Greg Troxel wrote:
> Thomas Klausner writes:
>
> > When I pick up a cgd disk and want to use it on a NetBSD system to
> > which it was not connected before, what do I need?
> >
> > - the passphrase
> > - the /etc/cgd/foo file?
> >
> > If you need the /etc/cgd/foo file too, how do people handle those for
> > cgds used as backup disks?
>
> Yes, you need the /etc/cgd/foo file because the passphrase is salted,
> and you might need an iv depending on iv method. IMHO this is a design
> bug in cgd. At least as a normal path, one should be able to access
> with just the passphrase.
>
> My setup is
>
>   (this is for a 512-sector disk)
>   GPT partition on disk
>     index 2: 16384 sectors starting at 64, ffs
>     index 1: rest of disk, cgd
>
>   in index 2, newfs and then rsync all my cgd init files.
>   in index 1, cgconfig
>
> Thus, any backup disk has the params for all of them.

That is a great idea. I should have thought of that before creating
partitions on my backup disks :|

> > The other question is that the cgd man page says that some ciphers are
> > obsolete. How can I switch from an obsolete cipher to a new one - is
> > the only method to make a new cgd with the new cipher and copy the
> > data manually?
>
> I believe that's the only way. I can't even figure out how to change
> the passphrase without doing that.

IIUC the cgdconfig man page correctly, this is how you do that:

     To create a new parameters file that will generate the same key as an old
     parameters file:

           # cgdconfig -G -o newparamsfile oldparamsfile
           old file's passphrase:
           new file's passphrase:

 Thomas
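Greg's layout, a small unencrypted ffs partition next to the cgd partition that carries the cgd parameter files, could be populated along these lines. This is a sketch under assumptions: the function name and the mount point are made up, the small partition is assumed to be mounted already, and plain cp stands in for the rsync he mentions.

```sh
#!/bin/sh
# Hypothetical helper: copy all cgd parameter files from a source
# directory (normally /etc/cgd) into a directory on the small ffs
# partition of a backup disk, assumed to be mounted already.
save_cgd_params() {
	src=$1
	dst=$2
	[ -d "$src" ] && [ -d "$dst" ] || return 1
	for f in "$src"/*; do
		# Skip subdirectories and the unexpanded glob of an empty dir.
		[ -f "$f" ] || continue
		# -p keeps mode and timestamps of the params files.
		cp -p "$f" "$dst"/ || return 1
	done
}

# Example (the mount point is made up):
# save_cgd_params /etc/cgd /mnt/cgdparams
```

Since the params files are useless without the passphrase, keeping them on an unencrypted partition of the same disk matches the setup described in the quoted message.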
cgd questions
Hi! I tried finding this in the man page, but it wasn't fully clear to me. When I pick up a cgd disk and want to use it on a NetBSD system to which it was not connected before, what do I need? - the passphrase - the /etc/cgd/foo file? If you need the /etc/cgd/foo file too, how do people handle those for cgds used as backup disks? The other question is that the cgd man page says that some ciphers are obsolete. How can I switch from an obsolete cipher to a new one - is the only method to make a new cgd with the new cipher and copy the data manually? Thanks, Thomas
panic: assertion "!cpu_softintr_p()" failed
Hi!

I've updated to 10.99.9 last night and started a bulk build, which
didn't get very far.

panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file "/usr/src/sys/kern/subr_kmem.c", line 451

gdb says:

#10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xae2110a93e08)
    at /usr/src/sys/kern/subr_prf.c:286
#11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
    at /usr/src/sys/lib/libkern/kern_assert.c:51
#12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at /usr/src/sys/kern/subr_kmem.c:451
#13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at /usr/src/sys/kern/kern_rwlock_obj.c:127
#14 0x80d825d3 in uvm_anon_release (anon=) at /usr/src/sys/uvm/uvm_anon.c:385
#15 0x80d9e525 in uvm_aio_aiodone_pages (pgs=pgs@entry=0xae2110a93f30, npages=npages@entry=16, write=write@entry=true, error=error@entry=0)
    at /usr/src/sys/uvm/uvm_pager.c:466
#16 0x80d9e954 in uvm_aio_aiodone (bp=0x9b158a500ed8) at /usr/src/sys/uvm/uvm_pager.c:526
#17 0x80ece109 in dkiodone (bp=) at /usr/src/sys/dev/dkwedge/dk.c:1658
#18 0x80e878a3 in biointr (cookie=) at /usr/src/sys/kern/vfs_bio.c:1737
#19 0x80dfd7bf in softint_execute (s=3, l=0x9b16c8abd8c0) at /usr/src/sys/kern/kern_softint.c:597
#20 softint_dispatch (pinned=, s=3) at /usr/src/sys/kern/kern_softint.c:842
#21 0x8023480c in Xsoftintr ()

 Thomas
make[1]: Cannot open `.' (Permission denied)
Hi! I was doing a limited bulk build on today's current/amd64, and only libreoffice was left, when it failed like this: [build HPX] zh-TW/helpcontent2/source/text/swriter/guide [build HPX] zh-TW/helpcontent2/source/text/swriter/librelogo [build HPX] zh-TW/helpcontent2/source/text/swriter/menu [build HIX] scalc/en-US [build HEJ] scalc/en-US [build HIX] schart/en-US [build HEJ] schart/en-US Error reading directory file:///scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpTarget/scalc/en-US/content gmake[1]: *** [/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/solenv/gbuild/HelpTarget.mk:460: /scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpIndexTarget/scalc/en-US.done] Error 1 gmake[1]: *** Waiting for unfinished jobs Error reading directory file:///scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpTarget/schart/en-US/content gmake[1]: *** [/scratch/misc/libreoffice/work/libreoffice-7.5.5.2/solenv/gbuild/HelpTarget.mk:460: /scratch/misc/libreoffice/work/libreoffice-7.5.5.2/workdir/HelpIndexTarget/schart/en-US.done] Error 1 gmake: *** [Makefile:289: build] Error 2 *** Error code 2 Stop. make[1]: stopped in /usr/pkgsrc/misc/libreoffice make[1]: Cannot open `.' (Permission denied) *** Error code 2 Stop. make: stopped in /usr/pkgsrc/misc/libreoffice I'm used to gmake rarely failing randomly, but this usually looks different. There is nothing in /var/log/messages, and the permissions of /usr/pkgsrc/misc/libreoffice are fine for the pbulk user (otherwise the build wouldn't have started anyway). I had such a make(1) problem before: http://gnats.netbsd.org/42484 but there were fixes committed for that in 2013. Perhaps they were not sufficient. 
The bulk build clients are in sandboxes that look like this: tmpfs on /archive/sandboxes/client3 type tmpfs (local) ptyfs on /archive/sandboxes/client3/dev/ptyfs type ptyfs (local) procfs on /archive/sandboxes/client3/proc type procfs (local) /bin on /archive/sandboxes/client3/bin type null (read-only, local) /sbin on /archive/sandboxes/client3/sbin type null (read-only, local) /lib on /archive/sandboxes/client3/lib type null (read-only, local) /libexec on /archive/sandboxes/client3/libexec type null (read-only, local) /usr/bin on /archive/sandboxes/client3/usr/bin type null (read-only, local) /usr/games on /archive/sandboxes/client3/usr/games type null (read-only, local) /usr/include on /archive/sandboxes/client3/usr/include type null (read-only, local) /usr/lib on /archive/sandboxes/client3/usr/lib type null (read-only, local) /usr/libdata on /archive/sandboxes/client3/usr/libdata type null (read-only, local) /usr/libexec on /archive/sandboxes/client3/usr/libexec type null (read-only, local) /usr/share on /archive/sandboxes/client3/usr/share type null (read-only, local) /usr/sbin on /archive/sandboxes/client3/usr/sbin type null (read-only, local) /usr/X11R7 on /archive/sandboxes/client3/usr/X11R7 type null (read-only, local) /var/mail on /archive/sandboxes/client3/var/mail type null (read-only, local) /usr/src on /archive/sandboxes/client3/usr/src type null (read-only, local) /usr/pkgsrc on /archive/sandboxes/client3/usr/pkgsrc type null (local) /disk/scratch_ssd/client3 on /archive/sandboxes/client3/scratch type null (local) /usr/xsrc on /archive/sandboxes/client3/usr/xsrc type null (read-only, local) /packages on /archive/sandboxes/client3/packages type null (local) /distfiles on /archive/sandboxes/client3/distfiles type null (local) Worth a PR, or re-opening the old one? Cheers, Thomas
Re: 10.99.7 panic: defibrillate
On Mon, Aug 14, 2023 at 12:41:06PM +0200, Thomas Klausner wrote:
> I had followed your suggestion and bumped the heartbeat limit from 15
> to 300, but today it panicked again.
>
> panic: cpu8: found cpu9 heart stopped beating and unresponsive
>
> I have a core dump in case you want any particular details.
>
> I've now set it to 0.

and had a hard hang less than half a day later.

This hasn't been happening in 10.99.5 (at least not with that
frequency), which had uptimes of weeks, so either the heartbeat code
introduced additional problems (even if disabled this way) or
something else got worse, or I am really really unlucky right now.

 Thomas
Re: 10.99.7 panic: defibrillate
I had followed your suggestion and bumped the heartbeat limit from 15
to 300, but today it panicked again.

panic: cpu8: found cpu9 heart stopped beating and unresponsive

I have a core dump in case you want any particular details.

I've now set it to 0.

 Thomas
Re: 10.99.7 panic: defibrillate
So it happened again, no bulk build this time, just qt5-qtwebengine in
a sandbox.

panic: cpu0: softints stuck for 16 seconds

I've got a kernel coredump this time, let me know what information
would be useful.

Btw, gdb 13.2 (built on Aug 11) doesn't work with kernel core dumps:

(gdb) target kvm netbsd.40.core
Undefined target command: "kvm netbsd.40.core". Try "help target".

Using an older gdb I get:

(gdb) target kvm netbsd.40.core
0x80239c95 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
717             dumpsys();
(gdb) bt
#0  0x80239c95 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1  0x80deeb3d in kern_reboot (howto=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2  0x80b72c04 in db_reboot_cmd (addr=, have_addr=, count=, modif=) at /usr/src/sys/ddb/db_command.c:1589
#3  0x80b732da in db_command (last_cmdp=last_cmdp@entry=0x8187a360 ) at /usr/src/sys/ddb/db_command.c:964
#4  0x80b737ac in db_command_loop () at /usr/src/sys/ddb/db_command.c:623
#5  0x80b77a98 in db_trap (type=type@entry=1, code=code@entry=0) at /usr/src/sys/ddb/db_trap.c:91
#6  0x80236b14 in kdb_trap (type=type@entry=1, code=code@entry=0, regs=regs@entry=0xd3a110933ad0) at /usr/src/sys/arch/amd64/amd64/db_interface.c:251
#7  0x8023c2a4 in trap (frame=0xd3a110933ad0) at /usr/src/sys/arch/amd64/amd64/trap.c:315
#8  0x80234ad4 in alltraps ()
#9  0x0003 in ?? ()
#10 0x0001 in ?? ()
#11 0x0001 in ?? ()
#12 0x in ?? ()

 Thomas
Re: 10.99.7 panic: defibrillate
On Sat, Aug 12, 2023 at 04:03:59PM +0000, Taylor R Campbell wrote:
> This panic means that one CPU has detected that another CPU has failed
> to run either the hardclock interrupt handler or the SOFTINT_CLOCK
> softints in over 15 seconds, and triggered an interprocessor interrupt
> in an attempt to panic rather than stay stuck where it appears to be
> stuck -- here, pmap_tlb_shootnow.
>
> Normally the hardclock interrupt handler runs every 10ms (or 1/hz sec;
> default hz=100), and softints run reasonably promptly, so failing to
> do this for 15 sec is extremely unusual and likely indicates a CPU is
> wedged and unable to make progress. For example, something may be
> stuck in an infinite loop with a spin lock held or spl raised, which
> blocks interrupts.
>
> (The HEARTBEAT option, this system where CPUs check one another for
> progress, is new as of last month. The problems it uncovers would
> likely have manifested as silent unresponsive hang before.)
>
> 1. Did you notice anything sluggish before the crash?

I was active on the machine in a remote terminal and I didn't notice
anything in particular. The machine has 32 (virtual) cores so I
probably wouldn't notice one stuck - except that I would notice if
pbulk never finished because one was stuck.

I know that this machine is sometimes sluggish when all three parallel
pbulk clients want to interact with the disk (e.g. libreoffice and
rust unpacking at the same time, or something similar).

> 2. Can you start another bulk build and run the following dtrace
>    script for a while and share the final output?
>
> dtrace -x cleanrate=50hz -n '
>   fbt::pmap_tlb_shootnow:entry,
>   fbt::uvm_pagermapout:entry {
>     self->starttime[probefunc] = timestamp
>   }
>   fbt::pmap_tlb_shootnow:return,
>   fbt::uvm_pagermapout:return /self->starttime[probefunc]/ {
>     @[probefunc] = quantize(timestamp -
>         self->starttime[probefunc]);
>     self->starttime[probefunc] = 0
>   }
>   tick-60s {
>     printa(@)
>   }
> '
>
> You may need to modload dtrace_fbt and dtrace_profile first. The
> tick-60s probe will print the current state of data collection once a
> minute, showing a histogram of the time spent in the functions
> pmap_tlb_shootnow and uvm_pagermapout.
>
> If it says something like
>
> dtrace: 429 dynamic variable drops with non-empty dirty list
>
> then just hit ^C and save the last output.

I left it running for a bit and this is what it said:

# ./dtrace-script-20230812.sh
dtrace: description ' fbt::pmap_tlb_shootnow:entry, fbt::uvm_pagermapout:entry ' matched 5 probes
dtrace: aggregation size lowered to 2m
dtrace: 312026 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 22746 dynamic variable drops with non-empty rinsing list : Operation timed out
dtrace: 255016 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 341982 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 273456 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 11589 dynamic variable drops with non-empty rinsing list : Operation timed out
dtrace: 303313 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 312509 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 345693 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 16682 dynamic variable drops with non-empty rinsing list : Operation timed out
dtrace: 333801 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 327437 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 412853 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 539988 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 471254 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 501274 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 475914 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 501722 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 400591 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 370924 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 395296 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 276777 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 255151 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 300495 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 263274 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace: 390073 dynamic variable drops with non-empty dirty list : Operation timed out
dtrace:
10.99.7 panic: defibrillate
Hi!

I just got a new panic in 10.99.7 after running a pbulk for less than
a day (after updating from 10.99.5, which was stable for weeks).

OCR'd from screenshot and manually corrected:

[ 24737.0090714] hardclock() at netbsd:hardclock+0x8b
[ 24737.0090714] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[ 24737.0090714] --- interrupt ---
[ 24737.0090714] pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x1f7
[ 24737.0090714] pmap_update() at netbsd:pmap_update+0x17
[ 24737.0090714] uvm_pagermapout() at netbsd:uvm_pagermapout+0x29
[ 24737.0090714] genfs_getpages() at netbsd:genfs_getpages+0x1755
[ 24737.0090714] VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
[ 24737.0190721] ufs_balloc_range() at netbsd:ufs_balloc_range+0x11a
[ 24737.0190721] ffs_write() at netbsd:ffs_write+0x34c
[ 24737.0190721] layer_bypass() at netbsd:layer_bypass+0x102
[ 24737.0190721] VOP_WRITE() at netbsd:VOP_WRITE+0x103
[ 24737.0190721] vn_write() at netbsd:vn_write+0xe0
[ 24737.0190721] dofilewrite() at netbsd:dofilewrite+0x80
[ 24737.0190721] sys_write() at netbsd:sys_write+0x49
[ 24737.0190721] syscall() at netbsd:syscall+0x196
[ 24737.0190721] --- syscall (number 4) ---
[ 24737.0190721] netbsd:syscall+0x196:
[ 24737.0190721] cpu14: End traceback...
[ 24737.0190721] fatal breakpoint trap in supervisor mode
[ 24737.0190721] trap type 1 code 0 rip 0x80235425 cs 0x8 rflags 0x202 cr2 0x71f902c561cf level 0x7 rsp 0xaaa17d0d9628
[ 24737.0190721] curlwp @xe@cbc87ed700 pid 9818.9818 lowest kstack 0xaaa17ded52c0
Stopped in pid 9818.9818 (as) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
panic() at netbsd:panic+0x3c
defibrillate() at netbsd:defibrillate+0xe3
hardclock() at netbsd:hardclock+0x8b
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
--- interrupt ---
pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x1f7
pmap_update() at netbsd:pmap_update+0x17
uvm_pagermapout() at netbsd:uvm_pagermapout+0x29
genfs_getpages() at netbsd:genfs_getpages+0x1755
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
ufs_balloc_range() at netbsd:ufs_balloc_range+0x11a
ffs_write() at netbsd:ffs_write+0x34c
layer_bypass() at netbsd:layer_bypass+0x102
VOP_WRITE() at netbsd:VOP_WRITE+0x103
vn_write() at netbsd:vn_write+0xe0
dofilewrite() at netbsd:dofilewrite+0x80
sys_write() at netbsd:sys_write+0x49
syscall() at netbsd:syscall+0x196

Sorry, no crash dump available.

Any ideas what this one is about?

 Thomas
Re: kernel size change
On Thu, Jul 13, 2023 at 08:16:34AM +, RVP wrote: > That's one of them. The DATA segment is now at 0x1a0 instead of at > 0x180 (2MB difference). The CODE segment must've increased in size for > this. Check the previous `Section Headers:' display to see see how the sizes > have changes, starting with the `.text' section. Here's the complete diff for readelf -We: --- old 2023-07-13 12:58:51.079219761 +0200 +++ new 2023-07-13 12:58:37.280681398 +0200 @@ -10,7 +10,7 @@ Version: 0x1 Entry point address: 0x8020e000 Start of program headers: 64 (bytes into file) - Start of section headers: 29650104 (bytes into file) + Start of section headers: 31749240 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) @@ -22,39 +22,39 @@ Section Headers: [Nr] Name TypeAddress OffSize ES Flg Lk Inf Al [ 0] NULL 00 00 00 0 0 0 - [ 1] .text PROGBITS8020 20 e0 00 AX 0 0 4096 - [ 2] .rodata.hotpatch PROGBITS8100 100 00229c 00 A 0 0 1 - [ 3] .rodata PROGBITS810022c0 10022c0 4fab40 00 A 0 0 64 - [ 4] .eh_frame PROGBITS814fce00 14fce00 1bf5b8 00 A 0 0 8 - [ 5] link_set_x86_hotpatch_descriptors PROGBITS816bc3b8 16bc3b8 68 00 A 0 0 8 - [ 6] link_set_modules PROGBITS816bc420 16bc420 000908 00 A 0 0 8 - [ 7] link_set_sdt_argtypes_set PROGBITS816bcd28 16bcd28 0020d8 00 A 0 0 8 - [ 8] link_set_sdt_probes_set PROGBITS816bee00 16bee00 000bb0 00 A 0 0 8 - [ 9] link_set_sdt_providers_set PROGBITS816bf9b0 16bf9b0 30 00 A 0 0 8 - [10] link_set_sysctl_funcs PROGBITS816bf9e0 16bf9e0 0002f0 00 A 0 0 8 - [11] link_set_acpi_device_calls PROGBITS816bfcd0 16bfcd0 10 00 A 0 0 8 - [12] link_set_evcnts PROGBITS816bfce0 16bfce0 000138 00 A 0 0 8 - [13] link_set_linux_module_param_desc PROGBITS816bfe18 16bfe18 0002b8 00 A 0 0 8 - [14] link_set_linux_module_param_info PROGBITS816c00d0 16c00d0 0002c0 00 A 0 0 8 - [15] link_set_domains PROGBITS816c0390 16c0390 58 00 A 0 0 8 - [16] link_set_ieee80211_funcs PROGBITS816c03e8 16c03e8 20 00 A 0 0 8 - [17] link_set_ah_chips 
PROGBITS816c0408 16c0408 38 00 A 0 0 8 - [18] link_set_ah_rfs PROGBITS816c0440 16c0440 38 00 A 0 0 8 - [19] link_set_dkwedge_methods PROGBITS816c0478 16c0478 18 00 A 0 0 8 - [20] link_set_prop_linkpools PROGBITS816c0490 16c0490 40 00 A 0 0 8 - [21] .data PROGBITS8180 180 0c3bf8 00 WA 0 0 64 - [22] .data.cacheline_aligned PROGBITS818c3c00 18c3c00 00e178 00 WA 0 0 64 - [23] .data.read_mostly PROGBITS818d1d80 18d1d80 001468 00 WA 0 0 32 - [24] .bss NOBITS 818d4000 18d31e8 12c000 00 WA 0 0 4096 - [25] .note.netbsd.ident NOTE81a0 18d31e8 18 00 0 0 4 - [26] .note.Xen NOTE 18d3200 000198 00 0 0 4 - [27] .identPROGBITS 18d3398 031eef 01 MS 0 0 1 - [28] .comment PROGBITS 1905287 22 01 MS 0 0 1 - [29] .SUNW_ctf PROGBITS 19052ac 0e9048 00 0 0 4 - [30] .gnu_debuglinkPROGBITS 19ee2f4 18 00 0 0 4 - [31] .symtab SYMTAB 19ee310 15a800 18 32 31599 8 - [32] .strtab STRTAB 1b48b10 0fdf6e 00 0 0 1 - [33] .shstrtab STRTAB 1c46a7e 000238 00 0 0 1 + [ 1] .text PROGBITS8020 20 100 00 AX 0 0 4096 + [ 2] .rodata.hotpatch PROGBITS8120 120 00229c 00 A 0 0 1 + [ 3] .rodata PROGBITS812022c0 12022c0 4fbc80 00 A 0 0 64 + [ 4] .eh_frame PROGBITS816fdf40 16fdf40 1bf940 00 A 0 0 8 + [ 5] link_set_x86_hotpatch_descriptors PROGBITS818bd880 18bd880 68 00 A 0 0 8 + [ 6] link_set_modules PROGBITS818bd8e8 18bd8e8 000908 00 A 0 0 8 + [ 7] link_set_sdt_argtypes_set PROGBITS818be1f0 18be1f0 0020d8 00 A 0 0 8 + [ 8] link_set_sdt_probes_set PROGBITS
Re: kernel size change
On Wed, Jul 12, 2023 at 03:01:54PM +, RVP wrote:
> On Wed, 12 Jul 2023, Jonathan A. Kollasch wrote:
>
> > The amd64 maximum page size (or something like that) is 2MiB and I
> > suspect a section of your kernel just crossed that boundary. Anyway,
> > check things like size(1) and nm(1) --print-size (maybe with --size-sort)
> > on both kernels.
> >
>
> Yeah, that's more likely. A diff of `readelf -We' on the kernels would
> confirm this. See if the offset has changed a lot.

Many changes, but this one's the one you mean, I think:

66,67c66,67
< LOAD 0x20 0x8020 0x0020 0x14c04d0 0x14c04d0 R E 0x20
< LOAD 0x180 0x8180 0x0180 0x0d31e8 0x20 RW 0x20
---
> LOAD 0x20 0x8020 0x0020 0x16c1998 0x16c1998 R E 0x20
> LOAD 0x1a0 0x81a0 0x01a0 0x0d3228 0x20 RW 0x20

Thomas
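As a rough cross-check of the 2MiB theory, the sizes quoted in this thread can be split into whole 2MiB units plus a remainder. This is only a sketch: the four numbers are copied from the `readelf -We` LOAD lines and the `ls -l` output posted in the thread, while the helper name `split_growth` is made up for illustration.

```python
# Sizes copied from the thread: FileSiz of the first LOAD segment
# (readelf -We) and the on-disk kernel sizes (ls -l).
MIB = 1024 * 1024
LARGE_PAGE = 2 * MIB  # the 2MiB amd64 page size Jonathan mentions

old_text_filesz = 0x14C04D0   # /netbsd.10.99.4
new_text_filesz = 0x16C1998   # /netbsd.10.99.5
old_file_size = 29652280      # /netbsd.10.99.4
new_file_size = 31751416      # /netbsd.10.99.5


def split_growth(old, new, unit=LARGE_PAGE):
    """Return (whole units, leftover bytes) of the growth from old to new.

    Hypothetical helper, not part of any NetBSD tool.
    """
    return divmod(new - old, unit)


text_units, text_rest = split_growth(old_text_filesz, new_text_filesz)
file_units, file_rest = split_growth(old_file_size, new_file_size)

print(f"text segment grew by {text_units} x 2MiB + {text_rest} bytes")
print(f"kernel file grew by {file_units} x 2MiB + {file_rest} bytes")
```

Both differences come out as one 2MiB unit plus a few kilobytes, which fits the idea that a small amount of new code pushed a section across a 2MiB boundary. It does not prove it; the section headers still have to be compared to see which section moved.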
Re: kernel size change
On Wed, Jul 12, 2023 at 09:19:50AM -0500, Jonathan A. Kollasch wrote: > On Wed, Jul 12, 2023 at 02:28:15PM +0200, Thomas Klausner wrote: > > Hi! > > > > For the last years, my nearly-GENERIC[1] kernel had size around 30MB. > > Yesterday's kernel is 32MB. > > > > Any ideas what changed, or how to find out? > > > > -rwxr-xr-x 2 root wheel 29652280 Jun 27 12:40 /netbsd.10.99.4 > > -rwxr-xr-x 2 root wheel 31751416 Jul 11 22:57 /netbsd.10.99.5 > > > > Thomas > > > > > > [1] amd64/GENERIC plus > > options FONT_GO_MONO12x23 > > no options FONT_BOLD16x32 > > no options FONT_BOLD8x16 > > options COMPAT_LINUX > > options COMPAT_LINUX32 > > The amd64 maximum page size (or something like that) is 2MiB and I > suspect a section of your kernel just crossed that boundary. Anyway, > check things like size(1) and nm(1) --print-size (maybe with --size-sort) > on both kernels. There are a lot of small changes, so I guess your page size idea is the real change. # size /netbsd.10.99.4 textdata bss dec hex filename 21759148 864728 1228800 2385267616bf684 /netbsd.10.99.4 # size /netbsd.10.99.5 textdata bss dec hex filename 23861620 864792 1228800 2595521218c0b8c /netbsd.10.99.5 diff old new (after removing address column): 355a356 > 0001 d besteffort.5 394a396 > 0001 d ready.6 3190a3193 > 0006 t memfd_ioctl 3408a3412 > 0007 r memfd_prefix 7525d7528 < 000a T dk_done 9332d9334 < 000f r __func__.2 9345a9348 > 000f r __func__.4 9822c9825 < 0010 b lasttime.6 --- > 0010 b lasttime.8 10488c10491 < 0010 r interval.5 --- > 0010 r interval.7 11515d11517 < 0012 r __func__.1 11565a11568 > 0012 r __func__.3 12316a12320 > 0014 r __func__.1 13375a13380,13381 > 0018 r CSWTCH.104 > 0018 r CSWTCH.110 13412d13417 < 0018 r CSWTCH.82 13415d13419 < 0018 r CSWTCH.88 14166a14171 > 0019 r __func__.2 14174d14178 < 0019 r __func__.3 14177a14182 > 0019 r __func__.5 16330a16336 > 0020 t crashme_kpreempt_spinout 16813d16818 < 0024 T i915_ggtt_enable_hw 17063a17069 > 0025 T i915_ggtt_enable_hw 18504c18510 < 002c r 
CSWTCH.62 --- > 002c r CSWTCH.64 22531a22538 > 003a t memfd_seek 22938a22946 > 003e T curcpu_stable 24149a24158 > 0044 t memfd_close 24914a24924 > 0049 t crashme_spl_spinout 26882d26891 < 0057 t entropy_softintr 28521a28531 > 005c t memfd_fcntl 29191a29202 > 0060 t entropy_softintr 29815d29825 < 0065 t entropy_pending_cpu 30843c30853 < 006d t tpm_poll --- > 006d t tpm_poll.constprop.0 30907a30918 > 006e T sys_ftruncate 32288d32298 < 0078 R audio_fileops 32308d32317 < 0078 R drm_fileops 32345d32353 < 0078 R pad_fileops 32360d32367 < 0078 R socketops 32384d32390 < 0078 R vnops 32476d32481 < 0078 r bpf_fileops 32483,32487d32487 < 0078 r cryptofops < 0078 r dmabuf_fileops < 0078 r drm_syncobj_file_ops < 0078 r drvctl_fileops < 0078 r dtv_demux_fileops 32490d32489 < 0078 r eventfd_fileops 32493,32494d32491 < 0078 r fops < 0078 r fops.2 32515,32516d32511 < 0078 r kqueueops < 0078 r ksyms_fileops 32520d32514 < 0078 r mqops 32569,32571d32562 < 0078 r pipeops < 0078 r putter_fileops < 0078 r semops 32578,32579d32568 < 0078 r sync_file_ops < 0078 r tap_fileops 32582d32570 < 0078 r timerfd_fileops 33346a5 > 0080 R audio_fileops 33347a7 > 0080 R drm_fileops 33355a33346 > 0080 R pad_fileops 33357a33349 > 0080 R socketops 33358a33351 > 0080 R vnops 33448a33442 > 0080 r bpf_fileops 33456a33451 > 0080 r cryptofops 33460a33456 > 0080 r dmabuf_fileops 33461a33458,33460 > 0080 r drm_syncobj_file_ops > 0080 r drvctl_fileops > 0080 r dtv_demux_fileops 33462a33462 > 0080 r eventfd_fileops 33463a33464,33465 > 0080 r fops > 0080 r fops.2 33515a33518,33519 > 0080 r kqueueops > 0080 r ksyms_fileops 33518a3
Re: kernel size change
On Wed, Jul 12, 2023 at 02:28:15PM +0200, Thomas Klausner wrote:
> For the last years, my nearly-GENERIC[1] kernel had size around 30MB.
> Yesterday's kernel is 32MB.
>
> Any ideas what changed, or how to find out?

Comparing "nm /netbsd | sed "s/^[^ ] //" | sort" of old and new kernel, I see the diff below.

So mostly heartbeat, memfd, and some CSWTCH.* that I don't understand. Are these really 2MB?
Thomas

214a215
> A _KERNEL_OPT_HEARTBEAT
7788a7790
> T addrulwp
10609a10612
> T curcpu_stable
10776a10780
> T db_syncobj_owner
17752a17757
> T linux_sys_memfd_create
25004a25010
> T sys_memfd_create
28043c28049
< b lasttime.6
---
> b lasttime.8
28872a28879
> d besteffort.5
29696a29704
> d ready.6
31127a31136
> r CSWTCH.104
31135a31145
> r CSWTCH.110
31137d31146
< r CSWTCH.1147
31140a31150
> r CSWTCH.1153
31146c31156
< r CSWTCH.1280
---
> r CSWTCH.1283
31151,31152c31161
< r CSWTCH.1336
< r CSWTCH.1337
---
> r CSWTCH.1339
31153a31163
> r CSWTCH.1340
31336d31345
< r CSWTCH.62
31341a31351
> r CSWTCH.64
31365d31374
< r CSWTCH.82
31381d31389
< r CSWTCH.88
32775a32784
> r __func__.4
32862a32872
> r __func__.5
37991c38001
< r interval.5
---
> r interval.7
38370a38381,38382
> r memfd_fileops
> r memfd_prefix
45017a45030
> t crashme_kpreempt_spinout
45021a45035
> t crashme_spl_spinout
45268a45283
> t db_show_all_tstiles
50636a50652,50660
> t memfd_close
> t memfd_fcntl
> t memfd_ioctl
> t memfd_mmap
> t memfd_read
> t memfd_seek
> t memfd_stat
> t memfd_truncate
> t memfd_write
56216c56240
< t tpm_poll
---
> t tpm_poll.constprop.0
56225c56249
< t tpm_waitfor.constprop.0
---
> t tpm_waitfor.constprop.0.isra.0
58046a58071
> t vn_truncate
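The same kind of comparison can be sketched without sed and diff by treating each kernel's nm output as a set of (type, name) pairs. This is a minimal illustration, not the command used above: the function names and the sample nm lines (including their addresses) are invented, and only the symbol names are taken from the diff.

```python
def symbols(nm_output):
    """Return the set of (type, name) pairs in nm output, ignoring addresses."""
    result = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # The last two fields are the symbol type and the symbol name;
            # an address (and a size, with --print-size) comes before them.
            result.add((fields[-2], fields[-1]))
    return result


def compare(old_output, new_output):
    """Return (removed, added) symbol lists between two nm outputs."""
    old, new = symbols(old_output), symbols(new_output)
    return sorted(old - new), sorted(new - old)


# Invented sample data; the addresses are placeholders.
old_nm = """\
ffffffff80e00000 t tpm_poll
ffffffff80e00100 T sys_write
"""
new_nm = """\
ffffffff80e00000 t tpm_poll.constprop.0
ffffffff80e00100 T sys_write
ffffffff80e00200 T sys_memfd_create
"""

removed, added = compare(old_nm, new_nm)
for kind, name in removed:
    print("<", kind, name)
for kind, name in added:
    print(">", kind, name)
```

A name-only comparison like this shows which symbols came and went, but not how large they are, so it cannot answer the "are these really 2MB?" question by itself; for that the sizes from `nm --print-size` would have to be kept and summed.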
kernel size change
Hi!

For the last years, my nearly-GENERIC[1] kernel had size around 30MB. Yesterday's kernel is 32MB.

Any ideas what changed, or how to find out?

-rwxr-xr-x 2 root wheel 29652280 Jun 27 12:40 /netbsd.10.99.4
-rwxr-xr-x 2 root wheel 31751416 Jul 11 22:57 /netbsd.10.99.5

Thomas

[1] amd64/GENERIC plus
options FONT_GO_MONO12x23
no options FONT_BOLD16x32
no options FONT_BOLD8x16
options COMPAT_LINUX
options COMPAT_LINUX32
scp/sftp -R broken?
Hi!

When I try to recursively copy a directory with "scp -r" or sftp's "put -Rp" between a -current system and a NetBSD 9 system, I see:

# scp -r a netbsd-9:
scp: realpath ./a: No such file
scp: upload "./a": path canonicalization failed
scp: failed to upload directory a to .

# ssh -V
OpenSSH_9.1 NetBSD_Secure_Shell-20221004-hpn13v14-lpk, OpenSSL 3.0.8 7 Feb 2023

netbsd-9# ssh -V
OpenSSH_8.0 NetBSD_Secure_Shell-20220604-hpn13v14-lpk, OpenSSL 1.1.1k 25 Mar 2021

scp of single files works.

The same command works if I copy it onto the same machine (and thus the same ssh on the other side), both current -> current and netbsd9 -> netbsd9.

Any ideas why this doesn't work, and what the error message is trying to tell me?
Thomas
Re: -current build failure
On Mon, Jun 05, 2023 at 03:01:37AM +1000, Luke Mewburn wrote:
> I managed to reproduce this just building the tools with -V MKLLVM=yes.
> I've reverted tools/Makefile.host revision 1.35 and it seems to fix the
> tools build for me.
>
> Does this resolve the issue for you?

Yes. Now I just have a setlists problem:

=== 10 extra files in DESTDIR =
Files in DESTDIR but missing from flist.
File is obsolete or flist is out of date ?
--
./usr/tests/libexec/ld.elf_so/libh_abuse_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_abuse_static_g.a
./usr/tests/libexec/ld.elf_so/libh_def_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_def_static_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyctor_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_onlydef_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyuse_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_onlyuse_static_g.a
./usr/tests/libexec/ld.elf_so/libh_use_dynamic_g.a
./usr/tests/libexec/ld.elf_so/libh_use_static_g.a
= end of 10 extra files ===

Thomas
-current build failure
Hi!

I just tried updating my -current but the build failed:

build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes -V NOGCCERROR=yes -T /usr/obj/tools.gcc -m amd64 -O /usr/obj/src.amd64 -D /usr/obj/amd64.gcc.20230604 -R /usr/obj/amd64.gcc.20230604.release distribution

--- support-modules ---
g++: error: unrecognized command-line option '-stdlib=libc++'
g++: error: unrecognized command-line option '-fmodules'; did you mean '-fmoduleinfo'?
g++: error: unrecognized command-line option '-fcxx-modules'
g++: error: unrecognized command-line option '-fmodules-cache-path=./module.cache'

Any ideas how to fix this?

Cheers,
Thomas
Re: LLONG_MAX not available from c++
On Fri, Mar 31, 2023 at 02:46:18PM +0200, Joerg Sonnenberger wrote:
> On Fri, Mar 31, 2023 at 02:39:40PM +0200, Thomas Klausner wrote:
> > On Fri, Mar 31, 2023 at 02:35:38PM +0200, Martin Husemann wrote:
> > > Which options does it pass to g++ ?
> >
> > Good point, but it's not the compiler, it's lua itself:
> >
> > tar xvzf lua-5.4.4.tar.gz
> > cd lua-5.4.4/src
> > c++ lbaselib.c
> >
> > and see it fail.
> >
> > In file included from lua.h:16,
> > from lbaselib.c:18:
> > luaconf.h:557:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
> > 557 | #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' \
> > | ^
>
> Make sure c++ is using at least -std=c++11?

Same error, also with c++17 and gnu++17. Probably lua does something weird.

> Also, to ensure stack
> unwinding for C, -fexceptions should be enough.

Thanks for the information. I don't expect I'd get MAME to change to that though.
Thomas
Re: LLONG_MAX not available from c++
On Fri, Mar 31, 2023 at 02:35:38PM +0200, Martin Husemann wrote:
> Which options does it pass to g++ ?

Good point, but it's not the compiler, it's lua itself:

tar xvzf lua-5.4.4.tar.gz
cd lua-5.4.4/src
c++ lbaselib.c

and see it fail.

In file included from lua.h:16,
from lbaselib.c:18:
luaconf.h:557:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
557 | #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' \
| ^

Thomas
LLONG_MAX not available from c++
Hi!

mame wants to compile lua with a c++ compiler.[1]

lua has a check in its headers to detect C99 mode by looking for LLONG_MAX. If that is not found (and no workaround like an explicit fallback to 32-bit ints is defined) then it fails to compile.

g++ in -current doesn't get this symbol when you include limits.h (which lua does, since this is still C code) because of (from /usr/include/machine/limits.h):

#if defined(_ISOC99_SOURCE) || (__STDC_VERSION__ - 0) >= 199901L || \
    defined(_NETBSD_SOURCE)
#define ULLONG_MAX 0xffffffffffffffffULL /* max unsigned long long */
#define LLONG_MAX 0x7fffffffffffffffLL /* max signed long long */
#define LLONG_MIN (-0x7fffffffffffffffLL-1) /* min signed long long */
#endif

What is the best solution here?

1. define LUA_32BITS to use 32-bit ints?
2. pass a magic define to the compiler so the header works better when used from c++? (which one would that be, _NETBSD_SOURCE?)
3. change the installed header in some way (but that won't help NetBSD 8/9/10)

Suggestions welcome!
Thomas

[1] https://www.mamedev.org/ says:
The technical reason for this change is that MAME requires C++ stack frames to be unwound correctly, including destructor calls, when Lua errors are raised from C++ code. Using Lua compiled as C will cause resource leaks.
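To make the three-way condition in that header guard easier to reason about, here is a small model of it. It is only a sketch of the quoted `#if` line: the function name and its parameters are invented, and it says nothing about which of these macros a real g++ invocation or other headers end up defining.

```python
def llong_max_visible(isoc99_source=False, stdc_version=None, netbsd_source=False):
    """Model of the quoted limits.h guard (hypothetical helper).

    isoc99_source -- whether _ISOC99_SOURCE is defined
    stdc_version  -- value of __STDC_VERSION__, or None if undefined
    netbsd_source -- whether _NETBSD_SOURCE is defined
    """
    # In an #if expression an undefined identifier evaluates to 0, so a
    # missing __STDC_VERSION__ (as in C++ mode) makes the middle test false.
    version = stdc_version if stdc_version is not None else 0
    return isoc99_source or version >= 199901 or netbsd_source


# C99 or later compiler: __STDC_VERSION__ alone is enough.
print(llong_max_visible(stdc_version=199901))
# C++ with none of the three conditions met: LLONG_MAX stays hidden.
print(llong_max_visible())
# Option 2 from the list above: define _NETBSD_SOURCE explicitly.
print(llong_max_visible(netbsd_source=True))
```

Under this model, option 2 works with either `_NETBSD_SOURCE` or `_ISOC99_SOURCE`; whether defining one of them has unwanted side effects elsewhere in the headers is a separate question.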
Re: error installing libiconv-1.17
On Mon, Mar 27, 2023 at 10:03:18AM +, Riccardo Mottola wrote:
> I am trying to upgrade current pkgsrc packages on current.
>
> Current installed version:
> libiconv-1.14nb3 Character set conversion library

IIRC libiconv doesn't build if a different version is already installed - is that the case in your setup?
Thomas
Re: nouveau: console stops updating
On Sun, Mar 19, 2023 at 02:23:42PM +0100, Thomas Klausner wrote:
> Ok, so here I'm answering my own question - I looked in the BIOS
> settings and in the 'default boot options' I selected 'Legacy OPROM'
> instead of either that or UEFI, and the machine booted fine and I
> could start X! Yay :)

And with today's snapshot which has the attached pullup, I don't even need to do that - it just works.

Thank you :-)
Thomas

--- Begin Message ---
The following reply was made to PR port-amd64/53126; it has been noted by GNATS.

From: "Martin Husemann"
To: gnats-b...@gnats.netbsd.org
Cc:
Subject: PR/53126 CVS commit: [netbsd-10] src/sys
Date: Mon, 20 Mar 2023 17:24:15 +

Module Name: src
Committed By: martin
Date: Mon Mar 20 17:24:15 UTC 2023

Modified Files:
src/sys/dev/wscons [netbsd-10]: wsdisplay.c wsdisplayvar.h
src/sys/external/bsd/drm2/dist/drm/amd/amdgpu [netbsd-10]: amdgpu_gart.c
src/sys/external/bsd/drm2/nouveau [netbsd-10]: nouveau_pci.c
src/sys/external/bsd/drm2/radeon [netbsd-10]: radeon_pci.c

Log Message:
Pull up following revision(s) (requested by mrg in ticket #122):
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_gart.c: revision 1.11
sys/external/bsd/drm2/nouveau/nouveau_pci.c: revision 1.37
sys/external/bsd/drm2/radeon/radeon_pci.c: revision 1.22
sys/dev/wscons/wsdisplay.c: revision 1.166
sys/dev/wscons/wsdisplayvar.h: revision 1.57

amdgpu: Fix bogus loop invariant assertions in amdgpu_gart_map.

nouveau: Kick out genfb on firmware framebuffer before initializing.
PR kern/53126

radeon: Kick out genfb on firmware framebuffer before initializing.
this is the same change as nouveau_pci.c:1.37, and should fix at least
PR#56714 and i thought at least another PR i can't find right now.
it fixes at least 2 different radeon cards for me on UEFI booted system.

To generate a diff of this commit:
cvs rdiff -u -r1.165 -r1.165.4.1 src/sys/dev/wscons/wsdisplay.c
cvs rdiff -u -r1.56 -r1.56.4.1 src/sys/dev/wscons/wsdisplayvar.h
cvs rdiff -u -r1.10 -r1.10.4.1 \
    src/sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_gart.c
cvs rdiff -u -r1.36 -r1.36.4.1 \
    src/sys/external/bsd/drm2/nouveau/nouveau_pci.c
cvs rdiff -u -r1.21 -r1.21.4.1 src/sys/external/bsd/drm2/radeon/radeon_pci.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
--- End Message ---
Re: nouveau: console stops updating
On Sun, Mar 19, 2023 at 02:16:47PM +0100, Thomas Klausner wrote:
> I tried a NetBSD 10 snapshot with a GTX 970 today.
>
> Sysinst ran fine -- in high resolution! -- but when I booted NetBSD
> after the installation, the screen stopped updating, as reported in PR
> 57168 and PR 53126.
>
> So I wonder why it worked for sysinst, and how I could force my
> BIOS to do the same for the installed NetBSD. Any ideas/hints?

Ok, so here I'm answering my own question - I looked in the BIOS settings and in the 'default boot options' I selected 'Legacy OPROM' instead of either that or UEFI, and the machine booted fine and I could start X! Yay :)
Thomas
nouveau: console stops updating
Hi!

I tried a NetBSD 10 snapshot with a GTX 970 today.

Sysinst ran fine -- in high resolution! -- but when I booted NetBSD after the installation, the screen stopped updating, as reported in PR 57168 and PR 53126.

So I wonder why it worked for sysinst, and how I could force my BIOS to do the same for the installed NetBSD. Any ideas/hints?
Thomas
Re: 10.99.2 panic in kern_timeout.c
On Fri, Feb 03, 2023 at 09:24:11AM -, Michael van Elst wrote:
> w...@netbsd.org (Thomas Klausner) writes:
>
> >> The biggest change recently is probably that my bulk build switched
> >> from ghc92 to ghc94, but I don't know if that could cause this.
>
> >Next bulk build, next panic, quite reliably. Has anyone else seen this?
>
> Not yet, maybe this is the first use of timerfd from multiple threads.

I used

/usr/pkgsrc/mk> cvs di haskell.mk
Index: haskell.mk
===================================================================
RCS file: /cvsroot/pkgsrc/mk/haskell.mk,v
retrieving revision 1.54
diff -u -r1.54 haskell.mk
--- haskell.mk 1 Feb 2023 03:37:21 - 1.54
+++ haskell.mk 4 Feb 2023 07:56:28 -
@@ -148,7 +148,7 @@
 HASKELL_ENABLE_TESTS?= no
 HASKELL_UNRESTRICT_DEPENDENCIES?= # empty

-.include "../../lang/ghc94/buildlink3.mk"
+.include "../../lang/ghc92/buildlink3.mk"

 # Some Cabal packages requires preprocessors to build, and we don't
 # want them to implicitly depend on such tools. Place dummy scripts by

and the bulk build succeeded.

Can someone else please try building e.g. pandoc in a bulk build on 10.99.2/amd64/Jan 27 and check if they see the same issue?

Thanks,
Thomas
Re: 10.99.2 panic in kern_timeout.c
On Wed, Feb 01, 2023 at 02:00:07PM +0100, Thomas Klausner wrote:
> I have a new problem on a system running 10.99.2/amd64 from Jan 27,
> which was heavily bulk building most of the time, and stable.
>
> Now I have seen this panic twice today already (OCR'd so beware of typos):
>
> panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/usr/src/sys/kern/kern_timeout.c" line 381 running callout 0x96631f85b80: c_func (Ox8@e070e3) c_flags (0x100) destroyed from OxfFff80e51f99
> cpu31: Begin traceback...
> vpanic() at netbsd:vpanic+0x183
> kern_assert() at netbsd:kern_assert+0x4b
> callout_destroy() at netbsd:callout_destroy+0xa2
> timerfd_fop_close() at netbsd:timerfd_fop_close+0x36
> closef() at netbsd:closef+0x60
> fd_close() at netbsd:fd_close+0x138
> sys_close() at netbsd:sys_close+0x22
> syscall() at netbsd:syscall+0x196
> --- syscall (number 6) ---
> netbsd:syscall+0x196:
> cpu31: End traceback...
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 0x80235315 cs 0x8 rflags 0x202 cr2 0x71b8870b12dd ilevel 0 rsp 0xa2a133029db0
>
> The biggest change recently is probably that my bulk build switched
> from ghc92 to ghc94, but I don't know if that could cause this.

Next bulk build, next panic, quite reliably. Has anyone else seen this?
Thomas
10.99.2 panic in kern_timeout.c
Hi!

I have a new problem on a system running 10.99.2/amd64 from Jan 27, which was heavily bulk building most of the time, and stable.

Now I have seen this panic twice today already (OCR'd so beware of typos):

panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/usr/src/sys/kern/kern_timeout.c" line 381 running callout 0x96631f85b80: c_func (Ox8@e070e3) c_flags (0x100) destroyed from OxfFff80e51f99
cpu31: Begin traceback...
vpanic() at netbsd:vpanic+0x183
kern_assert() at netbsd:kern_assert+0x4b
callout_destroy() at netbsd:callout_destroy+0xa2
timerfd_fop_close() at netbsd:timerfd_fop_close+0x36
closef() at netbsd:closef+0x60
fd_close() at netbsd:fd_close+0x138
sys_close() at netbsd:sys_close+0x22
syscall() at netbsd:syscall+0x196
--- syscall (number 6) ---
netbsd:syscall+0x196:
cpu31: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0x80235315 cs 0x8 rflags 0x202 cr2 0x71b8870b12dd ilevel 0 rsp 0xa2a133029db0

The biggest change recently is probably that my bulk build switched from ghc92 to ghc94, but I don't know if that could cause this.

Ideas?
Thomas
Re: gnucash coredump on startup
On Sun, Jan 08, 2023 at 11:58:09PM +0100, Thomas Klausner wrote: > On Sat, Jan 07, 2023 at 03:42:00PM -, Christos Zoulas wrote: > > In article , > > Thomas Klausner wrote: > > >Hi! > > > > > >I've just replaced my 10.99.2/20221231 userland (kernel slightly > > >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland. > > > > > >Now gnucash dumps core on startup: > > > > Could be rtld related. Can you try with the older ld_elf.so? > > I can't go back with just that one, but I did the following test: > > Downgraded my whole userland to 20221231: > gnucash works > > install ld_elf.so from 20230107: > gnucash dumps core > > install ld_elf.so from 20221231: > gnucash works > > So yes, definitely an issue in ld_elf.so :) After christos' commits from the last hours, this bug is fixed - both gnucash and guile (old binaries) still work on an updated system. I'll re-build the system from scratch next and report if there are any issues. Thanks, christos! Thomas
Re: gnucash coredump on startup
On Sat, Jan 07, 2023 at 03:42:00PM -, Christos Zoulas wrote: > In article , > Thomas Klausner wrote: > >Hi! > > > >I've just replaced my 10.99.2/20221231 userland (kernel slightly > >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland. > > > >Now gnucash dumps core on startup: > > Could be rtld related. Can you try with the older ld_elf.so? I can't go back with just that one, but I did the following test: Downgraded my whole userland to 20221231: gnucash works install ld_elf.so from 20230107: gnucash dumps core install ld_elf.so from 20221231: gnucash works So yes, definitely an issue in ld_elf.so :) Thomas
Re: ldscripts not cleaned up
On Sun, Jan 08, 2023 at 08:48:24PM -, Christos Zoulas wrote: > In article , > Thomas Klausner wrote: > >Hi! > > > >NetBSD after the switch to binutils 2.39 does not install the > >following files any longer, but they are not marked as obsolete > >either: > > > >/usr/libdata/ldscripts/elf_k1om.x > >/usr/libdata/ldscripts/elf_k1om.xbn > >/usr/libdata/ldscripts/elf_k1om.xc > >/usr/libdata/ldscripts/elf_k1om.xd > >/usr/libdata/ldscripts/elf_k1om.xdc > >/usr/libdata/ldscripts/elf_k1om.xdw > >/usr/libdata/ldscripts/elf_k1om.xn > >/usr/libdata/ldscripts/elf_k1om.xr > >/usr/libdata/ldscripts/elf_k1om.xs > >/usr/libdata/ldscripts/elf_k1om.xsc > >/usr/libdata/ldscripts/elf_k1om.xsw > >/usr/libdata/ldscripts/elf_k1om.xu > >/usr/libdata/ldscripts/elf_k1om.xw > >/usr/libdata/ldscripts/elf_l1om.x > >/usr/libdata/ldscripts/elf_l1om.xbn > >/usr/libdata/ldscripts/elf_l1om.xc > >/usr/libdata/ldscripts/elf_l1om.xd > >/usr/libdata/ldscripts/elf_l1om.xdc > >/usr/libdata/ldscripts/elf_l1om.xdw > >/usr/libdata/ldscripts/elf_l1om.xn > >/usr/libdata/ldscripts/elf_l1om.xr > >/usr/libdata/ldscripts/elf_l1om.xs > >/usr/libdata/ldscripts/elf_l1om.xsc > >/usr/libdata/ldscripts/elf_l1om.xsw > >/usr/libdata/ldscripts/elf_l1om.xu > >/usr/libdata/ldscripts/elf_l1om.xw > > > >Should new binutils install them, or should they be marked as obsolete? > > > They should be marked as obsolete... Done! Thomas
lang/guile30 crash in build even without lto [was Re: lang/guile30 build issue: lto support missing in ar/ranlib]
On Sun, Jan 08, 2023 at 12:38:05PM -0500, Greg Troxel wrote: > Thomas Klausner writes: > > > On 10.99.2 after the load sections 2->4 change I see the following > > when building lang/guile30: > > > > ar: libguile_3.0_la-alist.o: plugin needed to handle lto object > > ranlib: .libs/libguile-3.0.a(libguile_3.0_la-alist.o): plugin needed to > > handle lto object > > CCLD guile > > > > and the resulting binary segfaults when run (which also happens during > > the build), backtrace below. > > > > Is there a flag to turn off lto, or can we please get ar/ranlib > > support for lto? > > > > To reproduce, just try building 'lang/guile30'. > > It fails to even build on i386. I have a local patch, pending figuring > it out, to just disable lto. I was unsure if that belonged on only some > arches, but seems best to mass disable and theni figure it out. Ok, I tried that out - the warning is gone, but guile is still crashing. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x799bb2aff3a6 in scm_sloppy_assq () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 (gdb) bt #0 0x799bb2aff3a6 in scm_sloppy_assq () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #1 0x799bb2b27c28 in scm_hash_fn_ref () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #2 0x799bb2b15f18 in expand () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #3 0x799bb2b160c4 in expand_and () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #4 0x799bb2b16e49 in expand_cond_clauses () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #5 0x799bb2b16e06 in expand_cond_clauses () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #6 0x799bb2b15dda in expand () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #7 0x799bb2b18f7f in expand_letrec_helper () from 
/scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #8 0x799bb2b17492 in expand_lambda_star_case () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #9 0x799bb2b179a9 in expand_lambda_star () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #10 0x799bb2b1688a in expand_set_x () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #11 0x799bb2b1618b in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #12 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #13 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #14 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #15 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #16 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #17 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #18 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #19 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #20 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #21 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #22 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #23 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #24 0x799bb2b1617c in 
expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #25 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #26 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #27 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #28 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #29 0x799bb2b1617c in expand_sequence () from /scratch/lang/guile30/work/guile-3.0.8/libguile/.libs/libguile-3.0.so.1 #30 0x799bb2b1617c in expand_sequence () from /scratch/l
ldscripts not cleaned up
Hi! NetBSD after the switch to binutils 2.39 does not install the following files any longer, but they are not marked as obsolete either: /usr/libdata/ldscripts/elf_k1om.x /usr/libdata/ldscripts/elf_k1om.xbn /usr/libdata/ldscripts/elf_k1om.xc /usr/libdata/ldscripts/elf_k1om.xd /usr/libdata/ldscripts/elf_k1om.xdc /usr/libdata/ldscripts/elf_k1om.xdw /usr/libdata/ldscripts/elf_k1om.xn /usr/libdata/ldscripts/elf_k1om.xr /usr/libdata/ldscripts/elf_k1om.xs /usr/libdata/ldscripts/elf_k1om.xsc /usr/libdata/ldscripts/elf_k1om.xsw /usr/libdata/ldscripts/elf_k1om.xu /usr/libdata/ldscripts/elf_k1om.xw /usr/libdata/ldscripts/elf_l1om.x /usr/libdata/ldscripts/elf_l1om.xbn /usr/libdata/ldscripts/elf_l1om.xc /usr/libdata/ldscripts/elf_l1om.xd /usr/libdata/ldscripts/elf_l1om.xdc /usr/libdata/ldscripts/elf_l1om.xdw /usr/libdata/ldscripts/elf_l1om.xn /usr/libdata/ldscripts/elf_l1om.xr /usr/libdata/ldscripts/elf_l1om.xs /usr/libdata/ldscripts/elf_l1om.xsc /usr/libdata/ldscripts/elf_l1om.xsw /usr/libdata/ldscripts/elf_l1om.xu /usr/libdata/ldscripts/elf_l1om.xw Should new binutils install them, or should they be marked as obsolete? Thomas
lang/guile30 build issue: lto support missing in ar/ranlib
Hi! On 10.99.2 after the load sections 2->4 change I see the following when building lang/guile30: ar: libguile_3.0_la-alist.o: plugin needed to handle lto object ranlib: .libs/libguile-3.0.a(libguile_3.0_la-alist.o): plugin needed to handle lto object CCLD guile and the resulting binary segfaults when run (which also happens during the build), backtrace below. Is there a flag to turn off lto, or can we please get ar/ranlib support for lto? To reproduce, just try building 'lang/guile30'. Thanks, Thomas [New process 4469] Core was generated by `guile'. Program terminated with signal SIGSEGV, Segmentation fault. #0 weak_set_lookup.constprop.0 (set=set@entry=0x7902a5065150, hash=581991487143039215, hash@entry=290995743571519607, pred=pred@entry=0x7902a58f6f1b , closure=closure@entry=0x7f7fff302a50, dflt=) at weak-set.c:483 483 other_hash = entries[k].hash; (gdb) bt #0 weak_set_lookup.constprop.0 (set=set@entry=0x7902a5065150, hash=581991487143039215, hash@entry=290995743571519607, pred=pred@entry=0x7902a58f6f1b , closure=closure@entry=0x7f7fff302a50, dflt=) at weak-set.c:483 #1 0x7902a58f56b8 in scm_c_weak_set_lookup (dflt=0x4, closure=0x7f7fff302a50, pred=0x7902a58f6f1b , raw_hash=290995743571519607, set=) at weak-set.c:763 #2 lookup_interned_symbol (raw_hash=290995743571519607, name=0x7902a500ef80) at symbols.c:112 #3 scm_i_str2symbol (str=0x7902a500ef80) at symbols.c:244 #4 0x7902a58f92a6 in scm_string_to_symbol (string=) at symbols.c:360 #5 0x7902a58fe437 in scm_gensym (prefix=0x7902a4ed30e0) at symbols.c:408 #6 0x7902a5872cb1 in transform_bindings (bindings=bindings@entry=0x7902a5028cb0, expr=expr@entry=0x7902a5028d50, names=names@entry=0x7f7fff302c20, vars=vars@entry=0x7f7fff302c18, initptr=initptr@entry=0x7f7fff302c10) at expand.c:948 #7 0x7902a5872e49 in expand_let (expr=0x7902a5028d50, env=0x7902a5070ff0) at expand.c:1012 #8 0x7902a5871157 in expand_if (expr=0x7902a501d120, env=0x7902a5070ff0) at expand.c:586 #9 0x7902a587266c in expand_letstar_clause 
(bindings=0x7902a501d140, body=0x7902a502cdf0, env=0x7902a5065150) at expand.c:1074 #10 0x7902a587266c in expand_letstar_clause (bindings=0x7902a501d1e0, body=0x7902a502cdf0, env=0x7902a5065200) at expand.c:1074 #11 0x7902a5871157 in expand_if (expr=0x7902a501d5c0, env=0x7902a5065200) at expand.c:586 #12 0x7902a5871359 in expand_lambda_case (clause=0x7902a501d5d0, alternate=alternate@entry=0x4, env=) at expand.c:662 #13 0x7902a587162e in expand_lambda (expr=0x7902a501d610, env=) at expand.c:676 #14 0x7902a58732ce in expand_exprs (env=0x7902a5062a40, forms=0x7902a5062ad0) at expand.c:393 #15 expand_letrec_helper (expr=, env=0x7902a5062a40, in_order_p=0x404) at expand.c:1040 #16 0x7902a5871359 in expand_lambda_case (clause=0x7902a501dd80, alternate=alternate@entry=0x4, env=) at expand.c:662 #17 0x7902a587162e in expand_lambda (expr=0x7902a501ddc0, env=) at expand.c:676 #18 0x7902a58732ce in expand_exprs (env=0x7902a5059560, forms=0x7902a5059c20) at expand.c:393 #19 expand_letrec_helper (expr=, env=0x7902a5059560, in_order_p=0x404) at expand.c:1040 #20 0x7902a58703d8 in expand (exp=0x7902a501de50, env=0x7902a5055020) at expand.c:361 #21 0x7902a5870703 in expand_sequence (forms=0x7902a5034160, env=0x7902a5055020) at expand.c:405 #22 0x7902a58706f4 in expand_sequence (forms=0x7902a501de60, env=0x7902a5055020) at expand.c:405 #23 0x7902a58706f4 in expand_sequence (forms=0x7902a501df10, env=0x7902a5055020) at expand.c:405 #24 0x7902a58706f4 in expand_sequence (forms=0x7902a501dfc0, env=0x7902a5055020) at expand.c:405 #25 0x7902a58706f4 in expand_sequence (forms=0x7902a501a070, env=0x7902a5055020) at expand.c:405 #26 0x7902a58706f4 in expand_sequence (forms=0x7902a501a120, env=0x7902a5055020) at expand.c:405 #27 0x7902a58706f4 in expand_sequence (forms=0x7902a501a1d0, env=0x7902a5055020) at expand.c:405 #28 0x7902a58706f4 in expand_sequence (forms=0x7902a501a900, env=0x7902a5055020) at expand.c:405 #29 0x7902a58706f4 in expand_sequence (forms=0x7902a5014230, 
env=0x7902a5055020) at expand.c:405 #30 0x7902a58706f4 in expand_sequence (forms=0x7902a5014860, env=0x7902a5055020) at expand.c:405 #31 0x7902a58706f4 in expand_sequence (forms=0x7902a500c1f0, env=0x7902a5055020) at expand.c:405 #32 0x7902a58706f4 in expand_sequence (forms=0x7902a500cc50, env=0x7902a5055020) at expand.c:405 #33 0x7902a58706f4 in expand_sequence (forms=0x7902a50096b0, env=0x7902a5055020) at expand.c:405 #34 0x7902a58706f4 in expand_sequence (forms=0x7902a50056e0, env=0x7902a5055020) at expand.c:405 #35 0x7902a58706f4 in expand_sequence (forms=0x7902a50010d0, env=0x7902a5055020) at expand.c:405 #36 0x7902a58706f4 in expand_sequence (forms=0x7902a5001cc0, env=0x7902a5055020) at expand.c:405 #37 0x7902a58706f4 in expand_sequence (forms=0x7902a4ffc8b0, env=0x7902a5055020) at
Re: gnucash coredump on startup
On Sat, Jan 07, 2023 at 03:42:00PM -0000, Christos Zoulas wrote:
> In article ,
> Thomas Klausner wrote:
> >Hi!
> >
> >I've just replaced my 10.99.2/20221231 userland (kernel slightly
> >older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland.
> >
> >Now gnucash dumps core on startup:
>
> Could be rtld related. Can you try with the older ld.elf_so?

Not really, because the base system uses 4 segments now and the old one doesn't handle it - I can just downgrade the whole system.

# cd /archive/build/amd64.gcc.20221231/libexec/
# install -c ld.elf_so /libexec/
# ls
ls: Shared object "libutil.so.7" not found
# cd /archive/build/amd64.gcc.20230107/libexec/
# install -c ld.elf_so /libexec/
install: Shared object "libutil.so.7" not found
# LD_PRELOAD=/lib/libutil.so.7 install
/lib/libutil.so.7: wrong number of segments (4 != 2)

But yes, I suspect that too.

Btw, I also see a core dump building lang/guile30 on that system; that's probably related and has fewer dependencies for trying out on your system.
 Thomas
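The "wrong number of segments (4 != 2)" message suggests that the older loader insists on a fixed count of loadable segments. The following is only a sketch of that kind of check under that assumption; `struct phdr_model`, `SEG_LOAD`, `count_load_segments()` and `segments_ok()` are hypothetical names for illustration and are not taken from ld.elf_so.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an ELF program header entry. */
enum seg_type { SEG_NOTE, SEG_LOAD, SEG_DYNAMIC };

struct phdr_model {
	enum seg_type type;
};

/* Count the loadable segments in a program header table. */
static size_t
count_load_segments(const struct phdr_model *ph, size_t n)
{
	size_t i, nload = 0;

	for (i = 0; i < n; i++)
		if (ph[i].type == SEG_LOAD)
			nload++;
	return nload;
}

/*
 * Return 1 if the object has exactly the number of loadable segments
 * the loader expects, 0 otherwise.  That the old loader behaves like
 * this is an assumption based on the error message alone.
 */
static int
segments_ok(const struct phdr_model *ph, size_t n, size_t expected)
{
	return count_load_segments(ph, n) == expected;
}
```

Under this model, a library linked with four loadable segments is rejected by a loader that expects two, which would match what the session above shows.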
gnucash coredump on startup
Hi! I've just replaced my 10.99.2/20221231 userland (kernel slightly older, but also 10.99.2) with a 10.99.2/20230107 kernel+userland. Now gnucash dumps core on startup: (gdb) bt #0 0x in ?? () #1 0x7f59b66414b1 in scm_c_hook_run (hook=0x7f59b695d140 , data=0x0) at chooks.c:95 #2 0x7f59b665fb79 in after_gc_async_thunk () at gc.c:523 #3 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #4 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=5) at vm.c:1608 #5 0x7f59b6647b6a in scm_apply_0 (proc=0x7f59a6a9c500, args=0x304) at eval.c:603 #6 0x7f59b66dcbd7 in scm_throw (key=0x7f59a6a41560, args=0x7f59a6b3c2a0) at throw.c:262 #7 0x7f59b66dcbfe in scm_ithrow (key=, args=, no_return=) at throw.c:457 #8 0x7f59b6644224 in scm_error_scm (key=key@entry=0x7f59a6a41560, subr=, message=message@entry=0x7f59a6a574e0, args=args@entry=0x7f59a6b3c2e0, data=data@entry=0x7f59a6b3c310) at error.c:90 #9 0x7f59b6644282 in scm_error (key=0x7f59a6a41560, subr=subr@entry=0x7f59b670f487 "scm_hash_fn_get_handle", message=message@entry=0x7f59b670c9e0 "Wrong type argument in position ~A (expecting ~A): ~S", args=0x7f59a6b3c2e0, rest=rest@entry=0x7f59a6b3c310) at error.c:62 #10 0x7f59b664592a in scm_wrong_type_arg_msg (subr=0x7f59b670f487 "scm_hash_fn_get_handle", pos=1, bad_value=0x7f59a6a48200, szMessage=) at error.c:282 #11 0x7f59b665c1ad in scm_hash_fn_get_handle (table=, obj=, hash_fn=, assoc_fn=, closure=) at hashtab.c:226 #12 0x7f59b665c1f5 in scm_hash_fn_ref (table=, obj=, dflt=0x4, hash_fn=, assoc_fn=, closure=) at hashtab.c:300 #13 0x7f59b6680574 in scm_symbol_to_keyword (symbol=0x7f59a6a45ec0) at keywords.c:72 #14 0x7f59b66df931 in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:1486 #15 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608 #16 0x7f59b668589a in load_thunk_from_memory (data=0x7f59a621b000 "\177ELF\002\001\001\377", len=716869, is_read_only=) at loader.c:480 #17 0x7f59b6709a64 in scm_c_with_exception_handler.constprop.0 
(type=0x404, handler_data=handler_data@entry=0x7f7fff6f1830, thunk_data=thunk_data@entry=0x7f7fff6f1830, thunk=, handler=) at exceptions.c:170 #18 0x7f59b66db1f6 in scm_c_catch (tag=, body=, body_data=, handler=, handler_data=, pre_unwind_handler=, pre_unwind_handler_data=0x0) at throw.c:168 #19 0x7f59b66861e6 in try_load_thunk_from_file (filename=0x7f59a6b393c0) at load.c:622 #20 load_thunk_from_path (filename=filename@entry=0x7f59a6b39400, source_file_name=source_file_name@entry=0x7f59a6b393e0, source_stat_buf=source_stat_buf@entry=0x7f7fff6f1be0, found_stale_file=found_stale_file@entry=0x7f7fff6f1b3c) at load.c:765 #21 0x7f59b66863f0 in scm_primitive_load_path (args=) at load.c:1209 #22 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #23 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608 #24 0x7f59b668650a in scm_primitive_load_path (args=) at load.c:1259 #25 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #26 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608 #27 0x7f59b668650a in scm_primitive_load_path (args=) at load.c:1259 #28 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #29 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608 #30 0x7f59b668650a in scm_primitive_load_path (args=) at load.c:1259 #31 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #32 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=3) at vm.c:1608 #33 0x7f59b6642d8e in scm_call_3 (proc=, arg1=, arg2=, arg3=) at eval.c:510 #34 0x7f59b6684c8d in scm_public_variable (module_name=0x7f59a6b11830, name=0x7f59a6a42120) at modules.c:673 #35 0x7f59b66d8028 in init_eval_string_var_and_k_module () at strports.c:363 #36 0x7f59b69fa717 in pthread_once (once_control=once_control@entry=0x7f59b69555e0 , routine=routine@entry=0x7f59b66d7ffc ) at /disk/6/archive/foreign/src/lib/libpthread/pthread_once.c:66 #37 0x7f59b66d3f53 in 
scm_eval_string_in_module (string=0x7f59a6b0b2e0, module=0x904) at strports.c:379 #38 0x7f59b66d4004 in scm_eval_string (string=) at strports.c:394 #39 0x7f59b66d5d3c in scm_c_eval_string (expr=) at strports.c:347 #40 0x00f50740 in scm_run_gnucash (data=0x7f7fff6f2de0, argc=, argv=) at /scratch/finance/gnucash/work/gnucash-4.13/gnucash/gnucash.cpp:142 #41 0x7f59b665b6fc in invoke_main_func (body_data=0x7f7fff6f2da0) at init.c:312 #42 0x7f59b6640e62 in c_body (d=0x7f7fff6f2c60) at continuations.c:430 #43 0x7f59b66de59b in vm_regular_engine (thread=0x7f59a729ad80) at vm-engine.c:972 #44 0x7f59b66ecd28 in scm_call_n (proc=, argv=, nargs=2) at vm.c:1608 #45 0x7f59b6642cde in scm_call_2 (proc=, arg1=, arg2=) at eval.c:503 #46
MKLLVM build broken
build.sh -j 32 -x -V MKDEBUG=yes -V MKDEBUGLIB=yes -V MKLLVM=yes -V NOGCCERROR=yes -m amd64 distribution

with cvs from about an hour ago failed with:

src/external/bsd/compiler_rt/lib/clang/lib/netbsd/safestack-m64/../../../../../../../../sys/external/bsd/compiler_rt/dist/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cc:2182:31: error: 'TIOCRCVFRAME' was not declared in this scope; did you mean 'IOCTL_TIOCRCVFRAME'?
 2182 | unsigned IOCTL_TIOCRCVFRAME = TIOCRCVFRAME;
      | ^~~~
      | IOCTL_TIOCRCVFRAME
src/external/bsd/compiler_rt/lib/clang/lib/netbsd/safestack-m64/../../../../../../../../sys/external/bsd/compiler_rt/dist/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cc:2183:31: error: 'TIOCXMTFRAME' was not declared in this scope; did you mean 'TIOCPTSNAME'?
 2183 | unsigned IOCTL_TIOCXMTFRAME = TIOCXMTFRAME;
      | ^~~~
      | TIOCPTSNAME

 Thomas
Re: .gdbinit files in the repository
On Wed, Nov 23, 2022 at 04:15:37PM -0500, Andrew Cagney wrote:
> On Tue, 22 Nov 2022 at 11:49, Thomas Klausner wrote:
> >
> > Hi!
> >
> > Should these files be there?
> >
> > /usr/src> find . -name .gdbinit
> > ./external/gpl3/binutils/dist/gprof/.gdbinit
> > ./external/gpl3/binutils.old/dist/gprof/.gdbinit
> > ./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
> > ./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
> > ./external/gpl3/gdb/dist/sim/ppc/.gdbinit
> > ./external/gpl3/gdb/dist/gprof/.gdbinit
> > ./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
> > ./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
> > ./external/gpl3/gdb.old/dist/sim/ppc/.gdbinit
> > ./external/lgpl3/gmp/dist/.gdbinit
> >
> > Looks to me like they should be deleted by the *2netbsd import preparation
> > scripts.
>
> Is there a problem? For instance, removing
> gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit would break
> that test (if someone were to desire to run it).

No problem - but usually dot files are not displayed (with ls) and this could lead to surprises when you start gdb in those directories.
 Thomas
.gdbinit files in the repository
Hi!

Should these files be there?

/usr/src> find . -name .gdbinit
./external/gpl3/binutils/dist/gprof/.gdbinit
./external/gpl3/binutils.old/dist/gprof/.gdbinit
./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
./external/gpl3/gdb/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
./external/gpl3/gdb/dist/sim/ppc/.gdbinit
./external/gpl3/gdb/dist/gprof/.gdbinit
./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/unlimited/.gdbinit
./external/gpl3/gdb.old/dist/gdb/testsuite/gdb.base/gdbinit-history/zero/.gdbinit
./external/gpl3/gdb.old/dist/sim/ppc/.gdbinit
./external/lgpl3/gmp/dist/.gdbinit

Looks to me like they should be deleted by the *2netbsd import preparation scripts.

Ok to remove?
 Thomas
Re: weird less(1) CTRL-Z behaviour
On Mon, Oct 17, 2022 at 12:06:51PM +0200, Thomas Klausner wrote:
> On Fri, Oct 14, 2022 at 11:25:49PM +0000, RVP wrote:
> > On Wed, 12 Oct 2022, RVP wrote:
> >
> > > On Wed, 12 Oct 2022, Thomas Klausner wrote:
> > >
> > > > bin/57053: continuation problem in shell pipelines
> > > >
> > >
> > > FYI: Just tried on FreeBSD 13.1, and zsh-5.9 is broken there too.
> > >
> >
> > More: The prev. version, zsh-5.8.1, works on -HEAD. zsh-5.9 has the
> > same problem on Ubuntu 19.04 too.
>
> Thanks for testing on other operating systems!
>
> I've reported this issue to the zsh developers:
>
> https://zsh.org/mla/workers/2022/msg01115.html

Upstream is still iterating, but I've added the second candidate patch to pkgsrc:

https://zsh.org/mla/workers/2022/msg01204.html

Cheers,
 Thomas
Re: noisy dhcpcd messages
On Tue, Nov 01, 2022 at 01:29:19PM +0300, Valeriy E. Ushakov wrote:
> On Tue, Nov 01, 2022 at 10:05:19 +0100, Thomas Klausner wrote:
>
> > What's up with these log lines?
> >
> > Oct 31 07:52:59 yt dhcpcd[3496]: wm0: requesting DHCPv6 information
> > Oct 31 07:53:52 yt syslogd[4885]: last message repeated 5 times
> [...]
> > This is not a new issue, I can find these log lines in my messages
> > files going back to at least August 31. Does this message have a point
> > or should we remove it from the default log level?
>
> It is, but in -current,

Do you mean "it is removed, but in -current"? That confuses me, because the messages I see are from 9.99.104.

Or syslogd has a different bug. Here are more details from the same log file:

Nov 5 07:24:05 yt ntpd[6574]: 86.59.113.124 local addr 192.168.0.33 ->
Nov 5 07:24:06 yt dhcpcd[3514]: wm0: requesting DHCPv6 information
Nov 5 07:24:48 yt syslogd[4883]: last message repeated 4 times
Nov 5 07:26:48 yt syslogd[4883]: last message repeated 12 times
Nov 5 07:36:48 yt syslogd[4883]: last message repeated 60 times
Nov 5 07:46:48 yt syslogd[4883]: last message repeated 60 times
Nov 5 07:56:49 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:06:49 yt syslogd[4883]: last message repeated 59 times
Nov 5 08:16:49 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:26:49 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:36:49 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:46:50 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:56:50 yt syslogd[4883]: last message repeated 60 times
Nov 5 08:59:21 yt /netbsd: [ 49084.9251623] 192.168.0.19:/volume2/transfer: inaccurate wcc data (ctime) detected, disabling wcc (ctime 1667627567.063208771 1667627567.063208771, mtime 1667627567.063208771 1667627567.063208771)
Nov 5 08:59:21 yt /netbsd: [ 56679.1666898] nfs server 192.168.0.19:/volume2/games: not responding
Nov 5 08:59:21 yt /netbsd: [ 56679.2966918] nfs server 192.168.0.19:/volume2/games: is alive again
Nov 5 09:06:50 yt syslogd[4883]: last message repeated 60 times
Nov 5 09:16:50 yt syslogd[4883]: last message repeated 60 times
Nov 5 09:26:50 yt syslogd[4883]: last message repeated 60 times

It doesn't make sense that the "is alive again" message should be repeated without the "not responding" one before that. Or I don't understand syslog messages :)
 Thomas
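To make the expectation behind that last remark concrete, here is a deliberately simplified model of "last message repeated N times" folding. It is a sketch only and not syslogd's implementation; `struct repeat_state`, `MSG_MAX` and `feed()` are hypothetical names.

```c
#include <string.h>

#define MSG_MAX 128

/* Hypothetical model of the repeat state kept for one log destination. */
struct repeat_state {
	char prev[MSG_MAX];	/* last distinct message seen */
	int count;		/* how often it has repeated since then */
};

/*
 * Feed one message into the model.  Returns 1 if the message differs
 * from the previous one and would be written out, 0 if it was folded
 * into the repeat counter.  Only the first MSG_MAX - 1 characters are
 * compared.
 */
static int
feed(struct repeat_state *st, const char *msg)
{
	if (strncmp(st->prev, msg, MSG_MAX - 1) == 0) {
		st->count++;
		return 0;
	}
	strncpy(st->prev, msg, MSG_MAX - 1);
	st->prev[MSG_MAX - 1] = '\0';
	st->count = 0;
	return 1;
}
```

In this model a "repeated" line always refers to the message logged immediately before it, so a count that follows "is alive again" would have to mean that exact message repeated. That is the reading that makes the log above look wrong.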
Re: building source against installed libraries?
On Mon, Oct 31, 2022 at 01:46:56PM +0300, Valeriy E. Ushakov wrote:
> On Mon, Oct 31, 2022 at 11:10:24 +0100, Thomas Klausner wrote:
>
> > For test builds, I use 'USETOOLS=no make' to avoid building a
> > toolchain. However that still wants to link against libraries built
> > in the source tree, i.e. I have to 'cd /usr/src/lib/libcrypto &&
> > USETOOLS=no make' to build a new libcrypto if this library is used.
> >
> > Is there a toggle to build against the installed libraries instead?
>
> It's entirely unclear from this description what exactly you are
> trying to do and how does it fail.

I'm trying this:

wiz@yt:/usr/src/external/bsd/nsd> USETOOLS=no make

and get

...
all ===> lib/libnsd
all ===> lib/libxfrd
all ===> sbin
all ===> sbin/nsd
make[2]: don't know how to make /disk/6/archive/foreign/src/external/bsd/libevent/lib/libevent/libevent.a. Stop

I want the build to use /usr/lib/libevent.a instead.

> If I have to venture a guess (I don't have time atm to second guess/
> reverse engineer the question), you are probably running into
> something like LIBDPLIBS dependencies that are explicitly listed in
> the in-tree makefiles, b/c those makefiles are intended to build the
> in-tree code (e.g. for curses I would disable its LIBDPLIBS dependency
> on terminfo). Just overriding them on the command line might help.

I guess this is

./Makefile.inc:DPLIBS+= event ${NETBSDSRCDIR}/external/bsd/libevent/lib/libevent

so perhaps that is what you're talking about? What do I have to set? Do I have to do this for every library separately?
 Thomas
building source against installed libraries?
Hi! For test builds, I use 'USETOOLS=no make' to avoid building a toolchain. However that still wants to link against libraries built in the source tree, i.e. I have to 'cd /usr/src/lib/libcrypto && USETOOLS=no make' to build a new libcrypto if this library is used. Is there a toggle to build against the installed libraries instead? Thomas
Re: 9.99.104: panic in tcp_shutdown_wrapper
Hi!

A couple hours later, my shell was in an NFS mounted directory (probably idle for some time) and I tried tab-completing an entry, and it panicked again. Same location as below.

Hand copied:

tcp_shutdown_wrapper+0x20
nfs_disconnect+0x69
nfs_reconnect+0x1a
nfs_request+0x7fb
nfs_access+0x1ed
VOP_ACCESS+0x61
nfs_lookup+0x52f
VOP_LOOKUP+0x8a
lookup_once+0x1a6
namei_tryemulroot+0xb00
namei+0x29
vn_open+0x133
do_open+0xc3
do_sys_openat+0x74
sys_open+0x24
syscall+0x196

 Thomas

> On 29.10.2022, at 11:53, Thomas Klausner wrote:
>
> Hi!
>
> I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel +
> userland, but packages still the old ones built on 9.99.100 in case it matters).
> A couple hours later I started transmission-gtk and the machine immediately
> panicked.
>
> Hand copied:
>
> uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e
> Fatal page fault in supervisor mode
> Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38
> ilevel 0 rsp 0xfc62191caaaf0
> Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0
> Kernel: page fault trap, code = 0
> Stopped in pid 6904.22757 (transmission-gtk) at
> netbsd:tcp_shutdown_wrapper+0x20: movq 38(%rax), %r14
> tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper+0x20
> nfs_disconnect() at netbsd:nfs_disconnect+0x69
> nfs_reconnect() at netbsd:nfs_reconnect+0x1a
> nfs_request() at netbsd:nfs_request+0x7fb
> nfs_statvfs() at netbsd:nfs_statvfs+0x173
> VFS_STATVFS() at netbsd:VFS_STATVFS+0x22
> dostatvfs() at netbsd:dostatvfs+0x132
> do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f
> sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b
> syscall() at netbsd:syscall+0x196
>
> I have nfs mounted some shares from a Synology station.
>
> Ideas? Perhaps the pcb merge changes from this week?
> Thomas
9.99.104: panic in tcp_shutdown_wrapper
Hi!

I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel + userland, but packages still the old ones built on 9.99.100 in case it matters). A couple hours later I started transmission-gtk and the machine immediately panicked.

Hand copied:

uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e
Fatal page fault in supervisor mode
Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38 ilevel 0 rsp 0xfc62191caaaf0
Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0
Kernel: page fault trap, code = 0
Stopped in pid 6904.22757 (transmission-gtk) at netbsd:tcp_shutdown_wrapper+0x20: movq 38(%rax), %r14
tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper+0x20
nfs_disconnect() at netbsd:nfs_disconnect+0x69
nfs_reconnect() at netbsd:nfs_reconnect+0x1a
nfs_request() at netbsd:nfs_request+0x7fb
nfs_statvfs() at netbsd:nfs_statvfs+0x173
VFS_STATVFS() at netbsd:VFS_STATVFS+0x22
dostatvfs() at netbsd:dostatvfs+0x132
do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f
sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b
syscall() at netbsd:syscall+0x196

I have nfs mounted some shares from a Synology station.

Ideas? Perhaps the pcb merge changes from this week?
 Thomas
Re: weird less(1) CTRL-Z behaviour
On Fri, Oct 14, 2022 at 11:25:49PM +0000, RVP wrote:
> On Wed, 12 Oct 2022, RVP wrote:
>
> > On Wed, 12 Oct 2022, Thomas Klausner wrote:
> >
> > > bin/57053: continuation problem in shell pipelines
> > >
> >
> > FYI: Just tried on FreeBSD 13.1, and zsh-5.9 is broken there too.
> >
>
> More: The prev. version, zsh-5.8.1, works on -HEAD. zsh-5.9 has the
> same problem on Ubuntu 19.04 too.

Thanks for testing on other operating systems!

I've reported this issue to the zsh developers:

https://zsh.org/mla/workers/2022/msg01115.html

 Thomas
Re: weird less(1) CTRL-Z behaviour
On Wed, Oct 12, 2022 at 10:58:56AM +0000, RVP wrote:
> File a PR.

This is now

bin/57053: continuation problem in shell pipelines

Thanks,
 Thomas
weird less(1) CTRL-Z behaviour
Hi!

I've been using the following shell function for ages:

dir() { ls -al "$@" | less; }

On -current (9.99.100 kernel from Oct 9, Userland from Sep 21, zsh from May), when I CTRL-Z the less(1) and then want to go back in, it doesn't work and I see the following:

> dir
zsh: done ls -al "$@" |
zsh: suspended
> fg
[1] + done ls -al "$@" |
continued
zsh: done ls -al "$@" |
zsh: suspended (tty output)
zsh: done ls -al "$@" |
zsh: suspended (tty output)

That happens every time I try to 'fg' it.

This was working fine not so long ago, but I don't remember exactly when it started happening.
 Thomas
Re: Panic (KASSERT) in src/sys/uvm/uvm_map.c", line 2120 (today's HEAD).
On Sun, Sep 18, 2022 at 01:44:25PM +0700, Robert Elz wrote:
> mmap_hint: [ 991.7219923] panic: kernel diagnostic assertion "!topdown ||
> hint <= orig_hint" failed: file "/release/src/sys/uvm/uvm_map.c", line 2120
> hint: 0x1ff000, orig_hint: 0x1000

I think this is http://gnats.netbsd.org/56900

An assertion riastradh added that should be true isn't always true.
 Thomas
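Plugging the reported values into the asserted condition shows why it fires. The helper below is a minimal sketch: the function name is hypothetical, and it assumes `topdown` was true at the time of the panic, which it must have been for this condition to fail.

```c
#include <stdint.h>

/*
 * Evaluate the condition from the failed assertion:
 *     !topdown || hint <= orig_hint
 * Returns 1 if the condition holds, 0 if the KASSERT would fire.
 */
static int
hint_assertion_holds(int topdown, uintptr_t hint, uintptr_t orig_hint)
{
	return !topdown || hint <= orig_hint;
}
```

With `hint = 0x1ff000` and `orig_hint = 0x1000` the comparison `hint <= orig_hint` is false, so in the top-down case the whole expression is false and the kernel panics.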
Re: namespace pollution? clone()
On Mon, Aug 01, 2022 at 06:06:19PM +0300, Valeriy E. Ushakov wrote:
> On Mon, Aug 01, 2022 at 16:50:14 +0200, Thomas Klausner wrote:
>
> > On Mon, Aug 01, 2022 at 05:45:23PM +0300, Valeriy E. Ushakov wrote:
> > > Shouldn't we expose __clone(2) (the real symbol in the reserved
> > > namespace) under _NETBSD_SOURCE and only hide clone(2) weak alias
> > > under _GNU_SOURCE? You kinda sidestep some potential issues here in
> > > this case b/c __clone is an assembler syscall stub, so there's no C
> > > source that implements __clone() that has to see the declaration.
> >
> > I don't understand the problem you see here - please fix it as you
> > find appropriate.
>
> I think we should still expose __clone() under _NETBSD_SOURCE, but
> expose clone() only under _GNU_SOURCE. My original reply that
> prompted your patch was not very clear about this, but it talked
> specifically about clone() (and clone() only).
>
> Your patch hides both clone() and __clone() under _GNU_SOURCE. You
> were not forced to consider this choice b/c __clone() is not
> implemented in C, so there's no C code in the tree that needs to see
> the __clone() prototype that your patch hides.
>
> __clone is in the reserved namespace, so no well behaving programs
> should be affected by that declaration.

I don't understand why we expose __clone() in a public header at all, but I understand your comments to result in the attached patch.

Please suggest a comment to put before the __clone() line.

Thanks,
 Thomas

Index: sched.h
===================================================================
RCS file: /cvsroot/src/include/sched.h,v
retrieving revision 1.14
diff -u -r1.14 sched.h
--- sched.h	1 Aug 2022 14:34:01 -0000	1.14
+++ sched.h	1 Aug 2022 15:10:52 -0000
@@ -73,13 +73,17 @@
 
 /*
  * Historical functions, not defined in standard
- * Linux man page documents these functions as only available when
+ * Linux man page documents clone() as only available when
  * _GNU_SOURCE is defined
  */
 pid_t	 clone(int (*)(void *), void *, int, void *);
+#endif /* _GNU_SOURCE */
+
+#if defined(_NETBSD_SOURCE)
+
 pid_t	__clone(int (*)(void *), void *, int, void *);
 
-#endif /* _GNU_SOURCE */
+#endif /* _NETBSD_SOURCE */
 
 __END_DECLS
 
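The effect of splitting the two declarations across feature-test macros can be sketched without touching the real header. Everything below uses hypothetical stand-ins: `DEMO_GNU_SOURCE` and `DEMO_NETBSD_SOURCE` play the roles of `_GNU_SOURCE` and `_NETBSD_SOURCE`, and `demo_clone` / `demo_sys_clone` play the roles of `clone()` and `__clone()`. It illustrates the gating pattern only and is not a copy of sched.h.

```c
/* Plays the role of _NETBSD_SOURCE being defined by the application. */
#define DEMO_NETBSD_SOURCE
/* DEMO_GNU_SOURCE (the role of _GNU_SOURCE) is deliberately left undefined. */

/* --- what the header would contain --- */
#if defined(DEMO_GNU_SOURCE)
int demo_clone(void);		/* stands in for clone() */
#define HAVE_DEMO_CLONE 1
#else
#define HAVE_DEMO_CLONE 0
#endif

#if defined(DEMO_NETBSD_SOURCE)
int demo_sys_clone(void);	/* stands in for __clone() */
#define HAVE_DEMO_SYS_CLONE 1
#else
#define HAVE_DEMO_SYS_CLONE 0
#endif

/* --- what an application would contain --- */
#if !HAVE_DEMO_CLONE
/*
 * The header did not claim the unreserved name, so the application is
 * free to use it for something unrelated, with a different signature.
 */
static int
demo_clone(int depth)
{
	return depth + 1;
}
#endif
```

An application that does not ask for the GNU extensions can then define its own function with the unreserved name, which is the situation that caused the name clash in the first place, while the reserved-namespace declaration stays visible without harming well-behaved programs.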
Re: namespace pollution? clone()
On Mon, Aug 01, 2022 at 05:45:23PM +0300, Valeriy E. Ushakov wrote:
> Shouldn't we expose __clone(2) (the real symbol in the reserved
> namespace) under _NETBSD_SOURCE and only hide clone(2) weak alias
> under _GNU_SOURCE? You kinda sidestep some potential issues here in
> this case b/c __clone is an assembler syscall stub, so there's no C
> source that implements __clone() that has to see the declaration.

I don't understand the problem you see here - please fix it as you find appropriate.

Thanks,
 Thomas
Re: namespace pollution? clone()
On Mon, Aug 01, 2022 at 07:32:26AM -0700, Jason Thorpe wrote:
>
> > On Aug 1, 2022, at 7:22 AM, Thomas Klausner wrote:
> >
> > On Mon, Aug 01, 2022 at 11:20:11PM +0900, Rin Okuyama wrote:
> >> On 2022/08/01 23:13, Martin Husemann wrote:
> >>> On Mon, Aug 01, 2022 at 03:57:19PM +0200, Thomas Klausner wrote:
> >>>> The attached diff survived a complete amd64-current build. Ok to commit?
> >>>
> >>> Looks good to me.
> >>
> >> Can you please mention _GNU_SOURCE in clone(2)?
> >
> > Thanks for the reminder - done!
>
> Please also fix the comment style to conform to KNF.

Done.
 Thomas
Re: namespace pollution? clone()
On Mon, Aug 01, 2022 at 11:20:11PM +0900, Rin Okuyama wrote:
> On 2022/08/01 23:13, Martin Husemann wrote:
> > On Mon, Aug 01, 2022 at 03:57:19PM +0200, Thomas Klausner wrote:
> > > The attached diff survived a complete amd64-current build. Ok to commit?
> >
> > Looks good to me.
>
> Can you please mention _GNU_SOURCE in clone(2)?

Thanks for the reminder - done!
 Thomas
Re: namespace pollution? clone()
On Tue, Jul 26, 2022 at 03:03:54PM +0200, Martin Husemann wrote:
> On Tue, Jul 26, 2022 at 03:46:14PM +0300, Valery Ushakov wrote:
> > On Linux clone(2) is declared only for _GNU_SOURCE, which explains why
> > linux doesn't run into the name clash. I gather we should follow
> > suit, as that's what the apps expect.
>
> Yes, that is the right thing to do here, especially as clone(2) does
> only exist as a portability helper for linux code.
>
> I think we could even pull that change up to -9.

The attached diff survived a complete amd64-current build. Ok to commit?
 Thomas

Index: sched.h
===================================================================
RCS file: /cvsroot/src/include/sched.h,v
retrieving revision 1.12
diff -u -r1.12 sched.h
--- sched.h	11 Jan 2009 03:04:12 -0000	1.12
+++ sched.h	1 Aug 2022 13:57:06 -0000
@@ -59,20 +59,26 @@
 #define sched_yield		__libc_thr_yield
 #endif /* __LIBPTHREAD_SOURCE__ */
 
-#if defined(_NETBSD_SOURCE)
-
 __BEGIN_DECLS
 
+#if defined(_NETBSD_SOURCE)
+
 /* Process affinity functions (not portable) */
 int	sched_getaffinity_np(pid_t, size_t, cpuset_t *);
 int	sched_setaffinity_np(pid_t, size_t, cpuset_t *);
 
+#endif /* _NETBSD_SOURCE */
+
+#if defined(_GNU_SOURCE)
+
 /* Historical functions, not defined in standard */
+/* Linux man page documents these functions as only available when
+ * _GNU_SOURCE is defined */
 pid_t	 clone(int (*)(void *), void *, int, void *);
 pid_t	__clone(int (*)(void *), void *, int, void *);
 
-__END_DECLS
+#endif /* _GNU_SOURCE */
 
-#endif /* _NETBSD_SOURCE */
+__END_DECLS
 
 #endif /* _SCHED_H_ */
Re: namespace pollution? clone()
On Tue, Jul 26, 2022 at 06:11:36AM -0400, Greg Troxel wrote:
> So where is the visibility restriction?

Oh, that's probably a misunderstanding on my side.
 Thomas
namespace pollution? clone()
Hi! When compiling inkscape I found a weird compilation error that I traced down to clone() being in the visible namespace. https://gitlab.com/inkscape/inbox/-/issues/7378 I wonder why it's visible though, since in sched.h it's protected by _NETBSD_SOURCE. The command line is cd /scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src && c++ -DHAVE_CONFIG_H -DHAVE_X11 -DWITH_CSSBLEND -DWITH_MESH -DWITH_SVG2 -D_REENTRANT -Dinkscape_base_EXPORTS -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410 -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/include -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/adaptagrams -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/2geom/include -I/scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/3rdparty/2geom/include/2geom -isystem /usr/pkg/include/harfbuzz -isystem /usr/pkg/include/freetype2 -isystem /usr/pkg/include -isystem /usr/pkg/include/glib-2.0 -isystem /usr/pkg/lib/glib-2.0/include -isystem /usr/pkg/include/pango-1.0 -isystem /usr/pkg/include/fribidi -isystem /usr/pkg/include/cairo -isystem /usr/pkg/include/pixman-1 -isystem /usr/pkg/include/libpng16 -isystem /usr/pkg/include/libsoup-2.4 -isystem /usr/pkg/include/libxml2 -isystem /usr/pkg/include/poppler -isystem /usr/pkg/include/libwpg-0.3 -isystem /usr/pkg/include/librevenge-0.0 -isystem /usr/pkg/include/libwpd-0.10 -isystem /usr/pkg/include/libvisio-0.1 -isystem /usr/pkg/include/libcdr-0.1 -isystem /usr/pkg/include/gtkmm-3.0 -isystem /usr/pkg/lib/gtkmm-3.0/include -isystem /usr/pkg/include/giomm-2.4 -isystem /usr/pkg/lib/giomm-2.4/include -isystem /usr/pkg/include/glibmm-2.4 -isystem /usr/pkg/lib/glibmm-2.4/include -isystem /usr/pkg/include/sigc++-2.0 -isystem /usr/pkg/lib/sigc++-2.0/include -isystem /usr/pkg/include/gtk-3.0 -isystem 
/usr/pkg/include/gdk-pixbuf-2.0 -isystem /usr/pkg/include/gio-unix-2.0 -isystem /usr/pkg/include/libdrm -isystem /usr/pkg/include/atk-1.0 -isystem /usr/pkg/include/at-spi2-atk/2.0 -isystem /usr/pkg/include/dbus-1.0 -isystem /usr/pkg/lib/dbus-1.0/include -isystem /usr/pkg/include/at-spi-2.0 -isystem /usr/pkg/include/cairomm-1.0 -isystem /usr/pkg/lib/cairomm-1.0/include -isystem /usr/pkg/include/pangomm-1.4 -isystem /usr/pkg/lib/pangomm-1.4/include -isystem /usr/pkg/include/atkmm-1.6 -isystem /usr/pkg/lib/atkmm-1.6/include -isystem /usr/pkg/include/gtk-3.0/unix-print -isystem /usr/pkg/include/gdkmm-3.0 -isystem /usr/pkg/lib/gdkmm-3.0/include -O2 -g -fPIC -D_FORTIFY_SOURCE=2 -fstack-check -pthread -I/usr/pkg/include -I/usr/include -I/usr/pkg/include/freetype2 -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/harfbuzz -I/usr/pkg/include/python3.10 -I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -DG_DISABLE_ASSERT -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DGLIBMM_DISABLE_DEPRECATED -DGTKMM_DISABLE_DEPRECATED -DGDKMM_DISABLE_DEPRECATED -DGTK_DISABLE_DEPRECATED -DGDK_DISABLE_DEPRECATED -fstack-protector-strong -Werror=format -Werror=format-security -Werror=ignored-qualifiers -Werror=return-type -Wno-switch -Wstrict-null-sentinel -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT 
-D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -D_REENTRANT -pthread -D_REENTRANT -D_REENTRANT -DNDEBUG -fPIC -pthread -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/inkscape_base.dir/actions/actions-edit.cpp.o -MF CMakeFiles/inkscape_base.dir/actions/actions-edit.cpp.o.d -E /scratch/graphics/inkscape/work/inkscape-1.2.1_2022-07-14_9c6d41e410/src/actions/actions-edit.cpp Cheers, Thomas
Re: panic in evo_wait
Hi Matt! On Mon, Jul 18, 2022 at 01:53:49PM +1000, Matthew Green wrote: > > [184218.xxx] warning: > > /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: > > 1 > > can you patch this code to print the value of "data" here? > it's probably a bad request for userland, but the BUG_ON() > here does not give you any indication on _what_. Ok, I'll use the attached diff for my next kernel. > > [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e > > [184218.xxx] fatal page fault in supervisor mode > > [184218.xxx] trap type 6 code 0x2 ... > > this line's contents would have included the fault address, > which is kinda useful for next time :-) I've got the rip -- it's 0x8095e177. > > [184218.xxx] curlpw 0xa8d4e6f36500 pid 27414.3207 lowest kstrack > > 0xb589296452c0 > > kernel: page fault trap, code=0 > > Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2 > > 000,0(%rdx,%rax,1) > > evo_wait() at netbsd:evo_wait+0x7b > > base507c_ntfy_set() > > nv50_wndw_flush_set() > > nv50_disp_atomic_commit_tail() > > nv50_disp_atomic_commit() > > drm_atomic_helper_set_config() > > drm_mode_setcrtc() > > drm_ioctl() > > can you find out where evo_wait+0x7b is? in my kernel it's > at line 243, and the disasm seems to patch your "movl" above. > > 235 evo_wait(struct nv50_dmac *evoc, int nr) > 236 { > 237 struct nv50_dmac *dmac = evoc; > 238 struct nvif_device *device = dmac->base.device; > 239 u32 put = nvif_rd32(>base.user, 0x) / 4; > 240 > 241 spin_lock(>lock); > 242 if (put + nr >= (PAGE_SIZE / 4) - 8) { > 243 dmac->ptr[put] = 0x2000; > 244 evo_flush(dmac); > > Dump of assembler code for function evo_wait: >0x8084dfe1 <+0>: push %rbp > [...] 
>0x8084e05c <+123>: movl $0x2000,(%rdx,%rax,1) > > (0x7b = 123) exactly: (gdb) 241 spin_lock(>lock); 242 if (put + nr >= (PAGE_SIZE / 4) - 8) { 243 dmac->ptr[put] = 0x2000; 244 evo_flush(dmac); 245 246 nvif_wr32(>base.user, 0x, 0x); 247 if (nvif_msec(device, 2000, 248 if (!nvif_rd32(>base.user, 0x0004)) 249 break; 250 ) < 0) { (gdb) info line *(evo_wait+0x7b) Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c" starts at address 0x8095e170 and ends at 0x8095e17e . which also matches the rip: (gdb) info line *(0x8095e177) Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c" starts at address 0x8095e170 and ends at 0x8095e17e . > probably "dmac->ptr" is invalid here. a quick guess at the > code indicates it's only set once in nv50_dmac_create(), > the source from the caller(s). at least, i can't see it > set anywhere else right now. Thomas Index: nouveau_nvkm_engine_disp_headgf119.c === RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c,v retrieving revision 1.2 diff -u -r1.2 nouveau_nvkm_engine_disp_headgf119.c --- nouveau_nvkm_engine_disp_headgf119.c18 Dec 2021 23:45:35 - 1.2 +++ nouveau_nvkm_engine_disp_headgf119.c18 Jul 2022 18:36:47 - @@ -80,7 +80,7 @@ case 0: state->or.depth = 18; break; /*XXX: "default" */ default: state->or.depth = 18; - WARN_ON(1); + WARN_ON(data); break; } }
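As a sanity check on the offsets discussed above, the arithmetic can be redone in plain sh. This is only a sketch: the addresses are copied as they appear in this message (they may be truncated in the archive), and the helper name in_range is made up for illustration.

```sh
#!/bin/sh
# Values copied from the gdb output quoted in this message.
offset=0x7b
line_start=0x8095e170
line_end=0x8095e17e
rip=0x8095e177

# evo_wait+0x7b in decimal, to compare with the <+123> in the disassembly.
offset_dec=$(($offset))
echo "evo_wait+$offset = evo_wait+$offset_dec"

# in_range ADDR START END: succeeds if START <= ADDR < END.
# (hypothetical helper, not part of any NetBSD tool)
in_range() {
    [ $(($1)) -ge $(($2)) ] && [ $(($1)) -lt $(($3)) ]
}

if in_range "$rip" "$line_start" "$line_end"; then
    echo "rip $rip falls inside the address range gdb reports for line 243"
fi
```

Both checks agree with what gdb reports: the faulting instruction belongs to the store to dmac->ptr[put].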
panic in evo_wait
Hi! Yesterday I had a panic on 9.99.98/amd64 from June 22 while playing a couple of videos using mpv. Hand-transcribed from the console: [184197.xxx] nouveau0: error: bus: MMIO read of FAULT at 409800 [TIMEOUT ] [184199.xxx] nouveau0: warn: timeout [184199.xxx] nouveau0: error: gr: init failed, -16 [184201.xxx] nouveau0: warn: timeout [184203.xxx] nouveau0: warn: timeout [184205.xxx] nouveau0: warn: timeout [184207.xxx] nouveau0: warn: timeout [184209.xxx] nouveau0: warn: timeout [184211.xxx] nouveau0: warn: timeout [184213.xxx] nouveau0: warn: timeout [184215.xxx] nouveau0: warn: timeout [184218.xxx] nouveau0: warn: timeout [184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1 [184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1 [184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1 [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e [184218.xxx] fatal page fault in supervisor mode [184218.xxx] trap type 6 code 0x2 ... [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0 kernel: page fault trap, code=0 Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2 000,0(%rdx,%rax,1) evo_wait() at netbsd:evo_wait+0x7b base507c_ntfy_set() nv50_wndw_flush_set() nv50_disp_atomic_commit_tail() nv50_disp_atomic_commit() drm_atomic_helper_set_config() drm_mode_setcrtc() drm_ioctl() drm_ioctl_shim() sys_ioctl() syscall() --- syscall (number 54) --- Does this ring a bell with anyone? Should I file a PR? Thomas
savecore weirdness
Hi! For the last few weeks, savecore has tried to write out a crashdump on every reboot, but it fails at the end, so the next boot tries again. savecore: writing compressed core to ... savecore: writing compressed kernel to ... savecore: kvm_read ksymcs: _kvm_kvatop(e9x031814c8cf8c7) savecore: (null): Bad address /etc/rc.d/savecore exited with code 1 That looks like a bug in savecore. It shouldn't misbehave, no matter what the crash data looks like. I've tried overwriting the first 100MB of the 'dp' entry in my fstab with zeroes in the hope of getting rid of the crashdump, but that didn't help either. How can I get rid of the crashdump so that savecore doesn't try to write it out again? Thanks, Thomas
Panic in uvm_map_findspace (was Re: 9.99.98: spontaneous reboots)
> On 23.06.2022, at 09:34, Thomas Klausner wrote: > > On Tue, Jun 21, 2022 at 11:04:03PM +0200, Thomas Klausner wrote: >> I've been running a 9.99.97 kernel from June 1 and it was stable, >> including bulk builds. Today I upgraded to 9.99.98 and started a fresh >> bulk build, and it rebooted after a couple hours, nothing in dmesg or >> syslog, no crashdump. >> >> I restarted the bulk build and it rebooted again after about 5 minutes >> and one finished package. I think it was building nodejs and >> webkit-gtk at the time (and some third package I don't know). >> >> Has anyone else seen stability issues with today's current? > > Thanks for the feedback. > > I think you were lucky :) or we have different hardware. > > I've tried bulk building with 98 for a third time and had a reboot > shortly afterwards. Going back to 97 from June 1, I could finish a > bulk build. > > I've locally backed out the uvm changes from early June and will > report back if that brought back stability. > With the UVM changes from early June backed out, the system was stable. I had ddb_onpanic=0; I switched to ddb_onpanic=1, and after some time with a 9.99.98 kernel I got this: panic: kernel diagnostic assertion "!topdown || hint <= orig_hint" failed: file "/usr/src/sys/uvm/uvm_map.c", line 1795 map=0xc8415a1d9388 hint=0xfff944a0 orig_hint=0x8200 length=0x77e00 uobj=0x0 uoffset=0xf align=0 flags=0x80710 entry=0xc8415a1d93e0 (uvm_map_findspace line 1998) cpu0: begin traceback vpanic() kern_assert() uvm_findspace_invariants() at netbsd:uvm_findspace_invariants+0x83 uvm_map_findspace() at netbsd:uvm_map_findspace+0x1c6 uvm_map_prepare() at netbsd:uvm_map_prepare+0x1f6 uvm_map() at netbsd:uvm_map+0x70 uvm_mmap.part.0() at netbsd:uvm_mmap.part.0+0x15a sys_mmap() at netbsd:sys_mmap+0x42f syscall() number 197 (Hand-copied with autocorrection, but I have pictures if you want to verify something.) Thomas
Re: 9.99.98: spontaneous reboots
On Tue, Jun 21, 2022 at 11:04:03PM +0200, Thomas Klausner wrote: > I've been running a 9.99.97 kernel from June 1 and it was stable, > including bulk builds. Today I upgraded to 9.99.98 and started a fresh > bulk build, and it rebooted after a couple hours, nothing in dmesg or > syslog, no crashdump. > > I restarted the bulk build and it rebooted again after about 5 minutes > and one finished package. I think it was building nodejs and > webkit-gtk at the time (and some third package I don't know). > > Has anyone else seen stability issues with today's current? Thanks for the feedback. I think you were lucky :) or we have different hardware. I've tried bulk building with 98 for a third time and had a reboot shortly afterwards. Going back to 97 from June 1, I could finish a bulk build. I've locally backed out the uvm changes from early June and will report back if that brought back stability. Thomas
9.99.98: spontaneous reboots
Hi! I've been running a 9.99.97 kernel from June 1 and it was stable, including bulk builds. Today I upgraded to 9.99.98 and started a fresh bulk build, and it rebooted after a couple hours, nothing in dmesg or syslog, no crashdump. I restarted the bulk build and it rebooted again after about 5 minutes and one finished package. I think it was building nodejs and webkit-gtk at the time (and some third package I don't know). Has anyone else seen stability issues with today's current? Thomas
Re: scp -r incompatibility between -current and NetBSD releases
On Sat, Jun 11, 2022 at 08:48:10AM -0700, Brian Buhrow wrote: > Hello. What version of openssh are you using? I just tested between > NetBSD-5.2 and > -current as of 99.77. Those versions are: > 5.2: OpenSSH_5.0 NetBSD_Secure_Shell-20080403-hpn13v1 > 99.77: OpenSSH_8.4 NetBSD_Secure_Shell-20201204-hpn13v14-lpk, > > your command, with a nested directory, works in both directions between these > two machines > without an issue. OpenSSH_9.0 NetBSD_Secure_Shell-20220415-hpn13v14-lpk, OpenSSL 1.1.1n 15 Mar 2022 Thomas
scp -r incompatibility between -current and NetBSD releases
Hi! I cannot use 'scp -r' from -current to NetBSD 8 or NetBSD 9. > scp -r a target: scp: realpath ./a: No such file scp: upload "./a": path canonicalization failed scp: failed to upload directory a to . scp without -r still works fine. Is there a compatibility setting I can enable to make this work again? Thomas
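One thing that may be worth trying: OpenSSH 9.0 changed scp to use the SFTP protocol by default and added the -O option to select the legacy scp protocol instead. Whether that is what breaks 'scp -r' against NetBSD 8 and 9 here is an assumption, not something confirmed in this message. A minimal sketch, where scp_legacy_r is a made-up wrapper name and the SCP variable exists only so the command can be dry-run:

```sh
#!/bin/sh
# SCP may be overridden (e.g. SCP=echo) to see the command without running it.
SCP=${SCP:-scp}

# scp_legacy_r: hypothetical wrapper, not part of OpenSSH.
#   -O  use the legacy scp protocol instead of SFTP (OpenSSH 9.0 and later)
#   -r  copy directories recursively
scp_legacy_r() {
    "$SCP" -O -r "$@"
}

# Example, using the names from the report above:
#   scp_legacy_r a target:
```

If the copy works with -O, that would point at the SFTP-based transfer mode rather than at scp -r itself.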
Re: nouveau: back in text console after switch to graphical one
Did either of you install any firmware files? Which firmware file is loaded? Thomas
Re: nouveau: back in text console after switch to graphical one
On Wed, Jun 08, 2022 at 04:21:16PM +0200, Cygnus X-1 wrote: > On 22/06/08 06:58AM, Paul Goyette wrote: > > Yup. At least with 9.99.97 my nouveau is running great on my Geforce > > GTX 1050 Ti > > Thanks a lot for the precious feedback. > Alright, I guess it's time to upgrade to 9.99.97 and report back. Well, I was already on 9.99.97 from June, so that's not enough for me. Thomas
Re: boot.cfg syntax question
On Sun, Jun 05, 2022 at 12:53:09AM +, RVP wrote: > On Sun, 5 Jun 2022, Thomas Klausner wrote: > > > However, when I press '3' in that config, I get a kernel where nouveau > > is disabled. > > > > Did I misunderstand the man page or is there a bug here? > > > > Looks like a bug when a bare `boot' is encountered. Work around it by > forcing a kernel filename: > > --- boot.cfg.orig 2022-06-05 00:48:51.47679 + > +++ boot.cfg2022-06-05 00:49:18.797459000 + > @@ -1,6 +1,6 @@ > -menu=Boot without nouveau:rndseed /var/db/entropy-file;userconf disable > nouveau*;boot > +menu=Boot without nouveau:rndseed /var/db/entropy-file;userconf disable > nouveau*;boot /netbsd > menu=Boot old without nouveau:rndseed /var/db/entropy-file;userconf disable > nouveau*;boot /netbsd.old > -menu=Boot normally:rndseed /var/db/entropy-file;boot > +menu=Boot normally:rndseed /var/db/entropy-file;boot /netbsd > menu=Boot single user:rndseed /var/db/entropy-file;boot -s > menu=Drop to boot prompt:prompt > default=1 Yes, this works around the issue for me. Thanks! Thomas
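The workaround in the diff above can also be applied mechanically. The following is only a sketch: force_kernel_name is a made-up helper, it assumes every affected menu line ends in a bare ";boot", and it writes to stdout so that the real /boot.cfg is never modified in place.

```sh
#!/bin/sh
# force_kernel_name [KERNEL]: hypothetical helper.
# Reads a boot.cfg on stdin and writes it to stdout, appending KERNEL
# (default /netbsd) to menu entries whose last command is a bare "boot".
# Entries such as "boot -s" or "boot /netbsd.old" are left untouched.
force_kernel_name() {
    kernel=${1:-/netbsd}
    sed -e "/^menu=/s|;boot\$|;boot $kernel|"
}

# Example, working on a copy for review before installing it:
#   force_kernel_name < /boot.cfg > /tmp/boot.cfg.new
#   diff -u /boot.cfg /tmp/boot.cfg.new
```

Reviewing the diff before replacing /boot.cfg seems prudent, since a broken boot menu is awkward to recover from.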