re: amdgpu laptops with 10 & current?
nia writes:
> The ThinkPad A485 looks pretty interesting for use with NetBSD.
[ ... ]
> - AMD Radeon Vega 6, 8 or 10
>
> Usually I prefer the smaller X series, but they've made them
> non-upgradable and harder to repair...
>
> ethernet is re0, this is different from the intel models that are
[ ... ]

i have an a495s that doesn't work so great, but i also have an
a475 that does work pretty well. the onboard re(4) works fine
for an re(4) on both (i have the dongle that rjs hinted at for
the 495s.)

my a475 has a12-9800B cpu (4c 2.7ghz, 3.6ghz turbo), and i think
it calls the GPU an "R7", it is an amdgpu and it works fine.
the a475 almost suspend/resumes properly. USB3 is broken
afterwards. i'm pretty happy with the a475, though it could
be faster.

the a495s amdgpu doesn't work for me, though the default fb is
good enough for basic X usage (firefox without video works.)
i haven't played as much with this because my system has a bad
battery and won't stay powered on unplugged.

what this means is .. an a485 may be somewhat broken, but perhaps
not as broken as the a495s is? :)

.mrg.
re: unable to boot 10.0/amd64
this might be the same as https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57153 it's the same faulting function and similar offset... .mrg.
new "compat" sets have really made sets harder to manage.
hiya.

the new compat32 sets rearrangement has broken the GCC 12 build,
due to dropping "gcc=10" tag in some places. that's a minor
issue, and i'll fix that soon (though having looked closer at
the first "grep -r" output below, i see most of these are
affected. i'll just initially be fixing arm64 and amd64.)

however, while looking at this i noticed that there's been a
major explosion in sets that shouldn't happen. compare matches
for "libasan.so.5.0" between new/old:

yesterday-when-i-was-mad distrib/sets/lists> grep -r asan.so.5.0 .
./base/shl.mi:./usr/lib/libasan.so.5.0 base-sys-shlib cxx,gcc=10
./debug/shl.mi:./usr/libdata/debug/usr/lib/libasan.so.5.0.debug comp-sys-debug debug,cxx,gcc=10
./base32/ad.aarch64:./usr/lib/eabi/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.aarch64:./usr/lib/eabihf/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mips64eb:./usr/lib/64/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mips64eb:./usr/lib/o32/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mips64el:./usr/lib/64/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mips64el:./usr/lib/o32/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mipsn64eb:./usr/lib/64/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mipsn64eb:./usr/lib/o32/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mipsn64el:./usr/lib/64/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.mipsn64el:./usr/lib/o32/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.powerpc64:./usr/lib/powerpc/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/ad.riscv64:./usr/lib/rv32/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/md.amd64:./usr/lib/i386/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./base32/md.sparc64:./usr/lib/sparc/libasan.so.5.0 base-compat-shlib compat,gcc,cxx
./debug32/ad.aarch64:./usr/libdata/debug/usr/lib/eabi/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.aarch64:./usr/libdata/debug/usr/lib/eabihf/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mips64eb:./usr/libdata/debug/usr/lib/64/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mips64eb:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mips64el:./usr/libdata/debug/usr/lib/64/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mips64el:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mipsn64eb:./usr/libdata/debug/usr/lib/n32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mipsn64eb:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mipsn64el:./usr/libdata/debug/usr/lib/n32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.mipsn64el:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.powerpc64:./usr/libdata/debug/usr/lib/powerpc/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/ad.riscv64:./usr/libdata/debug/usr/lib/rv32/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/md.amd64:./usr/libdata/debug/usr/lib/i386/libasan.so.5.0.debug comp-sys-debug debug,compat
./debug32/md.sparc64:./usr/libdata/debug/usr/lib/sparc/libasan.so.5.0.debug comp-sys-debug debug,compat

vs in an older tree:

yesterday-when-i-was-mad distrib/sets/lists> grep -r asan.so.5.0 .
./base/shl.mi:./usr/lib/libasan.so.5.0 base-sys-shlib compatfile,cxx,gcc=10
./debug/shl.mi:./usr/libdata/debug/usr/lib/libasan.so.5.0.debug comp-sys-debug debug,compatfile,cxx,gcc=10

ie, there are just *two* entries for this file (the real file,
and the debug file), and the rest is all derived from the
"compatfile" and "debug" tags. the new layout has 30 copies,
spread across a number of files, all of which will need editing
as future GCCs appear.

this is compounded across dozens of other files so there are
now hundreds or perhaps thousands of unnecessary duplicated
lines, in a couple of dozen files.

can someone please fix this? (nia is out for now, so maybe
some other enterprising person can help :)

thanks.

.mrg.
re: raidframe and gpt
Paul Goyette writes:
> Does anyone have an example of how to configure raid0 on a GPT disk?

these are my notes i refer to every so often:

   https://www.netbsd.org/~mrg/gpt-raid-setup.txt

it's gpt on each disk with type raid, which gives you dkN @ diskN,
you then create a raid with those dkNs, and then you create
another gpt on the raid device itself, with a ffs partition.
(see below; but skip the raidN.conf method, and just use the
newer raidctl create.)

> I can easily set the partition type with gpt, but how do I reserve
> space for the raid component label? Do I need to reserve that space?

note how i pick "-b 128" above to get my partitions aligned on
at least 64K boundaries. nvme/sata probably wants higher (check
your disk specs, it can vary a lot, and you could go as high as
6MB alignment to catch all known alignment...)

> Also, does raidframe understand the NAME=gpt-label syntax in the
> config file? Or does it require me to specify the particular dk ?
> (And what happens if something moves and changes?)

NAME= works. use autoconfig raid.. actually just use the new
in -current "raidctl create", since it does all the initial
setup and makes good default choices.

> It seems so much simpler to use ccd(4) but there's a nasty memory
> allocation bug which makes it unuseable for now.

you can't root-on-ccd like you can root-on-raidframe :-)
you could, using the same initrd method root-on-cgd uses.

.mrg.
re: rc.d start order
Paul Goyette writes:
> On Tue, 5 Mar 2024, Paul Goyette wrote:
>
> > I _think_ it will work correctly if I modify fstab to refer to
> > NAME=Builds instead of ccd0. I will update here after I confirm.
>
> Yes this seems to work.

this is very much preferred.

"ccd0" is the device. i suspect if you re-ran 'MAKEDEV ccd0'
you'd end up with a new /dev/ccd0 that is an alias for the
rawpart (c or d, d for amd64.)

so, perhaps the failure to run this and get a modern netbsd
device name present actually got you to use the right way of
talking to wedges :)

.mrg.
re: new BIND in 10.0_RC5/sparc dies w/Bus error
ah.

the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses C99 [] variable
array for the final member, which is later assigned to other
pointers, and this memory was now only 4-byte aligned.

this hack patch works to stop named crashing for me, but i'll
let christos figure out what the right general solution here is.

.mrg.

Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h	25 Feb 2024 15:47:24 -	1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h	5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
 	LINK(isc_nmhandle_t) active_link;
 #endif
 	void *opaque;
-	char extra[];
+	char extra[] __attribute__((__aligned__(8)));
 };
 
 typedef enum isc__netievent_type {
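
to illustrate the effect of the aligned attribute in the patch above, here is a minimal sketch. the struct names and members are hypothetical stand-ins, not the real isc_nmhandle layout, and uint32_t fields are used to model 4-byte pointers so the offsets come out the same on a 64-bit host. it assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for struct isc_nmhandle on a 32-bit target.
 * uint32_t members model 4-byte pointers; these are not the real fields.
 */
struct handle_unaligned {
	uint32_t a;
	uint32_t b;
	uint32_t opaque;	/* models the newly added pointer member */
	char extra[];		/* starts at offset 12: only 4-byte aligned */
};

struct handle_aligned {
	uint32_t a;
	uint32_t b;
	uint32_t opaque;
	/* same trick as the patch: pad so extra[] starts on an 8-byte boundary */
	char extra[] __attribute__((__aligned__(8)));
};

/*
 * Nonzero if an object stored at byte offset `off` inside an 8-byte
 * aligned allocation would itself be 8-byte aligned.
 */
int
offset_is_8_aligned(size_t off)
{
	return (off % 8) == 0;
}

/* Sanity check of the two layouts; aborts if the assumption is wrong. */
void
check_layouts(void)
{
	assert(!offset_is_8_aligned(offsetof(struct handle_unaligned, extra)));
	assert(offset_is_8_aligned(offsetof(struct handle_aligned, extra)));
}
```

anything later stored in extra[] that contains 64-bit members is only safe to access with 8-byte loads in the second layout.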
re: new BIND in 10.0_RC5/sparc dies w/Bus error
this appears to be a badly aligned structure issue.

i can reproduce it by doing "anita interact" with any recent
sparc .iso, editing the named.conf to start, starting named,
and doing 'dig ns netbsd.org' would trigger the crash.

the stack trace is:

(gdb) bt
#0  ns__client_request (handle=0xeb02d008, eresult=ISC_R_SUCCESS, region=, arg=)
    at /usr/10/src/external/mpl/bind/lib/libns/../../dist/lib/ns/client.c:1825
#1  0xedb0dc80 in isc__nm_async_readcb (worker=0x0, ev0=0xeccf7ad4)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2914
#2  0xedb0dde0 in isc__nm_readcb (sock=0xecfe8808, uvreq=0xeb0b6008, eresult=ISC_R_SUCCESS)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2887
#3  0xedb1183c in udp_recv_cb (handle=, nrecv=53, buf=0xeccf7c54, addr=, flags=0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/udp.c:653
#4  0xedb3aec8 in uv__udp_recvmsg (handle=0xecfe89f8)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:303
#5  uv__udp_io (loop=, w=0xecfe8a38, revents=1)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:178
#6  0xedb3a034 in uv__io_poll (loop=0xecf62810, timeout=)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/kqueue.c:390
#7  0xedb431a0 in uv_run (loop=0xecf62810, mode=UV_RUN_DEFAULT)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/core.c:406
#8  0xedb106ec in nm_thread (worker0=0xecf62808)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:704
#9  0xedb20f44 in isc__trampoline_run (arg=0xecf36be0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/trampoline.c:192
#10 0xed9ecda8 in pthread__create_tramp (cookie=0xecf7b000)
    at /usr/10/src/lib/libpthread/pthread.c:595

and the problem is that in ns__client_request(), we end up with:

(gdb) p client
$17 = (ns_client_t *) 0xeb02d144

but the alignment requirement for this structure is 8-bytes as
it has 64-bit members. the fault actually occurs when reading
two 4-byte members in one instruction:

1825		env = client->manager->aclenv;
1826		if (client->sctx->blackholeacl != NULL &&
   0x00036e70 <+408>:	ldd  [ %l6 + 0x10 ], %g2

"sctx" and "manager" are at offsets 0x10 and 0x14 and can both
be read with a single ldd (64-bit load) but this requires
correct alignment.

i didn't track down how this client value is allocated, it's
all via some opaque handle thing in the libraries, but this is
a bug in the new bind not allocating structures properly aligned.

.mrg.
re: new BIND in 10.0_RC5/sparc dies w/Bus error
actually, i found a core file in /var/chroot/named/etc/namedb/named.core. my build is missing debug info so i don't have a good idea what. .mrg.
re: new BIND in 10.0_RC5/sparc dies w/Bus error
> Unfortunately there was no core dump. this is almost certainly because /var/chroot/named is not writeable by user named, which is on purpose. you can set the corefile path for this process after it starts using sysctl proc.$pid.corename. i think setting to "/var/tmp/%n.core" should allow it to write to /var/tmp in the chroot. .mrg.
re: Removing a superfluous warning from xf86-input-ws/dist/src/ws.c
> On Mon 05 Feb 2024 at 10:18:09 +1100, matthew green wrote: > > perhaps convert into a DBG(4, ...)? > > On Mon 05 Feb 2024 at 02:20:25 +0300, Valery Ushakov wrote: > > May be make it reported only once, so that the message is still there > > in the log, but it's not spammed uselessly, adding no new information? > > I think I like the second suggestion slightly better, so I'll go with > that. I'll do a test build first, even though it seems trivial. I didn't > do a build in a while anyway... i like this. thanks. .mrg.
re: Removing a superfluous warning from xf86-input-ws/dist/src/ws.c
> if (hscroll || vscroll) {
>     xf86Msg(X_WARNING, "%s: hscroll=%d, vscroll=%d\n",
>         pInfo->name, hscroll, vscroll);
[ ... ]
> This touchpad method is not supported by the xf86-input-mouse driver so
> with that one the touchpad doesn't scroll.
>
> Shall I just remove the warning?

perhaps convert into a DBG(4, ...)?

it certainly shouldn't be generating a log flood, so downgrade
or removal is the right answer.

thanks.

.mrg.
re: unlink_if_ordinary undefined...
> = note: ld: /usr/libexec/liblto_plugin.so: error loading plugin: > /usr/libexec/liblto_plugin.so: Undefined PLT symbol > "unlink_if_ordinary" (symnum = 47) this part should be fixed now. probably needs a pullup.. .mrg.
re: Update ARFLAGS?
Thomas Klausner writes: > Hi! > > As noted in PR 57565, the default ARFLAGS in share/mk/sys.mk are > broken - they use 'l' which changed behaviour between binutils 2.34 > and 2.39. > > Ok to commit the change? > > (This broke the build of ruby-nokogiri recently, which is how I > noticed.) the change? removing 'l'? yes... though i still find it pretty offensive that it changed behaviour now. it was an ignored option before so, removing it is the right change. thanks. .mrg.
re: gcc 12 question
Patrick Welche writes: > On Thu, Nov 23, 2023 at 12:31:34PM +, Robert Swindells wrote: > > > > Patrick Welche wrote: > > > I'm trying to build a release on amd64 using > > > > > > HAVE_MESA_VER=21 > > > HAVE_GCC=12 > > > > What does pkgsrc graphics/MesaLib do if built using gcc 12? > > It builds OK. > > Given > > https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716 > > my guess is that the pkgsrc package doesn't treat warnings as errors. > (-Werror=stringop-overread) this looks wrong to me (the warning, as you pointed out in your original mail, the code appears fine), and the right workaround is to use ${CC_WNO_STRINGOP_OVERREAD} to avoid it. thanks. .mrg.
re: Aquantia AQC100 issues
Rin Okuyama writes:
> Hi Andrius,
>
> If you still have this AQC100 in working condition, can you try this patch?
>
> https://gist.github.com/rokuyama/ab6ba1a0fac7fa15f243d63a99e14f33
>
> I've collected three fibre aq(4) variants (all rev 2), and link status
> interrupts work just fine for me. I think that link intr did not work for
> you, not due to fibre variant, but hardware revision. If this is correct,
> the patch above should work...

this reminded me that my aq(4) doesn't have working link and
that mlelstv suggested to me that the linux driver always uses
a tick timer to also check status, as well as interrupts.

i implemented this recently and now my aq(4) has link status
correctly:

   aq(4): always poll for link status

   some devices don't have working link status and rather than
   have a likely incomplete list of issues, always poll as well
   as use the interrupt if possible.

   fixes link status on this device:
     aq0 at pci5 dev 0 function 0: Aquantia AQC107 10 Gigabit Network Adapter (rev. 0x02)
     aq0: Atlantic revision B1, F/W version 3.1.88
   (was otherwise functional, just didn't report status, which
   likely meant eg, dhcpcd would be upset?)

   idea via mlelstv@ from linux.

   remove sc_detect_linkstat and rename sc_poll_linkstat to
   sc_no_link_intr, as the meaning has changed. simplify the
   signature for aq_setup_msix() and aq_establish_msix_intr(),
   removing forward decls that aren't required. obsolete
   AQ_FORCE_POLL_LINKSTAT.

Index: if_aq.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_aq.c,v
retrieving revision 1.45
diff -p -u -r1.45 if_aq.c
--- if_aq.c	29 May 2023 08:00:05 -	1.45
+++ if_aq.c	26 Oct 2023 06:55:28 -
@@ -1330,8 +1330,7 @@ struct aq_softc {
 	int sc_rx_irq[AQ_RSSQUEUE_MAX];
 	int sc_linkstat_irq;
 	bool sc_use_txrx_independent_intr;
-	bool sc_poll_linkstat;
-	bool sc_detect_linkstat;
+	bool sc_no_link_intr;
 
 #if NSYSMON_ENVSYS > 0
 	struct sysmon_envsys *sc_sme;
@@ -1443,11 +1442,9 @@ static int aq_match(device_t, cfdata_t,
 static void aq_attach(device_t, device_t, void *);
 static int aq_detach(device_t, int);
 
-static int aq_setup_msix(struct aq_softc *, struct pci_attach_args *, int,
-    bool, bool);
+static int aq_setup_msix(struct aq_softc *, struct pci_attach_args *);
 static int aq_setup_legacy(struct aq_softc *, struct pci_attach_args *,
     pci_intr_type_t);
-static int aq_establish_msix_intr(struct aq_softc *, bool, bool);
 
 static int aq_ifmedia_change(struct ifnet * const);
 static void aq_ifmedia_status(struct ifnet * const, struct ifmediareq *);
@@ -1784,67 +1781,57 @@ aq_attach(device_t parent, device_t self
 	if (msixcount >= (sc->sc_nqueues * 2 + 1)) {
 		/* TX intrs + RX intrs + LINKSTAT intrs */
 		sc->sc_use_txrx_independent_intr = true;
-		sc->sc_poll_linkstat = false;
 		sc->sc_msix = true;
 	} else if (msixcount >= (sc->sc_nqueues * 2)) {
 		/* TX intrs + RX intrs */
 		sc->sc_use_txrx_independent_intr = true;
-		sc->sc_poll_linkstat = true;
 		sc->sc_msix = true;
 	} else
 #endif
 	if (msixcount >= (sc->sc_nqueues + 1)) {
 		/* TX/RX intrs LINKSTAT intrs */
 		sc->sc_use_txrx_independent_intr = false;
-		sc->sc_poll_linkstat = false;
 		sc->sc_msix = true;
 	} else if (msixcount >= sc->sc_nqueues) {
 		/* TX/RX intrs */
 		sc->sc_use_txrx_independent_intr = false;
-		sc->sc_poll_linkstat = true;
+		sc->sc_no_link_intr = true;
 		sc->sc_msix = true;
 	} else {
 		/* giving up using MSI-X */
 		sc->sc_msix = false;
 	}
 
-	/* on AQ1a0, AQ2, or FIBRE, linkstat interrupt doesn't work? */
-	if (aqp->aq_media_type == AQ_MEDIA_TYPE_FIBRE ||
-	    (HWTYPE_AQ1_P(sc) && FW_VERSION_MAJOR(sc) == 1) ||
-	    HWTYPE_AQ2_P(sc))
-		sc->sc_poll_linkstat = true;
-
-#ifdef AQ_FORCE_POLL_LINKSTAT
-	sc->sc_poll_linkstat = true;
-#endif
-
 	aprint_debug_dev(sc->sc_dev,
 	    "ncpu=%d, pci_msix_count=%d."
 	    " allocate %d interrupts for %d%s queues%s\n",
 	    ncpu, msixcount,
 	    (sc->sc_use_txrx_independent_intr ?
 	    (sc->sc_nqueues * 2) : sc->sc_nqueues) +
-	    (sc->sc_poll_linkstat ? 0 : 1),
+	    (sc->sc_no_link_intr ? 0 : 1),
 	    sc->sc_nqueues,
 	    sc->sc_use_txrx_independent_intr ? "*2" : "",
-	    sc->sc_poll_linkstat ? "" : ", and link status");
+	    (sc->sc_no_link_intr) ? "" : ", and link status");
 
 	if (sc->sc_msix)
-		error = aq_setup_msix(sc, pa, sc->sc_nqueues,
-		    sc->sc_use_txrx_independent_intr, !sc->sc_poll_linkstat);
+		error = aq_setup_msix(sc, pa);
 	else
re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed
i'm pretty sure i've solved this properly this attempt, but review on this change would be appreciated. https://www.netbsd.org/~mrg/if_rge.c.v3.diff it includes a potential way to avoid wm(4) calling panic() if bus_dmamap_load*() fails.. .mrg.
re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed
> hmmm, but in this case, no buffers should be set to
> be available for rx, so nothing should pass RGE_OWN() at
> L1245 i'd hope. i still see the problem with everything
> being depleted, but then it should just stop getting any
> rx packets at all...
>
> networking folks, am i missing something here? i see the
> same problem in wm(4) as well. if wm_add_rxbuf() fails,
> where will this ring entry's mbuf ever be replaced again?

i see the thing i missed. i was looking at openbsd if_rge.c
1.16, which m_free()s the mbuf in this case, which in our tree
has nothing that would refill it, but our if_rge.c has this
comment:

	 * If allocating a replacement mbuf fails,
	 * reload the current one.

which means that when we have a mbuf allocation error, we
basically drop the current packet, and leave the mbuf in place
ready for use next time. that means there is no mbuf leak in
our current code, and i think the only part of openbsd if_rge.c
1.16 we want is the if_ierrors++ (that we call
if_statinc(ifp, if_ierrors).)

i think i see the problem (no, really, this time :-).

when we have a memory failure, we don't re-load the map after
bus_dmamap_unload(), so that's why it has zero size.

the fix isn't simple because the load of the new mbuf can fail,
and then we want to reload the old one, but it was the load
event that failed, why would it work again for the old mbuf now?

seems like we need to have a (very short) timer that tries to
realloc it again, but i'm hoping someone else has solved this
problem and we can use their method..

.mrg.
re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed
> #3 0x80fe6e5f in kern_assert ()
> #4 0x8058be67 in bus_dmamap_sync ()
> #5 0x8044edc7 in rge_rxeof ()
> #6 0x804536fd in rge_intr ()

i'm pretty sure this is the 2nd bus_dmamap_sync() call, as
that's the only dma map that has load/unload applied at run
time, vs the init sequence only, and it implies to me that
rx dma map has had allocation failures to deplete the entire
ring of mbufs, and then there are no mappings in the dma map,
which leaves the dm_mapsize as 0, and triggers this bug.

if i'm right, what's happened is this:

1237	for (i = sc->rge_ldata.rge_rxq_considx; ; i = RGE_NEXT_RX_DESC(i)) {
1245		if (RGE_OWN(cur_rx))
1246			break;
1252		rxq = &sc->rge_ldata.rge_rxq[i];
1253		m = rxq->rxq_mbuf;
1257		/* Invalidate the RX mbuf and unload its map. */
1258		bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1259		    rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);
1260		bus_dmamap_unload(sc->sc_dmat, rxq->rxq_dmamap);
1283		 * If allocating a replacement mbuf fails,
1284		 * reload the current one.
1287		if (rge_newbuf(sc, i) != 0) {
1288			if (sc->rge_head != NULL) {
1289				m_freem(sc->rge_head);
1290				sc->rge_head = sc->rge_tail = NULL;
1291			}
1292			rge_discard_rxbuf(sc, i);
1293			continue;
1294		}

loop 'i' has the ability to range between 0 and 1023, and
accesses each ring entry's rge_rxq. if, over time, each value
between 0 and 1023 triggers the rge_newbuf() failure path,
each successive entry will be lost, never to be replaced
unless an explicit ifconfig down/up occurs.

hmmm, but in this case, no buffers should be set to
be available for rx, so nothing should pass RGE_OWN() at
L1245 i'd hope. i still see the problem with everything
being depleted, but then it should just stop getting any
rx packets at all...

networking folks, am i missing something here? i see the
same problem in wm(4) as well. if wm_add_rxbuf() fails,
where will this ring entry's mbuf ever be replaced again?

.mrg.
re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed
> panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file > "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0 this is from: KASSERTMSG(offset < map->dm_mapsize, "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE, offset, map->dm_mapsize); the mapsize being zero indicates that there's nothing mapped currently in this dma map, so there's nothing to sync. ie, the caller seems to be trying to sync something not mapped. can you post the full back trace? .mrg.
re: 10.99.9 amd64 panic
i just committed what i believe is a fix for this problem, and for another potential memory leak i saw from inspection. seems to work for me on an amd64 host, been through several down/up sequences, though i did not force the memory alloc failure directly. (annoyingly, it takes 10-11s to regain link to my switch when doing this down/up sequence.) i'll prepare a pullup for netbsd-10, too. .mrg.
re: 10.99.9 amd64 panic
Martin Husemann writes:
> On Fri, Sep 29, 2023 at 09:52:42AM +, Chavdar Ivanov wrote:
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9443196] panic: kernel diagnostic
> > assertion "offset < map->dm_mapsize" failed: file
> > "/home/sysbuild/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >=
> > 0x0
> [..]
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] bus_dmamap_sync() at
> > netbsd:bus_dmamap_sync+0x326
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] rge_rxeof() at
> > netbsd:rge_rxeof+0x179
>
> This is a bug in the rge(4) driver (unrelated to userland resource usage
> by the build), maybe a race triggered more easily when the system is
> under heavy load.

hmm, this seems like corruption to me.

> bus_dma.c", line 826 bad offset 0x0 >= 0x0

says that offset == 0 (which is right, this seems to be this call):

1241		/* Invalidate the RX mbuf and unload its map. */
1242		bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1243		    rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);

offset is the 0 / 3rd arg here, but the *second* 0x0 value here
seems to be corrupted, and shouldn't be zero. ie, there's no
case where it will create a zero-length dma map, it should
always be either RGE_TX_LIST_SZ, RGE_RX_LIST_SZ, or
RGE_JUMBO_FRAMELEN, so for this assert to trigger saying the
passed offset is beyond the mapping, because the mapping is
zero length, seems to be pretty clear that the bus_dmamap_t
has been corrupted.

the timing does seem to indicate that a problem with out of
memory may be relevant here..oh, i think i may see a problem.

1110 rge_newbuf(struct rge_softc *sc, int idx)
...
1126	if (bus_dmamap_load_mbuf(sc->sc_dmat, rxmap, m, BUS_DMA_NOWAIT))
1127		goto out;
...
1151 out:
1152	if (m != NULL)
1153		m_freem(m);
1154	return (ENOMEM);

so, if bus_dmamap_load_mbuf() fails, we return ENOMEM, not
ENOBUFS. however, the callers only consider ENOBUFS as an
error case:

1176 rge_rx_list_init(struct rge_softc *sc)
...
1184		if (rge_newbuf(sc, i) == ENOBUFS)
1185			return (ENOBUFS);

and

1212 rge_rxeof(struct rge_softc *sc)
...
1271		if (rge_newbuf(sc, i) == ENOBUFS) {

so in this case, the code thinks a buffer was allocated, but
it wasn't...

i haven't gone deeper into what this may cause the code to do
wrong yet, but it seems problematic. certainly, both callers
should check for != 0, not == ENOBUFS, to avoid this problem.

.mrg.
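
the mismatch between the error returned and the error checked can be shown with a small userland sketch. fake_newbuf() is a hypothetical stand-in for rge_newbuf(), not the driver code; it only models the two failure returns described above.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for rge_newbuf(): returns 0 on success,
 * ENOBUFS if the mbuf allocation fails, ENOMEM if the dma map
 * load fails (the "out:" path quoted above).
 */
int
fake_newbuf(int alloc_fails, int load_fails)
{
	if (alloc_fails)
		return ENOBUFS;
	if (load_fails)
		return ENOMEM;
	return 0;
}

/* The check the callers used: only ENOBUFS counts as a failure. */
int
caller_sees_failure_narrow(int err)
{
	return err == ENOBUFS;
}

/* The suggested check: any nonzero return counts as a failure. */
int
caller_sees_failure_broad(int err)
{
	return err != 0;
}
```

with the narrow check, a failed map load (ENOMEM) is treated as success and the ring entry is left with nothing loaded in its dma map; the broad check catches both failure modes.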
re: panic: assertion "!cpu_softintr_p()" failed
Thomas Klausner writes: > panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file > "/usr/src/sys/kern/subr_kmem.c", line 451 > > gdb says: > > #10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel %sassertion > \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xae2110a93e08) > at /usr/src/sys/kern/subr_prf.c:286 > #11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880 > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") > at /usr/src/sys/lib/libkern/kern_assert.c:51 > #12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at > /usr/src/sys/kern/subr_kmem.c:451 > #13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at > /usr/src/sys/kern/kern_rwlock_obj.c:127 > #14 0x80d825d3 in uvm_anon_release (anon=) at > /usr/src/sys/uvm/uvm_anon.c:385 i think this is a new bug. this line changed from: 1.11 (ad 12-Sep-23): pool_cache_put(rw_obj_cache, ro); to 1.12 (ad 23-Sep-23): kmem_free(ro, sizeof(*ro)); i guess it just should be kmem_free_intr(), as pool_cache is intr-safe as well. .mrg.
re: External display for ThinkPad W530
Malte Dehling writes:
> Dear all,
>
> is there a way to get an external display to work on a ThinkPad W530?
> From what I read, both the mini-dp and the vga connector work only
> with discrete graphics, which I have enabled in the BIOS
> (optimus/switching mode). At boot I see these lines:
>
> [ 4.991148] nouveau0: NVIDIA GK107 (0e73c0a2)
> [ 4.991148] nouveau0: autoconfiguration error: error: bios: unable
> to locate usable image
> [ 4.991148] nouveau0: autoconfiguration error: error: bios ctor failed,
> -22
> [ 4.991148] nouveau0: autoconfiguration error: unable to create
> nouveau device: 22
>
> Anyone know what the issue is? With BIOS set to discrete only I see
> the same lines and then a kernel panic (no console.)
>
> Running xrandr shows VGA1 as disconnected even with a cable plugged in.
>
> So 2 questions: 1) Do I really need to use the discrete graphics or is
> there some other way? 2) How to get discrete graphics to work.
>
> Any help appreciated :)

i have a thinkpad P51 that has the same basic issue. i've spent
many hours trying to figure out where the vbios for the nvidia
is, and haven't succeeded. here's the heavily patched boot log
from my system where it tries all the ways:

nouveau0: NVIDIA GM206 (126360a1)
nvbios_shadow:232: method [name=]
nvbios_shadow:232: method [name=PRAMIN]
shadow_method:129: trying PRAMIN...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=PROM]
shadow_method:129: trying PROM...
shadow_image:81: image 0 invalid
shadow_method:146: PROM: returning score 0
nvbios_shadow:232: method [name=ACPI]
shadow_method:129: trying ACPI...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=ACPI]
shadow_method:129: trying ACPI...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=PCIROM]
shadow_method:129: trying PCIROM...
linux_pci_map_rom:716: starting..
linux_pci_map_rom:722: mapped!
pci_find_rom:686: size 524288
pci_find_rom:705: magic wrong is 2
linux_pci_map_rom:736: failed!
pci_map_rom_md:684: entered
pci_map_rom_md:687: is display
shadow_method:133: init gave err -14
nvbios_shadow:232: method [name=PLATFORM]
shadow_method:129: trying PLATFORM...
shadow_method:133: init gave err -19
nouveau0: autoconfiguration error: error: bios: unable to locate usable image
nouveau0: autoconfiguration error: error: bios ctor failed, -22
nouveau0: autoconfiguration error: unable to create nouveau device: 22

this one seems to be missing it, but at some point i'd patched
the acpi nouveau code to try and it still failed (the logs above
may appear to show it, but the code isn't in that tree.)

AFAIK, the external ports on these laptops are only connected
to the nvidia GPU so it is absolutely necessary in order to use
anything but the built-in display.

i'd love someone to figure this out :-)

.mrg.
re: Netbsd10_beta evbarm aarch64 userland build failure
> nbmtree: .: missing directory in specification > nbmtree: failed at line 1 of the specification there must be something wrong in your build tree or src tree. i updated and built this from a clean tree fine. this should have been fixed with this pullup: revision 1.175.2.1 date: 2023-09-04 10:33:28 -0700; author: martin; state: Exp; lines: +3 -1; commitid: 2TUS7rO7f7zuGtDE; Pull up following revision(s) (requested by riastradh in ticket #343): to etc/mtree/special. try making sure this is properly updated, and perhaps clean the objdir for etc/mtree and/or the destdir entirely. .mrg
re: panic with AMD EPYC 7313P on 10.0_BETA
Mark Davies writes:
> Trying to boot a Dell Power Edge R6515 that has an AMD EPYC 7313P with
> 10.0_BETA from a couple of day ago panics with:
>
>
> panic: kernel diagnostic assertion "rcr4() & CR4_SMAP" failed: file
> "...sys/arch/x86/x86/patch.c"
>
> backtrace of:
> vpanic()
> kern_assert()
> x86_patch()
> cpu_boot_secondary_processors()
> main()
>
> Any suggestions what's going on and how to fix?

perhaps there's a bios setting you have to enable?

i've seen this before, and i just #if 0'd the panic since i
didn't have time to think about it then. at worst, #if 0
will work around it while missing out on some modern security
features.

.mrg.
re: modesetting vs intel in 10.0
> [ 1.051227] i915drmkms: preliminary hardware support disabled

this is a combo of the driver data for tiger lake (11th gen)
having "require_force_probe" set to 1 (our drm base), and the
netbsd probe code seeing this set and not matching properly.

there's nothing you're doing wrong, it just isn't enabled (it
may not work, i don't know.)

if you want to try, edit sys/external/bsd/drm2/i915drm/i915_pci_autoconf.c
to disable the check at line 111.

.mrg.
re: MKCROSSGDB=yes broken in new gdb?
> I pass it in LDFLAGS=-L${GMPOBJ} ? this doesn't help gmp.h being missing... i don't know what is up and for me, it works because pkgsrc gmp is installed. .mrg. > christos > > > On Aug 13, 2023, at 2:41 PM, matthew green wrote: > > > > FWIW, when i was looking at why my build worked it seems that > > the build is thinking it's building against the tools gmp but > > the -I path to find it is missing, but -I/usr/pkg/include is > > so that for me i'm getting the host gmp.h, but it's linking > > the tools libgmp.a.
re: MKCROSSGDB=yes broken in new gdb?
FWIW, when i was looking at why my build worked it seems that the build is thinking it's building against the tools gmp but the -I path to find it is missing, but -I/usr/pkg/include is so that for me i'm getting the host gmp.h, but it's linking the tools libgmp.a.
re: What to do about "WARNING: negative runtime; monotonic clock has gone backwards"
one problem i've seen in kern_tc.c when the timecounter returns a smaller value is that tc_delta() ends up returning a very large (underflowed) value, and that makes the consumers of it do a very wrong thing. eg, -2 becomes 2^32-2, and then eg in binuptime:

   477         bintime_addx(bt, th->th_scale * tc_delta(th));

or in tc_windup():

   933         delta = tc_delta(th);
   938         th->th_offset_count += delta;
   939         bintime_addx(&th->th_offset, th->th_scale * delta);

i "fixed" the time goes backwards on sparc issue a few years ago with this change, which avoids the above issue:

   http://mail-index.netbsd.org/source-changes/2018/01/12/msg091064.html

but i really think that the way tc_delta() can underflow is a bad problem we should fix properly, i just wasn't sure of the right way to do it.

.mrg.
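A minimal C sketch of the underflow described in this message, for illustration only. sketch_delta() is a simplified stand-in for tc_delta() (it is not the kern_tc.c code), and sketch_delta_clamped() is a hypothetical guard that treats a delta in the upper half of the counter range as "the counter went backwards". That guard is an assumption made for discussion; it is not the change linked above and not a proposed fix.

```c
#include <stdint.h>

/*
 * Simplified stand-in for tc_delta(): the difference between the
 * current and the previous counter read, masked to the counter width.
 * Illustrative only.
 */
static uint32_t
sketch_delta(uint32_t now, uint32_t last, uint32_t mask)
{
	return (now - last) & mask;
}

/*
 * Hypothetical guarded variant: a delta in the upper half of the
 * counter range is assumed to mean the counter stepped backwards,
 * and is reported as zero instead of a huge forward jump.
 */
static uint32_t
sketch_delta_clamped(uint32_t now, uint32_t last, uint32_t mask)
{
	uint32_t delta = (now - last) & mask;

	return (delta > mask / 2) ? 0 : delta;
}
```

With a 32-bit mask, a counter that reads 98 after previously reading 100 gives sketch_delta() == 0xfffffffe, which is the "-2 becomes 2^32-2" case; any consumer that multiplies that by a scale factor then adds a very large amount of time. The clamped variant trades that for silently ignoring genuine deltas larger than half the counter range, which is one reason the right fix is not obvious.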
re: tweaks needed for 10 branch
can you try commenting/removing this line (@L44 in -current) in external/gpl3/gcc/usr.bin/Makefile.inc: CXXFLAGS+= -std=gnu++98 i started seeing at least the gcc.c failure with GCC 10.5, and it seems that the upstream build doesn't use this by default now, and removing it fixed the build for me. .mrg.
re: modesetting vs intel in 10.0
> But maybe modesetting is mature enough (and intel bad enough) > to warrant being the default for Intel GPUs. i'm not familiar with the various intel chipsets, i've only had a couple of them over the years and besides porting the kabylake bits into the older drm version, i've not really touched it much. but, you can adjust the list of drivers used by default here in the xorg-server sources: hw/xfree86/common/xf86pciBus.c:xf86VideoPtrToDriverList() where it has a "default:" case for intel of "intel", and if you can properly figure out how to change this to "modesetting" for the newer ones (only?) that would be fine by me. (one way to handle this without having to patch this code would be to install the intel driver as some other name, and then make a copy of the "ati" front end called "intel" that loads either the real intel driver or modesetting, depending.) .mrg.
re: cpu temperature readings
> > though NetBSD's cpu selection algorithm doesn't (yet anyway) really > > understand processors like this. > > The scheduler did use first cores first, with performance cores > using low cpu numbers, they should be utilized first but not > necessarily for the important workloads. > > It now handles big.little configurations independent of cpu numbers, > but probably only on arm. our scheduler has a fast/slow CPU method only, so it handles "HT" by saying the non-1st sibling is slow, and the 1st one attached is fast, and for big.little/dynamiq it just marks the big cores as fast and little cores as slow. it then prefers fast cores over slow cores, and it will typically select lower cpu numbers once within the fast/slow zone. eg, on rk3399, cpu4 and cpu5 are used first for most tasks as they're the big cores, and cpu0 ends up getting a lot of random interrupts, and cpu1-3 are idle unless you're using more than 3 cores of CPU. this means that the 3-level speed provided by the newer intel client cpus is not handled by our code, and i believe it means it will give up and not attempt anything special, and will thus just end up using cpu numbers. i had a look at converting the "bool cpu_is_slow" in cpu_data into an integer, but i didn't get far enough understanding all the current uses to properly know where to start. would be great if someone were to have a look at this. one hack to make things work "sort of OK", would be to allow this to have one thread of the p-cores as fast, and both the other thread and the e-cores as slow. .mrg.
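The selection rule described in this message, and the idea of turning the boolean into an integer, can be sketched roughly as below. Everything here is hypothetical: struct sketch_cpu, its speed_class field and sketch_pick_cpu() are made-up names for illustration and do not correspond to cpu_data or to the real scheduler code.

```c
#include <stddef.h>

/* Hypothetical per-cpu record; not NetBSD's struct cpu_data. */
struct sketch_cpu {
	int index;		/* cpu number */
	int speed_class;	/* assumed integer class, higher is faster */
	int idle;		/* non-zero if the cpu can take work */
};

/*
 * Pick an idle cpu: prefer the highest speed class, and within one
 * class prefer the lowest cpu number.  Returns -1 if nothing is idle.
 */
static int
sketch_pick_cpu(const struct sketch_cpu *cpus, size_t ncpu)
{
	const struct sketch_cpu *best = NULL;
	size_t i;

	for (i = 0; i < ncpu; i++) {
		const struct sketch_cpu *c = &cpus[i];

		if (!c->idle)
			continue;
		if (best == NULL ||
		    c->speed_class > best->speed_class ||
		    (c->speed_class == best->speed_class &&
		     c->index < best->index))
			best = c;
	}
	return (best != NULL) ? best->index : -1;
}
```

With two classes this reproduces the rk3399 behaviour described above (cpu4 and cpu5 picked before cpu0-3). An integer class would also leave room for a third level, though how that would interact with the existing users of the boolean is exactly the part that is not worked out here.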
re: How to recover a root partition with damaged boot blocks
things to do:

- reinstall bootxx_ffsvN -- make sure you're installing the right ffsvN. you can use "dumpfs | head -2", and it should say FFSv1 or FFSv2 here. that's "installboot" that you may have already done, but perhaps used the wrong one?

- re-copy /boot. cp /usr/mdec/boot /

- re-copy your /netbsd (wherever it came from)

- uefi wants a MSDOS partition with /efi/boot/bootx64.efi, so if you haven't provided that it won't work. if you have enough space at the start or end of the disk you probably can do this, as it only needs to be pretty tiny. i did this on a system where root started at sector 2048, and i was able to create a file system of about 700KB, and bootx64.efi is only about 230KB. it normally is ok with mbr _or_ gpt partitions here.

- check that the fdisk (gpt?) and disklabel are OK. ie, run both "fdisk wd0" and "disklabel wd0" and compare to your working system, see if anything stands out.

HTH,

.mrg.

ps see "man 7 entropy" for how to fix the problem you observed.
re: Failure to build amd64 current
> `./build.sh -j 6 -u -x -U -o -T ../obj/tooldir.NetBSD-9.3-amd64 release > install-image' have you tried without "-o"? that might be the trigger here. it should work, but maybe it's broken in the src/compat build. thanks. .mrg.
re: GENERIC64 aarch64 failure to autoboot
Chavdar Ivanov writes: > On Sat, 4 Mar 2023 at 23:30, Michael van Elst wrote: > > > > ci4...@gmail.com (Chavdar Ivanov) writes: > > > > >Since my last aarch64 build yesterday, 03/03/2023, my machine no > > >longer boots automatically, > > > > sys/arch/evbarm/fdt/fdt_machdep.c 1.100 > > > > changed how the boot disk is determined. Apparently it now fails for you. > > That's right, I rebuilt it with 1.99 and it now boots as before. > > I guess I'll file a pr. on the system that didn't auto-boot properly, can you answer the ask root prompt dk1 like it should, and once it is booted up, show the result of "drvctl -p dk1" and also "ofctl -p /chosen"? that should help narrow what's going wrong here. i'm guessing that netbsd,gpt* are wrong some how, but we'll see.. thanks. .mrg.
re: AMDGPU Driver patches/bugs
thanks for your patches and help, Jeff! Taylor R Campbell writes: > > Date: Tue, 21 Feb 2023 13:20:13 -0800 > > From: Jeff Frasca > > > > I was going to try the radeon driver again, because I want to see if > > my wayland compositor works better against it than the AMDGPU driver > > (I'm getting some weird corruption problems with my compositor that > > do not happen under Linux, but that's probably my code). > > We have seen other weird minor graphics corruption problems with X, > even with xcompmgr or picom running. I probably made another stupid > bug, maybe in cacheability attributes or something, buried somewhere > in the megabytes of diffs... i see corruption with radeon and bios boot on a ryzen 5600G system. (this is one that fails the ring3 (?) test with UEFI, and even with "CSM" in the bios enabled, still attempts to load our uefi boot program, which then fails cuz it's in BIOS mode and hangs the boot. with no msdosfs visible to UEFI it boots fine in CSM mode.) i do *not* recall seeing it on my older systems (haswell, earlier ryzen.) i'll try out my amdgpu's next week some time. .mrg.
re: Difference between i915drm and i915drmkms
the old drm code for i915 is probably extremely obsolete at this point. i don't think it works on anything that current does (or at least, before the latest refresh -- i think there are still a couple of blank screens, but i think newer than this code would support anyway.) the only reason i haven't removed it all is that for old radeon (R100/R200), some systems can't use new drm and you end up with both a black-on-black (or similarly unusable) console setup, and X doesn't work anyway. there's some problem with LUT setup in the current code, but there are no public docs and no one with access to them cares. removing from configs is probably a decent idea at this point. kre, this drm hasn't been the "main" drm since july 2013, we've had linux 3.8, 4.4, and now 5.6 based drm (all have the same failure mode.) .mrg.
re: binutils still failing on amd64
Robert Elz writes: > I wonder if perhaps part of the reason (or perhaps all of it) that > Paul and I see problems, where others aren't, is that we are both > building from a read only mounted source tree. oh yeah - this is only going to break r/o src tree builds, which is also something i use as much as possible. i recommend r/o src trees for all netbsd src builds. my random build failed issues became far less common when i did that decades ago. > Eg: from Paul's error log: > > Making info in po >GEN > /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc/bfdver.texi > x86_64--netbsd-install: > /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc: > chown/chmod: Read-only file system > sh: cannot create > /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc/bfdver.texi: > read-only file system > > which indicates something is trying to make files in the source > tree, instead of the obj tree. this specific instance should now be fixed. > The errors I'm seeing are different, but could have the same underlying > cause. what are you seeing? can you update and post the latest failures? .mrg.
re: binutils still failing on amd64
> > Sources updated to 2022-12-31 at 13:42:04 UTC and all output dirs (obj, > > release, dist, tools) were cleaned. > > Is no-one else seeing this problem with ``build.sh tools'' ? it's not seen by most because it depends upon the timestamps of some files.. my first attempt to fix it failed, i haven't gotten back to looking. try manually touching any of the files the build is trying to update for now. .mrg.
re: 10_BETA: Nice QOL improvements to the installer
nia writes: > On Thu, Dec 22, 2022 at 08:05:02AM +0530, Mayuresh wrote: > > On Thu, Dec 22, 2022 at 06:18:41AM +1300, Lloyd Parkes wrote: > > > I used the second (non-BIOS) image because I guessed it might be a hybrid > > > installer. I think that my old NUCs only support BIOS booting from USB > > > sticks, but I could easily be wrong. > > > > Ok. So, it appears the -bios image has become redundant now. Or hasn't it? > > > > If yes, they may want to stop building it to preempt such confusion. > > The BIOS-only image exists because of broken firmware. i have a system that still seems to load efiboot when configured in CSM-enabled mode. the only way to get it to load bootxx/boot was to move bootx64.efi away. i just discovered that this week. when CSM-enabled, efiboot would then print the memory map and hang. this system is also affected by PR#56714 -- and with bios booting working [*], radeon accel is also (mostly) working. anyway, what i suspect these broken systems do is still load efiboot and then efiboot is in some environment it doesn't handle well and then hangs... this is a guess. thanks. .mrg. [*] - keyboard access in /boot is broke on my installed system but seems to work on the USB with the bios image. each key press ends up generating 15-20 actual characters.
re: libX11 updated, fvwm, etc., hangs perhaps fixed now
"John D. Baker" writes: > I've updated to sources containing the new libX11 and rebuilt "wm/fvwm" > without the patches posted in: > > https://mail-index.netbsd.org/pkgsrc-users/2022/10/17/msg036348.html > > and the resulting fvwm appears to work properly. > > Thanks! great news. thanks for testing! > The patches were added to pkgsrc-HEAD. I suppose they can be removed > now. as i understand it, they probably should remain as the fixes in libX11 are considered workarounds for buggy code. .mrg.
HEADS UP: build break in xsrc update builds coming your way
hi folks. FYI: i just added this note to UPDATING: 2022: The new libdrm import worsened the conflict issues for the kdump/ktruss ioctl, and i915 now conflicts with base, and has been turned off. This will cause update build issues like: kdump-ioctl.c:12175:143: error: 'DRM_IOCTL_I915_DESTROY_HEAP' undeclared here (not in a function); did you mean 'DRM_IOCTL_MODE_DESTROY_DUMB'? You'll need to clean usr.bin/ktruss, usr.bin/kdump, and rescue. there are a few other things updated, please send-pr if you see issues. .mrg.
libX11 updated, fvwm, etc., hangs perhaps fixed now
hi folks. the newly released libX11 1.8.2 claims to fix issues in fvwm, xfce, and some motif stuff, related to hanging because of the thread safety changes. i know some problems were fixed, but this should now make the old binaries work again. i've merged into -current. if you have something still problematic, or have been avoiding using old binaries, it would be great to hear things work for you again now. thanks. .mrg.
re: How to BIOS-boot from NVMe device?
> > > If anyone wants to play with UEFI booting and has access to a recent Xen > > > DOM0 system you can install the pkgsrc/sysutils/ovmf package and point a > > pkgsrc/sysutils/ovmf does not build on -current at least - and hasn't been > > building for a long while. > > Hmm... unfortunate... it does build just fine on 9.2ish from 2022Q2 > pkgsrc. this just built for me on a ~3-day old -current src & pkgsrc system. Chavdar, how does it fail for you?
re: current USE_SSP=yes build failure
i've committed my fix for this after testing it. .mrg.
re: current USE_SSP=yes build failure
rudolf writes: > Hi, > > I have "USE_SSP=yes" in mk.conf and the build is failing with: > > --- dependall-drivers --- > /usr/xsrc/external/mit/xorg-server/dist/hw/xfree86/drivers/modesetting/drmmode_display.c: > > In function 'drmmode_crtc_gamma_set': > /usr/xsrc/external/mit/xorg-server/dist/hw/xfree86/drivers/modesetting/drmmode_display.c:1768:1: > > error: stack protector not protecting local variables: variable length > buffer [-Werror=stack-protector] > 1768 | drmmode_crtc_gamma_set(xf86CrtcPtr crtc, uint16_t * red, > uint16_t * green, >| ^~ > > Is this to be expected? Am I doing something wrong? The function itself > is very simple. ah, this comes from the call this function makes: if (drmmode_crtc->use_gamma_lut) { drmmode_set_gamma_lut(drmmode_crtc, red, green, blue, size); which is: drmmode_set_gamma_lut(drmmode_crtc_private_ptr drmmode_crtc, uint16_t * red, uint16_t * green, uint16_t * blue, int size) [ ... ] struct drm_color_lut lut[size]; i'll figure out a fix or workaround. thanks. .mrg.
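For context, the pattern that trips -Wstack-protector here is the variable-length array "lut[size]". One common way to avoid a VLA is to allocate the table on the heap instead; the C sketch below shows that idea only. It is an assumption for illustration, not the fix that was committed, and struct sketch_color_lut and sketch_build_lut() are hypothetical stand-ins rather than the drm or modesetting driver definitions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drm_color_lut; illustrative only. */
struct sketch_color_lut {
	uint16_t red;
	uint16_t green;
	uint16_t blue;
	uint16_t reserved;
};

/*
 * Build a LUT of 'size' entries on the heap instead of in a
 * variable-length array on the stack.  Returns NULL for a bad size
 * or on allocation failure; the caller frees the result.
 */
static struct sketch_color_lut *
sketch_build_lut(const uint16_t *red, const uint16_t *green,
    const uint16_t *blue, int size)
{
	struct sketch_color_lut *lut;
	int i;

	if (size <= 0)
		return NULL;

	lut = calloc((size_t)size, sizeof(*lut));
	if (lut == NULL)
		return NULL;

	for (i = 0; i < size; i++) {
		lut[i].red = red[i];
		lut[i].green = green[i];
		lut[i].blue = blue[i];
	}
	return lut;
}
```

The trade-off is an allocation (and a failure path) per gamma update in exchange for a fixed-size stack frame that the stack protector can cover.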
re: FYI: new X server in -current, among other X things
> > (1) out of bounds problem in xserver/hw/xfree86/modes/xf86Crtc.h > > > > OpenBSD/luna88k maintainer (Kenji Aoyama) reported the following fix > > was neceesary for non-XFree86 driver based dumb server (on luna88k etc.): > > https://gist.github.com/ao-kenji/afb0ea5b6dca04975161f84ab41ba32b > > https://gist.github.com/ao-kenji/b0fd6b876605ba1b2b43309233566153 > > > > https://cvsweb.openbsd.org/cgi-bin/cvsweb/xenocara/xserver/hw/xfree86/modes/xf86Crtc.h#rev1.16 > > I turns out that at least luna68k Xorg server (happens to?) works > without this change, but anyway upstream 1.22.x branch already > has this fix: > > https://gitlab.freedesktop.org/xorg/xserver/-/commit/75d70612888f18339703315549db781a22c0cb23 > > I wonder if we should pull this fix or not for our (1.)21.1.4 tree.. this looks simple enough to just do. > > (2) "-flipPixels" option removal > > > > "-flipPixels" option (that inverts black and white on 1bpp server) > > has been removed since 1.21. > > > > https://gitlab.freedesktop.org/xorg/xserver/-/commit/d1c00c859c6676fbb540420c9055788bc19cb18f > > > > As noted in the log the upstream authors claim > > "No supported driver supports 1bpp anymore, nor has in a very long time." > > > > Howeverwe we still have several working servers (xf86-video-wsfb based > > servers on mac68k and luna68k, monolithic servers for sun3 and x68) > > and at least there was a report that this option was mandatory on SE/30. > > So I would like to revert this change. > > It also turns out that the above changes also remove a menber from > ScrnInfoRec structure in hw/xfree86/common/xf86str.h and it breaks > ABIs of xf86-video-* drivers. > > However fortunately the removed member "Bool flipPixels" in the > SrcnInfoRec has not been used for -flipPixels options so we can > safely pull back -flipPixels support by reverting the changes > except xf86str.h. > > If there is no particular comments I would like to commit the > attached (reverting -flipPixels removal) patch. go for it. 
we have a few things reverted, we maybe should talk to upstream to have them either revert there or at least provide the removed features elsewhere. thanks. .mrg.
re: FYI: new X server in -current, among other X things
Robert Swindells writes: > > I wrote: > > It looks like not all the functions are getting setup in the glamor > > struct by load_glamor(), I'm guessing because those functions are > > not exported by libglamoregl.so. > > > > Do we need to add more source files to this: > > > > src/external/mit/xorg/server/xorg-server/hw/xfree86/glamor_egl/Makefile > > Adding all of the glamor modules to libglamoregl.so makes it stop > crashing for me. can you send a patch? i'll look at it soon. .mrg.
re: FYI: new X server in -current, among other X things
Robert Swindells writes: > > I wrote: > > > >>> [ 378.033] (EE) 0: /usr/X11R7/bin/X (xorg_backtrace+0x44) [0x1467d46d5] > >>> [ 378.033] (EE) 1: /usr/X11R7/bin/X (os_move_fd+0x79) [0x1467d0465] > >>> [ 378.033] (EE) 2: /usr/lib/libc.so.12 (__sigtramp_siginfo_2+0x0) > >>> [0x75b46379c930] > >>> [ 378.034] (EE) > >>> [ 378.034] (EE) Segmentation fault at address 0x0 > >>> > >>> This happens with ctwm as part of the base installation, as well as with > >>> other pre-existing window managers and such from pkgsrc built against > >>> 9.99.97. > >> > >>can you configure X to generate a core dump or run it > >>under GDB and get the real stack trace? i thought we'd > >>fixed this problem in libexecinfo, but it's still not > >>tracing through the SEGV above, so finding what is > >>crashing where is what we need next. > > > >FWIW, I get the same on my Pinebook with a lima kernel, this may not be > >i915 specific. > > > >Doing a full debug build now. > > Building with MKDEBUG=yes stops it crashing, but it also stops glamor > from working. > > I guess it is back to printf(). with a normal build, you should at least be able to get a stack trace with function names, if not line numbers. you'll have to disable the xorg SEGV catcher... oh they seem to have removed that entirely: commit c7414f4d07b69a4b2f0d0af06f032393cf5fe6aa Author: Adam Jackson Date: Wed Aug 22 14:57:05 2018 -0400 xfree86: Remove NoTrapSignals This was dangerous on UMS and largely pointless on KMS. have you tried running the (non-debug) one from inside gdb as well, that should also give you something. .mrg.
re: panic in evo_wait
> > > [184218.xxx] fatal page fault in supervisor mode > > > [184218.xxx] trap type 6 code 0x2 ... > > > > this line's contents would have included the fault address, > > which is kinda useful for next time :-) > > I've got the rip -- it's 0x8095e177. oh - i was after the "cr2" value -- the actual fault address, not the code address that triggered it. your patch looks good. .mrg.
re: panic in evo_wait
> [184218.xxx] warning:
> /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
> 1

can you patch this code to print the value of "data" here? it's probably a bad request from userland, but the BUG_ON() here does not give you any indication on _what_.

> [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> [184218.xxx] fatal page fault in supervisor mode
> [184218.xxx] trap type 6 code 0x2 ...

this line's contents would have included the fault address, which is kinda useful for next time :-)

> [184218.xxx] curlpw 0xa8d4e6f36500 pid 27414.3207 lowest kstrack
> 0xb589296452c0
> kernel: page fault trap, code=0
> Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2
> 000,0(%rdx,%rax,1)
> evo_wait() at netbsd:evo_wait+0x7b
> base507c_ntfy_set()
> nv50_wndw_flush_set()
> nv50_disp_atomic_commit_tail()
> nv50_disp_atomic_commit()
> drm_atomic_helper_set_config()
> drm_mode_setcrtc()
> drm_ioctl()

can you find out where evo_wait+0x7b is? in my kernel it's at line 243, and the disasm seems to match your "movl" above.

   235 evo_wait(struct nv50_dmac *evoc, int nr)
   236 {
   237         struct nv50_dmac *dmac = evoc;
   238         struct nvif_device *device = dmac->base.device;
   239         u32 put = nvif_rd32(&dmac->base.user, 0x) / 4;
   240
   241         spin_lock(&dmac->lock);
   242         if (put + nr >= (PAGE_SIZE / 4) - 8) {
   243                 dmac->ptr[put] = 0x2000;
   244                 evo_flush(dmac);

Dump of assembler code for function evo_wait:
   0x8084dfe1 <+0>:     push %rbp
   [...]
   0x8084e05c <+123>:   movl $0x2000,(%rdx,%rax,1)

(0x7b = 123)

probably "dmac->ptr" is invalid here. a quick guess at the code indicates it's only set once in nv50_dmac_create(), the source from the caller(s). at least, i can't see it set anywhere else right now.

.mrg.
re: FYI: new X server in -current, among other X things
> can you post the whole Xorg.0.log somewhere? most of > my i915 systems have become non-functional the last few > years, but i have one system to test. unfortunately, my system (kaby lake, GT 630) seems to work fine with xorg-server 21.1.4 for me.
re: FYI: new X server in -current, among other X things
> TL;DR: after upgrading via the sets available from releng builds from > July 16th (http://releng.netbsd.org/builds/HEAD/202207160630Z) I'm not > able to start X on amd64 with i915 graphics. Separately, there may be > issues with libX11 1.8.1 where clients will hang due to recursive locks > occurring. the libX11 thing is pretty terrible. upstream says that _not_ enabling it means other things are broken. i don't know anything better than fixing the clients i guess, which is pretty terrible for backwards compat code/binaries. > [ 378.033] (EE) 0: /usr/X11R7/bin/X (xorg_backtrace+0x44) [0x1467d46d5] > [ 378.033] (EE) 1: /usr/X11R7/bin/X (os_move_fd+0x79) [0x1467d0465] > [ 378.033] (EE) 2: /usr/lib/libc.so.12 (__sigtramp_siginfo_2+0x0) > [0x75b46379c930] > [ 378.034] (EE) > [ 378.034] (EE) Segmentation fault at address 0x0 > > This happens with ctwm as part of the base installation, as well as with > other pre-existing window managers and such from pkgsrc built against > 9.99.97. can you configure X to generate a core dump or run it under GDB and get the real stack trace? i thought we'd fixed this problem in libexecinfo, but it's still not tracing through the SEGV above, so finding what is crashing where is what we need next. does it happen when X starts up? maybe it crashes with plain running "X" without any arguments (ie, not using some frontend that will also fire up clients etc.) can you post the whole Xorg.0.log somewhere? most of my i915 systems have become non-functional the last few years, but i have one system to test. .mrg.
FYI: new X server in -current, among other X things
hi folks. i've updated most of xsrc to their latest versions. fontconfig and Mesa are remaining. i've tested the new code on amd64 and arm64, and built several ports to confirm they still build. the biggest change is the new xorg-server. there are probably a few build issues left to find across all ports, and perhaps some run-time ones too but basic testing looks fine for me. please send-pr or email here if you find problems. thanks! .mrg.
re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot
> (but I'm nots sure 64KB blocksize is valied on FFS because > newfs(8) man page just says 4KB-32KB for it) FWIW, i've been using 64K block *and* frag size FFS for over a decade without any problem, on a file system that almost always has extremely large files on it. so, this should be fixed in the manual i guess. .mrg.
re: savecore weirdness
> I've tried overwriting the first 100MB of the 'dp' entry in my fstab > with zeroes in the hope of getting rid of the crashdump, but that > didn't help either. How can I get rid of the crashdump so savecore > doesn't try again to write it out? martin answered this, but to answer differently, the core dump is stored at the *end* of the dump partition, so clearing it is kind of annoying -- you have to work out "dumplo", and then count backwards from the end, etc. the purpose is that swap starts at the start, and dumps are at the end, so, if savecore needs to swap, it hopefully won't overwrite the not-yet-read dump data. .mrg.
re: effective use of blkdiscard(8)?
nia writes: > blkdiscard(8) seems like a command in -current that's useful for regular > maintenance of SSDs. > > I would assume that a regular run of: > > blkdiscard -v /dev/rwd0d > > would be useful to TRIM an entire SSD, obviously destructively, so would > be useful when reinstalling NetBSD. correct. > However, what about less obvious cases? > > A large file could be created, for example, with dd: > > # dd if=/dev/zero of=./testfile bs=4m count=1 > > Then discarded: > > # blkdiscard -v ./testfile > > Would this effectively mark 40GB of this drive unused to its controller? that's my understanding, given that the questions you ask below are answered "yes". it certainly does seem to be the case in my testing -- sometimes reading this data would return the same data, sometimes random data (!? where from?), and most of the time, zeroes -- so it certainly was triggering the storage. an additional thing we could/should add is "fstrim", which is designed to be run eg, weekly, and the idea is to tell the disk to discard all the unallocated sectors on the disk, which would give you the above feature without having to do anything, and in fact give it to you for all the unused space. i have not looked at how linux implements this, but it clearly needs the file system itself to implement the backend. > How good are we at propgating TRIM commands through various block device > layers? > > Is fdiscard() effective on a file on FFSv2 on a cgd(4) on a dk(4) wedge? > > What about ZFS on a dk(4) on a cgd(4) on a dk(4) wedge? all of this depends upon what their driver 'd_discard' method does. 
the only ones that are not assigned to be "nodiscard" are:

/home/src/current/src/sys/dev/ld.c:95: .d_discard = lddiscard,
/home/src/current/src/sys/dev/ld.c:110: .d_discard = lddiscard,
/home/src/current/src/sys/dev/ld.c:123: .d_discard = ld_discard
/home/src/current/src/sys/dev/ata/wd.c:154: .d_discard = wddiscard,
/home/src/current/src/sys/dev/ata/wd.c:171: .d_discard = wddiscard,
/home/src/current/src/sys/dev/ata/wd.c:230: .d_discard = wd_discard
/home/src/current/src/sys/dev/dkwedge/dk.c:125: .d_discard = dkdiscard,
/home/src/current/src/sys/dev/dkwedge/dk.c:140: .d_discard = dkdiscard,

so the vast majority of disk drivers do not support this yet.

.mrg.
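To illustrate why one layer without a discard method stops TRIM for everything stacked on top of it, here is a small self-contained C model. It is a sketch under assumptions, not kernel code: struct sketch_dev and the sketch_* functions are hypothetical, and returning ENODEV from the stub is an assumption about how a "nodiscard" style entry behaves.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_dev;
typedef int (*sketch_discard_fn)(struct sketch_dev *, uint64_t, uint64_t);

/* Hypothetical block-layer node; not a real NetBSD structure. */
struct sketch_dev {
	sketch_discard_fn discard;	/* this layer's discard method */
	struct sketch_dev *parent;	/* underlying device, if any */
	uint64_t offset;		/* start of this layer in parent */
	uint64_t last_pos;		/* leaf only: last position seen */
	uint64_t discarded;		/* leaf only: bytes discarded */
};

/* Layer with no discard support: always fails. */
static int
sketch_nodiscard(struct sketch_dev *d, uint64_t pos, uint64_t len)
{
	(void)d; (void)pos; (void)len;
	return ENODEV;
}

/* Leaf disk that accepts discards and records them. */
static int
sketch_leaf_discard(struct sketch_dev *d, uint64_t pos, uint64_t len)
{
	d->last_pos = pos;
	d->discarded += len;
	return 0;
}

/* Wedge-like layer: translate the offset and pass the request down. */
static int
sketch_wedge_discard(struct sketch_dev *d, uint64_t pos, uint64_t len)
{
	return d->parent->discard(d->parent, pos + d->offset, len);
}
```

A wedge directly on a leaf disk passes the request through with its offset applied. Put a layer that uses the failing stub between them (standing in for any driver not in the list above) and the request never reaches the disk, whatever the layers above it do.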
re: Potential iostat output format change
k...@munnari.oz.au writes: > Anyway, let me know what you think - is this worth finishing, or will > the changes break people's scripts (or something similar) - or do you > just prefer it the current way. i like this. i find the ordering of the default output has the same problem, and had vaguely been considering looking at at least the column size problem, but i really like the idea of re-ordering the columns so they're more visually separate. thanks. please finish it. .mrg.
re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
[ .. ] > install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that > after installing to a qcow2 disk any attempt to boot the disk results in > not being about to find the boot device. However, the boot log shows was this between 2022-05-08 and 2022-05-22? i accidentally broke some types of bootable images that Jared fixed, and i think this error matches the failure seen. .mrg. https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html
re: Radeon HD 5450?
Phil Nelson writes: > On Wed, 11 May 2022 11:15:42 +1000 > matthew green wrote: > > > do you have anything else handy to test? gpus are crazy stupid > > prices these days :-( > > Hi Matthew, > > My department has several nvidia around and I have not yet found > one that works to the point of getting X running. I've tried > the following: > >MSI GEFORCE GTX 1060 >GIGABYTE GEFORCE GTX 1650 >EVGA GEFORCE GTX 1018Ti (not enough power) >An older Radeon I had sitting around, not sure which one but > it blew up in the same place as the 5450 ... not mapping > the BIOS. >The video chip on the motherboard ... it finds it as > acpivga0 with acpiout0 to acpiout7. It finds a genfb0 > and labels it "Intel Rocket Lake UHD Graphics 750 (32EU) (rev. 0x04) > It then reports drm at genfb0 not configured. I do get a > working wscons with 4 screens. "X -configure" quits with > an error saying that the number of created screens does not > match number of detected devices. In the Xorg.0.log when > it probes for the Intel integrated Graphics Chipsets it > doesn't list the 750 and it doesn't match it. > I don't have heavy gpu requirements so if I could get the > intel UHD graphics working, that would be good. > > You said you have a working nouveau 730. I'll see if I can > acquire one of those to try. Any specific card you recommend? i have asus 730 and asus 1030 silent cards both working for me in my two main desktop systems now. the 730 did once assert(3) in libdrm_nouveau and X exited, and the 1030 has once had some minor display damage (green dots over the root window, likely generated by my green-on-black terminal, but cleared by simply moving a window over that space), and it's only been a couple of weeks using the 730, and a few days for 1030. i don't have anything newer/better due to prices, and also cuz the above are more than sufficient for my needs. 
it's possible that back porting the rocket lake code wouldn't be too difficult -- that was true a few years back when i did this for kabylake when skylake was already supported... quick peek says that RKL appeared right after our drm, sometime between linux 5.6 and 5.10, and unfortunately, this struct: static const struct intel_device_info rkl_info = { in the new code has a couple of new members inside struct intel_device_info{} than our code, so the back port would need to consider these parts too. .mrg.
re: Trendnet TEW-648UBM detected as ugen not urtwn
Brook Milligan writes: > I am trying to use a Trendnet TEW-648UBM usb wifi dongle, which is > supposed to be recognized by the urtwn driver. However, it is > recognized as a ugen device, instead. > > [ 2.9586490] ugen0 at uhub1 port 1 > [ 2.9586490] ugen0: Realtek (0x20f4) 802.11n WLAN Adapter (0x648c), > rev 2.00/2.00, addr 3 > > I am not sure how to extract relevant information from the device. For > example, what usb tools should be used to figure out why this is not > recognized by urtwn? i guess see urtwn_devs[] in if_urtwn.c. it has no entry for this ID (0x648c) (nor does usbdevs at all.) ie, add to usbdevs, make -f Makefile.usbdevs; add the new id string to if_urtwn.c. test. commit the usbdevs file, regen usbdevs*.h again (with the updated rcsids), and then commit the changes to usbdevs*.h and if_urtwn.c. hopefully it's actually still a urtwn(4). :-) .mrg.
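The reason the dongle attaches as ugen is that driver matching is a table lookup on the vendor and product IDs, and this pair is not in the table. A rough, self-contained C model of that lookup is below. struct sketch_usb_devno, sketch_devs[] and sketch_match() are hypothetical names for illustration (the real table is urtwn_devs[] in if_urtwn.c and uses the generated usbdevs constants); the first table entry is a made-up placeholder, and the second uses the IDs from the dmesg line quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/product pair; illustrative only. */
struct sketch_usb_devno {
	uint16_t vendor;
	uint16_t product;
};

static const struct sketch_usb_devno sketch_devs[] = {
	{ 0xffff, 0x0001 },	/* made-up placeholder entry */
	{ 0x20f4, 0x648c },	/* IDs from the dmesg line above */
};

/* Return 1 if the vendor/product pair is in the table, else 0. */
static int
sketch_match(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_devs) / sizeof(sketch_devs[0]); i++) {
		if (sketch_devs[i].vendor == vendor &&
		    sketch_devs[i].product == product)
			return 1;
	}
	return 0;
}
```

Without the second entry the lookup fails and the generic driver takes the device, which is the behaviour reported; adding the ID only helps if the chip really is one the driver supports.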
re: Radeon HD 5450?
Phil Nelson writes: > Hi All, > >I've been trying to get -current running on a new Dell Precision > 3650. It is a UEFI boot only machine and when booting -current > with a Radeon HD 5450 installed (which works great on 9.2 on > an Dell Optiplex 7040) it panics when it can't find the Radeon BIOS. > > The messages at this point are: > > kern info: [drm] register mmio base: 0x7090 > kern info: [drm] register mmio size: 131072 > {drm:netbsd:radeon_get_bios+0x480} *ERROR Unable to locate a BIOS ROM > radeon0: autoconfiguration error: error: Fatal error during GPU init > radeon0: autoconfiguration error: unable to register drm: 22 > panic: cnopen: no console device > ... > > Is the 5450 too old a device for the UEFI boot only machine or > is there a way to get the BIOS address for the autoconfiguration? i can't easily check for a couple of weeks, but i have a system i think i had to use UEFI for that had a 5450. it didn't fail entirely like the above, it failed the "ring 3" test, and disabled acceleration. this meant eg, X worked ok, but many things use a lot of CPU. fortunately, this is a zen3 system so it's got a lot of CPU -- would cost about 1.5 cpus to play a 1080p video. this system has a nouveau 730 in it now, and everything is better except one time libdrm_nouveau triggered an assert() and X crashed. (an operation that should have a resource available didn't have it, and the assert() tripped this wanted invariant. i don't have the details handy.) the 5450 is old enough that while pcie shouldn't have these sorts of problems, i've had modern systems fail with pcie gpus, and i've had newer gpus fail in older pcie systems -- i believe it was the radeon RX 550 that caused my (old) core2 system to not boot. do you have anything else handy to test? gpus are crazy stupid prices these days :-( .mrg.
re: Supported graphics (in HEAD)
> Radeon RX 550 (HDMI, DP, and DVI with a DVI to HTML converter) FWIW, i put my RX 550 into my test box yesterday and ran my basic stress test -- 12 glxgears tiled separately and then playing a movie on top of it. it failed. the GPU resets itself a few times, there's severe display corruption, and usually a reboot is needed to get the system back. i don't know if simple usage will work better, but there are some significant bugs left here for us to find.. these are the older bugs from the new drm branch on github before the merge, in case anyone wants to look at them: https://github.com/riastradh/netbsd-src/issues/24 https://github.com/riastradh/netbsd-src/issues/28 https://github.com/riastradh/netbsd-src/issues/42 (#42 appears to be the same problem as #24 and #28.) .mrg.
re: Supported graphics (in HEAD)
Tom Ivar Helbekkmo writes: > Robert Elz writes: > > > Any advice? > > Well, in my experience, nvidia is probably something you only want if > you have lots of RAM in your workstation. In HEAD, there's a lot of > memory leaking going on - every change to the image on the monitor leaks > kmem-04096 items, and on my 1920x1080 monitor, watching a youtube video > in firefox leaks 2-300 of those per second. > > Of course, I only notice because I have a mere 4 GiB of RAM in this > workstation, which is more than plenty for the first couple of hours of > work (firefox and a few terminal windows, using the browser as little as > possible, and completely avoiding video), but demands a daily reboot. can you file a PR about this? i don't see the problem on a 750 or 730 cards. i don't have anything newer yet. (well, there's a 9x0M in a laptop, but i haven't managed to get any drm to find the video bios for that one and work.) there are likely some dtrace methods we can use to find the leak you're seeing, but it might be good to keep it all in the PR :) thanks. .mrg.
re: Supported graphics (in HEAD)
Robert Elz writes: > Date:Sat, 07 May 2022 14:28:12 +1000 > From: matthew green > Message-ID: <16731.1651897...@splode.eterna.com.au> > > Thanks for the reply. > > | the GTX 16xx are both in the recent supported list for > | new drm, > > Thanks, I might try one to see. But where is that list? > I searched everywhere I could think of, and could not find it. it's not obvious. i usually start with the PCI frontend that points to a list of pciids, and then you have to match those to product names. nvidia is actually a little easier because we support everything up to (but not including) the latest GTX 30 series. eg, sys/external/bsd/drm2/nouveau/nouveau_pci.c: * NetBSD drm2/5.6 doesn't support Ampere (GTX 30 series) based cards: * 0x2080-0x20ff GA100 * 0x2200-0x227f GA102 * 0x2300-0x237f GA103 * 0x2480-0x24ff GA104 * 0x2500-0x257f GA106 * 0x2580-0x25ff GA107 * * TU116 (GTX 16xx) occupies the space from 0x2180-0x21ff. for radeon sys/external/bsd/drm2/radeon/radeon_pci.c: radeon_pci_lookup(const struct pci_attach_args *pa, unsigned long *flags) ... if ((PCI_VENDOR(pa->pa_id) == radeon_device_ids[i].vendor) && (PCI_PRODUCT(pa->pa_id) == radeon_device_ids[i].device)) so then you have to find radeon_device_ids[] and realise it's set up with the list in "radeon_PCI_IDS". for amdgpu the list is directly in dist/drm/amd/amdgpu/amdgpu_drv.c. i don't have a good solution for mapping pciids to products, but searching the internet usually finds stuff. > | but the Radeons are not there (these are Navi > | 2x GPUs, and new drm only went to Navi 1x.) > > How about the RX 550 ? I forgot that one when I sent the message > yesterday. RX 550 mostly works. it's been a while since i tried and this is actually a card i have.. > | i don't know how well they work thought, so if you can > | find something older, like geforce 700 series, > > Does the RTX T400 or T600 count as older? I had assumed not, > but I know less than nothing about any of this. T400 and T600 are maybe supported. they live in the very most recently supported list for nvidia, being Turing chipsets, so they should at least attempt to attach and work, but i've only heard of someone attempting the previous generation (these are the same chips as GTX 20 series.) > Of course if I could find the "supported" list(s) (as applicable > to current HEAD) I might be able to answer these questions for > myself. Supported means by both the kernel & X server (base > or pkgsrc) naturally. > > | my 2c. > > Thanks, worth more than that. > > I was kind of hoping (dreaming) that someone might say "If you > really don't care about acceleration" (I don't) "then just disable > x using userconf" (needing to build a custom kernel fine as well) > "and it should just work" (for any one of the 3 possible gpu types). nia's answer here should be useful :-) .mrg.
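as a rough aid for the "match pciids to chips" step, the ranges quoted from the nouveau_pci.c comment can be put in a small table and searched. this is only a sketch: the struct and function names are invented here, the table holds nothing beyond the ranges quoted in this message, and it is not how the driver itself decides what to attach to:

```c
#include <stddef.h>
#include <stdint.h>

/* one row per range quoted from the nouveau_pci.c comment in this message */
struct nv_range {
	uint16_t lo;
	uint16_t hi;
	const char *chip;
	int ampere;	/* 1 = Ampere, listed in that comment as unsupported */
};

const struct nv_range nv_ranges[] = {
	{ 0x2080, 0x20ff, "GA100", 1 },
	{ 0x2180, 0x21ff, "TU116", 0 },
	{ 0x2200, 0x227f, "GA102", 1 },
	{ 0x2300, 0x237f, "GA103", 1 },
	{ 0x2480, 0x24ff, "GA104", 1 },
	{ 0x2500, 0x257f, "GA106", 1 },
	{ 0x2580, 0x25ff, "GA107", 1 },
};

/* return the table row covering this PCI product id, or NULL if none does */
const struct nv_range *
nv_lookup(uint16_t product)
{
	size_t i;
	size_t n = sizeof(nv_ranges) / sizeof(nv_ranges[0]);

	for (i = 0; i < n; i++) {
		if (product >= nv_ranges[i].lo && product <= nv_ranges[i].hi)
			return &nv_ranges[i];
	}
	return NULL;
}

/* return 1 if the product id falls in one of the quoted Ampere ranges */
int
nv_is_ampere(uint16_t product)
{
	const struct nv_range *r = nv_lookup(product);

	return r != NULL && r->ampere;
}
```

a NULL result only means the id is outside the ranges quoted here; it says nothing either way about support.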
re: Supported graphics (in HEAD)
> What I need from the new one is no different than I needed > then, a flat frame buffer, capable of supporting 3 high res > monitors (3840x2160, 1440x2560 (portrait mode), and 2560x1080.) it's the 3840x2160 that makes the older cards unsuitable for your requirements -- they max out at 2560x1440 IIRC. > The oldest addin graphics cards avaikable to me are: > > Radeon RX 6500 > Nvidia GeForce GTX 1650 > > but those don't really offer suitable monitor connections. > There are Nvidia 1030's listed, but all "not available". > > Next are: > > Nvidia GeForce GTX 1660 > Radeon RX 6600 > > Which look as if they might be workable, if supported. the GTX 16xx are both in the recent supported list for new drm, but the Radeons are not there (these are Navi 2x GPUs, and new drm only went to Navi 1x.) i don't know how well they work though, so if you can find something older, like geforce 700 series, that is likely to work better (i have a 730 in one of my systems, and besides tripping on a libdrm_nouveau assert once -- which meant X crashed unfortunately -- it has been fine.) my 2c. .mrg.
re: Stable names for USB serial adapters
> Perhaps you, like me, are frustrated that USB serial devices can get > enumerated in non-deterministic ways, which makes putting those device > names in configuration files (such as /etc/remote) less than useful. > > I threw together a little devpubd hook to fix this problem for those > adapters that have serial numbers (FTDI devices seem to reliably have > these): > > https://www.netbsd.org/~thorpej/99-ucom-symlinks [ .. ] this works great! if i have serialnumbers in my ucoms :-( out of 20 devices, i have 3 with serial numbers, leaving me with 22 ucoms without a stable name (5 dual port devices.) tempted to suggest we include something like this in src, i just wish it could work better for me. i just spent far too long making them attach in the same order in a new machine using hard coded kernel config.. oh well. thanks! .mrg.
re: Understanding's snippet of athn(9) code
Farhan Khan writes: > Hi all, > I am trying to understand a snippet of athn(9) code for the purpose of > porting to FreeBSD. I am reading the function athn_usb_htc_setup() > located in /usr/src/sys/dev/usb/if_athn_usb.c. After tracing it > through, it seems to terminate at a usbd_setup_xfer(9) call. > > Is this the equivalent of setting up which USB function will handle > which channel? This function seems similar to FreeBSD's > usbd_transfer_setup(9), which I believe does that. If so, how is that > different from athn_usb_open_pipes()? > > If not, what does athn_usb_htc_setup() do? It is not clear to me and > therefore I am having trouble making the translation. usbd_setup_xfer() is used to setup one USB transfer. it requires that an open pipe already be provided. the "TRANSFERS" section of usbdi.9 in netbsd has more details than the above: https://man.netbsd.org/usbdi.9 it doesn't do much more than fill in the "usbd_xfer" structure for the transfer operation - does not change the status of the device in any way until the transfer is actually submitted. .mrg.
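the "setup only fills in a structure, nothing happens until submit" split can be shown with a toy model. this is deliberately not usbdi code: every type and function below is invented for illustration and none of them exist in the NetBSD kernel:

```c
#include <stddef.h>

/* toy stand-ins, not the real usbd types */
struct toy_device {
	int transfers_started;	/* bumped only when a transfer is submitted */
};

struct toy_xfer {
	struct toy_device *dev;
	void *buf;
	size_t len;
	int ready;		/* set once the transfer has been described */
};

/* describe the transfer: fills in the structure, touches no device state */
void
toy_setup_xfer(struct toy_xfer *x, struct toy_device *dev, void *buf, size_t len)
{
	x->dev = dev;
	x->buf = buf;
	x->len = len;
	x->ready = 1;
}

/* submit the transfer: the first point where the device state changes */
int
toy_submit_xfer(struct toy_xfer *x)
{
	if (!x->ready)
		return -1;
	x->dev->transfers_started++;
	return 0;
}
```

in this model, calling toy_setup_xfer() leaves transfers_started at 0; only toy_submit_xfer() changes it, which mirrors the description above.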
re: odd setlist failure
this should be fixed now. sorry for the fallout. .mrg.
re: HDMI sound not working
Jaap Boender writes: > > connected to a dell ultrasharp lcd (both 2415 and 2715 models) using > > it's audio jack connected to a 2.1 speaker setup. > > So just to be sure - you get the sound to the monitor by HDMI and then > onwards with the audio jack? Then there's basically no difference in our > setups and I should be able to get mine to work somehow. Thanks for > this, knowing that it's possible is a big help. yes. .mrg.
re: HDMI sound not working
i don't have anything useful for you, except to say that this should or can be a working setup. > I've got a setup with two sound cards: the on-board sound chip, and the > graphics card (a Radeon RX550). These both seem to be dectected (after > adding the HDAUDIO_ENABLE_HDMI option to the kernel config), as the > dmesg shows: this works for me, across a couple of systems (same GPU), over the last few years. my setup was haswell + supermicro motherboard, and is now zen2 + asus m/b, both with radeonhd 5450. my mixerctl, audioctl, and audiocfg output match yours almost identically except i'm missing the 8 channel options, and my mixerctl has just this: outputs.dacsel=HDMI00 connected to a dell ultrasharp lcd (both 2415 and 2715 models) using its audio jack connected to a 2.1 speaker setup. i last updated my kernel about 3 weeks ago. .mrg.
re: well-supported card for new DRM?
> I'm looking to upgrade my graphics card. What's the newest generation > that's well supported by NetBSD-current now? > NVidia "Pascal" (e.g. GTX 1050 Ti)? > Radeon "Polaris" (e.g. Radeon RX 550)? > or even something newer? we have reports that 1030 works well. i still haven't gotten a newer nvidia since those are beyond my toy-gpu-card-price :-) the RX 550 mostly works but we are still having some bugs (i have one of these from before they went up about 2x in cost). the two issues i reported before merge: https://github.com/riastradh/netbsd-src/issues/24 https://github.com/riastradh/netbsd-src/issues/28 i don't think we got fixes for these yet. #42 looks like the same issue as my #24, and #34 is a panic i've never seen but on the next generation card (5500.) HTH. .mrg.
re: HEADS UP: Merging drm update
> Please update and try again? (I've only compile-tested the changes, > will take a closer look tomorrow if it doesn't fix the problem.) seems to work for me. i can once again mostly play 720p video with "mpv -vo x11". thanks! .mrg.
re: backward compatibility: how far can it reasonably go?
> > On Dec 8, 2021, at 10:52 AM, Greg A. Woods wrote: > > > > That's one bullet I've dodged entirely already since my oldest systems > > are running netbsd-5 stable. (Though in theory isn't there supposed to > > be COMPAT support for SA?) > > int > compat_60_sys_sa_register(lwp_t *l, > const struct compat_60_sys_sa_register_args *uap, > register_t *retval) > { > return sys_nosys(l, uap, retval); > } > > SA is one of those things that's REALLY hard to provide compatibility for. indeed, and only static userland would be affected, as ad@ also provided replacement libpthread.so's for netbsd-4 that made it use the newer kernel system instead, for use in chroot, and this technique could also be provided for earlier if needed. .mrg.
re: DRM access rights
> libGL error: failed to open drm device: Permission denied > libGL error: failed to load driver: i965 > > how can I solve this? Usually it is the means of adding oneself to a > specific group, changing devfs, but found nothing of there like. check the perms on /dev/dri/card0. make sure your console user has read/write access to it. .mrg.
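a quick way to test the read/write part from a program is access(2). this is a small sketch with a made-up helper name; note that access() checks against the real uid/gid, so run it as the user that will start X:

```c
#include <unistd.h>

/* return 1 if the calling user may both read and write path, else 0 */
int
can_read_write(const char *path)
{
	return access(path, R_OK | W_OK) == 0;
}
```

for the case above one would call can_read_write("/dev/dri/card0"); a 0 result means the node is missing or the permissions (or group membership) need fixing.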
HEADS UP: DTS update will renumber rockpro64 sd and emmc storage
hi folks. an unfortunate problem with the DTS 5.15 update is that the sdio device has been enabled by default, which means that the default ordering of sdmmc(4) devices changes, which leads to the ld(4) numbers changing too. the old sdmmc0 and sdmmc1 become sdmmc1 and sdmmc2, and so ld0 and ld1 become ld1 and ld2. there's no real simple fix for this, as we want to enable the sdio support, and forcing the old attachments would require some ugly patches. fortunately, installations with only one sd or emmc likely already use the "ROOT.a" method in /etc/fstab, which means that the change won't affect the root file system. a different pre-fix would be to reconfigure these as gpt and dk and mount via name or uuid, then it won't matter what the device unit is. .mrg.
re: IDENTIFY failed
> > wd1 at atabus1 drive 0 > > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for > > drive 0 > > wd1: autoconfiguration error: IDENTIFY failed > > wd1(ahcisata0:1:0): using PIO mode 0 > > > > and booting fails. Reverting and booting with 9.99.90 gets me a working box: > > > > wd1 at atabus1 drive 0 > > wd1: > > wd1: drive supports 16-sector PIO transfers, LBA48 addressing > > wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect... > > ... > > wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 > > (Ultra/133) (using DMA), NCQ (31 tags) > > > > I'm sure someone else saw this too, but I can't find the original post... > > https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html this one has reduced timeframe, too: > between > NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK > NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed two possible changes to test reverting: http://mail-index.netbsd.org/source-changes/2021/10/05/msg132733.html which changed how some interrupt handling works, and: http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html which removed some delays in the probe path. possibly this one is more likely to be at fault since it touches the probe path directly. .mrg.
re: MIDI with Java on -current?
Tom Ivar Helbekkmo writes: > I have this Java application (JSynthLib) that needs to talk MIDI with my > synthesizers. I've previously run it with older versions of the JRE, > using the LinuxCharDevMidiProvider that came with it - but that no > longer works with the current environments. All I really need is a > standard interface between the official MIDI bits in the JRE and the > NetBSD /dev/rmidi stuff... > > Anyone know of something like that? did anyone ever port portmidi? i only see portaudio in pkgsrc. i don't know, but i would expect this is your best bet :( can you find the old LinuxCharDevMidiProvider code and port it to the new java? they probably ditched an old API that was the same as the one we implement since linux dropped it ages ago. .mrg.
re: ./build.sh tool build failure in very recent NetBSD-current
> I feel that this failure is related to recent gmp update. probably. it didn't happen initially but i see it now. will fix.. .mrg.
re: Anyone feel like fixing pkgsrc/emulators/tme ?
> but starting tmesh via gdb works?? i've had some crashes with pkgsrc/graphics/blender lately that go away in gdb. (it's kinda annoying, the -g enabled blender takes a really long time for gdb to load...) this is amd64 and 9.1-ish userland. .mrg.
re: requires a working dlopen()
dashdruid writes: > Hello List, > > I keep getting this error whatever I try to build from pkgsrc on NetBSD9.1 > i386. > > Even if I follow the basic tutorial with figlet: > > https://wiki.netbsd.org/pkgsrc/how_to_use_pkgsrc/ > > It's the same error for all packages. what's the actual error? > I have tried with GCC10, 7, 5 > > Also LEX was not present in /usr/bin so I have installed flex and linked the > flex binary to lex, idk if its a problem. sounds like you missed installing the "comp" set, which would make building packages quite challenging yes. .mrg.
re: X11 doesn't start -current amd64 lenovo laptop
can you try without an xorg.conf at all? this hardware is not currently supported by our kernel drm driver and will need to fall back to wsfb or vesa. it should do so automatically without an xorg.conf to force a driver. in general, X -configure is no longer recommended, and a minimal xorg.conf for only the parts that aren't default OK is what we recommend now. when i am trying to force a driver all i have is an xorg.conf with just this:
---
Section "Device"
	Identifier "Card0"
	Driver "wsfb"
EndSection
---
thanks. .mrg.
re: Problem reports for version control systems
> I too get long pauses with cvs, both at the beginning, > and even longer at the end after update is complete. the end part is most likely cvs cleaning up after itself by removing all the subdirs it created but doesn't need. check disk io or ktrace for this part -- it's usually a local iops issue, rather than a network issue. .mrg.
re: math/cgal and gcc10
> Here (or pkgsrc-users?) seems ok. But my question would be if cgal > documents that it needs a C++11 compiler, in which case this change is > right regardless, or if it's supposed to be ok with C++03, in which case > maybe something else is wrong. the release notes from 2017 say that demos require c++11 now but the library is c++03 itself. i've tested building with this change on netbsd-9 and in current with GCC 10, and it seems fine in both. i'll commit the suggested fix, barring objections. thanks. .mrg.
re: HEADS UP: GCC 10 now default on several ports
matthew green writes: > i saw a report that netbsd-8 can't be built on -current but i'm > not finding it right now. > > i can confirm this is the case. you can work around the GCC 10 > inspired issues for now with eg: > >./build.sh -V HOST_CFLAGS='-fcommon -O2' this is likely to remain necessary unless we pullup fixes for at least make(1), if not more. i don't think that will happen, though if someone were to do the work we would consider it. > but then there is a -current regex vs -8 file magic regex issue. > > christos and i working on fixes for that. this part is now fixed in the netbsd-8 branch, and the tree can fully build with HOST_CFLAGS set as above. thanks. .mrg.
re: HEADS UP: GCC 10 now default on several ports
i saw a report that netbsd-8 can't be built on -current but i'm not finding it right now. i can confirm this is the case. you can work around the GCC 10 inspired issues for now with eg: ./build.sh -V HOST_CFLAGS='-fcommon -O2' but then there is a -current regex vs -8 file magic regex issue. christos and i are working on fixes for that. .mrg.
re: GCC 10 available for testing etc. in -current.
> > - build.sh with no -u (update), and set -V HAVE-GCC=10 as a > >option. this ensures that everything is actually rebuilt > >with the new compiler. > > I'm guessing that should be "-V HAVE_GCC=10", but even so I just can't yup! > get this to build. I always get the message "cc: error: CET_HOST_FLAGS@: > No such file or directory". I'm going to see if I can find where this > has come from. Does it ring any bells for anyone? this is from GDB: gdb/dist/libiberty/Makefile.in:116: @CET_HOST_FLAGS@ did you try clean'ing the gdb objdirs? (both tools and the build one.) i think i recall a while back this was a problem when GDB was updated. .mrg.
re: HEADS UP: GCC 10 now default on several ports
"Thomas Mueller" writes: > > i've switched the alpha, amd64, sparc*, riscv*, ia64, and vax ports > > have all been switched to GCC 10. > > > please send-pr or send email here about problems you encounter. > > > thanks. > > > > .mrg. > > What about the i386 port? see README.gcc10: --- [8] - i386 seems to have a signal delivery issue. pthread tests hang and then complain with eg: threads_and_exec: q[ 627.6700846] sorry, pid 3154 was killed: orphaned traced process this problem occurs with GCC 9 as well. --- it all builds and mostly works, but atf hangs for me (with GCC 9 as well, so it's not a compiler issue, or, it's not a *new* compiler issue..) > Upgrading from NetBSD (amd64 and i386) 8.99.51, might the build encounter > trouble jumping from GCC 7.4 to 10? > > My NetBSD ports of interest are amd64 and i386. > > Or might it be better to do a two-step. source-upgrading to NetBSD 9.1_STABLE > first and then to current? > > Or is it OK to upgrade straight to current (9.99.81)? it shouldn't be necessary to go to netbsd-9 branch first. .mrg.
HEADS UP: GCC 10 now default on several ports
hi folks. the alpha, amd64, sparc*, riscv*, ia64, and vax ports have all been switched to GCC 10. please send-pr or send email here about problems you encounter. thanks. .mrg.
GCC 10 available for testing etc. in -current.
hi folks. (please reply privately to this spams-many-lists message, and i will keep src/external/gpl3/gcc/README.gcc10 updated with the latest status.) i've just committed the final parts that make most platforms build (and many run) with GCC 10 as the system compiler.

i've tested these systems:
- amd64
- sparc (qemu)
- sparc64
- shark
- evbarmv7hf (cubietruck)
- i386 (has a signal delivery issue, but that seems to have been introduced last year, however, things seem to be equally as functional/broken.)
- ia64 (ski boots as far as before)
- mipsel (malta gxemul)
- mips64 (either big or little endian)
- sh3-el (landisk gxemul)
- vax (simh)

so i'm after testing for these targets:
- alpha
- hppa
- powerpc
- sh3-eb
- arm32-eb
- mipseb
- m68k

there are still issues for these targets:
- arm64 -- 'LSE' extension issues, likely needs both fixes for libgcc and kernel work
- sun2 ramdisk overflows, and it's already at the limit of what can boot without crashing from lack of space
- x68k 'loadbsd' program appears to pull in TLS code from libc and does not link.

the steps are fairly simple:
- update -current srcs
- build.sh with no -u (update), and set -V HAVE_GCC=10 as an option. this ensures that everything is actually rebuilt with the new compiler.
- install new kernel/userland and perform testing. if you can run atf that would be great, but other tests are useful too.
- reply to this message with results.

thanks! .mrg.
re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
Martin Husemann writes: > On Sun, Apr 11, 2021 at 10:37:21AM +1000, matthew green wrote: > > > How can you invoke a make to test this (besides a full build.sh and adding > > > some output to the makefiles)? > > > Or: can you just fix and request pullup ;-) > > > I can run sparc tests (quickly) again. > > > > cd src/compat > > nbmake-sparc64 > > BOOTSTRAP_SUBDIRS=../../../crypto/external/bsd/openssl/lib/libcrypto > > dependall > > I still have no simple way to test the sparc64 -m32 libs - does this > obfuscation really gain something in the real world? i guess you figured it out going by the commit? to be a little more verbose about this: to build any subset of the normal "src/compat" dirs, invoke the right nbmake-$arch in src/compat with BOOTSTRAP_SUBDIRS set to a series of paths that are built using the provided target (so only standard targets are available -- all, dependall, depend, clean, cleandir, install, etc.) so to just test the -m32 libc, i've used this:
cd src/compat
nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../lib/libc dependall
nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../lib/libc install DESTDIR=/export/root/sparc64
and then my nfsroot has a new /usr/lib/sparc/libc.so.12 and i test it on the target. thanks. .mrg.
re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
Martin Husemann writes: > On Sat, Apr 10, 2021 at 04:12:55PM +1000, matthew green wrote: > > Martin Husemann writes: > > > On Sat, Apr 10, 2021 at 08:38:39AM +1000, matthew green wrote: > > > > for a quick fix, this is OK, but long term, these are built > > > > for sparc64 compat32 as well, and benefit from having this > > > > code in place. > > > > > > I have seen that (and the previous modes.inc conditionalizing it), but I > > > do not understand how we get there in the sparc64 compat libs build. > > > > > > Are you sure it used to pick this code for that case? I mean it clearly > > > was > > > intended to do so, but did it really work? If so, we should restore all > > > the conditionals to make it happen again and add better comments to > > > describe the involved make(1) magic. > > > > src/compat/sparc64/sparc/bsd.sparc.mk:CRYPTO_MACHINE_CPU= ${MLIBDIR} > > > > and > > > > src/crypto/external/bsd/openssl/lib/libcrypto/srcs.inc:.include > > "${.CURDIR}/arch/${CRYPTO_MACHINE_CPU}/${cryptoinc}" > > OK, and the ?= in srcs.inc not overriding this - I see. > > How can you invoke a make to test this (besides a full build.sh and adding > some output to the makefiles)? > Or: can you just fix and request pullup ;-) > I can run sparc tests (quickly) again. cd src/compat nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../crypto/external/bsd/openssl/lib/libcrypto dependall .mrg.
re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
Martin Husemann writes: > On Sat, Apr 10, 2021 at 08:38:39AM +1000, matthew green wrote: > > for a quick fix, this is OK, but long term, these are built > > for sparc64 compat32 as well, and benefit from having this > > code in place. > > I have seen that (and the previous modes.inc conditionalizing it), but I > do not understand how we get there in the sparc64 compat libs build. > > Are you sure it used to pick this code for that case? I mean it clearly was > intended to do so, but did it really work? If so, we should restore all > the conditionals to make it happen again and add better comments to > describe the involved make(1) magic. src/compat/sparc64/sparc/bsd.sparc.mk:CRYPTO_MACHINE_CPU= ${MLIBDIR} and src/crypto/external/bsd/openssl/lib/libcrypto/srcs.inc:.include "${.CURDIR}/arch/${CRYPTO_MACHINE_CPU}/${cryptoinc}" .mrg.
re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
> Different to other asm code that e.g. properly detetects various VIS > instructions that may or may not be available on the current CPU, the code > in ghash-sparcv9.pl is plain sparcv9 code and can not be enabled for our > sparc builds. > > Christos, can you disable all "modes" asm and request pullup? > I can quickly test on -current... for a quick fix, this is OK, but long term, these are built for sparc64 compat32 as well, and benefit from having this code in place. John's point about __arch64__ may be relevant -- i'm pretty sure that, before, that would only be set for sparc64 builds, be it 32 or 64 bit userland, since that target defaults to __arch64__ (which means sparcv9, not 64 bit ABI.) so if this has been removed, we're now building this code on sparc as well as sparc64 (both ways), which is new, and clearly it is buggy. .mrg.
re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
> >> and one more > >> > >> __sigaction_sigtramp(SIGILL...) > >> > >> Then, at the end: > >> > >> PSIG SIGILL SIG_DFL: code=ILL_ILLOPC, addr=0xedccbdf0, trap=2) > > Program was terminated due to an illegal opcode being detected in > the gcm_ghash_4bit() assembly function: yes. John, can you, from gdb, print the value of OPENSSL_sparcv9cap_P[0] and OPENSSL_sparcv9cap_P[1]? if 1<<6 is set in the first, then the vis3 path will be taken in gcm_ghash_4bit(). it seems that these caps are set up wrongly. you could try to instrument OPENSSL_cpuid_setup() in crypto/external/bsd/openssl/dist/crypto/sparcv9cap.c to print the various settings. it seems that SPARCV9_VIS3 is set. note that there are two places it can be set, but the first one is only for _SVR4 so not used here. nothing here seems changed with the update. these values should all be zero for real sparc 32 bit hardware (they're the sparcv9 caps after all :) > As a workaround, until the offending opcode is found, try > `#undef GHASH_ASM_SPARC' on line 692 in > src/crypto/external/bsd/openssl/dist/crypto/modes/gcm128.c to force > use of the C functions. good idea. .mrg.
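once the first cap word has been printed from gdb, the check described above is just a bit test. a tiny sketch, where the macro and function names are made up here and only the 1<<6 position comes from the message:

```c
/* bit position quoted in the message above; the macro name is invented here */
#define EXAMPLE_CAP_VIS3	(1u << 6)

/* return 1 if the vis3 bit is set in the first capability word, else 0 */
int
cap_has_vis3(unsigned int cap0)
{
	return (cap0 & EXAMPLE_CAP_VIS3) != 0;
}
```

so a printed value of 0x40 (or anything else with that bit set) would mean the vis3 path is selected, while 0 would not.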
re: nothing contributing entropy in Xen domUs? or dom0!!!
> In this particular example server it's in a Dell R510 with a pair of > 6-core E5645 CPUs that "cpuid" shows the following for (in the dom0): this is a westmere-ep CPU, which does not support rdseed or rdrand. rdrand appeared in ivybridge (2 generations later, with sandybridge in the middle.)
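to check a given machine rather than going by the model name, the cpuid feature bits can be read directly. a sketch, assuming an x86 build with the GCC/clang <cpuid.h> helpers; the function names are made up, and the bits used are CPUID leaf 1 ECX bit 30 for rdrand and leaf 7 subleaf 0 EBX bit 18 for rdseed:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* return 1 if CPUID reports RDRAND (leaf 1, ECX bit 30), else 0 */
int
cpu_has_rdrand(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 30) & 1;
#else
	return 0;	/* not an x86 cpu */
#endif
}

/* return 1 if CPUID reports RDSEED (leaf 7 subleaf 0, EBX bit 18), else 0 */
int
cpu_has_rdseed(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 18) & 1;
#else
	return 0;	/* not an x86 cpu */
#endif
}
```

on a westmere-ep like the one above both would be expected to return 0.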
re: -current tar(1) breakage
> Joerg thinks that this is an nfs issue (a bug with nfs giving incorrect data). even if true, tar shouldn't *core dump*. is there a path to RCE here somewhere? it's clearly overwriting pointers with strings, so unless someone can clearly show there is no code exec vector here, it seems potentially problematic and should be fixed. .mrg.
re: How to determine if graphics is supported by radeondrm?
radeondrm does not support any modern graphics card, and we don't have a working amdgpu driver yet (last i tried, it hung at boot and i did not have a serial console setup to test with yet.) you can have almost OK stuff with the vesa driver. maybe wsfb also can work. we're working (slower than hoped) on a drm update, but we do not have any ETA currently. .mrg.
re: Panic in usbd_create_xfer
Yorick Hardy writes: > Dear current-users, > > Happy new year! happy new year yorick! and everyone. > [ 659.839003] usbd_create_xfer() at netbsd:usbd_create_xfer+0x186 > [ 659.849001] usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0x74 > [ 659.849001] uhidev_open() at netbsd:uhidev_open+0x21c can you find out what lines in the source these are? especially usbd_create_xfer+0x186. the other ones are most likely obvious, with only single call sites - eg, usbd_open_pipe_intr() calls usbd_create_xfer() once. thanks. .mrg.
re: Audio subsystem versus unplugging uaudio
nice. LGTM. .mrg.