re: amdgpu laptops with 10 & current?

2024-05-14 Thread matthew green
nia writes:
> The ThinkPad A485 looks pretty interesting for use with NetBSD.
[ ... ]
> - AMD Radeon Vega 6, 8 or 10
>
> Usually I prefer the smaller X series, but they've made them
> non-upgradable and harder to repair...
>
> ethernet is re0, this is different from the intel models that are
[ ... ]

i have an a495s that doesn't work so great, but i also have an a475
that does work pretty well.  the onboard re(4) works fine for an re(4)
on both (i have the dongle that rjs hinted at for the 495s.)

my a475 has an a12-9800B cpu (4c 2.7ghz, 3.6ghz turbo), and i think it
calls the GPU an "R7", it is an amdgpu and it works fine.

the a475 almost suspend/resumes properly.  USB3 is broken afterwards.

i'm pretty happy with the a475, though it could be faster.

the a495s amdgpu doesn't work for me, though the default fb is good
enough for basic X usage (firefox without video works.)  i haven't
played as much with this because my system has a bad battery and
won't stay powered on unplugged.


what this means is .. an a485 may be somewhat broken, but perhaps
not as broken as the a495s is? :)


.mrg.


re: unable to boot 10.0/amd64

2024-04-15 Thread matthew green
this might be the same as

   https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57153

it's the same faulting function and similar offset...


.mrg.


new "compat" sets have really made sets harder to manage.

2024-04-13 Thread matthew green
hiya.


the new compat32 sets rearrangement has broken the GCC 12 build,
due to dropping the "gcc=10" tag in some places.  that's a minor issue,
and i'll fix that soon (though having looked closer at the first
"grep -r" output below, i see most of these are affected.  i'll
just initially be fixing arm64 and amd64.)

however, while looking at this i noticed that there's been a major
explosion in sets that shouldn't happen.  compare matches for
"libasan.so.5.0" between new/old:

yesterday-when-i-was-mad distrib/sets/lists> grep -r asan.so.5.0 .
./base/shl.mi:./usr/lib/libasan.so.5.0  base-sys-shlib  cxx,gcc=10
./debug/shl.mi:./usr/libdata/debug/usr/lib/libasan.so.5.0.debug  comp-sys-debug  debug,cxx,gcc=10
./base32/ad.aarch64:./usr/lib/eabi/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.aarch64:./usr/lib/eabihf/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mips64eb:./usr/lib/64/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mips64eb:./usr/lib/o32/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mips64el:./usr/lib/64/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mips64el:./usr/lib/o32/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mipsn64eb:./usr/lib/64/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mipsn64eb:./usr/lib/o32/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mipsn64el:./usr/lib/64/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.mipsn64el:./usr/lib/o32/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.powerpc64:./usr/lib/powerpc/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/ad.riscv64:./usr/lib/rv32/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/md.amd64:./usr/lib/i386/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./base32/md.sparc64:./usr/lib/sparc/libasan.so.5.0  base-compat-shlib  compat,gcc,cxx
./debug32/ad.aarch64:./usr/libdata/debug/usr/lib/eabi/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.aarch64:./usr/libdata/debug/usr/lib/eabihf/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mips64eb:./usr/libdata/debug/usr/lib/64/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mips64eb:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mips64el:./usr/libdata/debug/usr/lib/64/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mips64el:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mipsn64eb:./usr/libdata/debug/usr/lib/n32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mipsn64eb:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mipsn64el:./usr/libdata/debug/usr/lib/n32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.mipsn64el:./usr/libdata/debug/usr/lib/o32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.powerpc64:./usr/libdata/debug/usr/lib/powerpc/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/ad.riscv64:./usr/libdata/debug/usr/lib/rv32/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/md.amd64:./usr/libdata/debug/usr/lib/i386/libasan.so.5.0.debug  comp-sys-debug  debug,compat
./debug32/md.sparc64:./usr/libdata/debug/usr/lib/sparc/libasan.so.5.0.debug  comp-sys-debug  debug,compat


vs in an older tree:

yesterday-when-i-was-mad distrib/sets/lists> grep -r asan.so.5.0 . 
./base/shl.mi:./usr/lib/libasan.so.5.0  base-sys-shlib  compatfile,cxx,gcc=10
./debug/shl.mi:./usr/libdata/debug/usr/lib/libasan.so.5.0.debug  comp-sys-debug  debug,compatfile,cxx,gcc=10

ie, there are just *two* entries for this file (the real file, and
the debug file), and the rest is all derived from the "compatfile"
and "debug" tags.  the new one has 30 copies, spread across a
number of files, all of which will need editing as future GCCs appear.

this is compounded across dozens of other files so there are now
hundreds or perhaps thousands of unnecessary duplicated lines, in
a couple of dozen files.

can someone please fix this?  (nia is out for now, so maybe some
other enterprising person can help :)

thanks.


.mrg.


re: raidframe and gpt

2024-03-16 Thread matthew green
Paul Goyette writes:
> Does anyone have an example of how to configure raid0 on a GPT disk?

these are my notes i refer to every so often:

https://www.netbsd.org/~mrg/gpt-raid-setup.txt

it's gpt on each with type raid, which gives you dkN @ diskN,
you then create a raid with those dkNs, and then you create
another gpt on the raid device itself, with a ffs partition.

(see below; but skip the raidN.conf method, and just use the
newer raidctl create.)

> I can easily set the partition type with gpt, but how do I reserve
> space for the raid component label?  Do I need to reserve that space?

note how i pick "-b 128" above to get my partitions aligned on
at least 64K boundaries.  nvme/sata probably wants higher (check
your disk specs, it can vary a lot, and you could go as high as
6MB alignment to catch all known alignment...)

> Also, does raidframe understand the NAME=gpt-label syntax in the
> config file?  Or does it require me to specify the particular dk ?
> (And what happens if something moves and  changes?)

NAME= works.  use autoconfig raid.. actually just use the new
in -current "raidctl create", since it does all the initial setup
and makes good default choices.

> It seems so much simpler to use ccd(4) but there's a nasty memory
> allocation bug which makes it unuseable for now.

you can't root-on-ccd like you can root-on-raidframe :-)  you
could, using the same initrd method root-on-cgd uses.


.mrg.


re: rc.d start order

2024-03-05 Thread matthew green
Paul Goyette writes:
> On Tue, 5 Mar 2024, Paul Goyette wrote:
>
> > I _think_ it will work correctly if I modify fstab to refer to
> > NAME=Builds instead of ccd0.  I will update here after I confirm.
>
> Yes this seems to work.

this is very much preferred.  "ccd0" is the device; i suspect that if
you re-ran 'MAKEDEV ccd0' you'd end up with a new /dev/ccd0 that
is an alias for the rawpart (c or d, d for amd64.)

so, perhaps the failure to run this and get a modern netbsd 
device name present actually got you to use the right way of
talking to wedges :)


.mrg.


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
ah.  the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses a C99 flexible array
member ([]) for the final member, which is later assigned to other
pointers, and this memory was now only 4-byte aligned.  this hack patch
works to stop named crashing for me, but i'll let christos figure out
what the right general solution here is.
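
a minimal sketch of the layout problem described above, not the actual
BIND code: the struct names and the 4-byte stand-in member are
hypothetical, and only the extra[] tail and the __aligned__(8)
attribute come from the hack patch.  it shows how a flexible array
member lands on a 4-byte boundary unless it is explicitly aligned
(GCC/Clang attribute syntax assumed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for a handle whose header is 4 bytes long */
struct handle_plain {
	uint32_t refs;		/* stand-in for the header members */
	char extra[];		/* starts at offset 4: only 4-byte aligned */
};

/* same header, with the tail forced to 8-byte alignment */
struct handle_aligned {
	uint32_t refs;
	char extra[] __attribute__((__aligned__(8)));
};

/* the attribute pads the header so extra[] starts on an 8-byte boundary */
static_assert(offsetof(struct handle_aligned, extra) % 8 == 0,
    "extra[] must be 8-byte aligned");

/* offset of the tail without the attribute */
size_t
plain_extra_offset(void)
{
	return offsetof(struct handle_plain, extra);
}

/* offset of the tail with the attribute */
size_t
aligned_extra_offset(void)
{
	return offsetof(struct handle_aligned, extra);
}
```

anything with 64-bit members that is placed at the start of extra[]
inherits that offset, so the unaligned variant only works on machines
that tolerate misaligned loads.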


.mrg.


Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 -  1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
 	LINK(isc_nmhandle_t) active_link;
 #endif
 	void *opaque;
-	char extra[];
+	char extra[] __attribute__((__aligned__(8)));
 };
 
 typedef enum isc__netievent_type {


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
this appears to be a badly aligned structure issue.  i can reproduce
it by doing "anita interact" with any recent sparc .iso, editing the
named.conf to start, starting named, and then running
'dig ns netbsd.org', which triggers the crash.

the stack trace is:

(gdb) bt
#0  ns__client_request (handle=0xeb02d008, eresult=ISC_R_SUCCESS, region=, arg=)
    at /usr/10/src/external/mpl/bind/lib/libns/../../dist/lib/ns/client.c:1825
#1  0xedb0dc80 in isc__nm_async_readcb (worker=0x0, ev0=0xeccf7ad4)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2914
#2  0xedb0dde0 in isc__nm_readcb (sock=0xecfe8808, uvreq=0xeb0b6008, eresult=ISC_R_SUCCESS)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2887
#3  0xedb1183c in udp_recv_cb (handle=, nrecv=53, buf=0xeccf7c54, addr=, flags=0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/udp.c:653
#4  0xedb3aec8 in uv__udp_recvmsg (handle=0xecfe89f8)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:303
#5  uv__udp_io (loop=, w=0xecfe8a38, revents=1)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:178
#6  0xedb3a034 in uv__io_poll (loop=0xecf62810, timeout=)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/kqueue.c:390
#7  0xedb431a0 in uv_run (loop=0xecf62810, mode=UV_RUN_DEFAULT)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/core.c:406
#8  0xedb106ec in nm_thread (worker0=0xecf62808)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:704
#9  0xedb20f44 in isc__trampoline_run (arg=0xecf36be0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/trampoline.c:192
#10 0xed9ecda8 in pthread__create_tramp (cookie=0xecf7b000)
    at /usr/10/src/lib/libpthread/pthread.c:595

and the problem is that in ns__client_request(), we end up with:

(gdb) p client
$17 = (ns_client_t *) 0xeb02d144

but the alignment requirement for this structure is 8 bytes as it has
64-bit members.  the fault actually occurs when reading two 4-byte
members in one instruction:

1825    env = client->manager->aclenv;
1826    if (client->sctx->blackholeacl != NULL &&
   0x00036e70 <+408>: ldd  [ %l6 + 0x10 ], %g2

"sctx" and "manager" are at offsets 0x10 and 0x14 and can both be
read with a single ldd (64-bit load) but this requires correct
alignment.
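
to make the arithmetic concrete, here is a small sketch (the helper
name is made up for illustration) that checks an address against a
power-of-two alignment.  with the client pointer shown above, both the
structure and the address client + 0x10 used by the ldd are 4-byte but
not 8-byte aligned, assuming %l6 holds the client pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * returns true if addr is a multiple of align.
 * align must be a non-zero power of two.
 */
bool
is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

is_aligned(0xeb02d144, 4) is true while is_aligned(0xeb02d144, 8) is
false, and adding the 0x10 member offset does not change that.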

i didn't track down how this client value is allocated, it's all
via some opaque handle thing in the libraries, but this is a bug
in the new bind not allocating structures properly aligned.


.mrg.


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
actually, i found a core file in /var/chroot/named/etc/namedb/named.core.

my build is missing debug info so i don't have a good idea of what failed.


.mrg.


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
> Unfortunately there was no core dump.

this is almost certainly because /var/chroot/named is not writeable
by user named, which is on purpose.

you can set the corefile path for this process after it starts using
sysctl proc.$pid.corename.  i think setting to "/var/tmp/%n.core"
should allow it to write to /var/tmp in the chroot.


.mrg.


re: Removing a superfluous warning from xf86-input-ws/dist/src/ws.c

2024-02-06 Thread matthew green
> On Mon 05 Feb 2024 at 10:18:09 +1100, matthew green wrote:
> > perhaps convert into a DBG(4, ...)?
>
> On Mon 05 Feb 2024 at 02:20:25 +0300, Valery Ushakov wrote:
> > May be make it reported only once, so that the message is still there
> > in the log, but it's not spammed uselessly, adding no new information?
>
> I think I like the second suggestion slightly better, so I'll go with
> that. I'll do a test build first, even though it seems trivial. I didn't
> do a build in a while anyway...

i like this.  thanks.


.mrg.


re: Removing a superfluous warning from xf86-input-ws/dist/src/ws.c

2024-02-04 Thread matthew green
> if (hscroll || vscroll) {
> xf86Msg(X_WARNING, "%s: hscroll=%d, vscroll=%d\n",
> pInfo->name, hscroll, vscroll);
[ ... ]

> This touchpad method is not supported by the xf86-input-mouse driver so
> with that one the touchpad doesn't scroll.
>
> Shall I just remove the warning?

perhaps convert into a DBG(4, ...)?

it certainly shouldn't be generating a log flood so downgrade or
removal is the right answer.

thanks.


.mrg.


re: unlink_if_ordinary undefined...

2023-12-31 Thread matthew green
>   = note: ld: /usr/libexec/liblto_plugin.so: error loading plugin:
> /usr/libexec/liblto_plugin.so: Undefined PLT symbol
> "unlink_if_ordinary" (symnum = 47)

this part should be fixed now.  probably needs a pullup..


.mrg.


re: Update ARFLAGS?

2023-12-28 Thread matthew green
Thomas Klausner writes:
> Hi!
>
> As noted in PR 57565, the default ARFLAGS in share/mk/sys.mk are
> broken - they use 'l' which changed behaviour between binutils 2.34
> and 2.39.
>
> Ok to commit the change?
>
> (This broke the build of ruby-nokogiri recently, which is how I
> noticed.)

the change?  removing 'l'?  yes... though i still find it
pretty offensive that it changed behaviour now.  it was an
ignored option before, so removing it is the right change.

thanks.


.mrg.


re: gcc 12 question

2023-11-24 Thread matthew green
Patrick Welche writes:
> On Thu, Nov 23, 2023 at 12:31:34PM +, Robert Swindells wrote:
> > 
> > Patrick Welche  wrote:
> > > I'm trying to build a release on amd64 using
> > >
> > > HAVE_MESA_VER=21
> > > HAVE_GCC=12
> > 
> > What does pkgsrc graphics/MesaLib do if built using gcc 12?
>
> It builds OK.
>
> Given
>
> https://gcc.gnu.org/bugzilla//show_bug.cgi?id=109716
>
> my guess is that the pkgsrc package doesn't treat warnings as errors.
> (-Werror=stringop-overread)

this looks wrong to me (the warning, that is; as you pointed out in
your original mail, the code appears fine), and the right workaround
is to use ${CC_WNO_STRINGOP_OVERREAD} to avoid it.

thanks.


.mrg.


re: Aquantia AQC100 issues

2023-11-12 Thread matthew green
Rin Okuyama writes:
> Hi Andrius,
>
> If you still have this AQC100 in working condition, can you try this patch?
>
> https://gist.github.com/rokuyama/ab6ba1a0fac7fa15f243d63a99e14f33
>
> I've collected three fibre aq(4) variants (all rev 2), and link status
> interrupts work just fine for me. I think that link intr did not work for
> you, not due to fibre variant, but hardware revision. If this is correct,
> the patch above should work...

this reminded me that my aq(4) doesn't have working link and that
mlelstv suggested to me that the linux driver always uses a tick
timer to also check status, as well as interrupts.  i implemented
this recently and now my aq(4) has link status correctly:


aq(4): always poll for link status

some devices don't have working link status and rather than have
a likely incomplete list of issues, always poll as well as use
the interrupt if possible.

fixes link status on this device:

aq0 at pci5 dev 0 function 0: Aquantia AQC107 10 Gigabit Network Adapter (rev. 0x02)
aq0: Atlantic revision B1, F/W version 3.1.88

(was otherwise functional, just didn't report status, which likely
meant eg, dhcpcd would be upset?)

idea via mlelstv@ from linux.

remove sc_detect_linkstat and rename sc_poll_linkstat to
sc_no_link_intr, as the meaning has changed.  simplify the signature
for aq_setup_msix() and aq_establish_msix_intr(), removing forward
decls that aren't required.  obsolete AQ_FORCE_POLL_LINKSTAT.


Index: if_aq.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_aq.c,v
retrieving revision 1.45
diff -p -u -r1.45 if_aq.c
--- if_aq.c 29 May 2023 08:00:05 -  1.45
+++ if_aq.c 26 Oct 2023 06:55:28 -
@@ -1330,8 +1330,7 @@ struct aq_softc {
int sc_rx_irq[AQ_RSSQUEUE_MAX];
int sc_linkstat_irq;
bool sc_use_txrx_independent_intr;
-   bool sc_poll_linkstat;
-   bool sc_detect_linkstat;
+   bool sc_no_link_intr;
 
 #if NSYSMON_ENVSYS > 0
struct sysmon_envsys *sc_sme;
@@ -1443,11 +1442,9 @@ static int aq_match(device_t, cfdata_t, 
 static void aq_attach(device_t, device_t, void *);
 static int aq_detach(device_t, int);
 
-static int aq_setup_msix(struct aq_softc *, struct pci_attach_args *, int,
-bool, bool);
+static int aq_setup_msix(struct aq_softc *, struct pci_attach_args *);
 static int aq_setup_legacy(struct aq_softc *, struct pci_attach_args *,
 pci_intr_type_t);
-static int aq_establish_msix_intr(struct aq_softc *, bool, bool);
 
 static int aq_ifmedia_change(struct ifnet * const);
 static void aq_ifmedia_status(struct ifnet * const, struct ifmediareq *);
@@ -1784,67 +1781,57 @@ aq_attach(device_t parent, device_t self
if (msixcount >= (sc->sc_nqueues * 2 + 1)) {
/* TX intrs + RX intrs + LINKSTAT intrs */
sc->sc_use_txrx_independent_intr = true;
-   sc->sc_poll_linkstat = false;
sc->sc_msix = true;
} else if (msixcount >= (sc->sc_nqueues * 2)) {
/* TX intrs + RX intrs */
sc->sc_use_txrx_independent_intr = true;
-   sc->sc_poll_linkstat = true;
sc->sc_msix = true;
} else
 #endif
if (msixcount >= (sc->sc_nqueues + 1)) {
/* TX/RX intrs LINKSTAT intrs */
sc->sc_use_txrx_independent_intr = false;
-   sc->sc_poll_linkstat = false;
sc->sc_msix = true;
} else if (msixcount >= sc->sc_nqueues) {
/* TX/RX intrs */
sc->sc_use_txrx_independent_intr = false;
-   sc->sc_poll_linkstat = true;
+   sc->sc_no_link_intr = true;
sc->sc_msix = true;
} else {
/* giving up using MSI-X */
sc->sc_msix = false;
}
 
-   /* on AQ1a0, AQ2, or FIBRE, linkstat interrupt doesn't work? */
-   if (aqp->aq_media_type == AQ_MEDIA_TYPE_FIBRE ||
-   (HWTYPE_AQ1_P(sc) && FW_VERSION_MAJOR(sc) == 1) ||
-   HWTYPE_AQ2_P(sc))
-   sc->sc_poll_linkstat = true;
-
-#ifdef AQ_FORCE_POLL_LINKSTAT
-   sc->sc_poll_linkstat = true;
-#endif
-
aprint_debug_dev(sc->sc_dev,
"ncpu=%d, pci_msix_count=%d."
" allocate %d interrupts for %d%s queues%s\n",
ncpu, msixcount,
(sc->sc_use_txrx_independent_intr ?
(sc->sc_nqueues * 2) : sc->sc_nqueues) +
-   (sc->sc_poll_linkstat ? 0 : 1),
+   (sc->sc_no_link_intr ? 0 : 1),
sc->sc_nqueues,
sc->sc_use_txrx_independent_intr ? "*2" : "",
-   sc->sc_poll_linkstat ? "" : ", and link status");
+   (sc->sc_no_link_intr) ? "" : ", and link status");
 
if (sc->sc_msix)
-   error = aq_setup_msix(sc, pa, sc->sc_nqueues,
-   sc->sc_use_txrx_independent_intr, !sc->sc_poll_linkstat);
+   error = aq_setup_msix(sc, pa);
else

re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-18 Thread matthew green
i'm pretty sure i've solved this properly on this attempt, but
review on this change would be appreciated.

   https://www.netbsd.org/~mrg/if_rge.c.v3.diff

it includes a potential way to avoid wm(4) calling panic() if
bus_dmamap_load*() fails..


.mrg.


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-17 Thread matthew green
> hmmm, but in this case, no buffers should be set to
> be available for rx, so nothing should pass RGE_OWN() at
> L1245 i'd hope.  i still see the problem with everything
> being depleted, but then it should just stop getting any
> rx packets at all...
>
> networking folks, am i missing something here?  i see the
> same problem in wm(4) as well.  if wm_add_rxbuf() fails,
> where will this ring entry's mbuf ever be replaced again?

i see the thing i missed.

i was looking at openbsd if_rge.c 1.16, which m_free()s
the mbuf in this case, which in our tree has nothing that
would refill it, but our if_rge.c has this comment:

   * If allocating a replacement mbuf fails,
   * reload the current one.

which means that when we have a mbuf allocation error,
we basically drop the current packet, and leave the mbuf
in place ready for use next time.  that means there is no
mbuf leak in our current code, and i think the only part
of openbsd if_rge.c 1.16 we want is the if_ierrors++
(that we call if_statinc(ifp, if_ierrors).)

i think i see the problem (no, really, this time :-).

when we have a memory failure, we don't re-load the
map after bus_dmamap_unload(), so that's why it has zero
size.

the fix isn't simple because the load of the new mbuf
can fail, and then we want to reload the old one, but
it was the load event that failed, why would it work
again for the old mbuf now?  seems like we need to have
a (very short) timer that tries to realloc it again,
but i'm hoping someone else has solved this problem and
we can use their method..


.mrg. 


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread matthew green
> #3  0x80fe6e5f in kern_assert ()
> #4  0x8058be67 in bus_dmamap_sync ()
> #5  0x8044edc7 in rge_rxeof ()
> #6  0x804536fd in rge_intr ()

i'm pretty sure this is the 2nd bus_dmamap_sync() call, as that's
the only dma map that has load/unload applied at run time, vs the
init sequence only, and it implies to me that rx dma map has had
allocation failures to deplete the entire ring of mbufs, and then
there are no mappings in the dma map, which leaves the dm_mapsize
as 0, and triggers this bug.

if i'm right, what's happened is this:

1237 for (i = sc->rge_ldata.rge_rxq_considx; ; i = RGE_NEXT_RX_DESC(i)) {

1245 if (RGE_OWN(cur_rx))
1246 break;

1252 rxq = &sc->rge_ldata.rge_rxq[i];
1253 m = rxq->rxq_mbuf;

1257 /* Invalidate the RX mbuf and unload its map. */
1258 bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1259 rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);
1260 bus_dmamap_unload(sc->sc_dmat, rxq->rxq_dmamap);

1283  * If allocating a replacement mbuf fails,
1284  * reload the current one.

1287 if (rge_newbuf(sc, i) != 0) {
1288 if (sc->rge_head != NULL) {
1289 m_freem(sc->rge_head);
1290 sc->rge_head = sc->rge_tail = NULL;
1291 }
1292 rge_discard_rxbuf(sc, i);
1293 continue;
1294 }

loop 'i' has the ability to range between 0 and 1023, and
accesses each ring entry's rge_rxq.  if, over time, each
value between 0 and 1023 triggers the rge_newbuf() failure
path, each successive entry will be lost, never to be
replaced unless an explicit ifconfig down/up occurs.

hmmm, but in this case, no buffers should be set to
be available for rx, so nothing should pass RGE_OWN() at
L1245 i'd hope.  i still see the problem with everything
being depleted, but then it should just stop getting any
rx packets at all...

networking folks, am i missing something here?  i see the
same problem in wm(4) as well.  if wm_add_rxbuf() fails,
where will this ring entry's mbuf ever be replaced again?


.mrg.


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread matthew green
> panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file 
> "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

this is from:

KASSERTMSG(offset < map->dm_mapsize,
"bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
offset, map->dm_mapsize);

the mapsize being zero indicates that there's nothing mapped
currently in this dma map, so there's nothing to sync.  ie,
the caller seems to be trying to sync something not mapped.
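
a tiny userland model of that check (the struct and function names
here are invented; only dm_mapsize and the comparison come from the
assertion above) shows why even offset 0 is rejected when nothing is
loaded:

```c
#include <stdbool.h>
#include <stddef.h>

/* invented stand-in for the one bus_dmamap_t field the assertion reads */
struct dmamap_model {
	size_t dm_mapsize;	/* bytes currently loaded; 0 if nothing is mapped */
};

/* mirrors the KASSERTMSG condition: offset must lie inside the mapping */
bool
sync_offset_valid(const struct dmamap_model *map, size_t offset)
{
	return offset < map->dm_mapsize;
}
```

with dm_mapsize == 0 there is no valid offset at all, which matches
the "bad offset 0x0 >= 0x0" message.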

can you post the full back trace?


.mrg.


re: 10.99.9 amd64 panic

2023-10-05 Thread matthew green
i just committed what i believe is a fix for this problem, and for
another potential memory leak i saw from inspection.

seems to work for me on an amd64 host, been through several down/up
sequences, though i did not force the memory alloc failure directly.

(annoyingly, it takes 10-11s to regain link to my switch when doing
this down/up sequence.)

i'll prepare a pullup for netbsd-10, too.


.mrg.


re: 10.99.9 amd64 panic

2023-10-02 Thread matthew green
Martin Husemann writes:
> On Fri, Sep 29, 2023 at 09:52:42AM +, Chavdar Ivanov wrote:
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9443196] panic: kernel diagnostic 
> > assertion "offset < map->dm_mapsize" failed: file 
> > "/home/sysbuild/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 
> > 0x0
> [..]
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] bus_dmamap_sync() at 
> > netbsd:bus_dmamap_sync+0x326
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] rge_rxeof() at 
> > netbsd:rge_rxeof+0x179
>
> This is a bug in the rge(4) driver (unrelated to userland resource usage
> by the build), maybe a race triggered more easily when the system is
> under heavey load.

hmm, this seems like corruption to me.

> bus_dma.c", line 826 bad offset 0x0 >= 0x0

says that offset == 0 (which is right, this seems to be this call):

1241   /* Invalidate the RX mbuf and unload its map. */
1242   bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1243   rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);

offset is the 0 / 3rd arg here, but the *second* 0x0 value here
seems to be corrupted, and shouldn't be zero.  ie, there's no
case where it will create a zero-length dma map, it should always
be either RGE_TX_LIST_SZ, RGE_RX_LIST_SZ, or RGE_JUMBO_FRAMELEN,
so for this assert to trigger saying the passed offset is beyond
the mapping, because the mapping is zero length, seems to be
pretty clear that the bus_dmamap_t has been corrupted.

the timing does seem to indicate that a problem with out of
memory may be relevant here..oh, i think i may see a problem.

1110 rge_newbuf(struct rge_softc *sc, int idx)
...
1126 if (bus_dmamap_load_mbuf(sc->sc_dmat, rxmap, m, BUS_DMA_NOWAIT))
1127 goto out;  
...
1151 out:
1152 if (m != NULL)
1153 m_freem(m);
1154 return (ENOMEM);

so, if bus_dmamap_load_mbuf() fails, we return ENOMEM, not
ENOBUFS.  however, the callers only consider ENOBUFS as an
error case:

1176 rge_rx_list_init(struct rge_softc *sc)
...
1184 if (rge_newbuf(sc, i) == ENOBUFS)
1185 return (ENOBUFS);

and

1212 rge_rxeof(struct rge_softc *sc)
...
1271 if (rge_newbuf(sc, i) == ENOBUFS) {

so in this case, the code thinks a buffer was allocated, but it
wasn't... i haven't gone deeper into what this may cause the
code to do wrong yet, but it seems problematic.

certainly, both callers should check for != 0, not == ENOBUFS,
to avoid this problem.
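
a minimal model of the mismatch, with made-up names.  it assumes the
mbuf allocation path returns ENOBUFS (implied by the callers' checks)
and that the bus_dmamap_load_mbuf() failure path returns ENOMEM as in
the excerpt above:

```c
#include <errno.h>
#include <stdbool.h>

/* hypothetical outcomes of a receive buffer refill attempt */
enum newbuf_outcome {
	NEWBUF_OK,		/* mbuf allocated and dma map loaded */
	NEWBUF_NO_MBUF,		/* mbuf allocation failed (assumed ENOBUFS) */
	NEWBUF_LOAD_FAILED	/* dma map load failed (ENOMEM per the excerpt) */
};

/* models the error codes a refill routine would hand back */
int
newbuf_result(enum newbuf_outcome outcome)
{
	switch (outcome) {
	case NEWBUF_NO_MBUF:
		return ENOBUFS;
	case NEWBUF_LOAD_FAILED:
		return ENOMEM;
	default:
		return 0;
	}
}

/* the check as quoted: only ENOBUFS counts as a failure */
bool
failed_old_check(int error)
{
	return error == ENOBUFS;
}

/* the suggested check: any non-zero return is a failure */
bool
failed_new_check(int error)
{
	return error != 0;
}
```

the old check lets the ENOMEM case through as if a buffer had been
installed; the != 0 form catches both failure paths.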


.mrg.


re: panic: assertion "!cpu_softintr_p()" failed

2023-10-01 Thread matthew green
Thomas Klausner writes:
> panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file 
> "/usr/src/sys/kern/subr_kmem.c", line 451
>
> gdb says:
>
> #10 0x80e3551e in vpanic (fmt=0x813a1880 "kernel %sassertion 
> \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xae2110a93e08)
> at /usr/src/sys/kern/subr_prf.c:286
> #11 0x80ffab6f in kern_assert (fmt=fmt@entry=0x813a1880 
> "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
> at /usr/src/sys/lib/libkern/kern_assert.c:51
> #12 0x80e27e15 in kmem_free (p=0x9afa82af5b80, size=64) at 
> /usr/src/sys/kern/subr_kmem.c:451
> #13 0x80df5960 in rw_obj_free (lock=0x9afa82af5b80) at 
> /usr/src/sys/kern/kern_rwlock_obj.c:127
> #14 0x80d825d3 in uvm_anon_release (anon=) at 
> /usr/src/sys/uvm/uvm_anon.c:385

i think this is a new bug.  this line changed from:

1.11 (ad   12-Sep-23):  pool_cache_put(rw_obj_cache, ro);

to

1.12 (ad   23-Sep-23):  kmem_free(ro, sizeof(*ro));

i guess it just should be kmem_free_intr(), as pool_cache
is intr-safe as well.


.mrg.


re: External display for ThinkPad W530

2023-09-19 Thread matthew green
Malte Dehling writes:
> Dear all,
>
> is there a way to get an external display to work on a ThinkPad W530?
> From what I read, both the mini-dp and the vga connector work only
> with discrete graphics, which I have enabled in the BIOS
> (optimus/switching mode).  At boot I see these lines:
>
> [ 4.991148] nouveau0: NVIDIA GK107 (0e73c0a2)
> [ 4.991148] nouveau0: autoconfiguration error: error: bios: unable
> to locate usable image
> [ 4.991148] nouveau0: autoconfiguration error: error: bios ctor failed, 
> -22
> [ 4.991148] nouveau0: autoconfiguration error: unable to create
> nouveau device: 22
>
> Anyone know what the issue is?  With BIOS set to discrete only I see
> the same lines and then a kernel panic (no console.)
>
> Running xrandr shows VGA1 as disconnected even with a cable plugged in.
>
> So 2 questions: 1) Do I really need to use the discrete graphics or is
> there some other way? 2) How to get discrete graphics to work.
>
> Any help appreciated :)

i have a thinkpad P51 that has the same basic issue.  i've spent
many hours trying to figure out where the vbios for the nvidia
is, and haven't succeeded.  here's the heavily patched boot log
from my system where it tries all the ways:

nouveau0: NVIDIA GM206 (126360a1)
nvbios_shadow:232: method [name=]
nvbios_shadow:232: method [name=PRAMIN]
shadow_method:129: trying PRAMIN...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=PROM]
shadow_method:129: trying PROM...
shadow_image:81: image 0 invalid
shadow_method:146: PROM: returning score 0
nvbios_shadow:232: method [name=ACPI]
shadow_method:129: trying ACPI...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=ACPI]
shadow_method:129: trying ACPI...
shadow_method:133: init gave err -19
nvbios_shadow:232: method [name=PCIROM]
shadow_method:129: trying PCIROM...
linux_pci_map_rom:716: starting..
linux_pci_map_rom:722: mapped!
pci_find_rom:686: size 524288
pci_find_rom:705: magic wrong is 2
linux_pci_map_rom:736: failed!
pci_map_rom_md:684: entered
pci_map_rom_md:687: is display
shadow_method:133: init gave err -14
nvbios_shadow:232: method [name=PLATFORM]
shadow_method:129: trying PLATFORM...
shadow_method:133: init gave err -19
nouveau0: autoconfiguration error: error: bios: unable to locate usable image
nouveau0: autoconfiguration error: error: bios ctor failed, -22
nouveau0: autoconfiguration error: unable to create nouveau device: 22

this one seems to be missing it, but at some point i'd patched
the acpi nouveau code to try and it still failed (the logs above
may appear to show it, but the code isn't in that tree.)


AFAIK, the external ports on these laptops are only connected to
the nvidia GPU so it is absolutely necessary for using anything but
the built-in display.  i'd love someone to figure this out :-)


.mrg.


re: Netbsd10_beta evbarm aarch64 userland build failure

2023-09-10 Thread matthew green
> nbmtree: .: missing directory in specification
> nbmtree: failed at line 1 of the specification

there must be something wrong in your build tree or src tree.
i updated and built this from a clean tree fine.

this should have been fixed with this pullup:

revision 1.175.2.1
date: 2023-09-04 10:33:28 -0700;  author: martin;  state: Exp;  lines: +3 -1;  commitid: 2TUS7rO7f7zuGtDE;
Pull up following revision(s) (requested by riastradh in ticket #343):

to etc/mtree/special.  try making sure this is properly updated,
and perhaps clean the objdir for etc/mtree and/or the destdir
entirely.


.mrg


re: panic with AMD EPYC 7313P on 10.0_BETA

2023-08-29 Thread matthew green
Mark Davies writes:
> Trying to boot a Dell Power Edge R6515 that has an AMD EPYC 7313P  with 
> 10.0_BETA from a couple of day ago panics with:
>
>
> panic: kernel diagnostic assertion "rcr4() & CR4_SMAP" failed: file 
> "...sys/arch/x86/x86/patch.c"
>
> backtrace of:
> vpanic()
> kern_assert()
> x86_patch()
> cpu_boot_secondary_processors()
> main()
>
> Any suggestions what's going on and how to fix?

perhaps there's a bios setting you have to enable?

i've seen this before, and i just #if 0'd the panic since
i didn't have time to think about it then.

at worst, #if 0 will work around it while missing out on
some modern security features.


.mrg.


re: modesetting vs intel in 10.0

2023-08-29 Thread matthew green
> [ 1.051227] i915drmkms: preliminary hardware support disabled

this is a combo of the driver data for tiger lake (11th gen) having
"require_force_probe" set to 1 (our drm base), and the netbsd probe
code seeing this set and not matching properly.

there's nothing you're doing wrong, it just isn't enabled (it may
not work, i don't know.)  if you want to try, edit 
sys/external/bsd/drm2/i915drm/i915_pci_autoconf.c to disable the
check at line 111.


.mrg.


re: MKCROSSGDB=yes broken in new gdb?

2023-08-13 Thread matthew green
> I pass it in LDFLAGS=-L${GMPOBJ}

?  this doesn't help gmp.h being missing... i don't know what is up
and for me, it works because pkgsrc gmp is installed.


.mrg.

> christos
>
> > On Aug 13, 2023, at 2:41 PM, matthew green  wrote:
> > 
> > FWIW, when i was looking at why my build worked it seems that
> > the build is thinking it's building against the tools gmp but
> > the -I path to find it is missing, but -I/usr/pkg/include is
> > so that for me i'm getting the host gmp.h, but it's linking
> > the tools libgmp.a.


re: MKCROSSGDB=yes broken in new gdb?

2023-08-13 Thread matthew green
FWIW, when i was looking at why my build worked it seems that
the build is thinking it's building against the tools gmp but
the -I path to find it is missing, but -I/usr/pkg/include is
so that for me i'm getting the host gmp.h, but it's linking
the tools libgmp.a.


re: What to do about "WARNING: negative runtime; monotonic clock has gone backwards"

2023-07-26 Thread matthew green
one problem i've seen in kern_tc.c when the timecounter returns
a smaller value is that tc_delta() ends up returning a very large
(underflowed) value, and that makes the consumers of it do a very
wrong thing.  eg, -2 becomes 2^32-2, and then eg in binuptime:

477 bintime_addx(bt, th->th_scale * tc_delta(th));

or in tc_windup():

933 delta = tc_delta(th);
938 th->th_offset_count += delta;
939 bintime_addx(&th->th_offset, th->th_scale * delta);

i "fixed" the time goes backwards on sparc issue a few years ago
with this change, which avoids the above issue:

   http://mail-index.netbsd.org/source-changes/2018/01/12/msg091064.html

but i really think that the way tc_delta() can underflow is a
bad problem we should fix properly, i just wasn't sure of the
right way to do it.
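
to make the underflow concrete, here is a small sketch.  the function
names are made up for illustration and are not the kern_tc.c ones, and
the clamped variant is only one possible guard i am assuming, not the
change from the commit above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two counter reads, the way an unsigned subtract and
 * mask behaves.  "mask" is the counter width mask, e.g. 0xffffffff
 * for a 32-bit counter.  Hypothetical helper, not the kernel code.
 */
uint32_t
delta_wrapping(uint32_t now, uint32_t last, uint32_t mask)
{
	/* if the counter stepped back, this wraps to a huge value */
	return (now - last) & mask;
}

/*
 * One possible guard (an assumption): treat a delta in the upper
 * half of the counter range as "went backwards" and report no
 * progress instead of a huge forward jump.
 */
uint32_t
delta_clamped(uint32_t now, uint32_t last, uint32_t mask)
{
	uint32_t delta = (now - last) & mask;

	assert(mask != 0);
	if (delta > mask / 2)
		return 0;
	return delta;
}
```

with a 32-bit mask, now=98 and last=100, delta_wrapping() gives
0xfffffffe (the 2^32-2 above), while delta_clamped() gives 0.  the cost
of the guard is that a counter can no longer be left unread for more
than half its range.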


.mrg.


re: tweaks needed for 10 branch

2023-07-11 Thread matthew green
can you try commenting/removing this line (@L44 in -current) in
external/gpl3/gcc/usr.bin/Makefile.inc:

   CXXFLAGS+=   -std=gnu++98

i started seeing at least the gcc.c failure with GCC 10.5, and it
seems that the upstream build doesn't use this by default now, and
removing it fixed the build for me.


.mrg.


re: modesetting vs intel in 10.0

2023-07-10 Thread matthew green
> But maybe modesetting is mature enough (and intel bad enough)
> to warrant being the default for Intel GPUs.

i'm not familiar with the various intel chipsets, i've only had
a couple of them over the years and besides porting the kabylake
bits into the older drm version, i've not really touched it much.

but, you can adjust the list of drivers used by default here in
the xorg-server sources:

   hw/xfree86/common/xf86pciBus.c:xf86VideoPtrToDriverList()

where it has a "default:" case for intel of "intel", and if you
can properly figure out how to change this to "modesetting" for
the newer ones (only?) that would be fine by me.

(one way to handle this without having to patch this code would
be to install the intel driver as some other name, and then make
a copy of the "ati" front end called "intel" that loads either
the real intel driver or modesetting, depending.)


.mrg.


re: cpu temperature readings

2023-07-10 Thread matthew green
> > though NetBSD's cpu selection algorithm doesn't (yet anyway) really
> > understand processors like this.
>
> The scheduler did use first cores first, with performance cores
> using low cpu numbers, they should be utilized first but not
> necessarily for the important workloads.
>
> It now handles big.little configurations independent of cpu numbers,
> but probably only on arm.

our scheduler has a fast/slow CPU method only, so it handles
"HT" by saying the non-1st sibling is slow, and the 1st one
attached is fast, and for big.little/dynamiq it just marks
the big cores as fast and little cores as slow.  it then
prefers fast cores over slow cores, and it will typically
select lower cpu numbers once within the fast/slow zone.

eg, on rk3399, cpu4 and cpu5 are used first for most tasks
as they're the big cores, and cpu0 ends up getting a lot of
random interrupts, and cpu1-3 are idle unless you're using
more than 3 cores of CPU.

this means that the 3-level speed provided by the newer intel
client cpus is not handled by our code, and i believe it
means it will give up and not attempt anything special and
will thus just end up using cpu numbers.

i had a look at converting the "bool cpu_is_slow" in cpu_data
into an integer, but i didn't get far enough understanding
all the current uses to properly know where to start.  would
be great if someone were to have a look at this.

one hack to make things work "sort of OK", would be to allow
this to have one thread of the p-cores as fast, and both the
other thread and the e-cores as slow.
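
a very rough sketch of the two-level preference described above.  the
struct and function are hypothetical and much simpler than the real
scheduler; the only thing taken from the kernel is the idea of a
per-cpu "slow" flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical per-cpu record; the real flag lives in cpu_data */
struct cpu_sketch {
	bool is_slow;	/* HT sibling or little core */
	bool is_idle;
};

/*
 * Return the index of the cpu to run on: the lowest-numbered idle
 * fast cpu, else the lowest-numbered idle slow cpu, else -1.
 */
int
pick_cpu(const struct cpu_sketch *cpus, size_t ncpu)
{
	int slow_choice = -1;

	assert(cpus != NULL);
	for (size_t i = 0; i < ncpu; i++) {
		if (!cpus[i].is_idle)
			continue;
		if (!cpus[i].is_slow)
			return (int)i;		/* first idle fast cpu wins */
		if (slow_choice == -1)
			slow_choice = (int)i;	/* remember first idle slow cpu */
	}
	return slow_choice;
}
```

with an rk3399-like layout (cpu0-3 slow, cpu4-5 fast, all idle) this
picks cpu4, then cpu5 once cpu4 is busy, and only then falls back to
cpu0.  a third speed level would need the bool turned into an integer
class, which is the part that is not done.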


.mrg.


re: How to recover a root partition with damaged boot blocks

2023-04-05 Thread matthew green
things to do:

- reinstall bootxx_ffsvN -- make sure you're installing the right
  ffsvN.  you can use "dumpfs  | head -2", and it should
  say FFSv1 or FFSv2 here.  that's "installboot" that you may have
  already done, but perhaps used the wrong one?

- re-copy /boot.  cp /usr/mdec/boot /

- re-copy your /netbsd (where ever it came from)

- uefi wants a MSDOS partition with /efi/boot/bootx64.efi, so if
  you haven't provided that it won't work.  if you have enough
  space at the start or end of the disk you probably can do this,
  as it only needs to be pretty tiny.  i did this on a system where
  root started at sector 2048, and i was able to create a roughly 700KB
  file system, and bootx64.efi is only about 230KB.  it normally is
  ok with mbr _or_ gpt partitions here.

- check that the fdisk (gpt?) and disklabel are OK.  ie, run both
  "fdisk wd0" and "disklabel wd0" and compare to your working
  system, see if anything stands out.

HTH,


.mrg.

ps see "man 7 entropy" for how to fix the problem you observed.


re: Failure to build amd64 current

2023-03-20 Thread matthew green
> `./build.sh -j 6 -u -x -U -o -T ../obj/tooldir.NetBSD-9.3-amd64 release 
> install-image'

have you tried without "-o"?  that might be the trigger here.
it should work, but maybe it's broken in the src/compat build.

thanks.


.mrg.


re: GENERIC64 aarch64 failure to autoboot

2023-03-05 Thread matthew green
Chavdar Ivanov writes:
> On Sat, 4 Mar 2023 at 23:30, Michael van Elst  wrote:
> >
> > ci4...@gmail.com (Chavdar Ivanov) writes:
> >
> > >Since my last aarch64 build yesterday, 03/03/2023, my machine no
> > >longer boots automatically,
> >
> > sys/arch/evbarm/fdt/fdt_machdep.c 1.100
> >
> > changed how the boot disk is determined. Apparently it now fails for you.
>
> That's right, I rebuilt it with 1.99 and it now boots as before.
>
> I guess I'll file a pr.

on the system that didn't auto-boot properly, can you answer the ask
root prompt with dk1 like it should be, and once it is booted up, show the
result of "drvctl -p dk1" and also "ofctl -p /chosen"?

that should help narrow what's going wrong here.  i'm guessing that
netbsd,gpt* are wrong some how, but we'll see..

thanks.


.mrg.


re: AMDGPU Driver patches/bugs

2023-02-24 Thread matthew green
thanks for your patches and help, Jeff!

Taylor R Campbell writes:
> > Date: Tue, 21 Feb 2023 13:20:13 -0800
> > From: Jeff Frasca 
> > 
> > I was going to try the radeon driver again, because I want to see if
> > my wayland compositor works better against it than the AMDGPU driver
> > (I'm getting some weird corruption problems with my compositor that
> > do not happen under Linux, but that's probably my code).
>
> We have seen other weird minor graphics corruption problems with X,
> even with xcompmgr or picom running.  I probably made another stupid
> bug, maybe in cacheability attributes or something, buried somewhere
> in the megabytes of diffs...

i see corruption with radeon and bios boot on a ryzen 5600G
system.  (this is one that fails the ring3 (?) test with UEFI,
and even with "CSM" in the bios enabled, still attempts to 
load our uefi boot program, which then fails cuz it's in BIOS
mode and hangs the boot.  with no msdosfs visible to UEFI it
boots fine in CSM mode.)  i do *not* recall seeing it on my
older systems (haswell, earlier ryzen.)

i'll try out my amdgpu's next week some time.


.mrg.


re: Difference between i915drm and i915drmkms

2023-01-07 Thread matthew green
the old drm code for i915 is probably extremely obsolete at this
point.  i don't think it works on anything that current does
(or at least, before the latest refresh -- i think there are still
a couple of blank screens, but i think newer than this code would
support anyway.)

the only reason i haven't removed it all is that for old radeon
(R100/R200), some systems can't use new drm and you end up with
both a black-on-black (or similarly unusable) console setup, and
X doesn't work anyway.  there's some problem with LUT setup in
the current code, but there are no public docs and no one with
access to them cares.

removing from configs is probably a decent idea at this point.


kre, this drm hasn't been the "main" drm since july 2013, we've
had linux 3.8, 4.4, and now 5.6 based drm (all have the same
failure mode.)


.mrg.


re: binutils still failing on amd64

2023-01-01 Thread matthew green
Robert Elz writes:
> I wonder if perhaps part of the reason (or perhaps all of it) that
> Paul and I see problems, where others aren't, is that we are both
> building from a read only mounted source tree.

oh yeah - this is only going to break r/o src tree builds, which is
also something i use as much as possible.

i recommend r/o src trees for all netbsd src builds.  my random build
failed issues became far less common when i did that decades ago.

> Eg: from Paul's error log:
>
> Making info in po
>GEN  
> /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc/bfdver.texi
> x86_64--netbsd-install: 
> /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc:
>  chown/chmod: Read-only file system
> sh: cannot create 
> /build/netbsd-current/src_ro/tools/binutils/../../external/gpl3/binutils/dist/bfd/doc/bfdver.texi:
>  read-only file system
>
> which indicates something is trying to make files in the source
> tree, instead of the obj tree.

this specific instance should now be fixed.

> The errors I'm seeing are different, but could have the same underlying
> cause.

what are you seeing?  can you update and post the latest failures?


.mrg.


re: binutils still failing on amd64

2023-01-01 Thread matthew green
> > Sources updated to 2022-12-31 at 13:42:04 UTC and all output dirs (obj,
> > release, dist, tools) were cleaned.
>
> Is no-one else seeing this problem with ``build.sh tools'' ?

it's not seen by most because it depends upon the timestamps of
some files..  my first attempt to fix it failed, i haven't gotten
back to looking.

try manually touching any of the files the build is trying to
update for now.


.mrg.


re: 10_BETA: Nice QOL improvements to the installer

2022-12-24 Thread matthew green
nia writes:
> On Thu, Dec 22, 2022 at 08:05:02AM +0530, Mayuresh wrote:
> > On Thu, Dec 22, 2022 at 06:18:41AM +1300, Lloyd Parkes wrote:
> > > I used the second (non-BIOS) image because I guessed it might be a hybrid
> > > installer. I think that my old NUCs only support BIOS booting from USB
> > > sticks, but I could easily be wrong.
> > 
> > Ok. So, it appears the -bios image has become redundant now. Or hasn't it?
> > 
> > If yes, they may want to stop building it to preempt such confusion.
>
> The BIOS-only image exists because of broken firmware.

i have a system that still seems to load efiboot when configured in
CSM-enabled mode.  the only way to get it to load bootxx/boot was
to move bootx64.efi away.  i just discovered that this week.

when CSM-enabled, efiboot would then print the memory map and hang.

this system is also affected by PR#56714 -- and with bios booting
working [*], radeon accel is also (mostly) working.


anyway, what i suspect these broken systems do is still load efiboot
and then efiboot is in some environment it doesn't handle well and
then hangs... this is a guess.

thanks.


.mrg.

[*] - keyboard access in /boot is broke on my installed system but
seems to work on the USB with the bios image.  each key press ends
up generating 15-20 actual characters.


re: libX11 updated, fvwm, etc., hangs perhaps fixed now

2022-11-13 Thread matthew green
"John D. Baker" writes:
> I've updated to sources containing the new libX11 and rebuilt "wm/fvwm"
> without the patches posted in:
>
>   https://mail-index.netbsd.org/pkgsrc-users/2022/10/17/msg036348.html
>
> and the resulting fvwm appears to work properly.
>
> Thanks!

great news.  thanks for testing!

> The patches were added to pkgsrc-HEAD.  I suppose they can be removed
> now.

as i understand it, they probably should remain as the fixes
in libX11 are considered workarounds for buggy code.


.mrg.


HEADS UP: build break in xsrc update builds coming your way

2022-11-11 Thread matthew green
hi folks.


FYI: i just added this note to UPDATING:

20221111:
The new libdrm import worsened the conflict issues for the
kdump/ktruss ioctl, and i915 now conflicts with base, and has
been turned off.  This will cause update build issues like:

kdump-ioctl.c:12175:143: error: 'DRM_IOCTL_I915_DESTROY_HEAP'
   undeclared here (not in a function);
   did you mean 'DRM_IOCTL_MODE_DESTROY_DUMB'?

You'll need to clean usr.bin/ktruss, usr.bin/kdump, and rescue.


there are a few other things updated, please send-pr if you see issues.


.mrg.


libX11 updated, fvwm, etc., hangs perhaps fixed now

2022-11-10 Thread matthew green
hi folks.


the newly released libX11 1.8.2 claims to fix issues in fvwm,
xfce, and some motif stuff, related to hanging because of the
thread safety changes.

i know some problems were fixed, but this should now make the
old binaries work again.  i've merged into -current.

if you have something still problematic, or have been avoiding
using old binaries, it would be great to hear things work for
you again now.

thanks.


.mrg.


re: How to BIOS-boot from NVMe device?

2022-09-08 Thread matthew green
> >  > If anyone wants to play with UEFI booting and has access to a recent Xen
> >  > DOM0 system you can install the pkgsrc/sysutils/ovmf package and point a
> > pkgsrc/sysutils/ovmf does not build on -current at least - and hasn't been 
> > building for a long while. 
>
> Hmm... unfortunate...  it does build just fine on 9.2ish from 2022Q2
> pkgsrc.

this just built for me on a ~3-day old -current src & pkgsrc
system.  Chavdar, how does it fail for you?


re: current USE_SSP=yes build failure

2022-08-12 Thread matthew green
i've committed my fix for this after testing it.


.mrg.


re: current USE_SSP=yes build failure

2022-08-01 Thread matthew green
rudolf writes:
> Hi,
>
> I have "USE_SSP=yes" in mk.conf and the build is failing with:
>
> --- dependall-drivers ---
> /usr/xsrc/external/mit/xorg-server/dist/hw/xfree86/drivers/modesetting/drmmode_display.c:
>  
> In function 'drmmode_crtc_gamma_set':
> /usr/xsrc/external/mit/xorg-server/dist/hw/xfree86/drivers/modesetting/drmmode_display.c:1768:1:
>  
> error: stack protector not protecting local variables: variable length 
> buffer [-Werror=stack-protector]
>   1768 | drmmode_crtc_gamma_set(xf86CrtcPtr crtc, uint16_t * red, 
> uint16_t * green,
>| ^~
>
> Is this to be expected? Am I doing something wrong? The function itself 
> is very simple.

ah, this comes from the call this function makes:

if (drmmode_crtc->use_gamma_lut) {
drmmode_set_gamma_lut(drmmode_crtc, red, green, blue, size);

which is:

drmmode_set_gamma_lut(drmmode_crtc_private_ptr drmmode_crtc,
  uint16_t * red, uint16_t * green, uint16_t * blue,
  int size)
[ ... ]
struct drm_color_lut lut[size];


i'll figure out a fix or workaround.  thanks.
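
for reference, the usual way around this warning is to stop using a
variable length array.  the sketch below is an assumption about the
shape of such a workaround, not the fix that went in; struct lut_entry
is a stand-in for struct drm_color_lut and build_lut() is a made-up
name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* stand-in for struct drm_color_lut; treat the layout as an assumption */
struct lut_entry {
	uint16_t red, green, blue, reserved;
};

/*
 * Build a gamma table on the heap instead of in a variable length
 * array such as "struct lut_entry lut[size]".
 * Returns NULL on a bad size or allocation failure; caller frees.
 */
struct lut_entry *
build_lut(const uint16_t *red, const uint16_t *green,
    const uint16_t *blue, int size)
{
	struct lut_entry *lut;

	assert(red != NULL && green != NULL && blue != NULL);
	if (size <= 0)
		return NULL;
	lut = calloc((size_t)size, sizeof(*lut));
	if (lut == NULL)
		return NULL;
	for (int i = 0; i < size; i++) {
		lut[i].red = red[i];
		lut[i].green = green[i];
		lut[i].blue = blue[i];
	}
	return lut;
}
```

the heap version costs an allocation per call, but the buffer no
longer has a run-time size on the stack, which is what
-Wstack-protector is complaining about here.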


.mrg.


re: FYI: new X server in -current, among other X things

2022-07-24 Thread matthew green
> > (1) out of bounds problem in xserver/hw/xfree86/modes/xf86Crtc.h
> > 
> > OpenBSD/luna88k maintainer (Kenji Aoyama) reported the following fix
> > was necessary for non-XFree86 driver based dumb server (on luna88k etc.):
> >  https://gist.github.com/ao-kenji/afb0ea5b6dca04975161f84ab41ba32b
> >  https://gist.github.com/ao-kenji/b0fd6b876605ba1b2b43309233566153
> >  
> > https://cvsweb.openbsd.org/cgi-bin/cvsweb/xenocara/xserver/hw/xfree86/modes/xf86Crtc.h#rev1.16
>
> It turns out that at least luna68k Xorg server (happens to?) works
> without this change, but anyway upstream 1.22.x branch already
> has this fix:
>  
> https://gitlab.freedesktop.org/xorg/xserver/-/commit/75d70612888f18339703315549db781a22c0cb23
>
> I wonder if we should pull this fix or not for our (1.)21.1.4 tree..

this looks simple enough to just do.

> > (2) "-flipPixels" option removal
> > 
> > "-flipPixels" option (that inverts black and white on 1bpp server)
> > has been removed since 1.21.
> >  
> > https://gitlab.freedesktop.org/xorg/xserver/-/commit/d1c00c859c6676fbb540420c9055788bc19cb18f
> > 
> > As noted in the log the upstream authors claim
> > "No supported driver supports 1bpp anymore, nor has in a very long time."
> > 
> > However we still have several working servers (xf86-video-wsfb based
> > servers on mac68k and luna68k, monolithic servers for sun3 and x68)
> > and at least there was a report that this option was mandatory on SE/30.
> > So I would like to revert this change.
>
> It also turns out that the above changes also remove a member from
> ScrnInfoRec structure in hw/xfree86/common/xf86str.h and it breaks
> ABIs of xf86-video-* drivers.
>
> However fortunately the removed member "Bool flipPixels" in the
> ScrnInfoRec has not been used for -flipPixels options so we can
> safely pull back -flipPixels support by reverting the changes
> except xf86str.h.
>
> If there is no particular comments I would like to commit the
> attached (reverting -flipPixels removal) patch.

go for it.  we have a few things reverted, we maybe should
talk to upstream to have them either revert there or at least
provide the removed features elsewhere.

thanks.


.mrg.


re: FYI: new X server in -current, among other X things

2022-07-20 Thread matthew green
Robert Swindells writes:
> 
> I wrote:
> > It looks like not all the functions are getting setup in the glamor
> > struct by load_glamor(), I'm guessing because those functions are
> > not exported by libglamoregl.so.
> >
> > Do we need to add more source files to this:
> >
> > src/external/mit/xorg/server/xorg-server/hw/xfree86/glamor_egl/Makefile
>
> Adding all of the glamor modules to libglamoregl.so makes it stop
> crashing for me.

can you send a patch?  i'll look at it soon.


.mrg.


re: FYI: new X server in -current, among other X things

2022-07-18 Thread matthew green
Robert Swindells writes:
> 
> I wrote:
> >
> >>> [   378.033] (EE) 0: /usr/X11R7/bin/X (xorg_backtrace+0x44) [0x1467d46d5]
> >>> [   378.033] (EE) 1: /usr/X11R7/bin/X (os_move_fd+0x79) [0x1467d0465]
> >>> [   378.033] (EE) 2: /usr/lib/libc.so.12 (__sigtramp_siginfo_2+0x0) 
> >>> [0x75b46379c930]
> >>> [   378.034] (EE) 
> >>> [   378.034] (EE) Segmentation fault at address 0x0
> >>> 
> >>> This happens with ctwm as part of the base installation, as well as with
> >>> other pre-existing window managers and such from pkgsrc built against
> >>> 9.99.97.
> >>
> >>can you configure X to generate a core dump or run it
> >>under GDB and get the real stack trace?  i thought we'd
> >>fixed this problem in libexecinfo, but it's still not
> >>tracing through the SEGV above, so finding what is
> >>crashing where is what we need next.
> >
> >FWIW, I get the same on my Pinebook with a lima kernel, this may not be
> >i915 specific.
> >
> >Doing a full debug build now.
>
> Building with MKDEBUG=yes stops it crashing, but it also stops glamor
> from working.
>
> I guess it is back to printf().

with a normal build, you should at least be able to get
a stack trace with function names, if not line numbers.

you'll have to disable the xorg SEGV catcher... oh they
seem to have removed that entirely:

commit c7414f4d07b69a4b2f0d0af06f032393cf5fe6aa
Author: Adam Jackson
Date:   Wed Aug 22 14:57:05 2018 -0400 

xfree86: Remove NoTrapSignals 

This was dangerous on UMS and largely pointless on KMS.


have you tried running the (non-debug) one from inside
gdb as well, that should also give you something.


.mrg.


re: panic in evo_wait

2022-07-18 Thread matthew green
> > > [184218.xxx] fatal page fault in supervisor mode
> > > [184218.xxx] trap type 6 code 0x2 ...
> > 
> > this line's contents would have included the fault address,
> > which is kinda useful for next time :-)
>
> I've got the rip -- it's 0x8095e177.

oh - i was after the "cr2" value -- the actual fault address,
not the code address that triggered it.

your patch looks good.


.mrg.


re: panic in evo_wait

2022-07-17 Thread matthew green
> [184218.xxx] warning: 
> /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
>  1

can you patch this code to print the value of "data" here?
it's probably a bad request from userland, but the BUG_ON()
here does not give you any indication on _what_.

> [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> [184218.xxx] fatal page fault in supervisor mode
> [184218.xxx] trap type 6 code 0x2 ...

this line's contents would have included the fault address,
which is kinda useful for next time :-)

> [184218.xxx] curlpw 0xa8d4e6f36500 pid 27414.3207 lowest kstrack 
> 0xb589296452c0
> kernel: page fault trap, code=0
> Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x20000000,0(%rdx,%rax,1)
> evo_wait() at netbsd:evo_wait+0x7b
> base507c_ntfy_set()
> nv50_wndw_flush_set()
> nv50_disp_atomic_commit_tail()
> nv50_disp_atomic_commit()
> drm_atomic_helper_set_config()
> drm_mode_setcrtc()
> drm_ioctl()

can you find out where evo_wait+0x7b is?  in my kernel it's
at line 243, and the disasm seems to match your "movl" above.

235 evo_wait(struct nv50_dmac *evoc, int nr)
236 {
237 struct nv50_dmac *dmac = evoc;
238 struct nvif_device *device = dmac->base.device;
239 u32 put = nvif_rd32(&dmac->base.user, 0x0000) / 4;
240
241 spin_lock(&dmac->lock);
242 if (put + nr >= (PAGE_SIZE / 4) - 8) {
243 dmac->ptr[put] = 0x20000000;
244 evo_flush(dmac);

Dump of assembler code for function evo_wait:
   0x8084dfe1 <+0>:   push   %rbp
[...]
   0x8084e05c <+123>: movl   $0x20000000,(%rdx,%rax,1)

(0x7b = 123)

probably "dmac->ptr" is invalid here.  a quick guess at the
code indicates it's only set once in nv50_dmac_create(),
the source from the caller(s).  at least, i can't see it
set anywhere else right now.


.mrg.


re: FYI: new X server in -current, among other X things

2022-07-17 Thread matthew green
> can you post the whole Xorg.0.log somewhere?  most of
> my i915 systems have become non-functional the last few
> years, but i have one system to test.

unfortunately, my system (kaby lake, GT 630) seems to work
fine with xorg-server 21.1.4 for me.


re: FYI: new X server in -current, among other X things

2022-07-16 Thread matthew green
> TL;DR: after upgrading via the sets available from releng builds from
> July 16th (http://releng.netbsd.org/builds/HEAD/202207160630Z) I'm not
> able to start X on amd64 with i915 graphics. Separately, there may be
> issues with libX11 1.8.1 where clients will hang due to recursive locks
> occurring.

the libX11 thing is pretty terrible.  upstream says that
_not_ enabling it means other things are broken.  i don't
know anything better than fixing the clients i guess,
which is pretty terrible for backwards compat code/binaries.

> [   378.033] (EE) 0: /usr/X11R7/bin/X (xorg_backtrace+0x44) [0x1467d46d5]
> [   378.033] (EE) 1: /usr/X11R7/bin/X (os_move_fd+0x79) [0x1467d0465]
> [   378.033] (EE) 2: /usr/lib/libc.so.12 (__sigtramp_siginfo_2+0x0) 
> [0x75b46379c930]
> [   378.034] (EE) 
> [   378.034] (EE) Segmentation fault at address 0x0
> 
> This happens with ctwm as part of the base installation, as well as with
> other pre-existing window managers and such from pkgsrc built against
> 9.99.97.

can you configure X to generate a core dump or run it
under GDB and get the real stack trace?  i thought we'd
fixed this problem in libexecinfo, but it's still not
tracing through the SEGV above, so finding what is
crashing where is what we need next.

does it happen when X starts up?  maybe it crashes with
plain running "X" without any arguments (ie, not using
some frontend that will also fire up clients etc.)

can you post the whole Xorg.0.log somewhere?  most of
my i915 systems have become non-functional the last few
years, but i have one system to test.


.mrg.


FYI: new X server in -current, among other X things

2022-07-14 Thread matthew green
hi folks.


i've updated most of xsrc to their latest versions.
fontconfig and Mesa are remaining.  i've tested the
new code on amd64 and arm64, and built several ports
to confirm they still build.  the biggest change is
the new xorg-server.

there are probably a few build issues left to find
across all ports, and perhaps some run-time ones too
but basic testing looks fine for me.

please send-pr or email here if you find problems.

thanks!


.mrg.


re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot

2022-07-10 Thread matthew green
> (but I'm not sure 64KB blocksize is valid on FFS because
>  newfs(8) man page just says 4KB-32KB for it)

FWIW, i've been using 64K block *and* frag size FFS for over
a decade without any problem, on a file system that almost
always has extremely large files on it.

so, this should be fixed in the manual i guess.


.mrg.


re: savecore weirdness

2022-07-05 Thread matthew green
> I've tried overwriting the first 100MB of the 'dp' entry in my fstab
> with zeroes in the hope of getting rid of the crashdump, but that
> didn't help either. How can I get rid of the crashdump so savecore
> doesn't try again to write it out?

martin answered this, but to answer differently, the core dump
is stored at the *end* of the dump partition, so clearing it is
kind of annoying -- you have to work out "dumplo", and then 
count backwards from the end, etc.

the purpose is that swap starts at the start, and dumps are at
the end, so, if savecore needs to swap, it hopefully won't
overwrite the not-yet-read dump data.
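
the "count backwards from the end" part is just this arithmetic.  it
is a simplified sketch: dump_start_sector() is a made-up helper, and it
ignores any header or reserved space the real kernel accounts for when
it picks dumplo:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the size of the dump partition and the size of the dump,
 * both in sectors, return the first sector of a dump that ends
 * exactly at the end of the partition.  Returns -1 if the dump
 * does not fit.
 */
int64_t
dump_start_sector(int64_t part_sectors, int64_t dump_sectors)
{
	assert(part_sectors >= 0 && dump_sectors >= 0);
	if (dump_sectors > part_sectors)
		return -1;
	return part_sectors - dump_sectors;
}
```

so zeroing the first 100MB of the partition never touches the dump
unless the dump is nearly as big as the partition itself.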


.mrg.


re: effective use of blkdiscard(8)?

2022-06-25 Thread matthew green
nia writes:
> blkdiscard(8) seems like a command in -current that's useful for regular
> maintenance of SSDs.
>
> I would assume that a regular run of:
>
> blkdiscard -v /dev/rwd0d
>
> would be useful to TRIM an entire SSD, obviously destructively, so would
> be useful when reinstalling NetBSD.

correct.

> However, what about less obvious cases?
>
> A large file could be created, for example, with dd:
>
> # dd if=/dev/zero of=./testfile bs=4m count=10000
>
> Then discarded:
>
> # blkdiscard -v ./testfile
>
> Would this effectively mark 40GB of this drive unused to its controller?

that's my understanding, assuming the answers to the questions you
ask below are "yes".  it certainly does seem to be the case
in my testing -- sometimes reading this data would return the
same data, sometimes random data (!? where from?), and most of
the time, zeroes -- so it certainly was triggering the storage.

an additional thing we could/should add is "fstrim", which is
designed to be run eg, weekly, and the idea is to tell the disk
to discard all the unallocated sectors on the disk, which would
give you the above feature without having to do anything, and
in fact give it to you for all the unused space.  i have not
looked at how linux implements this, but it clearly needs the
file system itself to implement the backend.

> How good are we at propagating TRIM commands through various block device
> layers?
>
> Is fdiscard() effective on a file on FFSv2 on a cgd(4) on a dk(4) wedge?
>
> What about ZFS on a dk(4) on a cgd(4) on a dk(4) wedge?

all of this depends upon what their driver 'd_discard'
method does.  the only ones that are not assigned to be
"nodiscard" are:

/home/src/current/src/sys/dev/ld.c:95:  .d_discard = lddiscard,
/home/src/current/src/sys/dev/ld.c:110: .d_discard = lddiscard,
/home/src/current/src/sys/dev/ld.c:123: .d_discard = ld_discard
/home/src/current/src/sys/dev/ata/wd.c:154: .d_discard = wddiscard,
/home/src/current/src/sys/dev/ata/wd.c:171: .d_discard = wddiscard,
/home/src/current/src/sys/dev/ata/wd.c:230: .d_discard = wd_discard
/home/src/current/src/sys/dev/dkwedge/dk.c:125: .d_discard = dkdiscard,
/home/src/current/src/sys/dev/dkwedge/dk.c:140: .d_discard = dkdiscard,

so the vast majority of disk drivers do not support
this yet.
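
as a toy model of why the stacking question comes down to each
layer's method: everything below is hypothetical (names, struct
layout, error values) and only mimics the idea of a per-driver discard
slot, where a wedge-like layer forwards the request and a layer
without support refuses it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct layer_sketch;
typedef int (*discard_fn)(struct layer_sketch *, int64_t pos, int64_t len);

/* hypothetical stand-in for one block device layer */
struct layer_sketch {
	discard_fn discard;		/* plays the role of d_discard */
	struct layer_sketch *parent;	/* layer underneath, NULL for a disk */
	int64_t offset;			/* where this layer starts in the parent */
	int64_t size;			/* size of this layer in bytes */
	int64_t discarded;		/* bytes a disk was asked to discard */
};

/* layer without discard support: always refuses */
int
sketch_nodiscard(struct layer_sketch *l, int64_t pos, int64_t len)
{
	(void)l; (void)pos; (void)len;
	return ENODEV;
}

/* bottom layer: pretend to send the TRIM to the hardware */
int
sketch_disk_discard(struct layer_sketch *l, int64_t pos, int64_t len)
{
	if (pos < 0 || len < 0 || pos > l->size - len)
		return EINVAL;
	l->discarded += len;
	return 0;
}

/* wedge-like layer: bounds check, shift by the offset, pass it down */
int
sketch_wedge_discard(struct layer_sketch *l, int64_t pos, int64_t len)
{
	assert(l->parent != NULL);
	if (pos < 0 || len < 0 || pos > l->size - len)
		return EINVAL;
	return l->parent->discard(l->parent, pos + l->offset, len);
}
```

in this model a discard issued on the wedge reaches the disk, but one
issued on a layer stacked above it with the refusing method never gets
that far, no matter what sits underneath.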


.mrg.


re: Potential iostat output format change

2022-06-07 Thread matthew green
k...@munnari.oz.au writes:
> Anyway, let me know what you think - is this worth finishing, or will
> the changes break people's scripts (or something similar) - or do you
> just prefer it the current way.

i like this.

i find the ordering of the default output has the same problem,
and had vaguely been considering looking at at least the column
size problem, but i really like the idea of re-ordering the
columns so they're more visually separate.

thanks.  please finish it.


.mrg.


re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-25 Thread matthew green
[ .. ]
> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
> after installing to a qcow2 disk any attempt to boot the disk results in
> not being able to find the boot device.  However, the boot log shows

was this between 2022-05-08 and 2022-05-22?  i accidentally
broke some types of bootable images that Jared fixed, and
i think this error matches the failure seen.


.mrg.

https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html


re: Radeon HD 5450?

2022-05-16 Thread matthew green
Phil Nelson writes:
> On Wed, 11 May 2022 11:15:42 +1000
> matthew green  wrote:
>
> > do you have anything else handy to test?  gpus are crazy stupid
> > prices these days :-(
>
> Hi Matthew,
>
>   My department has several nvidia around and I have not yet found
> one that works to the point of getting X running.  I've tried
> the following:
>
>MSI GEFORCE GTX 1060
>GIGABYTE GEFORCE GTX 1650
>EVGA GEFORCE GTX 1018Ti (not enough power)
>An older Radeon I had sitting around, not sure which one but
> it blew up in the same place as the 5450 ... not mapping
> the BIOS.
>The video chip on the motherboard ... it finds it as
> acpivga0 with acpiout0 to acpiout7.  It finds a genfb0
> and labels it "Intel Rocket Lake UHD Graphics 750 (32EU) (rev. 0x04)
> It then reports drm at genfb0 not configured.  I do get a
> working wscons with 4 screens.  "X -configure" quits with
> an error saying that the number of created screens does not
> match number of detected devices.  In the Xorg.0.log when
> it probes for the Intel integrated Graphics Chipsets it
> doesn't list the 750 and it doesn't match it.
> I don't have heavy gpu requirements so if I could get the
> intel UHD graphics working, that would be good.
>
>   You said you have a working nouveau 730.  I'll see if I can
> acquire one of those to try.  Any specific card you recommend?

i have asus 730 and asus 1030 silent cards both working for me
in my two main desktop systems now.  the 730 did once assert(3)
in libdrm_nouveau and X exited, and the 1030 has once had some
minor display damage (green dots over the root window, likely
generated by my green-on-black terminal, but cleared by simply
moving a window over that space), and it's only been a couple
of weeks using the 730, and a few days for the 1030.

i don't have anything newer/better due to prices, and also cuz
the above are more than sufficient for my needs.

it's possible that back porting the rocket lake code wouldn't
be too difficult -- that was true a few years back when i did
this for kabylake when skylake was already supported... quick
peek says that RKL appeared right after our drm, sometime between
linux 5.6 and 5.10, and unfortunately, this struct:

static const struct intel_device_info rkl_info = {

in the new code has a couple more members inside struct
intel_device_info{} than our code has, so the back port would need
to consider these parts too.


.mrg.


re: Trendnet TEW-648UBM detected as ugen not urtwn

2022-05-13 Thread matthew green
Brook Milligan writes:
> I am trying to use a Trendnet TEW-648UBM usb wifi dongle, which is
> supposed to be recognized by the urtwn driver.  However, it is
> recognized as a ugen device, instead.
>
> [   2.9586490] ugen0 at uhub1 port 1
> [   2.9586490] ugen0: Realtek (0x20f4) 802.11n WLAN Adapter (0x648c),
> rev 2.00/2.00, addr 3
>
> I am not sure how to extract relevant information from the device.  For
> example, what usb tools should be used to figure out why this is not
> recognized by urtwn?

i guess see urtwn_devs[] in if_urtwn.c.  it has no entry
for this ID (0x648c) (nor does usbdevs at all.)

ie, add to usbdevs, make -f Makefile.usbdevs; add the new
id string to if_urtwn.c.  test.  commit the usbdevs file,
regen usbdevs*.h again (with the updated rcsids), and then
commit the changes to usbdevs*.h and if_urtwn.c.

hopefully it's actually still a urtwn(4).  :-)
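
as a rough sketch of what "no entry for this ID" means: the driver
matches on a vendor/product pair from a table.  the code below is a
stand-alone illustration only, not the real if_urtwn.c code; the
names example_devs[] and example_lookup() are made up, and the first
table entry is just a placeholder.  the second entry is the ID pair
from the dmesg above.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for a driver's vendor/product match table */
struct example_devno {
	uint16_t vendor;
	uint16_t product;
};

/* illustrative entries only; the real table is built from usbdevs names */
static const struct example_devno example_devs[] = {
	{ 0x0bda, 0x8176 },	/* placeholder for an already-listed device */
	{ 0x20f4, 0x648c },	/* the pair from the ugen0 dmesg line, once added */
};

/* return the matching entry, or NULL if the pair is not in the table */
const struct example_devno *
example_lookup(const struct example_devno *tbl, size_t n,
    uint16_t vendor, uint16_t product)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].vendor == vendor && tbl[i].product == product)
			return &tbl[i];
	}
	return NULL;
}
```

with only the first entry present the lookup for 0x20f4/0x648c
returns NULL, which is the situation where the device falls through
to ugen.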


.mrg.


re: Radeon HD 5450?

2022-05-10 Thread matthew green
Phil Nelson writes:
> Hi All,
>
>I've been trying to get -current running on a new Dell Precision
> 3650.  It is a UEFI boot only machine and when booting -current
> with a Radeon HD 5450 installed (which works great on 9.2 on
> an Dell Optiplex 7040) it panics when it can't find the Radeon BIOS.
>
> The messages at this point are:
>
>   kern info: [drm] register mmio base: 0x7090
>   kern info: [drm] register mmio size: 131072
>   {drm:netbsd:radeon_get_bios+0x480} *ERROR Unable to locate a BIOS ROM
>   radeon0: autoconfiguration error: error: Fatal error during GPU init
>   radeon0: autoconfiguration error: unable to register drm: 22
>   panic: cnopen: no console device
>   ...
>
> Is the 5450 too old a device for the UEFI boot only machine or 
> is there a way to get the BIOS address for the autoconfiguration?

i can't easily check for a couple of weeks, but i have a system
i think i had to use UEFI for that had a 5450.  it didn't fail
entirely like the above, it failed the "ring 3" test, and
disabled acceleration.  this meant eg, X worked ok, but many things
use a lot of CPU.  fortunately, this is a zen3 system so it's got
a lot of CPU -- would cost about 1.5 cpus to play a 1080p video.

this system has a nouveau 730 in it now, and everything is better
except one time libdrm_nouveau triggered an assert() and X crashed.
(an operation that should have a resource available didn't have it,
and the assert() tripped this wanted invariant.  i don't have the
details handy.)

the 5450 is old enough that while pcie shouldn't have these sorts
of problems, i've had modern systems fail with pcie gpus, and i've
had newer gpus fail in older pcie systems -- i believe it was the
radeon RX 550 that caused my (old) core2 system to not boot.

do you have anything else handy to test?  gpus are crazy stupid
prices these days :-(


.mrg.


re: Supported graphics (in HEAD)

2022-05-09 Thread matthew green
>   Radeon RX 550 (HDMI, DP, and DVI with a DVI to HTML converter)

FWIW, i put my RX 550 into my test box yesterday and ran my
basic stress test -- 12 glxgears tiled separately and then
playing a movie on top of it.

it failed.  the GPU resets itself a few times, there's severe
display corruption, and usually a reboot is needed to get
the system back.

i don't know if simple usage will work better, but there are
some significant bugs left here for us to find..  these are
the older bugs from the new drm branch on github before the
merge, in case anyone wants to look at them:

https://github.com/riastradh/netbsd-src/issues/24
https://github.com/riastradh/netbsd-src/issues/28
https://github.com/riastradh/netbsd-src/issues/42

(#42 appears to be the same problem as 24 and 28.)


.mrg.


re: Supported graphics (in HEAD)

2022-05-08 Thread matthew green
Tom Ivar Helbekkmo writes:
> Robert Elz  writes:
>
> > Any advice?
>
> Well, in my experience, nvidia is probably something you only want if
> you have lots of RAM in your workstation.  In HEAD, there's a lot of
> memory leaking going on - every change to the image on the monitor leaks
> kmem-04096 items, and on my 1920x1080 monitor, watching a youtube video
> in firefox leaks 2-300 of those per second.
>
> Of course, I only notice because I have a mere 4 GiB of RAM in this
> workstation, which is more than plenty for the first couple of hours of
> work (firefox and a few terminal windows, using the browser as little as
> possible, and completely avoiding video), but demands a daily reboot.

can you file a PR about this?  i don't see the problem on a
750 or 730 cards.  i don't have anything newer yet.  (well,
there's a 9x0M in a laptop, but i haven't managed to get any
drm to find the video bios for that one and work.)

there are likely some dtrace methods we can use to find the
leak you're seeing, but it might be good to keep it all in
the PR :)

thanks.


.mrg.


re: Supported graphics (in HEAD)

2022-05-07 Thread matthew green
Robert Elz writes:
> Date:Sat, 07 May 2022 14:28:12 +1000
> From:    matthew green 
> Message-ID:  <16731.1651897...@splode.eterna.com.au>
>
> Thanks for the reply.
>
>   | the GTX 16xx are both in the recent supported list for
>   | new drm,
>
> Thanks, I might try one to see.   But where is that list?
> I searched everywhere I could think of, and could not find it.

it's not obvious.  i usually start with the PCI frontend that
points to a list of pciids, and then you have to match those
to product names.

nvidia is actually a little easier because we support everything
up to, but not including, the latest GTX 30 series.

eg, sys/external/bsd/drm2/nouveau/nouveau_pci.c:

 * NetBSD drm2/5.6 doesn't support Ampere (GTX 30 series) based cards:
 *   0x2080-0x20ff  GA100
 *   0x2200-0x227f  GA102
 *   0x2300-0x237f  GA103
 *   0x2480-0x24ff  GA104
 *   0x2500-0x257f  GA106
 *   0x2580-0x25ff  GA107
 *
 * TU116 (GTX 16xx) occupies the space from 0x2180-0x21ff.
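
as a rough illustration of how to read those ranges (this is not the
actual nouveau_pci.c code; the names example_ampere_ranges[] and
example_is_ampere() are made up here), a check for "is this product
id in the unsupported Ampere space" could look like:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct example_id_range {
	uint16_t lo;
	uint16_t hi;
};

/* ranges copied from the comment quoted above */
static const struct example_id_range example_ampere_ranges[] = {
	{ 0x2080, 0x20ff },	/* GA100 */
	{ 0x2200, 0x227f },	/* GA102 */
	{ 0x2300, 0x237f },	/* GA103 */
	{ 0x2480, 0x24ff },	/* GA104 */
	{ 0x2500, 0x257f },	/* GA106 */
	{ 0x2580, 0x25ff },	/* GA107 */
};

/* true if the PCI product id falls in one of the listed Ampere ranges */
bool
example_is_ampere(uint16_t product)
{
	size_t n = sizeof(example_ampere_ranges) /
	    sizeof(example_ampere_ranges[0]);

	for (size_t i = 0; i < n; i++) {
		if (product >= example_ampere_ranges[i].lo &&
		    product <= example_ampere_ranges[i].hi)
			return true;
	}
	return false;
}
```

note that an id in 0x2180-0x21ff (TU116) sits between two of these
ranges and is not matched, which is why a single "everything above
0x2080" test would not be enough.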

for radeon sys/external/bsd/drm2/radeon/radeon_pci.c:

radeon_pci_lookup(const struct pci_attach_args *pa, unsigned long *flags)
...
if ((PCI_VENDOR(pa->pa_id) == radeon_device_ids[i].vendor) &&
(PCI_PRODUCT(pa->pa_id) == radeon_device_ids[i].device))

so then you have to find radeon_device_ids[] and realise it's set up
with the list in "radeon_PCI_IDS".

for amdgpu the list is directly in dist/drm/amd/amdgpu/amdgpu_drv.c.

i don't have a good solution for mapping pciids to products, but
searching the internet usually finds stuff.

>   | but the Radeons are not there (these are Navi
>   | 2x GPUs, and new drm only went to Navi 1x.)
>
> How about the RX 550 ?   I forgot that one when I sent the message
> yesterday.

RX 550 mostly works.  it's been a while since i tried and this is
actually a card i have..

>   | i don't know how well they work thought, so if you can
>   | find something older, like geforce 700 series,
>
> Does the RTX T400 or T600 count as older?   I had assumed not,
> but I know less than nothing about any of this.

T400 and T600 are maybe supported.  they live in the most
recently supported list for nvidia, being Turing chipsets, so
they should at least attempt to attach and work, but i've only
heard of someone attempting the previous generation (these are
the same chips as GTX 20 series.)

> Of course if I could find the "supported" list(s) (as applicable
> to current HEAD) I might be able to answer these questions for
> myself.   Supported means by both the kernel & X server (base
> or pkgsrc) naturally.
>
>   | my 2c.
>
> Thanks, worth more than that.
>
> I was kind of hoping (dreaming) that someone might say "If you
> really don't care about acceleration" (I don't) "then just disable
> x using userconf" (needing to build a custom kernel fine as well)
> "and it should just work" (for any one of the 3 possible gpu types).

nia's answer here should be useful :-)


.mrg.


re: Supported graphics (in HEAD)

2022-05-06 Thread matthew green
> What I need from the new one is no different than I needed
> then, a flat frame buffer, capable of supporting 3 high res
> monitors (3840x2160, 1440x2560 (portrait mode), and 2560x1080.)

it's the 3840x2160 that rules out the older cards for your
requirements -- they max out at 2560x1440 IIRC.

> The oldest addin graphics cards avaikable to me are:
>
>   Radeon RX 6500
>   Nvidia GeForce GTX 1650
>
> but those don't really offer suitable monitor connections.
> There are Nvidia 1030's listed, but all "not available".
>
> Next are:
>
>   Nvidia GeForce GTX 1660
>   Radeon RX 6600
>
> Which look as if they might be workable, if supported.

the GTX 16xx are both in the recent supported list for
new drm, but the Radeons are not there (these are Navi
2x GPUs, and new drm only went to Navi 1x.)

i don't know how well they work though, so if you can
find something older, like geforce 700 series, that is
likely to work better (i have a 730 in one of my systems,
and besides tripping on a libdrm_nouveau assert once --
which meant X crashed unfortunately -- it has been fine.)

my 2c.


.mrg.


re: Stable names for USB serial adapters

2022-04-30 Thread matthew green
> Perhaps you, like me, are frustrated that USB serial devices can get
> enumerated in non-deterministic ways, which makes putting those device
> names in configuration files (such as /etc/remote) less than useful.
>
> I threw together a little devpubd hook to fix this problem for those
> adapters that have serial numbers (FTDI devices seem to reliably have
> these):
>
>   https://www.netbsd.org/~thorpej/99-ucom-symlinks
[ .. ]

this works great!  if i have serial numbers in my ucoms :-(
out of 20 devices, i have 3 with serial numbers, leaving me
with 22 ucoms without a stable name (5 dual port devices.)

tempted to suggest we include something like this in src,
i just wish it could work better for me.  i just spent far
too long making them attach in the same order in a new
machine using hard coded kernel config.. oh well.

thanks!


.mrg.


re: Understanding's snippet of athn(9) code

2022-03-05 Thread matthew green
Farhan Khan writes:
> Hi all,
> I am trying to understand a snippet of athn(9) code for the purpose of
> porting to FreeBSD. I am reading the function athn_usb_htc_setup()
> located in /usr/src/sys/dev/usb/if_athn_usb.c. After tracing it
> through, it seems to terminate at a usbd_setup_xfer(9) call.
>
> Is this the equivalent of setting up which USB function will handle
> which channel? This function seems similar to FreeBSD's
> usbd_transfer_setup(9), which I believe does that. If so, how is that
> different from athn_usb_open_pipes()?
>
> If not, what does athn_usb_htc_setup() do? It is not clear to me and
> therefore I am having trouble making the translation.

usbd_setup_xfer() is used to set up one USB transfer.  it
requires that an open pipe already be provided.  the
"TRANSFERS" section of usbdi.9 in netbsd has more details
than the above:

   https://man.netbsd.org/usbdi.9

it doesn't do much more than fill in the "usbd_xfer"
structure for the transfer operation - does not change
the status of the device in any way until the transfer
is actually submitted.


.mrg.


re: odd setlist failure

2022-02-25 Thread matthew green
this should be fixed now.  sorry for the fallout.


.mrg.


re: HDMI sound not working

2022-02-19 Thread matthew green
Jaap Boender writes:
> > connected to a dell ultrasharp lcd (both 2415 and 2715 models) using
> > it's audio jack connected to a 2.1 speaker setup.
>
> So just to be sure - you get the sound to the monitor by HDMI and then 
> onwards with the audio jack? Then there's basically no difference in our 
> setups and I should be able to get mine to work somehow. Thanks for 
> this, knowing that it's possible is a big help.

yes.


.mrg.


re: HDMI sound not working

2022-02-18 Thread matthew green
i don't have anything useful for you, except to say that this should
or can be a working setup.

> I've got a setup with two sound cards: the on-board sound chip, and the 
> graphics card (a Radeon RX550). These both seem to be dectected (after 
> adding the HDAUDIO_ENABLE_HDMI option to the kernel config), as the 
> dmesg shows:

this works for me, across a couple of systems (same GPU), the last
few years.  my setup was haswell + supermicro motherboard, and is now
zen2 + asus m/b, both with radeonhd 5450.  my mixerctl, audioctl, and
audiocfg output match yours almost identically except i'm missing the
8 channel options, and my mixerctl has just this:

   outputs.dacsel=HDMI00

connected to a dell ultrasharp lcd (both 2415 and 2715 models) using
its audio jack connected to a 2.1 speaker setup.

i last updated my kernel about 3 weeks ago.


.mrg.


re: well-supported card for new DRM?

2021-12-27 Thread matthew green
> I'm looking to upgrade my graphics card. What's the newest generation
> that's well supported by NetBSD-current now?
> NVidia "Pascal" (e.g. GTX 1050 Ti)?
> Radeon "Polaris" (e.g. Radeon RX 550)?
> or even something newer?

we have reports that 1030 works well.  i still haven't gotten
a newer nvidia since those are beyond my toy-gpu-card-price :-)

the RX 550 mostly works but we are still having some bugs (i
have one of these from before they went up about 2x in cost).
the two issues i reported before merge:

   https://github.com/riastradh/netbsd-src/issues/24
   https://github.com/riastradh/netbsd-src/issues/28

i don't think we got fixes for these yet.  #42 looks like the
same issue as my #24, and #34 is a panic i've never seen but
on the next generation card (5500.)

HTH.


.mrg.


re: HEADS UP: Merging drm update

2021-12-19 Thread matthew green
> Please update and try again?  (I've only compile-tested the changes,
> will take a closer look tomorrow if it doesn't fix the problem.)

seems to work for me.  i can once again mostly play 720p video
with "mpv -vo x11".

thanks!


.mrg.


re: backward compatibility: how far can it reasonably go?

2021-12-09 Thread matthew green
> > On Dec 8, 2021, at 10:52 AM, Greg A. Woods  wrote:
> > 
> > That's one bullet I've dodged entirely already since my oldest systems
> > are running netbsd-5 stable.  (Though in theory isn't there supposed to
> > be COMPAT support for SA?)
>
> int
> compat_60_sys_sa_register(lwp_t *l,
> const struct compat_60_sys_sa_register_args *uap,
> register_t *retval)
> {
> return sys_nosys(l, uap, retval);
> }
>
> SA is one of those things that's REALLY hard to provide compatibility for.

indeed, and only static userland would be affected, as ad@ also
provided replacement libpthread.so's for netbsd-4 that made it
use the newer kernel system instead, for use in chroot, and
this technique could also be provided for earlier if needed.


.mrg.


re: DRM access rights

2021-11-20 Thread matthew green
> libGL error: failed to open drm device: Permission denied
> libGL error: failed to load driver: i965
>
> how can I solve this? Usually it is the means of adding oneself to a 
> specific group, changing devfs, but found nothing of there like.

check the perms on /dev/dri/card0.  make sure your console
user has read/write access to it.
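
if you'd rather check from a program than with ls -l, a minimal
sketch using access(2) is below.  example_can_rw() is a made-up
helper name, and the path is whatever device node you care about;
access(2) checks against the real uid of the caller, so run it as
the user that starts X.

```c
#include <stdbool.h>
#include <unistd.h>

/* true if the calling user can both read and write the given path */
bool
example_can_rw(const char *path)
{
	return access(path, R_OK | W_OK) == 0;
}
```

calling example_can_rw("/dev/dri/card0") and getting false matches
the "Permission denied" case above (or the node not existing at
all.)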


.mrg.


HEADS UP: DTS update will renumber rockpro64 sd and emmc storage

2021-11-11 Thread matthew green
hi folks.


an unfortunate problem with the DTS 5.15 update is that the
sdio device has been enabled by default, which means that 
the default ordering of sdmmc(4) devices changes, which leads
to the ld(4) numbers changing too.

the old sdmmc0 and sdmmc1 become sdmmc1 and sdmmc2, and so
ld0 and ld1 become ld1 and ld2.

there's no real simple fix for this, as we want to enable
the sdio support, and forcing the old attachments would
require some ugly patches.

fortunately, installations with only one sd or emmc likely
already use the "ROOT.a" method in /etc/fstab, which means
that the change won't affect the root file system.

a different pre-fix would be to reconfigure these as gpt and
dk and mount via name or uuid, then it won't matter what the
device unit is.


.mrg.


re: IDENTIFY failed

2021-10-28 Thread matthew green
> > wd1 at atabus1 drive 0
> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for 
> > drive 0
> > wd1: autoconfiguration error: IDENTIFY failed
> > wd1(ahcisata0:1:0): using PIO mode 0
> >
> > and booting fails. Reverting and booting with 9.99.90 gets me a working box:
> >
> > wd1 at atabus1 drive 0
> > wd1: 
> > wd1: drive supports 16-sector PIO transfers, LBA48 addressing
> > wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
> > ...
> > wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 
> > (Ultra/133) (using DMA), NCQ (31 tags)
> >
> > I'm sure someone else saw this too, but I can't find the original post...
>
> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

> between
> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed 

two possible changes to test reverting:

   http://mail-index.netbsd.org/source-changes/2021/10/05/msg132733.html

which changed how some interrupt handling works, and:

   http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html

which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


.mrg.


re: MIDI with Java on -current?

2021-10-13 Thread matthew green
Tom Ivar Helbekkmo writes:
> I have this Java application (JSynthLib) that needs to talk MIDI with my
> synthesizers.  I've previously run it with older versions of the JRE,
> using the LinuxCharDevMidiProvider that came with it - but that no
> longer works with the current environments.  All I really need is a
> standard interface between the official MIDI bits in the JRE and the
> NetBSD /dev/rmidi stuff...
>
> Anyone know of something like that?

did anyone ever port portmidi?  i only see portaudio in pkgsrc.

i don't know, but i would expect this is your best bet :(

can you find the old LinuxCharDevMidiProvider code and port it
to the new java?  they probably ditched an old API that was the
same as the one we implement since linux dropped it ages ago.


.mrg.


re: ./build.sh tool build failure in very recent NetBSD-current

2021-07-12 Thread matthew green
> I feel that this failure is related to recent gmp update.

probably.  it didn't happen initially but i see it now.

will fix..


.mrg.


re: Anyone feel like fixing pkgsrc/emulators/tme ?

2021-07-11 Thread matthew green
> but starting tmesh via gdb works??

i've had some crashes with pkgsrc/graphics/blender lately that
go away in gdb.  (it's kinda annoying, the -g enabled blender
takes a really long time for gdb to load...)

this is amd64 and 9.1-ish userland.


.mrg.


re: requires a working dlopen()

2021-07-10 Thread matthew green
dashdruid writes:
> Hello List,
>
> I keep getting this error whatever I try to build from pkgsrc on NetBSD9.1 
> i386.
>
> Even if I follow the basic tutorial with figlet:
>
> https://wiki.netbsd.org/pkgsrc/how_to_use_pkgsrc/
>
> It's the same error for all packages.

what's the actual error?

> I have tried with GCC10, 7, 5
>
> Also LEX was not present in /usr/bin so I have installed flex and linked the 
> flex binary to lex, idk if its a problem.

sounds like you missed installing the "comp" set, which
would make building packages quite challenging yes.


.mrg.


re: X11 doesn't start -current amd64 lenovo laptop

2021-05-10 Thread matthew green
can you try without an xorg.conf at all?  this hardware is
not currently supported by our kernel drm driver and will
need to fallback to wsfb or vesa.  it should do automatically
without an xorg.conf to force a driver.

in general, X -configure is no longer recommended, and a
minimal xorg.conf for only the parts that aren't default OK
is what we recommend now.  when i am trying to force a driver
all i have is an xorg.conf with just this:

---
Section "Device"
Identifier  "Card0"
Driver  "wsfb"
EndSection
---

thanks.


.mrg.


re: Problem reports for version control systems

2021-04-30 Thread matthew green
> I too get long pauses with cvs, both at the beginning,
> and even longer at the end after update is complete.

the end part is most likely cvs cleaning up after itself by
removing all the subdirs it created but doesn't need.

check disk io or ktrace for this part -- it's usually a
local iops issue, rather than a network issue.


.mrg.


re: math/cgal and gcc10

2021-04-25 Thread matthew green
> Here (or pkgsrc-users?) seems ok.  But my question would be if cgal
> documents that it needs a C++11 compiler, in which case this change is
> right regardless, or if it's supposed to be ok with C++03, in which case
> maybe something else is wrong.

the release notes from 2017 say that demos require c++11 now
but the library is c++03 itself.

i've tested building with this change on netbsd-9 and in
current with GCC 10, and it seems fine in both.

i'll commit the suggested fix, barring objections.

thanks.


.mrg.


re: HEADS UP: GCC 10 now default on several ports

2021-04-21 Thread matthew green
matthew green writes:
> i saw a report that netbsd-8 can't be built on -current but i'm
> not finding it right now.
>
> i can confirm this is the case.  you can work around the GCC 10
> inspired issues for now with eg:
>
>./build.sh -V HOST_CFLAGS='-fcommon -O2'

this is likely to remain necessary unless we pullup fixes for
at least make(1), if not more.  i don't think that will happen,
though if someone were to do the work we would consider it.

> but then there is a -current regex vs -8 file magic regex issue.
>
> christos and i working on fixes for that.

this part is now fixed in the netbsd-8 branch, and the tree can
fully build with HOST_CFLAGS set as above.

thanks.


.mrg.


re: HEADS UP: GCC 10 now default on several ports

2021-04-19 Thread matthew green
i saw a report that netbsd-8 can't be built on -current but i'm
not finding it right now.

i can confirm this is the case.  you can work around the GCC 10
inspired issues for now with eg:

   ./build.sh -V HOST_CFLAGS='-fcommon -O2'

but then there is a -current regex vs -8 file magic regex issue.

christos and i are working on fixes for that.


.mrg.


re: GCC 10 available for testing etc. in -current.

2021-04-17 Thread matthew green
> > - build.sh with no -u (update), and set -V HAVE-GCC=10 as a
> >option.  this ensures that everything is actually rebuilt
> >with the new compiler.
>
> I'm guessing that should be "-V HAVE_GCC=10", but even so I just can't 

yup!

> get this to build. I always get the message "cc: error: CET_HOST_FLAGS@: 
> No such file or directory". I'm going to see if I can find where this 
> has come from. Does it ring any bells for anyone?

this is from GDB:

gdb/dist/libiberty/Makefile.in:116:   @CET_HOST_FLAGS@

did you try clean'ing the gdb objdirs?  (both tools and the
build one.)  i think i recall a while back this was a problem
when GDB was updated.


.mrg.


re: HEADS UP: GCC 10 now default on several ports

2021-04-17 Thread matthew green
"Thomas Mueller" writes:
> > i've switched the alpha, amd64, sparc*, riscv*, ia64, and vax ports
> > have all been switched to GCC 10. 
>
> > please send-pr or send email here about problems you encounter.
> 
> > thanks. 
> 
> 
> > .mrg.
>
> What about the i386 port?

see README.gcc10:

---
[8] - i386 seems to have a signal delivery issue.  pthread tests hang and then
  complain with eg:
  threads_and_exec: q[ 627.6700846] sorry, pid 3154 was killed: orphaned traced process
  this problem occurs with GCC 9 as well.
---

it all builds and mostly works, but atf hangs for me (with GCC 9
as well, so it's not a compiler issue, or, it's not a *new*
compiler issue..)

> Upgrading from NetBSD (amd64 and i386) 8.99.51, might the build encounter 
> trouble jumping from GCC 7.4 to 10?
>
> My NetBSD ports of interest are amd64 and i386.
>
> Or might it be better to do a two-step. source-upgrading to NetBSD 9.1_STABLE 
> first and then to current?
>
> Or is it OK to upgrade straight to current (9.99.81)?

it shouldn't be necessary to go to netbsd-9 branch first.


.mrg.


HEADS UP: GCC 10 now default on several ports

2021-04-16 Thread matthew green
hi folks.


i've switched the alpha, amd64, sparc*, riscv*, ia64, and vax ports
to GCC 10.

please send-pr or send email here about problems you encounter.

thanks.


.mrg.


GCC 10 available for testing etc. in -current.

2021-04-14 Thread matthew green
hi folks.


(please reply privately to this spams-many-lists message, and
i will keep src/external/gpl3/gcc/README.gcc10 updated with
the latest status.)


i've just committed the final parts that make most platforms build
(and many run) with GCC 10 as the system compiler.

i've tested these systems:
- amd64
- sparc (qemu)
- sparc64
- shark
- evbarmv7hf (cubietruck)
- i386 (has a signal delivery issue, but that seems to
  have been introduced last year, however, things seem
  to be equally as functional/broken.)
- ia64 (ski boots as far as before)
- mipsel (malta gxemul)
- mips64 (either big or little endian)
- sh3-el (landisk gxemul)
- vax (simh)

so i'm after testing for these targets:

- alpha
- hppa
- powerpc
- sh3-eb
- arm32-eb
- mipseb
- m68k

there are still issues for these targets:
- arm64 -- 'LSE' extension issues, likely needs both
  fixes for libgcc and kernel work
- sun2 ramdisk overflows, and it's already at the limit
  of what can boot without crashing from lack of space
- x68k 'loadbsd' program appears to pull in TLS code from
  libc and does not link.

the steps are fairly simple:

- update -current srcs
- build.sh with no -u (update), and set -V HAVE_GCC=10 as a
  option.  this ensures that everything is actually rebuilt
  with the new compiler.
- install new kernel/userland and perform testing.  if you can
  run atf that would be great, but other tests are useful too.
- reply to this message with results.

thanks!


.mrg.


re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?

2021-04-11 Thread matthew green
Martin Husemann writes:
> On Sun, Apr 11, 2021 at 10:37:21AM +1000, matthew green wrote:
> > > How can you invoke a make to test this (besides a full build.sh and adding
> > > some output to the makefiles)?
> > > Or: can you just fix and request pullup ;-)
> > > I can run sparc tests (quickly) again.
> > 
> > cd src/compat
> > nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../crypto/external/bsd/openssl/lib/libcrypto dependall
>
> I still have no simple way to test the sparc64 -m32 libs - does this
> obfuscation really gain something in the real world?

i guess you figured it out going on the commit?

to be a little more verbose about this:

to build any subset of the normal "src/compat" dirs, invoke
the right nbmake-$arch in src/compat with BOOTSTRAP_SUBDIRS
set to a series of paths that are built using the provided target
(so only standard targets are available -- all, dependall,
depend, clean, cleandir, install, etc.)  so to just test the
-m32 libc, i've used this:

cd src/compat
nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../lib/libc dependall
nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../lib/libc install DESTDIR=/export/root/sparc64

and then my nfsroot has a new /usr/lib/sparc/libc.so.12 and
i test it on the target.

thanks.


.mrg.


re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?

2021-04-10 Thread matthew green
Martin Husemann writes:
> On Sat, Apr 10, 2021 at 04:12:55PM +1000, matthew green wrote:
> > Martin Husemann writes:
> > > On Sat, Apr 10, 2021 at 08:38:39AM +1000, matthew green wrote:
> > > > for a quick fix, this is OK, but long term, these are built
> > > > for sparc64 compat32 as well, and benefit from having this
> > > > code in place.
> > >
> > > I have seen that (and the previous modes.inc conditionalizing it), but I
> > > do not understand how we get there in the sparc64 compat libs build.
> > >
> > > Are you sure it used to pick this code for that case? I mean it clearly 
> > > was
> > > intended to do so, but did it really work? If so, we should restore all
> > > the conditionals to make it happen again and add better comments to
> > > describe the involved make(1) magic.
> > 
> > src/compat/sparc64/sparc/bsd.sparc.mk:CRYPTO_MACHINE_CPU=  ${MLIBDIR}
> > 
> > and
> > 
> > src/crypto/external/bsd/openssl/lib/libcrypto/srcs.inc:.include "${.CURDIR}/arch/${CRYPTO_MACHINE_CPU}/${cryptoinc}"
>
> OK, and the ?= in srcs.inc not overriding this - I see.
>
> How can you invoke a make to test this (besides a full build.sh and adding
> some output to the makefiles)?
> Or: can you just fix and request pullup ;-)
> I can run sparc tests (quickly) again.

cd src/compat
nbmake-sparc64 BOOTSTRAP_SUBDIRS=../../../crypto/external/bsd/openssl/lib/libcrypto dependall


.mrg.


re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?

2021-04-10 Thread matthew green
Martin Husemann writes:
> On Sat, Apr 10, 2021 at 08:38:39AM +1000, matthew green wrote:
> > for a quick fix, this is OK, but long term, these are built
> > for sparc64 compat32 as well, and benefit from having this
> > code in place.
>
> I have seen that (and the previous modes.inc conditionalizing it), but I
> do not understand how we get there in the sparc64 compat libs build.
>
> Are you sure it used to pick this code for that case? I mean it clearly was
> intended to do so, but did it really work? If so, we should restore all
> the conditionals to make it happen again and add better comments to
> describe the involved make(1) magic.

src/compat/sparc64/sparc/bsd.sparc.mk:CRYPTO_MACHINE_CPU=  ${MLIBDIR}

and

src/crypto/external/bsd/openssl/lib/libcrypto/srcs.inc:.include "${.CURDIR}/arch/${CRYPTO_MACHINE_CPU}/${cryptoinc}"


.mrg.


re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?

2021-04-09 Thread matthew green
> Different to other asm code that e.g. properly detetects various VIS
> instructions that may or may not be available on the current CPU, the code
> in ghash-sparcv9.pl is plain sparcv9 code and can not be enabled for our
> sparc builds.
>
> Christos, can you disable all "modes" asm and request pullup?
> I can quickly test on -current...

for a quick fix, this is OK, but long term, these are built
for sparc64 compat32 as well, and benefit from having this
code in place.

John's point about __arch64__ may be relevant -- i'm pretty
sure that, before, that would only be set for sparc64 builds,
be it 32 or 64 bit userland, since that target defaults to
__arch64__ (which means sparcv9, not 64 bit ABI.)  so if
this has been removed, we're now building this code on sparc
as well as sparc64 (both ways), which is new, and clearly it
is buggy.


.mrg.


re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?

2021-04-08 Thread matthew green
> >> and one more
> >>
> >>   __sigaction_sigtramp(SIGILL...)
> >>
> >> Then, at the end:
> >>
> >>   PSIG  SIGILL SIG_DFL: code=ILL_ILLOPC, addr=0xedccbdf0, trap=2)
>
> Program was terminated due to an illegal opcode being detected in
> the gcm_ghash_4bit() assembly function:

yes.  John, can you, from gdb, print the value of
OPENSSL_sparcv9cap_P[0] and OPENSSL_sparcv9cap_P[1]?  if 1<<6 is set
in the first, then the vis3 path will be taken in gcm_ghash_4bit().
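
a minimal sketch of that bit test, for checking a value read out of
gdb by hand.  the only fact taken from the message above is that 1<<6
in the first word selects the vis3 path; the helper name has_vis3()
and the sample values are hypothetical, not from the OpenSSL sources.

```python
# sketch only: test the VIS3 bit in the first capability word.
# SPARCV9_VIS3 = 1 << 6 is the bit named in the message above;
# has_vis3() and the sample values below are hypothetical.
SPARCV9_VIS3 = 1 << 6


def has_vis3(cap0: int) -> bool:
    """Return True if bit 6 is set in OPENSSL_sparcv9cap_P[0]."""
    return bool(cap0 & SPARCV9_VIS3)


if __name__ == "__main__":
    # substitute the value printed by gdb for the samples here
    for cap0 in (0x0, 0x40, 0x4c):
        print(f"cap[0]={cap0:#x} vis3={has_vis3(cap0)}")
```

on real 32 bit sparc hardware the expectation stated below is that
the word is zero, so has_vis3() should come back False there.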

it seems that these caps are set up wrongly.  you could try to
instrument OPENSSL_cpuid_setup() in
crypto/external/bsd/openssl/dist/crypto/sparcv9cap.c to print the
various settings.  it seems that SPARCV9_VIS3 is set.  note that
there are two places it can be set, but the first one is only for
_SVR4 so not used here.

nothing here seems changed with the update.  these values should
all be zero for real sparc 32 bit hardware (they're the sparcv9
caps after all :)

> As a workaround, until the offending opcode is found, try
> `#undef GHASH_ASM_SPARC' on line 692 in
> src/crypto/external/bsd/openssl/dist/crypto/modes/gcm128.c to force
> use of the C functions.

good idea.


.mrg.


re: nothing contributing entropy in Xen domUs? or dom0!!!

2021-04-01 Thread matthew green
> In this particular example server it's in a Dell R510 with a pair of
> 6-core E5645 CPUs that "cpuid" shows the following for (in the dom0):

this is a westmere-ep CPU, which does not support rdseed
or rdrand.  rdrand appeared in ivybridge (2 generations
later, with sandybridge in the middle.)


re: -current tar(1) breakage

2021-03-27 Thread matthew green
> Joerg thinks that this is an nfs issue (a bug with nfs giving incorrect data).

even if true, tar shouldn't *core dump*.  is there a path
to RCE here somewhere?  it's clearly overwriting pointers
with strings, so unless someone can clearly show there is
no code exec vector here, it seems potentially problematic
and should be fixed.


.mrg.


re: How to determine if graphics is supported by radeondrm?

2021-03-20 Thread matthew green
radeondrm does not support any modern graphics card, and
we don't have a working amdgpu driver yet (last i tried,
it hung at boot and i did not have a serial console set up
to test with yet.)

you can have almost OK stuff with the vesa driver.  maybe
wsfb also can work.

we're working (slower than hoped) on a drm update, but we
do not have any ETA currently.


.mrg.


re: Panic in usbd_create_xfer

2021-01-03 Thread matthew green
Yorick Hardy writes:
> Dear current-users,
>
> Happy new year!

happy new year yorick! and everyone.

> [   659.839003] usbd_create_xfer() at netbsd:usbd_create_xfer+0x186
> [   659.849001] usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0x74
> [   659.849001] uhidev_open() at netbsd:uhidev_open+0x21c

can you find out what lines in the source these are?
especially usbd_create_xfer+0x186.  the other ones are
most likely obvious, as each has only a single caller - eg,
usbd_open_pipe_intr() calls usbd_create_xfer() once.
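
one way to get there, as a sketch: turn each "netbsd:symbol+0xoff"
frame into a gdb "list *(symbol+0xoff)" command.  this assumes a
kernel image with debug symbols (for example a netbsd.gdb built from
the same sources as the panicking kernel) is available to load into
gdb; the helper name gdb_commands() is hypothetical.

```python
import re

# matches backtrace frames such as
# "[   659.839003] usbd_create_xfer() at netbsd:usbd_create_xfer+0x186"
FRAME_RE = re.compile(r"\bat\s+netbsd:(?P<sym>\w+)\+(?P<off>0x[0-9a-fA-F]+)")


def gdb_commands(trace: str) -> list[str]:
    """Build one gdb 'list' command per symbol+offset frame in the trace."""
    return [f"list *({m['sym']}+{m['off']})" for m in FRAME_RE.finditer(trace)]


if __name__ == "__main__":
    trace = """\
[   659.839003] usbd_create_xfer() at netbsd:usbd_create_xfer+0x186
[   659.849001] usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0x74
[   659.849001] uhidev_open() at netbsd:uhidev_open+0x21c
"""
    for cmd in gdb_commands(trace):
        print(cmd)
```

the printed commands can be pasted into a gdb session on the debug
kernel; the offsets only map to the right lines if that kernel
matches the one that panicked.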

thanks.


.mrg.


re: Audio subsystem versus unplugging uaudio

2020-12-27 Thread matthew green
nice.  LGTM.


.mrg.

