from:"matthew green"

re: Perceivable time differences [was Re: PSA: Clock drift and pkgin]

2023-12-31 Thread matthew green

Johnny Billquist writes:
> Ok. I oversimplified.
>
> If I remember right, the point was that something sub 200ms is perceived 
> by the brain as being "instananeous" response. It don't mean that one 
> cannot discern shorter times, just that from an action-reaction point of 
> view, anything below 200ms is "good enough".
>
> My point was merely that I don't believe you need to have something down 
> to ms resolution when it comes to human interaction, which was the claim 
> I reacted to.

mouse's example is actually not the limits.  humans can tell
audio down to about 1ms or less timing in the best cases.

reaction time has nothing to do with expected time when you're
doing music things.  you can't react to the beat, you have to
be ready to go at the same time, *MUCH* closer than 200ms, so
that all the musicians in a band are in sync.

what one needs from their computer is different for each of us
and while most tasks are fine with our current tickless setup,
there are plenty of cases we can do better with it.

note that tickless and hi-res timers are not really the same
thing even if we can achieve one by implementing the other, we
*could* introduce hi-res timers on machines with it, but it
would be easier with a tickless framework to use.

.mrg.

re: how do I preset ddb's LINES to zero

2023-12-16 Thread matthew green

> > try "options DB_MAX_LINE=0" in your kernel?
>
> Right.  Unfortunately that fails the "without having to rebuild the
> kernel" requirement :-)

another option would be to use say, gdb :-), to change the value of
the "db_max_line" variable from default of 24 to 0, and then boot
that modified kernel... if you find the boot to ddb is problematic.

re: how do I preset ddb's LINES to zero

2023-12-15 Thread matthew green

Andrew Cagney writes:
> > > thanks, I'll add that (it won't help with my immediate problem of a
> > > panic during boot though)
> >
> > From DDB command prompt "set $lines = 0" ...
>
> Um, the test framework's VM is stuck waiting for someone to hit the
> space bar :-)
>
> I guess I could modify my pexpect script to do just that, but I was
> kind of hoping I could do something like add ddb.lines=0 to the boot
> line.

try "options DB_MAX_LINE=0" in your kernel?

we have poor boot-command line support if you compare against
say what linux can do.


.mrg.

re: radeon RV730 in iMac troubleshooting

2023-11-13 Thread matthew green

ah, i am reminded of this channge that went in recently:

   https://mail-index.netbsd.org/source-changes/2023/11/06/msg148463.html

can you try a -current snapshot and see if it works less badly?


.mrg.

re: radeon RV730 in iMac troubleshooting

2023-11-13 Thread matthew green

some random ideas/info about this, no real help yet.

Josh Moyer writes:
> I hacked kern/init_main.c to panic on line 739 before the display goes dark 
> and the last two dmesgs are:
> 
> pci_mem_find: void region (dev/pci/pci_map.c)

hmm, this one happens when the mapping size is 0.  can you post the
output of "pcictl pciN dump -d 0", for whatever "radeon0 at pciN"
matches the dmesg logs, for a boot that disabled radeon0.

> radeon0: map 5 failed (could not find corresponding source file)

this comes from sys/external/bsd/drm2/pci/drm_pci.c:

115 if (pci_mapreg_info(pa->pa_pc, pa->pa_tag, reg, type,
116 >bm_base, >bm_size, >bm_flags) != 0) {
117 aprint_debug_dev(self, "map %u failed\n", unit);

which is basically reporting the same error.  BAR5 has no size.

this may be related to it not doing a POST properly, and our code
isn't properly doing that first.. eg, radeon_r300.c:r300_init()
calls radeon_boot_test_post_card(), which will do POST if it has
not been done... though it relies upon vbios being found.

.mrg.

re: DRM/KMS: report

2023-10-15 Thread matthew green

> By now I think we should just delete sys/external/bsd/drm; it has been
> unmaintained for so long it is unlikely to work.  If there's interest
> in the legacy UMS drivers, they should all still be in the drm2 tree
> and can be adapted like I did with viadrmums.  But I have no hardware
> for most of them.

it's been a long time since i checked, but it was the only working
DRM for most older R100/R200 chipsets.  see PR#49744.  this is really
the only reason it's remained in the tree.

i did check drm2 post-recent update on this old system, and it was
still not working properly.

... but i'll not argue against removing.


.mrg.

re: Testing Emulation Syscalls

2023-08-01 Thread matthew green

have we considered checking in tiny binaries actually built for
the emulation and using them?

various methods discussed so far have their own advantages, but
this is the only way that tests actual binaries built for eg
linux and run with compat_linux.  anything else is limited in
it's ability to test, and may actually not find faults if the
test & emulation are both broken.


.mrg.

re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-21 Thread matthew green

> Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
> (No debugging symbols found in
> /usr/libdata/debug//usr/lib/libm.so.0.12.debug)
> Reading symbols from /usr/lib/libc.so.12...
> Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
> (No debugging symbols found in
> /usr/libdata/debug//usr/lib/libc.so.12.220.debug)
> Reading symbols from /usr/libexec/ld.elf_so...
> (No debugging symbols found in /usr/libexec/ld.elf_so)
> [Switching to LWP 7893 of process 7893]
> 0x00010820564b in qop_clear ()
> (gdb) bt
> #0  0x00010820564b in qop_clear ()
> #1  0x000108205774 in qop_delete_if ()
> #2  0x0001082058c6 in qcmd_destroyall ()
> #3  0x000108212c45 in main ()
> (gdb) info locals
> No symbol table info available.
> (gdb)
>
>   I don't understand why gdb complains about debugging symbols.

i think it's because our build has a bug.  i was recently
trying to debug something in a shared library and had the
same problem and i traced it down to this code:

   share/mk/bsd.README:MKSTRIPSYM If "yes", strip all local symbols from 
shared libraries;

   share/mk/bsd.lib.mk:.if ${MKSTRIPSYM:Uyes} == "yes"
   share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-x
   share/mk/bsd.lib.mk:.else
   share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-X

and putting "MKSTRIPSYM=no" in my mk.conf fixed it.

i believe this is a bug and should default to "no", if
MKDEBUG has also been enabled.


.mrg.

re: building 9.1 kernel with /usr/src elsewhere?

2023-03-07 Thread matthew green

> This completed apparently normally, reporting the build directory and
> telling me to remember to make depend.  I then went to ~/kbuild/GEN91
> and ran make depend && make.  It failed fast - no more than a second or
> two - with
>
> make[1]: don't know how to make absvdi2.c. Stop

what happens if you run "make USETOOLS=no"?


.mrg.

re: proposed cpuctl modification

2023-03-02 Thread matthew green

matthew green writes:
> hmm, someone seems to have broken this recently?  on my system
> i'm seeing cpu0 and cpu16 both reporting:
>
>cpu0: SMT ID 0
>cpu16: SMT ID 0

duh.  this is user error.  it also said:

   Cannot bind to target CPU.  Output may not accurately describe the target.
   Run as root to allow binding.

running as root works fine.

.mrg.

re: proposed cpuctl modification

2023-03-02 Thread matthew green

> we just did that).I guess we could look and skip the update on
> the hyperthread companion cores (pseudo-cores) but that's not what I am
> proposing, partly because I'd have to work out how to do that, but also
> because that by itself would only solve half the problem.

we should do this as well, it should fairly simple.  we already
display the relevant info in "cpuctl identify 0" eg:

hmm, someone seems to have broken this recently?  on my system
i'm seeing cpu0 and cpu16 both reporting:

   cpu0: SMT ID 0
   cpu16: SMT ID 0

via cpuctl, though dmesg has them as:

   cpu0: node 0, package 0, core 0, smt 0
   cpu16: node 0, package 0, core 0, smt 1

> My processor also has 8 cores, with no hyperthreading, where it looks as
> if internally, there's just 2 sets of ucode, one for the first 4, and one
> for the second 4.   Updates of the other 3 of each group find the ucode
> version already installed, and error out.

that's odd, but we should handle it saner.

> So, what I am proposing is to have cpuctl simply ignore EEXIST errors from
> the update the microcode" ioctl. unless the -v option (which already exists)
> is given.

i like this idea.

> --- cpuctl.c  1 Feb 2022 10:45:02 -   1.32
> +++ cpuctl.c  2 Mar 2023 21:36:06 -
> @@ -247,7 +247,7 @@
>   cpuset_destroy(cpuset);
>   }
>   error = ioctl(fd, IOC_CPU_UCODE_APPLY, );
> - if (error < 0) {
> + if (error < 0 && (verbose || errno != EEXIST)) {
>   if (uc.fwname[0])
>   err(EXIT_FAILURE, "%s", uc.fwname);
>   else

this seems odd to me.  verbose should print additional info
but it shouldn't change the actual behaviour here.

eg, if i were to use this with -v on your CPU, i'd want to
have it show it working on cpu0 and cpu4, and EEXIST for
the rest, and this appears that it will exit after cpu1.

feel free to ignore the SMT issue, it's orthogonal to this.

thanks!


.mrg.

re: devsw_detach is failing -- is this a manifestation of PR kern/56962?

2023-02-12 Thread matthew green

your change seems to fix a clear but to me.

> - if (*bmajor < 0)
> + if ((bdev != NULL) && (*bmajor < 0)) 
>   *bmajor = conv->d_bmajor;

there's also this line i'm curious about, just below:

if (*bmajor != conv->d_bmajor || *cmajor != conv->d_cmajor) {
error = EINVAL;
goto out;

should the first part also depend upon either bdev != NULL or
perhaps (*bmajor >= 0 && bdev == NULL) as the following code
uses...


.mrg.

re: crash in timerfd building pandoc / ghc94 related

2023-02-06 Thread matthew green

> dd if=/dev/urandom bs=65536 of=/dev/mem

FWIW, secururelevel > 0 fixes this issue.  so, perhaps you
can rephrase by including something about correct separation
of privs, since root write-access to /dev/mem is literally
giving it kernel-level privs.


.mrg.

crash in timerfd building pandoc / ghc94 related

2023-02-05 Thread matthew green

hi folks.


i saw a report about ghc94 related crashes, and while it's easy
to build ghc94 itself, it's easy to trigger a crash by having
packages use it.  for me 'pandoc' wants a bunch of hs-* pkgs and
i had crashes in 2 separate ones.

i added some addditional logging to the failed assert to confirm
what part of it is failing.  here's the panic and stack:

[ 2875.6028592] panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp 
|| c->c_cpu->cc_active != c" failed: file "/usr/src/sys/kern/kern_timeout.c", 
line 381 running callout 0xfaa403b50e00: c_func (0x80f53893) 
c_flags (0x100) c_active (0xfaa403b50e00) cc_lwp (0xfab1b4bba080) 
destroyed from 0x80fa0d89

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x183
kern_assert() at netbsd:kern_assert+0x4b
callout_destroy() at netbsd:callout_destroy+0xbc
timerfd_fop_close() at netbsd:timerfd_fop_close+0x36
closef() at netbsd:closef+0x60
fd_close() at netbsd:fd_close+0x138
sys_close() at netbsd:sys_close+0x22
syscall() at netbsd:syscall+0x196
--- syscall (number 6) ---


as you can see, "c_active" is "c", and cc_lwp is not curlwp, so
the assert triggers.  the active lwp is a softint thread:

db{1}> bt/a 0xfab1b4bba080
trace: pid 0 lid 5 at 0xa990969120e0
softint_dispatch() at netbsd:softint_dispatch+0x1ba
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xa990969120f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
--- interrupt ---

this softint_dispatch() address is:

(gdb) l *(softint_dispatch+0x1ba)
0x80f45c4b is in softint_dispatch 
(/usr/src/sys/kern/kern_softint.c:623).
621 PSREF_DEBUG_BARRIER();
622
623 CPU_COUNT(CPU_COUNT_NSOFT, 1);

and the actual address is a "test" instruction, so it seems that
this lwp was interrupted by the panic and saved at this point of
execution.  so the assert is firing because the callout is both
currently about to run _and_ being destroyed.


this is what i've learned about this so far.


.mrg.

re: nouveau0 error messages when starting X11

2023-01-05 Thread matthew green

>  9.597827] wsdisplay0 at nouveaufb0 kbdmux 1: console (default, vt100 
> emulation), using wskbd0

what's the pci device?  (ie, what's the card?  or at least, what's
the chipset the pci id claims -- flashed cards can have a different
pci id than reality and drivers work because they check internal
regs that don't change.  i have a "960" that is really a 550 Ti.)

> [...]
> (here x11 starts:)
>
> [34.738026] nouveau0: autoconfiguration error: error: user: 
> nvif_object_map, -12
> [34.738026] nouveau0: autoconfiguration error: error: user: channel 
> failed to initialise, -12

these ones aren't familiar to me, but i'm used to seeing weird
issues with nouveau.  a 1030 boots with:

[ 6.418821] nouveau0: autoconfiguration error: error: DRM: core notifier 
timeout
[ 8.418804] nouveau0: autoconfiguration error: error: DRM: base-0: timeout
[10.418787] nouveau0: autoconfiguration error: error: DRM: base-1: timeout

and then x11:

[49.938540] nouveau0: autoconfiguration error: error: gr: intr 0040

but it doesn't seem to matter.  i keep meaning to look at all
of them, and i have looked or even fixed some.. but are you seeing
any real issues?


.mrg.

re: ATA TRIM?

2022-12-12 Thread matthew green

are you trying to trim a really large section at once?  i
think that's what i see:

> [ - root] 3> date; ./trim /dev/rwd1d 4 2; date

at least in my experience, the problem is that most devices
take a while to handle a TRIM request, longer than the 30s
timeout typically used.

this is why blkdiscard(8) defaults to 32MiB chunks.

maybe port that tool back, it's also supposed to match the
linux command of the same name.  it's not in netbsd-9, but
last i tried, the interfaces the -current tool uses are
available in -9 kernels.


.mrg.

re: #pragma once

2022-10-15 Thread matthew green

it seems that pcc is missing '#pragma once' support.  at least,
the version in src.

ragge, can you fix it? :-)  thanks.


.mrg.

re: #pragma once

2022-10-15 Thread matthew green

i am a fan of #pragma once.  the main reason is that it avoids
the class of bugs where you copy a header and forget to change
the #define and then sometimes the header is empty upon use,
and you end up having to use cpp output to debug it.

i'd be ok with using it anywhere -- AFAIK the compiler support
is available everywhere.

note that it is already used in libpcap, bind, libuv, and as a
test for xlint, in our tree, for consumed components.


.mrg.

re: MP-safe /dev/console and /dev/constty

2022-10-01 Thread matthew green

i really like this except for the if () do { ... } while (0); else
abuse portion.  please rework that part.  it looks easiest to push
into a separate function, perhaps.

thanks.


.mrg.

re: 9.99.100 fallout: file(1)

2022-09-21 Thread matthew green

> However, I wonder why this kind of info is embedded in ELF files, what
> point does that have?   Maybe it would be better to have them just say
> x.99 (and forget the kernel ABI bump number) ?

i would rather keep the info.  i use it as a quick check of
whether i reinstalled recently or not.  "file /bin/sh".


.mrg.

re: Lock of NetBSD-current with ifconfig down / up

2022-09-20 Thread matthew green

this should be fixed now.  be sure to have usbnet.c 1.112.


.mrg.

re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread matthew green

> I got two off-list mails suggesting LOCKDEBUG.  I tried that today -
> I added LOCKDEBUG and VNODE_LOCKDEBUG both (the latter I found in the
> ALL config when grepping for LOCKDEBUG).
>
> That kernel panicked promptly on boot
>
> panic: vop_read: vp: locked 0, expected 1

just use LOCKDEBUG for now.

this is VNODE_LOCKDEBUG and it might be buggy since no one
uses it, but LOCKDEBUG is quite common.


.mrg.

re: valgrind

2022-03-20 Thread matthew green

> But it's less a lack of interest and more an unwillingness to ignore
> the licensing issues.  `Modern' GCC is licensed under the GPLv3.  I

try clang, which usually has newer/better sanitizers.

cpu_rootconf() vs booted_device issues

2022-03-13 Thread matthew green

hi folks.


i recently changed the evbmarm/fdt code to avoid forcing the
'booted_device' to be set to something if it was already set
(in that case, by the raidframe "soft root" code).

i surveyed all the cpu_rootconf() implementations.  there are
a lot, some are fixed but many are not.

i'm not planning on fixing any of these right now, though many
will use an idential patch (eg, all hpc{arm,mips.sh}, many of
the m68k/ppc/mips ports, use similar or identical code here
already), and some require thinking a little more about it
(eg, the sparc* ports set "booted_device" during autoconf when
the boot device is found in the tree.)  however, here's the
list of them all and my vague notes.


.mrg.


cpu_rootconf problems.most of these only need to have eg
"if (booted_device) return;" added to a function usually
called findroot() or set_root_device():
amiga
amigappc
atari
- also checks RB_ASKNAME, which means no default will be set
  (consider removing this check here)
cats
- might need to avoid setting boot_args?few others too.
evbppc/ev64260
evbppc/pmppc
ews4800mips
hp300
hpcarm
hpcmips
hpcsh
hppa
ibmnws
mac68k
mipsco
mvmeppc
news68k
newsmips
next68k
prep
shark
sparc
sparc64
- both sparc/sparc64 set booted_device during autoconfig
  when the device is found, matched with the ofw provided
  boot path info.
sun3
sun68k
usermode

these ones have an always-true test could be fixed:
evbmips/cavium
evbmips/gdium
evbmips/loongson
evbmips/malta
evbmips/rmixl
if (booted_device)
return;
if ((booted_device == NULL) && netboot == 0) {

re: panic in -current: "vp->v_iflag & VI_TEXT" failed: file "/usr/src/sys/kern/exec_subr.c", line 183

2022-03-08 Thread matthew green

matthew green writes:
> "J. Hannken-Illjes" writes:
> > I'm now able to reproduce it here -- takes about six hours to trigger.
> >
> > I suppose vrelel() lost a check for new references with my last changes,
> > currently testing the diff attached.
>
> well, this ran until i ran out of file building the native gcc,
> it had been going for over an hour and was close to finishing.
>
> only one crash and not non crash, but this seems to help so far.
>
> i've cleaned the objdir and restarted, and i'll try again a couple
> more times when it's done.

this partial build completed, and 2 more since.

i say this fixes the problem.  thanks!


.mrg.

re: panic in -current: "vp->v_iflag & VI_TEXT" failed: file "/usr/src/sys/kern/exec_subr.c", line 183

2022-03-08 Thread matthew green

"J. Hannken-Illjes" writes:
> I'm now able to reproduce it here -- takes about six hours to trigger.
>
> I suppose vrelel() lost a check for new references with my last changes,
> currently testing the diff attached.

well, this ran until i ran out of file building the native gcc,
it had been going for over an hour and was close to finishing.

only one crash and not non crash, but this seems to help so far.

i've cleaned the objdir and restarted, and i'll try again a couple
more times when it's done.

thanks!

.mrg.

panic in -current: "vp->v_iflag & VI_TEXT" failed: file "/usr/src/sys/kern/exec_subr.c", line 183

2022-03-07 Thread matthew green

that's this:

175 vmcmd_map_pagedvn(struct lwp *l, struct exec_vmcmd *cmd)
176 {
[ ... ]
181 vm_prot_t prot, maxprot;
182
183 KASSERT(vp->v_iflag & VI_TEXT);

christos said this happened to him on a 8c/16t 64GB machine, using
build.sh -j40, and i was able reproduce it on a 6c/12th 64GB machine.
my build only got as far as installing includes (so literally right
after tools were built.)  the longer pacic message is:

panic: kernel diagnostic assertion "vp->v_iflag & VI_TEXT" failed: file 
"/usr/src/sys/kern/exec_subr.c", line 183
cpu9: Begin traceback...
vpanic() at netbsd:vpanic+0x156
kern_assert() at netbsd:kern_assert+0x4b
vmcmd_map_pagedvn() at netbsd:vmcmd_map_pagedvn+0x137
execve_runproc() at netbsd:execve_runproc+0x394
execve1() at netbsd:execve1+0x4f
sys_execve() at netbsd:sys_execve+0x2a
syscall() at netbsd:syscall+0x196
--- syscall (number 59) ---
netbsd:syscall+0x196:

my panic is the same as christos' -- inside execve1().  i'm getting
a crash dump now, so i can probably inspect this further, but i'm
wondering if this is related to the changes to vnode/nfs locking
changes that came around a week ago:

https://mail-index.netbsd.org/source-changes/2022/02/28/msg137218.html
https://mail-index.netbsd.org/source-changes/2022/02/28/msg137219.html

i don't know when this broke.  i only really started looking because
christos said he saw this problem..

thanks.


.mrg.

reboot panic: "error == EOPNOTSUPP" in vfs_vnode.c line 1120

2022-02-06 Thread matthew green

while rebooting a quartz64 with a usb attached disk that just
had a about 3.5GB of data written to it, i the umass gave some
errorse and then i got a panic shortly later:

[ 5869.8702963] unmounting 0x0001f9fdb000 /mnt (/dev/sd1a)...
[ 5879.8716525] umass1: Invalid CSW: tag 0 should be 428938
-- note the long time between here and the next message
[ 6178.3212124] umass1: BBB reset failed, IOERROR
[ 6178.3354377] sd1(umass1:0:0:0): generic HBA error
[ 6178.3354377] sd1a: error writing fsbn 128 of 128-135 (sd1 bn 196736; cn 195 
tn 2 sn 50)
[ 6178.3438154] sd1(umass1:0:0:0): generic HBA error
[ 6178.3438154] sd1: cache synchronization failed
[ 6178.3538144] sd1(umass1:0:0:0): generic HBA error
[ 6178.3538144] sd1a: error writing fsbn 23744 of 23744-23767 (sd1 bn 220352; 
cn 218 tn 9 sn 41)
[ 6178.3655689] sd1(umass1:0:0:0): generic HBA error
[ 6178.3738157] sd1: cache synchronization failed
[ 6178.3738157] unmounting 0x0001fbc0c000 /tmp (tmpfs)...
[ 6178.3838163] unmounting 0x0001fbc88000 /var/shm (tmpfs)...
[ 6178.3938196] unmounting 0x0001fbe8f000 /proc (procfs)...
[ 6178.3938196] unmounting 0x0001fbe59000 /dev/pts (ptyfs)...
[ 6178.4038176] unmounting 0x0001fccff000 /boot (/dev/dk3)...
[ 6178.4138186] unmounting 0x0001fd4f7000 / (/dev/dk4)...
[ 6179.5938954] unmounting done
[ 6179.6038961] Skipping crash dump on recursive panic
[ 6179.6038961] panic: kernel diagnostic assertion "error == EOPNOTSUPP" 
failed: file "/usr/src/sys/kern/vfs_vnode.c", line 1120 
[ 6179.6138953] cpu2: Begin traceback...
[ 6179.6238959] trace fp c000ec1637f0
[ 6179.6238959] fp c000ec163820 vpanic() at c05948ac 
netbsd:vpanic+0x14c
[ 6179.6338975] fp c000ec163880 kern_assert() at c06f55f8 
netbsd:kern_assert+0x58
[ 6179.6438970] fp c000ec163910 vrevoke_suspend_next.part.0() at 
c05fa83c netbsd:vrevoke_suspend_next.part.0+0x6c
[ 6179.6538979] fp c000ec163930 vrevoke() at c05fc9e0 
netbsd:vrevoke+0x40
[ 6179.6638983] fp c000ec163970 genfs_revoke() at c0612d88 
netbsd:genfs_revoke+0x14
[ 6179.6738989] fp c000ec163980 VOP_REVOKE() at c0608e38 
netbsd:VOP_REVOKE+0x38
[ 6179.6838996] fp c000ec1639c0 vdevgone() at c05eeb88 
netbsd:vdevgone+0x48
[ 6179.6838996] fp c000ec163a10 sddetach() at c0163af0 
netbsd:sddetach+0xc0
[ 6179.6939003] fp c000ec163a70 config_detach() at c0574e9c 
netbsd:config_detach+0x18c
[ 6179.7039009] fp c000ec163af0 scsipi_target_detach() at c015bb8c 
netbsd:scsipi_target_detach+0xfc
[ 6179.7139016] fp c000ec163b60 scsibusdetach() at c015e29c 
netbsd:scsibusdetach+0x68
[ 6179.7239022] fp c000ec163ba0 config_detach() at c0574e9c 
netbsd:config_detach+0x18c
[ 6179.7339041] fp c000ec163c20 umass_detach() at c0198a48 
netbsd:umass_detach+0xb8
[ 6179.7439038] fp c000ec163c70 config_detach() at c0574e9c 
netbsd:config_detach+0x18c
[ 6179.7539042] fp c000ec163cf0 usb_disconnect_port() at c017d610 
netbsd:usb_disconnect_port+0xc0
[ 6179.7639052] fp c000ec163d60 uhub_explore() at c01822a8 
netbsd:uhub_explore+0x218
[ 6179.7739055] fp c000ec163e20 usb_discover() at c016cde4 
netbsd:usb_discover+0xa4
[ 6179.7839065] fp c000ec163e70 usb_event_thread() at c016d06c 
netbsd:usb_event_thread+0xc8
address 0x100 is invalid
address 0xf0 is invalid
address 0xe8 is invalid
[ 6179.8039078] cpu2: End traceback...
[ 6180.7139674] Kernel lock error: _kernel_lock,239: spinout

[ 6180.7239684] lock address : 0xc0f57400 type :   spin
[ 6180.7339690] initialized  : 0xc06f56c0
[ 6180.7339690] shared holds :  0 exclusive:  1
[ 6180.7439697] shares wanted:  0 exclusive:  1
[ 6180.7539701] relevant cpu :  3 last held:  2
[ 6180.7539701] relevant lwp : 0x0001fd530080 last held: 0x0001fd667100
[ 6180.7639710] last locked* : 0xc055d000 unlocked : 0xc051da28
[ 6180.7739714] curcpu holds :  0 wanted by: 0x0001fd530080

at this point it has hung and needs a reset.

the addresses 0xc055d000 and 0xc051da28 (locked,
last locked) are:

0xc055d000 is in sleepq_block (/usr/src/sys/kern/kern_sleepq.c:404).
401 if (__predict_false(biglocks != 0)) {
402 KERNEL_LOCK(biglocks, NULL);
403 }
404 return error;

0xc051da28 is in cv_enter (/usr/src/sys/kern/kern_condvar.c:133).
132 sleepq_enter(sq, l, mp);
133 sleepq_enqueue(sq, cv, CV_WMESG(cv), _syncobj, catch_p);
134 mutex_exit(mtx);



.mrg.

re: Some guidance/suggestion please

2022-01-14 Thread matthew green

it seems to me that if some driver depends upon altq, then
altq should simply always refuse to unload if a driver is
loaded that depends upon it.  this should be an explicit
dependency, and probably implicit via symbols.

if, say there's a fully modular system with two NICs, and
only one of them supports altq.  only one of the NIC drivers
will declare a dep on altq, blocking the unload of altq
while the driver remains loaded.


if this isn't already the case, can we arrange it to be?


.mrg.

re: Proposal: Deprecate (or rename) extsrc/

2022-01-08 Thread matthew green

> I propose that we deprecate or remove the "extsrc/" tree,
> as the name name-complete conflicts with "external/".

yes, please.


.mrg.

re: kaveri_mec2.bin file missing

2021-11-25 Thread matthew green

Riza Dindir writes:
> Hello All,
>
> Finally... I was able to enable and use my Kaveri device
> (0x1002/0x1309), at the end. It is using the external monitor and is
> using the correct 1440x900 resolution. This is very nice.
>
> There is one thing that I would like to ask.
>
> There is a "CONFIG_ACPI" definition that is being used in the
> radeon_bios file. I looked for this definition in the sources and
> found that "sys/external/bsd/drm2/dist/drm/i915/i915_drv.h" has this
> definition (that is the only place).
>
> I wanted to copy or do something similar in the radeon driver. But I
> was not sure. Then I checked at runtime definitions CONFIG_ACPI and
> NACPICA (as used in the i915 driver). The NACPICA has a value and is >
> 0. And CONFIG_ACPI is not defined, as seen below.
>
>   [10.911949] kern info: [drm] CONFIG_ACPI not set
>   [10.920577] kern info: [drm] NACPICA > 0
>
> Do you have any ideas on why CONFIG_ACPI is not defined in radeon?
> (Maybe this is not the correct place to ask this question).

the ACPI code in the drm code base requires porting to netbsd,
for both the radeon and the nouveau drivers.  it's been a while
since i was looking at it (i had nouveau compiling at some
point).  ie, it's not enabled because it doesn't compile there,
it is only enabled on i915 because it does there.

> If I were to submit patches, how can I submit these, where?

sending patches to this list is good.

(there's also tech-x11@ but this stuff is relevant for the
console and other operations that don't need X, and is about
the kernel side..)

thanks.


.mrg.

re: Graphics driver and CONFIG_ACPI

2021-11-20 Thread matthew green

> I am trying to make the radeonr7 m265 display device to work on NetBSD
> 9.2 (amd64). In the radeon_bios.c file there is a definition used,
> named CONFIG_ACPI. Some functions are using this definition. I want to
> enable this. Where can I do this? In the configuration file (GENERIC
> for instance) or someplace else?

CONFIG_ACPI needs work to run last i tried.

you can simply put "options CONFIG_ACPI" into your kernel config
to try it, but last i tried it did not build (it might work if
you remove i915 and nouveau drivesr, or, at least require far
less effort in this case.)


i also encourage you to try the new drm code we're working on,
you can fit the latest copy here:

   https://github.com/riastradh/netbsd-src/tree/rereredrm56


.mrg.

re: kaveri_mec2.bin file missing

2021-11-20 Thread matthew green

Riza Dindir writes:
> Hello
>
> I am using NetBSD 9.2 (amd64). Am trying to make the radeon driver work.
>
> In the file "sys/external/bsd/drm2/dist/drm/radeon/radeon_cik.c",
> there is a reference to a kaveri_mec2.bin file. But this file is not
> present in the "src/sys/dev/microcode/radeon/" directory. Is it
> possible to comment out the reference to kaveri_mec2.bin?

copy this out of the upstream "linux-firmware" package:

   
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/

we're planning on updating these files for netbsd-10, but for
now this is the right process.


.mrg.

re: Trying to access the Expansion ROM of a video card

2021-10-23 Thread matthew green

> > a full dmesg might also be useful -- might help identify what
> > else has mapped this.  (perhaps vga@something.)
>
> Tha is probably I did not hava a bus_space_unmap in my code. And since
> the system finds two radeon devices one is (0x1002, 0x1309) and the
> other is (0x1002, 0x6604) it is passing the first one and failing in
> the second one.

you say "passing the first"?  does vga@pci attach instead?
that is very likely the reason it is double mapped, and it
may be that the PCI BARs for the other are not setup for
one or the other device.

a dmesg would be really useful -- even if just the parts for
both radeon cards.

> Is using rdev->pdev->pd_pa.pa_memt incorrect when mapping the bus space?

probably.

> What address should I write to the ROM BAR?

i think that you shouldn't do this, without coordination
with the whole PCI subsystem.  see the code that is behind
the PCI_CONFIGURE_NETBSD option - this is not a simple
task.

> Is using BUS_SPACE_MAP_PREFETCHABLE in the flags in the bus_space_map
> call correct? I saw this in the code for pci_map_rom, and thought that
> it might need that?

this part is probably going to be fine.


.mrg.

re: Trying to access the Expansion ROM of a video card

2021-10-22 Thread matthew green

> I am using NetBSD 9.2 (amd64). I have a laptop that has a radeon r7
> m265 video card (vendor 0x1002, product 0x6604). The system does not
> recognize this card. It fails with this message:

are you booting uefi or bios?

sometimes modern systems don't work properly with bios and gpus,
not being configured properly.  can you try booting uefi if not
already?

if that doesn't work, can you try booting a -current kernel?

a full dmesg might also be useful -- might help identify what
else has mapped this.  (perhaps vga@something.)


.mrg.

re: [PATCH] Move DRM-driver firmware from base to its own set, gpufw

2021-09-25 Thread matthew green

this looks good.  thanks for doing it.

> Comments:
> Switched to a single MK tunable for it - that is probably unneeded.
> Might be an issue for not obsoleting directories on non-DRM archs.

please put back the separate firmwares -- there are x86-only
ones coming so we want to be able to choose them more easily,
and it would be nice if i hvae a nvidia-only setup to be able
to turn off amd/intel etc, i my special build.

thanks.


.mrg.

re: Level for Unix-domain socket options

2021-08-05 Thread matthew green

i like it.

> Index: sys/sys/un.h
> ===
> RCS file: /cvsroot/src/sys/sys/un.h,v
> retrieving revision 1.59
> diff -u -r1.59 un.h
> --- sys/sys/un.h  6 Nov 2020 14:50:13 -   1.59
> +++ sys/sys/un.h  5 Aug 2021 13:19:20 -
> @@ -56,6 +56,7 @@
>   * Socket options for UNIX IPC domain.
>   */
>  #if defined(_NETBSD_SOURCE)
> +#define SOL_LOCAL0   /* options level for get/setsockopt */
>  #define  LOCAL_OCREDS0x0001  /* pass credentials to receiver 
> */
>  #define  LOCAL_CONNWAIT  0x0002  /* connects block until 
> accepted */
>  #define  LOCAL_PEEREID   0x0003  /* get peer identification */

please fix the space/tab issue here.

> Index: share/man/man4/unix.4
> ===
> RCS file: /cvsroot/src/share/man/man4/unix.4,v
> retrieving revision 1.26
> diff -u -r1.26 unix.4
> --- share/man/man4/unix.4   3 Jul 2017 21:30:58 -   1.26
> +++ share/man/man4/unix.4   5 Aug 2021 13:19:16 -
> @@ -173,8 +173,8 @@
>  when the destination socket is closed.
>  .Pp
>  A UNIX-domain socket supports two
> -.Tn socket-level
> -options for use with
> +.Dv SOL_LOCAL
> +level options for use with
>  .Xr setsockopt 2
>  and
>  .Xr getsockopt 2 :

now reads:

"A UNIX-domain socket supports two SOL_LOCAL level options for use .."

shouldn't this be something like "only supports SOL_LOCAL level"?
if not, what the "two" part now?

thanks.


.mrg.

re: Where's f_audioctx set?

2021-07-13 Thread matthew green

i'm not sure, but i'm guessing that it happens here:

audio.c:2491:   error = fd_clone(fp, fd, flags, _fileops, af);
audio.c:3541:   error = fd_clone(fp, fd, flags, _fileops, af);
audio.c:8167:   error = fd_clone(fp, fd, flags, _fileops, af);

with

kern_descrip.c:1882:fd_clone(file_t *fp, unsigned fd, int flag, const struct 
fileops *fops,
kern_descrip.c:1883:   void *data)
[ ... ]
kern_descrip.c:1895: fp->f_data = data;


.mrg.

re: kern.maxlockf for byte range lock limit

2021-07-09 Thread matthew green

> @@ -146,8 +146,10 @@
>  {
>   extern int kern_logsigexit; /* defined in kern/kern_sig.c */
>   extern fixpt_t ccpu;/* defined in kern/kern_synch.c */
>   extern int dumponpanic; /* defined in kern/subr_prf.c */
> + extern int maxlocksperuid;  /* defined in kern/vfs_lockf.c */
> +

this part is gross, please don't add more of this.

all of those variabes should be put in a header file so
that the declarations can't be different.  at the very
least, do that for your new one, even if you don't want
to fix the other 3 as well.

..or simply move the sysctl creation into the file that
owns that variable and you could avoid hard coding a
version number for it (not something we prefer since
the introduction of dynamic sysctl long ago.)


.mrg.

re: Devices.

2021-06-01 Thread matthew green

Brian Buhrow writes:
>   hello David.  What I don't see in your proposal is a way of 
> implementing a dynamic device
> filesystem.  NetBSD, and possibly OpenBSD, are the last Unix-like OS's that 
> I'm aware of that
> use static special files in their filesystems to point to devices.  If your 
> proposal was

i don't believe this is a correct claim.

linux with udev is, at any reasonably high level, the
same as netbsd with devpubd.


.mrg.

re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green

> > Me too.  I was - am - rather puzzled by it.
>
> Right. That was my issue. Claiming that you'd get more problems with 
> rounding and issues with coarsness when running at a higher frequency, 
> when it in fact is the exact opposite. With a higher frequency you have 
> more accuracy and less errors from truncations.
>
> Anyway, I have no idea what the actual problem is. Good luck with that part.

as i posted earlier in this thread:

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

when hz > 1000, this returns 0 for input of 1.  0 can sometimes
mean "never block" and sometimes can mean "block forever".


.mrg.

re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green

Manuel Bouyer writes:
> On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> > Manuel Bouyer writes:
> > > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > > 
> > > > >Another issue could be mstohz() called with a delay too short;
> > > > >mstohz() will round it up to 1 tick.
> > > > 
> > > > 
> > > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))

actually, this one is problematic as well.

mstohz(1) here gives 0 for hz < 1000.  "(1 + 0) * hz / 1000"
for any 'hz' less than 1000 will give 0.

> > it's hztoms() that is the problem here.
> > 
> > #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> > 
> > ah... this is the new one.  the old one was:
> > 
> > #define hztoms(t) \
> >  (__predict_false((t) >= 0x2) ? \
> >  ((t +0u) / hz) * 1000u : \
> >  ((t +0u) * 1000u) / hz)
> > 
> > looks like christos fixed it in 2019.
>
> I'm not sure how christos's change could be a fix. I introduced hztoms()
> and mstohz() to avoid interger overflow for large values, and it looks like
> christos reintroduced it for 32bits platforms :(

there's an inline for 32 bit now, not the above code.

i realise the problem i recall here:  it happens with both
of the above with hz is > 1000.  "1 * 1000 / 1024" is 0.


.mrg.

re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green

Manuel Bouyer writes:
> On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > 
> > >Another issue could be mstohz() called with a delay too short;
> > >mstohz() will round it up to 1 tick.
> > 
> > 
> > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > 
> > If mstohz() would round up to full ticks, it could actually avoid
> > some pitfalls. But it doesn't.
>
> indeed. But in this case the problem will show up with smaller hz, not
> larger. So I think we can rule out this case.

it's hztoms() that is the problem here.

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

ah... this is the new one.  the old one was:

#define hztoms(t) \
 (__predict_false((t) >= 0x2) ? \
 ((t +0u) / hz) * 1000u : \
 ((t +0u) * 1000u) / hz)

looks like christos fixed it in 2019.

revision 1.615
date: 2019-09-28 08:10:58 -0700;  author: christos;  state: Exp;  lines: +25 
-12;  commitid: QCMKDk7BwqyE4NEB;
For 32 bit the mstohz and hztoms functions evaluate their parameter multiple
times. This is inefficient for cases like:
unsigned ms = hztoms(MIN(timeout, mstohz(INT_MAX)));
Make them inline functions; also provide the 64 bit versions for them here
so all the LP64 machines can use them (before only amd64 and sparc64
specialized mstohz).
Make them both return unsigned int.

which explains why the change i made in mid-2019 that seemed to no
longer be necessary was no longer necessary!


.mrg.

re: 9.1: boot-time delay?

2021-05-25 Thread matthew green

> > +optionsHZ=8000

this can become a problem due to integer division.

any number of ticks less than hz (8000) will be rounded
down to 0 in a number of places now, where as before it
was only less than 100.  i've seen this trip up in the
kernel before, and sometimes that '0' means 'poll', and
sometimes it means 'sleep forever'.

a lot of places in the kernel *do* avoid (eg, with adding
hz-1 and then dividing by hz) but there are a number that
do not...


.mrg.

re: Some changes to autoconfiguration APIs

2021-05-10 Thread matthew green

> > It was REALLY obvious that there was a lot of blind copy-pasta
> > lurking around as I audited everything.
>
> I appreciate the work you did to audit this!  But we should make it
> easier, not harder, for the compiler to audit mistakes if we're going
> to make tree-wide changes; this appears to be strictly a regression on
> technical grounds, for a change in cosmetics.  The same audit outcome
> could have applied just as well to the type-safe(r) API we had.

having looked at this issue closely, and spent the last few
days pondering, i've come to agree quite strongly with
Taylor's POV here.

please, can we revert and re-do with a type-safe API.

thanks.


.mrg.

re: problem with USER_LDT in current 9.99.81

2021-04-26 Thread matthew green

> Thanks for the idea - however machdep.user_ldt was already 0 :-(
>
> I'll pull the older sources and try and find the change which caused the 
> problem

oops! i meant to say "=1".

re: problem with USER_LDT in current 9.99.81

2021-04-26 Thread matthew green

Dave Tyson writes:
> I have wine 4.4 working under NetBSD 9.99.23 amd64 (> 1 year ago) and 
> recently 
> tried against a recent current - 9.99.81.
>
> Wine now complains:
> i386_set_ldt: Operation not permitted
> Did you reconfigure the kernel with "options USER_LDT"?
>
> but GENERIC has the option enabled and strings on the kernel shows:
>
> _CFG_options \011USER_LDT\011# User-settable LDT, used by Wine
> _KERNEL_OPT_USER_LDT
>
> I am guessing a change between 9.99.23 and 9.99.81 has broken the 
> functionality, but before I bisect does anyone have any ideas...

try this?

  # sysctl -w machdep.user_ldt=0


.mrg.

re: UVM behavior under memory pressure

2021-04-04 Thread matthew green

one additional thing about the behaviour of vm.*{min,max}
is that they do not (currently?) consider how much memory
is consumed by major kernel consumers like pools.  eg, there
is a known radeondrmkms leak we have not figured out why it
happens only sometimes, and this ends up making the kmem-160
or kmem-192 (size depends on DIAGNOSTIC, IIRC) pool grow,
possibly using serious percentage of the total memory, which
can end up meaning that adding vm.*min to 95% actually means
way more than 100% of available memory.

so, also check what "vmstat -m" says pools are using.

(we should fix this.  anyone want to define what the right
solution is? :-)


.mrg.

re: nothing contributing entropy in Xen domUs? or dom0!!!

2021-04-01 Thread matthew green

> In this particular example server it's in a Dell R510 with a pair of
> 6-core E5645 CPUs that "cpuid" shows the following for (in the dom0):

this is a westmere-ep CPU, which does not support rdseed
or rdrand.  rdrand appeared in ivybridge (2 generations
later, with sandybridge in the middle.)

kmem pool for half pagesize is very wasteful

2021-02-22 Thread matthew green

hi folks.


while we were debugging some memory starvation issues i noticed that
the "kmem-02048" pool only has 1 item per page on a system with 4KiB
pages, same similarly "kmem-04096" on 8KiB page systems.  i assume
this also occurs on 16KiB page systems for the "kmem-08192" pool.

this happens because the pool redzone increases the size from 2048
bytes to 2048+CACHE_LINE_SIZE bytes.

this feels extremely wasteful to me.  for the common 4K page size
case, it's 2048+64 bytes of header of useful data, plus the 64 bytes
lost for the redzone alignment, and thus 1920 bytes of lost space.

the lost space is similarly just under 1/2 for large page size
systems.

for the smaller kmem pools, the lost space seems not great, but
significantly less than the previous (1024*3 + 2*64*3 = 3456 bytes
of useful info).  46.8% lost vs 15.6% lost, while not great, seems
like a reasonable compromise to almost half the memory required
for the kmem-02048 pool.

this patch avoids this problem:

   https://www.netbsd.org/~mrg/poolwaste.diff

comments?


.mrg.

re: Issues with intelfb(4) and USB keyboards

2020-12-19 Thread matthew green

> [ 1.03] cpu0: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3
[ .. ]
> [ 7.099124] warning: 
> ../../../../external/bsd/drm2/dist/drm/i915/i915_drv.c:591: 
> WARN_ON(!IS_KABYLAKE(dev_priv))kern info: [drm] Memory usable by graphics 
> device = 4096M

this is odd.  it indicates that a skylake cpu (
https://ark.intel.com/content/www/us/en/ark/products/88184/intel-core-i5-6500-processor-6m-cache-up-to-3-60-ghz.html)
but the GPU is calling itself kabylake generation.

i wonder if there's something fixed in newer drm that
handles this situation.  might be worth looking at
the upstream code around this part now.  does it add
additional types for skylake vs kabylake?  when i
patched in the kabylake support in 2018, i had made it
mostly match the upstream version from then.


.mrg.

re: thundering pipe herds

2020-11-23 Thread matthew green

> @@ -395,9 +401,8 @@ pipeunlock(struct pipe *pipe)
>   KASSERT(pipe->pipe_state & PIPE_LOCKFL);
>  
>   pipe->pipe_state &= ~PIPE_LOCKFL;
> - if (pipe->pipe_state & PIPE_LWANT) {
> - pipe->pipe_state &= ~PIPE_LWANT;
> - cv_broadcast(>pipe_lkcv);
> + if (pipe->pipe_waiters > 0) {
> + cv_signal(>pipe_lkcv);
>   }
>  }

this part misses the while loop from the freebsd version i think?


.mrg.

re: [PATCH 0/2] Delete CIRCLEQ

2020-10-11 Thread matthew green

Kamil Rytarowski writes:
> Switch the last user (ypserv) from CIRCLEQ to TAILQ.
> This is inspired by analogous refactoring from OpenBSD:
> https://github.com/openbsd/src/commit/d53c0cf4d32fdbd8b42debfba068f1b0efa423dc#diff-8d0a4fbb89658213ebf314ff188581d7
> 
> Remove the CIRCLEQ API completely from the system headers and document
> this fact in the QUEUE(3) man-page.

why?  queue.h may be used by more than src.

i don't see any benefit except forcing working code to
be changed, possibly introducing bugs.

please leave it alone.

.mrg.

re: "Boot this kernel once" functionality? (amd64)

2020-09-24 Thread matthew green

Reinoud Zandijk writes:
> On Wed, Sep 16, 2020 at 12:09:43PM +0200, Martin Husemann wrote:
> > On Wed, Sep 16, 2020 at 12:05:26PM +0200, Anthony Mallet wrote:
> > > I was also wondering if it would be possible to pass arguments to the
> > > primary or secondary bootloader via reboot(2) and the boothowto
> > > flags. But this doesn't seem doable. Right?
> > 
> > This works fine on e.g. sparc*; I can do: shutdown -b netbsd.t -r now
> > 
> > and it will pass "netbsd.t" as boot argument to the firmware, which passes
> > it on to the bootloader and then it boots /netbsd.t once.
> 
> In shutdown(8) I read that the arguments are passed to reboot(8) and that is
> mentioned in kloader(4) so I guess its using that mechanism.

this does not use kloader(4), which appeared in netbsd 1.6.
the "pass boot args to loader" appeared in netbsd 1.3, and
it is implemented on sparc* by asking the firmware to reboot
with these arguments instead of the default.  it very much
relies upon this firmware feature.


.mrg.

re: patch for am7930 audio

2020-09-10 Thread matthew green

> By the way, I'm not so familiar to sparc.
> audioamd(4) has many assembly code (though they look very old stuff).
> Is there any (historical or traditional) reason why I should not
> remove these?

i would just ignore it.  it's currently unused and if someone
wants to fix it up, it's sitting there easy to look at, but it
is not currently used and is known needing work for years ago
API changes, so don't feel you can't change the code around it.
it's just more work for someone who may eventually revive it.


.mrg.

re: pmap_activate() with non-curlwp?

2020-08-16 Thread matthew green

Jason Thorpe writes:
> From my reading of the code, it seems that there are no longer any
> circumstances where pmap_activate() will be called with non-curlwp, at
> least in MI code.
> 
> Is this a correct reading?

seems right, and only vax has one MD caller that appears to
not be curlwp but soon-to-be-curlwp.

.mrg.

re: style change: explicitly permit braces for single statements

2020-07-13 Thread matthew green

> On 20-07-12 10:01, Luke Mewburn wrote:
>   | I propose that the NetBSD C style guide in to /usr/share/misc/style
>   | is reworded to more explicitly permit braces around single statements,
>   | instead of the current discourgement.
>   |
>   | IMHO, permitting braces to be consistently used:
>   | - Adds to clarity of intent.
>   | - Aids code review.
>   | - Avoids gotofail: 
> https://en.wikipedia.org/wiki/Unreachable_code#goto_fail_bug
>   |
>   | regards,
>   | Luke.
> 
> I was asked to CC this thread to tech-kern@, so I'm doing that.

yes please.


.mrg.

re: makesyscalls

2020-06-09 Thread matthew green

i'm not very interested in a solution that doesn't use tools
available to the build.  you've not shown that there is
sufficient pain here to force an external solution.  i'm not
sure i buy your claims about awk and size of program.  IME,
it just requires that one is strict to rules.

if you want to use python for creating files in the tree
for _our_ things, then i think you have to propose adding
python to src/tools.  [*]

why aren't you willing to discuss a lua version?  it has
most of the features you complained awk is missing, and would
make it relatively easy to unit-test the components easily.


.mrg.


[*] i support this generally, and as a non-public visible
library that our gdb can use as well.  i do not think we
should add it to /usr/bin (but i might be convinceable that
it can be installed in a non-standard location, using non-
standard paths for libararies, as long as it does not have
anything to do with or interfere with pkgsrc or other ways
of using python.)

re: kernel stack usage

2020-05-30 Thread matthew green

glad to see this effort and the clean up already!

ideally, we can break the kernel build if large stack consumers
are added to the kernel.  i'd be OK with it being default on,
with of course a way to skip it, and if necessary it can have
a whitelist of "OK large users."

> 1264cdioctl at cd.c:1204
> 1248uvm_swap_stats at uvm_swap.c:726

i think we can ignore these two.  they're both going to be
early in the stack so very unlikely to be problematic.


.mrg.

re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-17 Thread matthew green

what's the use-case for disabling encrypted swap later?

i'd argue we should avoid kauth for this and simply disable
it always as i've been unable to think of any use case that
is the only solution.


.mrg.

re: DDC info refresh?

2020-04-15 Thread matthew green

have you looked at xrandr?  you may be able to just run
xrandr --auto after connecting the display, or you can
explicitly ask for a mode with it with --output 
--mode .


.mrg.

re: Please review: lookup changes

2020-03-10 Thread matthew green

> This reminds me that we need to find a way to release upper layer
> vnodes when the lower layer is recyclable -- see the comment in
> layer_inactive.  Otherwise files deleted under a null mount while open
> through the null mount may persist on disk indefinitely as long as the
> upper layer is still around...

FWIW, i think it's worse than you describe.  for me:

- it doesn't matter what layer i delete the files from

- the files aren't open anymore

the only way i found to recover disk space was to unmount
the upper layer (which hung for a while, assuming writing
the new disk contents), or, later reduce kern.maxvnodes.


.mrg.

re: NULL pointer arithmetic issues

2020-02-24 Thread matthew green

> > Nonsense, I think it's fair to classify that as a bug.  That sort of
> > stuff is *not* supposed to happen if -ffreestanding is passed to the
> > compiler.
> 
> If we use 0x0, it can be a valid pointer.
> 
> If we use NULL, it's not expected to work and will eventually generate a
> syntax erro.

this is not true in GCC, since a while now -- 2 years ago i
had to write this code to stop GCC from emitting "trap" when
accessing the 0x0 pointer, from powerpc/oea/oea_machdep.c:

/*
 * Load pointer with 0 behind GCC's back, otherwise it will
 * emit a "trap" instead.
 */
static __inline__ uintptr_t
zero_value(void)
{
uintptr_t dont_tell_gcc;

__asm volatile ("li %0, 0" : "=r"(dont_tell_gcc) :);
return dont_tell_gcc;
}


.mrg.

re: NULL pointer arithmetic issues

2020-02-23 Thread matthew green

> It seems to me the proper approach is to teach the tool to accept
> this, and to avoid cluttering the tree with churn to work around the
> tool's deficiency, unless there's actually a serious compelling
> argument -- beyond a language-lawyering troll -- that (char *)NULL + 0
> is meaningfully undefined.
> 
> We already assume, for example, that memset(...,0,...) is the same as
> initialization to null pointers where the object in question is a
> pointer or has pointers as subobjects.
> 
> I think we should treat memcpy(NULL,NULL,0) similarly and tell the
> tool `no, on NetBSD that really is defined and we're not interested in
> hearing about theoretical nasal demons from armchair language
> lawyers'.

well said.  i 100% agree.  these extreme edge-cases of UB
that have a very clear definition don't seem to he helpful
in finding any real class of bugs and only seem to be good
at cluttering up working code.


.mrg.

re: config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-20 Thread matthew green

Jarom?r Dole?ek writes:
> Le lun. 17 f=C3=A9vr. 2020 =C3=A0 17:55, matthew green  =
> a =C3=A9crit :
> >
> > FWIW, i've been running my radeon with a patch that exlicitly drops
> > kernel lock around the "real attach" function (the one that config
> > mountroot ends up calling.)
> >
> > we really need to MPSAFE-ify the autoconf subsystem.  right now, it
> > is expected that autoconf runs with kernel lock... i am not sure of
> > the path we should take for this -- but let's actually have a design
> > in place we are happy with, while my change below works, it's ugly
> > and wrong.
> 
> Would it make sense to simply require all callers of
> config_mountroot() to have MPSAFE config routines and call the
> callbacks for them without kernel lock?
> 
> There are just few of those, and they can simply be changed to do the
> KERNEL_LOCK()/UNLOCK themselves.

this is a tempting idea, but i'm loathe to have different parts
of autoconf run with different locking semantics.

eg, my hack has the possibility of attempting to update the
autoconf private data which is currently protected by the
kernel lock, and i suspect it's actually possible because
these routines attach sub-devices:

radeon0 at pci10 dev 0 function 0: ATI Technologies Mobility Radeon HD 4500 
(rev. 0x00)
radeondrmkmsfb0 at radeon0
wsdisplay0 at radeondrmkmsfb0 kbdmux 1
wsmux1: connecting to wsdisplay0
wskbd0: connecting to wsdisplay0

right now with my change, all that runs with no protection.
it seems non-trivial to fix that all up for only
config_mountroot().

this is why i wrote "let's actually have a design"  :-)


.mrg.

re: config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-17 Thread matthew green

FWIW, i've been running my radeon with a patch that exlicitly drops
kernel lock around the "real attach" function (the one that config
mountroot ends up calling.)

we really need to MPSAFE-ify the autoconf subsystem.  right now, it
is expected that autoconf runs with kernel lock... i am not sure of
the path we should take for this -- but let's actually have a design
in place we are happy with, while my change below works, it's ugly
and wrong.


.mrg.


Index: sys/external/bsd/drm2/radeon/radeon_pci.c
===
RCS file: /cvsroot/src/sys/external/bsd/drm2/radeon/radeon_pci.c,v
retrieving revision 1.14
diff -p -u -r1.14 radeon_pci.c
--- sys/external/bsd/drm2/radeon/radeon_pci.c   24 Jan 2020 11:44:27 -  
1.14
+++ sys/external/bsd/drm2/radeon/radeon_pci.c   17 Feb 2020 16:54:05 -
@@ -229,6 +229,9 @@ radeon_attach_real(device_t self)
unsigned long flags;
int error;
 
+   /* XXXSMP autoconf */
+   KERNEL_UNLOCK_ONE(NULL);
+
ok = radeon_pci_lookup(pa, );
KASSERT(ok);
 
@@ -274,6 +277,9 @@ radeon_attach_real(device_t self)
}
 
 out:   sc->sc_dev = self;
+
+   /* XXXSMP autoconf */
+   KERNEL_LOCK(1, NULL);
 }
 
 static int

re: [filemon] CVS commit: htdocs/support/security

2019-12-18 Thread matthew green

> As far as I can tell, there are many races caused by autoloading.

i have long advocated that we should turn off both
module autoload and autounload, as they're security
and reliability nightmares.

*perhaps* autoload, for a specific list of known OK
modules would be OK in the default for me, but the
general & global defaults are, IMO, very bad.


.mrg.

re: usbhist support for urtwn

2019-11-25 Thread matthew green

Andreas Gustafsson writes:
> matthew green wrote:
> > [misread everything]

ah, please go ahead.


.mrg.

re: usbhist support for urtwn

2019-11-25 Thread matthew green

Andreas Gustafsson writes:
> Hi all,
> 
> I have this patch to replace the debug printfs in sys/dev/usb/if_urtwn.c
> by usbhist calls, roughly modelled after the use of usbhist in if_axe.c:
> 
>   https://www.gson.org/netbsd/patches/urtwn-usbhist.patch
> 
> OK to commit?

this looks fine, but if you feel up to it, you can re-combine
the single DPRINTF() -> two calls using URTWNHIST_CALLARGS().
if_aue.c is a recent file converted to this method.  it has
the advantage of consuming fewer kernhist slots -- 1 line vs 2.

nit:  the new DPRINTFN(), URTWNHIST_CALLED(), and
URTWNHIST_CALLARGS() macros should use the do { .. } while(0)
idiom the old DPRINTFN() did so they remain/are safe for us in
all contexts.

thanks!

.mrg.

re: __{read,write}_once

2019-11-21 Thread matthew green

_always()?

re: __{read,write}_once

2019-11-06 Thread matthew green

> - If we do not want to stick with the C11 API (its emulation), then I
> would suggest to use the similar terminology, e.g. atomic_load_relaxed()
> and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
> no reason invent new terminology; it might just add more confusion.

i really do not like the "once" name.

even though i'm familiar with the actual desired semantics,
i have to ignore my builtin meaning that comes first.  to me,
"once" initially reads as compiler will only do it one time,
ever, not every time it is called.  like pthread "once".

can we make them __read_always() and __write_always()?


.mrg.

re: Adding an ioctl to check for disklabel existence

2019-09-29 Thread matthew green

> Yes, we will - but can't we make that something detectable?   If the
> kernel invents a lael, it says "fictitious" in the label field.

i had a similar idea, and i like your "converted" idea too.

i was looking at struct disklabel itself, and while i realised
we could probably abuse d_subtype to mark this?  it seems to
be entirely unused in our code base.

however, disklabel fails at >2TiB for 512 byte sector, so i'm
now thinking that fixing this doesn't really solve the problem
for the future properly -- disklabel doesn't return a true
label here anyway... so it seems that we should be retiring
DIOCGDINFO usage as much as possible, rather than figuring out
how to enhance it..


.mrg.

re: mstohz and hztoms

2019-09-28 Thread matthew green

> Comments?

i like the clean up.  it's clearly a step forward.

i only don't understand why 32 bit platforms can't handle
large values here but 64 bit ones can.  is it only so that the
32 bit platforms don't use 64 bit maths when it's not needed?

it just seems wrong to me to limit 32 bit artificially here,
and it's not like it's _that_ difficult to overflow 32 bit hz.
i run with HZ=1000 on some systems, like alpha does by
default.  that gives you 49 days.  even with standard HZ=100
it's only 16 months or so.  (hmm, i wonder if these macros
compile nothing with HZ=1000 kernels.  be nice to confirm or
add a hack :-)

can we make the 32 bit version smarter about accepting small
values with 32 bit maths, but large or non-constant values
with 64 bit maths?

perhaps an explicit mstohz64() could handle the cases if we
know they will exist, since most probably _know_ they are
dealing with short intervals.

there's just something about the artificial limit here that
is bugging me...

thanks.


.mrg.

re: reducing the use of M_NOWAIT / KM_NOSLEEP memory allocations

2019-09-25 Thread matthew green

> Does anyone have any comments on this plan?

yes, please.


.mrg.

re: x86 bootstrap features

2019-09-24 Thread matthew green

Emmanuel Dreyfus writes:
> Generally speaking, does anyone see any usage fpr GPT and RAIDframe
> support, or multiboot 2 support for dosboot, pxeboot and netboot? If
> not, we could disable the features for that boostraps, which would
> save space for later.

i use raidframe and gpt with pxeboot.  i'm not sure i get the
usefulness of multiboot in pxeboot, but perhaps just need to
know about the missing link.

> And last question; should we have a build-time check that pxeboot_ia32.bin
> is smaller than 64 kB?

more research with more systems would be good here, but perhaps
we simply need to provide multiple versions of each style (eg,
netboot + something, biosboot + something, etc.)

this sucks, though.

it would be great if someone looked at removing as much code
here as possible -- growth shouldn't be that much for good
long time once it works sanely with current systems.

.mrg.

re: fexecve

2019-09-08 Thread matthew green

not really commenting on the proposal itself, but ..

> Let us not forget that you need a binary inside the chroot that can
> call fexecve() on a file descriptor or the ability to build such a
> binary.

this is only one buffer overflow away...  ie, strength in
layers would imply you should not rely this.


.mrg.

re: pool guard

2019-09-07 Thread matthew green

> No performance cost, because these guarded buffers are allocated only when the
> pools grow, which is a rare operation that occurs almost only at boot time. No
> actual memory consumption either, because unmapped areas don't consume 
> physical
> memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on
> amd64, far beyond what we will ever need -, so no problem with consuming VA.

i like this idea, but i would like to point out that HPE already
sell a machine with 48TiB ram and nvdimms are going to explode
the apparently memory in the coming years, so "32TiB" is very far
from "plenty".  we have many challenges to get beyond 8TiB tho,
since we count pages in 'int' all over uvm.


.mrg.

re: ioctl VNDIOCSET vs netbsd32

2019-09-03 Thread matthew green

> /*
>  * The next two structures are marked "__packed" as they normally end up
>  * being padded in 64-bit mode.
>  */
> struct netbsd32_vnd_ioctl {
> netbsd32_charp  vnd_file;   /* pathname of file to mount */  
> int vnd_flags;  /* flags; see below */   
> struct vndgeom  vnd_geom;   /* geometry to emulate */
> unsigned intvnd_osize;  /* (returned) size of disk */
> uint64_tvnd_size;   /* (returned) size of disk */
> } __packed;
> 
> where the __packed makes the bogus difference.
> 
> I don't understand the comment. Of course they end up being padded
> (2 bytes after vnd_flags, 2 bytes after vnd_osize), but that applies to
> both the 64bit and the 32bit ABI.
> 
> For which architectures is this __packed important? We need to somehow
> conditionalize it. Or am I missing something?

the packed makes it work on x86-64 where the alignment of
vnd_size is at 7 * 4 bytes, and plain uint64_t will get
padded for 8 byte alignment.

this is bogus (thanks, 9 year old me.)  can you try this?
it removes the __packed and replaces references to the
properly aligned types.  compile tested only.

thanks.


.mrg.


Index: netbsd32_ioctl.h
===
RCS file: /cvsroot/src/sys/compat/netbsd32/netbsd32_ioctl.h,v
retrieving revision 1.68
diff -p -u -r1.68 netbsd32_ioctl.h
--- netbsd32_ioctl.h20 Aug 2019 09:32:21 -  1.68
+++ netbsd32_ioctl.h3 Sep 2019 18:48:44 -
@@ -473,23 +473,19 @@ struct netbsd32_sioc_sg_req {
 /* from  */
 #defineSIOCGETSGCNT32  _IOWR('u', 52, struct netbsd32_sioc_sg_req) /* 
sg pkt cnt */
 
-/*
- * The next two structures are marked "__packed" as they normally end up
- * being padded in 64-bit mode.
- */
 struct netbsd32_vnd_ioctl {
netbsd32_charp  vnd_file;   /* pathname of file to mount */
int vnd_flags;  /* flags; see below */
struct vndgeom  vnd_geom;   /* geometry to emulate */
unsigned intvnd_osize;  /* (returned) size of disk */
-   uint64_tvnd_size;   /* (returned) size of disk */
-} __packed;
+   netbsd32_uint64 vnd_size;   /* (returned) size of disk */
+};
 
 struct netbsd32_vnd_user {
int vnu_unit;   /* which vnd unit */
-   dev_t   vnu_dev;/* file is on this device... */
-   ino_t   vnu_ino;/* ...at this inode */
-} __packed;
+   netbsd32_dev_t  vnu_dev;/* file is on this device... */
+   netbsd32_ino_t  vnu_ino;/* ...at this inode */
+};
 
 /* from  */
 #define VNDIOCSET32_IOWR('F', 0, struct netbsd32_vnd_ioctl)/* 
enable disk */

RFC: usb ethernet common code

2019-07-28 Thread matthew green

hi folks.


i recently fixed a bunch of problems in several usb ethernet
drivers and there's a lot of copied code among them.

to avoid this i wrote this patch which introduces a common
library for usb ethernet drivers and converts if_axen.c to
use it.  these are the areas handled:

 - USB endpoint pipe handling
 - rx and tx chain handling
 - generic handlers or support for several struct ifnet callbacks
 - MII bus locking
 - interrupt handling
 - partial autoconf handling

for if_axen.c, the code reduction is 1902 to 1166 lines, and
i expect similiar savings for the other drivers as well.

i plan to convert other drivers i have handy (cdce, ure, axe),
but i'd like others with more network clue than i do to consider
the interfaces i've implemented.  some of them may seem ugly,
but i was unable to see a better method without forcing less
code sharing.

   https://www.netbsd.org/~mrg/usb_net.v15.diff

there is one issue left to deal with: this will require that
usb_net.c to be part of the base USB stack, or, fork this code
into its own module and ensure all other modules have the right
dependency.  i haven't thought about this byeond assuming that
itw ill be as simple as a usb_net module that other modules
have as a dep.

thanks.


.mrg.

re: [PATCH v3] Include XSTATE note in x86 core dumps

2019-07-15 Thread matthew green

> Introduce a simple COREDUMP_MACHDEP_LWP_NOTES logic to provide machdep
> API for injecting per-LWP notes into coredumps, and use it to append
> PT_GETXSTATE note.

this macro is pretty gross.  these are the problems i see:

- assumes 'int error'
- assumes 'struct lwp *l = curlwp'
- assumes 'name' is something
- assumes 'ns' is something
- returns from macro

all of those should be resolved.  4 are simple, they can be easy
inputs.  the return issue is more difficult, and perhaps should
be revisited (ie, perhaps just skip over this entry instead of
assuming return is right.)

> Since the XSTATE block uses the same format on i386 and amd64, the code
> does not have to conditionalize between 32-bit and 64-bit ELF format
> on that.  However, it does need to distinguish between 32-bit and 64-bit
> PT_* values.  In order to do that, it reuses PT32_* constant already
> present for ptrace(), and adds a matching PT64_GETXSTATE to satisfy
> the cpp logic.

i don't understand why the need to include netbsd32 headers or
define these separately.

if the layout is the same, what's the benefit?  we already know
to assume 32 or 64 bit from the ELF core header, right?


.mrg.

re: re-enabling debugging of 32 bit processes with 64 bit debugger

2019-06-30 Thread matthew green

i fully support making debugging work.  that means debugging
32 bit apps on 64 it kernels, and also, eventually?, using
a 32 bit debugger on 64 bit kernel.

these are all real-world cases people use and expect to work.

thanks.


.mrg.

re: ehci: fix error handling?

2019-06-15 Thread matthew green

> Ok, thanks, I've fixed it and the machine shuts down correctly now.

i saw the commit - thanks for making it better.

> However there seem to be other problems in the attach/detach. ehci_init()
> and ehci_detach() are not symmetrical, the former initializes more stuff
> than the latter deinitializes. ehci_pci.c deinitializes the extra stuff
> correctly, but the others (cardbus, arm, mips) don't.

yup.  it's kind of a wreck, but we're slowly making all of
this better.  for a long time many of the host controllers
didn't need to support detach, but we've been moving more
and more towards everything is capable of detaching the
last few years, and sometimes things crash annoyingly when
you are trying to reboot.

thanks.


.mrg.

re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-29 Thread matthew green

> Do we really expect module updates without updating kernel
> to work?
>
> If yes will do iot next time.

yes - if you add, not just change, to the kernel abi in a way
that modules will notice, please bump the version.  it should
make this sort of problem more obvious (due to missing file
errors, vs "error 8".)

thanks!


.mrg.

re: [PATCH 1/2] compat32: translate userland PT_* request values into kernel

2019-05-29 Thread matthew green

thanks for working on this.

> + case PT_FIRSTMACH + 0:
> + return PT_STEP;
> + case PT_FIRSTMACH + 1:
> + return PT_GETREGS;
[ ... ]

these magic numbers are a little ugly.  can you avoid them?
is there a way to have amd64 have direct access to the i386
values?

> --- a/sys/sys/ptrace.h
> +++ b/sys/sys/ptrace.h
> @@ -283,6 +283,10 @@ int  ptrace_machdep_dorequest(struct lwp *, struct 
> lwp *, int,
>  #define FIX_SSTEP(p)
>  #endif
>  
> +#ifndef PTRACE_TRANSLATE_REQUEST32
> +#define PTRACE_TRANSLATE_REQUEST32(x) x
> +#endif
> +

can this part live in sys/compat(/netbsd32)?  no need for the
public ptrace header to get this, right?


.mrg.

re: evbarm hang

2019-04-19 Thread matthew green

> So here's our deadlock: cpu 0 holds the kernel lock and wants the pool spin
> mutex; cpu 1 holds the spin mutex and wants the kenrel lock.

AFAICT, cpu 1 is at fault here.  this locking order is
backwards.

not sure why arm32 pmap operations on the kernel map are
with the kernel lock instead of another lock.  the arm64
version of this code doesn't take any lock.


.mrg.

re: To get net ioctl number

2019-02-27 Thread matthew green

Iain Hibbert writes:
> On Thu, 28 Feb 2019, Masanobu SAITOH wrote:
> 
> >  I'd like to get new number for new ioctl. How should I find unused number
> > for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
> > so I made usr.bin/kdump-ioctl.c and did
> > 
> > grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t
> > 
> > I think it might not enough because kdump-ioctl.c has no compat-related
> > code.
> > 
> >  What should I do?
> 
> In most cases, an ioctl value is only valid when performed on a handle 
> leading to a specific subsystem, meaning that it does not really need to 
> be globally distinct. There is a list of the letters used in ioctl(9)

true that that don't _need_ to be globally unique, but it sure
is nice for kdump(1).


.mrg.

re: To get net ioctl number

2019-02-27 Thread matthew green

>   This might be FAQ...
> 
>   I'd like to get new number for new ioctl. How should I find unused number
> for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
> so I made usr.bin/kdump-ioctl.c and did
> 
>   grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t
> 
> I think it might not enough because kdump-ioctl.c has no compat-related
> code.
> 
>   What should I do?

i don't have a script, but looking in the tree for _IO*() calls should
find all the relevant ioctls we have defined.


.mrg.

re: RFC: New userspace fetch/store API

2019-02-24 Thread matthew green

> > On Feb 23, 2019, at 4:04 PM, matthew green  wrote:
> >
> > i like this.  the current API is ... odd.
> 
> Oh any thoughts on have intrsafe or not?

if we can get away without it, that would be best.  how hard is
the single(?) caller to fix? :-)


.mrg.

re: RFC: New userspace fetch/store API

2019-02-23 Thread matthew green

i like this.  the current API is ... odd.

can you add alignment documentation?  eg, if only to say that it
must support aligned or unaligned accesses for the relevant
datatypes.  not sure what we need currently, ie, if unaligned is
needed because that happens today, or we can define it as wrong
in this API.

thanks.


.mrg.

re: fallthrough and breaks in drm and atheros codes

2019-02-18 Thread matthew green

> The only different is that my GENERIC has DEBUG and LOCKDEBUG... And as
> you can see from the command line, there is
> -Wno-error=implicit-fallthrough

hmmm..  is this being missed some how?

> cat conf/copts.mk
#   $NetBSD: copts.mk,v 1.3 2019/02/10 05:01:59 mrg Exp $

# MI per-file compiler options required.

.ifndef _SYS_CONF_COPTS_MK_
_SYS_CONF_COPTS_MK_=1

.if defined(HAVE_GCC) && ${HAVE_GCC} == 7 && ${ACTIVE_CC} == "gcc"
COPTS.zlib.c+=  -Wno-error=implicit-fallthrough
COPTS.pf.c+=-Wno-error=implicit-fallthrough
COPTS.radeon_cs.c+= -Wno-error=implicit-fallthrough
COPTS.via_dmablit.c+=   -Wno-error=implicit-fallthrough
.endif

.endif

it's a new file, but should be present for all kernel builds?


.mrg.

re: fallthrough and breaks in drm and atheros codes

2019-02-18 Thread matthew green

> I am not sure if the following is correct, so I am posting it here instead
> of committing...

see my post to source-changes-d -- most of these should
be ignored for warings already, so i'm curious why you
had to fix them.

wrt what is correct, i found several fixes needed by our
copy of upstream code were already upstream, so it was
easy to determine what was right for those ones.

i doubt anyone can really tell you about nouveau fixes
except upstream.


.mrg.

re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread matthew green

> >>> @@ -472,6 +472,9 @@
> >>>   const char *bootname = device_xname(bdv);
> >>>   size_t len = strlen(bootname);
> >>>
> >>> + if (bdv == NULL)
> >>> + return 0;
> >>> +
> >>
> >> This looked suspicious, even before I read the code.
> >>
> >> The question is if it is ever legitimate for bdv to be NULL.
> >
> > infact, bdv has already been dereferenced at this point:
> > the assignment to bootname 3 lines up.
> 
> So, if we're going to insert a KASSERT() we would need to separate
> the assignment from bootname's declaration, and do the assignment
> after the KASSERT()

i am not a huge fan of KASSERT(ptr != NULL).  page fault is
pretty much just as clear as the assert, and does not need
special code to handle it.

that said, i do see some utility in it as a guide to others
reading this code, and won't object to this usage.


.mrg.

re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread matthew green

> > @@ -472,6 +472,9 @@
> > const char *bootname = device_xname(bdv);
> > size_t len = strlen(bootname);
> >  
> > +   if (bdv == NULL)
> > +   return 0;
> > +
> 
> This looked suspicious, even before I read the code.
> 
> The question is if it is ever legitimate for bdv to be NULL.

infact, bdv has already been dereferenced at this point:
the assignment to bootname 3 lines up.


.mrg.

re: Audio device mmap and kevents

2019-01-19 Thread matthew green

i will have a look at the audioplay changes in the next week.


.mrg.

re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread matthew green

> | for xhci, all of these seem to be the same issue and it
> | appears to be a "high level allocator doesn't know what
> | it is allocating and does not align properly".  the
> | problem is likely:
> | 
> | #define XHCI_TRB_ALIGN 16
> | struct xhci_trb {
> | ...
> | } __packed __aligned(XHCI_TRB_ALIGN);   
> | 
> | which is included in:
> | 
> | struct xhci_softc {
> | ...
> | struct xhci_trb sc_result_trb;
> | 
> | 
> | ... how do we convey the structure alignment needed for
> | softc, or do we fix it by moving it into its own separate
> | aligned allocation?
> 
> Shouldn't the compiler know how to do this right by padding around the
> structure that needs alignment and assuming the default alignment for
> the enclosing structure?

how can it?  it has to be 16 byte aligned.  the alignment
provided is only 8.  so sometimes the padding will work and
sometimes the padding will fault, depending on what bit 3
is of the returned address.  it may already pad before hand
to get 16 byte alignment from the start of the structure,
as well as force the whole structure alignment to 16, looking
above this member, it's hard to easily tell, it could be
needing 4, 8 or 12 bytes of padding (4/12 only 32 bit only.)

this is why the compiler assumes it must be 16 byte aligned
already -- because that's the only case that is valid.


.mrg.

re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread matthew green

> http://netbsd.org/~kamil/kubsan/0007-boot-real-hardware.txt

for xhci, all of these seem to be the same issue and it
appears to be a "high level allocator doesn't know what
it is allocating and does not align properly".  the
problem is likely:

#define XHCI_TRB_ALIGN 16
struct xhci_trb {
...
} __packed __aligned(XHCI_TRB_ALIGN);   

which is included in:

struct xhci_softc {
...
struct xhci_trb sc_result_trb;


... how do we convey the structure alignment needed for
softc, or do we fix it by moving it into its own separate
aligned allocation?


.mrg.

re: scsipi: physio split the request

2018-12-27 Thread matthew green

> Of course larger transfers would also mitigate the overhead for each I/O
> operation, but we already do several Gigabyte/s with 64k transfers and
> filesystem I/O tends to be even smaller.

yes - the benefits will be in the 0-10% range for most things.  it
will help, but only a fairly small amount, most of us won't notice.

i've seen peaks of 1.4GB/s with an nvme(4) device with ffs on top.


.mrg.

linux compat code vs netbsd kernel-internal negative error numbers

2018-12-26 Thread matthew green

hi folks.


my laptop has been crashing in i915 the last few days and we finally
tracked down the cause.  cv_wait_sig() can return ERESTART, which has
the value of -3.  when linux code returns errors it often does it
encoded in a pointer, and in the low negative range, so typically a
return value is 0, or negative errno.  this means that -ERESTART turns
into positive 3 and then eg, our PTR_ERR() macro asserts that the
errno is not less than zero.

so, the basic problem is that netbsd errnos are in the range -6 ..
95, linux has them all positive, and declares special handling for
negative values in many cases, and these aren't easy to reconcile.

there are two ways i can see to fix this.

- all errno interactions with linux style code bases need to have
  their values mapped to/from linux in all cases, and that the
  mapped values probably need to be unique.  it may be that only
  ERESTART needs handling (could only remap this, and assert the
  others aren't set.)  this means updating a lot of code, though
  the vast majority have a common comment.

- re-number these for netbsd.  i am running a kernel where the
  negative values have 255 addded to them.  (current ELAST is 96,
  so there's quite a lot of space left.)

the first one means not changing our old style for kernel-only
errors, but i'm really a fan of that anyway.  the second does
mean giving up a minor ability to check quickly (user errno should
be a small positive number, vs, range or similiar check.  i could
not find any of these checks anyway, so maybe not a real issue.)

i think i prefer the second because it is both simpler to deal with
means not having to constantly patch new code for the remap, but it
does seem disappointing to have to give up this "feature" so we can
have nice things.


any suggestions or preferences?  


.mrg.

re: svr4, again

2018-12-20 Thread matthew green

Christos Zoulas writes:
> In article <3af3b7c6-d34e-471d-8cf8-a411e9032...@me.com>,
> Jason Thorpe   wrote:
> >While I agree that it’s not exactly difficult to maintain the a.out
> >exec package, I do wonder if there is anyone out there actually running
> >ancient a.out binaries.
> 
> Well, I still have a lisp from 0.9 :-) Anyway if we get rid of a.out we
> might as well get rid of compat <= 1.5...

netbsd had ELF ports before 1.5.  i forget when, but at
least alpha and maybe mips were ELF before i386/sparc's
conversion from a.out in 1.5.

IIRC.


.mrg.

re: svr4, again

2018-12-19 Thread matthew green

Jason Thorpe writes:
> ...and while we're talking about "questionable utility"...
> 
> COMPAT_AOUT_M68K, COMPAT_M68K4K, and COMPAT_VAX1K are merely exec glue 
> (well, mostly; COMPAT_AOUT_M68K has some ancient stat()-related 
> emulation) for what are, at this point, ridiculously old a.out binaries 
> that you also need to have the entire a.out runtime for.
> 
> I would say at this point that nobody has any business running these 
> ancient a.out binaries.  All of these platforms switched to ELF **long** 
> ago.  Even though these layers are extremely thin, a reasonable argument 
> could be made that these should go, too.

i would argue that until we're willing to drop a.out exec
entirely we should keep the above.  let's not chip and hack
around it.


.mrg.

1 2 3 4 5 >

1 - 100 of 414 matches

Mail list logo